Research
The minds behind Feature Labs are at the forefront of data science research. You can watch our videos and see a sampling of our peer-reviewed academic research below.
Videos
Towards increasing data scientists' productivity by 1000x
Label, Segment, Featurize: a reusable framework for prediction engineering
Trane: automatically formulating and solving thousands of prediction problems
Peer-reviewed papers
Deep Feature Synthesis: Towards automating data science endeavors
Authors:  James Max Kanter, Kalyan Veeramachaneni
Published in:  IEEE International Conference on Data Science and Advanced Analytics 2015
In this paper, we develop the Data Science Machine, which is able to derive predictive models from raw data automatically. To achieve this automation, we first propose and develop the Deep Feature Synthesis algorithm for automatically generating features for relational datasets. The algorithm follows relationships in the data to a base field, and then sequentially applies mathematical functions along that path to create the final feature. Second, we implement a generalizable machine learning pipeline and tune it using a novel Gaussian Copula process based approach. We entered the Data Science Machine in 3 data science competitions...
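The core idea in the abstract above (follow a relationship to a base field, then apply mathematical functions along that path) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy tables, the `synthesize_features` helper, and the feature-naming scheme are all hypothetical, and the sketch covers only a single relationship hop, whereas the paper's algorithm stacks such steps.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: each order row points at a customer (the relationship).
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 5.0},
]

# Mathematical functions applied to the base field once the relationship is followed.
AGGREGATIONS = {"sum": sum, "mean": mean, "max": max}


def synthesize_features(parents, children, key, field, aggregations):
    """Build one feature per aggregation for each parent row (illustrative only)."""
    # Follow the relationship: collect the base field values of each parent's children.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[field])

    features = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        out = dict(parent)
        for name, fn in aggregations.items():
            # A parent with no children gets None rather than a made-up value.
            out[f"{name}(orders.{field})"] = fn(values) if values else None
        features.append(out)
    return features


features = synthesize_features(customers, orders, "customer_id", "amount", AGGREGATIONS)
for row in features:
    print(row)
```

Each customer row gains one generated column per aggregation function. Feeding those columns into a further aggregation over another relationship is what would make the features "deep".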
What would a data scientist ask? Automatically formulating and solving prediction problems
Authors:  Benjamin Schreck, Kalyan Veeramachaneni
Published in:  IEEE International Conference on Data Science and Advanced Analytics 2016
In this paper, we designed a formal language, called Trane, for describing prediction problems over relational datasets, and implemented a system that allows data scientists to specify problems in that language. We show that this language is able to describe a range of prediction problems, including those on Kaggle, a data science competition website. We express 29 different Kaggle problems in this language. We designed an interpreter, which translates input from the user, specified in this language, into a series of transformation and aggregation operations...
Label, Segment, Featurize: a cross domain framework for prediction engineering
Authors:  James Max Kanter, Owen Gillespie, Kalyan Veeramachaneni
Published in:  IEEE International Conference on Data Science and Advanced Analytics 2016
In this paper, we introduce "prediction engineering" as a formal step in the predictive modeling process. We define a generalizable 3-part framework, Label, Segment, Featurize (L-S-F), to address the growing demand for predictive models. The framework provides abstractions for data scientists to customize the process to unique prediction problems. We describe how to apply the L-S-F framework to characteristic problems in 2 domains and demonstrate an implementation over 5 unique prediction problems...
Want to help push data science automation forward?