Research

The minds behind Feature Labs are at the forefront of data science research. You can watch our videos and see a sampling of our peer-reviewed academic research below.

Videos

Towards increasing data scientists’ productivity by 1000x

Label, Segment, Featurize: a reusable framework for prediction engineering

Trane: automatically formulating and solving thousands of prediction problems

Peer-reviewed papers

Deep Feature Synthesis: Towards automating data science endeavors

Authors: James Max Kanter, Kalyan Veeramachaneni
Published in: IEEE International Conference on Data Science and Advanced Analytics 2015

In this paper, we develop the Data Science Machine, which is able to derive predictive models from raw data automatically. To achieve this automation, we first propose and develop the Deep Feature Synthesis algorithm for automatically generating features for relational datasets. The algorithm follows relationships in the data to a base field, and then sequentially applies mathematical functions along that path to create the final feature. Second, we implement a generalizable machine learning pipeline and tune it using a novel Gaussian Copula process-based approach. We entered the Data Science Machine in 3 data science competitions…
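
The idea of following relationships to a base field and stacking functions along the path can be illustrated with a minimal sketch. This is not the paper's implementation: the tables, the column names, and the `aggregate` helper are all hypothetical, and only the general pattern (aggregate a child table into its parent, then aggregate that result one level further up) is taken from the description above.

```python
from statistics import mean

# Hypothetical toy tables (illustrative only): customers -> orders -> items.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]
items = [
    {"order_id": 10, "price": 5.0},
    {"order_id": 10, "price": 7.0},
    {"order_id": 11, "price": 4.0},
    {"order_id": 12, "price": 9.0},
]

# Assumed set of aggregation functions; the paper's own set may differ.
AGGREGATIONS = {"SUM": sum, "MEAN": mean, "MAX": max}


def aggregate(parents, children, key, field, agg_name):
    """Return copies of the parent rows with one new feature:
    the aggregation of `field` over the child rows that share `key`."""
    agg = AGGREGATIONS[agg_name]
    feature = f"{agg_name}({field})"
    out = []
    for parent in parents:
        values = [
            child[field]
            for child in children
            if child[key] == parent[key] and child.get(field) is not None
        ]
        out.append({**parent, feature: agg(values) if values else None})
    return out, feature


# Depth 1: total price of the items in each order.
orders_with_total, order_total = aggregate(orders, items, "order_id", "price", "SUM")

# Depth 2: stack a second function on top of the first feature.
customers_with_feature, stacked = aggregate(
    customers, orders_with_total, "customer_id", order_total, "MEAN"
)

for row in customers_with_feature:
    print(row["customer_id"], stacked, row[stacked])
```

Each call moves one relationship closer to the target table, so the feature name records the path taken: here `MEAN(SUM(price))` is the average order total per customer.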

What would a data scientist ask? Automatically formulating and solving prediction problems

Authors: Benjamin Schreck, Kalyan Veeramachaneni
Published in: IEEE International Conference on Data Science and Advanced Analytics 2016

In this paper, we designed a formal language, called Trane, for describing prediction problems over relational datasets, and implemented a system that allows data scientists to specify problems in that language. We show that this language is able to describe several prediction problems, including those on Kaggle, a data science competition website. We express 29 different Kaggle problems in this language. We designed an interpreter, which translates input from the user, specified in this language, into a series of transformation and aggregation operations…
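
To make the notion of a prediction problem expressed as transformation and aggregation operations more concrete, here is a small sketch. The operation names, the dictionary-based problem description, and the `interpret` function are assumptions made for illustration; they are not Trane's actual syntax or interpreter.

```python
# Hypothetical operation vocabulary; not the syntax defined in the paper.
TRANSFORMS = {
    "greater_than": lambda rows, column, arg: [r for r in rows if r[column] > arg],
    "less_than": lambda rows, column, arg: [r for r in rows if r[column] < arg],
}
AGGREGATES = {
    "count": lambda rows, column: len(rows),
    "sum": lambda rows, column: sum(r[column] for r in rows),
}


def interpret(problem, rows):
    """Apply each transformation in order, then the final aggregation."""
    for name, column, arg in problem["transforms"]:
        rows = TRANSFORMS[name](rows, column, arg)
    agg_name, column = problem["aggregate"]
    return AGGREGATES[agg_name](rows, column)


def labels_by_entity(problem, rows, entity_column):
    """Group rows by entity and compute one label per entity."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[entity_column], []).append(row)
    return {entity: interpret(problem, group) for entity, group in grouped.items()}


# Illustrative data: purchases made by two customers.
purchases = [
    {"customer_id": 1, "amount": 5.0},
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 15.0},
    {"customer_id": 2, "amount": 8.0},
    {"customer_id": 2, "amount": 30.0},
]

# "For each customer, how many purchases exceed 10?"
problem = {
    "transforms": [("greater_than", "amount", 10.0)],
    "aggregate": ("count", "amount"),
}

labels = labels_by_entity(problem, purchases, "customer_id")
print(labels)
```

Because the problem is plain data rather than code, many candidate problems could in principle be enumerated by varying the operations, columns, and thresholds.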

Label, Segment, Featurize: a cross domain framework for prediction engineering

Authors: James Max Kanter, Owen Gillespie, Kalyan Veeramachaneni
Published in: IEEE International Conference on Data Science and Advanced Analytics 2016

In this paper, we introduce “prediction engineering” as a formal step in the predictive modeling process. We define a generalizable three-part framework, Label, Segment, Featurize (L-S-F), to address the growing demand for predictive models. The framework provides abstractions for data scientists to customize the process to unique prediction problems. We describe how to apply the L-S-F framework to characteristic problems in two domains and demonstrate an implementation over five unique prediction problems…
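
The abstract does not spell out what each of the three steps does, so the following sketch rests on an assumed reading: Label computes the outcome to predict from data at or after a cutoff time, Segment selects the data before that cutoff that may be used for learning, and Featurize turns that segment into model inputs. All function names, parameters, and data are hypothetical.

```python
from statistics import mean

# Hypothetical event log for a single entity, ordered by time.
events = [
    {"time": 1, "value": 10.0},
    {"time": 2, "value": 20.0},
    {"time": 3, "value": 30.0},
    {"time": 4, "value": 40.0},
    {"time": 5, "value": 150.0},
    {"time": 6, "value": 60.0},
]


def label(events, cutoff, threshold):
    """Label: does any event at or after the cutoff exceed the threshold?"""
    return any(e["value"] > threshold for e in events if e["time"] >= cutoff)


def segment(events, cutoff, window):
    """Segment: keep only events in the `window` time units before the cutoff,
    so that no information from the labelled period leaks into the features."""
    return [e for e in events if cutoff - window <= e["time"] < cutoff]


def featurize(segment_events):
    """Featurize: summarize the segment as a fixed set of features."""
    values = [e["value"] for e in segment_events]
    return {"count": len(values), "mean_value": mean(values) if values else None}


def build_example(events, cutoff, window, threshold):
    """Combine the three steps into one (features, label) training example."""
    features = featurize(segment(events, cutoff, window))
    return features, label(events, cutoff, threshold)


features, target = build_example(events, cutoff=4, window=2, threshold=100.0)
print(features, target)
```

Keeping the three steps as separate functions means each one can be swapped independently, for example a different labeling rule with the same segmenting and featurizing logic.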

Want to push data science automation forward?

Reach out to careers@featurelabs.com

Get in touch

Feature Labs builds tools and APIs to deploy impactful machine learning solutions by combining open source software and proprietary algorithms for automated feature engineering. Contact us to learn how we can help you succeed with data science and predictive modeling endeavors.
