Research

The minds behind Feature Labs are at the forefront of data science research. You can watch our videos and see a sampling of our peer-reviewed academic research below.

Videos

Towards increasing data scientists’ productivity by 1000x

Label, Segment, Featurize: a reusable framework for prediction engineering

Trane: automatically formulating and solving thousands of prediction problems

Peer-reviewed papers

Deep Feature Synthesis: Towards automating data science endeavors

Authors: James Max Kanter, Kalyan Veeramachaneni
Published in: IEEE International Conference on Data Science and Advanced Analytics 2015

In this paper, we develop the Data Science Machine, which is able to derive predictive models from raw data automatically. To achieve this automation, we first propose and develop the Deep Feature Synthesis algorithm for automatically generating features for relational datasets. The algorithm follows relationships in the data to a base field, and then sequentially applies mathematical functions along that path to create the final feature. Second, we implement a generalizable machine learning pipeline and tune it using a novel Gaussian Copula process based approach. We entered the Data Science Machine in 3 data science competitions…
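As a rough illustration of the idea of following relationships down to a base field and then stacking functions along that path, here is a minimal sketch in plain Python. It is not the Data Science Machine or the paper's implementation: the tables (`orders`, `items`), the field names, the `aggregate` helper, and the choice of SUM followed by MEAN are all hypothetical, picked only to show one two-step feature.

```python
from statistics import mean

# Hypothetical toy tables (not from the paper): customers -> orders -> items.
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]
items = [
    {"order_id": 10, "price": 20.0},
    {"order_id": 10, "price": 5.0},
    {"order_id": 11, "price": 40.0},
    {"order_id": 12, "price": 5.0},
]


def aggregate(children, key, field, func):
    """Group child rows by `key` and apply `func` to each group's `field` values."""
    groups = {}
    for row in children:
        groups.setdefault(row[key], []).append(row[field])
    return {k: func(values) for k, values in groups.items()}


# Step 1: follow items -> orders and apply SUM to the base field `price`.
order_totals = aggregate(items, "order_id", "price", sum)

# Step 2: attach that feature to each order, then follow orders -> customers
# and apply MEAN, giving a stacked feature like MEAN(orders.SUM(items.price)).
orders_with_total = [
    dict(order, total=order_totals.get(order["order_id"], 0.0)) for order in orders
]
mean_order_total = aggregate(orders_with_total, "customer_id", "total", mean)

print(mean_order_total)
```

Each extra relationship on the path adds one more function application, which is where the "deep" in the name comes from in this toy reading.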

What would a data scientist ask? Automatically formulating and solving prediction problems

Authors: Benjamin Schreck, Kalyan Veeramachaneni
Published in: IEEE International Conference on Data Science and Advanced Analytics 2016

In this paper, we designed a formal language, called Trane, for describing prediction problems over relational datasets, and implemented a system that allows data scientists to specify problems in that language. We show that this language can describe a variety of prediction problems, including those on Kaggle, a data science competition website. We express 29 different Kaggle problems in this language. We designed an interpreter, which translates input from the user, specified in this language, into a series of transformation and aggregation operations…

Label, Segment, Featurize: a cross domain framework for prediction engineering

Authors: James Max Kanter, Owen Gillespie, Kalyan Veeramachaneni
Published in: IEEE International Conference on Data Science and Advanced Analytics 2016

In this paper, we introduce “prediction engineering” as a formal step in the predictive modeling process. We define a generalizable 3 part framework — Label, Segment, Featurize (L-S-F) — to address the growing demand for predictive models. The framework provides abstractions for data scientists to customize the process to unique prediction problems. We describe how to apply the L-S-F framework to characteristic problems in 2 domains and demonstrate an implementation over 5 unique prediction problems…

Want to push data science automation forward?

Reach out to careers@featurelabs.com

Get in touch

Feature Labs is a predictive analytics platform created to make data science automation a strategic component of any organization. Contact us to learn how we can help you succeed with data science and predictive modeling endeavors.

Feature Engineering vs Feature Selection

All machine learning workflows depend on feature engineering and feature selection. However, they are often erroneously equated by the data science and machine learning communities. Although they share some overlap, these two ideas have different objectives. Knowing...

Feature Engineering: Secret to data science success

Prior to starting Feature Labs, I researched data science automation in the Data to AI Lab at MIT. Unlike most data scientists who work in a single domain, our group had sponsors from a wide range of industries. This gave us the unique opportunity to develop innovative solutions to use with the diverse problems we worked on.

Learn Feature Engineering in MIT’s Big Data Analytics Course

Feature Labs is pleased to share that our open source library, Featuretools, is being used in a new MIT course on Data Science and Big Data Analytics. Feature engineering is a vital skill for all data scientists, so we are excited to provide the library that enables teaching it alongside other important machine learning topics for the first time.

Applying Data Science Automation to Better Predict Credit Card Fraud

If you use a credit card, you probably know the feeling of having your card declined due to a suspected fraudulent transaction. An industry report from 2015 found that one out of every six legitimate cardholders experienced at least one declined transaction because of inaccurate fraud detection in the past year. That makes fraud detection an expensive problem for issuers: those declined transactions lead to nearly $118 billion in losses on an annual basis.

Even though numerous machine learning approaches have been developed in the past to address fraud, newly introduced data science automation platforms like Feature Labs give us a reason to revisit the problem. And now, any organization can see the power of automation for itself using our just-announced developer library, Featuretools.

Featuretools at CMU’s Learn Lab

Feature Labs visited Carnegie Mellon University this past July to participate in the 17th annual Simon Initiative’s LearnLab Summer School on Educational Data Mining. During the program we introduced teams to Featuretools, our open source feature engineering library. You can find the complete details in the Featuretools blog post, but here are the highlights:

About this blog

Thoughts, reflections, and examples of how organizations can take advantage of data science technologies today from the minds behind Feature Labs.
