Machine Learning Articles

Machine learning, deep learning, artificial intelligence... The science of getting machines to perform actions without explicitly programming them to do so can be intimidating for the uninitiated. These machine learning articles aim to unpack the black box for beginners, with introductions to overall concepts and tutorials for training a model of their own.

Reinforcement Learning with Prediction-Based Rewards

When a reinforcement learning agent was incentivized to be curious and avoid “boredom” while playing Mario, it discovered warp levels, how to defeat bosses, and more. - OpenAI

Deepfake-busting apps can spot even a single pixel out of place

Speaking of AI-generated imagery... it's so easy to use that anyone can make a fake video or image, no matter their motives. Luckily, technology for discerning true images from manipulated creations is catching up. - MIT Technology Review

Generating custom photo-realistic faces using AI

Generating realistic images based on descriptions is much harder than describing an image—for humans and computers. But this new generative model is making that task easier. - Insight

How do you like your ML career?

“Over the last few years ML has lost some of its luster in my mind - the hype around deep learning and ML has added a lot of noise into the system, and for someone who cares about doing good science that's been hard for me.” - r/MachineLearning

Mask R-CNN Benchmark

A fast and modular implementation for Faster R-CNN and Mask R-CNN written entirely in PyTorch 1.0. It's 30% quicker than mmdetection during training. - Facebook Research

How Three French Students Used Borrowed Code to Put the First AI Portrait in Christie’s

The code used to generate this portrait is mostly the work of another artist and programmer. This raises a question about attribution in the open and collaborative AI art community, which is taking its first steps into mainstream attention. - The Wall Street Journal

Deepfake Videos Are Getting Real and That's a Problem

Changing photos used to be tedious and time-consuming. Fast-forward to now: nearly anyone can use deep learning and AI to generate incredibly realistic “fake videos”—President Obama saying something he never said, for instance. - The Wall Street Journal

Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups

How can you train your model on large batches when your GPU can’t hold more than a few samples? Let's find out. - Hugging Face
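
One common answer to this question is gradient accumulation: run several small micro-batches, sum their gradients, and update the weights once. The sketch below is not taken from the linked post. It uses a toy one-parameter linear model in plain Python, and the names `grad_mse` and `accumulated_grad` are hypothetical, chosen only for illustration. It shows that the accumulated gradient matches the gradient of the full batch.

```python
# Gradient accumulation sketch for a toy model y = w * x with mean squared error.
# Function names are hypothetical; this is an illustration, not a library API.

def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, weighting each by its share of the data."""
    n = len(xs)
    total = 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Scale so the running sum equals the full-batch mean gradient.
        total += grad_mse(w, mx, my) * len(mx) / n
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
w = 0.5

full = grad_mse(w, xs, ys)                               # one large batch
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # three small batches
print(abs(full - accum) < 1e-9)
```

In a real framework the same idea usually means calling the backward pass on each micro-batch and stepping the optimizer only after every few micro-batches.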

Artwork Personalization at Netflix

Ever notice how the preview image for the same show or movie on Netflix changes whenever you log back in? Here's a peek into the system that figures out which piece of artwork is the best for convincing a particular member why that title is “for them.” - The Netflix Tech Blog

Data: A key requirement for your Machine Learning (ML) product

For all the PMs out there: here are some tips for how to talk about data in your Product Requirement Document for a machine learning product. - The Lever

A Review of the Neural History of Natural Language Processing

It's kind of crazy that neural network NLP is now old enough to have its own historical timeline. This post condenses about 15 years of work into eight milestones that impacted how these technologies are used today. - aylien

Introduction to Machine Learning for Coders: Launch

This new course uses modern tools and libraries, including python, pandas, scikit-learn, and pytorch. Unlike many educational materials in the field, this approach is “code first” rather than “math first.” -

Why building your own Deep Learning Computer is 10x cheaper than AWS

Avoid hefty cloud GPU costs by building a computer from scratch. - The Mission

Tabular Data in Scikit-Learn and Dask-ML

Take advantage of Scikit-Learn's latest improvements for working with tabular data. - datas-frame

Anatomy of an AI System

“The stack that is required to interact with an Amazon Echo goes well beyond the multi-layered ‘technical stack’ of data modeling, hardware, servers and networks. The full stack reaches much further into capital, labor and nature, and demands an enormous amount of each. The true costs of these systems – social, environmental, economic, and political – remain hidden and may stay that way for some time.” - Anatomy of an AI System

Help! I can’t reproduce a machine learning project!

Reproducibility breaks down in three main places: the code, the data and the environment. This guide should help you narrow down where your reproducibility problems are, so you can focus on fixing them. - No Free Hunch

Retracing your steps in Machine Learning: Versioning

New prediction systems are fragile things. Change one thing, and the accuracy of the model can drop dramatically, leading to a long troubleshooting process to find the root cause. Skip the headache with this guide to building a robust versioning system for your ML projects. - The Lever

Human translators are still on top—for now

Machine translation works well for sentences. For full documents? Not so much. - MIT Technology Review

No Machine Learning in your product? Start here

Just how much does a product owner need to know about machine learning? A Google PM shares his experience integrating machine learning into an existing product: Google Forms. - The Lever

VerbiAge: Using NLP to help writers craft age-specific writing

This app for tailoring a book’s description for a target K-12 age is a nice example of how machine learning can aid in creative tasks. - Insight

What HBR Gets Wrong About Algorithms and Bias

This post injects some much-needed nuance into the biased algorithms discussion: humans vs machines is not a helpful framing and most critics of unjust bias aren’t anti-algorithm. -

Learning Meaning and Semantics in Natural Language Processing

A few weeks ago, data science Twitter spun out a fascinating mega-thread on NLP meaning and semantics. Since Twitter threads can be tricky to parse after the fact, this summary, interactive tweet tree, and commented map provide three entry points into the discussion. - Hugging Face

Differentiable Image Parameterizations

This powerful, under-explored tool for neural network visualizations and art produces vibrant images that look like they came straight out of Annihilation. - Distill

ACL 2018 Highlights: Understanding Representations and Evaluation in More Challenging Settings

This post digs into two themes of the Association for Computational Linguistics 2018 conference: gaining a better understanding of what NLP models capture, and exposing them to more challenging settings. - Sebastian Ruder

Machine Learning Glossary

Find yourself dragged under by wave after wave of machine learning jargon? Part of Google's Machine Learning Crash Course, this glossary provides plain-English descriptions of the terms you've heard thrown around by ML experts, without sacrificing accuracy. - Google

Reinforcement learning’s foundational flaw

“Does it really make sense to start learning a new skill based only on its reward signal, with neither prior experience nor higher-level instruction?” - The Gradient

Feature-wise transformations

Many real-world problems require integrating multiple sources of information. Feature-wise transformations offer a way to effectively capture and leverage the relationship of various sources, across a wide range of problem settings like image recognition, reinforcement learning, and style transfer. - Distill

What do machine learning practitioners actually do?

“Any solution to the shortage of machine learning expertise requires answering this question: whether it’s so we know what skills to teach, what tools to build, or what processes to automate.” -

AdamW and Super-convergence is now the fastest way to train neural nets

It’s time to give Adam another go. -

Papers with Code

A searchable site that links machine learning papers on ArXiv with code on GitHub. - Papers with Code

Model Tuning and the Bias-Variance Tradeoff

This visual intro to machine learning covers how errors can arise due to assumptions that are overly simple (bias) or overly complex (variance). - R2D3

Gender Shades

This evaluation compares how well IBM, Microsoft, and Face++ products are able to classify gender across skin types. All companies perform better on lighter subjects as a whole than on darker subjects as a whole with an 11.8% - 19.2% difference in error rates, and all companies perform worst on darker females. - Joy Buolamwini

Why the Future of Machine Learning is Tiny

“I’m convinced that machine learning can run on tiny, low-power chips, and that this combination will solve a massive number of problems we have no solutions for right now.” - Pete Warden

Machine learning predicts World Cup winner

Researchers have predicted the outcome after simulating the entire soccer tournament 100,000 times. (Good news awaits if you’re pulling for Brazil, Germany, or Spain!) - MIT Technology Review

How The New York Times Uses Software To Recognize Members of Congress

The most interesting part of this project isn't the models used (Amazon's Rekognition API), but the practical considerations the team faced when introducing the “Who the Hill” app to the real world: poor lighting for photos in the Capitol halls, bad cell phone reception, and celebrity doppelgängers. - Times Open

A Developer’s Guide to Building AI Applications

O'Reilly and Microsoft collaborated on a free e-book that walks you through the process of building intelligent cloud-based bots (with relevant code samples available on GitHub). - Microsoft Machine Learning Blog

Launching Cutting Edge Deep Learning for Coders: 2018 edition

Part 2 of this free deep learning course is here! All you need is high school math and 1 year of coding experience. -

Why you need to improve your training data, and how to do it

When you use deep learning as part of an application, getting better training data is vastly more effective than making model adjustments. - Pete Warden

Smart Compose: Using Neural Networks to Help Write Emails

The engineers behind Smart Compose—a Gmail feature that offers sentence completion suggestions as you type—dig into how they tackled the challenges of fairness and privacy, latency, and scale. - Google AI Blog

Feature Engineering and Selection: A Practical Approach for Predictive Models

This book on predictive modeling is about 60% done and the authors are looking for feedback. The section on Engineering Numeric Predictors alone is fantastic. - Max Kuhn and Kjell Johnson

Qualitative before Quantitative: How Qualitative Methods Support Better Data Science

“Have you ever been embarrassed by the first iteration of one of your machine learning projects, where you didn’t include obvious and important features? In the practical hustle and bustle of trying to build models, we can often forget about the observation step in the scientific method and jump straight to hypothesis testing.” - Indeed Data Science

Picking Trending Topics and Celebrities Using Machine Learning

The machine learning engineers at Conde Nast applied their expertise to help Vanity Fair’s writers and editors better craft stories that have a broad, meaningful impact. - Conde Nast Technology

Get Started with Eager Execution in TensorFlow

The folks at TensorFlow are putting their tutorials directly into Google Colab notebooks (which require zero setup to run!). If you've ever wanted to learn more about machine learning, the time is now, especially since a recent survey suggests that most data scientists lack advanced machine learning expertise. - TensorFlow

Artist + AI

Here's a new Twitter account for you to follow. This artist combines her hand-drawn work with generative adversarial networks (GANs) to create something completely new. - Helena Sarin

Demystifying Docker for Data Scientists – A Docker Tutorial for Your Deep Learning Projects

Is Docker really the best thing since sliced bread? Find out in this tutorial, which covers the basics of how to interact with Docker containers and create custom Docker images for your AI workloads. - Microsoft's Machine Learning Blog

The Building Blocks of Interpretability

This article really gets you inside a neural network's “head” by explaining the thought process as it decides between two labels for an image, like a bowtie and a pair of sunglasses. - Distill

The Malicious Use of Artificial Intelligence

This 101-page report “surveys the landscape of potential security threats from malicious uses of artificial intelligence technologies, and proposes ways to better forecast, prevent, and mitigate these threats.” Divvy it out across your commutes and moments of downtime this week. -

Descriptive mAchine Learning EXplanations (DALEX)

Unpack some black boxes with this handy cheatsheet for understanding how complex ML models work. - Przemyslaw Biecek

Manifesto for Data Practices

Give this a read, whether you sign it or not. -

So, How Many ML Models You Have NOT Built?

“What will put us out of our job is Machine Learning Overkill. I have seen implementation of Machine Learning algorithms to very frivolous problems and worse still the companies have invested heavily into the idea. It is a ticking time bomb. The moment the companies realize that the ROI is negative, they will shun the Data Science practice altogether.” - Towards Data Science

THREAD: How computer vision and natural-language processing systems reflect societal stereotypes

A rabbit hole worthy of your time: various types of machine learning bias as tracked by academic papers. - Arvind Narayanan

Exploring Recommendation Systems

In practice, recommenders don’t always work as well as we’d like them to. This post sets out to discover why. - FastForward Labs

Turning Design Mockups Into Code With Deep Learning

Ever wish you could automate the front-end engineering process? Here’s how to teach a neural network to code a basic HTML and CSS website from a design mockup. - FloydHub

Learning Curves for Machine Learning

How do you diagnose bias and variance? And what actions should you take once you’ve detected these errors? - Dataquest

Machine Learning: The High-Interest Credit Card of Technical Debt

There’s no such thing as a free machine learning project. Avoid or refactor these risk factors and design patterns to keep technical debt from piling up. - Research at Google

2017: The year AI beat us at all our own games

“Over the past 12 months AI crossed a series of new thresholds, finally beating human players in a variety of different games, from the ancient game of Go to the dynamic and interactive card game, Texas Hold-Em Poker.” - New Atlas

Deep Learning Achievements Over the Past Year

Carve out some time in your holiday schedule to explore 2017's most exciting developments in text, voice, and computer vision technologies. - Stats & Bots

How many images do you need to train a neural network?

The technically correct answer is: “It depends.” The ballpark answer is: “1,000 representative images for each class.” (With some caveats of course.) - Pete Warden

The U.S. Leads in Artificial Intelligence, but for How Long?

Government policies such as the tax bill, reduced funding, and tightening of rules on immigration for international researchers threaten the U.S.’s advantage in AI. - MIT Technology Review

NIPS 2017 — Highlights

If you didn’t attend the conference on Neural Information Processing Systems last week, never fear! Catch up on the latest in AI with these day-by-day summaries. - Insight Data

Improving Palliative Care with Deep Learning

80% of Americans prefer to spend their final days in their home, but only 20% actually do. This 18-layer deep neural network identifies hospitalized patients with a high risk of death in the next 3-12 months, so they can get access to palliative care sooner. - Stanford ML Group

Innovating Faster on Personalization Algorithms at Netflix Using Interleaving

“The interleaving approach allows us to quickly prune down the initial set of ranking algorithms to the most promising candidates, enabling us to conduct experiments at a rate much faster than traditional A/B testing to identify winning ideas.” - Netflix Technology Blog

[VIDEO] Livecoding Madness: Let’s Build a Deep Learning Library

This is interesting on two levels: “how to build a deep learning library” and “how someone who’s not me writes Python” (in this case, the answer is: incredibly fast). - Joel Grus

Fairness Measures

Awareness of the bias of algorithms is important, but here’s a way to actually do something about it. Run your dataset through this Python package and you’ll get back a measure that quantifies discrimination within that dataset. - Fairness Measures

The era of easily faked, AI-generated photos is quickly emerging

Nvidia’s researchers trained algorithms on 30,000 images of celebrities, and it’s nearly impossible to tell the generated images from the real ones. - Quartz

Scalable Machine Learning (Part 1)

What do you do when your training dataset fits in memory, but the dataset you're making predictions on doesn't? This post identifies where the usual pandas and scikit-learn workflow for in-memory analytics breaks down and offers some solutions for scaling out to larger problems. - Tom Augspurger

Can Neural Nets Detect Sexual Orientation? A Data Scientist’s Perspective

Dig into the data behind Stanford's controversial paper Deep Neural Networks Can Detect Sexual Orientation From Faces. -

My Neural Network isn't working! What should I do?

11 mistakes you may make while implementing a neural network—and how to fix them. - Daniel Holden

Train, Score, Repeat, Watch Out! Zillow's Andrew Martin on modeling pitfalls in a dynamic world.

One of Zillow's data scientists addresses the challenges that don’t crop up in standard textbook problems or most ML competitions: feedback loops, dynamic datasets, and temporal consistency. A great read for Kagglers and non-Kagglers alike. - No Free Hunch

Switching to a Probabilistic Model for Venue Search in Foursquare

How Foursquare’s engineering team improved the accuracy and user experience of their location intelligence by switching from a search ranking algorithm to regression trees and probabilities. - Foursquare Engineering

BuzzFeed News Trained A Computer To Search For Hidden Spy Planes. This Is What We Found.

Learn how BuzzFeed trained a random forest algorithm to spot planes flown by the FBI and DHS. - BuzzFeed

Using Machine Learning to Predict Value of Homes On Airbnb

How Airbnb used internal and open-source tools (like Python!) to lower the overall development costs of customer lifetime value (LTV) modeling. Code examples abound. - Airbnb Engineering and Data Science

Technical Debt in Machine Learning

What do feedback loops, correction cascades, and hobo-features have in common? They’re all machine learning anti-patterns that can slowly creep into your infrastructure and create a ticking time bomb. - Towards Data Science

Inside Facebook’s AI Workshop

When Joaquin Candela first started at Facebook, he worked on an ad-targeting algorithm with a handful of engineers. Five years later, he runs the Applied Machine Learning team, which comprises hundreds of employees running thousands of experiments a day. Here’s how he scaled up Facebook’s AI factory at breakneck speed. - Harvard Business Review

Improving the Realism of Synthetic Images

Producing a large, diverse, and accurate training set for machine learning models is a pricey endeavor. Apple provides a rare behind-the-scenes look at how they cut costs and improved their models by making simulated images look more realistic. - Apple Machine Learning Journal

Human-Centered Machine Learning

For UX folks: A 7-step guide to stay focused on human needs when designing with machine learning. - Google Design

Visualizing High Dimensional Data In Augmented Reality

When you’re trying to understand the relationships in a really big dataset (three-million-grocery-orders big), a 2D scatterplot might not cut it. This immersive 3D visualization technique offers a way to make sense of data with multiple attributes and improve machine learning features and models. - Inside Machine Learning

How HBO’s Silicon Valley built “Not Hotdog” with mobile TensorFlow, Keras & React Native

The use-case may be farcical, but the deep learning and edge computing behind it are very real. - Hacker Noon

Predicting the Success of a Reddit Submission with Deep Learning and Keras

It all comes down to two things: the time of day and a catchy title. - Max Woolf

Vertical AI Startups: Solving Industry-specific Problems by Combining AI and Subject Matter Expertise

“While most of the machine learning talent works in big tech companies, massive and timely problems are lurking in every major industry outside tech.” - Bradford Cross

J.P. Morgan’s massive guide to machine learning and big data jobs in finance

Get the key takeaways from this 280-page report, including essential data analysis packages, hiring tips, and which machine learning techniques to apply to which problems. - efinancialcareers

“Many enterprise ‘AI products’ and ‘machine intelligence’ products built today have limited appeal or impact”

One investor’s self-described “unpopular” opinion - Sarah Guo

Is Your Organization Ready for ML?

Don’t make this mistake: “[M]any organizations rush to hire ML experts without laying the proper foundation to ensure their success, including creating proper database architecture, building out essential data science technology, establishing data governance, and instilling data-driven decision-making throughout the organization.” - RE•WORK


Save this hashtag for the moments when you need to jog your memory on some basic concepts. - Chris Albon

Machine Learning for Product Managers

A brilliant, non-technical read for anyone who designs, supports, manages, or plans for products that use machine learning. - Hacker Noon

Distill: An Interactive, Visual Journal for Machine Learning Research

This new online publication is bringing academic journals into the 21st century: “A Distill article… isn’t just a paper. It’s an interactive medium that lets users – 'readers' is no longer sufficient – work directly with machine learning models.” - Y Combinator

Tips & Tricks for Feature Engineering / Applied Machine Learning

One commenter put it best: 'Probably the best feature engineering slides I have found [on] the internet.' Need we say more? - HJ van Veen

Learning about Machine Learning with an Earthquake Example

How well can we predict whether or not someone is prepared for an earthquake? - Simply Statistics

How Fitbit’s data science team scales machine learning

Workout regimens need to be tailored to each individual. Directional correctness isn’t enough. Fitbit’s head of data science shares how his team builds a model for every user to increase motivation and prevent injuries. - Mixpanel

Fake News Challenge

This grassroots effort is inviting teams to harness AI technologies to help human fact checkers identify hoaxes and deliberate misinformation in news stories. The top three teams get a cash prize, so grab a couple of friends and check out the training dataset. - Fake News Challenge

Machine Learning Videos

More of a visual learner? Here’s a repository of recorded talks at machine learning conferences, workshops, seminars, and more. - Dustin Tran

What is artificial intelligence? A three part definition

“As soon as it works, no one calls it AI anymore.” - Simply Statistics

What I Learned Implementing a Classifier from Scratch in Python

With libraries like scikit-learn, it’s easy to run an algorithm on some data and automagically get an answer—without understanding exactly how you arrived there. Prepare to unpack the black box. - Jean-Nicholas Hould


You could be a poet, and not know it. Feed the works of your favorite author through this new Python library to generate as many lines of verse as you want. - Anthony Federico

What’s the state of the job market in data science and machine learning?

“Th[e] proliferation of courses, resources, books and startups would hint that machine learning is becoming more and more accessible to the average programmer and that the market is on track to getting saturated quickly. Is this the current trend?” - Hacker News

20 Weird & Wonderful Datasets for Machine Learning

Getting your hands on a robust dataset is the hardest part of machine learning. Finding interesting datasets is tougher still. From UFO sightings to beautiful Flickr photos, you’re sure to find something to train your model. - Oliver Cameron

Deep-Fried Data

Opening your data can lead to unpredictable benefits, but requires being open to unexpected uses of your data. - Idle Words

Deep Learning Isn’t a Dangerous Magic Genie. It’s Just Math

This essay is a godsend for those of us who have trouble understanding or explaining what exactly deep learning is. - WIRED

Boosting Sales With Machine Learning

One developer shares how his team used natural language processing and machine learning in Python to pre-qualify sales leads so reps don’t have to spend hours doing it manually. - Xeneta

Hybrid Intelligence: How Artificial Assistants Work

When humans and machines work together, they accomplish a lot more than either could on their own. This is known as hybrid intelligence—a pretty intimidating term for those unfamiliar with machine learning. Here’s a breakdown. - Clare Corthell

The real prerequisite for machine learning isn’t math, it’s data analysis

Machine learning amateurs, take heart. Proficiency with high level math may be essential for machine learning theory. But with out-of-the-box tools like R’s gmodels package or Python’s scikit-learn library, you don’t need to know linear algebra or calculus to build a successful predictive model. You do, however, need to know your way around a dataset. - Sharp Sight Labs

How Kalman Filters Work, Part 1

This article unpacks different filtering algorithms in an incredibly intuitive way. It’s a long read, but you’ll come away having learned a ton (did you know that NASA used Kalman filters to help Apollo spacecraft navigate to the moon?). - An Uncommon Lab

Microsoft’s Tay is an Example of Bad Design

Or Why Interaction Design Matters, and so does QA-ing. - Caroline Sinders

Here's How We Prevent The Next Racist Chatbot

A racist chatbot is the consequence of poor training. - Popular Science

Why Microsoft Accidentally Unleashed a Neo-Nazi Sexbot

It’s not surprising that Microsoft’s chatbot spewed racist invective, but here’s how it could have been avoided. - MIT Technology Review

Explained Visually

This website is an incredible collection of interactive visualizations aimed at making tricky concepts like Markov chains and regression easy to understand. Schedule a few hours to explore this one—you’re gonna need them. - Explained Visually

Lift analysis - A data scientist’s secret weapon

Learn how to spot flaws in machine learning models with lift analysis (and why you should add it to your list of evaluation metrics). - Andy Goldschmidt
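
For a sense of what a lift calculation can look like, here is a minimal sketch in plain Python. It is not taken from the linked post. It assumes binary labels and model scores, and the function name `lift_by_bucket` is hypothetical. Lift for a bucket is the share of positives in that bucket divided by the share of positives overall, so a useful model should show lift well above 1 in its top-scored bucket.

```python
# Lift-per-bucket sketch. The function name and the toy data are illustrative
# assumptions, not part of any particular library.

def lift_by_bucket(scores, labels, n_buckets):
    """Return the lift of each bucket, highest-scored bucket first."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    if not 1 <= n_buckets <= len(scores):
        raise ValueError("n_buckets must be between 1 and the number of rows")
    overall_rate = sum(labels) / len(labels)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive labels")

    # Rank labels by descending model score.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    ranked = [label for _, label in pairs]

    size = len(ranked) // n_buckets
    lifts = []
    for b in range(n_buckets):
        start = b * size
        # The last bucket absorbs any remainder rows.
        end = (b + 1) * size if b < n_buckets - 1 else len(ranked)
        bucket = ranked[start:end]
        lifts.append((sum(bucket) / len(bucket)) / overall_rate)
    return lifts


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
lifts = lift_by_bucket(scores, labels, n_buckets=2)
print(lifts)  # [1.5, 0.5]
```

Here the top half of the ranking contains positives at 1.5 times the overall rate, which is the kind of signal lift analysis is meant to surface.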

We Now Have Algorithms To Predict Police Misconduct

You’ve probably heard of predictive policing, but what about predictive policing for the police? One police department teamed up with researchers to test an algorithm that detects troublesome behavior of officers early on. - FiveThirtyEight

Are Your Predictive Models like Broken Clocks?

How can you ensure you’ve picked the “right model” for a very big and very complex dataset? - Rocket-Powered Data Science

Startups Aim to Exploit a Deep-Learning Skills Gap

What do you do when every company wants to build a deep-learning network, but the experts are in short supply? Launch a product, of course. Some startups have created computer chips and software libraries that can accelerate algorithm training, all without having to hire an experienced team of deep-learning experts. - MIT Technology Review

Georgia Tech Researchers Demonstrate How the Brain Can Handle So Much Data

Random projection is frequently used in machine learning to make sense of big, diverse data. It turns out this method could be one of the ways that humans learn, too. - Georgia Tech
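
As a rough illustration of the machine learning side, the sketch below projects a vector into fewer dimensions by multiplying it with a random Gaussian matrix. It is a plain-Python toy, not the researchers' method, and the names `random_projection_matrix` and `project` are hypothetical. Scaling entries by 1/sqrt(k) is a common choice so that vector lengths are roughly preserved on average.

```python
import math
import random

# Gaussian random projection sketch. Function names are illustrative
# assumptions; real projects would typically use a numerical library.

def random_projection_matrix(d, k, seed=0):
    """Build a k x d matrix with N(0, 1/k) entries, seeded for repeatability."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, vector):
    """Map a d-dimensional vector to k dimensions via matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


d, k = 50, 5
matrix = random_projection_matrix(d, k, seed=42)
a = [float(i) for i in range(d)]
b = [float(d - i) for i in range(d)]

low_a = project(matrix, a)
low_b = project(matrix, b)
print(len(low_a))  # 5
```

Because the projection is linear, differences between points are projected the same way as the points themselves, which is why distances tend to survive the reduction approximately.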

The current state of machine intelligence 2.0

These days, it feels like every other article in our newsfeeds is touting the potential of machine intelligence. This article cuts through the hype and presents this year’s major accomplishments in two categories—“(1) the emergence of autonomous systems in both the physical and virtual world and (2) startups shifting away from building broad technology platforms to focusing on solving specific business problems.” - O'Reilly