
Top Data Science Articles

Each quarter we round up the most popular data science articles, videos, and podcasts from Mode's weekly newsletter, the Analytics Dispatch.

Star Wars, In One Chart

How does the most fearsome military force in the galaxy get whittled down from 6.8 million troops to 700k? This chart chronicles the casualties sustained by the Empire, from Rogue One to Return of the Jedi. - FiveThirtyEight

Data Science for Beginners

“These videos are basic but useful, whether you’re interested in doing data science or you work with data scientists.” - Microsoft Azure

8 Data Science Skills That Every Employee Needs

A nice primer to share with your colleagues. - Amplitude

Building & Maintaining a Master Data Dictionary: Part 2

Check out these ideas for structuring key metric definitions to keep everyone at your organization on the same page. - The Data Point

The Three Faces of Bayes

The term “Bayesian” can refer to a variety of philosophies and ideas. Read this article before the next quant-heavy cocktail party you attend, so you’ll know what’s what. - Slackpropagation

Scaling Data Science at Stitch Fix

Not many companies can say they employ 80 data scientists. The folks at Stitch Fix share their tactics for making data and compute resources more accessible, which in turn keeps data scientists happy and infrastructure healthy. - MultiThreaded

The best R package for learning to “think about visualization”

Spoiler alert: it’s ggplot2. - Sharp Sight Labs

How statistics lost their power – and why we should fear what comes next

“Not only are statistics viewed by many as untrustworthy, there appears to be something almost insulting or arrogant about them. Reducing social and economic issues to numerical aggregates and averages seems to violate some people’s sense of political decency.” - The Guardian

My Experience as a Freelance Data Scientist

Itching to strike out on your own? Read up on the pros and cons before you give your two weeks’ notice. - Greg Reda

Practical advice for analysis of large, complex data sets

“This document has been read more than anything else I’ve done at Google over the last eleven years. Even four years after the last major update, I find that there are multiple Googlers with the document open any time I check.” - The Unofficial Google Data Science Blog

Awesome visualization research

A curated list of data visualization research papers, books, blog posts, and other readings. It’s pretty fresh, so submit a pull request and contribute! - Matthew Conlen

Two Alternatives to Using a Second Y-Axis

“Almost as often as I see a pie chart with a hundred tiny slivers, I see line graphs using two y-axes. And it is just as bad.” - Stephanie Evergreen

How These Three Women Made Mid-Career Pivots Into Data Science

How do we narrow the gender gap in data science? Early STEM education for girls isn’t the only solution. Here are the journeys of three women who switched from creative jobs to data roles mid-career. - Fast Company

Visualizing Distributions

16 ways to display distributions, from the barcode chart to the bean plot. - Darkhorse Analytics

What’s the state of the job market in data science and machine learning?

“Th[e] proliferation of courses, resources, books and startups would hint that machine learning is becoming more and more accessible to the average programmer and that the market is on track to getting saturated quickly. Is this the current trend?” - Hacker News

How To (Actually) Calculate CAC

Quick! What’s the difference between customer acquisition cost (CAC) and cost per acquisition (CPA)? If you hesitated, this post is for you. - Brian Balfour
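
One common way to draw the distinction, sketched here as an assumption rather than as the post's exact definitions: CPA divides spend by some acquisition event that is not yet a paying customer (a signup, a lead, a trial), while CAC divides the full sales and marketing cost by new paying customers. All figures and names below are hypothetical.

```python
def cost_per_unit(spend, count):
    """Return spend divided by count, or None when there is nothing to divide by."""
    if count == 0:
        return None
    return spend / count


# Hypothetical monthly figures, invented for illustration only.
ad_spend = 20_000.0                   # paid acquisition spend only
sales_and_marketing_spend = 50_000.0  # ad spend plus salaries, tools, and so on
signups = 1_000                       # acquisition event short of a paying customer
new_customers = 100                   # signups that converted to paying customers

cpa = cost_per_unit(ad_spend, signups)                         # cost per signup
cac = cost_per_unit(sales_and_marketing_spend, new_customers)  # cost per customer

print(f"CPA (per signup): ${cpa:.2f}")
print(f"CAC (per customer): ${cac:.2f}")
```

Which costs belong in the CAC numerator is a judgment call; the split above is only one plausible choice.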

What I Learned Recreating One Chart Using 24 Tools

An incredibly insightful and nuanced lay of the land for charting tools. - Source

415 Data Visualization Tools

This collection of tools might seem overwhelming at first. Fear not! Filtering by features, data types, cost, and several other variables will help you find what you need, fast. - Adil Yalçın

Let's Chart: stop those lying line charts

In a quest for connected points and smoothed lines, we may be implying continuity where it doesn’t exist. - Signal v. Noise

Ten Ways Your Data Project is Going to Fail

“Many companies seem to go through a pattern of hiring a data science team only for the entire team to quit or be fired around 12 months later. Why is the failure rate so high?” - Martin Goodson

The Simpsons by the Data

America’s favorite family has been around for 27 years, providing plenty of data to analyze. Find out who’s the most talkative side character in Springfield, if Homer was always the star, and how much longer the show’s ratings can last. - Todd Schneider

Real-world data cleanup with Pandas and Python

Cleaning data is a tedious yet essential part of every analyst’s day. Learn how to use Python and Pandas to ensure that your data is clean, without worrying about overlooking any potential issues. - TrendCT

Visualizing Hundreds of My Favorite Songs on Spotify

A deep statistical dive into defining songs with attributes—such as tempo, energy, and valence. - Cuepoint

Asking good questions is hard (but worth it)

Although this framework is written from a programmer’s perspective, it’s a great read for analysts and the folks who ask them questions day in and day out. - Julia Evans

The State of Data Engineering

What makes a data engineer, well, a data engineer? And why does it feel like everyone is looking to hire one? This new study of LinkedIn data reveals that the number of data engineers doubled from 2013 to 2015, but demand still far outpaces supply. - Stitch Data

To the point: 7 reasons you should use dot graphs

The pros of dot plots (illustrated with real-world examples) and why they’re often a better choice than bar and line charts. - Maarten Lambrechts

R Psychologist

Puzzled by p-values? Confounded by confidence intervals? Stumped by significance testing? This site is a bevy of interactive visualizations illustrating tricky statistical concepts. Even if you’re a statistical genius, it’s worth a visit to play around. - Kristoffer Magnusson

3 Reasons Counting is the Hardest Thing in Data Science

Counting isn’t technically difficult; the real challenge lies in managing the relationships and office politics that surround the task. - Dayne Batten

What I Wish I Knew About Data For Startups

One entrepreneur reflects on his learnings from four years of working with data at a startup. It’s a goldmine of advice on building a strong, scalable data culture. Don’t skip this one. Seriously. - Jean-Nicholas Hould

Our nine-point guide to spotting a dodgy statistic

Numbers might appear unwavering and objective, but they’re easily manipulated—especially by politicians. Here are some common ways people spin numbers to support their agenda, with real-life examples from Brexit, the U.S. presidential election, and more. - The Guardian

10 Significant Visualisation Developments: January to June 2016

Every six months, visualization expert Andy Kirk puts together a list of people and projects he feels have impacted the field. This roundup includes climate spiral plots, #MakeoverMonday, and a talk from the Deputy Graphics Editor at The New York Times. - Visualising Data

A visual guide to Bayesian thinking

The best single source we’ve found for demystifying how Bayes’ Rule works, the intuition behind it, and how you can use it to inform your thinking. - Julia Galef
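
For readers who want the rule itself in a few lines, here is a minimal sketch of Bayes' Rule for a yes/no hypothesis. The function name and the screening-test numbers are hypothetical and are not taken from the linked guide.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(hypothesis | positive evidence) for a binary hypothesis.

    prior: P(hypothesis) before seeing the evidence
    true_positive_rate: P(evidence | hypothesis)
    false_positive_rate: P(evidence | not hypothesis)
    """
    # Total probability of seeing the evidence at all.
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence


# Hypothetical numbers: a condition with a 1% base rate, a test that catches
# 90% of true cases and wrongly flags 5% of everyone else.
p = posterior(prior=0.01, true_positive_rate=0.90, false_positive_rate=0.05)
print(f"Probability of the condition given a positive test: {p:.3f}")
```

Even with a fairly accurate test, the low base rate keeps the posterior well below one half, which is the intuition the rule is meant to build.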

The Data Driven Daily

This newsletter defines business KPIs and explains how to calculate them for your business. This week they’re covering how to determine the size of your potential customer market. The archive is well worth perusing; past segments include revenue calculation and pricing strategy. - Outlier

The Theorem Every Data Scientist Should Know

Quick! Define the Central Limit Theorem. Scratching your head? You’re not alone. And yet, this theorem is key to what data scientists do every day: make statistical inferences about data. - Jean-Nicholas Hould
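
A small simulation can make the theorem concrete. The sketch below is not from the linked article; it assumes an exponential distribution (mean 1, standard deviation 1) as the skewed source, and the sample size, trial count, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

SAMPLE_SIZE = 50  # observations per sample
TRIALS = 2000     # number of samples drawn


def sample_mean(n):
    """Mean of n draws from an exponential distribution with rate 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


means = [sample_mean(SAMPLE_SIZE) for _ in range(TRIALS)]

mean_of_means = statistics.fmean(means)
spread = statistics.stdev(means)
# The theorem predicts a spread of sigma / sqrt(n), with sigma = 1 here.
expected_spread = 1.0 / SAMPLE_SIZE ** 0.5

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"spread of sample means: {spread:.3f} (theory: {expected_spread:.3f})")
```

Although individual draws are heavily skewed, the sample means cluster around the population mean with a spread close to the predicted value, and a histogram of `means` would look roughly bell-shaped.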

Thinking in SQL vs Thinking in Python

Using a new language requires a new mindset. Our chief analyst shares his learnings from adding Python to his SQL workflow. - Mode

Non-Mathematical Feature Engineering techniques for Data Science

This article is worth Pocketing for the straightforward, plain-English explanation of feature engineering alone. (And the best practices for pre-processing data ain’t bad either.) - Sachin Joglekar

Building a data science portfolio

Much like writers and designers, data scientists are now expected to provide portfolios when they apply for jobs. Here’s what you need to know to get started. - Dataquest

Escaping Excel Hell with Python & Pandas

A great presentation on the problems that arise from spreadsheet analysis and how you can ditch Excel by learning some Python. - Chris Moffitt

What SQL Analysts Need to Know About Python

Here's some info on the importance of Python and how to use it in day-to-day analysis. - Segment

Building Thumbtack’s Data Infrastructure

In this post, Thumbtack data engineer Nate Kupp sheds light on the company’s process for evaluating tools to add to their tech stack. It’s a goldmine for startups contemplating how to build a sustainable data infrastructure. - Thumbtack Engineering

The Five-Step Guide to Robust Help Center Metrics

When a documentation manager set out to revamp her company’s help site content, she was surprised to find very few resources on how to measure her project. Thankfully, she documented her journey so we can all learn from it. Great tips in here for anyone looking to make their help center more, well… helpful. - RJMetrics

You’re Measuring Daily Active Users Wrong

A high number of daily active users (DAU) may sound impressive, but does it actually mean anything? To make your DAU metric actionable, you need to measure how often users are getting core value out of your product, not how many times they log in. - Amplitude
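
As a rough illustration of the idea (not Amplitude's method), the sketch below counts daily active users twice: once for any event, and once only for events treated as core value. The event log, user names, and the choice of `order_placed` as the core action are all hypothetical.

```python
from collections import defaultdict

# Hypothetical event log: (day, user, action).
events = [
    ("day 1", "ana", "login"),
    ("day 1", "ana", "order_placed"),
    ("day 1", "ben", "login"),
    ("day 1", "cho", "login"),
    ("day 2", "ana", "login"),
    ("day 2", "ben", "order_placed"),
]

# Assumption: placing an order is the action that delivers core value.
CORE_ACTIONS = {"order_placed"}


def daily_active_users(events, actions=None):
    """Count distinct users per day, optionally restricted to certain actions."""
    users_by_day = defaultdict(set)
    for day, user, action in events:
        if actions is None or action in actions:
            users_by_day[day].add(user)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


any_activity_dau = daily_active_users(events)
core_value_dau = daily_active_users(events, CORE_ACTIONS)

print("DAU, any activity:", any_activity_dau)
print("DAU, core action only:", core_value_dau)
```

The gap between the two counts shows how many "active" users only logged in without getting core value from the product.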

How Instacart Uses Redshift to Drive Growth

In this interview, Fareed Mosavat, growth PM at Instacart, shares how his team combines behavior, shipping, and fulfillment data to inform product decisions. Check out how his team uses SQL to define internal metrics, conduct A/B tests, and discover how many touches it takes before a user makes their first order. - Segment

Choosing a Database for Analytics

A comprehensive rundown of criteria to consider when you’re ready to dedicate a database to analytics. Use this guide to evaluate your options depending on the type and size of your data, the state of your engineering resources, and your need to analyze data in real-time. - Segment