
Learning Data Science

Many people have landed jobs as data scientists without any formal training, thanks to the wealth of free data science resources online. This section collects tutorials for analytical languages such as SQL, Python, and R; career advice; and how-to posts on common tasks like A/B testing and time series analysis.

Setting up SQL for beginners is hard

SQL’s human-language-like syntax and declarative nature make it the perfect language for people with no coding experience. But getting data into the right structure is a major barrier to entry. Here’s how to quickly build a stack for teaching SQL to others. - Vicki Boykis

Alternatives to a Degree to Prove Yourself in Deep Learning

Why blogging might be the best way to land a job offer. - fast.ai

The Etymology of Trig Functions

Way more engaging than your high school math class. - Matthew Conlen

How to ask questions data science can solve

Asking the right questions is half the battle. This post takes a different approach to formulating questions by mapping them to the tools of the trade. - Towards Data Science

1,000+ Women in Data Science

Your Twitter feed just got so much better. - Renee Teate

Taking Prophet for a Spin

Been meaning to try Prophet? Check out this walkthrough of Facebook’s Bayesian-influenced time series forecasting package (for both R and Python!). - Fast Forward Labs

Group-by From Scratch

What’s the best way to split-apply-combine in Python? Although pandas groupby() is the widely accepted default answer, there are situations where plain Python, NumPy, or SciPy operations are more effective. - Jake VanderPlas
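
Here is a minimal pure-Python sketch that makes split-apply-combine concrete. It assumes the data arrive as two parallel lists of keys and values. `group_mean` is a hypothetical helper for illustration, not code from the post.

```python
from collections import defaultdict


def group_mean(keys, values):
    """Split values by key, apply a mean to each group, combine into a dict."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):  # split: bucket each value under its key
        groups[key].append(value)
    # apply + combine: one mean per group, collected into a plain dict
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


keys = ["a", "b", "a", "c", "b", "a"]
values = [1, 2, 3, 4, 5, 6]
print(group_mean(keys, values))  # a -> 10/3, b -> 3.5, c -> 4.0
```

Swapping `sum(vals) / len(vals)` for another function, such as a max or a count, changes the "apply" step without touching the split or combine steps.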

Mathematicians becoming data scientists: Should you? How to?

Tips for determining if you’ll actually like the work data scientists do and positioning your mathematics background as an asset when you’re interviewing. - Quomodocumque

How to change careers and become a data scientist - one quant’s experience

One quant shares her story of switching from energy trading to data science: the resources she used, the classes she took, her decision to move to the Bay Area, and her advice for handling tech culture shock. - fast.ai

What’s Wrong With My Time Series

When you want to test a model’s predictive power, cross-validation is usually the way to go. However, since data points in a time series depend on each other, randomly selecting subsets for training and testing won’t do. Check out these other ways to determine error sources in time series. - MultiThreaded
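
One common alternative is rolling-origin (forward-chaining) evaluation. You train on everything up to a cutoff, test on the next stretch, then move the cutoff forward. The sketch below only generates index splits. `rolling_origin_splits` and its `initial` and `horizon` parameters are illustrative assumptions, not taken from the post.

```python
def rolling_origin_splits(n, initial, horizon=1):
    """Yield (train_idx, test_idx) pairs in which training data always precede test data.

    n: number of observations, ordered by time
    initial: size of the first training window
    horizon: number of future points evaluated at each step
    """
    cutoff = initial
    while cutoff + horizon <= n:
        train_idx = list(range(cutoff))                   # everything before the cutoff
        test_idx = list(range(cutoff, cutoff + horizon))  # the next `horizon` points
        yield train_idx, test_idx
        cutoff += horizon                                 # roll the origin forward


for train_idx, test_idx in rolling_origin_splits(n=6, initial=3):
    print(train_idx, test_idx)
# [0, 1, 2] [3]
# [0, 1, 2, 3] [4]
# [0, 1, 2, 3, 4] [5]
```

Unlike a random K-fold split, no test point ever has a later point in its training set.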

The Zero Bug

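Hidden errors can be worse than visible errors. This post presents a fallacy that plagues many data analysts: common data aggregation tools usually can’t “count to zero” from examples. A category that never appears in the data is silently dropped instead of being reported with a count of zero. - Win-Vector Blog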
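
Here is a small Python illustration of the problem and one fix. It assumes you know the full list of levels up front; `all_levels` below is a hypothetical example. Counting observed values only produces entries for values that actually occur:

```python
from collections import Counter

observed = ["red", "blue", "red"]          # "green" happens to have no rows
all_levels = ["red", "green", "blue"]      # every level we expect to report on

naive = Counter(observed)
print("green" in naive)  # False: the zero count silently disappears

# Fix: count against the known levels so missing ones show up as 0.
# Counter returns 0 for absent keys instead of raising KeyError.
complete = {level: naive[level] for level in all_levels}
print(complete)  # {'red': 2, 'green': 0, 'blue': 1}
```

The same idea carries over to SQL: start from a table of all levels and LEFT JOIN the counts onto it.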

Unlearning descriptive statistics

If you’ve ever used an arithmetic mean, a Pearson correlation, or a standard deviation to describe a dataset, this post is for you. - Stijn Debrouwere

I ranked every Intro to Data Science course on the internet, based on thousands of data points

There are a ton of data science training options online, but which one is the best? - freeCodeCamp

Guide to Encoding Categorical Values in Python

There are many ways to turn categorical text attributes into numerical values. Here’s how to implement the options offered by pandas and scikit-learn on your own datasets. - Practical Business Python
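
The two most common options are easy to sketch without any library. Below, `label_encode` maps each category to an integer and `one_hot` builds one 0/1 column per category. Both are hypothetical helpers, not the pandas or scikit-learn APIs the post covers; those libraries provide `pandas.get_dummies` and `sklearn.preprocessing.OneHotEncoder`, among others.

```python
def label_encode(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    categories = sorted(set(values))
    code = {cat: i for i, cat in enumerate(categories)}
    return [code[v] for v in values]


def one_hot(values):
    """Return (categories, rows), where each row has a single 1 in its category's column."""
    categories = sorted(set(values))
    column = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[column[v]] = 1
        rows.append(row)
    return categories, rows


body_style = ["sedan", "hatchback", "sedan", "convertible"]
print(label_encode(body_style))  # [2, 1, 2, 0]
print(one_hot(body_style))
```

Label encoding implies an order (sedan > hatchback) that usually isn't meaningful. That is why one-hot encoding is often the safer default for nominal categories.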

Data Science for Beginners

“These videos are basic but useful, whether you’re interested in doing data science or you work with data scientists.” - Microsoft Azure

Intro to Data Science for Academics

From Reed College to Revenue at Twitter, one data scientist shares his insights on how academics can be successful in industry—by finding ways to create value in every corner of the business. - Noah Pepper

The best R package for learning to “think about visualization”

Spoiler alert: it’s ggplot2. - Sharp Sight Labs

My Experience as a Freelance Data Scientist

Itching to strike out on your own? Read up on the pros and cons before you give your two weeks’ notice. - Greg Reda

Matching to estimate the causal effects of firing an NFL coach

To fire or not to fire? When a football team gives their coach the boot, are they better off for it? (Bonus: a nice primer on causal inference.) - StatsbyLopez

How These Three Women Made Mid-Career Pivots Into Data Science

How do we narrow the gender gap in data science? Early STEM education for girls isn’t the only solution. Here are the journeys of three women who switched from creative jobs to data roles mid-career. - Fast Company

What’s the state of the job market in data science and machine learning?

“Th[e] proliferation of courses, resources, books and startups would hint that machine learning is becoming more and more accessible to the average programmer and that the market is on track to getting saturated quickly. Is this the current trend?” - Hacker News

What library do you use for information theory in Python?

This thread is a goldmine if you’re looking to calculate entropy, mutual information, or any other information theory metric. - Randy Olson
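
If you only need the basics, the plug-in (frequency-count) estimates are short enough to write yourself. This sketch assumes discrete data and reports results in bits. Plug-in estimates are biased for small samples, which is one reason dedicated libraries exist.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy H(X) in bits, estimated from observed frequencies."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), using the joint values as pairs."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


coin = [0, 1, 0, 1]
print(entropy(coin))                           # 1.0 bit for a fair coin
print(mutual_information(coin, coin))          # 1.0: X fully determines itself
print(mutual_information(coin, [0, 0, 1, 1]))  # 0.0: independent in this sample
```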

Time Series Analysis in Python - Linear Models to GARCH

A well-written, comprehensive primer on the time series models available in Python. - BlackArbs

The Game Theory of the Yankee Swap

Want to get the best present at this year’s White Elephant gift exchange? Prep for total domination with these Python models. - Ben Casselman

How the Circle Line rogue train was caught with data

When a series of signal interferences led to massive disruptions on a Singapore subway line, a team of data scientists stepped in to solve the mystery… with Python! - Data.gov.sg

Text Analysis and Visualization

Ever wanted to try text analysis in Python, but didn’t know where to start? Here’s your launch pad. - Irene Ros

Building a Financial Model with Pandas

Expand your knowledge of Python and pandas and analyze your mortgage payment options. Two birds, one stone. - Practical Business Python
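
At the core of a mortgage model is the standard fixed-rate amortization formula, payment = P * r / (1 - (1 + r)^-n), where r is the monthly rate and n is the number of monthly payments. Here is a minimal plain-Python sketch, assuming a fixed rate and monthly compounding. The function names are hypothetical, and the post itself builds its model in pandas.

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment for a fully amortizing loan."""
    n = years * 12           # number of monthly payments
    r = annual_rate / 12     # monthly interest rate
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


def amortization_schedule(principal, annual_rate, years):
    """List of (month, interest, principal_paid, remaining_balance) rows."""
    payment = monthly_payment(principal, annual_rate, years)
    r = annual_rate / 12
    balance = principal
    rows = []
    for month in range(1, years * 12 + 1):
        interest = balance * r               # interest accrues on the remaining balance
        principal_paid = payment - interest  # the rest of the payment reduces the balance
        balance -= principal_paid
        rows.append((month, interest, principal_paid, balance))
    return rows


schedule = amortization_schedule(200_000, 0.04, 30)
print(round(monthly_payment(200_000, 0.04, 30), 2))  # 954.83
print(round(sum(row[1] for row in schedule), 2))      # total interest paid over 30 years
```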

8 Data Science Skills That Every Employee Needs

A nice primer to share with your colleagues. - Amplitude

Is Bayesian A/B Testing Immune to Peeking? Not Exactly

A common A/B testing mistake is to monitor the test and stop it when the p-value reaches a certain threshold. Many have suggested that using Bayesian methods eliminates this “peeking problem,” but all is not as it appears. - Variance Explained
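
The sketch below reproduces only the classic frequentist version of the problem, not the Bayesian analysis in the post. It simulates A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. It then compares two strategies: running a pooled two-proportion z-test after every batch of visitors, versus testing once at the end. All parameter values are arbitrary assumptions.

```python
import random
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(n_sims=500, visitors_per_arm=1000, looks=10,
                         rate=0.1, alpha=0.05, seed=42):
    """Return (rate when peeking at every look, rate when testing once at the end)."""
    rng = random.Random(seed)
    batch = visitors_per_arm // looks
    peeking_hits = final_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        significant_at_some_look = False
        for look in range(1, looks + 1):
            # Add one batch of visitors to each arm; both share the same true rate.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n = look * batch
            p = p_value(conv_a, n, conv_b, n)
            if p < alpha:
                significant_at_some_look = True  # a peeker would stop and ship here
        peeking_hits += significant_at_some_look
        final_hits += p < alpha  # p is from the last look only
    return peeking_hits / n_sims, final_hits / n_sims


peeking, single_look = false_positive_rates()
print(f"peeking: {peeking:.1%}, single look: {single_look:.1%}")
```

The final look is one of the peeks, so the peeking rate can never be lower than the single-look rate. With many looks, it typically ends up well above the nominal 5%.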

PostgreSQL Date Functions (and 7 Ways to Use Them in Business Analysis)

PostgreSQL date functions (like DATE_TRUNC, EXTRACT, and AGE) make wrangling timestamps much easier. Here are 7 examples of applying these date functions to business scenarios. - Mode
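
SQLite ships with Python but has no DATE_TRUNC. Its `strftime` function can approximate a monthly truncation instead. Here is a minimal sketch of the classic "signups per month" query against a hypothetical `signups` table, with the PostgreSQL version in a comment for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [(1, "2016-01-05 09:12:00"), (2, "2016-01-20 17:45:00"), (3, "2016-02-03 08:00:00")],
)

# PostgreSQL equivalent:
#   SELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS signups
#   FROM signups GROUP BY 1 ORDER BY 1;
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m-01', created_at) AS month, COUNT(*) AS signups
    FROM signups
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(monthly)  # [('2016-01-01', 2), ('2016-02-01', 1)]
```

Unlike DATE_TRUNC, which returns a timestamp, this returns text. That works for grouping and sorting because the ISO 8601 format sorts chronologically.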

Farmers Markets

Can you find real maple syrup outside of Vermont? Or seafood in the Midwest? Or pet food anywhere? Check out these interactive visualizations to see what you’re most likely to find at a farmers market near you. - Susie Lu

On Average

Does the average person actually exist? Probably not, as it turns out. Learn how the concept of “average” influences product design, and why that’s not always a good thing. - 99% Invisible

How to Master Anti Joins and Apply Them to Business Problems

Learn how to perform an anti join using LEFT JOIN and WHERE, plus three examples of applying anti joins to business problems. - Mode
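
Here is a minimal sketch of the LEFT JOIN ... WHERE pattern, run through Python's built-in `sqlite3` module. It uses two hypothetical tables to find users who have never placed an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ana'), (2, 'ben'), (3, 'cho');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
    """
)

# Anti join: keep every user, then discard the ones that found a matching order.
never_ordered = conn.execute(
    """
    SELECT u.id, u.name
    FROM users AS u
    LEFT JOIN orders AS o ON o.user_id = u.id
    WHERE o.id IS NULL
    ORDER BY u.id
    """
).fetchall()
print(never_ordered)  # [(2, 'ben')]
```

The filter uses a right-hand column that can never be NULL in a real row (here the primary key). That way, a NULL means "no match" rather than a missing value.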

What Would It Take To Turn Blue States Red?

Explore this interactive data visualization to see how small voting shifts among different demographics can impact the presidential election. - FiveThirtyEight

Goodbye, Ivory Tower. Hello, Silicon Valley Candy Store.

Some economists are trading in their professorships for tech jobs: 'Instead of thinking about national or global trends, they are studying the data trails of consumer behavior to help digital companies make smart decisions that strengthen their online marketplaces in areas like advertising, movies, music, travel and lodging.' - New York Times

Asking good questions is hard (but worth it)

Although this framework is written from a programmer’s perspective, it’s a great read for analysts and the folks who ask them questions day-in and day-out. - Julia Evans

The Three Faces of Bayes

The term “Bayesian” can refer to a variety of philosophies and ideas. Read this article before the next quant-heavy cocktail party you attend, so you’ll know what’s what. - Slackpropagation

Postgres Data Types to Redshift Data Types

Switching from one flavor of SQL to another can be a major pain. This table translates Postgres data types to their equivalents in Redshift. Definitely worth starring on GitHub. - Rob Story

R Psychologist

Puzzled by p-values? Confounded by confidence intervals? Stumped by significance testing? This site is a bevy of interactive visualizations illustrating tricky statistical concepts. Even if you’re a statistical genius, it’s worth a visit to play around. - Kristoffer Magnusson

3 Reasons Counting is the Hardest Thing in Data Science

Counting isn’t technically difficult; the real challenge lies in managing relationships and office politics that surround the task. - Dayne Batten

Forget Python vs. R: how they can work together

Apparently we can all get along. The folks at Civis Analytics share the benefits of using both languages and give an example of how you can use C as a bridge to both Python and R. (Slides and a video from the original SciPy talk are also available.) - Civis Analytics

70+ Resources for Transitioning to a Data Science Career

Considering a career in data science? Time to read up. Here's a list of tutorials, tips for interviewing, and stories from people who've made it. - Mode

Top 20 Pandas, NumPy, and SciPy Functions on GitHub

Some of the most popular Python functions, visualized in Python. - Alexander Galea

Ethics for powerful algorithms

Contrary to a ProPublica investigation, COMPAS, a proprietary algorithm used to predict recidivism and inform parole decisions, isn’t statistically biased against black people. However, that doesn’t mean COMPAS isn’t deeply unfair. This is the first of four posts digging into data science ethics. - Abe Gong

Build Algorithms Like You Give a Damn

Discussions at the 2016 WrangleConf focused on data science ethics and strategies for combatting harm by opening communication, recognizing bias, and fighting indifference. - Mode

Understanding Bias: A Pre-requisite For Trustworthy Results

“What causes bias? How can we correct it, and how does our picture of how the world works factor in to that?” - Adam Kelleher

A visual guide to Bayesian thinking

The best single source we’ve found for demystifying how Bayes’ Rule works, the intuition behind it, and how you can use it to inform your thinking. - Julia Galef
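
Bayes’ rule states P(H | E) = P(E | H) * P(H) / P(E), where P(E) expands to P(E | H) * P(H) + P(E | not H) * P(not H). Here it is as a tiny Python function, worked through a classic screening example with hypothetical numbers: a condition with 1% prevalence and a test that catches 90% of true cases but also flags 10% of healthy people.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) from Bayes' rule, with P(E) from the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


print(posterior(prior=0.01, p_e_given_h=0.9, p_e_given_not_h=0.1))  # about 0.083
```

A positive result raises the probability from 1% to only about 8%, because healthy people vastly outnumber sick ones.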

Practical advice for analysis of large, complex data sets

“This document has been read more than anything else I’ve done at Google over the last eleven years. Even four years after the last major update, I find that there are multiple Googlers with the document open any time I check.” - The Unofficial Google Data Science Blog

Thinking in SQL vs Thinking in Python

Using a new language requires a new mindset. Our chief analyst shares his learnings from adding Python to his SQL workflow. - Mode

The Theorem Every Data Scientist Should Know

Quick! Define the Central Limit Theorem. Scratching your head? You’re not alone. And yet, this theorem is key to what data scientists do every day: make statistical inferences about data. - Jean-Nicholas Hould
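
A quick simulation makes the theorem tangible. The sketch below draws many samples of size n from a uniform distribution, which looks nothing like a bell curve, and takes each sample’s mean. It then checks three things the theorem predicts. The means center on the population mean. Their spread shrinks like sigma / sqrt(n). About 95% of them fall within 1.96 standard errors, as they would for a normal distribution. The sample size, number of trials, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)
n, trials = 30, 5000

# Each entry is the mean of n draws from Uniform(0, 1).
means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

mu = 0.5                      # population mean of Uniform(0, 1)
sigma = (1 / 12) ** 0.5       # population standard deviation of Uniform(0, 1)
standard_error = sigma / n ** 0.5

# Share of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - mu) < 1.96 * standard_error for m in means) / trials
print(statistics.fmean(means), statistics.stdev(means), standard_error, within)
```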

If Correlation Doesn’t Imply Causation, Then What Does?

This tweet sums up our feelings on this article exactly: 'Love that it gives a framework for thinking about correlations that isn’t just ¯\_(ツ)_/¯' - Adam Kelleher

Building a data science portfolio

Much like writers and designers, data scientists are now expected to provide portfolios when they apply for jobs. Here’s what you need to know to get started. - Dataquest

Escaping Excel Hell with Python & Pandas

A great presentation on the problems that arise from spreadsheet analysis and how you can ditch Excel by learning some Python. - Chris Moffitt

10 Useful Python Data Visualization Libraries for Any Discipline

While many Python data visualization libraries are narrowly focused on accomplishing a certain task, these libraries can be used regardless of your field. - Mode

Scientific Python Cheat Sheet

For those moments when you forget how to make a contour line plot in matplotlib or write a function in pure Python. - Institut de Physique du Globe de Paris

What SQL Analysts Need to Know About Python

Here's some info on the importance of Python and how to use it in day-to-day analysis. - Segment

PyData London Conference Presentations

A few weekends ago PyData hosted a conference in London, and they just released videos and slides of a bunch of the presentations. - PyData

Easier data analysis in Python with pandas

A series of video tutorials for pandas newbies who know some Python. Each video answers a student-posed question using real-world data. - Data School

Modern Pandas

This tutorial is great for experienced Python users looking to stay sharp on pandas. One Twitter user summed it up perfectly as “the abbreviated Strunk & White of data analysis.” - Tom Augspurger

SQL Joins Visualizer

Many a learner has embarked on the quest to learn SQL, only to be thwarted by the task of mastering joins. Never again. Click the type of join you want to execute and this site will generate the right code. - SQL Joins Visualizer

Spreadsheet Thinking vs. Database Thinking

This is a great read for anyone who’s new to working with relational databases. - eagereyes

An Introduction to Inference

A good first step for those who work with data frequently and want to learn more about Bayesian statistical methods. From the author: 'It will be a bit mathy, but nothing beyond Khan-level probability.' - Vincent D. Warmerdam

6 Lesser Known Python Data Analysis Libraries

You’ve heard of NumPy and Pandas and matplotlib. Now check out these other handy libraries for dealing with data. - Jyotiska Khasnabish

10+2 Data Science Methods that Every Data Scientist Should Know in 2016

Forgive the click-baity title. This is actually a really well-done roundup of the statistical and machine learning methods data scientists use daily, with Python and R scripts for each. - Takashi J. Ozaki

How to Find Correlative Metrics For Conversion Optimization

A thorough walk-through of how to find correlative metrics and leverage them for conversion. It’s jam-packed with examples and advice from experts, plus a handy list of tools. - ConversionXL

This is the difference between statistics and data science

Another blog post trying to define data science? We know. We know. BUT! This one presents an interesting angle: the difference between a data scientist and a statistician comes down to product knowledge. - Mixpanel

Lift analysis - A data scientist’s secret weapon

Learn how to spot flaws in machine learning models with lift analysis (and why you should add it to your list of evaluation metrics). - Andy Goldschmidt
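
Lift compares how often the outcome occurs among the records a model scores highest with how often it occurs overall. Here is a minimal sketch. It assumes binary 0/1 labels with at least one positive, and at least as many records as bins. `lift_by_bin` is a hypothetical helper, not from the post.

```python
def lift_by_bin(scores, labels, bins=10):
    """Lift of each score bin (highest scores first) relative to the overall positive rate."""
    # Rank records from highest to lowest model score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    overall_rate = sum(labels) / len(labels)
    size = len(ranked) // bins
    lifts = []
    for b in range(bins):
        # The last bin absorbs any leftover records when len(ranked) isn't divisible by bins.
        chunk = ranked[b * size:(b + 1) * size] if b < bins - 1 else ranked[b * size:]
        rate = sum(label for _, label in chunk) / len(chunk)
        lifts.append(rate / overall_rate)
    return lifts


scores = list(range(100))
labels = [1 if s >= 90 else 0 for s in scores]
print(lift_by_bin(scores, labels))  # top decile: lift 10.0; every other decile: 0.0
```

A model with no ranking ability would show a lift near 1.0 in every bin.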

Not So Standard Deviations: Episode 11 - Start and Stop

If you haven’t listened to NSSD yet, you’re missing out on an inside look at how data scientists work in industry and academia. In this episode, statisticians Hilary Parker and Dr. Roger Peng discuss their methods for tackling the beginning and ending parts of analyses (discussion starts at 20:43). - Not So Standard Deviations

A Practical Guide to Anonymizing Datasets with Python & Faker

Sometimes you just want to show off an analysis or chart you built for your company… without revealing your company’s data. Now you can. - District Data Labs

Writing Data—an introduction to choosing & using data formats

JSON, CSV, or HDF5? This guide outlines the perks and pitfalls of file formats for alphanumeric data. - Build Things Together
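
One pitfall is easy to demonstrate with Python’s standard library: JSON preserves basic types such as numbers and booleans, while CSV stores everything as text. This sketch round-trips the same hypothetical record through both formats. HDF5 needs a third-party library such as h5py, so it is left out.

```python
import csv
import io
import json

records = [{"id": 1, "price": 9.5, "in_stock": True}]

# JSON keeps numbers and booleans as typed values.
json_round_trip = json.loads(json.dumps(records))

# CSV stores only text, so every field comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "price", "in_stock"])
writer.writeheader()
writer.writerows(records)
buffer.seek(0)
csv_round_trip = list(csv.DictReader(buffer))

print(json_round_trip)  # [{'id': 1, 'price': 9.5, 'in_stock': True}]
print(csv_round_trip)   # [{'id': '1', 'price': '9.5', 'in_stock': 'True'}]
```

With CSV, whoever reads the file has to know and reapply the types; note that `bool("False")` is `True`.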

Friction Between Programming Professionals and Beginners

In many technical forums, there’s a pattern of beginners asking a vague question and forum veterans responding with snarky or curt replies. Here are some suggestions both parties can use to keep conversations productive. - Programming for Beginners

Practical skills that practical data scientists need

Last week, Noah Lorang of Basecamp wrote that, most of the time, data scientists don’t need AI to solve business problems. They just need simple arithmetic. In this post, he elaborates on the skills he uses and questions he asks every day. - Signal v. Noise

Data scientists mostly just do arithmetic and that’s a good thing

The vast majority of the time, businesses don’t need machine learning to solve their problems. They need accurate, actionable data and people who consider context, know basic math, write SQL, and understand what makes businesses tick. - Signal v. Noise

The Art of Naming Things

Nothing’s worse than when you open a new dataset only to find it’s full of indecipherable labels. This two-part article provides suggestions to keep your naming convention consistent, concise, and informative while preventing data loss and a whole lot of headaches. - Penn State

The Elements of Python Style

This document goes beyond PEP8 to cover the core of what I think of as great Python style. It is opinionated, but not too opinionated. It goes beyond mere issues of syntax and module layout, and into areas of paradigm, organization, and architecture. - Andrew Montalenti

LowClass Python—Style Guide for Data Scientists

This style guide is meant for use by advanced beginner to advanced intermediate developers of scientific code in Python. In other words, non-professional programmers...for example, data scientists. - Columbia University Applied Data Science

A menagerie of messed up data analyses and how to avoid them

Don’t let mistakes botch your analyses. This post outlines six examples and offers advice for taking proactive measures against them. - Simply Statistics

Guess the Correlation

How good are you at gauging the correlation between two variables in a scatter plot? Find out! - Omar Wagih
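
To check your guesses numerically, compute Pearson’s r: the covariance of the two variables divided by the product of their standard deviations. Here is a from-scratch sketch; Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/(n - 1) factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfect positive linear relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: perfect negative linear relationship
```

The coefficient is undefined (division by zero) when either variable is constant.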

Writing More Legible SQL

It’s easy to get lazy when writing SQL. Here are a few tips for cleaning up your queries so others can actually read your work. - Craig Kerstiens

How to Make the Leap from Excel to SQL

Learning SQL is easier when you have Excel in your toolbelt. And moving your analysis into SQL will seriously speed up your workflow. - Mode

Getting to the “Plateau of Productivity” with Python

Using the Gartner Hype Cycle as a framework, this post provides a load of context and tips for anyone who wants to pursue Python. As an added benefit, you could apply this structure to learning any technical language or tool. - Practical Business Python

AMA Data Scientist—Jake Porway of DataKind

Highlights of the discussion include advice for budding data scientists, ethical challenges, and opportunities to do good with data. - Reddit

The Missing 11th of the Month

According to Google’s Ngrams database, the 11th is mentioned significantly less than other monthly ordinals. But why? We don’t want to spoil the conclusion, but this post is a good reminder of why you shouldn’t blindly trust data. - Dr. David Hagen

Not Even Scientists Can Easily Explain P-values

We want to know if results are right, but a p-value doesn’t measure that. It can’t tell you the magnitude of an effect, the strength of the evidence or the probability that the finding was the result of chance. - FiveThirtyEight

Blinded by Statistical Significance

Putting too much stock in an arbitrary threshold may lead to bad decisions. - KelloggInsight Blog

The Field Guide to Data Science

Booz Allen just released the second edition of The Field Guide to Data Science, which walks you through how to use data to generate value for your organization. The guide includes practical advice, tested processes, and insights that are helpful for anyone who touches data, whether you’re a senior exec, a practitioner, or a newbie. - Booz Allen Hamilton

Big Data Still Requires Humans To Make Meaningful Connections

It’s easy to get swept up in the exciting opportunities big data presents and forget that data alone isn’t a solution—it’s a tool to help solve problems. This article hits on a sentiment we’ve been hearing a lot lately—“we still need humans to help make sense of the data we are collecting.” - TechCrunch