Data Engineering Articles

Data engineers empower company initiatives by building tools, infrastructure, frameworks, and services to get data “in shape” for analysts to query. This section includes articles about building and maintaining scalable data infrastructure, data modeling, piping data from one database to another (ETL), integrating data generated from SaaS tools into a single data warehouse, and optimizing data processing and storage.

Airflow and the Future of Data Engineering: A Q&A

“[F]uture startups will be catapulted up the data maturity curve with access to better, cheaper, more accessible analytics software and services.” - Astronomer

The Rise of the Data Engineer

An in-depth manifesto for data science’s younger sibling. - Maxime Beauchemin

The State of Data Engineering

What makes a data engineer, well, a data engineer? And why does it feel like everyone is looking to hire one? This new study of LinkedIn data reveals that the number of data engineers doubled from 2013 to 2015, but demand still far outpaces supply. - Stitch Data

Goods: Organizing Google’s datasets

Most companies store their data in a central repository where everyone can go to publish or retrieve a dataset. Google manages its data in a different way: they’ve built (surprise!) a crawling engine to index datasets and gather metadata about them. This gives folks the freedom to make and use datasets however they like.

When to use unstructured datatypes in Postgres–Hstore vs. JSON vs. JSONB

PostgreSQL has supported NoSQL for a while now, but when should you use the relational mode and when should you use the non-relational mode? And if you use NoSQL, which data type should you pick? - Citus Data

Non-Mathematical Feature Engineering techniques for Data Science

This article is worth Pocketing for the straightforward, plain-English explanation of feature engineering alone. (And the best practices for pre-processing data ain’t bad either.) - Sachin Joglekar

Bridging the Gap Between Data Science and Data Engineering

Josh Wills, Director of Data Engineering at Slack, shares his thoughts on how data engineers and data scientists work best together. - Hakka Labs

The Purpose of Platforms in Data Science

How do you scale your data science org without hiring more people? Optimize for technical efficiency. In Uber’s case, that means data engineers building self-serve platforms to address specific problems in data scientists’ workflows. - Kevin Novak

Building Thumbtack’s Data Infrastructure

In this post, Thumbtack data engineer Nate Kupp sheds light on the company’s process for evaluating tools to add to their tech stack. It’s a goldmine for startups contemplating how to build a sustainable data infrastructure. - Thumbtack Engineering

Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department

Here’s one suggestion for fixing the sometimes hairy relationships between data scientists and engineers: optimize for autonomy, not technical efficiency. - Stitchfix

Choosing a Database for Analytics

A comprehensive rundown of criteria to consider when you’re ready to dedicate a database to analytics. Use this guide to evaluate your options depending on the type and size of your data, the state of your engineering resources, and your need to analyze data in real time. - Segment