Data Engineering Articles
Data engineers empower company initiatives by building tools, infrastructure, frameworks, and services to get data “in shape” for analysts to query. This section includes articles about building and maintaining scalable data infrastructure, data modeling, piping data from one database to another (ETL), integrating data generated from SaaS tools into a single data warehouse, and optimizing data processing and storage.
“[F]uture startups will be catapulted up the data maturity curve with access to better, cheaper, more accessible analytics software and services.” - Astronomer
An in-depth manifesto for data science’s younger sibling. - Maxime Beauchemin
What makes a data engineer, well, a data engineer? And why does it feel like everyone is looking to hire one? This new study of LinkedIn data reveals that the number of data engineers doubled from 2013 to 2015, but demand still far outpaces supply. - Stitch Data
Most companies store their data in a central repository where everyone can go to publish or retrieve a dataset. Google manages its data in a different way: they’ve built (surprise!) a crawling engine to index datasets and gather metadata about them. This gives folks the freedom to make and use datasets however they like.
PostgreSQL has supported NoSQL for a while now, but when should you use the relational mode and when should you use the non-relational mode? And if you use NoSQL, which data type should you pick? - Citus Data
This article is worth Pocketing for the straightforward, plain-English explanation of feature engineering alone. (And the best practices for pre-processing data ain’t bad either.) - Sachin Joglekar
Josh Wills, Director of Data Engineering at Slack, shares his thoughts on how data engineers and data scientists work best together. - Hakka Labs
How do you scale your data science org without hiring more people? Optimize for technical efficiency. In Uber’s case, that means data engineers building self-serve platforms to address specific problems in data scientists’ workflows. - Kevin Novak
In this post, Thumbtack data engineer Nate Kupp sheds light on the company’s process for evaluating tools to add to their tech stack. It’s a goldmine for startups contemplating how to build a sustainable data infrastructure. - Thumbtack Engineering
Here’s one suggestion for fixing the sometimes hairy relationships between data scientists and engineers: optimize for autonomy, not technical efficiency. - Stitch Fix
A comprehensive rundown of criteria to consider when you’re ready to dedicate a database to analytics. Use this guide to evaluate your options depending on the type and size of your data, the state of your engineering resources, and your need to analyze data in real time. - Segment