
Data Munging Articles

Need to anonymize a dataset? Get dates and times in a particular format? Replace null values? All of these tasks fall under the umbrella of data munging: the process of cleaning and formatting data to make it more consumable and convenient for analysis. These tutorials will help you get started with data munging (aka “data wrangling”), whether you do it manually or with the help of tools like Python libraries and R packages.
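To make those three tasks concrete, here is a minimal sketch using only the Python standard library. The sample records, field names, salt, and list of accepted date formats are all assumptions made up for illustration. Note that salted hashing is pseudonymization rather than full anonymization, so treat it as a starting point only.

```python
import hashlib
from datetime import datetime

# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"email": "ada@example.com", "signup": "03/15/2021", "plan": "pro"},
    {"email": "bob@example.com", "signup": "2021-04-02", "plan": None},
]

# Assumed input formats; extend this tuple to match your own data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def pseudonymize(value, salt="demo-salt"):
    """Replace an identifier with a truncated, salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def to_iso_date(text):
    """Try each known format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean(row):
    """Return a cleaned copy: hashed ID, ISO date, and no null plan."""
    return {
        "user_id": pseudonymize(row["email"]),
        "signup": to_iso_date(row["signup"]),
        "plan": row["plan"] if row["plan"] is not None else "unknown",
    }


cleaned = [clean(r) for r in rows]
for record in cleaned:
    print(record)
```

Raising on an unrecognized date, rather than guessing, is a deliberate choice here: a silent wrong parse is harder to catch later than a loud failure during cleanup.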

Data preparation in the age of deep learning

“When companies are spending millions or more dollars on training data, it's absolutely essential that they do it in a smart way.” - O’Reilly Data Show Podcast

Real-world data cleanup with Pandas and Python

Cleaning data is a tedious yet essential part of every analyst’s day. Learn how to use Python and Pandas to ensure that your data is clean, without worrying about overlooking any potential issues. - TrendCT

Handy Python Libraries for Formatting and Cleaning Data

These Python libraries will make the crucial task of data cleaning a bit more bearable—from anonymizing datasets to wrangling dates and times. - Mode

How to Treat Missing Values in Your Data

There’s nothing worse than opening up a new dataset only to discover it’s missing a ton of values. This two-part post evaluates techniques for handling missing data. - CleverTap

What every data scientist should know about data anonymization

You can uniquely identify a person with surprisingly little data. This PyData Berlin presentation walks you through the process of anonymizing a dataset and covers best practices. - Katharina Rasch

A Practical Guide to Anonymizing Datasets with Python & Faker

Sometimes you just want to show off an analysis or chart you built for your company… without revealing your company’s data. Now you can. - District Data Labs

The Quartz guide to bad data

This comprehensive reference is intended for journalists, but it’s a worthwhile read for anyone working with data. Familiarize yourself with common issues—ambiguous field names, inconsistent date formats, biased samples—so you can catch data quality problems early. - Quartz