Data Munging Articles
Need to anonymize a dataset? Get dates and times in a particular format? Replace null values? All of these tasks fall under the umbrella of data munging: the process of cleaning and formatting data to make it more consumable and convenient for analysis. These tutorials will help you get started with data munging (aka “data wrangling”), whether you do it manually or with the help of tools like Python libraries and R packages.
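For a concrete sense of the three tasks named above, here is a minimal sketch using only the Python standard library. The field names (`email`, `signup_date`, `plan`), the salt, the sample rows, and the helper functions are all hypothetical and chosen for illustration. Note that salted hashing only pseudonymizes an identifier; it is not a complete anonymization strategy on its own.

```python
import hashlib
from datetime import datetime


def pseudonymize(value, salt):
    """Replace an identifier with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def to_iso_date(text, source_format="%m/%d/%Y"):
    """Parse a date string in source_format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, source_format).date().isoformat()


def fill_missing(value, default):
    """Substitute a default for None or empty-string values."""
    return default if value is None or value == "" else value


def clean_row(row, salt="example-salt"):
    """Apply all three cleaning steps to one record (hypothetical schema)."""
    return {
        "user": pseudonymize(row["email"], salt),
        "signup_date": to_iso_date(row["signup_date"]),
        "plan": fill_missing(row.get("plan"), "unknown"),
    }


# Made-up sample records, for illustration only.
rows = [
    {"email": "ada@example.com", "signup_date": "03/07/2021", "plan": None},
    {"email": "bob@example.com", "signup_date": "12/25/2020", "plan": "pro"},
]

cleaned = [clean_row(r) for r in rows]
for record in cleaned:
    print(record)
```

In practice, libraries such as the ones covered in the tutorials below handle these steps across whole columns at once; the sketch only shows the shape of each operation.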
Cleaning data is a tedious yet essential part of every analyst’s day. Learn how to use Python and Pandas to make sure your data is clean, without worrying about overlooking potential issues. - TrendCT
These Python libraries will make the crucial task of data cleaning a bit more bearable—from anonymizing datasets to wrangling dates and times. - Mode
There’s nothing worse than opening up a new dataset only to discover it’s missing a ton of values. This two-part post evaluates techniques for handling missing data. - CleverTap
You can uniquely identify a person with surprisingly little data. This PyData Berlin presentation walks you through the process of anonymizing a dataset and covers best practices along the way. - Katharina Rasch
Sometimes you just want to show off an analysis or chart you built for your company… without revealing your company’s data. Now you can. - District Data Labs
This comprehensive reference is intended for journalists, but it’s a worthwhile read for anyone working with data. Familiarize yourself with common issues—ambiguous field names, inconsistent date formats, biased samples—so you can catch data quality problems early. - Quartz