Data wrangling
Audit your data. For guidance, see Building Machine Learning Powered Applications, PDF pages 33-35. It describes levels of data availability, “from best-case scenario to most challenging”: “Labeled data exists”, “Weakly labeled data exists”, “Unlabeled data exists”, “We need to acquire data.” What level are you at now? What level can you reach?
For more guidance, read the Data Collection + Evaluation chapter in Google's People + AI Guidebook.
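As a rough starting point for the audit, the sketch below counts how many rows carry a label and maps that count onto the book's levels. It is a minimal sketch assuming your data fits in a pandas DataFrame with a single label column; `audit_labels`, `rough_level`, and the column names are hypothetical. Counting cannot tell strong labels from weak ones, so the "Weakly labeled data exists" level still needs human judgment.

```python
import pandas as pd

def audit_labels(df: pd.DataFrame, label_col: str) -> dict:
    """Hypothetical helper: summarize how much of a dataset carries a label."""
    n_rows = len(df)
    if n_rows == 0:
        # No rows at all.
        return {"rows": 0, "labeled": 0, "fraction_labeled": 0.0}
    # A missing label column counts as "no labels".
    n_labeled = int(df[label_col].notna().sum()) if label_col in df.columns else 0
    return {"rows": n_rows, "labeled": n_labeled, "fraction_labeled": n_labeled / n_rows}

def rough_level(summary: dict) -> str:
    """Hypothetical mapping onto the book's levels. Counts alone cannot
    detect weak labels, so that level is never returned here."""
    if summary["rows"] == 0:
        return "We need to acquire data"
    if summary["labeled"] == 0:
        return "Unlabeled data exists"
    return "Labeled data exists"

# Toy data: two of four rows have a label.
df = pd.DataFrame({"text": ["a", "b", "c", "d"], "label": [1, None, 0, None]})
summary = audit_labels(df, "label")
print(summary, rough_level(summary))
```

Reporting the labeled fraction, not just a yes/no, shows whether "Labeled data exists" means a handful of rows or nearly all of them.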
Safe Handling Instructions for Missing Data is a thought-provoking 30-minute video. What will you do about missing data?
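Whatever strategy you choose, it helps to first measure how much is missing and to keep a record of which values you filled in. The sketch below is one common option, not the video's prescribed method: report missingness per column, then impute a numeric column with its median while adding a boolean "was missing" flag. The helper names and columns are hypothetical.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values in each column."""
    counts = df.isna().sum()
    return pd.DataFrame({"n_missing": counts, "pct_missing": counts / len(df) * 100})

def impute_with_indicator(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return a copy with a <col>_was_missing flag, then fill NaNs with the median."""
    out = df.copy()
    out[f"{col}_was_missing"] = out[col].isna()
    out[col] = out[col].fillna(out[col].median())
    return out

# Toy data with gaps in both columns.
df = pd.DataFrame({"age": [25, np.nan, 40, np.nan], "city": ["A", "B", None, "C"]})
report = missing_report(df)
filled = impute_with_indicator(df, "age")
print(report)
print(filled)
```

Keeping the indicator column means a model (or a later analysis) can still see that a value was imputed, rather than treating the median as real data.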
This pandas pipe video series shows how to transition from “notebook-style” pandas code to clean, production code you'll be proud of! After you write exploratory code, can you clean it up?
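To illustrate the idea, the sketch below turns a few typical cleanup steps into small named functions and chains them with pandas' `DataFrame.pipe`, which calls `func(df, *args, **kwargs)`. It is a minimal example under assumed data; the column names and helper functions are hypothetical, not taken from the video series.

```python
import pandas as pd

# Hypothetical raw data standing in for a messy CSV.
raw = pd.DataFrame({
    "Name ": ["  Ada", "Grace ", None],
    "Score": ["90", "85", "70"],
})

def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    # Strip whitespace and lowercase every column label.
    return df.rename(columns=lambda c: c.strip().lower())

def drop_missing(df: pd.DataFrame, subset: list) -> pd.DataFrame:
    # Drop rows missing any of the given columns.
    return df.dropna(subset=subset)

def strip_strings(df: pd.DataFrame, col: str) -> pd.DataFrame:
    # Remove leading/trailing whitespace from a text column.
    return df.assign(**{col: df[col].str.strip()})

def coerce_numeric(df: pd.DataFrame, col: str) -> pd.DataFrame:
    # Convert a column of numeric strings to numbers.
    return df.assign(**{col: pd.to_numeric(df[col])})

# Each step returns a new DataFrame, so `raw` is left untouched and the
# chain reads top to bottom, unlike notebook code that mutates `df` in
# place across many cells.
tidy = (
    raw
    .pipe(clean_column_names)
    .pipe(drop_missing, subset=["name"])
    .pipe(strip_strings, col="name")
    .pipe(coerce_numeric, col="score")
)
print(tidy)
```

Because each step is a plain function, you can unit-test, reorder, or reuse it in another pipeline without rerunning a notebook.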
Data Cleaning IS Analysis, Not Grunt Work is a great place to start. What are you learning about your data while you wrangle it? See this humorous, depressing example of why data cleaning is necessary and time-consuming: