Labs DS Guide

Data wrangling

Last updated 4 years ago

Resources

  • Audit your data. For guidance, see Building Machine Learning Powered Applications, PDF pages 33-35. It describes levels of data availability, “from best-case scenario to most challenging”: “Labeled data exists”, “Weakly labeled data exists”, “Unlabeled data exists”, “We need to acquire data.” What level are you at now? What level can you reach?

  • For more guidance, read the Data Collection + Evaluation chapter in Google's People + AI Guidebook.

  • Safe Handling Instructions for Missing Data is a thought-provoking 30-minute video. What will you do about missing data?

  • This pandas pipe video series shows how to transition from “notebook-style” pandas code to clean, production code you'll be proud of! After you write exploratory code, can you clean it up?

  • Data Cleaning IS Analysis, Not Grunt Work is a great place to start. What are you learning about your data while you wrangle it? See this humorous, depressing example of why data cleaning is necessary and time-consuming:
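
The move from “notebook-style” pandas code to pipeline-style code, together with a quick missing-data audit, might look something like the sketch below. It is only an illustration, not code from any of the resources above: the function names (start_pipeline, clean_column_names, drop_rows_missing, missing_report), the column names, and the sample data are all made up for this example. The only library calls it relies on are standard pandas methods such as DataFrame.pipe, DataFrame.rename, DataFrame.dropna, DataFrame.assign, and pd.to_datetime.

```python
import pandas as pd


def start_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    """Copy the input so later steps never modify the caller's frame."""
    return df.copy()


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase column names and replace spaces with underscores."""
    return df.rename(columns=lambda name: name.strip().lower().replace(" ", "_"))


def drop_rows_missing(df: pd.DataFrame, required: list) -> pd.DataFrame:
    """Drop rows that have no value in any of the required columns."""
    return df.dropna(subset=required)


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize how many values are missing in each column."""
    return pd.DataFrame(
        {"n_missing": df.isna().sum(), "share_missing": df.isna().mean()}
    )


# Hypothetical raw data, standing in for whatever your project collects.
raw = pd.DataFrame(
    {
        "User ID": [1, 2, 3, 4],
        "Signup Date": ["2021-01-05", None, "2021-02-11", "2021-03-02"],
        "Plan": ["free", "pro", None, "pro"],
    }
)

# Audit first: look at what is missing before deciding what to do about it.
report = missing_report(raw)

# Each wrangling step is a small, named, testable function chained with .pipe.
clean = (
    raw
    .pipe(start_pipeline)
    .pipe(clean_column_names)
    .pipe(drop_rows_missing, required=["signup_date"])
    .assign(signup_date=lambda d: pd.to_datetime(d["signup_date"]))
)
```

Because each step takes a DataFrame and returns a DataFrame, steps can be reordered, commented out, or unit-tested one at a time. Dropping rows is only one possible response to missing data; whether it is appropriate depends on why the values are missing.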