Data Cleaning Guide 🧹📊

Fix & Prepare Data for Analysis

1. What is Data Cleaning?

Data Cleaning is the process of fixing missing, incorrect, or inconsistent data to make it usable.

2. Common Problems

- Missing values
- Duplicate data
- Wrong formats
- Outliers
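
Before fixing anything, it helps to measure how much of each problem a dataset has. The sketch below is one possible audit, not a fixed recipe. The sample table, its column names, and the cutoff of 120 for an implausible age are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data that contains each problem from the list above.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Bob", "Cid", "Dee"],
    "Age": ["34", "29", "29", None, "240"],  # stored as text, one missing, one implausible
})

missing_per_column = df.isna().sum()         # missing values in each column
duplicate_rows = int(df.duplicated().sum())  # rows that repeat an earlier row exactly
column_types = df.dtypes                     # a text dtype on Age hints at a wrong format

# Convert Age to numbers just for the check; unparseable entries become NaN.
ages = pd.to_numeric(df["Age"], errors="coerce")
suspicious_ages = int((ages > 120).sum())    # crude outlier count; 120 is an assumed cutoff

print(missing_per_column)
print("duplicates:", duplicate_rows)
print(column_types)
print("suspicious ages:", suspicious_ages)
```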

3. Load Data

import pandas as pd

df = pd.read_csv("data.csv")
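
After loading, a quick look at the data usually shows what needs cleaning. The sketch below uses an in-memory CSV as a stand-in for data.csv so it runs on its own; the columns Name, Age, and City are hypothetical.

```python
import io
import pandas as pd

# Stand-in for data.csv; the contents are made up for illustration.
csv_text = "Name,Age,City\nAnn,34,Oslo\nBob,,Rome\nCid,41,\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first few rows
print(df.shape)         # (number of rows, number of columns)
df.info()               # column types and non-null counts
print(df.isna().sum())  # missing values per column
```

Empty fields in the CSV are read as missing values, which is why Age and City each report one missing entry here.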

4. Handle Missing Values

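# Pick one. Both return a new DataFrame, so assign the result.
df = df.dropna()     # option A: remove rows that contain any missing value
df = df.fillna(0)    # option B: keep all rows and fill missing values with 0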
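
Dropping every incomplete row or filling everything with 0 is often too blunt, because 0 can look like a real value (an Age of 0, for example). A common alternative is to decide per column. The sketch below assumes hypothetical columns Name, Age, and City, and the choices of median and "Unknown" are only examples.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann", "Bob", None, "Dee"],
    "Age": [34.0, None, 29.0, 41.0],
    "City": ["Oslo", "Rome", "Lima", None],
})

# Rows without a Name are treated as unusable here, so drop only those.
df = df.dropna(subset=["Name"])

# Fill the remaining gaps column by column:
# numeric gaps get the column median, text gaps get a placeholder.
df = df.fillna({"Age": df["Age"].median(), "City": "Unknown"})

print(df)
```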

5. Remove Duplicates

df = df.drop_duplicates()   # returns a new DataFrame, so assign the result
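
By default drop_duplicates() only removes rows that match in every column. Records that differ in case or spacing, or that share a key but not every field, are kept. The sketch below shows one way to handle both cases; the Email and Age columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Email": ["ann@x.com", "ANN@x.com ", "bob@x.com", "bob@x.com"],
    "Age": [34, 34, 29, 30],
})

# Normalize text first so near-identical values compare as equal.
df["Email"] = df["Email"].str.strip().str.lower()

# Remove rows that are identical in every column.
exact = df.drop_duplicates()

# Treat Email as the key and keep the first row seen for each address.
by_email = df.drop_duplicates(subset=["Email"], keep="first")

print(exact)
print(by_email)
```

Which row to keep for a repeated key is a judgment call; keep="last" or sorting before deduplicating are equally valid depending on the data.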

6. Fix Data Types

df["Age"] = df["Age"].astype(int)   # raises if Age still has missing or non-numeric values, so clean those first
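
astype(int) stops with an error when the column contains missing or non-numeric entries. A more forgiving approach, sketched below on made-up data, is to convert with pd.to_numeric and then use the nullable Int64 type, which can hold integers and missing values together.

```python
import pandas as pd

df = pd.DataFrame({"Age": ["34", "29", "unknown", None]})

# errors="coerce" turns anything that is not a number into NaN instead of raising.
ages = pd.to_numeric(df["Age"], errors="coerce")

# "Int64" (capital I) is the nullable integer dtype; missing values become <NA>.
df["Age"] = ages.astype("Int64")

print(df)
print(df.dtypes)
```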

7. Remove Outliers

df = df[df["Age"] < 100]   # keep rows with Age below 100; rows with a missing Age are dropped too
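
A fixed cutoff such as 100 works when the valid range is known in advance. When it is not, one widely used rule of thumb is the interquartile range (IQR) rule, which flags values far outside the middle half of the data. The sketch below applies it to made-up ages; the 1.5 multiplier is the conventional choice, not something this guide prescribes.

```python
import pandas as pd

df = pd.DataFrame({"Age": [22, 25, 27, 30, 31, 33, 35, 240]})

q1 = df["Age"].quantile(0.25)   # first quartile
q3 = df["Age"].quantile(0.75)   # third quartile
iqr = q3 - q1                   # spread of the middle 50% of values

# Values more than 1.5 * IQR outside the quartiles are treated as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

df = df[df["Age"].between(lower, upper)]

print(df)
```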

8. Rename Columns

df.rename(columns={"old": "new"}, inplace=True)   # "old" and "new" are placeholders for real column names
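
When many column names are messy, renaming them one at a time gets tedious. The sketch below normalizes all names in one pass and then renames a single column. The column names are hypothetical, and lowercase with underscores is just one possible convention.

```python
import pandas as pd

df = pd.DataFrame({" First Name ": ["Ann"], "AGE": [34]})

# Trim spaces, lowercase, and replace inner spaces with underscores.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Rename one specific column after normalizing.
df = df.rename(columns={"first_name": "name"})

print(list(df.columns))
```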

9. Save Clean Data

df.to_csv("clean_data.csv", index=False)
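
After saving, reading the file back is a cheap way to confirm that it was written as expected. The sketch below writes to a temporary folder so it does not touch any real files; the data and the path are made up for illustration.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [34, 29]})

# Write to a temporary folder instead of the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "clean_data.csv")
df.to_csv(out_path, index=False)   # index=False leaves out the row numbers

# Read the file back and compare the basics with the original.
reloaded = pd.read_csv(out_path)
print(reloaded.shape == df.shape)
print(list(reloaded.columns) == list(df.columns))
```

Without index=False, the saved file gains an extra unnamed column holding the old row numbers, which then shows up as a stray column on the next load.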

10. Real Use Cases