Fix & Prepare Data for Analysis
Data cleaning is the process of fixing missing, incorrect, or inconsistent data so that it can be analyzed reliably.
- Missing values
- Duplicate data
- Wrong formats
- Outliers
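Before fixing anything, it can help to count how often each of these problems occurs. The following is a minimal sketch on a small in-memory table; the `sample` data, its column names, and the age threshold of 100 are assumptions made for illustration, not part of any real dataset.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative assumptions.
sample = pd.DataFrame({
    "Name": ["Ann", "Bob", "Bob", "Cid", "Dee"],
    "Age": ["34", "41", "41", None, "250"],
})

# Missing values: count of empty cells in each column.
missing_per_column = sample.isna().sum()

# Duplicate data: rows that repeat an earlier row exactly.
duplicate_rows = int(sample.duplicated().sum())

# Wrong formats: Age is stored as text, so convert it to numbers.
# errors="coerce" turns anything unparseable into NaN instead of raising.
age_numeric = pd.to_numeric(sample["Age"], errors="coerce")

# Outliers: ages at or above an assumed plausibility limit of 100.
implausible_ages = int((age_numeric >= 100).sum())

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print("implausible ages:", implausible_ages)
```

Counting first makes it easier to decide between dropping and filling, since removing rows is cheap when only a few are affected and costly when many are.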
import pandas as pd
df = pd.read_csv("data.csv")
df = df.dropna()  # remove rows with missing values
# or: df = df.fillna(0)  # fill missing values with 0 instead
df = df.drop_duplicates()  # remove duplicate rows
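df["Age"] = df["Age"].astype(int)  # fix the format; raises if Age still contains missing values
df = df[df["Age"] < 100]  # drop outliers: keep only ages below 100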
df = df.rename(columns={"old": "new"})  # rename a column ("old" and "new" are placeholders)
df.to_csv("clean_data.csv", index=False)
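
The steps above can also be gathered into one reusable function. This is a sketch under stated assumptions rather than a definitive pipeline: `clean_ages` and `max_age` are hypothetical names, the `raw` table is made-up data, and `pd.to_numeric(..., errors="coerce")` is swapped in ahead of `astype(int)` so that unparseable entries become missing values instead of raising an error.

```python
import pandas as pd


def clean_ages(df: pd.DataFrame, max_age: int = 100) -> pd.DataFrame:
    """Return a cleaned copy of df (hypothetical helper for illustration)."""
    # Duplicate data: drop exact repeats; copy so the input is not modified.
    out = df.drop_duplicates().copy()
    # Wrong formats: parse Age as a number; bad entries become NaN.
    out["Age"] = pd.to_numeric(out["Age"], errors="coerce")
    # Missing values: drop rows where Age is absent or could not be parsed.
    out = out.dropna(subset=["Age"])
    # With no NaN left in Age, the integer conversion is safe.
    out["Age"] = out["Age"].astype(int)
    # Outliers: keep only ages below the assumed limit.
    out = out[out["Age"] < max_age]
    return out.reset_index(drop=True)


# Made-up input that contains every problem listed earlier.
raw = pd.DataFrame({
    "Name": ["Ann", "Bob", "Bob", "Cid", "Dee", "Eve"],
    "Age": ["34", "41", "41", None, "250", "n/a"],
})
cleaned = clean_ages(raw)
print(cleaned)
```

Of the six input rows, only Ann (34) and Bob (41) remain. The order of steps matters: the numeric conversion and the removal of missing values come before `astype(int)`, because that call fails on a column that still contains NaN.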