Extract • Transform • Load
ETL (Extract, Transform, Load) is a process that collects data from one or more sources, cleans and transforms it, and loads it into a target database or data warehouse.
Extract → Get data (CSV, API, DB)
Transform → Clean & process
Load → Store in database
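The three steps can be sketched end to end with only the Python standard library. This is a minimal illustration, not a production pipeline. The inline CSV text, the `employees` table, and the `extract`/`transform`/`load` function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file, API, or database.
RAW_CSV = """name,salary
Alice,50000
Bob,
Carol,62000
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, float]]:
    """Transform: skip rows with a missing salary and apply a 10% raise."""
    cleaned = []
    for row in rows:
        if not row["salary"]:
            continue  # drop incomplete rows
        cleaned.append((row["name"], round(float(row["salary"]) * 1.1, 2)))
    return cleaned


def load(records: list[tuple[str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS employees (name TEXT, salary REAL)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT name, salary FROM employees").fetchall())
```

Keeping each stage a separate function makes it easy to swap the source (for example, an API instead of CSV) without touching the other two steps.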
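```python
import pandas as pd

# Extract: read the raw data from a CSV file
df = pd.read_csv("data.csv")
# Transform: drop rows with missing values, then apply a 10% salary increase
df = df.dropna()
df["salary"] = df["salary"] * 1.1
```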
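Calling `dropna()` without a `subset` removes a row if *any* of its columns is missing, which can discard more data than intended. A minimal sketch, assuming only a missing salary should disqualify a row (the `name`/`dept` columns and the sample values are hypothetical), restricts the check with `subset`:

```python
import pandas as pd

# Hypothetical sample data: Bob lacks a department, Carol lacks a salary.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "dept": ["Sales", None, "IT"],
        "salary": [50000.0, 48000.0, None],
    }
)

# A plain dropna() would drop both Bob and Carol; subset= drops only Carol.
clean = df.dropna(subset=["salary"]).copy()
# Apply the 10% raise and round to cents.
clean["salary"] = (clean["salary"] * 1.1).round(2)
print(clean)
```

The `.copy()` gives an independent DataFrame, so assigning the new salary column does not trigger pandas' chained-assignment warning.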
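```python
import sqlite3

# Load: write the cleaned DataFrame to a SQLite table
# ("employees" is an illustrative name; avoid the SQL keyword TABLE)
conn = sqlite3.connect("data.db")
df.to_sql("employees", conn, if_exists="replace", index=False)
conn.close()
```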
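`to_sql` defaults to `if_exists="fail"`, so re-running a load against an existing table raises a `ValueError`, while `"append"` would insert the same rows again. A minimal sketch, assuming a full reload is acceptable (the `employees` table name and sample data are hypothetical), uses `"replace"` against an in-memory SQLite database so the step can be re-run safely:

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned data from the Transform step.
df = pd.DataFrame({"name": ["Alice", "Carol"], "salary": [55000.0, 68200.0]})

conn = sqlite3.connect(":memory:")
for _ in range(2):  # simulate the pipeline running twice
    # "replace" drops and recreates the table; index=False skips the index column
    df.to_sql("employees", conn, if_exists="replace", index=False)

# The row count stays at 2 instead of doubling to 4.
count = pd.read_sql_query("SELECT COUNT(*) AS n FROM employees", conn)["n"][0]
print(count)
conn.close()
```

For large tables, a full replace can be expensive; appending only new or changed rows is the usual alternative, at the cost of tracking what has already been loaded.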
Common ETL and orchestration tools:
- Apache Airflow
- Talend
- Informatica
- AWS Glue
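Orchestrators such as Apache Airflow model a pipeline as a directed acyclic graph (DAG) of tasks and run each task only after its upstream dependencies finish. The sketch below is *not* Airflow's API. It imitates that ordering with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), and the task names and functions are hypothetical.

```python
from graphlib import TopologicalSorter

log: list[str] = []


# Hypothetical tasks; real tasks would read, clean, and write data.
def extract() -> None:
    log.append("extract")


def transform() -> None:
    log.append("transform")


def load() -> None:
    log.append("load")


tasks = {"extract": extract, "transform": transform, "load": load}
# Map each task to the set of tasks it depends on.
dependencies = {"transform": {"extract"}, "load": {"transform"}}

# static_order() yields every task after all of its dependencies.
for name in TopologicalSorter(dependencies).static_order():
    tasks[name]()

print(log)
```

Real orchestrators add scheduling, retries, and monitoring on top of this dependency ordering.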
Typical data flow: Source → ETL → Database → Dashboard
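At the end of the flow, a dashboard usually reads aggregated results from the database rather than raw rows. A minimal sketch using an in-memory SQLite table (the `employees` schema, department names, and figures are hypothetical) shows the kind of summary query a dashboard might run:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "Sales", 55000.0), ("Bob", "Sales", 52800.0), ("Carol", "IT", 68200.0)],
)

# Aggregate per department: the shape of data a chart or KPI tile would consume.
summary = conn.execute(
    """
    SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
    ORDER BY dept
    """
).fetchall()
print(summary)
conn.close()
```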