Pandas Data Wrangling Example Explanation

Overview

This example demonstrates how to use Pandas for common data wrangling tasks in a data engineering context. It covers:

- Loading data from CSV files
- Cleaning and preprocessing data
- Handling missing values
- Performing basic analysis
- Grouping and aggregating data
- Merging datasets
- Exporting results

Code Explanation

Imports and Data Loading

import os

import pandas as pd

# Resolve paths relative to this script so the CSV files are found
# regardless of the current working directory.
script_dir = os.path.dirname(os.path.abspath(__file__))

# Construct full file paths
sales_file_path = os.path.join(script_dir, 'sales_data.csv')
customer_file_path = os.path.join(script_dir, 'customer_data.csv')

# Load sales and customer data from CSV
sales_df = pd.read_csv(sales_file_path)
customer_df = pd.read_csv(customer_file_path)

This section imports the necessary modules and loads the sales and customer data from CSV files.
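The loading step can be tried without any files on disk. The sketch below is only an illustration: the in-memory CSV text and its values are made up, and the column check is an optional safeguard that the original script does not include.

```python
import io

import pandas as pd

# Hypothetical sample standing in for 'sales_data.csv' (values are made up)
sample_csv = io.StringIO(
    "date,product,quantity,price,customer_id\n"
    "2024-01-05,Widget,2,9.99,101\n"
    "2024-01-17,gadget,,24.50,102\n"
)

# parse_dates converts the 'date' column while reading
sales_df = pd.read_csv(sample_csv, parse_dates=["date"])

# Optional safeguard: fail early if an expected column is missing
expected = {"date", "product", "quantity", "price", "customer_id"}
missing = expected - set(sales_df.columns)
if missing:
    raise ValueError(f"sales data is missing columns: {sorted(missing)}")

print(sales_df.dtypes)
```

The empty 'quantity' field in the second row is read as a missing value, which is the case the later fill step deals with.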

Data Cleaning and Preprocessing

sales_df['date'] = pd.to_datetime(sales_df['date'])
sales_df['product'] = sales_df['product'].str.lower()

Here, we convert the 'date' column to datetime format and standardize the 'product' column by converting it to lowercase.
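Real input is often messier than the two lines above assume. As a hedged variation, the sketch below uses made-up rows, adds `errors="coerce"` so unparseable dates become `NaT` instead of raising, and strips surrounding whitespace before lowercasing. Neither addition is part of the original script.

```python
import pandas as pd

# Made-up rows with one bad date and inconsistent product spellings
sales_df = pd.DataFrame({
    "date": ["2024-01-05", "not a date", "2024-02-10"],
    "product": [" Widget", "GADGET ", "widget"],
})

# Unparseable dates become NaT rather than raising an exception
sales_df["date"] = pd.to_datetime(sales_df["date"], errors="coerce")

# Trim whitespace, then lowercase, so equal products compare equal
sales_df["product"] = sales_df["product"].str.strip().str.lower()

print(sales_df)
```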

Handling Missing Values

sales_df['quantity'] = sales_df['quantity'].fillna(sales_df['quantity'].mean())

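This code fills missing values in the 'quantity' column with the mean of the non-missing quantities. Because the mean is rarely a whole number, the filled values may be fractional.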
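A small worked example makes the fill step concrete. The numbers below are made up; counting the missing values first is an extra step that is not in the original script.

```python
import pandas as pd

# Made-up quantities with two missing entries
sales_df = pd.DataFrame({"quantity": [2.0, None, 4.0, None, 6.0]})

# Count the gaps before filling so the size of the imputation is known
n_missing = int(sales_df["quantity"].isna().sum())

# mean() skips missing values: (2 + 4 + 6) / 3
mean_quantity = sales_df["quantity"].mean()

sales_df["quantity"] = sales_df["quantity"].fillna(mean_quantity)

print(n_missing, mean_quantity)
print(sales_df["quantity"].tolist())
```

If outliers are a concern, `median()` could be substituted for `mean()` in the same pattern.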

Feature Engineering

sales_df['total_revenue'] = sales_df['quantity'] * sales_df['price']

We create a new 'total_revenue' column by multiplying quantity and price.
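The multiplication is applied row by row, and a missing value in either input produces a missing result. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Made-up rows; the last price is missing on purpose
sales_df = pd.DataFrame({
    "quantity": [2, 1, 3],
    "price": [10.0, 24.5, None],
})

# Element-wise product; a missing price yields a missing revenue
sales_df["total_revenue"] = sales_df["quantity"] * sales_df["price"]

print(sales_df)
```

This is one reason to handle missing values before deriving new columns from them.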

Basic Analysis

print(sales_df['total_revenue'].describe())

This prints summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the 'total_revenue' column.
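`describe()` returns a Series, so individual statistics can be read by label rather than only printed. The revenue values in this sketch are made up for illustration.

```python
import pandas as pd

# Made-up revenue values
revenue = pd.Series([20.0, 24.5, 30.0, 5.5], name="total_revenue")

summary = revenue.describe()

# Individual statistics are available by label
row_count = summary["count"]
mean_revenue = summary["mean"]
max_revenue = summary["max"]

print(summary)
```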

Grouping and Aggregation

monthly_sales = sales_df.groupby(sales_df['date'].dt.to_period('M'))['total_revenue'].sum()
print(monthly_sales)

Here, we group the data by month and calculate the total revenue for each month.
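To see what the grouping produces, the same expression can be run on a few made-up rows. Two January sales collapse into one monthly total and the February sale stays on its own.

```python
import pandas as pd

# Made-up sales spanning two months
sales_df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-17", "2024-02-10"]),
    "total_revenue": [20.0, 24.5, 30.0],
})

# Convert each date to its calendar month, then sum revenue per month
monthly_sales = (
    sales_df.groupby(sales_df["date"].dt.to_period("M"))["total_revenue"].sum()
)

print(monthly_sales)
```

The result is indexed by monthly periods (2024-01 and 2024-02 here) rather than by individual dates.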

Data Merging

merged_df = pd.merge(sales_df, customer_df, on='customer_id', how='left')

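This code merges the sales data with the customer data loaded earlier, matching rows on 'customer_id'. Because it is a left join, every sales row is kept even when no matching customer exists; the customer columns are left empty for those rows.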
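A left join can silently leave customer columns empty, so it can help to check the result. The sketch below uses made-up data and adds two optional `pd.merge` arguments that the original call does not use: `validate="many_to_one"`, which raises if a customer ID appears more than once in the customer table, and `indicator=True`, which adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# Made-up data; customer 999 has no entry in the customer table
sales_df = pd.DataFrame({
    "customer_id": [101, 102, 999],
    "total_revenue": [20.0, 24.5, 30.0],
})
customer_df = pd.DataFrame({
    "customer_id": [101, 102],
    "customer_name": ["Ada", "Grace"],
})

merged_df = pd.merge(
    sales_df,
    customer_df,
    on="customer_id",
    how="left",
    validate="many_to_one",  # each customer_id must be unique in customer_df
    indicator=True,          # adds a '_merge' column
)

# Sales rows that found no matching customer
unmatched = merged_df[merged_df["_merge"] == "left_only"]

print(unmatched)
```

The `_merge` column would normally be dropped before exporting.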

Exporting Results

analysis_file_path = os.path.join(script_dir, 'sales_analysis.csv')
merged_df.to_csv(analysis_file_path, index=False)

Finally, we export the merged and processed data to a new CSV file.
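One way to check an export is to read the file back. This sketch writes made-up data to a temporary directory instead of the script directory, purely so it can run anywhere. Dates are stored as text in a CSV, so `parse_dates` is needed to restore them when reading.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the merged data
merged_df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05"]),
    "product": ["widget"],
    "total_revenue": [20.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    analysis_file_path = os.path.join(tmp_dir, "sales_analysis.csv")

    # index=False keeps the row index out of the file
    merged_df.to_csv(analysis_file_path, index=False)

    # Read the file back to confirm what was written
    round_trip = pd.read_csv(analysis_file_path, parse_dates=["date"])

print(round_trip)
```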

Running the Example

To run this example:

Ensure you have Pandas installed:
```
pip install pandas
```
Prepare your data files:
- Create a 'sales_data.csv' file with columns: date, product, quantity, price, customer_id
- Create a 'customer_data.csv' file with columns: customer_id, customer_name, etc.
Save the Python code in a file, e.g., 'pandas_example.py'
Run the script:
```
python pandas_example.py
```

The script prints summary statistics and monthly revenue totals, merges the sales data with the customer data, and writes the result to 'sales_analysis.csv' in the same directory as the script.
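If no data is at hand, small sample files with the required columns can be generated first. The helper below is a hypothetical convenience, not part of the example script, and every value in it is made up. The demonstration writes to a temporary directory; to use the files with the script, pass the directory that contains 'pandas_example.py' instead.

```python
import os
import tempfile

import pandas as pd


def write_sample_data(target_dir):
    """Write tiny made-up sales and customer CSV files into target_dir."""
    sales = pd.DataFrame({
        "date": ["2024-01-05", "2024-01-17", "2024-02-10"],
        "product": ["Widget", "Gadget", "Widget"],
        "quantity": [2, None, 3],
        "price": [9.99, 24.50, 9.99],
        "customer_id": [101, 102, 101],
    })
    customers = pd.DataFrame({
        "customer_id": [101, 102],
        "customer_name": ["Ada", "Grace"],
    })
    sales_path = os.path.join(target_dir, "sales_data.csv")
    customer_path = os.path.join(target_dir, "customer_data.csv")
    sales.to_csv(sales_path, index=False)
    customers.to_csv(customer_path, index=False)
    return sales_path, customer_path


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp_dir:
    sales_path, customer_path = write_sample_data(tmp_dir)
    sales_columns = pd.read_csv(sales_path).columns.tolist()
    customer_columns = pd.read_csv(customer_path).columns.tolist()

print(sales_columns)
print(customer_columns)
```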

