Seaborn Data Visualization Tutorial

This document explains the Seaborn e-commerce sales analysis example provided in seaborn_example.py.

Overview

The example demonstrates how to:

Generate simulated e-commerce sales data
Create a DataFrame using pandas
Create multiple types of plots using Seaborn
Handle datetime objects for regression plots
Customize plot appearance
Save and display the plots
Calculate and print basic sales statistics

Code Explanation

Imports and Setup

import os
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

output_dir = os.path.dirname(os.path.abspath(__file__))
sns.set_style("whitegrid")

This section imports necessary modules, sets up the output directory, and sets the Seaborn style.
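The __file__ variable is only defined when the code runs as a script, so the output_dir line above raises a NameError in an interactive session or notebook. The following is a minimal sketch of a fallback, assuming the current working directory is an acceptable place for the output. The helper name resolve_output_dir is hypothetical and is not part of seaborn_example.py.

```python
import os


def resolve_output_dir():
    """Return the script's directory, or the working directory as a fallback.

    Hypothetical helper for illustration; the original script reads __file__ directly.
    """
    try:
        # __file__ exists when the module is run or imported from a file.
        return os.path.dirname(os.path.abspath(__file__))
    except NameError:
        # Interactive sessions and notebooks do not define __file__.
        return os.getcwd()


output_dir = resolve_output_dir()
print(output_dir)
```

Either branch returns an absolute path to an existing directory, so the later os.path.join call works unchanged.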

Data Generation

np.random.seed(42)
start_date = datetime(2023, 1, 1)
dates = [start_date + timedelta(days=i) for i in range(365)]
sales = np.random.normal(1000, 200, 365) + 500 * np.sin(np.arange(365) * 2 * np.pi / 365)
categories = np.random.choice(['Electronics', 'Clothing', 'Books', 'Home & Kitchen'], 365)

df = pd.DataFrame({
    'date': dates,
    'sales': sales,
    'category': categories
})

Here, we generate simulated e-commerce sales data for a full year and create a pandas DataFrame.
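The sales formula adds random noise (mean 1000, standard deviation 200) to a sine term that completes one cycle over the year. To see what the sine term contributes on its own, the sketch below recomputes it with the standard library only. The names AMPLITUDE, DAYS, and seasonal_offset are introduced here for illustration and do not appear in the script.

```python
import math

AMPLITUDE = 500  # same multiplier as the sine term in the sales formula
DAYS = 365       # one full cycle over the simulated year


def seasonal_offset(day_index):
    """Seasonal component added to the noisy baseline for a given day index."""
    return AMPLITUDE * math.sin(2 * math.pi * day_index / DAYS)


offsets = [seasonal_offset(i) for i in range(DAYS)]

# Day indices (0 = January 1) where the seasonal term is largest and smallest.
peak_day = max(range(DAYS), key=offsets.__getitem__)
trough_day = min(range(DAYS), key=offsets.__getitem__)

print(peak_day, round(offsets[peak_day], 1))
print(trough_day, round(offsets[trough_day], 1))
```

The peak falls in early April and the trough in early October, so before noise the expected daily sales swing between roughly 500 and 1500.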

Data Processing

df['month'] = df['date'].dt.to_period('M')
df['date_ordinal'] = df['date'].apply(lambda x: x.toordinal())

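This adds two columns: month, a monthly period used later to aggregate sales for the heatmap, and date_ordinal, which stores each date as an integer day count. The regression plot fits a line against a numeric x-axis, so it uses date_ordinal rather than the datetime values.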
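The date_ordinal values come from toordinal(), which counts days from 0001-01-01, and date.fromordinal() reverses the conversion. The sketch below shows the round trip using only the standard library. The function ordinal_to_label is a hypothetical helper, not part of the script; its (value, pos) signature is chosen so that it could be passed to matplotlib's FuncFormatter if readable dates are wanted on the regression plot's x-axis.

```python
from datetime import date, datetime


def ordinal_to_label(value, pos=None):
    """Convert a day ordinal back to a 'YYYY-MM-DD' string.

    `pos` is unused; it is accepted so the function matches the
    (value, position) callback shape that tick formatters expect.
    """
    return date.fromordinal(int(round(value))).isoformat()


start_ordinal = datetime(2023, 1, 1).toordinal()

print(ordinal_to_label(start_ordinal))        # first day of the simulated year
print(ordinal_to_label(start_ordinal + 364))  # last day of the simulated year

# Possible use on the regression subplot (assumes matplotlib is imported):
#   from matplotlib.ticker import FuncFormatter
#   plt.gca().xaxis.set_major_formatter(FuncFormatter(ordinal_to_label))
```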

Plotting

plt.figure(figsize=(15, 10))

# Plot 1: Time series of daily sales
plt.subplot(2, 2, 1)
sns.lineplot(x='date', y='sales', data=df)

# Plot 2: Box plot of sales by category
plt.subplot(2, 2, 2)
sns.boxplot(x='category', y='sales', data=df)

# Plot 3: Heatmap of monthly sales
monthly_sales = df.groupby(['month', 'category'])['sales'].sum().unstack()
plt.subplot(2, 2, 3)
sns.heatmap(monthly_sales, cmap='YlOrRd', annot=False)

# Plot 4: Scatter plot of daily sales with trend line
plt.subplot(2, 2, 4)
sns.regplot(x='date_ordinal', y='sales', data=df, scatter_kws={'alpha':0.5}, line_kws={'color': 'red'})
plt.title('Daily Sales with Trend')
plt.xlabel('Date')
plt.ylabel('Sales ($)')

This section creates four different types of plots using Seaborn: a line plot, a box plot, a heatmap, and a scatter plot with a regression line.
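The heatmap input is built with groupby followed by unstack, which turns the category level of the index into columns. The sketch below applies the same two calls to a four-row toy DataFrame so the resulting shape is easy to inspect. The toy values are made up for illustration and are unrelated to the simulated data.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-10"]),
    "sales": [100.0, 150.0, 200.0, 50.0],
    "category": ["Books", "Books", "Clothing", "Books"],
})
toy["month"] = toy["date"].dt.to_period("M")

# Rows become months, columns become categories, cells hold summed sales.
monthly = toy.groupby(["month", "category"])["sales"].sum().unstack()
print(monthly)

# A month/category pair with no rows becomes NaN after unstack.
# Filling with 0 is one option if an empty cell should read as "no sales".
filled = monthly.fillna(0)
print(filled)
```

With 365 rows spread over four categories, the tutorial's data is unlikely to contain an empty month/category pair, but the fillna step is worth keeping in mind for sparser data.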

Saving and Displaying the Plot

plt.tight_layout()
output_file_path = os.path.join(output_dir, 'sales_analysis.png')
plt.savefig(output_file_path, dpi=300)
plt.show()

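This saves the figure as a 300 DPI PNG file in the script's directory and then displays it. plt.tight_layout() is called first so that the subplot titles and axis labels do not overlap.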

Statistical Analysis

total_sales = df['sales'].sum()
avg_daily_sales = df['sales'].mean()
# Compute the per-category totals once and reuse them
category_totals = df.groupby('category')['sales'].sum()
best_selling_category = category_totals.idxmax()
worst_selling_category = category_totals.idxmin()

print(f"Total sales: ${total_sales:.2f}")
print(f"Average daily sales: ${avg_daily_sales:.2f}")
print(f"Best selling category: {best_selling_category}")
print(f"Worst selling category: {worst_selling_category}")

This section calculates and prints basic statistics about the sales data.
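The print statements use the .2f format specifier, which rounds to two decimal places. With yearly totals in the hundreds of thousands, a thousands separator can make the output easier to read. The sketch below shows the ,.2f variant. The helper format_currency and the sample amounts are hypothetical and are not output from the script.

```python
def format_currency(amount):
    """Format a number as dollars with thousands separators and two decimals."""
    return f"${amount:,.2f}"


# Sample amounts chosen for illustration only.
print(format_currency(365123.456))
print(format_currency(1042.5))
```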

Running the Example

To run this example:

Ensure you have Seaborn, Matplotlib, NumPy, and pandas installed:
```
pip install seaborn matplotlib numpy pandas
```
Run the script:
```
python seaborn_example.py
```

The script generates a single figure with four subplots, each analyzing a different aspect of the simulated sales data. It saves the figure as 'sales_analysis.png' in the same directory as the script, displays it, and prints basic statistics about the sales data.
