Master statistical visualization with Seaborn's high-level interface. Learn to create attractive distribution plots, regression visualizations, and categorical comparisons with minimal code, which makes Seaborn well suited to rapid data exploration and analysis.
This document explains the Seaborn e-commerce sales analysis example provided in seaborn_example.py.
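The example demonstrates how to generate simulated sales data, prepare it with pandas, draw four Seaborn plots in a single figure, save the figure to disk, and print summary statistics.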
    import os
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
    from datetime import datetime, timedelta

    output_dir = os.path.dirname(os.path.abspath(__file__))
    sns.set_style("whitegrid")
This section imports the required modules, points output_dir at the directory that contains the script, and applies Seaborn's "whitegrid" style.
    np.random.seed(42)
    start_date = datetime(2023, 1, 1)
    dates = [start_date + timedelta(days=i) for i in range(365)]
    sales = np.random.normal(1000, 200, 365) + 500 * np.sin(np.arange(365) * 2 * np.pi / 365)
    categories = np.random.choice(['Electronics', 'Clothing', 'Books', 'Home & Kitchen'], 365)

    df = pd.DataFrame({
        'date': dates,
        'sales': sales,
        'category': categories
    })
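Here, we generate simulated e-commerce sales data for a full year (365 days starting 2023-01-01) and store it in a pandas DataFrame. Daily sales are drawn from a normal distribution with mean 1000 and standard deviation 200, plus a sine term with amplitude 500 that completes one cycle over the year. Each day is assigned one of four product categories at random, and the fixed seed makes the data reproducible.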
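To make the shape of the simulated data easier to see, the following sketch splits the sales formula into its two terms. The names `days`, `noise`, `seasonal`, `peak_day`, and `trough_day` are introduced here for illustration and do not appear in seaborn_example.py.

```python
import numpy as np

np.random.seed(42)  # same seed as the script

days = np.arange(365)
noise = np.random.normal(1000, 200, 365)          # baseline of 1000 with std 200
seasonal = 500 * np.sin(days * 2 * np.pi / 365)   # one full sine cycle per year
sales = noise + seasonal

# The seasonal term starts at zero on day 0, peaks roughly a quarter
# of the way through the year, and bottoms out at about three quarters.
peak_day = int(np.argmax(seasonal))
trough_day = int(np.argmin(seasonal))
print(f"seasonal peak on day {peak_day}, trough on day {trough_day}")
print(f"mean daily sales: {sales.mean():.2f}")
```

Because the sine term averages out to zero over a full cycle, the yearly mean stays close to the baseline of 1000 while individual days swing by up to 500 in either direction before noise.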
    df['month'] = df['date'].dt.to_period('M')
    df['date_ordinal'] = df['date'].apply(lambda x: x.toordinal())
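This adds two derived columns: month, a monthly period used later to aggregate sales for the heatmap, and date_ordinal, which stores each date as an integer day count so that sns.regplot has a numeric x variable for fitting the trend line.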
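A small sketch of what these two derived columns contain, using four dates that straddle a month boundary. The tiny DataFrame here is made up for illustration, and `Series.map(pd.Timestamp.toordinal)` is used as an equivalent alternative to the script's `apply` with a lambda.

```python
import pandas as pd

# Four consecutive days that cross from January into February.
df = pd.DataFrame({"date": pd.date_range("2023-01-30", periods=4, freq="D")})

# Monthly period: every timestamp within a month maps to the same Period.
df["month"] = df["date"].dt.to_period("M")

# Proleptic Gregorian ordinal: consecutive days differ by exactly 1.
df["date_ordinal"] = df["date"].map(pd.Timestamp.toordinal)

print(df)
```

The month column collapses daily rows into groups that `groupby` can sum, while date_ordinal keeps the day-level ordering in plain integers.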
    plt.figure(figsize=(15, 10))

    # Plot 1: Time series of daily sales
    plt.subplot(2, 2, 1)
    sns.lineplot(x='date', y='sales', data=df)

    # Plot 2: Box plot of sales by category
    plt.subplot(2, 2, 2)
    sns.boxplot(x='category', y='sales', data=df)

    # Plot 3: Heatmap of monthly sales
    monthly_sales = df.groupby(['month', 'category'])['sales'].sum().unstack()
    plt.subplot(2, 2, 3)
    sns.heatmap(monthly_sales, cmap='YlOrRd', annot=False)

    # Plot 4: Scatter plot of daily sales with trend line
    plt.subplot(2, 2, 4)
    sns.regplot(x='date_ordinal', y='sales', data=df, scatter_kws={'alpha':0.5}, line_kws={'color': 'red'})
    plt.title('Daily Sales with Trend')
    plt.xlabel('Date')
    plt.ylabel('Sales ($)')
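This section draws four plots in a 2x2 grid using Seaborn: a line plot of daily sales, a box plot of sales by category, a heatmap of total monthly sales per category (built with groupby and unstack), and a scatter plot with a fitted regression line. Only the fourth subplot is given an explicit title and axis labels.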
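Since the fourth plot uses date_ordinal on the x-axis, its tick labels appear as large integers even though the axis is labelled "Date". One possible refinement, sketched below, converts tick positions back to dates with Matplotlib's `FuncFormatter`. The helper `ordinal_to_label`, the 90-day stand-in dataset, and the `Agg` backend setting are assumptions made for this sketch and are not part of seaborn_example.py.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.ticker import FuncFormatter

# Small stand-in dataset with the same columns the script uses.
rng = np.random.default_rng(0)
df = pd.DataFrame({"date": pd.date_range("2023-01-01", periods=90, freq="D")})
df["sales"] = rng.normal(1000, 200, len(df))
df["date_ordinal"] = df["date"].map(pd.Timestamp.toordinal)

fig, ax = plt.subplots(figsize=(7, 4))
sns.regplot(x="date_ordinal", y="sales", data=df, ax=ax,
            scatter_kws={"alpha": 0.5}, line_kws={"color": "red"})


def ordinal_to_label(value, _pos):
    """Hypothetical helper: turn an ordinal tick position into an ISO date string."""
    return dt.date.fromordinal(int(value)).strftime("%Y-%m-%d")


ax.xaxis.set_major_formatter(FuncFormatter(ordinal_to_label))
ax.set_title("Daily Sales with Trend")
ax.set_xlabel("Date")
ax.set_ylabel("Sales ($)")
fig.autofmt_xdate()  # rotate the longer labels so they do not overlap
fig.tight_layout()
```

The regression is still fitted on the integer ordinals; only the tick labels change. Passing `ax=` explicitly is an alternative to the script's `plt.subplot` calls and makes it clearer which subplot each label applies to.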
    plt.tight_layout()
    output_file_path = os.path.join(output_dir, 'sales_analysis.png')
    plt.savefig(output_file_path, dpi=300)
    plt.show()
This adjusts the subplot spacing so labels do not overlap, saves the figure as a 300 dpi PNG file named sales_analysis.png in output_dir, and then displays it.
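The output path depends on `__file__`, which is not defined when the code is pasted into an interactive session or notebook. A minimal sketch of one way to handle that case follows, assuming that falling back to the current working directory is acceptable. The helper name `resolve_output_dir` is hypothetical and not part of the script.

```python
from pathlib import Path


def resolve_output_dir():
    """Hypothetical helper: use the script's directory, else the working directory."""
    try:
        return Path(__file__).resolve().parent
    except NameError:
        # __file__ is undefined in a REPL or notebook cell.
        return Path.cwd()


output_file_path = resolve_output_dir() / "sales_analysis.png"
print(output_file_path)
```

The resulting `Path` can be passed directly to `plt.savefig`, which accepts path-like objects.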
    total_sales = df['sales'].sum()
    avg_daily_sales = df['sales'].mean()
    best_selling_category = df.groupby('category')['sales'].sum().idxmax()
    worst_selling_category = df.groupby('category')['sales'].sum().idxmin()

    print(f"Total sales: ${total_sales:.2f}")
    print(f"Average daily sales: ${avg_daily_sales:.2f}")
    print(f"Best selling category: {best_selling_category}")
    print(f"Worst selling category: {worst_selling_category}")
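This section calculates and prints basic statistics about the sales data: total sales, average daily sales, and the categories with the highest and lowest total sales. Because each day's category is assigned at random, independently of the sales value, the ranking of categories in this simulated data arises by chance.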
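The script groups by category twice, once for `idxmax` and once for `idxmin`. As a sketch of an equivalent approach, the totals can be computed once and reused. The four-row dataset and the name `category_totals` are made up here so the result is easy to verify by eye.

```python
import pandas as pd

# Tiny hand-made dataset: Books total 300, Clothing 250, Electronics 150.
df = pd.DataFrame({
    "category": ["Books", "Clothing", "Books", "Electronics"],
    "sales": [100.0, 250.0, 200.0, 150.0],
})

# Aggregate once, then reuse the result for both lookups.
category_totals = df.groupby("category")["sales"].sum()
best_selling_category = category_totals.idxmax()
worst_selling_category = category_totals.idxmin()

print(category_totals.sort_values(ascending=False))
print(f"Best selling category: {best_selling_category}")
print(f"Worst selling category: {worst_selling_category}")
```

Keeping `category_totals` around also makes it easy to print the full ranking rather than only the two extremes.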
To run this example:

1. Ensure you have Seaborn, Matplotlib, NumPy, and pandas installed:

       pip install seaborn matplotlib numpy pandas

2. Run the script:

       python seaborn_example.py
The script will generate a plot with four subplots analyzing different aspects of the simulated sales data, save it as 'sales_analysis.png' in the same directory as the script, display the plot, and print some basic statistics about the sales data.