Create publication-quality visualizations using Matplotlib's powerful plotting capabilities. Learn to build custom charts, control plot aesthetics, create subplots, and export figures in multiple formats - the foundation of data visualization in Python.
This document explains the Matplotlib weather data analysis example provided in matplotlib_example.py.
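The example demonstrates how to generate simulated temperature data, compute a 7-day moving average, plot both series, shade the seasons, customize and save the figure, and print summary statistics.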
    import os
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from datetime import datetime, timedelta

    output_dir = os.path.dirname(os.path.abspath(__file__))
This section imports necessary modules and sets up the output directory for saving the plot.
    np.random.seed(42)
    start_date = datetime(2023, 1, 1)
    dates = [start_date + timedelta(days=i) for i in range(365)]
    temperatures = np.random.normal(15, 10, 365) + 5 * np.sin(np.arange(365) * 2 * np.pi / 365)
    df = pd.DataFrame({'date': dates, 'temperature': temperatures})
Here, we generate simulated weather data for a full year and create a pandas DataFrame.
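As a variation on the data generation step, the sketch below builds the same kind of DataFrame with `pd.date_range` and a local NumPy `Generator`. Both substitutions are assumptions and not part of matplotlib_example.py, and because the random generator differs, the values will not match the script's even with the same seed.

```python
import numpy as np
import pandas as pd

# Assumption: a local Generator replaces the script's global np.random.seed(42).
rng = np.random.default_rng(42)

# One timestamp per day of 2023 (365 days, not a leap year).
dates = pd.date_range(start="2023-01-01", periods=365, freq="D")

# Same model as the script: noise around 15 °C plus a yearly sine term.
day_index = np.arange(365)
seasonal = 5 * np.sin(day_index * 2 * np.pi / 365)
temperatures = rng.normal(loc=15, scale=10, size=365) + seasonal

df = pd.DataFrame({"date": dates, "temperature": temperatures})
print(df.head())
```

Using `pd.date_range` gives the column a datetime dtype directly, which Matplotlib plots the same way as the list of `datetime` objects.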
    df['moving_avg'] = df['temperature'].rolling(window=7).mean()
This calculates a 7-day moving average of temperatures.
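One detail worth knowing about `rolling(window=7)` is that the first six results are NaN, because a full window is not yet available. The sketch below shows this on toy data, along with two variants (`min_periods=1` and `center=True`) that the script itself does not use.

```python
import numpy as np
import pandas as pd

temps = pd.Series(np.arange(10, dtype=float))  # toy data: 0.0 .. 9.0

# Default behaviour, as in the script: a trailing window of 7 values.
trailing = temps.rolling(window=7).mean()

# Variant: accept shorter windows at the start so no value is NaN.
partial = temps.rolling(window=7, min_periods=1).mean()

# Variant: center the window on each day instead of trailing it.
centered = temps.rolling(window=7, center=True).mean()

print(trailing.isna().sum())  # 6 (positions 0-5 lack a full window)
print(partial.isna().sum())   # 0
print(centered.isna().sum())  # 6 (three at each end)
```

Matplotlib skips NaN values when drawing a line, which is why the moving average curve in the plot starts a few days after the daily curve.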
    plt.figure(figsize=(12, 6))
    plt.plot(df['date'], df['temperature'], label='Daily Temperature', alpha=0.5)
    plt.plot(df['date'], df['moving_avg'], label='7-day Moving Average', color='red')
These lines create the main plot with daily temperatures and the moving average.
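The script uses the `pyplot` state-based interface. The same two lines can also be drawn through Matplotlib's object-oriented interface, which keeps explicit `fig` and `ax` handles. The sketch below is an alternative, not the script's approach, and uses a 30-day stand-in DataFrame and the non-interactive `Agg` backend as assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small stand-in for the script's DataFrame (30 days instead of 365).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=30, freq="D"),
    "temperature": rng.normal(15, 10, 30),
})
df["moving_avg"] = df["temperature"].rolling(window=7).mean()

# Explicit figure and axes objects instead of pyplot's implicit current figure.
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df["date"], df["temperature"], label="Daily Temperature", alpha=0.5)
ax.plot(df["date"], df["moving_avg"], label="7-day Moving Average", color="red")
ax.legend()
```

Holding on to `ax` makes it easier to extend the figure later, for example when several subplots share one figure.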
    seasons = [
        ('Winter', datetime(2023, 1, 1), datetime(2023, 3, 20)),
        ('Spring', datetime(2023, 3, 21), datetime(2023, 6, 20)),
        ('Summer', datetime(2023, 6, 21), datetime(2023, 9, 22)),
        ('Fall', datetime(2023, 9, 23), datetime(2023, 12, 20)),
        ('Winter', datetime(2023, 12, 21), datetime(2023, 12, 31))
    ]
    for season, start, end in seasons:
        plt.axvspan(start, end, alpha=0.2,
                    label=season if season not in plt.gca().get_legend_handles_labels()[1] else "")
This section highlights different seasons on the plot using axvspan.
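The inline check against `get_legend_handles_labels()` keeps "Winter" from appearing twice in the legend. A possibly more readable variant tracks the labels in a set and passes a label starting with an underscore, which Matplotlib leaves out of the legend. Since `axvspan` draws every span in the same default patch color unless `color` is given, the sketch also assigns one color per season. The `season_colors` mapping is hypothetical and not part of the script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
from datetime import datetime

seasons = [
    ("Winter", datetime(2023, 1, 1), datetime(2023, 3, 20)),
    ("Spring", datetime(2023, 3, 21), datetime(2023, 6, 20)),
    ("Summer", datetime(2023, 6, 21), datetime(2023, 9, 22)),
    ("Fall", datetime(2023, 9, 23), datetime(2023, 12, 20)),
    ("Winter", datetime(2023, 12, 21), datetime(2023, 12, 31)),
]

# Hypothetical color choice, one entry per season name.
season_colors = {
    "Winter": "tab:blue",
    "Spring": "tab:green",
    "Summer": "tab:orange",
    "Fall": "tab:brown",
}

fig, ax = plt.subplots(figsize=(12, 6))
labelled = set()
for season, start, end in seasons:
    # Labels that start with "_" are ignored when the legend is built.
    label = season if season not in labelled else "_nolegend_"
    ax.axvspan(start, end, alpha=0.2, color=season_colors[season], label=label)
    labelled.add(season)

handles, labels = ax.get_legend_handles_labels()
print(labels)
```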
    plt.title('Daily Temperatures and 7-day Moving Average (2023)')
    plt.xlabel('Date')
    plt.ylabel('Temperature (°C)')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.gcf().autofmt_xdate()
These lines customize various aspects of the plot, including title, labels, legend, grid, and date formatting.
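`autofmt_xdate()` only rotates and aligns the tick labels; tick placement and text are left to Matplotlib's defaults. If finer control is wanted, `matplotlib.dates` provides locators and formatters. The sketch below places one tick per month with a "Jan 2023" style label, which is an optional refinement and not something the script does. The plotted sine curve is placeholder data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder series covering the same year as the script.
dates = pd.date_range("2023-01-01", periods=365, freq="D")
values = np.sin(np.arange(365) * 2 * np.pi / 365)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(dates, values)

# One major tick at the start of each month, labelled like "Jan 2023".
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Rotate and right-align the labels, as the script does.
fig.autofmt_xdate()
```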
    plt.tight_layout()
    output_file_path = os.path.join(output_dir, 'temperature_analysis.png')
    plt.savefig(output_file_path, dpi=300)
    plt.show()
This saves the plot as a high-resolution PNG file and displays it.
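The script writes a single PNG, but `savefig` infers the output format from the file extension, so the same figure can be exported to several formats in a loop. The sketch below is a minimal illustration under two assumptions: a temporary directory stands in for the script's `output_dir`, and the `Agg` backend is selected so that it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, so plt.show() is omitted
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([0, 1, 2], [0, 1, 4])
fig.tight_layout()

# Assumption: a temporary directory replaces the script's output_dir.
output_dir = tempfile.mkdtemp()

saved_paths = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(output_dir, f"temperature_analysis.{ext}")
    # dpi affects the raster PNG; PDF and SVG are vector formats.
    fig.savefig(path, dpi=300, bbox_inches="tight")
    saved_paths.append(path)

plt.close(fig)  # release the figure's memory once it has been saved
print(saved_paths)
```

Calling `savefig` before `show` matters in interactive sessions, because the figure may be cleared once its window is closed.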
    hottest_day = df.loc[df['temperature'].idxmax()]
    coldest_day = df.loc[df['temperature'].idxmin()]
    print(f"Hottest day: {hottest_day['date'].strftime('%Y-%m-%d')} with {hottest_day['temperature']:.2f}°C")
    print(f"Coldest day: {coldest_day['date'].strftime('%Y-%m-%d')} with {coldest_day['temperature']:.2f}°C")
    print(f"Average temperature: {df['temperature'].mean():.2f}°C")
This section calculates and prints basic statistics about the temperature data.
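The same DataFrame supports more detailed summaries. As one possible extension that the script does not include, the sketch below groups the temperatures by calendar month and reports the mean, minimum, and maximum for each. It rebuilds the data with `pd.date_range`, which is an assumption made to keep the example short.

```python
import numpy as np
import pandas as pd

# Rebuild the simulated data with the script's seed and temperature model.
np.random.seed(42)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
temperatures = np.random.normal(15, 10, 365) + 5 * np.sin(np.arange(365) * 2 * np.pi / 365)
df = pd.DataFrame({"date": dates, "temperature": temperatures})

# Group by month number (1-12) and aggregate each month's temperatures.
monthly = df.groupby(df["date"].dt.month)["temperature"].agg(["mean", "min", "max"])
print(monthly.round(2))

# idxmax() returns an index label, and .loc uses it to fetch the whole row.
hottest = df.loc[df["temperature"].idxmax()]
print(hottest["date"].strftime("%Y-%m-%d"))
```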
Here's a detailed example showcasing key Matplotlib features:
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from datetime import datetime, timedelta

    # Simulate weather data
    np.random.seed(42)
    start_date = datetime(2023, 1, 1)
    dates = [start_date + timedelta(days=i) for i in range(365)]
    temperatures = np.random.normal(15, 10, 365) + 5 * np.sin(np.arange(365) * 2 * np.pi / 365)

    # Create DataFrame
    df = pd.DataFrame({'date': dates, 'temperature': temperatures})

    # Calculate moving average
    df['moving_avg'] = df['temperature'].rolling(window=7).mean()

    # Create the plot
    plt.figure(figsize=(12, 6))

    # Plot daily temperatures
    plt.plot(df['date'], df['temperature'], label='Daily Temperature', alpha=0.5)

    # Plot moving average
    plt.plot(df['date'], df['moving_avg'], label='7-day Moving Average', color='red')

    # Highlight seasons
    seasons = [
        ('Winter', datetime(2023, 1, 1), datetime(2023, 3, 20)),
        ('Spring', datetime(2023, 3, 21), datetime(2023, 6, 20)),
        ('Summer', datetime(2023, 6, 21), datetime(2023, 9, 22)),
        ('Fall', datetime(2023, 9, 23), datetime(2023, 12, 20)),
        ('Winter', datetime(2023, 12, 21), datetime(2023, 12, 31))
    ]
    for season, start, end in seasons:
        plt.axvspan(start, end, alpha=0.2,
                    label=season if season not in plt.gca().get_legend_handles_labels()[1] else "")

    # Customize the plot
    plt.title('Daily Temperatures and 7-day Moving Average (2023)')
    plt.xlabel('Date')
    plt.ylabel('Temperature (°C)')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)

    # Rotate and align the tick labels so they look better
    plt.gcf().autofmt_xdate()

    # Save the plot
    plt.tight_layout()
    plt.savefig('/path/to/your/output/temperature_analysis.png', dpi=300)

    # Display the plot
    plt.show()

    # Calculate and print some statistics
    hottest_day = df.loc[df['temperature'].idxmax()]
    coldest_day = df.loc[df['temperature'].idxmin()]
    print(f"Hottest day: {hottest_day['date'].strftime('%Y-%m-%d')} with {hottest_day['temperature']:.2f}°C")
    print(f"Coldest day: {coldest_day['date'].strftime('%Y-%m-%d')} with {coldest_day['temperature']:.2f}°C")
    print(f"Average temperature: {df['temperature'].mean():.2f}°C")
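Data Generation and Manipulation: NumPy generates a year of simulated daily temperatures, and pandas stores them in a DataFrame.
Moving Average Calculation: rolling(window=7).mean() smooths the daily values into a 7-day moving average.
Multiple Line Plots: two plt.plot calls draw the daily temperatures and the moving average on the same axes.
Customizing Plot Appearance: the title, axis labels, legend, and dashed grid are set explicitly.
Highlighting Seasons: plt.axvspan shades the date range of each season.
Date Handling: datetime values are plotted directly on the x-axis, and autofmt_xdate() rotates and aligns the tick labels.
Saving and Displaying Plots: plt.savefig writes a 300 dpi PNG, and plt.show() displays the figure.
Basic Statistical Analysis: idxmax(), idxmin(), and mean() report the hottest day, the coldest day, and the average temperature.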
This example illustrates how Matplotlib can be used in a data engineering workflow to visualize and analyze time series data. It demonstrates the library's flexibility in creating custom visualizations that combine multiple data series, highlight specific periods, and present data in a clear, informative manner. These skills are essential for data engineers who need to explore data patterns, validate data processing steps, and communicate insights effectively to stakeholders.
To run this example:
Ensure you have Matplotlib, NumPy, and pandas installed:
    pip install matplotlib numpy pandas
Run the script:
    python matplotlib_example.py
The script will generate a plot of simulated temperature data, save it as 'temperature_analysis.png' in the same directory as the script, display the plot, and print some basic statistics about the data.