Process and analyze time-series sensor data using NumPy's powerful array operations. Learn to perform statistical analysis, smooth data with rolling averages, detect anomalies, and visualize results - essential skills for IoT and monitoring applications.
This example demonstrates how to use NumPy for processing and analyzing sensor data in a data engineering context. It covers synthetic data generation, data cleaning, statistical analysis, smoothing with a rolling average, anomaly detection, and visualization.
```python
import os
from datetime import datetime, timedelta

import numpy as np
import matplotlib.pyplot as plt

# Resolve the directory containing this script for file output
script_dir = os.path.dirname(os.path.abspath(__file__))
```
This section imports necessary modules and sets up the script directory for file operations.
```python
np.random.seed(42)

# 30 days of readings, one every 10 minutes (24 * 6 = 144 per day)
num_days = 30
readings_per_day = 24 * 6
num_readings = num_days * readings_per_day

start_date = datetime(2023, 1, 1)
timestamps = np.array([start_date + timedelta(minutes=10 * i)
                       for i in range(num_readings)])

# Daily sinusoidal cycle around 20 °C plus Gaussian noise
base_temp = 20 + 5 * np.sin(np.linspace(0, 2 * np.pi * num_days, num_readings))
noise = np.random.normal(0, 0.5, num_readings)
temperatures = base_temp + noise

# Inject five random spikes of +5 to +10 °C as anomalies
anomalies = np.random.randint(0, num_readings, 5)
temperatures[anomalies] += np.random.uniform(5, 10, 5)
```
This code generates synthetic temperature data with timestamps, including some noise and anomalies.
```python
# Replace physically impossible readings (below absolute zero) with NaN
temperatures[temperatures < -273.15] = np.nan

# Use nan-aware reductions so any NaNs don't poison the statistics
mean_temp = np.nanmean(temperatures)
median_temp = np.nanmedian(temperatures)
std_temp = np.nanstd(temperatures)
min_temp = np.nanmin(temperatures)
max_temp = np.nanmax(temperatures)
```
Here, we clean the data by replacing impossible values with NaN and perform basic statistical analysis.
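One subtlety with NaN-based cleaning: NumPy's plain reductions propagate NaN, so once a single reading has been masked out, `np.mean` and friends return NaN for the whole array. The nan-aware variants (`np.nanmean`, `np.nanstd`, etc.) skip missing values instead. A small illustration on toy data (the `-500.0` glitch value is made up for the demo):

```python
import numpy as np

readings = np.array([20.1, 19.8, -500.0, 20.4])  # -500.0 is a sensor glitch
readings[readings < -273.15] = np.nan             # mask the impossible value

# Plain reductions propagate NaN; nan-aware variants ignore it
assert np.isnan(np.mean(readings))
assert np.isclose(np.nanmean(readings), (20.1 + 19.8 + 20.4) / 3)
```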
```python
# 1-hour rolling average (6 readings at 10-minute intervals)
window_size = 6
rolling_avg = np.convolve(temperatures, np.ones(window_size) / window_size,
                          mode='valid')

# Flag readings more than 3 standard deviations from the mean
# (this boolean mask overwrites the earlier `anomalies` index array)
anomaly_threshold = 3 * std_temp
anomalies = np.abs(temperatures - mean_temp) > anomaly_threshold
anomaly_count = np.sum(anomalies)
```
This section calculates a rolling average for data smoothing and detects anomalies based on standard deviation.
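An equivalent rolling mean can also be computed with `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+), which exposes overlapping windows as a 2-D view without copying data. A small sketch on toy data, checking that it agrees with the `np.convolve` approach used above:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

temps = np.array([20.0, 21.0, 22.0, 21.5, 20.5, 19.5, 20.0, 21.0])
window_size = 3

# np.convolve approach, as in the main example
conv_avg = np.convolve(temps, np.ones(window_size) / window_size, mode='valid')

# sliding_window_view yields a (n - window + 1, window) view; average each row
windows = sliding_window_view(temps, window_size)
view_avg = windows.mean(axis=1)

assert np.allclose(conv_avg, view_avg)
```

Both approaches drop the first `window_size - 1` points, which is why the plotting code below offsets the timestamps when drawing the smoothed series.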
```python
plt.figure(figsize=(12, 6))
plt.plot(timestamps, temperatures, label='Raw Data', alpha=0.5)
# 'valid' convolution drops window_size - 1 points, so offset the x-axis
plt.plot(timestamps[window_size - 1:], rolling_avg,
         label='1-Hour Rolling Average', linewidth=2)
plt.scatter(timestamps[anomalies], temperatures[anomalies],
            color='red', label='Anomalies', zorder=5)
# ... (additional plotting code)
figure_file_path = os.path.join(script_dir, 'temperature_analysis.png')
plt.savefig(figure_file_path)
```
This code visualizes the raw data, rolling average, and detected anomalies, saving the plot as an image file.
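Beyond plotting, the boolean anomaly mask can index both the timestamp and temperature arrays directly, which makes it easy to report the flagged readings. A minimal sketch on toy data (array values and the looser 2-sigma threshold are made up for the demo):

```python
import numpy as np
from datetime import datetime, timedelta

# Tiny stand-in for the arrays built above
timestamps = np.array([datetime(2023, 1, 1) + timedelta(minutes=10 * i)
                       for i in range(6)])
temperatures = np.array([20.0, 20.5, 31.0, 20.2, 19.8, 20.1])

mean_temp = np.nanmean(temperatures)
std_temp = np.nanstd(temperatures)
# Looser 2-sigma threshold so the single spike is flagged in this tiny sample
anomalies = np.abs(temperatures - mean_temp) > 2 * std_temp

# Pair each flagged reading with its timestamp for reporting
for ts, temp in zip(timestamps[anomalies], temperatures[anomalies]):
    print(f"{ts:%Y-%m-%d %H:%M}  {temp:.1f}")
```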
To run this example:

1. Ensure you have NumPy and Matplotlib installed:

   ```
   pip install numpy matplotlib
   ```

2. Save the Python code in a file, e.g., 'numpy_example.py'

3. Run the script:

   ```
   python numpy_example.py
   ```
The script will generate synthetic sensor data, perform analysis, and create a 'temperature_analysis.png' file with the visualization.
This example showcases how NumPy can be used in data engineering tasks for efficient numerical computation, data analysis, and preprocessing, particularly when dealing with large arrays of sensor data.