Retrieve tweets, user profiles, trends, and engagement metrics from the Twitter/X platform via its REST and streaming APIs. Useful for social media analytics pipelines, sentiment analysis, and building real-time data streams with Python using the Tweepy library.
The `tweepy` library is a widely used Python client for Twitter/X API v2. Engineers use streaming endpoints to ingest tweets into Kafka or Kinesis, then process them with Spark Streaming or Flink for real-time analytics.
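The ingestion step described above usually involves flattening each tweet into a key/value pair of bytes before handing it to a producer. The following is a minimal sketch of that idea, assuming tweets arrive as plain dictionaries with `id`, `text`, and an optional `created_at` field. The function `to_message` and the in-memory `sent` list, which stands in for a Kafka or Kinesis producer, are hypothetical names used for illustration and are not part of Tweepy or any broker client.

```python
import json
from typing import Any


def to_message(tweet: dict[str, Any]) -> tuple[bytes, bytes]:
    """Turn one tweet dictionary into a (key, value) pair of UTF-8 bytes."""
    # Keying by tweet ID lets a partitioned topic keep updates to the same tweet together.
    key = str(tweet["id"]).encode("utf-8")
    value = json.dumps(
        {
            "id": str(tweet["id"]),
            "text": tweet["text"],
            "created_at": tweet.get("created_at"),  # None when the field is absent
        },
        ensure_ascii=False,
    ).encode("utf-8")
    return key, value


# Stand-in for a real producer's send call (hypothetical, for illustration only).
sent: list[tuple[bytes, bytes]] = []

sample_tweets = [
    {"id": 1, "text": "Streaming tweets into a topic", "created_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "text": "Pandas or Polars?"},
]

for tweet in sample_tweets:
    sent.append(to_message(tweet))
```

In a real pipeline the `sent.append(...)` line would be replaced by the send method of whichever producer client is in use, and the message schema would follow whatever the downstream Spark or Flink job expects.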
Twitter data is a common source for fine-tuning sentiment classifiers and training social media language models. RAG pipelines can retrieve recent tweets about a topic to ground LLM responses in up-to-date public opinion. The API also powers AI-driven trend detection and topic clustering systems.
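To make the RAG use case more concrete, here is a rough sketch of the grounding step: retrieved tweet texts are cleaned, deduplicated, and placed in a prompt alongside the user's question. It assumes the tweets have already been fetched and are available as a list of strings. The function `build_grounded_prompt`, its `max_tweets` parameter, and the prompt wording are illustrative assumptions, not a prescribed format.

```python
def build_grounded_prompt(question: str, tweet_texts: list[str], max_tweets: int = 5) -> str:
    """Combine a question with a few tweet texts into a single prompt string."""
    # Collapse runs of whitespace and drop empty or duplicate texts, keeping order.
    seen: set[str] = set()
    cleaned: list[str] = []
    for text in tweet_texts:
        normalized = " ".join(text.split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)

    # Number each tweet so the model can cite its sources.
    context = "\n".join(
        f"[{i}] {t}" for i, t in enumerate(cleaned[:max_tweets], start=1)
    )
    return (
        "Answer using only the tweets below and cite them by number.\n\n"
        f"Tweets:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    "What do people think of the new release?",
    ["Loving the new release!", "Loving the  new release!", "", "Install was painful."],
)
print(prompt)
```

Capping the number of tweets keeps the prompt within a predictable size; a production system would more likely rank tweets by relevance or recency before truncating.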
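```python
# pip install tweepy
import tweepy

# Authenticate with an app-only bearer token (replace the placeholder).
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

# Search tweets from the recent-search window that match the query.
tweets = client.search_recent_tweets(
    query="python data engineering",
    max_results=10,
)

# tweets.data is None when nothing matches, so fall back to an empty list.
for tweet in tweets.data or []:
    print(tweet.text)
```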
More datasets used by Python data engineers.
Access music metadata, audio features (tempo, energy, danceability), playlist data, artist catalogues, and listening history from the Spotify platform. Used in data engineering for building music recommendation systems, audio feature datasets, and trend analysis pipelines with the spotipy Python library.
Access repositories, commits, pull requests, issues, users, and organisation data from GitHub. Ideal for building developer analytics pipelines, tracking open-source project activity, and ingesting code metadata into data warehouses using Python and the PyGithub library.
A lightweight REST API that returns random facts and trivia about cats. Useful for learning API integration, testing HTTP client libraries in Python, and building practice ETL pipelines before connecting to more complex data sources.