Let me just start by saying: I have always been fascinated by the power of data. Before embarking on this project, though, my experience with programming was limited to basic scripting; to put it into perspective, the furthest I’d gone was following a step-by-step YouTube tutorial, haha.
With such a beginner-level understanding, how did I create a robust Twitter Sentiment Analyzer that can:
Fetch real-time tweets based on user-defined queries
Clean and preprocess tweet text for analysis
Analyze sentiments using natural language processing
Visualize sentiment distribution with intuitive charts
Save and debug processed data for future reference?
I’ll walk you through each step below! That said, feel free to check out my tutorial for a full walkthrough of how this works!
Before you start, ensure you have Python installed on your machine along with the necessary libraries: tweepy, textblob, and matplotlib (the re module ships with Python’s standard library). You can install the rest using pip:
pip install tweepy textblob matplotlib
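Note: TextBlob occasionally needs extra NLTK data the first time you use it. If you ever hit a MissingCorpusError, this one-time download fixes it:
python -m textblob.download_corpora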
Step 1: Loading Configuration
Configurations like API keys and default settings are crucial for accessing Twitter’s API and customizing the behavior of our analyzer.
Comparable to setting up your workspace, loading configurations ensures everything is in place before diving into the code.
import json

def load_config():
    with open("config.json", "r", encoding="utf-8") as f:
        data = f.read()
    print("File content:", data)  # Debugging: ensure the JSON is read correctly
    return json.loads(data)
Explanation:
Purpose: Reads the config.json file containing sensitive information like API keys.
Debugging: Prints the file content to verify correct reading (remove this in production).
JSON Parsing: Converts the JSON string into a Python dictionary for easy access.
Sample config.json:
{
    "bearer_token": "YOUR_TWITTER_BEARER_TOKEN",
    "default_tweet_count": 50,
    "polarity_positive_threshold": 0.2,
    "polarity_negative_threshold": -0.2
}
Visual Aid: Imagine a neatly organized toolbox where each tool has its designated place; this is what config.json achieves for your project.
Step 2: Authenticating with Twitter API
Authentication is like getting your ticket to access Twitter’s data. Without it, you can’t fetch any tweets.
import tweepy

def authenticate(config):
    # For Twitter API v2, we primarily use the bearer token
    client = tweepy.Client(bearer_token=config["bearer_token"])
    return client
Explanation:
Tweepy Client: Utilizes the bearer token from the config to authenticate requests.
Security: Keeps API keys secure by storing them in a separate configuration file.
Best Practice Tip: Never hard-code your API keys in your scripts. Always use environment variables or configuration files excluded from version control.
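For example, here’s a minimal sketch of the environment-variable approach (the TWITTER_BEARER_TOKEN name is just a convention I picked; use whatever variable you export in your shell):

import os

def load_bearer_token():
    # Read the token from the environment instead of a file tracked by git
    token = os.environ.get("TWITTER_BEARER_TOKEN")
    if not token:
        raise RuntimeError("Set the TWITTER_BEARER_TOKEN environment variable first.")
    return token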
Step 3: Fetching Tweets
Fetching tweets based on user-defined queries is the core of our analyzer.
def fetch_tweets(client, query, count):
    # Twitter API v2: search_recent_tweets accepts between 10 and 100 results per request
    max_results = max(10, min(count, 100))
    # Append `lang:en` to the query to filter English tweets
    query = f"{query} lang:en"
    response = client.search_recent_tweets(query=query, max_results=max_results, tweet_fields=['text'])
    return response.data if response.data else []
Explanation:
Query Parameters: Filters tweets in English to maintain consistency in analysis.
Rate Limits: Caps the number of tweets per request to 100, adhering to Twitter’s API limits.
Error Handling: Returns an empty list if no tweets are found to prevent crashes.
Visual Comparison:
Before Fetching:
[Empty]
After Fetching:
["I love sunny days!", "Feeling sad about the news...", ...]
Step 4: Preprocessing Tweets
Raw tweet data can be messy. Preprocessing cleans the text, making it suitable for sentiment analysis.
import re

def preprocess_tweet(text):
    text = re.sub(r'^RT\s+', '', text)  # Strip the leading "RT" retweet marker
    text = re.sub(r'@\w+', '', text)  # Remove @mentions
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'\s+', ' ', text).strip()  # Collapse extra whitespace
    return text
Explanation:
Regex Patterns: Efficiently removes unwanted parts of the tweet.
Clean Text: Ensures the sentiment analysis focuses solely on meaningful content.
Analogy: Think of preprocessing as cleaning ingredients before cooking — removing impurities to enhance the final dish’s flavor.
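To make that concrete, here’s roughly what the function does to a typical retweet (the URL is made up; the expected output is in the comment):

raw = "RT @user Loving the new update! https://t.co/abc123"
print(preprocess_tweet(raw))  # -> Loving the new update!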
Step 5: Analyzing Sentiment
Using natural language processing to determine the sentiment of each tweet.
from textblob import TextBlob

def analyze_sentiment(tweets, pos_threshold, neg_threshold, debug=False):
    positive_count = 0
    neutral_count = 0
    negative_count = 0
    processed_tweets = []
    for tweet in tweets:
        cleaned_text = preprocess_tweet(tweet.text)
        blob = TextBlob(cleaned_text)
        polarity = blob.sentiment.polarity
        if polarity > pos_threshold:
            positive_count += 1
        elif polarity < neg_threshold:
            negative_count += 1
        else:
            neutral_count += 1
        if debug:
            processed_tweets.append((cleaned_text, polarity))
    return positive_count, neutral_count, negative_count, processed_tweets
Explanation:
TextBlob: Analyzes the polarity of the tweet, ranging from -1 (negative) to +1 (positive).
Thresholds: Customizable to define what constitutes positive, neutral, or negative sentiments.
Debugging: Optionally stores processed tweets and their polarity for further inspection.
Visual Aid: Imagine a scale where tweets are placed based on their sentiment score — some leaning towards positive, some neutral, and others negative.
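You can see this scale directly in a Python session. The exact numbers depend on TextBlob’s lexicon, but the signs line up with intuition:

from textblob import TextBlob

print(TextBlob("I love sunny days!").sentiment.polarity)  # positive (> 0)
print(TextBlob("I went to the store.").sentiment.polarity)  # 0.0, no sentiment words
print(TextBlob("Feeling sad about the news...").sentiment.polarity)  # negative (< 0)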
Step 6: Plotting Results
Visualizing the sentiment distribution helps in quickly understanding the overall mood of the tweets.
import matplotlib.pyplot as plt

def plot_results(positive, neutral, negative):
    labels = ['Positive', 'Neutral', 'Negative']
    counts = [positive, neutral, negative]
    colors = ['green', 'blue', 'red']
    plt.figure(figsize=(8, 6))
    plt.bar(labels, counts, color=colors)
    plt.title('Sentiment Distribution')
    plt.ylabel('Count')
    plt.xlabel('Sentiment')
    plt.ylim(0, max(counts) + 10)
    for i, count in enumerate(counts):
        plt.text(i, count + 1, str(count), ha='center', va='bottom')
    plt.show()
Explanation:
Bar Chart: Clearly displays the number of positive, neutral, and negative tweets.
Customization: Colors and labels enhance readability and visual appeal.
Annotations: Adds count labels above each bar for precise information.
Best Practice Tip: Always label your axes and provide a title to make charts self-explanatory.
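If you’d rather keep the chart for a report than display it interactively, matplotlib can also write it to disk; one extra line before plt.show() (the filename here is my choice) does it:

plt.savefig("sentiment_distribution.png", dpi=150, bbox_inches="tight")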
Step 7: Running the Main Function
Integrating all components to execute the sentiment analysis workflow.
def main():
    config = load_config()
    client = authenticate(config)
    query = input("Enter a search term: ")
    count_input = input(f"Number of tweets to fetch (default: {config['default_tweet_count']}): ")
    try:
        count = int(count_input) if count_input.strip() else config["default_tweet_count"]
    except ValueError:
        print("Invalid input. Using default tweet count.")
        count = config["default_tweet_count"]
    tweets = fetch_tweets(client, query, count)
    if not tweets:
        print("No tweets found. Please try a different query.")
        return
    positive, neutral, negative, _ = analyze_sentiment(
        tweets,
        pos_threshold=config.get("polarity_positive_threshold", 0.2),
        neg_threshold=config.get("polarity_negative_threshold", -0.2),
        debug=False
    )
    print("\nSentiment Analysis Results:")
    print(f"Positive: {positive}")
    print(f"Neutral: {neutral}")
    print(f"Negative: {negative}")
    show_chart = input("\nShow chart? (y/n): ")
    if show_chart.lower() == 'y':
        plot_results(positive, neutral, negative)
    debug_print = input("Print cleaned tweets for debugging? (y/n): ")
    if debug_print.lower() == 'y':
        _, _, _, debug_tweets = analyze_sentiment(
            tweets,
            pos_threshold=config.get("polarity_positive_threshold", 0.2),
            neg_threshold=config.get("polarity_negative_threshold", -0.2),
            debug=True
        )
        print("\nSample Cleaned Tweets:")
        for t, p in debug_tweets[:10]:
            print(f"\"{t}\" - Polarity: {p}")

if __name__ == "__main__":
    main()
Explanation:
User Interaction: Prompts users to input search terms and desired tweet counts.
Error Handling: Validates user input and handles scenarios where no tweets are found.
Conditional Execution: Offers options to display charts and debug information based on user preference.
Sample Output: Displays a subset of cleaned tweets with their polarity scores for deeper insights.
Best Practice Tip: Always validate user inputs to make your program robust against unexpected or invalid data.
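The intro also promised saving processed data for future reference. A minimal sketch of that step, assuming you just want the counts and cleaned tweets in a JSON file (the save_results name and results.json path are illustrative), could look like this:

import json

def save_results(positive, neutral, negative, debug_tweets, path="results.json"):
    # Persist the counts plus (cleaned_text, polarity) pairs for later inspection
    results = {
        "positive": positive,
        "neutral": neutral,
        "negative": negative,
        "tweets": [{"text": t, "polarity": p} for t, p in debug_tweets],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)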
Conclusion
Building a Twitter Sentiment Analyzer from scratch was an enlightening experience that bridged my understanding of data fetching, natural language processing, and data visualization. By integrating Python’s powerful libraries — Tweepy, TextBlob, and Matplotlib — I was able to create a tool that not only analyzes sentiments but also presents the data in an intuitive manner.
What Did I Learn?
API Integration: Gained hands-on experience with Twitter’s API using Tweepy, understanding how to authenticate and fetch real-time data.
Data Cleaning: Learned the importance of preprocessing raw data to ensure accurate sentiment analysis.
Natural Language Processing: Utilized TextBlob to analyze and interpret the sentiment of textual data effectively.
Data Visualization: Enhanced my ability to present data through clear and informative charts using Matplotlib.
Error Handling: Implemented robust error handling to make the application more user-friendly and resilient.
What’s the Bigger Picture?
In an era where data drives decisions, sentiment analysis tools like this one can be invaluable for businesses, researchers, and individuals seeking to gauge public opinion or emotional trends. Imagine scaling this project to:
Real-Time Monitoring: Integrate with live data streams to monitor sentiment in real time.
Advanced Analytics: Incorporate machine learning models for more nuanced sentiment detection.
User Interface: Develop a web-based interface for broader accessibility and user interaction.
Multi-Language Support: Expand the analyzer to handle tweets in multiple languages, increasing its global applicability.
Future Enhancements:
Subscription Models: Offer premium features such as historical data analysis, detailed reports, and personalized dashboards.
Integration with Other Platforms: Extend the tool to analyze sentiments across various social media platforms like Facebook, Instagram, and Reddit.
AI and Machine Learning: Leverage more sophisticated models to improve sentiment accuracy and context understanding.
Thank You,
This is an old project of mine from about a year ago! It was originally posted on Medium, but a good friend told me it helps to cross-post. If you took the time to read all of this, I want to say thank you for your time. I’m sure you’re a very busy person, so the time you’ve given me means a lot more than you might realize.
I want to use this blog experiment as a way for people to learn more about me and, hopefully, connect with more people. If you have any questions about what I’ve done, or would just like to chat, hit me up at pointone@barbaros.ca. If I don’t get back to you within 24 hours, coffee’s on me.
Additional Resources
Source Code (Github Repo)
Tweepy for interacting with the Twitter API
TextBlob for sentiment analysis
Matplotlib for data visualization
Regex (re) for text preprocessing