
We like to think we’re rational investors, driven by data and diligent analysis. But let’s be real: markets don’t just move on numbers — they swing wildly on emotion, hype, and headlines. Think about it. One tweet, one scandal, one new product launch, and a company’s stock can skyrocket or tank, sometimes for reasons that have nothing to do with the balance sheet. This is the hidden power of sentiment in the stock market: it’s not about what is happening, but how people feel about it.
In a world where perception often outweighs reality, the ability to read market sentiment becomes a critical skill. That’s where this project steps in. By combining sentiment analysis with real-time data scraping, we’re capturing the heartbeat of the market for companies like Amazon, Google, and Meta, looking beyond the financial reports to understand the emotion driving the prices.
Whether you’re an investor looking for an edge or simply fascinated by the psychology of finance, this project lets you see the market as more than numbers on a screen. Get ready to dive into the minds of the market movers and see what makes prices tick.
TL;DR:
Sentiment analysis tools like VADER provide a powerful yet imperfect solution for extracting emotional insights from text data. While capable of classifying sentiments as positive or negative, they can fall short in scenarios involving sarcasm or conflicting language.
For visual learners, this video covers all of the information and steps, with an in-depth and engaging explanation of how the code works and how it is applied.
A Classical Approach to Stock Analysis
In the world of traditional investing, tracking a stock’s sentiment means staying updated with news, reports, and analyst opinions — all of which take considerable time to read and interpret. Often, investors rely on financial advisors, news summaries, or quantitative analysis platforms to stay informed, but each of these is either time-consuming or expensive. What if we could combine the power of natural language processing (NLP) and sentiment analysis to speed up this process?
Manual Sentiment Analysis: A Tedious Process
A manual approach to reading stock sentiment looks something like this:
Read the Headline: Manually go through each piece of financial news.
Interpret the Tone: Decide if the news is positive, neutral, or negative.
Record the Sentiment: Track these interpretations in a database.
This method quickly becomes impractical. The speed and sheer volume of financial news demand a faster, more automated solution.
Sentiment Analysis with VADER: A Modern Solution
Enter VADER (Valence Aware Dictionary and sEntiment Reasoner). VADER is an NLP tool specifically designed for analyzing the sentiment of short texts, making it ideal for headlines. Here’s how VADER transforms headlines into scores:
Positive, Neutral, and Negative Scoring: Each headline is broken down into scores across positive, negative, and neutral categories.
Compound Score: The overall sentiment is expressed as a compound score ranging from -1 to +1, with positive scores indicating optimism and negative scores signaling concern.
Let’s consider a basic example:
A headline like “Amazon breaks sales records this quarter” might receive a high positive score.
In contrast, “Market challenges affect Meta’s profits” could score negatively.
VADER itself only scores each headline. Sorting or averaging those scores is what lets us quickly see which companies are trending positively or negatively in the news.
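To make the compound score concrete, here is a minimal sketch of how it could be bucketed into labels. The ±0.05 cut-offs follow the convention suggested in VADER’s documentation; the function name `label_from_compound` and the sample scores are my own illustrative assumptions, not real VADER output.

```python
# Cut-offs follow the convention suggested in the VADER documentation;
# treat them as tunable assumptions rather than fixed rules.
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_from_compound(compound: float) -> str:
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


# Illustrative compound values only (hypothetical, not produced by VADER).
for score in (0.62, -0.41, 0.0):
    print(score, label_from_compound(score))
```

In the real pipeline, the input would be the `compound` value returned by `polarity_scores` for each headline.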
The Naive Bayes Algorithm: The Engine Behind Sentiment Analysis
Before we get to VADER’s rule-based scoring, it helps to look at a classical statistical approach to sentiment analysis: the Naive Bayes algorithm. Grounded in probability, it predicts whether a headline is likely to be positive or negative by evaluating how often its words appear in positive or negative headlines in labeled historical data. Keep in mind that VADER, which the code in this project uses, is rule-based and does not train a Naive Bayes model; the algorithm is background for how sentiment classification works in general.
In this model, we let “X” represent a headline or news snippet, while “Y” is the predicted sentiment class: positive for optimistic sentiment, negative for pessimistic sentiment. Here’s a look at the process:
P(Y=Positive | X) gives us the probability that the market sentiment is positive given a particular headline.
P(Y=Negative | X) does the same for negative sentiment, allowing us to gauge the influence of news on a stock’s public perception.
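Here is a minimal sketch of that idea, written from scratch on a tiny hand-labelled set of made-up headlines. Everything in it (the training data, `train`, `posterior`) is an assumption for illustration; it is not the VADER pipeline used later, just a way to see P(Y | X) computed with Bayes’ rule and Laplace smoothing.

```python
import math
from collections import Counter

# Tiny hand-labelled training set (hypothetical headlines, illustration only).
TRAINING = [
    ("amazon breaks sales records", "positive"),
    ("google posts strong growth", "positive"),
    ("meta profits fall on market challenges", "negative"),
    ("amazon faces decline in sales", "negative"),
]


def train(examples):
    """Count words per class and headlines per class."""
    word_counts = {"positive": Counter(), "negative": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def posterior(text, word_counts, doc_counts, vocab):
    """Return P(Y | X) for each class, using Laplace (add-one) smoothing."""
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, counts in word_counts.items():
        log_p = math.log(doc_counts[label] / total_docs)  # log P(Y)
        total_words = sum(counts.values())
        for word in text.split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # log P(word | Y), smoothed so a zero count does not kill the product
            log_p += math.log((counts[word] + 1) / (total_words + len(vocab)))
        log_scores[label] = log_p
    # Normalise so the class probabilities sum to 1
    max_log = max(log_scores.values())
    unnormalised = {k: math.exp(v - max_log) for k, v in log_scores.items()}
    z = sum(unnormalised.values())
    return {k: v / z for k, v in unnormalised.items()}


model = train(TRAINING)
probs = posterior("amazon sales growth", *model)
print(probs)
```

With so little data the probabilities are only illustrative, but the mechanics are the same at any scale: a prior for each class, multiplied by the likelihood of each word under that class.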
VADER: Quick, Rule-Based Sentiment Scoring
We use VADER (Valence Aware Dictionary and sEntiment Reasoner), a rule-based model built for general sentiment analysis. VADER scores text for positive, neutral, negative, and compound (overall) sentiment. The beauty of VADER is its efficiency: it doesn’t analyze every word but focuses on those that carry emotional weight, skipping filler words like “the” or “and.”
This tool enables us to calculate sentiment on a massive scale — quickly scoring each headline without heavy computational demands.
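A rough sketch of the lexicon idea, under loud assumptions: the mini-lexicon and its valence numbers below are invented for illustration, and real VADER layers extra rules (negation, intensifiers, punctuation, capitalization) on top. The one piece taken from VADER itself is the normalization that squashes the summed valence into the -1 to +1 range with a constant of 15.

```python
import math

# Hypothetical mini-lexicon; the real VADER lexicon has thousands of rated entries.
TOY_LEXICON = {
    "records": 1.5, "strong": 2.0, "growth": 1.8,
    "challenges": -1.2, "decline": -1.6, "loss": -2.0,
}

ALPHA = 15  # normalisation constant used for VADER's compound score


def toy_compound(headline: str) -> float:
    """Sum valences of known words, then squash the total into (-1, 1)."""
    # Words missing from the lexicon ("the", "and", ticker names) add 0.
    total = sum(TOY_LEXICON.get(word, 0.0) for word in headline.lower().split())
    return total / math.sqrt(total * total + ALPHA)


print(toy_compound("Amazon posts strong growth"))
print(toy_compound("Meta reports loss"))
print(toy_compound("the and of"))
```

Because each headline needs only dictionary lookups and one square root, scoring thousands of headlines is cheap.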
Web Scraping and Real-Time Analysis
Of course, analyzing sentiment would be meaningless without real-time data. Using Python’s web scraping tools, we fetch live headlines and stock news directly from Finviz, a leading financial website. Our scraper identifies and extracts headlines tied to specific tickers (AMZN, GOOG, META) and pulls them into our analysis pipeline. By examining the latest news, we can dynamically update the sentiment score based on recent events.
Our project then structures the scraped data, linking each headline with its timestamp and sentiment score to create a live snapshot of how the market feels about each company. The final result? An insightful, constantly updating bar chart displaying sentiment over time for each stock.
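One way to picture that linking step is a single record per headline with a real datetime attached, so records can be sorted or grouped later. This is a sketch under assumptions: the `build_record` helper is hypothetical, the timestamp format (for example "Nov-01-24 09:30AM") is my guess at how the date and time strings look, and the scores are made up.

```python
from datetime import datetime

# Assumed format of the date and time strings; adjust if the site differs.
TIMESTAMP_FORMAT = "%b-%d-%y %I:%M%p"


def build_record(ticker, date, time, title, compound):
    """Bundle one headline with a parsed timestamp and its sentiment score."""
    published = datetime.strptime(f"{date} {time}", TIMESTAMP_FORMAT)
    return {"ticker": ticker, "published": published,
            "title": title, "compound": compound}


# Illustrative inputs only; the compound values are not real VADER output.
records = [
    build_record("AMZN", "Nov-02-24", "08:15AM", "Second headline", -0.2),
    build_record("AMZN", "Nov-01-24", "09:30AM", "First headline", 0.5),
]
records.sort(key=lambda record: record["published"])  # oldest first
print([record["title"] for record in records])
```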
Coding the Project: Step-by-Step Implementation
This section breaks down how I coded a simple sentiment analysis tool that scrapes headlines from Finviz, analyzes them with VADER, and visualizes the results.
Step 1: Data Collection — Gathering Headlines from Finviz
The first part of our process involves collecting data. For this project, we pull stock headlines directly from Finviz for tickers like Amazon (AMZN), Google (GOOG), and Meta (META), which serve as the foundation for our sentiment analysis.
Here’s the code for scraping headlines:
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

tickers = ['AMZN', 'GOOG', 'META']
finviz_url = 'https://finviz.com/quote.ashx?t='
news_tables = {}

for ticker in tickers:
    url = finviz_url + ticker
    # Send an explicit user-agent header with the request
    req = Request(url=url, headers={'user-agent': 'my-app'})
    response = urlopen(req)
    html = BeautifulSoup(response, 'html.parser')
    # The headlines live in the element with id="news-table"
    news_table = html.find(id='news-table')
    news_tables[ticker] = news_table
We start with a request to Finviz for each ticker symbol. Once we gather the HTML data, we use BeautifulSoup to isolate the table containing each company’s news. This table is stored for further processing, ready to be analyzed for sentiment.
Step 2: Applying Sentiment Analysis with VADER
Once we’ve gathered the news data, we dive into sentiment analysis using VADER. For each headline, VADER returns a compound score between -1 and +1, which we can read as positive, negative, or neutral.
Here’s the code for processing sentiment:
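import datetime
import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon', quiet=True)  # one-time download of VADER's lexicon
parsed_data = []
vader = SentimentIntensityAnalyzer()

for ticker, news_table in news_tables.items():
    date = 'N/A'
    for row in news_table.find_all('tr'):
        if row.a is None or row.td is None:
            continue  # skip rows that carry no headline link
        title = row.a.text.strip()
        date_data = row.td.text.strip().split()
        if len(date_data) > 1:
            date, time = date_data[0], date_data[1]
            if date == 'Today':  # assumption: Finviz may label the current day this way
                date = datetime.date.today().strftime('%b-%d-%y')
        else:
            # time-only cell: keep the date carried over from the row above
            time = date_data[0] if date_data else 'N/A'
        compound = vader.polarity_scores(title)['compound']
        parsed_data.append([ticker, date, time, title, compound])

df = pd.DataFrame(parsed_data, columns=['ticker', 'date', 'time', 'title', 'compound'])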
This step applies VADER’s compound score to each headline, converting sentiment into a numeric value that is easy to interpret: positive, negative, or neutral. We now have structured data that reflects the emotional tone behind each headline, making it possible to track shifts in market sentiment over time.
Step 3: Visualizing the Results
Finally, we visualize our sentiment scores to reveal trends and patterns. This step involves grouping data by date, calculating average sentiment for each stock, and plotting it as a bar graph to showcase sentiment shifts over time.
Here’s how we generate this visual insight:
import matplotlib.pyplot as plt

# Parse the date strings so the bars sort chronologically
df['date'] = pd.to_datetime(df['date']).dt.date
# Average only the numeric 'compound' column: rows are dates, columns are tickers
mean_df = df.groupby(['date', 'ticker'])['compound'].mean().unstack()
mean_df.plot(kind='bar')
plt.title("Daily Sentiment for Stocks")
plt.xlabel("Date")
plt.ylabel("Mean Compound Sentiment Score")
plt.show()
The bar plot effectively communicates sentiment shifts by date and ticker. A positive bar height indicates optimism, while a negative bar height signals caution or pessimism, making it easy to identify which stocks are receiving favorable or unfavorable attention.
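If the grouping step feels opaque, here is the same aggregation written with only the standard library. It is a stand-in for the pandas call, not a replacement for it; the `daily_mean` helper and the rows below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Illustrative rows: (ticker, date, compound). The values are invented.
rows = [
    ("AMZN", "2024-11-01", 0.4),
    ("AMZN", "2024-11-01", 0.2),
    ("AMZN", "2024-11-02", -0.1),
    ("META", "2024-11-01", -0.3),
]


def daily_mean(rows):
    """Average the compound score for each (date, ticker) pair."""
    buckets = defaultdict(list)
    for ticker, date, compound in rows:
        buckets[(date, ticker)].append(compound)
    return {key: mean(values) for key, values in buckets.items()}


for (date, ticker), score in sorted(daily_mean(rows).items()):
    print(date, ticker, round(score, 3))
```

Each (date, ticker) pair becomes one bar in the chart, with its height set by the averaged score.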
AI isn’t perfect
AI is powerful, but it’s not perfect. There are times when sentiment analysis models misinterpret the tone of a headline. For example, a headline may earn a sentiment score that doesn’t align with the true intent behind the text, leading to results that may not accurately reflect the market’s mood. This becomes especially evident in cases of sarcasm or complex phrasing, where the literal words don’t match the implied sentiment.
Let’s look at some cases where sentiment scores and actual sentiment diverge:
# Identifying positive sentiment in news that should be negative
positive_sentiment_negative_review = df.query('compound > 0 and title.str.contains("loss|decline|problem")', engine='python')
print(positive_sentiment_negative_review[['title', 'compound']].head(3))
# Identifying negative sentiment in news that should be positive
negative_sentiment_positive_review = df.query('compound < 0 and title.str.contains("growth|gain|success")', engine='python')
print(negative_sentiment_positive_review[['title', 'compound']].head(3))
By querying for keywords that may contradict the sentiment score, we can surface likely sentiment discrepancies. Here are two examples of where this might occur:
“Company reports strong growth, but a decline in specific sectors”: VADER might score this as positive due to keywords like strong and growth, despite the mention of decline, which tempers the news sentiment.
“Revenue gains lead to success, although facing minor issues”: The model may interpret success and gains positively, overlooking the sentiment-shifting impact of minor issues.
These cases illustrate how AI models, including VADER, can stumble when they cannot fully gauge context, especially with mixed signals inside a single headline or with sarcasm.
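One cheap safeguard is to flag headlines that contain contrast words and route them to a human instead of trusting the score. The sketch below is a heuristic of my own, not part of VADER: the word list and the `needs_review` helper are assumptions you would want to tune on your own data.

```python
import re

# Hypothetical list of contrast markers; extend it for your own data.
CONTRAST_PATTERN = re.compile(
    r"\b(but|although|despite|however|yet)\b", re.IGNORECASE
)


def needs_review(title: str) -> bool:
    """Flag headlines whose wording mixes signals, so a human can check them."""
    return bool(CONTRAST_PATTERN.search(title))


headlines = [
    "Company reports strong growth, but a decline in specific sectors",
    "Revenue gains lead to success, although facing minor issues",
    "Amazon breaks sales records this quarter",
]
for headline in headlines:
    print(needs_review(headline), headline)
```

This will not catch sarcasm, but it does catch both of the mixed headlines above.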
Thank You,
This is an old project of mine from about a year ago! It was originally posted on Medium, but a good friend told me it helps to cross-post. If you took the time to read all of this, I want to say thank you. I’m sure you’re a very busy person, so your time means more to me than you know.
I want to use this blog experiment as a way for people to learn more about me and, hopefully, to connect with more people. If you have any questions about what I’ve done, or would just like to chat, hit me up at pointone@barbaros.ca. If I don’t get back to you within 24 hours, coffee’s on me.