Utilizing News in the Stock Market

Multi-Model Sentiment Analysis with Technical Integration for Short-Term Stock Price Prediction

Akash Juwadi, Manat Rao, and Nicole Reardon

Mentor: Sheffield Nolan

Our project addresses a fundamental challenge in financial markets: accurately predicting short-term stock price movements. Traditional approaches such as technical analysis rely only on historical price patterns, so they often fail to capture sudden market shifts triggered by breaking news and changing market sentiment. Relying solely on market sentiment, however, carries high risk because sentiment can be subjective and biased.

To bridge this gap, we developed an approach that combines technical analysis with a news headline sentiment scorer; both serve as inputs to a machine learning model that predicts future stock prices. By applying natural language processing and machine learning techniques, our model captures the impact of real-time news on stock prices, which can support more precise market timing decisions.

Data Sources

Our study focused on a carefully selected set of stocks across four key market sectors: Technology, Healthcare, Energy, and Consumer Defensive. These sectors were chosen to minimize correlation and overlap, ensuring a diverse representation of market dynamics. Within each sector, we chose companies with above-median market capitalization. For ample diversity, we selected 15 stocks per sector that met this threshold, except in Energy, where only 11 companies were above the median. This criterion concentrated the study on companies that attract significant media attention and maintain enough trading volume for reliable price discovery. It also helped address the data sparsity issues that can arise with smaller, less-covered companies, where news mentions are infrequent and price movements may be influenced more by liquidity constraints than by information flows.

List of 56 Selected Stocks by Sector

| Healthcare | Energy | Technology | Consumer Defensive |
|---|---|---|---|
| ISRG - Intuitive Surgical | WMB - Williams Companies | PLTR - Palantir Technologies | COST - Costco |
| IDXX - IDEXX Laboratories | KMI - Kinder Morgan | CRWD - CrowdStrike | WMT - Walmart |
| BSX - Boston Scientific | OKE - ONEOK | PANW - Palo Alto Networks | MNST - Monster Beverage |
| LLY - Eli Lilly | BKR - Baker Hughes | NOW - ServiceNow | HSY - Hershey’s |
| EW - Edwards Lifesciences | CVX - Chevron | ANET - Arista Networks | PG - Procter & Gamble |
| ZTS - Zoetis | XOM - Exxon Mobil | CDNS - Cadence Design Systems | CL - Colgate-Palmolive |
| SYK - Stryker | MPC - Marathon Petroleum | FTNT - Fortinet | KO - Coca-Cola |
| DHR - Danaher | EOG - EOG Resources | ADSK - Autodesk | PM - Philip Morris |
| ABT - Abbott Laboratories | SLB - Schlumberger | MSI - Motorola Solutions | MDLZ - Mondelez International |
| A - Agilent Technologies | COP - ConocoPhillips | AVGO - Broadcom | KVUE - Kenvue |
| CVS - CVS Health | PSX - Phillips 66 | CSCO - Cisco Systems | TGT - Target |
| BMY - Bristol-Myers Squibb | - | NXPI - NXP Semiconductors | KR - Kroger |
| CI - Cigna | - | QCOM - Qualcomm | GIS - General Mills |
| MRK - Merck | - | DELL - Dell Technologies | MO - Altria |
| PFE - Pfizer | - | MU - Micron Technology | KHC - Kraft Heinz |

Our project required collecting both real-time news headlines and financial market data. We established a systematic data acquisition strategy that scraped news article titles and used Yahoo Finance’s API for the data behind our technical analysis.

We implemented an automated web scraping framework to collect news headlines from five major news outlets: The New York Times, CNN, Stat News, Associated Press, and ABC News. The Python-based framework is built on the BeautifulSoup and Requests libraries. It ran at 30-minute intervals during standard market trading hours (9:30 AM to 4:30 PM Eastern Standard Time) to simulate the real-time information that would be available to traders. The time of each scrape was also stored for use in training the model. The data collection period extended from January 22, 2025, through February 28, 2025, encompassing 26 trading days. In total, we gathered over 14,600 unique headlines: 7,093 from the Associated Press, 4,280 from CNN, 1,923 from The New York Times, 923 from ABC News, and 413 from Stat News.

The technical analysis data for our study is built on a variety of indicators that capture different market trends and price behaviors. Using the Yahoo Finance API through the yfinance Python library, we systematically collected daily stock data for each selected company from January 1, 2024, to February 26, 2025. The data includes standard stock metrics such as daily opening and closing prices, as well as intraday price movements and trading volumes.

Key indicators derived from this data include:

  • Exponential Moving Averages (EMA): Short-term and medium-term trends were captured using two EMAs. These moving averages give more weight to recent price movements, making them more responsive to market changes.
  • Moving Average Convergence Divergence (MACD): This indicator measures the difference between two EMAs of different time periods, helping to identify buy or sell signals. The signal line, a smoothed version of the MACD, is also calculated to provide clearer insights into potential trend reversals.
  • Bollinger Bands: This indicator measures stock price volatility and identifies overbought or oversold conditions. The upper and lower bands, which are based on the stock’s moving average and standard deviation, are used to assess price extremes. The width of the bands reflects market volatility, while the relative position of the stock’s price within the bands can signal potential breakout points.

All these indicators were applied consistently across the stocks, allowing for the analysis of market trends, volatility, and momentum. This technical data forms the foundation for the model, helping to predict future stock price movements based on historical price patterns.

Headline Processing & Technical Indicators

Headline Processing

Sentiment analysis of headlines was performed through a three-step process to efficiently filter and assess the relevance of news content.

Headline Processor Pipeline:

The first step of processing was removing redundant news headlines. Because we scraped news titles from multiple sources, several headlines often reported on the same event with slight variations in wording. Removing duplicates and near-duplicates reduced API costs and gave each story equal weight when passed into the large language model. We used an Agglomerative Clustering algorithm for this. After normalization, each headline was encoded into a vector embedding using all-MiniLM-L6-v2, a pre-trained sentence-embedding model built on Microsoft’s MiniLM. The clustering model used cosine distance to determine which headlines were similar to each other. Headlines whose similarity passed a threshold of 0.7 formed a cluster, and only one headline from each cluster was kept and passed to the large language model. The 0.7 threshold was selected so that only highly similar headlines were grouped.
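The deduplication step can be sketched in a few lines. This is a minimal pure-Python illustration of average-linkage agglomerative clustering over cosine distance, not the project's actual code. It makes several assumptions: the 0.7 threshold is read as a cosine similarity (a distance cutoff of 0.3), the linkage is average, the first headline in each cluster is the one kept, and the toy 2-D vectors stand in for real all-MiniLM-L6-v2 embeddings. The function names are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def cluster_headlines(embeddings, distance_threshold=0.3):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        # Average pairwise cosine distance between two clusters.
        total = sum(cosine_distance(embeddings[i], embeddings[j])
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > distance_threshold:
            break  # No remaining pair is similar enough to merge.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def deduplicate(headlines, embeddings, distance_threshold=0.3):
    """Keep the earliest headline of each cluster (an assumption)."""
    clusters = cluster_headlines(embeddings, distance_threshold)
    keep = sorted(min(cluster) for cluster in clusters)
    return [headlines[i] for i in keep]

# Toy 2-D vectors standing in for sentence embeddings (hypothetical values).
headlines = [
    "Oil prices jump on supply fears",
    "Crude rises amid supply worries",
    "FDA approves new heart device",
]
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
unique = deduplicate(headlines, embeddings)
```

Here the two oil headlines merge into one cluster and the FDA headline stays separate, so two headlines would move on to the relevance step.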

The next step involved determining the relevance of each selected headline to a specific sector. Relevance is a relatively straightforward task and can be handled by smaller large language models such as Mistral 7B, Llama 2 7B, or GPT-4o mini. For this task, we chose GPT-4o mini. Each headline is passed along with a sector to the model, which classifies whether or not the headline is relevant for predicting price movement for stocks in that sector. Only headlines deemed relevant are forwarded to the final layer of the analysis.

In the final step, we assessed how the relevant headlines influenced stock prices. All relevant headlines were passed together with their corresponding stocks to a more powerful LLM, such as GPT-4o or Gemini 1.5 Ultra. The model returns a score for each stock, ranging from -5 to 5, where -5 indicates strong confidence that the stock will decrease in price the following day based on the headlines, and 5 indicates strong confidence that it will increase. These sentiment scores are then ready to be used in our final models.
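Before the scores reach the model, it can help to validate them. The sketch below shows one way to do that. It assumes the LLM is prompted to reply with a JSON object mapping tickers to scores, that a ticker missing from the reply is treated as neutral (0), and that out-of-range values are clamped to the -5 to 5 scale. The response format and the `parse_scores` helper are hypothetical; the source does not describe them.

```python
import json

def parse_scores(raw_response, tickers, low=-5.0, high=5.0):
    """Parse a JSON object of ticker -> score and clamp each to [low, high]."""
    data = json.loads(raw_response)
    scores = {}
    for ticker in tickers:
        # Assumption: a ticker the model did not mention is neutral.
        value = float(data.get(ticker, 0))
        scores[ticker] = max(low, min(high, value))
    return scores

# Hypothetical model reply; -7 is outside the scale and gets clamped to -5.
raw = '{"ABT": 3, "XOM": -7}'
scores = parse_scores(raw, ["ABT", "XOM", "COST"])
```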

Technical Indicators

To predict stock price movements, we developed a technical model that combines traditional technical analysis with machine learning methods. Key technical indicators, including exponential moving averages (EMA), moving average convergence divergence (MACD), and Bollinger Bands, were calculated from the raw price data.

We calculated a 10-period EMA as well as a 25-period EMA. In general, a 10-period EMA captures short-term trends, while a 25-period EMA smooths out price fluctuations and reveals the medium-term direction. By comparing them, we can identify short-term reversals and confirm longer-term trends. More importantly, a crossover between the short-term and medium-term EMAs can be interpreted as a signal to buy or sell.

We calculate the difference between the 12-period EMA and the 26-period EMA for the MACD indicator. When the 12-period EMA is higher than the 26-period EMA, it indicates bullish momentum, and conversely, when the 12-period EMA is below the 26-period EMA, it signals bearish momentum. Additionally, we calculated the signal line by applying a 9-period EMA to the MACD line. The signal line serves as a trigger for buying or selling when the MACD line crosses above or below it.
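The EMA and MACD calculations described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the EMA uses the common smoothing factor 2 / (period + 1) and is seeded with the first price (the same recursion as pandas `ewm(span=period, adjust=False).mean()`), and a crossover is read as a buy when the first series moves above the second and a sell when it moves below. The function names and the synthetic price series are hypothetical.

```python
def ema(prices, period):
    """Exponential moving average, seeded with the first price."""
    alpha = 2.0 / (period + 1)
    values = [prices[0]]
    for price in prices[1:]:
        values.append(alpha * price + (1 - alpha) * values[-1])
    return values

def macd(prices, fast=12, slow=26, signal_period=9):
    """Return (macd_line, signal_line) for a list of closing prices."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal_period)
    return macd_line, signal_line

def crossovers(first, second):
    """Return (index, 'buy' or 'sell') wherever `first` crosses `second`."""
    signals = []
    for i in range(1, len(first)):
        prev_diff = first[i - 1] - second[i - 1]
        diff = first[i] - second[i]
        if prev_diff <= 0 < diff:
            signals.append((i, "buy"))
        elif prev_diff >= 0 > diff:
            signals.append((i, "sell"))
    return signals

# Synthetic closes: a steady decline followed by a steady rise.
closes = [100.0 - i for i in range(30)] + [71.0 + 2.0 * i for i in range(1, 31)]

# 10-period vs. 25-period EMA crossover signals.
ema_signals = crossovers(ema(closes, 10), ema(closes, 25))

# MACD line vs. its 9-period signal line.
macd_line, signal_line = macd(closes)
macd_signals = crossovers(macd_line, signal_line)
```

The same `crossovers` helper serves both the EMA pair and the MACD/signal pair, since each signal is a sign change in the difference between two series.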

Bollinger Bands consist of three lines: a moving average, typically the 5-period simple moving average, an upper band, and a lower band. The moving average represents the stock’s average price over a set period, while the upper and lower bands are calculated by adding and subtracting a specified number of standard deviations from the moving average. We set the bands at 1 standard deviation so that they are more sensitive to price changes, since we are predicting for short-term trading. The distance between the upper and lower bands expands when the stock becomes more volatile and contracts during periods of low volatility. The relative position of the stock price within the bands is also used to gauge potential price movements and reversals.

These indicators were selected for their ability to capture trends, momentum, and volatility, and to identify overbought or oversold conditions. The resulting dataset is further enhanced by calculating the percentage change in price and adding a target variable that represents the next day’s closing price.
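A band calculation along these lines might look like the sketch below. It assumes a 5-period simple moving average, 1 standard deviation, and the population standard deviation (the source does not say whether population or sample was used). The `percent_b` value is one common way to express the relative position of the price within the bands. The function name is hypothetical.

```python
import statistics

def bollinger_bands(prices, window=5, num_std=1.0):
    """Return a list of dicts with middle, upper, lower, width, percent_b.

    Entries are None until a full window of prices is available.
    """
    bands = []
    for i in range(len(prices)):
        if i + 1 < window:
            bands.append(None)
            continue
        recent = prices[i + 1 - window : i + 1]
        middle = statistics.fmean(recent)
        std = statistics.pstdev(recent)  # Assumption: population std.
        upper = middle + num_std * std
        lower = middle - num_std * std
        width = upper - lower
        # Relative position: 0 at the lower band, 1 at the upper band.
        percent_b = (prices[i] - lower) / width if width else 0.5
        bands.append({
            "middle": middle,
            "upper": upper,
            "lower": lower,
            "width": width,
            "percent_b": percent_b,
        })
    return bands

bands = bollinger_bands([1.0, 2.0, 3.0, 4.0, 5.0])
latest = bands[-1]  # Only the last entry has a full 5-price window.
```

With only 1 standard deviation, prices leave the bands more often than with the wider bands often used elsewhere, so a `percent_b` above 1 or below 0 is a weaker signal here than it would be with wider bands.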
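The percentage change and next-day target can be derived from the closing prices alone. The sketch below is illustrative: the `build_rows` name and the row layout are hypothetical, and it drops the first day (no previous close) and the last day (no next-day close to use as a target).

```python
def build_rows(closes):
    """Attach percent change and the next day's close to each usable day."""
    rows = []
    for i in range(1, len(closes) - 1):
        pct_change = (closes[i] - closes[i - 1]) / closes[i - 1] * 100
        rows.append({
            "close": closes[i],
            "pct_change": pct_change,
            "target": closes[i + 1],  # Next day's closing price.
        })
    return rows

rows = build_rows([100.0, 110.0, 99.0, 108.0])
```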

Technical Model

After feature extraction, the data was preprocessed to ensure quality and consistency. This included aligning the stock data with the sentiment scores derived from the headline processor; the two were merged on matching dates to create the final dataset. The data was split into training and test sets, with 90% of the data allocated for training and the remaining 10% used to evaluate the model’s predictive performance.

The machine learning model used is XGBoost, a gradient boosting framework known for its accuracy and efficiency with structured data. The model was trained to minimize squared error when predicting the next day’s closing price. After training, the model’s performance was evaluated using metrics such as mean squared error (MSE), mean absolute percentage error (MAPE), and the R-squared value. We also used feature importance to identify which technical indicators have the most influence on predictions and to refine our model. Finally, the trained model was used to predict stock prices for the next three days, and the results were compared to actual prices to measure predictive accuracy and percentage difference.
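The merge and split can be illustrated with plain dictionaries. This sketch rests on two assumptions the source does not confirm: days with no sentiment score are filled with a neutral 0, and the 90/10 split is chronological (earliest rows for training) rather than random. Dates are ISO strings so they sort correctly as text. All names are hypothetical.

```python
def merge_on_date(price_rows, sentiment_by_date, default=0.0):
    """Attach a sentiment score to each price row by matching date."""
    return [
        {**row, "sentiment": sentiment_by_date.get(row["date"], default)}
        for row in price_rows
    ]

def chronological_split(rows, train_fraction=0.9):
    """Split rows into (train, test), keeping the latest rows for testing."""
    ordered = sorted(rows, key=lambda row: row["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Ten hypothetical trading days with made-up closes.
price_rows = [
    {"date": f"2025-02-{day:02d}", "close": 100.0 + day} for day in range(1, 11)
]
sentiment_by_date = {"2025-02-03": 2.0, "2025-02-10": -1.0}

dataset = merge_on_date(price_rows, sentiment_by_date)
train, test = chronological_split(dataset)
```

A chronological split avoids training on days that come after the test days, which matters for time-series data.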
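For reference, the three evaluation metrics can be written out directly. This is a minimal sketch of the standard definitions, not the project's evaluation code; the sample prices are hypothetical.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values non-zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return total / len(actual) * 100

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical actual and predicted closing prices.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
errors = (mse(actual, predicted), mape(actual, predicted), r_squared(actual, predicted))
```

MAPE is scale-independent, which makes it easier to compare across stocks with very different price levels than MSE.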

[Figure: Overall model pipeline]

Additionally, a second model was trained on the same data but without the sentiment score, in order to isolate the impact of sentiment analysis beyond what the technical indicators already capture.

Results

Overall, our research shows that adding sentiment analysis to stock prediction models substantially improves accuracy compared to just using technical indicators. Looking at all sectors together, predictions that incorporated sentiment had an average error of only 2.92% versus 4.46% for the technical-only approach, a 34.5% improvement. This confirms that market sentiment pulled from news headlines adds real predictive value when combined with traditional technical analysis.
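The improvement figure is a relative reduction in error, which a short calculation makes explicit. The helper name below is hypothetical; the inputs are the average errors reported above.

```python
def relative_improvement(baseline_error, new_error):
    """Percent reduction in error relative to the baseline."""
    return (baseline_error - new_error) / baseline_error * 100

# Technical-only error of 4.46% versus 2.92% with sentiment.
overall = round(relative_improvement(4.46, 2.92), 1)
```

The absolute gap is 1.54 percentage points, and dividing by the 4.46% baseline gives the 34.5% relative improvement.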

[Figure: Comparison of percent error by sector, with and without sentiment analysis in the model]

Sector-Specific Performance

Different sectors responded differently to our sentiment analysis, reflecting how industries react to news in unique ways. Energy stocks showed the greatest improvement, with the error rate decreasing from 4.03% to 2.24%, a 44.4% reduction. This aligns with the high sensitivity of energy markets to real-time events, such as geopolitical tensions in the Middle East and hurricane forecasts, making timely news especially valuable for prediction.

Consumer Defensive stocks saw a 37.8% reduction in error, reflecting their sensitivity to news about inflation and consumer confidence. Healthcare followed closely with a 36.1% improvement, consistent with how pharmaceutical and biotech stocks respond to FDA announcements, clinical trial results, and health policy updates.

Technology stocks showed the least improvement at 22.7%. This likely reflects the sector’s focus on long-term growth and quarterly earnings rather than short-term news cycles. While sentiment still plays a role in the technology sector, fundamental analysis and earnings projections may carry more weight. The key takeaway is that sentiment analysis should be adjusted by sector and that there is no universal approach that works for all industries.

Case Study: Abbott Laboratories

A specific example illustrates how sentiment improves prediction. We tracked Abbott Laboratories (ABT) over three days. On day one, both models performed identically, predicting a price of $135.78 compared with the actual closing price of $135.96, a difference of 18 cents.

[Figure: Actual and predicted stock price over three days for Abbott Laboratories (ABT)]

By the second day, differences between the models became more apparent. While the actual price remained relatively stable at $135.87, the sentiment-enhanced model predicted $135.29, a deviation of just 58 cents. In contrast, the technical-only model showed a larger divergence, estimating $132.50 and underestimating the actual price by $3.37.

The most notable divergence occurred on day three, when the actual price rose to $138.01. The sentiment-enhanced model, accounting for positive market sentiment, predicted $136.25, underestimating by only $1.76. The technical-only model failed to capture the upward movement and predicted a sharp decline to $126.75, which is $11.26 below the actual price and an error exceeding 8%. This gap underscores the limitations of relying solely on technical indicators, which can overlook market shifts that sentiment analysis detects.
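The dollar and percent errors in this case study can be reproduced from the quoted prices. The data layout and helper name below are hypothetical; the prices are the ones reported above.

```python
def percent_error(actual, predicted):
    """Absolute error as a percentage of the actual price."""
    return abs(actual - predicted) / actual * 100

# (actual, with sentiment, technical only) for ABT on days 1 to 3.
abt = [
    (135.96, 135.78, 135.78),
    (135.87, 135.29, 132.50),
    (138.01, 136.25, 126.75),
]

table = [
    {
        "day": day,
        "with_sentiment": round(percent_error(actual, with_sent), 2),
        "technical_only": round(percent_error(actual, tech_only), 2),
    }
    for day, (actual, with_sent, tech_only) in enumerate(abt, start=1)
]
```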

Daily Error Analysis

Percent Daily Error By Sector and Analysis Type:
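| Day | Healthcare (w/ Sent) | Healthcare (w/o Sent) | Energy (w/ Sent) | Energy (w/o Sent) | Consumer (w/ Sent) | Consumer (w/o Sent) | Tech (w/ Sent) | Tech (w/o Sent) |
|---|---|---|---|---|---|---|---|---|
| Day 1 | 1.36% | 1.34% | 1.11% | 1.26% | 2.07% | 2.48% | 2.13% | 2.23% |
| Day 2 | 2.81% | 5.05% | 2.19% | 5.05% | 3.24% | 4.90% | 3.98% | 4.70% |
| Day 3 | 3.37% | 5.75% | 3.07% | 5.78% | 4.42% | 6.14% | 5.82% | 5.43% |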

A closer analysis of daily error rates shows that integrating sentiment analysis improves accuracy in most sector and day combinations; the exceptions are Healthcare on Day 1 (1.36% versus 1.34%) and Technology on Day 3 (5.82% versus 5.43%). The effect is especially noticeable on volatile days. For example, on Day 2, Healthcare stocks had an error rate of 2.81% with sentiment integration, compared to 5.05% without it. A similar pattern appeared in Consumer Defensive stocks on Day 3, where sentiment analysis reduced error from 6.14% to 4.42%.

Overall, this hybrid approach performs better than the technical-only model by integrating technical analysis with sentiment analysis. This framework can help traders make better real-time decisions that align with both market trends and investor reactions to news.

Discussion

Initially, we did not plan to use large language models to assess the headlines; we intended to use only a BERT model to determine whether a specific headline was positive or negative about a stock. We decided against this because a BERT model lacks the reasoning ability to determine relevance, to judge whether a headline will positively or negatively affect a stock, or to weigh different types of headlines and aggregate them into a single score.

One specific challenge was designing a prompt that made good use of the LLM. Another was trimming the number of headlines to reduce API costs, which is why we created the similarity filter.

Conclusion

In this project, we built a pipeline to predict stock movement from news headlines and historical stock data, incorporating tools such as tree boosting, ChatGPT, and vectorization. Our model with headlines performed better than the one without (2.92% versus 4.46% average error), indicating that headlines are a relevant indicator of short-term stock movements. However, it remains to be seen whether our headline processor can be used to generate alpha within the stock market over a long period of time and with a larger portfolio.

Our results illustrate the complexity of stock market prediction: even something as small as a news headline can have a significant impact on prices. This makes sense, as many traders have historically relied on traditional strategies such as technical analysis, which do not account for what is actually happening in the world. Capturing more types of media would therefore likely provide more indicators of whether a stock will perform better or worse.

Since no comprehensive dataset existed with hourly news headlines paired with stock prices, we had to construct our own covering only one month of data. This limited time frame makes it difficult to assess the model’s robustness across different market conditions. Future work could address this by expanding the dataset to include multiple years of historical news data.