XLK

Technical Indicators

Trend, RSI (14), MACD, MACD Signal, MACD Histogram, EMA (5, 9, 21, 50, 200), SMA (5, 9, 21, 50, 200)
Support & Resistance

Pivot Levels, S/R Levels, Chart Patterns, Candlestick Patterns

Key Metrics

Valuation: Market Cap, P/E Ratio, Forward P/E, PEG Ratio, P/B Ratio

Earnings & Revenue: EPS, Revenue (TTM), Profit Margin, ROE

Risk & Volatility: Beta, Debt / Equity, Dividend Yield, 52-Week High, 52-Week Low

Income Statement

The income statement shows revenue, expenses, and profit over each fiscal year. Key rows are highlighted. YoY% shows the year-over-year change — green indicates improvement, red indicates decline.


Balance Sheet

The balance sheet reports assets, liabilities, and shareholders' equity at the end of each fiscal year. It provides a snapshot of what the company owns and owes at a specific point in time.


Cash Flow Statement

Cash flow tracks the actual cash moving in and out of the business from operations, investments, and financing. Free cash flow (operating cash flow minus capital expenditure) is a key indicator of financial health.


Key Ratios

Computed ratios across growth, profitability, returns, leverage, and cash flow efficiency. These are derived from the financial statements above and help compare performance across years.


SEC Filings

Recent filings with the Securities and Exchange Commission, including annual reports (10-K), quarterly reports (10-Q), and current reports (8-K).


Insider Trading

Recent insider trading activity reported to the SEC. Company officers and directors must disclose trades within two business days.


Institutional Holders

Major institutional investors that hold positions in this stock, as reported in 13F filings with the SEC.


Price Target Consensus


Analyst Grades


Analyst Estimates


Earnings History


Dividend History


Stock Split History


Insider Trades

Purchases and sales by company insiders (officers, directors, and 10%+ shareholders). Buys are highlighted in green, sells in red.


Institutional Holders (13F)

Institutional investors managing over $100M must disclose holdings quarterly via SEC Form 13F.


Congressional Trading

Stock transactions by U.S. Senators and Representatives, disclosed under the STOCK Act. Trades are reported within 45 days.


Put/Call Ratio

The put/call ratio (PCR) measures the volume of put options traded relative to call options. A PCR above 1.0 indicates more puts (bearish sentiment), below 1.0 indicates more calls (bullish sentiment). Values between 0.7 and 1.0 are considered neutral.

PCR History


Understanding Options Data

PCR (Volume)

The volume-based put/call ratio divides total put option volume by total call option volume for all strikes and expirations on a given day. It reflects current-day trading activity and is more reactive to short-term sentiment shifts.

  • Below 0.7 — Bullish (heavy call buying)
  • 0.7 – 1.0 — Neutral range
  • 1.0 – 1.2 — Slightly bearish
  • Above 1.2 — Bearish (heavy put buying)

PCR (Open Interest)

The open interest-based PCR divides total outstanding put contracts by call contracts. Open interest represents accumulated positioning over time, making it a slower-moving but more reliable gauge of overall market sentiment.

  • Rising OI PCR — Growing bearish positioning
  • Falling OI PCR — Growing bullish positioning
  • OI PCR tends to run higher than volume PCR

Options Volume

Total volume shows how many put and call contracts traded that day. High volume indicates strong interest and conviction. The histogram at the bottom of the chart is colored green when PCR < 1.0 (bullish) and red when PCR > 1.0 (bearish).

  • Volume spikes often precede or accompany big moves
  • Low volume = less conviction in sentiment

Moving Averages (SMA 5 & SMA 20)

The 5-day SMA (pink dashed) smooths the PCR Volume line to show the short-term trend. The 20-day SMA (purple dashed) shows the longer-term baseline. These help filter out daily noise.

  • PCR crossing above SMA 20 — sentiment shifting bearish
  • PCR crossing below SMA 20 — sentiment shifting bullish
  • SMA 5 crossing SMA 20 — potential trend change
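
As a minimal sketch with made-up volume numbers, the PCR and its two smoothing averages can be computed in pandas:

```python
import pandas as pd

# Hypothetical daily option volumes (40 days)
df = pd.DataFrame({
    "put_volume":  [900, 1100, 1300, 1000, 800] * 8,
    "call_volume": [1000, 1000, 1000, 1000, 1000] * 8,
})
df["pcr"] = df["put_volume"] / df["call_volume"]   # volume-based put/call ratio
df["pcr_sma5"] = df["pcr"].rolling(5).mean()       # short-term trend (pink dashed)
df["pcr_sma20"] = df["pcr"].rolling(20).mean()     # longer-term baseline (purple dashed)

# Flag days where PCR crosses above its 20-day SMA (sentiment shifting bearish)
bearish_cross = (df["pcr"] > df["pcr_sma20"]) & (df["pcr"].shift(1) <= df["pcr_sma20"].shift(1))
```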

Contrarian Signal

Many traders use the PCR as a contrarian indicator. Extreme readings often signal that sentiment has become too one-sided, which historically tends to precede reversals.

  • Extremely high PCR (> 1.5) — peak fear, potential bottom
  • Extremely low PCR (< 0.5) — peak greed, potential top
  • Works best on broad indices (SPY, QQQ) rather than individual stocks

Data Details

Options data is sourced from Alpha Vantage HISTORICAL_OPTIONS (end-of-day). It includes all listed strikes and expiration dates for the symbol.

  • Contracts — total put + call contracts listed
  • Expirations — number of distinct expiration dates
  • Data includes full Greeks (delta, gamma, theta, vega, rho)
  • Updated daily after market close
Research Tool Only — ML forecasts are for educational and research purposes only. They are not investment advice. Past model performance does not guarantee future results.

Model Comparison


How These Forecasts Are Generated

Each stock is analyzed by up to 10 independent machine learning models across 3 time horizons (1, 5, and 20 trading days). Models are trained on historical OHLCV data (minimum 200 trading days). All models predict percentage returns, not raw prices — this normalizes across price levels and makes model comparisons meaningful. Click any section below to learn more.

The 3 Time Horizons

1-Day (Next Trading Day) — Forecasts the closing price on the next trading session. Most sensitive to short-term momentum, recent price action, and intraday patterns. This is the most granular forecast and also the most difficult — daily returns are heavily influenced by news and random noise.

5-Day (1 Trading Week) — Predicts the closing price 5 trading days out. Captures weekly trends, mean-reversion patterns, and short-term momentum cycles. Less noisy than 1-day because random daily fluctuations partially cancel out over a week.

20-Day (1 Trading Month) — Forecasts the closing price 20 trading days out. Reflects longer-term trend signals, moving average crossovers, and sector rotation. Higher uncertainty and wider forecast spread across models — but also the horizon where fundamental forces have more time to play out.

For each horizon, the target variable is:

target = (future_close − current_close) / current_close × 100

This gives a percentage return that the model learns to predict. For example, if today's close is $100 and the close 5 days later is $103, the 5-day target is +3.0%. The model never sees or predicts dollar amounts — only percentage changes.
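
The target calculation above can be sketched in a few lines of NumPy (the function name is ours, for illustration):

```python
import numpy as np

def horizon_targets(closes: np.ndarray, horizon: int) -> np.ndarray:
    """Percentage return `horizon` trading days ahead, per the formula above.

    The last `horizon` rows have no future close yet, so they come back as NaN.
    """
    future = np.full_like(closes, np.nan, dtype=float)
    future[:-horizon] = closes[horizon:]
    return (future - closes) / closes * 100.0

closes = np.array([100.0, 101.0, 99.0, 102.0, 104.0, 103.0])
targets_5d = horizon_targets(closes, horizon=5)
# targets_5d[0]: close 100 today, 103 five days later -> +3.0%
```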

Feature Engineering — What the Models See

The raw OHLCV data (Open, High, Low, Close, Volume) is transformed into ~34 normalized features. Every feature is a ratio, percentage, or bounded indicator — no raw dollar prices are used as inputs. This prevents the model from memorizing price levels and ensures features are comparable across stocks trading at $5 vs $500.

  • Returns: 1–20d %
  • SMA Ratios: 5–200d
  • EMA Ratios: 5–200d
  • RSI: 14-period
  • MACD: 12/26/9
  • Bollinger: 20d, 2σ
  • Volume: 5d/20d ratio
  • Volatility: 5d/20d σ
  • Candles: body/shadow
  • 52-Week: hi/lo dist
  • Day: Mon–Fri
Full feature list with formulas

Historical Returns (6 features) — Percentage price change over 1, 2, 3, 5, 10, and 20 trading days. Formula: (close / close_N_days_ago − 1) × 100. These capture momentum at multiple scales — a stock up 5% in the last 20 days but down 1% today shows divergent short/long momentum.

SMA Ratios (5 features) — Price distance from Simple Moving Averages at 5, 10, 20, 50, and 200-day windows. Formula: (close / SMA − 1) × 100. A stock 3% above its SMA 50 is in a short-term uptrend relative to its medium-term average.

EMA Ratios (5 features) — Same concept but using Exponential Moving Averages, which weight recent prices more heavily. EMAs react faster to price changes than SMAs.

MA Crossover Ratios (2 features) — (SMA20 / SMA50 − 1) × 100 and (SMA50 / SMA200 − 1) × 100. Classic trend signals — when the shorter MA crosses above the longer, it signals trend acceleration (the “golden cross” and “death cross” patterns).

RSI — Relative Strength Index (1 feature) — 14-period momentum oscillator bounded 0–100. Computed as: 100 − 100 / (1 + avg_gain / avg_loss). Above 70 = overbought (may reverse down), below 30 = oversold (may reverse up). The model learns these mean-reversion signals.
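
The RSI formula can be sketched in pandas. Note this sketch uses simple rolling means for the averages; Wilder's original RSI uses exponentially smoothed averages, which differ slightly:

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI per the formula above: 100 - 100 / (1 + avg_gain / avg_loss)."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(period).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + avg_gain / avg_loss)
```

A steady uptrend pushes RSI toward 100; perfectly alternating up/down days of equal size give exactly 50.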

MACD (2 features) — Moving Average Convergence Divergence. The MACD line is EMA12 − EMA26, normalized as a percentage of price. The histogram is the difference between the MACD line and its 9-period signal line. Captures momentum shifts and trend exhaustion.

Bollinger Bands (2 features) — Band width: (upper − lower) / SMA20 × 100 measures current volatility relative to average. %B: (close − lower) / (upper − lower) shows position within the bands (0 = at lower band, 1 = at upper band). Bands use 20-day SMA ± 2 standard deviations.
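
Both band features follow directly from the formulas above; a rolling-window sketch in pandas:

```python
import pandas as pd

def bollinger_features(close: pd.Series, window: int = 20, k: float = 2.0):
    """Band width and %B per the formulas above (20-day SMA ± 2 std devs)."""
    sma = close.rolling(window).mean()
    sd = close.rolling(window).std()
    upper, lower = sma + k * sd, sma - k * sd
    width = (upper - lower) / sma * 100        # volatility relative to the SMA, in %
    pct_b = (close - lower) / (upper - lower)  # 0 = at lower band, 1 = at upper band
    return width, pct_b
```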

Volume Ratios (2 features) — Today's volume divided by the 5-day and 20-day average volume. A ratio of 2.5 means 2.5x normal volume — high volume on a price move confirms the trend; high volume on a reversal signals a potential turning point.

Volatility (2 features) — Rolling standard deviation of daily returns over 5 and 20 days, expressed as a percentage. Higher volatility means larger expected daily moves. The model uses this to gauge the “normal” noise level for the stock.

Candlestick Ratios (4 features) — Body: (close − open) / close × 100. Upper shadow: (high − max(open,close)) / (high − low). Lower shadow: (min(open,close) − low) / (high − low). Range: (high − low) / close × 100. These encode daily price action patterns that technical analysts have used for centuries.

52-Week Position (2 features) — Distance from the 252-day high and low as percentages. A stock at −2% from its 52-week high is near resistance; at +40% from its 52-week low shows strong recovery. These capture long-range mean-reversion and breakout signals.

Day of Week (1 feature) — Monday=0 through Friday=4. Research shows small but measurable weekday effects in stock returns (e.g., the “Monday effect” and “Friday effect”).

All features are standardized using StandardScaler (zero mean, unit variance) before training. Rows with NaN values from rolling window calculations (the first ~200 days) are dropped. This ensures every feature contributes proportionally regardless of its natural scale.
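
The drop-then-scale step might look like this (illustrative column names; the real frame holds the ~34 features above):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Stand-in for the engineered feature frame
df = pd.DataFrame(rng.normal(size=(300, 3)), columns=["ret_1d", "sma50_ratio", "rsi"])
df.iloc[:200, 1] = np.nan                  # warm-up rows from a 200-day rolling window

clean = df.dropna()                        # drop rows with incomplete features
X = StandardScaler().fit_transform(clean)  # zero mean, unit variance per column
```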

Feature Selection — Reducing Noise

Not all 34 features are equally useful for every time horizon. A feature that predicts 1-day returns well may be irrelevant for 20-day returns, and including irrelevant features adds noise that can hurt model performance.

Before training, mutual information regression (from scikit-learn) is used to rank each feature by its statistical dependency with the target return. Mutual information measures how much knowing a feature's value reduces uncertainty about the target — it captures both linear and non-linear relationships, unlike simple correlation.

The algorithm:

  1. Compute mutual information score between each feature and the target return for the given horizon
  2. Rank features by score (higher = more informative)
  3. Keep only the top 18 features (configurable via ml_config.json; set to 0 to disable)
  4. Train all models using only the selected features

This selection is performed independently for each horizon. For example, the 1-day model might rely heavily on RSI and 1-day returns, while the 20-day model might favor SMA200 ratio and volatility. Reducing from 34 to 18 features cuts noise and improves generalization, especially for tree-based models that can overfit to irrelevant splits.
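
The four-step algorithm above maps onto scikit-learn's mutual_info_regression; a sketch with a synthetic check where only one feature actually drives the target:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_top_features(X, y, names, k=18):
    """Steps 1-4 above: score, rank, and keep the top-k features."""
    scores = mutual_info_regression(X, y, random_state=42)
    order = np.argsort(scores)[::-1][:k]          # highest score first
    return [names[i] for i in order], X[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 2] + 0.1 * rng.normal(size=300)      # only "f2" is informative
selected, X_sel = select_top_features(X, y, [f"f{i}" for i in range(5)], k=2)
```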

The 10 ML Models — How Each One Works

We use a diverse ensemble of 10 models from 6 different algorithm families (tree ensembles, linear, instance-based, feedforward neural networks, recurrent neural networks, and probabilistic sequence models). Diversity is intentional — when fundamentally different algorithms agree, the signal is stronger than any single model.

RandomForest — Ensemble of Decision Trees

A Random Forest builds 200 independent decision trees, each trained on a random bootstrap sample of the data with a random subset of features at each split. The final prediction is the average of all 200 trees.

How a decision tree works: Starting from all training data, it finds the single feature and threshold that best splits the data into two groups with different average returns. It repeats recursively, creating a tree of if/then rules. For example: “If RSI < 30 AND volume_ratio > 1.5 AND return_5d < −3%, predict +0.8% return.”

Why “random”: Each tree sees only ~63% of the data (bootstrap sampling) and considers only a random subset of features at each split. This forces trees to be diverse — they make different errors, and averaging them cancels out individual mistakes (variance reduction).

Our configuration: n_estimators=200, max_depth=8, min_samples_leaf=5, n_jobs=-1, random_state=42. Max depth 8 limits tree complexity to prevent memorizing noise. Min 5 samples per leaf ensures predictions are based on at least 5 historical examples.

Output: Provides feature importance rankings — features used more often at the top of trees (where splits are most impactful) get higher importance scores.
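
The quoted configuration corresponds directly to scikit-learn's RandomForestRegressor; a fit on synthetic data (ours, for illustration) shows the importance output:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=400)   # only feature 0 matters

model = RandomForestRegressor(
    n_estimators=200, max_depth=8, min_samples_leaf=5,
    n_jobs=-1, random_state=42,
)
model.fit(X, y)
importances = model.feature_importances_          # sums to 1; feature 0 dominates
```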

ExtraTrees — Extremely Randomized Trees

Similar to RandomForest but with an additional layer of randomization: at each split, instead of finding the optimal threshold, it picks a random threshold for each candidate feature and uses the best among those random splits.

Why this helps: Random thresholds make the model even less likely to overfit to specific data points. The model trades a tiny bit of accuracy on the training data for better generalization to unseen data. In practice, ExtraTrees often outperforms RandomForest on noisy data like stock returns.

Same configuration: 200 trees, max depth 8, min 5 samples per leaf. Also provides feature importance rankings.

Ridge — Regularized Linear Regression

A linear model that finds the best weighted combination of all features: prediction = w1×feature1 + w2×feature2 + ... + bias. The weights are learned by minimizing the squared prediction error on training data.

L2 Regularization (alpha=1.0): Adds a penalty term alpha × sum(weights²) to the loss function. This penalizes large weights, preventing the model from relying too heavily on any single feature. Without regularization, linear models can assign huge weights to noisy features and overfit badly.

Limitations: Cannot capture non-linear patterns. If the relationship between RSI and returns is U-shaped (both very high and very low RSI predict reversals), Ridge cannot model this. However, its simplicity is also a strength — it won't hallucinate complex patterns that don't exist.

Interpretation: Each weight directly tells you how much a 1-unit increase in the (standardized) feature changes the predicted return. Positive weight = bullish signal; negative = bearish.
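
A small sketch (synthetic data, ours) showing both properties: the weights recover the true linear relationship, and a larger alpha shrinks them toward zero:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + 0.1 * rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y)      # weights land near the true [1.0, -0.5, 0.0]
strong = Ridge(alpha=1000.0).fit(X, y)  # heavier L2 penalty shrinks all weights
```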

KNeighbors — K-Nearest Neighbors

A non-parametric, instance-based model. It doesn't learn explicit rules — instead, for each new prediction, it finds the 20 most similar historical trading days (by Euclidean distance across all selected features) and computes a distance-weighted average of their actual returns.

How it works: Imagine a trading day where RSI is 35, volume ratio is 2.1, and the stock is 5% below SMA50. KNN scans all historical days, finds the 20 closest matches to this feature profile, and predicts the average of what actually happened next on those 20 historical days. Closer matches get higher weight (distance-weighted).

Our configuration: n_neighbors=20, weights=“distance”. 20 neighbors balances having enough data points for a stable average vs being specific enough to capture local patterns.

Strengths: Makes no assumptions about the data distribution, naturally adapts to local patterns, and is intuitive to understand. Weakness: Suffers from the “curse of dimensionality” — in high-dimensional feature spaces, all points become roughly equidistant, which is why feature selection (reducing from 34 to 18 features) significantly helps KNN.
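
The lookup described above is a one-liner in scikit-learn (synthetic "historical days" for illustration). One useful property of distance weighting: a query that exactly matches a historical day returns that day's outcome, since zero distance gets full weight:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 3))            # 500 hypothetical days of standardized features
y = X[:, 0] - 0.5 * X[:, 1]              # the return that followed each day

knn = KNeighborsRegressor(n_neighbors=20, weights="distance").fit(X, y)
today = rng.normal(size=(1, 3))           # today's feature profile
pred = knn.predict(today)                 # distance-weighted average of 20 closest days
```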

LSTM — Long Short-Term Memory Neural Network

A deep learning recurrent neural network specifically designed for sequential data. Unlike the other models which see each day independently, LSTM processes the last 30 trading days as an ordered sequence, learning temporal patterns (e.g., “when this pattern of features appears across 5 consecutive days, a reversal follows”).

Architecture:

  • LSTM Layer 1: 64 hidden units, returns sequences to next layer
  • Dropout: 20% of neurons randomly deactivated during training (prevents overfitting)
  • LSTM Layer 2: 32 hidden units, returns final hidden state
  • Dropout: 20%
  • Dense Output: 1 unit (predicted return)

How LSTM cells work: Each LSTM cell has 3 “gates” (forget, input, output) that control information flow. The forget gate decides what to discard from the cell state (e.g., “the price action from 25 days ago is no longer relevant”). The input gate decides what new information to store. The output gate decides what to pass to the next time step. These gates are learned during training.

Training details: Adam optimizer, MSE loss, 20 epochs, batch size 32. The target (return) is scaled separately with its own StandardScaler and inverse-transformed after prediction. The 80/20 train/validation split ensures the last 20% of the time series is used for evaluation.

Why separate target scaling: LSTM is sensitive to input scales. Feature scaling alone isn't enough — the target returns must also be scaled so the network's output neurons operate in a numerically stable range. The inverse transform recovers the actual percentage return.

XGBoost — Extreme Gradient Boosting

A gradient boosted tree ensemble that builds 300 trees sequentially, where each new tree is trained to correct the errors (residuals) of all previous trees combined. This is fundamentally different from RandomForest, which builds trees independently.

How boosting works: Tree 1 makes predictions. Tree 2 is trained on the errors of Tree 1. Tree 3 is trained on the remaining errors after Trees 1+2. Each tree adds a small correction, and the cumulative effect builds a powerful predictor from many simple ones.

Regularization: L1 (alpha=0.1) promotes sparsity — some features get zero weight. L2 (lambda=1.0) penalizes large leaf values. 80% row subsampling and 80% column subsampling at each tree introduce randomness similar to RandomForest. Learning rate 0.05 means each tree contributes only 5% correction, requiring many trees but reducing overfitting.

Our configuration: n_estimators=300, max_depth=6, learning_rate=0.05, subsample=0.8, colsample_bytree=0.8, reg_alpha=0.1, reg_lambda=1.0.

Why XGBoost dominates: It's the most successful algorithm in machine learning competitions for tabular data. The sequential error-correction mechanism is particularly effective at finding subtle patterns that random forests miss.

LightGBM — Light Gradient Boosting Machine

Another gradient boosted tree, but with two key algorithmic innovations that make it 10–20x faster than XGBoost while maintaining comparable or better accuracy.

Leaf-wise growth: Instead of growing trees level-by-level (like XGBoost), LightGBM grows leaf-wise — it always splits the leaf with the highest potential gain. This creates deeper, more asymmetric trees that reach better accuracy with fewer splits.

Histogram binning: Continuous feature values are bucketed into 256 bins. Instead of evaluating every possible split threshold, only 256 boundaries are tested. This dramatically reduces computation time and also acts as a form of regularization (smoothing).

Our configuration: n_estimators=300, max_depth=6, learning_rate=0.05, subsample=0.8, colsample_bytree=0.8, reg_alpha=0.1, reg_lambda=1.0. Same hyperparameters as XGBoost for fair comparison.

CatBoost — Categorical Boosting

A gradient boosted tree with ordered boosting — a technique that addresses a subtle form of overfitting called “prediction shift” that affects all other gradient boosting implementations.

The prediction shift problem: In standard boosting, when computing the residual for training example #500, the model has already been influenced by example #500 during earlier trees. This creates a slight data leakage — the model's errors on training data are systematically smaller than on new data. Over hundreds of trees, this bias accumulates.

Ordered boosting solution: CatBoost uses a random permutation of the data and, for each example, only uses preceding examples (in the permutation) to compute the model's prediction. This is analogous to time-series cross-validation applied within each boosting iteration.

Our configuration: iterations=300, depth=6, learning_rate=0.05, l2_leaf_reg=3.0. CatBoost tends to produce the best out-of-the-box results with less hyperparameter tuning needed.

MLP — Multi-Layer Perceptron Neural Network

A feedforward neural network (also called a “fully connected” or “dense” network). Unlike the LSTM which processes data as a time sequence, the MLP sees each day as an independent set of features — similar to the tree and linear models, but with the ability to learn complex non-linear relationships.

Architecture: Two hidden layers with 128 and 64 neurons respectively. Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result through a ReLU activation function (max(0, x)). ReLU introduces non-linearity — allowing the network to learn curved decision boundaries that linear models cannot.

Training: Uses the Adam optimizer (adaptive learning rate per parameter) with an initial learning rate of 0.001 that automatically reduces when progress stalls (adaptive schedule). L2 regularization (alpha=0.001) penalizes large weights. Early stopping monitors validation loss and halts training if no improvement for 10 consecutive iterations, preventing overfitting.

Our configuration: hidden_layer_sizes=(128, 64), activation=relu, solver=adam, max_iter=200, early_stopping=True, validation_fraction=0.15, n_iter_no_change=10, alpha=0.001.
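
That configuration maps one-to-one onto scikit-learn's MLPRegressor:

```python
from sklearn.neural_network import MLPRegressor

mlp = MLPRegressor(
    hidden_layer_sizes=(128, 64),   # two hidden layers: 128 then 64 neurons
    activation="relu",
    solver="adam",
    max_iter=200,
    early_stopping=True,            # hold out part of training data as validation
    validation_fraction=0.15,
    n_iter_no_change=10,            # stop after 10 iterations without improvement
    alpha=0.001,                    # L2 penalty on weights
)
```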

vs LSTM: MLP treats each day independently (no sequential memory), which makes it faster to train but unable to learn temporal patterns like “3 consecutive up days followed by high volume.” It complements LSTM by capturing static feature interactions that don't require sequential context.

HMM — Hidden Markov Model (Regime Detection)

A probabilistic sequence model that detects hidden market “regimes” (Bull, Bear, Sideways) from observable price data. Unlike the other models which predict returns directly, HMM first classifies the current market state, then uses learned transition probabilities to forecast the most likely future state and its associated return.

How it works: The model assumes the market is always in one of 3 hidden states, each with its own characteristic return distribution (mean and variance). It observes 4 features — daily log returns, 5-day and 20-day rolling volatility, and volume ratio — and uses the Baum-Welch algorithm (Expectation-Maximization) to learn the state distributions and transition probabilities from historical data.

Prediction: The Viterbi algorithm decodes the most likely regime sequence, identifying the current state. The learned transition matrix then gives the probability of moving to each state tomorrow. The predicted return is the probability-weighted average of each state's mean return. For 5-day and 20-day horizons, the transition matrix is raised to the Nth power to compute multi-step transition probabilities.

State labeling: The 3 states are automatically labeled by their mean returns: highest mean = Bull, lowest = Bear, middle = Sideways. This labeling is data-driven and adapts to each stock's characteristics.

Our configuration: n_components=3, covariance_type=diag, n_iter=200. Diagonal covariance assumes features are conditionally independent given the state — a reasonable simplification that prevents overfitting with limited data.

Unique value: HMM is the only model in the ensemble that explicitly models market regime transitions. While tree and linear models treat each day independently, HMM captures the persistence of market conditions (“bull markets tend to continue”) and the probabilities of regime shifts.
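
The multi-step prediction described above reduces to matrix powers. A sketch with made-up learned parameters (transition matrix and per-state mean returns are illustrative, not from any real fit):

```python
import numpy as np

# Hypothetical learned parameters (states ordered Bull, Sideways, Bear)
T = np.array([[0.90, 0.08, 0.02],   # row i: P(state tomorrow | state i today)
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
state_means = np.array([0.12, 0.00, -0.10])   # mean daily return (%) per state
today = np.array([1.0, 0.0, 0.0])             # Viterbi decoded: Bull today

def predicted_return(horizon: int) -> float:
    """Probability-weighted mean return after raising T to the Nth power."""
    probs = today @ np.linalg.matrix_power(T, horizon)
    return float(probs @ state_means)

one_day = predicted_return(1)      # 0.90*0.12 + 0.08*0.00 + 0.02*(-0.10) = 0.106
twenty_day = predicted_return(20)  # decays toward the stationary regime mix
```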

Training & Validation — How Models Are Evaluated

Evaluating ML models on financial data requires special care. Standard random train/test splits are invalid for time series because they let the model “see the future” during training (data from 2025 in training, data from 2024 in testing). We use two temporal-aware strategies:

Time-Series Cross-Validation (8 models) — For the eight models with scikit-learn-style estimators (all except LSTM and HMM), we use TimeSeriesSplit(n_splits=5):

  • Fold 1: Train on years 1-2, test on year 3
  • Fold 2: Train on years 1-3, test on year 4
  • Fold 3: Train on years 1-4, test on year 5
  • Fold 4: Train on years 1-5, test on year 6
  • Fold 5: Train on years 1-6, test on year 7+

Each fold trains only on data before the test period. The reported R² and MAE are averages across all 5 folds. This gives a realistic estimate of how the model would have performed if deployed at different historical points.
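
The fold structure above is exactly what scikit-learn's TimeSeriesSplit produces; the key invariant is that every training index precedes every test index:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(120).reshape(-1, 1)       # 120 days in chronological order
folds = list(TimeSeriesSplit(n_splits=5).split(X))

for train_idx, test_idx in folds:
    # No look-ahead: all training data comes before the test window
    assert train_idx.max() < test_idx.min()
```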

Sequential Split (LSTM) — The LSTM uses a simpler 80/20 split: the first 80% of the time series for training, the last 20% for validation. This is because LSTM requires contiguous sequences of 30 days, making k-fold splitting impractical.

Quality gate: Any model with R² below −1.0 (more than twice as bad as predicting the mean) is automatically discarded and not shown in the results.

Why negative R² is expected

An R² of −0.05 does not mean the model is useless. It means the model explains −5% of the variance in returns — slightly worse than simply predicting “0% change every day.” But predicting exact returns is not the goal — predicting direction (up vs down) can still be profitable even with negative R².

Consider: a model predicts +0.5% when the stock goes up 2%, and −0.3% when it goes down 1.5%. The magnitudes are wrong (negative R²), but the direction is correct both times. This is why we track direction accuracy separately from R².

Academic research consistently shows R² values of 0.01–0.05 (1–5%) for daily stock return prediction, even for state-of-the-art models. Our negative values reflect honest evaluation with proper temporal cross-validation — many commercial “AI trading” products show inflated metrics due to look-ahead bias.

Forecast Accuracy Tracking — Backtesting

Forecasts are only meaningful if we track whether they were actually correct. Every night, a backtesting process compares past forecasts to what actually happened in the market.

How backtesting works:

  1. Find all past forecasts where enough trading days have elapsed (e.g., a 5-day forecast from last Monday can be evaluated this Monday)
  2. Look up the actual closing price N trading days after the forecast date
  3. Compute the actual percentage change and compare to the forecasted change
  4. Record: forecasted direction (up/down/flat), actual direction, whether direction was correct, forecast error, and absolute error
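
Steps 3 and 4 above can be sketched as a small evaluation function (the function name and the simple sign-based direction rule are ours, for illustration):

```python
def evaluate_forecast(forecast_pct: float, actual_pct: float) -> dict:
    """Compare one matured forecast to the realized percentage change."""
    def direction(x):
        return "up" if x > 0 else "down" if x < 0 else "flat"
    error = forecast_pct - actual_pct
    return {
        "forecast_dir": direction(forecast_pct),
        "actual_dir": direction(actual_pct),
        "direction_correct": direction(forecast_pct) == direction(actual_pct),
        "error": error,
        "abs_error": abs(error),
    }

result = evaluate_forecast(forecast_pct=0.5, actual_pct=2.0)
# Direction correct (both up) even though the magnitude was off by 1.5 points
```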

Metrics tracked per model:

  • Direction Accuracy — Percentage of forecasts where the model correctly forecasted up vs down. 50% is random; above 55% is noteworthy for stock forecasting.
  • Mean Absolute Error — Average size of forecast error in percentage points. If MAE is 1.5%, the model's forecasts are typically off by ±1.5 percentage points.
  • Forecast count — Number of backtested forecasts. More data = more reliable accuracy estimates.

Direction accuracy requires sufficient history to be meaningful. With only 10 forecasts, a 60% accuracy rate could easily be luck. With 100+ forecasts, the accuracy stabilizes and reveals each model's true forecasting ability.

Freshness: Backtesting runs automatically before each nightly batch. 1-day forecasts can be evaluated the next trading day. 5-day forecasts take a week. 20-day forecasts take about a month to evaluate. Accuracy data accumulates over time.

Consensus, Agreement & Confidence

Individual models are unreliable, but their collective behavior provides useful signals. Three ensemble metrics summarize the models' agreement:

Model Agreement — Shows how many models predict the same direction (e.g., “6/8 UP”). High agreement (75%+) means most models independently arrived at the same conclusion despite using different algorithms. Low agreement (<50%) means the models are split — the market outlook for this stock is ambiguous.

  • Strong (75%+) — Clear directional consensus
  • Moderate (50–74%) — Mild consensus, some disagreement
  • Weak (<50%) — Models are split, no clear signal

Consensus — The average predicted change across all models. This combines both direction and magnitude into a single number. A consensus of +0.5% means models on average expect a moderate gain.

Confidence — Derived from the standard deviation of forecasts across models. If all 8 models forecast returns between +0.1% and +0.3%, the standard deviation is small and confidence is High. If forecasts range from −2% to +3%, the spread is large and confidence is Low.

  • High — Std dev < 0.5% (models closely agree on magnitude)
  • Medium — Std dev 0.5–1.5% (moderate spread)
  • Low — Std dev ≥ 1.5% (wide disagreement)

The most reliable signals come from high agreement + high confidence: most models agree on direction AND agree on the magnitude of the expected move.
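
All three ensemble metrics fall out of the per-model forecast array; a sketch using the thresholds listed above (the helper name and sample forecasts are illustrative):

```python
import numpy as np

def ensemble_summary(preds: np.ndarray) -> dict:
    """Consensus, agreement, and confidence from per-model % forecasts."""
    ups = int((preds > 0).sum())
    agreement = max(ups, preds.size - ups) / preds.size * 100
    spread = float(preds.std())
    confidence = "High" if spread < 0.5 else "Medium" if spread < 1.5 else "Low"
    return {"consensus": float(preds.mean()),
            "agreement_pct": agreement,
            "confidence": confidence}

summary = ensemble_summary(np.array([0.1, 0.2, 0.3, 0.2, -0.1, 0.15, 0.25, 0.2]))
# 7 of 8 models say UP (87.5% agreement); tight spread -> High confidence
```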

Understanding the Metrics

R² (R-squared / Coefficient of Determination) — Measures how much of the variance in actual returns the model explains. Ranges from −∞ to 1.0. A value of 0.0 means the model is no better than forecasting the average return every day. Negative means worse than the average. For stock forecasting, even values of 0.01–0.05 are considered useful.

MAE (Mean Absolute Error) — Average size of forecast errors in percentage points. If MAE is 1.82%, the model's forecasts are off by ~1.82 percentage points on average. Lower is better. For context, the average daily move of the S&P 500 is about ±0.8%, so a MAE of 1.82% on daily forecasts means the model's error is roughly 2x the typical daily move.

Samples — Number of historical data points used for training. More samples generally mean more reliable models. Stocks with 10,000+ bars (40+ years of data) produce more stable models than stocks with 200 bars (1 year). LSTM uses fewer samples than other models because it requires 30 contiguous days per training example.

Direction Accuracy — Backtest metric: how often the model correctly forecasted whether the stock would go up or down. Computed from actual historical forecasts vs actual outcomes. 50% is random coin-flip level; above 55% is noteworthy. This metric is arguably more useful than R² for evaluating practical utility.

Feature Importance — For tree-based models (RandomForest, ExtraTrees, XGBoost, LightGBM, CatBoost), shows which features contributed most to the forecasts. Importance is measured by how much each feature improved forecast accuracy across all trees. The top 5 features are shown. This helps understand what is driving the forecast.

About the Projection Charts

Each chart displays 30 trading days of actual closing prices (white line) followed by a projected path (colored dashed line) passing through the model's 1-day, 5-day, and 20-day forecasted prices.

The zigzag between waypoints is generated using a Brownian bridge simulation — a mathematical construct that creates a random path constrained to pass through fixed endpoints. The amplitude is calibrated to the stock's actual historical daily volatility, so high-volatility stocks show wider zigzags.

Important: The intermediate daily points between waypoints (e.g., days 2–4 between the 1-day and 5-day forecasts) are illustrative only. They show a plausible path given the stock's volatility, not an actual forecast. Only the marked waypoints (day 1, day 5, and day 20) represent the model's actual output.

All charts share the same Y-axis scale and time range for easy visual comparison across models. A deterministic random seed per model ensures charts don't flicker when re-rendering.
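One standard way to build such a pinned random path is to simulate a random walk and subtract the linear drift needed to land exactly on the endpoint. A sketch of that idea, not the page's actual rendering code:

```python
import numpy as np

def brownian_bridge(start: float, end: float, steps: int,
                    daily_vol: float, seed: int = 0) -> np.ndarray:
    """Random path from `start` to `end` over `steps` days, pinned at
    both endpoints.

    Amplitude scales with the stock's daily volatility; a fixed seed
    keeps the path deterministic across re-renders.
    """
    rng = np.random.default_rng(seed)
    walk = np.concatenate([[0.0], np.cumsum(rng.normal(0, daily_vol, steps))])
    t = np.linspace(0.0, 1.0, steps + 1)
    bridge = walk - t * walk[-1]            # force the walk to end at 0
    return start + (end - start) * t + bridge

path = brownian_bridge(100.0, 103.0, steps=4, daily_vol=1.2)
# path[0] == 100.0 and path[-1] == 103.0 exactly; the interior wiggles
```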

Technical Stack & Infrastructure

The forecast system is built entirely with open-source tools:

  • Python 3.12 — Core forecast engine
  • scikit-learn — RandomForest, ExtraTrees, Ridge, KNeighbors, TimeSeriesSplit, StandardScaler, mutual information feature selection
  • TensorFlow/Keras — LSTM neural network
  • XGBoost — Gradient boosted trees (C++ core with Python bindings)
  • LightGBM — Microsoft's gradient boosting framework
  • CatBoost — Yandex's ordered boosting framework
  • pandas & NumPy — Data manipulation and feature engineering
  • MariaDB — Stores forecasts, accuracy results, and historical price data
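Of the scikit-learn pieces listed above, TimeSeriesSplit deserves a note: on price data, validation folds must come strictly after their training folds, or the backtest leaks future information. A short demonstration:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# TimeSeriesSplit keeps each validation fold strictly after its training
# fold, avoiding the look-ahead bias a shuffled K-fold would introduce.
X = np.arange(10).reshape(-1, 1)
splits = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in splits:
    print(train_idx.tolist(), test_idx.tolist())
# each test fold starts only after its entire training fold ends
```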

Execution: Forecasts are generated via a Python script with up to 5 concurrent processes (configurable). Each symbol takes 3–6 minutes to forecast across all 8 models and 3 horizons. A nightly cron job processes all tracked symbols, with a FIFO queue and self-chaining to maximize throughput. On-demand forecasts are triggered when you visit this page.
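The fan-out over symbols could look roughly like the sketch below. Function and symbol names are hypothetical, and threads are used here only to keep the demo self-contained; the real system runs separate OS processes:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def forecast_symbol(symbol: str) -> str:
    # stand-in for the real per-symbol pipeline (features -> 8 models
    # x 3 horizons); each symbol takes 3-6 minutes in production
    return f"{symbol}: done"

symbols = ["XLK", "AAPL", "MSFT", "NVDA"]          # hypothetical queue
results = []
with ThreadPoolExecutor(max_workers=5) as pool:    # up to 5 concurrent
    futures = [pool.submit(forecast_symbol, s) for s in symbols]
    for fut in as_completed(futures):
        results.append(fut.result())
```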

Data pipeline: OHLCV data flows from market data providers (Finnhub, Stooq, FMP, Alpha Vantage) → MariaDB daily_prices table → Python feature engineering → model training → ml_predictions table → this page. Backtest results are stored in ml_prediction_accuracy and evaluated nightly.

Important Limitations

Price & volume only. All models are trained purely on historical price and volume data. They do not consider earnings, news, macroeconomic events, analyst ratings, insider trading, geopolitical events, or fundamental data. A stock about to report blowout earnings will not be predicted differently from normal.

Historical patterns may not repeat. Markets evolve. A pattern that worked in 2015–2020 may not work in 2025–2030. Regime changes (interest rate shifts, pandemics, regulatory changes) can invalidate historically learned relationships.

Negative R² is normal. Stock returns at the daily level are dominated by noise. Even the world's best quant funds struggle to achieve positive R² on daily return forecasts. Our transparent reporting of negative R² reflects honest evaluation, not poor model quality.

Forecasts change daily. Each night, models are retrained with the latest data. Yesterday's forecast for “5 days out” will differ from today's forecast for “4 days out” because the model sees one more day of data and may change its assessment.

Not investment advice. These tools are for educational and research purposes only. They demonstrate how different ML algorithms approach the problem of stock forecasting and help you understand the inherent difficulty. They should never be the sole basis for any investment decision.

Click to generate an AI-powered equity research report for XLK.

Uses GPT-4o to analyze fundamentals, technicals, ML forecasts, ownership data, and congressional trades.

Research Tool Only — Autonomous research reports are AI-generated for educational purposes only. They are not investment advice.
