Scoring Guide

How the system works and what to look for when making your BTS picks

What is Beat the Streak?

Beat the Streak (BTS) is MLB’s free prediction game. Each day you pick one or two batters you think will get at least one hit. If your pick gets a hit, your streak continues. If they go hitless, your streak resets to zero. Reach a 57-game streak and you win $5.6 million. You can “double down” by picking two batters on the same day — if both get hits your streak advances by two, but if either goes hitless, your streak resets.

This system uses a multi-agent AI pipeline to score every batter in today’s lineups across 15 statistical factors, then converts those scores into hit probabilities. The goal: find the batters most likely to get at least one hit today.

Understanding the Three Metrics

Every batter gets three numbers. Each one tells you something different.

Score (0-100)

A weighted algorithm that grades the batter across 15 factors — recent performance, contact quality, matchup against the opposing pitcher, ballpark, and more. Think of it as a “matchup quality” rating. Higher is better, but the number alone doesn’t account for how many plate appearances the batter will get.

Hit Probability (50-90%)

The score converted to “what are the odds this batter gets at least one hit today?” This is the primary metric for BTS picks. It accounts for lineup position — a leadoff hitter with the same score as a 9-hole hitter will have a higher hit probability because they get more plate appearances.

ML Probability (55-75%)

An independent machine learning model (LightGBM) trained on 180,000+ historical batter-games across 5 seasons. It uses 23 raw features and doesn’t know about the score — it’s a completely separate second opinion. When Hit Prob and ML Prob agree, you can be more confident in the pick.

Which one should I use? Hit Probability is the primary ranking metric — it’s what the “Hit Prob” tab sorts by. Use ML Probability as a confidence check. When both agree on a pick, that’s your strongest signal.

How the Score is Built

The 100-point score is the sum of 15 weighted factors. Each factor is normalized to 0-1 based on league targets, then multiplied by its weight. The categories below show where the points come from.

Plate Discipline

23 pts

Contact Rate: 16

K Rate (inverted): 7

Contact rate is the single strongest predictor of getting a hit. Batters who put the ball in play give themselves a chance on every swing. K rate is inverted — lower strikeout rates score higher. Both are further adjusted by chase rate and whiff rate percentiles (see Discipline Adjustments below).

Performance

31 pts

Season BA: 10

Last 15 Games BA: 8

Last 7 Games BA: 7

Career BA: 6

Who’s hot right now? L7 and L15 batting averages are rolling windows that capture streaks and slumps — they use raw points with no smoothing, so what you see is what the batter is actually doing. Career BA provides a stable baseline. Season BA is blended with prior-season data early on (see PA Confidence below).

Expected Metrics

11 pts

xBA (expected batting avg): 6

xwOBA (expected weighted OBA): 5

Statcast data based on exit velocity and launch angle. These measure how hard and how well a batter is hitting the ball, regardless of whether fielders happened to catch it. They strip out luck. These require the most PA to stabilize (see PA Confidence).

Contact Quality

11 pts

Hard Hit Rate: 6

Line Drive Rate: 5

Hard-hit balls (95+ mph exit velo) and line drives are the types of contact most likely to become hits. A batter who barrels everything but hits it right at fielders will still show up well here.

Matchup & Context

24 pts

Pitcher Recent Form: 8

Sprint Speed: 6

Batter vs. Pitcher: 4

Ballpark Factor: 3

Bullpen Quality: 3

The opposing pitcher’s recent form matters most here. Sprint speed uses raw points (it’s a physical trait, not sample-size dependent) with a ground ball rate multiplier — fast runners who hit grounders benefit more. BvP uses historical head-to-head data (requires 5+ at-bats). Hitter-friendly parks like Coors Field provide a small boost.
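As a rough illustration of how the weights above combine, the sketch below collects the 15 weights from the five categories and sums weight times normalized value. The `normalize` helper and its linear floor-to-target scaling are assumptions made for the example; the guide does not state the league targets or the exact normalization curve.

```python
# Weights as listed in the five categories above (15 factors, 100 points).
WEIGHTS = {
    "contact_rate": 16, "k_rate_inverted": 7,
    "season_ba": 10, "l15_ba": 8, "l7_ba": 7, "career_ba": 6,
    "xba": 6, "xwoba": 5,
    "hard_hit_rate": 6, "line_drive_rate": 5,
    "pitcher_recent_form": 8, "sprint_speed": 6, "bvp": 4,
    "ballpark_factor": 3, "bullpen_quality": 3,
}


def normalize(value: float, floor: float, target: float) -> float:
    """Hypothetical linear scaling: floor maps to 0.0, target to 1.0, clamped."""
    if target == floor:
        raise ValueError("floor and target must differ")
    scaled = (value - floor) / (target - floor)
    return max(0.0, min(1.0, scaled))


def base_score(normalized: dict) -> float:
    """Sum of weight * normalized value; factors not supplied contribute 0."""
    return sum(weight * normalized.get(name, 0.0) for name, weight in WEIGHTS.items())
```

A batter at or above the target on every factor would reach the full 100 points before modifiers are applied.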

BvP Redistribution

When the batter has fewer than 5 career at-bats against today’s pitcher, the BvP data is unreliable. Its 4 points are zeroed out and 1 point each is added to K rate, contact rate, pitcher form, line drive rate, and L15 BA — factors with more stable data. This adds 5 points to the pool (net +1), so the total possible score is 101 when BvP is insufficient.
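The redistribution rule can be sketched as a small transformation of the weight table. The dictionary keys and the function name are illustrative, not the system's actual identifiers.

```python
BVP_MIN_AT_BATS = 5

# Factors that each receive 1 extra point when BvP is unreliable.
REDISTRIBUTION_TARGETS = (
    "k_rate_inverted", "contact_rate", "pitcher_recent_form",
    "line_drive_rate", "l15_ba",
)

DEFAULT_WEIGHTS = {
    "contact_rate": 16, "k_rate_inverted": 7,
    "season_ba": 10, "l15_ba": 8, "l7_ba": 7, "career_ba": 6,
    "xba": 6, "xwoba": 5,
    "hard_hit_rate": 6, "line_drive_rate": 5,
    "pitcher_recent_form": 8, "sprint_speed": 6, "bvp": 4,
    "ballpark_factor": 3, "bullpen_quality": 3,
}


def weights_for_matchup(bvp_at_bats: int) -> dict:
    """Return the weight table, zeroing BvP when the sample is too small."""
    adjusted = dict(DEFAULT_WEIGHTS)
    if bvp_at_bats < BVP_MIN_AT_BATS:
        adjusted["bvp"] = 0
        for name in REDISTRIBUTION_TARGETS:
            adjusted[name] += 1
    return adjusted
```

With 3 career at-bats the weights sum to 101; with 12 they stay at 100.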

Discipline Adjustments (Chase & Whiff)

After K rate and contact rate points are calculated, they’re adjusted by the batter’s plate discipline percentiles from Statcast. This rewards batters who don’t chase pitches outside the zone and penalizes free swingers.

Chase Rate → adjusts K% points

How often the batter swings at pitches outside the strike zone. Lower chase rate = better discipline = up to 15% boost to K% points. High chase rate = up to 15% penalty.

Whiff Rate → adjusts Contact% points

How often the batter swings and misses. Lower whiff rate = better contact ability = up to 15% boost to contact rate points. High whiff rate = up to 15% penalty.

Both adjustments use a percentile scale where the 50th percentile is neutral (no change). The 1st percentile (best discipline) gets the full +15% boost; the 99th percentile (worst) gets the full -15% penalty.
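One way to picture the percentile scale is as a multiplier applied to the K% or Contact% points. The sketch below assumes the adjustment is linear between the three anchor points given above (1st, 50th, 99th percentile); the guide does not say whether the real curve is linear.

```python
MAX_ADJUSTMENT = 0.15  # full boost or penalty, per the text


def discipline_multiplier(percentile: float) -> float:
    """1st percentile gives 1.15, 50th gives 1.00, 99th gives 0.85.

    Linear interpolation between those anchors is an assumption.
    """
    clamped = max(1.0, min(99.0, percentile))
    return 1.0 + MAX_ADJUSTMENT * (50.0 - clamped) / 49.0


def adjust_points(points: float, percentile: float) -> float:
    """Apply the chase-rate (for K%) or whiff-rate (for Contact%) adjustment."""
    return points * discipline_multiplier(percentile)
```

For example, a batter with 12 contact-rate points and a 1st-percentile whiff rate would end up with 12 × 1.15 = 13.8 points under this sketch.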

Modifiers That Adjust the Score

After the 15 weighted factors are totaled, modifiers are applied on top. These capture matchup dynamics that don’t fit neatly into a single stat.

Modifier	Effect	Why it matters
Platoon advantage (LHH vs RHP)	+4	Left-handed batters have the largest historical advantage against right-handed pitchers (+28 wOBA points). This is the biggest platoon boost.
Switch hitter	+3	Switch hitters always bat from the opposite side, guaranteeing a platoon advantage regardless of the pitcher.
Platoon advantage (RHH vs LHP)	+2	Right-handed batters have a smaller but real advantage against lefties (+16 wOBA points).
Same-hand disadvantage	-3	Same-hand matchups (RHH vs RHP, LHH vs LHP) are harder. Breaking balls move away from same-side hitters.
Elite pitcher penalty	-3 to -15	Based on the pitcher’s blended FIP (current + prior season). Scaled by grade: -3 (grade 1), -5 (grade 2), -8 (grade 3), -12 (grade 4), -15 (grade 5). Grade 4-5 aces also reduce the batter’s xBA and xwOBA scores by 15-20% before the subtotal.
Weak pitcher bonus	+3 to +10	Pitchers with a blended FIP of 4.50+ are giving up hits to everyone. The bonus scales: +3 (FIP 4.50-4.99), +7 (FIP 5.00-5.49), +10 (FIP 5.50+).
BABIP correction	0 to -4	Batters with an unsustainably high batting average on balls in play (BABIP > .340) who are also outperforming their career average by .050+ get a haircut: -1 point per .025 of BABIP above .340, capped at -4. Requires 100+ PA to activate.
Injury risk	-2	Flat penalty if the batter has a recent injury flag. They may leave the game early, reducing plate appearances.
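The modifier table can be read as a handful of small lookup rules. The sketch below covers the platoon, elite pitcher, weak pitcher, and BABIP rows; function names and argument conventions are hypothetical. It assumes the BABIP haircut is applied in whole-point steps (one point per full .025 above .340), which the table does not state explicitly, and it leaves out the 15-20% xBA/xwOBA reduction for grade 4-5 aces.

```python
# Elite pitcher penalty by grade, as listed in the table (grade 0 = not elite).
ELITE_PENALTY_BY_GRADE = {0: 0, 1: -3, 2: -5, 3: -8, 4: -12, 5: -15}


def platoon_modifier(batter_hand: str, pitcher_hand: str) -> int:
    """batter_hand is 'L', 'R', or 'S' (switch); pitcher_hand is 'L' or 'R'."""
    if batter_hand == "S":
        return 3
    if batter_hand == pitcher_hand:
        return -3
    return 4 if batter_hand == "L" else 2


def weak_pitcher_bonus(blended_fip: float) -> int:
    if blended_fip >= 5.50:
        return 10
    if blended_fip >= 5.00:
        return 7
    if blended_fip >= 4.50:
        return 3
    return 0


def babip_correction(babip: float, season_ba: float, career_ba: float,
                     plate_appearances: int) -> int:
    """Return 0 to -4. Works in integer thousandths to avoid float rounding."""
    if plate_appearances < 100:
        return 0
    excess = round(babip * 1000) - 340
    ba_gap = round(season_ba * 1000) - round(career_ba * 1000)
    if excess <= 0 or ba_gap < 50:
        return 0
    return -min(4, excess // 25)  # whole-point steps are an assumption
```

A left-handed batter facing a right-hander with a blended FIP of 5.20 would pick up +4 and +7 from the first two functions.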

How Pitcher Quality is Measured (FIP)

The elite and weak pitcher grades above are driven by FIP (Fielding Independent Pitching) rather than ERA. FIP isolates what the pitcher actually controls — strikeouts, walks, and home runs — and strips out defense and luck.

FIP = ((13 × HR/9) + (3 × BB/9) - (2 × K/9)) / 9 + constant

The constant (currently 3.10) calibrates FIP to league ERA. It drifts slightly from year to year as the league run environment changes.
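The formula translates directly into code. The sketch below implements the per-nine form shown above and, for comparison, the equivalent counting-stat form (dividing by innings pitched instead of working from per-nine rates). Note that the commonly published FIP definition also adds hit-by-pitches to walks; this sketch follows the guide's formula and omits them.

```python
FIP_CONSTANT = 3.10  # value quoted in this guide; it varies by season


def fip_from_rates(hr9: float, bb9: float, k9: float,
                   constant: float = FIP_CONSTANT) -> float:
    """FIP from per-nine-inning rates, exactly as written in the guide."""
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant


def fip_from_counts(hr: int, bb: int, k: int, innings: float,
                    constant: float = FIP_CONSTANT) -> float:
    """Same quantity from raw counts: per-nine rates are count * 9 / innings."""
    return (13 * hr + 3 * bb - 2 * k) / innings + constant
```

Both functions give the same result for the same pitcher, e.g. 20 HR, 50 BB, and 180 K over 180 innings corresponds to rates of 1.0, 2.5, and 9.0 per nine.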

Early in the season a pitcher may have only a few innings pitched, making their current FIP unreliable. The system blends current and prior-season FIP using the Marcel reliability formula:

weight = IP / (IP + 50)

blended FIP = weight × current FIP + (1 - weight) × prior FIP

Season point	~IP	Current weight	Prior weight
Opening day	0	0%	100%
1st start	6	10.7%	89.3%
1 month	30	37.5%	62.5%
Mid-season	100	66.7%	33.3%
Full season	180	78.3%	21.7%

For rookies or pitchers with fewer than 30 IP the previous season, the prior FIP defaults to the league average (4.00). This means unknown pitchers are treated as average until they prove otherwise.
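Putting the blend and the rookie fallback together, a minimal sketch might look like this. The constants come from the text above; the function name and signature are illustrative.

```python
from typing import Optional

LEAGUE_AVERAGE_FIP = 4.00  # fallback prior quoted in the guide
RELIABILITY_IP = 50        # the "50" in weight = IP / (IP + 50)
MIN_PRIOR_IP = 30          # below this, the prior season is ignored


def blended_fip(current_fip: float, current_ip: float,
                prior_fip: Optional[float] = None,
                prior_ip: float = 0.0) -> float:
    """Reliability-weighted blend of current and prior-season FIP."""
    if prior_fip is None or prior_ip < MIN_PRIOR_IP:
        prior_fip = LEAGUE_AVERAGE_FIP
    weight = current_ip / (current_ip + RELIABILITY_IP)
    return weight * current_fip + (1 - weight) * prior_fip
```

For a pitcher with a 3.00 FIP through 30 innings and a 4.00 FIP over 150 innings last year, the weight is 37.5% and the blend is 0.375 × 3.00 + 0.625 × 4.00 = 3.625.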

From Score to Hit Probability

The raw score (0-100) is converted into a per-game hit probability using a calibrated sigmoid function fitted against 7 seasons of historical outcomes. The key insight: lineup position matters because batters higher in the order get more plate appearances per game.

How it works:

1. The score is fed through a sigmoid function to get a base probability of getting a hit in the game (calibrated at the league-average 3.89 PA/game).
2. That probability is inverted to find the per-plate-appearance hit rate.
3. The per-PA rate is expanded back to a game-level probability using the batter’s expected plate appearances based on lineup position (leadoff = 4.65 PA, 9th = 3.75 PA).

This is why a leadoff hitter with a score of 60 can have a higher hit probability than a 9-hole hitter with a score of 70.

Score	Base P(hit)	Leadoff (#1)	Cleanup (#4)	7-hole (#7)	9-hole (#9)
40	59.2%	72.6%	68.5%	65.5%	63.7%
50	63.1%	77.8%	73.6%	70.4%	68.8%
60	66.9%	81.7%	78.0%	75.1%	73.4%
70	70.4%	85.2%	81.8%	79.2%	77.6%
80	73.7%	88.0%	85.2%	82.8%	81.5%

Expected PA by lineup position: #1 = 4.65, #2 = 4.53, #3 = 4.31, #4 = 4.19, #5 = 4.07, #6 = 3.95, #7 = 3.89, #8 = 3.82, #9 = 3.75
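The invert-and-expand steps can be sketched as follows. This is a simplified reading, not the system's implementation: it assumes every plate appearance is independent with the same hit rate, and it takes the base probability as an input because the sigmoid coefficients are not given in this guide. The table's figures come from the system's own calibration and will not necessarily match this simplified version.

```python
# Expected plate appearances by lineup slot, as listed above.
EXPECTED_PA = {1: 4.65, 2: 4.53, 3: 4.31, 4: 4.19, 5: 4.07,
               6: 3.95, 7: 3.89, 8: 3.82, 9: 3.75}
CALIBRATION_PA = 3.89  # league-average PA/game used for the base probability


def per_pa_hit_rate(base_game_prob: float,
                    calibration_pa: float = CALIBRATION_PA) -> float:
    """Invert P(at least one hit in n PA) = 1 - (1 - p)^n to recover p."""
    return 1 - (1 - base_game_prob) ** (1 / calibration_pa)


def lineup_adjusted_prob(base_game_prob: float, lineup_slot: int) -> float:
    """Expand the per-PA rate back out over the slot's expected PA."""
    per_pa = per_pa_hit_rate(base_game_prob)
    return 1 - (1 - per_pa) ** EXPECTED_PA[lineup_slot]
```

Under these assumptions the same base probability always yields a higher game-level probability in the leadoff slot than in the 9-hole, which is the effect the section describes.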

The ML Model — A Second Opinion

The ML probability comes from a LightGBM model that was trained completely independently from the scoring formula. It sees raw data, not scores.

Training data

180,000+ historical batter-game records from 2019-2023, tested against 2024-2025 holdout data. Each record has 23 features built from rolling stats (not end-of-season aggregates) to avoid lookahead bias. The model learned what patterns predict hits from real in-season data.

What it knows

23 features including batting averages (L7, L15, season, career), prior-season pitcher stats (ERA, WHIP, FIP, K/9, IP), Statcast metrics (xBA, xwOBA), hard hit rate, sprint speed, contact rate, line drive rate, rest days, home/away, lineup position, ballpark factor, and platoon matchup.

Top 5 most important features (by model weight)

Lineup Position

Career BA

Line Drive Rate

Season PA

Hard Hit Rate

Lineup position ranking as the most important feature reinforces why Hit Probability (which accounts for PA from lineup position) is the recommended primary metric.

Historical performance

Baseline hit rate (any starter): 59.1%

Top 10% picks hit rate: 64.2%

Top 5% picks hit rate: 67.2%

Confidence indicator: When Hit Prob and ML Prob agree within 3 percentage points, the card shows a “High confidence” badge. When they disagree by 10+ points, you’ll see a “Low confidence” warning. This doesn’t mean the pick is bad — just that the two methods see different things.
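The badge rule amounts to a comparison of the two probabilities. In the sketch below the thresholds are taken from the paragraph above; treating both boundaries as inclusive, and returning no badge for gaps in between, are assumptions.

```python
from typing import Optional

HIGH_CONFIDENCE_MAX_GAP = 3.0   # percentage points
LOW_CONFIDENCE_MIN_GAP = 10.0   # percentage points


def agreement_badge(hit_prob_pct: float, ml_prob_pct: float) -> Optional[str]:
    """Return the badge text, or None when neither badge applies."""
    gap = abs(hit_prob_pct - ml_prob_pct)
    if gap <= HIGH_CONFIDENCE_MAX_GAP:
        return "High confidence"
    if gap >= LOW_CONFIDENCE_MIN_GAP:
        return "Low confidence"
    return None
```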

Ranking Tabs & Combined Score

The main page offers four ways to sort batters. Each tab shows a live accuracy badge based on historical results for that method.

Score

Sorted by the raw 100-point score. Best for seeing overall matchup quality without lineup position adjustment.

Hit Prob (default)

Sorted by hit probability. The recommended primary sort because it accounts for lineup position and expected plate appearances.

ML Prob

Sorted by the independent LightGBM model’s prediction. Useful when you want a pure data-driven ranking with no formula tuning.

Combined

A 3-way rank average: each batter’s rank position in Score, Hit Prob, and ML Prob is averaged. The lowest average rank (best consensus across all methods) sorts to the top. This is the best way to find batters that all three methods agree on.
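A 3-way rank average can be sketched in a few lines. The batter names and values in the usage note are made up, and the tie-breaking behavior (input order) is an assumption; the guide does not say how ties are resolved.

```python
def rank_positions(values: dict) -> dict:
    """Rank 1 goes to the highest value; ties keep their input order."""
    ordered = sorted(values, key=lambda name: values[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def combined_ranking(score: dict, hit_prob: dict, ml_prob: dict) -> list:
    """Average each batter's rank across the three methods, best first."""
    rank_tables = [rank_positions(metric) for metric in (score, hit_prob, ml_prob)]
    average_rank = {
        name: sum(table[name] for table in rank_tables) / len(rank_tables)
        for name in score
    }
    return sorted(average_rank.items(), key=lambda item: item[1])


# Hypothetical example: B is 2nd by Score but 1st by both probabilities.
example = combined_ranking(
    score={"A": 70, "B": 65, "C": 60},
    hit_prob={"A": 78.0, "B": 80.0, "C": 70.0},
    ml_prob={"A": 66.0, "B": 68.0, "C": 64.0},
)
```

Here B's average rank (4/3) beats A's (5/3), so B sorts to the top even though A has the higher raw score.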

Pool Size

You can filter to show #1 Pick, Top 3, Top 5, or Top 15 batters. The accuracy badges on each tab update to reflect only the selected pool size. Your selection is saved between visits.

AI Picks

Each day at 11:00 AM ET, the system generates two AI-selected picks displayed in the card at the top of the main page. These are the system’s best recommendations for your BTS picks that day.

Confidence badges

Each AI pick shows a confidence level: High (strong agreement across methods), Medium (reasonable signal), or Low (methods disagree or marginal stats).

Streak tracking

The AI picks card tracks two streak types: 1x (singles) where only the #1 pick needs a hit each day, and 2x (doubles) where both picks must hit or the streak resets. These mirror the BTS game’s single pick and double down modes.

On past dates, each AI pick shows a green HIT or red MISS badge so you can track historical performance. If the primary AI picks couldn’t be generated, the card shows a “Fallback” indicator.

Accuracy Tracker

The Accuracy page tracks how well each ranking method predicts hits over time. It answers the question: “If I had followed this method’s top picks every day, how often would they have gotten hits?”

Five methods tracked

Score, Hit Prob, ML Prob, Combined, and Consensus (a method that only picks batters appearing in the top N across multiple methods). Each gets a hero card with cumulative accuracy and head-to-head day wins.

Filters and views

Filter by pool size (#1, Top 3, Top 5, Top 15) and date range. Toggle between a daily accuracy line chart and a $1 bankroll simulation. Expand any day in the table to see exactly which batters were picked and whether they got hits.

BTS Streak Simulator

At the bottom of the accuracy page, the streak simulator shows what your Beat the Streak score would be if you had followed each method’s top picks every day. It tracks singles mode (only #1 pick must hit) and double down mode (both top 2 picks must hit). The method with the longest best streak gets highlighted. This is the closest thing to a real backtest of how the system would perform in the actual BTS game.
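The two streak modes can be sketched as simple loops over daily results. This follows the rules described in this guide (a successful double down advances the streak by two, any miss resets it to zero); the function names and input shapes are illustrative.

```python
def best_singles_streak(top_pick_hits: list) -> int:
    """Longest streak when only the #1 pick is played each day."""
    best = current = 0
    for hit in top_pick_hits:
        current = current + 1 if hit else 0
        best = max(best, current)
    return best


def best_double_down_streak(daily_pairs: list) -> int:
    """Longest streak when both top picks are played every day.

    Each element is a (first_pick_hit, second_pick_hit) pair of booleans.
    """
    best = current = 0
    for first_hit, second_hit in daily_pairs:
        current = current + 2 if (first_hit and second_hit) else 0
        best = max(best, current)
    return best
```

For the sequence hit, hit, miss, hit, hit, hit, the best singles streak is 3; two successful double-down days followed by a split day give a best double-down streak of 4.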

The PA Confidence System

Some metrics need a minimum sample of plate appearances before they’re reliable. The PA confidence system blends these metrics toward a prior-season anchor until the batter has enough plate appearances for the stat to stabilize. Not all metrics are blended — several use raw points directly.

Unblended (raw points)

Always trusted

L7 BA, L15 BA, career BA, sprint speed, ballpark factor, pitcher recent form, BvP, bullpen quality.

These either don’t depend on current-season sample size (career BA, sprint speed, ballpark) or are rolling windows already bounded by their time period (L7, L15). What you see is what the batter is actually doing.

Fast Tier

Full trust at 60 PA

~2-3 weeks of games

Season BA, K rate, contact rate, hard hit rate, line drive rate. These need some data to be reliable but stabilize relatively quickly. Blended with prior-season values until 60 PA.

Slow Tier

Full trust at 200 PA

~8-9 weeks of games

xBA and xwOBA only. Statcast expected metrics need the most data because they rely on batted ball distributions that take longer to stabilize.
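The tiering can be illustrated with a simple blend toward the prior-season value. The guide does not give the blending curve; the sketch below assumes a linear ramp, chosen only because it reaches full trust exactly at the stated thresholds. The real system may use a different curve.

```python
# PA thresholds for full trust, from the tier descriptions above.
FULL_TRUST_PA = {"fast": 60, "slow": 200}


def pa_blend(current_value: float, prior_value: float,
             plate_appearances: int, tier: str) -> float:
    """Blend a current-season stat with its prior-season anchor.

    trust rises linearly from 0 to 1 as PA approaches the tier threshold
    (the linear shape is an assumption, not taken from the guide).
    """
    trust = min(1.0, plate_appearances / FULL_TRUST_PA[tier])
    return trust * current_value + (1 - trust) * prior_value
```

Under this assumption, a batter hitting .300 through 30 PA with a .260 prior-season average would be scored on a blended .280 season BA, while an xBA with the same inputs at 50 PA would sit at .270.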

What about career BA and pitcher matchups?

About 20 points of the score come from factors that don’t depend on current-season plate appearances at all: career batting average (6), opposing pitcher form (8), ballpark factor (3), and bullpen quality (3). Another 15 points come from the L7/L15 rolling windows, which use raw performance. This means roughly 35% of the score is reliable from day one, before any PA-based blending kicks in. Sprint speed (6) and BvP (4, when the batter has 5+ at-bats against the pitcher) are also unblended, which brings the raw-point total to 45.

Data Sources & Freshness

The system pulls from three data sources, each with different update cadences.

Source	What it provides	Update timing
MLB Stats API	Game schedules, confirmed lineups, player batting stats (season, career, game logs), pitcher stats, BvP history	Real-time. Lineups typically confirmed 1-2 hours before first pitch.
Statcast / Baseball Savant	xBA, xwOBA, exit velocity, barrel rate, hard hit rate, sprint speed, chase rate, whiff rate percentiles	Updated daily, typically by mid-morning. Prior-season data used as fallback early in the year.
Pybaseball (Statcast bulk)	Prior-season Statcast data for PA confidence anchors (xBA, xwOBA, hard hit rate, sprint speed)	Static per season. Loaded once for prior-year anchors.

Projected lineups: If lineups haven’t been confirmed yet, the system uses projected lineups based on recent history. You’ll see an amber warning banner on the main page when this happens. Projected lineup data may be less accurate for lineup position, which affects hit probability calculations.

Limitations

No model is perfect. Here’s what the system can’t account for.

Late scratches & pinch hitters

If a batter is scratched after lineups are confirmed, or gets pinch-hit for early, they may get fewer PA than expected. The system can’t predict mid-game decisions.

Weather & delays

Rain delays, postponements, and shortened games can reduce plate appearances or cancel games entirely. The system doesn’t factor in weather forecasts.

Unreported injuries

A batter playing through a minor injury that hasn’t been reported won’t trigger the injury modifier. The system only knows what the API reports.

Pitcher changes

The score is calculated against the announced starting pitcher. If the starter gets pulled early or an opener is used, later at-bats may be against a different pitcher than the one scored.

Early-season noise

In the first 2-3 weeks, current-season stats are based on small samples. The PA confidence system smooths this for some metrics, but L7/L15 averages can be volatile. A 3-for-5 day can swing a batter from .200 to .333 in L7.

The 59% baseline

The baseline hit rate for any MLB starter is ~59%. Even the best picks only push that to ~67-70%. Getting a hit in any single game is inherently uncertain — the edge comes from making consistently good picks over many games.

Tips for Making BTS Picks

Use Hit Probability as your primary metric

It accounts for lineup position, so a leadoff hitter with a good score is better than a 9-hole hitter with a great score. The “Hit Prob” tab is the default sort for this reason.

Look for agreement between Hit Prob and ML Prob

When both methods rank a batter highly, that’s your strongest signal. The “Combined” tab does a 3-way rank average across Score, Hit Prob, and ML Prob to surface consensus picks.

Favor leadoff and 2-hole hitters

They get ~4.5-4.65 plate appearances per game vs ~3.75-3.89 for the bottom of the order. That gap of roughly 0.6-0.9 plate appearances is worth about 5-9 percentage points of hit probability at the same score.

Avoid batters facing elite pitchers

Even great hitters struggle against grade 3+ pitchers. A -8 to -15 point penalty is significant. Check the opposing pitcher’s elite grade in the matchup info on each card.

Early in the season, lean on matchups

Career batting average, platoon advantage, weak pitcher bonuses, and ballpark factors are all reliable from day one. L7 and L15 averages show real performance even with small samples. Current-season stats like season BA and xBA take a few weeks to become fully trusted.

Small edges matter: you’re playing the long game

A 3-4% edge in hit probability doesn’t sound like much, but over 57 picks in a BTS streak attempt, it compounds significantly. Consistent small edges beat occasional home runs.
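A quick calculation shows how the compounding works. The sketch assumes one pick per day, independent outcomes, and a constant per-pick hit probability, none of which holds exactly in practice; the 67% and 70% inputs are illustrative.

```python
STREAK_TARGET = 57


def streak_probability(per_pick_hit_prob: float,
                       picks: int = STREAK_TARGET) -> float:
    """Chance that every one of `picks` independent picks gets a hit."""
    return per_pick_hit_prob ** picks


baseline = streak_probability(0.67)
with_edge = streak_probability(0.70)
improvement_ratio = with_edge / baseline
```

Moving from 67% to 70% per pick multiplies the chance of completing 57 straight picks by roughly 12, although the absolute probability remains tiny in both cases.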

Check the accuracy page regularly

The accuracy tracker shows which ranking method is performing best recently. Methods go on hot and cold streaks. If ML Prob has been outperforming Hit Prob for the last week, consider weighting its picks more heavily.

Watch for the projected lineup warning

If you see an amber banner about projected lineups, the lineup positions haven’t been confirmed yet. Hit probability calculations may shift once real lineups are released. Consider waiting until lineups are confirmed before locking in picks.

Glossary

Quick reference for baseball stats and terms used throughout the system.

Term	Definition
BA	Batting Average. Hits divided by at-bats. A .300 BA means getting a hit 30% of the time.
PA	Plate Appearances. Every time a batter completes a turn at bat (includes walks, HBP, sacrifices). More comprehensive than at-bats.
xBA	Expected Batting Average. Based on how hard and at what angle the batter hits the ball (exit velocity + launch angle), stripped of fielding and luck. From Statcast.
xwOBA	Expected Weighted On-Base Average. Like xBA but weights extra-base hits more heavily. A more complete measure of offensive production, luck-adjusted.
BABIP	Batting Average on Balls In Play. Hits minus home runs, divided by at-bats minus strikeouts and home runs plus sacrifice flies. League average is ~.300. Significantly above .340 often indicates unsustainable luck.
FIP	Fielding Independent Pitching. Measures what a pitcher controls (K, BB, HR) and strips out defense. Better than ERA for evaluating pitcher talent.
K%	Strikeout Rate. Percentage of plate appearances that end in a strikeout. Lower is better for BTS (more balls in play = more hit chances).
Contact%	Contact Rate. Percentage of swings where the bat makes contact with the ball. Higher is better. The strongest single predictor of getting a hit.
Hard Hit%	Percentage of batted balls with 95+ mph exit velocity. Hard-hit balls become hits more often.
LD%	Line Drive Rate. Percentage of batted balls that are line drives. Line drives have the highest batting average of any batted ball type (~.680).
Sprint Speed	Measured in feet per second. Faster runners beat out more infield hits and have more range on the bases. League average is ~27 ft/sec.
BvP	Batter vs. Pitcher. Historical head-to-head batting average. Requires 5+ at-bats to be considered reliable. Small samples can be misleading.
Platoon Split	The performance difference when a batter faces an opposite-hand pitcher (LHH vs RHP or RHH vs LHP). Most batters perform better against opposite-hand pitching.
Chase Rate	How often a batter swings at pitches outside the strike zone. Lower is better (better discipline). Measured in Statcast percentiles.
Whiff Rate	How often a batter swings and misses. Lower is better (better contact ability). Measured in Statcast percentiles.
Marcel Blending	A reliability-weighted average that blends current and prior-season stats. Named after the forecasting method. Used for pitcher FIP and batter PA confidence in this system.