Finding the Best Game to Stream: A Data-Driven Approach
Every streamer knows the dilemma: you want to play something you own, but you also want viewers to actually find your stream. Browsing Twitch manually is slow and random. I built Best Game to solve this — it reads your game library, cross-references it with live Twitch data, and ranks every game by discovery potential. Here’s how the recommendation engine works.
The Problem
As a small streamer, your biggest challenge is discoverability. Streaming a game with 200 other streamers means you’re buried at the bottom of the browse page. Streaming a game with zero viewers means nobody is searching for it. The sweet spot is somewhere in between — a game with enough demand to be searched, but few enough streams that you’ll actually be seen.
Best Game automates finding that sweet spot across your entire library.
Architecture Overview
```text
┌──────────────────────────────────────────────────────────────┐
│                     Your Game Libraries                      │
│  Steam Web API  │ GOG Galaxy DB │ Epic Manifest │    Retro   │
└────────┬────────────────┬───────────────┬──────────────┬─────┘
         │                │               │              │
         ▼                ▼               ▼              ▼
┌──────────────────────────────────────────────────────────────┐
│              Game Name → Twitch Category Match               │
│   Exact match → Substring match → Fuzzy word overlap (0.8)   │
└───────────────────────────┬──────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────┐
│       Batch Twitch Helix API: Per-Stream Viewer Stats        │
│    Whale adjustment → Median-blended average → Top share     │
└───────────────────────────┬──────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────┐
│                 Discovery Score Calculation                  │
│      Visibility (0.45) + Demand (0.35) + Floor (0.10) +      │
│                        Balance (0.10)                        │
└───────────────────────────┬──────────────────────────────────┘
                            │
                            ▼
                 ┌──────────────────────┐
                 │ Sorted Dashboard     │
                 │ + History            │
                 │ + Opportunity Finder │
                 └──────────────────────┘
```
Multi-Platform Library Discovery
Best Game reads your games from four sources:
Steam — Uses the Steam Web API (IPlayerService/GetOwnedGames) with your API key and SteamID64. Includes free-to-play games via include_played_free_games=1 and tags installed games by parsing libraryfolders.vdf.
GOG Galaxy — Reads directly from GOG’s local SQLite database (galaxy-2.0.db). The query joins across LibraryReleases, GamePieces, and GameTimes to extract titles, images, playtime, and install status, keeping only rows where isDlc = 0 so that DLC is excluded.
Epic Games — Parses two kinds of local files: catcache.bin (a base64-encoded JSON catalog) for ownership data, and the *.item manifests for install status. Non-game entries are filtered out by checking for the "games" category.
Retro — A static catalog of console games (NES, SNES, PS1, etc.) cross-referenced against libretro thumbnail databases, making retro streaming viable for discovery.
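The Epic ownership step described above might be sketched as follows. This is a minimal sketch, not the project's code: it assumes catcache.bin decodes to a JSON array of catalog entries, each carrying a `title` and a `categories` list of `{"path": ...}` objects, and the helper names `parse_catcache`, `is_game`, and `epic_game_titles` are hypothetical.

```python
import base64
import json
from pathlib import Path


def parse_catcache(data: bytes) -> list[dict]:
    """Decode catcache.bin contents: base64 text wrapping a JSON array (assumed layout)."""
    return json.loads(base64.b64decode(data))


def is_game(entry: dict) -> bool:
    """Keep only entries tagged with the "games" category (assumed structure)."""
    return any(cat.get("path") == "games" for cat in entry.get("categories", []))


def epic_game_titles(catcache_path: Path) -> list[str]:
    """Hypothetical helper: titles of owned catalog entries that are games."""
    entries = parse_catcache(catcache_path.read_bytes())
    return [entry["title"] for entry in entries if is_game(entry)]
```

Install status from the *.item manifests would be merged in separately; that part is not shown here.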
All platforms normalize to a common format: {appid, name, platform, playtime, last_played}.
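A sketch of the Steam request and the shared record follows. The query parameters (`key`, `steamid`, `include_appinfo`, `include_played_free_games`, `format`) and the response fields `appid`, `name`, `playtime_forever` (minutes), and `rtime_last_played` (Unix time) belong to the public GetOwnedGames API. The `LibraryGame` dataclass, storing `appid` as text so GOG and Epic IDs fit too, and the function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

OWNED_GAMES_URL = "https://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/"


def owned_games_url(api_key: str, steam_id64: str) -> str:
    """Build the GetOwnedGames request URL (fetch it with any HTTP client)."""
    params = {
        "key": api_key,
        "steamid": steam_id64,
        "include_appinfo": 1,            # needed to get game names back
        "include_played_free_games": 1,  # include free-to-play titles
        "format": "json",
    }
    return f"{OWNED_GAMES_URL}?{urlencode(params)}"


@dataclass
class LibraryGame:
    """Hypothetical common record shared by every platform reader."""
    appid: str                  # platform-specific ID, stored as text
    name: str
    platform: str               # e.g. "steam", "gog", "epic", "retro"
    playtime: int               # minutes
    last_played: Optional[int]  # Unix timestamp, None if unknown or never played


def normalize_steam(entry: dict) -> LibraryGame:
    """Map one item from the response's games list onto the common record."""
    return LibraryGame(
        appid=str(entry["appid"]),
        name=entry.get("name", ""),
        platform="steam",
        playtime=entry.get("playtime_forever", 0),
        last_played=entry.get("rtime_last_played") or None,  # treat 0 as "never"
    )
```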
Matching Games to Twitch Categories
Once the library is loaded, each game needs to be matched to a Twitch category. The matching uses a three-tier algorithm:
```python
def _match_twitch_game(self, steam_name, twitch_games, threshold=0.8):
    """Return the Twitch category dict that best matches a library title, or None."""
    steam_lower = steam_name.lower()

    # Tier 1: Exact name match (case-insensitive)
    for game in twitch_games:
        if game["name"].lower() == steam_lower:
            return game

    # Tier 2: Substring match in either direction
    for game in twitch_games:
        twitch_lower = game["name"].lower()
        if steam_lower in twitch_lower or twitch_lower in steam_lower:
            return game

    # Tier 3: Fuzzy word overlap, ignoring common filler words
    common_words = {"the", "and", "of", "in", "to", "for"}
    steam_words = {w for w in steam_lower.split() if w not in common_words}
    for game in twitch_games:
        twitch_words = {w for w in game["name"].lower().split()
                        if w not in common_words}
        overlap = steam_words & twitch_words
        # Fraction of the library title's significant words found in the Twitch name
        if overlap and len(overlap) / max(len(steam_words), 1) >= threshold:
            return game

    return None
```
The threshold is configurable (default 0.8), and misses are cached with an exponential scan backoff — games that repeatedly fail to match get skipped more aggressively (2^n skips, capped at 64), saving API quota.
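One way to implement the miss backoff described above: after the n-th consecutive failed match, skip the next min(2^n, 64) scans for that title. The `MissBackoff` class and the exact counting (whether the first miss skips one scan or two) are assumptions for illustration, not the project's actual code.

```python
from collections import defaultdict

MAX_SKIPS = 64  # cap from the post


class MissBackoff:
    """Hypothetical helper: skip titles that repeatedly fail to match a Twitch category."""

    def __init__(self) -> None:
        self.misses = defaultdict(int)      # consecutive misses per title
        self.skips_left = defaultdict(int)  # scans still to skip per title

    def should_scan(self, title: str) -> bool:
        """Return False (and consume one skip) while the title is backing off."""
        if self.skips_left[title] > 0:
            self.skips_left[title] -= 1
            return False
        return True

    def record_miss(self, title: str) -> None:
        """Each new miss doubles the number of scans to skip, up to MAX_SKIPS."""
        self.misses[title] += 1
        self.skips_left[title] = min(2 ** self.misses[title], MAX_SKIPS)

    def record_hit(self, title: str) -> None:
        """A successful match clears the backoff state."""
        self.misses.pop(title, None)
        self.skips_left.pop(title, None)
```

Because the skip count doubles per miss, a title that never matches costs at most one lookup every 65 scans once it reaches the cap.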
The Discovery Scoring Algorithm
The heart of the system is the discovery score — a value from 0 to 1 that represents how good a streaming opportunity a game is right now. It combines four weighted factors:
Factor 1: Visibility (weight 0.45)
```python
discoverability = 12.0 / max(stream_count, 12.0)
```
This models organic browse discovery. Viewers typically scroll through only about the first 12 thumbnails, so a category with 12 or fewer streams gets the full 1.0 and your stream is likely to appear without scrolling. Beyond that, the factor shrinks in proportion to the stream count. Example: 6 streams → 1.0, 24 streams → 0.5, 48 streams → 0.25.
Factor 2: Demand (weight 0.35)
```python
effective_average = adjusted_average * 0.5 + median * 0.5
demand = min(effective_average / 5.0, 1.0)
```
Measures the average number of viewers per non-whale stream, saturating at 1.0 once that average reaches 5. The mean and median are blended 50/50 because the median better predicts what a typical small streamer will get, while the mean captures overall interest. Beyond 5 viewers per stream, the category is treated as already well-served.
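To see why the blend matters, here is the demand factor computed for a small, skewed set of per-stream viewer counts. This sketch assumes the median is taken over the same whale-adjusted counts as the mean (the post does not say which set it uses), and `demand_factor` is a hypothetical name.

```python
from statistics import fmean, median


def demand_factor(adjusted_counts: list[int]) -> float:
    """Blend mean and median 50/50, then saturate at 5 viewers per stream."""
    if not adjusted_counts:
        return 0.0
    effective_average = fmean(adjusted_counts) * 0.5 + median(adjusted_counts) * 0.5
    return min(effective_average / 5.0, 1.0)


counts = [6, 1, 1, 0]  # one decent stream, three near-empty ones
print(demand_factor(counts))          # 0.3 (blended average of 1.5 viewers)
print(min(fmean(counts) / 5.0, 1.0))  # 0.4 with the mean alone
```

With the mean alone, the single 6-viewer stream lifts the factor to 0.4; the median pulls it toward what the three small streams actually see.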
Factor 3: Audience Floor (weight 0.10)
```python
needed_viewers = max(stream_count * 2.0, 10.0)
audience_floor = min(viewer_count / needed_viewers, 1.0)
```
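Ensures there’s enough total viewership to go around: the category should have at least 2 concurrent viewers per stream. The minimum requirement of 10 viewers keeps tiny, borderline-niche categories from being inflated. For example, a single stream with 3 viewers scores 3 / 10 = 0.3 instead of a full 1.0.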
Factor 4: Balance (weight 0.10)
```python
balance = 1.0 - max(0.0, (top_stream_share - 0.5) / 0.5)
```
Penalizes categories dominated by one large streamer. If the top stream has ≤50% of viewers, balance = 1.0; at 100%, balance = 0.0; in between it falls linearly (a 75% share gives 0.5). This filters out categories where viewers come for a specific personality rather than the game.
Final Score
```python
score = discoverability * 0.45 + demand * 0.35 + audience_floor * 0.1 + balance * 0.1
```
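Putting the four factors together, the whole score fits in one small function. The weights and constants are the ones given above; the function and argument names are assumptions for illustration, and the inputs are presumed to come from the whale-adjusted stats described in the next section.

```python
def discovery_score(stream_count: int, viewer_count: int,
                    adjusted_average: float, median_viewers: float,
                    top_stream_share: float) -> float:
    """Combine visibility, demand, audience floor and balance into a 0..1 score."""
    # Visibility: full marks until the category exceeds 12 streams
    discoverability = 12.0 / max(stream_count, 12.0)
    # Demand: mean/median blend of viewers per stream, saturating at 5
    effective_average = adjusted_average * 0.5 + median_viewers * 0.5
    demand = min(effective_average / 5.0, 1.0)
    # Audience floor: 2 viewers per stream, never requiring fewer than 10 in total
    needed_viewers = max(stream_count * 2.0, 10.0)
    audience_floor = min(viewer_count / needed_viewers, 1.0)
    # Balance: penalize categories where one stream holds more than half the viewers
    balance = 1.0 - max(0.0, (top_stream_share - 0.5) / 0.5)
    return (discoverability * 0.45 + demand * 0.35
            + audience_floor * 0.1 + balance * 0.1)
```

As a worked example, a quiet category (6 streams, 30 viewers, adjusted average 4, median 3, top share 40%) scores roughly 0.895, while a crowded one (48 streams, 500 viewers, adjusted average 3, median 1, top share 90%) drops to about 0.37.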
Whale Adjustment: Filtering Out Outlier Streamers
Before scoring, viewer distribution stats are computed with a “whale adjustment” that removes outlier streams. Without this, a single 2000-viewer stream would inflate the average for a category with 10 other streams at 2 viewers each, making it look more attractive than it really is.
```python
# counts: per-stream viewer counts for the category, sorted largest first
adjusted_counts = counts
if stream_count >= 5:
    trim_count = max(1, int(stream_count * 0.1))  # Trim top 10%
    adjusted_counts = counts[trim_count:]
elif stream_count >= 2 and top_share >= 0.7 and top_count >= 25:
    adjusted_counts = counts[1:]  # Trim if top has 70%+ share and 25+ viewers
```
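The fragment above can be wrapped into a function that produces every input the score needs. This is a sketch under assumptions: `viewer_stats` and its return keys are hypothetical, `viewer_count` keeps the untrimmed total, and the median is computed over the trimmed counts.

```python
from statistics import fmean, median


def viewer_stats(per_stream_viewers: list[int]) -> dict:
    """Whale-adjusted viewer statistics for one category (hypothetical helper)."""
    counts = sorted(per_stream_viewers, reverse=True)  # largest stream first
    stream_count = len(counts)
    if stream_count == 0:
        return {"stream_count": 0, "viewer_count": 0, "adjusted_average": 0.0,
                "median": 0.0, "top_share": 0.0}
    viewer_count = sum(counts)
    top_count = counts[0]
    top_share = top_count / viewer_count if viewer_count else 0.0

    adjusted = counts
    if stream_count >= 5:
        trim_count = max(1, int(stream_count * 0.1))  # drop the top 10%, at least one
        adjusted = counts[trim_count:]
    elif stream_count >= 2 and top_share >= 0.7 and top_count >= 25:
        adjusted = counts[1:]  # small category dominated by one big stream

    return {"stream_count": stream_count, "viewer_count": viewer_count,
            "adjusted_average": fmean(adjusted), "median": median(adjusted),
            "top_share": top_share}


stats = viewer_stats([2000] + [2] * 10)
print(stats["adjusted_average"], round(stats["top_share"], 2))  # 2.0 0.99
```

Without the trim, the raw mean for that category would be about 184 viewers per stream (2020 / 11), which would saturate the demand factor on its own.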
The Opportunity Finder
Not all good opportunities are currently live. The Opportunity Finder scans your library for games that have zero streams right now but a proven history of viewership. This is the “be first to stream” strategy:
```python
def get_opportunities(min_peak=50, max_avg_vps=200.0,
                      min_live_fraction=0.1, min_snapshots=30):
    # 1. Find games with 0 current streams
    # 2. Check history: require ≥30 snapshots, with ≥10% of them live
    # 3. Verify peak viewers ≥ 50
    # 4. Filter out whale categories: avg_viewers_per_stream ≤ 200
    # 5. Sort by avg_viewers_when_live (default)
    ...
```
Each opportunity comes with historical confidence data: peak viewers ever recorded, average viewers when live, average stream count, and the fraction of snapshots that had activity. This gives you data-backed confidence to go live on a game with zero competition.
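The five steps in the outline can be expressed as a single filter over per-game history summaries. Here `GameHistory`, its fields, and `find_opportunities` are hypothetical stand-ins for whatever the history queries actually return; only the thresholds and the sort key come from the post.

```python
from dataclasses import dataclass


@dataclass
class GameHistory:
    """Hypothetical per-game summary built from stored snapshots."""
    name: str
    current_streams: int
    snapshot_count: int
    live_fraction: float          # share of snapshots with at least one stream
    peak_viewers: int
    avg_viewers_when_live: float
    avg_viewers_per_stream: float


def find_opportunities(games: list[GameHistory], min_peak: int = 50,
                       max_avg_vps: float = 200.0, min_live_fraction: float = 0.1,
                       min_snapshots: int = 30) -> list[GameHistory]:
    """Games with nobody live now but a proven, non-whale audience history."""
    candidates = [
        g for g in games
        if g.current_streams == 0                  # 1. nobody live right now
        and g.snapshot_count >= min_snapshots      # 2. enough history...
        and g.live_fraction >= min_live_fraction   #    ...and live often enough
        and g.peak_viewers >= min_peak             # 3. proven peak demand
        and g.avg_viewers_per_stream <= max_avg_vps  # 4. not a whale category
    ]
    # 5. best historical audience first
    return sorted(candidates, key=lambda g: g.avg_viewers_when_live, reverse=True)
```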
Historical Heatmaps
The system also builds a 7×24 heatmap (day-of-week × hour-of-day) for each game, computing average viewers, streams, and discovery scores for each time slot. This reveals patterns like:
- A game that peaks at 8 PM on Fridays but is dead on Tuesdays
- A game with consistent low viewership perfect for morning streams
- A game with weekend spikes driven by a single streamer
The heatmap is computed on-the-fly from historical snapshot data:
```python
def get_heatmap(game_id, days=7):
    # Query snapshots, bucket by (day_of_week, hour)
    # For each cell: avg_viewers, avg_streams, avg_discovery, count
    # Returns 7×24 grid
    ...
```
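The bucketing step could look like this. It is a sketch that assumes snapshots arrive as `(timestamp, viewers, streams, discovery)` tuples with timestamps already in the streamer's local time zone, and it uses Python's `datetime.weekday()` (Monday = 0) for the day axis; `build_heatmap` is a hypothetical name.

```python
from datetime import datetime
from typing import Iterable, Optional

Snapshot = tuple[datetime, int, int, float]  # (taken_at, viewers, streams, discovery)


def build_heatmap(snapshots: Iterable[Snapshot]) -> list[list[Optional[dict]]]:
    """Average snapshots into a 7x24 grid indexed as grid[weekday][hour]."""
    # Running totals per cell: [viewers, streams, discovery, count]
    totals = [[[0.0, 0.0, 0.0, 0] for _ in range(24)] for _ in range(7)]
    for taken_at, viewers, streams, discovery in snapshots:
        cell = totals[taken_at.weekday()][taken_at.hour]
        cell[0] += viewers
        cell[1] += streams
        cell[2] += discovery
        cell[3] += 1

    grid: list[list[Optional[dict]]] = []
    for day in totals:
        row: list[Optional[dict]] = []
        for viewers, streams, discovery, count in day:
            # Empty slots stay None to distinguish "no data" from "zero viewers"
            row.append({"avg_viewers": viewers / count, "avg_streams": streams / count,
                        "avg_discovery": discovery / count, "count": count}
                       if count else None)
        grid.append(row)
    return grid
```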
Background Collection
A background collector thread snapshots viewer counts for all known Twitch games every 15 minutes. It never runs concurrently with a game processing job and hits the batch endpoint (up to 100 game IDs per API call) to minimize API usage:
POST /api/history/collect → Manual trigger
Background thread (15 min) → Automatic
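A minimal sketch of the collector's scheduling, assuming a shared `threading.Lock` that the game-processing job also holds while it runs, and a caller-supplied `fetch_batch` function that wraps the actual Twitch request. The batch size reflects Helix's Get Streams limit of 100 `game_id` values per call; `chunked`, `collect_once`, `collector_loop`, and `job_lock` are hypothetical names.

```python
import threading
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 100            # Helix Get Streams accepts up to 100 game_id values
INTERVAL_SECONDS = 15 * 60  # snapshot every 15 minutes

# Hypothetical: the game-processing job acquires this same lock while it runs
job_lock = threading.Lock()


def chunked(ids: Sequence[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def collect_once(game_ids: Sequence[str],
                 fetch_batch: Callable[[list[str]], None]) -> bool:
    """Run one snapshot pass; skip it if a processing job currently holds the lock."""
    if not job_lock.acquire(blocking=False):
        return False
    try:
        for batch in chunked(game_ids):
            fetch_batch(batch)  # one API call per batch of up to 100 games
        return True
    finally:
        job_lock.release()


def collector_loop(get_game_ids: Callable[[], Sequence[str]],
                   fetch_batch: Callable[[list[str]], None],
                   stop: threading.Event) -> None:
    """Body for a daemon thread: collect, then sleep until the next interval or stop."""
    while not stop.is_set():
        collect_once(get_game_ids(), fetch_batch)
        stop.wait(INTERVAL_SECONDS)  # returns early as soon as stop is set
```

The loop would run in `threading.Thread(target=collector_loop, args=(...), daemon=True)`, and a manual trigger such as the POST endpoint could call `collect_once` directly. Whether the real collector skips a busy cycle or waits for the job to finish is not stated in the post; skipping is simply the choice made in this sketch.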
All data lands in a local SQLite database — no cloud dependencies, no accounts needed beyond free API keys.
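For illustration, a snapshot table and insert helper using Python's built-in `sqlite3` module. The table name, columns, and helper names are assumptions; the post does not show the real schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    game_id      TEXT    NOT NULL,
    taken_at     TEXT    NOT NULL,  -- ISO-8601 timestamp
    viewer_count INTEGER NOT NULL,
    stream_count INTEGER NOT NULL,
    PRIMARY KEY (game_id, taken_at)
)
"""


def open_history(path: str = "history.db") -> sqlite3.Connection:
    """Open (or create) the local history database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_snapshots(conn: sqlite3.Connection, taken_at: str,
                     rows: list[tuple[str, int, int]]) -> None:
    """Insert one (game_id, viewers, streams) row per game in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO snapshots VALUES (?, ?, ?, ?)",
            [(game_id, taken_at, viewers, streams) for game_id, viewers, streams in rows],
        )
```

Writing each collection pass in one transaction keeps a partially failed pass from leaving half a snapshot behind.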
Running It
The entire app is a Flask server that runs locally. Setup is automated on Windows with a batch script that creates a portable Python environment:
setup_windows.bat → Downloads Python 3.13, creates venv, installs Playwright Chromium
run.bat → Launches dashboard at https://localhost:5000
From the dashboard, you configure your Steam Web API key, Twitch Client ID/Secret, and optionally GOG/Epic paths. The app never sends your data anywhere — everything stays on your machine.
Grab the source and start finding your best game at github.com/blakelinkd/best_game.