Finding the Best Game to Stream: A Data-Driven Approach

Every streamer knows the dilemma: you want to play something you own, but you also want viewers to actually find your stream. Browsing Twitch manually is slow and random. I built Best Game to solve this — it reads your game library, cross-references it with live Twitch data, and ranks every game by discovery potential. Here’s how the recommendation engine works.

The Problem

As a small streamer, your biggest challenge is discoverability. Streaming a game with 200 other streamers means you’re buried at the bottom of the browse page. Streaming a game with zero viewers means nobody is searching for it. The sweet spot is somewhere in between — a game with enough demand to be searched, but few enough streams that you’ll actually be seen.

Best Game automates finding that sweet spot across your entire library.

Architecture Overview

┌──────────────────────────────────────────────────────────────┐
│                     Your Game Libraries                       │
│  Steam Web API  │  GOG Galaxy DB  │  Epic Manifest  │  Retro │
└────────┬────────────────┬───────────────┬──────────────┬─────┘
         │                │               │              │
         ▼                ▼               ▼              ▼
┌──────────────────────────────────────────────────────────────┐
│              Game Name → Twitch Category Match                │
│   Exact match → Substring match → Fuzzy word overlap (0.8)   │
└───────────────────────────┬──────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────┐
│          Batch Twitch Helix API: Per-Stream Viewer Stats      │
│   Whale adjustment → Median-blended average → Top share      │
└───────────────────────────┬──────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────┐
│                  Discovery Score Calculation                  │
│  Visibility (0.45) + Demand (0.35) + Floor (0.10) +          │
│  Balance (0.10)                                              │
└───────────────────────────┬──────────────────────────────────┘
                            │
                            ▼
                  ┌────────────────────┐
                  │  Sorted Dashboard  │
                  │  + History +       │
                  │  Opportunity Finder│
                  └────────────────────┘

Multi-Platform Library Discovery

Best Game reads your games from four sources:

Steam — Uses the Steam Web API (IPlayerService/GetOwnedGames) with your API key and SteamID64. Includes free-to-play games via include_played_free_games=1 and tags installed games by parsing libraryfolders.vdf.

GOG Galaxy — Reads directly from GOG’s local SQLite database (galaxy-2.0.db). The query joins across LibraryReleases, GamePieces, and GameTimes to extract titles, images, playtime, and install status — filtering out DLC with isDlc = 0.

Epic Games — Parses two local files: catcache.bin (base64-encoded JSON catalog) for ownership data, and *.item manifests for install status. Filters out non-game entries by checking for the "games" category.

Retro — A static catalog of console games (NES, SNES, PS1, etc.) cross-referenced against libretro thumbnail databases, making retro streaming viable for discovery.

All platforms normalize to a common format: {appid, name, platform, playtime, last_played}.
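To make the common format concrete, here is a minimal sketch of what a normalized record and a Steam converter could look like. The `LibraryGame` class and `from_steam_entry` helper are illustrative names, not taken from the project. The input keys (`appid`, `name`, `playtime_forever`, `rtime_last_played`) are the ones GetOwnedGames returns when `include_appinfo=1` is set. Storing playtime in minutes and mapping "never played" to `None` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LibraryGame:
    """Hypothetical record mirroring {appid, name, platform, playtime, last_played}."""
    appid: str
    name: str
    platform: str
    playtime: int                # minutes played (assumed unit)
    last_played: Optional[int]   # Unix timestamp, or None if never played


def from_steam_entry(entry: dict) -> LibraryGame:
    """Convert one item from the "games" list of a GetOwnedGames response."""
    # Steam reports 0 when a game has never been launched; map that to None.
    last_played = entry.get("rtime_last_played") or None
    return LibraryGame(
        appid=str(entry["appid"]),
        name=entry.get("name", ""),  # only present with include_appinfo=1
        platform="steam",
        playtime=int(entry.get("playtime_forever", 0)),
        last_played=last_played,
    )
```

The GOG and Epic readers would each need a similar small converter, so everything downstream only ever sees one record shape.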

Matching Games to Twitch Categories

Once the library is loaded, each game needs to be matched to a Twitch category. The matching uses a three-tier algorithm:

def _match_twitch_game(self, steam_name, twitch_games):
    steam_lower = steam_name.lower()

    # Tier 1: Exact name match (case-insensitive)
    for game in twitch_games:
        if game["name"].lower() == steam_lower:
            return game

    # Tier 2: Substring match
    for game in twitch_games:
        twitch_lower = game["name"].lower()
        if steam_lower in twitch_lower or twitch_lower in steam_lower:
            return game

    # Tier 3: Fuzzy word overlap with threshold
    common_words = {"the", "and", "of", "in", "to", "for"}
    steam_words = {w for w in steam_lower.split() if w not in common_words}
    for game in twitch_games:
        twitch_words = {w for w in game["name"].lower().split()
                        if w not in common_words}
        overlap = steam_words & twitch_words
        if overlap and len(overlap) / max(len(steam_words), 1) >= 0.8:
            return game
    return None

The threshold is configurable (default 0.8), and misses are cached with an exponential scan backoff — games that repeatedly fail to match get skipped more aggressively (2^n skips, capped at 64), saving API quota.
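The backoff idea can be sketched in a few lines. This is one possible reading of "2^n skips, capped at 64", not the project's actual cache: after the n-th consecutive miss, the game sits out the next min(2^n, 64) scans. The `MissBackoff` class and its method names are made up for illustration.

```python
class MissBackoff:
    """Tracks unmatched games and decides when a lookup is worth retrying."""

    def __init__(self, max_skips: int = 64):
        self.max_skips = max_skips
        self._misses = {}     # game name -> consecutive failed lookups
        self._remaining = {}  # game name -> scans still to skip

    def should_scan(self, name: str) -> bool:
        """Return True if this scan should attempt a Twitch lookup."""
        left = self._remaining.get(name, 0)
        if left > 0:
            self._remaining[name] = left - 1  # consume one skipped scan
            return False
        return True

    def record_miss(self, name: str) -> None:
        """Called when a lookup found no matching category."""
        n = self._misses.get(name, 0) + 1
        self._misses[name] = n
        self._remaining[name] = min(2 ** n, self.max_skips)

    def record_hit(self, name: str) -> None:
        """A successful match clears the backoff state."""
        self._misses.pop(name, None)
        self._remaining.pop(name, None)
```

With this scheme a game that never matches is retried after 2, 4, 8, ... scans, and from the sixth miss onward only once every 64 scans.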

The Discovery Scoring Algorithm

The heart of the system is the discovery score — a value from 0 to 1 that represents how good a streaming opportunity a game is right now. It combines four weighted factors:

Factor 1: Visibility (weight 0.45)

discoverability = 12.0 / max(stream_count, 12.0)

This models organic browse discovery, on the assumption that viewers typically scroll past about 12 thumbnails. If a category has 12 or fewer streams, your stream is likely to appear without scrolling, so the score stays at 1.0. Beyond that, visibility falls in proportion to the stream count. Example: 6 streams → 1.0, 24 streams → 0.5, 48 streams → 0.25.

Factor 2: Demand (weight 0.35)

effective_average = adjusted_average * 0.5 + median * 0.5
demand = min(effective_average / 5.0, 1.0)

Measures how many viewers each non-whale stream gets on average, capped at 5. The mean and median are blended 50/50 because the median better predicts what a typical small streamer will get, while the mean captures overall interest. Beyond 5 viewers per stream, the category is already well-served.

Factor 3: Audience Floor (weight 0.10)

needed_viewers = max(stream_count * 2.0, 10.0)
audience_floor = min(viewer_count / needed_viewers, 1.0)

Ensures there’s enough total viewership to go around. The target is an average of at least 2 concurrent viewers per stream, with a minimum of 10 viewers overall. That minimum keeps tiny categories from maxing out this factor: a single stream with 2 viewers scores 0.2 here, not 1.0.

Factor 4: Balance (weight 0.10)

balance = 1.0 - max(0.0, (top_stream_share - 0.5) / 0.5)

Penalizes categories dominated by one large streamer. If the top stream has ≤50% of viewers, balance = 1.0. If the top stream has 100%, balance = 0.0. This filters out categories where viewers come for a specific personality, not the game.

Final Score

score = discoverability * 0.45 + demand * 0.35 + audience_floor * 0.1 + balance * 0.1
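Putting the four formulas together, here is a small self-contained function with a worked example. The function name and argument order are my own; the arithmetic is copied from the formulas above.

```python
def discovery_score(stream_count, viewer_count, adjusted_average,
                    median, top_stream_share):
    """Combine the four factors into a single 0..1 discovery score."""
    # Visibility: full marks up to 12 streams, then inversely proportional.
    discoverability = 12.0 / max(stream_count, 12.0)

    # Demand: 50/50 blend of whale-adjusted mean and median, saturating at 5.
    effective_average = adjusted_average * 0.5 + median * 0.5
    demand = min(effective_average / 5.0, 1.0)

    # Audience floor: 2 viewers per stream, and at least 10 viewers overall.
    needed_viewers = max(stream_count * 2.0, 10.0)
    audience_floor = min(viewer_count / needed_viewers, 1.0)

    # Balance: no penalty up to a 50% top-stream share, zero at 100%.
    balance = 1.0 - max(0.0, (top_stream_share - 0.5) / 0.5)

    return (discoverability * 0.45 + demand * 0.35
            + audience_floor * 0.1 + balance * 0.1)


# 24 streams, 60 viewers, 2 viewers per typical stream, top stream holds 75%.
example = discovery_score(24, 60, 2.0, 2.0, 0.75)
```

For that example the factors come out to 0.5 (visibility), 0.4 (demand), 1.0 (floor), and 0.5 (balance), so the score is 0.225 + 0.14 + 0.10 + 0.05, or about 0.515.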

Whale Adjustment: Filtering Out Outlier Streamers

Before scoring, viewer distribution stats are computed with a “whale adjustment” that removes outlier streams. Without this, a single 2000-viewer stream would inflate the average for a category with 10 other streams at 2 viewers each, making it look more attractive than it really is.

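# counts: per-stream viewer counts, sorted highest first
if stream_count >= 5:
    trim_count = max(1, int(stream_count * 0.1))  # Trim top 10%
    adjusted_counts = counts[trim_count:]
elif stream_count >= 2 and top_share >= 0.7 and top_count >= 25:
    adjusted_counts = counts[1:]  # Trim if top has 70%+ share and 25+ viewers
else:
    adjusted_counts = counts  # Too few streams or no dominant stream: keep all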
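To see the effect on the example above (one 2000-viewer stream plus ten streams at 2 viewers), here is a runnable sketch that wraps the trimming rule in a function. The `viewer_stats` name and the returned keys are illustrative. Taking the median over all streams, not just the trimmed ones, is an assumption on my part.

```python
from statistics import mean, median


def viewer_stats(viewer_counts):
    """Return raw and whale-adjusted viewer statistics for one category."""
    counts = sorted(viewer_counts, reverse=True)  # highest first
    stream_count = len(counts)
    total = sum(counts)
    top_count = counts[0] if counts else 0
    top_share = top_count / total if total else 0.0

    adjusted = counts
    if stream_count >= 5:
        trim_count = max(1, int(stream_count * 0.1))  # drop the top 10%
        adjusted = counts[trim_count:]
    elif stream_count >= 2 and top_share >= 0.7 and top_count >= 25:
        adjusted = counts[1:]  # drop a single dominant stream

    return {
        "raw_average": mean(counts) if counts else 0.0,
        "adjusted_average": mean(adjusted) if adjusted else 0.0,
        "median": median(counts) if counts else 0.0,  # assumption: all streams
        "top_share": top_share,
    }


stats = viewer_stats([2000] + [2] * 10)
```

Here the raw average is roughly 184 viewers per stream, while the adjusted average is 2, which is much closer to what a new stream in that category could expect.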

The Opportunity Finder

Not all good opportunities are currently live. The Opportunity Finder scans your library for games that have zero streams right now but a proven history of viewership. This is the “be first to stream” strategy:

def get_opportunities(min_peak=50, max_avg_vps=200.0,
                      min_live_fraction=0.1, min_snapshots=30):
    # 1. Find games with 0 current streams
    # 2. Check history: require ≥30 snapshots, ≥10% were live
    # 3. Verify peak viewers ≥ 50
    # 4. Filter out whale categories: avg_viewers_per_stream ≤ 200
    # 5. Sort by avg_viewers_when_live (default)

Each opportunity comes with historical confidence data: peak viewers ever recorded, average viewers when live, average stream count, and the fraction of snapshots that had activity. This gives you data-backed confidence to go live on a game with zero competition.
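The five steps in that outline translate into a straightforward filter. The sketch below works on a list of per-game history summaries held in memory; the dictionary keys (`current_streams`, `snapshots`, `live_snapshots`, and so on) are hypothetical stand-ins for whatever the real history tables store.

```python
def find_opportunities(games, min_peak=50, max_avg_vps=200.0,
                       min_live_fraction=0.1, min_snapshots=30):
    """Filter per-game history summaries down to 'be first' candidates."""
    picks = []
    for g in games:
        if g["current_streams"] != 0:
            continue  # 1. someone is already live
        if g["snapshots"] < max(min_snapshots, 1):
            continue  # 2a. not enough history to trust
        if g["live_snapshots"] / g["snapshots"] < min_live_fraction:
            continue  # 2b. rarely streamed at all
        if g["peak_viewers"] < min_peak:
            continue  # 3. never drew a real audience
        if g["avg_viewers_per_stream"] > max_avg_vps:
            continue  # 4. audience belonged to a whale
        picks.append(g)
    # 5. strongest historical audience first
    return sorted(picks, key=lambda g: g["avg_viewers_when_live"], reverse=True)
```

Each filter is a cheap comparison, so the order mostly matters for readability. The snapshot-count check does have to come before the live-fraction division, since it is what rules out dividing by zero.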

Historical Heatmaps

The system also builds a 7×24 heatmap (day-of-week × hour-of-day) for each game, computing average viewers, streams, and discovery scores for each time slot. This shows how a game's opportunity shifts by day and hour.

The heatmap is computed on-the-fly from historical snapshot data:

def get_heatmap(game_id, days=7):
    # Query snapshots, bucket by (day_of_week, hour)
    # For each cell: avg_viewers, avg_streams, avg_discovery, count
    # Returns 7×24 grid

Background Collection

A background collector thread snapshots viewer counts for all known Twitch games every 15 minutes. It never runs concurrently with a game processing job and hits the batch endpoint (up to 100 game IDs per API call) to minimize API usage:

POST /api/history/collect  →  Manual trigger
Background thread (15 min) →  Automatic
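Batching is the main quota saver here. A minimal sketch of splitting the known game IDs into groups of 100 and turning each group into repeated `game_id` query parameters looks like this. The helper names are illustrative, and the actual HTTP call and auth headers are left out.

```python
from urllib.parse import urlencode


def batch_ids(game_ids, batch_size=100):
    """Yield consecutive slices of at most batch_size IDs."""
    for start in range(0, len(game_ids), batch_size):
        yield game_ids[start:start + batch_size]


def streams_query(batch):
    """Build the query string for one batch: game_id=1&game_id=2&..."""
    return urlencode([("game_id", str(gid)) for gid in batch])


# 250 known games become three requests instead of 250.
batches = list(batch_ids(list(range(250))))
```

At a 15-minute interval, even a library with a few thousand matched games needs only a few dozen requests per snapshot.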

All data lands in a local SQLite database — no cloud dependencies, no accounts needed beyond free API keys.

Running It

The entire app is a Flask server that runs locally. Setup is automated on Windows with a batch script that creates a portable Python environment:

setup_windows.bat   →  Downloads Python 3.13, creates venv, installs Playwright Chromium
run.bat             →  Launches dashboard at http://localhost:5000

From the dashboard, you configure your Steam Web API key, Twitch Client ID/Secret, and optionally GOG/Epic paths. The app never sends your data anywhere — everything stays on your machine.


Grab the source and start finding your best game at github.com/blakelinkd/best_game.

Watch me build stuff like this live on Twitch