# Design decisions
Architecture decision records (ADRs) — the technical choices behind the project, why we made each one, and what trade-offs we accepted. Append-only and numbered. If you want the consumer-facing explanation of how grades work, see How grades work.
## ADR-0001: Monorepo with shared `db/` as schema source of truth
- Status: Accepted
- Date: 2026-04-22
### Context

This project has two distinct codebases:

- A Python pipeline that ingests data from `nfl_data_py` and writes per-season + career grades to Postgres.
- A Next.js web app that reads from Postgres and renders teams, depth charts, and grades.
Both touch the same database schema. We considered:

- Two repos (pipeline + web), each with its own copy of the schema.
- Monorepo with a shared `db/` directory holding SQL migrations.
- Schema-first ORM (Drizzle in TS, then introspect from Python; or SQLAlchemy in Python with TS clients consuming an OpenAPI spec).
### Decision

Monorepo. SQL migrations in `db/migrations/` are the single source of truth. Both Python and TypeScript follow that schema; neither owns it. The TS side gets type safety via `nflgrades gen-types`, which introspects the live DB and emits `web/src/types/db.generated.ts`. The Python side uses raw SQL + pandas (no ORM models — see ADR-0002).
### Consequences

Easier:

- One PR can include a schema change + the pipeline change that uses it + the web change that displays it. No cross-repo coordination.
- New contributors (and AI agents) see the whole system in one tree.
- `docker compose up -d` brings up Postgres with migrations auto-applied, giving both halves a working environment instantly.
Harder:
- Repo grows two ecosystems' worth of tooling (npm + pip). Mitigated by keeping each in its own directory with its own README.
- Can't independently version the two halves. We don't need to.
Explicitly given up:
- Schema-first ORMs (Drizzle, Prisma) where the ORM file generates migrations. They'd push us to TS-first thinking, which is wrong here: the data pipeline is the primary writer and the analyst-friendly layer. See ADR 0002.
## ADR-0002: Python pipeline as installable package + raw SQL
- Status: Accepted
- Date: 2026-04-22
### Context

The Python side has to:

- Pull large DataFrames from `nfl_data_py`
- Compute statistical components and grades on those DataFrames
- Bulk-write results to Postgres
Two architectural questions:

- Loose scripts in `scripts/` versus an installable package with a CLI entry point.
- SQLAlchemy ORM models versus raw SQL + pandas `to_sql`.
### Decision

Installable package. `pipeline/` has a `pyproject.toml` defining the `nfl_grades` package. After `pip install -e ".[dev]"`, the user gets:

- `from nfl_grades.grading import sigmoid` works from anywhere
- An `nflgrades` CLI command (defined in `nfl_grades.cli:main`)
- Tests can `import nfl_grades` without path hacks
- The package can be reused from notebooks, CI jobs, and scheduled runs
Raw SQL + pandas. No `Base = declarative_base()`, no `class Player(Base)`. Pipeline code uses (sketched below):

- `pandas.read_sql` / `df.to_sql` for bulk reads/writes
- `sqlalchemy.text("...")` + the engine from `nfl_grades.db` for one-off statements
- The `nfl_grades.db.session()` context manager for transactional work
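A minimal sketch of the three patterns. `get_engine` is an assumed accessor name; the ADR only fixes the `nfl_grades.db` module and its `session()` context manager, and the table names here are illustrative:

```python
# Sketch of the three DB access patterns (names beyond session() are assumptions).
import pandas as pd
from sqlalchemy import text

from nfl_grades.db import get_engine, session  # get_engine is illustrative

engine = get_engine()

# 1. Bulk reads/writes via pandas
df = pd.read_sql("SELECT * FROM plays WHERE season = 2024", engine)
df.to_sql("plays_staging", engine, if_exists="replace", index=False)  # illustrative target

# 2. One-off statement via sqlalchemy.text + the engine
with engine.begin() as conn:
    conn.execute(text("DELETE FROM season_grades WHERE season = :s"), {"s": 2024})

# 3. Transactional work via the session() context manager
with session() as s:
    s.execute(
        text("UPDATE season_grades SET qualified = FALSE WHERE season = :s"),
        {"s": 2024},
    )
```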
### Consequences

Easier:

- The CLI gives us one obvious entry point per stage (`nflgrades ingest`, `nflgrades grade`, etc.) instead of a sprawl of `python scripts/*.py`.
- Bulk DataFrame writes via `to_sql` are 10-100x faster than ORM `add_all` for the row counts we deal with (tens of millions of PBP rows).
- Schema lives in SQL only (ADR-0001). No risk of "ORM model says X, DB says Y" drift.
Harder:

- No automatic relationship traversal (`player.seasons[0].grades`). We don't need it — every analytical query is a SQL JOIN.
- No Alembic auto-generation from models. We use a tiny custom migration runner instead — see ADR-0006.
Explicitly given up:
- ORM ergonomics. We're a data-analysis pipeline, not a CRUD app.
## ADR-0003: Data tier and `qualified` flag as first-class columns
- Status: Accepted
- Date: 2026-04-22
### Context
Our grades come in three data-quality tiers:
- Tier 1 (QB/RB/WR/TE): rich data, full pipeline incl. opponent adjustment
- Tier 2 (CB/S/EDGE): decent data
- Tier 3 (OL/iDL/off-ball LB/ST): proxy stats, directional only
We also have to handle players who fall below minimum-snaps thresholds: their season exists in data but the grade isn't reliable enough to display as if it were.
Options for representing both:

- Compute on read — the web app derives tier from position and `qualified` from a snap-count join.
- First-class columns on `season_grades` — `data_tier SMALLINT` and `qualified BOOLEAN` written by the pipeline, read directly.
- Separate views per tier — `season_grades_tier1`, etc.
### Decision

First-class columns on `season_grades`:

- `data_tier SMALLINT NOT NULL CHECK (data_tier BETWEEN 1 AND 3)`
- `qualified BOOLEAN NOT NULL DEFAULT TRUE`
The pipeline sets both at write time. The web app reads them directly and shows a tier badge / "insufficient sample" pill without any joins or recomputation.
### Consequences

Easier:

- One query returns everything the UI needs to render a grade with full context (`SELECT composite_grade, data_tier, qualified ...`).
- Tier-mapping logic lives in one place (the pipeline) and isn't duplicated between Python and TS.
- Filtering ("only show qualified Tier 1 grades") is a trivial WHERE clause with index support.
Harder:
- Changing the tier-mapping rules requires re-running grading to refresh the column. We accept this; tiers don't change often.
- A small amount of data redundancy: tier is implied by position. We accept this for query simplicity.
Explicitly given up:
- Computed-on-read flexibility. If we ever need per-user tier overrides (we won't), we'd have to add them as a separate table.
### See also

- ADR-0016 — TE `role` and `data_tier_reason` (era + blocking-role merge) written alongside `data_tier` on `season_grades`.
## ADR-0004: Normalize historical team abbreviations to current
- Status: Accepted
- Date: 2026-04-22
### Context

`nfl_data_py` uses the contemporary team abbreviation for each season's data:

- 2016 Chargers are `SD`, 2017+ are `LAC`
- 2016–2019 Raiders are `OAK`, 2020+ are `LV`
- Pre-2016 Rams are `STL`; from 2016 they're `LA` (some sources use `LAR`)
- A few sources sprinkle in `WSH`, `ARZ`, `BLT`, etc.
If we naively join `pbp.posteam = teams.abbr`, 2016 Chargers rows silently drop or fail FK constraints. We have to handle this somewhere.
Options:

- Store historical abbreviations as-is, display them as-of the season. ("In 2016, SD went 5-11" — but they're the Chargers, same franchise.)
- Normalize everything to the current abbreviation at ingestion time via a `team_aliases` lookup table.
- Use `nflverse-team` package mappings at query time.
### Decision

Normalize to current abbreviation at ingestion. The `team_aliases` table maps every historical abbr (and a few alternate spellings) to the current `team_id`. Every current abbr aliases to itself, so the lookup is one unconditional query.
The UI never displays SD or OAK. A 2016 Chargers depth chart is shown
under "Los Angeles Chargers" with a note that the team relocated.
### Consequences
Easier:
- All FK relationships work without special-casing historical abbrs.
- Cross-season queries ("show me all Chargers QBs since 2016") return the expected rows without UNIONs or OR clauses.
- Adding a new alias (some future relocation, or a new alternate spelling found in PFR data) is one INSERT.
Harder:
- Historical "purity" lost — a 2016 game line in our DB will say `LAC`, not `SD`. We accept this; the franchise identity matters more than the city-of-record for player grading.
- Need a small chunk of UI copy when showing pre-relocation seasons ("relocated 2017 from San Diego"). Cheap.
Explicitly given up:
- Showing "as the team was named at the time." If we ever build a historical game viewer, we'd surface that there.
## ADR-0005: Hand-written TS types with codegen guardrail
- Status: Accepted
- Date: 2026-04-22
### Context
The DB schema is the source of truth (ADR 0001). The Next.js side needs TypeScript types that match the schema. We considered:
- Hand-write everything. Simple but rots silently when migrations change.
- Switch to Drizzle/Prisma, define schema in TS, generate everything. Wrong direction — would make TS the source of truth.
- Auto-generate from the live DB with `kanel`/`pg-to-ts`, replace hand-written types entirely.
- Hand-write the public types, auto-generate the raw row types as a guardrail.
### Decision

Option 4. Two layers:

- `web/src/types/db.generated.ts` — auto-generated from `information_schema` by `nflgrades gen-types`. Mirrors raw table shapes one-to-one. Never edited by hand. Committed to the repo so TS compiles without a live DB.
- `web/src/types/index.ts` — hand-written. Imports the generated row types and re-exports them with curated names, narrowed string-literal unions (e.g. `"AFC" | "NFC"` instead of `string`), and view-shaped types for joins and aggregates.
In CI we'll run `nflgrades gen-types --check`, which exits non-zero if the generated file is stale. That's the guardrail: if you change a migration without regenerating, CI catches it.
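A sketch of what `--check` amounts to. The `render_types` helper is a hypothetical stand-in for the real `information_schema` introspection; only the behavior (regenerate, compare, exit non-zero when stale) comes from this ADR:

```python
# Sketch of the gen-types --check guardrail; render_types() is stubbed.
import sys
from pathlib import Path

GENERATED = Path("web/src/types/db.generated.ts")

def render_types() -> str:
    """Introspect the live DB and emit TypeScript source (stubbed here)."""
    raise NotImplementedError

def gen_types(check: bool = False) -> None:
    fresh = render_types()
    if check:
        if GENERATED.read_text() != fresh:
            print("db.generated.ts is stale; run `nflgrades gen-types`", file=sys.stderr)
            sys.exit(1)
    else:
        GENERATED.write_text(fresh)
```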
### Consequences

Easier:

- The schema can grow without TS imports breaking — add a column, run gen-types, decide whether to expose it in `index.ts`.
- We get string-literal unions (`Conference`, `DataTier`) where the raw Postgres type is just `text`/`smallint`. Better than what any pure generator gives us.
- Reviewers see the type changes in `index.ts` PRs and can reason about the public API surface.
Harder:

- Two files to keep mentally aligned. Mitigated by `index.ts` being short and `db.generated.ts` being mechanical.
- `gen-types` requires a live DB. Acceptable since we have docker-compose.
Explicitly given up:
- Fully automatic types. We're trading a small amount of manual work for the ability to express domain types more precisely than introspection can give us.
## ADR-0006: Forward-only migrations with `schema_migrations` tracking
- Status: Accepted
- Date: 2026-04-22
### Context

We need a migration story. Options:

- Alembic. Standard for SQLAlchemy projects. Auto-generation from ORM models. We have no ORM models (ADR-0002), so the auto-generation isn't useful.
- Raw `psql -f` per file, no tracking. Simple but easy to apply the same migration twice or skip one.
- A tiny custom runner that tracks applied migrations in a `schema_migrations` table and refuses to re-apply or run modified files.
### Decision

Option 3. `pipeline/src/nfl_grades/migrate.py` (~80 lines) does:

- Creates `schema_migrations (filename PRIMARY KEY, sha256, applied_at)` if it doesn't exist.
- Lists `db/migrations/*.sql` lexically.
- For each file: skip if applied with matching sha; error if applied with a different sha (someone edited an applied migration); apply otherwise.
- Each migration runs in its own transaction.
- An optional `--seeds` flag also runs `db/seeds/*.sql` (idempotent, re-runs every time).
Migrations are forward-only. To fix a bad migration, ship a new one (`0007_fix_bad_constraint.sql`).
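A compressed sketch of the runner's core loop, assuming a SQLAlchemy engine per ADR-0002 (the real `migrate.py` is the authority):

```python
# Sketch of the forward-only runner: sha-tracked, one transaction per file,
# loud failure when an already-applied migration has been edited.
import hashlib
from pathlib import Path
from sqlalchemy import text

def migrate(engine, migrations_dir: str = "db/migrations") -> None:
    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE IF NOT EXISTS schema_migrations ("
            " filename TEXT PRIMARY KEY,"
            " sha256 TEXT NOT NULL,"
            " applied_at TIMESTAMPTZ NOT NULL DEFAULT now())"
        ))
        applied = dict(conn.execute(
            text("SELECT filename, sha256 FROM schema_migrations")).all())

    for path in sorted(Path(migrations_dir).glob("*.sql")):  # lexical order
        sha = hashlib.sha256(path.read_bytes()).hexdigest()
        if path.name in applied:
            if applied[path.name] != sha:
                raise RuntimeError(f"{path.name} was edited after being applied")
            continue  # already applied with matching sha — skip
        with engine.begin() as conn:  # each migration in its own transaction
            conn.execute(text(path.read_text()))
            conn.execute(
                text("INSERT INTO schema_migrations (filename, sha256) VALUES (:f, :s)"),
                {"f": path.name, "s": sha},
            )
```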
### Consequences

Easier:

- Deploying to Supabase/Neon is `nflgrades migrate`. Same code as local.
- New developers' first command is obvious and safe.
- Sha tracking catches "I edited an applied migration" mistakes loudly instead of silently going out of sync.
Harder:
- No down migrations. Acceptable: in 6 years of running this kind of pipeline, down migrations are almost always the wrong tool — you ship a forward fix instead.
- No model -> migration auto-generation. We don't want it; we'd rather hand-write SQL and review it.
Explicitly given up:
- Alembic ecosystem (branching, multiple heads, etc.). We have one head and we ship to it. If this ever stops being true, revisit.
### Edge cases

- `0001_init.sql` is currently editable because nothing has been applied anywhere yet. The moment it's applied to any environment, it becomes immutable.
- The `schema_migrations` table is not itself in a migration file — the migration runner creates it on first invocation. That's intentional; bootstrapping a tracking table inside a tracked migration is a chicken-and-egg problem we don't need.
## ADR-0007: Pure-function grading math, DB I/O isolated to `ingest/`
- Status: Accepted
- Date: 2026-04-22
### Context
The grading pipeline has many moving parts: empirical Bayes shrinkage, opponent adjustment, z-score within position, inverse-noise composite weighting, sigmoid mapping to 0-100, Kalman smoothing across seasons. We need to be able to:
- Tune parameters interactively in notebooks
- Unit-test math without spinning up Postgres
- Re-run grading on cached/synthetic data
- Compare two grading variants side-by-side without committing one to disk
If grading code calls into the database, all of this gets harder.
### Decision

Modules under `grading/`, `career/`, `components/`, and `adjust/` are pure functions. They take pandas DataFrames and return pandas DataFrames. They must not import from `nfl_grades.db` or `nfl_grades.ingest`.
DB I/O lives in two places only:

- `nfl_grades.ingest.*` — reads from `nfl_data_py`, writes to raw tables
- The CLI commands in `nfl_grades.cli` — orchestrate by reading from the DB, passing DataFrames to the pure functions, and writing results back
Concretely: `grading/empirical_bayes.shrink(df, ...)` returns a Series. The CLI does `df = pd.read_sql(...); shrunk = shrink(df); df.to_sql(...)`.
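The split, as a sketch. The `shrink` signature is illustrative; the formula itself matches the shrinkage definition in ADR-0013:

```python
# Pure math layer: DataFrame in, Series out, no DB imports.
import pandas as pd

def shrink(df: pd.DataFrame, raw_col: str, n_col: str, k: float) -> pd.Series:
    """Empirical Bayes shrinkage toward the volume-weighted league mean."""
    mu_league = (df[raw_col] * df[n_col]).sum() / df[n_col].sum()
    return (df[n_col] * df[raw_col] + k * mu_league) / (df[n_col] + k)

# CLI orchestration layer (the only place that touches the DB):
#   df = pd.read_sql("SELECT ...", engine)
#   df["shrunk_epa"] = shrink(df, "epa_per_dropback", "n_dropbacks", k=150)
#   df.to_sql("stat_components", engine, if_exists="append", index=False)
```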
### Consequences

Easier:

- Tests for grading math are pure-Python, no fixtures, no test DB. See `pipeline/tests/grading/test_sigmoid.py` for the pattern.
- Notebooks can iterate on math by passing in any DataFrame, including hand-constructed ones for edge cases.
- A future "grade variant comparison" feature is just calling the same pure function with two parameter sets and diffing the outputs.
Harder:
- The CLI is responsible for the orchestration glue. That code is less interesting and less tested. Acceptable; it's mostly two-liners.
Enforcement:

- ADR-only for now. If we get tempted to add a DB call inside `grading/`, the import would be the obvious red flag in code review. If this becomes a recurring problem, add an `import-linter` rule.
## ADR-0008: Sigmoid grade mapping with k=1.15, z=0->50, z=+2->90
- Status: Accepted
- Date: 2026-04-22
### Context

After computing a composite z-score per (player, season, position), we need to map it onto the 0-100 grade scale users see. Options:

- Linear rescale: `grade = 50 + 20*z`, clipped to [0, 100]. Simple, but cliffs at the boundaries and stretches the middle.
- Percentile-based: `grade = 100 * percentile_rank(z)`. Self-rescaling year over year (a "90" never means the same thing twice).
- Sigmoid: `grade = 100 / (1 + exp(-k * (z - z0)))`. Smooth, bounded, monotonic, never rescales.
### Decision

Sigmoid with k=1.15 and z0=0. Implementation in `pipeline/src/nfl_grades/grading/sigmoid.py`.
Parameters chosen so that:
- z = 0 -> grade = 50
- z = +1 -> grade ~= 76
- z = +2 -> grade ~= 91
- z = -2 -> grade ~= 9
Rough interpretation: a "90" is roughly 2 standard deviations above the positional mean — about the 97th percentile of qualified players.
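The mapping is small enough to show whole (a sketch; the repo's `grading/sigmoid.py` is the authority on the exact signature):

```python
# Sketch of the ADR-0008 mapping; the asserts match the parameter table above.
import numpy as np

K = 1.15  # spread between elite and average
Z0 = 0.0  # center: league-average z maps to 50

def sigmoid_grade(z, k: float = K, z0: float = Z0):
    return 100.0 / (1.0 + np.exp(-k * (np.asarray(z, dtype=float) - z0)))

assert round(float(sigmoid_grade(0))) == 50
assert round(float(sigmoid_grade(1))) == 76
assert round(float(sigmoid_grade(2))) == 91  # ~97th percentile of qualified players
assert round(float(sigmoid_grade(-2))) == 9
```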
### Consequences
Easier:
- Grades are stable across seasons. A 90 in 2018 means roughly the same thing as a 90 in 2024.
- Bounded [0, 100] without clipping artifacts.
- Smooth and monotonic — small z changes produce small grade changes.
- Same mapping works for every position.
Harder:

- Not directly interpretable as a percentile. We address this by storing `percentile` alongside `composite_grade` on `season_grades`.
- Tuning k requires balancing "spread between elite players" (higher k) against "starters cluster near 50" (lower k). 1.15 is the current sweet spot from synthetic-data tuning; it will be re-checked once we have real QB grades to eyeball.
Subject to revision:
- This is the v1 default. If face-validity tests after build step 2 say "the top 10 QBs are all 95+ and indistinguishable," we lower k. If they say "Mahomes is 78," we raise k. Document changes by superseding this ADR.
## ADR-0009: Raw nflverse data cached as parquet; only typed tables in Postgres
- Status: Accepted
- Date: 2026-04-23
### Context

Every ingest module pulls a DataFrame from `nfl_data_py` (play-by-play, rosters, depth charts, NGS passing/receiving/rushing, weekly snap counts, schedules) and eventually has to populate our typed tables (`players`, `player_seasons`, `depth_charts`, `stat_components`, etc.).
The question: what happens to the raw DataFrame between the network call and the typed insert? Three real options:

- Direct ETL. Pull from `nfl_data_py`, transform in memory, write typed rows. Discard the raw DataFrame.
- Raw tables in Postgres. Persist the raw DataFrame to `raw_pbp`, `raw_rosters`, etc. (text/jsonb-heavy schemas). Transform reads from those raw tables and writes to typed tables.
- Parquet on disk. Cache the raw DataFrame to `pipeline/.cache/raw/{source}/{season}.parquet`. Transform reads from parquet and writes typed rows to Postgres.
Things that matter for our project:
- PBP is large. ~50k rows × 300+ columns per season × 10 seasons is the bulk of our raw data. Most of those columns we never use.
- Iteration speed dominates. Tuning grade weights or the garbage-time filter means re-running transforms many times per session. Re-downloading PBP each time would kill the loop. `nfl_data_py.import_pbp_data([2024])` takes ~30s; across 10 seasons that's 5 minutes per iteration.
- Upstream churn happens. `nfl_data_py` corrects historical data and occasionally renames columns. A snapshot of "what we believed the schema was on date X" is valuable for debugging "why did this player's grade change?"
- Postgres is for the product, not the archive. The web app, indexes, and analytical queries all target typed tables. Mixing 100M+ raw PBP rows into the same DB blows up backups, dump sizes, and query planner headroom.
- Pure-function math (ADR-0007). Transforms take DataFrames in and return DataFrames out. They don't care whether the source was a live API call, a parquet file, or a SQL query.
### Decision

Three-layer separation:

- Raw layer — parquet on disk. Every `nfl_data_py` call funnels through a `cache_or_fetch(source, season)` helper (sketched after this list) that:
  - Returns `pd.read_parquet(...)` if the file exists.
  - Otherwise calls the upstream function, writes the parquet, returns the DataFrame.
  - Path: `pipeline/.cache/raw/{source}/{season}.parquet` (already in `.gitignore`, configurable via `PIPELINE_CACHE_DIR`).
- Manifest — JSON sidecar. `pipeline/.cache/raw/manifest.json` records `{source, season, fetched_at, nfl_data_py_version, row_count, sha256}` per file. Lets us detect upstream churn without re-downloading and surfaces stale caches in `nflgrades validate`.
- Typed layer — Postgres. Only schema-defined tables live in Postgres (`db/migrations/*.sql`). No `raw_*` tables, no `jsonb` columns holding raw payloads.
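A sketch of the helper, with manifest bookkeeping omitted (the fetcher registry shown is illustrative, not the repo's actual mapping):

```python
# Sketch of cache_or_fetch: parquet write-through cache in front of nfl_data_py.
import os
from pathlib import Path
import pandas as pd
import nfl_data_py as nfl

FETCHERS = {
    "pbp": lambda season: nfl.import_pbp_data([season]),
    "rosters": lambda season: nfl.import_seasonal_rosters([season]),
}

def cache_or_fetch(source: str, season: int) -> pd.DataFrame:
    root = Path(os.environ.get("PIPELINE_CACHE_DIR", "pipeline/.cache/raw"))
    path = root / source / f"{season}.parquet"
    if path.exists():
        return pd.read_parquet(path)   # cache hit: no network
    df = FETCHERS[source](season)      # cache miss: upstream fetch
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(path)                # write-through, then return
    return df
```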
CLI behavior:

- `nflgrades ingest <source> --seasons 2024,2025` uses the cache by default.
- `nflgrades ingest <source> --refresh` ignores the cache, re-fetches, and rewrites parquet + manifest.
- `nflgrades ingest --refresh-stale` re-fetches anything where the manifest shows the cached `nfl_data_py` version differs from the installed one.
Audit trail in Postgres: the existing `pipeline_runs` table records each ingest invocation (`stage='ingest:{source}'`, season, rows_written, status). The `pipeline_runs` row says we ingested season X on date Y; the parquet file holds what we actually saw.
### Consequences

Easier:

- Re-running grading on new parameters costs the transform time only — no network, no waiting on `nfl_data_py`.
- Notebooks load raw data with one line: `pd.read_parquet(cache_path("pbp", 2024))`.
- Reproducing a historical grade is `git checkout <sha>` + the parquet files; the database can be rebuilt from those two inputs alone.
- Postgres backups stay small (~tens of MB for the typed product) instead of carrying GBs of raw PBP we never query in SQL.
- If we ever need ad-hoc SQL over raw data, DuckDB reads the parquet directly (`duckdb.sql("SELECT * FROM 'pipeline/.cache/raw/pbp/2024.parquet'")`). We don't have to commit to that now.
Harder:

- Raw data isn't backed up automatically. Acceptable: it's regenerable from `nfl_data_py` for any season we cover. The cost of a wiped cache is one slow re-ingest, not data loss.
- Two storage systems instead of one. Acceptable: the boundary is obvious — anything inside `ingest/`'s `cache_or_fetch(...)` reads/writes parquet; everything downstream reads from typed Postgres.
- Detecting upstream column renames isn't automatic. The manifest catches fetched-with-a-different-version; the schema-mapping code in `ingest/` catches a renamed column loudly when it tries to access the missing key. Both are acceptable failure modes — loud and early.
Explicitly given up:

- Raw-in-DB convenience. Some teams like being able to `psql` into a `raw_pbp` table mid-debug. We're a pandas pipeline; you'd open a notebook and `pd.read_parquet` instead. If this ever becomes painful, expose raw data via a DuckDB-backed FDW or a thin `raw` schema — don't migrate the primary store.
- Streaming ingest. Parquet is batch-oriented. We have no streaming use case (NFL data lands once a week); revisit if that changes.
### Implementation notes (non-binding)

- `cache_or_fetch` lives in `nfl_grades.ingest._cache` and is the only module allowed to import `nfl_data_py`. Every concrete ingester (`ingest/pbp.py`, `ingest/rosters.py`, ...) calls it with its source key.
- The manifest is rewritten atomically (write to `manifest.json.tmp`, rename) so a Ctrl-C mid-update can't corrupt it.
- Parquet uses pyarrow with default compression (snappy). Don't override unless we hit a real size or speed problem.
- Cache invalidation policy: never automatic. Refresh is always an explicit CLI flag. We'd rather work on stale data than silently re-run ingest under a developer.
## ADR-0010: Use nflreadpy (official nflverse) instead of nfl_data_py
- Status: Accepted
- Date: 2026-04-23
- Supersedes: implicit choice of `nfl_data_py` in earlier scaffolding
### Context

The original pipeline scaffolding picked `nfl_data_py` as the data-source client (mentioned in `data-sources.md`, `pipeline/README.md`, and `pyproject.toml`'s `[ingest]` extra). This was the de-facto standard for Python access to nflverse data for several years.
Two things forced a re-evaluation:

- Python 3.13 incompatibility. `nfl_data_py` 0.3.3 (the latest release, shipped in early 2024) caps its dependencies at `numpy<2.0`. Our stack is Python 3.13 with `numpy>=2.1` (required for Python 3.13 wheels — there are no `numpy<2` wheels for cp313). `pip install ".[ingest]"` fails with `ResolutionImpossible`.
- `nflreadpy` exists and is the official successor. Released September 2025 by Tan Ho (nflverse maintainer), `nflreadpy` is a Python port of `nflreadr` (the canonical R package for nflverse data). It pulls from the same `nflverse-data` GitHub releases — the actual data source is identical.
Comparison:
| Aspect | nfl_data_py 0.3.3 | nflreadpy 0.1.5 |
|---|---|---|
| Maintainer | Cooper Adams (community) | Tan Ho (nflverse core team) |
| Last release | Feb 2024 | Nov 2025 (5 releases in 3 months) |
| Python 3.13 | broken (numpy<2 pin) | supported, classifier present |
| DataFrame backend | pandas | polars (with .to_pandas() method) |
| Data source | nflverse-data releases | nflverse-data releases (same) |
| Caching | none | built-in (memory or filesystem) |
| API surface | import_pbp_data, import_seasonal_rosters, ... | load_pbp, load_rosters, ... |
| Coverage | PBP, NGS, rosters, snaps, etc. | PBP, NGS, rosters, snaps, FTN, contracts, draft, injuries, ... (superset) |
The "Beta" status warning on nflreadpy is real but the API mirrors
nflreadr exactly, so the contract is well-defined and the underlying
data files are the same we'd be reading either way.
### Decision

Use `nflreadpy` for all nflverse data access. Specifically:

- `pipeline/pyproject.toml` `[ingest]` extra: `nflreadpy>=0.1.5`, `polars>=1.0`, `pyarrow>=18.0`.
- All ingest modules (`ingest/pbp.py`, `ingest/rosters.py`, etc.) call `nflreadpy.load_*` functions.
- The `cache_or_fetch` helper from ADR-0009 wraps `nflreadpy` calls and converts polars → pandas at the boundary (see the sketch after this list) so the rest of the pipeline stays pandas-based (we have no reason to rewrite the math layer in polars yet).
- `nflreadpy`'s built-in cache is disabled (`NFLREADPY_CACHE=off`); we control caching ourselves via parquet files + manifest per ADR-0009. Two cache layers would be redundant, and the manifest needs the raw network fetch to record correctly.
- Function-name mapping is documented in `docs/data-sources.md` (`import_pbp_data` → `load_pbp`, `import_seasonal_rosters` → `load_rosters`, etc.).
### Consequences
Easier:
- Python 3.13 just works. We keep the modern numpy/pandas/scipy stack without downgrading.
- We're tracking the same library as the R-side nflverse community uses, which means R-language docs and examples translate almost directly.
- Active development: bugs and data updates land in weeks, not years.
- Polars is faster than pandas for the kinds of bulk reads ingest does (10-50M PBP rows). Even though we convert to pandas, the read+parse step is faster.
Harder:

- We pull in `polars` (~50MB) and `pyarrow` (~30MB) in the ingest extra. Acceptable: ingest is a power-user/CI workload, not a thin import.
- Polars → pandas conversion at the ingest boundary is one extra `.to_pandas()` call. Effectively free (zero-copy via Arrow when possible).
- "Beta" library risk: the API could shift between 0.x releases. Mitigated by pinning a minimum version and keeping the wrapper layer (`_cache`) thin enough that an API change is a one-file fix.
Explicitly given up:

- `nfl_data_py` ecosystem familiarity. Function-name muscle memory needs retraining (`import_pbp_data` → `load_pbp`). Net cost: a doc table.
- Pandas-native reads. We could keep using pandas directly via `pd.read_parquet` on nflverse parquet URLs, but then we'd be re-implementing the discovery/versioning logic that `nflreadpy` already handles. Not worth it.
### What this changes in the repo

- `pipeline/pyproject.toml` `[ingest]` extra
- `docs/data-sources.md` — function-name mapping, `nflreadpy` references
- `pipeline/README.md` — replace `nfl_data_py` mentions
- `pipeline/src/nfl_grades/ingest/__init__.py` docstring
- `AGENTS.md` — convention #5 already cites ADR-0009; nothing to change beyond the data-source name
- `docs/adr/0009` — still correct (parquet caching strategy is source-agnostic); leave it alone
### What this does NOT change
- The grading methodology, schema, ADRs 0001–0008.
- ADR 0009's three-layer separation. Parquet on disk, manifest sidecar, typed Postgres — all independent of which Python client we use to fetch.
## ADR-0011: Store a thin `plays` table in Postgres, not the full PBP fat table
- Status: Accepted
- Date: 2026-04-23
### Context

The nflverse PBP feed (`nflreadpy.load_pbp`) returns ~49,500 rows × 372 columns per season. It's the input to every grading formula. ADR-0009 already decided that raw source data lives as Parquet on disk, with only typed queryable tables in Postgres. The question now is what shape the Postgres-side `plays` table takes.
Three options:
- No plays in Postgres. Grading reads Parquet each run. Web app can never drill into individual plays.
- Thin plays table — ~40 columns we actually use: identifiers, situation, classification, player attribution, outcomes.
- Fat plays table — store all 372 columns.
### Decision

Option 2. Create a `plays` table with ~40 curated columns, documented below. The full 372-column Parquet remains the source of truth on disk (`pipeline/.cache/raw/pbp/<season>.parquet`), and any analysis that needs columns not in the table can re-read the Parquet directly.
### Column selection
Columns chosen for one of four reasons:
- Required by the v1 grading formula (QB composite: EPA/db, CPOE, success rate + garbage-time filter).
- Required by likely v1.x grading expansions (RB RYOE context, WR separation context, defensive attribution).
- Required by UI drill-down ("top 10 EPA plays for player X").
- Cheap to keep and likely needed soon (penalty, air_yards, yac).
Everything else — Elias IDs, no_huddle flags, yardline strings, 200+ tracking-derived columns — stays in Parquet only.
### Columns

See `db/migrations/0003_create_plays.sql` for the authoritative schema. Summary:
| group | columns |
|---|---|
| identifiers (PK) | game_id, play_id |
| game context | season, season_type, week, game_date |
| teams (text abbrs, not FK) | posteam, defteam, home_team, away_team |
| situational | qtr, down, ydstogo, yardline_100, score_differential, game_seconds_remaining, half_seconds_remaining, wp |
| classification | play_type, qb_dropback, pass_attempt, rush_attempt, sack, qb_scramble, qb_spike, qb_kneel, aborted_play, two_point_attempt, penalty |
| player attribution (gsis_id text) | passer_player_id, rusher_player_id, receiver_player_id, sack_player_id, interception_player_id |
| outcomes | yards_gained, epa, wpa, cpoe, success, air_yards, yards_after_catch, complete_pass, incomplete_pass, interception, fumble_lost, pass_touchdown, rush_touchdown, touchdown |
| debugging | play_desc (renamed from nflverse desc to avoid SQL reserved-word friction) |
Total: ~42 columns.
### Team and player references: strings, not FKs

- `posteam`/`defteam` stay as `TEXT` (not FK to `teams`). Historical team abbreviations (`STL`, `OAK`, `SD`, `LA` pre-rebrand) already have normalization coverage via `team_aliases`; pushing FK semantics into the plays table would force rewriting team abbrs during ingest and fight against the source.
- `*_player_id` columns store the raw `gsis_id` as `TEXT`. Joining to `players.gsis_id` is one-line SQL. Deferred advantages: we can ingest plays before rosters for that season (hasn't happened yet, but it's a real recovery story if rosters break), and we don't have to manage FK cascades when a player is deleted.
### Indexes

- `(season, season_type)` — partitions most grading queries.
- `(passer_player_id, season)`, `(rusher_player_id, season)`, `(receiver_player_id, season)` — for the "feature extraction" queries that pull one player-season's plays at a time.
### Size and storage
- ~50k rows/season. 10 seasons of history = ~500k rows.
- ~40 columns, mostly nullable small numerics + a few text keys.
- Estimated ~80 MB for 10 seasons in Postgres (10x smaller than the Parquet cache, since we're dropping 330 columns).
- Well inside "don't bother partitioning" territory.
### Consequences

Easier:

- Grading reads `SELECT ... FROM plays WHERE season=? AND passer_player_id=?` with no pandas overhead.
- UI player pages can show "top 10 EPA plays" with a cheap indexed query.
- New stat components for existing positions are small SQL additions — no new ingest needed.
Harder:
- Adding a new column we later need means a new migration + a full re-ingest of affected seasons. We accept this: the column list above is conservative and covers the build plan through career grading.
- Two sources of truth for raw PBP (Parquet + Postgres). The Parquet file is canonical; if the Postgres table disagrees we re-ingest.
Explicitly given up:
- Per-play tracking fields (time-to-throw per play, pressure tags) — those live in NGS / FTN, not PBP, and are ingested separately.
- The 300 "everything else" PBP columns — fumble recovery IDs, drive numbers, kicker yards etc. Available via the Parquet cache if needed for ad-hoc analysis.
### References

- ADR-0009: Raw data cached as Parquet, typed tables in Postgres.
- `docs/exploration/2026-04-23-pbp.md` (to follow this ADR) — probe output that anchored this column selection.
## ADR-0012: Store NGS as three tables, not one unified fact table
- Status: Accepted
- Date: 2026-04-23
### Context

Next Gen Stats (NGS) arrives via `nflreadpy.load_nextgen_stats(stat_type=...)` in three flavors:

- passing (29 cols): `avg_time_to_throw`, `aggressiveness`, `completion_percentage_above_expectation` (NGS's CPOE), `avg_air_yards_to_sticks`, plus derived efficiency numbers.
- rushing (22 cols): `rush_yards_over_expected_per_att`, `efficiency`, `percent_attempts_gte_eight_defenders`, `avg_time_to_los`.
- receiving (23 cols): `avg_separation`, `avg_cushion`, `avg_yac_above_expectation`, `percent_share_of_intended_air_yards`.
Column overlap across the three: only the keys (`player_gsis_id, season, season_type, week, team_abbr`) and the "display" fields we drop. Zero substantive stat overlap.
NGS coverage: 2016 → present. Earlier seasons have no NGS data at all.
Options:

- Three tables: `ngs_passing`, `ngs_rushing`, `ngs_receiving`, each with its native columns.
- One unified `ngs_stats (player_id, season, week, component_name, value)` EAV table that normalizes across stat types.
- One wide table with all 29+22+23 columns, most nullable.
### Decision

Option 1. Three tables, each holding its source columns verbatim (minus display dupes like `player_first_name`). Feature extraction joins whichever table the position needs.
### Rationale

- Column overlap is zero. An EAV table would force every query to filter by `component_name`, losing type safety and pushing schema into strings. No analytic win.
- Query shape matches the storage shape. QB grading reads one row per passer from `ngs_passing`. RB grading reads one row from `ngs_rushing`. We never join across stat types, so there's no benefit to unifying them.
- Size is trivial. ~600 QB-season-weeks + ~600 RB-season-weeks + ~1400 WR/TE-season-weeks × 10 seasons × ~74 columns total = well under 100 MB. Three tables don't hurt.
- Rejected Option 3 (wide table): half the row would be nulls for any given position. Ugly, misleading query surface, same storage win as Option 1 once you exclude nulls.
### Grain and keys

Each table: one row per (player, season, season_type, week, team).

- `week = 0` is the season-summary row (nflverse convention). The grading pipeline reads `WHERE week = 0` for per-season metrics. `week > 0` is preserved for future weekly UI / trend charts.
- `season_type` is kept because NGS includes postseason rows (weeks 19, 20, 21, 23 on the nflverse week axis).
- `team_id` is part of the PK because a player traded mid-season gets separate NGS rows per team (the season-summary row too — each team segment gets its own summary).
### Team normalization

`team_abbr` in the source is the contemporary abbreviation (`LAR`, `LAC`, `LV`, etc.). We resolve via `team_aliases` at ingest time to get `team_id`, same as every other ingest. See ADR-0004.
### Player mapping

`player_gsis_id` in NGS is the nflverse gsis id, which we already use as the canonical identifier on `players.gsis_id`. No name matching required.
### Minimum season

`season >= 2016` is enforced in ingest. Earlier seasons have no NGS; the grading pipeline handles their absence via `data_tier` (ADR-0003).
### What we store

Every NGS-specific column, verbatim. No pruning — NGS is small, and future formula variants may want `max_air_distance` or `percent_attempts_gte_eight_defenders` even if v1 doesn't.

We drop: `player_first_name`, `player_last_name`, `player_display_name`, `player_short_name`, `player_position`, `player_jersey_number`. All are already available on `players` / `player_seasons` / depth charts.
### Consequences

Good:

- Natural query shape: `SELECT * FROM ngs_passing WHERE week=0 AND season=2024`.
- Adding new NGS columns (if nflverse exposes them) is a single `ALTER TABLE` per affected stat type — no EAV row-count explosion.
- Type-safe columns in generated TypeScript.
Trade-offs:

- Three ingest code paths (shared via a dispatcher — see `ingest/ngs.py`).
- Adding a new stat type (a hypothetical `ngs_defense`) is a new migration rather than "just insert rows".
### References

- ADR-0003 — `data_tier` for missing historical coverage
- ADR-0004 — team abbr normalization
- `docs/exploration/2026-04-23-ngs.md` (when populated) — schema probe
## ADR-0013: QB v1 grading formula
- Status: Accepted (supersedable — v1 of the formula)
- Date: 2026-04-23
- Supersedes: None
- Formalizes: `docs/grading/qb-v1-proposal.md` (strawman approved by the user 2026-04-23)
### Context
First concrete grading formula the pipeline needs to compute. Scope limited to the QB position so we can ship a full vertical slice (ingest → features → grades → UI) and iterate on the formula once real numbers are on screen.
### Decision

#### Composite

```
grade = sigmoid(composite_z)

composite_z = 0.50 * z(shrunk_EPA_per_dropback)
            + 0.25 * z(shrunk_CPOE)
            + 0.25 * z(shrunk_success_rate)
```
- `z()` = within-position, within-season standardization.
- `sigmoid()` = the existing `grading/sigmoid.py`, tuned so z = 0 → 50, z = +2 → ~90, z = -2 → ~10.
- z-score mean/SD computed from qualified QBs only (see below).

#### Per-component definitions (before shrinkage)
| Component | Raw value | Sample space |
|---|---|---|
| `qb_epa_per_dropback` | mean of `plays.epa` | dropbacks (post-filter) |
| `qb_cpoe` | mean of `plays.cpoe` | pass attempts only (CPOE is null on sacks/scrambles) |
| `qb_success_rate` | mean of `plays.success` | dropbacks (post-filter) |
#### Filter

A play counts toward the grade iff ALL of:

```
plays.season_type = 'REG'
plays.qb_dropback = TRUE
plays.aborted_play = FALSE
plays.two_point_attempt = FALSE
NOT garbage_time
```
Garbage time (ADR-0013 formalizes the proposal's rule):

```
garbage_time =
    (qtr >= 4 AND ABS(score_differential) > 21)
    OR (qtr = 4 AND game_seconds_remaining < 300
        AND ABS(score_differential) > 14)
```
Chosen over the `wp < 0.05 OR wp > 0.95` convention because nflverse WP is aggressive about locking in late-game outcomes, and we'd rather err on the side of keeping a legitimate play than dropping one.
#### Empirical Bayes shrinkage

Per component, before z-scoring:

```
shrunk = (n * raw + k * mu_league) / (n + k)
```

- `n` = sample size for that component (dropbacks for EPA and success rate; pass attempts for CPOE — CPOE is null on sacks/scrambles, so we use only the plays where it's defined).
- `mu_league` = league mean of the raw component among all QBs, weighted by their sample size (volume-weighted, not a simple average).
- `k` = shrinkage strength: `k = 150` for EPA/db and success rate, `k = 100` for CPOE (lower variance, less shrinkage needed). A worked example follows.
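```python
# Shrinkage arithmetic for a 100-dropback QB running hot (numbers invented).
n, raw = 100, 0.20        # +0.20 EPA/dropback over 100 dropbacks
k, mu_league = 150, 0.05  # EPA/db shrinkage strength; volume-weighted league mean

shrunk = (n * raw + k * mu_league) / (n + k)
# (100*0.20 + 150*0.05) / 250 = 27.5 / 250 = 0.11
# The small sample keeps well under half of its distance from the league mean;
# at n == k a player is exactly half-shrunk.
assert abs(shrunk - 0.11) < 1e-12
```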
#### Qualified threshold

- `qualified = TRUE` iff `n_dropbacks >= 200` in the regular season.
- All QBs with any dropbacks get a row in `season_grades`, but those below the threshold have `qualified = FALSE` — the UI can de-emphasize them.
- Unqualified QBs still get shrunk / z-scored so their grade is on the same 0-100 scale.
#### Position assignment

A player grades as QB iff `player_seasons.position_played = 'QB'` for that season. If a player appears at multiple positions, they only grade at each position they occupied. Non-QBs with a passing play (e.g. wildcat RB throws) don't get a QB grade — the QB feature query joins against `player_seasons.position_played`.
#### What opponent adjustment?
None for v1. Deferred. The composite runs off raw EPA, no defense-strength normalization. Revisit in v2 once face-validity feedback shows whether it's missing.
#### Confidence

`season_grades.confidence` is set to `min(1, n_dropbacks / 300)`. A rough proxy — 300 dropbacks is roughly half a full-season starter's workload; anyone at or above that gets confidence = 1.
#### Data tier
Per ADR-0003:
- 2016+: tier 1 (full PBP + NGS available)
- 2006–2015: tier 2 (PBP available, no NGS — not relevant to the v1 formula since we don't use NGS)
- pre-2006: tier 3 (no EPA model — cannot grade with this formula)
For now we only compute grades for seasons that have PBP ingested. The `data_tier` column on `season_grades` records which tier the grade belongs to.
### Consequences
Testability: Each stage (filter, shrinkage, z-score, composite, sigmoid) is a pure function on a DataFrame. Component tests verify the math, integration tests verify the top-10 list looks sane.
Iteration: If we decide CPOE is overweighted, that's a single coefficient change in `grading/qb.py`. If we want to add opponent adjustment later, it's a new column on `stat_components` — no schema change. The formula is a library, not an API.
Superseded when: We add NGS-derived components (time-to-throw, aggressiveness), opponent adjustment, or a defensibly-tuned inverse-variance weighting. Those go in v2 and get their own ADR.
### References

- `docs/grading/qb-v1-proposal.md` — the strawman the user approved
- ADR-0003 — data tiering for missing historical coverage
- ADR-0007 — originally sketched inverse-noise weighting; v1 skips this intentionally for explainability
- `db/migrations/0001_init.sql` — `stat_components` and `season_grades` tables (pre-existing)
## ADR-0014: RB v1 grading formula
- Status: Accepted (supersedable — v1.1 of the formula)
- Date: 2026-04-24
- Updated: 2026-04-22 (v1.1 — see "v1.1 refinement" section)
- Supersedes: None
- Companion to: ADR-0013 (QB v1). Same pipeline shape, different components and per-skill sample sizes.
### Context
Second concrete grading formula. QB v1 shipped (ADR-0013); we're extending the same architecture (extract → shrink → z → composite → sigmoid) to RB. Two things make RB harder than QB:
- Role variation is huge. Derrick Henry (280 carries / 20 targets) and Christian McCaffrey (220 / 100) are both elite by very different profiles. A naive single-composite formula would wrongly penalize a thumper's "bad" receiving or reward a pass-catching back's "easy" rushing.
- Raw RB stats reflect a lot of non-RB stuff (OL quality, box counts, play-action, scheme). NGS RYOE and YAC-over-expected already try to strip this out, so they deserve meaningful weight.
We choose to handle (1) with usage-aware empirical Bayes shrinkage — a pure thumper's receiving components shrink hard toward the league mean (because their `n_targets` is small relative to the shrinkage `k`) and contribute close to zero to the composite. No explicit role detection is needed.
### Decision

#### Composite

```
grade = sigmoid(composite_z)

composite_z = 0.28 * z(shrunk_ryoe_per_attempt)
            + 0.18 * z(shrunk_rush_epa_per_attempt)
            + 0.14 * z(shrunk_rush_success_rate)
            + 0.18 * z(shrunk_rec_epa_per_target)
            + 0.12 * z(shrunk_yac_over_expected_per_rec)
            + 0.05 * z(shrunk_catch_pct)
            - 0.05 * z(shrunk_fumble_rate)
```
Rush 60% / Rec 35% / Security 5%. Fumble rate enters with a negative sign (fumbles are bad).
- `z()` = within-position, within-season standardization (same helper as QB; mean/SD computed from qualified RBs only).
- `sigmoid()` = the existing `grading/sigmoid.py`, tuned so z = 0 → 50, z = +2 → ~90.

#### Per-component definitions (before shrinkage)
| Component | Raw value | Sample (n) | Source |
|---|---|---|---|
| `rb_ryoe_per_attempt` | NGS `rush_yards_over_expected_per_att` | carries | `ngs_rushing` (week=0) |
| `rb_rush_epa_per_attempt` | mean of `plays.epa` on rushes | carries | `plays` |
| `rb_rush_success_rate` | mean of `plays.success` on rushes | carries | `plays` |
| `rb_rec_epa_per_target` | mean of `plays.epa` on targets | targets | `plays` |
| `rb_yac_over_expected_per_rec` | mean of `plays.yards_after_catch - plays.xyac_mean_yardage` on completions | receptions scored by xYAC model (`n_rec_with_xyac`) | `plays` (nflfastR xYAC) |
| `rb_catch_pct` | `n_receptions / n_targets` (from `plays`, filter-matched) | targets | `plays` |
| `rb_fumble_rate` | fumble rate per touch (any fumble by ball carrier) | total touches | `plays` |
Pre-adjusted flag: `rb_ryoe_per_attempt` and `rb_yac_over_expected_per_rec` are already context-adjusted by their upstream models (NGS's RYOE model and nflfastR's xYAC model, respectively). When opponent adjustment lands in v2, these two components must be flagged so we don't double-adjust.
Catch-% source: NGS's `load_nextgen_stats("receiving")` only publishes rows for WR/TE — RBs are never included, regardless of target volume. For v1 we derive catch % directly from `plays`: `n_receptions / n_targets`, with the same garbage-time / 2-pt filter as the rest of the receiving components. No expected-catch baseline is applied (none is available); we accept the limitation because the RB target diet is relatively uniform (mostly short routes) and the component's weight is only 5%.
YAC-over-expected source: same NGS-receiving RB gap — we instead use nflfastR's `xyac_mean_yardage` column, published on every completion in `plays`. For each RB reception with a non-null `xyac_mean_yardage`, the residual `yards_after_catch - xyac_mean_yardage` is the RB's YAC over expected on that play. We average across the RB's receptions (the filter matches the rest of the receiving components). Coverage on RB completions in the modern era is >99% (≈0.9% null in 2024), so sample size effectively equals `n_receptions`. See the v1.1 refinement section below.
Fumble rate: computed from `plays.fumble` (any fumble by the ball carrier, not just ones recovered by the defense). Counted on both rushing and receiving plays within the same per-skill filters as the production metrics. See the v1.1 refinement section below.
#### Filter

A rushing play counts toward the rushing components iff ALL of:

```
plays.season_type = 'REG'
plays.rush_attempt = TRUE
plays.rusher_player_id IS NOT NULL
plays.qb_kneel IS NULL OR plays.qb_kneel = FALSE
plays.qb_scramble IS NULL OR plays.qb_scramble = FALSE   -- scrambles aren't RB production
plays.two_point_attempt IS NULL OR plays.two_point_attempt = FALSE
NOT garbage_time
```
A receiving play counts toward the receiving components iff ALL of:

```
plays.season_type = 'REG'
plays.pass_attempt = TRUE
plays.receiver_player_id IS NOT NULL
plays.two_point_attempt IS NULL OR plays.two_point_attempt = FALSE
NOT garbage_time
```
The garbage-time rule is identical to ADR-0013:

```
garbage_time =
    (qtr >= 4 AND ABS(score_differential) > 21)
    OR (qtr = 4 AND game_seconds_remaining < 300
        AND ABS(score_differential) > 14)
```
#### Position assignment

A player grades as RB iff `players.position = 'RB'`. We grade from the master `players` table (not `player_seasons.position_played`) so that a rookie who changed teams mid-season still gets one grade per player, not one per team stint.

Non-RBs with rushes (scrambling QBs, WR jet-sweepers, gadget TEs) don't get an RB grade — the feature query joins on `players.position = 'RB'`.
#### Empirical Bayes shrinkage

Per component, before z-scoring:

```
shrunk = (n * raw + k * mu_league) / (n + k)
```

where `mu_league` is the volume-weighted RB league mean (summed over qualified and unqualified RBs, same convention as QB v1).

`k` per component (picked so `n == k` means "half-shrunk toward the league mean"):
| Component | n column | k |
|---|---|---|
| `rb_ryoe_per_attempt` | carries | 100 |
| `rb_rush_epa_per_attempt` | carries | 100 |
| `rb_rush_success_rate` | carries | 100 |
| `rb_rec_epa_per_target` | targets | 40 |
| `rb_yac_over_expected_per_rec` | receptions scored by xYAC (`n_rec_with_xyac`) | 30 |
| `rb_catch_pct` | targets | 40 |
| `rb_fumble_rate` | total touches | 200 |
The large `k` on fumble rate is deliberate — fumble rate (even with the recovery coin-flip removed by switching from `fumble_lost` to `fumble`) still has weak year-over-year reliability (~r=0.1-0.2), so we shrink hard.
#### Handling missing data

Some RBs are below NGS's volume thresholds and won't have `ngs_rushing` season-summary rows. Our joins are LEFT JOINs, and the missing metrics come through as NaN with n = 0. Similarly, an RB with no receptions has NaN receiving metrics.
Policy: before combining into the composite, any NaN component z-score is replaced with 0 (neutral). This covers three distinct "no data" cases under a single rule:

- A pure thumper with `n_targets = 0` has NaN receiving z-scores.
- A pass-game specialist with 15 carries (under NGS's rushing volume threshold for RYOE) has a NaN z for RYOE even though their `n_carries > 0`.
- Some RBs may be missing NGS rushing rows for the season entirely (e.g. rookies whose first week was postseason).
All three collapse to "no evidence on this skill = assume league average on this skill". The alternative — renormalizing composite weights per-player to drop missing components — would re-introduce role-aware weighting, which we explicitly wanted to avoid.
The `stat_components.z_score` column keeps the true NaN for these rows so the UI can render "—" rather than "0.0" and be honest about what we don't know. Only the composite calculation substitutes 0, as sketched below.
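```python
# Sketch of the neutralization policy: NaN z-scores stay NaN in
# stat_components; only the composite substitutes 0. Weights per the
# composite above; the frame layout is illustrative.
import numpy as np
import pandas as pd

RB_V1_WEIGHTS = {
    "rb_ryoe_per_attempt": 0.28, "rb_rush_epa_per_attempt": 0.18,
    "rb_rush_success_rate": 0.14, "rb_rec_epa_per_target": 0.18,
    "rb_yac_over_expected_per_rec": 0.12, "rb_catch_pct": 0.05,
    "rb_fumble_rate": -0.05,
}

def composite_z(z: pd.DataFrame) -> pd.Series:
    """z: one row per player, one column per component; NaN = no evidence."""
    total = pd.Series(0.0, index=z.index)
    for component, w in RB_V1_WEIGHTS.items():
        total += w * z[component].fillna(0.0)  # no evidence = league average
    return total

# A pure thumper: strong rushing z-scores, no receiving sample at all.
thumper = pd.DataFrame([{
    "rb_ryoe_per_attempt": 1.8, "rb_rush_epa_per_attempt": 1.5,
    "rb_rush_success_rate": 1.2, "rb_rec_epa_per_target": np.nan,
    "rb_yac_over_expected_per_rec": np.nan, "rb_catch_pct": np.nan,
    "rb_fumble_rate": -0.5,
}])
print(composite_z(thumper))  # the receiving terms contribute exactly 0
```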
#### Qualified thresholds

Three qualification concepts (the sub-grade one instantiated once per skill), because RBs have two skills:
| Threshold | Rule | Purpose |
|---|---|---|
| Grade at all | touches >= 30 | Excludes fringe players we can't say anything meaningful about |
| Composite qualified | touches >= 120 | "Real contributor" — appears in main leaderboard |
| Rushing sub-grade qualified | carries >= 80 | Rushing sub-grade displays; else "—" |
| Receiving sub-grade qualified | targets >= 40 | Receiving sub-grade displays; else "—" |
120 touches is roughly 7-8 touches/game over a full season — half
a full-season bell cow's workload, or all of a receiving specialist
like Ekeler. Tunable if the face-check shows too many marginal
committee backs at the top.
All backs with `touches >= 30` get a `season_grades` row; the `qualified` column distinguishes them.
#### Sub-grades

The `season_grades` row holds the composite grade only. Sub-grades (rushing / receiving) are computed at read time in the web app by combining the already-z-scored component rows in `stat_components`. No schema change.

Rushing sub-grade z = `(0.28*z_ryoe + 0.18*z_rush_epa + 0.14*z_rush_success) / (0.28 + 0.18 + 0.14)`, then sigmoid to 0-100.

Receiving sub-grade z = `(0.18*z_rec_epa + 0.12*z_yac_over_exp + 0.05*z_catch) / (0.18 + 0.12 + 0.05)`, then sigmoid to 0-100.
A sub-grade renders as "—" when the sample-size threshold for that skill isn't met. This is purely a UI convention — the composite grade in `season_grades` is unaffected.
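The sub-grade math, illustrated in Python for readability (the web app performs this in TypeScript at read time; the function shape is an assumption):

```python
# Sub-grade: renormalize by the weight mass used, then the ADR-0008 sigmoid.
import math

def sub_grade(weighted: dict[str, tuple[float, float]], k: float = 1.15) -> float:
    """weighted maps component name -> (weight, z_score)."""
    z = sum(w * z_i for w, z_i in weighted.values()) / sum(w for w, _ in weighted.values())
    return 100.0 / (1.0 + math.exp(-k * z))

rushing = sub_grade({
    "rb_ryoe_per_attempt": (0.28, 1.2),
    "rb_rush_epa_per_attempt": (0.18, 0.9),
    "rb_rush_success_rate": (0.14, 0.7),
})
print(round(rushing, 1))  # ~75.8 for this above-average rushing line
```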
#### Confidence

`season_grades.confidence = min(1, touches / 250)`. 250 touches is roughly a full-season starter's workload; anyone at or above that gets confidence = 1.
#### Data tier
Per ADR-0003:
- 2016+: tier 1 (PBP + NGS available; full formula computes).
- Pre-2016: out of scope for v1. The formula depends on NGS components (RYOE, YAC-over-expected, catch %) for 45% of weight. Backfilling a pre-NGS fallback is deferred.
### Consequences
Testability: each stage is a pure function (same as QB); unit tests verify the "n=0 → z=0" neutralization, the sub-grade threshold gating, and that dual-threat backs outrank specialists.
Web app: the existing leaderboard + player detail pages render RBs as soon as `season_grades` has rows. A position switcher on the home page is a one-component follow-up (bundle with WR/TE).
Iteration: weight and `k` changes are single-coefficient edits in `weights.py`. Adding broken-tackle rate from PFR is a new component row, no schema change.
### v1.1 refinement (2026-04-22)

Two caveats from the original v1 were resolved by adding two columns to `plays` (migration `0005_add_fumble_and_xyac_to_plays`) and switching the RB grader's data sources:

- Fumble rate now uses `plays.fumble` rather than `plays.fumble_lost`. Fumble-lost depends on who recovers (a near coin-flip), making it strictly noisier than true fumble rate. The change is source-only — the weight (-0.05), the large shrinkage `k` (200), and the ball-carrier attribution rules are unchanged.
- YAC-over-expected is now sourced from `plays.xyac_mean_yardage` (nflfastR's xYAC model output on each completion) rather than `ngs_receiving.avg_yac_above_expectation`. Root cause: NGS's receiving product publishes zero RB rows regardless of target volume, so the NGS-based component collapsed to a NaN-then-neutralized 0 for effectively every RB, silently wasting its 12% composite weight. The xYAC column covers >99% of modern-era RB completions, so the component is now active signal.
Both changes preserve the existing composite weights, shrinkage constants, qualification thresholds, and `pre_adjusted` flags — the data sources change, the formula does not. Pre-adjusted remains True for the YAC component (xYAC is still a per-play, context-aware model — opponent adjustment in v2 must still skip this component).

The `stat_components.component_name` strings remain the same (`rb_fumble_rate`, `rb_yac_over_expected_per_rec`), preserving the public contract with the web app.
### Deferred

- Opponent adjustment: same deferral as QB v1. When added, the RYOE and YAC-over-expected components must be flagged as `pre_adjusted: True` to avoid double-adjustment.
- Broken-tackle rate from PFR — valuable skill signal, but reliability needs cross-year validation before we weight it.
- Red-zone / goal-line efficiency — small sample, mostly usage-driven, skipped.
- Two-point conversion efficiency — same reasoning.
- 20+ yard breakaway rate — potentially distinct signal from EPA, but correlation is high enough that we're dropping it for v1. Revisit if breakaway-archetype backs grade unfairly low.
- Route participation / target share as a graded input — no routes-run data ingested yet.
- Forced-fumble attribution, recoveries-in-pileups — deferred to a defensive-grading pass.
- Usage labels ("Feature / Committee / Specialist") derived from snap share. Nice UI add, not a grading change. v1.5.
### References

- ADR-0013 — QB v1 grading formula (same architecture)
- ADR-0003 — data tiering
- ADR-0011 — thin plays table (updated by migration 0005 to include `fumble` and `xyac_mean_yardage`)
- ADR-0012 — NGS three-table layout (rushing used; receiving intentionally not joined for RB grading)
## ADR-0015: WR v1 grading formula
- Status: Accepted (supersedable — v1 of the formula)
- Date: 2026-04-22
- Supersedes: None
- Companion to: ADR-0013 (QB v1), ADR-0014 (RB v1). Same pipeline shape (extract -> shrink -> z -> composite -> sigmoid), different components, filters, and qualification thresholds.
### Context
Third concrete grading formula. QB v1 and RB v1 shipped; we're extending the same architecture to WR. Three things distinguish WR grading from the prior two:
- WRs have one skill, not two. There's no RB-style dual-skill split (rushing + receiving), so there's one composite and no sub-grades in v1. "Route runner vs YAC monster" is interesting UI data viz but not a separate qualification bucket.
- NGS receiving publishes WRs cleanly (unlike RBs, which NGS excludes). We get `avg_separation` and `avg_yac_above_expectation` on essentially all qualified WRs from 2016+.
- Target earn rate is a real signal for WRs (unlike for RBs, where carries are decreed by scheme). WRs partly earn their targets by winning routes and forcing the QB's eye. This is a new component with no RB analog.
The grade is meant to answer "how well did this WR play the receiving role this season?" — separated from usage-driven accumulators (total yards, touchdowns, target share as a volume stat).
### Decision

#### Composite

```
grade = sigmoid(composite_z)

composite_z = 0.35 * z(shrunk_rec_epa_per_target)
            + 0.27 * z(shrunk_yac_over_expected_per_rec)
            + 0.10 * z(shrunk_separation)
            + 0.10 * z(shrunk_target_earn_rate)
            + 0.08 * z(shrunk_success_rate_per_target)
            - 0.05 * z(shrunk_fumble_rate)
```
Sum of magnitudes = 0.95. The composite combiner normalizes by sum of magnitudes (not signed sum); fumble contributes at its designed 5.3% share (0.05 / 0.95). This invariant is locked by `test_signed_weights_normalize_by_magnitude` in `pipeline/tests/grading/test_composite.py` and further reinforced by `test_wr_v1_weights_example`, which uses the exact `WR_V1_WEIGHTS` dict.
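A sketch of the invariant (the weights dict mirrors the composite above; the repo's combiner and the tests named above are the authority):

```python
# Magnitude-normalized combiner: divide by sum(|w|) = 0.95, not the signed 0.85.
WR_V1_WEIGHTS = {
    "wr_rec_epa_per_target": 0.35,
    "wr_yac_over_expected_per_rec": 0.27,
    "wr_separation": 0.10,
    "wr_target_earn_rate": 0.10,
    "wr_success_rate_per_target": 0.08,
    "wr_fumble_rate": -0.05,
}

def combine(z: dict[str, float], weights: dict[str, float]) -> float:
    signed = sum(w * z[name] for name, w in weights.items())
    return signed / sum(abs(w) for w in weights.values())

# z=+1 on everything (including fumble rate, where +1 means more fumbles than
# average) nets 0.85 signed weight over 0.95 magnitude, not 1.0:
assert abs(combine({k: 1.0 for k in WR_V1_WEIGHTS}, WR_V1_WEIGHTS) - 0.85 / 0.95) < 1e-9
```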
Rough shape:

- 62% outcome-based: EPA/target 35% + YAC-over-expected 27%
- 28% process + usage: separation 10% + target earn rate 10% + success rate 8%
- 5% ball security: fumble rate (negative)

Conventions:

- `z()` = within-position, within-season standardization against qualified WRs only (same helper as QB and RB).
- `sigmoid()` = `grading/sigmoid.py`, z=0 → 50, z=+2 → ~90.
#### Why these weights
- EPA at 35%, not 40%. A single metric at 40% gives any systematic bias (QB quality, scripted touches, YAC-heavy offense) too much leverage. 35% keeps EPA the biggest contributor without dominating the composite.
- YAC at 27%. Highest-reliability WR signal after EPA. xYAC pre-adjusts for coverage state at the catch, so this is close to pure WR skill.
- Target earn rate at 10%, not 22%. Target share is structurally correlated with team environment (top QB, pass-heavy scheme, weak WR2 competition, weak TE/RB pass game). These confounds don't wash out across a season; they persist for players in stable situations. 10% captures the "QB looks at you" signal without letting offensive environment drive a fifth of the grade.
- Separation at 10%, not 15%. Process metric, not outcome; inflated by easy targets (screens, hitches); NGS measures at-catch rather than at-throw. Keep it modest.
- Success rate at 8%. Diversifies efficiency measurement away from pure EPA, but it's partly role-contaminated (slot checkdowns on 3rd-and-medium have a different success-rate baseline than outside verticals on 1st-and-10). 8% is a compromise — not 5% (which underweights a second efficiency lens), not 10% (which overweights a role-biased metric). Flagged as a face-check watch item: if slot specialists systematically outgrade deep threats, dial this back first.
- Catch-rate-over-expected dropped entirely. Every version of this from public data is either QB-contaminated (aggregating `plays.cpoe` per receiver rewards pairing with accurate QBs) or role-contaminated (raw NGS catch % punishes deep threats and rewards screen/flat receivers). Omitting a component is an honesty signal — PFF has proprietary charting for catchable targets; we don't. Surface raw catch % on the player page as context, keep it out of the composite.
- Fumble rate at -5%. Same rationale as RB v1.1: rare event, low YoY reliability, shrink hard.
#### Per-component definitions (before shrinkage)
| Component | Raw value | Sample (n) | Source | Pre-adjusted |
|---|---|---|---|---|
| `wr_rec_epa_per_target` | mean of `plays.epa` on targets | targets | `plays` | No |
| `wr_yac_over_expected_per_rec` | mean of `plays.yards_after_catch - plays.xyac_mean_yardage` on completions with non-null xYAC | `n_rec_with_xyac` | `plays` (nflfastR xYAC) | Yes |
| `wr_separation` | `avg_separation` | targets | `ngs_receiving` (week=0) | Yes |
| `wr_target_earn_rate` | `n_targets / n_team_pass_att_active` | team pass attempts while active | `plays` | No |
| `wr_success_rate_per_target` | mean of `plays.success` on targets | targets | `plays` | No |
| `wr_fumble_rate` | rate of `plays.fumble` per reception | receptions | `plays` | No |
Target earn rate denominator: n_team_pass_att_active is the
sum of posteam's regular-season pass attempts across the set of
(posteam, game_id) pairs that appear in the WR's own target
plays. This handles mid-season trades cleanly — each game's
denominator is its correct team's pass volume. The "had >=1 target"
proxy for active may slightly under-count games where the WR
played but wasn't targeted; for qualified WRs this is rare.
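A minimal pandas sketch of that denominator, assuming a `targets` frame of the WR's own target plays and a `team_pass` frame of per-(posteam, game_id) regular-season pass-attempt counts; the frame and column names are illustrative, not the pipeline's actual ones:

```python
import pandas as pd

def team_pass_att_active(targets: pd.DataFrame, team_pass: pd.DataFrame) -> int:
    """Sum each team-game's pass attempts over exactly the (posteam, game_id)
    pairs in which this WR drew at least one target."""
    active_games = targets[["posteam", "game_id"]].drop_duplicates()
    # Inner join keeps only team-games where the WR was targeted, so a
    # mid-season trade counts each game under its correct team's volume.
    joined = active_games.merge(team_pass, on=["posteam", "game_id"])
    return int(joined["pass_att"].sum())
```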
Fumble denominator = receptions (not targets): WRs only touch the ball on completions. Keeps fumble rate comparable across possession WRs and deep threats.
Pre-adjusted flag: wr_yac_over_expected_per_rec and
wr_separation are already context-adjusted by their upstream
models. When opponent adjustment lands in v2, these components
must be flagged so we don't double-adjust.
Filter#
A receiving play counts toward WR components iff ALL:
- `plays.season_type = 'REG'`
- `plays.pass_attempt = TRUE`
- `plays.receiver_player_id IS NOT NULL`
- `plays.two_point_attempt IS NULL OR plays.two_point_attempt = FALSE`
- `NOT garbage_time`
Identical to the RB v1 receiving filter — reused verbatim from
grading/filters.py::RB_REC_FILTER_SQL. Garbage-time rule is the
one defined in ADR-0013.
The team-pass-attempts aggregate for the earn-rate denominator uses the same filter so numerator and denominator are consistent (both count REG-season, non-garbage, non-2pt pass attempts).
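As a sketch, the shared predicate plausibly renders as a reusable SQL fragment along these lines (the real `RB_REC_FILTER_SQL` lives in `grading/filters.py`, and the garbage-time predicate is the ADR-0013 one, elided here):

```python
# Hypothetical rendering of the shared receiving filter; the actual constant
# is defined once in grading/filters.py and reused verbatim across RB/WR/TE.
RB_REC_FILTER_SQL = """
    plays.season_type = 'REG'
    AND plays.pass_attempt = TRUE
    AND plays.receiver_player_id IS NOT NULL
    AND (plays.two_point_attempt IS NULL OR plays.two_point_attempt = FALSE)
    AND NOT ({garbage_time})  -- garbage-time predicate per ADR-0013, elided
"""
```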
Position assignment#
A WR grade is issued iff players.position = 'WR'. A WR running a
jet sweep doesn't get rushing credit — this is a receiving grade
only. A TE/RB running routes out of the backfield doesn't get a WR
grade; they belong in their own position's pipeline.
Empirical Bayes shrinkage#
Per component, before z-scoring:
shrunk = (n * raw + k * mu_league) / (n + k)
where mu_league is the volume-weighted WR league mean (summed
over qualified and unqualified WRs, same convention as QB/RB v1).
k per component:
| Component | n units | k |
|---|---|---|
| EPA per target | targets | 50 |
| YAC over expected per rec | receptions scored by xYAC | 30 |
| Separation | targets | 40 |
| Target earn rate | team pass attempts while active | 200 |
| Success rate per target | targets | 50 |
| Fumble rate | receptions | 100 |
Separation's k (40) is slightly below the other per-target components (50) because NGS separation has higher year-over-year reliability than raw per-play efficiency metrics. Target earn rate uses its natural denominator (team pass attempts) rather than games — the EB formulation shrinks toward league-mean target share weighted by the number of observations, which is the correct statistical framing. k=200 team pass attempts is roughly 35% of a team's regular-season pass volume.
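A worked sketch of the shrinkage step with illustrative numbers: a 60-target WR at +0.05 EPA/target against a league mean of 0.00 with k=50 keeps 60/110 ≈ 55% of his raw signal:

```python
def shrink(raw: float, n: float, mu_league: float, k: float) -> float:
    """Empirical Bayes shrinkage toward the volume-weighted league mean."""
    return (n * raw + k * mu_league) / (n + k)

# 60 targets at +0.05 EPA/target against a league mean of 0.00, k = 50:
# weight on the raw signal = n / (n + k) = 60 / 110 ≈ 0.545
print(shrink(raw=0.05, n=60, mu_league=0.0, k=50))  # ≈ 0.0273
```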
Handling missing data#
Same policy as RB v1 (see ADR-0014 "Handling missing data"): any
NaN component z-score is replaced with 0 (neutral) before entering
the composite. stat_components.z_score keeps the true NaN so the
UI can render "-" rather than "0.0".
Practically, this matters most for:
- WRs under NGS's separation volume threshold (rookies with partial seasons, or below the volume NGS publishes). Separation is NaN; z is NaN; composite substitutes 0.
- A WR with 0 completions (only happens at the extreme low-volume end) has NaN YAC and NaN fumble rate.
The alternative — renormalizing composite weights per-player to drop missing components — would re-introduce role-aware weighting, which we explicitly want to avoid.
Weight normalization invariant#
The composite combiner normalizes by sum of magnitudes
(sum(abs(w))), not signed sum. A player at z=+1 on every
component (including fumble rate — where z=+1 means "fumbles a
lot") gets composite_z = (0.35 + 0.27 + 0.10 + 0.10 + 0.08 -
0.05) / 0.95 ≈ 0.894, and fumble penalizes at exactly its
designed 5.3% share rather than being amplified by a smaller
signed-sum denominator.
This is locked by test_signed_weights_normalize_by_magnitude
(added during RB v1.1) and by the new
test_wr_v1_weights_example which exercises the actual
WR_V1_WEIGHTS dict.
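A minimal sketch of the combine step under both policies (NaN → neutral 0 from the previous section, normalize by sum of |w|). The weight dict mirrors the WR v1 table; the function shape is illustrative, not the actual combiner:

```python
import math

WR_V1_WEIGHTS = {
    "wr_rec_epa_per_target": 0.35,
    "wr_yac_over_expected_per_rec": 0.27,
    "wr_separation": 0.10,
    "wr_target_earn_rate": 0.10,
    "wr_success_rate_per_target": 0.08,
    "wr_fumble_rate": -0.05,
}

def composite_z(z_scores: dict[str, float], weights: dict[str, float]) -> float:
    """NaN components enter as neutral 0; denominator is sum of magnitudes."""
    total = sum(
        w * (0.0 if math.isnan(z_scores[name]) else z_scores[name])
        for name, w in weights.items()
    )
    return total / sum(abs(w) for w in weights.values())

# The all-z=+1 check from the text: (0.35+0.27+0.10+0.10+0.08-0.05) / 0.95
all_plus_one = {name: 1.0 for name in WR_V1_WEIGHTS}
assert abs(composite_z(all_plus_one, WR_V1_WEIGHTS) - 0.85 / 0.95) < 1e-9  # ≈ 0.894
```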
Qualification thresholds#
Two qualification concepts:
| Threshold | Rule | Purpose |
|---|---|---|
| Grade at all | targets >= 20 | Excludes fringe WRs we can't say anything meaningful about |
| Composite qualified | targets >= 50 | Rotational WR3 or better; appears in main leaderboard; defines z-score population |
~3/game over a full season is roughly the floor for "this player got real route time." Tunable if face-check shows too many marginal WR3s at the top or too many clear WR1s falling below.
All WRs with targets >= 20 get a season_grades row; the
qualified column distinguishes them.
Confidence#
season_grades.confidence = min(1, targets / 100). 100 targets is
~6/game — "real starter usage" rather than WR1 workload
(which would be ~120-140+). Pegging full confidence here gives
most healthy starters confidence = 1 and reserves the fractional
band for genuine part-season / rotational players.
Data tier#
Per ADR-0003:
- 2016+: tier 1 (PBP + NGS available; full formula computes).
- Pre-2016: out of scope for v1. The formula depends on NGS components (separation, xYAC availability) for 37% of weight. A pre-NGS fallback is deferred; call it a v2 concern.
Validation expectations#
Expect WR composite year-over-year Pearson r on 2+-season
samples in the band 0.45 - 0.60.
Interpretation triggers:
- Below 0.45 — methodology problem. Most likely a process component (separation or success rate) dominating noise over EPA/YAC. Investigate weight distribution and per-component reliability.
- 0.45 - 0.60 — the expected band. WR production is genuinely more defense-dependent than QB production, and we don't have CB matchup adjustment in v1.
- Above 0.65 — suspicious. Likely means we're accidentally measuring usage (target volume, team context) rather than skill. Investigate whether target earn rate is pulling the stability or whether separation's metric-stability is doing more work than intended.
QB v1 for comparison was in the 0.60 - 0.70 band; WR's lower ceiling is a data limit (no CB matchup data), not a grading failure. Don't chase the QB number by tuning weights.
Consequences#
Testability: each stage is a pure function, same as prior
positions. Unit tests verify NaN neutralization, that a pure
separator outranks a non-separator with the same efficiency,
that the fumble penalty actually subtracts, and that the
composite normalization constant matches the hand-computed value
from WR_V1_WEIGHTS.
Web app: the existing leaderboard + player detail pages
render WRs as soon as season_grades has rows. A position
switcher on the home page is a separate follow-up (currently
hardcoded to QB; RB and WR both pending surfacing).
Iteration: weight and k changes are single-coefficient
edits in weights.py. Adding a new component (say, separation
at-throw once it becomes publicly available) is a new SQL CTE
and a new row in the weights dicts; no schema change.
Deferred (v1.1+)#
- Target-per-route-run — the clean v1.5 upgrade to target earn rate, replaces the "team pass attempts while active" proxy with a true "routes run" denominator. Requires routes-run data (PFF/FTN); not ingested.
- Team-context-adjusted target earn rate — regress target share on team pass volume + QB EPA, grade on the residual. ~30 lines of code, a v1.1 candidate if face-check shows earn rate rewarding bad-team-WR1s too generously.
- Drop rate — `plays` can't cleanly isolate drops from defended passes. Requires explicit drop charting.
- Slot vs outside split — no alignment data ingested. Face-check will tell us if the one-scale approach systematically biases one archetype.
- Contested catch rate — not available in public tracking data.
- Red-zone / goal-line efficiency — small sample, mostly role-driven.
- Opponent adjustment, team-level — same deferral as QB/RB v1. `wr_yac_over_expected_per_rec` and `wr_separation` must be flagged `pre_adjusted=True` to avoid double-adjustment.
- CB matchup adjustment — the v2+ work that would push YoY `r` from the 0.45-0.60 band toward QB-level 0.60-0.70. Requires per-target defender charting.
References#
- ADR-0013 — QB v1 grading formula (same pipeline architecture)
- ADR-0014 — RB v1 grading formula (shares receiving machinery, same NaN neutralization policy, same xYAC source for YAC-over-expected)
- ADR-0012 — NGS three-table layout (receiving table used for `avg_separation`)
- ADR-0011 — thin `plays` table (with `fumble` and `xyac_mean_yardage` added by migration 0005)
- ADR-0003 — data tiering
TE v1 grading formula
- Status: Accepted (v1; iterates like RB/WR)
- Date: 2026-04-23
- Companion to: ADR-0013 (QB), 0014 (RB), 0015 (WR); ADR-0003 (data tier); ADR-0009 (parquet cache)
Context#
TE grades must reflect receiving only in v1: public data does not support a
repeatable blocking grade (no PFF-style charting). Role labels and
data_tier_reason communicate what the number measures (see Role and
data_tier below).
Decision — composite (tier 1, full six components)#
Same structure as WR v1 with separation at 7% (WR uses 10%). NGS separation is WR-coverage-geometry calibrated; TE-vs-LB/S matchups are noisier in the same metric — downweight, do not drop.
| Component | Weight |
|---|---|
| `te_rec_epa_per_target` | 0.35 |
| `te_yac_over_expected_per_rec` | 0.27 |
| `te_separation` | 0.07 |
| `te_target_earn_rate` | 0.10 |
| `te_success_rate_per_target` | 0.08 |
| `te_fumble_rate` | -0.05 |
Sum of magnitudes |w| = 0.92 (signed sum 0.82; composite normalizer uses
sum of absolute weights — see test_signed_weights_normalize_by_magnitude
and TE tests in test_composite.py).
The earlier "0.95" figure in this ADR was a copy-paste artifact from WR v1
(WR has separation at 0.10 → WR |w| = 0.95); TE separation is downweighted
to 0.07 for NGS-calibration reasons, giving |w| = 0.92.
YAC weight = WR (27%): do not increase TE YAC weight on intuition alone; if TE YAC YoY correlation meaningfully exceeds WR YAC in validation, consider v1.1 weight shift with evidence.
Tier 2 — role = blocking_te#
Target earn rate is role-dominated for Y-heavy TEs. Omit earn from the
composite; redistribute 0.10 to EPA and YAC in proportion 0.35∶0.27
(→ 0.406 and 0.314). Other components unchanged. The component row for
te_target_earn_rate is still written with raw / shrunk / z;
stat_components.used_in_composite = false for that row.
Because the redistribution preserves magnitude, tier-2 has the same
|w| = 0.92 and signed sum 0.82 as tier-1 — on an all-z=1 TE the two
dicts both produce 0.82 / 0.92 ≈ 0.8913. The dicts differ by where the
earn mass lands, not by total weight.
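A sketch of that redistribution derived from the tier-1 dict (dict name and keys assumed to mirror the WR convention):

```python
TE_V1_WEIGHTS = {
    "te_rec_epa_per_target": 0.35,
    "te_yac_over_expected_per_rec": 0.27,
    "te_separation": 0.07,
    "te_target_earn_rate": 0.10,
    "te_success_rate_per_target": 0.08,
    "te_fumble_rate": -0.05,
}

def blocking_te_weights(w: dict[str, float]) -> dict[str, float]:
    """Drop earn rate; move its mass onto EPA and YAC in 0.35:0.27 proportion."""
    out = dict(w)
    earn = out.pop("te_target_earn_rate")
    epa, yac = out["te_rec_epa_per_target"], out["te_yac_over_expected_per_rec"]
    out["te_rec_epa_per_target"] = epa + earn * epa / (epa + yac)         # ~0.406
    out["te_yac_over_expected_per_rec"] = yac + earn * yac / (epa + yac)  # ~0.314
    return out

# Magnitude is preserved: tier-2 |w| is the same 0.92 as tier-1.
tier2 = blocking_te_weights(TE_V1_WEIGHTS)
assert abs(sum(abs(v) for v in tier2.values()) - 0.92) < 1e-9
```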
Filters, features#
- Receiving filter: same as WR/RB receiving (`RB_REC_FILTER_SQL`).
- Features: `plays` + `ngs_receiving` (week=0) for separation; `plays` for xYAC-based YAC-over-expected; `player_seasons` summed `snaps_offense` for role.
- Fumble denominator: receptions.
Qualification#
- 15 targets minimum to emit a grade row.
- 40 targets for `qualified`.
- Confidence = `min(1, targets / 70)`.
Shrinkage (per-position k)#
TE target earn k = 100 team pass attempts (vs WR 200) — smaller
cross-player dispersion in earn rate. Other components align with WR (EPA 50,
YAC 30, separation 40, success 50, fumble 100).
Role buckets#
- `receiving_te`: target share ≥ 0.10 (targets / offensive snaps, season).
- `balanced_te`: 0.05 ≤ share < 0.10, or low-snap / low-rate catch-alls.
- `blocking_te`: share < 0.05 and offensive snaps ≥ 200.
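The bucket assignment as a pure-function sketch (thresholds from the list above; the signature is an assumption):

```python
def te_role(targets: int, snaps_offense: int) -> str:
    """Assign a TE role bucket from season target share of offensive snaps."""
    if snaps_offense <= 0:
        return "balanced_te"  # low-snap catch-all
    share = targets / snaps_offense
    if share >= 0.10:
        return "receiving_te"
    if share < 0.05 and snaps_offense >= 200:
        return "blocking_te"
    return "balanced_te"  # 0.05 <= share < 0.10, or low-snap / low-rate cases
```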
data_tier and data_tier_reason#
Era leg: `_era_tier_for_season` in `grading/era_tier.py` → `(tier, reason)` with `reason = era_pre_ngs` when tier ≥ 2 from era alone.
TE merge (grading-only):
- If `role == blocking_te` and era tier 1 → `data_tier = 2`, `data_tier_reason = role_blocking_te`.
- If `role == blocking_te` and era tier ≥ 2 → keep era tier, `data_tier_reason = era_and_role`.
- Else → era `(tier, reason)` only.
Non-TE positions: role NULL; data_tier / data_tier_reason from era tuple
only.
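The merge as a sketch, assuming `_era_tier_for_season` returns the `(tier, reason)` tuple described above:

```python
def te_data_tier(role: str | None, era_tier: int, era_reason: str) -> tuple[int, str]:
    """Grading-only merge of the era leg with the TE role leg."""
    if role == "blocking_te" and era_tier == 1:
        return 2, "role_blocking_te"
    if role == "blocking_te" and era_tier >= 2:
        return era_tier, "era_and_role"
    return era_tier, era_reason  # era tuple passes through unchanged
```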
Schema (migration 0006)#
season_grades.role, season_grades.data_tier_reason,
stat_components.used_in_composite.
Pure blocking TEs (< 15 targets)#
No season_grades row. Team/roster UI must not hide these players when built
(see plan / UX note).
Validation#
Target TE YoY r band 0.40–0.55 (slightly below WR); interpret like ADR-0015.
Deferred#
Blocking grade, alignment splits, red-zone split, target-per-route earn rate, CB matchup, etc.
References#
- `pipeline/src/nfl_grades/grading/te.py`
- `pipeline/src/nfl_grades/grading/era_tier.py`
- `docs/adr/0003-data-tier-and-qualified-as-first-class-columns.md`
v1 face-check: offense-context contamination in high-volume receiver grades
- Status: Accepted (v1 limitation, documented; fix deferred to v1.5)
- Date: 2026-04-24
- Companion to: ADR-0014 (RB v1), ADR-0015 (WR v1), ADR-0016 (TE v1)
Context#
After shipping WR v1 and TE v1 and running both against the 2024/2025 seasons, a face-check surfaced a recurring pattern: several high-volume receivers on bad offenses graded notably lower than their tape/production would suggest. The prompting case was Brock Bowers (LV, 2024) — the rookie-target-record holder at 153 targets who landed at grade 50.4 / rank 14 of 34 qualified TEs.
The open question was whether v1's grader has a systematic bias (all bad-offense receivers underrated) or something narrower. We ran a pre-check on the 2024 data before picking a direction; the data shows the confound is narrower than "all bad-offense receivers" yet real enough to need written disclosure before declaring v1 done.
Finding#
Affected WRs — 2024, top-15 by targets#
| Name | Tm | Tgt | Grade | Rk / 84 | Tm EPA rank | Top QB grade |
|---|---|---|---|---|---|---|
| Garrett Wilson | NYJ | 154 | 43.3 | 50 | 17 | 33.8 |
| Jerry Jeudy | CLE | 148 | 55.1 | 32 | 32 | 28.8 |
| Malik Nabers | NYG | 172 | 55.2 | 31 | 28 | 45.4 |
- Wilson: 1,100+ yds despite Rodgers' worst NFL season; ranked in the bottom 40% of qualified WRs.
- Jeudy: 1,229 yds on the league's worst offense (CLE, −0.183 EPA/play); rank #32 is defensible but feels light.
- Nabers: rookie target record, 37th percentile grade.
Affected TEs — 2024, top-10 by targets#
| Name | Tm | Tgt | Grade | Rk / 34 | Tm EPA rank | Top QB grade |
|---|---|---|---|---|---|---|
| David Njoku | CLE | 99 | 21.2 | 34 | 32 | 28.8 |
| Dalton Schultz | HOU | 93 | 30.0 | 31 | 22 | 31.7 |
| Brock Bowers | LV | 153 | 50.4 | 14 | 31 | 29.5 |
- Njoku: last among all qualified TEs despite 1,000+ snaps, solid reputation. Strongest single data point for offense contamination.
- Schultz: rank 31/34 with 93 targets on the Stroud-injured/Young HOU offense.
- Bowers: mid-pack grade for the highest TE target volume in 2024.
Six players across the two positions, all on offenses with top-QB grade below ~46. Matches the "bad QB play × high receiver volume" pattern.
What v1 handles correctly#
The methodology is not uniformly biased against receivers on weak offenses. Two cases prove the grader distinguishes efficient play from volume-only play inside a bad offensive environment:
Brian Thomas Jr. — 2024 WR, JAX#
- 135 targets, team EPA rank #18, top QB grade 44.9 (Lawrence's rough season)
- Grade 73.9, rank 10 / 84 — top-12 WR by grade despite the weak passing context.
A naive "bad offense → underrate" bias would predict Thomas below the WR median. He's in the top 12%.
Jonnu Smith — 2024 TE, MIA#
- 111 targets, team EPA rank #21, top QB grade 80.0
- Grade 71.4, rank 4 / 34 — top-5 TE.
MIA wasn't great offensively (below-average EPA), yet Smith's per-target efficiency was high enough to surface a top-5 grade.
Zach Ertz (WAS, 2024) is the inverse counter-example worth noting: WAS was a top-4 offense by EPA (top QB 78.7), Ertz ranked 24/34. Strong offense did not lift a clearly declining player. The grade was right.
These three cases together show the grader is responsive to per-target efficiency rather than team context as such.
The specific confound#
The failure mode is narrower than "bad-offense receivers underrated". It is specifically:
High-volume receivers whose targets are forced by their role on a team with below-replacement QB play.
Mechanics:
- `wr_rec_epa_per_target` and `te_rec_epa_per_target` carry ~35% of the composite. EPA is QB-dependent — the same route/catch generates less EPA when the QB throws late, off-platform, or low-completion.
- `wr_yac_over_expected_per_rec` / `te_yac_over_expected_per_rec` carry ~27%. xYAC is calibrated on league-average receptions; on a bad-QB offense, contested catches and off-schedule throws reduce real YAC relative to xYAC without the receiver doing anything wrong.
- `wr_target_earn_rate` / `te_target_earn_rate` carries only ~10% and is a volume-adjacent signal — it helps, but not enough to outweigh the 62%+ from EPA and YAC-over-expected when both are QB-suppressed.
So a receiver who is forced to absorb record target volume on a team whose QB depresses EPA/target and YAC-over-expected across the board gets dinged twice (two big components each running 0.5–1.0 z below true skill) and credited once (one small component at +1.5 to +2.0 z for volume). Net: 5–15 composite points below a reasonable estimate.
The Thomas / Jonnu Smith counter-examples work because their per-target efficiency was high enough in absolute terms to offset the QB context — they weren't just surviving on forced volume.
Why naive offense adjustment is wrong#
The intuitive "residualize components by team offensive EPA" would:
- Over-correct Thomas and Jonnu Smith — they already showed the efficiency needed; an additional boost for "bad offense" makes their grades unjustifiably high and distorts the top of the leaderboard.
- Under-correct Bowers / Njoku relative to what they actually need — their issue is specifically per-target efficiency suppression from QB play, not general offense-level depression. Team EPA mixes run game + line play + YAC culture, so a team-EPA adjustment would dilute the QB-specific signal.
- Create new problems on good offenses — a good-offense receiver who's actually mediocre (Ertz 2024) would get a negative context adjustment and drop below where he belongs.
The right fix is usage-conditional and QB-specific: adjust per-target efficiency components for the QB quality the receiver was playing with, but only for the portion of targets that are "forced" (high target share on bad QB), and leave already-efficient-despite-bad-QB players unadjusted.
That is not a hotfix. It is a methodology change.
Decision#
Ship v1 as-is. Document the confound here. Do not modify weights, thresholds, or components. Do not layer a naive offense adjustment on top of v1.
Defer the real fix to v1.5.
v1.5 plan candidates (do not pick now; analyze first)#
- QB-quality-conditional z-scoring — when z-scoring `*_rec_epa_per_target` and `*_yac_over_expected_per_rec`, condition on the receiver's primary-QB composite grade (or a CPOE-derived QB quality score). Requires a second regression pass over historical seasons to calibrate.
- Usage-residualized volume — add a "forced target share" signal and partially upweight it when the receiver's QB is below a threshold. Functions as a compensating positive weight only for the high-volume-on-bad-QB cell.
- Combination — (1) corrects the EPA/YAC depression, (2) credits the fact that absorbing forced volume is itself a skill signal.
All three need a validation pass against multi-season data before picking. Historical backfill of 2016–2023 (already flagged as the other major pending work) is a prerequisite — single-season analysis can't separate noise from true context effects.
UI mitigation for v1#
On player pages, display alongside the composite grade:
- Team offensive EPA/play and its league rank that season.
- Top QB grade on the player's team that season.
- If the player is a receiver (WR/TE/RB) with top-15 volume and their team's top QB grade is below ~45, a small inline note: "grade may be suppressed by QB context — see ADR-0017."
This does not change the grade. It surfaces the context the grade doesn't fully capture, so a user reading Bowers' 50.4 sees "Raiders offense #31, top QB 29.5" next to it and understands what they're looking at.
The note trigger is deliberately narrow (top-volume + bad QB) so it doesn't fire on every bad-offense receiver — that would dilute its meaning and contradict what the data actually shows (see Thomas / Smith).
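The trigger condition as a predicate sketch; `volume_rank` (within-position target rank that season) is an assumption about how "top-15 volume" would be computed:

```python
def show_qb_context_note(position: str, volume_rank: int, top_qb_grade: float) -> bool:
    """Narrow inline-note trigger: top-volume receiver on a bad-QB team."""
    return (
        position in {"WR", "TE", "RB"}
        and volume_rank <= 15
        and top_qb_grade < 45.0  # the "~45" threshold from the text
    )
```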
Consequences#
Easier:
- v1 ships with a known, bounded limitation instead of an unfinished methodology fix. The boundary is written down and visible to users.
- v1.5 has a clear mandate backed by specific player cases to validate against (Wilson, Jeudy, Nabers, Njoku, Schultz, Bowers; counter-examples Thomas, Jonnu Smith, Ertz).
Harder:
- Until v1.5 lands, six named players per season carry visibly suppressed grades and users have to read the context panel to interpret them correctly. Acceptable for an MVP; not acceptable long-term.
- The UI has to carry context columns that wouldn't be needed if the grade self-adjusted.
Explicitly given up:
- Claiming v1 is "context-neutral". It isn't. It is "per-target efficiency-weighted within the population", which is adjacent but not the same. The /about page and the ADR index should both reflect that honestly.
References#
- 2024 face-check data (throwaway query, not committed) — results inlined above in §Finding and §What v1 handles correctly.
- ADR-0015 §Validation — the WR YoY-r band that would inform v1.5 calibration.
- ADR-0016 §Validation — TE YoY-r band.
- Pending: multi-season backfill (2016–2023) to enable usage-conditional z-scoring without overfitting to one season.