Design decisions

Architecture decision records (ADRs) — the technical choices behind the project, why we made each one, and what trade-offs we accepted. Append-only and numbered. If you want the consumer-facing explanation of how grades work, see How grades work.

#	Title	Status
0001	Monorepo with shared `db/` as schema source of truth	Accepted
0002	Python pipeline as installable package + raw SQL	Accepted
0003	Data tier and `qualified` flag as first-class columns	Accepted
0004	Normalize historical team abbreviations to current	Accepted
0005	Hand-written TS types with codegen guardrail	Accepted
0006	Forward-only migrations with `schema_migrations` tracking	Accepted
0007	Pure-function grading math, DB I/O isolated to `ingest/`	Accepted
0008	Sigmoid grade mapping with k=1.15, z=0->50, z=+2->90	Accepted
0009	Raw nflverse data cached as parquet; only typed tables in Postgres	Accepted
0010	Use nflreadpy (official nflverse) instead of nfl_data_py	Accepted
0011	Store a thin `plays` table in Postgres, not the full PBP fat table	Accepted
0012	Store NGS as three tables, not one unified fact table	Accepted
0013	QB v1 grading formula	Accepted (supersedable — v1 of the formula)
0014	RB v1 grading formula	Accepted (supersedable — v1.1 of the formula)
0015	WR v1 grading formula	Accepted (supersedable — v1 of the formula)
0016	TE v1 grading formula	Accepted (v1; iterates like RB/WR)
0017	v1 face-check: offense-context contamination in high-volume receiver grades	Accepted (v1 limitation, documented; fix deferred to v1.5)

ADR-0001· 2026-04-22

Monorepo with shared `db/` as schema source of truth

Status: Accepted
Date: 2026-04-22

Context

This project has two distinct codebases:

A Python pipeline that ingests data from nfl_data_py and writes per-season + career grades to Postgres.
A Next.js web app that reads from Postgres and renders teams, depth charts, and grades.

Both touch the same database schema. We considered:

Two repos (pipeline + web), each with its own copy of the schema.
Monorepo with a shared db/ directory holding SQL migrations.
Schema-first ORM (Drizzle in TS, then introspect from Python; or SQLAlchemy in Python with TS clients consuming an OpenAPI spec).

Decision

Monorepo. SQL migrations in db/migrations/ are the single source of truth. Both Python and TypeScript follow that schema; neither owns it.

The TS side gets type safety via nflgrades gen-types, which introspects the live DB and emits web/src/types/db.generated.ts. The Python side uses raw SQL + pandas (no ORM models — see ADR 0002).

Consequences

Easier:

One PR can include a schema change + the pipeline change that uses it + the web change that displays it. No cross-repo coordination.
New contributors (and AI agents) see the whole system in one tree.
docker compose up -d brings up Postgres with migrations auto-applied, giving both halves a working environment instantly.

Harder:

Repo grows two ecosystems' worth of tooling (npm + pip). Mitigated by keeping each in its own directory with its own README.
Can't independently version the two halves. We don't need to.

Explicitly given up:

Schema-first ORMs (Drizzle, Prisma) where the ORM file generates migrations. They'd push us to TS-first thinking, which is wrong here: the data pipeline is the primary writer and the analyst-friendly layer. See ADR 0002.

ADR-0002· 2026-04-22

Python pipeline as installable package + raw SQL

Status: Accepted
Date: 2026-04-22

Context

The Python side has to:

Pull large DataFrames from nfl_data_py
Compute statistical components and grades on those DataFrames
Bulk-write results to Postgres

Two architectural questions:

Loose scripts in scripts/ versus an installable package with a CLI entry point.
SQLAlchemy ORM models versus raw SQL + pandas to_sql.

Decision

Installable package. pipeline/ has a pyproject.toml defining the nfl_grades package. After pip install -e ".[dev]", the user gets:

from nfl_grades.grading import sigmoid works from anywhere
nflgrades CLI command (defined in nfl_grades.cli:main)
Tests can import nfl_grades without path hacks
The package can be reused from notebooks, CI jobs, and scheduled runs

Raw SQL + pandas. No Base = declarative_base(), no class Player(Base). Pipeline code uses:

pandas.read_sql / df.to_sql for bulk reads/writes
sqlalchemy.text("...") + the engine from nfl_grades.db for one-off statements
nfl_grades.db.session() context manager for transactional work

Consequences

Easier:

The CLI gives us one obvious entry point per stage (nflgrades ingest, nflgrades grade, etc.) instead of a sprawl of python scripts/*.py.
Bulk DataFrame writes via to_sql are 10-100x faster than ORM add_all for the row counts we deal with (hundreds of thousands of PBP rows across the seasons we cover).
Schema lives in SQL only (ADR 0001). No risk of "ORM model says X, DB says Y" drift.

Harder:

No automatic relationship traversal (player.seasons[0].grades). We don't need it — every analytical query is a SQL JOIN.
No Alembic auto-generation from models. We use a tiny custom migration runner instead — see ADR 0006.

Explicitly given up:

ORM ergonomics. We're a data-analysis pipeline, not a CRUD app.

ADR-0003· 2026-04-22

Data tier and `qualified` flag as first-class columns

Status: Accepted
Date: 2026-04-22

Context

Our grades come in three data-quality tiers:

Tier 1 (QB/RB/WR/TE): rich data, full pipeline incl. opponent adjustment
Tier 2 (CB/S/EDGE): decent data
Tier 3 (OL/iDL/off-ball LB/ST): proxy stats, directional only

We also have to handle players who fall below minimum-snaps thresholds: their season exists in data but the grade isn't reliable enough to display as if it were.

Options for representing both:

Compute on read — the web app derives tier from position and qualified from a snap-count join.
First-class columns on season_grades — data_tier SMALLINT and qualified BOOLEAN written by the pipeline, read directly.
Separate views per tier — season_grades_tier1, etc.

Decision

First-class columns on season_grades:

data_tier SMALLINT NOT NULL CHECK (data_tier BETWEEN 1 AND 3)
qualified BOOLEAN NOT NULL DEFAULT TRUE

The pipeline sets both at write time. The web app reads them directly and shows a tier badge / "insufficient sample" pill without any joins or recomputation.
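
A minimal sketch of how the two columns behave, using Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere. The player_id column and the sample rows are made up for illustration; only composite_grade, data_tier, and qualified come from this ADR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Postgres, illustration only
conn.execute(
    """
    CREATE TABLE season_grades (
        player_id       TEXT NOT NULL,  -- hypothetical key for this sketch
        composite_grade REAL NOT NULL,
        data_tier       SMALLINT NOT NULL CHECK (data_tier BETWEEN 1 AND 3),
        qualified       BOOLEAN NOT NULL DEFAULT 1
    )
    """
)
conn.executemany(
    "INSERT INTO season_grades VALUES (?, ?, ?, ?)",
    [("qb-a", 88.0, 1, 1), ("qb-b", 61.5, 1, 0), ("cb-a", 72.0, 2, 1)],
)

# The UI-style read: one query, no joins, no recomputation of tier or sample size.
qualified_tier1 = conn.execute(
    """
    SELECT player_id, composite_grade
    FROM season_grades
    WHERE data_tier = 1 AND qualified = 1
    ORDER BY composite_grade DESC
    """
).fetchall()


def insert_rejected(tier: int) -> bool:
    """Return True if the CHECK constraint refuses a row with this tier."""
    try:
        conn.execute("INSERT INTO season_grades VALUES ('x', 50.0, ?, 1)", (tier,))
    except sqlite3.IntegrityError:
        return True
    return False
```

With the sample rows, qualified_tier1 holds only qb-a, and an out-of-range tier such as 4 is refused by the constraint rather than silently stored.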

Consequences

Easier:

One query returns everything the UI needs to render a grade with full context (SELECT composite_grade, data_tier, qualified ...).
Tier-mapping logic lives in one place (the pipeline) and isn't duplicated between Python and TS.
Filtering ("only show qualified Tier 1 grades") is a trivial WHERE clause with index support.

Harder:

Changing the tier-mapping rules requires re-running grading to refresh the column. We accept this; tiers don't change often.
A small amount of data redundancy: tier is implied by position. We accept this for query simplicity.

Explicitly given up:

Computed-on-read flexibility. If we ever need per-user tier overrides (we won't), we'd have to add them as a separate table.

ADR-0004

Normalize historical team abbreviations to current

Status: Accepted

Context

nfl_data_py uses the contemporary team abbreviation for each season's data:

2016 Chargers are SD, 2017+ are LAC
2016–2019 Raiders are OAK, 2020+ are LV
Pre-2016 Rams are STL; from 2016 they're LA (some sources use LAR)
A few sources sprinkle in WSH, ARZ, BLT, etc.

If we naively join pbp.posteam = teams.abbr, 2016 Chargers rows silently drop or fail FK constraints. We have to handle this somewhere.

Options:

Store historical abbreviations as-is, display them as-of the season. ("In 2016, SD went 5-11" — but they're the Chargers, same franchise.)
Normalize everything to the current abbreviation at ingestion time via a team_aliases lookup table.
Use nflverse-team package mappings at query time.

Decision

Normalize to current abbreviation at ingestion. The team_aliases table maps every historical abbr (and a few alternate spellings) to the current team_id. Every current abbr aliases to itself, so the lookup is one unconditional query.

The UI never displays SD or OAK. A 2016 Chargers depth chart is shown under "Los Angeles Chargers" with a note that the team relocated.
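
The lookup can be sketched with a plain dict standing in for the team_aliases table. This is an illustration under assumptions: the real table maps to team_id rather than to an abbreviation, the alias list here is truncated, and normalize_team is a hypothetical helper name.

```python
# Hypothetical in-memory stand-in for team_aliases: alias -> current abbreviation.
HISTORICAL_ALIASES = {
    "SD": "LAC",   # Chargers before the move
    "OAK": "LV",   # Raiders before the move
    "STL": "LA",   # Rams before the move
    "LAR": "LA",   # alternate spelling used by some sources
}
CURRENT_ABBRS = {"LAC", "LV", "LA", "KC"}  # truncated for the example

# Every current abbr aliases to itself, so the lookup needs no special case.
TEAM_ALIASES = {**{abbr: abbr for abbr in CURRENT_ABBRS}, **HISTORICAL_ALIASES}


def normalize_team(abbr: str) -> str:
    """Return the current abbreviation, failing loudly on an unknown alias."""
    try:
        return TEAM_ALIASES[abbr.strip().upper()]
    except KeyError:
        raise ValueError(f"no team_aliases entry for {abbr!r}") from None
```

Raising on an unknown abbreviation mirrors the intent of the FK constraint: a new spelling shows up as an ingest error, and the fix is one new alias row.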

Consequences

Easier:

All FK relationships work without special-casing historical abbrs.
Cross-season queries ("show me all Chargers QBs since 2016") return the expected rows without UNIONs or OR clauses.
Adding a new alias (some future relocation, or a new alternate spelling found in PFR data) is one INSERT.

Harder:

Historical "purity" lost — a 2016 game line in our DB will say LAC, not SD. We accept this; the franchise identity matters more than the city-of-record for player grading.
Need a small chunk of UI copy when showing pre-relocation seasons ("relocated 2017 from San Diego"). Cheap.

Explicitly given up:

Showing "as the team was named at the time." If we ever build a historical game viewer, we'd surface that there.

ADR-0005· 2026-04-22

Hand-written TS types with codegen guardrail

Status: Accepted
Date: 2026-04-22

Context

The DB schema is the source of truth (ADR 0001). The Next.js side needs TypeScript types that match the schema. We considered:

Hand-write everything. Simple but rots silently when migrations change.
Switch to Drizzle/Prisma, define schema in TS, generate everything. Wrong direction — would make TS the source of truth.
Auto-generate from the live DB with kanel / pg-to-ts, replace hand-written types entirely.
Hand-write the public types, auto-generate the raw row types as a guardrail.

Decision

Option 4. Two layers:

web/src/types/db.generated.ts — auto-generated from information_schema by nflgrades gen-types. Mirrors raw table shapes one-to-one. Never edited by hand. Committed to the repo so TS compiles without a live DB.
web/src/types/index.ts — hand-written. Imports the generated row types and re-exports them with curated names, narrowed string-literal unions (e.g. "AFC" | "NFC" instead of string), and view-shaped types for joins and aggregates.

In CI we'll run nflgrades gen-types --check, which exits non-zero if the generated file is stale. That's the guardrail: if you change a migration without regenerating, CI catches it.
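
The staleness check itself can be as small as a text comparison. The sketch below is an assumption about shape, not the actual gen-types code: check_generated is a hypothetical helper, and the freshly generated text is passed in as a string so the example does not need a live DB.

```python
import tempfile
from pathlib import Path


def check_generated(committed: Path, fresh_output: str) -> int:
    """Return a process exit code: 0 if the committed file is current, 1 if stale."""
    if not committed.exists():
        return 1
    current = committed.read_text(encoding="utf-8")
    return 0 if current == fresh_output else 1


# Demo with a throwaway file in place of web/src/types/db.generated.ts.
with tempfile.TemporaryDirectory() as tmp:
    committed = Path(tmp) / "db.generated.ts"
    committed.write_text("export type TeamsRow = {};\n", encoding="utf-8")
    up_to_date = check_generated(committed, "export type TeamsRow = {};\n")
    stale = check_generated(committed, "export type TeamsRow = { id: number };\n")
```

A CLI wrapper would pass the return value to sys.exit so that CI fails on a stale file.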

Consequences

Easier:

The schema can grow without TS imports breaking — add a column, run gen-types, decide whether to expose it in index.ts.
We get string-literal unions (Conference, DataTier) where the raw Postgres type is just text/smallint. Better than what any pure generator gives us.
Reviewers see the type changes in index.ts PRs and can reason about the public API surface.

Harder:

Two files to keep mentally aligned. Mitigated by index.ts being short and db.generated.ts being mechanical.
gen-types requires a live DB. Acceptable since we have docker-compose.

Explicitly given up:

Fully automatic types. We're trading a small amount of manual work for the ability to express domain types more precisely than introspection can give us.

ADR-0006· 2026-04-22

Forward-only migrations with `schema_migrations` tracking

Status: Accepted
Date: 2026-04-22

Context

We need a migration story. Options:

Alembic. Standard for SQLAlchemy projects. Auto-generation from ORM models. We have no ORM models (ADR 0002), so the auto-generation isn't useful.
Raw psql -f per file, no tracking. Simple but easy to apply the same migration twice or skip one.
A tiny custom runner that tracks applied migrations in a schema_migrations table and refuses to re-apply or run modified files.

Decision

Option 3. pipeline/src/nfl_grades/migrate.py (~80 lines) does:

Creates schema_migrations(filename PRIMARY KEY, sha256, applied_at) if it doesn't exist.
Lists db/migrations/*.sql lexically.
For each file: skip if applied with matching sha; error if applied with different sha (someone edited an applied migration); apply otherwise.
Each migration runs in its own transaction.
Optional --seeds flag also runs db/seeds/*.sql (idempotent, re-runs every time).

Migrations are forward-only. To fix a bad migration, ship a new one (0007_fix_bad_constraint.sql).
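
The skip / error / apply logic above can be sketched in a few lines. This is not migrate.py: it uses sqlite3 as a stand-in for Postgres, run_migrations and MigrationError are hypothetical names, and sqlite3's executescript handles its own commits, so the one-transaction-per-migration behavior is not reproduced here.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class MigrationError(RuntimeError):
    """Raised when an already-applied migration file has been edited."""


def run_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    """Apply pending *.sql files in lexical order; return the filenames applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        " filename TEXT PRIMARY KEY,"
        " sha256 TEXT NOT NULL,"
        " applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    rows = conn.execute("SELECT filename, sha256 FROM schema_migrations").fetchall()
    applied = dict(rows)
    newly_applied = []
    for path in sorted(migrations_dir.glob("*.sql")):
        sql = path.read_text(encoding="utf-8")
        digest = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        if path.name in applied:
            if applied[path.name] != digest:
                raise MigrationError(f"{path.name} was edited after being applied")
            continue  # already applied with a matching sha: skip
        conn.executescript(sql)
        conn.execute(
            "INSERT INTO schema_migrations (filename, sha256) VALUES (?, ?)",
            (path.name, digest),
        )
        conn.commit()
        newly_applied.append(path.name)
    return newly_applied


# Demo: apply once, re-run as a no-op, then edit the applied file.
with tempfile.TemporaryDirectory() as tmp:
    migrations = Path(tmp)
    init = migrations / "0001_init.sql"
    init.write_text("CREATE TABLE teams (abbr TEXT PRIMARY KEY);", encoding="utf-8")
    conn = sqlite3.connect(":memory:")
    first_run = run_migrations(conn, migrations)
    second_run = run_migrations(conn, migrations)
    init.write_text("CREATE TABLE teams (abbr TEXT);", encoding="utf-8")
    try:
        run_migrations(conn, migrations)
        edit_detected = False
    except MigrationError:
        edit_detected = True
    conn.close()
```

The second run applies nothing, and the edited file is reported as an error instead of being re-applied.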

Consequences

Easier:

Deploying to Supabase/Neon is nflgrades migrate. Same code as local.
New developers' first command is obvious and safe.
Sha tracking catches "I edited an applied migration" mistakes loudly instead of silently going out of sync.

Harder:

No down migrations. Acceptable: in 6 years of running this kind of pipeline, down migrations have almost always been the wrong tool; you ship a forward fix instead.
No model -> migration auto-generation. We don't want it; we'd rather hand-write SQL and review it.

Explicitly given up:

Alembic ecosystem (branching, multiple heads, etc.). We have one head and we ship to it. If this ever stops being true, revisit.

Edge cases

0001_init.sql is currently editable because nothing has been applied anywhere yet. The moment it's applied to any environment, it becomes immutable.
The schema_migrations table is not itself in a migration file; the migration runner creates it on first invocation. That's intentional: bootstrapping a tracking table inside a tracked migration is a chicken-and-egg problem we don't need.

ADR-0007· 2026-04-22

Pure-function grading math, DB I/O isolated to `ingest/`

Status: Accepted
Date: 2026-04-22

Context

The grading pipeline has many moving parts: empirical Bayes shrinkage, opponent adjustment, z-score within position, inverse-noise composite weighting, sigmoid mapping to 0-100, Kalman smoothing across seasons. We need to be able to:

Tune parameters interactively in notebooks
Unit-test math without spinning up Postgres
Re-run grading on cached/synthetic data
Compare two grading variants side-by-side without committing one to disk

If grading code calls into the database, all of this gets harder.

Decision

Modules under grading/, career/, components/, and adjust/ are pure functions. They take pandas DataFrames and return pandas DataFrames. They must not import from nfl_grades.db or nfl_grades.ingest.

DB I/O lives in two places only:

nfl_grades.ingest.* — reads from nfl_data_py, writes to raw tables
The CLI commands in nfl_grades.cli — orchestrate by reading from DB, passing DataFrames to the pure functions, writing results back

Concretely: grading/empirical_bayes.shrink(df, ...) returns a Series. The CLI does df = pd.read_sql(...); shrunk = shrink(df, ...); and then writes the shrunk values back with to_sql(...).
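
The split between pure math and orchestration glue looks roughly like this. To keep the sketch dependency-free it uses lists of dicts and sqlite3 where the real code uses pandas DataFrames and Postgres; zscore_within_position, grade_command, and the components / z_scores tables are hypothetical names for illustration.

```python
import sqlite3
from statistics import mean, pstdev


def zscore_within_position(rows: list[dict]) -> list[dict]:
    """Pure function: no DB imports, no I/O. Returns new rows with a 'z' key."""
    by_position: dict[str, list[float]] = {}
    for row in rows:
        by_position.setdefault(row["position"], []).append(row["value"])
    stats = {pos: (mean(vals), pstdev(vals)) for pos, vals in by_position.items()}
    scored = []
    for row in rows:
        mu, sigma = stats[row["position"]]
        z = 0.0 if sigma == 0 else (row["value"] - mu) / sigma
        scored.append({**row, "z": z})
    return scored


def grade_command(conn: sqlite3.Connection) -> None:
    """Orchestration glue: read, call the pure function, write back."""
    cursor = conn.execute("SELECT player_id, position, value FROM components")
    rows = [dict(zip(("player_id", "position", "value"), rec)) for rec in cursor]
    scored = zscore_within_position(rows)
    conn.executemany(
        "INSERT INTO z_scores (player_id, z) VALUES (:player_id, :z)", scored
    )
    conn.commit()


# Demo against an in-memory database with hand-constructed rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE components (player_id TEXT, position TEXT, value REAL);"
    "CREATE TABLE z_scores (player_id TEXT, z REAL);"
)
conn.executemany(
    "INSERT INTO components VALUES (?, ?, ?)",
    [("a", "QB", 1.0), ("b", "QB", 3.0), ("c", "RB", 5.0)],
)
grade_command(conn)
stored = conn.execute("SELECT player_id, z FROM z_scores ORDER BY player_id").fetchall()
```

The pure function can be unit-tested by passing it a list directly; only grade_command needs a database.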

Consequences

Easier:

Tests for grading math are pure-Python, no fixtures, no test DB. See pipeline/tests/grading/test_sigmoid.py for the pattern.
Notebooks can iterate on math by passing in any DataFrame, including hand-constructed ones for edge cases.
A future "grade variant comparison" feature is just calling the same pure function with two parameter sets and diffing the outputs.

Harder:

The CLI is responsible for the orchestration glue. That code is less interesting and less tested. Acceptable; it's mostly two-liners.

Enforcement:

ADR-only for now. If we get tempted to add a DB call inside grading/, the import would be the obvious red flag in code review. If this becomes a recurring problem, add an import-linter rule.

ADR-0008· 2026-04-22

Sigmoid grade mapping with k=1.15, z=0->50, z=+2->90

Status: Accepted
Date: 2026-04-22

Context

After computing a composite z-score per (player, season, position), we need to map it onto the 0-100 grade scale users see. Options:

Linear rescale: grade = 50 + 20*z, clipped to [0, 100]. Simple, but cliffs at the boundaries and stretches the middle.
Percentile-based: grade = 100 * percentile_rank(z). Self-rescaling year over year (a "90" never means the same thing twice).
Sigmoid: grade = 100 / (1 + exp(-k * (z - z0))). Smooth, bounded, monotonic, never rescales.

Decision

Sigmoid with k=1.15 and z0=0. Implementation in pipeline/src/nfl_grades/grading/sigmoid.py.

Parameters chosen so that:

z = 0 -> grade = 50
z = +1 -> grade ~= 76
z = +2 -> grade ~= 91
z = -2 -> grade ~= 9

Rough interpretation: a "90" is roughly 2 standard deviations above the positional mean — about the 97th percentile of qualified players.
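
The anchor points above follow directly from the formula. Below is a scalar sketch for checking them by hand; the real implementation lives in grading/sigmoid.py and may differ in shape (for example, operating on a Series), and z_for_grade is a hypothetical helper added here for the inverse.

```python
import math


def sigmoid_grade(z: float, k: float = 1.15, z0: float = 0.0) -> float:
    """Map a composite z-score onto the 0-100 grade scale."""
    return 100.0 / (1.0 + math.exp(-k * (z - z0)))


def z_for_grade(grade: float, k: float = 1.15, z0: float = 0.0) -> float:
    """Inverse mapping, valid for 0 < grade < 100."""
    return z0 + math.log(grade / (100.0 - grade)) / k


# Anchor points from this ADR, rounded to whole grades.
anchors = {z: round(sigmoid_grade(z)) for z in (0.0, 1.0, 2.0, -2.0)}
# anchors == {0.0: 50, 1.0: 76, 2.0: 91, -2.0: 9}
```

Solving the formula for a grade of exactly 90 gives z = ln(9) / 1.15, roughly 1.91, which is why a "90" sits just under two standard deviations above the positional mean.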

Consequences

Easier:

Grades are stable across seasons. A 90 in 2018 means roughly the same thing as a 90 in 2024.
Bounded [0, 100] without clipping artifacts.
Smooth and monotonic — small z changes produce small grade changes.
Same mapping works for every position.

Harder:

Not directly interpretable as a percentile. We address this by storing percentile alongside composite_grade on season_grades.
Tuning k requires balancing "spread between elite players" (higher k) against "starters cluster near 50" (lower k). 1.15 is the current sweet spot from synthetic-data tuning; will be re-checked once we have real QB grades to eyeball.

Subject to revision:

This is the v1 default. If face-validity tests after build step 2 say "the top 10 QBs are all 95+ and indistinguishable," we lower k. If they say "Mahomes is 78," we raise k. Document changes by superseding this ADR.

ADR-0009· 2026-04-23

Raw nflverse data cached as parquet; only typed tables in Postgres

Status: Accepted
Date: 2026-04-23

Context

Every ingest module pulls a DataFrame from nfl_data_py (play-by-play, rosters, depth charts, NGS passing/receiving/rushing, weekly snap counts, schedules) and eventually has to populate our typed tables (players, player_seasons, depth_charts, stat_components, etc.).

The question: what happens to the raw DataFrame between the network call and the typed insert? Three real options:

Direct ETL. Pull from nfl_data_py, transform in memory, write typed rows. Discard the raw DataFrame.
Raw tables in Postgres. Persist the raw DataFrame to raw_pbp, raw_rosters, etc. (text/jsonb-heavy schemas). Transform reads from those raw tables and writes to typed tables.
Parquet on disk. Cache the raw DataFrame to pipeline/.cache/raw/{source}/{season}.parquet. Transform reads from parquet and writes typed rows to Postgres.

Things that matter for our project:

PBP is large. ~50k rows × 300+ columns per season × 10 seasons is the bulk of our raw data. Most of those columns we never use.
Iteration speed dominates. Tuning grade weights or the garbage-time filter means re-running transforms many times per session. Re-downloading PBP each time would kill the loop. nfl_data_py.import_pbp_data([2024]) takes ~30s; across 10 seasons that's 5 minutes per iteration.
Upstream churn happens. nfl_data_py corrects historical data and occasionally renames columns. A snapshot of "what we believed the schema was on date X" is valuable for debugging "why did this player's grade change?"
Postgres is for the product, not the archive. The web app, indexes, and analytical queries all target typed tables. Mixing ~500k raw PBP rows × 300+ columns (well over 100M values) into the same DB blows up backups, dump sizes, and query planner headroom.
Pure-function math (ADR 0007). Transforms take DataFrames in and return DataFrames out. They don't care whether the source was a live API call, a parquet file, or a SQL query.

Decision

Three-layer separation:

Raw layer — parquet on disk. Every nfl_data_py call funnels through a cache_or_fetch(source, season) helper that:
- Returns pd.read_parquet(...) if the file exists.
- Otherwise calls the upstream function, writes the parquet, returns the DataFrame.
- Path: pipeline/.cache/raw/{source}/{season}.parquet (already in .gitignore, configurable via PIPELINE_CACHE_DIR).
Manifest — JSON sidecar. pipeline/.cache/raw/manifest.json records {source, season, fetched_at, nfl_data_py_version, row_count, sha256} per file. Lets us detect upstream churn without re-downloading and surfaces stale caches in nflgrades validate.
Typed layer — Postgres. Only schema-defined tables live in Postgres (db/migrations/*.sql). No raw_* tables, no jsonb columns holding raw payloads.
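
The cache-or-fetch behavior of the raw layer can be sketched as follows. Assumptions are loud here: the sketch stores JSON instead of parquet so it runs without pyarrow, it skips the manifest update, and the fetch callable and cache_dir parameter are illustrative rather than the helper's actual signature.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Rows = list[dict]


def cache_or_fetch(
    source: str, season: int, fetch: Callable[[int], Rows], cache_dir: Path
) -> Rows:
    """Return cached rows if the file exists; otherwise fetch, cache, and return."""
    path = cache_dir / source / f"{season}.json"  # the real layout uses .parquet
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    rows = fetch(season)  # the only place the upstream client is called
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows), encoding="utf-8")
    return rows


# Demo: the second call is served from disk, so the fake upstream is hit once.
calls: list[int] = []


def fake_fetch(season: int) -> Rows:
    calls.append(season)
    return [{"play_id": 1, "season": season}]


with tempfile.TemporaryDirectory() as tmp:
    first = cache_or_fetch("pbp", 2024, fake_fetch, Path(tmp))
    second = cache_or_fetch("pbp", 2024, fake_fetch, Path(tmp))
```

Because the transform code only ever sees the returned rows, it cannot tell a cache hit from a network fetch, which is the property ADR 0007 relies on.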

CLI behavior:

nflgrades ingest <source> --seasons 2024,2025 uses the cache by default.
nflgrades ingest <source> --refresh ignores the cache, re-fetches, and rewrites parquet + manifest.
nflgrades ingest --refresh-stale re-fetches anything where the manifest shows the cached nfl_data_py version differs from the installed one.
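
The --refresh-stale selection reduces to a filter over manifest entries. In this sketch the manifest is assumed to be a list of dicts carrying the fields named above; the on-disk layout, the stale_entries helper, and the sample version strings are all hypothetical.

```python
def stale_entries(manifest: list[dict], installed_version: str) -> list[tuple[str, int]]:
    """Return (source, season) pairs fetched with a different client version."""
    return sorted(
        (entry["source"], entry["season"])
        for entry in manifest
        if entry["nfl_data_py_version"] != installed_version
    )


# Made-up manifest entries for illustration.
manifest = [
    {"source": "pbp", "season": 2023, "nfl_data_py_version": "0.3.2"},
    {"source": "pbp", "season": 2024, "nfl_data_py_version": "0.3.3"},
    {"source": "rosters", "season": 2023, "nfl_data_py_version": "0.3.2"},
]
to_refresh = stale_entries(manifest, installed_version="0.3.3")
```

Only the two entries fetched under the older version are selected; the 2024 PBP file is left alone.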

Audit trail in Postgres: the existing pipeline_runs table records each ingest invocation (stage='ingest:{source}', season, rows_written, status). The pipeline_runs row says we ingested season X on date Y; the parquet file holds what we actually saw.

Consequences

Easier:

Re-running grading on new parameters costs the transform time only — no network, no waiting on nfl_data_py.
Notebooks load raw with one line: pd.read_parquet(cache_path("pbp", 2024)).
Reproducing a historical grade is git checkout <sha> + the parquet files; the database can be rebuilt from those two inputs alone.
Postgres backups stay small (~tens of MB for the typed product) instead of carrying GBs of raw PBP we never query in SQL.
If we ever need ad-hoc SQL over raw, DuckDB reads the parquet directly (duckdb.sql("SELECT * FROM 'pipeline/.cache/raw/pbp/2024.parquet'")). We don't have to commit to that now.

Harder:

Raw isn't backed up automatically. Acceptable: raw is regenerable from nfl_data_py for any season we cover. The cost of a wiped cache is one slow re-ingest, not data loss.
Two storage systems instead of one. Acceptable: the boundary is obvious — anything inside ingest/cache_or_fetch(...) reads/writes parquet, everything downstream reads from typed Postgres.
Detecting upstream column renames isn't automatic. The manifest catches fetched-with-different-version; the schema-mapping code in ingest/ catches renamed-column loudly when it tries to access the missing key. Both are acceptable failure modes — loud and early.

Explicitly given up:

Raw-in-DB convenience. Some teams like being able to psql into a raw_pbp table mid-debug. We're a pandas pipeline; you'd open a notebook and pd.read_parquet instead. If this ever becomes painful, expose raw via a DuckDB-backed FDW or a thin raw schema — don't migrate the primary store.
Streaming ingest. Parquet is batch-oriented. We have no streaming use case (NFL data lands once a week); revisit if that changes.

Implementation notes (non-binding)

cache_or_fetch lives in nfl_grades.ingest._cache and is the only module allowed to import nfl_data_py. Every concrete ingester (ingest/pbp.py, ingest/rosters.py, ...) calls it with its source key.
The manifest is rewritten atomically (write to manifest.json.tmp, rename) so a Ctrl-C mid-update can't corrupt it.
Parquet uses pyarrow with default compression (snappy). Don't override unless we hit a real size or speed problem.
Cache invalidation policy: never automatic. Refresh is always an explicit CLI flag. We'd rather work on stale data than silently re-run ingest under a developer.
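
The atomic manifest rewrite mentioned above follows the usual write-then-rename pattern. A minimal sketch, with write_manifest_atomically as a hypothetical helper name and a made-up entry in the demo:

```python
import json
import os
import tempfile
from pathlib import Path


def write_manifest_atomically(manifest_path: Path, entries: list[dict]) -> None:
    """Write to manifest.json.tmp, then rename over manifest.json."""
    tmp_path = manifest_path.with_name(manifest_path.name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, indent=2, sort_keys=True)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
    # os.replace swaps the file in one step, so readers see old or new, never half.
    os.replace(tmp_path, manifest_path)


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "manifest.json"
    write_manifest_atomically(
        manifest_path, [{"source": "pbp", "season": 2024, "row_count": 3}]
    )
    round_tripped = json.loads(manifest_path.read_text(encoding="utf-8"))
    tmp_left_behind = (Path(tmp) / "manifest.json.tmp").exists()
```

An interrupt before the rename leaves at worst a stray .tmp file next to an intact manifest.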

ADR-0010· 2026-04-23

Use nflreadpy (official nflverse) instead of nfl_data_py

Status: Accepted
Date: 2026-04-23
Supersedes: implicit choice of nfl_data_py in earlier scaffolding

Context

The original pipeline scaffolding picked nfl_data_py as the data-source client (mentioned in data-sources.md, pipeline/README.md, and pyproject.toml's [ingest] extra). This was the de-facto standard for Python access to nflverse data for several years.

Two things forced a re-evaluation:

Python 3.13 incompatibility. nfl_data_py 0.3.3 (the latest release, shipped in early 2024) caps its dependencies at numpy<2.0. Our stack is Python 3.13 with numpy>=2.1 (which is required for Python 3.13 wheels — there are no numpy<2 wheels for cp313). pip install ".[ingest]" fails with ResolutionImpossible.
nflreadpy exists and is the official successor. Released September 2025 by Tan Ho (nflverse maintainer), nflreadpy is a Python port of nflreadr (the canonical R package for nflverse data). It pulls from the same nflverse-data GitHub releases — the actual data source is identical.

Comparison:

Aspect	`nfl_data_py 0.3.3`	`nflreadpy 0.1.5`
Maintainer	Cooper Adams (community)	Tan Ho (nflverse core team)
Last release	Feb 2024	Nov 2025 (5 releases in 3 months)
Python 3.13	broken (`numpy<2` pin)	supported, classifier present
DataFrame backend	pandas	polars (with `.to_pandas()` method)
Data source	nflverse-data releases	nflverse-data releases (same)
Caching	none	built-in (memory or filesystem)
API surface	`import_pbp_data`, `import_seasonal_rosters`, ...	`load_pbp`, `load_rosters`, ...
Coverage	PBP, NGS, rosters, snaps, etc.	PBP, NGS, rosters, snaps, FTN, contracts, draft, injuries, ... (superset)

The "Beta" status warning on nflreadpy is real but the API mirrors nflreadr exactly, so the contract is well-defined and the underlying data files are the same we'd be reading either way.

Decision

Use nflreadpy for all nflverse data access. Specifically:

pipeline/pyproject.toml [ingest] extra: nflreadpy>=0.1.5, polars>=1.0, pyarrow>=18.0.
All ingest modules (ingest/pbp.py, ingest/rosters.py, etc.) call nflreadpy.load_* functions.
The cache_or_fetch helper from ADR 0009 wraps nflreadpy calls and converts polars → pandas at the boundary so the rest of the pipeline stays pandas-based (we have no reason to rewrite the math layer in polars yet).
nflreadpy's built-in cache is disabled (NFLREADPY_CACHE=off); we control caching ourselves via parquet files + manifest per ADR 0009. Two cache layers would be redundant and the manifest needs the raw network fetch to record correctly.
Function-name mapping is documented in docs/data-sources.md (import_pbp_data → load_pbp, import_seasonal_rosters → load_rosters, etc.).

Consequences

Easier:

Python 3.13 just works. We keep the modern numpy/pandas/scipy stack without downgrading.
We're tracking the same library as the R-side nflverse community uses, which means R-language docs and examples translate almost directly.
Active development: bugs and data updates land in weeks, not years.
Polars is generally faster than pandas for the kinds of bulk reads ingest does (PBP at ~50k rows × 300+ columns per season, tens of millions of values). Even though we convert to pandas, the read+parse step is faster.

Harder:

We pull in polars (~50MB) and pyarrow (~30MB) at the ingest extra. Acceptable: ingest is a power-user/CI workload, not a thin import.
Polars → pandas conversion at the ingest boundary is one extra .to_pandas() call. Effectively free (zero-copy via Arrow when possible).
"Beta" library risk: API could shift between 0.x releases. Mitigated by pinning a minimum version and keeping the wrapper layer (_cache) thin enough that an API change is a one-file fix.

Explicitly given up:

nfl_data_py ecosystem familiarity. Function-name muscle memory needs retraining (import_pbp_data → load_pbp). Net cost: a doc table.
Pandas-native reads. We could keep using pandas directly via pd.read_parquet on nflverse parquet URLs, but then we'd be re-implementing the discovery/versioning logic that nflreadpy already handles. Not worth it.

What this changes in the repo

pipeline/pyproject.toml [ingest] extra
docs/data-sources.md — function-name mapping, nflreadpy references
pipeline/README.md — replace nfl_data_py mentions
pipeline/src/nfl_grades/ingest/__init__.py docstring
AGENTS.md — convention #5 already cites ADR 0009; nothing to change beyond the data-source name
docs/adr/0009 — still correct (parquet caching strategy is source-agnostic); leave it alone

What this does NOT change

The grading methodology, schema, ADRs 0001–0008.
ADR 0009's three-layer separation. Parquet on disk, manifest sidecar, typed Postgres — all independent of which Python client we use to fetch.

ADR-0011· 2026-04-23

Store a thin `plays` table in Postgres, not the full PBP fat table

Status: Accepted
Date: 2026-04-23

Context

The nflverse PBP feed (nflreadpy.load_pbp) returns ~49,500 rows × 372 columns per season. It's the input to every grading formula. ADR-0009 already decided that raw source data lives as Parquet on disk, with only typed queryable tables in Postgres. The question now is what shape the Postgres-side plays table takes.

Three options:

No plays in Postgres. Grading reads Parquet each run. Web app can never drill into individual plays.
Thin plays table — ~40 columns we actually use: identifiers, situation, classification, player attribution, outcomes.
Fat plays table — store all 372 columns.

Decision

Option 2. Create a plays table with ~40 curated columns, documented below. The full 372-column Parquet remains the source of truth on disk (pipeline/.cache/raw/pbp/<season>.parquet), and any analysis that needs columns not in the table can re-read the Parquet directly.

Column selection

Columns chosen for one of four reasons:

Required by the v1 grading formula (QB composite: EPA/db, CPOE, success rate + garbage-time filter).
Required by likely v1.x grading expansions (RB RYOE context, WR separation context, defensive attribution).
Required by UI drill-down ("top 10 EPA plays for player X").
Cheap to keep and likely needed soon (penalty, air_yards, yac).

Everything else — Elias IDs, no_huddle flags, yardline strings, 200+ tracking-derived columns — stays in Parquet only.

Columns

See db/migrations/0003_create_plays.sql for the authoritative schema. Summary:

group	columns
identifiers (PK)	`game_id`, `play_id`
game context	`season`, `season_type`, `week`, `game_date`
teams (text abbrs, not FK)	`posteam`, `defteam`, `home_team`, `away_team`
situational	`qtr`, `down`, `ydstogo`, `yardline_100`, `score_differential`, `game_seconds_remaining`, `half_seconds_remaining`, `wp`
classification	`play_type`, `qb_dropback`, `pass_attempt`, `rush_attempt`, `sack`, `qb_scramble`, `qb_spike`, `qb_kneel`, `aborted_play`, `two_point_attempt`, `penalty`
player attribution (gsis_id text)	`passer_player_id`, `rusher_player_id`, `receiver_player_id`, `sack_player_id`, `interception_player_id`
outcomes	`yards_gained`, `epa`, `wpa`, `cpoe`, `success`, `air_yards`, `yards_after_catch`, `complete_pass`, `incomplete_pass`, `interception`, `fumble_lost`, `pass_touchdown`, `rush_touchdown`, `touchdown`
debugging	`play_desc` (renamed from nflverse `desc` to avoid SQL reserved-word friction)

Total: 49 columns as listed above.

Team and player references: strings, not FKs

posteam / defteam stay as TEXT (not FK to teams). Historical team abbreviations (STL, OAK, SD, LA pre-rebrand) already have normalization coverage via team_aliases; pushing FK semantics into the plays table would force rewriting team abbrs during ingest and fight against the source.
*_player_id columns store the raw gsis_id as TEXT. Joining to players.gsis_id is one-line SQL. Deferred advantages: we can ingest plays before rosters for that season (hasn't happened yet, but is a real recovery story if the rosters ingest breaks), and we don't have to manage FK cascades when a player is deleted.

Indexes

(season, season_type) — partitions most grading queries.
(passer_player_id, season), (rusher_player_id, season), (receiver_player_id, season) — for the "feature extraction" queries that pull one player-season's plays at a time.

Size and storage

~50k rows/season. 10 seasons of history = ~500k rows.
~40 columns, mostly nullable small numerics + a few text keys.
Estimated ~80 MB for 10 seasons in Postgres (10x smaller than the Parquet cache, since we're dropping more than 300 of the 372 columns).
Well inside "don't bother partitioning" territory.

Consequences

Easier:

Grading reads SELECT ... FROM plays WHERE season=? AND passer_player_id=? with no pandas overhead.
UI player pages can show "top 10 EPA plays" with a cheap indexed query.
New stat components for existing positions are small SQL additions — no new ingest needed.
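
The "top 10 EPA plays" drill-down can be sketched against a cut-down plays table. This uses sqlite3 as a stand-in for Postgres and only six of the columns; the index name, the sample IDs, and the top_epa_plays helper are made up for illustration, and db/migrations/0003_create_plays.sql remains the authoritative schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Postgres, illustration only
conn.executescript(
    """
    CREATE TABLE plays (
        game_id          TEXT    NOT NULL,
        play_id          INTEGER NOT NULL,
        season           INTEGER NOT NULL,
        passer_player_id TEXT,
        epa              REAL,
        play_desc        TEXT,
        PRIMARY KEY (game_id, play_id)
    );
    CREATE INDEX plays_passer_season ON plays (passer_player_id, season);
    """
)
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2024_01_AAA_BBB", 1, 2024, "00-0000001", 0.4, "short pass"),
        ("2024_01_AAA_BBB", 2, 2024, "00-0000001", 3.1, "deep pass, touchdown"),
        ("2024_01_AAA_BBB", 3, 2024, "00-0000001", None, "no EPA recorded"),
        ("2024_01_AAA_BBB", 4, 2024, "00-0000002", 5.0, "other passer"),
        ("2023_01_AAA_BBB", 1, 2023, "00-0000001", 4.2, "previous season"),
    ],
)


def top_epa_plays(conn, passer_id: str, season: int, limit: int = 10):
    """Highest-EPA plays for one passer-season, served by the composite index."""
    return conn.execute(
        """
        SELECT game_id, play_id, epa, play_desc
        FROM plays
        WHERE season = ? AND passer_player_id = ? AND epa IS NOT NULL
        ORDER BY epa DESC
        LIMIT ?
        """,
        (season, passer_id, limit),
    ).fetchall()


top = top_epa_plays(conn, "00-0000001", 2024)
```

The query filters on exactly the columns covered by the (passer_player_id, season) index, so a player page only touches that player-season's rows.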

Harder:

Adding a new column we later need means a new migration + a full re-ingest of affected seasons. We accept this: the column list above is conservative and covers the build plan through career grading.
Two copies of PBP data (Parquet + Postgres). The Parquet file is canonical; if the Postgres table disagrees, we re-ingest.

Explicitly given up:

Per-play tracking fields (time-to-throw per play, pressure tags) — those live in NGS / FTN, not PBP, and are ingested separately.
The 300+ "everything else" PBP columns (fumble recovery IDs, drive numbers, kicker yards, etc.). Available via the Parquet cache if needed for ad-hoc analysis.

References#

ADR-0009: Raw data cached as Parquet, typed tables in Postgres.
docs/exploration/2026-04-23-pbp.md (to follow this ADR) — probe output that anchored this column selection.

ADR-0012· 2026-04-23

Store NGS as three tables, not one unified fact table

Status: Accepted

Date: 2026-04-23

Context#

Next Gen Stats (NGS) arrives via nflreadpy.load_nextgen_stats(stat_type=...) in three flavors:

passing (29 cols): avg_time_to_throw, aggressiveness, completion_percentage_above_expectation (NGS's CPOE), avg_air_yards_to_sticks, plus derived efficiency numbers.
rushing (22 cols): rush_yards_over_expected_per_att, efficiency, percent_attempts_gte_eight_defenders, avg_time_to_los.
receiving (23 cols): avg_separation, avg_cushion, avg_yac_above_expectation, percent_share_of_intended_air_yards.

Column overlap across the three: only the keys (player_gsis_id, season, season_type, week, team_abbr) and the "display" fields we drop. Zero substantive stat overlap.

NGS coverage: 2016 → present. Earlier seasons have no NGS data at all.

Options:

Three tables: ngs_passing, ngs_rushing, ngs_receiving, each with its native columns.
One unified ngs_stats(player_id, season, week, component_name, value) EAV table: normalizes across stat types.
One wide table with all 29+22+23 columns, most nullable.

Decision#

Option 1. Three tables, each holding its source columns verbatim (minus display dupes like player_first_name). Feature extraction joins whichever table the position needs.

Rationale#

Column overlap is zero. An EAV table would force every query to filter by component_name, losing type safety and pushing schema into strings. No analytic win.
Query shape matches the storage shape. QB grading reads one row per passer from ngs_passing. RB reads one row from ngs_rushing. Not joining across stat types — no benefit to unifying them.
Size is trivial. Roughly 600 QB + 600 RB + 1,400 WR/TE player-weeks per season × 10 seasons ≈ 26k rows across the three tables (~74 columns combined): well under 100 MB. Three tables don't hurt.
Rejected Option 3 (wide table): most of each row would be null for any given position. Ugly, misleading query surface, and no storage advantage over Option 1 once nulls are excluded.

Grain and keys#

Each table: one row per (player, season, season_type, week, team).

week = 0 is the season summary row (nflverse convention). The grading pipeline reads WHERE week = 0 for per-season metrics.
week > 0 preserved for future weekly UI / trend charts.
season_type is kept because NGS includes postseason rows (weeks 19, 20, 21, 23 on the nflverse week axis).
team_id is part of the PK because a player traded mid-season gets separate NGS rows per team (the season-summary row too — each team segment gets its own summary).
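The grain above can be sketched with an in-memory SQLite table. This is only an illustration: the real tables live in Postgres and carry every NGS column, the `team_id` type and the sample ids are assumptions, and a single stat column stands in for the rest.

```python
import sqlite3

# Hypothetical, trimmed-down ngs_rushing; column types are assumptions.
DDL = """
CREATE TABLE ngs_rushing (
    player_gsis_id TEXT    NOT NULL,
    season         INTEGER NOT NULL,
    season_type    TEXT    NOT NULL,
    week           INTEGER NOT NULL,  -- 0 = season summary row
    team_id        TEXT    NOT NULL,
    rush_yards_over_expected_per_att REAL,
    PRIMARY KEY (player_gsis_id, season, season_type, week, team_id)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
rows = [
    # A traded player: one season-summary row per team segment.
    ("00-0000001", 2024, "REG", 0, "AAA", 0.40),
    ("00-0000001", 2024, "REG", 0, "BBB", 0.10),
    ("00-0000001", 2024, "REG", 5, "AAA", 1.20),  # weekly row, not read by grading
    ("00-0000002", 2024, "REG", 0, "AAA", -0.30),
]
conn.executemany("INSERT INTO ngs_rushing VALUES (?, ?, ?, ?, ?, ?)", rows)

# The grading read: season-summary rows only.
season_rows = conn.execute(
    "SELECT player_gsis_id, team_id FROM ngs_rushing "
    "WHERE season = ? AND week = 0 ORDER BY player_gsis_id, team_id",
    (2024,),
).fetchall()
```

The traded player comes back twice (once per team segment), which is why `team_id` has to be part of the key.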

Team normalization#

team_abbr in the source is the contemporary abbreviation (LAR, LAC, LV, etc.). We resolve via team_aliases at ingest time to get team_id, same as every other ingest. See ADR-0004.

Player mapping#

player_gsis_id in NGS is the nflverse gsis id, which we already use as the canonical identifier on players.gsis_id. No name matching required.

Minimum season#

season >= 2016 is enforced in ingest. Earlier seasons have no NGS; the grading pipeline handles their absence via data_tier (ADR-0003).

What we store#

Every NGS-specific column, verbatim. No pruning — NGS is small and future formula variants may want max_air_distance or percent_attempts_gte_eight_defenders even if v1 doesn't.

We drop: player_first_name, player_last_name, player_display_name, player_short_name, player_position, player_jersey_number. All already available on players / player_seasons / depth charts.

Consequences#

Good:

Natural query shape: SELECT * FROM ngs_passing WHERE week=0 AND season=2024.
Adding new NGS columns (if nflverse exposes them) is a single ALTER TABLE per affected stat type — no EAV-row-count explosion.
Type-safe columns in generated TypeScript.

Trade-offs:

Three ingest code paths (shared via a dispatcher — see ingest/ngs.py).
Adding a new stat type (hypothetical ngs_defense) is a new migration rather than "just insert rows".

References#

ADR-0003 — data_tier for missing historical coverage
ADR-0004 — team abbr normalization
docs/exploration/2026-04-23-ngs.md (when populated) — schema probe

ADR-0013· 2026-04-23

QB v1 grading formula

Status: Accepted (supersedable — v1 of the formula)

Date: 2026-04-23
Supersedes: None
Formalizes: docs/grading/qb-v1-proposal.md (strawman approved by the user 2026-04-23)

Context#

First concrete grading formula the pipeline needs to compute. Scope limited to the QB position so we can ship a full vertical slice (ingest → features → grades → UI) and iterate on the formula once real numbers are on screen.

Decision#

Composite#

grade = sigmoid(composite_z)

composite_z = 0.50 * z(shrunk_EPA_per_dropback)
            + 0.25 * z(shrunk_CPOE)
            + 0.25 * z(shrunk_success_rate)

z() = within-position, within-season standardization.
sigmoid() = existing grading/sigmoid.py, tuned so z = 0 → 50, z = +2 → ~90, z = -2 → ~10.
z-score mean/SD computed from qualified QBs only (see below).
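A minimal sketch of the composite and the grade mapping, assuming the sigmoid is a plain logistic curve with the steepness k = 1.15 named in the ADR-0008 title. The function names are illustrative, not the actual contents of grading/sigmoid.py.

```python
import math

K = 1.15  # steepness from the ADR-0008 title; the logistic form itself is an assumption


def sigmoid_grade(z: float, k: float = K) -> float:
    """Map a composite z-score to a 0-100 grade."""
    return 100.0 / (1.0 + math.exp(-k * z))


def qb_composite_z(z_epa: float, z_cpoe: float, z_success: float) -> float:
    """Weighted sum of the three shrunk-then-standardized QB components."""
    return 0.50 * z_epa + 0.25 * z_cpoe + 0.25 * z_success


grade_avg = sigmoid_grade(qb_composite_z(0.0, 0.0, 0.0))  # league-average QB
grade_plus2 = sigmoid_grade(2.0)
grade_minus2 = sigmoid_grade(-2.0)
```

Under that assumption a league-average QB lands on 50, and z = +2 and z = -2 land near 90 and near 10, symmetric around 50.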

Per-component definitions (before shrinkage)#

Component	Raw value	Sample space
`qb_epa_per_dropback`	mean of `plays.epa`	dropbacks (post-filter)
`qb_cpoe`	mean of `plays.cpoe`	pass attempts only (CPOE is null on sacks/scrambles)
`qb_success_rate`	mean of `plays.success`	dropbacks (post-filter)

Filter#

A play counts toward the grade iff ALL:

plays.season_type = 'REG'
plays.qb_dropback  = TRUE
plays.aborted_play = FALSE
plays.two_point_attempt = FALSE
NOT garbage_time

Garbage-time rule (this ADR formalizes the rule from the proposal):

garbage_time =
    (qtr >= 4 AND ABS(score_differential) > 21)
 OR (qtr  = 4 AND game_seconds_remaining < 300
                 AND ABS(score_differential) > 14)

Chosen over the wp < 0.05 OR wp > 0.95 convention because nflverse WP is aggressive about locking in late-game outcomes, and we'd rather err on the side of keeping a legitimate play than dropping one.
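The rule translates directly into a per-play predicate. A small sketch; the function name is hypothetical and the arguments mirror the plays columns used in the rule.

```python
def is_garbage_time(qtr: int, game_seconds_remaining: int, score_differential: int) -> bool:
    """True when a play falls under the garbage-time rule above."""
    margin = abs(score_differential)
    blowout = qtr >= 4 and margin > 21
    late_and_decided = qtr == 4 and game_seconds_remaining < 300 and margin > 14
    return blowout or late_and_decided
```

For example, a 17-point game with 10 minutes left in the fourth quarter still counts, while the same margin with under 5 minutes left does not.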

Empirical Bayes shrinkage#

Per component, before z-scoring:

shrunk = (n * raw + k * mu_league) / (n + k)

n = sample size for that component (dropbacks for EPA + success rate; pass attempts for CPOE — CPOE is null on sacks/scrambles so we use only the plays where it's defined).
mu_league = league mean of the raw component among all QBs, weighted by their sample size (volume-weighted, not simple average).
k shrinkage strength:
- k = 150 for EPA/db and success rate
- k = 100 for CPOE (lower variance, less shrinkage needed)
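A short sketch of the shrinkage step, using made-up EPA/dropback numbers for three QBs. The helper names are illustrative; only the formula and the k value come from this ADR.

```python
def volume_weighted_mean(raws, ns):
    """League mean weighted by each player's sample size."""
    return sum(r * n for r, n in zip(raws, ns)) / sum(ns)


def shrink(raw, n, mu_league, k):
    """Empirical Bayes shrinkage toward the league mean."""
    return (n * raw + k * mu_league) / (n + k)


# Made-up raw EPA/dropback and dropback counts: starter, part-timer, small sample.
raws = [0.20, 0.00, 0.40]
ns = [600, 300, 30]

mu = volume_weighted_mean(raws, ns)
shrunk = [shrink(r, n, mu, k=150) for r, n in zip(raws, ns)]
```

The 30-dropback QB's raw 0.40 is pulled most of the way back to the league mean, while the 600-dropback starter barely moves.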

Qualified threshold#

qualified = TRUE iff n_dropbacks >= 200 in the regular season.
All QBs with any dropbacks get a row in season_grades, but those below the threshold have qualified = FALSE so the UI can de-emphasize them.
Unqualified QBs still get shrunk / z-scored so their grade is on the same 0–100 scale.

Position assignment#

A player grades as QB iff they have a player_seasons row with position_played = 'QB' for that season. A player who appears at multiple positions is graded only at the positions they occupied. Non-QBs with a passing play (e.g. wildcat RB throws) don't get a QB grade, because the QB feature query joins against player_seasons.position_played.

What opponent adjustment?#

None for v1. Deferred. The composite runs off raw EPA, no defense-strength normalization. Revisit in v2 once face-validity feedback shows whether it's missing.

Confidence#

season_grades.confidence is set to min(1, n_dropbacks / 300). Rough proxy — 300 dropbacks is roughly half a full-season starter's workload; anyone at/above that gets confidence = 1.
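The qualified flag and the confidence value are both simple functions of dropback volume. A sketch with a hypothetical helper name:

```python
def qb_flags(n_dropbacks: int) -> dict:
    """Derive the qualified flag (>= 200) and confidence (capped at 300 dropbacks)."""
    return {
        "qualified": n_dropbacks >= 200,
        "confidence": min(1.0, n_dropbacks / 300),
    }
```

A QB with 250 dropbacks is therefore qualified but still below full confidence.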

Data tier#

Per ADR-0003:

2016+: tier 1 (full PBP + NGS available)
2006–2015: tier 2 (PBP available, no NGS — not relevant to the v1 formula since we don't use NGS)
pre-2006: tier 3 (no EPA model — cannot grade with this formula)

For now we only compute grades for seasons that have PBP ingested. The data_tier column on season_grades records which tier the grade belongs to.

Consequences#

Testability: Each stage (filter, shrinkage, z-score, composite, sigmoid) is a pure function on a DataFrame. Component tests verify the math, integration tests verify the top-10 list looks sane.

Iteration: If we decide CPOE is overweighted, that's a single coefficient change in grading/qb.py. If we want to add opponent adjustment later, it's a new column on stat_components — no schema change. The formula is a library, not an API.

Superseded when: We add NGS-derived components (time-to-throw, aggressiveness), opponent adjustment, or a defensibly-tuned inverse-variance weighting. Those go in v2 and get their own ADR.

References#

docs/grading/qb-v1-proposal.md — the strawman user approved
ADR-0003 — data tiering for missing historical coverage
ADR-0007 — originally sketched inverse-noise weighting; v1 skips this intentionally for explainability
db/migrations/0001_init.sql — stat_components and season_grades tables (pre-existing)

ADR-0014· 2026-04-24

RB v1 grading formula

Status: Accepted (supersedable — v1.1 of the formula)

Date: 2026-04-24
Updated: 2026-04-22 (v1.1 — see "v1.1 refinement" section)
Supersedes: None
Companion to: ADR-0013 (QB v1). Same pipeline shape, different components and per-skill sample sizes.

Context#

Second concrete grading formula. QB v1 shipped (ADR-0013); we're extending the same architecture (extract → shrink → z → composite → sigmoid) to RB. Two things make RB harder than QB:

Role variation is huge. Derrick Henry (280 carries / 20 targets) and Christian McCaffrey (220 / 100) are both elite by very different profiles. A naive single-composite formula would wrongly penalize a thumper's "bad" receiving or reward a pass-catching back's "easy" rushing.
Raw RB stats reflect a lot of non-RB stuff (OL quality, box counts, play-action, scheme). NGS RYOE and YAC-over-expected already try to strip this out, so they deserve meaningful weight.

We choose to handle (1) with usage-aware empirical Bayes shrinkage — a pure thumper's receiving components shrink hard toward the league mean (because their n_targets is small relative to the shrinkage k) and contribute close to zero to the composite. No explicit role detection is needed.

Decision#

Composite#

grade = sigmoid(composite_z)

composite_z = 0.28 * z(shrunk_ryoe_per_attempt)
            + 0.18 * z(shrunk_rush_epa_per_attempt)
            + 0.14 * z(shrunk_rush_success_rate)
            + 0.18 * z(shrunk_rec_epa_per_target)
            + 0.12 * z(shrunk_yac_over_expected_per_rec)
            + 0.05 * z(shrunk_catch_pct)
            - 0.05 * z(shrunk_fumble_rate)

Rush 60% / Rec 35% / Security 5%. Fumble rate enters with a negative sign (fumbles are bad).

z() = within-position, within-season standardization (same helper as QB; mean/SD computed from qualified RBs only).
sigmoid() = existing grading/sigmoid.py tuned so z = 0 → 50, z = +2 → ~90.

Per-component definitions (before shrinkage)#

Component	Raw value	Sample (n)	Source
`rb_ryoe_per_attempt`	NGS `rush_yards_over_expected_per_att`	carries	`ngs_rushing` (week=0)
`rb_rush_epa_per_attempt`	mean of `plays.epa` on rushes	carries	`plays`
`rb_rush_success_rate`	mean of `plays.success` on rushes	carries	`plays`
`rb_rec_epa_per_target`	mean of `plays.epa` on targets	targets	`plays`
`rb_yac_over_expected_per_rec`	mean of `plays.yards_after_catch - plays.xyac_mean_yardage` on completions	receptions scored by xYAC model (`n_rec_with_xyac`)	`plays` (nflfastR xYAC)
`rb_catch_pct`	`n_receptions / n_targets` (from `plays`, filter-matched)	targets	`plays`
`rb_fumble_rate`	`fumble` rate per touch (any fumble by ball carrier)	total touches	`plays`

Pre-adjusted flag: rb_ryoe_per_attempt and rb_yac_over_expected_per_rec are already context-adjusted by their upstream models (NGS's RYOE model and nflfastR's xYAC model respectively). When opponent adjustment lands in v2, these two components must be flagged so we don't double-adjust.

Catch-% source: NGS's load_nextgen_stats("receiving") only publishes rows for WR/TE — RBs are never included regardless of target volume. For v1 we derive catch % directly from plays: n_receptions / n_targets, with the same garbage-time / 2-pt filter as the rest of the receiving components. No expected-catch baseline is applied (none is available); we accept the limitation because RB target diet is relatively uniform (mostly short routes) and the component's weight is only 5%.

YAC-over-expected source: same NGS-receiving RB gap — we instead use nflfastR's xyac_mean_yardage column published on every completion in plays. For each RB reception with a non-null xyac_mean_yardage, the residual yards_after_catch - xyac_mean_yardage is the RB's YAC over expected on that play. We average across the RB's receptions (filter matches the rest of the receiving components). Coverage on RB completions in the modern era is >99% (≈0.9% null in 2024), so sample size effectively equals n_receptions. See v1.1 refinement section below.

Fumble rate: computed from plays.fumble (any fumble by the ball carrier, not just ones recovered by the defense). Counted on both rushing and receiving plays within the same per-skill filters as the production metrics. See v1.1 refinement section below.

Filter#

A rushing play counts toward the rushing components iff ALL:

plays.season_type = 'REG'
plays.rush_attempt = TRUE
plays.rusher_player_id IS NOT NULL
plays.qb_kneel  IS NULL OR plays.qb_kneel  = FALSE
plays.qb_scramble IS NULL OR plays.qb_scramble = FALSE   -- scrambles aren't RB production
plays.two_point_attempt IS NULL OR plays.two_point_attempt = FALSE
NOT garbage_time

A receiving play counts toward the receiving components iff ALL:

plays.season_type = 'REG'
plays.pass_attempt = TRUE
plays.receiver_player_id IS NOT NULL
plays.two_point_attempt IS NULL OR plays.two_point_attempt = FALSE
NOT garbage_time

Garbage-time rule is identical to ADR-0013:

garbage_time =
    (qtr >= 4 AND ABS(score_differential) > 21)
 OR (qtr  = 4 AND game_seconds_remaining < 300
                 AND ABS(score_differential) > 14)
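The `IS NULL OR ... = FALSE` clauses matter because a NULL flag must not drop the play. A sketch of the rushing filter as a Python predicate, assuming each play is a dict keyed by plays column names and that garbage time has been computed separately; the function names are hypothetical.

```python
def not_true(flag) -> bool:
    """Mirror SQL's `flag IS NULL OR flag = FALSE`: None and False both pass."""
    return flag is None or flag is False


def counts_as_rb_rush(play: dict, garbage_time: bool) -> bool:
    """True when a play counts toward the RB rushing components."""
    return (
        play["season_type"] == "REG"
        and play["rush_attempt"] is True
        and play["rusher_player_id"] is not None
        and not_true(play.get("qb_kneel"))
        and not_true(play.get("qb_scramble"))
        and not_true(play.get("two_point_attempt"))
        and not garbage_time
    )


carry = {"season_type": "REG", "rush_attempt": True, "rusher_player_id": "00-0000001",
         "qb_kneel": None, "qb_scramble": False, "two_point_attempt": None}
scramble = dict(carry, qb_scramble=True)
```

A plain `not flag` check would behave the same here, but the explicit helper keeps the NULL handling visible next to the SQL it mirrors.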

Position assignment#

A player grades as RB iff players.position = 'RB'. We grade from the master players table (not player_seasons.position_played) so that a rookie who changed teams mid-season still gets one grade per player, not one per team stint.

Non-RBs with rushes (scrambling QBs, WR jet-sweepers, gadget TEs) don't get an RB grade — the feature query joins on players.position = 'RB'.

Empirical Bayes shrinkage#

Per component, before z-scoring:

shrunk = (n * raw + k * mu_league) / (n + k)

where mu_league is the volume-weighted RB league mean (summed over qualified and unqualified RBs, same convention as QB v1).

k per component (picked so n == k means "half-shrunk toward league mean"):

Component	`n` column	`k`
`rb_ryoe_per_attempt`	carries	100
`rb_rush_epa_per_attempt`	carries	100
`rb_rush_success_rate`	carries	100
`rb_rec_epa_per_target`	targets	40
`rb_yac_over_expected_per_rec`	receptions scored by xYAC (`n_rec_with_xyac`)	30
`rb_catch_pct`	targets	40
`rb_fumble_rate`	total touches	200

The large k on fumble rate is deliberate — fumble rate (even with the recovery coin-flip removed by switching from fumble_lost to fumble) still has weak year-over-year reliability (~r=0.1-0.2), so we shrink hard.
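Rearranging the shrinkage formula gives shrunk = w * raw + (1 - w) * mu_league with w = n / (n + k), which makes the usage-aware behavior easy to inspect. A sketch using the carry and target counts quoted in the Context section; the helper name is hypothetical.

```python
def raw_weight(n: int, k: int) -> float:
    """Share of the shrunk value that comes from the player's own raw number."""
    return n / (n + k)


K_CARRIES, K_TARGETS = 100, 40  # k values from the table above

thumper = {"carries": 280, "targets": 20}
dual_threat = {"carries": 220, "targets": 100}

thumper_rec_w = raw_weight(thumper["targets"], K_TARGETS)    # one third own data
dual_rec_w = raw_weight(dual_threat["targets"], K_TARGETS)   # about 71% own data
thumper_rush_w = raw_weight(thumper["carries"], K_CARRIES)   # about 74% own data
```

The thumper's receiving components are two-thirds league mean, so after z-scoring they sit close to zero and contribute little to the composite.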

Handling missing data#

Some RBs are below NGS's volume thresholds and won't have ngs_rushing season-summary rows. Our joins are LEFT JOINs and the missing metrics come through as NaN with n = 0. Similarly, an RB with no receptions has NaN receiving metrics.

Policy: before combining into the composite, any NaN component z-score is replaced with 0 (neutral). This covers three distinct "no data" cases under a single rule:

A pure thumper with n_targets = 0 has NaN receiving z-scores.
A pass-game specialist with 15 carries (under NGS's rushing volume threshold for RYOE) has a NaN z for RYOE even though their n_carries > 0.
Some RBs may be missing NGS rushing rows for the season entirely (e.g. rookies whose first week was postseason).

All three collapse to "no evidence on this skill = assume league average on this skill". The alternative — renormalizing composite weights per-player to drop missing components — would re-introduce role-aware weighting, which we explicitly wanted to avoid.

The stat_components.z_score column keeps the true NaN for these rows so the UI can render "—" rather than "0.0" and be honest about what we don't know. Only the composite calculation substitutes 0.
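A sketch of the neutralization policy, assuming z-scores arrive as a dict keyed by component name. The dict name RB_V1_WEIGHTS and the function are hypothetical; the weights are the ones in the composite above.

```python
import math

# Signed weights from the composite above; the dict name is hypothetical.
RB_V1_WEIGHTS = {
    "rb_ryoe_per_attempt": 0.28,
    "rb_rush_epa_per_attempt": 0.18,
    "rb_rush_success_rate": 0.14,
    "rb_rec_epa_per_target": 0.18,
    "rb_yac_over_expected_per_rec": 0.12,
    "rb_catch_pct": 0.05,
    "rb_fumble_rate": -0.05,
}


def composite_z(z_scores: dict, weights: dict) -> float:
    """Weighted sum where a missing or NaN z-score counts as 0 (league average)."""
    total = 0.0
    for name, weight in weights.items():
        z = z_scores.get(name, math.nan)
        total += weight * (0.0 if math.isnan(z) else z)
    return total  # RB weight magnitudes sum to 1.0, so no rescaling is shown here


# A pure thumper: +1 on every rushing component, no receiving data at all.
thumper = {
    "rb_ryoe_per_attempt": 1.0,
    "rb_rush_epa_per_attempt": 1.0,
    "rb_rush_success_rate": 1.0,
    "rb_rec_epa_per_target": math.nan,
    "rb_yac_over_expected_per_rec": math.nan,
    "rb_catch_pct": math.nan,
    "rb_fumble_rate": 0.0,
}
thumper_z = composite_z(thumper, RB_V1_WEIGHTS)
```

The input dict is left untouched, so the stored per-component z-scores stay NaN and only the composite sees the substituted zeros.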

Qualified thresholds#

Three separate qualification concepts across four thresholds, because the sub-grade concept is split by skill (RBs have two):

Threshold	Rule	Purpose
Grade at all	`touches >= 30`	Excludes fringe players we can't say anything meaningful about
Composite qualified	`touches >= 120`	"Real contributor" — appears in main leaderboard
Rushing sub-grade qualified	`carries >= 80`	Rushing sub-grade displays; else "—"
Receiving sub-grade qualified	`targets >= 40`	Receiving sub-grade displays; else "—"

120 touches is roughly 7-8 touches/game over a full season — half a full-season bell cow's workload, or all of a receiving specialist like Ekeler. Tunable if the face-check shows too many marginal committee backs at the top.

All backs with touches >= 30 get a season_grades row; the qualified column distinguishes them.
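The four thresholds can be read as independent flags on a player-season. A sketch with a hypothetical helper; how `touches` is counted is left to the caller.

```python
def rb_qualification(touches: int, carries: int, targets: int) -> dict:
    """Apply the thresholds from the table above to one RB season."""
    return {
        "graded": touches >= 30,                   # gets a season_grades row
        "qualified": touches >= 120,               # main leaderboard
        "rushing_subgrade_shown": carries >= 80,
        "receiving_subgrade_shown": targets >= 40,
    }


committee_back = rb_qualification(touches=95, carries=85, targets=14)
```

Here the committee back is graded and shows a rushing sub-grade, but is not composite-qualified and shows no receiving sub-grade.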

Sub-grades#

The season_grades row holds the composite grade only. Sub-grades (rushing / receiving) are computed at read time in the web app by combining the already-z-scored component rows in stat_components. No schema change.

Rushing sub-grade z = (0.28*z_ryoe + 0.18*z_rush_epa + 0.14*z_rush_success) / (0.28 + 0.18 + 0.14) then sigmoid to 0-100.

Receiving sub-grade z = (0.18*z_rec_epa + 0.12*z_yac_over_exp + 0.05*z_catch) / (0.18 + 0.12 + 0.05) then sigmoid to 0-100.

A sub-grade renders as "—" when the sample-size threshold for that skill isn't met. This is purely a UI convention — the composite grade in season_grades is unaffected.
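A sketch of the read-time sub-grade calculation, written in Python for consistency with the pipeline even though the web app would do this in its own language. It assumes a logistic sigmoid with k = 1.15 (from the ADR-0008 title), uses shortened component keys, and assumes NaN z-scores have already been handled; all names are illustrative.

```python
import math

RUSH_WEIGHTS = {"ryoe": 0.28, "rush_epa": 0.18, "rush_success": 0.14}
REC_WEIGHTS = {"rec_epa": 0.18, "yac_over_exp": 0.12, "catch": 0.05}


def sigmoid_grade(z: float, k: float = 1.15) -> float:
    """Assumed logistic mapping from z-score to a 0-100 grade."""
    return 100.0 / (1.0 + math.exp(-k * z))


def sub_grade(z_by_component: dict, weights: dict, n: int, threshold: int):
    """Renormalized weighted z, then sigmoid; None means the UI shows a dash."""
    if n < threshold:
        return None
    z = sum(w * z_by_component[c] for c, w in weights.items()) / sum(weights.values())
    return sigmoid_grade(z)


z_scores = {"ryoe": 1.0, "rush_epa": 1.0, "rush_success": 1.0,
            "rec_epa": 0.5, "yac_over_exp": 0.5, "catch": 0.5}
rushing = sub_grade(z_scores, RUSH_WEIGHTS, n=200, threshold=80)    # carries
receiving = sub_grade(z_scores, REC_WEIGHTS, n=20, threshold=40)    # targets
```

Dividing by the sum of the sub-grade's own weights puts each sub-grade on the same z scale as the composite before the sigmoid is applied.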

Confidence#

season_grades.confidence = min(1, touches / 250). 250 touches is roughly a full-season starter's workload; anyone at/above that gets confidence = 1.

Data tier#

Per ADR-0003:

2016+: tier 1 (PBP + NGS available; full formula computes).
Pre-2016: out of scope for v1. The formula depends on the NGS RYOE component for 28% of its weight (YAC-over-expected and catch % are derived from plays; see the per-component definitions above). Backfilling a pre-NGS fallback is deferred.

Consequences#

Testability: each stage is a pure function (same as QB); unit tests verify the "n=0 → z=0" neutralization, the sub-grade threshold gating, and that dual-threat backs outrank specialists.

Web app: the existing leaderboard + player detail pages render RBs as soon as season_grades has rows. A position switcher on the home page is a one-component follow-up (bundle with WR/TE).

Iteration: weight and k changes are single-coefficient edits in weights.py. Adding broken-tackle-rate from PFR is a new component row, no schema change.

v1.1 refinement#

Two caveats from the original v1 were resolved by adding two columns to plays (migration 0005_add_fumble_and_xyac_to_plays) and switching the RB grader's data sources:

Fumble rate now uses plays.fumble rather than plays.fumble_lost. Fumble-lost depends on who recovers (a near-coin-flip), making it strictly noisier than true fumble rate. The change is source-only — the weight (-0.05), the large shrinkage k (200), and the ball-carrier attribution rules are unchanged.
YAC-over-expected now sourced from plays.xyac_mean_yardage (nflfastR's xYAC model output on each completion) rather than ngs_receiving.avg_yac_above_expectation. Root cause: NGS's receiving product publishes zero RB rows regardless of target volume, so the NGS-based component collapsed to a NaN-then-neutralized 0 for effectively every RB, silently wasting its 12% composite weight. The xYAC column covers >99% of modern-era RB completions, so the component is now active signal.

Both changes preserve the existing composite weights, shrinkage constants, qualification thresholds, and pre_adjusted flags: the data sources change, the formula does not. Pre-adjusted remains True for the YAC component (xYAC is still a per-play, context-aware model, so opponent adjustment in v2 must still skip this component).

The stat_components.component_name strings remain the same (rb_fumble_rate, rb_yac_over_expected_per_rec), preserving the public contract with the web app.

Deferred#

Opponent adjustment: same deferral as QB v1. When added, the RYOE and YAC-over-expected components must be flagged as pre_adjusted: True to avoid double-adjustment.
Broken-tackle rate from PFR — valuable skill signal, but reliability needs cross-year validation before we weight it.
Red-zone / goal-line efficiency: small sample, mostly usage-driven, skipped.
Two-point conversion efficiency — same reasoning.
20+ yard breakaway rate — potentially distinct signal from EPA, but correlation is high enough that we're dropping it for v1. Revisit if breakaway-archetype backs grade unfairly low.
Route participation / target share as a graded input — no routes-run data ingested yet.
Forced-fumble attribution, recoveries-in-pileups — deferred to a defensive-grading pass.
Usage labels ("Feature / Committee / Specialist") derived from snap share. Nice UI add, not a grading change. v1.5.

References#

ADR-0013 — QB v1 grading formula (same architecture)
ADR-0003 — data tiering
ADR-0011 — thin plays table (updated by migration 0005 to include fumble and xyac_mean_yardage)
ADR-0012 — NGS three-table layout (rushing used; receiving intentionally not joined for RB grading)

ADR-0015· 2026-04-22

WR v1 grading formula

Status: Accepted (supersedable — v1 of the formula)

Date: 2026-04-22
Supersedes: None
Companion to: ADR-0013 (QB v1), ADR-0014 (RB v1). Same pipeline shape (extract -> shrink -> z -> composite -> sigmoid), different components, filters, and qualification thresholds.

Context#

Third concrete grading formula. QB v1 and RB v1 shipped; we're extending the same architecture to WR. Three things distinguish WR grading from the prior two:

WRs have one skill, not two. There's no RB-style dual-skill split (rushing + receiving), so there's one composite and no sub-grades in v1. "Route runner vs YAC monster" is interesting UI data viz but not a separate qualification bucket.
NGS receiving publishes WRs cleanly (unlike RBs, which NGS excludes). We get avg_separation and avg_yac_above_expectation on essentially all qualified WRs from 2016+.
Target earn rate is a real signal for WRs (unlike for RBs, where carries are decreed by scheme). WRs partly earn their targets by winning routes and forcing the QB's eye. This is a new component with no RB analog.

The grade is meant to answer "how well did this WR play the receiving role this season?" — separated from usage-driven accumulators (total yards, touchdowns, target share as a volume stat).

Decision#

Composite#

grade = sigmoid(composite_z)

composite_z = 0.35 * z(shrunk_rec_epa_per_target)
            + 0.27 * z(shrunk_yac_over_expected_per_rec)
            + 0.10 * z(shrunk_separation)
            + 0.10 * z(shrunk_target_earn_rate)
            + 0.08 * z(shrunk_success_rate_per_target)
            - 0.05 * z(shrunk_fumble_rate)

Sum of magnitudes = 0.95. The composite combiner normalizes by sum of magnitudes (not signed sum); fumble contributes at its designed 5.3% share (0.05 / 0.95). This invariant is locked by test_signed_weights_normalize_by_magnitude in pipeline/tests/grading/test_composite.py and further reinforced by test_wr_v1_weights_example which uses the exact WR_V1_WEIGHTS dict.

Rough shape:

62% outcome-based: EPA/target 35% + YAC-over-expected 27%
28% process + usage: separation 10% + target earn rate 10% + success rate 8%
5% ball security: fumble rate (negative)
z() = within-position, within-season standardization against qualified WRs only (same helper as QB and RB).
sigmoid() = grading/sigmoid.py, z=0 -> 50, z=+2 -> ~90.

Why these weights#

EPA at 35%, not 40%. A single metric at 40% gives any systematic bias (QB quality, scripted touches, YAC-heavy offense) too much leverage. 35% keeps EPA the biggest contributor without dominating the composite.
YAC at 27%. Highest-reliability WR signal after EPA. xYAC pre-adjusts for coverage state at the catch, so this is close to pure WR skill.
Target earn rate at 10%, not 22%. Target share is structurally correlated with team environment (top QB, pass-heavy scheme, weak WR2 competition, weak TE/RB pass game). These confounds don't wash out across a season; they persist for players in stable situations. 10% captures the "QB looks at you" signal without letting offensive environment drive a fifth of the grade.
Separation at 10%, not 15%. Process metric, not outcome; inflated by easy targets (screens, hitches); NGS measures at-catch rather than at-throw. Keep it modest.
Success rate at 8%. Diversifies efficiency measurement away from pure EPA, but it's partly role-contaminated (slot checkdowns on 3rd-and-medium have a different success-rate baseline than outside verticals on 1st-and-10). 8% is a compromise — not 5% (which underweights a second efficiency lens), not 10% (which overweights a role-biased metric). Flagged as a face-check watch item: if slot specialists systematically outgrade deep threats, dial this back first.
Catch-rate-over-expected dropped entirely. Every version of this from public data is either QB-contaminated (aggregated plays.cpoe per receiver rewards pairing with accurate QBs) or role-contaminated (raw NGS catch % punishes deep threats and rewards screen/flat receivers). Omitting a component is an honesty signal — PFF has proprietary charting for catchable targets; we don't. Surface raw catch % on the player page as context, keep it out of the composite.
Fumble rate at -5%. Same rationale as RB v1.1: rare event, low YoY reliability, shrink hard.

Per-component definitions (before shrinkage)#

Component	Raw value	Sample (n)	Source	Pre-adjusted
`wr_rec_epa_per_target`	mean of `plays.epa` on targets	targets	`plays`	No
`wr_yac_over_expected_per_rec`	mean of `plays.yards_after_catch - plays.xyac_mean_yardage` on completions with non-null xYAC	`n_rec_with_xyac`	`plays` (nflfastR xYAC)	Yes
`wr_separation`	`avg_separation`	targets	`ngs_receiving` (week=0)	Yes
`wr_target_earn_rate`	`n_targets / n_team_pass_att_active`	team pass attempts while active	`plays`	No
`wr_success_rate_per_target`	mean of `plays.success` on targets	targets	`plays`	No
`wr_fumble_rate`	rate of `plays.fumble` per reception	receptions	`plays`	No

Target earn rate denominator: n_team_pass_att_active is the sum of posteam's regular-season pass attempts across the set of (posteam, game_id) pairs that appear in the WR's own target plays. This handles mid-season trades cleanly — each game's denominator is its correct team's pass volume. The "had >=1 target" proxy for active may slightly under-count games where the WR played but wasn't targeted; for qualified WRs this is rare.
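A sketch of the denominator logic on toy data, assuming the filtered plays have already been reduced to (posteam, game_id) pairs. The function name, team codes and game ids are made up.

```python
def target_earn_rate(target_plays: list, team_pass_att: dict) -> float:
    """target_plays: one (posteam, game_id) tuple per target of this WR.
    team_pass_att: filtered pass attempts per (posteam, game_id)."""
    active_games = set(target_plays)  # games where the WR had at least one target
    n_team_pass_att_active = sum(team_pass_att[g] for g in active_games)
    return len(target_plays) / n_team_pass_att_active


# A WR traded from AAA to BBB: 5 targets over two AAA games, 5 in one BBB game.
targets = [("AAA", "g1")] * 3 + [("AAA", "g2")] * 2 + [("BBB", "g3")] * 5
team_pass_att = {
    ("AAA", "g1"): 30,
    ("AAA", "g2"): 35,
    ("AAA", "g4"): 40,  # WR not targeted in this game, so it is excluded
    ("BBB", "g3"): 35,
}
rate = target_earn_rate(targets, team_pass_att)  # 10 targets / 100 attempts
```

The untargeted game g4 dropping out of the denominator is the under-count the paragraph above accepts as rare for qualified WRs.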

Fumble denominator = receptions (not targets): WRs only touch the ball on completions. Keeps fumble rate comparable across possession WRs and deep threats.

Pre-adjusted flag: wr_yac_over_expected_per_rec and wr_separation are already context-adjusted by their upstream models. When opponent adjustment lands in v2, these components must be flagged so we don't double-adjust.

Filter#

A receiving play counts toward WR components iff ALL:

plays.season_type = 'REG'
plays.pass_attempt = TRUE
plays.receiver_player_id IS NOT NULL
plays.two_point_attempt IS NULL OR plays.two_point_attempt = FALSE
NOT garbage_time

Identical to the RB v1 receiving filter — reused verbatim from grading/filters.py::RB_REC_FILTER_SQL. Garbage-time rule is the one defined in ADR-0013.

The team-pass-attempts aggregate for the earn-rate denominator uses the same filter so numerator and denominator are consistent (both count REG-season, non-garbage, non-2pt pass attempts).

Position assignment#

A WR grade is issued iff players.position = 'WR'. A WR running a jet sweep doesn't get rushing credit — this is a receiving grade only. A TE/RB running routes out of the backfield doesn't get a WR grade; they belong in their own position's pipeline.

Empirical Bayes shrinkage#

Per component, before z-scoring:

shrunk = (n * raw + k * mu_league) / (n + k)

where mu_league is the volume-weighted WR league mean (summed over qualified and unqualified WRs, same convention as QB/RB v1).

k per component:

Component	n units	k
EPA per target	targets	50
YAC over expected per rec	receptions scored by xYAC	30
Separation	targets	40
Target earn rate	team pass attempts while active	200
Success rate per target	targets	50
Fumble rate	receptions	100

Separation's k (40) is slightly below the other per-target components (50) because NGS separation has higher year-over-year reliability than raw per-play efficiency metrics. Target earn rate uses its natural denominator (team pass attempts) rather than games — the EB formulation shrinks toward league-mean target share weighted by the number of observations, which is the correct statistical framing. k=200 team pass attempts is roughly 35% of a team's regular-season pass volume.

Handling missing data#

Same policy as RB v1 (see ADR-0014 "Handling missing data"): any NaN component z-score is replaced with 0 (neutral) before entering the composite. stat_components.z_score keeps the true NaN so the UI can render "-" rather than "0.0".

Practically, this matters most for:

WRs under NGS's separation volume threshold (rookies with partial seasons, or below the volume NGS publishes). Separation is NaN; z is NaN; composite substitutes 0.
A WR with 0 completions (only happens at the extreme low-volume end) has NaN YAC and NaN fumble rate.

The alternative — renormalizing composite weights per-player to drop missing components — would re-introduce role-aware weighting, which we explicitly want to avoid.

Weight normalization invariant#

The composite combiner normalizes by sum of magnitudes (sum(abs(w))), not signed sum. A player at z=+1 on every component (including fumble rate, where z=+1 means "fumbles a lot") gets composite_z = (0.35 + 0.27 + 0.10 + 0.10 + 0.08 - 0.05) / 0.95 = 0.85 / 0.95 ≈ 0.895, and fumble penalizes at exactly its designed 5.3% share rather than being amplified by a smaller signed-sum denominator.

This is locked by test_signed_weights_normalize_by_magnitude (added during RB v1.1) and by the new test_wr_v1_weights_example which exercises the actual WR_V1_WEIGHTS dict.
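The invariant can be checked by hand with a few lines. This sketch assumes WR_V1_WEIGHTS is keyed by component name with signed values, which may not match the real dict's layout; the combine function is illustrative.

```python
# Assumed layout: component name -> signed weight (values from the composite above).
WR_V1_WEIGHTS = {
    "wr_rec_epa_per_target": 0.35,
    "wr_yac_over_expected_per_rec": 0.27,
    "wr_separation": 0.10,
    "wr_target_earn_rate": 0.10,
    "wr_success_rate_per_target": 0.08,
    "wr_fumble_rate": -0.05,
}


def combine(z_scores: dict, weights: dict) -> float:
    """Signed weighted sum, normalized by the sum of weight magnitudes."""
    denom = sum(abs(w) for w in weights.values())
    return sum(w * z_scores[name] for name, w in weights.items()) / denom


all_plus_one = combine({name: 1.0 for name in WR_V1_WEIGHTS}, WR_V1_WEIGHTS)
fumble_share = abs(WR_V1_WEIGHTS["wr_fumble_rate"]) / sum(
    abs(w) for w in WR_V1_WEIGHTS.values()
)
```

Dividing by the signed sum (0.85) instead would give exactly 1.0 for the same player and inflate every weight, the fumble penalty included.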

Qualification thresholds#

Two qualification concepts:

Threshold	Rule	Purpose
Grade at all	`targets >= 20`	Excludes fringe WRs we can't say anything meaningful about
Composite qualified	`targets >= 50`	Rotational WR3 or better; appears in main leaderboard; defines z-score population

50 targets (~3/game over a full season) is roughly the floor for "this player got real route time." Tunable if face-check shows too many marginal WR3s at the top or too many clear WR1s falling below.

All WRs with targets >= 20 get a season_grades row; the qualified column distinguishes them.

Confidence#

season_grades.confidence = min(1, targets / 100). 100 targets is ~6/game — "real starter usage" rather than WR1 workload (which would be ~120-140+). Pegging full confidence here gives most healthy starters confidence = 1 and reserves the fractional band for genuine part-season / rotational players.

Data tier#

Per ADR-0003:

2016+: tier 1 (PBP + NGS available; full formula computes).
Pre-2016: out of scope for v1. The formula depends on NGS separation and on xYAC availability for 37% of weight. A pre-NGS fallback is deferred; call it a v2 concern.

Validation expectations#

Expect WR composite year-over-year Pearson r on 2+-season samples in the band 0.45 - 0.60.

Interpretation triggers:

Below 0.45: methodology problem. Most likely a noisy process component (separation or success rate) is drowning out the EPA/YAC signal. Investigate weight distribution and per-component reliability.
0.45 - 0.60: the expected band. WR production is genuinely more defense-dependent than QB production, and we don't have CB matchup adjustment in v1.
Above 0.65: suspicious. Likely means we're accidentally measuring usage (target volume, team context) rather than skill. Investigate whether target earn rate is pulling the stability or whether separation's metric-stability is doing more work than intended.

QB v1 for comparison was in the 0.60 - 0.70 band; WR's lower ceiling is a data limit (no CB matchup data), not a grading failure. Don't chase the QB number by tuning weights.
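A validation script could encode these triggers directly. The following is a sketch with hypothetical function names; note that the text above leaves 0.60 < r <= 0.65 unassigned, so the sketch reports that range as unspecified rather than guessing.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation between paired season-N and season-N+1 values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def interpret_wr_yoy_r(r):
    """Map a WR year-over-year r onto the interpretation triggers above."""
    if r < 0.45:
        return "methodology problem"
    if r <= 0.60:
        return "expected band"
    if r > 0.65:
        return "suspicious"
    return "unspecified"  # 0.60 < r <= 0.65 is not covered by the ADR text
```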

Consequences#

Testability: each stage is a pure function, same as prior positions. Unit tests verify NaN neutralization, that a pure separator outranks a non-separator with the same efficiency, that the fumble penalty actually subtracts, and that the composite normalization constant matches the hand-computed value from WR_V1_WEIGHTS.

Web app: the existing leaderboard + player detail pages render WRs as soon as season_grades has rows. A position switcher on the home page is a separate follow-up (currently hardcoded to QB; RB and WR both pending surfacing).

Iteration: weight and k changes are single-coefficient edits in weights.py. Adding a new component (say, separation at-throw once it becomes publicly available) is a new SQL CTE and a new row in the weights dicts; no schema change.

Deferred (v1.1+)#

Target-per-route-run — the clean v1.5 upgrade to target earn rate, replaces the "team pass attempts while active" proxy with a true "routes run" denominator. Requires routes-run data (PFF/FTN); not ingested.
Team-context-adjusted target earn rate — regress target share on team pass volume + QB EPA, grade on the residual. ~30 lines of code, a v1.1 candidate if face-check shows earn rate rewarding bad-team-WR1s too generously.
Drop rate — plays can't cleanly isolate drops from defended passes. Requires explicit drop charting.
Slot vs outside split — no alignment data ingested. Face-check will tell us if the one-scale approach systematically biases one archetype.
Contested catch rate — not available in public tracking data.
Red-zone / goal-line efficiency — small sample, mostly role-driven.
Opponent adjustment, team-level — same deferral as QB/RB v1. wr_yac_over_expected_per_rec and wr_separation must be flagged pre_adjusted=True to avoid double-adjustment.
CB matchup adjustment — the v2+ work that would push YoY r from the 0.45-0.60 band toward QB-level 0.60-0.70. Requires per-target defender charting.

References#

ADR-0013 — QB v1 grading formula (same pipeline architecture)
ADR-0014 — RB v1 grading formula (shares receiving machinery, same NaN neutralization policy, same xYAC source for YAC-over-expected)
ADR-0012 — NGS three-table layout (receiving table used for avg_separation)
ADR-0011 — thin plays table (with fumble and xyac_mean_yardage added by migration 0005)
ADR-0003 — data tiering

ADR-0016· 2026-04-23

TE v1 grading formula

Status: Accepted (v1; iterates like RB/WR)
Date: 2026-04-23
Companion to: ADR-0013 (QB), 0014 (RB), 0015 (WR); ADR-0003 (data tier); ADR-0009 (parquet cache)

Context#

TE grades must reflect receiving only in v1: public data does not support a repeatable blocking grade (no PFF-style charting). Role labels and data_tier_reason communicate what the number measures (see Role and data_tier below).

Decision — composite (tier 1, full six components)#

Same structure as WR v1 with separation at 7% (WR uses 10%). NGS separation is WR-coverage-geometry calibrated; TE-vs-LB/S matchups are noisier in the same metric — downweight, do not drop.

Component	Weight
`te_rec_epa_per_target`	0.35
`te_yac_over_expected_per_rec`	0.27
`te_separation`	0.07
`te_target_earn_rate`	0.10
`te_success_rate_per_target`	0.08
`te_fumble_rate`	-0.05

Sum of magnitudes |w| = 0.92 (signed sum 0.82; composite normalizer uses sum of absolute weights — see test_signed_weights_normalize_by_magnitude and TE tests in test_composite.py).

The earlier "0.95" figure in this ADR was a copy-paste artifact from WR v1 (WR has separation at 0.10 → WR |w| = 0.95); TE separation is downweighted to 0.07 for NGS-calibration reasons, giving |w| = 0.92.

YAC weight matches WR (27%): do not increase TE YAC weight on intuition alone. If TE YAC YoY correlation meaningfully exceeds WR YAC in validation, consider a v1.1 weight shift backed by that evidence.

Tier 2 — `role = blocking_te`#

Target earn rate is role-dominated for Y-heavy TEs. Omit earn from the composite; redistribute 0.10 to EPA and YAC in proportion 0.35∶0.27 (→ 0.406 and 0.314). Other components unchanged. The component row for te_target_earn_rate is still written with raw / shrunk / z; stat_components.used_in_composite = false for that row.

Because the redistribution preserves magnitude, tier-2 has the same |w| = 0.92 and signed sum 0.82 as tier-1 — on an all-z=1 TE the two dicts both produce 0.82 / 0.92 ≈ 0.8913. The dicts differ by where the earn mass lands, not by total weight.
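The tier-2 dict can be derived from the tier-1 weights instead of being typed by hand. This is a sketch under that assumption: `redistribute` and the dict names are hypothetical, and the real dicts in the pipeline may simply hard-code the rounded values.

```python
# Tier-1 TE weights from the table above.
TE_V1_TIER1 = {
    "te_rec_epa_per_target": 0.35,
    "te_yac_over_expected_per_rec": 0.27,
    "te_separation": 0.07,
    "te_target_earn_rate": 0.10,
    "te_success_rate_per_target": 0.08,
    "te_fumble_rate": -0.05,
}


def redistribute(weights, dropped, recipients):
    """Drop one component and split its weight across the recipients
    in proportion to the recipients' existing weights."""
    mass = weights[dropped]
    base = sum(weights[name] for name in recipients)
    out = {name: w for name, w in weights.items() if name != dropped}
    for name in recipients:
        out[name] = weights[name] + mass * weights[name] / base
    return out


TE_V1_TIER2 = redistribute(
    TE_V1_TIER1,
    dropped="te_target_earn_rate",
    recipients=["te_rec_epa_per_target", "te_yac_over_expected_per_rec"],
)

tier1_magnitude = sum(abs(w) for w in TE_V1_TIER1.values())
tier2_magnitude = sum(abs(w) for w in TE_V1_TIER2.values())
tier2_signed = sum(TE_V1_TIER2.values())
```

Deriving tier 2 this way keeps the magnitude invariant true by construction, so a later tier-1 weight change cannot silently break the 0.92 normalizer.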

Filters, features#

Receiving filter: same as WR/RB receiving (RB_REC_FILTER_SQL).
Features: plays + ngs_receiving (week=0) for separation; plays for xYAC-based YAC-over-expected; player_seasons summed snaps_offense for role.
Fumble denominator: receptions.

Qualification#

15 targets minimum to emit a grade row.
40 targets for qualified.
Confidence = min(1, targets / 70).

Shrinkage (per-position `k`)#

TE target earn k = 100 team pass attempts (vs WR 200) — smaller cross-player dispersion in earn rate. Other components align with WR (EPA 50, YAC 30, separation 40, success 50, fumble 100).
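To illustrate what a per-component `k` does, the sketch below uses the common pseudo-count form of shrinkage, shrunk = (n * raw + k * prior) / (n + k). That form is an assumption here: the estimator actually used is defined in ADR-0013, not in this section. The dict name and the `shrink` helper are hypothetical.

```python
# k values as listed above; the dict name is illustrative.
TE_SHRINKAGE_K = {
    "te_target_earn_rate": 100,  # team pass attempts (WR uses 200)
    "te_rec_epa_per_target": 50,
    "te_yac_over_expected_per_rec": 30,
    "te_separation": 40,
    "te_success_rate_per_target": 50,
    "te_fumble_rate": 100,
}


def shrink(raw, n, k, prior):
    """Pull a raw rate toward the prior; k acts as a pseudo-sample size.

    ASSUMED form, not confirmed by this ADR: with n == k the result sits
    halfway between the raw value and the prior.
    """
    if n + k == 0:
        return prior
    return (n * raw + k * prior) / (n + k)
```

Under this form a smaller `k` means a player's own sample dominates sooner, which is consistent with halving the earn-rate `k` for TEs relative to WRs.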

Role buckets#

receiving_te: target share ≥ 0.10 (targets / offensive snaps, season).
balanced_te: 0.05 ≤ share < 0.10, or low-snap / low-rate catch-alls.
blocking_te: share < 0.05 and offensive snaps ≥ 200.
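The three buckets can be expressed as one function over season totals. The sketch is illustrative: the name `te_role` is hypothetical, and treating a zero-snap season as `balanced_te` is an assumption drawn from the "low-snap catch-all" wording.

```python
def te_role(targets, offensive_snaps):
    """Assign a TE role bucket from season targets and offensive snaps."""
    if offensive_snaps <= 0:
        return "balanced_te"  # assumption: no snaps falls into the catch-all
    share = targets / offensive_snaps
    if share >= 0.10:
        return "receiving_te"
    if share < 0.05 and offensive_snaps >= 200:
        return "blocking_te"
    # 0.05 <= share < 0.10, or a low share on fewer than 200 snaps
    return "balanced_te"
```

The 200-snap floor keeps a TE with a handful of snaps and few targets from being labelled a blocker on thin evidence.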

`data_tier` and `data_tier_reason`#

Era leg: _era_tier_for_season in grading/era_tier.py → (tier, reason) with reason = era_pre_ngs when tier ≥ 2 from era alone.

TE merge (grading-only):

If role == blocking_te and era tier 1 → data_tier = 2, data_tier_reason = role_blocking_te.
If role == blocking_te and era tier ≥ 2 → keep era tier, data_tier_reason = era_and_role.
Else → era (tier, reason) only.

Non-TE positions: role NULL; data_tier / data_tier_reason from era tuple only.
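The merge rules read naturally as a small pure function layered on the era tuple. This sketch assumes the era leg returns a `(tier, reason)` pair with `reason` set to `None` at tier 1; the function name is hypothetical and the real logic lives in the grading modules.

```python
def merge_te_data_tier(role, era_tier, era_reason):
    """Combine the era-derived tier with the TE role, per the rules above."""
    if role == "blocking_te":
        if era_tier == 1:
            # Role alone demotes a tier-1 season to tier 2.
            return 2, "role_blocking_te"
        # Era already demoted the season; record both causes.
        return era_tier, "era_and_role"
    # receiving_te, balanced_te, and non-TE positions (role is None)
    return era_tier, era_reason
```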

Schema (migration 0006)#

season_grades.role, season_grades.data_tier_reason, stat_components.used_in_composite.

Pure blocking TEs (< 15 targets)#

No season_grades row. Team/roster UI must not hide these players when built (see plan / UX note).

Validation#

Target TE YoY r band 0.40–0.55 (slightly below WR); interpret like ADR-0015.

Deferred#

Blocking grade, alignment splits, red-zone split, target-per-route earn rate, CB matchup, etc.

References#

pipeline/src/nfl_grades/grading/te.py
pipeline/src/nfl_grades/grading/era_tier.py
docs/adr/0003-data-tier-and-qualified-as-first-class-columns.md

ADR-0017· 2026-04-24

v1 face-check: offense-context contamination in high-volume receiver grades

Status: Accepted (v1 limitation, documented; fix deferred to v1.5)
Date: 2026-04-24
Companion to: ADR-0014 (RB v1), ADR-0015 (WR v1), ADR-0016 (TE v1)

Context#

After shipping WR v1 and TE v1 and running both against the 2024/2025 seasons, a face-check surfaced a recurring pattern: several high-volume receivers on bad offenses graded notably lower than their tape/production would suggest. The prompting case was Brock Bowers (LV, 2024) — holder of the rookie TE target record at 153 targets, who landed at grade 50.4 / rank 14 of 34 qualified TEs.

The open question was whether v1's grader has a systematic bias (treat all bad-offense receivers as underrated) or something narrower. We ran a pre-check on the 2024 data before picking a direction; the data shows the confound is narrower than "all bad-offense receivers" and also real enough to need written disclosure before declaring v1 done.

Finding#

Affected WRs — 2024, top-15 by targets#

Name	Tm	Tgt	Grade	Rk / 84	Tm EPA#	Top QB
Garrett Wilson	NYJ	154	43.3	50	17	33.8
Jerry Jeudy	CLE	148	55.1	32	32	28.8
Malik Nabers	NYG	172	55.2	31	28	45.4

Wilson: 1,100+ yds despite Rodgers' worst NFL season; ranked in the bottom 40% of qualified WRs.
Jeudy: 1,229 yds on the league's worst offense (CLE, −0.183 EPA/play); a rank of #32 is defensible but feels light.
Nabers: rookie target record, yet a grade rank of 31 / 84 (top 37%, roughly the 63rd percentile).

Affected TEs — 2024, top-10 by targets#

Name	Tm	Tgt	Grade	Rk / 34	Tm EPA#	Top QB
David Njoku	CLE	99	21.2	34	32	28.8
Dalton Schultz	HOU	93	30.0	31	22	31.7
Brock Bowers	LV	153	50.4	14	31	29.5

Njoku: last among all qualified TEs despite 1,000+ snaps, solid reputation. Strongest single data point for offense contamination.
Schultz: rank 31/34 with 93 targets on the Stroud-injured/Young HOU offense.
Bowers: mid-pack grade for the highest TE target volume in 2024.

Six players across the two positions, all on offenses with top-QB grade below ~46. Matches the "bad QB play × high receiver volume" pattern.

What v1 handles correctly#

The methodology is not uniformly biased against receivers on weak offenses. Two cases prove the grader distinguishes efficient play from volume-only play inside a bad offensive environment:

Brian Thomas Jr. — 2024 WR, JAX#

135 targets, team EPA rank #18, top QB grade 44.9 (Lawrence's rough season)
Grade 73.9, rank 10 / 84 — top-12 WR by grade despite the weak passing context.

A naive "bad offense → underrate" bias would predict Thomas below the WR median. He's in the top 12%.

Jonnu Smith — 2024 TE, MIA#

111 targets, team EPA rank #21, top QB grade 80.0
Grade 71.4, rank 4 / 34 — top-5 TE.

MIA wasn't great offensively (below-average EPA), yet Smith's per-target efficiency was high enough to surface a top-5 grade.

Zach Ertz (WAS, 2024) is the inverse counter-example worth noting: WAS was a top-4 offense by EPA (top QB 78.7), Ertz ranked 24/34. Strong offense did not lift a clearly declining player. The grade was right.

These three cases together show the grader is responsive to per-target efficiency rather than team context as such.

The specific confound#

The failure mode is narrower than "bad-offense receivers underrated". It is specifically:

High-volume receivers whose targets are forced by their role on a team with below-replacement QB play.

Mechanics:

wr_rec_epa_per_target and te_rec_epa_per_target carry ~35% of the composite. EPA is QB-dependent — the same route/catch generates less EPA when the QB throws late, off-platform, or low-completion.
wr_yac_over_expected_per_rec / te_yac_over_expected_per_rec carry ~27%. xYAC is calibrated on league-average receptions; on a bad-QB offense, contested catches and off-schedule throws reduce real YAC relative to xYAC without the receiver doing anything wrong.
wr_target_earn_rate / te_target_earn_rate carries only ~10% and is a volume-adjacent signal — it helps, but not enough to outweigh the 62%+ from EPA and YAC-over-expected when both are QB-suppressed.

So a receiver who is forced to absorb record target volume on a team whose QB depresses EPA/target and YAC-over-expected across the board gets dinged twice (two big components each running 0.5–1.0 z below true skill) and credited once (one small component at +1.5 to +2.0 z for volume). Net: 5–15 composite points below a reasonable estimate.

The Thomas / Jonnu Smith counter-examples work because their per-target efficiency was high enough in absolute terms to offset the QB context — they weren't just surviving on forced volume.
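The "dinged twice, credited once" arithmetic can be checked against the WR weights. The sketch below picks the midpoints of the ranges quoted above (0.75 z of suppression on EPA and YAC, +1.75 z on earn rate) and holds every other component at z = 0; those inputs, the dict name, and the key names for success and fumble rate are illustrative assumptions, not measured values.

```python
# WR v1 weights as quoted in ADR-0015; dict name is illustrative.
WR_WEIGHTS = {
    "wr_rec_epa_per_target": 0.35,
    "wr_yac_over_expected_per_rec": 0.27,
    "wr_separation": 0.10,
    "wr_target_earn_rate": 0.10,
    "wr_success_rate_per_target": 0.08,
    "wr_fumble_rate": -0.05,
}
MAGNITUDE = sum(abs(w) for w in WR_WEIGHTS.values())


def composite_z(z_scores):
    """Signed weighted sum of z-scores over the sum of |weights|."""
    return sum(WR_WEIGHTS[k] * z_scores[k] for k in WR_WEIGHTS) / MAGNITUDE


neutral = {name: 0.0 for name in WR_WEIGHTS}

# Credited once: forced volume shows up as a high earn-rate z.
forced_volume = dict(neutral, wr_target_earn_rate=1.75)

# Dinged twice: the same player with EPA and YAC each suppressed by 0.75 z.
qb_suppressed = dict(
    forced_volume,
    wr_rec_epa_per_target=-0.75,
    wr_yac_over_expected_per_rec=-0.75,
)

credit = composite_z(forced_volume)
net = composite_z(qb_suppressed)
shift = net - credit  # composite-z cost of the QB suppression
```

With these inputs the volume credit is about +0.18 composite z and the suppression costs about 0.49, leaving the player near -0.31. Converting that z shift to grade points depends on the sigmoid mapping in ADR-0008, which is not reproduced here.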

Why naive offense adjustment is wrong#

The intuitive "residualize components by team offensive EPA" would:

Over-correct Thomas and Jonnu Smith — they already showed the efficiency needed; an additional boost for "bad offense" makes their grades unjustifiably high and distorts the top of the leaderboard.
Under-correct Bowers / Njoku relative to what they actually need — their issue is specifically per-target efficiency suppression from QB play, not general offense-level depression. Team EPA mixes run game + line play + YAC culture, so a team-EPA adjustment would dilute the QB-specific signal.
Create new problems on good offenses — a good-offense receiver who's actually mediocre (Ertz 2024) would get a negative context adjustment and drop below where he belongs.

The right fix is usage-conditional and QB-specific: adjust per-target efficiency components for the QB quality the receiver was playing with, but only for the portion of targets that are "forced" (high target share on bad QB), and leave already-efficient-despite-bad-QB players unadjusted.

That is not a hotfix. It is a methodology change.

Decision#

Ship v1 as-is. Document the confound here. Do not modify weights, thresholds, or components. Do not layer a naive offense adjustment on top of v1.

Defer the real fix to v1.5.

v1.5 plan candidates (do not pick now; analyze first)#

QB-quality-conditional z-scoring — when z-scoring *_rec_epa_per_target and *_yac_over_expected_per_rec, condition on the receiver's primary-QB composite grade (or a CPOE-derived QB quality score). Requires a second regression pass over historical seasons to calibrate.
Usage-residualized volume — add a "forced target share" signal and partially upweight it when the receiver's QB is below a threshold. Functions as a compensating positive weight only for the high-volume-on-bad-QB cell.
Combination — (1) corrects the EPA/YAC depression, (2) credits the fact that absorbing forced volume is itself a skill signal.

All three need a validation pass against multi-season data before picking. Historical backfill of 2016–2023 (already flagged as the other major pending work) is a prerequisite — single-season analysis can't separate noise from true context effects.

UI mitigation for v1#

On player pages, display alongside the composite grade:

Team offensive EPA/play and its league rank that season.
Top QB grade on the player's team that season.
If the player is a receiver (WR/TE/RB) with top-15 volume and their team's top QB grade is below ~45, a small inline note: "grade may be suppressed by QB context — see ADR-0017."

This does not change the grade. It surfaces the context the grade doesn't fully capture, so a user reading Bowers' 50.4 sees "Raiders offense #31, top QB 29.5" next to it and understands what they're looking at.

The note trigger is deliberately narrow (top-volume + bad QB) so it doesn't fire on every bad-offense receiver — that would dilute its meaning and contradict what the data actually shows (see Thomas / Smith).
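The trigger for the inline note is simple enough to sketch as a predicate. Names and defaults below are hypothetical: "top-15 volume" is read as rank by targets within position, and the "~45" cutoff is taken as exactly 45.0 for illustration.

```python
RECEIVER_POSITIONS = {"WR", "TE", "RB"}


def should_show_qb_context_note(
    position, volume_rank, top_qb_grade, max_volume_rank=15, qb_grade_cutoff=45.0
):
    """True when the ADR-0017 context note should appear next to the grade.

    volume_rank is assumed to be the player's rank by targets within
    position for that season (1 = most targets).
    """
    if position not in RECEIVER_POSITIONS:
        return False
    return volume_rank <= max_volume_rank and top_qb_grade < qb_grade_cutoff


# Bowers 2024: highest TE target volume, top QB grade 29.5.
bowers_note = should_show_qb_context_note("TE", 1, 29.5)
```

A hard 45.0 cutoff would not fire for a player whose top QB graded 45.4 (the Nabers row in the findings table), so the exact value behind "~45" is worth settling before the UI ships.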

Consequences#

Easier:

v1 ships with a known, bounded limitation instead of an unfinished methodology fix. The boundary is written down and visible to users.
v1.5 has a clear mandate backed by specific player cases to validate against (Wilson, Jeudy, Nabers, Njoku, Schultz, Bowers; counter-examples Thomas, Jonnu Smith, Ertz).

Harder:

Until v1.5 lands, the affected high-volume receivers (six named for 2024) carry visibly suppressed grades, and users have to read the context panel to interpret them correctly. Acceptable for an MVP; not acceptable long-term.
The UI has to carry context columns that wouldn't be needed if the grade self-adjusted.

Explicitly given up:

Claiming v1 is "context-neutral". It isn't. It is "per-target efficiency-weighted within the population", which is adjacent but not the same. The /about page and the ADR index should both reflect that honestly.

References#

2024 face-check data (throwaway query, not committed) — results inlined above in §Finding and §What v1 handles correctly.
ADR-0015 §Validation — the WR YoY-r band that would inform v1.5 calibration.
ADR-0016 §Validation — TE YoY-r band.
Pending: multi-season backfill (2016–2023) to enable usage-conditional z-scoring without overfitting to one season.
