Grim.Cards Case Study — Edition 2026-07-02

Name: Grim.Cards Case Study Dataset — 2026-07-02
Published: 2026-07-02
License: https://creativecommons.org/licenses/by/4.0/

Data Snapshot Date: 2 July 2026 · Permanent URL: grim.cards/case-study/2026-07-02 · Dataset Version 2.2 · License: CC BY 4.0 · Publisher: Grim.Cards

Executive Summary

Between 14 May 2026 and 2 July 2026, Grim.Cards recorded 4,914 simulated games across 229 player-submitted decks contributed by 109 distinct users. Across all formats and all gauntlet matchups, those decks won 44.8% of games (2,203 wins, 2,710 losses, 1 draw; n = 229 decks). Separating by format — the only correct lens for this data — Commander decks (n = 201) posted a 45.7% win rate across 4,415 games, while Standard decks (n = 28) posted 37.3% across 499 games.

These headline figures sit below 50%, which is the expected structural outcome of a one-versus-the-field gauntlet: each player deck faces multiple distinct meta opponents in succession, and the field collectively wins more often than any single challenger. Sub-50% is the baseline, not a verdict on deck quality. The more diagnostic signal is the spread rather than the average: the 18.7-percentage-point swing between the easiest and hardest Commander matchups, the 45.5-point swing in Standard, and the 80-point range of individual Commander deck win rates (6.7% to 86.7%). Together these indicate that outcome variance in this dataset tracks the gauntlet opponent and the specific deck submitted, not some property of players in aggregate.

The five headline findings, each backed by direct simulation measurement, are:

Matchup identity is the dominant driver of observed win rate. In Commander, players won 54.7% against Breya Artifact Combo (n = 201 decks, 867 games) and only 36.0% against Atraxa Superfriends (n = 201 decks, 867 games) — the same pool of decks, a different opponent.
Individual Commander deck win rates span an 80-point range (6.7% to 86.7%; median 46.7%, mean 45.2%; n = 201 decks), indicating the population of submitted decks is highly heterogeneous in construction and intent.
Among the 57 Commander decks retested, the average win-rate change was +2.7 percentage points, with 26 decks improving, 25 declining, and 6 roughly flat.
Monthly Commander volume grew sharply — from 87 simulations in May 2026 to 196 in June 2026 — while average win rate rose modestly from 43.1% to 45.9%.
Board-impact data (a board-state proxy, not a causal win claim) surfaces Terror of the Peaks as the strongest positive performer (+19.04 board-quality points per appearance; n = 10 decks, 45 observations) and Smothering Tithe as the largest negative outlier (−25.63; n = 15 decks, 154 observations) among Commander cards meeting the minimum evidence threshold.

All figures in this report are derived exclusively from the Grim.Cards production dataset. Every reported cohort clears the minimum threshold of 10 distinct decks. No individual user, deck name, or decklist is disclosed. Every association reported here is correlational, not causal.

Methodology & Provenance

Simulation engine

Grim.Cards runs AI-versus-AI Magic: The Gathering games on a custom build of the open-source Forge engine. Each player-submitted deck is played against a fixed gauntlet of meta-representative opponent decks; all play decisions on both sides are made by the engine. No human pilots any game.

Win rate definition

Win rate = wins ÷ total games played, with draws counted in the denominator. This definition is applied consistently throughout the report. A single draw exists in the Commander dataset (Aesi Landfall matchup); it is included in the games denominator and counted neither as a win nor a loss.
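The definition above can be sketched in a few lines of Python. This is an illustrative sketch, not the platform's code; the function name `win_rate` is an assumption, and the inputs are the cross-format totals from the executive summary.

```python
def win_rate(wins: int, losses: int, draws: int = 0) -> float:
    """Return wins / total games, with draws counted in the denominator."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("win rate is undefined for zero games")
    return wins / games


# Cross-format totals from the executive summary: 2,203 W, 2,710 L, 1 D.
overall = win_rate(wins=2203, losses=2710, draws=1)
print(f"{overall:.1%}")  # 44.8%
```

Because the draw stays in the denominator, it lowers the rate slightly instead of being dropped from the calculation.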

Cohort definition

The primary cohort is real, human-submitted decks only. Automated Crucible reference decks (user_id = __grinder__) and system sample decks (is_sample = true) are excluded from every figure in this report. When this report mentions "decks" or "players," it means this cohort exclusively.

Format split

Win rates and all outcome statistics are split by format (Commander vs. Standard) and never pooled. Format is determined by joining simulation records to the deck metadata. The overall figures shown in the executive summary and scope table are cross-format aggregates provided for orientation only; all analytical sections use format-separated data.

Cohort size threshold

Every reported breakdown (per matchup, per color, per construction band, per card, per month, per functional category) must contain ≥ 10 distinct decks. Cohorts below this floor are suppressed or folded into broader groupings and are not surfaced in the report. This applies to win rates, card performance rankings, functional-category comparisons, and construction correlations alike.
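A minimal sketch of how such a suppression floor might be applied, assuming a helper named `reportable_win_rate` (hypothetical, not part of any published Grim.Cards API). The first call uses the Meren of Clan Nel Toth figures reported in Section 3; the second uses a placeholder win count for a cohort that is too small.

```python
from typing import Optional

MIN_DECKS = 10  # minimum distinct decks per reported cohort


def reportable_win_rate(decks: int, wins: int, games: int) -> Optional[float]:
    """Return the cohort win rate, or None when the cohort is suppressed."""
    if decks < MIN_DECKS or games == 0:
        return None
    return wins / games


# 11 decks, 67 wins in 165 games: clears the floor and is reported.
meren = reportable_win_rate(decks=11, wins=67, games=165)
# 6 decks: suppressed regardless of the (placeholder) win count.
small = reportable_win_rate(decks=6, wins=60, games=150)
print(meren, small)
```

Returning `None` rather than a number keeps a suppressed cohort from being averaged or ranked by accident downstream.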

Card-level metrics: two distinct measures

This report uses two separate card-level signals that must not be conflated:

Decision impact (counterfactual proxy): For decisions recorded in the simulation, the engine computes the difference between the line actually taken and its own next-best alternative. A negative raw delta means the alternative would have scored better; in this report, improvement over the alternative is expressed as a positive value and worse-than-alternative as negative, per the project sign convention. This is a play-quality proxy from replayed decision points — not damage dealt, creatures killed, or a causal contribution to winning.
Board impact (measured proxy): Per-deck card-performance data pooled across every deck in the cohort. Each figure is the mean board-quality change around turns the card was observed, per observation. Positive means the board state improved; negative means it worsened. This is already expressed on a positive-is-good scale and requires no sign flip. Confidence scales with the number of distinct decks a card appears in and the number of recorded observations; only cards clearing both floors (≥ 10 decks, ≥ 25 observations) are ranked.
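The two-floor ranking rule for board impact can be sketched as follows. The `CardStat` type and function name are assumptions made for illustration; the first two rows reuse figures from the Section 11a table, and the remaining rows are hypothetical cards added only to show the filter.

```python
from dataclasses import dataclass

MIN_DECKS = 10         # floor on distinct decks
MIN_OBSERVATIONS = 25  # floor on recorded observations


@dataclass
class CardStat:
    name: str
    decks: int
    observations: int
    board_impact: float  # mean board-quality change per observation


def rank_by_board_impact(cards: list[CardStat]) -> list[CardStat]:
    """Keep cards clearing both floors, sorted from highest to lowest score."""
    eligible = [c for c in cards
                if c.decks >= MIN_DECKS and c.observations >= MIN_OBSERVATIONS]
    return sorted(eligible, key=lambda c: c.board_impact, reverse=True)


cards = [
    CardStat("Lathliss, Dragon Queen", 12, 50, 9.10),
    CardStat("Terror of the Peaks", 10, 45, 19.04),
    CardStat("Hypothetical Card A", 9, 200, 30.0),  # too few decks
    CardStat("Hypothetical Card B", 12, 24, 30.0),  # too few observations
]
ranked = rank_by_board_impact(cards)
print([c.name for c in ranked])
```

Both hypothetical cards carry a higher score than anything in the real table, yet neither is ranked, which is the point of requiring both floors.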

Functional category classification (heuristic)

Functional categories — tutor/search effects, sacrifice outlets, discard effects, reanimation effects — are assigned by a keyword heuristic applied to oracle text and pre-tagged flags. This classification may mislabel edge cases. All category-based win-rate comparisons are labeled [HEURISTIC] throughout.
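To illustrate why a keyword heuristic can mislabel edge cases, here is a deliberately naive sketch. The keyword lists are hypothetical; the report does not publish its actual rules, and whether the real heuristic handles this case is not stated.

```python
# Hypothetical keyword lists, invented for this sketch only.
CATEGORY_KEYWORDS = {
    "tutor": ["search your library"],
    "sacrifice": ["sacrifice"],
    "discard": ["discard"],
    "reanimation": ["from your graveyard to the battlefield"],
}


def classify(oracle_text: str) -> set[str]:
    """Return every category whose keyword appears in the oracle text."""
    text = oracle_text.lower()
    return {category
            for category, keywords in CATEGORY_KEYWORDS.items()
            if any(keyword in text for keyword in keywords)}


# Evolving Wilds is a land, yet plain substring matching tags it twice.
evolving_wilds = ("{T}, Sacrifice Evolving Wilds: Search your library for a "
                  "basic land card, put it onto the battlefield tapped, "
                  "then shuffle.")
print(sorted(classify(evolving_wilds)))  # ['sacrifice', 'tutor']
```

A land that fetches a basic is neither a sacrifice outlet nor a tutor in the usual deck-building sense, so substring rules of this kind need exclusions or manual tags to avoid inflating category membership.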

Privacy

No personally identifiable information, raw user IDs, deck names, decklists, or user-level timestamps appear anywhere in this report. All figures are aggregated. The Crucible automated reference corpus does not appear in any statistic presented here.

Data window

All figures cover simulations with completed records from 14 May 2026 through 2 July 2026.

Limitations

The following limitations apply to every finding in this report:

Simulated, not human, play. Results describe engine behavior on these decklists under AI piloting. Human play patterns, sideboarding, and in-game adaptation are not modeled.
Self-selected, non-random sample. Decks are those that users chose to submit to Grim.Cards. The population is not a random sample of any broader player population.
Correlational throughout. Construction breakdowns, category comparisons, and card-level analyses describe associations within this dataset. No causal claims are made or implied.
Fixed gauntlet opponents. Win rates are measured against a fixed set of meta opponents. Results reflect performance against this specific gauntlet, not against an open or evolving field.
Decision-impact figures are proxy measures. Counterfactual decision-impact scores reflect play-quality signals from replayed alternative lines — not damage, kills, or direct win contributions.
Board-impact figures are proxy measures. Card board-impact scores reflect pooled board-state changes around turns a card was observed, not causal contributions to game outcomes.
Heuristic category labels. Functional category membership (tutor, sacrifice, discard, reanimation) is assigned algorithmically and may mislabel edge cases.
Small Standard cohort. With only 28 Standard decks and 499 games in this snapshot, Standard figures carry wider uncertainty than Commander figures. Several Standard card and color cohorts fall below the minimum threshold and are suppressed.
Partial July 2026 data. The July 2026 monthly figures cover only 2 days and should be read as early-window observations, not a settled monthly figure.

1. Dataset Overview

Scope: 14 May 2026 – 2 July 2026. Cohort: player-submitted decks only.

Metric	All Formats	Commander	Standard
Distinct users	109	90	20
Decks	229	201	28
Completed simulations	343	309	34
Total games	4,914	4,415	499
Wins	2,203	2,017	186
Losses	2,710	2,397	313
Draws	1	1	0
Win rate	44.8%	45.7%	37.3%

Commander is the dominant format in this dataset by every measure: it represents 87.8% of decks (201 of 229), 89.8% of simulated games (4,415 of 4,914), and 91.6% of wins (2,017 of 2,203). Standard contributes 28 decks across 20 users, a sample large enough to report format-level and matchup-level figures but too small to surface most card-level breakdowns. The overall 44.8% cross-format win rate is provided for orientation; all analytical findings use format-separated data.

The dataset's rapid growth over the observation window — from its first recorded simulation on 14 May 2026 to the snapshot on 2 July 2026 — means the population of submitted decks is still accumulating. Results will shift as the sample grows; future editions will make comparisons against this snapshot's baseline.

2. The Gauntlet: Matchup Results

Win rate = wins ÷ games. Draws counted in the denominator. Every cohort: n = 201 Commander decks / n = 28 Standard decks. All cohorts clear the minimum threshold.

2a. Commander Matchups

The Commander gauntlet comprises five opponents. Every player-submitted Commander deck is tested against all five, so each opponent row reflects the full n = 201-deck cohort and the same 867 games per matchup.

Gauntlet Opponent	Wins	Losses	Draws	Games	Win Rate
Breya Artifact Combo	474	393	0	867	54.7%
Derevi Bant Control	461	406	0	867	53.2%
Aesi Landfall	410	456	1	867	47.3%
Edgar Markov Vampires	313	554	0	867	36.1%
Atraxa Superfriends	312	555	0	867	36.0%

The range across Commander matchups is 18.7 percentage points, from 54.7% against Breya Artifact Combo to 36.0% against Atraxa Superfriends. Because this spread is measured across an identical challenger pool (the same 201 decks), it points to the matchup itself as the primary source of variance. Breya Artifact Combo and Derevi Bant Control are the only two opponents against which player decks collectively exceed 50%; in the remaining three matchups the gauntlet opponent wins more often than the challengers. The two toughest opponents, Atraxa Superfriends and Edgar Markov Vampires, sit within 0.1 points of each other at 36.0% and 36.1% respectively, essentially identical in aggregate difficulty for this cohort.

The single draw in the dataset occurs in the Aesi Landfall matchup. Under the win-rate definition used (wins ÷ total games, draw in denominator), this draw is neither a win nor a loss and reduces the effective win rate by less than 0.1 percentage points relative to a fully decisive field.
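The size of the draw's effect can be checked directly from the Aesi Landfall row. This sketch assumes that "fully decisive" means dropping the single draw from the denominator; that reading is an interpretation, not something the report spells out.

```python
wins, losses, draws = 410, 456, 1  # Aesi Landfall row

with_draw = wins / (wins + losses + draws)  # draw kept in the denominator
decisive_only = wins / (wins + losses)      # draw dropped from the denominator
difference_pp = (decisive_only - with_draw) * 100

print(f"{with_draw:.2%} vs {decisive_only:.2%}; gap {difference_pp:.3f} pp")
```

The gap is a small fraction of a percentage point, which is why the treatment of the draw does not change any rounded figure in the matchup table.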

2b. Standard Matchups

The Standard gauntlet comprises five opponents. Every player-submitted Standard deck is tested against all five; each row reflects the full n = 28-deck cohort across 99 games per matchup.

Gauntlet Opponent	Wins	Losses	Games	Win Rate
Temur Harmonizer Combo	58	41	99	58.6%
Jeskai Control	44	55	99	44.4%
Dimir Midrange	36	63	99	36.4%
Azorius Tempo	35	64	99	35.4%
Mono Red Aggro	13	86	99	13.1%

The Standard matchup spread is dramatic: 45.5 percentage points separate Temur Harmonizer Combo (58.6%) from Mono Red Aggro (13.1%). That spread is nearly 2.5 times wider than the Commander spread, suggesting the Standard gauntlet opponents are more differentiated in difficulty — or that the submitted Standard deck population is particularly ill-suited to fast aggressive matchups. Either reading is plausible and neither can be confirmed without additional data.
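The spread comparison can be reproduced from the wins and games columns of the two matchup tables. The helper name `spread_pp` is an assumption; the data are copied from Sections 2a and 2b.

```python
def spread_pp(rows: dict[str, tuple[int, int]]) -> float:
    """Best-minus-worst matchup win rate, in percentage points."""
    rates = [wins / games for wins, games in rows.values()]
    return (max(rates) - min(rates)) * 100


# (wins, games) per gauntlet opponent.
commander = {
    "Breya Artifact Combo": (474, 867),
    "Derevi Bant Control": (461, 867),
    "Aesi Landfall": (410, 867),
    "Edgar Markov Vampires": (313, 867),
    "Atraxa Superfriends": (312, 867),
}
standard = {
    "Temur Harmonizer Combo": (58, 99),
    "Jeskai Control": (44, 99),
    "Dimir Midrange": (36, 99),
    "Azorius Tempo": (35, 99),
    "Mono Red Aggro": (13, 99),
}

c_spread = spread_pp(commander)
s_spread = spread_pp(standard)
print(f"Commander {c_spread:.1f} pp, Standard {s_spread:.1f} pp, "
      f"ratio {s_spread / c_spread:.2f}")
```

Computing the spread from raw counts rather than from rounded percentages avoids compounding rounding error when the two formats are compared.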

Mono Red Aggro is a severe outlier: player decks won only 13 of 99 games (13.1%), so the gauntlet's aggro deck took the other 86. That result would be consistent with the aggro deck closing out games before player decks could mount a response, though the matchup table alone cannot confirm it. Temur Harmonizer Combo is the opposite case: the only Standard matchup where players exceeded 50%, winning 58 of 99 games. The Standard sample is only 28 decks, far smaller than the 201-deck Commander cohort; these matchup figures should be interpreted with that limited sample in mind.
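One way to make the sample-size caution concrete is a Wilson score interval on a matchup win rate. This is a swapped-in textbook technique, not a method the report uses, and it treats games as independent trials, which is an assumption: games played by the same deck are likely correlated, so real uncertainty is probably wider.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return centre - half, centre + half


std_lo, std_hi = wilson_interval(13, 99)    # Standard vs. Mono Red Aggro
cmd_lo, cmd_hi = wilson_interval(312, 867)  # Commander vs. Atraxa Superfriends

print(f"Standard:  {std_lo:.1%} to {std_hi:.1%}")
print(f"Commander: {cmd_lo:.1%} to {cmd_hi:.1%}")
```

With 99 games per Standard matchup against 867 per Commander matchup, the Standard interval comes out noticeably wider.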

3. Top Commanders

Cohort: Commander format, player-submitted decks only. Win rate reported only where n ≥ 10 distinct decks. Commanders with fewer than 10 decks are listed for usage context but win rates are suppressed (marked "—" ) per the minimum-cohort rule.

Among all commanders represented in the dataset, the commander with the most submitted decks that also clears the win-rate reporting threshold is Meren of Clan Nel Toth, appearing in 11 distinct decks. Those decks recorded 67 wins across 165 games for a 40.6% win rate (n = 11 decks, 165 games).

No other single commander clears the 10-deck threshold in this snapshot. The next most-represented commanders are Ureni of the Unwritten (6 decks, 150 games, win rate suppressed), The Ur-Dragon (5 decks, 76 games, suppressed), and Zimone, Infinite Analyst (4 decks, 105 games, suppressed). The dataset contains at least 20 distinct commanders with 2 or more decks submitted, indicating wide diversity of commander choice rather than concentration around a small set of popular options.

Usage note: Commander popularity (deck count) and commander win rate are different questions. In this snapshot, the most-represented commander that can be measured — Meren of Clan Nel Toth — posted a win rate (40.6%) below the format average (45.7%). Whether that gap is attributable to the commander, the specific decks submitted, the matchup composition, or random variance in a cohort of 11 decks cannot be determined from this data. The remaining commanders' win rates are suppressed precisely because their cohorts are too small to report reliably.

As the dataset grows, future editions will be able to surface win-rate comparisons across more commanders. For now, the commander landscape in this dataset is better described as wide and varied than as concentrated or measurable in comparative terms.

4. Top Cards by Usage

Cohort: player-submitted decks by format. The win rate shown is the containing-deck win rate (the win rate of all decks in the cohort that include the card), reported only where n ≥ 10 distinct decks. Basic lands are excluded. Containing-deck win rate is a property of the decks that run the card, not a causal claim about the card's individual contribution.
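A sketch of a containing-deck win rate on made-up data. It assumes the rate is pooled (total wins over total games across the containing decks); the report does not say whether it pools games or averages per-deck rates. The deck records and function name are hypothetical.

```python
from typing import Optional

MIN_DECKS = 10  # minimum distinct decks containing the card


def containing_deck_win_rate(decks: list[dict], card: str) -> Optional[float]:
    """Pooled win rate of decks that include `card`; None if under the floor."""
    matching = [d for d in decks if card in d["cards"]]
    if len(matching) < MIN_DECKS:
        return None
    wins = sum(d["wins"] for d in matching)
    games = sum(d["games"] for d in matching)
    return wins / games


# Hypothetical cohort: 12 decks run Sol Ring, 3 of them also run Cultivate.
decks = [{"cards": {"Sol Ring"}, "wins": 7, "games": 15} for _ in range(9)]
decks += [{"cards": {"Sol Ring", "Cultivate"}, "wins": 9, "games": 15}
          for _ in range(3)]

sol_ring = containing_deck_win_rate(decks, "Sol Ring")
cultivate = containing_deck_win_rate(decks, "Cultivate")
print(sol_ring, cultivate)
```

Every game a deck plays counts toward each card it contains, whether or not the card was drawn, which is why the figure describes the decks and not the card.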

4a. Commander — Most-Played Cards (by deck count)

Card	Decks (n)	Containing-Deck Win Rate
Sol Ring	180	46.0%
Command Tower	154	45.1%
Arcane Signet	136	47.6%
Exotic Orchard	80	43.9%
Reliquary Tower	66	41.9%
Path of Ancestry	56	50.1%
Lightning Greaves	54	44.5%
Evolving Wilds	50	49.3%
Swiftfoot Boots	44	48.0%
Swords to Plowshares	43	45.9%
Bojuka Bog	41	40.5%
Cultivate	41	50.8%
Fellwar Stone	41	45.4%
Demonic Tutor	35	41.3%
Birds of Paradise	35	44.2%
Kodama's Reach	34	54.9%
Mind Stone	34	40.1%
Ashnod's Altar	31	37.5%
Rampant Growth	31	46.0%
Thought Vessel	31	46.7%

Sol Ring is the single most ubiquitous card in the Commander dataset, appearing in 180 of 201 decks (89.6%). The next two most common cards — Command Tower (154 decks) and Arcane Signet (136 decks) — are mana-fixing staples that follow the same ubiquity pattern. The top five by deck count are all mana-related, reflecting broad consensus among submitting players on foundational Commander infrastructure.

Among the twenty most-played cards, all of which clear the 10-deck floor, the highest containing-deck win rate belongs to Kodama's Reach (54.9%; n = 34 decks), followed by Cultivate (50.8%; n = 41 decks) and Path of Ancestry (50.1%; n = 56 decks). At the lower end, Ashnod's Altar (37.5%; n = 31 decks) and Mind Stone (40.1%; n = 34 decks) have the lowest containing-deck win rates. All of these are descriptive correlations: a card appearing in winning decks does not mean the card caused those wins.

4b. Standard — Most-Played Cards

The Standard card cohort is severely constrained by the 28-deck sample size. No Standard card clears the 10-deck minimum threshold for win-rate reporting in this snapshot. The most common cards in Standard — Inspiring Vantage (7 decks), Lightning Bolt (7 decks), and a cluster of cards each in 6 decks — all fall below the minimum. Standard card-level win rates are therefore fully suppressed in this edition. As the Standard deck count grows in future editions, this section will expand.

5. Win-Rate Distribution

Cohort: player-submitted decks by format. Each deck's win rate is its individual wins ÷ games across all gauntlet matchups. Distribution buckets are 10-percentage-point ranges.
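The bucketing step can be sketched as below. Boundary handling is an assumption: the report does not state which bucket a rate of exactly 40% or 80% falls into. This sketch places boundary values in the lower bucket, which is at least consistent with the Standard maximum of 80% appearing alongside an empty 80–90% bucket. The deck records are hypothetical.

```python
from collections import Counter

BUCKET_LABELS = [f"{lo}–{lo + 10}%" for lo in range(0, 100, 10)]


def bucket_index(wins: int, games: int) -> int:
    """Index of the 10-point bucket, using integer math to avoid float error.

    Assumption: a rate exactly on a boundary falls in the lower bucket
    (so 40% counts as 30–40%), and 0% belongs to the first bucket.
    """
    ceil_tenths = -(-10 * wins // games)  # ceil(10 * wins / games)
    return max(0, ceil_tenths - 1)


# Hypothetical (wins, games) records, one per deck.
decks = [(0, 15), (1, 15), (6, 15), (7, 15), (12, 15), (13, 15)]
histogram = Counter(BUCKET_LABELS[bucket_index(w, g)] for w, g in decks)

for label in BUCKET_LABELS:
    print(label, histogram[label])
```

Working from integer wins and games sidesteps cases where a floating-point rate such as 6/15 lands a hair above 40% and slips into the wrong bucket.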

5a. Commander Win-Rate Distribution (n = 201 decks)

Win-Rate Range	Decks
0–10%	8
10–20%	15
20–30%	23
30–40%	51
40–50%	30
50–60%	31
60–70%	18
70–80%	19
80–90%	6
90–100%	0

Summary statistics: Mean 45.2%, Median 46.7%, Min 6.7%, Max 86.7% (n = 201 decks).

The Commander distribution has a notable shape: the 30–40% bucket is the single largest (51 decks), creating a mode that sits below both the median and the mean. The right tail extends to 86.7%, yet the mean (45.2%) sits slightly below the median (46.7%): the larger cluster of below-average performers pulls the average down a little more than the relatively small number of high-performing decks pulls it up. The 30–40% bucket's size (25.4% of all Commander decks) suggests that a meaningful fraction of submitted decks struggle against this specific gauntlet. No deck in the Commander cohort achieved a 90%+ win rate.

The spread from 6.7% to 86.7% — an 80-point range — confirms that the submitted Commander deck population is extremely heterogeneous. This is expected for a format with essentially unlimited construction space: a player submitting a casual tribal deck and a player submitting a highly optimized combo deck both appear in the same cohort. The distribution should not be read as a grading curve; it is a description of the self-selected decks that users chose to test in this window.

5b. Standard Win-Rate Distribution (n = 28 decks)

Win-Rate Range	Decks
0–10%	3
10–20%	3
20–30%	4
30–40%	9
40–50%	2
50–60%	4
60–70%	1
70–80%	2
80–90%	0
90–100%	0

Summary statistics: Mean 36.4%, Median 33.3%, Min 0%, Max 80% (n = 28 decks).

The Standard distribution, with only 28 decks, is too small for confident distributional claims, but the observable pattern differs from Commander. The mean (36.4%) exceeds the median (33.3%), indicating a modest right-pull from a handful of high-performing decks on an otherwise left-heavy distribution. Three decks achieved 0% win rates; two reached 70–80%. No Standard deck in this snapshot exceeded 80%. The 30–40% bucket is again the mode (9 decks), mirroring Commander's modal bucket despite the formats' different gauntlets.

Note that individual bucket cohorts within Standard are small (most contain fewer than 5 decks), so per-bucket figures are reported for distributional description only and carry no analytical weight at the bucket level.

6. Deck Iteration: Retest Win-Rate Changes

Cohort: Commander format only. A "retest" is defined as a deck with more than one completed simulation on record. Win-rate change = latest completed simulation win rate minus first completed simulation win rate. Standard retest data is not reported: the Standard retest cohort does not clear the minimum threshold in this snapshot.

Metric	Value	n
Commander decks retested	57	—
Average win-rate change (first → latest)	+2.7 pp	57 decks
Improved (positive delta)	26 decks	57 decks
Declined (negative delta)	25 decks	57 decks
Roughly flat	6 decks	57 decks

Among the 57 Commander decks with more than one completed simulation, the average win-rate change from first to most recent test is +2.7 percentage points. The distribution of outcomes is nearly even: 26 decks improved, 25 declined, and 6 were flat. The slight positive average arises because the total gains among improving decks exceed the total losses among declining decks; it does not reflect a difference in counts, which are effectively tied.
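The retest calculation can be sketched on hypothetical data. The one-point tolerance used to label a deck "flat" is an assumption; the report does not give its cutoff. Deck identifiers and win rates are invented for illustration.

```python
FLAT_TOLERANCE_PP = 1.0  # assumed cutoff for "roughly flat"; not from the report


def retest_delta_pp(sim_win_rates: list[float]) -> float:
    """Latest minus first completed-simulation win rate, in percentage points."""
    if len(sim_win_rates) < 2:
        raise ValueError("a retest needs more than one completed simulation")
    return (sim_win_rates[-1] - sim_win_rates[0]) * 100


def classify_delta(delta_pp: float) -> str:
    """Label a delta as improved, declined, or flat."""
    if abs(delta_pp) <= FLAT_TOLERANCE_PP:
        return "flat"
    return "improved" if delta_pp > 0 else "declined"


# Hypothetical decks: win rate of each completed simulation, oldest first.
history = {
    "deck_a": [0.40, 0.60],
    "deck_b": [0.53, 0.33, 0.47],  # intermediate simulations are ignored
    "deck_c": [0.47, 0.47],
}
deltas = {deck: retest_delta_pp(rates) for deck, rates in history.items()}
labels = {deck: classify_delta(delta) for deck, delta in deltas.items()}
mean_delta = sum(deltas.values()) / len(deltas)
print(labels, f"mean {mean_delta:+.2f} pp")
```

In this toy cohort one deck improves, one declines, and one is flat, yet the mean is positive because the gain is larger than the loss. That is the same pattern the 26 to 25 split shows in the real data.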

This near-symmetry is the most honest summary of the retest data: deck iteration in this cohort does not produce a reliable directional improvement signal in aggregate. Some decks improved substantially, some declined, and the averages are close to balanced. Whether a particular deck's second test reflects deck changes, random variance in the simulation, or some other factor cannot be determined from this data. The +2.7 pp average is presented as an observed figure, not a prediction or a guarantee of improvement from retesting.

The 57 retested decks represent 28.4% of the 201-deck Commander cohort, indicating that the majority of decks in this snapshot were tested only once. As the platform accumulates more iterations per deck, future retest analyses will have larger cohorts and longer iteration chains.

7. Monthly Trends

Cohort: player-submitted decks by format and month. July 2026 covers only 2 days (1–2 July); treat as a partial-window observation.

Month	Format	Simulations	Decks	Games	Win Rate
May 2026	Commander	87	55	1,231	43.1%
June 2026	Commander	196	127	2,820	45.9%
June 2026	Standard	26	21	379	36.9%
July 2026 (partial)	Commander	26	20	364	52.7%

Commander testing volume more than doubled from May (87 simulations, 55 decks) to June (196 simulations, 127 decks), reflecting rapid platform growth in the observation window. The average Commander win rate rose modestly from 43.1% in May to 45.9% in June — a 2.8-point increase. Standard data is available only for June 2026 (26 simulations, 21 decks, 36.9% win rate), as the Standard cohort in May fell below the minimum threshold.

The partial July 2026 figure (26 simulations, 20 decks, 2 days, 52.7% Commander win rate) is an early-window observation and almost certainly subject to significant revision as the month accumulates more tests. It is included for completeness but should not be interpreted as a trend.

An important interpretive caution: monthly win-rate figures compare different cohorts of decks in different months. A rising monthly average reflects that different decks were submitted in that month — it does not indicate that the same decks improved over time. The retest analysis in Section 6 is the correct lens for individual deck improvement; the monthly trend is a lens on submission patterns and cohort mix.

8. Construction Correlations

All figures are descriptive correlations only. Correlation is not causation. Every cohort clears the minimum threshold of n ≥ 10 decks unless noted otherwise.

8a. Color Count vs. Win Rate (Commander)

Colors in Deck	Decks (n)	Win Rate
1 (Mono-color)	29	48.7%
2 (Two-color)	65	41.9%
3 (Three-color)	78	45.8%
5 (Five-color)	23	48.0%

Four-color Commander decks (if any exist in the cohort) fell below the minimum threshold and are suppressed. Among the four reported bands, mono-color decks post the highest observed win rate (48.7%; n = 29) and two-color decks the lowest (41.9%; n = 65). Three- and five-color decks sit in the middle. These are correlational observations; color count is entangled with commander choice, deck strategy, and construction philosophy in ways this data cannot separate.

8b. Color Count vs. Win Rate (Standard)

Only one Standard color-count band clears the minimum threshold in this snapshot:

Colors in Deck	Decks (n)	Win Rate
2 (Two-color)	17	37.9%

Single-color, three-color, and other-count Standard decks fell below the threshold. The two-color figure (37.9%; n = 17) is close to the overall Standard average (37.3%), providing no strong color-count signal within the available Standard data.

8c. Land Ratio vs. Win Rate (Commander)

Land % of Deck	Decks (n)	Win Rate
~30% (≤32%)	25	45.0%
~35% (33–37%)	121	43.3%
~40% (≥38%)	49	50.6%

The largest Commander land-ratio band is the middle tier (~35%; n = 121 decks), which also posts the lowest win rate of the three qualifying bands (43.3%). The ~40% land band posts the highest win rate (50.6%; n = 49 decks). This is a correlation; it may reflect that decks prioritizing consistent mana access are also better constructed in other dimensions, or it may reflect a specific subset of archetypes that both run more lands and happen to match well against this gauntlet.

8d. Land Ratio vs. Win Rate (Standard)

Land % of Deck	Decks (n)	Win Rate
~40% (≥38%)	15	44.4%
Other bands	—	Suppressed

Only the ~40% land band clears the Standard minimum threshold. The figure (44.4%; n = 15 decks) exceeds the overall Standard average (37.3%) by 7.1 percentage points. Other Standard land-ratio bands are suppressed.

8e. Color Identity vs. Win Rate

Commander (n ≥ 10 for all; colors are not mutually exclusive — decks with multiple colors count toward each):

Color	Decks (n)	Games	Win Rate
Red	97	2,251	48.6%
Green	111	2,365	46.2%
Blue	98	2,252	45.2%
Black	116	2,353	43.9%
White	102	2,172	43.9%

Red-containing Commander decks post the highest observed win rate in the cohort (48.6%; n = 97), and Black- and White-containing decks tie for the lowest (43.9%; n = 116 and n = 102 respectively). The spread across all five colors is 4.7 percentage points. Because colors overlap heavily within multicolor decks, these are not independent measurements — a five-color deck contributes to all five rows simultaneously.
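The non-exclusive counting rule can be sketched on three hypothetical decks. Each deck contributes its wins and games to every color in its identity, so the per-color deck counts sum to more than the number of decks (in the table above, the five n values sum to 524 against 201 decks).

```python
from collections import defaultdict

# Hypothetical decks: (color identity, wins, games).
decks = [
    ("WUBRG", 9, 15),  # five-color deck: counted in all five rows
    ("RG", 6, 15),
    ("B", 3, 15),
]

totals = defaultdict(lambda: [0, 0, 0])  # color -> [decks, wins, games]
for colors, wins, games in decks:
    for color in colors:
        row = totals[color]
        row[0] += 1
        row[1] += wins
        row[2] += games

for color in "WUBRG":
    n, wins, games = totals[color]
    print(color, n, games, f"{wins / games:.1%}")

rows_total = sum(row[0] for row in totals.values())
print(f"{len(decks)} decks produce {rows_total} color-row entries")
```

Because the same games appear in several rows, differences between colors cannot be tested as if the rows were independent samples.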

Standard (reporting only cohorts ≥ 10 decks):

Color	Decks (n)	Games	Win Rate
Red	14	225	40.0%
White	14	225	38.2%
Blue	10	199	31.2%
Black	10	180	30.0%
Green	—	—	Suppressed (n < 10)

Standard Red-containing decks lead at 40.0% (n = 14); Black-containing decks trail at 30.0% (n = 10). Green-containing Standard decks fall below the minimum threshold (n = 7) and are suppressed. The Standard cohort is small enough that these color figures carry meaningful uncertainty.

8f. Win-Rate Bracket Construction Comparison (Commander)

Decks are grouped into three win-rate brackets; construction characteristics are averaged within each bracket.

Bracket	Decks (n)	Avg Win Rate	Avg Land %	Avg Creature %	Avg Spell %	Avg Art/Ench %	Avg Mana Value
High (>55%)	55	70.0%	35.8%	29.6%	16.5%	17.5%	3.49
Middle (40–55%)	75	46.2%	36.0%	29.2%	16.6%	17.5%	3.20
Low (<40%)	71	24.8%	35.2%	28.1%	19.1%	16.9%	3.08

The high win-rate bracket (n = 55 decks, average 70.0%) shows 0.6 percentage points more lands than the low bracket and an average mana value 0.41 higher (3.49 vs. 3.08). Spell percentage runs in the opposite direction: low-win-rate decks average 19.1% spells vs. 16.5% in the high bracket. Creature and artifact/enchantment percentages are similar across brackets. These are descriptive correlations across a self-selected deck sample; the construction characteristics co-vary with many other unmeasured factors including commander choice, archetype, and player intent.
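The bracket assignment and the two gaps quoted above can be checked with a short sketch. The function name is an assumption, and placing rates of exactly 40% and 55% in the Middle bracket follows the table labels (High > 55%, Low < 40%).

```python
def bracket(win_rate_pct: float) -> str:
    """Assign a deck to the High / Middle / Low bracket used in the table."""
    if win_rate_pct > 55:
        return "High"
    if win_rate_pct >= 40:
        return "Middle"
    return "Low"


# Bracket averages copied from the table: (avg land %, avg mana value).
high = (35.8, 3.49)
low = (35.2, 3.08)

land_gap = round(high[0] - low[0], 1)
mana_value_gap = round(high[1] - low[1], 2)
print(bracket(86.7), bracket(46.7), bracket(6.7), land_gap, mana_value_gap)
```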

The Standard bracket data is available only for the low-win-rate bracket (n = 17 decks, average win rate 24.1%), as the middle and high brackets do not clear the minimum threshold within the 28-deck Standard cohort. No cross-bracket comparison is possible for Standard in this snapshot.

8g. Deck Type Mix (Commander, n = 201 decks)

Card Type	Average % of Deck
Land	35.7%
Creature	28.9%
Instant + Sorcery (combined)	17.5%
Artifact	9.9%
Enchantment	7.4%
Planeswalker	0.7%

For the 137 Commander decks where individual instant and sorcery splits are available:

Split Type	Average % of Deck	Sub-cohort
Instant	9.4%	n = 137 decks
Sorcery	7.6%	n = 137 decks

Note: Instant and sorcery split figures are derived from the sub-cohort of 137 decks where the type-split data is available. They need not sum exactly to the combined instant+sorcery figure (17.5%) because unclassifiable spells exist only in the combined figure, and the sub-cohort differs from the full 201-deck population.

Standard (n = 28 decks): Land 37.3%, Creature 28.2%, Instant+Sorcery 23.9% (Instant 15.0%, Sorcery 9.6%; n = 23 decks with splits), Artifact 5.4%, Enchantment 4.9%, Planeswalker 0.5%.

Standard decks in this cohort carry noticeably more instant and sorcery cards (23.9% combined vs. 17.5% in Commander) and fewer artifacts and enchantments (10.3% combined vs. 17.3% in Commander), reflecting the formats' different card-pool and construction norms.

9. Card-Category Insights (Heuristic)

All findings in this section use functional categories assigned by a keyword heuristic. Correlation is not causation. Only category splits where both "with" and "without" cohorts clear n ≥ 10 decks are reported with a delta.

9a. Commander Category Analysis (n = 201 decks)

Category	With Decks (n)	With Win Rate	Without Decks (n)	Without Win Rate	Delta
Sacrifice outlets	196	45.2%	5	— (suppressed)	—
Search / tutor effects	190	45.1%	11	46.1%	−1.0 pp
Discard effects	174	44.7%	27	47.9%	−3.2 pp
Reanimation effects	125	42.8%	76	49.0%	−6.2 pp

Sacrifice outlets are present in 196 of 201 Commander decks — effectively universal in this cohort. The "without" group (5 decks) is suppressed, so no contrast is possible.

The most striking gap is reanimation: Commander decks with reanimation effects (n = 125) won at 42.8%, while those without (n = 76) won at 49.0% — a 6.2-point difference. Tutor effects show the smallest gap (−1.0 pp; n = 190 with, n = 11 without), barely distinguishable from noise at these cohort sizes. Discard effects sit in the middle (−3.2 pp; n = 174 with, n = 27 without).

Interpretive caution [HEURISTIC]: These gaps do not mean that including reanimation effects causes lower win rates. Decks running reanimation effects may differ systematically from those without in strategy, commander choice, total mana investment, or many other dimensions. The keyword heuristic may also mislabel some cards in edge cases. These are descriptive associations only.
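The with/without comparison, including its suppression rule, can be sketched from the Commander table. The function name is an assumption; a delta is produced only when both cohorts hold at least 10 decks.

```python
from typing import Optional

MIN_DECKS = 10  # minimum distinct decks on each side of the split


def category_delta(with_n: int, with_rate: Optional[float],
                   without_n: int, without_rate: Optional[float]) -> Optional[float]:
    """With-minus-without win rate in points, or None if either side is suppressed."""
    if with_n < MIN_DECKS or without_n < MIN_DECKS:
        return None
    if with_rate is None or without_rate is None:
        return None
    return round(with_rate - without_rate, 1)


# (with decks, with win rate %, without decks, without win rate %)
commander = {
    "Sacrifice outlets": (196, 45.2, 5, None),
    "Search / tutor effects": (190, 45.1, 11, 46.1),
    "Discard effects": (174, 44.7, 27, 47.9),
    "Reanimation effects": (125, 42.8, 76, 49.0),
}
deltas = {name: category_delta(*row) for name, row in commander.items()}
print(deltas)
```

The tutor comparison clears the floor by a single deck (n = 11 without), so its delta is reportable under the rule but rests on a very small contrast group.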

9b. Standard Category Analysis (n = 28 decks)

Category	With Decks (n)	With Win Rate	Without Decks (n)	Without Win Rate	Delta
Sacrifice outlets	26	35.9%	2	— (suppressed)	—
Discard effects	18	33.0%	10	42.7%	−9.7 pp
Search / tutor effects	10	27.7%	18	41.3%	−13.6 pp
Reanimation effects	—	— (suppressed)	20	41.0%	—

In Standard, the tutor/search gap is the largest at −13.6 percentage points (27.7% with, n = 10; 41.3% without, n = 18). Reanimation's "with" cohort (8 decks) falls below the minimum threshold and is suppressed. These Standard figures are based on a 28-deck total cohort and should be interpreted with correspondingly limited confidence.

[HEURISTIC] Category membership assigned by keyword heuristic; Standard sample small; no causal inference warranted.

10. Most Impactful Cards (Decision Impact) (Counterfactual Proxy)

This section reports counterfactual decision-impact scores: for recorded decisions in the simulation, the difference between the line taken and the engine's own next-best alternative. In this report: positive value = line taken beat the alternative; negative value = the alternative would have scored better. This is a play-quality proxy from replayed decision points — not damage dealt, creatures killed, or a causal win contribution. Only cards clearing n ≥ 10 distinct decks and ≥ 10 recorded observations are reported.

Commander Decision Impact

Card	Decks (n)	Observations	Decision Impact
Lightning Greaves	10	13	−235 points

Only one Commander card clears both the minimum-deck and minimum-observation thresholds in this snapshot. Lightning Greaves (n = 10 decks, 13 observations) records a decision-impact score of −235 counterfactual points. This means that across the 13 recorded decisions involving Lightning Greaves, the engine's own next-best alternative would, on average, have scored 235 points better than the line actually taken. A negative decision-impact score indicates the card was involved in decisions where the alternative line would have been superior by the engine's own evaluation — not that the card is bad, not that it caused losses, and not that a human player would make the same decisions.

The observation count (13) is modest, so the figure may shift materially as more decks running Lightning Greaves accumulate simulations. No Standard card clears the minimum thresholds for decision-impact reporting in this snapshot.
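As a reading aid, the scoring rule described in this section can be sketched as below. The `Decision` record, its field names, and the sample scores are all hypothetical: the report does not publish the engine's evaluation function or any per-decision values, and the averaging step simply follows the "on average" wording above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Decision:
    deck_id: str            # hypothetical deck identifier
    taken_score: float      # engine's evaluation of the line actually taken
    next_best_score: float  # engine's evaluation of its next-best alternative


def decision_impact(decisions: list[Decision],
                    min_decks: int = 10,
                    min_obs: int = 10) -> Optional[float]:
    """Mean of (taken - next best); None when below the reporting thresholds."""
    distinct_decks = {d.deck_id for d in decisions}
    if len(distinct_decks) < min_decks or len(decisions) < min_obs:
        return None
    return mean(d.taken_score - d.next_best_score for d in decisions)


# Invented sample: 12 decisions from 10 decks, each 100 points worse than the alternative.
sample = [Decision(f"deck-{i % 10}", 400.0, 500.0) for i in range(12)]
print(decision_impact(sample))      # negative: the alternative scored better
print(decision_impact(sample[:5]))  # None: too few decks and observations
```

A negative result here only says the engine rated its own alternative higher at those decision points, which matches the interpretation given for Lightning Greaves.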

11. Card Performance Roll-Up: Board Impact (Measured Proxy)

Board-impact scores measure the average change in board-state quality around turns when a card was observed, pooled across all decks in the cohort running that card. Positive = board improved; negative = board worsened. This is a board-state proxy, not a damage metric, a kill metric, or a causal win claim. Rankings are based on cards in ≥ 10 distinct decks with ≥ 25 recorded observations. In this snapshot, 57 Commander cards qualify. No Standard cards qualify (Standard cohort too small). Multi-color cards count toward each of their component colors; type buckets pool cards of very different roles.
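The qualification rule (≥ 10 distinct decks and ≥ 25 recorded observations) and the ranking step might look like the following sketch. The `CardStat` type and the function name are hypothetical. Three rows are copied from the 11a and 11b tables; the fourth is an invented below-threshold row included only to show the filter.

```python
from dataclasses import dataclass


@dataclass
class CardStat:
    name: str
    decks: int
    observations: int
    board_impact: float  # average board-quality change per appearance


def qualifying_cards(stats: list[CardStat],
                     min_decks: int = 10,
                     min_obs: int = 25) -> list[CardStat]:
    """Keep cards meeting both thresholds, sorted from best to worst."""
    kept = [s for s in stats if s.decks >= min_decks and s.observations >= min_obs]
    return sorted(kept, key=lambda s: s.board_impact, reverse=True)


stats = [
    CardStat("Sol Ring", 109, 558, -4.46),
    CardStat("Terror of the Peaks", 10, 45, 19.04),
    CardStat("Swords to Plowshares", 31, 143, 2.08),
    CardStat("Hypothetical Card", 9, 40, 12.0),  # invented: fails the deck threshold
]
for card in qualifying_cards(stats):
    print(f"{card.name}: {card.board_impact:+.2f}")
```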

11a. Top Performers — Commander (Board Impact)

Rank	Card	Decks (n)	Observations	Board Impact (per appearance)
1	Terror of the Peaks	10	45	+19.04
2	Lathliss, Dragon Queen	12	50	+9.10
3	Miirym, Sentinel Wyrm	10	93	+6.67
4	Torment of Hailfire	10	72	+4.01
5	Urza's Incubator	14	53	+3.79
6	Three Visits	14	53	+3.68
7	Solemn Simulacrum	13	94	+3.38
8	Birds of Paradise	17	114	+3.20
9	Dragon's Hoard	11	53	+3.02
10	Jeska's Will	12	39	+2.82
11	Reanimate	15	29	+2.79
12	Herald's Horn	12	33	+2.18
13	Chaos Warp	24	75	+2.15
14	Dragon Tempest	13	38	+2.11
15	Swords to Plowshares	31	143	+2.08

Terror of the Peaks leads the positive performers with a board-impact score of +19.04 per appearance (n = 10 decks, 45 observations). This is the largest positive score in the dataset by a substantial margin; the next closest is Lathliss, Dragon Queen at +9.10. Both are creatures whose abilities trigger when other creatures enter the battlefield under their controller's control (Terror of the Peaks deals damage; Lathliss creates Dragon tokens when nontoken Dragons enter), effects the board-quality metric captures in the turns they appear. The scores reflect board-state change, not a win count or a damage total.

The top-15 list is noticeably heavy with Dragon-tribal support cards (Lathliss, Dragon Queen; Miirym, Sentinel Wyrm; Dragon's Hoard; Dragon Tempest; Urza's Incubator — which is most commonly used in tribal strategies). This pattern reflects the Dragon-tribal representation in the submitted deck population rather than a universal finding about these cards across all contexts.

Swords to Plowshares (rank 15, +2.08; n = 31 decks, 143 observations) is the most broadly attested card in the positive-performer list by both deck count and observation count, lending its figure relatively more credibility than the narrower-cohort rankings above it.

11b. Bottom Performers — Commander (Board Impact)

Rank	Card	Decks (n)	Observations	Board Impact (per appearance)
1 (worst)	Smothering Tithe	15	154	−25.63
2	Dictate of Erebos	12	59	−9.83
3	Gray Merchant of Asphodel	10	34	−7.76
4	Ashnod's Altar	17	62	−7.45
5	Blood Artist	12	114	−5.85
6	Dragonstorm Globe	10	38	−5.55
7	Chromatic Lantern	13	49	−5.33
8	Garruk's Uprising	13	34	−5.18
9	Carrion Feeder	10	37	−5.14
10	Sol Ring	109	558	−4.46
11	Blasphemous Act	14	40	−4.45
12	Dark Ritual	13	32	−4.09
13	Frontier Siege	10	26	−4.08
14	Rhystic Study	15	55	−3.96
15	Eternal Witness	10	66	−3.92

Smothering Tithe records the most negative board-impact score in the dataset: −25.63 per appearance (n = 15 decks, 154 observations). The score is substantially worse than the second-worst card (Dictate of Erebos at −9.83), and the 154-observation count makes it one of the better-attested negative figures in the dataset. The board-quality metric measures the state of the board around turns the card appears — it does not measure whether the card "fails to pay off" in any causal sense, and human players may use Smothering Tithe in ways the engine does not optimally replicate.

Sol Ring (rank 10, −4.46; n = 109 decks, 558 observations) is the most widely attested card in the entire dataset and appears in the bottom-15 board-impact list. Its 558 observations dwarf every other card's observation count. A negative board-impact score for Sol Ring may reflect the metric's sensitivity to the specific turns where the card appears (early turns when board states are inherently low-value) rather than any failure of the card itself. The measurement methodology — board-state quality delta around turns of observation — may systematically undervalue cards that are most impactful in the early game when board states are sparse.

The bottom-15 list includes several sacrifice-synergy cards (Dictate of Erebos, Ashnod's Altar, Blood Artist, Carrion Feeder, Gray Merchant of Asphodel) that rely on specific game states to generate value — states the board-quality metric may not fully capture. This is a board-state proxy, not a verdict on card quality.

11c. Board Impact by Card Type (Commander)

Type	Cards	Observations	Decks with ≥1	Avg Board Impact
Instants	365	2,940	31	0.00
Sorceries	322	2,394	24	0.00
Artifacts	339	5,568	109	−2.56
Creatures	1,843	25,708	17	−3.05
Enchantments	385	3,433	15	−11.02

Instants and sorceries both average 0.00 board-impact points per appearance across their respective observation pools, the neutral point of the scale (no net change in board state). Enchantments average −11.02, the lowest type-aggregate figure, driven in part by Smothering Tithe (the dataset's most negative individual card) and other enchantments in the bottom performers. Artifacts average −2.56 and creatures −3.05.

Interpretive caution: These type-level averages pool cards with wildly different roles and activation patterns. A board-wipe sorcery and a ramp sorcery both count as sorceries. A mana-producing artifact and a sacrifice outlet both count as artifacts. The aggregate figures describe the pooled population of submitted decks' card choices by type, not some property of the card types themselves.

11d. Board Impact by Color Identity (Commander)

Color	Cards	Observations	Decks with ≥1	Avg Board Impact
Red	601	6,536	24	+5.92
Green	921	9,737	23	+3.82
Black	820	10,261	24	+0.67
Colorless	356	6,339	109	−0.45
Blue	627	8,392	15	−1.10
White	736	12,688	31	−11.69

Red cards average the highest pooled board impact (+5.92; 6,536 observations across 24+ decks) and White cards the lowest (−11.69; 12,688 observations across 31+ decks). The White figure is pulled down in part by Smothering Tithe (the worst individual performer), which appears in 15 decks with 154 observations, although those observations are only about 1.2% of the White pool, so other White cards likely account for most of the deficit. Multi-color cards count toward each of their component colors, so color rows are not mutually exclusive.

[MEASURED PROXY] Color-identity aggregates pool very different cards and strategies. These figures describe the board-impact profile of the cards that submitted decks in this cohort chose to run by color, not a universal property of the colors.
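To gauge how far one card can move a pooled average, the sketch below computes Smothering Tithe's contribution under the assumption that each aggregate is an observation-weighted mean. That pooling rule is an assumption: the report does not state how the type and color averages are weighted, so the results are an order-of-magnitude check rather than a decomposition of the published figures. The function name `contribution` is illustrative.

```python
def contribution(card_obs: int, card_impact: float, pool_obs: int) -> float:
    """Points a single card adds to an observation-weighted pooled mean."""
    return card_obs * card_impact / pool_obs


# Smothering Tithe: 154 observations at -25.63 per appearance (table 11b).
white = contribution(154, -25.63, 12_688)        # White pool: 12,688 observations
enchantments = contribution(154, -25.63, 3_433)  # Enchantment pool: 3,433 observations

print(f"White: {white:+.2f} of -11.69")
print(f"Enchantments: {enchantments:+.2f} of -11.02")
```

Under that assumption the card would account for roughly 0.3 points of the White aggregate and about 1.1 points of the Enchantment aggregate, so most of each deficit would have to come from other cards in the pool.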

12. Color-Identity Breakdown

(See Section 8e for the full color-vs-win-rate table. This section provides the narrative summary.)

In Commander (n = 201 decks), the five colors span a 4.7-point win-rate range: Red-containing decks lead at 48.6% (n = 97) and Black- and White-containing decks tie at the bottom at 43.9% (n = 116 and n = 102 respectively). Green (46.2%; n = 111) and Blue (45.2%; n = 98) sit between them. Because the majority of Commander decks in this cohort run three or more colors, these color-level figures are heavily correlated — a deck contributing to the Red row also contributes to the Green and Blue rows in many cases.

The most important observation from the color data is that the full range (48.6% to 43.9%) is narrower than the matchup range (54.7% to 36.0%). Matchup identity explains more of the observed win-rate variance in this dataset than color identity does. This does not mean color choice is irrelevant — it means the data as collected and analyzed here cannot separate color effects from the many other factors that vary alongside color in Commander deck construction.
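The range comparison in this section is simple arithmetic, sketched below using only figures quoted in this report. The dictionary and function names are illustrative, and the matchup dictionary lists only the two extreme opponents named in the text, not all five.

```python
# Commander win rates (%) by color, as quoted in this section.
color_win_rate = {"Red": 48.6, "Green": 46.2, "Blue": 45.2, "Black": 43.9, "White": 43.9}

# Easiest and hardest Commander matchups (%), as quoted in this report.
matchup_extremes = {"Breya Artifact Combo": 54.7, "Atraxa Superfriends": 36.0}


def spread(rates: dict[str, float]) -> float:
    """Difference between the highest and lowest rate, in percentage points."""
    return round(max(rates.values()) - min(rates.values()), 1)


print(spread(color_win_rate))    # 4.7
print(spread(matchup_extremes))  # 18.7
```

A range compares only the extremes. It is not a variance decomposition, so the comparison stays descriptive.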

In Standard (n = 28 decks), Red and White tie as the most-represented colors (14 decks each), with Red leading at 40.0% and White at 38.2%. Blue and Black each appear in 10 decks, posting 31.2% and 30.0% respectively. Green falls below the minimum threshold (7 decks, suppressed). The Standard color figures should be treated with caution given the 28-deck total cohort.

13. Tempo and Game Length

Cohort: player-submitted decks by format. Game length measured in turns per game.

Format	Decks (n)	Avg Turns	Typical Range (Shortest–Longest)
Commander	201	9.0	7.3–10.7
Standard	28	9.5	7.3–11.7

Commander games in this dataset average 9.0 turns (typical spread: 7.3 turns for the shortest games to 10.7 for the longest; n = 201 decks). Standard games average 9.5 turns (spread: 7.3–11.7; n = 28 decks). The two formats are surprisingly similar in average game length despite their structural differences — Commander is a multiplayer format played as 1v1 in the Grim.Cards simulation, and Standard is inherently 1v1.

The average game length is a tempo marker for how long this particular gauntlet took to produce decisive outcomes, not a property of the formats in general or of human play. Standard's slightly higher upper bound on the typical range (11.7 vs. 10.7 turns) may reflect control-heavy matchups (Jeskai Control) extending games, while Standard's more aggressive matchup (Mono Red Aggro, against which players posted a 13.1% win rate, implying most games resolved quickly in the gauntlet's favor) would pull the lower bound downward.

14. Power Score (Internal Composite — Secondary and Caveated)

Important: Power Score is an internal Grim.Cards composite indicator, not an objective or universal measure of deck power. It is not a win-rate measurement. It should not be used to rank, judge, or compare decks in a universal sense. It appears here last, as a secondary note on the dataset's internal grade distribution, not as a finding or headline.

Among the 229 player-submitted decks in this snapshot for which Power Score grades were computed:

Grade	Decks
S	1
A	13
B	24
C	35
D	70
F	86

The grade distribution is heavily weighted toward the lower end (F and D together account for 156 of 229 graded decks). This distribution reflects the Power Score composite's calibration against the internal reference corpus, which includes highly optimized automated reference decks; submitted player decks span a wide range of construction philosophies, many of which are intentionally casual, thematic, or experimental rather than maximally optimized. The Power Score distribution is presented here as a descriptive note on the dataset, not as a ranking or quality judgment.
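The totals quoted above can be re-derived from the grade table with a few lines. The variable names are illustrative; the counts are copied from the table.

```python
# Power Score grade counts, copied from the table above.
grades = {"S": 1, "A": 13, "B": 24, "C": 35, "D": 70, "F": 86}

total = sum(grades.values())
low_end = grades["D"] + grades["F"]

print(total)                     # 229
print(low_end)                   # 156
print(f"{low_end / total:.1%}")  # 68.1%
```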

15. Key Findings and What We Cannot Conclude Yet

What the data shows

Gauntlet opponent identity is the primary driver of win-rate variance at the cohort level. The 18.7-point Commander spread and the 45.5-point Standard spread, each measured across an identical challenger pool, exceed the spread associated with any other grouping variable reported in this dataset. (Commander: n = 201 decks, 867 games per opponent.)
Commander decks span an 80-point individual win-rate range (6.7%–86.7%), confirming the submitted population is highly heterogeneous. The median (46.7%) sits close to 50%, indicating the typical submitted Commander deck is roughly competitive against this specific gauntlet. (n = 201 decks.)
Standard decks underperform Commander decks in aggregate — 37.3% vs. 45.7% overall win rate — with a median of 33.3% (vs. Commander's 46.7%). This gap is measurable but the Standard cohort (28 decks) is too small to determine whether it reflects format difficulty, deck-construction patterns, cohort composition, or gauntlet calibration. (Commander n = 201, Standard n = 28.)
Retested Commander decks show a near-symmetric outcome split — 26 improved, 25 declined, 6 flat — with a mean delta of +2.7 pp. The near-symmetry means retesting does not produce a consistent directional improvement signal in aggregate. (n = 57 retested Commander decks.)
Dragon-tribal support dominates the top board-impact rankings, with Terror of the Peaks (+19.04), Lathliss, Dragon Queen (+9.10), Miirym, Sentinel Wyrm (+6.67), Dragon's Hoard (+3.02), and Dragon Tempest (+2.11) all in the top-15 positive performers. This reflects the Dragon-tribal representation in the submitted deck population, not a universal finding. (All: n ≥ 10 decks, ≥ 25 observations each.)
Smothering Tithe records the most negative board impact in the dataset (−25.63 per appearance; n = 15 decks, 154 observations), substantially worse than the next worst card (−9.83). This is a board-state proxy finding, not a causal win-rate claim.
Land ratio correlates with Commander win rate across brackets: decks in the ~40% land band average a 50.6% win rate (n = 49) vs. 43.3% for the ~35% land band (n = 121). The high-win-rate bracket (>55%; n = 55) averages 35.8% lands and a 3.49 average mana value, vs. 35.2% and 3.08 for the low bracket (n = 71). These are descriptive correlations.

What we cannot conclude from this data

Causation: No construction feature, card, color, or functional category can be said to cause higher or lower win rates. All reported associations are correlational.
Generalization beyond this gauntlet: Win rates are specific to the five gauntlet opponents in each format. Against a different meta or in human play, outcomes could differ substantially.
Individual deck advice: This report contains no deck-building recommendations. The data describes aggregated outcomes across a self-selected sample; it does not prescribe construction choices.
Commander comparisons below threshold: All commanders except Meren of Clan Nel Toth fall below the minimum 10-deck threshold for win-rate reporting. No commander-vs-commander win-rate comparison is possible in this snapshot.
Standard card-level findings: No Standard card clears both minimum-cohort thresholds. Standard card-level analysis is deferred to a future edition with a larger sample.
Player-level conclusions: No user-level breakdown exists in this report. Aggregation is by deck and format only.

16. Baseline Table for Future Comparison

This table records the key metrics from this snapshot for direct comparison in future editions. All figures: snapshot date 2 July 2026, data window 14 May 2026 – 2 July 2026.

Metric	Value	n	Format
Overall win rate	44.8%	229 decks / 4,914 games	All
Commander win rate	45.7%	201 decks / 4,415 games	Commander
Standard win rate	37.3%	28 decks / 499 games	Standard
Commander median win rate	46.7%	201 decks	Commander
Standard median win rate	33.3%	28 decks	Standard
Commander win rate vs. Breya Artifact Combo	54.7%	201 decks / 867 games	Commander
Commander win rate vs. Atraxa Superfriends	36.0%	201 decks / 867 games	Commander
Standard win rate vs. Temur Harmonizer Combo	58.6%	28 decks / 99 games	Standard
Standard win rate vs. Mono Red Aggro	13.1%	28 decks / 99 games	Standard
Commander individual win rate range	6.7%–86.7%	201 decks	Commander
Retested decks — avg win-rate change	+2.7 pp	57 decks	Commander
Retested decks — improved / declined / flat	26 / 25 / 6	57 decks	Commander
Highest positive board impact (card)	Terror of the Peaks +19.04	10 decks / 45 obs	Commander
Lowest board impact (card)	Smothering Tithe −25.63	15 decks / 154 obs	Commander
Avg Commander game length	9.0 turns	201 decks	Commander
Avg Standard game length	9.5 turns	28 decks	Standard
Total distinct users	109	—	All
Total decks	229	—	All
Total simulations	343	—	All

Citation & Provenance

Publisher: Grim.Cards
Report title: Grim.Cards Case Study — Edition 2026-07-02
Dataset version: 2.2
Snapshot date: 2 July 2026
Data window: 14 May 2026 – 2 July 2026
Generated: 2 July 2026 at 14:00:08 UTC
Canonical URL: grim.cards/case-study/2026-07-02
License: Creative Commons Attribution 4.0 International (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/

Method: AI-versus-AI Magic: The Gathering simulations run on a custom build of the open-source Forge engine. Each player-submitted deck is played against a fixed gauntlet of meta decks; win rate is wins divided by all games played (draws counted in the denominator). Cohort: real, human-submitted Commander and Standard decks; automated Crucible decks and system sample decks excluded. Minimum cohort for any reported group: 10 distinct decks. All correlations are descriptive; no causal claims are made.
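The win-rate definition in the method note (wins divided by all games played, with draws in the denominator) can be written as a short function. This is a minimal sketch: the function name is illustrative, and the inputs are the overall win, loss, and draw counts from the Executive Summary.

```python
def win_rate(wins: int, losses: int, draws: int = 0) -> float:
    """Wins divided by all games played; draws stay in the denominator."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    return wins / games


# Overall figures from the Executive Summary: 2,203 wins, 2,710 losses, 1 draw.
overall = win_rate(2203, 2710, 1)
print(f"{overall:.1%}")  # 44.8%
```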

How to cite (example): Grim.Cards. "Grim.Cards Case Study — Edition 2026-07-02." grim.cards/case-study/2026-07-02. Published 2 July 2026. Dataset version 2.2. CC BY 4.0.

All figures in this report are derived exclusively from the Grim.Cards production dataset, snapshot dated 2 July 2026. No figures have been invented or extrapolated. Every reported cohort contains a minimum of 10 distinct decks. Cohorts falling below this threshold are suppressed. No personally identifiable information, raw user identifiers, deck names, decklists, or user-level timestamps appear in this report.