Introduction
I’ve recently started playing Magic: The Gathering and have been experimenting with a few decks to learn the game better and see which style(s) of deck I like to play. To do that, I’m playing practice matches against myself. Despite my best efforts, one of the decks I’m most excited about, the Bloomburrow squirrel deck, has lost most of those practice matches.
This led me to some despair. If I couldn’t win with the deck I was most excited about, how would I determine which deck I’m best with? That got me curious about established algorithms for ranking players. It turns out there are several, suited to different types of games, like TrueSkill for team-based games. For 1-on-1 games like chess, Glicko-2 is one of the most popular.
Glicko-2
Glicko-2 is an algorithm made by Mark E. Glickman. It is an extension of the Glicko algorithm (also by Glickman), which is itself an extension of the Elo algorithm originally by Arpad Elo. The Elo system tracks just one number per player: a rating. The Glicko algorithm improves upon that by adding a rating deviation, a measurement of how uncertain the system is about the player’s rating. The more games a player plays, the more sure the system is about them, and the uncertainty grows again if the player doesn’t play for a while. The Glicko-2 algorithm improves upon the first by adding a third measurement, volatility, that captures how inconsistent a player’s results are. After all, not everyone can play at 100% all of the time. In the case of a card game like Magic, I’m hoping that this volatility score will help capture the randomness of the game and how matches can be decided by the shuffle.
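To make the three numbers concrete, here is a minimal Python sketch (Python rather than the Clojure used for the actual implementation). The `Rating` class and `after_idle_period` function are hypothetical names for illustration. The defaults of 1500, 350, and 0.06 are the starting values Glickman's paper suggests for an unrated player, and the idle rule is the paper's formula for a player who sits out a rating period.

```python
import math
from dataclasses import dataclass

GLICKO2_SCALE = 173.7178  # scale factor from Glickman's paper


@dataclass(frozen=True)
class Rating:
    rating: float = 1500.0     # skill estimate (the only number Elo tracks)
    deviation: float = 350.0   # how unsure the system is about the rating
    volatility: float = 0.06   # how erratic the player's results are


def after_idle_period(r: Rating) -> Rating:
    """One rating period with no games: only the deviation changes, and it grows."""
    phi = r.deviation / GLICKO2_SCALE                  # to the Glicko-2 scale
    phi_star = math.sqrt(phi ** 2 + r.volatility ** 2)
    return Rating(r.rating, phi_star * GLICKO2_SCALE, r.volatility)


rested = after_idle_period(Rating(1500.0, 200.0, 0.06))
print(rested.deviation)  # slightly above 200
```

The rating and volatility are untouched by inactivity; the system just becomes a little less sure of what it knows.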
Implementation
My code can be found at this Gist, which also includes example output.
My implementation is based heavily on this post I found on GitHub and its associated implementation code. Their implementation is in Rust and mine is in Clojure. Translating the math and imperative calculations took a little while, but the imperative calculations were pretty easy to turn into recursive loops with loop/recur. Reading code was also easier for me than reading the math in the original paper. I need to work on that.
Originally, my code would get stuck in an infinite loop because Java’s Math functions silently return +/- Infinity when a calculation overflows, poisoning every calculation downstream. After finding the problem, I added the exp-or-throw! function, which calls Math/exp and then throws if the calculation overflowed.
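The same fail-loudly idea can be sketched in Python. This is an illustration of the guard, not a translation of the Clojure `exp-or-throw!` function, and `finite_or_raise` and `exp_or_raise` are hypothetical names. One difference to keep in mind: Python's `math.exp` already raises `OverflowError` on overflow, but other float arithmetic (for example `1e308 * 10`) quietly produces `inf`, so a finiteness check is still useful.

```python
import math


def finite_or_raise(value: float, label: str = "value") -> float:
    """Return value unchanged, or raise if it is +/-inf or NaN."""
    if not math.isfinite(value):
        raise ArithmeticError(f"{label} is not finite: {value!r}")
    return value


def exp_or_raise(x: float) -> float:
    """exp(x) that never hands back an infinity or NaN."""
    try:
        return finite_or_raise(math.exp(x), f"exp({x})")
    except OverflowError as err:
        # math.exp raises for large finite inputs; normalize the error type.
        raise ArithmeticError(f"exp({x}) overflowed") from err


print(exp_or_raise(0.0))  # 1.0
```

Wrapping each risky call this way turns a silent infinity into an exception at the exact spot where the numbers went bad.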
Now that I had identified the overflow problem, I had to figure out why it was happening. Luckily, that was easily solved by reading through Glickman’s paper:
“The system constant, τ, which constrains the change in volatility over time, needs to be set prior to application of the system. Reasonable choices are between 0.3 and 1.2, though the system should be tested to decide which value results in greatest predictive accuracy. Smaller values of τ prevent the volatility measures from changing by large amounts, which in turn prevent enormous changes in ratings based on very improbable results. If the application of Glicko-2 is expected to involve extremely improbable collections of game outcomes, then τ should be set to a small value, even as small as, say, τ = 0.2.”
Bingo. A league where one deck loses every match and then upsets an undefeated deck definitely counts as an improbable collection of outcomes. The default τ of 0.75 that I copied from the other codebase would not work here. So I set τ to 0.3 and all the calculations completed successfully, with no throws or infinite loops.
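To show where τ enters the math, here is a Python sketch of the volatility step (step 5) as described in Glickman's paper. It is not the Clojure code from the Gist. The function name `new_volatility` and the `max_iter` guard are assumptions added for illustration; the guard makes a runaway iteration raise instead of looping forever. All inputs are on the Glicko-2 scale.

```python
import math


def new_volatility(sigma, phi, v, delta, tau, eps=1e-6, max_iter=100):
    """Iteratively solve for the updated volatility (Glicko-2 scale inputs)."""
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        den = 2.0 * (phi ** 2 + v + ex) ** 2
        return num / den - (x - a) / tau ** 2  # tau penalizes moving away from a

    # Bracket the root of f between A and B.
    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau

    fA, fB = f(A), f(B)
    for _ in range(max_iter):  # hypothetical safety cap, not in the paper
        if abs(B - A) <= eps:
            return math.exp(A / 2.0)
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    raise ArithmeticError("volatility iteration did not converge")


# Inputs taken from the worked example in Glickman's paper (tau = 0.5).
print(new_volatility(0.06, 1.1513, 1.7785, -0.4834, 0.5))
```

Because τ only appears in the `(x - a) / tau ** 2` term, a smaller τ makes any move away from the current volatility more expensive, which is why lowering it calms the system down after an upset.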
Another problem: the Glicko-2 system expects results to be processed in batches. Player ratings aren’t updated after every match; instead, the calculations are batched up into rating periods to reduce the amount of compute necessary. So for my league I needed to decide what the cut-off for a rating period should be. Weekly? Monthly? After a certain number of matches? The paper recommends an average of at least 10-15 games per player in each rating period, but I was not going to reach that volume for weeks. So I decided to make every individual game its own rating period. That might make the ratings less stable, but the dataset is small enough that I’m not worried about the compute it takes to recalculate everything if I ever need to change the approach down the line.
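The batching shows up directly in the formulas: the update first sums over every game in the period to get an estimated variance v and an improvement Δ. Below is a Python sketch of those two quantities (steps 3 and 4 in Glickman's paper), with hypothetical function names. A one-game rating period is simply a list with a single entry. All values are on the Glicko-2 scale.

```python
import math


def g(phi):
    """Shrink an opponent's influence the less certain their rating is."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(mu, mu_j, phi_j):
    """Expected result (0 to 1) against opponent j."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def period_summary(mu, games):
    """Return (v, delta) for one rating period.

    games is a list of (opponent_mu, opponent_phi, score) tuples,
    where score is 1.0 for a win, 0.5 for a draw, and 0.0 for a loss.
    """
    info = 0.0
    surprise = 0.0
    for mu_j, phi_j, score in games:
        e = expected_score(mu, mu_j, phi_j)
        info += g(phi_j) ** 2 * e * (1.0 - e)
        surprise += g(phi_j) * (score - e)
    v = 1.0 / info
    return v, v * surprise


# The three-game period from the example in Glickman's paper.
paper_games = [(-0.5756, 0.1727, 1.0), (0.2878, 0.5756, 0.0), (1.1513, 1.7269, 0.0)]
print(period_summary(0.0, paper_games))

# A period containing only the first of those games.
print(period_summary(0.0, paper_games[:1]))
```

With a single game the sum has one term, so v comes out larger: the system has less evidence to work with, which is the instability trade-off of such short rating periods.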
After I originally posted this article, my friend Justin commented, “Huh, those deviation values all look the same, is that right?” That led me to read Glickman’s paper in more detail, and I realized that I had missed a crucial step! Glickman reports ratings on the Glicko-1 scale, but you need to convert them to the Glicko-2 scale before doing any rating calculations. So I added the glicko1->2 and glicko2->1 functions and called them in the proper locations to fix things. Now my code mostly matches the examples in Glickman’s paper; the results are slightly off because my code keeps more significant digits than Glickman does in his paper.
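The conversion itself is small. Here is a Python sketch of the two directions, using the constants from Glickman's paper (1500 as the center and 173.7178 as the scale factor). The function names mirror the Clojure ones, but the bodies are an illustration, not the code from the Gist.

```python
GLICKO2_SCALE = 173.7178  # scale factor from Glickman's paper
GLICKO1_CENTER = 1500.0   # the Glicko-1 rating that maps to 0 on the Glicko-2 scale


def glicko1_to_2(rating, deviation):
    """Convert (rating, RD) to (mu, phi) before doing any update math."""
    return (rating - GLICKO1_CENTER) / GLICKO2_SCALE, deviation / GLICKO2_SCALE


def glicko2_to_1(mu, phi):
    """Convert (mu, phi) back to (rating, RD) for display."""
    return GLICKO2_SCALE * mu + GLICKO1_CENTER, GLICKO2_SCALE * phi


# The player from the paper's example: rating 1500, RD 200.
print(glicko1_to_2(1500.0, 200.0))   # mu is 0, phi is about 1.1513
# The paper's updated values, converted back for display.
print(glicko2_to_1(-0.2069, 0.8722)) # about 1464.06 and 151.52
```

Skipping the first conversion means feeding a deviation of 350 into formulas that expect roughly 2, which is the kind of scale mismatch that keeps the deviations from moving the way they should.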
Results
So after all of that, how do I stack up against myself? Well, my poor squirrels are at the bottom, but I’m figuring out how to play them, including when to make sacrifices now to invest in more power later. Looking forward to playing more!
| :date      | :winner         | :competitors                             |
|------------+-----------------+------------------------------------------|
| 4/13/2025  | Ahoy Mateys     | ["Squirreled Away" "Ahoy Mateys"]        |
| 04/13/2025 | Animated Army   | ["Squirreled Away" "Animated Army"]      |
| 4/13/2025  | Family Matters  | ["Squirreled Away" "Family Matters"]     |
| 4/15/2025  | Token Triumph   | ["Squirreled Away" "Token Triumph"]      |
| 4/16/2025  | Animated Army   | ["Animated Army" "Draconic Destruction"] |
| 4/19/2025  | Chaos Incarnate | ["Squirreled Away" "Chaos Incarnate"]    |
| 4/20/2025  | Squirreled Away | ["Squirreled Away" "Animated Army"]      |
| 4/20/2025  | Blood Rites     | ["Squirreled Away" "Blood Rites"]        |
| :deck                | :wins | :losses | :rating       | :deviation   | :volatility |
|----------------------+-------+---------+---------------+--------------+-------------|
| Ahoy Mateys          | 1     | 0       | 1742.61579248 | 356.01494283 | 0.06000712  |
| Blood Rites          | 1     | 0       | 1689.26088956 | 354.52165100 | 0.06000991  |
| Family Matters       | 1     | 0       | 1589.72018038 | 353.79419606 | 0.06003010  |
| Token Triumph        | 1     | 0       | 1566.50186649 | 353.02444491 | 0.06005117  |
| Chaos Incarnate      | 1     | 0       | 1552.77594408 | 352.30947683 | 0.06007820  |
| Animated Army        | 2     | 1       | 1349.44784567 | 359.85592874 | 0.06003945  |
| Draconic Destruction | 0     | 1       | 1320.60003938 | 355.04057534 | 0.06016293  |
| Squirreled Away      | 1     | 6       | 1181.45459587 | 371.58782624 | 0.06140619  |
P.S. - Bloomburrow Art
Only slightly related, but if you haven’t seen the art for the Bloomburrow set, a lot of it just makes me happy. Here’s a way-too-long list of examples because I couldn’t narrow it down further.
- Bakersbane Duo
- Bria, Riptide Rogue
- Bushy Bodyguard
- Calamity of Cinders
- Daggerfang Duo
- Esika’s Chariot
- Gev, Scaled Scorch
- Glarb, Calamity’s Augur
- Harnesser of Storms
- Hazel’s Brewmaster
- Heartfire Hero
- Jacked Rabbit
- Kindlespark Duo
- Long River Lurker
- Muerra, Trash Tactician
- Run Away Together
- Stormcatch Mentor
- Take Out the Trash
- Tempest Angler
- Tender Wildguide
- The Infamous Cruelclaw
- Thickest in the Thicket
- Treetop Sentries
- Valley Rotcaller
- Vinereap Mentor
- Wandertale Mentor