Killing Off Failed Pet Projects
Letting go of stuff that just ain't gonna work
The Good Old Days
I have been interested in writing and using code to run office NCAA basket ball tournament pools since my early days working for Rockwell International job in the 90's. My first attempt at such software was written in C using AIX workstations we used for work there. This code is long lost. However, I remember what it did well: you could hand out an executable that would generate pool entries, you would collect these entries into a directory and then run the pool program to generate leaderboard reports as the games were played. It was great fun trash talking each other via email back in the day. One of the funnest parts of this program was a "possibilities report". This report would generate all possible tournament results, score all the entries against them and generate stats like percent chance to win, max score, which teams led to the most wins, etc. Again, a great vehicle for trash talking as the report was shared out via email.
The possibilities report is an O(2^n) problem where n is the number of remaining games to be played. Trying to generate such a report before any games have been played is a fool's errand. There are over 1.76 X 1015 possible outcomes and it would take 1 computer nearly 500,000 years to analyze them all!
Though I can't be 100% sure, I seem to remember that the old C program wouldn't let you even run the possibilities report once the first round was over. If I recall, it then took a few hours for that old code to generate the report. I am not sure, however, it may have made you wait until round 2 was complete.
I put NCAA tournament pool software away when I moved on from Rockwell. I spent the next years becoming a web developer. Eventually, I became a Ruby on Rails expert and while working at a startup in the late aughts, I decided to implement a Ruby on Rails web site for managing NCAA Backetball Tournament office pools for my company's NCAA BB Tournament pool. This entailed re-implementing what I remembered about the C program in ruby, then putting a Ruby on Rails website on top of it. But compared to C, Ruby is slow. To caclulate all the possibilities after round 1 finishes takes my original ruby code about 12 years. Even when I parallelized the calculations across processes, I couldn't improve this much beyond about a year. You have to wait until about 5 games are left in round 2 before the time is less than about the time it takes to play a game.
I upgraded the website through a version or two of Rails, then decided on a complete rewrite in an effort to make calculating the score of a bracket against a possible outcome bracket as fast as possible, thinking that was the problem with the ruby code. The rewrite was based on an idea I had where I thought if you represented a bracket entry as a bit field, where each bit represented which team in that game won (0 means the "first" team won, 1 means the "second" team won). The first round scoring is pretty fast: you just have to bitwise XOR the entry bracket and the tournament results bracket bitfields, and count the resulting 1's to determine how many picks are correct. Multiply that by the value of each correct pick in round 1 and you have that entry's round 1 score. However, beyond round 1, the bitfield idea really doesn't work since the teams playing in each game are different in each bracket. IE, a 0 in the entry bitfield isn't necessarily the same team as a 0 in the results bracket. You end up having to backtrack to previous rounds to know what teams are playing in the game beyond round 1. This new code could generate a possibilities report after round 1 after about 8,000 years or so. It was an utter failure compared to my first version which didn't do any bit twiddling at all.
I butted my head up against this problem a lot, thinking there had to be a complicated bit twiddling way to solve this that would result in order of magnitude increases in bracket scoring time. I never found any bit twiddling solutions. Here was my opportunity to re-frame the problem and try solving it in a new way. Instead, I just tried to get spot improvements by first re-writing the bracket scoring code in C and calling it using RNI (Ruby native interface) from the ruby code. This did give an OOM improvement, but just meant I was down to about 600 years. Still nowhere near version one. My next thought was to ditch RNI and write a stand alone C++ program that would score a bracket. This netted a 3x improvement to around 250 years. I wasn't making improvements that were meaningful, but I just couldn't let go of the idea that all that was needed was to optimize the hell out of the time it takes to score a bracket against the results. I even built in the ability to divide and conquer the scoring of all the results across multiple processes and stich the results together to generate the report. But I was stymied and no where near acheiving the holy grail goal of generating the possibilities report after round 1 in reasonable time and in fact never even acheived the performance of my original approach.
I never made any announcements about my efforts here, but here is the code. It serves as a grave marker of hubris and what happens by just not thinking clearly about a problem.
Back to Basics
I more or less gave up on this problem for a long time. My pet bit twiddling idea had a life of it's own and it didn't want to die. This situation persisted until recently when I started watching programmer streamers on twitch TV, particulary Tsoding. One of the techniques he talks about a lot is depth first searches and dynamic programming. After seeing him use this technique to solve some Advent of Code problems, I realized that the technique could be used in the possibilities report calculation for NCAA pool software. I wonder if the original C code at my job at Rockwell used this technique. If it was able to generate a possibilities run in a few hours, I think it must have used dynamic programming. If it was after round 2, maybe it was just a faster version of my ruby code.
At any rate, I proceeded to implement a new project in pure C using dynamic programming and depth first searches. And I succeeded in my goal. The key to the DFS approach is to not score each bracket against each possible result, which results in having to recalcuate each game score millions of times. Rather, you can build up the scores of each bracket in a DFS recursion so you only have to calculate the score of each possible game once per bracket.
In the end, the new code can generate a full possibilities report after round 1 is complete on one computer in less than 1 hour. This is about an OOM better than the C code I ran in the 90's at Rockwell (assuming it was generating a report after round 1). However, in these 20 or so years (my laptop is several years old by now), Moore's law dictates the code should be 220 times faster (about 1.05 million times faster). So I suspect that old C code only worked after round 2.
I even was able to use the parallelization techniques to run the possibilities report across multiple processes to make this time even less with multiple CPU cores. And finally, the stretch goal of running this report after day 1 of the tournament is with reach if you have a compute cloud of 8,192 machines available.
If you are interested, the new version is available on Github: Tournament Version 3.