What is the best database schema to represent an NCAA mens basketball bracket? Here is a link if you aren't familiar: http://www.cbssports.com/collegebasketball/mayhem/brackets/viewable_men
I can see several different ways you could model this data, with a single table, many tables, hard-coded columns, somewhat dynamic ways, etc. You need a way to model both what seed and place each team is in, along with each game and the outcome (and possibly score) of each. You also need a way to represent who plays who at what stage in the tournament.
In the spirit of March Madness, I thought this would be a good question. There are some obvious answers here, and the main goal of this question is to see all of the different ways you could answer it. Which way is best could be subjective to the language you are using or how exactly you are working with it, but try to keep the answers db agnostic, language agnostic and fairly high level. If anyone has any suggestions on a better way to word this question or a better way to define it let me know in the comments.
The natural inclination is to look at a bracket in the order the games are played. You read the traditional diagram from the outside in. But let's think of it the other way around. Each game is played between two teams. One wins, the other loses.
Now, there's a bit more to it than just this. The winners of a particular pair of games face off against each other in another game. So there's also a relationship between the games themselves, irrespective of who's playing in those games. That is, the teams that face off in each game (except in the first round) are the winners of two earlier games.
So you might notice that each game has two "child games" that precede it and determine who faces off in that game. This sounds exactly like a binary tree: each root node has at most two child nodes. If you know who wins each game, you can easily determine the teams in the "parent" games.
So, to design a database to model this, you really only need two entities: Team
and Game
. Each Game
has two foreign keys that relate to other Game
s. The names don't matter, but we would model them as separate keys to enforce the requirement that each game have no more than two preceding games. Let's call them leftGame
and rightGame
, to keep with the binary tree nomenclature. Similarly, we should have a key called parentGame
that tracks the reverse relationship.
Also, as I noted earlier, you can easily determine the teams that face off in each game by looking at who won the two preceding games. So you really only need to track the winner of each game. So, give the Game
entity a winner
foreign key to the Team
table.
Now, there's the small matter of seeding the bracket. That is, modeling the match-ups for the first round games. You could model this by having a Game
for each team in the overall competition where that team is the winner
and has no preceding games.
So, the overall schema would be:
Game:
winner: Team
leftGame: Game
rightGame: Game
parentGame: Game
other attributes as you see fit
Team:
name
other attributes as you see fit
Of course, you would add all the other information you'd want to the entities: location, scores, outcome (in case the game was won by forfeit or some other out of the ordinary condition).