This is a brief summary of the repo https://github.com/jl4730/SpanningTreeAlgo, which developed a simplified distributed version of the https://en.wikipedia.org/wiki/Spanning_Tree_Protocol that can be run on an arbitrary layer 2 topology.
1 Why spanning tree algorithm is needed?
Using bridges to connect LANs fails if the network topology results in loops (cycles). In that case, the bridges loop through packets forever. The answer to this problem is excluding links that lead to loops by running the spanning tree algorithm. The goal of the spanning tree algorithm is to have the bridges select which links (ports) to use for forwarding eliminating loops.
2 How the algorithm works?
The algorithm runs in “rounds” and at every round, each node sends to each neighbor node a configuration message with three fields: a) the sending node’s ID, b) the ID of the roots as perceived by the sending node, and c) the number of hops between that (perceived) root and the sending node.
At every round, each node keeps track of the best configuration message that it has received so far, and it compares that against the configuration messages it receives from neighboring nodes at that round.
At the very first round of the algorithm, every node thinks that it is the root. So for a node with an ID 3 for example, the node sends a configuration message < 3, 3, 0> to its neighbors. Note that the distance of the node from itself (perceived root) is 0.
So how does a node compare two configuration messages? Between two configurations, a node selects one configuration as better if: a) The root of the configuration has a smaller ID, or if b) The roots have equal IDs, but one configuration indicates the smaller distance from the root, or if c) Both roots IDs are the same and the distances are the same, then the node breaks the tie by selecting the configuration of the sending node that has with the smallest ID. In addition, a node stops sending configuration messages over a link (port), when the node receives a configuration message that indicates that it is not the root, eg when it receives a configuration message from a neighbor that: a) either closer to the root, or b) it has the same distance from the root, but it has a smaller ID.
Take the following topology as an example.
At the beginning, every switch thought to be the root and the distance is 0. So the initial messages will look like below. Note the paththrough are all False since there is no need to “paththrough” the switchThrough to get to the origin.
Then the messages will be processed in a FIFO way. Each destination switch will need to process the message received. Note the switch only knows about itself and its neighbors (distributed system) and has no clue of the big picture.
The first scenario will be the message has a smaller root than self.root:
When this happens we need to update “everything” and share the updated information with the neighbors:
If the message shares the same root as self.root then there is no need to update the root. However, we need to check the distance and see if we want to update the switchthrough as in the following case:
If both the root and the distance are the same, we need to check if the current switchThrough is the smaller than the message.origin in the following example:
Lastly, if the roots are the same but the distance is actually bigger than what’s currently stored, we need to check if the message.pathThrough == True and then just to link the neighbor to activelinks as this neighbor relies on this switch to the root. There is no need to notify neighbors in this scenario:
4 Personal takeaway
The biggest takeaway is the beauty of the distributed system. If every switch takes care of itself then jointly the entire organization will be alright. The same goes for lots of things, from as small as a school to as big as a country…