Ideas about points of failure and possible fixes in the CDN structure.
Each room encoder is a SPOF (single point of failure) for its stream. The same is often true for the network between the encoder and the master relay.
The master relay is currently a SPOF for all streams. This could be eliminated by running two redundant masters.
The following problems have to be solved to allow redundant encoders:
Multiple public relays exist, so a relay failure only impacts the viewers on that relay. At the moment, however, the streams of those viewers do not recover properly.
The webplayers should automatically retry on another relay if the current connection fails. Currently the load balancers determine the viewer→relay assignment, so the client will need to fetch a new redirect from a load balancer. It should use exponential backoff between attempts so that many clients retrying at once do not overload the load balancers.
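The retry logic in the webplayer could look roughly like the sketch below. It is only an illustration, not the existing player code: the names backoffDelayMs, pickRelayWithBackoff and fetchRedirect are hypothetical, and the base delay, cap and attempt count are assumed values that would need tuning. It also adds random jitter on top of the exponential backoff (an addition not mentioned above), so that all viewers of a failed relay do not hit the load balancers at the same moment.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// Exponential backoff with "full jitter": a random delay in
// [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,     // assumed base delay
  maxMs: number = 30000,    // assumed upper bound
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(rand() * ceiling);
}

// Hypothetical callback: asks a load balancer for a new relay URL and
// rejects if the request fails.
type FetchRedirect = () => Promise<string>;

// Keeps asking for a new relay until one is returned or the attempts run out.
async function pickRelayWithBackoff(
  fetchRedirect: FetchRedirect,
  maxAttempts: number = 8,  // assumed limit
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<string> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fetchRedirect();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

The player would call pickRelayWithBackoff when its current connection drops and then reconnect to the returned relay URL. The rand and sleep parameters exist only so the timing can be replaced in tests.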