CDN High Availability

Ideas about points of failure and possible fixes in the CDN structure.

Each room encoder is a single point of failure (SPOF) for the stream it produces. The same is often true for the network between the encoder and the master relay.

The master relay is currently a SPOF for all streams. This could be eliminated by running two redundant masters.

The following problems have to be solved to allow redundant masters:

encoder-source:
1. enough bandwidth: both masters pull from the same encoder
2. low bandwidth: encoder pushes to first reachable master
transcode:
1. Possibility A: single transcode with fallback source and dual outputs: hard
2. Possibility B: transcode twice: easy, but has negative impacts on fanout
fanout:
1. easy, each master does its own fanout locally
2. however, if the transcodes are separate, the source streams may not match
downstream-relays:
1. icecast: hard, has to switch master source
  1. reconfigure on the fly? e.g. by using DNS
2. nginx: easy, can use multiple upstreams
  1. however, if the fanouts are not guaranteed to be identical, the upstreams should not be load balanced!
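
The "first reachable master" rule for the encoder and the "failover, not load balancing" rule for the downstream relays come down to the same selection logic. The following is a minimal Python sketch of that logic, not a description of any existing component. The names `first_reachable`, `StickyFailover` and the `is_reachable` probe are hypothetical. The sticky behaviour (stay on the current master until it fails) is an assumption, chosen so that a relay does not jump between fanouts that may not match.

```python
from typing import Callable, List, Optional, Sequence


def first_reachable(
    masters: Sequence[str], is_reachable: Callable[[str], bool]
) -> Optional[str]:
    """Return the first master, in priority order, that passes the probe."""
    for master in masters:
        if is_reachable(master):
            return master
    return None


class StickyFailover:
    """Keep using one master until it fails, then move to the next reachable one.

    This is failover, not load balancing: at any time all traffic goes to a
    single master, so two non-identical fanouts are never mixed.
    """

    def __init__(self, masters: Sequence[str], is_reachable: Callable[[str], bool]):
        self.masters: List[str] = list(masters)
        self.is_reachable = is_reachable
        self.current: Optional[str] = None

    def pick(self) -> Optional[str]:
        # Stay on the current master as long as it is still reachable.
        if self.current is not None and self.is_reachable(self.current):
            return self.current
        # Otherwise fall back to the first reachable master in priority order.
        self.current = first_reachable(self.masters, self.is_reachable)
        return self.current
```

In this sketch the probe is injected, so it can be a TCP connect, an HTTP status check, or anything else that fits the setup. `pick()` returns `None` when no master is reachable.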

Multiple public relays exist, so a relay failure only impacts the viewers on that relay. However, at the moment the streams of these viewers do not recover properly.

The web players should automatically retry on another relay if the current connection fails. Currently the load balancers determine the viewer→relay assignment, so the client will need to fetch another redirect from the load balancer (using exponential backoff to avoid overloading the load balancers).
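
The retry behaviour could look roughly like the sketch below. It is written in Python for brevity; a real web player would implement the same logic in the browser. Everything here is an assumption for illustration: `fetch_redirect` is a hypothetical callable that asks the load balancer for a relay and returns its URL or `None` on failure, and the base delay, factor, cap and retry count are placeholder values. Random jitter is added so that all viewers of a failed relay do not hit the load balancers at the same moment.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(
    base: float = 1.0, factor: float = 2.0, cap: float = 60.0, retries: int = 5
) -> Iterator[float]:
    """Yield the maximum wait before each retry: base, base*factor, ..., capped."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor


def fetch_relay_with_backoff(
    fetch_redirect: Callable[[], Optional[str]],
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 60.0,
    retries: int = 5,
) -> Optional[str]:
    """Ask the load balancer for a relay, retrying with jittered backoff.

    Makes one initial attempt plus at most `retries` further attempts and
    returns the relay URL, or None if every attempt failed.
    """
    relay = fetch_redirect()
    for ceiling in backoff_delays(base, factor, cap, retries):
        if relay is not None:
            return relay
        # Wait a random fraction of the current ceiling before retrying.
        sleep(rng() * ceiling)
        relay = fetch_redirect()
    return relay
```

With the defaults above, the waits are bounded by 1, 2, 4, 8 and 16 seconds. `sleep` and `rng` are parameters only so that the function can be exercised without real delays.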
