Failure detection is an essential part of almost any functioning distributed system. Many factors determine how effective a failure detector is in a given environment model.
In this talk, the audience will learn about the theoretical foundations of failure detectors: the underlying concepts, their crucial properties, and how their guarantees differ. We will look at applications of failure detection in several practical use cases and dive into the specifics of how failure detection is implemented in some well-known distributed systems.
The audience will walk away with a clear intuition for a variety of failure detection approaches and an understanding of how real-world products apply them to support production distributed systems.
There are no prerequisites for this talk. The speaker believes it is important to make the content approachable and easy to understand for attendees at any level.
Program committee comment
This is an overview of failure detectors and their practical applications in distributed systems.