Machine learning has made considerable progress over the past decade, matching and even surpassing human performance on a varied set of narrow computational tasks. This progress has been enabled by the widespread availability of large datasets, as well as by improved algorithms and models. Distribution, implemented either through single-node concurrency or through multi-node parallelism, has been the third key ingredient in these advances.
The goal of this talk is to provide an overview of the role of distributed computing in machine learning, with an eye towards the intriguing trade-offs between the synchronization and communication costs of distributed machine learning algorithms, on the one hand, and their convergence, on the other. The focus will be on parallelization strategies for the fundamental stochastic gradient descent (SGD) algorithm, a key tool for training machine learning models, from venerable linear regression to state-of-the-art neural network architectures. Along the way, we will provide an overview of ongoing research and open problems in distributed machine learning. The talk assumes no prior knowledge of machine learning or optimization, beyond familiarity with basic concepts in algebra and analysis.
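As a concrete illustration of what parallelizing SGD means, the following is a minimal single-process sketch of synchronous data-parallel SGD for linear regression. The worker loop, function names, and hyperparameters are illustrative assumptions, not taken from the talk; the gradient-averaging step stands in for the all-reduce communication whose cost is traded off against convergence.

```python
import numpy as np

# A minimal sketch of synchronous data-parallel SGD for linear regression.
# Workers are simulated in a single process: each step, every worker computes
# a gradient on its own mini-batch, and the gradients are averaged (the
# synchronization/communication point) before the shared model is updated.
# All names and hyperparameters below are illustrative, not from the talk.

rng = np.random.default_rng(0)

# Synthetic regression problem: y = X @ w_true + noise.
n_samples, n_features = 1_000, 10
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)

def gradient(w, Xb, yb):
    """Gradient of the mean-squared-error loss on one mini-batch."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

def parallel_sgd(num_workers=4, batch_size=32, lr=0.05, steps=200):
    w = np.zeros(n_features)  # shared model parameters
    for _ in range(steps):
        grads = []
        for _ in range(num_workers):
            # Each worker samples its own mini-batch independently.
            idx = rng.integers(0, n_samples, size=batch_size)
            grads.append(gradient(w, X[idx], y[idx]))
        # Synchronization point: average the workers' gradients.
        # In a real distributed system this is an all-reduce over the
        # network, and its cost motivates the trade-offs discussed above.
        w -= lr * np.mean(grads, axis=0)
    return w

w_hat = parallel_sgd()
print("parameter error:", np.linalg.norm(w_hat - w_true))
```

Relaxing this synchronization point, for example by letting workers apply stale gradients asynchronously or by compressing the communicated gradients, reduces the per-step cost but can slow convergence, which is precisely the tension the talk explores.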
Program committee comment
Dan Alistarh is one of the young rock stars of concurrent and distributed computing. These days, his research focuses on machine learning.