CSML Master Class with Carlos Guestrin and Emily Fox
Thursday, 2 July 2015 at 12:00 - Friday, 3 July 2015 at 14:00 (BST)
Carlos Guestrin and Emily Fox will be visiting the Centre for Computational Statistics and Machine Learning (CSML) at UCL on 2-3 July 2015 to give a Master Class series. Three talks will take place over the two days. Lunch will be provided on both days after the lunchtime talk. Registration is required only for the first talk on the Thursday; the other two talks can be attended without registering.
We are also hoping to schedule some meetings with Carlos and Emily. If you would like to set up a meeting, please email Rebecca Martin at firstname.lastname@example.org.
The CSML Master Class series is sponsored by Google DeepMind
Thursday 2 July 2015 at 12.00 in Torrington (1-19) 115 Galton LT
Machine Learning at Scale: Big Data with Small Clusters
Carlos Guestrin, University of Washington
Machine learning has become the hottest topic in computing. Industries are being disrupted by intelligent applications that use ML at their core. From e-commerce, through movie streaming, to taxis, new companies that rely on ML are displacing old incumbents. These applications require the training of models on ever-increasing data set sizes. Thus, a significant amount of effort has been devoted to running these methods on very large computer clusters, at significant financial cost and with endless headaches.
In this talk, we will build on a series of systems for ML (including GraphLab, GraphChi and SFrames) to describe a design strategy for scaling up machine learning algorithms. In particular, we will demonstrate that a small cluster, or even a single machine, with the right systems, data layout and algorithms, can outperform large clusters on very large real-world problems. We will also explore algorithmic designs for ML which, when combined with such systems, can make the techniques accessible to non-ML experts who want to build ML-infused applications and potentially disrupt new markets.
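As a rough illustration of the kind of data layout the abstract alludes to, the sketch below streams a dataset from disk in fixed-size chunks so that a single machine can fit a model on data far larger than RAM. This is not GraphLab, GraphChi, or SFrame code; the file layout, column names, and hyperparameters are hypothetical.

    import numpy as np
    import pandas as pd

    def logistic_sgd_out_of_core(csv_path, n_features, lr=0.01, chunk_rows=100000):
        # One pass of logistic-regression SGD over a CSV file that may be far
        # larger than RAM: only one chunk is ever held in memory at a time.
        # Assumes a numeric CSV with a binary "label" column (hypothetical).
        w = np.zeros(n_features)
        for chunk in pd.read_csv(csv_path, chunksize=chunk_rows):
            X = chunk.drop(columns="label").to_numpy(dtype=float)
            y = chunk["label"].to_numpy(dtype=float)
            for xi, yi in zip(X, y):
                p = 1.0 / (1.0 + np.exp(-xi @ w))  # predicted probability
                w -= lr * (p - yi) * xi            # stochastic gradient step
        return w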
Lunch will be provided after this talk in Room 102
Thursday 2 July 2015 at 16.00 in MPEB 1.02
Leveraging Optimization Techniques to Scale Bayesian Inference
Emily Fox, University of Washington
Data streams of increasing complexity are being collected in a variety of fields ranging from neuroscience, genomics, and environmental monitoring to e-commerce, based on technologies and infrastructures previously unavailable. With the advent of Markov chain Monte Carlo (MCMC), combined with the computational power to implement such algorithms, deploying increasingly expressive models has been a focus in recent decades. Unfortunately, traditional algorithms for Bayesian inference in these models, such as MCMC and variational inference, do not typically scale to the large datasets encountered in practice. Likewise, these algorithms are not applicable to the increasingly common situation where an unbounded amount of data arrives as a stream and inferences need to be made on-the-fly. In this talk, we will present a series of algorithms (stochastic gradient Hamiltonian Monte Carlo, HMM stochastic variational inference, and streaming Bayesian nonparametric inference) to address various aspects of the challenge of scaling Bayesian inference; our algorithms focus on deploying stochastic gradients and working within an optimization framework. We demonstrate our methods on a variety of applications, including online movie recommendations, segmenting a human chromatin data set with 250 million observations, and clustering a stream of New York Times documents.
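Of the algorithms named above, stochastic gradient Hamiltonian Monte Carlo is the most compact to sketch. The following is a minimal illustration of the SGHMC update of Chen, Fox, and Guestrin (2014); the function names and hyperparameter values are hypothetical, the user-supplied grad_log_post is assumed to return an unbiased minibatch estimate of the gradient of the log posterior, and the gradient-noise estimate is taken to be zero for simplicity.

    import numpy as np

    def sghmc(grad_log_post, theta0, data, n_iter=1000, batch_size=100,
              step_size=1e-3, friction=0.1):
        # Minimal SGHMC sketch. grad_log_post(theta, batch, n) must return an
        # unbiased minibatch estimate of the gradient of the log posterior
        # (prior gradient plus (n / batch_size) times the minibatch
        # likelihood gradient).
        theta = theta0.copy()
        r = np.zeros_like(theta)  # auxiliary momentum variable
        n = len(data)
        samples = []
        for _ in range(n_iter):
            idx = np.random.choice(n, batch_size, replace=False)
            grad = grad_log_post(theta, data[idx], n)
            # The friction term (C = friction) counteracts the noise injected
            # by the stochastic gradient; the noise estimate B-hat is zero here.
            noise = np.sqrt(2 * friction * step_size) * np.random.randn(*theta.shape)
            r = r + step_size * grad - step_size * friction * r + noise
            theta = theta + step_size * r
            samples.append(theta.copy())
        return np.array(samples)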
Friday 3 July 2015 at 12.00 in Roberts 508
Large-Scale Distributed Optimization for Machine Learning:
Constraint Impact Lower Bounds for Efficient & Principled Algorithms
Carlos Guestrin, University of Washington
Working set methods and screening rules can radically improve convergence times for sparse and constrained optimization. By reducing optimization to a sequence of small subproblems, working set methods achieve fast convergence times for many challenging problems. Despite excellent performance, theoretical understanding of working sets is limited, and implementations often resort to heuristics to determine subproblem size, makeup, and stopping criteria.
In this talk, we will first present BLITZ, a fast working set algorithm accompanied by useful guarantees. Making no assumptions on data, our theory relates subproblem size to progress toward convergence. We will then generalize the proof path for BLITZ with the concept of a constraint’s impact lower bound (ILBO). The ILBO provides a recipe for the design of new optimization algorithms with theoretical guarantees for a wide range of optimization problems. Applied to a range of real-world, large-scale optimization problems, this approach convincingly outperforms existing solvers in sequential, limited-memory, and distributed settings.
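Working set methods are easiest to see on a concrete problem. The sketch below applies the generic working-set pattern described in the abstract to the Lasso: it repeatedly grows a small set of features whose optimality conditions are most violated and solves the restricted subproblem with an off-the-shelf solver. This is a simplified illustration of the pattern, not the BLITZ algorithm itself; the subproblem size, tolerance, and stopping rule are hypothetical choices.

    import numpy as np
    from sklearn.linear_model import Lasso

    def working_set_lasso(X, y, lam, max_outer=20, ws_size=50, tol=1e-6):
        # Generic working-set loop for the Lasso (simplified; not BLITZ).
        # Objective: (1 / (2n)) * ||y - Xw||^2 + lam * ||w||_1.
        n, d = X.shape
        w = np.zeros(d)
        active = np.array([], dtype=int)
        for _ in range(max_outer):
            grad = -X.T @ (y - X @ w) / n
            # Violation of the optimality conditions per coordinate:
            # |grad_j| <= lam must hold where w_j = 0, and |grad_j| = lam
            # where w_j != 0 (sign conditions are ignored in this sketch).
            viol = np.abs(grad) - lam
            viol[w != 0] = np.abs(np.abs(grad[w != 0]) - lam)
            if viol.max() <= tol:
                break  # full-problem optimality holds (to tolerance)
            # Grow the working set with the worst violators, then solve the
            # small restricted subproblem with an off-the-shelf solver.
            worst = np.argsort(viol)[-ws_size:]
            active = np.union1d(active, worst[viol[worst] > tol])
            sub = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
            sub.fit(X[:, active], y)
            w[:] = 0.0
            w[active] = sub.coef_
        return w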
When & Where
Centre for Computational Statistics and Machine Learning (CSML), University College London
The Centre for Computational Statistics and Machine Learning (CSML) spans three departments at University College London: Computer Science, Statistical Science, and the Gatsby Computational Neuroscience Unit. The Centre will pioneer an emerging field that brings together statistics, the recent extensive advances in theoretically well-founded machine learning, and links with a broad range of application areas drawn from across the college, including neuroscience, astrophysics, biological sciences, complexity science, etc. There is a deliberate intention to maintain and cultivate a plurality of approaches within the centre, including Bayesian, frequentist, on-line, statistical, etc.