Stanford MLSys Seminar Series
- This semester our talks are at 1 PM PT, starting Thursday, January 21st – see you there!
- Join our email list to get notified of the speaker and livestream link every week!
Machine learning is driving exciting changes and progress in computing. What does the ubiquity of machine learning mean for how people build and deploy systems and applications? What challenges does industry face when deploying machine learning systems in the real world, and how can academia rise to meet those challenges?
In this seminar series, we want to take a look at the frontier of machine learning systems, and how machine learning changes the modern programming stack. Our goal is to help curate a curriculum of awesome work in ML systems to help drive research focus to interesting questions.
Starting in Fall 2020, we’ll be livestreaming each talk in this seminar series Thursdays 1-2 PT on YouTube, and taking questions from the live chat, and videos of the talks will be available on YouTube afterwards as well. Give our channel a follow, and tune in every week for an exciting discussion!
Read about our motivation for starting this seminar.
Check out our introductory video:
AbstractDeep learning is computation-hungry and data-hungry. We aim to improve the computation efficiency and data efficiency of deep learning. I will first talk about MCUNet that brings deep learning to IoT devices. The technique is tiny neural architecture search (TinyNAS) co-designed with a tiny inference engine (TinyEngine), enabling ImageNet-scale inference on an IoT device with only 1MB of FLASH. Next I will talk about TinyTL that enables on-device training, reducing the memory footprint by 7-13x. Finally, I will describe Differentiable Augmentation that enables data-efficient GAN training, generating photo-realistic images using only 100 images, which used to require tens of thousand of images. We hope such TinyML techniques can make AI greener, faster, and more sustainable.
AbstractMy students and I often find ourselves as "subject matter experts" needing to create video understanding models that serve computer graphics and video analysis applications. Unfortunately, like many, we are frustrated by how a smart grad student, armed with a large *unlabeled* video collection, an palette of pre-trained models, and an idea of what novel object or activity they want to detect/segment/classify, requires days-to-weeks to create and validate a model for their task. In this talk I will discuss challenges we've faced in the iterative process of curating data, training models, and validating models for the specific case of rare events and categories in image and video collections. In this regime we've found that conventional wisdom about training on imbalance data sets, and data acquisition via active learning does not lead to the most efficient solutions. I'll discuss these challenges in the context of image and video analysis applications, and elaborate on our ongoing vision of how a grad student, armed with massive amounts of unlabeled video data, pretrained models, and available-in-seconds-supercomputing-scale elastic compute should be able to interactively iterate on cycles of acquiring training data, training models, and validating models.
AbstractBayesian optimization has become a powerful method for the sample-efficient optimization of expensive black-box functions. These functions do not have a closed-form and are evaluated for example by running a complex economic simulation, by an experiment in the lab or in a market, or by a CFD simulation. Use cases arise in machine learning, e.g., when tuning the configuration of an ML model or when optimizing a reinforcement learning policy. Examples in engineering include the design of aerodynamic structures or materials discovery. In this talk I will introduce the key ideas of Bayesian optimization and discuss how they can be applied to tuning ML models. Moreover, I will share some experiences with developing a Bayesian optimization service in industry.
AbstractJAX is a system for high-performance machine learning research and numerical computing. It offers the familiarity of Python+NumPy together with hardware acceleration, plus a set of composable function transformations: automatic differentiation, automatic batching, end-to-end compilation (via XLA), parallelizing over multiple accelerators, and more. JAX's core strength is its guarantee that these user-wielded transformations can be composed arbitrarily, so that programmers can write math (e.g. a loss function) and transform it into pieces of an ML program (e.g. a vectorized, compiled, batch gradient function for that loss). JAX had its open-source release in December 2018 (https://github.com/google/jax). It's used by researchers for a wide range of applications, from studying training dynamics of neural networks, to probabilistic programming, to scientific applications in physics and biology.
AbstractThis talk covers what it means to operationalize ML models. It starts by analyzing the difference between ML in research vs. in production, ML systems vs. traditional software, as well as myths about ML production. It then goes over the principles of good ML systems design and introduces an iterative framework for ML systems design, from scoping the project, data management, model development, deployment, maintenance, to business analysis. It covers the differences between DataOps, ML Engineering, MLOps, and data science, and where each fits into the framework. It also discusses the main skills each stage requires, which can help companies in structuring their teams. The talk ends with a survey of the ML production ecosystem, the economics of open source, and open-core businesses.
AbstractOne of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today's models require. In this talk, I will describe our work on Snorkel (snorkel.org), an open-source framework for building and managing training datasets, and describe three key operators for letting users build and manipulate training datasets: labeling functions, for labeling unlabeled data; transformation functions, for expressing data augmentation strategies; and slicing functions, for partitioning and structuring training datasets. These operators allow domain expert users to specify machine learning (ML) models entirely via noisy operators over training data, expressed as simple Python functions---or even via higher level NL or point-and-click interfaces---leading to applications that can be built in hours or days, rather than months or years, and that can be iteratively developed, modified, versioned, and audited. I will describe recent work on modeling the noise and imprecision inherent in these operators, and using these approaches to train ML models that solve real-world problems, including recent state-of-the-art results on benchmark tasks and real-world industry, government, and medical deployments.
AbstractA defining characteristic of federated learning is the presence of heterogeneity, i.e., that data and compute may differ significantly across the network. In this talk I show that the challenge of heterogeneity pervades the machine learning process in federated settings, affecting issues such as optimization, modeling, and fairness. In terms of optimization, I discuss FedProx, a distributed optimization method that offers robustness to systems and statistical heterogeneity. I then explore the role that heterogeneity plays in delivering models that are accurate and fair to all users/devices in the network. Our work here extends classical ideas in multi-task learning and alpha-fairness to large-scale heterogeneous networks, enabling flexible, accurate, and fair federated learning.
AbstractAlthough enterprise adoption of machine learning is still early on, many enterprises in all industries already have hundreds of internal ML applications. ML powers business processes with an impact of hundreds of millions of dollars in industrial IoT, finance, healthcare and retail. Building and operating these applications reliably requires infrastructure that is different from traditional software development, which has led to significant investment in the construction of “ML platforms” specifically designed to run ML applications. In this talk, I’ll discuss some of the common challenges in productionizing ML applications based on experience building MLflow, an open source ML platform started at Databricks. MLflow is now the most widely used open source project in this area, with over 2 million downloads a month and integrations with dozens of other products. I’ll also highlight some interesting problems users face that are not covered deeply in current ML systems research, such as the need for “hands-free” ML that can train thousands of independent models without direct tuning from the ML developer for regulatory reasons, and the impact of privacy and interpretability regulations on ML. All my examples will be based on experience at large Databricks / MLflow customers.
AbstractWe will present CheckList, a task-agnostic methodology and tool for testing NLP models inspired by principles of behavioral testing in software engineering. We will show a lot of fun bugs we discovered with CheckList, both in commercial models (Microsoft, Amazon, Google) and research models (BERT, RoBERTA for sentiment analysis, QQP, SQuAD). We'll also present comparisons between CheckList and the status quo, in a case study at Microsoft and a user study with researchers and engineers. We show that CheckList is a really helpful process and tool for testing and finding bugs in NLP models, both for practitioners and researchers.
About The Seminar
This seminar is being run by Piero Molino, Dan Fu, Karan Goel, Fiodar Kazhamakia, Matei Zaharia, and Chris Ré. You can reach us at sysmlstanfordseminar [at] gmail.