Stanford MLSys Seminar Series


Machine learning is driving exciting changes and progress in computing. What does the ubiquity of machine learning mean for how people build and deploy systems and applications? What challenges does industry face when deploying machine learning systems in the real world, and how can academia rise to meet those challenges?

In this seminar series, we want to take a look at the frontier of machine learning systems, and how machine learning changes the modern programming stack. Our goal is to help curate a curriculum of awesome work in ML systems to help drive research focus to interesting questions.

We started livestreaming each talk in this seminar series every week on YouTube in Fall 2020, and we’ve been going strong ever since! Every week we take questions from the live chat, and keep videos of the talks available on YouTube afterwards as well. Give our channel a follow, and tune in every week for an exciting discussion!

Read about our motivation for starting this seminar.

Check out our introductory video:

Upcoming Talks

Coming soon!

Previous Talks

Roman Kazinnik
Machine Learning in Production: Review of Empirical Solutions
Abstract Taking stock of ML Infra problems with potential to benefit from systematic analysis. ML currently requires running large numbers of experiments to compensate for the lack of analysis. Modern AI infrastructure (major clouds) is efficient in creating, training, and deploying thousands of models. At the same time, improving production model performance, accurately estimating model performance in production, web data relevance, and risk mitigation are ad hoc and experiment-driven processes. Analytical analysis for Production [distributed, large-scale, rapidly changing environment] ML can help to direct and hopefully replace the empirical and manual processes.

Bio: Roman Kazinnik is working at Meta on the AI Platform team. He is an experienced computer programmer passionate about empirical and theoretical work. He worked on creating models for deep Earth oil exploration and stock trading throughout his career. He is a recipient of the best paper award of the European Assoc. of Computer Graphics, and he did his Master's at Technion and Ph.D. at Tel Aviv University, Israel.

Livestream Link
Alkis Polyzotis
What can Data-Centric AI Learn from Data and ML Engineering?
Abstract Data-centric AI is a new and exciting research topic in the AI community, but many organizations already build and maintain various "data-centric" applications whose goal is to produce high quality data. These range from traditional business data processing applications (e.g., "how much should we charge each of our customers this month?") to production ML systems such as recommendation engines. The fields of data and ML engineering have arisen in recent years to manage these applications, and both include many interesting novel tools and processes. In this talk we present lessons from data and ML engineering that could be interesting to apply in data-centric AI, based on our experience developing data and ML platforms that serve thousands of applications at a range of organizations. In particular, we will discuss lessons related to data monitoring and the challenges to apply it effectively in production ML systems.

Bio: Neoklis (Alkis) Polyzotis is a software engineer at Databricks, working on the intersection of data management and ML. Prior to that, he was a research scientist at Google and a professor at UC Santa Cruz. He received his PhD from the U of Wisconsin at Madison.

Video Link
Xinyu Hu and Olcay Cirit
DeepETA: How Uber Predicts Arrival Times Using Deep Learning
Abstract Estimated Time of Arrival (ETA) plays an important role in delivery and ride-hailing platforms. For example, Uber uses ETAs to calculate fares, estimate pickup times, match riders to drivers, plan deliveries, and more. Commonly used route planning algorithms predict an ETA conditioned on the best available route, but such ETA estimates can be unreliable when the actual route taken is not known in advance. In this talk, we describe an ETA post-processing system in which a deep residual ETA network (DeepETA) refines naive ETAs produced by a route planning algorithm. Offline experiments and online tests demonstrate that post-processing by DeepETA significantly improves upon the accuracy of naive ETAs as measured by mean and median absolute error. We further show that post-processing by DeepETA attains lower error than competitive baseline regression models.
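
The post-processing idea in this abstract (a learned residual added to a naive routing ETA) can be illustrated with a very small sketch. This is not DeepETA: a per-bucket mean residual stands in for the deep residual network, and the bucket names, trip data, and function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_residuals(trips):
    """Learn the mean (actual - naive) error for each feature bucket."""
    errors = defaultdict(list)
    for bucket, naive_eta, actual in trips:
        errors[bucket].append(actual - naive_eta)
    return {bucket: mean(values) for bucket, values in errors.items()}


def refine(naive_eta, bucket, residuals):
    """Post-process a naive ETA by adding the learned residual (0 if unseen)."""
    return naive_eta + residuals.get(bucket, 0.0)


def mae(pairs):
    """Mean absolute error over (prediction, actual) pairs."""
    return mean(abs(pred - actual) for pred, actual in pairs)


# Hypothetical trips: (bucket, naive ETA in minutes, actual duration in minutes).
train = [("rush", 10, 14), ("rush", 20, 25), ("night", 10, 9), ("night", 30, 28)]
held_out = [("rush", 15, 19), ("night", 20, 19)]

residuals = fit_residuals(train)
naive_mae = mae([(naive, actual) for _, naive, actual in held_out])
refined_mae = mae(
    [(refine(naive, bucket, residuals), actual) for bucket, naive, actual in held_out]
)
```

On this toy data the mean absolute error drops from 2.5 to 0.5 minutes, which only shows the mechanics of residual correction and says nothing about the gains reported in the talk.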

Bio: Xinyu Hu is a Senior Research Scientist at Uber, focusing on large-scale machine learning applications in spatial-temporal problems and causal inference. She currently works on projects in personalized incentives targeting, including user promotion targeting, spatial-temporal paid movement targeting, etc. Prior to Uber, Xinyu graduated from Columbia University with a Ph.D. in Biostatistics. Olcay Cirit is a Staff Research Scientist at Uber AI focused on ML systems and large-scale deep learning problems. Prior to Uber AI, he worked on ad targeting at Google.

Video Link
Arjun Akula
Improving Robustness and Interpretability in Vision and Language Grounding Models
Abstract Deep neural networks have enabled significant progress on many multi-modal grounding problems such as visual question answering (VQA) and referring expression recognition (REF), which have several important applications in areas such as navigation, medical imaging, robotics, and accessibility. In the last few years we have seen a huge improvement in how these models perform, some of them reaching human-level performance on several datasets. However, we find that these models could be exploiting strong biases in these datasets, casting doubt on the actual progress. For example, as a human, do you focus on the same visual object when you hear the sentences “the bus in the middle of the crowd” and “the crowd that the bus is in the middle of”? Neural networks do so. The exciting progress on understanding language in the context of an image is not due to the cleverness of the neural networks, but rather because of the shortcuts present in the evaluation datasets. In this talk, we show that state-of-the-art neural network approaches are easily fooled due to their failure in overcoming biases in the training datasets. We also show that the recent self-supervised BERT based multi-modal architectures (e.g. ViLBERT) are relatively more robust compared to other neural architectures. We propose methods to improve robustness (and generalization) of the current models. We show that while data augmentation is one way to increase robustness, multi-task learning is probably a less tedious route. Finally, we describe a mechanism for producing scalable and nonstationary benchmarks (and out-of-distribution hard splits) for testing the generalization capabilities of existing grounding models.

Bio: Arjun Akula is a Research Scientist at Google AI in Mountain View. He got his PhD from UCLA, jointly advised by Prof. Song-Chun Zhu (UCLA) and Prof. Joyce Chai (UMich). His research interests are in computer vision, natural language processing, and deep learning, with a focus on multi-modal grounding. Specifically, he works on identifying biases in state-of-the-art datasets and models, and on improving robustness of vision and language grounding models to out-of-distribution and adversarial inputs. He also works on making the underlying reasoning process of deep learning models more transparent and interpretable to human users. During his PhD, he interned at Amazon Alexa AI (Sunnyvale, CA), Google Research (Los Angeles, CA), Amazon AI (Palo Alto, CA) and Mila (Montreal). Prior to his PhD, he worked as a research software engineer at IBM Research AI (India) for 2.5 years. He did his Bachelors and Masters in Computer Science and Engineering from IIIT Hyderabad, India. He is an active member of the academic community serving as a reviewer/program committee member of ACL, CVPR, ARR, EMNLP, ICCV, AAAI, ECCV, NeurIPS and NAACL. Outside of work, he enjoys hiking, traveling, and playing table tennis. Here is a link to his personal website:

Video Link
Dan Fu
Improving Transfer and Robustness of Supervised Contrastive Learning
Abstract An ideal learned representation should display transferability and robustness. Supervised contrastive learning is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. In this talk, we discuss how to alleviate these problems to improve the geometry of supervised contrastive learning. We identify two key principles: balancing the right amount of geometric "spread" in the embedding space, and inducing an inductive bias towards subclass clustering. We introduce two mechanisms for achieving these aims in supervised contrastive learning, and show that doing so improves transfer learning and worst-group robustness. Next, we show how we can apply these insights to improve entity retrieval in open-domain NLP tasks (e.g., QA, search). We present a new method, TABi, that trains bi-encoders with a type-aware supervised contrastive loss and improves long-tailed entity retrieval.

Bio: Dan Fu is a PhD student in the Computer Science Department at Stanford University, where he is co-advised by Christopher Ré and Kayvon Fatahalian. His research focuses on understanding the principles behind why machine learning methods work and using that understanding to build the next generation of ML systems. He is supported by a Department of Defense NDSEG fellowship. Outside of work, he moonlights as a scuba diver and a competitive ballroom dancer.

Video Link
Kexin Rong
Learned Indexing and Sampling for Improving Query Performance in Big-Data Analytics
Abstract Traditional data analytics systems improve query efficiency via fine-grained, row-level indexing and sampling techniques. However, to keep up with the data volumes, increasingly many systems store and process datasets in large partitions containing hundreds of thousands of rows. Therefore, these analytics systems must adapt traditional techniques to work with coarse-grained data partitions as a basic unit to process queries efficiently. In this talk, I will discuss two related ideas that combine learning techniques with partitioning designs to improve the query efficiency in the analytics systems. First, I will describe PS3, the first approximate query processing system that supports non-uniform, partition-level samples. PS3 reduces the number of partitions accessed by 3 to 70x to achieve the same error compared to a uniform sample of the partitions. Next, I will present OLO, an online learning framework that dynamically adapts data organization according to changes in query workload to minimize overall data access and movement. We show that dynamic reorganization outperforms a single, optimized partitioning scheme by up to 30% in end-to-end runtime. I will conclude by discussing additional open problems in this area.

Bio: Kexin Rong is a postdoctoral researcher at VMware Research Group. Her research focuses on improving the efficiency and usability of large-scale data analytics. She received her Ph.D. in computer science from Stanford, advised by Peter Bailis and Philip Levis. She is joining Georgia Tech in the fall as an assistant professor in the School of Computer Science.

Video Link
Igor Markov
Looper: an end-to-end ML platform for product decisions
Abstract Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users, infrastructure and other systems. For broader adoption, this practice must (i) accommodate product engineers without ML backgrounds, (ii) support fine-grain product-metric evaluation and (iii) optimize for product goals. To address shortcomings of prior platforms, we introduce general principles for and the architecture of an ML platform, Looper, with simple APIs for decision-making and feedback collection. Looper covers the end-to-end ML lifecycle from collecting training data and model training to deployment and inference, and extends support to personalization, causal evaluation with heterogeneous treatment effects, and Bayesian tuning for product goals. During the 2021 production deployment, Looper simultaneously hosted 440-1,000 ML models that made 4-6 million real-time decisions per second. We sum up experiences of platform adopters and describe their learning curve.

Bio: Igor L. Markov is a Research Scientist at Meta, previously an EECS professor at the University of Michigan. He received his Ph.D. in Computer Science from UCLA, is currently an IEEE Fellow and an ACM Distinguished Scientist. Prof. Markov researches computers that make computers. He has co-authored five books, four US patents, and over 200 refereed publications, some of which were honored by the best-paper awards at the Design Automation and Test in Europe Conference (DATE), the Int'l Symposium on Physical Design (ISPD), the Int'l Conference on Computer-Aided Design (ICCAD) and IEEE Trans. on Computer-Aided Design (TCAD). During the 2011 redesign of the ACM Computing Classification System, Prof. Markov led the effort on the Hardware tree. Prof. Markov is the recipient of a DAC Fellowship, an ACM SIGDA Outstanding New Faculty award, an NSF CAREER award, an IBM Partnership Award, a Microsoft A. Richard Newton Breakthrough Research Award, and the inaugural IEEE CEDA Early Career Award. He has served on the Executive Board of ACM SIGDA and Editorial Boards of several ACM and IEEE Transactions, Communications of the ACM and IEEE Design & Test.

Video Link
Zhuohan Li
Alpa: Automated Model-Parallel Deep Learning
Abstract Alpa automates model-parallel training of large deep learning models by generating execution plans that unify data, operator, and pipeline parallelism. Alpa distributes the training of large deep learning models by viewing parallelisms as two hierarchical levels: inter-operator and intra-operator parallelisms. Based on it, Alpa constructs a new hierarchical space for massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive the optimal parallel execution plan in each independent parallelism level and implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on models they are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and models without manually-designed plans.

Bio: Zhuohan Li is a PhD student in Computer Science at UC Berkeley, advised by Prof. Ion Stoica. His interest lies in the intersection of machine learning and distributed systems in general. His recent research focuses on distributed model parallel training and inference. He completed his BS at Peking University and has interned at Microsoft Research, Anyscale, and Google Brain.

Video Link
Shruti Bhosale
Scaling Multilingual Machine Translation to Thousands of Language Directions
Abstract Existing work in translation has demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this talk, I will describe how we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT.

Bio: Shruti Bhosale is a Research Engineer at Facebook AI Research in Menlo Park, focusing on Natural Language Processing. She currently works on projects in massively multilingual machine translation and natural language understanding/generation. Her recent work includes many-to-many machine translation for 100 languages, BASE Layers and efficient large-scale language models with Mixture of Experts. She graduated with a Master's Degree in Computer Science from University of Texas at Austin. Prior to Facebook, Shruti built models for people recommendation systems at LinkedIn.

Video Link
Vijay Janapa Reddi
Tiny Machine Learning
Abstract Tiny machine learning (TinyML) is a fast-growing field at the intersection of ML algorithms and low-cost embedded systems. TinyML enables on-device analysis of sensor data (vision, audio, IMU, etc.) at ultra-low-power consumption (<1mW). Processing data close to the sensor allows for an expansive new variety of always-on ML use-cases that preserve bandwidth, latency, and energy while improving responsiveness and maintaining privacy. This talk introduces the vision behind TinyML and showcases some of the interesting applications that TinyML is enabling in the field, from wildlife conservation to supporting public health initiatives. Yet, there are still numerous technical hardware and software challenges to address. Tight memory and storage constraints, MCU heterogeneity, software fragmentation and a lack of relevant large-scale datasets pose a substantial barrier to developing TinyML applications. To this end, the talk touches upon some of the research opportunities for unlocking the full potential of TinyML.

Bio: Vijay Janapa Reddi is an Associate Professor at Harvard University, VP and a founding member of MLCommons, a nonprofit organization aiming to accelerate machine learning (ML) innovation for everyone. He also serves on the MLCommons board of directors and is a Co-Chair of the MLCommons Research organization. He co-led the MLPerf Inference ML benchmark for data center, edge, mobile and IoT systems. Dr. Janapa Reddi is a recipient of multiple honors and awards, including the National Academy of Engineering (NAE) Gilbreth Lecturer Honor and IEEE TCCA Young Computer Architect Award. He is passionate about widening access to applied machine learning for STEM, Diversity, and using AI for social good. He designed the Tiny Machine Learning (TinyML) series on edX, a massive open online course (MOOC) that sits at the intersection of embedded systems and ML that tens of thousands of global learners access and audit free of cost. He received a Ph.D. in CS from Harvard University, an M.S. from the University of Colorado at Boulder and a B.S. from Santa Clara University.

Video Link
Fait Poms
A vision for interactive model development: efficient machine learning by bringing domain experts in the loop
Abstract Building computer vision models today is an exercise in patience--days to weeks for human annotators to label data, hours to days to train and evaluate models, weeks to months of iteration to reach a production model. Without tolerance for this timeline or access to the massive compute and human resources required, building an accurate model can be challenging if not impossible. In this talk, we discuss a vision for interactive model development with iteration cycles of minutes, not weeks. We believe the key to this is integrating the domain expert at key points in the model building cycle and leveraging supervision cues above just example-level annotation. We will discuss our recent progress toward aspects of this goal: judiciously choosing when to use the machine and when to use the domain expert for fast, low label budget model training (CVPR 2021, ICCV 2021), building confidence in model performance with low-shot validation (ICCV 2021 Oral), and some initial tools for rapidly defining correctness criteria.

Bio: Fait Poms is a Ph.D. student at Stanford advised by Prof. Kayvon Fatahalian and a Senior Applied Research Scientist at Snorkel AI. Her research concerns designing algorithms and systems that enable domain experts to rapidly define, train, and validate computer vision models for specialized tasks. She has done research internships at Snorkel AI (with Braden Hancock and Alex Ratner), Facebook Reality Labs (with Yaser Sheikh, Chenglei Wu, and Shoou-I Yu), and NVIDIA Research (with Michael Garland and Michael Bauer), and has transferred her research into production at Snorkel AI and Facebook. Her work has appeared at CVPR, ICCV, and SIGGRAPH. Website:

Video Link
Doris Lee
Always-on Dataframe Visualizations with Lux
Abstract Visualizations help data scientists discover trends and patterns, identify outliers, and derive insights from their data. However, existing visualization libraries in Python require users to write a substantial amount of code for plotting even a single visualization, often hindering the flow of data exploration. In this talk, you will learn about Lux, a lightweight visualization tool on top of pandas dataframes. Lux recommends visualizations for free to users as they explore their data within a Jupyter notebook without the need to write additional code. Lux is used by data scientists across a variety of industries and sectors and has nearly 66k total downloads and over 3.3k stars on GitHub. For more information, see:

Bio: Doris Lee is the co-founder and CEO of Ponder. She graduated with her Ph.D. from the RISE Lab and School of Information at UC Berkeley in 2021. During this time, she developed several data science tools aimed at accelerating insight discovery, including Lux, a lightweight visualization tool on top of pandas dataframes. She is the recipient of the Facebook Ph.D. Fellowship in Systems for Machine Learning in 2020. More at:

Video Link
Ellie Pavlick
Implementing Symbols and Rules with Neural Networks
Abstract Many aspects of human language and reasoning are well explained in terms of symbols and rules. However, state-of-the-art computational models are based on large neural networks which lack explicit symbolic representations of the type frequently used in cognitive theories. One response has been the development of neuro-symbolic models which introduce explicit representations of symbols into neural network architectures or loss functions. In terms of Marr's levels of analysis, such approaches achieve symbolic reasoning at the computational level ("what the system does and why") by introducing symbols and rules at the implementation and algorithmic levels. In this talk, I will consider an alternative: can neural networks (without any explicit symbolic components) nonetheless implement symbolic reasoning at the computational level? I will describe several diagnostic tests of "symbolic" and "rule-governed" behavior and use these tests to analyze neural models of visual and language processing. Our results show that on many counts, neural models appear to encode symbol-like concepts (e.g., conceptual representations that are abstract, systematic, and modular), but not perfectly so. Analysis of the failure cases reveals that future work is needed on methodological tools for analyzing neural networks, as well as refinement of models of hybrid neuro-symbolic reasoning in humans, in order to determine whether neural networks' deviations from the symbolic paradigm are a feature or a bug.

Bio: Ellie Pavlick is an Assistant Professor of Computer Science at Brown University, where she leads the Language Understanding and Representation (LUNAR) Lab, and a Research Scientist at Google. Her research focuses on building computational models of language that are inspired by and/or informative of language processing in humans. Currently, her lab is investigating the inner-workings of neural networks in order to "reverse engineer" the conceptual structures and reasoning strategies that these models use, as well as exploring the role of grounded (non-linguistic) signals for word and concept learning. Ellie's work is supported by DARPA, IARPA, NSF, and Google.

Video Link
Cody Coleman
Data selection for Data-Centric AI: Data Quality Over Quantity
Abstract Data selection methods, such as active learning and core-set selection, improve the data efficiency of machine learning by identifying the most informative data points to label or train on. Across the data selection literature, there are many ways to identify these training examples. However, classical data selection methods are prohibitively expensive to apply in deep learning because of the larger datasets and models. This talk will describe two techniques to make data selection methods more tractable. First, "selection via proxy" (SVP) avoids expensive training and reduces the computation per example by using smaller proxy models to quantify the informativeness of each example. Second, "similarity search for efficient active learning and search" (SEALS) reduces the number of examples processed by restricting the candidate pool for labeling to the nearest neighbors of the currently labeled set instead of scanning over all of the unlabeled data. Both methods lead to order of magnitude performance improvements, making active learning applications on billions of unlabeled images practical for the first time.
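
The candidate-pool restriction described for SEALS can be sketched as follows. This is a minimal illustration, assuming points are plain coordinate tuples and using a brute-force neighbor search in place of a real similarity-search index; the function names and data are hypothetical and not taken from the talk.

```python
import math


def nearest_neighbors(point, pool, k):
    """Indices of the k points in pool closest to point (brute force)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def candidate_pool(labeled, unlabeled, k):
    """Union of the k nearest unlabeled neighbors of every labeled point."""
    candidates = set()
    for point in labeled:
        candidates.update(nearest_neighbors(point, unlabeled, k))
    return sorted(candidates)


# Hypothetical 2-D embeddings of labeled and unlabeled examples.
labeled = [(0.0, 0.0), (10.0, 10.0)]
unlabeled = [
    (0.1, 0.0), (0.0, 0.3), (5.0, 5.0),
    (9.9, 10.0), (10.0, 9.5), (50.0, 50.0),
]

pool = candidate_pool(labeled, unlabeled, k=2)
```

Here an active learning step would score only the four unlabeled points at indices 0, 1, 3, and 4 rather than all six, which is the source of the savings when the unlabeled set is very large.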

Bio: Cody Coleman is the Founder and CEO of Coactive AI. He is also a co-creator of DAWNBench and MLPerf and a founding member of MLCommons. His work spans from performance benchmarking of machine learning systems to computationally efficient methods for active learning and core-set selection. He holds a PhD in Computer Science from Stanford University, where Professors Matei Zaharia and Peter Bailis advised him, and an MEng and BS from MIT.

Video Link
Bilge Acun
Designing Sustainable Datacenters with and for AI
Abstract Machine learning has witnessed exponential growth over the recent years. In this talk, we will first explore the environmental implications of the super-linear growth trend of AI from a holistic perspective, spanning data, algorithms, and system hardware. System efficiency optimizations can significantly help reduce the carbon footprint of AI systems. However, predictions show that the efficiency improvements will not be enough to reduce the overall resource needs of AI, as the Jevons paradox suggests that "efficiency increases consumption". Therefore, we need to design our datacenters with sustainability in mind, using renewable energy every hour of every day. Relying on wind and solar energy 24/7 is challenging due to their intermittent nature. To cope with the fluctuations of renewable energy generation, multiple solutions can be applied such as energy storage and carbon aware scheduling for the workloads. In this talk, I will introduce a framework to analyze the multi-dimensional solution space by taking into account the operational and embodied footprint of the solutions and further how AI can be a part of the solution.

Bio: Bilge Acun is a Research Scientist at Meta AI (/FAIR). Her research lies in the intersection of energy efficient and sustainable system design and machine learning. Her work at Meta included making large scale machine learning systems more efficient through algorithmic and system optimizations. She received her Ph.D. degree in 2017 from the Department of Computer Science at University of Illinois at Urbana-Champaign. Her dissertation was awarded the 2018 ACM SIGHPC Dissertation Award Honorable Mention. Before joining FAIR, she worked at the IBM Thomas J. Watson Research Center as a Research Staff Member.

Video Link
Fred Sala
Efficiently Constructing Datasets for Diverse Datatypes
Abstract Building large datasets for data-hungry models is a key challenge in modern machine learning. Weak supervision frameworks have become a popular way to bypass this bottleneck. These approaches synthesize multiple noisy but cheaply-acquired estimates of labels into a set of high-quality pseudolabels for downstream training. In this talk, I introduce a technique that fuses weak supervision with structured prediction, enabling WS techniques to be applied to extremely diverse types of data. This approach allows for labels that can be continuous, manifold-valued (including, for example, points in hyperbolic space), rankings, sequences, graphs, and more. I will discuss theoretical guarantees for this universal weak supervision technique, connecting the consistency of weak supervision estimators to low-distortion embeddings of metric spaces. I will show experimental results in a variety of problems, including learning to rank, geodesic regression, and semantic dependency parsing. Finally I will present and discuss future opportunities for automated dataset construction.

Bio: Frederic Sala is an Assistant Professor in the Computer Sciences Department at the University of Wisconsin-Madison and a research scientist at Snorkel AI. His research studies the foundations of data-driven systems, with a focus on machine learning systems. Previously, he was a postdoctoral researcher in the Stanford CS department. He received his Ph.D. in Electrical Engineering from UCLA.

Video Link
Deepak Narayanan
Resource-Efficient Execution of Deep Learning Computations
Abstract Deep Learning models have enabled state-of-the-art results across a broad range of applications; however, training these models is extremely time- and resource-intensive, taking weeks on clusters with thousands of expensive accelerators in the extreme case. In this talk, I will describe two ideas that help improve the resource efficiency of model training. In the first half of the talk, I will discuss how pipelining can be used to accelerate distributed training. Pipeline parallelism facilitates model training with lower communication overhead than previous methods while still ensuring high compute resource utilization. Pipeline parallelism also enables the efficient training of large models that do not fit on a single worker; for example, we used pipeline parallelism at Nvidia to efficiently scale training to language models with a trillion parameters on 3000+ GPUs. In the second half of this talk, I will describe how resources in a shared cluster with heterogeneous compute resources (e.g., different types of hardware accelerators) should be partitioned among different users to optimize objectives specified over one or more training jobs. Heterogeneity-aware scheduling can improve various scheduling objectives, such as average completion time, makespan, or cloud computing resource cost, by up to 3.5x.
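
To make the utilization argument for pipelining concrete, here is a toy, forward-only schedule in which stage s processes microbatch b at step s + b. It is a sketch under simplifying assumptions (equal-cost stages, no backward pass, no communication cost) and is not the schedule used in the work described in the talk; the function names are hypothetical.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return schedule[t][s]: the microbatch stage s runs at step t, or None if idle."""
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for s in range(num_stages):
            b = t - s  # microbatch b reaches stage s after s steps
            row.append(b if 0 <= b < num_microbatches else None)
        schedule.append(row)
    return schedule


def idle_fraction(schedule):
    """Fraction of (step, stage) slots in which a stage has nothing to do."""
    slots = [slot for row in schedule for slot in row]
    return slots.count(None) / len(slots)


one_batch = idle_fraction(pipeline_schedule(num_stages=4, num_microbatches=1))
eight_microbatches = idle_fraction(pipeline_schedule(num_stages=4, num_microbatches=8))
```

With four stages, a single batch leaves 75% of the slots idle, while splitting the work into eight microbatches brings that down to 12 of 44 slots (about 27%). In this toy model the idle share is (p - 1) / (m + p - 1) for p stages and m microbatches, so it shrinks as more microbatches are in flight.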

Bio: Deepak is a Senior Researcher in the Systems group at Microsoft Research Redmond. His broad research interests are in distributed systems and systems for Machine Learning. He graduated from Stanford with a Ph.D. in Computer Science in September 2021, where he was advised by Prof. Matei Zaharia.

Video Link
Beidi Chen
Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models
Abstract Overparameterized neural networks generalize well but are expensive to train. Ideally, one would like to reduce their computational cost while retaining their generalization benefits. Sparse model training is a simple and promising approach to achieve this, but there remain challenges as existing methods struggle with accuracy loss, slow training runtime, or difficulty in sparsifying all model components. The core problem is that searching for a sparsity mask over a discrete set of sparse matrices is difficult and expensive. To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices. As butterfly matrices are not hardware efficient, we propose simple variants of butterfly (block and flat) to take advantage of modern hardware. Our method (Pixelated Butterfly) uses a simple fixed sparsity pattern based on flat block butterfly and low-rank matrices to sparsify most network layers (e.g., attention, MLP). We empirically validate that Pixelated Butterfly is 3x faster than butterfly and speeds up training to achieve favorable accuracy--efficiency tradeoffs. On the ImageNet classification and WikiText-103 language modeling tasks, our sparse models train up to 2.5x faster than the dense MLP-Mixer, Vision Transformer, and GPT-2 medium with no drop in accuracy.

Bio: Beidi Chen is a Postdoctoral scholar in the Department of Computer Science at Stanford University, working with Dr. Christopher Ré. Her research focuses on large-scale machine learning and deep learning. Specifically, she designs and optimizes randomized algorithms (algorithm-hardware co-design) to accelerate large machine learning systems for real-world problems. Prior to joining Stanford, she received her Ph.D. in the Department of Computer Science at Rice University, advised by Dr. Anshumali Shrivastava. She received a BS in EECS from UC Berkeley in 2015. She has held internships in Microsoft Research, NVIDIA Research, and Amazon AI. Her work has won Best Paper awards at LISA and IISA. She was selected as a Rising Star in EECS by MIT and UIUC.

Video Link
Mosharaf Chowdhury
Systems Support for Federated Computation
Abstract Although theoretical federated learning research is growing exponentially, we are far from putting those theories into practice. In this talk, I will share our ventures into building practical systems for two extremities of federated learning. Sol is a cross-silo federated learning and analytics system that tackles network latency and bandwidth challenges faced by distributed computation between far-apart data sites. Oort, in contrast, is a cross-device federated learning system that enables training and testing on representative data distributions despite unpredictable device availability. Both deal with systems and network characteristics in the wild that are hard to account for in analytical models. I'll then share the challenges in systematically evaluating federated learning systems that have led to a disconnect between theoretical conclusions and performance in the wild. I'll conclude this talk by introducing FedScale, which is an extensible framework for evaluation and benchmarking in realistic settings to democratize practical federated learning for researchers and practitioners alike. All these systems are open-source and available at

Bio: Mosharaf Chowdhury is a Morris Wellman assistant professor of CSE at the University of Michigan, Ann Arbor, where he leads the SymbioticLab. His recent research is on application-infrastructure co-design for federated learning, resource disaggregation, and systems for AI and Big Data. In the past, Mosharaf invented coflows and was a co-creator of Apache Spark. Artifacts from his research are widely used in cloud datacenters. He has received many individual honors and awards as well as best-of-conference awards thanks to his amazing students and collaborators. He received his Ph.D. from the AMPLab at UC Berkeley in 2015.

Video Link
Zain Asgar
Data science for infrastructure using Pixie
Abstract Pixie is a Kubernetes-native observability platform that helps developers explore, monitor, secure, and manage their applications. Pixie is a Cloud Native Computing Foundation Sandbox Project. Pixie utilizes eBPF to automatically collect telemetry data, which is stored on edge nodes. This data is usable in Pixie via a Pandas-like interface, allowing construction of complex data workflows, including machine learning. This talk will provide an overview of Pixie, some of the problems that we solved, and future work we are looking into.

Bio: Zain Asgar is a GM/GVP – Pixie & Open Source @ New Relic. Prior to this, Zain was the co-founder/CEO of Pixie Labs (acquired by New Relic). Zain is also an Adjunct Professor of Computer Science at Stanford University and was an Entrepreneur in Residence at Benchmark before co-founding Pixie. He has a PhD from Stanford and has helped build at-scale data and AI/ML at Google AI, Trifacta and Nvidia.

Video Link
Albert Gu
Efficiently Modeling Long Sequences with Structured State Spaces
Abstract A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of 10000 or more steps. We introduce a simple sequence model based on the fundamental state space representation $x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t)$ and show that it combines the strengths of several model families. Furthermore, we show that the HiPPO theory of continuous-time memorization can be incorporated into the state matrix $A$, producing a class of structured models that handles long-range dependencies mathematically and can be computed very efficiently. The Structured State Space (S3) model achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet, (ii) substantially closing the gap to Transformers on image and language modeling tasks, while performing generation 60X faster, (iii) SotA on every task from the Long Range Arena benchmark, including solving the challenging Path-X task of length 16k that all prior work fails on, while being as efficient as all competitors.
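As a rough illustration of the state space representation in the abstract above, the sketch below discretizes a scalar instance of $x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t)$ and evaluates it two ways: step by step like a recurrent model, and as a causal convolution. It is not the structured model from the talk: the scalar state, the bilinear discretization, the step size `dt`, and all function names are assumptions chosen for illustration, and it uses neither the HiPPO matrix nor any efficient kernel computation.

```python
def discretize(a, b, dt):
    """Bilinear (Tustin) discretization of the scalar ODE x'(t) = a*x(t) + b*u(t)."""
    denom = 1.0 - dt * a / 2.0
    a_bar = (1.0 + dt * a / 2.0) / denom
    b_bar = dt * b / denom
    return a_bar, b_bar


def run_recurrent(a_bar, b_bar, c, d, inputs):
    """Step the discrete state one input at a time, like an RNN."""
    x = 0.0  # initial state is assumed to be zero
    outputs = []
    for u in inputs:
        x = a_bar * x + b_bar * u
        outputs.append(c * x + d * u)
    return outputs


def run_convolutional(a_bar, b_bar, c, d, inputs):
    """Compute the same outputs as a causal convolution with kernel K[j] = c * a_bar**j * b_bar."""
    n = len(inputs)
    kernel = [c * (a_bar ** j) * b_bar for j in range(n)]
    outputs = []
    for k in range(n):
        acc = sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        outputs.append(acc + d * inputs[k])
    return outputs


if __name__ == "__main__":
    a_bar, b_bar = discretize(a=-1.0, b=1.0, dt=0.1)
    signal = [1.0, 0.5, -0.25, 0.0, 2.0]
    print(run_recurrent(a_bar, b_bar, 0.5, 0.1, signal))
    print(run_convolutional(a_bar, b_bar, 0.5, 0.1, signal))
```

Both functions produce the same outputs up to floating-point rounding, which is one way to see how a single state space model can be read as either a recurrent or a convolutional model.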

Bio: Albert Gu is a PhD student in the Stanford CS department, advised by Chris Ré. His research interests include algorithms for structured linear algebra and theoretical principles of deep sequence models.

Video Link
Baharan Mirzasoleiman
Data-efficient and Robust Learning from Massive Datasets
Abstract Large datasets have been crucial to the success of modern machine learning models. However, training on massive data has two major limitations. First, it is contingent on exceptionally large and expensive computational resources, and incurs a substantial cost due to the significant energy consumption. Second, in many real-world applications such as medical diagnosis, self-driving cars, and fraud detection, big data contains highly imbalanced classes and noisy labels. In such cases, training on the entire data does not result in a high-quality model. In this talk, I will argue that we can address the above limitations by developing techniques that can identify and extract the representative subsets for learning from massive datasets. Training on representative subsets not only reduces the substantial costs of learning from big data, but also improves their accuracy and robustness against noisy labels. I will discuss how we can develop theoretically rigorous techniques that provide strong guarantees for the quality of the extracted subsets, as well as the learned models’ quality and robustness against noisy labels. I will also show the effectiveness of such methods in practice for data-efficient and robust learning.

Bio: Baharan Mirzasoleiman is an Assistant Professor in the Computer Science Department at UCLA. Her research focuses on developing new methods that enable efficient machine learning from massive datasets. Her methods have immediate application to high-impact problems where massive data volumes prohibit efficient learning and inference, such as huge image collections, recommender systems, Web and social services, video and other large data streams. Before joining UCLA, she was a postdoctoral research fellow in Computer Science at Stanford University. She received her Ph.D. in Computer Science from ETH Zurich. She received an ETH medal for Outstanding Doctoral Thesis, and was selected as a Rising Star in EECS by MIT.

Video Link
Gideon Mendels
MLOps System Design for Development and Production
Abstract While ML model development is a challenging process, the management of these models becomes even more complex once they're in production. Shifting data distributions, upstream pipeline failures, and model predictions impacting the very dataset they’re trained on can create thorny feedback loops between development and production. In this talk, we’ll examine some naive ML workflows that don’t take the development-production feedback loop into account and explore why they break down. Next, we'll showcase some system design principles that will help manage these feedback loops more effectively. Finally, we’ll examine several industry case studies where teams have applied these principles to their production ML systems.

Video Link
Zhihao Jia
Automatically Discovering Machine Learning Optimizations
Abstract As an increasingly important workload, machine learning (ML) applications require different performance optimization techniques from traditional runtimes and compilers. In particular, to accelerate ML applications, it is generally necessary to perform ML computations on distributed heterogeneous hardware platforms and parallelize computations using multiple data dimensions, neither of which is even expressible in traditional compilers and runtimes. In this talk, I will present our recent work on automated discovery of performance optimizations for ML by leveraging the mathematical and statistical properties of ML computations. Compared to existing ML systems, our approaches enable faster ML training/inference and stronger correctness guarantees while requiring significantly less human effort.

Bio: Zhihao Jia is an assistant professor of computer science at Carnegie Mellon University. He obtained his Ph.D. from the computer science department at Stanford working with Alex Aiken and Matei Zaharia. His research interests lie in the intersection of computer systems and machine learning, with a focus on building efficient, scalable, and automated systems for ML computations.

Video Link
Baishakhi Ray
Improving Software Reliability using Machine Learning
Abstract Software bugs cost millions of dollars to the US economy. Improving software reliability has been one of the primary concerns of Software Engineering, Security, Programming Language, and Verification research for decades. Researchers have developed numerous automatic bug-finding tools, based either on static code analysis or on analyzing dynamic code behavior. However, the adoption of these methods in the real world is still limited, partly because most of them require a significant amount of manual work from developers and have a steep learning curve. In this talk, I will discuss how machine learning-based approaches can help us to automate and scale up the bug-finding (especially with respect to fuzz-testing) and bug-fixing process for large real-world programs.

Bio: Baishakhi Ray is an Associate Professor in the Department of Computer Science, Columbia University, NY, USA. She received her Ph.D. degree in Electrical & Computer Engineering from the University of Texas, Austin. Baishakhi's research interest is in the intersection of Software Engineering and Machine Learning. Baishakhi has received the NSF CAREER award, IBM Faculty Award, VMware Early Career Faculty Award, and many Best Paper awards, including at FASE 2020, FSE 2017, MSR 2017, and the IEEE Symposium on Security and Privacy (Oakland) 2014. Her research has also been published in CACM Research Highlights and has been widely covered in trade media.

Video Link
Mi Zhang
Empowering the Next Billion Devices with Deep Learning
Abstract The proliferation of edge devices and the gigantic amount of data they generate make it no longer feasible to transmit all the data to the cloud for processing. Such constraints fuel the need to move the intelligence from the cloud to the edge where data reside. In this talk, we will present our work on bringing the power of deep learning to edge devices to realize the vision of Artificial Intelligence of Things. First, we will present our work on designing adaptive frameworks that empower AI-embedded edge devices to adapt to the inherently dynamic runtime resources to enable elastic on-device AI. Second, we shift from the single edge device setting to the distributed setting for the task of distributed on-device inference. We will focus on one killer application of edge computing, and present a distributed workload-adaptive framework for low-latency high-throughput large-scale live video analytics. Third, we will present our work on designing a distributed on-device training framework that significantly enhances the on-device training efficiency without compromising the training quality. The results and insights obtained in those works are also useful in designing many other modern machine learning systems.

Bio: Mi Zhang is an Associate Professor and the Director of the Machine Learning Systems Lab at Michigan State University. He received his Ph.D. from University of Southern California and B.S. from Peking University. Before joining MSU, he was a postdoctoral scholar at Cornell University. His research lies at the intersection of systems and machine learning, spanning areas including On-Device AI, Automated Machine Learning (AutoML), Federated Learning, Systems for Machine Learning, and Machine Learning for Systems. He is the 4th Place Winner of the 2019 Google MicroNet Challenge, the Third Place Winner of the 2017 NSF Hearables Challenge, and the champion of the 2016 NIH Pill Image Recognition Challenge. He is the recipient of six best paper awards and nominations. He is also the recipient of the Facebook Faculty Research Award, Amazon Machine Learning Research Award, and MSU Innovation of the Year Award.

Video Link
Dennis Shasha and Mustafa Anil Kocak
SafePredict and Friends
Abstract SafePredict is a meta-algorithm for machine learning applications that strategically refuses to accept the predictions of an underlying machine learning algorithm or algorithms. The goal is to achieve a user-specified correctness rate on the non-refused predictions without refusing too much. We show applications to an on-line learning setting in which the data-to-class mapping is not independent and identically distributed (not iid). In related work, we look at classification problems where we are willing to guess, on average, k classes in the hopes that one is correct. We compare such an approach to one in which we always choose the top k most likely classes. Finally, we consider the problem of selective sampling in settings where evaluating each sample is expensive. We build on and improve the Horvitz-Thompson and Augmented Inverse Probability Weighted sampling methods.
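For readers unfamiliar with the Horvitz-Thompson method named in the abstract above, here is a minimal sketch of the textbook estimator only, not of the improvements discussed in the talk. Each sampled value is weighted by the inverse of its inclusion probability, which gives an unbiased estimate of the population total when every probability is positive. The independent (Poisson) sampling scheme and the function names are assumptions made for illustration.

```python
import random


def poisson_sample(values, probs, rng):
    """Include item i independently with probability probs[i]; keep (value, prob) pairs."""
    return [(y, p) for y, p in zip(values, probs) if rng.random() < p]


def horvitz_thompson_total(sample):
    """Estimate the population total by weighting each sampled value by 1 / inclusion probability."""
    return sum(y / p for y, p in sample)


if __name__ == "__main__":
    rng = random.Random(0)
    values = [float(v) for v in range(1, 101)]
    # Hypothetical sampling design: expensive-to-evaluate items are sampled at 20%.
    probs = [0.2] * len(values)
    sample = poisson_sample(values, probs, rng)
    print("true total:", sum(values))
    print("estimated total:", horvitz_thompson_total(sample))
```

Only the sampled items need to be evaluated, which is the appeal when each evaluation is expensive; the estimate varies from run to run, and its variance depends on how the inclusion probabilities are chosen.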

Bio: Dennis Shasha is a Julius Silver Professor of computer science at the Courant Institute of New York University and an Associate Director of NYU Wireless. He works on meta-algorithms for machine learning to achieve guaranteed correctness rates; with biologists on pattern discovery for network inference; on automated verification for concurrent algorithms; on a tool for policy planners facing epidemics; on tree and graph matching; on algorithms for time series for finance and migratory patterns; on database tuning; and on computational reproducibility. Because he likes to type, he has written six books of puzzles about a mathematical detective named Dr. Ecco, a biography about great computer scientists, and a book about the future of computing. He has also written eight technical books about database tuning, biological pattern recognition, time series, DNA computing, resampling statistics, causal inference in molecular networks, and the automated verification of concurrent search structures. He has co-authored more than 85 journal papers, 80 conference papers, and 25 patents. Because he loves puzzles, he has written the puzzle column for various publications including Scientific American, Dr. Dobb's Journal, and currently the Communications of the ACM. He is a fellow of the ACM and an INRIA International Chair.

Video Link
Laurel Orr
Towards Transparent Foundations -- Building Accessible Infrastructure for Training Large-Scale Language Models
Abstract “Foundation models” (large-scale self-supervised models that can be adapted to a wide range of downstream tasks) are changing how machine learning systems are constructed and deployed. Due to their extreme resource demands, training these models and developing a science behind them has remained difficult. In this talk, I'll introduce and describe the journey behind Mistral, an infrastructure for accessible, easy-to-use foundation model training. I'll describe some of the hurdles we encountered with stable, reproducible training and how we see Mistral as a crucial step to facilitate open foundation model research.

Bio: Laurel Orr is currently a PostDoc at Stanford working with Chris Ré in the Hazy Research Lab. In August of 2019, she graduated with a PhD from the Paul G. Allen School of Computer Science and Engineering at the University of Washington in Seattle. She was part of the Database Group and advised by Dan Suciu and Magdalena Balazinska. Her research interests are broadly at the intersection of machine learning and data management. She focuses on how to manage the end-to-end lifecycle of self-supervised embedding pipelines. This includes problems of how to better train, maintain, monitor, and patch the embedding models and their use downstream.

Video Link
Pooyan Jamshidi
Causal AI for Systems
Abstract In this talk, I will present the recent progress in employing Causal AI (causal structure learning, causal inference, counterfactual reasoning, causal representation learning, and causal transfer learning) in addressing several significant and outstanding challenges in computer systems. Next, I will present our Causal AI approach for robust performance engineering (performance debugging, performance optimization, and performance predictions) in highly configurable composed systems. In particular, I will present our latest results for identifying and repairing performance faults in on-device ML systems and big data analytics pipelines. Finally, I will conclude by discussing future directions and opportunities of Causal AI in testing autonomous robots and dynamic reconfiguration of serverless systems and microservices.

Bio: Pooyan Jamshidi is an assistant professor in the computer science and engineering department at the University of South Carolina and a visiting researcher at Google AdsAI. His primary research interest is at the intersections of machine learning and systems.

Video Link
Chaoyang He
Distributed ML System for Large-scale Models: Dynamic Distributed Training and Scalable Federated Learning
Abstract In modern AI, large-scale deep learning models have emerged as the core technology behind many important Internet businesses, such as search, ads, recommendation systems, CV, and NLP. BERT, Vision Transformer, GPT-3, and Switch Transformer models scale up the model size to billions or even trillions of parameters, showing non-trivial accuracy improvements for nearly all learning tasks. Distributed training using cloud clusters is key to the successful training of such large-scale models in a timely manner. Developing more advanced distributed training systems and algorithms can either reduce the energy cost or enable us to train even larger models. Furthermore, it is also essential to develop disruptive learning paradigms like federated learning, which can not only protect the privacy of users but also distribute the burden of handling unprecedented big data and models. This talk will mainly focus on distributed ML systems for large-scale models: dynamic distributed training for cloud clusters and scalable federated learning for edge devices. In the first part, I will introduce PipeTransformer, an automated elastic pipelining system for distributed training of Transformer models (BERT and ViT). In PipeTransformer, we design an adaptive on-the-fly freeze algorithm that can identify and freeze some layers gradually during training, and an elastic pipelining system that can dynamically reduce the GPU resources used to train the remaining active layers and fork more pipelines on the released GPU resources to enlarge the width of data parallelism. In the second part, I will talk about scalable federated learning for training large-scale models on resource-constrained edge devices, and the FedML Ecosystem, which aims at ubiquitously distributed training at the edge for diverse AI applications such as CV, NLP, GraphNN, and IoT.

Bio: Chaoyang He is a Ph.D. Candidate in the CS department at the University of Southern California, Los Angeles, USA. He is advised by Salman Avestimehr (USC), Professor Mahdi Soltanolkotabi (USC), Professor Murali Annavaram (USC), and Professor Tong Zhang (HKUST). He also works closely with researchers/engineers at Google, Facebook, Amazon, and Tencent. Previously, he was an R&D Team Manager and Staff Software Engineer at Tencent (2014-2018), a Team Leader and Senior Software Engineer at Baidu (2012-2014), and a Software Engineer at Huawei (2011-2012). His research focuses on distributed/federated machine learning algorithms, systems, and applications. Chaoyang He has received a number of awards in academia and industry, including the Amazon ML Fellowship (2021-2022), Qualcomm Innovation Fellowship (2021-2022), Tencent Outstanding Staff Award (2015-2016), WeChat Special Award for Innovation (2016), Baidu LBS Group Star Awards (2013), and Huawei Golden Network Award (2012). During his Ph.D. study, he has published papers at ICML, NeurIPS, CVPR, ICLR, and MLSys, among others. Besides pure research, he also has R&D experience for Internet products and businesses such as Tencent Cloud, Tencent WeChat Automotive / AI in Car, Tencent Games, Tencent Maps, Baidu Maps, and Huawei Smartphone. He obtained three years of experience in R&D team management at Tencent from 2016 to 2018. With his advisors, he also co-founds, built based on a paper that won Best Paper Award at NeurIPS 2020 FL workshop. More details are available at his homepage:

Video Link
Suman Jana
Scalable, Accurate, Robust Binary Analysis with Transfer Learning
Abstract Binary program analysis is a fundamental building block for a broad spectrum of security tasks. Essentially, binary analysis encapsulates a diverse set of tasks that aim to understand and analyze behaviors/semantics of binary programs. Existing approaches often tackle each analysis task independently and heavily employ ad-hoc task-specific brittle heuristics. While recent ML-based approaches have shown some early promise, they too tend to learn spurious features and overfit to specific tasks without understanding the underlying program semantics. In this talk, I will describe two of our recent projects that use transfer learning to learn binary program semantics and transfer the learned knowledge for different binary analysis tasks. Our key observation is that by designing a pretraining task that can learn binary code semantics, we can drastically boost the performance of binary analysis tasks. Our pretraining task is fully self-supervised -- it does not need expensive labeling effort and therefore can easily generalize across different architectures, operating systems, compilers, optimizations, and obfuscations. Extensive experiments show that our approach drastically improves the performance of popular tasks like binary disassembly and matching semantically similar binary functions.

Bio: Suman Jana is an associate professor in the department of computer science and the data science institute at Columbia University. His primary research interest is at the intersections of computer security and machine learning. His research has received six best paper awards, a CACM research highlight, a Google faculty fellowship, a JPMorgan Chase Faculty Research Award, an NSF CAREER award, and an ARO young investigator award.

Video Link
Jacopo Tagliabue
You don't need a bigger boat: MLOps at reasonable scale
Abstract It is indeed a wonderful time to build machine learning systems, as we don’t have much to do anymore! Thanks to a growing ecosystem of tools and shared best practices, even small teams can be incredibly productive at “reasonable scale”. In this talk, we present our philosophy for modern, no-nonsense data pipelines, highlighting the advantages of a PaaS approach, and showing (with open source code) how the entire toolchain works on real-world data with realistic constraints. We conclude by discussing our proposal for self-documenting ML DAGs - 'DAG cards' for Metaflow - and sharing unsolicited advice on the future of MLOps for “reasonable” companies.

Bio: Educated in several acronyms across the globe (UNISR, SFI, MIT), Jacopo Tagliabue was co-founder and CTO of Tooso, an A.I. company in San Francisco acquired by Coveo in 2019. Jacopo is currently the Lead A.I. Scientist at Coveo, shipping models to hundreds of customers and millions of users. When not busy building products, he is exploring topics at the intersection of language, reasoning and learning: his research and industry work is often featured in the general press and premier A.I. venues. In previous lives, he managed to get a Ph.D., do sciency things for a pro basketball team, and simulate a pre-Columbian civilization.

Video Link
Chris Kedzie
Building a Machine Learning Framework for Chatbots
Abstract At Rasa, our goal is to make it easy for anyone to build a conversational assistant -- or chatbot. To that end, we develop Rasa Open Source, an open source machine learning framework for building chatbots, along with Rasa X, a closed source but free tool for monitoring and iteratively improving chatbots once they are in production. In addition to these technical offerings, we also strive to promote good data science through our philosophy of conversation-driven development.

Bio: Chris Kedzie is a machine learning researcher at Rasa. He has published research at the intersection of natural language processing, natural language generation, and machine learning. His most recent work has focused on making neural network models of language generation faithful with respect to content plans or semantic representations. He holds a PhD in computer science from Columbia University and has received a best paper award from the International Conference on Natural Language Generation (INLG 2019).

Video Link
Sasha Rush
Beyond Softmax: Scaling Probabilistic Structure in NLP
Abstract Progress on large autoregressive models for NLP applications has been transformative, but has left many practical questions about how to utilize these approaches in a controllable and efficient manner. This talk explores this challenge of using probabilistic models to impose explicit modeling structure. I show that discrete structured models can now be implemented efficiently on modern hardware with optimizing compilers. These approaches generalize the standard softmax function we all know and love, and in fact are not much harder to use in practice. To show the benefit of this approach, I will describe a factorization of the Transformer into a structured model that lets us learn a fast and accurate parallel translation decoder. The system shows how to take advantage of efficient inference based on basic distributional properties, while maintaining the modeling benefits of a deep model.

Bio: Alexander 'Sasha' Rush is an Associate Professor at Cornell Tech in NYC. His group's research is in the intersection of natural language processing, deep learning, and structured prediction with applications in text generation and efficient inference. He contributes to several open-source projects in NLP and works part time on HuggingFace Transformers. He was recently General Chair of ICLR and developed the MiniConf tool used to run ML/NLP virtual conferences. His work has received paper and demo awards at major NLP, visualization, and hardware conferences, an NSF Career Award, and a Sloan Fellowship.

Video Link
Willem Pienaar
Feature stores as the bridge between models and data
Abstract Feature stores have emerged as a pivotal component in the modern machine learning stack. They solve some of the toughest challenges in data for machine learning, namely feature computation, storage, validation, serving, and reuse. Ultimately, feature stores act as the bridge between models in production and an organization’s data. In this talk I will describe the key problems that feature stores solve, cover some key use cases and deployment patterns for feature stores that we see in the wild, and finally comment on how feature stores are evolving with the rise of modern data platforms.

Bio: Willem is a tech lead at Tecton where he currently leads open source development for Feast, the open source feature store. Willem previously started and led the data science platform team at Gojek, the Southeast Asian ride-hailing decacorn, where he built their machine learning platform. His main focus areas are building data and ML tooling, allowing organizations to scale machine learning and developer productivity. In a previous life, Willem also founded and sold a networking startup.

Video Link
Pete Warden
Machine Learning Everywhere
Abstract When I first joined Google in 2014, I was amazed to discover they were using 13 kilobyte neural network models to recognize "OK Google" on tiny embedded chips on Android phones. This felt like deep magic, and it made me wonder how many other problems these kinds of minuscule ML models could solve. Over the past few years I've been helping Google ship products using this approach with TensorFlow Lite Micro, and helped external developers create new applications. While it's still early days for "TinyML", we're already seeing interesting impacts on how engineers compose systems, including software-defined sensors, cascades of ML models, air-gapped ambient computing, and ubiquitous on-device voice interfaces. In this talk I'll cover the past, present, and future of embedded ML systems.

Video Link
Yaron Singer
Securing AI systems from operational risk
Abstract As organizations adopt AI technologies they inherit operational risk. This risk often manifests itself in AI models that produce erroneous predictions that go undetected. In this talk we will discuss root causes for AI models going haywire, and present a rigorous framework for eliminating risk from AI. We will show how this methodology can be used as building blocks for continuous monitoring and firewall systems for AI.

Bio: Yaron Singer is the CEO and co-founder of Robust Intelligence, and the Gordon McKay Professor of Computer Science and Applied Mathematics at Harvard University. Before Harvard he was a researcher at Google and obtained his PhD from UC Berkeley. He is the recipient of the NSF CAREER award, the Sloan fellowship, Facebook faculty award, Google faculty award, 2012 Best Student Paper Award at the ACM conference on Web Search and Data Mining, the 2010 Facebook Graduate Fellowship, and the 2009 Microsoft Research PhD Fellowship.

Video Link
Karan Goel
Building Malleable ML Systems through Measurement, Monitoring & Maintenance
Abstract Machine learning systems are now easier to build than ever, but they still don’t perform as well as we would hope on real applications. I’ll explore a simple idea in this talk: if ML systems were more malleable and could be maintained like software, we might build better systems. I’ll discuss an immediate bottleneck towards building more malleable ML systems: the evaluation pipeline. I’ll describe the need for finer-grained performance measurement and monitoring, the opportunities paying attention to this area could open up in maintaining ML systems, and some of the tools that I’m building (with great collaborators) in the Robustness Gym project to close this gap.

Bio: Karan Goel is a 3rd year CS PhD student at Stanford advised by Chris Ré. His main goal is to accelerate the pace at which machine learning can be robustly and safely used in practice across applications, and in industry at large. He leads the Robustness Gym project, where he builds tools to measure, monitor and repair machine learning systems interactively. He is a recipient of the Siebel Foundation Scholarship.

Video Link
Richard Liaw
Assorted boring problems in distributed machine learning
Abstract Much of the academic focus on “distributing/scaling up machine learning” is synonymous with “training larger supervised ML models like GPT-3 with more and more compute resources”. However, training is only a small part of the ML lifecycle. In this talk, I’ll focus on a couple of other machine learning problems that demand a large amount of compute resources, which may be a bit more “boring” but equally (or arguably more!) important. I’ll cover a couple of problems that my collaborators and I have previously worked on at UC Berkeley and now at Anyscale: abstractions for scalable reinforcement learning and building RLlib (ICML 18, ICLR 20), distributed hyperparameter tuning and dynamic resource allocation for hyperparameter tuning (SOCC 19, Eurosys 21), and Ray as a substrate for the next generation of ML platforms.

Bio: Richard Liaw is an engineer at Anyscale, where he leads a team in building open source machine learning libraries on top of Ray. He is on leave from the PhD program at UC Berkeley, where he worked at the RISELab advised by Ion Stoica, Joseph Gonzalez, and Ken Goldberg. In his time in the PhD program, he was part of the Ray team, building scalable ML libraries on top of Ray.

Video Link
Even Oldridge
Deep Learning Based Recommender Systems in Production
Abstract Recommender Systems are one of the most complex ML applications to deploy into production. The data is sparse, massive, and constantly increasing, and the models deployed create a feedback loop that requires careful monitoring. What's more, the hardware and software that led to the revolution of deep learning were built during the era of computer vision. Differences in architecture and data between vision and recommenders initially made the HW/SW stack a poor fit for deep learning based recommender systems. In this talk we'll explore what makes recommenders different from a data, architecture, and system perspective, and talk about changes in GPU hardware within the last generation that make it much better suited to the recommendation problem. By focusing on these differences we've also identified improvements on the software side that take advantage of optimizations only possible in the recommendation domain. A new era of faster ETL, Training and Inference is coming to the RecSys space and this talk will walk through some of the patterns of optimization that guide the tools we're building to make recommenders both faster to use and easier to deploy on GPUs.

Bio: Even Oldridge is a Sr. Manager at NVIDIA leading the effort to develop the open source libraries of Merlin which provide fast, easy to use and deploy, scalable recommender systems on the GPU. He has a PhD in Computer Vision and a Masters in Programmable Hardware from the University of British Columbia. He’s worked in the recommendation space for the past decade and has developed systems for recommending dates and houses, among other things. He’s an industry co-chair for ACM RecSys Conference 2021, and he’ll talk your ear off about embeddings and deep learning based recommenders if you let him.

Video Link
Tim Kraska
Towards Instance-Optimized Data Systems
Abstract Recently, there has been a lot of excitement around ML-enhanced (or learned) algorithms and data structures. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, sketches, among many other things. Arguably, the motivation behind these techniques is similar: machine learning is used to model the data and/or workload in order to derive a more efficient algorithm or data structure. Ultimately, what these techniques will allow us to build are “instance-optimized” systems: systems that self-adjust to a given workload and data distribution to provide unprecedented performance and avoid the need for tuning by an administrator. In this talk, I will provide an overview of the opportunities and limitations of learned index structures, storage layouts, and query optimization techniques we have been developing in my group, and how we are integrating these techniques to build a first instance-optimized database system.
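
As a rough illustration of the learned-index idea the abstract mentions, the sketch below fits a linear model that maps a key to its position in a sorted list, records the model's worst-case error on the stored keys, and uses that bound to confine a binary search to a small window. The class name `LinearLearnedIndex` and its methods are hypothetical and chosen for illustration; this is a toy under the assumption of a non-empty list of numeric keys, not the design of any system from the talk.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a linear model predicts a key's position in a sorted list.

    Assumes a non-empty collection of numeric keys (an illustrative simplification).
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        positions = range(n)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = sum(positions) / n
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for k, p in zip(self.keys, positions))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst-case prediction error over stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for k, p in zip(self.keys, positions))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return an index of key in the sorted list, or -1 if the key is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the window [lo, hi) to the valid range of the list.
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else -1


idx = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 100])
print(idx.lookup(23))  # 4
print(idx.lookup(5))   # -1
```

The closer the key distribution is to the fitted model, the smaller `max_err` becomes and the fewer comparisons each lookup needs, which is the sense in which such a structure adapts to the data it stores.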

Bio: Tim Kraska is an Associate Professor of Electrical Engineering and Computer Science in MIT's Computer Science and Artificial Intelligence Laboratory, co-director of the Data System and AI Lab at MIT (DSAIL@CSAIL), and co-founder of Einblick Analytics. Currently, his research focuses on building systems for machine learning, and using machine learning for systems. Before joining MIT, Tim was an Assistant Professor at Brown, spent time at Google Brain, and was a PostDoc in the AMPLab at UC Berkeley after he got his PhD from ETH Zurich. Tim is a 2017 Alfred P. Sloan Research Fellow in computer science and received several awards including the VLDB Early Career Research Contribution Award, the VMware Systems Research Award, the university-wide Early Career Research Achievement Award at Brown University, an NSF CAREER Award, as well as several best paper and demo awards at VLDB and ICDE.

Video Link
Guanhua Wang
Disruptive Research on Distributed ML Systems
Abstract Deep Neural Networks (DNNs) enable computers to excel across many different applications such as image classification, speech recognition and robotics control. To accelerate DNN training and serving, parallel computing is widely adopted. System efficiency is a big issue when scaling out. In this talk, I will make three arguments towards better system efficiency in distributed DNN training and serving. First, Ring All-Reduce for model synchronization is not optimal, but Blink is. By packing spanning trees rather than forming rings, Blink achieves higher flexibility in arbitrary networking environments and provides near-optimal network throughput. Blink has been filed as a US patent and is being used by Microsoft. It has drawn attention from industry, including Facebook (distributed PyTorch team) and ByteDance (parent company of the TikTok app), and was featured at Nvidia GTC China 2019 and in news from Baidu and Tencent. Second, communication can be eliminated via sensAI's class parallelism. sensAI decouples a multi-task model into disconnected subnets, each responsible for the decision making of a single task. sensAI's low-latency, real-time model serving has attracted interest from several venture capital firms in the Bay Area. Third, Wavelet is more efficient than gang-scheduling. By intentionally adding task launching latency, Wavelet interleaves peak memory usage across different waves of training tasks on the accelerators, and thus it improves both computation and on-device memory utilization. Multiple companies, including Facebook and Apple, have shown interest in the Wavelet project.
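
For context on the baseline the abstract argues against, here is a minimal single-process simulation of ring all-reduce: a reduce-scatter phase followed by an all-gather phase, each taking n - 1 steps for n workers. The function name `ring_all_reduce` and the data layout (one numeric chunk per worker slot) are assumptions made for illustration. The sketch does not model network topology, bandwidth, or Blink's spanning-tree packing.

```python
def ring_all_reduce(worker_data):
    """Simulate a sum all-reduce over n workers arranged in a ring.

    worker_data[i][c] is chunk c held by worker i (each worker holds n numeric chunks).
    Returns new per-worker buffers in which every worker holds the per-chunk sums.
    """
    n = len(worker_data)
    bufs = [list(row) for row in worker_data]  # copy so the input is left untouched

    # Phase 1: reduce-scatter. In step s, worker i sends chunk (i - s) mod n to its
    # right neighbour, which adds the received value to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, bufs[i][(i - s) % n]) for i in range(n)]
        for i, c, val in sends:
            bufs[(i + 1) % n][c] += val

    # Worker i now holds the fully reduced chunk (i + 1) mod n.
    # Phase 2: all-gather. In step s, worker i forwards chunk (i + 1 - s) mod n,
    # and the right neighbour overwrites its copy with the reduced value.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, bufs[i][(i + 1 - s) % n]) for i in range(n)]
        for i, c, val in sends:
            bufs[(i + 1) % n][c] = val
    return bufs


data = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
result = ring_all_reduce(data)
print(result)  # every worker ends with [111, 222, 333]
```

Each worker only ever talks to its right neighbour, which is what ties the scheme's throughput to the slowest link in the ring.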

Bio: Guanhua Wang is a final-year CS PhD student in the RISELab at UC Berkeley, advised by Prof. Ion Stoica. His research lies primarily in the ML+Systems area, including fast collective communication schemes for model synchronization, efficient in-parallel model training and real-time model serving.

Video Link
Carole-Jean Wu
Designing AI systems for deep learning recommendation and beyond
Abstract The past decade has witnessed a 300,000 times increase in the amount of compute for AI. The latest natural language processing model is fueled with over a trillion parameters, while the memory need of neural recommendation and ranking models has grown from hundreds of gigabytes to the terabyte scale. This talk introduces deep learning personalization and recommendation systems, an area that is underinvested in by the overall research community. The training of state-of-the-art industry-scale personalization and recommendation models consumes the highest number of compute cycles among all deep learning use cases at Facebook. For AI inference, recommendation use cases consume an even larger share of compute cycles: 80%. What are the key system challenges faced by industry-scale neural personalization and recommendation models? This talk will highlight recent advances in AI system development for deep learning recommendation and the implications for infrastructure optimization opportunities across the machine learning system stack. System research for deep learning recommendation and AI at large is at a nascent stage. This talk will conclude with research directions for building and designing responsible AI systems that are fair, efficient, and environmentally sustainable.

Bio: Carole-Jean Wu is a Technical Lead and Manager at Facebook AI Research – SysML. Her work is in the domain of computer system architecture with particular emphasis on energy- and memory-efficient systems. Her research has pivoted into designing systems for machine learning execution at-scale, such as for personalized recommender systems and mobile deployment. In general, she is interested in tackling system challenges to enable efficient, responsible AI execution. Carole-Jean chairs the MLPerf Recommendation Benchmark Advisory Board, co-chaired MLPerf Inference, and serves on the MLCommons Board as a director. Carole-Jean received her M.A. and Ph.D. from Princeton and B.Sc. from Cornell. She is the recipient of the NSF CAREER Award, Facebook AI Infrastructure Mentorship Award, the IEEE Young Engineer of the Year Award, the Science Foundation Arizona Bisgrove Early Career Scholarship, and the Intel PhD Fellowship, among a number of Best Paper awards.

Video Link
Erin LeDell
Scalable Machine Learning with H2O & Systems Approach to Algorithm Development
Abstract The focus of this presentation is the scalable and distributed machine learning platform, H2O. The multi-node distributed algorithms (GLM, Random Forest, GBM, DNNs, etc) can train on datasets which are larger than RAM (of a single machine), and H2O integrates with other 'big data' systems, Hadoop and Spark. H2O is engineered for production use cases with a focus on fast training and prediction speeds. The second part of the talk will discuss a systems approach to developing novel machine learning algorithms such as H2O AutoML. Unlike well-defined ML algorithms (e.g. GBM), an 'AutoML' algorithm is an automated process which aims to train the best model (or ensemble) in a specified amount of time. I will discuss our methodology for experimentation and validation of new strategies or changes to the algorithm, using a benchmark-driven systems approach.

Bio: Erin LeDell is the Chief Machine Learning Scientist at H2O.ai. Her research focuses on automatic machine learning, ensemble machine learning and statistical computing. Before joining H2O.ai, she was the Principal Data Scientist at Wise.io and Marvin Mobile Security, and the founder of DataScientific, Inc. She received her Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from University of California, Berkeley.

Video Link
Jason Knight
Reshaping the ML software bedrock with compilers
Abstract The rate of change for ML software, hardware, and algorithms improves our lives daily, but how sturdy are the foundations we rely on? From my experience at one of the first ML accelerator startups (Nervana), applying ML to biology and medicine, leading the ML SW product team at Intel, and then co-founding OctoML, I'll describe: 1) The pains of developing ML SW stacks for CPUs, GPUs and accelerators, and how these pains radiate outwards to both practitioners and hardware vendors, 2) How that led me to find the Apache TVM project, what it is, and why it matters, 3) Challenges and opportunities ahead for ML compilation and TVM specifically, and what it can enable for ML end users everywhere.

Bio: Jason Knight is co-founder and CPO at OctoML building the machine learning acceleration platform for deploying ML anywhere. From the founders of the Apache TVM project, OctoML uses machine learning to generate efficient binaries for ML model deployment on any hardware. Before starting OctoML, Jason previously drove Intel’s AI software strategy, built large scale human sequencing data pipelines in the biotech industry, and earned a PhD in machine learning and computational biology.

Video Link
Rutuja Surve
Building Decentralized Neural Search Systems in Production
Abstract With the rapid growth of media and metadata in both the enterprise and consumer markets, there is an evolving need for search systems to go beyond simple symbolic retrieval and towards more cognitive-driven understanding. Today, with ever longer documents and more multimedia data, finding the right information is more important and challenging than ever. The rise of deep learning has ushered in a new era of neural search. However, building a neural search system is non-trivial for researchers and engineers. While neural search has long held significant promise, the advantages of open source combined with recent advances in deep learning now provide us with a framework to make the next generation of search technology a reality. In this talk, I will describe how Jina solves these challenges by providing an open source neural search ecosystem for businesses and developers, allowing anyone to search any kind of data with high availability and scalability, driving the shift from a traditional search system to a state-of-the-art AI-centric search system.

Bio: Rutuja is an Artificial Intelligence Engineer at Jina AI, with an interest in open source software and research. Her industry experience includes working with Google and Nutanix as a software engineer. She is a former core contributor at the MariaDB Foundation and has development experience contributing to various open source organisations like Mozilla, Linux Foundation and OWASP.

Video Link
Lin Ma
NoisePage: The Self-Driving Database Management System
Abstract Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer. There are existing methods that recommend physical design or knob configurations for DBMSs. But most of them require humans to make final decisions and decide when to apply changes. The goal of a self-driving DBMS is to remove the DBMS administration impediments by managing itself autonomously. In this talk, I present the design of a new self-driving DBMS (NoisePage) that enables such automatic system management. I first discuss a forecasting framework that uses unsupervised clustering and ensemble ML models to efficiently predict the query arrival rates under varying database workload patterns. I then describe NoisePage's modeling framework that constructs and maintains ML models to predict the behavior of self-driving DBMS actions: the framework decomposes the DBMS architecture into fine-grained operating units to estimate the system's behavior under unseen configurations. I then introduce our ongoing work for an action planning framework that makes explainable decisions based on the forecasted workload and the modeled behavior. Lastly, I explain how we integrate all the self-driving components into the system.

Bio: Lin Ma is a PhD candidate from Carnegie Mellon University Computer Science Department advised by Andy Pavlo. He is interested in database systems and machine learning. His research focus has been on designing the architecture for self-driving databases. Lin was voted the 'most congenial PhD student' in the CMU Database Group in 2017, 2018, and 2020.

Livestream Link
Ameet Talwalkar
Automating Architecture Transfer on Diverse Tasks
Abstract Hand-crafted neural architecture design has played a major role in accelerating progress in computer vision, resulting in effective backbones like ResNet. Unfortunately, these convolutional backbones are not as effective in other domains. Successfully transferring existing architectures to applications such as sequence modeling, learning on graphs, or solving partial differential equations has required the manual design of task-specific neural operations to replace convolutions. In this talk, we will first motivate the problem of 'automating architecture transfer' to enable users to find the right operations given data from their specific domain. We will next present our ongoing work on this problem, by introducing a family of neural operations called 'XD-Operations' that mimic the inductive bias of multichannel convolutions while being much more expressive, provably containing numerous well-known operations. We then demonstrate the effectiveness of XD-operations on a diverse set of applications---in some cases outperforming the latest neural operation designs.

Bio: Ameet Talwalkar is an assistant professor in the Machine Learning Department at CMU, and also co-founder and Chief Scientist at Determined AI. His interests are in the field of statistical machine learning. His current work is motivated by the goal of democratizing machine learning, with a focus on topics related to automation, fairness, interpretability, and federated learning. He led the initial development of the MLlib project in Apache Spark, is a co-author of the textbook 'Foundations of Machine Learning' (MIT Press), and created an award-winning edX MOOC on distributed machine learning. He also helped to create the MLSys conference, serving as the inaugural Program Chair in 2018, General Chair in 2019, and currently as President of the MLSys Board.

Video Link
Theodoros Rekatsinas
Structure is all you need: Software 2.0 for Data Quality Management
Abstract Data quality management is a bottleneck in modern analytics as high-effort tasks such as data validation and cleaning are essential to obtain accurate results. In this talk, I will review how Software 2.0 can automate routine data validation tasks such as missing value imputation and detection of corrupted samples. First, I will discuss how one can leverage structured, statistical dependencies in the data to obtain information-theoretically optimal data preparation methods, and then I will demonstrate how the widely used Attention mechanism is key to automated data validation. This talk builds upon experience with projects such as HoloClean, FDX, and Picket and their application to different scientific and industrial use cases.

Bio: Theodoros (Theo) Rekatsinas is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison, currently on leave at Apple. Theo is also a co-founder of Inductiv (now part of Apple), which developed technology that uses artificial intelligence to automate processes that involve identifying and correcting errors in data.

Video Link
Savin Goyal
Taming the Long Tail of Industrial ML Applications
Abstract Data Science usage at Netflix goes well beyond our eponymous recommendation systems. It touches almost all aspects of our business, from optimizing content delivery and informing buying decisions to fighting fraud. Our unique culture affords our data scientists extraordinary freedom of choice in ML tools and libraries, all of which results in an ever-expanding set of interesting problem statements and a diverse set of ML approaches to tackle them. Our data scientists, at the same time, are expected to build, deploy, and operate complex ML workloads autonomously without the need to be significantly experienced with systems or data engineering. In this talk, I will discuss some of the challenges involved in improving the development and deployment experience for ML workloads. I will focus on Metaflow, our ML framework, which offers useful abstractions for managing the model’s lifecycle end-to-end, and how a focus on human-centric design positively affects our data scientists' velocity.

Bio: Savin is a software engineer at Netflix responsible for Metaflow, Netflix's ML platform. He focuses on building generalizable infrastructure to accelerate the impact of data science at Netflix and beyond.

Video Link
Fabio Petroni
Assessing Machine Knowledge
Abstract In the talk I will review a set of general approaches for representing large scale textual knowledge sources that are useful for multiple downstream tasks. I will present benchmarking tools spanning multiple domains (including Question Answering, Entity Linking and Dialogue) and I will describe the latest knowledge-intensive NLP models with a focus on their efficiency.

Bio: Fabio is a Research Engineer in the Facebook Artificial Intelligence Research (FAIR) lab in London. His research focuses on Natural Language Processing, in particular, Information Extraction, Question Answering and Knowledge Representation. Prior to joining Facebook, he was with the R&D department of Thomson Reuters and received a PhD degree from Sapienza University of Rome.

Video Link
Sara Hooker
The Hardware Lottery
Abstract I will introduce the term Hardware lottery to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is superior to alternative research directions. This talk will motivate attention to hardware lotteries by discussing examples from early computer history which have delayed research progress by casting successful ideas as failures. These lessons are particularly salient given the advent of domain-specialized hardware, which makes it increasingly costly to stray off the beaten path of research ideas.

Bio: Sara Hooker is a researcher at Google Brain working on reliable explanations of model behavior. Her main research interests gravitate towards training models beyond test-set accuracy to be compact, robust, fair and interpretable. In 2014, she founded Delta Analytics, a non-profit dedicated to bringing technical capacity to help non-profits across the world use machine learning for good.

Video Link
Anna Goldie
Chip Floorplanning with Deep Reinforcement Learning
Abstract In this talk, I will describe a reinforcement learning (RL) method for chip floorplanning, the engineering problem of designing the physical layout of a computer chip. Chip floorplanning ordinarily requires weeks or months of effort by physical design engineers to produce manufacturable layouts. Our method generates floorplans in under six hours that are superior or comparable to those produced by humans in all key metrics, including power consumption, performance, and chip area. To achieve this, we pose chip floorplanning as a reinforcement learning problem, and develop a novel edge-based graph convolutional neural network architecture capable of learning rich and transferable representations of the chip. Our method was used in the design of the next generation of Google’s artificial intelligence (AI) accelerators (TPU).

Bio: Anna Goldie is a Staff Researcher at Google Brain and co-founder/tech-lead of the Machine Learning for Systems Team. She is also a PhD student in the Stanford NLP Group, where she is advised by Prof. Chris Manning. At MIT, she earned a Masters of Computer Science, Bachelors of Computer Science, and Bachelors of Linguistics. She speaks fluent Mandarin, Japanese, and French, as well as conversational Spanish, Italian, German, and Korean. Her work has been covered in various media outlets, including MIT Technology Review and IEEE Spectrum.

Livestream Link
Piero Molino
Ludwig, a Declarative Deep Learning Toolbox
Abstract The talk will introduce Ludwig, a deep learning toolbox that allows users to train models and use them for prediction without the need to write code. Thanks to its declarative configuration system and the use of data types to guide pipeline building, it helps make deep learning approachable for non-experts and enables faster model improvement iteration cycles for experienced machine learning engineers and researchers. By using Ludwig, experts and researchers can simplify the development process and focus on experiment comparison and model quality. We will also discuss recent improvements to Ludwig, including AutoML and hyperparameter optimization capabilities, its backstory and its future releases.

Bio: Piero Molino is a Staff Research Scientist at Stanford University working on Machine Learning systems and algorithms. Piero completed a PhD on Question Answering at the University of Bari, Italy. He founded QuestionCube, a startup that built a framework for semantic search and QA. He worked for Yahoo Labs in Barcelona on learning to rank and for IBM Watson in New York on natural language processing with deep learning, and then joined Geometric Intelligence, where he worked on grounded language understanding. After Uber acquired Geometric Intelligence, he became one of the founding members of Uber AI Labs. At Uber he worked on research topics including Dialogue Systems, Language Generation, Graph Representation Learning, Computer Vision, Reinforcement Learning and Meta Learning. He also worked on several deployed systems like COTA, an ML and NLP model for Customer Support, Dialogue Systems for driver hands-free dispatch, pickup and communications, and on the Uber Eats Recommender System with graph learning. He is the author of Ludwig, a code-free deep learning toolbox.

Video Link
Shreya Shankar
Debugging Machine Learning in Production
Abstract Machine learning pipelines can successfully demonstrate high performance on train and evaluation datasets, but what happens after you promote that model to production? What are some of the challenges faced, and how do groups of different stakeholders with different technical abilities collaborate to identify and “fix” bugs? In my talk, I will draw from my experiences to describe a high level overview of modern ML infrastructure, criteria for promoting models, case studies of “bugs” encountered when clients were interacting with the live ML predictions, and the challenges in solving these issues.

Bio: Shreya is a computer scientist living in San Francisco interested in making machine learning work in the “real world.” Currently, she is taking a break from work, but previously, she was the first ML engineer at Viaduct, did ML research at Google Brain, and completed her BS and MS in computer science at Stanford.

Video Link
Josh Tobin
A missing link in the ML infrastructure stack?
Abstract Machine learning is quickly becoming a product engineering discipline. Although several new categories of infrastructure and tools have emerged to help teams turn their models into production systems, doing so is still extremely challenging for most companies. In this talk, we survey the tooling landscape and point out several parts of the machine learning lifecycle that are still underserved. We propose a new category of tool that could help alleviate these challenges and connect the fragmented production ML tooling ecosystem. We conclude by discussing similarities and differences between our proposed system and those of a few top companies.

Bio: Josh Tobin is the founder and CEO of a stealth machine learning startup. Previously, Josh worked as a deep learning & robotics researcher at OpenAI and as a management consultant at McKinsey. He is also the creator of Full Stack Deep Learning, the first course focused on the emerging engineering discipline of production machine learning. Josh did his PhD in Computer Science at UC Berkeley advised by Pieter Abbeel.

Video Link
Travis Addair
Horovod and the Evolution of Deep Learning at Scale
Abstract Deep neural networks are pushing the state of the art in numerous machine learning research domains, from computer vision to natural language processing and even tabular business data. However, scaling such models to train efficiently on large datasets imposes a unique set of challenges that traditional batch data processing systems were not designed to solve. Horovod is an open source framework that scales models written in TensorFlow, PyTorch, and MXNet to train seamlessly on hundreds of GPUs in parallel. In this talk, we'll explain the concepts and unique constraints that led to the development of Horovod at Uber, and discuss how the latest trends in deep learning research are informing the future direction of the project within the Linux Foundation. We'll explore how Horovod fits into production ML workflows in industry, and how tools like Spark and Ray can combine with Horovod to make productionizing deep learning at scale on remote data centers as simple as running locally on your laptop. Finally, we'll share some thoughts on what's next for large scale deep learning, including new distributed training architectures and how the larger ecosystem of production ML tooling is evolving.

Bio: Travis Addair is a software engineer at Uber leading the Deep Learning Training team as part of the Michelangelo machine learning platform. He is the lead maintainer for the Horovod open source project and chairs its Technical Steering Committee within the Linux Foundation. In the past, he’s worked on scaling machine learning systems at Google and Lawrence Livermore National Lab.

Video Link
Song Han
TinyML: Reducing the Carbon Footprint of Artificial Intelligence in the Internet of Things (IoT)
Abstract Deep learning is computation-hungry and data-hungry. We aim to improve the computation efficiency and data efficiency of deep learning. I will first talk about MCUNet, which brings deep learning to IoT devices. The technique is tiny neural architecture search (TinyNAS) co-designed with a tiny inference engine (TinyEngine), enabling ImageNet-scale inference on an IoT device with only 1MB of Flash. Next I will talk about TinyTL, which enables on-device training, reducing the memory footprint by 7-13x. Finally, I will describe Differentiable Augmentation, which enables data-efficient GAN training, generating photo-realistic images using only 100 images, a task that used to require tens of thousands of images. We hope such TinyML techniques can make AI greener, faster, and more sustainable.

Bio: Song Han is an assistant professor in MIT EECS. He received his PhD degree from Stanford University. His research focuses on efficient deep learning computing. He proposed the “deep compression” technique, which can reduce neural network size by an order of magnitude without losing accuracy, and its hardware implementation, the “efficient inference engine”, which first exploited pruning and weight sparsity in deep learning accelerators. His recent research on hardware-aware neural architecture search and TinyML was highlighted by MIT News, Wired, and Venture Beat, and received many low-power computer vision (LPCV) contest awards. Song received Best Paper awards at ICLR’16 and FPGA’17, as well as the Amazon Machine Learning Research Award, the SONY Faculty Award, and the Facebook Faculty Award. Song was named to the “35 Innovators Under 35” list by MIT Technology Review for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” Song received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning.”

Video Link
Kayvon Fatahalian
From Ideas to Video Analysis Models in Hours, Not Weeks
Abstract My students and I often find ourselves as "subject matter experts" needing to create video understanding models that serve computer graphics and video analysis applications. Unfortunately, like many, we are frustrated by how a smart grad student, armed with a large *unlabeled* video collection, a palette of pre-trained models, and an idea of what novel object or activity they want to detect/segment/classify, requires days-to-weeks to create and validate a model for their task. In this talk I will discuss challenges we've faced in the iterative process of curating data, training models, and validating models for the specific case of rare events and categories in image and video collections. In this regime we've found that conventional wisdom about training on imbalanced data sets, and data acquisition via active learning, does not lead to the most efficient solutions. I'll discuss these challenges in the context of image and video analysis applications, and elaborate on our ongoing vision of how a grad student, armed with massive amounts of unlabeled video data, pretrained models, and available-in-seconds-supercomputing-scale elastic compute should be able to interactively iterate on cycles of acquiring training data, training models, and validating models.

Bio: Kayvon Fatahalian is an Assistant Professor in the Computer Science Department at Stanford University. His lab works on visual computing systems projects, including large-scale video analytics, programming systems for video data mining, and compilation techniques for optimizing image processing pipelines. In all these efforts, the goal is to enable more rapid development of applications that involve video processing at scale.

Video Link
Matthias Poloczek
Scalable Bayesian Optimization for Industrial Applications
Abstract Bayesian optimization has become a powerful method for the sample-efficient optimization of expensive black-box functions. These functions do not have a closed-form and are evaluated for example by running a complex economic simulation, by an experiment in the lab or in a market, or by a CFD simulation. Use cases arise in machine learning, e.g., when tuning the configuration of an ML model or when optimizing a reinforcement learning policy. Examples in engineering include the design of aerodynamic structures or materials discovery. In this talk I will introduce the key ideas of Bayesian optimization and discuss how they can be applied to tuning ML models. Moreover, I will share some experiences with developing a Bayesian optimization service in industry.

Bio: Matthias’ research interests lie at the intersection of machine learning and optimization, with a focus on Bayesian methods for 'exotic' optimization problems arising in business applications and in the natural sciences. He is a Principal Scientist at Amazon. Previously, Matthias was a Senior Manager at Uber AI, where he founded Uber’s Bayesian optimization team and led the cross-org effort that built a company-wide service to tune ML models at scale. Matthias received his PhD in CS from Goethe University in Frankfurt in 2013 and then worked as a postdoc at Cornell with David Williamson and Peter Frazier from 2014 until 2017. He was an Assistant Professor in the Department of Systems and Industrial Engineering at the University of Arizona from 2017 until 2019.

Video Link
Roy Frostig
JAX: accelerating machine learning research by composing function transformations in Python
Abstract JAX is a system for high-performance machine learning research and numerical computing. It offers the familiarity of Python+NumPy together with hardware acceleration, plus a set of composable function transformations: automatic differentiation, automatic batching, end-to-end compilation (via XLA), parallelizing over multiple accelerators, and more. JAX's core strength is its guarantee that these user-wielded transformations can be composed arbitrarily, so that programmers can write math (e.g. a loss function) and transform it into pieces of an ML program (e.g. a vectorized, compiled, batch gradient function for that loss). JAX had its open-source release in December 2018. It's used by researchers for a wide range of applications, from studying training dynamics of neural networks, to probabilistic programming, to scientific applications in physics and biology.

Bio: Roy Frostig is a research scientist at Google. He's interested in forming reliable foundations for machine learning, by making software systems for ML research and by studying the statistical elements of its practice. He received his BS, MS, and PhD from Stanford, advised by Percy Liang.

Video Link
Chip Huyen
Principles of Good Machine Learning Systems Design
Abstract This talk covers what it means to operationalize ML models. It starts by analyzing the difference between ML in research vs. in production, ML systems vs. traditional software, as well as myths about ML production. It then goes over the principles of good ML systems design and introduces an iterative framework for ML systems design, from scoping the project, data management, model development, deployment, maintenance, to business analysis. It covers the differences between DataOps, ML Engineering, MLOps, and data science, and where each fits into the framework. It also discusses the main skills each stage requires, which can help companies in structuring their teams. The talk ends with a survey of the ML production ecosystem, the economics of open source, and open-core businesses.

Bio: Chip Huyen is an engineer who develops tools and best practices for machine learning production. She’s currently with Snorkel AI and she’ll be teaching Machine Learning Systems Design at Stanford from January 2021. Previously, she was with Netflix, NVIDIA, and Primer. She’s also the author of four bestselling Vietnamese books.

Video Link
Alex Ratner
Programmatically Building & Managing Training Data with Snorkel
Abstract One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today's models require. In this talk, I will describe our work on Snorkel, an open-source framework for building and managing training datasets, and describe three key operators for letting users build and manipulate training datasets: labeling functions, for labeling unlabeled data; transformation functions, for expressing data augmentation strategies; and slicing functions, for partitioning and structuring training datasets. These operators allow domain expert users to specify machine learning (ML) models entirely via noisy operators over training data, expressed as simple Python functions (or even via higher-level NL or point-and-click interfaces), leading to applications that can be built in hours or days, rather than months or years, and that can be iteratively developed, modified, versioned, and audited. I will describe recent work on modeling the noise and imprecision inherent in these operators, and using these approaches to train ML models that solve real-world problems, including recent state-of-the-art results on benchmark tasks and real-world industry, government, and medical deployments.
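
To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library or its API: the function names, the label constants, and the spam/ham task are all hypothetical, and the votes are combined by simple majority vote, which is a stand-in for the noise modeling described in the abstract rather than an implementation of it.

```python
from collections import Counter

# Hypothetical label constants for an illustrative spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Noisy heuristic: messages with links are often spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Noisy heuristic: talk of prizes suggests spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Noisy heuristic: very short messages are usually benign."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]
```

Each heuristic labels only the examples it is confident about and abstains otherwise, so coverage grows by adding more functions rather than by hand-labeling more data.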

Bio: Alex Ratner is the co-founder and CEO of Snorkel AI, Inc., which supports the open source Snorkel library and develops Snorkel Flow, an end-to-end system for building machine learning applications, and an Assistant Professor of Computer Science at the University of Washington. Prior to Snorkel AI and UW, he completed his PhD in CS advised by Christopher Ré at Stanford, where his research focused on applying data management and statistical learning techniques to emerging machine learning workflows, such as creating and managing training data, and applying this to real-world problems in medicine, knowledge base construction, and more.

Video Link
Virginia Smith
On Heterogeneity in Federated Settings
Abstract A defining characteristic of federated learning is the presence of heterogeneity, i.e., that data and compute may differ significantly across the network. In this talk I show that the challenge of heterogeneity pervades the machine learning process in federated settings, affecting issues such as optimization, modeling, and fairness. In terms of optimization, I discuss FedProx, a distributed optimization method that offers robustness to systems and statistical heterogeneity. I then explore the role that heterogeneity plays in delivering models that are accurate and fair to all users/devices in the network. Our work here extends classical ideas in multi-task learning and alpha-fairness to large-scale heterogeneous networks, enabling flexible, accurate, and fair federated learning.
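
As a rough illustration of how FedProx limits the effect of heterogeneity, the sketch below applies its proximal term to a toy one-dimensional problem. In the FedProx paper, each device minimizes its local loss plus (mu/2)·||w − w_global||², which keeps local updates close to the current global model. Everything else here is an assumption made for illustration: the quadratic local losses, the function names, the step size, and the plain averaging step.

```python
def make_quadratic_grad(target):
    """Gradient of the toy local loss 0.5 * (w - target)**2."""
    return lambda w: w - target


def fedprox_local_update(w_global, grad_fn, mu=1.0, lr=0.1, steps=200):
    """Run gradient descent on local loss + (mu / 2) * (w - w_global)**2."""
    w = w_global
    for _ in range(steps):
        # The proximal term adds mu * (w - w_global) to the local gradient.
        g = grad_fn(w) + mu * (w - w_global)
        w -= lr * g
    return w


def fedprox_round(w_global, targets, mu=1.0):
    """One illustrative round: local proximal updates, then a plain average."""
    local_models = [
        fedprox_local_update(w_global, make_quadratic_grad(t), mu=mu)
        for t in targets
    ]
    return sum(local_models) / len(local_models)
```

With mu = 0 each device moves all the way to its own optimum; with mu > 0 it stops partway, at (target + mu · w_global) / (1 + mu) for this toy loss, so devices with very different data drift less from the shared model.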

Bio: Virginia Smith is an assistant professor in the Machine Learning Department at Carnegie Mellon University. Her research interests span machine learning, optimization, and distributed systems. Prior to CMU, Virginia was a postdoc at Stanford University, received a Ph.D. in Computer Science from UC Berkeley, and obtained undergraduate degrees in Mathematics and Computer Science from the University of Virginia.

Video Link
Matei Zaharia
Machine Learning at Industrial Scale: Lessons from the MLflow Project
Abstract Although enterprise adoption of machine learning is still early on, many enterprises in all industries already have hundreds of internal ML applications. ML powers business processes with an impact of hundreds of millions of dollars in industrial IoT, finance, healthcare and retail. Building and operating these applications reliably requires infrastructure that is different from traditional software development, which has led to significant investment in the construction of “ML platforms” specifically designed to run ML applications. In this talk, I’ll discuss some of the common challenges in productionizing ML applications based on experience building MLflow, an open source ML platform started at Databricks. MLflow is now the most widely used open source project in this area, with over 2 million downloads a month and integrations with dozens of other products. I’ll also highlight some interesting problems users face that are not covered deeply in current ML systems research, such as the need for “hands-free” ML that can train thousands of independent models without direct tuning from the ML developer for regulatory reasons, and the impact of privacy and interpretability regulations on ML. All my examples will be based on experience at large Databricks / MLflow customers.

Bio: Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly on other cluster computing and analytics software, including MLflow and Delta Lake. At Stanford, Matei is a co-PI of the DAWN Lab doing research on infrastructure for machine learning. Matei’s work was recognized through the 2014 ACM Doctoral Dissertation Award, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).

Video Link
Marco Tulio Ribeiro
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
Abstract We will present CheckList, a task-agnostic methodology and tool for testing NLP models inspired by principles of behavioral testing in software engineering. We will show a lot of fun bugs we discovered with CheckList, both in commercial models (Microsoft, Amazon, Google) and research models (BERT, RoBERTa for sentiment analysis, QQP, SQuAD). We'll also present comparisons between CheckList and the status quo, in a case study at Microsoft and a user study with researchers and engineers. We show that CheckList is a really helpful process and tool for testing and finding bugs in NLP models, both for practitioners and researchers.

Bio: Marco Tulio Ribeiro is a Senior Researcher at Microsoft Research. His work is on facilitating the communication between humans and machine learning models, which includes interpretability, trust, debugging, feedback, robustness, testing, etc. He received his PhD from the University of Washington.

Video Link

About The Seminar

Seminar Hosts: Dan Fu, Karan Goel, Fiodar Kazhamiaka, Piero Molino.

Executive Producers: Matei Zaharia, Chris Ré.

You can reach us at sysmlstanfordseminar [at] gmail.

Source code for this website can be found here.