Stanford MLSys Seminar Series
- Our talks this semester are Thursdays 1 PM PT!
- Join our email list to get notified of the speaker and livestream link every week!
Machine learning is driving exciting changes and progress in computing. What does the ubiquity of machine learning mean for how people build and deploy systems and applications? What challenges does industry face when deploying machine learning systems in the real world, and how can academia rise to meet those challenges?
In this seminar series, we want to take a look at the frontier of machine learning systems, and how machine learning changes the modern programming stack. Our goal is to help curate a curriculum of awesome work in ML systems to help drive research focus to interesting questions.
Starting in Fall 2020, we’ll be livestreaming each talk in this seminar series Thursdays 1-2 PM PT on YouTube and taking questions from the live chat. Videos of the talks will be available on YouTube afterwards as well. Give our channel a follow, and tune in every week for an exciting discussion!
Read about our motivation for starting this seminar.
Check out our introductory video:
Abstract: With the rapid growth of media and metadata in both the enterprise and consumer markets, there is an evolving need for search systems to go beyond simple symbolic retrieval and towards more cognitive-driven understanding. Today, with ever longer documents and ever more multimedia data, finding the right information is more important and challenging than ever. The rise of deep learning has ushered in a new era of neural search. However, building a neural search system is non-trivial for researchers and engineers. While neural search has long held significant promise, the advantages of open source combined with recent advances in deep learning now provide us a framework to make the next generation of search technology a reality. In this talk, I will describe how Jina solves these challenges by providing an open source neural search ecosystem for businesses and developers, allowing anyone to search any kind of data with high availability and scalability, driving the shift from a traditional search system to a state-of-the-art AI-centric search system.
Bio: Rutuja is an Artificial Intelligence Engineer at Jina AI, with an interest in open source software and research. Her industry experience includes working with Google and Nutanix as a software engineer. She is a former core contributor at the MariaDB Foundation and has development experience contributing to various open source organisations like Mozilla, the Linux Foundation and OWASP.
Abstract: The rate of change for ML software, hardware, and algorithms improves our lives daily, but how sturdy are the foundations we rely on? Drawing on my experience at one of the first ML accelerator startups (Nervana), applying ML to biology and medicine, leading the ML SW product team at Intel, and then co-founding OctoML, I'll describe: 1) the pains of developing ML SW stacks for CPUs, GPUs and accelerators, and how these pains radiate outwards to both practitioners and hardware vendors, 2) how that led me to find the Apache TVM project, what it is, and why it matters, and 3) challenges and opportunities ahead for ML compilation and TVM specifically, and what it can enable for ML end users everywhere.
Bio: Jason Knight is co-founder and CPO at OctoML building the machine learning acceleration platform for deploying ML anywhere. From the founders of the Apache TVM project, OctoML uses machine learning to generate efficient binaries for ML model deployment on any hardware. Before starting OctoML, Jason previously drove Intel’s AI software strategy, built large scale human sequencing data pipelines in the biotech industry, and earned a PhD in machine learning and computational biology.
Abstract: The focus of this presentation is the scalable and distributed machine learning platform, H2O. The multi-node distributed algorithms (GLM, Random Forest, GBM, DNNs, etc.) can train on datasets which are larger than RAM (of a single machine), and H2O integrates with other 'big data' systems, Hadoop and Spark. H2O is engineered for production use cases with a focus on fast training and prediction speeds. The second part of the talk will discuss a systems approach to developing novel machine learning algorithms such as H2O AutoML. Unlike well-defined ML algorithms (e.g. GBM), an 'AutoML' algorithm is an automated process which aims to train the best model (or ensemble) in a specified amount of time. I will discuss our methodology for experimentation and validation of new strategies or changes to the algorithm, using a benchmark-driven systems approach.
Bio: Erin LeDell is the Chief Machine Learning Scientist at H2O.ai. Her research focuses on automatic machine learning, ensemble machine learning and statistical computing. Before joining H2O.ai, she was the Principal Data Scientist at Wise.io and Marvin Mobile Security, and the founder of DataScientific, Inc. She received her Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from University of California, Berkeley.
Abstract: The past decade has witnessed a 300,000 times increase in the amount of compute for AI. The latest natural language processing model is fueled with over a trillion parameters, while the memory need of neural recommendation and ranking models has grown from hundreds of gigabytes to the terabyte scale. This talk introduces deep learning personalization and recommendation systems, an area that is underinvested in by the overall research community. The training of state-of-the-art industry-scale personalization and recommendation models consumes the highest number of compute cycles among all deep learning use cases at Facebook. For AI inference, recommendation use cases consume an even higher share of compute cycles, at 80%. What are the key system challenges faced by industry-scale neural personalization and recommendation models? This talk will highlight recent advances in AI system development for deep learning recommendation and the implications for infrastructure optimization opportunities across the machine learning system stack. System research for deep learning recommendation and AI at large is at a nascent stage. This talk will conclude with research directions for building and designing responsible AI systems that are fair, efficient, and environmentally sustainable.
Bio: Carole-Jean Wu is a Technical Lead and Manager at Facebook AI Research – SysML. Her work is in the domain of computer system architecture with particular emphasis on energy- and memory-efficient systems. Her research has pivoted into designing systems for machine learning execution at-scale, such as for personalized recommender systems and mobile deployment. In general, she is interested in tackling system challenges to enable efficient, responsible AI execution. Carole-Jean chairs the MLPerf Recommendation Benchmark Advisory Board, co-chaired MLPerf Inference, and serves on the MLCommons Board as a director. Carole-Jean received her M.A. and Ph.D. from Princeton and B.Sc. from Cornell. She is the recipient of the NSF CAREER Award, Facebook AI Infrastructure Mentorship Award, the IEEE Young Engineer of the Year Award, the Science Foundation Arizona Bisgrove Early Career Scholarship, and the Intel PhD Fellowship, among a number of Best Paper awards.
Abstract: Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer. There are existing methods that recommend physical design or knob configurations for DBMSs. But most of them require humans to make final decisions and decide when to apply changes. The goal of a self-driving DBMS is to remove the DBMS administration impediments by managing itself autonomously. In this talk, I present the design of a new self-driving DBMS (NoisePage) that enables such automatic system management. I first discuss a forecasting framework that uses unsupervised clustering and ensemble ML models to efficiently predict the query arrival rates under varying database workload patterns. I then describe NoisePage's modeling framework that constructs and maintains ML models to predict the behavior of self-driving DBMS actions: the framework decomposes the DBMS architecture into fine-grained operating units to estimate the system's behavior under unseen configurations. I then introduce our ongoing work for an action planning framework that makes explainable decisions based on the forecasted workload and the modeled behavior. Lastly, I explain how we integrate all the self-driving components into the system.
Bio: Lin Ma (https://www.cs.cmu.edu/~malin199/) is a PhD candidate in the Carnegie Mellon University Computer Science Department, advised by Andy Pavlo. He is interested in database systems and machine learning. His research focus has been on designing the architecture for self-driving databases. Lin was voted the 'most congenial PhD student' in the CMU Database Group in 2017, 2018, and 2020.
Abstract: Hand-crafted neural architecture design has played a major role in accelerating progress in computer vision, resulting in effective backbones like ResNet. Unfortunately, these convolutional backbones are not as effective in other domains. Successfully transferring existing architectures to applications such as sequence modeling, learning on graphs, or solving partial differential equations has required the manual design of task-specific neural operations to replace convolutions. In this talk, we will first motivate the problem of 'automating architecture transfer' to enable users to find the right operations given data from their specific domain. We will next present our ongoing work on this problem, by introducing a family of neural operations called 'XD-Operations' that mimic the inductive bias of multichannel convolutions while being much more expressive, provably containing numerous well-known operations. We then demonstrate the effectiveness of XD-operations on a diverse set of applications, in some cases outperforming the latest neural operation designs.
Bio: Ameet Talwalkar is an assistant professor in the Machine Learning Department at CMU, and also co-founder and Chief Scientist at Determined AI. His interests are in the field of statistical machine learning. His current work is motivated by the goal of democratizing machine learning, with a focus on topics related to automation, fairness, interpretability, and federated learning. He led the initial development of the MLlib project in Apache Spark, is a co-author of the textbook 'Foundations of Machine Learning' (MIT Press), and created an award-winning edX MOOC on distributed machine learning. He also helped to create the MLSys conference, serving as the inaugural Program Chair in 2018, General Chair in 2019, and currently as President of the MLSys Board.
Abstract: Data quality management is a bottleneck in modern analytics as high-effort tasks such as data validation and cleaning are essential to obtain accurate results. In this talk, I will review how Software 2.0 can automate routine data validation tasks such as missing value imputation and detection of corrupted samples. First, I will discuss how one can leverage structured, statistical dependencies in the data to obtain information-theoretically optimal data preparation methods, and then I will demonstrate how the widely-used Attention mechanism is key to automated data validation. This talk builds upon experience with projects such as HoloClean, FDX, and Picket and their application to different scientific and industrial use-cases.
Bio: Theodoros (Theo) Rekatsinas is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison, currently on leave at Apple. Theo is also a co-founder of Inductiv (now part of Apple), which developed technology that uses artificial intelligence to automate processes that involve identifying and correcting errors in data.
Abstract: Data Science usage at Netflix goes well beyond our eponymous recommendation systems. It touches almost all aspects of our business, from optimizing content delivery and informing buying decisions to fighting fraud. Our unique culture affords our data scientists extraordinary freedom of choice in ML tools and libraries, all of which results in an ever-expanding set of interesting problem statements and a diverse set of ML approaches to tackle them. Our data scientists, at the same time, are expected to build, deploy, and operate complex ML workloads autonomously without the need to be significantly experienced with systems or data engineering. In this talk, I will discuss some of the challenges involved in improving the development and deployment experience for ML workloads. I will focus on Metaflow, our ML framework, which offers useful abstractions for managing the model’s lifecycle end-to-end, and how a focus on human-centric design positively affects our data scientists' velocity.
Bio: Savin is a software engineer at Netflix responsible for Metaflow, Netflix's ML platform. He focuses on building generalizable infrastructure to accelerate the impact of data science at Netflix and beyond.
Abstract: In this talk, I will review a set of general approaches for representing large-scale textual knowledge sources that are useful for multiple downstream tasks. I will present benchmarking tools spanning multiple domains (including Question Answering, Entity Linking and Dialogue) and I will describe the latest knowledge-intensive NLP models with a focus on their efficiency.
Bio: Fabio is a Research Engineer in the Facebook Artificial Intelligence Research (FAIR) lab in London. His research focuses on Natural Language Processing, in particular, Information Extraction, Question Answering and Knowledge Representation. Prior to joining Facebook, he was with the R&D department of Thomson Reuters and received a PhD degree from Sapienza University of Rome.
Abstract: I will introduce the term "hardware lottery" to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is superior to alternative research directions. This talk will motivate attention to hardware lotteries by discussing examples from early computer history which have delayed research progress by casting successful ideas as failures. These lessons are particularly salient given the advent of domain-specialized hardware, which makes it increasingly costly to stray off the beaten path of research ideas.
Bio: Sara Hooker is a researcher at Google Brain working on reliable explanations of model behavior. Her main research interests gravitate towards training models beyond test-set accuracy to be compact, robust, fair and interpretable. In 2014, she founded Delta Analytics, a non-profit dedicated to bringing technical capacity to help non-profits across the world use machine learning for good.
Abstract: In this talk, I will describe a reinforcement learning (RL) method for chip floorplanning, the engineering problem of designing the physical layout of a computer chip. Chip floorplanning ordinarily requires weeks or months of effort by physical design engineers to produce manufacturable layouts. Our method generates floorplans in under six hours that are superior or comparable to those produced by humans in all key metrics, including power consumption, performance, and chip area. To achieve this, we pose chip floorplanning as a reinforcement learning problem, and develop a novel edge-based graph convolutional neural network architecture capable of learning rich and transferable representations of the chip. Our method was used in the design of the next generation of Google’s artificial intelligence (AI) accelerators (TPU).
Bio: Anna Goldie is a Staff Researcher at Google Brain and co-founder/tech-lead of the Machine Learning for Systems Team. She is also a PhD student in the Stanford NLP Group, where she is advised by Prof. Chris Manning. At MIT, she earned a Masters of Computer Science, Bachelors of Computer Science, and Bachelors of Linguistics. She speaks fluent Mandarin, Japanese, and French, as well as conversational Spanish, Italian, German, and Korean. Her work has been covered in various media outlets, including MIT Technology Review and IEEE Spectrum.
Abstract: The talk will introduce Ludwig, a deep learning toolbox that allows users to train models and use them for prediction without the need to write code. Thanks to its declarative configuration system and the use of data types to guide pipeline building, it helps make deep learning approachable for non-experts and enables faster model improvement iteration cycles for experienced machine learning engineers and researchers. By using Ludwig, experts and researchers can simplify the development process and focus on experiment comparison and model quality. We will also discuss recent improvements to Ludwig, including AutoML and hyperparameter optimization capabilities, its backstory and its future releases.
Bio: Piero Molino is a Staff Research Scientist at Stanford University working on Machine Learning systems and algorithms. Piero completed a PhD on Question Answering at the University of Bari, Italy. He founded QuestionCube, a startup that built a framework for semantic search and QA. He worked for Yahoo Labs in Barcelona on learning to rank and for IBM Watson in New York on natural language processing with deep learning, and then joined Geometric Intelligence, where he worked on grounded language understanding. After Uber acquired Geometric Intelligence, he became one of the founding members of Uber AI Labs. At Uber he worked on research topics including Dialogue Systems, Language Generation, Graph Representation Learning, Computer Vision, Reinforcement Learning and Meta Learning. He also worked on several deployed systems like COTA, an ML and NLP model for Customer Support, Dialogue Systems for driver hands-free dispatch, pickup and communications, and on the Uber Eats Recommender System with graph learning. He is the author of Ludwig, a code-free deep learning toolbox.
Abstract: Machine learning pipelines can successfully demonstrate high performance on train and evaluation datasets, but what happens after you promote that model to production? What are some of the challenges faced, and how do groups of different stakeholders with different technical abilities collaborate to identify and “fix” bugs? In my talk, I will draw from my experiences to describe a high level overview of modern ML infrastructure, criteria for promoting models, case studies of “bugs” encountered when clients were interacting with the live ML predictions, and the challenges in solving these issues.
Bio: Shreya is a computer scientist living in San Francisco interested in making machine learning work in the “real world.” Currently, she is taking a break from work, but previously, she was the first ML engineer at Viaduct, did ML research at Google Brain, and completed her BS and MS in computer science at Stanford.
Abstract: Machine learning is quickly becoming a product engineering discipline. Although several new categories of infrastructure and tools have emerged to help teams turn their models into production systems, doing so is still extremely challenging for most companies. In this talk, we survey the tooling landscape and point out several parts of the machine learning lifecycle that are still underserved. We propose a new category of tool that could help alleviate these challenges and connect the fragmented production ML tooling ecosystem. We conclude by discussing similarities and differences between our proposed system and those of a few top companies.
Bio: Josh Tobin is the founder and CEO of a stealth machine learning startup. Previously, Josh worked as a deep learning & robotics researcher at OpenAI and as a management consultant at McKinsey. He is also the creator of Full Stack Deep Learning (fullstackdeeplearning.com), the first course focused on the emerging engineering discipline of production machine learning. Josh did his PhD in Computer Science at UC Berkeley advised by Pieter Abbeel.
Abstract: Deep neural networks are pushing the state of the art in numerous machine learning research domains, from computer vision to natural language processing and even tabular business data. However, scaling such models to train efficiently on large datasets imposes a unique set of challenges that traditional batch data processing systems were not designed to solve. Horovod is an open source framework that scales models written in TensorFlow, PyTorch, and MXNet to train seamlessly on hundreds of GPUs in parallel. In this talk, we'll explain the concepts and unique constraints that led to the development of Horovod at Uber, and discuss how the latest trends in deep learning research are informing the future direction of the project within the Linux Foundation. We'll explore how Horovod fits into production ML workflows in industry, and how tools like Spark and Ray can combine with Horovod to make productionizing deep learning at scale on remote data centers as simple as running locally on your laptop. Finally, we'll share some thoughts on what's next for large scale deep learning, including new distributed training architectures and how the larger ecosystem of production ML tooling is evolving.
Bio: Travis Addair is a software engineer at Uber leading the Deep Learning Training team as part of the Michelangelo machine learning platform. He is the lead maintainer for the Horovod open source project and chairs its Technical Steering Committee within the Linux Foundation. In the past, he’s worked on scaling machine learning systems at Google and Lawrence Livermore National Lab.
Abstract: Deep learning is computation-hungry and data-hungry. We aim to improve the computation efficiency and data efficiency of deep learning. I will first talk about MCUNet, which brings deep learning to IoT devices. The technique is tiny neural architecture search (TinyNAS) co-designed with a tiny inference engine (TinyEngine), enabling ImageNet-scale inference on an IoT device with only 1MB of FLASH. Next I will talk about TinyTL, which enables on-device training, reducing the memory footprint by 7-13x. Finally, I will describe Differentiable Augmentation, which enables data-efficient GAN training, generating photo-realistic images using only 100 images, a task that used to require tens of thousands of images. We hope such TinyML techniques can make AI greener, faster, and more sustainable.
Bio: Song Han is an assistant professor in MIT EECS. He received his PhD degree from Stanford University. His research focuses on efficient deep learning computing. He proposed the “deep compression” technique that can reduce neural network size by an order of magnitude without losing accuracy, and the hardware implementation “efficient inference engine” that first exploited pruning and weight sparsity in deep learning accelerators. His recent research on hardware-aware neural architecture search and TinyML was highlighted by MIT News, Wired, and Venture Beat, and received many low-power computer vision (LPCV) contest awards. Song received Best Paper awards at ICLR’16 and FPGA’17, the Amazon Machine Learning Research Award, the SONY Faculty Award, and the Facebook Faculty Award. Song was named to the “35 Innovators Under 35” list by MIT Technology Review for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” Song received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning.”
Abstract: My students and I often find ourselves as "subject matter experts" needing to create video understanding models that serve computer graphics and video analysis applications. Unfortunately, like many, we are frustrated by how a smart grad student, armed with a large *unlabeled* video collection, a palette of pre-trained models, and an idea of what novel object or activity they want to detect/segment/classify, requires days-to-weeks to create and validate a model for their task. In this talk I will discuss challenges we've faced in the iterative process of curating data, training models, and validating models for the specific case of rare events and categories in image and video collections. In this regime we've found that conventional wisdom about training on imbalanced data sets and data acquisition via active learning does not lead to the most efficient solutions. I'll discuss these challenges in the context of image and video analysis applications, and elaborate on our ongoing vision of how a grad student, armed with massive amounts of unlabeled video data, pretrained models, and available-in-seconds-supercomputing-scale elastic compute should be able to interactively iterate on cycles of acquiring training data, training models, and validating models.
Bio: Kayvon Fatahalian is an Assistant Professor in the Computer Science Department at Stanford University. His lab works on visual computing systems projects, including large-scale video analytics, programming systems for video data mining, and compilation techniques for optimizing image processing pipelines. In all these efforts, the goal is to enable more rapid development of applications that involve video processing at scale.
Abstract: Bayesian optimization has become a powerful method for the sample-efficient optimization of expensive black-box functions. These functions do not have a closed form and are evaluated, for example, by running a complex economic simulation, by an experiment in the lab or in a market, or by a CFD simulation. Use cases arise in machine learning, e.g., when tuning the configuration of an ML model or when optimizing a reinforcement learning policy. Examples in engineering include the design of aerodynamic structures or materials discovery. In this talk I will introduce the key ideas of Bayesian optimization and discuss how they can be applied to tuning ML models. Moreover, I will share some experiences with developing a Bayesian optimization service in industry.
Bio: Matthias’ research interests lie at the intersection of machine learning and optimization, with a focus on Bayesian methods for 'exotic' optimization problems arising in business applications and in the natural sciences. He is a Principal Scientist at Amazon. Previously, Matthias was a Senior Manager at Uber AI, where he founded Uber’s Bayesian optimization team and led the cross-org effort that built a company-wide service to tune ML models at scale. Matthias received his PhD in CS from Goethe University in Frankfurt in 2013 and then worked as a postdoc at Cornell with David Williamson and Peter Frazier from 2014 until 2017. He was an Assistant Professor in the Department of Systems and Industrial Engineering at the University of Arizona from 2017 until 2019.
Abstract: JAX is a system for high-performance machine learning research and numerical computing. It offers the familiarity of Python+NumPy together with hardware acceleration, plus a set of composable function transformations: automatic differentiation, automatic batching, end-to-end compilation (via XLA), parallelizing over multiple accelerators, and more. JAX's core strength is its guarantee that these user-wielded transformations can be composed arbitrarily, so that programmers can write math (e.g. a loss function) and transform it into pieces of an ML program (e.g. a vectorized, compiled, batch gradient function for that loss). JAX had its open-source release in December 2018 (https://github.com/google/jax). It's used by researchers for a wide range of applications, from studying training dynamics of neural networks, to probabilistic programming, to scientific applications in physics and biology.
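To make the "write math, then transform it" idea concrete, here is a minimal sketch that is not taken from the talk. It assumes the `jax` package is installed; the linear model, the squared-error loss, and the toy data are made up for illustration. It composes `jax.grad`, `jax.vmap`, and `jax.jit` to turn a per-example loss into a vectorized, compiled, batch gradient function.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Squared error of a linear model on a single example (illustrative only)."""
    return (jnp.dot(w, x) - y) ** 2


# grad: differentiate the loss with respect to its first argument, w.
grad_loss = jax.grad(loss)

# vmap: apply the per-example gradient across a batch of (x, y) pairs,
# sharing the same w (in_axes=None) for every example.
per_example_grads = jax.vmap(grad_loss, in_axes=(None, 0, 0))

# jit: compile the composed function (via XLA) and average over the batch.
batch_grad = jax.jit(
    lambda w, xs, ys: per_example_grads(w, xs, ys).mean(axis=0)
)

# Toy data, chosen so the result is easy to check by hand.
w = jnp.array([1.0, -1.0])
xs = jnp.array([[1.0, 2.0], [3.0, 4.0]])
ys = jnp.array([0.0, 1.0])

g = batch_grad(w, xs, ys)
print(g)
```

For this toy input the per-example gradients are `[-2, -4]` and `[-12, -16]`, so the batch average is `[-7, -10]`. The three transformations could be applied in a different order or nested further; the order here was chosen only to mirror the example in the abstract.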
Bio: Roy Frostig is a research scientist at Google. He's interested in forming reliable foundations for machine learning, by making software systems for ML research and by studying the statistical elements of its practice. He received his BS, MS, and PhD from Stanford, advised by Percy Liang.
Abstract: This talk covers what it means to operationalize ML models. It starts by analyzing the difference between ML in research vs. in production, ML systems vs. traditional software, as well as myths about ML production. It then goes over the principles of good ML systems design and introduces an iterative framework for ML systems design, from scoping the project, data management, model development, deployment, maintenance, to business analysis. It covers the differences between DataOps, ML Engineering, MLOps, and data science, and where each fits into the framework. It also discusses the main skills each stage requires, which can help companies in structuring their teams. The talk ends with a survey of the ML production ecosystem, the economics of open source, and open-core businesses.
Bio: Chip Huyen is an engineer who develops tools and best practices for machine learning production. She’s currently with Snorkel AI and she’ll be teaching Machine Learning Systems Design at Stanford starting in January 2021. Previously, she was with Netflix, NVIDIA, and Primer. She’s also the author of four bestselling Vietnamese books.
Abstract: One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today's models require. In this talk, I will describe our work on Snorkel (snorkel.org), an open-source framework for building and managing training datasets, and describe three key operators for letting users build and manipulate training datasets: labeling functions, for labeling unlabeled data; transformation functions, for expressing data augmentation strategies; and slicing functions, for partitioning and structuring training datasets. These operators allow domain expert users to specify machine learning (ML) models entirely via noisy operators over training data, expressed as simple Python functions (or even via higher-level NL or point-and-click interfaces), leading to applications that can be built in hours or days, rather than months or years, and that can be iteratively developed, modified, versioned, and audited. I will describe recent work on modeling the noise and imprecision inherent in these operators, and using these approaches to train ML models that solve real-world problems, including recent state-of-the-art results on benchmark tasks and real-world industry, government, and medical deployments.
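As an illustration of what "labeling functions expressed as simple Python functions" can look like, here is a minimal sketch in plain Python. It does not use the Snorkel library or its API: the function names, the spam/ham task, and the example messages are hypothetical, and the majority vote is a deliberately simple stand-in for the noise modeling the abstract describes.

```python
from collections import Counter

# Label values; ABSTAIN means a labeling function has no opinion.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Heuristic: messages with a URL are likely spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Heuristic: messages mentioning a prize are likely spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Heuristic: very short messages are likely legitimate."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def majority_vote(text):
    """Combine noisy votes; abstain when there are no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


unlabeled = [
    "Claim your prize at http://example.com now",
    "see you at noon",
    "The quarterly report is attached for your review",
]
labels = [majority_vote(t) for t in unlabeled]
print(labels)
```

Each heuristic is cheap to write and is allowed to be wrong or to abstain. The combined labels can then serve as training data for a downstream model, which is the workflow the abstract outlines.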
Bio: Alex Ratner is the co-founder and CEO of Snorkel AI, Inc., which supports the open source Snorkel library and develops Snorkel Flow, an end-to-end system for building machine learning applications, and an Assistant Professor of Computer Science at the University of Washington. Prior to Snorkel AI and UW, he completed his PhD in CS advised by Christopher Ré at Stanford, where his research focused on applying data management and statistical learning techniques to emerging machine learning workflows, such as creating and managing training data, and applying this to real-world problems in medicine, knowledge base construction, and more.
Abstract: A defining characteristic of federated learning is the presence of heterogeneity, i.e., that data and compute may differ significantly across the network. In this talk I show that the challenge of heterogeneity pervades the machine learning process in federated settings, affecting issues such as optimization, modeling, and fairness. In terms of optimization, I discuss FedProx, a distributed optimization method that offers robustness to systems and statistical heterogeneity. I then explore the role that heterogeneity plays in delivering models that are accurate and fair to all users/devices in the network. Our work here extends classical ideas in multi-task learning and alpha-fairness to large-scale heterogeneous networks, enabling flexible, accurate, and fair federated learning.
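The core idea behind FedProx can be sketched in a few lines: each device minimizes its local loss plus a proximal term, (mu / 2) * ||w - w_global||^2, that keeps its update close to the current global model. The sketch below is a toy illustration in plain Python, not the authors' implementation. The scalar model, the squared-error loss, the learning rate, the step count, and the unweighted averaging over all devices are simplifying assumptions.

```python
def local_update(w_global, data, mu, lr=0.1, steps=50):
    """Approximately minimize F_k(w) + (mu / 2) * (w - w_global)**2.

    F_k is the mean squared error of fitting a single constant w to the
    device's data; gradient descent stands in for any local solver.
    """
    w = w_global
    for _ in range(steps):
        grad_loss = sum(2 * (w - y) for y in data) / len(data)
        grad_prox = mu * (w - w_global)  # pulls w back toward the global model
        w -= lr * (grad_loss + grad_prox)
    return w


def fedprox_round(w_global, devices, mu):
    """One communication round: local proximal updates, then a plain average."""
    updates = [local_update(w_global, data, mu) for data in devices]
    return sum(updates) / len(updates)


# Three devices with very different local data (statistical heterogeneity).
devices = [[1.0, 1.2, 0.8], [5.0, 5.5], [9.0]]

w_plain = local_update(0.0, devices[2], mu=0.0)  # no proximal term
w_prox = local_update(0.0, devices[2], mu=2.0)   # with proximal term
w_next = fedprox_round(0.0, devices, mu=2.0)
print(w_plain, w_prox, w_next)
```

With `mu=0.0` the third device's update runs all the way to its own optimum near 9, while with `mu=2.0` it stops near 4.5, halfway back to the global model at 0. Limiting this kind of local drift is what makes the aggregate more robust when devices hold dissimilar data.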
Bio: Virginia Smith is an assistant professor in the Machine Learning Department at Carnegie Mellon University. Her research interests span machine learning, optimization, and distributed systems. Prior to CMU, Virginia was a postdoc at Stanford University, received a Ph.D. in Computer Science from UC Berkeley, and obtained undergraduate degrees in Mathematics and Computer Science from the University of Virginia.
Abstract: Although enterprise adoption of machine learning is still early on, many enterprises in all industries already have hundreds of internal ML applications. ML powers business processes with an impact of hundreds of millions of dollars in industrial IoT, finance, healthcare and retail. Building and operating these applications reliably requires infrastructure that is different from traditional software development, which has led to significant investment in the construction of “ML platforms” specifically designed to run ML applications. In this talk, I’ll discuss some of the common challenges in productionizing ML applications based on experience building MLflow, an open source ML platform started at Databricks. MLflow is now the most widely used open source project in this area, with over 2 million downloads a month and integrations with dozens of other products. I’ll also highlight some interesting problems users face that are not covered deeply in current ML systems research, such as the need for “hands-free” ML that can train thousands of independent models without direct tuning from the ML developer for regulatory reasons, and the impact of privacy and interpretability regulations on ML. All my examples will be based on experience at large Databricks / MLflow customers.
Bio: Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly on other cluster computing and analytics software, including MLflow and Delta Lake. At Stanford, Matei is a co-PI of the DAWN Lab doing research on infrastructure for machine learning. Matei’s work was recognized through the 2014 ACM Doctoral Dissertation Award, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).
Abstract: We will present CheckList, a task-agnostic methodology and tool for testing NLP models inspired by principles of behavioral testing in software engineering. We will show a lot of fun bugs we discovered with CheckList, both in commercial models (Microsoft, Amazon, Google) and research models (BERT, RoBERTa for sentiment analysis, QQP, SQuAD). We'll also present comparisons between CheckList and the status quo, in a case study at Microsoft and a user study with researchers and engineers. We show that CheckList is a really helpful process and tool for testing and finding bugs in NLP models, both for practitioners and researchers.
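To give a flavor of behavioral testing for NLP models, here is a small sketch of one kind of test, an invariance check: changing a person's name in a sentence should not change the predicted sentiment. This is plain Python and does not use the CheckList tool or its API. The toy model, its planted bug, the template, and the names are all hypothetical.

```python
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_sentiment(text):
    """Hypothetical model under test: counts sentiment words.

    It contains a planted bug: the name 'Karen' is scored as negative.
    """
    score = 0
    for token in text.lower().replace(".", "").split():
        if token in POSITIVE:
            score += 1
        elif token in NEGATIVE or token == "karen":
            score -= 1
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"


def invariance_failures(model, template, names):
    """Fill the template with each name and report names that flip the label."""
    baseline = model(template.format(name=names[0]))
    return [
        name
        for name in names[1:]
        if model(template.format(name=name)) != baseline
    ]


failures = invariance_failures(
    toy_sentiment,
    "{name} said the food was good.",
    ["Maria", "John", "Karen"],
)
print(failures)
```

The test treats the model as a black box and needs no labeled data: it only asserts that a meaning-preserving edit leaves the prediction unchanged, which is how it surfaces the planted bug.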
Bio: Marco Tulio Ribeiro is a Senior Researcher at Microsoft Research. His work is on facilitating the communication between humans and machine learning models, which includes interpretability, trust, debugging, feedback, robustness, testing, etc. He received his PhD from the University of Washington.
About The Seminar
This seminar is being run by Piero Molino, Dan Fu, Karan Goel, Fiodar Kazhamakia, Matei Zaharia, and Chris Ré. You can reach us at sysmlstanfordseminar [at] gmail.
Source code for this website can be found here.