Statistical Approaches to Understanding Modern ML Methods
Aug 2-4, 2021
University of Wisconsin–Madison
When we use modern machine learning (ML) systems, the output often consists of a trained model with good performance on a test dataset. This satisfies some of our goals in performing data analysis, but leaves many unaddressed — for instance, we may want to build an understanding of the underlying phenomena, to provide uncertainty quantification about our conclusions, or to enforce constraints of safety, fairness, robustness, or privacy. As an example, classical statistical methods for quantifying a model’s variance rely on strong assumptions about the model — assumptions that can be difficult or impossible to verify for complex modern ML systems such as neural networks.
This workshop will focus on using statistical methods to understand, characterize, and design ML models — for instance, methods that probe “black-box” ML models (with few to no assumptions) to assess their statistical properties, or tools for developing likelihood-free and simulation-based inference. Central themes of the workshop may include:
- Using the output of an ML system to perform statistical inference, compute prediction intervals, or quantify measures of uncertainty
- Using ML systems to test for conditional independence
- Extracting interpretable information such as feature importance or causal relationships
- Integrating likelihood-free inference with ML
- Developing mechanisms for enforcing privacy, robustness, or stability constraints on the output of ML systems
- Exploring connections to transfer learning and domain adaptation
- Automated tuning of hyperparameters in black-box models and derivative-free optimization
Organizers
Participants
Schedule
Morning | Conformal Prediction Methods | Speaker |
---|---|---|
9:15-9:30 | Welcome and Introduction | |
9:30-10:15 | Conformal prediction, testing, and robustness | Vladimir Vovk |
10:15-11:00 | Conformalized Survival Analysis | Lihua Lei |
11:00-11:15 | Break | |
11:15-12:00 | Conformal prediction intervals for dynamic time series | Yao Xie |
Afternoon | Challenges and Trade-offs in Deep Learning | Speaker |
2:00-2:45 | The Statistical Trade-offs of Generative Modeling with Deep Neural Networks | Zaid Harchaoui |
2:45-3:30 | The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers | Hanie Sedghi |
3:30-3:45 | Break | |
3:45-4:30 | Uncovering the Unknowns of Deep Neural Networks: Challenges and Opportunities | Sharon Li |
5:00-8:00 | Reception at Tripp Commons, Memorial Union | |

Morning | Robust Learning | Speaker |
---|---|---|
9:30-10:15 | Optimal doubly robust estimation of heterogeneous causal effects | Edward Kennedy |
10:15-11:00 | Robust W-GAN-Based Estimation Under Wasserstein Contamination | Po-Ling Loh |
11:00-11:15 | Break | |
11:15-12:00 | PAC Prediction Sets Under Distribution Shift | Osbert Bastani |
Afternoon | Interpretation in Black-Box Learning | Speaker |
2:00-2:45 | Floodgate: Inference for Variable Importance with Machine Learning | Lucas Janson |
2:45-3:30 | From Predictions to Decisions: A Black-Box Approach to the Contextual Bandit Problem | Dylan Foster |
3:30-3:45 | Break | |
3:45-5:00 | **Lightning talks** by in-person participants | |
Searching for Synergy in High-Dimensional Antibiotic Combinations | Jennifer Brennan (Seattle) | |
Supervised tensor decomposition with features on multiple modes | Jiaxin Hu (Madison) | |
Latent Preference Matrix Estimation with Graph Side Information | Changhun Jo (Madison) | |
Nonconvex Factorization and Manifold Formulations are Almost Equivalent in Low-rank Matrix Optimization | Yuetian Luo (Madison) | |
Excess Capacity and Backdoor Poisoning | Naren Manoj (TTIC) | |
Risk bounds for regression and classification with structured feature maps | Andrew McRae (GaTech) | |
Robust regression with covariate filtering: Heavy tails and adversarial contamination | Ankit Pensia (Madison) | |
Derandomizing knockoffs | Zhimei Ren (Chicago) | |

Morning | Modern Statistical Methodologies pt I | Speaker |
---|---|---|
9:30-10:15 | Discrete Splines: Another Look at Trend Filtering and Related Problems | Ryan Tibshirani |
10:15-11:00 | A quick tour of distribution-free post-hoc calibration | Aaditya Ramdas |
11:00-11:15 | Break | |
11:15-12:00 | Recovering from Biased Data: Can Fairness Constraints Improve Accuracy? | Avrim Blum |
Afternoon | Modern Statistical Methodologies pt II | Speaker |
2:00-2:45 | Model-assisted analyses of cluster-randomized experiments | Peng Ding |
2:45-3:30 | Simulation-based inference: recent progress and open questions | Kyle Cranmer |
3:30-3:45 | Break | |
3:45-4:30 | Statistical challenges in Adversarially Robust Machine Learning | Kamalika Chaudhuri |

Slides
All slides for the workshop are included in the gallery below.