Statistical Approaches to Understanding Modern ML Methods

Aug 2-4, 2021
University of Wisconsin–Madison

When we use modern machine learning (ML) systems, the output often consists of a trained model with good performance on a test dataset. This satisfies some of our goals in performing data analysis, but leaves many unaddressed — for instance, we may want to build an understanding of the underlying phenomena, to provide uncertainty quantification about our conclusions, or to enforce constraints of safety, fairness, robustness, or privacy. As an example, classical statistical methods for quantifying a model’s variance rely on strong assumptions about the model — assumptions that can be difficult or impossible to verify for complex modern ML systems such as neural networks. 

This workshop will focus on using statistical methods to understand, characterize, and design ML models — for instance, methods that probe “black-box” ML models (with few to no assumptions) to assess their statistical properties, or tools for developing likelihood-free and simulation-based inference. Central themes of the workshop may include:

  • Using the output of an ML system to perform statistical inference, compute prediction intervals, or quantify measures of uncertainty

  • Using ML systems to test for conditional independence

  • Extracting interpretable information such as feature importance or causal relationships

  • Integrating likelihood-free inference with ML

  • Developing mechanisms for enforcing privacy, robustness, or stability constraints on the output of ML systems

  • Exploring connections to transfer learning and domain adaptation

  • Automated tuning of hyperparameters in black-box models and derivative-free optimization

Organizers

Participants

Osbert Bastani
PAC prediction sets under distribution shift
Avrim Blum
Recovering from biased data: Can fairness constraints improve accuracy?
Kamalika Chaudhuri
Statistical challenges in adversarially robust machine learning
Kyle Cranmer
Simulation-based inference: recent progress and open questions
Peng Ding
Model-assisted analyses of cluster-randomized experiments
Dylan Foster
From predictions to decisions: A black-box approach to the contextual bandit problem
Zaid Harchaoui
The statistical trade-offs of generative modeling with deep neural networks
Lucas Janson
Floodgate: Inference for variable importance with machine learning
Edward Kennedy
Optimal doubly robust estimation of heterogeneous causal effects
Lihua Lei
Conformalized survival analysis
Sharon Li
Uncovering the unknowns of deep neural networks: Challenges and opportunities
Po-Ling Loh
Robust W-GAN-based estimation under Wasserstein contamination
Aaditya Ramdas
A quick tour of distribution-free post-hoc calibration
Hanie Sedghi
The deep bootstrap framework: Good online learners are good offline generalizers
Ryan Tibshirani
Discrete splines: Another look at trend filtering and related problems
Vladimir Vovk
Conformal prediction, testing, and robustness
Yao Xie
Conformal prediction intervals for dynamic time series

Schedule

| Morning | Conformal Prediction Methods | Speaker |
| --- | --- | --- |
| 9:15-9:30 | Welcome and Introduction | |
| 9:30-10:15 | Conformal prediction, testing, and robustness | Vladimir Vovk |
| 10:15-11:00 | Conformalized Survival Analysis | Lihua Lei |
| 11:00-11:15 | Break | |
| 11:15-12:00 | Conformal prediction intervals for dynamic time series | Yao Xie |

| Afternoon | Challenges and Trade-offs in Deep Learning | Speaker |
| --- | --- | --- |
| 2:00-2:45 | The Statistical Trade-offs of Generative Modeling with Deep Neural Networks | Zaid Harchaoui |
| 2:45-3:30 | The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers | Hanie Sedghi |
| 3:30-3:45 | Break | |
| 3:45-4:30 | Uncovering the Unknowns of Deep Neural Networks: Challenges and Opportunities | Sharon Li |
| 5:00-8:00 | Reception at Tripp Commons, Memorial Union | |

| Morning | Robust Learning | Speaker |
| --- | --- | --- |
| 9:30-10:15 | Optimal doubly robust estimation of heterogeneous causal effects | Edward Kennedy |
| 10:15-11:00 | Robust W-GAN-Based Estimation Under Wasserstein Contamination | Po-Ling Loh |
| 11:00-11:15 | Break | |
| 11:15-12:00 | PAC Prediction Sets Under Distribution Shift | Osbert Bastani |

| Afternoon | Interpretation in Black-Box Learning | Speaker |
| --- | --- | --- |
| 2:00-2:45 | Floodgate: Inference for Variable Importance with Machine Learning | Lucas Janson |
| 2:45-3:30 | From Predictions to Decisions: A Black-Box Approach to the Contextual Bandit Problem | Dylan Foster |
| 3:30-3:45 | Break | |
| 3:45-5:00 | **Lightning talks** by in-person participants | |
| | Searching for Synergy in High-Dimensional Antibiotic Combinations | Jennifer Brennan (Seattle) |
| | Supervised tensor decomposition with features on multiple modes | Jiaxin Hu (Madison) |
| | Latent Preference Matrix Estimation with Graph Side Information | Changhun Jo (Madison) |
| | Nonconvex Factorization and Manifold Formulations are Almost Equivalent in Low-rank Matrix Optimization | Yuetian Luo (Madison) |
| | Excess Capacity and Backdoor Poisoning | Naren Manoj (TTIC) |
| | Risk bounds for regression and classification with structured feature maps | Andrew McRae (GaTech) |
| | Robust regression with covariate filtering: Heavy tails and adversarial contamination | Ankit Pensia (Madison) |
| | Derandomizing knockoffs | Zhimei Ren (Chicago) |

| Morning | Modern Statistical Methodologies pt I | Speaker |
| --- | --- | --- |
| 9:30-10:15 | Discrete Splines: Another Look at Trend Filtering and Related Problems | Ryan Tibshirani |
| 10:15-11:00 | A quick tour of distribution-free post-hoc calibration | Aaditya Ramdas |
| 11:00-11:15 | Break | |
| 11:15-12:00 | Recovering from Biased Data: Can Fairness Constraints Improve Accuracy? | Avrim Blum |

| Afternoon | Modern Statistical Methodologies pt II | Speaker |
| --- | --- | --- |
| 2:00-2:45 | Model-assisted analyses of cluster-randomized experiments | Peng Ding |
| 2:45-3:30 | Simulation-based inference: recent progress and open questions | Kyle Cranmer |
| 3:30-3:45 | Break | |
| 3:45-4:30 | Statistical challenges in Adversarially Robust Machine Learning | Kamalika Chaudhuri |

Slides

All slides for the workshop are included in the gallery below.