Statistical Approaches to Understanding Modern ML Methods

Aug 2-4, 2021
University of Wisconsin–Madison

When we use modern machine learning (ML) systems, the output often consists of a trained model with good performance on a test dataset. This satisfies some of our goals in performing data analysis, but leaves many unaddressed — for instance, we may want to build an understanding of the underlying phenomena, to provide uncertainty quantification about our conclusions, or to enforce constraints of safety, fairness, robustness, or privacy. As an example, classical statistical methods for quantifying a model’s variance rely on strong assumptions about the model — assumptions that can be difficult or impossible to verify for complex modern ML systems such as neural networks.

This workshop will focus on using statistical methods to understand, characterize, and design ML models — for instance, methods that probe “black-box” ML models (with few to no assumptions) to assess their statistical properties, or tools for developing likelihood-free and simulation-based inference. Central themes of the workshop may include:

Using the output of an ML system to perform statistical inference, compute prediction intervals, or quantify measures of uncertainty
Using ML systems to test for conditional independence
Extracting interpretable information such as feature importance or causal relationships
Integrating likelihood-free inference with ML
Developing mechanisms for enforcing privacy, robustness, or stability constraints on the output of ML systems
Exploring connections to transfer learning and domain adaptation
Automated tuning of hyperparameters in black-box models and derivative-free optimization

Organizers

Rina Foygel Barber

Lu Mao

Michael Newton

Robert Nowak

Rebecca Willett

Participants

Osbert Bastani

PAC prediction sets under distribution shift

Avrim Blum

Recovering from biased data: Can fairness constraints improve accuracy?

Kamalika Chaudhuri

Statistical challenges in adversarially robust machine learning

Kyle Cranmer

Simulation-based inference: recent progress and open questions

Peng Ding

Model-assisted analyses of cluster-randomized experiments

Dylan Foster

From predictions to decisions: A black-box approach to the contextual bandit problem

Zaid Harchaoui

The statistical trade-offs of generative modeling with deep neural networks

Lucas Janson

Floodgate: Inference for variable importance with machine learning

Edward Kennedy

Optimal doubly robust estimation of heterogeneous causal effects

Lihua Lei

Conformalized survival analysis

Sharon Li

Uncovering the unknowns of deep neural networks: Challenges and opportunities

Po-Ling Loh

Robust W-GAN-based estimation under Wasserstein contamination

Aaditya Ramdas

A quick tour of distribution-free post-hoc calibration

Hanie Sedghi

The deep bootstrap framework: Good online learners are good offline generalizers

Ryan Tibshirani

Discrete splines: Another look at trend filtering and related problems

Vladimir Vovk

Conformal prediction, testing, and robustness

Yao Xie

Conformal prediction intervals for dynamic time series

Schedule

Monday

Morning	Conformal Prediction Methods	Speaker
9:15-9:30	Welcome and Introduction
9:30-10:15	Conformal prediction, testing, and robustness	Vladimir Vovk
10:15-11:00	Conformalized Survival Analysis	Lihua Lei
11:00-11:15	Break
11:15-12:00	Conformal prediction intervals for dynamic time series	Yao Xie
Afternoon	Challenges and Trade-offs in Deep Learning	Speaker
2:00-2:45	The Statistical Trade-offs of Generative Modeling with Deep Neural Networks	Zaid Harchaoui
2:45-3:30	The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers	Hanie Sedghi
3:30-3:45	Break
3:45-4:30	Uncovering the Unknowns of Deep Neural Networks: Challenges and Opportunities	Sharon Li
5:00-8:00	Reception at Tripp Commons, Memorial Union

Tuesday

Morning	Robust Learning	Speaker
9:30-10:15	Optimal doubly robust estimation of heterogeneous causal effects	Edward Kennedy
10:15-11:00	Robust W-GAN-Based Estimation Under Wasserstein Contamination	Po-Ling Loh
11:00-11:15	Break
11:15-12:00	PAC Prediction Sets Under Distribution Shift	Osbert Bastani
Afternoon	Interpretation in Black-Box Learning	Speaker
2:00-2:45	Floodgate: Inference for Variable Importance with Machine Learning	Lucas Janson
2:45-3:30	From Predictions to Decisions: A Black-Box Approach to the Contextual Bandit Problem	Dylan Foster
3:30-3:45	Break
3:45-5:00	Lightning talks by in-person participants
	Searching for Synergy in High-Dimensional Antibiotic Combinations	Jennifer Brennan (Seattle)
	Supervised tensor decomposition with features on multiple modes	Jiaxin Hu (Madison)
	Latent Preference Matrix Estimation with Graph Side Information	Changhun Jo (Madison)
	Nonconvex Factorization and Manifold Formulations are Almost Equivalent in Low-rank Matrix Optimization	Yuetian Luo (Madison)
	Excess Capacity and Backdoor Poisoning	Naren Manoj (TTIC)
	Risk bounds for regression and classification with structured feature maps	Andrew McRae (GaTech)
	Robust regression with covariate filtering: Heavy tails and adversarial contamination	Ankit Pensia (Madison)
	Derandomizing knockoffs	Zhimei Ren (Chicago)

Wednesday

Morning	Modern Statistical Methodologies, Part I	Speaker
9:30-10:15	Discrete Splines: Another Look at Trend Filtering and Related Problems	Ryan Tibshirani
10:15-11:00	A quick tour of distribution-free post-hoc calibration	Aaditya Ramdas
11:00-11:15	Break
11:15-12:00	Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?	Avrim Blum
Afternoon	Modern Statistical Methodologies, Part II	Speaker
2:00-2:45	Model-assisted analyses of cluster-randomized experiments	Peng Ding
2:45-3:30	Simulation-based inference: recent progress and open questions	Kyle Cranmer
3:30-3:45	Break
3:45-4:30	Statistical challenges in Adversarially Robust Machine Learning	Kamalika Chaudhuri

Slides

All slides for the workshop are included in the gallery below.