An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions? – Prof. Tamara Broderick
Join us for the Data Sciences Speaker Series with Prof. Tamara Broderick, Associate Professor, Department of Electrical Engineering and Computer Science, MIT. This talk is co-sponsored by the Data Sciences Institute and the Master of Science in Applied Computing at the University of Toronto.
Registration Required – REGISTER HERE
- Date: September 19, 2022
- Time: 4:00 pm – 5:00 pm ET
- Format: Virtual
Talk Title: An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions?
Description: One hopes that data analyses will be used to make beneficial decisions regarding people’s health, finances, and well-being. But the data fed to an analysis may systematically differ from the data where these decisions are ultimately applied. For instance, suppose we analyze data in one country and conclude that microcredit is effective at alleviating poverty; based on this analysis, we decide to distribute microcredit in other locations and in future years. We might then ask: can we trust our conclusion to apply under new conditions? If we found that a very small percentage of the original data was instrumental in determining the original conclusion, we might expect the conclusion to be unstable under new conditions. So we propose a method to assess the sensitivity of data analyses to the removal of a very small fraction of the data set. Analyzing all possible data subsets of a certain size is computationally prohibitive, so we provide an approximation. We call our resulting method the Approximate Maximum Influence Perturbation. Our approximation is automatically computable, theoretically supported, and works for common estimators — including (but not limited to) OLS, IV, GMM, MLE, MAP, and variational Bayes. We show that any non-robustness our metric finds is conclusive. Empirics demonstrate that while some applications are robust, in others the sign of a treatment effect can be changed by dropping less than 0.1% of the data– even in simple models and even when standard errors are small.
About the speaker: Tamara Broderick is an Associate Professor in the Department of Electrical Engineering and Computer Science at MIT. She is a member of the MIT Laboratory for Information and Decision Systems (LIDS), the MIT Statistics and Data Science Center, and the Institute for Data, Systems, and Society (IDSS). She completed her Ph.D. in Statistics at the University of California, Berkeley in 2014. Previously, she received an AB in Mathematics from Princeton University (2007), a Master of Advanced Study for completion of Part III of the Mathematical Tripos from the University of Cambridge (2008), an MPhil by research in Physics from the University of Cambridge (2009), and an MS in Computer Science from the University of California, Berkeley (2013). Her recent research has focused on developing and analyzing models for scalable Bayesian machine learning. She has been awarded selection to the COPSS Leadership Academy (2021), an Early Career Grant (ECG) from the Office of Naval Research (2020), an AISTATS Notable Paper Award (2019), an NSF CAREER Award (2018), a Sloan Research Fellowship (2018), an Army Research Office Young Investigator Program (YIP) award (2017), Google Faculty Research Awards, an Amazon Research Award, the ISBA Lifetime Members Junior Researcher Award, the Savage Award (for an outstanding doctoral dissertation in Bayesian theory and methods), the Evelyn Fix Memorial Medal and Citation (for the Ph.D. student on the Berkeley campus showing the greatest promise in statistical research), the Berkeley Fellowship, an NSF Graduate Research Fellowship, a Marshall Scholarship, and the Phi Beta Kappa Prize (for the graduating Princeton senior with the highest academic average).