Machine Learning Meets Statistical Physics II
Mutual Information Can Be Estimated when Undersampled Data Have Low-Dimensional Latent Structure
9:00 am – 9:12 am

Mutual Information (MI) is a non-linear measure of statistical dependence between two variables. It is useful in analyzing complex systems, clustering, and feature selection. Estimating MI in high-dimensional spaces has traditionally been intractable due to prohibitive sample-size requirements. Recent neural-network-based estimators promise a solution, yet it is unclear when and why they work or fail. We demonstrate that these estimators implicitly project data onto low-dimensional spaces where MI estimation becomes feasible. We argue that effective MI estimation requires intrinsic low-dimensional structure in the data, together with enough samples to adequately cover the latent space, and fails otherwise. Based on this insight, we reformulate MI estimation as a dimensionality reduction problem. Specifically, we use the Deep Variational Symmetric Information Bottleneck in combination with existing estimators to learn low-dimensional embeddings that maximize the mutual information between variables. We benchmark this approach against standalone estimators and offer practical guidelines for robust MI estimation.
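As a rough illustration of the two-stage idea described in the abstract, the sketch below learns low-dimensional embeddings of X and Y that maximize a variational lower bound on their MI, then applies a classical estimator in the latent space. It uses a generic contrastive (InfoNCE) objective as a stand-in for the Deep Variational Symmetric Information Bottleneck, and scikit-learn's KSG-type `mutual_info_regression` as the downstream estimator; the encoder architecture, toy data, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Two-stage MI estimation sketch (illustrative, NOT the authors' DVSIB code):
#   1) train encoders so the X-Y dependence is concentrated in low dimensions,
#   2) run a standard MI estimator on the learned 1-D embeddings.
import torch
import torch.nn as nn
from sklearn.feature_selection import mutual_info_regression

def make_encoder(in_dim, latent_dim=1):
    # Small MLP encoder; depth and width are arbitrary illustrative choices.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, latent_dim))

def infonce_bound(zx, zy, temperature=0.1):
    # InfoNCE lower bound: I(Z_X; Z_Y) >= log(B) - CE, where matched pairs in
    # the batch are positives and all other pairings serve as negatives.
    logits = zx @ zy.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(zx.size(0))
    ce = nn.functional.cross_entropy(logits, labels)
    return torch.log(torch.tensor(float(zx.size(0)))) - ce

def train_embeddings(x, y, latent_dim=1, steps=500, lr=1e-3):
    # Stage 1: maximize the variational MI bound between the two embeddings.
    enc_x, enc_y = make_encoder(x.size(1), latent_dim), make_encoder(y.size(1), latent_dim)
    opt = torch.optim.Adam(list(enc_x.parameters()) + list(enc_y.parameters()), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -infonce_bound(enc_x(x), enc_y(y))
        loss.backward()
        opt.step()
    return enc_x, enc_y

if __name__ == "__main__":
    # Toy data (assumption): a shared 1-D latent signal embedded in 20-D observations.
    torch.manual_seed(0)
    n = 2000
    t = torch.randn(n, 1)
    x = t @ torch.randn(1, 20) + 0.1 * torch.randn(n, 20)
    y = (t + 0.5 * torch.randn(n, 1)) @ torch.randn(1, 20) + 0.1 * torch.randn(n, 20)

    enc_x, enc_y = train_embeddings(x, y, latent_dim=1)
    with torch.no_grad():
        zx, zy = enc_x(x).numpy(), enc_y(y).numpy().ravel()

    # Stage 2: a classical KSG-type estimator, reliable again in 1-D.
    mi = mutual_info_regression(zx, zy)[0]      # estimate in nats
    print(f"Estimated MI via 1-D embeddings: {mi:.3f} nats")
```

The design choice the sketch highlights is the division of labor: the neural network only has to find the low-dimensional coordinates that carry the dependence, while the sample-efficient classical estimator does the actual MI estimation where its sample requirements are met.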