Running machine learning algorithms on vast and rapidly growing data sets often poses significant computational challenges. To mitigate this, probability sampling is frequently used to reduce data set sizes: it constructs a sampled data set by including each original data point with a known probability. Despite its advantages, the quality of solutions obtained from samples may deviate significantly from those derived from the complete data set. In this study, we investigate the worst-case performance degradation caused by sampling in the domain of adaptive optimization. Many machine learning and AI problems, including active learning, can be framed as adaptive optimization problems. Our primary contribution is deriving both upper and lower bounds on this performance degradation for a broad class of utility functions.
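To make the sampling step concrete, here is a minimal sketch of Bernoulli-style probability sampling, where each point is kept independently with a known inclusion probability. The function name and the probability value are illustrative assumptions, not taken from the talk:

    import random

    def probability_sample(data, p, seed=None):
        """Return a subsample of `data`, keeping each point
        independently with known inclusion probability `p`.
        (Illustrative sketch; not the speaker's implementation.)"""
        rng = random.Random(seed)
        return [x for x in data if rng.random() < p]

    # Example: keep each of 1,000,000 points with probability 0.01,
    # yielding roughly 10,000 points in expectation.
    sample = probability_sample(range(1_000_000), p=0.01, seed=42)
    print(len(sample))

The talk's question is then how much the solution computed on `sample` can fall short of the solution computed on the full data set, in the worst case.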
Jing Yuan is an Assistant Professor in the Department of Computer Science and Engineering at the University of North Texas. Her research centers on AI for equitable, robust, and adaptive decision-making, integrating methods from machine learning, optimization, and social networks. Her work has appeared in venues such as the INFORMS Journal on Computing, Production and Operations Management, Operations Research Letters, and AAAI.
Event Date: March 12, 2024