Introduction to our JST PRESTO project (2021/10-2026/03)
Background
One of the major funding bodies for scientific research in Japan is JST, the Japan Science and Technology Agency (en/ja). JST provides a wide variety of grants to both individual researchers and research teams, but its flagship grant for individuals is called PRESTO (Sakigake in Japanese; en/ja). My application to PRESTO was accepted within the research area of "Trustworthy AI" (en/ja), and the project was originally scheduled to run from October 2021 to March 2025, with a total budget of 40 million yen. My project has since been selected for a one-year extension with an additional 4 million yen, so it now runs through March 2026.
On this page, I give an overview of the goals and key ideas underlying my initial proposal, as well as a summary of the research papers, software, and presentations closely related to this project.
Overview of key concepts
The title of my project is "Machine learning with guarantees under diverse risk measures," and the most important underlying idea is that the current machine learning methodology, rooted in average-case performance, needs a principled re-evaluation.
Put simply, in most machine learning tasks, "success" is formally defined as minimizing the expected value (i.e., the average) of a random loss computed using some loss function. Here the randomness is typically taken over the draw of a new data point at test time (i.e., after "training" is complete). This approach is perfectly natural, but in designing and evaluating learning systems (e.g., human workflows supported by machine learning software, or automated systems running such software), the emphasis on the average leaves out other important properties of the random loss distribution (e.g., dispersion, heaviness of tails, symmetry). In the title of my project, I use the term "diverse risk measures" to emphasize that I want to develop new algorithms, theory, and methodologies for machine learning tasks characterized by the optimization of a wider variety of properties of the test loss distribution, including but not limited to the expected loss.
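As a toy illustration of this point (not code from the project itself), consider two hypothetical test-loss distributions with identical means but very different shapes. The short NumPy snippet below, with made-up parameters, shows how summaries such as the variance or the conditional value-at-risk (CVaR) distinguish the two distributions even though the expected loss cannot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical test-loss distributions with the same expected loss (1.0)
# but very different shapes: one tightly concentrated, one right-skewed.
losses_a = rng.normal(loc=1.0, scale=0.1, size=100_000)
losses_b = rng.lognormal(mean=-0.125, sigma=0.5, size=100_000)  # E[loss] = exp(-0.125 + 0.5**2/2) = 1.0

def cvar(losses, alpha=0.95):
    """Empirical conditional value-at-risk: mean of the worst (1 - alpha) fraction of losses."""
    tail = np.sort(losses)[int(alpha * len(losses)):]
    return tail.mean()

for name, losses in [("A (concentrated)", losses_a), ("B (heavy right tail)", losses_b)]:
    print(f"{name}: mean={losses.mean():.3f}, variance={losses.var():.3f}, CVaR@0.95={cvar(losses):.3f}")
```

A learner that only cares about the expected loss cannot tell these two distributions apart, yet they are clearly not interchangeable in a deployed system.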
Some background reading
In the following paper, I discuss some of the key ideas underlying this project using slightly more formal notation, complemented by a brief historical review of statistical learning and the role played by the expected loss.
A Survey of Learning Criteria Going Beyond the Usual Risk
Matthew J. Holland and Kazuki Tanabe
Journal: Journal of Artificial Intelligence Research, 78:781-821, 2023.
Oral: AAAI 2024 (Journal track), Vancouver, Canada.
[journal, doi, arXiv]
Several earlier works by my colleagues and me can be considered precursors to the current PRESTO project.
Learning with risks based on M-location
Matthew J. Holland
Journal: Machine Learning, 111:4679-4718, 2022.
Oral: ECML-PKDD 2022, Grenoble, France.
[journal, doi, arXiv, code]
Spectral risk-based learning using unbounded losses
Matthew J. Holland and El Mehdi Haress
Presented at AISTATS 2022, online.
Proceedings of Machine Learning Research 151:1871-1886, 2022.
[proceedings, arXiv, code]
Making learning more transparent using conformalized performance prediction
Matthew J. Holland
Presented at ICML 2021, Workshop on Distribution-Free Uncertainty Quantification.
[arXiv]
Learning with risk-averse feedback under potentially heavy tails
Matthew J. Holland and El Mehdi Haress
Presented at AISTATS 2021, online.
Proceedings of Machine Learning Research 130:892-900, 2021.
[proceedings, arXiv, code]
Classification using margin pursuit
Matthew J. Holland
Presented at AISTATS 2019, Naha, Japan.
Proceedings of Machine Learning Research 89:712-720, 2019.
[proceedings, code]
With these works as technical and conceptual context, the following section summarizes key points regarding the progress made since starting this project.
New work since starting this project
In the first few months of the 2022 academic year (starting in April), with the aim of sharing the key ideas and initial results of this research with a diverse audience, I gave several talks at universities and conferences here in Japan. In particular, the following two oral presentations were the first in-person presentations I had given since the start of the COVID-19 outbreak.
- JSAI 2022 (Kyoto): "Achieving desirable loss distributions by design"
- NEURO 2022 (Okinawa): "Achieving desirable reward distributions by design" (received presentation award)
In addition to rigorous theoretical and experimental analysis aimed at machine learning experts, I have also been working on an "explainer" article that breaks down the key concepts underlying this research project into a form accessible to a broader audience, inspired in part by the ICLR Blog Track introduced in 2022. The article is currently hosted in the following public GitHub repository.
offgen: A visual "explainer" for off-sample generalization metrics
Matthew J. Holland
Public GitHub Repository.
[code]
The first substantive new results build upon the "M-location" notion considered in our previous work (the ECML-PKDD 2022 paper cited above), expanding it both conceptually and technically by placing the notion of "dispersion" at the forefront when designing off-sample generalization metrics (i.e., risk functions). My first stab in this direction was presented at AISTATS 2023.
Flexible risk design using bi-directional dispersion
Matthew J. Holland
Presented at AISTATS 2023, Valencia, Spain.
Proceedings of Machine Learning Research 206:1586-1623, 2023.
[proceedings, arXiv, code]
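To give a rough sense of the "location plus dispersion" idea, the sketch below trades off a candidate location against a smooth, even penalty on deviations of the losses around that location. This is purely an illustration under my own simplifying assumptions (the function name, the pseudo-Huber penalty, and all parameter values are hypothetical), not the exact criterion studied in the AISTATS 2023 paper.

```python
import numpy as np

def dispersion_risk(losses, theta, eta=1.0, sigma=1.0):
    """Illustrative location-plus-dispersion criterion (not the paper's exact risk).

    theta : candidate "location" for the loss distribution
    eta   : weight placed on the dispersion term
    sigma : scale at which deviations from theta are measured
    """
    # Smooth, even penalty on deviations (pseudo-Huber used as a simple stand-in).
    deviations = (losses - theta) / sigma
    dispersion = np.mean(np.sqrt(1.0 + deviations**2) - 1.0)
    return theta + eta * sigma * dispersion

# Toy usage: evaluate the criterion over a grid of candidate locations.
rng = np.random.default_rng(1)
losses = rng.lognormal(mean=0.0, sigma=0.75, size=10_000)
thetas = np.linspace(0.0, 3.0, 31)
best = min(thetas, key=lambda t: dispersion_risk(losses, t))
print(f"expected loss = {losses.mean():.3f}, minimizing location theta = {best:.3f}")
```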
Building on this investigation, I then focused my efforts on gradient descent-type algorithms that encourage the loss to concentrate around a pre-specified threshold, and on the potential side effects and benefits of doing so. While a theoretical justification for how to set the threshold remains non-trivial, even a very simple, low-cost hyperparameter selection often leads to highly competitive performance in practice. In addition, such procedures enjoy formal guarantees of avoiding what we call "unintended criterion collapse," in which the solution set ends up contained in a set of solutions that the algorithm is explicitly designed to avoid. These initial results appear in our NeurIPS 2024 and ICML 2024 papers below.
Soft ascent-descent as a stable and flexible alternative to flooding
Matthew J. Holland and Kosuke Nakatani
Presented at NeurIPS 2024, Vancouver, Canada.
Advances in Neural Information Processing Systems 37, 2024.
[proceedings, arXiv, code]
Criterion Collapse and Loss Distribution Control
Matthew J. Holland
Presented at ICML 2024, Vienna, Austria.
Proceedings of Machine Learning Research 235:18547-18567, 2024.
[proceedings, arXiv, code]
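For intuition only, the sketch below contrasts a flooding-style wrapper (the prior method named in the NeurIPS 2024 title, which flips the gradient sign around a threshold so the average loss hovers near it) with a hypothetical smoothed variant. This is an assumption-laden illustration of the general idea rather than the exact objectives analyzed in these papers; the model, the pseudo-Huber smoothing, and the threshold value are all arbitrary choices of mine.

```python
import torch

def flooded_objective(avg_loss, threshold):
    """Flooding-style wrapper: descend when the average loss is above the
    threshold, ascend when below, so the loss hovers near the threshold."""
    return (avg_loss - threshold).abs() + threshold

def soft_objective(avg_loss, threshold, sigma=1.0):
    """Hypothetical smooth alternative: the gradient's pull toward the threshold
    is bounded and varies smoothly (pseudo-Huber stand-in for the absolute value)."""
    return sigma * (torch.sqrt(1.0 + ((avg_loss - threshold) / sigma) ** 2) - 1.0) + threshold

# Minimal usage with a throwaway linear model and squared error.
torch.manual_seed(0)
model = torch.nn.Linear(5, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 5), torch.randn(64, 1)

for step in range(100):
    opt.zero_grad()
    avg_loss = torch.nn.functional.mse_loss(model(x), y)
    soft_objective(avg_loss, threshold=0.5).backward()  # arbitrary threshold, for illustration only
    opt.step()
```

In both wrappers, choosing the threshold is exactly the hyperparameter question mentioned above; the point of the sketch is only to show how a threshold enters the training objective.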
As additional progress is made, this article will be updated and expanded.
