Knowledge-Guided Machine Learning: A New Framework for Accelerating Scientific Discovery

Process-based models of dynamical systems are often used to study engineering and environmental systems. Despite their extensive use, these models have several well-known limitations due to incomplete or inaccurate representations of the physical processes being modeled. There is a tremendous opportunity to systematically advance modeling in these domains by using state-of-the-art machine learning (ML) methods that have already revolutionized computer vision and language translation. However, capturing this opportunity is contingent on a paradigm shift in data-intensive scientific discovery, since the “black box” use of ML often leads to serious false discoveries in scientific applications. Because the hypothesis space of scientific applications is often complex and exponentially large, an uninformed data-driven search can easily select a highly complex model that is neither generalizable nor physically interpretable, resulting in the discovery of spurious relationships, predictors, and patterns. This problem becomes worse when labeled samples are scarce, which is quite common in science and engineering domains.

This talk makes the case that in real-world systems governed by physical processes, there is an opportunity to take advantage of fundamental physical principles to inform the search for a physically meaningful and accurate ML model. While this talk will illustrate the potential of the knowledge-guided machine learning (KGML) paradigm in the context of environmental problems (e.g., freshwater science, hydrology, agroecology), the paradigm has the potential to greatly advance the pace of discovery in a diverse set of disciplines where mechanistic models are used, e.g., power engineering, climate science, weather forecasting, and pandemic management.

Research funded by NSF (Expeditions in Computing, BIGDATA, STC, GCR, and HDR programs), DARPA, ARPA-E, and USGS.

Date/Time
Wednesday, December 14, 2022, 11am-12pm EST

This seminar can be viewed remotely via Microsoft Teams.

IS&T Colloquium Committee Host: Matt Dosberg

Recording available via NASA MS Stream.

Vipin Kumar
University of Minnesota
URL: http://www.cs.umn.edu/~kumar

Vipin Kumar is a Regents Professor and holds the William Norris Chair in the Department of Computer Science and Engineering at the University of Minnesota. His research spans data mining, high-performance computing, and their applications in climate/ecosystems and health care. He also served as the Director of the Army High Performance Computing Research Center (AHPCRC) from 1998 to 2005. He has authored over 400 research articles and co-edited or coauthored 10 books, including the widely used textbooks "Introduction to Parallel Computing" and "Introduction to Data Mining". Kumar's current major research focus is on knowledge-guided machine learning and its applications to understanding the impact of human-induced changes on the Earth and its environment. Kumar's research on this topic is funded by NSF's BIGDATA, INFEWS, STC, and HDR programs, as well as ARPA-E, DARPA, and USGS. He recently finished serving as the Lead PI of a 5-year, $10 million project, "Understanding Climate Change - A Data Driven Approach", funded by the NSF's Expeditions in Computing program. Kumar is a Fellow of the ACM, IEEE, AAAS, and SIAM. Kumar's foundational research in data mining and high-performance computing has been honored by the ACM SIGKDD 2012 Innovation Award, which is the highest award for technical excellence in the field of Knowledge Discovery and Data Mining (KDD); the 2016 IEEE Computer Society Sidney Fernbach Award, one of the IEEE Computer Society's highest awards in high-performance computing; and a Test-of-Time award from the 2021 Supercomputing conference (SC21).
