John Schnase and Dan Duffy
Meeting the Big Data Challenges of Climate Science through Cloud-Enabled Climate Analytics-as-a-Service
Wednesday, March 26, 2014
Building 3 Auditorium - 11:00 AM
(Coffee and cookies at 10:30 AM)
Climate science is a big data domain that is experiencing unprecedented growth. In our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS), a specialization of business process-as-a-service, itself an evolving extension of the cloud-enabled IaaS, PaaS, and SaaS models. In this presentation, we will describe two projects that demonstrate this shift.
MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS. The MERRA reanalysis integrates observational data with numerical models to produce a temporally and spatially consistent global synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and to a wide range of decision-support applications. MERRA/AS enables MapReduce analytics over the MERRA data collection by bringing together the following elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data-proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRA/AS has been demonstrated in several applications.
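To make the MapReduce idea concrete, the sketch below shows the general map/reduce pattern for one canonical climate operation, a per-month mean of a variable. This is a minimal illustrative sketch only: the record layout, function names, and data are hypothetical, and MERRA/AS's actual service API and canonical operations are not shown here.

```python
# Hypothetical sketch of a MapReduce-style aggregation like those MERRA/AS
# runs data-proximally; the real service API differs from this toy example.
from functools import reduce

# Hypothetical (month, lat, lon, value) records for one climate variable.
records = [
    ("2010-01", 10.0, 20.0, 281.5),
    ("2010-01", 10.0, 21.0, 282.1),
    ("2010-02", 10.0, 20.0, 279.8),
    ("2010-02", 10.0, 21.0, 280.4),
]

def map_phase(record):
    """Emit a (month, (value, count)) pair for each record."""
    month, _lat, _lon, value = record
    return (month, (value, 1))

def reduce_phase(acc, pair):
    """Combine partial sums and counts, keyed by month."""
    key, (s, c) = pair
    total, count = acc.get(key, (0.0, 0))
    acc[key] = (total + s, count + c)
    return acc

partials = reduce(reduce_phase, map(map_phase, records), {})
monthly_means = {k: s / c for k, (s, c) in partials.items()}
# e.g., monthly_means["2010-01"] is approximately 281.8
```

In a real deployment the map and reduce phases would run in parallel on storage nodes holding the data, which is the point of the "data-proximal" element above: the computation moves to the data rather than the reverse.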
NASA's High-Performance Science Cloud (HPSC) is an example of the type of compute-storage fabric required to support CAaaS. The HPSC combines several technologies in use within the NCCS: (1) a virtualized high-speed InfiniBand network, (2) a combined high-performance file system and object storage, and (3) virtual system environments tailored to data-intensive science applications. At the center of the HPSC resource is a large object storage environment that combines computation with data storage. Users can access this environment much like a traditional file system, while also performing data-proximal processing using technologies like the Hadoop Distributed File System (HDFS). Surrounding the storage is a cloud of high-performance compute resources with many processing cores and large memory, coupled to the storage through an InfiniBand network. Through the use of technologies such as Single Root Input/Output Virtualization (SR-IOV), virtual systems can be provisioned on the compute resources with extremely high-speed network connectivity to the storage and to other virtual systems.
These technologies are providing a new tier in the data and analytic services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. In our experience, CAaaS lowers the barriers and risk to organizational change, fosters innovation and experimentation, and provides the agility required to meet our customers' increasing and changing needs.
Dr. John Schnase is a Senior Computer Scientist in NASA Goddard Space Flight Center's Office of Computational and Information Sciences and Technology (Code 606). He leads the Office's climate informatics R&D activities, which focus on the development of new information technologies and their transfer into practical use. He is also a Principal Investigator on the Applied Sciences Program's RECOVER Project, which is building cloud-based decision support capabilities for post-wildfire ecosystem recovery. John's training and experience are in ecology and computer science. Before joining NASA in 1999, John worked on the life history of Cassin's Sparrow (Aimophila cassinii), which resulted in an early application of computers in avian energetics modeling. As a co-organizer of the ACM Hypertext '91 Conference, he assisted Tim Berners-Lee with the first public demonstration in the United States of communication between a Hypertext Transfer Protocol (HTTP) client and server via the Internet. John attended Angelo State University, the University of Texas at Austin, Baylor College of Medicine, and Texas A&M University, where he earned his PhD in Computer Science in 1992. He now chairs the External Advisory Board of NSF's DataNet Federation Consortium, is a member of the Executive Committee of the Computing Accreditation Commission of ABET, is a Fellow of the American Association for the Advancement of Science (AAAS), and is a former member of the Biodiversity and Ecosystems Panel of the President's Committee of Advisors on Science and Technology (PCAST).
Dr. Dan Duffy is head of the NASA Center for Climate Simulation (NCCS, Code 606.2), which provides high-performance computing, storage, networking, and data systems designed to meet the specialized needs of the Earth science modeling communities. He has worked on a number of applied research and development projects to explore technologies for the next generation of high-performance computing solutions for NASA scientists, including serving as co-Investigator on the MERRA Analytics Service Project, which has led to the formulation of Climate Analytics-as-a-Service (CAaaS). Before taking on the high-performance computing lead role, Dan served for 10 years as the NCCS's lead system engineer, during which he architected dramatic increases in computational and storage capabilities for NASA scientists. Before joining NASA in 2003, Dan worked on highly parallel applications for the Department of Defense (DoD). Dr. Duffy received undergraduate degrees in Physics and Computer Science from Western Kentucky University in 1990, an M.S. in Physics and an M.S. in Science Education from Florida State University in 1993, and his Ph.D. in theoretical physics from Florida State University in 1997. Dr. Duffy's research in physics centered on the properties of highly correlated electron systems, with a specific focus toward a better understanding of high-temperature superconductivity. Dr. Duffy has also taught high school physics and undergraduate physics (during his postdoctoral work at the University of California, Santa Barbara), and has given tutorials on parallel methods for high-performance applications.
IS&T Colloquium Committee Host: Jim Fischer
Sign language interpreter upon request: 301-286-7040