How Do We Facilitate the Use of Large Amounts of Heterogeneous Data, or ... Is Gleaning Knowledge from Information in Reach?
Wednesday, March 12, 2014
Building 3 Auditorium - 11:00 AM
(Coffee and cookies at 10:30 AM)
Large Heterogeneous Datasets (aka Big Data) have come upon us, but what exactly does this mean? Actually, it means different things to different people. Articles, symposiums, conferences, working groups, and even colloquiums are rapidly surfacing to capture, share, and define the various aspects of Large Heterogeneous Datasets, from theoretical concepts to operational implementation: Reference Architecture, Infrastructure, Management, Search and Mining, Security & Privacy, Applications, and Implementation, to mention a few. In addition, use cases are being compiled.
On the continuum of ever-evolving data management systems, we need to understand and develop ways that allow data relationships to be examined and information to be manipulated, so that knowledge can be enhanced to facilitate science. In short, we have a lot of data that we have not yet given users the opportunity to 'mine' holistically.
This presentation, after laying down relevant definitions, examines various aspects and activities related to the 'Big Data Problem'. The ultimate goal is to enhance the cross use and intellectual combining of heterogeneous datasets in order to find unobvious data relationships. To address that goal, it becomes important to implement relevant data analytics that are concerned not so much with individual analyses or analysis steps as with the entire discovery methodology. The importance of employing analytics methodologies at NASA, as well as potential strategic directions addressing the field of advanced data usage, will be examined. This includes an understanding of the combination of skills needed, including analytics, machine learning, data mining, and statistical analysis, as well as experience with algorithms and coding, specifically applied to particular science domains. Thus, the need for and development of the Data Scientist will also be examined.
Steve Kempler has served as Manager of the Goddard Earth Sciences (GES) Data and Information Services Center (DISC) since 1998. During his tenure, he has overseen development, implementation, and operations of NASA's ambitious Earth Observing System Data and Information System (EOSDIS) at the GES DISC, and has facilitated the evolution of Earth science data systems, emphasizing the development and operation of innovative solutions to Earth science data management challenges. While being responsive to the data management needs of future and existing Earth science projects and short-term information technology projects, he has recognized the need to examine and develop concepts that will further enable science research and applications through the use of advanced data system technologies. In particular, he has promoted the homogenization of related, independently generated datasets to facilitate information extraction, and he now seeks to further this through the examination, promotion, and implementation of analytics methodologies that help researchers fully utilize heterogeneous datasets. Among his many projects prior to serving at the GES DISC, he is particularly proud to have worked on the science-shattering missions Voyager and COBE. He received his B.A. (1976) in Geography from SUNY Cortland and his M.A. (1979) in Physical Geography (Atmospheric Science) from Ohio State University. He is a member of the AGU (Informatics), IEEE, and ESIP Federation. He has authored and co-authored several articles pertaining to data usability for Earth science research and applications.
IS&T Colloquium Committee Host: Jacqueline Le Moigne
Sign language interpreter upon request: 301-286-7040