As dataset diversity and data volumes continue to increase, providing users with the interfaces, tools and services they need to discover relevant datasets creates new challenges and opportunities for improving search relevancy and search result ranking. The diversity of user communities is a challenge as well, because relevancy depends on specific user types and needs. In this session we will report on projects and activities to improve search relevancy from the perspective of finding and utilizing Earth science in-situ, satellite and model data. We will explore search relevance at both the dataset and granule levels, dataset relationships and dependencies, semantic relationships, data quality, user characterization and content-based ranking. Specifically, we will report on search relevance activities and results at NASA, NCAR and other organizations, with the goal of building synergy among community experiences and developing strategies to improve search relevance and user experience across the entire spectrum of Earth science data and data users.

PRESENTATIONS:

1) NASA Progress in Search Relevancy
Edward M. Armstrong, Lewis McGibbney, Kim Whitehall
NASA Jet Propulsion Laboratory

Recently the NASA ESDSWG on Search Relevance concluded its first year of activities to address search relevance across the 12 NASA Earth science data centers. It was originally proposed to characterize the term search relevancy as it relates to ESDIS, to assess the implementations that address search relevancy, and to determine how existing implementations can be improved and new approaches enacted. Individually and collectively, the group sought the expertise of persons within ESDIS, industry and academia.
Five core subgroups (from an initial collection of ten) were organized on the topics of Spatial Relevance, Temporal Relevance, Dataset Heuristics, Dataset Relationships, and Federated Search:

Spatial relevance: This subgroup aimed to provide direction on substantiated metrics for methods of defining spatial overlap in searches, with the purpose of improving relevance ranking based on dataset spatial characteristics.

Temporal relevance: This subgroup aimed to provide direction on substantiated metrics for methods of defining temporal overlap in searches, with the purpose of improving relevance ranking based on dataset temporal characteristics.

Dataset relevance heuristics: This subgroup aimed to identify the top heuristics in the Common Metadata Repository (CMR) and other search engines and their applicability to EOSDIS and the DAACs, with the purpose of taking a first pass at the dataset (collection) search problem.

Dataset relationships: This subgroup aimed to provide a common framework for identifying relatedness across datasets, with the purpose of lowering the barrier to obtaining similar datasets for a given user query.

Implementation in Federated Search: This subgroup aimed to provide substantiated metrics and guidance on improving Information Retrieval (IR) practices within a Federated Search (FS) context, defined as an IR technology that allows the simultaneous search of multiple searchable resources.

In this presentation we will summarize the findings and recommendations from the group's first year of activities and discuss our plans and progress for year 2 activities, including addressing semantic dataset relationships, granule-level relevance, mining user behavior, and optimizing content for commercial search engines.
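To make the idea of spatial and temporal overlap concrete, the following is a minimal sketch of one way an overlap score could feed a ranking. It is not the working group's metric: the choice of the Jaccard index (intersection over union), the equal weighting of the two scores, and all names (`BBox`, `spatial_overlap`, `temporal_overlap`, `rank_datasets`) are assumptions made for illustration. Boxes are treated as flat rectangles in degree space, and boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical lat/lon bounding box in degrees (no antimeridian crossing)."""
    west: float
    south: float
    east: float
    north: float

    def area(self) -> float:
        # Area in square degrees; an inverted (empty) box has zero area.
        return max(0.0, self.east - self.west) * max(0.0, self.north - self.south)


def spatial_overlap(query: BBox, dataset: BBox) -> float:
    """Jaccard index of two boxes: intersection area / union area."""
    inter = BBox(
        max(query.west, dataset.west),
        max(query.south, dataset.south),
        min(query.east, dataset.east),
        min(query.north, dataset.north),
    ).area()
    union = query.area() + dataset.area() - inter
    return inter / union if union > 0 else 0.0


def temporal_overlap(q_start: float, q_end: float,
                     d_start: float, d_end: float) -> float:
    """Jaccard index of two time intervals (here expressed in years)."""
    inter = max(0.0, min(q_end, d_end) - max(q_start, d_start))
    union = (q_end - q_start) + (d_end - d_start) - inter
    return inter / union if union > 0 else 0.0


def rank_datasets(query_box, query_time, datasets):
    """Return dataset names sorted by the mean of both overlap scores.

    `datasets` maps a name to (BBox, (start, end)). Equal weighting of the
    spatial and temporal scores is an assumption of this sketch.
    """
    def score(item):
        _, (box, (start, end)) = item
        s = spatial_overlap(query_box, box)
        t = temporal_overlap(query_time[0], query_time[1], start, end)
        return (s + t) / 2.0

    ranked = sorted(datasets.items(), key=score, reverse=True)
    return [name for name, _ in ranked]


if __name__ == "__main__":
    query_box = BBox(-10, -10, 10, 10)
    catalog = {
        "global": (BBox(-180, -90, 180, 90), (1980, 2020)),
        "regional": (BBox(-10, -10, 10, 10), (2000, 2010)),
    }
    print(rank_datasets(query_box, (2000, 2010), catalog))
```

With an intersection-over-union score, a global dataset that merely contains the query region ranks below a regional dataset that closely matches it. Whether that is the desired behavior depends on the user community, which is the kind of question the subgroups were set up to examine.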
2) Challenges and Potential Approach in Search Relevance from a Dataset Maturity Perspective
Ge Peng
NOAA CICS-NC/NCEI

3) Connecting diverse data users with diverse data sets at the NCAR Research Data Archive
Grace Peng
NCAR

For more than 40 years, the RDA (rda.ucar.edu) has been collecting and disseminating weather and climate data to the research community. We host a growing collection of over 600 datasets from myriad sources, from ocean observations in 1662 to present-day satellite measurements and globally gridded analyses/reanalyses. From inception, RDA data users have ranged from neophyte graduate students to professors with decades of experience. Increasingly, researchers from outside the weather and climate community (energy, insurance, government sectors) are using our data. This is a sign of our success and maturity. However, diverse user backgrounds mean that we can no longer assume a common lingua franca when describing our data.

To help researchers sift through our datasets to find what they need, we collect granule-level metadata that powers the RDA search functions. Users may search with free text, or perform faceted searches to successively narrow down possible selections. Once they have identified datasets of interest, they are directed to dataset homepages, which enable them to examine the parameters available for each file and vertical level. The granule-level metadata also enables us to offer custom subsetting services for most datasets.

Because we are a .edu and teaching is part of our mission, we do not aim to fully automate our data discovery and other services. Each dataset is assigned to a data specialist who serves multiple roles as a data curator, engineer, software developer, subject matter expert and educator. When data users are unsure about some aspect of the data, we want to engage with them to help clear up their confusion.
This raises both the level of sophistication of the data users and our understanding of how to better describe and refactor data to improve future usability. In this presentation, I will demonstrate some of our data discovery and education capabilities. I will also give an overview of our manual and automated metadata collection processes, which enable our search functions.

4) Earthdata Search: The Relevance of Relevance
Patrick Quinn
NASA

Earthdata Search is a web application that allows users to search, discover, visualize, refine, and access NASA and International/Interagency data about the Earth. As a client to the CMR, Earthdata Search saw its catalog of collections grow 700% in the past year. This massive expansion brought relevancy to the forefront of the client's usability needs. In this talk, we discuss places where the influx of collections helped illuminate existing usability issues and how we have tackled, or plan to tackle, those challenges by improving relevance and metadata.
Speakers
Enterprise Search Technologist III, Jet Propulsion Laboratory