Note: We are currently advertising a position for this project:
Throughput Data Recovery and Annotation
The EarthCube program has provided a platform for transformative geoscience. The Geoscience Paper of the Future, GeoDeepDive, Flyover Country, Project 418, CHORDS and IEDA’s Alliance Testbed Project have already changed the ways many in the community undertake research. EarthCube activities have leveraged technical knowledge to improve discoverability, data access and data management, and have helped to reduce the time required to move from idea to publication in the geosciences. Challenges related to workflow management fall upon early-career researchers, researchers outside of R1 institutions, and researchers from multiple disciplines who manage data outside of core data repositories with long-term funding. These challenges can include (1) a lack of credit for data (or script) generation and re-use, (2) a lack of technical knowledge around interdisciplinary workflows or new technical tools, (3) an inability to access contextual information for records with missing or incomplete metadata, and (4) a lack of access to secondary data or analytic results associated with publications beyond their core discipline or personal network.
This project will use existing tephra vocabularies from databases such as Neotoma to recover information using GeoDeepDive (GDD). Programmatic scripts will link publications, data resources and unique tephra events. As GeoDeepDive evolves into a core technology within the EarthCube community, a critical task is to assign provenance to data extracted using the technology. This component of the project will seek to recover relationships between tephra data across CCDRs using GeoDeepDive and existing vocabularies. It will also establish best practices for linking GDD scripts to extracted data and publications in a manner that reflects potential uncertainties in data extraction.
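As a rough illustration of the linking step described above, the following Python sketch matches vocabulary terms against sentences and records, for every match, which script and version produced it. It is not the project's implementation and does not call GeoDeepDive or Neotoma. The vocabulary entries, document identifiers, sentences, script name and the `match_type` field are all hypothetical placeholders. Here `match_type` stands in as a crude indicator of extraction uncertainty.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TephraLink:
    """One vocabulary match, with the provenance needed to trace it."""
    doc_id: str           # hypothetical publication identifier
    sentence_index: int   # position of the sentence within the document
    term: str             # vocabulary term that matched
    match_type: str       # "exact" or "case_insensitive" (crude uncertainty flag)
    script: str           # name of the script that produced the link
    script_version: str   # version of that script


def find_tephra_links(sentences, vocabulary,
                      script="tephra_match.py", script_version="0.1.0"):
    """Return a TephraLink for every vocabulary term found in a sentence.

    `sentences` is an iterable of (doc_id, sentence_index, text) tuples.
    The script name and version are illustrative defaults.
    """
    links = []
    for doc_id, idx, text in sentences:
        for term in vocabulary:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text):
                match_type = "exact"
            elif re.search(pattern, text, flags=re.IGNORECASE):
                # Differs only in capitalisation, so flag it as less certain.
                match_type = "case_insensitive"
            else:
                continue
            links.append(TephraLink(doc_id, idx, term, match_type,
                                    script, script_version))
    return links


# Made-up inputs for illustration only; not drawn from any database.
vocabulary = ["Mazama", "White River Ash"]
sentences = [
    ("doc-001", 0, "The Mazama ash occurs at 312 cm depth."),
    ("doc-002", 3, "A layer of white river ash was identified."),
    ("doc-003", 1, "No volcanic material was observed."),
]

links = find_tephra_links(sentences, vocabulary)
for link in links:
    print(link)
```

Storing the script name and version alongside each match lets an extracted record be traced back to the code that generated it, and the match flag gives downstream users a simple way to filter out less certain links.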