Diving into Mega Metadata for Coral Reef Research

Collaborative Piece with Ouida Meier

DDStudio and CRESCYNT make big strides

Representatives from the three largest U.S. and NSF-sponsored coral reef data repositories came together in one room with metadata experts, data managers, domain scientists, and software engineers. Naturally, they dove deep into advancing data interoperability.

In March 2018, the EarthCube Coral Reef Science and Cyberinfrastructure Network (CRESCYNT) Project conducted two major workshops structured around Data Science for Coral Reefs. Both proved very productive, meeting their practical training and data exploration goals and producing some interesting outcomes. The first workshop focused on Data Rescue and data management; participants concluded that metadata and its uses were among the most important topics they encountered. The second centered on Data Integration and Team Science; by its end, many participants had realized that writing good metadata is essential for making datasets collected at disparate scales work together. These two hands-on March workshops led to a final workshop in August, held jointly by two EarthCube projects, CRESCYNT and Data Discovery Studio*. It focused on how to enhance metadata for finding and using coral reef research data, treating coral reef science as a multidisciplinary, multiscale use case representative of the broader geosciences.

Workshops that listen

These workshops are moving the EarthCube community toward better interoperability and data usage, serving scientists who want and need the tools, and creating avenues of communication between data users and creators. For example, at the close of the CRESCYNT workshops, participants asked that data and metadata experts get together and recommend metadata practices and standards that would work for the coral reef community. They wanted recommendations specific to their broad spectrum of data types and repositories, and they wanted those recommendations to address metadata needs for pre-repository research, storage, sharing, and analysis. The data challenges identified in the workshop sessions supplied specific, relevant issues for the recommendations to address. CRESCYNT and DDStudio both listened to these needs.

CRESCYNT’s takeaways

Across the sessions, CRESCYNT and DDStudio generated important, mutualistic outcomes. CRESCYNT produced tools that the team hopes will change the way coral reef data is handled. These include a cross-mapping of an essential set of metadata to web standards and a draft ISO metadata profile for coral reef data. The profile will support data discovery and data sharing through a simpler form with free-text entry in many of the fields. It will also address understanding and usability of data at the workbench level through a more detailed form with options to supply more highly specified fields. The CRESCYNT team expects to offer these tools to the coral reef community for feedback and potential adoption over the next few months.

What DDStudio learned

The Data Discovery Studio team explored how to automate metadata enhancement, especially for metadata that comes from different repositories and science use cases. At the end of the joint workshop, the team and special guests took a deep dive into the future trajectory of DDStudio, examining how best to focus future work and share its API. Priorities include enhancing metadata in additional ways, enabling users to index and edit resource descriptions, exploring discovered resources in Jupyter notebooks, and publishing resource metadata using schema.org markup.

Moving Forward

The sessions also addressed plans for the upcoming hackathon and a Data Discovery Science Competition, as well as DDStudio’s collaboration with the Science Gateways Community Institute (SGCI). These sessions were bound to be influential: they included guests from the coral reef data ‘heavy-hitters’ and experts from across EarthCube and the field of data science.

Special guests included Ted Habermann, from the Metadata 2020 project and co-author of “The influence of community recommendations on metadata completeness”, and Stephen Richard, who has experience with schema.org and metadata standards authoring. They were joined by several coral reef data experts: Gastil Gastil-Buhl, from Moorea Coral Reef LTER; Hannah Ake, from BCO-DMO; and Sarah O’Connor and Zachary Mason, both from NOAA NCEI. Together they represented the three largest formal repositories for coral reef research data in the US or sponsored by NSF. NOAA is especially notable for its user-friendly metadata writing interface and for CoRIS. Also in attendance were Eric Lingerfelt, the EarthCube Technical Officer; guests from Scripps; DDStudio team members Ilya Zaslavsky, Karen Stocks, Gary Hudman, David Valentine, and Tom Whitenack; and CRESCYNT’s Ouida Meier. Together the group provided broad, integrative metadata, software, and domain expertise. Ilya and Karen kindly hosted the group at UCSD’s San Diego Supercomputer Center and Scripps Institution of Oceanography.

With a team like that, and with active lines of communication dedicated to improving geoscience data, it is no surprise that these projects are moving along well. They are already changing the face of the Data Discovery Studio. To keep up with the progress of CRESCYNT, DDStudio, or other EarthCube movers and shakers, subscribe to the EarthCube blog, and contact projects that can help solve your own work challenges.

*Note: Data Discovery Studio was formerly known as Data Discovery Hub and is built on the CINERGI search engine APIs, which are already familiar to some.
