Linking Open Data
Linking open data across metric data repositories and disciplines
Project Leaders: Nicolas Delsol, Meghan Dennis, Paulina Przystupa, and Sarah W. Kansa
This project aims to document the process of integrating data from Open Context (opencontext.org, an archaeological open data publishing and curating platform) and FuTRES to highlight the feasibility and research value of linked open data. While many archaeologists, paleontologists, and biologists build their own datasets and use them intensively for research, or contribute data to collaborative databases, it remains challenging to use data collected by others or curated in separate repositories. Data users face several other hurdles as well. Data sharing has often focused on archiving for compliance purposes, which has resulted in messy and poorly described data entering repositories. Repositories are often still siloed or linked only at the disciplinary level (such as Open Context for archaeological data, iDigBio for biodiversity data, etc.) or focus on specific data types (such as trait data in the case of FuTRES). Data users have few opportunities to learn how to discover, evaluate, and integrate data from multiple sources, including their own datasets. Focusing on best practices for data reuse and on the research benefits of data integration will help scholars be more thoughtful about data cleanup and documentation. Published examples of how to discover relevant data and prepare it for integration can provide much-needed guidance for archaeologists, paleontologists, and neontologists who want to engage in linked open data.
This project will explore the processes and realities of linking biometrical data already published in Open Context with similar data in FuTRES and in personal datasets. We will focus on the integration of metric skeletal data from archaeological cow specimens (Bos taurus) and any existing datasets on modern and ancestral cows (Bos primigenius) from the FuTRES dataset or from other interested workshop participants. Open Context has a new Data Literacy Program that has been developing learning modules drawing on archaeological cattle data, including metric data, as an exemplar of how to use open data from Open Context for research purposes. That work is being conducted by Meghan Dennis and Paulina Przystupa (Postdoctoral Researchers with Open Context). At the same time, Nicolas Delsol (a Ph.D. student at the University of Florida) has a Colonial New World cow dataset that also includes metric data. In this project, Delsol will add his cow metric data to FuTRES and then, together with Open Context and FuTRES personnel, explore best practices for integrating the Open Context cow metric data with FuTRES data (Delsol’s and others) and/or linking Open Context and FuTRES data into a separate research dataset using an API. The entire process will be documented step by step and published as R code as well as in a research paper so that others can benefit from following a “recipe” for linking data across disciplines and repositories. Documenting the attempt to integrate or link datasets across systems will help clarify the challenges and potential stumbling blocks in this kind of work, and this attempt alone could be a useful research paper. If the data integration or linking works well, we could then proceed to a simple research study of body size differences between Old World and New World contexts or between pre- and post-colonial contexts.
This project would support professional development and mentorship, as it could be run primarily by Open Context’s Data Literacy Program postdoctoral researchers Meghan Dennis and Paulina Przystupa, and Ph.D. student Nicolas Delsol, with mentorship from FuTRES and Open Context, furthering community building among early career researchers.
Methods
Data. Datasets to be used in this project include (1) open access cattle biometrical data documented on 14,524 specimens from 20 projects published in Open Context; (2) unpublished cattle biometrical data collected by project participant Nicolas Delsol (1,685 linear measurements on 804 bone specimens from four archaeological sites); (3) any existing cattle data available in FuTRES; and (4) any appropriate datasets from other interested participants that could be quickly integrated into FuTRES.
Approach. This project will document the process of integrating data from two different systems and combining data across disciplines. The step-by-step process includes data discovery, download, integrity checking, cleanup, and alignment. We will also document data analysis and visualization using R.
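As a rough illustration of the integrity-checking, cleanup, and alignment steps, the following is a minimal sketch under stated assumptions, not the project's actual workflow. The project plans to document its process in R; Python is used here only for illustration. The two small in-memory tables, their column names, the shared schema, and the unit conventions are all hypothetical and are not taken from Open Context or FuTRES.

```python
# Hypothetical records from two sources with different column names and units.
# None of these names or values come from Open Context or FuTRES.
SOURCE_A = [
    {"specimen": "A-1", "taxon": "Bos taurus", "element": "Metacarpal", "GL_mm": "182.5"},
    {"specimen": "A-2", "taxon": "bos taurus ", "element": "metacarpal", "GL_mm": "190.0"},
    {"specimen": "A-3", "taxon": "Bos taurus", "element": "metacarpal", "GL_mm": ""},
]
SOURCE_B = [
    {"id": "B-7", "scientificName": "Bos taurus", "bone": "metacarpal", "length_cm": "18.9"},
]

# Assumed mapping from each source's columns to one shared schema.
COLUMN_MAP = {
    "source_a": {"specimen": "specimen_id", "taxon": "taxon",
                 "element": "element", "GL_mm": "length_mm"},
    "source_b": {"id": "specimen_id", "scientificName": "taxon",
                 "bone": "element", "length_cm": "length_mm"},
}
# Assumed factor for converting each source's length values to millimetres.
UNIT_FACTOR = {"source_a": 1.0, "source_b": 10.0}


def align(records, source):
    """Rename columns, tidy text fields, and convert lengths to millimetres.

    Rows whose measurement is missing or non-numeric are returned separately
    so they can be reviewed rather than silently dropped.
    """
    aligned, rejected = [], []
    for row in records:
        out = {"source": source}
        for old_name, new_name in COLUMN_MAP[source].items():
            out[new_name] = row.get(old_name, "").strip()
        try:
            out["length_mm"] = round(float(out["length_mm"]) * UNIT_FACTOR[source], 2)
        except ValueError:
            rejected.append(row)  # integrity check failed: no usable measurement
            continue
        out["taxon"] = out["taxon"].capitalize()  # "bos taurus" -> "Bos taurus"
        out["element"] = out["element"].lower()
        aligned.append(out)
    return aligned, rejected


def integrate():
    """Align both sources and combine them into one list of records."""
    combined, problems = [], []
    for name, records in (("source_a", SOURCE_A), ("source_b", SOURCE_B)):
        ok, bad = align(records, name)
        combined.extend(ok)
        problems.extend(bad)
    return combined, problems


if __name__ == "__main__":
    combined, problems = integrate()
    for record in combined:
        print(record)
    print("rows needing review:", len(problems))
```

Keeping rejected rows in a separate list, rather than discarding them, is one way to make the cleanup step itself part of the documented record.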
Outcomes and Dissemination. The aim of this project is to demonstrate the feasibility, process, and value of combining datasets from different Web resources. We will document this process in R for others to reuse with their own datasets and we will publish a paper about linked open data, highlighting this case study of linking Open Context and FuTRES data.
FuTRES engagement
A first step of this project will be to upload the Delsol dataset to FuTRES (using the FuTRES Rshiny app), thereby integrating the Delsol data with any cow data already in FuTRES or contributed by interested workshop participants. This dataset will then be integrated with related data published in Open Context. The primary aim of this work is to demonstrate successful integration of related data from FuTRES and Open Context to encourage scholars to develop research programs that draw on linked open data.
Roles/Competency | Identified team members | Needed team members |
---|---|---|
Identify datasets to integrate | Delsol, Dennis, Przystupa, Emery, S. Kansa | Other cow-related datasets are welcome! If added, data donors will be included in all appropriate steps below |
Mammal anatomy expertise | Delsol, Emery, S. Kansa | Will need help from FuTRES ontology team |
Data cleaning and metadata alignment appropriate to OC and FuTRES | Delsol, Dennis, Przystupa, Kansa, FuTRES team | Need help with FuTRES |
Upload data to FuTRES | OC team, FuTRES team | Need help with FuTRES |
Document data integration process | OC team, FuTRES team | Need help with FuTRES |
R Coding | OC team | Would like additional help or review |
API development (if feasible) | E. Kansa (OC team), FuTRES team | Need help with FuTRES side of API development |
Metric data comparative analysis as appropriate | Delsol, Emery, S. Kansa | |
Writing and editing (1 paper on integration process, possibly 1 paper on metric data analysis) | Delsol, Dennis, Przystupa (possibly Emery and S. Kansa) | |