
PhD defense: Supporting Citizen Science and Biodiversity Informatics on the Semantic Web

Ph.D. Dissertation Defense

Supporting Citizen Science and
Biodiversity Informatics on the Semantic Web

Joel Sachs

10:00am Friday, 14 December 2012, ITE 325b

It is common for Semantic Web documents to use terms from multiple ontologies, with no expectation that the full semantics of each ontology will be imported by consuming applications. This makes sense, because importing all ontologies referenced by a document causes both practical and logical problems. But it has the drawback of leaving it to the consuming application to determine appropriate semantics for the terms being used. We describe an approach to constructing ontologies by layer, designed to make it easier for both data publishers and application developers to tailor-fit semantics to use cases.

The layers that we develop correspond to patterns in the RDF graph. This contrasts with typical approaches to modular ontology development, where the layers are domain-based. The three primary motivations for this approach are i) preserving computational tractability; ii) enabling easy coupling and decoupling with foundational ontologies; and iii) maintaining cognitive tractability. This third motivation is still under-studied in Semantic Web development; we consider it in relation to making it harder for ontology users to publish data that accidentally implies things they do not mean. This is always important, but becomes especially so in citizen science, where users will naturally bring intuitive semantics to the terms that they encounter.

We describe case studies in which we deployed our approach in citizen science activities, and which provided opportunities to assess its capabilities and limitations. We also describe subsequent work aimed at addressing these limitations, and, by applying newly defined layers over the underlying data, show that we are able to improve the competency of our knowledge base. More generally, we show that appropriately combining triple-pattern-based layers allows us to support a wide variety of use cases with varied (and occasionally conflicting) requirements.

In addition to our approach to semantic layering, contributions include an improved understanding of how to blend social and semantic computing to support citizen science, and a collection of layers for representing biodiversity information in RDF, with a focus on invasive species. Compared with other proposed “semanticizations” of the Darwin Core standard for representing biodiversity occurrence data, these layers involve minimal modification to the Darwin Core vocabulary, and make maximal use of the Darwin Core namespace, thereby simplifying the transition of current practices onto the Semantic Web.

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Cynthia Parr, Yelena Yesha, Laura Zavala

Posted: December 10, 2012, 11:13 PM