ProvenanceAndLinkedDataMetadataGroup
Provenance
Potential Scenarios
- Big eScience: Tracking sources of data, code, libraries associated with published research.
- reproducing results
- citation tracking
- retraction cascades
- Simple data publishing on the web
- essentially requires only author, means of production, and licensing information.
- Consuming and reusing data sets
- selection of datasets based on provenance/trust information
- More relevant scenarios from the Social Web Incubator Group:
- Management of Ontologies and other Structured Knowledge Organization Systems (especially when used as metadata vocabularies?): Possible use cases from the real world:
- http://ontolog.cim3.net/cgi-bin/wiki.pl?OOR Open Ontology Repository (cross-domain)
- http://bioportal.bioontology.org/ NCBO BioPortal (domain-specific)
- http://www.obofoundry.org/ The Open Biomedical Ontology (OBO) Foundry (domain-specific)
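The retraction cascade scenario from the eScience item above can be sketched as a traversal of derivation links: once one artifact is retracted, everything derived from it, directly or transitively, becomes suspect. The following is only a minimal sketch; the `DERIVED_FROM` records and all artifact names are hypothetical, and a real system would read such links from one of the provenance vocabularies listed below.

```python
from collections import deque

# Hypothetical provenance records: artifact -> artifacts it was derived from.
DERIVED_FROM = {
    "paper:B": ["dataset:A"],
    "dataset:C": ["dataset:A", "code:X"],
    "paper:D": ["dataset:C"],
    "paper:E": ["code:Y"],
}

def retraction_cascade(retracted, derived_from):
    """Return every artifact that depends, directly or transitively, on `retracted`."""
    # Invert the edges: source -> artifacts derived from it.
    dependents = {}
    for artifact, sources in derived_from.items():
        for source in sources:
            dependents.setdefault(source, []).append(artifact)

    # Breadth-first walk over the inverted edges.
    affected, queue = set(), deque([retracted])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

print(sorted(retraction_cascade("dataset:A", DERIVED_FROM)))
# prints ['dataset:C', 'paper:B', 'paper:D']
```

The same traversal, run without inverting the edges, would answer the reproduction question instead: which sources, code, and libraries does a given publication rest on.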
Existing Vocabularies for Provenance-Related Metadata
- Open Archives Initiative Object Reuse and Exchange (OAI-ORE)
- Information Object - Information Realization
- CIDOC CRM (Heritage-Orientation)
- IRW: (Web Resources) (reuses Information Object - Information Realization*)
- Situation/Plan Execution (Agent, Role, test...)*
- Dublin Core
- Creative Commons
- PREMIS (Library of Congress Preservation Metadata)
- Intelligence Community Standard for Source Reference Citation Metadata 2008 (reuses Dublin Core, DoD Discovery Metadata Specification)
- Ontology Metadata Vocabulary
- The Provenance Vocabulary
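As an illustration of how little the simple data publishing scenario needs, the sketch below describes one dataset with three Dublin Core terms (`creator`, `source`, `license`) and prints the statements as N-Triples. The dataset, author, and source URIs are hypothetical, and mapping "production means" onto `dcterms:source` is an assumption made for the example, not a recommendation of this group.

```python
DCTERMS = "http://purl.org/dc/terms/"

def describe(dataset_uri, creator_uri, source_uri, license_uri):
    """Build (subject, predicate, object) triples for a minimal provenance record."""
    return [
        (dataset_uri, DCTERMS + "creator", creator_uri),  # who produced it
        (dataset_uri, DCTERMS + "source", source_uri),    # what it was derived from
        (dataset_uri, DCTERMS + "license", license_uri),  # terms of reuse
    ]

def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

triples = describe(
    "http://example.org/dataset/1",             # hypothetical dataset
    "http://example.org/people/alice",          # hypothetical author
    "http://example.org/raw/survey-2009",       # hypothetical source data
    "http://creativecommons.org/licenses/by/3.0/",
)
print(to_ntriples(triples))
```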
State of the Art
- Solutions Around
- Open Provenance Vocabulary
- Provenance Vocabulary
- POWDER
- Dublin Core
- Named Graphs and Semantic Web Publishing Vocabulary
- Web of Trust Vocab
- voiD
- RDF/XML source description
Research, Modeling Suggestions and Issues
Discriminant Resources Specifications
Problem statement
- In the LOD cloud, interlinking newly arriving datasets while maintaining the existing links is a problem.
- How can datasets expose and advertise how to identify the instances they contain, in order to ease interlinking?
- How can discriminant resources be declared and specified, so that ontology and instance matching tools can better find equivalent resources across datasets?
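To make the questions above concrete, here is a minimal sketch of what a matching tool could do if each dataset advertised its discriminant properties. The `DISCRIMINANT` declaration, the dataset contents, and every URI and property name are hypothetical; no existing vocabulary is assumed to provide this declaration.

```python
# Hypothetical declaration: the properties each dataset advertises as discriminant.
DISCRIMINANT = {
    "A": ("name", "country"),
    "B": ("label", "nation"),
}

# Hypothetical instance data, keyed by instance URI.
DATASET_A = {
    "a:paris": {"name": "Paris", "country": "FR"},
    "a:paris_tx": {"name": "Paris", "country": "US"},
}
DATASET_B = {
    "b:75056": {"label": "paris ", "nation": "fr"},
    "b:london": {"label": "London", "nation": "GB"},
}

def key(record, props):
    """Normalized tuple of the record's values for the declared properties."""
    return tuple(str(record[p]).strip().lower() for p in props)

def candidate_links(a, a_props, b, b_props):
    """Pair instances of two datasets whose discriminant keys coincide."""
    index = {}
    for uri, record in b.items():
        index.setdefault(key(record, b_props), []).append(uri)
    return [
        (uri_a, uri_b)
        for uri_a, record in a.items()
        for uri_b in index.get(key(record, a_props), [])
    ]

links = candidate_links(DATASET_A, DISCRIMINANT["A"], DATASET_B, DISCRIMINANT["B"])
print(links)
# prints [('a:paris', 'b:75056')]
```

Because the keys come from each dataset's own declaration, a newly arriving dataset only has to publish its declaration to become linkable; the pairs produced are candidates, not asserted equivalences.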
State of the Art
- voiD (Vocabulary of Interlinked Datasets) is a vocabulary for describing a dataset, including general statistics about it.
- Silk provides a declarative way (the Silk specification) to interlink two identified datasets.
- The scope feature of POWDER can be used to identify the resources to which a description applies.
- Ontology Metadata Vocabulary OMV
- The more a specification moves to the meta-level, the less contextualized it is. Context matters because some properties may be relevant for discriminating in one dataset but not in general.
- How to attach such discriminant metadata to an ontology?
- How to attach such discriminant metadata to a dataset?
Proposal
[Figure: schema of the proposed resource discrimination vocabulary]
Note (Jérôme Euzenat): it is really disturbing how "discriminant" is used as a dual of "identifying" here.
- Any set of properties is discriminant: it discriminates along these properties.
- Any identifier is discriminant: it discriminates between the entities that have the identifier and those that do not.
This does not mean that these are the same thing. Typically, in some datasets, two entities may have the same values for all properties and yet be different. Their identifiers (URIs in the semantic web) help keep track of the difference (as long as no one adds a sameAs between them). Hence, discriminating properties are good for discriminating but not always sufficient for identifying, which is a stronger notion (ask the people who serve time in jail because they were identified as someone else on the basis of insufficiently discriminating evidence).
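The distinction can be sketched in a few lines: two URIs with identical descriptions remain two entities until someone asserts a sameAs link between them. The `Identity` class below is a hypothetical illustration (a small union-find over URIs), and the person records are invented.

```python
class Identity:
    """Tracks which URIs denote the same entity, using explicit sameAs links only."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        """Return the representative URI of the entity that `uri` denotes."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def same_as(self, a, b):
        """Record a sameAs link, merging the two entities."""
        self.parent[self.find(a)] = self.find(b)

    def count(self, uris):
        """Number of distinct entities among `uris`."""
        return len({self.find(u) for u in uris})

# Invented records: same values for every property, different URIs.
descriptions = {
    "ex:p1": {"name": "J. Smith", "born": 1950},
    "ex:p2": {"name": "J. Smith", "born": 1950},
}

identity = Identity()
distinct_descriptions = len({tuple(sorted(d.items())) for d in descriptions.values()})
print(distinct_descriptions)          # prints 1: the properties cannot tell them apart
print(identity.count(descriptions))   # prints 2: the URIs still keep them apart

identity.same_as("ex:p1", "ex:p2")
print(identity.count(descriptions))   # prints 1: the sameAs link erased the difference
```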
Hence, putting discriminating under identifying, as done in the figure above, is not a good idea; it should be the other way around.
Finally, it is very difficult to find identifying features anyway (or absolutely discriminating features, i.e., features that discriminate an individual against all others). If you choose one (say, identifying a city by its name and its country), then as soon as you move to a different context (say, by adding a historical dimension), it no longer works. This is exactly why all our governments try to assign us a "unique" number in order to identify us (well, we are reassigning the French social security numbers of people born in 1909...).
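The context dependence described in this note is easy to check mechanically. The sketch below tests whether a set of properties discriminates every record in a collection; the city records, names, and periods are entirely invented for illustration.

```python
from collections import Counter

def is_discriminating(records, props):
    """True if no two records share the same values for all of `props`."""
    keys = Counter(tuple(record[p] for p in props) for record in records)
    return all(count == 1 for count in keys.values())

# Invented records for a present-day context.
present = [
    {"uri": "ex:c1", "name": "Alba", "country": "Ruritania", "period": "present"},
    {"uri": "ex:c2", "name": "Borea", "country": "Ruritania", "period": "present"},
]
# Adding a historical dimension brings in an earlier, different settlement
# that happens to share name and country with ex:c1.
with_history = present + [
    {"uri": "ex:c3", "name": "Alba", "country": "Ruritania", "period": "12th century"},
]

print(is_discriminating(present, ("name", "country")))                 # prints True
print(is_discriminating(with_history, ("name", "country")))            # prints False
print(is_discriminating(with_history, ("name", "country", "period")))  # prints True
```

Extending the key with `period` restores discrimination only for this enlarged context; nothing guarantees it survives the next one.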