ProvenanceAndLinkedDataMetadataGroup

Provenance

Potential Scenarios

Big eScience: Tracking sources of data, code, libraries associated with published research.
- reproducing results
- citation tracking
- retraction cascades
Simple data publishing on the web
- essentially only author, means-of-production, and licensing information is needed.
Consuming and reusing data sets
- selection of datasets based on provenance/trust information
More relevant scenarios from the Social Web Incubator Group:
Management of Ontologies and other Structured Knowledge Organization Systems (especially when used as metadata vocabularies?): Possible use cases from the real world:
- http://ontolog.cim3.net/cgi-bin/wiki.pl?OOR Open Ontology Repository (cross-domain)
- http://bioportal.bioontology.org/ NCBO BioPortal (domain-specific)
- http://www.obofoundry.org/ The Open Biomedical Ontology (OBO) Foundry (domain-specific)
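
The "retraction cascade" scenario listed under Big eScience can be sketched as a graph traversal over derivation links. This is only an illustration: the artifact names and the `DERIVED_FROM` mapping are hypothetical, and a real deployment would read such links from provenance metadata rather than from a hard-coded dictionary.

```python
from collections import deque

# Hypothetical provenance records: each artifact maps to the artifacts it was derived from.
DERIVED_FROM = {
    "paper-B": ["dataset-A"],
    "dataset-C": ["dataset-A", "code-X"],
    "paper-D": ["dataset-C"],
    "paper-E": ["code-Y"],
}


def retraction_cascade(retracted, derived_from):
    """Return every artifact that directly or indirectly depends on `retracted`."""
    # Invert the links so we can walk from a source to its derivatives.
    derivatives = {}
    for artifact, sources in derived_from.items():
        for source in sources:
            derivatives.setdefault(source, []).append(artifact)

    # Breadth-first walk over the inverted links.
    affected = set()
    queue = deque([retracted])
    while queue:
        current = queue.popleft()
        for child in derivatives.get(current, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


print(sorted(retraction_cascade("dataset-A", DERIVED_FROM)))
# prints ['dataset-C', 'paper-B', 'paper-D']
```

Retracting `dataset-A` flags everything built on it, including `paper-D`, which only depends on it through `dataset-C`.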

Existing Vocabularies for Provenance-Related Metadata

Open Archive Object Exchange and Reuse
Information Object - Information Realization
CIDoc CRM (Heritage-Orientation)
IRW: (Web Resources) (reuses Information Object - Information Realization*)
Situation/Plan Execution (Agent, Role, test...)*
Dublin Core
Creative Commons
PREMIS (Library of Congress Preservation Metadata)
Intelligence Community Standard for Source Reference Citation Metadata 2008 (reuses Dublin Core, DoD Discovery Metadata Specification)
Ontology Metadata Vocabulary
The Provenance Vocabulary

State of the Art

Solutions Around
- Open Provenance Vocabulary
- Provenance Vocabulary
- POWDER
- Dublin Core
- Named Graphs and Semantic Web Publishing Vocabulary
- Web of Trust Vocab
- voiD
- RDF/XML source description

Research, Modeling Suggestions and Issues

Ontology Dynamics
- International Workshop on Ontology Dynamics (IWOD) 2009

Discriminant Resources Specifications

Problem statement

In LOD, there is the problem of interlinking newly incoming datasets and maintaining the existing links:
- How to allow datasets to expose and advertise how to identify the instances they contain, in order to ease inter-linkage?
- How to declare and specify which resources are discriminant, so that ontology and instance matching tools are better able to find equivalent resources between datasets?

State of the Art

voiD is a vocabulary for providing general statistics on a dataset.
Silk provides a declarative way (the Silk specification) to interlink two identified datasets.
The scope feature of POWDER can be used to identify the resources to which a description applies.
Ontology Metadata Vocabulary (OMV)

The more the specification moves to the meta-level, the less it is contextualized. Context is important because some properties may be discriminant within one dataset even though they are not discriminant in general.
- How to attach such discriminant metadata to an ontology?
- How to attach such discriminant metadata to a dataset?

Proposal

[Figure: schema of the proposed resource discrimination vocabulary]

Note (Jérôme Euzenat): it is really disturbing how "discriminant" is used as a dual of "identifying" here.

Any set of properties is discriminant: it discriminates along these properties.
Any identifier is discriminant: it discriminates between the entities that have the identifier and those that do not.

This does not mean that these are the same thing. Typically, in some datasets, two entities may have the same values for all properties and yet be different. Their identifier (a URI on the semantic web) helps keep track of the difference (as long as no one adds a sameAs between them). Hence, discriminating properties are good for discriminating but not always sufficient for identifying, which is a stronger notion (ask the people who are in jail because they were identified as someone else on the basis of insufficiently discriminating evidence).

Hence, putting discriminating under identifying, as done in the figure above, is not a good idea; it should be the other way around.

Finally, it is very difficult to find identifying features anyway (or absolutely discriminating features, i.e., features that discriminate an individual from all others). If you choose one (let us say a city identified by its name and its country), then as soon as you move to a different context (let us say by adding a historical dimension), it no longer works. This is exactly why all our governments try to assign us a "unique" number in order to identify us (well, we are reassigning the French social security numbers of people born in 1909...).
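
The context dependence described in this note can be made concrete with a minimal sketch. All URIs, property names, and values below are hypothetical placeholders, and `collisions` and `is_discriminant` are illustrative helpers rather than part of any proposed vocabulary. A property set is treated as discriminant within a dataset when no two URIs share the same values for it.

```python
from collections import defaultdict

# Hypothetical records keyed by URI; names and values are placeholders only.
ENTITIES = {
    "ex:city1": {"name": "Name1", "country": "CountryA"},
    "ex:city2": {"name": "Name1", "country": "CountryB"},
    "ex:city3": {"name": "Name2", "country": "CountryA"},
}


def collisions(entities, properties):
    """Return groups of URIs that share the same values for `properties`."""
    groups = defaultdict(list)
    for uri, description in entities.items():
        key = tuple(description.get(p) for p in properties)
        groups[key].append(uri)
    return [sorted(uris) for uris in groups.values() if len(uris) > 1]


def is_discriminant(entities, properties):
    """True when no two entities agree on every listed property."""
    return not collisions(entities, properties)


print(is_discriminant(ENTITIES, ["name"]))             # prints False
print(is_discriminant(ENTITIES, ["name", "country"]))  # prints True

# Changing the context: a historical entity with the same name and country.
WITH_HISTORY = dict(ENTITIES)
WITH_HISTORY["ex:city4"] = {"name": "Name1", "country": "CountryA"}
print(collisions(WITH_HISTORY, ["name", "country"]))
# prints [['ex:city1', 'ex:city4']]
```

In the extended dataset the two colliding entities remain distinct only through their URIs, which is the gap between discriminating and identifying that the note points out.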
