balloon Fusion

Towards Automatic Query Federation

When working with Linked Open Data, it quickly becomes clear that endpoints need to be discovered intelligently and automatically, since replicating the data would be a waste of resources. The idea of a mediator service was pursued in the first phase and led to the balloon prototype. Building on the indexed co-reference information crawled by the balloon Overflight service, balloon Fusion focuses on intelligent query rewriting with automatic endpoint discovery and query federation.

balloon Fusion acts as a mediator service between a client application and a set of actual Linked Data endpoints. It offers a single point of access based on distributed co-reference information such as owl:sameAs.

In short, the query federation service knows all occurrences of equivalent concepts (co-references) in the reachable Linked Data endpoints. Based on this information, it rewrites queries automatically, while the routing of the query is handled with the SPARQL 1.1 Federated Query recommendation.

Example: Although URIs ensure that resources can be identified, a single semantic entity can be referred to by several different URIs. When querying for a semantic entity, a user initially does not know which identifier for that entity exists in a particular endpoint. For example, one might know the URI of the DBpedia resource for the Austrian ski jumper Thomas Morgenstern, since DBpedia is the most common, domain-independent endpoint in the Linked Open Data cloud: http://dbpedia.org/resource/Thomas_Morgenstern.
One might then try to query Freebase for further information about Thomas Morgenstern using the same URI. Unfortunately, this yields no results, because the given URI does not exist in Freebase. Freebase has its own identifier for the same semantic entity, which complicates efficient querying or browsing of the data graph. In linguistics, this is known as the co-reference problem: multiple expressions refer to the same thing.
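To make the notion of a co-reference set concrete, the following minimal sketch groups URIs that are linked by owl:sameAs statements into sets of synonyms. It is only an illustration under stated assumptions: the function name `coreference_sets`, the union-find approach, and the sample pairs are hypothetical and are not taken from balloon Overflight or balloon Fusion.

```python
from collections import defaultdict


def coreference_sets(same_as_pairs):
    """Group URIs connected by owl:sameAs pairs into co-reference sets.

    Illustrative union-find; not the indexing method of balloon Overflight.
    """
    parent = {}

    def find(uri):
        # Register unseen URIs as their own root, then walk up to the root.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two sets

    groups = defaultdict(set)
    for uri in parent:
        groups[find(uri)].add(uri)
    return list(groups.values())


# Sample owl:sameAs pairs; which links actually exist is an assumption.
DBPEDIA_URI = "http://dbpedia.org/resource/Thomas_Morgenstern"
SKI_TEAM_URI = "http://vocab.semantic-web.at/AustrianSkiTeam/121"
FREEBASE_URI = "http://rdf.freebase.com/ns/m/08zld9"

sets = coreference_sets([
    (SKI_TEAM_URI, DBPEDIA_URI),
    (SKI_TEAM_URI, FREEBASE_URI),
])
```

With these pairs, all three identifiers end up in one set, so a lookup for any one of them can return the other two.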

From the perspective of a Semantic Web user, a central issue is how to automatically discover and query multiple endpoints that use different URI naming schemes. For that reason, we see the need to discover relevant synonym URIs and integrate them automatically into the query process, in conjunction with a smart endpoint selection. Automatic query enhancement and intelligent routing to suitable endpoints fulfill the wish for easy information access. The main objective of balloon Fusion is to offer a mediator service between a SPARQL client application and a set of actual Linked Open Data SPARQL endpoints, providing a single point of access. The idea behind the query rewriting process is to (i) extend the initial query with synonym URIs and (ii) address a suitable set of endpoints to improve the result set, without changing the semantics of the original query. The primary focus of this approach is to rewrite the query immediately, without any on-demand analysis or check queries against Linked Open Data endpoints, which often result in long latencies or timeouts.
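Steps (i) and (ii) can be sketched as a pure lookup against a precomputed index, which is what makes the rewriting immediate: no endpoint is contacted while the plan is built. The sketch below is a minimal illustration under assumptions. The names `COREFERENCE_SETS`, `URI_ENDPOINTS`, and `plan_rewrite` are hypothetical, and the in-memory dictionaries merely stand in for the indexed co-reference information; the URI-to-endpoint assignments mirror the example used on this page.

```python
from collections import defaultdict

DBPEDIA = "http://dbpedia.org/sparql"
SEMWEB = "http://vocab.semantic-web.at/sparql/OpenData"

# Hypothetical co-reference index: each set holds synonym URIs of one entity.
COREFERENCE_SETS = [
    {
        "http://dbpedia.org/resource/Thomas_Morgenstern",
        "http://vocab.semantic-web.at/AustrianSkiTeam/121",
        "http://rdf.freebase.com/ns/m/08zld9",
    },
]

# Hypothetical index of which endpoints are known to contain which URI.
URI_ENDPOINTS = {
    "http://dbpedia.org/resource/Thomas_Morgenstern": [DBPEDIA, SEMWEB],
    "http://vocab.semantic-web.at/AustrianSkiTeam/121": [SEMWEB],
    "http://rdf.freebase.com/ns/m/08zld9": [SEMWEB],
}


def plan_rewrite(uri, coreference_sets, uri_endpoints):
    """Return {endpoint: [URIs to ask for]} using index lookups only."""
    # Step (i): extend the URI by its synonyms (or keep it alone if unknown).
    synonyms = next((s for s in coreference_sets if uri in s), {uri})
    # Step (ii): keep, per endpoint, only the identifiers that occur there.
    plan = defaultdict(list)
    for synonym in sorted(synonyms):
        for endpoint in uri_endpoints.get(synonym, []):
            plan[endpoint].append(synonym)
    return dict(plan)


plan = plan_rewrite(
    "http://vocab.semantic-web.at/AustrianSkiTeam/121",
    COREFERENCE_SETS,
    URI_ENDPOINTS,
)
```

An empty plan means the index knows no endpoint for the URI; in that case a caller would presumably leave the original query untouched.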

Example: Consider a SPARQL query that asks for all known predicates and objects of a given subject:

SELECT ?p ?o WHERE { <http://vocab.semantic-web.at/AustrianSkiTeam/121> ?p ?o.}

Based on the co-reference information from balloon Overflight, the initial query can be rewritten. The co-reference set reveals three different URIs for the same semantic concept, distributed over two different endpoints. Given the selected endpoints, we can set up federated querying using the W3C SPARQL 1.1 Federated Query extension. For each endpoint, only the necessary identifiers are encapsulated in a SERVICE clause. The original statement in the initial query can then be replaced by the union of the generated SERVICE components. The final rewritten SPARQL query for the example looks like this:

SELECT ?p ?o WHERE {
    {
        SERVICE <http://dbpedia.org/sparql> {
            <http://dbpedia.org/resource/Thomas_Morgenstern> ?p ?o .
        }
    } UNION {
        SERVICE <http://vocab.semantic-web.at/sparql/OpenData> {
            { <http://dbpedia.org/resource/Thomas_Morgenstern> ?p ?o }
            UNION
            { <http://vocab.semantic-web.at/AustrianSkiTeam/121> ?p ?o }
            UNION
            { <http://rdf.freebase.com/ns/m/08zld9> ?p ?o }
        }
    }
}
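The construction described above (one SERVICE clause per endpoint, joined by UNION) can be sketched as a small string builder. This is a minimal illustration, not the balloon Fusion implementation: the function names `federated_pattern` and `rewrite_query` are hypothetical, and the sketch only handles a single triple pattern of the form subject ?p ?o.

```python
def federated_pattern(plan, tail="?p ?o"):
    """Render one SERVICE block per endpoint and join the blocks with UNION.

    `plan` maps an endpoint URL to the subject URIs to ask for there.
    """
    blocks = []
    for endpoint, uris in plan.items():
        # Inside one endpoint, every synonym URI becomes a UNION alternative.
        alternatives = " UNION ".join(f"{{ <{uri}> {tail} }}" for uri in uris)
        # Each SERVICE clause gets its own group so it can be a UNION operand.
        blocks.append(f"{{ SERVICE <{endpoint}> {{ {alternatives} }} }}")
    return " UNION ".join(blocks)


def rewrite_query(plan):
    """Wrap the federated pattern in the SELECT form used in the example."""
    return f"SELECT ?p ?o WHERE {{ {federated_pattern(plan)} }}"


# Endpoint-to-URI assignment taken from the example on this page.
example_plan = {
    "http://dbpedia.org/sparql": [
        "http://dbpedia.org/resource/Thomas_Morgenstern",
    ],
    "http://vocab.semantic-web.at/sparql/OpenData": [
        "http://dbpedia.org/resource/Thomas_Morgenstern",
        "http://vocab.semantic-web.at/AustrianSkiTeam/121",
        "http://rdf.freebase.com/ns/m/08zld9",
    ],
}

query = rewrite_query(example_plan)
print(query)
```

Each SERVICE clause is wrapped in its own pair of braces because the SPARQL 1.1 grammar only allows UNION between group graph patterns.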

For more detailed information please refer to:

Kai Schlegel, Florian Stegmaier, Sebastian Bayerl, Michael Granitzer, Harald Kosch. Balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information. In: 5th International Workshop on Data Engineering Meets the Semantic Web, co-located with the 30th IEEE International Conference on Data Engineering, 2014.