balloon Overflight

Indexing Linked Open Data

Following the main idea of the Linked Open Data project, data within the Linked Open Data Cloud is highly interlinked, openly available, and can be queried separately at each endpoint. Unified description schemes such as RDF(S) establish harmonization at the modelling level; however, querying the global data set as a whole is only possible with strong limitations. Given a specific concept, it is hard to find all relevant information, because that information is distributed over several endpoints. Among other things, the straightforward consumption of Linked Data is hindered by the absence of a global view on the basic interlinking in the Linked Data cloud.

Note: A naive solution to this problem would be to establish a local replica of all Linked Open Data endpoints to obtain an overall view of the data. This approach is not feasible due to the amount of data, unnecessary data duplication, and the resulting update problems. For this reason, we differentiate between "content" information and "schema-interlinking" information (e.g. equivalence predicates, hierarchy and type information). We focus on aggregating "schema-interlinking" information in order to simplify subsequent access to "content" information.

To overcome this issue, balloon Overflight aims at unifying basic information from Linked Open Data endpoints into a simplified, global sub-graph. The knowledge base behind this bird's-eye view is built by a continuous SPARQL crawling process: all SPARQL endpoints registered at datahub.io are regularly queried for triples that contain equivalence, instance or inheritance predicates. This knowledge is then aggregated in a graph database for further analysis (e.g. balloon Fusion or balloon Commonalities).

Generating a simplified but global sub-graph of LOD:
Automatic SPARQL indexing of equivalence (e.g. owl:sameAs), instance (e.g. rdf:type) and inheritance (e.g. rdfs:subClassOf) relationships.
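How the indexed relationships combine into a simplified sub-graph can be sketched in Python. This is an illustrative stand-in, not balloon Overflight's storage model: plain in-memory dictionaries and a union-find structure replace the graph database, and the names UnionFind, build_index and types_of are hypothetical. Because owl:sameAs is symmetric and transitive, equivalent URIs are merged into clusters, so the types recorded for any member of a cluster can be retrieved via any other member.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


class UnionFind:
    """Disjoint-set structure used here to compute owl:sameAs clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def build_index(triples):
    """Aggregate crawled (s, p, o) triples into clusters, types and superclasses."""
    same_as = UnionFind()
    types = defaultdict(set)         # resource -> classes it is an instance of
    superclasses = defaultdict(set)  # class -> direct superclasses
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            same_as.union(s, o)
        elif p == RDF_TYPE:
            types[s].add(o)
        elif p == RDFS_SUBCLASS_OF:
            superclasses[s].add(o)
    clusters = defaultdict(set)
    for node in list(same_as.parent):
        clusters[same_as.find(node)].add(node)
    return list(clusters.values()), types, superclasses


def types_of(uri, clusters, types):
    """Collect rdf:type values across all URIs owl:sameAs-equivalent to uri."""
    for cluster in clusters:
        if uri in cluster:
            members = cluster
            break
    else:
        members = {uri}
    return set().union(*(types.get(m, set()) for m in members))
```

For example, with the hypothetical triples ex:a owl:sameAs ex:b and ex:b owl:sameAs ex:c, where only ex:c is typed as ex:City, a lookup for ex:a also returns ex:City even though no endpoint states that type for ex:a directly.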

All retrieved triples are additionally stored in dumps, which are openly available for download as zipped archives:
ftp://moldau.dimis.fim.uni-passau.de/data/