balloon Commonalities
Finding shared characteristics of entities and types
The balloon Overflight service has been extended to crawl more predicates than equivalence relationships (e.g. owl:sameAs) alone. The crawled predicates now include structural relationships such as instance membership (rdf:type) and class inheritance (rdfs:subClassOf). As a result, the graph database now holds a structural hierarchy, which the balloon Commonalities service can exploit. The crawled dataset contains information about concept bundles. A concept bundle comprises all equivalence, type and inferred type (based on subclasses) relationships of a semantic concept. Consequently, a concept bundle contains all synonym URIs, types and provenance information such as data origin. The main goal of this approach is to create a global index of basic structural interlinking information; the actual “content” information is not of particular interest for the index and can be queried efficiently afterwards.
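To make the notion of a concept bundle more concrete, the following is a minimal sketch in Python, not the service's actual implementation. It assumes the crawled data is available as plain (subject, predicate, object) triples; the function name `build_bundle`, the dictionary layout, and the `ex:` and `other:` URIs are hypothetical, and provenance information is left out for brevity.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"
TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def build_bundle(concept, triples):
    """Collect synonyms, direct types and inferred types for one concept."""
    same_as = defaultdict(set)   # URI -> equivalent URIs (symmetric)
    types = defaultdict(set)     # instance URI -> direct types
    parents = defaultdict(set)   # class URI -> direct superclasses
    for s, p, o in triples:
        if p == SAME_AS:
            same_as[s].add(o)
            same_as[o].add(s)
        elif p == TYPE:
            types[s].add(o)
        elif p == SUBCLASS_OF:
            parents[s].add(o)

    # Follow owl:sameAs links transitively to find all synonym URIs.
    synonyms, stack = {concept}, [concept]
    while stack:
        for other in same_as[stack.pop()]:
            if other not in synonyms:
                synonyms.add(other)
                stack.append(other)

    # Direct types of the concept and of all its synonyms.
    direct = set()
    for uri in synonyms:
        direct |= types[uri]

    # Inferred types: every superclass reachable from a direct type.
    inferred, stack = set(), list(direct)
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in direct and parent not in inferred:
                inferred.add(parent)
                stack.append(parent)

    return {
        "synonyms": synonyms - {concept},
        "types": direct,
        "inferred_types": inferred,
    }


# Hypothetical toy triples; the URIs are made up for illustration.
TRIPLES = [
    ("ex:Thomas_Morgenstern", SAME_AS, "other:Morgenstern"),
    ("other:Morgenstern", TYPE, "ex:AustrianSkiJumpers"),
    ("ex:AustrianSkiJumpers", SUBCLASS_OF, "ex:SkiJumper"),
    ("ex:SkiJumper", SUBCLASS_OF, "ex:Person"),
]

bundle = build_bundle("ex:Thomas_Morgenstern", TRIPLES)
```

In this toy example the type is attached to the synonym URI rather than to the queried URI, so the bundle only becomes complete once the equivalence links are followed.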
The structural graph is the foundation for understanding the nature of Linked Open Data and can be utilized to accomplish analysis tasks. balloon Commonalities offers a range of different analysis services:
- Type Index: The graph holds an implicit index of semantic types. It can be queried, for example, for all encountered instances of a specified type, even when these are spread over many LOD endpoints.
- Type Matching: Exploiting the type index, the balloon service can find instances similar to a given concept based on common inheritance. Conversely, given multiple concepts, the service can identify which types are important and shared among all participating concepts (canonical types).
- Type Distribution: Linked Data concepts feature a large number of type relationships. The co-occurrence of different types reveals semantic connections between types. Association rule mining can provide insights and reveal related or unrelated types.
- Structural Distance: The structural graph can be queried for a path between two given concepts. This allows a distance calculation based on sibling and parent relationships. For example, Josef Bradl and Thomas Morgenstern have a distance of 1 because they have the type AustrianSkiJumpers in common.
- Further analysis: Querying further predicates of type instances makes it possible to build a kind of fuzzy schema of Linked Open Data. For instance, almost all instances of the type Person have a predicate birthplace. Such predicates play an important role in loosely structured data like the Semantic Web because these probable expectations can be used for efficient querying. Another future analysis is the distribution of the information related to a concept bundle across multiple endpoints. In other words: how many different endpoints are involved in defining a concept bundle?
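
The type index, type matching and structural distance services listed above can be illustrated with a small Python sketch over an in-memory graph. This is only an illustration under stated assumptions, not the balloon implementation: the function names, the `ex:` URIs and the instance `ex:Jumper_X` are hypothetical. In particular, the distance convention used here (number of intermediate nodes on the shortest path) is an assumption chosen because it reproduces the distance of 1 from the ski jumper example; the metric actually used by the service may differ.

```python
from collections import defaultdict, deque

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:Josef_Bradl", "rdf:type", "ex:AustrianSkiJumpers"),
    ("ex:Thomas_Morgenstern", "rdf:type", "ex:AustrianSkiJumpers"),
    ("ex:Jumper_X", "rdf:type", "ex:SwissSkiJumpers"),
    ("ex:AustrianSkiJumpers", "rdfs:subClassOf", "ex:SkiJumper"),
    ("ex:SwissSkiJumpers", "rdfs:subClassOf", "ex:SkiJumper"),
]


def index(triples):
    """Build lookup tables from structural triples."""
    types = defaultdict(set)       # node -> direct types or superclasses
    instances = defaultdict(set)   # type -> direct instances (type index)
    neighbours = defaultdict(set)  # undirected adjacency for path search
    for s, p, o in triples:
        if p in ("rdf:type", "rdfs:subClassOf"):
            types[s].add(o)
            neighbours[s].add(o)
            neighbours[o].add(s)
        if p == "rdf:type":
            instances[o].add(s)
    return types, instances, neighbours


def all_types(node, types):
    """Direct and inferred types of a node (upward transitive closure)."""
    seen, stack = set(), [node]
    while stack:
        for parent in types.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def shared_types(concepts, types):
    """Types common to all given concepts (candidate canonical types)."""
    closures = [all_types(c, types) for c in concepts]
    return set.intersection(*closures) if closures else set()


def structural_distance(a, b, neighbours):
    """Intermediate nodes on the shortest path, or None if unconnected.

    Assumed convention: identical and directly linked nodes both yield 0.
    """
    if a == b:
        return 0
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, edges = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt == b:
                return edges  # edges walked so far equals intermediate nodes
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edges + 1))
    return None


types, instances, neighbours = index(TRIPLES)
austrians = instances["ex:AustrianSkiJumpers"]
common = shared_types(["ex:Josef_Bradl", "ex:Jumper_X"], types)
d_siblings = structural_distance("ex:Josef_Bradl", "ex:Thomas_Morgenstern", neighbours)
d_cousins = structural_distance("ex:Josef_Bradl", "ex:Jumper_X", neighbours)
```

With this toy data, the two Austrian ski jumpers are siblings under a shared direct type, whereas `ex:Jumper_X` is only connected to them through the common superclass, which results in a larger distance.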