HOBBIT: Holistic Benchmarking of Big Linked Data
The HOBBIT project started in December 2015 and has so far focused on the architecture and development of the benchmarking platform, as well as on community building.
The idea of creating a platform to host different benchmarking tools for the Big Linked Data lifecycle was born with the creation of the General Entity Annotator Benchmarking Framework (GERBIL). GERBIL (Usbeck et al., 2015) was designed to facilitate the benchmarking of named entity recognition (NER), named entity disambiguation (NED) and other semantic tagging approaches. Its objective is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. The main pain points for tool developers and end users that motivated the creation of GERBIL were:
Accessibility of gold standards. Developers need gold standard datasets to evaluate annotation tools. The formats and data representations of these gold standards vary across domains and tools, so authors evaluating their systems have to write both a parser and the actual evaluation tool before they can use the available datasets.
Comparability of results. A large number of quality measures have been developed and are actively used across the annotation research community to evaluate the same task, which makes results hard to compare across publications on the same topic. For example, while some authors publish macro-F-measures and simply call them F-measures, others publish micro-F-measures for the same purpose, leading to significant discrepancies across the scores.
Repeatability of experiments. Given the challenges of evaluating tools, recreating experiments remains a hard task. Moreover, it is difficult to keep track of benchmark configurations and the results achieved with them. GERBIL addresses this by giving users a stable URL for each experiment, which contains human- as well as machine-readable metadata about the experiment.
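The gap between micro- and macro-averaged F-measures mentioned above is easy to reproduce. The following is a minimal sketch, not GERBIL's implementation: the Counts class, the function names and the example counts are all hypothetical, and macro-F1 is taken here as the unweighted mean of per-document F1 scores, which is one common definition among several.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """True positives, false positives and false negatives for one document."""
    tp: int
    fp: int
    fn: int


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(docs: list[Counts]) -> float:
    """Pool the counts over all documents, then compute a single F1."""
    return f1(sum(d.tp for d in docs),
              sum(d.fp for d in docs),
              sum(d.fn for d in docs))


def macro_f1(docs: list[Counts]) -> float:
    """Compute F1 per document, then take the unweighted mean (assumed definition)."""
    if not docs:
        return 0.0
    return sum(f1(d.tp, d.fp, d.fn) for d in docs) / len(docs)


# Hypothetical counts: one large, well-annotated document
# and one small, poorly annotated document.
docs = [Counts(tp=90, fp=10, fn=10), Counts(tp=1, fp=4, fn=4)]
micro = micro_f1(docs)
macro = macro_f1(docs)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
# prints: micro-F1 = 0.867, macro-F1 = 0.550
```

On the same system output the two averages differ by more than 0.3, because the micro average is dominated by the large document while the macro average weights both documents equally. A paper that reports only "F-measure" leaves the reader unable to tell which of the two numbers it is.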
The HOBBIT platform will expand upon the mechanisms behind GERBIL to cover the different Linked Data lifecycle stages (see Figure 1). First, the HOBBIT platform extends dataset integration beyond open datasets by also providing tools for generating datasets that reflect real-world industrial (closed) datasets. Second, in addition to classical metrics such as precision, recall, F-measure and runtime, we will collect relevant KPIs from the community and provide reference implementations as well as public performance reports to the community, especially to interested developers and parties.
For organisations interested in participating in the collection of requirements, we have opened a short survey on how they evaluate their software. The goal of this survey is to determine industry-relevant key performance indicators (KPIs) for building benchmarks that measure these factors. With this survey, we also aim to raise awareness of the HOBBIT project and lay a foundation of potential contacts for building the HOBBIT association. This association will play a key role in determining the form of the HOBBIT benchmarking platform by providing KPIs, use cases and datasets.
Figure 1. HOBBIT Linked Data Challenge Categories
The HOBBIT team is organising two campaigns, which will take place at the ESWC 2016 conference (May 29th to June 2nd, 2016).
Generation and Acquisition
The Open Knowledge Extraction challenge focuses on evaluating two tasks.
The first task comprises (1) the identification of entities in a sentence (Entity Recognition), (2) the linking of the entities to a reference Knowledge Base (Entity Linking) and (3) the assignment of a type to the entity (Entity Typing). The task focusses on mapping entities to classes like “Person”, “Place”, “Organization” and “Role” according to the semantics of the DOLCE Ultra Lite ontology. However, the focus of both GERBIL and HOBBIT is on being knowledge-base-agnostic in order to cover a wide range of Linked Data. The second task aims at identifying the type description of a given entity and inferring the most appropriate DOLCE+DnS Ultra Lite class that contains this type. The participating systems receive short texts in which a single named entity has been marked.
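To illustrate how the three subtasks of the first task can be scored separately, here is a minimal sketch. It is not the challenge's official scorer: the Annotation type, the example.org URIs and the example sentence are hypothetical, and it assumes exact-span matching, which is only one of several possible matching policies.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    start: int             # character offset where the mention begins
    end: int               # character offset just past the mention
    uri: Optional[str]     # linked knowledge-base resource, if any
    type_: Optional[str]   # assigned class label, if any


def prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Precision, recall and F1 over two sets of comparable keys."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


def score(gold: list[Annotation], pred: list[Annotation]) -> dict:
    """Score each subtask by projecting annotations onto the fields it cares about."""
    views = {
        "recognition": lambda a: (a.start, a.end),
        "linking": lambda a: (a.start, a.end, a.uri),
        "typing": lambda a: (a.start, a.end, a.type_),
    }
    return {name: prf({key(a) for a in gold}, {key(a) for a in pred})
            for name, key in views.items()}


# Hypothetical example for the text "Angela Merkel visited Paris."
gold = [
    Annotation(0, 13, "http://example.org/kb/Angela_Merkel", "Person"),
    Annotation(22, 27, "http://example.org/kb/Paris", "Place"),
]
pred = [
    Annotation(0, 13, "http://example.org/kb/Angela_Merkel", "Person"),
    Annotation(22, 27, "http://example.org/kb/Paris", "Organization"),  # wrong type
]
results = score(gold, pred)
for task, (p, r, f) in results.items():
    print(f"{task}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this example the system finds and links both entities correctly but assigns the wrong class to one of them, so recognition and linking score 1.0 while typing scores 0.5. Reporting the subtasks separately shows where a system loses points, which a single combined score would hide.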
Visualization and Services
The 6th QALD evaluation campaign focusses on question answering (QA) over linked data, with a strong emphasis on multilinguality and hybrid approaches using information from both structured and unstructured data. GERBIL will be able to measure not only the overall performance of QA systems but also their performance on subtasks such as recognizing required properties, relations or entities. Although QALD-6 will mainly be evaluated using the existing portal, we will present the new GERBIL version to the community to kick-start comparable, archivable and up-to-date experiments in the research field of QA.
The results of these campaigns will allow us to assess the evaluation component that will be deployed within the HOBBIT platform. For more information, join our HOBBIT community!
HOBBIT is a project within the EU’s “Horizon 2020” framework program and started on December 1st, 2015. The consortium consists of InfAI (coordinator, Germany), Fraunhofer IAIS (Germany), FORTH (Greece), NCSR “Demokritos” (Greece), iMinds (Belgium), USU Software AG (Germany), Ontos AG (Switzerland), OpenLink Software (UK), AGT Group R&D GmbH (Germany) and TomTom (Poland). For more information, see http://project-hobbit.eu/
Usbeck R., et al. 2015. GERBIL: General Entity Annotator Benchmarking Framework. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15). ACM, New York, NY, USA, 1133-1143.