Wednesday, 5/23/2007
11:45 AM - 12:15 PM
Level: Technical - Introductory
The CORE (Co-Occurrence and Ranking of Entities) module is an extension of the KIM semantic annotation platform that enables timeline analysis and a novel search interface. The essence of the approach is a specific kind of indexing, performed on the basis of semantic annotation of text with respect to “semantic features”: named entities and key-phrases. These features form a reduced-dimension space, in which documents are characterized by the features that occur in their text. Documents (or other blocks of text) are treated as contexts; document sets (or corpora) represent compound contexts. The frequency with which an entity occurs in a context indicates its level of association with that context, which can also be regarded as the entity's “popularity”. The frequency with which entities co-occur is evidence of the existence and strength of an associative relationship between them; although the exact type of the relation may not be known, this can be used for rich metadata extraction. The dates of the documents give the context space a temporal extent and allow trends in popularity or association to be analysed.
Central to the implementation is CORE DB, a relational database with full-text indexing in which occurrence information can be managed efficiently. It supports CORE Search, a novel search interface that combines CORE statistics, full-text search, and logically related concepts obtained from an OWL repository (OWLIM). To relate it to the “conventional” concept of faceted search, one can think of a separate facet being created for each class of entities referred to in the documents. CORE Search will be demonstrated on an enterprise-scale dataset: 10^6 entities with 10^8 occurrences in 10^6 documents. CORE DB is efficient enough to allow instant generation of popularity timelines and incremental (as-you-type) CORE Search.
Examples will also be given of using CORE DB and CORE Search in a completely different scenario: analysing the co-occurrences of research agents (people or organizations) in research contexts (papers, projects, events, and organizations).
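The abstract does not say how CORE DB computes its statistics beyond storing occurrence information in a relational database with full-text indexing. As a rough in-memory illustration only, the Python sketch below assumes each document has already been annotated into a list of entity identifiers plus a publication date. It computes popularity as occurrence frequency in a compound context, association strength as the number of documents in which two entities co-occur, a ranking of the entities most associated with a given one, and a monthly popularity timeline. The `Document` class, the function names, the document-level co-occurrence window, the monthly granularity and the sample data are all hypothetical choices, not the CORE implementation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass
class Document:
    """Hypothetical context: a dated document plus its annotated entity mentions.

    `mentions` holds one entity identifier per annotation, so an entity
    mentioned twice in the text appears twice in the list.
    """
    doc_id: str
    published: date
    mentions: list[str]


def popularity(docs: list[Document]) -> Counter:
    """Occurrence frequency of every entity in the compound context `docs`."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.mentions)
    return counts


def co_occurrence(docs: list[Document]) -> Counter:
    """Number of documents in which each unordered pair of entities co-occurs.

    Assumption: the document is the co-occurrence window, and each entity
    counts once per document regardless of how often it is mentioned.
    """
    pairs = Counter()
    for doc in docs:
        # Sorting makes ("A", "B") and ("B", "A") the same key.
        for a, b in combinations(sorted(set(doc.mentions)), 2):
            pairs[(a, b)] += 1
    return pairs


def most_associated(docs: list[Document], entity: str, k: int = 3) -> list[tuple[str, int]]:
    """Rank the other entities by how many documents they share with `entity`."""
    scores = Counter()
    for (a, b), n in co_occurrence(docs).items():
        if a == entity:
            scores[b] += n
        elif b == entity:
            scores[a] += n
    return scores.most_common(k)


def popularity_timeline(docs: list[Document], entity: str) -> dict[str, int]:
    """Occurrences of `entity` per calendar month (assumed granularity), keyed 'YYYY-MM'."""
    timeline = Counter()
    for doc in docs:
        n = doc.mentions.count(entity)
        if n:
            timeline[doc.published.strftime("%Y-%m")] += n
    return dict(sorted(timeline.items()))


# Made-up sample data, for illustration only.
docs = [
    Document("d1", date(2007, 1, 10), ["Sirma", "Ontotext", "Sofia"]),
    Document("d2", date(2007, 1, 25), ["Ontotext", "KIM", "Ontotext"]),
    Document("d3", date(2007, 2, 3), ["KIM", "Ontotext"]),
]
print(popularity(docs).most_common(1))             # [('Ontotext', 4)]
print(co_occurrence(docs)[("KIM", "Ontotext")])    # 2
print(most_associated(docs, "Ontotext", k=1))      # [('KIM', 2)]
print(popularity_timeline(docs, "Ontotext"))       # {'2007-01': 3, '2007-02': 1}
```

At the scale described in the abstract (10^8 occurrences), such aggregates would presumably be kept in indexed database tables rather than recomputed in memory; the sketch only shows what is being counted.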
Borislav Popov studied Computer Science, specializing in Artificial Intelligence, at Sofia University, Bulgaria. He has experience in software engineering and in IT consultancy related to ERP systems. His main research interests are information extraction, information retrieval, ontology management and knowledge representation. He was also involved in adapting and implementing advanced machine learning methods (namely Hidden Markov Models and Support Vector Machines) for use in natural language processing. Popov joined Sirma in 2001 and currently leads several projects, among them: the KIM Platform development (http://www.ontotext.com/kim); Ontotext's participation in PrestoSpace (an IP within FP6, http://www.prestospace.org/) and MediaCampaign (http://www.media-campaign.eu/); and the adaptation of KIM for the ETD portal (European Tourism Destinations, http://etd.ec3.at/). He has published more than 10 scientific papers at international conferences and in refereed journals.
Atanas Kiryakov is the founder and head of Ontotext, the lab of Sirma Group (a Bulgarian software house). Ontotext is a leading Semantic Web technology provider, with applications in knowledge management (KM), the Semantic Web, enterprise application integration (EAI), business intelligence, e-Government, telecommunications, life sciences, media monitoring and online recruitment.
Kiryakov joined Sirma as a software engineer in 1993; he joined the board in 2000 and founded Ontotext in the same year. His current research interests are semantic annotation and search, large-scale semantic repositories and reasoning, (upper-level) ontology design, information extraction, information retrieval (IR) and object consolidation. He is an organizer and a program committee member of a number of international forums, and the author of more than 20 articles and book chapters.