Semantic Technology Conference | May 20-24, 2007

Experiences in Automated Aggregation of Rich Product Content

Tatyana Vidrevich
Chief Technology Officer
XSB, Inc


 

Tuesday, 5/22/2007
3:15 PM - 3:45 PM
Level: Case Study

Web-based search engines do not provide meaningful parametric search of product information. Specialized portals do generate searchable product information, but only through costly and time-consuming manual efforts in offshore content factories. This presentation describes XSB Inc.'s strategy and experience through two case studies of automated generation of searchable content using XSB Inc.'s ontology-based product information reasoning tools. We also quantify some benefits of this approach.

In the first study we present the generation of a Master Data File (MDF) for 17 million products from 1,300 vendors. The MDF enables recognition of identical and similar products from different sources and facilitates the creation of a large and diverse database of structured product information to support attribute-based product retrieval and comparison. The second study addresses collection and standardization of product reference content from the web, which is important when vendor-provided descriptions are poor in content.

In the MDF process:

  • Each record is subjected to quality validation
  • All manufacturer names are standardized to a consistent representation
  • Part numbers are standardized and their prefixes removed
  • An MDF ID is assigned to each valid vendor record to relate the item to its group of identical items
  • Each valid vendor record is assigned an MDF subgroup ID to cluster items that may be equivalently packaged within the same MDF group
  • Product descriptions are classified to the UNSPSC taxonomy
  • Product attributes are extracted from the product descriptions and standardized according to a reference ontology
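
The first few steps above can be illustrated with a minimal sketch. This is not XSB's implementation: the alias table, the prefix list, the record fields, and every function name are hypothetical, chosen only to show how validation, manufacturer and part number standardization, and MDF ID assignment might fit together. Subgrouping, UNSPSC classification, and attribute extraction are omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical alias table; a real system would use a much larger curated source.
MANUFACTURER_ALIASES = {
    "3m": "3M",
    "3m company": "3M",
}

# Hypothetical vendor-specific prefixes to strip from part numbers.
VENDOR_PREFIXES = ("MMM-", "GR-")


@dataclass(frozen=True)
class VendorRecord:
    vendor: str
    manufacturer: str
    part_number: str
    description: str


def is_valid(record):
    """Assumed quality check: manufacturer and part number must be non-empty."""
    return bool(record.manufacturer.strip()) and bool(record.part_number.strip())


def standardize_manufacturer(name):
    """Map known spelling variants of a manufacturer name to one representation."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return MANUFACTURER_ALIASES.get(key, name.strip())


def standardize_part_number(part_number):
    """Uppercase, strip one known prefix, and drop punctuation and spaces."""
    pn = part_number.strip().upper()
    for prefix in VENDOR_PREFIXES:
        if pn.startswith(prefix):
            pn = pn[len(prefix):]
            break
    return re.sub(r"[^A-Z0-9]", "", pn)


def assign_mdf_ids(records):
    """Give valid records that share a standardized key the same MDF ID."""
    group_ids = {}
    assigned = []
    for record in records:
        if not is_valid(record):
            continue  # invalid records receive no MDF ID
        key = (
            standardize_manufacturer(record.manufacturer),
            standardize_part_number(record.part_number),
        )
        mdf_id = group_ids.setdefault(key, len(group_ids) + 1)
        assigned.append((mdf_id, record))
    return assigned
```

In this sketch, two records from different vendors that list "3M Company" with part number "MMM-810-1" and "3M" with part number "810 1" would receive the same MDF ID, while a record with an empty manufacturer name would be rejected at validation.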

Ms. Tatyana Vidrevich is the Chief Technology Officer at XSB, Inc. She is a recipient of Federal Computer Week's prestigious 2006 Rising Star award for her efforts in developing a Master Data File for the DOD EMALL. Ms. Vidrevich recently served as the Principal Investigator of a multi-year SBIR grant from the National Science Foundation for the development of the Xtractica® system. Xtractica® enables domain experts to autonomously mine and extract both structured and unstructured data from a variety of sources and standardize this data to a reference ontology so that it can be used for querying and in-depth reasoning. Ms. Vidrevich holds a Master's Degree in Computer Science from Stony Brook University. Upon completing her MS degree in 2000, she joined XSB, Inc., where she directs the design and development of the company’s data management solutions.


   