
Categorization based on Company Activities

Martin Repka, Ján Paralič. Katedra kybernetiky a umelej inteligencie, Fakulta elektrotechniky a informatiky, Technická univerzita v Košiciach. {Martin.Repka, Jan.Paralic}@tuke.sk


Presentation Transcript


  1. Categorization based on Company Activities • Martin Repka, Ján Paralič • Katedra kybernetiky a umelej inteligencie, Fakulta elektrotechniky a informatiky, Technická univerzita v Košiciach • {Martin.Repka, Jan.Paralic}@tuke.sk

  2. Company Network • built from publicly accessible sources such as the Business Register of the Slovak Republic (ORSR) • data aggregated, analyzed and integrated within the ITLIS project (www.itlis.eu) • features a complete data model that integrates structural, compositional and temporal (historical) data • aim: to examine the correlation between compositional data and online catalogue categories

  3. Data • Amadeus: compositional data about 25683 companies • Azet: online catalogue entries for 15705 of them • Itlis: companies in the 10 largest online catalogue categories, 2040 companies in total (from 105 to 389 companies per category)

  4. Company categorization • treated as classification of documents into categories • text-mining technique • each company's list of activities is used as an unstructured document
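
A minimal sketch of the idea on slide 4, assuming each company's activity list is treated as a bag of words and classified with a hand-written multinomial Naive Bayes. The slides report using RapidMiner and do not show their preprocessing or operator settings, so the tokenizer, the `NaiveBayesText` class, the sample activity lists and the category names below are all hypothetical and serve only as an illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase an activity list and split it into word tokens."""
    cleaned = text.lower().replace(",", " ").replace(";", " ")
    return cleaned.split()


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical activity lists and catalogue categories (not from the study).
train_docs = [
    "retail sale of food, beverages and tobacco",
    "wholesale and retail sale of groceries",
    "construction of residential buildings; masonry works",
    "construction works, roofing and plumbing",
    "software development; computer programming and consulting",
    "data processing, hosting and software services",
]
train_labels = ["food", "food", "construction", "construction", "it", "it"]

model = NaiveBayesText().fit(train_docs, train_labels)
prediction = model.predict("software consulting and programming services")
print(prediction)
```

Naive Bayes is used here only because it is one of the learners the conclusions mention and is short enough to write out; an SVM or neural network would consume the same bag-of-words input.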

  5. Results • average accuracy estimated with 10-fold cross-validation • baseline: average performance using mode values (always predicting the most frequent category) = 19.07% • experiments run in RapidMiner
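
The evaluation protocol on slide 5 can be sketched as follows, assuming the "mode values" figure is the accuracy of always predicting the largest category. That reading is an assumption, though it is consistent with the data slide: 389 of 2040 companies is 19.07%. The fold construction, the function names and the toy label distribution are hypothetical; RapidMiner's own cross-validation operator may split the data differently.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle the indices once and yield (train, test) index lists per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def mode_baseline_cv_accuracy(labels, k=10, seed=0):
    """Average accuracy over k folds of always predicting the training mode."""
    accuracies = []
    for train, test in k_fold_indices(len(labels), k, seed):
        mode_label = Counter(labels[i] for i in train).most_common(1)[0][0]
        hits = sum(1 for i in test if labels[i] == mode_label)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)


# Figures from the slides: 2040 companies, largest category has 389 of them.
largest_category_share = 389 / 2040
print(f"{largest_category_share:.2%}")  # 19.07%

# Hypothetical label distribution, for illustration only.
toy_labels = ["A"] * 60 + ["B"] * 25 + ["C"] * 15
cv_accuracy = mode_baseline_cv_accuracy(toy_labels, k=10)
print(round(cv_accuracy, 4))
```

Any learner can be evaluated with the same loop by replacing the mode prediction with a model fitted on the `train` indices; its average accuracy is then compared against the baseline.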

  6. Conclusions • data integrated from several sources • the correlation between compositional data (company activities) and online categorization was verified experimentally • several model learners were compared in terms of time complexity and model accuracy • SVM, Neural Network and Naive Bayes (Kernel) appear to be very efficient model learners • this information could be useful as support for other analyses of the company network
