Data at BvD: sources, testing, standardisation
Brussels, 11 September 2013
BvD = business information provider
Two key factors behind being one of the world’s leading companies:
• Own proprietary software: makes data easy to understand and use in various industries (not detailed in this presentation)
• Daily work on adding value to a massive volume of quality data
Adding value to massive quality data
Daily work focuses on three components:
1) Alliance with leading Information Providers (IPs)
2) Quality data = not only the coverage
3) Adding value for usability and relevance
Illustrated using the case of Russia
Alliance with leading Information providers (IPs)
Why?: local specificities require local players (sources, format, legislation, etc.)
Who?: more than 100 specialists in their fields (company data, stock market info, scoring companies, etc.). Either the public source directly, when available (CCIR, OMPIC), or private companies able to collect directly from public sources and to gather data from various sources.
Alliance with leading Information providers (IPs)
How?: evaluation + monitoring of the best source. Long-term, strong win-win relationship
Selection of the best IP (quality + quantity)
• Quantity: the determining factor when deciding on the partnership (ex. Romania)
• Quality: where coverage is similar, quality is the important criterion (ex. Russia, Poland)
• Monitoring
Process: IPs regularly send data to BvD for integration into the broad range of products (ex. Qin -> Oriana -> Orbis)
Alliance with leading Information providers (IPs)
Monitoring
• On a monthly basis, by evaluating the evolution of the population and its quality/timeliness
• By comparing the data with that of other potential suppliers or competitors
Process: IPs regularly send data to BvD for integration into the broad range of products (ex. Qin -> Oriana -> Orbis)
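The monthly check on the evolution of the population can be sketched as a comparison of company counts between successive deliveries. This is only an illustration: the function names and the 5% drop threshold are assumptions, not BvD's actual rules.

```python
def population_change(previous: int, current: int) -> float:
    """Relative change in the number of companies between two deliveries."""
    if previous <= 0:
        raise ValueError("previous population must be positive")
    return (current - previous) / previous


def flag_deliveries(counts: list[int], max_drop: float = 0.05) -> list[int]:
    """Return the indexes of deliveries whose population fell by more than max_drop.

    max_drop is a hypothetical threshold chosen for the example.
    """
    flagged = []
    for i in range(1, len(counts)):
        if population_change(counts[i - 1], counts[i]) < -max_drop:
            flagged.append(i)
    return flagged


# Four monthly deliveries from one IP; the third loses about 11% of its companies.
monthly_counts = [1000, 1010, 900, 905]
print(flag_deliveries(monthly_counts))  # [2]
```

A flagged delivery would then be raised with the IP rather than rejected automatically, in line with the monitoring described on the slide.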
Alliance with leading Information providers (IPs)
Challenges?:
• Improving content (among others, all registered companies in Turkey), structuring information (addresses), collecting new data items (legal events, GPS coordinates)
• Sourcing new countries where partially structured company information exists (Africa, South America)
Russia: several possible partners. Long-term relationship. Unrivalled quality (various sources). Network (importance of local specialists)
Quality is not only coverage
• Coverage & timeliness
• Quality & accuracy
• Usability & relevance
Quality is not only coverage
Timeliness: network of IPs able to report any change, minimizing the time lag in exchanging information. BvD is able to process daily data relating to 150 million entities and make it available online quickly and easily.
Accuracy: quality tests applied at every stage of data integration and on all datasets (consistency). Own proprietary hybrid approach (automated and manual testing)
Russia: doubtful accounts are identified and checked. Several sources used. Top companies monitored. Reporting/communication with IP
Quality is not only coverage
• QC automation (Mozart)
• Client feedback
• Communication with IPs
Quality is not only coverage
Mozart
New Mozart data flow (1/2)
The data passes through the following stages between the IP and the BvD product:
• Delivery of the data from the IP to BvD
• Preliminary checks and validation of the data received by BvD
• Formatting of the IP data in a standard ‘readable’ format for BvD, for the ‘restrictive’ company data, but also for the ownership data and for the contacts data
• Validation of the several output files
• Creation of the BvD databases
• Validation of the final data, its processing, and its conformity with BvD software
• Publication of the data
New Mozart data flow (2/2)
Each dimension is tested at the correct stage in order to optimize the reaction time and the production process. For the stages that include validation of data, the output may be:
• Red alert: the process is stopped because critical information is missing
• Orange alert: the process may continue but needs manual intervention
• Green alert: the process continues but a list is produced for information purposes
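The red/orange/green outcome of a validation stage can be illustrated with a minimal sketch. The field names and the rules deciding which gap triggers which alert are hypothetical; the slides do not describe Mozart's actual tests.

```python
from dataclasses import dataclass
from enum import Enum


class Alert(Enum):
    GREEN = "green"    # process continues; issues are listed for information
    ORANGE = "orange"  # process may continue but needs manual intervention
    RED = "red"        # process is stopped: critical information is missing


@dataclass
class CheckResult:
    field_name: str
    alert: Alert
    detail: str = ""


# Hypothetical rule set, chosen only for the example.
CRITICAL_FIELDS = ("company_id", "name")
REVIEW_FIELDS = ("address",)
SEVERITY = {Alert.GREEN: 0, Alert.ORANGE: 1, Alert.RED: 2}


def check_record(record: dict) -> list[CheckResult]:
    """Run the example checks on one record and return every issue found."""
    results = []
    for name in CRITICAL_FIELDS:
        if not record.get(name):
            results.append(CheckResult(name, Alert.RED, "critical field missing"))
    for name in REVIEW_FIELDS:
        if not record.get(name):
            results.append(CheckResult(name, Alert.ORANGE, "needs manual review"))
    if record.get("employees") is None:
        results.append(CheckResult("employees", Alert.GREEN, "not reported"))
    return results


def stage_status(records: list[dict]) -> Alert:
    """Return the most severe alert raised by any record in the stage."""
    worst = Alert.GREEN
    for record in records:
        for result in check_record(record):
            if SEVERITY[result.alert] > SEVERITY[worst]:
                worst = result.alert
    return worst
```

With these example rules, a dataset containing a record without `company_id` yields `Alert.RED` and would stop the stage, while a record lacking only `address` yields `Alert.ORANGE`.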
Production Amadeus (1993-2009)
[Diagram: data from several local IPs is formatted, then passes through a single QC step and production at BvD into the Amadeus & Oriana databases, accessed through the software to read/access data. Timing labels, in order: once a month; once a month, needs two days; once a month, needs around one week.]
Mozart: Production from 2010
[Diagram: data from each local IP goes through its own formatting and QC steps at BvD, followed by production and a final QC, into the Amadeus & Oriana databases, accessed through the software to read/access data. Timing labels: QC each time a dataset is received, needs a few hours; production once a week, needs around two days.]
Consequences: organization
• Reaction time towards IPs: as soon as data is provided on FTP
• Dynamic/immediate process: the first stage of data processing starts as soon as data is copied
• Quick process: indexation in less than 24 hours
• Weekly updates: instead of monthly, as before
• Independence of analysts: no need to contact the indexer to check/validate expected changes
Consequences: quality
• The same tests are applied to every single dataset provided to BvD
• Exhaustive tests of data: data structure & formatting, statistics, coherence, logic
• Follow-up of changes: facilitated through direct contacts and ease of access to data content
• Categorization of issues: red/orange/green. More objectivity.
• Permanent IP quality monitoring: facilitated through individual tests
Consequences: transparency
Enhanced communication and anticipation of changes
• External: upstream (cfr IPs’ information letters when delivering data) or downstream (cfr client exports, intranet news, reporting IPs)
• Internal: PM/QC/Support collaborate and share information; transfers to Orbis, pre-sales stats and reports easily available
Quality is not only coverage
Client feedback
• Really useful for refining the quality of the databases in detail
• Who to contact: staff directory (trombinoscope)
Communication with IPs: cfr Carole’s presentation
Adding value to data
BvD applies its expertise daily to data enrichment.
• Data content is matched and aggregated in order to build one unique, comprehensive source of information (unique identifier)
• Local references are mapped to international standards (among others, activity codes)
• Financial formats are harmonized to allow cross-border analysis
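Mapping local references to an international standard amounts to a lookup that also reports what could not be mapped. The sketch below assumes a simple dictionary-based table; the codes are placeholders, not real classification codes, and the function name is hypothetical.

```python
# Hypothetical mapping table; the codes are placeholders, not real classifications.
LOCAL_TO_STANDARD = {
    "LOC-001": "STD-A",
    "LOC-002": "STD-A",
    "LOC-003": "STD-B",
}


def map_activity_codes(records, mapping=LOCAL_TO_STANDARD):
    """Attach a standard code to each record and collect local codes with no mapping."""
    mapped, unmapped = [], set()
    for record in records:
        local = record["local_code"]
        standard = mapping.get(local)
        if standard is None:
            unmapped.add(local)  # kept for manual follow-up
        mapped.append({**record, "standard_code": standard})
    return mapped, unmapped


records = [
    {"id": 1, "local_code": "LOC-001"},
    {"id": 2, "local_code": "LOC-999"},
]
mapped, unmapped = map_activity_codes(records)
print(mapped[0]["standard_code"])  # STD-A
print(unmapped)                    # {'LOC-999'}
```

Returning the unmapped codes separately keeps the records in the output while making the gaps visible for review.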
Adding value to data
• Bespoke collection of critical data items (URLs, individuals’ e-mails, activity descriptions, etc.)
• Dedicated BvD teams cross-check information received from IPs and enrich content (ownership, contacts)
• Getting the best out of the data (indexation of news)
Russia: managing the Cyrillic alphabet (translation/transliteration); accounting format changes every 3 years (harmonized presentation on Ruslana)
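Transliteration of Cyrillic names can be illustrated with a character-by-character table. The table below is a simplified, illustrative scheme; it is an assumption for the example and not necessarily the scheme used on Ruslana or any formal standard.

```python
# Simplified Russian-to-Latin table (illustrative, not a formal standard).
TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace each Cyrillic letter by its Latin equivalent, preserving case."""
    out = []
    for ch in text:
        latin = TABLE.get(ch.lower())
        if latin is None:
            out.append(ch)  # non-Cyrillic characters pass through unchanged
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Москва"))   # Moskva
print(transliterate("Газпром"))  # Gazprom
```

Transliteration only converts the script; translating company names or activity descriptions is a separate task that a lookup table like this does not address.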