1. Factiva Intelligent Indexing™ SLA 2004
2. Agenda Factiva Intelligent Indexing™
Application of Factiva Intelligent Indexing™
Pros and Cons
Quality Control
3. Factiva Intelligent Indexing™
4. FII Structure One universal taxonomy
Building blocks
Inclusive hierarchy
Polyarchy
Synonyms and alias names
Full descriptions
Variable depth and breadth
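The building blocks listed above (names, synonyms and aliases, full descriptions) could be held in a record like the following. This is only a minimal sketch: the `Term` class, the `T001` code, the description text, and the "Web browsers" alias are hypothetical and are not taken from the Factiva taxonomy.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One taxonomy entry (hypothetical layout, for illustration only)."""
    code: str                      # hypothetical identifier
    name: str                      # preferred label
    description: str = ""          # full description shown to users
    aliases: set = field(default_factory=set)  # synonyms and alias names


def build_lookup(terms):
    """Map every lowercase name and alias to its term."""
    lookup = {}
    for term in terms:
        for label in {term.name, *term.aliases}:
            lookup[label.lower()] = term
    return lookup


terms = [
    Term("T001", "Internet browsers",
         "Software for viewing web pages.", {"Web browsers"}),
]
lookup = build_lookup(terms)
```

With this layout a search on an alias resolves to the same entry as a search on the preferred name.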
5. Polyarchy Internet/Online services
E-commerce
Internet browsers
Internet portals
Internet search engines
Internet service providers
etc.
Computers
Computer hardware
Computer services
Computer stores
Networking
Semiconductors
Software
Applications software
GroupWare
Intelligent agents
Internet browsers
etc.
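A polyarchy like the one above, where "Internet browsers" appears under more than one branch, can be sketched as a term-to-parents map. The `Taxonomy` class is hypothetical, and the exact nesting (browsers directly under "Software", "Software" under "Computers") is an assumption, since the slide's indentation is not preserved here.

```python
from collections import defaultdict


class Taxonomy:
    """Tiny polyarchical taxonomy: a term may sit under several parents."""

    def __init__(self):
        self.parents = defaultdict(set)  # term -> set of direct parents

    def add(self, term, parent=None):
        self.parents[term]  # register the term even if it is a root
        if parent is not None:
            self.parents[parent]  # make sure the parent is registered too
            self.parents[term].add(parent)

    def ancestors(self, term):
        """All broader terms reachable through any parent path."""
        seen = set()
        stack = list(self.parents[term])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents[node])
        return seen


tax = Taxonomy()
tax.add("Internet/Online services")
tax.add("Computers")
tax.add("Software", parent="Computers")  # assumed nesting
tax.add("Internet browsers", parent="Internet/Online services")
tax.add("Internet browsers", parent="Software")  # second parent
```

Because the hierarchy is inclusive, an article coded "Internet browsers" would be retrievable from either branch by walking `ancestors`.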
7. FII Application Code mapping
Entity extraction
Rule-based system
Linguistic analysis software
Manual review
8. Code Mapping Most information providers supply some form of metadata, which is matched to the relevant Factiva indexing terms.
Advantages:
Easy and quick
Efficient use of existing data
Disadvantages:
Mismatches between coding schemes
Different interpretations of same concepts
Variable quality – which sources do you trust?
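As a rough illustration of code mapping, the sketch below looks up each provider code in a mapping table and reports the ones that have no match. The provider names, provider codes, and the `CODE_MAP` table are all hypothetical.

```python
# Hypothetical mapping table: (provider, provider code) -> indexing term.
CODE_MAP = {
    ("provider_a", "TECH"): "Computers",
    ("provider_a", "ECOM"): "E-commerce",
    ("provider_b", "software"): "Software",
}


def map_codes(provider, provider_codes):
    """Return (mapped terms, unmapped provider codes) for one article."""
    mapped, unmapped = [], []
    for code in provider_codes:
        term = CODE_MAP.get((provider, code))
        if term is None:
            unmapped.append(code)  # a mismatch between coding schemes
        else:
            mapped.append(term)
    return mapped, unmapped


mapped, unmapped = map_codes("provider_a", ["TECH", "SPORT"])
```

Keying the table on the provider as well as the code reflects the point that the same label can mean different things in different schemes.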
9. Entity extraction This tool finds company names, which are then compared to our controlled vocabulary.
Advantages:
Consistent
Precise
Disadvantages:
Ambiguous names
High maintenance costs
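A much simplified sketch of the idea: scan the text for known aliases and return the canonical names from a controlled vocabulary. The company names and the `VOCAB` table are invented for illustration, and plain regular-expression matching stands in for whatever extraction tool is actually used.

```python
import re

# Hypothetical controlled vocabulary: alias -> canonical company name.
VOCAB = {
    "Example Corp": "Example Corporation",
    "Example Corporation": "Example Corporation",
    "Acme": "Acme Holdings plc",
}

# Longest aliases first so "Example Corporation" wins over "Example Corp".
_aliases = sorted(VOCAB, key=len, reverse=True)
_pattern = re.compile(r"\b(" + "|".join(re.escape(a) for a in _aliases) + r")\b")


def extract_companies(text):
    """Return sorted canonical names for every vocabulary alias in text."""
    return sorted({VOCAB[m.group(1)] for m in _pattern.finditer(text)})


found = extract_companies("Acme said it would buy Example Corporation.")
```

An alias such as "Acme" shows where the ambiguity problem comes from: the same string could refer to several companies, and each new alias is another vocabulary entry to maintain.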
10. Symbology Snapshot
11. Rule-based system Sets of IF-THEN statements established by editors, information architects, or subject-matter experts.
Advantages:
Good at highly formulaic content
Precise
Disadvantages:
Need thousands of rules for a complete system
Maintenance of the rules themselves becomes VERY expensive!
Only captures explicit concepts
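The IF-THEN idea can be sketched as a list of (required phrases, code) pairs. The phrases and codes below are hypothetical, and a production rule language would be far richer than substring tests.

```python
# Each rule: IF all required phrases appear THEN assign the code.
# Phrases and codes are hypothetical.
RULES = [
    ({"profit warning"}, "Earnings"),
    ({"merger", "shareholders"}, "Mergers/Acquisitions"),
]


def apply_rules(text):
    """Return the codes of every rule whose phrases all occur in text."""
    lowered = text.lower()
    return [code for phrases, code in RULES
            if all(phrase in lowered for phrase in phrases)]


codes = apply_rules("Shareholders approved the merger on Friday.")
implicit = apply_rules("The two firms agreed to combine.")
```

The second call returns nothing even though the story is about a merger, which is the "only captures explicit concepts" limitation in miniature.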
12. Example
13. Linguistics-based categorization This tool is currently employed across all English, French, German and Spanish language publications. A combination of linguistic analysis and statistical algorithms allows new content to be compared to example data and coded appropriately.
Advantages:
Scales to millions of documents, thousands of categories, multiple languages
Copes well with change
Fits editorial workflow
Good fine-tuning tools – editorial control
Codes implicit as well as explicit concepts
Disadvantages:
Training time and cost
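The sketch below illustrates "comparing new content to example data" with a plain bag-of-words cosine similarity. This is a swapped-in textbook technique, not the linguistic tool described on the slide; the training texts, category names, and the 0.2 threshold are assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (no linguistic analysis in this sketch)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical training set: category -> example texts.
TRAINING = {
    "Software": ["new browser release fixes bugs",
                 "software update ships today"],
    "Semiconductors": ["chip maker expands wafer plant",
                       "new chip fabrication process"],
}

# One summed term vector per category.
PROFILES = {cat: sum((vectorize(t) for t in texts), Counter())
            for cat, texts in TRAINING.items()}


def categorize(text, threshold=0.2):
    """Return {category: score} for categories scoring at or above threshold."""
    vec = vectorize(text)
    scores = {cat: cosine(vec, prof) for cat, prof in PROFILES.items()}
    return {cat: s for cat, s in scores.items() if s >= threshold}


result = categorize("chip plant opens")
```

The training set is where the cost lies: every category needs enough good examples before its profile is useful.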
14. Editorial Control Set relevance levels
Maintain training set
Stop words - correlation and multiple meanings
Example: "Chechnya" was added as a stop word to the industries model, as it was triggering the freelance journalist code (because so many of them were dying there)
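Per-model stop words can be sketched as a filter applied before a model sees the text. The model names, the `STOP_WORDS` table, and the tokenizer are hypothetical; the point is only that a word hidden from one model stays visible to the others.

```python
import re

# Hypothetical per-model stop-word lists maintained by editors.
STOP_WORDS = {
    "industries": {"chechnya"},  # misleading correlation for industry codes
    "regions": set(),            # still a useful signal for region codes
}


def tokens_for_model(text, model):
    """Lowercase word tokens the given model is allowed to see."""
    blocked = STOP_WORDS.get(model, set())
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in blocked]


text = "Reporter killed in Chechnya"
industry_tokens = tokens_for_model(text, "industries")
region_tokens = tokens_for_model(text, "regions")
```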
15. Manual coding About 200 editors spread across main time zones
Advantages:
Humans easily grasp the gist of the story
Cope well with exceptions
Visible/Controllable
Disadvantages:
Very resource-intensive = Expensive
Slow
Inconsistent (subjective and temporal)
Not scalable
16. Review process Lists reviewed every three months for redefinitions, new codes, and expansion changes
Market research/customer feedback and behavior
Changes to parent schemes/standards
Editorial/Quality control feedback
Internal coding forum
45-day notice period
17. Quality control Sampling by editors
Scoring for precision and recall
Analysis by source, language, code, editor etc.
Feedback to editors and systems
Corrective action
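Scoring a sampled article for precision and recall comes down to comparing the codes that were assigned with the codes a reviewing editor says it should have. A minimal sketch follows; the function name, the example codes, and the convention of returning 1.0 for an empty set are assumptions.

```python
def precision_recall(assigned, expected):
    """Precision and recall of assigned codes against the reviewer's codes."""
    assigned, expected = set(assigned), set(expected)
    correct = len(assigned & expected)
    # Assumed convention: an empty set counts as a perfect score.
    precision = correct / len(assigned) if assigned else 1.0
    recall = correct / len(expected) if expected else 1.0
    return precision, recall


p, r = precision_recall(
    assigned=["Software", "Computers", "E-commerce"],
    expected=["Software", "Computers", "Networking", "Semiconductors"],
)
```

Here two of the three assigned codes are right (precision 2/3) and two of the four expected codes were found (recall 1/2). Aggregating such scores by source, language, code, or editor gives the breakdown listed above.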
18. Results Three million articles coded a month
All receive a level of autocoding
Seventy-nine percent automation: more than two million articles are auto-coded with no further manual review
19. Recap Factiva’s taxonomy is Factiva Intelligent Indexing™
Factiva uses a hybrid methodology for application
Factiva has a coding team for governance and maintenance
End result: Factiva Intelligent Indexing™ leverages our editorial strengths, combining human experience and expertise with the latest automation software to implement a completely flexible and granular indexing system across all of our content.