1. 1 The Challenges of Building Enterprise Content Taxonomies and the Role of Classification Technologies in Maintaining Their Effectiveness
Reginald J. Twigg, Ph.D. (rtwigg@us.ibm.com)
ECM Capture and Content Integration, IBM
This session provides an overview of the role of taxonomy and classification management in ECM, outlines IBM’s strategy for enabling ECM (starting with FileNet P8) classification and taxonomy management, and introduces the offering. This session is a Classification for ECM 101, starting at a high level with the key challenges of managing unstructured enterprise content, then moving into more detail on the CM maturity model and, finally, the offering.
No other vendor can provide such a broad set of capabilities and amass so many experts in ECM.
This is a key IBM differentiator and a customer value.
2. 2 Agenda
The Challenge of Unstructured Content
Key Concepts and Terms
Taxonomy, Classification and ECM Adoption
The Case for ECM Classification Technologies
3. 3 The Challenge of Managing Unstructured Content
4. 4 Is this something I should save? Where should I put it so we can find it later?
What’s the right folder for this document? What’s the “document type”? Is this document a “record”? How should it be filed? Modern information economy workers have to make these kinds of content-centric decisions every day. (This is especially true when their organizations are compliance-minded or adhere to a records management program.)
Making these daily decisions on your content is not optional. Furthermore, new uses of the content in your organization, such as a records management initiative, require new decisions to be made about that content. These decisions about content are acts of classification. If taxonomy is a noun, classification is a verb.
---------------------------------
Taxonomy: A hierarchical structure of information components, any part of which can be used to classify a content item in relation to other items in the structure.
Classification: A coding of content items as members of a group for the purposes of cataloging them or associating them with a taxonomy. That is, classification is the act of placing a piece of content within that collection.
5. 5 Taxonomies provide context for structured data fairly easily – this is being covered quite well by most of the presentations here at Taxonomy Boot Camp. But what about unstructured data, such as Word documents, PDFs, and so on? The Gartner Group estimates that unstructured content is more than four times as large as structured data. Not only does unstructured content dwarf structured data, but it is growing at a rate of 30% per year. This means that your unstructured data is roughly doubling every three years!
[TO AUDIENCE] How many of you have SharePoint?
How many attorneys, do you think, have extraordinarily valuable documents just sitting on their desktops?
How are you, as a knowledge worker in the information economy, going to take action on all of this information?!
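The growth arithmetic in these notes can be checked with a short calculation. This is a minimal sketch, assuming steady compound growth at the quoted 30% per year; the function names are illustrative and not part of the presentation.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a volume to double under steady compound growth."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def growth_factor(annual_growth_rate: float, years: int) -> float:
    """Multiple of the starting volume after the given number of years."""
    return (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    rate = 0.30  # the 30% annual growth figure quoted in the notes
    print(f"Doubling time: {doubling_time_years(rate):.2f} years")
    print(f"Volume after 3 years: {growth_factor(rate, 3):.2f}x")
```

Under that assumption the volume doubles in a little over two and a half years and is about 2.2 times its starting size after three, so "doubling every three years" is, if anything, a slightly conservative reading of the 30% figure.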
6. 6 80% of Enterprise Data is Unstructured
There are two key categories of enterprise information – data and content. Data are managed by relational databases. The advent of the relational database ushered in a wave of enterprise applications, starting with ERP, then CRM, Supply Chain Management, etc. These business solutions leveraged the database to help organizations manage their structured transactional business processes.
However, most enterprise information – especially that used in many core business processes for services, government and other sectors – is unstructured. According to Gartner, at least 80% of enterprise information is unstructured, and many organizations report growth at over 100% per year. Because unstructured content is the substance of many critical line-of-business processes, it has become necessary to deploy new technologies for managing it.
7. 7 What is Enterprise Content?
According to Forrester, unstructured content falls into 3 main categories – transactional, business and persuasive. Imaging customers are most familiar with transactional content, such as forms, faxes, reports and scanned images. Business content includes process-centric and collaborative document management, while persuasive content includes digital assets and web-based content.
Unstructured enterprise content is tightly coupled with business processes. Content is created in processes, changed through business processes and also drives those processes. Finding ways to manage these content forms drives the effectiveness of many core business processes in customer service, case management, document management, etc.
8. 8 Content is Not Managed in One Place
According to Forrester, Global 2000 organizations have multiple content repositories. Each repository has one or more content management applications built on top of it. Most notable is that a quarter of large enterprises have more than 15 content repositories with dependent applications. This can be a real barrier to setting policies and standards for the enterprise, since the organization must interpret the different organizational concepts and schemas across different content management repositories.
Please note that these statistics include mainstream content management repositories (e.g., FileNet, IBM, OTEX, DCTM) and not other content management systems.
9. 9 And Then There’s SharePoint, File Shares and . . .
. . . Which leads us to SharePoint. The ease of deployment has made Microsoft SharePoint the choice for many departmental collaboration and simple content management solutions. These solutions, including Quickr, Team Rooms and other departmental content management applications, are loosely organized, relying on users to classify and manage the lifecycle of their content.
The challenge is to gain access to and manage this content without sacrificing productivity. Being able to classify this content is foundational to being able to manage it.
They look out over that horizon, look out over their own enterprise to find all that other content they’re determined to bring under management. But it’s not just one file system, or one SharePoint instance. These repositories, these silos, they’re everywhere. There could be hundreds of SharePoint repositories. There could be hundreds of file shares. Silos associated with basic content services can litter the enterprise.
Maybe this content is classified. Probably not.
And to make matters worse, even if there is a taxonomy for each of these content silos, and the content is all classified, it isn’t likely to match the standard information model you’ve built into your ECM architecture. Each of them can come with its own designed taxonomy . . . or an organic taxonomy – a “folksonomy”.
10. 10 Where do I start?
We’ve got 600 GB of content from basic content services all over the enterprise. How can we get this content efficiently mapped into our ECM taxonomy?
We’ve been managing our content without classifying it for a few years now. How can our users navigate amongst this existing content in a way that’s intuitive for our business?
The lawyers have to review 400,000 electronic documents for their case. How can we make sure they don’t waste their time?
You’re likely familiar with the statistic by now – 80% of the information in the enterprise is unstructured (as opposed to structured, database content). By bringing that content under management with ECM and cataloguing it, enterprises are working hard both to save time and cost by further leveraging the content and to manage risk and enforce the compliance of that content.
To meet those ends, we’ve seen our customers begin to struggle with organizing this content. Organizing the content, lending structure to this unstructured tangle of documents and email, leaves enterprises thinking, “Where do I start?” Getting this content organized is a growing hurdle in our customers’ adoption of ECM. Some examples:
Customers find that they simply cannot classify content as they bulk ingest it into ECM. There is simply no one to manually address the problem of getting hundreds and hundreds of gigabytes of content mapped into their standard ECM taxonomy.
It’s not just the new content – it’s also the content that you might already have under management. There wasn’t a classification solution that you had the time or budget for in the past, and now you’re left with content that’s unclassified. Yes, you have it under management, but it’s still relatively unstructured – you did the bare minimum. Now you’re having trouble finding this content and accessing it in a way that is intuitive for your business.
Finally, you can’t always proactively organize your content – new applications (like a legal discovery review process) might demand new ways of approaching the information. Maybe you need to prioritize your content and create workflows that make sure your lawyers are devoting the bulk of their attention to the right area.
11. 11 Key Concepts and Terms
Now we turn to classification and taxonomy management. Before diving deeper into the role of taxonomy and classification management in ECM, it is important to level-set on the key concepts and definitions.
Although many of you have experience with classification (here ask how many audience members are currently working on classification or taxonomy projects), the terminology can vary. This section attempts to provide some clarity on these concepts.
12. 12 Key Concepts
Taxonomy: a hierarchical structure of information components, any part of which can be used to classify a content item in relation to other items in the structure.
Classification: a coding of content items as members of a group for the purposes of cataloging them or associating them with a taxonomy.
Metadata: a means of describing, locating, cataloging, and activating content as objects in a software ecosystem (literally, data about data).
Enterprise Catalog: a centralized and normalized metadata model for unstructured content for the purposes of providing consistent services across all ECM applications.
Taxonomy - Paraphrased from Gartner: “A taxonomy is a classification, typically hierarchical, of information components (for example, terms, concepts, graphics and sounds) and the relationships among them. In a hierarchical structure, it reflects increasing levels of specificity the further down the hierarchy a particular element lies. Taxonomies may be used to represent membership in various domains, and can support information organization, discovery, presentation and access. The taxonomic organization and its labels serve as metadata for the content they organize.” Hype Cycle for the High Performance Workplace 2006, Rita E. Knox et al. 14 July 2006 (ID: G00139954), p. 34.
Classification - Classification is rooted in natural and library sciences for the purposes of identification and location. Wikipedia provides the most useful definition and analogy for ECM classification: “A library classification is a system of coding and organizing library materials (books, serials, audiovisual materials, computer files, maps, manuscripts, realia) according to their subject and allocating a call number to that information resource. Similar to classification systems used in biology, bibliographic classification systems group entities that are similar together typically arranged in a hierarchical tree structure (assuming non-faceted system). . . . Classification of a piece of work consists of two steps. Firstly the 'aboutness' of the material is ascertained. Next, a call number based on the classification system will be assigned to the work using the notation of the system. . . . Classification systems in libraries generally play two roles. Firstly they facilitate subject access (See Cutter) by allowing the user to find out what works or documents the library has on a certain subject. Secondly, they provide a known location for the information source to be located (e.g. where it is shelved).”
13. 13 Taxonomy Is . . .
Not turning animals into trophies
A system for organizing the corpus of business content
Before the builds, point out that we often get asked what taxonomy is. I have been asked if it is about stuffing animals. Hit the first build – what taxonomy is not.
The second build explains what it is in simple terms. The library image also provides the organizing principles for ECM and the use of the enterprise catalog for organizing and providing access to the corpus of content.
14. 14 Taxonomy and Classification in ECM
Classification Examples:
Document Classing
Foldering
Taxonomy Examples:
Enterprise Content Catalog
Industry Standard Document Taxonomies (ISO, XMI)
Methods:
Rules-Based: Applies pre-determined rules for ‘if, then’ classification of text and properties
Analytics-Based: Applies algorithms to interpret classes in order to apply classification rules to them
These are the most fundamental types of classification and taxonomy. Most ECM customers will be familiar with these concepts, so they should provide a baseline for the remaining discussion.
2 key methods of classification:
Rules-based is the most common and is very useful for classification. Rules-based classification allows setting pre-determined rules for classing content, then determining the appropriate actions to take on it. The ‘if-then’ logic of rules-based approaches quite literally performs classification (the ‘if this’ clause classifies a content item) in the process of taking action on it. Records Crawler is a solid rules-based engine for classification and provides a useful tool for many classification tasks.
Analytics-based classification provides content intelligence as a way of understanding content that cannot be easily classified through pre-defined rules. Using advanced analytical algorithms, analytics-based classification can learn a taxonomy and interpret unknown content to automate its classification. These approaches have the benefits of reducing the number of exceptions to manage and of helping classify content in file shares and collaboration solutions, where little or inconsistent metadata is common.
These approaches are complementary – rules-based classification can address most content, while analytics-based classification helps manage exceptions. Today, the IBM Classification Module (available through the OmniFind offering) provides out-of-the-box analytics classification.
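The complementary use of the two methods can be sketched in a few lines. This is an illustration only, not how Records Crawler or the IBM Classification Module works: the rules, class names and document properties are hypothetical, and the "analytics" step is a deliberately simple word-overlap score learned from a handful of labeled examples.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical rules: each pairs a predicate over a document's
# properties with the class to assign when the predicate matches.
Rule = tuple[Callable[[dict], bool], str]

RULES: list[Rule] = [
    (lambda doc: doc.get("extension") == ".msg", "Correspondence"),
    (lambda doc: "invoice" in doc.get("title", "").lower(), "Invoice"),
]


def classify_by_rules(doc: dict) -> Optional[str]:
    """Return the class of the first matching rule, or None (an exception)."""
    for predicate, doc_class in RULES:
        if predicate(doc):
            return doc_class
    return None


def train(examples: list[tuple[str, str]]) -> dict[str, Counter]:
    """Learn per-class word counts from (text, class) training examples."""
    model: dict[str, Counter] = {}
    for text, doc_class in examples:
        model.setdefault(doc_class, Counter()).update(text.lower().split())
    return model


def classify_by_analytics(text: str, model: dict[str, Counter]) -> Optional[str]:
    """Pick the class whose training vocabulary overlaps the text most."""
    words = text.lower().split()
    scores = {c: sum(counts[w] for w in words) for c, counts in model.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


def classify(doc: dict, model: dict[str, Counter]) -> Optional[str]:
    """Try the rules first; fall back to the learned model for exceptions."""
    return classify_by_rules(doc) or classify_by_analytics(doc.get("text", ""), model)


# Illustrative training data, invented for this sketch.
MODEL = train([
    ("master services agreement between the parties", "Contract"),
    ("amount due payment terms net thirty days", "Invoice"),
])

print(classify({"title": "Invoice 1001", "extension": ".pdf"}, MODEL))
print(classify({"title": "scan_0042", "text": "agreement between the parties"}, MODEL))
```

The first document is caught by an 'if-then' rule on its title; the second has no useful metadata, so it falls through to the learned model, which is the exception-handling role the notes describe for analytics-based classification.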
15. 15 ECM Taxonomy Illustrated
Before we go any further, let’s pause to define some terms. Taxonomy can mean different things to different people. In the spirit of “a picture is worth a thousand words”, we’ve put up some examples of taxonomies.
Let’s build up to a definition of a taxonomy.
Metadata is data about content. Think of metadata as an attribute of a piece of content or a tag associated with a piece of content.
In turn, a taxonomy is a collection of the different metadata values for a particular attribute (or category). All the different values an attribute could have are collected into a metadata structure called a taxonomy.
Some easy examples of taxonomies within the context of P8 are folders or document classes. But really, any other attribute that has been designed into your ECM deployment makes up a taxonomy.
The act of deciding where in a particular taxonomy a document should be placed is the act of classification. And that decision – that action of classification – is the basis of the solution we’ll discuss today.
==================
Other notes to the speaker:
Some outright, formal definitions:
Metadata: A means of describing, locating, cataloging, and activating content as objects in a software ecosystem (literally, data about data)
Taxonomy: A hierarchical structure of information components, any part of which can be used to classify a content item in relation to other items in the structure
Classification: A coding of content items as members of a group for the purposes of cataloging them or associating them with a taxonomy
===========================
Taxonomy - Paraphrased from Gartner: “A taxonomy is a classification, typically hierarchical, of information components (for example, terms, concepts, graphics and sounds) and the relationships among them. In a hierarchical structure, it reflects increasing levels of specificity the further down the hierarchy a particular element lies. Taxonomies may be used to represent membership in various domains, and can support information organization, discovery, presentation and access. The taxonomic organization and its labels serve as metadata for the content they organize.” Hype Cycle for the High Performance Workplace 2006, Rita E. Knox et al. 14 July 2006 (ID: G00139954), p. 34.
Classification - Classification is rooted in natural and library sciences for the purposes of identification and location. Wikipedia provides the most useful definition and analogy for ECM classification: “A library classification is a system of coding and organizing library materials (books, serials, audiovisual materials, computer files, maps, manuscripts, realia) according to their subject and allocating a call number to that information resource. Similar to classification systems used in biology, bibliographic classification systems group entities that are similar together typically arranged in a hierarchical tree structure (assuming non-faceted system). . . . Classification of a piece of work consists of two steps. Firstly the 'aboutness' of the material is ascertained. Next, a call number based on the classification system will be assigned to the work using the notation of the system. . . . Classification systems in libraries generally play two roles. Firstly they facilitate subject access (See Cutter) by allowing the user to find out what works or documents the library has on a certain subject. Secondly, they provide a known location for the information source to be located (e.g. where it is shelved).”
16. 16 Taxonomy, Classification and ECM Adoption Now let’s connect the dots – how classification and taxonomy play in ECM.
17. 17 Business Drivers for ECM Taxonomy Management Proliferating departmental solutions
Content Management
Collaboration (SP, Quickr, Team Rooms, Wikis)
User-based classification and high workforce turnover
Productivity declines as knowledge disappears
Legal discovery is a secondary concern
Mergers and Acquisitions – need to reconcile disparate content management practices, repositories and processes
Challenges – the growing number of departmental solutions is an obvious challenge. How do organizations balance the need for knowledge-worker productivity with establishing and maintaining corporate standards for business process and governance?
High workforce turnover risks a loss of knowledge and understanding of the uses of content. Although lawsuits are an obvious example (an employee leaves and then sues the company), the loss of productivity that occurs from the loss of knowledge about the location and uses of content is a higher, more measurable cost. Studies have suggested that the average company invests $50,000 to train new knowledge workers. Add to this the cost of recreating documents that may be ‘out there somewhere’, and turnover can add significant expense to relying on users to manage classification.
M&A is another obvious challenge. Not only must different content be accessed and integrated, but this content is always born of different processes, users and work cultures. Being able to understand acquired taxonomies is essential to integrating them into the new ECM system.
18. 18 Classification is Hard Work
19. 19 Organization is the Root Cause Most organizations face content taxonomy barriers – especially as they standardize around ECM
Assigning categories en masse
Reclassifying existing content as taxonomies evolve
Merging taxonomies
Integrating the wisdom of folksonomies
In Enterprise Content Management, taxonomies ensure that content is accurately catalogued and easily accessible. Having consistent and reliable access to unstructured content is arguably the foundation for realizing the business benefits of ECM, and all subsequent content-centric enterprise applications will realize their ROI by leveraging this essential capability.
Enterprise information architects see the standardization of content under a single set of rules and policies as a key driver for ECM adoption. ECM platform standardization on IBM FileNet P8 enables the management of unstructured content across the enterprise by consolidating its metadata into a single, unified catalog. Standardization also raises a unique challenge – namely, how to manage content mapped under widely different metadata structures (i.e. different taxonomies) spread across multiple departments, repositories, and applications.
This manifests itself in some pain points that you yourself might be pondering or tackling right now.
1) The easiest, most basic use case to think of is bringing new content under management into ECM. It’s stored on a fileshare and you need to accurately catalog it – thousands of documents coming in ‘en masse’, requiring classification.
2) The next pain point is related – what if you’ve already brought content under management, and either because you didn’t have the facility before or the need didn’t exist at ingestion time, the content doesn’t have the proper classification. You need to reclassify your content.
3) The business scenarios go on . . . You might have merged with a company with its own ECM. How are you going to rectify the conflicting taxonomies?
4) You’ve got taxonomies distributed through the enterprise – they don’t exactly match your standard taxonomy – but they’ve got interesting aspects to them – how are you going to normalize your standard taxonomy to take this wisdom into account?
20. 20 Challenges and Impacts of Merging Taxonomies Misclassification – change is constant, and master taxonomies must manage multiple custom taxonomies for each content source
“Folksonomies” from departmental collaboration solutions are created by users and unmanaged by ECM standards
Impact:
Unreliable Metadata – Inconsistencies lose or mislabel content
Process Misfires – Poor metadata triggers incorrect events and workflows
Information challenges.
Unreliable metadata is particularly a challenge with BCS (Basic Content Services) and local collaboration solutions (SharePoint, Quickr). These solutions rely on users to manage metadata, if it is managed at all. Without consistent metadata, traditional classification methods break down.
“Folksonomies” are user-created, purpose-built taxonomies. Systems like SharePoint and Team Rooms allow users to create and use their own classifications. While folksonomies might have their work purpose, they create a challenge for applying consistent standards to their content.
Classification flux is the fact that taxonomies are dynamic and constantly subject to change. For this reason classification is not a one-time task but must be performed on a regular basis to ensure that taxonomies do not fall out of sync.
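One way to picture the folksonomy problem is as a translation table from user-created tags to categories in the master taxonomy. The sketch below is a simplified illustration, not a description of any product feature; the table `TAG_TO_MASTER`, the category names and the function `translate_tags` are all hypothetical. Tags that have no mapping are set aside for review rather than silently dropped, and because taxonomies keep changing, the translation would need to be re-run regularly.

```python
# Hypothetical mapping from user-created tags (a folksonomy)
# to categories in a master taxonomy.
TAG_TO_MASTER = {
    "invoices": "Finance/Invoice",
    "bills": "Finance/Invoice",
    "nda": "Legal/Contract",
    "contracts": "Legal/Contract",
}


def translate_tags(tags):
    """Return (master categories, unmapped tags) for a list of user tags."""
    mapped, unmapped = set(), []
    for tag in tags:
        key = tag.strip().lower()  # normalize case and stray whitespace
        if key in TAG_TO_MASTER:
            mapped.add(TAG_TO_MASTER[key])
        else:
            unmapped.append(tag)  # keep the original tag for human review
    return sorted(mapped), unmapped


categories, review_queue = translate_tags(["Bills", "NDA ", "team-lunch"])
print(categories)    # ['Finance/Invoice', 'Legal/Contract']
print(review_queue)  # ['team-lunch']
```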
21. 21 Classification Barriers to ECM Maturity The keynote session provided an introduction to the ECM maturity model. Classification challenges can pose hurdles to moving up the maturity curve, specifically:
Build 1 (Ingestion): Moving from a siloed to a more integrated content management environment poses the challenge of being able to interpret and classify content across multiple silos. Getting metadata into FileNet P8 from multiple content silos is a major challenge, since understanding how to classify it often requires manual effort. At this stage, automation is needed to manage the volume and types of content.
Build 2 (Standardization): Establishing an ECM platform as the enterprise standard for content-centric business applications has demonstrable ROI in reducing operating costs and allowing the consistent capturing and application of policies across the organization. The hurdle to standardization is applying standards across multiple taxonomies. The inability to reconcile different taxonomies can lead to failure to standardize.
Build 3 (Enforcement): Once standards are set, classification is necessary to provide ongoing enforcement of content management policies across the organizations. Content must be understood in order to apply both policies and process to it. Classification and taxonomy serve as the basis for this understanding. Compliance typically surfaces the pain of classification and enforcement, although many content management applications raise the requirement.
Note: Often the challenge of taxonomy management is less about defining an enterprise taxonomy than about applying it to the different taxonomies that exist in work processes and different forms of content. In this case, taxonomy does not need to be imposed as much as it needs to interpret, understand, and classify content in different applications and sources.
22. 22 Lessons Learned From ERP Adoption Getting Classification Right: ‘Garbage in = garbage out’ is often used in metadata management projects to describe the problem of building a metadata model on inconsistent sources.
Driving Process on Taxonomies: ERP systems depend on 3 master taxonomies – material, vendor and customer. These taxonomies drive events, workflow definition and the development of transaction-centric business process applications
Mastering Metadata: The ability to deploy new enterprise applications depends upon the re-usability, scalability and integrity of the metadata model
System of Record is Required for Standardization:
Establishes an enterprise standard that can be audited
Forms the foundation for building demonstrable best practices
Enforces consistency of data capture and output
ECM has many lessons to learn from the mass adoption of ERP. SAP arguably became the market leader in this space by closely managing classification, taxonomy, metadata and process. ECM has similar challenges today.
Getting Classification Right: Managing data input requires discipline. ERP systems tightly controlled the classification of data and data entry to maintain the integrity of their metadata models. Much of the pain of ERP implementation was training users to enter data in a highly-structured, disciplined way. Classification is at the core of this process. ECM will have special challenges, however, since much unstructured content is produced by knowledge workers in their own processes. The challenge for ECM will be to classify in ways that maintain and enforce process and policies without interfering with productivity.
Driving Process on Taxonomies: SAP built the R/3 system on 3 master taxonomies – material, vendor and customer. The relationship among these 3 taxonomies formed the basis of their event model, workflow and application stack. This provided a consistent way of applying taxonomy to improve the efficiency of their business processes. The challenge for ECM adoption is to maintain productivity while providing the benefits of master taxonomies.
Mastering Metadata: Not a new concept to FileNet P8 users, mastering metadata for unstructured content ensures consistent access to, and interaction with, that content across different business functions and sources.
Establishing a System of Record: First used in data management in the 1980s, a ‘system of record’ provides a basis of accountability for all of the information contained in the system. This means it can pass audit (internal, external, regulatory, industry) and serves as the basis for demonstrating policy compliance. ERP systems established themselves as the ‘system of record’ for transactional business processes – most notably General Ledger, costing, inventory, procurement, etc. The challenge for ECM is to establish a system of record for unstructured content and content-centric business processes, records management, document management and other ECM processes.
23. 23 Customer Lessons for Mastering Taxonomies ‘Master’ taxonomy of record required for
Compliance
Business process applications
Merged master taxonomies become large and unwieldy
Multiple taxonomies require integration and translation
Centralized, decentralized, or hybrid?
Autoclassification increasingly is used to manage:
Taxonomy merging from multiple use cases
Taxonomy/folksonomy translation from distributed content sources
24. 24 The Case for ECM Classification Technologies This final section introduces the strategy and offering for classification and taxonomy management for FileNet P8.
25. 25 What are your classification options? Decision making of this sort has a long history in organizations. We can look back and recall how telephone calls were connected. <CLICK> It was a highly manual process – one that had common classification steps. These three common steps would be to analyze, decide, and take action. So for a telephone call, our manual operators would
1) Identify who’s calling
2) Decide to whom the caller should be connected
3) Finally, take action by creating the physical connection between the two.
Though not intellectually challenging, the process took a quantifiable amount of time and effort.
In modern terms, this type of classification process crops up frequently – think of our email inboxes. <CLICK> We need to follow the same process for foldering our email. Or our fileshare content. Or SharePoint. Every piece of unstructured content in your organization could face this kind of decision-making process for downstream applications such as records management or compliance initiatives, enterprise search, or general business processes.
Analyze: The user digests the content
Decide: Determine the right business process – what do we want to happen to the content?
Take Action: Execute the actual business process you have decided upon
Of course, this manual classification process assumes that the user understands the content, has the right background to make the decision, and doesn’t opt out.
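The analyze, decide, take action sequence can be sketched as a small rules-based pipeline. This is a toy illustration under stated assumptions, not how any particular classification engine works: the patterns in `RULES`, the category names and the `NeedsReview` fallback are all hypothetical.

```python
import re

# Hypothetical rules: (pattern to look for, category to assign).
RULES = [
    (re.compile(r"\binvoice\b|\bamount due\b", re.IGNORECASE), "Invoice"),
    (re.compile(r"\bagreement\b|\bhereinafter\b", re.IGNORECASE), "Contract"),
]


def analyze(text):
    """Analyze: collect the categories whose rule matches the content."""
    return [category for pattern, category in RULES if pattern.search(text)]


def decide(candidates):
    """Decide: take the first matching category, or flag for manual review."""
    return candidates[0] if candidates else "NeedsReview"


def take_action(name, category, repository):
    """Take action: file the document under the chosen category."""
    repository.setdefault(category, []).append(name)


repository = {}
for name, text in [
    ("a.txt", "Invoice #1001, amount due in 30 days"),
    ("b.txt", "This Agreement is made between the parties"),
    ("c.txt", "Lunch on Friday?"),
]:
    take_action(name, decide(analyze(text)), repository)

print(repository)
```

Unlike a person, the pipeline runs on every document, but it only knows what its rules encode, which is why anything unmatched is routed to review instead of being guessed.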
26. 26 Effective Accuracy Right now some of you are thinking… hey, that’s not bad. Humans are pretty accurate! And you are right – humans are very accurate, but only when they participate. ARMA reports that manual filing results in misfilings 2 – 7% of the time. Let’s use 5% as a reasonable mid-point (Gartner estimates your workers are even less accurate) and, for our purposes, credit humans with 95% accuracy.
Human beings can choose to opt out. Studies have shown that over time, participation drops off in manual electronic filing programs. These programs emphasize training users to make manual classification decisions, and the initial response to training is good. Then, over time, users deprioritize the task, the training fades and participation drops. We intuitively know this from our day-to-day working life – manual filing of content is not a primary objective of LOB workers – how often have you just chosen the first option in a dropdown because you’re in a rush? And we’ve seen this documented in studies at the National Archives. After the training fades, you can only expect 50% participation in manual filing.
[The National Archives and Records Administration – the custodian of all US records and archives – did a 6-month study, which culminated in 2002, of the ‘state of the art’ technology available at the time. Of course, this technology relied upon end users to make decisions about what was a record and how to declare/classify it. NARA, perhaps more than any other organization, has a special interest in managing records, and yet they couldn’t get their own users to consistently file records. As you can see in the chart, there was a significant drop-off after the training period, and 56% of people who participated in the study found the technology – which is still being marketed – “Burdensome” or “Extremely Burdensome” to use. It is interesting to note that some users did not file any records at all.]
<click> If we can only expect 50% participation in a manual classification program, then despite our workers’ accuracy in making decisions, their effective accuracy is much lower because they are not participating. With only half of the population participating actively, you’re left with about 48% effective accuracy (95% × 50%) instead of 95%. Automated methods of classification, in contrast, do not fall victim to the whims, motivations and priorities of the typical worker. With rules-based approaches, your accuracy depends primarily on how good a rule builder you are and how much you invest in creating and maintaining good rules. As such, your accuracy can vary widely. Rules, however, are coarse and unlikely to reach the same level of accuracy in filing that humans can. Taking these two factors into account, we estimate the accuracy of rules-based approaches to be between 40% and 70%. And unlike human-based approaches, the rules engine doesn’t lack motivation: its participation is 100%, leading to an effective accuracy of 40 – 70%.
Next let’s review context-sensitive approaches. The best rule builders in the world are never going to be as accurate as the best context-sensitive approaches, and research into automated classification has borne this out. The high end of accuracy for these approaches has been shown to be 90% and above. Filter this high accuracy through “perfect” participation, and you’ll get the best results of any individual method. Bottom line: humans are the most accurate classifiers in isolation, but when viewed through the lens of how well people participate, unaided humans become the least effective, least consistent and least reliable method of classifying content.
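The arithmetic behind “effective accuracy” is simply accuracy multiplied by participation. The short sketch below reproduces it with the figures quoted in these notes; the function name `effective_accuracy` and the method labels are only illustrative, and 100% participation for automated methods is the assumption made in the talk. The 48% quoted for manual filing is 95% × 50% = 47.5%, rounded up.

```python
def effective_accuracy(accuracy, participation):
    """Share of all content that ends up correctly classified."""
    return accuracy * participation


# (accuracy, participation) pairs quoted in the talk.
methods = {
    "manual (unaided)": (0.95, 0.50),
    "rules, low end": (0.40, 1.00),
    "rules, high end": (0.70, 1.00),
    "context-sensitive, high end": (0.90, 1.00),
}

for name, (accuracy, participation) in methods.items():
    print(f"{name}: {effective_accuracy(accuracy, participation):.1%}")
```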
27. 27 State of Classification Management Technologies ECM Classification/Taxonomy is an emerging discipline
Industry standard taxonomies:
Focus on business function or transaction types
Have not reached the enterprise level
Classification best practices:
Content ingestion
Application development reclassification
Classification software focuses on content ingestion:
Electronic content (email, Office documents, free-form text)
Paper content (document images) requires OCR
Search is not enough – must drive value in the business process
28. 28 Criteria For ECM Classification Management Solutions Integrate with and support the ECM metadata model
Interpret a highly-federated content ecosystem
Go beyond search to catalog and manage content
Build on advanced analytic technologies – rules alone are not enough
Interpret content to extract meaningful (meta)data
Employ multiple methods (engines) for classification
Integrate teaching/learning
29. 29 Classification Cost Model
All projects have comparable implementation costs
The operational costs are the key
We’ve covered the volume of content, we’ve covered accuracy, now let’s tackle the cost of classification.
Any system is going to cost some money in an upfront setup and deployment. Set that aside, the key area of differentiation in terms of cost is going to be ongoing cost.
We know, from a study by Cohasset Associates, that human beings cost 17 cents per document when you’re asking them to manually decide whether a document is a record and in turn execute the decision and action required to classify the content. Analyze the content, decide the right course of action, execute that course of action. Those three steps take a quantifiable amount of time for a human. It adds up to about 17 cents per document.
Automated solutions require none of that human involvement. The participation problem is eliminated with a rules-based or context-sensitive approach. Ongoing maintenance tasks for the automated solutions pale in comparison to the distributed productivity costs associated with manual classification.
Best case, manual classification is 17 times more expensive than automated methods! Run away from any experts who recommend pushing classification tasks down as deep as possible into an organization when it comes to deploying records management practices; the cost to actually pursue this so-called best practice and engage your entire organization in manual filing becomes prohibitive very quickly.
30. 30 With unstructured content growing ~30% annually, the time investment for non-automated methods is increasingly significant
Of course, information is not stagnant: unstructured content is growing by 30% every year.
Today, if we burden our end users with manual classification for records declaration, it takes up 2.5% of their time. That is, technology, with forced manual classification, would be reducing, not improving, your workers’ productivity. And with the explosive growth of unstructured content, this productivity drain could conceptually grow to 5.0%.
<walk through the math>
<Rhetorical question at end>: Here’s a knowledge worker who’s going through this 72 times a day. The question for you in the audience: do you get more than 72 emails a day? Who among us can afford a 2.5% hit on our productivity? The answer is no one can.
This is why manual classification is flawed.
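The move from 2.5% to 5.0% can be sketched with compound growth. This is a minimal sketch, assuming the time spent on manual classification scales in direct proportion to content volume and the time per item stays fixed; neither assumption is stated on the slide, and the function name is illustrative only.

```python
import math

ANNUAL_GROWTH = 0.30   # from the slide: unstructured content grows ~30% per year
CURRENT_SHARE = 0.025  # from the notes: 2.5% of a worker's time today


def share_after(years: float) -> float:
    """Share of work time spent classifying after `years` of growth.

    Assumption (not in the slides): time spent scales linearly with volume.
    """
    return CURRENT_SHARE * (1 + ANNUAL_GROWTH) ** years


# Years until the share doubles from 2.5% to 5.0% under compound growth.
doubling_years = math.log(2) / math.log(1 + ANNUAL_GROWTH)

print(f"Doubling time: {doubling_years:.1f} years")
for year in range(4):
    print(f"Year {year}: {share_after(year):.1%}")
```

Under these assumptions the share passes 5% between the second and third year.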
31. 31 Hard Data for Creating an ROI Case
17 cents per manual classification decision (Cohasset Associates)
Automated classification has an ongoing cost of much less than 1 cent/classification (IBM Estimate)
Average enterprise user has 25 emails per day (IDC, IBM customers)
Average user comes into contact with 70 pieces of content every day
Average business email size is 100 – 130 KB (IDC, IBM customers)
Information workers spend 9 – 10 hours a week looking for information. They waste about 3.5 hours each week on searches that don't turn up the right information. (IDC)
5% of email qualifies as “business records” (IBM Customers)
$1 buys 6 GB of disk
We can push this analysis into greater detail. Let’s begin with some hard data for building an ROI case…
17 cents per manual classification decision (Cohasset Associates)
15 seconds per manual filing: analyze, decide and execute
Assumes knowledge worker salary of $84K
Automated classification has an ongoing cost of much less than 1 cent/classification (IBM Estimate)
Compare 17 cents vs 1 cent per classification
Average enterprise user has 25 emails per day (IDC, IBM customers)
Average user comes into contact with 70 pieces of content every day
Average business email size is 100 – 130 KB (IDC, IBM customers)
Information workers spend 9 – 10 hours a week looking for information. They waste about 3.5 hours each week on searches that don't turn up the right information. (IDC)
5% of email qualifies as “business records” (IBM Customers)
$1 buys 6 GB of disk
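The 17-cent figure can be checked against the two inputs given in the notes (15 seconds per filing, $84K salary). This is a rough sketch, assuming 2,080 paid hours per year (52 weeks x 40 hours), which the notes do not state; the automated cost is taken at its 1-cent upper bound.

```python
ANNUAL_SALARY_USD = 84_000       # from the notes
SECONDS_PER_DECISION = 15        # from the notes: analyze, decide, execute
WORK_HOURS_PER_YEAR = 52 * 40    # assumption: 2,080 paid hours per year


def manual_cost_per_decision(salary: float, seconds: float, hours_per_year: float) -> float:
    """Cost in dollars of one manual classification decision."""
    cost_per_second = salary / (hours_per_year * 3600)
    return cost_per_second * seconds


manual_cost = manual_cost_per_decision(
    ANNUAL_SALARY_USD, SECONDS_PER_DECISION, WORK_HOURS_PER_YEAR
)
print(f"Manual: {manual_cost * 100:.1f} cents per decision")

# The slide puts automated classification at "much less than 1 cent",
# so 1 cent is an upper bound and the ratio below is a lower bound.
MANUAL_CENTS = 17
AUTOMATED_CENTS_UPPER_BOUND = 1
min_ratio = MANUAL_CENTS / AUTOMATED_CENTS_UPPER_BOUND
print(f"Manual is at least {min_ratio:.0f}x the automated cost")
```

The result comes out just under 17 cents, which is consistent with the Cohasset figure once rounded.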
32. 32 Productivity Savings with automated classification
Assume an enterprise with 10,000 employees:
10,000 x 25 emails per employee per day = 250,000 emails per day
Manual classification = 17 cents/email
If only 10% of email is examined and classified…
$4,250/day in productivity loss through manual classification.
$300K+ in savings within 12 weeks through automated classification
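The arithmetic on this slide can be reproduced in a few lines. This is a sketch using only the slide's inputs; the slide does not say how many days per week it counts toward the 12-week figure, so the code reports the number of days of email needed to reach $300K rather than assuming a calendar.

```python
import math

EMPLOYEES = 10_000
EMAILS_PER_EMPLOYEE_PER_DAY = 25
PERCENT_CLASSIFIED = 10          # "if only 10% of email is examined and classified"
MANUAL_COST_CENTS = 17           # Cohasset Associates figure
TARGET_SAVINGS_USD = 300_000     # the slide's "$300K+" claim

emails_per_day = EMPLOYEES * EMAILS_PER_EMPLOYEE_PER_DAY
classified_per_day = emails_per_day * PERCENT_CLASSIFIED // 100
daily_loss_usd = classified_per_day * MANUAL_COST_CENTS / 100

# Days of email volume needed before avoided manual cost reaches the target.
days_to_target = math.ceil(TARGET_SAVINGS_USD / daily_loss_usd)

print(f"Emails per day: {emails_per_day:,}")
print(f"Manual classification cost per day: ${daily_loss_usd:,.0f}")
print(f"Days to reach ${TARGET_SAVINGS_USD:,}: {days_to_target}")
```

The daily figure matches the slide's $4,250. The calculation ignores the ongoing cost of the automated system, which the deck puts at well under 1 cent per classification.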
33. 33 Storage costs savings with automated classification
Storing 100% of the email annually
Results in storing 15.6 billion messages
15.6 billion messages x 100 KB/message x $1/(6 GB) = ~$250K storage costs
One simple alternative to “saving everything”
90 day disposition for 50% of the email volume (the 50% with the lowest business value as judged by automatic classification)
Long term storage for the top 50% of the email volume
Using automated classification to dispose of bottom 50% after 90 days
Capacity for top 50% (long term storage):
15.6 billion messages x 50% x 100 KB/message x $1/(6 GB) = $125K
Capacity for bottom 50% (90 days rolling):
15.6 billion messages x 50% x 100 KB/message x $1/(6 GB) x ¼ = $31K
Total storage cost: $156K
$94K+ in hardware savings annually
What would the same 10,000-employee company save on hardware storage?
An example: Since only about 5% of email qualifies as business records, you can create an archival policy that disposes of a large percentage of emails with no business value.
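The storage comparison above can be sketched as a small calculation. The message count is taken from the slide as given rather than derived from the per-employee figures, and the sketch assumes binary units (1 GB = 1024 x 1024 KB), which the slide does not specify. Results land within a few percent of the slide's rounded numbers.

```python
MESSAGES_PER_YEAR = 15.6e9   # from the slide, taken as given
KB_PER_MESSAGE = 100         # from the slide (low end of 100 - 130 KB)
GB_PER_DOLLAR = 6            # from the slide: $1 buys 6 GB of disk
KB_PER_GB = 1024 ** 2        # assumption: binary units


def storage_cost_usd(messages: float, fraction_of_year_retained: float = 1.0) -> float:
    """Disk cost for `messages`, scaled by how much of the year they are kept."""
    gigabytes = messages * KB_PER_MESSAGE / KB_PER_GB
    return gigabytes / GB_PER_DOLLAR * fraction_of_year_retained


# Baseline: keep everything.
keep_everything = storage_cost_usd(MESSAGES_PER_YEAR)

# Tiered policy: top 50% kept long term, bottom 50% kept 90 days (about 1/4 year).
long_term = storage_cost_usd(MESSAGES_PER_YEAR * 0.5)
rolling_90_days = storage_cost_usd(MESSAGES_PER_YEAR * 0.5, fraction_of_year_retained=0.25)
tiered_total = long_term + rolling_90_days
savings = keep_everything - tiered_total

print(f"Keep everything: ${keep_everything:,.0f}")
print(f"Tiered policy:   ${tiered_total:,.0f}")
print(f"Annual savings:  ${savings:,.0f}")
```

Whatever the unit convention, the tiered policy costs 62.5% of the baseline (50% plus a quarter of 50%), so the saving is always 37.5% of the keep-everything cost.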
34. 34
35. 35 What are your classification options?
You can see where I am going here…
When you combine the increases in productivity with the enormous cost savings, a context-sensitive approach to classification is superior in head-to-head competition with manual or rules-based approaches. At IBM we have found that the very best results actually come when various automated methods are combined: there are always instances when context-sensitive, statistics-based approaches are improved by rules, and vice versa. Layering multiple methods will lead to your best results.
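One way to picture layering is a rule check that runs first, with a statistical score as the fallback. The sketch below is purely illustrative and is not IBM's implementation: the rule, the training snippets, the threshold and all names are hypothetical, and a toy token-overlap score stands in for a real statistical engine.

```python
from collections import Counter

# Hypothetical rule: an explicit phrase decides the class outright.
RULES = {"invoice": lambda text: "invoice number" in text.lower()}

# Hypothetical training examples for the statistical layer.
TRAINING = {
    "claim": ["policy holder reported damage to vehicle", "claim filed for water damage"],
    "loan": ["applicant requests mortgage loan", "loan application income statement"],
}


def tokenize(text):
    return text.lower().split()


def build_profiles(training):
    """Count token frequencies per class from the example documents."""
    return {
        label: Counter(tok for doc in docs for tok in tokenize(doc))
        for label, docs in training.items()
    }


def statistical_scores(text, profiles):
    """Toy score: matched token frequency over the profile's total token count."""
    tokens = tokenize(text)
    return {
        label: sum(profile[t] for t in tokens) / sum(profile.values())
        for label, profile in profiles.items()
    }


def classify(text, profiles, min_score=0.2):
    """Rules first; fall back to the statistical layer; else leave unclassified."""
    for label, rule in RULES.items():
        if rule(text):
            return label, "rule"
    scores = statistical_scores(text, profiles)
    best = max(scores, key=scores.get)
    if scores[best] >= min_score:
        return best, "statistical"
    return None, "unclassified"


profiles = build_profiles(TRAINING)
print(classify("Invoice number 123 attached", profiles))
print(classify("water damage claim for vehicle", profiles))
print(classify("lunch menu for friday", profiles))
```

The order of the layers is a design choice: here rules override the statistical score, but the same structure works with rules used only to break ties or to veto a low-confidence statistical result.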
36. 36 Where does automated classification improve ROI?
Loan origination
Claims processing
Correspondence management
Compliance management
Vertical document management
Legal/regulatory matter management and legal discovery
and more
Loan origination: better automated routing based on analysis of free-form descriptions and comments
Claims processing: claim prioritization, process routing, fraud estimation based on descriptions, correspondence, interviews, and more
Correspondence management: automated routing and handling based on determination of request type
Compliance management: automated identification & declaration of records from poorly managed sources like SharePoint and file systems
Vertical document management: accelerated solution deployment and reduced long-term cost with automatic mapping of existing documents into appropriate vertical taxonomy
Legal/regulatory matter management and legal discovery: faster organization and prioritization of content; adaptive similarity analysis
Similar value-add opportunities exist in process optimization for insurance, utilities, government-specific solutions, and more
37. 37 Summary of Automated Classification Benefits
Saves users’ time – by automating an otherwise manual process and by making content more easily searchable
Saves storage costs – by identifying content with limited or no business value
Cuts maintenance costs – through reliance on “learning by example” and taking real-time feedback from users
Higher accuracy through consistency – automated solutions don’t opt out of participating!
Realize the value of your ECM solutions – better organized content is more useful for other applications (search, records management, business processes)
[Note that a big part of these processes is paper. For example, for any given loan file you might have 100 different paper documents. IBM has added classification technology to our capture technology. Is this a claim? A police report? A photo?]
38. 38 Summary of Automated Classification Benefits
Opportunity cost savings – Users focus on critical business decisions, rather than mundane classification decisions.
39. 39