Introduction to Metadata: Accelerating Cisco's Business Intelligence. By Kristen Brennan. Version 1.1, July 23, 2003.
What Does "Metadata" Mean?
The prefix meta- means "beyond." A meta was a stone column that marked the end of a racecourse in Ancient Greece. The first chariot to pass beyond the meta won the race. The meta is still with us today as the modern goalpost. The word "data" is the plural of datum, which means "a fact." The Romans began letters with a datum: the name of the city in which the letter was written.
What Does "Metadata" Mean?
In the technology industry, the word metadata is usually defined as "data about data," often with an apology that this definition is too broad to be useful. For a deeper understanding of what metadata means, consider the Greek and Latin roots:
meta ("beyond") + data ("facts") = metadata
Metadata means "information which goes beyond mere facts."
What Is Metadata?
But if metadata "goes beyond facts," does that mean metadata is nonfactual? Hardly! Rather, metadata is an additional layer of facts which places the original information into a useful context. A classic example of metadata is the card catalog at a library. By indexing title, subject and author, this catalog provides metadata about the original data, the books. Readers can pinpoint exactly what they're looking for without reading every book in the library.
Metadata Satisfies a Perspective
Of course, any information about a book which is not included in the book itself is technically metadata: number of copies sold, a list of languages the book has been translated into, even the author's favorite foods. How do librarians decide which metadata is worth indexing? The answer is Perspective: Who needs the metadata, why do they need it, and exactly what information do they need? Library card catalogs satisfy metadata requirements from a reader perspective.
There are 3 Types of Metadata
• Technical Metadata: for people who design & develop the database
• Business Metadata: for people who need a more useful form of the data itself
• Administrative Metadata: for people who add to & modify the information in the database
Technical Metadata
Technical metadata is information about the database, for people who design & develop it. Types of technical metadata include:
• Table names
• Column names
• Information types
• Database key attributes and indices
• Mapping to other databases
The most important piece of technical metadata is the database schema. Technical metadata is sometimes called "structural metadata."
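As a rough illustration of the list above, the sketch below reads table names, column names, information types and key attributes back out of a database. It assumes SQLite (through Python's standard sqlite3 module) purely for convenience; the `products` table and its columns are hypothetical and do not come from the original slides.

```python
import sqlite3

# Hypothetical in-memory database with one illustrative table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "  product_id INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  family TEXT"
    ")"
)

# Table names come from SQLite's built-in catalog, sqlite_master.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default_value, pk)
columns = [
    {"name": row[1], "type": row[2], "is_key": bool(row[5])}
    for row in conn.execute("PRAGMA table_info(products)")
]

print(table_names)
for column in columns:
    print(column)
```

Nothing here touches the rows stored in the table; every value printed describes the structure of the database, which is what makes it technical metadata.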
Administrative Metadata
Administrative metadata is information about the data stored in the database, for the people who add to & modify that information. Types of administrative metadata include:
• Author
• Creation date
• Expiration date
• History (version tracking)
• Provenance (data source)
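One way to picture these fields is as a small record attached to each piece of data. The sketch below is only an illustration: the class name, field names, method names and sample values are all hypothetical, chosen to mirror the bullet list above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AdministrativeMetadata:
    """Hypothetical record mirroring the bullet list above."""
    author: str
    created: date
    expires: date
    provenance: str                              # where the data came from
    history: list = field(default_factory=list)  # version tracking

    def record_change(self, editor, note, on):
        # Append one entry to the version history.
        self.history.append((on, editor, note))

    def is_expired(self, today):
        return today > self.expires


# Illustrative values only.
meta = AdministrativeMetadata(
    author="Mary",
    created=date(2003, 7, 1),
    expires=date(2004, 7, 1),
    provenance="sales notebook",
)
meta.record_change("Tom", "corrected unit count", date(2003, 7, 15))
print(meta.is_expired(date(2003, 7, 23)))  # False
```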
Business Metadata
Business metadata is the data itself, the information the database was created to process and store, refined into a more useful form. Types of business metadata include:
• Controlled vocabulary
• Data definitions (in plain English)
• Data accuracy (how current?)
• Data lineage (where from?)
• Data stewardship (who touches?)
• Data ownership (who's responsible?)
• Business rules
• Security
Business metadata is sometimes called "informational" or "descriptive" metadata.
Metadata Mimics The Human Brain
Metadata can empower business intelligence because it is modeled on the human brain:
• The neocortex gives names to concepts. [Diagram: Cat, Dog, Animal]
• The limbic system understands the relationships between concepts. [Diagram: Cat linked to Animal; Dog linked to Animal]
• The brainstem organizes that network of concepts into useful patterns. [Diagram: Animal at the top, with Cat and Dog beneath it]
Metadata Mimics The Human Mind
Metadata allows a company's systems to think of data the same way people do: as concepts which are related to each other and organized into useful patterns, such as a hierarchy.
• Concepts: 3700, 7600, Router
• Relationships: 3700 linked to Router; 7600 linked to Router
• Hierarchy: Routers at the top, with 3700 and 7600 beneath it
How Intelligence Evolves
Is metadata enough to empower business intelligence? To answer that question, consider the evolution of intelligence in a person: A baby understands only impulses: raw data. It does not know the names for things nor how they relate. Babies can only act impulsively.
[Diagram: Hunger. Impulsive reaction: cry]
How Intelligence Evolves
Within a few years we evolve a syntactic model of our world; raw data is grouped into useful patterns of metadata. This allows us to weigh decisions and make choices which improve our ability to meet our short-term needs. A young person can think tactically.
[Diagram: Hunger, with two solutions, "wait for dinner" and "eat candy bar"; link labels: "solutions," "reasons for." Tactical decision: eat candy bar]
How Intelligence Evolves
Adults learn to understand the world as semantic metadata: as data arranged into meaningful patterns which we explicitly understand. This allows us to consider all the factors relevant to a decision, beyond the obvious. An adult can think strategically.
[Diagram: Hunger, with two solutions, "wait for dinner" and "eat candy bar"; goals: long-term, short-term; link labels: "solutions," "reasons for," "meets which goal?" Strategic decision: wait for dinner]
Syntactic vs. Semantic
Like the adult mind, business metadata can be either syntactic or semantic (or fall somewhere along the spectrum):
[Diagram: two copies of the same tree: Products at the top, then Routers, then 3700 Series and 7600 Series, then 3725 and 3745. In one copy the levels are labeled Category (Routers), Product Family (3700 Series, 7600 Series) and Products (3725, 3745).]
Syntax is about arrangement. A syntactic system organizes data for easy retrieval, but doesn't understand what the data means.
Semantics is about meaning. A semantic system explicitly understands the relationships between each piece of data. This enables the system to make inferences.
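The difference can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real system: the nested dictionary stands in for a syntactic arrangement, the list of triples stands in for explicit relationships, and the relation name `belongs_to` and the function `ancestors` are hypothetical.

```python
# Syntactic: the data is arranged for retrieval, but the nesting is only layout.
syntactic = {
    "Products": {
        "Routers": {
            "3700 Series": ["3725", "3745"],
            "7600 Series": [],
        }
    }
}

# Semantic: every relationship is stated explicitly as (subject, relation, object).
facts = [
    ("3725", "belongs_to", "3700 Series"),
    ("3745", "belongs_to", "3700 Series"),
    ("3700 Series", "belongs_to", "Routers"),
    ("7600 Series", "belongs_to", "Routers"),
    ("Routers", "belongs_to", "Products"),
]


def ancestors(item, triples):
    """Follow belongs_to links upward and return every group the item falls under."""
    parents = {obj for subj, rel, obj in triples if subj == item and rel == "belongs_to"}
    found = set(parents)
    for parent in parents:
        found |= ancestors(parent, triples)
    return found


# Inference: no single fact says "3725 is a router", yet the system can conclude it.
print("Routers" in ancestors("3725", facts))  # True
```

Retrieval from `syntactic` only works if the caller already knows the path to follow; the triples let the program work the path out for itself.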
Inferences: The Key to Business Intelligence
An inference is a logical conclusion drawn by comparing two or more facts. For example, if we know that "Spot is a dog" and "Dogs are not allowed on the beach," then we can infer that "Spot is not allowed on the beach." Business people are experts at drawing inferences. So why do we need machine inferences at all?
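The Spot example is small enough to write out as a machine inference. The sketch below is illustrative only; the dictionary and function names are hypothetical, and a real inference engine would be far more general.

```python
# Fact 1: "Spot is a dog."
kinds = {"Spot": "dog"}

# Fact 2: "Dogs are not allowed on the beach."
beach_rules = {"dog": False}


def allowed_on_beach(name):
    """Combine the two facts; return None when either fact is missing."""
    kind = kinds.get(name)
    return beach_rules.get(kind)


print(allowed_on_beach("Spot"))  # False: inferred, never stated directly
```

Neither dictionary mentions Spot and the beach together; the conclusion exists only once the two facts are compared.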
The Evolution of Business Intelligence
To understand the importance of enabling machine inferences, consider the evolution of business intelligence in a company: Mr. Smith starts "WidgetCo" in his garage. He sells widgets made of brass and tin. Each time a unit is sold, the person who sold it records the transaction in a notebook. This notebook represents Mr. Smith's business data.
[Diagram: disorganized business data: brass, widgets, profit, sales, mousepads]
The Evolution of Business Intelligence
When it comes time to pay taxes, Mr. Smith discovers that everyone has been recording information differently. His daughter Mary counts everything as a widget, even mousepads. His son Tom only counts the expensive brass widgets as "widgets," and groups the tin widgets together with "miscellaneous." Mr. Smith introduces a controlled vocabulary: a single set of names for concepts which everyone in his company agrees to use. He decides that brass and tin are both widgets, but mousepads are not.
[Diagram: things we sell, divided into widgets (tin, brass) and miscellaneous (mousepads)]
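A controlled vocabulary like Mr. Smith's can be sketched as a single shared lookup table. The item spellings, the dictionary name and the `categorize` function below are hypothetical; only the categories themselves (brass and tin count as widgets, mousepads do not) come from the story.

```python
# The one agreed set of names: brass and tin are widgets, mousepads are not.
CONTROLLED_VOCABULARY = {
    "brass widget": "widgets",
    "tin widget": "widgets",
    "mousepad": "miscellaneous",
}


def categorize(item):
    """Map an item name, however it was typed, to its agreed category."""
    key = item.strip().lower()
    if key not in CONTROLLED_VOCABULARY:
        # Refuse to guess: an unknown term must be added to the vocabulary first.
        raise ValueError(f"'{item}' is not in the controlled vocabulary")
    return CONTROLLED_VOCABULARY[key]


print(categorize("Tin Widget"))  # widgets
print(categorize("mousepad"))    # miscellaneous
```

Because Mary and Tom would both call the same function, their notebooks can no longer disagree about what counts as a widget.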
The Evolution of Business Intelligence
While checking the ledger, Mr. Smith discovers that he and his daughter both recorded a particular sale that they each helped with. So the master ledger now includes two instances of the same sale, making his revenue appear $3 higher than it should be! Mr. Smith normalizes his data by removing this redundant information.
[Diagram: before, the sales list contains Ms. A. Murray, Mr. Astaire and Anne Murray; after, it contains Mr. Astaire and Ms. Anne Murray]
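Removing the duplicate sale can be sketched as follows. The ledger entries, field names and matching rule are assumptions made for illustration; in particular, treating two entries as the same sale when surname, item and amount match is a simplification that a real ledger would need to refine.

```python
# Hypothetical ledger: the same $3 sale was entered twice under two spellings.
ledger = [
    {"customer": "Ms. A. Murray", "item": "tin widget", "amount": 3, "recorded_by": "Mr. Smith"},
    {"customer": "Mr. Astaire", "item": "brass widget", "amount": 5, "recorded_by": "Mary"},
    {"customer": "Anne Murray", "item": "tin widget", "amount": 3, "recorded_by": "Mary"},
]


def sale_key(entry):
    # Assumption: surname + item + amount is enough to identify one sale.
    surname = entry["customer"].split()[-1].lower()
    return (surname, entry["item"], entry["amount"])


def deduplicate(entries):
    """Keep the first entry for each sale and drop later repeats."""
    seen = set()
    unique = []
    for entry in entries:
        key = sale_key(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


clean = deduplicate(ledger)
overstated_by = sum(e["amount"] for e in ledger) - sum(e["amount"] for e in clean)
print(overstated_by)  # 3
```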
The Evolution of Business Intelligence
As WidgetCo grows into a larger company, Mr. Smith discovers that raw data is no longer enough. Now that his company sells widgets, gizmos and doohickeys, he has to check the sales figures of all three to figure out how much brass to buy each month. He solves the problem by building a metadata system. Now his system can "slice the data" any way he wants: it can tell him how many widgets he sold, or how many brass items he sold, or the total weight of all brass items sold.
The Evolution of Business Intelligence
Metadata empowers Mr. Smith to draw inferences about his sales of brass and tin products. For instance, if he sold 45 tons of brass items last month, and he only had 50 tons of brass, he can infer that he only has 5 tons left, so he'll need to order more for next month.
[Diagram: things we sell (doohickeys, gizmos, widgets) linked by "is made of" to materials (brass, tin)]
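Slicing the data and the brass inference can both be sketched in a few lines. The per-product tonnages below are invented so that brass sales add up to the 45 tons in the example; the field names and the `total_by` helper are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly sales; each record carries "is made of" metadata.
sales = [
    {"product": "widgets", "material": "brass", "tons": 20},
    {"product": "gizmos", "material": "brass", "tons": 15},
    {"product": "doohickeys", "material": "brass", "tons": 10},
    {"product": "widgets", "material": "tin", "tons": 8},
]


def total_by(records, attribute):
    """Slice the data: total tons sold, grouped by any attribute."""
    totals = defaultdict(int)
    for record in records:
        totals[record[attribute]] += record["tons"]
    return dict(totals)


by_product = total_by(sales, "product")
by_material = total_by(sales, "material")

# The inference from the example: 50 tons in stock, minus what was sold.
brass_stock_tons = 50
brass_left = brass_stock_tons - by_material["brass"]

print(by_product)   # {'widgets': 28, 'gizmos': 15, 'doohickeys': 10}
print(by_material)  # {'brass': 45, 'tin': 8}
print(brass_left)   # 5
```

The same records answer both questions ("how many widgets?" and "how much brass?") because the material is recorded as metadata on each sale rather than implied by the product name.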
The Evolution of Business Intelligence
But today's companies can be so large that no single human being could possibly know every detail of their operation. For instance, Mr. Smith discovers that he and his competitor, Frank's Widgets, both sell large amounts of brass widgets in certain neighborhoods. But why?
[Diagram: sales territories; local towns A-town, B-town, C-town, D-town and E-town; the high-performing areas are marked with "?"]
The Evolution of Business Intelligence
Mr. Smith solves the problem by upgrading to a semantic metadata system, thus enabling machine inferences. Now his data system thinks a little bit like a person, and he can ask it questions. He writes a query called "Identify Commonalities," which compares all company information for each of the three high-performing areas and returns a list of factors they have in common.
The Evolution of Business Intelligence
The system reveals that every high-performing area is predominantly Spanish-speaking.
[Diagram: sales territories; local towns A-town, B-town, C-town, D-town and E-town, each linked by "predominantly speaks" to a language (Spanish or English); the high-performing areas are highlighted]
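A query in the spirit of "Identify Commonalities" can be sketched as a set intersection over each area's attributes. Everything specific below is an assumption: the slides do not say which three towns are the high performers, what attributes the system stores, or how the real query is written.

```python
# Hypothetical attributes for each town.
towns = {
    "A-town": {"predominantly_speaks": "Spanish", "size": "small"},
    "B-town": {"predominantly_speaks": "Spanish", "size": "large"},
    "C-town": {"predominantly_speaks": "Spanish", "size": "medium"},
    "D-town": {"predominantly_speaks": "English", "size": "small"},
    "E-town": {"predominantly_speaks": "English", "size": "large"},
}

# Assumption: these are the three high-performing areas.
high_performing = ["A-town", "B-town", "C-town"]


def identify_commonalities(areas, names):
    """Return the attribute/value pairs shared by every named area."""
    if not names:
        return {}
    attribute_sets = [set(areas[name].items()) for name in names]
    return dict(set.intersection(*attribute_sets))


print(identify_commonalities(towns, high_performing))
# {'predominantly_speaks': 'Spanish'}
```

Mr. Smith did not have to suspect language in advance; the query checks every recorded attribute and reports whichever ones the areas share.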
The Evolution of Business Intelligence
Mr. Smith phones a few retailers, and they confirm that the people buying the most brass widgets speak Spanish. Another call reveals that a world-class Spanish-language technical college has recently opened in the area. Mr. Smith leverages this highly evolved business intelligence to launch a line of Adminículos de Latón: brass widgets with Spanish-language packaging and technical support. The Spanish-speaking community feels valued and understood as customers, so they begin to buy all their widgets from WidgetCo.
The Evolution of Business Intelligence
WidgetCo's highly evolved business intelligence provides a competitive advantage over Frank's Widgets. Due to brand loyalty, over time this small advantage multiplies until WidgetCo is the world leader in widgets, and purchases Frank's Widgets as a subsidiary. Frank eventually becomes Vice President of WidgetCo. He and Mr. Smith are now best friends, and often golf together.
Conclusion
For today's large corporations, the race to business intelligence is about metadata: information which goes beyond mere facts.
Meta + Data = Metadata
Appendix: Related Terms
To centralize data means to combine several sources into a single source. For instance, if Mr. Smith and his two salespeople each recorded sales in their own notebooks, he might centralize those notebooks into a single ledger.
A Single Source Of Truth (SSOT) is the one official version of a piece of information. If Ed in manufacturing and Mary in finance each track "number of units sold," people in the company won't know which one to use!