Mining Structured vs. Unstructured Data
Where is the structure and where did the semantics go?
Rahim Yaseen, SAP Labs LLC

Why mining works for structured data
• For relational data
  • There is no separation of the semantic data model and the logical storage model
  • Both are coincident in a single data model, and the data definition has limited semantics
  • The semantics are captured in the richness of the queries, which form well-known associations based on expert knowledge of the relationships in the data models
Diagram (layers, top to bottom):
• Reports / Queries: Rich semantics are usually expressed in queries and reports, which have a priori knowledge of the data models.
• Relational Data Model: For relational databases, the data model combines the data representation specification and its storage as relational data. Sometimes, views can express alternate representational models that differ from the underlying table structures.
• Data
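
To make the point above concrete, here is a minimal sketch of a query that carries the semantics which the tables themselves lack. The schema (`customers`, `orders`), the column names, and the sample rows are hypothetical and chosen only for illustration; the tables store ids and amounts, while the meaning "revenue by region" exists only in the join that an expert wrote in advance.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
        INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
        """
    )
    return conn


def revenue_by_region(conn):
    """Return (region, total) rows.

    The join condition encodes a priori knowledge of how customers
    relate to orders; nothing in the table definitions states it.
    """
    query = """
        SELECT c.region, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()
```

Calling `revenue_by_region(build_demo_db())` produces one total per region. The same data could answer many other questions, but each one needs a query written by someone who already knows the relationships.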

What will it take to mine unstructured data?
Why free (text) search is not the answer
• The data has no structural model to which meaningful semantics can be applied
• As a result, queries have limited semantics and are not rich enough to get the desired outcomes
• The limiting nature of ad hoc search (vs. the richness of pre-defined queries based on known structure/semantics) limits the relevance of the output
Converting unstructured data to structured data is also not the answer
• Applying an ETL-like technique to convert data to a structured form is limiting
• This does not guarantee that all the data of interest can be captured
• It provides for only a single (fixed) interpretation of such unstructured data
Can overlaying a semantic model onto the data be the answer?
• Extract a semantic (meta) model of interest from the unstructured data
• Use the structure/semantics of this model to formulate rich search/query
• E.g., techniques used when searching and comparing products
  • Relevant attributes from product descriptions are extracted to form a model
  • These attributes are used to formulate rich searches/queries and comparisons
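
The product-comparison idea above can be sketched as follows. This is an illustrative assumption, not a method described in the slides: the attribute names, the regular expressions in `PATTERNS`, and the helper functions are all hypothetical. The sketch extracts a small attribute model from free-text descriptions and then runs an attribute-based search over it.

```python
import re

# Hypothetical attribute model: each attribute of interest is paired
# with a pattern that locates its value in a free-text description.
PATTERNS = {
    "screen_inches": re.compile(r"(\d+(?:\.\d+)?)\s*(?:-\s*)?inch", re.I),
    "ram_gb": re.compile(r"(\d+)\s*GB\s+RAM", re.I),
    "price_usd": re.compile(r"\$\s*(\d+(?:\.\d+)?)"),
}


def extract_attributes(description):
    """Return the attributes found in one description as a dict of floats."""
    attrs = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(description)
        if match:
            attrs[name] = float(match.group(1))
    return attrs


def search(products, **minimums):
    """Return names of products whose extracted attributes meet every minimum.

    A product with a missing attribute never satisfies a minimum on it.
    """
    hits = []
    for name, text in products.items():
        attrs = extract_attributes(text)
        if all(attrs.get(key, float("-inf")) >= value
               for key, value in minimums.items()):
            hits.append(name)
    return hits
```

For example, `search(products, ram_gb=16)` filters on an attribute that plain keyword search cannot compare numerically. The original text is left untouched, so a different attribute model can be overlaid on the same descriptions later.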

Can mining work for both structured/unstructured data?
A separate logical data (meta) model distinct from the underlying storage model
• Extracted from the data in a non-intrusive fashion and captured as meta-data
• A single data representation model can map to multiple storage models
• Structure and semantics of meta-data help structure queries, search, reports
• Are embedded tags in the data a possible approach to define ontology structures?
• Is it feasible to extract such semantic models, and can mining based on this perform?
Diagram (layers, top to bottom):
• Reports / Queries: Queries and search that can leverage the structure of the data model to specify queries and search that are rich in semantics.
• Simple Semantic (Meta) Data Model: A simple semantic data representation model for modeling data (structured and unstructured). Meta-data based on ontologies is extracted from the underlying data.
• Data Storage Model(s): Multiple storage models, including relational, XML, text, etc.
• Data
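
One way to picture a single meta-data model mapping to multiple storage models is the sketch below. Everything in it is an assumption made for illustration: the logical fields (`author`, `year`), the three extractor functions, and the sample record shapes are hypothetical. Each extractor reads its own storage form without modifying it, and queries are written once against the shared logical model.

```python
import re
import xml.etree.ElementTree as ET


def from_row(row):
    """Map a relational row (here a dict) to the logical model."""
    return {"author": row["author_name"], "year": int(row["pub_year"])}


def from_xml(xml_text):
    """Map an XML document to the logical model."""
    root = ET.fromstring(xml_text)
    return {"author": root.findtext("author"),
            "year": int(root.findtext("year"))}


def from_text(text):
    """Map free text of the form '... by NAME (YYYY) ...' to the logical model."""
    match = re.search(r"by\s+(.+?)\s*\((\d{4})\)", text)
    if not match:
        return {}
    return {"author": match.group(1), "year": int(match.group(2))}


# One logical model, several storage models.
EXTRACTORS = {"relational": from_row, "xml": from_xml, "text": from_text}


def query(sources, predicate):
    """Apply one predicate over meta-data extracted from mixed sources."""
    results = []
    for kind, raw in sources:
        metadata = EXTRACTORS[kind](raw)
        if metadata and predicate(metadata):
            results.append(metadata)
    return results
```

A call such as `query(sources, lambda md: md["year"] > 2000)` then works the same way whether a record came from a table, an XML file, or plain text. Whether such extraction is feasible and performs well at scale is the open question the slide raises; this sketch does not settle it.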