From Data to Knowledge: An Aspect of the NSF CDI Initiative • CDI: Cyber-Enabled Discovery and Innovation
From Data to Knowledge: Leveraging Ontology, Epistemology, and Logic • Definitions and examples of a way to “[enhance] human cognition and [generate] new knowledge from [the] wealth of heterogeneous digital data” on the web • A web of knowledge • Community access to knowledge
Definitions: “From Data to Knowledge” • Progression of terms: symbols, data, conceptualized data, knowledge • Symbols: characters and character-string instances • Data: symbols as values in attribute-value pairs • Conceptualized data: data in the framework of a conceptual model • Knowledge: conceptualized data with a degree of certainty or community agreement • From Data to Knowledge • Recognize symbols • Classify symbols with respect to meta-data attributes • Embed attribute-value pairs into a conceptual framework of concepts, relationships, and constraints • Present for community approval or check with respect to community-approved knowledge or link to original source
Examples: From Data to Knowledge • Car Ads • Symbols: $, 12k, ford, 4-Door • Data: price(12k), mileage(12k), make(ford) • Conceptualized data: • Car(C123) has Price($12,000) • Car(C123) has Mileage(12,000) • Car(C123) has Make(Ford) • BodyType isa Feature • Car(C123) has Feature(Sedan) • Knowledge • Community agreement that the ontology is “correct” • Community agreement that the facts in the ontology are “correct” • Appointments • Biology
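The car-ad progression above (symbols, then data, then conceptualized data) can be sketched in a few lines of Python. This is only an illustrative sketch: the ad text, the regular expressions, the `BODY_TYPE_FEATURE` mapping, and the function names are assumptions made for the example, not part of the original slides. The final step, knowledge, is a matter of community agreement and is not modeled here.

```python
import re

# Hypothetical raw ad text: a string of symbols.
AD_TEXT = "ford 4-Door $12k 12k miles"

# Hypothetical recognizers: one pattern per metadata attribute.
RECOGNIZERS = {
    "price": re.compile(r"\$(\d+)k"),
    "mileage": re.compile(r"(\d+)k\s+miles"),
    "make": re.compile(r"\b(ford|nissan|honda)\b", re.IGNORECASE),
    "body_type": re.compile(r"\b(4-door|2-door)\b", re.IGNORECASE),
}

# Assumed mapping from a BodyType value to a Feature (BodyType isa Feature).
BODY_TYPE_FEATURE = {"4-door": "Sedan", "2-door": "Coupe"}


def classify(text):
    """Symbols -> data: attach each recognized symbol to an attribute."""
    data = {}
    for attribute, pattern in RECOGNIZERS.items():
        match = pattern.search(text)
        if match:
            data[attribute] = match.group(1)
    return data


def conceptualize(car_id, data):
    """Data -> conceptualized data: facts placed in the conceptual model."""
    facts = []
    if "price" in data:
        facts.append((car_id, "has Price", int(data["price"]) * 1000))
    if "mileage" in data:
        facts.append((car_id, "has Mileage", int(data["mileage"]) * 1000))
    if "make" in data:
        facts.append((car_id, "has Make", data["make"].capitalize()))
    if "body_type" in data:
        feature = BODY_TYPE_FEATURE[data["body_type"].lower()]
        facts.append((car_id, "has Feature", feature))
    return facts


data = classify(AD_TEXT)
facts = conceptualize("C123", data)
for fact in facts:
    print(fact)
```

Note that the same symbol, 12k, ends up as two different values (a price and a mileage) only because the surrounding context assigns it to different attributes.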
Definitions: “Ontology,” “Epistemology,” and “Logic” • Ontology • Existence: answers “What exists?” • Computationally, it answers: what concepts, relationships, and constraints exist and how they are interrelated. • Epistemology • The nature of knowledge: answers “What is knowledge?”, “How is knowledge acquired?”, “What do people know?” • Computationally, it answers: what is knowledge (conceptualized data with community agreement). • Logic • Principles of valid inference: answers “What can be inferred?” • Computationally, it answers: what can be inferred (in a formal sense) from conceptualized data.
Examples: “Computational Answers” • Ontology: What exists? • In Car Ads: Car, Make, Model, Car has Make, Engine isa Feature • In Appointments: Service Provider, Date, Appointment with Doctor • In Biology: Protein Activity, Molecular Weight, Chromosome Location is an aggregate of Chromosome Number, Start, End, and Orientation • Epistemology: • What is knowledge? • A fact-filled Biology ontology • Chromosome Number (21) starts at Start (29,350,518) and ends at End (29,367,889) with Orientation (minus) • How is it acquired? • Creation of a fact-filled Biology ontology obtained from a reliable source • Provenance: Was the source from which the Biology ontology was created reliable? • What do people know? • Does my knowledge that I have an appointment with Dr. Jones on Thursday align with the appointment ontology as established by the doctor’s office? • I view the world with my car ads ontology; how does it align with the community standard ontology? • Logic: Principles of valid inference • In Car Ads: can find red Nissans newer than 2002 with fewer than 100k miles • In Appointments: can reason that a dermatologist is a medical service provider
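The two logic examples can be illustrated with a short Python sketch: a constraint query over car facts, and a walk up an isa hierarchy. The car records, the `ISA` table (including the intermediate Doctor concept), and the function names are all hypothetical and chosen only for illustration; a real system would likely use a reasoner over the ontology rather than hand-written loops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    car_id: str
    make: str
    color: str
    year: int
    mileage: int


# Hypothetical conceptualized data for the Car Ads ontology.
CARS = [
    Car("C1", "Nissan", "red", 2004, 85_000),
    Car("C2", "Nissan", "red", 2001, 60_000),   # too old
    Car("C3", "Nissan", "blue", 2005, 40_000),  # wrong color
    Car("C4", "Ford", "red", 2006, 30_000),     # wrong make
    Car("C5", "Nissan", "red", 2003, 120_000),  # too many miles
]


def red_nissans(cars):
    """Red Nissans newer than 2002 with fewer than 100k miles."""
    return [
        c.car_id
        for c in cars
        if c.make == "Nissan"
        and c.color == "red"
        and c.year > 2002
        and c.mileage < 100_000
    ]


# Assumed isa hierarchy for the Appointments ontology (child -> parent).
ISA = {
    "Dermatologist": "Doctor",
    "Doctor": "Medical Service Provider",
    "Medical Service Provider": "Service Provider",
}


def is_a(concept, ancestor):
    """Follow isa links upward; True if ancestor is reached."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False


print(red_nissans(CARS))
print(is_a("Dermatologist", "Medical Service Provider"))
```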
A Web of Knowledge as Semantic-Web Pages • Human-readable page (ordinary HTML, XML, …) • One or more annotation attachments • a reference to the ontology used for annotation • queryable RDF triples of extracted information • pointers into the original source for every item • highlighting possibilities for extracted data • hover possibilities to connect to the ontology
Community Access to Knowledge • Access to knowledge: both ontological knowledge and facts. • Ease of Use • Free-form queries • Form-based queries • Scalability • Semantic indexes • Caching (on the scale of Google++)
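One way to picture an annotation attachment is as a small record that carries the ontology reference plus triples, each with a pointer (character offsets) back into the page. The sketch below uses plain Python tuples and dataclasses rather than a real RDF library; the page text, the `ONTOLOGY_REF` URL, and all names are assumptions for illustration only.

```python
from dataclasses import dataclass

# Hypothetical human-readable page content.
PAGE = "2004 Nissan Altima, red, $12,000"

# Placeholder reference to the ontology used for annotation.
ONTOLOGY_REF = "http://example.org/ontologies/car-ads"


@dataclass(frozen=True)
class AnnotatedTriple:
    subject: str
    predicate: str
    obj: str
    start: int  # offset of the extracted text in the page
    end: int    # offset one past the extracted text


def annotate(page, subject, predicate, value):
    """Build a triple whose object points back into the source page."""
    start = page.index(value)
    return AnnotatedTriple(subject, predicate, value, start, start + len(value))


def highlight(page, triples):
    """Mark every extracted span with brackets, working right to left
    so that earlier offsets stay valid while the string grows."""
    out = page
    for t in sorted(triples, key=lambda t: t.start, reverse=True):
        out = out[:t.start] + "[" + out[t.start:t.end] + "]" + out[t.end:]
    return out


triples = [
    annotate(PAGE, "C123", "has Make", "Nissan"),
    annotate(PAGE, "C123", "has Price", "$12,000"),
]

attachment = {"ontology": ONTOLOGY_REF, "triples": triples}
print(highlight(PAGE, attachment["triples"]))
```

Because each triple keeps its offsets, a viewer could highlight the extracted data or, on hover, look up the predicate in the referenced ontology.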
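As a rough illustration of how a semantic index and caching might support form-based queries, the sketch below indexes facts by (attribute, value) pair and caches query results with `functools.lru_cache`. The slides do not describe an implementation; the facts, the index layout, and the `form_query` function are assumptions, and a system at web scale would need far more than an in-memory dictionary.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical extracted facts: (subject, attribute, value).
FACTS = [
    ("C123", "Make", "Ford"),
    ("C123", "Color", "red"),
    ("C456", "Make", "Nissan"),
    ("C456", "Color", "red"),
]

# Semantic index: (attribute, value) -> set of subjects having that fact.
INDEX = defaultdict(set)
for subject, attribute, value in FACTS:
    INDEX[(attribute, value)].add(subject)


@lru_cache(maxsize=1024)
def form_query(criteria):
    """Answer a filled-in form.

    criteria is a tuple of (attribute, value) pairs; a tuple is used
    because cached arguments must be hashable.
    """
    postings = [INDEX.get(pair, set()) for pair in criteria]
    if not postings:
        return frozenset()
    return frozenset(set.intersection(*postings))


query = (("Make", "Nissan"), ("Color", "red"))
print(form_query(query))   # computed from the index
print(form_query(query))   # served from the cache
print(form_query.cache_info().hits)
```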