EFFORTS TO AUTOMATE LABELING OF LECTURES WITH COMPUTING ONTOLOGY TERMS

Felicia Decker and Lois Delcambre, Portland State University

PREVIOUS WORK
• Course: Intro to Databases
  • We found 6 courses on the web with all of their lectures available
• Lecture notes: ppt/pdf/html
• Hand-labeled each lecture topic with Computing Ontology (CO) terms
  • Used this to validate the CO
  • Leaf CO terms correspond to lecture topics

CURRENT WORK
• Will the words that appear in these lecture notes help us choose CO terms? Are there “signature” words for each topic?
• Tools
  • Lucene
  • Converter tools (ppt/pdf/html -> text)
  • Microsoft Excel

LUCENE
• Index lecture notes
  • Text from one lecture = one document
  • Documents (lectures) from one course = one collection, with its own index
• Provides us with
  • Term frequency (tf)
  • Inverse document frequency (idf)
  • tf-idf
• Currently using single words; just now introducing stemming
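
To make the tf, idf, and tf-idf quantities concrete, here is a minimal Python sketch; it is not the Lucene setup used in this work. It assumes textbook definitions (tf = raw count of a word in one lecture, idf = log(N / df) over the N lectures of one course collection). Lucene's own scoring formula differs in details such as dampening tf and smoothing idf, so its numbers will not match these. The function names and sample lectures are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase single-word tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_table(collection):
    """Compute (tf, idf, tf-idf) for every word in every lecture.

    collection maps a lecture id to that lecture's plain text; the whole
    dict plays the role of one course's index. tf is the raw count of a
    word in one lecture; idf = log(N / df), where N is the number of
    lectures and df is how many lectures contain the word.
    """
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in collection.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())  # each lecture counts once per word

    table = {}
    for doc_id, term_counts in counts.items():
        table[doc_id] = {}
        for term, tf in term_counts.items():
            idf = math.log(n_docs / doc_freq[term])
            table[doc_id][term] = (tf, idf, tf * idf)
    return table


# Hypothetical three-lecture course, for illustration only.
course = {
    "lec05": "Normalization removes redundancy. Normalization uses BCNF.",
    "lec06": "Query optimization chooses a plan.",
    "lec07": "More normalization: functional dependencies.",
}
table = tf_idf_table(course)
print(table["lec05"]["normalization"])  # tf=2, idf=log(3/2), tf-idf=2*log(3/2)
```

Under these definitions a word that occurs in every lecture of a collection gets idf = log(N/N) = 0, so its tf-idf is 0 no matter how often it appears.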

CONVERTER TOOLS
• Lecture notes come in different formats
• PPT -> text
  • Apache POI
• PDF -> text
  • TextMiningTool 1.1.42
  • Xpdf-3.02
• HTML -> text
  • Copy/paste
  • Internet Explorer: save web page as text
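
The HTML -> text step listed above relies on copy/paste or a browser's save-as-text. A scriptable alternative, swapped in here and not used in the original work, is Python's standard-library html.parser module. This minimal sketch assumes reasonably well-formed lecture pages and simply drops script and style content; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script>/<style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the page's text, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = ("<html><head><style>h1 {color: red}</style></head>"
        "<body><h1>Normalization</h1><p>BCNF and 3NF</p>"
        "<script>var x = 1;</script></body></html>")
print(html_to_text(page))
```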

EXCEL
• After using Lucene to get tf, idf, and tf-idf data for each term in a given index:
  • Select a CO term, e.g., Normalization
  • Using the CO-labeled lecture notes (previous work), choose the lectures labeled with Normalization
  • Compile their tf/idf/tf-idf data into one spreadsheet
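
The spreadsheet-compilation step can also be scripted. The sketch below is a stand-in for the Excel work under stated assumptions: it takes per-lecture (tf, idf, tf-idf) values, such as the output of a tf-idf pass like Lucene's, plus the hand-assigned CO labels, keeps only the lectures labeled with the chosen CO term, and writes CSV, which Excel can open. The data layout, function names, lecture ids, and numbers are all hypothetical.

```python
import csv
import io


def compile_rows(table, labels, co_term):
    """Gather (lecture, term, tf, idf, tf-idf) rows for one CO term.

    table:  lecture id -> {term: (tf, idf, tfidf)}
    labels: lecture id -> set of hand-assigned CO terms
    Rows are grouped by lecture and sorted by descending tf-idf.
    """
    rows = []
    for lecture_id, stats in table.items():
        if co_term not in labels.get(lecture_id, set()):
            continue  # lecture not labeled with this CO term
        for term, (tf, idf, tfidf) in stats.items():
            rows.append((lecture_id, term, tf, idf, tfidf))
    rows.sort(key=lambda row: (row[0], -row[4], row[1]))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["lecture", "term", "tf", "idf", "tf-idf"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical values, for illustration only.
table = {
    "lec05": {"normalization": (2, 0.41, 0.82), "redundancy": (1, 1.10, 1.10)},
    "lec06": {"query": (1, 1.10, 1.10)},
}
labels = {"lec05": {"Normalization"}, "lec06": {"Query optimization"}}
print(rows_to_csv(compile_rows(table, labels, "Normalization")))
```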

HAND-LABEL WORDS FROM LECTURES AS “IMPORTANT”
• Signature words were selected by hand from Database Management Systems by Ramakrishnan and Gehrke, 3rd ed.
• Use Excel's Find All/Replace All function to highlight every signature word that identifies Normalization
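
Highlighting signature words in the spreadsheet amounts to asking where those words land in a lecture's tf-idf ranking. A minimal sketch of that check, assuming the ranking is already sorted by descending tf-idf; the function names, words, and ranking below are made-up placeholders, not data from this study.

```python
def signature_ranks(ranked_terms, signature_words):
    """Rank of each signature word in a tf-idf ranking (1 = highest).

    ranked_terms: words sorted by descending tf-idf.
    Returns {signature word: rank, or None if the word does not occur}.
    Matching is case-insensitive but exact, so no stemming is applied.
    """
    position = {}
    for index, term in enumerate(ranked_terms, start=1):
        position.setdefault(term.lower(), index)
    return {word: position.get(word.lower()) for word in signature_words}


def hits_in_top_k(ranks, k):
    """How many signature words appear within the top k ranks."""
    return sum(1 for rank in ranks.values() if rank is not None and rank <= k)


# Hypothetical ranking and signature list, for illustration only.
ranking = ["employee", "dept", "normalization", "bcnf", "salary"]
signature = ["normalization", "BCNF", "decomposition"]
ranks = signature_ranks(ranking, signature)
print(ranks)
print(hits_in_top_k(ranks, 3))
```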

INITIAL EFFORT

INITIAL EFFORT: RESULTS
• Conclusions
  • tf-idf is not a strong indicator; we cannot rely on tf-idf alone
  • ‘Running example’: good for teaching, but we don’t care about this data
  • Stemming is important
  • Use of phrases may help
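
The last two conclusions can be illustrated with a toy example: stemming lets normalization, normalized, and normalize count as one word, and adjacent-word pairs (bigrams) capture phrases such as functional dependency. The suffix rules below are a deliberately crude assumption, not the Porter stemmer, and the function names are hypothetical. Inside Lucene, the analogous pieces would be an analyzer with a stemming filter (e.g., PorterStemFilter) and ShingleFilter for word n-grams.

```python
import re
from collections import Counter

# Toy suffix rules for illustration only; a real pipeline would use a
# proper stemmer such as Porter's.
SUFFIXES = ("ization", "ation", "ized", "izes", "ize", "ing", "ed", "s")


def toy_stem(word):
    """Crudely strip one common English suffix from a lowercase word."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"  # dependencies -> dependency
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text):
    """Lowercase word tokens, each passed through toy_stem."""
    return [toy_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def stemmed_bigrams(text):
    """Count adjacent pairs of stems as candidate two-word phrases.

    Punctuation is discarded, so pairs may span sentence boundaries.
    """
    s = stems(text)
    return Counter(zip(s, s[1:]))


text = ("Normalization uses functional dependencies. A normalized schema; "
        "normalize each functional dependency.")
print(Counter(stems(text))["normal"])  # normalization, normalized, normalize
print(stemmed_bigrams(text)[("functional", "dependency")])
```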

NEXT STEPS
• Intersection of terms across all classes
  • May solve the ‘running example’ problem
  • Compute average rank
  • Compute average tf-idf (?)
• Union all documents with the same CO label (e.g., union the text from all lectures on normalization, the text from all lectures on query optimization, etc.)
  • Look at tf-idf
• Consider various classification algorithms (looking to see if any are implemented for Lucene)
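
The intersection and union steps can be prototyped with plain sets and lists. This sketch assumes each course yields one list of words ranked by tf-idf for a given CO term; it intersects those lists, averages each shared word's rank, and implements the union step as concatenating the text of all lectures that share a CO label. Course names, lecture ids, words, and function names are hypothetical. If a word belongs only to one course's running example, it is missing from the other courses' lists and so drops out of the intersection.

```python
from collections import defaultdict


def common_terms(rankings):
    """Words that appear in every course's ranking.

    rankings: course -> list of words sorted by descending tf-idf.
    """
    term_sets = [set(terms) for terms in rankings.values()]
    return set.intersection(*term_sets) if term_sets else set()


def average_rank(rankings, terms):
    """Mean 1-based rank of each word across courses (lower is better).

    Every word in terms must occur in every ranking, e.g. the output of
    common_terms; list.index raises ValueError otherwise.
    """
    return {
        term: sum(ranked.index(term) + 1 for ranked in rankings.values()) / len(rankings)
        for term in terms
    }


def union_by_label(lectures, labels):
    """Concatenate the text of all lectures sharing a CO label.

    lectures: lecture id -> text; labels: lecture id -> set of CO terms.
    """
    merged = defaultdict(list)
    for lecture_id, text in lectures.items():
        for co_term in labels.get(lecture_id, ()):
            merged[co_term].append(text)
    return {co_term: "\n".join(texts) for co_term, texts in merged.items()}


# Hypothetical rankings for the CO term Normalization in three courses.
rankings = {
    "courseA": ["bcnf", "employee", "decomposition"],
    "courseB": ["decomposition", "bcnf", "sailor"],
    "courseC": ["bcnf", "student", "decomposition"],
}
shared = common_terms(rankings)
print(sorted(shared))
print(average_rank(rankings, shared))
```

Each merged text from union_by_label can then be treated as a single document and fed back into the tf-idf step.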
