Schema Matching through Structural Analysis and Natural Language Processing
Ming Xiao
Skidmore College
Faculty Mentor: Dr. Longzhuang Li
Outline • Background • Importance • Known Methods • Approach • Challenges • Future Work • Acknowledgements • Questions
Schema Matching • Schema = Description of a Table • Match elements that are related
Importance • Enterprise • Merging two companies, one database • Environmental Data Collection • Merging data to provide overall picture • Storm Tracking
Structural Matching
• Similarity Flooding Algorithm
• Neighbors are similar to each other
[Figure: example graph with nodes A = Alex, Aly; B = Ben, Beth; C = Carl, Cam]
S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching
Table to Graph
• Name of Table is the head of the graph
• Client: CID, CName, First Name, Last Name
• Customer: Company, CustID, Contact, Phone
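A minimal sketch of the table-to-graph step, assuming each table is a flat list of columns hanging off the table name. The function name `table_to_graph` and the adjacency-dict representation are illustrative choices, not taken from the slides; any nesting in the original figure (for example, name parts under CName) is not modeled here.

```python
def table_to_graph(table_name, columns):
    """Return a directed graph as an adjacency dict, with the table name as head."""
    graph = {table_name: list(columns)}
    for column in columns:
        # Columns are leaves in this flat sketch (assumption).
        graph.setdefault(column, [])
    return graph


client = table_to_graph("Client", ["CID", "CName", "First Name", "Last Name"])
customer = table_to_graph("Customer", ["Company", "CustID", "Contact", "Phone"])
```

Here `client["Client"]` lists the columns reachable from the head node, and every column maps to an empty list because it has no outgoing edges.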
The Neighborhood
• Cross product of elements in original graphs
  • (Client, Customer), (Client, Company), (Client, CustID), (Client, Contact), (Client, Phone), ...
• Choose similar pairs
[Figure: the Client graph (CID, CName, First Name, Last Name) and the Customer graph (Company, CustID, Contact, Phone)]
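The cross product and the "choose similar pairs" step could be sketched as follows. The scoring function here is a stand-in: it uses the standard library's `difflib.SequenceMatcher` ratio rather than the semantic or letter-pair measures from the talk, and the `threshold` value is a hypothetical cutoff chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import product

client_elements = ["Client", "CID", "CName", "First Name", "Last Name"]
customer_elements = ["Customer", "Company", "CustID", "Contact", "Phone"]

# Every candidate node pairs one element from each graph.
candidate_pairs = list(product(client_elements, customer_elements))


def string_score(x, y):
    # Stand-in similarity (assumption): character-sequence ratio in [0, 1].
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def choose_similar(pairs, score=string_score, threshold=0.5):
    """Keep only the pairs whose score reaches the (hypothetical) threshold."""
    return [(x, y) for x, y in pairs if score(x, y) >= threshold]


similar_pairs = choose_similar(candidate_pairs)
```

Passing a different `score` callable lets the same filter work with any pairwise measure that returns a value between 0 and 1.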
Determining Pairs
• Semantic
  • Determine correlation between two definitions
  • Client → {person, pays, services, goods, seeks, advice, lawyer, ...}
  • Customer → {someone, pays, goods, services}
  • Company → {institution, conduct, business, ...}
  • Cosine Similarity, value between 0 and 1
  • Pairs compared: (Client, Customer), (Client, Company)
• Letter Pair
  • CustID, CID
  • CustID → {Cu, us, st, tI, ID}
  • CID → {CI, ID}
  • Phone → {Ph, ho, on, ne}
  • Letter Pair Similarity
  • Pairs compared: (CustID, CID), (CustID, Phone)
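Both measures can be sketched in a few lines. Two assumptions are made here: the definition word lists are the truncated ones shown on the slide (the elided words are not filled in), and letter-pair similarity is computed as a Dice coefficient over the letter-pair sets, which is a common choice but is not stated in the slides. All function names are illustrative.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def letter_pairs(term):
    """Adjacent character pairs, e.g. CID -> {CI, ID}."""
    return {term[i:i + 2] for i in range(len(term) - 1)}


def letter_pair_similarity(term_a, term_b):
    """Dice coefficient over letter-pair sets (assumed formula)."""
    pairs_a, pairs_b = letter_pairs(term_a), letter_pairs(term_b)
    total = len(pairs_a) + len(pairs_b)
    return 2 * len(pairs_a & pairs_b) / total if total else 0.0


# Definition words as shown on the slide (truncated there, so illustrative only).
client = ["person", "pays", "services", "goods", "seeks", "advice", "lawyer"]
customer = ["someone", "pays", "goods", "services"]
company = ["institution", "conduct", "business"]

semantic_client_customer = cosine_similarity(client, customer)
semantic_client_company = cosine_similarity(client, company)
letter_custid_cid = letter_pair_similarity("CustID", "CID")
letter_custid_phone = letter_pair_similarity("CustID", "Phone")
```

With these truncated word lists, Client and Customer share three of their defining words and score about 0.57, while Client and Company share none and score 0. CustID and CID share only the pair ID, giving 2/7 (about 0.29), and CustID and Phone share no pairs, giving 0.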
Neighbors
• Each node, N, is a pair (x, y)
  • One element from each graph
  • Example: N = (Client, Customer)
• There is an edge N → N', where N' = (x', y'), when the original graphs contain
  • x → x' and y → y'
  • Example: N' = (CID, CustID)
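The edge rule above, followed by a propagation step, might look like the sketch below. This is a simplified illustration and not the exact formulation of Melnik et al.: it assumes flat table graphs, omits the propagation coefficients of the published algorithm, and simply lets each pair add its neighbors' scores before normalizing by the maximum. The names `pair_graph_edges` and `flood` are hypothetical.

```python
from itertools import product

# Edges of the two schema graphs (flat layout is an assumption).
client_edges = [("Client", "CID"), ("Client", "CName"),
                ("Client", "First Name"), ("Client", "Last Name")]
customer_edges = [("Customer", "Company"), ("Customer", "CustID"),
                  ("Customer", "Contact"), ("Customer", "Phone")]


def pair_graph_edges(edges_a, edges_b):
    """Edge (x, y) -> (x', y') whenever x -> x' in A and y -> y' in B."""
    return [((x, y), (x2, y2))
            for (x, x2), (y, y2) in product(edges_a, edges_b)]


def flood(initial, pair_edges, rounds=10):
    """Simplified propagation: add neighbors' scores, then normalize to [0, 1]."""
    sim = dict(initial)
    for _ in range(rounds):
        updated = dict(sim)
        for n, n2 in pair_edges:
            # Similarity flows in both directions along an edge.
            updated[n2] = updated.get(n2, 0.0) + sim.get(n, 0.0)
            updated[n] = updated.get(n, 0.0) + sim.get(n2, 0.0)
        top = max(updated.values(), default=0.0) or 1.0
        sim = {node: value / top for node, value in updated.items()}
    return sim


edges = pair_graph_edges(client_edges, customer_edges)
# Initial scores are made-up illustrative values, not results from the talk.
scores = flood({("Client", "Customer"): 1.0, ("CID", "CustID"): 0.3}, edges)
```

Pairs that start with no score still receive similarity from (Client, Customer) through the edges, which is the sense in which neighbors of similar pairs become similar.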
Challenges
• Multi-word terms
  • Air Temperature
  • Air → {mixture, gases, oxygen, required, breathing, ...}
  • Temperature → {degree, hotness, coldness, body, ...}
• Similar meaning, different defining words
  • Hurricane vs. Cyclone
  • Hurricane → {severe, heavy, rain, ...}
  • Cyclone → {violent, windstorm, ...}
Future Work • Incorporate other methods • Gender, Sex → {M,F}
References
• E. Rahm and P. A. Bernstein. A Survey of Approaches to Automatic Schema Matching
• S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching
Acknowledgements • Dr. Longzhuang Li • Dr. Dulal Kar • Dr. Ahmed Mahdy • Huy Tran • National Science Foundation