Schema Matching through Structural Analysis and Natural Language Processing. Ming Xiao, Skidmore College. Faculty Mentor: Dr. Longzhuang Li.

Presentation Transcript


  1. Schema Matching through Structural Analysis and Natural Language Processing • Ming Xiao, Skidmore College • Faculty Mentor: Dr. Longzhuang Li

  2. Outline • Background • Importance • Known Methods • Approach • Challenges • Future Work • Acknowledgements • Questions

  3. Schema Matching • Schema = Description of a Table • Match elements that are related

  4. Importance • Enterprise: merging two companies, one database • Environmental Data Collection: merging data to provide an overall picture • Storm Tracking

  5. Known Methods

  6. Structural Matching • Similarity Flooding Algorithm • Neighbors are similar to each other • [Figure: graph with nodes A, B, and C, where A = {Alex, Aly}, B = {Ben, Beth}, C = {Carl, Cam}] • Source: S. Melnik, H. Garcia-Molina, and E. Rahm, "Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching"

  7. Table to Graph • Name of table is the head of the graph • [Figure: two table graphs. Client → CID, CName, First Name, Last Name. Customer → Company, CustID, Contact, Phone]
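The table-to-graph step can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function name `table_to_graph` and the adjacency-dictionary representation are assumptions, and the grouping of columns under each table is read from the flattened slide figure.

```python
def table_to_graph(table_name, columns):
    """Return a directed graph {node: [children]} with the table name as head."""
    graph = {table_name: list(columns)}  # head node points at every column
    for column in columns:
        graph[column] = []  # columns are leaves in this simple sketch
    return graph


# Column groupings below are an assumption based on the slide figure.
client = table_to_graph("Client", ["CID", "CName", "First Name", "Last Name"])
customer = table_to_graph("Customer", ["Company", "CustID", "Contact", "Phone"])
```

A plain dictionary keeps the sketch dependency-free; a graph library would serve equally well for larger schemas.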

  8. The Neighborhood • Cross product of elements in original graphs: (Client, Customer), (Client, Company), (Client, CustID), (Client, Contact), (Client, Phone), ... • Choose similar pairs • [Figure: the Client and Customer graphs side by side with the resulting candidate pairs]
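A short sketch of the cross-product step, assuming the element lists read from the slide figures. The helper `choose_similar`, its `score` argument, and the `threshold` cut-off are hypothetical; the slides do not say how similar pairs are selected.

```python
from itertools import product

# Element lists are an assumption based on the slide figures.
client_elements = ["Client", "CID", "CName", "First Name", "Last Name"]
customer_elements = ["Customer", "Company", "CustID", "Contact", "Phone"]

# Every element of one graph paired with every element of the other.
candidate_pairs = list(product(client_elements, customer_elements))


def choose_similar(pairs, score, threshold):
    """Keep pairs whose score meets a threshold (hypothetical selection rule)."""
    return [(x, y) for x, y in pairs if score(x, y) >= threshold]
```

With five elements per graph, the cross product yields 25 candidate pairs, which is why a filtering step matters before any propagation.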

  9. Determining Pairs • Semantic: determine correlation between two definitions • Client → {person, pays, services, goods, seeks, advice, lawyer, ...} • Customer → {someone, pays, goods, services} • Company → {institution, conduct, business, ...} • Cosine Similarity, value between 0 and 1, e.g. for (Client, Customer) and (Client, Company) • Letter Pair, e.g. (CustID, CID) • CustID → {Cu, us, st, tI, ID} • CID → {CI, ID} • Phone → {Ph, ho, on, ne} • Letter Pair Similarity, e.g. for (CustID, CID) and (CustID, Phone)
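The two measures on this slide can be sketched as below. Several points are assumptions rather than details from the slides: the definitions are treated as binary word sets, only the words actually listed on the slide are used (the elided "..." words are left out), and letter pair similarity is computed as a Dice coefficient over character pairs, which is one common choice. All function names are illustrative.

```python
import math


def cosine_similarity(words_a, words_b):
    """Cosine between binary bag-of-words vectors built from two word sets."""
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def letter_pairs(term):
    """Adjacent character pairs, e.g. 'CID' -> {'CI', 'ID'}."""
    return {term[i:i + 2] for i in range(len(term) - 1)}


def letter_pair_similarity(term_a, term_b):
    """Dice coefficient over letter-pair sets (an assumed formula)."""
    pairs_a, pairs_b = letter_pairs(term_a), letter_pairs(term_b)
    if not pairs_a or not pairs_b:
        return 0.0
    return 2 * len(pairs_a & pairs_b) / (len(pairs_a) + len(pairs_b))


# Only the defining words shown on the slide; the elided words are unknown.
client = {"person", "pays", "services", "goods", "seeks", "advice", "lawyer"}
customer = {"someone", "pays", "goods", "services"}
company = {"institution", "conduct", "business"}

print(cosine_similarity(client, customer))        # about 0.57 (3 shared words)
print(cosine_similarity(client, company))         # 0.0 (no shared words)
print(letter_pair_similarity("CustID", "CID"))    # 2/7, about 0.29 (shares "ID")
print(letter_pair_similarity("CustID", "Phone"))  # 0.0 (no shared pairs)
```

Both measures fall between 0 and 1, so their values can be compared or combined when choosing candidate pairs.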

  10. Neighbors • Each node N is a pair (x, y), with one element from each graph, e.g. N = (Client, Customer) • There is an edge N → N', where N' = (x', y'), when x → x' and y → y', e.g. N' = (CID, CustID)
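The edge rule above, together with the idea that neighbors reinforce each other, can be sketched as follows. This is a deliberately simplified propagation and not the full Similarity Flooding algorithm of Melnik et al.: it ignores edge labels and propagation coefficients, treats edges as undirected when spreading scores, and normalises by the maximum score. The two-column graphs, the initial scores, and the names `pair_graph_edges` and `propagate` are all hypothetical.

```python
def pair_graph_edges(graph_a, graph_b):
    """Edges (x, y) -> (x2, y2) where x -> x2 in graph_a and y -> y2 in graph_b."""
    return [((x, y), (x2, y2))
            for x, children_a in graph_a.items()
            for y, children_b in graph_b.items()
            for x2 in children_a
            for y2 in children_b]


def propagate(scores, edges, rounds=5):
    """Simplified flooding: each pair absorbs its neighbors' scores, then normalise."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    for _ in range(rounds):
        updated = {n: s + sum(scores.get(m, 0.0) for m in neighbors.get(n, ()))
                   for n, s in scores.items()}
        top = max(updated.values(), default=0.0) or 1.0  # avoid dividing by zero
        scores = {n: s / top for n, s in updated.items()}
    return scores


# Reduced graphs (two columns each) to keep the example small.
client = {"Client": ["CID", "CName"], "CID": [], "CName": []}
customer = {"Customer": ["CustID", "Company"], "CustID": [], "Company": []}
edges = pair_graph_edges(client, customer)

# Hypothetical initial similarities, e.g. from a semantic or letter-pair measure.
initial = {("Client", "Customer"): 0.5, ("CID", "CustID"): 0.3,
           ("CID", "Company"): 0.0, ("CName", "CustID"): 0.0,
           ("CName", "Company"): 0.2}
final = propagate(initial, edges)
```

In this sketch the pair (CID, CustID) keeps a higher score than (CID, Company) after propagation, because both gain the same boost from their shared neighbor (Client, Customer) and differ only in their initial similarity.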

  11. Challenges • Multi-word terms • Air Temperature • Air → {mixture, gases, oxygen, required, breathing, ...} • Temperature → {degree, hotness, coldness, body, ...} • Similar meaning, different defining words • Hurricane vs. Cyclone • Hurricane → {severe, heavy, rain, ...} • Cyclone → {violent, windstorm, ...}

  12. Future Work • Incorporate other methods • Gender, Sex → {M,F}

  13. References • E. Rahm and P. A. Bernstein, "A Survey of Approaches to Automatic Schema Matching" • S. Melnik, H. Garcia-Molina, and E. Rahm, "Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching"

  14. Acknowledgements • Dr. Longzhuang Li • Dr. Dulal Kar • Dr. Ahmed Mahdy • Huy Tran • National Science Foundation

  15. Questions?
