Searching and Ranking Documents based on Semantic Relationships. Boanerges Aleman-Meza LSDIS lab, Computer Science, University of Georgia. Paper presentation ICDE Ph.D. Workshop 2006 April 3rd, 2006, Atlanta, GA, USA.
Searching and Ranking Documents based on Semantic Relationships Boanerges Aleman-Meza LSDIS lab, Computer Science, University of Georgia Paper presentation ICDE Ph.D. Workshop 2006 April 3rd, 2006, Atlanta, GA, USA This work is funded by NSF-ITR-IDM Award#0325464 titled ‘SemDIS: Discovering Complex Relationships in the Semantic Web’ and NSF-ITR-IDM Award#0219649 titled ‘Semantic Association Identification and Knowledge Discovery for National Security Applications.’
Outline • Research Problem • Proposed Solution • Preliminary Results • Outstanding Future Work • Conclusions and Future work
Today’s use of Relationships (for web search) • ‘href’ relationships between documents • documents as a whole • No explicit relationships are used • other than co-occurrence • Implicit semantics • such as page importance (some content from www.wikipedia.org)
But, more relationships are available • Documents are connected through concepts & relationships • i.e., MREF [SS’98] • Named-entities can be identified • with respect to existing data, such as ontologies (some content from www.wikipedia.org)
Complex Relationships • People will use Web search not only for documents, but also for information about semantic relationships [SFJMC’02] • Relationships play an important role in the continuing evolution of the Web [SAK’03]
Complex Relationships • Semantic Relationships: named-relationships connecting information items • their semantic ‘type’ is defined in an ontology • go beyond ‘is-a’ relationship (i.e., class membership) • Have gained interest in the Semantic Web • operators “semantic associations” [AS’03] • discovery and ranking [AHAS’03, AHARS’05, AMS’05] • Relevant in emerging applications: • content analytics – business intelligence • knowledge discovery – national security
Research Problem How can we exploit semantic relationships of named-entities to improve relevance in search and ranking of documents?
Proposed Solution: Diagram View • Builds upon the following capabilities: • Populated Ontologies • Semantic Annotation • RDF databases • It can be done [ABEPS’05] • Demonstrated with small dataset • Using explicit, named relationships [SRT’05] • Makes it possible to explain why a document is relevant
Research Challenges • Ranking Complex Relationships • Utilization of populated Ontologies • Defining and measuring what is relevant • Addressing Scalability
Proposed Solution: Big Picture • Ranking Complex Relationships [AHAS’03, AHARS’05] • User-defined Context for Document Retrieval [ABEPS’05] • Large Populated Ontologies [AHSAS’04] • Relevance Measures using Semantic Relationships [ANR+06] (current work) • Goal: Searching and Ranking Documents based on Semantic Relationships
Ranking Complex Relationships • Association Rank is built from: Rarity, Association Length, Subsumption, Context, Trust, Popularity • Subsumption example: Organization, Political Organization, Democratic Political Organization
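The slide names the criteria that feed the association rank but not how they are combined. A minimal sketch, assuming each criterion has already been scored in [0, 1] and that the combination is a normalized weighted average; the function name, weights and scores below are hypothetical and are not taken from the slides:

```python
def association_rank(scores, weights):
    """Combine per-criterion scores into one rank via a weighted average.

    Both arguments map a criterion name to a number. The weighted-average
    form is an assumption made for illustration.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


# Hypothetical weights: this user cares most about context and short paths.
weights = {"context": 2, "subsumption": 1, "trust": 1,
           "rarity": 1, "popularity": 1, "length": 2}

# Two hypothetical associations that differ only in their context score.
in_context = {"context": 1.0, "subsumption": 0.5, "trust": 0.5,
              "rarity": 0.5, "popularity": 0.5, "length": 0.5}
out_of_context = dict(in_context, context=0.5)

rank_in = association_rank(in_context, weights)
rank_out = association_rank(out_of_context, weights)
```

Changing the weights is how a user would emphasize, say, rare associations over popular ones without touching the per-criterion scoring.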
Populated Ontologies: SWETO • SWETO: Semantic Web Technology Evaluation Ontology [AHSAS’04] • Large scale test-bed ontology containing instances extracted from heterogeneous Web sources • Domain: cs-publications, locations, terrorism • Over 800K entities, 1.5M relationships (version 1.4) • Developed using Freedom toolkit (www.semagix.com)
Defining what is relevant Ultimately, many entities are inter-connected! … Which ones are relevant?
… Defining what is relevant • Relevance is determined by considering: • type of next entity (from ontology) • name of connecting relationship • length of discovered path so far (short paths are preferred) • cumulative relevance score • other properties such as transitivity • user-defined context (if any)
… Defining what is relevant • Involves human-defined relevance of specific path segments • The simplest case, a YES/NO question: • Is it relevant to discover entities through a ‘ticker’ relationship? … yes? • Is it relevant to discover entities through an ‘industry focus’ relationship? … no? • [Figure: a Company entity with a ‘ticker’ edge to x and an ‘industry focus’ edge to y]
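The YES/NO case can be sketched as a lookup table keyed by relationship name. This is an illustration only: the triple layout, the `relevant_neighbors` helper and the default of treating unlisted relationships as not relevant are assumptions, while the two judgments mirror the slide's example.

```python
# Human-defined YES/NO judgments per relationship name (values from the slide).
RELEVANT = {"ticker": True, "industry focus": False}

# Toy (subject, relationship, object) triples mirroring the slide's figure.
TRIPLES = [
    ("Company", "ticker", "x"),
    ("Company", "industry focus", "y"),
]


def relevant_neighbors(triples, entity, relevant):
    """Return (relationship, neighbor) pairs reachable from `entity`
    through relationships judged relevant. Unlisted relationships are
    treated as not relevant (an assumption)."""
    found = []
    for subject, relationship, obj in triples:
        if subject == entity and relevant.get(relationship, False):
            found.append((relationship, obj))
    return found


neighbors = relevant_neighbors(TRIPLES, "Company", RELEVANT)
```

Here only `x` is discovered from `Company`, because the ‘industry focus’ edge to `y` is judged not relevant.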
… Measuring what is relevant • Information-loss: measure that defines a cut-off on whether a sequence of relationships is still relevant (extending [MKIS’00]) • [Figure: graph around Electronic Data Systems (EDS) with entities Tina Sivinski, Plano, NYSE:EDS, Fortune 500 and Technology Consulting; edge labels ‘leader of’, ‘based at’, ‘ticker’, ‘listed in’ and ‘has industry focus’; the counts 499, 20+ and 7K+ appear in the figure]
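The cut-off idea can be illustrated with a cumulative score that shrinks as a path grows. The sketch below is not the information-loss measure of [MKIS’00]; it assumes a simple product of per-relationship weights in (0, 1], and every weight, the default and the cut-off value are hypothetical.

```python
from math import prod

# Hypothetical per-relationship weights in (0, 1]; higher means less loss.
WEIGHTS = {
    "ticker": 1.0,
    "leader of": 0.9,
    "based at": 0.8,
    "listed in": 0.5,
    "has industry focus": 0.2,
}


def cumulative_relevance(path, weights, default=0.1):
    """Multiply the weights of the relationships along `path`.

    Longer paths can only keep or lower the score, which matches the
    slides' preference for short paths.
    """
    return prod(weights.get(relationship, default) for relationship in path)


def still_relevant(path, weights, cutoff=0.4):
    """A path stays relevant while its cumulative score is at or above the cut-off."""
    return cumulative_relevance(path, weights) >= cutoff


keeps = still_relevant(["leader of", "ticker"], WEIGHTS)             # 0.9 * 1.0
drops = still_relevant(["leader of", "has industry focus"], WEIGHTS)  # 0.9 * 0.2
```

With these assumed numbers, following ‘leader of’ and then ‘ticker’ stays above the cut-off, while stepping through ‘has industry focus’ falls below it and expansion would stop there.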
Preliminary Results • Using human-defined relevance • pruned to 5 relevant paths • naïve method (all paths) • results in over 24K paths (of up to length 5) • [Figure: the same graph around Electronic Data Systems (EDS), with entities Tina Sivinski, Plano, NYSE:EDS, Fortune 500 and Technology Consulting and edge labels ‘leader of’, ‘based at’, ‘ticker’, ‘listed in’ and ‘has industry focus’]
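The gap between the naïve and the pruned result can be reproduced in miniature. The sketch below enumerates simple paths of up to a given length, with and without a set of relevant relationships. The edges are guessed from the figure's labels, "Other Company" is a hypothetical extra entity, and the counts it yields are for this toy graph only, not the 24K and 5 reported on the slide.

```python
from collections import defaultdict

# Toy triples; the connections are assumptions based on the figure's labels.
TRIPLES = [
    ("Tina Sivinski", "leader of", "Electronic Data Systems"),
    ("Electronic Data Systems", "ticker", "NYSE:EDS"),
    ("Electronic Data Systems", "based at", "Plano"),
    ("Electronic Data Systems", "listed in", "Fortune 500"),
    ("Electronic Data Systems", "has industry focus", "Technology Consulting"),
    ("Other Company", "listed in", "Fortune 500"),
    ("Other Company", "has industry focus", "Technology Consulting"),
]


def build_adjacency(triples):
    """Index triples so relationships can be followed in both directions."""
    adjacency = defaultdict(list)
    for subject, relationship, obj in triples:
        adjacency[subject].append((relationship, obj))
        adjacency[obj].append((relationship, subject))
    return adjacency


def enumerate_paths(adjacency, start, max_len, allowed=None):
    """Collect all simple paths from `start` with 1..max_len edges.

    If `allowed` is given, only relationships in that set are followed.
    """
    paths = []

    def extend(node, visited, path):
        if path:
            paths.append(tuple(path))
        if len(path) == max_len:
            return
        for relationship, neighbor in adjacency[node]:
            if neighbor in visited:
                continue
            if allowed is not None and relationship not in allowed:
                continue
            extend(neighbor, visited | {neighbor}, path + [(relationship, neighbor)])

    extend(start, {start}, [])
    return paths


adjacency = build_adjacency(TRIPLES)
naive = enumerate_paths(adjacency, "Tina Sivinski", max_len=5)
pruned = enumerate_paths(adjacency, "Tina Sivinski", max_len=5,
                         allowed={"leader of", "ticker", "based at"})
```

Even in this tiny graph, the hub-like entities (Fortune 500, Technology Consulting) account for most of the naïve paths, which is the effect the pruning is meant to suppress.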
Outstanding Future Work • Formalize relevance-threshold idea • leading to claim/lemma with proof • Address Scalability Issues • refinement of current indexing techniques • Release of SWETO-DBLP Ontology • enhanced ontology of DBLP data • Comprehensive Evaluations • human-subjects & comparisons with related work
Future Work: Context: why, what, how? • Context Focused/Personalized Relevance • Context captures a user’s interests to provide him/her with relevant results • By selecting concepts/relations/entities of the ontology • Will build upon our previous work [AHAS’03, ABEPS’05]
Related Work • Semantic Searching and Ranking of entities on the Semantic Web • Rocha et al. WWW’2004 • Nie et al. WWW’2005 • Guha et al. WWW’2003 • Stojanovic et al. ISWC’2003 • Zhuge et al. WWW’2003
References
[ABEPS’05] B. Aleman-Meza, P. Burns, M. Eavenson, D. Palaniswami, A.P. Sheth: An Ontological Approach to the Document Access Problem of Insider Threat, IEEE ISI-2005
[ASBPEA’06] B. Aleman-Meza, A.P. Sheth, P. Burns, D. Palaniswami, M. Eavenson, I.B. Arpinar: Semantic Analytics in Intelligence: Applying Semantic Association Discovery to determine Relevance of Heterogeneous Documents, Adv. Topics in Database Research, Vol. 5, 2006 (in print)
[AHAS’03] B. Aleman-Meza, C. Halaschek, I.B. Arpinar, A.P. Sheth: Context-Aware Semantic Association Ranking, First Int’l Workshop on Semantic Web and Databases, September 7-8, 2003
[AHARS’05] B. Aleman-Meza, C. Halaschek-Wiener, I.B. Arpinar, C. Ramakrishnan, A.P. Sheth: Ranking Complex Relationships on the Semantic Web, IEEE Internet Computing 9(3):37-44, 2005
[AHSAS’04] B. Aleman-Meza, C. Halaschek, A.P. Sheth, I.B. Arpinar, G. Sannapareddy: SWETO: Large-Scale Semantic Web Test-bed, Int’l Workshop on Ontology in Action, Banff, Canada, 2004
[AMS’05] K. Anyanwu, A. Maduko, A.P. Sheth: SemRank: Ranking Complex Relationship Search Results on the Semantic Web, WWW’2005
[AS’03] K. Anyanwu, A.P. Sheth: ρ-Queries: Enabling Querying for Semantic Associations on the Semantic Web, WWW’2003
References
[HAAS’04] C. Halaschek, B. Aleman-Meza, I.B. Arpinar, A.P. Sheth: Discovering and Ranking Semantic Associations over a Large RDF Metabase, VLDB’2004, Toronto, Canada (Demonstration Paper)
[MKIS’00] E. Mena, V. Kashyap, A. Illarramendi, A.P. Sheth: Imprecise Answers in Distributed Environments: Estimation of Information Loss for Multi-Ontology Based Query Processing, Int’l J. Cooperative Information Systems 9(4):403-425, 2000
[SAK’03] A.P. Sheth, I.B. Arpinar, V. Kashyap: Relationships at the Heart of Semantic Web: Modeling, Discovering and Exploiting Complex Semantic Relationships, in Enhancing the Power of the Internet, Studies in Fuzziness and Soft Computing (Nikravesh, Azvine, Yager, Zadeh, eds.)
[SFJMC’02] U. Shah, T. Finin, A. Joshi, J. Mayfield, R.S. Cost: Information Retrieval on the Semantic Web, CIKM 2002
[SRT’05] A.P. Sheth, C. Ramakrishnan, C. Thomas: Semantics for the Semantic Web: The Implicit, the Formal and the Powerful, Int’l J. Semantic Web Information Systems 1(1):1-18, 2005
[SS’98] K. Shah, A.P. Sheth: Logical Information Modeling of Web-Accessible Heterogeneous Digital Assets, ADL 1998
Data, demos, more publications at SemDis project web site, http://lsdis.cs.uga.edu/projects/semdis/ • Thank You