Graph and Topological Structure Mining on Scientific Articles

Fan Wang, Ruoming Jin, Gagan Agrawal and Helen Piontkivska • The Ohio State University • Kent State University • Presenter: Fan Wang, The Ohio State University

Outline • Introduction • Topological Structure Mining • Data Preprocessing and Graph Representations • Experiment Results and Pattern Analysis • Conclusion

Introduction • A huge number of genes appear in the literature • Each associated with a targeted disease or function • Finding interactions among genes manually is • Time-consuming • Error-prone

Introduction • Well-known relationships among chemokine ligands • Mining these relationships from literature documents • Mining frequent patterns from graph datasets • Convenient representation • Extensive research on subgraph mining

Introduction • Our Goal • Find commonly occurring interactions • Represent them visually • Capture the co-occurrence of scientific terms • Graph representation of scientific documents • Mining frequent topological structures

Topological Structure Mining • Disadvantages of subgraph mining • Requires exact matching • Misses potential patterns • Topological structure mining • Focuses on topological relationships • Incorporates approximate matching

Topological Structure Mining • [Figure: example graphs G, X and Y] • G is a subgraph of Y • X is a (0,3) topological structure of Y

Topological Structure Mining • Definition • Given a collection of graphs, two parameters l and h, and a threshold θ, an (l,h)-topological structure whose support is greater than or equal to θ is called a frequent topological structure. • Given a set of graphs, the TSMiner algorithm from our KDD05 paper finds the frequent topological structures
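
The slides do not spell out how a pattern edge is matched, so the Python sketch below assumes one reading of the (l,h) notation: pattern vertices map one-to-one onto graph vertices with the same label, and each pattern edge maps to a simple path whose number of intermediate vertices lies between l and h. Under that assumption a (0,0) structure is an ordinary subgraph, consistent with the G/X/Y example. The helper names (`make_graph`, `has_path`, `is_topological_structure`, `support`, `is_frequent`) are hypothetical, support is taken to be a count of graphs, and the sketch does not require different pattern edges to use disjoint paths. The brute-force search is exponential; it only illustrates the definition and is not TSMiner.

```python
from itertools import permutations


def make_graph(labels, edges):
    """Build an undirected, vertex-labelled graph (hypothetical helper).

    labels: dict mapping vertex id -> label; edges: iterable of (u, v) pairs.
    """
    edges = list(edges)
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {"labels": dict(labels), "adj": adj, "edges": edges}


def has_path(g, src, dst, l, h):
    """True if some simple path src -> dst has between l and h intermediate vertices."""

    def dfs(node, visited, inner):
        # inner = number of intermediate vertices on the current path src ... node
        for nxt in g["adj"][node]:
            if nxt == dst:
                if l <= inner <= h:
                    return True
            elif nxt not in visited and inner + 1 <= h:
                visited.add(nxt)  # nxt becomes an intermediate vertex
                found = dfs(nxt, visited, inner + 1)
                visited.discard(nxt)
                if found:
                    return True
        return False

    return dfs(src, {src}, 0)


def is_topological_structure(pattern, g, l, h):
    """Brute force over injective, label-preserving vertex mappings (exponential)."""
    pv = list(pattern["labels"])
    for image in permutations(g["labels"], len(pv)):
        m = dict(zip(pv, image))
        if any(pattern["labels"][p] != g["labels"][m[p]] for p in pv):
            continue
        # Every pattern edge must be realised by an (l, h)-bounded simple path.
        if all(has_path(g, m[u], m[v], l, h) for u, v in pattern["edges"]):
            return True
    return False


def support(pattern, graphs, l, h):
    """Number of graphs in the collection that contain the pattern."""
    return sum(is_topological_structure(pattern, g, l, h) for g in graphs)


def is_frequent(pattern, graphs, l, h, theta):
    """Frequent if the support reaches the threshold theta."""
    return support(pattern, graphs, l, h) >= theta


# Y is the path A-B-C-D-E; X is a single edge A-E.
Y = make_graph({1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}, [(1, 2), (2, 3), (3, 4), (4, 5)])
X = make_graph({"x": "A", "y": "E"}, [("x", "y")])
print(is_topological_structure(X, Y, 0, 3))  # True: the A-E path has 3 intermediate vertices
print(is_topological_structure(X, Y, 0, 0))  # False: X is not a subgraph of Y
```

Here X is a (0,3) topological structure of Y even though it is not a subgraph of Y, which is the kind of approximate match that exact subgraph mining misses.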

Our Work • Using topological structure mining • Challenges • How to create graphs? • What are the keywords? • How to insert edges into graphs?

Data Preprocessing and Graph Representation • One graph for each document • Nodes are keywords of interest • Edges are inserted based on co-occurrence of the keywords • Run the topological structure mining algorithm on the resulting graphs

Data Preprocessing • Four dictionaries of keywords • Short Dictionary • 321 genes expressed between prostate epithelial and stromal cells • Long Dictionary • 2600 human genes found in SuperArray’s DNA microarray experiment • Confusion Dictionary • Gene names easily confused with ordinary words • GO Dictionary • GO terms (molecular function, biological process and cellular component)
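
A minimal Python sketch of dictionary lookup, assuming plain-text articles, naive regex sentence splitting and case-insensitive matching. Multi-word entries such as GO terms are matched greedily, longest first. The slides do not say how the Confusion Dictionary is applied; purely as an assumption, its entries can be passed as `exclude` and are then skipped. `tokenize` and `find_keywords` are hypothetical names, and each hit records the sentence and token position that the edge construction methods need.

```python
import re


def tokenize(text):
    """Naive split into sentences (on . ! ?), then into word tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"[A-Za-z0-9-]+", s) for s in sentences if s]


def find_keywords(text, dictionary, exclude=frozenset()):
    """Return (keyword, sentence_index, token_index) for every dictionary hit.

    Matching is case-insensitive; multi-word entries are matched greedily,
    longest first. Entries listed in `exclude` are skipped (an assumption about
    how a confusion dictionary might be used).
    """
    entries = {tuple(k.lower().split()): k for k in dictionary if k not in exclude}
    max_len = max((len(e) for e in entries), default=0)
    hits, offset = [], 0  # offset = document-wide index of the sentence's first token
    for s_idx, tokens in enumerate(tokenize(text)):
        lowered = [t.lower() for t in tokens]
        i = 0
        while i < len(tokens):
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                key = tuple(lowered[i:i + n])
                if key in entries:
                    hits.append((entries[key], s_idx, offset + i))
                    i += n
                    break
            else:  # no dictionary entry starts at token i
                i += 1
        offset += len(tokens)
    return hits


text = "CCL5 binds to cell adhesion molecules. IGF1 and IGFBP3 interact."
print(find_keywords(text, {"CCL5", "IGF1", "IGFBP3", "cell adhesion"}))
```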

Graph Representations • Edge Construction Methods • Sentence-based Method • Two keywords appear in the same sentence • Mutual Information Method • The mutual information of two keywords is greater than a threshold • Sliding Window Method • Two keywords are located within a sliding window of a pre-defined size
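
The three edge construction methods, and the one-graph-per-document representation, can be sketched in Python as follows. The input is assumed to be a list of keyword hits `(keyword, sentence_index, token_index)` for one document. The sliding window is assumed to span `window` consecutive tokens, and the mutual information method is rendered here as pointwise mutual information estimated from the fraction of units (for example sentences) that contain each keyword. These choices, the `build_graph` helper and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def sentence_edges(hits):
    """Sentence-based: link two distinct keywords that occur in the same sentence."""
    by_sentence = defaultdict(set)
    for kw, s_idx, _ in hits:
        by_sentence[s_idx].add(kw)
    return {frozenset(p) for kws in by_sentence.values() for p in combinations(sorted(kws), 2)}


def window_edges(hits, window):
    """Sliding window: link two distinct keywords that fit in `window` consecutive tokens."""
    ordered = sorted(hits, key=lambda hit: hit[2])
    edges = set()
    for i, (a, _, pa) in enumerate(ordered):
        for b, _, pb in ordered[i + 1:]:
            if pb - pa >= window:  # too far apart; later hits are even farther
                break
            if a != b:
                edges.add(frozenset((a, b)))
    return edges


def mi_edges(units, threshold):
    """Mutual information (pointwise MI here): link keywords whose PMI exceeds threshold.

    units: list of keyword sets, e.g. one set per sentence; probabilities are the
    fraction of units containing the keyword(s).
    """
    n = len(units)
    single = Counter(kw for u in units for kw in set(u))
    pair = Counter(frozenset(p) for u in units for p in combinations(sorted(set(u)), 2))
    edges = set()
    for p, c_ab in pair.items():
        a, b = tuple(p)
        pmi = math.log((c_ab / n) / ((single[a] / n) * (single[b] / n)))
        if pmi > threshold:
            edges.add(p)
    return edges


def build_graph(hits, edges):
    """One graph per document: nodes are the keywords found, plus the chosen edges."""
    return {"nodes": {kw for kw, _, _ in hits}, "edges": set(edges)}


hits = [("CCL5", 0, 0), ("cell adhesion", 0, 3), ("IGF1", 1, 6), ("IGFBP3", 1, 8)]
print(sentence_edges(hits))
print(window_edges(hits, 4))
```

With a window of 4 tokens, "cell adhesion" and IGF1 are linked across a sentence boundary, an edge the sentence-based method cannot produce.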

Experiment Results • Focusing on articles containing at least one of the 5 genes • CCL5, TF, IGF1, MYLK, IGFBP3 • Generating a graph for each article • Finding frequent topological structures
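
The slides do not describe how the articles were retrieved; as a toy illustration only, the selection criterion can be expressed as a whole-word, case-sensitive filter over article text. `TARGET_GENES`, `mentions_target` and `select_articles` are hypothetical names.

```python
import re

# The five genes named on the slide.
TARGET_GENES = ("CCL5", "TF", "IGF1", "MYLK", "IGFBP3")


def mentions_target(text, genes=TARGET_GENES):
    """True if any gene symbol appears as a whole, case-sensitive word."""
    return any(re.search(rf"\b{re.escape(g)}\b", text) for g in genes)


def select_articles(articles, genes=TARGET_GENES):
    """Keep only the articles that mention at least one target gene."""
    return [a for a in articles if mentions_target(a, genes)]


articles = ["CCL5 is a chemokine.", "No genes here.", "Levels of IGFBP3 rose.", "tf in lower case"]
print(select_articles(articles))
```

Whole-word matching keeps a symbol such as IGF1 from matching inside a longer token like IGF1R.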

Three Edge Construction Methods

Results • The sliding window method wins • Largest number of frequent patterns • Best scalability • Topological structure mining yields more frequent patterns than subgraph mining • A larger number of patterns does not imply higher biological significance

Pattern Analysis • Patterns that can ONLY be found by topological structure mining • Patterns that can ONLY be found by the sliding window method • Restoring nodes reveals interesting patterns

Conclusion • The sliding window method is the best • The largest number of frequent patterns • The highest quality of frequent patterns • The topological structures found correspond well to known relationships • Topological structure mining is a valuable tool for biological researchers

Three Edge Construction Methods • Interestingness of Edges • Counting the number of distinct edges • Computing the average interestingness of the edges in all patterns found with each edge construction method
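
The slide does not define the interestingness of an edge, so this Python sketch takes it as a caller-supplied `score` function and, as an assumption, averages it over the distinct edges found by one method. Patterns are represented as lists of label pairs; `edge_statistics` is a hypothetical name.

```python
from statistics import mean


def edge_statistics(patterns, score):
    """Count distinct edges across one method's patterns and average their score.

    patterns: list of patterns, each an iterable of (label_u, label_v) edges.
    score:    caller-supplied interestingness function on an edge (a frozenset of
              two labels); the slides do not define this measure.
    """
    distinct = {frozenset(e) for p in patterns for e in p}
    if not distinct:
        return 0, 0.0
    return len(distinct), mean(score(e) for e in distinct)


patterns = [[("A", "B"), ("B", "C")], [("B", "A")]]
print(edge_statistics(patterns, lambda e: 1.0 if e == frozenset({"A", "B"}) else 0.0))  # (2, 0.5)
```

Calling it once per method, for example `{m: edge_statistics(p, score) for m, p in patterns_by_method.items()}`, yields the two quantities the slide compares.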
