Impact of different relation extraction methods on network analysis results
Jana Diesner
Motivation
Need: scalable, reliable, robust methods & tools
Text Data • Network Data • Applications
• Unstructured
• At any scale
• Network Analysis
• Answer substantive and graph-theoretic questions
• Develop and test hypotheses and theories
• Visualizations
• Populate databases
• Input to further computations, e.g. simulations, machine learning
Research Questions and Relevance
• How do network data and analysis results obtained by using different relation extraction methods compare to each other?
• Why does it matter?
  • Increased comparability, generalizability, and transparency of methods and tools
  • Increased control and power for developers and users
  • Supports drawing reasonable and valid conclusions
Relation Extraction Methods
• Meta-data (META)
• Subject Matter Experts (SME)
• Text, automated (TextA)
• Text, manual (TextM)
[Diagram labels: Meta-Data; Database query; Codebook; Proximity-based linkage of nodes (three occurrences)]
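The slide names proximity-based linkage of nodes but does not spell out the procedure. A minimal sketch of one common reading, assuming that two entity mentions are linked when they occur fewer than `window` tokens apart; the function name `link_by_proximity`, the window size, and the toy sentence are illustrative assumptions, not the author's implementation:

```python
from collections import Counter
from itertools import combinations


def link_by_proximity(tokens, entities, window=5):
    """Count links between distinct entities whose mentions are fewer
    than `window` token positions apart (hypothetical linkage rule)."""
    # Positions of every token that is a known entity mention.
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in entities]
    links = Counter()
    for (i, a), (j, b) in combinations(mentions, 2):
        # combinations() preserves order, so j > i always holds here.
        if a != b and j - i < window:
            links[tuple(sorted((a, b)))] += 1
    return links


# Toy input, invented for illustration only.
tokens = "alice met bob in paris before bob called carol".split()
links = link_by_proximity(tokens, {"alice", "bob", "carol"}, window=4)
```

A larger window yields denser networks, so the window size is one of the choices that could make results from different extraction runs hard to compare.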
Data
• Large-scale, over-time, open source data from different domains
Results I
• Text automated vs. manual: total number of nodes of sub-type “generic” is far higher than of sub-type “specific”
  • Rethink focus of network analysis: collectives vs. individuals
  • Importance of detecting unnamed entities
• Ground truth data (SME) are barely reproduced by networks extracted from text bodies, and not at all by meta-data networks
  • In the best case: 50% of nodes and 20% of links
• Agreement in structure and key entities depends on the type of network
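The slide does not state how node and link agreement with the ground truth was measured. One simple way to compute such shares, sketched here under the assumption that agreement means the fraction of reference (SME) nodes and undirected links that also appear in the candidate network; the function name and the toy networks are hypothetical and do not reproduce the study's data:

```python
def overlap_with_reference(ref_nodes, ref_links, cand_nodes, cand_links):
    """Return (node_share, link_share): the fractions of reference nodes
    and undirected reference links also present in the candidate."""
    ref_nodes, cand_nodes = set(ref_nodes), set(cand_nodes)
    # frozenset makes (a, b) and (b, a) count as the same undirected link.
    ref_links = {frozenset(link) for link in ref_links}
    cand_links = {frozenset(link) for link in cand_links}
    if not ref_nodes or not ref_links:
        raise ValueError("reference network must have nodes and links")
    node_share = len(ref_nodes & cand_nodes) / len(ref_nodes)
    link_share = len(ref_links & cand_links) / len(ref_links)
    return node_share, link_share


# Toy networks, invented for illustration only.
sme_nodes = {"a", "b", "c", "d"}
sme_links = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
text_nodes = {"a", "b", "c", "x"}
text_links = [("b", "a"), ("a", "x")]

node_share, link_share = overlap_with_reference(
    sme_nodes, sme_links, text_nodes, text_links
)
```

Measuring against the reference (rather than against the union of both networks) is a design choice; a symmetric measure such as Jaccard similarity would give lower values whenever the candidate contains extra nodes or links.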
Results II
• Agreement among text-based networks, and between text-based and meta-data networks, depends on the type of network
• For a more complete view, combine the automated text-based network with the meta-data network
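How the two networks should be combined is not specified on the slide. A minimal sketch, assuming a plain union of undirected links that records which method contributed each link; the function name `combine_networks`, the source labels, and the toy links are illustrative assumptions:

```python
def combine_networks(text_links, meta_links):
    """Union two undirected link lists, keeping for each link the set of
    methods (hypothetical labels "TextA" and "META") that produced it."""
    combined = {}
    for source, links in (("TextA", text_links), ("META", meta_links)):
        for a, b in links:
            # frozenset ignores link direction.
            combined.setdefault(frozenset((a, b)), set()).add(source)
    return combined


# Toy links, invented for illustration only.
combined = combine_networks(
    text_links=[("a", "b"), ("b", "c")],
    meta_links=[("b", "a"), ("c", "d")],
)
shared = [link for link, sources in combined.items() if len(sources) == 2]
```

Keeping the provenance per link makes it possible to report both the combined view and the part on which the two methods agree.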
Acknowledgements
• This work was supported by the National Science Foundation (NSF) IGERT 9972762, the Army Research Institute (ARI) W91WAW07C0063, the Army Research Laboratory (ARL/CTA) DAAD19-01-2-0009, the Air Force Office of Scientific Research (AFOSR) MURI FA9550-05-1-0388, the Office of Naval Research (ONR) MURI N00014-08-11186, and a Siebel Scholarship. Additional support was provided by the CASOS Center at Carnegie Mellon University. The views and conclusions contained in this talk are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the NSF, ARI, ARL, AFOSR, ONR, or the United States Government.
• Thank You! Questions, Comments, Feedback: jdiesner@illinois.edu