190 likes | 356 Views
M ultimodal I nformation A ccess and S ynthesis A DHS Institute of Discrete Science Center @ UIUC. Dan Roth Department of Computer Science University of Illinois at Urbana/Champaign. MIAS Mission. Most of the data today is unstructured
E N D
Multimodal Information Access and Synthesis A DHS Institute of Discrete Science Center @ UIUC Dan Roth Department of Computer Science University of Illinois at Urbana/Champaign
MIAS Mission • Most of the data today is unstructured • books, newspaper articles, journal publications, field reports, images, and audio and video streams. • How to deal with the huge amount of unstructured data as if it was organized in a database with a known schema. • how to locate, access, organize, analyze and synthesize unstructured data. MIAS Mission: • develop the theories, algorithms, and tools for analysts to • access a variety of data formats and models • integrate them with existing resources • transform raw data into useful and understandable information.
Task Perspective • In the next decade, intelligence analysts will need to • monitor a huge number of interesting events and entities • formulate and evaluate hypotheses with respect to them. • Analysts must interact, at the appropriate level of semantic abstraction, with a system that can • synthesize, summarize and interpret vast amounts of multimodal information, • integrate observed data with domain models and background information in multiple formats, • propose hypotheses, and help verify them.
Scenario • Consider an intelligence analyst researching a problem • Iranian nuclear program – generate a list of Iranian nuclear scientists, affiliations, specialties, biographies, photos, and notable recent activities. • Medical treatment – what is known about it; who are the experts; what do users say about it; what side effects have been reported • Current technologies have solved the problem of collecting and storing huge amounts of information; it would be reasonable to assume that the information she is after does exist; • However, multiple barriers exist on the way to a successful completion of the analysis, each posing a significant research challenge.
Multimodal Information Access & Synthesis Saeed Zakeri Saeed Zakeri Online Data Sources Focused Multimodal Data Retrieval Web pages attended * * News Articles Field reports Specific Web sites Text Repositories Relational databases Surveillance Videos ... Discover unusual events, entities, and associations. Continuous monitoring of events, entities & associations Rapid retrieval of all info. about a particular entity Efficient search, querying, question answering, browsing, mining, * * * * * visited Elkhan Factory Text documents U. of Tehran * * Elkhan Factory * * * * * * Relational data loc = Northern Iran name = Elkhan Factory topics = fertilizer, enrichment Semantic categories Temporal categories Subjectivity/Opinions Images Infer Metadata: Semantic entities Discover Relations Between Semantic Entities Support Information Analysis, Knowledge Discovery, Monitoring Semantic Disambiguation & Integration across multiple sources and modalities
MIAS Processes • Focused data retrieval and integration • Identify and collect relevant data from multiple sources • Semantic data enrichment • Infer semantics from unstructured data and images; • Allow navigation and search across disparate data modalities; augment KBs • Entity identification and relations discovery • Identify real-world entities and relations among them • Relate them to existing institutional resources for information integration • Knowledge discovery and hypotheses generation and verification • Construct the rich semantic structure and hidden networks of entity linkages • Foundations • Machine learning, database and data mining, natural language processing, inference and optimization and computer vision techniques • Required for and driven by the aforementioned problems.
Integrated Mission: Research & Education • Develop diverse human resources to enhance the scientific research, educational, and governmental workforce in MIAS • Educational and Outreach Initiatives: • Target students from small research programs & minority-serving • Expose them to the national labs • Open opportunities for bigger impact • A comprehensive education program designed to increase participation in the study and practice of MIAS topics: • Provide substantive training for a new generation of experts in the field, • Serve as a tool for recruiting an experienced group of undergraduates into graduate study in one of the broad fields of information science • Be an intellectual community center, where participants at all levels of expertise come together in an enriched environment of collaboration.
Data Science Summer Institute at UIUC 1st DSSI: May 2007 Huge Success • Intensive Course • in • The Math of Data Sciences • Probability and Statistics • Linear Algebra • Data Structures and Algorithms • Optimization • Learning & Clustering • Research Projects • Led by co-PIs and Grad Students • Topics: • Virtual Web: Focused Crawling • Relations and Entities • Text and Images 8 weeks course 25 students Faculty from UIUC, Kansas State, UTSA, UTEP Advanced MIAS Related Tutorials Speaker Series
Machine Learning & Data Mining Databases & Information Integration Computer Vision Natural Language Processing & Information Extraction Information Retrieval & Web Information Access Data Science Summer Institute at UIUC Advanced MIAS Related Tutorials
Our Team • Leading researchers in intelligent information access & analysis and its foundations • A large number of affiliates/consultants covering all areas of interest to the MIAS center.
Kevin C. Chang: Data Integration and Retrieval Deep Web MetaQuerier: Large-scale integration over deep Web • pioneered the “holistic integration” paradigm • Widely published at SIGMOD, VLDB, ICDE • System building and demo at CIDR, SIGMOD, VLDB, … • PC/editor/organizers of SIGMOD, ICDE, WWW, SIGKDD special issue, WIRI06, IIWeb06 workshops • Awards: NSF CAREER, IBM Faculty, NCSA Faculty VLDB00 Best Paper Selection
AnHai Doan: Data Integration • Data integration • Matching Schemas, Ontologies, Entities • Integrating databases and text • ACM Doctoral Dissertation Award 2004 • Sloan Fellowship 2006 • Edited special issues on Data Integration • Co-chaired workshops on data integration, Web technologies, machine learning • Co-writing a book titled “Data Integration” Monitoring People and Events
David Forsyth: Computer Vision and Learning Linking Text and Images Labeling images via (a lot of) caption text • Leading Computer Vision researcher, • Over 110 papers on vision, graphics, learning applications • Program Chair, CVPR 2000, General chair CVPR 2006, regular member of PC in all major vision conferences • IEEE Technical Achievement Award, 2006 • Lead author of main textbook, widely adopted
Jiawei Han: Data Mining • Mining patterns and knowledge discovery from massive data • Research focus: Mining data streams, frequent patterns, sequential patterns, graph patterns, and their applications • Privacy preserving Data Mining • Developed many popular data mining algorithms, e.g., FPgrowth, PrefixSpan, gSpan, StarCubing, CrossMine, RankingCube, and CrossClus • Over 300 research papers published in conferences and journals • Editor-in-Chief, ACM Transactions on Knowledge Discovery from Data • Textbook, “Data mining: Concepts and Techniques,” adopted worldwide
Cinda Heeren: Data Mining, Education • MIAS Summer School Director • Mathematical Foundation of Data Science Discrete UIUC CS Department Director of Diversity Programs • Lecturer for Discrete Math, Data Structures courses at UIUC • One of the leading teachers and educational leaders at UIUC. • Research in Data Mining and Data bases • Speaker and regular presenter at conferences for young women, including: • GAMES and WYSE summer camps • Expanding Your Horizons careers conference • Grace Hopper Celebration of Women in Computing 2004 - panel on best practices for recruiting women into undergraduate programs in CS. • SIGCSE 2006 - workshop, “How to host your own Small Regional Celebration of Women in Computing.” Summer School
ChengXiang Zhai: Information Retrieval and Text Mining • Probabilistic Paradigms for Information Retrieval • Personalized/Context Dependent Search • Relation Identification and Data Integration • Leading expert in information retrieval and search technologies • Recipient of the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), • Main architect and key contributor of the Lemur Toolkit (being used by many research groups and IR companies around the world) • ACM SIGIR’04 best paper award • Selected services include Program Chair of ACM CIKM 2004; HLT/NAACL 06; SIGIR’09
Dan Roth: Machine Learning, NLP • Semantic Analysis and Data enrichment • Entity and Relation Identification and Integration • Textual Entailment • Machine Learning Methods for NLP and IE • Leading Researcher in Machine Learning, NLP, AI • Developed Popular Machine Learning system and machine learning based NLP tools used in industry and NLP classes. • Program Chair, ACL’03, CoNLL’02; Regular senior PC member in all major Machine Learning, NLP and AI conferences • Associate Editor: Journal of Artificial Intelligence Research; Machine Learning • Multiple papers awards
Information Access & Synthesis Processes Tools Text Processing& Analysis Semantic Analysis & Information Extraction Information Integration Machine Learning & Data Mining Integrating Text & Images • Focused data retrieval and integration, • Identify and collect relevant data from multiple sources • Semantic data enrichment, • Infer semantics from unstructured data and images; • Allow navigation and search across disparate data modalities; augment KBs • Entity identification and relations discovery, • Identify real-world entities and relations among them • Relate them to existing institutional resources for information integration • Knowledge discovery and hypotheses generation and verification, • Construct the rich semantic structure and hidden networks of entity linkages • Foundations • Machine learning, database and data mining, natural language processing, inference and optimization and computer vision techniques • Called for and driven by the aforementioned problems.