Information management, workflow and discovery / check-in for project definitions. Peter Fox, Xinformatics 4400/6400, Week 10, April 9, 2013.
Review of reading
• Information Integration
  • Social issues in information discovery and sharing
  • Information integration in geo-informatics
  • http://cseweb.ucsd.edu/~goguen/projs/data.html
  • http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839387/
• Information Life Cycle
  • MSDN Information Life Cycle
  • Information Life Cycle definition and context
  • http://www.computerworld.com/s/article/79885/The_new_buzzwords_Information_lifecycle_management
  • http://www.databasejournal.com/sqletc/article.php/3340301/Database-Archiving-A-Critical-Component-of-Information-Lifecycle-Management.htm
  • http://en.wikipedia.org/wiki/Information_Lifecycle_Management
  • http://msdn.microsoft.com/en-us/library/bb288451.aspx
• Information Visualization
  • http://mastersofmedia.hum.uva.nl/2011/04/18/the-simple-ways-of-information-visualization/comment-page-1/
  • http://www.siggraph.org/education/materials/HyperVis/domik/folien.html
  • http://www.visual-literacy.org/periodic_table/periodic_table.html
• Information model development and visualization
  • http://www.acm.org/crossroads/xrds7-3/smeva.html
• Outside the current box
  • Peter Fox and James Hendler, 2011, Changing the Equation on Scientific Data Visualization, Science, Vol. 331 no. 6018 pp. 705-708, DOI: 10.1126/science.1197654, online at http://www.sciencemag.org/content/331/6018/705.full or see: http://escience.rpi.edu/publications/visualization/fox_hendler_science2011.html
Logical Collections • The primary goal of a management system is to abstract the physical collection into logical collections. The resulting view is a uniform, homogeneous collection. • Note the analogy with logical models and information integration, so EARLY ON: • Identify naming conventions and organization • Align cataloguing and naming to facilitate search, access, use (who uses?) • Provide **contextual** information
Physical Handling • Map between physical and logical. • Where and who does it come from? • Is there a transfer into a physical form? • Is it backed-up, archived, cached? … • What formats? • Naming conventions – do they change? • Note analogy to physical models
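The map between logical and physical can be sketched as a small catalog. This is a minimal illustration only: the class names, the logical name, and the file locations are all hypothetical, not part of any system discussed in class.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalCopy:
    location: str  # where the bytes live (hypothetical URI)
    fmt: str       # file format of this copy
    role: str      # "primary", "backup", "archive" or "cache"


@dataclass
class LogicalItem:
    name: str    # stable logical name that users see
    source: str  # where and who it comes from
    copies: list = field(default_factory=list)


class Catalog:
    """Maps logical names onto one or more physical copies."""

    def __init__(self):
        self._items = {}

    def register(self, name, source):
        # Create the logical entry once; later calls return the same entry.
        return self._items.setdefault(name, LogicalItem(name, source))

    def add_copy(self, name, location, fmt, role="primary"):
        self._items[name].copies.append(PhysicalCopy(location, fmt, role))

    def resolve(self, name, role="primary"):
        # Logical name in, physical locations out.
        return [c.location for c in self._items[name].copies if c.role == role]


catalog = Catalog()
catalog.register("sst/monthly/2012-01", source="hypothetical sensor team")
catalog.add_copy("sst/monthly/2012-01", "file:///data/raw/a0012.nc", "netCDF")
catalog.add_copy("sst/monthly/2012-01", "file:///backup/a0012.nc", "netCDF", role="backup")
```

Users work only with the logical name; physical file names can change without breaking the logical view.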
Security • Access authorization and change verification. This is the basis of trusting your information.
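One common way to do change verification is to record a cryptographic fingerprint when information enters the collection and compare it later. The sketch below assumes SHA-256 from the Python standard library; the function names and sample content are made up for illustration, and access authorization is not covered.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    # SHA-256 digest of the content, as a hex string.
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content: bytes, recorded: str) -> bool:
    # True only if the content still matches the recorded fingerprint.
    return fingerprint(content) == recorded


original = b"station,temp\nA,21.5\n"
recorded = fingerprint(original)        # stored when the item is ingested
tampered = b"station,temp\nA,25.1\n"    # same layout, one value altered
```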
Ownership • Who is responsible for quality and meaning?
Metadata • Recall metadata are data about data. • Metainformation?
Persistence • Deployment of mechanisms to counteract technology obsolescence.
Discovery • Ability to identify useful relations and information inside the collection • More on this later in this class
Dissemination • Mechanisms to make interested parties aware of changes and additions to the collections. • Do you rely on information retrieval? The Web?
Summary of Information Management • Creation of logical collections • Physical handling • Interoperability support • Security support • Ownership • Metadata collection, management and access. • Persistence • Knowledge and information discovery • Dissemination and publication
Note for your project writeup! • Information management! Cover the 9 areas.
Information Workflow • What is a workflow? • Why would you use it? • Key considerations for information, cf. data • Some pointers to workflow systems
What is a workflow? • General definition: “series of tasks performed to produce a final outcome” (taxes?) • Information workflow: involves people, but potentially we want to • Automate jobs that a person traditionally performed manually • Process large volumes of information faster than one could do by hand • NB the difference from data workflows: an information workflow reaches out to encompass the user (e.g. ‘unrecorded actions’)
Background: Business Workflows • Example: planning a trip • Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc. • Each task may depend on outcome of previous task • Days you reserve the hotel depend on days of the flight • If hotel has shuttle service, may not need to rent a car • Prior information, experience, preferences…
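The trip example can be sketched as tasks with dependencies, ordered so that each task runs after the tasks it depends on. This is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (3.9+); the task names and the `plan_trip` function are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_trip(hotel_has_shuttle: bool) -> list:
    # Each task maps to the set of tasks that must finish first.
    deps = {
        "book_flight": set(),
        "reserve_hotel": {"book_flight"},  # hotel days depend on flight days
    }
    if not hotel_has_shuttle:
        # The car is only needed once we know the hotel has no shuttle.
        deps["rent_car"] = {"reserve_hotel"}
    return list(TopologicalSorter(deps).static_order())
```

Calling `plan_trip(True)` drops the rental car task, showing how the outcome of one task changes the rest of the workflow.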
What about information workflows? • Perform a set of transformations/ operations on information source(s) • Examples • Generating images from raw data • Identifying areas of interest from a large information source (e.g. word cloud) • Classifying a set of objects • Querying a web service for more information on a set of objects • Many others…
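As a rough illustration of a set of transformations on an information source, the sketch below chains three steps to find the most frequent terms in a text, which is the counting step behind a word cloud. The stopword list, function names, and sample text are assumptions made for the example.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "of", "the", "to"}  # tiny illustrative list


def tokenize(text):
    # Step 1: lowercase the text and split it into alphabetic words.
    return re.findall(r"[a-z]+", text.lower())


def drop_stopwords(tokens):
    # Step 2: remove words that carry little meaning.
    return [t for t in tokens if t not in STOPWORDS]


def top_terms(tokens, n):
    # Step 3: count the terms and keep the n most frequent.
    return Counter(tokens).most_common(n)


def run_workflow(text, n=2):
    # The workflow is the three steps applied in sequence.
    return top_terms(drop_stopwords(tokenize(text)), n)


sample = "Workflows process information. Information workflows reuse the information."
```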
More on Workflows • Can process many information types: • Archives • Web pages • Streaming/ real time • Images • Semiotic systems • Robust workflows depend on formal (conceptual and logical) models of the flow of information among components • May be simple and linear or very complex
Challenges • Questions: • What are some challenges for users in implementing workflows? • What are some challenges to executing these workflows? • What are limitations of writing a program? • Mastering a programming language • Visualizing workflow • Sharing/exchanging workflow • Formatting issues • Locating datasets, services, or functions
Benefits of Workflows • Documentation of aspects of analysis • Visual communication of analytical steps • Ease of testing/debugging • Reproducibility • Reuse of part or all of workflow in a different project
Additional Benefits • Integration of and between multiple computing environments • ‘Automated’ access to distributed resources via other architectural components, e.g. web services and Grid technologies • System functionality to assist with information integration of heterogeneous components and sources
Why not just use a script? • Script does not specify low-level task scheduling and communication • May be platform-dependent • Can’t be easily reused • May not have sufficient documentation to be adapted for another purpose
Why can a GUI be useful? • No need to learn a programming language • Visual representation of what workflow does • Allows you to monitor workflow execution • Enables user interaction (though not necessarily collaboration) • Facilitates sharing of workflows
Some workflow systems • Kepler • SCIRun • Sciflo • Triana • Taverna • Pegasus • Some commercial tools: • Windows Workflow Foundation • Mac OS X Automator • http://www.isi.edu/~gil/AAAI08TutorialSlides/5-Survey.pdf • http://www.isi.edu/~gil/AAAI08TutorialSlides/ • See reading for this week
Discovery • How does someone find your information? • How would you provide discovery of • collections • files • ‘bits’ • How would you find ->
Discovery • Search (Federated Search) • Helped by • Folksonomies (user contributed) • Intelligent Agents • Search Engines • Taxonomies • Find photos of Kim • Boy or girl?
Use cases • Find a sound recording of a swallow. • Excuse me?
Use cases • Find a sound recording of an African Swallow • Find a sound recording of a bird that sounds like an African Swallow • Media types – how can you discover them?
Use cases • Find the movie that Jeanne Tripplehorn first starred in/ that was her most successful/ in which she was lead actress? • Has anyone gene sequenced a mouse? • Find images of primary productivity in the North Atlantic • Discovery can often involve information integration (or is it *almost always*?)
Three level ‘metadata’ solution for DATA (A.K. Sinha, Virginia Tech, 2006) • Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity • Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products • Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities • Diagram labels: Data Discovery; Data Integration; Earth Sciences Virtual Database (a data warehouse where the schema heterogeneity problem is solved); schema-based integration; ontology-based data integration using scientific workflows
Three level ‘metadata’ solution? (A.K. Sinha, Virginia Tech, 2006) • Level 1: Registration at the Discovery Level, e.g. find the upper level entry point to a source • Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization • Level 3: Registration at the Item Detail Level, i.e. annotation, e.g. tagging • Diagram labels: Information Discovery; Information Integration; Catalog/Index; schema-based integration; integration using mapping management
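The three registration levels can be pictured as a nested record. The following is a hypothetical sketch, not the structure used in the cited work: the registry contents, field names, and function names are all invented for illustration.

```python
REGISTRIES = [
    {
        "entry_point": "hypothetical-volcano-archive",  # Level 1: discovery
        "topics": {"volcano"},
        "inventory": {                                   # Level 2: inventory
            "eruption-log": {"tags": {"activity", "dates"}},  # Level 3: item tags
            "gas-samples": {"tags": {"chemistry"}},
        },
    },
    {
        "entry_point": "hypothetical-bird-sounds",
        "topics": {"birds", "audio"},
        "inventory": {"swallow-calls": {"tags": {"swallow", "africa"}}},
    },
]


def find_sources(registries, topic):
    # Level 1: find the upper level entry points relevant to a topic.
    return [r["entry_point"] for r in registries if topic in r["topics"]]


def list_inventory(registry):
    # Level 2: list what a source holds, using its logical organization.
    return sorted(registry["inventory"])


def find_items(registry, tag):
    # Level 3: use item-level annotation (tags) to pick out individual items.
    return sorted(n for n, item in registry["inventory"].items() if tag in item["tags"])
```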
Information discovery • What makes discovery work? • Metadata • Logical organization • Attention to the fact that someone would want to discover it • It turns out that file types are a key enabler or inhibitor to discovery • Result ranking using *tuned* algorithm • What does not work? • Result ranking algorithms that depend on unconventional information types (icon, index, symbol)
Federated search • “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia) • Libraries have been doing this for a long time (Z39.50, ISO 23950) • Key is consistent search metadata fields (keywords) • E.g. Geospatial One Stop http://www.geodata.gov
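A toy version of federated search: the same keyword query is sent to several sources at once and the hits are merged. It works only because every source exposes the same `keywords` field. The sources here are in-memory lists with invented names and records; a real system would query remote services, e.g. over Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor

# Two hypothetical sources that share one metadata field: "keywords".
SOURCES = {
    "library_a": [
        {"title": "Volcano activity atlas", "keywords": {"volcano", "geology"}},
        {"title": "Bird song archive", "keywords": {"birds", "audio"}},
    ],
    "library_b": [
        {"title": "Eruption timeline", "keywords": {"volcano", "history"}},
    ],
}


def search_source(name, keyword):
    # Search a single source on the shared "keywords" field.
    return [
        {"source": name, "title": rec["title"]}
        for rec in SOURCES[name]
        if keyword in rec["keywords"]
    ]


def federated_search(keyword):
    names = sorted(SOURCES)
    # Query all sources concurrently; map() keeps results in source order.
    with ThreadPoolExecutor() as pool:
        per_source = pool.map(lambda n: search_source(n, keyword), names)
        return [hit for hits in per_source for hit in hits]
```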
Smart search • Semantically aware search, e.g. http://noesis.itsc.uah.edu, http://eie.cos.gmu.edu (Water -> Semantic Search) • Faceted search, e.g. mspace (http://mspace.fm), exhibit (MIT), S2S (RPI; http://aquarius.tw.rpi.edu/s2s)
Faceted search logd.tw.rpi.edu
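Faceted search narrows a collection by selecting values for metadata fields (facets) and shows how many items remain under each remaining value. The sketch below is a generic illustration and does not reproduce how mspace, exhibit, or S2S work; the records and facet names are invented.

```python
from collections import Counter

# Hypothetical records; "media" and "region" are the facets.
RECORDS = [
    {"title": "Chlorophyll map", "media": "image", "region": "North Atlantic"},
    {"title": "Swallow call", "media": "audio", "region": "Africa"},
    {"title": "Productivity plot", "media": "image", "region": "North Atlantic"},
    {"title": "Savanna photo", "media": "image", "region": "Africa"},
]


def filter_by(records, **selected):
    # Keep records that match every selected facet value.
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


def facet_counts(records, facet):
    # Count how many records fall under each value of one facet.
    return Counter(r[facet] for r in records)
```

Selecting `media="image"` and then reading `facet_counts(..., "region")` tells the user how many images each region would yield before they click.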
Summary - discovery • Useful to write a few discovery use cases to drive how your design is developed • Evolution of your role in facilitating discovery and what/ how others implement access to your information
Reading for this week • Is retrospective
Check in for Project Assignment • Analysis of existing information system content and architecture, critique, redesign and prototype redeployment • Or a new use case, development, etc.
What is next • April 16 – Information Audit • April 23 – • April 30 – • May 6 – final project presentations