Building an Integrated Information Service: a strategic initiative Andrew Schain, NASA HQ’s CTO, October 26th 2005
Agenda • Introductions • Problem Definition • Approach • Questions (jeers/cheers)
The Big Problem ™ It is increasingly difficult to find our stuff. • It’s hard to find your stuff • It’s hard for you to find my stuff • It’s nearly impossible to be made aware of stuff we don’t know about • Stuff: everything from raw instrument data to video clips to compiled human analysis
The Big Problem ™ It is increasingly difficult to understand which stuff is relevant. • In the world of decision-making support, we can’t always anticipate where the next piece of required (sometimes vital) information is.
The Big Problem ™ It’s really big (1) Corpus Size & Rate of Growth. • NASA has a tremendous amount of data collected over the last 50 years. • The exact size and growth rate of our data collection are unknown • Employees, partners and customers are generating new data continuously • Efforts to assess the value of our data collection in either informational or financial terms are difficult • Neither the collection nor its growth rate is likely to diminish significantly in the next 5 years • 13% of NASA’s budget is spent supporting information technology.
The Big Problem ™ It’s really complicated (2) Variety of Data Sources & Types. • Data and information have great variety in origin, source and type, ranging from one-of-a-kind instruments and software to last-of-its-kind legacy systems. • Collection includes foundational science data and PowerPoint briefings • stored in man-made appliances and human experiences • Computer systems and instruments are diverse, spread out across the globe • in some cases, beyond • Data consumers are also potential producers • regenerating data through analysis, compilation, edits and emails. Nearly every instance is another piece of data added to our unorganized collection.
The Big Problem ™ We are not monolithic (3) Customer Environment • For NASA, the world is our data and information community. Our customers vary from schoolchildren to university researchers. They encompass nearly all of the disciplines of science, engineering and project management. They speak many different natural languages accented with unique science nomenclatures, technical idioms and the contextual nuances of their own experiences. • Even where there is a common language, humans use different vocabularies and meanings, and domain specialists may have difficulties in conveying information to non-specialists.
The Big Problem ™ Searching alone won’t help (4) Discovery & Relevance • As the quantity and variety of data / information increases, it is increasingly difficult to find information that you or your organization has collected • Learning of related information outside of your own organization seems impossible, but is required to achieve a more complete and effective understanding of many of our activities. We cannot anticipate the exact piece of information we will need, but we need to be aware of it nonetheless. In other words, a mechanism is required to present data in relevant context to each unique situation without knowing the data source or potential consumer(s) in advance.
A bunch of us thought about this problem a whole lot Major discovery: The problem is solvable! • 5 of us co-authored a white paper Schain, Raskin, Wilson, Keller, Truszkowski • Lots of early reviewers and editors • Big contributions from Jeanne Holm, Scott Glasser, John McManus, Rob Winters, Kendall Clark, Bijan Parsia, Jim Hendler
Our Options First, admit that we have a problem. Then, either (1) We fix the data problem, or (2) We don’t fix the data problem. • We let the data problem fix itself. • We let someone else fix the data problem. (NASA-Speak for nobody fixes it) Let’s just say, “option 1”
Some derived requirements (or are they inferred?) • Mustn’t shock the system. • Human, financial, or in-situ services / data independence • Reality: One size does not fit all (it’s complex). • This effort is across the board! And involves everyone. • Must enable experts to effect adds/changes within their discipline while making results available for others. • Must scale up and be available across timelines. • Uniform/understood machine interfaces in a distributed service • Must provide accurate results promptly. • Requires machine assistance • Must work in a global heterogeneous environment. • Expressed at levels of common representation outside of OS or network funnies
How? Some Tagging • Linking, Expression, Extension, Relevance • Where information about an application, a service, a dataset is available we should tag it • Where it is not available we should consider ways of adding it • Tags should be considered in terms of context, relationship and meaning • “Annotate & Share” • Provide a mechanism where subject matter experts add metadata or context and leave it for others to build on incrementally • Strategy must enable data and information customers to drive incremental metadata organization based on their needs. • Each valid construct can be left for others to reuse and repurpose.
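The “Annotate & Share” idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Annotation` record, the `AnnotationStore` class, and identifiers such as `dataset:ozone-2004` are hypothetical names invented for the example, not part of the white paper or any NASA system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One tag: a statement about a resource plus who asserted it."""
    subject: str    # the thing being tagged (hypothetical identifier)
    predicate: str  # the relationship or property
    obj: str        # the value or related resource
    author: str     # provenance: who left this annotation behind


class AnnotationStore:
    """In-memory stand-in for a shared annotation service."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def annotate(self, subject, predicate, obj, author):
        note = Annotation(subject, predicate, obj, author)
        # Skip exact duplicates so repeated tagging stays idempotent.
        if note not in self._by_subject[subject]:
            self._by_subject[subject].append(note)
        return note

    def about(self, subject):
        """Everything anyone has said about a resource, in insertion order."""
        return list(self._by_subject.get(subject, []))


store = AnnotationStore()
# An instrument specialist leaves a first tag behind.
store.annotate("dataset:ozone-2004", "measuredBy", "instrument:spectrometer-7", "alice")
# Later, an analyst builds on it without touching the original.
store.annotate("dataset:ozone-2004", "relevantTo", "project:climate-brief", "bob")

for note in store.about("dataset:ozone-2004"):
    print(f"{note.subject} {note.predicate} {note.obj} (by {note.author})")
```

Keeping the author on every annotation is one simple way to let later users judge how far to trust a tag they did not write.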
And Organizing • Once tags or mechanisms for collecting metadata are in place, we need to organize them. • and assure that logic and relationships within applications or vital contextual instrument information are inserted and maintained. • Making it discoverable • Search, browse, and query • Machines connecting the dots not just people • Leverage what we have and what we know • Reuse and leverage available ontologies • SWEET, what else you got? • Scan Stanford, Swoogle, etc • Validate and use a library service • Currency and validity? Yikes! • Think about SNMP wrapped in a little OWL/Pellet • Slurp and translate existing schemas into RDF ontologies or OWL depending on requirements and opportunities
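As a rough illustration of “slurp and translate existing schemas”, the sketch below maps a relational schema to OWL statements serialized as N-Triples. The mapping rule (table to `owl:Class`, column to `owl:DatatypeProperty`), the `http://example.org/schema/` namespace, and the `Instrument` table are all assumptions made for the example; a production translation would need far more care about keys, types, and naming.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/schema/"  # hypothetical namespace


def triple(s, p, o):
    """Format one N-Triples line whose three terms are all IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def schema_to_triples(tables):
    """Map each table to an owl:Class and each column to an owl:DatatypeProperty."""
    lines = []
    for table, columns in tables.items():
        cls = BASE + table
        lines.append(triple(cls, RDF_TYPE, OWL + "Class"))
        for column in columns:
            prop = f"{BASE}{table}.{column}"
            lines.append(triple(prop, RDF_TYPE, OWL + "DatatypeProperty"))
            # The property applies to instances of the table's class.
            lines.append(triple(prop, RDFS + "domain", cls))
    return lines


# Hypothetical legacy schema: table name -> column names.
legacy_schema = {"Instrument": ["name", "launch_date"]}

for line in schema_to_triples(legacy_schema):
    print(line)
```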
The Obvious • Buzzwords lose currency • The Semantic Web versus Semantic Web Technologies • KM (not a noun), Taxonomies, Ontologies, Web Services, Web2 • This is a very big job • It is a very big problem with few alternative solutions • We don’t have all of the skills to do it • One size does not fit all, this is an integration effort • Big databases don’t suck • Except when you try to integrate them • Requires careful data stewardship as well as ownership • The problem will not be solved all at once • But this approach will provide immediate incremental and significant improvement • Positions us to easily adapt to changing data requirements and devices. Note: “blue” items are needed to make the problem solvable
Approach Establish Leadership & Organize • Formal project lasts for 5 years • Specialized management disbands and normal operational support takes over if proper momentum is established • Achievable if focus is on specific areas • Advertising brings in important stakeholders and knowledge workers • Publish Principles, Advertise Strategies Establish Infrastructure & Processes • Tag everything we can, even if it’s just a little bit • Make it easy on folks to do it and to add to it • Annotate & Share • Each valid construct left for others to build on • Make it easy on folks to leave stuff behind in libraries Establish Attractors • Implementing key “attractor” services will create a Network Effect
Leadership Tactics Social Advertisement and Primers • Show & Tell • Demonstrations, examples, proofs • Understanding the basics • A “big picture” orientation made up of the smaller components • How do the parts fit together? • Parsers | Triple Stores | RDF, Kowari, RDFLib, 3Store, Sesame | Query languages | Reasoners | FaCT, Racer, Pellet, Jena | Open vs Closed World Publish design principles • Data organization constructs (e.g. taxonomies, ontologies, XML schemas) must be reusable and available for computer systems/services. The URIs used for uniquely identifying these constructs should resolve to their respective schemas; • Web services must be made available for reuse (and strategies need to be developed to identify service types, applications, and rules governing their availability); • Yield to the greater concept even if your focus is more narrow; • Keep it simple to maximize agility and re-use; • Leverage our existing web infrastructure.
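For the “understanding the basics” primer, a toy example may help show how a triple store, a pattern query, and the open versus closed world distinction fit together. This is a minimal sketch in plain Python, not how Kowari, 3Store, or any reasoner named on the slide is implemented; the people, skills, and helper functions are hypothetical.

```python
# A tiny "triple store": a set of (subject, predicate, object) tuples.
# All identifiers below are hypothetical.
triples = {
    ("alice", "worksOn", "projectX"),
    ("bob", "worksOn", "projectX"),
    ("alice", "hasSkill", "telemetry"),
}


def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def holds_closed_world(store, s, p, o):
    """Closed world: anything not stated is taken to be false."""
    return (s, p, o) in store


def holds_open_world(store, s, p, o):
    """Open world: anything not stated is unknown (None), not false."""
    return True if (s, p, o) in store else None


print(match(triples, p="worksOn", o="projectX"))
print(holds_closed_world(triples, "bob", "hasSkill", "telemetry"))
print(holds_open_world(triples, "bob", "hasSkill", "telemetry"))
```

The store says nothing about Bob's skills, so the closed-world check answers `False` while the open-world check answers `None` (unknown). That difference matters when integrating sources that are each known to be incomplete.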
Leadership Tactics Publish strategic principles • Keep data and contextual validity close to the data owners and subject matter experts who care about it; • Develop a strategy about data curator functions, providing assurance, change control, etc; • Accept that some data constructs or services may not be fully mature at the outset but can be driven by subsequent customer use and applied benefit; • Protect individual privacies as disparate systems become available to wider use; • Establish a presence/partnership on standards bodies (the W3C Semantic Web Best Practices), other standards projects (PAW), and with research (UMD, Stanford, MIT, Southampton); • Conduct InterOps and Teach-ins • Publish target designs so that application developers can model their systems against a standard, leveraging the work that has gone before, and enabling fast track extension and integration to other systems; • Understand, document and manage to the measurable success criteria for the initial, intermediate and longer terms of the effort.
Just a bit of Infrastructure • Enough for services to be organized, discoverable, and reusable. KR libraries • Make XML, RDF, OWL, Thesauri, and Taxonomies available as a library service, enabling authorized reuse from authoritative sources; • We will need to establish manageable, semantically rich official libraries for unique Knowledge Representations (e.g., payload processing constructs, human relation constructs, vehicle and instrument constructs); • We need to adopt more universal KRs constructed outside of NASA but certified for our use (e.g., astronomy and celestial mechanics constructs, metallurgy, telemetry, navigation constructs, facilities, computers and other capital investment constructs); • Architecture should enable ownership/authorship and responsibility for domain experts to give others confidence and trust through provenance, currency and (more importantly) through successful results; • Think SVN & Annotea • A process for conversion or translation of traditional schemas or corpora will need to be formalized so our repositories of production-worthy ontologies can grow easily.
A bit more infrastructure • Service Advertisement Repositories • Established for individuals to publish available web services that can communicate with other existing services. The goal is to enable our computers to know when a new service has come online, understand what it does, employ its functions as part of generalized tasks, and specify under what conditions the service can be used and trusted. • Testing with careful consideration to browsing, discovery, and trust capabilities. • Metadata Collection and KR Construction • Provide tools that either harvest existing metadata or provide computer assistance in asserting new metadata. • Efficient mechanisms for populating KBs should be assessed and some preliminary findings tested against candidate systems. Natural Language Processors that assist in determining likely metadata elements, as well as simple mechanisms for customers to add semantic annotations, should be evaluated in parallel. • The Drawer of Kitchen Utensils • Integrating existing information services. • GlueCode that will add metadata awareness to infrastructure components, instruments and existing applications. • Collaboration awareness for Wikis, DMSs, Del.icio.us, Workflows, Flickr, and so on. • Tools that will generate RDF from office-type applications. • Tools that will generate ontologies from database schemas.
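A service advertisement repository of the kind described on this slide might look roughly like the sketch below: services publish what they do and under what conditions they can be used, and other programs are notified and can discover them by capability. The `ServiceAd` and `ServiceRegistry` names, the fields, and the example endpoint are hypothetical; the slide does not prescribe a design, and a real repository would rest on shared semantic descriptions rather than plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceAd:
    """Advertisement for one web service (all fields are illustrative)."""
    name: str
    endpoint: str
    capabilities: set = field(default_factory=set)
    conditions: str = "unspecified"  # when the service may be used and trusted


class ServiceRegistry:
    def __init__(self):
        self._ads = {}
        self._listeners = []

    def subscribe(self, callback):
        """Register a callable to be told when a new service comes online."""
        self._listeners.append(callback)

    def publish(self, ad):
        """Store an advertisement; notify listeners only for new names."""
        is_new = ad.name not in self._ads
        self._ads[ad.name] = ad
        if is_new:
            for notify in self._listeners:
                notify(ad)

    def discover(self, capability):
        """Find services advertising a given capability, sorted by name."""
        return sorted(
            (ad for ad in self._ads.values() if capability in ad.capabilities),
            key=lambda ad: ad.name,
        )


registry = ServiceRegistry()
seen = []
registry.subscribe(lambda ad: seen.append(ad.name))
registry.publish(ServiceAd(
    name="image-search",
    endpoint="http://example.org/services/image-search",  # hypothetical URL
    capabilities={"search", "image-metadata"},
    conditions="internal use; metadata refreshed nightly",
))
print([ad.name for ad in registry.discover("search")], seen)
```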
Establish “Attractor Services” The network effect describes how a service becomes more valuable as more and more people adopt it. As more services and capabilities get incorporated, it motivates more individuals and more services to participate. The more services we tie together, the greater the utility. The greater the utility, the more services get incorporated. • Linking People, Organizations, Projects, and Skills • Metadata search and inference in image inventories • Federal Enterprise Architecture and Capital Investments • Semantically-enriched Document Management • Integrating Science Knowledge • Semantically-Enabled Workflows
Anticipated Positive Results • Once established, the attractor services will be able to use the same KRs and KBs, making the overall capabilities much more powerful • Raises the bar for acceptable solutions • Expansion of current skill sets • Social Networks will facilitate tighter work environments even when geographically dispersed.
What has happened so far • JPL workshop • White paper is an accepted agency EA recommendation • Hounding • CIOs are aware, learning more, and seeking to support • Demos • More hounding
The JPL Workshop • The demo is a proud moment • LDAP feed pulled via a Python script and converted to RDF, stored in 3store, queried via mSpace Java code talking to the 3store, forked to Redfoot, passing a URI for each instance for browsing, plus a BS’d FOAF network that creates an artificial representation of foaf:knows • This was important because • The building blocks had not been arranged like this before • The 9 of us worked together “virtually” • Got one more to do for senior NASA managers • This time add a connection to a web service • If successful, formalization and funding to implement the production service
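The first step of the demo pipeline above, directory entries turned into RDF, can be approximated as follows. This is not the workshop's script: the entries are made-up dictionaries standing in for LDAP search results, the `http://example.org/people/` base URI is hypothetical, and linking people who share an organizational unit is just one assumed way to fabricate `foaf:knows` edges. Output is N-Triples, which a triple store could then load.

```python
from itertools import combinations

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
PEOPLE = "http://example.org/people/"  # hypothetical base URI

# Stand-in for entries returned by an LDAP search (made-up people).
entries = [
    {"uid": "jdoe", "cn": "Jane Doe", "mail": "jdoe@example.org", "ou": "unit-a"},
    {"uid": "rroe", "cn": "Richard Roe", "mail": "rroe@example.org", "ou": "unit-a"},
]


def literal(text):
    """Quote a plain single-line string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def entries_to_foaf(entries):
    """Emit FOAF statements for each entry, then artificial foaf:knows links."""
    lines = []
    for e in entries:
        person = f"<{PEOPLE}{e['uid']}>"
        lines.append(f"{person} <{RDF_TYPE}> <{FOAF}Person> .")
        lines.append(f"{person} <{FOAF}name> {literal(e['cn'])} .")
        lines.append(f"{person} <{FOAF}mbox> <mailto:{e['mail']}> .")
    # Assumption: fake foaf:knows links between people in the same unit.
    for a, b in combinations(entries, 2):
        if a["ou"] == b["ou"]:
            lines.append(f"<{PEOPLE}{a['uid']}> <{FOAF}knows> <{PEOPLE}{b['uid']}> .")
    return lines


print("\n".join(entries_to_foaf(entries)))
```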
It doesn’t matter that the technologies are immature • We have a LOT to do and some stuff can/should be done now • Learn from doing/drive the technologies • Maintain linkage with you folks; establish strong preferences with industry • By the time we need the other bits, they will be ready too • Or we should engage the development or standards committees and drive our requirements instead of being dragged behind
More, Please • SWEET, SciFlo, IO, Workflows, and all the other stuff that is easier to integrate together! • FOAF, DOAP, Wikis, Triple Stores, KRs, Del.icio.us, Flickr, Annotea • Trading and Talking • Presents new alignment possibilities • Oracle, Adobe, Microsoft, Cisco, W3C • Think about it • Re-combining and combing the stuff we are already collecting
Thinking Back on the Future • The big question for me was, is this real? • Or is it: Can it be made real? • It is the same question for you guys. • Do you want to see your ideas implemented globally? What if you could integrate them in? • Important nature of this group. • And what are you going to do about good and evil? • Every time I talk about these technologies with people on the “outside” they become frightened • Protecting your privacy rights • Protecting your public work
Thank you