ICT roadmap for SEIS: draft structure. 26-27/11/2008, Copenhagen. Jaanus Heinlaid, Systems Consultant, TietoEnator Eesti.
1 What is ICT?
2 What is SEIS (in ICT context)?
  2.1 SEIS concept
  2.2 Underlying principles
  2.3 Environmental information flows according to SEIS
  2.4 Base elements of SEIS
3 The road to implementing SEIS
  3.1 Identify requirements to data
    3.1.1 How data is found on SEIS nodes
    3.1.2 How do we recognise data
    3.1.3 How data on SEIS nodes is understood by machines
    3.1.4 How do we know the quality of data
  3.2 Build tools for data
    3.2.1 Data Discovery tools
    3.2.2 Data repositories
    3.2.3 Data conversion services
    3.2.4 Quality Assessment tools
    3.2.5 Tools for further data processing
What is ICT?
• ICT stands for Information and Communication Technology.
• Why does it matter? Everybody knows what ICT stands for, right? Well, not quite. Paradoxically, the term ICT often doesn't ring a bell with IT (Information Technology) people, because different communities use different terms for what is basically the same thing. According to Wikipedia, the term ICT is used in preference to IT in two communities: education and government.
• So, since it is the IT people who will eventually implement the technology behind SEIS, maybe the masterminds of this initiative could consider replacing ICT with IT.
What is SEIS (in ICT context)?
• This chapter should basically serve as a reminder of what it is we are writing the roadmap for.
• It should probably collect and summarise everything ICT-related that has been written about SEIS in different documents. In short, this chapter should present a concise overview of SEIS from the ICT perspective, such that an IT person who is a complete newcomer would quickly get the idea behind SEIS.
Underlying principles
• This chapter should probably list the famous underlying principles of SEIS, but maybe only those that are most relevant in the IT world:
  • Information is managed as close as possible to its source.
  • Information is provided once and shared with others for many purposes.
  • Information should be readily accessible to end-users.
Environmental information flows according to SEIS
This is probably a good place for the EEA's picture of SEIS data flows, together with some explanations.
Base elements of SEIS
Here once more are the 3 general-level elements of SEIS:
• Content
  • Data: all relevant data and information related to Europe’s environment, from local to global scale.
  • Metadata: the information describing the data (content, quality, condition, other characteristics).
• Infrastructure
  • A network for accessing and sharing environmental data between SEIS nodes.
  • Tools and services to help discover and make use of data, as well as to streamline it.
• Organisation
  • Division of roles and responsibilities among all actors involved.
How data is found on SEIS nodes
• It depends on the possibilities and willingness of the organisations behind the SEIS nodes (i.e. the stakeholders).
• Given the concept of SEIS and the much-stated guiding principles, especially the principle of managing information as close as possible to its source, the ideal way would be to build a network of Web Services.
• A Web Service is a software system designed to support interoperable machine-to-machine interaction over a network (http://en.wikipedia.org/wiki/Web_services).
• This is a popular concept nowadays, but it requires stakeholders to have or buy IT expertise. Since not all stakeholders have equal possibilities in this matter, some mechanism must still be established for stakeholders to simply report their data in the plain old way: sending it in numerous different formats (e.g. MS Excel, MS Access, XML) to some central repository.
• The keepers of that repository will process the data and publish it in a machine-readable way to other stakeholders.
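As an illustration of the Web Services idea above, here is a minimal sketch of the message a node's catalogue service might return and how another machine could read it. The HTTP transport is left out, and the node name, dataset entries, field names and the functions `catalogue_response` and `parse_catalogue` are all hypothetical; nothing here is a defined SEIS interface.

```python
import json

# Hypothetical datasets held by one SEIS node (illustrative values only).
NODE_DATASETS = [
    {"id": "ozone-2008", "topic": "air", "format": "XML",
     "url": "http://node.example/data/ozone-2008.xml"},
    {"id": "rivers-2008", "topic": "water", "format": "XML",
     "url": "http://node.example/data/rivers-2008.xml"},
]


def catalogue_response(datasets, topic=None):
    """Build the JSON body a node's catalogue service could send back.

    If a topic is given, only datasets on that topic are listed.
    """
    selected = [d for d in datasets if topic is None or d["topic"] == topic]
    return json.dumps({"node": "example-node", "datasets": selected})


def parse_catalogue(body):
    """What a consuming machine would do: extract the dataset URLs."""
    return [d["url"] for d in json.loads(body)["datasets"]]
```

The point of the sketch is that the data stays on the node; only a machine-readable description of where to find it travels over the network.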
How do we recognise data
• Web Services are just a technological mechanism for finding the data. Once some data is found, how do we know it is what we searched for? Well, that's where the magic of metadata comes in. Naturally, stakeholders are not equally attentive about tagging their data with metadata, so SEIS must provide mechanisms for tagging (http://en.wikipedia.org/wiki/Tag_(metadata)) other stakeholders' data.
• Metadata is crucial to enable structured searches and comparison of data in different repositories.
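A small sketch of what "structured search" over metadata could look like, including tags added by someone other than the data owner. The record fields (`topic`, `country`, `year`) and the function names are assumptions chosen for illustration, not a SEIS metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical metadata describing one data source."""
    source_url: str
    topic: str
    country: str
    year: int
    tags: set = field(default_factory=set)


def add_tag(record, tag):
    """Let any stakeholder attach a free-text tag to a data source."""
    record.tags.add(tag.lower())


def search(records, tag=None, **fields):
    """Return records whose named fields all match and, if given, carry the tag."""
    hits = []
    for record in records:
        if tag is not None and tag.lower() not in record.tags:
            continue
        if all(getattr(record, name) == value for name, value in fields.items()):
            hits.append(record)
    return hits
```

For example, `search(records, topic="air", year=2008)` compares like with like across repositories, which a plain text search over file names could not do.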
How data on SEIS is understood by machines
• Standardise data formats (per topic) and make the data flows comply with those standards.
  • For those unable to report in standardised formats, some special SEIS nodes must be designed to understand various proprietary data formats and produce XML from them.
  • Trying to make stakeholders agree on standardised formats is slow and painful. One of the first attempts was EEA's Waterbase project, which started and died in the year 2000. But there have also been successful attempts: http://www.eea.europa.eu/maps/ozone.
• Make SEIS a Semantic Web.
  • SEIS seems to be a typical case of a Semantic Web (http://en.wikipedia.org/wiki/Semantic_web).
  • The idea is that different parties don't report data in prescribed formats; instead, they publish data in whatever formats they wish and provide semantics that let machines understand those formats.
  • We still cannot do completely without standards: such formats have to be XML-based, and the syntax of the semantics is standardised too. But this is done by W3C and similar consortia.
The answer is probably to apply both approaches.
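To make the second approach more concrete, the following is a deliberately simplified stand-in for semantic annotation: each node keeps its own field names but publishes a mapping from those names to terms in a shared vocabulary. It uses plain dictionaries rather than real Semantic Web technology such as RDF, and the vocabulary URI, field names and `to_shared_terms` are invented for the example.

```python
# Hypothetical shared vocabulary namespace (not a real SEIS vocabulary).
VOCAB = "http://vocab.example/seis#"

# Each node publishes the semantics of its own format: local name -> shared term.
NODE_A_SEMANTICS = {
    "o3_ugm3": VOCAB + "ozoneConcentration",
    "stn": VOCAB + "stationCode",
}
NODE_B_SEMANTICS = {
    "ozone": VOCAB + "ozoneConcentration",
    "station_id": VOCAB + "stationCode",
}


def to_shared_terms(row, semantics):
    """Re-key one data row from node-specific names to shared vocabulary terms.

    Fields without a published meaning are dropped, since a machine
    cannot interpret them.
    """
    return {semantics[name]: value for name, value in row.items() if name in semantics}
```

Two rows that look different on the wire, such as `{"stn": "EE001", "o3_ugm3": 61.5}` from one node and `{"station_id": "EE001", "ozone": 61.5}` from another, become directly comparable once both are expressed in the shared terms.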
How do we know the quality of data
• This is again where metadata comes in handy.
• At some nodes in SEIS there will definitely be some Quality Assessments (QA) done.
• It is vital that SEIS provides ways of reflecting the results of a data quality assessment in the metadata of the data source.
• The resulting QA report must become a permanent subjective comment on the data.
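One way to read "permanent comment" is that QA results are only ever appended to a data source's metadata, never overwritten. The sketch below assumes metadata is a plain dictionary with a `qa_reports` list; that key, the report fields and `attach_qa_result` are hypothetical.

```python
def attach_qa_result(metadata, assessor, verdict, comment, assessed_on):
    """Return a copy of the metadata with one more QA report appended.

    Earlier reports are kept, so the assessment history stays permanent.
    The input metadata is not modified.
    """
    report = {
        "assessor": assessor,
        "verdict": verdict,          # subjective judgement, e.g. "acceptable"
        "comment": comment,
        "assessed_on": assessed_on,  # ISO date string, e.g. "2008-11-26"
    }
    updated = dict(metadata)
    updated["qa_reports"] = list(metadata.get("qa_reports", [])) + [report]
    return updated
```

Keeping the assessor's name next to the verdict makes it explicit that the report is one party's opinion about the data, not a property of the data itself.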
Data Discovery tools
• These are probably the most important tools we are going to have for data.
• They have to be quite intelligent, and they will rely on what is published on SEIS nodes.
• They will probably need to provide some web services for directly registering data sources (i.e. the existence of data is pushed into the discovery tools).
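The "push" registration mentioned in the last bullet could be sketched as a small in-memory registry. A real discovery tool would sit behind a web service and persist its contents; `DiscoveryRegistry` and its methods are hypothetical names used only to show the idea.

```python
class DiscoveryRegistry:
    """Toy registry that data sources announce themselves to."""

    def __init__(self):
        self._sources = {}  # keyed by URL, so re-registering updates the entry

    def register(self, url, topic, description=""):
        """Called by a node (or a central repository) to push a data source."""
        self._sources[url] = {"url": url, "topic": topic, "description": description}

    def find(self, topic):
        """Return the URLs of all registered sources on a topic, sorted."""
        return sorted(s["url"] for s in self._sources.values() if s["topic"] == topic)
```

Keying by URL means a node can re-announce a source after changing its description without creating duplicates.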
Data repositories
• Ideally, there would be no need for central data repositories, because ideally all SEIS nodes would comply with the specification of "how data is found".
• Unfortunately, the reality is a bit more complicated, and not all nodes will be that advanced.
• For these nodes, central data repositories will have to be created to which the data can be reported.
• The repositories make sure that the reported data they hold is published in a discoverable way.
Data conversion services
• There will definitely have to be services for converting data from proprietary formats into more general machine-understandable formats.
• This is for the sake of better discoverability, automated quality assessment and other cases of automatic data processing.
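As a minimal sketch of such a conversion, the function below turns comma-separated text (standing in for a reported spreadsheet export) into XML using only the Python standard library. The element names `dataset` and `record` are arbitrary choices, and the sketch assumes every column header is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="dataset", row_tag="record"):
    """Convert CSV text with a header row into an XML string.

    Each data row becomes one <record>; each column becomes a child
    element named after its header. Assumes headers are valid XML names.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for name, value in row.items():
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")
```

A real conversion service would also need to validate the input and attach metadata to the result; the sketch covers only the format change.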
Quality Assessment tools
• Quality assessment is probably the first and most important data processing action.
• So, some kind of web-based workbench must be created where quality assessment operations can be carried out on the discovered data and the result permanently linked to the data source.
• The quality assessment results themselves will become new data sources that must be discoverable by the discovery tools.
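The sketch below shows one automated check a workbench might offer: counting missing and out-of-range values. The report is a self-describing record that points back at the assessed source, so it could itself be published as a new data source. The check, the report fields and `assess_quality` are assumptions for illustration.

```python
def assess_quality(source_url, values, valid_range):
    """Run a simple completeness and range check on a list of measurements.

    Missing measurements are represented as None. The returned report
    refers to the assessed source through its "about" field.
    """
    low, high = valid_range
    missing = sum(1 for v in values if v is None)
    out_of_range = sum(1 for v in values if v is not None and not (low <= v <= high))
    return {
        "type": "qa-report",
        "about": source_url,
        "checked": len(values),
        "missing": missing,
        "out_of_range": out_of_range,
        "passed": missing == 0 and out_of_range == 0,
    }
```

Which checks matter, and which thresholds apply, would differ per topic and would have to be agreed by the stakeholders.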
Tools for further data processing
• After the quality of the data has become known, further processing can be carried out.
• This is probably the stage where manual human work starts to come into play, but various helper tools can certainly be created to support this work.
• What kind of helper tools are needed has to be found out by analysing current processes and filtering out the most important cases where machines can help.