Topics in Informatics. Spring 2005, SJC. About the Instructor. Instructor: Dr. Hong Zhou. Office: McDonough 317. Office Hours: MWF 10:00 – 11:00am. Email: hzhou@sjc.edu, Phone: 231-5826. Syllabus. You can call me any of: Hong, Dr. Hong, Dr. Zhou. What is Informatics?
Topics in Informatics Spring 2005, SJC
About the Instructor • Instructor: Dr. Hong Zhou • Office: McDonough 317 • Office Hours: MWF 10:00 – 11:00am • Email: hzhou@sjc.edu, Phone: 231-5826 • Syllabus • You can call me any of: • Hong • Dr. Hong • Dr. Zhou
What is Informatics? • Searching for ‘What is informatics’ at http://www.google.com returns many different definitions. • Basically, it is the study and application of the knowledge and skills of data/information flow and manipulation (including storage, retrieval, analysis, and derivation).
Informatics • Data acquisition • Data flow and control • Data representation (records) and storage • Data retrieval/mining • Data analysis • Data derivation (generating new data from existing data via analysis)
What Will You Learn • Obtaining reliable data. • Data management (storage, representation, and retrieval) with databases. • Introduction to Bioinformatics. • Introduction to Health Informatics.
Part I Obtaining Reliable Data • Complex and precise communication is something that distinguishes humans from other species. • The development of the world is, in a sense, the development of our understanding, i.e. our information about the universe, including our social systems. • Information and its uses are at the center of such development.
Information vs Data • What comes to your mind when we talk about “INFORMATION”? • Is information touchable or visible? • To my understanding, data is the description of information, and information is the interpretation of data. • So in this class, let’s deal with the description: data.
What is Data? • When we talk about data, the first image in our mind might be numbers such as 5, 87, 98.34, etc. • However, are the numbers 5, 87, 98.34 meaningful/informative?
Data with Context • Pure numbers are meaningless to us. • Numbers with context, however, are meaningful. • For example, 5 pounds of sugar. • So, in this class, we talk about meaningful data and ignore all meaningless data. (Are we meaningful persons?)
Quick Questions Are the following ‘data’ meaningful? • 20 • 20 years • A 20-year-old girl • A 20-year-old girl named Amie • A 20-year-old girl named Amie who is an SJC student.
Data Target • Data is used to describe a subject. • For example, age, height, weight, gender, and profession are descriptions of a person. • A medical record is a description of a patient.
Quick Question What are the targets of the following two rows of data?
What is RELIABLE? • When we talk about reliable data, what does that mean? • Let’s discuss this issue at two levels: • Individual level • Group/population level (statistics)
Individual Level • Reliable data means that the data is ‘closely’ related to the individual (or event) and ‘precisely’ describes the individual (or event). • For example, a computer with a 3.2 GHz CPU, 512 MB RAM, 512 KB cache, etc.
Group/Population Level • ‘Reliable’ is more meaningful at the group level. • Can a specific medical diagnosis of one patient be representative of all patients with the same symptom? • Probably not.
Statistical Thinking • One powerful approach to analyzing data is statistics. • We measure the reliability (significance) of data in the statistical sense. • Statistical thinking means using data to build our understanding, gain insights, and draw conclusions or make inferences. • It does not mean drawing conclusions from a single incident.
Principles in Statistical Thinking • Count on data instead of an incident • Where the data is from matters. • Lurking variables • Variation is everywhere • Conclusions are not absolutely certain
Count on a large amount of data instead of a few incidents • Famous fortune teller • The thumb of a monk
Where data is from • Group data can be collected from surveys or observations, or obtained from experiments. • When collecting data, where the data comes from is important. For example, a question was once asked: “If you had it to do over again, would you have children?” 70% of the written responses were NO. Is this piece of information reliable?
Lurking Variables • Does music practice improve test scores? • What lies behind the association?
The Importance of RANDOM • The key factor in data collection is the RANDOM concept, i.e. the data has to be collected randomly, without bias. • Suppose that you are surveying 10,000 people in the USA to predict the 2004 election. How are you going to pick the 10,000 persons? Only in schools? Only in New York? Only women? Avoid as much bias as you can.
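One way to picture the RANDOM idea is a simple random sample, where every member of the population has the same chance of being picked. The sketch below is only an illustration: the list of ID numbers is a hypothetical stand-in for a real list of eligible voters, and the function name and seed are assumptions.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Return k distinct members of population, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(population, k)

# Hypothetical sampling frame: ID numbers standing in for eligible voters.
voters = list(range(1_000_000))

sample = simple_random_sample(voters, 10_000, seed=2004)
print(len(sample), len(set(sample)))  # 10000 10000 (no person is picked twice)
```

Picking only from schools, only from New York, or only women would correspond to shrinking `voters` before sampling, which is exactly where the bias enters.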
Experiments • Some reliable data can only be produced by experiments, especially in science. • For example, in biology, to pin down the function of a gene, you have to knock out the gene or repress it and check how the phenotype changes. After that, you have to restore the gene and verify that the phenotype also recovers. Such experiments are very convincing, but expensive.
Another Experiment • It was once believed that women who take hormones after menopause reduce their risk of heart attack. The belief resulted from studies that simply compared women who were taking hormones with others who were not. Are such study results reliable? • Such studies lack proper controls, which are essential in all experiments. • How are you going to design an experiment for this study?
Reliable Data cont’d • Obtaining reliable data is not a simple task; it requires extensive consideration and design. • Some experimental results may look convincing at one time but lose their reliability over time or when the environment changes. For example, the third brake light on cars.
Discussion • Is absence of evidence evidence of absence?
Project 1 • Write a paragraph discussing the claim “Absence of evidence is evidence of absence”. Please make your own judgment, as the grading is based on your argument. • Design a simple survey to collect opinions about abolishing the death penalty. Be aware of the importance of “RANDOM”. Write a short paragraph arguing that the data collected by your survey is reliable. • Points: 100. • Due Date: Feb 1st, 2005. • Submit your work in the digital drop box in Blackboard.
Part II Data Storage • Can all information be recorded as data? Let’s start the discussion. • Feeling • Knowledge • Intelligence
Personal Ideas • My understanding: Yes, but some of them are too complicated or too difficult to express precisely. • And that is why we have IQ tests, MQ (motivational quotient), EQ, etc.
Where to store • Data is stored somewhere. • Minds • Books (paper documents) • Computers • Etc. • Let’s compare the three storage methods: which one do you think is more lasting or appropriate?
Passing Words • In ancient times, knowledge was passed by word of mouth from generation to generation. • Here is a story about passing a message by word of mouth: • The general called the captain: “Tonight at 7:00pm, Halley's Comet will pass over your camp. Organize your soldiers to watch.” • The captain informed his lieutenant: “Tonight at 7:00pm, Halley's Comet will pass over our camp, and the general is coming to watch with our soldiers.” • The lieutenant informed the sergeant: “Tonight at 7:00pm, the general will accompany Halley's Comet as it passes over our camp. Organize the soldiers.” • The sergeant told the soldiers: “Tonight at 7:00pm, General Halley will pass over our camp in the sky, and we are going to watch.”
Data Storage • Paper storage: • Size and cost • Transportation • Computer: • Legal validity of signatures • Hacking • What if computers are down? • However, if data is not organized, it is difficult to use. So, a data storage strategy is important. • In this class, we talk about data storage using computer technology.
Ways to store • Data storage is a big issue, probably the largest one, in computer data manipulation. • There are different database structures, different database management systems, online storage, etc.
Chapter 1. File structure • Hierarchical structure • It makes hierarchical relationships easy to handle. • For example, an administration is a hierarchical structure. • Let me use the DOD/NIMA VPF structure as an example.
VPF Structure • DOD (Department of Defense) and NIMA (National Imagery and Mapping Agency) sponsored the development of VPF (Vector Product Format). • Nickname: “very poor format”. • It is used to store information about the earth's surface and to provide digital maps.
VPF structure • [Diagram: a Database contains Libraries; each Library contains Coverages; each Coverage contains Files.]
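The database, library, coverage, and file levels named on the VPF slides can be pictured as nested containers. The sketch below uses plain Python dictionaries with made-up names; it does not reflect the actual VPF file layout, only the idea of walking a hierarchy from the top down.

```python
# Hypothetical names for illustration; a real VPF database defines its own.
database = {
    "library_a": {
        "coverage_1": ["file1", "file2"],
        "coverage_2": ["file1"],
    },
    "library_b": {
        "coverage_1": ["file1", "file2"],
    },
}

def list_paths(db):
    """Walk the hierarchy and return every file as a library/coverage/file path."""
    paths = []
    for library, coverages in db.items():
        for coverage, files in coverages.items():
            for name in files:
                paths.append(f"{library}/{coverage}/{name}")
    return paths

paths = list_paths(database)
for path in paths:
    print(path)
```

Two files can share the name `file1` without conflict because the full path, not the file name alone, identifies each one.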
Navigation in Hierarchical Structure • What is the purpose of an index?
Project 2 • Create a hierarchical file structure to store some of your work at SJC. • This is the way I prefer: organize your work based on the classes you take. • If you have other ways, that is OK as long as they are well organized. • Show me in class what you have done. • Points: 100
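For those who prefer to build such a structure with a script rather than by hand, here is a minimal sketch. The course and folder names are hypothetical, and the tree is created inside a temporary directory so that running the example leaves nothing behind.

```python
import tempfile
from pathlib import Path

def create_course_tree(root, courses):
    """Create root/<course>/<folder> directories and return them as sorted relative paths."""
    root = Path(root)
    for course, folders in courses.items():
        for folder in folders:
            (root / course / folder).mkdir(parents=True, exist_ok=True)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())

# Hypothetical classes and folders; replace them with your own.
courses = {
    "Informatics": ["Project1", "Project2"],
    "Statistics": ["Homework"],
}

with tempfile.TemporaryDirectory() as tmp:
    created = create_course_tree(Path(tmp) / "SJC", courses)

for path in created:
    print(path)
```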
Chapter 2 XML • Extensible Markup Language • Purpose: • Data transportation • Data representation • Data storage • Why should we talk about it here? Because the data inside an XML file is hierarchical.
What Does XML Promise? • Data portability • The programming language Java promises the portability of programs. • However, programs work on data. Before XML, data was not portable, and communication among systems and agencies was extremely difficult. • XML allows systems to communicate using a standard means of data representation.
HTML? • HTML is the portable language for browsers. • It is a standard. • However, it governs how information is displayed in a browser with defined formats and defined tags.
The Difficulties XML faces • XML has some defined formats • But doesn’t have defined tags. • User defined tags • Unlimited types of data.
Solution (Partially) • Make the information self-describing. • You have to invent your own tags!
A Simple Example
<person>
  <lastname>Fonship</lastname>
  <firstname>Michele</firstname>
  <gender>female</gender>
  <education type="elementary">
    <start-date>9/1980</start-date>
    <stop-date>5/1985</stop-date>
    <school>Badley school</school>
  </education>
</person>
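To see how a program might read this record, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The choice of Python and the variable names are assumptions made for illustration; the slides do not prescribe a tool. The attribute value is written with straight quotes, which XML requires.

```python
import xml.etree.ElementTree as ET

xml_text = """<person>
  <lastname>Fonship</lastname>
  <firstname>Michele</firstname>
  <gender>female</gender>
  <education type="elementary">
    <start-date>9/1980</start-date>
    <stop-date>5/1985</stop-date>
    <school>Badley school</school>
  </education>
</person>"""

root = ET.fromstring(xml_text)        # root is the <person> element
education = root.find("education")    # first child element named <education>

print(root.tag)                       # person
print(root.findtext("lastname"))      # Fonship
print(education.get("type"))          # elementary (an attribute)
print(education.findtext("school"))   # Badley school (a nested child element)
```

The tags describe the data themselves, so the program asks for `lastname` or `school` by name instead of counting columns or positions.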
Tips about XML format • Tags are case sensitive. • Every start tag must have a matching closing tag. • All XML elements must be properly nested. • All XML documents must have a root element. • Attribute values must always be quoted.
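These rules can be tried out with any XML parser, since a parser rejects a document that breaks them. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One snippet per rule; only the first one follows all the rules.
samples = {
    "matched tags": "<note><to>Amie</to></note>",
    "case mismatch": "<note><To>Amie</to></note>",
    "bad nesting": "<b><i>text</b></i>",
    "two roots": "<a></a><b></b>",
    "unquoted attribute": "<note id=1></note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, ok)
```

Only `matched tags` is accepted; each of the other snippets breaks exactly one of the rules listed above.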
Comments in XML • The syntax for writing comments in XML is similar to that of HTML. • <!-- This is a comment --> • A sample XML file.
XML Element Naming • Names can contain letters, numbers, and other characters • Names must not start with a number or punctuation character • Names must not start with the letters xml (or XML or Xml ..) • Names cannot contain spaces.
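The naming rules above can be approximated with a small check. This is a simplified sketch, not the full XML specification: it only considers ASCII letters, digits, underscores, hyphens, and periods, and the function name is hypothetical.

```python
import re

# Simplified pattern (an assumption, narrower than the real XML grammar):
# - must not start with "xml" in any letter case
# - must start with a letter or underscore
# - may continue with letters, digits, periods, underscores, or hyphens
_NAME = re.compile(r"(?!xml)[A-Za-z_][A-Za-z0-9._-]*", re.IGNORECASE)

def is_valid_name(name):
    """Return True if name passes the simplified element-name rules."""
    return _NAME.fullmatch(name) is not None

for candidate in ["firstname", "start-date", "first name", "1st", "XmlData"]:
    print(candidate, is_valid_name(candidate))
```

The first two names pass; `first name` contains a space, `1st` starts with a digit, and `XmlData` starts with the reserved letters.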
Is it valid or not?
<students>
  <one student>
    <first name>Rose</first name>
    <last name>Washington</last name>
  </one student>
</students>
Element Content • An XML element is everything from (including) the element's start tag to (including) the element's end tag. • An element can have element content, mixed content, simple content, or empty content. An element can also have attributes.
Is this valid?
<food>
  <vegetable></vegetable>
  <fruit>apple</fruit>
</food>
Child Elements vs. Attributes
<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>
<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>
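Both forms carry the same information; what changes is how a program reads it. A minimal sketch with Python's standard `xml.etree.ElementTree` (the variable names are assumptions):

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person sex="female">'
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)
as_child = ET.fromstring(
    "<person>"
    "<sex>female</sex><firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# An attribute is read from the element itself ...
print(as_attribute.get("sex"))    # female
# ... while a child element is looked up by its tag name.
print(as_child.findtext("sex"))   # female

# Each form is invisible to the other access method.
print(as_attribute.find("sex"))   # None
print(as_child.get("sex"))        # None
```

A child element can later grow its own children or be repeated, whereas an attribute holds a single string and can appear only once per element.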