520 likes | 542 Views
Documenting PGR . A short summary about Theories and Practical work. How do you store your data? . In MS Excel spreadsheet? Handwritten cards? In a database (like MS Access)? In you head? In another database format?. What is a database?.
E N D
Documenting PGR A short summary about Theories and Practical work
How do you store your data? • In MS Excel spreadsheet? • Handwritten cards? • In a database (like MS Access)? • In you head? • In another database format?
What is a database? • Databases are designed to offer an organized mechanism for storing, managing and retrieving information. They do so through the use of tables. If you’re familiar with spreadsheets like Microsoft Excel, you’re probably already accustomed to storing data in tabular form. It’s not much of a stretch to make the leap from spreadsheets to databases. • The ability to link tables is a very powerful resource
Retrieve all records that match certain criteria Update records in bulk Cross-reference records in different tables Perform complex aggregate calculations So why not use a spreadsheet? Databases are actually much more powerful than spreadsheets in the way you’re able to manipulate data . Just a few actions you can perform on a database:
What is a database? I have data! Me too! In other words: A well-organized set of interrelated data held in one or more files which are capable of being managed by a software. Me too! Me too! Me also! Me too! Me too!
Step by step • Analysis • of gene bank activities to determine information and documentation needs • Design • of the manual and/or computerized system based on documentation and information needs • Implementation • of the system that has been developed
Data • Quantitative • Qualitative Qualitative, description of the object being examined. Blue, horizontal .. Quantitative, dealing with numbers 94 g, 34 mm, 67% You can also store pictures, links url-addresses, maps, scanned material..
A relation can be seen as a table that stores information. Attributes or columns Person Rows Records
But every table is not an relation. • There can not exist two rows in a table that are exactly the same • Every column in the table has a unique name • There is only one value in every cell. • All values in one column belongs to the same domain or are undefined (null)
Redundant data Data in just one table repeat information… Students taking courses in a school What if Peter wants to take a course in English?
Example: Students taking courses in a University How do I store information if Peter wants to take a course in English?
Primary key • A primary key can identify the data row • It must be unique • Cant be null • Can be a ID The easiest way is to let the program hold track of the rows and automatically increase the key. Remember the primary key must be unique!
Table of accessions, data of seedsample The id’s are the primary key, because they identify the rest of the rows
But how store data in a good and reliable way? • Just to put everything in a file, in one big table is not a good idea. • Use a relational database (think of the db as several tables that are connected to each other) • Normalize it! • Split up information into different tables and give the record an identifier! • Save related information in one table. (For instance all personal data in a table called “person”) • Link different tables together so you can retrieve all information you need
Normalization • Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: • eliminating redundant data (for example, storing the same data in more than one table) and • ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
This database is poorly normalized. In fact, it is not normalized at all. Everything is stored in one big table. Imagine how many things you have to repeat if you test the same accession three times!
Person Information about the institute where people work is repeated several times! Imagine how boring it will be to add the same information for all staff members in NordGen!
Person Much better! If the Institute changes telephone number you just have to change it once! Institutes
Relational Database Managers By linking files or information together, a relationship is produced betweenthe tables. Theshared field is only stored once, it is not duplicated in each file
You often visualize a table like this: person person id name adress zipcode/city country title inst_id
another example… Fields which have a logical relation to each other and to the identifying field, should be grouped together in a relation. Seed info Accession_id Accession name Species Date of acquisition DonorInstituteName Street Address City Address Other fields… This is not a good example! If the gene bank receives another accession at a later date , the entire address would need to be entered again in a different record. It would be better to store information about the donors in a separate table.
… this would be better Donor Seed info AccDon_id donPer_id donInst_id donAccname Don_info Accession_id Accession name Species Date of acquisition Amount of seeds AccDon_id Other fields… Information about the donated material, from which institute, person who donated it, what the accession has been called in the donating institute and other information about the donated material
…yet another example Storage Passport Registration Accession name Scientific name Original country Other fields… Accession name Collecting institute Collector Date of collection Other fields… Accession name Freezer_number Box_number Storage_date Other fields… What happens if you have to change the name? You have to change it in all three otherwise data will be lost. The best thing is to store a link to the accession and store all the information about the accession elsewhere.
Storage Passport Accession Accession_id Accession name Scientific name Original country Other fields… Accession_id Collecting institute Collector Date of collection Other fields… Accession_id Freezer_number Box_number Storage_date Other fields… If you misspelled the accession name, you just have to change it once
Visualized in a ER-model (Entity-Relationship model) it looks like this: Seed sending info Gene Bank 1:N Our institute (Gene Bank) can receive many seeds (with seed sending information)
Visualized in a ER-model (Entity-Relationship model) it looks like this: One species can have many accessions. One accession belongs to just one species One teacher can have up to five students to teach species accessions 1 * 1 teacher 0..5 students
Different categories for data modeling Object oriented model Relational model
Data modeling- how to get started • To start – find the object • Let the object become a relation • Each object or relation should have its own information and descriptive attributes • The object becomes the table, the attributes columns and the identifying attribute becomes the primary key • An object could be • Persons • Places • Objects (like Accessions, institutes…) • Events (like seed drying, germination tests..) Things you want to store information about
Data modeling • A data model makes it easier to understand the meaning of the data • A common and popular data model used in database design is based on the concepts of the Entity – Relationship (ER) model • Conceptual • logical structure on databases. Designed by experts, information users, those who know and are familiar with the information. Identification of important entity and relationship types • Physical structure • designed by experts, database developers. How the logical structure is to be physically implemented (as tables) on the target DBMS
Data modeling: • Should represent reality and the information that need to be stored • Help us to visualize our task and clarify rules and restrictions and relations between objects • Gives the staff a chance to participate early in the development • Is a established way of developing information systems • Is a way to document and explain the IT-system
Accession number Scientific name Pedigree name Donor name Acquisition name And more… Collecting organization Collecting date Country of collection And more… Accession and collecting information Start to gather related information into tables and link them together
A model of accession descriptors and collecting descriptors: Accession Collecting Accession_id Accession name Scientific name Cultivar name Donor Acquisition date More… Accession_id Collection organization Collecting date Country of collecting Province/state Location of collecting site Type of sample… The accession relation gather all the information about the accession. The collecting relation describe all the information about the specific collecting event
Example of relations Collect collect_id accession_id person_id latitude longitude site_name country_id 1 Accession accession_id taxon_id accession_name country_id acc_mandate 1 Taxon taxon_id sci_name eng_name mandate grin_no thgw 1 1 N Seedstore batch_id accession_id batch_no collection box_no harvest_year N 1 germin_test test_id accession_id batch_id grm_pct test_date person_id N N
Wow Instead of repeating a lot of information, You store the accession once and link to other tables with other information! By linking files or information together, a relationship is produced betweenthe tables. Smart! And effective!
Part 2 Maybe a short brake?
DBMS A software system that enables users to define, create, maintain and control access to the database. • The DBMS is the software that interacts with the users’ application programs and the database. • Typically a DBMS provides the following facilities…
It allows users to define the database with types and structures and the constraints on the data to be stored in the database. (DDL) • It allows users to insert, update, delete and retrieve data from the database. (DML) • It provides controlled access to the database.
Example of DML and DDL • Data manipulating language (DML) • Select * from Person; • Insert into Person (Name, Address, tel) values (‘John Book’, ‘California’, 123 45) • Data Definition language (DDL) (Changes in structure) • Create table Person ( Name varchar(40) NotNull, Address char(10), tel int)
DBMS environment Data Procedures People Hardware Software Bridge Human Machine
Users with software and questions Web site, For instance SESTO. Question: List all accessions User1 User2 DBMS Interpretation of question Data base language (SQL) DDL & DML Data bases with information and meta data
Features in SESTO e.g. project archive and pictures archive Entry levels with seed store data: Genus, Taxons, Cultivars and Accessions. To list all Taxons in your gene bank, click the button and information about the taxons and their accessions will be presented.
To see the accession list of this taxon (Arachis hypogaea) click the [select]-link
To see detailed information about this accession click [select]
The accession name SWZ1 from Zea mays gives this information To see all the information from stored material click this link
To edit accession information and storage information, click the [edit] links and a pop up window will be presented Two batches with 10 distribution bags are stored in the Active Storage in freezer 15 in boxes 10 and 8.
To add and edit information about an accession. The information goes to the database and will be stored there
Reliable Retrieve information fast User-friendly Flexible, should anticipate changes A documentation system should be…
Things you will not regret! • Be organized and structured from the start! • Be sure you have a good data model Spend some time on normalization, it will be worth it later • Try to cooperate and work together with the documentation working group • (Documentation is far to important to be left alone with the IT-staff)
Web services • Web services are standardized programs that send your data to databases automatically. • Your data will be published elsewhere • This makes your work recognized • And will give your gene bank credit and your material will be asked for • GBIF, Biocase Have you heard about the new dataset…
Last but not least… Data is not information. • Information must be retrieved and understood. • Then knowledge is created. "Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" T.S.Eliot (Where is the information we have lost in data?)