16 November 2011 Biologists Meeting. Data Management and Manipulations: The Good, the Bad and the Fuhgeddaboudit !. Lisa Reed Center for Vector Biology Rutgers University. What this talk is about. Data Management Why is it important? How to do it effectively How to protect your data
16 November 2011 Biologists Meeting Data Management and Manipulations: The Good, the Bad and the Fuhgeddaboudit ! Lisa Reed Center for Vector Biology Rutgers University
What this talk is about • Data Management • Why is it important? • How to do it effectively • How to protect your data • Data Manipulations • What is it and what can it do for me? • Pivot and Graphs • Connectivity and Reports
Data Management • Broad definition • Data Resource Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise -Wikipedia • How you keep data in a clear, concise and safe manner that anyone can retrieve at a later date without confusion or frustration. • Data structure • Meta data • Safe keeping and retrieval
Data Structure • Data structure is a “way of storing and organizing data [in a computer] so that it can be used efficiently.” (Wikipedia) • Should be easy to understand (for example, by date) • How is data stored? • Records: One line of data arranged by “fields” or “variables.” • Arrays: A set of data arranged by position to denote variables • How does Excel do it? • Records
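The difference between records and arrays can be sketched in a few lines of Python. This is only an illustration: the field names and values below are made up, not taken from the talk's dataset.

```python
# Hypothetical trap data; field names and values are invented for illustration.

# Record layout: each row is one observation, and every field is named.
records = [
    {"date": "2011-06-01", "trap": "A", "species": "Aedes albopictus", "count": 12},
    {"date": "2011-06-01", "trap": "B", "species": "Culex pipiens", "count": 7},
]

# Array layout: the same data, but meaning depends on position alone.
# Column 0 = date, 1 = trap, 2 = species, 3 = count.
arrays = [
    ["2011-06-01", "A", "Aedes albopictus", 12],
    ["2011-06-01", "B", "Culex pipiens", 7],
]

# Reading a value from a record is self-describing...
count_from_record = records[1]["count"]
# ...while the array version relies on remembering that count is column 3.
count_from_array = arrays[1][3]

print(count_from_record, count_from_array)
```

An Excel sheet with variable names in the first row behaves like the record layout: each row is one record and each column header names a field.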
Excel Navigation – The Top • Screenshot labels: Office Button, Ribbon, Formula Bar, Name Box, Active Cell, Column Letter, Row Number, Split Pane, Multiple Tabs
Variables into Columns • The first row holds the variable names • Use names that make sense • Code • Format (date): right-click a column or cell > Format Cells…
Data Structure Expandable in both directions to 16,384 columns and 1,048,576 rows
Things to Consider in Your Dataset • Do I use Zero? • Pro = you fill in each potential place setting with a value. • Pro = averages calculate correctly! • Con = Can take a long time (but there is a shortcut) • Con = Each entry is a possible data error • Do you have a way to distinguish when the trap is out of commission? • Use a period or other non-numerical value
Calculations with and without Zero • Blanks are NOT considered zero. • Sums are still calculated correctly (averages can be defined through a pivot table). • Can replace blanks with zeroes. Highlight the range > Find & Select > Go To Special > Blanks > type 0 > Ctrl+Enter
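A small Python sketch shows why the blank-versus-zero choice matters. The weekly counts are hypothetical, and `None` stands in for a blank cell. Excel's AVERAGE skips empty cells, which the first calculation imitates.

```python
from statistics import mean

# Hypothetical weekly counts for one trap; None stands for a blank cell.
counts_with_blanks = [10, None, 20, None]

# Blanks are skipped, so only the 2 filled-in weeks are averaged.
filled = [c for c in counts_with_blanks if c is not None]
average_ignoring_blanks = mean(filled)

# After replacing blanks with zero, all 4 weeks enter the average.
counts_with_zeros = [0 if c is None else c for c in counts_with_blanks]
average_with_zeros = mean(counts_with_zeros)

# The sum is the same either way; only the average changes.
total_ignoring_blanks = sum(filled)
total_with_zeros = sum(counts_with_zeros)

print(average_ignoring_blanks, average_with_zeros)
print(total_ignoring_blanks, total_with_zeros)
```

Which average is right depends on what a blank means: a trap that caught nothing should be a zero, while a trap that was out of commission should stay out of the calculation.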
Meta Data • Meta = “About” • What are the data elements? • How is the data organized? • Codes and their meanings • Everything someone needs to know in order to use the dataset • Best place to put this information is in the dataset itself • Excel provides a good format for doing so.
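One way to picture metadata is as a data dictionary kept alongside the data. The sketch below is an assumption-laden illustration: the variable names are borrowed from the pivot example later in the talk, but the descriptions, codes, and the helper `undocumented_values` are hypothetical.

```python
# Hypothetical data dictionary (metadata); descriptions and codes are assumed.
data_dictionary = {
    "Species": {"description": "Mosquito species identified", "codes": None},
    "Results": {
        "description": "Pathogen test result",
        "codes": {"POS": "positive", "NEG": "negative"},
    },
    "Moscount": {"description": "Number of mosquitoes counted", "codes": None},
}


def undocumented_values(rows, dictionary):
    """Return (row index, variable, value) for anything the metadata does not explain."""
    problems = []
    for i, row in enumerate(rows):
        for variable, value in row.items():
            if variable not in dictionary:
                problems.append((i, variable, value))
                continue
            codes = dictionary[variable]["codes"]
            # Only coded variables are checked against their list of codes.
            if codes is not None and value not in codes:
                problems.append((i, variable, value))
    return problems


rows = [
    {"Species": "Culex pipiens", "Results": "POS", "Moscount": 25},
    {"Species": "Culex pipiens", "Results": "?", "Moscount": 30},
]
problems = undocumented_values(rows, data_dictionary)
print(problems)
```

In Excel the same information could live on a separate tab of the workbook, so the codes travel with the data.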
Saving Your Data • Save it. Save it Often. Save it with AutoSave. • Back it up. • Keep a copy in a safe place. • Keep a copy in a separate place. • Keep a copy in the cloud. “We have recently seen some really nasty malware on several computers that simply was not removable by numerous sophisticated software tools at our disposal. The only safe remedy was to completely rewrite the machines from scratch.”
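The "back it up" step can be automated. The following is a minimal sketch, assuming a local file and a backup folder; the function name `backup_copy` and the file name `trap_data.csv` are hypothetical, and a temporary directory is used so the demonstration leaves nothing behind.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_copy(source, backup_dir):
    """Copy `source` into `backup_dir` under a timestamped name and return the new path."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also attempts to preserve file metadata
    return target


# Demonstration in a throwaway directory with a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "trap_data.csv"
    original.write_text("Species,Moscount\nCulex pipiens,25\n")
    copy_path = backup_copy(original, Path(tmp) / "backups")
    backup_matches = copy_path.read_text() == original.read_text()
    backup_name = copy_path.name

print(backup_name, backup_matches)
```

Pointing `backup_dir` at an external drive or a synced cloud folder would cover the "separate place" advice; a copy on the same disk does not protect against a machine that has to be rebuilt from scratch.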
Data Manipulations • Exploratory Data Analysis vs. Confirmatory Data Analysis • What is your data’s story? • Hand entry – Know your data. • Summarizing data • Tables • Graphs • Pivot Tables • Graphs • Into a Document
Pivot Table 1 • Highlight Data (hold down shift key and arrow down, then arrow across) • INSERT Pivot Table
Pivot 2 • A new sheet is created • Can be modified • Lists variables in your dataset • Has place for column, row, value and filter
Drag Variables to Labels, Values and Filters • Drag Species to Row Labels and it will place the species names, one to each row • Drag Results to Column Labels • Drag Pathogen to Filter • Drag Moscount to Values
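What the pivot table computes from those four drags can be sketched in plain Python. The variable names come from the slide, but the rows, pathogen codes, and the helper `pivot_sum` are hypothetical, and the sketch assumes the Values field is summed, which is Excel's default for numeric data.

```python
from collections import defaultdict

# Hypothetical records using the variable names from the slide.
rows = [
    {"Species": "Culex pipiens", "Pathogen": "WNV", "Results": "NEG", "Moscount": 25},
    {"Species": "Culex pipiens", "Pathogen": "WNV", "Results": "POS", "Moscount": 30},
    {"Species": "Aedes albopictus", "Pathogen": "WNV", "Results": "NEG", "Moscount": 12},
    {"Species": "Culex pipiens", "Pathogen": "EEE", "Results": "NEG", "Moscount": 40},
    {"Species": "Culex pipiens", "Pathogen": "WNV", "Results": "NEG", "Moscount": 5},
]


def pivot_sum(rows, row_field, column_field, value_field, filter_field, filter_value):
    """Sum value_field for each (row, column) pair, keeping only rows that pass the filter."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        if r[filter_field] != filter_value:
            continue  # the filter drops every other pathogen
        table[r[row_field]][r[column_field]] += r[value_field]
    return {row_key: dict(cols) for row_key, cols in table.items()}


# Species -> rows, Results -> columns, Moscount -> values, Pathogen -> filter.
pivot = pivot_sum(rows, "Species", "Results", "Moscount", "Pathogen", "WNV")
for species, results in pivot.items():
    print(species, results)
```

Changing the filter value plays the role of choosing a different pathogen in the pivot table's filter drop-down.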
Pivot Example Excel
Graphing Your Data • Highlight your data. • Apply a graph template. The greatest value of a picture is when it forces us to notice what we never expected to see. –Tukey 1977
Live Links from One Document to Another • With “hot” links, your information will always update. • Saves time • Preserves method • You can also paste a static picture that is not linked to data. • Click on graph > Copy > Go to Document > Paste Special > Paste Link > Excel Object • Graph will be shown and can be updated with a right click > Update.
Conclusions • Data Management will provide continuity through data structure, meta data and data preservation. • Exploring data can provide insight and new questions. • You can link these explorations into different documents. Questions?