Data storage, backup & security
GRAD 521, Research Data Management
Winter 2014 – Lecture 7
Amanda L. Whitmire, Asst. Professor
Follow-up from last class
• What is a reasonable timeline for DCP?
Overview for today
• Why?
• Where to store data: local drive | network drive | cloud
  • Consider: capacity & access by co-workers
• Data backup
  • Disaster recovery (research continuity)
• Data security
  • Corruption or loss (hardware failure or data deletion)
  • Confidentiality (personal or intellectual property)
Why data storage, backup & security are important
“Your data are the life blood of your research. If you lose your data, recovery could be slow, costly or even worse… it could be impossible.”
This happens a lot: physical theft & unintentional damage
Cute, but not a valid security plan.
Rare, unexpected events happen
• Example: building fire, University of Southampton, School of Electronics and Computer Science, Southampton, UK, 2005
Data storage options
• Personal computers (PCs) & laptops
• External storage devices
• Networked drives
• Cloud servers
Storage: PC/laptop
Advantages
• Convenient
Disadvantages
• Drive failure is common
• Laptops: susceptible to theft & unintentional damage
• Not replicated
Bottom line
• Do NOT use to store master copies of data
• Not a long-term storage solution
• Back up important data & files regularly
Storage: external storage devices
Advantages
• Convenient, cheap & portable
Disadvantages
• Longevity not guaranteed (e.g. Zip disks are now obsolete)
• Errors when writing to CD/DVD are common
• Easily damaged, misplaced or lost (a security risk)
• May not be big enough to hold all data; multiple drives may be needed
Bottom line
• Do NOT use to store master copies of data
• Not recommended for long-term storage
Storage: networked drives
Advantages
• Data kept in a single place and backed up regularly
• Replicated storage is not vulnerable to loss from a single hardware failure
• Secure storage minimizes the risk of loss, theft & unauthorized use
• Available as needed (assuming the network is available)
Disadvantages
• Cost may be prohibitive
• Export control may restrict where some data can be stored
Bottom line
• Highly recommended for master copies of data
• Recommended for long-term storage (~5 years)
Storage: cloud storage
Advantages
• Data kept in a single place and backed up regularly
• Replicated storage is not vulnerable to loss from a single hardware failure
• Secure storage minimizes the risk of loss, theft & unauthorized use
Disadvantages
• Cost may be prohibitive
• Upload/download bottleneck & fees
• Longevity of the service is uncertain
• Export control may restrict where some data can be stored
Bottom line
• Possibly recommended for master copies of data
• Not recommended for in-process data or large files
Storage: Google Drive for OSU
Advantages
• All the same advantages as network & cloud storage
• File sharing & collaboration with variable access levels
• Unlimited storage for native Google Docs (GD) files; 30 GB for non-GD files
• Automatic version control for GD files
Disadvantages
• 30 GB may not be enough
• Upload/download bottleneck
Bottom line
• Possibly recommended for master copies of data
• Possibly not recommended for in-process data or large files
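Whether a data folder fits within a fixed quota such as the 30 GB limit for non-GD files can be checked before uploading. A minimal Python sketch, assuming the quota is counted in binary gigabytes (1 GB = 1024³ bytes; a provider may count decimal gigabytes instead) and that every file in the folder counts against it, which overestimates usage if some files are native Google Docs. `folder_size_bytes` and `fits_quota` are hypothetical helper names.

```python
import os


def folder_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_quota(root: str, quota_gb: float = 30) -> bool:
    """True if the folder fits in the quota.

    Assumption: the quota is counted in binary GB (1024**3 bytes).
    """
    return folder_size_bytes(root) <= quota_gb * 1024**3
```

For example, `fits_quota("my_project_data")` (a hypothetical folder name) returns False when the folder would exceed 30 GB, a sign that a networked drive may be the better home for that dataset.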
Data backup
“Keeping backups is probably your most important data management task.” -Everyone
Data backup
Best practice: keep 3 copies of your datasets
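Three copies only help if the copies are actually identical. One common way to confirm this is to compare a cryptographic checksum of each copy. A minimal Python sketch, assuming each copy is reachable as an ordinary file path (for example the original, a copy on a local external drive and a copy on a network or remote mount); `sha256_of` and `copies_match` are hypothetical names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths: list[str]) -> bool:
    """True if all copies have the same checksum (False for an empty list)."""
    digests = {sha256_of(p) for p in paths}
    return len(digests) == 1
```

A mismatch means at least one copy is stale or corrupted and should be replaced from a known-good copy.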
Backups: full
Advantages
• Data can be easily & fully restored from a recent full backup
Disadvantages
• Time-consuming
• Takes up the most storage
Bottom line
• Recommended for master copies of data
• Frequency depends on data size & mutability (how often the data change)
Backups: differential
Advantages
• Data can be easily & fully restored from the last full backup + the latest differential backup
Disadvantages
• Each differential backup is larger than the one before, because it contains every change since the last full backup
• The backup window (time needed to run the backup) also grows each time
Bottom line
• Frequency depends on data size & mutability
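Why each differential backup grows can be shown by simulating which files it would copy. A toy Python sketch: `differential_set` and the edit log are hypothetical, and in practice the modification times would come from something like `os.path.getmtime`.

```python
def differential_set(mtimes: dict[str, float], last_full: float) -> set[str]:
    """Files a differential backup copies: everything modified since the last FULL backup."""
    return {name for name, modified in mtimes.items() if modified > last_full}


# Hypothetical edit log: file name -> day it was last modified.
# The full backup ran on day 0, so raw.csv (day -1) is already covered by it.
edits = {"raw.csv": -1, "a.csv": 1, "b.csv": 2, "c.csv": 3}

for day in (1, 2, 3):
    # Toy model: each file is edited once, so keep only edits made by this day.
    mtimes = {name: t for name, t in edits.items() if t <= day}
    print(day, sorted(differential_set(mtimes, last_full=0)))
# 1 ['a.csv']
# 2 ['a.csv', 'b.csv']
# 3 ['a.csv', 'b.csv', 'c.csv']
```

An incremental backup would instead compare against the time of the most recent backup of any kind, so with daily backups it would copy only c.csv on day 3.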
Backups: incremental
Advantages
• Smallest backups: each one saves only the changes since the previous backup (full or incremental)
• Shortest backup window
Disadvantages
• To restore data, you need the full backup plus every incremental backup made since then, which makes restoring more difficult
Bottom line
• Frequency depends on data size & mutability
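What a restore requires under each strategy can be made concrete by listing the backup sets needed. A Python sketch, assuming the definitions used in these slides (a differential contains all changes since the last full backup; an incremental contains the changes since the previous backup of any kind). `Backup` and `restore_chain` are hypothetical; real backup software tracks this chain for you.

```python
from typing import NamedTuple


class Backup(NamedTuple):
    time: float  # when the backup ran (e.g. day number)
    kind: str    # "full", "differential" or "incremental"


def restore_chain(history: list[Backup]) -> list[Backup]:
    """Backups needed to restore the latest state, oldest first.

    Assumes `history` is sorted by time and contains at least one full backup.
    """
    # Everything before the most recent full backup is irrelevant.
    full_idx = max(i for i, b in enumerate(history) if b.kind == "full")
    chain = [history[full_idx]]
    start = full_idx

    # A differential holds all changes since the last full backup,
    # so only the newest one after that full backup is needed.
    diffs = [i for i in range(full_idx + 1, len(history))
             if history[i].kind == "differential"]
    if diffs:
        chain.append(history[diffs[-1]])
        start = diffs[-1]

    # An incremental holds only changes since the previous backup,
    # so every incremental after `start` must be applied, in order.
    chain.extend(b for b in history[start + 1:] if b.kind == "incremental")
    return chain


weekly_incremental = [Backup(0, "full")] + [Backup(d, "incremental") for d in range(1, 7)]
weekly_differential = [Backup(0, "full")] + [Backup(d, "differential") for d in range(1, 7)]

print(len(restore_chain(weekly_incremental)))   # 7: the full backup + 6 incrementals
print(len(restore_chain(weekly_differential)))  # 2: the full backup + the latest differential
```

Losing any one of the seven incremental sets breaks the restore, which is the "more difficult restore scenario" in practice.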
Backups: bottom line
• Pick a strategy
• Be consistent
• Test your approach!
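One way to test a backup approach is a trial restore into a scratch folder, followed by a check that every file came back unchanged. A minimal Python sketch, assuming both the original data and the trial restore are ordinary directories; `manifest` and `restore_differences` are hypothetical names.

```python
import hashlib
import os


def manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Reads the whole file at once: fine for a sketch, chunk for very large files.
            with open(path, "rb") as f:
                result[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return result


def restore_differences(original: str, restored: str) -> dict[str, list[str]]:
    """Compare a trial restore against the original data; all lists empty means success."""
    a, b = manifest(original), manifest(restored)
    return {
        "missing": sorted(a.keys() - b.keys()),   # in original, not restored
        "extra": sorted(b.keys() - a.keys()),     # restored, but not in original
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Any non-empty list points to a gap in the backup strategy that is far cheaper to find during a test than after a real loss.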
Data security
“Data security is the means of ensuring that research data are kept safe from corruption and that access is suitably controlled.”
Data security
It is important to consider the security of your data to prevent:
• Accidental or malicious damage to or modification of data
• Theft of valuable data
• Breach of confidentiality agreements and privacy laws
• Premature release of data, which can void intellectual property claims
• Release before data have been checked for accuracy and authenticity
Data security
There are different levels of security to consider for your research data:
• Access: mechanisms for limiting the availability of your data
• Systems: protecting your hardware and software systems
• Data integrity: mechanisms for ensuring that your data are not manipulated in an unauthorized way
Data security: access
Limit the availability of your data:
• ID/password: step 1, for everyone
• Role-based access: privileges/permissions to data limited according to each user's role
• Wireless devices: often lack anti-virus software and firewalls; vulnerable to data theft and to theft of the device itself
  • Use a PIN; limit storage of sensitive data on the device
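Role-based access attaches permissions to roles rather than to individual people; each user gets whatever their role allows. A toy Python sketch: the role names and permission sets are hypothetical, and in practice this is configured in the file server, cloud service or operating system rather than written by hand.

```python
# Hypothetical roles and permissions for one research project's data.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "pi": {"read", "write", "delete", "share"},
    "analyst": {"read", "write"},
    "collaborator": {"read"},
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions get no access."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("analyst", "write"))       # True
print(is_allowed("collaborator", "write"))  # False
```

Denying by default means a forgotten or misspelled role ends up with no access rather than full access.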
Data security: systems
Protect your hardware & software systems:
• Anti-virus software: required on all OSU computers
• OS & media software: keep them up to date
• Firewalls: block unwanted network traffic from reaching your computer or server (e.g. a typical home router)
• Intrusion detection software: detects & alerts, but does not prevent, intrusions
• Physical access: locked office; password on wake; cable lock for laptops
Data security: data integrity
Protect the integrity of your data at the file level:
• Encryption: the process of converting data into an unreadable code. You must have a password or a secret encryption key to read an encrypted file. Check with the OSU Data Security team for advice (there is no “one size fits all” solution).
• Electronic signatures: meant to ensure the authenticity of the signer and, by extension, of the document; they now carry legal significance
• Watermarking: embeds a digital marker that can verify authorship and reveal alterations made to data files; most often used with images & media
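Detecting unauthorized changes to a file can be illustrated with a keyed hash (HMAC). This is a swapped-in, simpler technique than the ones listed above: it is not encryption (the data stay readable), and it is not an electronic signature (anyone holding the shared key can create a valid tag, whereas signatures use public/private key pairs). A Python sketch using only the standard library; `tag_for`, `is_unmodified` and the example key are hypothetical, and the key handling is deliberately simplified.

```python
import hashlib
import hmac


def tag_for(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag for data under a secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_unmodified(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it to the stored one in constant time."""
    return hmac.compare_digest(tag_for(data, key), expected_tag)


# Hypothetical key for illustration only; never hard-code real keys.
key = b"example-secret-key"
original = b"site,temp\nA,12.1\n"
stored_tag = tag_for(original, key)

print(is_unmodified(original, key, stored_tag))                # True
print(is_unmodified(b"site,temp\nA,99.9\n", key, stored_tag))  # False: data were altered
```

The tag would be stored separately from the data file; recomputing it later reveals any modification made without the key.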
Exercise
Complete the ‘Data Storage, Backup & Security Checklist’