United Nations Regional Seminar on Census Data Archiving for Africa
Addis Ababa, Ethiopia, 20-23 September 2011
Session 7 – Data storage, maintenance and security
Presented by: Ayenika Godheart Mbiydzenyuy, African Centre for Statistics (ACS), UNECA
Presentation Plan • Introduction • Strategies for data storage • Institutional back-up policy • Procedures to safeguard the security of the data • Procedures for data transmission and encryption of the data • Conclusion
1. Introduction
• Data storage is the holding of data in an electromagnetic form for access by a computer processor. There are two main kinds of storage:
• (i) Primary storage is data held in random access memory (RAM) and other memory devices built into computers, and
• (ii) Secondary storage is data stored on external storage devices such as hard disks, tapes and CDs.
• Stored data is maintained by adding, deleting, changing and updating binary as well as higher-level files.
• Data is maintained manually and/or through automated programs, but at the point of origination and translation/delivery it must be translated into a binary representation for storage.
• Data is usually edited at a slightly higher level, in a format relevant to the content of the data (such as text, images, or scientific or financial information).
• The data being maintained must be secured as a means of ensuring it is safe from corruption and can be accessed in a suitable and controlled manner.
2. Strategies for data storage
In an information environment, the success of any census archiving system is tightly coupled to its ability to store and manage information. Storage systems are a critical part of an NSO's network infrastructure. With the amount of data growing at an incredible rate during the census, the storage strategy must keep pace. In designing a storage strategy for census data archiving, the choice of the right technology for the primary storage system, as well as a solid backup procedure and sound system management, must be guaranteed.
The Need for Storage: A computer's main memory uses Dynamic RAM (DRAM). It stores data and provides almost instantaneous access to that data, but it is limited in capacity and its contents are lost once the computer is turned off. Permanent storage holds the data and software that must be preserved even when the computer is powered down. Permanent storage needs can be immense: the present library of software applications can easily exceed many gigabytes, and the quantity of data can run into terabytes. In designing a data storage strategy for census archiving in a network, the stakes are extremely high.
Storage Strategy Design Issues
A well-constructed storage system should:
• Prevent data loss
• Offer adequate capacity that can easily scale as storage needs grow
• Provide fast access to data without interruptions
• Be prepared for equipment failures
• Use cost-effective technologies
Data Policies
Not all issues that have an impact on an NSO's data strategy can be solved with technology. Individuals must follow sound practices with institutional data. Be sure that users place their data within the supported data structures. Users cannot, for example, store vital data files on the local drives of their computers if the organization's storage strategy assumes that all data resides on network servers. End users are unlikely to perform frequent backups of their data or to follow other procedures that ensure institutional data are secure. An important part of an NSO's storage strategy should therefore involve training users so that the network storage facilities remain secure and well-managed. Most backup software products will archive the contents of distributed computer hard drives; however, few of the other requirements of a well-managed storage strategy can be met with this approach.
General Considerations
Several factors come into play when selecting storage options:
• Capacity
• Scalability
• Costs
• Performance
• Reliability
• Manageability
• Cost Analysis
• Risk Analysis
• Capacity Planning
3. Institutional back-up policy
Backing up data is a basic precautionary step that everybody working with computers should take. Backup copies are an insurance policy against the possibility of your data being lost, damaged or destroyed. A reliable backup mechanism is indispensable for every institution engaged in digital preservation. Digital collections prepared with so much effort and cost, with an aim of long-term preservation, must be made immune from all kinds of natural or man-made disasters. Moreover, the back-up methodology adopted must have long-term relevance and usage. It should not become obsolete or redundant after a short period of time, because digital preservation technology has not yet been standardized or finalized. Storage media have changed very quickly, from now-obsolete floppies to CDs and DVDs, and on to tape drives.
Understanding Requirements for a Good Backup Policy
A good backup policy will protect your data from a wide range of mishaps. The range of events that you should consider when planning how to back up your data includes:
• Accidental changes to data
• Accidental deletion of data
• Loss of data due to media or software faults
• Virus infections and interference by hackers
• Catastrophic events (fire, flood etc.)
A good backup policy should provide protection against all of these threats.
Frequency of Backup
Backups should be made regularly to ensure that they remain up to date. The more frequently data is being changed, the more frequently backups should be made. If your data is changing significantly every day, you should consider a daily backup; if you are prepared for and can afford to redo a longer period of work, then less frequent backups may be appropriate. As well as backing up frequently, you should keep several backup copies made on different dates. Doing this guards against the danger that your backup copy will incorporate a recent but as yet undiscovered problem from your working copy.
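As a rough illustration of combining daily backups with several dated copies, the minimal Python sketch below creates a date-stamped archive of a working directory and keeps only the most recent copies. The directory names, the zip format and the seven-copy retention are assumptions made for the example, not recommendations from this seminar.

```python
# Minimal sketch of a dated daily backup with retention.
# The paths and the retention count are illustrative assumptions.
import shutil
from datetime import date
from pathlib import Path

DATA_DIR = Path("census_data")      # working copy (assumed location)
BACKUP_DIR = Path("backups")        # backup destination (assumed location)
KEEP_COPIES = 7                     # retain the last seven dated backups

def daily_backup() -> Path:
    BACKUP_DIR.mkdir(exist_ok=True)
    stamp = date.today().isoformat()
    # Create backups/census_data-YYYY-MM-DD.zip from the working copy.
    archive = shutil.make_archive(str(BACKUP_DIR / f"census_data-{stamp}"),
                                  "zip", root_dir=DATA_DIR)
    # Keep only the most recent KEEP_COPIES archives; delete older ones.
    archives = sorted(BACKUP_DIR.glob("census_data-*.zip"))
    for old in archives[:-KEEP_COPIES]:
        old.unlink()
    return Path(archive)

if __name__ == "__main__":
    print("Backup written to", daily_backup())
```

Run daily (for example from a scheduler), this keeps a week of dated copies so that a problem discovered late can still be recovered from an earlier backup.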
Multiple Backup Copies
A backup copy may suffer the same mishaps as the working copy of your data, so it is a good idea to spread the risk by maintaining several backup copies. A minimum of two backup copies should be maintained in addition to your working copy of the data.
Offsite Backups
More serious events, such as a fire in the office, will destroy both the working copy of the data and any backup copies stored at the same location. Some backup copies should therefore be stored 'offsite' (offsite is a relative term, dependent on the level of protection you want).
Media
Backup copies should be made on new media. Do not continue to use media once they start to develop faults. In particular, floppy disks are not a good medium for backup copies; if they are used, they should be replaced often.
Multiple Formats
Store backup copies both in the software formats that you are using and in exported formats (many spreadsheet and database packages can export to delimited text, for example), as in the sketch after this list. This will help protect you from subtle faults that can sometimes develop in complicated data formats (such as database file formats) and that may not become apparent until after they have been included in both the working copy and the backup copies.
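To illustrate the "multiple formats" point, the sketch below exports a database table to plain delimited text alongside the native file. It assumes a hypothetical SQLite database "census.db" with a table "households"; the same idea applies to spreadsheet or statistical packages.

```python
# Minimal sketch: export a table to delimited text as a format-independent copy.
# Database name, table name and output file are illustrative assumptions.
import csv
import sqlite3

def export_to_csv(db_path: str, table: str, out_path: str) -> None:
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(f"SELECT * FROM {table}")
        headers = [col[0] for col in cursor.description]
        with open(out_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(headers)      # column names first
            writer.writerows(cursor)      # then every row as plain text
    finally:
        conn.close()

export_to_csv("census.db", "households", "households.csv")
```

The delimited text copy remains readable even if the native database format develops a subtle fault or becomes hard to open in future software.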
Institutional Backup Policy
Census projects should never assume that their institution's policies will be appropriate to their needs. Always check:
• Institutions may maintain backups for a limited period only
• Institutions may only provide backups to protect against complete loss of data, not against individual users losing data
• Institutions may not back up all data held on their network
Many organizations advise their users to make their own backups of critical data. This is good advice and should be followed.
Check Your Backup
A backup that does not actually work is of no use at all. Always test your backup procedures to ensure that your backup can be retrieved and is usable.
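One simple way to test that a backup is retrievable is to restore it to a scratch directory and compare checksums against the working copy, as in the sketch below. The archive name and directory paths are assumptions for the example only.

```python
# Minimal sketch: restore a backup archive and verify file checksums.
# Paths and archive names are illustrative assumptions.
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(archive: str, working_dir: str) -> bool:
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(archive, scratch)   # can the backup be restored?
        restored = Path(scratch)
        for original in Path(working_dir).rglob("*"):
            if original.is_file():
                copy = restored / original.relative_to(working_dir)
                if not copy.exists() or sha256_of(copy) != sha256_of(original):
                    return False                  # missing or corrupted file
    return True

print(verify_backup("backups/census_data-2011-09-20.zip", "census_data"))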
Backup is not Preservation
A backup copy is an exact copy of the version of the data you are working on. If your working copy becomes unusable, you should be able to start using your backup copy immediately, on the same computers, using the same software. In contrast, a preservation version of the data should be designed to mitigate the effects of rapid technology change that might otherwise make the data unusable within a few years. Some of the prevalent devices for storing back-up data are hard drives or disks in a computer, CDs, DVDs, tape drives (DAT tape, DLT tape, Zip and JAZ) and hard copies.
OTHER TYPES OF BACKUP TECHNOLOGY TO CONSIDER:
VIRTUAL TAPE LIBRARY (VTL) - An archival backup solution that combines traditional tape backup methodology (software or appliance based) with low-cost disk technology to create an optimized backup and recovery solution.
NEAR-LINE DISK TARGET - A disk array that acts as a target or cache for tape backup. These arrays typically offer faster backup and recovery times when compared with tape, and are cost-effective because they are increasingly based on low-cost Advanced Technology Attachment (ATA) disk drives. Unlike virtual tape libraries, however, they typically require configuration and process changes to existing backup/recovery operations.
CONTENT-ADDRESSED STORAGE (CAS) - A disk-based storage system that uses the content of the data as a locator for the information, eliminating dependence on file system locators or volume/block/device descriptors to identify and locate specific data.
MASSIVE ARRAY OF IDLE DISKS (MAID) - A disk system in which disks spin only when necessary (such as during read/write operations), reducing total power consumption and enabling massive high-capacity disk systems with economics comparable to tape libraries.
SNAPSHOTS - A snapshot is a copy of a volume that is essentially empty but has pointers to the existing files. When one of the files changes, the snapshot volume creates a copy of the original file just before the new file is written to disk on the original volume.
INCREMENTAL CAPTURE - Vendors in this category can replace existing backup technologies or co-exist with them. Incremental capture solutions can take snapshots at the block, file, or volume level; a simplified file-level sketch follows this list.
CONTINUOUS CAPTURE - This segment of the data-protection market includes software or appliances designed to capture every write made to primary storage and make a time-stamped copy on a secondary device.
ARRAY-BASED REPLICATION - These products have been around for a long time and have traditionally come from large disk-array vendors such as EMC, Hitachi Data Systems, and IBM. They run on high-end arrays and are very robust (and expensive).
HOST-BASED REPLICATION - Host-based replication software runs on servers. As writes are made to one array, they are also written to a second array. Vendors in this category have eliminated many of the complexities in their products, making them easier to deploy and manage.
FABRIC-BASED REPLICATION - The new debate raging in the storage industry revolves around the following question: "Where should storage services, or applications, reside - on hosts, arrays, or in the fabric on switches or appliances?"
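The commercial products above work at the block or volume level; purely as a conceptual illustration of the incremental-capture idea, the sketch below copies only files whose content has changed since the previous run. The source, target and state-file paths are assumptions.

```python
# Simplified, file-level illustration of incremental capture:
# copy only files whose content hash changed since the last run.
import hashlib
import json
import shutil
from pathlib import Path

SOURCE = Path("census_data")        # assumed working copy
TARGET = Path("incremental_copy")   # assumed secondary location
STATE = Path("capture_state.json")  # remembers hashes from the last run

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def incremental_capture() -> int:
    previous = json.loads(STATE.read_text()) if STATE.exists() else {}
    current, copied = {}, 0
    for path in SOURCE.rglob("*"):
        if not path.is_file():
            continue
        rel = str(path.relative_to(SOURCE))
        current[rel] = file_hash(path)
        if previous.get(rel) != current[rel]:       # new or changed file
            dest = TARGET / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
            copied += 1
    STATE.write_text(json.dumps(current))
    return copied

print(incremental_capture(), "files captured")
```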
4. Procedures to safeguard the security of the data
Steps to Improving Data Safeguards
Protecting data in dynamic and diverse environments is a formidable challenge, and the challenges of securing data in modern organizations are vast. You need to focus on a categorized data inventory, sharing mechanisms, and leak detection.
First: Find and Understand Data
To determine how to secure your data, first identify which records warrant protection and where they reside. Finding the data typically involves interviews and the review of existing documentation. Expand your findings by scanning file servers within your organization for potentially sensitive records (a simple sketch of such a scan follows this list). Budget-strapped? Free data discovery tools that can get you started with this task include:
• Sensitive data plugins for Nessus
• Spider
• Firefly
• FindSSN and Find_CCN
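As a crude illustration of what such discovery tools do far more carefully, the sketch below scans text files on a share for US SSN-like and 16-digit card-like patterns. The patterns, file types and share path are assumptions, and a scan this simple will produce false positives.

```python
# Crude sketch of scanning files for potentially sensitive records.
# Patterns and the share path are illustrative assumptions only.
import re
from pathlib import Path

PATTERNS = {
    "ssn_like":  re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_like": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def scan(root: str):
    for path in Path(root).rglob("*.txt"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                yield path, label

for hit, label in scan("/srv/fileshare"):   # hypothetical share path
    print(f"{hit}: possible {label} data")
```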
Second: Help Users Share and Store Data
How will people exchange and store data securely? Don't expend your efforts on security controls without defining how people will share the sensitive data to get work done.
Third: Detect Data Leaks, to React Quickly
Despite your best efforts, sensitive data may get exposed, often because of an oversight in how they are stored, shared, or secured. Consider how you will detect a leak quickly to minimize the incident's scope and severity. The data discovery process, as well as a security assessment, can help discover data where they don't belong. In addition, make use of web search engines to identify potentially sensitive records accessible to the public over the internet. Keep an eye on public data breaches: knowing what breaches have occurred can help you understand the leading causes of such incidents, so you can adjust your security controls appropriately.
5. Procedures for data transmission and encryption of the data
Data transmission refers to computer-mediated communication among system users, and also with other systems. The basic functions of using on-line information systems are entering data into a computer, displaying data from a computer, and controlling the sequence of input-output transactions, with guidance for users throughout the process. In considering data transmission functions, we must adopt a broad perspective. Data that are transmitted via computer may include words and pictures as well as numbers, and the procedures for data transmission may take somewhat different forms for different system applications. Data might be transmitted by transferring a data file from one user to another, perhaps with an accompanying message to indicate that such a file transfer has been initiated. In some applications, computer-mediated data transmission may be a discrete, task-defined activity. Effective communication is of critical importance in systems where information handling requires coordination among groups of people. This is true whether communication is mediated by computer or by other means.
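One simple practice when a data file is transferred between users or systems is to send a small accompanying manifest with a checksum, so the receiving side can confirm the file arrived intact. The sketch below shows this idea; the file name and manifest format are assumptions, not a prescribed procedure.

```python
# Minimal sketch: write a checksum manifest before transmission and verify on receipt.
# File names and the manifest layout are illustrative assumptions.
import hashlib
import json
from pathlib import Path

def prepare_transfer(data_file: str) -> str:
    payload = Path(data_file).read_bytes()
    manifest = {
        "file": Path(data_file).name,
        "bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    manifest_path = data_file + ".manifest.json"
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest_path

def verify_received(data_file: str, manifest_path: str) -> bool:
    manifest = json.loads(Path(manifest_path).read_text())
    payload = Path(data_file).read_bytes()
    return hashlib.sha256(payload).hexdigest() == manifest["sha256"]

prepare_transfer("census_extract.dat")      # on the sending side
print(verify_received("census_extract.dat", "census_extract.dat.manifest.json"))
```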
In cryptography, encryption is the process of transforming information (referred to as plaintext) using an algorithm (called a cipher) to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. The result of the process is encrypted information (in cryptography, referred to as ciphertext). In many contexts, the word encryption also implicitly refers to the reverse process, decryption (e.g. "software for encryption" can typically also perform decryption), to make the encrypted information readable again (i.e. to make it unencrypted). Encryption has long been used by militaries and governments to facilitate secret communication, and it is now commonly used to protect information within many kinds of civilian systems. Encryption by itself can protect the confidentiality of messages, but other techniques are still needed to protect the integrity and authenticity of a message; for example, verification of a message authentication code (MAC) or a digital signature. Standards and cryptographic software and hardware to perform encryption are widely available, but successfully using encryption to ensure security may be a challenging problem.
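A small sketch of symmetric encryption and decryption is shown below, using the third-party Python "cryptography" package (installed separately with pip). Fernet combines a cipher with a message authentication code, so tampering is detected on decryption; the sample plaintext is invented, and the key handling here is deliberately simplistic and only illustrative.

```python
# Hedged sketch of authenticated symmetric encryption with the "cryptography" package.
# Key management is omitted; in practice the key must be generated and stored securely.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # the secret key (illustrative handling only)
cipher = Fernet(key)

plaintext = b"Household record 0001: 5 members, district 12"   # invented example
ciphertext = cipher.encrypt(plaintext)        # unreadable without the key
recovered = cipher.decrypt(ciphertext)        # raises InvalidToken if altered

assert recovered == plaintext
print(ciphertext[:32], b"...")
```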
6. Conclusion
Cloud computing is a technology that uses the internet and central remote servers to maintain data and applications. It allows consumers and businesses to use applications without installation and to access their files from any computer with internet access. This technology allows for much more efficient computing by centralizing storage, memory, processing and bandwidth. Cloud computing is ultimately all about data: storing data securely, managing data effectively, accessing data efficiently, integrating data relative to needs, and using data analytics to improve business intelligence and enhance decision-making processes. That sounds like a mouthful, but if you are in business and accumulating data, you are more than likely already doing that kind of thing. It is just a matter of how effectively you are doing it, and whether cloud computing can offer you efficiencies of scale and cost.