STORAGE MANAGEMENT II (1) Files on disk. (2) How an O/S locates files. (3) Data access.
1. Files on disk. Anything saved on disk is a “file”, i.e. a collection of data. There are many different kinds of files, distinguished by the file extension, e.g. .doc, .exe, .bmp, .jpg.
Files on disk continued. Examples include: 1. Application packages: .exe, .com. 2. System files, e.g. config.sys. 3. Batch files of O/S commands: .bat.
Files on disk continued. 4. Data files: .txt, .doc, .dat, .grf, .pcx, .bmp, .html, .bas, .cpp, .ppt. Notice that Windows can create file associations so that double clicking a file invokes a program, e.g.?
Running programs. To run a program, we may double click its icon, select it from a menu, or use Windows explorer to find it and double click the .exe file. In each case, the operating system locates the file on auxiliary storage, and loads and runs it.
2. How does the O/S locate the file? Q: But how does the O/S know where the file is on auxiliary storage? A: It uses the directory stored on the disk. The O/S performs a “Dir” operation, finds the file and two extra pieces of information we don’t see….
O/S locating the files. Q: What are these pieces of information? A: The track number and sector number where the file starts. The File Manager can then tell the Device Manager to locate the file.
O/S locating the files. The device manager translates the track number and sector number into physical commands for the drive: F/M --> D/M --> Drive
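The lookup chain above can be sketched in a few lines of Python. This is only an illustration: the directory contents, the file names, and the function names `file_manager` and `device_manager` are all made up for the example, and a real O/S directory holds much more than this.

```python
# Hypothetical on-disk directory: file name -> (track, sector) where the file starts.
DIRECTORY = {
    "REPORT.DOC": (12, 3),
    "GAME.EXE": (40, 7),
}

def file_manager(filename):
    """F/M step: look the file up in the directory and return its track and sector."""
    return DIRECTORY[filename.upper()]

def device_manager(track, sector):
    """D/M step: turn the track and sector into an (illustrative) request for the drive."""
    return f"drive: go to track {track}, sector {sector}"

track, sector = file_manager("report.doc")
request = device_manager(track, sector)
print(request)
```

The point is only the division of labour: the File Manager knows names, the Device Manager knows positions on the disk.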
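O/S locating the files. The drive then does the following: 1. Seek: the R/W head moves to the correct track. 2. Search: the spindle rotates the disk until the correct sector is under the head. 3. Transfer: the data is converted between magnetic form on the disk and electrical signals (Magnetic <> Electrical).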
3. Data access. It is important to distinguish 2 types of access: 1. File access (depends on storage): can be (a) sequential (e.g. Tape) or (b) random (e.g. disk / CD). This concerns how the R/W head gets to the START of the file.
Data access. 2. Record access. This concerns how the data is organized inside the file. Even if we are using a DASD (direct access storage device), we may have 2 main organizations:
Data access. 1. Sequential organization. Within the file, records are accessed one after another, possibly arranged in key order, but we do not jump directly to a given key. There are also entry-ordered files, such as point-of-sale transactions, where records are kept in the order they were entered.
Data access. Sequential organization (continued). Disadvantage? Slow if we want only 1 record. Suppose it is number 9000: we must first read past the 8999 records before it! When useful: for uniform operations on the whole file, e.g. payroll, grades, mailing lists.
Data access. Note that even if the sequential file is on a random device, we cannot randomly access its records because of its internal organization.
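A minimal sketch of why sequential organization is slow for a single record. The record layout (a list of key/data pairs) and the name `find_sequential` are assumptions made for the example.

```python
def find_sequential(records, wanted_key):
    """Read records one after another from the start of the file.

    Returns (record, reads), where reads counts how many records were read.
    """
    reads = 0
    for key, data in records:
        reads += 1
        if key == wanted_key:
            return (key, data), reads
    return None, reads

# A made-up file of 10000 records with keys 1, 2, 3, ...
records = [(n, f"record {n}") for n in range(1, 10001)]

found, reads = find_sequential(records, 9000)
print(found, reads)
```

Here `reads` comes out as 9000: every earlier record has to be read first, even though the list itself sits in memory that could be accessed directly.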
Data access. 2. Random organization. Allows individual records to be accessed directly. A random file can be stored on tape or disk, but we can only use random access if the file is stored on a random device.
Data access. Random organization (continued). Q: How can we get to a record directly? A: There needs to be a mechanism for calculating the position of an individual record from its key.
Data access. 1. The file is organized into a number of sections (“buckets”) for records. Typically we use a prime number of sections, e.g. 41, which gives sections numbered 0, 1, 2, 3, ..., 40.
Data access. 2. Each record is assigned a unique number based on its key, e.g. the ASCII code of a one-letter key:
Key    Record number
‘A’    65
‘B’    66
‘Z’    90
Data access. 3. Assign each record to a section, using a hashing algorithm: Section # = remainder of (Record # divided by # of sections)
Data access. E.g. where does the record with key ‘A’ belong? Section # = remainder of 65 / 41 = 24.
Data access. Where does the record with key ‘B’ belong? Section # = remainder of 66 / 41 = 25.
Data access. Where does the record with key ‘R’ belong? Section # = remainder of 82 / 41 = 0.
Data access. This gives the distribution:
Key        ‘R’  ‘S’  ...  ‘Z’  ‘A’  ‘B’  ...  ‘Q’
Section #   0    1   ...   8   24   25   ...  40
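The three steps above can be sketched in Python. The names `record_number` and `section` are chosen for this example; using `ord` for the record number assumes one-letter keys, as in the slides.

```python
NUM_SECTIONS = 41  # prime number of sections, numbered 0..40

def record_number(key):
    """Step 2: number a one-letter key by its character code ('A' -> 65)."""
    return ord(key)

def section(key, num_sections=NUM_SECTIONS):
    """Step 3: section # = remainder of record # divided by # of sections."""
    return record_number(key) % num_sections

for key in "ABRZQ":
    print(key, record_number(key), section(key))
```

This reproduces the worked examples: ‘A’ lands in section 24, ‘B’ in 25 and ‘R’ in 0.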
Data access. Why use a prime number of sections? Suppose we used 40, and had record numbers 40, 45, 50, 55, … 80, 85, 90, 95 …. Then all records would be clustered in just 8 sections: 0, 5, 10, 15, ..., 35.
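A quick way to see the clustering is to compare 40 sections with 41 for the same record numbers. This is a sketch: the helper name `sections_used` and the cut-off of the record numbers at 95 are choices made for the example.

```python
def sections_used(record_numbers, num_sections):
    """Return the sorted list of distinct sections the records fall into."""
    return sorted({n % num_sections for n in record_numbers})

record_numbers = list(range(40, 100, 5))  # 40, 45, 50, ..., 95 (12 records)

with_40 = sections_used(record_numbers, 40)
with_41 = sections_used(record_numbers, 41)

print(len(record_numbers), "records")
print("40 sections:", with_40)
print("41 sections:", with_41)
```

With 40 sections the 12 records share only 8 sections, so some must collide; with 41 sections each of the 12 records gets a section of its own.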
Data access. This creates 2 problems: (1) Collisions (2 or more records for the same spot). We can handle these by allowing a list of records in each bucket, but we would then need sequential access within each bucket.
Data access. (2) Wasted space. Only 8 of the 40 sections are ever used, so we would never use 80% of the file sections. Ideally we want wide distribution and no conflicts.
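The “list of records in each bucket” idea for handling collisions can be sketched as follows. The class name `BucketFile` and its methods are hypothetical, and the buckets are held in memory rather than on disk.

```python
class BucketFile:
    """Toy random-organization file: each section holds a list of records."""

    def __init__(self, num_sections=41):
        self.buckets = [[] for _ in range(num_sections)]

    def _section(self, record_number):
        # Hashing step: remainder of record # divided by # of sections.
        return record_number % len(self.buckets)

    def store(self, record_number, data):
        self.buckets[self._section(record_number)].append((record_number, data))

    def fetch(self, record_number):
        # Jump directly to the bucket, then search sequentially inside it.
        for number, data in self.buckets[self._section(record_number)]:
            if number == record_number:
                return data
        return None

f = BucketFile()
f.store(65, "record for 'A'")
f.store(106, "another record")  # 106 = 2 x 41 + 24, so it collides with 65
print(f.fetch(106), len(f.buckets[24]))
```

Both records end up in section 24, and `fetch` has to step through that bucket's list to find the right one, which is the sequential access within a bucket mentioned above.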