Data Structures

Data Structures: Random Access Files

Learning Objectives • Explain Random Access Searches. • Explain the purpose and operation of Hashing Algorithms.

Access Methods to Data • Computers can store large volumes of data. • The difficulty is being able to get it back again. • In order to be able to retrieve data, it must be stored in some sort of order. • There are a number of ways of arranging the data, each of which aids access under different circumstances.

Random Access • The ability to find (jump to) a file, program or specific data immediately, without first having to go through other files or data (as sequential access requires). • Think of the difference between finding and playing a song/track/movie on an old cassette or video tape versus a CD, DVD or mp3 player.
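A minimal Python sketch of the difference, assuming a file of fixed-length records. The record layout (a 4-byte key plus a 12-byte name), the function names and the track data are all hypothetical. Sequential access has to read every record before the one wanted; random access works out the byte offset and jumps straight to it with `seek`.

```python
import io
import struct

# Hypothetical fixed-length record: a 4-byte integer key and a 12-byte name.
RECORD_FORMAT = "<i12s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 16 bytes per record


def write_records(f, records):
    """Write (key, name) pairs one after the other as fixed-length records."""
    for key, name in records:
        # "12s" pads shorter names with null bytes.
        f.write(struct.pack(RECORD_FORMAT, key, name.encode("ascii")))


def decode_record(data):
    key, name = struct.unpack(RECORD_FORMAT, data)
    return key, name.rstrip(b"\x00").decode("ascii")


def read_sequential(f, n):
    """Sequential access: read from the start until record n is reached."""
    f.seek(0)
    reads = 0
    for _ in range(n + 1):
        data = f.read(RECORD_SIZE)
        reads += 1
    return decode_record(data), reads


def read_direct(f, n):
    """Random access: jump straight to record n using its byte offset."""
    f.seek(n * RECORD_SIZE)
    return decode_record(f.read(RECORD_SIZE)), 1


f = io.BytesIO()  # in-memory stand-in for a file on disk
write_records(f, [(1, "Track 1"), (2, "Track 2"), (3, "Track 3"), (4, "Track 4")])
print(read_sequential(f, 3))  # ((4, 'Track 4'), 4): four records read
print(read_direct(f, 3))      # ((4, 'Track 4'), 1): one record read
```

Both calls return the same record; the difference is how many records had to be read to reach it, which grows with the record's position only in the sequential case.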

Random Access File • Data is stored in no particular order. • A “hashing algorithm” is performed on the key field of the record to be stored or retrieved. • This results in a number (called the hashed location) which is used as the address to store or retrieve the record. • How this is done will be explained next.
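A minimal Python sketch of storing and retrieving a record at its hashed location, assuming a file of 100 fixed-length records. The layout, the "key 0 means empty" convention and the function names are hypothetical, and `hashed_location` is only a placeholder (remainder after dividing by 100) for whichever hashing algorithm is used. The key point is that the same calculation is done when storing and when retrieving, so the record can be found again without reading the rest of the file.

```python
import io
import struct

# Hypothetical layout: 4-byte key + 12-byte data field; key 0 marks an empty location.
RECORD_FORMAT = "<i12s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)
NUM_LOCATIONS = 100  # assumed capacity: locations 0 to 99


def hashed_location(key):
    # Placeholder hashing algorithm (remainder after dividing by 100);
    # any function mapping a key to 0..NUM_LOCATIONS-1 would do.
    return key % NUM_LOCATIONS


def create_file():
    """Pre-fill every location with an empty record so any offset can be written."""
    return io.BytesIO(struct.pack(RECORD_FORMAT, 0, b"") * NUM_LOCATIONS)


def store(f, key, data):
    f.seek(hashed_location(key) * RECORD_SIZE)  # jump to the hashed location
    f.write(struct.pack(RECORD_FORMAT, key, data.encode("ascii")))


def retrieve(f, key):
    f.seek(hashed_location(key) * RECORD_SIZE)  # same calculation finds it again
    stored_key, data = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
    if stored_key != key:
        return None  # location is empty or holds a different record
    return data.rstrip(b"\x00").decode("ascii")


f = create_file()
store(f, 1537, "Smith")
print(retrieve(f, 1537))  # Smith
print(retrieve(f, 1538))  # None
```

This sketch does no clash handling: storing a second key with the same hashed location would simply overwrite the first record.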

Hashing using Modular Arithmetic • Maximum of 100 items of data, with a four-digit key: • 1537 / 100 = 15, remainder 37 • 1537 will be stored at location 37 • Same key for approximately 200 items of data, multiplying the remainder by 3: • 37 * 3 = 111 • 1537 will be stored at the hashed location 111
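A minimal Python sketch of the two calculations above; the function names and the `divisor` and `multiplier` parameters are illustrative assumptions. Multiplying by 3 maps the remainders 0 to 99 onto 0, 3, 6 and so on up to 297, so there are still only 100 distinct hashed locations: keys that share a remainder still share a location.

```python
def modulo_hash(key, divisor=100):
    """Hashed location = remainder after dividing the key by the divisor."""
    return key % divisor


def scaled_modulo_hash(key, divisor=100, multiplier=3):
    """Variant for about 200 items: multiply the remainder by 3."""
    return modulo_hash(key, divisor) * multiplier


print(divmod(1537, 100))         # (15, 37): quotient and remainder
print(modulo_hash(1537))         # 37
print(scaled_modulo_hash(1537))  # 111
```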

Hashing using Folding • The number 847377 could be split into 847 and 377. • If you add them together you get 1224. • For a maximum of 100 items of data, you would take the last two digits: 24 • 847377 will be stored at location 24 • Same number for approximately 200 items of data: • (24 * 3 = 72) • 847377 will be stored at the hashed location 72
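A minimal Python sketch of folding, assuming the key's digits are split into groups of three from the left and that "take the last two digits" means the sum modulo 100. The function name and parameters are hypothetical.

```python
def fold_hash(key, part_length=3, keep_digits=2):
    """Folding: split the key's digits into parts, add the parts,
    then keep only the last `keep_digits` digits of the sum."""
    digits = str(key)
    parts = [int(digits[i:i + part_length])
             for i in range(0, len(digits), part_length)]
    total = sum(parts)                   # 847 + 377 = 1224
    return total % (10 ** keep_digits)   # last two digits: 24


print(fold_hash(847377))      # 24
print(fold_hash(847377) * 3)  # 72, the variant for about 200 items
```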

Clashes / Collisions • Some ID numbers will clash, i.e. hash to the same address, for example 1537 and 1837: • 1537 / 100 = … remainder 37 • 1837 / 100 = … remainder 37
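A minimal Python sketch that finds clashes by grouping keys on their hashed location, assuming the remainder-after-dividing-by-100 algorithm. The function name is hypothetical, and the keys 2041 and 3090 are made-up extra examples that do not clash.

```python
from collections import defaultdict


def find_clashes(keys, divisor=100):
    """Group keys by hashed location; any location holding more than one key is a clash."""
    by_location = defaultdict(list)
    for key in keys:
        by_location[key % divisor].append(key)
    return {loc: ks for loc, ks in by_location.items() if len(ks) > 1}


print(find_clashes([1537, 1837, 2041, 3090]))  # {37: [1537, 1837]}
```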

Overcoming the problem of clashes / collisions:

1. Search serially • Search serially from the hashed location until an empty location is found. • Then insert the clashed record into this empty location.

[Diagram: starting at the hashed location, memory is searched for the next free location and the clashing record is inserted there.] Note: When trying to find the clashing record again, its location is unknown (the computer only knows that it is somewhere after the hashed location).
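A minimal Python sketch of searching serially, assuming a 100-location table, the remainder-after-dividing-by-100 algorithm, and (an assumption not stated on the slide) that the search wraps round to location 0 after the last location. Function names and record data are hypothetical. Because the exact location is unknown, finding a record means repeating the serial search from the hashed location; reaching an empty location shows the record is not stored, which only holds if records are never deleted.

```python
TABLE_SIZE = 100
EMPTY = None


def hashed_location(key):
    return key % TABLE_SIZE  # assumed hashing algorithm


def insert_serial(table, key, record):
    """Store at the hashed location, or at the next free location after it."""
    start = hashed_location(key)
    for step in range(len(table)):
        loc = (start + step) % len(table)  # assumption: wrap round at the end
        if table[loc] is EMPTY:
            table[loc] = (key, record)
            return loc
    raise RuntimeError("table is full")


def find_serial(table, key):
    """Search serially from the hashed location; stop at an empty location."""
    start = hashed_location(key)
    for step in range(len(table)):
        loc = (start + step) % len(table)
        if table[loc] is EMPTY:
            return None  # the record would have been placed here
        if table[loc][0] == key:
            return loc
    return None


table = [EMPTY] * TABLE_SIZE
print(insert_serial(table, 1537, "Smith"))  # 37
print(insert_serial(table, 1837, "Jones"))  # 38: clash, next free location
print(find_serial(table, 1837))             # 38
```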

2. Memory bucket / Overflow Area • Reserve an “overflow area” of memory, or “memory bucket”, to hold clashing records in serial form (one after the other). • Create a pointer from the hashed location to this “memory bucket” or “overflow area”.

[Diagram: a pointer from the hashed location leads to the memory bucket / overflow area, where the clashing record is inserted serially (one after the other) at the next free location.] Note: When trying to find the clashing record again, its exact location is unknown (the computer only knows that it is somewhere in the “memory bucket”).
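A minimal Python sketch of an overflow area, assuming a 100-location main area, the remainder-after-dividing-by-100 algorithm and a single shared bucket. The class and attribute names are hypothetical, and the "pointer" is modelled as the index of the start of the bucket. Finding a clashed record means following the pointer and then searching the bucket serially.

```python
TABLE_SIZE = 100


def hashed_location(key):
    return key % TABLE_SIZE  # assumed hashing algorithm


class OverflowStore:
    """Main area addressed by hashed location, plus a serial overflow area."""

    def __init__(self):
        self.main = [None] * TABLE_SIZE     # one (key, record) per hashed location
        self.pointer = [None] * TABLE_SIZE  # pointer from a hashed location to the bucket
        self.bucket = []                    # clashing records, one after the other

    def insert(self, key, record):
        loc = hashed_location(key)
        if self.main[loc] is None:
            self.main[loc] = (key, record)
        else:
            self.bucket.append((key, record))  # next free location in the bucket
            self.pointer[loc] = 0              # point to the start of the bucket

    def find(self, key):
        loc = hashed_location(key)
        if self.main[loc] is not None and self.main[loc][0] == key:
            return self.main[loc][1]
        if self.pointer[loc] is not None:
            # Exact position unknown: search the bucket serially.
            for stored_key, record in self.bucket[self.pointer[loc]:]:
                if stored_key == key:
                    return record
        return None


store = OverflowStore()
store.insert(1537, "Smith")
store.insert(1837, "Jones")  # clashes at location 37, goes to the bucket
print(store.find(1837))      # Jones
print(store.find(1937))      # None
```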

3. Linked List • Use the hashed location as the start of a linked list: search serially through memory from this hashed location for the next free location and store the clashed record there. • At the hashed location, add a pointer to the new location used above. • Create a null pointer in the new location to signify the end of the list. • Subsequent clashes will simply extend this linked list.

[Diagram: the clashing record is inserted at the next free location found by a serial search; a pointer from the hashed location leads to it, and a null pointer (XX) in that location marks the end of the list.] Note: Subsequent clashes will simply extend this linked list. When trying to find the clashing record again, its exact location is known using this method.
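A minimal Python sketch of the linked-list method, assuming a 100-location table, the remainder-after-dividing-by-100 algorithm and a serial search that wraps round at the end (an assumption). Keeping the lists inside the same memory like this resembles the technique usually called coalesced hashing. Each location holds `[key, record, next]`, with `None` as the null pointer; function names and record data are hypothetical. Finding a record follows pointers from the hashed location instead of searching serially.

```python
TABLE_SIZE = 100
table = [None] * TABLE_SIZE  # each used location: [key, record, next_pointer]


def hashed_location(key):
    return key % TABLE_SIZE  # assumed hashing algorithm


def next_free(start):
    """Search serially (wrapping round) for the next free location after start."""
    for step in range(1, TABLE_SIZE):
        loc = (start + step) % TABLE_SIZE
        if table[loc] is None:
            return loc
    raise RuntimeError("table is full")


def insert_linked(key, record):
    loc = hashed_location(key)
    if table[loc] is None:
        table[loc] = [key, record, None]  # no clash: store at hashed location
        return loc
    while table[loc][2] is not None:      # follow the list to its last entry
        loc = table[loc][2]
    free = next_free(hashed_location(key))
    table[free] = [key, record, None]     # new end of list: null pointer
    table[loc][2] = free                  # link previous end to the new location
    return free


def find_linked(key):
    loc = hashed_location(key)
    if table[loc] is None:
        return None
    while loc is not None:                # follow pointers until the null pointer
        if table[loc][0] == key:
            return loc
        loc = table[loc][2]
    return None


print(insert_linked(1537, "Smith"))  # 37
print(insert_linked(1837, "Jones"))  # 38
print(insert_linked(1937, "Patel"))  # 39: list is now 37 -> 38 -> 39
print(find_linked(1937))             # 39
```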

In Summary: • Records in a random access file are accessed using a hashing algorithm by: • Reading the key field. • Applying a hashing algorithm to the key field to give the address of the data. • Looking for data at that address (whilst being aware of problems caused by clashes).

Plenary • Explain Random Access Searches.

Plenary • Random Access • The data being searched for is used to give the address where it is stored.

Plenary • What is the purpose and operation of Hashing Algorithms? • They allow the data being searched for in a random access file to be used to give the address where it is stored. • This is done by carrying out some arithmetic on the data that is being searched for.
