300 likes | 762 Views
VSAM. Virtual Storage Access Method Allows for interactive updates (adds, changes, and deletes). Three Types of VSAM files. KSDS Keyed Sequence Data Set RRDS Relative Record Data Set ESDS Entry Sequence Data Set. KSDS. Allows for users to access records sequentially or randomly
E N D
VSAM • Virtual Storage Access Method • Allows for interactive updates (adds, changes, and deletes)
Three Types of VSAM files • KSDS • Keyed Sequence Data Set • RRDS • Relative Record Data Set • ESDS • Entry Sequence Data Set
KSDS • Allows for users to access records sequentially or randomly • Includes an index and data components • A CLUSTER consists of both the index and data components together • Index relates a value in a key field to the actual location of the record on disk • Index is only used in random (or dynamic) processing
VSAM Data Component Storage Concepts • Control Interval (CI) • Fixed amount of storage; Must be multiple of 512 • Usually 2048 or 4096 bytes • Data records are stored in a CI • A CI of 4096 bytes can store forty 100-byte records • Control Areas (CA) • One cylinder in size • Size of a cylinder varies by disk drive • Contains numerous CIs
Approximate number of records in a CI (CI size - 10) / record length
Records in a CI • Calculate the number of records in a CI formula on previous slide • Determine the percentage of records in a CI to be added or insert in an average CI • Concept of FreeSpace • Space where records can be added “on the fly”
CI in a CA • A fixed number based on CI size and disk drive • For disk drive with ACD001: • CI of 2048: 315 CI in a CA • CI of 4096: 180 CI in a CA • Determine the percent of the CI to be left open for additions • Known as CA freespace
Freespace in a CI • Is used to add records which belong in the CI • Records in a CI are shifted automatically within the CI to accommodate the inserted record
CI Splits • If there is no freespace in the CI in which the record is to be inserted: • CI split • Half of the records in the CI are moved to a free CI in the same CA • The inserted record is then inserted in the proper CI • These happen routinely and are accomplished quickly
CA splits • If a CI needs to split and there is no free CI in the CA: CA split • Half of the CI are moved to a free CA (usually at the end of the file, there is unused space) • Therefore, each of the two CI have 50% free space • The original CI can now split • CA splits have much overhead and should be avoided!!!
Avoid CA splits • Reorganization of files • Import: Copy the VSAM file to a sequential dataset • Export: Delete and reload the VSAM file from the sequential data set: Resets the freespace as in the define cluster • Allocate adequate freespace • Analyze the primary key
Primary Key Analysis • Is the pattern in the PK • Example: The Financial Aid office must keep three years of data on-line. • Previous year: State reporting • Current year: Distributing aid to students • Next year: Granting/guaranteeing aid for next year
Primary Key Analysis • Key options for the financial aid file • 1-digit year + SSN • Little on-line activity on first third of file • Most “adds” are in last third: CI/CS split likely • SSN + 1 digit year • Activity is spread evenly throughout the file • Recommended
VSAM Index Component • Every KSDS as an index component for each primary key AND each foreign key • Base cluster: Primary key index and Data components • Foreign key index is known as the alternate index
Primary Key Index and Data • Two parts of the index: • Sequence set • Index set
Primary Key Index and Data • Sequence Set • Lowest level of the index component • Contains information that relates key values to a specific CI • Links the highest PK in a CI to the address of that CI • Stores all the key-address pairs for the Cis in a CA in one CI of the sequence set • There is a separate CI in the sequence set for each CA in the data
Primary Key Index and Data • Index Set • Highest level of the index component • Key-address pairs stored in one CI (can be stored in main memory for processing efficiency) • Address Pointer links to address of the appropriate sequence set • Based on size of data component (number of cylinders or CA needed to store data component), you may need a intermediate index
Alternate Index and Data Component • Alternate index relates alternate key to primary key • Then uses the primary key index to locate the data • Can have unique or non-unique AI • Figure 2-4 (unique) • Figure 2-5 (non-unique)
Alternate Index • Systems analyst determines whether to update the AI each time the cluster is changed • Can cause much overhead to update all indices especially in the case of a CA split • Other alternative is to re-build the index periodically (every night)
Relative Record Data Set (RRDS) • Lets the user access each record at random without the overhead of maintaining an index • Instead each record in a RRDS is numbered, starting with 1 for the first record • RRDS consists of a specified number of areas or slots • Known as the relative record number (RRN)
RRDS • May need a routine to convert the PK of a record to a relative record number • Hashing • Most common hashing routine: The remainder option on the divide • Can cause empty slots. Can waste storage; But we avoid CI/CA splits • Difficult to HASH if PK is non-numeric
RRDS • Collision • If hashing routine results is same RRN for two different PK • Must set secondary searching technique in case of collisions • Usually linear probing: check the next record up to a maximum number of tries. • Needed to know if record to be added already exists • Needed to determine if record to be retrieved exists without reading entire file
RRDS • Advantages • No index overhead • Direct relationship between data and location of the data • Permits both random and sequential processing • If good hashing routine with minimal collisions: performance efficiency is excellent
RRDS • Disadvantages • Storage efficiency • Collisions • Difficulty in determining good hashing technique • Difficulty with alphabetic key • Does NOT support the concept of FK or AI • Not widely used
ESDS • Entry Sequenced Data Set • The simplest type of VSAM file • Records are stored sequentially at time of entry
ESDS • Similar to sequential processing • Does allow to OPEN EXTEND to add records to the end of the file
ESDS • Author does not recommend it • Says it is restricted to sequential processing • BUT you can build an AI for a FK • BUT as of my current COBOL manuals, COBOL cannot use the AI and processing must be sequential. • This is my current topic for career day questions