250 likes | 431 Views
11 . Indexing and Hashing. 서울시립대학교 전자전기컴퓨터공학부 김한준. Indexing and Hashing. Used for reducing access (search) time ordered index hash index Criteria 1. access type 2. access time 3. insertion time 4. deletion time 5. space overhead Search key
E N D
11. Indexing and Hashing 서울시립대학교 전자전기컴퓨터공학부 김한준
Indexing and Hashing • Used for reducing access (search) time • ordered index • hash index • Criteria 1. access type 2. access time 3. insertion time 4. deletion time 5. space overhead • Search key • the attribute or set of attributes used to look up records in a file ( table)
Ordered Index • Primary Index • the index whose search key specifies the sequential order of the file • dense index / sparse index • multilevel index • secondary Index • primary 이외의 index • need buckets (pointers to records with same key value) • for sparse index • for sequential scan
Brighton A-217 750 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perryridge A-102 400 Perryridge A-201 900 Perryridge A-218 700 Redwood A-222 700 Round Hill A-305 350 Brighton Downtown Mianus Perryridge Redwood Round Hill Dense Index
Brighton Mianus Redwood Brighton A-217 750 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perryridge A-102 400 Perryridge A-201 900 Perryridge A-218 700 Redwood A-222 700 Round Hill A-305 350 Sparse Index
• • • • • • • • • 데이터 블록 0 인덱스 블록 0 인덱스 블록 1 • • • 외부 인덱스 내부 인덱스 Multilevel Index 데이터 블록 1
Brighton A-217 750 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perryridge A-102 400 Perryridge A-201 900 Perryridge A-218 700 Redwood A-222 700 Round Hill A-305 350 350 400 500 600 700 750 900 Secondary Index
root leaf B+ - Tree Index Files • balanced • at least half full • B+의 한node는 한 page (block)
B+tree의 성질 • Root로 부터 leaf 까지의 모든 path는 같은 length를 가짐 • Root 또는 leaf가 아닌 각 node는 2/n ~ n의 child node를 가짐 • Leaf node는 (n-1)/2 ~ n-1 값을 갖는다. • Root가 leaf가 아니면, 최소한 2개의 child를 가짐. • Root가가 lead이면, 0 ~ (n-1) 값을 가질 수 있다.
Perryridge Mianus Redwood Perryridge Mianus Redwood Round Hill Brighton Downtown B+tree의 예 (1) account 파일에 대한 B+ - 트리 (n = 3)
Perryridge Brighton Downtown Mianus Perryridge Redwood Round Hill B+tree의 예 (2) • leaf node는 2 ~ 4개의 값을 가짐. • n = 5 (5-1)/2 ~ 5-1. • root 이외의 leaf이 아닌 node는 3 ~ 5개의 child를 가짐. • (n = 5이므로 5/2 ~ 5). • root는 적어도 2개의 child를 가짐 account 파일에 대한 B+ - 트리 (n = 5)
Perryridge Perryridge Mianus Redwood Downtown Mianus Redwood Perryridge Mianus Mianus Redwood Round Hill Brighton Downtown Brighton Clearview Redwood Round Hill Downtown Perryridge B+tree의 insert “Clearview” insert 전/후
Perryridge Mianus Redwood Perryridge Downtown Redwood Downtown Mianus Perryridge Mianus Redwood Round Hill Redwood Round Hill Brighton Clearview Brighton Clearview B+tree의 delete account에서 “Downtown”을 삭제 “Downtown” 대신 “Perryridge”의 삭제
Hashing • h(k) ⇒ B • A good hash function • has uniform distribution (among expected values) • has random distribution (among expected values) • Static hash functions • hash function이 구축 시 결정됨 • Dynamic hash functions • hash function이 data insert/delete 하면서 변화 • ex) Extendible hashing, bucket address table
버켓 0 버켓 5 버켓 1 버켓 6 Brighton A-217 750 Round Hill A-305 350 Perryridge A-102 400 Perryridge A-201 900 Perryridge A-218 700 Downtown A-102 400 Downtown A-110 6900 버켓 2 버켓 7 Mianus A-215 700 버켓 3 버켓 8 버켓 4 버켓 9 Redwood A-222 700 Hash file의 예
버켓 0 Brighton A-217 750 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perryridge A-102 400 Perryridge A-201 900 Perryridge A-218 700 Redwood A-222 700 Round Hill A-305 350 버켓 1 A-215 A-305 A-101 A-110 A-217 A-102 A-218 A-222 A-201 버켓 2 버켓 3 버켓 4 버켓 5 버켓 6 Hash Index
i1 i • • 00 bucket 1 • • 01 i2 • • 10 • • 11 bucket 2 i3 bucket 3 Dynamic Hashing • i1= i – 1. i2 = i3 = i Hash prefix • • • Bucket address table • • •
Hash prefix 0 0 Bucket address table bucket 1 Initial extendable hash structure (bucket size = 2 ) Dynamic Hashing: hash function branch-name h( branch-name) Brighton 0010 1101 1111 1011 0010 1100 0011 0000 Downtown 1010 0011 1010 0000 1100 0110 1001 1111 Mianus 1100 0111 1110 1101 1011 1111 0011 1010 Perryridge 1111 0001 0010 0100 1001 0011 0110 1101 Redwood 0011 0101 1010 0110 1100 1001 1110 1011 Round Hill 1101 1000 0011 1111 1001 1100 0000 0001
1 Brighton A-217 750 2 2 Downtown A-101 500 Downtown A-110 600 3 Mianus A-215 700 Dynamic Hashing Hash prefix Bucket address table Hash structure after four insertions
1 Brighton A-217 750 Redwood A-222 700 3 2 Downtown A-101 500 Downtown A-110 600 3 Mianus A-215 700 Round Hill A-305 350 Perryridge A-102 400 Perryridge A-201 900 Perryridge A-218 700 3 3 Dynamic Hashing Hash prefix Bucket address table Hash structure after seven insertions
Hashing vs. Ordered Indexing • Characteristics • hash는 range query 불가 select * from student where name = ‘홍길동’ : hashing select * from student where name > ‘홍길동’ : indexing • fast access (hash) • Insertion time • hash는 초기에는 빠르나, 갈수록 collision등으로 느려짐 • B+ tree : worst, average time 보장 • hash는 inherently secondary index • few DBMS support hashing for user data • hashing is used mainly for internal use
Index Definition in SQL • not part of SQL standard • not required for correctness • not required for expressiveness • Most DBMS provide index definition syntax create index <index-name> on <relation-name> (<attr-list>)
Multi-key Access 가정: branch_name, balance에 index설정 where branch-name = “Perryridge”andbalance = 1000 각 index 사용 비효율적 where branch-name = “Perryridge”and balance < 1000 where branch-name <“Perryridge” and balance = 1000
Bi 4 3 2 1 4 3 2 1 0 Townsend Perryridge Mianus Central Bj Linear scale for branch_name 0 1 2 3 4 5 6 Grid Array Buckets 1K 2K 5K 10K 50K 100K 1 2 3 4 5 6 Grid file Linear scale for balance
Partitioned Hash • 예 : Partitioned hash function for key (customer-street, customer-city) Search-key value hash value (Main, Harrison) 101 111 (Main, Brooklyn) 101 001 (Park, Palo Alto) 010 010 (Spring, Brooklyn) 001 001 (Alma, Palo Alto) 110 010