1 / 31

Data Structures for Language Processing: Classification and Techniques

Explore linear and non-linear data structures based on nature, purpose, and lifetime in language processing. Learn about hash tables, search procedures, and memory management techniques such as rehashing and overflow chaining to optimize performance. Understand how collision handling methods like reference counts and garbage collection aid in efficient memory allocation.

marioni
Download Presentation

Data Structures for Language Processing: Classification and Techniques

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter - 2 Data strucuters for Language processing

  2. Classification 1. Based on nature ---- Linear and Non-linear eg :- Linear = array , stack etc. Non-Linear = Tree , Graph etc. 2. Based on Purpose --- Search or allocation eg :- Search = Binary search tree allocation = stacks,heaps 3.Based on Lifetime ---- whether used during Language Processing or during target program executions eg :- Lang. Processing = Object based data model Target program = Hash tables

  3. Search Data Structures

  4. Entry Format

  5. Fixed and Variable Length entries

  6. Generic Search Procedure

  7. Binary Search Organization • All entries in table satisfies the order relation • They keep entries according to order • Uses symbols like ‘<’ and ‘>’ • It uses algorithm to make entries represened in next slide

  8. Hash Tables

  9. H is the Hasing function. S is the symmbols for entry S(e) is current entry symbol

  10. Hashing Function • Hashing function is used to make search system faster. • It transforms the source symbol or group of symbols to numerical numbers to make faster comparisons and searching • Hashing do not change the original meaning of symbols it just transforms them to other form. • Size is pre decided for transforming message to particular format • If message is of less size than that size , it performs “folding” operation • In folding message is padded with 0’s to complete the size of it.

  11. Properties of good hashing func.

  12. examples of hash function 1. Multiplication of numbers h(s) will result some number by multiplying symbols with predefined number to generate various hash values 2. Division function , it will also divide the symbol by some pre defined number and convert the symbol to numeric value. 3.Exponential method , like all mathematical funciton this can also be used

  13. Collision in hashing Many function result into same number generation which leads to collision of numbers and searching will crash Thus to avoid collision we have varsious collision handling techniques 1. Rehasing technique 2. Overflow chaining technique

  14. Rehasing

  15. Overflow chaining • It avoids collision by creating new table for colliding entry in overflow table • Every entry contains a pointer for mapping with pointer poiting towards other entry in overflow table • Symbols associated with tables are chained together thus this technique is called chaining technique.

  16. Linked List

  17. Binary Tree • Each entry in tree is having two fields 1.left_pointer and 2. right_pointer • If s is symbol on left of tree then all the entries on left will be less then root node • And if s is on right then every S will be greater then root node/entry

  18. Nested Search

  19. Stacks

  20. Extended Stack model

  21. Heaps

  22. Memory managment • Due to repetition of allocation and deallocation of memory area holes are created in memory area. • Memory management takes care of this holes and reallocate this area by managing it properly • It increases performance and speed of allocation and deallocation of memory spaces

  23. Identifying free memory area Two popular techniques are :- 1. Reference counts 2.Garbage Colletion

  24. Reference count • It associates a reference count number with memory area to indicate number of active users • Number increases when user access memory and decreases when user releases memory area • Area is said to be free when number drops to zero

  25. Garbage collection • It makes two passes over memory to get unused areas • In first pass it traverse through all memory and marks every memory in use. • In second pass all unmarked areas are declared as free memory areas. • They are called every time system runs out of memory to allocate new memory area.

More Related