Final Presentation
Industrial project 234313
Automatic tagging tool for Hebrew Wiki pages
Supervisors: Dr. Miri Rabinovitz, Dr. Haim Mizrahi
Academic coordinator: Prof. Michael Elad
Students: Eyal Sharabi Horwitz, Shiran Cohen
Project Objectives
• This project is part of the overall development of an organizational Wiki for sharing information within the organization.
• The project’s objective is an automatic tagging tool for key phrases, based on an organizational taxonomy. It is composed of two separate modules: a service module and a GUI module.
• Objectives of the service module:
  • Identify key phrases that relate to the organizational taxonomy in unstructured text.
  • Develop and implement algorithms to identify and extract new key phrases from a given document.
Project Objectives – cont.
• Objectives of the service module – cont.:
  • Present the findings in an Excel file to allow future analysis of the key phrases found by the automatic tagging tool.
• Objectives of the GUI module:
  • Design an interface that enables the user to analyze the key phrases found by the automatic tagging tool:
    • Insert a new key phrase into the taxonomy.
    • Delete a key phrase suggested by the automatic tagging tool.
    • Edit the text of a suggested key phrase before adding it to the taxonomy.
    • Present the rationale that led the service module to find a key phrase.
    • Allow the user to add new key phrases to the taxonomy.
Methodology
• In-depth understanding of the morphologically analyzed documents and taxonomy, and use of this information in the different tagging algorithms.
• A literature survey was used to develop algorithms that suggest new key phrases to the user from a given document:
  • Frequency-based tagging algorithm – checks how frequently a key phrase appears in a given document and in the whole corpus.
  • Location-based tagging algorithm – scores a key phrase based on its distance from the beginning and end of the document and its life span in the document.
  • Noun tagging algorithm – gives a higher score to key phrases with multiple nouns.
• Microsoft’s .NET WinForms API was used to create the GUI.
• An Access DB was used to store the information about the key phrases used by the different algorithms, and to store the updated taxonomy.
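The three scoring ideas above can be illustrated with a minimal Python sketch. This is not the project's .NET implementation: the function names, the TF-IDF-style weighting, the way edge distance and life span are combined, the `"NOUN"` tag label and the `cap` parameter are all assumptions made for illustration. Documents are assumed to be lists of tokens, and part-of-speech tags are assumed to come from a morphology analyzer.

```python
import math

def find_positions(phrase, tokens):
    """Start index of every occurrence of `phrase` (a token list) in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]

def frequency_score(phrase, doc, corpus):
    """Assumed TF-IDF-style score: frequent in this document, rare in the corpus."""
    tf = len(find_positions(phrase, doc)) / max(len(doc), 1)
    docs_with_phrase = sum(1 for d in corpus if find_positions(phrase, d))
    idf = math.log((1 + len(corpus)) / (1 + docs_with_phrase)) + 1
    return tf * idf

def location_score(phrase, doc):
    """Assumed combination: closer to either edge and a longer life span score higher."""
    positions = find_positions(phrase, doc)
    if not positions:
        return 0.0
    last = max(len(doc) - 1, 1)
    first, final = positions[0], positions[-1]
    edge_distance = min(first, last - final) / last  # 0.0 at the very start or end
    life_span = (final - first) / last               # spread between first and last hit
    return (1 - edge_distance + life_span) / 2

def noun_score(pos_tags, cap=3):
    """More nouns give a higher score, saturating at `cap` (a hypothetical parameter)."""
    return min(pos_tags.count("NOUN"), cap) / cap

doc = ["a", "b", "c", "a", "b"]
corpus = [doc, ["c", "d"]]
print(frequency_score(["a", "b"], doc, corpus))
print(location_score(["a", "b"], doc))
print(noun_score(["NOUN", "NOUN"]))
```

Each function returns a value that a later step can weight and combine, which keeps the algorithms independent of one another.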
Achievements
• The service module
  • Implemented an algorithm for identifying taxonomy key phrases in a given text, using an advanced screening process for similar key phrases.
  • Implemented several tagging algorithms used to suggest new key phrases to the user:
    • Frequency-, location- and noun-based tagging (presented in the Methodology section).
    • Foreign-language tagging – tagging the foreign-language phrases in the text.
  • Flexibility:
    • GUI-process separation to allow portability and use with various systems.
    • Expansion of the taxonomy to effectively unlimited size.
    • New tagging algorithms can easily be added to the process.
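As a rough illustration of two of the service-module features, the Python sketch below matches taxonomy phrases in a text and extracts foreign-language phrases. The slides do not describe how the screening of similar key phrases works, so the longest-match-wins rule here is an assumption, as are the names `match_taxonomy`, `foreign_phrases` and `LATIN_RUN`. Foreign phrases are assumed to be runs of Latin-script words inside Hebrew text. Plain substring matching also ignores Hebrew prefixes and inflection, which the real tool handles through morphological analysis.

```python
import re

# Assumption: a foreign phrase is a run of Latin-script words separated by spaces.
LATIN_RUN = re.compile(r"[A-Za-z][A-Za-z0-9'-]*(?:\s+[A-Za-z][A-Za-z0-9'-]*)*")

def foreign_phrases(text):
    """Return every run of Latin-script words found in `text`."""
    return [m.group(0) for m in LATIN_RUN.finditer(text)]

def match_taxonomy(text, taxonomy_phrases):
    """Find taxonomy phrases in `text`; overlapping hits are screened longest-first."""
    hits = []
    for phrase in taxonomy_phrases:
        for m in re.finditer(re.escape(phrase), text):
            hits.append((m.start(), m.end(), phrase))
    # Longer matches are considered first, so a shorter similar phrase that
    # overlaps an already accepted match is dropped.
    hits.sort(key=lambda h: (-(h[1] - h[0]), h[0]))
    kept = []
    for start, end, phrase in hits:
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, phrase))
    return sorted(kept)

print(match_taxonomy("data base management", ["data base", "data base management"]))
print(foreign_phrases("מערכת Knowledge Management חדשה של SAP"))
```

Returning character offsets alongside each phrase makes it possible for a GUI to highlight the match in the document.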
Achievements – cont.
• The GUI
  • An interface was created that enables the user to analyze the key phrases found by the automatic tagging tool:
    • Insert a new key phrase into the taxonomy, either under an existing main subject and secondary subject in the taxonomy hierarchy or under new ones.
    • Delete a key phrase.
    • Edit the text of a suggested key phrase before adding it to the taxonomy.
    • Present the rationale that led the service module to find a key phrase.
    • Allow the user to add new key phrases to the taxonomy by marking the desired text in the document.
Achievements – cont.
• The GUI – cont.
  • Algorithm selection window:
    • Used to select which algorithms are run to find new key phrases in a given text.
    • Allows the user to manage the parameters of the different algorithms, to weight the algorithms differently, and to weight phrases by size so that the tagging process prefers phrases of a certain size.
  • Saving the findings for future use and analysis:
    • Enables the user to save the current taxonomy to the DB for future use with other documents.
    • Enables the user to save the current taxonomy and the new key phrases found by the automatic tagging tool to an Excel file for future analysis.
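The weighting described for the algorithm selection window could work roughly as in the Python sketch below. The slides do not give the actual formula, so the weighted average, the multiplicative length weight, the `threshold` parameter and the names `combined_score` and `suggest` are all assumptions for illustration.

```python
def combined_score(scores, algo_weights, length_weights, phrase_len):
    """Weighted average of per-algorithm scores, scaled by a phrase-length weight."""
    # Algorithms with no weight (or weight 0) are treated as switched off.
    active = {name: s for name, s in scores.items() if algo_weights.get(name, 0.0) > 0}
    total_weight = sum(algo_weights[name] for name in active)
    if total_weight == 0:
        return 0.0
    base = sum(algo_weights[name] * s for name, s in active.items()) / total_weight
    # Lengths without an explicit weight are left unchanged (weight 1.0).
    return base * length_weights.get(phrase_len, 1.0)

def suggest(candidates, algo_weights, length_weights, threshold=0.5):
    """Return candidate phrases whose combined score reaches `threshold`, best first."""
    scored = [
        (combined_score(scores, algo_weights, length_weights, len(phrase.split())), phrase)
        for phrase, scores in candidates.items()
    ]
    return [phrase for score, phrase in sorted(scored, reverse=True) if score >= threshold]

candidates = {
    "knowledge management": {"frequency": 0.5, "location": 1.0},
    "the": {"frequency": 0.9, "location": 0.1},
}
algo_weights = {"frequency": 1.0, "location": 3.0}
length_weights = {2: 2.0}  # prefer two-word phrases
print(suggest(candidates, algo_weights, length_weights))
```

Keeping the weights in plain dictionaries means a new algorithm only needs a new key, which fits the goal of adding tagging algorithms easily.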
Achievements – cont.
• Documentation provided:
  • User’s manual
  • Developers’ guide
  • Inline documentation of the code
Example of the tagging process
A new document is loaded into the automatic tagging tool.
• Pressing File->Open loads a new file into the automatic tagging tool.
Pressing the “Initiate Tagging” button starts the tagging process. The results of the taxonomy-based tagging algorithms are presented here.
• The taxonomy key phrases found in the text are presented in the hierarchy in which they appear in the taxonomy.
• This key phrase was found by the automatic tagging tool as a taxonomy phrase.
• Initiate the tagging process.
The user can click the phrases that were found to see their location in the document and in the hierarchy of the organizational taxonomy.
• Clicking a key phrase highlights it in the text in red.
The user can choose which of the implemented tagging algorithms to run, and the weight each one carries in determining whether a phrase found in the document is presented as a new suggested key phrase.
• Choosing Algorithm->Algorithm selection lets the user pick the advanced tagging algorithms to run.
• The user can set each algorithm’s weight in the total score and can also give a higher weight to phrases of a certain length.
The new key phrases found by the automatic tagging tool are presented to the user, who can choose whether to approve or delete each suggested key phrase.
• The user can approve or delete a new key phrase by right-clicking it.
If the user approves a key phrase, an editing window opens in which the user chooses where the new key phrase belongs in the taxonomy hierarchy.
• The user is asked to choose a main subject and a secondary subject for the new key phrase.
The user can save all the new findings and the updated taxonomy to an Excel file or to the DB for future use and analysis.
• The current taxonomy and the new key phrases approved by the user can be saved for future analysis.
Conclusions
• When developing a system, large or small, take the time to plan and create a high-level design rather than rushing to implement it.
• A considerable amount of time should be dedicated to fully understanding the morphology analyzer’s output.
• To optimize the system’s output, it should be tested on a large document corpus.
• This course taught us a great deal about working with different software tools, developing a large system and working in a team.
Points for improvement
• Choose a specific occurrence of a key phrase in the text based on a high number of key phrases surrounding it.
• Integrate algorithms that use advanced natural language processing tools for better understanding of the text.
• Add machine learning capabilities that let the system adjust the parameters of the different algorithms as it analyzes more documents.