AUTO SUMMARIZER AND RECTIFIER BASED ON STATISTICAL METHOD
PROPOSAL Auto summarization provides a concise summary of a document. Here I present a statistical approach to the text generation problem in domain-independent, single-document summarization. My thesis includes Salton's vector space model, which divides the sentences into categories and can also be used to summarize the contents of web pages.
HOW IT WORKS The summarizer first breaks the entire document into sentences based on the separators. In the second step, the unnecessary words (stop words) are removed from the document. After the stop words are removed, the document is scanned again for unique words. Words that share the same meaning, or are otherwise redundant variants of one another in the document, are reduced to a single form by a method called stemming.
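The three steps above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the class name `Preprocessor`, the separator set (`.`, `!`, `?`), the tiny in-code stop-word list, and the naive suffix-stripping `stem` method are all assumptions made for the example. A real system would load the stop words from a file and use a proper stemmer such as Porter's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class Preprocessor {
    // Hypothetical, very small stop-word list used only for this sketch.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "the", "is", "are", "was", "of", "in", "on", "and", "to", "it"));

    // Step 1: split on sentence separators (assumed here to be '.', '!' and '?').
    static List<String> splitSentences(String document) {
        List<String> sentences = new ArrayList<>();
        for (String part : document.split("[.!?]+")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) sentences.add(trimmed);
        }
        return sentences;
    }

    // Step 3: naive suffix stripping, a stand-in for a real stemming algorithm.
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Steps 2 and 3: lowercase, tokenize, drop stop words, stem what remains.
    static List<String> terms(String sentence) {
        List<String> result = new ArrayList<>();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            result.add(stem(token));
        }
        return result;
    }
}
```

For example, `Preprocessor.terms("The cats are sleeping")` drops "the" and "are" and reduces the remaining words to "cat" and "sleep".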
MECHANISM After stemming, the occurrences of each word are counted, and the results show how many times each word occurs and the number of sentences in which it occurs. This makes it possible to calculate the weight of every word that occurs in a sentence. From the weightage of each word, the total weight of a sentence can be calculated. The final step is the summarizing itself, in which the sentence with the highest weight is ranked number 1, followed by the remaining sentences of the document in decreasing order of weight.
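The counting step described above might look like the sketch below. It assumes the sentences have already been reduced to lists of stemmed terms; the class name `TermCounts` and its field names `tf` and `df` are chosen for this example only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class TermCounts {
    // How many times each term occurs in the whole document.
    final Map<String, Integer> tf = new HashMap<>();
    // In how many sentences each term occurs at least once.
    final Map<String, Integer> df = new HashMap<>();

    TermCounts(List<List<String>> sentences) {
        for (List<String> sentence : sentences) {
            // Every occurrence counts towards the term frequency.
            for (String term : sentence) tf.merge(term, 1, Integer::sum);
            // Each sentence counts at most once per term, hence the set.
            for (String term : new HashSet<>(sentence)) df.merge(term, 1, Integer::sum);
        }
    }
}
```

With the two sentences `[cat, sleep, cat]` and `[cat, rain]`, the term "cat" occurs three times in total and appears in two sentences.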
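CALCULATION wg = tf * Math.log10(scnt / df) Here tf is the number of times the word occurs, df is the number of sentences in which it occurs, and scnt appears to be the total number of sentences in the document. The stop words are listed separately in a text file so that they can be parsed easily and removed before the rest of the summarization process. Sentences are extracted in the order of their importance until the summary reaches the required length. Although there are various methods of summarizing, and this may be an existing mechanism, the novelty of the proposal lies in the Rectifier part.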
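A possible reading of the formula and the extraction step is sketched below. Several points are assumptions rather than statements from the slides: `scnt` is taken to be the total sentence count, a sentence's weight is taken to be the sum of the weights of its term occurrences, and the "required length" is expressed as a number of sentences. The class name `SentenceRanker` is hypothetical. One detail worth noting is that `scnt / df` with two `int` values would be integer division in Java, so the sketch casts to `double` first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class SentenceRanker {
    // wg = tf * log10(scnt / df); the cast avoids integer division.
    static double weight(int tf, int scnt, int df) {
        return tf * Math.log10((double) scnt / df);
    }

    // Returns sentence indices, heaviest first, cut off at maxSentences.
    static List<Integer> rank(List<List<String>> sentences, int maxSentences) {
        Map<String, Integer> tf = new HashMap<>();
        Map<String, Integer> df = new HashMap<>();
        for (List<String> s : sentences) {
            for (String t : s) tf.merge(t, 1, Integer::sum);
            for (String t : new HashSet<>(s)) df.merge(t, 1, Integer::sum);
        }
        int scnt = sentences.size();
        double[] score = new double[scnt];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scnt; i++) {
            // Assumed: sentence weight is the sum of its term weights.
            for (String t : sentences.get(i)) {
                score[i] += weight(tf.get(t), scnt, df.get(t));
            }
            order.add(i);
        }
        // Stable sort, so equally weighted sentences keep document order.
        order.sort((a, b) -> Double.compare(score[b], score[a]));
        return new ArrayList<>(order.subList(0, Math.min(maxSentences, scnt)));
    }
}
```

Under this formula a word that occurs in every sentence gets weight zero, because log10(scnt / df) is log10(1), so very common words contribute nothing to the ranking.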
CALCULATION: SOME FORMULAS FOR SUMMARIZING. INITIAL CALCULATIONS: $d_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$ and $q = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$. In this method, documents and queries are represented as vectors. To compare similarity and relevance, the cosine of the angle between the two vectors can be used: $\cos\theta = \dfrac{d_j \cdot q}{\lVert d_j \rVert \, \lVert q \rVert}$. A cosine value of zero means that the query and document vectors are orthogonal and have no match.
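The cosine measure can be computed directly from two weight vectors, as in this small sketch. The class name `Cosine` is hypothetical, both vectors are assumed to be indexed over the same t terms, and returning 0 for an all-zero vector is a convention chosen here, not something the slides specify.

```java
class Cosine {
    // cos(theta) = (d . q) / (||d|| * ||q||)
    static double similarity(double[] d, double[] q) {
        if (d.length != q.length) {
            throw new IllegalArgumentException("vectors must share the same term space");
        }
        double dot = 0, dNorm = 0, qNorm = 0;
        for (int i = 0; i < d.length; i++) {
            dot += d[i] * q[i];
            dNorm += d[i] * d[i];
            qNorm += q[i] * q[i];
        }
        // Assumed convention: an all-zero vector matches nothing.
        if (dNorm == 0 || qNorm == 0) return 0.0;
        return dot / (Math.sqrt(dNorm) * Math.sqrt(qNorm));
    }
}
```

Two vectors with no terms in common, such as (1, 0) and (0, 1), give a value of zero, while vectors pointing in the same direction, such as (1, 2, 3) and (2, 4, 6), give a value of one up to floating-point rounding.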
RECTIFICATION Summarizing a normal document may be an easy task, but checking the UI of the book or document beforehand is an extra feature added in this application. The Mobi and Topaz teams at Amazon follow a manual method of testing the UI of content. Even though the entire book may be checked from cover to end, there is a possibility of human error. There is no option for testing content that is corrupted, and if content with a corrupted UI happens to reach the customer, it has a heavy negative impact.
PROPOSAL FOR RECTIFICATION To handle corrupted documents, or documents with a corrupted UI, a preprocessor must be developed, or a syntactic parser can be used to increase the robustness of the system. The application can be used not only on a PC for automation but can also be installed on the Kindle to check customer-side scenarios related to the reader. Human misses can be avoided, and the time taken to correct a book is reduced by almost 60-75%, depending on the content.
RECTIFICATION (CONTD) The advantage of using this tool is that the productivity of a resource increases by 50%. Even if heavy content is allocated to a resource, the deviation can be less than 10% at most. The number of content items handled per day can increase by 50%. This is a standalone application. No special requirements, such as a server or a separate platform, are needed to run it. Report generation can also be included for reporting performance issues. Customer-side complaints will be dramatically reduced, which encourages positive feedback from customers.
RECTIFICATION The main purpose of this application is smoke testing before content is qualified for further levels of testing. The Mobi and Topaz teams at Amazon follow only a manual testing process for qualifying content. This application can be used on both the Kindle and a PC for side-by-side testing to check UI-related issues and grammatical errors. No extra cost is needed to run this application. The deployment process is also very simple and portable.
QUESTIONS Q & A