13 Miscellaneous Functional Programming
More… • Computer languages ranking • http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all&lang2=sbcl • “Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot. This is the same argument you tend to hear for learning Latin. It won't get you a job, except perhaps as a classics professor, but it will improve your mind, and make you a better writer in languages you do want to use, like English.”
Introduction • Search for papers using the keywords “text summarization”, “summarizer”, … • “Summarizing based on concept counting and hierarchy analysis”, H. Ji, Z. Luo, M. Wan, and X. Gao, IEEE SMC • An effective English text summarization system • Concept extraction • Semantic analysis
Introduction • Methods of weighting sentences • Position information • Cue words • Word counting • Lexical chains • Structural information • Heuristic rules
Introduction • The word-counting-based Vector Space Model (VSM) is the leading method • Every sentence corresponds to a vector S(T1, W1; T2, W2; …; Tn, Wn) • Ti is a word in the text and Wi is the frequency of Ti in S • It misses the semantic relations between words
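The word-counting VSM above can be sketched in a few lines; the whitespace tokenization is a deliberate simplification, purely for illustration.

```python
from collections import Counter

def sentence_vector(sentence):
    """Word-counting VSM: represent a sentence as (Ti, Wi) pairs,
    where Ti is a word and Wi is its frequency in the sentence."""
    return dict(Counter(sentence.lower().split()))

vec = sentence_vector("the network connects the net to the system")
# "net" and "system" land in separate dimensions, so their semantic
# relation to "network" is lost -- the weakness noted above.
```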
Summarization • E.g., in a text about Bayesian networks • The topic “network” is expressed by the words “network”, “net”, and “system” • When deciding whether “network” is a topic, the word-counting-based VSM misses the contributions of “net” and “system” • This paper constructs the VSM and extracts abstracts based on concept counting instead of word counting
Concept counting algorithm • Concept hierarchy tree • A concept is a generalization of particular instances at the abstract level • A concept may correspond to a word or to several semantically related words
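A concept hierarchy tree can be sketched as a mapping from each concept to its son concepts and the words that express it; the node names below are illustrative, not taken from the paper.

```python
# Toy concept hierarchy tree (CHT); concept names are illustrative only.
concept_tree = {
    "network": {"sons": ["net", "system"], "words": ["network"]},
    "net":     {"sons": [],               "words": ["net"]},
    "system":  {"sons": [],               "words": ["system"]},
}

def words_of_concept(c, tree):
    """All words expressing concept c, including its offspring's words."""
    words = list(tree[c]["words"])
    for son in tree[c]["sons"]:
        words.extend(words_of_concept(son, tree))
    return words
```

With this structure, counting the concept “network” also credits occurrences of “net” and “system”, which is exactly what plain word counting misses.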
Concept counting algorithm • Concept hierarchy tree
Concept counting algorithm • Selection of topic concepts • A topic concept should possess generalization power of its son concepts • Three evaluation parameters • S-Frequency • T-Frequency • Conclusion Rate
Concept counting algorithm • Concept S-Frequency: FS(C) = Σ F(Wi), where {W1, W2, W3, …, Wn} are the words belonging to concept C and F(Wi) is the frequency of word Wi in the text • Concept T-Frequency: FT(C) = FS(C) + Σ FS(Ai), where {A1, A2, A3, …, An} are the offspring nodes of C — i.e., FT also counts the words of all of C’s descendants
Concept counting algorithm • Concept Conclusion Rate R(C) • {S1, S2, S3, …, Sn} : the son concepts of C • A higher R(C) means the parent node C generalizes its sons better → it is more reasonable to use C as a topic concept
Concept counting algorithm • Concept Selection Rate Sel(C) • In the experiments, α = 1, β = 0.25, γ = 1, and δ = 0.5 • Worked example: FS(“subject_matter”) = 0; FT(“subject_matter”) = 11; R(“subject_matter”) = 1 − 6/11 ≈ 0.45; Sel(“subject_matter”) = (log 1 + 0.25 log 12)(0.45 + 0.5) ≈ 0.256
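The exact Sel(C) formula is rendered as an image in the original slides; the form below is a reconstruction chosen so that the worked example reproduces 0.256 (base-10 logarithms, with R rounded to 0.45 as on the slide).

```python
import math

def selection_rate(fs, ft, r, alpha=1, beta=0.25, gamma=1, delta=0.5):
    """Reconstructed (assumed) form:
    Sel(C) = (alpha*log10(FS+1) + beta*log10(FT+1)) * (gamma*R + delta)."""
    return (alpha * math.log10(fs + 1)
            + beta * math.log10(ft + 1)) * (gamma * r + delta)

# subject_matter: FS = 0, FT = 11, R = 1 - 6/11 (rounded to 0.45)
sel = selection_rate(fs=0, ft=11, r=0.45)
# round(sel, 3) == 0.256, matching the slide's worked example
```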
Concept counting algorithm • 1. Place all the nodes on the second level into CandConceptSet; • 2. Take the node C with the maximum selection rate from CandConceptSet; if CandConceptSet is empty, then end; • 3. If C is a leaf node, then place C into TopicConceptSet; go to 2; • 4. If Sel(C) >= SelThreshold, then: (1) add C into TopicConceptSet; (2) delete the subtree rooted at C from the CHT and recount the parameters of the related nodes; go to 2; • 5. If Sel(C) < SelThreshold, add the son concepts of C into CandConceptSet; go to 2.
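Steps 1–5 can be sketched as follows; for brevity the sketch assumes precomputed selection rates and omits the subtree deletion and parameter recounting of step 4(2).

```python
def select_topic_concepts(tree, second_level, sel, threshold):
    """Topic-concept selection over a concept hierarchy tree.
    tree: concept -> list of son concepts (empty list = leaf)
    sel:  concept -> selection rate (assumed precomputed; the paper
          recounts these after each subtree deletion, elided here)."""
    candidates = list(second_level)        # step 1: second-level nodes
    topics = []
    while candidates:                      # step 2: until the set is empty
        c = max(candidates, key=lambda n: sel[n])
        candidates.remove(c)
        if not tree[c]:                    # step 3: leaf node
            topics.append(c)
        elif sel[c] >= threshold:          # step 4: good generalizer
            topics.append(c)               # (subtree pruning elided)
        else:                              # step 5: descend to the sons
            candidates.extend(tree[c])
    return topics
```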
Concept counting algorithm • After the selection step, we obtain possible topic concepts, e.g., {language, subject_matter, performance, summarization, punctuation, text, software, macro} • Importance of topic concepts • To compute the importance of the sentences in the text, first compute the importance I(Ti) of every topic concept • FT(Ti) is the number of words that express topic Ti • λT is 1.2 if Ti is at a title position, and 1 otherwise (i.e., this step evaluates which topics are important)
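The slide gives only the two factors of I(Ti); a minimal sketch, assuming they are simply multiplied:

```python
def topic_importance(ft, in_title):
    """I(Ti): word count FT(Ti), boosted by lambda_T = 1.2 when the
    topic occurs at a title position (the product form is an
    assumption; the slide lists the factors without the formula)."""
    return (1.2 if in_title else 1.0) * ft
```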
Summarization • Topic-concept-based VSM • For each sentence S in the text, every word in S is mapped to its related topic concept • S can be represented by a node in an n-dimensional vector space: S(T1, W1; T2, W2; …; Tn, Wn), where Ti is a topic concept of S and Wi is the frequency of Ti in S • After the VSM is built, we compute the importance of every sentence and extract the most important sentences to form the abstract (i.e., measure how often the important topics appear in each sentence)
Summarization • Importance of a sentence S (compute the importance of each sentence): λpos is the position weight of S and λpar is the importance of the paragraph containing S • Finally, the sentences are sorted by their importance, and the abstract draft is composed of the sentences with the highest importance
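The full sentence-importance formula is an image in the slides; the sketch below assumes the score is the topic-weighted sum of the sentence's VSM entries, scaled by the two λ factors.

```python
def sentence_importance(topic_weights, importance, lam_pos, lam_par):
    """Assumed form: I(S) = lam_pos * lam_par * sum(I(Ti) * Wi),
    where topic_weights maps topic concept Ti -> frequency Wi in S
    and importance maps Ti -> I(Ti)."""
    score = sum(importance[t] * w for t, w in topic_weights.items())
    return lam_pos * lam_par * score

def extract_abstract(scored_sentences, k):
    """Sort (sentence, score) pairs by importance; keep the top k."""
    return sorted(scored_sentences, key=lambda sw: sw[1], reverse=True)[:k]
```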
Summarization • Topic-concept-based partition (handles the multi-topic case) • If we extract sentences only by their importance, the structure of the abstract may be unbalanced • Especially for a multi-topic text • A multi-topic text includes several concept hierarchy trees • Each paragraph is represented as P(Tr1, V1; Tr2, V2; …; Trn, Vn), where Tri is a concept hierarchy tree and Vi is the frequency of the topic concepts of P that are located in Tri • Compute the similarity between Pi and Pj to decide which continuous paragraphs form a “topic part” • Extract important sentences per topic part to make up the final abstract
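The slide does not name the paragraph-similarity measure; cosine similarity over the P(Tr1, V1; …; Trn, Vn) vectors is a common choice and is used here as an assumption.

```python
import math

def cosine(p, q):
    """Cosine similarity between two paragraph vectors, each a dict
    keyed by concept hierarchy tree Tri with frequency Vi as value."""
    dot = sum(p.get(t, 0) * q.get(t, 0) for t in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0
```

Adjacent paragraphs whose similarity stays above a threshold would be merged into one topic part; sentences are then extracted per part so that each topic is represented in the abstract.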
Experiment Results • Measure the performance with recall R = Nhm / Nh and precision P = Nhm / Nm, combined into an F-measure • Nhm: number of extracted sentences that also appear in the human summary • Nh: number of sentences in the human summary • Nm: number of sentences extracted by a given method • α: relative importance of R and P
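Recall and precision follow directly from the three counts; the α-weighted combination below is the standard van Rijsbergen-style form and is an assumption about the paper's exact formula.

```python
def f_measure(nhm, nh, nm, alpha=1.0):
    """R = Nhm/Nh, P = Nhm/Nm; combined (assumed form) as
    F = (alpha + 1) * P * R / (alpha * P + R).
    With alpha = 1 this is the usual F1 = 2PR / (P + R)."""
    r, p = nhm / nh, nhm / nm
    denom = alpha * p + r
    return (alpha + 1) * p * r / denom if denom else 0.0

# e.g. 5 of 10 extracted sentences match a 10-sentence human summary
f1 = f_measure(nhm=5, nh=10, nm=10)
# P = R = 0.5, so f1 == 0.5
```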
Experiment Results • F-measure values of the two methods
Other Features • Sentence Length Cut-off Feature • Short sentences tend not to be included in summaries • Fixed-Phrase Feature • Sentences containing any of a list of fixed phrases, mostly two words long (e.g., “this letter…”, “In conclusion…”) • Paragraph Feature • Records information for the first ten paragraphs and last five paragraphs in a document • Thematic Word Feature • The most frequent content words are defined as thematic words • Uppercase Word Feature • Proper names are often important
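These features (in the style of Kupiec-type trainable summarizers) can be sketched as simple binary tests per sentence; the threshold and the phrase/word lists below are illustrative, not from the slides.

```python
def sentence_features(sentence, fixed_phrases, thematic_words, min_len=5):
    """Binary sentence features; min_len and the lists are illustrative."""
    words = sentence.split()
    return {
        "long_enough": len(words) >= min_len,          # length cut-off
        "fixed_phrase": any(p in sentence.lower() for p in fixed_phrases),
        "thematic": any(w.lower() in thematic_words for w in words),
        # uppercase-initial words past position 0 suggest proper names
        "uppercase": any(w[0].isupper() for w in words[1:] if w),
    }

feats = sentence_features("In conclusion the IEEE network performs well",
                          fixed_phrases=["in conclusion"],
                          thematic_words={"network"})
```

The paragraph feature is position-based (first ten / last five paragraphs), so it would be computed at the document level rather than from the sentence string alone.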