Automatic Discovery of Technology Trends from Patent Text. Youngho Kim, Yingshi Tian, Yoonjae Jeong, Ryu Jihee, Sung-Hyon Myaeng. School of Engineering, Information and Communication University, South Korea.
Introduction • Motivation: Patent text is a good source for discovering technological progress. • Problem: Previous approaches for the patent domain (citation analysis, network-based patent analysis) have drawbacks • They need domain expertise • Salient concepts are not easy to recognize • These limitations hamper wide application of the methods
Introduction • In this paper, the authors aim to avoid the limitations mentioned previously • Method • Semantic key-phrase extraction (no experts needed) • Technological trend discovery (unsupervised) • Semantic key-phrase types: • Problem, such as “recognizing spoken language” • Solution, such as “language model” • Domain, such as “speech recognition”
Introduction • Application: help users explore large numbers of technical documents efficiently to find technological trends • Example: (figure on the original slide)
Overall procedure • Technology identification through semantic key-phrase extraction • A probabilistic framework combined with linguistic clues • The probabilistic features are weighted • The linguistic clues are weighted • Finally, a statistical learner (LIBSVM) is trained on both • Technological trend discovery by • Selecting important technologies during a time span • Linking them according to semantic relatedness
Problem Formulation • Definition • Domain: A field of technology given by a user query; the query then generates a related collection • Problem: What a patent or a method attempts to solve • Solution: A method, a model or an approach that is associated with a particular problem • Technology: A combination of a problem, a solution, and the given domain • Time Span:
Problem Formulation • Definition • Technological Trend: A mainstream of technologies during a time span l • Example:
Technological Trend Discovery System • Structure of Patent Documents • Semantic Key-phrase Extraction • Problem Extraction • Solution Extraction • Technological Trend Discovery
Structure of Patent Documents • Database: USPTO (United States Patent and Trademark Office) • Figure labels: cite information, linguistic features, time span
Semantic Key-phrase Extraction • Step 1 • Parse a patent to get the smallest noun phrases as key-phrase candidates (e.g. signal patterns) • Expand NP to V+NP using dependencies (e.g. recognizing signal patterns) • Step 2 • Identify problem key-phrases by classification • Step 3 • Among the remaining candidates, extract solution key-phrases to get
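Step 1 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is already POS-tagged with Penn Treebank-style tags, a base noun phrase is approximated as a run of adjectives and nouns ending in a noun, and the dependency-based V+NP expansion is approximated by checking whether a verb directly precedes the noun phrase. The function names are hypothetical and the slides do not name the parser that was used.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag); tagging is assumed done upstream


def base_noun_phrases(tokens: Tagged) -> List[Tuple[int, int]]:
    """Return (start, end) spans of adjective/noun runs that end in a noun."""
    spans = []
    i = 0
    while i < len(tokens):
        if tokens[i][1].startswith(("JJ", "NN")):
            j = i
            while j < len(tokens) and tokens[j][1].startswith(("JJ", "NN")):
                j += 1
            end = j
            # Trim trailing adjectives so the phrase ends in a noun.
            while end > i and not tokens[end - 1][1].startswith("NN"):
                end -= 1
            if end > i:
                spans.append((i, end))
            i = j
        else:
            i += 1
    return spans


def candidates(tokens: Tagged) -> List[str]:
    """List NP candidates, plus a V+NP variant when a verb directly precedes the NP."""
    out = []
    for start, end in base_noun_phrases(tokens):
        np = " ".join(word for word, _ in tokens[start:end])
        out.append(np)
        # Assumption: adjacency stands in for a real verb-object dependency.
        if start > 0 and tokens[start - 1][1].startswith("VB"):
            out.append(tokens[start - 1][0] + " " + np)
    return out


sentence = [("a", "DT"), ("method", "NN"), ("for", "IN"),
            ("recognizing", "VBG"), ("signal", "NN"), ("patterns", "NNS")]
phrases = candidates(sentence)
print(phrases)
```

A real system would take the verb from a dependency parse rather than from adjacency, so that verbs separated from their objects are still attached.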
Problem Extraction Feature • Topical language model (unigram) • Dependency-aware model (bigram) • Special smoothing: relevance and background language models
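The slide does not give the smoothing formula, so the following is a minimal sketch that assumes simple linear interpolation between a relevance model and a background model, with a hypothetical mixing weight `lam`. Only the unigram case is shown; the bigram (dependency) variant is omitted.

```python
import math
from collections import Counter


def unigram_mle(tokens):
    """Maximum-likelihood unigram model from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def smoothed_prob(word, relevance, background, lam=0.7):
    """Assumed smoothing: lam * P_relevance(w) + (1 - lam) * P_background(w)."""
    return lam * relevance.get(word, 0.0) + (1 - lam) * background.get(word, 0.0)


def phrase_log_prob(phrase, relevance, background, lam=0.7):
    """Log-probability of a candidate phrase under the smoothed unigram model."""
    total = 0.0
    for word in phrase:
        p = smoothed_prob(word, relevance, background, lam)
        if p == 0.0:
            return float("-inf")  # word unseen in both models
        total += math.log(p)
    return total


# Toy corpora, invented for illustration only.
relevance = unigram_mle("speech recognition language model speech signal".split())
background = unigram_mle("the system of a method speech the data model".split())

on_topic = phrase_log_prob(["speech", "signal"], relevance, background)
off_topic = phrase_log_prob(["data", "method"], relevance, background)
print(on_topic > off_topic)
```

The resulting score could serve as one numeric feature per candidate phrase, alongside the linguistic clues.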
Problem Extraction • Issue: The probability model is biased toward topicality, so another mechanism is needed to correct it • Method: Linguistic clues • Gather all distinct patterns from the annotations • Generalize the patterns into a grammar • E.g. (method/NN+in/PP) and (system/NN+in/PP) ==> (method | system)/NN+in/PP
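The generalization step in the example above can be illustrated with a small sketch. It assumes patterns are written as `word/TAG` items joined by `+` and that two patterns are merged when they differ only in their first word; the slides do not state the actual generalization rules, so this rule and the function name are assumptions.

```python
from collections import defaultdict


def generalize(patterns):
    """Merge patterns that share tags and trailing items but differ in the first word."""
    groups = defaultdict(list)
    for pattern in patterns:
        head, *rest = pattern.split("+")
        word, tag = head.split("/")
        key = (tag, tuple(rest))
        if word not in groups[key]:
            groups[key].append(word)
    merged = []
    for (tag, rest), words in groups.items():
        # A single word stays as-is; several words become an alternation.
        head = words[0] if len(words) == 1 else "(" + "|".join(words) + ")"
        merged.append("+".join([f"{head}/{tag}", *rest]))
    return merged


generalized = generalize(["method/NN+in/PP", "system/NN+in/PP", "apparatus/NN+for/PP"])
print(generalized)
```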
Problem Extraction Feature • 342 generalized patterns
Problem Extraction • Generalized patterns need a confidence score • A statistical machine learner (LIBSVM) is applied to the linguistic clues and the language models • LIBSVM classifies each candidate as problem or non-problem using the above features
Solution Extraction • Probability features would not be useful • Solution phrases are rarely shared among cited documents • Add a “head word” feature (e.g. model, approach, method, methodology) • The other feature categories are the same as in Problem Extraction
Technology Trend Discovery • Reduction: Select several salient technologies and associate semantic relations between them • How to find a good time span that reveals effective technological trends • KL-divergence to compare two language models
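As a minimal sketch of the comparison mentioned above, the KL-divergence between two unigram language models, one per time span, can be computed as below. Add-one smoothing over a shared vocabulary is an assumption made here so that the divergence stays finite; the slides do not say which smoothing was used, and the token lists are toy data.

```python
import math
from collections import Counter


def language_model(tokens, vocab, alpha=1.0):
    """Unigram model over a shared vocabulary with additive smoothing (assumed)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


def kl_divergence(p, q):
    """KL(p || q) = sum over w of p(w) * log(p(w) / q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


# Toy key-phrase tokens for two time spans.
span_a = ["hidden", "markov", "model", "hidden", "markov"]
span_b = ["neural", "network", "model", "neural"]
vocab = sorted(set(span_a) | set(span_b))

p = language_model(span_a, vocab)
q = language_model(span_b, vocab)
print(kl_divergence(p, q))
```

KL-divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; which direction to use is a design choice the slides leave open.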
Technology Trend Discovery • How to find salient technologies within time spans • If a technology is important, many patents will refer to it • Mutual information concept
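The exact salience formula is not given on the slide. One possible reading, shown here purely as an assumption, is pointwise mutual information between a technology and a time span, estimated from how many patents mention the technology inside and outside the span. The function name and the toy data are hypothetical.

```python
import math


def salience(patents, span):
    """PMI between each technology and a time span, from patent counts (assumed measure)."""
    start, end = span
    n_total = len(patents)
    n_span = sum(1 for year, _ in patents if start <= year <= end)
    technologies = {tech for _, techs in patents for tech in techs}
    scores = {}
    for tech in technologies:
        n_tech = sum(1 for _, techs in patents if tech in techs)
        n_both = sum(1 for year, techs in patents
                     if tech in techs and start <= year <= end)
        if n_both == 0:
            scores[tech] = float("-inf")  # never mentioned within the span
        else:
            joint = n_both / n_total
            scores[tech] = math.log(joint / ((n_tech / n_total) * (n_span / n_total)))
    return scores


# Toy data: (year, technologies mentioned by the patent).
patents = [(1998, {"hmm"}), (1999, {"hmm", "lm"}), (2000, {"hmm"}),
           (2001, {"lm"}), (2002, {"dnn", "lm"}), (2003, {"dnn", "lm"})]
scores = salience(patents, (1998, 2000))
print(sorted(scores, key=scores.get, reverse=True))
```

A technology concentrated in the span scores above zero, while one that appears mostly outside the span scores below zero.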
Technology Trend Discovery Algorithm • Step 1 • Define an initial time span (based on the density of the data) • Step 2 • Generate all possible combinations of time spans (e.g. <1998~2000, 1999~2001>) • Step 3 • Calculate the KL-divergence of every pair from Step 2 and rank them • Step 4 • Select the most important technology among the top n pairs
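Steps 1 to 3 above can be sketched as follows; Step 4 is left out because the slides do not specify the selection rule. The sketch assumes fixed-length sliding spans, add-one smoothed unigram models, and KL-divergence taken from the earlier span to the later one. All names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def make_spans(first_year, last_year, length):
    """Steps 1-2: all sliding time spans of a fixed length (inclusive bounds)."""
    return [(y, y + length - 1) for y in range(first_year, last_year - length + 2)]


def span_model(docs, span, vocab, alpha=1.0):
    """Add-one smoothed unigram model of the key-phrase tokens inside a span."""
    tokens = [t for year, words in docs for t in words if span[0] <= year <= span[1]]
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl(p, q):
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_span_pairs(docs, length, top_n):
    """Step 3: score every pair of spans by KL-divergence and keep the top n."""
    years = [year for year, _ in docs]
    vocab = sorted({w for _, words in docs for w in words})
    spans = make_spans(min(years), max(years), length)
    models = {s: span_model(docs, s, vocab) for s in spans}
    scored = [(kl(models[a], models[b]), a, b) for a, b in combinations(spans, 2)]
    scored.sort(reverse=True)
    return scored[:top_n]


# Toy data: (year, key-phrase tokens extracted from patents of that year).
docs = [(1998, ["hmm", "hmm", "hmm"]), (1999, ["hmm", "lm"]), (2000, ["lm", "lm"]),
        (2001, ["dnn", "dnn"]), (2002, ["dnn", "dnn"])]
top = rank_span_pairs(docs, length=2, top_n=2)
print(top[0])
```

On this toy data the most divergent pair is the earliest span against the latest one, which is where the vocabulary shifts most.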
Experiment • Database: USPTO • Domain: Speech recognition • Data size: 1,420 US patent documents • Time: 1976 - 2003 • Annotators: three computer science graduate students • Annotated set: 400 documents (uniformly selected over the time span)
Experiment • Annotation work • Handle acronyms (using Wiki and simple parenthetical patterns) • Use WordNet to normalize nouns and verbs • The gold-standard technology phrases (answers) are produced by majority vote • Annotators agreed on 78% of the sample (about 300) • Technology Trend Discovery has no gold standard because building one is too hard (too many time spans) ==> no good evaluation is available
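The majority-vote step can be sketched as below, assuming each annotator supplies a list of phrases per document and a phrase enters the gold standard when more than half of the annotators chose it. The phrases are invented for illustration and the function name is hypothetical.

```python
from collections import Counter


def majority_vote(annotations):
    """Keep phrases selected by more than half of the annotators."""
    # set() ensures an annotator counts at most once per phrase.
    votes = Counter(phrase for phrases in annotations for phrase in set(phrases))
    needed = len(annotations) // 2 + 1
    return sorted(phrase for phrase, n in votes.items() if n >= needed)


# Toy annotations from three annotators for one document.
annotations = [
    ["recognizing spoken language", "language model"],
    ["recognizing spoken language", "acoustic model"],
    ["recognizing spoken language", "language model", "noise"],
]
gold = majority_vote(annotations)
print(gold)
```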
Experiment • Set the background language model • Used LIBSVM as the machine learner, with 5-fold cross validation
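For reference, a 5-fold split over the 400 annotated documents could look like the sketch below. It only shows how the indices are partitioned; the round-robin assignment is an assumption, and training the classifier on each split is left out.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists for k-fold cross validation."""
    # Round-robin assignment of items to folds (assumed; any partition works).
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


splits = list(k_fold_indices(400, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each document is used for testing exactly once and for training in the other four rounds.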
Experiment • All features proved effective
Experiment • From the above steps, many meaningful problems and solutions can be discovered • Open issue: Synonymy (even when using synonyms from WordNet)
Experiment • Discover technological trends by the Technology Trend Discovery Algorithm
Conclusion & future work • Discovering such trends can reveal latent technologies • It can also assist exploration by alleviating the information overload caused by search results • Future work • Synonymy issue in semantic extraction • A standardized evaluation for TTD needs to be investigated