
LING/C SC 581: Advanced Computational Linguistics

LING/C SC 581: Advanced Computational Linguistics. Lecture 14, Feb 26th. Administrivia. Hope you sent feedback on the last lecture on Text Classification by Marcos Zampieri. This Thursday (Feb 21st), we have another guest lecture from faculty candidate Adriana Picoral.

arthurj

Presentation Transcript


  1. LING/C SC 581: Advanced Computational Linguistics Lecture 14 Feb 26th

  2. Administrivia
     • Hope you sent feedback on the last lecture on Text Classification by Marcos Zampieri.
     • This Thursday (Feb 21st), we have another guest lecture from faculty candidate Adriana Picoral.
     • Her job talk on "Investigating Multilingualism through Computational Linguistics" is tomorrow at noon in CHEM 209.
     • Homework 6 is out today (due on Friday at midnight).

  3. Last Time
     • WordNet verbs and adjectives.
     • Also FrameNet for verb frames/senses.
     • bfs.perl (basic program), bfs4.perl (all minimal-length solutions)
     • @INC is hardwired into Perl, but an environment variable can be set to add to the Perl module search path, e.g.:
       export PERL5LIB=/home/foobar/code
     • See https://perlmaven.com/how-to-change-inc-to-find-perl-modules-in-non-standard-locations

  4. WordNet: programmed search
     • Make no assumptions, e.g. chair and table:
       $ perl bfs4.perl chair#n#1 table#n#1
       Not found (distance 7 and 100000 nodes explored)
       $ perl bfs4.perl chair#n#1 table#n#1 200000
       Max set to: 200000
       Not found (distance 8 and 200007 nodes explored)
       $ perl bfs4.perl chair#n#1 table#n#1 300000
       Max set to: 300000
       Found at distance 8 (256541 nodes explored)
       table#n#1 hype contents#n#1 hypo list#n#1 hype index#n#4 deri index#v#2 hypo supply#v#1 hype seat#v#5 deri seat#n#3 hype chair#n#1
       Found at distance 8 (282344 nodes explored)
       table#n#1 hype contents#n#1 hypo list#n#1 hype index#n#4 deri index#v#2 hypo supply#v#1 hype seat#v#4 deri seat#n#3 hype chair#n#1

  5. WordNet: programmed search
       $ perl bfs4.perl chair#n#1 table#n#1 500000
       Max set to: 500000
       Found at distance 8 (256541 nodes explored)
       table#n#1 hype contents#n#1 hypo list#n#1 hype index#n#4 deri index#v#2 hypo supply#v#1 hype seat#v#5 deri seat#n#3 hype chair#n#1
       Found at distance 8 (282344 nodes explored)
       table#n#1 hype contents#n#1 hypo list#n#1 hype index#n#4 deri index#v#2 hypo supply#v#1 hype seat#v#4 deri seat#n#3 hype chair#n#1
       All minimal solutions found
     • Does the long chain still have meaning?
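The slides do not show the source of bfs4.perl, so the following is only a minimal Python sketch of the kind of search its output suggests: a breadth-first search that stops after a node budget and reports every minimal-length path. The function name `all_shortest_paths`, the `max_nodes` parameter, and the toy graph are all hypothetical, and the graph is not WordNet data.

```python
from collections import deque

def all_shortest_paths(graph, start, goal, max_nodes=100000):
    """Breadth-first search returning (paths, explored).

    graph maps node -> list of (relation, neighbor) pairs.
    paths lists every minimal-length path as [node, relation, node, ...];
    it is empty if the goal is unreachable or the node budget runs out.
    """
    dist = {start: 0}
    parents = {start: []}  # node -> [(relation, parent)] along shortest paths
    queue = deque([start])
    explored = 0
    found = False
    while queue:
        node = queue.popleft()
        explored += 1
        if explored > max_nodes:
            break  # budget exhausted before reaching the goal
        if node == goal:
            found = True
            break
        for rel, nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = [(rel, node)]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append((rel, node))  # another equally short route
    if not found:
        return [], explored

    def build(node):
        # Walk the parent links back to the start, expanding every alternative.
        if node == start:
            return [[start]]
        return [path + [rel, node]
                for rel, parent in parents[node]
                for path in build(parent)]

    return build(goal), explored

# Hypothetical toy graph with two equally short routes from "s" to "t".
toy_graph = {
    "s": [("r1", "x"), ("r2", "y")],
    "x": [("r3", "t")],
    "y": [("r4", "t")],
    "t": [],
}
paths, explored = all_shortest_paths(toy_graph, "s", "t")
for p in paths:
    print(" ".join(p))
```

As in the runs above, a budget that is too small yields no answer, while a larger one returns all minimal paths at once.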

  6. WordNet: programmed search table#n#2

  7. WordNet: programmed search

  8. WordNet: programmed search
       $ perl bfs4.perl chair#n#1 table#n#2
       Found at distance 2 (82 nodes explored)
       table#n#2 holo leg#n#3 mero chair#n#1
       All minimal solutions found
     • Glossary: https://wordnet.princeton.edu/wordnet/man/wngloss.7WN.html
     • holonym: The name of the whole of which the meronym names a part. Y is a holonym of X if X is a part of Y.
     • meronym: The name of a constituent part of, the substance of, or a member of something. X is a meronym of Y if X is a part of Y.

  9. WordNet: programmed search
       $ perl bfs4.perl chair#n#1 table#n#2
       Found at distance 2 (82 nodes explored)
       table#n#2 holo leg#n#3 mero chair#n#1
       All minimal solutions found
     • Take out holo and mero from @relations

  10. WordNet: programmed search
       $ perl bfs4a.perl chair#n#1 table#n#2
       Found at distance 3 (81 nodes explored)
       table#n#2 hypo furniture#n#1 hype seat#n#3 hype chair#n#1
       All minimal solutions found
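The effect of dropping holo and mero can be reproduced on a tiny graph. This is a sketch only: the edge list is transcribed from the two printed paths (in the direction they are printed), not loaded from WordNet, and `shortest_path` and `ALL_RELATIONS` are hypothetical names standing in for the Perl script and its @relations list.

```python
from collections import deque

# Edges copied from the two outputs above; nothing else from WordNet is included.
EDGES = [
    ("table#n#2", "holo", "leg#n#3"),
    ("leg#n#3", "mero", "chair#n#1"),
    ("table#n#2", "hypo", "furniture#n#1"),
    ("furniture#n#1", "hype", "seat#n#3"),
    ("seat#n#3", "hype", "chair#n#1"),
]
ALL_RELATIONS = {"holo", "mero", "hypo", "hype"}

def shortest_path(edges, start, goal, relations):
    """Return one shortest path [node, relation, node, ...] or None,
    following only edges whose relation is in `relations`."""
    adjacency = {}
    for src, rel, dst in edges:
        if rel in relations:
            adjacency.setdefault(src, []).append((rel, dst))
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]  # paths alternate node, relation, node, ...
        if node == goal:
            return path
        for rel, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [rel, nxt])
    return None

with_parts = shortest_path(EDGES, "table#n#2", "chair#n#1", ALL_RELATIONS)
without_parts = shortest_path(EDGES, "table#n#2", "chair#n#1", {"hypo", "hype"})
print(" ".join(with_parts))     # path through leg#n#3
print(" ".join(without_parts))  # path through furniture#n#1 and seat#n#3
```

Restricting the relation set lengthens the chain from two links to three, but the remaining chain uses only hypernym/hyponym steps.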

  11. WordNet: programmed search
     • Example:
       • John mended the torn dress
       • What can be deduced about the state of the world (situation) after the event of “mending”?
       • Find the semantic relationship between mend and tear
       $ perl bfs3.perl mend#v#1 tear#v#1
       Found at distance 6 (58492 nodes explored)
       tear#v#1 hypo separate#v#2 hype break_up#v#10 also break#v#4 ants repair#v#1 hypo better#v#2 hype mend#v#1
       $ perl bfs3.perl tear#v#1 mend#v#1
       Found at distance 6 (33606 nodes explored)
       mend#v#1 deri mender#n#1 hypo skilled_worker#n#1 hype cutter#n#3 deri cut#v#1 hypo separate#v#2 hype tear#v#1
       many more…

  12. WordNet: programmed search • Example: • John mended the red dress • mend is a change-of-state verb (applies to its object)

  13. WordNet: programmed search • Example: • John mended the red dress • mend is a change-of-state verb (applies to its object)

  14. WordNet: programmed search
       $ perl bfs4.perl mend#v#1 red#a#1
       Not found (distance 7 and 100001 nodes explored)
       $ perl bfs4.perl mend#v#1 red#a#1 200000
       Max set to: 200000
       Found at distance 7 (116111 nodes explored)
       carmine#a#1 deri carmine#n#1 deri carmine#v#1 hypo redden#v#2 hypo color#v#1 hypo change#v#1 hype better#v#2 hype mend#v#1
       Found at distance 7 (116210 nodes explored)
       red#a#1 deri red#n#1 hypo chromatic_color#n#1 hypo color#n#1 deri color#v#1 hypo change#v#1 hype better#v#2 hype mend#v#1
       Found at distance 7 (116211 nodes explored)
       red#a#1 deri red#a#1 deri red#n#1 hypo chromatic_color#n#1 hypo color#n#1 deri color#v#1 hypo change#v#1 hype better#v#2 hype mend#v#1
       Found at distance 7 (116325 nodes explored)
       ruddy#a#2 deri ruddiness#n#1 hypo complexion#n#1 hypo color#n#1 deri color#v#1 hypo change#v#1 hype better#v#2 hype mend#v#1

  15. WordNet: programmed search
       $ perl bfs4.perl mend#v#1 red#n#3
       Found at distance 6 (49389 nodes explored)
       Bolshevik#n#1 hypo radical#n#3 hypo person#n#1 hype changer#n#1 deri change#v#1 hype better#v#2 hype mend#v#1
       Found at distance 6 (84143 nodes explored)
       Bolshevik#n#1 hypo radical#n#3 hypo person#n#1 hype worker#n#1 hype skilled_worker#n#1 hype mender#n#1 deri mend#v#1
       All minimal solutions found

  16. Homework 6
     • Question 1:
       • Find the shortest-distance links between telescope and each of planet, star, and eagle
       • (Make sure you have the right word sense)
       • How many are there?
     • Question 2:
       • Draw a (merged) graph of the semantic relations found
     • Question 3:
       • Are any of the chains of semantic relations what you expect?
     • Question 4:
       • Is the chain useful? Why or why not?
     • Question 5:
       • What do you think the shortest connection linking star and telescope should look like?
       • How about eagle and telescope?

  17. Cosine Similarity
     • Using word vectors acquired from large corpora: GloVe (Stanford), word2vec (Google)
     • Python: gensim etc.
     • vec('Rome') is the vector closest to vec('Paris') - vec('France') + vec('Italy')
     • Examples:
       • telescope: [1.5667, 1.1436, 1.6432, 0.2347, -0.57751, -0.29565, -0.78965, -0.95205, -0.097776, -0.31729, 0.82443, 0.27591, 0.70094, 1.2939, -1.1032, 1.0748, -0.21654, 0.44433, -1.854, -0.50952, -0.1966, -0.050295, -0.75702, -1.4179, 1.1795, -0.29231, -0.61232, 0.40963, -0.79731, 0.02117, 0.57397, -0.6336, -0.13071, -1.1153, -0.5656, -0.20496, 0.34324, 1.1626, 0.19703, -0.76862, 1.1381, 0.019043, 0.10676, 0.46047, -0.50555, -0.26049, 1.1725, -0.049478, -0.71014, 0.19022]
       • star: [-0.21025, 1.6081, 0.037375, 1.0411, 0.61061, 0.064748, -0.93674, -0.030028, -0.18348, 0.73875, 0.65025, 0.75496, -0.73316, 0.95964, 0.89172, -0.10495, 0.11496, 0.30448, -1.4942, -0.036297, -0.95949, 0.41062, -0.23896, 0.40387, -0.32893, -1.5343, -0.45627, 0.109, -0.41474, -0.57094, 2.1997, 0.47089, 0.56732, -0.16914, 0.43481, 0.40459, -0.007678, -0.22073, -0.33289, -1.0992, 0.33632, 1.3412, -0.34081, -0.50183, -0.2514, -0.10199, 0.19292, -0.48934, -0.41793, 0.18085]
       • potato: [-0.063054, -0.62636, -0.76417, -0.041484, 0.56284, 0.86432, -0.73734, -0.70925, -0.073065, -0.74619, -0.34769, 0.14402, 1.4576, 0.034688, 0.11224, 0.13854, 0.10484, 0.60207, 0.021777, -0.21802, 0.087613, -1.4234, 1.0361, 0.1509, 0.13608, -0.2971, -0.90828, 0.34182, 1.3367, 0.16329, 1.2374, -0.20113, -0.91532, 1.4222, -0.1276, 0.69443, -1.1782, 1.2072, 1.0524, -0.11957, -0.1275, 0.41798, -0.9232, -0.1312, 1.2696, 1.2318, 0.30061, -0.18854, 0.15899, 0.0486]
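The Rome/Paris analogy can be sketched with vector arithmetic followed by a cosine comparison. The three-dimensional vectors below are made up for illustration (they are not GloVe or word2vec values), and `cosine`, `analogy`, and `toy_vectors` are hypothetical names; a real experiment would load trained embeddings instead.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def analogy(vectors, a, b, c):
    """Word whose vector is closest to vec(a) - vec(b) + vec(c),
    skipping the three query words themselves."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Invented toy vectors: axis 0 ~ "French", axis 1 ~ "Italian", axis 2 ~ "capital city".
toy_vectors = {
    "paris":  [1.0, 0.0, 1.0],
    "france": [1.0, 0.0, 0.0],
    "italy":  [0.0, 1.0, 0.0],
    "rome":   [0.0, 1.0, 1.0],
    "potato": [0.5, 0.5, -1.0],
}
print(analogy(toy_vectors, "paris", "france", "italy"))
```

Excluding the query words is a common convention, since otherwise one of them is often the nearest vector to the target.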

  18. Cosine Similarity • Visualization: [figure: the potato, star, and telescope vectors]

  19. Cosine Similarity http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/

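  20. Cosine Similarity
     • Vectors A, B and cos(θ): $\cos\theta = \dfrac{A \cdot B}{\|A\|\,\|B\|}$ (wikipedia)
     • Python:
       import numpy as np
       a = np.asarray(A, dtype=float)  # vector A as a 1-D array
       b = np.asarray(B, dtype=float)  # vector B as a 1-D array
       # cosine similarity: dot product divided by the product of the norms
       m12 = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))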

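  21. Cosine Similarity
     • Let $x = (x_1,\dots,x_n)$ and $y = (y_1,\dots,y_n)$
     • Define the dot product $x \cdot y = \sum_i x_i y_i$
     • $\|x\| = \sqrt{\sum_i x_i^2} = \sqrt{x \cdot x}$
     • a, b nonzero vectors, θ the angle between them
     • $\|a-b\|^2 = \|a\|^2 + \|b\|^2 - 2\|a\|\,\|b\|\cos\theta$ (law of cosines)
     • But $\|a-b\|^2 = (a-b) \cdot (a-b)$
     • $\|a-b\|^2 = a \cdot a - 2\,a \cdot b + b \cdot b$
     • $\|a-b\|^2 = \|a\|^2 - 2\,a \cdot b + \|b\|^2$
     • Then $a \cdot b = \|a\|\,\|b\|\cos\theta$
     • $a \cdot b = 0$ means θ = 90° (orthogonal)
     • [Figure: triangle formed by the vectors a, b and a-b, with side lengths ‖a‖, ‖b‖, ‖a-b‖ and angle θ between a and b]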
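The expansion of ‖a-b‖² and the resulting formula for cos θ can be checked numerically. A small sketch with two arbitrarily chosen 2-D vectors; the helper names `dot` and `norm` are hypothetical.

```python
import math

def dot(x, y):
    """Dot product: sum of componentwise products."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Euclidean length, defined as sqrt(x . x)."""
    return math.sqrt(dot(x, x))

a = [3.0, 0.0]
b = [1.0, 1.0]
diff = [ai - bi for ai, bi in zip(a, b)]

lhs = norm(diff) ** 2                              # ||a-b||^2 computed directly
rhs = norm(a) ** 2 - 2 * dot(a, b) + norm(b) ** 2  # expansion of (a-b).(a-b)
cos_theta = dot(a, b) / (norm(a) * norm(b))        # from a.b = ||a|| ||b|| cos(theta)
theta_deg = math.degrees(math.acos(cos_theta))

print(lhs, rhs)    # equal up to floating-point rounding
print(theta_deg)   # angle between a and b, in degrees
```

Here b lies on the diagonal and a on the x-axis, so the recovered angle should come out at 45° up to rounding.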

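  22. Cosine Similarity
     • Triangle: law of cosines $c^2 = a^2 + b^2 - 2ab\cos\theta$
     • Proof:
       • Points: C (0,0), B (a,0), A (b cos θ, b sin θ)
       • Pythagoras:
       • $c^2 = (a - b\cos\theta)^2 + (b\sin\theta)^2$
       • $c^2 = a^2 - 2ab\cos\theta + b^2\cos^2\theta + b^2\sin^2\theta$
       • $c^2 = a^2 - 2ab\cos\theta + b^2(\cos^2\theta + \sin^2\theta)$
       • $c^2 = a^2 - 2ab\cos\theta + b^2$
     • [Figures: the vector triangle with sides ‖a‖, ‖b‖, ‖a-b‖ and angle θ; the triangle with sides a, b, c and angle θ (wikipedia)]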
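The coordinate proof can be replayed in code: place the points as on the slide, measure the distance from A to B, and compare it with the law-of-cosines value. A sketch with hypothetical function names and arbitrarily chosen side lengths.

```python
import math

def third_side(a, b, theta):
    """Side c opposite the angle theta (radians), by the law of cosines."""
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(theta))

def third_side_by_coordinates(a, b, theta):
    """Same length from the proof's layout: C(0,0), B(a,0), A(b cos θ, b sin θ)."""
    ax, ay = b * math.cos(theta), b * math.sin(theta)
    return math.hypot(a - ax, ay)  # distance from A to B(a, 0)

theta = math.pi / 3  # 60 degrees
print(third_side(3.0, 2.0, theta))
print(third_side_by_coordinates(3.0, 2.0, theta))
```

With θ = 90° the cosine term vanishes (up to rounding) and the formula reduces to Pythagoras, e.g. sides 3 and 4 give a third side of 5.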

  23. Examples
     • Code adapted from:
       • https://github.com/adventuresinML/adventures-in-ml-code/blob/master/tf_word2vec.py
     • Training on text8:
       • http://mattmahoney.net/dc/textdata.html
       • text8 is the first 10^8 bytes of fil9, the cleaned-up version of enwik9, which is the first 10^9 bytes of the English Wikipedia dump on Mar. 3, 2006.
       • Clean-up: remove meta-data, hypertext links, citations, footnotes. Also case-fold, spell out numbers, convert characters outside a-z to blanks, etc.
     • Skip-gram model
       • n = 2 or 4 (context words on either side of the target word)
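In the skip-gram setup, each target word is paired with the words up to n positions to its left and right, and the model is trained to predict the context word from the target. A minimal sketch of that pairing; `skipgram_pairs` is a hypothetical helper that enumerates every pair, whereas the linked script may batch or sample the pairs differently.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every word within `window`
    positions on either side of each target word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the quick brown fox".split()
pairs = skipgram_pairs(tokens, window=1)
for target, context in pairs:
    print(target, "->", context)
```

A larger window (n = 4 rather than 2) produces more pairs per target and mixes in more distant, more topical context words.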

  24. Examples
     • Embedding: 300, skip window: 4, vocab size: 20,000 (others: UNK)
     • Filename: text8.zip, #words: 17,005,207
     • Nearest to the:
       • regulate, camelot, anymore, mutants, lowlands, thorn, irene, ax
       • and, of, a, UNK, in, to, one, nine
       • a, and, UNK, in, of, one, to, zero
       • a, UNK, and, of, in, two, is, one
       • a, one, and, UNK, zero, in, s, two
       • a, UNK, of, in, one, two, s, and
       • and, a, s, in, ursus, of, UNK, one
       • a, UNK, ursus, and, s, three, one, six
       • a, one, ursus, seven, three, six, four, UNK
       • a, ursus, UNK, s, of, this, and, in
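The "vocab size: 20,000 (others: UNK)" setting means only the most frequent words get their own entry, and every other word is mapped to a shared UNK token. A sketch of that step, assuming UNK takes id 0 and counts toward the vocabulary size; `build_vocab` is a hypothetical helper, not the linked script's function.

```python
from collections import Counter

def build_vocab(tokens, vocab_size):
    """Keep the (vocab_size - 1) most frequent words; map the rest to UNK (id 0).

    Returns (word_to_id, ids) where ids is the token stream as integers.
    """
    word_to_id = {"UNK": 0}
    for word, _count in Counter(tokens).most_common(vocab_size - 1):
        word_to_id[word] = len(word_to_id)
    ids = [word_to_id.get(word, 0) for word in tokens]
    return word_to_id, ids

tokens = "a b a c a b d".split()
word_to_id, ids = build_vocab(tokens, vocab_size=3)
print(word_to_id)
print(ids)
```

Because all rare words collapse into one token, UNK is itself very frequent, which is consistent with it showing up among the neighbours of common words in these lists.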

  25. Examples
     • Embedding: 300, skip window: 4, vocab size: 20,000 (others: UNK)
     • Nearest to have:
       • shrink, generalization, scandinavia, cards, approval, diplomatic, bus, bog
       • UNK, the, and, cards, generalization, approval, to, scandinavia
       • and, voter, to, generalization, in, a, cards, UNK
       • and, that, UNK, shrink, voter, the, in, is
       • that, and, in, voter, coke, shrink, generalization, cards
       • that, and, are, in, it, is, by, two
       • ursus, that, and, are, in, with, it, be
       • ursus, that, are, be, and, with, by, has
       • are, ursus, be, that, has, in, with, by
       • are, has, that, be, ursus, had, and, with

  26. Examples
     • Embedding: 300, skip window: 4, vocab size: 10,000 (others: UNK)
     • Nearest to nine:
       • dust, owner, regain, freedom, party, gained, playstation, himself
       • zero, in, UNK, of, and, the, one, coke
       • one, zero, eight, two, in, and, six, the
       • one, eight, zero, two, six, three, seven, five
       • eight, zero, one, two, six, three, seven, five
       • eight, one, seven, zero, two, six, three, five
       • eight, seven, six, four, one, three, five, zero
       • eight, seven, six, one, five, four, zero, three
       • eight, seven, six, four, one, five, three, zero
       • eight, seven, six, four, five, one, three, zero

  27. Examples
     • Embedding: 300, skip window: 4, vocab size: 20,000 (others: UNK)
     • Nearest to some:
       • groove, bram, cavitation, wickets, respect, wtoo, sticky, anatolia
       • alien, a, in, the, of, wickets, and, respect
       • a, alien, UNK, of, zero, and, the, wickets
       • alien, a, the, of, and, wickets, zero, UNK
       • a, alien, zero, and, two, the, or, groove
       • or, UNK, a, alien, two, six, in, and
       • or, two, and, a, the, ursus, alien, are
       • and, or, ursus, are, that, alien, the, from
       • or, are, ursus, other, and, that, two, UNK
       • or, the, are, other, and, two, many, ursus

  28. Examples
     • Embedding: 300, skip window: 4, vocab size: 20,000 (others: UNK)
     • Nearest to american:
       • ways, practitioners, hexadecimal, tito, confirming, damascus, sharply, roof
       • phi, ways, practitioners, legislatures, halley, whole, mughal, UNK
       • phi, ways, practitioners, legislatures, tito, one, halley, roof
       • phi, one, and, legislatures, ways, practitioners, tito, eight
       • phi, the, zero, UNK, and, legislatures, in, ways
       • one, nine, phi, UNK, two, six, by, three
       • UNK, and, nine, by, ursus, phi, zero, the
       • nine, and, UNK, callithrix, ursus, s, phi, six
       • nine, in, of, and, UNK, callithrix, ursus, phi
       • nine, UNK, in, callithrix, ursus, and, one, of
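Lists such as "Nearest to nine" are typically produced by ranking every other word by the cosine similarity of its embedding to the query word's embedding. A sketch of that ranking; the two-dimensional embeddings are invented for illustration (they are not trained values), and `nearest` is a hypothetical helper.

```python
import math

def nearest(embeddings, word, k=3):
    """Return the k words whose vectors have the highest cosine
    similarity to the vector of `word`, best first."""
    def unit(v):
        length = math.sqrt(sum(x * x for x in v))
        return [x / length for x in v]

    query = unit(embeddings[word])
    scores = {
        other: sum(q * x for q, x in zip(query, unit(vec)))  # cosine via unit vectors
        for other, vec in embeddings.items()
        if other != word
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Invented toy embeddings: the number words point in similar directions.
toy_embeddings = {
    "nine":   [1.0, 0.1],
    "eight":  [0.9, 0.2],
    "seven":  [0.8, 0.3],
    "potato": [-0.2, 1.0],
}
print(nearest(toy_embeddings, "nine", k=2))
```

Normalizing each vector to unit length first turns the cosine into a plain dot product, which is why implementations often store a normalized copy of the embedding matrix.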
