LING/C SC 581: Advanced Computational Linguistics Lecture 14 Feb 26th
Administrivia
• Hope you sent feedback on the last lecture on Text Classification by Marcos Zampieri.
• This Thursday (Feb 21st), we have another guest lecture from faculty candidate Adriana Picoral.
• Her job talk on "Investigating Multilingualism through Computational Linguistics" is tomorrow at noon in CHEM 209.
• Homework 6 out today (due on Friday at midnight).
Last Time
• WordNet verbs and adjectives.
• Also FrameNet for verb frames/senses.
• bfs.perl (basic program); bfs4.perl (all minimal-length solutions).
• @INC is hardwired into Perl: an environment variable can be set to add to the Perl module search path, e.g.:
  export PERL5LIB=/home/foobar/code
• See https://perlmaven.com/how-to-change-inc-to-find-perl-modules-in-non-standard-locations
WordNet: programmed search
• Make no assumptions, e.g. chair and table:

$ perl bfs4.perl chair#n#1 table#n#1
Not found (distance 7 and 100000 nodes explored)
$ perl bfs4.perl chair#n#1 table#n#1 200000
Max set to: 200000
Not found (distance 8 and 200007 nodes explored)
$ perl bfs4.perl chair#n#1 table#n#1 300000
Max set to: 300000
Found at distance 8 (256541 nodes explored)
table#n#1 hype contents#n#1 hypo list#n#1 hype index#n#4 deri index#v#2 hypo supply#v#1 hype seat#v#5 deri seat#n#3 hype chair#n#1
Found at distance 8 (282344 nodes explored)
table#n#1 hype contents#n#1 hypo list#n#1 hype index#n#4 deri index#v#2 hypo supply#v#1 hype seat#v#4 deri seat#n#3 hype chair#n#1
WordNet: programmed search
$ perl bfs4.perl chair#n#1 table#n#1 500000
Max set to: 500000
Found at distance 8 (256541 nodes explored)
table#n#1 hype contents#n#1 hypo list#n#1 hype index#n#4 deri index#v#2 hypo supply#v#1 hype seat#v#5 deri seat#n#3 hype chair#n#1
Found at distance 8 (282344 nodes explored)
table#n#1 hype contents#n#1 hypo list#n#1 hype index#n#4 deri index#v#2 hypo supply#v#1 hype seat#v#4 deri seat#n#3 hype chair#n#1
All minimal solutions found

• Does the long chain still have meaning?
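The "all minimal solutions" behavior of bfs4.perl can be sketched in Python on a toy labeled graph. The graph, node names, and relation labels below are made up for illustration (the real program searches WordNet's full relation network); the point is the search strategy: keep expanding in breadth-first order, and collect every path of the minimal length rather than stopping at the first hit.

```python
from collections import deque

# Toy labeled graph standing in for WordNet's relation network.
# Edge labels mimic the slide output: hype/hypo/holo/mero.
GRAPH = {
    "chair": [("hype", "seat"), ("mero", "leg")],
    "seat": [("hypo", "chair"), ("hype", "furniture")],
    "furniture": [("hypo", "seat"), ("hypo", "table")],
    "table": [("hype", "furniture"), ("holo", "leg")],
    "leg": [("mero", "chair"), ("mero", "table")],
}

def all_shortest_paths(start, goal):
    """Breadth-first search that keeps every minimal-length path,
    like bfs4.perl, instead of stopping at the first solution."""
    queue = deque([[(None, start)]])  # a path is a list of (relation, node)
    best = None
    solutions = []
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # all remaining queued paths are longer than the minimum
        node = path[-1][1]
        if node == goal:
            best = len(path)
            solutions.append(path)
            continue
        for rel, nxt in GRAPH.get(node, []):
            if all(nxt != n for _, n in path):  # avoid cycles within a path
                queue.append(path + [(rel, nxt)])
    return solutions

for sol in all_shortest_paths("table", "chair"):
    print(" ".join(f"{rel} {n}" if rel else n for rel, n in sol))
```

On this toy graph the single minimal chain printed is `table holo leg mero chair`, mirroring the shape of the distance-2 WordNet result shown below.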
WordNet: programmed search table#n#2
WordNet: programmed search
$ perl bfs4.perl chair#n#1 table#n#2
Found at distance 2 (82 nodes explored)
table#n#2 holo leg#n#3 mero chair#n#1
All minimal solutions found

https://wordnet.princeton.edu/wordnet/man/wngloss.7WN.html
• holonym: the name of the whole of which the meronym names a part. Y is a holonym of X if X is a part of Y.
• meronym: the name of a constituent part of, the substance of, or a member of something. X is a meronym of Y if X is a part of Y.
WordNet: programmed search
$ perl bfs4.perl chair#n#1 table#n#2
Found at distance 2 (82 nodes explored)
table#n#2 holo leg#n#3 mero chair#n#1
All minimal solutions found

• Take out holo and mero from @relations
WordNet: programmed search
$ perl bfs4a.perl chair#n#1 table#n#2
Found at distance 3 (81 nodes explored)
table#n#2 hypo furniture#n#1 hype seat#n#3 hype chair#n#1
All minimal solutions found
WordNet: programmed search
• Example: John mended the torn dress
• What can be deduced about the state of the world (situation) after the event of "mending"?
• Find the semantic relationship between mend and tear:

$ perl bfs3.perl mend#v#1 tear#v#1
Found at distance 6 (58492 nodes explored)
tear#v#1 hypo separate#v#2 hype break_up#v#10 also break#v#4 ants repair#v#1 hypo better#v#2 hype mend#v#1
$ perl bfs3.perl tear#v#1 mend#v#1
Found at distance 6 (33606 nodes explored)
mend#v#1 deri mender#n#1 hypo skilled_worker#n#1 hype cutter#n#3 deri cut#v#1 hypo separate#v#2 hype tear#v#1
many more…
WordNet: programmed search • Example: • John mended the red dress • mend is a change-of-state verb (applies to its object)
WordNet: programmed search
$ perl bfs4.perl mend#v#1 red#a#1
Not found (distance 7 and 100001 nodes explored)
$ perl bfs4.perl mend#v#1 red#a#1 200000
Max set to: 200000
Found at distance 7 (116111 nodes explored)
carmine#a#1 deri carmine#n#1 deri carmine#v#1 hypo redden#v#2 hypo color#v#1 hypo change#v#1 hype better#v#2 hype mend#v#1
Found at distance 7 (116210 nodes explored)
red#a#1 deri red#n#1 hypo chromatic_color#n#1 hypo color#n#1 deri color#v#1 hypo change#v#1 hype better#v#2 hype mend#v#1
Found at distance 7 (116211 nodes explored)
red#a#1 deri red#a#1 deri red#n#1 hypo chromatic_color#n#1 hypo color#n#1 deri color#v#1 hypo change#v#1 hype better#v#2 hype mend#v#1
Found at distance 7 (116325 nodes explored)
ruddy#a#2 deri ruddiness#n#1 hypo complexion#n#1 hypo color#n#1 deri color#v#1 hypo change#v#1 hype better#v#2 hype mend#v#1
WordNet: programmed search
$ perl bfs4.perl mend#v#1 red#n#3
Found at distance 6 (49389 nodes explored)
Bolshevik#n#1 hypo radical#n#3 hypo person#n#1 hype changer#n#1 deri change#v#1 hype better#v#2 hype mend#v#1
Found at distance 6 (84143 nodes explored)
Bolshevik#n#1 hypo radical#n#3 hypo person#n#1 hype worker#n#1 hype skilled_worker#n#1 hype mender#n#1 deri mend#v#1
All minimal solutions found
Homework 6
• Question 1: Try to find the shortest-distance links between each of planet, star, eagle vs. telescope. (Make sure you have the right word sense.) How many are there?
• Question 2: Draw a (merged) graph of the semantic relations found.
• Question 3: Are any of the chains of semantic relations what you expect?
• Question 4: Is the chain useful? Why or why not?
• Question 5: What do you think the shortest connection linking star and telescope should look like? How about eagle and telescope?
Cosine Similarity
• Using word vectors acquired from large corpora: GloVe (Stanford), word2vec (Google); Python: gensim etc.
• e.g. vec('Rome') is the closest vector to vec('Paris') − vec('France') + vec('Italy')
• Examples (50-dimensional vectors):
• telescope: [1.5667, 1.1436, 1.6432, 0.2347, -0.57751, -0.29565, -0.78965, -0.95205, -0.097776, -0.31729, 0.82443, 0.27591, 0.70094, 1.2939, -1.1032, 1.0748, -0.21654, 0.44433, -1.854, -0.50952, -0.1966, -0.050295, -0.75702, -1.4179, 1.1795, -0.29231, -0.61232, 0.40963, -0.79731, 0.02117, 0.57397, -0.6336, -0.13071, -1.1153, -0.5656, -0.20496, 0.34324, 1.1626, 0.19703, -0.76862, 1.1381, 0.019043, 0.10676, 0.46047, -0.50555, -0.26049, 1.1725, -0.049478, -0.71014, 0.19022]
• star: [-0.21025, 1.6081, 0.037375, 1.0411, 0.61061, 0.064748, -0.93674, -0.030028, -0.18348, 0.73875, 0.65025, 0.75496, -0.73316, 0.95964, 0.89172, -0.10495, 0.11496, 0.30448, -1.4942, -0.036297, -0.95949, 0.41062, -0.23896, 0.40387, -0.32893, -1.5343, -0.45627, 0.109, -0.41474, -0.57094, 2.1997, 0.47089, 0.56732, -0.16914, 0.43481, 0.40459, -0.007678, -0.22073, -0.33289, -1.0992, 0.33632, 1.3412, -0.34081, -0.50183, -0.2514, -0.10199, 0.19292, -0.48934, -0.41793, 0.18085]
• potato: [-0.063054, -0.62636, -0.76417, -0.041484, 0.56284, 0.86432, -0.73734, -0.70925, -0.073065, -0.74619, -0.34769, 0.14402, 1.4576, 0.034688, 0.11224, 0.13854, 0.10484, 0.60207, 0.021777, -0.21802, 0.087613, -1.4234, 1.0361, 0.1509, 0.13608, -0.2971, -0.90828, 0.34182, 1.3367, 0.16329, 1.2374, -0.20113, -0.91532, 1.4222, -0.1276, 0.69443, -1.1782, 1.2072, 1.0524, -0.11957, -0.1275, 0.41798, -0.9232, -0.1312, 1.2696, 1.2318, 0.30061, -0.18854, 0.15899, 0.0486]
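The Rome/Paris analogy works by vector arithmetic plus a nearest-neighbor search under cosine similarity. A minimal self-contained sketch of the mechanics, using tiny hand-made 3-dimensional vectors (the words and values are made up for illustration; real GloVe/word2vec vectors are 50-300 dimensional, as above):

```python
import math

# Toy 3-d "embeddings", invented for illustration only.
vecs = {
    "paris":  [1.0, 0.9, 0.1],
    "france": [0.9, 0.1, 0.1],
    "rome":   [0.2, 0.9, 0.8],
    "italy":  [0.1, 0.1, 0.8],
    "potato": [0.5, 0.0, 0.0],
}

def cosine(a, b):
    """cos(theta) between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """Word whose vector is closest (by cosine) to vec(a) - vec(b) + vec(c),
    excluding the three query words themselves."""
    target = [x - y + z for x, y, z in zip(vecs[a], vecs[b], vecs[c])]
    candidates = (w for w in vecs if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vecs[w], target))

print(analogy("paris", "france", "italy"))  # rome, in this toy space
```

In libraries like gensim the same operation is done over the full vocabulary of a trained model rather than a five-word dictionary.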
Cosine Similarity
• Visualization: [plot of the word vectors for potato, star, telescope]
Cosine Similarity http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/
Cosine Similarity
• Vectors A, B and cos(θ): (Wikipedia)
• Python (scipy's mat and dot were re-exports of numpy and are deprecated; use numpy directly):

import numpy as np

m1 = np.atleast_2d(A)  # row vector
m2 = np.atleast_2d(B)  # row vector
m12 = (m1 @ m2.T) / (np.linalg.norm(m1) * np.linalg.norm(m2))
Cosine Similarity
• Let x = (x1,…,xn) and y = (y1,…,yn)
• Define the dot product x·y = ∑i xi yi
• Norm: ‖x‖ = √(∑i xi²) = √(x·x)
• Let a, b be nonzero vectors:
• ‖a−b‖² = ‖a‖² + ‖b‖² − 2‖a‖‖b‖cos θ (law of cosines)
• But ‖a−b‖² = (a−b)·(a−b) = a·a − 2a·b + b·b = ‖a‖² − 2a·b + ‖b‖²
• Equating the two: a·b = ‖a‖‖b‖cos θ
• a·b = 0 means θ = 90° (orthogonal)
[Figure: triangle with sides a and b meeting at angle θ, third side a−b]
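The identity a·b = ‖a‖‖b‖cos θ can be checked numerically for a concrete pair of vectors (the values below are chosen arbitrarily for illustration):

```python
import math

# Two example 2-d vectors: a along the x-axis, b at 45 degrees.
a = [3.0, 0.0]
b = [1.0, 1.0]

dot = sum(x * y for x, y in zip(a, b))             # a.b
na = math.sqrt(sum(x * x for x in a))              # ||a||
nb = math.sqrt(sum(x * x for x in b))              # ||b||
theta = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])  # angle between them (2-d)

# Both sides of the identity agree (up to floating-point error):
print(dot, na * nb * math.cos(theta))  # both approximately 3.0
```

Rearranged, cos θ = a·b / (‖a‖‖b‖) is exactly the cosine similarity used throughout this section.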
Cosine Similarity
• Triangle: law of cosines c² = a² + b² − 2ab cos θ
• Proof:
• Place the points at C = (0,0), B = (a,0), A = (b cos θ, b sin θ)
• By Pythagoras:
• c² = (a − b cos θ)² + (b sin θ)²
• c² = a² − 2ab cos θ + b²cos²θ + b²sin²θ
• c² = a² − 2ab cos θ + b²(cos²θ + sin²θ)
• c² = a² − 2ab cos θ + b²
[Figure: triangle with sides a, b, c and angle θ at vertex C (Wikipedia)]
Examples
• Code adapted from: https://github.com/adventuresinML/adventures-in-ml-code/blob/master/tf_word2vec.py
• Training on text8: http://mattmahoney.net/dc/textdata.html
• first 10⁸ bytes of fil9, the cleaned-up version of enwik9, the first 10⁹ bytes of the English Wikipedia dump of Mar. 3, 2006.
• clean-up: remove meta-data, hypertext links, citations, footnotes. Also case-fold, spell out numbers, convert out-of-band characters (non a-z) to blanks, etc.
• Skip-gram model
• n = 2 or 4 (context window on either side of the target word)
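The skip-gram training data is just (target, context) word pairs drawn from a window of n words on either side of each target. A minimal sketch of the pair generation, assuming a pre-tokenized corpus (the function name and sample sentence are made up for illustration; the linked tf_word2vec.py does this with batching and subsampling):

```python
def skipgram_pairs(tokens, n=2):
    """Generate (target, context) pairs with a window of n words
    on either side of each target, clipped at sentence edges."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - n), min(len(tokens), i + n + 1)
        for j in range(lo, hi):
            if j != i:  # the target is not its own context
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "quick", "brown", "fox"], n=2))
```

With n = 2, "brown" yields the pairs (brown, the), (brown, quick), (brown, fox); the model is then trained to predict the context word from the target word.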
Examples • Embedding: 300, skip window: 4, vocab size: 20,000 (others: UNK) • Filename: text8.zip, #words: 17,005,207 • Nearest to the: • regulate, camelot, anymore, mutants, lowlands, thorn, irene, ax • and, of, a, UNK, in, to, one, nine • a, and, UNK, in, of, one, to, zero • a, UNK, and, of, in, two, is, one • a, one, and, UNK, zero, in, s, two • a, UNK, of, in, one, two, s, and • and, a, s, in, ursus, of, UNK, one • a, UNK, ursus, and, s, three, one, six • a, one, ursus, seven, three, six, four, UNK • a, ursus, UNK, s, of, this, and, in
Examples • Embedding: 300, skip window: 4, vocab size: 20,000 (others: UNK) • Nearest to have: • shrink, generalization, scandinavia, cards, approval, diplomatic, bus, bog • UNK, the, and, cards, generalization, approval, to, scandinavia • and, voter, to, generalization, in, a, cards, UNK • and, that, UNK, shrink, voter, the, in, is • that, and, in, voter, coke, shrink, generalization, cards • that, and, are, in, it, is, by, two • ursus, that, and, are, in, with, it, be • ursus, that, are, be, and, with, by, has • are, ursus, be, that, has, in, with, by • are, has, that, be, ursus, had, and, with
Examples • Embedding: 300, skip window: 4, vocab size: 10,000 (others: UNK) • Nearest to nine: • dust, owner, regain, freedom, party, gained, playstation, himself • zero, in, UNK, of, and, the, one, coke • one, zero, eight, two, in, and, six, the • one, eight, zero, two, six, three, seven, five • eight, zero, one, two, six, three, seven, five • eight, one, seven, zero, two, six, three, five • eight, seven, six, four, one, three, five, zero • eight, seven, six, one, five, four, zero, three • eight, seven, six, four, one, five, three, zero • eight, seven, six, four, five, one, three, zero
Examples • Embedding: 300, skip window: 4, vocab size: 20,000 (others: UNK) • Nearest to some: • groove, bram, cavitation, wickets, respect, wtoo, sticky, anatolia • alien, a, in, the, of, wickets, and, respect • a, alien, UNK, of, zero, and, the, wickets • alien, a, the, of, and, wickets, zero, UNK • a, alien, zero, and, two, the, or, groove • or, UNK, a, alien, two, six, in, and • or, two, and, a, the, ursus, alien, are • and, or, ursus, are, that, alien, the, from • or, are, ursus, other, and, that, two, UNK • or, the, are, other, and, two, many, ursus
Examples • Embedding: 300, skip window: 4, vocab size: 20,000 (others: UNK) • Nearest to american: • ways, practitioners, hexadecimal, tito, confirming, damascus, sharply, roof • phi, ways, practitioners, legislatures, halley, whole, mughal, UNK • phi, ways, practitioners, legislatures, tito, one, halley, roof • phi, one, and, legislatures, ways, practitioners, tito, eight • phi, the, zero, UNK, and, legislatures, in, ways • one, nine, phi, UNK, two, six, by, three • UNK, and, nine, by, ursus, phi, zero, the • nine, and, UNK, callithrix, ursus, s, phi, six • nine, in, of, and, UNK, callithrix, ursus, phi • nine, UNK, in, callithrix, ursus, and, one, of