Identification of people
Bc. Anton Balucha, http://www.tonyb.sk/
• Assignment for the Information Retrieval course
Motivation
• search engine results
• a lot of information about many people
• scattered, not integrated
Anton Balucha - Identification of people
Task
Create an application that identifies occurrences of a person on various web sites.
Existing solutions
• http://www.pipl.com (easy to use, clear list of results)
• http://www.zabasearch.com (searches for people in the USA only)
• http://www.wink.com (searches for people on social networks)
• http://www.people.yahoo.com (searches for people by entered parameters: name, surname, town, state, e-mail)
• https://addons.mozilla.org/sk/firefox/addon/3167 (plugin for the Firefox browser)
• http://www.peoplesearch.com (searches for people in the USA only, within a given state)
• http://www.peekyou.com (searches for people on various portals: Google+, Wikipedia, LinkedIn, Flickr, Twitter)
• http://www.123people.com (searches for people on various portals: Google+, Wikipedia, LinkedIn, Flickr, Twitter)
• http://www.bestpeoplesearch.com (searches for people in the USA only, within a given state; option to hire a person to do the search)
Description of solution - architectural
• programmed in Java
• web application
• available at http://www.tonyb.sk/
• no static data
• actively uses results from search engines
Description of solution - implementation
Diagram of the processing steps:
• Google results
• web pages
• remove diacritics
• remove stop words
• remove HTML
• stemming
• TF-IDF
• identify keywords
• show results
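The cleaning steps from the diagram can be sketched in Java, the language the application was written in. This is a minimal illustration, not the author's code: the class and method names, the tiny stop-word list, and the order (HTML first, then diacritics, then stop words) are assumptions, because the exact order of the diagram is not recoverable. HTML is stripped with regular expressions rather than a real parser, diacritics are removed by Unicode NFD decomposition, and stemming is left out (the Porter stemmer is the usual choice for English text).

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TextPreprocessor {

    // Hypothetical, deliberately tiny stop-word list (a few English and Slovak words,
    // written without diacritics because they are matched after diacritics removal).
    private static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "and", "at", "in", "is", "of", "on", "or", "the", "to",
            "je", "na", "sa", "v", "z");

    /** Removes script/style blocks, tags and named entities with regexes (not a full HTML parser). */
    static String removeHtml(String html) {
        return html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("<[^>]+>", " ")
                .replaceAll("&[a-zA-Z]+;", " ");
    }

    /** Decomposes characters (NFD) and drops the combining marks, e.g. "Bieliková" -> "Bielikova". */
    static String removeDiacritics(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    /** Removes HTML, then diacritics, lower-cases, splits into ASCII tokens and drops stop words. */
    static List<String> preprocess(String html) {
        String text = removeDiacritics(removeHtml(html)).toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String page = "<p>Mária Bieliková is a professor at <b>FIIT STU</b></p>";
        System.out.println(preprocess(page)); // [maria, bielikova, professor, fiit, stu]
    }
}
```

Removing diacritics before matching means that "Bieliková" and "Bielikova" map to the same token, which matters when the same name is typed differently on different web sites.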
Used data
• Anton Balucha
• Mária Bieliková
• Pavol Návrat
• Peter Borga
• Petra Majzúnová
• Miloš Blaško
Sample output
Anton Balucha
• http://www2.fiit.stuba.sk/research/pewe/program-2008-2009/
• http://dlznik.zoznam.sk/socialna-siet/anton-balucha-1
• http://dlznik.zoznam.sk/socialna-siet/anton-balucha-clen-1
• http://sk.linkedin.com/pub/anton-balucha/36/52/42a
Mária Bieliková
• http://www2.fiit.stuba.sk/~bielik/
• http://www2.fiit.stuba.sk/~bielik/books/index.html
• http://www.fhv.umb.sk/app/user.php?ACTION=PUBLICATION&user=bielikova.maria
• http://mariabielik.zenfolio.com/
• http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/b/Bielikov=aacute=:M=aacute=ria.html
Peter Borga
• http://en-gb.facebook.com/peterborga
• http://www.facebook.com/peter.borgneal
• http://uk.linkedin.com/in/peterborgneal
• http://peter-borg.com.au/
Improvement
• better text processor
• better stemming
• better keyword identification
• choosing just the right number of keywords
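Keyword identification with TF-IDF, with the number of keywords exposed as an explicit parameter k, might look like the sketch below. It assumes the pages are already preprocessed into token lists and uses one common weighting, raw term count times ln(N/df); the application may have used a different TF-IDF variant. The class and method names are hypothetical, and the toy token lists are illustrative, not data from the application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TfIdfKeywords {

    /**
     * Returns at most k keywords of document d, ranked by tf * idf with
     * idf = ln(N / df). Terms with score 0 (present in every document) are
     * skipped; ties are broken alphabetically so the output is deterministic.
     * Assumes k >= 0.
     */
    static List<String> topKeywords(List<List<String>> docs, int d, int k) {
        // Document frequency: in how many documents each term occurs.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        // Raw term frequency inside the chosen document.
        Map<String, Integer> tf = new HashMap<>();
        for (String term : docs.get(d)) {
            tf.merge(term, 1, Integer::sum);
        }
        // Score every term of document d, keeping only positive scores.
        int n = docs.size();
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) n / df.get(e.getKey()));
            double s = e.getValue() * idf;
            if (s > 0) {
                score.put(e.getKey(), s);
            }
        }
        // Highest score first, then alphabetical; cut the list at k.
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        // Toy token lists standing in for three preprocessed web pages.
        List<List<String>> docs = List.of(
                List.of("balucha", "fiit", "student", "pewe", "fiit"),
                List.of("balucha", "linkedin", "profile"),
                List.of("bielikova", "fiit", "professor"));
        System.out.println(topKeywords(docs, 0, 3)); // [pewe, student, fiit]
    }
}
```

Here k is the knob behind "just the right number of keywords": a small k keeps only the most distinctive terms, while a large k also returns weakly weighted ones such as "balucha", which occurs on two of the three pages.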
In the End…
I learned
• what stemming and lemmatization are
• what TF-IDF is
• what precision and recall are
• how interesting text research is
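For this task, precision and recall compare the set of URLs returned for a person with the set of URLs that really belong to that person: precision = |relevant ∩ retrieved| / |retrieved| and recall = |relevant ∩ retrieved| / |relevant|. A minimal sketch follows; the class name and the URL identifiers u1 to u7 are hypothetical placeholders, not real evaluation data.

```java
import java.util.HashSet;
import java.util.Set;

public class PrecisionRecall {

    /** precision = |relevant ∩ retrieved| / |retrieved| (0 if nothing was retrieved) */
    static double precision(Set<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) return 0.0;
        return (double) intersectionSize(retrieved, relevant) / retrieved.size();
    }

    /** recall = |relevant ∩ retrieved| / |relevant| (0 if there is nothing relevant) */
    static double recall(Set<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        return (double) intersectionSize(retrieved, relevant) / relevant.size();
    }

    /** Size of the intersection, computed on a copy so the inputs stay unchanged. */
    private static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        // Hypothetical judgement: 4 URLs returned, 3 of them correct; 6 correct URLs exist.
        Set<String> retrieved = Set.of("u1", "u2", "u3", "u4");
        Set<String> relevant = Set.of("u1", "u2", "u3", "u5", "u6", "u7");
        System.out.println(precision(retrieved, relevant)); // 0.75
        System.out.println(recall(retrieved, relevant));    // 0.5
    }
}
```

The two measures pull in opposite directions: returning more URLs per person tends to raise recall but lower precision.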
Installation of the application
• installation of Java
• installation of Apache Tomcat
• deploy external applications
• access to the Internet
• access to the application