Making Reports

    def counts(s, log, trim=100000000):
        """Print how often each value of column s occurs in the log, most frequent first."""
        names = ["Date", "Process", "User", "ADD/SUB", "Subject", "Predicate", "Object"]
        ix = names.index(s)  # position of the requested column
        print("Counts of " + s)
        things = {}
        for row in log:
            try:
                thing = row[ix]
            except IndexError:
                continue  # skip rows too short to hold this column
            things[thing] = things.get(thing, 0) + 1
        # Show at most trim values, most frequent first
        for i, thing in enumerate(sorted(things, key=things.get, reverse=True)):
            if i >= trim:
                break
            print("%s %s" % (thing, things[thing]))

Figure 2. Counting function from logging script

Why Python?

Python [1,2] is a popular, easy-to-learn language that is very well suited for use with VIVO and the semantic web. Python is available for Mac, Windows and Linux, has simple procedural syntax, and has clear syntax for object-oriented development. Python is trivial to install, and standard installations include an integrated development environment, IDLE. Python is open source and has a strong development community.

Many libraries are included in the standard distribution, and many more are available through standard Python archives. Installing an additional library typically requires a single command.

Python has a very short learning curve. Experienced programmers can install Python and write programs on their first day. Python has outstanding support for data structures, the Internet, exception handling, XML, string manipulation, CSV files, and interaction with other systems. Python is very fast to compile and execute. A 200,000-line Excel CSV file can be read and summarized in a few seconds.

Adding Papers to VIVO

Python scripts read BibTeX, use SPARQL calls to find available VIVO URIs, and use templates to generate RDF. UF authors are identified and papers are linked to their profiles. Journals, authors and publishers are created if needed. Python string functions improve and standardize text. Reports summarize the actions taken.
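
The remark that Python string functions improve and standardize text can be illustrated with a minimal sketch. The function name standardize_title and the short list of lowercase words are assumptions made for this example; they are not the rules used by the UF scripts. The sketch is written for Python 3.

```python
import re

# Words kept lowercase inside a title (an assumed, deliberately short list).
SMALL_WORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "of",
               "on", "or", "the", "to", "via", "with"}


def standardize_title(raw):
    """Collapse whitespace, drop a trailing period, and title-case the words."""
    text = re.sub(r"\s+", " ", raw).strip().rstrip(".")
    words = []
    for i, word in enumerate(text.split(" ")):
        lower = word.lower()
        if i > 0 and lower in SMALL_WORDS:
            # Small words stay lowercase unless they start the title.
            words.append(lower)
        else:
            # Capitalize each hyphen-separated part, e.g. "d-band" -> "D-Band".
            words.append("-".join(part.capitalize() for part in word.split("-")))
    return " ".join(words)


if __name__ == "__main__":
    print(standardize_title("a D-BAND passive imager  in 65 nm CMOS."))
```

Under these assumed rules the input above prints as "A D-Band Passive Imager in 65 Nm Cmos". Plain capitalization flattens acronyms such as CMOS, so a fuller rule set would probably need an exception list.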

Useful Libraries

Some useful Python libraries for use with VIVO:

Pybtex – read BibTeX files into Python structures
Tempita – simple, flexible templates
csv – read and write CSV files
minidom – read, manage, write XML data
re – regular expressions in Python
datetime – ISO standard datetime processing
pyRTF – make RTF documents
urllib – create URLs, fetch web content
Entrez – query, read, process PubMed files
rdflib – tools for working with RDF
Vivotools – UF tools for SPARQL query, generate VIVO URIs

    Logs for week of 2012-08-09 have 342789 entries

    Top five users
    dsr 236086
    aa238@ufl.edu 49853
    people 20139
    ankitbaderiya@ufl.edu 14403
    nettiepa@ufl.edu 14385

    Counts of Process
    HARVEST 258296
    MANUAL 84442

    Counts of ADD/SUB
    ADD 219256
    SUB 123482

    Top five subjects
    <http://vivo.ufl.edu/individual/n32260> 338
    <http://vivo.ufl.edu/individual/n466171> 320
    <http://vivo.ufl.edu/individual/n116751> 315
    <http://vivo.ufl.edu/individual/n7023308> 276
    <http://vivo.ufl.edu/individual/n121504> 269

    Top five predicates
    <http://vivo.ufl.edu/ontology/vivo-ufl/dateHarvested> 99709
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 72817
    <http://vivoweb.org/ontology/core#dateTimeInterval> 35856
    <http://vivoweb.org/ontology/core#start> 22458
    <http://vivoweb.org/ontology/core#end> 22456

    Top five Objects
    2012-08-10-04:00 45157
    2012-07-13-04:00 44927
    <http://www.w3.org/2002/07/owl#Thing> 22740
    <http://vivoweb.org/ontology/core#DateTimeInterval> 22454
    Python Pubs version 1.0 9573

Figure 1. Log reports from Python script

Mo, J, Ding, M, Maizels, M, Ahn, A H, "A Brain Representation of Persistent Throbbing in a Patient With Chronic Migraine: Evidence for the Modulation of Attention and Sensory Processing", Headache, 52, 2012, pp 901
VIVO uri http://vivo.ufl.edu/individual/n7302907706

Gu, Qun Jane, Tang, Adrian, Xu, Zhiwei, Chang, Mau-Chung Frank, "A D-Band Passive Imager in 65 Nm Cmos", IEEE Microwave and Wireless Components Letters, 22, 2012, pp 263-265.
doi: 10.1109/LMWC.2012.2192720
VIVO uri http://vivo.ufl.edu/individual/n7458921860

Gurbuz, Feyza, Pardalos, Panos M, "A Decision Making Process Application for the Slurry Production in Ceramics Via Fuzzy Cluster and Data Mining", Journal of Industrial and Management Optimization, 8, 2012, pp 285-297.
doi: 10.3934/jimo.2012.8.285
VIVO uri http://vivo.ufl.edu/individual/n9478748156

Zager, Jonathan S, Chai, Christy Y, Beasley, Georgia M, Deneve, Jeremiah L, Chen, Y Ann, Marzban, Suroosh S, Grobmyer, Stephen R, Rawal, Bhupendra, Tyler, Douglas S, Hochwald, Steven N, "A Multi-Institutional Experience of Repeat Regional Chemotherapy for Recurrent Melanoma of Extremities", Annals of Surgical Oncology, 19, 2012, pp 1637-1643.
doi: 10.1245/s10434-011-2151-z
VIVO uri http://vivo.ufl.edu/individual/n1706363420

Figure 5. Papers being added to VIVO

Use of Python for VIVO Application Programming

Getting Started

Visit www.python.org, download Python and click to install.
Get a good, quick-read Python book.
Spend a day writing code.
Spend a day studying code examples.
Write something simple and make it work.
Write something more sophisticated.
Ask questions.
Use Google to find Python examples and additional libraries.
Use libraries to build on existing functions.

The UF code examples use Python 2.7.3. We use Python 2.7.3 because it is supported by Google App Engine [4]. Using Google App Engine, you can create online Python websites and applications on Google infrastructure at no cost.

Adding People to VIVO

From a spreadsheet, Python can generate RDF to add people to VIVO, linking them to their home department. Once people are in VIVO and identified by UFID, subsequent scripts can attach grants, papers, photos, courses taught, and positions held. See the UF Implementation Poster [3] for additional information on the processes used at UF to generate VIVO data representing scholarship at UF.
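
The spreadsheet-to-RDF step described above might be sketched as follows. This is an illustration under stated assumptions, not the UF ingest code: the column names (ufid, name, dept), the example.org URIs and predicates, and the rule for building a person URI from the UFID are all hypothetical, and the standard library's string.Template stands in for a template library such as Tempita. The sketch is written for Python 3 and emits N-Triples.

```python
import csv
import io
from string import Template

# Hypothetical N-Triples template. rdf:type, rdfs:label and foaf:Person are
# real vocabulary terms; the example.org predicates are placeholders.
PERSON_TEMPLATE = Template(
    '<$person_uri> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://xmlns.com/foaf/0.1/Person> .\n'
    '<$person_uri> <http://www.w3.org/2000/01/rdf-schema#label> "$label" .\n'
    '<$person_uri> <http://example.org/ontology#ufid> "$ufid" .\n'
    '<$person_uri> <http://example.org/ontology#homeDept> <$dept_uri> .\n'
)


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def people_to_rdf(csv_file, dept_uris, base="http://example.org/individual/"):
    """Return (N-Triples text, list of UFIDs skipped for an unknown department)."""
    triples, skipped = [], []
    for row in csv.DictReader(csv_file):
        # Dictionary look-up of the department URI; no query per row.
        dept_uri = dept_uris.get(row["dept"])
        if dept_uri is None:
            skipped.append(row["ufid"])
            continue
        triples.append(PERSON_TEMPLATE.substitute(
            person_uri=base + "p" + row["ufid"],  # assumed URI scheme
            label=escape_literal(row["name"]),
            ufid=row["ufid"],
            dept_uri=dept_uri))
    return "".join(triples), skipped


if __name__ == "__main__":
    # Made-up sample rows standing in for a spreadsheet saved as CSV.
    sheet = io.StringIO(
        'ufid,name,dept\n'
        '12345678,"Doe, Jane",Statistics\n'
        '87654321,"Roe, Rick",Unknown Dept\n')
    depts = {"Statistics": "http://example.org/individual/dept-statistics"}
    rdf, skipped = people_to_rdf(sheet, depts)
    print(rdf)
    print("Skipped:", skipped)
```

Returning the skipped UFIDs alongside the RDF gives a follow-up report something to list, in the spirit of the reports that summarize actions taken.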

Python and VIVO

Simple Python functions can make SPARQL queries, and template libraries can be used to make RDF. Python associative arrays (dictionaries) can store data from VIVO and provide extremely efficient look-up. A single query can return all people in VIVO, who can then be placed in a dictionary. Subsequent code can refer to the dictionary without having to make additional queries. Python code can quickly retrieve RDF, parse it, find additional URIs and retrieve additional RDF, thereby following semantic web graphs and identifying data properties and values.

At the University of Florida, Python is used to report from VIVO logs, generate web pages, and create RDF for ingest of people, papers, and positions held. The techniques demonstrated here can be used to ingest and report on any data in VIVO. Code from these examples will be available at VIVO repositories.

Making Web Pages

Figure 3. Web page from VIVO data

Figure 4. Complete code for web page

References

[1] Python Home Page. www.python.org. Accessed 8/16/2012.
[2] Ceder, VL. The Quick Python Book, 2nd ed. Greenwich, CT: Manning, 2010, 336 pgs. ISBN 9781935182207.
[3] Conlon, M, Barnes, CP, Sposato, V, Rejack, N, Schmidt, E, Collante, W, Guazzelli, L, Williams, S, Raum, N. Implementation of VIVO at the University of Florida. Conference Poster, VIVO 2012, Miami, FL.
[4] Google App Engine Home Page. https://appengine.google.com. Accessed 8/16/2012.

Figure 6. A profile created by Python software

Mike Conlon, Nicholas Rejack and Laura Guazzelli
UF Clinical and Translational Science Institute, University of Florida