Parsing KML with Python
• More markup languages: XML, KML
• Digesting KML soup
• KML to shp, better than Arc
Dr. Tateosian
Review built-in 'zip' function
    listA = ['FID', 'Shape', 'COVER', 'RECNO']
    listB = ['OID', 'Geometry', 'String']
    for a, b in zip(listA, listB):
        print "a: ", a, " b: ", b
Output:
    a:  FID  b:  OID
    a:  Shape  b:  Geometry
    a:  COVER  b:  String
• zip stops at the end of the shorter list, so 'RECNO' has no partner and is not printed.
What is XML?
• XML stands for EXtensible Markup Language.
• XML is a markup language much like HTML.
• XML was designed to carry data, not to display data.
• XML tags are not predefined. You must define your own tags.
• XML is designed to be self-descriptive (example file: letter.xml).
• XML was designed to transport and store data, with a focus on data content.
• HTML was designed to display data, with a focus on how data looks.
• XML documents have a root element and a tree structure:
    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>
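The root-and-tree structure can be seen by walking a small document with Python's standard xml.etree.ElementTree module. This is a minimal sketch, written for Python 3. The tag names are the placeholders from the slide; the text 'hello' and the helper name tag_paths are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder document mirroring the slide; 'hello' is invented sample text.
XML_TEXT = "<root><child><subchild>hello</subchild></child></root>"


def tag_paths(element, prefix=""):
    """Return the slash-separated path of every element in the tree."""
    path = prefix + "/" + element.tag
    paths = [path]
    for child in element:  # iterating an Element yields its direct children
        paths.extend(tag_paths(child, path))
    return paths


root = ET.fromstring(XML_TEXT)
paths = tag_paths(root)
for p in paths:
    print(p)
```

Each printed path is one node of the tree, starting at the root element and ending at the innermost subchild.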
KML == Keyhole Markup Language
• KML: an XML-based markup language for annotating and overlaying visualizations on 2D online maps or 3D Earth browsers (such as Google Earth).
• Example: a GPS-generated KML file with information about driving to sports events (M. Kanters).
Simple KML code example
• An XML header.
  • This is line 1 in every KML file.
  • No spaces or other characters can appear before this line.
• A KML namespace declaration.
  • This is line 2 in every KML 2.2 file.
• A Placemark object that contains the following elements:
  • A name that is used as the label for the Placemark.
  • A description that appears in the "balloon" attached to the Placemark.
  • A Point that specifies the position of the Placemark on the Earth's surface (longitude, latitude, and optional altitude).
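The three parts listed above can be seen in a small KML document held in a Python string. This is a sketch, not the file shown in class: the placemark values are borrowed from the restaurant placemark used elsewhere in these slides (with the description shortened), and the parsing uses the standard xml.etree.ElementTree module on Python 3 rather than BeautifulSoup.

```python
import xml.etree.ElementTree as ET

# Line 1 is the XML header, line 2 the KML 2.2 namespace declaration.
# A bytes literal is used so nothing precedes the header.
KML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Bubba's Tofu Gumbo</name>
    <description>Tofu Gumbo and Zydeco!</description>
    <Point>
      <coordinates>-90,30,0</coordinates>
    </Point>
  </Placemark>
</kml>
"""

# Tags live in the KML namespace, so lookups need a prefix mapping.
NS = {"kml": "http://www.opengis.net/kml/2.2"}

root = ET.fromstring(KML_BYTES)
placemark = root.find("kml:Placemark", NS)
name = placemark.find("kml:name", NS).text
description = placemark.find("kml:description", NS).text
coord_text = placemark.find("kml:Point/kml:coordinates", NS).text

# Coordinates are ordered longitude, latitude, altitude.
lon, lat, alt = [float(v) for v in coord_text.strip().split(",")]
print(name, description, lon, lat, alt)
```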
In class -- Exploring KML soup
• What data type is t?
• What does the 'find' method do?
• What's the difference between t.contents and t.contents[0]?
• What does a description tag contain?
• How many placemarks are in this file? 2
• What does a placemark tag contain?
Exploring KML soup
• What data type is t?
    >>> t = soup.find('description')
    >>> type(t)
    <class 'BeautifulSoup.Tag'>
  A Tag has properties such as 'name', 'attrs', and 'contents'.
• What does the 'find' method do?
  Returns a 'BeautifulSoup.Tag' object for the first occurrence of the tag name.
• What's the difference between t.contents and t.contents[0]?
    >>> t.contents  # a list
    [u'Tofu Gumbo and Zydeco!', <br />, u'Score: 97']
    >>> t.contents[0]  # a string containing the first content item
    u'Tofu Gumbo and Zydeco!'
• What does a description tag contain?
    <description>Tofu Gumbo and Zydeco!<br />Score: 97</description>
• How many placemarks are in this file? 2
• What does a placemark tag contain?
    <placemark><name>Bubba's Tofu Gumbo</name><description>Tofu Gumbo and Zydeco!<br />Score: 97</description><point><coordinates>-90,30,0</coordinates></point></placemark>
Parsing kml
• ArcGIS conversion results: the 'PopupInfo' field contains all of the 'description' tag contents.
• Desired results: the 'blurb' and 'score' fields contain the parts of the 'description' tag contents.
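Splitting one description into a blurb and a score might look like the sketch below. It assumes every description has the form seen in these slides (text, a br tag, then 'Score: NN') and uses a regular expression on the raw string instead of the BeautifulSoup contents list; the function name split_description is hypothetical. Written for Python 3.

```python
import re


def split_description(text):
    """Split 'blurb<br />Score: NN' into (blurb, score)."""
    # Split on <br>, <br/> or <br />, ignoring case.
    parts = re.split(r"<br\s*/?>", text, flags=re.IGNORECASE)
    blurb = parts[0].strip()
    score = None  # stays None when no 'Score: NN' piece is found
    for part in parts[1:]:
        match = re.search(r"Score:\s*(\d+)", part)
        if match:
            score = int(match.group(1))
    return blurb, score


blurb, score = split_description("Tofu Gumbo and Zydeco!<br />Score: 97")
print(blurb)
print(score)
```

Storing the score as an integer rather than text is a design choice here; it lets the shapefile field be numeric.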
Parse kml example • Output:
Approach I: Convert KML to shapefile
    CALL the Create Feature Class (management) tool
    SET the field names and types
    FOR each field name
        CALL the Add Field (management) tool
    CREATE a soup object from the kml file contents
    GET tag lists from the soup (findAll)
    CREATE an insert cursor
    FOR each item in the tag lists
        GET the value for each field
        PUT field values in a list
        INSERT the new row into the shapefile
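The final loop of the pseudocode above (walk the tag lists together, collect the field values for each placemark in a list) can be sketched with zip. The ArcGIS steps (Create Feature Class, Add Field, the insert cursor) are left out so the sketch runs without ArcGIS, and plain strings stand in for the tags that findAll would return. The first placemark comes from the slides; the second is invented for illustration, as is the function name build_rows. Written for Python 3.

```python
# Stand-ins for the tag lists from findAll; plain strings, not soup tags.
# The second entry in each list is invented sample data.
names = ["Bubba's Tofu Gumbo", "Hypothetical Cafe"]
descriptions = ["Tofu Gumbo and Zydeco!<br />Score: 97",
                "Made-up blurb<br />Score: 50"]
coordinates = ["-90,30,0", "-80,35,0"]


def build_rows(names, descriptions, coordinates):
    """Return one [name, blurb, score, lon, lat] list per placemark."""
    rows = []
    for name, desc, coords in zip(names, descriptions, coordinates):
        # Assumes the 'blurb<br />Score: NN' layout shown in the slides.
        blurb, _, score_text = desc.partition("<br />")
        score = int(score_text.replace("Score:", "").strip())
        lon, lat = [float(v) for v in coords.split(",")[:2]]
        rows.append([name, blurb, score, lon, lat])
    return rows


rows = build_rows(names, descriptions, coordinates)
for row in rows:
    print(row)  # in the full script, each row would go to the insert cursor
```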
Driving Report example • Parsing kml with Python
Approach II: Convert KML to shapefile
1. Parse KML placemarks.
  • Find all placemarks.
  • FOR EACH point placemark:
    • Create a point object.
    • Get the point name.
    • Get the point description.
    • Parse the point description (get date, time, etc.).
    • Add point object to point list.
2. Create output shapefile.
  • Create empty point shapefile.
  • Add fields (name, date, time, etc.).
  • Create insert cursor.
  • FOR EACH point in list:
    • Create a new row.
    • Set row properties based on current point.
  • Delete insert cursor.
Types of placemarks in driving report KML
1. Point placemarks
2. LineString placemarks
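Step 1 of Approach II (find all placemarks, keep only the point placemarks, and build a list of point objects) could be sketched as follows. The KML below is invented sample data, not the driving report, and the class name DrivePoint and function name parse_point_placemarks are hypothetical. Parsing the description into date and time is left out because the description layout is not shown here. The sketch uses xml.etree.ElementTree on Python 3.

```python
import xml.etree.ElementTree as ET

NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Invented sample: one Point placemark and one LineString placemark.
KML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Stop 1</name>
      <description>sample description</description>
      <Point><coordinates>-78.6,35.8,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Route</name>
      <LineString><coordinates>-78.6,35.8,0 -78.7,35.9,0</coordinates></LineString>
    </Placemark>
  </Document>
</kml>
"""


class DrivePoint(object):
    """Hypothetical point object holding values parsed from one placemark."""

    def __init__(self, name, description, lon, lat):
        self.name = name
        self.description = description
        self.lon = lon
        self.lat = lat


def parse_point_placemarks(kml_bytes):
    """Return a DrivePoint for every placemark that has a Point child."""
    root = ET.fromstring(kml_bytes)
    points = []
    for pm in root.findall(".//kml:Placemark", NS):
        coords = pm.find("kml:Point/kml:coordinates", NS)
        if coords is None:
            continue  # not a point placemark (for example, a LineString)
        lon, lat = [float(v) for v in coords.text.strip().split(",")[:2]]
        name = pm.findtext("kml:name", default="", namespaces=NS)
        description = pm.findtext("kml:description", default="", namespaces=NS)
        points.append(DrivePoint(name, description, lon, lat))
    return points


points = parse_point_placemarks(KML_BYTES)
for pt in points:
    print(pt.name, pt.lon, pt.lat)
```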
In class -- parseKML_pts_inclass.py
• When complete, the script should:
  • Parse KML placemarks.
  • Create output shapefile.
• Modify the script where you see ###
Summing up
• Topics discussed
  • Review zip, consuming soup
  • Under the hood: XML, KML files
  • KML to shp to parse description
  • KML to shp to get points
• Up next
  • fetch, parse, & uncompress
Long lines
• Use variables
• Use line continuation characters
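Both techniques are shown in the short sketch below. The path, file name, and field names are hypothetical values chosen only to make the lines long. The bracket form in 2b is an addition to the slide's list: Python also continues a line implicitly inside parentheses, brackets, or braces.

```python
# Hypothetical values, chosen only to make the lines long.
workspace = "C:/gispy/data"
file_name = "drivingReport.kml"

# 1. Use variables to name the pieces, then combine them.
kml_path = workspace + "/" + file_name

# 2a. Backslash line continuation character.
message = "Parsing " + file_name + \
          " from " + workspace

# 2b. Implicit continuation inside parentheses, brackets, or braces.
field_names = ["name", "date", "time",
               "blurb", "score"]

print(kml_path)
print(message)
print(field_names)
```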
HTMLParser
• Built-in alternative to BeautifulSoup.
• Need to understand OOP concepts.
• Write your own class.
• Use inheritance.
• Override methods.
Inheritance
• OOP concept.
• Define the child class with the parent class name in parentheses.
• A child class inherits properties and methods from a parent class.
• Two lines of code make ChildClass inherit from ParentClass.
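A minimal sketch of inheritance, written for Python 3 (the slides use Python 2 print statements). The class names ParentClass and ChildClass follow the slides; the body of ParentClass (the greeting property and the describe method) is an assumption added so there is something to inherit.

```python
class ParentClass(object):
    def __init__(self):
        self.greeting = "hello from the parent"  # assumed sample property

    def describe(self):  # assumed sample method
        return "I am a " + type(self).__name__


class ChildClass(ParentClass):  # parent class name in parentheses
    def __init__(self):
        print("The child class constructor ran!")
        ParentClass.__init__(self)  # run the parent constructor too


child = ChildClass()
print(child.greeting)    # property set by the parent constructor
print(child.describe())  # method defined only in the parent
```

ChildClass defines neither greeting nor describe, yet both are available on the child object because they come from ParentClass.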
Override Method
• OOP concept.
• A child class inherits properties and methods from a parent class.
• To override an inherited method, the child class defines a method with the same name.
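A small sketch of overriding, written for Python 3. Both classes are hypothetical. The parent calls a do-nothing hook method from its run method, and the child replaces the hook by defining a method with the same name, which is the same pattern HTMLParser uses with its handle_ methods.

```python
class Parser(object):
    """Hypothetical parent class with a do-nothing hook method."""

    def handle_item(self, item):
        pass  # default behavior: ignore the item

    def run(self, items):
        for item in items:
            self.handle_item(item)  # calls the child's version if overridden


class CollectingParser(Parser):
    def __init__(self):
        self.seen = []

    def handle_item(self, item):  # same name as the parent method: override
        self.seen.append(item.upper())


parser = CollectingParser()
parser.run(["kml", "xml"])  # run is inherited; handle_item is overridden
print(parser.seen)
```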
Use HTMLParser
• Must write your own class that inherits from HTMLParser.
    class ChildClass(ParentClass):
        def __init__(self):
            print 'The child class constructor ran!'
            ParentClass.__init__(self)
• HTMLParser has several built-in methods (e.g., handle_starttag, handle_endtag, handle_data) that are intended to be overridden to suit your purpose.
• Example:
  • HTMLParser method handle_starttag.
  • To override, give your method the same name
  • and pass in 2 args (tag and attrs).
HTMLParser Example
Line 6 -- override handle_starttag
• Method takes 2 arguments from HTMLParser, tag and attrs.
  • tag holds the name of the tag: 'a', 'br', 'body', 'pre'
  • attrs holds a list of tuples with attributes and their values: [('href', 'www.google.com')]
Line 17 -- Create a class object instance
Line 18 -- Call the feed method to get it all started
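The pattern described above (subclass HTMLParser, override handle_starttag, create an instance, call feed) could look like the sketch below. It is not the script the line numbers refer to: the class name TagLister and the HTML string are made up, and the import is the Python 3 form (in Python 2 the module is named HTMLParser).

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class TagLister(HTMLParser):  # hypothetical subclass name
    def __init__(self):
        HTMLParser.__init__(self)  # run the parent constructor
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # tag is the tag name; attrs is a list of (attribute, value) tuples
        self.tags.append((tag, attrs))


parser = TagLister()  # create a class object instance
# feed pushes the text through the parser, which calls the handle_ methods
parser.feed('<body><a href="www.google.com">Google</a><br><pre>text</pre></body>')
parser.close()

for tag, attrs in parser.tags:
    print(tag, attrs)
```

Only handle_starttag is overridden, so end tags and text are passed to the inherited do-nothing handlers and ignored.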
Appendix: Subclasses inherit
• A benefit of OOP: subclasses extend a class.
• A subclass inherits all the properties and methods of the class.
  • Can define additional methods and properties.
  • Can override methods defined in the superclass.
  • Overriding means providing its own definitions for methods defined in a superclass.
• Easier to reuse similar code between various classes.
• Example:
  • Human superclass contains common characteristics and behaviors of all humans (heartRate, uglyshorts, fallInLove).
  • Could build several subclasses that inherit from the human superclass and add characteristics and behaviors specific to that type of human.
    • male/female subclasses inherit from human class?
    • man/boy subclasses of male?
• Writing subclasses lets you reuse code. Instead of recreating all the code common to both classes, you can simply extend an existing class.