
HDF & HDF-EOS Workshop XV 17 April 2012



Presentation Transcript


  1. Using HDF5 and Python: The H5py module

     Daniel Kahn, Science Systems and Applications, Inc.

     Acknowledgement: Thanks to Ed Masuoka, NASA Contract NNG06HX18C

     HDF & HDF-EOS Workshop XV, 17 April 2012

  2. Python has lists:

         >>> for elem in ['FirstItem','SecondItem','ThirdItem']:
         ...     print elem
         ...
         FirstItem
         SecondItem
         ThirdItem
         >>>

     We can assign the list to a variable.

         >>> MyList = ['FirstItem','SecondItem','ThirdItem']
         >>> for elem in MyList:
         ...     print elem
         ...
         FirstItem
         SecondItem
         ThirdItem
         >>>

  3. Lists can contain a mix of objects:

         >>> MixedList = ['MyString',5,[72, 99.44]]
         >>> for elem in MixedList:
         ...     print elem
         ...
         MyString
         5
         [72, 99.44]

     The last element is a list inside a list.

     Lists can be addressed by index:

         >>> MixedList[0]
         'MyString'
         >>> MixedList[2]
         [72, 99.44]

  4. A note about Python lists: Python lists are one dimensional, and arithmetic operators do not act element-wise on them. Don’t be tempted to use them for scientific array-based data sets. More on the ‘right way’ later...
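
As a small illustration of the point above, the following sketch (Python 3 syntax, with made-up values) shows what the arithmetic operators actually do to a plain list: `+` concatenates and `*` repeats, so element-wise arithmetic needs an explicit loop or comprehension.

```python
# Plain Python lists: "+" concatenates and "*" repeats; neither does arithmetic.
readings = [1.0, 2.0, 3.0]  # illustrative values, not from the talk

concatenated = readings + readings   # six elements, not element-wise sums
repeated = readings * 2              # the same six elements again

# Element-wise arithmetic on a list needs an explicit loop or comprehension.
doubled = [2 * value for value in readings]

print(concatenated)
print(repeated)
print(doubled)
```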

  5. Python has dictionaries. Dictionaries are key,value pairs:

         >>> Dictionary = {'FirstKey':'FirstValue',
         ...               'SecondKey':'SecondValue',
         ...               'ThirdKey':'ThirdValue'}
         >>> Dictionary
         {'SecondKey': 'SecondValue', 'ThirdKey': 'ThirdValue', 'FirstKey': 'FirstValue'}
         >>>

     Notice that Python prints the key,value pairs in a different order than I typed them. The key,value pairs in a dictionary are unordered.
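
The ordering shown on this slide reflects the Python 2 interpreter used in the talk. Since Python 3.7 the language guarantees that dictionaries preserve insertion order, so the same dictionary would list its keys in the order they were typed. A minimal sketch, assuming Python 3.7 or later:

```python
# Python 3.7+ guarantees that dictionaries preserve insertion order.
dictionary = {
    'FirstKey': 'FirstValue',
    'SecondKey': 'SecondValue',
    'ThirdKey': 'ThirdValue',
}

keys_in_order = list(dictionary)  # keys come back in the order they were inserted
print(keys_in_order)
```

Code that has to run on older interpreters should still treat the order as unspecified.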

  6. Dictionaries are not lists; however, we can easily create a list of the dictionary keys:

         >>> list(Dictionary)
         ['SecondKey', 'ThirdKey', 'FirstKey']
         >>>

     We can use a dictionary in a loop without additional elaboration:

         >>> for Key in Dictionary:
         ...     print Key,"---->",Dictionary[Key]
         ...
         SecondKey ----> SecondValue
         ThirdKey ----> ThirdValue
         FirstKey ----> FirstValue
         >>>

  7. HDF5 is made of “Dictionaries”: a dataset name is the key, and the array is the value. HDFView is a tool which shows us the keys (TreeView) and the values (TableView) of an HDF5 file. (Screenshot labels: Keys, Value)

  8. Andrew Collette’s H5py module allows us to use Python and HDF5 together. We can use H5py to manipulate HDF5 files as if they were Python Dictionaries:

         >>> import h5py
         >>> in_fid = h5py.File('DansExample1.h5','r')
         >>> for DS in in_fid:
         ...     print DS,"------->",in_fid[DS]
         ...
         FirstDataset -------> <HDF5 dataset "FirstDataset": shape (25,), type "<i4">
         SecondDataset -------> <HDF5 dataset "SecondDataset": shape (3, 3), type "<i4">
         ThirdDataset -------> <HDF5 dataset "ThirdDataset": shape (5, 5), type "<i4">
         >>>

     In the output, the names on the left are the keys and the dataset objects on the right are the values.
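
The dictionary analogy extends beyond iteration. The sketch below (Python 3) builds a small in-memory file, so nothing is written to disk, and then treats it like a dictionary. The file name and dataset contents are placeholders chosen for illustration; they are not the talk's DansExample1.h5.

```python
import h5py
import numpy as np

# In-memory HDF5 file: driver='core' with backing_store=False writes nothing to disk.
# The file name and the dataset contents are placeholders for illustration only.
with h5py.File('sketch_example.h5', 'w', driver='core', backing_store=False) as fid:
    # Assigning an array to a key creates a dataset, like a dictionary assignment.
    fid['FirstDataset'] = np.arange(25, dtype='i4')
    fid['SecondDataset'] = np.zeros((3, 3), dtype='i4')

    names = sorted(fid)                                # iterating yields the keys
    shapes = {name: fid[name].shape for name in fid}   # key lookup returns the dataset
    has_first = 'FirstDataset' in fid                  # membership test, as for a dict

for name in names:
    print(name, '------->', shapes[name])
```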

  9. So What? We need to be able to manipulate the arrays, not just the file. The Numpy module by Travis Oliphant allows the manipulation of arrays in Python. We will see examples of writing arrays later, but to get arrays from the H5py object we use the ellipsis ([...]):

         >>> import h5py
         >>> fid = h5py.File('DansExample1.h5','r')
         >>> fid['FirstDataset']
         <HDF5 dataset "FirstDataset": shape (25,), type "<i4">
         >>> fid['FirstDataset'][...]
         array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
                17, 18, 19, 20, 21, 22, 23, 24])
         >>> type(fid['FirstDataset'][...])
         <type 'numpy.ndarray'>
         >>>
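
The ellipsis reads the whole dataset into memory. Ordinary slice syntax reads only the selected part, which matters for large arrays. A minimal sketch (Python 3), using an in-memory file with placeholder names rather than the talk's file:

```python
import h5py
import numpy as np

# In-memory file with a placeholder name; nothing is written to disk.
with h5py.File('sketch_read.h5', 'w', driver='core', backing_store=False) as fid:
    dset = fid.create_dataset('FirstDataset', data=np.arange(25, dtype='i4'))

    whole = dset[...]       # entire dataset as a numpy.ndarray
    also_whole = dset[()]   # equivalent spelling; also works for scalar datasets
    first_five = dset[:5]   # only the selected elements are read from the file

print(type(whole).__name__, whole.shape)
print(first_five)
```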

  10. Reasons to use Python and HDF5 instead of C or Fortran:

      The basic Python Dictionary object has a close similarity to the HDF5 Group. The object-oriented and dynamic nature of Python allows the existing Dictionary syntax to be repurposed for HDF5 manipulation.

      In short, working with HDF5 in Python requires much less code than C or Fortran, which means faster development and fewer errors.

  11. Comparison to C, h5_gzip: Fewer lines of code means fewer places to make mistakes. The 37-line h5_gzip.py example is a “direct” translation of the C version. Some more advanced techniques offer insight into the advantages of Python/H5py programming. Text in the next slides is color-coded to help match code with the same functionality. First, writing a file…

  12. (Code listing for writing the file; shown as an image on the slide and not captured in this transcript.)

  13. Reading data… (code listing shown as an image; not captured in this transcript)

  14. And finally, just to see what the file looks like… (image not captured in this transcript)
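
The h5_gzip listings did not survive in the transcript. The following is therefore not the talk's h5_gzip.py, only a minimal sketch (Python 3) of the same idea: write a gzip-compressed, chunked dataset with H5py, read it back, and inspect its storage settings. The file name, dataset name, array shape, chunk shape and compression level are all assumptions chosen for illustration.

```python
import h5py
import numpy as np

# Illustrative data; the shape and values are assumptions, not taken from the talk.
data = np.arange(100 * 20, dtype='i4').reshape(100, 20)

# In-memory file with a placeholder name; nothing is written to disk.
with h5py.File('sketch_gzip.h5', 'w', driver='core', backing_store=False) as fid:
    # Compression requires a chunked layout; chunk shape and level are illustrative.
    dset = fid.create_dataset('Compressed_Data', data=data,
                              chunks=(20, 20),
                              compression='gzip', compression_opts=6)

    # Read the data back and look at how the dataset is stored.
    read_back = dset[...]
    compression = dset.compression
    level = dset.compression_opts
    chunks = dset.chunks

print(compression, level, chunks)
```

Decompression is transparent on read: the array that comes back is an ordinary Numpy array with the original values.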

  15. Real world example: Table Comparison

      Background: For the OMPS instruments we need to design binary arrays to be uploaded to the satellite to sub-sample the CCD and so reduce the data rate. For ground processing use we store these arrays in HDF5. As part of the design process we want to be able to compare arrays in two different files.

  16. Here is an example of a Sample Table

  17. Here is another example:

  18. Here is the “difference” of the arrays. Red pixels are unique to the first array.

  19. The code: CompareST.py

          #!/usr/bin/env python
          """
          Documentation
          """
          from __future__ import print_function,division
          import h5py
          import numpy
          import ViewFrame

          def CompareST(ST1,ST2,IntTime):
              with h5py.File(ST1,'r') as st1_fid,h5py.File(ST2,'r') as st2_fid:
                  ST1 = st1_fid['/DATA/'+IntTime+'/SampleTable'].value
                  ST2 = st2_fid['/DATA/'+IntTime+'/SampleTable'].value
              ST1[ST1!=0] = 1
              ST2[ST2!=0] = 1
              Diff = (ST1 - ST2)
              ST1[Diff == 1] = 2
              ViewFrame.ViewFrame(ST1)
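
The masking logic in CompareST can be tried on small arrays without any HDF5 file. The sketch below (Python 3) uses two made-up tables and leaves out the ViewFrame display step, which belongs to the author's own module. Two notes for readers on current software: the `.value` attribute was removed in h5py 3.0, so the reads would be written with `[...]` or `[()]`, and the subtraction only yields -1 for pixels unique to the second table when the arrays have a signed integer type.

```python
import numpy as np

# Two small made-up sample tables; a nonzero entry means the pixel is sampled.
st1 = np.array([[0, 3, 3],
                [0, 5, 0]], dtype='i4')
st2 = np.array([[0, 3, 0],
                [7, 5, 0]], dtype='i4')

# Reduce both tables to 0/1 masks (signed type, so the difference can be -1).
mask1 = (st1 != 0).astype('i4')
mask2 = (st2 != 0).astype('i4')

# 1: pixel only in the first table, -1: only in the second, 0: same in both.
diff = mask1 - mask2

# Mark pixels unique to the first table with 2, as CompareST does.
marked = mask1.copy()
marked[diff == 1] = 2

print(marked)
```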

  20. ...and the command line argument parsing:

          if __name__ == "__main__":
              import argparse
              OptParser = argparse.ArgumentParser(description = __doc__)
              OptParser.add_argument("--ST1",help="SampleTableFile1")
              OptParser.add_argument("--ST2",help="SampleTableFile2")
              OptParser.add_argument("--IntTime",help="Integration Time", default='Long')
              options = OptParser.parse_args()
              CompareST(options.ST1,options.ST2,options.IntTime)
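
In the parser above, an omitted --ST1 or --ST2 is passed on to CompareST as None. One possible variation, not part of the talk, is to mark the two file arguments as required so that argparse reports the problem itself. A minimal sketch (Python 3); the description text and the file names in the example call are hypothetical:

```python
import argparse

def build_parser():
    """Build a parser like the one in the talk, with the file arguments required."""
    parser = argparse.ArgumentParser(description='Compare two sample tables.')
    parser.add_argument('--ST1', required=True, help='SampleTableFile1')
    parser.add_argument('--ST2', required=True, help='SampleTableFile2')
    parser.add_argument('--IntTime', default='Long', help='Integration Time')
    return parser

# Passing an explicit argument list makes the parser easy to try without a shell.
# 'first.h5' and 'second.h5' are hypothetical file names.
options = build_parser().parse_args(['--ST1', 'first.h5', '--ST2', 'second.h5'])
print(options.ST1, options.ST2, options.IntTime)
```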

  21. Recursive descent into HDF5 file. Print group names, number of children and dataset names.

          #!/usr/bin/env python
          from __future__ import print_function
          import h5py

          def print_num_children(obj):
              if isinstance(obj,h5py.highlevel.Group):
                  print(obj.name,"Number of Children:",len(obj))
                  for ObjName in obj: # ObjName will be a string
                      print_num_children(obj[ObjName])
              else:
                  print(obj.name,"Not a group")

          with h5py.File('OMPS-NPP-NPP-LP_STB', 'r+') as f:
              print_num_children(f)

  22. The Result…

          ssai-s01033@dkahn: ~/python % ./print_num_children.py
          / Number of Children: 1
          /DATA Number of Children: 10
          /DATA/AutoSplitLong Not a group
          /DATA/AutoSplitShort Not a group
          /DATA/AuxiliaryData Number of Children: 6
          /DATA/AuxiliaryData/FeatureNames Not a group
          /DATA/AuxiliaryData/InputSpecification Not a group
          /DATA/AuxiliaryData/LongLowEndSaturationEstimate Not a group
          /DATA/AuxiliaryData/ShortLowEndSaturationEstimate Not a group
          /DATA/AuxiliaryData/Timings Number of Children: 2
          /DATA/AuxiliaryData/Timings/Long Not a group
          /DATA/AuxiliaryData/Timings/Short Not a group
          /DATA/AuxiliaryData/dummy Not a group
          /DATA/Long Number of Children: 14
          /DATA/Long/BadPixelTable Not a group
          /DATA/Long/BinTransitionTable Not a group
          /DATA/Long/FeatureNamesIndexes Not a group
          /DATA/Long/Gain Not a group
          /DATA/Long/InverseOMPSColumns Not a group
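
The same recursive descent can be sketched for current software (Python 3). Recent h5py releases expose the group class directly as `h5py.Group`, and the `h5py.highlevel` module used in the talk is, to the best of my knowledge, no longer provided. The example runs on a tiny in-memory file whose name and layout are invented for illustration; the function collects its lines in a list instead of printing them directly.

```python
import h5py
import numpy as np

def describe(obj, lines):
    """Append one line per HDF5 object, recursing into groups."""
    if isinstance(obj, h5py.Group):   # h5py.File is itself a Group (the root)
        lines.append(f'{obj.name} Number of Children: {len(obj)}')
        for name in obj:              # name is a string key, as for a dict
            describe(obj[name], lines)
    else:
        lines.append(f'{obj.name} Not a group')
    return lines

# Tiny in-memory file; the name and layout are made up for this sketch.
with h5py.File('sketch_tree.h5', 'w', driver='core', backing_store=False) as f:
    # Intermediate groups in the path are created automatically.
    f.create_dataset('/DATA/Long/Gain', data=np.ones(4))
    f.create_dataset('/DATA/AutoSplitLong', data=np.zeros(2))
    report = describe(f, [])

print('\n'.join(report))
```

Since the traversal only reads, opening an existing file in mode 'r' is enough for this kind of listing.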

  23. Summary: Python with the H5py and Numpy modules makes it simpler to develop programs that manipulate HDF5 files and perform calculations with HDF5 arrays, which increases development speed and reduces errors.
