
NPClime – Backend Services Greg Hill

2 April 2008. Data Management Conference, Ft Collins, Colorado.


Presentation Transcript


  1. NPClime – Backend Services Greg Hill 2 April 2008 Data Management Conference Ft Collins, Colorado

  2. overview – purpose - structure • Backend encapsulates 4 primary subservices. • Discover – locate stations, view metadata • Download – access station data • Summarize – process data • Analyze – plot data

  3. general observations • Inherently difficult to QA: many ‘answers’ to check, and it is unknown whether a problem starts with NPClime or upstream (GIGO). • Clear, focused, simple project goals – nice to have. • Use Cases are subtle and wonderful, but quite difficult to come up with. • While we have many, we only have 1 ‘big one’, which I am calling the Frankenstein Case:

  4. subservice characteristics, unique challenges to each
     • Discover – metadata, query language:
         • 7 ‘search buckets’, each with own constraints – spatial, temporal, hierarchical, textual, numerical, relationship, logical
     • Download – connectivity, scale, time:
         • ‘100 stations for 100 years’, threads, limits
     • Summarize – math, reporting:
         • segmentation, missing values, layout, images
     • Analyze – local distributed computing:
         • ‘R’ statistics, plotting, layout, archival
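The Download bullet (‘100 stations for 100 years’, threads, limits) can be illustrated with a minimal Python sketch. This is not NPClime code: `download_all`, `fetch`, `MAX_STATIONS` and `MAX_WORKERS` are hypothetical names, and the limit values are assumptions chosen only to echo the slide.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical limits; the real NPClime values are not given in the slides.
MAX_STATIONS = 100
MAX_WORKERS = 4


def download_all(station_ids, fetch, max_stations=MAX_STATIONS, max_workers=MAX_WORKERS):
    """Fetch data for each station on a bounded pool of worker threads.

    `fetch` is any callable taking a station id and returning its data.
    """
    if len(station_ids) > max_stations:
        raise ValueError(
            f"request covers {len(station_ids)} stations; limit is {max_stations}"
        )
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as station_ids.
        return dict(zip(station_ids, pool.map(fetch, station_ids)))
```

Capping both the request size and the number of threads keeps a single large request from monopolising connectivity to the upstream data source.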

  5. overall facets of effort – challenges as a whole
     • Heterogeneous (‘alphabet soup’):
         • interprocess communication, control
     • Distributed (NPS, WRCC, ACIS):
         • firewalls, proxies, bandwidth, timeouts
     • Simultaneous (multiuser):
         • sessions, threads, scheduling, synchronization
     • Heavy Duty:
         • error handling, load balancing, recovery, memory
     • Data Intensive:
         • ‘dirty’ data, units, voluminous, delivery

  6. NPClime acknowledgements – projects - people • NPS I&M, IT • ACIS / WRCC • BioGeomancer (Berkeley) • Alexandria Digital Library (UCSB) • PostGIS, JTS, Java, Python, asm • PDFBox, curl, ImageMagick, itext, JExcelAPI, ojalgo

  7. services and subservices – what ties everything together? [Diagram: UI, OACIS Service, Timer Service, Feature Service, Data Service, Sums, R; links labeled sessionid and timestamp.] OACIS Service returns immediately, redirects job asynchronously using “magic box”; UI polls Timer for status.
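The ‘return immediately, then poll for status’ pattern described on this slide can be sketched in Python as follows. The slide does not say how the “magic box” works, so a background thread stands in for it here; `TimerService`, `submit_job` and `poll_until_finished` are hypothetical names, not the NPClime API.

```python
import threading
import time
import uuid


class TimerService:
    """Hypothetical stand-in for the Timer Service: job status keyed by session id."""

    def __init__(self):
        self._status = {}
        self._lock = threading.Lock()

    def set(self, session_id, status):
        with self._lock:
            self._status[session_id] = status

    def get(self, session_id):
        with self._lock:
            return self._status.get(session_id, "unknown")


def submit_job(timer, work):
    """Start `work` in the background and return a session id right away."""
    session_id = uuid.uuid4().hex
    timer.set(session_id, "running")

    def run():
        try:
            work()
            timer.set(session_id, "done")
        except Exception:
            timer.set(session_id, "failed")

    threading.Thread(target=run, daemon=True).start()
    return session_id


def poll_until_finished(timer, session_id, interval=0.05, timeout=5.0):
    """Poll the timer, as the UI would, until the job ends or the timeout passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = timer.get(session_id)
        if status in ("done", "failed"):
            return status
        time.sleep(interval)
    return "timeout"
```

The caller never blocks on the job itself; it only holds a session id and asks for status, which is what lets a long-running request survive short HTTP timeouts.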

  8. what would data managers be interested in? • protocols for handling time • data formats, database schemas • filesystem usage • myriad tricks – none major in and of itself
     what would programmers be interested in? • architecture • geometry • synchronization • error handling, communication

  9. protocols for handling time – 2 simple observations • A unique timestamp, e.g. 1205946880120, has the same numerical and lexical sort order. • The yyyymmdd form of a date shares this property: 20000101 > 19991231. The mmddyyyy form does not: 01012000 ? 12311999
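A short Python check of both observations, using the values from the slide plus one made-up later timestamp. One assumption worth stating: string order matches numeric order only when all values have the same number of digits, which holds for 13-digit millisecond timestamps and for 8-digit dates.

```python
# Millisecond timestamps of equal length: numeric and string sorts agree.
# 1205946880120 is from the slide; the second value is a made-up later time.
stamps = ["1205946999999", "1205946880120"]
numeric_stamps = sorted(stamps, key=int)
lexical_stamps = sorted(stamps)

# yyyymmdd: sorting as numbers or as strings gives calendar order.
ymd = ["20000101", "19991231"]
numeric_ymd = sorted(ymd, key=int)
lexical_ymd = sorted(ymd)

# mmddyyyy: the same two dates sort with 1 Jan 2000 before 31 Dec 1999.
mdy = ["01012000", "12311999"]
lexical_mdy = sorted(mdy)

print(numeric_stamps == lexical_stamps)  # True
print(lexical_ymd)                       # ['19991231', '20000101']
print(lexical_mdy)                       # ['01012000', '12311999']
```

Because the sort order is the same either way, such values can be used directly in filenames and directory listings without any date parsing.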

  10. filesystem usage – ‘degrees of separation’ model [Diagram labels: ‘acis’, <user>, <category>, ‘markup’, <ext>, ‘xml’, ‘xsl’, <facet>; fixed depth ± 1.] Useful for both programmatic and manual access.

  11. data formats
     • XML is great for:
         • marshalling / unmarshalling
         • schemas and transformations
         • known, irregular structure
     • XML is less suitable in some situations:
         • simple, regular structure
         • ‘XMLephant’
     • moral: tightly controlled = useful but cumbersome

  12. JSON (JavaScript Object Notation) • complements XML, does not replace it • can process without a blueprint (XSD) • simple assumptions: dimension, datatype • perfect for climate data • reads directly into R data frame • lightweight
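As an illustration of ‘simple assumptions: dimension, datatype’, here is a minimal Python sketch that reads a column-oriented JSON payload with no schema. The payload layout, the station id `EXAMPLE01` and the field names are invented for the example; the slides do not show the actual NPClime JSON format.

```python
import json

# Hypothetical payload: one array per column, null for a missing value.
payload = """
{
  "station": "EXAMPLE01",
  "columns": {
    "date":   ["20080101", "20080102", "20080103"],
    "tmax_c": [1.5, null, 3.0]
  }
}
"""


def to_rows(text):
    """Parse the payload and return (column names, list of row tuples)."""
    columns = json.loads(text)["columns"]
    # The only structural assumption: every column has the same length.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns have different lengths")
    return list(columns), list(zip(*columns.values()))


names, rows = to_rows(payload)
print(names)    # ['date', 'tmax_c']
print(rows[1])  # ('20080102', None)
```

Equal-length named columns map naturally onto a data frame, which is one plausible reason such data reads easily into R.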

  13. data formats – hard to sugar coat, even when nice!!

  14. metadata formats – even harder to sugar coat, never nice!! Consider normalizing units on a plot with this metadata! (Yes, this is a cooked example, but the possibility is real.)

  15. interface definition • syntax (how is x written) • semantics (what is x) • simple, unambiguous • valid inputs / outputs not enough! • still does not handle thornier questions of what makes an interface ‘good’ • for this, iterative approach is sometimes best – ‘good programs are not written, but grow’.

  16. delivery techniques • timestamped .zip files have been effective. • advantages: • encapsulation • compression • familiarity • timestamp is cryptic, but useful
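A minimal Python sketch of a timestamped .zip delivery, assuming the archive is simply named after a millisecond timestamp. `build_delivery` and the member name `report.txt` are hypothetical; the slides do not describe the actual naming scheme or archive contents.

```python
import io
import time
import zipfile


def build_delivery(files, timestamp_ms=None):
    """Pack `files` (member name -> text or bytes) into a compressed archive.

    Returns (archive name, archive bytes). The name is the millisecond
    timestamp, which is cryptic but sorts chronologically.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    name = f"{timestamp_ms}.zip"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for member_name, content in files.items():
            archive.writestr(member_name, content)
    return name, buffer.getvalue()


name, blob = build_delivery({"report.txt": "hello"}, timestamp_ms=1205946880120)
print(name)  # 1205946880120.zip
```

One archive per request gives the encapsulation and compression the slide lists, and the timestamp in the name ties the file back to the request that produced it.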

  17. current development
     • efficiency, memory usage, bandwidth
     • lifetime of objects
     • database partitioning, indexing, etc.
     • multiuser (implemented, relatively untested)
     • error handling - two things to avoid:
         • little errors that bring program to a halt
         • big errors that go unnoticed
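The two error-handling goals can be sketched together in Python: catch and log per-item failures so one bad item does not stop the run, but raise loudly when too many items fail. `process_all`, `TooManyFailures` and the 50% threshold are assumptions made for illustration, not details from the talk.

```python
import logging

logger = logging.getLogger("npclime.sketch")  # hypothetical logger name


class TooManyFailures(RuntimeError):
    """Raised when failures are too widespread to pass over quietly."""


def process_all(items, handler, max_failure_ratio=0.5):
    """Apply `handler` to each item in the list `items`.

    Returns (results, failures), both keyed by item.
    """
    results, failures = {}, {}
    for item in items:
        try:
            results[item] = handler(item)
        except Exception as exc:
            # A little error: record it and keep going.
            logger.warning("skipping %s: %s", item, exc)
            failures[item] = exc
    # A big error: widespread failure must not go unnoticed.
    if items and len(failures) / len(items) > max_failure_ratio:
        raise TooManyFailures(f"{len(failures)} of {len(items)} items failed")
    return results, failures
```

Returning the failures alongside the results lets a report state which items were skipped instead of silently presenting a partial answer.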

  18. summary • 4 subservices, ‘waterfall’ analogy • each provides a value-added deliverable • external form and internal underpinnings • characteristics and challenges • tricks and techniques • lessons learned • current development

  19. NPClime – Screenshots Greg Hill 2 April 2008 Data Management Conference Ft Collins, Colorado

  20–30. Queries (screenshots)

  31. Requirements Document Requirements Surveys Wants vs Needs Feedback Requirements Gathering Forum Other ideas Examples Existing tools NPClime Prototypes

  32. Log on • Query or Analysis? • Criteria Type? • Enter Spatial Criteria • Enter Non-Spatial Criteria • Edit / Store / Discard? • Continue Entering Queries? • Query Execution

  33. Requirements Dependencies Contextual Query Capability Analysis Capability Functional User Interface

  34. [Architecture diagram, steps numbered 1–9. Labels: NPClime UI; ACIS; Data products; R; Temp; Archival; Stored AOIs, Views, and Queries; Web Map Service; ACIS Metadata; GIS Layers; XML-RPC. Flow labels: query; stations, parameters; data; report; data, graphics; GUID; report + GUID.]
