NPClime – Backend Services Greg Hill 2 April 2008 Data Management Conference Ft Collins, Colorado
overview – purpose - structure • Backend encapsulates 4 primary subservices. • Discover – locate stations, view metadata • Download – access station data • Summarize – process data • Analyze – plot data
general observations • Inherently difficult to QA: many ‘answers’ to check, and it is unknown whether an error starts with NPClime or upstream (GIGO). • Clear, focused, simple project goals – nice to have. • Use Cases are subtle and wonderful, but quite difficult to come up with. • While we have many, we only have 1 ‘big one’, which I am calling the Frankenstein Case:
subservice characteristics, unique challenges to each • Discover – metadata, query language: • 7 ‘search buckets’ , each with own constraints – spatial, temporal, hierarchical, textual, numerical, relationship, logical • Download – connectivity, scale, time: • ‘100 stations for 100 years’, threads, limits • Summarize – math, reporting: • segmentation, missing values, layout, images • Analyze – local distributed computing: • ‘R’ statistics, plotting, layout, archival
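The Download bullet above pairs scale (‘100 stations for 100 years’) with threads and limits. A minimal sketch of that idea, assuming a caller-supplied `fetch` function and a `max_workers` cap (both hypothetical names, not taken from NPClime):

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(station_ids, fetch, max_workers=8):
    """Fetch every station with at most max_workers requests in flight.

    `fetch` is a hypothetical callable: station id -> station data.
    """
    station_ids = list(station_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return dict(zip(station_ids, pool.map(fetch, station_ids)))
```

Capping the pool size is one simple way to keep a large request from overwhelming the upstream service; the right limit depends on the deployment.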
overall facets of effort – challenges as a whole • Heterogeneous (‘alphabet soup’): • interprocess communication, control • Distributed (NPS, WRCC, ACIS): • firewalls, proxies, bandwidth, timeouts • Simultaneous (multiuser): • sessions, threads, scheduling, synchronization • Heavy Duty: • error handling, load balancing, recovery, memory • Data Intensive: • ‘dirty’ data, units, voluminous, delivery
NPClime acknowledgements – projects - people • NPS I&M, IT • ACIS / WRCC • BioGeomancer (Berkeley) • Alexandria Digital Library (UCSB) • PostGIS, JTS, Java, Python, asm • PDFBox, curl, ImageMagick, itext, JExcelAPI, ojalgo
services and subservices – what ties everything together? • [Diagram: UI, OACIS Service, Timer Service, Feature Service, Data Service, Sums, R; connections labeled sessionid and timestamp] • OACIS Service returns immediately, redirects job asynchronously using “magic box”; UI polls Timer for status.
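The submit-then-poll pattern on this slide can be sketched as follows. The slide does not say how the “magic box” works, so a background thread stands in for it here; `JobBoard`, `submit`, `poll`, and `wait_for` are hypothetical names for illustration only.

```python
import threading
import time
import uuid


class JobBoard:
    """Accepts work, returns an id immediately, and answers status polls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}

    def submit(self, work):
        job_id = uuid.uuid4().hex
        with self._lock:
            self._status[job_id] = ("running", None)

        def run():
            try:
                outcome = ("done", work())
            except Exception as exc:  # record the failure instead of losing it
                outcome = ("failed", str(exc))
            with self._lock:
                self._status[job_id] = outcome

        threading.Thread(target=run, daemon=True).start()
        return job_id  # caller gets control back before the work finishes

    def poll(self, job_id):
        with self._lock:
            return self._status[job_id]


def wait_for(board, job_id, interval=0.01, timeout=5.0):
    """Poll until the job leaves the 'running' state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state, value = board.poll(job_id)
        if state != "running":
            return state, value
        time.sleep(interval)
    return "timeout", None
```

In this sketch the job id plays the role of the sessionid/timestamp pair that the diagram passes between services.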
what would data managers be interested in? • protocols for handling time • data formats, database schemas • filesystem usage • myriad tricks – none major in and of itself
what would programmers be interested in? • architecture • geometry • synchronization • error handling, communication
protocols for handling time – 2 simple observations • A unique timestamp, e.g. 1205946880120, has the same numerical and lexical sort order. • The yyyymmdd form of a date shares this property: 20000101 > 19991231. The mmddyyyy form does not: 01012000 ? 12311999 (the comparison no longer matches date order).
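Both observations can be checked in a few lines. This is a small sketch, assuming the example value is a millisecond epoch timestamp; the other values and the helper names are made up for illustration. The lexical property holds only while all timestamps have the same number of digits.

```python
from datetime import date


def same_numeric_and_lexical_order(stamps):
    """True if sorting as integers and sorting as strings agree."""
    as_numbers = sorted(stamps)
    as_strings = [int(s) for s in sorted(str(n) for n in stamps)]
    return as_numbers == as_strings


def yyyymmdd(d):
    return d.strftime("%Y%m%d")


def mmddyyyy(d):
    return d.strftime("%m%d%Y")


stamps = [1205946880120, 1205946880045, 1199145600000]
days = [date(2000, 1, 1), date(1999, 12, 31)]

by_yyyymmdd = sorted(days, key=yyyymmdd)  # string sort on yyyymmdd
by_mmddyyyy = sorted(days, key=mmddyyyy)  # string sort on mmddyyyy
```

Here `by_yyyymmdd` comes out in calendar order, while `by_mmddyyyy` puts 1 January 2000 ahead of 31 December 1999.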
filesystem usage – ‘degrees of separation’ model • [Diagram: directory levels labeled ‘acis’, <user>, <category>, ‘markup’, <ext>, ‘xml’, ‘xsl’, <facet>; fixed depth ± 1] • Useful for both programmatic and manual access.
data formats • XML great for • marshalling / unmarshalling • schemas and transformations • known, irregular structure • XML is less suitable in some situations: • simple, regular structure • ‘XMLephant’ moral: tightly controlled = useful but cumbersome
JSON (Javascript Object Notation) • complements XML, does not replace it • can process without a blueprint (XSD) • simple assumptions: dimension, datatype • perfect for climate data • reads directly into R data frame • lightweight
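The ‘process without a blueprint’ point can be illustrated with a small sketch. The payload layout below (a `columns` list plus `rows`) and the station id are assumptions for illustration, not the NPClime format; Python stands in for R here.

```python
import json

# Hypothetical climate payload: regular rows, simple datatypes, no schema file.
payload = """
{"station": "EXAMPLE01",
 "columns": ["date", "tmax", "tmin"],
 "rows": [["20080401", 12.5, -1.0],
          ["20080402", 14.0, null]]}
"""

doc = json.loads(payload)

# Pivot rows into named columns, the shape a data frame expects.
table = {
    name: [row[i] for row in doc["rows"]]
    for i, name in enumerate(doc["columns"])
}
```

A missing value arrives as JSON `null` and becomes `None`, so gaps in a station record survive the round trip without a sentinel number.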
metadata formats – even harder to sugar coat, never nice!! • Consider normalizing units on a plot with this metadata! (Yes, this is a cooked example, but the possibility is real.)
interface definition • syntax (how is x written) • semantics (what is x) • simple, unambiguous • valid inputs / outputs not enough! • still does not handle thornier questions of what makes an interface ‘good’ • for this, iterative approach is sometimes best – ‘good programs are not written, but grow’.
delivery techniques • timestamped .zip files have been effective. • advantages: • encapsulation • compression • familiarity • timestamp is cryptic, but useful
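One way to build such a deliverable is sketched below, assuming the archive is named after a millisecond timestamp; the function name, file names, and contents are hypothetical.

```python
import io
import time
import zipfile


def build_delivery(files, stamp=None):
    """Pack {member name: text} into an in-memory zip named <stamp>.zip."""
    if stamp is None:
        stamp = int(time.time() * 1000)  # assumed: millisecond epoch timestamp
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member, text in files.items():
            zf.writestr(member, text)
    return f"{stamp}.zip", buffer.getvalue()


name, data = build_delivery(
    {"summary.csv": "date,tmax\n20080401,12.5\n"},
    stamp=1205946880120,
)
```

Because the timestamp sorts the same way numerically and lexically, a directory listing of these archives is already in creation order.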
current development • efficiency, memory usage, bandwidth • lifetime of objects • database partitioning, indexing, etc. • multiuser (implemented, relatively untested) • error handling - two things to avoid: • little errors that bring program to a halt • big errors that go unnoticed
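The two error-handling failure modes above pull in opposite directions. A minimal sketch of one way to balance them, assuming a per-station `summarize` callable and a `max_failure_ratio` threshold (both hypothetical):

```python
import logging

log = logging.getLogger("npclime.sketch")


def summarize_all(stations, summarize, max_failure_ratio=0.5):
    """Keep going past single-station errors, but never hide a large failure."""
    stations = list(stations)
    results, failures = {}, {}
    for station_id in stations:
        try:
            results[station_id] = summarize(station_id)
        except Exception as exc:
            # A little error: record it and continue with the next station.
            log.warning("station %s failed: %s", station_id, exc)
            failures[station_id] = str(exc)
    if stations and len(failures) / len(stations) > max_failure_ratio:
        # A big error: too many failures to pass off as a normal result.
        raise RuntimeError(f"{len(failures)} of {len(stations)} stations failed")
    return results, failures
```

Returning the failures alongside the results lets a report state which stations were skipped rather than silently omitting them.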
summary • 4 subservices, ‘waterfall’ analogy • each provides a value-added deliverable • external form and internal underpinnings • characteristics and challenges • tricks and techniques • lessons learned • current development
NPClime – Screenshots Greg Hill 2 April 2008 Data Management Conference Ft Collins, Colorado
Requirements Document Requirements Surveys Wants vs Needs Feedback Requirements Gathering Forum Other ideas Examples Existing tools NPClime Prototypes
[Flowchart] Log on • Query or Analysis? • Criteria Type? • Enter Spatial Criteria • Enter Non-Spatial Criteria • Edit / Store / Discard? • Continue Entering Queries? • Query Execution
Requirements Dependencies Contextual Query Capability Analysis Capability Functional User Interface
[Diagram: numbered data flow (steps 1–9) among ACIS, the NPClime UI, R, a Web Map Service, and temp / archival storage. Labels: ACIS Data products; ACIS Metadata; GIS Layers; query; stations, parameters; XML-RPC; data; report; data, graphics; GUID; report + GUID; Stored AOIs, Views, and Queries]