Shelter from the Storm Building a Safe Archive in a Hostile World
SCOOP Goal • SURA-funded Coastal Modeling Project • Want to develop the community's cutting-edge techniques so that they are ready for use in tomorrow's production systems • For example, automatic verification of storm-surge models against observed data, to help improve the models
CCT Goals • One of CCT's key research outputs is software • Want this software to be high quality and robust • Want software to be reused across projects • Also want software to be picked up by external users as well as collaborators
The SCOOP Archive • Need to archive lots of files • Atmospheric models (MM5, GFDL) • Hydrodynamic models (ADCIRC, SWAN, etc.) • Observational data (sensor data, buoys) • Requirements are poorly defined: • How much data? We don't know • How long should we keep it? We don't know • Have to interface with bespoke data-transport mechanisms (LDM) • How do we achieve our goals under these conditions?
Basic Archive Operation Upload: • Client signals that it wants to upload some files (names are given) • Archive tells the client where to upload them (transaction handles) • Client uploads the files (independently of the archive) • Client tells the archive it is done • Archive creates the logical filenames • Use the "upload" tool for this (a client-side sketch follows)
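A minimal client-side sketch of this handshake. Only the operation names fileUploadBegin and fileUploadEnd appear in these slides; their signatures, the uploadFile helper, and the URLs below are assumptions made for illustration.

  // Client-side sketch of the upload handshake (hypothetical signatures).
  #include <iostream>
  #include <string>
  #include <vector>

  // Ask the archive to start an upload for the named files; it hands back a
  // transaction handle and tells us where to stage the files (stub).
  std::string fileUploadBegin(const std::vector<std::string>& fileNames,
                              std::string& stageUrl) {
      std::cout << "begin upload of " << fileNames.size() << " files\n";
      stageUrl = "gsiftp://archive.example.org/stage/tx-42/";  // placeholder
      return "tx-42";
  }

  // Transfer one file to the staging location, independently of the archive (stub).
  bool uploadFile(const std::string& fileName, const std::string& stageUrl) {
      std::cout << "uploading " << fileName << " -> " << stageUrl << "\n";
      return true;
  }

  // Tell the archive the transfer is done so it can create logical filenames (stub).
  bool fileUploadEnd(const std::string& transaction) {
      std::cout << "completing transaction " << transaction << "\n";
      return true;
  }

  int main() {
      std::vector<std::string> files = {"adcirc_surge.nc", "swan_waves.nc"};
      std::string stageUrl;
      std::string tx = fileUploadBegin(files, stageUrl);    // steps 1-2
      for (const auto& f : files) uploadFile(f, stageUrl);  // step 3
      fileUploadEnd(tx);                                    // steps 4-5
      return 0;
  }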
Basic Archive Operation Download: • Clients use the catalog service to discover/search for logical filenames • Clients talk to the RLS server to map logical names to physical URLs • Clients interact with the physical URLs directly • Can use the "getdata" CLI tool to encapsulate this (a sketch follows) • Also, there are portal pages...
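A minimal sketch of the download path under the same caveat: queryCatalog, rlsLookup, and fetchUrl are hypothetical stand-ins for the catalog search, the RLS lookup, and the actual transfer.

  #include <iostream>
  #include <string>
  #include <vector>

  // Search the catalog service for logical filenames matching a query (stub).
  std::vector<std::string> queryCatalog(const std::string& query) {
      std::cout << "catalog query: " << query << "\n";
      return {"scoop/adcirc_surge.nc"};
  }

  // Ask the RLS server for the physical URLs registered for a logical name (stub).
  std::vector<std::string> rlsLookup(const std::string& logicalName) {
      return {"gsiftp://archive.example.org/data/" + logicalName};
  }

  // Fetch the file directly from one of its physical locations (stub).
  bool fetchUrl(const std::string& url) {
      std::cout << "fetching " << url << "\n";
      return true;
  }

  int main() {
      for (const auto& logical : queryCatalog("model=ADCIRC")) {  // 1: discover
          std::vector<std::string> urls = rlsLookup(logical);     // 2: resolve
          if (!urls.empty()) fetchUrl(urls.front());              // 3: transfer
      }
      return 0;
  }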
Operations on Service • fileUploadBegin - starts an upload • fileUploadEnd - signals that an upload is complete • logicalNameRetry • removeDeadTransactions • closeArchive (one possible interface is sketched below)
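One way the service front end might be declared. Only the operation names come from this slide; all parameter and return types, and the per-operation comments, are my reading of the names rather than the real interface.

  // Hypothetical service interface; only the operation names are from the slides.
  #include <string>
  #include <vector>

  class ArchiveService {
  public:
      virtual ~ArchiveService() = default;
      // Start an upload: return a transaction handle and a staging location.
      virtual std::string fileUploadBegin(const std::vector<std::string>& fileNames,
                                          std::string& stageUrl) = 0;
      // Signal that the upload for a transaction has completed.
      virtual bool fileUploadEnd(const std::string& transaction) = 0;
      // Retry creation of logical names that could not be registered earlier.
      virtual void logicalNameRetry() = 0;
      // Clean up transactions whose clients never called fileUploadEnd.
      virtual void removeDeadTransactions() = 0;
      // Shut the archive down cleanly, persisting state for restart.
      virtual void closeArchive() = 0;
  };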
Distributed Software • Some services are hosted externally • Can't assume our machine or software never fails • Need to retain the service's state across restarts
Robust Code • Don't assume our service will remain "up" => Keep all internal state in a database => Reload internal state on restart • Don't assume external services are always "up" => Design loosely coupled services => Store pending interactions in the database => Retry these periodically (sketched below) • Do "stress testing" on the service during the test/debug cycle
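A minimal sketch of the retry idea: because every pending interaction lives in the database, a crash or restart only delays the work. The loadPendingInteractions, retryInteraction, and markDone helpers are hypothetical stand-ins for whatever database layer actually holds the state.

  #include <chrono>
  #include <iostream>
  #include <string>
  #include <thread>
  #include <vector>

  struct PendingInteraction { std::string id; std::string payload; };

  // Load interactions that have not yet succeeded from the database (stub).
  std::vector<PendingInteraction> loadPendingInteractions() { return {}; }

  // Re-attempt one interaction against the external service (stub).
  bool retryInteraction(const PendingInteraction& p) {
      std::cout << "retrying " << p.id << "\n";
      return true;
  }

  // Remove a completed interaction from the pending table (stub).
  void markDone(const PendingInteraction& p) {
      std::cout << "done: " << p.id << "\n";
  }

  int main() {
      for (;;) {
          // Pick up anything still pending, whether it was left by a failed
          // external call or by our own restart.
          for (const auto& p : loadPendingInteractions()) {
              if (retryInteraction(p)) markDone(p);
          }
          std::this_thread::sleep_for(std::chrono::minutes(5));
      }
  }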
Keep the internal APIs Simple
  int logname_initialize(void);
  void logname_remove(void);
  bool logname_create_logfile(std::string logical_name,
                              bool name_is_final,
                              const std::vector<std::string>& urls);
  bool logname_delete_logfile(std::string logical_name);
  ulong logname_upload_pending_lognames(ulong max_rows,
                                        ulong& total_found,
                                        ulong& max_rows_used);
Encouraging Reuse • SCOOP Archive has lots of strange rules about filenames and metadata • During design and implementation, keep asking: • Is this for the SCOOP project, or • Is this a generic archive feature? • Use good O-O design to keep SCOOP code separate from archive code
Keeping SCOOP to one side...
  class ArchiveFilingLogic {
  public:
      // Called by the default moveFiles implementation
      virtual bool createPhysicalPath(std::string physicalPath);
      virtual bool moveFiles(std::vector<std::string>& fileNames,
                             std::vector<std::string>& missingFiles,
                             std::string stagePath,
                             std::string physicalPath);
      virtual void physicalLocationForFiles(const std::vector<std::string>& filenames,
                                            std::map<std::string, std::string>& directories,
                                            std::map<std::string, std::string>& errors) = 0;
      virtual std::vector<std::string> logicalNamesForFiles(const std::vector<std::string>& filenames,
                                                            std::string physicalPath) = 0;
  };
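The split, as sketched above, is that the generic archive talks only to ArchiveFilingLogic while SCOOP's filename and metadata rules live in a subclass. The class below (ScoopFilingLogic) and its filing/naming rules are purely hypothetical illustrations of that separation, building on the ArchiveFilingLogic declaration above.

  #include <map>
  #include <string>
  #include <vector>

  // Hypothetical SCOOP-specific subclass: the archive core never sees these rules.
  class ScoopFilingLogic : public ArchiveFilingLogic {
  public:
      void physicalLocationForFiles(const std::vector<std::string>& filenames,
                                    std::map<std::string, std::string>& directories,
                                    std::map<std::string, std::string>& errors) override {
          for (const auto& f : filenames) {
              // Illustrative rule only: file SCOOP data by model-name prefix.
              directories[f] = "/archive/scoop/" + f.substr(0, f.find('_'));
          }
      }

      std::vector<std::string> logicalNamesForFiles(const std::vector<std::string>& filenames,
                                                    std::string physicalPath) override {
          std::vector<std::string> logical;
          for (const auto& f : filenames)
              logical.push_back("scoop:" + f);  // illustrative naming convention
          return logical;
      }
  };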
New Requirements • Handling common compression formats • Producing subsets of data (predictively) • Tracking data before it is ingested • Notifying people when data arrives • Transforming data to other formats • Generating analytical data “on the fly” • Federating data across multiple locations • Good initial design will simplify all this...
Highest Priority... • Archive machine running out of space • People have started to rely on the service • So, currently we are uploading copies of all data to SDSC DataCenter, using SRB • Now need to keep track of URLs on physically distributed resources • But SRB can help with some of the other requirements...