Gamma-ray Large Area Space Telescope GLAST Large Area Telescope Data Access Tony Johnson Stanford Linear Accelerator Center tonyj@slac.stanford.edu http://glast-ground.slac.stanford.edu/
Outline • Topics Covered • xrootd • LAT Data Catalog • Features • Web Interface • Tools • Download Manager • Skimmer • WIRED • Astro Server • Miscellaneous
xrootd • xrootd • System developed at SLAC to manage large datasets • Distributes files across disks • Maximizes throughput • Minimizes manual disk management • Automates archiving datasets to (and restoring from) tape • Provides more reliability and scalability than NFS • Supports access control based on GLAST collaborator list • Has been in use for OpsSim2 and “Big MC Run” • Mostly working smoothly • Miscellaneous idiosyncrasies that need to be understood • Timeout problems when reading files
LAT Data Catalog • Data catalog is a database designed for tracking LAT datasets • Can be used with • Disk files in AFS, NFS, or XROOTD servers, or tape archives • Data created inside or outside of processing pipeline • Data created/stored at SLAC or elsewhere • One or more locations per dataset • Simplifies access to data by providing a uniform view of files irrespective of their physical location • Allows data to be organized into a tree of “virtual” folders • Folders don’t have to correspond to physical location of data • Allows data to have associated “meta-data” • Some meta-data is required and verified by catalog • size, location, run range, creation date • Other meta-data is user-defined and arbitrarily extensible • Data can be • Browsed using virtual folders and “groups” • Folders contain arbitrary sub-folders, datasets and groups • Groups contain homogeneous list of datasets • Searched using meta-data • E.g. DatasetType=MC && RunMin > 50 && RunMin < 100 • Data crawler • As new datasets are registered crawler validates files and extracts meta-data (file size, number of events, etc).
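The meta-data search above (DatasetType=MC && RunMin > 50 && RunMin < 100) can be illustrated with a minimal Python sketch. This is not the catalog's API: the `Dataset` class, the `search` helper and the sample entries are hypothetical stand-ins, used only to show how a search over required and user-defined meta-data might behave.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for one catalog entry."""
    name: str
    metadata: dict = field(default_factory=dict)


def search(datasets, predicate):
    """Return the datasets whose meta-data satisfies the predicate."""
    return [d for d in datasets if predicate(d.metadata)]


def mc_runs_50_to_100(meta):
    """Python equivalent of: DatasetType=MC && RunMin > 50 && RunMin < 100."""
    return (
        meta.get("DatasetType") == "MC"
        and "RunMin" in meta
        and 50 < meta["RunMin"] < 100
    )


# Made-up sample entries, for illustration only.
catalog = [
    Dataset("a", {"DatasetType": "MC", "RunMin": 60}),
    Dataset("b", {"DatasetType": "MC", "RunMin": 150}),
    Dataset("c", {"DatasetType": "DATA", "RunMin": 75}),
]
matches = [d.name for d in search(catalog, mc_runs_50_to_100)]
```

Because meta-data is kept as an open dictionary, user-defined keys can be searched the same way as the required ones.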
LAT Data Catalog - Web Interface • Access/authentication handled by web • Dataset description • Events, file size, run range automatically set by “crawler” • Supports mirroring at multiple sites • Browsable tree of datasets • Meta-data added by creator • http://glast-ground.slac.stanford.edu/DataCatalog/
LAT Data Catalog - Tools • Pipeline Tools • From within “Pipeline Scriptlet” datasets can be • registered together with meta-data and multiple locations • located using meta-data and passed to subsequent processing stages • Command Line Tools • Available now • registerDataset • Wildcards supported for registering many datasets at once • find • List/search for files • addLocation • addMetadata • Coming soon • remove • move • Java API • Programmatic access to full functionality • More Info • Data catalog User’s Guide • http://confluence.slac.stanford.edu/display/ds/Data+Catalog+Users+Guide
Recent Improvements • Line-mode client find command
• datacat find -G merit /MC-Tasks/OpsSim/opssim2-GR-v13r9/runs -s RunMin
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000001-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000002-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000003-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000004-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000005-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000006-merit.root
• datacat find --recurse --search-groups -F 'DataType=="MERIT"&&nMetStart>=257731200 && nMetStart<=257731202' -S SLAC_XROOT -s TaskName -s Name /MC-Tasks/OpsSim/
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-HEAD1-1041-2-6/merit/opssim2-GR-HEAD1-1041-2-6-000000-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-HEAD1-1041-2-6/merit/opssim2-GR-HEAD1-1041-2-6-000001-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000001-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p1/merit/opssim2-GR-v13r9p1-000000-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p1/merit/opssim2-GR-v13r9p1-000001-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2/merit/opssim2-GR-v13r9p2-000000-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2/merit/opssim2-GR-v13r9p2-000001-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2-np/merit/opssim2-GR-v13r9p2-np-000000-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2-np/merit/opssim2-GR-v13r9p2-np-000001-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p3/merit/opssim2-GR-v13r9p3-000000-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p3/merit/opssim2-GR-v13r9p3-000001-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000000-merit.root
    root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000001-merit.root
• Available now in DEV, feedback encouraged • Dan is preparing an addition to the Data Catalog User’s Guide • Enhancements to data catalog access in pipeline • Access meta-data from search results
Recent Improvements • New faster crawler • Original crawler was not able to keep up with MC running at full throttle. • New crawler processes files in parallel and can easily keep up • During Ops Sim2 problems discovered with files >2GB in length • Now fixed
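The idea of a crawler that processes files in parallel can be sketched as follows. This is only an illustration under stated assumptions, not the actual crawler: `crawl_one` and `crawl` are hypothetical names, a thread pool is one possible way to parallelize, and file size stands in for the fuller meta-data extraction (number of events, etc.) described earlier.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def crawl_one(path):
    """Validate one file and extract basic meta-data (here only its size)."""
    if not os.path.isfile(path):
        return path, None  # missing file: flag it for follow-up
    # os.path.getsize returns an arbitrary-precision int, so files
    # larger than 2 GB need no special handling in this sketch.
    return path, {"size": os.path.getsize(path)}


def crawl(paths, workers=4):
    """Crawl many files concurrently; return {path: meta-data or None}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(crawl_one, paths))
```

Calling `crawl(list_of_paths)` returns one entry per path, with `None` marking files that failed validation.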
Status/Problems/Plans • Problems • Can be painfully slow (with 5,000,000 datasets) • New Oracle database being tested now • Karen working on adding “materialized views” • Further optimization of queries needed • Sensible pagination of large datasets • Web interface needs to allow selection of data based on • Run number range • Time range • Meta-data search (cf. line-mode client) • File versions • As of Ops Sim 2, L1Proc registers multiple versions of files • r0257998848_v001_merit.root • r0257998848_v002_merit.root • Data catalog does not know these are multiple versions of the same file • Sends them both to the skimmer, resulting in duplicate events • Propose to add versioning to data catalog (show only latest by default) • Need custom views of data • E.g. all ASP products for run nnn source abc • Plan • Fix problems
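The proposed "show only latest by default" behaviour can be sketched in Python. This is an assumption-laden illustration, not the catalog's implementation: the `latest_versions` helper is hypothetical, and the `_vNNN_` naming pattern is inferred only from the two file names quoted above.

```python
import re

# Assumed naming pattern, inferred from the examples above:
#   <stem>_v<version>_<rest>, e.g. r0257998848_v001_merit.root
VERSIONED = re.compile(r"^(?P<stem>.+)_v(?P<version>\d+)_(?P<rest>.+)$")


def latest_versions(filenames):
    """Keep only the highest-numbered version of each file."""
    best = {}
    for name in filenames:
        match = VERSIONED.match(name)
        if match is None:
            # Unversioned files are kept as they are.
            key, version = name, 0
        else:
            key = (match["stem"], match["rest"])
            version = int(match["version"])
        if key not in best or version > best[key][0]:
            best[key] = (version, name)
    return sorted(name for _, name in best.values())


files = ["r0257998848_v001_merit.root", "r0257998848_v002_merit.root"]
latest = latest_versions(files)
```

With this filter applied before datasets are handed to the skimmer, only one version of each file would be read, which avoids the duplicate events.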
Download Manager • One-click download of multiple files • Inherits authorization from web login • note no anonymous FTP in future – SLAC account will be required for data access • Works with ftp:, http: and root: • Validates files (length, checksum) against data catalog • Supports simultaneous download of multiple files • Does not download files which already exist in target dir • So easy to fetch recently added files • Can resume download of partially downloaded files
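The skip, resume and validate behaviour listed above can be sketched as a small decision function. This is a hedged illustration rather than the Download Manager's code: `file_checksum` and `download_action` are hypothetical names, and SHA-256 is an assumption, since the slides do not say which checksum the data catalog stores.

```python
import hashlib
import os


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files are never read into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_action(path, expected_size, expected_checksum):
    """Decide what to do with one target file: 'skip', 'resume' or 'download'."""
    if not os.path.exists(path):
        return "download"  # nothing in the target dir yet
    size = os.path.getsize(path)
    if size < expected_size:
        return "resume"  # partially downloaded file
    if size == expected_size and file_checksum(path) == expected_checksum:
        return "skip"  # already present and valid
    return "download"  # wrong length or checksum: fetch again
```

The expected size and checksum would come from the data catalog entry for each file.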
Status/Problems/Plans • Several problems discovered during Ops Sim 2 • 100% CPU usage after file recovery (fixed) • Bad error message if checksum inconsistent (fixed) • Problems downloading files >2GB (almost fixed) • New feature • Start/Pause download requested (now available) • Feature requests pending • Ability to download selected run/time ranges • This will work automatically once this feature is added to data catalog web application • Non-GUI version for automated download/sync of data • Ability to select files to download from GUI (without web)
LAT Data Skimmer • Allows data to be selected using “TCut” on tuple columns • Can output either Root or Fits (FT1) files • Uses Pipeline II for data processing • Allows parallel processing for large tasks • Output available for download for 10 days • Complete skim history maintained for later reuse
3 Ways to Access Data Skimmer • Directly from Data Portal • http://glast-ground.slac.stanford.edu/DataPortal/ • click on “Simple Skimmer” • Data Processing Page(s) • From the Data Catalog
Status/Problems/Plans • Problems • Backend/root crashes • New (compiled) backend available soon • E-mail notification should include data dir even if failed • Need to be able to navigate from pipeline -> data dirs • Skimmer improvements in progress • Ability to skim more types of files • “svac”, “cal” and “gcr” added by David Chamont • Web interface needs to catch up • Ability to output more event types • Full Recon, Digi, MC trees • “Extended Event” (intermediate between FT1 and Merit) • Event Lists • CompositeEventLists (CEL) files • Access to more “expert” options
Event Display (WIRED) • WIRED allows quick look at detector response • can be installed directly from Web with no additional GLAST software required. • Uses “HepRep” interchange format/infrastructure (shared with FRED)
Status/Problems/Plans • According to rumour doesn’t work outside my office • Actually it doesn’t work in my office either • But it did work fine for DC2 data • Invariant under spatial translations/rotations • Now being hooked up to data catalog/xrootd • Issue related to CEL files in gleam being investigated • Should be working again in next few days • “Event Display” link will appear in data catalog • Will support browsing events or selection of specific events
Astro Data Server • Similar to skimmer, allows events to be selected using cuts • Cuts can only be on position in the sky, energy, time, and event category • Works much faster than Skimmer • Currently loaded with DC2 data • Currently being refurbished for use with Service Challenge data and beyond • Will load all events as soon as they are produced by L1Proc • User will be able to select • all data including partial runs • only “complete” runs • Loose event cuts CTBClassLevel>1 • User can select CTBClassLevel category • Able to output FT1, FT2, Extended event files, Merit root files • API for programmatic event selection • Will be used by ASDC tools • Closer integration with data catalog, skimmer
Astro Data Server • Astro data server will remember the last set of parameters you used • Astro Server also has a “Favorites” page • Keeps a list of your “favorite” search parameters
Status/Problems/Plans • Was used for SC2 55 day run • Not used in Ops Sim 2 • Still plan to • Load data from L1Proc • Add programmatic interface for use by ASP/ASDC tools • Better integration with Data Portal • Bottom of priority list
Miscellaneous • Data Access Restrictions • Starting very soon (this week hopefully) you will need to be a “glast collaborator” to access files from xrootd • You will need to log in to access data catalog/download manager • Need to define standard skims • Automate their production • Part of RSP? • Automate their registration in data catalog • Access to ASP/RSP data has not been discussed here • But is in the plan • Feedback from Ops Sim2 has been very useful • Not all digested yet • Need more/better documentation • Data Access frequently asked questions • http://confluence.slac.stanford.edu/x/zgAz • Please suggest more FAQs • More feedback welcome • http://glast-ground.slac.stanford.edu/DataPortal/