
Presentation Transcript


  1. Paul Scherrer Institut
  Timo Korhonen
  Improvements to Indexing Tool (Channel Archiver)
  EPICS Meeting, BNL 2010

  2. Channel Archiver at PSI
  • Currently four different archive servers are in use.
  • SLS Accelerator data: slsmcarch (machine archive server; HP, Xeon quad-core 2.66 GHz, 32 GB RAM)
    • Long Term: since January 2001; 10314 channels; 70 GB
    • Medium Term: 6 months; 66883 channels; 120 GB
    • Short Term Archiver: 14 days; 70381 channels; 114 GB
    • Post Mortem Archiver: stores the "famous last words" (the last data before a failure)
    • Total available disk space for data: 500 GB
  • SLS Beamline data: slsblarch (beamline archive server; HP, AMD Opteron dual-core 1.8 GHz; 6 GB RAM)
    • Long and short term archivers for every beamline (29 engines in total)
    • Short term archivers store data for up to 12 months
    • Total amount of data: 163 GB used of 384 GB

  3. Channel Archiver at PSI
  • Archive servers (continued)
  • PSI (office) data: gfaofarch
    • Long Term Archiver: stores data since January 2006
    • Medium and Short Term Archivers
  • ZHE (Cyclotron High Energy)
    • Long term (since April 2008)
    • Medium and short term
  • SwissFEL: felarch1 (HP, quad-core 2.66 GHz, 10 GB RAM)
    • Small test stand OBLA
      • 638 channels, 2.1 terabytes!
      • Waveforms, images
    • FIN250 test injector
      • LT, MT and ST (0.6, 7.9 and 464 GB)

  4. Channel Archiver at PSI
  • The archive engines are running stably
  • The problems we have had are on the retrieval side
  • Indexing is used to speed up retrieval
    • Indexes on daily files
    • Master index over the whole archived data
  • We need the performance
    • The SwissFEL test machine is going to produce a lot of data
      • Waveforms, images
    • We need to archive more than in a production machine
  • For us, there is no need for an (immediate) change
    • We would like to keep the Channel Archiver going
    • Updates, bug fixes
  • Retrieval tools
    • Waveform viewer, etc. have been developed
    • Matlab export would be welcome
  • Indexing tools need work

  5. Index Tool improvements
  • Background
    • The ArchiveIndexTool is run at PSI every week, during the night from Saturday to Sunday, to create master indexes for the medium term archive.
    • Indexing is essential for good retrieval performance.
    • The tool produces many errors when run on the EPICS archive indices to create or to update the master index.
  • Disclaimer: I know very little about this; I am just reporting what the people who work on it have told me.
  • Involved people:
    • Gaudenz Jud (archiver maintenance, operation and development)
    • Hans-Christian Stadler (PSI IT, Scientific Computing) is investigating the issue together with Gaudenz

  6. Index Tool improvements
  • Findings so far, after investigating an error log:
    • From the code it is clear that the ArchiveEngine and the ArchiveIndexTool are not supposed to be used concurrently on the same indices.
    • Running them concurrently does produce errors, but not the ones we see in production.
    • The errors seem to occur only on the production machine, when there is a high load and a lot of disk activity.
    • Quick fix attempt: a retry mechanism at the highest level, where all index files are closed and reopened after a delay (see the sketch below). This quick fix seems to work so far.
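  A minimal sketch of what such a close/wait/reopen retry wrapper could look like in C++. The type and function names (IndexSet, open_indices, close_indices, run_index_update) are placeholders for illustration, not the actual ArchiveIndexTool code:

    // Sketch only: stubs stand in for the real index handling.
    #include <chrono>
    #include <exception>
    #include <thread>

    struct IndexSet { bool open = false; };          // placeholder for all open index files

    IndexSet open_indices()              { return IndexSet{true}; }   // stub: open all index files
    void     close_indices(IndexSet& s)  { s.open = false; }          // stub: close them again
    void     run_index_update(IndexSet&) {}                           // stub: the work that can fail

    void update_with_retry(int max_attempts, std::chrono::seconds delay)
    {
        for (int attempt = 1; attempt <= max_attempts; ++attempt)
        {
            IndexSet indices = open_indices();
            try
            {
                run_index_update(indices);
                close_indices(indices);
                return;                              // success
            }
            catch (const std::exception&)
            {
                // On failure, close everything, wait, and retry from
                // scratch with freshly opened index files.
                close_indices(indices);
                if (attempt == max_attempts)
                    throw;                           // give up after the last attempt
                std::this_thread::sleep_for(delay);
            }
        }
    }

  The point of the retry is simply to discard any possibly stale file state and start over once the transient load situation has passed.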

  7. Observations:
  • The RTree implementation does not allow concurrent read/write access. It might be possible to arrange the file operations in a way that allows concurrent access when the index is stored on a strictly POSIX compliant file system.
  • The RTree implementation has an RTree node "cache" that only grows; nodes are never evicted from the cache. I am implementing a new LRU node cache with a fixed number of entries to see if this reduces system load (see the sketch after this list).
  • The RTree implementation uses many small disk operations (the slide showed example code, not reproduced in this transcript). A reimplementation should use large disk transfers.
  • The RTree implementation is like a B-Tree, but does not adjust the node size to the disk sector size for improved I/O performance.
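  A minimal sketch of a fixed-capacity LRU node cache of the kind mentioned above, assuming nodes are identified by their file offset. The names (NodeId, Node, load_node, LRUNodeCache) are illustrative assumptions, not the Channel Archiver's actual classes:

    #include <cstddef>
    #include <cstdint>
    #include <list>
    #include <memory>
    #include <unordered_map>
    #include <utility>

    using NodeId = std::uint64_t;                    // assumed: file offset of an RTree node
    struct Node { /* decoded RTree node data would live here */ };

    class LRUNodeCache
    {
    public:
        explicit LRUNodeCache(std::size_t capacity) : capacity_(capacity) {}

        // Return a cached node, or load it and insert it, evicting the
        // least recently used entry when the cache is full.
        std::shared_ptr<Node> get(NodeId id)
        {
            auto it = map_.find(id);
            if (it != map_.end())
            {   // Hit: move the entry to the front (most recently used).
                lru_.splice(lru_.begin(), lru_, it->second);
                return it->second->second;
            }
            if (lru_.size() >= capacity_)
            {   // Miss at capacity: drop the least recently used node.
                map_.erase(lru_.back().first);
                lru_.pop_back();
            }
            auto node = load_node(id);
            lru_.emplace_front(id, node);
            map_[id] = lru_.begin();
            return node;
        }

    private:
        using Entry = std::pair<NodeId, std::shared_ptr<Node>>;

        std::shared_ptr<Node> load_node(NodeId)
        {   // Stub: the real tool would read and decode the node from the index file.
            return std::make_shared<Node>();
        }

        std::size_t capacity_;
        std::list<Entry> lru_;                                    // front = most recently used
        std::unordered_map<NodeId, std::list<Entry>::iterator> map_;
    };

  In contrast to a cache that only grows, the fixed capacity bounds memory use; the trade-off is extra disk reads when evicted nodes are needed again.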

  8. Observations (continued):
  • The RTree implementation is not optimal for the use case seen at SLS, where data is inserted only at the end. This leads to a reduced fill level of the nodes: the RTree maintains the invariant that only the root node may be filled to less than half its capacity. In addition, data is moved between nodes too often, leading to many random accesses on disk. A reimplementation should feature a data structure that is optimal for appends at the end (see the sketch below).
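  One possible shape for such an append-friendly structure, shown as a minimal C++ sketch under the assumption that index entries arrive in time order. Every leaf except the last stays completely full and entries are never redistributed between nodes; the Entry fields and class name are illustrative, not the real index format:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Entry { std::uint64_t start_time; std::uint64_t data_offset; };

    class AppendOnlyLeafIndex
    {
    public:
        explicit AppendOnlyLeafIndex(std::size_t leaf_capacity)
            : capacity_(leaf_capacity) {}

        // Data is archived in time order, so appends always hit the last leaf.
        void append(const Entry& e)
        {
            if (leaves_.empty() || leaves_.back().size() == capacity_)
                leaves_.emplace_back();      // start a new, initially empty leaf
            leaves_.back().push_back(e);     // no redistribution, no random I/O
        }

        std::size_t leaf_count() const { return leaves_.size(); }

    private:
        std::size_t capacity_;
        std::vector<std::vector<Entry>> leaves_;   // on disk these would be fixed-size blocks
    };

  Because only the rightmost leaf is ever modified, writes stay sequential and all other nodes remain at full fill level, avoiding the half-full nodes and frequent entry movement seen with the current RTree.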

  9. Conclusions so far:
  • Finding out the real reason for the errors is a time-consuming process. The real reason has not yet been identified.
  • The offsets to data structures in the index get corrupted; however, it is not clear where.
  • Because the corruption only happens when the load on the production system is high, logical errors in the normal execution path can almost certainly be excluded.
  • The experience so far suggests that a new implementation of the RTree code could solve a number of problems.

  10. Thank you for your attention!
