CASTOR logging at RAL
Rob Appleyard, James Adams and Kashyap Manjusha
The plan
• Implement CERN's new CASTOR logging system at RAL.
• Start work at the source of the messages and work through to the destination.
• Simple, right?
simple-log-producer
• Easy to set up, ran well.
• We weren't sure why custom code was needed for this; we are aware of off-the-shelf products that do the same thing.
Apollo/ActiveMQ
• CERN planned to move from Apollo to ActiveMQ, so we decided to go straight to the final destination.
• The ActiveMQ broker proved difficult to set up.
  • Arcane configuration.
  • Departmental experience: they got it running once and hoped never to touch it again!
• ActiveMQ seemed like overkill.
  • It is a heavyweight piece of software.
  • Our use case is extremely simple: take messages from multiple sources and forward them on.
• Is there a simpler way of doing this?
Message Broker: The solution!
• Replace the ActiveMQ broker with some rsyslog config that does the same thing.
• We use rsyslog for all other Tier 1 logging at RAL.
• A lightweight solution that does what we need.
• Simply send the unprocessed log messages over TCP.
• A couple of lines in rsyslog.conf were all that was necessary (see the sketch below).
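As an illustration, a minimal rsyslog sketch of the forwarding described above; the hostname and port are placeholders, not the actual RAL configuration.

    # On the CASTOR source nodes: forward every message, unmodified, over TCP
    # ('@@' means TCP, a single '@' would mean UDP).
    *.*  @@castor-viewer.example.ac.uk:5514

    # On the receiving ('viewer') node: accept syslog messages over TCP.
    $ModLoad imtcp
    $InputTCPServerRun 5514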
But…
• The simple-log-producer is not simply a forwarding mechanism: messages are processed locally before transmission.
• We need to do the processing done by slp somewhere downstream.
• Combine slp and the consumer scripts into one script that runs on the 'viewer' node.
• We could also eliminate the rsyslog broker and just send directly to the viewer.
smooshed-log-producer
• An attempt to combine simple-log-producer with the consumers.
• James spent several days working on this.
• We came to think that the system could more easily be re-implemented using standard software.
• Let's have a go!
The Off-the-Shelf Approach
• Use logstash feeding ElasticSearch and Kibana.
• All three components are affiliated open-source products.
• <1 day to produce a working prototype.
• We already have a solution for long-term archival of log messages: central loggers that capture all Tier 1 messages.
Logstash
• Open-source log management tool.
• Input -> Filters -> Output
• Can interface with… more or less anything.
Logstash
• Our setup receives syslog messages over TCP, tokenises them, and forwards them to ElasticSearch (see the sketch below).
• RAL has experimented with it in the past for other applications.
• See: logstash.net
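For illustration, a minimal sketch of a logstash configuration along these lines (1.x-era syntax; the port, grok pattern and ElasticSearch host are assumptions, and newer logstash releases use "hosts" rather than "host" in the elasticsearch output):

    input {
      tcp {
        # listen for the syslog messages forwarded by rsyslog
        port => 5514
        type => "syslog"
      }
    }
    filter {
      grok {
        # split the raw syslog line into timestamp, host, program and payload
        match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:source_host} %{DATA:program}: %{GREEDYDATA:payload}" ]
      }
      kv {
        # CASTOR log payloads are largely key=value pairs; tokenise them into fields
        source => "payload"
      }
    }
    output {
      elasticsearch {
        host => "localhost"
      }
    }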
ElasticSearch
• Distributed real-time search and analysis tool.
• Based on Apache Lucene (JSON document-based search).
• Horizontally scalable: need more capacity? Just add more nodes (see the sketch below).
• Currently running on a two-machine cluster.
• Accepting messages from our preproduction instance.
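A hypothetical elasticsearch.yml fragment showing why adding capacity is cheap: any new machine started with the same cluster.name joins the cluster and takes its share of the data (the names and hosts here are placeholders):

    # elasticsearch.yml on each node of the logging cluster
    cluster.name: castor-logging
    node.name: es-node-01
    # if multicast discovery is disabled, point new nodes at existing members
    discovery.zen.ping.unicast.hosts: ["es-node-01", "es-node-02"]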
Kibana
• Web front-end for ElasticSearch.
• Index and full text of every CASTOR log message, all tokenised and searchable.
• Search on any message field.
• Arbitrary queries (an illustrative example follows below).
• Lots of graphs and analytics.
• Much faster than DLF, at least against preprod.
• No Oracle or MySQL database required.
• Current implementation: LINK!
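As an example of the kind of arbitrary query this allows, the Lucene query-string syntax Kibana passes to ElasticSearch can also be exercised directly with curl; the index and field names below are illustrative, not the actual RAL schema:

    curl 'http://localhost:9200/castor-preprod/_search?pretty' -d '{
      "query": {
        "query_string": { "query": "severity:Error AND source_host:gdss*" }
      },
      "size": 20
    }'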
The Result
We have a system that appears capable of fulfilling our needs better than DLF.
• Faster.
• Able to run arbitrary queries.
• Components are all off-the-shelf.
• Correctly handles all the CASTOR messages that DLF did (which isn't everything…).
• Needs some help to interpret and deal with a few anomalies.
  • The xroot logs are weird.
Future Plans
• Currently working against preprod during 2.1.14 stress testing.
  • During stress testing, we received 16 GB/day of CASTOR logs.
  • No dependency on 2.1.14.
• We aim to start testing against our production instances before Christmas.
• Scalability?
  • The plan is to have one message index per CASTOR instance.
• Possible future development:
  • Reconfigure rsyslog on the source nodes to send JSON to logstash rather than plain syslog (should be pretty trivial; a sketch follows below).
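A hedged sketch of what that rsyslog change might look like, using a template that emits each message as a JSON document (the property-replacer 'json' option needs a reasonably recent rsyslog, and the template name, hostname and port are all hypothetical):

    # Emit each log message as a small JSON document instead of a plain syslog line.
    $template CastorJSON,"{\"host\":\"%hostname%\",\"program\":\"%programname%\",\"message\":\"%msg:::json%\"}\n"

    # Forward over TCP using that template; logstash can then parse it with its json codec/filter.
    *.*  @@castor-viewer.example.ac.uk:5514;CastorJSON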