Alessandro Cavalli, Alfredo Pagano (INFN/CNAF, Bologna, Italy)
Failover Procedures for operational tools
COD-15 PARALLEL SECTIONS
Lyon, 7 February 2008

GOCDB
• People present: Alessandro, Cristina, Cyril, Gilles, Guillaume, Kai, Osman
• The two main points to work on, in parallel, in the near future:
  • A quick solution to get some level of failover ASAP
  • The final failover solution
• First, quick solution:
  • Alessandro's quick-and-dirty idea is to automate what was done manually for the CIC portal in the past:
    • Close write access to the DB
    • Issue the “export” command to create a full dump (schema, tables, indexes, etc.)
    • Reopen full access to the DB
    • Compress and transfer the dump, with a checksum check
    • Apply the dump to the backup DB (CNAF)
  • The idea is to pick the least disruptive moment of the 24 hours (e.g. during working hours in the Americas and the Pacific) and to automate all this as a cron job on both sides (RAL and CNAF). On an Oracle RAC cluster, it may need to be done on only one of the cluster nodes
  • Given the experience with the CIC portal, the small DB size suggests that the main DB will be closed to writes for only a few minutes (well under 5)

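The compress, transfer and checksum step of this quick solution could look roughly like the Python sketch below. It is only an illustration under assumptions not stated on the slide: SHA-256 is used as the checksum, the file names are hypothetical, and the write lock, the Oracle export/import and the actual RAL-to-CNAF copy are left out (any file transfer tool could move the two files).

```python
import gzip
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    # Assumption: SHA-256; the slide does not name a checksum algorithm.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_dump(dump: Path) -> tuple[Path, Path]:
    """Sending side (RAL): gzip the export file and write a checksum sidecar file."""
    packed = dump.with_name(dump.name + ".gz")
    with dump.open("rb") as src, gzip.open(packed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    sidecar = packed.with_name(packed.name + ".sha256")
    # Same "<digest>  <file name>" layout that the sha256sum tool writes.
    sidecar.write_text(f"{sha256_of(packed)}  {packed.name}\n")
    return packed, sidecar


def unpack_dump(packed: Path, sidecar: Path, target: Path) -> Path:
    """Receiving side (CNAF): verify the checksum, then decompress for the import."""
    expected = sidecar.read_text().split()[0]
    actual = sha256_of(packed)
    if actual != expected:
        # Never apply a dump that was truncated or corrupted in transit.
        raise ValueError(f"checksum mismatch for {packed.name}: {actual} != {expected}")
    with gzip.open(packed, "rb") as src, target.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dump = Path(tmp) / "gocdb_full.dmp"  # hypothetical export file name
        dump.write_bytes(b"stand-in for an Oracle export file\n" * 1000)
        packed, sidecar = pack_dump(dump)
        restored = unpack_dump(packed, sidecar, Path(tmp) / "restored.dmp")
        print(restored.read_bytes() == dump.read_bytes())  # True
```

Keeping the checksum in a separate sidecar file lets the CNAF side refuse a damaged dump before it touches the backup DB.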
GOCDB (2)
• For the other, more definitive failover approach, it was agreed that:
  • We want to set it up between RAL and CNAF
  • It should refresh data more frequently, ideally with real-time synchronization
  • It would be a read-only DB anyway
• We could give Streams a try:
  • Guillaume is confident that it should be achievable, even as a first step (skipping the quick-and-dirty solution)
  • Alessandro is more worried, because of feedback from CNAF DBAs about how tricky Streams is, and strong resistance from other DBA teams to supporting it
• Connection method:
  • From the operations tools (READ ONLY): Oracle-style connection properties can be used, with the “FAILOVER = ON” attribute. The RO replica can also be included, as long as we can guarantee adequate data freshness on the replica DB
  • From the web interface (READ WRITE): special care is needed to detect when the web interface is using the RO replica because the main DB is down (KO); in that case it must notify the web user that UPDATING DATA is NOT possible

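Both connection rules can be sketched in Python, under stated assumptions: the host names, port and service name are hypothetical placeholders, the descriptor follows standard Oracle Net syntax with FAILOVER=ON inside the ADDRESS_LIST, and `connected_to_replica` stands in for whatever check the web interface would really use to notice it fell back to the read-only replica.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DbEndpoint:
    host: str
    port: int = 1521  # default Oracle listener port


def failover_descriptor(endpoints: list[DbEndpoint], service_name: str) -> str:
    """Build an Oracle Net connect descriptor that tries the endpoints in order.

    FAILOVER=ON lets the client move on to the next ADDRESS when a connection
    attempt fails; LOAD_BALANCE=OFF keeps the listed order, so the main DB is
    always tried before the read-only replica.
    """
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={ep.host})(PORT={ep.port}))" for ep in endpoints
    )
    return (
        "(DESCRIPTION="
        f"(ADDRESS_LIST=(FAILOVER=ON)(LOAD_BALANCE=OFF){addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


READ_ONLY_BANNER = "GOCDB is running on its read-only replica: updating data is NOT possible."


class ReadOnlyModeError(RuntimeError):
    """Raised when the web interface tries to write while on the RO replica."""


def banner_for(connected_to_replica: bool) -> Optional[str]:
    """Message to show every web user while the replica is in use, else None.

    `connected_to_replica` is hypothetical: how the web interface detects that
    it fell back to the replica is not specified on the slide.
    """
    return READ_ONLY_BANNER if connected_to_replica else None


def guard_write(connected_to_replica: bool) -> None:
    """Call before any UPDATE/INSERT/DELETE issued by the web interface."""
    if connected_to_replica:
        raise ReadOnlyModeError(READ_ONLY_BANNER)


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    main = DbEndpoint("gocdb-main.example.org")
    replica = DbEndpoint("gocdb-replica.example.org")
    print(failover_descriptor([main, replica], "gocdb"))
```

Listing the main DB first with LOAD_BALANCE=OFF means the read-only tools only reach the replica when the main DB cannot be contacted, while the web interface additionally blocks writes and warns the user.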
GOCDB (3)
• Cristina will pass on everything we discussed to Keir (GOCDB Oracle admin), so that he can form his own opinion
• Soon, right after this COD meeting, we will have a phone meeting with Keir and Alfredo to go into more detail and set up the collaboration needed to deliver the first-level solution
• Fruitful talk with Osman: he gets data from GOCDB for the CIC portal using materialized views. This must be investigated ASAP, because with the “REFRESH COMPLETE” clause we might quite easily get a periodic snapshot at CNAF without ANY additional feature or configuration at RAL
• As soon as possible, Alessandro and Alfredo will study the materialized view to be set up at CNAF, pulling data from RAL

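The REFRESH COMPLETE idea could be tried with one materialized view per GOCDB table, created at CNAF over a database link to RAL. The Python sketch below only generates the Oracle DDL; the table names, the link name `ral_gocdb` and the refresh interval are hypothetical, and the link itself, its privileges and the list of tables to copy would still have to be agreed with Keir.

```python
import re

# Simple Oracle identifiers only, short enough that "<name>_mv" stays within
# the classic 30-character identifier limit.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,26}")


def _check_ident(name: str) -> str:
    """Reject anything that is not a plain identifier, to keep the DDL safe."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"not a simple Oracle identifier: {name!r}")
    return name


def snapshot_ddl(table: str, db_link: str, refresh_hours: int = 1) -> str:
    """DDL for a periodic full copy (complete refresh) of one remote table.

    A complete refresh re-reads the whole remote table, so no materialized
    view log is needed on the RAL side.
    """
    if refresh_hours < 1:
        raise ValueError("refresh_hours must be >= 1")
    table, db_link = _check_ident(table), _check_ident(db_link)
    return (
        f"CREATE MATERIALIZED VIEW {table}_mv\n"
        "  BUILD IMMEDIATE\n"
        "  REFRESH COMPLETE\n"
        # SYSDATE + n/24 adds n hours (Oracle date arithmetic is in days).
        f"  START WITH SYSDATE NEXT SYSDATE + {refresh_hours}/24\n"
        f"  AS SELECT * FROM {table}@{db_link}"
    )


if __name__ == "__main__":
    # Hypothetical table names; the real GOCDB schema may differ.
    for tbl in ("sites", "site_contacts"):
        print(snapshot_ddl(tbl, "ral_gocdb") + ";\n")
```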
CIC portal
• People: Alessandro and Gilles
• We only had time to focus on:
  • Which CIC portal components would be seriously affected by switching to READ ONLY DB failover mode?
  • Is it worth the effort? (Would we get a usable portal, relative to the effort needed to achieve it?)
• Affected portal elements:
  • COD dashboard should work!! (except NOTEPAD & HANDOVER)
  • VO ID cards: READ ONLY
  • SITE/ROC reports: NOT working
  • BROADCAST TOOL: without archiving (is this acceptable?)
  • S.D. notification: NOT working
• Conclusions:
  • as we make progress with GOCDB over the next few weeks, we could try to apply a similar solution to the CIC portal
  • on the “is it worth the effort” question: does anyone have comments, as a portal developer or as a portal user?
• Real-time discussion with Gilles:
  • he can produce some useful statistics on the percentage of READ vs WRITE requests
  • the impression is that it is worth the effort: lower priority than GOCDB, but let's give it a try

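The READ/WRITE statistics could be approximated from the portal's web server access log. A minimal Python sketch, assuming Apache-style log lines with the quoted request field (e.g. `"GET /path HTTP/1.1"`) and treating GET/HEAD as reads and every other method as a write; both the log format and that classification are assumptions, since some portal pages may also write on GET.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request field of Common/Combined Log Format lines.
_REQUEST = re.compile(r'"([A-Z]+) \S+ HTTP/[0-9.]+"')
READ_METHODS = frozenset({"GET", "HEAD"})  # assumption: everything else writes


def read_write_share(lines: Iterable[str]) -> dict[str, float]:
    """Return the percentage of READ and WRITE requests among parseable lines."""
    counts: Counter = Counter()
    for line in lines:
        match = _REQUEST.search(line)
        if match is None:
            continue  # skip malformed or non-request lines
        kind = "READ" if match.group(1) in READ_METHODS else "WRITE"
        counts[kind] += 1
    total = counts["READ"] + counts["WRITE"]
    if total == 0:
        return {"READ": 0.0, "WRITE": 0.0}
    return {kind: 100.0 * counts[kind] / total for kind in ("READ", "WRITE")}


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        '1.2.3.4 - - [07/Feb/2008:10:00:00 +0100] "GET /dashboard HTTP/1.1" 200 512',
        '1.2.3.4 - - [07/Feb/2008:10:00:05 +0100] "GET /vo HTTP/1.1" 200 812',
        '1.2.3.5 - - [07/Feb/2008:10:01:00 +0100] "POST /broadcast HTTP/1.1" 302 0',
        '1.2.3.6 - - [07/Feb/2008:10:02:00 +0100] "GET /reports HTTP/1.0" 200 90',
    ]
    print(read_write_share(sample))  # {'READ': 75.0, 'WRITE': 25.0}
```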
Actions
• GOCDB: some results must be achieved in the next 2 weeks (by Feb 22)
• GOCDB: a first level of failover, either manual or automatic depending on progress, ready at CNAF (by end of March)
• CIC portal: TBD, depending partly on GOCDB achievements
• Contact Emir again: some test results for the next face-to-face COD meeting
• Update the Wiki