Operation in AB-CO 2005 & Beyond
Scope • How to ensure operation support with the right quality of service • Domains are: • PS Complex: Linac2, Linac3, PSB, PS, AD, Isolde (+ REX), LEIR • SPS & transfer lines • Experimental area • CTF3 • LHC hardware commissioning • Cryogenic systems • Beam interlock & powering interlock systems • QPS • Vacuum, PO • LHC
Objectives • Homogenize principles across the different domains • Include the new requirements • Hardware commissioning • LHC commissioning & operation • Identify and agree with partners on responsibility limits • Issue recommendations on organization, tools, procedures
Planning • 15 October: first meeting • End of December: proposals for 2005 • End of April: proposals for 2006-2010, keeping in mind: • 2006-2007 hardware commissioning • 2006 LEIR run • 2007-2008 LHC commissioning • 2008-2009 first phase of LHC operation
Recommendations for 2005 • LINAC, BOOSTER, ISOLDE, LINAC3 • As it is now, with CO internal adjustments • LEIR • During commissioning, the PL will organize support • After acceptance, same as above, with reinforced support for new technology • SPS • No piquet support; only infrastructure support during working hours • LHC hardware commissioning • Each PL organizes the support for his project (PIC, QPS, CRYO, …) • Infrastructure support for servers, FIP, PVSS, FESA, CMW, LASER, logging
[Diagram: CO software (applications, components). Labels: LASER, Logging, BIC, CMW, UNICOS (PIC, QPS, CRYO), CM, JAPC, Timing, CO DIAG, FESA, PIC, BIC, QPS, Cryo Ring, UNICOS (Cryo)]
Tools for Hardware Installation & Operation • Naming convention • Layout DB • Two layers of description • System (PLC, VME, gateway, FIP segment, server, …) • Functional component (slot) of systems (board, power supply, CPU, …) • Connection to functional slots (timing, PIC, power, Ethernet) • ABCAM • Asset management tool describing all physical equipment associated with a functional slot
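The two description layers above can be pictured as a small data model. The following is only a sketch under stated assumptions: the class names, field names, and example identifiers are hypothetical and do not reflect the actual Layout DB or ABCAM schemas.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionalSlot:
    """Second layer: a functional component (slot) of a system."""
    name: str
    kind: str  # e.g. "CPU", "power supply", "timing board"
    connections: List[str] = field(default_factory=list)  # timing, PIC, power, Ethernet
    asset_id: Optional[str] = None  # physical equipment in the slot (asset-management side)


@dataclass
class System:
    """First layer: a PLC, VME crate, gateway, FIP segment, server, ..."""
    name: str
    kind: str
    slots: List[FunctionalSlot] = field(default_factory=list)

    def slots_connected_to(self, service: str) -> List[str]:
        """Return the names of the slots connected to a given service."""
        return [s.name for s in self.slots if service in s.connections]


# Hypothetical example crate; all identifiers are made up.
crate = System(
    name="example-vme-01",
    kind="VME",
    slots=[
        FunctionalSlot("slot-1", "CPU", ["Ethernet", "power"], asset_id="ASSET-0001"),
        FunctionalSlot("slot-2", "timing board", ["timing", "power"]),
    ],
)
print(crate.slots_connected_to("timing"))
```

Keeping the installed asset as an attribute of the slot, rather than of the system, mirrors the split between the functional description and the physical equipment associated with a functional slot.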
VME-VXI • Failure types • Power/network failure • Rack top: power supply, timing fan-out, RF repeater (local diagnostic): intervention by trained team with procedure • CPU (monitored by Xcluc): intervention by trained team with procedure • CO boards (not all CO boards have a remote monitoring mechanism, and where one exists it is not homogeneous): intervention by trained team with procedure • 1553 fieldbus and serial link (not always monitored): intervention by trained team with procedure • Application in FE (seen in Xcluc): repair, reboot, or no action, by operators
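The failure types above amount to a routing table from failure class to diagnostic source and responder. A minimal sketch, assuming hypothetical key names and an assumed fallback ("call specialist") for classes the slide does not list:

```python
# failure class -> (how it is diagnosed, who intervenes), as listed on the slide.
# The dictionary keys are hypothetical labels chosen for this sketch.
VME_FAILURE_ROUTING = {
    "rack_top": ("local diagnostic", "trained team with procedure"),
    "cpu": ("Xcluc", "trained team with procedure"),
    "co_board": ("not homogeneous", "trained team with procedure"),
    "fieldbus_1553_or_serial": ("not always monitored", "trained team with procedure"),
    "fe_application": ("Xcluc", "operators"),
}


def who_intervenes(failure_class: str) -> str:
    """Return the responder for a failure class.

    Unknown classes fall back to "call specialist"; that fallback is an
    assumption of this sketch, not something stated in the slides.
    """
    _diagnostic, responder = VME_FAILURE_ROUTING.get(
        failure_class, ("unknown", "call specialist")
    )
    return responder


print(who_intervenes("cpu"))
print(who_intervenes("fe_application"))
```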
VME-VXI • Problems to address • 450 units, several backplane types • BDI and RF types cannot be maintained by CO • PS complex equipment to be transferred from the configuration DB to the hardware maintenance tools • Different monitoring & remote action methods • Huge investment (money & manpower) needed to homogenize • Some equipment has no monitoring capabilities (racks) • Cohabitation of CO-managed and non-CO-managed boards • Problem of differential diagnostics • Who does the intervention?
FIP • Failure types • Power (power supplies distributed along the network) • Ethernet only for gateway • Gateway (150) component failures (diagnostic on Xcluc): gateway replacement by trained team with procedure, software reloading by operators • Motherboard, power supply, FIP board, timing cards • Segment (585) component failures (diagnostic via FIP diagnostic tool): component replacement by trained team with procedure • Copper/fiber coupler, Cu/Cu repeater, FIP DIAG • Agent failures (diagnostic via FIP diagnostic tool or supervision/expert application): equipment group responsibility • Application in FE (seen in Xcluc): repair, reboot, or no action, by operators
FIP • Problems to address • CO to declare all components/architectures/layouts in the maintenance/operation tools • Provide homogeneous tools for diagnostics & remote action • Remote reset, gateway restart • Distinguish between agent (equipment) and FIP (CO) problems • Agent diagnostics
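Separating an agent (equipment) problem from a FIP (CO) problem is essentially a layered check. The sketch below assumes three hypothetical boolean health inputs and an assumed check order (gateway, then segment, then agent); it does not describe the behavior of any existing FIP diagnostic tool.

```python
from typing import Tuple


def classify_fip_fault(gateway_ok: bool, segment_ok: bool, agent_ok: bool) -> Tuple[str, str]:
    """Return (faulty layer, responsible party) for a FIP problem.

    Layers are checked from the gateway outwards, because a failed gateway
    or segment makes any agent status behind it unreliable (assumption).
    """
    if not gateway_ok:
        return ("gateway", "CO")
    if not segment_ok:
        return ("segment", "CO")
    if not agent_ok:
        return ("agent", "equipment group")
    return ("none", "none")


# Agent unreachable while gateway and segment are healthy: equipment group problem.
print(classify_fip_fault(gateway_ok=True, segment_ok=True, agent_ok=False))
```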
PLC • Failure types • Power/network • Backplane power supply, PLC Ethernet board, CPU board (no remote differential diagnostic possible): intervention by trained team with procedure • IO board or fieldbus board failure (monitored by PLC console software): intervention by trained team with procedure • Instrument or electronics failure (PIC) (monitored through PLC/PVSS): intervention by specialist • Application failure (seen in supervision system): action via PLC console software by specialist
PLC • Problems to address • PLCs owned by CO (Cryo (125), PIC/WIC (44), RR (??)) • Different projects with different constraints and principles • For PIC, CO is also responsible for electronic equipment monitored via PLC/PVSS • PLCs owned by equipment groups (BT, PO, VAC, RF(20)); some PLCs in between (30) • We have to determine the limits of CO responsibilities & services • Centralize all PLC-related information in tools accepted by the community • ABCAM, Layout DB • Common diagnostic principles to be established • Generalize and complete the IEPLC diagnostics methodology for all PLCs • Remote reset/action is not always a good strategy (disastrous for a Cryo PLC with an Ethernet problem); action possible only after a local diagnostic • Intervention procedures need to be established by CO and followed by a team trained on PLCs • After a CPU replacement, an application reload is needed in some cases • The support team needs to know how to use the PLC console program • Identify who can perform these tasks and train them
TIMING • Failure types • GMT distribution • Power failure • Failure of a timing component (coupler, repeater, timing board): trained team • Cable or fiber disconnection/cut: trained team • Timing board failure on client unit (VME, gateway): trained team • Timing distribution • Connections/repeaters: trained team • Event timing disabled by user: should be treated by operators • MTG sequencer • Hardware failure: specialists • Error in programming operation timing: specialist • Timing reception via Ethernet in workstations (video)
TIMING • Problems to address • Introduce GMT layout & timing distribution in the layout DB • Backlog of “PS complex” • Difficult for the normal operation crew to tell software/user errors from hardware failures • Several tools for timing diagnostics, for different problems • CTRtest, TG8test: timing board reception check • Video: telegram reception (in FE and WS) • TestTGM: availability of services • Need to have real timing competence always available in OP • First diagnostic and solution of software & user errors • Timing-related work is part of normal operator work, but it is not tracked as it should be by OP
Servers • Failure types • Power/network (all systems grouped in restricted area) • Loss of a system resource • CPU, power supplies, disk • Repair (operator) • Hardware intervention (specialist) • Configuration loss • Repair/reboot does not solve the problem • Restore from a backup (specialist) • Application • Diagnostic in the application itself • Repair from Xcluc (operator) • Problems to address • OS configuration homogenization • Some PS/SL practices still to be migrated to AB • Procedure & training for operator intervention • What is the task of the operator • How to do it properly
Power Dependence • Identify a power failure on all process control devices • All systems must be entered in the layout DB • Connection to power supply known • All power units must be monitored • What does that mean ?? • Is the granularity achieved by TS-EL compatible with our needs ??? • How to make the link between the TS-EL monitoring system and CO equipment • GTPM (data collection needs to be organized) • ANOTHER TOOL… • Intervention should be done by OP/TS-EL
Network Dependence • Identify a network failure on all process control devices • All systems must be entered in the layout DB • Link to be established to Netops • All network components are monitored • How to link the NETOPS/Spectrum information to the CO diagnostic tools
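Once every system and its power and network connections are recorded, finding the devices hit by one failed unit becomes a simple lookup. This is a sketch only: the connection tuples, device names, and unit names are hypothetical, and no real Layout DB, TS-EL, or Netops interface is used.

```python
from collections import defaultdict

# (device, dependency kind, upstream unit); every name here is made up.
CONNECTIONS = [
    ("plc-cryo-01", "power", "power-unit-A"),
    ("plc-cryo-01", "network", "switch-1"),
    ("fip-gateway-07", "power", "power-unit-A"),
    ("fip-gateway-07", "network", "switch-2"),
    ("vme-crate-03", "power", "power-unit-B"),
    ("vme-crate-03", "network", "switch-1"),
]


def build_index(connections):
    """Index devices by the (kind, unit) they depend on."""
    index = defaultdict(set)
    for device, kind, unit in connections:
        index[(kind, unit)].add(device)
    return index


def affected_devices(index, kind, unit):
    """Return the sorted list of devices depending on a failed unit."""
    return sorted(index.get((kind, unit), set()))


INDEX = build_index(CONNECTIONS)
print(affected_devices(INDEX, "power", "power-unit-A"))
print(affected_devices(INDEX, "network", "switch-1"))
```

The lookup is only as good as the recorded connections, which is why the slides insist that all systems be entered in the layout DB first.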
Java Applications: Situation • Legacy software • Known by CO: one member can maintain them • Orphan applications: ??? • In both cases: phasing out in the medium term • New application or new component (library) • Developed by CO or a CO/OP team; this team develops according to common rules • Diagnostic tools available in the CCC to distinguish between an application failure and an external problem • List of software components necessary for the application • Hardware dependency list • Technical contact list • Failure types • Controlled process (application): process expert • Control system (application, Xcluc, …): control specialist • (Front-end communication, application server, CMW server, …) • Application: repair via Xcluc and, if that is not effective, application specialist • Configuration error for data-driven application (process expert) • No effective intervention on application software can be done by a non-expert
Java Applications • Problems to address • For legacy software • Identify and plan the upgrade of all legacy and orphan applications • If there is no upgrade (not possible or not useful), or before the upgrade, identify an expert or a support team per application (the team can be a mix of OP/CO/… staff) • For new software • Identify the expert team per application (OP/CO/…) • Include in the application documentation or online: • List of dependencies on other applications • List of hardware dependencies
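The per-application information requested above (expert team, software and hardware dependency lists) could be held in a small record and checked for gaps. A sketch under assumptions: the record layout and all names are hypothetical, and `None` is used here to mean "not documented" while an empty list means "documented, no dependencies".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AppRecord:
    name: str
    legacy: bool
    expert_team: List[str] = field(default_factory=list)
    software_dependencies: Optional[List[str]] = None  # None = not documented
    hardware_dependencies: Optional[List[str]] = None  # None = not documented


def gaps(app: AppRecord) -> List[str]:
    """List the missing items that would make an application hard to support."""
    problems = []
    if not app.expert_team:
        problems.append("no expert or support team identified")
    if app.software_dependencies is None:
        problems.append("software dependency list missing")
    if app.hardware_dependencies is None:
        problems.append("hardware dependency list missing")
    return problems


# Hypothetical orphan application with nothing documented.
orphan = AppRecord(name="example-legacy-app", legacy=True)
print(gaps(orphan))
```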
DM Applications • Failure types • Oracle server: IT • Application servers: see Servers page • Logging application: a monitoring tool for logging exists on a web-based access page; problems can be seen & corrected by the CCC operator • Config DB: ??? • Problems to address • Ensure a 24/365 service guarantee from IT for the Oracle server • Prepare a procedure for CCC operators for web-based intervention on the reference server
PVSS Applications • No automatic control actions performed in PVSS applications: • Monitoring, operator command requests, interface to LASER/logging • All applications based on JUNICOS frameworks • Same monitoring principles across all applications • Failure types are not application dependent • Failure types • Controlled process (via application & SMS): process expert • Control system (via PVSS monitoring tool): PVSS specialist • Front-end communication, data server CPU and disk usage, archive monitoring, logging exchange monitoring, … • PVSS manager (automatic repair via Xcluc in case of failure): PVSS specialist • Problems to address • Backup/restore policy to be established • Integration with existing tools
Operation Responsibilities • All sections will have activities related to operation in 2006 • HT: Timing/Sequencing, Remote reset, FE • FC: CMW, FE • AP: Java applications framework; high-level applications for LEAR, LHC HC; LASER • IN: Servers, FE (via Xcluc), PIC/WIC • IS: PVSS, IEPLC, CRYO, FIP, Test bench • DM: Logging, Configuration DB, ABCAM, Layout DB
Present Piquet Know-How • HT: Timing/Sequencing, Remote reset, FE • FC: CMW, FE • AP: Java applications framework; legacy applications; high-level applications for LEAR, LHC HC; LASER • IN: Servers, FE (via Xcluc), PIC/WIC • IS: PVSS, IEPLC, CRYO, FIP, Test bench • DM: Logging, Configuration DB, ABCAM, Layout DB
Some Remarks • We have a large diversity of systems and only a small part is integrated today • The present piquet team is not sized to take over the entire operation duty of the CO group • 1 team leader, 4 experts, 2 newcomers • “New” technologies not mastered by the existing team • Geographical dispersion of equipment • In 2006/2007 operation activity will have to coexist with installation/commissioning activities
First Proposals • For hardware systems, systematically use the layout DB and ABCAM tools • Together with OP, clean up the power/network issues • Transfer timing software management to OP • Clarify responsibilities with equipment groups in all grey areas • Prepare & execute the legacy software upgrade • Integrate all existing diagnostic tools • LASER (AP), GTPM (OP), XCLUC (IN), Spectrum (IT-CS), TIM (OP), PVSS UNICOS integrated diagnostics (IS/IN), application integrated diagnostics (AP), DiagCMW (FC), timing tools (HT), PLC console tools (IS), FIP diagnostic tool (IS), logging monitoring (DM)
Tracks • All sections must organize (alone, in synergy with others, via a reorganization, …) the operation support of the systems or applications they deploy • No systematic form of organization (piquet or list) • Intervention teams can be grouped • E.g. hardware for VME, gateway, FIP, PLC • PVSS/PLC & PVSS/FEC applications support • Create an operation coordination (a person or a team) • Provides the interface to OP • Coordinates the control system integration • Requests procedures/documentation from system teams • Coordinates the development of diagnostic tools • Requests from the different teams the functionalities necessary for operation • Create a real operation-oriented policy within the entire group
Possible Operation Team Duties/Limits for 2006 • Limits • No installation • No configuration • No application modifications • No application bug fixing • No timing user error fixing • No intervention on systems under commissioning • No intervention on power/network problems • Duties for systems in operation • Hardware: remote diagnostic; local diagnostic; reboot or reinitialize communication; hardware intervention (with limitations); application reloading (with limitations); call equipment specialists • Software: refine diagnostic; reboot application (operators); call specialists • Management: track problems; request & obtain improvements