Explore the use of machine learning detour for slow extracted spill control in North Area Fixed target experiments. Investigate the possibility of injecting sinusoidal modulation to compensate for noise on spill. Test the resilience of trained agent for changes in spill parameters.
Kill spill: The ML-detour for slow extracted spill control • S. Hirlander, V. Kain
Spill quality to North Area Fixed target experiments • Resonant extraction: 2/3 integer resonance • Slow extraction over ~ 5 s flattop • Constraint from experiments in NA:
Spill quality to North Area Fixed target experiments • Inject sinusoidal modulation of quadrupole current (QF current in future) to compensate n x 50 Hz noise on spill • Can inject at 50, 100, 150 and 300 Hz • Task: find amplitude and phase of correction signal at various frequencies • And: spill noise drifts in amplitude and phase
Other example… • Can be really bad… • Since 2016: measure phase, "calibrate" electronics for phase response: • Adjust amplitude after scan • Typically 10 iterations
Can we reduce correction time to 1 step? • What about reinforcement learning? • Test case for the NAF algorithm • Ingredients: • OpenAI gym environment simulating spill • What is the state? • What is an action? • What is the reward?
Simulated environment • Only 50 Hz • Measured signal: • Goal: minimize ripple with the optimal
Simulated environment • “BSI spill monitor” (2kHz) for SHiP cycle • State: real and imaginary part of the FFT spectrum at 50 Hz • Reward: effective spill length: max = 1 s
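The state and reward described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' environment: the toy spill model (unit intensity plus a 50 Hz ripple and a correction term), the function names, and the reward formula T_eff = (∫I dt)² / ∫I² dt are assumptions; only the 2 kHz sampling rate, the 50 Hz FFT component as state, and the 1 s maximum come from the slide.

```python
import cmath
import math

FS = 2000.0  # sampling rate of the spill monitor in Hz (from the slide)
F0 = 50.0    # ripple frequency considered in the simulation

def simulate_spill(a_spill, phi_spill, a_corr, phi_corr, n=2000):
    """Toy spill (assumption): unit intensity + 50 Hz ripple + correction, 1 s long."""
    w = 2.0 * math.pi * F0 / FS
    return [1.0
            + a_spill * math.cos(w * k + phi_spill)
            + a_corr * math.cos(w * k + phi_corr)
            for k in range(n)]

def state_at_50hz(signal):
    """Real and imaginary part of the single DFT bin at 50 Hz, scaled to amplitude."""
    n = len(signal)
    w = 2.0 * math.pi * F0 / FS
    s = sum(x * cmath.exp(-1j * w * k) for k, x in enumerate(signal))
    s *= 2.0 / n
    return s.real, s.imag

def effective_spill_length(signal):
    """Assumed reward: (integral of I)^2 / integral of I^2, in seconds."""
    dt = 1.0 / FS
    total = sum(signal) * dt
    squared = sum(x * x for x in signal) * dt
    return total * total / squared
```

In this toy model a perfectly flat 1 s spill gives 1.0, and a residual ripple amplitude of 0.1 gives roughly 0.995.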
Numerical optimizer versus RL • goal: • COBYLA: example between 25 and 40 iterations • Could be accelerated by restricting the search space.
Numerical optimizer versus RL • RL with NAF • Simon has prepared a wrapper in the spinup style for the NAF algorithm
NAF and Environment • Activation function of final layer: tanh • action: np.array: • scale it to correct values in environment
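Because the final tanh layer bounds every action component to [-1, 1], the environment has to map it onto physical settings. A minimal sketch of such a scaling follows; the function name and the limits used in the example are hypothetical, and plain lists stand in for the np.array mentioned on the slide.

```python
import math

def scale_action(action, low, high):
    """Map each action component linearly from [-1, 1] to [low_i, high_i]."""
    return [lo + 0.5 * (a + 1.0) * (hi - lo)
            for a, lo, hi in zip(action, low, high)]

# Hypothetical limits: amplitude setting in [0, 1], phase setting in [-pi, pi].
LOW = [0.0, -math.pi]
HIGH = [1.0, math.pi]

raw = [math.tanh(0.3), math.tanh(-2.0)]  # what a tanh output layer could emit
amplitude, phase = scale_action(raw, LOW, HIGH)
```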
How to train? • Ideally want to learn best policy to correct for when φ_spill, A_spill change. • But: to train need to be able to change those; can only change φ_corr and A_corr • Training works in episodes: • For each new episode reset() is invoked in OpenAI gym environment • Assume to start with fairly well corrected spill • reset(): choose random setting of φ_corr and A_corr • During episode: step(): find best Δφ_corr and ΔA_corr to maximise effective spill length • Stop if going too far • Stop if reached maximum number of allowed iterations per episode • Stop if reached acceptable effective spill length (i.e. 0.995) • Test resilience of trained agent for changes in φ_spill and A_spill
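The episode logic above can be sketched as a small gym-style class. This is a toy model under stated assumptions, not the authors' environment: it does not import gym but mimics the classic reset()/step() signature, the residual ripple is computed as a phasor sum, the reward is assumed to be 1 / (1 + A_res² / 2) for a 1 s spill, and the default values, the random ranges in reset() and the A_MAX limit are hypothetical.

```python
import cmath
import math
import random

A_MAX = 1.0  # hypothetical limit; leaving [0, A_MAX] counts as "going too far"

class SpillEnv:
    """Toy gym-style environment for 50 Hz spill ripple correction."""

    def __init__(self, a_spill=0.5, phi_spill=0.0, max_steps=300,
                 target=0.995, seed=None):
        self.a_spill, self.phi_spill = a_spill, phi_spill
        self.max_steps, self.target = max_steps, target
        self.rng = random.Random(seed)

    def _observe(self):
        # Residual ripple = spill phasor + correction phasor.
        res = (cmath.rect(self.a_spill, self.phi_spill)
               + cmath.rect(self.a_corr, self.phi_corr))
        reward = 1.0 / (1.0 + 0.5 * abs(res) ** 2)  # assumed effective spill length
        return (res.real, res.imag), reward

    def reset(self):
        # Start from a fairly well corrected spill with a random offset.
        self.a_corr = self.a_spill + self.rng.uniform(-0.2, 0.2)
        self.phi_corr = self.phi_spill + math.pi + self.rng.uniform(-0.5, 0.5)
        self.steps = 0
        state, _ = self._observe()
        return state

    def step(self, action):
        d_a, d_phi = action  # changes of correction amplitude and phase
        self.a_corr += d_a
        self.phi_corr += d_phi
        self.steps += 1
        state, reward = self._observe()
        too_far = not (0.0 <= self.a_corr <= A_MAX)
        done = (reward >= self.target or self.steps >= self.max_steps or too_far)
        return state, reward, done, {}
```

Testing resilience then amounts to changing a_spill and phi_spill on an environment the agent was not trained on.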
Does it work? • Training (reset(): random initial correction) • Example for a given seed (654): 150 episodes, max episode length 300, 1293 iterations • ~6 h of beam time with production supercycle.
Does it work? • Test (reset(): 150 random initial spill) • Within limited range: φ_spill = ±20°. Amplitude change only up to ΔA_spill = 0.1; otherwise agent cannot solve it anymore! • Resilience to phase changes is good, but resilience to amplitude changes is not good enough. • Would need to run an optimizer after too large a drift in amplitude.
Example - larger amplitude drifts • Test (reset(): 150 random initial spill) • Within limited range: φ_spill = ±20°. Amplitude change only up to ΔA_spill = 0.3; otherwise agent cannot solve it anymore!
What to do? Analytical solution • Can measure amplitude of oscillation with FFT • Without correction: A_spill • With correction: A_sum • Need to calibrate electronics: • Amplitude and phase knob: calibration factor, offset, what we set…
Calibration – amplitude settings • Switch off correction: measure A_spill • Switch on correction: measure A_sum with two amplitude correction settings (A1, A2)
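The amplitude calibration can be worked through numerically. The sketch below assumes the usual phasor-sum relation A_sum² = A_spill² + (k·a)² + 2·A_spill·k·a·cos δ, where a is the amplitude knob setting, k the unknown calibration factor and δ the phase between spill ripple and correction; the slide's own formula did not survive extraction, so this relation and all names are assumptions.

```python
import math

def calibrate_amplitude(a_spill, a1, sum1, a2, sum2):
    """Solve for the calibration factor k and cos(delta) from two settings.

    a_spill     : ripple amplitude measured with the correction switched off
    a1, a2      : two different, non-zero amplitude knob settings
    sum1, sum2  : ripple amplitudes A_sum measured with those settings
    """
    # With u = k^2 and w = 2*A_spill*k*cos(delta):
    #   A_sum_i^2 - A_spill^2 = u*a_i^2 + w*a_i, which is linear in (u, w).
    y1 = (sum1 ** 2 - a_spill ** 2) / a1
    y2 = (sum2 ** 2 - a_spill ** 2) / a2
    u = (y1 - y2) / (a1 - a2)
    w = y1 - u * a1
    k = math.sqrt(u)
    cos_delta = w / (2.0 * a_spill * k)
    return k, cos_delta

def ripple_sum(a_spill, k, a, delta):
    """Forward model (assumed): amplitude of the sum of spill and correction."""
    return math.sqrt(a_spill ** 2 + (k * a) ** 2
                     + 2.0 * a_spill * k * a * math.cos(delta))
```

Real measurements carry noise, so u can come out slightly negative for poorly chosen settings; a practical version would guard the square root.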
Calibration – phase settings • Measure A_sum with two phase correction settings (A1, A2) • Calculate δ according to formula before
Correction procedure • Do not want to switch off correction – huge amplitudes • Two measurements with different correction amplitudes • Calculate A_spill • Adjust a_c such that • Calculate cos δ as before, adjust p_c such that • Sign not uniquely defined, try both.
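The procedure above can be sketched as follows. It assumes a known calibration factor k, the phasor-sum relation A_sum² = A_spill² + (k·a)² + 2·A_spill·k·a·cos δ, and the natural targets k·a_c = A_spill and δ = π for full cancellation; the slide's own conditions are missing, so these targets and all names are assumptions.

```python
import cmath
import math

def correction_from_two_measurements(k, a1, sum1, a2, sum2):
    """Return (A_spill, new amplitude setting, two candidate phase changes)."""
    # The difference of the two equations isolates p = A_spill * cos(delta).
    p = ((sum1 ** 2 - sum2 ** 2 - k ** 2 * (a1 ** 2 - a2 ** 2))
         / (2.0 * k * (a1 - a2)))
    a_spill = math.sqrt(max(sum1 ** 2 - (k * a1) ** 2 - 2.0 * k * a1 * p, 0.0))
    cos_delta = max(-1.0, min(1.0, p / a_spill))
    delta = math.acos(cos_delta)  # only |delta| is known
    a_new = a_spill / k           # correction amplitude equal to ripple amplitude
    # Sign of delta is ambiguous: one of these phase changes brings delta to pi.
    return a_spill, a_new, (math.pi - delta, math.pi + delta)

def residual(a_spill, k, a, delta):
    """Forward model (assumed): ripple amplitude left after correction."""
    return abs(a_spill + cmath.rect(k * a, delta))
```

Applying the first candidate and checking whether the measured ripple drops is one way to resolve the sign; if it grows, the second candidate is the one to use.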
Conclusion • Numerical optimizers always work. • But they take some time • Training for RL is tricky. Cannot train for what is really drifting. • The training with the correction settings can however be used to compensate for spill amplitude and phase setting changes. • But only in a small range for drifts • If too large use optimizer to reset, then work again with agent • RL training takes a long time!! • Analytic solution looks fine! • But it did not work in the past • Calibration seemed to drift… • ….will have a new correction system. Need BEAM to check!