1 / 12

Continual Learning for Recurrent Neural Networks: an Empirical Evaluation

This talk focuses on Continual Learning with Recurrent Neural Networks (LSTM). First, we highlight the need to study these two topics, then we present the first review which includes existing papers and benchmarks and finally we empirically evaluate the performance of popular continual learning strategies on recurrent models.

andcos
Download Presentation

Continual Learning for Recurrent Neural Networks: an Empirical Evaluation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Continual Learning for Recurrent Neural Networks Review + Empirical Evaluation Andrea Cossu, Antonio Carta, Vincenzo Lomonaco, Davide Bacciu

  2. Is sequential data processing important for CL? Human activity recognition (sensors, videos) Robot control (temporally-correlated raw data) Finance (next stock value prediction) Natural Language Processing (domain shift, translation) …

  3. Why do you focus on RNNs? e.g CNNs? Transformers? → different questions Variable number of layers → unrolling Weight sharing over time steps Backpropagation through time (for deep RNNs)

  4. Organized review of RNN in CL Let’s look at what is already here (shallow taxonomy) Seminal works simple studies on synthetic benchmarks NLP application-specific Bio-inspired / alternative recurrent paradigms Custom learning algorithms, ad-hoc architectures Deep networks LSTM, GRU… The vast majority of works focus on new models / strategies

  5. Benchmarks description

  6. ● Study the behavior of RNNs – with popular CL strategies not designed for sequential data processing – on application-agnostic benchmarks Our objective

  7. Experimental evaluation Class-incremental (no task labels), single-head 6 strategies + Naive + Joint Training ● EWC, MAS, LwF, GEM, A-GEM, Replay (random sampling) No architectural strategies Grid search protocol with held-out experiences

  8. Benchmarks Split / Permuted MNIST (really?) Application-agnostic, “closer” to real-world Synthetic Speech Commands – audio with 101 time steps Quick, Draw! - variable sequence length

  9. Sequence length affects forgetting lwf gem agem mas ewc replay-10 replay-1 lwf gem replay-10 replay-1 1.0 1.0 0.8 0.8 0.6 ACC 0.6 ACC 0.4 0.4 0.2 0.2 0.0 0.0 MLP (1) LSTM-16 (49) LSTM-4 (196) ROW-LSTM (28) Model (Sequence Length) MLP (1) LSTM-16 (49) LSTM-4 (196) ROW-LSTM (28) Model (Sequence Length)

  10. Impact on more realistic benchmarks

  11. What for the future? Just scratched the surface Adapt existing CL strategies ● Orthogonal projections seem promising – find a better tradeoff Improve recurrent models and learning algorithms ● BPTT alternatives – local algorithms Applications: place something somewhere… and leave it there!

  12. Do you believe RNNs are worth studying in CL? https://arxiv.org/abs/2103.07492

More Related