
COMP4332/RMBI4310: Recurrent Neural Network (Keras)


Presentation Transcript


  1. COMP4332/RMBI4310 Recurrent Neural Network (Keras) Prepared by Raymond Wong Presented by Raymond Wong raywong@cse

  2. We have just finished describing the concepts related to “Recurrent Neural Network (RNN)” • Next, we will give the Keras code for RNN. • Before that, let us describe some background.

  3. Consider the following two-dimensional time series with 8 timestamps recorded.

  4. Consider this example with eight timestamps:
• When t = 1: (0.1, 0.4)
• When t = 2: (0.2, 0.7)
• When t = 3: (0.3, 0.1)
• When t = 4: (0.4, 0.2)
• When t = 5: (0.5, 0.1)
• When t = 6: (0.6, 0.6)
• When t = 7: (0.7, 0.2)
• When t = 8: (0.8, …) (only the first-dimensional value at t = 8 appears in the later slides)

  5. Suppose that we use the values at the 3 previous timestamps for predicting the first-dimensional value at the current timestamp.

Input (3 previous timestamps) → Target
(0.1, 0.4), (0.2, 0.7), (0.3, 0.1) → 0.4
(0.2, 0.7), (0.3, 0.1), (0.4, 0.2) → 0.5
(0.3, 0.1), (0.4, 0.2), (0.5, 0.1) → 0.6
(0.4, 0.2), (0.5, 0.1), (0.6, 0.6) → 0.7
(0.5, 0.1), (0.6, 0.6), (0.7, 0.2) → 0.8

How many records are there? (5)

  6. The training dataset for the target attribute is a 2-dimensional array:
[ [0.4], [0.5], [0.6], [0.7], [0.8] ]
What is the shape of this array? (5, 1)

The training dataset for the input attributes is a 3-dimensional array:
[ [ [0.1, 0.4], [0.2, 0.7], [0.3, 0.1] ],
  [ [0.2, 0.7], [0.3, 0.1], [0.4, 0.2] ],
  [ [0.3, 0.1], [0.4, 0.2], [0.5, 0.1] ],
  [ [0.4, 0.2], [0.5, 0.1], [0.6, 0.6] ],
  [ [0.5, 0.1], [0.6, 0.6], [0.7, 0.2] ] ]
What is the shape of this array? (5, 3, 2)
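
As a side note (not on the original slides), here is a minimal NumPy sketch of how this sliding window could be built from the raw series; the variable names are illustrative:

import numpy

# the 8-timestamp, 2-dimensional series from slide 4
# (the second-dimensional value at t = 8 is never used as an input,
#  so it is left as None here)
series = [(0.1, 0.4), (0.2, 0.7), (0.3, 0.1), (0.4, 0.2),
          (0.5, 0.1), (0.6, 0.6), (0.7, 0.2), (0.8, None)]

window = 3   # no. of previous timestamps used for prediction
pythonX = []
pythonY = []
for t in range(window, len(series)):
    # input: the full 2-dimensional values at the 3 previous timestamps
    pythonX.append([list(series[i]) for i in range(t - window, t)])
    # target: the first-dimensional value at the current timestamp
    pythonY.append([series[t][0]])

numpyX = numpy.array(pythonX)
numpyY = numpy.array(pythonY)
print("X's shape:", numpyX.shape)   # (5, 3, 2)
print("Y's shape:", numpyY.shape)   # (5, 1)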

  7. Suppose that there is a missing value; it is represented by -1. The table becomes the sliding-window table from slide 5 with each missing entry replaced by -1.
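
The program for handling missing data appears later in the deck (see the outline). Purely as a hedged sketch, one common Keras idiom is to place a Masking layer in front of the LSTM so that timestamps whose input values equal the placeholder are skipped; whether the course program uses exactly this layer is not shown on this slide:

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

model = Sequential()
# a timestamp is skipped only when ALL of its input values equal mask_value
model.add(Masking(mask_value=-1.0, input_shape=(3, 2)))
model.add(LSTM(1))
model.add(Dense(1, activation='relu'))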

  8. We have finished the background description. • We are ready to describe how to use Keras for RNN.

  9. Outline • Keras Program (Vanilla LSTM Model) • Keras Program (Stacked LSTM Model) • Keras Program (Vanilla LSTM Model Handling Missing Data) • Keras Program (Vanilla GRU Model)

  10. 1. Keras Program (Vanilla LSTM Model) • Vanilla LSTM Model contains the following layers • Input layer • LSTM Layer • Dense Layer • Output Layer

  11. 1. Keras Program (Vanilla LSTM Model) [Diagram: input layer (2-dimensional input xt) → LSTM layer (one traditional LSTM memory unit, output yt) → dense layer (one fully connected neuron N1 with the rectifier activation) → output layer (final_yt).]

  12. 1. Keras Program (Vanilla LSTM Model) • We will give the Keras program called “Python-Vanilla-LSTM.py”

  13. Final Summary about Parameter Setting
• LSTM Model Parameters
  • No. of layers
  • No. of memory units in each layer
  • Connections between memory units from different layers
  • Optimization method (e.g., adam, SGD, rmsprop)
  • Error function (e.g., binary cross-entropy, mse, mae)
• Training (Time) Parameters
  • No. of epochs (e.g., we could set "no. of epochs = 150" as a stopping condition)
  • Batch size (e.g., we could set "batch size = 10")
• Evaluation
  • Measurement (e.g., accuracy, or in short, "acc")
  • Training/Validation/Test (e.g., the percentage of the data for the validation/test set)
(A sketch mapping these settings to Keras calls is shown below.)
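
As an illustration (not on the slide itself), here is where each of these settings would appear in a Keras program, using the example values quoted above; the tiny stand-in model and the dummy data are assumptions for the sake of a runnable example, not the course program:

import numpy
from keras.models import Sequential
from keras.layers import Dense

# a tiny stand-in model, just to show where each setting goes
model = Sequential()
model.add(Dense(1, activation='sigmoid', input_shape=(4,)))

model.compile(loss="binary_crossentropy",  # error function (or "mse", "mae")
              optimizer="adam",            # optimization method (or "sgd", "rmsprop")
              metrics=["accuracy"])        # evaluation measurement ("acc")

X = numpy.random.rand(20, 4)                  # dummy data, for illustration only
y = numpy.random.randint(0, 2, size=(20, 1))  # dummy binary targets
model.fit(X, y,
          epochs=150,            # "no. of epochs = 150" as a stopping condition
          batch_size=10,         # "batch size = 10"
          validation_split=0.2)  # percentage of the data for the validation set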

  14. Final Summary about Parameter Setting
• LSTM Model Parameters
  • No. of layers
  • No. of memory units in each layer
  • Connections between memory units from different layers
  • Optimization method: adam
  • Error function: mse
• Training (Time) Parameters
  • No. of epochs: we could set "no. of epochs = 1000" as a stopping condition
  • Batch size: we could set "batch size = 1"
• Evaluation
  • Measurement: mse
  • Training/Validation/Test: validation_split=0.2

  15. We are ready to write our Keras program for LSTM.

  16. [Pipeline diagram: Raw Data → Data Collection → Collected Data → Data Processing → Processed Data → Data Mining → Data Mining Results → Result Presenting → Presentable Forms of Data Mining Results.]

  17. We have to define some "data mining" models to perform some "data mining" tasks. We could call many existing libraries to complete these "data mining" tasks. [Diagram: Processed Data → Data Mining → Data Mining Results.]

  18. [Diagram: the four phases of the data mining step.
Phase 1: Model Training — Training/Validation/Test Data → Model (in memory)
Phase 2: Model Storing — Model (in memory) → Model (in disk)
Phase 3: Model Reading — Model (in disk) → Model (in memory)
Phase 4: New Data Prediction — New Data + Model (in memory) → Predicted Result]

  19. Python

# imports needed by the full program (not shown on the slide)
import numpy
from keras.models import Sequential
from keras.layers import Dense, LSTM

# the training dataset is hard-coded in this program
# (i.e., the content of this dataset could be found in this program).
# the new dataset has the same content as the training set
newTargetAttributeDataFilename = "New-LSTM-Output.csv"
modelFilenamePrefix = "LSTM-Model"

# Phase 1: to train the model
print("Phase 1: to train the model...")
numpyX, numpyY, model = trainModel()

# Phase 2: to save the model to a file
print("Phase 2: to save the model to a file...")
saveModel(model, modelFilenamePrefix)

# Phase 3: to read the model from a file
print("Phase 3: to read the model from a file...")
model = readModel(modelFilenamePrefix)

# Phase 4: to predict the target attribute of a new dataset based on a model
print("Phase 4: to predict the target attribute of a new dataset based on a model...")
predictNewDatasetFromModel(numpyX, numpyY, newTargetAttributeDataFilename, model)

  20. [Diagram: the four phases again — Phase 1: Model Training, Phase 2: Model Storing, Phase 3: Model Reading, Phase 4: New Data Prediction.]

  21. Phase 1: Model Training (Training/Validation/Test Data → Model (in memory)). There are the following 5 steps.
Step 1: to load the data (to read the data and to split it into the input attributes and the target attribute)
Step 2: to define the model (to define the "structure" of the model)
Step 3: to compile the model (to define how to update the parameters used in the "structure" of the model)
Step 4: to fit the model (to train the model with the given data)
Step 5: to evaluate the model (to evaluate the model with the given data)

  22. Python

# to train a model
# (the training data is hard-coded, so no filename parameter is needed)
def trainModel():
    # the hard-coded training data
    pythonX = [[[0.1, 0.4], [0.2, 0.7], [0.3, 0.1]],
               [[0.2, 0.7], [0.3, 0.1], [0.4, 0.2]],
               [[0.3, 0.1], [0.4, 0.2], [0.5, 0.1]],
               [[0.4, 0.2], [0.5, 0.1], [0.6, 0.6]],
               [[0.5, 0.1], [0.6, 0.6], [0.7, 0.2]]]
    pythonY = [[0.4], [0.5], [0.6], [0.7], [0.8]]
    numpyX = numpy.array(pythonX)
    numpyY = numpy.array(pythonY)
    print("Training Data: ")
    print("X's shape:", numpyX.shape)
    print("Y's shape:", numpyY.shape)

  23. Python

    # to set the "fixed" seed of a random number generator used in the
    # "optimization" tool in the neural network model.
    # The reason why we fix this is to reproduce the same output each time
    # we execute this program.
    # In practice, you could set it to any number (or, the current time)
    # (e.g., "numpy.random.seed(int(time.time()))")
    numpy.random.seed(11)

    # Step 1: to load the data
    print(" Step 1: to load the data...")
    # the datasets are stored in variables pythonX and pythonY

    # Step 2: to define the model
    print(" Step 2: to define the model...")
    model = Sequential()
    # input_shape = (no. of timestamps, no. of dimensions)
    model.add(LSTM(1, input_shape=(3, 2)))
    model.add(Dense(1, activation='relu'))

  24. [Diagram: the LSTM memory unit unrolled over time. At each timestamp t, the traditional LSTM takes the 2-dimensional input xt and the previous state st-1, and produces the output yt and the new state st, which is passed on to timestamp t+1.]

  25. [Diagram: input layer (2-dimensional input xt) → LSTM layer (one traditional LSTM memory unit, output yt) → dense layer (fully connected neuron N1 with the rectifier activation) → output layer (final_yt).]

  26. Python

    # Step 3: to compile the model
    print(" Step 3: to compile the model...")
    model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mean_squared_error"])

    # Step 4: to fit the model
    print(" Step 4: to fit the model...")
    model.fit(numpyX, numpyY, validation_split=0.2, epochs=1000, batch_size=1)

    # Step 5: to evaluate the model
    print(" Step 5: to evaluate the model...")
    scores = model.evaluate(numpyX, numpyY)
    print("")
    print("{}: {}".format(model.metrics_names[1], scores[1]))

    return numpyX, numpyY, model

  27. Output

Using TensorFlow backend.
Phase 1: to train the model...
Training Data: 
X's shape: (5, 3, 2)
Y's shape: (5, 1)
 Step 1: to load the data...
 Step 2: to define the model...
 Step 3: to compile the model...
 Step 4: to fit the model...
Train on 4 samples, validate on 1 samples
Epoch 1/1000
4/4 [==============================] - 1s 125ms/step - loss: 0.2166 - mean_squared_error: 0.2166 - val_loss: 0.4796 - val_mean_squared_error: 0.4796
Epoch 2/1000
4/4 [==============================] - 0s 4ms/step - loss: 0.2080 - mean_squared_error: 0.2080 - val_loss: 0.4666 - val_mean_squared_error: 0.4666
Epoch 3/1000
4/4 [==============================] - 0s 0us/step - loss: 0.2007 - mean_squared_error: 0.2007 - val_loss: 0.4531 - val_mean_squared_error: 0.4531
Epoch 4/1000
4/4 [==============================] - 0s 4ms/step - loss: 0.1925 - mean_squared_error: 0.1925 - val_loss: 0.4397 - val_mean_squared_error: 0.4397
Epoch 5/1000
4/4 [==============================] - 0s 4ms/step - loss: 0.1859 - mean_squared_error: 0.1859 - val_loss: 0.4258 - val_mean_squared_error: 0.4258
…

  28. Output

…
Epoch 998/1000
4/4 [==============================] - 0s 4ms/step - loss: 4.6928e-04 - mean_squared_error: 4.6928e-04 - val_loss: 7.8068e-04 - val_mean_squared_error: 7.8068e-04
Epoch 999/1000
4/4 [==============================] - 0s 0us/step - loss: 4.6736e-04 - mean_squared_error: 4.6736e-04 - val_loss: 7.5334e-04 - val_mean_squared_error: 7.5334e-04
Epoch 1000/1000
4/4 [==============================] - 0s 8ms/step - loss: 4.6010e-04 - mean_squared_error: 4.6010e-04 - val_loss: 7.7079e-04 - val_mean_squared_error: 7.7079e-04
 Step 5: to evaluate the model...
5/5 [==============================] - 0s 0us/step
mean_squared_error: 0.0005179524887353182

  29. [Diagram: the four phases again — Phase 1: Model Training, Phase 2: Model Storing, Phase 3: Model Reading, Phase 4: New Data Prediction.]

  30. Phase 2: Model Storing (Model (in memory) → Model (in disk)). In Keras, we have to store the RNN model into two components:
• The model structure (stored in JSON format)
• The model weight information (stored in HDF5 format)
Skipped (Similar)!
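
The slide skips the code; a minimal sketch consistent with the two-component description might look like this (the helper name saveModel follows the driver program in slide 19, and the .json/.h5 filenames are assumptions):

# a minimal sketch of what saveModel could look like
def saveModel(model, modelFilenamePrefix):
    # component 1: the model structure, stored in JSON format
    structureFilename = modelFilenamePrefix + ".json"
    with open(structureFilename, "w") as structureFile:
        structureFile.write(model.to_json())
    # component 2: the model weight information, stored in HDF5 format
    weightFilename = modelFilenamePrefix + ".h5"
    model.save_weights(weightFilename)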

  31. [Diagram: the four phases again — Phase 1: Model Training, Phase 2: Model Storing, Phase 3: Model Reading, Phase 4: New Data Prediction.]

  32. Phase 3: Model Reading (Model (in disk) → Model (in memory)). In Keras, we have to read the RNN model from the two components:
• The model structure (stored in JSON format)
• The model weight information (stored in HDF5 format)
Skipped (Similar)!
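
Again the slide skips the code; a minimal sketch mirroring the saveModel sketch above (the helper name readModel follows the driver program in slide 19):

from keras.models import model_from_json

# a minimal sketch of what readModel could look like
def readModel(modelFilenamePrefix):
    # component 1: the model structure, stored in JSON format
    structureFilename = modelFilenamePrefix + ".json"
    with open(structureFilename, "r") as structureFile:
        model = model_from_json(structureFile.read())
    # component 2: the model weight information, stored in HDF5 format
    weightFilename = modelFilenamePrefix + ".h5"
    model.load_weights(weightFilename)
    return model

A model restored this way can be used for predict() directly; it would have to be compiled again before evaluate() or further training.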

  33. [Diagram: the four phases again — Phase 1: Model Training, Phase 2: Model Storing, Phase 3: Model Reading, Phase 4: New Data Prediction.]

  34. Phase 4: New Data Prediction (New Data + Model (in memory) → Predicted Result). There are the following steps.
Step 1: to load the new data (input attributes)
Step 2: to predict the target attribute of the new data based on a model
Step 3: to save the predicted target attribute of the new data into a file
Step 4 (Extra Step): to show the expected result and the predicted result

  35. Python

# to predict the target attribute of a new dataset
# (in our case, the content from the training set) based on a model
# In our case, the new dataset comes from the variable numpyX
def predictNewDatasetFromModel(numpyX, numpyY, newTargetAttributeDataFilename, model):
    # Step 1: to load the new data (input attributes)
    # (this could be found in variable numpyX)

    # Step 2: to predict the target attribute of the new data based on a model
    newY_TwoDim = model.predict(numpyX, batch_size=1)

    # Step 3: to save the predicted target attribute of the new data into a file
    numpy.savetxt(newTargetAttributeDataFilename, newY_TwoDim, delimiter=",", fmt="%.10f")

    # Step 4 (Extra Step): to show the expected result and the predicted result
    for i in range(len(numpyX)):
        print("Expected", numpyY[i, 0], "Predicted", newY_TwoDim[i, 0])

    return newY_TwoDim

  36. New-LSTM-Output.csv

0.4306412339
0.5134388208
0.5834084153
0.6794037223
0.7722369432

  37. Output

Expected 0.4 Predicted 0.430668
Expected 0.5 Predicted 0.51345
Expected 0.6 Predicted 0.583398
Expected 0.7 Predicted 0.679383
Expected 0.8 Predicted 0.772213

  38. We have seen a simple Vanilla LSTM structure in the previous example as follows.

  39. [Diagram: the LSTM memory unit unrolled over time, as on slide 24. At each timestamp t, the traditional LSTM takes the 2-dimensional input xt and the previous state st-1, and produces the output yt and the new state st.]

  40. [Diagram: as on slide 25 — input layer (2-dimensional input xt) → LSTM layer (one traditional LSTM memory unit, output yt) → dense layer (fully connected neuron N1 with the rectifier activation) → output layer (final_yt).]

  41. Next, we describe a more complicated structure for the Vanilla LSTM.

  42. [Diagram: Vanilla LSTM with three memory units in the LSTM layer. The 2-dimensional input xt feeds all three memory units; their outputs are fully connected to the dense-layer neuron N1 (rectifier activation), which produces final_yt at the output layer.]

  43. Original Code (Python)

# Step 2: to define the model
print(" Step 2: to define the model...")
model = Sequential()
model.add(LSTM(1, input_shape=(3, 2)))
model.add(Dense(1, activation='relu'))

Updated Code (Python)

# Step 2: to define the model
print(" Step 2: to define the model...")
model = Sequential()
model.add(LSTM(3, input_shape=(3, 2)))
model.add(Dense(1, activation='relu'))

We also changed the seed from 11 to 117. If we set the (random) seed to 11, the result could not converge. However, if we set the seed to 117, the result could converge.

  44. Outline • Keras Program (Vanilla LSTM Model) • Keras Program (Stacked LSTM Model) • Keras Program (Vanilla LSTM Model Handling Missing Data) • Keras Program (Vanilla GRU Model)

  45. 2. Keras Program (Stacked LSTM Model) • We will modify the original Keras program called “Python-Vanilla-LSTM.py”. • The updated program is called “Python-Stacked-LSTM.py”.

  46. 2. Keras Program (Stacked LSTM Model) • Stacked LSTM Model contains the following layers • Input layer • LSTM Layer • LSTM Layer (possibly, more than one layer) • Dense Layer • Output Layer

  47. [Diagram: Stacked LSTM. The 2-dimensional input xt feeds the first LSTM layer (three memory units); their outputs feed the second LSTM layer (two memory units); the second layer's outputs are fully connected to the dense-layer neuron N1 (rectifier activation), which produces final_yt at the output layer.]

  48. Original Code (Python)

# Step 2: to define the model
print(" Step 2: to define the model...")
model = Sequential()
model.add(LSTM(1, input_shape=(3, 2)))
model.add(Dense(1, activation='relu'))

Updated Code (Python)

# Step 2: to define the model
print(" Step 2: to define the model...")
model = Sequential()
model.add(LSTM(3, return_sequences=True, input_shape=(3, 2)))
model.add(LSTM(2))
model.add(Dense(1, activation='relu'))

If we do not include "return_sequences=True", there is a compilation error: the second LSTM layer expects a sequence (one output per timestamp), not a single vector (see the shape check after this slide).
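
A quick way to see the difference (an illustrative check, not from the slides) is to compare the output shapes of the first LSTM layer with and without the flag:

from keras.models import Sequential
from keras.layers import LSTM

m1 = Sequential()
m1.add(LSTM(3, input_shape=(3, 2)))          # output at the last timestamp only
print(m1.output_shape)                       # (None, 3)

m2 = Sequential()
m2.add(LSTM(3, return_sequences=True, input_shape=(3, 2)))
print(m2.output_shape)                       # (None, 3, 3): one output per timestamp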

  49. In general, we can have the following code.

Updated Code (Python)

# Step 2: to define the model
print(" Step 2: to define the model...")
model = Sequential()
model.add(LSTM(3, return_sequences=True, input_shape=(3, 2)))
model.add(LSTM(4, return_sequences=True))
model.add(LSTM(5, return_sequences=True))
model.add(LSTM(3))
model.add(Dense(1, activation='relu'))

We also changed the seed from 11 to 117. If we set the (random) seed to 11, the result could not converge. However, if we set the seed to 117, the result could converge.

  50. Outline • Keras Program (Vanilla LSTM Model) • Keras Program (Stacked LSTM Model) • Keras Program (Vanilla LSTM Model Handling Missing Data) • Keras Program (Vanilla GRU Model)
