I'm building an RNN language model with Keras. To train the model (supervised learning), I have to create a numpy array
y (holding the label of each observation for each sequence) of shape
(num_of_training_sequences, size_of_vocabulary), containing one-hot vectors.
When I have too many training sequences, this array is too big to fit in memory. However, it doesn't have to be: since there are only size_of_vocabulary distinct one-hot vectors,
y could just be a
num_of_training_sequences sized array that contains references (i.e. pointers) to pre-allocated one-hot vectors. That way, if two sequences end in the same word and should get the same one-hot label, they would simply reference the same one-hot vector in memory.
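A minimal numpy sketch of that idea (the names `one_hot_table` and `labels` are mine, not part of any Keras API): store one integer label per sequence, keep the size_of_vocabulary distinct one-hot rows in a single shared table, and only expand the rows you need.

```python
import numpy as np

vocab_size = 5                      # toy vocabulary size, for illustration
labels = np.array([2, 0, 2, 4])     # one integer label per training sequence

# The identity matrix holds the only vocab_size distinct one-hot vectors.
one_hot_table = np.eye(vocab_size, dtype=np.float32)

# Fancy indexing builds a dense batch from those shared rows on demand,
# so the full (num_sequences, vocab_size) array never has to exist at once.
y_batch = one_hot_table[labels]
```

The integer `labels` array costs one int per sequence instead of one float per vocabulary word per sequence.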
Is there anything I can do to overcome this? Keras's code and documentation say that
fit() only accepts numpy arrays and tensors.
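One workaround I've been considering is a plain Python generator that one-hot encodes a single batch at a time, so peak memory is one batch of y rather than the whole array. This is just a sketch of that approach (the helper `one_hot_batches` is hypothetical, not a Keras function), and whether it can be fed to training depends on the Keras version:

```python
import numpy as np

def one_hot_batches(x, labels, vocab_size, batch_size):
    """Yield (x_batch, y_batch) pairs, one-hot encoding only one
    batch of integer labels at a time (hypothetical helper)."""
    one_hot_table = np.eye(vocab_size, dtype=np.float32)  # shared rows
    for start in range(0, len(labels), batch_size):
        sl = slice(start, start + batch_size)
        yield x[sl], one_hot_table[labels[sl]]

# Toy data: 6 sequences of length 4, vocabulary of 10 words.
x = np.random.randint(0, 10, size=(6, 4))
labels = np.array([1, 3, 3, 0, 9, 1])

batches = list(one_hot_batches(x, labels, vocab_size=10, batch_size=4))
# Each yielded y batch has shape (batch_size, 10) at most;
# the full (6, 10) label array is never materialized at once.
```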