We often find ourselves with a combination of different features that we want to leverage in a model: the input is a mix of time series and tabular data. Architecting a deep learning model that works well in this scenario is an interesting problem.
One example scenario: you have data from a device like a Fitbit and you want to predict the sleep stage at any given minute.
You have a mixture of:
Time series inputs:
Tabular features:
and a number of other features.
One way to approach this is to treat it as a multimodal deep learning problem.
Multimodal learning — Photo on [ResearchGate]
And mix in “wide and deep learning” as introduced by Google Research [here].
Wide and deep learning — Photo on google [blog]
Some interesting decisions to consider:
For the multiple time series input sequences:
In general, late fusion seems to work better than early fusion, and it does not require padding the inputs when the different sequences are not of the same length. However, this really depends on the problem space and on how correlated the input sequences are.
In general, a plain RNN seems to work better for shorter sequences and a bidirectional LSTM for longer ones. With late fusion you can mix and match RNNs, LSTMs, and 1-D CNNs across the different sequences.
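As a minimal sketch of that mix-and-match idea (all names and sizes here are illustrative, not from the original post): one short sequence goes through a GRU and one long sequence through a 1-D CNN, and their fixed-size representations are only concatenated at the end. Because each branch reduces its own sequence to a vector, the two inputs never need to be padded to a common length.

```python
import torch
import torch.nn as nn

class LateFusionEncoder(nn.Module):
    """Two per-sequence encoders whose outputs are concatenated (late fusion)."""

    def __init__(self, hidden=16):
        super().__init__()
        # GRU branch for the shorter sequence
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        # 1-D CNN branch for the longer sequence; adaptive pooling collapses
        # the time axis, so any sequence length yields a fixed-size vector
        self.cnn = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, short_seq, long_seq):
        # short_seq: (batch, T1, 1); long_seq: (batch, 1, T2)
        _, h = self.gru(short_seq)                # h: (1, batch, hidden)
        cnn_rep = self.cnn(long_seq).squeeze(-1)  # (batch, hidden)
        # late fusion: concatenate the two fixed-size representations
        return torch.cat([h[-1], cnn_rep], dim=-1)

enc = LateFusionEncoder()
# sequences of different lengths (30 vs. 120 steps), no padding needed
rep = enc(torch.randn(4, 30, 1), torch.randn(4, 1, 120))
print(rep.shape)  # torch.Size([4, 32])
```

The fused vector can then be passed to any downstream head; early fusion, by contrast, would have to align and concatenate the raw sequences before a single encoder.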
This example uses a two-layer bidirectional LSTM with late fusion.
For each of the time series features (feature_1 and feature_2) we also have baseline values, so we add fusion layers that combine the representation of the sequence (the hidden state of the LSTM) with the corresponding baseline value. This fusion layer is optional and can be dropped in the absence of baseline values.
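The pieces above can be sketched end to end. This is an illustrative reconstruction, not the author's exact code: hidden sizes, the number of tabular features, and the number of sleep stages are assumed. Each time series gets its own two-layer bidirectional LSTM; a small fusion layer combines the last layer's hidden state with that feature's baseline value; the fused representations are concatenated with the tabular features and fed to a classifier head.

```python
import torch
import torch.nn as nn

class SleepStageModel(nn.Module):
    """Late-fusion model: per-sequence bidi LSTMs + baseline fusion + tabular head."""

    def __init__(self, hidden=32, n_tabular=8, n_stages=4):
        super().__init__()

        def make_branch():
            lstm = nn.LSTM(input_size=1, hidden_size=hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
            # fuses the bidirectional hidden state (2*hidden) with one baseline value
            fuse = nn.Sequential(nn.Linear(2 * hidden + 1, hidden), nn.ReLU())
            return lstm, fuse

        self.lstm1, self.fuse1 = make_branch()  # branch for feature_1
        self.lstm2, self.fuse2 = make_branch()  # branch for feature_2
        # classifier over both fused branches plus the tabular features
        self.head = nn.Linear(2 * hidden + n_tabular, n_stages)

    def encode(self, lstm, fuse, seq, baseline):
        _, (h, _) = lstm(seq)  # h: (num_layers * num_directions, batch, hidden)
        # last layer's forward and backward hidden states
        rep = torch.cat([h[-2], h[-1]], dim=-1)
        return fuse(torch.cat([rep, baseline], dim=-1))

    def forward(self, seq1, base1, seq2, base2, tabular):
        r1 = self.encode(self.lstm1, self.fuse1, seq1, base1)
        r2 = self.encode(self.lstm2, self.fuse2, seq2, base2)
        return self.head(torch.cat([r1, r2, tabular], dim=-1))

model = SleepStageModel()
logits = model(torch.randn(4, 60, 1), torch.randn(4, 1),   # feature_1 seq + baseline
               torch.randn(4, 60, 1), torch.randn(4, 1),   # feature_2 seq + baseline
               torch.randn(4, 8))                          # tabular features
print(logits.shape)  # torch.Size([4, 4]) — one logit per sleep stage
```

Without baseline values, each `fuse` layer can simply be dropped and the LSTM hidden state used directly as the branch representation.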
#multimodal #deep-and-wide-learning #deep-learning #timeseries