We often find ourselves in situations where we have a combination of different features that we want to leverage in our model. The input data to your model is a mix of time series and tabular data. Architecting a deep learning model that works well in this scenario is an interesting problem.

One example scenario: you have data from a device like a Fitbit and you want to predict the sleep stage at any given minute.

You have a mixture of:

Time series inputs:

- sequence of heart rate
- sequence of respiratory rate

Tabular features:

- time since sleep onset
- a personalized embedding representing this user's sleep pattern

and a bunch of other features.

One way to approach this is to treat it as a multimodal deep learning problem.

Multimodal learning — Photo on [ResearchGate]

And mix in “wide and deep learning,” as introduced by Google Research [here].

Wide and deep learning — Photo on google [blog]

## So how do we go about this?

- Pass each time series sequence through an RNN, LSTM, or 1D CNN and capture the hidden state or CNN embedding as a representation of the sequence.
- Concatenate the embeddings for each sequence with the other tabular features.
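The two steps above can be sketched in a few lines. The sizes here (60-step sequence, 8 tabular features, hidden size 16) are illustrative assumptions, not values from the post:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 32-sample batch, 60-minute heart-rate sequence,
# 8 tabular features (e.g. time since sleep onset, user embedding).
batch, seq_len, tab_dim, hidden = 32, 60, 8, 16

heart_rate = torch.randn(batch, seq_len, 1)   # (B, T, 1) univariate series
tabular = torch.randn(batch, tab_dim)         # (B, tab_dim)

# Encode the sequence and keep the final hidden state as its representation.
lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
_, (h_n, _) = lstm(heart_rate)                # h_n: (num_layers, B, hidden)
seq_repr = h_n[-1]                            # last layer's hidden state, (B, hidden)

# Fuse the sequence representation with the tabular features.
fused = torch.cat([seq_repr, tabular], dim=1) # (B, hidden + tab_dim)
print(fused.shape)                            # torch.Size([32, 24])
```

The fused vector then feeds into whatever head (MLP, softmax classifier) your task needs.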

Some interesting decisions to consider:

For the multiple time series input sequences:

- Do you treat the sequences independently and fuse/concatenate their representations late (late fusion)?
- Or do you treat them as a multi-channel input, with each time series as a channel (early fusion)?

In general, late fusion tends to work better than early fusion, and it does not require padding when the input sequences have different lengths. However, the right choice really depends on the problem space and on how correlated the input sequences are.
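The contrast between the two fusion styles shows up directly in the tensor shapes. A minimal sketch, using 1D CNN encoders and made-up sizes, assuming both sequences happen to share the same length (which early fusion requires):

```python
import torch
import torch.nn as nn

batch, seq_len = 32, 60
heart_rate = torch.randn(batch, seq_len)
resp_rate = torch.randn(batch, seq_len)

# Early fusion: stack the series as channels of one multi-channel input.
# This requires all sequences to have the same length.
early_input = torch.stack([heart_rate, resp_rate], dim=1)   # (B, 2, T)
early_cnn = nn.Conv1d(in_channels=2, out_channels=16, kernel_size=5)
early_repr = early_cnn(early_input).mean(dim=2)             # (B, 16)

# Late fusion: encode each series independently, then concatenate.
# Each encoder can have its own architecture and input length.
cnn_hr = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=5)
cnn_rr = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=5)
hr_repr = cnn_hr(heart_rate.unsqueeze(1)).mean(dim=2)       # (B, 8)
rr_repr = cnn_rr(resp_rate.unsqueeze(1)).mean(dim=2)        # (B, 8)
late_repr = torch.cat([hr_repr, rr_repr], dim=1)            # (B, 16)
```

With late fusion, the per-sequence encoders never need to agree on input length or even architecture, which is why it handles unequal-length sequences without padding.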

In general, a plain RNN tends to work well for shorter sequences and a bidirectional LSTM for longer ones. With late fusion, you can also mix and match RNNs, LSTMs, and 1D CNNs across the different sequences.

## Here is a sample model implementation (in PyTorch):

This example uses a 2-layer bidirectional LSTM with late fusion.

For each of the time series features (feature_1 and feature_2) we also have baseline values, so there are fusion layers that fuse the representation of each sequence (the hidden state of the LSTM) with its baseline value. These fusion layers are optional, and are not needed in the absence of baseline values.
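A minimal sketch of the model described above. The feature names (feature_1, feature_2) follow the text; all layer sizes, the number of output classes, and the baseline dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SleepStageModel(nn.Module):
    """Late fusion: a 2-layer bidirectional LSTM per time series, each fused
    with its baseline value, then concatenated with tabular features."""

    def __init__(self, seq_input_size=1, hidden=32, baseline_dim=1,
                 tabular_dim=8, num_classes=4):
        super().__init__()
        # One independent encoder per time series (late fusion).
        self.lstm_1 = nn.LSTM(seq_input_size, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.lstm_2 = nn.LSTM(seq_input_size, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        # Fusion layers: sequence representation + its baseline value.
        # These can be dropped when no baseline values exist.
        fused_in = 2 * hidden + baseline_dim   # fwd + bwd hidden + baseline
        self.fusion_1 = nn.Linear(fused_in, hidden)
        self.fusion_2 = nn.Linear(fused_in, hidden)
        # Classification head over fused sequence reps + tabular features.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden + tabular_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    @staticmethod
    def _last_hidden(h_n):
        # h_n: (num_layers * 2, B, hidden); take the last layer's
        # forward and backward hidden states and concatenate them.
        return torch.cat([h_n[-2], h_n[-1]], dim=1)   # (B, 2 * hidden)

    def forward(self, feature_1, feature_2, baseline_1, baseline_2, tabular):
        _, (h1, _) = self.lstm_1(feature_1)
        _, (h2, _) = self.lstm_2(feature_2)
        r1 = torch.relu(self.fusion_1(
            torch.cat([self._last_hidden(h1), baseline_1], dim=1)))
        r2 = torch.relu(self.fusion_2(
            torch.cat([self._last_hidden(h2), baseline_2], dim=1)))
        return self.head(torch.cat([r1, r2, tabular], dim=1))

# Smoke test with a batch of 4 and 60-step sequences.
model = SleepStageModel()
logits = model(torch.randn(4, 60, 1), torch.randn(4, 60, 1),
               torch.randn(4, 1), torch.randn(4, 1), torch.randn(4, 8))
print(logits.shape)   # torch.Size([4, 4]) — one logit per sleep stage
```

Because each sequence has its own encoder, either `lstm_1` or `lstm_2` could be swapped for a 1D CNN or plain RNN without touching the rest of the model, which is the mix-and-match flexibility late fusion gives you.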

#multimodal #deep-and-wide-learning #deep-learning #timeseries