You do not need to be a Data Scientist to know this feeling: it is winter, you turn the tap to “hot” in the morning, and it takes a few seconds (sometimes more) before the water goes from cold to the desired temperature.

The reason is obvious: the hot water needs time to flow from the boiler to your hands. Put differently, the two events (asking for hot water and getting some) are “desynchronized”.

If you want to reach a specific temperature at instant “t”, you have to anticipate how long it will take for the water to run through the pipes.

How is that related to modeling processes in Industry?

When you extract data from your historian, you usually get a time series indexed by timestamp. This timestamp corresponds to the time when the values were recorded.

Let’s assume that the “viscosity” of a product is your target variable and that there is a 200-meter pipe between your temperature sensor and your viscosimeter.

When you measure a viscosity of “61” at 06:06, is the temperature measured at the same time (10.8°C) contributing to this result? Unless the flow is really fast, probably not! So what temperature should you take into consideration when building a Machine Learning model?

From an engineering perspective, you could try to determine the flow rate, the pipe diameter, etc., and make your best guess as to how many seconds it takes for the product to go from A to B.
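To make the engineering estimate concrete, here is a minimal sketch assuming ideal plug flow (the product moves through the pipe as one block, with no mixing). The function name `transit_time_seconds`, the 0.1 m diameter and the 20 m³/h flow rate are hypothetical values chosen for illustration; only the 200 m length comes from the example above.

```python
import math


def transit_time_seconds(pipe_length_m, inner_diameter_m, flow_rate_m3_per_h):
    """Plug-flow estimate: time = pipe volume / volumetric flow rate."""
    cross_section_m2 = math.pi * (inner_diameter_m / 2) ** 2
    pipe_volume_m3 = cross_section_m2 * pipe_length_m
    flow_rate_m3_per_s = flow_rate_m3_per_h / 3600.0
    return pipe_volume_m3 / flow_rate_m3_per_s


# Hypothetical process values (assumptions, not taken from real data)
delay_s = transit_time_seconds(
    pipe_length_m=200.0,       # pipe length from the example
    inner_diameter_m=0.1,      # assumed inner diameter
    flow_rate_m3_per_h=20.0,   # assumed flow rate
)
print(f"Estimated delay: {delay_s:.0f} s ({delay_s / 60:.1f} min)")
```

With these assumed numbers the delay is several minutes, so a temperature and a viscosity recorded at the same timestamp would describe different portions of product. In a real plant the flow rate varies over time, which is one reason a fixed guess like this can be fragile.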

From a Data Science perspective, you could use the data itself to determine this “de-synchronization” and work on it!
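One common way to do this (not necessarily the exact method used later in this article) is to shift one signal against the other and keep the shift that gives the strongest correlation. The sketch below uses only the Python standard library; the helper names `pearson` and `estimate_lag`, the synthetic signal and the 12-sample delay are all assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def estimate_lag(x, y, max_lag):
    """Return the delay (in samples) of y relative to x that maximizes correlation."""
    return max(
        range(max_lag + 1),
        # Compare x at time i with y at time i + lag
        key=lambda lag: pearson(x[: len(x) - lag], y[lag:]),
    )


# Synthetic upstream signal (assumed shape: two superimposed sine waves)
n_samples = 400
x = [
    math.sin(2 * math.pi * i / 50) + 0.5 * math.sin(2 * math.pi * i / 17)
    for i in range(n_samples)
]

# Downstream signal: the same values arriving 12 samples later (assumed delay)
true_lag = 12
y = [0.0] * true_lag + x[: n_samples - true_lag]

best_lag = estimate_lag(x, y, max_lag=30)
print(best_lag)
```

On this noise-free signal the scan recovers the injected delay. With pandas, the same idea is usually written by calling `shift` on a column and recomputing the correlation for each candidate lag. Note that strongly periodic signals can produce several near-equal peaks, so the search range should stay below the dominant period.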

As usual, nothing is better than a simple example to understand how we can cope with this problem.

We will create 3 cyclical features (x1, x2 and x3) and the corresponding target “y”, defined as a weighted combination of all three.
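A minimal sketch of such a dataset could look like the following. The periods, the weights and the number of samples are hypothetical choices made for illustration, and sine waves are only one possible way to obtain cyclical features.

```python
import math

n_samples = 500

# Hypothetical periods (in samples) and weights, chosen only for illustration
periods = {"x1": 50, "x2": 80, "x3": 125}
weights = {"x1": 0.5, "x2": 0.3, "x3": 0.2}

# Each feature is a sine wave with its own period
features = {
    name: [math.sin(2 * math.pi * i / period) for i in range(n_samples)]
    for name, period in periods.items()
}

# Target: weighted combination of the three features at the same instant
y = [
    sum(weights[name] * features[name][i] for name in features)
    for i in range(n_samples)
]

print(len(y), round(min(y), 3), round(max(y), 3))
```

At this stage the features and the target are perfectly synchronized: each value of y depends only on x1, x2 and x3 at the same index. Delaying each feature by a different number of samples would then reproduce the pipe situation described above.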


AI in Industry: Why you should synchronize features in Time-Series