Why? Existing tools are not well suited to time series tasks and do not integrate easily with one another. Methods in the scikit-learn package assume that data is structured in a tabular format and that each row is an i.i.d. observation, assumptions that do not hold for time series data. Packages containing time series learning modules, such as statsmodels, do not integrate well with each other. Further, many essential time series operations, such as splitting data into train and test sets across time, are not available in existing Python packages.
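To make the last point concrete, here is a minimal sketch of a split across time, written in plain Python. The function name `split_by_time` and the sample values are hypothetical and are not taken from any library; the only idea illustrated is that the test set must come strictly after the training set rather than being sampled at random.

```python
def split_by_time(series, test_size):
    """Split an ordered series so every test point comes after every training point."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    split_point = len(series) - test_size
    # No shuffling: the order of observations carries the information we want to keep.
    return series[:split_point], series[split_point:]


# Hypothetical observations, oldest first.
values = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
train, test = split_by_time(values, test_size=3)
print(train)
print(test)
```

A random split of the same list would let the model see values from the future of some test points, which is exactly what a forecasting evaluation needs to avoid.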
To address these challenges, sktime was created.
Logo of the sktime library (GitHub: https://github.com/alan-turing-institute/sktime)
sktime is an open-source Python toolbox for machine learning with time series. It is a community-driven project funded by the UK Economic and Social Research Council, the Consumer Data Research Centre, and The Alan Turing Institute.
sktime extends the scikit-learn API to time series tasks. It provides the algorithms and transformation tools needed to solve time series regression, forecasting, and classification tasks efficiently. The library includes dedicated time series learning algorithms and transformation methods not readily available in other common libraries.
sktime was designed to interoperate with scikit-learn, to make it easy to adapt algorithms across interrelated time series tasks, and to build composite models. How? Many time series tasks are related. An algorithm that can solve one task can often be re-used to help solve a related one. This idea is called reduction. For example, a model for time series regression (use a series to predict an output value) can be re-used for a time series forecasting task (the predicted output value is a future value).
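The reduction from forecasting to regression can be sketched in plain Python. This is an illustration of the idea only, not sktime's implementation: the helper names (`sliding_window`, `NearestNeighbourRegressor`, `forecast`) are hypothetical, and the nearest-neighbour model stands in for any tabular regressor with `fit` and `predict`.

```python
def sliding_window(series, window_length):
    """Turn one series into a table: each row holds past values, the target is the next value."""
    X, y = [], []
    for start in range(len(series) - window_length):
        X.append(series[start:start + window_length])
        y.append(series[start + window_length])
    return X, y


class NearestNeighbourRegressor:
    """Tiny tabular regressor: predict the target of the closest training row."""

    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda i: sq_dist(row, self.X_[i]))]
            for row in X
        ]


def forecast(series, regressor, window_length, horizon):
    """Forecast by fitting a regressor on windows, then feeding predictions back in."""
    X, y = sliding_window(series, window_length)
    regressor.fit(X, y)
    history = list(series)
    predictions = []
    for _ in range(horizon):
        next_value = regressor.predict([history[-window_length:]])[0]
        predictions.append(next_value)
        history.append(next_value)  # recursive strategy: reuse our own forecast
    return predictions


series = [1, 2, 3, 4] * 5  # hypothetical, perfectly periodic series
print(forecast(series, NearestNeighbourRegressor(), window_length=3, horizon=4))
```

The regressor never knows it is being used for forecasting. All of the time series logic lives in the windowing step and the loop that feeds predictions back in, which is why an ordinary tabular regressor can be swapped in.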
Mission statement: “sktime enables understandable and composable machine learning with time series. It provides scikit-learn compatible algorithms and model composition tools, supported by a clear taxonomy of learning tasks, with instructive documentation and a friendly community.”
In the rest of this article, I highlight some of the unique features of sktime.
sktime uses a nested data structure for time series in pandas data frames.
Each row in a typical data frame contains i.i.d. observations and columns represent different variables. For sktime methods, each cell in the pandas data frame can now contain an entire time series. This format is flexible enough for multivariate, panel, and heterogeneous data, and it allows methods from both pandas and scikit-learn to be reused.
In the table below, each row is an observation that contains a time series array in column X and class value in column y. sktime estimators and transformers can operate on such series.
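A frame of this shape can be built by hand with pandas alone. The sketch below uses made-up values and labels, and it assumes only standard pandas behaviour (object-typed columns whose cells hold `pd.Series`); it does not call any sktime function.

```python
import pandas as pd

# Three hypothetical labelled series. Each cell in column "X" holds its own
# pd.Series, so the series do not need to have the same length.
nested = pd.DataFrame(
    {
        "X": [
            pd.Series([0.1, 0.4, 0.9, 1.3]),
            pd.Series([2.0, 1.5, 1.1, 0.6]),
            pd.Series([0.2, 0.3, 0.8, 1.2, 1.9]),
        ],
        "y": ["up", "down", "up"],
    }
)

# A single cell is a full time series.
first_series = nested.loc[0, "X"]

# Ordinary pandas methods still work row by row.
series_lengths = nested["X"].apply(len)
series_means = nested["X"].apply(lambda s: s.mean())

print(nested.shape)
print(series_lengths.tolist())
```

Because the outer object is still a regular data frame, row selection, filtering on `y`, and row-wise `apply` all behave as usual, while each cell keeps its own index and length.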