1642058100

This repository hosts the **TSFEL - Time Series Feature Extraction Library** Python package. TSFEL assists researchers with exploratory feature extraction on time series without requiring significant programming effort.

Users can interact with TSFEL in two ways:

**Online**

No installation is required, as it relies on Google Colab and a user interface provided by Google Sheets.

**Offline**

Advanced users can exploit the full potential of TSFEL by installing it as a Python package:

`pip install tsfel`

TSFEL is optimized for time series and **automatically extracts over 60 different features on the statistical, temporal and spectral domains.**

- **Intuitive, fast deployment and reproducible**: interactive UI for feature selection and customization
- **Computational complexity evaluation**: estimate the computational effort before extracting features
- **Comprehensive documentation**: each feature extraction method has a detailed explanation
- **Unit tested**: we provide unit tests for each feature
- **Easily extended**: adding new features is easy and we encourage you to contribute your custom features

The code below extracts all the available features on an example dataset file.

```python
import tsfel
import pandas as pd

# Load the example dataset
df = pd.read_csv('Dataset.txt')

# Retrieve a pre-defined feature configuration file to extract all available features
cfg = tsfel.get_features_by_domain()

# Extract features
X = tsfel.time_series_features_extractor(cfg, df)
```

**Statistical domain**

Features | Computational Cost |
---|---|
ECDF | 1 |
ECDF Percentile | 1 |
ECDF Percentile Count | 1 |
Histogram | 1 |
Interquartile range | 1 |
Kurtosis | 1 |
Max | 1 |
Mean | 1 |
Mean absolute deviation | 1 |
Median | 1 |
Median absolute deviation | 1 |
Min | 1 |
Root mean square | 1 |
Skewness | 1 |
Standard deviation | 1 |
Variance | 1 |
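As a rough illustration (plain stdlib Python, not TSFEL's actual implementation), a few of the statistical-domain features above could be computed like this:

```python
import statistics

def basic_stats(signal):
    # A handful of the statistical-domain features listed above
    rms = (sum(x * x for x in signal) / len(signal)) ** 0.5
    q1, _, q3 = statistics.quantiles(signal, n=4)
    return {
        "mean": statistics.mean(signal),
        "std": statistics.pstdev(signal),
        "rms": rms,
        "iqr": q3 - q1,
    }
```

TSFEL computes these (and their computational-cost estimates) for every window of the input signal; the sketch above only shows the single-window arithmetic.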

**Temporal domain**

Features | Computational Cost |
---|---|
Absolute energy | 1 |
Area under the curve | 1 |
Autocorrelation | 1 |
Centroid | 1 |
Entropy | 1 |
Mean absolute diff | 1 |
Mean diff | 1 |
Median absolute diff | 1 |
Median diff | 1 |
Negative turning points | 1 |
Peak to peak distance | 1 |
Positive turning points | 1 |
Signal distance | 1 |
Slope | 1 |
Sum absolute diff | 1 |
Total energy | 1 |
Zero crossing rate | 1 |
Neighbourhood peaks | 1 |
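For instance, the zero crossing rate from the table above can be sketched in a few lines (illustrative, not TSFEL's implementation):

```python
def zero_crossing_rate(signal):
    # Fraction of consecutive sample pairs whose signs differ
    pairs = list(zip(signal, signal[1:]))
    crossings = sum(1 for a, b in pairs if (a >= 0) != (b >= 0))
    return crossings / len(pairs)
```

A rapidly alternating signal such as `[1, -1, 1, -1]` yields a rate of 1.0, while a monotone signal yields 0.0.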

**Spectral domain**

Features | Computational Cost |
---|---|
FFT mean coefficient | 1 |
Fundamental frequency | 1 |
Human range energy | 2 |
LPCC | 1 |
MFCC | 1 |
Max power spectrum | 1 |
Maximum frequency | 1 |
Median frequency | 1 |
Power bandwidth | 1 |
Spectral centroid | 2 |
Spectral decrease | 1 |
Spectral distance | 1 |
Spectral entropy | 1 |
Spectral kurtosis | 2 |
Spectral positive turning points | 1 |
Spectral roll-off | 1 |
Spectral roll-on | 1 |
Spectral skewness | 2 |
Spectral slope | 1 |
Spectral spread | 2 |
Spectral variation | 1 |
Wavelet absolute mean | 2 |
Wavelet energy | 2 |
Wavelet standard deviation | 2 |
Wavelet entropy | 2 |
Wavelet variance | 2 |
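As a sketch of one spectral feature (illustrative plain Python with a naive DFT, not TSFEL's FFT-based implementation), the spectral centroid is the magnitude-weighted mean frequency of the signal's spectrum:

```python
import math

def spectral_centroid(signal, fs):
    # Magnitude-weighted mean of the positive-frequency spectrum
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        mags.append(math.hypot(re, im))
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0
```

For a pure 10 Hz sine sampled at 100 Hz, the centroid comes out at roughly 10 Hz, as expected.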

When using TSFEL please cite the following publication:

Barandas, Marília and Folgado, Duarte, et al. "*TSFEL: Time Series Feature Extraction Library.*" SoftwareX 11 (2020). https://doi.org/10.1016/j.softx.2020.100456

We would like to acknowledge the financial support obtained from the project Total Integrated and Predictive Manufacturing System Platform for Industry 4.0, co-funded by Portugal 2020, framed under the COMPETE 2020 (Operational Programme Competitiveness and Internationalization) and European Regional Development Fund (ERDF) from European Union (EU), with operation code POCI-01-0247-FEDER-038436.

Author: Fraunhoferportugal

Source Code: https://github.com/fraunhoferportugal/tsfel

License: BSD-3-Clause License

1642038960

pyts is a Python package for time series classification. It aims to make time series classification easily accessible by providing preprocessing and utility tools, and implementations of state-of-the-art algorithms. Most of these algorithms transform time series, thus pyts provides several tools to perform these transformations.

pyts requires:

- Python (>= 3.7)
- NumPy (>= 1.17.5)
- SciPy (>= 1.3.0)
- Scikit-Learn (>=0.22.1)
- Joblib (>=0.12)
- Numba (>=0.48.0)

To run the examples Matplotlib (>=2.0.0) is required.

If you already have a working installation of numpy, scipy, scikit-learn, joblib and numba, you can easily install pyts using `pip`

`pip install pyts`

or with `conda` via the `conda-forge` channel:

`conda install -c conda-forge pyts`

You can also get the latest version of pyts by cloning the repository

```
git clone https://github.com/johannfaouzi/pyts.git
cd pyts
pip install .
```

After installation, you can launch the test suite from outside the source directory using pytest:

`pytest pyts`

See the changelog for a history of notable changes to pyts.

The development of this package is aligned with that of the scikit-learn community. Therefore, you can refer to their Development Guide. A slight difference is the use of Numba instead of Cython for optimization.

The section below gives some information about the implemented algorithms in pyts. For more information, please have a look at the HTML documentation available via ReadTheDocs.

If you use pyts in a scientific publication, we would appreciate citations to the following paper:

```
Johann Faouzi and Hicham Janati. pyts: A python package for time series classification.
Journal of Machine Learning Research, 21(46):1−6, 2020.
```

Bibtex entry:

```
@article{JMLR:v21:19-763,
author = {Johann Faouzi and Hicham Janati},
title = {pyts: A Python Package for Time Series Classification},
journal = {Journal of Machine Learning Research},
year = {2020},
volume = {21},
number = {46},
pages = {1-6},
url = {http://jmlr.org/papers/v21/19-763.html}
}
```

**Note: the content described in this section corresponds to the main branch, not the latest released version. You may have to install the latest version to use some of these features.**

pyts consists of the following modules:

`approximation`

: This module provides implementations of algorithms that approximate time series. Implemented algorithms are Piecewise Aggregate Approximation, Symbolic Aggregate approXimation, Discrete Fourier Transform, Multiple Coefficient Binning and Symbolic Fourier Approximation.
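As a minimal sketch of one of these, Piecewise Aggregate Approximation reduces a time series to the mean of each of a fixed number of equal-sized windows (illustrative plain Python; pyts' implementation follows the scikit-learn transformer API):

```python
def paa(series, n_segments):
    # Piecewise Aggregate Approximation: mean of each (roughly) equal-sized window
    n = len(series)
    out = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        window = series[start:end]
        out.append(sum(window) / len(window))
    return out
```

For example, `paa([0, 0, 2, 2, 4, 4], 3)` collapses the six points into three segment means, `[0.0, 2.0, 4.0]`.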

`bag_of_words`

: This module provides tools to transform time series into bags of words. Implemented algorithms are WordExtractor and BagOfWords.

`classification`

: This module provides implementations of algorithms that can classify time series. Implemented algorithms are KNeighborsClassifier, SAXVSM, BOSSVS, LearningShapelets, TimeSeriesForest and TSBF.

`datasets`

: This module provides utilities to make or load toy datasets, as well as fetching datasets from the UEA & UCR Time Series Classification Repository.

`decomposition`

: This module provides implementations of algorithms that decompose a time series into several time series. The only implemented algorithm is Singular Spectrum Analysis.

`image`

: This module provides implementations of algorithms that transform time series into images. Implemented algorithms are Recurrence Plot, Gramian Angular Field and Markov Transition Field.

`metrics`

: This module provides implementations of metrics that are specific to time series. Implemented metrics are Dynamic Time Warping with several variants and the BOSS metric.

`multivariate`

: This module provides utilities to deal with multivariate time series. Available tools are MultivariateTransformer and MultivariateClassifier, which transform and classify multivariate time series using tools for univariate time series, as well as JointRecurrencePlot and WEASEL+MUSE.

`preprocessing`

: This module provides most of the scikit-learn preprocessing tools but applied sample-wise (i.e. to each time series independently) instead of feature-wise, as well as an imputer of missing values using interpolation. More information is available at the pyts.preprocessing API documentation.
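The sample-wise idea can be sketched as follows (illustrative plain Python; pyts' actual transformers follow the scikit-learn API): each row (one time series) is scaled independently, rather than each column.

```python
def standardize_samplewise(X):
    # Scale each time series (each row) independently to zero mean, unit variance
    out = []
    for row in X:
        mu = sum(row) / len(row)
        sd = (sum((x - mu) ** 2 for x in row) / len(row)) ** 0.5 or 1.0
        out.append([(x - mu) / sd for x in row])
    return out
```

This matters for time series because two series with different baselines or amplitudes can still share the same shape, which is usually what classification algorithms care about.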

`transformation`

: This module provides implementations of algorithms that transform a data set of time series with shape `(n_samples, n_timestamps)` into a data set with shape `(n_samples, n_extracted_features)`. Implemented algorithms are BagOfPatterns, BOSS, ShapeletTransform, WEASEL and ROCKET.

`utils`

: a simple module with utility functions.

Author: Johannfaouzi

Source Code: https://github.com/johannfaouzi/pyts

License: BSD-3-Clause License

1639561200

In this video we are covering a Logistic Regression to predict stock prices (or rather returns) in Python. We are also taking a look at cryptos (Bitcoin) at the end.

As said in the video, you should not take this as a valid trading strategy. It is just an idea of how a Logistic Regression could be used and how overfitting can be avoided, or at least diminished, using a train-test split.

I am purposely NOT showing a time horizon where this strategy works or looks nice, in order to make you aware of that.

I am planning on covering other algorithms and extending the strategy. If you find that interesting please leave the video a like and subscribe :-)

The video series is inspired by the Hands-On Algorithmic Trading with Python course by Deepak Kanungo. Anyhow, the code and some approaches strongly deviate from his.

Disclaimer: This video is not an investment advice and is for informational and educational purposes only.

0:00 - 0:52 Introduction

0:52 - 01:48 Quick recap

01:48 - 05:08 Data prep / Amendments to get lagged directions

04:46 - 07:17 Model building, fitting & prediction

07:17 - 09:25 Strategy, Performance and Visualization

09:25 - 13:25 Train test split

13:25 - 15:53 Confusion Matrix and Classification Report

15:53 - 16:38 Considering different amount of lags

16:38 - 18:08 Considering Bitcoin

Previous vid on Linear Regression:

https://youtu.be/AXBhrLongC8

1633167888

Learn How to Build a Landmark Classifier in Flutter Step by Step (2021)

00:00 - Intro

00:44 - Project Setting

01:01 - AndroidManifest Setting

01:08 - Podfile Setting (TFLite Setting)

01:21 - Info.plist Setting

01:30 - TFLite Model & Label Setting

02:12 - Home Page

10:28 - ImageService

11:30 - ClassificationService

18:45 - Classification Page

* Github: https://github.com/PuzzleLeaf/flutter_tflite_landmark_classifier

1629843540

Learn How to Build a Popular US Products Classifier Using Flutter

This model is trained to recognize more than 100,000 popular supermarket products in the United States from images. The model is mobile-friendly and can run on-device.

▶SourceCode

* Github

- https://github.com/PuzzleLeaf/flutter_tensorflow_lite_us_products_classifier

▶Timestamp

00:00 - Intro

00:21 - MainPage

01:05 - Camera Setting

03:22 - Modal Bottom Sheet

05:08 - TFLite Model Setting

15:08 - Test

1628747275

In this video, you will learn about the decision tree classification algorithm in Python.

#decisiontree #classification #python

1626129240

You will learn:

- What are Support Vector Machines?
- Features of SVM and its applications
- Explanation of different SVM hyperparameters
- Python implementation for multiclass classification

**The SVM algorithm was proposed by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963.**

Support vector machine is a supervised machine learning algorithm used for classification as well as regression. SVM's objective is to identify a hyperplane that separates data points into two classes by maximizing the margin between the support vectors of the two classes.

- Support Vectors
- Hyperplane and Margins
- Features of Support Vector Machine
- Applications of SVM
- Hyperparameters of SVM
- Multiclass classification using Support Vector Classifier

#machine-learning #classification #python #supervised-learning

1624988760

We all have used one of the following supervised learning algorithms for predictive analysis:

- Logistic Regression
- Ridge Regression
- LASSO Regression
- Linear Discriminant Analysis (LDA)
- K Nearest Neighbors (KNN)
- Naive Bayes (NB)
- Support Vector Machine (SVM)
- Decision Tree
- Random Forest (RF)
- Gradient Boosting

But have you thought about their pros and cons? Here I have listed a few:

- 1. Logistic Regression
- 2. Ridge Regression
- 3. LASSO Regression
- 4. Linear Discriminant Analysis (LDA)
- 5. K Nearest Neighbors (KNN)
- 6. Naive Bayes (NB)
- 7. Support Vector Machine (SVM)
- 8. Decision Tree
- 9. Random Forest (RF)
- 10. Gradient Boosting

#classification #supervised-learning #regression #algorithms #machine-learning

1623887280

In my first blog, ‘Bird Recognition App using Microsoft Custom Vision AI and Power BI’, we looked at the intriguing behaviors and attributes of birds using Power BI. This inspired me to create an ‘AI for birds’ web app’ using Azure Custom Vision along with a phone app using Power Apps and an iPhone / Android platform that could identify a bird in real-time. I created this app to raise awareness of the heart-breaking reality which most birds face around the world.

In this blog, let’s go behind the scenes and take a look at the journey of how this was created.

**What is Azure Custom Vision?**

*Azure Custom Vision is an image recognition AI service, part of **Azure Cognitive Services**, that enables you to build, deploy, and improve your own image identifiers. An image identifier applies labels (which represent classes or objects) to images according to their visual characteristics. It allows you to specify the labels and train custom models to detect them.*

**What does Azure Custom Vision do?**

*The Custom Vision service uses a Machine Learning algorithm to analyze images. You can submit groups of images that feature and lack the characteristics in question. You label the images yourself at the time of the submission. Then, the algorithm trains to this data and calculates its accuracy by testing itself on those same images.*

*Once the algorithm is trained, you can run a test, retrain, and eventually use it in your image recognition app to classify new images. You can also export the model itself for offline use.*

1. **Upload images** - Bring your own labelled images or use Custom Vision to quickly add tags to any unlabelled images.

2. **Train the model** - Use your labelled images to teach Custom Vision the concepts you care about.

3. **Evaluate the result** - Use simple REST API calls to quickly tag images with your new custom computer vision model.

The Custom Vision Service uses machine learning to classify the images you upload. The only thing required is to specify the correct tag for each image. You can also tag thousands of images at a time. The AI algorithm is very powerful, and once the model is trained, you can use the same model to classify new images according to the needs of the app.

#artificial intelligence #azure machine learning #classification

1623539820

**Classification** is one of two types of supervised machine learning tasks (i.e. tasks where we have a labeled dataset) with the other being **regression**.

**Key point to remember**: supervised learning tasks use features to predict targets, or, in non-tech speak, they use attributes/characteristics to predict something. For instance, we can take a basketball player’s height, weight, age, foot-speed, and/or multiple other aspects to predict how many points they’ll score or whether they will be an all-star.

- Regression tasks predict a *continuous* value (i.e., how many points someone will score)
- Classification tasks predict a *non-continuous* value (i.e., if someone will be an all-star)

Answer the following question:

“Does my target variable have an order to it?”

For example, my project predicting the recommended age of a reader was a *regression* task because I was predicting a precise **age** (e.g., 4 years old). If I was attempting to identify whether a book was suitable for teens or not, then it would have been a *classification* task since the answer would have been either **yes** or **no**.

Nope, those are just the easy examples 😄

Imagine a scenario where you get a new batch of students every year and have to sort them into houses based on their personality traits.

In this situation, the houses do not have any type of sequence/ranking to them. Sure, Harry definitely didn’t want to be housed in Slytherin, and the Sorting Hat clearly took that into consideration, but that doesn’t mean Slytherin is closer to Gryffindor in the same way that 25 is closer to 30 than it is to 19.

Similarly, if we had a data set containing the ingredients of dishes and attempted to predict the country of origin, we’d be solving a classification problem. Why? Because country names have no numerical order. We can say that Russia is the largest country on earth or that China has the most people but those are attributes of the country (i.e., land size and population) which are not intrinsic to the name of the country.

#supervised-learning #machine-learning #classification #c

1623338820

Many times, it happens that you have an interview in a few days, and your schedule is jam-packed to prepare for it. Or maybe you are in revision mode and want to look at all the basic popular machine learning models. If that is the case, you have come to the right place. In this blog, I will briefly explain some of the most commonly asked machine learning models in interviews. I will also list important parameters related to each model and a source to find a detailed explanation of the same topic, so you can dig deeper if and when required.

Machine learning models can be broadly grouped into two categories: supervised and unsupervised learning. Within supervised learning, we have two broad categories: regression and classification. The following sections explain each of them briefly to give you the necessary insights.

Note: I am providing the models which I believe are the most common ones and should be prepared before any data science interview. However, this list is subjective.

In supervised learning, the data that you use for training the models is “labeled”. This means the output for each input is known. For example, if your model is trying to predict house prices, you might have variables like size of the house, number of floors, etc. When your data is labeled, it means you would also have a variable that contains the house price.

The above example was for regression. Let’s have a close look at regression and classification now.

In classification, the output of the model is discrete. For example, consider dog vs cat image classification, where we predict whether the image contains the family of dogs or cats. The class (which is the output of the model) will be discrete here i.e. either dog or cat. Now, we will look through the models which are commonly used for classification.

Don’t get confused; it has the word “regression” in the name, but it is used for classification. Logistic regression uses an equation to create a curve with your data and then uses this curve to predict the outcome of a new observation.

In essence, a logistic equation is created so that the output values can only be between 0 and 1.
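That equation is the logistic (sigmoid) function. A minimal sketch of it and of a single-observation prediction (illustrative; the weights in the usage below are made up, not fitted):

```python
import math

def sigmoid(z):
    # Logistic function: maps any real value into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    # Probability of the positive class for one observation:
    # sigmoid of the weighted sum of features plus intercept
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

With zero weights and intercept, `predict_proba` returns 0.5 for any input; fitting (e.g. by maximum likelihood) is what moves the weights away from that uninformative baseline.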

**Detailed Explanation** here

Support Vector Machines (SVM) form a boundary between data points for classification. For example, in the case of 2 dimensions, SVM will create a boundary such that the majority of data points of one class fall on one side of the boundary, and most of the other class falls on the other side.

So the goal in SVM is to find the boundary which maximizes the margin (described in the above image).

**Important Parameters/Concepts** — Kernel, C, Gamma, Margin

**Detailed Explanation** here

In the decision tree, you basically ask questions about your observation and follow the tree down until you reach an outcome, as shown below.

In the above example, each square is called a **node**, and a larger number of nodes will cause more overfitting of the model on the dataset.

**Important Parameters/Concepts** — Node, Leaf Node, Entropy, Information Gain

**Detailed Explanation** here

It is an ensemble learning technique that uses multiple decision trees to give the final output. Random forests create multiple decision trees based on bootstrapped datasets of the original dataset and randomly select subsets of the variables at each step of the decision trees. During inference, we get outputs from all the decision trees and finally select the output with the maximum votes. Random forests are generally preferred over decision trees as they prevent overfitting.

**Important Parameters/Concepts** — Number of decision trees, size of bootstrapped data, number of random forest features, and everything else mentioned in the decision tree section.
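The majority-vote step described above can be sketched as follows (illustrative; real implementations such as scikit-learn's `RandomForestClassifier` also handle bootstrapping and fitting, which is omitted here, so each "tree" is just a callable):

```python
from collections import Counter

def forest_predict(trees, x):
    # Each fitted tree casts one vote; the majority class is the ensemble output
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]
```

With three stub trees voting "cat", "dog", "cat", the ensemble returns "cat".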

#classification #artificial-intelligence #machine-learning

1622822400

One of my third year Electronic and Electrical Engineering projects was to build a machine which can sort M&Ms according to their colour. The final version could sort approximately 47 sweets per minute, and won our team a few bottles of beers for our work.

Perhaps one day I will write an article detailing that process, but what I want to talk about today is the colour classification; how it classified sweets originally, and two years later, using my new-found data science knowledge to solve this problem. I’ve been using the test data as a proving ground for experimenting with various machine learning techniques as I learn them, so this blog aims to document my learning process.

The data was gathered by running M&Ms through the machine, jotting down the red, green and blue values the colour sensor returned, and logging the colour of the sweet. A long and tedious process indeed. The classification process was entirely manual; I examined where the clusters resided, and set up bounding boxes to classify sweets that fell inside them.

This was functional enough for an electronics project (and enough to win the beer), but it came with a host of problems with hacky workaround solutions.

To begin with, the machine had no idea how to deal with outliers — it would give the sweets a jiggle and rescan them, and if that didn’t work, it threw them in a waste bin. The shapes and orientations of the distributions were also not considered. This became problematic especially for red and orange sweets, as their bounding boxes intersected. If a sweet fell in the intersection region, the machine would jiggle and rescan until it fell into the exclusive red or orange boxes (the red box wins in this demo). The machine also hard classified sweets, which caused many red/orange mixups. Let’s look at the confusion matrix for this technique.
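That bounding-box scheme can be sketched as follows. The RGB ranges here are made-up placeholders, not the machine's calibrated values; the point is the behaviour: a reading matching exactly one box gets a hard label, and anything ambiguous (the red/orange overlap) or outside every box returns `None`, i.e. jiggle-and-rescan.

```python
# Hypothetical RGB bounding boxes per colour: (R range, G range, B range).
# These values are illustrative placeholders, not the real calibration.
BOXES = {
    "red":    ((150, 255), (0, 100), (0, 100)),
    "orange": ((130, 255), (60, 160), (0, 100)),
    "green":  ((0, 100), (150, 255), (0, 100)),
}

def classify(rgb, boxes=BOXES):
    # Collect every box the reading falls inside
    matches = [colour for colour, box in boxes.items()
               if all(lo <= v <= hi for v, (lo, hi) in zip(rgb, box))]
    if len(matches) == 1:
        return matches[0]
    return None  # outlier or overlap region: rescan (or reject)
```

The confusion-matrix problems described above follow directly from this structure: hard boxes ignore the shape and orientation of each colour's distribution, which is exactly what a probabilistic classifier improves on.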

#numpy #python #classification #engineering #machine-learning

1622617140

Most of the supervised learning problems in machine learning are classification problems. Classification is the task of assigning a data point with a suitable class. Suppose a pet classification problem. If we input certain features, the machine learning model will tell us whether the given features belong to a cat or a dog. Cat and dog are the two classes here. One may be numerically represented by 0 and the other by 1. This is specifically called a binary classification problem. If there are more than two classes, the problem is termed a multi-class classification problem. This machine learning task comes under supervised learning because both the features and corresponding class are provided as input to the model during training. During testing or production, the model predicts the class given the features of a data point.

This article discusses Logistic Regression and the math behind it, with a practical example and Python code. Logistic regression is one of the fundamental algorithms for classification. It is designed for binary classification problems; nevertheless, multi-class classification can also be performed with this algorithm with some modifications.

#developers corner #binary classification #classification #logistic regression #logit #python #regression #scikit learn #sklearn #statsmodels #tutorial

1621505834

Advanced analytical applications can be developed using machine learning algorithms in Oracle database software since version 9i. As the database versions are renewed, new ones are added to these algorithm options. The current algorithm list that comes with Oracle 19c version is as follows.

With Oracle 21c, new algorithms have been added to this list. Undoubtedly, the most interesting of these algorithms was the XGBoost algorithm, one of the industry’s most frequently used ensemble methods. XGBoost, which has proved its learning capacity with its success in ML competitions opened over Kaggle, is now ready for use in the Oracle database.

#database #xgboost algorithm #classification

1620068940

We are currently in the seventh month of a global pandemic. I think I can safely say no one is enjoying it. The quicker this ends the better. One of the most important tools the medical field has when it comes to mitigating the spread of a virus is vaccination. However, vaccinations don’t work if no one gets vaccinated. Now, I realize that sounds like an obvious statement, and in many ways it is. However, it is nonetheless a hugely important fact. This is the underlying idea of “herd immunity”, and it is vital to be able to identify members of the community who are unlikely to get vaccinated.

Fortunately for us aspiring data scientists this is not our first pandemic. In response to the H1N1 Flu in 2009, the Centers for Disease Control and Prevention (CDC) conducted a survey “in order to monitor and evaluate flu vaccination efforts among adults and children”. This phone survey asked people whether they had received H1N1 and seasonal flu vaccines, in conjunction with information they shared about their lives, opinions, and behaviors. DrivenData provided a large chunk of this dataset and posed the question: Using the survey results can you make a model that predicts who will get either vaccine? I figured I would give it a shot (pun intended).

For my last project, I created a linear regression model to predict a baseball team's total wins. The target there (wins) is a *continuous value* and so it was perfectly suited for a regression model. This problem is not so simple (or perhaps it's more simple?). Here our target value is binary: whether or not a participant got the vaccine. So for this problem, we will be looking at classification models rather than regression models. These are models that predict the *likelihood* of one result or the other, rather than trying to predict a continuous variable.

In this post I will walk through the process I took to build my model as well as explain some of the different classification models I ended up not using.

#data-science #machine-learning #python #pandas #classification