Dexter Goodwin

1642058100

An intuitive Library to Extract Features From Time Series

Time Series Feature Extraction Library

Intuitive time series feature extraction

This repository hosts TSFEL (Time Series Feature Extraction Library), a Python package. TSFEL assists researchers with exploratory feature extraction tasks on time series without requiring significant programming effort.

Users can interact with TSFEL using two methods:

Online

It does not require installation, as it relies on Google Colab and a user interface provided by Google Sheets.

Offline

Advanced users can take full advantage of TSFEL by installing it as a Python package:

pip install tsfel

Includes a comprehensive number of features

TSFEL is optimized for time series and automatically extracts over 60 different features across the statistical, temporal and spectral domains.

Functionalities

  • Intuitive, fast deployment and reproducible: interactive UI for feature selection and customization
  • Computational complexity evaluation: estimate the computational effort before extracting features
  • Comprehensive documentation: each feature extraction method has a detailed explanation
  • Unit tested: we provide unit tests for each feature
  • Easily extended: adding new features is easy and we encourage you to contribute with your custom features

Get started

The code below extracts all the available features on an example dataset file.

import tsfel
import pandas as pd

# load dataset
df = pd.read_csv('Dataset.txt')

# Retrieves a pre-defined feature configuration file to extract all available features
cfg = tsfel.get_features_by_domain()

# Extract features
X = tsfel.time_series_features_extractor(cfg, df)

Available features

Statistical domain

Features                      Computational Cost
ECDF                          1
ECDF Percentile               1
ECDF Percentile Count         1
Histogram                     1
Interquartile range           1
Kurtosis                      1
Max                           1
Mean                          1
Mean absolute deviation       1
Median                        1
Median absolute deviation     1
Min                           1
Root mean square              1
Skewness                      1
Standard deviation            1
Variance                      1

Temporal domain

Features                      Computational Cost
Absolute energy               1
Area under the curve          1
Autocorrelation               1
Centroid                      1
Entropy                       1
Mean absolute diff            1
Mean diff                     1
Median absolute diff          1
Median diff                   1
Negative turning points       1
Peak to peak distance         1
Positive turning points       1
Signal distance               1
Slope                         1
Sum absolute diff             1
Total energy                  1
Zero crossing rate            1
Neighbourhood peaks           1
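
To give a feel for what a few of the temporal features measure, here is a minimal sketch in plain Python. The function names and the textbook definitions used here are assumptions for illustration; they are not TSFEL's own code and may differ from its exact implementation (for example in normalization).

```python
def absolute_energy(signal):
    """Sum of squared sample values."""
    return sum(x * x for x in signal)


def mean_absolute_diff(signal):
    """Mean of absolute differences between consecutive samples."""
    diffs = [abs(b - a) for a, b in zip(signal, signal[1:])]
    return sum(diffs) / len(diffs)


def zero_crossings(signal):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)


# Toy signal, made up for this example.
signal = [1.0, -2.0, 3.0, -1.0, 0.5]
features = {
    "absolute_energy": absolute_energy(signal),
    "mean_absolute_diff": mean_absolute_diff(signal),
    "zero_crossings": zero_crossings(signal),
}
print(features)
```

Each function reduces a whole series to a single number, which is the general shape of the features listed in the tables.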

Spectral domain

Features                            Computational Cost
FFT mean coefficient                1
Fundamental frequency               1
Human range energy                  2
LPCC                                1
MFCC                                1
Max power spectrum                  1
Maximum frequency                   1
Median frequency                    1
Power bandwidth                     1
Spectral centroid                   2
Spectral decrease                   1
Spectral distance                   1
Spectral entropy                    1
Spectral kurtosis                   2
Spectral positive turning points    1
Spectral roll-off                   1
Spectral roll-on                    1
Spectral skewness                   2
Spectral slope                      1
Spectral spread                     2
Spectral variation                  1
Wavelet absolute mean               2
Wavelet energy                      2
Wavelet standard deviation          2
Wavelet entropy                     2
Wavelet variance                    2

Citing

When using TSFEL please cite the following publication:

Barandas, Marília and Folgado, Duarte, et al. "TSFEL: Time Series Feature Extraction Library." SoftwareX 11 (2020). https://doi.org/10.1016/j.softx.2020.100456

Acknowledgements

We would like to acknowledge the financial support obtained from the project Total Integrated and Predictive Manufacturing System Platform for Industry 4.0, co-funded by Portugal 2020, framed under the COMPETE 2020 (Operational Programme Competitiveness and Internationalization) and European Regional Development Fund (ERDF) from European Union (EU), with operation code POCI-01-0247-FEDER-038436.

Author: Fraunhoferportugal
Source Code: https://github.com/fraunhoferportugal/tsfel 
License: BSD-3-Clause License

#python #data-science #classification 

Dexter Goodwin

1642038960

Pyts: A Python package for time series classification

pyts: a Python package for time series classification

pyts is a Python package for time series classification. It aims to make time series classification easily accessible by providing preprocessing and utility tools, and implementations of state-of-the-art algorithms. Most of these algorithms transform time series, thus pyts provides several tools to perform these transformations.

Installation

Dependencies

pyts requires:

  • Python (>= 3.7)
  • NumPy (>= 1.17.5)
  • SciPy (>= 1.3.0)
  • Scikit-Learn (>=0.22.1)
  • Joblib (>=0.12)
  • Numba (>=0.48.0)

To run the examples Matplotlib (>=2.0.0) is required.

User installation

If you already have a working installation of numpy, scipy, scikit-learn, joblib and numba, you can easily install pyts using pip

pip install pyts

or conda via the conda-forge channel

conda install -c conda-forge pyts

You can also get the latest version of pyts by cloning the repository

git clone https://github.com/johannfaouzi/pyts.git
cd pyts
pip install .

Testing

After installation, you can launch the test suite from outside the source directory using pytest:

pytest pyts

Changelog

See the changelog for a history of notable changes to pyts.

Development

The development of this package follows the practices of the scikit-learn community. Therefore, you can refer to their Development Guide. A slight difference is the use of Numba instead of Cython for optimization.

Documentation

The section below gives some information about the implemented algorithms in pyts. For more information, please have a look at the HTML documentation available via ReadTheDocs.

Citation

If you use pyts in a scientific publication, we would appreciate citations to the following paper:

Johann Faouzi and Hicham Janati. pyts: A python package for time series classification.
Journal of Machine Learning Research, 21(46):1−6, 2020.

Bibtex entry:

@article{JMLR:v21:19-763,
  author  = {Johann Faouzi and Hicham Janati},
  title   = {pyts: A Python Package for Time Series Classification},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {46},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/19-763.html}
}

Implemented features

Note: the content described in this section corresponds to the main branch, not the latest released version. You may have to install the latest version to use some of these features.

pyts consists of the following modules:

approximation: This module provides implementations of algorithms that approximate time series. Implemented algorithms are Piecewise Aggregate Approximation, Symbolic Aggregate approXimation, Discrete Fourier Transform, Multiple Coefficient Binning and Symbolic Fourier Approximation.

bag_of_words: This module provides tools to transform time series into bags of words. Implemented algorithms are WordExtractor and BagOfWords.

classification: This module provides implementations of algorithms that can classify time series. Implemented algorithms are KNeighborsClassifier, SAXVSM, BOSSVS, LearningShapelets, TimeSeriesForest and TSBF.

datasets: This module provides utilities to make or load toy datasets, as well as fetching datasets from the UEA & UCR Time Series Classification Repository.

decomposition: This module provides implementations of algorithms that decompose a time series into several time series. The only implemented algorithm is Singular Spectrum Analysis.

image: This module provides implementations of algorithms that transform time series into images. Implemented algorithms are Recurrence Plot, Gramian Angular Field and Markov Transition Field.

metrics: This module provides implementations of metrics that are specific to time series. Implemented metrics are Dynamic Time Warping with several variants and the BOSS metric.

multivariate: This module provides utilities to deal with multivariate time series. Available tools are MultivariateTransformer and MultivariateClassifier, which respectively transform and classify multivariate time series using tools for univariate time series, as well as JointRecurrencePlot and WEASEL+MUSE.

preprocessing: This module provides most of the scikit-learn preprocessing tools but applied sample-wise (i.e. to each time series independently) instead of feature-wise, as well as an imputer of missing values using interpolation. More information is available at the pyts.preprocessing API documentation.

transformation: This module provides implementations of algorithms that transform a data set of time series with shape (n_samples, n_timestamps) into a data set with shape (n_samples, n_extracted_features). Implemented algorithms are BagOfPatterns, BOSS, ShapeletTransform, WEASEL and ROCKET.

utils: a simple module with utility functions.
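
As an illustration of the kind of transformation the approximation module covers, the following is a minimal sketch of Piecewise Aggregate Approximation written in plain Python. The function name `paa` and its signature are hypothetical and are not the pyts API; the sketch also assumes the series length is divisible by the number of segments.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment.

    Assumes len(series) is divisible by n_segments to keep the sketch short.
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    size = len(series) // n_segments
    return [
        sum(series[i:i + size]) / size
        for i in range(0, len(series), size)
    ]


# Toy series, made up for this example.
series = [1.0, 3.0, 2.0, 4.0, 10.0, 12.0]
approx = paa(series, n_segments=3)
print(approx)
```

The six-point series is reduced to three segment means, which is the dimensionality reduction that later symbolic methods build on.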

Author: Johannfaouzi
Source Code: https://github.com/johannfaouzi/pyts 
License: BSD-3-Clause License

#python #machinelearning #classification 

How to Predict Stock/Crypto Returns with Python

In this video we use logistic regression to predict stock prices (or rather, returns) in Python. We also take a look at cryptocurrencies (Bitcoin) at the end.

As said in the video, you should not take this as a valid trading strategy. It is just an idea of how logistic regression could be used and how overfitting can be avoided, or at least reduced, using a train-test split.
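
The data preparation step can be sketched roughly as follows, assuming the features are the signs of the previous few returns and the split is chronological rather than shuffled. The function names, the number of lags and the return values are all made up for illustration; the video's actual code is not shown here and may differ, and model fitting is left out.

```python
def lagged_directions(returns, n_lags):
    """Build (features, target) pairs from a list of returns.

    Each feature row holds the signs (+1 / -1) of the previous n_lags
    returns; the target is the sign of the current return.
    """
    sign = [1 if r > 0 else -1 for r in returns]
    X, y = [], []
    for t in range(n_lags, len(sign)):
        X.append(sign[t - n_lags:t])
        y.append(sign[t])
    return X, y


def chronological_split(X, y, train_fraction=0.75):
    """Split without shuffling so the test set lies after the training set."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


# Made-up daily returns, for illustration only.
returns = [0.01, -0.02, 0.005, 0.01, -0.01, 0.03,
           -0.004, 0.02, 0.01, -0.015, 0.007, 0.002]
X, y = lagged_directions(returns, n_lags=2)
X_train, X_test, y_train, y_test = chronological_split(X, y)
print(len(X_train), len(X_test))
```

Keeping the test rows strictly after the training rows means the model is evaluated on data it could not have seen, which is the point of the train-test split mentioned above.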

I am purposely NOT showing a time horizon where this is working or looking nicely to make you aware of that.

I am planning on covering other algorithms and extending the strategy. If you find that interesting please leave the video a like and subscribe :-)

The video series is inspired by the Hands-On Algorithmic Trading with Python course by Deepak Kanungo. However, the code and some of the approaches deviate strongly from his.

Disclaimer: This video is not investment advice and is for informational and educational purposes only.

0:00 - 0:52 Introduction
0:52 - 01:48 Quick recap
01:48 - 05:08 Data prep / Amendments to get lagged directions
04:46 - 07:17 Model building, fitting & prediction
07:17 - 09:25 Strategy, Performance and Visualization
09:25 - 13:25 Train test split
13:25 - 15:53 Confusion Matrix and Classification Report
15:53 - 16:38 Considering different amount of lags
16:38 - 18:08 Considering Bitcoin

Previous vid on Linear Regression:
https://youtu.be/AXBhrLongC8

#python #machinelearning  #classification 

Antwan Larson

1633167888

How to Build a Landmark Classifier in Flutter Step by Step 2021

Learn How to Build a Landmark Classifier in Flutter Step by Step 2021

00:00 - Intro
00:44 - Project Setting
01:01 - AndroidManifest Setting
01:08 - Podfile Setting (TFLite Setting)
01:21 - Info.plist Setting
01:30 - TFLite Model & Label Setting
02:12 - Home Page
10:28 - ImageService
11:30 - ClassificationService
18:45 - Classification Page

* Github: https://github.com/PuzzleLeaf/flutter_tflite_landmark_classifier

#flutter  #tflite  #classification 

Antwan Larson

1629843540

How to Build a Popular US Products Classifier Using Flutter

Learn How to Build a Popular US Products Classifier Using Flutter

This model is trained to recognize more than 100,000 popular supermarket products in the United States from images. The model is mobile-friendly and can run on-device.

▶SourceCode
* Github
- https://github.com/PuzzleLeaf/flutter_tensorflow_lite_us_products_classifier

▶Timestamp
00:00 - Intro
00:21 - MainPage
01:05 - Camera Setting
03:22 - Modal Bottom Sheet
05:08 - TFLite Model Setting
15:08 - Test

#flutter  #tflite  #classification #lite 

Learn About Decision Tree Classification Algorithm in Python

In this video, you will learn about the decision tree classification algorithm in Python.
 #decisiontree  #classification #python 

Sofia Maggio

1626129240

A Brief Overview of Support Vector Machines

You will learn:

  • What are Support Vector Machines?
  • Features of SVM and its application
  • Explanation of different SVM hyperparameters
  • Python implementation for Multiclass classification

The SVM algorithm was proposed by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963.

A support vector machine (SVM) is a supervised machine learning algorithm used for classification as well as regression. An SVM’s objective is to identify a hyperplane that separates data points into two classes by maximizing the margin between the support vectors of the two classes.
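
To make the hyperplane and margin concrete, here is a small sketch for a linear SVM with weight vector w and bias b. It assumes the usual convention that the margin boundaries are w·x + b = +1 and w·x + b = -1, so the margin width is 2 / ||w||. The weights below are hand-picked for illustration, not fitted to any data.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the margin boundaries w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical, hand-picked hyperplane (not fitted to data).
w, b = [3.0, 4.0], -5.0
print(decision_value(w, b, [2.0, 1.0]))  # point on the positive side
print(decision_value(w, b, [0.0, 0.0]))  # point on the negative side
print(margin_width(w))
```

Training an SVM amounts to choosing w and b so that this width is as large as possible while the training points stay on the correct side.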

  • Support Vectors
  • Hyperplane and Margins
  • Features of Support Vector Machine
  • Applications of SVM
  • Hyperparameters of SVM
  • Multiclass classification using Support Vector Classifier

#machine-learning #classification #python #supervised-learning

Pros and Cons of popular Supervised Learning Algorithms

We all have used one of the following supervised learning algorithms for predictive analysis:

  1. Logistic Regression
  2. Ridge Regression
  3. LASSO Regression
  4. Linear Discriminant Analysis (LDA)
  5. K Nearest Neighbors (KNN)
  6. Naive Bayes (NB)
  7. Support Vector Machine (SVM)
  8. Decision Tree
  9. Random Forest (RF)
  10. Gradient Boosting

But have you thought of their pros or cons? Here I have listed a few:

  • 1. Logistic Regression
  • 2. Ridge Regression
  • 3. LASSO Regression
  • 4. Linear Discriminant Analysis (LDA)
  • 5. K Nearest Neighbors (KNN)
  • 6. Naive Bayes (NB)
  • 7. Support Vector Machine (SVM)
  • 8. Decision Tree
  • 9. Random Forest (RF)
  • 10. Gradient Boosting

#classification #supervised-learning #regression #algorithms #machine-learning

Cody Osinski

1623887280

Building a Bird Recognition App Using Custom Vision AI and Power BI

In my first blog, ‘Bird Recognition App using Microsoft Custom Vision AI and Power BI’, we looked at the intriguing behaviors and attributes of birds using Power BI. This inspired me to create an ‘AI for birds’ web app using Azure Custom Vision, along with a phone app, built with Power Apps for iPhone and Android, that could identify a bird in real time. I created this app to raise awareness of the heart-breaking reality that most birds face around the world.

In this blog, let’s go behind the scenes and take a look at the journey of how this was created.

What is Azure Custom Vision?

Azure Custom Vision is an image recognition AI service, part of Azure Cognitive Services, that enables you to build, deploy, and improve your own image identifiers. An image identifier applies labels (which represent classes or objects) to images, according to their visual characteristics. It allows you to specify the labels and train custom models to detect them.

What does Azure Custom Vision do?

The Custom Vision service uses a machine learning algorithm to analyze images. You can submit groups of images that feature and lack the characteristics in question. You label the images yourself at the time of submission. Then, the algorithm trains on this data and calculates its accuracy by testing itself on those same images.

Once the algorithm is trained, you can run a test, retrain, and eventually use it in your image recognition app to classify new images. You can also export the model itself for offline use.

How does it work?

1. Upload images - Bring your own labelled images or use Custom Vision to quickly add tags to any unlabelled images.

2. Train the model - Use your labelled images to teach Custom Vision the concepts you care about.

3. Evaluate the result - Use simple REST API calls to quickly tag images with your new custom computer vision model.

The Custom Vision Service uses machine learning to classify the images you upload. The only thing you are required to do is specify the correct tag for each image. You can also tag thousands of images at a time. The AI algorithm is very powerful, and once the model is trained, you can use the same model to classify new images according to the needs of the app.

#artificial intelligence #azure machine learning #classification

Abdullah Kozey

1623539820

C is for Classification

What is classification?

Classification is one of two types of supervised machine learning tasks (i.e. tasks where we have a labeled dataset) with the other being regression.

Key point to remember: supervised learning tasks use features to predict targets, or, in non-tech speak, they use attributes/characteristics to predict something. For instance, we can take a basketball player’s height, weight, age, foot-speed, and/or multiple other aspects to predict how many points they’ll score or whether they will be an all-star.

So what’s the difference between the two?

  • Regression tasks predict a continuous value (i.e., how many points someone will score)
  • Classification tasks predict a non-continuous value (i.e. if someone will be an all-star)

How do I know which technique to use?

Answer the following question:

“Does my target variable have an order to it?”

For example, my project predicting the recommended age of a reader was a regression task because I was predicting a precise age (e.g., 4 years old). If I was attempting to identify whether a book was suitable for teens or not, then it would have been a classification task since the answer would have been either yes or no.

OK, so classification is only for yes/no, true/false, cat/dog problems, right?

Nope, those are just the easy examples 😄

Example 1: Sorting People into Groups

Imagine a scenario where you get a new batch of students every year and have to sort them into houses based on their personality traits.

In this situation, the houses do not have any type of sequence/ranking to them. Sure, Harry definitely didn’t want to be housed in Slytherin, and the Sorting Hat clearly took that into consideration, but that doesn’t mean Slytherin is closer to Gryffindor in the same way that 25 is closer to 30 than it is to 19.

Example 2: Applying Labels

Similarly, if we had a data set containing the ingredients of dishes and attempted to predict the country of origin, we’d be solving a classification problem. Why? Because country names have no numerical order. We can say that Russia is the largest country on earth or that China has the most people but those are attributes of the country (i.e., land size and population) which are not intrinsic to the name of the country.

#supervised-learning #machine-learning #classification #c

Queenie Davis

1623338820

5 Minutes Cheat Sheet Explaining all Machine Learning Models

Many times, it happens that you have an interview in a few days, and your schedule is jam-packed to prepare for it. Or maybe you are in revision mode and want to look at all the basic popular machine learning models. If that is the case, you have come to the right place. In this blog, I will briefly explain some of the most commonly asked machine learning models in interviews. I will also list important parameters related to each model and a source to find a detailed explanation of the same topic, so you can dig deeper if and when required.

Machine learning models can be broadly divided into two categories: supervised and unsupervised learning. Within supervised learning, there are two further categories: regression and classification. The following sections explain each of them briefly to give you the necessary insights.

Note: I am providing the models which I believe are the most common ones and should be prepared before any data science interview. However, this list is subjective.

Supervised learning

In supervised learning, the data that you use for training the models is “labeled”. This means the output for each input is known. For example, if your model is trying to predict house prices, you might have variables like size of the house, number of floors, etc. When your data is labeled, it means you would also have a variable that contains the house price.

The above example was for regression. Let’s have a close look at regression and classification now.

Classification

In classification, the output of the model is discrete. For example, consider dog vs cat image classification, where we predict whether the image contains a dog or a cat. The class (which is the output of the model) will be discrete here, i.e. either dog or cat. Now, we will look at the models that are commonly used for classification.

Logistic regression

Don’t get confused; it has the word “regression” in the name, but it is used for classification. Logistic regression uses an equation to create a curve with your data and then uses this curve to predict the outcome of a new observation.

In essence, a logistic equation is created so that the output values can only be between 0 and 1.
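
The logistic equation referred to here is usually the sigmoid function applied to a weighted sum of the inputs. A minimal sketch follows; the coefficients are hypothetical values chosen for illustration rather than fitted ones, and the 0.5 threshold is a common default, not a requirement.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of class 1 for one observation under a linear model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical coefficients chosen for illustration only.
p = predict_proba(weights=[0.8, -0.4], bias=0.1, features=[2.0, 1.0])
label = 1 if p >= 0.5 else 0
print(p, label)
```

The probability is turned into a class by thresholding, which is why a method with "regression" in its name ends up being used for classification.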

Detailed Explanation here

Support Vector Machine

Support Vector Machines (SVM) form a boundary between data points for classification. For example, in the case of 2 dimensions, SVM will create a boundary such that the majority of data points of one class fall on one side of the boundary, and most of the other class falls on the other side.

So the goal in SVM is to find the boundary which maximizes the margin (described in the above image).

Important Parameters/Concepts: Kernel, C, Gamma, Margin

Detailed Explanation here

Decision Tree

In the decision tree, you basically ask questions about your observation and follow the tree down until you reach an outcome, as shown below.

In the above example, each square is called a node, and a larger number of nodes will cause more overfitting of the model on the dataset.

Important Parameters/Concepts: Node, Leaf Node, Entropy, Information Gain

Detailed Explanation here
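
Entropy and information gain, listed among the concepts above, can be sketched in a few lines. This assumes Shannon entropy in bits and a binary split; the labels are made up for illustration, and real decision tree libraries may use other impurity measures such as Gini.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Made-up labels: a split that separates the two classes perfectly.
parent = ["cat", "cat", "dog", "dog"]
print(entropy(parent))
print(information_gain(parent, ["cat", "cat"], ["dog", "dog"]))
```

At each node the tree picks the question whose split yields the highest information gain.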

Random Forest

It is an ensemble learning technique that uses multiple decision trees to produce the final output. Random forests build multiple decision trees on bootstrapped samples of the original dataset and randomly select a subset of the variables at each step of each tree. During inference, we collect the outputs of all the decision trees and select the output with the most votes. Random forests are generally preferred over single decision trees as they reduce overfitting.

Important Parameters/Concepts: Number of decision trees, Size of bootstrapped data, Number of random forest features, and everything else mentioned in the decision tree section.
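
The three ingredients described above (bootstrapped rows, random feature subsets and majority voting) can be sketched without fitting any trees. All names and data here are hypothetical, and the tree outputs are a stand-in for what fitted trees would predict.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, k, rng):
    """Pick k distinct features to consider at one split."""
    return rng.sample(feature_names, k)


def majority_vote(predictions):
    """Return the most common prediction among the trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
rows = [(5.1, "cat"), (6.2, "dog"), (4.8, "cat"), (7.0, "dog")]
sample = bootstrap_sample(rows, rng)
subset = random_feature_subset(["height", "weight", "age"], 2, rng)

# Stand-in for the outputs of three fitted trees on one observation.
tree_outputs = ["dog", "cat", "dog"]
print(len(sample), subset, majority_vote(tree_outputs))
```

Because each tree sees a different sample and different features, their individual errors tend to differ, and the vote averages them out.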

#classification #artificial-intelligence #machine-learning

Bailee Streich

1622822400

Building an M&M Colour Classifier

One of my third-year Electronic and Electrical Engineering projects was to build a machine that can sort M&Ms according to their colour. The final version could sort approximately 47 sweets per minute, and won our team a few bottles of beer for our work.

Perhaps one day I will write an article detailing that process, but what I want to talk about today is the colour classification: how the machine classified sweets originally, and how, two years later, I used my new-found data science knowledge to solve this problem. I’ve been using the test data as a proving ground for experimenting with various machine learning techniques as I learn them, so this blog aims to document my learning process.

The data was gathered by running M&Ms through the machine, jotting down the red, green and blue values the colour sensor returned, and logging the colour of the sweet. A long and tedious process indeed. The classification process was entirely manual; I examined where the clusters resided, and set up bounding boxes to classify sweets that fell inside them.
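
A bounding-box classifier of this kind can be sketched as below. The RGB ranges are hypothetical, since the real sensor values are not given, and the tie-break where the first matching box wins is an assumption made to keep the example short.

```python
# Hypothetical RGB bounding boxes: (min, max) per channel. The real sensor
# ranges are not given in the text.
BOXES = {
    "red":    ((150, 255), (0, 90), (0, 80)),
    "orange": ((170, 255), (60, 160), (0, 80)),
    "blue":   ((0, 80), (0, 120), (140, 255)),
}


def classify(rgb):
    """Return the first box containing rgb, or None for an outlier.

    Dict order decides ties, so "red" wins where red and orange overlap.
    """
    for colour, box in BOXES.items():
        if all(lo <= value <= hi for value, (lo, hi) in zip(rgb, box)):
            return colour
    return None


print(classify((200, 70, 30)))    # inside both the red and orange boxes
print(classify((20, 60, 200)))    # inside the blue box only
print(classify((120, 120, 120)))  # inside no box
```

The sketch makes the weaknesses easy to see: a reading either falls inside a box or it does not, and overlapping boxes need an arbitrary rule.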

This was functional enough for an electronics project (and enough to win the beer), but it came with a host of problems with hacky workaround solutions.

To begin with, the machine had no idea how to deal with outliers — it would give the sweets a jiggle and rescan them, and if that didn’t work, it threw them in a waste bin. The shapes and orientations of the distributions were also not considered. This became problematic especially for red and orange sweets, as their bounding boxes intersected. If a sweet fell in the intersection region, the machine would jiggle and rescan until it fell into the exclusive red or orange boxes (the red box wins in this demo). The machine also hard classified sweets, which caused many red/orange mixups. Let’s look at the confusion matrix for this technique.

#numpy #python #classification #engineering #machine-learning

Logistic Regression -Beginners Guide in Python - Analytics India Magazine

Most of the supervised learning problems in machine learning are classification problems. Classification is the task of assigning a suitable class to a data point. Consider a pet classification problem. If we input certain features, the machine learning model will tell us whether the given features belong to a cat or a dog. Cat and dog are the two classes here. One may be numerically represented by 0 and the other by 1. This is specifically called a binary classification problem. If there are more than two classes, the problem is termed a multi-class classification problem. This machine learning task comes under supervised learning because both the features and the corresponding class are provided as input to the model during training. During testing or production, the model predicts the class given the features of a data point.

This article discusses logistic regression and the math behind it, with a practical example and Python code. Logistic regression is one of the fundamental algorithms for classification. It is designed for binary classification problems. Nevertheless, multi-class classification can also be performed with this algorithm with some modifications.

#developers corner #binary classification #classification #logistic regression #logit #python #regression #scikit learn #sklearn #statsmodels #tutorial

Grace Lesch

1621505834

Classification With XGBoost Algorithm in a Database

Advanced analytical applications can be developed using machine learning algorithms in Oracle database software since version 9i. As new database versions are released, new algorithms are added to these options. The current algorithm list that comes with Oracle 19c is as follows.

With Oracle 21c, new algorithms have been added to this list. Undoubtedly, the most interesting of these is the XGBoost algorithm, one of the industry’s most frequently used ensemble methods. XGBoost, which has proved its learning capacity through its success in ML competitions hosted on Kaggle, is now ready for use in the Oracle database.

#database #xgboost algorithm #classification

Kasey Turcotte

1620068940

Using Classification Models To Predict Vaccinations

We are currently in the seventh month of a global pandemic. I think I can safely say no one is enjoying it. The quicker this ends the better. One of the most important tools the medical field has when it comes to mitigating the spread of a virus is vaccination. However, vaccinations don’t work if no one gets vaccinated. Now, I realize that sounds like an obvious statement, and in many ways it is. However, it is nonetheless a hugely important fact. This is the underlying idea of “herd immunity”, and it is vital to be able to identify members of the community who are unlikely to get vaccinated.

Fortunately for us aspiring data scientists this is not our first pandemic. In response to the H1N1 Flu in 2009, the Centers for Disease Control and Prevention (CDC) conducted a survey “in order to monitor and evaluate flu vaccination efforts among adults and children”. This phone survey asked people whether they had received H1N1 and seasonal flu vaccines, in conjunction with information they shared about their lives, opinions, and behaviors. DrivenData provided a large chunk of this dataset and posed the question: Using the survey results can you make a model that predicts who will get either vaccine? I figured I would give it a shot (pun intended).

For my last project, I created a linear regression model to predict a baseball team’s total wins. The target there (wins) is a continuous value and so it was perfectly suited for a regression model. This problem is not so simple (or perhaps it’s more simple?). Here our target value is binary, whether or not a participant got the vaccine. So for this problem, we will be looking at classification models rather than regression models. These are models that predict the likelihood of one result or the other, rather than trying to predict a continuous variable.

In this post I will walk through the process I took to build my model as well as explain some of the different classification models I ended up not using.

#data-science #machine-learning #python #pandas #classification
