1602925200

Data visualization plays a crucial role in real-time machine learning applications. Visualizing data often makes it much easier to understand, interpret, and classify in many cases. Several techniques help both to visualize data and to reduce the dimensionality of a dataset.

In my previous article, I gave an overview of Principal Component Analysis (PCA) and explained how to implement it. PCA is a basic technique for reducing dimensions and plotting data. It has some limitations, the major one being that it does not group similar classes together; it is simply a linear transformation of the points that makes the data easier for humans to read. t-SNE is designed to overcome this limitation: it can **group similar objects together** even when the structure of the data is **not linear**.

This article is categorized into the following sections:

- What is t-SNE?
- Need/Advantages of t-SNE
- Drawbacks of t-SNE
- Applications of t-SNE — when to use and when not to use?
- Implementation of t-SNE on the MNIST dataset using Python
- Conclusion

t-SNE is a dimensionality-reduction technique that tries to **maintain the local structure** of the data points.

Let’s understand the concept from the name (t-Distributed Stochastic Neighbor Embedding). Imagine all data points plotted in a high-dimensional space of d dimensions, where each data point is surrounded by other data points of its own class. Now, if we take any data point (x), the surrounding data points (y, z, etc.) are called the neighborhood of that data point. The neighborhood of x is determined by **geometric closeness**, i.e. by computing the distance between pairs of points, so the neighborhood of x contains the points that are closest to x. The technique only tries **to preserve the distances within the neighborhood**.

**What is embedding?** The data points plotted in d dimensions are embedded in 2D such that the neighborhood of every data point is preserved, as far as possible, as it was in d dimensions. Basically, for every point in the high-dimensional space there is a corresponding point in the low-dimensional space that keeps its neighborhood in the t-SNE sense.

t-SNE creates a **probability distribution** using the Gaussian distribution that defines the relationships between the points in high-dimensional space.

It is stochastic because **its output changes on every run**; that is, it is not deterministic.
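The MNIST implementation promised above is not shown in this excerpt, but the idea can be sketched with scikit-learn (an assumed dependency, not code from the article). The small 8×8 digits dataset stands in for MNIST so the example runs quickly:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
X, y = digits.data[:500], digits.target[:500]  # 500 samples, 64 features each

# perplexity roughly controls how many neighbors each point tries to keep close;
# fixing random_state pins down the otherwise stochastic output
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=42)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)  # every 64-dimensional point now has a 2D counterpart
```

Plotting `X_2d` colored by `y` would show points of the same digit grouped together, which a purely linear method like PCA does not guarantee.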

#deep-learning #dimensionality-reduction #machine-learning #data-visualization #data-science

1593347004

The greedy method is an approach for solving certain types of optimization problems. A greedy algorithm chooses the locally optimal result at each stage. While this works the majority of the time, there are numerous problems for which the greedy approach is not correct. For example, let’s say that you’re taking the greedy approach to earning money at a certain point in your life. You graduate high school and have two options:
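To make “chooses the optimum result at each stage” concrete, here is a sketch using coin change, an example I am assuming rather than one from the article. It shows both sides: greedy is optimal for US-style denominations but not for every coin system:

```python
def greedy_coin_change(coins, amount):
    """Greedy strategy: always take the largest coin that still fits."""
    used = []
    for c in sorted(coins, reverse=True):
        while amount >= c:
            amount -= c
            used.append(c)
    return used if amount == 0 else None

print(greedy_coin_change([1, 5, 10, 25], 63))  # [25, 25, 10, 1, 1, 1] -- optimal here
print(greedy_coin_change([1, 3, 4], 6))        # [4, 1, 1] -- but [3, 3] is better
```

The second call illustrates the article’s warning: the locally best choice (take a 4) leads to three coins where two would do.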

#computer-science #algorithms #developer #programming #greedy-algorithms

1596427800

Finding a certain piece of text inside a document is an important feature nowadays. It is widely used in practical tasks we perform every day, such as searching for something on Google or detecting plagiarism. For small texts, the pattern-matching algorithm does not need to be sophisticated to behave well. However, a big job like searching for the word ‘cake’ in a 300-page book can take a lot of time if a naive algorithm is used.

Before talking about KMP, we should analyze the inefficient approach to finding a sequence of characters in a text. This algorithm slides the pattern over the text one position at a time and checks for a match at each position. The complexity of this solution is O(m * (n - m + 1)), where m is the length of the pattern and n the length of the text.

Find all the occurrences of string pat in string txt (naive algorithm).

```
#include <iostream>
#include <string>
using namespace std;

string pat = "ABA";         // the pattern
string txt = "CABBCABABAB"; // the text in which we are searching

// checks whether the characters of pat match txt starting at position index
bool checkForPattern(int index, int patLength) {
    for (int i = 0; i < patLength; i++) {
        if (txt[index + i] != pat[i]) {
            return false;
        }
    }
    return true;
}

void findPattern() {
    int patternLength = pat.size();
    int textLength = txt.size();
    // check every index for a match
    for (int i = 0; i <= textLength - patternLength; i++) {
        if (checkForPattern(i, patternLength)) {
            cout << "Pattern at index " << i << "\n";
        }
    }
}

int main() {
    findPattern();
    return 0;
}
```

KMP is based on a degenerating property of the pattern: some of its sub-patterns appear more than once. Exploiting this improves the complexity to linear time. The idea is that when we find a mismatch, we already know some of the characters in the next search window, so we save time by skipping comparisons that we know will surely match. To know how far to skip, we pre-process the pattern into an auxiliary array prePos, whose integer values tell us the number of characters that can be jumped. For each position, prePos holds the length of the longest proper prefix that is also a suffix.
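The prePos preprocessing described above can be sketched as follows. This is a Python sketch of the standard prefix-function construction (the article’s own code is in C++, and the function name here simply mirrors the article’s prePos):

```python
def compute_pre_pos(pat):
    """pre_pos[i] = length of the longest proper prefix of pat[:i+1]
    that is also a suffix of pat[:i+1]."""
    pre_pos = [0] * len(pat)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pat)):
        # on a mismatch, fall back to the next-shorter prefix that is also a suffix
        while k > 0 and pat[i] != pat[k]:
            k = pre_pos[k - 1]
        if pat[i] == pat[k]:
            k += 1
        pre_pos[i] = k
    return pre_pos

print(compute_pre_pos("ABA"))          # [0, 0, 1]
print(compute_pre_pos("AABAACAABAA")) # [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
```

For the article’s pattern "ABA", the final 1 records that the prefix "A" is also a suffix, so after a mismatch the search can resume from that position instead of restarting.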

#programming #data-science #coding #kmp-algorithm #algorithms

1624867080

Algorithmic trading backtesting and optimization examples.

#algorithms #optimization #algorithmic-trading #backtesting

1593350760

Learn what metaheuristics are and why we sometimes use them instead of traditional optimization algorithms. Then learn how the metaheuristic Genetic Algorithm (GA) works through a simple step-by-step guide.

#genetic-algorithm #algorithms #optimization #metaheuristics #data-science

1626429780

- Making something better.
- Increasing efficiency.

- A problem in which we have to find the values of inputs (also called solutions or decision variables), out of all possible inputs, that give us the “best” output values.
- Definition of “best”: the input values that result in the maximum or minimum of a function called the objective function.
- There can be multiple objective functions as well (depending on the problem).

An algorithm used to solve an optimization problem is called an optimization algorithm.
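As a minimal illustration of these definitions (a toy objective I am assuming, not one from the article): minimizing f(x) = (x - 3)² over a small range of candidate inputs finds the “best” input x = 3:

```python
def objective(x):
    # toy objective function: smallest at x = 3
    return (x - 3) ** 2

# exhaustively evaluate all candidate inputs and keep the minimizer
best_x = min(range(-10, 11), key=objective)
print(best_x, objective(best_x))  # 3 0
```

Real problems rarely allow exhaustive search like this, which is exactly where optimization algorithms such as GA come in.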

Metaheuristics are algorithms that simulate physical and/or biological behavior in nature to solve optimization problems.

- It is a subset of evolutionary algorithms that simulates/models genetics and evolution (biological behavior) to optimize a highly complex function.
- A highly complex function can be:
  1. Very difficult to model mathematically.
  2. Computationally expensive to solve, e.g. NP-hard problems.
  3. Dependent on a large number of parameters.

- Introduced by Prof. John Holland in 1965.
- The first article on GA was published in 1975.
- GA is based on two fundamental biological processes:
  1. **Genetics** (G. J. Mendel, 1865): the branch of biology that deals with the study of genes, gene variation, and heredity.
  2. **Evolution** (C. Darwin, 1859): the process by which populations of organisms change over generations.

1. A population of individuals exists in an environment with limited resources.
2. Competition for those resources causes the selection of the fitter individuals that are better adapted to the environment.
3. These individuals act as seeds for the generation of new individuals through recombination and mutation.
4. The newly evolved individuals become the population, and Steps 1 to 3 are repeated.
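The steps above can be sketched as a minimal GA on the toy OneMax problem (maximize the number of 1-bits in a bit string). The problem, parameter values, and operator choices here are illustrative assumptions, not taken from the article:

```python
import random

random.seed(0)
GENES, POP_SIZE, GENERATIONS = 30, 40, 60

def fitness(ind):
    return sum(ind)  # OneMax: count the 1-bits

def select(pop):
    # tournament selection: fitter individuals win the competition for resources
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # recombination: splice two parents at a random point
    point = random.randint(1, GENES - 1)
    return p1[:point] + p2[point:]

def mutate(ind, rate=0.02):
    # mutation: flip each bit with small probability
    return [bit ^ 1 if random.random() < rate else bit for bit in ind]

# initial population of random individuals
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    elite = max(population, key=fitness)  # elitism: never lose the current best
    offspring = [mutate(crossover(select(population), select(population)))
                 for _ in range(POP_SIZE - 1)]
    population = [elite] + offspring

best = max(population, key=fitness)
print(fitness(best))
```

Selection implements Step 2, crossover and mutation implement Step 3, and the loop body is Step 4; elitism is one common guard against losing good solutions between generations.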

1. Acoustics
2. Aerospace Engineering
3. Financial Markets
4. Geophysics
5. Materials Engineering
6. Routing and Scheduling
7. Systems Engineering

1. Population Selection Problem
2. Defining the Fitness Function
3. Premature or Rapid Convergence of GA
4. Convergence to Local Optima

#evolutionary-algorithms #data-science #genetic-algorithm #algorithm