Virgil Hagenes

1603407600

Handling Outliers in Clusters using Silhouette Analysis

Real-world data often contains many outliers. Outliers can be caused by data corruption or by failures in recording the data. Handling outliers is an important step in the data preprocessing pipeline, because their presence can prevent a model from performing at its best.

There are various strategies for handling outliers in a dataset. This article covers how to use Silhouette Analysis to handle outliers after the data has been grouped into clusters.

Silhouette Analysis:

The silhouette method is used to find the optimal number of clusters and to interpret and validate the consistency within clusters of data. It computes a silhouette coefficient for each point, which measures how similar that point is to its own cluster compared with other clusters. It also provides a succinct graphical representation of how well each object has been classified. The analysis of these graphical representations is called Silhouette Analysis.

The silhouette value measures how similar an object is to its own cluster (cohesion) compared with other clusters (separation). It ranges from -1 to +1.
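For reference, the standard definition (due to Rousseeuw) is given below. Take a point i in cluster C_I and a distance metric d, for example Euclidean distance. Then a(i) is the cohesion term and b(i) is the separation term taken over the nearest other cluster. Because both are non-negative, s(i) always lies between -1 and +1. By convention, s(i) = 0 when i is the only member of its cluster.

```latex
\begin{aligned}
a(i) &= \frac{1}{|C_I| - 1} \sum_{j \in C_I,\ j \neq i} d(i, j) \\
b(i) &= \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j) \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\ b(i)\}}
\end{aligned}
```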

Important Points:
A silhouette coefficient near +1 indicates that the sample is far away from the neighboring clusters.
A silhouette coefficient of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters.
A negative silhouette coefficient indicates that the sample might have been assigned to the wrong cluster or is an outlier.
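The points above suggest a simple rule: after clustering, treat samples with a negative silhouette coefficient as outlier candidates. Below is a minimal from-scratch sketch of that idea, not the author's implementation. It assumes the cluster labels come from an earlier clustering step (for example k-means) and that there are at least two clusters. It uses Euclidean distance, and the helper names `silhouette_coefficients` and `flag_outliers` and the threshold of 0 are illustrative choices. In practice, scikit-learn's `sklearn.metrics.silhouette_samples` computes the same per-sample values.

```python
import math
from collections import defaultdict


def silhouette_coefficients(points, labels):
    """Silhouette coefficient s(i) for every point, using Euclidean distance.

    Assumes at least two distinct cluster labels.
    """
    clusters = defaultdict(list)  # label -> indices of the points in that cluster
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            # A point alone in its cluster gets s(i) = 0 (same convention as scikit-learn)
            scores.append(0.0)
            continue
        # a(i): mean distance to the other members of its own cluster (cohesion)
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): lowest mean distance to the members of any other cluster (separation)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def flag_outliers(points, labels, threshold=0.0):
    """Indices of points whose silhouette coefficient is below `threshold`."""
    scores = silhouette_coefficients(points, labels)
    return [i for i, s in enumerate(scores) if s < threshold]


# Toy data: two well-separated groups, with (9, 9) deliberately labelled as cluster 0
points = [(0, 0), (0, 1), (1, 0), (9, 9), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 0, 1, 1, 1]

suspects = flag_outliers(points, labels)
print(suspects)  # [3]: the point at (9, 9)

# One simple handling strategy: drop the flagged points before refitting the model
cleaned = [(p, lab) for k, (p, lab) in enumerate(zip(points, labels)) if k not in suspects]
```

Dropping the flagged points is only one option. Depending on the data, you could instead reassign them to their nearest other cluster, which is the cluster that produced b(i).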


Why Use WordPress? What Can You Do With WordPress?

Can you use WordPress for anything other than blogging? Surprisingly, yes. WordPress is more than just a blogging tool: it has helped thousands of websites and web applications thrive. WordPress powers around 40% of online projects, and in this post we look at some notable uses of WordPress beyond blogging.
What Is The Use Of WordPress?

WordPress is the most popular website platform in the world. It is the first choice of businesses that want to set up a feature-rich, dynamic content management system. So, if you ask what WordPress is used for, the answer is: everything. It is a flexible, feature-rich, and secure platform that offers everything needed to build unique websites and applications. Let’s look at some of these uses:

1. Multiple Websites Under A Single Installation
WordPress Multisite allows you to run multiple sites from a single WordPress installation. You install WordPress once and can launch as many websites as you need on a single server, managing hundreds of sites from one dashboard.
It is a highly efficient setup that lets you run several websites under the same login credentials. One of the best things about WordPress is the range of themes it offers. In a Multisite network, themes and plugins are installed once and shared across the sites, which saves space without sacrificing speed.

2. WordPress Social Network
WordPress can be used for ambitious projects such as a social network. If you don’t have the money or patience to hire a developer and spend months building a feature-rich social media site, go for WordPress. This is one of its most impressive uses: its CMS lets you build community sites in the spirit of Facebook or Reddit with far less effort.
To set up a social network, install a WordPress plugin called BuddyPress. It lets you set up a community page with ease. It also provides the features expected of a community or social media site: direct messaging, activity streams, user groups, extended profiles, and much more. You just have to install and configure it.
If BuddyPress doesn’t meet all your needs, don’t give up on your dreams. You can try out WP Symposium or PeepSo. There are also several themes you can use to build a social network.

3. Create A Forum For Your Brand’s Community
Communities are very important for your business. They help you stay in constant contact with your users and customers and turn them into a loyal customer base. Many good technologies can be used to build a community page, but good old WordPress is still the best.
If you want to build an online community, consider the features you get with WordPress. bbPress, for example, is open-source, template-driven PHP/MySQL forum software. It is very simple and doesn’t hamper the experience of the website.
Other tools such as wpForo and Asgaros Forum are equally good for creating a community forum. They are lightweight, easy to manage, and integrate smoothly with your WordPress site. There is one small catch, however: you need some technical knowledge to build a WordPress community page.

4. Shortcodes
Since we raised a problem in the previous section, here is a solution for it. You might not know how to code, but you have shortcodes. Shortcodes let you execute functions without writing code, making it easy to build a website, add new features, and customize plugins. A shortcode is a short tag that stands in for a longer block of code. Rather than memorizing many lines of code, you can start building a feature-rich website or application even without technical knowledge.
There are also plugins like Shortcoder, Shortcodes Ultimate, and the Basics available on WordPress that can be used, and you would not even have to remember the shortcodes.

5. Build Online Stores
If you are still wondering why to use WordPress, use it to build an online store and start selling your goods online. WordPress is an affordable technology for building a feature-rich eCommerce store.
WooCommerce is a WordPress plugin and one of the most widely used eCommerce solutions. It holds a 28% share of the global market and is one of the best ways to set up an online store. It lets you build user-friendly, professional online stores and offers thousands of free and paid extensions. Moreover, since it is open source, you don’t have to pay for a license.
Apart from WooCommerce, there are Easy Digital Downloads, iThemes Exchange, Shopify eCommerce plugin, and so much more available.

6. Security Features
WordPress takes security very seriously, and many external solutions help you safeguard your WordPress site. There is no way to ensure 100% security, but WordPress provides regular updates with security patches. Several plugins also help with backups, two-factor authentication, and more.
Choosing a hosting provider like WP Engine can further improve your website’s security. It provides threat detection, managed patching and updates, internal security audits for its customers, and much more.


Tyrique Littel

1604008800

Static Code Analysis: What Is It? How to Use It?

Static code analysis refers to the technique of approximating the runtime behavior of a program. In other words, it is the process of predicting the output of a program without actually executing it.

Lately, however, the term “Static Code Analysis” is more commonly used for one application of this technique rather than the technique itself. That application is program comprehension: understanding the program and detecting issues in it (anything from syntax errors to type mismatches, performance hogs, likely bugs, security loopholes, etc.). This is the sense in which we’ll use the term throughout this post.

“The refinement of techniques for the prompt discovery of error serves as well as any other as a hallmark of what we mean by science.”

(J. Robert Oppenheimer)

Outline

We cover a lot of ground in this post. The aim is to build an understanding of static code analysis and to equip you with the basic theory and the right tools, so that you can write analyzers on your own.

We start our journey by laying out the essential parts of the pipeline that a compiler follows to understand what a piece of code does. We learn where to tap into this pipeline to plug in our analyzers and extract meaningful information. In the latter half, we get our feet wet and write four such static analyzers, completely from scratch, in Python.

Note that although the ideas here are discussed in light of Python, static code analyzers in all programming languages are carved out along similar lines. We chose Python because of its easy-to-use ast module and the wide adoption of the language itself.
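Here is a small taste of what the ast module makes possible. The following minimal sketch is not one of the four analyzers from this post. It uses the standard `ast.NodeVisitor` class to report every direct call to a named function (here `eval`) without ever running the code being analyzed. The class name `CallFinder` and the sample source are hypothetical.

```python
import ast


class CallFinder(ast.NodeVisitor):
    """Hypothetical analyzer: records every direct call to a function with a given name."""

    def __init__(self, target):
        self.target = target
        self.hits = []  # (line, column) of each matching call

    def visit_Call(self, node):
        # Match plain calls like eval(...); attribute calls like obj.eval(...) are ignored
        if isinstance(node.func, ast.Name) and node.func.id == self.target:
            self.hits.append((node.lineno, node.col_offset))
        self.generic_visit(node)  # keep walking so nested calls are found too


source = """
x = eval(input('expr: '))
print(eval('1 + 1'))
"""

# The source is parsed into a tree and inspected; it is never executed
finder = CallFinder("eval")
finder.visit(ast.parse(source))
print(finder.hits)  # [(2, 4), (3, 6)]
```

Matching on names alone is a deliberate simplification. A real analyzer would also need to handle aliases such as `e = eval`.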

How does it all work?

Before a computer can finally “understand” and execute a piece of code, it goes through a series of complicated transformations:

[Figure: static analysis workflow]

As you can see in the diagram, static analyzers feed on the output of these stages. To better understand static analysis techniques, let’s look at each of these steps in more detail:

Scanning

The first thing a compiler does when trying to understand a piece of code is to break it down into smaller chunks, known as tokens. Tokens are to code what words are to a language.

A token might be a single character, like (, a literal (such as an integer or a string, e.g., 7, Bob, etc.), or a reserved keyword of the language (e.g., def in Python). Characters that do not contribute to the semantics of a program, like trailing whitespace, comments, etc., are often discarded by the scanner.

Python provides the tokenize module in its standard library to let you play around with tokens:

```python
import io
import tokenize

code = b"color = input('Enter your favourite color: ')"

for token in tokenize.tokenize(io.BytesIO(code).readline):
    print(token)
```

```
TokenInfo(type=62 (ENCODING),  string='utf-8')
TokenInfo(type=1  (NAME),      string='color')
TokenInfo(type=54 (OP),        string='=')
TokenInfo(type=1  (NAME),      string='input')
TokenInfo(type=54 (OP),        string='(')
TokenInfo(type=3  (STRING),    string="'Enter your favourite color: '")
TokenInfo(type=54 (OP),        string=')')
TokenInfo(type=4  (NEWLINE),   string='')
TokenInfo(type=0  (ENDMARKER), string='')
```

(Note that for the sake of readability, I’ve omitted a few columns from the result above — metadata like starting index, ending index, a copy of the line on which a token occurs, etc.)
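To see the metadata columns omitted above, you can print the remaining fields of each `TokenInfo`. This sketch uses only the standard library. It maps numeric types to names with `token.tok_name` and shows each token's `start` and `end` positions as (row, column) pairs. Note that the numeric type values can differ between Python versions, so comparing by name is more portable.

```python
import io
import token
import tokenize

code = b"color = input('Enter your favourite color: ')"

tokens = list(tokenize.tokenize(io.BytesIO(code).readline))
for tok in tokens:
    # tok.start and tok.end are (row, column) pairs; tok.line holds the physical source line
    print(f"{token.tok_name[tok.type]:<10} {tok.string!r:<36} start={tok.start} end={tok.end}")
```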


Ray Patel

1623292080

Getting started with Time Series using Pandas

An introductory guide on getting started with the Time Series Analysis in Python

Time series analysis is the backbone of many companies, since most businesses analyze their past data to inform future decisions. Analyzing such data can be tricky, but Python can help. It offers both built-in tools and external libraries that make the whole analysis process seamless. Python’s pandas library is frequently used to import, manage, and analyze datasets in various formats. In this article, we’ll use it to analyze stock prices and perform some basic time-series operations.
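Below is a minimal sketch of a few basic time-series operations in pandas: daily returns, a rolling mean, weekly resampling, and date-range selection. It assumes pandas is installed, and the prices are made up for illustration. With real stock data you would typically load a CSV using `pd.read_csv("prices.csv", parse_dates=["Date"], index_col="Date")`, where the file and column names are hypothetical.

```python
import pandas as pd

# Hypothetical closing prices for 10 consecutive calendar days (illustrative only)
dates = pd.date_range("2021-01-01", periods=10, freq="D")
close = pd.Series(
    [100.0, 101.5, 99.8, 102.3, 103.1, 102.7, 104.0, 105.2, 104.8, 106.1],
    index=dates,
    name="Close",
)

# Daily returns: percentage change from the previous observation (first value is NaN)
daily_returns = close.pct_change()

# 3-day rolling (moving) average to smooth out short-term noise
rolling_mean = close.rolling(window=3).mean()

# Downsample to weekly frequency (weeks end on Sunday by default), keeping the last close
weekly_close = close.resample("W").last()

# Select a date range using partial string indexing on the DatetimeIndex
first_days = close.loc["2021-01-01":"2021-01-03"]

print(weekly_close)
```

A `DatetimeIndex` is what makes `resample` and string-based slicing work. That is why real data is usually loaded with the date column parsed and set as the index.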



Factor analysis & Cluster analysis on Countries Classification

Recently, I was drawn to the news that Tanzania had attained lower-middle-income status under the World Bank’s classification, five years ahead of projections. Curious about how this judgement is made, I took a look at the World Bank’s official website.

Basically, the World Bank classifies the world’s economies into four income groups (high, upper-middle, lower-middle and low) based on Gross National Income (GNI) per capita (current US$).

The indicator is undoubtedly a good one. It represents the average income of residents of an economy, which in turn captures the country’s overall level of economic development. Nonetheless, I believe the picture is richer than that, since countries within the same income group may still differ considerably in other respects.

Therefore, I selected a set of interesting indicators from the World Bank database. I first apply factor analysis to see which dimensions these indicators represent, and then apply cluster analysis to re-classify the economies. Hopefully, this article helps us understand the world a little better. The code for this article is available in the accompanying GitHub repository.

Data

First, 29 indicators covering different aspects were selected for this work. Some numerical indicators, such as Gross Domestic Product (GDP) or population size, differ greatly in scale and could bias the results. To avoid this, I mainly chose ratio or growth indicators, including some non-traditional ones such as diabetes prevalence and mobile cellular subscriptions.

Indicators selected

[Figure: List of selected indicators]

From the above list, you may notice that the indicators do not all refer to the same year, owing to data availability. I believe this is one of the major reasons why the World Bank uses a single indicator (GNI per capita) to classify economies.

The best we can do is to pick indicators for which a reasonable number of countries provide data (more than 140 economies) and then choose the most recent year. After filtering, 159 countries in total are included in this exercise.
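The coverage filter described above could look roughly like the sketch below. It is an assumption about the procedure, not the author's code. Indicators reported by more than a given number of economies are kept, and then any economy with a missing value in those indicators is dropped. The column names, the toy data and the helper `filter_indicators` are hypothetical. The step of picking the most recent year per indicator is not shown.

```python
import numpy as np
import pandas as pd


def filter_indicators(df, min_countries):
    """Keep indicator columns reported by more than `min_countries` economies,
    then keep only the economies with no missing values in those columns."""
    coverage = df.notna().sum()  # number of economies reporting each indicator
    kept = coverage[coverage > min_countries].index
    return df[kept].dropna()


# Hypothetical toy table: rows are economies, columns are indicators
df = pd.DataFrame(
    {
        "access_to_electricity": [99.0, 85.0, 40.0, 100.0, np.nan],
        "diabetes_prevalence": [7.1, np.nan, 4.2, 9.0, 5.5],
        "sparse_indicator": [1.0, np.nan, np.nan, np.nan, 2.0],
    },
    index=["A", "B", "C", "D", "E"],
)

filtered = filter_indicators(df, min_countries=3)  # the article's threshold is 140
print(filtered)
```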

Correlation matrix

Now, let’s look at the relationships between the indicators by plotting a correlation matrix.
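Below is a minimal sketch of computing such a correlation matrix with pandas. The indicator names and values are made up, and the author's actual code is not reproduced here. `DataFrame.corr` computes pairwise Pearson correlations by default and excludes missing values pair by pair.

```python
import pandas as pd

# Hypothetical indicator values for seven economies (made up for illustration only)
df = pd.DataFrame({
    "access_to_electricity": [30, 55, 70, 85, 95, 99, 100],
    "basic_drinking_water": [40, 60, 72, 88, 94, 98, 100],
    "rural_population": [80, 65, 55, 40, 30, 20, 10],
    "internet_users": [5, 15, 30, 50, 70, 85, 95],
})

# Pairwise Pearson correlation between indicators
corr = df.corr(method="pearson")
print(corr.round(2))
```

To draw the matrix as a heatmap you need a plotting library, for example seaborn's `heatmap(corr, annot=True)` or matplotlib's `matshow(corr)`. The original figure may have been produced differently.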

[Figure: Correlation matrix of the selected indicators]

From the correlation matrix, we can observe some interesting but reasonable relationships. For example,

(a) Positive relationship between access to electricity (% of population) and the percentage of people using at least basic drinking water services: Electricity and drinking water are basic services in society. Both tend to be developed at the same time and at a similar stage, and hence have similar levels of accessibility within a country.

(b) Positive relationship between vulnerable employment (% of total employment) and employment in agriculture (% of total employment): Compared with employment in the industrial and services sectors, employment in agriculture tends to be more vulnerable.

(c) Negative relationship between rural population (% of population) and individuals using the Internet (% of population): The higher the proportion of rural population, the less developed the economy is likely to be. So the proportion of rural population is negatively correlated with the percentage of individuals using the Internet, which reflects the technological development of an economy.

Factor analysis

In fact, there are many other interesting relationships among the variables. To understand the whole picture faster and better, we can apply factor analysis to reduce the 29 indicators to a smaller number of factors.

But how many factors should we reduce them to? We can get an idea from a scree plot, with the number of factors on the x-axis and the eigenvalue on the y-axis. Generally, if a factor’s eigenvalue is greater than or close to one, we include it. The scree plot suggests that there could be 7 factors.
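The eigenvalue rule described above (often called the Kaiser criterion) can be sketched with NumPy. The sketch takes the eigenvalues of the indicators' correlation matrix, sorts them in descending order, and counts those at or above 1. The author's scree plot may have been produced with a dedicated factor analysis package, so treat this as an assumption. The 4x4 correlation matrix is hypothetical and was chosen so the answer is easy to check.

```python
import numpy as np


def scree_eigenvalues(corr):
    """Eigenvalues of a correlation matrix, sorted from largest to smallest."""
    # eigvalsh is appropriate because a correlation matrix is symmetric
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_count(eigenvalues, threshold=1.0):
    """Number of factors whose eigenvalue reaches the threshold."""
    return int(np.sum(eigenvalues >= threshold))


# Hypothetical 4-indicator correlation matrix: two pairs that are strongly correlated (r = 0.8)
corr = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])

eig = scree_eigenvalues(corr)
print(eig)                # approximately [1.8, 1.8, 0.2, 0.2]
print(kaiser_count(eig))  # 2
```

To draw the scree plot itself, plot `eig` against the factor numbers `1..len(eig)`, for example with matplotlib's `plt.plot(range(1, len(eig) + 1), eig, "o-")`. Because the article also counts eigenvalues "close to one", a slightly lower `threshold` may be appropriate.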

For the concepts behind factor analysis, this article gives a good explanation.
