FreezeGun: Let Your Python Tests Travel Through Time

FreezeGun is a library that allows your Python tests to travel through time by mocking the datetime module.

Usage

Once the decorator or context manager has been invoked, all calls to datetime.datetime.now(), datetime.datetime.utcnow(), datetime.date.today(), time.time(), time.localtime(), time.gmtime(), and time.strftime() will return the frozen time. time.monotonic() and time.perf_counter() are also frozen, but as usual no guarantee is made about their absolute values, only about how they change over time.
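
For instance, the time module functions listed above are frozen together with datetime. A minimal sketch (assuming the default tz_offset of 0, so local time matches the frozen value):

from freezegun import freeze_time
import datetime
import time

@freeze_time("2012-01-14")
def test_time_module_is_frozen():
    # datetime and the time module report the same frozen instant
    assert datetime.datetime.now() == datetime.datetime(2012, 1, 14)
    assert time.gmtime().tm_year == 2012
    assert time.strftime("%Y-%m-%d") == "2012-01-14"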

Decorator

from freezegun import freeze_time
import datetime
import unittest

# Freeze time for a pytest style test:

@freeze_time("2012-01-14")
def test():
    assert datetime.datetime.now() == datetime.datetime(2012, 1, 14)

# Or a unittest TestCase - freezes for every test, from the start of setUpClass to the end of tearDownClass

@freeze_time("1955-11-12")
class MyTests(unittest.TestCase):
    def test_the_class(self):
        assert datetime.datetime.now() == datetime.datetime(1955, 11, 12)

# Or any other class - freezes around each callable (may not work in every case)

@freeze_time("2012-01-14")
class Tester(object):
    def test_the_class(self):
        assert datetime.datetime.now() == datetime.datetime(2012, 1, 14)

# Or as a method decorator; the frozen-time object can also be passed in as a keyword argument

class TestUnitTestMethodDecorator(unittest.TestCase):
    @freeze_time('2013-04-09')
    def test_method_decorator_works_on_unittest(self):
        self.assertEqual(datetime.date(2013, 4, 9), datetime.date.today())

    @freeze_time('2013-04-09', as_kwarg='frozen_time')
    def test_method_decorator_works_on_unittest_with_kwarg(self, frozen_time):
        self.assertEqual(datetime.date(2013, 4, 9), datetime.date.today())
        self.assertEqual(datetime.date(2013, 4, 9), frozen_time.time_to_freeze.today())

    @freeze_time('2013-04-09', as_kwarg='hello')
    def test_method_decorator_works_on_unittest_with_kwargs(self, **kwargs):
        self.assertEqual(datetime.date(2013, 4, 9), datetime.date.today())
        self.assertEqual(datetime.date(2013, 4, 9), kwargs.get('hello').time_to_freeze.today())

Context manager

from freezegun import freeze_time
import datetime

def test():
    assert datetime.datetime.now() != datetime.datetime(2012, 1, 14)
    with freeze_time("2012-01-14"):
        assert datetime.datetime.now() == datetime.datetime(2012, 1, 14)
    assert datetime.datetime.now() != datetime.datetime(2012, 1, 14)

Raw use

from freezegun import freeze_time
import datetime

freezer = freeze_time("2012-01-14 12:00:01")
freezer.start()
assert datetime.datetime.now() == datetime.datetime(2012, 1, 14, 12, 0, 1)
freezer.stop()
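
When using the raw API inside a unittest TestCase, one common pattern (a sketch, not something FreezeGun requires) is to register stop() as a cleanup so the real clock is restored even if the test fails:

import datetime
import unittest
from freezegun import freeze_time

class FrozenClockTest(unittest.TestCase):
    def setUp(self):
        freezer = freeze_time("2012-01-14 12:00:01")
        freezer.start()
        self.addCleanup(freezer.stop)  # always restore the real clock

    def test_now_is_frozen(self):
        self.assertEqual(datetime.datetime.now(),
                         datetime.datetime(2012, 1, 14, 12, 0, 1))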

Timezones

from freezegun import freeze_time
import datetime

@freeze_time("2012-01-14 03:21:34", tz_offset=-4)
def test():
    assert datetime.datetime.utcnow() == datetime.datetime(2012, 1, 14, 3, 21, 34)
    assert datetime.datetime.now() == datetime.datetime(2012, 1, 13, 23, 21, 34)

    # datetime.date.today() uses local time
    assert datetime.date.today() == datetime.date(2012, 1, 13)

@freeze_time("2012-01-14 03:21:34", tz_offset=-datetime.timedelta(hours=3, minutes=30))
def test_timedelta_offset():
    assert datetime.datetime.now() == datetime.datetime(2012, 1, 13, 23, 51, 34)

Nice inputs

FreezeGun uses dateutil behind the scenes so you can have nice-looking datetimes.

@freeze_time("Jan 14th, 2012")
def test_nice_datetime():
    assert datetime.datetime.now() == datetime.datetime(2012, 1, 14)

Function and generator objects

FreezeGun is able to handle function and generator objects.

def test_lambda():
    with freeze_time(lambda: datetime.datetime(2012, 1, 14)):
        assert datetime.datetime.now() == datetime.datetime(2012, 1, 14)

def test_generator():
    datetimes = (datetime.datetime(year, 1, 1) for year in range(2010, 2012))

    with freeze_time(datetimes):
        assert datetime.datetime.now() == datetime.datetime(2010, 1, 1)

    with freeze_time(datetimes):
        assert datetime.datetime.now() == datetime.datetime(2011, 1, 1)

    # The next call to freeze_time(datetimes) would raise a StopIteration exception.

tick argument

FreezeGun accepts an additional tick argument: time is reset to the given value, but then keeps ticking forward instead of standing still as it does by default.

@freeze_time("Jan 14th, 2020", tick=True)
def test_nice_datetime():
    assert datetime.datetime.now() > datetime.datetime(2020, 1, 14)

auto_tick_seconds argument

FreezeGun also accepts an auto_tick_seconds argument, which advances the frozen time by the given number of seconds each time it is queried, starting from the given value, instead of keeping time stopped. Note that when auto_tick_seconds is given, the tick parameter is ignored.

@freeze_time("Jan 14th, 2020", auto_tick_seconds=15)
def test_nice_datetime():
    first_time = datetime.datetime.now()
    auto_incremented_time = datetime.datetime.now()
    assert first_time + datetime.timedelta(seconds=15) == auto_incremented_time

Manual ticks

FreezeGun allows for the time to be manually forwarded as well.

def test_manual_tick():
    initial_datetime = datetime.datetime(year=1, month=7, day=12,
                                        hour=15, minute=6, second=3)
    with freeze_time(initial_datetime) as frozen_datetime:
        assert frozen_datetime() == initial_datetime

        frozen_datetime.tick()
        initial_datetime += datetime.timedelta(seconds=1)
        assert frozen_datetime() == initial_datetime

        frozen_datetime.tick(delta=datetime.timedelta(seconds=10))
        initial_datetime += datetime.timedelta(seconds=10)
        assert frozen_datetime() == initial_datetime

import time

def test_monotonic_manual_tick():
    initial_datetime = datetime.datetime(year=1, month=7, day=12,
                                         hour=15, minute=6, second=3)
    with freeze_time(initial_datetime) as frozen_datetime:
        monotonic_t0 = time.monotonic()
        frozen_datetime.tick(1.0)
        monotonic_t1 = time.monotonic()
        assert monotonic_t1 == monotonic_t0 + 1.0

Moving time to a specific datetime

FreezeGun allows moving time to specific dates.

def test_move_to():
    initial_datetime = datetime.datetime(year=1, month=7, day=12,
                                        hour=15, minute=6, second=3)

    other_datetime = datetime.datetime(year=2, month=8, day=13,
                                        hour=14, minute=5, second=0)
    with freeze_time(initial_datetime) as frozen_datetime:
        assert frozen_datetime() == initial_datetime

        frozen_datetime.move_to(other_datetime)
        assert frozen_datetime() == other_datetime

        frozen_datetime.move_to(initial_datetime)
        assert frozen_datetime() == initial_datetime


@freeze_time("2012-01-14", as_arg=True)
def test(frozen_time):
    assert datetime.datetime.now() == datetime.datetime(2012, 1, 14)
    frozen_time.move_to("2014-02-12")
    assert datetime.datetime.now() == datetime.datetime(2014, 2, 12)

The parameter for move_to can be any valid freeze_time input (a string, date, or datetime).

Default arguments

Note that FreezeGun does not modify default arguments: Python evaluates default argument values once, at function definition time, before freeze_time takes effect. The following code will therefore print the real current date.

from freezegun import freeze_time
import datetime as dt

def test(default=dt.date.today()):
    print(default)

with freeze_time('2000-1-1'):
    test()
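
A common way around this (a minimal sketch, not part of FreezeGun itself) is to default the argument to None and resolve the date inside the function body, so the frozen clock is consulted at call time:

from freezegun import freeze_time
import datetime as dt

def test(default=None):
    # Resolve the date at call time so freeze_time can intercept it
    if default is None:
        default = dt.date.today()
    print(default)

with freeze_time('2000-01-01'):
    test()  # prints 2000-01-01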

Installation

To install FreezeGun, simply:

$ pip install freezegun

On Debian systems:

$ sudo apt-get install python-freezegun

Ignore packages

Sometimes it's desirable to ignore FreezeGun's behaviour for particular packages (e.g. third-party libraries). They can be ignored for a single invocation:

from freezegun import freeze_time

with freeze_time('2020-10-06', ignore=['threading']):
    ...  # calls made from the 'threading' package still see the real clock

By default, FreezeGun ignores the following packages:

[
    'nose.plugins',
    'six.moves',
    'django.utils.six.moves',
    'google.gax',
    'threading',
    'Queue',
    'selenium',
    '_pytest.terminal.',
    '_pytest.runner.',
    'gi',
]

It's possible to set your own default ignore list:

import freezegun

freezegun.configure(default_ignore_list=['threading', 'tensorflow'])

Please note that this overrides the default ignore list. If you want to extend the existing defaults instead, use:

import freezegun

freezegun.configure(extend_ignore_list=['tensorflow'])

Author: spulec
Source Code: https://github.com/spulec/freezegun
License: Apache-2.0 License

#python 


