In Part 1 of our series on how to write efficient code using NumPy, we covered the important topics of vectorization and broadcasting. In this part we will put these concepts into practice by implementing an efficient version of the K-Means clustering algorithm using NumPy. We will benchmark it against a naive version implemented entirely using looping in Python. In the end we’ll see that the NumPy version is about 70 times faster than the simple loop version.
To be exact, in this post we will walk through the algorithm itself, implement it with plain Python loops, and benchmark that baseline against the vectorized NumPy version.
Let’s get started!
In this post we will be optimizing an implementation of the k-means clustering algorithm, so it helps to have at least a basic understanding of how the algorithm works. A detailed discussion is beyond the scope of this post; if you want to dig deeper into k-means, you can find several recommended links below.
In a nutshell, k-means is an unsupervised learning algorithm that separates data into groups based on similarity. Being unsupervised, it works without any labels for the data.
The most important hyperparameter for the k-means algorithm is the number of clusters, or k. Once we have decided upon our value for k, the algorithm works as follows.
Here's the pseudo-code for the algorithm.

1. Initialize k centroids, e.g. by picking k random points from the data.
2. Assign each data point to its closest centroid.
3. Recompute each centroid as the mean of the data points assigned to it.
4. Repeat steps 2 and 3 until the assignments (or the SSE) stop changing, or for a fixed number of iterations.
I'm going to leave K-Means at that; this is enough to help us code the algorithm. However, there is much more to it, such as how to choose a good value of k, how to evaluate performance, which distance metrics can be used, preprocessing steps, and the underlying theory. In case you wish to dig deeper, here are a few links for further study.
Now, let’s proceed with the implementation of the algorithm.
In this section we will be implementing the K-Means algorithm using Python and loops. We will not be using NumPy for this. This code will be used as a benchmark for our optimized version.
To perform clustering, we first need data. While we could choose from multiple datasets online, let's keep things simple and intuitive: we are going to synthesize a dataset by sampling from multiple Gaussian distributions, so that visualizing the clusters is easy.
In case you don't know what a Gaussian distribution is, it's worth reading up on it first.
We will create data from four Gaussians with different means and standard deviations.
import numpy as np

## Size of dataset to be generated. The final size is 4 * data_size
data_size = 1000
num_iters = 50
num_clusters = 4

## Sample from Gaussians
data1 = np.random.normal((5, 5, 5), (4, 4, 4), (data_size, 3))
data2 = np.random.normal((4, 20, 20), (3, 3, 3), (data_size, 3))
data3 = np.random.normal((25, 20, 5), (5, 5, 5), (data_size, 3))
data4 = np.random.normal((30, 30, 30), (5, 5, 5), (data_size, 3))

## Combine the data to create the final dataset
data = np.concatenate((data1, data2, data3, data4), axis=0)

## Shuffle the data
np.random.shuffle(data)
In order to aid our visualization, let’s plot this data in a 3-D space.
3-D Visualization of the Dataset
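The original figure isn't reproduced in this excerpt, but a minimal matplotlib sketch along these lines produces it (assuming matplotlib is installed; the point size and coloring are illustrative choices):

import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
## One dot per data point, colored by position along the first axis for contrast
ax.scatter(data[:, 0], data[:, 1], data[:, 2], s=2, c=data[:, 0])
ax.set_title("3-D Visualization of the Dataset")
plt.show()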
It's very easy to see the four clusters of data in the plot above. This, for one, makes it easy to pick a suitable value of k for our implementation, in the spirit of keeping the algorithmic details as simple as possible so that we can focus on the implementation.
We begin by initializing our centroids, as well as a list to keep track of which centroid each data point is assigned to.
import random

## Set random seed for reproducibility
random.seed(0)

## Initialize the list to store centroids
centroids = []

## Sample initial centroids
random_indices = random.sample(range(data.shape[0]), num_clusters)
for i in random_indices:
    centroids.append(data[i])

## Create a list to store which centroid each data point is assigned to
assigned_centroids = [0] * len(data)
Before we implement our main loop, we will first write a few helper functions. The first, compute_l2_distance, takes two points (say [0, 1, 0] and [4, 2, 3]) and computes the L2 distance between them, according to the following formula.
$L_2(X_1, X_2) = \sum_{i=1}^{\dim(X_1)} \big(X_1[i] - X_2[i]\big)^2$

(Strictly speaking this is the squared L2 distance, but since we only ever compare distances, skipping the square root changes nothing.)
def compute_l2_distance(x, centroid):
    ## Initialize the distance to 0
    dist = 0

    ## Loop over the dimensions; take the squared difference and add to 'dist'
    for i in range(len(x)):
        dist += (centroid[i] - x[i])**2

    return dist
The other helper function, get_closest_centroid, is pretty self-explanatory: it takes an input x and a list centroids, and returns the index into centroids of the centroid closest to x.
def get_closest_centroid(x, centroids):
    ## Initialize the list to keep distances from each centroid
    centroid_distances = []

    ## Loop over each centroid and compute the distance from the data point
    for centroid in centroids:
        dist = compute_l2_distance(x, centroid)
        centroid_distances.append(dist)

    ## Get the index of the centroid with the smallest distance to the data point
    closest_centroid_index = min(range(len(centroid_distances)), key=lambda i: centroid_distances[i])

    return closest_centroid_index
Then we implement the function compute_sse, which computes the SSE, or Sum of Squared Errors (averaged over the dataset here, so strictly a mean squared error). This metric guides how many iterations we have to run: once the value converges, we can stop training.
def compute_sse(data, centroids, assigned_centroids):
    ## Initialize SSE
    sse = 0

    ## Compute the squared distance for each data point and add it up
    for i, x in enumerate(data):
        ## Get the centroid associated with the data point
        centroid = centroids[assigned_centroids[i]]

        ## Compute the distance to the centroid
        dist = compute_l2_distance(x, centroid)

        ## Add to the total distance
        sse += dist

    sse /= len(data)
    return sse
Now, let's write the main loop, following the pseudo-code above. Instead of looping until convergence, we simply run for 50 iterations.
import time

## Number of dimensions in a centroid
num_centroid_dims = data.shape[1]

## List to store the SSE for each iteration
sse_list = []

tic = time.time()

## Loop over iterations
for n in range(num_iters):
    ## Loop over each data point
    for i in range(len(data)):
        x = data[i]

        ## Get the closest centroid
        closest_centroid = get_closest_centroid(x, centroids)

        ## Assign the centroid to the data point
        assigned_centroids[i] = closest_centroid

    ## Loop over centroids and compute the new ones
    for c in range(len(centroids)):
        ## Get all the data points belonging to a particular cluster
        cluster_data = [data[i] for i in range(len(data)) if assigned_centroids[i] == c]

        ## Initialize the list to hold the new centroid
        new_centroid = [0] * len(centroids[0])

        ## Compute the average of the cluster members, dimension by dimension
        for dim in range(num_centroid_dims):
            dim_sum = [x[dim] for x in cluster_data]
            new_centroid[dim] = sum(dim_sum) / len(dim_sum)

        ## Assign the new centroid
        centroids[c] = new_centroid

    ## Compute the SSE for the iteration
    sse = compute_sse(data, centroids, assigned_centroids)
    sse_list.append(sse)
The entire code can be viewed below.
import numpy as np
import matplotlib.pyplot as plt
import random
import time

## Size of dataset to be generated. The final size is 4 * data_size
data_size = 1000
num_iters = 50
num_clusters = 4

## Sample from Gaussians
data1 = np.random.normal((5, 5, 5), (4, 4, 4), (data_size, 3))
data2 = np.random.normal((4, 20, 20), (3, 3, 3), (data_size, 3))
data3 = np.random.normal((25, 20, 5), (5, 5, 5), (data_size, 3))
data4 = np.random.normal((30, 30, 30), (5, 5, 5), (data_size, 3))

## Combine the data to create the final dataset
data = np.concatenate((data1, data2, data3, data4), axis=0)

## Shuffle the data
np.random.shuffle(data)

## Set random seed for reproducibility
random.seed(0)

## Initialize the list to store centroids
centroids = []

## Sample initial centroids
random_indices = random.sample(range(data.shape[0]), num_clusters)
for i in random_indices:
    centroids.append(data[i])

## Create a list to store which centroid each data point is assigned to
assigned_centroids = [0] * len(data)

def compute_l2_distance(x, centroid):
    ## Initialize the distance to 0
    dist = 0
    ## Loop over the dimensions; take the squared difference and add to 'dist'
    for i in range(len(x)):
        dist += (centroid[i] - x[i])**2
    return dist

def get_closest_centroid(x, centroids):
    ## Initialize the list to keep distances from each centroid
    centroid_distances = []
    ## Loop over each centroid and compute the distance from the data point
    for centroid in centroids:
        dist = compute_l2_distance(x, centroid)
        centroid_distances.append(dist)
    ## Get the index of the centroid with the smallest distance to the data point
    closest_centroid_index = min(range(len(centroid_distances)), key=lambda i: centroid_distances[i])
    return closest_centroid_index

def compute_sse(data, centroids, assigned_centroids):
    ## Initialize SSE
    sse = 0
    ## Compute the squared distance for each data point and add it up
    for i, x in enumerate(data):
        ## Get the centroid associated with the data point
        centroid = centroids[assigned_centroids[i]]
        ## Compute the distance to the centroid
        dist = compute_l2_distance(x, centroid)
        ## Add to the total distance
        sse += dist
    sse /= len(data)
    return sse

## Number of dimensions in a centroid
num_centroid_dims = data.shape[1]

## List to store the SSE for each iteration
sse_list = []

tic = time.time()

## Loop over iterations
for n in range(num_iters):
    ## Loop over each data point
    for i in range(len(data)):
        x = data[i]
        ## Get the closest centroid
        closest_centroid = get_closest_centroid(x, centroids)
        ## Assign the centroid to the data point
        assigned_centroids[i] = closest_centroid

    ## Loop over centroids and compute the new ones
    for c in range(len(centroids)):
        ## Get all the data points belonging to a particular cluster
        cluster_data = [data[i] for i in range(len(data)) if assigned_centroids[i] == c]
        ## Initialize the list to hold the new centroid
        new_centroid = [0] * len(centroids[0])
        ## Compute the average of the cluster members, dimension by dimension
        for dim in range(num_centroid_dims):
            dim_sum = [x[dim] for x in cluster_data]
            new_centroid[dim] = sum(dim_sum) / len(dim_sum)
        ## Assign the new centroid
        centroids[c] = new_centroid

    ## Compute the SSE for the iteration
    sse = compute_sse(data, centroids, assigned_centroids)
    sse_list.append(sse)

toc = time.time()

## Average time per iteration
print((toc - tic) / num_iters)
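That's the naive baseline. The vectorized NumPy version isn't reproduced in this excerpt, but a minimal sketch of the broadcasting approach it relies on looks like the following. This is an illustrative reconstruction, not the article's exact code; it reuses data, num_iters and num_clusters from above and assumes no cluster ever ends up empty.

## Initialize centroids by sampling distinct points from the data
rng = np.random.default_rng(0)
centroids_np = data[rng.choice(len(data), num_clusters, replace=False)].copy()

tic = time.time()
for n in range(num_iters):
    ## Pairwise squared distances via broadcasting:
    ## (N, 1, 3) - (1, k, 3) -> (N, k, 3), summed over the last axis -> (N, k)
    dists = ((data[:, None, :] - centroids_np[None, :, :]) ** 2).sum(axis=2)

    ## Assign each point to its nearest centroid
    assigned = dists.argmin(axis=1)

    ## Recompute each centroid as the mean of its assigned points
    for c in range(num_clusters):
        centroids_np[c] = data[assigned == c].mean(axis=0)
toc = time.time()

## Average time per iteration
print((toc - tic) / num_iters)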
#numpy #python #machine-learning #data-science #developer
Perl script converts PDF files to Gerber format
Pdf2Gerb generates Gerber 274X photoplotting and Excellon drill files from PDFs of a PCB. Up to three PDFs are used: the top copper layer, the bottom copper layer (for 2-sided PCBs), and an optional silk screen layer. The PDFs can be created directly from any PDF drawing software, or a PDF print driver can be used to capture the Print output if the drawing software does not directly support output to PDF.
The general workflow is as follows:
Please note that Pdf2Gerb does NOT perform DRC (Design Rule Checks), as these will vary according to individual PCB manufacturer conventions and capabilities. Also note that Pdf2Gerb is not perfect, so the output files must always be checked before submitting them. As of version 1.6, Pdf2Gerb supports most PCB elements, such as round and square pads, round holes, traces, SMD pads, ground planes, no-fill areas, and panelization. However, because it interprets the graphical output of a Print function, there are limitations in what it can recognize (or there may be bugs).
See docs/Pdf2Gerb.pdf for install/setup, config, usage, and other info.
#Pdf2Gerb config settings:
#Put this file in same folder/directory as pdf2gerb.pl itself (global settings),
#or copy to another folder/directory with PDFs if you want PCB-specific settings.
#There is only one user of this file, so we don't need a custom package or namespace.
#NOTE: all constants defined in here will be added to main namespace.
#package pdf2gerb_cfg;
use strict; #trap undef vars (easier debug)
use warnings; #other useful info (easier debug)
##############################################################################################
#configurable settings:
#change values here instead of in the main pdf2gerb.pl file
use constant WANT_COLORS => ($^O !~ m/Win/); #ANSI colors no worky on Windows? this must be set < first DebugPrint() call
#just a little warning; set realistic expectations:
#DebugPrint("${\(CYAN)}Pdf2Gerb.pl ${\(VERSION)}, $^O O/S\n${\(YELLOW)}${\(BOLD)}${\(ITALIC)}This is EXPERIMENTAL software. \nGerber files MAY CONTAIN ERRORS. Please CHECK them before fabrication!${\(RESET)}", 0); #if WANT_DEBUG
use constant METRIC => FALSE; #set to TRUE for metric units (only affects final numbers in output files, not internal arithmetic)
use constant APERTURE_LIMIT => 0; #34; #max #apertures to use; generate warnings if too many apertures are used (0 to not check)
use constant DRILL_FMT => '2.4'; #'2.3'; #'2.4' is the default for PCB fab; change to '2.3' for CNC
use constant WANT_DEBUG => 0; #10; #level of debug wanted; higher == more, lower == less, 0 == none
use constant GERBER_DEBUG => 0; #level of debug to include in Gerber file; DON'T USE FOR FABRICATION
use constant WANT_STREAMS => FALSE; #TRUE; #save decompressed streams to files (for debug)
use constant WANT_ALLINPUT => FALSE; #TRUE; #save entire input stream (for debug ONLY)
#DebugPrint(sprintf("${\(CYAN)}DEBUG: stdout %d, gerber %d, want streams? %d, all input? %d, O/S: $^O, Perl: $]${\(RESET)}\n", WANT_DEBUG, GERBER_DEBUG, WANT_STREAMS, WANT_ALLINPUT), 1);
#DebugPrint(sprintf("max int = %d, min int = %d\n", MAXINT, MININT), 1);
#define standard trace and pad sizes to reduce scaling or PDF rendering errors:
#This avoids weird aperture settings and replaces them with more standardized values.
#(I'm not sure how photoplotters handle strange sizes).
#Fewer choices here gives more accurate mapping in the final Gerber files.
#units are in inches
use constant TOOL_SIZES => #add more as desired
(
#round or square pads (> 0) and drills (< 0):
.010, -.001, #tiny pads for SMD; dummy drill size (too small for practical use, but needed so StandardTool will use this entry)
.031, -.014, #used for vias
.041, -.020, #smallest non-filled plated hole
.051, -.025,
.056, -.029, #useful for IC pins
.070, -.033,
.075, -.040, #heavier leads
# .090, -.043, #NOTE: 600 dpi is not high enough resolution to reliably distinguish between .043" and .046", so choose 1 of the 2 here
.100, -.046,
.115, -.052,
.130, -.061,
.140, -.067,
.150, -.079,
.175, -.088,
.190, -.093,
.200, -.100,
.220, -.110,
.160, -.125, #useful for mounting holes
#some additional pad sizes without holes (repeat a previous hole size if you just want the pad size):
.090, -.040, #want a .090 pad option, but use dummy hole size
.065, -.040, #.065 x .065 rect pad
.035, -.040, #.035 x .065 rect pad
#traces:
.001, #too thin for real traces; use only for board outlines
.006, #minimum real trace width; mainly used for text
.008, #mainly used for mid-sized text, not traces
.010, #minimum recommended trace width for low-current signals
.012,
.015, #moderate low-voltage current
.020, #heavier trace for power, ground (even if a lighter one is adequate)
.025,
.030, #heavy-current traces; be careful with these ones!
.040,
.050,
.060,
.080,
.100,
.120,
);
#Areas larger than the values below will be filled with parallel lines:
#This cuts down on the number of aperture sizes used.
#Set to 0 to always use an aperture or drill, regardless of size.
use constant { MAX_APERTURE => max((TOOL_SIZES)) + .004, MAX_DRILL => -min((TOOL_SIZES)) + .004 }; #max aperture and drill sizes (plus a little tolerance)
#DebugPrint(sprintf("using %d standard tool sizes: %s, max aper %.3f, max drill %.3f\n", scalar((TOOL_SIZES)), join(", ", (TOOL_SIZES)), MAX_APERTURE, MAX_DRILL), 1);
#NOTE: Compare the PDF to the original CAD file to check the accuracy of the PDF rendering and parsing!
#for example, the CAD software I used generated the following circles for holes:
#CAD hole size: parsed PDF diameter: error:
# .014 .016 +.002
# .020 .02267 +.00267
# .025 .026 +.001
# .029 .03167 +.00267
# .033 .036 +.003
# .040 .04267 +.00267
#This was usually ~ .002" - .003" too big compared to the hole as displayed in the CAD software.
#To compensate for PDF rendering errors (either during CAD Print function or PDF parsing logic), adjust the values below as needed.
#units are pixels; for example, a value of 2.4 at 600 dpi = .004 inch, and 2 at 600 dpi ~= .0033"
use constant
{
HOLE_ADJUST => -0.004 * 600, #-2.6, #holes seemed to be slightly oversized (by .002" - .004"), so shrink them a little
RNDPAD_ADJUST => -0.003 * 600, #-2, #-2.4, #round pads seemed to be slightly oversized, so shrink them a little
SQRPAD_ADJUST => +0.001 * 600, #+.5, #square pads are sometimes too small by .00067, so bump them up a little
RECTPAD_ADJUST => 0, #(pixels) rectangular pads seem to be okay? (not tested much)
TRACE_ADJUST => 0, #(pixels) traces seemed to be okay?
REDUCE_TOLERANCE => .001, #(inches) allow this much variation when reducing circles and rects
};
#Also, my CAD's Print function or the PDF print driver I used was a little off for circles, so define some additional adjustment values here:
#Values are added to X/Y coordinates; units are pixels; for example, a value of 1 at 600 dpi would be ~= .002 inch
use constant
{
CIRCLE_ADJUST_MINX => 0,
CIRCLE_ADJUST_MINY => -0.001 * 600, #-1, #circles were a little too high, so nudge them a little lower
CIRCLE_ADJUST_MAXX => +0.001 * 600, #+1, #circles were a little too far to the left, so nudge them a little to the right
CIRCLE_ADJUST_MAXY => 0,
SUBST_CIRCLE_CLIPRECT => FALSE, #generate circle and substitute for clip rects (to compensate for the way some CAD software draws circles)
WANT_CLIPRECT => TRUE, #FALSE, #AI doesn't need clip rect at all? should be on normally?
RECT_COMPLETION => FALSE, #TRUE, #fill in 4th side of rect when 3 sides found
};
#allow .012 clearance around pads for solder mask:
#This value effectively adjusts pad sizes in the TOOL_SIZES list above (only for solder mask layers).
use constant SOLDER_MARGIN => +.012; #units are inches
#line join/cap styles:
use constant
{
CAP_NONE => 0, #butt (none); line is exact length
CAP_ROUND => 1, #round cap/join; line overhangs by a semi-circle at either end
CAP_SQUARE => 2, #square cap/join; line overhangs by a half square on either end
CAP_OVERRIDE => FALSE, #cap style overrides drawing logic
};
#number of elements in each shape type:
use constant
{
RECT_SHAPELEN => 6, #x0, y0, x1, y1, count, "rect" (start, end corners)
LINE_SHAPELEN => 6, #x0, y0, x1, y1, count, "line" (line seg)
CURVE_SHAPELEN => 10, #xstart, ystart, x0, y0, x1, y1, xend, yend, count, "curve" (bezier 2 points)
CIRCLE_SHAPELEN => 5, #x, y, radius, count, "circle" (center + radius)
};
#const my %SHAPELEN =
#Readonly my %SHAPELEN =>
our %SHAPELEN =
(
rect => RECT_SHAPELEN,
line => LINE_SHAPELEN,
curve => CURVE_SHAPELEN,
circle => CIRCLE_SHAPELEN,
);
#panelization:
#This will repeat the entire body the number of times indicated along the X or Y axes (files grow accordingly).
#Display elements that overhang PCB boundary can be squashed or left as-is (typically text or other silk screen markings).
#Set "overhangs" TRUE to allow overhangs, FALSE to truncate them.
#xpad and ypad allow margins to be added around outer edge of panelized PCB.
use constant PANELIZE => {'x' => 1, 'y' => 1, 'xpad' => 0, 'ypad' => 0, 'overhangs' => TRUE}; #number of times to repeat in X and Y directions
# Set this to 1 if you need TurboCAD support.
#$turboCAD = FALSE; #is this still needed as an option?
#CIRCAD pad generation uses an appropriate aperture, then moves it (stroke) "a little" - we use this to find pads and distinguish them from PCB holes.
use constant PAD_STROKE => 0.3; #0.0005 * 600; #units are pixels
#convert very short traces to pads or holes:
use constant TRACE_MINLEN => .001; #units are inches
#use constant ALWAYS_XY => TRUE; #FALSE; #force XY even if X or Y doesn't change; NOTE: needs to be TRUE for all pads to show in FlatCAM and ViewPlot
use constant REMOVE_POLARITY => FALSE; #TRUE; #set to remove subtractive (negative) polarity; NOTE: must be FALSE for ground planes
#PDF uses "points", each point = 1/72 inch
#combined with a PDF scale factor of .12, this gives 600 dpi resolution (72 / .12 = 600 dpi)
use constant INCHES_PER_POINT => 1/72; #0.0138888889; #multiply point-size by this to get inches
# The precision used when computing a bezier curve. Higher numbers are more precise but slower (and generate larger files).
#$bezierPrecision = 100;
use constant BEZIER_PRECISION => 36; #100; #use const; reduced for faster rendering (mainly used for silk screen and thermal pads)
# Ground planes and silk screen or larger copper rectangles or circles are filled line-by-line using this resolution.
use constant FILL_WIDTH => .01; #fill at most 0.01 inch at a time
# The max number of characters to read into memory
use constant MAX_BYTES => 10 * M; #bumped up to 10 MB, use const
use constant DUP_DRILL1 => TRUE; #FALSE; #kludge: ViewPlot doesn't load drill files that are too small so duplicate first tool
my $runtime = time(); #Time::HiRes::gettimeofday(); #measure my execution time
print STDERR "Loaded config settings from '${\(__FILE__)}'.\n";
1; #last value must be truthful to indicate successful load
#############################################################################################
#junk/experiment:
#use Package::Constants;
#use Exporter qw(import); #https://perldoc.perl.org/Exporter.html
#my $caller = "pdf2gerb::";
#sub cfg
#{
# my $proto = shift;
# my $class = ref($proto) || $proto;
# my $settings =
# {
# $WANT_DEBUG => 990, #10; #level of debug wanted; higher == more, lower == less, 0 == none
# };
# bless($settings, $class);
# return $settings;
#}
#use constant HELLO => "hi there2"; #"main::HELLO" => "hi there";
#use constant GOODBYE => 14; #"main::GOODBYE" => 12;
#print STDERR "read cfg file\n";
#our @EXPORT_OK = Package::Constants->list(__PACKAGE__); #https://www.perlmonks.org/?node_id=1072691; NOTE: "_OK" skips short/common names
#print STDERR scalar(@EXPORT_OK) . " consts exported:\n";
#foreach(@EXPORT_OK) { print STDERR "$_\n"; }
#my $val = main::thing("xyz");
#print STDERR "caller gave me $val\n";
#foreach my $arg (@ARGV) { print STDERR "arg $arg\n"; }
Author: swannman
Source Code: https://github.com/swannman/pdf2gerb
License: GPL-3.0 license
SciPy is one of the most widely used open-source libraries in Python, and its main purpose is computing mathematical and scientific problems. The many sub-packages in SciPy further increase its functionality, making it a very important package for data interpretation. With it we can segregate clusters from a data set, performing clustering with a single cluster or multiple clusters. Initially, we generate the data set; then we perform clustering on it. Let us learn more about SciPy clusters.
K-means is a method that can be employed to determine clusters and their centers, and we can use it directly on a raw data set. A cluster is defined so that the points inside it are at a minimum distance from one another compared to points outside the cluster. Given an initial set of k centers, the k-means method operates in two alternating steps: assign each data point to its nearest center, then recompute each center as the mean of the points assigned to it.
The process iterates until the center values become constant; we then fix the centers and the assignments. Implementing this process is straightforward and accurate with the SciPy library.
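As a minimal sketch of this workflow with SciPy's clustering utilities (the two-blob data set and k = 2 here are illustrative assumptions, not from the original article):

import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

## Generate a small synthetic data set: two Gaussian blobs
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (100, 2)),
                  rng.normal(5, 1, (100, 2))])

## Normalize each feature to unit variance, as scipy's k-means expects
whitened = whiten(data)

## Iterate the two k-means steps internally until the centers stabilize
centers, distortion = kmeans(whitened, 2)

## Assign each observation to its nearest center
labels, _ = vq(whitened, centers)
print(centers)
print(labels[:10])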
#numpy tutorials #clustering in scipy #k-means clustering in scipy #scipy clusters #numpy
In this NumPy tutorial, we will learn about NumPy's applications.
NumPy is a base-level external library in Python used for complex mathematical operations. It overcomes slower execution by using multi-dimensional array objects, and it has built-in functions for manipulating arrays, so many algorithms can be converted into functions that apply directly to arrays. NumPy's applications are not limited to the library itself: it is a very diverse library with a wide range of applications in other sectors. NumPy is used alongside Data Science, Data Analysis, and Machine Learning, and it is also a base for other Python libraries, which use the functionality in NumPy to increase their own capabilities.
Arrays in NumPy are the counterpart of lists in Python, but unlike lists, NumPy arrays are homogeneous sets of elements. This homogeneity is their most important feature and is what differentiates them from Python lists: it maintains the uniformity required for mathematical operations that would not be possible with heterogeneous elements. Another benefit of NumPy arrays is the large number of functions that can be applied to them, functions that could not work on Python lists due to their heterogeneous nature.
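A quick illustrative sketch of this homogeneity:

import numpy as np

## NumPy upcasts mixed inputs to a single homogeneous dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64: every element shares one type

## A Python list, by contrast, stays heterogeneous
items = [1, 2.5, "three"]
print([type(x).__name__ for x in items])   # ['int', 'float', 'str']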
Arrays in NumPy are objects, and Python creates and deletes these objects continually, as requirements dictate; even so, their memory allocation is smaller than that of Python lists. NumPy also has features to avoid wasting memory in the data buffer: operations like views and indexing help save a lot of memory, since basic indexing returns a view of the original array and thereby reuses the underlying data, while copies are made only when explicitly requested. Specifying the data type of the elements also leads to code optimization.
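For example, basic slicing returns a view that reuses the original buffer; a small sketch:

import numpy as np

arr = np.arange(10)
view = arr[2:5]           # basic slicing returns a view, not a copy
view[0] = 99              # writes through to the original array
print(arr[2])             # 99
print(view.base is arr)   # True: the view reuses arr's data buffer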
We can also create multi-dimensional arrays in NumPy. These arrays have multiple rows and columns, which is what makes them multi-dimensional, and they implement the creation of matrices. These matrices are easy to work with, and using them also makes code memory efficient. NumPy provides a matrix module to perform various operations on these matrices.
Working with NumPy also means easy-to-use functions for mathematical computations on array data. There are many modules for performing basic and special mathematical functions in NumPy: linear algebra, bitwise operations, Fourier transforms, arithmetic operations, string operations, etc.
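A few of these routines in action, as an illustrative sketch:

import numpy as np

a = np.array([1, 2, 3])
m = np.array([[1, 2], [3, 4]])

print(a + 10)                 # element-wise arithmetic
print(np.linalg.inv(m))       # linear algebra: matrix inverse
print(np.fft.fft(a))          # Fourier transform
print(np.bitwise_and(a, 1))   # bitwise operations
print(np.char.upper(np.array(["numpy", "scipy"])))  # string operations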
#numpy tutorials #applications of numpy #numpy applications #uses of numpy #numpy
Welcome to DataFlair!!! In this tutorial, we will learn about NumPy's features and their importance.
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
NumPy (Numerical Python) is an open-source core Python library for scientific computations. It is a general-purpose array- and matrix-processing package. Python is slower than Fortran and other languages at performing loops; to overcome this, NumPy moves that repetitive work into precompiled, vectorized routines.
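To make the speed difference concrete, here is a small sketch contrasting an interpreted loop with the equivalent precompiled NumPy call:

import numpy as np

values = np.arange(1_000_000)

## Loop version: slow, interpreted per-element work
total = 0
for v in values:
    total += v * v

## Vectorized version: fast, runs in precompiled C code
total_np = np.sum(values * values)
print(total == total_np)   # True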
These are the important features of NumPy:
The most important feature of the NumPy library is the ndarray, its homogeneous array object. We perform all operations on the array elements, and arrays in NumPy can be one-dimensional or multidimensional.
A one-dimensional array consists of a single row or column, with elements of a homogeneous nature.
A multidimensional array has multiple rows and columns, and we can consider each column a dimension. The structure is similar to an Excel sheet, and the elements are again homogeneous. Both kinds are sketched below.
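A short sketch of one-dimensional and multidimensional ndarrays:

import numpy as np

one_d = np.array([10, 20, 30])          # a single row of homogeneous elements
two_d = np.array([[1, 2, 3],
                  [4, 5, 6]])           # rows and columns, like a spreadsheet

print(one_d.ndim, two_d.ndim)           # 1 2
print(two_d.shape)                      # (2, 3)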
We can also use NumPy to work with code written in other languages, such as C, C++, and Fortran. We can hence integrate functionality available in various programming languages, which helps implement inter-platform functions.
#numpy tutorials #features of numpy #numpy features #why use numpy #numpy
In this Python NumPy tutorial, we will learn about numpy.linalg.svd: singular value decomposition in Python. In mathematics, a singular value decomposition (SVD) of a matrix refers to the factorization of a matrix into three separate matrices. It is a more generalized version of the eigenvalue decomposition of a matrix, and it is further related to the polar decomposition.
In Python, it is easy to calculate the singular decomposition of a complex or a real matrix using the numerical python or the numpy library. The numpy library consists of various linear algebraic functions including one for calculating the singular value decomposition of a matrix.
In machine learning models, singular value decomposition is widely used to train models and in neural networks. It helps in improving accuracy and in reducing the noise in data. Singular value decomposition transforms one vector into another without them necessarily having the same dimension; hence, it makes matrix manipulation in vector spaces easier and more efficient. It is also used in regression analysis.
The function that calculates the singular value decomposition of a matrix in Python belongs to the numpy module and is named linalg.svd().
The syntax of numpy.linalg.svd() is as follows:
numpy.linalg.svd(A, full_matrices=True, compute_uv=True, hermitian=False)
You can customize the true and false boolean values based on your requirements.
The parameters of the function are given below:

- A: the matrix (or stack of matrices) whose singular value decomposition is being computed.
- full_matrices: if True (the default), u and v are returned with full square shapes; if False, they are returned in reduced form (see the sketch below).
- compute_uv: whether to compute and return u and v in addition to the singular values s. The default is True.
- hermitian: if True, A is assumed to be Hermitian (symmetric if real-valued), which enables a more efficient computation method. The default is False.

Based on these parameters, the function returns up to three arrays: u (the left singular vectors), s (the singular values, sorted in descending order), and v (the right singular vectors, returned as rows). It raises a LinAlgError when the SVD computation does not converge.
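For instance, here is a small sketch of how full_matrices changes the returned shapes (the 4x2 matrix of ones is purely illustrative):

import numpy as np

A = np.ones((4, 2))

## Full SVD: u is square (4, 4)
u, s, vh = np.linalg.svd(A, full_matrices=True)
print(u.shape, s.shape, vh.shape)   # (4, 4) (2,) (2, 2)

## Reduced SVD: u shrinks to (4, 2)
u, s, vh = np.linalg.svd(A, full_matrices=False)
print(u.shape, s.shape, vh.shape)   # (4, 2) (2,) (2, 2)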
Before we dive into the examples, make sure you have the numpy module installed on your local system. It is required for using linear algebraic functions like the one discussed in this article. Run the following command in your terminal.
pip install numpy
That's all you need right now. Let's look at how we will implement the code in the next section.
To calculate the singular value decomposition (SVD) in Python, use the NumPy library's linalg.svd() function. Its syntax is numpy.linalg.svd(A, full_matrices=True, compute_uv=True, hermitian=False), where A is the matrix for which the SVD is being calculated. It returns three arrays: u, s, and v.
In this first example we will take a 3X3 matrix and compute its singular value decomposition in the following way:
#importing the numpy module
import numpy as np
#using the numpy.array() function to create an array
A=np.array([[2,4,6],
[8,10,12],
[14,16,18]])
#calculating all three matrices for the output
#using the numpy linalg.svd function
u,s,v=np.linalg.svd(A, compute_uv=True)
#displaying the result
print("the output is=")
print('s(the singular value) = ',s)
print('u = ',u)
print('v = ',v)
The output will be:
the output is=
s(the singular value) = [3.36962067e+01 2.13673903e+00 8.83684950e-16]
u = [[-0.21483724 0.88723069 0.40824829]
[-0.52058739 0.24964395 -0.81649658]
[-0.82633754 -0.38794278 0.40824829]]
v = [[-0.47967118 -0.57236779 -0.66506441]
[-0.77669099 -0.07568647 0.62531805]
[-0.40824829 0.81649658 -0.40824829]]
Example 1
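As a quick sanity check on Example 1 (a sketch that assumes the variables A, u, s, and v from the example above are still in scope), the original matrix can be rebuilt from the three factors:

import numpy as np

## Reassemble A = u @ diag(s) @ v and compare with the original
reconstructed = u @ np.diag(s) @ v
print(np.allclose(A, reconstructed))   # True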
In this example, we will use the numpy.random.randint() function to create a random matrix. Let's get into it!
#importing the numpy module
import numpy as np
#using the numpy.random.randint() function to create a random matrix
A=np.random.randint(5, 200, size=(3,3))
#display the created matrix
print("The input matrix is=",A)
#calculating all three matrices for the output
#using the numpy linalg.svd function
u,s,v=np.linalg.svd(A, compute_uv=True)
#displaying the result
print("the output is=")
print('s(the singular value) = ',s)
print('u = ',u)
print('v = ',v)
The output will look as follows (your values will differ, since the matrix is random):
The input matrix is= [[ 36 74 101]
[104 129 185]
[139 121 112]]
the output is=
s(the singular value) = [348.32979681 61.03199722 10.12165841]
u = [[-0.3635535 -0.48363012 -0.79619769]
[-0.70916514 -0.41054007 0.57318554]
[-0.60408084 0.77301925 -0.19372034]]
v = [[-0.49036384 -0.54970618 -0.67628871]
[ 0.77570499 0.0784348 -0.62620264]
[ 0.39727203 -0.83166766 0.38794824]]
Example 2
Suggested: Numpy linalg.eigvalsh: A Guide to Eigenvalue Computation.
In this article, we explored the concept of singular value decomposition in mathematics and how to calculate it using Python’s numpy module. We used the linalg.svd() function to compute the singular value decomposition of both given and random matrices. Numpy provides an efficient and easy-to-use method for performing linear algebra operations, making it highly valuable in machine learning, neural networks, and regression analysis. Keep exploring other linear algebraic functions in numpy to enhance your mathematical toolset in Python.
Article source at: https://www.askpython.com