I will walk you through the basics of NumPy. If you want to do machine learning, knowledge of NumPy is necessary. It is one of the most widely used Python libraries and among the most useful when you are dealing with numbers in Python. NumPy's array operations typically run much faster than equivalent loops over standard Python lists, and the library comes with a large number of built-in functions.
Advantages of using NumPy with Python:
First, let’s talk about its installation. NumPy is not part of the basic Python installation, so we need to install it after installing Python on our system. We can do it with pip, using the command
pip install numpy, or by using Conda.
We are done with the installation and now we can jump right into NumPy. First, let’s start with the most important object in NumPy, the ndarray or multi-dimensional array. A multi-dimensional array is an array of arrays. In these terms, the array
[1,2,3] is a one-dimensional array because it contains only one row. The array below is a two-dimensional array, as it contains multiple rows as well as multiple columns.
[[1 2 3]
[4 5 6]
[7 8 9]]
Let’s do some coding now. Here I am using Jupyter Notebook to run my code; you can use any IDE available and best suited to you.
We start by importing NumPy. In the following code, I am renaming the package to
np for convenience.
import numpy as np
Now, in order to create an array in NumPy, we use its array function as shown below:
array = np.array([1,2,3])
Output: [1 2 3]
This is an example of a one-dimensional array.
Another way to create an array in NumPy is by using the zeros function:
zeros = np.zeros(3)
Output: [0. 0. 0.]
If you look closely at the output, the generated array contains three zeros, but they are floats: by default, NumPy creates arrays of float values.
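As a minimal sketch of the default described above, the snippet below compares the default float result with an integer array requested through the dtype argument. The variable names are illustrative only.

```python
import numpy as np

# np.zeros produces float64 values unless a dtype is given.
float_zeros = np.zeros(3)
# Passing dtype=int requests integer zeros instead.
int_zeros = np.zeros(3, dtype=int)

print(float_zeros, float_zeros.dtype)  # [0. 0. 0.] float64
print(int_zeros, int_zeros.dtype)      # exact integer width depends on the platform
```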
Going back to the first example, we passed a list literal to NumPy’s
array function. We can also store the list in a variable and pass that variable to the
array function; the output will be the same.
my_list = [1,2,3]
array = np.array(my_list)
Output: [1 2 3]
Now, let’s look into how to create a two-dimensional array using NumPy. Instead of passing a flat list, we pass a list of tuples or a list of lists, as shown below.
two_dim_array = np.array([(1,2,3), (4,5,6), (7,8,9)])
[[1 2 3]
[4 5 6]
[7 8 9]]
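Note that every row must have the same number of columns. Otherwise, older NumPy versions create a one-dimensional array of list objects, as in the output below; NumPy 1.24 and later raise a ValueError instead unless dtype=object is passed.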
arr = np.array([[1,2,3], [4,6], [7,8,9]])
Output: [list([1, 2, 3]) list([4, 6]) list([7, 8, 9])]
Now, to create an array of evenly spaced values over a range, which is very useful for making plots, we use the linspace function:
range_array = np.linspace(0, 10, 4)
Output: [ 0. 3.33333333 6.66666667 10. ]
Here, the first argument is the starting point, the second is the endpoint (included by default), and the last argument defines how many elements you want in your array.
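To make the spacing concrete, here is a small sketch that repeats the call above and checks the gap between neighbouring values. The step formula assumes the default behaviour, where the endpoint is included.

```python
import numpy as np

# Four evenly spaced values from 0 to 10, both endpoints included.
range_array = np.linspace(0, 10, 4)
print(range_array)

# With the endpoint included, the step is (stop - start) / (num - 1).
step = (10 - 0) / (4 - 1)
print(np.allclose(np.diff(range_array), step))  # True
```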
Now, to create random arrays we can use NumPy’s
random module. Here, I’ve created an array of random integers, and, therefore, used
randint, where I first specified the exclusive upper bound (values are drawn from 0 up to, but not including, 15) and then the size of my array.
random_array = np.random.randint(15, size=10)
Output: [ 7 11 8 2 6 4 9 6 10 9]
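The following sketch checks the range of values that randint produces for this call. The seed is an arbitrary choice added here only so that repeated runs give the same numbers; it is not part of the original example.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only to make the run repeatable
random_array = np.random.randint(15, size=10)
print(random_array)

# Every value lies between 0 and 14; the upper bound 15 itself is never drawn.
print(random_array.min() >= 0, random_array.max() < 15)  # True True
```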
Now we know the basics of how to create arrays in NumPy, so let’s look into some basic operations. First, we will start by finding the size and shape of an array. Size gives the total number of elements in an array, whereas shape gives the length of the array along each dimension.
For a one-dimensional array, the shape would be
(n, ), where
n is the number of elements in your array.
For a two-dimensional array, the shape would be (n, m), where
n is the number of rows and
m is the number of columns in your array.
Output: (3, 3)
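As a minimal sketch of the size and shape attributes described above, assuming the one-dimensional and two-dimensional arrays created earlier:

```python
import numpy as np

array = np.array([1, 2, 3])
two_dim_array = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)])

print(array.size)           # 3
print(array.shape)          # (3,)
print(two_dim_array.size)   # 9
print(two_dim_array.shape)  # (3, 3)
```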
If we want to change the shape of an array we can easily do it with the
reshape function. It will look something like this:
two_dim_array = np.array([(1,2,3,4), (5,6,7,8)])
two_dim_array = two_dim_array.reshape(4,2)
We need to make sure that the new shape holds the same total number of elements as the old one. For example, here, we can change the shape from (2,4) to (4,2) but cannot change it to (4,3) because, for that, we’d need 12 elements and we have only 8. Doing so will give an error as shown below.
ValueError: cannot reshape array of size 8 into shape (4,3)
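The sketch below puts both cases side by side: a reshape that keeps the element count and one that does not. The reshape_failed flag is a helper introduced here for illustration.

```python
import numpy as np

two_dim_array = np.array([(1, 2, 3, 4), (5, 6, 7, 8)])

# 2 x 4 and 4 x 2 both hold 8 elements, so this works.
reshaped = two_dim_array.reshape(4, 2)
print(reshaped.shape)  # (4, 2)

# 4 x 3 would need 12 elements, so NumPy raises a ValueError.
try:
    two_dim_array.reshape(4, 3)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print(error)
```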
To check the number of dimensions of our array, we can use the ndim attribute.
Now, to get values from an array, we use indexing and slicing, which can be done in various ways. For example,
array[1] will fetch the second element of my array, but if we want a range we can use
array[0:2], which will give us the first two elements (the stop index is excluded). For the last value of the array, we can use
array[-1], which is similar to the standard method of getting elements from a list in Python.
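A short sketch of these access patterns on the three-element array from earlier may help. Note that the stop index of a slice is excluded from the result.

```python
import numpy as np

array = np.array([1, 2, 3])

print(array[1])    # 2, the second element
print(array[0:2])  # [1 2], the first two elements
print(array[0:1])  # [1], because the stop index is excluded
print(array[-1])   # 3, the last element
```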
Now, to find the sum, all we have to use is the
sum() function, but if we want to find the sum along an axis we can pass an argument for the axis.
Output: [ 6 8 10 12]
Output: [10 26]
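The two outputs above are consistent with summing the 2 x 4 array used in the reshape example along each axis. A sketch under that assumption:

```python
import numpy as np

two_dim_array = np.array([(1, 2, 3, 4), (5, 6, 7, 8)])

print(two_dim_array.sum())        # 36, the sum of all elements
print(two_dim_array.sum(axis=0))  # [ 6  8 10 12], one sum per column
print(two_dim_array.sum(axis=1))  # [10 26], one sum per row
```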
Now, to add two arrays, all we have to use is the + operator. For example:
print(two_dim_array + two_dim_array)
[[ 2 4 6 8]
[10 12 14 16]]
Similarly, we can use other operators as well, such as multiplication, subtraction, and division.
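As a brief sketch of those other operators, the snippet below applies them to the same 2 x 4 array. The short name a is used here only for brevity. Each operator works element by element.

```python
import numpy as np

a = np.array([(1, 2, 3, 4), (5, 6, 7, 8)])

print(a - a)  # all zeros
print(a * a)  # element-wise squares, not a matrix product
print(a / a)  # all ones, returned as floats
```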
We have many other operations present in NumPy like
sqrt, which will give us the square root of every element, and
std, which is used to find the standard deviation. To explore more of these operations, visit NumPy’s documentation.
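For illustration, here is a minimal sketch of both functions on a small array of perfect squares. The values are chosen arbitrarily. By default, np.std computes the population standard deviation.

```python
import numpy as np

values = np.array([1, 4, 9, 16])

print(np.sqrt(values))  # [1. 2. 3. 4.]
print(np.std(values))   # population standard deviation (ddof=0)
```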
And that’s it for the introduction of NumPy.
Originally published by Prabhat Kashyap at https://dzone.com
A famous general is thought to have said, “A good sketch is better than a long speech.” That advice may have come from the battlefield, but it’s applicable in lots of other areas — including data science. “Sketching” out our data by visualizing it using ggplot2 in R is more impactful than simply describing the trends we find.
This is why we visualize data. We visualize data because it’s easier to learn from something that we can see rather than read. And thankfully for data analysts and data scientists who use R, there’s a tidyverse package called ggplot2 that makes data visualization a snap!
In this blog post, we’ll learn how to take some data and produce a visualization using R. To work through it, it’s best if you already have an understanding of R programming syntax, but you don’t need to be an expert or have any prior experience working with ggplot2.
Welcome to DataFlair! In this tutorial, we will learn about NumPy’s features and their importance.
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
NumPy (Numerical Python) is an open-source core Python library for scientific computing. It is a general-purpose array and matrix processing package. Python is slower than Fortran and other compiled languages at looping. To overcome this, we use NumPy, which runs repetitive element-by-element operations in precompiled code.
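The contrast between looping in Python and letting NumPy do the looping can be sketched as follows. The input size of 1000 and the variable names are arbitrary choices for illustration, and no timing is measured here.

```python
import numpy as np

numbers = list(range(1000))

# Pure-Python loop: the interpreter handles each element in turn.
squares_loop = [n * n for n in numbers]

# NumPy: one expression, with the loop running in compiled code.
squares_numpy = np.array(numbers) ** 2

print(squares_loop[:5])
print(squares_numpy[:5])
```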
These are the important features of NumPy:
The homogeneous array object is the most important feature of the NumPy library. We perform all operations on the array elements. The arrays in NumPy can be one-dimensional or multi-dimensional.
A one-dimensional array consists of a single row or column. Its elements are all of the same type.
A multi-dimensional array has several rows and columns, with a structure similar to an Excel sheet. In NumPy terms, the rows and the columns are the two dimensions (axes) of such an array. The elements are again homogeneous.
NumPy provides tools for working with code written in other languages, such as C, C++, and Fortran. We can hence integrate functionality available in various programming languages, which helps implement inter-platform functions.
In this NumPy tutorial, we will learn about NumPy applications.
NumPy is a basic-level external library in Python used for complex mathematical operations. NumPy overcomes slower execution with the use of multi-dimensional array objects. It has built-in functions for manipulating arrays, and we can turn different algorithms into functions that are applied to arrays. NumPy’s applications are not limited to the library itself: it is a very diverse library with a wide range of applications in other sectors. NumPy can be put to use in Data Science, Data Analysis, and Machine Learning. It is also a base for other Python libraries, which use the functionality in NumPy to increase their capabilities.
Arrays in NumPy are comparable to lists in Python. Unlike Python lists, however, NumPy arrays are homogeneous sets of elements, and this is their most important feature. This uniformity supports mathematical operations that would not be possible with heterogeneous elements. Another benefit of using NumPy arrays is the large number of functions that apply to them. These functions could not be applied directly to Python lists because lists can hold elements of mixed types.
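To illustrate homogeneity, the sketch below builds an array from a Python list that mixes integers and a float. The list contents are arbitrary. NumPy converts every element to one common type.

```python
import numpy as np

mixed_list = [1, 2.5, 3]  # a Python list may mix ints and floats
array = np.array(mixed_list)

print(array.dtype)  # float64, a single type for all elements
print(array)        # [1.  2.5 3. ]
```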
Arrays in NumPy are objects. Each array stores its elements in a single typed data buffer, so it generally needs less memory than a Python list holding the same values. NumPy has features to avoid memory wastage in the data buffer, such as copies, views, and indexing, which help save a lot of memory. Basic slicing returns a view of the original array, which reuses the data instead of copying it. An array also specifies the data type of its elements, which leads to code optimization.
We can also create multi-dimensional arrays in NumPy. These arrays have multiple rows and columns, which is what makes them multi-dimensional. Multi-dimensional arrays are used to create matrices, which are easy to work with. With the use of matrices the code also becomes memory efficient. We have a matrix module to perform various operations on these matrices.
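The difference between a view and a copy can be sketched as follows. The array contents are arbitrary, and np.shares_memory is used only to confirm which result reuses the original buffer.

```python
import numpy as np

original = np.arange(6)
view = original[1:4]            # basic slicing returns a view
copied = original[1:4].copy()   # an explicit copy owns its own data

view[0] = 100  # writing through the view changes the original array

print(original)  # [  0 100   2   3   4   5]
print(copied)    # [1 2 3]
print(np.shares_memory(original, view), np.shares_memory(original, copied))  # True False
```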
Working with NumPy also includes easy to use functions for mathematical computations on the array data set. We have many modules for performing basic and special mathematical functions in NumPy. There are functions for Linear Algebra, bitwise operations, Fourier transform, arithmetic operations, string operations, etc.
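A few of these function families can be sketched in one short snippet. The inputs are arbitrary, and only one representative function per family is shown.

```python
import numpy as np

# Linear algebra: inverse of a small diagonal matrix.
matrix = np.array([[2.0, 0.0], [0.0, 4.0]])
inverse = np.linalg.inv(matrix)
print(inverse)

# Bitwise operations: 12 is 1100 and 10 is 1010 in binary.
print(np.bitwise_and(12, 10))  # 8

# Fourier transform of a constant signal of length 4.
spectrum = np.fft.fft(np.ones(4))
print(spectrum.real)
```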
In this tutorial we’ll learn how to begin programming with R using RStudio. We’ll install R and RStudio, an extremely popular development environment for R. We’ll learn the key RStudio features in order to start programming in R on our own.
If you already know how to use RStudio and want to learn some tips, tricks, and shortcuts, check out this Dataquest blog post.
What exactly is clean data? Clean data is accurate, complete, and in a format that is ready to analyze. Characteristics of clean data include data that are:
Common symptoms of messy data include data that contain:
In this blog post, we will work with five property-sales datasets that are publicly available on the New York City Department of Finance Rolling Sales Data website. We encourage you to download the datasets and follow along! Each file contains one year of real estate sales data for one of New York City’s five boroughs. We will work with the following Microsoft Excel files:
As we work through this blog post, imagine that you are helping a friend launch their home-inspection business in New York City. You offer to help them by analyzing the data to better understand the real-estate market. But you realize that before you can analyze the data in R, you will need to diagnose and clean it first. And before you can diagnose the data, you will need to load it into R!
Benefits of using tidyverse tools are often evident in the data-loading process. In many cases, the tidyverse package
readxl will clean some data for you as Microsoft Excel data is loaded into R. If you are working with CSV data, the tidyverse
readr package function
read_csv() is the function to use (we’ll cover that later).
Let’s look at an example. Here’s how the Excel file for the Brooklyn borough looks:
The Brooklyn Excel file
Now let’s load the Brooklyn dataset into R from an Excel file. We’ll use the
readxl package. We specify the function argument
skip = 4 because the row that we want to use as the header (i.e. column names) is actually row 5. We can ignore the first four rows entirely and load the data into R beginning at row 5. Here’s the code:
library(readxl) # Load Excel files
brooklyn <- read_excel("rollingsales_brooklyn.xls", skip = 4)
Note we saved this dataset with the variable name
brooklyn for future use.
The tidyverse offers a user-friendly way to view this data with the
glimpse() function that is part of the
tibble package. To use this package, we will need to load it for use in our current session. But rather than loading this package alone, we can load many of the tidyverse packages at one time. If you do not have the tidyverse collection of packages, install it on your machine using the following command in your R or RStudio session:
Once the package is installed, load it to memory:
Now that tidyverse is loaded into memory, take a “glimpse” of the Brooklyn dataset:
glimpse(brooklyn)
## Observations: 20,185
## Variables: 21
## $ BOROUGH <chr> "3", "3", "3", "3", "3", "3", "…
## $ NEIGHBORHOOD <chr> "BATH BEACH", "BATH BEACH", "BA…
## $ `BUILDING CLASS CATEGORY` <chr> "01 ONE FAMILY DWELLINGS", "01 …
## $ `TAX CLASS AT PRESENT` <chr> "1", "1", "1", "1", "1", "1", "…
## $ BLOCK <dbl> 6359, 6360, 6364, 6367, 6371, 6…
## $ LOT <dbl> 70, 48, 74, 24, 19, 32, 65, 20,…
## $ `EASE-MENT` <lgl> NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `BUILDING CLASS AT PRESENT` <chr> "S1", "A5", "A5", "A9", "A9", "…
## $ ADDRESS <chr> "8684 15TH AVENUE", "14 BAY 10T…
## $ `APARTMENT NUMBER` <chr> NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `ZIP CODE` <dbl> 11228, 11228, 11214, 11214, 112…
## $ `RESIDENTIAL UNITS` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1…
## $ `COMMERCIAL UNITS` <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `TOTAL UNITS` <dbl> 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1…
## $ `LAND SQUARE FEET` <dbl> 1933, 2513, 2492, 1571, 2320, 3…
## $ `GROSS SQUARE FEET` <dbl> 4080, 1428, 972, 1456, 1566, 22…
## $ `YEAR BUILT` <dbl> 1930, 1930, 1950, 1935, 1930, 1…
## $ `TAX CLASS AT TIME OF SALE` <chr> "1", "1", "1", "1", "1", "1", "…
## $ `BUILDING CLASS AT TIME OF SALE` <chr> "S1", "A5", "A5", "A9", "A9", "…
## $ `SALE PRICE` <dbl> 1300000, 849000, 0, 830000, 0, …
## $ `SALE DATE` <dttm> 2020-04-28, 2020-03-18, 2019-0…
The glimpse() function provides a user-friendly way to view the column names and data types for all columns, or variables, in the data frame. With this function, we are also able to view the first few observations in the data frame. This data frame has 20,185 observations, or property sales records. And there are 21 variables, or columns.