1559018017

# NumPy Tutorial for Beginners

I will walk you through the basics of NumPy. If you want to do machine learning, knowledge of NumPy is necessary. It is one of the most widely used Python libraries and the most useful one when you are dealing with numbers in Python. NumPy operations on arrays typically run much faster than equivalent loops in plain Python, and the library comes with a large number of built-in functions.

Advantages of using NumPy with Python:

• Array-oriented computing.
• Efficiently implemented multi-dimensional arrays.
• Designed for scientific computation.

First, let’s talk about installation. NumPy is not part of the basic Python installation, so we need to install it after installing Python on our system. We can do that with pip, using the command `pip install numpy`, or by using Conda.

We are done with the installation and now we can jump right into NumPy. First, let’s start with the most important object in NumPy, the ndarray or multi-dimensional array. A multi-dimensional array is an array of arrays. For example, `[1,2,3]` is a one-dimensional array because it contains only one row. The array below is two-dimensional, as it contains multiple rows as well as multiple columns.

```[[1 2 3]
[4 5 6]
[7 8 9]]
```

Let’s do some coding now. Here I am using Jupyter Notebook to run my code; you can use any IDE available and best suited to you.

We start by importing the package with `import numpy` (the name is lowercase, and Python is case-sensitive).

In the following code, I am renaming the package to `np` for convenience.

```import numpy as np
```

Now, in order to create an array in NumPy, we use its array function as shown below:

```array = np.array([1,2,3])
print(array)
Output: [1 2 3]
```

This is an example of a one-dimensional array.

Another way to create an array in NumPy is by using the `zeros` function.

```zeros = np.zeros(3)
print(zeros)
Output: [0. 0. 0.]
```

If you look closely at the output, the generated array contains three zeros, but each value is a float: by default, NumPy creates arrays of float values.

```type(zeros[0])
Output: numpy.float64
```
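If integers are needed instead, `zeros` accepts a `dtype` argument. The following is a minimal sketch; the variable names `float_zeros` and `int_zeros` are only illustrative.

```python
import numpy as np

float_zeros = np.zeros(3)            # default dtype is float64
int_zeros = np.zeros(3, dtype=int)   # request integers explicitly

print(float_zeros.dtype)    # float64
print(int_zeros.tolist())   # [0, 0, 0]
```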

Going back to the first example, we passed a list literal to NumPy’s `array` function. We can also pass a variable that holds a list, and the output will be the same.

```my_list = [1,2,3]
array = np.array(my_list)
print(array)
Output: [1 2 3]
```

Now, let’s look at how to create a two-dimensional array using NumPy. Instead of passing a single list, we pass a list of tuples or a list of lists, as shown below.

```two_dim_array = np.array([(1,2,3), (4,5,6), (7,8,9)])
print(two_dim_array)
Output:
[[1 2 3]
[4 5 6]
[7 8 9]]
```

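Note that every row should have the same number of elements. Older NumPy versions silently created a one-dimensional array of list objects from ragged input; recent versions (1.24 and later) raise a `ValueError` unless you explicitly pass `dtype=object`.

```arr = np.array([[1,2,3], [4,6], [7,8,9]], dtype=object)
print(arr)
Output: [list([1, 2, 3]) list([4, 6]) list([7, 8, 9])]
```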

Now, to create an array of evenly spaced values over a range, which is very handy for making plots, we use the `linspace` function.

```range_array = np.linspace(0, 10, 4)
print(range_array)
Output: [ 0.          3.33333333 6.66666667 10.        ]
```

Here, the first argument is the starting point, the second is the endpoint (included by default), and the last argument defines how many elements you want in your array.
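To make the role of the third argument concrete, here is a small sketch that compares `linspace` with `arange`, a related NumPy function that this tutorial does not otherwise cover. `linspace` fixes the number of points, while `arange` fixes the step size.

```python
import numpy as np

# linspace: choose how many points; the endpoint is included by default
points = np.linspace(0, 10, 5)
print(points.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]

# arange: choose the step size; the stop value is excluded
steps = np.arange(0, 10, 2.5)
print(steps.tolist())    # [0.0, 2.5, 5.0, 7.5]
```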

Now, to create random arrays we can use the `random` module. Here, I’ve created an array of random integers, and therefore used `randint`, where I first specified the upper bound (exclusive, so values range from 0 to 14) and then the size of my array.

```random_array = np.random.randint(15, size=10)
print(random_array)
Output: [ 7 11  8 2 6 4 9 6 10  9]
```
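Because the values are random, your output will differ from the one above. If reproducible results matter, one option is NumPy’s `Generator` interface with a fixed seed. This is a sketch under that assumption, not part of the original example; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)      # arbitrary seed, chosen for this sketch
samples = rng.integers(0, 15, size=10)    # integers in [0, 15); 15 is excluded

print(samples.shape)                            # (10,)
print(samples.min() >= 0, samples.max() < 15)   # True True
```

Running the script again with the same seed should produce the same ten integers.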

Now we know the basics of how to create arrays in NumPy, so let’s look into some basic operations. First, we will find the size and shape of an array. Size gives the total number of elements in an array, whereas shape gives the number of elements along each dimension.

For a one-dimensional array, the shape would be `(n, )`, where `n` is the number of elements in your array.

For a two-dimensional array, the shape would be `(n,m)`, where `n` is the number of rows and `m` is the number of columns in your array.

```print(array.size)
Output: 3
print(array.shape)
Output: (3,)
print(two_dim_array.size)
Output: 9
print(two_dim_array.shape)
Output: (3, 3)
```

If we want to change the shape of an array, we can easily do it with the `reshape` function. It looks something like this:

```two_dim_array = np.array([(1,2,3,4), (5,6,7,8)])
two_dim_array = two_dim_array.reshape(4,2)
print(two_dim_array)
Output:
[[1 2]
[3 4]
[5 6]
[7 8]]
```

We need to make sure that the new shape holds the same total number of elements. For example, here, we can change the shape from (2,4) to (4,2), but we cannot change it to (4,3) because, for that, we’d need 12 elements and we have only 8. Doing so will give an error as shown below.

```ValueError: cannot reshape array of size 8 into shape (4,3)
```
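The element-count rule can be tried out directly. The sketch below also uses `-1`, which asks NumPy to infer one dimension from the total size; the names `data` and `inferred` are only illustrative.

```python
import numpy as np

data = np.arange(1, 9)            # 8 elements: 1 through 8

# -1 lets NumPy work out the missing dimension (8 / 4 = 2)
inferred = data.reshape(4, -1)
print(inferred.shape)             # (4, 2)

# An incompatible shape raises ValueError instead of padding or truncating
try:
    data.reshape(4, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```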

To check the number of dimensions of our array, we can use the `ndim` attribute.

```print(two_dim_array.ndim)
Output: 2
```

Now, values can be read from an array in various ways using indexing and slicing. For example, `array[1]` will fetch the second element of my array, but if we want a range we can use `array[0:2]`, which will give us the first two elements (the stop index is excluded). For the last value of the array, we can use `array[-1]`, which is similar to the standard method of getting elements from a list in Python.
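A short sketch of these access patterns, extended to a two-dimensional array (the name `two_dim` is only illustrative):

```python
import numpy as np

array = np.array([1, 2, 3])

print(array[1])      # 2, the second element
print(array[0:2])    # [1 2], the stop index 2 is excluded
print(array[-1])     # 3, the last element

two_dim = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(two_dim[1, 2])   # 6, the element at row 1, column 2
print(two_dim[:, 0])   # [1 4 7], the first column
```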

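Now, to find the sum of all elements, all we have to use is the `sum()` function, but if we want the sum along an axis we can pass an `axis` argument. Here we recreate the original 2×4 array first, since `two_dim_array` was reshaped above.

```two_dim_array = np.array([(1,2,3,4), (5,6,7,8)])
print(two_dim_array.sum(axis=0))
Output: [ 6  8 10 12]
print(two_dim_array.sum(axis=1))
Output: [10 26]
```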

Now, to add two arrays, all we have to use is the `+` operator. For example:

```print(two_dim_array + two_dim_array)
Output:
[[ 2  4 6 8]
[10 12 14 16]]
```

Similarly, we can use other operators as well, such as multiplication, subtraction, and division; each is applied element by element.

NumPy has many other operations, like `sqrt`, which gives us the square root of every element, and `std`, which is used to find the standard deviation. To explore these operations further, visit NumPy’s documentation.
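As a quick illustration of `sqrt` and `std`, here is a sketch on a small array (the sample values are made up for the example):

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

roots = np.sqrt(values)      # element-wise square root
print(roots.tolist())        # [1.0, 2.0, 3.0, 4.0]

spread = np.std(values)      # population standard deviation (ddof=0 by default)
print(spread)

print((values * 2).tolist())        # [2.0, 8.0, 18.0, 32.0]
print((values / values).tolist())   # [1.0, 1.0, 1.0, 1.0]
```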

And that’s it for the introduction to NumPy.

#numpy #python

1599097440

## Data Visualization in R with ggplot2: A Beginner Tutorial

A famous general is thought to have said, “A good sketch is better than a long speech.” That advice may have come from the battlefield, but it’s applicable in lots of other areas — including data science. “Sketching” out our data by visualizing it using ggplot2 in R is more impactful than simply describing the trends we find.

This is why we visualize data. We visualize data because it’s easier to learn from something that we can see rather than read. And thankfully for data analysts and data scientists who use R, there’s a tidyverse package called ggplot2 that makes data visualization a snap!

In this blog post, we’ll learn how to take some data and produce a visualization using R. To work through it, it’s best if you already have an understanding of R programming syntax, but you don’t need to be an expert or have any prior experience working with ggplot2.

#data science tutorials #beginner #ggplot2 #r #r tutorial #r tutorials #rstats #tutorial #tutorials

1595235240

## NumPy Applications - Uses of Numpy

In this NumPy tutorial, we will learn NumPy applications.

NumPy is a foundational external library in Python used for complex mathematical operations. NumPy overcomes slow execution through the use of multi-dimensional array objects and has built-in functions for manipulating arrays. We can also turn different algorithms into functions that are applied to arrays. NumPy’s applications are not limited to the library itself: it is a very diverse library with a wide range of applications in other fields. NumPy can be put to use in Data Science, Data Analysis, and Machine Learning. It is also a base for other Python libraries, which use the functionality in NumPy to extend their capabilities.

#### 1. An alternative for lists and arrays in Python

Arrays in NumPy are comparable to lists in Python. Unlike Python lists, however, NumPy arrays are homogeneous sets of elements: every element has the same data type. This is their most important feature and what differentiates them from Python lists. It maintains uniformity for mathematical operations that would not be possible with heterogeneous elements. Another benefit of using NumPy arrays is the large number of functions that apply to them. These functions could not be applied in the same way to Python lists because of their heterogeneous nature.
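The homogeneity can be observed directly: when a list with mixed numeric types is converted, NumPy picks one data type for the whole array. A minimal sketch (the names are illustrative):

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # a Python list can hold an int next to a float
converted = np.array(mixed_list)    # NumPy picks one dtype that fits every element

print(converted.dtype)              # float64
print(converted.tolist())           # [1.0, 2.5, 3.0]

# Arithmetic then applies uniformly to the whole array
print((converted * 2).tolist())     # [2.0, 5.0, 6.0]
```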

#### 2. NumPy maintains minimal memory

Arrays in NumPy are objects that store their elements in a single typed data buffer rather than as separate Python objects. Python creates and deletes these array objects as required. Hence, they generally need less memory than Python lists holding the same values. NumPy has features to avoid memory wastage in the data buffer: it distinguishes between copies and views, and basic slicing returns a view of the original array, which reuses the data instead of duplicating it. Each array also specifies the data type of its elements, which leads to code optimization.
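The difference between a view and a copy, and the effect of the data type on memory use, can be checked with a few lines. This is a sketch; `np.shares_memory` and the `nbytes` attribute are standard NumPy, while the variable names are illustrative.

```python
import numpy as np

original = np.arange(10)
window = original[2:5]          # basic slicing returns a view, not a copy

window[0] = 99                  # writing through the view changes the original
print(original[2])              # 99
print(np.shares_memory(original, window))        # True

independent = original[2:5].copy()               # an explicit copy owns its data
print(np.shares_memory(original, independent))   # False

# The element type determines how many bytes the buffer needs
small = np.zeros(1000, dtype=np.int8)
large = np.zeros(1000, dtype=np.float64)
print(small.nbytes, large.nbytes)                # 1000 8000
```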

#### 3. Using NumPy for multi-dimensional arrays

We can also create multi-dimensional arrays in NumPy. These arrays have more than one dimension, for example multiple rows and columns. Multi-dimensional arrays make it possible to create matrices, which are easy to work with and keep the code memory efficient. NumPy also has a matrix module for performing various operations on these matrices.

#### 4. Mathematical operations with NumPy

Working with NumPy also means having easy-to-use functions for mathematical computations on array data. NumPy has many modules for performing basic and special mathematical functions. There are functions for linear algebra, bitwise operations, Fourier transforms, arithmetic operations, string operations, and so on.
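A brief sketch touching three of these areas: `np.linalg` for linear algebra, `np.bitwise_and` for bitwise operations, and `np.fft` for the Fourier transform. The input values are made up for illustration.

```python
import numpy as np

# Linear algebra: invert a small diagonal matrix
matrix = np.array([[2.0, 0.0], [0.0, 4.0]])
inverse = np.linalg.inv(matrix)
print(np.allclose(inverse, [[0.5, 0.0], [0.0, 0.25]]))   # True

# Bitwise operations: element-wise AND of two integer arrays
flags = np.bitwise_and(np.array([12, 10]), np.array([10, 6]))
print(flags.tolist())                                     # [8, 2]

# Fourier transform: a unit impulse has a flat spectrum
spectrum = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))
print(np.allclose(spectrum, 1.0))                         # True
```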

#numpy tutorials #applications of numpy #numpy applications #uses of numpy #numpy

1595235180

## NumPy Features - Why we should use Numpy?

Welcome to DataFlair!!! In this tutorial, we will learn about NumPy features and their importance.

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

NumPy (Numerical Python) is an open-source core Python library for scientific computations. It is a general-purpose array and matrix processing package. Python is slower than Fortran and other compiled languages at looping. To overcome this, we use NumPy, which runs repetitive element-by-element operations in compiled code.
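The idea of replacing an explicit loop with an array operation can be sketched as follows. Both versions compute the same sum of squares; timings are deliberately left out because they depend on the machine.

```python
import numpy as np

values = np.arange(1_000)

# Pure-Python loop: the interpreter handles one element at a time
loop_total = 0
for v in values.tolist():
    loop_total += v * v

# Vectorized form: the same arithmetic runs inside NumPy's compiled code
vector_total = int(np.sum(values * values))

print(loop_total == vector_total)   # True
```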

#### NumPy Features

These are the important features of NumPy:

#### 1. High-performance N-dimensional array object

This is the most important feature of the NumPy library. It is the homogeneous array object. We perform all the operations on the array elements. The arrays in NumPy can be one dimensional or multidimensional.

#### a. One dimensional array

The one-dimensional array is an array consisting of a single row or column. The elements of the array are of homogeneous nature.

#### b. Multidimensional array

In this case, we have multiple rows and columns. We consider each axis (for example, rows and columns) as a dimension. A two-dimensional array has a structure similar to an Excel sheet. The elements are homogeneous.
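The number of dimensions equals the number of axes, which NumPy reports through the `ndim` attribute. A small sketch with illustrative names:

```python
import numpy as np

row = np.array([1, 2, 3])                   # one axis
table = np.array([[1, 2, 3], [4, 5, 6]])    # two axes: rows and columns
cube = np.zeros((2, 3, 4))                  # three axes

print(row.ndim, row.shape)        # 1 (3,)
print(table.ndim, table.shape)    # 2 (2, 3)
print(cube.ndim, cube.shape)      # 3 (2, 3, 4)
```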

#### 2. It contains tools for integrating code from C/C++ and Fortran

We can use the functions in NumPy to work with code written in other languages. We can hence integrate the functionalities available in various programming languages. This helps implement inter-platform functions.

#numpy tutorials #features of numpy #numpy features #why use numpy #numpy

1596728880

## Tutorial: Getting Started with R and RStudio

In this tutorial we’ll learn how to begin programming with R using RStudio. We’ll install R and RStudio, an extremely popular development environment for R. We’ll learn the key RStudio features in order to start programming in R on our own.

If you already know how to use RStudio and want to learn some tips, tricks, and shortcuts, check out this Dataquest blog post.

#data science tutorials #beginner #r tutorial #r tutorials #rstats #tutorial #tutorials

1596513720

## 1. Characteristics of Clean Data and Messy Data

What exactly is clean data? Clean data is accurate, complete, and in a format that is ready to analyze. Characteristics of clean data include data that are:

• Free of duplicate rows/values
• Error-free (e.g. free of misspellings)
• Relevant (e.g. free of special characters)
• The appropriate data type for analysis
• Free of outliers (or only contain outliers that have been identified/understood), and
• Organized in a “tidy data” structure

Common symptoms of messy data include data that contain:

• Special characters (e.g. commas in numeric values)
• Numeric values stored as text/character data types
• Duplicate rows
• Misspellings
• Inaccuracies
• White space
• Missing data
• Zeros instead of null values

## 2. Motivation

In this blog post, we will work with five property-sales datasets that are publicly available on the New York City Department of Finance Rolling Sales Data website. We encourage you to download the datasets and follow along! Each file contains one year of real estate sales data for one of New York City’s five boroughs. We will work with the following Microsoft Excel files:

• rollingsales_bronx.xls
• rollingsales_brooklyn.xls
• rollingsales_manhattan.xls
• rollingsales_queens.xls
• rollingsales_statenisland.xls

As we work through this blog post, imagine that you are helping a friend launch their home-inspection business in New York City. You offer to help them by analyzing the data to better understand the real-estate market. But you realize that before you can analyze the data in R, you will need to diagnose and clean it first. And before you can diagnose the data, you will need to load it into R!

Benefits of using tidyverse tools are often evident in the data-loading process. In many cases, the tidyverse package `readxl` will clean some data for you as Microsoft Excel data is loaded into R. If you are working with CSV data, the tidyverse `readr` package function `read_csv()` is the function to use (we’ll cover that later).

Let’s look at an example. Here’s how the Excel file for the Brooklyn borough looks:

The Brooklyn Excel file

Now let’s load the Brooklyn dataset into R from an Excel file. We’ll use the `readxl` package. We specify the function argument `skip = 4` because the row that we want to use as the header (i.e. column names) is actually row 5. We can ignore the first four rows entirely and load the data into R beginning at row 5. Here’s the code:

``````library(readxl) # Load Excel files
brooklyn <- read_excel("rollingsales_brooklyn.xls", skip = 4)
``````

Note we saved this dataset with the variable name `brooklyn` for future use.

## 4. View the Data with tibble::glimpse()

The tidyverse offers a user-friendly way to view this data with the `glimpse()` function that is part of the `tibble` package. To use this package, we will need to load it for use in our current session. But rather than loading this package alone, we can load many of the tidyverse packages at one time. If you do not have the tidyverse collection of packages, install it on your machine using the following command in your R or RStudio session:

``````install.packages("tidyverse")
``````

Once the package is installed, load it to memory:

``````library(tidyverse)
``````

Now that `tidyverse` is loaded into memory, take a “glimpse” of the Brooklyn dataset:

``````glimpse(brooklyn)
## Observations: 20,185
## Variables: 21
## $ BOROUGH <chr> "3", "3", "3", "3", "3", "3", "…
## $ NEIGHBORHOOD <chr> "BATH BEACH", "BATH BEACH", "BA…
## $ `BUILDING CLASS CATEGORY` <chr> "01 ONE FAMILY DWELLINGS", "01 …
## $ `TAX CLASS AT PRESENT` <chr> "1", "1", "1", "1", "1", "1", "…
## $ BLOCK <dbl> 6359, 6360, 6364, 6367, 6371, 6…
## $ LOT <dbl> 70, 48, 74, 24, 19, 32, 65, 20,…
## $ `EASE-MENT` <lgl> NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `BUILDING CLASS AT PRESENT` <chr> "S1", "A5", "A5", "A9", "A9", "…
## $ ADDRESS <chr> "8684 15TH AVENUE", "14 BAY 10T…
## $ `APARTMENT NUMBER` <chr> NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `ZIP CODE` <dbl> 11228, 11228, 11214, 11214, 112…
## $ `RESIDENTIAL UNITS` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1…
## $ `COMMERCIAL UNITS` <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `TOTAL UNITS` <dbl> 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1…
## $ `LAND SQUARE FEET` <dbl> 1933, 2513, 2492, 1571, 2320, 3…
## $ `GROSS SQUARE FEET` <dbl> 4080, 1428, 972, 1456, 1566, 22…
## $ `YEAR BUILT` <dbl> 1930, 1930, 1950, 1935, 1930, 1…
## $ `TAX CLASS AT TIME OF SALE` <chr> "1", "1", "1", "1", "1", "1", "…
## $ `BUILDING CLASS AT TIME OF SALE` <chr> "S1", "A5", "A5", "A9", "A9", "…
## $ `SALE PRICE` <dbl> 1300000, 849000, 0, 830000, 0, …
## $ `SALE DATE` <dttm> 2020-04-28, 2020-03-18, 2019-0…
``````

The `glimpse()` function provides a user-friendly way to view the column names and data types for all columns, or variables, in the data frame. With this function, we are also able to view the first few observations in the data frame. This data frame has 20,185 observations, or property sales records. And there are 21 variables, or columns.

#data science tutorials #beginner #r #r tutorial #r tutorials #rstats #tidyverse #tutorial #tutorials