1593930180

# Don’t let missing values ruin your analysis output: deal with them!

Missing values, or the values used to replace them, can lead to huge errors in your analysis output, whether it is a machine learning model, a set of KPIs, or a report.

Analysts often treat missing values as if there were only one type. That is not the case: there are three types of missing values, and there are different ways of dealing with each of them.

# Types of null values

Missing at random (MAR): The presence of a null value in a variable is not random but depends on a known or unknown characteristic of the record. So why is it called missing at random, you might ask? Because the null value is independent of its actual value. Depending on your dataset, this may or may not be testable. To find out, compare the distributions of the other variables for records with and without missing values.

Ex: A dataset on education that contains a lot of missing values for the IQ score of young children, simply because a four-year-old is less likely to take the test than a twelve-year-old. The null values aren’t correlated with the actual IQ value but with age.

Missing completely at random (MCAR): The presence of the null value is independent of any known or unknown characteristic of the record. Here again, depending on your dataset, this may or may not be testable. Just like for MAR, the test consists of comparing the distributions of the other variables for records with missing values against those without.

Ex: Missing data for survey respondents whose questionnaire results were lost in the mail. The loss is totally independent of the variable concerned and of the characteristics of the respondents (i.e. the records).

Missing not at random (MNAR): The presence of the null value depends on its actual value. This one cannot be tested unless you know the actual value, which is a bit paradoxical.

Ex: Missing values for the IQ variable only for individuals who had a low score.

You might have guessed it: only in the second case (MCAR) is it safe to drop the null values.

For the two other cases, dropping values would result in ignoring a group of the overall population.

In the last case the fact that the record has a null value carries some information about the actual value.
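
The comparison described above can be sketched by splitting the records on whether the variable is missing and comparing another variable across the two groups. The dataset below is made up for illustration (the `records` list and its `age` and `iq` fields are assumptions), and only the standard library is used. A large gap between the two group means hints that the values are not MCAR.

```python
from statistics import mean

# Hypothetical records: IQ is missing (None) mostly for the youngest children.
records = [
    {"age": 4, "iq": None},
    {"age": 5, "iq": None},
    {"age": 6, "iq": 102},
    {"age": 9, "iq": 98},
    {"age": 11, "iq": 110},
    {"age": 12, "iq": 95},
]

# Split the ages by whether the IQ value is missing.
ages_missing = [r["age"] for r in records if r["iq"] is None]
ages_observed = [r["age"] for r in records if r["iq"] is not None]

mean_age_missing = mean(ages_missing)
mean_age_observed = mean(ages_observed)

print(mean_age_missing)   # 4.5
print(mean_age_observed)  # 9.5
```

On real data you would compare full distributions (or run a statistical test) rather than two means, but the idea is the same.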

# Dealing with missing values

## Drop

Dropping rows: (Only for MCAR) This can be the perfect solution if you have only a small proportion of missing values relative to your dataset size. However, it quickly becomes unviable as the proportion grows.

Dropping columns: This one is often not considered because it results in a significant loss of information. As a rule of thumb, you can start considering it when the proportion of null values is higher than 60%.
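
Both kinds of drop can be sketched in plain Python. The table below is hypothetical (the `rows` data and column names are assumptions), `None` marks a missing value, and the 60% threshold is the rule of thumb mentioned above rather than a fixed standard.

```python
# Hypothetical table stored as a list of rows; None marks a missing value.
rows = [
    {"age": 30, "income": 52000, "score": None},
    {"age": 41, "income": None, "score": None},
    {"age": 25, "income": 48000, "score": 7},
    {"age": 37, "income": 61000, "score": None},
]

# Drop rows: keep only the records that have no missing value at all.
complete_rows = [r for r in rows if all(v is not None for v in r.values())]

# Drop columns: remove any column whose share of missing values exceeds 60%.
threshold = 0.6
columns = list(rows[0])
missing_share = {
    c: sum(r[c] is None for r in rows) / len(rows) for c in columns
}
kept_columns = [c for c in columns if missing_share[c] <= threshold]
trimmed_rows = [{c: r[c] for c in kept_columns} for r in rows]

print(len(complete_rows))  # 1
print(kept_columns)        # ['age', 'income']
```

Note how dropping rows leaves a single record here, which illustrates why that option stops being viable once the proportion of missing values grows.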

## Imputation

Last or next value: (Only for time series with MCAR) It is OK to use the last or the next value to fill a missing value as long as you are working on a time series problem.
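
A minimal sketch of both variants, assuming the series is an ordered list in which `None` marks a missing reading (the helper names `forward_fill` and `backward_fill` are made up for this example):

```python
def forward_fill(series):
    """Replace each None with the most recent observed value, if any."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def backward_fill(series):
    """Replace each None with the next observed value, if any."""
    return forward_fill(series[::-1])[::-1]


readings = [None, 20.5, None, None, 21.0, None]

print(forward_fill(readings))   # [None, 20.5, 20.5, 20.5, 21.0, 21.0]
print(backward_fill(readings))  # [20.5, 20.5, 21.0, 21.0, 21.0, None]
```

A gap at the very start (forward fill) or very end (backward fill) has no neighbour to copy from, so it stays missing.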

Mean value: (Only for MCAR) Using the mean value is often a bad solution as it is sensitive to outliers.

Median value: (Only for MCAR) Similar to the mean value but more robust to outliers.

Mode value: (Only for MCAR) By choosing the most common value, you maximize the chance of filling the null correctly. Beware of multimodal distributions, for which it is no longer a viable solution.
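
The three statistics can be compared on a small made-up series (the `values` list and the `impute` helper are assumptions for illustration). The single outlier pulls the mean far away from the typical value, while the median and the mode are unaffected.

```python
from statistics import mean, median, multimode

# Hypothetical series; None marks a missing value and 100 is an outlier.
values = [3, 4, None, 4, 5, None, 100]
observed = [v for v in values if v is not None]


def impute(series, fill):
    """Return a copy of series with every None replaced by fill."""
    return [fill if v is None else v for v in series]


mean_fill = mean(observed)      # dragged upward by the outlier
median_fill = median(observed)  # robust to the outlier
modes = multimode(observed)     # more than one entry means a multimodal series

print(mean_fill)                    # 23.2
print(median_fill)                  # 4
print(modes)                        # [4]
print(impute(values, median_fill))  # [3, 4, 4, 4, 5, 4, 100]
```

Checking the length of `modes` before using the mode is one simple way to catch the multimodal case.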

Replace with a constant: (Only for MNAR) As we have seen before, a missing value in the MNAR case actually holds some information about the actual value. So it makes sense to fill such values with a constant (different from the other values).
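
A minimal sketch of constant filling, assuming scores are never negative so that `-1` (a hypothetical sentinel chosen for this example) cannot collide with a real value. Keeping a separate indicator list preserves the fact that a value was missing.

```python
# Hypothetical IQ scores; None marks a missing value.
scores = [85, None, 110, None, 97]

# Sentinel assumed to lie outside the range of real scores.
MISSING = -1

filled = [MISSING if s is None else s for s in scores]
was_missing = [s is None for s in scores]

print(filled)       # [85, -1, 110, -1, 97]
print(was_missing)  # [False, True, False, True, False]
```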

Linear interpolation: (Only for time series with MCAR) In a time series problem with a trend and little to no seasonality, a missing value can be approximated by linear interpolation between the value before it and the value after it. Here is the formula:

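$$y = y_0 + (x - x_0)\,\frac{y_1 - y_0}{x_1 - x_0}$$

Linear interpolation (1st order), where $(x_0, y_0)$ and $(x_1, y_1)$ are the known points before and after the missing value at $x$.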
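
Linear interpolation between the neighbouring observations can be sketched as follows. The sketch assumes equally spaced observations stored in a list, with `None` marking the gaps; the function name `interpolate_linear` is made up for this example.

```python
def interpolate_linear(series):
    """Fill interior None gaps by linear interpolation between neighbours.

    Leading and trailing gaps are left as None because they do not have
    an observed value on both sides.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    # Walk over each pair of consecutive observed positions.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


print(interpolate_linear([10, None, None, 16, None, 20]))
# [10, 12.0, 14.0, 16, 18.0, 20]
```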

Spline interpolation: (Only for time series with MCAR) This is similar to linear interpolation, but it uses higher-order polynomials to produce a smoother interpolation. Again, it is not suitable for seasonal data.

Linear/spline interpolation with seasonal adjustment: (Only for time series with MCAR) It follows the same principle as linear and spline interpolation but with adjustments for seasonality. It consists of deseasonalizing the data, applying linear/spline interpolation, and then reapplying the seasonality to the time series. STL (seasonal-trend decomposition using Loess) is one method for deseasonalizing the data.

#data-science #missing-values #data-cleaning #data-imputation #machine-learning #data analysis

1666082925

## How to Create Arrays in Python

### In this tutorial, you'll learn the basics of how to create arrays in Python using the `array` module. You'll see how to define arrays and the different methods commonly used for performing operations on them.

This video tutorial on 'Arrays in Python' will help you get a firm grasp of the fundamentals of arrays in the Python programming language. Below are the topics covered in this video:
1:15 What is an array?
2:53 Is a Python list the same as an array?
3:48 How to create arrays in Python?
7:19 Accessing array elements
9:59 Basic array operations
- 10:33  Finding the length of an array
- 15:06  Removing elements
- 18:32  Array concatenation
- 20:59  Slicing
- 23:26  Looping

Python Array Tutorial – Define, Index, Methods

In this article, you'll learn how to use Python arrays. You'll see how to define them and the different methods commonly used for performing operations on them.

The article covers arrays that you create by importing the `array` module. We won't cover NumPy arrays here.

1. Introduction to Arrays
    1. The differences between Lists and Arrays
    2. When to use arrays
2. How to use arrays
    1. Define arrays
    2. Find the length of arrays
    3. Array indexing
    4. Search through arrays
    5. Loop through arrays
    6. Slice an array
3. Array methods for performing operations
    1. Change an existing value
    2. Add a new value
    3. Remove a value
4. Conclusion

Let's get started!

## What are Python Arrays?

Arrays are a fundamental data structure, and an important part of most programming languages. In Python, they are containers which are able to store more than one item at the same time.

Specifically, they are an ordered collection of elements with every value being of the same data type. That is the most important thing to remember about Python arrays - the fact that they can only hold a sequence of multiple items that are of the same type.

### What's the Difference between Python Lists and Python Arrays?

Lists are one of the most common data structures in Python, and a core part of the language.

Lists and arrays behave similarly.

Just like arrays, lists are an ordered sequence of elements.

They are also mutable and not fixed in size, which means they can grow and shrink throughout the life of the program. Items can be added and removed, making them very flexible to work with.

However, lists and arrays are not the same thing.

Lists store items that are of various data types. This means that a list can contain integers, floating point numbers, strings, or any other Python data type, at the same time. That is not the case with arrays.

As mentioned in the section above, arrays store only items that are of the same single data type. There are arrays that contain only integers, or only floating point numbers, or only Unicode characters. The `array` module supports a fixed set of basic types, each identified by a typecode.

### When to Use Python Arrays

Lists are built into the Python programming language, whereas arrays are not: to use them, you need to import the `array` module from the standard library.

Arrays of the `array` module are a thin wrapper over C arrays, and are useful when you want to work with homogeneous data.

They are also more compact than lists and take up less memory.
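
The difference in size can be estimated roughly with `sys.getsizeof()`. The sketch below is an approximation under stated assumptions: exact byte counts depend on the platform and Python version, and the list estimate counts every integer object separately even though CPython may share some of them.

```python
import array as arr
import sys

values = list(range(1000))
numbers = arr.array('i', values)

# Bytes used by the array's raw element buffer.
array_bytes = numbers.itemsize * len(numbers)

# Rough bytes used by the list: the list object itself plus each
# integer object it points to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

print(array_bytes, list_bytes)
```

The array stores plain C values side by side, whereas the list stores references to full Python objects, which is where the saving comes from.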

If you want to perform mathematical calculations, you should use NumPy arrays by importing the NumPy package. Otherwise, use Python arrays only when you really need to, as lists work in a similar way and are more flexible to work with.

## How to Use Arrays in Python

In order to create Python arrays, you'll first have to import the `array` module, which contains all the necessary functions.

There are three ways you can import the `array` module:

• By using `import array` at the top of the file. This imports the module `array`. You would then go on to create an array using `array.array()`.
``````import array

#how you would create an array
array.array('i', [1, 2, 3])``````
• Instead of having to type `array.array()` all the time, you could use `import array as arr` at the top of the file, instead of `import array` alone. You would then create an array by typing `arr.array()`. The `arr` acts as an alias name, with the array constructor then immediately following it.
``````import array as arr

#how you would create an array
arr.array('i', [1, 2, 3])``````
• Lastly, you could also use `from array import *`, with `*` importing everything the module makes available. You would then create an array by writing the `array()` constructor alone.
``````from array import *

#how you would create an array
array('i', [1, 2, 3])``````

### How to Define Arrays in Python

Once you've imported the `array` module, you can then go on to define a Python array.

The general syntax for creating an array looks like this:

``variable_name = array(typecode,[elements])``

Let's break it down:

• `variable_name` would be the name of the array.
• The `typecode` specifies what kind of elements will be stored in the array, for example an array of integers or an array of floats. Remember that all elements must be of the same data type.
• Inside square brackets you mention the `elements` that would be stored in the array, with each element being separated by a comma. You can also create an empty array by just writing `variable_name = array(typecode)` alone, without any elements.
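
As a small illustration of the syntax above, the sketch below creates an empty array and a populated one, then reads back the typecode through the array's `typecode` attribute (the variable names are arbitrary):

```python
from array import array

# An empty array of signed integers: only the typecode is given.
empty = array('i')
print(len(empty))      # 0
print(empty.typecode)  # i

# An array of floating point numbers with two initial elements.
floats = array('d', [1.5, 2.5])
print(floats.typecode)  # d
print(len(floats))      # 2
```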

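Below is a typecode table, as listed in the `array` module documentation, with the different typecodes that can be used with the different data types when defining Python arrays:

| Typecode | C type | Python type | Minimum size in bytes |
|---|---|---|---|
| `'b'` | signed char | int | 1 |
| `'B'` | unsigned char | int | 1 |
| `'u'` | wchar_t | Unicode character | 2 |
| `'h'` | signed short | int | 2 |
| `'H'` | unsigned short | int | 2 |
| `'i'` | signed int | int | 2 |
| `'I'` | unsigned int | int | 2 |
| `'l'` | signed long | int | 4 |
| `'L'` | unsigned long | int | 4 |
| `'q'` | signed long long | int | 8 |
| `'Q'` | unsigned long long | int | 8 |
| `'f'` | float | float | 4 |
| `'d'` | double | float | 8 |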

Tying everything together, here is an example of how you would define an array in Python:

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers)

#output

#array('i', [10, 20, 30])``````

Let's break it down:

• First we imported the `array` module, in this case with `import array as arr`.
• Then, we created a `numbers` array.
• We used `arr.array()` because of `import array as arr`.
• Inside the `array()` constructor, we first included `'i'`, the typecode for a signed integer. Signed means that the array can include positive and negative values. An unsigned integer typecode, `'H'` for example, would mean that no negative values are allowed.
• Lastly, we included the values to be stored in the array in square brackets.
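
The difference between signed and unsigned typecodes can be checked directly. In this sketch, the `raised` flag is just a helper for the example; the exact wording of the error message may vary between Python versions, so it is printed rather than assumed.

```python
import array as arr

# A signed typecode accepts negative values.
signed = arr.array('i', [-10, 0, 10])
print(signed)  # array('i', [-10, 0, 10])

# An unsigned typecode rejects them with an OverflowError.
raised = False
try:
    arr.array('H', [-10, 0, 10])
except OverflowError as error:
    raised = True
    print("OverflowError:", error)
```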

Keep in mind that if you tried to include values that were not of `i` typecode, meaning they were not integer values, you would get an error:

``````import array as arr

numbers = arr.array('i',[10.0,20,30])

print(numbers)

#output

#Traceback (most recent call last):
# File "/Users/dionysialemonaki/python_articles/demo.py", line 14, in <module>
#   numbers = arr.array('i',[10.0,20,30])
#TypeError: 'float' object cannot be interpreted as an integer``````

In the example above, I tried to include a floating point number in the array. I got an error because this is meant to be an integer array only.

Another way to create an array is the following:

``````from array import *

#an array of floating point values
numbers = array('d',[10.0,20.0,30.0])

print(numbers)

#output

#array('d', [10.0, 20.0, 30.0])``````

The example above imported the `array` module via `from array import *` and created an array `numbers` of float data type. This means that it holds only floating point numbers, which is specified with the `'d'` typecode.

### How to Find the Length of an Array in Python

To find out the exact number of elements contained in an array, use the built-in `len()` function.

It will return the integer number that is equal to the total number of elements in the array you specify.

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(len(numbers))

#output
# 3``````

In the example above, the array contained three elements – `10, 20, 30` – so the length of `numbers` is `3`.

### Array Indexing and How to Access Individual Items in an Array in Python

Each item in an array has a specific address. Individual items are accessed by referencing their index number.

Indexing in Python, as in most programming languages, starts at `0`. It is important to remember that counting starts at `0` and not at `1`.

To access an element, you first write the name of the array followed by square brackets. Inside the square brackets you include the item's index number.

The general syntax would look something like this:

``array_name[index_value_of_item]``

Here is how you would access each individual element in an array:

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers[0]) # gets the 1st element
print(numbers[1]) # gets the 2nd element
print(numbers[2]) # gets the 3rd element

#output

#10
#20
#30``````

Remember that the index value of the last element of an array is always one less than the length of the array. Where `n` is the length of the array, `n - 1` will be the index value of the last item.

Note that you can also access each individual element using negative indexing.

With negative indexing, the last element would have an index of `-1`, the second to last element would have an index of `-2`, and so on.

Here is how you would get each item in an array using that method:

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers[-1]) #gets last item
print(numbers[-2]) #gets second to last item
print(numbers[-3]) #gets first item

#output

#30
#20
#10``````
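
To illustrate the `n - 1` rule mentioned earlier, the sketch below reads the last element through `len()` and shows that asking for index `n` itself raises an `IndexError` (the variable `last_index` and the `out_of_range` flag are just helpers for this example):

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# The last valid index is always len(numbers) - 1.
last_index = len(numbers) - 1
print(numbers[last_index])  # 30

# Index 3 does not exist in an array of length 3.
out_of_range = False
try:
    numbers[3]
except IndexError as error:
    out_of_range = True
    print("IndexError:", error)
```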

### How to Search Through an Array in Python

You can find out an element's index number by using the `index()` method.

You pass the value of the element being searched as the argument to the method, and the element's index number is returned.

``````import array as arr

numbers = arr.array('i',[10,20,30])

#search for the index of the value 10
print(numbers.index(10))

#output

#0``````

If there is more than one element with the same value, the index of the first instance of the value will be returned:

``````import array as arr

numbers = arr.array('i',[10,20,30,10,20,30])

#search for the index of the value 10
#will return the index number of the first instance of the value 10
print(numbers.index(10))

#output

#0``````
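
Note that `index()` raises a `ValueError` when the value is not in the array. One way to guard against that, sketched here with a hypothetical helper named `find_index`, is to check membership with `in` first:

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])


def find_index(values, target):
    """Return the index of target, or -1 if it is not present."""
    return values.index(target) if target in values else -1


print(find_index(numbers, 20))  # 1
print(find_index(numbers, 99))  # -1
```

Returning `-1` is only a convention chosen for this example; catching the `ValueError` with `try`/`except` works equally well.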

### How to Loop through an Array in Python

You've seen how to access each individual element in an array and print it out on its own.

You've also seen how to print the array, using the `print()` function. That gives the following result:

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers)

#output

#array('i', [10, 20, 30])``````

What if you want to print each value one by one?

This is where a loop comes in handy. You can loop through the array and print out each value, one-by-one, with each loop iteration.

For this you can use a simple `for` loop:

``````import array as arr

numbers = arr.array('i',[10,20,30])

for number in numbers:
    print(number)

#output
#10
#20
#30``````

You could also use the `range()` function, and pass the result of `len()` as its argument. This would give the same result as above:

``````import array as arr

values = arr.array('i',[10,20,30])

#prints each individual value in the array
for value in range(len(values)):
    print(values[value])

#output

#10
#20
#30``````
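
When you need both the position and the value in each iteration, the built-in `enumerate()` function is a common alternative to `range(len(...))`. A short sketch:

```python
import array as arr

values = arr.array('i', [10, 20, 30])

# enumerate() yields (index, value) pairs for each element
for index, value in enumerate(values):
    print(index, value)

#output

#0 10
#1 20
#2 30
```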

### How to Slice an Array in Python

To access a specific range of values inside the array, use the slicing operator, which is a colon `:`.

When you use the slicing operator with only a value after the colon, counting starts from `0` by default. The slice begins at the first item and goes up to, but does not include, the index number you specify.

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#get the values 10 and 20 only
print(numbers[:2])  #first to second position

#output

#array('i', [10, 20])``````

When you specify two numbers, one on each side of the colon, you define a range. The slice starts at the index given by the first number and goes up to, but does not include, the index given by the second:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#get the values 20 and 30 only
print(numbers[1:3]) #second to third position

#output

#array('i', [20, 30])``````
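
Slices also accept an open end, negative indexes, and an optional step. The sketch below uses a longer example array to show a few variations; each slice returns a new array of the same typecode.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30, 40, 50])

print(numbers[2:])    # from index 2 to the end
print(numbers[-2:])   # the last two items
print(numbers[::2])   # every second item
print(numbers[::-1])  # a reversed copy

#output

#array('i', [30, 40, 50])
#array('i', [40, 50])
#array('i', [10, 30, 50])
#array('i', [50, 40, 30, 20, 10])
```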

## Methods For Performing Operations on Arrays in Python

Arrays are mutable, which means they are changeable. You can change the value of the different items, add new ones, or remove any you don't want in your program anymore.

Let's see some of the most commonly used methods which are used for performing operations on arrays.

### How to Change the Value of an Item in an Array

You can change the value of a specific element by specifying its position and assigning it a new value:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#change the first element
#change it from having a value of 10 to having a value of 40
numbers[0] = 40

print(numbers)

#output

#array('i', [40, 20, 30])``````

### How to Add a New Value to an Array

To add one single value at the end of an array, use the `append()` method:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 to the end of numbers
numbers.append(40)

print(numbers)

#output

#array('i', [10, 20, 30, 40])``````

Be aware that the new item you add needs to be the same data type as the rest of the items in the array.

Look what happens when I try to add a float to an array of integers:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#try to add the float 40.0 to the end of numbers
numbers.append(40.0)

print(numbers)

#output

#Traceback (most recent call last):
#  File "/Users/dionysialemonaki/python_articles/demo.py", line 19, in <module>
#   numbers.append(40.0)
#TypeError: 'float' object cannot be interpreted as an integer``````

But what if you want to add more than one value to the end of an array?

Use the `extend()` method, which takes an iterable (such as a list of items) as an argument. Again, make sure that the new items are all the same data type.

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#add the integers 40,50,60 to the end of numbers
#The numbers need to be enclosed in square brackets

numbers.extend([40,50,60])

print(numbers)

#output

#array('i', [10, 20, 30, 40, 50, 60])``````
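
Two arrays can also be joined. The sketch below shows the `+` operator, which builds a new array, and `extend()` with another array, which modifies the original in place. Both arrays must share the same typecode; the `mismatch_rejected` flag is just a helper for this example.

```python
import array as arr

first = arr.array('i', [10, 20, 30])
second = arr.array('i', [40, 50])

# + returns a new array and leaves both operands unchanged
combined = first + second
print(combined)  # array('i', [10, 20, 30, 40, 50])

# extend() appends the items of second to first in place
first.extend(second)
print(first)     # array('i', [10, 20, 30, 40, 50])

# extending with an array of a different typecode raises a TypeError
mismatch_rejected = False
try:
    first.extend(arr.array('d', [1.5]))
except TypeError as error:
    mismatch_rejected = True
    print("TypeError:", error)
```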

And what if you don't want to add an item to the end of an array? Use the `insert()` method, to add an item at a specific position.

The `insert()` method takes two arguments: the index at which the new element will be inserted, and the value of the new element.

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 in the first position
#remember indexing starts at 0

numbers.insert(0,40)

print(numbers)

#output

#array('i', [40, 10, 20, 30])``````

### How to Remove a Value from an Array

To remove an element from an array, use the `remove()` method and include the value as an argument to the method.

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30])``````

With `remove()`, only the first instance of the value you pass as an argument will be removed.

See what happens when there is more than one identical value:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30,10,20])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30, 10, 20])``````

Only the first occurrence of `10` is removed.

You can also use the `pop()` method, and specify the position of the element to be removed:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30,10,20])

#remove the element at index 0 (here, the first instance of 10)
numbers.pop(0)

print(numbers)

#output

#array('i', [20, 30, 10, 20])``````
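
A few related behaviours are worth knowing, sketched below: `pop()` returns the item it removes and defaults to the last position, the `del` statement removes by index without returning anything, and `remove()` raises a `ValueError` when the value is not present (the `missing_rejected` flag is just a helper for this example).

```python
import array as arr

numbers = arr.array('i', [10, 20, 30, 10, 20])

# pop() with no argument removes and returns the last item
removed = numbers.pop()
print(removed)  # 20
print(numbers)  # array('i', [10, 20, 30, 10])

# del removes the item at the given index
del numbers[1]
print(numbers)  # array('i', [10, 30, 10])

# remove() raises a ValueError if the value is not in the array
missing_rejected = False
try:
    numbers.remove(99)
except ValueError as error:
    missing_rejected = True
    print("ValueError:", error)
```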

## Conclusion

And there you have it - you now know the basics of how to create arrays in Python using the `array` module. Hopefully you found this guide helpful.

Thanks for reading and happy coding!

#python #programming

1670560264

## Understanding Arrays in Python

### Learn how to use Python arrays. Create arrays in Python using the array module. You'll see how to define them and the different methods commonly used for performing operations on them.

The article covers arrays that you create by importing the `array` module. We won't cover NumPy arrays here.

1. Introduction to Arrays
    1. The differences between Lists and Arrays
    2. When to use arrays
2. How to use arrays
    1. Define arrays
    2. Find the length of arrays
    3. Array indexing
    4. Search through arrays
    5. Loop through arrays
    6. Slice an array
3. Array methods for performing operations
    1. Change an existing value
    2. Add a new value
    3. Remove a value
4. Conclusion

Let's get started!

## What are Python Arrays?

Arrays are a fundamental data structure, and an important part of most programming languages. In Python, they are containers which are able to store more than one item at the same time.

Specifically, they are an ordered collection of elements with every value being of the same data type. That is the most important thing to remember about Python arrays - the fact that they can only hold a sequence of multiple items that are of the same type.

### What's the Difference between Python Lists and Python Arrays?

Lists are one of the most common data structures in Python, and a core part of the language.

Lists and arrays behave similarly.

Just like arrays, lists are an ordered sequence of elements.

They are also mutable and not fixed in size, which means they can grow and shrink throughout the life of the program. Items can be added and removed, making them very flexible to work with.

However, lists and arrays are not the same thing.

Lists store items that are of various data types. This means that a list can contain integers, floating point numbers, strings, or any other Python data type, at the same time. That is not the case with arrays.

As mentioned in the section above, arrays store only items that are of the same single data type. There are arrays that contain only integers, or only floating point numbers, or only Unicode characters. The `array` module supports a fixed set of basic types, each identified by a typecode.

### When to Use Python Arrays

Lists are built into the Python programming language, whereas arrays are not: to use them, you need to import the `array` module from the standard library.

Arrays of the `array` module are a thin wrapper over C arrays, and are useful when you want to work with homogeneous data.

They are also more compact than lists and take up less memory.

If you want to perform mathematical calculations, you should use NumPy arrays by importing the NumPy package. Otherwise, use Python arrays only when you really need to, as lists work in a similar way and are more flexible to work with.

## How to Use Arrays in Python

In order to create Python arrays, you'll first have to import the `array` module, which contains all the necessary functions.

There are three ways you can import the `array` module:

1. By using `import array` at the top of the file. This imports the module `array`. You would then go on to create an array using `array.array()`.
``````import array

#how you would create an array
array.array('i', [1, 2, 3])
``````
2. Instead of having to type `array.array()` all the time, you could use `import array as arr` at the top of the file, instead of `import array` alone. You would then create an array by typing `arr.array()`. The `arr` acts as an alias name, with the array constructor then immediately following it.
``````import array as arr

#how you would create an array
arr.array('i', [1, 2, 3])
``````
3. Lastly, you could also use `from array import *`, with `*` importing everything the module makes available. You would then create an array by writing the `array()` constructor alone.
``````from array import *

#how you would create an array
array('i', [1, 2, 3])
``````

### How to Define Arrays in Python

Once you've imported the `array` module, you can then go on to define a Python array.

The general syntax for creating an array looks like this:

``````variable_name = array(typecode,[elements])
``````

Let's break it down:

• `variable_name` would be the name of the array.
• The `typecode` specifies what kind of elements will be stored in the array, for example an array of integers or an array of floats. Remember that all elements must be of the same data type.
• Inside square brackets you mention the `elements` that would be stored in the array, with each element being separated by a comma. You can also create an empty array by just writing `variable_name = array(typecode)` alone, without any elements.

The `array` module documentation lists the typecodes that can be used when defining Python arrays, for example `'i'` for signed integers and `'d'` for double-precision floating point numbers.

Tying everything together, here is an example of how you would define an array in Python:

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers)

#output

#array('i', [10, 20, 30])
``````

Let's break it down:

• First we imported the `array` module, in this case with `import array as arr`.
• Then, we created a `numbers` array.
• We used `arr.array()` because of `import array as arr`.
• Inside the `array()` constructor, we first included `'i'`, the typecode for a signed integer. Signed means that the array can include positive and negative values. An unsigned integer typecode, `'H'` for example, would mean that no negative values are allowed.
• Lastly, we included the values to be stored in the array in square brackets.

Keep in mind that if you tried to include values that were not of `i` typecode, meaning they were not integer values, you would get an error:

``````import array as arr

numbers = arr.array('i',[10.0,20,30])

print(numbers)

#output

#Traceback (most recent call last):
# File "/Users/dionysialemonaki/python_articles/demo.py", line 14, in <module>
#   numbers = arr.array('i',[10.0,20,30])
#TypeError: 'float' object cannot be interpreted as an integer
``````

In the example above, I tried to include a floating point number in the array. I got an error because this is meant to be an integer array only.

Another way to create an array is the following:

``````from array import *

#an array of floating point values
numbers = array('d',[10.0,20.0,30.0])

print(numbers)

#output

#array('d', [10.0, 20.0, 30.0])
``````

The example above imported the `array` module via `from array import *` and created an array `numbers` of float data type. This means that it holds only floating point numbers, which is specified with the `'d'` typecode.

### How to Find the Length of an Array in Python

To find out the exact number of elements contained in an array, use the built-in `len()` function.

It will return the integer number that is equal to the total number of elements in the array you specify.

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(len(numbers))

#output
# 3
``````

In the example above, the array contained three elements – `10, 20, 30` – so the length of `numbers` is `3`.

### Array Indexing and How to Access Individual Items in an Array in Python

Each item in an array has a specific address. Individual items are accessed by referencing their index number.

Indexing in Python, as in most programming languages, starts at `0`. It is important to remember that counting starts at `0` and not at `1`.

To access an element, you first write the name of the array followed by square brackets. Inside the square brackets you include the item's index number.

The general syntax would look something like this:

``````array_name[index_value_of_item]
``````

Here is how you would access each individual element in an array:

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers[0]) # gets the 1st element
print(numbers[1]) # gets the 2nd element
print(numbers[2]) # gets the 3rd element

#output

#10
#20
#30
``````

Remember that the index value of the last element of an array is always one less than the length of the array. Where `n` is the length of the array, `n - 1` will be the index value of the last item.

Note that you can also access each individual element using negative indexing.

With negative indexing, the last element would have an index of `-1`, the second to last element would have an index of `-2`, and so on.

Here is how you would get each item in an array using that method:

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers[-1]) #gets last item
print(numbers[-2]) #gets second to last item
print(numbers[-3]) #gets first item

#output

#30
#20
#10
``````

### How to Search Through an Array in Python

You can find out an element's index number by using the `index()` method.

You pass the value of the element being searched as the argument to the method, and the element's index number is returned.

``````import array as arr

numbers = arr.array('i',[10,20,30])

#search for the index of the value 10
print(numbers.index(10))

#output

#0
``````

If there is more than one element with the same value, the index of the first instance of the value will be returned:

``````import array as arr

numbers = arr.array('i',[10,20,30,10,20,30])

#search for the index of the value 10
#will return the index number of the first instance of the value 10
print(numbers.index(10))

#output

#0
``````

### How to Loop through an Array in Python

You've seen how to access each individual element in an array and print it out on its own.

You've also seen how to print the array, using the `print()` function. That gives the following result:

``````import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers)

#output

#array('i', [10, 20, 30])
``````

What if you want to print each value one by one?

This is where a loop comes in handy. You can loop through the array and print out each value, one-by-one, with each loop iteration.

For this you can use a simple `for` loop:

``````import array as arr

numbers = arr.array('i',[10,20,30])

for number in numbers:
    print(number)

#output
#10
#20
#30
``````

You could also use the `range()` function, and pass the result of `len()` as its argument. This would give the same result as above:

``````import array as arr

values = arr.array('i',[10,20,30])

#prints each individual value in the array
for value in range(len(values)):
    print(values[value])

#output

#10
#20
#30
``````

### How to Slice an Array in Python

To access a specific range of values inside the array, use the slicing operator, which is a colon `:`.

When you use the slicing operator with only a value after the colon, counting starts from `0` by default. The slice begins at the first item and goes up to, but does not include, the index number you specify.

``````
import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#get the values 10 and 20 only
print(numbers[:2])  #first to second position

#output

#array('i', [10, 20])
``````

When you specify two numbers, one on each side of the colon, you define a range. The slice starts at the index given by the first number and goes up to, but does not include, the index given by the second:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#get the values 20 and 30 only
print(numbers[1:3]) #second to third position

#output

#array('i', [20, 30])
``````

## Methods For Performing Operations on Arrays in Python

Arrays are mutable, which means they are changeable. You can change the value of the different items, add new ones, or remove any you don't want in your program anymore.

Let's look at some of the most commonly used methods for performing operations on arrays.

### How to Change the Value of an Item in an Array

You can change the value of a specific element by specifying its position and assigning it a new value:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#change the first element
#change it from having a value of 10 to having a value of 40
numbers[0] = 40

print(numbers)

#output

#array('i', [40, 20, 30])
``````

### How to Add a New Value to an Array

To add one single value at the end of an array, use the `append()` method:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 to the end of numbers
numbers.append(40)

print(numbers)

#output

#array('i', [10, 20, 30, 40])
``````

Be aware that the new item you add needs to be the same data type as the rest of the items in the array.

Look what happens when I try to add a float to an array of integers:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#try to add the float 40.0 to the end of numbers
numbers.append(40.0)

print(numbers)

#output

#Traceback (most recent call last):
#  File "/Users/dionysialemonaki/python_articles/demo.py", line 19, in <module>
#   numbers.append(40.0)
#TypeError: 'float' object cannot be interpreted as an integer
``````
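
One way to deal with this, sketched below, is to catch the `TypeError`, or to create the array with a floating-point type code such as `'d'` when you expect decimal values. The variable name `decimals` is made up for this example.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

try:
    numbers.append(40.0)          # a float is rejected by an 'i' (integer) array
except TypeError as error:
    print("Could not append:", error)

# an array created with the 'd' type code stores floats instead
decimals = arr.array('d', [10.0, 20.0, 30.0])
decimals.append(40.0)

print(decimals)
```

The failed `append()` leaves the original array unchanged.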

But what if you want to add more than one value to the end of an array?

Use the `extend()` method, which takes an iterable (such as a list of items) as an argument. Again, make sure that the new items are all the same data type.

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#add the integers 40,50,60 to the end of numbers
#The numbers need to be enclosed in square brackets

numbers.extend([40,50,60])

print(numbers)

#output

#array('i', [10, 20, 30, 40, 50, 60])
``````

And what if you don't want to add an item to the end of an array? Use the `insert()` method to add an item at a specific position.

The `insert()` method takes two arguments: the index at which the new element will be inserted, and the value of the new element.

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 in the first position
#remember indexing starts at 0

numbers.insert(0,40)

print(numbers)

#output

#array('i', [40, 10, 20, 30])
``````

### How to Remove a Value from an Array

To remove an element from an array, use the `remove()` method and include the value as an argument to the method.

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30])
``````

With `remove()`, only the first instance of the value you pass as an argument will be removed.

See what happens when the same value appears more than once:

``````
import array as arr

#original array
numbers = arr.array('i',[10,20,30,10,20])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30, 10, 20])
``````

Only the first occurrence of `10` is removed.

You can also use the `pop()` method, and specify the position of the element to be removed:

``````import array as arr

#original array
numbers = arr.array('i',[10,20,30,10,20])

#remove the first instance of 10
numbers.pop(0)

print(numbers)

#output

#array('i', [20, 30, 10, 20])
``````
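
Two details are worth knowing here: `pop()` returns the item it removes (and removes the last item when called without an argument), while `remove()` raises a `ValueError` if the value is not in the array. A minimal sketch:

```python
import array as arr

numbers = arr.array('i', [10, 20, 30, 10, 20])

removed = numbers.pop()      # with no argument, pop() removes the last item
print(removed)
print(numbers)

try:
    numbers.remove(99)       # 99 is not in the array
except ValueError:
    print("99 is not in the array")

#output
#20
#array('i', [10, 20, 30, 10])
#99 is not in the array
```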

## Conclusion

And there you have it - you now know the basics of how to create arrays in Python using the `array` module. Hopefully you found this guide helpful.

Thanks for reading and happy coding!

Original article source at https://www.freecodecamp.org


## Top 17 Machine Learning Algorithms with Scikit-Learn

### Machine Learning with Scikit-Learn

Scikit-learn is a Python library that provides many unsupervised and supervised learning algorithms. It is built on NumPy, SciPy, and Matplotlib, and it works well alongside pandas.

### The functionality that scikit-learn provides includes:

• Regression, including Linear and Logistic Regression
• Classification, including K-Nearest Neighbors
• Clustering, including K-Means and K-Means++
• Model selection
• Preprocessing, including Min-Max Normalization

In this article, I will walk through the machine learning algorithms in scikit-learn that you need to know as a data scientist.

Let's start by importing the libraries:

``````%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from scipy import stats
import pylab as pl``````

### Estimator

Given a scikit-learn estimator object named model, the following methods are available:

#### Available in all Estimators

model.fit() : fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).

#### Available in supervised estimators

model.predict() : given a trained model, predict the label of a new set of data. This method accepts one argument, the new data X_new (e.g. model.predict(X_new)), and returns the learned label for each object in the array.

model.predict_proba() : for classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is returned by model.predict().

model.score() : for classification or regression problems, most estimators implement a score method. For classifiers the default score is accuracy, which lies between 0 and 1. For regressors it is R², which is at most 1 and can be negative for a poor fit. In both cases a larger score indicates a better fit.

#### Available in unsupervised estimators

model.predict() : predict labels in clustering algorithms.

model.transform() : given an unsupervised model, transform new data into the new basis. This also accepts one argument X_new, and returns the new representation of the data based on the unsupervised model.

model.fit_transform() : some estimators implement this method, which more efficiently performs a fit and a transform on the same input data.
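
To make the fit / predict / score convention concrete, here is a minimal sketch of a toy classifier written in plain Python. `MajorityClassifier` is a hypothetical class invented for illustration and is not part of scikit-learn; it only mimics the shape of the interface, and the tiny dataset is made up.

```python
from collections import Counter

class MajorityClassifier:
    """Hypothetical toy estimator: always predicts the most common training label."""

    def fit(self, X, y):
        # "learn" the most frequent label in the training targets
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X_new):
        # one prediction per row of the new data
        return [self.majority_ for _ in X_new]

    def score(self, X, y):
        # accuracy: fraction of predictions that match the true labels
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)

X = [[5.1], [4.9], [6.3], [5.8]]
y = ["setosa", "setosa", "virginica", "setosa"]

model = MajorityClassifier().fit(X, y)
print(model.predict([[6.0], [5.0]]))
print(model.score(X, y))

#output
#['setosa', 'setosa']
#0.75
```

Real estimators follow the same pattern: `fit` stores what was learned on the object, and `predict` and `score` reuse it.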

``````data = pd.read_csv('Iris.csv')
print(data.shape)``````
``````#Output
(150, 6)``````
``````data.info()``````
``````#Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
#   Column         Non-Null Count  Dtype
---  ------         --------------  -----
0   Id             150 non-null    int64
1   SepalLengthCm  150 non-null    float64
2   SepalWidthCm   150 non-null    float64
3   PetalLengthCm  150 non-null    float64
4   PetalWidthCm   150 non-null    float64
5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB``````

### Visualization

Some graphical representation of information and data.

``````#newer seaborn versions use height= where older ones used size=
sns.FacetGrid(data,hue='Species',height=5)\
.map(plt.scatter,'SepalLengthCm','SepalWidthCm')``````
``````sns.pairplot(data,hue='Species')``````

### Prepare Train and Test

scikit-learn provides a helpful function for partitioning data, train_test_split, which splits out your data into a training set and a test set.

A common choice is to use 70% of the data for training and 30% for testing:

• Training set for fitting the model
• Test set for evaluation only
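``````# Note: this slice keeps every column except the last, so it also includes Id.
# Use data.iloc[:, 1:-1] instead if you do not want Id treated as a feature.
X = data.iloc[:, :-1].values    #   X -> Feature Variables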
y = data.iloc[:, -1].values #   y ->  Target
# Splitting the data into Train and Test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)``````
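
Conceptually, a 70/30 split shuffles the row indices and cuts them at the 70% mark. The sketch below shows that idea with the standard library only. `simple_split` is a hypothetical helper written for illustration; it is not how `train_test_split` is implemented internally, and the list of integers stands in for real records.

```python
import random

def simple_split(rows, test_size=0.3, seed=0):
    """Hypothetical helper: shuffle rows and split them into train and test lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_test = round(len(rows) * test_size)      # number of rows held out for testing
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(150))                        # stand-in for 150 iris records
train, test = simple_split(rows)
print(len(train), len(test))

#output
#105 45
```

Every record ends up in exactly one of the two sets, which is what keeps the test set usable for evaluation only.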

### Algorithm 1 – Linear Regression

It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on continuous variable(s). Here, we establish a relationship between the independent and dependent variables by fitting a best-fit line. This line is known as the regression line and is represented by the linear equation Y = a*X + b.

``````#convert the object-typed target into integers with LabelEncoder, which linear regression needs in this case

XL = data.iloc[:, :-1].values    #   X -> Feature Variables
yL = data.iloc[:, -1].values #   y ->  Target

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
Y_train= le.fit_transform(yL)

print(Y_train)  # the full target column, converted from categorical to numerical
``````
``````#Output
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]``````
``````# This is only for Linear Regression
X_trainL, X_testL, y_trainL, y_testL = train_test_split(XL, Y_train, test_size = 0.3, random_state = 0)``````
``````from sklearn.linear_model import LinearRegression
modelLR = LinearRegression()
modelLR.fit(X_trainL, y_trainL)
Y_pred = modelLR.predict(X_testL)``````
``````from sklearn import metrics
#evaluating the fit on the test set
print('y-intercept             :' , modelLR.intercept_)
print('beta coefficients       :' , modelLR.coef_)
print('Mean Abs Error MAE      :' ,metrics.mean_absolute_error(y_testL,Y_pred))
print('Mean Squared Error MSE  :' ,metrics.mean_squared_error(y_testL,Y_pred))
print('Root Mean Squared Error RMSE:' ,np.sqrt(metrics.mean_squared_error(y_testL,Y_pred)))
print('r2 value                :' ,metrics.r2_score(y_testL,Y_pred))``````
``````#Output
y-intercept             : -0.024298523519848292
beta coefficients       : [ 0.00680677 -0.10726764 -0.00624275  0.22428158  0.27196685]
Mean Abs Error MAE      : 0.14966835490524963
Mean Squared Error MSE  : 0.03255451737969812
Root Mean Squared Error RMSE: 0.18042870442282213
r2 value                : 0.9446026069799255``````
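
For a single feature, the coefficients of a regression line Y = a*X + b have a simple closed form. The following sketch computes them by ordinary least squares in plain Python. The function name `fit_line` and the four data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 2*x + 1

a, b = fit_line(xs, ys)
print(a, b)

#output
#2.0 1.0
```

With several features, as in the iris example, the same idea produces one coefficient per feature plus the intercept.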

### Algorithm 2- Decision Tree

This is one of my favorite algorithms and I use it quite frequently. It is a type of supervised learning algorithm that is mostly used for classification problems. It works for both categorical and continuous dependent variables.

In this algorithm, we split the population into two or more homogeneous sets. The split is based on the most significant attributes (independent variables), so that the groups are as distinct as possible.

``````# Decision Tree
from sklearn.tree import DecisionTreeClassifier

Model = DecisionTreeClassifier()

Model.fit(X_train, y_train)

y_pred = Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       0.95      1.00      0.97        18
Iris-virginica       1.00      0.91      0.95        11

accuracy                           0.98        45
macro avg       0.98      0.97      0.98        45
weighted avg       0.98      0.98      0.98        45

[[16  0  0]
[ 0 18  0]
[ 0  1 10]]
accuracy is 0.9777777777777777``````

### Algorithm 3- Random Forest

Random Forest is a trademarked term for an ensemble of decision trees. In a Random Forest, we have a collection of decision trees (hence the “forest”). To classify a new object based on its attributes, each tree gives a classification, and we say the tree “votes” for that class. The forest chooses the classification with the most votes over all the trees in the forest.

``````from sklearn.ensemble import RandomForestClassifier
Model=RandomForestClassifier(max_depth=2)
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
#Accuracy Score
print('accuracy is ',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       1.00      1.00      1.00        18
Iris-virginica       1.00      1.00      1.00        11

accuracy                           1.00        45
macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[[16  0  0]
[ 0 18  0]
[ 0  0 11]]
accuracy is  1.0``````
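
The voting idea can be sketched in a few lines of plain Python. The votes below are invented for illustration. Note also that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by counting hard votes, so this shows the concept, not the library's internals.

```python
from collections import Counter

# hypothetical votes from five trees for one flower
tree_votes = ["Iris-versicolor", "Iris-virginica", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica"]

# the class with the most votes wins
winner, count = Counter(tree_votes).most_common(1)[0]
print(winner, count)

#output
#Iris-versicolor 3
```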

### Algorithm 4- Logistic Regression

Don’t get confused by its name! It is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s).

In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function. Hence, it is also known as logit regression. Since it predicts a probability, its output values lie between 0 and 1 (as expected).

``````# LogisticRegression
from sklearn.linear_model import LogisticRegression
Model = LogisticRegression()
Model.fit(X_train, y_train)

y_pred = Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       1.00      1.00      1.00        18
Iris-virginica       1.00      1.00      1.00        11

accuracy                           1.00        45
macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[[16  0  0]
[ 0 18  0]
[ 0  0 11]]
accuracy is 1.0``````
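
Logistic regression gets its probabilities from the logistic (sigmoid) function, which squashes any real-valued score into the range between 0 and 1. A minimal sketch using only the standard library, with input values chosen for illustration:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-4, 0, 4):
    print(z, round(sigmoid(z), 3))

#output
#-4 0.018
#0 0.5
#4 0.982
```

A score of 0 corresponds to a probability of exactly 0.5, and large positive or negative scores approach 1 or 0.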

### Algorithm 5- K Nearest Neighbors

It can be used for both classification and regression problems. However, it is more widely used for classification problems in industry. K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of their k neighbors. A case is assigned to the class that is most common among its K nearest neighbors, as measured by a distance function.

``````# K-Nearest Neighbours
from sklearn.neighbors import KNeighborsClassifier

Model = KNeighborsClassifier(n_neighbors=8)
Model.fit(X_train, y_train)

y_pred = Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score

print('accuracy is',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       1.00      1.00      1.00        18
Iris-virginica       1.00      1.00      1.00        11

accuracy                           1.00        45
macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[[16  0  0]
[ 0 18  0]
[ 0  0 11]]
accuracy is 1.0``````
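
The majority vote among nearest neighbors can be written out by hand in a few lines. This is a sketch of the idea only: `knn_predict` is a hypothetical helper, the points and labels are made up, and it uses Euclidean distance via `math.dist` (available from Python 3.8).

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, new_point, k=3):
    """Hypothetical helper: classify new_point by majority vote of its k nearest neighbors."""
    # Euclidean distance from the new point to every training point
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    nearest = sorted(distances)[:k]                 # the k closest neighbors
    votes = Counter(label for _, label in nearest)  # count labels among them
    return votes.most_common(1)[0][0]

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["small", "small", "small", "large", "large"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.1, 5.0), k=1))

#output
#small
#large
```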

### Algorithm 6- Naive Bayes

It is a classification technique based on Bayes’ theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier would consider all of these properties to independently contribute to the probability that this fruit is an apple.

``````# Naive Bayes
from sklearn.naive_bayes import GaussianNB
Model = GaussianNB()
Model.fit(X_train, y_train)

y_pred = Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       1.00      1.00      1.00        18
Iris-virginica       1.00      1.00      1.00        11

accuracy                           1.00        45
macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[[16  0  0]
[ 0 18  0]
[ 0  0 11]]
accuracy is 1.0``````

### Algorithm 7- Support Vector Machines

It is a classification method. In this algorithm, we plot each data item as a point in n-dimensional space (where n is number of features you have) with the value of each feature being the value of a particular coordinate.

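For example, if we only had two features like Height and Hair length of an individual, we’d first plot these two variables in two dimensional space where each point has two co-ordinates. In the linear case, the classifier then looks for the line (a hyperplane in higher dimensions) that separates the classes with the widest margin. The data points closest to that boundary are known as Support Vectors.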

``````# Support Vector Machine
from sklearn.svm import SVC

Model = SVC()
Model.fit(X_train, y_train)

y_pred = Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score

print('accuracy is',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       1.00      1.00      1.00        18
Iris-virginica       1.00      1.00      1.00        11

accuracy                           1.00        45
macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[[16  0  0]
[ 0 18  0]
[ 0  0 11]]
accuracy is 1.0``````

### Algorithm 8- Radius Neighbors Classifier

In scikit-learn, RadiusNeighborsClassifier is very similar to KNeighborsClassifier, with the exception of two parameters. First, in RadiusNeighborsClassifier we need to specify the radius of the fixed area used to determine whether an observation is a neighbor, using the radius parameter.

Unless there is some substantive reason for setting radius to some value, it is best to treat it like any other hyperparameter and tune it during model selection. The second useful parameter is outlier_label, which indicates what label to give an observation that has no observations within the radius – which itself can often be a useful tool for identifying outliers.

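``````from sklearn.neighbors import RadiusNeighborsClassifier

#radius=8.0 is an assumed value for illustration; tune it for your data
Model=RadiusNeighborsClassifier(radius=8.0)
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)

#summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))

#Accuracy score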
print('accuracy is ', accuracy_score(y_test,y_pred))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       1.00      1.00      1.00        18
Iris-virginica       1.00      1.00      1.00        11

accuracy                           1.00        45
macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[[16  0  0]
[ 0 18  0]
[ 0  0 11]]
accuracy is  1.0``````

### Algorithm 9- Passive Aggressive Classifier

The Passive Aggressive (PA) algorithm is a margin-based online learning algorithm for binary classification. Unlike the basic PA algorithm, which is a hard-margin method, the PA-I algorithm is a soft-margin method and is more robust to noise.

``````from sklearn.linear_model import PassiveAggressiveClassifier
Model = PassiveAggressiveClassifier()
Model.fit(X_train, y_train)

y_pred = Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       0.89      1.00      0.94        16
Iris-versicolor       0.00      0.00      0.00        18
Iris-virginica       0.41      1.00      0.58        11

accuracy                           0.60        45
macro avg       0.43      0.67      0.51        45
weighted avg       0.42      0.60      0.48        45

[[16  0  0]
[ 2  0 16]
[ 0  0 11]]
accuracy is 0.6``````

### Algorithm 10- BernoulliNB

Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.

``````# BernoulliNB
from sklearn.naive_bayes import BernoulliNB
Model = BernoulliNB()
Model.fit(X_train, y_train)

y_pred = Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       0.00      0.00      0.00        16
Iris-versicolor       0.00      0.00      0.00        18
Iris-virginica       0.24      1.00      0.39        11

accuracy                           0.24        45
macro avg       0.08      0.33      0.13        45
weighted avg       0.06      0.24      0.10        45

[[ 0  0 16]
[ 0  0 18]
[ 0  0 11]]
accuracy is 0.24444444444444444``````

### Algorithm 11- ExtraTreeClassifier

ExtraTreesClassifier is an ensemble learning method fundamentally based on decision trees. Like RandomForest, it randomizes certain decisions and subsets of data to reduce overfitting. Note that the code below uses ExtraTreeClassifier from sklearn.tree, which is a single extremely randomized tree, the building block that the ExtraTreesClassifier ensemble in sklearn.ensemble combines.

``````# ExtraTreeClassifier
from sklearn.tree import ExtraTreeClassifier

Model = ExtraTreeClassifier()

Model.fit(X_train, y_train)

y_pred = Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       1.00      0.94      0.97        18
Iris-virginica       0.92      1.00      0.96        11

accuracy                           0.98        45
macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45

[[16  0  0]
[ 0 17  1]
[ 0  0 11]]
accuracy is 0.9777777777777777``````

### Algorithm 12- Bagging classifier

Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       0.95      1.00      0.97        18
Iris-virginica       1.00      0.91      0.95        11

accuracy                           0.98        45
macro avg       0.98      0.97      0.98        45
weighted avg       0.98      0.98      0.98        45

[[16  0  0]
[ 0 18  1]
[ 0  0 10]]
accuracy is  0.9777777777777777``````
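
A minimal sketch of how a bagging model like this can be fitted is shown below. To keep it self-contained it assumes scikit-learn's bundled copy of the iris data instead of `Iris.csv`, uses the library's default base estimator (a decision tree), and adds a `_demo` suffix to its variable names so it does not overwrite the variables used elsewhere. All of these are assumptions, so the exact numbers may differ from the report above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: the bundled iris data stands in for Iris.csv
X_demo, y_demo = load_iris(return_X_y=True)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

# Each of the 10 base trees is trained on a random bootstrap sample of the training set
bagging_demo = BaggingClassifier(n_estimators=10, random_state=0)
bagging_demo.fit(X_train_demo, y_train_demo)
y_pred_demo = bagging_demo.predict(X_test_demo)

bagging_accuracy = accuracy_score(y_test_demo, y_pred_demo)
print('accuracy is', bagging_accuracy)
```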

### Algorithm 13- AdaBoost Classifier

An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

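``````from sklearn.ensemble import AdaBoostClassifier
Model=AdaBoostClassifier()  #default hyperparameters, assumed here
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
#arguments are passed as (y_pred, y_test), so rows are predicted labels and columns are true labels
print(confusion_matrix(y_pred,y_test))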
#Accuracy Score
print('accuracy is ',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       0.95      1.00      0.97        18
Iris-virginica       1.00      0.91      0.95        11

accuracy                           0.98        45
macro avg       0.98      0.97      0.98        45
weighted avg       0.98      0.98      0.98        45

[[16  0  0]
[ 0 18  1]
[ 0  0 10]]
accuracy is  0.9777777777777777``````

### Algorithm 14- Gradient Boosting Classifier

GBM is a boosting algorithm used when we deal with plenty of data and want a prediction with high predictive power. Boosting is an ensemble technique that combines the predictions of several base estimators in order to improve robustness over a single estimator. It combines multiple weak or average predictors to build a strong predictor.

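``````from sklearn.ensemble import GradientBoostingClassifier
Model=GradientBoostingClassifier()  #default hyperparameters, assumed here
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
#arguments are passed as (y_pred, y_test), so rows are predicted labels and columns are true labels
print(confusion_matrix(y_pred,y_test))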

#Accuracy Score
print('accuracy is ',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       0.95      1.00      0.97        18
Iris-virginica       1.00      0.91      0.95        11

accuracy                           0.98        45
macro avg       0.98      0.97      0.98        45
weighted avg       0.98      0.98      0.98        45

[[16  0  0]
[ 0 18  1]
[ 0  0 10]]
accuracy is  0.9777777777777777``````

### Algorithm 15- Linear Discriminant Analysis

A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.

The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions.

``````from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
Model=LinearDiscriminantAnalysis()
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)

# Summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))

#Accuracy Score
print('accuracy is ',accuracy_score(y_pred,y_test))``````
``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       1.00      1.00      1.00        18
Iris-virginica       1.00      1.00      1.00        11

accuracy                           1.00        45
macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[[16  0  0]
[ 0 18  0]
[ 0  0 11]]
accuracy is  1.0``````

### Algorithm 16- Quadratic Discriminant Analysis

A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.

The model fits a Gaussian density to each class.

``````#Output
precision    recall  f1-score   support

Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       1.00      1.00      1.00        18
Iris-virginica       1.00      1.00      1.00        11

accuracy                           1.00        45
macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[[16  0  0]
[ 0 18  0]
[ 0  0 11]]
accuracy is  1.0``````
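
A minimal sketch of fitting a quadratic discriminant model is shown below. It assumes scikit-learn's bundled iris data rather than `Iris.csv` and default hyperparameters, and it uses `_qda` suffixes so it does not overwrite the variables used elsewhere. These choices are assumptions, so the exact numbers may differ from the report above.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: the bundled iris data stands in for Iris.csv
X_qda, y_qda = load_iris(return_X_y=True)
X_train_qda, X_test_qda, y_train_qda, y_test_qda = train_test_split(
    X_qda, y_qda, test_size=0.3, random_state=0)

# Unlike LDA, QDA estimates a separate covariance matrix for each class
model_qda = QuadraticDiscriminantAnalysis()
model_qda.fit(X_train_qda, y_train_qda)
y_pred_qda = model_qda.predict(X_test_qda)

qda_accuracy = accuracy_score(y_test_qda, y_pred_qda)
print('accuracy is', qda_accuracy)
```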

### Algorithm 17- K-Means

It is a type of unsupervised algorithm that solves the clustering problem. Its procedure follows a simple way to partition a given data set into a certain number of clusters (assume k clusters). Data points inside a cluster are homogeneous, and heterogeneous with respect to other clusters.

Remember figuring out shapes from ink blots? K-means is somewhat similar to this activity. You look at the shape and spread to decipher how many different clusters or populations are present.

``````x = data.iloc[:, [1, 2, 3, 4]].values

#Finding the optimum number of clusters for k-means clustering
from sklearn.cluster import KMeans
wcss = []

for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0)
    kmeans.fit(x)
    wcss.append(kmeans.inertia_)

#Plotting the results onto a line graph, allowing us to observe 'The elbow'
plt.plot(range(1, 11), wcss)
plt.title('The elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS') # within cluster sum of squares
plt.show()``````
``````#Applying kmeans to the dataset / Creating the kmeans classifier
kmeans = KMeans(n_clusters = 3, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0)
y_kmeans = kmeans.fit_predict(x)
#Visualising the clusters

plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Iris-Setosa')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Iris-Versicolour')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s = 100, c = 'yellow', label = 'Iris-Virginica')

#Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:,1], s = 100, c = 'green', label = 'Centroids',marker='*')

plt.legend()``````
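
The `inertia_` value collected in the elbow loop is the within-cluster sum of squares (WCSS): the sum of squared distances from each point to its closest cluster center. The sketch below computes it by hand for a few 2-D points. The function name, the points, and the hand-picked centroids are all invented for illustration.

```python
def wcss_by_hand(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for px, py in points:
        # squared Euclidean distance to the closest centroid
        total += min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids)
    return total

points = [(1, 1), (1, 3), (9, 9), (9, 11)]

print(wcss_by_hand(points, [(5, 6)]))            # one cluster
print(wcss_by_hand(points, [(1, 2), (9, 10)]))   # two clusters

#output
#132.0
#4.0
```

The sharp drop from one cluster to two is the kind of "elbow" the plot is meant to reveal.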

Original article source at https://thecleverprogrammer.com


## Python Print Variable

Python is a versatile and flexible language; there is often more than one way to achieve something.

In this tutorial, you'll see some of the ways you can print a string and a variable together.

Let's get started!

## How to use the `print()` function in Python

To print anything in Python, you use the `print()` function: the name `print` followed by a set of opening and closing parentheses, `()`.

``````#how to print a string
print("Hello world")

#how to print an integer
print(7)

#how to print a variable
#to just print the variable on its own include only the name of it

fave_language = "Python"
print(fave_language)

#output

#Hello world
#7
#Python
``````

If you omit the parentheses, you'll get an error:

``````print "hello world"

#output after running the code:
#File "/Users/dionysialemonaki/python_articles/demo.py", line 1
#    print "hello world"
#    ^^^^^^^^^^^^^^^^^^^
#SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?
``````

If you write your Python code in Visual Studio Code with the Python extension, you'll also get an underline and a hint indicating that something is not quite right.

As mentioned above, the `print()` function is used to output all kinds of information. This includes textual and numerical data, variables, and other data types.

You can also print text (or strings) combined with variables, all in one call.

You'll see some of the different ways to do this in the sections that follow.

## How to print a variable and a string in Python using concatenation

To concatenate, according to the dictionary, means to link (things) together in a chain or series.

In programming, you do this by joining pieces of data together with Python's addition operator, `+`.

Keep in mind that concatenation only works with strings, so if the variable you want to concatenate with the rest of the strings has an integer data type, you'll have to convert it to a string with the `str()` function.

In the following example, I want to print the value of a variable along with some other text.

I put the strings in double quotes and leave the variable name unquoted, using the addition operator to chain them all together:

``````fave_language = "Python"

print("I like coding in " + fave_language + " the most")

#output
#I like coding in Python the most
``````

With string concatenation, you have to add the spaces yourself, so if the previous example had not included any spaces inside the quotation marks, the output would look like this:

``````fave_language = "Python"

print("I like coding in" + fave_language + "the most")

#output
#I like coding inPythonthe most
``````

You can even add the spaces separately:

``````fave_language = "Python"

print("I like coding in" + " " + fave_language + " "  + "the most")

#output
#I like coding in Python the most
``````

This is not the preferred way of printing strings and variables, as it can be error-prone and time-consuming.
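The earlier note about converting integers with `str()` can be sketched as follows. The variable `age` and its value are hypothetical and are not part of the original examples; the snippet only illustrates that `+` refuses to join a string and an integer until the integer is converted.

```python
age = 30  # hypothetical integer value, used only for illustration

# Joining a string and an integer with + raises a TypeError
try:
    message = "I am " + age + " years old"
except TypeError:
    message = None
    print("Cannot concatenate a string and an integer directly")

# Converting the integer with str() first makes the concatenation work
message = "I am " + str(age) + " years old"
print(message)

#output
#Cannot concatenate a string and an integer directly
#I am 30 years old
```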

## How to print a variable and a string in Python by separating each with a comma

You can print text alongside a variable, separated by commas, in one `print()` call.

``````first_name = "John"

print("Hello",first_name)

#output
#Hello John
``````

In the example above, I first included some text I wanted to print in double quotes; in this case, the text was the string `Hello`.

After the closing quotation mark, I added a comma, which separates that piece of text from the value held in the variable name (`first_name` in this case) that I then included.

I could have added more text following the variable, like so:

``````first_name = "John"

print("Hello",first_name,"good to see you")

#output
#Hello John good to see you
``````

This method also works with more than one variable:

``````first_name = "John"
last_name = "Doe"

print("Hello",first_name,last_name,"good to see you")

#output
#Hello John Doe good to see you
``````

Make sure to separate everything with a comma.

So, you separate text from variables with a comma, but also variables from other variables, as shown above.

If the comma between `first_name` and `last_name` had been left out, the code would have thrown an error:

``````first_name = "John"
last_name = "Doe"

print("Hello",first_name last_name,"good to see you")

#output
#File "/Users/dionysialemonaki/python_articles/demo.py", line 4
#    print("Hello",first_name last_name,"good to see you")
#                 ^^^^^^^^^^^^^^^^^^^^
#SyntaxError: invalid syntax. Perhaps you forgot a comma?
``````

As you can see, Python's error messages are extremely helpful and make the debugging process a bit easier :)
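One practical difference from concatenation is worth a small sketch: with comma-separated arguments, `print()` converts each argument to text itself and, by default, puts a single space between them. The variable `age` and the helper string `expected` are hypothetical additions for illustration, not part of the original examples.

```python
first_name = "John"
age = 30  # hypothetical integer value, used only for illustration

# No str() conversion is needed here: print() converts each argument itself
print("Hello", first_name, "you are", age)

# The same text built by hand, to compare with what print() displays
expected = " ".join(["Hello", first_name, "you are", str(age)])
print(expected)

#output
#Hello John you are 30
#Hello John you are 30
```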

## How to print a variable and a string in Python using string formatting

You use string formatting by including a set of opening and closing curly braces, `{}`, in the place where you want to add the value of a variable.

``````first_name = "John"

print("Hello {}, hope you're well!")
``````

In this example there is one variable, `first_name`.

Inside the `print()` call there is a set of opening and closing double quotes with the text that should be printed.

Inside that, I added a set of curly braces in the place where I want the value of the variable `first_name` to appear.

If I run this code, I get the following output:

``````#output
#Hello {}, hope you're well!
``````

It doesn't actually print the value of `first_name`!

To print it, I need to add the `.format()` string method at the end of the string, that is, immediately after the closing quotation mark:

``````first_name = "John"

print("Hello {}, hope you're well!".format(first_name))

#output
#Hello John, hope you're well!
``````

When there is more than one variable, you use as many curly braces as the number of variables you want to print:

``````first_name = "John"
last_name = "Doe"

print("Hello {} {}, hope you're well!")
``````

In this example, I've created two variables and I want to print both, one after the other, so I added two sets of curly braces in the place where I want the variables to be substituted.

Now, when it comes to the `.format()` method, the order in which you place the variable names matters.

So, the value of the variable name that is added first in the method will take the place of the first pair of curly braces, the value of the variable name added second will take the place of the second pair, and so on.

Make sure to separate the variable names with commas inside the method:

``````first_name = "John"
last_name = "Doe"

print("Hello {} {}, hope you're well!".format(first_name,last_name))

#output
#Hello John Doe, hope you're well!
``````

If I had reversed the order of the names inside the method, the output would look different:

``````first_name = "John"
last_name = "Doe"

print("Hello {} {}, hope you're well!".format(last_name,first_name))

#output
#Hello Doe John, hope you're well!
``````
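The rule about using as many curly braces as variables can be checked with a short sketch. It assumes the same `first_name` and `last_name` variables as above; the flag `mismatch_detected` is a hypothetical name introduced only to record what happens when a placeholder has no matching argument.

```python
first_name = "John"
last_name = "Doe"

# Two placeholders and two arguments: each {} is filled in order
greeting = "Hello {} {}, hope you're well!".format(first_name, last_name)
print(greeting)

# Two placeholders but only one argument: Python raises an IndexError
try:
    "Hello {} {}, hope you're well!".format(first_name)
    mismatch_detected = False
except IndexError:
    mismatch_detected = True

print(mismatch_detected)

#output
#Hello John Doe, hope you're well!
#True
```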

## How to print a variable and a string in Python using `f-strings`

`f-strings` are a better, more readable, and more concise way of achieving string formatting compared to the method we saw in the previous section.

The syntax is simpler and requires less manual work.

The general syntax for creating an `f-string` looks like this:

``````print(f"I want this text printed to the console!")

#output
#I want this text printed to the console!
``````

You first include the character `f` right before the opening quotation mark, inside the `print()` function.

To print a variable with a string in one line, you again include the character `f` in the same place, right before the quotation marks.

Then you add the text you want inside the quotation marks, and in the place where you want the value of a variable to appear, you add a set of curly braces with the variable name inside them:

``````first_name = "John"

print(f"Hello, {first_name}!")

#output
#Hello, John!
``````

To print more than one variable, you add another set of curly braces with the name of the second variable:

``````first_name = "John"
last_name = "Doe"

print(f"Hello, {first_name} {last_name}!")

#output
#Hello, John Doe!
``````

The order in which you place the variable names matters, so make sure you add them according to the output you want.

If I had reversed the order of the names, I would get the following output:

``````first_name = "John"
last_name = "Doe"

print(f"Hello, {last_name} {first_name}!")

#output
#Hello, Doe John!
``````
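As a side-by-side sketch, the snippet below builds the same message with concatenation, `.format()`, and an `f-string`. The variable `age` and the three result names are hypothetical and chosen only for illustration; note that only the concatenation version needs an explicit `str()` conversion for the integer.

```python
first_name = "John"
age = 30  # hypothetical integer value, used only for illustration

# concatenation: the integer must be converted with str()
by_concatenation = "Hello " + first_name + ", you are " + str(age)

# .format(): placeholders are filled in order, no conversion needed
by_format = "Hello {}, you are {}".format(first_name, age)

# f-string: variable names go directly inside the curly braces
by_f_string = f"Hello {first_name}, you are {age}"

print(by_concatenation)
print(by_format)
print(by_f_string)

#output
#Hello John, you are 30
#Hello John, you are 30
#Hello John, you are 30
```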

## Conclusion

Thanks for reading and making it to the end! You now know a few different ways of printing strings and variables together in one line in Python.

If you want to learn more about Python, check out freeCodeCamp's Python Certification.

It is suitable for beginners, as it starts from the fundamentals and gradually builds towards more advanced concepts. You'll also get to build five projects and put all the new knowledge you acquire into practice.

Happy coding!

https://www.freecodecamp.org/news/python-print-variable-how-to-print-a-string-and-variable/

1641884883

## How to Append to a List or Array in Python Like a Pro

In this article, you'll learn about the `.append()` method in Python. You'll also see how `.append()` differs from other methods used to add elements to lists.

Let's get started!

## What are lists in Python? A definition for beginners

An array in programming is an ordered collection of items, and all the items need to be of the same data type.

However, unlike in other programming languages, arrays aren't a built-in data structure in Python. Instead of traditional arrays, Python uses lists.

Lists are essentially dynamic arrays and are one of the most common and powerful data structures in Python.

You can think of them as ordered containers. They store and organize related data together.

The elements stored in a list can be of any data type.

There can be lists of integers (whole numbers), lists of floats (floating-point numbers), lists of strings (text), and lists of any other built-in Python data type.

Although it is possible for lists to hold only items of the same data type, they are more flexible than traditional arrays. This means that there can be a variety of different data types inside the same list.

Lists have 0 or more items, meaning there can also be empty lists. Inside a list there can also be duplicate values.

Values are separated by commas and enclosed in square brackets, `[]`.

### How to create lists in Python

To create a new list, first give the list a name. Then add the assignment operator (`=`) and a pair of opening and closing square brackets. Inside the brackets, add the values you want the list to contain.

``````#create a new list of names
names = ["Jimmy", "Timmy", "Kenny", "Lenny"]

#print the list to the console
print(names)

#output
#['Jimmy', 'Timmy', 'Kenny', 'Lenny']
``````
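The properties described in the definition above (empty lists, mixed data types, and duplicate values) can be sketched in a few lines. The list names and values here are hypothetical and chosen only for illustration.

```python
empty_list = []                           # a list with 0 items
mixed_list = ["Jimmy", 7, 3.14, True]     # different data types together
duplicates = ["Kenny", "Kenny", "Lenny"]  # the same value more than once

print(len(empty_list))
print(mixed_list)
print(duplicates)

#output
#0
#['Jimmy', 7, 3.14, True]
#['Kenny', 'Kenny', 'Lenny']
```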

### How lists are indexed in Python

Lists maintain an order for each item.

Each item in the collection has its own index number, which you can use to access the item itself.

Indexes in Python (and in most other modern programming languages) start at 0 and increase by one for each item in the list.

For example, the list created earlier had 4 values:

``````names = ["Jimmy", "Timmy", "Kenny", "Lenny"]
``````

The first value in the list, "Jimmy", has an index of 0.

The second value in the list, "Timmy", has an index of 1.

The third value in the list, "Kenny", has an index of 2.

The fourth value in the list, "Lenny", has an index of 3.

To access an element in the list by its index number, first write the name of the list, then write the integer index of the element in square brackets.

For example, if you wanted to access the element that has an index of 2, you'd do:

``````names = ["Jimmy", "Timmy", "Kenny", "Lenny"]

print(names[2])

#output
#Kenny
``````
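To see every index next to its value at once, one option is Python's built-in `enumerate()` function. It is not used in the original walkthrough, so treat this as a small supplementary sketch:

```python
names = ["Jimmy", "Timmy", "Kenny", "Lenny"]

# enumerate() pairs each item with its index, starting at 0
for index, name in enumerate(names):
    print(index, name)

#output
#0 Jimmy
#1 Timmy
#2 Kenny
#3 Lenny
```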

## Lists in Python are mutable

In Python, when objects are mutable, it means that their values can be changed once they've been created.

Lists are mutable objects, so you can update and change them after they have been created.

Lists are also dynamic, meaning they can grow and shrink throughout the life of a program.

Items can be removed from an existing list, and new items can be added to an existing list.

There are built-in methods for both adding and removing items from lists.

For example, to add items, there are the `.append()`, `.insert()`, and `.extend()` methods.

To remove items, there are the `.remove()`, `.pop()`, and `.pop(index)` methods.
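A minimal sketch of mutability and of the removal methods just listed is shown below. The replacement name "Johnny" and the variables `last` and `first` are hypothetical and used only for illustration.

```python
names = ["Jimmy", "Timmy", "Kenny", "Lenny"]

# change an existing item by assigning to its index
names[0] = "Johnny"

# remove the first item equal to the given value
names.remove("Timmy")

# remove and return the last item
last = names.pop()

# remove and return the item at index 0
first = names.pop(0)

print(names)
print(last)
print(first)

#output
#['Kenny']
#Lenny
#Johnny
```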

### What does the `.append()` method do?

The `.append()` method adds an additional element to the end of an already existing list.

The general syntax looks something like this:

``````list_name.append(item)
``````

Let's break it down:

• `list_name` is the name you've given the list.
• `.append()` is the list method for adding an item to the end of `list_name`.
• `item` is the specified individual item you want to add.

When using `.append()`, the original list gets modified. No new list is created.

If you wanted to add an extra name to the list created earlier, you would do the following:

``````names = ["Jimmy", "Timmy", "Kenny", "Lenny"]

#add the name Dylan to the end of the list
names.append("Dylan")

print(names)

#output
#['Jimmy', 'Timmy', 'Kenny', 'Lenny', 'Dylan']
``````
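The point that no new list is created can be checked with a short sketch. The names `same_list` and `result` are hypothetical helpers: `same_list` is a second name for the same list object, and `result` holds whatever `.append()` returns.

```python
names = ["Jimmy", "Timmy", "Kenny", "Lenny"]
same_list = names  # a second name for the same list object

# .append() changes the list in place and returns None
result = names.append("Dylan")

print(result)
print(same_list)
print(same_list is names)

#output
#None
#['Jimmy', 'Timmy', 'Kenny', 'Lenny', 'Dylan']
#True
```

Because `.append()` returns `None`, writing `names = names.append("Dylan")` would replace the list with `None`, which is a common beginner mistake.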

### What's the difference between the `.append()` and `.insert()` methods?

The difference between the two methods is that `.append()` adds an item to the end of a list, whereas `.insert()` inserts an item at a specified position in the list.

As you saw in the previous section, `.append()` always adds the item you pass as the argument to the end of the list.

If you don't want to just add items to the end of a list, you can specify the position where you want to add them with `.insert()`.

The general syntax looks like this:

``````list_name.insert(position,item)
``````

Let's break it down:

• `list_name` is the name of the list.
• `.insert()` is the list method for inserting an item in a list.
• `position` is the first argument to the method. It's always an integer: specifically, the index number of the position where you want the new item to be placed.
• `item` is the second argument to the method. Here you specify the new item you want to add to the list.

For example, say you have the following list of programming languages:

``````programming_languages = ["JavaScript", "Java", "C++"]

print(programming_languages)

#output
#['JavaScript', 'Java', 'C++']
``````

If you wanted to insert "Python" at the start of the list, as a new item, you would use the `.insert()` method and specify the position as `0`. (Remember that the first value in a list always has an index of 0.)

``````programming_languages = ["JavaScript", "Java", "C++"]

programming_languages.insert(0, "Python")

print(programming_languages)

#output
#['Python', 'JavaScript', 'Java', 'C++']
``````

If you instead wanted "JavaScript" to stay the first item in the list and "Python" to be added right after it as the new item, you would specify the position as `1`:

``````programming_languages = ["JavaScript", "Java", "C++"]

programming_languages.insert(1,"Python")

print(programming_languages)

#output
#['JavaScript', 'Python', 'Java', 'C++']
``````

The `.insert()` method gives you a bit more flexibility compared to the `.append()` method, which only adds a new item to the end of the list.
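As a small sketch of how the two methods relate, inserting at a position equal to the list's current length gives the same result as appending. The list names `with_append` and `with_insert` are hypothetical and used only for this comparison.

```python
with_append = ["JavaScript", "Java", "C++"]
with_insert = ["JavaScript", "Java", "C++"]

with_append.append("Python")

# inserting at the position equal to the list's length adds at the end
with_insert.insert(len(with_insert), "Python")

print(with_append)
print(with_insert)
print(with_append == with_insert)

#output
#['JavaScript', 'Java', 'C++', 'Python']
#['JavaScript', 'Java', 'C++', 'Python']
#True
```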

### What's the difference between the `.append()` and `.extend()` methods?

What if you want to add more than one item to a list at once, instead of adding them one at a time?

You might try to use the `.append()` method to add more than one item to the end of a list.

Say you have a list that contains only two programming languages:

``````programming_languages = ["JavaScript", "Java"]

print(programming_languages)

#output
#['JavaScript', 'Java']
``````

You then want to add two more languages at the end of it.

In that case, you pass a list containing the two new values you want to add as the argument to `.append()`:

``````programming_languages = ["JavaScript", "Java"]

#add two new items to the end of the list
programming_languages.append(["Python","C++"])

print(programming_languages)

#output
#['JavaScript', 'Java', ['Python', 'C++']]
``````

If you take a closer look at the output above, `['JavaScript', 'Java', ['Python', 'C++']]`, you'll see that a new list got added to the end of the already existing list.

So, `.append()` adds a list inside of a list.

Lists are objects, and when you use `.append()` to add another list into a list, the new items are added as a single object (item).

Say you already have two lists, like so:

``````names = ["Jimmy", "Timmy"]
more_names = ["Kenny", "Lenny"]
``````

What if you want to combine the contents of both lists into one, by adding the contents of `more_names` to `names`?

When the `.append()` method is used for this purpose, another list gets created inside `names`:

``````names = ["Jimmy", "Timmy"]
more_names = ["Kenny", "Lenny"]

#add contents of more_names to names
names.append(more_names)

print(names)

#output
#['Jimmy', 'Timmy', ['Kenny', 'Lenny']]
``````

So, `.append()` adds the new elements as another list, by appending the object to the end.

To actually concatenate (add) lists together and combine all the items from one list into another, you need to use the `.extend()` method.

The general syntax looks like this:

``````list_name.extend(iterable/other_list_name)
``````

Let's break it down:

• `list_name` is the name of one of the lists.
• `.extend()` is the method for adding all the contents of one list to another.
• `iterable` can be any iterable, such as another list, for example `other_list_name`. In that case, `other_list_name` is a list that will be concatenated with `list_name`, and its contents will be added one by one to the end of `list_name`, as separate items.

So, taking the earlier example, when `.append()` is replaced with `.extend()`, the output looks like this:

``````names = ["Jimmy", "Timmy"]
more_names = ["Kenny", "Lenny"]

names.extend(more_names)

print(names)

#output
#['Jimmy', 'Timmy', 'Kenny', 'Lenny']
``````

When `.extend()` was used, the `names` list was extended and its length increased by 2.

The way `.extend()` works is that it takes a list (or another iterable) as its argument, iterates over each element, and then adds each element of the iterable to the list.
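The description above can be sketched as a loop that appends each element in turn. This is only an approximation of the behavior, not how Python implements `.extend()` internally, and the extra name "Benny" is hypothetical.

```python
names = ["Jimmy", "Timmy"]
more_names = ["Kenny", "Lenny"]

# roughly what names.extend(more_names) does
for name in more_names:
    names.append(name)

print(names)

# .extend() accepts any iterable, for example a tuple
names.extend(("Dylan", "Benny"))

print(names)

#output
#['Jimmy', 'Timmy', 'Kenny', 'Lenny']
#['Jimmy', 'Timmy', 'Kenny', 'Lenny', 'Dylan', 'Benny']
```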

There is another difference between `.append()` and `.extend()`.

When you want to add a string, as seen earlier, `.append()` adds the whole, single item to the end of the list:

``````names = ["Jimmy", "Timmy", "Kenny", "Lenny"]

#add the name Dylan to the end of the list
names.append("Dylan")

print(names)

#output
#['Jimmy', 'Timmy', 'Kenny', 'Lenny', 'Dylan']
``````

If you instead used `.extend()` to add a string to the end of a list, each character in the string would be added as an individual item to the list.

This is because strings are iterable, and `.extend()` iterates over the iterable argument passed to it.

So, the example from above would look like this:

``````names = ["Jimmy", "Timmy", "Kenny", "Lenny"]

#pass a string(iterable) to .extend()
names.extend("Dylan")

print(names)

#output
#['Jimmy', 'Timmy', 'Kenny', 'Lenny', 'D', 'y', 'l', 'a', 'n']
``````
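If you do want `.extend()` to add a whole string as one item, one possible workaround is to wrap the string in a list first, so that `.extend()` iterates over the one-element list instead of over the characters. This is a sketch of that idea rather than something the original walkthrough covers:

```python
names = ["Jimmy", "Timmy", "Kenny", "Lenny"]

#pass a one-item list to .extend() so the string stays whole
names.extend(["Dylan"])

print(names)

#output
#['Jimmy', 'Timmy', 'Kenny', 'Lenny', 'Dylan']
```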

## Conclusion

To sum up, the `.append()` method is used for adding an item to the end of an existing list, without creating a new list.

When it is used for adding a list to another list, it creates a list within a list.

If you want to learn more about Python, check out freeCodeCamp's Python Certification. You'll start learning in an interactive and beginner-friendly way. You'll also build five projects at the end to put into practice what you learned.
