# 10 Principles of Practical Statistical Reasoning

There are 2 core aspects to fruitful application of statistics (data science):

1. Domain knowledge.
2. Statistical methodology.

Due to the highly specific nature of this field, it is difficult for any book or article to convey both a detailed and accurate description of the interplay between the two. In general, one can read material of two types:

1. Broad info on statistical methods with conclusions that generalise but are not specific.
2. Detailed statistical methods with conclusions that are useful only in a specific domain.

After 3 years working on my own data science projects and 3.5 years manipulating data on the trading floor, I have found an additional category of learnings. It is fundamentally just as useful as the above, and I take it into every project/side hustle/consulting gig…

Practical Statistical Reasoning

I made that term up because I don’t really know what to call this category. However, it covers:

• The nature and objective of applied statistics/data science.
• Principles common to all applications.
• Practical steps/questions for better conclusions.

If you have experience of the application of statistical methods, I encourage you to use your experience to illuminate and criticise the following principles. If you have never tried implementing a statistical model, have a go and then return. Don’t see the following as a list to memorise. You’ll get peak synthesis of information if you can relate to your own experience.

The following principles have helped me become more efficient with my analyses and clearer in my conclusions. I hope you can find value in them too.

#machine-learning #data-science #statistics #programming #data

## How to Create Arrays in Python

### In this tutorial, you'll learn the basics of creating arrays in Python using the `array` module. You'll see how to define them and the different methods commonly used for performing operations on them.

This tutorial video on 'Arrays in Python' will help you establish a strong hold on the fundamentals of the Python programming language. Below are the topics covered in this video:
1:15 What is an array?
2:53 Is a Python list the same as an array?
3:48 How to create arrays in Python?
7:19 Accessing array elements
9:59 Basic array operations
- 10:33  Finding the length of an array
- 11:44  Adding Elements
- 15:06  Removing elements
- 18:32  Array concatenation
- 20:59  Slicing
- 23:26  Looping

Python Array Tutorial – Define, Index, Methods

In this article, you'll learn how to use Python arrays. You'll see how to define them and the different methods commonly used for performing operations on them.

The article covers arrays that you create by importing the `array` module. We won't cover NumPy arrays here.

1. Introduction to Arrays
   1. The differences between Lists and Arrays
   2. When to use arrays
2. How to use arrays
   1. Define arrays
   2. Find the length of arrays
   3. Array indexing
   4. Search through arrays
   5. Loop through arrays
   6. Slice an array
3. Array methods for performing operations
   1. Change an existing value
   2. Add a new value
   3. Remove a value
4. Conclusion

Let's get started!

## What are Python Arrays?

Arrays are a fundamental data structure, and an important part of most programming languages. In Python, they are containers which are able to store more than one item at the same time.

Specifically, they are an ordered collection of elements with every value being of the same data type. That is the most important thing to remember about Python arrays - the fact that they can only hold a sequence of multiple items that are of the same type.

### What's the Difference between Python Lists and Python Arrays?

Lists are one of the most common data structures in Python, and a core part of the language.

Lists and arrays behave similarly.

Just like arrays, lists are an ordered sequence of elements.

They are also mutable and not fixed in size, which means they can grow and shrink throughout the life of the program. Items can be added and removed, making them very flexible to work with.

However, lists and arrays are not the same thing.

Lists store items that are of various data types. This means that a list can contain integers, floating point numbers, strings, or any other Python data type, at the same time. That is not the case with arrays.

As mentioned in the section above, arrays store only items of a single data type. An array can hold only integers, or only floating point numbers, or only values of another type that the `array` module supports (it is limited to basic numeric types and characters).

### When to Use Python Arrays

Lists are built into the Python programming language, whereas arrays aren't. Arrays are not a built-in data structure, so the `array` module must be imported before they can be used.

Arrays from the `array` module are a thin wrapper over C arrays, and are useful when you want to work with homogeneous data.

They are also more compact than lists: they take up less memory, which makes them more space efficient.
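To get a rough feel for the difference in size, you could compare what `sys.getsizeof()` reports for a list and for an array holding the same values. This is only a sketch: the exact numbers depend on your platform and Python version, and the list figure does not even include the integer objects the list points to.

```python
import array
import sys

values = list(range(1000))
numbers = array.array('i', values)

# For a list, sys.getsizeof() counts the list object and its pointers,
# but not the int objects those pointers refer to
list_bytes = sys.getsizeof(values)

# For an array, the raw element data is stored inside the array object itself
array_bytes = sys.getsizeof(numbers)

print("bytes per array element:", numbers.itemsize)
print("list container size:", list_bytes)
print("array size:", array_bytes)
```

On a typical 64-bit CPython build the array comes out noticeably smaller, because each element takes `itemsize` bytes instead of a full pointer plus a separate integer object.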

If you want to perform mathematical calculations, you should use NumPy arrays by importing the NumPy package. Otherwise, use Python arrays only when you really need to, as lists work in a similar way and are more flexible.

## How to Use Arrays in Python

In order to create Python arrays, you'll first have to import the `array` module, which contains all the necessary functions.

There are three ways you can import the `array` module:

• By using `import array` at the top of the file. This imports the module `array`. You would then go on to create an array using `array.array()`.
```python
import array

#how you would create an array (the arguments are explained in the next section)
array.array('i', [10, 20, 30])
```
• Instead of having to type `array.array()` all the time, you could use `import array as arr` at the top of the file, instead of `import array` alone. You would then create an array by typing `arr.array()`. The `arr` acts as an alias name, with the array constructor then immediately following it.
```python
import array as arr

#how you would create an array
arr.array('i', [10, 20, 30])
```
• Lastly, you could also use `from array import *`, with `*` importing all the functionality available. You would then create an array by writing the `array()` constructor alone.
```python
from array import *

#how you would create an array
array('i', [10, 20, 30])
```

### How to Define Arrays in Python

Once you've imported the `array` module, you can then go on to define a Python array.

The general syntax for creating an array looks like this:

``variable_name = array(typecode,[elements])``

Let's break it down:

• `variable_name` would be the name of the array.
• The `typecode` specifies what kind of elements will be stored in the array: integers, floats, or another type the `array` module supports. Remember that all elements must be of the same data type.
• Inside square brackets you mention the `elements` that would be stored in the array, with each element being separated by a comma. You can also create an empty array by just writing `variable_name = array(typecode)` alone, without any elements.

Below is a typecode table, with the different typecodes that can be used with the different data types when defining Python arrays:
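If you want to check which typecodes your own interpreter supports, a small sketch like the one below can list them together with the size of one element in bytes. It relies on `array.typecodes` and the `itemsize` attribute from the standard library; the sizes it prints depend on your platform.

```python
import array

# array.typecodes is a string containing every typecode this interpreter supports
for code in array.typecodes:
    if code == 'u':
        # 'u' is deprecated in recent Python versions, so skip it here
        continue
    empty = array.array(code)     # an empty array of that type
    print(code, empty.itemsize)   # itemsize is the size of one element in bytes
```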

Tying everything together, here is an example of how you would define an array in Python:

```python
import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers)

#output

#array('i', [10, 20, 30])
```

Let's break it down:

• First we included the array module, in this case with `import array as arr`.
• Then, we created a `numbers` array.
• We used `arr.array()` because of `import array as arr`.
• Inside the `array()` constructor, we first included `i`, for signed integer. Signed integer means that the array can include positive and negative values. Unsigned integer, with `H` for example, would mean that no negative values are allowed.
• Lastly, we included the values to be stored in the array in square brackets.
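To see the signed/unsigned distinction in practice, here is a small sketch showing what happens when a value does not fit the chosen typecode. The variable names are just for illustration.

```python
import array as arr

# 'H' is an unsigned short: negative values are rejected
try:
    arr.array('H', [10, -20, 30])
except OverflowError as error:
    negative_error = error
    print("negative value rejected:", error)

# values that are too large for the typecode are rejected too
try:
    arr.array('b', [1000])   # 'b' is a signed char (one byte)
except OverflowError as error:
    too_large_error = error
    print("value too large:", error)
```

In both cases Python raises an `OverflowError` rather than silently wrapping the value around.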

Keep in mind that if you tried to include values that were not of `i` typecode, meaning they were not integer values, you would get an error:

```python
import array as arr

numbers = arr.array('i',[10.0,20,30])

print(numbers)

#output

#Traceback (most recent call last):
# File "/Users/dionysialemonaki/python_articles/demo.py", line 14, in <module>
#   numbers = arr.array('i',[10.0,20,30])
#TypeError: 'float' object cannot be interpreted as an integer
```

In the example above, I tried to include a floating point number in the array. I got an error because this is meant to be an integer array only.

Another way to create an array is the following:

```python
from array import *

#an array of floating point values
numbers = array('d',[10.0,20.0,30.0])

print(numbers)

#output

#array('d', [10.0, 20.0, 30.0])
```

The example above imported the `array` module via `from array import *` and created an array `numbers` of float data type. This means that it holds only floating point numbers, which is specified with the `'d'` typecode.

### How to Find the Length of an Array in Python

To find out the exact number of elements contained in an array, use the built-in `len()` function.

It will return the integer number that is equal to the total number of elements in the array you specify.

```python
import array as arr

numbers = arr.array('i',[10,20,30])

print(len(numbers))

#output
# 3
```

In the example above, the array contained three elements – `10, 20, 30` – so the length of `numbers` is `3`.

### Array Indexing and How to Access Individual Items in an Array in Python

Each item in an array has a specific address. Individual items are accessed by referencing their index number.

Indexing in Python, as in most programming languages, starts at `0`. It is important to remember that counting starts at `0` and not at `1`.

To access an element, you first write the name of the array followed by square brackets. Inside the square brackets you include the item's index number.

The general syntax would look something like this:

``array_name[index_value_of_item]``

Here is how you would access each individual element in an array:

```python
import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers[0]) # gets the 1st element
print(numbers[1]) # gets the 2nd element
print(numbers[2]) # gets the 3rd element

#output

#10
#20
#30
```

Remember that the index value of the last element of an array is always one less than the length of the array. Where `n` is the length of the array, `n - 1` will be the index value of the last item.

Note that you can also access each individual element using negative indexing.

With negative indexing, the last element would have an index of `-1`, the second to last element would have an index of `-2`, and so on.

Here is how you would get each item in an array using that method:

```python
import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers[-1]) #gets last item
print(numbers[-2]) #gets second to last item
print(numbers[-3]) #gets first item

#output

#30
#20
#10
```

### How to Search Through an Array in Python

You can find out an element's index number by using the `index()` method.

You pass the value of the element being searched as the argument to the method, and the element's index number is returned.

```python
import array as arr

numbers = arr.array('i',[10,20,30])

#search for the index of the value 10
print(numbers.index(10))

#output

#0
```

If there is more than one element with the same value, the index of the first instance of the value will be returned:

```python
import array as arr

numbers = arr.array('i',[10,20,30,10,20,30])

#search for the index of the value 10
#will return the index number of the first instance of the value 10
print(numbers.index(10))

#output

#0
```
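One thing to watch out for: if the value is not in the array at all, `index()` raises a `ValueError`. Below is a minimal sketch of one way to handle that. The helper name `find_index` is made up for this example and is not part of the `array` module.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

def find_index(values, target):
    """Return the index of target in values, or -1 if it is not present."""
    try:
        return values.index(target)
    except ValueError:
        # index() raises ValueError when the value is missing
        return -1

print(find_index(numbers, 20))  # 1
print(find_index(numbers, 99))  # -1
```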

### How to Loop through an Array in Python

You've seen how to access each individual element in an array and print it out on its own.

You've also seen how to print the array, using the `print()` function. That gives the following result:

```python
import array as arr

numbers = arr.array('i',[10,20,30])

print(numbers)

#output

#array('i', [10, 20, 30])
```

What if you want to print each value one by one?

This is where a loop comes in handy. You can loop through the array and print out each value, one-by-one, with each loop iteration.

For this you can use a simple `for` loop:

```python
import array as arr

numbers = arr.array('i',[10,20,30])

for number in numbers:
    print(number)

#output
#10
#20
#30
```

You could also use the `range()` function, and pass the result of the `len()` function as its argument. This would give the same result as above:

```python
import array as arr

values = arr.array('i',[10,20,30])

#prints each individual value in the array, looking it up by its index
for index in range(len(values)):
    print(values[index])

#output

#10
#20
#30
```
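If you need both the index and the value inside the loop, the built-in `enumerate()` function is a common alternative to `range(len(...))`. A short sketch:

```python
import array as arr

values = arr.array('i', [10, 20, 30])

# enumerate() yields (index, value) pairs for each element
for index, value in enumerate(values):
    print(index, value)

#output

#0 10
#1 20
#2 30
```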

### How to Slice an Array in Python

To access a specific range of values inside the array, use the slicing operator, which is a colon `:`.

When you include only a value after the colon, counting starts from `0` by default. The slice begins at the first item and goes up to, but does not include, the index number you specify.

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#get the values 10 and 20 only
print(numbers[:2])  #first to second position

#output

#array('i', [10, 20])
```

When you specify two numbers, one on each side of the colon, you define a range. The slice starts at the position of the first number and goes up to, but does not include, the second one:

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#get the values 20 and 30 only
print(numbers[1:3]) #second to third position

#output

#array('i', [20, 30])
```
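A few more slicing forms are worth knowing. The sketch below uses a slightly longer array than the examples above, and shows that a slice is a new array, so changing it leaves the original untouched.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30, 40, 50])

tail = numbers[2:]         # from index 2 to the end
every_other = numbers[::2] # every second item, starting at index 0
last_two = numbers[-2:]    # the last two items

print(tail)         # array('i', [30, 40, 50])
print(every_other)  # array('i', [10, 30, 50])
print(last_two)     # array('i', [40, 50])

# a slice is a copy: modifying it does not change the original array
tail[0] = 99
print(numbers)      # array('i', [10, 20, 30, 40, 50])
```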

## Methods For Performing Operations on Arrays in Python

Arrays are mutable, which means they are changeable. You can change the value of the different items, add new ones, or remove any you don't want in your program anymore.

Let's see some of the most commonly used methods which are used for performing operations on arrays.

### How to Change the Value of an Item in an Array

You can change the value of a specific element by specifying its position and assigning it a new value:

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#change the first element
#change it from having a value of 10 to having a value of 40
numbers[0] = 40

print(numbers)

#output

#array('i', [40, 20, 30])
```

### How to Add a New Value to an Array

To add one single value at the end of an array, use the `append()` method:

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 to the end of numbers
numbers.append(40)

print(numbers)

#output

#array('i', [10, 20, 30, 40])
```

Be aware that the new item you add needs to be the same data type as the rest of the items in the array.

Look what happens when I try to add a float to an array of integers:

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#try to add the float 40.0 to the end of numbers
numbers.append(40.0)

print(numbers)

#output

#Traceback (most recent call last):
#  File "/Users/dionysialemonaki/python_articles/demo.py", line 19, in <module>
#   numbers.append(40.0)
#TypeError: 'float' object cannot be interpreted as an integer
```

But what if you want to add more than one value to the end of an array?

Use the `extend()` method, which takes an iterable (such as a list of items) as an argument. Again, make sure that the new items are all the same data type.

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#add the integers 40,50,60 to the end of numbers
#The numbers need to be enclosed in square brackets

numbers.extend([40,50,60])

print(numbers)

#output

#array('i', [10, 20, 30, 40, 50, 60])
```
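You can also join two arrays together. As a small sketch, assuming both arrays use the same typecode, the `+` operator builds a new array, while `extend()` with another array changes the first one in place:

```python
import array as arr

first = arr.array('i', [10, 20, 30])
second = arr.array('i', [40, 50])

# + creates a new array and leaves both originals unchanged
combined = first + second
print(combined)  # array('i', [10, 20, 30, 40, 50])
print(first)     # array('i', [10, 20, 30])

# extend() also accepts another array of the same typecode
first.extend(second)
print(first)     # array('i', [10, 20, 30, 40, 50])
```

If the two arrays have different typecodes, both operations raise a `TypeError`.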

And what if you don't want to add an item to the end of an array? Use the `insert()` method, to add an item at a specific position.

The `insert()` method takes two arguments: the index number of the position where the new element will be inserted, and the value of the new element.

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 in the first position
#remember indexing starts at 0

numbers.insert(0,40)

print(numbers)

#output

#array('i', [40, 10, 20, 30])
```

### How to Remove a Value from an Array

To remove an element from an array, use the `remove()` method and include the value as an argument to the method.

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30])
```

With `remove()`, only the first instance of the value you pass as an argument will be removed.

See what happens when there is more than one identical value:

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30,10,20])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30, 10, 20])
```

Only the first occurrence of `10` is removed.

You can also use the `pop()` method, and specify the position of the element to be removed:

```python
import array as arr

#original array
numbers = arr.array('i',[10,20,30,10,20])

#remove the element at index 0 (here, the first 10)
numbers.pop(0)

print(numbers)

#output

#array('i', [20, 30, 10, 20])
```
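Two more details about removing items, shown as a short sketch: `pop()` returns the value it removes (and removes the last item when you give it no index), and `remove()` raises a `ValueError` if the value is not in the array.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# with no argument, pop() removes and returns the last item
last = numbers.pop()
print(last)     # 30
print(numbers)  # array('i', [10, 20])

# remove() raises ValueError when the value is not present
try:
    numbers.remove(99)
except ValueError:
    print("99 is not in the array")
```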

## Conclusion

And there you have it - you now know the basics of how to create arrays in Python using the `array` module. Hopefully you found this guide helpful.

Thanks for reading and happy coding!

#python #programming

## Branchless workflow for Git

(This suite of tools is 100% compatible with branches. If you think this is confusing, you can suggest a new name here.)

`git-branchless` is a suite of tools which enhances Git in several ways:

It makes Git easier to use, both for novices and for power users.

It adds more flexibility for power users.

It provides faster operations for large repositories and monorepos, particularly at large tech companies. Examples:

• See the blog post Lightning-fast rebases with git-move.
• Performance tested: benchmarked on torvalds/linux (1M+ commits) and mozilla/gecko-dev (700k+ commits).
• Operates in-memory: avoids touching the working copy by default (which can slow down `git status` or invalidate build artifacts).
• Sparse indexes: uses a custom implementation of sparse indexes for fast commit and merge operations.
• Segmented changelog DAG: for efficient queries on the commit graph, such as merge-base calculation in O(log n) instead of O(n).
• Ahead-of-time compiled: written in an ahead-of-time compiled language with good runtime performance (Rust).
• Multithreading: distributes work across multiple CPU cores where appropriate.
• To my knowledge, `git-branchless` provides the fastest implementation of rebase among Git tools and UIs, for the above reasons.

See also the User guide and Design goals.

## Demos

### Repair

Undo almost anything:

• Commits.
• Amended commits.
• Merges and rebases (e.g. if you resolved a conflict wrongly).
• Checkouts.
• Branch creations, updates, and deletions.

Why not `git reflog`?

`git reflog` is a tool to view the previous position of a single reference (like `HEAD`), which can be used to undo operations. But since it only tracks the position of a single reference, complicated operations like rebases can be tedious to reverse-engineer. `git undo` operates at a higher level of abstraction: the entire state of your repository.

`git reflog` also fundamentally can't be used to undo some rare operations, such as certain branch creations, updates, and deletions. See the architecture document for more details.

What doesn't `git undo` handle?

`git undo` relies on features in recent versions of Git to work properly. See the compatibility chart.

Currently, `git undo` can't undo the following. You can find the design document to handle some of these cases in issue #10.

• "Uncommitting" a commit by undoing the commit and restoring its changes to the working copy.
• In stock Git, this can be accomplished with `git reset HEAD^`.
• This scenario would be better implemented with a custom `git uncommit` command instead. See issue #3.
• Undoing the staging or unstaging of files. This is tracked by issue #10 above.
• Undoing back into the middle of a conflict, such that `git status` shows a message like `path/to/file (both modified)`, so that you can resolve that specific conflict differently. This is tracked by issue #10 above.

Fundamentally, `git undo` is not intended to handle changes to untracked files.

Comparison to other Git undo tools

### Visualize

Visualize your commit history with the smartlog (`git sl`):

Why not `git log --graph`?

`git log --graph` only shows commits which have branches attached to them. If you prefer to work without branches, then `git log --graph` won't work for you.

To support users who rewrite their commit graph extensively, `git sl` also points out commits which have been abandoned and need to be repaired (descendants of commits marked with `rewritten as abcd1234`). They can be automatically fixed up with `git restack`, or manually handled.

### Manipulate

Edit your commit graph without fear:

Why not `git rebase -i`?

Interactive rebasing with `git rebase -i` is fully supported, but it has a couple of shortcomings:

• `git rebase -i` can only repair linear series of commits, not trees. If you modify a commit with multiple children, then you have to be sure to rebase all of the other children commits appropriately.
• You have to commit to a plan of action before starting the rebase. For some use-cases, it can be easier to operate on individual commits at a time, rather than an entire series of commits all at once.

When you use `git rebase -i` with `git-branchless`, you will be prompted to repair your commit graph if you abandon any commits.

## Installation

Short version: run `cargo install --locked git-branchless`, then run `git branchless init` in your repository.

## Status

`git-branchless` is currently in alpha. Be prepared for breaking changes, as some of the workflows and architecture may change in the future. It's believed that there are no major bugs, but it has not yet been comprehensively battle-tested. You can see the known issues in the issue tracker.

`git-branchless` follows semantic versioning. New 0.x.y versions, and new major versions after reaching 1.0.0, may change the on-disk format in a backward-incompatible way.

To be notified about new versions, select Watch » Custom » Releases in GitHub's notifications menu at the top of the page. Or use GitPunch to deliver notifications by email.

## Related tools

There's a lot of promising tooling developing in this space. See Related tools for more information.

## Contributing

Thanks for your interest in contributing! If you'd like, I'm happy to set up a call to help you onboard.

For code contributions, check out the Runbook to understand how to set up a development workflow, and the Coding guidelines. You may also want to read the Architecture documentation.

For contributing documentation, see the Wiki style guide.

Contributors should abide by the Code of Conduct.

Author: arxanas
Source code: https://github.com/arxanas/git-branchless

#rust #rustlang #git

## Factors That Can Contribute to Faulty Statistical Inference

Hypothesis testing is a procedure where researchers make a precise statement based on their findings or data. Then, they collect evidence to falsify that precise statement or claim. This precise statement or claim is called the null hypothesis. If the evidence is strong enough to falsify the null hypothesis, we can reject the null hypothesis and adopt the alternative hypothesis. This is the basic idea of hypothesis testing.

## Error Types in Statistical Testing

There are two distinct types of errors that can occur in formal hypothesis testing. They are:

Type I: A Type I error occurs when the null hypothesis is true but the test results show evidence to reject it. This is called a false positive.

Type II: A Type II error occurs when the null hypothesis is not true but it is not rejected in hypothesis testing. This is called a false negative.

Most hypothesis testing procedures control the Type I error rate well (at 5%) under ideal conditions. That may give the false impression that there is only a 5% probability that the reported findings are wrong. But it's not that simple. The probability can be much higher than 5%.
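To make the "5% under ideal conditions" idea concrete, here is a small simulation sketch. It is an illustration under stated assumptions, not part of any particular study: the data are drawn from a normal distribution with known standard deviation 1, the null hypothesis (mean 0) is true by construction, and a two-sided z-test is used. The function name, sample size and number of trials are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

def type_one_error_rate(trials=2000, sample_size=30, alpha=0.05, seed=0):
    """Estimate how often a two-sided z-test rejects a true null hypothesis."""
    rng = random.Random(seed)
    standard_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # The null hypothesis (mean 0, standard deviation 1) is true here
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        # z statistic for a known standard deviation of 1
        z = fmean(sample) * sample_size ** 0.5
        p_value = 2 * (1 - standard_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimate should land close to 0.05. That figure only describes how often a true null hypothesis is rejected when the test's assumptions hold; it says nothing about how often a reported finding is wrong, which is why the factors below matter.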

## Normality of the Data

The normality of the data is an issue that can break down a statistical test. If the dataset is small, the normality of the data is very important for some statistical procedures, such as constructing confidence intervals or computing p-values. But if the dataset is large enough, departures from normality do not have a significant impact.

## Correlation

If the variables in the dataset are correlated with each other, that may result in poor statistical inference. Look at this picture below:

In this graph, two variables seem to have a strong correlation. Similarly, if a series of data is observed as a sequence, each value may be correlated with its neighbors, and there may be some clustering or autocorrelation in the data. This kind of behavior in the dataset can adversely impact statistical tests.

## Correlation and Causation

This is especially important when interpreting the result of a statistical test. "Correlation does not mean causation". Here is an example. Suppose you have study data showing that more people without a college education believe that women should get paid less than men in the workplace. You may have conducted a sound hypothesis test that supports this. But care must be taken over what conclusion is drawn from it. There is probably a correlation between college education and the belief that 'women should get paid less'. But it is not fair to say that not having a college degree is the cause of such a belief. This is a correlation, not a direct cause and effect relationship.

A clearer example comes from medical data. Studies showed that people with fewer cavities are less likely to get heart disease. You may have enough data to demonstrate that statistically, but you cannot actually say that dental cavities cause heart disease. There is no medical theory like that.

#statistical-analysis #statistics #statistical-inference #math #data analysis

## Top 10 Statistics Concepts to know prior

The field of statistics is the science of learning from data. Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results. Statistics is a crucial process behind how we make discoveries in science, make decisions based on data, and make predictions. Statistics allows you to understand a subject much more deeply.

Statistics is known to be the top prerequisite for a data science job. I personally understood a few concepts when reading about linear regression, but if someone had randomly asked me about standard deviation, I would have been confused for sure.

So in this article, I have tried to build up a friendly approach towards some frequently asked Statistics questions. I am sure this will be beneficial to many.

Common Terms:

1. Mean
2. Mode
3. Median
4. Variance
5. Standard Deviation
6. Z-score
7. Correlation
8. Normal Distribution
9. Empirical Rule
10. Sampling

Also, let's keep in mind the Python method `.describe()` (as provided by pandas); it gives some hands-on practice before we start building our understanding.

Figure 1

I will refer to this figure later in the article.

# Let's get started!

## 1. Mean

Also known as one of the measures of central tendency, the mean is basically the average of all the data points present for a feature.

But what is Central Tendency?

Central tendency indicates where the middle or center of the distribution of our data lies.

Question: Which of these measures are used to analyze the central tendency of data?

a) Mean and Normal Distribution.

b) Mean, Median and Mode.

c) Mode, Alpha & Range.

d) Standard Deviation, Range and Mean

e) Median, Range and Normal Distribution.

Solution (b): The mean, median, mode are the three statistical measures which help us to analyze the central tendency of data. We use these measures to find the central value of the data to summarize the entire data set.

Calculation:
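As a quick sketch, the mean is the sum of the values divided by how many values there are. The numbers below are made up for illustration, and the example uses Python's standard `statistics` module to show the median and mode alongside it.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 8]   # made-up values for illustration

# mean = sum of all data points / number of data points
print(sum(data) / len(data))  # 4.2
print(mean(data))             # 4.2

# the other two measures of central tendency, for comparison
print(median(data))           # 3
print(mode(data))             # 3
```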

#statistical-analysis #data-science #machine-learning #statistics #interview #data analysis