From Groups to Individuals: Permutation Testing


Introduction to permutation testing.

Approaching group data (between-group)

A typical situation when answering an experimental question with a data-driven approach involves several groups that differ in one variable (hopefully), sometimes more.

Say you collect data on people who either ate chocolate (Group 1) or did not (Group 2). Because you know the literature very well and are an expert in your field, you believe that people who ate chocolate are more likely to ride camels than people who did not.

You now want to test that empirically.

I will generate simulated data in Python to demonstrate how permutation testing can help detect within-group variations that may reveal peculiar patterns in some individuals. If your two groups are statistically different, you might then explore which underlying parameters could account for this difference. If they are not, you might still want to check whether some data points behave “weirdly”, to decide whether to keep collecting data or to drop the topic.

## Load standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

One typical approach in this (admittedly a bit crazy) experimental situation would be to look at the difference in camel-riding propensity between the two groups. You could compute the proportion of camel-riding actions, the time spent on a camel, or any other dependent variable that might capture the effect you believe in.
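To make the group-level comparison concrete, here is a minimal sketch of the simplest statistic of this kind: the difference between the two group means. The function name `group_difference` and the small arrays are hypothetical illustration values, not data from this experiment. Because the mean of a 0/1 array is a proportion, the same function covers both the time spent on a camel and the proportion of people who rode one.

```python
import numpy as np

def group_difference(group_1, group_2):
    """Difference in means between two groups (group_1 minus group_2).

    Works for continuous outcomes (e.g. minutes spent on a camel) and for
    binary 0/1 outcomes, where the mean is the proportion of 1s.
    """
    return np.mean(group_1) - np.mean(group_2)

# Hypothetical illustration values, not real measurements
minutes_chocolate = np.array([12.0, 10.0, 11.0])
minutes_control = np.array([9.0, 8.0, 10.0])
rode_chocolate = np.array([1, 1, 0, 1])  # 1 = rode a camel, 0 = did not
rode_control = np.array([0, 1, 0, 0])

print(group_difference(minutes_chocolate, minutes_control))  # 2.0
print(group_difference(rode_chocolate, rode_control))        # 0.5
```

Reporting this number alone says nothing about whether it could have arisen by chance, which is the question permutation testing addresses.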

Generating data

Let’s generate the distribution of the chocolate group:

## Set seed for replicability
np.random.seed(42)

## Set mean (in minutes), SD and sample size
mean = 10
sd = 1
sample_size = 1000
## Generate distribution according to parameters
chocolate_distribution = np.random.normal(loc=mean, scale=sd, size=sample_size)
## Show data: x-axis is time on a camel, y-axis is number of people per bin
plt.hist(chocolate_distribution)
plt.xlabel("Time spent on a camel (min)")
plt.ylabel("Number of people")
plt.title("Chocolate Group")
plt.show()


Figure 1 | Histogram of the time spent on a camel in the chocolate group (number of people per time bin).

As you can see, I created a distribution centered around 10 min. Now let’s create the second distribution, which could serve as the control, centered at 9 min.
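A minimal sketch of the next step, assuming the control group keeps the same SD (1 min) and sample size (1000) as the chocolate group, since only its mean is stated. It generates both groups, then runs a two-sided permutation test on the difference in means: if group labels did not matter, randomly reshuffling them would often produce differences as large as the observed one, and the p-value is the fraction of shuffles that do. The helper `permutation_test`, its parameters, and the choice of 5000 permutations are illustrative assumptions, not the author's code.

```python
import numpy as np

def permutation_test(x, y, n_permutations=10_000, rng=None):
    """Two-sided permutation test on the difference in means (x minus y)."""
    rng = np.random.default_rng() if rng is None else rng
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    n_x = len(x)
    null_diffs = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffle the group labels by permuting the pooled data
        shuffled = rng.permutation(pooled)
        null_diffs[i] = shuffled[:n_x].mean() - shuffled[n_x:].mean()
    # Add-one correction so the estimated p-value is never exactly 0
    n_extreme = np.sum(np.abs(null_diffs) >= abs(observed))
    p_value = (n_extreme + 1) / (n_permutations + 1)
    return observed, null_diffs, p_value

np.random.seed(42)
chocolate_distribution = np.random.normal(loc=10, scale=1, size=1000)
# Assumed control parameters: mean 9 min, same SD and sample size
control_distribution = np.random.normal(loc=9, scale=1, size=1000)

observed, null_diffs, p_value = permutation_test(
    chocolate_distribution,
    control_distribution,
    n_permutations=5000,
    rng=np.random.default_rng(0),
)
print(f"Observed difference: {observed:.2f} min")
print(f"Permutation p-value: {p_value:.4f}")
```

With means 1 min apart and 1000 people per group, the shuffled differences cluster tightly around 0, so none should come close to the observed difference and the p-value should sit at its floor of 1/(n_permutations + 1).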
