1599491940

# What Is Entropy? What Does It Have to Do With Information?

Why is it that the more uncertain we are about the result of an experiment, the more information we get after observing it? What is the relation between entropy and information? This article answers those questions and more.

## Thinking in terms of Bits

Imagine you want to send the outcomes of 3 coin flips to your friend’s house. Your friend knows you want to send these messages, but all he can do is receive answers to Yes/No questions that he has arranged in advance. Let’s assume the arranged question is: Is it heads? You send him a sequence of zeros and ones as answers, and each such answer is commonly known as a bit (binary digit). If zero represents No and one represents Yes, and the actual outcomes of the tosses were Head, Head, and Tail, then you would need to send him 1 1 0 to convey your facts (information). So it costs us 3 bits to send those messages.
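The coin-flip message can be sketched in a few lines of Python. This is only an illustration: the function name `encode_flips` and the `"H"`/`"T"` labels are assumptions made for this example, not part of any library.

```python
def encode_flips(flips):
    """Answer "Is it heads?" for each flip: 1 for Yes, 0 for No."""
    return "".join("1" if flip == "H" else "0" for flip in flips)

# Head, Head, Tail, as in the example above
message = encode_flips(["H", "H", "T"])
print(message)        # the bits sent to the friend
print(len(message))   # the cost of the message in bits
```

Each flip costs exactly one Yes/No answer, so the length of the string is the number of bits sent.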

How many bits does it take to make humans?

Our entire genetic code is contained in a DNA sequence over 4 states: T, A, G, and C. We need 2 bits (00, 01, 10, 11) to encode these states. Multiplying by the 6 billion letters of genetic code in the genome gives 12 billion bits, or 1.5 GB of information. So we can fit our entire genetic code on a single DVD.
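The arithmetic behind that estimate can be checked with a short sketch. The helper name `bits_per_symbol` is hypothetical, and the conversion assumes 8 bits per byte and 1 GB = 10^9 bytes.

```python
import math

def bits_per_symbol(num_states):
    """Smallest whole number of bits that can label num_states distinct states."""
    return math.ceil(math.log2(num_states))

bases = ["T", "A", "G", "C"]
bits_per_base = bits_per_symbol(len(bases))   # 4 states need 2 bits
genome_letters = 6_000_000_000                # figure quoted in the text
total_bits = bits_per_base * genome_letters
total_gigabytes = total_bits / 8 / 1e9        # assumes 8 bits per byte, 10^9 bytes per GB

print(bits_per_base, total_bits, total_gigabytes)
```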

## How much Information?

Suppose I flip a fair coin with a 50% chance of getting heads and a 50% chance of getting tails. Now, instead of a fair coin, suppose I flip a biased coin with heads on both sides. In which case am I more certain about the outcome of the toss? Obviously, the biased coin. With the fair coin I am uncertain about the outcome because neither possibility is more likely than the other, while with the biased coin I am not uncertain at all because I know I will get heads.

Let’s look at it another way. Would you be surprised if I told you the coin with heads on both sides came up heads? No, because you did not learn anything new from that statement. The outcome did not give you any further information. On the other hand, when the coin is fair you have the least knowledge of what will happen next, so each toss gives you new information.

So the intuition behind quantifying information is the idea of measuring how much surprise there is in an event. Those events that are rare (low probability) are more surprising and therefore have more information than those events that are common (high probability).

Low Probability Event: High Information (surprising)

High Probability Event: Low Information (unsurprising)

As a prerequisite, if you want to learn about basic probability theory, I wrote about that here.

### Probability for Machine Learning and Data Science


So information is tied to randomness: to know how much information something contains, we need to know how random and unpredictable it is. Mathematically, the information gained by observing an event X with probability P is given by:

$$I(X) = -\log P = \log \frac{1}{P}$$

By plugging values into the formula, we can see that the information contained in a certain event, like observing heads when tossing a coin with heads on both sides, is 0, while a less likely event carries more information. So this definition satisfies the basic requirement that it is a decreasing function of p.

But you may have some questions:

Why the logarithmic function?

And what is the base of the logarithm?

As an answer to the second question, you can use any base for the logarithm. In information theory, we usually use base 2, in which case the unit of information is called a bit.
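As a minimal sketch, assuming the standard definition of self-information in base 2, I(p) = log2(1/p), the amount of information in an event can be computed as follows. The function name `self_information` is chosen for this example only.

```python
import math

def self_information(p):
    """Information, in bits, gained by observing an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1.0 / p)

two_headed_coin = self_information(1.0)   # a certain outcome
fair_coin = self_information(0.5)         # one side of a fair coin
rare_event = self_information(0.125)      # a 1-in-8 event

print(two_headed_coin, fair_coin, rare_event)
```

The certain outcome carries no information, a fair coin toss carries exactly one bit, and the information grows as the probability shrinks.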

#technology #machine-learning #artificial-intelligence #data-science #deep-learning

1622800020

## Guide to PyTerrier: A Python Framework for Information Retrieval

Information Retrieval is one of the key tasks in many natural language processing applications. Information Retrieval (IR) is the process of searching and collecting information from databases or resources based on queries or requirements. The fundamental elements of an Information Retrieval system are the query and the document. The query is the user’s information requirement, and the document is the resource that contains the information. An efficient IR system collects the required information accurately from the document in a compute-effective manner.


The popular Information Retrieval frameworks are mostly written in Java, Scala, C++ and C. Though they are adaptable in many languages, end-to-end evaluation of Python-based IR models is a tedious process and needs many configuration adjustments. Further, reproducibility of the IR workflow under different environments is practically not possible with the available frameworks.

Machine Learning relies heavily on the high-level Python language. Deep learning models are almost always built on one of two Python frameworks: TensorFlow and PyTorch. Though most natural language processing applications are built on top of Python frameworks and libraries nowadays, there is no well-adapted Python framework for Information Retrieval tasks. Hence the need for a Python-based Information Retrieval framework that supports end-to-end experimentation with reproducible results and model comparisons.

#developers corner #information #information extraction #information retrieval #ir #learn-to-rank #ltr #pyterrier #python #random forest #ranking #terrier #xgboost

1624162320

## Big data in GIS has critical ramifications for how we procure and leverage spatial data

Amid the surge of data we gather and wrestle with every day, geospatial information occupies a special place. Thanks to networks of GPS satellites and cell towers and the rising Internet of Things, we are able to track and correlate the location of people and items with a precision that was not possible until recently. Yet putting this geospatial information to use is more difficult than one might expect.

It is frequently said that 80% of data has a spatial component. Sometimes it is a coordinate gathered from a GPS application, or simply an address that gets geocoded to a location along a street centerline. Either way, it is surprisingly simple to get the location of an item. With moving items, location and time are essential for tracking the object, along with other relevant attributes (temperature, point, size, shading, and so forth). As sensors and devices become increasingly connected, data is being gathered at an unprecedented rate.

The big data trend has drastically affected every industry, so it is little surprise that big data in GIS has critical ramifications for how we procure and leverage spatial data. Big data is definitely not a new trend. However, it is becoming a bigger part of geographic data science.

Perhaps the greatest change in the discussion around big data has been in the relationship between software, hardware, and expertise. One of the foremost uses of geospatial big data analytics has been in the humanitarian area. GIS IoT devices are currently being used across the world to gather information in conditions that were previously hard for aid workers to access and thus hard to work in.

For an illustration of how geospatial big data analytics can work well in this area, consider DigitalGlobe, a non-profit organization that sources satellite information, combines it with other sources such as social media sentiment and aerial imagery, and uses a GIS machine learning algorithm to track activity in specific areas and identify anomalies.

Geospatial information is not simply a location, however. It also tracks how things are connected and where they are in relation to other objects. Knowing how an object changes over time relative to other objects can give critical insights. For instance, how do truck maintenance recommendations change depending on where a truck is located and how it is driven in the field? Using all of your data to drive more intelligent maintenance plans saves money, time, and resources.

#big data #latest news #significant benefits of geospatial information and big data analytics #geospatial information #information

1597716000

## Enhance Decision Tree accuracy with Tsallis Entropy

We have been using decision trees for regression and classification problems for a long time. In the training process, the growth of the tree depends on the split criterion applied after random selection of samples and features from the training data. We have been using the Gini Index or Shannon Entropy as the split criterion across techniques developed around decision trees, and both are well accepted across time and domains.

It has been suggested that choosing between the Gini Index and Shannon Entropy does not make a significant difference. In practice we choose the Gini Index over Shannon Entropy just to avoid logarithmic computations.

The most methodical part of a decision tree is splitting the nodes, so the measurement we choose for the split is critical. The Gini Index has worked out for most solutions, but what is the harm in gaining a few additional points of accuracy?

The nearest alternative to the Gini Index and Shannon Entropy is Tsallis Entropy. Actually, Tsallis is not so much an alternative as the parent of Gini and Shannon Entropy. Let's see how:
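The relationship can be checked numerically. The sketch below assumes the standard Tsallis definition S_q = (1 - Σ p_i^q) / (q - 1), which is not spelled out in the text above; the function names are chosen for this example. With q = 2 it reduces to the Gini Index, and as q approaches 1 it approaches Shannon Entropy measured with the natural logarithm.

```python
import math

def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum(p**q)) / (q - 1), assumed here for q != 1."""
    if q == 1:
        raise ValueError("q = 1 is the Shannon limit; use shannon_entropy instead")
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

def gini_index(probs):
    """Gini impurity: 1 - sum(p**2)."""
    return 1.0 - sum(p * p for p in probs)

def shannon_entropy(probs):
    """Shannon entropy in nats (natural logarithm), skipping zero probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class_probs = [0.5, 0.3, 0.2]   # made-up class proportions in a node

print(tsallis_entropy(class_probs, 2), gini_index(class_probs))
print(tsallis_entropy(class_probs, 1.000001), shannon_entropy(class_probs))
```

Treating q as a tunable parameter is what lets a single split criterion cover both familiar measures.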

#machine-learning #data-science #entropy #decision-tree #information-theory #deep learning

1596518131

## But what is Entropy?

This write-up re-introduces the concept of entropy from different perspectives with a focus on its importance in machine learning, probabilistic programming, and information theory.

Here is how dictionaries define it, as per a quick Google search:

Based on this result, you can notice that there are two core ideas here and at first, the correlation between them does not seem to be quite obvious -

• Entropy is the missing (or required) energy to do work as per thermodynamics
• Entropy is a measure of disorder or randomness (uncertainty)

So what is it — missing energy, or a measure, or both? Let me provide some perspectives that hopefully would help you come to peace with these definitions.

#### Shit Happens!

Rephrasing this obnoxious title into something a bit more acceptable:

Anything that can go wrong, will go wrong — Murphy’s Law

We have all accepted this law because we observe and experience this all the time and the culprit behind this is none other than the topic of this writeup — yup, you got it, it’s Entropy!

So now I have confused you more — entropy is not only the missing energy and the measure of disorder but it is also responsible for the disorder. Great!

We can not make up our minds here as far as the definition is concerned. However, the truth is all of the above mentioned 3 perspectives are correct given the appropriate context. To understand these contexts let’s first check out disorder and its relation with entropy.

#### Disorder is the dominating force

I explain this with the help of examples from an article by James Clear (Author of Atomic Habits).

Source: Left Image (https://pixabay.com/illustrations/puzzle-puzzle-piece-puzzles-3303412/) Right Image (Photo by James Lee on Unsplash) + annotated by Author

Theoretically, both of these are possible, but the odds of them happening are astronomically small. Ok, fine, call it impossible 🤐! The main message here is the following:

There are always far more disorderly variations than orderly ones!

And borrowing the wisdom of the great Steven Pinker:

#kl-divergence #machine-learning #entropy #intuition #tensorflow-probability #tensorflow

1601301600

## DECISION TREE

The decision tree is a supervised machine learning technique. It is also referred to as CART (Classification and Regression Trees). It uses a tree structure to model relationships among the features and the outcomes. It consists of **_nodes_**, which represent decision functions, and **_branches_**, which represent the outputs of the decision functions. Thus, it is a flow chart for deciding how to classify a new data point.

The decision tree selects the best attribute for splitting the records using Attribute Selection Measures (ASM). The chosen criterion splits the data into subsets, and those subsets into still smaller subsets. The algorithm stops splitting when the data within the subsets is sufficiently homogeneous. At each node, the decision tree evaluates splits on all available variables and then selects the split that results in the most homogeneous sub-nodes.
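One way to make "most homogeneous sub-nodes" concrete is information gain, one common attribute selection measure. The sketch below is an illustration under that assumption, with made-up labels and hypothetical function names, not the implementation of any particular library.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Entropy in bits of the class labels in one node."""
    total = len(labels)
    return sum((count / total) * math.log2(total / count)
               for count in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * shannon_entropy(child) for child in children)
    return shannon_entropy(parent) - weighted

parent = ["yes", "yes", "yes", "no", "no", "no"]
pure_split = [["yes", "yes", "yes"], ["no", "no", "no"]]    # homogeneous children
mixed_split = [["yes", "no", "yes"], ["no", "yes", "no"]]   # children still mixed

print(information_gain(parent, pure_split))
print(information_gain(parent, mixed_split))
```

The split that yields pure children has the larger gain, so a tree using this measure would prefer it.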

### The decision tree can be used for both classification and regression problems, but they work differently.

#entropy #information-gain #decision-tree #gini-index #machine-learning