Rusty Shanahan


The EDA Theoretical Guide

Most data analysis problems start with understanding the data. This is the most crucial and often the most complicated step, and it shapes the later decisions we make in a predictive modeling problem, such as which algorithm to choose.

This article is a complete guide to that step, covering:


  1. Reading Data
  2. Variable Identification
  3. Univariate analysis
  4. Bivariate analysis
  5. Missing values: types and analysis
  6. Outlier treatment
  7. Variable Transformation

Reading data and Variable Identification

Reading the data means answering the following questions:

  • What is the shape of my data?
  • How many features does my data contain?
  • What does it look like?
  • What are the types of variables?
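The questions above can each be answered with one line of pandas. The following is a minimal sketch, not part of the original article: the column names and values are made-up sample data, read from an in-memory string so that the example runs on its own. With a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = """age,city,income
34,Pune,52000
29,Delhi,
41,Pune,61000
"""
df = pd.read_csv(io.StringIO(csv_text))

# What is the shape of my data? -> (number of rows, number of columns)
print(df.shape)

# How many features does my data contain, and what are they called?
print(len(df.columns), df.columns.tolist())

# What does it look like? -> the first few rows
print(df.head())

# What are the types of variables? -> one dtype per column
print(df.dtypes)
```

Note that the income column is read as a floating-point type because one value is missing, which is often the first hint that a variable needs missing-value treatment.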

What is univariate analysis (UA)?

When we explore a single variable at a time from a given list of features, it is called univariate analysis (UA). We summarize the variable, which helps us better understand the data.

We look for the following in UA:

  • Central tendency (mean, median, mode) and dispersion of the variable
  • Distribution of the variable: symmetric, right-skewed, or left-skewed
  • Missing values and outliers
  • Count and count percent: Observing the frequency of each category in a categorical variable helps us to understand and deal with that variable.
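The checks listed above can be sketched in pandas as follows. This is an illustrative example under stated assumptions: the data is made up, and the 1.5 × IQR rule used to flag outliers is one common convention chosen here for illustration, not a method prescribed by the article.

```python
import pandas as pd

# Hypothetical data: one continuous and one categorical variable.
df = pd.DataFrame({
    "income": [30, 35, 40, 45, 50, 55, 200, None],
    "city": ["Pune", "Delhi", "Pune", "Pune", "Delhi", "Mumbai", "Pune", "Delhi"],
})
income = df["income"]

# Central tendency and dispersion (missing values are skipped by default).
mean = income.mean()
median = income.median()
std = income.std()

# Distribution: positive skew suggests right-skewed, negative left-skewed.
skew = income.skew()

# Missing values.
n_missing = income.isna().sum()

# Outliers, flagged with the 1.5 * IQR rule (an assumed convention).
q1, q3 = income.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = income[(income < q1 - 1.5 * iqr) | (income > q3 + 1.5 * iqr)]

# Count and count percent for a categorical variable.
counts = df["city"].value_counts()
percent = df["city"].value_counts(normalize=True) * 100

print(mean, median, std, skew, n_missing)
print(outliers.tolist())
print(counts)
print(percent)
```

Here the mean is pulled well above the median by a single large value, which is the typical signature of a right-skewed variable with an outlier.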

#data-analysis #ai #data-visualization #machine-learning

Murray Beatty


Help Your Data Science Career By Publishing Your Work!

This guide aims to cover everything that a data science learner may need to write and publish articles on the internet. It covers why you should write, writing advice for new writers, and a list of places that invite contributions from new writers.

Let’s get to it!

Why you should write:

Writing isn’t just for “writers”. The art of writing well is for everyone to learn - programmers, marketers, managers and leaders, alike. And yes, data scientists and analysts too!

You should write articles because when you do:

You learn:

Writing teaches you the art of writing. It’s kind of circular but it’s true.

Make no mistake, the art of writing isn’t about grammar (although, that’s important) and flowery language (definitely not important). It’s about conveying your thoughts with clarity in simple language.

And learning this art is important even if you are certain that you don’t want to write blogs or articles for a living. It’s important because every job involves some form of writing: messages, emails, memos and the whole spectrum. So writing is a medium for almost any job you can have.

Apart from that, when you write you learn the things that you thought you knew but didn’t really know. So, writing is an opportunity to learn better.

#data science career tips #guide #guides #publishing work #writing guide

Mya Lynch


A Complete Guide To Bamboolib - GUI Tool for Analyzing Pandas

Analyzing and visualizing the data is the most important and time-consuming part of the process. We need to invest a lot of time to understand what the data is about and what it is trying to tell us. We use different Python libraries and functions to visualize the patterns and anomalies in the dataset and get familiar with it.

Bamboolib is a GUI for pandas DataFrames that enables anyone to work with Python in Jupyter Notebook or JupyterLab. It is a highly interactive library for analyzing, visualizing, and manipulating data. Even a person with a non-technical background can use it to draw insights from data because it does not require any coding experience.

Bamboolib is used by more than 100 companies, and it allows data analysts to work with Python without writing code. Bamboolib is not open source, which means that you need to buy it in order to use it, but it provides a 14-day free trial so that you can fully explore it and see how it can be useful for you.

In this article, we will explore different uses of bamboolib and see how it saves time and effort. We will explore different functions that bamboolib provides and also export the code used for that functionality.

Implementation of Bamboolib:

To explore bamboolib, we first need to register on its website for the 14-day free trial. After registering, you will receive an email with the activation key at your registered email address. Like any other Python library, bamboolib is installed with pip: pip install bamboolib.

  1. Importing required libraries

We will need to import pandas for loading the dataset and bamboolib for visualizing the dataset.

import bamboolib as bam

import pandas as pd

  2. Loading the dataset

We will be using a car design dataset here, which contains different attributes related to automobile manufacturing companies. You can download this dataset from Kaggle. We will use pandas to load this dataset.

df = pd.read_csv('car_design.csv')

  3. Analyzing the dataset

This is the main step where we will analyze and visualize the dataset using bamboolib.

#developers corner #automating eda #data analytics #eda #pandas #plotly #python pandas #visualization

Exploratory Data Analysis: Dataprep.eda vs Pandas-Profiling

Use the right tool for Exploratory Data Analysis (EDA)

#data-analysis #eda #python #dataprep.eda #pandas #pandas-profiling

Lisa joly


Big Data Resume: Complete Guide & Samples [2021]

Thanks to the rapidly growing volume of Big Data, demand for Big Data Engineers is peaking.

In recent years, there has been such unprecedented growth in the demand for Big Data Engineers that it has become one of the top-ranking jobs in Data Science today. Since numerous companies across different industries are hiring Big Data Engineers, there’s never been a better time than now to build a career in Big Data. However, you must know how to present yourself as different from the others; you need to stand out from the crowd. Read the blog to have a better understanding of the scope of Big Data in India.

And how will you do that?

By designing and crafting a detailed, well-structured, and eye-catching Big Data resume!

When applying for a Big Data job, or rather for the post of a Big Data Engineer, your resume is the first point of contact between you and your potential employer. If your resume impresses an employer, you will be summoned for a personal interview. So, the key is to make sure you have a fantastic resume that can get you job interview calls.

Hiring Managers usually have to look at hundreds of resumes for any job profile. For a high-profile job like that of a Big Data Engineer, you must be able to grab the Hiring Manager's attention by highlighting your skills, qualifications, certifications, and your willingness to upskill.

Let’s begin the resume-building process with the job description and key roles and responsibilities of a Big Data Engineer.

Table of Contents

#big data #big data resume: complete guide & samples #big data resume #data science resume #guide

Ian Robinson


Real-Time Data Analytics: Guiding and Improving Business Decisions

Real-time data analytics helps improve business operations by analyzing and processing chunks of data to provide instant insights.

Data, also known as the digital currency, is the fuel for modern businesses. Present-day enterprises are constantly bombarded with huge amounts of data, which need to be collected, processed, and analyzed. Hence, it is difficult to deliver useful business outcomes instantly. Real-time data analytics closes the time lag between data collection and processing.

Gartner defines real-time analytics as, “the discipline that applies logic and mathematics to data to provide insights for making better decisions quickly. For some use cases, real-time simply means the analytics is completed within a few seconds or minutes after the arrival of new data.”

Accuracy and speed are crucial in data analytics. The modern business world needs real-time data analytics to efficiently deliver information, minimize costs and downtimes, and improve business decisions.

Benefits of Real-Time Data Analytics

#big data #latest news #real-time data analytics #improving business decisions #guiding #real-time data analytics: guiding and improving business decisions