Implementation of Pandas and TensorFlow: Classification of IBM employee attrition

Classification is one of the major topics in machine learning. Some classification problems do not even come with numeric features, so the data must be encoded before it can be analyzed.

In this article, I will classify IBM employee attrition using a neural network built with TensorFlow. First, the model is trained on 80% of the employees. The remaining 20% are then used for testing: based on their information, the same trained model predicts the probability that each of them leaves their job.
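The 80/20 split described above can be sketched with pandas alone. This is a minimal sketch on a tiny made-up table, not the article's actual pipeline: the column names (`Age`, `OverTime`, `Attrition`), the values, and the `random_state` are all assumptions for illustration. The text column is one-hot encoded first, since a neural network needs numeric inputs; the model itself is left out.

```python
import pandas as pd

# Hypothetical toy data standing in for the employee table (values are made up)
df = pd.DataFrame({
    "Age": [25, 32, 41, 29, 50, 37, 45, 23, 34, 38],
    "OverTime": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes", "No", "No"],
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "No", "No"],
})

# Encode the label as 0/1 and one-hot encode the text feature
labels = (df["Attrition"] == "Yes").astype(int)
features = pd.get_dummies(
    df.drop(columns="Attrition"), columns=["OverTime"], dtype=float
)

# 80% of the rows for training, the remaining 20% for testing
train_x = features.sample(frac=0.8, random_state=0)
test_x = features.drop(train_x.index)
train_y, test_y = labels.loc[train_x.index], labels.loc[test_x.index]

print(len(train_x), len(test_x))
```

Sampling by row index and dropping the sampled rows keeps the two sets disjoint, so no employee appears in both training and testing.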

#neural-networks #tensorflow #machine-learning #pandas #ai

Udit Vashisht

1586702221

Python Pandas Objects - Pandas Series and Pandas Dataframe

In this post, we will learn about pandas’ data structures/objects. Pandas provides two types of data structures:

Pandas Series

A Pandas Series is a one-dimensional indexed array that can hold integers, strings, booleans, floats, Python objects, and so on. A Series has a single dtype; values of mixed types are stored under the generic object dtype. The axis labels of the data are collectively called the index of the Series. The labels need not be unique but must be of a hashable type. The index can consist of integers, strings, or even time-series data. In general, a Pandas Series is like a column of an Excel sheet, with the row labels serving as the index of the Series.
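The properties listed above can be checked in a few lines. This is a small illustrative sketch; the labels and values are made up.

```python
import pandas as pd

# String labels as the index; the label "a" is deliberately repeated
s = pd.Series([10, 20, 30], index=["a", "b", "a"])

print(s.dtype)          # one dtype for the whole Series
print(s["a"].tolist())  # a repeated label selects every matching row

# Mixing value types falls back to the generic object dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```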

Pandas Dataframe

The Pandas DataFrame is the primary data structure of pandas. It is a two-dimensional, size-mutable array with flexible row indices and flexible column names. In general, it is just like an Excel sheet or an SQL table. It can also be seen as a Python dict-like container for Series objects.
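The "dict-like container for Series" view can be made concrete with a short sketch. The employee labels, column names, and values here are hypothetical.

```python
import pandas as pd

# Two Series sharing the same row labels (names and values are made up)
age = pd.Series([29, 41], index=["emp1", "emp2"])
dept = pd.Series(["Sales", "R&D"], index=["emp1", "emp2"])

# Each dict key becomes a column name
df = pd.DataFrame({"Age": age, "Department": dept})

# Selecting a column returns a Series again
print(type(df["Age"]).__name__)

# Size-mutable: a new column can be added in place
df["Senior"] = df["Age"] > 35
print(df.shape)
```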

#python #python-pandas #pandas-dataframe #pandas-series #pandas-tutorial

Oleta Becker

1602550800

Pandas in Python

Pandas is used for data manipulation, analysis and cleaning.

What are Data Frames and Series?

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure.

It contains rows and columns, and arithmetic operations can be applied to both.

A Series is a one-dimensional labeled array capable of holding data of any type: integers, floats, strings, Python objects, etc. A pandas Series is comparable to a single column in an Excel sheet.
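As a small illustration of arithmetic along rows and columns, the sketch below sums a made-up frame in both directions. The column names and values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

col_totals = df.sum(axis=0)  # one total per column
row_totals = df.sum(axis=1)  # one total per row
doubled = df * 2             # element-wise arithmetic on the whole frame

print(col_totals.tolist(), row_totals.tolist())
```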

How to create dataframe and series?

import numpy as np
import pandas as pd
s = pd.Series([1, 2, 3, 4, 56, np.nan, 7, 8, 90])
print(s)

How to create a dataframe by passing a numpy array?

import numpy as np
import pandas as pd
# 15 consecutive daily timestamps starting on 2020-08-09
d = pd.date_range('20200809', periods=15)
print(d)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])
print(df)

#pandas-series #pandas #pandas-in-python #pandas-dataframe #python

Analysis of Attrition in IBM

Let us look at the factors responsible for employee attrition at IBM.

What factors cause employees to quit their jobs? This question is one of the biggest concerns for organizations, because a company’s success depends on its ability to retain its top talent. To understand this problem and its solution, companies are using human resource analytics: the application of data analytics within the company to make better decisions related to human resources. Attrition evaluation and prediction is part of human resource analytics. However, the question is, ‘what is attrition?’

Attrition is a reduction in the workforce of a company. It is one of the biggest problems in organizations because of its cost. Replacing a departed employee with a new one is expensive: it involves costs such as training the new hire, posting the job, and paperwork. The high cost is not the only reason why companies should address this problem. When employees move to another company, they also carry valuable information with them, which can benefit the other organization and give it a competitive advantage. So, there is a pressing need to find the key factors responsible for attrition.

In this article, I will try to find out the significant reasons or factors that are responsible for attrition by using the IBM human resource performance dataset.
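One possible way to compare a candidate factor against attrition is to compute the attrition rate per group. The sketch below uses a few made-up records rather than the real dataset, and the column names `OverTime` and `Attrition` are assumptions for illustration; it is not necessarily the method used in this analysis.

```python
import pandas as pd

# Hypothetical toy records; not taken from the real dataset
df = pd.DataFrame({
    "OverTime": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

# Share of employees who left within each OverTime group
left = df["Attrition"].eq("Yes")
rate = left.groupby(df["OverTime"]).mean()
print(rate)
```

A large gap between the group rates would flag the factor as worth a closer look; on its own it does not establish a cause.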

IBM HR Performance Dataset

This dataset is downloaded from here. It contains 1,470 instances and 35 attributes, including the class variable.

Data Head in Python

#employee-attrition #ibm #tableau #data-science #weka #data analysis

WORKING WITH GROUPBY IN PANDAS

In my last post, I mentioned the groupby technique in the Pandas library. After creating a groupby object, you are limited to calculations on the grouped data using groupby’s own functions. For example, in the last lesson, we used functions such as mean or sum on the object we created with groupby. With the aggregate() method, however, we can use both functions we have written ourselves and the methods available on groupby. In this post, I will show how to work with groupby.
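A minimal sketch of that idea: aggregate() accepts a list that mixes a built-in aggregation name with a user-defined function. The data, the column names, and the helper `value_range` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1, 5, 10, 4],
})

def value_range(values):
    """Custom aggregation: spread between the largest and smallest value."""
    return values.max() - values.min()

# Mix a built-in aggregation name with a user-defined function
result = df.groupby("team")["score"].aggregate(["mean", value_range])
print(result)
```

Each entry in the list becomes one column of the result, named after the built-in or after the function.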

#pandas-groupby #python-pandas #pandas #data-preprocessing #pandas-tutorial