Loan Default Prediction with Berka Dataset

An End-to-End ML Project with Real Bank Data

Introduction

For banks, predicting how likely a client is to default on a loan from only a handful of information is an interesting and challenging problem. Today, data science teams at banks build predictive models using machine learning. The datasets they use are most likely proprietary and are usually collected internally through daily business operations. In other words, few real-world datasets are available to those of us who want to work on such financial projects. Fortunately, there is an exception: the **Berka Dataset**.

About the Dataset

The Berka Dataset, or the PKDD’99 Financial Dataset, is a collection of real anonymized financial information from a Czech bank, used for the PKDD’99 Discovery Challenge. The dataset can be accessed from my GitHub page.

The dataset consists of 8 raw files, one per table:

  • account (4500 objects in the file ACCOUNT.ASC) — each record describes static characteristics of an account.
  • client (5369 objects in the file CLIENT.ASC) — each record describes characteristics of a client.
  • disposition (5369 objects in the file DISP.ASC) — each record relates a client to an account, i.e., this relation describes the rights of clients to operate accounts.
  • permanent order (6471 objects in the file ORDER.ASC) — each record describes characteristics of a payment order.
  • transaction (1056320 objects in the file TRANS.ASC) — each record describes one transaction on an account.
  • loan (682 objects in the file LOAN.ASC) — each record describes a loan granted for a given account.
  • credit card (892 objects in the file CARD.ASC) — each record describes a credit card issued to an account.
  • demographic data (77 objects in the file DISTRICT.ASC) — each record describes demographic characteristics of a district.

The tables are related as follows:

  • Each account has both static characteristics (e.g. date of creation, address of the branch), given in the relation “account”, and dynamic characteristics (e.g. payments debited or credited, balances), given in the relations “permanent order” and “transaction”.
  • The relation “client” describes the characteristics of persons who can operate the accounts.
  • One client can have several accounts, and several clients can operate a single account; clients and accounts are linked through the relation “disposition”.
  • The relations “loan” and “credit card” describe services that the bank offers to its clients.
  • More than one credit card can be issued to an account.
  • At most one loan can be granted for an account.
  • The relation “demographic data” gives some publicly available information about the districts (e.g. the unemployment rate); additional information about the clients can be deduced from it.
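
The relationships above can be sketched as a small relational schema. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of a real database server, it covers only five of the eight tables, and the column names (account_id, client_id, disp_id, loan_id, district_id, amount, unemployment_rate) and all inserted rows are hypothetical toy values, not data taken from the actual files.

```python
import sqlite3

# In-memory SQLite is a stand-in for a real database server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical, simplified schema; column names are illustrative only.
conn.executescript("""
CREATE TABLE district (
    district_id INTEGER PRIMARY KEY,
    unemployment_rate REAL
);
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    district_id INTEGER REFERENCES district(district_id)
);
CREATE TABLE client (
    client_id INTEGER PRIMARY KEY,
    district_id INTEGER REFERENCES district(district_id)
);
-- disposition links clients and accounts (many-to-many)
CREATE TABLE disposition (
    disp_id INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES client(client_id),
    account_id INTEGER NOT NULL REFERENCES account(account_id)
);
-- UNIQUE(account_id) encodes "at most one loan per account"
CREATE TABLE loan (
    loan_id INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL UNIQUE REFERENCES account(account_id),
    amount INTEGER
);
""")

# Toy rows: client 10 has two accounts, account 1 has two clients.
conn.execute("INSERT INTO district VALUES (1, 3.5)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 1), (2, 1)])
conn.executemany("INSERT INTO client VALUES (?, ?)", [(10, 1), (11, 1)])
conn.executemany(
    "INSERT INTO disposition VALUES (?, ?, ?)",
    [(100, 10, 1), (101, 11, 1), (102, 10, 2)],
)
conn.execute("INSERT INTO loan VALUES (1000, 1, 50000)")

# For each loan: how many clients can operate the account, plus a
# district-level attribute reached through the account.
rows = conn.execute("""
SELECT l.loan_id, l.amount, COUNT(d.client_id), dist.unemployment_rate
FROM loan AS l
JOIN account AS a ON a.account_id = l.account_id
JOIN disposition AS d ON d.account_id = a.account_id
JOIN district AS dist ON dist.district_id = a.district_id
GROUP BY l.loan_id
""").fetchall()


def second_loan_rejected(connection):
    """Return True if a second loan on account 1 violates the UNIQUE rule."""
    try:
        connection.execute("INSERT INTO loan VALUES (1001, 1, 20000)")
    except sqlite3.IntegrityError:
        return True
    return False


print(rows)
print(second_loan_rejected(conn))
```

Making account_id unique in the loan table is one possible way to express the "at most one loan per account" rule. No such constraint is placed on a credit card table, since several cards can be issued to one account.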
