Angela Dickens

Pre-Processing in OCR!!!

Welcome to **_Part II_** in the series on how an OCR system works. In the previous post, we briefly discussed the different phases of an OCR system.

Among all the phases of OCR, Preprocessing and Segmentation are the most important, as the accuracy of an OCR system depends heavily on how well _preprocessing_ and _segmentation_ are performed. So, here we are going to learn some of the most basic and commonly used preprocessing techniques applied to an image.

Let’s go…

The main objective of the Preprocessing phase is **_to make it as easy as possible_** for the OCR system to distinguish a character/word from the background.

Some of the most basic and important **Preprocessing** techniques are listed below; a minimal code sketch follows the list:

1) Binarization

2) Skew Correction

3) Noise Removal

4) Thinning and Skeletonization
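
To preview what these steps can look like in practice, here is a minimal sketch using OpenCV and NumPy. This is an illustrative assumption rather than the article's own code, and the file name `document.png` is a placeholder:

```python
import cv2
import numpy as np

# Load the scanned page in grayscale ("document.png" is a placeholder path).
img = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)

# 1) Binarization: Otsu's method picks a global threshold automatically;
#    THRESH_BINARY_INV makes the text white on a black background.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 2) Skew correction: estimate the dominant text angle from the foreground
#    pixels and rotate the page back. Note: the angle convention of
#    cv2.minAreaRect changed in OpenCV 4.5+, so check your version.
coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
angle = -(90 + angle) if angle < -45 else -angle
h, w = binary.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(binary, M, (w, h),
                          flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)

# 3) Noise removal: a small median filter suppresses salt-and-pepper noise.
denoised = cv2.medianBlur(deskewed, 3)

# 4) Thinning / skeletonization: a light erosion approximates a skeleton;
#    cv2.ximgproc.thinning (opencv-contrib) is a more faithful option.
kernel = np.ones((2, 2), np.uint8)
thinned = cv2.erode(denoised, kernel, iterations=1)
```

In practice each of these steps is tuned per document; the thresholds and kernel sizes above are only starting points.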

Before discussing these techniques, let’s understand how an OCR system comprehends an image. To an OCR system, an image is a multidimensional array: a 2D array if the image is grayscale (or binary), and a 3D array if the image is coloured. Each cell in the array is called a pixel, and each pixel stores an 8-bit integer, which means pixel values range from 0 to 255.

[Figure: internal representation of an RGB image with red, green, and blue channels. Source: left image from Semantic Scholar, right image from ResearchGate.]
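
As a quick, hedged illustration of this representation (the file name `page.jpg` is a placeholder; note that OpenCV loads colour images in BGR channel order):

```python
import cv2

color = cv2.imread("page.jpg")                       # 3D array: height x width x 3 channels (BGR)
gray = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)  # 2D array: height x width

print(color.shape, color.dtype)   # e.g. (1080, 1920, 3) uint8
print(gray.shape, gray.dtype)     # e.g. (1080, 1920) uint8
print(gray.min(), gray.max())     # pixel values fall in the 0-255 range
```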

#pre-processing #ocr #image-processing #skewness #machine-learning #deep-learning


Kasey Turcotte

One Line of Code for a Common Text Pre-Processing Step in Pandas

A quick look at splitting text columns for use in machine learning and data analysis

Sometimes you’ll want to do some processing to create new variables out of your existing data. This can be as simple as splitting up a “name” column into “first name” and “last name”.

Whatever the case may be, Pandas will allow you to effortlessly work with text data through a variety of in-built methods. In this piece, we’ll go specifically into parsing text columns for the exact information you need either for further data analysis or for use in a machine learning model.

If you’d like to follow along, go ahead and download the ‘train’ dataset here. Once you’ve done that, make sure it’s saved to the same directory as your notebook and then run the code below to read it in:

import pandas as pd
df = pd.read_csv('train.csv')

Let’s get to it!
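
As a hedged preview of the kind of splitting described above, assuming the dataset has a `Name` column formatted like `"Last, First"` (the column and output names here are illustrative):

```python
# Split a "Name" column of the form "Last, First" into two new columns.
names = df['Name'].str.split(',', n=1, expand=True)
df['last_name'] = names[0].str.strip()
df['first_name'] = names[1].str.strip()

print(df[['Name', 'first_name', 'last_name']].head())
```

`expand=True` returns the pieces as separate columns rather than a single column of lists, which is what makes this one-liner style work.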

#programming #python #pandas #text-pre-processing

Brad Hintz

Pre-Processing and Model Training using Python

Pre-processing and model training go hand in hand in machine learning: one cannot do without the other. The thing is, we humans interact with data we can understand, that is, data written in natural language, and we expect our machine learning models to take in the same data and give us insights. Well, machines only understand binary language (0s and 1s), so there must be a way for them to work with this same data. That’s where pre-processing comes in: pre-processing transforms data from its natural, human-readable form into a form the machine can understand. This process is also referred to as encoding.

So how do we do pre-processing and model training to get insights from data? There are multiple ways to pre-process data and train machine learning models; for sure, I don’t know all of them. In this post, we’ll look at some of the pre-processing methods and explore three machine learning algorithms we can use to train a model. Therefore, we’ll look into:

Pre-Processing

  • Data Cleaning
  • Categorical features encoding using OneHotEncoder
  • Numerical features scaling using StandardScaler
  • Dimensionality reduction using PCA, T-SNE and Autoencoders
  • Balancing classes by oversampling
  • Feature Extraction

Model Training

  • Logistic Regression
  • Random Forests
  • Decision Trees

For this post, we will be using bank campaign data, which can be found here. In addition to the data, the source gives a full description of the features. In brief, our data consists of 20 features describing a customer (10 categorical and 10 numerical) plus the target variable, which represents whether a customer subscribed to a term deposit or not. The goal of this project is to predict which future customers will subscribe to a term deposit. For a full exploratory data analysis of the dataset you can check my Tableau Dashboard or go through the same on this GitHub repository. This post will only focus on pre-processing and model training.
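
As a rough preview of how these pieces can fit together in scikit-learn (a hedged sketch, not the post's exact code; the file name `bank.csv` and the target column name `"y"` are assumptions):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

dataset = pd.read_csv("bank.csv")          # placeholder file name
X = dataset.drop(columns=["y"])            # "y": subscribed to a term deposit (yes/no)
y = dataset["y"]

# Infer which features are categorical vs. numerical from their dtypes.
categorical = X.select_dtypes(include="object").columns
numerical = X.select_dtypes(exclude="object").columns

# One-hot encode categorical features and scale numerical ones in a single step.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numerical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```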

Pre-Processing

Data Cleaning

Data cleaning involves checking for missing values in a dataset and either dropping the null rows or imputing the missing values, depending on how many there are and how significant they are in our data. It also involves looking for duplicates and dropping them, since they may significantly affect the effectiveness of the model, and checking for outliers and replacing them with the median or mean based on how frequently they occur. These are just a few of the data cleaning steps we will explore in this post.
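
For the duplicate and outlier handling mentioned above, a minimal pandas sketch (the column name `"age"` and the 1.5 * IQR rule are illustrative choices, not necessarily what the post uses):

```python
# Drop exact duplicate rows.
dataset = dataset.drop_duplicates()

# Replace outliers in a numerical column with the median, using the
# common 1.5 * IQR rule ("age" is a placeholder column name).
q1, q3 = dataset["age"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (dataset["age"] < lower) | (dataset["age"] > upper)
dataset.loc[is_outlier, "age"] = dataset["age"].median()
```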

Missing Values

We can check for missing values in our data by calling dataset.info() and inspecting all the features in our dataset, or simply by running dataset.isnull().values.any(), which will return True if there are any null values in our dataset and False if there are none. We can then decide to drop the affected rows, or impute categorical features with the mode and numerical features with the mean. Our bank dataset has no missing values, and we therefore proceed to the next step.
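
A small sketch of the checks described above; the imputation lines show the idea for a hypothetical categorical column `"job"` and numerical column `"age"`:

```python
# Quick overview of dtypes and non-null counts for every feature.
dataset.info()

# True if any cell in the DataFrame is null, False otherwise.
print(dataset.isnull().values.any())

# Example imputations (the column names are placeholders):
dataset["job"] = dataset["job"].fillna(dataset["job"].mode()[0])  # categorical -> mode
dataset["age"] = dataset["age"].fillna(dataset["age"].mean())     # numerical   -> mean
```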

#machine-learning #pre-processing #model-training #data-pre-processing

Harry Patel

A Complete Process to Create an App in 2021

It’s 2021; everything is being reshaped by an emerging technological ecosystem, and mobile apps are one of the best examples of this shift.

As times have changed, the way mobile apps are developed has changed too. If you still follow the same old process to create a mobile app for your business, you are losing a ton of opportunities by not giving your users the top-notch mobile experience your competitors are already delivering.

You risk losing the potential and existing customers you have, so what’s the ideal way to build a successful mobile app in 2021?

This article will discuss how to build a mobile app in 2021 to help out many small businesses, startups & entrepreneurs by simplifying the mobile app development process for their business.

The first step is to EVALUATE your mobile app IDEA: how will your mobile app change your target audience’s life, and why is your app uniquely able to solve their problem?

Now that you have proposed a solution for a specific audience group, start thinking about the app’s functionality, the features it will include, and a simple, easy-to-understand user interface with impressive UI design.

From design to development, everything is covered at this point; now focus on a pre-launch marketing plan to create hype among your mobile app’s target audience, which will help you score initial downloads.

Boom, you are on your way to hitting your download targets and generating revenue through your mobile app.

#create an app in 2021 #process to create an app

Siphiwe Nair

Making Sense of Unbounded Data & Real-Time Processing Systems

Unbounded data refers to continuous, never-ending data streams with no beginning or end. They are made available over time, and anyone who wishes to act upon them can do so without downloading them first.

As Martin Kleppmann stated in his famous book, unbounded data will never “complete” in any meaningful way.

“In reality, a lot of data is unbounded because it arrives gradually over time: your users produced data yesterday and today, and they will continue to produce more data tomorrow. Unless you go out of business, this process never ends, and so the dataset is never “complete” in any meaningful way.”

— Martin Kleppmann, Designing Data-Intensive Applications

Processing unbounded data requires an entirely different approach from batch processing, which handles bounded datasets. This article summarises the value of unbounded data and how you can build systems to harness the power of real-time data.
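
As a toy illustration of that difference (a hedged sketch, not tied to any particular streaming framework): a batch job needs the whole dataset before it can answer, whereas a stream processor updates its answer incrementally as each event arrives.

```python
import random
import time

def event_stream():
    """Simulates an unbounded stream: events keep arriving and the data never 'completes'."""
    while True:
        yield {"user_id": random.randint(1, 100), "amount": random.random()}
        time.sleep(0.1)

# Streaming approach: keep a running aggregate that is always up to date,
# instead of waiting for a "complete" dataset that will never exist.
total, count = 0.0, 0
for event in event_stream():
    total += event["amount"]
    count += 1
    print(f"events={count}  running_avg={total / count:.3f}")
    if count >= 10:  # stop the demo; a real unbounded stream has no natural end
        break
```

In a real system the loop never ends; stream-processing frameworks such as Kafka Streams or Flink manage this kind of incremental state for you.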

#stream-processing #software-architecture #event-driven-architecture #data-processing #data-analysis #big-data-processing #real-time-processing #data-storage