A user is working with your application when, suddenly, the UI freezes and one of the CPU cores is probably burning. They cannot do anything. The only thing they can feel is the metal case of the laptop, hot as hell. Although this sounds like a horror movie, it is your application failing to leverage modern APIs to move heavy computation to a different thread, and the user suffers as a result.
In this session, I am going to share my experience running jobs in parallel in a React.js application, an approach that provides a pleasant user experience and makes development exciting.
So you, my dear Python enthusiast, have been learning Pandas and Matplotlib for a while and have written some super cool code to analyze your data and visualize it. You are ready to run your script that reads a huge file, and all of a sudden your laptop starts making an ugly noise and burning like hell. Sounds familiar?
Well, I have a couple of pieces of good news for you: this issue doesn’t need to happen anymore, and no, you don’t need to upgrade your laptop or your server.
Dask is a flexible library for parallel computing with Python. It provides multi-core and distributed parallel execution on larger-than-memory datasets. It figures out how to break up large computations and route parts of them efficiently onto distributed hardware.
A massive cluster is not always the right choice
Today’s laptops and workstations are surprisingly powerful and, if used correctly, can handle datasets and computations for which we previously depended on clusters. A modern laptop has a multi-core CPU, 32GB of RAM, and flash-based hard drives that can stream through data several times faster than HDDs or SSDs of even a year or two ago.
As a result, Dask can empower analysts to manipulate 100GB+ datasets on their laptop or 1TB+ datasets on a workstation without bothering with the cluster at all.
The project has been a massive plus for the Python machine learning ecosystem because it democratizes big data analysis. Not only can you save money on bigger servers, but it also copies the Pandas API, so you can run your Pandas script after changing very few lines of code.
The final objective is to estimate the cost of a certain house in a Boston suburb. The data was collected in 1970 for the Boston Standard Metropolitan Statistical Area. To examine and modify the data, we will use several techniques such as data pre-processing and feature engineering. After that, we'll apply a statistical model, such as a regression model, to predict and monitor the real estate market.
Before using a statistical model, the EDA is a good step to go through in order to:
# Import the libraries
# Dataframe/numerical libraries
import pandas as pd
import numpy as np
# Data visualization
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
# Machine learning model
from sklearn.linear_model import LinearRegression
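# Reading the data
path = './housing.csv'
# The file is whitespace-delimited and has no header row.
# sep=r'\s+' is equivalent to delim_whitespace=True, which is deprecated in recent pandas.
housing_df = pd.read_csv(path, header=None, sep=r'\s+')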
CRIM: The per capita crime rate by town.
ZN: The proportion of residential land zoned for lots over 25,000 square feet.
INDUS: The proportion of non-retail business acres per town.
CHAS: A dummy variable that denotes whether or not the tract bounds the Charles River (1 if it does, 0 otherwise).
NOX: The nitric oxides concentration (parts per 10 million).
RM: The average number of rooms per home.
AGE: The percentage of owner-occupied housing built before 1940.
DIS: The weighted distances to five Boston employment centers.
RAD: The index of accessibility to radial highways.
TAX: The full-value property tax rate per $10,000.
B: The result of the equation B = 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents in each town.
PTRATIO: The student-to-teacher ratio in each town.
LSTAT: The percentage of the population with lower socioeconomic status.
MEDV: The median value of owner-occupied residences, in thousands of dollars.
# Check if there are any missing values.
housing_df.isna().sum()

CRIM       0
ZN         0
INDUS      0
CHAS       0
NOX        0
RM         0
AGE        0
DIS        0
RAD        0
TAX        0
PTRATIO    0
B          0
LSTAT      0
MEDV       0
dtype: int64

No missing values are found.
We examine our data's mean, standard deviation, and percentiles.
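The summary step is not shown in the text; a minimal sketch of how it might look with pandas is below. The frame `demo_df` and its synthetic values are assumptions used only to keep the example self-contained; with the real data you would call `describe()` on `housing_df` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for housing_df: two synthetic columns, not the real data.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "RM": rng.normal(6.0, 0.7, size=200),
    "AGE": rng.uniform(0, 100, size=200),
})

# describe() reports count, mean, std, min, the 25/50/75th percentiles, and max.
summary = demo_df.describe()
print(summary)

# Gap between the mean and the median (50th percentile); a large gap hints at skew.
gap = (summary.loc["mean"] - summary.loc["50%"]).abs()
print(gap)
```

Comparing the `mean` row with the `50%` row is a quick way to spot columns whose distribution is skewed or dominated by outliers.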
At first glance, the crime, area, sector, nitric oxides, and 'B' columns appear to have multiple outliers, because their minimum and maximum values are so far apart. In the AGE column, the mean and the median (Q2, the 50th percentile) do not match.
We might double-check it by examining the distribution of each column.
Removing all outliers makes the model overly generic, so it underfits. Keeping all outliers causes the model to overfit: it becomes excessively accurate on the training data and learns the noise in the data.
The approach is to find a happy medium that prevents the model from becoming overly precise while still letting it generalise well when faced with a new set of data.
We'll keep numbers below 600 because there's a huge anomaly in the TAX column around 600.
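The filtering code is not shown, so here is a minimal sketch of one way to do it with a boolean mask. The tiny frame and its values are made up for illustration, and naming the result `new_df` is an assumption based on the name used in the later code.

```python
import pandas as pd

# Hypothetical miniature frame with invented values; only the TAX column matters here.
demo_df = pd.DataFrame({
    "TAX": [222, 296, 307, 666, 711, 403],
    "MEDV": [24.0, 21.6, 34.7, 13.8, 7.2, 19.9],
})

# Keep only the rows whose TAX value is below 600.
new_df = demo_df[demo_df["TAX"] < 600].reset_index(drop=True)
print(len(demo_df), "rows before,", len(new_df), "rows after")
```

Resetting the index is optional; it simply avoids gaps in the row labels after the rows are dropped.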
The overall distribution, particularly the TAX, PTRATIO, and RAD, has improved slightly.
Perfect correlation is denoted by the clear values. The medium correlation between the columns is represented by the reds, while the negative correlation is represented by the black.
With a value of 0.89, we can see that 'MEDV', which is the median price we wish to predict, is strongly correlated with the number of rooms 'RM'. It is followed by the residential land 'ZN' with a value of 0.32 and the proportion of Black residents 'B' with a value of 0.19.
The metrics that are most connected with price will be plotted.
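A ranking like this can be derived directly from the correlation matrix. The sketch below assumes that the list called `high_corr_var` later in the text was built in roughly this way; the synthetic columns are placeholders, not the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: the price rises with RM and falls with LSTAT.
rng = np.random.default_rng(1)
n = 300
rm = rng.normal(6.0, 0.7, size=n)
lstat = rng.uniform(2, 35, size=n)
noise = rng.normal(0, 2.0, size=n)
demo_df = pd.DataFrame({
    "RM": rm,
    "LSTAT": lstat,
    "MEDV": 5.0 * rm - 0.5 * lstat + noise,
})

# Pearson correlation of every feature with the target, largest value first.
corr_with_price = (
    demo_df.corr()["MEDV"].drop("MEDV").sort_values(ascending=False)
)
print(corr_with_price)

# Feature names in descending order of correlation with the price.
high_corr_var = list(corr_with_price.index)
print(high_corr_var)
```

Sorting by the signed correlation puts negatively correlated features last; sorting by the absolute value instead would rank features by strength regardless of direction.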
Gradient descent is aided by feature scaling, which ensures that all features are on the same scale. It makes locating the local optimum much easier.
Standardization is one strategy to employ. It replaces each feature value with (value - mean) / standard deviation, which ensures that the feature has a mean of nearly zero and a standard deviation of one.
def standard(X):
    '''Standardize the feature 'X' to zero mean and unit standard deviation'''
    mu = np.mean(X, axis=0)   # per-column mean
    std = np.std(X, axis=0)   # per-column standard deviation
    sta = (X - mu) / std      # standardization
    return mu, std, sta

mu, std, sta = standard(X)
X = sta
X
For the sake of the project, we'll apply linear regression.
Typically, we run numerous models and select the best one based on a particular criterion.
In machine learning, linear regression is a supervised learning model in which the response is continuous.
Form of Linear Regression
$y = \theta X + \theta_1$ or $y = \theta_1 + X_1\theta_2 + X_2\theta_3 + X_3\theta_4$
y is the target you will be predicting
θ is the coefficient
X is the input
We will use Sklearn to develop and train the model.
# Import the libraries to train the model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
The train/test approach lets us fit the model on one part of the data and predict on another part.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
# Create and train the model
model = LinearRegression().fit(X_train, y_train)
# Generate predictions
predictions_test = model.predict(X_test)
# Inspect the learned coefficient and intercept
coefficient = model.coef_
intercept = model.intercept_
print(coefficient, intercept)

[7.22218258] 24.66379606613584
In this example, the model learns the following hypothesis:
Price = 24.85 + 7.18 * Room
It is interpreted as:
For a given house:
An increase of one unit in the (standardized) number of rooms is associated with a 7.18-unit increase in the price.
As a side note, this is an association, not causation!
You will need a metric to determine whether our hypothesis was right. The RMSE approach will be used.
Root Mean Square Error (RMSE) is defined as the square root of the mean of the squared errors. The difference between the true and predicted values is called the error. It's popular because it is expressed in the same units as y, which is the median price of a home in our scenario.
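Written out, the definition above corresponds to the formula below, where $n$ is the number of samples, $y_i$ the true value, and $\hat{y}_i$ the predicted value (these symbol names are chosen here for illustration):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}
```

As a small worked example, two predictions with errors of 3 and -4 give $\sqrt{(9 + 16)/2} = \sqrt{12.5} \approx 3.54$.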
def rmse(predict, actual):
    return np.sqrt(np.mean(np.square(predict - actual)))

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
# Create and train the model
model = LinearRegression().fit(X_train, y_train)
# Generate predictions
predictions_test = model.predict(X_test)
# Inspect the learned coefficient and intercept
coefficient = model.coef_
intercept = model.intercept_
print(coefficient, intercept)
# Compute the loss to evaluate the model
loss = rmse(predictions_test, y_test)
print('loss: ', loss)
print(model.score(X_test, y_test))  # R^2 score (coefficient of determination)

[7.43327725] 24.912055881970886
loss: 3.9673165450580714
0.7552661033654667

The loss is 3.96.
Since y is the median value of owner-occupied homes in thousands of dollars, this means the predictions are off by roughly 3,960 dollars.
While fitting the model, you will see high variance depending on how you divide the data: the coefficient and intercept will vary. This is because, when we used the train/test approach, we chose rows at random to place in either the train or the test set. As a result, our hypothesis will change each time the dataset is divided.
This problem can be solved using a technique called cross-validation.
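The text does not show cross-validation in code, so the following is only a sketch of how it could look with scikit-learn's `cross_val_score`. The synthetic `X_demo` and `y_demo` arrays are assumptions standing in for the standardized feature and the price.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data: one standardized feature and a noisy linear target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 1))
y_demo = 24.0 + 7.0 * X_demo[:, 0] + rng.normal(scale=4.0, size=200)

# 5-fold cross-validation: every row is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=1)
neg_rmse = cross_val_score(
    LinearRegression(), X_demo, y_demo,
    cv=cv, scoring="neg_root_mean_squared_error",
)

# scikit-learn reports the negated RMSE so that higher is always better.
rmse_per_fold = -neg_rmse
print(rmse_per_fold)
print("mean RMSE:", rmse_per_fold.mean())
```

Averaging the loss over several folds gives an estimate that depends much less on one particular random split than a single train/test division does.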
With 'Forward Selection,' we'll iterate through the features, adding one at a time, to help us choose how many features to include in our model.
We'll use a random state of 1 so that each iteration yields the same outcome.
cols = []
los = []
los_train = []
scor = []
i = 0
while i < len(high_corr_var):
    cols.append(high_corr_var[i])
    # Select input variables
    X = new_df[cols]
    # Mean normalization
    mu, std, sta = standard(X)
    X = sta
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
    # Fit the model to the training set
    lnreg = LinearRegression().fit(X_train, y_train)
    # Make predictions on the training set
    prediction_train = lnreg.predict(X_train)
    # Make predictions on the testing set
    prediction = lnreg.predict(X_test)
    # Compute the loss on the test and training sets
    loss = rmse(prediction, y_test)
    loss_train = rmse(prediction_train, y_train)
    los_train.append(loss_train)
    los.append(loss)
    # Compute the score
    score = lnreg.score(X_test, y_test)
    scor.append(score)
    i += 1
With a smaller collection of variables we have a big 'loss', and our model will overgeneralize. With a large number of variables we have a reduced 'loss'; however, if the model grows too precise, it may not generalize well to new data.
In order for our model to generalize well to another set of data, we might use 6 or 7 features. The features are chosen in descending order of how strongly they correlate with the price.
high_corr_var

['RM', 'ZN', 'B', 'CHAS', 'RAD', 'DIS', 'CRIM', 'NOX', 'AGE', 'TAX', 'INDUS', 'PTRATIO', 'LSTAT']

'RM' has a high positive correlation with the price, while 'LSTAT' has a negative correlation with the price.
# Create a list of feature names
feature_cols = ['RM', 'ZN', 'B', 'CHAS', 'RAD', 'CRIM', 'DIS', 'NOX']
# Select input variables
X = new_df[feature_cols]
# Feature scaling (done before the split so that it actually reaches the model)
mu, std, sta = standard(X)
X = sta
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
# Fit the model to the training data
lnreg = LinearRegression().fit(X_train, y_train)
# Make predictions on the testing set
prediction = lnreg.predict(X_test)
# Compute the loss
loss = rmse(prediction, y_test)
print('loss: ', loss)
lnreg.score(X_test, y_test)

loss: 3.212659865936143
0.8582338376696363

The test set yielded a loss of 3.21 and an R² score of about 0.86.
Other factors, such as alpha, the learning rate at which our model learns, could still be tweaked to improve our model. Alternatively, we could return to the preprocessing section and work on improving the feature distributions.
In this era of computation power greed, we tend to forget to use the power already available on our very own computers
The hunger for computation power among programmers, gamers, scientists, software developers, and most humans that know how to use a computer, in general, is immense. We are always looking for applications that are less compute-intensive and more efficient. This allows us to make use of our computer setups more efficiently.
However, many of us do not fully utilize the computation power already available to us on our computers. Utilizing this power when needed can lead to significantly better performance, and you can usually run processes 2–3 times faster with some changes in code. How do we do this, you ask? Well, let’s dive in.
This blog focuses on parallel programming, i.e., running a program on multiple processors simultaneously. When you run your program, it usually uses one of the cores in your computer. However, most computers have multiple cores. Depending on your processor, it may be dual-core, quad-core, octa-core, or may contain more cores. If, let’s say, you have a quad-core processor and run a program on only one core, you are essentially letting go of the other three cores, and hence of three times the computation power you are using. Using all these cores can theoretically speed up your tasks by a factor of four.
However, it is not so simple; otherwise software companies would be using all the cores all the time for better performance. If you want to increase the performance of your program, you need to make sure that it can be parallelized, i.e., that it can run on different cores simultaneously.
Let us take a simple example to understand this point. Let us say you have to build a product and you divide its manufacturing into four stages. If you can go from stage 1 to stage 2 only when stage 1 is completed, then from stage 2 to stage 3 only when stage 2 is completed, and so on, then this process is sequential, since you have to follow a sequence to execute your instructions. However, if you can break the manufacturing into four parts and assign them to different workers, then the process can be parallelized, e.g., building four components for a toy that can be joined once they are finished.
Once you have established that your program can be parallelized, the next step is to write the code for it. I will not write the code in this blog; however, the code for this blog can be found at this GitHub repo. I chose Python to write the code and used the multiprocessing module to run the program on multiple processors.
In this program, we will see two applications of parallel programming. The first is matrix multiplication, which can be easily parallelized. The second is the prefix sum scan, which at first glance seems to be a sequential problem but can be parallelized to run on multiple processors.
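To give a feel for why the prefix sum can be parallelized, here is a minimal chunk-based sketch. It is not the code from the repository mentioned above; the function names and the `map_fn` parameter are assumptions made for illustration. Each chunk is scanned independently, the chunk totals are combined in a short sequential step, and the resulting offsets are then added back to each chunk independently.

```python
from itertools import accumulate


def scan_chunk(chunk):
    """Inclusive prefix sum of one chunk (independent of all other chunks)."""
    return list(accumulate(chunk))


def add_offset(args):
    """Shift an already-scanned chunk by the total of everything before it."""
    offset, scanned = args
    return [offset + value for value in scanned]


def chunked_prefix_sum(values, n_chunks=4, map_fn=map):
    """Prefix sum in three phases; phases 1 and 3 are the parallelizable ones."""
    if not values:
        return []
    size = -(-len(values) // n_chunks)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]

    # Phase 1: scan every chunk on its own (no chunk needs another chunk).
    scanned = list(map_fn(scan_chunk, chunks))

    # Phase 2: short sequential pass over the chunk totals to get each offset.
    totals = [chunk[-1] for chunk in scanned]
    offsets = [0] + list(accumulate(totals))[:-1]

    # Phase 3: add each chunk's offset (again independent per chunk).
    shifted = map_fn(add_offset, zip(offsets, scanned))
    return [value for chunk in shifted for value in chunk]


print(chunked_prefix_sum(list(range(1, 11))))
```

With the default `map_fn=map` everything runs in one process, which makes the logic easy to check. Passing the `map` method of a `multiprocessing.Pool` as `map_fn` (inside an `if __name__ == "__main__":` guard) should let phases 1 and 3 run on separate cores, although for small lists the cost of starting processes will likely outweigh any gain.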