With online shopping malls increasingly replacing traditional malls, more and more people are becoming interested in selling online.

The purpose of this article is to offer insights to online sellers interested in identifying the characteristics of product postings that might increase sales. The data used for this project consists of the search results for ‘keyboard’ on ebay.com, scraped with BeautifulSoup.

The raw data is messy, with many duplicate product postings, since eBay gives sellers the option to automatically re-list an item if it doesn’t sell. There’s also plenty of cleaning to do, such as stripping out less meaningful strings, converting data types, and removing sparse columns.

With the initial dataset cleaned, 5,211 observations remained, which were then split into training and test sets at a 7:3 ratio. There’s still more engineering to do, such as imputing missing values, checking multicollinearity, and feature engineering.
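As a sketch, the 7:3 split can be done with scikit-learn’s `train_test_split`; the tiny frame below is an illustrative stand-in for the cleaned dataset, with `watcher` assumed to be the target:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical cleaned data; column names mirror the ones shown later.
df = pd.DataFrame({
    "price": [59.9, 120.0, 35.5, 80.0, 15.0, 42.0],
    "rating": [4.5, None, 5.0, None, None, 4.8],
    "watcher": [3, 12, 0, 7, 1, 5],
})

X = df.drop(columns="watcher")
y = df["watcher"]

# 7:3 train/test split, as described above.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```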

Let’s check which variables have missing values.

```
np.sum(pd.isna(x_train), axis=0)
price 7
rating 3418
num_ratings 0
watcher 0
shipping 2
free_return 0
open_box 2
pre_owned 2
refurbished 2
benefits_charity 0
price_present 0
rating_present 0
shipping_present 0
status_present 0
dtype: int64
```

There are 3,418 ratings (92%) missing. Imputing with the mean or median would underestimate the variance of the ratings, so it is not an ideal solution. Here, we use MICE (Multiple Imputation by Chained Equations), which uses regression on the other features to predict each missing value. You can read about the MICE steps in more detail here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
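A minimal sketch of this idea using scikit-learn’s `IterativeImputer`, which implements a MICE-style chained-equations scheme (the toy frame below stands in for `x_train`):

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy stand-in for x_train; `rating` is mostly missing, as in the real data.
x_train = pd.DataFrame({
    "price": [59.9, 120.0, 35.5, 80.0, 15.0],
    "num_ratings": [10, 250, 3, 40, 0],
    "rating": [4.5, np.nan, 5.0, np.nan, np.nan],
})

# Each feature with missing values is modeled as a regression on the other
# features, cycling through the features until the imputations stabilize.
imputer = IterativeImputer(max_iter=10, random_state=0)
x_imputed = pd.DataFrame(
    imputer.fit_transform(x_train), columns=x_train.columns
)
```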

Now let’s check the target’s distribution in a training set.

[Figure: distribution of the target variable in the training set]

It is extremely right-skewed. This may violate the normality assumption on the target in linear regression models. We may need to consider transforming the target, or even Poisson regression, since the Poisson distribution is right-skewed when its mean is close to zero. A log transformation can reduce the skewness.
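A quick illustration of how a log transform tames right skew, using a simulated Poisson target in place of the real one:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
# Simulate a right-skewed count target (e.g., number of watchers)
# with a mean close to zero, as discussed above.
y_train = rng.poisson(lam=0.8, size=5000).astype(float)

# log1p computes log(1 + y), which safely handles the many zeros.
y_log = np.log1p(y_train)

# Skewness shrinks noticeably after the transform.
print(skew(y_train), skew(y_log))
```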

In regression, there are more assumptions to check: linearity between each feature and the (transformed) target, interaction effects, and constant variance of residuals.

Of course, the assumptions are not going to be met perfectly, but they should at least be checked if we want to reduce bias of the estimated coefficients in the model.
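One way to start checking these assumptions is to fit the model and inspect the residuals: a residuals-vs-fitted plot should show no trend (linearity) and roughly constant spread (constant variance). The sketch below uses synthetic data in place of the real features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Synthetic stand-ins: two features and a (log-transformed) target.
X = rng.normal(size=(200, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.2, size=200)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# With an intercept in the model, residuals average to zero by construction;
# what matters diagnostically is their pattern against the fitted values.
print(residuals.mean())
```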

Regression analysis is a core approach in machine learning and statistics. There are many types of regression analysis — logistic regression, linear regression, polynomial regression, etc. For this post, we’ll be focusing on linear regression.