Angela Dickens

1595040900

Detailed Solution to Mercedes Benz Green Manufacturing Competition

Table of Contents:

  1. Business Problem
  2. Problem Statement
  3. Data Preparation
  4. Exploratory Data Analysis
  5. Feature Engineering
  6. Data Preprocessing
  7. Feature Selection
  8. Modeling
  9. Summary
  10. Predictions on Test Data
  11. Conclusion and Future Work
  12. References

1. Business Problem:

Vehicle testing is an important part of the automobile manufacturing process. Every vehicle must meet a certain standard before it is delivered to the customer. Mercedes-Benz offers a wide range of vehicles with different customizations. Each vehicle must undergo testing to ensure that it satisfies the safety requirements and meets the emission norms. Because of the customization, each model requires a different test stand configuration. Since there are many models, a large number of tests must be conducted. More tests mean more time spent on the test stand, which increases costs for Mercedes-Benz and generates more carbon dioxide (a greenhouse gas).

The Mercedes Benz Green Manufacturing Competition, hosted on Kaggle, aims to optimize the vehicle testing process by developing a machine learning model that can predict the time a vehicle spends on the test stand. The ultimate goal is to reduce the time spent on the test stand, which will in turn reduce carbon dioxide emissions during the testing phase. The dataset for this study is provided by Mercedes-Benz. The data can be downloaded from the competition page on Kaggle.

2. Problem Statement:

The task is to develop a machine learning model that can predict the time a car will spend on the test bench based on the vehicle configuration. The vehicle configuration is defined as the set of customization options and features selected for the particular vehicle. The motivation behind the problem is that an accurate model will make it possible to reduce the total time spent on testing vehicles by allowing cars with similar testing configurations to be run successively. This is a supervised machine learning regression task, since it involves predicting a continuous target variable from a set of independent variables by learning from labelled training data.

The evaluation metric is R-squared, also known as the coefficient of determination. It quantifies the proportion of the variation in the target variable that is explained by the features. The best possible value of R-squared is 1, which indicates that all the variation in the target variable is explained by the input features. A value of 0 corresponds to a model that always predicts the mean of the target, and on unseen data the value can even be negative if the model performs worse than that baseline.
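As a minimal sketch of how the metric behaves, the function below computes R-squared as 1 - SS_res / SS_tot in plain Python. The function name r_squared and the sample values are made up for illustration and are not taken from the competition data. The three cases show a perfect fit, a model that always predicts the mean, and a model that does worse than the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical test-stand times in seconds (illustrative values only)
y_true = [90.0, 100.0, 110.0, 120.0]

perfect = r_squared(y_true, y_true)                       # predictions equal the targets
baseline = r_squared(y_true, [105.0] * 4)                 # always predict the mean (105)
worse = r_squared(y_true, [120.0, 110.0, 100.0, 90.0])    # predictions in reversed order

print(perfect, baseline, worse)
```

In practice one would probably use r2_score from sklearn.metrics rather than a hand-written function; the sketch only spells out the arithmetic.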

3. Data Preparation:

Two datasets are provided by Mercedes-Benz for this competition: train.csv and test.csv. The file train.csv is the labeled dataset on which the machine learning model has to be developed. The file test.csv is the dataset on which predictions are to be made. Both training and test data contain 377 features, which represent the vehicle configuration during the vehicle testing phase. The features have names such as ‘X0’, ‘X1’, ‘X2’ and so on. There is also a feature ‘ID’, which represents the ID assigned to each vehicle test. The features are anonymized, so their physical meaning is not given. The description of the data states that these features are configuration options such as suspension setting, adaptive cruise control, all-wheel drive and a number of other options that together define a car model. A subset of the training data is shown in the image below.

#Load dataset
import pandas as pd

data = pd.read_csv('train.csv')
data.head()

Image for post

There are 377 features, of which 368 are binary, 8 are categorical and 1 is continuous. The target variable y is a continuous value that represents the time taken to test the vehicle, in seconds. There are no missing values in the dataset. The train.csv file is split into a training set and a validation set. The code below performs this operation.

from sklearn.model_selection import train_test_split

#Separate the dependent and independent features
X = data.drop(columns=['y'])
Y = data['y']

#Split the dataset (80% training, 20% validation)
X_train, X_val, y_train, y_val = train_test_split(
    X, Y, test_size=0.2, random_state=25)

#Concatenate X_train and y_train column-wise; rows are aligned on the index
train_data = pd.concat([X_train, y_train], axis=1)
train_data.head()

Image for post

Exploratory Data Analysis (EDA) is performed on the train_data DataFrame created in the code above.

4. Exploratory Data Analysis:

4.1. Analyze target/dependent variable:

The image below contains the histogram and box-plot of the target variable.

Image for post

The target variable has a mean of 100 seconds. Based on the box-plot, points with target values above 137.5 can be regarded as outliers. For this competition, however, only points above 150 are classified as outliers, and they are removed from the training set.
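The outlier handling described here can be sketched as follows. The small DataFrame is made-up data standing in for train_data, and the name THRESHOLD is an assumption introduced for this example. The sketch first computes the usual box-plot upper fence (Q3 + 1.5 * IQR) for comparison and then applies the fixed cut-off of 150 stated in the text.

```python
import pandas as pd

# Hypothetical stand-in for train_data (illustrative values only)
train_data = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "y": [78.0, 90.0, 95.0, 100.0, 105.0, 110.0, 140.0, 265.0],
})

# Upper fence of a standard box-plot: Q3 + 1.5 * IQR
q1 = train_data["y"].quantile(0.25)
q3 = train_data["y"].quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)

# Fixed cut-off from the text: rows with y above 150 are treated as outliers
THRESHOLD = 150
train_data_clean = train_data[train_data["y"] <= THRESHOLD]

print(f"box-plot upper fence: {upper_fence}")
print(f"rows removed: {len(train_data) - len(train_data_clean)}")
```

Only the training rows are filtered in this way; the validation and test sets would be left untouched so that evaluation reflects the original distribution.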

4.2. Analyze categorical variables:

There are 8 categorical features: X0, X1, X2, X3, X4, X5, X6, and X8. For each of these features, a histogram of the counts of its unique values and a box-plot of the target variable for each unique value are plotted.
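The numbers behind such plots can also be inspected directly. The sketch below is one possible way to do this, assuming a DataFrame with a categorical column and the target y. The data is made up and only reuses a few category names from X0. It tabulates the count, mean and median of y for each category.

```python
import pandas as pd

# Hypothetical stand-in for train_data with one categorical feature
train_data = pd.DataFrame({
    "X0": ["z", "z", "ak", "ak", "ak", "aa"],
    "y": [92.0, 94.0, 108.0, 110.0, 112.0, 100.0],
})

# Per-category frequency and central tendency of the target
summary = (
    train_data.groupby("X0")["y"]
    .agg(count="count", mean="mean", median="median")
    .sort_values("count", ascending=False)
)

print(summary)
```

Categories with a count of 1 in such a table correspond to the rarely occurring values noted in the observations, and categories with similar means hint at possible groupings.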

4.2.1. X0 feature:

Image for post

Observations from the above plots:

  1. aa, ab, g and ac occur only once.
  2. The box-plots of z, y, t, o, f, n, s, al, m, ai, e, ba, aq, am, u, i, ad and b are nearly the same. The mean of these categories is nearly 93.
  3. The box-plots of ak, x, j, w, af, at, a, ax, i, au, as, r and c are nearly the same. The mean of these categories is nearly 110.
  4. Thus there appears to be grouping among the different categories of X0.

4.2.2. X1 feature:

Image for post

Observations from the above plots:

  1. Most of the categories of X1 have a mean of 100.
  2. Category y of X1 is clearly separated from the rest of the categories.

4.2.3. X2 feature:

Image for post

Observations from the above plots:

  1. The ae category dominates X2: 39% of the values of X2 are ae.
  2. Similar to X0, there appears to be grouping in X2, although less than in X0.
  3. Most of the categories of X2 have a mean close to 97.

4.2.4. X3 feature:

Image for post

Observations from the above plots:

  1. The c category dominates X3: 46% of the values of X3 are c.
  2. Almost all the categories of X3 have a mean of 100.
  3. There appears to be little variation in the dependent variable y across the categories of X3. The box-plots for most of the categories of X3 match.

4.2.5. X4 feature:

Image for post

Observations from the above plots:

  1. The d category dominates X4: 99.9% of the values of X4 are d.
  2. This feature should be dropped, as it has almost no variance.
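A check of this kind can be sketched as below. The column is synthetic, built so that 999 of 1000 values are d, and the 0.99 limit named DOMINANCE_LIMIT is an assumption chosen for illustration, not a value from the article.

```python
import pandas as pd

# Synthetic column in which a single category accounts for 99.9% of the rows
x4 = pd.Series(["d"] * 999 + ["a"], name="X4")

# Relative frequency of each category, most frequent first
shares = x4.value_counts(normalize=True)
top_category = shares.index[0]
top_share = shares.iloc[0]

# Assumed rule of thumb: flag the feature if one category exceeds 99% of rows
DOMINANCE_LIMIT = 0.99
drop_feature = top_share > DOMINANCE_LIMIT

print(top_category, top_share, drop_feature)
```

The same check could be run over all categorical columns in a loop to list every near-constant feature before modeling.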

4.2.6. X5 feature:

Image for post

Observations from the above plots:

  1. x, h, g, y and u occur very rarely in the data.
  2. The mean of all categories of X5 is close to 98.
  3. There appears to be little variation in the dependent variable y across the categories of X5. The box-plots for most of the categories of X5 match.

Mercedes Green Manufacturing: Kaggle Competition

As part of my continuing data analysis learning journey, I thought of trying out a past Kaggle competition in order to test my skills and knowledge so far. While going through the datasets, I came across the Mercedes Green Manufacturing Kaggle competition, conducted sometime in 2017.

Coming from the automotive domain, I thought this could be a good dataset on which to apply my data analysis skills. On reading the competition description, I could relate to this problem even more closely. The competition asks: given a set of anonymous categorical and binary variables, can you predict the time a car will take to complete its testing?

As an engineer from this domain, I can completely see the importance of such a model. I know how time-consuming vehicle testing can be. The process consists of building a prototype car, instrumenting it and then running the required tests. The major bottleneck in car testing occurs during the instrumentation phase, which requires disassembling the car, fitting the required recording instruments and then reassembling the car.

Another bottleneck during testing is the availability of testing equipment, such as the drive cells required to run the test.

All these factors result in wasted man-hours and increased development time in the vehicle development program. This adds unplanned overhead cost for the company. Hence a model that can predict how much time a car will take to complete a test will help to plan and manage cost and resources better.
