This is the second in a series of articles on the amazing pycaret package in Python, which fast-tracks and automates virtually every stage of the ML project life cycle with ridiculously few lines of code. If you missed the first part, click the link below. It briefly covers the initial setup, which completes all aspects of data preprocessing in a single line of code and takes us right to the modelling stage.

PyCaret: The Machine Learning Omnibus

In this article, we will look at several arguments that can be passed to the setup() function to further control the preprocessing done by pycaret. By default, the setup() function requires only the dataframe and the target feature whose category labels we want to predict. However, the feature datatypes automatically inferred by the function may not always be appropriate, and in some instances we may need to step in. In the Titanic dataset we are using, for example, the setup() function correctly infers Pclass (passenger class), SibSp (siblings and spouses onboard) and Parch (parents and children onboard) as categorical features, along with sex. pycaret automatically one-hot encodes categorical features, and in this case it will do so for Pclass, SibSp, Parch and sex. However, all of these features except sex have an inherent order to their levels (ordinality), so it would be more appropriate to label encode them and preserve that order.
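To make the difference between the two encodings concrete, here is a minimal sketch in plain Python. The toy rows, the helper names (one_hot, ordinal, ordered_levels, run_setup) and the target column name Survived are assumptions made for illustration. The last function shows how the ordered levels might be handed to setup() through its ordinal_features argument; it is a hedged sketch rather than the exact call used in this series, and whether integer levels are accepted as-is may depend on your pycaret version.

```python
# Toy rows standing in for a few Titanic passengers (illustrative values only).
rows = [
    {"Pclass": 3, "SibSp": 1, "Parch": 0},
    {"Pclass": 1, "SibSp": 0, "Parch": 2},
    {"Pclass": 2, "SibSp": 0, "Parch": 0},
]


def one_hot(value, levels):
    """One indicator per level; the order of the levels is lost."""
    return [1 if value == level else 0 for level in levels]


def ordinal(value, levels):
    """Position of the value in the ordered level list; the order is kept."""
    return levels.index(value)


def ordered_levels(rows, column):
    """Distinct values of a column, sorted in ascending order."""
    return sorted({row[column] for row in rows})


pclass_levels = ordered_levels(rows, "Pclass")
one_hot_pclass = [one_hot(row["Pclass"], pclass_levels) for row in rows]
ordinal_pclass = [ordinal(row["Pclass"], pclass_levels) for row in rows]

# Dictionary in the shape of setup()'s ordinal_features argument:
# column name -> list of levels, ordered from lowest to highest.
ordinal_features = {
    column: ordered_levels(rows, column)
    for column in ("Pclass", "SibSp", "Parch")
}


def run_setup(df, target="Survived"):
    """Hypothetical wrapper: needs pycaret installed and a pandas DataFrame.

    The target name "Survived" is an assumption about the dataset.
    """
    from pycaret.classification import setup  # imported lazily on purpose

    return setup(data=df, target=target, ordinal_features=ordinal_features)
```

In practice, the level lists would be built from the full dataframe rather than from three toy rows, so that every level that occurs in the data appears in the mapping.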

#machine-learning #classification-algorithms #python #data-preprocessing #pycaret
