This is the second in a series of articles on the amazing pycaret package in Python, which enables fast-tracking and automation of virtually every stage of the ML project life cycle with ridiculously few lines of code. If you missed the first part, click the link below. In it, we briefly covered the initial setup, which completes all aspects of data preprocessing in a single line of code and takes us straight to the modelling stage.
In this article, we will look at several arguments that can be passed to the setup() function to further control the preprocessing done by pycaret.

By default, the setup function requires only the dataframe and the target feature whose category labels we want to predict. However, the feature data types that the function infers automatically may not always be correct, and in some cases we need to step in. In the Titanic dataset we are using, for example, setup correctly infers Pclass (passenger class), SibSp (siblings and spouses on board) and Parch (parents and children on board) as categorical features, along with sex.

pycaret will automatically one-hot encode the categorical features, which in this case means Pclass, SibSp, Parch and sex. However, all of these except sex have an inherent order to their levels (ordinality), so it would be more appropriate to label encode them with their levels in order (ordinal encoding). That preserves the order of the levels instead of splitting each feature into unrelated indicator columns.