Let’s start by defining what student retention is, at least within the scope of this article. We’ll define it as the indicator that tells us whether a student who started college for the first time in a particular Fall semester came back the following Fall. For instance, let’s say a student started at a particular university for the first time in Fall 2018. If this student enrolled for Fall 2019, then this student was retained.
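To make the definition concrete, here is a minimal sketch in base R of how such a retention flag could be computed. The `enrollments` table, its column names (`student_id`, `fall_year`), and the `retention_flags()` helper are all hypothetical, invented for illustration; they are not the dataset or the code used later in this article.

```r
# Hypothetical enrollment records: one row per student per Fall term enrolled.
enrollments <- data.frame(
  student_id = c("A", "A", "B", "C", "C"),
  fall_year  = c(2018, 2019, 2018, 2018, 2020),
  stringsAsFactors = FALSE
)

# Flag whether each member of a first-time Fall cohort enrolled again the next Fall.
# (Illustrative helper; the name and signature are assumptions.)
retention_flags <- function(enrollments, cohort_year) {
  # Earliest Fall term on record for each student
  first_fall <- tapply(enrollments$fall_year, enrollments$student_id, min)
  # The cohort is everyone whose first Fall term is the cohort year
  cohort <- names(first_fall)[first_fall == cohort_year]
  # Students with an enrollment record in the following Fall
  returned <- enrollments$student_id[enrollments$fall_year == cohort_year + 1]
  data.frame(
    student_id = cohort,
    retained   = cohort %in% returned,
    stringsAsFactors = FALSE
  )
}

flags <- retention_flags(enrollments, 2018)
retention_rate <- mean(flags$retained)  # share of the cohort that was retained
```

In this toy data, student A counts as retained, while B and C do not. Student C did come back, but in Fall 2020 rather than Fall 2019, so under the one-year definition above C is not retained.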

Other common names for retention are persistence and drop-out. Here, these names all refer to the same thing: one-year undergraduate retention. The reader may ask at this point, “Why is student retention important anyway?” That’s a fair question, and without pretending to exhaust the answer, we can say that it is important for a myriad of reasons, starting with the financial impact and the rankings and prestige that schools can earn, and the list goes on. By the way, when we say “school” here, we are specifically referring to universities and colleges (higher education).

The Challenge of Finding a Suitable Dataset

Finding a dataset at the student level is very hard. If you have one and want to send it to me, please do so, but it must be anonymized: no student names or IDs, or any other information that would allow a researcher to identify a student. This matters particularly because regulations such as FERPA rightfully protect students’ data. This is a big deal, and we must be very careful when handling sensitive information.

That is to say, locating the right dataset for this experiment really poses a challenge. On the positive side, we managed to find one dataset that we could use, and that is the one we’ll be working with here. You can find it at the UCI Machine Learning Repository.

To use this data, the authors request a proper citation; please refer to the “Reference” section of this article for it. On an additional note, a big shout-out to the researchers who made this dataset available. Thank you.


Crafting a Machine Learning Model to Predict Student Retention Using R