Table of Contents

  1. Introduction
  2. Here’s Why
  3. Summary
  4. References

Introduction

While saying that Python is better than R is true for me, it may not be true for you. You could, of course, find R more useful than Python for various reasons. Even if you disagree with my claim, I still hope to start a conversation in which we can all see the benefits of both programming languages. For data scientists, I believe Python has more benefits than R does. I realize that R has some unique and powerful statistical libraries that may well outshine their Python counterparts; however, the overall data science process benefits more from Python, because it lets you work at scale alongside data engineers, software engineers, and machine learning engineers.

Below, I will discuss five major reasons why I think Python is better than R: scalability, Jupyter Notebooks, library packages, integrations, and the ability to become a cross-functional team member.

Here’s Why

  • Scalability

Scalability is an enormous benefit in data science. Because most data scientists work with colleagues from an engineering department, sharing a language makes a model, and the overall process around it, easier to deploy. For example, a typical data scientist could focus only on modeling, perhaps even producing a one-off output. However, there is a step you will most likely need to complete before training your machine learning model: data engineering. In this part of the process, you can read in new data automatically from a SQL database so that your model is always trained on up-to-date data. On the other side of the process is deployment. It can be quite intimidating to deploy a model for the first time, especially because deployment is not taught in school nearly as much as modeling is.

Because of Python, software engineers and machine learning engineers can work side by side with you.

You can create an Airflow Directed Acyclic Graph (DAG) that automatically trains the model on a specific schedule when there is new data, or when certain conditions are satisfied (e.g., only train this model if we obtain 100 new records of incoming data). Once the model is trained, it can score new data, and the results can then be written to a SQL table using Python.
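As a rough sketch of the retraining rule described above, here is what the gating logic might look like in plain Python. Everything named here is an assumption made for illustration: the `records` and `predictions` tables, the `used_for_training` column, and the helper functions are hypothetical, the "model" is only a placeholder that predicts the mean, and an in-memory SQLite database stands in for a real SQL server. It is not an Airflow DAG, just the kind of function a scheduled task could call.

```python
import sqlite3

MIN_NEW_RECORDS = 100  # threshold from the example in the text


def fetch_new_rows(conn):
    """Return (id, x, y) rows that have not been used for training yet."""
    return conn.execute(
        "SELECT id, x, y FROM records WHERE used_for_training = 0"
    ).fetchall()


def train_mean_model(rows):
    """Placeholder 'model': the mean of y. A real estimator would go here."""
    ys = [y for _, _, y in rows]
    return sum(ys) / len(ys)


def retrain_if_enough_data(conn, min_new=MIN_NEW_RECORDS):
    """Train only when at least `min_new` unseen records exist.

    Returns the trained model, or None when the threshold is not met.
    """
    rows = fetch_new_rows(conn)
    if len(rows) < min_new:
        return None  # not enough new data, skip this run

    model = train_mean_model(rows)

    # Score the batch and write the results to a SQL table.
    conn.executemany(
        "INSERT INTO predictions (record_id, prediction) VALUES (?, ?)",
        [(row_id, model) for row_id, _, _ in rows],
    )
    # Mark exactly the rows we trained on so they are not counted again.
    conn.executemany(
        "UPDATE records SET used_for_training = 1 WHERE id = ?",
        [(row_id,) for row_id, _, _ in rows],
    )
    conn.commit()
    return model


def make_demo_db(n_rows):
    """Build an in-memory database with n_rows synthetic records (y = 2 * x)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, x REAL, y REAL, "
        "used_for_training INTEGER DEFAULT 0)"
    )
    conn.execute("CREATE TABLE predictions (record_id INTEGER, prediction REAL)")
    conn.executemany(
        "INSERT INTO records (x, y) VALUES (?, ?)",
        [(float(i), 2.0 * i) for i in range(n_rows)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db(120)
    print("first run trained a model:", retrain_if_enough_data(conn) is not None)
    print("second run trained a model:", retrain_if_enough_data(conn) is not None)
```

In a real pipeline, a function like `retrain_if_enough_data` could be the callable behind a scheduled Airflow task, with the placeholder model swapped for an actual estimator and the SQLite connection swapped for your production database.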

  • Jupyter Notebooks

Jupyter Notebook, or another similar notebook tool for data science, can interpret Python. You can run cells of code, add comments, create headings, and add widgets that improve the functionality of your notebook. The code you write and share here is Python. Being able to code in this programming language in your Jupyter Notebook is a big win for data scientists.

  • Library Packages

There are several powerful and commonly used packages that can be accessed with Python. Some that come to mind are sklearn (also referred to as scikit-learn) and TensorFlow.


Python is Better Than R.