Learn how to use Python and R together to get the best of both in a single data science project. This is an introductory guide to working with R within Python.
When I was a university student, the statistics courses (Survival Analysis, Multivariate Analysis, etc.) were taught in R. Nevertheless, when I set out to learn data science, I chose Python because R seemed “spooky” to me.
Working only with Python, I ran into the need to implement statistical techniques such as Grubbs' test for outliers, Markov chain Monte Carlo for simulations, or Bayesian networks for synthetic data. Thus, this article is intended as an introductory guide to incorporating R into your workflow as a Python data scientist. If you would rather integrate Python into your workflow as an R data scientist, the reticulate package is useful.
We choose the rpy2 framework (other options are pyRserve and pypeR) because it runs an embedded R. In other words, it allows communication between Python and R objects through rpy2.robjects; later we’ll see a particular example when converting a pandas DataFrame to an R data frame. If you get stuck in any of the steps below, read the official documentation or the references.
We’ll cover the three steps needed to start working with R within Python. Finally, we’ll work through a practical example and look at further functionality that the rpy2 package offers.
But first, we should install the rpy2 package.
    # Jupyter Notebook option
    !pip install rpy2

    # Terminal option
    pip install rpy2
In R, installing packages is performed by downloading them from CRAN mirrors and then installing them locally. In a similar way to Python modules, the packages can be installed and then loaded.
    # Choosing a CRAN mirror
    import rpy2.robjects.packages as rpackages
    utils = rpackages.importr('utils')
    utils.chooseCRANmirror(ind=1)

    # Installing required packages
    from rpy2.robjects.vectors import StrVector
    packages = ('bnlearn',)  # add any other desired packages to this tuple
    utils.install_packages(StrVector(packages))
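Reinstalling packages on every run can be slow, so one possible refinement is to install only what is missing. The sketch below assumes a hypothetical helper, missing_packages, that accepts any predicate. With rpy2 you would pass rpackages.isinstalled, which appears only in a comment here because it needs a working R installation.

```python
from typing import Callable, Iterable, List


def missing_packages(names: Iterable[str],
                     is_installed: Callable[[str], bool]) -> List[str]:
    """Return the names for which is_installed(name) is False.

    Hypothetical helper for illustration; it is not part of rpy2.
    """
    return [name for name in names if not is_installed(name)]


# Stand-in predicate so the sketch runs without R.
already_there = {"utils", "stats"}
to_install = missing_packages(("bnlearn", "stats"), already_there.__contains__)
print(to_install)

# With rpy2 (assumes R is available on the machine):
#   to_install = missing_packages(packages, rpackages.isinstalled)
#   if to_install:
#       utils.install_packages(StrVector(to_install))
```

Passing the predicate as an argument keeps the filtering logic testable without starting an embedded R session.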
By setting ind = 1 in chooseCRANmirror, we ensure an automatic redirection to the server nearest to our location. Now, we’re going to cover step two.
Here, we’re going to import the libraries and functions required to build a Bayesian network in the practical example.
    # Import packages
    from rpy2.robjects.packages import importr
    base, bnlearn = importr('base'), importr('bnlearn')

    # Import functions
    bn_fit, rbn = bnlearn.bn_fit, bnlearn.rbn
    hpc, rsmax2, tabu = bnlearn.hpc, bnlearn.rsmax2, bnlearn.tabu
To see which functions can be imported, it is convenient to look at the ‘_rpy2r’ key in the dictionary of the package. For example, to list the functions available in bnlearn, we run:
    bnlearn.__dict__['_rpy2r']

    Output:
    ...
    'bn_boot': 'bn.boot',
    'bn_cv': 'bn.cv',
    'bn_cv_algorithm': 'bn.cv.algorithm',
    'bn_cv_structure': 'bn.cv.structure',
    'bn_fit': 'bn.fit',
    'bn_fit_backend': 'bn.fit.backend',
    'bn_fit_backend_continuous': 'bn.fit.backend.continuous',
    ...
For more information on how to import functions, see the official rpy2 documentation.
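The output above pairs each Python-side name with its R name, where dots become underscores. As a rough illustration of that convention, here is a simplified sketch. The helper r_to_python_names is hypothetical and is not how rpy2 implements the translation, which handles more cases.

```python
def r_to_python_names(r_names):
    """Map Python-friendly names to R names by replacing dots with underscores.

    Simplified, hypothetical illustration of the naming convention;
    it is not rpy2's actual implementation.
    """
    mapping = {}
    for r_name in r_names:
        py_name = r_name.replace(".", "_")
        if py_name in mapping:
            # Two distinct R names would collide on the Python side.
            raise ValueError(
                f"{r_name!r} and {mapping[py_name]!r} both map to {py_name!r}"
            )
        mapping[py_name] = r_name
    return mapping


names = r_to_python_names(["bn.fit", "bn.cv", "rbn"])
print(names["bn_fit"])
```

This is why the R function bn.fit is reached from Python as bnlearn.bn_fit.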
Personally, I think this functionality is what allows you to combine scalability (Python) with statistical tools (R). As a personal example, while I was using Python's multiprocessing library for parallel computation, I also wanted to try the auto.arima() function from the forecast R library for forecasting, alongside the functions in the statsmodels Python package. So robjects.conversion is what lets you merge the best of the two programming languages.
    # Allow conversion
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()

    # Convert to R data frame
    r_dt = ro.conversion.py2rpy(dt)  # dt is a pd.DataFrame object

    # Convert back to pandas DataFrame
    pd_dt = ro.conversion.rpy2py(r_dt)
When the pandas conversion is activated (pandas2ri.activate()), many conversions between R and pandas objects are done automatically. Yet, for explicit conversion, we call the py2rpy and rpy2py functions, as in the code above.
Besides Monte Carlo methods, Bayesian networks are an option for simulating data. However, at the time of writing I found no Python library suited to this task, so I opted for the bnlearn package, which lets you learn the graphical structure of Bayesian networks and perform inference on them.
In the example below, we use a hybrid algorithm (rsmax2) to learn the structure of the network because it lets us combine any constraint-based algorithm with any score-based one. However, you should choose the heuristic that fits the nature of your problem; the bnlearn documentation lists the available algorithms. Once the network is learned, we simulate n random samples from the Bayesian network with the rbn function. Finally, we wrap the calls in a try-except block to handle a particular type of error.
    import rpy2.robjects as ro
    from rpy2.rinterface_lib.embedded import RRuntimeError

    # imputados ("imputed") is a pandas DataFrame; convert it to an R data frame
    r_imputados = ro.conversion.py2rpy(imputados)

    try:
        # Learn structure of the network
        structure = rsmax2(r_imputados, restrict='hpc', maximize='tabu')
        fitted = bn_fit(structure, data=r_imputados, method="mle")
        # Generate n observations
        r_sim = rbn(fitted, n=10)
    except RRuntimeError:
        print("Error while running R methods")
RRuntimeError is raised when R itself fails while running a function called through rpy2. We catch this specific exception so we can tell the user that something went wrong on the R side rather than in the Python code. As an illustration, I got an error about hybrid.pc.filter not being found while running this code.
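Printing a fixed message hides what R actually complained about. One way to keep the original message is sketched below. The helper run_step is hypothetical, the error text is illustrative, and Python's built-in RuntimeError stands in for rpy2's RRuntimeError so the sketch runs without R.

```python
def run_step(func, *args, catch=RuntimeError, **kwargs):
    """Call func and return (result, None), or (None, message) on failure.

    Hypothetical helper; `catch` is the exception type to intercept.
    With rpy2 you would pass catch=RRuntimeError.
    """
    try:
        return func(*args, **kwargs), None
    except catch as err:
        return None, f"{type(err).__name__}: {err}"


def failing_step():
    # Stand-in for an R call that fails; the message is illustrative only.
    raise RuntimeError("hybrid.pc.filter not found")


result, error = run_step(failing_step)
if error is not None:
    print("Error while running R methods:", error)
```

Only the named exception type is intercepted, so unrelated Python errors still surface normally.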
There is much more you can do with rpy2's low-level and high-level interfaces. For instance, you can call Python functions from R. Let’s see how to find the minimum of the four-dimensional Colville function with the conjugate gradient method.
    from rpy2.robjects.vectors import FloatVector
    from rpy2.robjects.packages import importr
    import rpy2.rinterface as ri
    stats = importr('stats')

    # Colville f: R^4 ---> R
    def Colville(x):
        x1, x2, x3, x4 = x[0], x[1], x[2], x[3]
        return 100*(x1**2-x2)**2 + (x1-1)**2+(x3-1)**2 + 90*(x3**2-x4)**2 + 10.1*((x2-1)**2 + (x4-1)**2) + 19.8*(x2-1)*(x4-1)

    # Expose function to R
    Colville = ri.rternalize(Colville)

    # Initial point
    init_point = FloatVector((3, 3, 3, 3))

    # Optimization function
    res = stats.optim(init_point, Colville, method="CG")
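Before handing a Python function to R's optimizer, it can help to sanity-check it in plain Python. The sketch below assumes the standard Colville definition, whose known global minimum is 0 at (1, 1, 1, 1). The helper numerical_gradient is hypothetical and is not part of rpy2.

```python
def colville(x):
    """Four-dimensional Colville function, f: R^4 -> R."""
    x1, x2, x3, x4 = x[0], x[1], x[2], x[3]
    return (100 * (x1**2 - x2)**2 + (x1 - 1)**2 + (x3 - 1)**2
            + 90 * (x3**2 - x4)**2
            + 10.1 * ((x2 - 1)**2 + (x4 - 1)**2)
            + 19.8 * (x2 - 1) * (x4 - 1))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


minimum = (1.0, 1.0, 1.0, 1.0)
value_at_minimum = colville(minimum)
gradient_at_minimum = numerical_gradient(colville, minimum)
value_at_start = colville((3.0, 3.0, 3.0, 3.0))

print(value_at_minimum)
print(gradient_at_minimum)
print(value_at_start)
```

If the value at (1, 1, 1, 1) is not zero or the gradient there is far from zero, the function has likely been mistyped, for example by losing the indices on x.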