Python provides a basic and simple way to handle requirements where we have to switch back and forth between multiple languages.
Python is great. Really great.
But the field is becoming more language-agnostic with time, and a lot of great work is being done in many other languages.
While I still treat Python as a primary language, I never hesitate to move to a different language if it gets the work done.
The fact is that every language has evolved in such a way that it has built its stronghold in certain areas. For example, some people may find it easier to use R for regression, or to plot in R using ggplot (though I sincerely feel Python has come a long way in the visualization department).
Sometimes it is because a particular library is written in Java/C and someone hasn’t yet ported it to Python.
But is there a better way to handle this constant nuisance?
I like Python because I understand it well now. It is easier for me to do many things in Python than in R, Java, or Scala.
Why do I have to code my data preparation steps in R if I just want to use the Linear Regression package in R?
Or why do I have to learn to create charts in Java if I only want to use the StackNet package?
Now, Python and R do have wrappers for this. Want to use R in Python? There is rpy2. Want to use Python in R? There is reticulate.
These packages are all well and good, and they may solve some problems. ***But they don’t address the generic problem. Every time I want to switch from one language to another, I need to learn a whole new package/library. Not scalable at all.***
In this series of posts named Python Shorts, I will explain some simple constructs provided by Python, some essential tips, and some use cases I come across regularly in my Data Science work.
This post is about utilizing a particular package/library from another language, while not leaving the comfort of coding in our primary language.
I will start with a problem statement to explain this. Let’s say I had to create a graph using R, but I wanted to prepare my data in Python.
It is a generic problem any data scientist can potentially face: do something in one language, then move to another language to do something else.
Can I do this without leaving my Jupyter notebook? Or my Python script?
Here is how I could accomplish this. It might seem hacky to some but I love hacks.
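```python
import os

import pandas as pd

data = pd.read_csv("data.csv")
data = preprocess(data)  # preprocess() is your own data-preparation function
data.to_csv("data.csv", index=False)  # write the prepared data back for R to read
os.system("Rscript create_visualization.R")  # run the R script through the shell
```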
The ***os.system*** command provides me with a way to access my shell from Python. And the shell is a potent tool at your disposal: you can run programs written in almost any language from it.
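One thing to keep in mind: os.system returns the command's exit status (on Unix, encoded in the format used by os.wait), not its output, and a failing command does not raise an exception on its own. A minimal sketch of a wrapper that quotes each argument and raises on failure might look like this. The helper name run_in_shell is hypothetical, and it assumes a POSIX shell, since shlex.quote produces POSIX-style quoting.

```python
import os
import shlex
import sys


def run_in_shell(*args):
    """Run a command through the shell with os.system; raise if it fails.

    Hypothetical helper: each argument is quoted with shlex.quote (POSIX
    shell quoting), so paths containing spaces are passed through intact.
    """
    command = " ".join(shlex.quote(str(arg)) for arg in args)
    status = os.system(command)  # exit status, not the command's output
    if status != 0:
        raise RuntimeError(f"command failed with status {status}: {command}")
    return command


if __name__ == "__main__":
    # The current Python interpreter stands in for any other CLI (Rscript, java, ...)
    run_in_shell(sys.executable, "-c", "print('hello from a child process')")
```

The same call would work as run_in_shell("Rscript", "create_visualization.R"), as long as the program is on your PATH.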
The corresponding R script (create_visualization.R), which the Python code runs, would look something like this:
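```r
library(ggplot2)

data <- read.csv("data.csv")  # read.csv, since pandas wrote a comma-separated file with a header
p <- ggplot(...)              # build the plot from `data`
ggsave("plot.png", plot = p)
```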
I can then load the PNG file and show it in my Jupyter notebook using a simple Markdown hack in a Markdown cell:
```markdown
![alt text](plot.png "Title")
```
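The whole round trip boils down to a file-based handoff: Python writes an input file, a program in another language reads it and writes an output file, and Python picks the result back up. The sketch below shows that pattern end to end. Because R may not be installed everywhere, it uses the current Python interpreter (sys.executable) as a stand-in for Rscript; the child script, the file names, and the summing step are hypothetical placeholders for create_visualization.R and its plot.

```python
import csv
import os
import shlex
import sys
import tempfile
from pathlib import Path

# Hypothetical child program. In the article this role is played by
# create_visualization.R, run with Rscript.
CHILD_SCRIPT = """
import csv, sys
src, dst = sys.argv[1], sys.argv[2]
with open(src, newline="") as f:
    rows = list(csv.DictReader(f))
total = sum(float(row["value"]) for row in rows)
with open(dst, "w") as f:
    f.write(str(total))
"""


def round_trip(values, workdir):
    """Write values to CSV, let an external process summarise them, read the result."""
    workdir = Path(workdir)
    data_csv = workdir / "data.csv"
    result_txt = workdir / "result.txt"
    child = workdir / "child.py"
    child.write_text(CHILD_SCRIPT)

    # 1. Prepare the data in Python and hand it over via a file
    with open(data_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["value"])
        writer.writerows([v] for v in values)

    # 2. Run the other program through the shell (swap in "Rscript script.R" here)
    command = " ".join(
        shlex.quote(str(part)) for part in (sys.executable, child, data_csv, result_txt)
    )
    if os.system(command) != 0:
        raise RuntimeError(f"external step failed: {command}")

    # 3. Pick the output back up in Python
    return float(result_txt.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(round_trip([1, 2, 3.5], tmp))
```

Files keep the two sides loosely coupled: neither program needs to know anything about the other beyond the file format.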
For R users who don’t want to leave the comfort of R, R also has a ***system*** command, analogous to ***os.system***, that you can use to run Python code from R.
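Going the other way, the Python side just needs to behave like an ordinary command-line program. A minimal sketch, assuming R calls it with something like system("python summarise.py data.csv", intern = TRUE) and that a python executable is on the PATH (the script name summarise.py and the row-counting task are hypothetical): the script reads the input path from sys.argv and prints its result, which intern = TRUE captures on the R side.

```python
import csv
import sys


def summarise(path):
    """Count the data rows in a CSV file that has a header line (hypothetical task)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


if __name__ == "__main__":
    # R side (assumed): system("python summarise.py data.csv", intern = TRUE)
    # Whatever is printed to stdout becomes the character vector R gets back.
    if len(sys.argv) == 2:
        print(summarise(sys.argv[1]))
    else:
        print(f"usage: {sys.argv[0]} <csv-file>", file=sys.stderr)
```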
***os.system in Python gives us a way to do just about everything from Python, by letting us call shell commands.***
I have used this concept in plenty of my projects: to send e-mails using Mutt, to run a Java program, or just to fiddle around.
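As an illustration of the Mutt use case, a sketch that assembles a mutt command line and runs it with os.system could look like the following. It relies on mutt's -s (subject) and -a (attachment) options, with -- marking where the recipients start and the message body read from standard input. The helper name send_mail, the addresses, and the file names are hypothetical, and the dry_run flag returns the command without sending anything.

```python
import os
import shlex


def send_mail(to, subject, body_file, attachments=(), dry_run=False):
    """Build a mutt command line and, unless dry_run is set, run it through the shell.

    Hypothetical helper: assumes mutt is installed and already configured to send mail.
    """
    parts = ["mutt", "-s", subject]
    if attachments:
        # -a takes one or more files; "--" marks where the recipients start
        parts += ["-a", *attachments, "--"]
    parts.append(to)
    # The message body is fed to mutt on standard input
    command = " ".join(shlex.quote(p) for p in parts) + " < " + shlex.quote(body_file)
    if not dry_run:
        status = os.system(command)
        if status != 0:
            raise RuntimeError(f"mutt exited with status {status}")
    return command


if __name__ == "__main__":
    # dry_run=True only prints the command; nothing is sent
    print(send_mail("me@example.com", "Daily report", "body.txt", ["plot.png"], dry_run=True))
```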
It may seem hacky, but it works, and it is general enough that you don’t have to learn a new library every time you want to do something in another language and integrate it with Python.
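To make the "no new library" point concrete: the only per-language knowledge you need is which command runs a script. A minimal sketch maps file extensions to interpreters and runs any script through os.system. The mapping below (Rscript for .R, julia for .jl, bash for .sh, java for single-file .java sources on Java 11+, the current Python for .py) is an assumption about what is installed on your PATH, and the names INTERPRETERS, build_command, and run_script are hypothetical; extend the table for your own toolchain.

```python
import os
import shlex
import sys
from pathlib import Path

# Assumed interpreter for each script type; adjust to what is on your PATH.
INTERPRETERS = {
    ".R": ["Rscript"],
    ".jl": ["julia"],
    ".sh": ["bash"],
    ".java": ["java"],  # single-file source launch, Java 11+
    ".py": [sys.executable],
}


def build_command(script, *args):
    """Return the shell command that runs `script` with its interpreter."""
    suffix = Path(script).suffix
    if suffix not in INTERPRETERS:
        raise ValueError(f"no interpreter configured for {suffix!r} files")
    parts = [*INTERPRETERS[suffix], str(script), *map(str, args)]
    return " ".join(shlex.quote(part) for part in parts)


def run_script(script, *args):
    """Run a script in whatever language its extension implies; raise on failure."""
    command = build_command(script, *args)
    status = os.system(command)
    if status != 0:
        raise RuntimeError(f"{command} exited with status {status}")
```

For example, run_script("create_visualization.R") would issue the same Rscript call as the first snippet, while adding a new language is one more line in the table.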