R Loop over unique values in a dataframe column to create another one based on conditions

My dataset consists of scores and total respondents for questions asked in a survey, over a number of fiscal years (FY13, FY14 & FY15) and in different regions.

My objective is to loop through the FY column and identify when each question was asked for each region, then store this information in a new column.

This is what a reproducible sample looks like -

testdf=data.frame(FY=c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15","FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
              Region=c(rep("AFRICA",5),rep("ASIA",5),rep("AMERICA",6)),
              QST=c(rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",3)),
              Very.Satisfied=runif(16,min = 0, max=1),
              Total.Very.Satisfied=floor(runif(16,min=10,max=120)),
              Satisfied=runif(16,min = 0, max=1),
              Total.Satisfied=floor(runif(16,min=10,max=120)),
              Dissatisfied=runif(16,min = 0, max=1),
              Total.Dissatisfied=floor(runif(16,min=10,max=120)),
              Very.Dissatisfied=runif(16,min = 0, max=1),
              Total.Very.Dissatisfied=floor(runif(16,min=10,max=120)))

I start with creating an ID column, by concatenating Region & QST

library(tidyr)
testdf = testdf %>%
unite(ID,c('Region','QST'),sep = "",remove = F)

My Objective

1) For each unique ID, identify whether the given question was asked -

a) Only in one year (either FY13, FY14 or FY15)

b) Over the Past Two Years (FY15 & FY14 only)

c) Over the Past Three Years (FY15 & FY14 & FY13)

d) On FY13 & FY15 Only

My Attempt

For this problem, I tried to create a for loop. For each unique ID, I first store the unique occurrences of each FY in which the question was asked in a vector v. Then, using an if conditional statement, I assign a comment to a newly created column called Tally based on these occurrences.

for (i in unique(testdf$ID))
{
v=unique(testdf$FY)

if(('FY15' %in% v) & ('FY14' %in% v)) {
testdf$Tally=='Asked Over The Past Two Years'
}
else if(('FY15' %in% v) & ('FY14' %in% v) & ('FY13' %in% v)) {
testdf$Tally=='Asked Over The Past Three Years'
}
else if(('FY13' %in% v) & ('FY15' %in% v)) {
testdf$Tally=='Question Asked in FY13 & FY15 Only'
}
else { testdf$Tally=='Question Asked Once Only'
}

}

The loop seems to run without throwing an error message, but it doesn't seem to create the new Tally column.

Any help with this will be greatly appreciated.
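As far as I can tell, three things keep the loop above from working. First, `==` compares values instead of assigning them, so nothing is ever written to Tally and no error is raised. Second, v is computed from the whole FY column rather than from the rows of the current ID. Third, the two-year check comes before the three-year check, so the three-year branch can never be reached. Below is a minimal sketch of one possible fix in base R. The helper name `tally_label` is made up for illustration, the sample keeps only the columns needed for the tally, and any combination not listed in the question (for example FY13 & FY14 only) falls through to the last label, which you may want to handle separately.

```r
# Stand-in for testdf with only the columns the tally needs
testdf <- data.frame(
  FY     = c("FY13", "FY14", "FY15", "FY14", "FY15", "FY13", "FY14", "FY15",
             "FY13", "FY15", "FY13", "FY14", "FY15", "FY13", "FY14", "FY15"),
  Region = c(rep("AFRICA", 5), rep("ASIA", 5), rep("AMERICA", 6)),
  QST    = c(rep("Q2", 3), rep("Q5", 2), rep("Q2", 3), rep("Q5", 2),
             rep("Q2", 3), rep("Q5", 3)),
  stringsAsFactors = FALSE
)
testdf$ID <- paste0(testdf$Region, testdf$QST)

# Hypothetical helper: turn the years seen for one ID into a label.
# The most specific condition (all three years) is checked first.
tally_label <- function(years) {
  v <- unique(years)
  if (all(c("FY13", "FY14", "FY15") %in% v)) {
    "Asked Over The Past Three Years"
  } else if (all(c("FY14", "FY15") %in% v)) {
    "Asked Over The Past Two Years"
  } else if (all(c("FY13", "FY15") %in% v)) {
    "Question Asked in FY13 & FY15 Only"
  } else {
    # Also reached for unlisted combinations such as FY13 & FY14
    "Question Asked Once Only"
  }
}

# Loop version: subset FY to the rows of the current ID and assign with <-
testdf$Tally <- NA_character_
for (i in unique(testdf$ID)) {
  rows <- testdf$ID == i
  testdf$Tally[rows] <- tally_label(testdf$FY[rows])
}

# Loop-free alternative: ave() applies the helper within each ID group
tally_vec <- ave(testdf$FY, testdf$ID,
                 FUN = function(y) rep(tally_label(y), length(y)))

print(unique(testdf[, c("ID", "Tally")]))
```

The loop and the `ave()` call should produce the same labels; the second form simply avoids the explicit loop.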

R Programming For Beginners - R Language Tutorial - R Tutorial For Beginners

What you’ll learn

  • You will learn how to navigate in the RStudio interface
  • You will learn how to make basic graphs
  • You will learn about the basic structure of R including packages
  • You will learn how to perform basic commands in the R programming language
  • You will also learn how to handle add on packages, how to use the R help tools and generally how to find your way in the R world.


From ‘R vs Python’ to ‘R and Python’

In this article, you'll learn to leverage the best of both ‘Python and R’ in a single project.

If you are into Data Science, the two programming languages that immediately come to mind are R and Python. However, instead of considering them as two options, more often than not, we end up comparing the two. R and Python are excellent tools in their own right but are very often perceived as rivals. If you type ‘R vs Python’ into your Google search bar, you instantly get a plethora of resources arguing for the supremacy of one over the other.

One of the reasons for such an outlook is that people have divided the Data Science field into camps based on their choice of programming language. There is an R camp and a Python camp, and history is a testimony to the fact that camps cannot live in harmony. Members of both camps fervently believe that their choice of language is superior to the other. So, in a way, the divergence lies not with the tools but with the people using those tools.

Why not use Both?

There are people in the Data Science community who are using both Python and R, but their percentage is small. On the other hand, there are a lot of people who are committed to only one programming language but wish they had access to some of the capabilities of the other. For instance, R users sometimes yearn for the object-oriented capacities that are native to Python and, similarly, some Python users long for the wide range of statistical distributions that are available within R.

The figure above shows the results of the survey conducted by RedMonk in the third quarter of 2018. These results are based on the popularity of the languages on Stack Overflow as well as on GitHub, and clearly show that both R and Python are rated quite high. Therefore, there is no inherent reason why we cannot work with both of them on the same project. Our ultimate goal should be to do better analytics and derive better insights, and the choice of a programming language should not be a hindrance in achieving that.

Overview of R and Python

Let’s have a look at the various aspects of these languages and what’s good and not so good about them.

Python

Since its release in 1991, Python has been extremely popular and is widely used in data processing. Some of the reasons for its wide popularity are:

  • Object-oriented language
  • General Purpose
  • Has a lot of extensions and incredible community support
  • Simple and easy to understand and learn
  • Packages like pandas, numpy and scikit-learn make Python an excellent choice for machine learning activities.

However, Python’s collection of specialized packages for statistical computing is not as extensive as R’s.

R

R’s first release came in 1995 and since then it has gone on to become one of the most used tools for data science in the industry.

Performance-wise, R is not the fastest language and can sometimes be a memory glutton when dealing with large datasets.

Leveraging the best of Both Worlds

Could we utilize the statistical prowess of R along with the programming capabilities of Python? Well, when we can easily embed SQL code within either an R or a Python script, why not blend R and Python together?

There are basically two approaches by which we can use both Python and R side by side in a single project.

R within Python

PypeR provides a simple way to access R from Python through pipes. PypeR is also available on the Python Package Index, which makes installation more convenient. PypeR is especially useful when there is no need for frequent interactive data transfers between Python and R. By running R through a pipe, the Python program gains flexibility in sub-process controls, memory control, and portability across popular operating system platforms, including Windows, GNU Linux and Mac OS.

pyRserve uses Rserve as an RPC connection gateway. Through such a connection, variables can be set in R from Python, and also R-functions can be called remotely. R objects are exposed as instances of Python-implemented classes, with R functions as bound methods to those objects in a number of cases.

rpy2 runs an embedded R in a Python process. It creates a framework that can translate Python objects into R objects, pass them into R functions, and convert R output back into Python objects. Of the three, rpy2 is used most often, since it is the one being actively developed.

One advantage of using R within Python is that we would be able to use R’s awesome packages like ggplot2, tidyr, dplyr et al. easily in Python. An example of using ggplot2 from Python is given in the rpy2 graphics documentation:

https://rpy2.github.io/doc/latest/html/graphics.html#geometry

Python with R

We can run Python scripts in R by using one of the alternatives below:

rJython implements an interface to Python via Jython. It is intended for other packages to be able to embed Python code along with R.

rPython is another package that allows R to call Python. It makes it possible to run Python code, make function calls, assign and retrieve variables, etc. from R.

SnakeCharmR is a modern overhauled version of rPython. It is a fork from ‘rPython’ which uses ‘jsonlite’ and has a lot of improvements over rPython.

PythonInR makes accessing Python from within R very easy by providing functions to interact with Python from within R.

The reticulate package provides a comprehensive set of tools for interoperability between Python and R. Out of all the above alternatives, this one is the most widely used, more so because it is being actively developed by RStudio. Reticulate embeds a Python session within the R session, enabling seamless, high-performance interoperability. The package enables you to reticulate Python code into R, creating a new breed of project that weaves together the two languages.

The reticulate package provides the following facilities:

  • Calling Python from R in a variety of ways, including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session.
  • Translation between R and Python objects (for example, between R and pandas data frames, or between R matrices and NumPy arrays).
  • Flexible binding to different versions of Python, including virtual environments and Conda environments.
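To make the idea of embedding a Python session in R more concrete, here is a small sketch using reticulate. It assumes the reticulate package and a Python installation are present, which may not be the case on every machine, so the Python part is guarded and skipped otherwise. The variable names (`have_python`, `total`, `greeting`) are chosen only for illustration.

```r
# Sketch only: the Python part runs only if reticulate and a Python
# installation are actually available on this machine.
have_python <- requireNamespace("reticulate", quietly = TRUE) &&
  reticulate::py_available(initialize = TRUE)

if (have_python) {
  library(reticulate)

  # Call a Python built-in on an R vector; reticulate converts the
  # argument to a Python object and the result back to an R value.
  builtins <- import_builtins()
  total <- builtins$sum(c(1, 2, 3))
  print(total)

  # Run a line of Python, then read the resulting variable back from R.
  py_run_string("greeting = 'hello from Python'")
  print(py$greeting)
} else {
  message("reticulate or Python not available; skipping the example")
}
```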

Conclusion

Both R and Python are quite robust languages and either one of them is actually sufficient to carry out data analysis tasks. However, there are definitely some high and low points for both of them, and if we could utilize the strengths of both, we could end up doing a much better job. Either way, having knowledge of both will make us more flexible, thereby increasing our chances of being able to work in different environments.

R Vs. Python ⚔️ - Difference Between R and Python ⚖

⚔️ The big question is: which language should someone interested in machine learning or large datasets learn, Python or R? ⚔️ In this article, we will answer this question by considering all aspects of both languages. ⚖

For a large number of people, data analysis is one of the most important parts of their jobs. The increased availability of data, more powerful computing, and the need for analytics-driven decisions in business have brought data science into the limelight. According to a report by IBM, in 2015 there were 2.35 million openings for data analytics jobs in the US. It is estimated that by 2020 the number will rise to 2.72 million. IBM likes to call it “The Quant Crunch”.

In the current era, programming languages like R and Python have been in much demand, especially in this quest for data science. Both were developed in the early 1990s. R was designed mainly for statistical analysis, while Python is a general-purpose language. Now the big question is: which one should someone interested in machine learning or large datasets learn, Python or R? In this article, we will answer this question by considering all aspects of both languages.

Introducing Python and R

Python and R are both open-source, state-of-the-art programming languages. Both are widely used for data science. Learning both of them would be an ideal solution. But since we are to make a comparison, let us look at each language separately based on its respective qualities.

Python

Python, which is also called the Swiss army knife of coding, is a general-purpose, high-level programming language which focuses on versatility and cleaner programming.

It is easy to use and makes replicability and accessibility easier than R. Python is widely used in fields such as Artificial Intelligence and game development.

R

R is a high-level programming language used by statisticians and data miners for developing statistical software, graphical representations, and data analysis. It is supported by the R Foundation for Statistical Computing. R has one of the richest ecosystems, with around 12000 packages in its open-source repository for performing data analysis.

History

Python

Python is not named after the snake, but rather after the British TV show Monty Python’s Flying Circus. Influenced by Modula-3 and a successor of the ABC programming language, Python’s implementation was started in 1989 by Guido van Rossum.

It was initially released in 1991 as Python 0.9.0. Python 2.0 and Python 3.0 were released in 2000 and 2008 respectively (at the time of writing, the latest version of Python is 3.7.3).

R

Ross Ihaka and Robert Gentleman were the developers of R, which is an implementation of the S programming language created by John Chambers in 1976. Ihaka and Gentleman developed it while working together in New Zealand.

When R was first announced in 1993, many joined the project to make improvements. It was released as open-source software in 1995. The first stable version, R 1.0.0, was released to the public in the year 2000.

Features

R

R is free, which sets it apart from many statistical packages that are not.

It offers a wide range of packages which are used in various fields, such as statistical computing, genomics, machine learning, finance, and medicine.

Let us list some key features of R -

  • A lot of Techniques - It is a well-developed programming language which encompasses a wide range of techniques such as linear and non-linear modelling, clustering, classification, etc.
  • Matrix and vector computations - R supports matrix arithmetic, and its data structures include lists, matrices, vectors, and arrays.
  • Compatibility - It interfaces with other programming languages like C, C++ or Java and allows communication with statistical packages (SAS and SPSS).
  • Large Community - R has an active community that influences its development, and R runs on almost any operating system, including Windows and Linux.
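The data structures and matrix arithmetic mentioned in the list above can be sketched in a few lines of base R. The object names here (`v`, `m`, `prod_mv`, `arr`, `lst`) are made up for illustration.

```r
# A numeric vector
v <- c(1, 2, 3)

# A 2 x 3 matrix, filled column by column: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

# Matrix-vector product: row 1 gives 1*1 + 3*2 + 5*3 = 22,
# row 2 gives 2*1 + 4*2 + 6*3 = 28
prod_mv <- m %*% v
print(prod_mv)

# Transpose and element-wise arithmetic
print(t(m))
print(m * 2)

# A three-dimensional array and a list holding mixed types
arr <- array(1:24, dim = c(2, 3, 4))
lst <- list(name = "scores", values = v)
print(dim(arr))
print(names(lst))
```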

Python

Python is an interpreted high-level language and it is extremely versatile. It’s a name you can hear among people who love working with data.

According to the TIOBE Programming Community Index, Python is the 3rd most popular language of 2019 after Java and C.

Below are two images which show the difference in the code for displaying “Hello World” in Python and R.

[Figures: code for displaying “Hello World” in Python and in R]
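As a minimal sketch of the comparison, this is what the R side looks like. The Python equivalent is noted in a comment, since it happens to be the same single call.

```r
# R: print a string to the console
print("Hello World")

# The Python equivalent is the same single call: print("Hello World")
```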

Setup Instructions and Installation

Python

For Windows—

Step 1: Open any browser and go to https://www.python.org/

Step 2: Click on the Downloads option. You will see the latest version of Python (which is Python 3.7.3 and stable too).

Step 3: Click on the “Download Python 3.7.x” option.

Step 4: The file named “Python-3.7.x.exe” should start downloading into your standard download folder.

Step 5: After it is downloaded, go to the specified folder and run it. Proceed with the installation process. After a few minutes or so, you will have Python IDLE running on your computer.

For MacOS—

Step 1: Open any browser and go to https://www.python.org/

Step 2: Click on the Downloads option. You will see the latest version of Python (Python 3.7.3).

Step 3: Click on the “Download Python 3.7.x” option.

Step 4: The file named “Python-3.7.x.pkg” should start downloading into your standard download folder.

Step 5: After it is downloaded, go to the specified folder and run it. Proceed with the installation process. After a few minutes or so, you will have Python IDLE running on your computer.

R

For Windows—

Step 1: Open any internet browser and go to www.r-project.org.

Step 2: Click on the "download R" link in the middle of the page under "Getting Started".

Step 3: Select a CRAN location and click the corresponding link.

Step 4: Click on the "install R for the first time" link at the top of the page.

Step 5: Click on "Download R for Windows" and save the file on your computer. Run the .exe file and follow the installation instructions thereafter.

For MacOS—

Step 1: Open any internet browser and go to www.r-project.org.

Step 2: Click the "download R" link in the centre of the page under "Getting Started".

Step 3: Select a CRAN location (a mirror site) and click the corresponding link.

Step 4: Click on the "Download R for (Mac) OS X" link at the top of the page.

Step 5: Click on the file which contains the latest version of R under "Files".

Step 6: Save the .pkg file, double-click it to open, and follow the installation instructions thereafter.

Distributions

Both R and Python share a common free and open-source distribution: Anaconda. Its main uses include machine learning applications, large-scale data processing, predictive analysis, and data science.

The Anaconda distribution consists of around 1400 popular data science packages and includes Anaconda Navigator, a desktop Graphical User Interface (GUI) which allows users to launch applications and manage conda packages.

Which language to choose to learn out of these two?

If you have programming experience, which is better to learn, R or Python?

If you already have some programming experience, Python is the language for you. The syntax of Python is much closer to that of other languages than R’s syntax is.

R has a less standardized style of code, which might be a difficulty for people who are new to programming. Python, on the other hand, is more readable and focuses on developer productivity.

Which is better, R or Python, if you want to go into industry or academia?

R is a statistical programming language which is mainly used in the academic sector. But the real question is which one is industry-ready?

If we consider this, Python would be a better option. Organizations use Python extensively to develop their production systems.

However, R’s open-source libraries have been updated over time, so industry is also considering it for its work, and it is now widely used.

Which is better for data analysis?

This is the most common question, and one that has been on everyone’s mind for some time. But before settling on a conclusion, let me provide you with two examples.

Consider a situation where we need to cover election data. This is a relatively repetitive and predictable process where we need to collect data, run recurrent analyses, and make pie charts and other charts based on that. In this case, Python will make the work easier.

Now, if we take text analysis, for example, where we need to break paragraphs into phrases and words and analyze patterns, it is better to make use of R.

In conclusion, we can say Python is suited to repeated jobs and data manipulation, whereas R is suited to heavy statistical projects and situations where we need to dive into one-time datasets.

What do you want to learn, “statistical learning” or “machine learning”?

Machine learning comes in the category of Artificial Intelligence, while statistical learning is a subfield of Statistics. Machine learning focuses on the development of real-world applications and predictive models, while statistical learning mainly emphasizes precision and uncertainty.

Since R was developed by statisticians, people who have a background in statistics would find R easier to work with.

Python, on the other hand, is a better choice for those in the data department where they need to perform analysis and also for those in the machine learning sector, especially because of its flexibility.

Which language to learn if you want to do a lot of web development and software engineering?

R can be your choice if you want to build data-driven web applications, though it is not a substitute for JavaScript or CSS. R provides the Shiny library, with which websites powered by R can be developed.

For software engineering, Python is the one. For an engineering environment, Python is better than R across a larger spectrum of tasks. However, you might need to make use of a lower-level language like C++ or Java for really efficient coding.

Which language helps to create beautiful and interactive data visualizations, R or Python?

R is always a better option for continuous prototyping and handling datasets. Data visualizations can be created in R with packages like ggplot2, htmlwidgets, and leaflet. Python has made some advances with Matplotlib, but it still lags behind R in this area.

What are the libraries R and Python offer?

For data collection

Python

Python can work with data in many formats. It can read CSV (comma-separated values) and JSON (JavaScript Object Notation) documents sourced from the web. SQL tables can also be queried from within the code.

Python has a library called requests which reduces an HTTP request to a line of code, making it easy to fetch data from websites. There are also libraries for organizing data and making an in-depth analysis.

R

R is not as efficient as Python at collecting information from websites. However, packages like rvest and magrittr can be used for web scraping, cleaning and breaking down information. You can also import data from CSV, Excel and text files into R.
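As a small sketch of importing data with base R, the snippet below writes a tiny CSV file to a temporary location and reads it back. The file contents and the names `csv_path`, `survey` and `raw_lines` are invented for the example. Reading Excel files typically needs an add-on package such as readxl, which is not used here.

```r
# Create a small CSV file in a temporary location (made-up sample data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("region,score", "AFRICA,0.42", "ASIA,0.77"), csv_path)

# Read the CSV into a data frame
survey <- read.csv(csv_path, stringsAsFactors = FALSE)
str(survey)

# The same file can also be read as plain text, one element per line
raw_lines <- readLines(csv_path)
print(raw_lines)

# Clean up the temporary file
unlink(csv_path)
```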

For data exploration

Python

Pandas is the data analysis library of Python. It can work easily with large amounts of data. It allows the user to filter, arrange and display the data in minimal time.

While working on projects, Pandas allows the construction and reshaping of data frames. Invalid values like NaN (not a number) can be replaced with a value (such as 0), which eases numerical analysis. You can scan for and clean illogical data.

R

Since R was made by statisticians to perform statistical and numerical analysis, data exploration is a natural strength of R. You can work with probability distributions, perform statistical tests and build standard machine learning models.

Optimization techniques, statistical processing, random number generation, signal processing, and machine learning are some basic functionalities of R.
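The capabilities described above can be illustrated with a short base R sketch: drawing random numbers, evaluating a probability distribution, running a statistical test, and fitting a simple model. The simulated data and the variable names are assumptions made purely for the example.

```r
set.seed(42)  # make the simulated data reproducible

# Random number generation: 100 draws from a normal distribution
x <- rnorm(100, mean = 5, sd = 2)
y <- 3 + 2 * x + rnorm(100, sd = 0.5)

# Probability distribution: P(X < 3) for X ~ Normal(mean = 5, sd = 2)
p_below_3 <- pnorm(3, mean = 5, sd = 2)
print(p_below_3)

# Statistical test: is the mean of x plausibly equal to 5?
tt <- t.test(x, mu = 5)
print(tt$p.value)

# A standard model: linear regression of y on x
fit <- lm(y ~ x)
print(coef(fit))
```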

For data modelling

Python

Ask a question and Python is there to help you out. Numerical modelling analysis? There’s NumPy.

Scientific computation and calculation? SciPy is there.

And for machine learning algorithms? There is scikit-learn. By using scikit-learn you can use a wide range of machine learning algorithms in Python without worrying about the inside complexities.

R

If you want to perform some particular modelling analysis, you sometimes have to go outside of R’s basic library functions.

Packages for Poisson distributions and for mixtures of probability laws are examples of outside packages used for specific data modelling analyses.

For data visualization

Python

For data visualization, we can use Python’s distribution, Anaconda.

Matplotlib is used to create graphs and charts from the data stored in Python, and for more advanced charts and better design, Plot.ly is used.

You might have seen online tutorials on how to learn Python. Many of them are created with the nbconvert tool, with which you can convert notebooks containing your snippets of code to HTML documents.

R

R contains packages for scientific visualization techniques which allow the results to be displayed graphically.

You can create elementary graphs and plots from data matrices and save them in .jpg or PDF formats. This can be done with the basic R libraries.

However, for advanced plots or graphs, you can use the ggplot2 package.
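As a rough sketch of the base R workflow described above, the snippet below plots a small data matrix and saves the result as a PDF. The data, the plot labels, and the names `out_file` and `m` are invented for illustration, and the file is written to a temporary location.

```r
# Temporary output file for the plot
out_file <- tempfile(fileext = ".pdf")

# A small two-column data matrix: x values and their squares
m <- cbind(x = 1:10, y = (1:10)^2)

# Open a PDF graphics device, draw the plot, then close the device
pdf(out_file, width = 6, height = 4)
plot(m, type = "b", main = "Squares of 1 to 10",
     xlab = "x", ylab = "x squared")
invisible(dev.off())

# Confirm that the file was written
print(file.exists(out_file))
```

Other base devices such as `jpeg()` or `png()` follow the same open, plot, close pattern.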

[Figure: Topographic hill shading using Matplotlib]

[Figure: Plot.ly correlation points of the Iris dataset]

The popularity of Python vs R

Both R and Python have become stars in the field of Data Science and Machine Learning.

R was at its most popular in 2015 and 2016, but in recent years Python has become more popular.

Python’s popularity comes from its support for multiple programming paradigms, its readability, its vast range of libraries, and its community support. While other programming languages like C, C++ or Java take around 5 to 7 lines of code to print “hello world”, Python saves time and effort because a single line of code is enough.
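To make the "single line" point concrete, here is a minimal sketch of the complete program. The comment is only annotation; the one statement is all Python needs.

```python
# The whole program: no class declaration or main() function is required.
print("hello world")
```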

Some of the sectors where both R and Python have gained popularity in recent years are –

In the above chart, we can see that other sectors are gradually adopting R and Python as well. Organizations like financial firms, retail organizations, banks and healthcare institutions have started offering job roles in R.

The Growing Rate of R and Python

Python

Python is considered to be the fastest growing programming language in the world. According to the Stack Overflow developer survey, in 2013, Python overtook R as the most popular language for data science.

According to Harvard Business Review, data scientist is the “sexiest job of the 21st century”. Python is widely used in real-world projects, and basic data science operations are easier in Python than in R. Because of its versatility and ease of coding, developers tend to use it more.

R

In the year 2016, R was used by 55% of data scientists while Python stood at 51%. In the following 2 years, Python usage increased by 33% and R usage fell by 25%.

So the question is: will R continue its downward slope? I guess the numbers will keep falling, but R will not disappear in practice.

R is the statistician’s language. People with a background in mathematics and statistics will not neglect R when creating a data science model, because for them R is easier and simpler than Python.

So how will we choose?

Since the popularity of R is declining, using R as a complement to Python is a good combination. This way R will always have a role to play in a data scientist’s toolbox.

Below is the percentage of Monthly Active Users (MAU) on GitHub for Python’s Jupyter Notebook, taken from a survey by Ben Frederickson, which shows a sharp increase after 2015.

“Ranking programming languages by Github users” – Ben Frederickson

Career Opportunities

Python

According to IEEE, which ranks programming languages by popularity, Python is currently considered to be the most popular language for data scientists worldwide.

Some of the regions in which Python is widely used are mentioned below:

Some of the organizations which use Python language—

Some of the Python job profiles with their basic salary package—

According to Payscale.com, below is a graph depicting average Python salary for India and US.

You can also take up the Python training to learn the basics of the world’s fastest growing and most popular programming language used by data scientists, software engineers, machine learning engineers. This training will be a great introduction to both fundamental programming concepts and the programming language and will also enhance your skill sets.

R

The graph below highlights the jobs of R programmers from the year 2009 – 2017.

Source: Stackoverflow

Some of the organizations which use R as a tool for analytics—

R job roles with their basic salary package—

PROS and CONS

Python

Pros —

**1) All-in-one language** - Python is an interpreted, interactive, modular, dynamic, portable, object-oriented, high-level programming language which is accessible and has a gentle learning curve.

**2) Plenty of Support Libraries** - Python boasts a large number of standard libraries for string operations, operating system interfaces, data manipulation, data collection, machine learning, Internet and so on.

Scikit-learn and Pandas are two such tools, for machine learning and for high-performance data structures and data analysis respectively. If you want to call R functions from Python, you have the rpy2 package.
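As a small illustration of the standard-library support described above, the following sketch uses only modules that ship with Python. The survey scores, the label and the file name are made up for this example and are not taken from any real dataset.

```python
import json
import os.path
import statistics

# Hypothetical survey scores, invented purely for this illustration.
scores = {"Q2": [0.4, 0.6, 0.8], "Q5": [0.1, 0.3]}

# String operations: build a column-style label with str methods.
label = "very satisfied".title().replace(" ", ".")

# Operating system interface: split a (made-up) file name portably.
stem, extension = os.path.splitext("survey_results.json")

# Data manipulation: average the scores of each question.
averages = {
    question: round(statistics.mean(values), 2)
    for question, values in scores.items()
}

# Data exchange: serialize the result as JSON text.
report = json.dumps({"label": label, "averages": averages}, sort_keys=True)
print(report)
```

None of this needs an extra install; third-party packages such as Pandas or scikit-learn build on the same foundation for larger workloads.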

**3) Integration** - Python has better integration features than R. It can be used to develop web services and works with Enterprise Application Integration.

Though developers prefer lower-level languages like C, C++ or Java for some tasks, Python can be integrated with them, which boosts its control capabilities.

**4) Productivity** - Python makes programmers highly productive. Its integration features, frameworks and increased control capabilities speed up the development process.

Cons—

**1) Difficulty in going to other languages** - If you work with Python for a long time, I would warn you not to fall blindly in love with it. Python does not require you to declare variables and their types, so having to declare them in other languages can feel like a hurdle afterwards.

**2) Weak computation in mobile** - Though Python has made its name on most desktop and server platforms, it is still rarely used for mobile development.

**3) Speed reduction** - Because Python runs code through an interpreter rather than compiling it ahead of time to machine code, it generally executes more slowly than compiled languages.

**4) Run-time errors** - Because Python is dynamically typed, many errors show up only at run time, which means more testing time. Design restrictions are another common problem.
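The run-time error point can be sketched with a short example. The `average` helper below is hypothetical and exists only for illustration: nothing stops the second call before the program runs, so the type mistake is discovered only when that line executes.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Works as intended with numeric input.
print(average([3, 4, 5]))

# No check rejects this call ahead of time; the mistake only surfaces
# when the line actually executes and sum() meets a string.
caught = None
try:
    average(["3", "4", "5"])
except TypeError as error:
    caught = str(error)
    print("Run-time failure:", caught)
```

This is why code paths that are rarely exercised need tests in Python: a compiler for a statically typed language would typically reject the second call before the program ever ran.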

R

Pros—

**1) Data and visualization** - R would be your choice if data analytics and data visualization are priorities for your project.

**2) Rich in libraries and tools** - R has a rich ecosystem of statistical libraries which makes it a better tool for statistical computations.

Caret is a machine learning library which is capable of creating effective prediction models.

R contains advanced data analysis packages which can control the pre-modeling, modeling and post-modeling phases and can also perform particular tasks like data visualization and model validation.

**3) Good Explorations** - If your work is about statistical models and you are just in phase 1 of your exploratory project, consider R to be that friend of yours who explains concepts simply and briefly just before the exam.

Cons—

**1) Steep learning curve** - R is definitely a challenging programming language and few developers work with it for building projects.

**2) Inconsistency** - Development in R is slowed down by the inconsistency of the language, because most algorithms in R are provided by third parties.

Every time you have a new algorithm in hand, you need to learn a new way to model with it.

Conclusion and Summary

Here’s a brief summary of all the important aspects of comparison between the two most important languages for Data Science and Machine Learning - Python and R.

After weighing the whole picture, we can conclude that the decision whether R is better than Python is up to us. It is the users’ requirements that make one programming language, R or Python, more popular than the other. The choice of language for data science, machine learning, predictive models or data manipulation should be based on the features each one offers. It is also possible that a third language will emerge that combines both R and Python. Till then, let us merge our creativity with the machine and develop models that contribute to the betterment of the human race.