In every profession, there are disagreements between the members of the community. Most of the time, the quarrels happen either because both options are equally viable or because there is very little evidence either way. And sometimes, people disagree just because they have different preferences and the choice is very subjective.

Having an opinion on these disagreements is a neat cheat to look and feel like part of the community. Sooner or later you will be in the middle of these discussions anyway. I just want to give you a small boost with this article.

I will list some of the disagreements I’ve heard over the years in data science circles and share my personal opinion on them.

Python vs R

You might have heard this discussion before you even started studying data science. It is everywhere on the internet, everyone has something to say about it, and some people have very strong opinions on it.

If you think caring this much about which language you use is silly, then I’m with you. But it might just be one of the first things your colleagues ask you when you start working.

I have to admit that it is fun to discuss with your colleagues and go back and forth on all the pros and cons but it might be doing more harm than good. I see many aspiring data scientists confused to the point of decision paralysis on this. They naturally want to make the right choice but all the discussion on the internet is not helping them.

If I had to choose a side (which I don’t, but I still will), I would prefer Python, mostly because I’m more comfortable with it, I have already used it a lot, and I can start getting results faster. That’s also the language I recommend when I get asked. I find it intuitive and easy to learn. Moreover, there is a great community behind Python that will provide you with answers and support when you get stuck. Not to mention all the amazing libraries that make your job much easier.

That said, I have encountered many people who strongly prefer R, and their reasons seem to be much the same.

But hey, let’s look on the bright side. If both languages have serious die-hard fans, it might just mean that both are very good languages!

Matplotlib vs ggplot2

This started out as an extension of the Python vs. R discussion. Matplotlib is the go-to visualization tool when using Python and ggplot2 is what people go for with R. People mostly criticize matplotlib for its inability to create beautiful diagrams. A friend of mine recently sent me a meme on this which I think clearly describes the whole discussion.

Source: news about Cristiano Ronaldo’s statue

Let me show you examples of plots generated by each library. Of course, I agree that ggplot2 plots look much nicer to the eye without putting in extra effort. But at the same time, how beautiful do you need your plots to look when you’re just analyzing away? Most of the time, as long as they show you what you need to see, it’s alright.

Source: Pythonspot and R-Graph-Gallery

I find that matplotlib plots are just as functional and adaptable as their ggplot2 counterparts. Some R experts might disagree with me on this one.
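To make the adaptability point a bit more concrete, here is a minimal sketch that draws the same line plot twice, once with matplotlib's default look and once with its built-in "ggplot" style sheet. The data and the helper name `make_plot` are made up for illustration and are not taken from any of the plots referenced in this article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up illustrative data (an assumption, not from the article)
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

def make_plot(style):
    """Draw the same line plot under the given matplotlib style sheet."""
    # style.context applies the style only inside this block
    with plt.style.context(style):
        fig, ax = plt.subplots(figsize=(4, 3))
        ax.plot(x, y, marker="o")
        ax.set_title(f"style: {style}")
        ax.set_xlabel("x")
        ax.set_ylabel("y")
    return fig, ax

default_fig, default_ax = make_plot("default")
ggplot_fig, ggplot_ax = make_plot("ggplot")  # built-in style that imitates ggplot2's look
```

Calling `ggplot_fig.savefig("ggplot_style.png")` would write the styled version to disk (the file name is just an example). Whether the result is as pleasing as a real ggplot2 plot is, of course, still a matter of taste.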

One secret weapon of matplotlib (or more generally Python) is the additional Seaborn library that can make pretty kick-ass graphs/plots. Your move, R.

