R vs Python: The Great Debate
When I was doing the Google Data Analytics certificate, I noticed that Google teaches R as its language of choice. I had expected Google, one of the biggest tech companies in the world, to teach Python, seeing as it is the most popular language in tech. It turns out there is an ongoing debate about which of the two languages is better for data analytics and data science.
In more ways than one, these two languages are pretty similar. Both are open source, free to use, and quite easy to get started with. Both do a good job across the data science process: data wrangling, cleaning, manipulation, visualization, and automation. Both also have a place in working with big data.
What is R?
R is a programming language built by statisticians for visualization and statistical analysis of data. It was created by Robert Gentleman and Ross Ihaka at the University of Auckland in New Zealand. Fun fact: it was named R because R is the first letter of both creators' first names.
It is available for free under the GNU General Public License and can be installed on Mac, Windows, and Linux. R is based on S, a language developed for people with a stronger background in statistics than programming. The biggest drawback of S was that using it meant buying S-PLUS, a commercial package. Ross and Robert wanted something open source, and thus R was born.
R has more than 10,000 libraries for exploring, analyzing, and visualizing data. Its statistical packages are powerful and can perform complex mathematical operations, and it is also useful for building statistical models. R is primarily used by statisticians, data analysts, and data miners.
What About Python?
Python is a multi-purpose language that can handle a wide variety of tasks. Think of it like Java or C++, but with a syntax that reads almost like plain English and is easier to learn. It is very popular in data science because of its rich set of libraries for math and statistics, some of which ship with the language itself.
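To give a feel for the statistics tools that come bundled with Python, here is a minimal sketch that uses only the standard library's `statistics` module. The scores are made-up sample data, and real-world data work usually leans on third-party packages on top of this.

```python
import statistics

# Made-up exam scores, used only for illustration
SCORES = [72, 85, 90, 66, 85, 78]


def describe(values):
    """Return a few summary statistics using only the standard library."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),  # average of the two middle values for an even count
        "mode": statistics.mode(values),      # most common value
        "stdev": statistics.stdev(values),    # sample standard deviation
    }


if __name__ == "__main__":
    for name, value in describe(SCORES).items():
        print(f"{name}: {value:.2f}")
```

No installation is needed; this runs on any recent Python 3.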
It is also a darling of the machine learning world, especially due to its scalability. Built by Guido van Rossum a little over three decades ago, Python remains one of the favorite programming languages across the board.
Python has numerous libraries, depending on the field in which one is working. scipy.stats, Statsmodels, and Pingouin are three popular statistical packages. Matplotlib and Seaborn are two widely used packages for data visualization.
So Which Should You Learn?
It depends on what you want to do. If the work is purely statistical, then R carries the day. Designed by statisticians who set out to build statistical software with computing capabilities, R does an exemplary job in statistical analysis.
On the other hand, Python was built for programmers, with a focus on writing fewer lines of code. And don't forget that Python is a multi-purpose language that can be used far beyond the world of data.
In my opinion, R has a steeper learning curve, especially for beginners, whereas Python is widely recommended as a first language. For instance, the popular tidyverse way of loading and manipulating a dataset in R typically involves three packages (readr, dplyr, and tidyr), while in Python pandas alone covers all three jobs.
RStudio, the environment most people use to write R, also has a fairly complex interface that takes a while to learn to navigate.
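As a rough illustration of the pandas point, here is a minimal sketch that handles the three jobs readr, dplyr, and tidyr typically split between them in R: loading a CSV, filtering and summarizing rows, and reshaping the table. It assumes pandas is installed; the sales data and the function names are made up for the example.

```python
import io

import pandas as pd

# Made-up sales data standing in for a CSV file on disk
CSV_TEXT = """region,product,sales
North,A,120
North,B,80
South,A,95
South,B,130
"""


def load(csv_text: str) -> pd.DataFrame:
    # Loading (roughly readr's job in R); read_csv also accepts a file path
    return pd.read_csv(io.StringIO(csv_text))


def total_sales_over(df: pd.DataFrame, threshold: int) -> pd.DataFrame:
    # Manipulation (roughly dplyr's filter + group_by + summarise)
    filtered = df[df["sales"] > threshold]
    return filtered.groupby("region", as_index=False)["sales"].sum()


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    # Reshaping (roughly tidyr's long-to-wide): one row per region, one column per product
    return df.pivot(index="region", columns="product", values="sales")


if __name__ == "__main__":
    sales = load(CSV_TEXT)
    print(total_sales_over(sales, 90))
    print(to_wide(sales))
```

To use a real dataset, pass its file path to `pd.read_csv` instead of the in-memory text.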
Verdict
Methinks they chose R because they figured Python resources are already widely available. Any self-taught data scientist worth their name will teach themselves Python one way or another. Python's popularity in data cannot be overstated. Still, the capabilities of R cannot be ignored.
My final take? Pick either language and get going!