People often get frustrated when they use data analysis software that is not particularly suitable for a given task, yet they continue using it because it is familiar. One example is using MS Excel for data that consists mainly of text. Using Python or R would make the job much easier and allow people to work more efficiently. Just as often, however, people shy away from learning Python or R because they believe that coding is difficult. A common misconception is that you need to be good at maths in order to be a good programmer (check this out for more misconceptions). If you are one of those people, let me assure you that this is not true. In this article, I want to give you a set of tools to get started with Data Analysis and Data Science in Python or R. I taught a beginner’s Data Science summer course at University College London in 2019 and will share all my tips and resources here (for free).
As with anything new you learn, you need to start with the basics. In this case, that means learning basic syntax. I would suggest spending at least a weekend getting a feel for the language you want to learn by doing some simple arithmetic, familiarising yourself with simple data structures (lists, sets, dictionaries, etc.) and writing some functions, if-else statements, and for-loops. There are plenty of resources out there to get you started. I suggest you check out sites like Coursera, Udemy, edX, and Udacity and find a course that fits your learning style (I personally like the syllabus of IBM’s course on Coursera, “Python for Data Science and AI”). HackerRank and LeetCode are also great websites for practising your coding skills, and they do not require you to download anything onto your computer: you can practise in the browser (although I prefer doing it offline; more on that in the next section). Check out HackerRank’s Python challenges and DataCamp’s Introduction to R.
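To give a feel for what that first weekend might cover, here is a minimal Python sketch that touches arithmetic, a list, a set, a dictionary, a function, an if-else statement and a for-loop. All the names and values in it (`scores`, `label_score`, `pass_mark` and so on) are made up for illustration and do not come from any particular course.

```python
# Simple arithmetic
total = 7 + 3    # addition
ratio = 7 / 2    # true division gives a float
whole = 7 // 2   # floor division drops the remainder

# Simple data structures (illustrative values)
scores = [82, 45, 67, 91]                # list: ordered, allows duplicates
unique_tags = {"python", "r", "python"}  # set: duplicates collapse
ages = {"Ada": 36, "Alan": 41}           # dictionary: key -> value


def label_score(score, pass_mark=50):
    """Return 'pass' or 'fail' for a single score (hypothetical helper)."""
    if score >= pass_mark:
        return "pass"
    else:
        return "fail"


# A for-loop that applies the function to every item in the list
labels = []
for score in scores:
    labels.append(label_score(score))

print(total, ratio, whole)
print(labels)
print(len(unique_tags), ages["Ada"])
```

Once this feels comfortable, changing the values and predicting the output before running it is a simple way to practise.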
For Python as well as R, I suggest you download Anaconda. Anaconda is a free distribution of Python and R for scientific computing that aims to simplify package management (if you don’t know what packages are, don’t worry; more on that later). Anaconda comes with a tool called Jupyter Notebook, an open-source web application that allows you to create and share documents that contain live code, equations, visualisations and narrative text. It’s my favourite tool for data analysis. If you want to get a feel for it, check out Google Colab, a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.
As I have told my students countless times over the past few years: “You are not the first person to encounter that problem. Someone has already asked that question before, so google it!” If you get an error message you don’t understand, or you don’t know how to round a number to two decimal places in Python, Stack Overflow is your friend! I can guarantee you that, no matter what it is, someone has already answered your question on Stack Overflow. So don’t be afraid of googling something when you get stuck. Programmers google all the time. (And as I told my Chinese summer school students: Google being banned in your country is NO excuse not to search the web for answers.)
What I noticed while teaching the summer school course, however, is that many people don’t know how to formulate their problem and therefore struggle to find an answer on the internet. Knowing how to put your problem into words is a skill in itself and requires practice. Taking the rounding example from above: googling “how to round number” will not give you the answer you are looking for (try it). The top result when googling “how to round number python” will talk you through Python’s built-in round() function, which will require you to read through more text than necessary. Only if you google “how to round number python two decimal points” will you get a list of suggested questions on Stack Overflow. This may sound obvious, but it can prove difficult when dealing with questions and problems you have not dealt with before.
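For reference, the kind of answer that last search tends to lead to looks roughly like the sketch below. It shows two common options in Python: the built-in round() function, which returns a number, and string formatting, which returns text with exactly two decimal places. The variable names and values are only illustrative.

```python
value = 3.14159

# Option 1: round() returns a float rounded to two decimal places
rounded = round(value, 2)

# Option 2: an f-string returns text that always shows two decimal places
as_text = f"{value:.2f}"

# The difference shows up with a value like 2.0:
# round() keeps it as the float 2.0, formatting pads it to '2.00'
rounded_whole = round(2.0, 2)
text_whole = f"{2.0:.2f}"

print(rounded, as_text)
print(rounded_whole, text_whole)
```

Which one you want depends on whether you need a number for further calculations or a string for display.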