After explaining how to unpivot columns of delimited data in Power Query and Python, today I’m expanding this mini-series by showing how the same data transformations can be achieved in R.
Once again I’ll make use of this social network usage sample data to demo the transformations.
The objective is to take the above initial data (loaded from a CSV file) and transform it to the following form:
As an extra, I will also show you how to visualize the frequency of the social networks in a bar chart, with Plotly.
If you wish to skip the explanations and jump directly to the code, feel free to visit my GitHub repository where I have all the code and sample data.
The main focus of this demo is splitting and unpivoting the delimited data.
We can see the “Used Social Networks” column can have multiple social networks in each row (maybe it was a multiple choice question in a survey), separated by semicolons (;). This isn’t a suitable format for data analysis, as we can’t count the frequency of each individual social network.
So, the logic for extracting the individual social networks and putting them on their own rows (unpivot) is as follows:
Split and unpivot data transformations
(Notice how the data in the “Respondent ID” and “Gender” columns is repeated on every new row, so that each social network stays associated with its respondent.)
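The split-and-unpivot logic described above can be sketched in base R roughly as follows. This is a minimal illustration rather than the code from the repository: the three sample rows are made up, the output column name “Social Network” is an assumption, and only the “Respondent ID”, “Gender” and “Used Social Networks” column names come from the description above.

```r
# Hypothetical sample rows; only the column names come from the article.
survey <- data.frame(
  `Respondent ID` = c(1, 2, 3),
  Gender = c("Female", "Male", "Female"),
  `Used Social Networks` = c("Facebook;Twitter", "LinkedIn", "Facebook;Instagram;Twitter"),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# Step 1 (split): turn each semicolon-delimited string into a character vector.
networks <- strsplit(survey[["Used Social Networks"]], ";", fixed = TRUE)

# Step 2 (unpivot): repeat each row index once per network listed on that row,
# so "Respondent ID" and "Gender" are duplicated alongside every network.
row_index <- rep(seq_len(nrow(survey)), lengths(networks))

unpivoted <- data.frame(
  `Respondent ID` = survey[["Respondent ID"]][row_index],
  Gender = survey[["Gender"]][row_index],
  `Social Network` = trimws(unlist(networks)),  # assumed output column name
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# With one network per row, counting frequencies becomes a simple tabulation.
network_counts <- table(unpivoted[["Social Network"]])

print(unpivoted)
print(network_counts)
```

With the data in this long form, `network_counts` holds one count per individual social network, which is the kind of summary a frequency bar chart would plot. The same reshaping is likely to be shorter with a dedicated package function such as `tidyr::separate_rows()`; base R is used here only to keep the sketch free of dependencies.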
A data scientist/analyst in the making needs to format and clean data before being able to perform any kind of exploratory data analysis.
This video on Data Manipulation in R will help you learn how to transform and summarize your data using different packages and functions. You will use the dplyr package to select, filter, arrange, and mutate data. You will use the tidyr library to create tidy data. You will look at functions such as gather, spread, separate, and unite. Let's begin!
Data science draws on advanced statistical and machine learning methods. As long as there is data to analyse, there will be a need to explore it.