5 Data Structures to Master in R if you want to be a Data Scientist: Learn how to master the basic data types, and advanced data structures, such as factors, lists, and data frames.
To become an R data scientist, you will need to master the basics of this widely used open-source language, including factors, lists, and data frames. After mastering these data structures, you’ll be ready to undertake your very first data analysis!
The five data structures are vectors, matrices, factors, data frames, and lists.
Read until the end for a cheat-sheet of the data types.
Before we start with the data structures, it is important to take a look at the basic data types that make up some of the elements in these data structures.
The key types are numerics (decimal values like 4.5, or integer values), logicals (TRUE or FALSE), and characters (text values in quotes, like "medium"; note that these are case sensitive).
    # Here are some variables being assigned these basic data types
    my_numeric <- 9.5
    my_logical <- TRUE
    my_character <- "Linda"
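To check which basic type a variable holds, base R provides `class()`. The sketch below reuses the variable names from the snippet above; `my_integer` and the `L` suffix example are added here for illustration and are not part of the original.

```r
# Variables holding the three basic data types
my_numeric <- 9.5
my_logical <- TRUE
my_character <- "Linda"

# class() reports the data type of a value
class(my_numeric)    # "numeric"
class(my_logical)    # "logical"
class(my_character)  # "character"

# Assumption for illustration: the L suffix stores a whole number as an integer
my_integer <- 4L
class(my_integer)    # "integer"

# Character values are case sensitive
"medium" == "Medium" # FALSE
```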
If you want to get more in-depth with the basics in R, check out this article I wrote which teaches you how to calculate, assign variables, and work with the basic data types. It includes practice problems to work on too!
Vectors are one-dimensional arrays that can store any of the basic data types, including numerics, logicals, and characters.
To create a vector, use the combine function [c()](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/c) with the elements separated by commas between the parentheses.
    # General form: my_vector <- c(elem1, elem2, elem3)
    numeric_vector <- c(1, 2, 3)
    character_vector <- c("a", "b", "c")
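A minimal sketch of how `c()` behaves when its elements have different types: a vector holds only one basic type, so base R converts everything to the most flexible type present. The name `mixed_vector` is hypothetical and used only for this illustration.

```r
numeric_vector <- c(1, 2, 3)
character_vector <- c("a", "b", "c")

# Mixing a numeric, a logical, and a character: everything becomes character
mixed_vector <- c(1, TRUE, "a")

class(numeric_vector)  # "numeric"
class(mixed_vector)    # "character"
mixed_vector           # "1" "TRUE" "a"
length(mixed_vector)   # 3
```

If the values need to keep different types, a list is the structure to reach for instead of a vector.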
Naming a vector gives a name to each of the vector's elements. It can be done using the names() function:
    # Without names, it's not clear what data is being used
    some_vector <- c("Linda", "Data Scientist")
    names(some_vector) <- c("Name", "Profession")

    # Output
    > some_vector
                Name       Profession
             "Linda" "Data Scientist"
If we want to select a single element from a vector, we simply put in the index of the element we want to select between square brackets.
    my_vector[i]
    # my_vector is the vector we are selecting from
    # i is the index of the element

    # To select the first element
    # Note that the first element has index 1, not 0 (as in many other programming languages)
    my_vector[1]
To select multiple elements from a vector, indicate which elements should be selected using a vector within the square brackets.
    # For example, to select the first and fifth elements, use c(1, 5)
    my_vector[c(1, 5)]

    # To select a range, we can abbreviate c(2, 3, 4) to 2:4
    my_vector[2:4]
We can also use the names of the elements instead of their numeric position.
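As a short sketch of selection by name, the example below rebuilds the named vector from earlier and passes names, in quotes, inside the square brackets. The use of `unname()` at the end is an added illustration of how to get the bare value.

```r
some_vector <- c("Linda", "Data Scientist")
names(some_vector) <- c("Name", "Profession")

# Select one element by its name
some_vector["Profession"]

# Select several elements by passing a vector of names
some_vector[c("Name", "Profession")]

# unname() drops the label and leaves just the value
unname(some_vector["Name"])  # "Linda"
```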
If you want to get more in-depth with vectors, check out this article I wrote on how to create, name, select, and compare vectors. By the end of it, you’ll learn how to analyze gaming results using vectors!
A matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. A two-dimensional matrix is one that works only with rows and columns.
The [**matrix()**](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/matrix) function creates a matrix. There are three important arguments to this function:
**vector**: This is the collection of elements that will be arranged into the matrix rows and columns (in matrix() itself, this argument is named data). This argument is optional; if we leave it blank, the matrix is filled with NA values, which can be replaced later. We can use vectors we already created here.
**byrow**: This indicates whether the matrix is filled row-wise (byrow=TRUE) or column-wise (byrow=FALSE). By default it is set to FALSE.
**nrow**: This indicates the desired number of rows.
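The three arguments can be seen together in a small sketch. The variable names `row_wise`, `col_wise`, and `empty` are hypothetical; in base R the arguments themselves are spelled `data` (passed first, unnamed here), `byrow`, and `nrow`.

```r
# Fill a 3-row matrix with the numbers 1 to 9, row by row
row_wise <- matrix(1:9, byrow = TRUE, nrow = 3)
row_wise
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

# The same data filled column by column (the default, byrow = FALSE)
col_wise <- matrix(1:9, nrow = 3)
col_wise[1, ]  # 1 4 7

# Leaving the data out gives a matrix of NA values that can be filled later
empty <- matrix(nrow = 2, ncol = 2)
empty[1, 1] <- 10
```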