To become an R data scientist, you will need to master the basics of this widely used open-source language, including factors, lists, and data frames. After mastering these data structures, you’ll be ready to undertake your very own first data analysis!
The five data structures are:
- Vectors
- Matrices
- Factors
- Data frames
- Lists
Read until the end for a cheat-sheet of the data types.
Before we start with the data structures, it is important to take a look at the basic data types that make up some of the elements in these data structures.
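The key types are:
- **Numerics**: decimal values like 4.5, or integer values like 4.
- **Logicals**: Boolean values (TRUE or FALSE).
- **Characters**: text (or string) values like "medium". Note that these are case sensitive.

# Here are some variables being assigned these basic data types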
my_numeric <- 9.5
my_logical <- TRUE
my_character <- "Linda"
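To check which type a variable holds, you can pass it to R's built-in class() function. The sketch below repeats the assignments above and adds a hypothetical my_integer variable for illustration. Note that a bare 4 is stored as a numeric (double); the L suffix is what asks R for an integer.

```r
# Re-create the variables from above, plus a hypothetical integer example
my_numeric <- 9.5
my_integer <- 4L        # the L suffix stores 4 as an integer rather than a double
my_logical <- TRUE
my_character <- "Linda"

# class() reports the data type of each variable
class(my_numeric)    # "numeric"
class(my_integer)    # "integer"
class(my_logical)    # "logical"
class(my_character)  # "character"

# Character values are case sensitive
"medium" == "Medium" # FALSE
```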
If you want to get more in-depth with the basics in R, check out this article I wrote, which teaches you how to calculate, assign variables, and work with the basic data types. It includes practice problems to work on too!
Vectors are one-dimensional arrays that can store any of the basic data types, including numerics, logicals, and characters. However, all the elements of a single vector must share the same type.
To create a vector, use the combine function [c()](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/c) with the elements separated by commas between the parentheses.
# General form: my_vector <- c(elem1, elem2, elem3)
my_vector <- c(10, 20, 30, 40, 50)
numeric_vector <- c(1, 2, 3)
character_vector <- c("a", "b", "c")
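Because a single vector stores only one type, c() quietly converts mixed inputs to a common type instead of raising an error. A minimal sketch, with hypothetical variable names mixed_vector and num_logical:

```r
# Mixing a number, a character, and a logical: everything becomes character
mixed_vector <- c(1, "a", TRUE)
mixed_vector          # "1" "a" "TRUE"
class(mixed_vector)   # "character"

# Logicals combined with numbers become numbers (TRUE -> 1, FALSE -> 0)
num_logical <- c(1, TRUE, FALSE)
num_logical           # 1 1 0
class(num_logical)    # "numeric"
```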
Naming a vector gives names to its elements. You can do this with the names() function.
# Without names, it's not clear what data is being used
some_vector <- c("Linda", "Data Scientist")
names(some_vector) <- c("Name", "Profession")
# Output
> some_vector
Name Profession
"Linda" "Data Scientist"
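As an alternative to calling names() after the fact, names can be supplied as name = value pairs directly inside c(). This sketch (v1 and v2 are hypothetical names) builds the same vector both ways and checks that the results match:

```r
# Approach from the text: create the vector first, then assign names()
v1 <- c("Linda", "Data Scientist")
names(v1) <- c("Name", "Profession")

# Alternative: supply name = value pairs inside c()
v2 <- c(Name = "Linda", Profession = "Data Scientist")

identical(v1, v2)  # TRUE
names(v2)          # "Name" "Profession"
```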
If we want to select a single element from a vector, we simply put in the index of the element we want to select between square brackets.
# General form: my_vector[i]
# my_vector is the vector we are selecting from
# i is the index of the element
# To select the first element
# Note that the first element has index 1, not 0 (as in many other programming languages)
my_vector[1]
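For a concrete run, the sketch below assumes a hypothetical five-element numeric vector whose values differ from their positions, so you can tell an index apart from the element it returns.

```r
# Hypothetical example vector with five elements
my_vector <- c(10, 20, 30, 40, 50)

my_vector[1]   # 10 (R indexes from 1)
my_vector[5]   # 50
my_vector[6]   # NA: indexing past the end returns NA rather than an error
```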
To select multiple elements from a vector, indicate which elements should be selected using a vector within the square brackets.
# For example, to select the first and fifth elements, use c(1, 5)
my_vector[c(1,5)]
# For example, to select a range, we can abbreviate c(2,3,4) to 2:4
my_vector[2:4]
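The same hypothetical five-element vector can illustrate selecting several elements at once. One detail worth knowing: 2:4 produces integers while c(2, 3, 4) produces doubles, but both select the same elements.

```r
my_vector <- c(10, 20, 30, 40, 50)  # same hypothetical example vector

my_vector[c(1, 5)]  # 10 50
my_vector[2:4]      # 20 30 40

# 2:4 holds the same values as c(2, 3, 4), so the two selections agree
all(2:4 == c(2, 3, 4))                           # TRUE
identical(my_vector[2:4], my_vector[c(2, 3, 4)]) # TRUE
```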
We can also use the names of the elements instead of their numeric position.
weekday_vector["Monday"]
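The line above assumes a weekday_vector whose elements have already been named. Since the text does not define one, the sketch below builds a hypothetical version with made-up values. Selecting by name returns the element together with its name, and a vector of names selects several elements at once.

```r
# Hypothetical weekday_vector; the values are made up for illustration
weekday_vector <- c(12, 7, 15, 9, 11)
names(weekday_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

weekday_vector["Monday"]               # named element: Monday = 12
weekday_vector[c("Monday", "Friday")]  # Monday = 12, Friday = 11
```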
If you want to get more in-depth with vectors, check out this article I wrote on how to create, name, select, and compare vectors. By the end of it, you’ll learn how to analyze gaming results using vectors!
A matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. Because it is organized only into rows and columns, a matrix is two-dimensional.
The [**matrix()**](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/matrix) function creates a matrix. There are three important arguments to this function:
- **data**: the vector of elements that will be arranged into the matrix rows and columns. This argument is optional; if you leave it out, the matrix is filled with NA values, which you can replace later. We can use vectors we already created here.
- **byrow**: whether the matrix is filled row-wise (byrow = TRUE) or column-wise (byrow = FALSE). By default it is set to FALSE.
- **nrow**: the desired number of rows.
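To see how byrow changes the layout, the sketch below fills two 2-row matrices from the same vector 1:6 (an arbitrary example input; the variable names are hypothetical). It also uses ncol, another matrix() argument that sets the number of columns, to create an empty 2 x 2 matrix.

```r
# Column-wise fill (the default, byrow = FALSE)
by_column <- matrix(1:6, byrow = FALSE, nrow = 2)
by_column
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Row-wise fill
by_row <- matrix(1:6, byrow = TRUE, nrow = 2)
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# Leaving out the data argument gives a matrix of NA values to fill in later
blank_matrix <- matrix(nrow = 2, ncol = 2)
blank_matrix[1, 1] <- 7  # fill in one cell; the other three stay NA
```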