To become an R data scientist, you will need to master the basics of this widely used open source language, including factors, lists, and data frames. Once you have mastered these data structures, you'll be ready to take on your first data analysis of your own!

The five data structures are:

  1. Vectors
  2. Matrices
  3. Factors
  4. Data Frames
  5. Lists

Read until the end for a cheat sheet of the data types.


The Basic Data Types

Before we get to the data structures, it is worth looking at the basic data types that many of their elements are built from.

The key types are:

  • numerics: decimal values like 4.5, or whole-number values like 4 (R stores both as decimals; write 4L if you need a true integer)
  • logicals: boolean values (TRUE or FALSE)
  • characters: text (or string) values like "medium". Note that these are case sensitive, so "medium" and "Medium" are different values.

# Here are some variables being assigned these basic data types
my_numeric <- 9.5
my_logical <- TRUE
my_character <- "Linda"
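To confirm which type a value has, you can ask R with the built-in class() function. The sketch below reuses the three variables above and adds a couple of extra checks for the integer suffix and case sensitivity:

```r
# The same three variables as above
my_numeric <- 9.5
my_logical <- TRUE
my_character <- "Linda"

# class() reports the type of each value
class(my_numeric)    # "numeric"
class(my_logical)    # "logical"
class(my_character)  # "character"

# A whole number like 4 is still numeric; the L suffix makes an integer
class(4)   # "numeric"
class(4L)  # "integer"

# Characters are case sensitive
"medium" == "Medium"  # FALSE
```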

If you want to go more in-depth with the basics of R, check out this article I wrote, which teaches you how to calculate, assign variables, and work with the basic data types. It includes practice problems too!


Vectors

Vectors are one-dimensional arrays that can store any one of the basic data types, including numerics, logicals, and characters. All elements of a single vector share the same type.

Creating a Vector

To create a vector, use the combine function [c()](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/c) and list the elements inside the parentheses, separated by commas.

# General form (elem1, elem2, elem3 stand for values of the same type):
# my_vector <- c(elem1, elem2, elem3)
numeric_vector <- c(1, 2, 3)
character_vector <- c("a", "b", "c")
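Because a vector holds only one type, it is worth seeing what happens when you mix types in c(). The sketch below is illustrative; logical_vector and mixed_vector are hypothetical names added for this example. Instead of raising an error, R silently converts every element to the most flexible type present:

```r
numeric_vector <- c(1, 2, 3)
character_vector <- c("a", "b", "c")
logical_vector <- c(TRUE, FALSE, TRUE)  # hypothetical extra example

length(numeric_vector)  # 3
class(logical_vector)   # "logical"

# Mixing types is not an error: c() coerces every element
# to the most flexible type present, here character
mixed_vector <- c(1, "a", TRUE)
identical(mixed_vector, c("1", "a", "TRUE"))  # TRUE
class(mixed_vector)  # "character"
```

This silent coercion is a common source of surprises, so it is good practice to check class() when a vector is built from mixed inputs.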

Naming a Vector

Naming a vector gives a name to each of its elements. You can do this with the names() function.

# Without names, it's not clear what each element represents
some_vector <- c("Linda", "Data Scientist")
names(some_vector) <- c("Name", "Profession")

# Output
> some_vector
     Name          Profession
   "Linda"   "Data Scientist"
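Besides assigning names() after the fact, R also lets you supply names directly inside c(). A minimal sketch comparing the two approaches (same_vector is a hypothetical name used only for the comparison):

```r
# Approach from the text: create the vector, then assign names()
some_vector <- c("Linda", "Data Scientist")
names(some_vector) <- c("Name", "Profession")

# Equivalent: give the names inside c() when creating the vector
same_vector <- c(Name = "Linda", Profession = "Data Scientist")

identical(some_vector, same_vector)  # TRUE
names(same_vector)  # returns c("Name", "Profession")
```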

Selecting from a Vector

To select a single element from a vector, put the index of that element between square brackets.

my_vector[i]
# my_vector is the vector we are selecting from
# i is the index of the element

# To select the first element 
# Note that the first element has index 1 not 0 (as in many other programming languages)
my_vector[1]
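Since my_vector is only a placeholder above, here is a runnable sketch with hypothetical values. It also shows two behaviours that often trip up people coming from 0-indexed languages: index 0 selects nothing, and an index past the end returns NA rather than an error.

```r
# Hypothetical example vector, used only for illustration
my_vector <- c(10, 20, 30, 40, 50)

my_vector[1]                  # 10: indexing starts at 1
my_vector[length(my_vector)]  # 50: the last element, without hard-coding 5

# Index 0 selects nothing and returns an empty numeric vector
my_vector[0]                  # numeric(0)

# An index beyond the end returns NA instead of raising an error
my_vector[6]                  # NA
```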

To select multiple elements from a vector, indicate which elements should be selected using a vector within the square brackets.

# For example, to select the first and fifth elements, use c(1, 5)
my_vector[c(1, 5)]

# To select a range, we can abbreviate c(2, 3, 4) to 2:4
my_vector[2:4]
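A runnable version of the multi-element selections above, again with hypothetical values for my_vector. The last line checks that the 2:4 shorthand selects exactly the same elements as the longer c(2, 3, 4):

```r
my_vector <- c(10, 20, 30, 40, 50)  # hypothetical values

my_vector[c(1, 5)]  # first and fifth elements: 10 and 50
my_vector[2:4]      # second through fourth elements: 20, 30 and 40

# The range shorthand selects the same elements as the explicit vector
identical(my_vector[2:4], my_vector[c(2, 3, 4)])  # TRUE
```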

We can also use the names of the elements instead of their numeric positions.

# weekday_vector is a vector whose elements are named after the days of the week
weekday_vector["Monday"]
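To make the name-based selection runnable, the sketch below builds a weekday_vector from hypothetical daily results (the numbers are made up for illustration). Several names can be selected at once, just like several indices, and selecting by name returns the same named element as selecting by position:

```r
# Hypothetical daily results, one value per weekday
weekday_vector <- c(140, -50, 20, -120, 240)
names(weekday_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

weekday_vector["Monday"]               # the element named Monday (140)
weekday_vector[c("Monday", "Friday")]  # several names at once

# Name-based and position-based selection return the same element
identical(weekday_vector["Monday"], weekday_vector[1])  # TRUE
```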

If you want to go more in-depth with vectors, check out this article I wrote on how to create, name, select, and compare vectors. By the end of it, you'll know how to analyze gaming results using vectors!


Matrices

A matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. In R, a matrix is always two-dimensional: it works only with rows and columns.

Creating a Matrix

The [**matrix()**](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/matrix) function creates a matrix. Three of its arguments are especially important:

  1. **data**: the vector of elements that will be arranged into the matrix rows and columns. This argument is optional; if we leave it out, the matrix is filled with NA values, which can be replaced later. We can pass in vectors we have already created here.
  2. **byrow**: whether the matrix is filled row-wise (byrow=TRUE) or column-wise (byrow=FALSE). By default it is set to FALSE.
  3. **nrow**: the desired number of rows.
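To see how byrow changes the layout, the sketch below fills two 2 x 3 matrices from the same vector 1:6 (m_row, m_col and empty are hypothetical names). It also uses ncol, a further matrix() argument not listed above, to create an empty matrix that is filled in afterwards:

```r
# byrow = TRUE fills the 2 x 3 matrix one row at a time
m_row <- matrix(1:6, byrow = TRUE, nrow = 2)
m_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# byrow = FALSE (the default) fills one column at a time
m_col <- matrix(1:6, byrow = FALSE, nrow = 2)
m_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# R infers 3 columns from 6 elements and nrow = 2
dim(m_row)  # 2 3

# Leaving out data gives a matrix of NA values that can be filled in later
empty <- matrix(nrow = 2, ncol = 2)
empty[1, 1] <- 99
```

Because nrow is given and the length of data is a multiple of it, R works out the number of columns on its own, which is why ncol is only needed when data is omitted or ambiguous.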


5 Data Structures to Master in R if you want to be a Data Scientist