SAS Vs R: What Is Difference Between R and SAS?

In this article, we compare SAS and R. SAS is losing its footing across industries due to the rise of Shiny, an R package that gives users bespoke interactivity on top of their R routines. We’ll discuss SAS vs R programming in the context of the pharmaceutical industry, but the discussion applies to any data science user looking to switch data analytics tooling.

Data science teams are searching for SAS alternatives that can better handle their technical needs while satisfying non-technical personnel with interactive data storytelling. In practice, this search usually comes down to Python vs R. And although there are drag-and-drop BI tools, these solutions do not satisfy custom development, machine learning, and big data handling needs.

Appsilon is an RStudio (Posit) Full Service Certified Partner. Find out how we can help you with R and Python development services and RStudio discounts.

If your team uses Python and is comfortable with this language, we won’t try to evangelize you. But if you find value in R for data analytics and statistical analysis, we highly recommend exploring the innovations R can provide in your organization.

Breaking down statistical analysis to understand R vs SAS

A statistical analysis has several steps: problem statement, data collection, data wrangling, data analysis, and results-based communication.

The data analysis step usually involves summarizing data using descriptive statistics and applying inferential statistics through hypothesis testing and modeling.

Sharing results is usually done by writing reports. These reports commonly contain different visualizations that help contextualize the analysis; especially for those not involved throughout the analysis process.

To perform this kind of analysis, a data analyst can choose from a variety of tools, and some of them will fit a given use case better than others. In this post, we will compare two of them: R and SAS.

What are SAS and R programming?

SAS and R are both statistical software tools used by researchers and data scientists to perform statistical analyses and create visualizations.

Let’s begin by introducing the tools:

What is SAS?

SAS is commercial software that can be used to perform advanced analytics, business intelligence, data management, and predictive analytics. You can use SAS software through both a graphical interface and the SAS programming language.


What is SAS programming?

A SAS program is a sequence of steps that you submit to SAS for execution. Each step in the program performs a specific task. Only two kinds of steps make up SAS programs: 

  1. DATA steps: steps in which data is created, imported, modified, merged, or calculated.
  2. PROC steps: a group of SAS statements that call and execute a procedure, usually with a SAS data set as input. SAS procedures analyze data in SAS data sets to produce statistics, tables, reports, charts, and plots.

A SAS program can contain a DATA step, a PROC step, or any combination of DATA steps and PROC steps. The number and kind of steps depend on what tasks you need to perform.

SAS program example

The following example uses SAS to compare group means. The idea is to showcase how the code and output look, not to perform a real analysis. The example data set consists of only 6 observations.


* create example dataset;
data patients;
input patient_id treatment $ age;
cards;
1 a 24
2 a 23
3 a 25
4 b 30
5 b 36
6 b 34
;
run;

* compare group means;
ods graphics on;

proc ttest cochran ci=equal umpu;
   class treatment;
   var age;
run;

ods graphics off;

You can see how we created the data in the DATA step and then called the PROC step to perform our analysis. Here you can explore the options used in the PROC step.

The output of the analysis looks like this:

[Figure: output of the t-test procedure in SAS]

As we can see, SAS provides a lot of information when you run a PROC. You can draw conclusions from both the tables and charts. It’s standard styling and output with no custom branding.

What is the SAS suite?

The SAS software suite is made up of components for data management, advanced analytics, multivariate analysis, and more. Here are a few of its important components:

  • Base SAS: Designed for data access, transformation, and reporting.
  • SAS/STAT: Designed to perform statistical analysis.
  • SAS/GRAPH: Data visualization tool to produce graphs.
  • SAS/IML: Interactive Matrix Language. Includes functions for implementing algorithms.

What is R programming compared to SAS?

R is an open-source language and environment for statistical computing and graphics. It provides a wide variety of statistical techniques such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering. Its most popular IDE (Integrated Development Environment) by far is RStudio by RStudio PBC (Posit).


In R, data is stored in objects. These objects can store strings, numeric values, data sets, or anything that can be referenced. To work with these objects we create functions. R has a style of problem-solving centered on functions.
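As a minimal sketch of this object-and-function style, the snippet below stores two vectors in objects and passes them to a small function. The name mean_by_group is a hypothetical helper made up for illustration; it is not part of any package, and the values are simply the toy ages used elsewhere in this article.

```r
# Store values in objects
ages <- c(24, 23, 25, 30, 36, 34)         # numeric vector
treatment <- rep(c("a", "b"), each = 3)   # character vector

# Define a function that works on those objects
# (mean_by_group is a hypothetical name used only for this sketch)
mean_by_group <- function(values, groups) {
  # tapply() applies mean() to the values within each group
  tapply(values, groups, mean)
}

# Call the function and store its result in another object
group_means <- mean_by_group(ages, treatment)
print(group_means)
```

The result is itself an object (a named vector with one mean per group), so it can be passed on to other functions in the same way.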

Thinking about switching to R and Shiny? See why you might want to switch to R Shiny for enterprise application development

R packages

In R, code can be shared through packages. A package is a shareable collection of code that is used to perform a desired function or specific task. Examples of packages used for data science include readr, dplyr, tidyr, ggplot2.

Anyone can make a package. Packages can be made publicly available via CRAN (Comprehensive R Archive Network), but you can also create private packages for use within your organization. At the time of writing, there are over 18,000 packages available on CRAN!

Appsilon contributes to open source as well through our Shiny tools. We create packages to help us and other R/Shiny developers build scalable, reproducible, and better-looking Shiny applications.

R Program Example

Let’s reproduce the SAS example using R code. Again, the idea is to showcase how code and output look, and not to focus on the interpretation of results.

Code:


# create example dataset
patients <- data.frame(
  patient_id = 1:6,
  treatment = rep(c("a", "b"), each = 3),
  age = c(24, 23, 25, 30, 36, 34)
)

# compare group means
t.test(age ~ treatment, data = patients)

Console output:

[Figure: t-test output in the R console]

R output is not styled by default. Nevertheless, there are tools you can use to make the output something you can share in a report, for example, RMarkdown.
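One way to get from console output to something report-ready is to pull the individual numbers out of the test result. The sketch below reruns the t-test from the example and collects a few components into a data frame that could then be rendered as a table in a report. The column names of summary_table are our own choice for illustration.

```r
# Recreate the example dataset
patients <- data.frame(
  patient_id = 1:6,
  treatment = rep(c("a", "b"), each = 3),
  age = c(24, 23, 25, 30, 36, 34)
)

# t.test() returns a list-like object whose parts can be accessed by name
result <- t.test(age ~ treatment, data = patients)

# Collect selected components into a one-row data frame
summary_table <- data.frame(
  mean_a = unname(result$estimate[1]),
  mean_b = unname(result$estimate[2]),
  t_statistic = unname(result$statistic),
  p_value = result$p.value,
  ci_lower = result$conf.int[1],
  ci_upper = result$conf.int[2]
)
print(summary_table)
```

Because the values live in an ordinary data frame, they can be formatted, rounded, or combined with other results before being placed in a document.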

The only limitation to output styling is your imagination. You can explore some of our Shiny demos from a variety of use cases. 

Should you use SAS or R?

In general, you should at least begin incorporating R programming into your data science toolset whether you work in an enterprise or as a private individual.

Obviously, the real answer depends on your use case. But as mentioned above, SAS is proprietary, commercial software which can be expensive to the average user. R programming is free with plenty of open source tools and frankly outpaces SAS in a lot of aspects.

Curious what R Shiny applications you can build? Explore some of our R Shiny demos to see what you could create.

In this section, we’ll compare the two tools on different topics that will affect your choice of SAS or R within an enterprise.

Access to New Developments in R vs SAS

Open-source software acceptance has increased in recent years. A community of people working together allows for quicker “to market” solutions. It also creates open access to see what lies underneath the hood: there’s no guesswork about how an algorithm works or whether it’s the best one for your case.

These new algorithms are developed and shared with the community. Implementation of them in SAS takes longer than in R. This means that more advanced data science techniques might be available right now in R but not yet available in SAS.

Collaboration in R

File sharing and collaboration are easier with R. If you want to share with a friend or colleague something you developed using SAS, that person requires access to the software – which is licensed. Even though there are some free versions, they require setting up an account which might be something you want to avoid. R is easily downloaded and installed so you can quickly set it up and run code. You can also quickly publish a dashboard on the web using Shiny.

Get your data story into the hands of colleagues quickly using these top 3 methods for sharing R Shiny apps

Cost of SAS vs R for data science teams

SAS is commercial software, meaning that you must pay to play, so to speak. SAS licenses are known to be expensive, which makes it difficult for individuals and small businesses to use or scale.

On the other hand, R is open source. In other words, it’s free to use. Anyone can download it and start using it.

Join the Shiny movement and develop your own R Shiny dashboard in less than 10 minutes!

Should you learn R or SAS?

Whether or not you have experience with programming, we recommend learning R first. It’s easy to get started, free, and there are lots of freely accessible learning materials.

If you have experience using programming languages, switching to a different language is a matter of learning how to do the things you know, in another place. It usually depends on the resources available for learning. 

Need a Shiny dashboard now? Download our free Shiny templates and get started today!

R educational resources

R has a lot of free online resources to get started (e.g., Hands-On Programming with R and R for Data Science). You can also find books on specific topics you want to learn (e.g., reporting and creating web applications). Books are not the only resources available: join RStudio’s webinars to learn from and connect with R users in industry!

SAS educational resources

SAS offers courses to learn its software. They also have extensive documentation. Another thing worth mentioning about SAS is that it offers some products that don’t require knowing how to code (e.g., SAS Enterprise Guide). These tools give access to the functionality of SAS from a point-and-click Windows interface.

Functionality of SAS compared to R

The following table compares how SAS and R work.

SAS | R
--- | ---
Data steps | Expressions with functions
Procedures | Expressions with functions
Macros | Expressed in R functions
SAS functions | R functions
SAS ODS (Output Delivery System) | R Markdown, Quarto
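To illustrate the macro row of this comparison: where a SAS user might write a macro to rerun the same procedure with different variable names, an R user would typically write a function. The sketch below is one possible way to do that. The function name compare_means and its arguments are hypothetical and chosen only for this example.

```r
# A reusable function taking the data and the column names as arguments
# (compare_means is a hypothetical name used only for this sketch)
compare_means <- function(data, outcome, group) {
  # Build the formula `outcome ~ group` from the column names
  model_formula <- reformulate(group, response = outcome)
  t.test(model_formula, data = data)
}

# Toy data, same values as the earlier example
patients <- data.frame(
  treatment = rep(c("a", "b"), each = 3),
  age = c(24, 23, 25, 30, 36, 34)
)

# Call the function with different column names as needed
result <- compare_means(patients, outcome = "age", group = "treatment")
print(result)
```

The same function can be called again with another outcome or grouping column without copying the analysis code.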

Hiring SAS developers vs R developers

Over the last decade, universities have begun to shift from teaching SAS to R. Even domain-specific stats courses tend to use R and train on the RStudio IDE. This means that the R talent pool has increased and will continue to do so in the future.

With that being said, R is not as popular as Python for developers. The TIOBE Index for 2022 indicates Python is King of the hill at #1 (R is #16, and SAS is a lowly #26). So if stacking your team is a priority and you’re already using Python routines for your analytics, stick with Python.

Data from PYPL Index

If you don’t know already, you can now use Python on RStudio. If you need help setting up your environment with your preferred language on RStudio platforms, contact Appsilon. We’re RStudio Certified Partners and can help you with 

R programming outsourcing and R Shiny consulting

There’s a growing number of R-based consultancies popping up as companies expand their data science teams and need R Shiny developers to handle more complex data handling and visualization requirements.

At Appsilon, we’ve been creating, maintaining, and developing Shiny applications for enterprise customers all over the world for many years now. Appsilon provides scalability, security, and modern UI/UX with custom R packages that native Shiny apps do not provide. Our team is among the world’s foremost experts in R Shiny and has made a variety of Shiny innovations (including scaling to 700 users!) over the years.

Appsilon is also a proud RStudio (Posit) Full Service Certified Partner, meaning we can help you throughout the entire process of implementing and scaling RStudio (Posit) products and simplify your data-driven decision-making.

Some of the services we provide as Shiny consultants include:

  • Rapid dashboard development
  • Support of full-stack engineers (from setting up a shiny server to UX optimization)
  • DevOps support & advisory for all RStudio products
  • Machine learning solutions
  • Advanced statistical models

We deliver world-class Shiny applications faster than other vendors, ultimately lowering the overall cost of development and improving time to deployment. We use continuous collaboration with clients, end-to-end testing, and automated processes to streamline the development process. Our team can step in at every phase of a Shiny project, starting from business analysis and data science consulting to code refactoring.

R programming packages vs SAS tools

As mentioned, R packages can be developed by anyone. Even though there is no guarantee that they will work as expected, a package that is used by a lot of people is usually something safe to use. The reason is the following: suppose that a package has a bug the creator wasn’t aware of. People start using that package and someone identifies that problem. That person shares that with the creator (and the community) so that it can be fixed. Even someone other than the creator of the package can help code a solution!

In R, there is something called the tidyverse, a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

On the other hand, if you detect a problem in SAS, you have to report it to the vendor and wait for a new release with a solution. This might be another reason why things take longer to implement in SAS.

Creating visualizations in R and SAS

R provides a variety of packages to create custom charts, both static (ggplot2) and dynamic (plotly, highcharter). Here you can see lots of different visualizations you can make with R and the code used to produce them!

SAS data visualization features are more limited than R’s and don’t provide as much customization.

SAS Example

In this example, we will create another example dataset. We will create a histogram to understand age distribution by treatment group. SAS’ PROC UNIVARIATE also provides more information which we will not show here because the goal is to see the chart.


* create example dataset;
data patients;
input treatment $ age sex $;
cards;
a 24 m
a 23 m
a 25 m
a 21 m
a 22 f
a 22 f
a 23 f
a 28 f
a 21 f
a 20 f
a 29 f
a 18 f
a 30 f
a 23 f
a 25 f
a 24 f
a 23 f
a 25 f
b 30 f
b 36 f
b 34 f
b 31 f
b 32 m
b 32 m
b 34 m
b 33 m
b 34 m
b 30 m
b 28 m
b 33 m
b 40 m
b 22 m
b 29 m
;
run;

/*create histogram for age variable by treatment*/
proc univariate data=patients;
    class treatment;
    var age;
    histogram age / overlay;
run;

Output:

R Example

We will produce the same chart with R code.


# load libraries
library(tibble)
library(ggplot2)

# create example dataset
patients <- tibble::tribble(
  ~treatment, ~age, ~sex,
  "a", 24, "m",
  "a", 23, "m",
  "a", 25, "m",
  "a", 21, "m",
  "a", 22, "f",
  "a", 22, "f",
  "a", 23, "f",
  "a", 28, "f",
  "a", 21, "f",
  "a", 20, "f",
  "a", 29, "f",
  "a", 18, "f",
  "a", 30, "f",
  "a", 23, "f",
  "a", 25, "f",
  "a", 24, "f",
  "a", 23, "f",
  "a", 25, "f",
  "b", 30, "f",
  "b", 36, "f",
  "b", 34, "f",
  "b", 31, "f",
  "b", 32, "m",
  "b", 32, "m",
  "b", 34, "m",
  "b", 33, "m",
  "b", 34, "m",
  "b", 30, "m",
  "b", 28, "m",
  "b", 33, "m",
  "b", 40, "m",
  "b", 22, "m",
  "b", 29, "m"
)

# create chart
ggplot(data = patients, aes(x = age, fill = treatment)) +
  geom_histogram(position = "identity", 
                 alpha = 0.5, 
                 bins = 9,
                 color = "black") +
  labs(
    title = "Distribution of age by treatment",
    x = "Age (years)",
    y = "Number of Patients",
    fill = "Treatment"
  ) +
  theme_minimal() +
  theme(
    legend.position = "top"
  )

We see how with just a few lines of code we are able to create a beautiful chart using R. And we’re able to provide more customization to better suit our needs.

Support in SAS vs R

SAS provides Technical Support and has Documentation with information about everything you can do and how things are implemented in the software.

R doesn’t have Technical Support (it is open source) but it has a large community you can reach out to for help. Packages are usually well documented and come with excellent tutorials (called Vignettes) with examples. If you’re used to Python, you’ll be pleasantly surprised by the quality of documentation that is standard for the R ecosystem (e.g., dplyr vignette, tidyr vignette).

Should you use SAS or R for Clinical Data Science?

SAS is great when you need minimal output or sequential processing. But R offers greater flexibility. And with the recent successes by the R Consortium and enhanced collaboration with the FDA, R is trending towards higher standardization to satisfy regulatory needs.

In this section, we will explore how to solve a particular problem using each software. We will use R through the RStudio IDE and SAS through SAS OnDemand for Academics (which is a free version).

Problem Statement

We want to analyze the effect of different variables on mortality due to a particular disease. In particular, we want to understand the differences between treatment application times (no treatment, fast treatment, slow treatment). To do so, we will create a logistic regression model.

Data

The example dataset was created to showcase how to perform the different steps that are usually part of an analysis. It is based on real data, but it has been anonymized and some information was removed from the file (such as disease and treatment names). The focus here is on the how, not the what.

Data is in .csv format and contains the following information about patients:

  • ID: Patient identifier.
  • AGE: Age of the patient measured in years.
  • SEX: Sex of the patient (F = Female, M = Male).
  • CHARLSON: Charlson score of the patient.
  • PITT: Pitt bacteremia score of the patient.
  • SURVIVED: Indicator variable. 1 means the patient was cured.
  • DIED_OF_DISEASE: Indicator variable. 1 means the patient died due to the disease.
  • DIED_OTHER_CAUSE: Indicator variable. 1 means the patient died due to another cause.
  • UNKNOWN: Indicator variable. 1 means status was lost for the patient.
  • TREATMENT: Indicator variable. 1 means the patient received treatment.
  • TREATMENT_FAST: Indicator variable. 1 means treatment was applied within 48 hours.

In the following sections we will see how to perform different tasks with SAS and R:

  • Read data
  • Wrangle data (apply filters, add and remove columns)
  • Create a logistic regression model
  • Explore outputs

Working with SAS vs R

Reading data in SAS vs R programming

To start working with data, first, we need to have access to it.

SAS:


* READ DATA;
FILENAME REFFILE '/home/u4729884/data.csv';
PROC IMPORT DATAFILE=REFFILE
	DBMS=CSV
	OUT=WORK.RAW_DATA;
	GETNAMES=YES;
RUN;

R programming:


# Read data
raw_data <- read.csv("data.csv")
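After an import, it can be worth checking that each column was read with the type you expect. The sketch below uses a few made-up rows passed as inline text (they are not the article's data) so that it runs without a file. One detail to be aware of: read.csv() guesses column types, and a column containing only the value F would be read as logical FALSE, so fixing the type of SEX with colClasses is a defensive choice here.

```r
# Made-up rows for illustration only; not the dataset used in this article
csv_text <- "ID,AGE,SEX,DIED_OF_DISEASE
1,64,F,0
2,51,M,1
3,72,F,0"

# Read the CSV, forcing SEX to be read as character
raw_data <- read.csv(text = csv_text, colClasses = c(SEX = "character"))

# Inspect column names and types
str(raw_data)
```

With a real file, the text argument would be replaced by the file path, as in the call above.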

Data Wrangling in SAS vs R programming

Now that we have data available, we will prepare it for the model. In this section we will show how to:

  • Remove rows corresponding to patients with unknown status
  • Remove rows corresponding to patients that died due to another cause
  • Create a new treatment variable using values found in other columns
  • Create new indicator variables for age, Charlson and Pitt using specific cutpoints
  • Select columns

This is not the only way one could process the data. We chose to do it this way for simplicity. Feel free to try something different and share your results!

SAS:


* DATA WRANGLING;
DATA WORK.DATA_TO_MODEL (KEEP = AGE_60 
				      SEX 
					CHARLSON_4
					PITT_3
					TREATMENT_CATEGORY
					DIED_OF_DISEASE); * KEEP VARS OF INTEREST;
	SET WORK.RAW_DATA;
	
	* REMOVE PATIENTS WITH UNKNOWN STATUS;
	IF UNKNOWN = 1 THEN DELETE;
	* REMOVE PATIENTS THAT DIED DUE TO OTHER CAUSE;
	IF DIED_OTHER_CAUSE = 1 THEN DELETE;
	
	* CREATE TREATMENT FACTOR VARIABLE;
	LENGTH TREATMENT_CATEGORY $14;
	IF TREATMENT = 0 THEN TREATMENT_CATEGORY = "NO TREATMENT";
	ELSE IF TREATMENT_FAST = 1 THEN TREATMENT_CATEGORY = "FAST TREATMENT";
	ELSE TREATMENT_CATEGORY = "SLOW TREATMENT";
	* NEW AGE VARIABLE;
	IF AGE >= 60 THEN AGE_60 = 1;
	ELSE AGE_60 = 0;
	* NEW CHARLSON VARIABLE;
	IF CHARLSON > 4 THEN CHARLSON_4 = 1;
	ELSE CHARLSON_4 = 0;
	* NEW PITT VARIABLE;
	IF PITT > 3 THEN PITT_3 = 1;
	ELSE PITT_3 = 0;
RUN;

R programming:


# Load required library
library(dplyr)
# Data wrangling
data_to_model <- raw_data |>
  # Filter rows
  filter(
    UNKNOWN != 1,
    DIED_OTHER_CAUSE != 1
  ) |>
  # Create new columns
  mutate(
    TREATMENT_CATEGORY = case_when(
      TREATMENT == 0 ~ "NO TREATMENT",
      TREATMENT_FAST == 1 ~ "FAST TREATMENT",
      TRUE ~ "SLOW TREATMENT"
    ),
    AGE_60 = ifelse(AGE >= 60, 1, 0),
    CHARLSON_4 = ifelse(CHARLSON > 4, 1, 0),
    PITT_3 = ifelse(PITT > 3, 1, 0)
  ) |>
  # Select columns
  select(
    AGE_60,
    CHARLSON_4,
    PITT_3,
    TREATMENT_CATEGORY,
    DIED_OF_DISEASE
  )

Modeling data in SAS vs R programming

Once data is processed, we are ready to model. We will create a logistic regression model where we will model the probability of dying due to the disease. As explanatory variables we will include:

  • Dichotomized age (cutpoint: 60 years, reference: age < 60)
  • Dichotomized Charlson score (cutpoint: score of 4, reference: score ≤ 4)
  • Dichotomized Pitt score (cutpoint: score of 3, reference: score ≤ 3)
  • Treatment (factor with three levels: no treatment, fast treatment, slow treatment). We will use fast treatment as the reference.

SAS:


* MODELING;
PROC LOGISTIC DATA = WORK.DATA_TO_MODEL DESCENDING;
CLASS TREATMENT_CATEGORY (REF = "FAST TREATMENT") SEX (REF = "F") / PARAM = REFERENCE;
MODEL DIED_OF_DISEASE = AGE_60 CHARLSON_4 PITT_3 TREATMENT_CATEGORY / LINK = LOGIT SCALE = NONE; 
RUN;

R programming:


# Create model
model <- glm(formula =  DIED_OF_DISEASE ~ .,
             data = data_to_model,
             family = binomial)

# Explore results
summary(model)

# Get odds ratio
exp(cbind(coef(model), confint(model, level = 0.95)))
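In the R call above, TREATMENT_CATEGORY is a character column, so glm() converts it to a factor whose first level in alphabetical order ("FAST TREATMENT") becomes the reference. That happens to match the SAS REF= option, but it can be made explicit. The sketch below shows one way to do this on simulated data. The data frame example_data is randomly generated for illustration and is not the article's dataset, and confint.default() (Wald intervals) is used here as a simpler stand-in for confint().

```r
set.seed(42)  # reproducible simulated data, for illustration only
n <- 200
example_data <- data.frame(
  DIED_OF_DISEASE = rbinom(n, 1, 0.3),
  AGE_60 = rbinom(n, 1, 0.5),
  TREATMENT_CATEGORY = sample(
    c("NO TREATMENT", "FAST TREATMENT", "SLOW TREATMENT"),
    n,
    replace = TRUE
  )
)

# Convert to a factor and set the reference level explicitly
example_data$TREATMENT_CATEGORY <- relevel(
  factor(example_data$TREATMENT_CATEGORY),
  ref = "FAST TREATMENT"
)

# Fit the logistic regression
model <- glm(DIED_OF_DISEASE ~ AGE_60 + TREATMENT_CATEGORY,
             data = example_data,
             family = binomial)

# Odds ratios with Wald 95% confidence intervals
odds_ratios <- exp(cbind(OR = coef(model),
                         confint.default(model, level = 0.95)))
print(odds_ratios)
```

The reference level does not get its own row in the output; the two treatment rows are odds ratios relative to fast treatment.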

Exploring model results in SAS vs R programming

SAS:

Let’s focus on these two tables:

Here we can see the coefficients of the model, their significance levels, and the translation to odds ratio estimates (which are more interpretable when doing logistic regression).

We will not dive into details for explaining model assessment or results interpretation. The idea here is to show where this information is available and how to get it using code.

R programming:

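In R, we have to compute the odds ratios from the model coefficients. This can be done by exponentiating the coefficients and their confidence intervals, as in the last line of the modeling code: exp(cbind(coef(model), confint(model, level = 0.95))).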

Final comparison of SAS vs R programming

General remarks:

  • With both SAS and R we arrived at the same results.
  • SAS code requires the use of a semicolon ( ; ) to end statements as well as an explicit RUN to run code. This makes SAS more prone to distraction errors.
  • By default, SAS shows more information in its output and has some styling.

Detailed comparison:

  • Reading data. Importing data with SAS involves more lines of code, with somewhat cryptic parameters. In R, it was straightforwardly done with a function available by default when installing R.
  • Data wrangling. SAS syntax seems more complicated than R. In R, thanks to the use of the pipe operator, code is easier to read. Also, we can follow what is going on in each step. In SAS, some parts are defined at the beginning (for example, the columns to keep). Also, when creating the new treatment variable, in SAS we had to define its type before being able to create it, which might seem counter-intuitive.
  • Modeling. Again, SAS code seems more verbose than R in order to accomplish this task. One thing to mention here is that in both software there are multiple ways to get to the same result, so maybe another person can write code in a more succinct way.
  • Exploring results. Even though SAS output looks nicer, maybe you want to write a report using part of those results. Right now, it would imply copying and pasting tables in a document and sharing it. Regarding R, we have shown what the output looks like in the console. With R you can create reports with code in the same file using R Markdown. One more thing to mention is that SAS by default computed the odds ratio. If you want to access those values in R you need to apply computations using the model results.

Should You Choose SAS or R? (Conclusion)

If you’re looking to keep pace within your industry or create faster tooling and PoCs for your team, you should consider switching to R programming. SAS still holds value for a lot of users, but R and its open source packages are becoming the standard for the new workforce. Don’t get left behind!


Original article sourced at: https://appsilon.com


CSharp REPL: A Command Line C# REPL with Syntax Highlighting

C# REPL

A cross-platform command line REPL for the rapid experimentation and exploration of C#. It supports intellisense, installing NuGet packages, and referencing local .NET projects and assemblies.


C# REPL provides the following features:

  • Syntax highlighting via ANSI escape sequences
  • Intellisense with fly-out documentation
  • Nuget package installation
  • Reference local assemblies, solutions, and projects
  • Navigate to source via Source Link
  • IL disassembly (both Debug and Release mode)
  • Fast and flicker-free rendering. A "diff" algorithm is used to only render what's changed.

Installation

C# REPL is a .NET 6 global tool, and runs on Windows 10, Mac OS, and Linux. It can be installed via:

dotnet tool install -g csharprepl

If you're running on Mac OS Catalina (10.15) or later, make sure you follow any additional directions printed to the screen. You may need to update your PATH variable in order to use .NET global tools.

After installation is complete, run csharprepl to begin. C# REPL can be updated via dotnet tool update -g csharprepl.

Usage:

Run csharprepl from the command line to begin an interactive session. The default colorscheme uses the color palette defined by your terminal, but these colors can be changed using a theme.json file provided as a command line argument.

Evaluating Code

Type some C# into the prompt and press Enter to run it. The result, if any, will be printed:

> Console.WriteLine("Hello World")
Hello World

> DateTime.Now.AddDays(8)
[6/7/2021 5:13:00 PM]

To evaluate multiple lines of code, use Shift+Enter to insert a newline:

> var x = 5;
  var y = 8;
  x * y
40

Additionally, if the statement is not a "complete statement" a newline will automatically be inserted when Enter is pressed. For example, in the below code, the first line is not a syntactically complete statement, so when we press enter we'll go down to a new line:

> if (x == 5)
  | // caret position, after we press Enter on Line 1

Finally, pressing Ctrl+Enter will show a "detailed view" of the result. For example, for the DateTime.Now expression below, on the first line we pressed Enter, and on the second line we pressed Ctrl+Enter to view more detailed output:

> DateTime.Now // Pressing Enter shows a reasonable representation
[5/30/2021 5:13:00 PM]

> DateTime.Now // Pressing Ctrl+Enter shows a detailed representation
[5/30/2021 5:13:00 PM] {
  Date: [5/30/2021 12:00:00 AM],
  Day: 30,
  DayOfWeek: Sunday,
  DayOfYear: 150,
  Hour: 17,
  InternalKind: 9223372036854775808,
  InternalTicks: 637579915804530992,
  Kind: Local,
  Millisecond: 453,
  Minute: 13,
  Month: 5,
  Second: 0,
  Ticks: 637579915804530992,
  TimeOfDay: [17:13:00.4530992],
  Year: 2021,
  _dateData: 9860951952659306800
}

A note on semicolons: C# expressions do not require semicolons, but statements do. If a statement is missing a required semicolon, a newline will be added instead of trying to run the syntactically incomplete statement; simply type the semicolon to complete the statement.

> var now = DateTime.Now; // assignment statement, semicolon required

> DateTime.Now.AddDays(8) // expression, we don't need a semicolon
[6/7/2021 5:03:05 PM]

Keyboard Shortcuts

  • Basic Usage
    • Ctrl+C - Cancel current line
    • Ctrl+L - Clear screen
    • Enter - Evaluate the current line if it's a syntactically complete statement; otherwise add a newline
    • Control+Enter - Evaluate the current line, and return a more detailed representation of the result
    • Shift+Enter - Insert a new line (this does not currently work on Linux or Mac OS; hopefully this will work in .NET 7)
    • Ctrl+Shift+C - Copy current line to clipboard
    • Ctrl+V, Shift+Insert, and Ctrl+Shift+V - Paste text to prompt. Automatically trims leading indent
  • Code Actions
    • F1 - Opens the MSDN documentation for the class/method under the caret
    • F9 - Shows the IL (intermediate language) for the current statement in Debug mode.
    • Ctrl+F9 - Shows the IL for the current statement with Release mode optimizations.
    • F12 - Opens the source code in the browser for the class/method under the caret, if the assembly supports Source Link.
  • Autocompletion
    • Ctrl+Space - Open autocomplete menu. If there's a single option, pressing Ctrl+Space again will select the option
    • Enter, Right Arrow, Tab - Select active autocompletion option
    • Escape - closes autocomplete menu
  • Text Navigation
    • Home and End - Navigate to beginning of a single line and end of a single line, respectively
    • Ctrl+Home and Ctrl+End - Navigate to beginning of line and end across multiple lines in a multiline prompt, respectively
    • Arrows - Navigate characters within text
    • Ctrl+Arrows - Navigate words within text
    • Ctrl+Backspace - Delete previous word
    • Ctrl+Delete - Delete next word

Adding References

Use the #r command to add assembly or nuget references.

  • For assembly references, run #r "AssemblyName" or #r "path/to/assembly.dll"
  • For project references, run #r "path/to/project.csproj". Solution files (.sln) can also be referenced.
  • For nuget references, run #r "nuget: PackageName" to install the latest version of a package, or #r "nuget: PackageName, 13.0.5" to install a specific version (13.0.5 in this case).

Installing nuget packages

To run ASP.NET applications inside the REPL, start the csharprepl application with the --framework parameter, specifying the Microsoft.AspNetCore.App shared framework. Then, use the above #r command to reference the application DLL. See the Command Line Configuration section below for more details.

csharprepl --framework  Microsoft.AspNetCore.App

Command Line Configuration

The C# REPL supports multiple configuration flags to control startup, behavior, and appearance:

csharprepl [OPTIONS] [response-file.rsp] [script-file.csx] [-- <additional-arguments>]

Supported options are:

  • OPTIONS:
    • -r <dll> or --reference <dll>: Reference an assembly, project file, or nuget package. Can be specified multiple times. Uses the same syntax as #r statements inside the REPL. For example, csharprepl -r "nuget:Newtonsoft.Json" "path/to/myproj.csproj"
      • When an assembly or project is referenced, assemblies in the containing directory will be added to the assembly search path. This means that you don't need to manually add references to all of your assembly's dependencies (e.g. other references and nuget packages). Referencing the main entry assembly is enough.
    • -u <namespace> or --using <namespace>: Add a using statement. Can be specified multiple times.
    • -f <framework> or --framework <framework>: Reference a shared framework. The available shared frameworks depend on the local .NET installation. This option can be useful when running an ASP.NET application from the REPL. Example frameworks are:
      • Microsoft.NETCore.App (default)
      • Microsoft.AspNetCore.All
      • Microsoft.AspNetCore.App
      • Microsoft.WindowsDesktop.App
    • -t <theme.json> or --theme <theme.json>: Read a theme file for syntax highlighting. This theme file associates C# syntax classifications with colors. The color values can be full RGB, or ANSI color names (defined in your terminal's theme). The NO_COLOR standard is supported.
    • --trace: Produce a trace file in the current directory that logs CSharpRepl internals. Useful for CSharpRepl bug reports.
    • -v or --version: Show version number and exit.
    • -h or --help: Show help and exit.
  • response-file.rsp: A filepath of an .rsp file, containing any of the above command line options.
  • script-file.csx: A filepath of a .csx file, containing lines of C# to evaluate before starting the REPL. Arguments to this script can be passed as <additional-arguments>, after a double hyphen (--), and will be available in a global args variable.

If you have dotnet-suggest enabled, all options can be tab-completed, including values provided to --framework and .NET namespaces provided to --using.
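As a rough sketch of how the options above compose, the following Python helper builds an argument list using only the flags documented in this section. The helper name and its parameters are hypothetical and not part of CSharpRepl. It assumes that repeating -r and -u once per value is accepted, which the "can be specified multiple times" wording suggests.

```python
from typing import List, Optional, Sequence


def build_csharprepl_args(
    references: Sequence[str] = (),
    usings: Sequence[str] = (),
    framework: Optional[str] = None,
    script: Optional[str] = None,
    script_args: Sequence[str] = (),
) -> List[str]:
    """Assemble a csharprepl command line (hypothetical helper, for illustration)."""
    args = ["csharprepl"]
    for ref in references:
        args += ["-r", ref]  # one -r per assembly, project, or nuget reference
    for namespace in usings:
        args += ["-u", namespace]  # one -u per using statement
    if framework is not None:
        args += ["-f", framework]  # shared framework, e.g. Microsoft.AspNetCore.App
    if script is not None:
        args.append(script)  # .csx file evaluated before the REPL starts
    if script_args:
        args += ["--", *script_args]  # everything after -- goes to the script
    return args
```

The resulting list could be handed to something like subprocess.run. Building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.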

Integrating with other software

C# REPL is a standalone software application, but it can be useful to integrate it with other developer tools:

Windows Terminal

To add C# REPL as a menu entry in Windows Terminal, add the following profile to Windows Terminal's settings.json configuration file (under the JSON property profiles.list):

{
    "name": "C# REPL",
    "commandline": "csharprepl"
},

To get the exact colors shown in the screenshots in this README, install the Windows Terminal Dracula theme.
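If you prefer to script this change, a minimal sketch in Python could look like the following. It works on an already-parsed settings dictionary, because the real settings.json may contain comments that Python's json module rejects. The function name is hypothetical, and skipping a profile whose name already exists is an assumption of this sketch.

```python
import json


def add_terminal_profile(settings: dict, name: str, commandline: str) -> dict:
    """Return a copy of `settings` with a profile appended under profiles.list.

    If a profile with the same name already exists, the copy is returned unchanged.
    """
    updated = json.loads(json.dumps(settings))  # simple deep copy of JSON-like data
    profiles = updated.setdefault("profiles", {}).setdefault("list", [])
    if not any(profile.get("name") == name for profile in profiles):
        profiles.append({"name": name, "commandline": commandline})
    return updated
```

Calling add_terminal_profile(settings, "C# REPL", "csharprepl") produces the same entry as the JSON fragment above, without modifying the input dictionary.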

Visual Studio Code

To use the C# REPL with Visual Studio Code, simply run the csharprepl command in the Visual Studio Code terminal. To send commands to the REPL, use the built-in Terminal: Run Selected Text In Active Terminal command from the Command Palette (workbench.action.terminal.runSelectedText).

Windows OS

To add the C# REPL to the Windows Start Menu for quick access, you can run the following PowerShell command, which will start C# REPL in Windows Terminal:

$shell = New-Object -ComObject WScript.Shell
$shortcut = $shell.CreateShortcut("$env:appdata\Microsoft\Windows\Start Menu\Programs\csharprepl.lnk")
$shortcut.TargetPath = "wt.exe"
$shortcut.Arguments = "-w 0 nt csharprepl.exe"
$shortcut.Save()

You may also wish to add a shorter alias for C# REPL, which can be done by creating a .cmd file somewhere on your path. For example, put the following contents in C:\Users\username\.dotnet\tools\csr.cmd:

wt -w 0 nt csharprepl

This will allow you to launch C# REPL by running csr from anywhere that accepts Windows commands, like the Windows Run dialog.

Comparison with other REPLs

This project is far from being the first REPL for C#. Here are some other projects; if this project doesn't suit you, another one might!

Visual Studio's C# Interactive pane is full-featured (it has syntax highlighting and intellisense) and is part of Visual Studio. This deep integration with Visual Studio is both a benefit from a workflow perspective, and a drawback as it's not cross-platform. As far as I know, the C# Interactive pane does not support NuGet packages or navigating to documentation/source code. Subjectively, it does not follow typical command line keybindings, so can feel a bit foreign.

csi.exe ships with C# and is a command line REPL. It's great because it's a cross platform REPL that comes out of the box, but it doesn't support syntax highlighting or autocompletion.

dotnet script allows you to run C# scripts from the command line. It has a REPL built-in, but the predominant focus seems to be as a script runner. It's a great tool, though, and has a strong community following.

dotnet interactive is a tool from Microsoft that creates a Jupyter notebook for C#, runnable through Visual Studio Code. It also provides a general framework useful for running REPLs.

Download Details:
Author: waf
Source Code: https://github.com/waf/CSharpRepl
License: MPL-2.0 License

August Larson

1624422360

R vs Python: What Should Beginners Learn?

Let go of any doubts or confusion, make the right choice and then focus and thrive as a data scientist.

I currently lead a research group with data scientists who use both R and Python. I have been in this field for over 14 years. I have witnessed the growth of both languages over the years and there is now a thriving community behind both.

I did not have a straightforward journey and learned many things the hard way. However, you can avoid making the mistakes I made and lead a more focussed, more rewarding journey and reach your goals quicker than others.

Before I dive in, let’s get something out of the way. R and Python are just tools to do the same thing: data science. Neither tool is inherently better than the other. Both have been evolving over the years (and will likely continue to do so).

Therefore, the short answer on whether you should learn Python or R is: it depends.

The longer answer, if you can spare a few minutes, will help you focus on what really matters and avoid the most common mistakes most enthusiastic beginners aspiring to become expert data scientists make.

Dotnet Script: Run C# Scripts From The .NET CLI

dotnet script

Run C# scripts from the .NET CLI, define NuGet packages inline and edit/debug them in VS Code - all of that with full language services support from OmniSharp.

NuGet Packages

Name                                 Version  Framework(s)
dotnet-script (global tool)          Nuget    net6.0, net5.0, netcoreapp3.1
Dotnet.Script (CLI as Nuget)         Nuget    net6.0, net5.0, netcoreapp3.1
Dotnet.Script.Core                   Nuget    netcoreapp3.1, netstandard2.0
Dotnet.Script.DependencyModel        Nuget    netstandard2.0
Dotnet.Script.DependencyModel.Nuget  Nuget    netstandard2.0

Installing

Prerequisites

The only thing we need to install is .NET Core 3.1 or .NET 5.0 SDK.

.NET Core Global Tool

.NET Core 2.1 introduced the concept of global tools meaning that you can install dotnet-script using nothing but the .NET CLI.

dotnet tool install -g dotnet-script

You can invoke the tool using the following command: dotnet-script
Tool 'dotnet-script' (version '0.22.0') was successfully installed.

The advantage of this approach is that you can use the same command for installation across all platforms. .NET Core SDK also supports viewing a list of installed tools and their uninstallation.

dotnet tool list -g

Package Id         Version      Commands
---------------------------------------------
dotnet-script      0.22.0       dotnet-script

dotnet tool uninstall dotnet-script -g

Tool 'dotnet-script' (version '0.22.0') was successfully uninstalled.

Windows

choco install dotnet.script

We also provide a PowerShell script for installation.

(new-object Net.WebClient).DownloadString("https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.ps1") | iex

Linux and Mac

curl -s https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.sh | bash

If permission is denied, we can try with sudo:

curl -s https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.sh | sudo bash

Docker

A Dockerfile for running dotnet-script in a Linux container is available. Build:

cd build
docker build -t dotnet-script -f Dockerfile ..

And run:

docker run -it dotnet-script --version

Github

You can manually download all the releases in zip format from the GitHub releases page.

Usage

Our typical helloworld.csx might look like this:

Console.WriteLine("Hello world!");

That is all it takes and we can execute the script. Args are accessible via the global Args array.

dotnet script helloworld.csx

Scaffolding

Simply create a folder somewhere on your system and issue the following command.

dotnet script init

This will create main.csx along with the launch configuration needed to debug the script in VS Code.

.
├── .vscode
│   └── launch.json
├── main.csx
└── omnisharp.json

We can also initialize a folder using a custom filename.

dotnet script init custom.csx

Instead of main.csx which is the default, we now have a file named custom.csx.

.
├── .vscode
│   └── launch.json
├── custom.csx
└── omnisharp.json

Note: Executing dotnet script init inside a folder that already contains one or more script files will not create the main.csx file.

Running scripts

Scripts can be executed directly from the shell as if they were executables.

foo.csx arg1 arg2 arg3

OSX/Linux

Just like all scripts, on OSX/Linux you need to have a #! and mark the file as executable via chmod +x foo.csx. If you use dotnet script init to create your csx it will automatically have the #! directive and be marked as executable.

The OSX/Linux shebang directive should be #!/usr/bin/env dotnet-script

#!/usr/bin/env dotnet-script
Console.WriteLine("Hello world");

You can execute your script using dotnet script or dotnet-script, which allows you to pass arguments to control your script execution more.

foo.csx arg1 arg2 arg3
dotnet script foo.csx -- arg1 arg2 arg3
dotnet-script foo.csx -- arg1 arg2 arg3

Passing arguments to scripts

All arguments after -- are passed to the script in the following way:

dotnet script foo.csx -- arg1 arg2 arg3

Then you can access the arguments in the script context using the global Args collection:

foreach (var arg in Args)
{
    Console.WriteLine(arg);
}

All arguments before -- are processed by dotnet script. For example, the following command-line

dotnet script -d foo.csx -- -d

will pass the -d before -- to dotnet script and enable the debug mode whereas the -d after -- is passed to script for its own interpretation of the argument.
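The splitting rule is easy to picture with a small sketch. The Python function below is only an illustration of the convention described above, not dotnet script's actual parser: everything before the first -- belongs to the tool, and everything after it belongs to the script.

```python
from typing import List, Tuple


def split_on_double_dash(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split an argument list at the first "--" separator.

    Returns (tool_args, script_args). Without a separator, all arguments
    are treated as tool arguments.
    """
    if "--" not in argv:
        return list(argv), []
    index = argv.index("--")
    return argv[:index], argv[index + 1:]
```

For the command line above, split_on_double_dash(["-d", "foo.csx", "--", "-d"]) separates the tool's -d from the script's -d.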

NuGet Packages

dotnet script has built-in support for referencing NuGet packages directly from within the script.

#r "nuget: AutoMapper, 6.1.0"

Note: Omnisharp needs to be restarted after adding a new package reference

Package Sources

We can define package sources using a NuGet.Config file in the script root folder. In addition to being used during execution of the script, it will also be used by OmniSharp that provides language services for packages resolved from these package sources.

As an alternative to maintaining a local NuGet.Config file, we can define these package sources globally, either at the user level or at the computer level, as described in Configuring NuGet Behaviour.

It is also possible to specify packages sources when executing the script.

dotnet script foo.csx -s https://SomePackageSource

Multiple packages sources can be specified like this:

dotnet script foo.csx -s https://SomePackageSource -s https://AnotherPackageSource

Creating DLLs or Exes from a CSX file

Dotnet-Script can create a standalone executable or DLL for your script.

Switch  Long switch      Description
-o      --output         Directory where the published executable should be placed. Defaults to a 'publish' folder in the current directory.
-n      --name           The name for the generated DLL (executable not supported at this time). Defaults to the name of the script.
        --dll            Publish to a .dll instead of an executable.
-c      --configuration  Configuration to use for publishing the script [Release/Debug]. Default is "Debug"
-d      --debug          Enables debug output.
-r      --runtime        The runtime used when publishing the self contained executable. Defaults to your current runtime.

The executable you can run directly independent of dotnet install, while the DLL can be run using the dotnet CLI like this:

dotnet script exec {path_to_dll} -- arg1 arg2

Caching

We provide two types of caching, the dependency cache and the execution cache, which are explained in detail below. For either of these caches to be enabled, all NuGet package references must be specified using an exact version number. The reason for this constraint is that we need to make sure that we don't execute a script with a stale dependency graph.

Dependency Cache

In order to resolve the dependencies for a script, a dotnet restore is executed under the hood to produce a project.assets.json file from which we can figure out all the dependencies we need to add to the compilation. This is an out-of-process operation and represents a significant overhead to the script execution. So this cache works by looking at all the dependencies specified in the script(s), either in the form of NuGet package references or assembly file references. If these dependencies match the dependencies from the last script execution, we skip the restore and read the dependencies from the already generated project.assets.json file. If any of the dependencies have changed, we must restore again to obtain the new dependency graph.
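To make the comparison concrete, here is a simplified Python sketch of the idea. It is not dotnet-script's implementation: the regular expression, the function names, and the choice to track only NuGet directives (ignoring assembly file references) are assumptions of this illustration.

```python
import re
from typing import FrozenSet, Optional, Tuple

# Matches directives such as:  #r "nuget: AutoMapper, 6.1.0"
# and:                         #load "nuget: simple-targets-csx, 6.0.0"
NUGET_DIRECTIVE = re.compile(
    r'^\s*#(?:r|load)\s+"nuget:\s*([^,"\s]+)\s*(?:,\s*([^"\s]+))?\s*"',
    re.MULTILINE,
)

References = FrozenSet[Tuple[str, Optional[str]]]


def nuget_references(script_text: str) -> References:
    """Collect (package name, version) pairs; version is None when omitted."""
    return frozenset(
        (name.lower(), version or None)
        for name, version in NUGET_DIRECTIVE.findall(script_text)
    )


def needs_restore(current: References, previous: Optional[References]) -> bool:
    """Decide whether a restore is required before running the script."""
    if previous is None:
        return True  # nothing cached yet
    if any(version is None for _, version in current):
        return True  # caching requires a version on every reference
    return current != previous  # restore only when the dependency set changed
```

Note that this sketch only checks that a version is present. Telling an exact version apart from a floating one would need additional parsing.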

Execution cache

In order to execute a script it needs to be compiled first and since that is a CPU and time consuming operation, we make sure that we only compile when the source code has changed. This works by creating a SHA256 hash from all the script files involved in the execution. This hash is written to a temporary location along with the DLL that represents the result of the script compilation. When a script is executed the hash is computed and compared with the hash from the previous compilation. If they match there is no need to recompile and we run from the already compiled DLL. If the hashes don't match, the cache is invalidated and we recompile.

You can override this automatic caching by passing the --no-cache flag, which will bypass both caches and cause dependency resolution and script compilation to happen every time we execute the script.
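The hash comparison behind the execution cache can be sketched in a few lines of Python. This is an illustration only: including the file path in the hash and the function names are assumptions, and the real tool stores the hash next to the compiled DLL rather than passing it around.

```python
import hashlib
from typing import Dict, Optional


def scripts_hash(sources: Dict[str, str]) -> str:
    """SHA256 over all script files (path -> content), in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(sources):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator so path and content cannot blur together
        digest.update(sources[path].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


def must_recompile(
    sources: Dict[str, str],
    cached_hash: Optional[str],
    no_cache: bool = False,
) -> bool:
    """True when the script has to be compiled again."""
    if no_cache or cached_hash is None:
        return True  # cache bypassed, or no earlier compilation exists
    return scripts_hash(sources) != cached_hash
```

Sorting the paths keeps the hash independent of the order in which the files were discovered.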

Cache Location

The temporary location used for caches is a sub-directory named dotnet-script under (in order of priority):

  1. The path specified for the value of the environment variable named DOTNET_SCRIPT_CACHE_LOCATION, if defined and value is not empty.
  2. Linux distributions only: $XDG_CACHE_HOME if defined otherwise $HOME/.cache
  3. macOS only: ~/Library/Caches
  4. The value returned by Path.GetTempPath for the platform.
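The priority order above can be expressed as a small lookup function. The Python sketch below mirrors the list, with assumptions labeled: the platform strings follow Python's sys.platform values, an empty XDG_CACHE_HOME is treated like an undefined one, and paths are joined POSIX-style to keep the example deterministic.

```python
import posixpath
from typing import Mapping


def cache_root(env: Mapping[str, str], platform: str, home: str, temp_dir: str) -> str:
    """Resolve the dotnet-script cache directory following the documented priority."""
    override = env.get("DOTNET_SCRIPT_CACHE_LOCATION", "")
    if override:
        base = override  # 1. explicit override, if defined and not empty
    elif platform == "linux":
        # 2. Linux: $XDG_CACHE_HOME, otherwise $HOME/.cache
        base = env.get("XDG_CACHE_HOME") or posixpath.join(home, ".cache")
    elif platform == "darwin":
        base = posixpath.join(home, "Library", "Caches")  # 3. macOS
    else:
        base = temp_dir  # 4. platform temp path
    return posixpath.join(base, "dotnet-script")
```

Passing the environment, platform, and directories in as arguments makes each rule easy to check without touching the real machine configuration.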

Debugging

The days of debugging scripts using Console.WriteLine are over. One major feature of dotnet script is the ability to debug scripts directly in VS Code. Just set a breakpoint anywhere in your script file(s) and hit F5 (start debugging).

Script Packages

Script packages are a way of organizing reusable scripts into NuGet packages that can be consumed by other scripts. This means that we now can leverage scripting infrastructure without the need for any kind of bootstrapping.

Creating a script package

A script package is just a regular NuGet package that contains script files inside the content or contentFiles folder.

The following example shows how the scripts are laid out inside the NuGet package according to the standard convention.

└── contentFiles
    └── csx
        └── netstandard2.0
            └── main.csx

This example contains just the main.csx file in the root folder, but packages may have multiple script files either in the root folder or in subfolders below the root folder.

When loading a script package we will look for an entry point script to be loaded. This entry point script is identified by one of the following.

  • A script called main.csx in the root folder
  • A single script file in the root folder

If the entry point script cannot be determined, we will simply load all the scripts files in the package.

The advantage with using an entry point script is that we can control loading other scripts from the package.
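The entry point rule described above is small enough to sketch directly. The Python function below is a hypothetical illustration of that rule, not code from dotnet-script. It takes the names of the script files found in the package's root folder.

```python
from typing import List, Optional


def entry_point(root_scripts: List[str]) -> Optional[str]:
    """Pick the entry point script among the .csx files in the root folder.

    Returns None when no entry point can be determined, in which case the
    caller would load every script file in the package.
    """
    if "main.csx" in root_scripts:
        return "main.csx"  # rule 1: a script called main.csx
    if len(root_scripts) == 1:
        return root_scripts[0]  # rule 2: a single script file in the root folder
    return None
```

For a package with build.csx and helpers.csx in the root folder, the function returns None, which corresponds to the fallback of loading all script files.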

Consuming a script package

To consume a script package, all we need to do is specify the NuGet package in the #load directive.

The following example loads the simple-targets package that contains script files to be included in our script.

#load "nuget:simple-targets-csx, 6.0.0"

using static SimpleTargets;
var targets = new TargetDictionary();

targets.Add("default", () => Console.WriteLine("Hello, world!"));

Run(Args, targets);

Note: Debugging also works for script packages so that we can easily step into the scripts that are brought in using the #load directive.

Remote Scripts

Scripts don't actually have to exist locally on the machine. We can also execute scripts that are made available on an http(s) endpoint.

This means that we can create a Gist on Github and execute it just by providing the URL to the Gist.

This Gist contains a script that prints out "Hello World"

We can execute the script like this

dotnet script https://gist.githubusercontent.com/seesharper/5d6859509ea8364a1fdf66bbf5b7923d/raw/0a32bac2c3ea807f9379a38e251d93e39c8131cb/HelloWorld.csx

That is a pretty long URL, so why not make it a TinyURL like this:

dotnet script https://tinyurl.com/y8cda9zt

Script Location

A pretty common scenario is that we have logic that is relative to the script path. We don't want to require the user to be in a certain directory for these paths to resolve correctly so here is how to provide the script path and the script folder regardless of the current working directory.

using System.Runtime.CompilerServices; // provides [CallerFilePath]

public static string GetScriptPath([CallerFilePath] string path = null) => path;
public static string GetScriptFolder([CallerFilePath] string path = null) => Path.GetDirectoryName(path);

Tip: Put these methods as top level methods in a separate script file and #load that file wherever access to the script path and/or folder is needed.

REPL

This release contains a C# REPL (Read-Evaluate-Print-Loop). The REPL mode ("interactive mode") is started by executing dotnet-script without any arguments.

The interactive mode allows you to supply individual C# code blocks and have them executed as soon as you press Enter. The REPL is configured with the same default set of assembly references and using statements as regular CSX script execution.

Basic usage

Once dotnet-script starts you will see a prompt for input. You can start typing C# code there.

~$ dotnet script
> var x = 1;
> x+x
2

If you submit an unterminated expression into the REPL (no ; at the end), it will be evaluated and the result will be serialized using a formatter and printed in the output. This is a bit more interesting than just calling ToString() on the object, because it attempts to capture the actual structure of the object. For example:

~$ dotnet script
> var x = new List<string>();
> x.Add("foo");
> x
List<string>(1) { "foo" }
> x.Add("bar");
> x
List<string>(2) { "foo", "bar" }
>

Inline Nuget packages

REPL also supports inline Nuget packages - meaning the Nuget packages can be installed into the REPL from within the REPL. This is done via our #r and #load from Nuget support and uses identical syntax.

~$ dotnet script
> #r "nuget: Automapper, 6.1.1"
> using AutoMapper;
> typeof(MapperConfiguration)
[AutoMapper.MapperConfiguration]
> #load "nuget: simple-targets-csx, 6.0.0";
> using static SimpleTargets;
> typeof(TargetDictionary)
[Submission#0+SimpleTargets+TargetDictionary]

Multiline mode

Using Roslyn syntax parsing, we also support multiline REPL mode. This means that if you have an uncompleted code block and press Enter, we will automatically enter the multiline mode. The mode is indicated by the * character. This is particularly useful for declaring classes and other more complex constructs.

~$ dotnet script
> class Foo {
* public string Bar {get; set;}
* }
> var foo = new Foo();
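dotnet-script relies on Roslyn's syntax parsing to decide when input is incomplete. As a much cruder stand-in that still conveys the idea, the Python sketch below counts unmatched opening brackets. It is a heuristic for illustration only: it ignores string literals and comments, and the function names are made up.

```python
def looks_incomplete(code: str) -> bool:
    """Heuristic: input is incomplete while an opening bracket is unclosed."""
    depth = 0
    for char in code:
        if char in "({[":
            depth += 1
        elif char in ")}]":
            depth -= 1
    return depth > 0


def prompt_for(buffer: str) -> str:
    """Choose the prompt for the next line given the text entered so far."""
    return "* " if looks_incomplete(buffer) else "> "
```

After typing class Foo { the buffer has one unclosed brace, so the next prompt is *. Once the closing brace arrives, the prompt returns to >.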

REPL commands

Aside from the regular C# script code, you can invoke the following commands (directives) from within the REPL:

Command  Description
#load    Load a script into the REPL (same as #load usage in CSX)
#r       Load an assembly into the REPL (same as #r usage in CSX)
#reset   Reset the REPL back to initial state (without restarting it)
#cls     Clear the console screen without resetting the REPL state
#exit    Exits the REPL

Seeding REPL with a script

You can execute a CSX script and, at the end of it, drop yourself into the context of the REPL. This way, the REPL becomes "seeded" with your code - all the classes, methods or variables are available in the REPL context. This is achieved by running a script with an -i flag.

For example, given the following CSX script:

var msg = "Hello World";
Console.WriteLine(msg);

When you run this with the -i flag, Hello World is printed, REPL starts and msg variable is available in the REPL context.

~$ dotnet script foo.csx -i
Hello World
>

You can also seed the REPL from inside the REPL - at any point - by invoking a #load directive pointed at a specific file. For example:

~$ dotnet script
> #load "foo.csx"
Hello World
>

Piping

The following example shows how we can pipe data in and out of a script.

The UpperCase.csx script simply converts the standard input to upper case and writes it back out to standard output.

using (var streamReader = new StreamReader(Console.OpenStandardInput()))
{
    Write(streamReader.ReadToEnd().ToUpper());
}

We can now simply pipe the output from one command into our script like this.

echo "This is some text" | dotnet script UpperCase.csx
THIS IS SOME TEXT

Debugging

The first thing we need to do is add the following to the launch.json file, which allows VS Code to debug a running process.

{
    "name": ".NET Core Attach",
    "type": "coreclr",
    "request": "attach",
    "processId": "${command:pickProcess}"
}

To debug this script we need a way to attach the debugger in VS Code and the simplest thing we can do here is to wait for the debugger to attach by adding this method somewhere.

public static void WaitForDebugger()
{
    Console.WriteLine("Attach Debugger (VS Code)");
    while(!Debugger.IsAttached)
    {
    }
}

To debug the script when executing it from the command line we can do something like

WaitForDebugger();
using (var streamReader = new StreamReader(Console.OpenStandardInput()))
{
    Write(streamReader.ReadToEnd().ToUpper()); // <- SET BREAKPOINT HERE
}

Now when we run the script from the command line we will get

$ echo "This is some text" | dotnet script UpperCase.csx
Attach Debugger (VS Code)

This gives us a chance to attach the debugger before stepping into the script. From VS Code, select the .NET Core Attach debugger and pick the process that represents the executing script.

Once that is done we should see our breakpoint being hit.

Configuration(Debug/Release)

By default, scripts will be compiled using the debug configuration. This is to ensure that we can debug a script in VS Code as well as attaching a debugger for long running scripts.

There are however situations where we might need to execute a script that is compiled with the release configuration. For instance, running benchmarks using BenchmarkDotNet is not possible unless the script is compiled with the release configuration.

We can specify this when executing the script.

dotnet script foo.csx -c release

Nullable reference types

Starting from version 0.50.0, dotnet-script supports .NET Core 3.0 and all the C# 8 features. The way we deal with nullable reference types in dotnet-script is that we turn every warning related to nullable reference types into a compiler error. This means every warning between CS8600 and CS8655 is treated as an error when compiling the script.

Nullable reference types are turned off by default, and we enable them using the #nullable enable compiler directive. This means that existing scripts will continue to work, but we can now opt in to this new feature.

#!/usr/bin/env dotnet-script

#nullable enable

string name = null;

Trying to execute the script will result in the following error

main.csx(5,15): error CS8625: Cannot convert null literal to non-nullable reference type.

We will also see this when working with scripts in VS Code under the problems panel.

Download Details:
Author: filipw
Source Code: https://github.com/filipw/dotnet-script
License: MIT License

SAS Vs R: What Is Difference Between R and SAS?

In this article, we will look at SAS vs R and the differences between the two. SAS is losing its footing across industries due to the rise of Shiny, an R package giving users bespoke interactivity on top of their R routines. We’ll discuss SAS vs R programming in the context of the pharmaceutical industry, but the conversation applies to any data science user looking to switch data analytics tooling.

Data science teams are searching for SAS alternatives that can better handle their technical needs while satisfying non-technical personnel with interactive data storytelling. The result of this search boils down to Python vs R Programming. And although there are drag-n-drop BI tools, these solutions do not satisfy custom development, machine learning, and big data handling needs. 

Appsilon is an RStudio (Posit) Full Service Certified Partner. Find out how we can help you with R and Python development services and RStudio discounts.

If your team uses Python and is comfortable with this language, we won’t try to evangelize you. But if you find value in R for data analytics and statistical analysis, we highly recommend exploring the innovations R can provide in your organization.

Breaking down statistical analysis to understand R vs SAS

A statistical analysis has several steps: problem statement, data collection, data wrangling, data analysis, and results-based communication.

The data analysis step usually involves summarizing data using descriptive statistics and applying inferential statistics through hypothesis testing and modeling.

Sharing results is usually done by writing reports. These reports commonly contain different visualizations that help contextualize the analysis, especially for those not involved throughout the analysis process.

To perform this kind of analysis, a data analyst can choose from a variety of tools. But as we mentioned, there are some solutions that work better for your unique case. In this post, we will compare two of them: R and SAS.

What are SAS and R programming?

SAS and R programming are both statistical software used by researchers and data scientists to create statistical data analyses and visualizations.

Let’s begin by introducing the tools:

What is SAS?

SAS is commercial software that can be used to perform advanced analytics, business intelligence, data management, and predictive analytics. You can use SAS software through both a graphical interface and the SAS programming language.

What is SAS programming?

A SAS program is a sequence of steps that you submit to SAS for execution. Each step in the program performs a specific task. Only two kinds of steps make up SAS programs: 

  1. DATA steps: in this step data is created, imported, modified, merged, or calculated.
  2. PROC steps: a group of SAS statements that call and execute a procedure, usually with a SAS data set as input. SAS procedures analyze data in SAS data sets to produce statistics, tables, reports, charts, and plots.

A SAS program can contain a DATA step, a PROC step, or any combination of DATA steps and PROC steps. The number and kind of steps depend on what tasks you need to perform.

SAS program example

The following example uses SAS to compare group means. The idea is to showcase how the code and output look, not to perform a real analysis. The example data set consists of only 6 observations.


* create example dataset;
data patients;
input patient_id treatment $ age;
cards;
1 a 24
2 a 23
3 a 25
4 b 30
5 b 36
6 b 34
;
run;

* compare group means;
ods graphics on;

proc ttest data=patients cochran ci=equal umpu;
   class treatment;
   var age;
run;

ods graphics off;

You can see how we created the data in the DATA step and then called the PROC step to perform our analysis. Here you can explore the options used in the PROC step.

The output of the analysis looks like this:

t-test procedure in sas programming

As we can see, SAS provides a lot of information when you run a PROC. You can draw conclusions from both the tables and charts. It’s standard styling and output with no custom branding.

What is the SAS suite?

The SAS software suite is made of components for data management, advanced analytics, multivariate analysis, and more. Here are a few important components of the SAS suite:

  • Base SAS: Designed for data access, transformation, and reporting.
  • SAS/STAT: Designed to perform statistical analysis.
  • SAS/GRAPH: Data visualization tool to produce graphs.
  • SAS/IML: Interactive Matrix Language. Includes functions for implementing algorithms.

What is R programming compared to SAS?

R is an open-source language and environment for statistical computing and graphics. It provides a wide variety of statistical techniques such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering. Its most popular IDE (Integrated Development Environment) by far is RStudio by RStudio PBC (Posit).

In R, data is stored in objects. These objects can store strings, numeric values, data sets, or anything that can be referenced. To work with these objects we create functions. R has a style of problem-solving centered on functions.

Thinking about switching to R and Shiny? See why you might want to switch to R Shiny for enterprise application development

R packages

In R, code can be shared through packages. A package is a shareable collection of code that is used to perform a desired function or specific task. Examples of packages used for data science include readr, dplyr, tidyr, ggplot2.

Anyone can make a package. Packages can be made publicly available via CRAN (Comprehensive R Archive Network), but you can also create private packages for use within your organization. At the time of writing, there are over 18,000 packages available on CRAN!

Appsilon contributes to open source as well through our Shiny tools. We create packages to help us and other R/Shiny developers build scalable, reproducible, and better-looking Shiny applications.

R Program Example

Let’s reproduce the SAS example using R code. Again, the idea is to showcase how code and output look, and not to focus on the interpretation of results.

Code:


# create example dataset
patients <- data.frame(
  patient_id = 1:6,
  treatment = rep(c("a", "b"), each = 3),
  age = c(24, 23, 25, 30, 36, 34)
)

# compare group means
t.test(age ~ treatment, data = patients)

Console output:

t-test r programming
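R's t.test does not assume equal variances by default (var.equal = FALSE), so the call above runs Welch's two-sample t-test. To show the arithmetic behind it, here is a small Python sketch using only the standard library. The function name is made up for this illustration, and it computes only the t statistic and the Welch-Satterthwaite degrees of freedom, not the p-value or confidence interval.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    vx = variance(x) / len(x)  # squared standard error of the first mean
    vy = variance(y) / len(y)  # squared standard error of the second mean
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Ages from the example data set, split by treatment group
group_a = [24.0, 23.0, 25.0]
group_b = [30.0, 36.0, 34.0]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 2), round(dof, 2))  # -5.03 2.42
```

With only three observations per group, the degrees of freedom drop to about 2.4. This is one reason the example is meant to show code and output rather than support a real conclusion.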

R output is not styled by default. Nevertheless, there are tools you can use to make the output something you can share in a report, for example, RMarkdown.

The only limitation to output styling is your imagination. You can explore some of our Shiny demos from a variety of use cases. 

Should you use SAS or R?

In general, you should at least begin incorporating R programming into your data science toolset whether you work in an enterprise or as a private individual.

Obviously, the real answer depends on your use case. But as mentioned above, SAS is proprietary, commercial software which can be expensive to the average user. R programming is free with plenty of open source tools and frankly outpaces SAS in a lot of aspects.

Curious what R Shiny applications you can build? Explore some of our R Shiny demos to see what you could create.

In this section, we’ll compare the two tools on topics that will affect your choice of SAS or R within an enterprise.

Access to New Developments in R vs SAS

Open-source software acceptance has increased in recent years. People working together in the community allows for quicker “to market” solutions. It also creates open access to see what lies underneath the hood. There’s no guesswork in how algorithms work or if it’s the best for your case.

These new algorithms are developed and shared with the community. Implementation of them in SAS takes longer than in R. This means that more advanced data science techniques might be available right now in R but not yet available in SAS.
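
To make the "under the hood" point concrete, here is a short sketch of how you can read the source of an R function straight from the console. We use stats::sd only because it is short; the same approach works for most functions written in R.

```r
# Printing a function without parentheses shows its R source
print(stats::sd)

# The body can also be inspected programmatically
sd_source <- deparse(body(stats::sd))

# sd() is implemented as the square root of var(), which the source confirms
uses_var <- any(grepl("var(", sd_source, fixed = TRUE))
print(uses_var)
```

Functions implemented in compiled code show a call into C or Fortran instead, but that source is also publicly available.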

Collaboration in R

File sharing and collaboration are easier with R. If you want to share with a friend or colleague something you developed using SAS, that person requires access to the software – which is licensed. Even though there are some free versions, they require setting up an account which might be something you want to avoid. R is easily downloaded and installed so you can quickly set it up and run code. You can also quickly publish a dashboard on the web using Shiny.
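
To give an idea of how little code a basic dashboard needs, here is a minimal sketch of a Shiny app, assuming the shiny package is installed. The input and output IDs (bins, age_hist) and the toy ages are illustrative choices for this example only.

```r
library(shiny)

# User interface: a slider and a plot
ui <- fluidPage(
  titlePanel("Age distribution"),
  sliderInput("bins", "Number of bins:", min = 3, max = 15, value = 9),
  plotOutput("age_hist")
)

# Server logic: redraw the histogram whenever the slider changes
server <- function(input, output, session) {
  ages <- c(24, 23, 25, 30, 36, 34)  # toy data
  output$age_hist <- renderPlot({
    # hist() treats a single number for breaks as a suggestion
    hist(ages, breaks = input$bins, main = "Age", xlab = "Age (years)")
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive session, start the app with: runApp(app)
```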

Cost of SAS vs R for data science teams

SAS is commercial software, meaning that you must pay to play, so to speak. SAS licenses are known to be expensive, which makes it difficult for individuals and small businesses to use or scale.

On the other hand, R is open source. In other words, it’s free to use. Anyone can download it and start using it.

Should you learn R or SAS?

Whether or not you have experience with programming, we recommend learning R first. It’s easy to get started, free, and there are lots of freely accessible learning materials.

If you have experience using programming languages, switching to a different language is a matter of learning how to do the things you know, in another place. It usually depends on the resources available for learning. 

R educational resources

R has a lot of free online resources to get started (e.g., Hands-on Programming with R and R for Data Science). You can also find books on the different topics you want to learn (e.g., reporting and creating web applications). Books are not the only resources available. Join RStudio’s webinars to learn from and connect with R users in industry!

SAS educational resources

SAS offers courses to learn its software. It also has extensive documentation. Another thing worth mentioning about SAS is that it offers some products that don’t require knowing how to code (e.g., SAS Enterprise Guide). These tools access the functionality of SAS from a point-and-click Windows interface.

Functionality of SAS compared to R

The following table compares how SAS and R work.

SAS | R
Data steps | Expressions with functions
Procedures | Expressions with functions
Macros | Expressed in R functions
SAS Functions | R functions
SAS ODS (Output Delivery System) | R Markdown, R Quarto
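
To illustrate the "Macros" row, here is a sketch of how a reusable, parameterized step that might be a macro in SAS can be written as an ordinary R function. The helper name summarise_by_group and its arguments are hypothetical, made up for this example.

```r
# Hypothetical helper: takes a data frame plus the names of a value column
# and a grouping column, and returns one summary row per group
summarise_by_group <- function(data, value_col, group_col) {
  stats <- aggregate(
    data[[value_col]],
    by = list(group = data[[group_col]]),
    FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
  )
  # aggregate() returns the statistics as a matrix column named x;
  # flatten it into regular columns
  data.frame(group = stats$group, stats$x)
}

patients <- data.frame(
  treatment = rep(c("a", "b"), each = 3),
  age = c(24, 23, 25, 30, 36, 34)
)

age_summary <- summarise_by_group(patients, "age", "treatment")
print(age_summary)
```

Because the column names are passed as arguments, the same function can be reused on other datasets and variables, much like calling a macro with different parameters.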

Hiring SAS developers vs R developers

Over the last decade, universities have begun to shift from teaching SAS to R. Even domain-specific stats courses tend to use R and train on the RStudio IDE. This means that the R talent pool has increased and will continue to do so in the future.

With that being said, R is not as popular as Python among developers. The TIOBE Index for 2022 indicates Python is king of the hill at #1 (R is #16, and SAS is a lowly #26). So if stacking your team is a priority and you’re already using Python routines for your analytics, stick with Python.

Data from PYPL Index

In case you didn’t know, you can now use Python on RStudio. If you need help setting up your environment with your preferred language on RStudio platforms, contact Appsilon. We’re RStudio Certified Partners and can help.

R programming outsourcing and R Shiny consulting

A growing number of R-based consultancies are popping up as companies expand their data science teams and need R Shiny developers to meet more complex data handling and visualization requirements.

At Appsilon, we’ve been creating, maintaining, and developing Shiny applications for enterprise customers all over the world for many years now. Appsilon provides scalability, security, and modern UI/UX with custom R packages that native Shiny apps do not provide. Our team is among the world’s foremost experts in R Shiny and has made a variety of Shiny innovations (including scaling to 700 users!) over the years.

Appsilon is also a proud RStudio (Posit) Full Service Certified Partner, meaning we can help you throughout the entire process of implementing and scaling RStudio (Posit) products and simplify your data-driven decision-making.

Some of the services we provide as Shiny consultants include:

  • Rapid dashboard development
  • Support of full-stack engineers (from setting up a shiny server to UX optimization)
  • DevOps support & advisory for all RStudio products
  • Machine learning solutions
  • Advanced statistical models

We deliver world-class Shiny applications faster than other vendors, ultimately lowering the overall cost of development and improving time to deployment. We use continuous collaboration with clients, end-to-end testing, and automated processes to streamline development. Our team can step in at every phase of a Shiny project, from business analysis and data science consulting to code refactoring.

R programming packages vs SAS tools

As mentioned, R packages can be developed by anyone. Even though there is no guarantee that they will work as expected, a package that is used by a lot of people is usually something safe to use. The reason is the following: suppose that a package has a bug the creator wasn’t aware of. People start using that package and someone identifies that problem. That person shares that with the creator (and the community) so that it can be fixed. Even someone other than the creator of the package can help code a solution!

In R, there is something called the tidyverse, a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
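
As a taste of that shared grammar, here is a short sketch using dplyr (one of the tidyverse packages, assumed to be installed) on a toy dataset. Each verb takes a data frame and returns a data frame, which is what lets the steps chain together.

```r
library(dplyr)

# Toy dataset for illustration
patients <- data.frame(
  treatment = rep(c("a", "b"), each = 3),
  age = c(24, 23, 25, 30, 36, 34)
)

age_by_treatment <- patients |>
  group_by(treatment) |>                  # one group per treatment
  summarise(
    n = n(),                              # patients per group
    mean_age = mean(age),
    .groups = "drop"
  ) |>
  arrange(desc(mean_age))                 # oldest group first

print(age_by_treatment)
```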

On the other hand, if you detect a problem in SAS, you have to contact SAS and wait for a new release that contains a fix. This might be another reason why things take longer to implement in SAS.

Creating visualizations in R and SAS

R provides a variety of packages to create custom charts, both static (ggplot2) and dynamic (plotly, highcharter). Here you can see lots of different visualizations you can make with R and the code used to produce them!

SAS data visualization features are more limited than R’s and don’t offer as much customization.

SAS Example

In this example, we will create another example dataset. We will create a histogram to understand age distribution by treatment group. SAS’ PROC UNIVARIATE also provides more information which we will not show here because the goal is to see the chart.


* create example dataset;
data patients;
input treatment $ age sex $;
cards;
a 24 m
a 23 m
a 25 m
a 21 m
a 22 f
a 22 f
a 23 f
a 28 f
a 21 f
a 20 f
a 29 f
a 18 f
a 30 f
a 23 f
a 25 f
a 24 f
a 23 f
a 25 f
b 30 f
b 36 f
b 34 f
b 31 f
b 32 m
b 32 m
b 34 m
b 33 m
b 34 m
b 30 m
b 28 m
b 33 m
b 40 m
b 22 m
b 29 m
;
run;

/*create histogram for age variable by treatment*/
proc univariate data=patients;
    class treatment;
    var age;
    histogram age / overlay;
run;

Output:

R Example

We will produce the same chart with R code.


# load libraries
library(tibble)
library(ggplot2)

# create example dataset
patients <- tibble::tribble(
  ~treatment, ~age, ~sex,
  "a", 24, "m",
  "a", 23, "m",
  "a", 25, "m",
  "a", 21, "m",
  "a", 22, "f",
  "a", 22, "f",
  "a", 23, "f",
  "a", 28, "f",
  "a", 21, "f",
  "a", 20, "f",
  "a", 29, "f",
  "a", 18, "f",
  "a", 30, "f",
  "a", 23, "f",
  "a", 25, "f",
  "a", 24, "f",
  "a", 23, "f",
  "a", 25, "f",
  "b", 30, "f",
  "b", 36, "f",
  "b", 34, "f",
  "b", 31, "f",
  "b", 32, "m",
  "b", 32, "m",
  "b", 34, "m",
  "b", 33, "m",
  "b", 34, "m",
  "b", 30, "m",
  "b", 28, "m",
  "b", 33, "m",
  "b", 40, "m",
  "b", 22, "m",
  "b", 29, "m"
)

# create chart
ggplot(data = patients, aes(x = age, fill = treatment)) +
  geom_histogram(position = "identity", 
                 alpha = 0.5, 
                 bins = 9,
                 color = "black") +
  labs(
    title = "Distribution of age by treatment",
    x = "Age (years)",
    y = "Number of Patients",
    fill = "Treatment"
  ) +
  theme_minimal() +
  theme(
    legend.position = "top"
  )

We see how with just a few lines of code we are able to create a beautiful chart using R. And we’re able to provide more customization to better suit our needs.
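
As one example of further customization, here is a sketch that splits the histogram into one panel per sex using facet_wrap(), assuming ggplot2 is installed. It uses only a handful of rows from the dataset above to keep the example short, so the resulting chart is illustrative rather than a reproduction of the full figure.

```r
library(ggplot2)

# A few rows taken from the example dataset, for brevity
patients <- data.frame(
  treatment = rep(c("a", "b"), each = 4),
  age = c(24, 23, 22, 28, 30, 36, 32, 34),
  sex = c("m", "m", "f", "f", "f", "f", "m", "m")
)

p <- ggplot(patients, aes(x = age, fill = treatment)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 5, color = "black") +
  facet_wrap(~ sex) +  # one panel per value of sex
  labs(
    x = "Age (years)",
    y = "Number of Patients",
    fill = "Treatment"
  ) +
  theme_minimal()

# In an interactive session, print(p) draws the chart
```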

Support in SAS vs R

SAS provides Technical Support and has Documentation with information about everything you can do and how things are implemented in the software.

R doesn’t have Technical Support (it is open source) but it has a large community you can reach out to for help. Packages are usually well documented and come with excellent tutorials (called Vignettes) with examples. If you’re used to Python, you’ll be pleasantly surprised by the quality of documentation that is standard for the R ecosystem (e.g., dplyr vignette, tidyr vignette).

Should you use SAS or R for Clinical Data Science?

SAS is great when you need minimal output or sequential processing. But R offers greater flexibility. And with the recent successes of the R Consortium and enhanced collaboration with the FDA, R is trending toward higher standardization to satisfy regulatory needs.

In this section, we will explore how to solve a particular problem using each software. We will use R through RStudio IDE and SAS using SAS On Demand (which is a free version).

Problem Statement

We want to analyze the effect of different variables on mortality due to a particular disease. In particular, we want to understand the differences between treatment application times (no treatment, fast treatment, slow treatment). To do so, we will create a logistic regression model.

Data

The example dataset was created to showcase the steps that are usually part of an analysis. It is based on real data, but it has been anonymized and some information was removed from the file (such as disease and treatment names). The focus here is on the how, not the what.

Data is in .csv format and contains the following information about patients:

  • ID: Patient identifier.
  • AGE: Age of the patient measured in years.
  • SEX: Sex of the patient (F = Female, M = Male).
  • CHARLSON: Charlson score of the patient.
  • PITT: Pitt bacteremia score of the patient.
  • SURVIVED: Indicator variable. 1 means the patient was cured.
  • DIED_OF_DISEASE: Indicator variable. 1 means the patient died due to the disease.
  • DIED_OTHER_CAUSE: Indicator variable. 1 means the patient died due to another cause.
  • UNKNOWN: Indicator variable. 1 means status was lost for the patient.
  • TREATMENT: Indicator variable. 1 means the patient received treatment.
  • TREATMENT_FAST: Indicator variable. 1 means treatment was applied within 48 hours.
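
Since the original file is not distributed with this article, here is a sketch that generates a synthetic stand-in with the same columns, so the code in the following sections can be run. Every value and probability below is made up and has no clinical meaning; the column name DIED_OTHER_CAUSE follows the wrangling code used later.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 200

# One mutually exclusive outcome per patient (probabilities are made up)
status <- sample(
  c("SURVIVED", "DIED_OF_DISEASE", "DIED_OTHER_CAUSE", "UNKNOWN"),
  size = n, replace = TRUE, prob = c(0.60, 0.25, 0.10, 0.05)
)
treated <- rbinom(n, 1, 0.8)

synthetic <- data.frame(
  ID = seq_len(n),
  AGE = sample(18:90, n, replace = TRUE),
  SEX = sample(c("F", "M"), n, replace = TRUE),
  CHARLSON = sample(0:10, n, replace = TRUE),
  PITT = sample(0:8, n, replace = TRUE),
  SURVIVED = as.integer(status == "SURVIVED"),
  DIED_OF_DISEASE = as.integer(status == "DIED_OF_DISEASE"),
  DIED_OTHER_CAUSE = as.integer(status == "DIED_OTHER_CAUSE"),
  UNKNOWN = as.integer(status == "UNKNOWN"),
  TREATMENT = treated,
  # Fast treatment is only possible for treated patients
  TREATMENT_FAST = treated * rbinom(n, 1, 0.5)
)

# Write to a temporary location; adjust the path as needed
csv_path <- file.path(tempdir(), "data.csv")
write.csv(synthetic, csv_path, row.names = FALSE)
```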

In the following sections we will see how to perform different tasks with SAS and R:

  • Read data
  • Wrangle data (apply filters, add and remove columns)
  • Create a logistic regression model
  • Explore outputs

Working with SAS vs R

Reading data in SAS vs R programming

To start working with data, first, we need to have access to it.

SAS:


* READ DATA;
FILENAME REFFILE '/home/u4729884/data.csv';
PROC IMPORT DATAFILE=REFFILE
	DBMS=CSV
	OUT=WORK.RAW_DATA;
	GETNAMES=YES;
RUN;

R programming:


# Read data
raw_data <- read.csv("data.csv")
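
A slightly more defensive version of the read step is sketched below. It writes a two-row stand-in file so the example runs on its own, pins the type of SEX (a column containing only "F" values would otherwise be read as logical FALSE), and fails early if an expected column is missing. The stand-in file and the shortened column list are assumptions for this example.

```r
# Write a two-row stand-in file so the example runs on its own
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,AGE,SEX,UNKNOWN",
  "1,64,F,0",
  "2,51,M,1"
), csv_path)

# Read data, forcing SEX to be character
raw_data <- read.csv(csv_path, colClasses = c(SEX = "character"))

# Fail early if an expected column is missing
expected_cols <- c("ID", "AGE", "SEX", "UNKNOWN")
missing_cols <- setdiff(expected_cols, names(raw_data))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Quick look at column types and first values
str(raw_data)
```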

Data Wrangling in SAS vs R programming

Now that the data is available, we will prepare it for the model. In this section we will show how to:

  • Remove rows corresponding to patients with unknown status
  • Remove rows corresponding to patients that died due to another cause
  • Create a new treatment variable using values found in other columns
  • Create new indicator variables for age, Charlson and Pitt using specific cutpoints
  • Select columns

This is not the only way one could process the data. We choose to do it this way for simplicity. Feel free to try something different and share your results!

SAS:


* DATA WRANGLING;
DATA WORK.DATA_TO_MODEL (KEEP = AGE_60
                                SEX
                                CHARLSON_4
                                PITT_3
                                TREATMENT_CATEGORY
                                DIED_OF_DISEASE); * KEEP VARS OF INTEREST;
	SET WORK.RAW_DATA;
	
	* REMOVE PATIENTS WITH UNKNOWN STATUS;
	IF UNKNOWN = 1 THEN DELETE;
	* REMOVE PATIENTS THAT DIED DUE TO OTHER CAUSE;
	IF DIED_OTHER_CAUSE = 1 THEN DELETE;
	
	* CREATE TREATMENT FACTOR VARIABLE;
	LENGTH TREATMENT_CATEGORY $14;
	IF TREATMENT = 0 THEN TREATMENT_CATEGORY = "NO TREATMENT";
	ELSE IF TREATMENT_FAST = 1 THEN TREATMENT_CATEGORY = "FAST TREATMENT";
	ELSE TREATMENT_CATEGORY = "SLOW TREATMENT";
	* NEW AGE VARIABLE;
	IF AGE >= 60 THEN AGE_60 = 1;
	ELSE AGE_60 = 0;
	* NEW CHARLSON VARIABLE;
	IF CHARLSON > 4 THEN CHARLSON_4 = 1;
	ELSE CHARLSON_4 = 0;
	* NEW PITT VARIABLE;
	IF PITT > 3 THEN PITT_3 = 1;
	ELSE PITT_3 = 0;
RUN;

R programming:


# Load required library
library(dplyr)
# Data wrangling
data_to_model <- raw_data |>
  # Filter rows
  filter(
    UNKNOWN != 1,
    DIED_OTHER_CAUSE != 1
  ) |>
  # Create new columns
  mutate(
    TREATMENT_CATEGORY = case_when(
      TREATMENT == 0 ~ "NO TREATMENT",
      TREATMENT_FAST == 1 ~ "FAST TREATMENT",
      TRUE ~ "SLOW TREATMENT"
    ),
    AGE_60 = ifelse(AGE >= 60, 1, 0),
    CHARLSON_4 = ifelse(CHARLSON > 4, 1, 0),
    PITT_3 = ifelse(PITT > 3, 1, 0)
  ) |>
  # Select columns
  select(
    AGE_60,
    CHARLSON_4,
    PITT_3,
    TREATMENT_CATEGORY,
    DIED_OF_DISEASE
  )
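
As an example of "something different", here is a sketch of the same wrangling steps written in base R, without dplyr. The four-row raw_data below is a made-up stand-in so the example is self-contained.

```r
# Tiny stand-in for raw_data; values are illustrative only
raw_data <- data.frame(
  AGE = c(72, 45, 60, 38),
  CHARLSON = c(5, 2, 4, 6),
  PITT = c(4, 1, 3, 2),
  UNKNOWN = c(0, 0, 1, 0),
  DIED_OTHER_CAUSE = c(0, 0, 0, 0),
  DIED_OF_DISEASE = c(1, 0, 0, 0),
  TREATMENT = c(1, 0, 1, 1),
  TREATMENT_FAST = c(1, 0, 0, 0)
)

# Filter rows
kept <- raw_data[raw_data$UNKNOWN != 1 & raw_data$DIED_OTHER_CAUSE != 1, ]

# Create new columns
kept$TREATMENT_CATEGORY <- ifelse(
  kept$TREATMENT == 0, "NO TREATMENT",
  ifelse(kept$TREATMENT_FAST == 1, "FAST TREATMENT", "SLOW TREATMENT")
)
kept$AGE_60 <- as.integer(kept$AGE >= 60)
kept$CHARLSON_4 <- as.integer(kept$CHARLSON > 4)
kept$PITT_3 <- as.integer(kept$PITT > 3)

# Select columns
data_to_model <- kept[, c("AGE_60", "CHARLSON_4", "PITT_3",
                          "TREATMENT_CATEGORY", "DIED_OF_DISEASE")]
print(data_to_model)
```

The base R version needs no extra packages, at the cost of repeating the data frame name in every step.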

Modeling data in SAS vs R programming

Once data is processed, we are ready to model. We will create a logistic regression model where we will model the probability of dying due to the disease. As explanatory variables we will include:

  • Dichotomized age (cutpoint: 60 years; the indicator is 1 for age >= 60, so the baseline is age < 60)
  • Dichotomized Charlson score (cutpoint: score of 4; the indicator is 1 for score > 4, so the baseline is score <= 4)
  • Dichotomized Pitt score (cutpoint: score of 3; the indicator is 1 for score > 3, so the baseline is score <= 3)
  • Treatment (factor with three levels: no treatment, fast treatment, slow treatment). We will use fast treatment as the reference.
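
A note on reference levels in R: when a character column enters a model, R turns it into a factor whose levels are sorted alphabetically, and the first level becomes the reference. The sketch below shows why "FAST TREATMENT" ends up as the reference without extra code, and how relevel() could be used to pick a different one. The choice of "NO TREATMENT" as the alternative is only for illustration.

```r
treatment <- c("NO TREATMENT", "FAST TREATMENT",
               "SLOW TREATMENT", "FAST TREATMENT")

# Default factor levels are sorted alphabetically,
# so "FAST TREATMENT" comes first and acts as the reference
default_levels <- levels(factor(treatment))
print(default_levels)

# relevel() moves the chosen level to the front, making it the reference
treatment_factor <- relevel(factor(treatment), ref = "NO TREATMENT")
print(levels(treatment_factor))
```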

SAS:


* MODELING;
PROC LOGISTIC DATA = WORK.DATA_TO_MODEL DESCENDING;
CLASS TREATMENT_CATEGORY (REF = "FAST TREATMENT") SEX (REF = "F") / PARAM = REFERENCE;
MODEL DIED_OF_DISEASE = AGE_60 CHARLSON_4 PITT_3 TREATMENT_CATEGORY / LINK = LOGIT SCALE = NONE; 
RUN;

R programming:


# Create model
model <- glm(formula =  DIED_OF_DISEASE ~ .,
             data = data_to_model,
             family = binomial)

# Explore results
summary(model)

# Get odds ratio
exp(cbind(coef(model), confint(model, level = 0.95)))
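
One detail to keep in mind when comparing interval estimates: to our knowledge, confint() on a glm object computes profile-likelihood intervals, while the odds ratio table in PROC LOGISTIC reports Wald limits by default, so the two can differ slightly. The sketch below computes Wald intervals in R with confint.default(). The toy data, seed, and effect size are made up for illustration.

```r
set.seed(1)  # arbitrary seed
n <- 300
toy <- data.frame(AGE_60 = rbinom(n, 1, 0.5))
# Made-up effect: higher risk when AGE_60 is 1
toy$DIED_OF_DISEASE <- rbinom(n, 1, plogis(-1 + 0.8 * toy$AGE_60))

toy_model <- glm(DIED_OF_DISEASE ~ AGE_60, data = toy, family = binomial)

# Wald intervals: estimate +/- z * standard error, then exponentiate
odds_ratios <- data.frame(
  odds_ratio = exp(coef(toy_model)),
  exp(confint.default(toy_model, level = 0.95)),
  check.names = FALSE  # keep the "2.5 %" and "97.5 %" column names
)
print(odds_ratios)
```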

Exploring model results in SAS vs R programming

SAS:

Let’s focus on these two tables:

Here we can see the coefficients of the model, their significance levels, and the translation to odds ratio estimates (which are more interpretable when doing logistic regression).

We will not dive into details for explaining model assessment or results interpretation. The idea here is to show where this information is available and how to get it using code.

R programming:

In R, we have to compute the odds ratios from the model coefficients. This is what the exp(cbind(coef(model), confint(model, level = 0.95))) call in the modeling code above does.

Final comparison of SAS vs R programming

General remarks:

  • With both SAS and R we arrived at the same results.
  • SAS code requires the use of a semicolon ( ; ) to end statements as well as an explicit RUN to run code. This makes SAS more prone to distraction errors.
  • By default, SAS shows more information in its output and has some styling.

Detailed comparison:

  • Reading data. Importing data with SAS involves more lines of code, with somewhat cryptic parameters. In R, it was straightforwardly done with a function available by default when installing R.
  • Data wrangling. SAS syntax seems more complicated than R. In R, thanks to the use of the pipe operator, code is easier to read. Also, we can follow what is going on in each step. In SAS, some parts are defined at the beginning (for example, the columns to keep). Also, when creating the new treatment variable, in SAS we had to define its type before being able to create it, which might seem counter-intuitive.
  • Modeling. Again, SAS code seems more verbose than R for this task. One thing to mention here is that in both tools there are multiple ways to get to the same result, so another person might write the code more succinctly.
  • Exploring results. Even though SAS output looks nicer, maybe you want to write a report using part of those results. Right now, it would imply copying and pasting tables in a document and sharing it. Regarding R, we have shown what the output looks like in the console. With R you can create reports with code in the same file using R Markdown. One more thing to mention is that SAS by default computed the odds ratio. If you want to access those values in R you need to apply computations using the model results.

Should You Choose SAS or R? (Conclusion)

If you’re looking to keep pace within your industry or create faster tooling and PoCs for your team, you should consider switching to R programming. SAS still holds value for a lot of users, but R and its open source packages are becoming the standard for the new workforce. Don’t get left behind!


Original article sourced at: https://appsilon.com
