SAS is losing its footing across industries due to the rise of Shiny, an R package that gives users bespoke interactivity on top of their R routines. In this article, we'll discuss SAS vs R programming in the context of the pharmaceutical industry, but the comparison applies to any data science team looking to switch analytics tooling.
Data science teams are searching for SAS alternatives that can better handle their technical needs while satisfying non-technical personnel with interactive data storytelling. The result of this search boils down to Python vs R Programming. And although there are drag-n-drop BI tools, these solutions do not satisfy custom development, machine learning, and big data handling needs.
Appsilon is an RStudio (Posit) Full Service Certified Partner. Find out how we can help you with R and Python development services and RStudio discounts.
If your team uses Python and is comfortable with that language, we won't try to convert you. But if you find value in R for data analytics and statistical analysis, we highly recommend exploring the innovations R can bring to your organization.
A statistical analysis has several steps: problem statement, data collection, data wrangling, data analysis, and results-based communication.
The data analysis step usually involves summarizing data using descriptive statistics and applying inferential statistics through hypothesis testing and modeling.
Sharing results is usually done by writing reports. These reports commonly contain visualizations that help contextualize the analysis, especially for readers who were not involved in the analysis process.
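As a concrete illustration of the summarizing step, descriptive statistics for a small sample can be computed with Python's standard `statistics` module. The ages below are hypothetical values, used only to show the mechanics:

```python
import statistics

# Hypothetical ages from a small patient sample
ages = [24, 23, 25, 30, 36, 34]

mean_age = statistics.mean(ages)      # central tendency
median_age = statistics.median(ages)  # robust central tendency
sd_age = statistics.stdev(ages)       # sample standard deviation (n - 1)

print(mean_age, median_age, sd_age)
```

Inferential statistics (hypothesis tests, models) then build on summaries like these.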
To perform this kind of analysis, a data analyst can choose from a variety of tools. But as we mentioned, there are some solutions that work better for your unique case. In this post, we will compare two of them: R and SAS.
SAS and R programming are both statistical software used by researchers and data scientists to create statistical data analyses and visualizations.
Let’s begin by introducing the tools:
SAS is commercial software that can be used to perform advanced analytics, business intelligence, data management, and predictive analytics. You can use SAS software through both a graphical interface and the SAS programming language.
A SAS program is a sequence of steps that you submit to SAS for execution, each performing a specific task. Only two kinds of steps make up SAS programs: DATA steps and PROC steps. A program can contain a single DATA step, a single PROC step, or any combination of the two; the number and kind of steps depend on the tasks you need to perform.
The following example uses SAS to compare group means. The idea is to showcase how the code and output look, not to perform a real analysis. The example data set consists of only 6 observations.
* create example dataset;
data patients;
input patient_id treatment $ age;
cards;
1 a 24
2 a 23
3 a 25
4 b 30
5 b 36
6 b 34
;
run;
* compare group means;
ods graphics on;
proc ttest cochran ci=equal umpu;
class treatment;
var age;
run;
ods graphics off;
You can see how we created the data in the DATA step and then called the PROC step to perform our analysis. Here you can explore the options used in the PROC step.
The output of the analysis looks like this:
As we can see, SAS provides a lot of information when you run a PROC. You can draw conclusions from both the tables and charts. It’s standard styling and output with no custom branding.
The SAS software suite is made up of components for data management, advanced analytics, multivariate analysis, and more. Here are a few important components of the SAS suite:
R is an open-source language and environment for statistical computing and graphics. It provides a wide variety of statistical techniques such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering. Its most popular IDE (Integrated Development Environment) by far is RStudio by RStudio PBC (Posit).
In R, data is stored in objects. These objects can store strings, numeric values, data sets, or anything that can be referenced. To work with these objects we create functions. R has a style of problem-solving centered on functions.
Thinking about switching to R and Shiny? See why you might want to switch to R Shiny for enterprise application development.
In R, code can be shared through packages. A package is a shareable collection of code that is used to perform a desired function or specific task. Examples of packages used for data science include readr, dplyr, tidyr, ggplot2.
Anyone can make a package. Packages can be made publicly available via CRAN (the Comprehensive R Archive Network), but you can also create private packages for use within your organization. As of this writing, there are over 18,000 packages available on CRAN!
Appsilon contributes to open source as well through our Shiny tools. We create packages that help us and other R/Shiny developers build scalable, reproducible, and better-looking Shiny applications.
Let’s reproduce the SAS example using R code. Again, the idea is to showcase how code and output look, and not to focus on the interpretation of results.
Code:
# create example dataset
patients <- data.frame(
patient_id = 1:6,
treatment = rep(c("a", "b"), each = 3),
age = c(24, 23, 25, 30, 36, 34)
)
# compare group means
t.test(age ~ treatment, data = patients)
Console output:
R output is not styled by default. Nevertheless, there are tools that let you turn the output into something you can share in a report, for example, R Markdown.
The only limitation to output styling is your imagination. You can explore some of our Shiny demos from a variety of use cases.
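Since the broader decision often comes down to Python vs R (as noted earlier), it can help to see the same two-sample comparison outside both tools. Below is a minimal sketch of Welch's two-sample t statistic in plain Python, using the same six hypothetical observations; a real analysis would use a statistics library rather than hand-rolled code:

```python
import math

# Same hypothetical data as the SAS/R examples above
group_a = [24, 23, 25]
group_b = [30, 36, 34]

def welch_t(x, y):
    """Welch's two-sample t statistic (unequal variances)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Sample variances (n - 1 in the denominator)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return (mx - my) / se

t_stat = welch_t(group_a, group_b)
print(round(t_stat, 4))  # t is approximately -5.03
```

This is the same statistic that R's `t.test` (which defaults to the Welch variant) computes for these groups.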
In general, you should at least begin incorporating R programming into your data science toolset whether you work in an enterprise or as a private individual.
Obviously, the real answer depends on your use case. But as mentioned above, SAS is proprietary, commercial software that can be expensive for the average user. R is free, comes with plenty of open-source tools, and frankly outpaces SAS in many respects.
Curious what R Shiny applications you can build? Explore some of our R Shiny demos to see what you could create.
In this section, we'll compare both tools on topics that will affect your choice of SAS or R within an enterprise.
Open-source software acceptance has increased in recent years. Community collaboration brings solutions to market faster, and open access to the source code means there's no guesswork about how an algorithm works or whether it's the best fit for your case.
New algorithms are developed and shared with the community, and implementing them in SAS takes longer than in R. This means that more advanced data science techniques may be available right now in R but not yet in SAS.
File sharing and collaboration are easier with R. If you want to share something you developed in SAS with a friend or colleague, that person needs access to the licensed software. There are some free versions, but they require setting up an account, which you might want to avoid. R is easy to download and install, so you can quickly set it up and run code. You can also quickly publish a dashboard on the web using Shiny.
Get your data story into the hands of colleagues quickly using these top 3 methods for sharing R Shiny apps.
SAS is commercial software: you must pay to play, so to speak. SAS licenses are known to be expensive, which makes it difficult for individuals and small businesses to use or scale.
On the other hand, R is open source. In other words, it’s free to use. Anyone can download it and start using it.
Join the Shiny movement and develop your own R Shiny dashboard in less than 10 minutes!
Whether or not you have experience with programming, we recommend learning R first. It’s easy to get started, free, and there are lots of freely accessible learning materials.
If you have experience using programming languages, switching to a different language is a matter of learning how to do the things you know, in another place. It usually depends on the resources available for learning.
Need a Shiny dashboard now? Download our free Shiny templates and get started today!
R has a lot of free online resources to get started (e.g., Hands-On Programming with R and R for Data Science). You can also find books on specific topics (e.g., reporting and creating web applications). Books are not the only resources available: join RStudio's webinars and connect with R users in industry!
SAS offers courses for learning its software and has extensive documentation. SAS also offers products that don't require knowing how to code (e.g., SAS Enterprise Guide); these tools access the functionality of SAS from a point-and-click Windows interface.
The following table compares how SAS and R work.
SAS | R |
---|---|
Data steps | Expressions with functions |
Procedures | Expressions with functions |
Macros | Expressed as R functions |
SAS functions | R functions |
SAS ODS (Output Delivery System) | R Markdown, Quarto |
Over the last decade, universities have begun to shift from teaching SAS to R. Even domain-specific stats courses tend to use R and train on the RStudio IDE. This means that the R talent pool has increased and will continue to do so in the future.
That being said, R is not as popular as Python among developers. The TIOBE Index for 2022 puts Python at #1 (R is #16, and SAS a lowly #26). So if staffing your team is a priority and you're already using Python routines for your analytics, stick with Python.
Data from PYPL Index
If you don’t know already, you can now use Python on RStudio. If you need help setting up your environment with your preferred language on RStudio platforms, contact Appsilon. We’re RStudio Certified Partners and can help you with
A growing number of R-based consultancies are popping up as companies expand their data science teams and need R Shiny developers to handle more complex data handling and visualization requirements.
At Appsilon, we’ve been creating, maintaining, and developing Shiny applications for enterprise customers all over the world for many years now. Appsilon provides scalability, security, and modern UI/UX with custom R packages that native Shiny apps do not provide. Our team is among the world’s foremost experts in R Shiny and has made a variety of Shiny innovations (including scaling to 700 users!) over the years.
Appsilon is also a proud RStudio (Posit) Full Service Certified Partner, meaning we can help you throughout the entire process of implementing and scaling RStudio (Posit) products and simplify your data-driven decision-making.
Some of the services we, as Shiny consultants, provide include:
We deliver world-class Shiny applications faster than other vendors, ultimately lowering the overall cost of development and improving time to deployment. We use continuous collaboration with clients, end-to-end testing, and automated processes to streamline development. Our team can step in at every phase of a Shiny project, from business analysis and data science consulting to code refactoring.
As mentioned, R packages can be developed by anyone. Although there is no guarantee that a package will work as expected, one that is used by many people is usually safe to use. The reason: suppose a package has a bug the creator wasn't aware of. People start using the package, someone identifies the problem and reports it to the creator (and the community) so it can be fixed. Someone other than the creator can even contribute the fix!
In R, there is something called the tidyverse, a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
On the other hand, if you detect a problem in SAS, you have to report it to SAS and wait for a new release with the fix. This may be another reason why things take longer to implement in SAS.
R provides a variety of packages to create custom charts, both static (ggplot2) and dynamic (plotly, highcharter). Here you can see lots of different visualizations you can make with R and the code used to produce them!
SAS data visualization features are more limited than R and don’t provide as much customization.
In this example, we will create another example data set and build a histogram to understand the age distribution by treatment group. SAS's PROC UNIVARIATE also produces additional output, which we won't show here because the goal is the chart.
* create example dataset;
data patients;
input treatment $ age sex $;
cards;
a 24 m
a 23 m
a 25 m
a 21 m
a 22 f
a 22 f
a 23 f
a 28 f
a 21 f
a 20 f
a 29 f
a 18 f
a 30 f
a 23 f
a 25 f
a 24 f
a 23 f
a 25 f
b 30 f
b 36 f
b 34 f
b 31 f
b 32 m
b 32 m
b 34 m
b 33 m
b 34 m
b 30 m
b 28 m
b 33 m
b 40 m
b 22 m
b 29 m
;
run;
/*create histogram for age variable by treatment*/
proc univariate data=patients;
class treatment;
var age;
histogram age / overlay;
run;
Output:
We will produce the same chart with R code.
# load libraries
library(tibble)
library(ggplot2)
# create example dataset
patients <- tibble::tribble(
~treatment, ~age, ~sex,
"a", 24, "m",
"a", 23, "m",
"a", 25, "m",
"a", 21, "m",
"a", 22, "f",
"a", 22, "f",
"a", 23, "f",
"a", 28, "f",
"a", 21, "f",
"a", 20, "f",
"a", 29, "f",
"a", 18, "f",
"a", 30, "f",
"a", 23, "f",
"a", 25, "f",
"a", 24, "f",
"a", 23, "f",
"a", 25, "f",
"b", 30, "f",
"b", 36, "f",
"b", 34, "f",
"b", 31, "f",
"b", 32, "m",
"b", 32, "m",
"b", 34, "m",
"b", 33, "m",
"b", 34, "m",
"b", 30, "m",
"b", 28, "m",
"b", 33, "m",
"b", 40, "m",
"b", 22, "m",
"b", 29, "m"
)
# create chart
ggplot(data = patients, aes(x = age, fill = treatment)) +
geom_histogram(position = "identity",
alpha = 0.5,
bins = 9,
color = "black") +
labs(
title = "Distribution of age by treatment",
x = "Age (years)",
y = "Number of Patients",
fill = "Treatment"
) +
theme_minimal() +
theme(
legend.position = "top"
)
With just a few lines of code, we are able to create a beautiful chart in R, and we can customize it further to suit our needs.
SAS provides Technical Support and has Documentation with information about everything you can do and how things are implemented in the software.
R doesn’t have Technical Support (it is open source) but it has a large community you can reach out to for help. Packages are usually well documented and come with excellent tutorials (called Vignettes) with examples. If you’re used to Python, you’ll be pleasantly surprised by the quality of documentation that is standard for the R ecosystem (e.g., dplyr vignette, tidyr vignette).
SAS is great when you need minimal output or sequential processing, but R offers greater flexibility. And with the recent successes of the R Consortium and enhanced collaboration with the FDA, R is trending toward higher standardization to satisfy regulatory needs.
In this section, we will explore how to solve a particular problem using each tool. We will use R through the RStudio IDE and SAS through SAS OnDemand (a free version).
We want to analyze the effect of different variables on mortality due to a particular disease. In particular, we want to understand the differences between treatment application times (no treatment, fast treatment, slow treatment). To do so, we will create a logistic regression model.
The example dataset was created to showcase how to perform the different steps that are usually part of an analysis. It is based on real data, but it has been anonymized and some information was removed from the file (such as disease and treatment names). The focus here is on the how not the what.
Data is in .csv format and contains the following information about patients:
In the following sections, we will see how to perform different tasks with SAS and R:
To start working with data, first, we need to have access to it.
* READ DATA;
FILENAME REFFILE '/home/u4729884/data.csv';
PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=WORK.RAW_DATA;
GETNAMES=YES;
RUN;
# Read data
raw_data <- read.csv("data.csv")
Now that we have the data available, we will prepare it for the model. In this section we will show how to:
This is not the only way one could process the data. We choose to do it this way for simplicity. Feel free to try something different and share your results!
* DATA WRANGLING;
DATA WORK.DATA_TO_MODEL (KEEP = AGE_60
SEX
CHARLSON_4
PITT_3
TREATMENT_CATEGORY
DIED_OF_DISEASE); * KEEP VARS OF INTEREST;
SET WORK.RAW_DATA;
* REMOVE PATIENTS WITH UNKNOWN STATUS;
IF UNKNOWN = 1 THEN DELETE;
* REMOVE PATIENTS THAT DIED DUE TO OTHER CAUSE;
IF DIED_OTHER_CAUSE = 1 THEN DELETE;
* CREATE TREATMENT FACTOR VARIABLE;
LENGTH TREATMENT_CATEGORY $14;
IF TREATMENT = 0 THEN TREATMENT_CATEGORY = "NO TREATMENT";
ELSE IF TREATMENT_FAST = 1 THEN TREATMENT_CATEGORY = "FAST TREATMENT";
ELSE TREATMENT_CATEGORY = "SLOW TREATMENT";
* NEW AGE VARIABLE;
IF AGE >= 60 THEN AGE_60 = 1;
ELSE AGE_60 = 0;
* NEW CHARLSON VARIABLE;
IF CHARLSON > 4 THEN CHARLSON_4 = 1;
ELSE CHARLSON_4 = 0;
* NEW PITT VARIABLE;
IF PITT > 3 THEN PITT_3 = 1;
ELSE PITT_3 = 0;
RUN;
# Load required library
library(dplyr)
# Data wrangling
data_to_model <- raw_data |>
# Filter rows
filter(
UNKNOWN != 1,
DIED_OTHER_CAUSE != 1
) |>
# Create new columns
mutate(
TREATMENT_CATEGORY = case_when(
TREATMENT == 0 ~ "NO TREATMENT",
TREATMENT_FAST == 1 ~ "FAST TREATMENT",
TRUE ~ "SLOW TREATMENT"
),
AGE_60 = ifelse(AGE >= 60, 1, 0),
CHARLSON_4 = ifelse(CHARLSON > 4, 1, 0),
PITT_3 = ifelse(PITT > 3, 1, 0)
) |>
# Select columns
select(
AGE_60,
CHARLSON_4,
PITT_3,
TREATMENT_CATEGORY,
DIED_OF_DISEASE
)
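For readers weighing Python as well, the same filter/derive/select pipeline can be sketched with plain Python dictionaries (a real pipeline would likely use pandas). The two rows below are made up purely for illustration and are not from the article's data set:

```python
# Hypothetical raw rows mirroring the columns used above
raw_data = [
    {"UNKNOWN": 0, "DIED_OTHER_CAUSE": 0, "TREATMENT": 0, "TREATMENT_FAST": 0,
     "AGE": 65, "CHARLSON": 5, "PITT": 2, "SEX": "F", "DIED_OF_DISEASE": 1},
    {"UNKNOWN": 1, "DIED_OTHER_CAUSE": 0, "TREATMENT": 1, "TREATMENT_FAST": 1,
     "AGE": 40, "CHARLSON": 2, "PITT": 4, "SEX": "M", "DIED_OF_DISEASE": 0},
]

def wrangle(row):
    """Derive the model variables for one patient record."""
    if row["TREATMENT"] == 0:
        category = "NO TREATMENT"
    elif row["TREATMENT_FAST"] == 1:
        category = "FAST TREATMENT"
    else:
        category = "SLOW TREATMENT"
    return {
        "AGE_60": 1 if row["AGE"] >= 60 else 0,
        "SEX": row["SEX"],
        "CHARLSON_4": 1 if row["CHARLSON"] > 4 else 0,
        "PITT_3": 1 if row["PITT"] > 3 else 0,
        "TREATMENT_CATEGORY": category,
        "DIED_OF_DISEASE": row["DIED_OF_DISEASE"],
    }

# Filter out unknown status and deaths from other causes, then derive columns
data_to_model = [
    wrangle(r) for r in raw_data
    if r["UNKNOWN"] != 1 and r["DIED_OTHER_CAUSE"] != 1
]
```

The logic mirrors the SAS DATA step and the dplyr pipeline above: filter first, then derive the categorical and indicator variables.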
Once data is processed, we are ready to model. We will create a logistic regression model where we will model the probability of dying due to the disease. As explanatory variables we will include:
* MODELING;
PROC LOGISTIC DATA = WORK.DATA_TO_MODEL DESCENDING;
CLASS TREATMENT_CATEGORY (REF = "FAST TREATMENT") SEX (REF = "F") / PARAM = REFERENCE;
MODEL DIED_OF_DISEASE = AGE_60 CHARLSON_4 PITT_3 TREATMENT_CATEGORY / LINK = LOGIT SCALE = NONE;
RUN;
# Create model
model <- glm(formula = DIED_OF_DISEASE ~ .,
data = data_to_model,
family = binomial)
# Explore results
summary(model)
# Get odds ratio
exp(cbind(coef(model), confint(model, level = 0.95)))
Let’s focus on these two tables:
Here we can see the coefficients of the model, their significance levels, and the translation to odds ratio estimates (which are more interpretable when doing logistic regression).
We will not dive into details for explaining model assessment or results interpretation. The idea here is to show where this information is available and how to get it using code.
In R, we have to compute the odds ratios ourselves by exponentiating the model coefficients, which is what the `exp(cbind(...))` call above does.
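The coefficient-to-odds-ratio conversion itself is just exponentiation. With hypothetical coefficient values (not the ones estimated from this model), the arithmetic looks like:

```python
import math

# Hypothetical logistic regression coefficients (log-odds scale)
coefficients = {
    "AGE_60": 0.6931,   # exp(0.6931) is about 2.0: roughly doubles the odds
    "PITT_3": -0.2231,  # exp(-0.2231) is about 0.8: lowers the odds
}

# Odds ratio = exp(coefficient)
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, odds in odds_ratios.items():
    print(f"{name}: OR = {odds:.2f}")
```

An odds ratio above 1 means the variable increases the odds of the outcome; below 1, it decreases them.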
General remarks:
Detailed comparison:
If you’re looking to keep pace within your industry or create faster tooling and PoCs for your team, you should consider switching to R programming. SAS still holds value for a lot of users, but R and its open source packages are becoming the standard for the new workforce. Don’t get left behind!
Original article sourced at: https://appsilon.com
A cross-platform command line REPL for the rapid experimentation and exploration of C#. It supports intellisense, installing NuGet packages, and referencing local .NET projects and assemblies.
C# REPL provides the following features:
C# REPL is a .NET 6 global tool, and runs on Windows 10, Mac OS, and Linux. It can be installed via:
dotnet tool install -g csharprepl
If you're running on Mac OS Catalina (10.15) or later, make sure you follow any additional directions printed to the screen. You may need to update your PATH variable in order to use .NET global tools.
After installation is complete, run `csharprepl` to begin. C# REPL can be updated via `dotnet tool update -g csharprepl`.
Run `csharprepl` from the command line to begin an interactive session. The default colorscheme uses the color palette defined by your terminal, but these colors can be changed using a `theme.json` file provided as a command line argument.
Type some C# into the prompt and press Enter to run it. The result, if any, will be printed:
> Console.WriteLine("Hello World")
Hello World
> DateTime.Now.AddDays(8)
[6/7/2021 5:13:00 PM]
To evaluate multiple lines of code, use Shift+Enter to insert a newline:
> var x = 5;
var y = 8;
x * y
40
Additionally, if the statement is not a "complete statement" a newline will automatically be inserted when Enter is pressed. For example, in the below code, the first line is not a syntactically complete statement, so when we press enter we'll go down to a new line:
> if (x == 5)
| // caret position, after we press Enter on Line 1
Finally, pressing Ctrl+Enter will show a "detailed view" of the result. For example, for the `DateTime.Now` expression below, on the first line we pressed Enter, and on the second line we pressed Ctrl+Enter to view more detailed output:
> DateTime.Now // Pressing Enter shows a reasonable representation
[5/30/2021 5:13:00 PM]
> DateTime.Now // Pressing Ctrl+Enter shows a detailed representation
[5/30/2021 5:13:00 PM] {
Date: [5/30/2021 12:00:00 AM],
Day: 30,
DayOfWeek: Sunday,
DayOfYear: 150,
Hour: 17,
InternalKind: 9223372036854775808,
InternalTicks: 637579915804530992,
Kind: Local,
Millisecond: 453,
Minute: 13,
Month: 5,
Second: 0,
Ticks: 637579915804530992,
TimeOfDay: [17:13:00.4530992],
Year: 2021,
_dateData: 9860951952659306800
}
A note on semicolons: C# expressions do not require semicolons, but statements do. If a statement is missing a required semicolon, a newline will be added instead of trying to run the syntactically incomplete statement; simply type the semicolon to complete the statement.
> var now = DateTime.Now; // assignment statement, semicolon required
> DateTime.Now.AddDays(8) // expression, we don't need a semicolon
[6/7/2021 5:03:05 PM]
Use the `#r` command to add assembly or NuGet references:

- `#r "AssemblyName"` or `#r "path/to/assembly.dll"` to reference an assembly.
- `#r "path/to/project.csproj"` to reference a project. Solution files (.sln) can also be referenced.
- `#r "nuget: PackageName"` to install the latest version of a package, or `#r "nuget: PackageName, 13.0.5"` to install a specific version (13.0.5 in this case).

To run ASP.NET applications inside the REPL, start the `csharprepl` application with the `--framework` parameter, specifying the `Microsoft.AspNetCore.App` shared framework. Then, use the above `#r` command to reference the application DLL. See the Command Line Configuration section below for more details.
csharprepl --framework Microsoft.AspNetCore.App
The C# REPL supports multiple configuration flags to control startup, behavior, and appearance:
csharprepl [OPTIONS] [response-file.rsp] [script-file.csx] [-- <additional-arguments>]
Supported options are:
- `-r <dll>` or `--reference <dll>`: Reference an assembly, project file, or NuGet package. Can be specified multiple times. Uses the same syntax as `#r` statements inside the REPL. For example, `csharprepl -r "nuget:Newtonsoft.Json" "path/to/myproj.csproj"`
- `-u <namespace>` or `--using <namespace>`: Add a using statement. Can be specified multiple times.
- `-f <framework>` or `--framework <framework>`: Reference a shared framework. The available shared frameworks depend on the local .NET installation, and can be useful when running an ASP.NET application from the REPL (e.g., `Microsoft.AspNetCore.App`).
- `-t <theme.json>` or `--theme <theme.json>`: Read a theme file for syntax highlighting. This theme file associates C# syntax classifications with colors. The color values can be full RGB, or ANSI color names (defined in your terminal's theme). The NO_COLOR standard is supported.
- `--trace`: Produce a trace file in the current directory that logs CSharpRepl internals. Useful for CSharpRepl bug reports.
- `-v` or `--version`: Show version number and exit.
- `-h` or `--help`: Show help and exit.
- `response-file.rsp`: A filepath of an .rsp file, containing any of the above command line options.
- `script-file.csx`: A filepath of a .csx file, containing lines of C# to evaluate before starting the REPL. Arguments to this script can be passed as `<additional-arguments>`, after a double hyphen (`--`), and will be available in a global `args` variable.
variable.If you have dotnet-suggest
enabled, all options can be tab-completed, including values provided to --framework
and .NET namespaces provided to --using
.
C# REPL is a standalone software application, but it can be useful to integrate it with other developer tools:
To add C# REPL as a menu entry in Windows Terminal, add the following profile to Windows Terminal's `settings.json` configuration file (under the JSON property `profiles.list`):
{
"name": "C# REPL",
"commandline": "csharprepl"
},
To get the exact colors shown in the screenshots in this README, install the Windows Terminal Dracula theme.
To use the C# REPL with Visual Studio Code, simply run the `csharprepl` command in the Visual Studio Code terminal. To send commands to the REPL, use the built-in `Terminal: Run Selected Text In Active Terminal` command from the Command Palette (`workbench.action.terminal.runSelectedText`).
To add the C# REPL to the Windows Start Menu for quick access, you can run the following PowerShell command, which will start C# REPL in Windows Terminal:
$shell = New-Object -ComObject WScript.Shell
$shortcut = $shell.CreateShortcut("$env:appdata\Microsoft\Windows\Start Menu\Programs\csharprepl.lnk")
$shortcut.TargetPath = "wt.exe"
$shortcut.Arguments = "-w 0 nt csharprepl.exe"
$shortcut.Save()
You may also wish to add a shorter alias for C# REPL, which can be done by creating a `.cmd` file somewhere on your path. For example, put the following contents in `C:\Users\username\.dotnet\tools\csr.cmd`:
wt -w 0 nt csharprepl
This will allow you to launch C# REPL by running `csr` from anywhere that accepts Windows commands, like the Windows Run dialog.
This project is far from being the first REPL for C#. Here are some other projects; if this project doesn't suit you, another one might!
Visual Studio's C# Interactive pane is full-featured (it has syntax highlighting and intellisense) and is part of Visual Studio. This deep integration with Visual Studio is both a benefit from a workflow perspective, and a drawback as it's not cross-platform. As far as I know, the C# Interactive pane does not support NuGet packages or navigating to documentation/source code. Subjectively, it does not follow typical command line keybindings, so can feel a bit foreign.
csi.exe ships with C# and is a command line REPL. It's great because it's a cross platform REPL that comes out of the box, but it doesn't support syntax highlighting or autocompletion.
dotnet script allows you to run C# scripts from the command line. It has a REPL built-in, but the predominant focus seems to be as a script runner. It's a great tool, though, and has a strong community following.
dotnet interactive is a tool from Microsoft that creates a Jupyter notebook for C#, runnable through Visual Studio Code. It also provides a general framework useful for running REPLs.
Download Details:
Author: waf
Source Code: https://github.com/waf/CSharpRepl
License: MPL-2.0 License
I currently lead a research group with data scientists who use both R and Python. I have been in this field for over 14 years. I have witnessed the growth of both languages over the years and there is now a thriving community behind both.
I did not have a straightforward journey and learned many things the hard way. You, however, can avoid the mistakes I made, take a more focused and rewarding path, and reach your goals quicker than others.
Before I dive in, let's get something out of the way: R and Python are just tools to do the same thing, data science. Neither tool is inherently better than the other, and both have been evolving for years (and will likely continue to do so).
Therefore, the short answer on whether you should learn Python or R is: it depends.
The longer answer, if you can spare a few minutes, will help you focus on what really matters and avoid the most common mistakes most enthusiastic beginners aspiring to become expert data scientists make.
Run C# scripts from the .NET CLI, define NuGet packages inline and edit/debug them in VS Code - all of that with full language services support from OmniSharp.
Name | Version | Framework(s) |
---|---|---|
dotnet-script (global tool) | | net6.0, net5.0, netcoreapp3.1 |
Dotnet.Script (CLI as Nuget) | | net6.0, net5.0, netcoreapp3.1 |
Dotnet.Script.Core | | netcoreapp3.1, netstandard2.0 |
Dotnet.Script.DependencyModel | | netstandard2.0 |
Dotnet.Script.DependencyModel.Nuget | | netstandard2.0 |
The only thing we need to install is .NET Core 3.1 or .NET 5.0 SDK.
.NET Core 2.1 introduced the concept of global tools, meaning that you can install `dotnet-script` using nothing but the .NET CLI.
dotnet tool install -g dotnet-script
You can invoke the tool using the following command: dotnet-script
Tool 'dotnet-script' (version '0.22.0') was successfully installed.
The advantage of this approach is that you can use the same command for installation across all platforms. .NET Core SDK also supports viewing a list of installed tools and their uninstallation.
dotnet tool list -g
Package Id Version Commands
---------------------------------------------
dotnet-script 0.22.0 dotnet-script
dotnet tool uninstall dotnet-script -g
Tool 'dotnet-script' (version '0.22.0') was successfully uninstalled.
choco install dotnet.script
We also provide a PowerShell script for installation.
(new-object Net.WebClient).DownloadString("https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.ps1") | iex
curl -s https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.sh | bash
If permission is denied, we can try with sudo:
curl -s https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.sh | sudo bash
A Dockerfile for running dotnet-script in a Linux container is available. Build:
cd build
docker build -t dotnet-script -f Dockerfile ..
And run:
docker run -it dotnet-script --version
You can manually download all the releases in `zip` format from the GitHub releases page.
Our typical `helloworld.csx` might look like this:
Console.WriteLine("Hello world!");
That is all it takes and we can execute the script. Args are accessible via the global Args array.
dotnet script helloworld.csx
Simply create a folder somewhere on your system and issue the following command.
dotnet script init
This will create `main.csx` along with the launch configuration needed to debug the script in VS Code.
.
├── .vscode
│ └── launch.json
├── main.csx
└── omnisharp.json
We can also initialize a folder using a custom filename.
dotnet script init custom.csx
Instead of `main.csx`, which is the default, we now have a file named `custom.csx`.
.
├── .vscode
│ └── launch.json
├── custom.csx
└── omnisharp.json
Note: Executing `dotnet script init` inside a folder that already contains one or more script files will not create the `main.csx` file.
Scripts can be executed directly from the shell as if they were executables.
foo.csx arg1 arg2 arg3
OSX/Linux
Just like all scripts, on OSX/Linux you need to have a `#!` directive and mark the file as executable via `chmod +x foo.csx`. If you use `dotnet script init` to create your csx, it will automatically have the `#!` directive and be marked as executable.
The OSX/Linux shebang directive should be #!/usr/bin/env dotnet-script
#!/usr/bin/env dotnet-script
Console.WriteLine("Hello world");
You can execute your script using dotnet script or dotnet-script, which allows you to pass arguments to control your script execution.
foo.csx arg1 arg2 arg3
dotnet script foo.csx -- arg1 arg2 arg3
dotnet-script foo.csx -- arg1 arg2 arg3
All arguments after --
are passed to the script in the following way:
dotnet script foo.csx -- arg1 arg2 arg3
Then you can access the arguments in the script context using the global Args
collection:
foreach (var arg in Args)
{
Console.WriteLine(arg);
}
All arguments before --
are processed by dotnet script
. For example, the following command-line
dotnet script -d foo.csx -- -d
will pass the -d
before --
to dotnet script
and enable the debug mode whereas the -d
after --
is passed to script for its own interpretation of the argument.
dotnet script
has built-in support for referencing NuGet packages directly from within the script.
#r "nuget: AutoMapper, 6.1.0"
Note: Omnisharp needs to be restarted after adding a new package reference
We can define package sources using a NuGet.Config
file in the script root folder. In addition to being used during execution of the script, it will also be used by OmniSharp
that provides language services for packages resolved from these package sources.
As an alternative to maintaining a local NuGet.Config
file we can define these package sources globally either at the user level or at the computer level as described in Configuring NuGet Behaviour
It is also possible to specify package sources when executing the script.
dotnet script foo.csx -s https://SomePackageSource
Multiple package sources can be specified like this:
dotnet script foo.csx -s https://SomePackageSource -s https://AnotherPackageSource
Dotnet-Script can create a standalone executable or DLL for your script.
Switch | Long switch | Description |
---|---|---|
-o | --output | Directory where the published executable should be placed. Defaults to a 'publish' folder in the current directory. |
-n | --name | The name for the generated DLL (executable not supported at this time). Defaults to the name of the script. |
 | --dll | Publish to a .dll instead of an executable. |
-c | --configuration | Configuration to use for publishing the script [Release/Debug]. Default is "Debug". |
-d | --debug | Enables debug output. |
-r | --runtime | The runtime used when publishing the self-contained executable. Defaults to your current runtime. |
The executable can be run directly, independent of a dotnet installation, while the DLL can be run using the dotnet CLI like this:
dotnet script exec {path_to_dll} -- arg1 arg2
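Putting the switches above together, a publish invocation might look like the following (a sketch assuming the publish sub-command; the file names and output path are illustrative):

```shell
# Publish foo.csx as a release-mode DLL into ./out
dotnet script publish foo.csx --dll -o ./out -c release

# Then execute the resulting DLL with the dotnet CLI
dotnet script exec ./out/foo.dll -- arg1 arg2
```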
We provide two types of caching, the dependency cache
and the execution cache
which is explained in detail below. In order for any of these caches to be enabled, it is required that all NuGet package references are specified using an exact version number. The reason for this constraint is that we need to make sure that we don't execute a script with a stale dependency graph.
In order to resolve the dependencies for a script, a dotnet restore is executed under the hood to produce a project.assets.json file from which we can figure out all the dependencies we need to add to the compilation. This is an out-of-process operation and represents a significant overhead to the script execution. So this cache works by looking at all the dependencies specified in the script(s), either in the form of NuGet package references or assembly file references. If these dependencies match the dependencies from the last script execution, we skip the restore and read the dependencies from the already generated project.assets.json file. If any of the dependencies have changed, we must restore again to obtain the new dependency graph.
In order to execute a script it needs to be compiled first and since that is a CPU and time consuming operation, we make sure that we only compile when the source code has changed. This works by creating a SHA256 hash from all the script files involved in the execution. This hash is written to a temporary location along with the DLL that represents the result of the script compilation. When a script is executed the hash is computed and compared with the hash from the previous compilation. If they match there is no need to recompile and we run from the already compiled DLL. If the hashes don't match, the cache is invalidated and we recompile.
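The execution-cache check described above can be sketched roughly like this (a simplified illustration with hypothetical helper names, not dotnet-script's actual implementation):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static string ComputeScriptHash(IEnumerable<string> scriptFiles)
{
    // Hash the combined contents of every script file involved in the execution
    var combined = string.Concat(scriptFiles.OrderBy(f => f).Select(File.ReadAllText));
    using var sha = SHA256.Create();
    return Convert.ToHexString(sha.ComputeHash(Encoding.UTF8.GetBytes(combined)));
}

static bool NeedsRecompilation(string cacheDir, IEnumerable<string> scriptFiles)
{
    var hashFile = Path.Combine(cacheDir, "script.sha256");
    var current = ComputeScriptHash(scriptFiles);
    // Recompile only when no previous hash exists or the sources have changed
    return !File.Exists(hashFile) || File.ReadAllText(hashFile) != current;
}
```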
You can override this automatic caching by passing --no-cache flag, which will bypass both caches and cause dependency resolution and script compilation to happen every time we execute the script.
The temporary location used for caches is a sub-directory named dotnet-script under (in order of priority):

1. DOTNET_SCRIPT_CACHE_LOCATION, if defined and its value is not empty
2. $XDG_CACHE_HOME if defined, otherwise $HOME/.cache
3. ~/Library/Caches
4. Path.GetTempPath for the platform

The days of debugging scripts using Console.WriteLine
are over. One major feature of dotnet script
is the ability to debug scripts directly in VS Code. Just set a breakpoint anywhere in your script file(s) and hit F5(start debugging)
Script packages are a way of organizing reusable scripts into NuGet packages that can be consumed by other scripts. This means that we now can leverage scripting infrastructure without the need for any kind of bootstrapping.
A script package is just a regular NuGet package that contains script files inside the content
or contentFiles
folder.
The following example shows how the scripts are laid out inside the NuGet package according to the standard convention .
└── contentFiles
└── csx
└── netstandard2.0
└── main.csx
This example contains just the main.csx
file in the root folder, but packages may have multiple script files either in the root folder or in subfolders below the root folder.
When loading a script package we will look for an entry point script to be loaded. This entry point script is identified by a main.csx file in the root folder. If the entry point script cannot be determined, we will simply load all the script files in the package.
The advantage with using an entry point script is that we can control loading other scripts from the package.
To consume a script package, all we need to do is specify the NuGet package in the #load
directive.
The following example loads the simple-targets package that contains script files to be included in our script.
#load "nuget:simple-targets-csx, 6.0.0"
using static SimpleTargets;
var targets = new TargetDictionary();
targets.Add("default", () => Console.WriteLine("Hello, world!"));
Run(Args, targets);
Note: Debugging also works for script packages so that we can easily step into the scripts that are brought in using the
#load
directive.
Scripts don't actually have to exist locally on the machine. We can also execute scripts that are made available on an http(s)
endpoint.
This means that we can create a Gist on Github and execute it just by providing the URL to the Gist.
This Gist contains a script that prints out "Hello World"
We can execute the script like this
dotnet script https://gist.githubusercontent.com/seesharper/5d6859509ea8364a1fdf66bbf5b7923d/raw/0a32bac2c3ea807f9379a38e251d93e39c8131cb/HelloWorld.csx
That is a pretty long URL, so why not make it a TinyURL, like this:
dotnet script https://tinyurl.com/y8cda9zt
A pretty common scenario is that we have logic that is relative to the script path. We don't want to require the user to be in a certain directory for these paths to resolve correctly so here is how to provide the script path and the script folder regardless of the current working directory.
public static string GetScriptPath([CallerFilePath] string path = null) => path;
public static string GetScriptFolder([CallerFilePath] string path = null) => Path.GetDirectoryName(path);
Tip: Put these methods as top level methods in a separate script file and
#load
that file wherever access to the script path and/or folder is needed.
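For example, a script could then resolve a file that ships alongside it (the file names here are hypothetical):

```csharp
#load "scriptpath.csx" // hypothetical file containing the helpers above

// Resolve data.csv relative to the script itself, not the current working directory
var dataFile = Path.Combine(GetScriptFolder(), "data.csv");
Console.WriteLine(dataFile);
```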
This release contains a C# REPL (Read-Evaluate-Print-Loop). The REPL mode ("interactive mode") is started by executing dotnet-script
without any arguments.
The interactive mode allows you to supply individual C# code blocks and have them executed as soon as you press Enter. The REPL is configured with the same default set of assembly references and using statements as regular CSX script execution.
Once dotnet-script
starts you will see a prompt for input. You can start typing C# code there.
~$ dotnet script
> var x = 1;
> x+x
2
If you submit an unterminated expression into the REPL (no ;
at the end), it will be evaluated and the result will be serialized using a formatter and printed in the output. This is a bit more interesting than just calling ToString()
on the object, because it attempts to capture the actual structure of the object. For example:
~$ dotnet script
> var x = new List<string>();
> x.Add("foo");
> x
List<string>(1) { "foo" }
> x.Add("bar");
> x
List<string>(2) { "foo", "bar" }
>
The REPL also supports inline NuGet packages, meaning NuGet packages can be installed into the REPL from within the REPL. This is done via our #r and #load from NuGet support and uses identical syntax.
~$ dotnet script
> #r "nuget: Automapper, 6.1.1"
> using AutoMapper;
> typeof(MapperConfiguration)
[AutoMapper.MapperConfiguration]
> #load "nuget: simple-targets-csx, 6.0.0";
> using static SimpleTargets;
> typeof(TargetDictionary)
[Submission#0+SimpleTargets+TargetDictionary]
Using Roslyn syntax parsing, we also support multiline REPL mode. This means that if you have an incomplete code block and press Enter, we will automatically enter multiline mode. The mode is indicated by the *
character. This is particularly useful for declaring classes and other more complex constructs.
~$ dotnet script
> class Foo {
* public string Bar {get; set;}
* }
> var foo = new Foo();
Aside from the regular C# script code, you can invoke the following commands (directives) from within the REPL:
Command | Description |
---|---|
#load | Load a script into the REPL (same as #load usage in CSX) |
#r | Load an assembly into the REPL (same as #r usage in CSX) |
#reset | Reset the REPL back to initial state (without restarting it) |
#cls | Clear the console screen without resetting the REPL state |
#exit | Exits the REPL |
You can execute a CSX script and, at the end of it, drop yourself into the context of the REPL. This way, the REPL becomes "seeded" with your code - all the classes, methods or variables are available in the REPL context. This is achieved by running a script with an -i
flag.
For example, given the following CSX script:
var msg = "Hello World";
Console.WriteLine(msg);
When you run this with the -i
flag, Hello World
is printed, REPL starts and msg
variable is available in the REPL context.
~$ dotnet script foo.csx -i
Hello World
>
You can also seed the REPL from inside the REPL - at any point - by invoking a #load
directive pointed at a specific file. For example:
~$ dotnet script
> #load "foo.csx"
Hello World
>
The following example shows how we can pipe data in and out of a script.
The UpperCase.csx
script simply converts the standard input to upper case and writes it back out to standard output.
using (var streamReader = new StreamReader(Console.OpenStandardInput()))
{
Write(streamReader.ReadToEnd().ToUpper());
}
We can now simply pipe the output from one command into our script like this.
echo "This is some text" | dotnet script UpperCase.csx
THIS IS SOME TEXT
The first thing we need to do is add the following to the launch.json file, which allows VS Code to debug a running process.
{
"name": ".NET Core Attach",
"type": "coreclr",
"request": "attach",
"processId": "${command:pickProcess}"
}
To debug this script we need a way to attach the debugger in VS Code and the simplest thing we can do here is to wait for the debugger to attach by adding this method somewhere.
public static void WaitForDebugger()
{
Console.WriteLine("Attach Debugger (VS Code)");
while(!Debugger.IsAttached)
{
}
}
To debug the script when executing it from the command line we can do something like
WaitForDebugger();
using (var streamReader = new StreamReader(Console.OpenStandardInput()))
{
Write(streamReader.ReadToEnd().ToUpper()); // <- SET BREAKPOINT HERE
}
Now when we run the script from the command line we will get
$ echo "This is some text" | dotnet script UpperCase.csx
Attach Debugger (VS Code)
This now gives us a chance to attach the debugger before stepping into the script and from VS Code, select the .NET Core Attach
debugger and pick the process that represents the executing script.
Once that is done we should see our breakpoint being hit.
By default, scripts will be compiled using the debug
configuration. This is to ensure that we can debug a script in VS Code as well as attaching a debugger for long running scripts.
There are however situations where we might need to execute a script that is compiled with the release
configuration. For instance, running benchmarks using BenchmarkDotNet is not possible unless the script is compiled with the release
configuration.
We can specify this when executing the script.
dotnet script foo.csx -c release
Starting from version 0.50.0, dotnet-script
supports .NET Core 3.0 and all the C# 8 features. The way we deal with nullable reference types in dotnet-script is that we turn every warning related to nullable reference types into a compiler error. This means every warning between CS8600 and CS8655 is treated as an error when compiling the script.
Nullable reference types are turned off by default, and the way we enable them is using the #nullable enable
compiler directive. This means that existing scripts will continue to work, but we can now opt-in on this new feature.
#!/usr/bin/env dotnet-script
#nullable enable
string name = null;
Trying to execute the script will result in the following error
main.csx(5,15): error CS8625: Cannot convert null literal to non-nullable reference type.
We will also see this when working with scripts in VS Code under the problems panel.
Download Details:
Author: filipw
Source Code: https://github.com/filipw/dotnet-script
License: MIT License
Let’s begin by introducing the tools:
SAS is commercial software that can be used to perform advanced analytics, business intelligence, data management, and predictive analytics. You can use SAS software through both a graphical interface and the SAS programming language.
A SAS program is a sequence of steps that you submit to SAS for execution. Each step in the program performs a specific task. Only two kinds of steps make up SAS programs: DATA steps and PROC steps.
A SAS program can contain a DATA step, a PROC step, or any combination of DATA steps and PROC steps. The number and kind of steps depend on what tasks you need to perform.
The following example uses SAS to compare group means. The idea is to showcase how the code and output look, not to perform a real analysis. The example dataset consists of only 6 observations.
* create example dataset;
data patients;
input patient_id treatment $ age;
cards;
1 a 24
2 a 23
3 a 25
4 b 30
5 b 36
6 b 34
;
run;
* compare group means;
ods graphics on;
proc ttest cochran ci=equal umpu;
class treatment;
var age;
run;
ods graphics off;
You can see how we created the data in the DATA step and then called the PROC step to perform our analysis. Here you can explore the options used in the PROC step.
The output of the analysis looks like this:
As we can see, SAS provides a lot of information when you run a PROC. You can draw conclusions from both the tables and charts. It’s standard styling and output with no custom branding.
The SAS software suite is made up of components for data management, advanced analytics, multivariate analysis, and more.
R is an open-source language and environment for statistical computing and graphics. It provides a wide variety of statistical techniques such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering. Its most popular IDE (Integrated Development Environment) by far is RStudio by RStudio PBC (Posit).
In R, data is stored in objects. These objects can store strings, numeric values, data sets, or anything that can be referenced. To work with these objects we create functions. R has a style of problem-solving centered on functions.
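As a minimal sketch of this object-and-function style, a numeric vector is an object and mean() is a function operating on it:

```r
# Objects store data; functions do the work
ages <- c(24, 23, 25, 30, 36, 34)  # a numeric vector object
mean(ages)                          # 28.66667
```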
Thinking about switching to R and Shiny? See why you might want to switch to R Shiny for enterprise application development.
In R, code can be shared through packages. A package is a shareable collection of code that is used to perform a desired function or specific task. Examples of packages used for data science include readr, dplyr, tidyr, ggplot2.
Anyone can make a package. They can be made publicly available via CRAN (Comprehensive R Archive Network), but you can also create private packages to use within your organization. As of this writing, there are over 18,000 packages available on CRAN!
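Installing a CRAN package and loading it for use takes two function calls; for example, with dplyr:

```r
install.packages("dplyr")  # download from CRAN (once per machine)
library(dplyr)             # attach the package for the current session
```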
Appsilon contributes to open source as well through our Shiny tools. We create packages to help us, and other R/Shiny developers build scalable, reproducible, and better-looking Shiny applications.
Let’s reproduce the SAS example using R code. Again, the idea is to showcase how code and output look, and not to focus on the interpretation of results.
Code:
# create example dataset
patients <- data.frame(
patient_id = 1:6,
treatment = rep(c("a", "b"), each = 3),
age = c(24, 23, 25, 30, 36, 34)
)
# compare group means
t.test(age ~ treatment, data = patients)
Console output:
R output is not styled by default. Nevertheless, there are tools you can use to make the output something you can share in a report, for example, RMarkdown.
The only limitation to output styling is your imagination. You can explore some of our Shiny demos from a variety of use cases.
In general, you should at least begin incorporating R programming into your data science toolset whether you work in an enterprise or as a private individual.
Obviously, the real answer depends on your use case. But as mentioned above, SAS is proprietary, commercial software which can be expensive to the average user. R programming is free with plenty of open source tools and frankly outpaces SAS in a lot of aspects.
Curious what R Shiny applications you can build? Explore some of our R Shiny demos to see what you could create.
In this section, we'll compare both tools on different topics that will affect your choice of SAS or R within an enterprise.
Open-source software acceptance has increased in recent years. People working together in the community allow for quicker to-market solutions. It also creates open access to see what lies under the hood. There’s no guesswork about how an algorithm works or whether it’s the best for your case.
These new algorithms are developed and shared with the community. Implementing them in SAS takes longer than in R, which means more advanced data science techniques might be available right now in R but not yet in SAS.
File sharing and collaboration are easier with R. If you want to share with a friend or colleague something you developed using SAS, that person requires access to the software – which is licensed. Even though there are some free versions, they require setting up an account which might be something you want to avoid. R is easily downloaded and installed so you can quickly set it up and run code. You can also quickly publish a dashboard on the web using Shiny.
Get your data story into the hands of colleagues quickly using these top 3 methods for sharing R Shiny apps.
SAS is commercial software, meaning you must pay to play, so to speak. SAS licenses are known to be expensive, making it difficult for individuals and small businesses to use or scale.
On the other hand, R is open source. In other words, it’s free to use. Anyone can download it and start using it.
Join the Shiny movement and develop your own R Shiny dashboard in less than 10 minutes!
Whether or not you have experience with programming, we recommend learning R first. It’s easy to get started, free, and there are lots of freely accessible learning materials.
If you have experience using programming languages, switching to a different language is a matter of learning how to do the things you know, in another place. It usually depends on the resources available for learning.
Need a Shiny dashboard now? Download our free Shiny templates and get started today!
R has a lot of free, online resources to get started (e.g., Hands-On Programming with R and R for Data Science). You can also find books on the different topics you want to learn (e.g., reporting and creating web applications). Books are not the only resources available: join RStudio’s webinars to learn and connect with R users in industry!
SAS offers courses to learn its software. They also have extensive documentation. Another thing worth mentioning about SAS is that it offers some products that don’t require knowing how to code (e.g., SAS Enterprise Guide). These tools access the functionality of SAS from a point-and-click Windows interface.
The following table compares how SAS and R work.
SAS | R |
---|---|
Data steps | Expressions with functions |
Procedures | Expressions with functions |
Macros | R functions |
SAS functions | R functions |
SAS ODS (Output Delivery System) | R Markdown, Quarto |
Over the last decade, universities have begun to shift from teaching SAS to R. Even domain-specific stats courses tend to use R and train on the RStudio IDE. This means that the R talent pool has increased and will continue to do so in the future.
With that being said, R is not as popular as Python for developers. The TIOBE Index for 2022 indicates Python is King of the hill at #1 (R is #16, and SAS is a lowly #26). So if stacking your team is a priority and you’re already using Python routines for your analytics, stick with Python.
Data from PYPL Index
If you don’t know already, you can now use Python in RStudio. If you need help setting up your environment with your preferred language on RStudio platforms, contact Appsilon. We’re RStudio Certified Partners and can help you.
A growing number of R-based consultancies are popping up as companies expand their data science teams and need R Shiny developers to handle more complex data handling and visualization requirements.
At Appsilon, we’ve been creating, maintaining, and developing Shiny applications for enterprise customers all over the world for many years now. Appsilon provides scalability, security, and modern UI/UX with custom R packages that native Shiny apps do not provide. Our team is among the world’s foremost experts in R Shiny and has made a variety of Shiny innovations (including scaling to 700 users!) over the years.
Appsilon is also a proud RStudio (Posit) Full Service Certified Partner. Meaning we can help you throughout the entire process of implementing and scaling RStudio (Posit) products and simplify your data-driven decision-making.
As Shiny consultants, we provide a wide range of services.
We deliver world-class Shiny applications faster than other vendors. Ultimately, lowering the overall cost of development and improving time to deployment. We use continuous collaboration with clients, end-to-end testing, and automated processes to streamline the development process. Our team can step in at every phase of a Shiny project, starting from business analysis and data science consulting to code refactoring.
As mentioned, R packages can be developed by anyone. Even though there is no guarantee that they will work as expected, a package that is used by a lot of people is usually something safe to use. The reason is the following: suppose that a package has a bug the creator wasn’t aware of. People start using that package and someone identifies that problem. That person shares that with the creator (and the community) so that it can be fixed. Even someone other than the creator of the package can help code a solution!
In R, there is something called the tidyverse, a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
On the other hand, if you detect a problem in SAS you have to communicate with them and wait for a new release with a solution. This might be another reason why things take longer to implement in SAS.
R provides a variety of packages to create custom charts, both static (ggplot2) and dynamic (plotly, highcharter). Here you can see lots of different visualizations you can make with R and the code used to produce them!
SAS data visualization features are more limited than R and don’t provide as much customization.
In this example, we will create another example dataset and build a histogram to understand the age distribution by treatment group. SAS’s PROC UNIVARIATE also produces additional output, which we will not show here because the goal is to see the chart.
* create example dataset;
data patients;
input treatment $ age sex $;
cards;
a 24 m
a 23 m
a 25 m
a 21 m
a 22 f
a 22 f
a 23 f
a 28 f
a 21 f
a 20 f
a 29 f
a 18 f
a 30 f
a 23 f
a 25 f
a 24 f
a 23 f
a 25 f
b 30 f
b 36 f
b 34 f
b 31 f
b 32 m
b 32 m
b 34 m
b 33 m
b 34 m
b 30 m
b 28 m
b 33 m
b 40 m
b 22 m
b 29 m
;
run;
/*create histogram for age variable by treatment*/
proc univariate data=patients;
class treatment;
var age;
histogram age / overlay;
run;
Output:
We will produce the same chart with R code.
# load libraries
library(tibble)
library(ggplot2)
# create example dataset
patients <- tibble::tribble(
~treatment, ~age, ~sex,
"a", 24, "m",
"a", 23, "m",
"a", 25, "m",
"a", 21, "m",
"a", 22, "f",
"a", 22, "f",
"a", 23, "f",
"a", 28, "f",
"a", 21, "f",
"a", 20, "f",
"a", 29, "f",
"a", 18, "f",
"a", 30, "f",
"a", 23, "f",
"a", 25, "f",
"a", 24, "f",
"a", 23, "f",
"a", 25, "f",
"b", 30, "f",
"b", 36, "f",
"b", 34, "f",
"b", 31, "f",
"b", 32, "m",
"b", 32, "m",
"b", 34, "m",
"b", 33, "m",
"b", 34, "m",
"b", 30, "m",
"b", 28, "m",
"b", 33, "m",
"b", 40, "m",
"b", 22, "m",
"b", 29, "m"
)
# create chart
ggplot(data = patients, aes(x = age, fill = treatment)) +
geom_histogram(position = "identity",
alpha = 0.5,
bins = 9,
color = "black") +
labs(
title = "Distribution of age by treatment",
x = "Age (years)",
y = "Number of Patients",
fill = "Treatment"
) +
theme_minimal() +
theme(
legend.position = "top"
)
We see how, with just a few lines of code, we are able to create a beautiful chart using R, and we can provide more customization to better suit our needs.
SAS provides Technical Support and has Documentation with information about everything you can do and how things are implemented in the software.
R doesn’t have Technical Support (it is open source) but it has a large community you can reach out to for help. Packages are usually well documented and come with excellent tutorials (called Vignettes) with examples. If you’re used to Python, you’ll be pleasantly surprised by the quality of documentation that is standard for the R ecosystem (e.g., dplyr vignette, tidyr vignette).
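Package vignettes can be browsed straight from the R console (assuming the package, dplyr here, is installed):

```r
vignette(package = "dplyr")           # list the vignettes a package ships with
vignette("dplyr", package = "dplyr")  # open the introductory dplyr vignette
```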
SAS is great when you need minimal output or sequential processing, but R offers greater flexibility. And with the recent successes of the R Consortium and enhanced collaboration with the FDA, R is trending toward higher standardization to satisfy regulatory needs.
In this section, we will explore how to solve a particular problem using each tool. We will use R through the RStudio IDE and SAS through SAS OnDemand (a free version).
We want to analyze the effect of different variables on mortality due to a particular disease. In particular, we want to understand the differences between treatment application times (no treatment, fast treatment, slow treatment). To do so, we will create a logistic regression model.
The example dataset was created to showcase how to perform the different steps that are usually part of an analysis. It is based on real data, but it has been anonymized and some information was removed from the file (such as disease and treatment names). The focus here is on the how not the what.
The data is in .csv format and contains information about the patients.
In the following sections, we will see how to perform the different tasks with SAS and R.
To start working with data, first, we need to have access to it.
* READ DATA;
FILENAME REFFILE '/home/u4729884/data.csv';
PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=WORK.RAW_DATA;
GETNAMES=YES;
RUN;
# Read data
raw_data <- read.csv("data.csv")
Now that we have data available, we will prepare it for the model.
This is not the only way one could process the data. We choose to do it this way for simplicity. Feel free to try something different and share your results!
* DATA WRANGLING;
DATA WORK.DATA_TO_MODEL (KEEP = AGE_60
SEX
CHARLSON_4
PITT_3
TREATMENT_CATEGORY
DIED_OF_DISEASE); * KEEP VARS OF INTEREST;
SET WORK.RAW_DATA;
* REMOVE PATIENTS WITH UNKNOWN STATUS;
IF UNKNOWN = 1 THEN DELETE;
* REMOVE PATIENTS THAT DIED DUE TO OTHER CAUSE;
IF DIED_OTHER_CAUSE = 1 THEN DELETE;
* CREATE TREATMENT FACTOR VARIABLE;
LENGTH TREATMENT_CATEGORY $14;
IF TREATMENT = 0 THEN TREATMENT_CATEGORY = "NO TREATMENT";
ELSE IF TREATMENT_FAST = 1 THEN TREATMENT_CATEGORY = "FAST TREATMENT";
ELSE TREATMENT_CATEGORY = "SLOW TREATMENT";
* NEW AGE VARIABLE;
IF AGE >= 60 THEN AGE_60 = 1;
ELSE AGE_60 = 0;
* NEW CHARLSON VARIABLE;
IF CHARLSON > 4 THEN CHARLSON_4 = 1;
ELSE CHARLSON_4 = 0;
* NEW PITT VARIABLE;
IF PITT > 3 THEN PITT_3 = 1;
ELSE PITT_3 = 0;
RUN;
# Load required library
library(dplyr)
# Data wrangling
data_to_model <- raw_data |>
  # Filter rows
  filter(
    UNKNOWN != 1,
    DIED_OTHER_CAUSE != 1
  ) |>
  # Create new columns
  mutate(
    TREATMENT_CATEGORY = case_when(
      TREATMENT == 0 ~ "NO TREATMENT",
      TREATMENT_FAST == 1 ~ "FAST TREATMENT",
      TRUE ~ "SLOW TREATMENT"
    ),
    AGE_60 = ifelse(AGE >= 60, 1, 0),
    CHARLSON_4 = ifelse(CHARLSON > 4, 1, 0),
    PITT_3 = ifelse(PITT > 3, 1, 0)
  ) |>
  # Select columns
  select(
    AGE_60,
    CHARLSON_4,
    PITT_3,
    TREATMENT_CATEGORY,
    DIED_OF_DISEASE
  )
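After a wrangling step like this, a quick check that the filters and derived columns behaved as expected is cheap insurance before modeling. The snippet below is a sketch using base R only; a small synthetic stand-in for `data_to_model` (with the same columns the pipeline produces) keeps it self-contained.

```r
# Synthetic stand-in for data_to_model (same columns the pipeline produces)
data_to_model <- data.frame(
  AGE_60             = c(0, 1, 1),
  CHARLSON_4         = c(0, 1, 1),
  PITT_3             = c(0, 1, 0),
  TREATMENT_CATEGORY = c("NO TREATMENT", "FAST TREATMENT", "SLOW TREATMENT"),
  DIED_OF_DISEASE    = c(0, 1, 0)
)

# Rows per treatment group -- every level should be present
table(data_to_model$TREATMENT_CATEGORY)

# The derived indicator variables should only contain 0/1
stopifnot(all(unlist(data_to_model[c("AGE_60", "CHARLSON_4", "PITT_3")]) %in% c(0, 1)))
```

If a treatment level is missing from the table, or an indicator takes values other than 0/1, the wrangling step (or the raw data) needs another look before fitting the model.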
Once the data is processed, we are ready to model. We will create a logistic regression model of the probability of dying from the disease. As explanatory variables, we will include:
* MODELING;
PROC LOGISTIC DATA = WORK.DATA_TO_MODEL DESCENDING;
CLASS TREATMENT_CATEGORY (REF = "FAST TREATMENT") SEX (REF = "F") / PARAM = REFERENCE;
MODEL DIED_OF_DISEASE = AGE_60 CHARLSON_4 PITT_3 TREATMENT_CATEGORY / LINK = LOGIT SCALE = NONE;
RUN;
# Create model
model <- glm(formula = DIED_OF_DISEASE ~ .,
             data = data_to_model,
             family = binomial)
# Explore results
summary(model)
# Get odds ratio
exp(cbind(coef(model), confint(model, level = 0.95)))
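Beyond coefficients and odds ratios, a fitted `glm` can also be used for prediction. The sketch below assumes a model fitted as above; it is trained on synthetic data so the example is self-contained, and the patient profile is hypothetical.

```r
set.seed(42)

# Synthetic data with the same columns the model above uses
toy <- data.frame(
  AGE_60             = rbinom(200, 1, 0.5),
  CHARLSON_4         = rbinom(200, 1, 0.3),
  PITT_3             = rbinom(200, 1, 0.3),
  TREATMENT_CATEGORY = sample(c("NO TREATMENT", "FAST TREATMENT", "SLOW TREATMENT"),
                              200, replace = TRUE),
  DIED_OF_DISEASE    = rbinom(200, 1, 0.4)
)
model <- glm(DIED_OF_DISEASE ~ ., data = toy, family = binomial)

# Predicted probability of death for a hypothetical patient profile
new_patient <- data.frame(AGE_60 = 1, CHARLSON_4 = 0, PITT_3 = 1,
                          TREATMENT_CATEGORY = "FAST TREATMENT")
p <- predict(model, newdata = new_patient, type = "response")
p  # a probability between 0 and 1
```

With `type = "response"`, `predict()` returns probabilities on the original scale rather than log-odds, which is usually what clinicians and stakeholders want to see.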
Let’s focus on these two tables: here we can see the model coefficients, their significance levels, and their translation to odds ratio estimates (which are more interpretable in logistic regression).
We will not dive into the details of model assessment or results interpretation. The idea here is to show where this information is available and how to get it using code.
In R, the odds ratios are not printed by summary(); we compute them by exponentiating the model coefficients, as in the exp(cbind(coef(model), confint(model, level = 0.95))) call shown above.
General remarks:
Detailed comparison:
If you’re looking to keep pace within your industry or build faster tooling and PoCs for your team, you should consider switching to R programming. SAS still holds value for many users, but R and its open-source packages are becoming the standard for the new workforce. Don’t get left behind!
Original article sourced at: https://appsilon.com