George Koelpin

1603256400

The Most Underrated R packages

In my experience as an R user, I’ve come across a lot of different packages and curated lists. Some are in my bookmarks, like the great awesome-R list or the monthly “best of” list curated by RStudio. If you don’t know them, go check them out ASAP.

In this post, I’d like to show you something else. These are the results of late-night GitHub/Reddit browsing, and cool stuff shared by colleagues.

Some of these packages are really unique; others are just fun to use and real underdogs among the data scientists and statisticians I’ve worked with.

Let’s start!

💥Misc (the weird ones) 💥

  • BRRR and beepr: Have you ever wanted to know, and celebrate, when your simulations are finally done running in R? Have you ever been so proud of pulling off a tricky bit of code that you wanted Flavor Flav to yell “yeaaahhhh, boi!!” as soon as it successfully completes?
  • calendR: Ready-to-print monthly and yearly calendars made with ggplot2.
  • checkpoint: It makes it possible to install package versions from a specific date in the past, as if you had a CRAN time machine.
  • DataEditR: A lightweight package to interactively view, enter or edit data in R.
  • drake: It analyzes your workflow, skips steps with up-to-date results, and orchestrates the rest with optional distributed computing. In the end, drake provides evidence that your results match the underlying code and data, which increases your ability to trust your research.
  • flow: Visualize the logic of functions, expressions or scripts as flow diagrams to ease debugging.
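
To give a feel for the first item, here is a minimal sketch of a "tell me when it's done" wrapper. The helper name `run_with_beep` is hypothetical (it is not part of beepr); the only package call used is `beepr::beep()`, and the sound is skipped when beepr is not installed.

```r
# Run an expression and, if beepr is installed, play a sound when it finishes.
# run_with_beep() is a hypothetical helper written for this example.
run_with_beep <- function(expr) {
  result <- expr  # forces evaluation of the lazily passed expression
  if (requireNamespace("beepr", quietly = TRUE)) {
    beepr::beep()  # default notification sound
  }
  result
}

# Stand-in for a long-running simulation.
sim_mean <- run_with_beep(mean(rnorm(1e5)))
print(sim_mean)
```

Wrapping the call this way keeps the notification out of the simulation code itself, so the same script still runs on a machine without beepr.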

#analytics #data-science #r #statistics #r-package

Marcus Flatley

1594399440

Getting Started with R Markdown — Guide and Cheatsheet

In this blog post, we’ll look at how to use R Markdown. By the end, you’ll have the skills you need to produce a document or presentation using R Markdown, from scratch!

We’ll show you how to convert the default R Markdown document into a useful reference guide of your own. We encourage you to follow along by building out your own R Markdown guide, but if you prefer to just read along, that works, too!

R Markdown is an open-source tool for producing reproducible reports in R. It enables you to keep all of your code, results, plots, and writing in one place. R Markdown is particularly useful when you are producing a document for an audience that is interested in the results from your analysis, but not your code.

R Markdown is powerful because it can be used for data analysis and data science, collaborating with others, and communicating results to decision makers. With R Markdown, you have the option to export your work to numerous formats including PDF, Microsoft Word, a slideshow, or an HTML document for use in a website.

Turn your data analysis into pretty documents with R Markdown.

We’ll use the RStudio integrated development environment (IDE) to produce our R Markdown reference guide. If you’d like to learn more about RStudio, check out our list of 23 awesome RStudio tips and tricks!

Here at Dataquest, we love using R Markdown for coding in R and authoring content. In fact, we wrote this blog post in R Markdown! Also, learners on the Dataquest platform use R Markdown for completing their R projects.

We included fully-reproducible code examples in this blog post. When you’ve mastered the content in this post, check out our other blog post on R Markdown tips, tricks, and shortcuts.

Okay, let’s get started with building our very own R Markdown reference document!

1. Install R Markdown

R Markdown is a free, open source tool that is installed like any other R package. Use the following command to install R Markdown:

install.packages("rmarkdown")

Now that R Markdown is installed, open a new R Markdown file in RStudio by navigating to File > New File > R Markdown…. R Markdown files have the file extension “.Rmd”.

2. Default Output Format

When you open a new R Markdown file in RStudio, a pop-up window appears that prompts you to select the output format to use for the document.

The default output format is HTML. With HTML, you can easily view it in a web browser.

We recommend selecting the default HTML setting for now, because it can save you time. Compiling an HTML document is generally faster than generating a PDF or other format. When you near a finished product, you can change the output to the format of your choosing and then make the final touches.

One final thing to note is that the title you give your document in the pop-up above is not the file name! Navigate to File > Save As… to name and save the document.
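
The same "draft in HTML first" workflow can also be driven from the R console. The sketch below is only an illustration under stated assumptions: the document title and chunk contents are made up, the file is written to a temporary location, and rendering is attempted only when both the rmarkdown package and pandoc are available.

```r
# Build a tiny R Markdown file in a temporary location (hypothetical content).
fence <- strrep("`", 3)  # three backticks, used to delimit the R code chunk
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"My R Markdown Guide\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Render to HTML while drafting; skipped if rmarkdown or pandoc is missing.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
  print(out_file)
}
```

When the document is close to finished, changing `output_format` to, for example, `"pdf_document"` or `"word_document"` should produce the final format instead.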

#data science tutorials #beginner #r #r markdown #r tutorial #r tutorials #rstats #rstudio #tutorial #tutorials

ROC Curve and AUC — Detailed understanding and R pROC Package

The world is facing a unique crisis these days and we are all stuck in a never-before-seen lockdown. As all of us are using this time in many productive ways, I thought of writing some blog posts on the data concepts I know, not only to share them with the community but also to develop a deeper understanding of each concept as I write it down.

The first one is about the most loved evaluation metric: the ROC curve.

ROC (Receiver Operating Characteristic) Curve is a way to visualize the performance of a binary classifier.

Understanding the confusion matrix

In order to understand AUC/ROC curve, it is important to understand the confusion matrix first.

Image by author

TPR = TP/(TP+FN)

FPR = FP/(TN+FP)

TPR, or True Positive Rate, answers the question: when the actual classification is positive, how often does the classifier predict positive?

FPR, or False Positive Rate, answers the question: when the actual classification is negative, how often does the classifier incorrectly predict positive?
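
As a minimal sketch of these two definitions, the base R snippet below counts the four confusion-matrix cells at a single threshold and derives TPR and FPR from them. The labels and scores are made-up toy data, and `rates_at` is a hypothetical helper name.

```r
# Hypothetical toy data: 1 = positive, 0 = negative.
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
score  <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05)

# Compute TPR and FPR for one classification threshold.
rates_at <- function(actual, score, threshold) {
  predicted <- as.integer(score >= threshold)
  tp <- sum(predicted == 1 & actual == 1)  # true positives
  fn <- sum(predicted == 0 & actual == 1)  # false negatives
  fp <- sum(predicted == 1 & actual == 0)  # false positives
  tn <- sum(predicted == 0 & actual == 0)  # true negatives
  c(TPR = tp / (tp + fn), FPR = fp / (tn + fp))
}

r <- rates_at(actual, score, threshold = 0.5)
print(r)  # TPR = 0.75, FPR = 1/6 for this toy data
```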

To understand it more clearly, let us take an example of the current COVID situation. Assume that we have data for COVID patients and using some classifier we were able to classify the patients as positive and negative.

Without going into further detail, let us now look at the distribution of the predicted classes. Again, for simplicity, let us assume that the data is balanced, i.e. the negative and positive classes are almost equal in size, and additionally that they follow a normal distribution.

Image by author

In the above graph, my classifier is doing a great job of separating the positive and negative patients. If I calculate the accuracy of such a model, it will be quite high. Now, for different threshold values, I can calculate my TPR and FPR. Based on the graph, let us assume that my threshold is 0.5. At this threshold, of the patients for whom my classifier predicted a probability of 0.5, half were negative and half were positive. Similarly, I can check other thresholds as well. For every threshold, TPR is the number of patients in the green area to the right of the threshold line divided by the total number of patients in the green area.

FPR is the number of patients in the pink area to the right of the threshold line divided by the total number of patients in the pink area.

ROC Curve

Now, if I plot this data on a graph, I will get a ROC curve.

The ROC curve is the graph plotted with TPR on the y-axis and FPR on the x-axis for all possible thresholds. Both TPR and FPR vary from 0 to 1.

Therefore, a good classifier will have a curve that arcs further away from the random classifier line.

To quantify how much better one classifier is than another using the ROC curve, we use the AUC (Area Under the Curve). From the graph it is quite clear that a good classifier will have a higher AUC than a bad classifier, as the area under its curve is larger.
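
Here is a rough base R sketch of that idea: sweep the threshold over every observed score, collect the resulting (FPR, TPR) points, and approximate the area under them with the trapezoidal rule. The data is made up for illustration, and this is not how any particular package computes AUC internally.

```r
# Hypothetical toy data: 1 = positive, 0 = negative.
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
score  <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05)

# Candidate thresholds: every distinct score, highest first,
# plus Inf so that the curve starts at the point (0, 0).
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))

# One (FPR, TPR) point per threshold.
tpr <- sapply(thresholds, function(t) sum(score >= t & actual == 1) / sum(actual == 1))
fpr <- sapply(thresholds, function(t) sum(score >= t & actual == 0) / sum(actual == 0))

# Trapezoidal rule: width of each FPR step times the average TPR over that step.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)  # 5/6, about 0.83, for this toy data

# plot(fpr, tpr, type = "l") would draw the curve itself.
```

In practice, the pROC package named in the title offers `roc()` and `auc()` for the same purpose; the manual version is only meant to show what the number represents.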

From the above discussion, it is evident that ROC is a more robust evaluation metric than, say, accuracy or misclassification error, because ROC takes all possible threshold levels into account, whereas a metric like misclassification error takes only one threshold level into account.

The choice of threshold depends on the business problem or domain knowledge. In our COVID example above, I would accept a high FPR and keep my threshold low, so that as many COVID patients as possible are tracked.

#r #auc-roc #r-package #data-science #roc #data analysis

August Larson

1624422360

R vs Python: What Should Beginners Learn?

Let go of any doubts or confusion, make the right choice and then focus and thrive as a data scientist.

I currently lead a research group with data scientists who use both R and Python. I have been in this field for over 14 years. I have witnessed the growth of both languages over the years and there is now a thriving community behind both.

I did not have a straightforward journey and learned many things the hard way. However, you can avoid making the mistakes I made and lead a more focussed, more rewarding journey and reach your goals quicker than others.

Before I dive in, let’s get something out of the way. R and Python are just tools to do the same thing. Data Science. Neither of the tools is inherently better than the other. Both the tools have been evolving over years (and will likely continue to do so).

Therefore, the short answer on whether you should learn Python or R is: it depends.

The longer answer, if you can spare a few minutes, will help you focus on what really matters and avoid the most common mistakes most enthusiastic beginners aspiring to become expert data scientists make.

#r-programming #python #perspective #r vs python: what should beginners learn? #r vs python #r

Top 10 R Packages For Data Visualisation One Must Know

According to study reports, data scientists and practitioners prefer R as the language for statistical modelling after Python. Also, R dominates the preference scale, with a combined figure of 81.9% utilisation for statistical modelling among those surveyed.

Below, we list the top 10 libraries in R for data visualisation that one must know.

(The list is in alphabetical order).


1| Colourpicker

About: Colourpicker is a tool for the Shiny framework and for selecting colours in plots. It supports various options, such as alpha opacity, custom colour palettes, and more. Its most common uses are the colourInput() function, which creates a colour input in Shiny, and the plotHelper() function/RStudio Addin, which helps select colours for a plot.

Know more here.

2| Esquisse

About: The esquisse package allows a user to interactively explore data by visualising it with the ggplot2 package. It allows a user to draw bar graphs, curves, scatter plots, histograms, export the graphs, and retrieve the code generating the graph. With the help of esquisse, one can quickly visualise the data according to their type as well as export to PNG or PowerPoint, and retrieve the code to reproduce the chart.

Know more here.

3| ggplot2

About: ggplot2 is a popular package that is based on the grammar of graphics. The idea behind this library is that one can build every graph from the same components, such as a dataset, a coordinate system, and more. The package provides a graphics language for creating intuitive and intricate plots. It allows a user to create graphs that represent both univariate and multivariate numerical and categorical data.

Know more here.

4| ggvis

About: ggvis is a data visualisation package for R that lets you declaratively describe data graphics with a syntax similar in spirit to ggplot2. It allows you to create rich interactive graphics locally in RStudio or in the browser, and it leverages the infrastructure of the Shiny package to publish interactive graphics usable from any browser. The goal of ggvis is to make it easy to build interactive graphics for exploratory data analysis.

Know more here.


#developers corner #data visualisation #r libraries #r packages #data science