
# 5 Key Data Visualization Principles Explained in R

Data visualization is tricky to get right, and there are many principles to keep in mind. This tutorial covers five best practices for visualizing data, with examples in the R programming language. Incorporate these key R data visualization principles into your toolset to improve your data storytelling.

After reading, you’ll know how to produce publication-ready charts that won’t leave users questioning the data or the logic. You’ll know how to use `ggplot2` and `plotly` for both static and interactive charts, and also how to get maximum interactivity out of your visualizations with R Shiny.

## Don’t Manipulate Axis Ranges

In the past, companies and individuals loved to exaggerate small and insignificant differences by manipulating axis ranges. For example, imagine a company had a profit of \$100M in 2020 and \$105M in 2021. In relative terms, that’s only a 5% increase – nothing to write home about – so the difference wouldn’t be immediately visible on a chart if the Y-axis range goes from 0 to 120 (Y-axis shows the profit).

What you could do – but shouldn’t – is to shorten the Y-axis range. A range between 99.5 and 105.5 would do the trick.

Let’s see the effect in action. Use the following code to declare a `data.frame` object containing profit for the mentioned two years:
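A minimal sketch of such a data frame, assuming the profit is stored in millions of USD. The column names `year` and `profit` are illustrative choices, not names confirmed by the original listing:

```r
# Profit (in millions of USD) for the two years mentioned in the text.
# `year` is kept as text so that it is treated as a discrete category.
df <- data.frame(
  year = c("2020", "2021"),
  profit = c(100, 105)
)

# Relative increase from 2020 to 2021 (0.05, i.e. 5%).
increase <- (df$profit[2] - df$profit[1]) / df$profit[1]
```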

With `ggplot2`, you can use `coord_cartesian(ylim = c(lower, upper))` to change the Y-axis range. Let’s set it to go from 99.5 to 105.5:
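One way this could look, as a sketch that assumes a two-row data frame with illustrative column names `year` and `profit`:

```r
library(ggplot2)

# Illustrative data: profit in millions of USD.
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

# coord_cartesian() zooms the view without dropping any data,
# so the bars are still drawn but their base is cut off at 99.5.
p_truncated <- ggplot(df, aes(x = year, y = profit)) +
  geom_col() +
  coord_cartesian(ylim = c(99.5, 105.5))

# In an interactive session, evaluating `p_truncated` draws the chart.
```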

Image 1 – Bar chart with manipulatively formatted Y-axis

It looks like the difference is huge: with the axis starting at 99.5, the 2021 bar is drawn about eleven times taller than the 2020 bar, even though profit grew by only 5%. The chart doesn’t technically lie, but it doesn’t respect key data visualization principles. It’s easy to get the whole story wrong if you don’t look at the axis ticks.

The same chart looks nowhere near as impressive with the default Y-axis range:
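A sketch of the same chart without the range override, again assuming illustrative column names `year` and `profit`. The two ratios at the end quantify how much each version exaggerates the difference:

```r
library(ggplot2)

# Illustrative data: profit in millions of USD.
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

# No coord_cartesian() call: bars are drawn from zero by default.
p_default <- ggplot(df, aes(x = year, y = profit)) +
  geom_col()

# Ratio of the bar heights as they appear in each version of the chart.
ratio_truncated <- (105 - 99.5) / (100 - 99.5)  # visible heights above 99.5
ratio_default <- 105 / 100                      # full heights above 0
```

With the truncated axis the second bar appears 11 times taller; with the default axis it appears 1.05 times taller, which matches the actual 5% increase.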

Image 2 – Bar chart with normally formatted Y-axis

Take-home point: Always read the axis ticks. Just because you’re obeying key data visualization principles, it doesn’t mean everyone else is.

## Always Add Title and Axis Labels

A chart without a title and axis labels is pretty much useless. It might look great otherwise, but how can you know what you’re looking at? There’s no way to tell. Sure, you can describe the contents in the paragraph above, but that’s not a replacement. It’s only a supplement at best.

Luckily, `ggplot2` makes it easy to obey this key data visualization principle. You can use the `labs()` function to add title, subtitle, caption, and axis labels, and you can use the `theme()` function to style them:
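A sketch of how `labs()` and `theme()` fit together. The label texts, font sizes, and the `year`/`profit` column names are placeholders chosen for illustration:

```r
library(ggplot2)

# Illustrative data: profit in millions of USD.
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

p_labeled <- ggplot(df, aes(x = year, y = profit)) +
  geom_col() +
  # labs() sets the text of each label.
  labs(
    title = "Yearly profit",
    subtitle = "2020 vs. 2021",
    caption = "Source: illustrative data",
    x = "Year",
    y = "Profit (millions of USD)"
  ) +
  # theme() controls how those labels are styled.
  theme(
    plot.title = element_text(size = 18, face = "bold"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(face = "italic"),
    axis.title = element_text(size = 12)
  )
```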

Image 3 – Bar chart with title, subtitle, caption, and axis labels

Not all charts need a subtitle and a caption, but we added them just for fun. Every chart you make should include a title and axis labels at least.

Additionally, be aware of proportioning in your visualizations. If your title and label texts are small, they might be overwhelmed by other elements in the chart. Be cognizant of what you want readers to view and emphasize elements accordingly.

## Choose Appropriate and Appealing Color Palettes

There’s nothing worse than spending hours making the best out of your data but failing to make the chart visually appealing. We get it – not everyone has an eye for design. If you’re a software engineer, it’s likely you find design and aesthetics a nightmare. Similarly, if you’re a graphics designer, you’re able to design great-looking visuals – but can you implement them in code?

That’s where choosing an appropriate color palette comes in. coolors.co is used and loved by many when it comes to picking a color palette.

Image 4 – coolors.co – A website for generating color palettes

It’s mostly used for entire websites and brand identities, but there’s no reason you can’t pick a single color you like (or multiple), and use it in your data visualizations.

The second color, Prussian Blue, looks promising. Specify the `fill` parameter in the call to `geom_bar()` to change the color:
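A sketch of a single-color chart. The hex code `#003049` is an assumption standing in for the Prussian Blue shade picked from the palette, and the column names are illustrative:

```r
library(ggplot2)

# Illustrative data: profit in millions of USD.
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

# Assumed hex value for the chosen shade; swap in the code from your palette.
prussian_blue <- "#003049"

# stat = "identity" tells geom_bar() to use the profit values as bar heights
# instead of counting rows. A constant fill is set outside aes().
p_single <- ggplot(df, aes(x = year, y = profit)) +
  geom_bar(stat = "identity", fill = prussian_blue)
```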

Image 5 – Chart with single-color bars

Sometimes, a single color won’t work. If your dataset has a categorical feature (e.g., day of the week, gender, age group), you can use it to color the bars or different chart segments. Simply set the `fill` parameter to the name of the dataset variable in the `ggplot()` function call:
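A sketch of mapping a categorical column to the fill color. Here the illustrative `year` column doubles as the category, and the two hex codes in `scale_fill_manual()` are assumed palette picks:

```r
library(ggplot2)

# Illustrative data: profit in millions of USD.
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

# Mapping `fill` inside aes() assigns one color per category
# and adds a legend automatically.
p_multi <- ggplot(df, aes(x = year, y = profit, fill = year)) +
  geom_bar(stat = "identity") +
  # Optional: replace the default colors with hand-picked ones.
  scale_fill_manual(values = c("2020" = "#003049", "2021" = "#669BBC"))
```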

Image 6 – Chart with multiple-color bars

The selected column of this dataset has only two distinct values, but you get the gist. Just don’t go around using color ramps willy-nilly or combining two scales for one trait – or at least don’t tell anyone we told you to do it.

Color is a key data visualization principle. Master it and use it wisely.

## Ditch 3D Charts – 2D is Plenty Enough

Take a look at the following three charts – don’t worry, we didn’t create them, we just picked them from the Internet:

Image 7 – Various 3D charts

What do they all have in common? You’ve guessed it – they all look horrible. Depth has no place in most data visualizations, especially not in those aimed at business users and the general public. Also, you can’t embed 3D visualizations in publications.

You can use depth, or Z-axis, when analyzing data yourself. After all, you know best what works for you – but that’s where the story should end.

Most users find the third dimension confusing for data visualization, and we get that. It’s easy to distort the data and come up with wrong insights. After all, everything is a matter of perspective. Two dimensions are enough for 99.9% of the cases. If you want to convey extra information, consider changing the size or color of graph elements to accommodate extra variables.

## Make Your Charts Interactive – Go the Extra Mile for Better Data Visualizations

Probably the most important key data visualization principle and component is interactivity. There’s nothing wrong with static charts, especially if you’re just getting into data visualization, but interactivity will set you apart from the crowd.

The idea is that something should happen when you click or hover over a chart element. With bar charts, the most common thing you can do is to display the counts of the selected category.

Unfortunately, `ggplot2` on its own produces static charts. For interactivity, you’ll have to switch to an alternative such as `plotly`. The syntax is a bit different, but you’ll quickly get the hang of it. Its documentation is thorough, and you’ll find everything you need there.

Here’s how to “redraw” the chart from the previous sections in Plotly:
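A sketch of the bar chart in `plotly`, assuming the same illustrative `year` and `profit` columns. The titles are placeholders:

```r
library(plotly)

# Illustrative data: profit in millions of USD.
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

# The ~ prefix tells plotly to look the column up in `data`.
fig <- plot_ly(data = df, x = ~year, y = ~profit, type = "bar")

# layout() sets the title and axis labels.
fig <- layout(
  fig,
  title = "Yearly profit",
  xaxis = list(title = "Year"),
  yaxis = list(title = "Profit (millions of USD)")
)

# In an interactive session, evaluating `fig` renders the chart
# with hover tooltips enabled by default.
```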

Image 8 – Interactive Plotly bar chart

You can see how detailed data is shown automatically as you hover over individual bars. What gets displayed can be tweaked, but more on that some other time.

Do you know what really sets your visualizations apart from the crowd? You’ve guessed it – dashboards – at least in the interactivity department. For demonstration’s sake, we’ll declare a new dataset consisting of budgets across two departments in a two-year time span. The end-user can select the department on the dashboard, and the chart gets redrawn instantly. Take a look:
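A minimal sketch of such a dashboard. The department names, budget figures, and input/output IDs are all made up for illustration, and the app is a bare-bones layout rather than a reproduction of the one in the screenshot:

```r
library(shiny)
library(ggplot2)

# Hypothetical budgets (thousands of USD) for two departments over two years.
budgets <- data.frame(
  department = rep(c("IT", "Sales"), each = 2),
  year = rep(c("2020", "2021"), times = 2),
  budget = c(120, 150, 90, 110)
)

ui <- fluidPage(
  titlePanel("Department budgets"),
  selectInput("department", "Department", choices = unique(budgets$department)),
  plotOutput("budget_plot")
)

server <- function(input, output, session) {
  # renderPlot() re-runs whenever input$department changes,
  # which is what redraws the chart after each selection.
  output$budget_plot <- renderPlot({
    selected <- budgets[budgets$department == input$department, ]
    ggplot(selected, aes(x = year, y = budget)) +
      geom_col() +
      labs(
        title = paste("Budget:", input$department),
        x = "Year",
        y = "Budget (thousands of USD)"
      )
  })
}

app <- shinyApp(ui, server)
# Call runApp(app) to start the dashboard in a browser.
```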

Image 9 – Interactive R Shiny application

Embedding your visualizations into dashboards is light years ahead of everything you can do with a static graphing library. It allows for the most flexibility for the end-user, which is the only thing that matters in the long run.

It’s safe to say interactivity is among the most important key data visualization principles of 2022 and beyond.

## Summary of Key Data Visualization Principles

Data visualization is one of those things that looks easy, but in reality, it’s easy to get wrong. A small error like forgetting to add axis labels can cost you a lot in the long run, especially if you can’t add it afterward.

Today you’ve learned five key principles of data visualization and got hands-on experience of visualizing data in R – with `ggplot2`, `plotly`, and `shiny`. It’s a lot to process for a single article, but we hope you managed to follow along.

Original article sourced at: https://appsilon.com


## C# REPL

A cross-platform command line REPL for the rapid experimentation and exploration of C#. It supports intellisense, installing NuGet packages, and referencing local .NET projects and assemblies.


C# REPL provides the following features:

• Syntax highlighting via ANSI escape sequences
• Intellisense with fly-out documentation
• Nuget package installation
• Reference local assemblies, solutions, and projects
• Navigate to source via Source Link
• IL disassembly (both Debug and Release mode)
• Fast and flicker-free rendering. A "diff" algorithm is used to only render what's changed.

## Installation

C# REPL is a .NET 6 global tool, and runs on Windows 10, Mac OS, and Linux. It can be installed via:

``````dotnet tool install -g csharprepl
``````

If you're running on Mac OS Catalina (10.15) or later, make sure you follow any additional directions printed to the screen. You may need to update your PATH variable in order to use .NET global tools.

After installation is complete, run `csharprepl` to begin. C# REPL can be updated via `dotnet tool update -g csharprepl`.

## Usage

Run `csharprepl` from the command line to begin an interactive session. The default colorscheme uses the color palette defined by your terminal, but these colors can be changed using a `theme.json` file provided as a command line argument.

### Evaluating Code

Type some C# into the prompt and press Enter to run it. The result, if any, will be printed:

``````> Console.WriteLine("Hello World")
Hello World

> DateTime.Now.AddDays(8)
[6/7/2021 5:13:00 PM]
``````

To evaluate multiple lines of code, use Shift+Enter to insert a newline:

``````> var x = 5;
var y = 8;
x * y
40
``````

Additionally, if the statement is not a "complete statement" a newline will automatically be inserted when Enter is pressed. For example, in the below code, the first line is not a syntactically complete statement, so when we press enter we'll go down to a new line:

``````> if (x == 5)
| // caret position, after we press Enter on Line 1
``````

Finally, pressing Ctrl+Enter will show a "detailed view" of the result. For example, for the `DateTime.Now` expression below, on the first line we pressed Enter, and on the second line we pressed Ctrl+Enter to view more detailed output:

``````> DateTime.Now // Pressing Enter shows a reasonable representation
[5/30/2021 5:13:00 PM]

> DateTime.Now // Pressing Ctrl+Enter shows a detailed representation
[5/30/2021 5:13:00 PM] {
Date: [5/30/2021 12:00:00 AM],
Day: 30,
DayOfWeek: Sunday,
DayOfYear: 150,
Hour: 17,
InternalKind: 9223372036854775808,
InternalTicks: 637579915804530992,
Kind: Local,
Millisecond: 453,
Minute: 13,
Month: 5,
Second: 0,
Ticks: 637579915804530992,
TimeOfDay: [17:13:00.4530992],
Year: 2021,
_dateData: 9860951952659306800
}
``````

A note on semicolons: C# expressions do not require semicolons, but statements do. If a statement is missing a required semicolon, a newline will be added instead of trying to run the syntactically incomplete statement; simply type the semicolon to complete the statement.

``````> var now = DateTime.Now; // assignment statement, semicolon required

> DateTime.Now.AddDays(8) // expression, we don't need a semicolon
[6/7/2021 5:03:05 PM]
``````

### Keyboard Shortcuts

• Basic Usage
  • Ctrl+C - Cancel current line
  • Ctrl+L - Clear screen
  • Enter - Evaluate the current line if it's a syntactically complete statement; otherwise add a newline
  • Ctrl+Enter - Evaluate the current line, and return a more detailed representation of the result
  • Shift+Enter - Insert a new line (this does not currently work on Linux or Mac OS; hopefully this will work in .NET 7)
  • Ctrl+Shift+C - Copy current line to clipboard
  • Ctrl+V, Shift+Insert, and Ctrl+Shift+V - Paste text to prompt. Automatically trims leading indent
• Code Actions
  • F1 - Opens the MSDN documentation for the class/method under the caret
  • F9 - Shows the IL (intermediate language) for the current statement in Debug mode.
  • Ctrl+F9 - Shows the IL for the current statement with Release mode optimizations.
  • F12 - Opens the source code in the browser for the class/method under the caret, if the assembly supports Source Link.
• Autocompletion
  • Ctrl+Space - Open autocomplete menu. If there's a single option, pressing Ctrl+Space again will select the option
  • Enter, Right Arrow, Tab - Select active autocompletion option
  • Escape - Close autocomplete menu
• Text Navigation
  • Home and End - Navigate to beginning of a single line and end of a single line, respectively
  • Ctrl+Home and Ctrl+End - Navigate to beginning of line and end across multiple lines in a multiline prompt, respectively
  • Arrows - Navigate characters within text
  • Ctrl+Arrows - Navigate words within text
  • Ctrl+Backspace - Delete previous word
  • Ctrl+Delete - Delete next word

### Adding References

Use the `#r` command to add assembly or nuget references.

• For assembly references, run `#r "AssemblyName"` or `#r "path/to/assembly.dll"`
• For project references, run `#r "path/to/project.csproj"`. Solution files (.sln) can also be referenced.
• For nuget references, run `#r "nuget: PackageName"` to install the latest version of a package, or `#r "nuget: PackageName, 13.0.5"` to install a specific version (13.0.5 in this case).

To run ASP.NET applications inside the REPL, start the `csharprepl` application with the `--framework` parameter, specifying the `Microsoft.AspNetCore.App` shared framework. Then, use the above `#r` command to reference the application DLL. See the Command Line Configuration section below for more details.

``````csharprepl --framework Microsoft.AspNetCore.App
``````

## Command Line Configuration

The C# REPL supports multiple configuration flags to control startup, behavior, and appearance:

``````csharprepl [OPTIONS] [response-file.rsp] [script-file.csx] [-- <additional-arguments>]
``````

Supported options are:

• OPTIONS:
  • `-r <dll>` or `--reference <dll>`: Reference an assembly, project file, or nuget package. Can be specified multiple times. Uses the same syntax as `#r` statements inside the REPL. For example, `csharprepl -r "nuget:Newtonsoft.Json" "path/to/myproj.csproj"`
    • When an assembly or project is referenced, assemblies in the containing directory will be added to the assembly search path. This means that you don't need to manually add references to all of your assembly's dependencies (e.g. other references and nuget packages). Referencing the main entry assembly is enough.
  • `-u <namespace>` or `--using <namespace>`: Add a using statement. Can be specified multiple times.
  • `-f <framework>` or `--framework <framework>`: Reference a shared framework. The available shared frameworks depend on the local .NET installation, and can be useful when running an ASP.NET application from the REPL. Example frameworks are:
    • Microsoft.NETCore.App (default)
    • Microsoft.AspNetCore.All
    • Microsoft.AspNetCore.App
    • Microsoft.WindowsDesktop.App
  • `-t <theme.json>` or `--theme <theme.json>`: Read a theme file for syntax highlighting. This theme file associates C# syntax classifications with colors. The color values can be full RGB, or ANSI color names (defined in your terminal's theme). The NO_COLOR standard is supported.
  • `--trace`: Produce a trace file in the current directory that logs CSharpRepl internals. Useful for CSharpRepl bug reports.
  • `-v` or `--version`: Show version number and exit.
  • `-h` or `--help`: Show help and exit.
• `response-file.rsp`: A filepath of an .rsp file, containing any of the above command line options.
• `script-file.csx`: A filepath of a .csx file, containing lines of C# to evaluate before starting the REPL. Arguments to this script can be passed as `<additional-arguments>`, after a double hyphen (`--`), and will be available in a global `args` variable.

If you have `dotnet-suggest` enabled, all options can be tab-completed, including values provided to `--framework` and .NET namespaces provided to `--using`.

## Integrating with other software

C# REPL is a standalone software application, but it can be useful to integrate it with other developer tools:

### Windows Terminal

To add C# REPL as a menu entry in Windows Terminal, add the following profile to Windows Terminal's `settings.json` configuration file (under the JSON property `profiles.list`):

``````{
"name": "C# REPL",
"commandline": "csharprepl"
},
``````

To get the exact colors shown in the screenshots in this README, install the Windows Terminal Dracula theme.

### Visual Studio Code

To use the C# REPL with Visual Studio Code, simply run the `csharprepl` command in the Visual Studio Code terminal. To send commands to the REPL, use the built-in `Terminal: Run Selected Text In Active Terminal` command from the Command Palette (`workbench.action.terminal.runSelectedText`).

### Windows OS

To add the C# REPL to the Windows Start Menu for quick access, you can run the following PowerShell command, which will start C# REPL in Windows Terminal:

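``````$shell = New-Object -ComObject WScript.Shell
# Create the shortcut object first; the path below assumes the per-user Start Menu folder
$shortcut = $shell.CreateShortcut("$env:APPDATA\Microsoft\Windows\Start Menu\Programs\csharprepl.lnk")
$shortcut.TargetPath = "wt.exe"
$shortcut.Arguments = "-w 0 nt csharprepl.exe"
$shortcut.Save()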
``````

You may also wish to add a shorter alias for C# REPL, which can be done by creating a `.cmd` file somewhere on your path. For example, put the following contents in `C:\Users\username\.dotnet\tools\csr.cmd`:

``````wt -w 0 nt csharprepl
``````

This will allow you to launch C# REPL by running `csr` from anywhere that accepts Windows commands, like the Windows Run dialog.

## Comparison with other REPLs

This project is far from being the first REPL for C#. Here are some other projects; if this project doesn't suit you, another one might!

Visual Studio's C# Interactive pane is full-featured (it has syntax highlighting and intellisense) and is part of Visual Studio. This deep integration with Visual Studio is both a benefit from a workflow perspective, and a drawback as it's not cross-platform. As far as I know, the C# Interactive pane does not support NuGet packages or navigating to documentation/source code. Subjectively, it does not follow typical command line keybindings, so can feel a bit foreign.

csi.exe ships with C# and is a command line REPL. It's great because it's a cross platform REPL that comes out of the box, but it doesn't support syntax highlighting or autocompletion.

dotnet script allows you to run C# scripts from the command line. It has a REPL built-in, but the predominant focus seems to be as a script runner. It's a great tool, though, and has a strong community following.

dotnet interactive is a tool from Microsoft that creates a Jupyter notebook for C#, runnable through Visual Studio Code. It also provides a general framework useful for running REPLs.

Author: waf
Source Code: https://github.com/waf/CSharpRepl


If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.


## How To Blend Data in Google Data Studio For Better Data Analysis

Using data to inform decisions is essential to product management, or anything really. And thankfully, we aren’t short of it. Any online application generates an abundance of data and it’s up to us to collect it and then make sense of it.

Google Data Studio helps us understand the meaning behind data, enabling us to build beautiful visualizations and dashboards that transform data into stories. If it wasn’t already, data literacy is as much a fundamental skill as learning to read or write. Or it certainly will be.

Nothing is more powerful than data democracy, where anyone in your organization can regularly make decisions informed with data. As part of enabling this, we need to be able to visualize data in a way that brings it to life and makes it more accessible. I’ve recently been learning how to do this and wanted to share some of the cool ways you can do this in Google Data Studio.


## Getting Started With Data Lakes

### Frameworks for Efficient Enterprise Analytics

The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.

This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.

### Introduction

As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).


## Data Cleaning in R for Data Science

A data scientist or analyst in the making needs to format and clean data before performing any kind of exploratory data analysis, because raw data has numerous problems that need fixing.

So when we say we are cleaning data into a tidy data set to be used for analysis later, we are actually (among many other things):

1. Removing duplicate values

2. Removing null values

3. Changing column names to readable, understandable, formatted names

4. Removing commas from numeric values (e.g., 1,000,657 to 1000657)

5. Converting data types into their appropriate types for analysis
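The five operations above can be sketched in base R on a tiny made-up data frame. The column names and values here are invented for illustration and are unrelated to the dataset used later in the article:

```r
# Toy data with a duplicate row, a missing value, awkward column names,
# and numbers stored as text with thousands separators.
raw <- data.frame(
  `Cust Name` = c("Ann", "Bob", "Bob", NA),
  `Total.Spend` = c("1,000,657", "2,500", "2,500", "300"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

clean <- unique(raw)                                    # 1. remove duplicate rows
clean <- clean[complete.cases(clean), , drop = FALSE]   # 2. remove rows with NA
names(clean) <- c("customer_name", "total_spend")       # 3. readable column names
clean$total_spend <- gsub(",", "", clean$total_spend)   # 4. strip commas
clean$total_spend <- as.numeric(clean$total_spend)      # 5. convert to numeric
```

After these steps, `clean` holds two rows (Ann and Bob) with `total_spend` stored as a numeric column.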

This article is based on a brief course project I recently completed in my Data Science Specialization, focused on retrieving raw data, combining it into one dataset, and getting it ready for later analysis (not covered in this article). The language chosen is R, using RStudio.

The Experiment:

The experiment described here comes from the UCI Machine Learning Repository, where a group of 30 volunteers (age bracket of 19–48 years) performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a Samsung Galaxy S smartphone. The data collected from the embedded accelerometers was divided into test and training data. More information regarding the experiment can be found at this link.

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

## Step 1: Retrieving Data from URL

The first step required is to obtain the data. Often, to avoid the headache of manually downloading thousands of files, they are downloaded using small code snippets. Since this was a zipped folder, I used the following commands to get started.

``````download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip", destfile = "files", method = "curl", mode = "wb")
``````

The `download.file()` function takes the URL as the first argument and saves the file on your local PC under the name you assign to `destfile`.

``````unzip("files")
``````

This function just unzips the zipped folder.

## Step 2: Reading the files into R

``````features <- read.table("UCI HAR Dataset/features.txt", col.names = c("serial", "Functions"))

activities <- read.table("UCI HAR Dataset/activity_labels.txt", col.names = c("serial", "Activity"))
x_test <- read.table("UCI HAR Dataset/test/X_test.txt", col.names = features$Functions)
y_test <- read.table("UCI HAR Dataset/test/y_test.txt", col.names = "serial")
subject_test <- read.table("UCI HAR Dataset/test/subject_test.txt", col.names = "subject")
subject_train <- read.table("UCI HAR Dataset/train/subject_train.txt", col.names = "subject")
x_train <- read.table("UCI HAR Dataset/train/X_train.txt", col.names = features$Functions)
y_train <- read.table("UCI HAR Dataset/train/y_train.txt", col.names = "serial")
``````

Note: It might be difficult to understand at first what the data means and which column names to use, but after a while it starts making sense. For example, it is important to note that the x_test and x_train files hold values for the columns listed in features.txt (hence I’ve linked them up using `features$Functions`).

Making sense of the Data:

After actually looking at the files, I found they were a mess: one .txt file held hundreds of column names, others held the row values, and one held the activity labels. After spending hours trying to understand the logical structure of the data, I was able to visualize it roughly as follows:

This clearly implies two things:

1. I had to merge the training and test sets by row binding them

2. I had to merge the different attributes of the subjects by column binding them.

This is where step 3 comes into play.

## Step 3: Merging the tables intelligently

First, I performed the rbind() function to make one huge dataset.

``````binded_x <- rbind(x_test, x_train)

binded_y <- rbind(y_test, y_train)
subject <- rbind(subject_test, subject_train)

# Next, cbind() attaches the columns as well.
raw_data_combined <- cbind(subject, binded_x, binded_y)
``````
