Nat Grady

1668042600

Visdat: Preliminary Exploratory Visualisation of Data

visdat 

How to install

visdat is available on CRAN

install.packages("visdat")

If you would like to use the development version, install it from GitHub with:

# install.packages("devtools")
devtools::install_github("ropensci/visdat")

What does visdat do?

Initially inspired by csv-fingerprint, visdat helps you visualise a dataframe and “get a look at the data”: vis_dat() displays the variable classes in a dataframe as a plot, and vis_miss() gives a brief look into missing data patterns.

visdat has the following functions:

vis_dat() visualises a dataframe showing you what the classes of the columns are, and also displaying the missing data.

vis_miss() visualises just the missing data, and allows for missingness to be clustered and columns rearranged. vis_miss() is similar to missing.pattern.plot from the mi package. Unfortunately missing.pattern.plot is no longer in the mi package (as of 14/02/2016).

vis_compare() visualises differences between two dataframes of the same dimensions.

vis_expect() visualises where certain conditions hold true in your data.

vis_cor() visualises the correlation of variables as a heatmap.

vis_guess() visualises the guessed class of each individual value in your data.

vis_value() visualises the values of each cell in your data on a 0 to 1 scale.

vis_binary() visualises the occurrence of binary values in your data.

You can read more about visdat in the vignette, “using visdat”.

Code of Conduct

Please note that the visdat project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Examples

Using vis_dat()

Let’s see what’s inside the airquality dataset from base R, which contains information about daily air quality measurements in New York from May to September 1973. More information about the dataset can be found with ?airquality.

library(visdat)

vis_dat(airquality)

The plot above tells us that R reads this dataset as having numeric and integer values, with some missing data in Ozone and Solar.R. The classes are shown in the legend, and missing data are shown in grey. The column/variable names are listed on the x axis.

Using vis_miss()

We can explore the missing data further using vis_miss():

vis_miss(airquality)

Percentages of missing/complete in vis_miss are accurate to 1 decimal place.
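To make the rounding concrete, here is a minimal Python sketch (not visdat's R code) of the percentages vis_miss() reports. It assumes a dataframe is represented as a dict of equal-length lists, with None standing in for R's NA; missing_summary is a hypothetical helper name, and the sample values are made up.

```python
def missing_summary(columns):
    """Return (per-column % missing, overall % missing), rounded to 1 dp.

    `columns` maps column names to equal-length lists; None marks a
    missing value (a stand-in for R's NA).
    """
    per_column = {}
    total_missing = 0
    total_cells = 0
    for name, values in columns.items():
        n_missing = sum(v is None for v in values)
        per_column[name] = round(100 * n_missing / len(values), 1)
        total_missing += n_missing
        total_cells += len(values)
    overall = round(100 * total_missing / total_cells, 1)
    return per_column, overall


# Made-up values: two of the six Ozone readings are missing
data = {
    "Ozone": [41, 36, None, 18, None, 28],
    "Wind": [7.4, 8.0, 12.6, 11.5, 14.3, 14.9],
}
per_column, overall = missing_summary(data)
print(per_column, overall)
```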

You can cluster the missingness by setting cluster = TRUE:

vis_miss(airquality, 
         cluster = TRUE)

Columns can also be sorted so that those with the most missingness come first, by setting sort_miss = TRUE:

vis_miss(airquality,
         sort_miss = TRUE)
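A minimal Python sketch of the column ordering that sort_miss = TRUE describes, assuming columns are stored as a dict of lists with None for NA. sort_by_missingness is a hypothetical name, and keeping tied columns in their original order is this sketch's choice, not documented visdat behaviour.

```python
def sort_by_missingness(columns):
    """Return column names ordered from most to least missing values.

    Python's sort is stable even with reverse=True, so columns with the
    same number of missing values keep their original left-to-right order.
    """
    def n_missing(name):
        return sum(v is None for v in columns[name])

    return sorted(columns, key=n_missing, reverse=True)


example = {
    "Wind": [7.4, 8.0, 12.6],
    "Ozone": [None, 36, None],
    "Solar.R": [None, 118, 149],
    "Temp": [67, 72, 74],
}
order = sort_by_missingness(example)
print(order)
```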

vis_miss indicates when there is a very small amount of missing data at <0.1% missingness:

test_miss_df <- data.frame(x1 = 1:10000,
                           x2 = rep("A", 10000),
                           x3 = c(rep(1L, 9999), NA))

vis_miss(test_miss_df)
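The labelling rule described above can be sketched as a small Python function. format_missing_pct is hypothetical, and the exact label strings visdat draws may differ; the point is the threshold: a non-zero share below 0.1% is labelled "<0.1%" instead of being rounded down to a misleading 0.0%. In test_miss_df, one NA among 30,000 cells is roughly 0.003%.

```python
def format_missing_pct(n_missing, n_total):
    """Label a missingness share, flagging tiny non-zero amounts."""
    pct = 100 * n_missing / n_total
    if pct == 0:
        return "0%"        # no missing data at all
    if pct < 0.1:
        return "<0.1%"     # present, but too small to show at 1 dp
    return f"{round(pct, 1)}%"


print(format_missing_pct(1, 30000))  # the single NA in test_miss_df
```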

vis_miss will also indicate when there is no missing data at all:

vis_miss(mtcars)

To further explore the missingness structure in a dataset, I recommend the naniar package, which provides more general tools for graphical and numerical exploration of missing values.

Using vis_compare()

Sometimes you want to see what has changed in your data. vis_compare() displays the differences in two dataframes of the same size. Let’s look at an example.

Let’s make some changes to the chickwts dataset and compare the modified copy with the original:

set.seed(2019-04-03-1105)
chickwts_diff <- chickwts
chickwts_diff[sample(1:nrow(chickwts), 30),sample(1:ncol(chickwts), 2)] <- NA

vis_compare(chickwts_diff, chickwts)

Here the differences are marked in blue.

If you try to compare dataframes whose dimensions differ, you get an error:

chickwts_diff_2 <- chickwts
chickwts_diff_2$new_col <- chickwts_diff_2$weight*2

vis_compare(chickwts, chickwts_diff_2)
# Error in vis_compare(chickwts, chickwts_diff_2) : 
#   Dimensions of df1 and df2 are not the same. vis_compare requires dataframes of identical dimensions.
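Conceptually, vis_compare() checks the two dataframes cell by cell. The Python sketch below is an illustration under assumptions, not visdat's implementation: tables are lists of rows, None stands in for NA, two missing cells count as "same", and compare_cells is a hypothetical name. Like vis_compare(), it refuses to compare tables of different dimensions.

```python
def compare_cells(df1, df2):
    """Mark each cell as 'same' or 'different' between two tables.

    Tables are lists of rows (lists). Raises ValueError when the
    dimensions differ.
    """
    same_shape = len(df1) == len(df2) and all(
        len(r1) == len(r2) for r1, r2 in zip(df1, df2)
    )
    if not same_shape:
        raise ValueError("Dimensions of df1 and df2 are not the same.")
    return [
        ["same" if a == b else "different" for a, b in zip(r1, r2)]
        for r1, r2 in zip(df1, df2)
    ]


# Made-up rows: one feed value has been set to missing in the copy
original = [[179, "horsebean"], [160, "horsebean"]]
changed = [[179, None], [160, "horsebean"]]
result = compare_cells(changed, original)
print(result)
```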

Using vis_expect()

vis_expect() visualises where certain conditions hold in your data. For example, if you are not sure whether to expect values of 25 or more in airquality, you can see where the values are greater than or equal to 25 with vis_expect(airquality, ~.x >= 25):

vis_expect(airquality, ~.x >= 25)

This shows the proportion of values that are greater than or equal to 25, as well as the proportion of missing values.
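As a rough sketch of the bookkeeping behind that plot, the hypothetical Python function expect_summary below applies a condition to every value and reports the shares where it holds, where it fails, and where the value is missing. A lambda plays the role of the R formula ~.x >= 25, None stands in for NA, and the sample values are made up.

```python
def expect_summary(values, condition):
    """Shares of cells where `condition` is true, false, or the value is missing."""
    n = len(values)
    missing = sum(v is None for v in values)
    hits = sum(1 for v in values if v is not None and condition(v))
    return {
        "true": hits / n,
        "false": (n - missing - hits) / n,
        "missing": missing / n,
    }


values = [41, 36, 12, 18, None, 28]
summary = expect_summary(values, lambda x: x >= 25)
print(summary)
```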

Using vis_cor()

To make it easy to plot correlations of your data, use vis_cor:

vis_cor(airquality)
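This section does not state which correlation vis_cor() computes or how it treats NA. The Python sketch below assumes Pearson correlation with pairwise deletion of missing values, the same idea as R's cor(x, use = "pairwise.complete.obs"); pairwise_pearson and correlation_matrix are hypothetical helpers, and the data are made up.

```python
import math


def pairwise_pearson(x, y):
    """Pearson correlation using only positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """All pairwise correlations, keyed by (column, column)."""
    names = list(columns)
    return {(a, b): pairwise_pearson(columns[a], columns[b])
            for a in names for b in names}


cols = {"x": [1, 2, 3, None], "y": [2, 4, 6, 8], "z": [3, 2, 1, 0]}
m = correlation_matrix(cols)
print(m[("x", "y")], m[("x", "z")], m[("y", "z")])
```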

Using vis_value()

vis_value() visualises the values of your data on a 0 to 1 scale.

vis_value(airquality)
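A 0 to 1 scale is typically obtained with min-max rescaling, (x - min) / (max - min). The Python sketch below assumes each column is rescaled on its own and that missing values (None) are left in place; scale_01 is a hypothetical name, and visdat's exact handling may differ.

```python
def scale_01(values):
    """Rescale numbers to [0, 1] with min-max scaling; None stays None."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to scale
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if v is None else (0.0 if span == 0 else (v - lo) / span)
        for v in values
    ]


print(scale_01([10, 20, None, 30]))
```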

It only works on numeric data, so you will get an error if your data contains factors or other non-numeric columns:

vis_value(iris)
# data input can only contain numeric values, please subset the data to the numeric values you would like. dplyr::select_if(data, is.numeric) can be helpful here!

So you might need to subset the data beforehand like so:

library(dplyr)

iris %>%
  select_if(is.numeric) %>%
  vis_value()

Using vis_binary()

vis_binary() visualises binary values. Here it is used with the example data dat_bin:

vis_binary(dat_bin)

If your data contains values other than 0, 1 or NA, an error is raised:

vis_binary(airquality)
# Error in test_if_all_binary(data) : 
#   data input can only contain binary values - this means either 0 or 1, or NA. Please subset the data to be binary values, or see ?vis_value.
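The rule behind this error can be sketched in a few lines of Python: every value must be 0, 1 or missing. is_all_binary is a hypothetical helper, not visdat's test_if_all_binary(), and None stands in for NA. One Python quirk to note: True and False compare equal to 1 and 0, so booleans also pass this check.

```python
def is_all_binary(columns):
    """True if every value in every column is 0, 1 or missing (None)."""
    allowed = {0, 1}
    return all(
        v is None or v in allowed
        for values in columns.values()
        for v in values
    )


print(is_all_binary({"a": [0, 1, None], "b": [1, 1, 0]}))
print(is_all_binary({"Ozone": [41, None, 12]}))
```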

Using vis_guess()

vis_guess() takes a guess at what each cell is. It’s best illustrated using some messy data, which we’ll make here:

messy_vector <- c(TRUE,
                  T,
                  "TRUE",
                  "T",
                  "01/01/01",
                  "01/01/2001",
                  NA,
                  NaN,
                  "NA",
                  "Na",
                  "na",
                  "10",
                  10,
                  "10.1",
                  10.1,
                  "abc",
                  "$%TG")

set.seed(2019-04-03-1106)
messy_df <- data.frame(var1 = messy_vector,
                       var2 = sample(messy_vector),
                       var3 = sample(messy_vector))

vis_guess(messy_df)
vis_dat(messy_df)

Comparing the two plots, vis_guess() shows that each column holds many different kinds of data, whereas vis_dat() reports only a single class per column. As an analyst, this might be a depressing finding.
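vis_guess() relies on readr's internals to guess types. As a much simpler stand-in, the Python sketch below guesses a type for each cell with regular expressions. The categories and patterns are assumptions chosen for illustration, guess_type is a hypothetical name, and its answers will not always match vis_guess().

```python
import re


def guess_type(cell):
    """Guess a type for one cell, loosely in the spirit of vis_guess()."""
    if cell is None:
        return "NA"
    text = str(cell)
    if text in {"TRUE", "FALSE", "T", "F", "True", "False"}:
        return "logical"
    if re.fullmatch(r"[-+]?\d+", text):
        return "integer"
    if re.fullmatch(r"[-+]?(\d+\.\d*|\.\d+)([eE][-+]?\d+)?", text):
        return "double"
    if re.fullmatch(r"\d{2}/\d{2}/\d{2}(\d{2})?", text):
        return "date"
    return "character"


cells = ["TRUE", "T", "10", "10.1", "01/01/2001", None, "abc", "$%TG"]
guesses = [guess_type(c) for c in cells]
print(guesses)
```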

Thank yous

Thank you to Ivan Hanigan, who first suggested this in a comment after I wrote a blog post about an initial prototype, ggplot_missing, and to Jenny Bryan, whose tweet got me thinking about vis_dat and whose code contributions removed a lot of errors.

Thank you to Hadley Wickham for suggesting the use of the internals of readr to make vis_guess work. Thank you to Miles McBain for his suggestions on how to improve vis_guess, which made it at least 2-3 times faster. Thanks to Carson Sievert for writing the code that combined plotly with visdat, and to Noam Ross for suggesting this in the first place. Thank you also to Earo Wang and Stuart Lee for their help in capturing expressions in vis_expect.

Finally, thank you to rOpenSci and its amazing onboarding process, which has made visdat a much better package, and thanks to the editor Noam Ross (@noamross) and the reviewers Sean Hughes (@seaaan) and Mara Averick (@batpigandme).

Download Details:

Author: ropensci
Source Code: https://github.com/ropensci/visdat 
License: View license

#r #dataanalysis #visualising #rstats 

Nat Grady

1668034680

lidR: Airborne LiDAR Data Manipulation, Visualisation for Forestry Applications

lidR 

R package for Airborne LiDAR Data Manipulation and Visualization for Forestry Applications

The lidR package provides functions to read and write .las and .laz files, plot point clouds, compute metrics using an area-based approach, compute digital canopy models, thin LiDAR data, manage a collection of LAS/LAZ files, automatically extract ground inventories, process a collection of tiles using multicore processing, segment individual trees, classify points from geographic data, and provides other tools to manipulate LiDAR data in a research and development context.

Read the book to get started with the lidR package. See the changelog in NEWS.md.

To cite the package use citation() from within R:

citation("lidR")
#> Roussel, J.R., Auty, D., Coops, N. C., Tompalski, P., Goodbody, T. R. H., Sánchez Meador, A., Bourdon, J.F., De Boissieu, F., Achim, A. (2020). lidR : An R package for analysis of Airborne Laser Scanning (ALS) data. Remote Sensing of Environment, 251 (August), 112061. <doi:10.1016/j.rse.2020.112061>.
#> Jean-Romain Roussel and David Auty (2021). Airborne LiDAR Data Manipulation and Visualization for Forestry Applications. R package version 3.1.0. https://cran.r-project.org/package=lidR

Key features

Read and display a las file

In typical R fashion, the plot() function, based on rgl, lets the user display, rotate and zoom a point cloud. Because rgl has limited capabilities with large datasets, we also made a package, lidRviewer, with better display capabilities.

las <- readLAS("<file.las>")
plot(las)

Compute a canopy height model

lidR implements several algorithms from the literature for computing canopy height models (CHMs), either point-to-raster based or triangulation based. This allows testing and comparison of methods that rely on a CHM, such as individual tree segmentation or the computation of a canopy roughness index.

las <- readLAS("<file.las>")

# Khosravipour et al. pitfree algorithm
thr <- c(0,2,5,10,15)
edg <- c(0, 1.5)
chm <- rasterize_canopy(las, 1, pitfree(thr, edg))

plot(chm)
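The simplest point-to-raster idea keeps the highest point that falls in each pixel; lidR exposes this kind of approach as p2r(), while pitfree() is considerably more elaborate (it combines triangulations of points above several height thresholds) and is not reproduced here. The Python sketch below assumes point heights are already normalised above ground and uses a dict keyed by (column, row) in place of a raster; point_to_raster_chm is a hypothetical name and the points are made up.

```python
import math


def point_to_raster_chm(points, res):
    """Highest Z per grid cell of size `res` (point-to-raster CHM sketch).

    `points` is an iterable of (x, y, z) tuples with heights already
    normalised above ground. Returns {(col, row): max_z}; cells without
    points are simply absent.
    """
    chm = {}
    for x, y, z in points:
        cell = (math.floor(x / res), math.floor(y / res))
        if cell not in chm or z > chm[cell]:
            chm[cell] = z
    return chm


points = [(0.2, 0.3, 12.0), (0.8, 0.1, 15.5), (1.4, 0.5, 9.0), (1.9, 1.2, 3.0)]
chm = point_to_raster_chm(points, res=1)
print(chm)
```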

Read and display a catalog of las files

lidR enables the user to manage, use and process a collection of las files. The function readLAScatalog builds a LAScatalog object from a folder. The function plot displays this collection on an interactive map using the mapview package (if installed).

ctg <- readLAScatalog("<folder/>")
plot(ctg, map = TRUE)

From a LAScatalog object the user can (for example) extract some regions of interest (ROI) with clip_roi(). Using a catalog for the extraction of the ROI guarantees fast and memory-efficient clipping. LAScatalog objects allow many other manipulations that can be done with multicore processing.

Individual tree segmentation

The segment_trees() function has several algorithms from the literature for individual tree segmentation, based either on the digital canopy model or on the point-cloud. Each algorithm has been coded from the source article to be as close as possible to what was written in the peer-reviewed papers. Our goal is to make published algorithms usable, testable and comparable.

las <- readLAS("<file.las>")

las <- segment_trees(las, li2012())
col <- random.colors(200)
plot(las, color = "treeID", colorPalette = col)

Wall-to-wall dataset processing

Most of the lidR functions can seamlessly process a set of tiles and return a continuous output. Users can create their own methods using the LAScatalog processing engine via the catalog_apply() function. Among other features the engine takes advantage of point indexation with lax files, takes care of processing tiles with a buffer and allows for processing big files that do not fit in memory.

# Load a LAScatalog instead of a LAS file
ctg <- readLAScatalog("<path/to/folder/>")

# Process it like a LAS file
chm <- rasterize_canopy(ctg, 2, p2r())
col <- random.colors(50)
plot(chm, col = col)
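catalog_apply() and the functions built on it manage chunking and buffers for you. Purely to illustrate the geometry, the hypothetical Python function below splits an extent into square tiles and pads each one with a buffer: points inside the buffered box would be loaded so that objects near tile edges are processed with their neighbours, and only results inside the core box would be kept. Tile size, buffer width and coordinates are made-up values.

```python
def buffered_tiles(xmin, ymin, xmax, ymax, tile_size, buffer):
    """Split an extent into square tiles and pad each one with a buffer.

    Returns a list of (core, buffered) bounding boxes, each given as
    (xmin, ymin, xmax, ymax). Tiles on the right/top edge are clipped
    to the extent before the buffer is added.
    """
    tiles = []
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            core = (x, y, min(x + tile_size, xmax), min(y + tile_size, ymax))
            buffered = (core[0] - buffer, core[1] - buffer,
                        core[2] + buffer, core[3] + buffer)
            tiles.append((core, buffered))
            x += tile_size
        y += tile_size
    return tiles


tiles = buffered_tiles(0, 0, 2000, 1000, tile_size=1000, buffer=30)
for core, buffered in tiles:
    print(core, buffered)
```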

Full waveform

lidR can read full waveform data from LAS files and provides interpreter functions to convert the raw data into something easier to manage and display in R. The support of FWF is still in the early stages of development.

fwf <- readLAS("<fullwaveform.las>")

# Interpret the waveform into something easier to manage
las <- interpret_waveform(fwf)

# Display discrete points and waveforms
x <- plot(fwf, colorPalette = "red", bg = "white")
plot(las, color = "Amplitude", add = x)

About

lidR is developed openly at Laval University.

Install lidR dependencies on GNU/Linux

# Ubuntu
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
sudo apt-get update
sudo apt-get install libgdal-dev libgeos++-dev libudunits2-dev libproj-dev libx11-dev libgl1-mesa-dev libglu1-mesa-dev libfreetype6-dev libxt-dev libfftw3-dev

# Fedora
sudo dnf install gdal-devel geos-devel udunits2-devel proj-devel mesa-libGL-devel mesa-libGLU-devel freetype-devel libjpeg-turbo-devel

Download Details:

Author: r-lidar
Source Code: https://github.com/r-lidar/lidR 
License: GPL-3.0

#r #data #visualising 

GR Framework: A Graphics Library for Visualisation Applications

GR - a universal framework for visualization applications   

GR is a universal framework for cross-platform visualization applications. It offers developers a compact, portable and consistent graphics library for their programs. Applications range from publication quality 2D graphs to the representation of complex 3D scenes.

GR is essentially based on an implementation of a Graphical Kernel System (GKS). As a self-contained system it can quickly and easily be integrated into existing applications (e.g. using the ctypes mechanism in Python or ccall in Julia).

The GR framework can be used in imperative programming systems or integrated into modern object-oriented systems, in particular those based on GUI toolkits. GR is characterized by its high interoperability and can be used with modern web technologies. The GR framework is especially suitable for real-time or signal processing environments.

GR was developed by the Scientific IT-Systems group at the Peter Grünberg Institute at Forschungszentrum Jülich. The main development has been done by Josef Heinen, who currently maintains the software, but other developers also make valuable contributions. Special thanks to Florian Rhiem (GR3) and Christian Felder (qtgr, setup.py).

Starting with release 0.6, GR can be used as a backend for Matplotlib and can significantly improve the performance of existing Matplotlib or PyPlot applications written in Python or Julia, respectively. In this tutorial section you can find some examples.

Beginning with version 0.10.0, GR supports inline graphics, which show up in IPython's Qt Console or in interactive computing environments for Python and Julia, such as IPython and Jupyter. An interesting example can be found here.

Installation and Getting Started

To install GR and try it using Python, Julia or C, please see the corresponding documentation.

Documentation

You can find more information about GR on the GR home page.

Contributing

If you want to improve GR, please read the contribution guide for a few notes on how to report issues or submit changes.

Support

If you have any questions about GR or run into any issues setting up or running GR, please open an issue on GitHub, either in this repo or in the repo for the language binding you are using (Python, Julia, Ruby).

Download Details:

Author: Sciapp
Source Code: https://github.com/sciapp/gr 
License: View license

#julia #graphic  #visualising 

GR.jl: Plotting for Julia Based on GR

The GR module for Julia     

This module provides a Julia interface to GR, a framework for visualisation applications.

Installation

From the Julia REPL an up to date version can be installed with:

using Pkg
Pkg.add("GR")

or in the Pkg REPL-mode:

add GR

The Julia package manager will download and install a pre-compiled run-time (for your hardware architecture), if the GR software is not already installed in the recommended locations.

Getting started

In Julia simply type using GR and begin calling functions in the GR framework API.

Let's start with a simple example. We generate 10,000 random numbers and create a histogram. The histogram function automatically chooses an appropriate number of bins to cover the range of values in x and show the shape of the underlying distribution.

using GR
histogram(randn(10000))
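GR's own binning rule is not described here. As one common way to choose the number of bins, the Python sketch below applies Sturges' rule, ceil(log2(n)) + 1, and spaces the bin edges evenly across the data range; sturges_bins and bin_edges are hypothetical helpers, not GR functions.

```python
import math


def sturges_bins(n):
    """Number of histogram bins from Sturges' rule: ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1


def bin_edges(values, n_bins):
    """Evenly spaced edges covering the range of `values`."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


print(sturges_bins(10000))      # bins for 10,000 samples
print(bin_edges([0, 10], 5))
```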

Using GR as backend for Plots.jl

Plots is a powerful wrapper around other Julia visualization "backends", of which GR seems to be one of the most popular. To get an impression of how Plots can make complex visualizations easier, take a look at these examples.

Plots is great on its own, but the real power comes from the ecosystem surrounding it. You can find more information here.

Alternatives

Besides GR and Plots, there is a nice package called GRUtils, which provides a user-friendly interface to the low-level GR subsystem, but in a more "Julian" and modular style. Newcomers are recommended to use this package. Detailed documentation can be found here.

GR and GRUtils are currently still being developed in parallel, but there are plans to merge the two modules in the future.

Run-time environment

GR.jl is a wrapper for the GR Framework. Therefore, the GR run-time libraries are required to use the software. These are provided via the GR_jll.jl package, which is an autogenerated package constructed using BinaryBuilder. This is the default setting.

Another alternative is the use of binaries from GR tarballs, which are provided directly by the GR developers as stand-alone distributions for selected platforms - regardless of the programming language. In this case, only one GR runtime environment is required for different language environments (Julia, Python, C/C++), whose installation path can be specified by the environment variable GRDIR.

ENV["JULIA_DEBUG"] = "GR" # Turn on debug statements for the GR package
ENV["GRDIR"] = "<path of your GR installation>" # e.g. "/usr/local/gr"
using GR

For more information about setting up a local GR installation, see the GR Framework website.

However, if you want to permanently use your own GR run-time, you have to set the environment variable GRDIR accordingly before starting Julia, e.g.

  • macOS or Linux: export GRDIR=/usr/local/gr
  • Windows: set GRDIR=C:\gr

Please note that with the method shown here, GR_jll is not imported.

Download Details:

Author: jheinen
Source Code: https://github.com/jheinen/GR.jl 
License: View license

#julia #visualising 

Nat Grady

1659735000

ggplot2: An implementation of the Grammar of Graphics in R

ggplot2

Overview

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

Installation

# The easiest way to get ggplot2 is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just ggplot2:
install.packages("ggplot2")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/ggplot2")

Usage

It’s hard to succinctly describe how ggplot2 works because it embodies a deep philosophy of visualisation. However, in most cases you start with ggplot(), supply a dataset and aesthetic mapping (with aes()). You then add on layers (like geom_point() or geom_histogram()), scales (like scale_colour_brewer()), faceting specifications (like facet_wrap()) and coordinate systems (like coord_flip()).

library(ggplot2)

ggplot(mpg, aes(displ, hwy, colour = class)) + 
  geom_point()

Lifecycle

ggplot2 is now over 10 years old and is used by hundreds of thousands of people to make millions of plots. That means, by and large, ggplot2 itself changes relatively little. When we do make changes, they will generally be to add new functions or arguments rather than to change the behaviour of existing functions, and if we do change existing behaviour, we will do so only for compelling reasons.

If you are looking for innovation, look to ggplot2’s rich ecosystem of extensions. See a community maintained list at https://exts.ggplot2.tidyverse.org/gallery/.

Learning ggplot2

If you are new to ggplot2 you are better off starting with a systematic introduction, rather than trying to learn from reading individual documentation pages. Currently, there are three good places to start:

The Data Visualisation and Graphics for communication chapters in R for Data Science. R for Data Science is designed to give you a comprehensive introduction to the tidyverse, and these two chapters will get you up to speed with the essentials of ggplot2 as quickly as possible.

If you’d like to take an online course, try Data Visualization in R With ggplot2 by Kara Woo.

If you’d like to follow a webinar, try Plotting Anything with ggplot2 by Thomas Lin Pedersen.

If you want to dive into making common graphics as quickly as possible, I recommend The R Graphics Cookbook by Winston Chang. It provides a set of recipes to solve common graphics problems.

If you’ve mastered the basics and want to learn more, read ggplot2: Elegant Graphics for Data Analysis. It describes the theoretical underpinnings of ggplot2 and shows you how all the pieces fit together, which will help you create new types of graphics specifically tailored to your needs.

Getting help

There are two main places to get help with ggplot2:

The RStudio community is a friendly place to ask any questions about ggplot2.

Stack Overflow is a great source of answers to common ggplot2 questions. It is also a great place to get help, once you have created a reproducible example that illustrates your problem.

Author: Tidyverse
Source Code: https://github.com/tidyverse/ggplot2 
License: MIT

#r #visualising #datavisualisation 

Alec Nikolaus

1600891200

Hands-On Guide To Graphviz Python Tool To Define And Visualize Graphs

Python provides different visualization libraries that allow us to create many kinds of graphs and plots. These help us see patterns and anomalies in the data, or whether the data has missing values. Visualization is an important part of data discovery.

Modules like seaborn, matplotlib and bokeh are used to create visualizations that are scalable, visually attractive and, in some cases, interactive. But these libraries are not designed for building diagrams, flowcharts or graphs out of nodes connected by edges. For creating graphs from nodes and edges, we can use Graphviz.

Graphviz is an open-source Python module used to create graph objects made up of nodes and edges. It is based on the DOT language of the Graphviz software, and in Python it lets us save the source code of the graph in the DOT language.


In this article, we will see how to create a graph using Graphviz and how to save the source code of the graph in the DOT language.
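With the graphviz package, graphs are typically built from a Digraph object whose node() and edge() methods add elements and whose source attribute holds the generated DOT code. To show what that DOT code looks like without needing Graphviz installed, here is a dependency-free Python sketch; to_dot is a hypothetical helper rather than part of the graphviz package, and its output format only approximates what the package produces. Labels are not escaped, so this sketch assumes they contain no double quotes.

```python
def to_dot(name, nodes, edges, directed=True):
    """Build DOT source for a graph.

    `nodes` maps node IDs to labels; `edges` is a list of (tail, head)
    pairs. Directed graphs use 'digraph' and '->', undirected ones
    'graph' and '--'.
    """
    keyword, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{keyword} {name} {{"]
    for node_id, label in nodes.items():
        lines.append(f'\t{node_id} [label="{label}"]')
    for tail, head in edges:
        lines.append(f"\t{tail} {arrow} {head}")
    lines.append("}")
    return "\n".join(lines)


src = to_dot("G", {"A": "Start", "B": "End"}, [("A", "B")])
print(src)
```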

#graphviz #nodes #visualising #python
