1660006920
DiagrammeR
With the DiagrammeR package you can create, modify, analyze, and visualize network graph diagrams. The output can be incorporated into R Markdown documents, integrated with Shiny web apps, converted to other graph formats, or exported as image files.
This package is made possible by the htmlwidgets R package, which provides an easy-to-use framework for bringing together R and JavaScript.
The graph above can be created with this combination of DiagrammeR functions:
example_graph <-
create_graph() %>%
add_pa_graph(
n = 50, m = 1,
set_seed = 23
) %>%
add_gnp_graph(
n = 50, p = 1/100,
set_seed = 23
) %>%
join_node_attrs(df = get_betweenness(.)) %>%
join_node_attrs(df = get_degree_total(.)) %>%
colorize_node_attrs(
node_attr_from = total_degree,
node_attr_to = fillcolor,
palette = "Greens",
alpha = 90
) %>%
rescale_node_attrs(
node_attr_from = betweenness,
to_lower_bound = 0.5,
to_upper_bound = 1.0,
node_attr_to = height
) %>%
select_nodes_by_id(nodes = get_articulation_points(.)) %>%
set_node_attrs_ws(node_attr = peripheries, value = 2) %>%
set_node_attrs_ws(node_attr = penwidth, value = 3) %>%
clear_selection() %>%
set_node_attr_to_display(attr = NULL)
render_graph(example_graph, layout = "nicely")
DiagrammeR’s graph functions allow you to create graph objects, modify those graphs, get information from the graphs, create a series of graphs, and do many other useful things. This makes it possible to generate a network graph with data available in tabular datasets. Two specialized data frames contain node data and attributes (node data frames) and edges with associated edge attributes (edge data frames). Because the attributes are always kept alongside the node and edge definitions (within the graph object itself), we can easily work with them.
Let’s create a graph object with create_graph()
and add some nodes and edges to it. Each node gets a new integer ID upon creation. Each edge also gets an ID starting from 1. The pipes between functions make the whole process readable and understandable.
a_graph <-
create_graph() %>%
add_node() %>%
add_node() %>%
add_edge(from = 1, to = 2)
We can take away an edge by using delete_edge()
.
b_graph <- a_graph %>% delete_edge(from = 1, to = 2)
We can add a node to the graph while, at the same time, defining edges to or from existing nodes in the graph.
c_graph <- b_graph %>% add_node(from = 1, to = 2)
Viewing the graph object in the console will provide some basic information about the graph and some pointers on where to get additional information.
c_graph
#> DiagrammeR Graph // 3 nodes / 2 edges
#> -- directed / connected / DAG / simple
#>
#> NODES / type: <unused> / label: <unused> info: `get_node_df()`
#> -- no additional node attributes
#> EDGES / rel: <unused> info: `get_edge_df()`
#> -- no additional edge attributes
#> SELECTION / <none>
#> CACHE / <none>
#> GLOBAL ATTRS / 17 are set info: `get_global_graph_attr_info()`
#> GRAPH ACTIONS / <none>
#> GRAPH LOG / <3 actions> -> add_edge() -> delete_edge() -> add_node()
Any time we add a node or edge to the graph, we can add node or edge aesthetic or data attributes. These can be styling properties (e.g., color
, shape
), grouping labels (e.g., type
and rel
), or data values that are useful for calculations and for display purposes. Most node or edge creation functions (depending on whether they create either edges, nodes, or both) have the arguments node_aes
, edge_aes
, node_data
, and edge_data
. Using these, we can call the namesake helper functions (node_aes()
, edge_aes()
, node_data()
, and edge_data()
) to specifically target the created nodes or edges and bind attribute data. An additional benefit in using the helper functions (for the node/edge aesthetic attributes especially) is that RStudio can provide inline help on attribute names and definitions when typing node_aes(
or edge_aes(
and pressing the TAB key.
Here is an example of adding a node while setting its color
, fillcolor
, and fontcolor
node aesthetic attributes, and, adding an edge with color
, arrowhead
, and tooltip
edge aesthetic attributes. In both the add_node()
and the add_edge()
calls, the new node and edge were set with a value
node/edge data attribute.
d_graph <-
c_graph %>%
add_node(
type = "type_a",
node_aes = node_aes(
color = "steelblue",
fillcolor = "lightblue",
fontcolor = "gray35"
),
node_data = node_data(
value = 2.5
)
) %>%
add_edge(
from = 1, to = 3,
rel = "interacted_with",
edge_aes = edge_aes(
color = "red",
arrowhead = "vee",
tooltip = "Red Arrow"
),
edge_data = edge_data(
value = 5.2
)
)
Creating attributes and setting their values is often useful because we can further work with the attributes (e.g., mutate values or even use them during traversals). Furthermore, we can create aesthetic properties based on numerical or categorical data. This is important for when you want to display your graph diagram using the render_graph()
function.
Don’t worry if attribute values weren’t set right during the creation of the associated nodes or edges. They are ways to set attribute values for existing nodes and edges. Functions are available for targeting the specific nodes/edges (i.e., making a selection) and other functions are used to set attribute values for the selected nodes or edges. Often, this can be the more efficient strategy as we can target nodes/edges based on their properties (e.g., degree, relationships to neighbors, etc.). Here is an example where we select a node based on its value
attribute and modify its color
node aesthetic attribute:
e_graph <-
d_graph %>%
select_nodes(conditions = value == 2.5) %>%
set_node_attrs_ws(node_attr = fillcolor, value = "orange") %>%
clear_selection()
To explain this a bit, we take the graph object d_graph
, select only the nodes that have a node value
attribute of exactly 2.5
. (We now have an active node selection.) With the selected nodes, we set their node attribute fillcolor
with the value orange
. Then we deactivate the selection with clear_selection()
. Now, if we view the graph with render_graph()
we get this:
There are quite a few functions that allow you to select nodes (e.g., select_nodes()
, select_nodes_by_id()
, select_last_nodes_created()
) and edges (e.g., select_edges()
, select_edges_by_edge_id()
, select_last_edges_created()
). With these selections, we can apply changes using functions that end with ..._ws()
(with selection). As seen, node attributes could be set/replaced with set_node_attrs_ws()
but we can also mutate attributes of selected nodes (mutate_node_attrs_ws()
), delete selected nodes (delete_nodes_ws()
), and even create a subgraph with that selection (create_subgraph_ws()
). Selections of nodes or edges can be inverted (where non-selected nodes or edges become the active selection) with invert_selection()
, certain nodes/edges can be removed from the active selection with the deselect_nodes()
/deselect_edges()
, and any selection can and should be eventually cleared with clear_selection()
.
We can create a graph object and add graph primitives such as paths, cycles, and trees to it.
f_graph <-
create_graph() %>%
add_path(n = 3) %>%
add_cycle(n = 4) %>%
add_balanced_tree(k = 2, h = 2)
You can add one or more randomly generated graphs to a graph object. Here, let’s add a directed GNM graph with 10 nodes and 15 edges (the set_seed
option makes the random graph reproducible).
g_graph <-
create_graph() %>%
add_gnm_graph(
n = 15, m = 20,
set_seed = 23
)
The undirected version of this graph is can be made using:
h_graph <-
create_graph(directed = FALSE) %>%
add_gnm_graph(
n = 15, m = 20,
set_seed = 23
)
We can view the graph using render_graph()
. There are several layouts to choose from as well (e.g., nicely
, tree
, kk
, fr
, etc.).
render_graph(h_graph, layout = "fr")
The DiagrammeR package contains a few simple datasets that help illustrate how to create a graph with table data. The node_list_1
and edge_list_1
datasets are super simple node and edge data frames that can be assembled into a graph. Let’s print them side by side to see what we’re working with.
node_list_1 edge_list_1
id label from to
1 1 A 1 1 2
2 2 B 2 1 3
3 3 C 3 1 4
4 4 D 4 1 9
5 5 E 5 2 8
6 6 F 6 2 7
7 7 G 7 2 1
8 8 H 8 2 10
9 9 I 9 3 1
10 10 J 10 3 6
11 3 8
12 4 1
13 5 7
14 6 2
15 6 9
16 8 1
17 9 3
18 9 10
19 10 1
To fashion this into a graph, we need to ensure that both the nodes and their attributes (in this case, just a label
) are added, and, that the edges are added. Furthermore, we must map the from
and the to
definitions to the node id
(in other cases, we may need to map relationships between text labels to the same text attribute stored in the node data frame). We can use three functions to generate a graph containing this data:
create_graph()
add_nodes_from_table()
add_edges_from_table()
Let’s show the process in a stepwise fashion (while occasionally viewing the graph’s internal ndf and edf) so that we can understand what is actually happening. First, create the graph object with create_graph()
:
# Create the graph object
i_graph_1 <- create_graph()
# It will start off as empty
i_graph_1 %>% is_graph_empty()
#> [1] TRUE
Add nodes from a table with add_nodes_from_table()
:
# Add the nodes to the graph
i_graph_2 <-
i_graph_1 %>%
add_nodes_from_table(
table = node_list_1,
label_col = label
)
Inspect the graph’s internal node data frame (ndf) with get_node_df()
:
# View the graph's internal node data frame
i_graph_2 %>% get_node_df()
#> id type label id_external
#> 1 1 <NA> A 1
#> 2 2 <NA> B 2
#> 3 3 <NA> C 3
#> 4 4 <NA> D 4
#> 5 5 <NA> E 5
#> 6 6 <NA> F 6
#> 7 7 <NA> G 7
#> 8 8 <NA> H 8
#> 9 9 <NA> I 9
#> 10 10 <NA> J 10
The graph now has 10 nodes (no edges yet). Each node was automatically assigned an auto-incrementing id
. The incoming id
was also automatically renamed id_external
so as to avoid duplicate column names and also to retain a column for mapping edge definitions. Now, let’s add the edges. We need to specify that the from_col
in the edge_list_1
table is indeed from
and that the to_col
is to
. The from_to_map
argument expects a node attribute column that the from
and to
columns will map to. In this case it’s id_external
. Note that while id
also matches perfectly in this mapping, there may be cases where id
won’t match with and id_external
column (e.g., when there are existing nodes or when the node id
values in the incoming table are provided in a different order, etc.).
Now, connect the graph nodes with edges from another dataset using add_edges_from_table()
:
# Add the edges to the graph
i_graph_3 <-
i_graph_2 %>%
add_edges_from_table(
table = edge_list_1,
from_col = from,
to_col = to,
from_to_map = id_external
)
Inspect the graph’s internal edge data frame (edf) with get_edge_df()
:
# View the edge data frame
i_graph_3 %>% get_edge_df()
#> id from to rel
#> 1 1 1 2 <NA>
#> 2 2 1 3 <NA>
#> 3 3 1 4 <NA>
#> 4 4 1 9 <NA>
#> 5 5 2 8 <NA>
#> 6 6 2 7 <NA>
#> 7 7 2 1 <NA>
#> 8 8 2 10 <NA>
#> 9 9 3 1 <NA>
#> 10 10 3 6 <NA>
#> 11 11 3 8 <NA>
#> 12 12 4 1 <NA>
#> 13 13 5 7 <NA>
#> 14 14 6 2 <NA>
#> 15 15 6 9 <NA>
#> 16 16 8 1 <NA>
#> 17 17 9 3 <NA>
#> 18 18 9 10 <NA>
#> 19 19 10 1 <NA>
By supplying the name of the graph object in the console, we can get a succinct summary of the graph’s properties. Here, we see that the graph has 10 nodes and 19 edges:
i_graph_3
#> DiagrammeR Graph // 10 nodes / 19 edges
#> -- directed / connected / simple
#>
#> NODES / type: <unused> / label: 10 vals - complete & unique
#> -- 1 additional node attribute (id_external)
#> EDGES / rel: <unused> info: `get_edge_df()`
#> -- no additional edge attributes
#> SELECTION / <none>
#> CACHE / <none>
#> GLOBAL ATTRS / 17 are set info: `get_global_graph_attr_info()`
#> GRAPH ACTIONS / <none>
#> GRAPH LOG / <1 action> -> add_nodes_from_table() -> add_edges_from_table() -> ()
There are two other similar datasets included in the package (node_list_2
and edge_list_2
). These contain extended attribute data. Let’s have a quick look at their column names:
colnames(node_list_2)
#> [1] "id" "label" "type" "value_1" "value_2"
colnames(edge_list_2)
#> [1] "from" "to" "rel" "value_1" "value_2"
Because we have unique labels in the label
column, and categorical labels in the type
and rel
columns, we can create a property graph from this data. Like before, we can incorporate the two tables as a graph with add_nodes_from_table()
and add_edges_from_table()
. This time, we’ll remove the auto-generated id_external
node attribute with the drop_node_attrs()
function.
j_graph <-
create_graph() %>%
add_nodes_from_table(
table = node_list_2,
label_col = label,
type_col = type
) %>%
add_edges_from_table(
table = edge_list_2,
from_col = from,
to_col = to,
from_to_map = id_external,
rel_col = rel
) %>%
drop_node_attrs(node_attr = id_external)
Let’s again view the graph summary in the console. Note that the additional node attributes (value_1
and value_2
) are present for both the nodes and the edges:
j_graph
#> DiagrammeR Graph // 10 nodes / 19 edges
#> -- directed / connected / property graph / simple
#>
#> NODES / type: 2 vals - complete / label: 10 vals - complete & unique
#> -- 2 additional node attributes (value_1, value_2)
#> EDGES / rel: 3 vals - complete info: `get_edge_df()`
#> -- 2 additional edge attributes (value_1, value_2)
#> SELECTION / <none>
#> CACHE / <none>
#> GLOBAL ATTRS / 17 are set info: `get_global_graph_attr_info()`
#> GRAPH ACTIONS / <none>
#> GRAPH LOG / <3 actions> -> add_edges_from_table() -> () -> drop_node_attrs()
Now, because we have node/edge metadata (categorical labels and numerical data in value_1
& value_2
for both nodes and edges), we can do some interesting things with the graph. First, let’s do some mutation with mutate_node_attrs()
and mutate_edge_attrs()
and get the sums of value_1
and value_2
as value_3
(for both the nodes and the edges). Then, let’s color the nodes and edges forestgreen
if value_3
is greater than 10
(red
otherwise). Finally, let’s display the values of value_3
for the nodes when rendering the graph diagram. Here we go!
k_graph <-
j_graph %>%
mutate_node_attrs(value_3 = value_1 + value_2) %>%
mutate_edge_attrs(value_3 = value_1 + value_2) %>%
select_nodes(conditions = value_3 > 10) %>%
set_node_attrs_ws(node_attr = fillcolor, value = "forestgreen") %>%
invert_selection() %>%
set_node_attrs_ws(node_attr = fillcolor, value = "red") %>%
select_edges(conditions = value_3 > 10) %>%
set_edge_attrs_ws(edge_attr = color, value = "forestgreen") %>%
invert_selection() %>%
set_edge_attrs_ws(edge_attr = color, value = "red") %>%
clear_selection() %>%
set_node_attr_to_display(attr = value_3)
render_graph(k_graph)
Let’s create a property graph that pertains to contributors to three software projects. This graph has nodes representing people and projects. The attributes name
, age
, join_date
, email
, follower_count
, following_count
, and starred_count
are specific to the person
nodes while the project
, start_date
, stars
, and language
attributes apply to the project
nodes. The edges represent the relationships between the people and the project.
The example graph file repository.dgr
is available in the extdata/example_graphs_dgr/
directory in the DiagrammeR package (currently, only for the Github version). We can load it into memory by using the open_graph()
function, where system.file()
helps to provide the location of the file within the package.
# Load in a the small repository graph
graph <-
open_graph(
system.file(
"extdata/example_graphs_dgr/repository.dgr",
package = "DiagrammeR"
)
)
We can always view this property graph with the render_graph()
function:
render_graph(graph, layout = "kk")
Now that the graph is set up, you can create queries with magrittr pipelines to get specific answers from the graph.
Get the average age of all the contributors. Select all nodes of type person
(not project
). Each node of that type has non-NA
age
attribute, so, get that attribute as a vector with get_node_attrs_ws()
and then calculate the mean with R’s mean()
function.
graph %>%
select_nodes(conditions = type == "person") %>%
get_node_attrs_ws(node_attr = age) %>%
mean()
#> [1] 33.6
We can get the total number of commits to all projects. We know that all edges contain the numerical commits
attribute, so, select all edges (select_edges()
by itself selects all edges in the graph). After that, get a numeric vector of commits
values and then get its sum()
(all commits to all projects).
graph %>%
select_edges() %>%
get_edge_attrs_ws(edge_attr = commits) %>%
sum()
#> [1] 5182
Single out the one known as Josh and get his total number of commits as a maintainer and as a contributor. Start by selecting the Josh node with select_nodes(conditions = name == "Josh")
. In this graph, we know that all people have an edge to a project and that edge can be of the relationship (rel
) type of contributor
or maintainer
. We can migrate our selection from nodes to outbound edges with trav_out_edges()
(and we won’t provide a condition, just all the outgoing edges from Josh will be selected). Now we have a selection of 2 edges. Get that vector of commits
values with get_edge_attrs_ws()
and then calculate the sum()
. This is the total number of commits.
graph %>%
select_nodes(conditions = name == "Josh") %>%
trav_out_edge() %>%
get_edge_attrs_ws(edge_attr = commits) %>%
sum()
#> [1] 227
Get the total number of commits from Louisa, just from the maintainer role though. In this case we’ll supply a condition in trav_out_edge()
. This acts as a filter for the traversal and this means that the selection will be applied to only those edges where the condition is met. Although there is only a single value, we’ll still use sum()
after get_edge_attrs_ws()
(a good practice because we may not know the vector length, especially in big graphs).
graph %>%
select_nodes(conditions = name == "Louisa") %>%
trav_out_edge(conditions = rel == "maintainer") %>%
get_edge_attrs_ws(edge_attr = commits) %>%
sum()
#> [1] 236
How do we do something more complex, like, get the names of people in graph above age 32? First, select all person
nodes with select_nodes(conditions = type == "person")
. Then, follow up with another select_nodes()
call specifying age > 32
. Importantly, have set_op = "intersect"
(giving us the intersection of both selections).
Now that we have the starting selection of nodes we want, we need to get all values of these nodes’ name
attribute as a character vector. We do this with the get_node_attrs_ws()
function. After getting that vector, sort the names alphabetically with the R function sort()
. Because we get a named vector, we can use unname()
to not show us the names of each vector component.
graph %>%
select_nodes(conditions = type == "person") %>%
select_nodes(conditions = age > 32, set_op = "intersect") %>%
get_node_attrs_ws(node_attr = name) %>%
sort() %>%
unname()
#> [1] "Jack" "Jon" "Kim" "Roger" "Sheryl"
That supercalc project is progressing quite nicely. Let’s get the total number of commits from all people to that most interesting project. Start by selecting that project’s node and work backwards. Traverse to the edges leading to it with trav_in_edge()
. Those edges are from committers and they all contain the commits
attribute with numerical values. Get a vector of commits
and then get the sum (there are 1676
commits).
graph %>%
select_nodes(conditions = project == "supercalc") %>%
trav_in_edge() %>%
get_edge_attrs_ws(edge_attr = commits) %>%
sum()
#> [1] 1676
Kim is now a contributor to the stringbuildeR project and has made 15 new commits to that project. We can modify the graph to reflect this.
First, add an edge with add_edge()
. Note that add_edge()
usually relies on node IDs in from
and to
when creating the new edge. This is almost always inconvenient so we can instead use node labels (we know they are unique in this graph) to compose the edge, setting use_labels = TRUE
.
The rel
value in add_edge()
was set to contributor
– in a property graph we always have values set for all node type
and edge rel
attributes. We will set another attribute for this edge (commits
) by first selecting the edge (it was the last edge made, so we can use select_last_edges_created()
), then, use set_edge_attrs_ws()
and provide the attribute/value pair. Finally, clear the active selections with clear_selection()
. The graph is now changed, have a look.
graph <-
graph %>%
add_edge(
from = "Kim",
to = "stringbuildeR",
rel = "contributor"
) %>%
select_last_edges_created() %>%
set_edge_attrs_ws(edge_attr = commits, value = 15) %>%
clear_selection()
render_graph(graph, layout = "kk")
Get all email addresses for contributors (but not maintainers) of the randomizer and supercalc88 projects. With trav_in_edge()
we just want the contributer
edges/commits. Once on those edges, hop back unconditionally to the people from which the edges originate with trav_out_node()
. Get the email
values from those selected individuals as a sorted character vector.
graph %>%
select_nodes(
conditions =
project == "randomizer" |
project == "supercalc"
) %>%
trav_in_edge(conditions = rel == "contributor") %>%
trav_out_node() %>%
get_node_attrs_ws(node_attr = email) %>%
sort() %>%
unname()
#> [1] "j_2000@ultramail.io" "josh_ch@megamail.kn"
#> [3] "kim_3251323@ohhh.ai" "lhe99@mailing-fun.com"
#> [5] "roger_that@whalemail.net" "the_simone@a-q-w-o.net"
#> [7] "the_will@graphymail.com"
Which people have committed to more than one project? This is a matter of node degree. We know that people have edges outward and projects and edges inward. Thus, anybody having an outdegree (number of edges outward) greater than 1
has committed to more than one project. Globally, select nodes with that condition using select_nodes_by_degree("outdeg > 1")
. Once getting the name
attribute values from that node selection, we can provide a sorted character vector of names.
graph %>%
select_nodes_by_degree(expressions = "outdeg > 1") %>%
get_node_attrs_ws(node_attr = name) %>%
sort() %>%
unname()
#> [1] "Josh" "Kim" "Louisa"
DiagrammeR is used in an R environment. If you don’t have an R installation, it can be obtained from the Comprehensive R Archive Network (CRAN).
You can install the development version of DiagrammeR from GitHub using the devtools package.
devtools::install_github("rich-iannone/DiagrammeR")
Or, get it from CRAN.
install.packages("DiagrammeR")
If you encounter a bug, have usage questions, or want to share ideas to make this package better, feel free to file an issue.
Please note that the DiagrammeR project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Author: rich-iannone
Source Code: https://github.com/rich-iannone/DiagrammeR
License: Unknown, MIT licenses found
1659980700
The goal of patchwork
is to make it ridiculously simple to combine separate ggplots into the same graphic. As such it tries to solve the same problem as gridExtra::grid.arrange()
and cowplot::plot_grid
but using an API that incites exploration and iteration, and scales to arbitrily complex layouts.
You can install patchwork from CRAN using install.packages('patchwork')
. Alternatively you can grab the development version from github using devtools:
# install.packages("devtools")
devtools::install_github("thomasp85/patchwork")
The usage of patchwork
is simple: just add plots together!
library(ggplot2)
library(patchwork)
p1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
p2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
p1 + p2
patchwork provides rich support for arbitrarily complex layouts with full alignment. As an example, check out this very readable code for nesting three plots on top of a third:
p3 <- ggplot(mtcars) + geom_smooth(aes(disp, qsec))
p4 <- ggplot(mtcars) + geom_bar(aes(carb))
(p1 | p2 | p3) /
p4
patchwork can do so much more. Check out the guides for learning everything there is to know about all the different features:
Please note that the patchwork project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Author: Thomasp85
Source Code: https://github.com/thomasp85/patchwork
License: View license
1659853500
Sloop monitors Kubernetes, recording histories of events and resource state changes and providing visualizations to aid in debugging past events.
Key features:
Sloop can be installed using any of these options:
Users can install sloop by using helm chart now, for instructions refer helm readme
sloopimage/sloop
Building Sloop from source needs a working Go environment with version 1.13 or greater installed.
Clone the sloop repository and build using make
:
mkdir -p $GOPATH/src/github.com/salesforce
cd $GOPATH/src/github.com/salesforce
git clone https://github.com/salesforce/sloop.git
cd sloop
go env -w GO111MODULE=auto
make
$GOPATH/bin/sloop
When complete, you should have a running Sloop version accessing the current context from your kubeConfig. Just point your browser at http://localhost:8080/
Other makefile targets:
To run from Docker you need to host mount your kubeconfig:
make docker-snapshot
docker run --rm -it -p 8080:8080 -v ~/.kube/:/kube/ -e KUBECONFIG=/kube/config sloop
In this mode, data is written to a memory-backed volume and is discarded after each run. To preserve the data, you can host-mount /data with something like -v /data/:/some_path_on_host/
To reflect any changes to webserver/webfiles, run the following command on terminal while within the webserver directory before submitting a pr:
go-bindata -pkg webserver -o bindata.go webfiles/
This will update the bindata folder with your changes to any html, css or javascript files within the directory.
This is very similar to above but abstracts running docker with AWS credentials for connecting to EKS
make docker
export AWS_ACCESS_KEY_ID=<access_key_id> AWS_SECRET_ACCESS_KEY=<secret_access_key> AWS_SESSION_TOKEN=<session_token>
./providers/aws/sloop_to_eks.sh <cluster name>
Data retention policy stated above still applies in this case.
This is an advanced feature. Use with caution.
To download a backup of the database, navigate to http://localhost:8080/data/backup
To restore from a backup, start sloop
with the -restore-database-file
flag set to the backup file downloaded in the previous step. When restoring, you may also wish to set the -disable-kube-watch=true
flag to stop new writes from occurring and/or the -context
flag to restore the database into a different context.
Sloop's memory usage can be managed by tweaking several options:
badger-use-lsm-only-options
If this flag is set to true, values would be collocated with the LSM tree, with value log largely acting as a write-ahead log only. Recommended value for memory constrained environments: falsebadger-keep-l0-in-memory
When this flag is set to true, Level 0 tables are kept in memory. This leads to better performance in writes as well as compactions. Recommended value for memory constrained environments: falsebadger-sync-writes
When SyncWrites is true all writes are synced to disk. Setting this to false would achieve better performance, but may cause data loss in case of crash. Recommended value for memory constrained environments: falsebadger-vlog-fileIO-mapping
TableLoadingMode indicates which file loading mode should be used for the LSM tree data files. Setting to true would not load the value in memory map. Recommended value for memory constrained environments: trueApart from these flags some other values can be tweaked to fit in the memory constraints. Following are some examples of setups.
// 0.5<<20 (524288 bytes = 0.5 Mb)
"badger-max-table-size=524288",
"badger-number-of-compactors=1",
"badger-number-of-level-zero-tables=1",
"badger-number-of-zero-tables-stall=2",
// 16<<20 (16777216 bytes = 16 Mb)
"badger-max-table-size=16777216",
"badger-number-of-compactors=1",
"badger-number-of-level-zero-tables=1",
"badger-number-of-zero-tables-stall=2",
// 32<<20 (33554432 bytes = 32 Mb)
"badger-max-table-size=33554432",
"badger-number-of-compactors=1",
"badger-number-of-level-zero-tables=2",
"badger-number-of-zero-tables-stall=3",
Apart from the above settings, max-disk-mb and max-look-back can be tweaked according to input data and memory constraints.
Refer to CONTRIBUTING.md
Author: Salesforce
Source Code: https://github.com/salesforce/sloop
License: BSD-3-Clause license
1659821280
Additional Themes and Theme Components for ‘ggplot2’
This is a very focused package that provides typography-centric themes and theme components for ggplot2. It’s a an extract/riff of hrbrmisc
created by request.
The core theme: theme_ipsum
(“ipsum” is Latin for “precise”) uses Arial Narrow which should be installed on practically any modern system, so it’s “free”-ish. This font is condensed, has solid default kerning pairs and geometric numbers. That’s what I consider the “font trifecta” must-have for charts. An additional quality for fonts for charts is that they have a diversity of weights. Arial Narrow (the one on most systems, anyway) does not have said diversity but this quality is not (IMO) a “must have”.
The following functions are implemented/objects are exported:
Themes:
theme_ipsum
: Arial Narrowtheme_ipsum_gs
: Goldman Sans Condensedtheme_ipsum_es
: Econ Sans Condensedtheme_ipsum_rc
: Roboto Condensedtheme_ipsum_ps
: IBM Plex Sans fonttheme_ipsum_pub
: Public Sanstheme_ipsum_tw
: Titilium Webtheme_modern_rc
: Roboto Condensed dark themetheme_ft_rc
: Dark theme based on FT’s dark theme (Roboto Condensed)Scales (that align with various themes):
scale_color_ipsum
: Discrete color & fill scales based on the ipsum palettescale_colour_ipsum
: Discrete color & fill scales based on the ipsum palettescale_fill_ipsum
: Discrete color & fill scales based on the ipsum palettescale_color_ft
: Discrete color & fill scales based on the FT palettescale_colour_ft
: Discrete color & fill scales based on the FT palettescale_fill_ft
: Discrete color & fill scales based on the FT palettescale_x_comma
: X & Y scales with opinionated presets for percent & comma label formatsscale_x_percent
: X & Y scales with opinionated presets for percent & comma label formatsscale_y_comma
: X & Y scales with opinionated presets for percent & comma label formatsscale_y_percent
: X & Y scales with opinionated presets for percent & comma label formatsPalettes/Named Colors:
ipsum_pal
: A muted, qualitative color paletteft_cols
: FT color paletteft_pal
: A bright qualitative color paletteft_text_col
: FT color paletteFonts:
font_an
: Arial Narrow font name R variable aliasesfont_es
: Econ Sans font name R variable aliasesfont_es_bold
: Econ Sans font name R variable aliasesfont_es_light
: Econ Sans font name R variable aliasesfont_rc
: Roboto Condensed font name R variable aliasesfont_rc_light
: Roboto Condensed font name R variable aliasesfont_pub
: Public Sans font name R variable aliasesfont_pub_bold
: Public Sans font name R variable aliasesfont_pub_light
: Public Sans font name R variable aliasesfont_pub_thin
: Public Sans font name R variable aliasesfont_ps
: PlexSans font name R variable aliasesfont_ps_light
: PlexSans font name R variable aliasesfont_tw
: Titillium Web font name R variable aliasesfont_tw_bold
: Titillium Web font name R variable aliasesfont_tw_light
: Titillium Web font name R variable aliasesR Markdown:
ipsum
: ipsum R markdown templateipsum_pdf
: ipsum R markdown template for PDF outputUtilities:
flush_ticks
: Makes axis text labels flush on the endsft_geom_defaults
: Change geom defaults from black to custom lights for the FT themegg_check
: Spell check ggplot2 plot labelsimport_econ_sans
: Import Econ Sans Condensed font for use in chartsimport_plex_sans
: Import IBM Plex Sans font for use in chartsimport_roboto_condensed
: Import Roboto Condensed font for use in chartsimport_titillium_web
: Import Titillium Web font for use in chartsmodern_geom_defaults
: Change geom defaults from black to white for the modern themeupdate_geom_font_defaults
: Update matching font defaults for text geomsinstall.packages("hrbrthemes") # NOTE: CRAN version is 0.8.0
# or
install.packages("hrbrthemes", repos = c("https://cinc.rud.is", "https://cloud.r-project.org/"))
# or
remotes::install_git("https://git.rud.is/hrbrmstr/hrbrthemes.git")
# or
remotes::install_git("https://git.sr.ht/~hrbrmstr/hrbrthemes")
# or
remotes::install_gitlab("hrbrmstr/hrbrthemes")
# or
remotes::install_bitbucket("hrbrmstr/hrbrthemes")
# or
remotes::install_github("hrbrmstr/hrbrthemes")
NOTE: To use the ‘remotes’ install options you will need to have the {remotes} package installed.
library(hrbrthemes)
library(gcookbook)
library(tidyverse)
# current verison
packageVersion("hrbrthemes")
## [1] '0.8.6'
ggplot(mtcars, aes(mpg, wt)) +
geom_point() +
labs(x="Fuel efficiency (mpg)", y="Weight (tons)",
title="Seminal ggplot2 scatterplot example",
subtitle="A plot that is only useful for demonstration purposes",
caption="Brought to you by the letter 'g'") +
theme_ipsum()
ggplot(mtcars, aes(mpg, wt)) +
geom_point() +
labs(x="Fuel efficiency (mpg)", y="Weight (tons)",
title="Seminal ggplot2 scatterplot example",
subtitle="A plot that is only useful for demonstration purposes",
caption="Brought to you by the letter 'g'") +
theme_ipsum_rc()
ggplot(mtcars, aes(mpg, wt)) +
geom_point(color = ft_cols$yellow) +
labs(x="Fuel efficiency (mpg)", y="Weight (tons)",
title="Seminal ggplot2 scatterplot example",
subtitle="A plot that is only useful for demonstration purposes",
caption="Brought to you by the letter 'g'") +
theme_ft_rc()
ggplot(mpg, aes(displ, hwy)) +
geom_jitter(aes(color=class, fill=class), size=3, shape=21, alpha=1/2) +
scale_x_continuous(expand=c(0,0), limits=c(1, 8), breaks=1:8) +
scale_y_continuous(expand=c(0,0), limits=c(10, 50)) +
scale_color_ipsum() +
scale_fill_ipsum() +
facet_wrap(~class, scales="free") +
labs(
title="IBM Plex Sans Test",
subtitle="This is a subtitle to see the how it looks in IBM Plex Sans",
caption="Source: hrbrthemes & IBM"
) +
theme_ipsum_ps(grid="XY", axis="xy") +
theme(legend.position="none") -> gg
flush_ticks(gg)
## theme(axis.text.x=element_text(hjust=c(0, rep(0.5, 6), 1))) +
## theme(axis.text.y=element_text(vjust=c(0, rep(0.5, 3), 1)))
ggplot(mpg, aes(displ, hwy)) +
geom_jitter(aes(color=class, fill=class), size=3, shape=21, alpha=1/2) +
scale_x_continuous(expand=c(0,0), limits=c(1, 8), breaks=1:8) +
scale_y_continuous(expand=c(0,0), limits=c(10, 50)) +
scale_color_ipsum() +
scale_fill_ipsum() +
facet_wrap(~class, scales="free") +
labs(
title="Titillium Web",
subtitle="This is a subtitle to see the how it looks in Titillium Web",
caption="Source: hrbrthemes & Google"
) +
theme_ipsum_tw(grid="XY", axis="xy") +
theme(legend.position="none") -> gg
flush_ticks(gg)
## theme(axis.text.x=element_text(hjust=c(0, rep(0.5, 6), 1))) +
## theme(axis.text.y=element_text(vjust=c(0, rep(0.5, 3), 1)))
ggplot(mtcars, aes(mpg, wt)) +
geom_point(aes(color=factor(carb))) +
labs(x="Fuel efficiency (mpg)", y="Weight (tons)",
title="Seminal ggplot2 scatterplot example",
subtitle="A plot that is only useful for demonstration purposes",
caption="Brought to you by the letter 'g'") +
scale_color_ipsum() +
theme_ipsum_rc()
count(mpg, class) %>%
mutate(pct=n/sum(n)) %>%
ggplot(aes(class, pct)) +
geom_col() +
scale_y_percent() +
labs(x="Fuel efficiency (mpg)", y="Weight (tons)",
title="Seminal ggplot2 column chart example with percents",
subtitle="A plot that is only useful for demonstration purposes",
caption="Brought to you by the letter 'g'") +
theme_ipsum(grid="Y")
ggplot(uspopage, aes(x=Year, y=Thousands, fill=AgeGroup)) +
geom_area() +
scale_fill_ipsum() +
scale_x_continuous(expand=c(0,0)) +
scale_y_comma() +
labs(title="Age distribution of population in the U.S., 1900-2002",
subtitle="Example data from the R Graphics Cookbook",
caption="Source: R Graphics Cookbook") +
theme_ipsum_rc(grid="XY") +
theme(axis.text.x=element_text(hjust=c(0, 0.5, 0.5, 0.5, 1))) +
theme(legend.position="bottom")
update_geom_font_defaults(font_rc_light)
count(mpg, class) %>%
mutate(n=n*2000) %>%
arrange(n) %>%
mutate(class=factor(class, levels=class)) %>%
ggplot(aes(class, n)) +
geom_col() +
geom_text(aes(label=scales::comma(n)), hjust=0, nudge_y=2000) +
scale_y_comma(limits=c(0,150000)) +
coord_flip() +
labs(x="Fuel efficiency (mpg)", y="Weight (tons)",
title="Seminal ggplot2 column chart example with commas",
subtitle="A plot that is only useful for demonstration purposes, esp since you'd never\nreally want direct labels and axis labels",
caption="Brought to you by the letter 'g'") +
theme_ipsum_rc(grid="X")
df <- data.frame(x=c(20, 25, 30), y=c(4, 4, 4), txt=c("One", "Two", "Three"))
ggplot(mtcars, aes(mpg, wt)) +
geom_point() +
labs(x="This is some txt", y="This is more text",
title="Thisy is a titlle",
subtitle="This is a subtitley",
caption="This is a captien") +
theme_ipsum_rc(grid="XY") -> gg
gg_check(gg)
## Possible misspelled words in [title]: (Thisy, titlle)
## Possible misspelled words in [subtitle]: (subtitley)
## Possible misspelled words in [caption]: (captien)
Lang | # Files | (%) | LoC | (%) | Blank lines | (%) | # Lines | (%) |
---|---|---|---|---|---|---|---|---|
R | 24 | 0.89 | 1724 | 0.80 | 327 | 0.72 | 908 | 0.84 |
HTML | 1 | 0.04 | 297 | 0.14 | 32 | 0.07 | 2 | 0.00 |
Rmd | 2 | 0.07 | 129 | 0.06 | 98 | 0.21 | 168 | 0.16 |
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Author: Hrbrmstr
Source Code: https://github.com/hrbrmstr/hrbrthemes
License: View license
1659784320
R package corrplot provides a visual exploratory tool on correlation matrix that supports automatic variable reordering to help detect hidden patterns among variables.
corrplot is very easy to use and provides a rich array of plotting options in visualization method, graphic layout, color, legend, text labels, etc. It also provides p-values and confidence intervals to help users determine the statistical significance of the correlations.
For examples, see its online vignette.
This package is licensed under the MIT license, and available on CRAN: https://cran.r-project.org/package=corrplot.
library(corrplot)
M = cor(mtcars)
corrplot(M, order = 'hclust', addrect = 2)
To download the release version of the package on CRAN, type the following at the R command line:
install.packages('corrplot')
To download the development version of the package, type the following at the R command line:
devtools::install_github('taiyun/corrplot', build_vignettes = TRUE)
To cite corrplot
properly, call the R built-in command citation('corrplot')
as follows:
citation('corrplot')
If you encounter a clear bug, please file a minimal reproducible example on github.
Author: Taiyun
Source Code: https://github.com/taiyun/corrplot
License: View license
1659769380
ggtree: an R package for visualization of phylogenetic trees with their annotation data
‘ggtree’ extends the ‘ggplot2’ plotting system which implemented the grammar of graphics. ‘ggtree’ is designed for visualization and annotation of phylogenetic trees and other tree-like structures with their annotation data.
For details, please visit https://yulab-smu.top/treedata-book/.
Guangchuang YU
School of Basic Medical Sciences, Southern Medical University
https://guangchuangyu.github.io
If you use ggtree in published research, please cite the most appropriate paper(s) from this list:
We welcome any contributions! By participating in this project you agree to abide by the terms outlined in the Contributor Code of Conduct.
Author: YuLab-SMU
Source Code: https://github.com/YuLab-SMU/ggtree
License:
1659765540
{ggstatsplot}
: {ggplot2}
Based Plots with Statistical Details“What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather … the revelation of the complex.” - Edward R. Tufte
{ggstatsplot}
is an extension of {ggplot2}
package for creating graphics with details from statistical tests included in the information-rich plots themselves. In a typical exploratory data analysis workflow, data visualization and statistical modeling are two different phases: visualization informs modeling, and modeling in its turn can suggest a different visualization method, and so on and so forth. The central idea of {ggstatsplot}
is simple: combine these two phases into one in the form of graphics with statistical details, which makes data exploration simpler and faster.
Type | Source | Command |
---|---|---|
Release | install.packages("ggstatsplot") | |
Development | remotes::install_github("IndrajeetPatil/ggstatsplot") |
If you want to cite this package in a scientific journal or in any other context, run the following code in your R
console:
citation("ggstatsplot")
To cite package 'ggstatsplot' in publications use:
Patil, I. (2021). Visualizations with statistical details: The
'ggstatsplot' approach. Journal of Open Source Software, 6(61), 3167,
doi:10.21105/joss.03167
A BibTeX entry for LaTeX users is
@Article{,
doi = {10.21105/joss.03167},
url = {https://doi.org/10.21105/joss.03167},
year = {2021},
publisher = {{The Open Journal}},
volume = {6},
number = {61},
pages = {3167},
author = {Indrajeet Patil},
title = {{Visualizations with statistical details: The {'ggstatsplot'} approach}},
journal = {{Journal of Open Source Software}},
}
I would like to thank all the contributors to {ggstatsplot}
who pointed out bugs or requested features I hadn’t considered. I would especially like to thank other package developers (especially Daniel Lüdecke, Dominique Makowski, Mattan S. Ben-Shachar, Brenton Wiernik, Patrick Mair, Salvatore Mangiafico, etc.) who have patiently and diligently answered my relentless questions and supported feature requests in their projects. I also want to thank Chuck Powell for his initial contributions to the package.
The hexsticker was generously designed by Sarah Otterstetter (Max Planck Institute for Human Development, Berlin). This package has also benefited from the larger #rstats
community on Twitter, LinkedIn, and StackOverflow
.
Thanks are also due to my postdoc advisers (Mina Cikara and Fiery Cushman at Harvard University; Iyad Rahwan at Max Planck Institute for Human Development) who patiently supported me spending hundreds (?) of hours working on this package rather than what I was paid to do. 😁
To see the detailed documentation for each function in the stable CRAN version of the package, see:
It, therefore, produces a limited kinds of plots for the supported analyses:
Function | Plot | Description | Lifecycle |
---|---|---|---|
ggbetweenstats | violin plots | for comparisons between groups/conditions | |
ggwithinstats | violin plots | for comparisons within groups/conditions | |
gghistostats | histograms | for distribution about numeric variable | |
ggdotplotstats | dot plots/charts | for distribution about labeled numeric variable | |
ggscatterstats | scatterplots | for correlation between two variables | |
ggcorrmat | correlation matrices | for correlations between multiple variables | |
ggpiestats | pie charts | for categorical data | |
ggbarstats | bar charts | for categorical data | |
ggcoefstats | dot-and-whisker plots | for regression models and meta-analysis |
In addition to these basic plots, {ggstatsplot}
also provides grouped_
versions (see below) that makes it easy to repeat the same analysis for any grouping variable.
The table below summarizes all the different types of analyses currently supported in this package-
Functions | Description | Parametric | Non-parametric | Robust | Bayesian |
---|---|---|---|---|---|
ggbetweenstats | Between group/condition comparisons | ✅ | ✅ | ✅ | ✅ |
ggwithinstats | Within group/condition comparisons | ✅ | ✅ | ✅ | ✅ |
gghistostats , ggdotplotstats | Distribution of a numeric variable | ✅ | ✅ | ✅ | ✅ |
ggcorrmat | Correlation matrix | ✅ | ✅ | ✅ | ✅ |
ggscatterstats | Correlation between two variables | ✅ | ✅ | ✅ | ✅ |
ggpiestats , ggbarstats | Association between categorical variables | ✅ | ✅ | ❌ | ✅ |
ggpiestats , ggbarstats | Equal proportions for categorical variable levels | ✅ | ✅ | ❌ | ✅ |
ggcoefstats | Regression model coefficients | ✅ | ✅ | ✅ | ✅ |
ggcoefstats | Random-effects meta-analysis | ✅ | ❌ | ✅ | ✅ |
Summary of Bayesian analysis
Analysis | Hypothesis testing | Estimation |
---|---|---|
(one/two-sample) t-test | ✅ | ✅ |
one-way ANOVA | ✅ | ✅ |
correlation | ✅ | ✅ |
(one/two-way) contingency table | ✅ | ✅ |
random-effects meta-analysis | ✅ | ✅ |
For all statistical tests reported in the plots, the default template abides by the gold standard for statistical reporting. For example, here are results from Yuen’s test for trimmed means (robust t-test):
Statistical analysis is carried out by {statsExpressions}
package, and thus a summary table of all the statistical tests currently supported across various functions can be found in article for that package: https://indrajeetpatil.github.io/statsExpressions/articles/stats_details.html
ggbetweenstats
This function creates either a violin plot, a box plot, or a mix of two for between-group or between-condition comparisons with results from statistical tests in the subtitle. The simplest function call looks like this-
set.seed(123)
ggbetweenstats(
data = iris,
x = Species,
y = Sepal.Length,
title = "Distribution of sepal length across Iris species"
)
Defaults return
✅ raw data + distributions
✅ descriptive statistics
✅ inferential statistics
✅ effect size + CIs
✅ pairwise comparisons
✅ Bayesian hypothesis-testing
✅ Bayesian estimation
A number of other arguments can be specified to make this plot even more informative or change some of the default options. Additionally, there is also a grouped_
variant of this function that makes it easy to repeat the same operation across a single grouping variable:
set.seed(123)
grouped_ggbetweenstats(
data = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
x = mpaa,
y = length,
grouping.var = genre,
outlier.tagging = TRUE,
outlier.label = title,
outlier.coef = 2,
ggsignif.args = list(textsize = 4, tip_length = 0.01),
p.adjust.method = "bonferroni",
palette = "default_jama",
package = "ggsci",
plotgrid.args = list(nrow = 1),
annotation.args = list(title = "Differences in movie length by mpaa ratings for different genres")
)
Note here that the function can be used to tag outliers!
Summary of graphics
graphical element | geom_ used | argument for further modification |
---|---|---|
raw data | ggplot2::geom_point | point.args |
box plot | ggplot2::geom_boxplot | ❌ |
density plot | ggplot2::geom_violin | violin.args |
centrality measure point | ggplot2::geom_point | centrality.point.args |
centrality measure label | ggrepel::geom_label_repel | centrality.label.args |
outlier point | ggplot2::stat_boxplot | ❌ |
outlier label | ggrepel::geom_label_repel | outlier.label.args |
pairwise comparisons | ggsignif::geom_signif | ggsignif.args |
Summary of tests
Central tendency measure
Type | Measure | Function used |
---|---|---|
Parametric | mean | datawizard::describe_distribution |
Non-parametric | median | datawizard::describe_distribution |
Robust | trimmed mean | datawizard::describe_distribution |
Bayesian | MAP (maximum a posteriori probability) estimate | datawizard::describe_distribution |
Hypothesis testing
Type | No. of groups | Test | Function used |
---|---|---|---|
Parametric | > 2 | Fisher’s or Welch’s one-way ANOVA | stats::oneway.test |
Non-parametric | > 2 | Kruskal–Wallis one-way ANOVA | stats::kruskal.test |
Robust | > 2 | Heteroscedastic one-way ANOVA for trimmed means | WRS2::t1way |
Bayes Factor | > 2 | Fisher’s ANOVA | BayesFactor::anovaBF |
Parametric | 2 | Student’s or Welch’s t-test | stats::t.test |
Non-parametric | 2 | Mann–Whitney U test | stats::wilcox.test |
Robust | 2 | Yuen’s test for trimmed means | WRS2::yuen |
Bayesian | 2 | Student’s t-test | BayesFactor::ttestBF |
Effect size estimation
Type | No. of groups | Effect size | CI? | Function used |
---|---|---|---|---|
Parametric | > 2 | ✅ | effectsize::omega_squared , effectsize::eta_squared | |
Non-parametric | > 2 | ✅ | effectsize::rank_epsilon_squared | |
Robust | > 2 | ✅ | WRS2::t1way | |
Bayes Factor | > 2 | ✅ | performance::r2_bayes | |
Parametric | 2 | Cohen’s d, Hedge’s g | ✅ | effectsize::cohens_d , effectsize::hedges_g |
Non-parametric | 2 | r (rank-biserial correlation) | ✅ | effectsize::rank_biserial |
Robust | 2 | ✅ | WRS2::yuen.effect.ci | |
Bayesian | 2 | ✅ | bayestestR::describe_posterior |
Pairwise comparison tests
Type | Equal variance? | Test | p-value adjustment? | Function used |
---|---|---|---|---|
Parametric | No | Games-Howell test | ✅ | PMCMRplus::gamesHowellTest |
Parametric | Yes | Student’s t-test | ✅ | stats::pairwise.t.test |
Non-parametric | No | Dunn test | ✅ | PMCMRplus::kwAllPairsDunnTest |
Robust | No | Yuen’s trimmed means test | ✅ | WRS2::lincon |
Bayesian | NA | Student’s t-test | NA | BayesFactor::ttestBF |
For more, see the ggbetweenstats
vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggbetweenstats.html
ggwithinstats
ggbetweenstats
function has an identical twin function ggwithinstats
for repeated measures designs that behaves in the same fashion with a few minor tweaks introduced to properly visualize the repeated measures design. As can be seen from an example below, the only difference between the plot structure is that now the group means are connected by paths to highlight the fact that these data are paired with each other.
set.seed(123)
library(WRS2) ## for data
library(afex) ## to run anova
ggwithinstats(
data = WineTasting,
x = Wine,
y = Taste,
title = "Wine tasting"
)
Defaults return
✅ raw data + distributions
✅ descriptive statistics
✅ inferential statistics
✅ effect size + CIs
✅ pairwise comparisons
✅ Bayesian hypothesis-testing
✅ Bayesian estimation
The central tendency measure displayed will depend on the statistics:
Type | Measure | Function used |
---|---|---|
Parametric | mean | datawizard::describe_distribution |
Non-parametric | median | datawizard::describe_distribution |
Robust | trimmed mean | datawizard::describe_distribution |
Bayesian | MAP estimate | datawizard::describe_distribution |
As with the ggbetweenstats
, this function also has a grouped_
variant that makes repeating the same analysis across a single grouping variable quicker. We will see an example with only repeated measurements-
set.seed(123)
grouped_ggwithinstats(
data = dplyr::filter(bugs_long, region %in% c("Europe", "North America"), condition %in% c("LDLF", "LDHF")),
x = condition,
y = desire,
type = "np",
xlab = "Condition",
ylab = "Desire to kill an artrhopod",
grouping.var = region,
outlier.tagging = TRUE,
outlier.label = education
)
Summary of graphics
graphical element | geom_ used | argument for further modification |
---|---|---|
raw data | ggplot2::geom_point | point.args |
point path | ggplot2::geom_path | point.path.args |
box plot | ggplot2::geom_boxplot | boxplot.args |
density plot | ggplot2::geom_violin | violin.args |
centrality measure point | ggplot2::geom_point | centrality.point.args |
centrality measure point path | ggplot2::geom_path | centrality.path.args |
centrality measure label | ggrepel::geom_label_repel | centrality.label.args |
outlier point | ggplot2::stat_boxplot | ❌ |
outlier label | ggrepel::geom_label_repel | outlier.label.args |
pairwise comparisons | ggsignif::geom_signif | ggsignif.args |
Summary of tests
Central tendency measure
Type | Measure | Function used |
---|---|---|
Parametric | mean | datawizard::describe_distribution |
Non-parametric | median | datawizard::describe_distribution |
Robust | trimmed mean | datawizard::describe_distribution |
Bayesian | MAP (maximum a posteriori probability) estimate | datawizard::describe_distribution |
Hypothesis testing
Type | No. of groups | Test | Function used |
---|---|---|---|
Parametric | > 2 | One-way repeated measures ANOVA | afex::aov_ez |
Non-parametric | > 2 | Friedman rank sum test | stats::friedman.test |
Robust | > 2 | Heteroscedastic one-way repeated measures ANOVA for trimmed means | WRS2::rmanova |
Bayes Factor | > 2 | One-way repeated measures ANOVA | BayesFactor::anovaBF |
Parametric | 2 | Student’s t-test | stats::t.test |
Non-parametric | 2 | Wilcoxon signed-rank test | stats::wilcox.test |
Robust | 2 | Yuen’s test on trimmed means for dependent samples | WRS2::yuend |
Bayesian | 2 | Student’s t-test | BayesFactor::ttestBF |
Effect size estimation
Type | No. of groups | Effect size | CI? | Function used |
---|---|---|---|---|
Parametric | > 2 | ✅ | effectsize::omega_squared , effectsize::eta_squared | |
Non-parametric | > 2 | ✅ | effectsize::kendalls_w | |
Robust | > 2 | ✅ | WRS2::wmcpAKP | |
Bayes Factor | > 2 | ✅ | performance::r2_bayes | |
Parametric | 2 | Cohen’s d, Hedge’s g | ✅ | effectsize::cohens_d , effectsize::hedges_g |
Non-parametric | 2 | r (rank-biserial correlation) | ✅ | effectsize::rank_biserial |
Robust | 2 | ✅ | WRS2::wmcpAKP | |
Bayesian | 2 | ✅ | bayestestR::describe_posterior |
Pairwise comparison tests
Type | Test | p-value adjustment? | Function used |
---|---|---|---|
Parametric | Student’s t-test | ✅ | stats::pairwise.t.test |
Non-parametric | Durbin-Conover test | ✅ | PMCMRplus::durbinAllPairsTest |
Robust | Yuen’s trimmed means test | ✅ | WRS2::rmmcp |
Bayesian | Student’s t-test | ❌ | BayesFactor::ttestBF |
For more, see the ggwithinstats
vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggwithinstats.html
gghistostats
To visualize the distribution of a single variable and check if its mean is significantly different from a specified value with a one-sample test, gghistostats
can be used.
set.seed(123)
gghistostats(
data = ggplot2::msleep,
x = awake,
title = "Amount of time spent awake",
test.value = 12,
binwidth = 1
)
Defaults return
✅ counts + proportion for bins
✅ descriptive statistics
✅ inferential statistics
✅ effect size + CIs
✅ Bayesian hypothesis-testing
✅ Bayesian estimation
There is also a grouped_
variant of this function that makes it easy to repeat the same operation across a single grouping variable:
set.seed(123)
grouped_gghistostats(
data = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
x = budget,
test.value = 50,
type = "nonparametric",
xlab = "Movies budget (in million US$)",
grouping.var = genre,
normal.curve = TRUE,
normal.curve.args = list(color = "red", size = 1),
ggtheme = ggthemes::theme_tufte(),
## modify the defaults from `{ggstatsplot}` for each plot
plotgrid.args = list(nrow = 1),
annotation.args = list(title = "Movies budgets for different genres")
)
Summary of graphics
graphical element | geom_ used | argument for further modification |
---|---|---|
histogram bin | ggplot2::stat_bin | bin.args |
centrality measure line | ggplot2::geom_vline | centrality.line.args |
normality curve | ggplot2::stat_function | normal.curve.args |
Summary of tests
Central tendency measure
Type | Measure | Function used |
---|---|---|
Parametric | mean | datawizard::describe_distribution |
Non-parametric | median | datawizard::describe_distribution |
Robust | trimmed mean | datawizard::describe_distribution |
Bayesian | MAP (maximum a posteriori probability) estimate | datawizard::describe_distribution |
Hypothesis testing
Type | Test | Function used |
---|---|---|
Parametric | One-sample Student’s t-test | stats::t.test |
Non-parametric | One-sample Wilcoxon test | stats::wilcox.test |
Robust | Bootstrap-t method for one-sample test | WRS2::trimcibt |
Bayesian | One-sample Student’s t-test | BayesFactor::ttestBF |
Effect size estimation
Type | Effect size | CI? | Function used |
---|---|---|---|
Parametric | Cohen’s d, Hedge’s g | ✅ | effectsize::cohens_d , effectsize::hedges_g |
Non-parametric | r (rank-biserial correlation) | ✅ | effectsize::rank_biserial |
Robust | trimmed mean | ✅ | WRS2::trimcibt |
Bayes Factor | ✅ | bayestestR::describe_posterior |
For more, including information about the variant of this function grouped_gghistostats
, see the gghistostats
vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/gghistostats.html
ggdotplotstats
This function is similar to gghistostats
, but is intended to be used when the numeric variable also has a label.
set.seed(123)
ggdotplotstats(
data = dplyr::filter(gapminder::gapminder, continent == "Asia"),
y = country,
x = lifeExp,
test.value = 55,
type = "robust",
title = "Distribution of life expectancy in Asian continent",
xlab = "Life expectancy"
)
Defaults return
✅ descriptives (mean + sample size)
✅ inferential statistics
✅ effect size + CIs
✅ Bayesian hypothesis-testing
✅ Bayesian estimation
As with the rest of the functions in this package, there is also a grouped_
variant of this function to facilitate looping the same operation for all levels of a single grouping variable.
set.seed(123)
grouped_ggdotplotstats(
data = dplyr::filter(ggplot2::mpg, cyl %in% c("4", "6")),
x = cty,
y = manufacturer,
type = "bayes",
xlab = "city miles per gallon",
ylab = "car manufacturer",
grouping.var = cyl,
test.value = 15.5,
point.args = list(color = "red", size = 5, shape = 13),
annotation.args = list(title = "Fuel economy data")
)
Summary of graphics
graphical element | geom_ used | argument for further modification |
---|---|---|
raw data | ggplot2::geom_point | point.args |
centrality measure line | ggplot2::geom_vline | centrality.line.args |
Summary of tests
Central tendency measure
Type | Measure | Function used |
---|---|---|
Parametric | mean | datawizard::describe_distribution |
Non-parametric | median | datawizard::describe_distribution |
Robust | trimmed mean | datawizard::describe_distribution |
Bayesian | MAP (maximum a posteriori probability) estimate | datawizard::describe_distribution |
Hypothesis testing
Type | Test | Function used |
---|---|---|
Parametric | One-sample Student’s t-test | stats::t.test |
Non-parametric | One-sample Wilcoxon test | stats::wilcox.test |
Robust | Bootstrap-t method for one-sample test | WRS2::trimcibt |
Bayesian | One-sample Student’s t-test | BayesFactor::ttestBF |
Effect size estimation
Type | Effect size | CI? | Function used |
---|---|---|---|
Parametric | Cohen’s d, Hedge’s g | ✅ | effectsize::cohens_d , effectsize::hedges_g |
Non-parametric | r (rank-biserial correlation) | ✅ | effectsize::rank_biserial |
Robust | trimmed mean | ✅ | WRS2::trimcibt |
Bayes Factor | ✅ | bayestestR::describe_posterior |
ggscatterstats
This function creates a scatterplot with marginal distributions overlaid on the axes and results from statistical tests in the subtitle:
ggscatterstats(
data = ggplot2::msleep,
x = sleep_rem,
y = awake,
xlab = "REM sleep (in hours)",
ylab = "Amount of time spent awake (in hours)",
title = "Understanding mammalian sleep"
)
Defaults return
✅ raw data + distributions
✅ marginal distributions
✅ inferential statistics
✅ effect size + CIs
✅ Bayesian hypothesis-testing
✅ Bayesian estimation
There is also a grouped_
variant of this function that makes it easy to repeat the same operation across a single grouping variable.
set.seed(123)
grouped_ggscatterstats(
data = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
x = rating,
y = length,
grouping.var = genre,
label.var = title,
label.expression = length > 200,
xlab = "IMDB rating",
ggtheme = ggplot2::theme_grey(),
ggplot.component = list(ggplot2::scale_x_continuous(breaks = seq(2, 9, 1), limits = (c(2, 9)))),
plotgrid.args = list(nrow = 1),
annotation.args = list(title = "Relationship between movie length and IMDB ratings")
)
Summary of graphics
graphical element | geom_ used | argument for further modification |
---|---|---|
raw data | ggplot2::geom_point | point.args |
labels for raw data | ggrepel::geom_label_repel | point.label.args |
smooth line | ggplot2::geom_smooth | smooth.line.args |
marginal histograms | ggside::geom_xsidehistogram , ggside::geom_ysidehistogram | xsidehistogram.args , ysidehistogram.args |
Summary of tests
Hypothesis testing and Effect size estimation
Type | Test | CI? | Function used |
---|---|---|---|
Parametric | Pearson’s correlation coefficient | ✅ | correlation::correlation |
Non-parametric | Spearman’s rank correlation coefficient | ✅ | correlation::correlation |
Robust | Winsorized Pearson correlation coefficient | ✅ | correlation::correlation |
Bayesian | Pearson’s correlation coefficient | ✅ | correlation::correlation |
For more, see the ggscatterstats
vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggscatterstats.html
ggcorrmat
ggcorrmat
makes a correlalogram (a matrix of correlation coefficients) with minimal amount of code. Just sticking to the defaults itself produces publication-ready correlation matrices. But, for the sake of exploring the available options, let’s change some of the defaults. For example, multiple aesthetics-related arguments can be modified to change the appearance of the correlation matrix.
set.seed(123)
## as a default this function outputs a correlation matrix plot
ggcorrmat(
data = ggplot2::msleep,
colors = c("#B2182B", "white", "#4D4D4D"),
title = "Correlalogram for mammals sleep dataset",
subtitle = "sleep units: hours; weight units: kilograms"
)
Defaults return
✅ effect size + significance
✅ careful handling of NA
s
If there are NA
s present in the selected variables, the legend will display minimum, median, and maximum number of pairs used for correlation tests.
There is also a grouped_
variant of this function that makes it easy to repeat the same operation across a single grouping variable:
set.seed(123)
grouped_ggcorrmat(
data = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
type = "robust",
colors = c("#cbac43", "white", "#550000"),
grouping.var = genre,
matrix.type = "lower"
)
Summary of graphics
graphical element | geom_ used | argument for further modification |
---|---|---|
correlation matrix | ggcorrplot::ggcorrplot | ggcorrplot.args |
Summary of tests
Hypothesis testing and Effect size estimation
Type | Test | CI? | Function used |
---|---|---|---|
Parametric | Pearson’s correlation coefficient | ✅ | correlation::correlation |
Non-parametric | Spearman’s rank correlation coefficient | ✅ | correlation::correlation |
Robust | Winsorized Pearson correlation coefficient | ✅ | correlation::correlation |
Bayesian | Pearson’s correlation coefficient | ✅ | correlation::correlation |
For examples and more information, see the ggcorrmat
vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggcorrmat.html
ggpiestats
This function creates a pie chart for categorical or nominal variables with results from contingency table analysis (Pearson’s chi-squared test for between-subjects design and McNemar’s chi-squared test for within-subjects design) included in the subtitle of the plot. If only one categorical variable is entered, results from one-sample proportion test (i.e., a chi-squared goodness of fit test) will be displayed as a subtitle.
To study an interaction between two categorical variables:
set.seed(123)
ggpiestats(
data = mtcars,
x = am,
y = cyl,
package = "wesanderson",
palette = "Royal1",
title = "Dataset: Motor Trend Car Road Tests",
legend.title = "Transmission"
)
Defaults return
✅ descriptives (frequency + %s)
✅ inferential statistics
✅ effect size + CIs
✅ Goodness-of-fit tests
✅ Bayesian hypothesis-testing
✅ Bayesian estimation
There is also a grouped_
variant of this function that makes it easy to repeat the same operation across a single grouping variable. Following example is a case where the theoretical question is about proportions for different levels of a single nominal variable:
set.seed(123)
grouped_ggpiestats(
data = mtcars,
x = cyl,
grouping.var = am,
label.repel = TRUE,
package = "ggsci",
palette = "default_ucscgb"
)
Summary of graphics
graphical element | geom_ used | argument for further modification |
---|---|---|
pie slices | ggplot2::geom_col | ❌ |
descriptive labels | ggplot2::geom_label /ggrepel::geom_label_repel | label.args |
Summary of tests
two-way table
Hypothesis testing
Type | Design | Test | Function used |
---|---|---|---|
Parametric/Non-parametric | Unpaired | Pearson’s | stats::chisq.test |
Bayesian | Unpaired | Bayesian Pearson’s | BayesFactor::contingencyTableBF |
Parametric/Non-parametric | Paired | McNemar’s | stats::mcnemar.test |
Bayesian | Paired | ❌ | ❌ |
Effect size estimation
Type | Design | Effect size | CI? | Function used |
---|---|---|---|---|
Parametric/Non-parametric | Unpaired | Cramer’s | ✅ | effectsize::cramers_v |
Bayesian | Unpaired | Cramer’s | ✅ | effectsize::cramers_v |
Parametric/Non-parametric | Paired | Cohen’s | ✅ | effectsize::cohens_g |
Bayesian | Paired | ❌ | ❌ | ❌ |
one-way table
Hypothesis testing
Type | Test | Function used |
---|---|---|
Parametric/Non-parametric | Goodness of fit | stats::chisq.test |
Bayesian | Bayesian Goodness of fit | (custom) |
Effect size estimation
Type | Effect size | CI? | Function used |
---|---|---|---|
Parametric/Non-parametric | Pearson’s | ✅ | effectsize::pearsons_c |
Bayesian | ❌ | ❌ | ❌ |
For more, see the ggpiestats
vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggpiestats.html
ggbarstats
In case you are not a fan of pie charts (for very good reasons), you can alternatively use ggbarstats
function which has a similar syntax.
N.B. The p-values from one-sample proportion test are displayed on top of each bar.
set.seed(123)
library(ggplot2)
ggbarstats(
data = movies_long,
x = mpaa,
y = genre,
title = "MPAA Ratings by Genre",
xlab = "movie genre",
legend.title = "MPAA rating",
ggplot.component = list(ggplot2::scale_x_discrete(guide = ggplot2::guide_axis(n.dodge = 2))),
palette = "Set2"
)
Defaults return
✅ descriptives (frequency + %s)
✅ inferential statistics
✅ effect size + CIs
✅ Goodness-of-fit tests
✅ Bayesian hypothesis-testing
✅ Bayesian estimation
And, needless to say, there is also a grouped_
variant of this function-
## setup
set.seed(123)
grouped_ggbarstats(
data = mtcars,
x = am,
y = cyl,
grouping.var = vs,
package = "wesanderson",
palette = "Darjeeling2" # ,
# ggtheme = ggthemes::theme_tufte(base_size = 12)
)
Summary of graphics
graphical element | geom_ used | argument for further modification |
---|---|---|
bars | ggplot2::geom_bar | ❌ |
descriptive labels | ggplot2::geom_label | label.args |
Summary of tests
two-way table
Hypothesis testing
Type | Design | Test | Function used |
---|---|---|---|
Parametric/Non-parametric | Unpaired | Pearson’s | stats::chisq.test |
Bayesian | Unpaired | Bayesian Pearson’s | BayesFactor::contingencyTableBF |
Parametric/Non-parametric | Paired | McNemar’s | stats::mcnemar.test |
Bayesian | Paired | ❌ | ❌ |
Effect size estimation
Type | Design | Effect size | CI? | Function used |
---|---|---|---|---|
Parametric/Non-parametric | Unpaired | Cramer’s | ✅ | effectsize::cramers_v |
Bayesian | Unpaired | Cramer’s | ✅ | effectsize::cramers_v |
Parametric/Non-parametric | Paired | Cohen’s | ✅ | effectsize::cohens_g |
Bayesian | Paired | ❌ | ❌ | ❌ |
one-way table
Hypothesis testing
Type | Test | Function used |
---|---|---|
Parametric/Non-parametric | Goodness of fit | stats::chisq.test |
Bayesian | Bayesian Goodness of fit | (custom) |
Effect size estimation
Type | Effect size | CI? | Function used |
---|---|---|---|
Parametric/Non-parametric | Pearson’s | ✅ | effectsize::pearsons_c |
Bayesian | ❌ | ❌ | ❌ |
ggcoefstats
The function ggcoefstats
generates dot-and-whisker plots for regression models saved in a tidy data frame. The tidy dataframes are prepared using parameters::model_parameters()
. Additionally, if available, the model summary indices are also extracted from performance::model_performance()
.
Although the statistical models displayed in the plot may differ based on the class of models being investigated, there are few aspects of the plot that will be invariant across models:
The dot-whisker plot contains a dot representing the estimate and their confidence intervals (95%
is the default). The estimate can either be effect sizes (for tests that depend on the F
-statistic) or regression coefficients (for tests with t
-, -, and
z
-statistic), etc. The function will, by default, display a helpful x
-axis label that should clear up what estimates are being displayed. The confidence intervals can sometimes be asymmetric if bootstrapping was used.
The label attached to dot will provide more details from the statistical test carried out and it will typically contain estimate, statistic, and p-value.e
The caption will contain diagnostic information, if available, about models that can be useful for model selection: The smaller the Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC) values, the “better” the model is.
The output of this function will be a {ggplot2}
object and, thus, it can be further modified (e.g. change themes) with {ggplot2}
functions.
set.seed(123)
## model
mod <- stats::lm(formula = mpg ~ am * cyl, data = mtcars)
ggcoefstats(mod)
Defaults return
✅ inferential statistics
✅ estimate + CIs
✅ model summary (AIC and BIC)
Supported models
Most of the regression models that are supported in the underlying packages are also supported by ggcoefstats
.
insight::supported_models()
#> [1] "aareg" "afex_aov"
#> [3] "AKP" "Anova.mlm"
#> [5] "anova.rms" "aov"
#> [7] "aovlist" "Arima"
#> [9] "averaging" "bamlss"
#> [11] "bamlss.frame" "bayesQR"
#> [13] "bayesx" "BBmm"
#> [15] "BBreg" "bcplm"
#> [17] "betamfx" "betaor"
#> [19] "betareg" "BFBayesFactor"
#> [21] "bfsl" "BGGM"
#> [23] "bife" "bifeAPEs"
#> [25] "bigglm" "biglm"
#> [27] "blavaan" "blrm"
#> [29] "bracl" "brglm"
#> [31] "brmsfit" "brmultinom"
#> [33] "btergm" "censReg"
#> [35] "cgam" "cgamm"
#> [37] "cglm" "clm"
#> [39] "clm2" "clmm"
#> [41] "clmm2" "clogit"
#> [43] "coeftest" "complmrob"
#> [45] "confusionMatrix" "coxme"
#> [47] "coxph" "coxph.penal"
#> [49] "coxr" "cpglm"
#> [51] "cpglmm" "crch"
#> [53] "crq" "crqs"
#> [55] "crr" "dep.effect"
#> [57] "DirichletRegModel" "drc"
#> [59] "eglm" "elm"
#> [61] "epi.2by2" "ergm"
#> [63] "feglm" "feis"
#> [65] "felm" "fitdistr"
#> [67] "fixest" "flexsurvreg"
#> [69] "gam" "Gam"
#> [71] "gamlss" "gamm"
#> [73] "gamm4" "garch"
#> [75] "gbm" "gee"
#> [77] "geeglm" "glht"
#> [79] "glimML" "glm"
#> [81] "Glm" "glmm"
#> [83] "glmmadmb" "glmmPQL"
#> [85] "glmmTMB" "glmrob"
#> [87] "glmRob" "glmx"
#> [89] "gls" "gmnl"
#> [91] "HLfit" "htest"
#> [93] "hurdle" "iv_robust"
#> [95] "ivFixed" "ivprobit"
#> [97] "ivreg" "lavaan"
#> [99] "lm" "lm_robust"
#> [101] "lme" "lmerMod"
#> [103] "lmerModLmerTest" "lmodel2"
#> [105] "lmrob" "lmRob"
#> [107] "logistf" "logitmfx"
#> [109] "logitor" "LORgee"
#> [111] "lqm" "lqmm"
#> [113] "lrm" "manova"
#> [115] "MANOVA" "marginaleffects"
#> [117] "marginaleffects.summary" "margins"
#> [119] "maxLik" "mclogit"
#> [121] "mcmc" "mcmc.list"
#> [123] "MCMCglmm" "mcp1"
#> [125] "mcp12" "mcp2"
#> [127] "med1way" "mediate"
#> [129] "merMod" "merModList"
#> [131] "meta_bma" "meta_fixed"
#> [133] "meta_random" "metaplus"
#> [135] "mhurdle" "mipo"
#> [137] "mira" "mixed"
#> [139] "MixMod" "mixor"
#> [141] "mjoint" "mle"
#> [143] "mle2" "mlm"
#> [145] "mlogit" "mmlogit"
#> [147] "model_fit" "multinom"
#> [149] "mvord" "negbinirr"
#> [151] "negbinmfx" "ols"
#> [153] "onesampb" "orm"
#> [155] "pgmm" "plm"
#> [157] "PMCMR" "poissonirr"
#> [159] "poissonmfx" "polr"
#> [161] "probitmfx" "psm"
#> [163] "Rchoice" "ridgelm"
#> [165] "riskRegression" "rjags"
#> [167] "rlm" "rlmerMod"
#> [169] "RM" "rma"
#> [171] "rma.uni" "robmixglm"
#> [173] "robtab" "rq"
#> [175] "rqs" "rqss"
#> [177] "Sarlm" "scam"
#> [179] "selection" "sem"
#> [181] "SemiParBIV" "semLm"
#> [183] "semLme" "slm"
#> [185] "speedglm" "speedlm"
#> [187] "stanfit" "stanmvreg"
#> [189] "stanreg" "summary.lm"
#> [191] "survfit" "survreg"
#> [193] "svy_vglm" "svychisq"
#> [195] "svyglm" "svyolr"
#> [197] "t1way" "tobit"
#> [199] "trimcibt" "truncreg"
#> [201] "vgam" "vglm"
#> [203] "wbgee" "wblm"
#> [205] "wbm" "wmcpAKP"
#> [207] "yuen" "yuend"
#> [209] "zcpglm" "zeroinfl"
#> [211] "zerotrunc"
Although not shown here, this function can also be used to carry out parametric, robust, and Bayesian random-effects meta-analysis.
Summary of graphics
graphical element | geom_ used | argument for further modification |
---|---|---|
regression estimate | ggplot2::geom_point | point.args |
error bars | ggplot2::geom_errorbarh | errorbar.args |
vertical line | ggplot2::geom_vline | vline.args |
label with statistical details | ggrepel::geom_label_repel | stats.label.args |
Summary of meta-analysis tests
Hypothesis testing and Effect size estimation
Type | Test | Effect size | CI? | Function used |
---|---|---|---|---|
Parametric | Meta-analysis via random-effects models | ✅ | metafor::metafor | |
Robust | Meta-analysis via robust random-effects models | ✅ | metaplus::metaplus | |
Bayes | Meta-analysis via Bayesian random-effects models | ✅ | metaBMA::meta_random |
For a more exhaustive account of this function, see the associated vignette- https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggcoefstats.html
{ggstatsplot}
also offers a convenience function to extract dataframes with statistical details that are used to create expressions displayed in {ggstatsplot}
plots.
set.seed(123)
## a list of tibbles containing statistical analysis summaries
ggbetweenstats(mtcars, cyl, mpg) %>%
extract_stats()
#> $subtitle_data
#> # A tibble: 1 × 14
#> statistic df df.error p.value
#> <dbl> <dbl> <dbl> <dbl>
#> 1 31.6 2 18.0 0.00000127
#> method effectsize estimate
#> <chr> <chr> <dbl>
#> 1 One-way analysis of means (not assuming equal variances) Omega2 0.744
#> conf.level conf.low conf.high conf.method conf.distribution n.obs expression
#> <dbl> <dbl> <dbl> <chr> <chr> <int> <list>
#> 1 0.95 0.531 1 ncp F 32 <language>
#>
#> $caption_data
#> # A tibble: 6 × 18
#> term pd rope.percentage prior.distribution prior.location prior.scale
#> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 mu 1 0 cauchy 0 0.707
#> 2 cyl-4 1 0 cauchy 0 0.707
#> 3 cyl-6 0.780 0.390 cauchy 0 0.707
#> 4 cyl-8 1 0 cauchy 0 0.707
#> 5 sig2 1 0 cauchy 0 0.707
#> 6 g_cyl 1 0.0155 cauchy 0 0.707
#> bf10 method log_e_bf10 effectsize
#> <dbl> <chr> <dbl> <chr>
#> 1 3008850. Bayes factors for linear models 14.9 Bayesian R-squared
#> 2 3008850. Bayes factors for linear models 14.9 Bayesian R-squared
#> 3 3008850. Bayes factors for linear models 14.9 Bayesian R-squared
#> 4 3008850. Bayes factors for linear models 14.9 Bayesian R-squared
#> 5 3008850. Bayes factors for linear models 14.9 Bayesian R-squared
#> 6 3008850. Bayes factors for linear models 14.9 Bayesian R-squared
#> estimate std.dev conf.level conf.low conf.high conf.method n.obs expression
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <list>
#> 1 0.714 0.0503 0.95 0.574 0.788 HDI 32 <language>
#> 2 0.714 0.0503 0.95 0.574 0.788 HDI 32 <language>
#> 3 0.714 0.0503 0.95 0.574 0.788 HDI 32 <language>
#> 4 0.714 0.0503 0.95 0.574 0.788 HDI 32 <language>
#> 5 0.714 0.0503 0.95 0.574 0.788 HDI 32 <language>
#> 6 0.714 0.0503 0.95 0.574 0.788 HDI 32 <language>
#>
#> $pairwise_comparisons_data
#> # A tibble: 3 × 9
#> group1 group2 statistic p.value alternative distribution p.adjust.method
#> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
#> 1 4 6 -6.67 0.00110 two.sided q Holm
#> 2 4 8 -10.7 0.0000140 two.sided q Holm
#> 3 6 8 -7.48 0.000257 two.sided q Holm
#> test expression
#> <chr> <list>
#> 1 Games-Howell <language>
#> 2 Games-Howell <language>
#> 3 Games-Howell <language>
#>
#> $descriptive_data
#> NULL
#>
#> $one_sample_data
#> NULL
#>
#> $tidy_data
#> NULL
#>
#> $glance_data
#> NULL
Note that all of this analysis is carried out by {statsExpressions}
package: https://indrajeetpatil.github.io/statsExpressions/
{ggstatsplot}
statistical details with custom plotsSometimes you may not like the default plots produced by {ggstatsplot}
. In such cases, you can use other custom plots (from {ggplot2}
or other plotting packages) and still use {ggstatsplot}
functions to display results from relevant statistical test.
For example, in the following chunk, we will create our own plot using {ggplot2}
package, and use {ggstatsplot}
function for extracting expression:
## loading the needed libraries
set.seed(123)
library(ggplot2)
## using `{ggstatsplot}` to get expression with statistical results
stats_results <- ggbetweenstats(morley, Expt, Speed, output = "subtitle")
## creating a custom plot of our choosing
ggplot(morley, aes(x = as.factor(Expt), y = Speed)) +
geom_boxplot() +
labs(
title = "Michelson-Morley experiments",
subtitle = stats_results,
x = "Speed of light",
y = "Experiment number"
)
{ggstatsplot}
No need to use scores of packages for statistical analysis (e.g., one to get stats, one to get effect sizes, another to get Bayes Factors, and yet another to get pairwise comparisons, etc.).
Minimal amount of code needed for all functions (typically only data
, x
, and y
), which minimizes chances of error and makes for tidy scripts.
Conveniently toggle between statistical approaches.
Truly makes your figures worth a thousand words.
No need to copy-paste results to the text editor (MS-Word, e.g.).
Disembodied figures stand on their own and are easy to evaluate for the reader.
More breathing room for theoretical discussion and other text.
No need to worry about updating figures and statistical details separately.
{ggstatsplot}
This package is…
❌ an alternative to learning {ggplot2}
✅ (The better you know {ggplot2}
, the more you can modify the defaults to your liking.)
❌ meant to be used in talks/presentations
✅ (Default plots can be too complicated for effectively communicating results in time-constrained presentation settings, e.g. conference talks.)
❌ the only game in town
✅ (GUI software alternatives: JASP and jamovi).
In case you use the GUI software jamovi
, you can install a module called jjstatsplot
, which is a wrapper around {ggstatsplot}
.
I’m happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. I personally prefer using the GitHub
issues system over trying to reach out to me in other ways (personal e-mail, Twitter, etc.). Pull Requests for contributions are encouraged.
Here are some simple ways in which you can contribute (in the increasing order of commitment):
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Author: IndrajeetPatil
Source Code: https://github.com/IndrajeetPatil/ggstatsplot
License: GPL-3.0, GPL-3.0 licenses found
1659765300
RSyntaxTree is a graphical syntax tree generator written in the Ruby programming language.
See updates and a working web interface available at https://yohasebe.com/rsyntaxtree.
You can run RSyntaxTree's web interface on your local machine using Docker Desktop. See RSyntaxTree Web UI
# gem install rsyntaxtree
For the web interface, see Usage section of https://yohasebe.com/rsyntaxtree.
For the command-line interface, type $rsyntaxtree -h
after installation. Here's what you get:
RSyntaxTree, (linguistic) syntax tree generator written in Ruby.
Usage:
rsyntaxtree [options] "[VP [VP [V set] [NP bracket notation]] [ADV here]]"
where [options] are:
-o, --outdir=<s> Output directory (default: ./)
-f, --format=<s> Output format: png, gif, jpg, pdf, or svg (default: png)
-l, --leafstyle=<s> visual style of tree leaves: auto, triangle, bar, or nothing (default: auto)
-n, --fontstyle=<s> Font style (available when ttf font is specified): sans, serif, cjk (default: sans)
-t, --font=<s> Path to a ttf font used to generate tree (optional)
-s, --fontsize=<i> Size: 8-26 (default: 16)
-m, --margin=<i> Margin: 0-10 (default: 1)
-v, --vheight=<f> Connector Height: 0.5-5.0 (default: 2.0)
-c, --color=<s> Color text and bars: on or off (default: on)
-y, --symmetrize=<s> Generate radically symmetrical, balanced tree: on or off (default: off)
-r, --transparent=<s> Make background transparent: on or off (default: off)
-p, --polyline=<s> draw polyline connectors: on or off (default: off)
-e, --version Print version and exit
-h, --help Show this message```
See the documentation for more detailed info about the syntax.
See RSyntaxTree Examples.
Input text
[S
[NP |R|<>SyntaxTree]
[VP
[V generates]
[NP
[Adj #\+multilingual\
\+beautiful]
[NP syntax\
trees]
]
]
]
Output (PNG or SVG)
For the latest updates and downloads please visit http://github.com/yohasebe/rsyntaxtree
Author: Yohasebe
Source Code: https://github.com/yohasebe/rsyntaxtree
License: MIT License
1659749940
ggrepel provides geoms for ggplot2 to repel overlapping text labels:
geom_text_repel()
geom_label_repel()
Text labels repel away from each other, away from data points, and away from edges of the plotting area.
library(ggrepel)
ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars))) +
geom_text_repel() +
geom_point(color = 'red') +
theme_classic(base_size = 16)
# The easiest way to get ggrepel is to install it from CRAN:
install.packages("ggrepel")
# Or get the the development version from GitHub:
# install.packages("devtools")
devtools::install_github("slowkow/ggrepel")
See the examples page to learn more about how to use ggrepel in your project.
Please submit an issue to report bugs or ask questions.
Please contribute bug fixes or new features with a pull request to this repository.
Author: Slowkow
Source Code: https://github.com/slowkow/ggrepel
License: GPL-3.0 license
1659656880
Exploratory Data Analysis (EDA) is the initial and an important phase of data analysis/predictive modeling. During this process, analysts/modelers will have a first look of the data, and thus generate relevant hypotheses and decide next steps. However, the EDA process could be a hassle at times. This R package aims to automate most of data handling and visualization, so that users could focus on studying the data and extracting insights.
The package can be installed directly from CRAN.
install.packages("DataExplorer")
However, the latest stable version (if any) could be found on GitHub, and installed using devtools
package.
if (!require(devtools)) install.packages("devtools")
devtools::install_github("boxuancui/DataExplorer")
If you would like to install the latest development version, you may install the develop branch.
if (!require(devtools)) install.packages("devtools")
devtools::install_github("boxuancui/DataExplorer", ref = "develop")
The package is extremely easy to use. Almost everything could be done in one line of code. Please refer to the package manuals for more information. You may also find the package vignettes here.
To get a report for the airquality dataset:
library(DataExplorer)
create_report(airquality)
To get a report for the diamonds dataset with response variable price:
library(ggplot2)
create_report(diamonds, y = "price")
Instead of running create_report
, you may also run each function individually for your analysis, e.g.,
## View basic description for airquality data
introduce(airquality)
rows | 153 |
columns | 6 |
discrete_columns | 0 |
continuous_columns | 6 |
all_missing_columns | 0 |
total_missing_values | 44 |
complete_rows | 111 |
total_observations | 918 |
memory_usage | 6,376 |
## Plot basic description for airquality data
plot_intro(airquality)
## View missing value distribution for airquality data
plot_missing(airquality)
## Left: frequency distribution of all discrete variables
plot_bar(diamonds)
## Right: `price` distribution of all discrete variables
plot_bar(diamonds, with = "price")
## View frequency distribution by a discrete variable
plot_bar(diamonds, by = "cut")
## View histogram of all continuous variables
plot_histogram(diamonds)
## View estimated density distribution of all continuous variables
plot_density(diamonds)
## View quantile-quantile plot of all continuous variables
plot_qq(diamonds)
## View quantile-quantile plot of all continuous variables by feature `cut`
plot_qq(diamonds, by = "cut")
## View overall correlation heatmap
plot_correlation(diamonds)
## View bivariate continuous distribution based on `cut`
plot_boxplot(diamonds, by = "cut")
## Scatterplot `price` with all other continuous features
plot_scatterplot(split_columns(diamonds)$continuous, by = "price", sampled_rows = 1000L)
## Visualize principal component analysis
plot_prcomp(diamonds, maxcat = 5L)
#> 2 features with more than 5 categories ignored!
#> color: 7 categories
#> clarity: 8 categories
To make quick updates to your data:
## Group bottom 20% `clarity` by frequency
group_category(diamonds, feature = "clarity", threshold = 0.2, update = TRUE)
## Group bottom 20% `clarity` by `price`
group_category(diamonds, feature = "clarity", threshold = 0.2, measure = "price", update = TRUE)
## Dummify diamonds dataset
dummify(diamonds)
dummify(diamonds, select = "cut")
## Set values for missing observations
df <- data.frame("a" = rnorm(260), "b" = rep(letters, 10))
df[sample.int(260, 50), ] <- NA
set_missing(df, list(0L, "unknown"))
## Update columns
update_columns(airquality, c("Month", "Day"), as.factor)
update_columns(airquality, 1L, function(x) x^2)
## Drop columns
drop_columns(diamonds, 8:10)
drop_columns(diamonds, "clarity")
See article wiki page.
Author: Boxuancui
Source Code: https://github.com/boxuancui/DataExplorer
License: View license
1658321700
Plotly.js is a standalone Javascript data visualization library, and it also powers the Python and R modules named plotly
in those respective ecosystems (referred to as Plotly.py and Plotly.R).
Plotly.js can be used to produce dozens of chart types and visualizations, including statistical charts, 3D graphs, scientific charts, SVG and tile maps, financial charts and more.
Contact us for Plotly.js consulting, dashboard development, application integration, and feature additions.
Install a ready-to-use distributed bundle
npm i --save plotly.js-dist-min
and use import or require in node.js
// ES6 module
import Plotly from 'plotly.js-dist-min'
// CommonJS
var Plotly = require('plotly.js-dist-min')
You may also consider using plotly.js-dist
if you prefer using an unminified package.
In the examples below
Plotly
object is added to the window scope byscript
. ThenewPlot
method is then used to draw an interactive figure as described bydata
andlayout
into the desireddiv
here namedgd
. As demonstrated in the example above basic knowledge ofhtml
and JSON syntax is enough to get started i.e. with/without JavaScript! To learn and build more with plotly.js please visit plotly.js documentation.
<head>
<script src="https://cdn.plot.ly/plotly-2.13.1.min.js"></script>
</head>
<body>
<div id="gd"></div>
<script>
Plotly.newPlot("gd", /* JSON object */ {
"data": [{ "y": [1, 2, 3] }],
"layout": { "width": 600, "height": 400}
})
</script>
</body>
Alternatively you may consider using native ES6 import in the script tag.
<script type="module">
import "https://cdn.plot.ly/plotly-2.13.1.min.js"
Plotly.newPlot("gd", [{ y: [1, 2, 3] }])
</script>
Fastly supports Plotly.js with free CDN service. Read more at https://www.fastly.com/open-source.
While non-minified source files may contain characters outside UTF-8, it is recommended that you specify the charset
when loading those bundles.
<script src="https://cdn.plot.ly/plotly-2.13.1.js" charset="utf-8"></script>
Please note that as of v2 the "plotly-latest" outputs (e.g. https://cdn.plot.ly/plotly-latest.min.js) will no longer be updated on the CDN, and will stay at the last v1 patch v1.58.5. Therefore, to use the CDN with plotly.js v2 and higher, you must specify an exact plotly.js version.
You could load either version two or version three of MathJax files, for example:
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_SVG.js"></script>
<script src="https://cdn.jsdelivr.net/npm/mathjax@3.2.2/es5/tex-svg.js"></script>
When using MathJax version 3, it is also possible to use
chtml
output on the other parts of the page in addition tosvg
output for the plotly graph. Please refer todevtools/test_dashboard/index-mathjax3chtml.html
to see an example.
There are two kinds of plotly.js bundles:
npm
and the CDN
, described in the dist README.If your library needs to bundle or directly load plotly.js/lib/index.js or parts of its modules similar to index-basic in some other way than via an official or a custom bundle, or in case you want to tweak the default build configurations of browserify
or webpack
, etc. then please visit BUILDING.md
.
Official plotly.js documentation is hosted at https://plotly.com/javascript.
These pages are generated by the Plotly graphing-library-docs repo built with Jekyll and publicly hosted on GitHub Pages. For more info about contributing to Plotly documentation, please read through contributing guidelines.
Have a bug or a feature request? Please open a Github issue keeping in mind the issue guidelines. You may also want to read about how changes get made to Plotly.js
Please read through our contributing guidelines. Included are directions for opening issues, using plotly.js in your project and notes on development.
Plotly.js is at the core of a large and dynamic ecosystem with many contributors who file issues, reproduce bugs, suggest improvements, write code in this repo (and other upstream or downstream ones) and help users in the Plotly community forum. The following people deserve special recognition for their outsized contributions to this ecosystem:
GitHub | Status | ||
---|---|---|---|
Alex C. Johnson | @alexcjohnson | Active, Maintainer | |
Mojtaba Samimi | @archmoj | @solarchvision | Active, Maintainer |
Antoine Roy-Gobeil | @antoinerg | Active, Maintainer | |
Nicolas Kruchten | @nicolaskruchten | @nicolaskruchten | Active, Maintainer |
Jon Mease | @jonmmease | @jonmmease | Active |
Étienne Tétreault-Pinard | @etpinard | @etpinard | Hall of Fame |
Mikola Lysenko | @mikolalysenko | @MikolaLysenko | Hall of Fame |
Ricky Reusser | @rreusser | @rickyreusser | Hall of Fame |
Dmitry Yv. | @dy | @DimaYv | Hall of Fame |
Robert Monfera | @monfera | @monfera | Hall of Fame |
Robert Möstl | @rmoestl | @rmoestl | Hall of Fame |
Nicolas Riesco | @n-riesco | Hall of Fame | |
Miklós Tusz | @mdtusz | @mdtusz | Hall of Fame |
Chelsea Douglas | @cldougl | Hall of Fame | |
Ben Postlethwaite | @bpostlethwaite | Hall of Fame | |
Chris Parmer | @chriddyp | Hall of Fame | |
Alex Vados | @alexander-daniel | Hall of Fame |
Code and documentation copyright 2021 Plotly, Inc.
Code released under the MIT license.
This project is maintained under the Semantic Versioning guidelines.
See the Releases section of our GitHub project for changelogs for each release version of plotly.js.
plotly-js
) or on Stack Overflow (tagged plotly
).plotly
on packages which modify or add to the functionality of plotly.js when distributing through npm.Author: Plotly
Source Code: https://github.com/plotly/plotly.js
License: MIT license
1658306820
A three-dimensional static graph viewer.
(click the image to try it out)
<html>
<head>
<style>
#graph {
width: 500px;
height: 500px;
border: 1px solid grey;
}
</style>
</head>
<body>
<div id="graph"></div>
<script src="graphosaurus.min.js"></script>
<script>
// JavaScript will go here
</script>
</body>
</html>
If you open this up in your web browser, you'll see something that looks like this:
Look at that amazing square! Now let's create a graph, a couple nodes, and an edge between the nodes:
var graph = G.graph()
// Create a red node with cartesian coordinates x=0, y=0, z=0
var redNode = G.node([0, 0, 0], {color: "red"});
graph.addNode(redNode);
// You can also use the addTo method to add to the graph
var greenNode = G.node([1, 1, 1], {color: "green"}).addTo(graph);
var edge = G.edge([redNode, greenNode], {color: "blue"});
graph.addEdge(edge); // or edge.addTo(graph)
// Render the graph in the HTML element with id='graph'
graph.renderIn("graph");
After inserting this JavaScript in the <script>
block, you should see this:
While this is a very basic example, I hope I've demonstrated how simple it is to create graphs with Graphosaurus.
git clone https://github.com/frewsxcv/graphosaurus.git
to clone this repositorynpm install
to install all the build requirementsgrunt
to build Graphosaurus. The resulting compiled JavaScript will be in dist/
and the docs will be in doc/
JSDoc generated API documentation can be found here.
John Conway's illustration of our glorious leader, the gryposaurus graphosaurus.
All files in this repository are licensed under version two of the Mozilla Public License.
Graphosaurus has some third party dependencies listed in the package.json
file in the devDependencies
and dependencies
sections. Their licenses can be found on their respective project pages.
Author: frewsxcv
Source Code: https://github.com/frewsxcv/graphosaurus
License: MPL-2.0 license
1658302390
React components for Chart.js, the most popular charting library.
Supports Chart.js v3 and v2.
Install this library with peer dependencies:
pnpm add react-chartjs-2 chart.js
# or
yarn add react-chartjs-2 chart.js
# or
npm i react-chartjs-2 chart.js
We recommend using chart.js@^3.0.0
.
Then, import and use individual components:
import { Doughnut } from 'react-chartjs-2';
<Doughnut data={...} />
Need an API to fetch data? Consider Cube, an open-source API for data apps.
Author: Reactchartjs
Source Code: https://github.com/reactchartjs/react-chartjs-2
License: MIT, MIT licenses found
1658291760
Apache ECharts is a free, powerful charting and visualization library offering an easy way of adding intuitive, interactive, and highly customizable charts to your commercial products. It is written in pure JavaScript and based on zrender, which is a whole new lightweight canvas library.
You may choose one of the following methods:
npm install echarts --save
Build echarts source code:
Execute the instructions in the root directory of the echarts: (Node.js is required)
# Install the dependencies from NPM:
npm install
# Rebuild source code immediately in watch mode when changing the source code.
npm run dev
# Check correctness of TypeScript code.
npm run checktype
# If intending to build and get all types of the "production" files:
npm run release
Then the "production" files are generated in the dist
directory.
If you wish to debug locally or make pull requests, please refer to the contributing document.
https://github.com/ecomfe/awesome-echarts
ECharts GL An extension pack of ECharts, which provides 3D plots, globe visualization, and WebGL acceleration.
Extension for Baidu Map 百度地图扩展 An extension provides a wrapper of Baidu Map Service SDK.
vue-echarts ECharts component for Vue.js
echarts-stat Statistics tool for ECharts
Please refer to Apache Code of Conduct.
Deqing Li, Honghui Mei, Yi Shen, Shuang Su, Wenli Zhang, Junting Wang, Ming Zu, Wei Chen. ECharts: A Declarative Framework for Rapid Construction of Web-based Visualization. Visual Informatics, 2018.
Author: Apache
Source Code: https://github.com/apache/echarts
License: Apache-2.0 license
1658284320
The dygraphs JavaScript library produces interactive, zoomable charts of time series:
Learn more about it at dygraphs.com.
Get help with dygraphs by browsing the on Stack Overflow (preferred) and Google Groups.
<html>
<head>
<script type="text/javascript" src="dygraph.js"></script>
<link rel="stylesheet" href="dygraph.css" />
</head>
<body>
<div id="graphdiv"></div>
<script type="text/javascript">
g = new Dygraph(
document.getElementById("graphdiv"), // containing div
"Date,Temperature\n" + // the data series
"2008-05-07,75\n" +
"2008-05-08,70\n" +
"2008-05-09,80\n",
{ } // the options
);
</script>
</body>
</html>
Learn more by reading the tutorial and seeing demonstrations of what dygraphs can do in the gallery. You can get dygraph.js
and dygraph.css
from cdnjs or from NPM (see below).
Get dygraphs from NPM:
npm install dygraphs
You'll find pre-built JS & CSS files in node_modules/dygraphs/dist
. If you're using a module bundler like browserify or webpack, you can import dygraphs:
import Dygraph from 'dygraphs';
// or: const Dygraph = require('dygraphs');
const g = new Dygraph('graphdiv', data, { /* options */ });
Check out the dygraphs-es6 repo for a fully-worked example.
To get going, clone the repo and run:
npm install
npm run build
Then open tests/demo.html
in your browser.
Read more about the dygraphs development process in the developer guide.
Author: Danvk
Source Code: https://github.com/danvk/dygraphs
License: MIT license