
How R and R-Studio can make writing manuscripts easy

Background

In earlier posts, we discussed how using the R language benefits researchers in several ways (here). It supports reproducible research, better visualization, and a wide audience for your research. We also discussed how and where research contributions made in R can be published in reputed journals (here).

Now suppose you have a research contribution implemented in R and are ready to draft a manuscript about the methodology or the R package.

Suppose it is a methodology implemented in R and you wish to discuss its performance in your manuscript. In that case, you may start with the introduction, literature review, and proposed methodology sections, followed by the results, performance evaluation, and plots you obtained in R, and conclude with some remarks. You will write all the text in the respective sections, along with citations, tables, equations, and figures. These figures and tables can be the output of the analysis done in R. If you draft the manuscript in LaTeX in the traditional way, you will work out the whole analysis, rebuild each table in LaTeX, and upload the figures produced in the R study.

Now suppose it is a manuscript introducing an R package. The manuscript will be full of R code chunks, R function descriptions, and case studies with output plots. It can be really challenging for authors to draft this kind of manuscript, with code chunks and other R material, in the traditional LaTeX format of the journals. You might have observed that some articles present code very close to its original format and fonts, but many fail to do so. To avoid code chunks in the manuscript, some authors prefer to provide the code as supplementary material.

In other words, a researcher is ready with code, the desired results, and plots, but cannot use this content directly in the manuscript. Instead, he or she needs to save the results outside R-Studio and then embed them in the LaTeX source of the manuscript.

To avoid such a situation, this post demonstrates how R and R-Studio can be used to write a manuscript for several journals with minimal effort, and how R code chunks and their output can be embedded directly in the manuscript.

#publishing #data-science #journal #r #research #data analysis


Bokeh Plotting Backend for Pandas and GeoPandas

Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of Pandas. Importing the library adds a complementary plotting method plot_bokeh() on DataFrames and Series.

With Pandas-Bokeh, creating stunning, interactive, HTML-based visualization is as easy as calling:

df.plot_bokeh()

Pandas-Bokeh also provides native support as a Pandas plotting backend for Pandas >= 0.25. When Pandas-Bokeh is installed, switching the default Pandas plotting backend to Bokeh can be done via:

pd.set_option('plotting.backend', 'pandas_bokeh')


More details about the new Pandas backend can be found below.

Interactive Documentation

See https://patrikhlobil.github.io/Pandas-Bokeh/ for an interactive version of the documentation below, where you can play with the dynamic Bokeh plots.

For more information, have a look at the Examples below or at the notebooks in the GitHub repository of this project.

Installation

You can install Pandas-Bokeh from PyPI via pip

pip install pandas-bokeh


or conda:

conda install -c patrikhlobil pandas-bokeh


With the current release 0.5.5, Pandas-Bokeh officially supports Python 3.6 and newer. For more details, see Release Notes.

How To Use

Classical Use

The Pandas-Bokeh library should be imported after Pandas, GeoPandas and/or Pyspark. After the import, one should define the plotting output, which can be:

pandas_bokeh.output_notebook(): Embeds the Plots in the cell outputs of the notebook. Ideal when working in Jupyter Notebooks.

pandas_bokeh.output_file(filename): Exports the plot to the provided filename as an HTML.

For more details about the plotting outputs, see the reference here or the Bokeh documentation.

import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()

import pandas as pd
import pandas_bokeh
pandas_bokeh.output_file("Interactive Plot.html")

Pandas-Bokeh as native Pandas plotting backend

For pandas >= 0.25, a plotting backend switch is natively supported. It can be achieved by calling:

import pandas as pd
pd.set_option('plotting.backend', 'pandas_bokeh')


Now, the plotting API is accessible for a Pandas DataFrame via:

df.plot(...)


All additional functionalities of Pandas-Bokeh are then accessible at pd.plotting. So, setting the output to notebook is:

pd.plotting.output_notebook()


or calling the grid layout functionality:

pd.plotting.plot_grid(...)


Note: Backwards compatibility is kept since there will still be the df.plot_bokeh(...) methods for a DataFrame.

Plot types

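The plot types currently supported and described below are:

• Lineplot
• Pointplot
• Stepplot
• Scatterplot
• Barplot
• Histogram
• Areaplot
• Pieplot
• Mapplot
• Geoplots (point, line and polygon data via GeoPandas)

Also, check out the complementary chapter Outputs, Formatting & Layouts, which covers output options and auto-scaling plots.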

Lineplot

Basic Lineplot

This simple lineplot in Pandas-Bokeh already contains various interactive elements:

• a pannable and zoomable (zoom in plotarea and zoom on axis) plot
• by clicking on the legend elements, one can hide and show the individual lines
• a Hovertool for the plotted lines

Consider the following simple example:

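import numpy as np

np.random.seed(42)
# NOTE: the opening of this DataFrame definition was lost. The "Google" column and its
# drift of 0.2 are an assumed reconstruction based on the "Apple vs Google" title used below.
df = pd.DataFrame({"Google": np.random.randn(1000)+0.2,
                   "Apple": np.random.randn(1000)+0.17},
                  index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(kind="line")       #equivalent to df.plot_bokeh.line()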

Note, that similar to the regular pandas.DataFrame.plot method, there are also additional accessors to directly access the different plotting types like:

• df.plot_bokeh(kind="line", ...) → df.plot_bokeh.line(...)
• df.plot_bokeh(kind="bar", ...) → df.plot_bokeh.bar(...)
• df.plot_bokeh(kind="hist", ...) → df.plot_bokeh.hist(...)
• ...

There are various optional parameters to tune the plots, for example:

kind: Which kind of plot should be produced. Supported values include "line", "point", "step", "scatter", "bar", "barh", "hist", "area", "pie" and "map", matching the plot types described below.

x: Name of the column to use for the horizontal x-axis. If the x parameter is not specified, the index is used for the x-values of the plot. Alternatively, an array of values with the same number of elements as the DataFrame can be passed.

y: Name of column or list of names of columns to use for the vertical y-axis.

figsize: Choose width & height of the plot

title: Sets title of the plot

xlim/ylim: Set the visible range of the plot for the x- and y-axis (also works for a datetime x-axis)

xlabel/ylabel: Set x- and y-labels

logx/logy: Set log-scale on x-/y-axis

xticks/yticks: Explicitly set the ticks on the axes

color: Defines a single color for a plot.

colormap: Can be used to specify multiple colors to plot. Can be either a list of colors or the name of a Bokeh color palette

hovertool: If True a Hovertool is active, else if False no Hovertool is drawn.

hovertool_string: If specified, this string will be used for the hovertool (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation and here)

toolbar_location: Specify the position of the toolbar location (None, "above", "below", "left" or "right"). Default: "right"

zooming: Enables/Disables zooming. Default: True

panning: Enables/Disables panning. Default: True

fontsize_label/fontsize_ticks/fontsize_title/fontsize_legend: Set fontsize of labels, ticks, title or legend (int or string of form "15pt")

rangetool: Enables a range tool scroller. Default: False

**kwargs: Optional keyword arguments of bokeh.plotting.figure.line

Try them out to get a feeling for the effects. Let us consider now:

df.plot_bokeh.line(
    figsize=(800, 450),
    y="Apple",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(0, 400),
    toolbar_location=None,
    colormap=["red", "blue"],
    hovertool_string=r"""<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Apple_logo_black.svg/170px-Apple_logo_black.svg.png' height="42" alt="@imgs" width="42" style="float: left; margin: 0px 15px 15px 0px;" border="2"></img> Apple <h4> Stock Price: </h4> @{Apple}""",
    panning=False,
    zooming=False)

Lineplot with data points

For lineplots, as for many other plot kinds, there are some special keyword arguments that only work for this plotting type. For lineplots, these are:

plot_data_points: Plot also the data points on the lines

plot_data_points_size: Determines the size of the data points

marker: Defines the point type (Default: "circle"). Possible values are: 'circle', 'square', 'triangle', 'asterisk', 'circle_x', 'square_x', 'inverted_triangle', 'x', 'circle_cross', 'square_cross', 'diamond', 'cross'

**kwargs: Optional keyword arguments of bokeh.plotting.figure.line

Let us use this information to have another version of the same plot:

df.plot_bokeh.line(
    figsize=(800, 450),
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(100, 200),
    xlim=("2001-01-01", "2001-02-01"),
    colormap=["red", "blue"],
    plot_data_points=True,
    plot_data_points_size=10,
    marker="asterisk")
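The @{column} placeholder in hovertool_string, as used for @{Apple} above, can be pictured as a per-row template substitution. The following is only a plain-Python illustration of that idea: the helper name render_hover and the sample values are hypothetical, and in a real plot the substitution is carried out by Bokeh in the browser, not by code like this.

```python
import re

def render_hover(template, row):
    """Replace each @{column} placeholder with that column's value for one row.

    Hypothetical helper for illustration only; Bokeh does the real substitution.
    """
    def substitute(match):
        column = match.group(1)
        # Leave unknown placeholders untouched instead of raising an error.
        return str(row.get(column, match.group(0)))
    return re.sub(r"@\{([^}]+)\}", substitute, template)

row = {"Apple": 57.3, "Google": 61.0}  # made-up values for a single data point
tooltip = render_hover("<h4>Stock Price:</h4> @{Apple}", row)
print(tooltip)
```

Because the template is ordinary HTML, anything that renders in a browser tooltip (images, headings, divs) can surround the placeholders.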

Lineplot with rangetool

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
df = df.cumsum()

df.plot_bokeh(rangetool=True)

Pointplot

If you just wish to draw the data points for curves, the pointplot option is the right choice. It also accepts the kwargs of bokeh.plotting.figure.scatter like marker or size:

import numpy as np

x = np.arange(-3, 3, 0.1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.point(
x="x",
xticks=range(-3, 4),
size=5,
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabula vs. Cube)",
marker="x")

Stepplot

With an API similar to the line- and pointplots, one can generate a stepplot. Additional keyword arguments for this plot type are passed to bokeh.plotting.figure.step, e.g. mode (before, after, center), as the following example shows:

import numpy as np

x = np.arange(-3, 3, 1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.step(
x="x",
xticks=range(-1, 1),
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabula vs. Cube)",
figsize=(800,300),
fontsize_title=30,
fontsize_label=25,
fontsize_ticks=15,
fontsize_legend=5,
)

df.plot_bokeh.step(
x="x",
xticks=range(-1, 1),
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabula vs. Cube)",
mode="after",
figsize=(800,300)
)

Note that the step-plot API of Bokeh does not yet support hovertool functionality.

Scatterplot

A basic scatterplot can be created using the kind="scatter" option. For scatterplots, the x and y parameters have to be specified and the following optional keyword argument is allowed:

category: Determines the category column to use for coloring the scatter points

**kwargs: Optional keyword arguments of bokeh.plotting.figure.scatter

Note that the pandas.DataFrame.plot_bokeh() method returns a Bokeh figure by default, which can be embedded in dashboard layouts with other figures and Bokeh objects (for more details about (sub)plot layouts and embedding the resulting Bokeh plots as HTML, see Outputs, Formatting & Layouts below).

In the example below, we use the built-in grid layout support of Pandas-Bokeh to display both the DataFrame (using a Bokeh DataTable) and the resulting scatterplot:

# Load Iris Dataset:
df = pd.read_csv(
r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/iris/iris.csv"
)
df = df.sample(frac=1)

# Create Bokeh-Table with DataFrame:
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.models import ColumnDataSource

data_table = DataTable(
columns=[TableColumn(field=Ci, title=Ci) for Ci in df.columns],
source=ColumnDataSource(df),
height=300,
)

# Create Scatterplot:
p_scatter = df.plot_bokeh.scatter(
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization",
show_figure=False,
)

# Combine Table and Scatterplot via grid layout:
pandas_bokeh.plot_grid([[data_table, p_scatter]], plot_width=400, plot_height=350)

One optional keyword parameter that can be passed on to bokeh.plotting.figure.scatter is size. Below, we use the sepal length of the Iris data as the reference for the size:

#Change one value to clearly see the effect of the size keyword
df.loc[13, "sepal length (cm)"] = 15

#Make scatterplot:
p_scatter = df.plot_bokeh.scatter(
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization with Size Keyword",
size="sepal length (cm)")

In this example you can see that the additional dimension sepal length cannot be used to clearly differentiate between the virginica and versicolor species.

Barplot

The barplot API has no special keyword arguments, but accepts optional kwargs of bokeh.plotting.figure.vbar like alpha. By default it uses the index for the bar categories (however, a column can also be used as the x-axis category via the x argument).

data = {
'fruits':
['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
'2015': [2, 1, 4, 3, 2, 4],
'2016': [5, 3, 3, 2, 4, 6],
'2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")

p_bar = df.plot_bokeh.bar(
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6)

Using the stacked keyword argument, you can also make stacked barplots:

p_stacked_bar = df.plot_bokeh.bar(
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
stacked=True,
alpha=0.6)

Horizontal versions of the above barplots are also supported with the keyword kind="barh" or the accessor plot_bokeh.barh. You can still specify a column of the DataFrame as the bar category via the x argument if you do not wish to use the index.

#Reset index, such that "fruits" is now a column of the DataFrame:
df.reset_index(inplace=True)

#Create horizontal bar (via kind keyword):
p_hbar = df.plot_bokeh(
kind="barh",
x="fruits",
xlabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6,
legend = "bottom_right",
show_figure=False)

#Create stacked horizontal bar (via barh accessor):
p_stacked_hbar = df.plot_bokeh.barh(
x="fruits",
stacked=True,
xlabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6,
legend = "bottom_right",
show_figure=False)

#Plot all barplot examples in a grid:
pandas_bokeh.plot_grid([[p_bar, p_stacked_bar],
[p_hbar, p_stacked_hbar]],
plot_width=450)

Histogram

For drawing histograms (kind="hist"), Pandas-Bokeh has a lot of customization features. Optional keyword arguments for histogram plots are:

bins: Determines bins to use for the histogram. If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is a string, it defines the method used to calculate the optimal bin width, as defined by histogram_bin_edges.

histogram_type: Either "sidebyside", "topontop" or "stacked". Default: "topontop"

stacked: Boolean that overrides the histogram_type as "stacked" if given. Default: False

**kwargs: Optional keyword arguments of bokeh.plotting.figure.quad

Below are examples of the different histogram types:

import numpy as np

df_hist = pd.DataFrame({
'a': np.random.randn(1000) + 1,
'b': np.random.randn(1000),
'c': np.random.randn(1000) - 1
},
columns=['a', 'b', 'c'])

#Top-on-Top Histogram (Default):
df_hist.plot_bokeh.hist(
bins=np.linspace(-5, 5, 41),
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Top-on-Top)",
line_color="black")

#Side-by-Side Histogram (multiple bars share bin side-by-side) also accessible via
#kind="hist":
df_hist.plot_bokeh(
kind="hist",
bins=np.linspace(-5, 5, 41),
histogram_type="sidebyside",
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Side-by-Side)",
line_color="black")

#Stacked histogram:
df_hist.plot_bokeh.hist(
bins=np.linspace(-5, 5, 41),
histogram_type="stacked",
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Stacked)",
line_color="black")
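When bins is given as a sequence, as with np.linspace(-5, 5, 41) above, the values are bin edges and the rightmost edge is included in the last bin. A minimal plain-Python sketch of that counting rule follows, assuming the numpy.histogram convention (every bin half-open except the last). The helper count_into_bins and the sample data are hypothetical and only illustrate the rule; they are not how Pandas-Bokeh computes its histograms.

```python
from bisect import bisect_right

def count_into_bins(values, edges):
    """Count values per bin for sorted bin edges.

    Mirrors the numpy.histogram convention: every bin is half-open
    [left, right) except the last one, which also includes its right edge.
    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        # bisect_right returns the index of the first edge strictly greater than v.
        i = bisect_right(edges, v) - 1
        # A value equal to the rightmost edge belongs to the last bin.
        i = min(i, len(counts) - 1)
        counts[i] += 1
    return counts

edges = [-5, -1, 0, 1, 5]                   # non-uniform bin widths
values = [-5, -2, -0.5, 0, 0.3, 1, 5, 7]    # 7 lies outside the range
counts = count_into_bins(values, edges)
print(counts)
```

Passing an int or a string for bins instead leaves the choice of edges to the histogram routine, so explicit edges are mainly useful when several histograms must share the same bins.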

Further, advanced keyword arguments for histograms are:

• weights: A column of the DataFrame that is used as weight for the histogram aggregation (see also numpy.histogram)
• normed: If True, histogram values are normed to 1 (sum of histogram values=1). It is also possible to pass an integer, e.g. normed=100 would result in a histogram with percentage y-axis (sum of histogram values=100). Default: False
• cumulative: If True, a cumulative histogram is shown. Default: False
• show_average: If True, the average of the histogram is also shown. Default: False

Their usage is shown in these examples:

p_hist = df_hist.plot_bokeh.hist(
y=["a", "b"],
bins=np.arange(-4, 6.5, 0.5),
normed=100,
vertical_xlabel=True,
ylabel="Share[%]",
title="Normal distributions (normed)",
show_average=True,
xlim=(-4, 6),
ylim=(0, 30),
show_figure=False)

p_hist_cum = df_hist.plot_bokeh.hist(
y=["a", "b"],
bins=np.arange(-4, 6.5, 0.5),
normed=100,
cumulative=True,
vertical_xlabel=True,
ylabel="Share[%]",
title="Normal distributions (normed & cumulative)",
show_figure=False)

pandas_bokeh.plot_grid([[p_hist, p_hist_cum]], plot_width=450, plot_height=300)
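The effect of normed=100 and cumulative=True in the examples above can be sketched with a few made-up bin counts. This is an illustration under the assumption that the cumulative histogram is the running total of the normalised values; the helper normalise is hypothetical and not part of Pandas-Bokeh.

```python
from itertools import accumulate

def normalise(counts, normed=1):
    """Scale histogram counts so that they sum to `normed` (1 or, e.g., 100)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [normed * c / total for c in counts]

counts = [2, 1, 2, 5]                     # made-up bin counts
shares = normalise(counts, normed=100)    # percentage per bin
cumulative = list(accumulate(shares))     # running total, ends at 100
print(shares)
print(cumulative)
```

With normed=100 the y-axis reads directly as a percentage, which is why the examples label it "Share[%]".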

Areaplot

Areaplot (kind="area") can be either drawn on top of each other or stacked. The important parameters are:

stacked: If True, the areaplots are stacked. If False, plots are drawn on top of each other. Default: False

**kwargs: Optional keyword arguments of bokeh.plotting.figure.patch

Let us consider the energy consumption split by source that can be downloaded as DataFrame via:

df_energy = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/energy/energy.csv",
parse_dates=["Year"])
df_energy.head()

Creating the Areaplot can be achieved via:

df_energy.plot_bokeh.area(
x="Year",
stacked=True,
legend="top_left",
colormap=["brown", "orange", "black", "grey", "blue", "green"],
title="Worldwide energy consumption split by energy source",
ylabel="Million tonnes oil equivalent",
ylim=(0, 16000))

Note that the consumption of fossil energy is still increasing and renewable energy sources are still small in comparison 😢!!! However, when we normalize the plot using the normed keyword, there is a clear trend towards renewable energies in the last decade:

df_energy.plot_bokeh.area(
x="Year",
stacked=True,
normed=100,
legend="bottom_left",
colormap=["brown", "orange", "black", "grey", "blue", "green"],
title="Worldwide energy consumption split by energy source",
ylabel="Million tonnes oil equivalent")

Pieplot

For Pieplots, let us consider a dataset showing the results of all Bundestag elections in Germany since 2002:

df_pie = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/Bundestagswahl/Bundestagswahl.csv")
df_pie

We can create a Pieplot of the last election in 2017 by specifying the "Partei" (German for party) column as the x column and the "2017" column as the y column for values:

df_pie.plot_bokeh.pie(
x="Partei",
y="2017",
colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
title="Results of German Bundestag Election 2017",
)

When you pass several columns to the y parameter (not providing the y-parameter assumes you plot all columns), multiple nested pieplots will be shown in one plot:

df_pie.plot_bokeh.pie(
x="Partei",
colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
title="Results of German Bundestag Elections [2002-2017]",
line_color="grey")

Mapplot

The mapplot method of Pandas-Bokeh allows for plotting geographic points stored in a Pandas DataFrame on an interactive map. For more advanced Geoplots for line and polygon shapes have a look at the Geoplots examples for the GeoPandas API of Pandas-Bokeh.

For mapplots, only (latitude, longitude) pairs in geographic projection (WGS84) can be plotted on a map. The basic API has the following 2 base parameters:

• x: name of the longitude column of the DataFrame
• y: name of the latitude column of the DataFrame

The other optional keyword arguments are discussed in the section about the GeoPandas API, e.g. category for coloring the points.

Below is an example of plotting all cities with more than 1 million inhabitants:

df_mapplot = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/populated_places.csv")
df_mapplot.head()
df_mapplot["size"] = df_mapplot["pop_max"] / 1000000
df_mapplot.plot_bokeh.map(
x="longitude",
y="latitude",
hovertool_string="""<h2> @{name} </h2>

<h3> Population: @{pop_max} </h3>""",
tile_provider="STAMEN_TERRAIN_RETINA",
size="size",
figsize=(900, 600),
title="World cities with more than 1.000.000 inhabitants")
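Background map tiles are generally served in the Web Mercator projection (EPSG:3857), which is one reason the mapplot API expects plain WGS84 longitude/latitude columns as input. The sketch below shows the standard spherical Mercator formulas for that projection. Whether Pandas-Bokeh uses exactly this conversion internally is an assumption, and the function name wgs84_to_web_mercator is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by Web Mercator

def wgs84_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator metres (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

x, y = wgs84_to_web_mercator(13.4, 52.5)  # roughly Berlin
print(round(x), round(y))
```

The formula diverges towards the poles, which is why tile maps are usually cut off at about 85 degrees of latitude.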

Geoplots

Pandas-Bokeh also allows for interactive plotting of maps using GeoPandas by providing a geopandas.GeoDataFrame.plot_bokeh() method. It can plot the following geodata on a map:

• Points/MultiPoints
• Lines/MultiLines
• Polygons/MultiPolygons

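Note: It is not possible to mix object types, i.e. a GeoDataFrame containing both Points and Lines is not allowed.

Let us start with a simple example using the "World Borders Dataset". Let us first import all necessary libraries and read the geodata:

import geopandas as gpd
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()

# Read in GeoJSON from URL (the same states file that the later examples load):
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_states.head()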

Plotting the data on a map is as simple as calling:

df_states.plot_bokeh(simplify_shapes=10000)

We also passed the optional parameter simplify_shapes (~meter) to improve plotting performance (for a reference see shapely.object.simplify). The above geolayer thus has an accuracy of about 10 km.

Many keyword arguments like xlabel, ylabel, xlim, ylim, title, colormap, hovertool, zooming, panning, ... for customizing the plot are also available for the geoplotting API and can be used as in the examples shown above. There are, however, also many other options specifically for plotting geodata:

• geometry_column: Specify the column that stores the geometry-information (default: "geometry")
• hovertool_columns: Specify column names, for which values should be shown in hovertool
• hovertool_string: If specified, this string will be used for the hovertool (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation)
• colormap_uselog: If set True, the colormapper is using a logscale. Default: False
• colormap_range: Specify the value range of the colormapper via (min, max) tuple
• tile_provider: Define built-in tile provider for background maps. Possible values: None, 'CARTODBPOSITRON', 'CARTODBPOSITRON_RETINA', 'STAMEN_TERRAIN', 'STAMEN_TERRAIN_RETINA', 'STAMEN_TONER', 'STAMEN_TONER_BACKGROUND', 'STAMEN_TONER_LABELS'. Default: CARTODBPOSITRON_RETINA
• tile_provider_url: An arbitrary tile_provider_url of the form '/{Z}/{X}/{Y}*.png' can be passed to be used as background map.
• tile_attribution: String (also HTML accepted) for showing attribution for tile source in the lower right corner
• tile_alpha: Sets the alpha value of the background tile between [0, 1]. Default: 1
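To see why colormap_uselog matters for skewed data such as populations, the sketch below maps values to palette positions once linearly and once on a log scale. It is only an illustration: the helper palette_index and the sample populations are hypothetical, and the real mapping is performed by Bokeh's color mappers.

```python
import math

def palette_index(value, vmin, vmax, n_colors, uselog=False):
    """Map a value in [vmin, vmax] to an index into a palette of n_colors.

    Illustrative only; the real mapping is done by Bokeh's color mappers.
    """
    if uselog:
        # Compare orders of magnitude instead of raw values (requires values > 0).
        value, vmin, vmax = math.log10(value), math.log10(vmin), math.log10(vmax)
    fraction = (value - vmin) / (vmax - vmin)
    # Clamp so that vmax itself falls into the last colour.
    return min(int(fraction * n_colors), n_colors - 1)

# Hypothetical populations spanning almost two orders of magnitude.
populations = [1_500_000, 5_000_000, 20_000_000, 80_000_000]
linear = [palette_index(p, 1e6, 1e8, 4) for p in populations]
logged = [palette_index(p, 1e6, 1e8, 4, uselog=True) for p in populations]
print(linear)
print(logged)
```

On the linear scale the three smaller values share one colour, while the log scale spreads them across the palette; colormap_range plays the role of (vmin, vmax) here.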

One of the most common uses of map plots is the choropleth map, where the color of each object is determined by a property of the object itself. There are 3 ways of drawing choropleth maps using Pandas-Bokeh, which are described below.

Categories

This is the simplest way. Just provide the category keyword for the selection of the property column:

• category: Specifies the column of the GeoDataFrame that should be used to draw a choropleth map
• show_colorbar: Whether or not to show a colorbar for categorical plots. Default: True

Let us now draw the regions as a choropleth plot using the category keyword (at the moment, only numerical columns are supported for choropleth plots):

df_states.plot_bokeh(
figsize=(900, 600),
simplify_shapes=5000,
category="REGION",
show_colorbar=False,
colormap=["blue", "yellow", "green", "red"],
hovertool_columns=["STATE_NAME", "REGION"],
tile_provider="STAMEN_TERRAIN_RETINA")

When hovering over the states, the state-name and the region are shown as specified in the hovertool_columns argument.

Dropdown

By passing a list of column names of the GeoDataFrame as the dropdown keyword argument, a dropdown menu is shown above the map. The user can use this dropdown menu to select the choropleth layer:

df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()

df_states.plot_bokeh(
figsize=(900, 600),
simplify_shapes=5000,
dropdown=["POPESTIMATE2010", "POPESTIMATE2017"],
colormap="Viridis",
hovertool_string="""
<img
src="https://www.states101.com/img/flags/gif/small/@STATE_NAME_SMALL.gif"
height="42" alt="@imgs" width="42"
style="float: left; margin: 0px 15px 15px 0px;"
border="2"></img>

<h2>  @STATE_NAME </h2>
<h3> 2010: @POPESTIMATE2010 </h3>
<h3> 2017: @POPESTIMATE2017 </h3>""",
tile_provider_url=r"http://c.tile.stamen.com/watercolor/{Z}/{X}/{Y}.jpg",
tile_attribution='Map tiles by <a href="http://stamen.com">Stamen Design</a>, under <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a>. Data by <a href="http://openstreetmap.org">OpenStreetMap</a>, under <a href="http://www.openstreetmap.org/copyright">ODbL</a>.'
)

Using hovertool_string, one can pass a string that can contain arbitrary HTML elements (including divs, images, ...) that is shown when hovering over the geographies (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation).

Here, we also used an OSM tile server with watercolor style via tile_provider_url and added the attribution via tile_attribution.

Sliders

Another option for interactive choropleth maps is the slider implementation of Pandas-Bokeh. The possible keyword arguments are here:

• slider: By passing a list of column names of the GeoDataFrame, a slider is shown that the user can use to select the choropleth layer.
• slider_range: Pass a range (or numpy.arange) of numbers to relate the slider's values to the slider columns. By passing range(0,10), the slider will have values [0, 1, 2, ..., 9]; when passing numpy.arange(3,5,0.5), the slider will have values [3, 3.5, 4, 4.5]. Default: range(0, len(slider))
• slider_name: Specifies the title of the slider. Default is an empty string.

This can be used to display the change in population relative to the year 2010:


#Calculate change of population relative to 2010:
for i in range(8):
    df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100

#Specify slider columns:
slider_columns = ["Delta_Population_201%d"%i for i in range(8)]

#Specify slider-range (Maps "Delta_Population_2010" -> 2010,
#                           "Delta_Population_2011" -> 2011, ...):
slider_range = range(2010, 2018)

#Make slider plot:
df_states.plot_bokeh(
figsize=(900, 600),
simplify_shapes=5000,
slider=slider_columns,
slider_range=slider_range,
slider_name="Year",
colormap="Inferno",
hovertool_columns=["STATE_NAME"] + slider_columns,
title="Change of Population [%]")
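The relation between slider_range and the slider columns in the example above can be pictured as a simple pairing of slider positions with column names. The helper below is a hypothetical illustration that mirrors the documented default of range(0, len(slider)); the check that both have the same length is an assumption about sensible behaviour, not a statement about the library's internals.

```python
def slider_mapping(slider_columns, slider_range=None):
    """Pair each slider position with the column it selects.

    Hypothetical helper; it only mirrors the documented default of
    range(0, len(slider)) when no slider_range is given.
    """
    if slider_range is None:
        slider_range = range(0, len(slider_columns))
    values = list(slider_range)
    if len(values) != len(slider_columns):
        raise ValueError("slider_range needs one value per slider column")
    return dict(zip(values, slider_columns))

columns = ["Delta_Population_201%d" % i for i in range(8)]
by_year = slider_mapping(columns, range(2010, 2018))
print(by_year[2013])
```

Choosing range(2010, 2018) makes the slider read as calendar years instead of the positions 0 to 7.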

Plot multiple geolayers

If you wish to display multiple geolayers, you can pass the Bokeh figure of a Pandas-Bokeh plot via the figure keyword to the next plot_bokeh() call:

import geopandas as gpd
import pandas_bokeh
pandas_bokeh.output_notebook()

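# Read in GeoJSONs from URL:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_cities = gpd.read_file(
r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson"
)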
df_cities["size"] = df_cities.pop_max / 400000

#Plot shapes of US states (pass figure options to this initial plot):
figure = df_states.plot_bokeh(
figsize=(800, 450),
simplify_shapes=10000,
show_figure=False,
xlim=[-170, -80],
ylim=[10, 70],
category="REGION",
colormap="Dark2",
legend="States",
show_colorbar=False,
)

#Plot cities as points on top of the US states layer by passing the figure:
df_cities.plot_bokeh(
figure=figure,         # <== pass figure here!
category="pop_max",
colormap="Viridis",
colormap_uselog=True,
size="size",
hovertool_string="""<h1>@name</h1>
<h3>Population: @pop_max </h3>""",
marker="inverted_triangle",
legend="Cities",
)

Point & Line plots:

Below, you can see an example that uses Pandas-Bokeh to plot point data on a map. The plot shows all cities with a population larger than 1.000.000. For point plots, you can select the marker as a keyword argument (since it is passed to bokeh.plotting.figure.scatter); the available marker types are the ones listed for the marker argument in the Lineplot with data points section:

gdf = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson")
gdf["size"] = gdf.pop_max / 400000

gdf.plot_bokeh(
category="pop_max",
colormap="Viridis",
colormap_uselog=True,
size="size",
hovertool_string="""<h1>@name</h1>
<h3>Population: @pop_max </h3>""",
xlim=[-15, 35],
ylim=[30,60],
marker="inverted_triangle");

In a similar way, also GeoDataFrames with (multi)line shapes can be drawn using Pandas-Bokeh.

Colorbar formatting:

If you want to display the numerical labels on your colorbar in a format other than the scientific one, you can pass one of the Bokeh number string formats or an instance of one of the bokeh.models.formatters to the colorbar_tick_format argument of the geoplot.

An example of using the string format argument:

df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")

df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()

# pass in a string format to colorbar_tick_format to display the ticks as 10m rather than 1e7
df_states.plot_bokeh(
figsize=(900, 600),
category="POPESTIMATE2017",
simplify_shapes=5000,
colormap="Inferno",
colormap_uselog=True,
colorbar_tick_format="0.0a")

An example of using the bokeh PrintfTickFormatter:

from bokeh.models import PrintfTickFormatter

df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")

df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()

for i in range(8):
    df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100

# pass in a PrintfTickFormatter instance colorbar_tick_format to display the ticks with 2 decimal places
df_states.plot_bokeh(
figsize=(900, 600),
category="Delta_Population_2017",
simplify_shapes=5000,
colormap="Inferno",
colorbar_tick_format=PrintfTickFormatter(format="%4.2f"))

Outputs, Formatting & Layouts

Output options

The pandas.DataFrame.plot_bokeh API has the following additional keyword arguments:

• show_figure: If True, the resulting figure is shown (either in the notebook or exported and shown as an HTML file, see Basics). If False, the figure is not shown but only returned, so that it can be used in layouts such as plot_grid. Default: True
• return_html: If True, the method call returns an HTML string that contains all Bokeh CSS&JS resources and the figure embedded in a div. This HTML representation of the plot can be used for embedding the plot in an HTML document. Default: False

If you have a Bokeh figure or layout, you can also use the pandas_bokeh.embedded_html function to generate an embeddable HTML representation of the plot. This can be included into any valid HTML (note that this is not possible directly with the HTML generated by the pandas_bokeh.output_file output option, because it includes an HTML header). Let us consider the following simple example:

#Import Pandas and Pandas-Bokeh (if you do not specify an output option, the standard is
#output_file):
import pandas as pd
import pandas_bokeh

#Create DataFrame to Plot:
import numpy as np
x = np.arange(-10, 10, 0.1)
sin = np.sin(x)
cos = np.cos(x)
tan = np.tan(x)
df = pd.DataFrame({"x": x, "sin(x)": sin, "cos(x)": cos, "tan(x)": tan})

#Make Bokeh plot from DataFrame using Pandas-Bokeh. Do not show the plot, but export
#it to an embeddable HTML string:
html_plot = df.plot_bokeh(
    kind="line",
    x="x",
    y=["sin(x)", "cos(x)", "tan(x)"],
    xticks=range(-20, 20),
    title="Trigonometric functions",
    show_figure=False,
    return_html=True,
    ylim=(-1.5, 1.5))

#Write some HTML and embed the HTML plot below it. For production use, please use
#Templates and the awesome Jinja library.
html = r"""
<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\$','\$']]}});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

<h1> Trigonometric functions </h1>

<p> The basic trigonometric functions are:</p>

<p>$sin(x)$</p>
<p>$cos(x)$</p>
<p>$tan(x) = \frac{sin(x)}{cos(x)}$</p>

<p>Below is a plot that shows them</p>

""" + html_plot

#Export the HTML string to an external HTML file and show it:
with open("test.html", "w") as f:
    f.write(html)

import webbrowser
webbrowser.open("test.html")

This code will open a web browser and show the resulting page, with the interactive Bokeh plot embedded directly in the HTML layout. The return_html option is ideal for use in a templating engine like Jinja.
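The templating idea can be sketched without any extra dependencies. The example below uses string.Template from the Python standard library as a stand-in for Jinja, and a hard-coded placeholder string in place of the HTML that plot_bokeh(..., return_html=True) would return. Both the placeholder and the render_page helper are assumptions made for illustration, not part of Pandas-Bokeh.

```python
from string import Template

# Placeholder for the string returned by plot_bokeh(..., return_html=True)
html_plot = '<div id="plot">(Bokeh CSS, JS and figure markup)</div>'

# Page skeleton with two slots: the title and the embedded plot fragment
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
$plot
</body>
</html>
""")


def render_page(title, plot_fragment):
    """Fill the page skeleton with a title and an embeddable HTML fragment."""
    return PAGE.substitute(title=title, plot=plot_fragment)


page = render_page("Trigonometric functions", html_plot)
print(page)
```

Keeping the page skeleton separate from the plot fragment is what makes the return_html string convenient: the same fragment can be dropped into any template that leaves a slot for it.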

Auto Scaling Plots

For single plots that have a large number of x-axis values, or for larger monitors, you can auto-scale the figure to the width of the entire Jupyter cell by setting the sizing_mode parameter.

df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot_bokeh(kind="bar", figsize=(500, 200), sizing_mode="scale_width")

The figsize parameter can be used to change the height and width as well as act as a scaling multiplier against the axis that is not being scaled.
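One way to read the "scaling multiplier" remark is that figsize fixes the aspect ratio while the width follows the container. The arithmetic can be sketched as follows, assuming that sizing_mode="scale_width" preserves the width-to-height ratio given by figsize. The helper scaled_height is hypothetical and only illustrates the calculation; it is not a Pandas-Bokeh function.

```python
def scaled_height(figsize, available_width):
    """Height a figure would get if its width is stretched to available_width
    while the aspect ratio implied by figsize is preserved."""
    width, height = figsize
    return round(available_width * height / width)


# figsize=(500, 200) has a 5:2 aspect ratio
print(scaled_height((500, 200), 1000))
print(scaled_height((500, 200), 750))
```

Under this assumption, doubling the available width doubles the height as well, so a wide notebook cell yields a proportionally taller plot.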

Number formats

To change the format of numbers in the hovertool, use the number_format keyword argument. For documentation on the format to pass, have a look at the Bokeh documentation. Let us consider some examples for the number 3.141592653589793:

This number format will be applied to all numeric columns of the hovertool. If you want to make a very custom or complicated hovertool, you should probably use the hovertool_string keyword argument, see e.g. this example. Below, we use the number_format parameter to format the "Stock Price" with 2 decimal digits and an additional $ sign.

import numpy as np

#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
    "Google": np.random.randn(1000) + 0.2,
    "Apple": np.random.randn(1000) + 0.17
},
index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(
    kind="line",
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(0, 400),
    colormap=["red", "blue"],
    number_format="1.00 $")

Suppress scientific notation for axes

If you want to suppress the scientific notation for axes, you can use the disable_scientific_axes parameter, which accepts one of "x", "y", "xy":

df = pd.DataFrame({"Animal": ["Mouse", "Rabbit", "Dog", "Tiger", "Elefant", "Wale"],
                   "Weight [g]": [19, 3000, 40000, 200000, 6000000, 50000000]})
p_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", show_figure=False)
p_non_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", disable_scientific_axes="y", show_figure=False,)
pandas_bokeh.plot_grid([[p_scientific, p_non_scientific]], plot_width = 450)

Dashboard Layouts

As shown in the Scatterplot Example, combining plots with plots or other HTML elements is straightforward in Pandas-Bokeh due to the layout capabilities of Bokeh. The easiest way to generate a dashboard layout is using the pandas_bokeh.plot_grid method (which is an extension of bokeh.layouts.gridplot):

import pandas as pd
import numpy as np
import pandas_bokeh
pandas_bokeh.output_notebook()

#Barplot:
data = {
    'fruits':
    ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
    '2015': [2, 1, 4, 3, 2, 4],
    '2016': [5, 3, 3, 2, 4, 6],
    '2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")
p_bar = df.plot_bokeh(
    kind="bar",
    ylabel="Price per Unit [€]",
    title="Fruit prices per Year",
    show_figure=False)

#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
    "Google": np.random.randn(1000) + 0.2,
    "Apple": np.random.randn(1000) + 0.17
},
index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
p_line = df.plot_bokeh(
    kind="line",
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(0, 400),
    colormap=["red", "blue"],
    show_figure=False)

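#Scatterplot:
#(iris is not defined in this snippet; loading it from scikit-learn is an assumption)
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris["data"])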
df.columns = iris["feature_names"]
df["species"] = iris["target"]
df["species"] = df["species"].map(dict(zip(range(3), iris["target_names"])))
p_scatter = df.plot_bokeh(
    kind="scatter",
    x="petal length (cm)",
    y="sepal width (cm)",
    category="species",
    title="Iris DataSet Visualization",
    show_figure=False)

#Histogram:
df_hist = pd.DataFrame({
    'a': np.random.randn(1000) + 1,
    'b': np.random.randn(1000),
    'c': np.random.randn(1000) - 1
},
columns=['a', 'b', 'c'])

p_hist = df_hist.plot_bokeh(
    kind="hist",
    bins=np.arange(-6, 6.5, 0.5),
    vertical_xlabel=True,
    normed=100,
    hovertool=False,
    title="Normal distributions",
    show_figure=False)

#Make Dashboard with Grid Layout:
pandas_bokeh.plot_grid([[p_line, p_bar],
                        [p_scatter, p_hist]], plot_width=450)

Using a combination of row and column elements (see also Bokeh Layouts) allows for a very easy general arrangement of elements. An alternative layout to the one above is:

p_line.plot_width = 900
p_hist.plot_width = 900

layout = pandas_bokeh.column(p_line,
                             pandas_bokeh.row(p_scatter, p_bar),
                             p_hist)

pandas_bokeh.show(layout)

Release Notes

Release Notes can be found here.

Contributing to Pandas-Bokeh

If you wish to contribute to the development of Pandas-Bokeh, you can follow the instructions in CONTRIBUTING.md.

Author: PatrikHlobil
Source Code: https://github.com/PatrikHlobil/Pandas-Bokeh

1597027020

Background

In earlier posts, we discussed how using the R language benefits researchers in several ways (here): it supports reproducible research, richer visualization, and a large audience for your work. We also discussed how and where research carried out in R can be published in reputed journals (here).

Now suppose you have a research contribution in R and are ready to draft a manuscript about the methodology or the R package.

Suppose first that it is a methodology implemented in R and you wish to discuss its performance in your manuscript. You might start with an introduction, a literature review, and the proposed methodology, followed by the results, performance evaluation, and plots you obtained in R, and then conclude with some remarks. You will write the text of each section along with its citations, tables, equations, and figures. These figures and tables can be the output of the analysis done in R. If you draft the manuscript in LaTeX in the traditional way, you will work out the whole analysis, build the tables by hand in LaTeX, and upload the figures produced in the R study.

Now suppose the manuscript introduces an R package. It will be full of R code chunks, R function descriptions, and case studies with output plots. Drafting this kind of manuscript, with code chunks and other R material, in a journal's traditional LaTeX format can be really challenging. You may have noticed that some articles reproduce code in something very close to its original format and fonts, while many fail to do so. To avoid code chunks in the manuscript altogether, some authors prefer to provide the code as supplementary material.

In other words, the researcher has the code, the desired results, and the plots ready but cannot use this content directly in the manuscript. Instead, the results have to be saved outside R-Studio and then embedded in the manuscript's LaTeX source.

To avoid this situation, this post demonstrates how R and R-Studio can be used to write a manuscript for several journals with minimal effort, and how R code chunks and their output can be embedded directly in the manuscript.

#publishing #data-science #journal #r #research #data analysis

1598740560

What You Can Learn about Setting from Classic Sitcoms

Giving your novel a strong sense of place is vital to doing your part to engage the readers without confusing or frustrating them. Setting is a big part of this (though not the whole enchilada — there is also social context and historic period), and I often find writing students and consulting clients erring on one of two extremes.

Either: Every scene is set in a different, elaborately-described place from the last. This leads to confusion (and possibly exhaustion and impatience) for the reader, because they have no sense of what they need to actually pay attention to for later and what’s just…there. Are the details of that forest in chapter 2 important? Will I ever be back in this castle again? Is there a reason for this character to be in this particular room versus the one she was in the last time I saw her? Who knows!

Or: There are few or no clues at all as to where the characters are in a scene. What’s in the room? Are they even in a room? Are there other people in th — ope, yes, there are, someone just materialized, what is happening? This all leads to the dreaded “brains in jars” syndrome. That is, characters are only their thoughts and words, with no grounding in the space-time continuum. No one seems to be in a place, in a body, at a time of day.

Every aspect of writing a novel comes with its difficulties, and there are a lot of moving pieces to manage and deploy in the right balance. When you’re a newer writer, especially, there’s something to be said for keeping things simple until you have a handle on how to manage the arc and scope of a novel-length work. And whether you tend to overdo settings or underdo them, you can learn something from TV, especially classic sitcoms.

Your basic “live studio audience” sitcoms are performed and filmed on sets built inside studios vs. on location. This helps keep production expenses in check and helps the viewer feel at home — there’s a reliable and familiar container to hold the story of any given episode. The writers on the show don’t have to reinvent the wheel with every script.

Often, a show will have no more than two or three basic sets that are used episode to episode, and then a few other easily-understood sets (characters’ workplaces, restaurants, street scenes) are also used regularly but not every episode.

#creative-writing #writing-exercise #writing-craft #writing #writing-tips #machine learning

1607333509

Write My Paper for Me Online: Can You Do My Research Paper USA? TheWritingPlanet.com

We understand the pain that students go through every day, because once you have completed all your academic requirements for your Ph.D. or MA, you need to start working on the dissertation. So what do you think? Will you be able to manage it on your own, or do you need dissertation help from a trained professional? In practice it is a long process: you need to write an extensive paper of around 10 to 20 thousand words, starting with an introduction to your topic and then your reasons why it should be studied. After that, you need to write a literature review covering the relevant studies, which will also guide your paper’s theoretical framework. It sounds complicated. Really, it is.

This assignment is one of the most tiring and difficult ones a student has to face during 16 to 18 years of education. But once again, we have your back this time too. The points above may have made you anxious about this task, but we have professional dissertation services that deliver excellent work. You’re at the right place, and at the right time too, since we have a discount on all of our writing services at TheWritingPlanet. So this is a good opportunity to use our services while you relieve your stress by relaxing. Still, it is entirely your decision whether to do it yourself or hire an expert to do it for you.

You know that a thesis or dissertation is completely new for you and that you will struggle while writing it, so it will take a great deal of effort to start and then finish it. Do you think you have enough time to do it? Or would you be able to do it as well as an expert would? You will find your answer when you use TheWritingPlanet’s writing services, which will help you in writing your thesis, dissertation, or reference paper. When you take dissertation help from our service, you get the help of a Ph.D. degree holder who already has extensive experience writing dissertations and thesis papers. You also need not worry about passing marks, since you will achieve the highest of all. Do you know why? Because our team of writers includes Ph.D. holders who were top graders in their particular fields and schools. As a result, you will get guaranteed results at TheWritingPlanet.

Are you stressed because your deadline is getting closer?

A deadline often turns into a nightmare for students, and they usually end up stressing themselves out. It can also cause real health problems, such as panic attacks and anxiety attacks. With these factors in mind, some time ago we started our writing service, which has by now reached a high level. Today students are in the same situation, and we are their support.

Teachers do not realize that the time they give for finishing the dissertation is not enough for a student, because a large share of students have a great many other things on their to-do list. Some of them need to look after their family, some need to manage their jobs, and some sit back stressing. So if your deadline is approaching soon and you are still in no mood to start your dissertation paper on your own, then you will really want to hand your task over to us. We will not let you waste more time, because time keeps slipping from our hands and leaves us with nothing at the end.

We can help you at critical times, when you realize there is very little time left to do your essay or dissertation paper. If you have these assignments completed by us, you will certainly get your academic work done on time with excellent results. Our writers are the best in their fields, and they provide our customers with the best services.

#custom-writing-services #write-my-paper #the-writing-planet #cheap-dissertation #the-writing-planet

1594399440

Getting Started with R Markdown — Guide and Cheatsheet

In this blog post, we’ll look at how to use R Markdown. By the end, you’ll have the skills you need to produce a document or presentation using R Markdown, from scratch!

We’ll show you how to convert the default R Markdown document into a useful reference guide of your own. We encourage you to follow along by building out your own R Markdown guide, but if you prefer to just read along, that works, too!

R Markdown is an open-source tool for producing reproducible reports in R. It enables you to keep all of your code, results, plots, and writing in one place. R Markdown is particularly useful when you are producing a document for an audience that is interested in the results from your analysis, but not your code.

R Markdown is powerful because it can be used for data analysis and data science, collaborating with others, and communicating results to decision makers. With R Markdown, you have the option to export your work to numerous formats including PDF, Microsoft Word, a slideshow, or an HTML document for use in a website.

Turn your data analysis into pretty documents with R Markdown.

We’ll use the RStudio integrated development environment (IDE) to produce our R Markdown reference guide. If you’d like to learn more about RStudio, check out our list of 23 awesome RStudio tips and tricks!

Here at Dataquest, we love using R Markdown for coding in R and authoring content. In fact, we wrote this blog post in R Markdown! Also, learners on the Dataquest platform use R Markdown for completing their R projects.

We included fully-reproducible code examples in this blog post. When you’ve mastered the content in this post, check out our other blog post on R Markdown tips, tricks, and shortcuts.

Okay, let’s get started with building our very own R Markdown reference document!

1. Install R Markdown

R Markdown is a free, open source tool that is installed like any other R package. Use the following command to install R Markdown:

install.packages("rmarkdown")


Now that R Markdown is installed, open a new R Markdown file in RStudio by navigating to File > New File > R Markdown…. R Markdown files have the file extension “.Rmd”.

2. Default Output Format

When you open a new R Markdown file in RStudio, a pop-up window appears that prompts you to select the output format to use for the document.

The default output format is HTML. With HTML, you can easily view it in a web browser.

We recommend selecting the default HTML setting for now, since it can save you time! Why? Because compiling an HTML document is generally faster than generating a PDF or other format. When you near a finished product, you can change the output to the format of your choosing and then make the final touches.

One final thing to note is that the title you give your document in the pop-up above is not the file name! Navigate to File > Save As... to name and save the document.

#data science tutorials #beginner #r #r markdown #r tutorial #r tutorials #rstats #rstudio #tutorial #tutorials