1591595160
Using the base plotting system in R can seem overwhelming, but it doesn’t have to be.
Base plotting in R can be intimidating. It takes a canvas approach to plot construction, allowing you to paint layer after layer of detail onto your graphics. As a result, there is a seemingly endless number of functions and attributes to learn, but there’s no need to panic or jump straight to ggplot. This article highlights the versatility of the base plot() function while giving you some quick tricks to create beautiful plots.
#data-visualization #data-science #r #learning-to-code #plotting
1677494820
zoofs
is a Python library for performing feature selection using a variety of nature inspired wrapper algorithms. The algorithms range from swarm-intelligence to physics based to Evolutionary. It's an easy to use, flexible and powerful tool to reduce your feature size.
timeout
as a parameter to stop operation after the given number of second(s). An amazing alternative to passing number of iterationsUse the package manager to install zoofs.
pip install zoofs
Algorithm Name | Class Name | Description | References doi |
---|---|---|---|
Particle Swarm Algorithm | ParticleSwarmOptimization | Utilizes swarm behaviour | https://doi.org/10.1007/978-3-319-13563-2_51 |
Grey Wolf Algorithm | GreyWolfOptimization | Utilizes wolf hunting behaviour | https://doi.org/10.1016/j.neucom.2015.06.083 |
Dragon Fly Algorithm | DragonFlyOptimization | Utilizes dragonfly swarm behaviour | https://doi.org/10.1016/j.knosys.2020.106131 |
Harris Hawk Algorithm | HarrisHawkOptimization | Utilizes hawk hunting behaviour | https://link.springer.com/chapter/10.1007/978-981-32-9990-0_12 |
Genetic Algorithm Algorithm | GeneticOptimization | Utilizes genetic mutation behaviour | https://doi.org/10.1109/ICDAR.2001.953980 |
Gravitational Algorithm | GravitationalOptimization | Utilizes newtons gravitational behaviour | https://doi.org/10.1109/ICASSP.2011.5946916 |
More algos soon, stay tuned !
Define your own objective function for optimization !
from sklearn.metrics import log_loss
# define your own objective function, make sure the function receives four parameters,
# fit your model and return the objective value !
def objective_function_topass(model,X_train, y_train, X_valid, y_valid):
model.fit(X_train,y_train)
P=log_loss(y_valid,model.predict_proba(X_valid))
return P
# import an algorithm !
from zoofs import ParticleSwarmOptimization
# create object of algorithm
algo_object=ParticleSwarmOptimization(objective_function_topass,n_iteration=20,
population_size=20,minimize=True)
import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()
# fit the algorithm
algo_object.fit(lgb_model,X_train, y_train, X_valid, y_valid,verbose=True)
#plot your results
algo_object.plot_history()
from sklearn.metrics import mean_squared_error
# define your own objective function, make sure the function receives four parameters,
# fit your model and return the objective value !
def objective_function_topass(model,X_train, y_train, X_valid, y_valid):
model.fit(X_train,y_train)
P=mean_squared_error(y_valid,model.predict(X_valid))
return P
# import an algorithm !
from zoofs import ParticleSwarmOptimization
# create object of algorithm
algo_object=ParticleSwarmOptimization(objective_function_topass,n_iteration=20,
population_size=20,minimize=True)
import lightgbm as lgb
lgb_model = lgb.LGBMRegressor()
# fit the algorithm
algo_object.fit(lgb_model,X_train, y_train, X_valid, y_valid,verbose=True)
#plot your results
algo_object.plot_history()
Particle Swarm Algorithm
In computational science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formula over the particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.
Parameters |
The function must return a value, that needs to be minimized/maximized.
Number of time the algorithm will run
Stop operation after the given number of second(s). If this argument is set to None, the operation is executed without time limitation and n_iteration is followed
Total size of the population
Defines if the objective value is to be maximized or minimized
first acceleration coefficient of particle swarm
second acceleration coefficient of particle swarm
weight parameter |
Attributes |
Final best set of features |
Methods | Class Name |
---|---|
fit | Run the algorithm |
plot_history | Plot results achieved across iteration |
Parameters |
machine learning model's object
Training input samples to be used for machine learning model
The target values (class labels in classification, real numbers in regression).
Validation input samples
The Validation target values .
Print results for iterations |
Returns |
Final best set of features |
Plot results across iterations
from sklearn.metrics import log_loss
# define your own objective function, make sure the function receives four parameters,
# fit your model and return the objective value !
def objective_function_topass(model,X_train, y_train, X_valid, y_valid):
model.fit(X_train,y_train)
P=log_loss(y_valid,model.predict_proba(X_valid))
return P
# import an algorithm !
from zoofs import ParticleSwarmOptimization
# create object of algorithm
algo_object=ParticleSwarmOptimization(objective_function_topass,n_iteration=20,
population_size=20,minimize=True,c1=2,c2=2,w=0.9)
import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()
# fit the algorithm
algo_object.fit(lgb_model,X_train, y_train, X_valid, y_valid,verbose=True)
#plot your results
algo_object.plot_history()
Grey Wolf Algorithm
The Grey Wolf Optimizer (GWO) mimics the leadership hierarchy and hunting mechanism of grey wolves in nature. Four types of grey wolves such as alpha, beta, delta, and omega are employed for simulating the leadership hierarchy. In addition, three main steps of hunting, searching for prey, encircling prey, and attacking prey, are implemented to perform optimization.
Parameters |
The function must return a value, that needs to be minimized/maximized.
Number of time the algorithm will run
Stop operation after the given number of second(s). If this argument is set to None, the operation is executed without time limitation and n_iteration is followed
Total size of the population
Choose the between the two methods of grey wolf optimization
Defines if the objective value is to be maximized or minimized |
Attributes |
Final best set of features |
Methods | Class Name |
---|---|
fit | Run the algorithm |
plot_history | Plot results achieved across iteration |
Parameters |
machine learning model's object
Training input samples to be used for machine learning model
The target values (class labels in classification, real numbers in regression).
Validation input samples
The Validation target values .
Print results for iterations |
Returns |
Final best set of features |
Plot results across iterations
from sklearn.metrics import log_loss
# define your own objective function, make sure the function receives four parameters,
# fit your model and return the objective value !
def objective_function_topass(model,X_train, y_train, X_valid, y_valid):
model.fit(X_train,y_train)
P=log_loss(y_valid,model.predict_proba(X_valid))
return P
# import an algorithm !
from zoofs import GreyWolfOptimization
# create object of algorithm
algo_object=GreyWolfOptimization(objective_function_topass,n_iteration=20,method=1,
population_size=20,minimize=True)
import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()
# fit the algorithm
algo_object.fit(lgb_model,X_train, y_train, X_valid, y_valid,verbose=True)
#plot your results
algo_object.plot_history()
Dragon Fly Algorithm
The main inspiration of the Dragonfly Algorithm (DA) algorithm originates from static and dynamic swarming behaviours. These two swarming behaviours are very similar to the two main phases of optimization using meta-heuristics: exploration and exploitation. Dragonflies create sub swarms and fly over different areas in a static swarm, which is the main objective of the exploration phase. In the static swarm, however, dragonflies fly in bigger swarms and along one direction, which is favourable in the exploitation phase.
Parameters |
The function must return a value, that needs to be minimized/maximized.
Number of time the algorithm will run
Stop operation after the given number of second(s). If this argument is set to None, the operation is executed without time limitation and n_iteration is followed
Total size of the population
Choose the between the three methods of Dragon Fly optimization
Defines if the objective value is to be maximized or minimized |
Attributes |
Final best set of features |
Methods | Class Name |
---|---|
fit | Run the algorithm |
plot_history | Plot results achieved across iteration |
Parameters |
machine learning model's object
Training input samples to be used for machine learning model
The target values (class labels in classification, real numbers in regression).
Validation input samples
The Validation target values .
Print results for iterations |
Returns |
Final best set of features |
Plot results across iterations
from sklearn.metrics import log_loss
# define your own objective function, make sure the function receives four parameters,
# fit your model and return the objective value !
def objective_function_topass(model,X_train, y_train, X_valid, y_valid):
model.fit(X_train,y_train)
P=log_loss(y_valid,model.predict_proba(X_valid))
return P
# import an algorithm !
from zoofs import DragonFlyOptimization
# create object of algorithm
algo_object=DragonFlyOptimization(objective_function_topass,n_iteration=20,method='sinusoidal',
population_size=20,minimize=True)
import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()
# fit the algorithm
algo_object.fit(lgb_model,X_train, y_train, X_valid, y_valid, verbose=True)
#plot your results
algo_object.plot_history()
Harris Hawk Optimization
HHO is a popular swarm-based, gradient-free optimization algorithm with several active and time-varying phases of exploration and exploitation. This algorithm initially published by the prestigious Journal of Future Generation Computer Systems (FGCS) in 2019, and from the first day, it has gained increasing attention among researchers due to its flexible structure, high performance, and high-quality results. The main logic of the HHO method is designed based on the cooperative behaviour and chasing styles of Harris' hawks in nature called "surprise pounce". Currently, there are many suggestions about how to enhance the functionality of HHO, and there are also several enhanced variants of the HHO in the leading Elsevier and IEEE transaction journals.
Parameters |
The function must return a value, that needs to be minimized/maximized.
Number of time the algorithm will run
Stop operation after the given number of second(s). If this argument is set to None, the operation is executed without time limitation and n_iteration is followed
Total size of the population
Defines if the objective value is to be maximized or minimized
value for levy random walk |
Attributes |
Final best set of features |
Methods | Class Name |
---|---|
fit | Run the algorithm |
plot_history | Plot results achieved across iteration |
Parameters |
machine learning model's object
Training input samples to be used for machine learning model
The target values (class labels in classification, real numbers in regression).
Validation input samples
The Validation target values .
Print results for iterations |
Returns |
Final best set of features |
Plot results across iterations
from sklearn.metrics import log_loss
# define your own objective function, make sure the function receives four parameters,
# fit your model and return the objective value !
def objective_function_topass(model,X_train, y_train, X_valid, y_valid):
model.fit(X_train,y_train)
P=log_loss(y_valid,model.predict_proba(X_valid))
return P
# import an algorithm !
from zoofs import HarrisHawkOptimization
# create object of algorithm
algo_object=HarrisHawkOptimization(objective_function_topass,n_iteration=20,
population_size=20,minimize=True)
import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()
# fit the algorithm
algo_object.fit(lgb_model,X_train, y_train, X_valid, y_valid,verbose=True)
#plot your results
algo_object.plot_history()
Genetic Algorithm
In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover and selection. Some examples of GA applications include optimizing decision trees for better performance, automatically solve sudoku puzzles, hyperparameter optimization, etc.
Parameters |
The function must return a value, that needs to be minimized/maximized.
Number of time the algorithm will run
Stop operation after the given number of second(s). If this argument is set to None, the operation is executed without time limitation and n_iteration is followed
Total size of the population
measure of reproductive opportunities for each organism in the population
number of top individuals to be considered as elites
rate of mutation in the population's gene
Defines if the objective value is to be maximized or minimized |
Attributes |
Final best set of features |
Methods | Class Name |
---|---|
fit | Run the algorithm |
plot_history | Plot results achieved across iteration |
Parameters |
machine learning model's object
Training input samples to be used for machine learning model
The target values (class labels in classification, real numbers in regression).
Validation input samples
The Validation target values .
Print results for iterations |
Returns |
Final best set of features |
Plot results across iterations
from sklearn.metrics import log_loss
# define your own objective function, make sure the function receives four parameters,
# fit your model and return the objective value !
def objective_function_topass(model,X_train, y_train, X_valid, y_valid):
model.fit(X_train,y_train)
P=log_loss(y_valid,model.predict_proba(X_valid))
return P
# import an algorithm !
from zoofs import GeneticOptimization
# create object of algorithm
algo_object=GeneticOptimization(objective_function_topass,n_iteration=20,
population_size=20,selective_pressure=2,elitism=2,
mutation_rate=0.05,minimize=True)
import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()
# fit the algorithm
algo_object.fit(lgb_model,X_train, y_train,X_valid, y_valid, verbose=True)
#plot your results
algo_object.plot_history()
Gravitational Algorithm
Gravitational Algorithm is based on the law of gravity and mass interactions is introduced. In the algorithm, the searcher agents are a collection of masses which interact with each other based on the Newtonian gravity and the laws of motion.
Parameters |
The function must return a value, that needs to be minimized/maximized.
Number of time the algorithm will run
Stop operation after the given number of second(s). If this argument is set to None, the operation is executed without time limitation and n_iteration is followed
Total size of the population
gravitational strength constant
distance constant
Defines if the objective value is to be maximized or minimized |
Attributes |
Final best set of features |
Methods | Class Name |
---|---|
fit | Run the algorithm |
plot_history | Plot results achieved across iteration |
Parameters |
machine learning model's object
Training input samples to be used for machine learning model
The target values (class labels in classification, real numbers in regression).
Validation input samples
The Validation target values .
Print results for iterations |
Returns |
Final best set of features |
Plot results across iterations
from sklearn.metrics import log_loss
# define your own objective function, make sure the function receives four parameters,
# fit your model and return the objective value !
def objective_function_topass(model,X_train, y_train, X_valid, y_valid):
model.fit(X_train,y_train)
P=log_loss(y_valid,model.predict_proba(X_valid))
return P
# import an algorithm !
from zoofs import GravitationalOptimization
# create object of algorithm
algo_object=GravitationalOptimization(objective_function_topass,n_iteration=50,
population_size=50,g0=100,eps=0.5,minimize=True)
import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()
# fit the algorithm
algo_object.fit(lgb_model,X_train, y_train, X_valid, y_valid, verbose=True)
#plot your results
algo_object.plot_history()
zoofs
The development of zoofs
relies completely on contributions.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
18,08,2021
🌟 Like this Project? Give us a star !
https://jaswinder9051998.github.io/zoofs/
Author: jaswinder9051998
Source Code: https://github.com/jaswinder9051998/zoofs
License: Apache-2.0 license
1649209980
A cross-platform command line REPL for the rapid experimentation and exploration of C#. It supports intellisense, installing NuGet packages, and referencing local .NET projects and assemblies.
(click to view animation)
C# REPL provides the following features:
C# REPL is a .NET 6 global tool, and runs on Windows 10, Mac OS, and Linux. It can be installed via:
dotnet tool install -g csharprepl
If you're running on Mac OS Catalina (10.15) or later, make sure you follow any additional directions printed to the screen. You may need to update your PATH variable in order to use .NET global tools.
After installation is complete, run csharprepl
to begin. C# REPL can be updated via dotnet tool update -g csharprepl
.
Run csharprepl
from the command line to begin an interactive session. The default colorscheme uses the color palette defined by your terminal, but these colors can be changed using a theme.json
file provided as a command line argument.
Type some C# into the prompt and press Enter to run it. The result, if any, will be printed:
> Console.WriteLine("Hello World")
Hello World
> DateTime.Now.AddDays(8)
[6/7/2021 5:13:00 PM]
To evaluate multiple lines of code, use Shift+Enter to insert a newline:
> var x = 5;
var y = 8;
x * y
40
Additionally, if the statement is not a "complete statement" a newline will automatically be inserted when Enter is pressed. For example, in the below code, the first line is not a syntactically complete statement, so when we press enter we'll go down to a new line:
> if (x == 5)
| // caret position, after we press Enter on Line 1
Finally, pressing Ctrl+Enter will show a "detailed view" of the result. For example, for the DateTime.Now
expression below, on the first line we pressed Enter, and on the second line we pressed Ctrl+Enter to view more detailed output:
> DateTime.Now // Pressing Enter shows a reasonable representation
[5/30/2021 5:13:00 PM]
> DateTime.Now // Pressing Ctrl+Enter shows a detailed representation
[5/30/2021 5:13:00 PM] {
Date: [5/30/2021 12:00:00 AM],
Day: 30,
DayOfWeek: Sunday,
DayOfYear: 150,
Hour: 17,
InternalKind: 9223372036854775808,
InternalTicks: 637579915804530992,
Kind: Local,
Millisecond: 453,
Minute: 13,
Month: 5,
Second: 0,
Ticks: 637579915804530992,
TimeOfDay: [17:13:00.4530992],
Year: 2021,
_dateData: 9860951952659306800
}
A note on semicolons: C# expressions do not require semicolons, but statements do. If a statement is missing a required semicolon, a newline will be added instead of trying to run the syntatically incomplete statement; simply type the semicolon to complete the statement.
> var now = DateTime.Now; // assignment statement, semicolon required
> DateTime.Now.AddDays(8) // expression, we don't need a semicolon
[6/7/2021 5:03:05 PM]
Use the #r
command to add assembly or nuget references.
#r "AssemblyName"
or #r "path/to/assembly.dll"
#r "path/to/project.csproj"
. Solution files (.sln) can also be referenced.#r "nuget: PackageName"
to install the latest version of a package, or #r "nuget: PackageName, 13.0.5"
to install a specific version (13.0.5 in this case).To run ASP.NET applications inside the REPL, start the csharprepl
application with the --framework
parameter, specifying the Microsoft.AspNetCore.App
shared framework. Then, use the above #r
command to reference the application DLL. See the Command Line Configuration section below for more details.
csharprepl --framework Microsoft.AspNetCore.App
The C# REPL supports multiple configuration flags to control startup, behavior, and appearance:
csharprepl [OPTIONS] [response-file.rsp] [script-file.csx] [-- <additional-arguments>]
Supported options are:
-r <dll>
or --reference <dll>
: Reference an assembly, project file, or nuget package. Can be specified multiple times. Uses the same syntax as #r
statements inside the REPL. For example, csharprepl -r "nuget:Newtonsoft.Json" "path/to/myproj.csproj"
-u <namespace>
or --using <namespace>
: Add a using statement. Can be specified multiple times.-f <framework>
or --framework <framework>
: Reference a shared framework. The available shared frameworks depends on the local .NET installation, and can be useful when running an ASP.NET application from the REPL. Example frameworks are:-t <theme.json>
or --theme <theme.json>
: Read a theme file for syntax highlighting. This theme file associates C# syntax classifications with colors. The color values can be full RGB, or ANSI color names (defined in your terminal's theme). The NO_COLOR standard is supported.--trace
: Produce a trace file in the current directory that logs CSharpRepl internals. Useful for CSharpRepl bug reports.-v
or --version
: Show version number and exit.-h
or --help
: Show help and exit.response-file.rsp
: A filepath of an .rsp file, containing any of the above command line options.script-file.csx
: A filepath of a .csx file, containing lines of C# to evaluate before starting the REPL. Arguments to this script can be passed as <additional-arguments>
, after a double hyphen (--
), and will be available in a global args
variable.If you have dotnet-suggest
enabled, all options can be tab-completed, including values provided to --framework
and .NET namespaces provided to --using
.
C# REPL is a standalone software application, but it can be useful to integrate it with other developer tools:
To add C# REPL as a menu entry in Windows Terminal, add the following profile to Windows Terminal's settings.json
configuration file (under the JSON property profiles.list
):
{
"name": "C# REPL",
"commandline": "csharprepl"
},
To get the exact colors shown in the screenshots in this README, install the Windows Terminal Dracula theme.
To use the C# REPL with Visual Studio Code, simply run the csharprepl
command in the Visual Studio Code terminal. To send commands to the REPL, use the built-in Terminal: Run Selected Text In Active Terminal
command from the Command Palette (workbench.action.terminal.runSelectedText
).
To add the C# REPL to the Windows Start Menu for quick access, you can run the following PowerShell command, which will start C# REPL in Windows Terminal:
$shell = New-Object -ComObject WScript.Shell
$shortcut = $shell.CreateShortcut("$env:appdata\Microsoft\Windows\Start Menu\Programs\csharprepl.lnk")
$shortcut.TargetPath = "wt.exe"
$shortcut.Arguments = "-w 0 nt csharprepl.exe"
$shortcut.Save()
You may also wish to add a shorter alias for C# REPL, which can be done by creating a .cmd
file somewhere on your path. For example, put the following contents in C:\Users\username\.dotnet\tools\csr.cmd
:
wt -w 0 nt csharprepl
This will allow you to launch C# REPL by running csr
from anywhere that accepts Windows commands, like the Window Run dialog.
This project is far from being the first REPL for C#. Here are some other projects; if this project doesn't suit you, another one might!
Visual Studio's C# Interactive pane is full-featured (it has syntax highlighting and intellisense) and is part of Visual Studio. This deep integration with Visual Studio is both a benefit from a workflow perspective, and a drawback as it's not cross-platform. As far as I know, the C# Interactive pane does not support NuGet packages or navigating to documentation/source code. Subjectively, it does not follow typical command line keybindings, so can feel a bit foreign.
csi.exe ships with C# and is a command line REPL. It's great because it's a cross platform REPL that comes out of the box, but it doesn't support syntax highlighting or autocompletion.
dotnet script allows you to run C# scripts from the command line. It has a REPL built-in, but the predominant focus seems to be as a script runner. It's a great tool, though, and has a strong community following.
dotnet interactive is a tool from Microsoft that creates a Jupyter notebook for C#, runnable through Visual Studio Code. It also provides a general framework useful for running REPLs.
Download Details:
Author: waf
Source Code: https://github.com/waf/CSharpRepl
License: MPL-2.0 License
1652748716
Exploratory data analysis is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions. EDA is primarily used to see what data can reveal beyond the formal modeling or hypothesis testing task and provides a better understanding of data set variables and the relationships between them. It can also help determine if the statistical techniques you are considering for data analysis are appropriate or not.
🔹 Topics Covered:
00:00:00 Basics of EDA with Python
01:40:10 Multiple Variate Analysis
02:30:26 Outlier Detection
03:44:48 Cricket World Cup Analysis using Exploratory Data Analysis
If we want to explain EDA in simple terms, it means trying to understand the given data much better, so that we can make some sense out of it.
We can find a more formal definition in Wikipedia.
In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
EDA in Python uses data visualization to draw meaningful patterns and insights. It also involves the preparation of data sets for analysis by removing irregularities in the data.
Based on the results of EDA, companies also make business decisions, which can have repercussions later.
In this article we’ll see about the following topics:
Data Sourcing is the process of finding and loading the data into our system. Broadly there are two ways in which we can find data.
Private Data
As the name suggests, private data is given by private organizations. There are some security and privacy concerns attached to it. This type of data is used for mainly organizations internal analysis.
Public Data
This type of Data is available to everyone. We can find this in government websites and public organizations etc. Anyone can access this data, we do not need any special permissions or approval.
We can get public data on the following sites.
The very first step of EDA is Data Sourcing, we have seen how we can access data and load into our system. Now, the next step is how to clean the data.
After completing the Data Sourcing, the next step in the process of EDA is Data Cleaning. It is very important to get rid of the irregularities and clean the data after sourcing it into our system.
Irregularities are of different types of data.
To perform the data cleaning we are using a sample data set, which can be found here.
We are using Jupyter Notebook for analysis.
First, let’s import the necessary libraries and store the data in our system for analysis.
#import the useful libraries.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Read the data set of "Marketing Analysis" in data.
data= pd.read_csv("marketing_analysis.csv")
# Printing the data
data
Now, the data set looks like this,
If we observe the above dataset, there are some discrepancies in the Column header for the first 2 rows. The correct data is from the index number 1. So, we have to fix the first two rows.
This is called Fixing the Rows and Columns. Let’s ignore the first two rows and load the data again.
#import the useful libraries.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Read the file in data without first two rows as it is of no use.
data = pd.read_csv("marketing_analysis.csv",skiprows = 2)
#print the head of the data frame.
data.head()
Now, the dataset looks like this, and it makes more sense.
Dataset after fixing the rows and columns
Following are the steps to be taken while Fixing Rows and Columns:
Now if we observe the above dataset, the customerid
column has of no importance to our analysis, and also the jobedu
column has both the information of job
and education
in it.
So, what we’ll do is, we’ll drop the customerid
column and we’ll split the jobedu
column into two other columns job
and education
and after that, we’ll drop the jobedu
column as well.
# Drop the customer id as it is of no use.
data.drop('customerid', axis = 1, inplace = True)
#Extract job & Education in newly from "jobedu" column.
data['job']= data["jobedu"].apply(lambda x: x.split(",")[0])
data['education']= data["jobedu"].apply(lambda x: x.split(",")[1])
# Drop the "jobedu" column from the dataframe.
data.drop('jobedu', axis = 1, inplace = True)
# Printing the Dataset
data
Now, the dataset looks like this,
Dropping Customerid
and jobedu columns and adding job and education columns
Missing Values
If there are missing values in the Dataset before doing any statistical analysis, we need to handle those missing values.
There are mainly three types of missing values.
Let’s see which columns have missing values in the dataset.
# Checking the missing values
data.isnull().sum()
The output will be,
As we can see three columns contain missing values. Let’s see how to handle the missing values. We can handle missing values by dropping the missing records or by imputing the values.
Drop the missing Values
Let’s handle missing values in the age
column.
# Dropping the records with age missing in data dataframe.
data = data[~data.age.isnull()].copy()
# Checking the missing values in the dataset.
data.isnull().sum()
Let’s check the missing values in the dataset now.
Let’s impute values to the missing values for the month column.
Since the month column is of an object type, let’s calculate the mode of that column and impute those values to the missing values.
# Find the mode of month in data
month_mode = data.month.mode()[0]
# Fill the missing values with mode value of month in data.
data.month.fillna(month_mode, inplace = True)
# Let's see the null values in the month column.
data.month.isnull().sum()
Now output is,
# Mode of month is
'may, 2017'
# Null values in month column after imputing with mode
0
Handling the missing values in the Response column. Since, our target column is Response Column, if we impute the values to this column it’ll affect our analysis. So, it is better to drop the missing values from Response Column.
#drop the records with response missing in data.
data = data[~data.response.isnull()].copy()
# Calculate the missing values in each column of data frame
data.isnull().sum()
Let’s check whether the missing values in the dataset have been handled or not,
All the missing values have been handled
We can also, fill the missing values as ‘NaN’ so that while doing any statistical analysis, it won’t affect the outcome.
Handling Outliers
We have seen how to fix missing values, now let’s see how to handle outliers in the dataset.
Outliers are the values that are far beyond the next nearest data points.
There are two types of outliers:
So, after understanding the causes of these outliers, we can handle them by dropping those records or imputing with the values or leaving them as is, if it makes more sense.
Standardizing Values
To perform data analysis on a set of values, we have to make sure the values in the same column should be on the same scale. For example, if the data contains the values of the top speed of different companies’ cars, then the whole column should be either in meters/sec scale or miles/sec scale.
Now, that we are clear on how to source and clean the data, let’s see how we can analyze the data.
If we analyze data over a single variable/column from a dataset, it is known as Univariate Analysis.
Categorical Unordered Univariate Analysis:
An unordered variable is a categorical variable that has no defined order. If we take our data as an example, the job column in the dataset is divided into many sub-categories like technician, blue-collar, services, management, etc. There is no weight or measure given to any value in the ‘job’ column.
Now, let’s analyze the job category by using plots. Since Job is a category, we will plot the bar plot.
# Let's calculate the percentage of each job status category.
data.job.value_counts(normalize=True)
#plot the bar graph of percentage job categories
data.job.value_counts(normalize=True).plot.barh()
plt.show()
The output looks like this,
By the above bar plot, we can infer that the data set contains more number of blue-collar workers compared to other categories.
Categorical Ordered Univariate Analysis:
Ordered variables are those variables that have a natural rank of order. Some examples of categorical ordered variables from our dataset are:
Now, let’s analyze the Education Variable from the dataset. Since we’ve already seen a bar plot, let’s see how a Pie Chart looks like.
#calculate the percentage of each education category.
data.education.value_counts(normalize=True)
#plot the pie chart of education categories
data.education.value_counts(normalize=True).plot.pie()
plt.show()
The output will be,
By the above analysis, we can infer that the data set has a large number of them belongs to secondary education after that tertiary and next primary. Also, a very small percentage of them have been unknown.
This is how we analyze univariate categorical analysis. If the column or variable is of numerical then we’ll analyze by calculating its mean, median, std, etc. We can get those values by using the describe function.
data.salary.describe()
The output will be,
If we analyze data by taking two variables/columns into consideration from a dataset, it is known as Bivariate Analysis.
a) Numeric-Numeric Analysis:
Analyzing the two numeric variables from a dataset is known as numeric-numeric analysis. We can analyze it in three different ways.
Scatter Plot
Let’s take three columns ‘Balance’, ‘Age’ and ‘Salary’ from our dataset and see what we can infer by plotting to scatter plot between salary
balance
and age
balance
#plot the scatter plot of balance and salary variable in data
plt.scatter(data.salary,data.balance)
plt.show()
#plot the scatter plot of balance and age variable in data
data.plot.scatter(x="age",y="balance")
plt.show()
Now, the scatter plots looks like,
Pair Plot
Now, let’s plot Pair Plots for the three columns we used in plotting Scatter plots. We’ll use the seaborn library for plotting Pair Plots.
#plot the pair plot of salary, balance and age in data dataframe.
sns.pairplot(data = data, vars=['salary','balance','age'])
plt.show()
The Pair Plot looks like this,
Correlation Matrix
Since we cannot use more than two variables as x-axis and y-axis in Scatter and Pair Plots, it is difficult to see the relation between three numerical variables in a single graph. In those cases, we’ll use the correlation matrix.
# Creating a matrix using age, salry, balance as rows and columns
data[['age','salary','balance']].corr()
#plot the correlation matrix of salary, balance and age in data dataframe.
sns.heatmap(data[['age','salary','balance']].corr(), annot=True, cmap = 'Reds')
plt.show()
First, we created a matrix using age, salary, and balance. After that, we are plotting the heatmap using the seaborn library of the matrix.
b) Numeric - Categorical Analysis
Analyzing the one numeric variable and one categorical variable from a dataset is known as numeric-categorical analysis. We analyze them mainly using mean, median, and box plots.
Let’s take salary
and response
columns from our dataset.
First check for mean value using groupby
#groupby the response to find the mean of the salary with response no & yes separately.
data.groupby('response')['salary'].mean()
The output will be,
There is not much of a difference between the yes and no response based on the salary.
Let’s calculate the median,
#groupby the response to find the median of the salary with response no & yes separately.
data.groupby('response')['salary'].median()
The output will be,
By both mean and median we can say that the response of yes and no remains the same irrespective of the person’s salary. But, is it truly behaving like that, let’s plot the box plot for them and check the behavior.
#plot the box plot of salary for yes & no responses.
sns.boxplot(data.response, data.salary)
plt.show()
The box plot looks like this,
As we can see, when we plot the Box Plot, it paints a very different picture compared to mean and median. The IQR for customers who gave a positive response is on the higher salary side.
This is how we analyze Numeric-Categorical variables, we use mean, median, and Box Plots to draw some sort of conclusions.
c) Categorical — Categorical Analysis
Since our target variable/column is the Response rate, we’ll see how the different categories like Education, Marital Status, etc., are associated with the Response column. So instead of ‘Yes’ and ‘No’ we will convert them into ‘1’ and ‘0’, by doing that we’ll get the “Response Rate”.
#create response_rate of numerical data type where response "yes"= 1, "no"= 0
data['response_rate'] = np.where(data.response=='yes',1,0)
data.response_rate.value_counts()
The output looks like this,
Let’s see how the response rate varies for different categories in marital status.
#plot the bar graph of marital status with average value of response_rate
data.groupby('marital')['response_rate'].mean().plot.bar()
plt.show()
The graph looks like this,
By the above graph, we can infer that the positive response is more for Single status members in the data set. Similarly, we can plot the graphs for Loan vs Response rate, Housing Loans vs Response rate, etc.
If we analyze data by taking more than two variables/columns into consideration from a dataset, it is known as Multivariate Analysis.
Let’s see how ‘Education’, ‘Marital’, and ‘Response_rate’ vary with each other.
First, we’ll create a pivot table with the three columns and after that, we’ll create a heatmap.
result = pd.pivot_table(data=data, index='education', columns='marital',values='response_rate')
print(result)
#create heat map of education vs marital vs response_rate
sns.heatmap(result, annot=True, cmap = 'RdYlGn', center=0.117)
plt.show()
The Pivot table and heatmap looks like this,
Based on the Heatmap we can infer that the married people with primary education are less likely to respond positively for the survey and single people with tertiary education are most likely to respond positively to the survey.
Similarly, we can plot the graphs for Job vs marital vs response, Education vs poutcome vs response, etc.
Conclusion
This is how we’ll do Exploratory Data Analysis. Exploratory Data Analysis (EDA) helps us to look beyond the data. The more we explore the data, the more the insights we draw from it. As a data analyst, almost 80% of our time will be spent understanding data and solving various business problems through EDA.
Thank you for reading and Happy Coding!!!
#dataanalysis #python
1561523460
This Matplotlib cheat sheet introduces you to the basics that you need to plot your data with Python and includes code samples.
Data visualization and storytelling with your data are essential skills that every data scientist needs to communicate insights gained from analyses effectively to any audience out there.
For most beginners, the first package that they use to get in touch with data visualization and storytelling is, naturally, Matplotlib: it is a Python 2D plotting library that enables users to make publication-quality figures. But, what might be even more convincing is the fact that other packages, such as Pandas, intend to build more plotting integration with Matplotlib as time goes on.
However, what might slow down beginners is the fact that this package is pretty extensive. There is so much that you can do with it and it might be hard to still keep a structure when you're learning how to work with Matplotlib.
DataCamp has created a Matplotlib cheat sheet for those who might already know how to use the package to their advantage to make beautiful plots in Python, but that still want to keep a one-page reference handy. Of course, for those who don't know how to work with Matplotlib, this might be the extra push be convinced and to finally get started with data visualization in Python.
You'll see that this cheat sheet presents you with the six basic steps that you can go through to make beautiful plots.
Check out the infographic by clicking on the button below:
With this handy reference, you'll familiarize yourself in no time with the basics of Matplotlib: you'll learn how you can prepare your data, create a new plot, use some basic plotting routines to your advantage, add customizations to your plots, and save, show and close the plots that you make.
What might have looked difficult before will definitely be more clear once you start using this cheat sheet! Use it in combination with the Matplotlib Gallery, the documentation.
Matplotlib
Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
>>> import numpy as np
>>> x = np.linspace(0, 10, 100)
>>> y = np.cos(x)
>>> z = np.sin(x)
>>> data = 2 * np.random.random((10, 10))
>>> data2 = 3 * np.random.random((10, 10))
>>> Y, X = np.mgrid[-3:3:100j, -3:3:100j]
>>> U = 1 X** 2 + Y
>>> V = 1 + X Y**2
>>> from matplotlib.cbook import get_sample_data
>>> img = np.load(get_sample_data('axes_grid/bivariate_normal.npy'))
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure()
>>> fig2 = plt.figure(figsize=plt.figaspect(2.0))
>>> fig.add_axes()
>>> ax1 = fig.add_subplot(221) #row-col-num
>>> ax3 = fig.add_subplot(212)
>>> fig3, axes = plt.subplots(nrows=2,ncols=2)
>>> fig4, axes2 = plt.subplots(ncols=3)
>>> plt.savefig('foo.png') #Save figures
>>> plt.savefig('foo.png', transparent=True) #Save transparent figures
>>> plt.show()
>>> fig, ax = plt.subplots()
>>> lines = ax.plot(x,y) #Draw points with lines or markers connecting them
>>> ax.scatter(x,y) #Draw unconnected points, scaled or colored
>>> axes[0,0].bar([1,2,3],[3,4,5]) #Plot vertical rectangles (constant width)
>>> axes[1,0].barh([0.5,1,2.5],[0,1,2]) #Plot horiontal rectangles (constant height)
>>> axes[1,1].axhline(0.45) #Draw a horizontal line across axes
>>> axes[0,1].axvline(0.65) #Draw a vertical line across axes
>>> ax.fill(x,y,color='blue') #Draw filled polygons
>>> ax.fill_between(x,y,color='yellow') #Fill between y values and 0
>>> fig, ax = plt.subplots()
>>> im = ax.imshow(img, #Colormapped or RGB arrays
cmap= 'gist_earth',
interpolation= 'nearest',
vmin=-2,
vmax=2)
>>> axes2[0].pcolor(data2) #Pseudocolor plot of 2D array
>>> axes2[0].pcolormesh(data) #Pseudocolor plot of 2D array
>>> CS = plt.contour(Y,X,U) #Plot contours
>>> axes2[2].contourf(data1) #Plot filled contours
>>> axes2[2]= ax.clabel(CS) #Label a contour plot
>>> axes[0,1].arrow(0,0,0.5,0.5) #Add an arrow to the axes
>>> axes[1,1].quiver(y,z) #Plot a 2D field of arrows
>>> axes[0,1].streamplot(X,Y,U,V) #Plot a 2D field of arrows
>>> ax1.hist(y) #Plot a histogram
>>> ax3.boxplot(y) #Make a box and whisker plot
>>> ax3.violinplot(z) #Make a violin plot
y-axis
x-axis
The basic steps to creating plots with matplotlib are:
1 Prepare Data
2 Create Plot
3 Plot
4 Customized Plot
5 Save Plot
6 Show Plot
>>> import matplotlib.pyplot as plt
>>> x = [1,2,3,4] #Step 1
>>> y = [10,20,25,30]
>>> fig = plt.figure() #Step 2
>>> ax = fig.add_subplot(111) #Step 3
>>> ax.plot(x, y, color= 'lightblue', linewidth=3) #Step 3, 4
>>> ax.scatter([2,4,6],
[5,15,25],
color= 'darkgreen',
marker= '^' )
>>> ax.set_xlim(1, 6.5)
>>> plt.savefig('foo.png' ) #Step 5
>>> plt.show() #Step 6
>>> plt.cla() #Clear an axis
>>> plt.clf(). #Clear the entire figure
>>> plt.close(). #Close a window
>>> plt.plot(x, x, x, x**2, x, x** 3)
>>> ax.plot(x, y, alpha = 0.4)
>>> ax.plot(x, y, c= 'k')
>>> fig.colorbar(im, orientation= 'horizontal')
>>> im = ax.imshow(img,
cmap= 'seismic' )
>>> fig, ax = plt.subplots()
>>> ax.scatter(x,y,marker= ".")
>>> ax.plot(x,y,marker= "o")
>>> plt.plot(x,y,linewidth=4.0)
>>> plt.plot(x,y,ls= 'solid')
>>> plt.plot(x,y,ls= '--')
>>> plt.plot(x,y,'--' ,x**2,y**2,'-.' )
>>> plt.setp(lines,color= 'r',linewidth=4.0)
>>> ax.text(1,
-2.1,
'Example Graph',
style= 'italic' )
>>> ax.annotate("Sine",
xy=(8, 0),
xycoords= 'data',
xytext=(10.5, 0),
textcoords= 'data',
arrowprops=dict(arrowstyle= "->",
connectionstyle="arc3"),)
>>> plt.title(r '$sigma_i=15$', fontsize=20)
Limits & Autoscaling
>>> ax.margins(x=0.0,y=0.1) #Add padding to a plot
>>> ax.axis('equal') #Set the aspect ratio of the plot to 1
>>> ax.set(xlim=[0,10.5],ylim=[-1.5,1.5]) #Set limits for x-and y-axis
>>> ax.set_xlim(0,10.5) #Set limits for x-axis
Legends
>>> ax.set(title= 'An Example Axes', #Set a title and x-and y-axis labels
ylabel= 'Y-Axis',
xlabel= 'X-Axis')
>>> ax.legend(loc= 'best') #No overlapping plot elements
Ticks
>>> ax.xaxis.set(ticks=range(1,5), #Manually set x-ticks
ticklabels=[3,100, 12,"foo" ])
>>> ax.tick_params(axis= 'y', #Make y-ticks longer and go in and out
direction= 'inout',
length=10)
Subplot Spacing
>>> fig3.subplots_adjust(wspace=0.5, #Adjust the spacing between subplots
hspace=0.3,
left=0.125,
right=0.9,
top=0.9,
bottom=0.1)
>>> fig.tight_layout() #Fit subplot(s) in to the figure area
Axis Spines
>>> ax1.spines[ 'top'].set_visible(False) #Make the top axis line for a plot invisible
>>> ax1.spines['bottom' ].set_position(( 'outward',10)) #Move the bottom axis line outward
Have this Cheat Sheet at your fingertips
Original article source at https://www.datacamp.com
#matplotlib #cheatsheet #python
1647064260
Run C# scripts from the .NET CLI, define NuGet packages inline and edit/debug them in VS Code - all of that with full language services support from OmniSharp.
Name | Version | Framework(s) |
---|---|---|
dotnet-script (global tool) | net6.0 , net5.0 , netcoreapp3.1 | |
Dotnet.Script (CLI as Nuget) | net6.0 , net5.0 , netcoreapp3.1 | |
Dotnet.Script.Core | netcoreapp3.1 , netstandard2.0 | |
Dotnet.Script.DependencyModel | netstandard2.0 | |
Dotnet.Script.DependencyModel.Nuget | netstandard2.0 |
The only thing we need to install is .NET Core 3.1 or .NET 5.0 SDK.
.NET Core 2.1 introduced the concept of global tools meaning that you can install dotnet-script
using nothing but the .NET CLI.
dotnet tool install -g dotnet-script
You can invoke the tool using the following command: dotnet-script
Tool 'dotnet-script' (version '0.22.0') was successfully installed.
The advantage of this approach is that you can use the same command for installation across all platforms. .NET Core SDK also supports viewing a list of installed tools and their uninstallation.
dotnet tool list -g
Package Id Version Commands
---------------------------------------------
dotnet-script 0.22.0 dotnet-script
dotnet tool uninstall dotnet-script -g
Tool 'dotnet-script' (version '0.22.0') was successfully uninstalled.
choco install dotnet.script
We also provide a PowerShell script for installation.
(new-object Net.WebClient).DownloadString("https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.ps1") | iex
curl -s https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.sh | bash
If permission is denied we can try with sudo
curl -s https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.sh | sudo bash
A Dockerfile for running dotnet-script in a Linux container is available. Build:
cd build
docker build -t dotnet-script -f Dockerfile ..
And run:
docker run -it dotnet-script --version
You can manually download all the releases in zip
format from the GitHub releases page.
Our typical helloworld.csx
might look like this:
Console.WriteLine("Hello world!");
That is all it takes and we can execute the script. Args are accessible via the global Args array.
dotnet script helloworld.csx
Simply create a folder somewhere on your system and issue the following command.
dotnet script init
This will create main.csx
along with the launch configuration needed to debug the script in VS Code.
.
├── .vscode
│ └── launch.json
├── main.csx
└── omnisharp.json
We can also initialize a folder using a custom filename.
dotnet script init custom.csx
Instead of main.csx
which is the default, we now have a file named custom.csx
.
.
├── .vscode
│ └── launch.json
├── custom.csx
└── omnisharp.json
Note: Executing
dotnet script init
inside a folder that already contains one or more script files will not create themain.csx
file.
Scripts can be executed directly from the shell as if they were executables.
foo.csx arg1 arg2 arg3
OSX/Linux
Just like all scripts, on OSX/Linux you need to have a
#!
and mark the file as executable via chmod +x foo.csx. If you use dotnet script init to create your csx it will automatically have the#!
directive and be marked as executable.
The OSX/Linux shebang directive should be #!/usr/bin/env dotnet-script
#!/usr/bin/env dotnet-script
Console.WriteLine("Hello world");
You can execute your script using dotnet script or dotnet-script, which allows you to pass arguments to control your script execution more.
foo.csx arg1 arg2 arg3
dotnet script foo.csx -- arg1 arg2 arg3
dotnet-script foo.csx -- arg1 arg2 arg3
All arguments after --
are passed to the script in the following way:
dotnet script foo.csx -- arg1 arg2 arg3
Then you can access the arguments in the script context using the global Args
collection:
foreach (var arg in Args)
{
Console.WriteLine(arg);
}
All arguments before --
are processed by dotnet script
. For example, the following command-line
dotnet script -d foo.csx -- -d
will pass the -d
before --
to dotnet script
and enable the debug mode whereas the -d
after --
is passed to script for its own interpretation of the argument.
dotnet script
has built-in support for referencing NuGet packages directly from within the script.
#r "nuget: AutoMapper, 6.1.0"
Note: Omnisharp needs to be restarted after adding a new package reference
We can define package sources using a NuGet.Config
file in the script root folder. In addition to being used during execution of the script, it will also be used by OmniSharp
that provides language services for packages resolved from these package sources.
As an alternative to maintaining a local NuGet.Config
file we can define these package sources globally either at the user level or at the computer level as described in Configuring NuGet Behaviour
It is also possible to specify packages sources when executing the script.
dotnet script foo.csx -s https://SomePackageSource
Multiple packages sources can be specified like this:
dotnet script foo.csx -s https://SomePackageSource -s https://AnotherPackageSource
Dotnet-Script can create a standalone executable or DLL for your script.
Switch | Long switch | description |
---|---|---|
-o | --output | Directory where the published executable should be placed. Defaults to a 'publish' folder in the current directory. |
-n | --name | The name for the generated DLL (executable not supported at this time). Defaults to the name of the script. |
--dll | Publish to a .dll instead of an executable. | |
-c | --configuration | Configuration to use for publishing the script [Release/Debug]. Default is "Debug" |
-d | --debug | Enables debug output. |
-r | --runtime | The runtime used when publishing the self contained executable. Defaults to your current runtime. |
The executable you can run directly independent of dotnet install, while the DLL can be run using the dotnet CLI like this:
dotnet script exec {path_to_dll} -- arg1 arg2
We provide two types of caching, the dependency cache
and the execution cache
which is explained in detail below. In order for any of these caches to be enabled, it is required that all NuGet package references are specified using an exact version number. The reason for this constraint is that we need to make sure that we don't execute a script with a stale dependency graph.
In order to resolve the dependencies for a script, a dotnet restore
is executed under the hood to produce a project.assets.json
file from which we can figure out all the dependencies we need to add to the compilation. This is an out-of-process operation and represents a significant overhead to the script execution. So this cache works by looking at all the dependencies specified in the script(s) either in the form of NuGet package references or assembly file references. If these dependencies matches the dependencies from the last script execution, we skip the restore and read the dependencies from the already generated project.assets.json
file. If any of the dependencies has changed, we must restore again to obtain the new dependency graph.
In order to execute a script it needs to be compiled first and since that is a CPU and time consuming operation, we make sure that we only compile when the source code has changed. This works by creating a SHA256 hash from all the script files involved in the execution. This hash is written to a temporary location along with the DLL that represents the result of the script compilation. When a script is executed the hash is computed and compared with the hash from the previous compilation. If they match there is no need to recompile and we run from the already compiled DLL. If the hashes don't match, the cache is invalidated and we recompile.
You can override this automatic caching by passing --no-cache flag, which will bypass both caches and cause dependency resolution and script compilation to happen every time we execute the script.
The temporary location used for caches is a sub-directory named dotnet-script
under (in order of priority):
DOTNET_SCRIPT_CACHE_LOCATION
, if defined and value is not empty.$XDG_CACHE_HOME
if defined otherwise $HOME/.cache
~/Library/Caches
Path.GetTempPath
for the platform.The days of debugging scripts using Console.WriteLine
are over. One major feature of dotnet script
is the ability to debug scripts directly in VS Code. Just set a breakpoint anywhere in your script file(s) and hit F5(start debugging)
Script packages are a way of organizing reusable scripts into NuGet packages that can be consumed by other scripts. This means that we now can leverage scripting infrastructure without the need for any kind of bootstrapping.
A script package is just a regular NuGet package that contains script files inside the content
or contentFiles
folder.
The following example shows how the scripts are laid out inside the NuGet package according to the standard convention .
└── contentFiles
└── csx
└── netstandard2.0
└── main.csx
This example contains just the main.csx
file in the root folder, but packages may have multiple script files either in the root folder or in subfolders below the root folder.
When loading a script package we will look for an entry point script to be loaded. This entry point script is identified by one of the following.
main.csx
in the root folderIf the entry point script cannot be determined, we will simply load all the scripts files in the package.
The advantage with using an entry point script is that we can control loading other scripts from the package.
To consume a script package all we need to do specify the NuGet package in the #load
directive.
The following example loads the simple-targets package that contains script files to be included in our script.
#load "nuget:simple-targets-csx, 6.0.0"
using static SimpleTargets;
var targets = new TargetDictionary();
targets.Add("default", () => Console.WriteLine("Hello, world!"));
Run(Args, targets);
Note: Debugging also works for script packages so that we can easily step into the scripts that are brought in using the
#load
directive.
Scripts don't actually have to exist locally on the machine. We can also execute scripts that are made available on an http(s)
endpoint.
This means that we can create a Gist on Github and execute it just by providing the URL to the Gist.
This Gist contains a script that prints out "Hello World"
We can execute the script like this
dotnet script https://gist.githubusercontent.com/seesharper/5d6859509ea8364a1fdf66bbf5b7923d/raw/0a32bac2c3ea807f9379a38e251d93e39c8131cb/HelloWorld.csx
That is a pretty long URL, so why don't make it a TinyURL like this:
dotnet script https://tinyurl.com/y8cda9zt
A pretty common scenario is that we have logic that is relative to the script path. We don't want to require the user to be in a certain directory for these paths to resolve correctly so here is how to provide the script path and the script folder regardless of the current working directory.
public static string GetScriptPath([CallerFilePath] string path = null) => path;
public static string GetScriptFolder([CallerFilePath] string path = null) => Path.GetDirectoryName(path);
Tip: Put these methods as top level methods in a separate script file and
#load
that file wherever access to the script path and/or folder is needed.
This release contains a C# REPL (Read-Evaluate-Print-Loop). The REPL mode ("interactive mode") is started by executing dotnet-script
without any arguments.
The interactive mode allows you to supply individual C# code blocks and have them executed as soon as you press Enter. The REPL is configured with the same default set of assembly references and using statements as regular CSX script execution.
Once dotnet-script
starts you will see a prompt for input. You can start typing C# code there.
~$ dotnet script
> var x = 1;
> x+x
2
If you submit an unterminated expression into the REPL (no ;
at the end), it will be evaluated and the result will be serialized using a formatter and printed in the output. This is a bit more interesting than just calling ToString()
on the object, because it attempts to capture the actual structure of the object. For example:
~$ dotnet script
> var x = new List<string>();
> x.Add("foo");
> x
List<string>(1) { "foo" }
> x.Add("bar");
> x
List<string>(2) { "foo", "bar" }
>
REPL also supports inline Nuget packages - meaning the Nuget packages can be installed into the REPL from within the REPL. This is done via our #r
and #load
from Nuget support and uses identical syntax.
~$ dotnet script
> #r "nuget: Automapper, 6.1.1"
> using AutoMapper;
> typeof(MapperConfiguration)
[AutoMapper.MapperConfiguration]
> #load "nuget: simple-targets-csx, 6.0.0";
> using static SimpleTargets;
> typeof(TargetDictionary)
[Submission#0+SimpleTargets+TargetDictionary]
Using Roslyn syntax parsing, we also support multiline REPL mode. This means that if you have an uncompleted code block and press Enter, we will automatically enter the multiline mode. The mode is indicated by the *
character. This is particularly useful for declaring classes and other more complex constructs.
~$ dotnet script
> class Foo {
* public string Bar {get; set;}
* }
> var foo = new Foo();
Aside from the regular C# script code, you can invoke the following commands (directives) from within the REPL:
Command | Description |
---|---|
#load | Load a script into the REPL (same as #load usage in CSX) |
#r | Load an assembly into the REPL (same as #r usage in CSX) |
#reset | Reset the REPL back to initial state (without restarting it) |
#cls | Clear the console screen without resetting the REPL state |
#exit | Exits the REPL |
You can execute a CSX script and, at the end of it, drop yourself into the context of the REPL. This way, the REPL becomes "seeded" with your code - all the classes, methods or variables are available in the REPL context. This is achieved by running a script with an -i
flag.
For example, given the following CSX script:
var msg = "Hello World";
Console.WriteLine(msg);
When you run this with the -i
flag, Hello World
is printed, REPL starts and msg
variable is available in the REPL context.
~$ dotnet script foo.csx -i
Hello World
>
You can also seed the REPL from inside the REPL - at any point - by invoking a #load
directive pointed at a specific file. For example:
~$ dotnet script
> #load "foo.csx"
Hello World
>
The following example shows how we can pipe data in and out of a script.
The UpperCase.csx
script simply converts the standard input to upper case and writes it back out to standard output.
using (var streamReader = new StreamReader(Console.OpenStandardInput()))
{
Write(streamReader.ReadToEnd().ToUpper());
}
We can now simply pipe the output from one command into our script like this.
echo "This is some text" | dotnet script UpperCase.csx
THIS IS SOME TEXT
The first thing we need to do add the following to the launch.config
file that allows VS Code to debug a running process.
{
"name": ".NET Core Attach",
"type": "coreclr",
"request": "attach",
"processId": "${command:pickProcess}"
}
To debug this script we need a way to attach the debugger in VS Code and the simplest thing we can do here is to wait for the debugger to attach by adding this method somewhere.
public static void WaitForDebugger()
{
Console.WriteLine("Attach Debugger (VS Code)");
while(!Debugger.IsAttached)
{
}
}
To debug the script when executing it from the command line we can do something like
WaitForDebugger();
using (var streamReader = new StreamReader(Console.OpenStandardInput()))
{
Write(streamReader.ReadToEnd().ToUpper()); // <- SET BREAKPOINT HERE
}
Now when we run the script from the command line we will get
$ echo "This is some text" | dotnet script UpperCase.csx
Attach Debugger (VS Code)
This now gives us a chance to attach the debugger before stepping into the script and from VS Code, select the .NET Core Attach
debugger and pick the process that represents the executing script.
Once that is done we should see our breakpoint being hit.
By default, scripts will be compiled using the debug
configuration. This is to ensure that we can debug a script in VS Code as well as attaching a debugger for long running scripts.
There are however situations where we might need to execute a script that is compiled with the release
configuration. For instance, running benchmarks using BenchmarkDotNet is not possible unless the script is compiled with the release
configuration.
We can specify this when executing the script.
dotnet script foo.csx -c release
Starting from version 0.50.0, dotnet-script
supports .Net Core 3.0 and all the C# 8 features. The way we deal with nullable references types in dotnet-script
is that we turn every warning related to nullable reference types into compiler errors. This means every warning between CS8600
and CS8655
are treated as an error when compiling the script.
Nullable references types are turned off by default and the way we enable it is using the #nullable enable
compiler directive. This means that existing scripts will continue to work, but we can now opt-in on this new feature.
#!/usr/bin/env dotnet-script
#nullable enable
string name = null;
Trying to execute the script will result in the following error
main.csx(5,15): error CS8625: Cannot convert null literal to non-nullable reference type.
We will also see this when working with scripts in VS Code under the problems panel.
Download Details:
Author: filipw
Source Code: https://github.com/filipw/dotnet-script
License: MIT License