Jamison Fisher

Optimus: Python Library to Easily Load, Process, Plot & Create ML

Overview

Optimus is an opinionated Python library to easily load, process, plot, and create ML models that run over pandas, Dask, cuDF, Dask-cuDF, Vaex or Spark.

Some amazing things Optimus can do for you:

  • Process using a simple API, making it easy to use for newcomers.
  • More than 100 functions to handle strings, process dates, URLs and emails.
  • Easily plot data of any size.
  • Out-of-the-box functions to explore and fix data quality.
  • Use the same code to process your data in your laptop or in a remote cluster of GPUs.

See Documentation

Try Optimus

To launch a live notebook server to test Optimus using Binder or Colab, click on one of the following badges:

Installation (pip):

In your terminal just type:

pip install pyoptimus

By default, Optimus installs pandas as its engine. To install other engines, you can use the following commands:

Engine      Command
Dask        pip install pyoptimus[dask]
cuDF        pip install pyoptimus[cudf]
Dask-cuDF   pip install pyoptimus[dask-cudf]
Vaex        pip install pyoptimus[vaex]
Spark       pip install pyoptimus[spark]

To install from the repo:

pip install git+https://github.com/hi-primus/optimus.git@develop-22.2

To install other engines:

pip install git+https://github.com/hi-primus/optimus.git@develop-22.2#egg=pyoptimus[dask]

Requirements

  • Python 3.7 or 3.8

Examples

You can go to 10 minutes to Optimus, where you can find the basics to start working in a notebook.

You can also go to the Examples section and find specific notebooks about data cleaning, data munging, profiling, data enrichment, and how to create ML and DL models.

Here's a handy Cheat Sheet with the most common Optimus operations.

Start Optimus

Start Optimus using "pandas", "dask", "cudf", "dask_cudf", "vaex" or "spark".

from optimus import Optimus
op = Optimus("pandas")

Loading data

Optimus can load data in CSV, JSON, Parquet, Avro, and Excel formats from a local file or from a URL.

#csv
df = op.load.csv("../examples/data/foo.csv")

#json
df = op.load.json("../examples/data/foo.json")

# using a url
df = op.load.json("https://raw.githubusercontent.com/hi-primus/optimus/develop-22.2/examples/data/foo.json")

# parquet
df = op.load.parquet("../examples/data/foo.parquet")

# ...or anything else
df = op.load.file("../examples/data/titanic3.xls")

Also, you can load data from Oracle, Redshift, MySQL and Postgres databases.

Saving Data

#csv
df.save.csv("data/foo.csv")

# json
df.save.json("data/foo.json")

# parquet
df.save.parquet("data/foo.parquet")

You can also save data to Oracle, Redshift, MySQL, and Postgres.

Create dataframes

You can also create a dataframe from scratch:

df = op.create.dataframe({
    'A': ['a', 'b', 'c', 'd'],
    'B': [1, 3, 5, 7],
    'C': [2, 4, 6, None],
    'D': ['1980/04/10', '1980/04/10', '1980/04/10', '1980/04/10']
})

Using display, you get a richer way to show your data, with extra information like the column number, the column data type, and marked white spaces.

display(df)

Cleaning and Processing

Optimus was created to make data cleaning a breeze. The API was designed to be easy for newcomers and familiar to people coming from pandas. Optimus expands the standard DataFrame functionality by adding .rows and .cols accessors.

For example, you can load data from a URL, transform it, and apply some predefined cleaning functions:

new_df = df\
    .rows.sort("rank", "desc")\
    .cols.lower(["names", "function"])\
    .cols.date_format("date arrival", "yyyy/MM/dd", "dd-MM-YYYY")\
    .cols.years_between("date arrival", "dd-MM-YYYY", output_cols="from arrival")\
    .cols.normalize_chars("names")\
    .cols.remove_special_chars("names")\
    .rows.drop(df["rank"]>8)\
    .cols.rename("*", str.lower)\
    .cols.trim("*")\
    .cols.unnest("japanese name", output_cols="other names")\
    .cols.unnest("last position seen", separator=",", output_cols="pos")\
    .cols.drop(["last position seen", "japanese name", "date arrival", "cybertronian", "nulltype"])

Need help? 🛠️

Feedback

Feedback is what drives Optimus' future, so please take a couple of minutes to help shape the Optimus Roadmap: http://bit.ly/optimus_survey

If you have a suggestion or feature request, use https://github.com/hi-primus/optimus/issues

Troubleshooting

If you have issues, see our Troubleshooting Guide

Contributing to Optimus 💡

Contributions go far beyond pull requests and commits. We are very happy to receive any kind of contribution, including:

  • Documentation updates, enhancements, designs, or bugfixes.
  • Spelling or grammar fixes.
  • README.md corrections or redesigns.
  • Adding unit or functional tests.
  • Triaging GitHub issues -- especially determining whether an issue still persists or is reproducible.
  • Blogging, speaking about, or creating tutorials about Optimus and its many features.
  • Helping others on our official chats

Download Details:
Author: hi-primus
Source Code: https://github.com/hi-primus/optimus
License: Apache-2.0 License

#pandas  #python #machine-learning #data-science #dask #pyspark 

Anil Sakhiya

Exploratory Data Analysis(EDA) with Python

Exploratory Data Analysis Tutorial | Basics of EDA with Python

Exploratory data analysis is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions. EDA is primarily used to see what data can reveal beyond the formal modeling or hypothesis testing task and provides a better understanding of data set variables and the relationships between them. It can also help determine if the statistical techniques you are considering for data analysis are appropriate or not.

🔹 Topics Covered:
00:00:00 Basics of EDA with Python
01:40:10 Multiple Variate Analysis
02:30:26 Outlier Detection
03:44:48 Cricket World Cup Analysis using Exploratory Data Analysis


Learning the basics of Exploratory Data Analysis using Python with Numpy, Matplotlib, and Pandas.

What is Exploratory Data Analysis(EDA)?

If we want to explain EDA in simple terms, it means trying to understand the given data much better, so that we can make some sense out of it.

We can find a more formal definition in Wikipedia.

In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

EDA in Python uses data visualization to draw meaningful patterns and insights. It also involves the preparation of data sets for analysis by removing irregularities in the data.

Based on the results of EDA, companies also make business decisions, which can have repercussions later.

  • If EDA is not done properly then it can hamper the further steps in the machine learning model building process.
  • If done well, it may improve the efficacy of everything we do next.

In this article, we'll cover the following topics:

  1. Data Sourcing
  2. Data Cleaning
  3. Univariate analysis
  4. Bivariate analysis
  5. Multivariate analysis

1. Data Sourcing

Data Sourcing is the process of finding and loading the data into our system. Broadly there are two ways in which we can find data.

  1. Private Data
  2. Public Data

Private Data

As the name suggests, private data is given by private organizations. There are some security and privacy concerns attached to it. This type of data is mainly used for an organization's internal analysis.

Public Data

This type of data is available to everyone. We can find it on government websites, public organizations' portals, etc. Anyone can access this data; we do not need any special permissions or approval.

We can get public data on the following sites.

The very first step of EDA is Data Sourcing, and we have seen how we can access data and load it into our system. The next step is cleaning the data.

2. Data Cleaning

After completing the Data Sourcing, the next step in the process of EDA is Data Cleaning. It is very important to get rid of the irregularities and clean the data after sourcing it into our system.

Irregularities in the data come in different forms:

  • Missing Values
  • Incorrect Format
  • Incorrect Headers
  • Anomalies/Outliers

To perform the data cleaning we are using a sample data set, which can be found here.

We are using Jupyter Notebook for analysis.

First, let’s import the necessary libraries and store the data in our system for analysis.

#import the useful libraries.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Read the data set of "Marketing Analysis" in data.
data= pd.read_csv("marketing_analysis.csv")

# Printing the data
data

Now, the data set looks like this,

If we observe the above dataset, there are some discrepancies in the Column header for the first 2 rows. The correct data is from the index number 1. So, we have to fix the first two rows.

This is called Fixing the Rows and Columns. Let’s ignore the first two rows and load the data again.

#import the useful libraries.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Read the file in data without first two rows as it is of no use.
data = pd.read_csv("marketing_analysis.csv",skiprows = 2)

#print the head of the data frame.
data.head()

Now, the dataset looks like this, and it makes more sense.

Dataset after fixing the rows and columns

Following are the steps to be taken while Fixing Rows and Columns:

  1. Delete Summary Rows and Columns in the Dataset.
  2. Delete Header and Footer Rows on every page.
  3. Delete Extra Rows like blank rows, page numbers, etc.
  4. We can merge different columns if it makes for better understanding of the data
  5. Similarly, we can also split one column into multiple columns based on our requirements or understanding.
  6. Add Column names, it is very important to have column names to the dataset.

Now if we observe the above dataset, the customerid column is of no importance to our analysis, and the jobedu column contains both the job and the education information.

So we'll drop the customerid column, split the jobedu column into two new columns, job and education, and after that drop the jobedu column as well.

# Drop the customer id as it is of no use.
data.drop('customerid', axis = 1, inplace = True)

#Extract job & education into new columns from the "jobedu" column.
data['job']= data["jobedu"].apply(lambda x: x.split(",")[0])
data['education']= data["jobedu"].apply(lambda x: x.split(",")[1])

# Drop the "jobedu" column from the dataframe.
data.drop('jobedu', axis = 1, inplace = True)

# Printing the Dataset
data

Now, the dataset looks like this,

Dropping Customerid and jobedu columns and adding job and education columns

Missing Values

If there are missing values in the Dataset before doing any statistical analysis, we need to handle those missing values.

There are mainly three types of missing values.

  1. MCAR(Missing completely at random): These values do not depend on any other features.
  2. MAR(Missing at random): These values may be dependent on some other features.
  3. MNAR(Missing not at random): These missing values have some reason for why they are missing.

Let’s see which columns have missing values in the dataset.

# Checking the missing values
data.isnull().sum()

The output will be,

As we can see three columns contain missing values. Let’s see how to handle the missing values. We can handle missing values by dropping the missing records or by imputing the values.

Drop the missing Values

Let’s handle missing values in the age column.

# Dropping the records with age missing in data dataframe.
data = data[~data.age.isnull()].copy()

# Checking the missing values in the dataset.
data.isnull().sum()

Let’s check the missing values in the dataset now.

Let’s impute values to the missing values for the month column.

Since the month column is of an object type, let’s calculate the mode of that column and impute those values to the missing values.

# Find the mode of month in data
month_mode = data.month.mode()[0]

# Fill the missing values with mode value of month in data.
data.month.fillna(month_mode, inplace = True)

# Let's see the null values in the month column.
data.month.isnull().sum()

Now output is,

# Mode of month is
'may, 2017'
# Null values in month column after imputing with mode
0

Now let's handle the missing values in the response column. Since response is our target column, imputing values into it would affect our analysis, so it is better to drop the records with missing values in the response column.

#drop the records with response missing in data.
data = data[~data.response.isnull()].copy()
# Calculate the missing values in each column of data frame
data.isnull().sum()

Let’s check whether the missing values in the dataset have been handled or not,

All the missing values have been handled

We can also fill the missing values with NaN so that they won't affect the outcome of any statistical analysis.

Handling Outliers

We have seen how to fix missing values, now let’s see how to handle outliers in the dataset.

Outliers are the values that are far beyond the next nearest data points.

There are two types of outliers:

  1. Univariate outliers: Univariate outliers are the data points whose values lie beyond the range of expected values based on one variable.
  2. Multivariate outliers: While plotting data, some values of one variable may not lie beyond the expected range, but when you plot the data with some other variable, these values may lie far from the expected value.

So, after understanding the causes of these outliers, we can handle them by dropping those records, imputing values, or leaving them as is, if that makes more sense.
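
As an illustration, here is a minimal sketch of flagging univariate outliers in the salary column with the common 1.5 * IQR rule (assuming the data DataFrame loaded above); this is just one heuristic, not the only way to handle them.

# Compute the interquartile range (IQR) of salary.
q1 = data.salary.quantile(0.25)
q3 = data.salary.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as potential outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data.salary < lower) | (data.salary > upper)]
print(outliers.shape[0], "potential outliers in salary")

# One option is to drop them; capping (winsorizing) them is another.
data_without_outliers = data[(data.salary >= lower) & (data.salary <= upper)]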

Standardizing Values

To perform data analysis on a set of values, we have to make sure the values in the same column are on the same scale. For example, if the data contains the top speeds of different companies' cars, then the whole column should be either on the meters/sec scale or on the miles/sec scale.
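
For example, here is a small sketch of bringing such a column onto one consistent scale (the DataFrame and column names are hypothetical, purely for illustration):

import pandas as pd

# Hypothetical data: car top speeds recorded in mixed units.
cars = pd.DataFrame({
    'model': ['A', 'B', 'C'],
    'top_speed': [0.05, 80.0, 0.06],
    'units': ['miles/sec', 'meters/sec', 'miles/sec']
})

# Convert every row to meters/sec so the whole column is on the same scale.
mask = cars['units'] == 'miles/sec'
cars.loc[mask, 'top_speed'] = cars.loc[mask, 'top_speed'] * 1609.34  # 1 mile = 1609.34 meters
cars.loc[mask, 'units'] = 'meters/sec'
print(cars)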

Now that we are clear on how to source and clean the data, let's see how we can analyze it.

3. Univariate Analysis

If we analyze data over a single variable/column from a dataset, it is known as Univariate Analysis.

Categorical Unordered Univariate Analysis:

An unordered variable is a categorical variable that has no defined order. If we take our data as an example, the job column in the dataset is divided into many sub-categories like technician, blue-collar, services, management, etc. There is no weight or measure given to any value in the ‘job’ column.

Now, let’s analyze the job category by using plots. Since Job is a category, we will plot the bar plot.

# Let's calculate the percentage of each job status category.
data.job.value_counts(normalize=True)

#plot the bar graph of percentage job categories
data.job.value_counts(normalize=True).plot.barh()
plt.show()

The output looks like this,

From the above bar plot, we can infer that the data set contains a greater number of blue-collar workers compared to the other categories.

Categorical Ordered Univariate Analysis:

Ordered variables are those variables that have a natural rank of order. Some examples of categorical ordered variables from our dataset are:

  • Month: Jan, Feb, March……
  • Education: Primary, Secondary,……

Now, let's analyze the Education variable from the dataset. Since we've already seen a bar plot, let's see what a pie chart looks like.

#calculate the percentage of each education category.
data.education.value_counts(normalize=True)

#plot the pie chart of education categories
data.education.value_counts(normalize=True).plot.pie()
plt.show()

The output will be,

From the above analysis, we can infer that most records in the data set belong to secondary education, followed by tertiary and then primary. Also, for a very small percentage the education is unknown.

This is how we perform univariate categorical analysis. If the column or variable is numerical, we analyze it by calculating its mean, median, standard deviation, etc. We can get those values by using the describe function.

data.salary.describe()

The output will be,

4. Bivariate Analysis

If we analyze data by taking two variables/columns into consideration from a dataset, it is known as Bivariate Analysis.

a) Numeric-Numeric Analysis:

Analyzing the two numeric variables from a dataset is known as numeric-numeric analysis. We can analyze it in three different ways.

  • Scatter Plot
  • Pair Plot
  • Correlation Matrix

Scatter Plot

Let's take three columns, 'Balance', 'Age' and 'Salary', from our dataset and see what we can infer by plotting scatter plots of salary vs. balance and age vs. balance.

#plot the scatter plot of balance and salary variable in data
plt.scatter(data.salary,data.balance)
plt.show()

#plot the scatter plot of balance and age variable in data
data.plot.scatter(x="age",y="balance")
plt.show()

Now, the scatter plots look like this,

Pair Plot

Now, let’s plot Pair Plots for the three columns we used in plotting Scatter plots. We’ll use the seaborn library for plotting Pair Plots.

#plot the pair plot of salary, balance and age in data dataframe.
sns.pairplot(data = data, vars=['salary','balance','age'])
plt.show()

The Pair Plot looks like this,

Correlation Matrix

Since we cannot use more than two variables as x-axis and y-axis in Scatter and Pair Plots, it is difficult to see the relation between three numerical variables in a single graph. In those cases, we’ll use the correlation matrix.

# Creating a matrix using age, salary, balance as rows and columns
data[['age','salary','balance']].corr()

#plot the correlation matrix of salary, balance and age in data dataframe.
sns.heatmap(data[['age','salary','balance']].corr(), annot=True, cmap = 'Reds')
plt.show()

First, we created a matrix using age, salary, and balance. After that, we are plotting the heatmap using the seaborn library of the matrix.

b) Numeric - Categorical Analysis

Analyzing the one numeric variable and one categorical variable from a dataset is known as numeric-categorical analysis. We analyze them mainly using mean, median, and box plots.

Let’s take salary and response columns from our dataset.

First, check the mean value using groupby:

#groupby the response to find the mean of the salary with response no & yes separately.
data.groupby('response')['salary'].mean()

The output will be,

There is not much of a difference between the yes and no response based on the salary.

Let’s calculate the median,

#groupby the response to find the median of the salary with response no & yes separately.
data.groupby('response')['salary'].median()

The output will be,

From both the mean and the median, we can say that the yes and no responses look the same irrespective of the person's salary. But is it truly behaving like that? Let's plot the box plot and check.

#plot the box plot of salary for yes & no responses.
sns.boxplot(data.response, data.salary)
plt.show()

The box plot looks like this,

As we can see, when we plot the Box Plot, it paints a very different picture compared to mean and median. The IQR for customers who gave a positive response is on the higher salary side.

This is how we analyze Numeric-Categorical variables, we use mean, median, and Box Plots to draw some sort of conclusions.

c) Categorical — Categorical Analysis

Since our target variable/column is the response, we'll see how different categories like education, marital status, etc., are associated with the response column. So instead of 'yes' and 'no' we will convert them into '1' and '0'; by doing that we'll get the "response rate".

#create response_rate of numerical data type where response "yes"= 1, "no"= 0
data['response_rate'] = np.where(data.response=='yes',1,0)
data.response_rate.value_counts()

The output looks like this,

Let’s see how the response rate varies for different categories in marital status.

#plot the bar graph of marital status with average value of response_rate
data.groupby('marital')['response_rate'].mean().plot.bar()
plt.show()

The graph looks like this,

From the above graph, we can infer that the positive response rate is higher for single members of the data set. Similarly, we can plot the graphs for loan vs response rate, housing loan vs response rate, etc.

5. Multivariate Analysis

If we analyze data by taking more than two variables/columns into consideration from a dataset, it is known as Multivariate Analysis.

Let’s see how ‘Education’, ‘Marital’, and ‘Response_rate’ vary with each other.

First, we’ll create a pivot table with the three columns and after that, we’ll create a heatmap.

result = pd.pivot_table(data=data, index='education', columns='marital',values='response_rate')
print(result)

#create heat map of education vs marital vs response_rate
sns.heatmap(result, annot=True, cmap = 'RdYlGn', center=0.117)
plt.show()

The Pivot table and heatmap looks like this,

Based on the heatmap, we can infer that married people with primary education are less likely to respond positively to the survey, and single people with tertiary education are most likely to respond positively.

Similarly, we can plot the graphs for Job vs marital vs response, Education vs poutcome vs response, etc.

Conclusion

This is how we do Exploratory Data Analysis. Exploratory Data Analysis (EDA) helps us to look beyond the data. The more we explore the data, the more insights we draw from it. As a data analyst, almost 80% of our time will be spent understanding data and solving various business problems through EDA.

Thank you for reading and Happy Coding!!!

#dataanalysis #python

Easter Deckow

PyTumblr: A Python Tumblr API v2 Client

PyTumblr

Installation

Install via pip:

$ pip install pytumblr

Install from source:

$ git clone https://github.com/tumblr/pytumblr.git
$ cd pytumblr
$ python setup.py install

Usage

Create a client

A pytumblr.TumblrRestClient is the object you'll make all of your calls to the Tumblr API through. Creating one is this easy:

client = pytumblr.TumblrRestClient(
    '<consumer_key>',
    '<consumer_secret>',
    '<oauth_token>',
    '<oauth_secret>',
)

client.info() # Grabs the current user information

A few easy ways to get your credentials are:

  1. The built-in interactive_console.py tool (if you already have a consumer key & secret)
  2. The Tumblr API console at https://api.tumblr.com/console
  3. Get sample login code at https://api.tumblr.com/console/calls/user/info

Supported Methods

User Methods

client.info() # get information about the authenticating user
client.dashboard() # get the dashboard for the authenticating user
client.likes() # get the likes for the authenticating user
client.following() # get the blogs followed by the authenticating user

client.follow('codingjester.tumblr.com') # follow a blog
client.unfollow('codingjester.tumblr.com') # unfollow a blog

client.like(id, reblogkey) # like a post
client.unlike(id, reblogkey) # unlike a post

Blog Methods

client.blog_info(blogName) # get information about a blog
client.posts(blogName, **params) # get posts for a blog
client.avatar(blogName) # get the avatar for a blog
client.blog_likes(blogName) # get the likes on a blog
client.followers(blogName) # get the followers of a blog
client.blog_following(blogName) # get the publicly exposed blogs that [blogName] follows
client.queue(blogName) # get the queue for a given blog
client.submission(blogName) # get the submissions for a given blog

Post Methods

Creating posts

PyTumblr lets you create all of the various post types that Tumblr supports. When using these types, there are a few defaults that can be used with any post type.

The default supported types are described below.

  • state - a string, the state of the post. Supported types are published, draft, queue, private
  • tags - a list, a list of strings that you want tagged on the post. eg: ["testing", "magic", "1"]
  • tweet - a string, the string of the customized tweet you want. eg: "Man I love my mega awesome post!"
  • date - a string, the customized GMT that you want
  • format - a string, the format that your post is in. Support types are html or markdown
  • slug - a string, the slug for the url of the post you want

We'll show examples of these defaults throughout, while showcasing all the specific post types.

Creating a photo post

Creating a photo post supports a bunch of different options plus the described default options:

  • caption - a string, the user supplied caption
  • link - a string, the "click-through" url for the photo
  • source - a string, the url for the photo you want to use (use this or the data parameter)
  • data - a list or string, a list of filepaths or a single file path for multipart file upload

#Creates a photo post using a source URL
client.create_photo(blogName, state="published", tags=["testing", "ok"],
                    source="https://68.media.tumblr.com/b965fbb2e501610a29d80ffb6fb3e1ad/tumblr_n55vdeTse11rn1906o1_500.jpg")

#Creates a photo post using a local filepath
client.create_photo(blogName, state="queue", tags=["testing", "ok"],
                    tweet="Woah this is an incredible sweet post [URL]",
                    data="/Users/johnb/path/to/my/image.jpg")

#Creates a photoset post using several local filepaths
client.create_photo(blogName, state="draft", tags=["jb is cool"], format="markdown",
                    data=["/Users/johnb/path/to/my/image.jpg", "/Users/johnb/Pictures/kittens.jpg"],
                    caption="## Mega sweet kittens")

Creating a text post

Creating a text post supports the same options as default and just two other parameters:

  • title - a string, the optional title for the post. Supports markdown or html
  • body - a string, the body of the post. Supports markdown or html

#Creating a text post
client.create_text(blogName, state="published", slug="testing-text-posts", title="Testing", body="testing1 2 3 4")

Creating a quote post

Creating a quote post supports the same options as default and two other parameters:

  • quote - a string, the full text of the quote. Supports markdown or html
  • source - a string, the cited source. HTML supported

#Creating a quote post
client.create_quote(blogName, state="queue", quote="I am the Walrus", source="Ringo")

Creating a link post

  • title - a string, the title of post that you want. Supports HTML entities.
  • url - a string, the url that you want to create a link post for.
  • description - a string, the description of the link that you have

#Create a link post
client.create_link(blogName, title="I like to search things, you should too.", url="https://duckduckgo.com",
                   description="Search is pretty cool when a duck does it.")

Creating a chat post

Creating a chat post supports the same options as default and two other parameters:

  • title - a string, the title of the chat post
  • conversation - a string, the text of the conversation/chat, with dialogue labels (no html)

#Create a chat post
chat = """John: Testing can be fun!
Renee: Testing is tedious and so are you.
John: Aw.
"""
client.create_chat(blogName, title="Renee just doesn't understand.", conversation=chat, tags=["renee", "testing"])

Creating an audio post

Creating an audio post allows for all default options and has 3 other parameters. The only thing to keep in mind while dealing with audio posts is to make sure that you use either the external_url parameter or data. You cannot use both at the same time.

  • caption - a string, the caption for your post
  • external_url - a string, the url of the site that hosts the audio file
  • data - a string, the filepath of the audio file you want to upload to Tumblr

#Creating an audio file
client.create_audio(blogName, caption="Rock out.", data="/Users/johnb/Music/my/new/sweet/album.mp3")

#lets use soundcloud!
client.create_audio(blogName, caption="Mega rock out.", external_url="https://soundcloud.com/skrillex/sets/recess")

Creating a video post

Creating a video post allows for all default options and has three other options. Like the other post types, it has some restrictions: you cannot use the embed and data parameters at the same time.

  • caption - a string, the caption for your post
  • embed - a string, the HTML embed code for the video
  • data - a string, the path of the file you want to upload

#Creating an upload from YouTube
client.create_video(blogName, caption="Jon Snow. Mega ridiculous sword.",
                    embed="http://www.youtube.com/watch?v=40pUYLacrj4")

#Creating a video post from local file
client.create_video(blogName, caption="testing", data="/Users/johnb/testing/ok/blah.mov")

Editing a post

Updating a post requires knowing what type of post you're updating. You'll be able to supply any of the options given above for updates.

client.edit_post(blogName, id=post_id, type="text", title="Updated")
client.edit_post(blogName, id=post_id, type="photo", data="/Users/johnb/mega/awesome.jpg")

Reblogging a Post

Reblogging a post just requires knowing the post id and the reblog key, which is supplied in the JSON of any post object.

client.reblog(blogName, id=125356, reblog_key="reblog_key")

Deleting a post

Deleting just requires that you own the post and have the post id

client.delete_post(blogName, 123456) # Deletes your post :(

A note on tags: when passing tags as params, please pass them as a list (not a comma-separated string):

client.create_text(blogName, tags=['hello', 'world'], ...)

Getting notes for a post

In order to get the notes for a post, you need to have the post id and the blog that it is on.

data = client.notes(blogName, id='123456')

The results include a timestamp you can use to make future calls.

data = client.notes(blogName, id='123456', before_timestamp=data["_links"]["next"]["query_params"]["before_timestamp"])

Tagged Methods

# get posts with a given tag
client.tagged(tag, **params)

Using the interactive console

This client comes with a nice interactive console to run you through the OAuth process, grab your tokens (and store them for future use).

You'll need pyyaml installed to run it, but then it's just:

$ python interactive-console.py

and away you go! Tokens are stored in ~/.tumblr and are also shared by other Tumblr API clients like the Ruby client.

Running tests

The tests (and coverage reports) are run with nose, like this:

python setup.py test

Author: tumblr
Source Code: https://github.com/tumblr/pytumblr
License: Apache-2.0 license

#python #api 

Dylan Iqbal

Matplotlib Cheat Sheet: Plotting in Python

This Matplotlib cheat sheet introduces you to the basics that you need to plot your data with Python and includes code samples.

Data visualization and storytelling with your data are essential skills that every data scientist needs to communicate insights gained from analyses effectively to any audience out there. 

For most beginners, the first package that they use to get in touch with data visualization and storytelling is, naturally, Matplotlib: it is a Python 2D plotting library that enables users to make publication-quality figures. But, what might be even more convincing is the fact that other packages, such as Pandas, intend to build more plotting integration with Matplotlib as time goes on.

However, what might slow down beginners is the fact that this package is pretty extensive. There is so much that you can do with it and it might be hard to still keep a structure when you're learning how to work with Matplotlib.   

DataCamp has created a Matplotlib cheat sheet for those who might already know how to use the package to their advantage to make beautiful plots in Python, but who still want to keep a one-page reference handy. Of course, for those who don't know how to work with Matplotlib, this might be the extra push to get convinced and finally get started with data visualization in Python.

You'll see that this cheat sheet presents you with the six basic steps that you can go through to make beautiful plots. 

Check out the infographic by clicking on the button below:

Python Matplotlib cheat sheet

With this handy reference, you'll familiarize yourself in no time with the basics of Matplotlib: you'll learn how you can prepare your data, create a new plot, use some basic plotting routines to your advantage, add customizations to your plots, and save, show and close the plots that you make.

What might have looked difficult before will definitely be more clear once you start using this cheat sheet! Use it in combination with the Matplotlib Gallery and the documentation.

Matplotlib 

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

Prepare the Data 

1D Data 

>>> import numpy as np
>>> x = np.linspace(0, 10, 100)
>>> y = np.cos(x)
>>> z = np.sin(x)

2D Data or Images 

>>> data = 2 * np.random.random((10, 10))
>>> data2 = 3 * np.random.random((10, 10))
>>> Y, X = np.mgrid[-3:3:100j, -3:3:100j]
>>> U = -1 - X**2 + Y
>>> V = 1 + X - Y**2
>>> from matplotlib.cbook import get_sample_data
>>> img = np.load(get_sample_data('axes_grid/bivariate_normal.npy'))

Create Plot

>>> import matplotlib.pyplot as plt

Figure 

>>> fig = plt.figure()
>>> fig2 = plt.figure(figsize=plt.figaspect(2.0))

Axes 

>>> fig.add_axes()
>>> ax1 = fig.add_subplot(221) #row-col-num
>>> ax3 = fig.add_subplot(212)
>>> fig3, axes = plt.subplots(nrows=2,ncols=2)
>>> fig4, axes2 = plt.subplots(ncols=3)

Save Plot 

>>> plt.savefig('foo.png') #Save figures
>>> plt.savefig('foo.png',  transparent=True) #Save transparent figures

Show Plot

>>> plt.show()

Plotting Routines 

1D Data 

>>> fig, ax = plt.subplots()
>>> lines = ax.plot(x,y) #Draw points with lines or markers connecting them
>>> ax.scatter(x,y) #Draw unconnected points, scaled or colored
>>> axes[0,0].bar([1,2,3],[3,4,5]) #Plot vertical rectangles (constant width)
>>> axes[1,0].barh([0.5,1,2.5],[0,1,2]) #Plot horizontal rectangles (constant height)
>>> axes[1,1].axhline(0.45) #Draw a horizontal line across axes
>>> axes[0,1].axvline(0.65) #Draw a vertical line across axes
>>> ax.fill(x,y,color='blue') #Draw filled polygons
>>> ax.fill_between(x,y,color='yellow') #Fill between y values and 0

2D Data 

>>> fig, ax = plt.subplots()
>>> im = ax.imshow(img, #Colormapped or RGB arrays
      cmap= 'gist_earth', 
      interpolation= 'nearest',
      vmin=-2,
      vmax=2)
>>> axes2[0].pcolor(data2) #Pseudocolor plot of 2D array
>>> axes2[0].pcolormesh(data) #Pseudocolor plot of 2D array
>>> CS = plt.contour(Y,X,U) #Plot contours
>>> axes2[2].contourf(data2) #Plot filled contours
>>> axes2[2]= ax.clabel(CS) #Label a contour plot

Vector Fields 

>>> axes[0,1].arrow(0,0,0.5,0.5) #Add an arrow to the axes
>>> axes[1,1].quiver(y,z) #Plot a 2D field of arrows
>>> axes[0,1].streamplot(X,Y,U,V) #Plot a 2D field of arrows

Data Distributions 

>>> ax1.hist(y) #Plot a histogram
>>> ax3.boxplot(y) #Make a box and whisker plot
>>> ax3.violinplot(z)  #Make a violin plot

Plot Anatomy & Workflow 

Plot Anatomy 

(Figure: anatomy of a Matplotlib plot, showing the y-axis and x-axis of the axes within the figure.)

Workflow 

The basic steps to creating plots with matplotlib are:

1 Prepare Data
2 Create Plot
3 Plot
4 Customize Plot
5 Save Plot
6 Show Plot

>>> import matplotlib.pyplot as plt
>>> x = [1,2,3,4]  #Step 1
>>> y = [10,20,25,30] 
>>> fig = plt.figure() #Step 2
>>> ax = fig.add_subplot(111) #Step 3
>>> ax.plot(x, y, color= 'lightblue', linewidth=3)  #Step 3, 4
>>> ax.scatter([2,4,6],
          [5,15,25],
          color= 'darkgreen',
          marker= '^' )
>>> ax.set_xlim(1, 6.5)
>>> plt.savefig('foo.png' ) #Step 5
>>> plt.show() #Step 6

Close and Clear 

>>> plt.cla() #Clear an axis
>>> plt.clf() #Clear the entire figure
>>> plt.close() #Close a window

Customize Plot 

Colors, Color Bars & Color Maps 

>>> plt.plot(x, x, x, x**2, x, x**3)
>>> ax.plot(x, y, alpha = 0.4)
>>> ax.plot(x, y, c= 'k')
>>> fig.colorbar(im, orientation= 'horizontal')
>>> im = ax.imshow(img,
            cmap= 'seismic' )

Markers 

>>> fig, ax = plt.subplots()
>>> ax.scatter(x,y,marker= ".")
>>> ax.plot(x,y,marker= "o")

Linestyles 

>>> plt.plot(x,y,linewidth=4.0)
>>> plt.plot(x,y,ls= 'solid') 
>>> plt.plot(x,y,ls= '--') 
>>> plt.plot(x,y,'--' ,x**2,y**2,'-.' ) 
>>> plt.setp(lines,color= 'r',linewidth=4.0)

Text & Annotations 

>>> ax.text(1,
           -2.1, 
           'Example Graph', 
            style= 'italic' )
>>> ax.annotate("Sine", 
xy=(8, 0),
xycoords= 'data', 
xytext=(10.5, 0),
textcoords= 'data', 
arrowprops=dict(arrowstyle= "->", 
connectionstyle="arc3"),)

Mathtext 

>>> plt.title(r'$\sigma_i=15$', fontsize=20)

Limits, Legends and Layouts 

Limits & Autoscaling 

>>> ax.margins(x=0.0,y=0.1) #Add padding to a plot
>>> ax.axis('equal')  #Set the aspect ratio of the plot to 1
>>> ax.set(xlim=[0,10.5],ylim=[-1.5,1.5])  #Set limits for x-and y-axis
>>> ax.set_xlim(0,10.5) #Set limits for x-axis

Legends 

>>> ax.set(title= 'An Example Axes',  #Set a title and x-and y-axis labels
            ylabel= 'Y-Axis', 
            xlabel= 'X-Axis')
>>> ax.legend(loc= 'best')  #No overlapping plot elements

Ticks 

>>> ax.xaxis.set(ticks=range(1,5),  #Manually set x-ticks
             ticklabels=[3,100, 12,"foo" ])
>>> ax.tick_params(axis= 'y', #Make y-ticks longer and go in and out
             direction= 'inout', 
              length=10)

Subplot Spacing 

>>> fig3.subplots_adjust(wspace=0.5,   #Adjust the spacing between subplots
             hspace=0.3,
             left=0.125,
             right=0.9,
             top=0.9,
             bottom=0.1)
>>> fig.tight_layout() #Fit subplot(s) in to the figure area

Axis Spines 

>>> ax1.spines[ 'top'].set_visible(False) #Make the top axis line for a plot invisible
>>> ax1.spines['bottom' ].set_position(( 'outward',10))  #Move the bottom axis line outward

Have this Cheat Sheet at your fingertips

Original article source at https://www.datacamp.com

#matplotlib #cheatsheet #python

Dotnet Script: Run C# Scripts From The .NET CLI

dotnet script

Run C# scripts from the .NET CLI, define NuGet packages inline and edit/debug them in VS Code - all of that with full language services support from OmniSharp.

NuGet Packages

Name                                  Version   Framework(s)
dotnet-script (global tool)           Nuget     net6.0, net5.0, netcoreapp3.1
Dotnet.Script (CLI as Nuget)          Nuget     net6.0, net5.0, netcoreapp3.1
Dotnet.Script.Core                    Nuget     netcoreapp3.1, netstandard2.0
Dotnet.Script.DependencyModel         Nuget     netstandard2.0
Dotnet.Script.DependencyModel.Nuget   Nuget     netstandard2.0

Installing

Prerequisites

The only thing we need to install is .NET Core 3.1 or .NET 5.0 SDK.

.NET Core Global Tool

.NET Core 2.1 introduced the concept of global tools meaning that you can install dotnet-script using nothing but the .NET CLI.

dotnet tool install -g dotnet-script

You can invoke the tool using the following command: dotnet-script
Tool 'dotnet-script' (version '0.22.0') was successfully installed.

The advantage of this approach is that you can use the same command for installation across all platforms. .NET Core SDK also supports viewing a list of installed tools and their uninstallation.

dotnet tool list -g

Package Id         Version      Commands
---------------------------------------------
dotnet-script      0.22.0       dotnet-script
dotnet tool uninstall dotnet-script -g

Tool 'dotnet-script' (version '0.22.0') was successfully uninstalled.

Windows

choco install dotnet.script

We also provide a PowerShell script for installation.

(new-object Net.WebClient).DownloadString("https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.ps1") | iex

Linux and Mac

curl -s https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.sh | bash

If permission is denied we can try with sudo

curl -s https://raw.githubusercontent.com/filipw/dotnet-script/master/install/install.sh | sudo bash

Docker

A Dockerfile for running dotnet-script in a Linux container is available. Build:

cd build
docker build -t dotnet-script -f Dockerfile ..

And run:

docker run -it dotnet-script --version

Github

You can manually download all the releases in zip format from the GitHub releases page.

Usage

Our typical helloworld.csx might look like this:

Console.WriteLine("Hello world!");

That is all it takes and we can execute the script. Args are accessible via the global Args array.

dotnet script helloworld.csx

Scaffolding

Simply create a folder somewhere on your system and issue the following command.

dotnet script init

This will create main.csx along with the launch configuration needed to debug the script in VS Code.

.
├── .vscode
│   └── launch.json
├── main.csx
└── omnisharp.json

We can also initialize a folder using a custom filename.

dotnet script init custom.csx

Instead of main.csx which is the default, we now have a file named custom.csx.

.
├── .vscode
│   └── launch.json
├── custom.csx
└── omnisharp.json

Note: Executing dotnet script init inside a folder that already contains one or more script files will not create the main.csx file.

Running scripts

Scripts can be executed directly from the shell as if they were executables.

foo.csx arg1 arg2 arg3

OSX/Linux

Just like all scripts, on OSX/Linux you need to have a #! and mark the file as executable via chmod +x foo.csx. If you use dotnet script init to create your csx it will automatically have the #! directive and be marked as executable.

The OSX/Linux shebang directive should be #!/usr/bin/env dotnet-script

#!/usr/bin/env dotnet-script
Console.WriteLine("Hello world");

You can execute your script using dotnet script or dotnet-script, which allows you to pass arguments to control your script execution more.

foo.csx arg1 arg2 arg3
dotnet script foo.csx -- arg1 arg2 arg3
dotnet-script foo.csx -- arg1 arg2 arg3

Passing arguments to scripts

All arguments after -- are passed to the script in the following way:

dotnet script foo.csx -- arg1 arg2 arg3

Then you can access the arguments in the script context using the global Args collection:

foreach (var arg in Args)
{
    Console.WriteLine(arg);
}

All arguments before -- are processed by dotnet script. For example, the following command-line

dotnet script -d foo.csx -- -d

will pass the -d before -- to dotnet script and enable the debug mode whereas the -d after -- is passed to script for its own interpretation of the argument.

NuGet Packages

dotnet script has built-in support for referencing NuGet packages directly from within the script.

#r "nuget: AutoMapper, 6.1.0"

Note: Omnisharp needs to be restarted after adding a new package reference

Package Sources

We can define package sources using a NuGet.Config file in the script root folder. In addition to being used during execution of the script, it will also be used by OmniSharp that provides language services for packages resolved from these package sources.

As an alternative to maintaining a local NuGet.Config file we can define these package sources globally either at the user level or at the computer level as described in Configuring NuGet Behaviour

It is also possible to specify package sources when executing the script.

dotnet script foo.csx -s https://SomePackageSource

Multiple packages sources can be specified like this:

dotnet script foo.csx -s https://SomePackageSource -s https://AnotherPackageSource

Creating DLLs or Exes from a CSX file

Dotnet-Script can create a standalone executable or DLL for your script.

Switch   Long switch       Description
-o       --output          Directory where the published executable should be placed. Defaults to a 'publish' folder in the current directory.
-n       --name            The name for the generated DLL (executable not supported at this time). Defaults to the name of the script.
         --dll             Publish to a .dll instead of an executable.
-c       --configuration   Configuration to use for publishing the script [Release/Debug]. Default is "Debug".
-d       --debug           Enables debug output.
-r       --runtime         The runtime used when publishing the self contained executable. Defaults to your current runtime.

The executable can be run directly, independent of a dotnet install, while the DLL can be run using the dotnet CLI like this:

dotnet script exec {path_to_dll} -- arg1 arg2

Caching

We provide two types of caching, the dependency cache and the execution cache which is explained in detail below. In order for any of these caches to be enabled, it is required that all NuGet package references are specified using an exact version number. The reason for this constraint is that we need to make sure that we don't execute a script with a stale dependency graph.

Dependency Cache

In order to resolve the dependencies for a script, a dotnet restore is executed under the hood to produce a project.assets.json file from which we can figure out all the dependencies we need to add to the compilation. This is an out-of-process operation and represents a significant overhead to the script execution. So this cache works by looking at all the dependencies specified in the script(s), either in the form of NuGet package references or assembly file references. If these dependencies match the dependencies from the last script execution, we skip the restore and read the dependencies from the already generated project.assets.json file. If any of the dependencies has changed, we must restore again to obtain the new dependency graph.

Execution cache

In order to execute a script it needs to be compiled first and since that is a CPU and time consuming operation, we make sure that we only compile when the source code has changed. This works by creating a SHA256 hash from all the script files involved in the execution. This hash is written to a temporary location along with the DLL that represents the result of the script compilation. When a script is executed the hash is computed and compared with the hash from the previous compilation. If they match there is no need to recompile and we run from the already compiled DLL. If the hashes don't match, the cache is invalidated and we recompile.
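
To make the scheme concrete, here is a small Python sketch of the same content-hash idea (an illustration of the concept only, not dotnet-script's actual implementation; the file names are made up):

import hashlib
import tempfile
from pathlib import Path

def script_hash(script_files):
    # SHA256 over the contents of every script involved in the execution.
    digest = hashlib.sha256()
    for path in sorted(script_files):
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()

# Demo with a throwaway file standing in for a .csx script.
workdir = Path(tempfile.mkdtemp())
(workdir / "main.csx").write_text('Console.WriteLine("Hello");')
cache_file = workdir / "compilation.hash"

current = script_hash(workdir.glob("*.csx"))
if not cache_file.exists() or cache_file.read_text() != current:
    print("source changed: recompile and store the new hash")
    cache_file.write_text(current)
else:
    print("hashes match: reuse the previously compiled DLL")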

You can override this automatic caching by passing --no-cache flag, which will bypass both caches and cause dependency resolution and script compilation to happen every time we execute the script.

Cache Location

The temporary location used for caches is a sub-directory named dotnet-script under (in order of priority):

  1. The path specified for the value of the environment variable named DOTNET_SCRIPT_CACHE_LOCATION, if defined and value is not empty.
  2. Linux distributions only: $XDG_CACHE_HOME if defined otherwise $HOME/.cache
  3. macOS only: ~/Library/Caches
  4. The value returned by Path.GetTempPath for the platform.

 

Debugging

The days of debugging scripts using Console.WriteLine are over. One major feature of dotnet script is the ability to debug scripts directly in VS Code. Just set a breakpoint anywhere in your script file(s) and hit F5(start debugging)

Script Packages

Script packages are a way of organizing reusable scripts into NuGet packages that can be consumed by other scripts. This means that we now can leverage scripting infrastructure without the need for any kind of bootstrapping.

Creating a script package

A script package is just a regular NuGet package that contains script files inside the content or contentFiles folder.

The following example shows how the scripts are laid out inside the NuGet package according to the standard convention.

└── contentFiles
    └── csx
        └── netstandard2.0
            └── main.csx

This example contains just the main.csx file in the root folder, but packages may have multiple script files either in the root folder or in subfolders below the root folder.

When loading a script package we will look for an entry point script to be loaded. This entry point script is identified by one of the following.

  • A script called main.csx in the root folder
  • A single script file in the root folder

If the entry point script cannot be determined, we will simply load all the scripts files in the package.

The advantage with using an entry point script is that we can control loading other scripts from the package.
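
As a rough illustration of that lookup order, here is a short Python sketch (not dotnet-script's actual code, just the rules above expressed as a function):

from pathlib import Path

def find_entry_point(package_root):
    # Mirrors the rules above: prefer main.csx in the root folder,
    # then a single script in the root, otherwise load every script.
    root = Path(package_root)
    root_scripts = list(root.glob("*.csx"))
    for script in root_scripts:
        if script.name.lower() == "main.csx":
            return [script]
    if len(root_scripts) == 1:
        return root_scripts
    return sorted(root.rglob("*.csx"))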

Consuming a script package

To consume a script package, all we need to do is specify the NuGet package in the #load directive.

The following example loads the simple-targets package that contains script files to be included in our script.

#load "nuget:simple-targets-csx, 6.0.0"

using static SimpleTargets;
var targets = new TargetDictionary();

targets.Add("default", () => Console.WriteLine("Hello, world!"));

Run(Args, targets);

Note: Debugging also works for script packages so that we can easily step into the scripts that are brought in using the #load directive.

Remote Scripts

Scripts don't actually have to exist locally on the machine. We can also execute scripts that are made available on an http(s) endpoint.

This means that we can create a Gist on Github and execute it just by providing the URL to the Gist.

This Gist contains a script that prints out "Hello World"

We can execute the script like this

dotnet script https://gist.githubusercontent.com/seesharper/5d6859509ea8364a1fdf66bbf5b7923d/raw/0a32bac2c3ea807f9379a38e251d93e39c8131cb/HelloWorld.csx

That is a pretty long URL, so why not make it a TinyURL like this:

dotnet script https://tinyurl.com/y8cda9zt

Script Location

A pretty common scenario is that we have logic that is relative to the script path. We don't want to require the user to be in a certain directory for these paths to resolve correctly so here is how to provide the script path and the script folder regardless of the current working directory.

public static string GetScriptPath([CallerFilePath] string path = null) => path;
public static string GetScriptFolder([CallerFilePath] string path = null) => Path.GetDirectoryName(path);

Tip: Put these methods as top level methods in a separate script file and #load that file wherever access to the script path and/or folder is needed.

REPL

This release contains a C# REPL (Read-Evaluate-Print-Loop). The REPL mode ("interactive mode") is started by executing dotnet-script without any arguments.

The interactive mode allows you to supply individual C# code blocks and have them executed as soon as you press Enter. The REPL is configured with the same default set of assembly references and using statements as regular CSX script execution.

Basic usage

Once dotnet-script starts you will see a prompt for input. You can start typing C# code there.

~$ dotnet script
> var x = 1;
> x+x
2

If you submit an unterminated expression into the REPL (no ; at the end), it will be evaluated and the result will be serialized using a formatter and printed in the output. This is a bit more interesting than just calling ToString() on the object, because it attempts to capture the actual structure of the object. For example:

~$ dotnet script
> var x = new List<string>();
> x.Add("foo");
> x
List<string>(1) { "foo" }
> x.Add("bar");
> x
List<string>(2) { "foo", "bar" }
>

Inline Nuget packages

REPL also supports inline Nuget packages - meaning the Nuget packages can be installed into the REPL from within the REPL. This is done via our #r and #load from Nuget support and uses identical syntax.

~$ dotnet script
> #r "nuget: Automapper, 6.1.1"
> using AutoMapper;
> typeof(MapperConfiguration)
[AutoMapper.MapperConfiguration]
> #load "nuget: simple-targets-csx, 6.0.0";
> using static SimpleTargets;
> typeof(TargetDictionary)
[Submission#0+SimpleTargets+TargetDictionary]

Multiline mode

Using Roslyn syntax parsing, we also support multiline REPL mode. This means that if you have an uncompleted code block and press Enter, we will automatically enter the multiline mode. The mode is indicated by the * character. This is particularly useful for declaring classes and other more complex constructs.

~$ dotnet script
> class Foo {
* public string Bar {get; set;}
* }
> var foo = new Foo();

REPL commands

Aside from the regular C# script code, you can invoke the following commands (directives) from within the REPL:

Command   Description
#load     Load a script into the REPL (same as #load usage in CSX)
#r        Load an assembly into the REPL (same as #r usage in CSX)
#reset    Reset the REPL back to initial state (without restarting it)
#cls      Clear the console screen without resetting the REPL state
#exit     Exits the REPL

Seeding REPL with a script

You can execute a CSX script and, at the end of it, drop yourself into the context of the REPL. This way, the REPL becomes "seeded" with your code - all the classes, methods or variables are available in the REPL context. This is achieved by running a script with an -i flag.

For example, given the following CSX script:

var msg = "Hello World";
Console.WriteLine(msg);

When you run this with the -i flag, Hello World is printed, the REPL starts, and the msg variable is available in the REPL context.

~$ dotnet script foo.csx -i
Hello World
>

You can also seed the REPL from inside the REPL - at any point - by invoking a #load directive pointed at a specific file. For example:

~$ dotnet script
> #load "foo.csx"
Hello World
>

Piping

The following example shows how we can pipe data in and out of a script.

The UpperCase.csx script simply converts the standard input to upper case and writes it back out to standard output.

using (var streamReader = new StreamReader(Console.OpenStandardInput()))
{
    Write(streamReader.ReadToEnd().ToUpper());
}

We can now simply pipe the output from one command into our script like this.

echo "This is some text" | dotnet script UpperCase.csx
THIS IS SOME TEXT

Debugging

The first thing we need to do is add the following to the launch.json file so that VS Code can debug a running process.

{
    "name": ".NET Core Attach",
    "type": "coreclr",
    "request": "attach",
    "processId": "${command:pickProcess}"
}

To debug this script we need a way to attach the debugger in VS Code and the simplest thing we can do here is to wait for the debugger to attach by adding this method somewhere.

public static void WaitForDebugger()
{
    Console.WriteLine("Attach Debugger (VS Code)");
    while(!Debugger.IsAttached)
    {
    }
}

To debug the script when executing it from the command line we can do something like

WaitForDebugger();
using (var streamReader = new StreamReader(Console.OpenStandardInput()))
{
    Write(streamReader.ReadToEnd().ToUpper()); // <- SET BREAKPOINT HERE
}

Now when we run the script from the command line we will get

$ echo "This is some text" | dotnet script UpperCase.csx
Attach Debugger (VS Code)

This now gives us a chance to attach the debugger before stepping into the script and from VS Code, select the .NET Core Attach debugger and pick the process that represents the executing script.

Once that is done we should see our breakpoint being hit.

Configuration(Debug/Release)

By default, scripts will be compiled using the debug configuration. This is to ensure that we can debug a script in VS Code as well as attach a debugger for long-running scripts.

There are however situations where we might need to execute a script that is compiled with the release configuration. For instance, running benchmarks using BenchmarkDotNet is not possible unless the script is compiled with the release configuration.

We can specify this when executing the script.

dotnet script foo.csx -c release

 

Nullable reference types

Starting from version 0.50.0, dotnet-script supports .NET Core 3.0 and all the C# 8 features. The way we deal with nullable reference types in dotnet-script is that we turn every warning related to nullable reference types into a compiler error. This means every warning between CS8600 and CS8655 is treated as an error when compiling the script.

Nullable reference types are turned off by default, and the way we enable them is with the #nullable enable compiler directive. This means that existing scripts will continue to work, but we can now opt in to this new feature.

#!/usr/bin/env dotnet-script

#nullable enable

string name = null;

Trying to execute the script will result in the following error

main.csx(5,15): error CS8625: Cannot convert null literal to non-nullable reference type.

We will also see this when working with scripts in VS Code under the problems panel.

image

Download Details:
Author: filipw
Source Code: https://github.com/filipw/dotnet-script
License: MIT License

#dotnet  #aspdotnet  #csharp 

Shubham Ankit

Shubham Ankit

1657081614

How to Automate Excel with Python | Python Excel Tutorial (OpenPyXL)

How to Automate Excel with Python

In this article, we will show how to use Python to automate Excel. A useful Python library for this is Openpyxl, which we will use to learn Excel automation.

What is OPENPYXL

Openpyxl is a Python library that is used to read from an Excel file or write to an Excel file. Data scientists use Openpyxl for data analysis, data copying, data mining, drawing charts, styling sheets, adding formulas, and more.

Workbook: A spreadsheet is represented as a workbook in openpyxl. A workbook consists of one or more sheets.

Sheet: A sheet is a single page composed of cells for organizing data.

Cell: The intersection of a row and a column is called a cell. Usually represented by A1, B5, etc.

Row: A row is a horizontal line represented by a number (1,2, etc.).

Column: A column is a vertical line represented by a capital letter (A, B, etc.).

Openpyxl can be installed using the pip command and it is recommended to install it in a virtual environment.

pip install openpyxl
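
To confirm the installation worked (the video tutorial below also covers testing the installation), a quick check is to import the library and print its version. A minimal sketch, assuming openpyxl was installed into the active environment:

import openpyxl

# If this import succeeds, openpyxl is available in the current environment
print(openpyxl.__version__)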

CREATE A NEW WORKBOOK

We start by creating a new spreadsheet, which is called a workbook in Openpyxl. We import the Workbook class from openpyxl and call Workbook(), which creates a new workbook.

from openpyxl import Workbook

#creates a new workbook
wb = Workbook()

#gets the first active worksheet
ws = wb.active

#creating new worksheets by using the create_sheet method
ws1 = wb.create_sheet("sheet1", 0) #inserts at first position
ws2 = wb.create_sheet("sheet2") #inserts at last position
ws3 = wb.create_sheet("sheet3", -1) #inserts at penultimate position

#renaming the sheet
ws.title = "Example"

#save the workbook
wb.save(filename = "example.xlsx")

READING DATA FROM WORKBOOK

We load the file using the function load_workbook(), which takes the filename as an argument. The file must be in the current working directory (or you can pass a full path).

import openpyxl

#loading a workbook
wb = openpyxl.load_workbook("example.xlsx")
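
load_workbook() also accepts a full path and a few optional flags, which can be handy for files outside the working directory or for large files. A minimal sketch, assuming the path shown is replaced with a real one (read_only is an openpyxl option that opens the workbook in a memory-efficient, read-only mode):

import openpyxl

# load a workbook from an explicit path (replace with a real path)
wb = openpyxl.load_workbook("/path/to/example.xlsx")

# open a large workbook in read-only mode to reduce memory use
wb_ro = openpyxl.load_workbook("example.xlsx", read_only = True)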

 

GETTING SHEETS FROM THE LOADED WORKBOOK

 

#getting sheet names
wb.sheetnames
# ['sheet1', 'Sheet', 'sheet3', 'sheet2']

#getting a particular sheet
sheet1 = wb["sheet2"]

#getting the sheet title
sheet1.title
# 'sheet2'

#getting the active sheet
sheetactive = wb.active
# <Worksheet "sheet1">

 

ACCESSING CELLS AND CELL VALUES

 

#get a cell from the sheet
sheet1["A1"]
# <Cell 'Sheet1'.A1>

#get the cell value
ws["A1"].value
# 'Segment'

#accessing a cell using row and column and assigning a value
d = ws.cell(row = 4, column = 2, value = 10)
d.value
# 10

 

ITERATING THROUGH ROWS AND COLUMNS

 

#looping through each row and column
for x in range(1, 5):
    for y in range(1, 5):
        print(x, y, ws.cell(row = x, column = y).value)

#getting the highest row number
ws.max_row
# 701

#getting the highest column number
ws.max_column
# 19
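
max_row and max_column are often combined with cell() so the loops cover the whole used range instead of hard-coded bounds. A minimal sketch, assuming ws is a loaded worksheet:

# walk every used cell by driving the loops with the sheet's dimensions
for x in range(1, ws.max_row + 1):
    for y in range(1, ws.max_column + 1):
        print(x, y, ws.cell(row = x, column = y).value)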

There are two worksheet methods for iterating through rows and columns:

iter_rows() => returns the rows
iter_cols() => returns the columns

Both accept the keyword arguments min_row, max_row, min_col and max_col (for example min_row = 4, max_row = 5, min_col = 2, max_col = 5), which set the boundaries for the iteration.

Example:

#iterating rows
for row in ws.iter_rows(min_row = 2, max_col = 3, max_row = 3):
    for cell in row:
        print(cell)

# <Cell 'Sheet1'.A2>
# <Cell 'Sheet1'.B2>
# <Cell 'Sheet1'.C2>
# <Cell 'Sheet1'.A3>
# <Cell 'Sheet1'.B3>
# <Cell 'Sheet1'.C3>

#iterating columns
for col in ws.iter_cols(min_row = 2, max_col = 3, max_row = 3):
    for cell in col:
        print(cell)

# <Cell 'Sheet1'.A2>
# <Cell 'Sheet1'.A3>
# <Cell 'Sheet1'.B2>
# <Cell 'Sheet1'.B3>
# <Cell 'Sheet1'.C2>
# <Cell 'Sheet1'.C3>

To get all the rows of the worksheet we use worksheet.rows, and to get all the columns we use worksheet.columns. Similarly, to iterate only over the values we use worksheet.values.


Example:

for row in ws.values:
    for value in row:
        print(value)
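
worksheet.rows and worksheet.columns work the same way but yield Cell objects instead of plain values. A short sketch, again assuming ws is a loaded worksheet:

# iterate over all rows as tuples of Cell objects
for row in ws.rows:
    for cell in row:
        print(cell.coordinate, cell.value)

# iterate over all columns as tuples of Cell objects
for col in ws.columns:
    for cell in col:
        print(cell.coordinate, cell.value)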

 

WRITING DATA TO AN EXCEL FILE

Writing to a workbook can be done in many ways, such as adding formulas, adding charts and images, updating cell values, and inserting rows and columns. We will discuss these with examples below.
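
Appending whole rows and inserting or deleting rows and columns are not covered in the sections below, so here is a minimal sketch of those operations (the file name rows.xlsx is just an example):

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# append() adds a whole row after the last used row
ws.append(["Name", "Score"])
ws.append(["Alice", 90])
ws.append(["Bob", 85])

# insert an empty row above row 2, then delete column A
ws.insert_rows(2)
ws.delete_cols(1)

wb.save("rows.xlsx")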

 

CREATING AND SAVING A NEW WORKBOOK

 

#creates a new workbook
wb = openpyxl.Workbook()

#saving the workbook
wb.save("new.xlsx")

 

ADDING AND REMOVING SHEETS

 

#creating a new sheet
ws1 = wb.create_sheet(title = "sheet 2")

#creating a new sheet at index 0
ws2 = wb.create_sheet(index = 0, title = "sheet 0")

#checking the sheet names
wb.sheetnames
# ['sheet 0', 'Sheet', 'sheet 2']

#deleting a sheet
del wb['sheet 0']

#checking the sheet names again
wb.sheetnames
# ['Sheet', 'sheet 2']

 

ADDING CELL VALUES

 

#checking the cell value
ws['B2'].value
# None

#adding a value to the cell
ws['B2'] = 367

#checking the value again
ws['B2'].value
# 367
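
The introduction also mentions styling sheets, which is not demonstrated elsewhere in this article, so here is a minimal sketch using openpyxl.styles (the cell, colors and font settings are arbitrary choices):

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
ws = wb.active
ws['B2'] = 367

# bold red font, solid yellow fill, centered text
ws['B2'].font = Font(bold = True, color = "FF0000")
ws['B2'].fill = PatternFill(start_color = "FFFF00", end_color = "FFFF00", fill_type = "solid")
ws['B2'].alignment = Alignment(horizontal = "center")

wb.save("styled.xlsx")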

 

ADDING FORMULAS

 

We often need formulas in our Excel sheets. We can easily add formulas using the Openpyxl module, just like we add values to a cell: a formula is simply a string that starts with an equals sign.
 

For example:

import openpyxl
from openpyxl import Workbook

wb = openpyxl.load_workbook("new1.xlsx")
ws = wb['Sheet']

ws['A9'] = '=SUM(A2:A8)'

wb.save("new2.xlsx")

The above program will add the formula (=SUM(A2:A8)) in cell A9. The result will be as below.

image
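
Note that openpyxl only stores the formula string; it does not calculate the result itself. If the workbook has been opened and saved by Excel (or another application that computes formulas), the cached result can be read back by loading with data_only=True, as in this sketch:

import openpyxl

# data_only=True returns the value Excel last cached for each formula cell
wb = openpyxl.load_workbook("new2.xlsx", data_only = True)
ws = wb['Sheet']

print(ws['A9'].value)  # cached result of =SUM(A2:A8), or None if never calculated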

 

MERGE/UNMERGE CELLS

Two or more cells can be merged into a rectangular area using the method merge_cells(), and similarly, they can be unmerged using the method unmerge_cells().

For example:
Merge cells

#merge cells B2 to C9
ws.merge_cells('B2:C9')
ws['B2'] = "Merged cells"

Adding the above code to the previous example will merge cells as below.

image

UNMERGE CELLS

 

#unmerge cells B2 to C9
ws.unmerge_cells('B2:C9')

The above code will unmerge cells from B2 to C9.

INSERTING AN IMAGE

To insert an image we import the Image class from the module openpyxl.drawing.image (openpyxl relies on the Pillow library for image handling, so it must be installed as well). We then load our image and add it to a cell as shown in the example below.

Example:

import openpyxl
from openpyxl import Workbook
from openpyxl.drawing.image import Image

wb = openpyxl.load_workbook("new1.xlsx")
ws = wb['Sheet']

#loading the image (should be in the same folder)
img = Image('logo.png')
ws['A1'] = "Adding image"

#adjusting the size
img.height = 130
img.width = 200

#adding the image to cell A3
ws.add_image(img, 'A3')

wb.save("new2.xlsx")

Result:

image

CREATING CHARTS

Charts are essential to show a visualization of data. We can create charts from Excel data using the Openpyxl module chart. Different forms of charts such as line charts, bar charts, 3D line charts, etc., can be created. We need to create a reference that contains the data to be used for the chart, which is nothing but a selection of cells (rows and columns). I am using sample data to create a 3D bar chart in the below example:

Example

import openpyxl
from openpyxl import Workbook
from openpyxl.chart import BarChart3D, Reference, Series

wb = openpyxl.load_workbook("example.xlsx")
ws = wb.active

values = Reference(ws, min_col = 3, min_row = 2, max_col = 3, max_row = 40)
chart = BarChart3D()
chart.add_data(values)
ws.add_chart(chart, "E3")
wb.save("MyChart.xlsx")

Result
image


How to Automate Excel with Python: Video Tutorial

Welcome to another video! In this video, we will cover how to use Python to automate Excel. I'll be going over everything from creating workbooks to accessing individual cells and styling cells. There are a ton of things you can do with Excel, but I'll just be covering the core features of OpenPyXL.

⭐️ Timestamps ⭐️
00:00 | Introduction
02:14 | Installing openpyxl
03:19 | Testing Installation
04:25 | Loading an Existing Workbook
06:46 | Accessing Worksheets
07:37 | Accessing Cell Values
08:58 | Saving Workbooks
09:52 | Creating, Listing and Changing Sheets
11:50 | Creating a New Workbook
12:39 | Adding/Appending Rows
14:26 | Accessing Multiple Cells
20:46 | Merging Cells
22:27 | Inserting and Deleting Rows
23:35 | Inserting and Deleting Columns
24:48 | Copying and Moving Cells
26:06 | Practical Example, Formulas & Cell Styling

📄 Resources 📄
OpenPyXL Docs: https://openpyxl.readthedocs.io/en/stable/ 
Code Written in This Tutorial: https://github.com/techwithtim/ExcelPythonTutorial 
Subscribe: https://www.youtube.com/c/TechWithTim/featured 

#python