1653314700
In this video tutorial, We'll share How to Create Pandas DataFrame from a Dictionary and Converting Python Dictionary to Pandas DataFrame.
1655630160
Install via pip:
$ pip install pytumblr
Install from source:
$ git clone https://github.com/tumblr/pytumblr.git
$ cd pytumblr
$ python setup.py install
A pytumblr.TumblrRestClient
is the object you'll make all of your calls to the Tumblr API through. Creating one is this easy:
client = pytumblr.TumblrRestClient(
'<consumer_key>',
'<consumer_secret>',
'<oauth_token>',
'<oauth_secret>',
)
client.info() # Grabs the current user information
Two easy ways to get your credentials to are:
interactive_console.py
tool (if you already have a consumer key & secret)client.info() # get information about the authenticating user
client.dashboard() # get the dashboard for the authenticating user
client.likes() # get the likes for the authenticating user
client.following() # get the blogs followed by the authenticating user
client.follow('codingjester.tumblr.com') # follow a blog
client.unfollow('codingjester.tumblr.com') # unfollow a blog
client.like(id, reblogkey) # like a post
client.unlike(id, reblogkey) # unlike a post
client.blog_info(blogName) # get information about a blog
client.posts(blogName, **params) # get posts for a blog
client.avatar(blogName) # get the avatar for a blog
client.blog_likes(blogName) # get the likes on a blog
client.followers(blogName) # get the followers of a blog
client.blog_following(blogName) # get the publicly exposed blogs that [blogName] follows
client.queue(blogName) # get the queue for a given blog
client.submission(blogName) # get the submissions for a given blog
Creating posts
PyTumblr lets you create all of the various types that Tumblr supports. When using these types there are a few defaults that are able to be used with any post type.
The default supported types are described below.
We'll show examples throughout of these default examples while showcasing all the specific post types.
Creating a photo post
Creating a photo post supports a bunch of different options plus the described default options * caption - a string, the user supplied caption * link - a string, the "click-through" url for the photo * source - a string, the url for the photo you want to use (use this or the data parameter) * data - a list or string, a list of filepaths or a single file path for multipart file upload
#Creates a photo post using a source URL
client.create_photo(blogName, state="published", tags=["testing", "ok"],
source="https://68.media.tumblr.com/b965fbb2e501610a29d80ffb6fb3e1ad/tumblr_n55vdeTse11rn1906o1_500.jpg")
#Creates a photo post using a local filepath
client.create_photo(blogName, state="queue", tags=["testing", "ok"],
tweet="Woah this is an incredible sweet post [URL]",
data="/Users/johnb/path/to/my/image.jpg")
#Creates a photoset post using several local filepaths
client.create_photo(blogName, state="draft", tags=["jb is cool"], format="markdown",
data=["/Users/johnb/path/to/my/image.jpg", "/Users/johnb/Pictures/kittens.jpg"],
caption="## Mega sweet kittens")
Creating a text post
Creating a text post supports the same options as default and just a two other parameters * title - a string, the optional title for the post. Supports markdown or html * body - a string, the body of the of the post. Supports markdown or html
#Creating a text post
client.create_text(blogName, state="published", slug="testing-text-posts", title="Testing", body="testing1 2 3 4")
Creating a quote post
Creating a quote post supports the same options as default and two other parameter * quote - a string, the full text of the qote. Supports markdown or html * source - a string, the cited source. HTML supported
#Creating a quote post
client.create_quote(blogName, state="queue", quote="I am the Walrus", source="Ringo")
Creating a link post
#Create a link post
client.create_link(blogName, title="I like to search things, you should too.", url="https://duckduckgo.com",
description="Search is pretty cool when a duck does it.")
Creating a chat post
Creating a chat post supports the same options as default and two other parameters * title - a string, the title of the chat post * conversation - a string, the text of the conversation/chat, with diablog labels (no html)
#Create a chat post
chat = """John: Testing can be fun!
Renee: Testing is tedious and so are you.
John: Aw.
"""
client.create_chat(blogName, title="Renee just doesn't understand.", conversation=chat, tags=["renee", "testing"])
Creating an audio post
Creating an audio post allows for all default options and a has 3 other parameters. The only thing to keep in mind while dealing with audio posts is to make sure that you use the external_url parameter or data. You cannot use both at the same time. * caption - a string, the caption for your post * external_url - a string, the url of the site that hosts the audio file * data - a string, the filepath of the audio file you want to upload to Tumblr
#Creating an audio file
client.create_audio(blogName, caption="Rock out.", data="/Users/johnb/Music/my/new/sweet/album.mp3")
#lets use soundcloud!
client.create_audio(blogName, caption="Mega rock out.", external_url="https://soundcloud.com/skrillex/sets/recess")
Creating a video post
Creating a video post allows for all default options and has three other options. Like the other post types, it has some restrictions. You cannot use the embed and data parameters at the same time. * caption - a string, the caption for your post * embed - a string, the HTML embed code for the video * data - a string, the path of the file you want to upload
#Creating an upload from YouTube
client.create_video(blogName, caption="Jon Snow. Mega ridiculous sword.",
embed="http://www.youtube.com/watch?v=40pUYLacrj4")
#Creating a video post from local file
client.create_video(blogName, caption="testing", data="/Users/johnb/testing/ok/blah.mov")
Editing a post
Updating a post requires you knowing what type a post you're updating. You'll be able to supply to the post any of the options given above for updates.
client.edit_post(blogName, id=post_id, type="text", title="Updated")
client.edit_post(blogName, id=post_id, type="photo", data="/Users/johnb/mega/awesome.jpg")
Reblogging a Post
Reblogging a post just requires knowing the post id and the reblog key, which is supplied in the JSON of any post object.
client.reblog(blogName, id=125356, reblog_key="reblog_key")
Deleting a post
Deleting just requires that you own the post and have the post id
client.delete_post(blogName, 123456) # Deletes your post :(
A note on tags: When passing tags, as params, please pass them as a list (not a comma-separated string):
client.create_text(blogName, tags=['hello', 'world'], ...)
Getting notes for a post
In order to get the notes for a post, you need to have the post id and the blog that it is on.
data = client.notes(blogName, id='123456')
The results include a timestamp you can use to make future calls.
data = client.notes(blogName, id='123456', before_timestamp=data["_links"]["next"]["query_params"]["before_timestamp"])
# get posts with a given tag
client.tagged(tag, **params)
This client comes with a nice interactive console to run you through the OAuth process, grab your tokens (and store them for future use).
You'll need pyyaml
installed to run it, but then it's just:
$ python interactive-console.py
and away you go! Tokens are stored in ~/.tumblr
and are also shared by other Tumblr API clients like the Ruby client.
The tests (and coverage reports) are run with nose, like this:
python setup.py test
Author: tumblr
Source Code: https://github.com/tumblr/pytumblr
License: Apache-2.0 license
1623927960
Python is famous for its vast selection of libraries and resources from the open-source community. As a Data Analyst/Engineer/Scientist, one might be familiar with popular packages such as Numpy, Pandas, Scikit-learn, Keras, and TensorFlow. Together these modules help us extract value out of data and propels the field of analytics. As data continue to become larger and more complex, one other element to consider is a framework dedicated to processing Big Data, such as Apache Spark. In this article, I will demonstrate the capabilities of distributed/cluster computing and present a comparison between the Pandas DataFrame and Spark DataFrame. My hope is to provide more conviction on choosing the right implementation.
Pandas has become very popular for its ease of use. It utilizes DataFrames to present data in tabular format like a spreadsheet with rows and columns. Importantly, it has very intuitive methods to perform common analytical tasks and a relatively flat learning curve. It loads all of the data into memory on a single machine (one node) for rapid execution. While the Pandas DataFrame has proven to be tremendously powerful in manipulating data, it does have its limits. With data growing at an exponentially rate, complex data processing becomes expensive to handle and causes performance degradation. These operations require parallelization and distributed computing, which the Pandas DataFrame does not support.
Apache Spark is an open-source cluster computing framework. With cluster computing, data processing is distributed and performed in parallel by multiple nodes. This is recognized as the MapReduce framework because the division of labor can usually be characterized by sets of the map, shuffle, and reduce operations found in functional programming. Spark’s implementation of cluster computing is unique because processes 1) are executed in-memory and 2) build up a query plan which does not execute until necessary (known as lazy execution). Although Spark’s cluster computing framework has a broad range of utility, we only look at the Spark DataFrame for the purpose of this article. Similar to those found in Pandas, the Spark DataFrame has intuitive APIs, making it easy to implement.
#pandas dataframe vs. spark dataframe: when parallel computing matters #pandas #pandas dataframe #pandas dataframe vs. spark dataframe #spark #when parallel computing matters
1623370500
Hey - Nick here! This page is a free excerpt from my $199 course Python for Finance, which is 50% off for the next 50 students.
If you want the full course, click here to sign up.
It’s now time for some practice problems! See below for details on how to proceed.
All of the code for this course’s practice problems can be found in this GitHub repository.
There are two options that you can use to complete the practice problems:
Note that binder can take up to a minute to load the repository, so please be patient.
Within that repository, there is a folder called starter-files
and a folder called finished-files
. You should open the appropriate practice problems within the starter-files
folder and only consult the corresponding file in the finished-files
folder if you get stuck.
The repository is public, which means that you can suggest changes using a pull request later in this course if you’d like.
#dataframes #pandas #practice problems: how to join dataframes in pandas #how to join dataframes in pandas #practice #/pandas/issues.
1624431580
In this tutorial, we are going to discuss different ways to add a new column to pandas data frame.
Table of Contents
Pandas data frameis a two-dimensional heterogeneous data structure that stores the data in a tabular form with labeled indexes i.e. rows and columns.
Usually, data frames are used when we have to deal with a large dataset, then we can simply see the summary of that large dataset by loading it into a pandas data frame and see the summary of the data frame.
In the real-world scenario, a pandas data frame is created by loading the datasets from an existing CSV file, Excel file, etc.
But pandas data frame can be also created from the list, dictionary, list of lists, list of dictionaries, dictionary of ndarray/lists, etc. Before we start discussing how to add a new column to an existing data frame we require a pandas data frame.
#pandas #dataframe #pandas dataframe #column #add a new column #how to add a new column to pandas dataframe
1586702221
In this post, we will learn about pandas’ data structures/objects. Pandas provide two type of data structures:-
Pandas Series is a one dimensional indexed data, which can hold datatypes like integer, string, boolean, float, python object etc. A Pandas Series can hold only one data type at a time. The axis label of the data is called the index of the series. The labels need not to be unique but must be a hashable type. The index of the series can be integer, string and even time-series data. In general, Pandas Series is nothing but a column of an excel sheet with row index being the index of the series.
Pandas dataframe is a primary data structure of pandas. Pandas dataframe is a two-dimensional size mutable array with both flexible row indices and flexible column names. In general, it is just like an excel sheet or SQL table. It can also be seen as a python’s dict-like container for series objects.
#python #python-pandas #pandas-dataframe #pandas-series #pandas-tutorial