Learning To Create Python Multi-Threaded And Multi-Process

Learning To Create Python Multi-Threaded And Multi-Process - This article is not useful for experienced Python developers. It is rather a superficial overview of Python multi-threaded features for those who recently started learning Python.

Unfortunately, you cannot find tons of material about multithreading in Python. Moreover, quite often I meet Python beginners who don’t know about GIL, for example. In this article, I want to cover the most basic features of a Python multi-threaded, discuss what does GIL stand for and how to act with/without it.

Python is a perfect programming language. It ideally includes many programming paradigms. Majority of the tasks that can be faced by a developer are solved easily, elegantly and concisely using Python. However, a single-threaded solution is quite often enough for all those tasks. Single-threaded programs are usually predictable and easy-to-debug. The same can’t be said for the Python multi-threaded and multi-process applications.

Python Multi-Threaded Application

Python has a threading module that includes everything for the multi-threaded programming: there you can find different types of locks, a semaphore, and an event mechanism. Thus, you get everything you need for the vast majority of Python multi-threaded applications. Moreover, it is extremely easy to use all those tools. To make sure, let’s discuss an example of an application that runs two threads. The first thread types ten “0”, the second – ten “1”, and strictly in turn.

import threading

def writer(x, event_for_wait, event_for_set):

    for i in xrange(10):

        event_for_wait.wait() # wait for event

        event_for_wait.clear() # clean event for future

        print x

        event_for_set.set() # set event for neighbor thread

# init events

e1 = threading.Event()

e2 = threading.Event()

# init threads

t1 = threading.Thread(target=writer, args=(0, e1, e2))

t2 = threading.Thread(target=writer, args=(1, e2, e1))

# start threads

t1.start()

t2.start()

e1.set() # initiate the first event

# join threads to the main thread

t1.join()

t2.join()

No magic and voodoo code. As you can see, the code is accurate and consistent. As you can see, we have created the thread out of a function which is highly convenient for small tasks. Moreover, this code is rather flexible. For example, you have created one more thread that types “2”. Thus, you will get the following:

import threading

def writer(x, event_for_wait, event_for_set):

    for i in xrange(10):

        event_for_wait.wait() # wait for event

        event_for_wait.clear() # clean event for future

        print x

        event_for_set.set() # set event for neighbor thread

# init events

e1 = threading.Event()

e2 = threading.Event()

e3 = threading.Event()

# init threads

t1 = threading.Thread(target=writer, args=(0, e1, e2))

t2 = threading.Thread(target=writer, args=(1, e2, e3))

t3 = threading.Thread(target=writer, args=(2, e3, e1))

# start threads

t1.start()

t2.start()

t3.start()

e1.set() # initiate the first event

# join threads to the main thread

t1.join()

t2.join()

t3.join()

Here, we have added a new event, a new thread, and changed the parameters passed to the threads for the start (it is also possible to create a more general solution with the help of MapReduce, for example, but is out of the article). Thus, as you can see, there are no complicated things and magic. Everything is simple and comprehensive. Let’s move forward.

Global Interpreter Lock

There are two the most widespread reasons for using threads. Firstly, it is useful for increasing the using of modern multi-core processor architecture. It means increasing application performance. Secondly, threads are of utmost importance in case we need to divide the application logic into parallel and fully or partly asynchronous sections (i.e. you need to have an opportunity to ping several servers simultaneously).

Considering the first situation, we face the following Python limitation called Global Interpreter Lock (GIL). The GIL concept means that at any specific time only one thread may be executed by a processor. It was designed to avoid the threads being competing for different variables. An executing thread gains access through the whole environment. This feature of Python thread implementation significantly simplifies the work with threads. Moreover, this feature provides you with certain thread safety.

However, you should pay attention to the following moment: it may seem that a Python multi-threaded application will work exactly the same time as a single-threaded, doing the same thing. However, here you will face the following unpleasant issue. Let’s consider the following code to understand what I mean:

with open('test1.txt', 'w') as fout:

    for i in xrange(1000000):

        print >> fout, 1

# This application just creates a million of strings ‘1’ for ~0.35s on my local machine.

# Now, let’s consider another program for comparison:

from threading import Thread

def writer(filename, n):

    with open(filename, 'w') as fout:

        for i in xrange(n):

            print >> fout, 1

t1 = Thread(target=writer, args=('test2.txt', 500000,))

t2 = Thread(target=writer, args=('test3.txt', 500000,))

t1.start()

t2.start()

t1.join()

t2.join()

The second application creates 2 threads. In each thread, the application creates a separate file for half a million lines “1”. In fact, the amount of work is the same. However, over time, you will see an interesting effect. The application performance is from 0.7 seconds to 7 seconds. What is the reason for this situation?

In fact, it happens due to the fact that when a thread does not need a CPU resource, it frees GIL. At this moment, both threads can try to get it. At the same time, the operating system knows that there are many cores. Thus, it can intensify this situation by trying to distribute the threads between the cores.

UPD: currently, Python 3.2 has an improved implementation of GIL, and this problem is partially solved. The solution is about the fact that each thread after losing control waits for a short period before it can capture GIL again.

“Thus, it’s extremely difficult to create an effective multi-threaded application in Python?” you may ask. However, keep calm, there is always a solution.

Python Multi-Process Applications

In order to solve the problem mentioned in the previous chapter, Python provides us with the subprocess module. We can create an application that should be executed in a parallel thread and execute it in several threads in other application. This solution would have significantly increased our application performance. The matter is that the threads created in GIL only wait for the launched process shutdown. However, this approach has many problems as well. The main issue is that it becomes difficult to transfer the data between the processes. This way, we would have to serialize objects, adjust the connection through PIPE or other tools. In fact, it results in additional expenses and the code becomes hard to understand.

For this reason, here comes another useful approach. Python also has multiprocessing module which is quite similar to threading. For example, the processes can be created in the same way using the functions. The methods of operation with the processes are almost the same as for threading. However, in order to synchronize the processes and provide data sharing – we need to use other tools. I am referring to Queues and Pipes. Alongside, the analogues to locks, events and semaphores mentioned in threading are also available here.

Additionally, the multiprocessing module also provides an operating principle of general memory. Thus, the module provides the class of Value and Array variables that can be shared between the processes. For the convenience of working with the variables, you can use Manager classes. They are more flexible and convenient, but slower.

Furthermore, the multiprocessing module provides an opportunity to create pools of processes. This mechanism is pretty convenient for implementing the Master-Worker template to build a parallel Map.

Among the basic problems of working with multiprocessing, I need to point out a module relative platform dependence. Due to the fact that different operating systems provide a different working process with the processes, the code receives several limitations. For example, Windows OS doesn’t have fork mechanism. Therefore, you need to wrap the processes separation point in the following:

if __name__ =='__main__':

Thus, this code construction is a good code style.

What’s More …

In order to create parallel applications using Python, there are also other libraries and approaches. For example, you can use Haddop+Python or different implementations of MPI and Python (pyMPI, mpi4py). Moreover, it’s even possible to use the wrappers of the existing libraries in C++ and Fortran. Here, we can mention such frameworks/libraries as Pyro, Twisted, Tornado and so on. However, all these things are the topics of other articles.

#python #web-development

What is GEEK

Buddha Community

Learning To Create Python Multi-Threaded And Multi-Process

Pyringe: Debugger Capable Of Attaching to & Injecting Code Into Python

DISCLAIMER: This is not an official google project, this is just something I wrote while at Google.

Pyringe

What this is

Pyringe is a python debugger capable of attaching to running processes, inspecting their state and even of injecting python code into them while they're running. With pyringe, you can list threads, get tracebacks, inspect locals/globals/builtins of running functions, all without having to prepare your program for it.

What this is not

A "Google project". It's my internship project that got open-sourced. Sorry for the confusion.

What do I need?

Pyringe internally uses gdb to do a lot of its heavy lifting, so you will need a fairly recent build of gdb (version 7.4 onwards, and only if gdb was configured with --with-python). You will also need the symbols for whatever build of python you're running.
On Fedora, the package you're looking for is python-debuginfo, on Debian it's called python2.7-dbg (adjust according to version). Arch Linux users: see issue #5, Ubuntu users can only debug the python-dbg binary (see issue #19).
Having Colorama will get you output in boldface, but it's optional.

How do I get it?

Get it from the Github repo, PyPI, or via pip (pip install pyringe).

Is this Python3-friendly?

Short answer: No, sorry. Long answer:
There's three potentially different versions of python in play here:

  1. The version running pyringe
  2. The version being debugged
  3. The version of libpythonXX.so your build of gdb was linked against

2 Is currently the dealbreaker here. Cpython has changed a bit in the meantime[1], and making all features work while debugging python3 will have to take a back seat for now until the more glaring issues have been taken care of.
As for 1 and 3, the 2to3 tool may be able to handle it automatically. But then, as long as 2 hasn't been taken care of, this isn't really a use case in the first place.

[1] - For example, pendingbusy (which is used for injection) has been renamed to busy and been given a function-local scope, making it harder to interact with via gdb.

Will this work with PyPy?

Unfortunately, no. Since this makes use of some CPython internals and implementation details, only CPython is supported. If you don't know what PyPy or CPython are, you'll probably be fine.

Why not PDB?

PDB is great. Use it where applicable! But sometimes it isn't.
Like when python itself crashes, gets stuck in some C extension, or you want to inspect data without stopping a program. In such cases, PDB (and all other debuggers that run within the interpreter itself) are next to useless, and without pyringe you'd be left with having to debug using print statements. Pyringe is just quite convenient in these cases.

I injected a change to a local var into a function and it's not showing up!

This is a known limitation. Things like inject('var = 2') won't work, but inject('var[1] = 1337') should. This is because most of the time, python internally uses a fast path for looking up local variables that doesn't actually perform the dictionary lookup in locals(). In general, code you inject into processes with pyringe is very different from a normal python function call.

How do I use it?

You can start the debugger by executing python -m pyringe. Alternatively:

import pyringe
pyringe.interact()

If that reminds you of the code module, good; this is intentional.
After starting the debugger, you'll be greeted by what behaves almost like a regular python REPL.
Try the following:

==> pid:[None] #threads:[0] current thread:[None]
>>> help()
Available commands:
 attach: Attach to the process with the given pid.
 bt: Get a backtrace of the current position.
 [...]
==> pid:[None] #threads:[0] current thread:[None]
>>> attach(12679)
==> pid:[12679] #threads:[11] current thread:[140108099462912]
>>> threads()
[140108099462912, 140108107855616, 140108116248323, 140108124641024, 140108133033728, 140108224739072, 140108233131776, 140108141426432, 140108241524480, 140108249917184, 140108269324032]

The IDs you see here correspond to what threading.current_thread().ident would tell you.
All debugger functions are just regular python functions that have been exposed to the REPL, so you can do things like the following.

==> pid:[12679] #threads:[11] current thread:[140108099462912]
>>> for tid in threads():
...   if not tid % 10:
...     thread(tid)
...     bt()
... 
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 524, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "./test.py", line 46, in Idle
    Thread_2_Func(1)
  File "./test.py", line 40, in Wait
    time.sleep(n)
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> 

You can access the inferior's locals and inspect them like so:

==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> inflocals()
{'a': <proxy of A object at remote 0x1d9b290>, 'LOL': 'success!', 'b': <proxy of B object at remote 0x1d988c0>, 'n': 1}
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> p('a')
<proxy of A object at remote 0x1d9b290>
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> p('a').attr
'Some_magic_string'
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> 

And sure enough, the definition of a's class reads:

class Example(object):
  cl_attr = False
  def __init__(self):
    self.attr = 'Some_magic_string'

There's limits to how far this proxying of objects goes, and everything that isn't trivial data will show up as strings (like '<function at remote 0x1d957d0>').
You can inject python code into running programs. Of course, there are caveats but... see for yourself:

==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> inject('import threading')
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> inject('print threading.current_thread().ident')
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> 

The output of my program in this case reads:

140108241524480

If you need additional pointers, just try using python's help (pyhelp() in the debugger) on debugger commands.

Author: google
Source Code: https://github.com/google/pyringe
License: Apache-2.0 License

#python 

Easter  Deckow

Easter Deckow

1655630160

PyTumblr: A Python Tumblr API v2 Client

PyTumblr

Installation

Install via pip:

$ pip install pytumblr

Install from source:

$ git clone https://github.com/tumblr/pytumblr.git
$ cd pytumblr
$ python setup.py install

Usage

Create a client

A pytumblr.TumblrRestClient is the object you'll make all of your calls to the Tumblr API through. Creating one is this easy:

client = pytumblr.TumblrRestClient(
    '<consumer_key>',
    '<consumer_secret>',
    '<oauth_token>',
    '<oauth_secret>',
)

client.info() # Grabs the current user information

Two easy ways to get your credentials to are:

  1. The built-in interactive_console.py tool (if you already have a consumer key & secret)
  2. The Tumblr API console at https://api.tumblr.com/console
  3. Get sample login code at https://api.tumblr.com/console/calls/user/info

Supported Methods

User Methods

client.info() # get information about the authenticating user
client.dashboard() # get the dashboard for the authenticating user
client.likes() # get the likes for the authenticating user
client.following() # get the blogs followed by the authenticating user

client.follow('codingjester.tumblr.com') # follow a blog
client.unfollow('codingjester.tumblr.com') # unfollow a blog

client.like(id, reblogkey) # like a post
client.unlike(id, reblogkey) # unlike a post

Blog Methods

client.blog_info(blogName) # get information about a blog
client.posts(blogName, **params) # get posts for a blog
client.avatar(blogName) # get the avatar for a blog
client.blog_likes(blogName) # get the likes on a blog
client.followers(blogName) # get the followers of a blog
client.blog_following(blogName) # get the publicly exposed blogs that [blogName] follows
client.queue(blogName) # get the queue for a given blog
client.submission(blogName) # get the submissions for a given blog

Post Methods

Creating posts

PyTumblr lets you create all of the various types that Tumblr supports. When using these types there are a few defaults that are able to be used with any post type.

The default supported types are described below.

  • state - a string, the state of the post. Supported types are published, draft, queue, private
  • tags - a list, a list of strings that you want tagged on the post. eg: ["testing", "magic", "1"]
  • tweet - a string, the string of the customized tweet you want. eg: "Man I love my mega awesome post!"
  • date - a string, the customized GMT that you want
  • format - a string, the format that your post is in. Support types are html or markdown
  • slug - a string, the slug for the url of the post you want

We'll show examples throughout of these default examples while showcasing all the specific post types.

Creating a photo post

Creating a photo post supports a bunch of different options plus the described default options * caption - a string, the user supplied caption * link - a string, the "click-through" url for the photo * source - a string, the url for the photo you want to use (use this or the data parameter) * data - a list or string, a list of filepaths or a single file path for multipart file upload

#Creates a photo post using a source URL
client.create_photo(blogName, state="published", tags=["testing", "ok"],
                    source="https://68.media.tumblr.com/b965fbb2e501610a29d80ffb6fb3e1ad/tumblr_n55vdeTse11rn1906o1_500.jpg")

#Creates a photo post using a local filepath
client.create_photo(blogName, state="queue", tags=["testing", "ok"],
                    tweet="Woah this is an incredible sweet post [URL]",
                    data="/Users/johnb/path/to/my/image.jpg")

#Creates a photoset post using several local filepaths
client.create_photo(blogName, state="draft", tags=["jb is cool"], format="markdown",
                    data=["/Users/johnb/path/to/my/image.jpg", "/Users/johnb/Pictures/kittens.jpg"],
                    caption="## Mega sweet kittens")

Creating a text post

Creating a text post supports the same options as default and just a two other parameters * title - a string, the optional title for the post. Supports markdown or html * body - a string, the body of the of the post. Supports markdown or html

#Creating a text post
client.create_text(blogName, state="published", slug="testing-text-posts", title="Testing", body="testing1 2 3 4")

Creating a quote post

Creating a quote post supports the same options as default and two other parameter * quote - a string, the full text of the qote. Supports markdown or html * source - a string, the cited source. HTML supported

#Creating a quote post
client.create_quote(blogName, state="queue", quote="I am the Walrus", source="Ringo")

Creating a link post

  • title - a string, the title of post that you want. Supports HTML entities.
  • url - a string, the url that you want to create a link post for.
  • description - a string, the desciption of the link that you have
#Create a link post
client.create_link(blogName, title="I like to search things, you should too.", url="https://duckduckgo.com",
                   description="Search is pretty cool when a duck does it.")

Creating a chat post

Creating a chat post supports the same options as default and two other parameters * title - a string, the title of the chat post * conversation - a string, the text of the conversation/chat, with diablog labels (no html)

#Create a chat post
chat = """John: Testing can be fun!
Renee: Testing is tedious and so are you.
John: Aw.
"""
client.create_chat(blogName, title="Renee just doesn't understand.", conversation=chat, tags=["renee", "testing"])

Creating an audio post

Creating an audio post allows for all default options and a has 3 other parameters. The only thing to keep in mind while dealing with audio posts is to make sure that you use the external_url parameter or data. You cannot use both at the same time. * caption - a string, the caption for your post * external_url - a string, the url of the site that hosts the audio file * data - a string, the filepath of the audio file you want to upload to Tumblr

#Creating an audio file
client.create_audio(blogName, caption="Rock out.", data="/Users/johnb/Music/my/new/sweet/album.mp3")

#lets use soundcloud!
client.create_audio(blogName, caption="Mega rock out.", external_url="https://soundcloud.com/skrillex/sets/recess")

Creating a video post

Creating a video post allows for all default options and has three other options. Like the other post types, it has some restrictions. You cannot use the embed and data parameters at the same time. * caption - a string, the caption for your post * embed - a string, the HTML embed code for the video * data - a string, the path of the file you want to upload

#Creating an upload from YouTube
client.create_video(blogName, caption="Jon Snow. Mega ridiculous sword.",
                    embed="http://www.youtube.com/watch?v=40pUYLacrj4")

#Creating a video post from local file
client.create_video(blogName, caption="testing", data="/Users/johnb/testing/ok/blah.mov")

Editing a post

Updating a post requires you knowing what type a post you're updating. You'll be able to supply to the post any of the options given above for updates.

client.edit_post(blogName, id=post_id, type="text", title="Updated")
client.edit_post(blogName, id=post_id, type="photo", data="/Users/johnb/mega/awesome.jpg")

Reblogging a Post

Reblogging a post just requires knowing the post id and the reblog key, which is supplied in the JSON of any post object.

client.reblog(blogName, id=125356, reblog_key="reblog_key")

Deleting a post

Deleting just requires that you own the post and have the post id

client.delete_post(blogName, 123456) # Deletes your post :(

A note on tags: When passing tags, as params, please pass them as a list (not a comma-separated string):

client.create_text(blogName, tags=['hello', 'world'], ...)

Getting notes for a post

In order to get the notes for a post, you need to have the post id and the blog that it is on.

data = client.notes(blogName, id='123456')

The results include a timestamp you can use to make future calls.

data = client.notes(blogName, id='123456', before_timestamp=data["_links"]["next"]["query_params"]["before_timestamp"])

Tagged Methods

# get posts with a given tag
client.tagged(tag, **params)

Using the interactive console

This client comes with a nice interactive console to run you through the OAuth process, grab your tokens (and store them for future use).

You'll need pyyaml installed to run it, but then it's just:

$ python interactive-console.py

and away you go! Tokens are stored in ~/.tumblr and are also shared by other Tumblr API clients like the Ruby client.

Running tests

The tests (and coverage reports) are run with nose, like this:

python setup.py test

Author: tumblr
Source Code: https://github.com/tumblr/pytumblr
License: Apache-2.0 license

#python #api 

Libraries for Debugging Code in Popular Python

In this Python article, let's learn about Debugging Tools: Libraries for Debugging Code in Popular Python

Table of contents:

  • pdb-like Debugger
    • ipdb - IPython-enabled pdb.
    • pdb++ - Another drop-in replacement for pdb.
    • pudb - A full-screen, console-based Python debugger.
    • wdb - An improbable web debugger through WebSockets.
  • Tracing
    • lptrace - strace for Python programs.
    • manhole - Debugging UNIX socket connections and present the stacktraces for all threads and an interactive prompt.
    • pyringe - Debugger capable of attaching to and injecting code into Python processes.
    • python-hunter - A flexible code tracing toolkit.
  • Profiler
    • line_profiler - Line-by-line profiling.
    • memory_profiler - Monitor Memory usage of Python code.
    • py-spy - A sampling profiler for Python programs. Written in Rust.
    • pyflame - A ptracing profiler For Python.
    • vprof - Visual Python profiler.
  • Others
    • django-debug-toolbar - Display various debug information for Django.
    • django-devserver - A drop-in replacement for Django's runserver.
    • flask-debugtoolbar - A port of the django-debug-toolbar to flask.
    • icecream - Inspect variables, expressions, and program execution with a single, simple function call.
    • pyelftools - Parsing and analyzing ELF files and DWARF debugging information.

 

What is a debugging tool?

A debugger is a software tool that can help the software development process by identifying coding errors at various stages of the operating system or application development. Some debuggers will analyze a test run to see what lines of code were not executed.

Debugger for Python programs with a graphical user interface. It uses bdb (part of stdlib) but adds a GUI and has some powerful features like object browser, windows for variables, classes, functions, exceptions, stack, conditional breakpoints, etc.


Libraries for Debugging Code in Popular Python

  1. IPython pdb

ipdb exports functions to access the IPython debugger, which features tab completion, syntax highlighting, better tracebacks, better introspection with the same interface as the pdb module.

Example usage:

import ipdb
ipdb.set_trace()
ipdb.set_trace(context=5)  # will show five lines of code
                           # instead of the default three lines
                           # or you can set it via IPDB_CONTEXT_SIZE env variable
                           # or setup.cfg file
ipdb.pm()
ipdb.run('x[0] = 3')
result = ipdb.runcall(function, arg0, arg1, kwarg='foo')
result = ipdb.runeval('f(1,2) - 3')

Arguments for set_trace

The set_trace function accepts context which will show as many lines of code as defined, and cond, which accepts boolean values (such as abc == 17) and will start ipdb's interface whenever cond equals to True.

Using configuration file

It's possible to set up context using a .ipdb file on your home folder, setup.cfg or pyproject.toml on your project folder. You can also set your file location via env var $IPDB_CONFIG. Your environment variable has priority over the home configuration file, which in turn has priority over the setup config file. Currently, only context setting is available.

A valid setup.cfg is as follows

[ipdb]
context=5

A valid .ipdb is as follows

context=5

A valid pyproject.toml is as follows

[tool.ipdb]
context=5

The post-mortem function, ipdb.pm(), is equivalent to the magic function %debug.

View on GitHub


2.  pdb++

pdb++, a drop-in replacement for pdb (the Python debugger)

What is it?

This module is an extension of the pdb module of the standard library. It is meant to be fully compatible with its predecessor, yet it introduces a number of new features to make your debugging experience as nice as possible.

https://user-images.githubusercontent.com/412005/64484794-2f373380-d20f-11e9-9f04-e1dabf113c6f.png

pdb++ features include:

  • colorful TAB completion of Python expressions (through fancycompleter)
  • optional syntax highlighting of code listings (through Pygments)
  • sticky mode
  • several new commands to be used from the interactive (Pdb++) prompt
  • smart command parsing (hint: have you ever typed r or c at the prompt to print the value of some variable?)
  • additional convenience functions in the pdb module, to be used from your program

pdb++ is meant to be a drop-in replacement for pdb. If you find some unexpected behavior, please report it as a bug.

Installation

Since pdb++ is not a valid package name the package is named pdbpp:

$ pip install pdbpp

pdb++ is also available via conda:

$ conda install -c conda-forge pdbpp

Alternatively, you can just put pdb.py somewhere inside your PYTHONPATH.

View on GitHub


3.  PuDB

Its goal is to provide all the niceties of modern GUI-based debuggers in a more lightweight and keyboard-friendly package. PuDB allows you to debug code right where you write and test it--in a terminal.

Here are some screenshots:

Light theme

  • doc/images/pudb-screenshot-light.png

Dark theme

  • doc/images/pudb-screenshot-dark.png

View on GitHub


4.  wdb

An improbable web debugger through WebSockets

wdb is a full featured web debugger based on a client-server architecture.

The wdb server which is responsible of managing debugging instances along with browser connections (through websockets) is based on Tornado. The wdb clients allow step by step debugging, in-program python code execution, code edition (based on CodeMirror) setting breakpoints...

Due to this architecture, all of this is fully compatible with multithread and multiprocess programs.

wdb works with python 2 (2.6, 2.7), python 3 (3.2, 3.3, 3.4, 3.5) and pypy. Even better, it is possible to debug a python 2 program with a wdb server running on python 3 and vice-versa or debug a program running on a computer with a debugging server running on another computer inside a web page on a third computer!

Even betterer, it is now possible to pause a currently running python process/thread using code injection from the web interface. (This requires gdb and ptrace enabled)

In other words it's a very enhanced version of pdb directly in your browser with nice features.

Installation:

Global installation:

    $ pip install wdb.server

In virtualenv or with a different python installation:

    $ pip install wdb

(You must have the server installed and running)

View on GitHub


5.  lptrace

lptrace is strace for Python programs. It lets you see in real-time what functions a Python program is running. It's particularly useful to debug weird issues on production.

For example, let's debug a non-trivial program, the Python SimpleHTTPServer. First, let's run the server:

vagrant@precise32:/vagrant$ python -m SimpleHTTPServer 8080 &
[1] 1818
vagrant@precise32:/vagrant$ Serving HTTP on 0.0.0.0 port 8080 ...

Now let's connect lptrace to it:

vagrant@precise32:/vagrant$ sudo python lptrace -p 1818
...
fileno (/usr/lib/python2.7/SocketServer.py:438)
meth (/usr/lib/python2.7/socket.py:223)

fileno (/usr/lib/python2.7/SocketServer.py:438)
meth (/usr/lib/python2.7/socket.py:223)

_handle_request_noblock (/usr/lib/python2.7/SocketServer.py:271)
get_request (/usr/lib/python2.7/SocketServer.py:446)
accept (/usr/lib/python2.7/socket.py:201)
__init__ (/usr/lib/python2.7/socket.py:185)
verify_request (/usr/lib/python2.7/SocketServer.py:296)
process_request (/usr/lib/python2.7/SocketServer.py:304)
finish_request (/usr/lib/python2.7/SocketServer.py:321)
__init__ (/usr/lib/python2.7/SocketServer.py:632)
setup (/usr/lib/python2.7/SocketServer.py:681)
makefile (/usr/lib/python2.7/socket.py:212)
__init__ (/usr/lib/python2.7/socket.py:246)
makefile (/usr/lib/python2.7/socket.py:212)
__init__ (/usr/lib/python2.7/socket.py:246)
handle (/usr/lib/python2.7/BaseHTTPServer.py:336)
handle_one_request (/usr/lib/python2.7/BaseHTTPServer.py:301)
^CReceived Ctrl-C, quitting
vagrant@precise32:/vagrant$

You can see that the server is handling the request in real time! After pressing Ctrl-C, the trace is removed and the program execution resumes normally.

View on GitHub


6.  python-manhole

Debugging manhole for python applications.

Manhole is in-process service that will accept unix domain socket connections and present the stacktraces for all threads and an interactive prompt. It can either work as a python daemon thread waiting for connections at all times or a signal handler (stopping your application and waiting for a connection).

Access to the socket is restricted to the application's effective user id or root.

This is just like Twisted's manhole. It's simpler (no dependencies), it only runs on Unix domain sockets (in contrast to Twisted's manhole which can run on telnet or ssh) and it integrates well with various types of applications.

Usage

Install it:

pip install manhole

You can put this in your django settings, wsgi app file, some module that's always imported early etc:

import manhole
manhole.install() # this will start the daemon thread

# and now you start your app, eg: server.serve_forever()

Now in a shell you can do either of these:

netcat -U /tmp/manhole-1234
socat - unix-connect:/tmp/manhole-1234
socat readline unix-connect:/tmp/manhole-1234

Socat with readline is best (history, editing etc). If your socat doesn't have readline try this.

Sample output:

$ nc -U /tmp/manhole-1234

Python 2.7.3 (default, Apr 10 2013, 06:20:15)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> dir()
['__builtins__', 'dump_stacktraces', 'os', 'socket', 'sys', 'traceback']
>>> print 'foobar'
foobar

View on GitHub


7.  Pyringe

Pyringe is a python debugger capable of attaching to running processes, inspecting their state and even of injecting python code into them while they're running. With pyringe, you can list threads, get tracebacks, inspect locals/globals/builtins of running functions, all without having to prepare your program for it.

How do I use it?

You can start the debugger by executing python -m pyringe. Alternatively:

import pyringe
pyringe.interact()

If that reminds you of the code module, good; this is intentional.
After starting the debugger, you'll be greeted by what behaves almost like a regular python REPL.
Try the following:

==> pid:[None] #threads:[0] current thread:[None]
>>> help()
Available commands:
 attach: Attach to the process with the given pid.
 bt: Get a backtrace of the current position.
 [...]
==> pid:[None] #threads:[0] current thread:[None]
>>> attach(12679)
==> pid:[12679] #threads:[11] current thread:[140108099462912]
>>> threads()
[140108099462912, 140108107855616, 140108116248323, 140108124641024, 140108133033728, 140108224739072, 140108233131776, 140108141426432, 140108241524480, 140108249917184, 140108269324032]

The IDs you see here correspond to what threading.current_thread().ident would tell you.
All debugger functions are just regular python functions that have been exposed to the REPL, so you can do things like the following.

==> pid:[12679] #threads:[11] current thread:[140108099462912]
>>> for tid in threads():
...   if not tid % 10:
...     thread(tid)
...     bt()
... 
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 524, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "./test.py", line 46, in Idle
    Thread_2_Func(1)
  File "./test.py", line 40, in Wait
    time.sleep(n)
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> 

You can access the inferior's locals and inspect them like so:

==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> inflocals()
{'a': <proxy of A object at remote 0x1d9b290>, 'LOL': 'success!', 'b': <proxy of B object at remote 0x1d988c0>, 'n': 1}
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> p('a')
<proxy of A object at remote 0x1d9b290>
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> p('a').attr
'Some_magic_string'
==> pid:[12679] #threads:[11] current thread:[140108241524480]
>>> 

And sure enough, the definition of a's class reads:

class Example(object):
  cl_attr = False
  def __init__(self):
    self.attr = 'Some_magic_string'

There's limits to how far this proxying of objects goes, and everything that isn't trivial data will show up as strings (like '<function at remote 0x1d957d0>').

View on GitHub


8.  python-hunter

Hunter is a flexible code tracing toolkit, not for measuring coverage, but for debugging, logging, inspection and other nefarious purposes. It has a simple Python API, a convenient terminal API and a CLI tool to attach to processes.

Installation

pip install hunter

Documentation

https://python-hunter.readthedocs.io/

Getting started

Basic use involves passing various filters to the trace option. An example:

import hunter
hunter.trace(module='posixpath', action=hunter.CallPrinter)

import os
os.path.join('a', 'b')

That would result in:

>>> os.path.join('a', 'b')
         /usr/lib/python3.6/posixpath.py:75    call      => join(a='a')
         /usr/lib/python3.6/posixpath.py:80    line         a = os.fspath(a)
         /usr/lib/python3.6/posixpath.py:81    line         sep = _get_sep(a)
         /usr/lib/python3.6/posixpath.py:41    call         => _get_sep(path='a')
         /usr/lib/python3.6/posixpath.py:42    line            if isinstance(path, bytes):
         /usr/lib/python3.6/posixpath.py:45    line            return '/'
         /usr/lib/python3.6/posixpath.py:45    return       <= _get_sep: '/'
         /usr/lib/python3.6/posixpath.py:82    line         path = a
         /usr/lib/python3.6/posixpath.py:83    line         try:
         /usr/lib/python3.6/posixpath.py:84    line         if not p:
         /usr/lib/python3.6/posixpath.py:86    line         for b in map(os.fspath, p):
         /usr/lib/python3.6/posixpath.py:87    line         if b.startswith(sep):
         /usr/lib/python3.6/posixpath.py:89    line         elif not path or path.endswith(sep):
         /usr/lib/python3.6/posixpath.py:92    line         path += sep + b
         /usr/lib/python3.6/posixpath.py:86    line         for b in map(os.fspath, p):
         /usr/lib/python3.6/posixpath.py:96    line         return path
         /usr/lib/python3.6/posixpath.py:96    return    <= join: 'a/b'
'a/b'

In a terminal it would look like:

https://raw.githubusercontent.com/ionelmc/python-hunter/master/docs/code-trace.png

Another useful scenario is to ignore all standard modules and force colors to make them stay even if the output is redirected to a file.

import hunter
hunter.trace(stdlib=False, action=hunter.CallPrinter(force_colors=True))

View on GitHub


9.  line_profiler

line_profiler is a module for doing line-by-line profiling of functions. kernprof is a convenient script for running either line_profiler or the Python standard library's cProfile or profile modules, depending on what is available.

Installation

Note: As of version 2.1.2, pip install line_profiler does not work. Please install as follows until it is fixed in the next release:

git clone https://github.com/rkern/line_profiler.git
find line_profiler -name '*.pyx' -exec cython {} \;
cd line_profiler
pip install . --user

Releases of line_profiler can be installed using pip:

$ pip install line_profiler

Source releases and any binaries can be downloaded from the PyPI link.

http://pypi.python.org/pypi/line_profiler

To check out the development sources, you can use Git:

$ git clone https://github.com/rkern/line_profiler.git

You may also download source tarballs of any snapshot from that URL.

Source releases will require a C compiler in order to build line_profiler. In addition, git checkouts will also require Cython >= 0.10. Source releases on PyPI should contain the pregenerated C sources, so Cython should not be required in that case.

kernprof is a single-file pure Python script and does not require a compiler. If you wish to use it to run cProfile and not line-by-line profiling, you may copy it to a directory on your PATH manually and avoid trying to build any C extensions.

View on GitHub


10.  Memory Profiler

This is a python module for monitoring memory consumption of a process as well as line-by-line analysis of memory consumption for python programs. It is a pure python module which depends on the psutil module.

Installation

To install through easy_install or pip:

$ easy_install -U memory_profiler # pip install -U memory_profiler

To install from source, download the package, extract and type:

$ python setup.py install

Usage

line-by-line memory usage

The line-by-line memory usage mode is used much in the same way of the line_profiler: first decorate the function you would like to profile with @profile and then run the script with a special script (in this case with specific arguments to the Python interpreter).

In the following example, we create a simple function my_func that allocates lists a, b and then deletes b:

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()

Execute the code passing the option -m memory_profiler to the python interpreter to load the memory_profiler module and print to stdout the line-by-line analysis. If the file name was example.py, this would result in:

$ python -m memory_profiler example.py

Output will follow:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a

The first column represents the line number of the code that has been profiled, the second column (Mem usage) the memory usage of the Python interpreter after that line has been executed. The third column (Increment) represents the difference in memory of the current line with respect to the last one. The last column (Line Contents) prints the code that has been profiled.

View on GitHub


11.  py-spy

py-spy is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way. py-spy is extremely low overhead: it is written in Rust for speed and doesn't run in the same process as the profiled Python program. This means py-spy is safe to use against production Python code.

py-spy works on Linux, OSX, Windows and FreeBSD, and supports profiling all recent versions of the CPython interpreter (versions 2.3-2.7 and 3.3-3.10).

Installation

Prebuilt binary wheels can be installed from PyPI with:

pip install py-spy

You can also download prebuilt binaries from the GitHub Releases Page.

If you're a Rust user, py-spy can also be installed with: cargo install py-spy.

On macOS, py-spy is in Homebrew and can be installed with brew install py-spy.

On Arch Linux, py-spy is in AUR and can be installed with yay -S py-spy.

On Alpine Linux, py-spy is in testing repository and can be installed with apk add py-spy --update-cache --repository http://dl-3.alpinelinux.org/alpine/edge/testing/ --allow-untrusted.

Usage

py-spy works from the command line and takes either the PID of the program you want to sample from or the command line of the python program you want to run. py-spy has three subcommands record, top and dump:

record

py-spy supports recording profiles to a file using the record command. For example, you can generate a flame graph of your python process by going:

py-spy record -o profile.svg --pid 12345
# OR
py-spy record -o profile.svg -- python myprogram.py

View on GitHub


12.  Pyflame

Pyflame is a high performance profiling tool that generates flame graphs for Python. Pyflame is implemented in C++, and uses the Linux ptrace(2) system call to collect profiling information. It can take snapshots of the Python call stack without explicit instrumentation, meaning you can profile a program without modifying its source code. Pyflame is capable of profiling embedded Python interpreters like uWSGI. It fully supports profiling multi-threaded Python programs.

Pyflame usually introduces significantly less overhead than the builtin profile (or cProfile) modules, and emits richer profiling data. The profiling overhead is low enough that you can use it to profile live processes in production.

Quickstart

Building And Installing

For Debian/Ubuntu, install the following:

# Install build dependencies on Debian or Ubuntu.
sudo apt-get install autoconf automake autotools-dev g++ pkg-config python-dev python3-dev libtool make

Once you have the build dependencies installed:

./autogen.sh
./configure
make

The make command will produce an executable at src/pyflame that you can run and use.

Optionally, if you have virtualenv installed, you can test the executable you produced using make check.

Using Pyflame

The full documentation for using Pyflame is here. But here's a quick guide:

# Attach to PID 12345 and profile it for 1 second
pyflame -p 12345

# Attach to PID 768 and profile it for 5 seconds, sampling every 0.01 seconds
pyflame -s 5 -r 0.01 -p 768

# Run py.test against tests/, emitting sample data to prof.txt
pyflame -o prof.txt -t py.test tests/

In all of these cases you will get flame graph data on stdout (or to a file if you used -o). This data is in the format expected by flamegraph.pl, which you can find here.

View on GitHub


13.  vprof

vprof is a Python package providing rich and interactive visualizations for various Python program characteristics such as running time and memory usage. It supports Python 3.4+ and distributed under BSD license.

The project is in active development and some of its features might not work as expected.

Installation

vprof can be installed from PyPI

pip install vprof

To build vprof from sources, clone this repository and execute

python3 setup.py deps_install && python3 setup.py build_ui && python3 setup.py install

To install just vprof dependencies, run

python3 setup.py deps_install

Usage

vprof -c <config> <src>

<config> is a combination of supported modes:

  • c - CPU flame graph ⚠️ Not available for windows #62

Shows CPU flame graph for <src>.

  • p - profiler

Runs built-in Python profiler on <src> and displays results.

  • m - memory graph

Shows objects that are tracked by CPython GC and left in memory after code execution. Also shows process memory usage after execution of each line of <src>.

  • h - code heatmap

Displays all executed code of <src> with line run times and execution counts.

View on GitHub


14.  Django Debug Toolbar

The Django Debug Toolbar is a configurable set of panels that display various debug information about the current request/response and when clicked, display more details about the panel's content.


Here's a screenshot of the toolbar in action:

Django Debug Toolbar screenshot

In addition to the built-in panels, a number of third-party panels are contributed by the community.

The current stable version of the Debug Toolbar is 3.6.0. It works on Django ≥ 3.2.4.

View on GitHub


15.  django-devserver

A drop in replacement for Django's built-in runserver command. Features include:

  • An extendable interface for handling things such as real-time logging.
  • Integration with the werkzeug interactive debugger.
  • Threaded (default) and multi-process development servers.
  • Ability to specify a WSGI application as your target environment.

Note

django-devserver works on Django 1.3 and newer

Installation

To install the latest stable version:

pip install git+git://github.com/dcramer/django-devserver#egg=django-devserver

django-devserver has some optional dependancies, which we highly recommend installing.

  • pip install sqlparse -- pretty SQL formatting
  • pip install werkzeug -- interactive debugger
  • pip install guppy -- tracks memory usage (required for MemoryUseModule)
  • pip install line_profiler -- does line-by-line profiling (required for LineProfilerModule)

You will need to include devserver in your INSTALLED_APPS:

INSTALLED_APPS = (
    ...
    'devserver',
)

If you're using django.contrib.staticfiles or any other apps with management command runserver, make sure to put devserver above any of them (or below, for Django<1.7). Otherwise devserver will log an error, but it will fail to work properly.

View on GitHub


16.  Flask Debug-toolbar

This is a port of the excellent django-debug-toolbar for Flask applications.

Installation

Installing is simple with pip:

$ pip install flask-debugtoolbar

Usage

Setting up the debug toolbar is simple:

from flask import Flask
from flask_debugtoolbar import DebugToolbarExtension

app = Flask(__name__)

# the toolbar is only enabled in debug mode:
app.debug = True

# set a 'SECRET_KEY' to enable the Flask session cookies
app.config['SECRET_KEY'] = '<replace with a secret key>'

toolbar = DebugToolbarExtension(app)

The toolbar will automatically be injected into Jinja templates when debug mode is on. In production, setting app.debug = False will disable the toolbar.

View on GitHub


17.  IceCream

Do you ever use print() or log() to debug your code? Of course you do. IceCream, or ic for short, makes print debugging a little sweeter.

ic() is like print(), but better:

  1. It prints both expressions/variable names and their values.
  2. It's 40% faster to type.
  3. Data structures are pretty printed.
  4. Output is syntax highlighted.
  5. It optionally includes program context: filename, line number, and parent function.

IceCream is well tested, permissively licensed, and supports Python 2, Python 3, PyPy2, and PyPy3. (Python 3.11 support is forthcoming.)

Inspect Variables

Have you ever printed variables or expressions to debug your program? If you've ever typed something like

print(foo('123'))

or the more thorough

print("foo('123')", foo('123'))

then ic() will put a smile on your face. With arguments, ic() inspects itself and prints both its own arguments and the values of those arguments.

from icecream import ic

def foo(i):
    return i + 333

ic(foo(123))

Prints

ic| foo(123): 456

Similarly,

d = {'key': {1: 'one'}}
ic(d['key'][1])

class klass():
    attr = 'yep'
ic(klass.attr)

Prints

ic| d['key'][1]: 'one'
ic| klass.attr: 'yep'

Just give ic() a variable or expression and you're done. Easy.

View on GitHub


18.  pyelftools

pyelftools is a pure-Python library for parsing and analyzing ELF files and DWARF debugging information. See the User's guide for more details.

Pre-requisites

As a user of pyelftools, one only needs Python 3 to run. For hacking on pyelftools the requirements are a bit more strict, please see the hacking guide.

Installing

pyelftools can be installed from PyPI (Python package index):

> pip install pyelftools

Alternatively, you can download the source distribution for the most recent and historic versions from the Downloads tab on the pyelftools project page (by going to Tags). Then, you can install from source, as usual:

> python setup.py install

Since pyelftools is a work in progress, it's recommended to have the most recent version of the code. This can be done by downloading the master zip file or just cloning the Git repository.

Since pyelftools has no external dependencies, it's also easy to use it without installing, by locally adjusting PYTHONPATH.

View on GitHub


FAQ about Debugging Tools python

  • How many types of debugging are in Python?

Debugging in any programming language typically involves two types of errors: syntax or logical. Syntax errors are those where the programming language commands are not interpreted by the compiler or interpreter because of a problem with how the program is written.

  • Best Debugging Tools include:

Chrome DevTools, Progress Telerik Fiddler, GDB (GNU Debugger), Data Display Debugger, SonarLint, Froglogic Squish, and TotalView HPC Debugging Software.

  • Why is it called debugging?

The terms "bug" and "debugging" are popularly attributed to Admiral Grace Hopper in the 1940s. While she was working on a Mark II computer at Harvard University, her associates discovered a moth stuck in a relay and thereby impeding operation, whereupon she remarked that they were "debugging" the system.

  • Why do we need debugging?

Debugging is important because it allows software engineers and developers to fix errors in a program before releasing it to the public. It's a complementary process to testing, which involves learning how an error affects a program overall.


Related videos:

Python Tutorial - Introduction to DEBUGGING


Related posts:

#python 

Tamale  Moses

Tamale Moses

1669003576

Exploring Mutable and Immutable in Python

In this Python article, let's learn about Mutable and Immutable in Python. 

Mutable and Immutable in Python

Mutable is a fancy way of saying that the internal state of the object is changed/mutated. So, the simplest definition is: An object whose internal state can be changed is mutable. On the other hand, immutable doesn’t allow any change in the object once it has been created.

Both of these states are integral to Python data structure. If you want to become more knowledgeable in the entire Python Data Structure, take this free course which covers multiple data structures in Python including tuple data structure which is immutable. You will also receive a certificate on completion which is sure to add value to your portfolio.

Mutable Definition

Mutable is when something is changeable or has the ability to change. In Python, ‘mutable’ is the ability of objects to change their values. These are often the objects that store a collection of data.

Immutable Definition

Immutable is the when no change is possible over time. In Python, if the value of an object cannot be changed over time, then it is known as immutable. Once created, the value of these objects is permanent.

List of Mutable and Immutable objects

Objects of built-in type that are mutable are:

  • Lists
  • Sets
  • Dictionaries
  • User-Defined Classes (It purely depends upon the user to define the characteristics) 

Objects of built-in type that are immutable are:

  • Numbers (Integer, Rational, Float, Decimal, Complex & Booleans)
  • Strings
  • Tuples
  • Frozen Sets
  • User-Defined Classes (It purely depends upon the user to define the characteristics)

Object mutability is one of the characteristics that makes Python a dynamically typed language. Though Mutable and Immutable in Python is a very basic concept, it can at times be a little confusing due to the intransitive nature of immutability.

Objects in Python

In Python, everything is treated as an object. Every object has these three attributes:

  • Identity – This refers to the address that the object refers to in the computer’s memory.
  • Type – This refers to the kind of object that is created. For example- integer, list, string etc. 
  • Value – This refers to the value stored by the object. For example – List=[1,2,3] would hold the numbers 1,2 and 3

While ID and Type cannot be changed once it’s created, values can be changed for Mutable objects.

Check out this free python certificate course to get started with Python.

Mutable Objects in Python

I believe, rather than diving deep into the theory aspects of mutable and immutable in Python, a simple code would be the best way to depict what it means in Python. Hence, let us discuss the below code step-by-step:

#Creating a list which contains name of Indian cities  

cities = [‘Delhi’, ‘Mumbai’, ‘Kolkata’]

# Printing the elements from the list cities, separated by a comma & space

for city in cities:
		print(city, end=’, ’)

Output [1]: Delhi, Mumbai, Kolkata

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(cities)))

Output [2]: 0x1691d7de8c8

#Adding a new city to the list cities

cities.append(‘Chennai’)

#Printing the elements from the list cities, separated by a comma & space 

for city in cities:
	print(city, end=’, ’)

Output [3]: Delhi, Mumbai, Kolkata, Chennai

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(cities)))

Output [4]: 0x1691d7de8c8

The above example shows us that we were able to change the internal state of the object ‘cities’ by adding one more city ‘Chennai’ to it, yet, the memory address of the object did not change. This confirms that we did not create a new object, rather, the same object was changed or mutated. Hence, we can say that the object which is a type of list with reference variable name ‘cities’ is a MUTABLE OBJECT.

Let us now discuss the term IMMUTABLE. Considering that we understood what mutable stands for, it is obvious that the definition of immutable will have ‘NOT’ included in it. Here is the simplest definition of immutable– An object whose internal state can NOT be changed is IMMUTABLE.

Again, if you try and concentrate on different error messages, you have encountered, thrown by the respective IDE; you use you would be able to identify the immutable objects in Python. For instance, consider the below code & associated error message with it, while trying to change the value of a Tuple at index 0. 

#Creating a Tuple with variable name ‘foo’

foo = (1, 2)

#Changing the index[0] value from 1 to 3

foo[0] = 3
	
TypeError: 'tuple' object does not support item assignment 

Immutable Objects in Python

Once again, a simple code would be the best way to depict what immutable stands for. Hence, let us discuss the below code step-by-step:

#Creating a Tuple which contains English name of weekdays

weekdays = ‘Sunday’, ‘Monday’, ‘Tuesday’, ‘Wednesday’, ‘Thursday’, ‘Friday’, ‘Saturday’

# Printing the elements of tuple weekdays

print(weekdays)

Output [1]:  (‘Sunday’, ‘Monday’, ‘Tuesday’, ‘Wednesday’, ‘Thursday’, ‘Friday’, ‘Saturday’)

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(weekdays)))

Output [2]: 0x1691cc35090

#tuples are immutable, so you cannot add new elements, hence, using merge of tuples with the # + operator to add a new imaginary day in the tuple ‘weekdays’

weekdays  +=  ‘Pythonday’,

#Printing the elements of tuple weekdays

print(weekdays)

Output [3]: (‘Sunday’, ‘Monday’, ‘Tuesday’, ‘Wednesday’, ‘Thursday’, ‘Friday’, ‘Saturday’, ‘Pythonday’)

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(weekdays)))

Output [4]: 0x1691cc8ad68

This above example shows that we were able to use the same variable name that is referencing an object which is a type of tuple with seven elements in it. However, the ID or the memory location of the old & new tuple is not the same. We were not able to change the internal state of the object ‘weekdays’. The Python program manager created a new object in the memory address and the variable name ‘weekdays’ started referencing the new object with eight elements in it.  Hence, we can say that the object which is a type of tuple with reference variable name ‘weekdays’ is an IMMUTABLE OBJECT.

Also Read: Understanding the Exploratory Data Analysis (EDA) in Python

Where can you use mutable and immutable objects:

Mutable objects can be used where you want to allow for any updates. For example, you have a list of employee names in your organizations, and that needs to be updated every time a new member is hired. You can create a mutable list, and it can be updated easily.

Immutability offers a lot of useful applications to different sensitive tasks we do in a network centred environment where we allow for parallel processing. By creating immutable objects, you seal the values and ensure that no threads can invoke overwrite/update to your data. This is also useful in situations where you would like to write a piece of code that cannot be modified. For example, a debug code that attempts to find the value of an immutable object.

Watch outs:  Non transitive nature of Immutability:

OK! Now we do understand what mutable & immutable objects in Python are. Let’s go ahead and discuss the combination of these two and explore the possibilities. Let’s discuss, as to how will it behave if you have an immutable object which contains the mutable object(s)? Or vice versa? Let us again use a code to understand this behaviour–

#creating a tuple (immutable object) which contains 2 lists(mutable) as it’s elements

#The elements (lists) contains the name, age & gender 

person = (['Ayaan', 5, 'Male'], ['Aaradhya', 8, 'Female'])

#printing the tuple

print(person)

Output [1]: (['Ayaan', 5, 'Male'], ['Aaradhya', 8, 'Female'])

#printing the location of the object created in the memory address in hexadecimal format

print(hex(id(person)))

Output [2]: 0x1691ef47f88

#Changing the age for the 1st element. Selecting 1st element of tuple by using indexing [0] then 2nd element of the list by using indexing [1] and assigning a new value for age as 4

person[0][1] = 4

#printing the updated tuple

print(person)

Output [3]: (['Ayaan', 4, 'Male'], ['Aaradhya', 8, 'Female'])

#printing the location of the object created in the memory address in hexadecimal format

print(hex(id(person)))

Output [4]: 0x1691ef47f88

In the above code, you can see that the object ‘person’ is immutable since it is a type of tuple. However, it has two lists as it’s elements, and we can change the state of lists (lists being mutable). So, here we did not change the object reference inside the Tuple, but the referenced object was mutated.

Also Read: Real-Time Object Detection Using TensorFlow

Same way, let’s explore how it will behave if you have a mutable object which contains an immutable object? Let us again use a code to understand the behaviour–

#creating a list (mutable object) which contains tuples(immutable) as it’s elements

list1 = [(1, 2, 3), (4, 5, 6)]

#printing the list

print(list1)

Output [1]: [(1, 2, 3), (4, 5, 6)]

#printing the location of the object created in the memory address in hexadecimal format

print(hex(id(list1)))

Output [2]: 0x1691d5b13c8	

#changing object reference at index 0

list1[0] = (7, 8, 9)

#printing the list

Output [3]: [(7, 8, 9), (4, 5, 6)]

#printing the location of the object created in the memory address in hexadecimal format

print(hex(id(list1)))

Output [4]: 0x1691d5b13c8

As an individual, it completely depends upon you and your requirements as to what kind of data structure you would like to create with a combination of mutable & immutable objects. I hope that this information will help you while deciding the type of object you would like to select going forward.

Before I end our discussion on IMMUTABILITY, allow me to use the word ‘CAVITE’ when we discuss the String and Integers. There is an exception, and you may see some surprising results while checking the truthiness for immutability. For instance:
#creating an object of integer type with value 10 and reference variable name ‘x’ 

x = 10
 

#printing the value of ‘x’

print(x)

Output [1]: 10

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(x)))

Output [2]: 0x538fb560

#creating an object of integer type with value 10 and reference variable name ‘y’

y = 10

#printing the value of ‘y’

print(y)

Output [3]: 10

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(y)))

Output [4]: 0x538fb560

As per our discussion and understanding, so far, the memory address for x & y should have been different, since, 10 is an instance of Integer class which is immutable. However, as shown in the above code, it has the same memory address. This is not something that we expected. It seems that what we have understood and discussed, has an exception as well.

Quick checkPython Data Structures

Immutability of Tuple

Tuples are immutable and hence cannot have any changes in them once they are created in Python. This is because they support the same sequence operations as strings. We all know that strings are immutable. The index operator will select an element from a tuple just like in a string. Hence, they are immutable.

Exceptions in immutability

Like all, there are exceptions in the immutability in python too. Not all immutable objects are really mutable. This will lead to a lot of doubts in your mind. Let us just take an example to understand this.

Consider a tuple ‘tup’.

Now, if we consider tuple tup = (‘GreatLearning’,[4,3,1,2]) ;

We see that the tuple has elements of different data types. The first element here is a string which as we all know is immutable in nature. The second element is a list which we all know is mutable. Now, we all know that the tuple itself is an immutable data type. It cannot change its contents. But, the list inside it can change its contents. So, the value of the Immutable objects cannot be changed but its constituent objects can. change its value.

FAQs

1. Difference between mutable vs immutable in Python?

Mutable ObjectImmutable Object
State of the object can be modified after it is created.State of the object can’t be modified once it is created.
They are not thread safe.They are thread safe
Mutable classes are not final.It is important to make the class final before creating an immutable object.

2. What are the mutable and immutable data types in Python?

  • Some mutable data types in Python are:

list, dictionary, set, user-defined classes.

  • Some immutable data types are: 

int, float, decimal, bool, string, tuple, range.

3. Are lists mutable in Python?

Lists in Python are mutable data types as the elements of the list can be modified, individual elements can be replaced, and the order of elements can be changed even after the list has been created.
(Examples related to lists have been discussed earlier in this blog.)

4. Why are tuples called immutable types?

Tuple and list data structures are very similar, but one big difference between the data types is that lists are mutable, whereas tuples are immutable. The reason for the tuple’s immutability is that once the elements are added to the tuple and the tuple has been created; it remains unchanged.

A programmer would always prefer building a code that can be reused instead of making the whole data object again. Still, even though tuples are immutable, like lists, they can contain any Python object, including mutable objects.

5. Are sets mutable in Python?

A set is an iterable unordered collection of data type which can be used to perform mathematical operations (like union, intersection, difference etc.). Every element in a set is unique and immutable, i.e. no duplicate values should be there, and the values can’t be changed. However, we can add or remove items from the set as the set itself is mutable.

6. Are strings mutable in Python?

Strings are not mutable in Python. Strings are a immutable data types which means that its value cannot be updated.

Join Great Learning Academy’s free online courses and upgrade your skills today.


Original article source at: https://www.mygreatlearning.com

#python 

Ray  Patel

Ray Patel

1625843760

Python Packages in SQL Server – Get Started with SQL Server Machine Learning Services

Introduction

When installing Machine Learning Services in SQL Server by default few Python Packages are installed. In this article, we will have a look on how to get those installed python package information.

Python Packages

When we choose Python as Machine Learning Service during installation, the following packages are installed in SQL Server,

  • revoscalepy – This Microsoft Python package is used for remote compute contexts, streaming, parallel execution of rx functions for data import and transformation, modeling, visualization, and analysis.
  • microsoftml – This is another Microsoft Python package which adds machine learning algorithms in Python.
  • Anaconda 4.2 – Anaconda is an opensource Python package

#machine learning #sql server #executing python in sql server #machine learning using python #machine learning with sql server #ml in sql server using python #python in sql server ml #python packages #python packages for machine learning services #sql server machine learning services