Actors.jl: Concurrent Computing in Julia Based on The Actor Model

Actors.jl

Concurrent computing in Julia with actors.


Actors implements the Actor Model of computation:

An actor ... in response to a message it receives, can concurrently:

  • send a finite number of messages to other actors;
  • create a finite number of new actors;
  • designate the behavior to be used for the next message it receives.

Actors makes concurrency easy to understand and reason about and integrates well with Julia's multi-threading and distributed computing. It provides an API for writing reactive applications that are:

  • responsive: react to inputs and events,
  • message-driven: rely on asynchronous message-passing,
  • resilient: can cope with failures,
  • elastic: can distribute load over multiple threads and workers.

Greeting Actors

The following example defines two behavior functions, greet and hello, and spawns two actors with them. sayhello forwards a message to greeter, gets a greeting string back, and delivers it as the result:

julia> using Actors

julia> import Actors: spawn

julia> greet(greeting, msg) = greeting*", "*msg*"!" # a greetings server
greet (generic function with 1 method)

julia> hello(greeter, to) = request(greeter, to)    # a greetings client
hello (generic function with 1 method)

julia> greeter = spawn(greet, "Hello")              # start the server with a greet string
Link{Channel{Any}}(Channel{Any}(sz_max:32,sz_curr:0), 1, :default)

julia> sayhello = spawn(hello, greeter)             # start the client with a link to the server
Link{Channel{Any}}(Channel{Any}(sz_max:32,sz_curr:0), 1, :default)

julia> request(sayhello, "World")                   # request the client
"Hello, World!"

julia> request(sayhello, "Kermit")
"Hello, Kermit!"

Please look into the manual for more information and more serious examples.

Development

Actors is part of the Julia GitHub group JuliaActors. Please join!


Download Details:

Author: JuliaActors
Source Code: https://github.com/JuliaActors/Actors.jl 
License: MIT license

#julia #actors #concurrency 


Non-blocking synchronization primitives for PHP based on Amp & Revolt

amphp/sync

AMPHP is a collection of event-driven libraries for PHP designed with fibers and concurrency in mind. amphp/sync specifically provides synchronization primitives such as locks and semaphores for asynchronous and concurrent programming.

Installation

This package can be installed as a Composer dependency.

composer require amphp/sync

Usage

The weak link when managing concurrency is humans, so amphp/sync provides abstractions to hide some of the complexity.

Mutex

Mutual exclusion can be achieved using Amp\Sync\synchronized() and any Mutex implementation, or by manually using the Mutex instance to acquire a Lock.

As long as the resulting Lock object isn't released using Lock::release() or by being garbage collected, the holder of the lock can exclusively run some code as long as all other parties running the same code also acquire a lock before doing so.

function writeExclusively(Amp\Sync\Mutex $mutex, string $filePath, string $data) {
    $lock = $mutex->acquire();
    
    try {
        Amp\File\write($filePath, $data);
    } finally {
        $lock->release();
    }
}

The same can be written more concisely using Amp\Sync\synchronized():

function writeExclusively(Amp\Sync\Mutex $mutex, string $filePath, string $data) {
    Amp\Sync\synchronized($mutex, fn () => Amp\File\write($filePath, $data));
}

Semaphore

Semaphores are another synchronization primitive in addition to mutual exclusion.

Instead of providing exclusive access to a single party, they provide access to a limited set of N parties at the same time. This makes them great to control concurrency, e.g. limiting an HTTP client to X concurrent requests, so the HTTP server doesn't get overwhelmed.

Similar to Mutex, Lock instances can be acquired using Semaphore::acquire(). Please refer to the Mutex documentation for additional usage documentation, as they're basically equivalent except for the fact that Mutex is always a Semaphore with a count of exactly one party.
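
For instance, limiting a section to ten concurrent parties might look like this (a minimal sketch, not from the original documentation; fetch() stands in for any operation, as in the examples below):

use Amp\Sync\LocalSemaphore;
use Amp\Sync\Semaphore;

function fetchLimited(Semaphore $semaphore, string $url): int
{
    // Wait until one of the N slots is free.
    $lock = $semaphore->acquire();

    try {
        return fetch($url); // hypothetical helper returning an HTTP status code
    } finally {
        $lock->release(); // free the slot for the next waiting party
    }
}

// At most 10 fetches run concurrently.
$semaphore = new LocalSemaphore(10);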

In many cases you can use amphp/pipeline instead of directly using a Semaphore.

Concurrency Approaches

Given you have a list of URLs you want to crawl, let's discuss a few possible approaches. For simplicity, we will assume a fetch function already exists, which takes a URL and returns the HTTP status code (which is everything we want to know for these examples).

Approach 1: Sequential

Simple loop using non-blocking I/O, but no concurrency while fetching the individual URLs; the second request starts only once the first has completed.

$urls = [...];

$results = [];

foreach ($urls as $url) {
    $results[$url] = fetch($url);
}

var_dump($results);

Approach 2: Everything Concurrently

Almost the same loop, but all operations are awaited at once; all requests start immediately. This might not be feasible with too many URLs.

$urls = [...];

$results = [];

foreach ($urls as $url) {
    $results[$url] = Amp\async(fetch(...), $url);
}

$results = Amp\Future\await($results);

var_dump($results);

Approach 3: Concurrent Chunks

Splitting the jobs into chunks of ten; all requests within a chunk are made concurrently, but the chunks themselves run sequentially, so the timing of each chunk depends on its slowest response; the eleventh request starts as soon as the first ten have completed.

$urls = [...];

$results = [];

foreach (\array_chunk($urls, 10) as $chunk) {
    $futures = [];

    foreach ($chunk as $url) {
        $futures[$url] = Amp\async(fetch(...), $url);
    }

    $results = \array_merge($results, Amp\Future\await($futures));
}

var_dump($results);

Approach 4: ConcurrentIterator

TODO: Link to example of amphp/pipeline

Versioning

amphp/sync follows the semver semantic versioning specification like all other amphp packages.

Security

If you discover any security related issues, please email me@kelunik.com instead of using the issue tracker.

Download Details:

Author: amphp
Source Code: https://github.com/amphp/sync 
License: MIT license

#php #concurrency #async 


Parallel: Parallel processing for PHP based on Amp

Parallel

amphp/parallel provides true parallel processing for PHP using multiple processes or native threads, without blocking and no extensions required.

To be as flexible as possible, this library comes with a collection of non-blocking concurrency tools that can be used independently as needed, as well as an "opinionated" worker API that allows you to assign units of work to a pool of worker threads or processes.

Installation

This package can be installed as a Composer dependency.

composer require amphp/parallel

Usage

The basic usage of this library is to submit blocking tasks to be executed by a worker pool in order to avoid blocking the main event loop.

<?php

require __DIR__ . '/../vendor/autoload.php';

use Amp\Parallel\Worker;
use Amp\Promise;

$urls = [
    'https://secure.php.net',
    'https://amphp.org',
    'https://github.com',
];

$promises = [];
foreach ($urls as $url) {
    $promises[$url] = Worker\enqueueCallable('file_get_contents', $url);
}

$responses = Promise\wait(Promise\all($promises));

foreach ($responses as $url => $response) {
    \printf("Read %d bytes from %s\n", \strlen($response), $url);
}

file_get_contents is just used as an example for a blocking function here. If you just want to fetch multiple HTTP resources concurrently, it's better to use amphp/http-client, our non-blocking HTTP client.

The functions you call must be predefined or autoloadable by Composer so they also exist in the worker processes. Instead of simple callables, you can also enqueue Task instances with Amp\Parallel\Worker\enqueue().
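
A custom Task might look like this (a hedged sketch against the promise-based v1 API used above; the class name and payload are illustrative, and, like the callables, the class must be autoloadable so it exists in the worker processes):

use Amp\Parallel\Worker;
use Amp\Parallel\Worker\Environment;
use Amp\Parallel\Worker\Task;
use Amp\Promise;

class SlugifyTask implements Task
{
    private $title;

    public function __construct(string $title)
    {
        $this->title = $title;
    }

    // Runs inside a worker process, off the main event loop.
    public function run(Environment $environment)
    {
        return \strtolower(\preg_replace('/[^a-z0-9]+/i', '-', $this->title));
    }
}

$slug = Promise\wait(Worker\enqueue(new SlugifyTask('Hello, Parallel World!')));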

Documentation

Documentation can be found on amphp.org/parallel as well as in the ./docs directory.

Versioning

amphp/parallel follows the semver semantic versioning specification like all other amphp packages.

Security

If you discover any security related issues, please email me@kelunik.com instead of using the issue tracker.

Development and Contributing

Want to hack on the source? A Vagrant box is provided with the repository to give a common development environment for running concurrent threads and processes, and comes with a bunch of handy tools and scripts for testing and experimentation.

Starting up and logging into the virtual machine is as simple as

vagrant up && vagrant ssh

Once inside the VM, you can install PHP extensions with Pickle, switch versions with newphp VERSION, and test for memory leaks with Valgrind.

Download Details:

Author: amphp
Source Code: https://github.com/amphp/parallel 
License: MIT license

#php #parallel #concurrency 


Concurrent-map: A Thread-safe Concurrent Map for Go

Concurrent map 

As explained here and here, the map type in Go doesn't support concurrent reads and writes. concurrent-map provides a high-performance solution to this by sharding the map with minimal time spent waiting for locks.

Prior to Go 1.9, there was no concurrent map implementation in the stdlib. In Go 1.9, sync.Map was introduced. The new sync.Map has a few key differences from this map. The stdlib sync.Map is designed for append-only scenarios. So if you want to use the map for something more like an in-memory db, you might benefit from using our version. You can read more about it in the golang repo, for example here and here.

usage

Import the package:

import (
    "github.com/orcaman/concurrent-map/v2"
)

and install it with:

go get "github.com/orcaman/concurrent-map/v2"

The package is now imported under the "cmap" namespace.

example


package main

import (
    "fmt"

    cmap "github.com/orcaman/concurrent-map/v2"
)

func main() {
    // Create a new map with string values.
    m := cmap.New[string]()

    // Set item within map: sets "bar" under key "foo".
    m.Set("foo", "bar")

    // Retrieve item from map.
    if bar, ok := m.Get("foo"); ok {
        fmt.Println(bar)
    }

    // Remove item under key "foo".
    m.Remove("foo")
}

For more examples have a look at concurrent_map_test.go.

Running tests:

go test "github.com/orcaman/concurrent-map/v2"

guidelines for contributing

Contributions are highly welcome. In order for a contribution to be merged, please follow these guidelines:

  • Open an issue and describe what you are after (fixing a bug, adding an enhancement, etc.).
  • Based on the core team's feedback on the above-mentioned issue, submit a pull request describing the changes and linking to the issue.
  • New code must have test coverage.
  • If the code is about performance issues, you must include benchmarks in the process (either in the issue or in the PR).
  • In general, we would like to keep concurrent-map as simple as possible and as similar as possible to the native map. Please keep this in mind when opening issues.


Download Details:

Author: orcaman
Source Code: https://github.com/orcaman/concurrent-map 
License: MIT license

#go #golang #map #concurrency 


Crawler: An Easy to Use, Powerful Crawler Implemented in PHP

🕸 Crawl the web using PHP 🕷

This package provides a class to crawl links on a website. Under the hood Guzzle promises are used to crawl multiple urls concurrently.

Because the crawler can execute JavaScript, it can crawl JavaScript rendered sites. Under the hood Chrome and Puppeteer are used to power this feature.

Installation

This package can be installed via Composer:

composer require spatie/crawler

Usage

The crawler can be instantiated like this:

use Spatie\Crawler\Crawler;

Crawler::create()
    ->setCrawlObserver(<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>)
    ->startCrawling($url);

The argument passed to setCrawlObserver must be an object that extends the \Spatie\Crawler\CrawlObservers\CrawlObserver abstract class:

namespace Spatie\Crawler\CrawlObservers;

use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\UriInterface;

abstract class CrawlObserver
{
    /**
     * Called when the crawler will crawl the url.
     *
     * @param \Psr\Http\Message\UriInterface $url
     */
    public function willCrawl(UriInterface $url): void
    {
    }

    /**
     * Called when the crawler has crawled the given url successfully.
     *
     * @param \Psr\Http\Message\UriInterface $url
     * @param \Psr\Http\Message\ResponseInterface $response
     * @param \Psr\Http\Message\UriInterface|null $foundOnUrl
     */
    abstract public function crawled(
        UriInterface $url,
        ResponseInterface $response,
        ?UriInterface $foundOnUrl = null
    ): void;

    /**
     * Called when the crawler had a problem crawling the given url.
     *
     * @param \Psr\Http\Message\UriInterface $url
     * @param \GuzzleHttp\Exception\RequestException $requestException
     * @param \Psr\Http\Message\UriInterface|null $foundOnUrl
     */
    abstract public function crawlFailed(
        UriInterface $url,
        RequestException $requestException,
        ?UriInterface $foundOnUrl = null
    ): void;

    /**
     * Called when the crawl has ended.
     */
    public function finishedCrawling(): void
    {
    }
}

Using multiple observers

You can set multiple observers with setCrawlObservers:

Crawler::create()
    ->setCrawlObservers([
        <class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>,
        <class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>,
        ...
     ])
    ->startCrawling($url);

Alternatively you can set multiple observers one by one with addCrawlObserver:

Crawler::create()
    ->addCrawlObserver(<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>)
    ->addCrawlObserver(<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>)
    ->addCrawlObserver(<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>)
    ->startCrawling($url);

Executing JavaScript

By default, the crawler will not execute JavaScript. This is how you can enable the execution of JavaScript:

Crawler::create()
    ->executeJavaScript()
    ...

In order to make it possible to get the body html after the javascript has been executed, this package depends on our Browsershot package. This package uses Puppeteer under the hood. Here are some pointers on how to install it on your system.

Browsershot will make an educated guess as to where its dependencies are installed on your system. By default, the Crawler will instantiate a new Browsershot instance. If you need a custom instance, you can set it using the setBrowsershot(Browsershot $browsershot) method.

Crawler::create()
    ->setBrowsershot($browsershot)
    ->executeJavaScript()
    ...

Note that the crawler will still work even if you don't have the system dependencies required by Browsershot. These system dependencies are only required if you're calling executeJavaScript().

Filtering certain urls

You can tell the crawler not to visit certain urls by using the setCrawlProfile-function. That function expects an object that extends Spatie\Crawler\CrawlProfiles\CrawlProfile:

/*
 * Determine if the given url should be crawled.
 */
public function shouldCrawl(UriInterface $url): bool;
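
For example, a profile that skips a site's admin area might look like this (a hedged sketch; the class name is illustrative):

use Psr\Http\Message\UriInterface;
use Spatie\Crawler\CrawlProfiles\CrawlProfile;

class SkipAdminPages extends CrawlProfile
{
    public function shouldCrawl(UriInterface $url): bool
    {
        return ! str_starts_with($url->getPath(), '/admin');
    }
}

Crawler::create()
    ->setCrawlProfile(new SkipAdminPages())
    ->startCrawling($url);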

This package comes with three CrawlProfiles out of the box:

  • CrawlAllUrls: this profile will crawl all urls on all pages including urls to an external site.
  • CrawlInternalUrls: this profile will only crawl the internal urls on the pages of a host.
  • CrawlSubdomains: this profile will only crawl urls on the host and its subdomains.

Ignoring robots.txt and robots meta

By default, the crawler will respect robots data. It is possible to disable these checks like so:

Crawler::create()
    ->ignoreRobots()
    ...

Robots data can come from either a robots.txt file, meta tags or response headers. More information on the spec can be found here: http://www.robotstxt.org/.

Parsing robots data is done by our package spatie/robots-txt.

Accept links with rel="nofollow" attribute

By default, the crawler will reject all links containing attribute rel="nofollow". It is possible to disable these checks like so:

Crawler::create()
    ->acceptNofollowLinks()
    ...

Using a custom User Agent

To have robots.txt rules applied to a custom User Agent, you can specify your own User Agent.

Crawler::create()
    ->setUserAgent('my-agent')

You can add your specific crawl rule group for 'my-agent' in robots.txt. This example disallows crawling the entire site for crawlers identified by 'my-agent'.

// Disallow crawling for my-agent
User-agent: my-agent
Disallow: /

Setting the number of concurrent requests

To improve the speed of the crawl the package concurrently crawls 10 urls by default. If you want to change that number you can use the setConcurrency method.

Crawler::create()
    ->setConcurrency(1) // now all urls will be crawled one by one

Defining Crawl Limits

By default, the crawler continues until it has crawled every page it can find. This behavior might cause issues if you are working in an environment with limitations such as a serverless environment.

The crawl behavior can be controlled with the following two options:

  • Total Crawl Limit (setTotalCrawlLimit): This limit defines the maximal count of URLs to crawl.
  • Current Crawl Limit (setCurrentCrawlLimit): This defines how many URLs are processed during the current crawl.

Let's take a look at some examples to clarify the difference between these two methods.

Example 1: Using the total crawl limit

The setTotalCrawlLimit method allows you to limit the total number of URLs to crawl, no matter how often you call the crawler.

$queue = <your selection/implementation of a queue>;

// Crawls 5 URLs and ends.
Crawler::create()
    ->setCrawlQueue($queue)
    ->setTotalCrawlLimit(5)
    ->startCrawling($url);

// Doesn't crawl further as the total limit is reached.
Crawler::create()
    ->setCrawlQueue($queue)
    ->setTotalCrawlLimit(5)
    ->startCrawling($url);

Example 2: Using the current crawl limit

The setCurrentCrawlLimit method sets a limit on how many URLs will be crawled per execution. This piece of code will process 5 pages with each execution, without a total limit of pages to crawl.

$queue = <your selection/implementation of a queue>;

// Crawls 5 URLs and ends.
Crawler::create()
    ->setCrawlQueue($queue)
    ->setCurrentCrawlLimit(5)
    ->startCrawling($url);

// Crawls the next 5 URLs and ends.
Crawler::create()
    ->setCrawlQueue($queue)
    ->setCurrentCrawlLimit(5)
    ->startCrawling($url);

Example 3: Combining the total and current crawl limit

Both limits can be combined to control the crawler:

$queue = <your selection/implementation of a queue>;

// Crawls 5 URLs and ends.
Crawler::create()
    ->setCrawlQueue($queue)
    ->setTotalCrawlLimit(10)
    ->setCurrentCrawlLimit(5)
    ->startCrawling($url);

// Crawls the next 5 URLs and ends.
Crawler::create()
    ->setCrawlQueue($queue)
    ->setTotalCrawlLimit(10)
    ->setCurrentCrawlLimit(5)
    ->startCrawling($url);

// Doesn't crawl further as the total limit is reached.
Crawler::create()
    ->setCrawlQueue($queue)
    ->setTotalCrawlLimit(10)
    ->setCurrentCrawlLimit(5)
    ->startCrawling($url);

Example 4: Crawling across requests

You can use the setCurrentCrawlLimit to break up long running crawls. The following example demonstrates a (simplified) approach. It's made up of an initial request and any number of follow-up requests continuing the crawl.

Initial Request

To start crawling across different requests, you will need to create a new queue of your selected queue-driver. Start by passing the queue-instance to the crawler. The crawler will start filling the queue as pages are processed and new URLs are discovered. Serialize and store the queue reference after the crawler has finished (using the current crawl limit).

// Create a queue using your queue-driver.
$queue = <your selection/implementation of a queue>;

// Crawl the first set of URLs
Crawler::create()
    ->setCrawlQueue($queue)
    ->setCurrentCrawlLimit(10)
    ->startCrawling($url);

// Serialize and store your queue
$serializedQueue = serialize($queue);

Subsequent Requests

For any following requests you will need to unserialize your original queue and pass it to the crawler:

// Unserialize queue
$queue = unserialize($serializedQueue);

// Crawls the next set of URLs
Crawler::create()
    ->setCrawlQueue($queue)
    ->setCurrentCrawlLimit(10)
    ->startCrawling($url);

// Serialize and store your queue
$serializedQueue = serialize($queue);

The behavior is based on the information in the queue. The limits only work as described if the same queue instance is passed in. When a completely new queue is passed in, the limits of previous crawls, even for the same website, won't apply.

An example with more details can be found here.

Setting the maximum crawl depth

By default, the crawler continues until it has crawled every page of the supplied URL. If you want to limit the depth of the crawler you can use the setMaximumDepth method.

Crawler::create()
    ->setMaximumDepth(2)

Setting the maximum response size

Most html pages are quite small. But the crawler could accidentally pick up large files such as PDFs and MP3s. To keep memory usage low in such cases, the crawler will only use responses that are smaller than 2 MB. If, while streaming a response, it becomes larger than 2 MB, the crawler will stop streaming the response and assume an empty response body.

You can change the maximum response size.

// let's use a 3 MB maximum.
Crawler::create()
    ->setMaximumResponseSize(1024 * 1024 * 3)

Add a delay between requests

In some cases you might get rate-limited when crawling too aggressively. To circumvent this, you can use the setDelayBetweenRequests() method to add a pause between every request. This value is expressed in milliseconds.

Crawler::create()
    ->setDelayBetweenRequests(150) // After every page crawled, the crawler will wait for 150ms

Limiting which content-types to parse

By default, every found page will be downloaded (up to setMaximumResponseSize() in size) and parsed for additional links. You can limit which content-types should be downloaded and parsed by calling setParseableMimeTypes() with an array of allowed types.

Crawler::create()
    ->setParseableMimeTypes(['text/html', 'text/plain'])

This will prevent downloading the body of pages that have different mime types, like binary files, audio/video, ... that are unlikely to have links embedded in them. This feature mostly saves bandwidth.

Using a custom crawl queue

When crawling a site the crawler will put urls to be crawled in a queue. By default, this queue is stored in memory using the built-in ArrayCrawlQueue.

When a site is very large you may want to store that queue elsewhere, maybe a database. In such cases, you can write your own crawl queue.

A valid crawl queue is any class that implements the Spatie\Crawler\CrawlQueues\CrawlQueue-interface. You can pass your custom crawl queue via the setCrawlQueue method on the crawler.

Crawler::create()
    ->setCrawlQueue(<implementation of \Spatie\Crawler\CrawlQueues\CrawlQueue>)


Change the default base url scheme

By default, the crawler will set the base url scheme to http if none is set. You can change that with setDefaultScheme.

Crawler::create()
    ->setDefaultScheme('https')

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Please see CONTRIBUTING for details.

Testing

First, install the Puppeteer dependency, or your tests will fail.

npm install puppeteer

To run the tests you'll have to start the included node based server first in a separate terminal window.

cd tests/server
npm install
node server.js

With the server running, you can start testing.

composer test

Security

If you've found a bug regarding security please mail security@spatie.be instead of using the issue tracker.

Postcardware

You're free to use this package, but if it makes it to your production environment we highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using.

Our address is: Spatie, Kruikstraat 22, 2018 Antwerp, Belgium.

We publish all received postcards on our company website.

Download Details:

Author: Spatie
Source Code: https://github.com/spatie/crawler 
License: MIT license

#php #crawler #guzzle #concurrency 


5 Popular Concurrency Libraries for Rust

In today's post we will learn about five popular concurrency libraries for Rust.

What is Concurrency?

Concurrency is the occurrence of multiple events within overlapping time frames, but not necessarily simultaneously. On a computer system, concurrency is implemented in the paradigm called concurrent computing.

The three main types of concurrent computing are threading, asynchrony, and preemptive multitasking. Each method has its own special precautions which must be taken to prevent race conditions, where multiple threads or processes access the same shared data in memory in an improper order.

When working with databases, concurrency controls help make sure each transaction on the database takes place in a particular order rather than at the same time. This prevents transactions from interfering with one another, which could make data incorrect or corrupt the database.

Table of contents:

  • Crossbeam-rs/crossbeam – Support for parallelism and low-level concurrency in Rust.
  • Orium/archery [archery] - Library to abstract from Rc/Arc pointer types. 
  • Rayon - A data parallelism library for Rust.
  • Rustcc/coroutine-rs – Coroutine Library in Rust.
  • Zonyitoo/coio-rs - Coroutine I/O for Rust.

1 - Crossbeam-rs/crossbeam:

Support for parallelism and low-level concurrency in Rust.

This crate provides a set of tools for concurrent programming:

Atomics

  • AtomicCell, a thread-safe mutable memory location. (no_std)
  • AtomicConsume, for reading from primitive atomic types with "consume" ordering. (no_std)

Data structures

  • deque, work-stealing deques for building task schedulers.
  • ArrayQueue, a bounded MPMC queue that allocates a fixed-capacity buffer on construction. (alloc)
  • SegQueue, an unbounded MPMC queue that allocates small buffers, segments, on demand. (alloc)

Memory management

  • epoch, an epoch-based garbage collector. (alloc)

Thread synchronization

  • channel, multi-producer multi-consumer channels for message passing.
  • Parker, a thread parking primitive.
  • ShardedLock, a sharded reader-writer lock with fast concurrent reads.
  • WaitGroup, for synchronizing the beginning or end of some computation.

Utilities

  • Backoff, for exponential backoff in spin loops. (no_std)
  • CachePadded, for padding and aligning a value to the length of a cache line. (no_std)
  • scope, for spawning threads that borrow local variables from the stack.

Features marked with (no_std) can be used in no_std environments.
Features marked with (alloc) can be used in no_std environments, but only if alloc feature is enabled.

Crates

The main crossbeam crate just re-exports tools from smaller subcrates:

  • crossbeam-channel provides multi-producer multi-consumer channels for message passing.
  • crossbeam-deque provides work-stealing deques, which are primarily intended for building task schedulers.
  • crossbeam-epoch provides epoch-based garbage collection for building concurrent data structures.
  • crossbeam-queue provides concurrent queues that can be shared among threads.
  • crossbeam-utils provides atomics, synchronization primitives, scoped threads, and other utilities.

There is one more experimental subcrate that is not yet included in crossbeam.

Usage

Add this to your Cargo.toml:

[dependencies]
crossbeam = "0.8"

Compatibility

Crossbeam supports stable Rust releases going back at least six months, and every time the minimum supported Rust version is increased, a new minor version is released. Currently, the minimum supported Rust version is 1.38.

View on Github

2 - Orium/archery [archery]:

Library to abstract from Rc/Arc pointer types. 

Archery is a Rust library that offers a way to abstract over Rc and Arc smart pointers. This allows you to create data structures where the pointer type is parameterizable, so you can avoid the overhead of Arc when you don't need to share data across threads.

In languages that support higher-kinded polymorphism this would be simple to achieve without any library, but Rust does not support that yet. To mimic higher-kinded polymorphism Archery implements the approach suggested by Joshua Liebow-Feeser in "Rust has higher kinded types already… sort of". While other approaches exist, they seem to always offer poor ergonomics for the user.

Setup

To use Archery add the following to your Cargo.toml:

[dependencies]
archery = "<version>"

Using Archery

Archery defines a SharedPointer that receives the kind of pointer as a type parameter. This gives you a convenient and ergonomic way to abstract the pointer type away.

Example

Declare a data structure with the pointer kind as a type parameter bounded by SharedPointerKind:

use archery::*;

struct KeyValuePair<K, V, P: SharedPointerKind> {
    pub key: SharedPointer<K, P>,
    pub value: SharedPointer<V, P>,
}

impl<K, V, P: SharedPointerKind> KeyValuePair<K, V, P> {
    fn new(key: K, value: V) -> KeyValuePair<K, V, P> {
        KeyValuePair {
            key: SharedPointer::new(key),
            value: SharedPointer::new(value),
        }
    }
}

To use it, just plug in the kind of pointer you want:

let pair: KeyValuePair<_, _, RcK> =
    KeyValuePair::new("António Variações", 1944);

assert_eq!(*pair.value, 1944);
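
If you later need to share the structure across threads, the same code works with ArcK in place of RcK (a hedged example, not from the original README):

let shared: KeyValuePair<_, _, ArcK> =
    KeyValuePair::new("António Variações", 1944);

assert_eq!(*shared.value, 1944);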

View on Github

3 - Rayon:

A data parallelism library for Rust.

Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel one. It also guarantees data-race freedom. (You may also enjoy this blog post about Rayon, which gives more background and details about how it works, or this video, from the Rust Belt Rust conference.) Rayon is available on crates.io, and API Documentation is available on docs.rs.

Parallel iterators and more

Rayon makes it drop-dead simple to convert sequential iterators into parallel ones: usually, you just change your foo.iter() call into foo.par_iter(), and Rayon does the rest:

use rayon::prelude::*;
fn sum_of_squares(input: &[i32]) -> i32 {
    input.par_iter() // <-- just change that!
         .map(|&i| i * i)
         .sum()
}

Parallel iterators take care of deciding how to divide your data into tasks; it will dynamically adapt for maximum performance. If you need more flexibility than that, Rayon also offers the join and scope functions, which let you create parallel tasks on your own. For even more control, you can create custom threadpools rather than using Rayon's default, global threadpool.
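
For instance, join runs two closures in parallel and returns both results (a hedged sketch, not from the Rayon README):

fn fib(n: u64) -> u64 {
    if n < 2 {
        return n;
    }
    // The two recursive calls may run on different threads.
    let (a, b) = rayon::join(|| fib(n - 1), || fib(n - 2));
    a + b
}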

Using Rayon

Rayon is available on crates.io. The recommended way to use it is to add a line into your Cargo.toml such as:

[dependencies]
rayon = "1.5"

To use the Parallel Iterator APIs, a number of traits have to be in scope. The easiest way to bring those things into scope is to use the Rayon prelude. In each module where you would like to use the parallel iterator APIs, just add:

use rayon::prelude::*;

Rayon currently requires rustc 1.46.0 or greater.

Usage with WebAssembly

Rayon can work on the Web via WebAssembly, but requires an adapter and some project configuration to account for differences between WebAssembly threads and threads on the other platforms.

Check out wasm-bindgen-rayon docs for more details.

Contribution

Rayon is an open source project! If you'd like to contribute to Rayon, check out the list of "help wanted" issues. These are all (or should be) issues that are suitable for getting started, and they generally include a detailed set of instructions for what to do. Please ask questions if anything is unclear! Also, check out the Guide to Development page on the wiki. Note that all code submitted in PRs to Rayon is assumed to be licensed under Rayon's dual MIT/Apache2 licensing.

Quick demo

To see Rayon in action, check out the rayon-demo directory, which includes a number of demos of code using Rayon. For example, run this command to get a visualization of an nbody simulation. To see the effect of using Rayon, press s to run sequentially and p to run in parallel.

> cd rayon-demo
> cargo run --release -- nbody visualize

For more information on demos, try:

> cd rayon-demo
> cargo run --release -- --help

View on Github

4 - Rustcc/coroutine-rs:

Coroutine Library in Rust.

[dependencies]
coroutine = "0.8"

Usage

Basic usage of Coroutine

extern crate coroutine;

use std::usize;
use coroutine::asymmetric::Coroutine;

fn main() {
    let coro: Coroutine<usize> = Coroutine::spawn(|me, _| {
        for num in 0..10 {
            me.yield_with(num);
        }
        usize::MAX
    });

    for num in coro {
        println!("{}", num.unwrap());
    }
}

This program will print the following to the console:

0
1
2
3
4
5
6
7
8
9
18446744073709551615

For more detail, please run cargo doc --open.

Goals

  • Basic single threaded coroutine support
  • Asymmetric Coroutines
  • Symmetric Coroutines
  • Thread-safe: a coroutine can only be resumed by one thread at a time

Notes

Basically it supports arm, i686, mips, mipsel and x86_64 platforms, but we have only tested on:

  • OS X 10.10.*, x86_64, nightly
  • ArchLinux, x86_64, nightly

View on Github

5 - Zonyitoo/coio-rs:

Coroutine I/O for Rust.

Coroutine scheduling with work-stealing algorithm.

WARN: This may crash because of TLS inlining; check #56 for more detail!

Feature

  • Non-blocking I/O
  • Work-stealing coroutine scheduling
  • Asynchronous computing APIs

Usage

Note: You must use nightly Rust to build this project.

[dependencies.coio]
git = "https://github.com/zonyitoo/coio-rs.git"

Basic Coroutines

extern crate coio;

use coio::Scheduler;

fn main() {
    Scheduler::new()
        .run(|| {
            for _ in 0..10 {
                println!("Heil Hydra");
                Scheduler::sched(); // Yields the current coroutine
            }
        })
        .unwrap();
}

TCP Echo Server

extern crate coio;

use std::io::{Read, Write};

use coio::net::TcpListener;
use coio::{spawn, Scheduler};

fn main() {
    // Spawn a coroutine for accepting new connections
    Scheduler::new().with_workers(4).run(move|| {
        let acceptor = TcpListener::bind("127.0.0.1:8080").unwrap();
        println!("Waiting for connection ...");

        for stream in acceptor.incoming() {
            let (mut stream, addr) = stream.unwrap();

            println!("Got connection from {:?}", addr);

            // Spawn a new coroutine to handle the connection
            spawn(move|| {
                let mut buf = [0; 1024];

                loop {
                    match stream.read(&mut buf) {
                        Ok(0) => {
                            println!("EOF");
                            break;
                        },
                        Ok(len) => {
                            println!("Read {} bytes, echo back", len);
                            stream.write_all(&buf[0..len]).unwrap();
                        },
                        Err(err) => {
                            println!("Error occurs: {:?}", err);
                            break;
                        }
                    }
                }

                println!("Client closed");
            });
        }
    }).unwrap();
}

View on Github

Thank you for following this article.

Related videos:

Rust Concurrency Explained

#rust #concurrency 


Applying Concurrency, Parallelism, and Asyncio to Speed Up Python

What are concurrency and parallelism, and how do they apply to Python?

You can find all the code examples from this article in the concurrency-parallelism-and-asyncio repo on GitHub.

Source: https://testdriven.io

#python #concurrency #asyncio 


Applying Concurrency, Parallelism, and Asyncio to Speed Up Python

What are concurrency and parallelism, and how do they apply to Python?

There are many reasons your applications can be slow. Sometimes this is due to poor algorithmic design or the wrong choice of data structure. Sometimes, however, it's due to forces outside of our control, such as hardware constraints or the quirks of networking. That's where concurrency and parallelism fit in. They allow your programs to do multiple things at once, either at the same time or by wasting as little time as possible waiting on busy tasks.

Whether you're dealing with external web resources, reading from and writing to multiple files, or need to use a calculation-intensive function multiple times with different parameters, this article should help you maximize the efficiency and speed of your code.

First, we'll dive into what concurrency and parallelism are and how they fit into the realm of Python using standard libraries such as threading, multiprocessing, and asyncio. The last portion of this article will compare Python's implementation of async/await with how other languages have implemented it.

You can find all the code examples from this article in the concurrency-parallelism-and-asyncio repo on GitHub.

To work through the examples in this article, you should already know how to work with HTTP requests.

Objectives

By the end of this article, you should be able to answer the following questions:

  1. What is concurrency?
  2. What is a thread?
  3. What does it mean when something is non-blocking?
  4. What is an event loop?
  5. What is a callback?
  6. Why is the asyncio method always a bit faster than the threading method?
  7. When should you use threading, and when should you use asyncio?
  8. What is parallelism?
  9. What's the difference between concurrency and parallelism?
  10. Can you combine asyncio with multiprocessing?
  11. When should you use multiprocessing vs asyncio or threading?
  12. What's the difference between multiprocessing, asyncio, and concurrency.futures?
  13. How can I test asyncio with pytest?

Concurrency

What is concurrency?

An effective definition of concurrency is "being able to perform multiple tasks at once." That's a bit misleading, though, as the tasks may or may not be performed at exactly the same time. Instead, a process might start, then, once it's waiting on a specific instruction to finish, switch to a new task, only to come back once it's no longer waiting. Once one task is finished, it switches again to an unfinished task until they have all been performed. Tasks start asynchronously, get performed asynchronously, and then finish asynchronously.

[Image: concurrent, not parallel]

If that was confusing, think of an analogy instead: Say you want to make a BLT. First, you'll throw the bacon in a pan on medium-low heat. While the bacon cooks, you can get out your tomatoes and lettuce and start preparing (washing and cutting) them. All the while, you keep checking on and occasionally flipping your bacon.

At this point, you've started a task, then started and completed two more in the meantime, all while you're still waiting on the first.

Eventually, you put your bread in the toaster. While it's toasting, you continue checking on your bacon. As pieces get finished, you pull them out and place them on a plate. Once your bread is done toasting, you apply your sandwich spread of choice, then start layering on your tomatoes, your lettuce, and then, once it's done cooking, your bacon. Only once everything is cooked, prepared, and layered can you place the last piece of toast onto your sandwich, slice it (optionally), and eat it.

Because it requires you to perform multiple tasks at the same time, making a BLT is inherently a concurrent process, even if you're not giving your full attention to each of those tasks all at once. For all intents and purposes, in the next section we'll refer to this form of concurrency simply as "concurrency." We'll differentiate it later in this article.

For this reason, concurrency is great for I/O-intensive processes -- tasks that involve waiting on web requests or file read/write operations.

In Python, there are a few different ways to achieve concurrency. The first we'll look at is the threading library.

For our examples in this section, we'll build a small Python program that grabs a random music genre from Binary Jazz's Genrenator API five times, prints the genre to the screen, and puts each one into its own file.

To work with threading in Python, the only import you'll need is threading, but for this example I've also imported urllib to work with HTTP requests, time to determine how long the functions take to complete, and json to easily convert the JSON data returned from the Genrenator API.

You can find the code for this example here.

Let's start with a simple function:

def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    req = Request("https://binaryjazz.us/wp-json/genrenator/v1/genre/", headers={"User-Agent": "Mozilla/5.0"})
    genre = json.load(urlopen(req))

    with open(file_name, "w") as new_file:
        print(f"Writing '{genre}' to '{file_name}'...")
        new_file.write(genre)

Examining the code above: we're making a request to the Genrenator API, loading its JSON response (a random music genre), printing it, then writing it to a file.

Without the "User-Agent" header you will receive a 304.

What we're really interested in is the next section, where the actual threading happens:

threads = []

for i in range(5):
    thread = threading.Thread(
        target=write_genre,
        args=[f"./threading/new_file{i}.txt"]
    )
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

We start with a list. We then iterate five times, creating a new thread each time. Next, we start each thread, append it to our "threads" list, and then iterate over our list one last time to join each thread.

Explanation: Creating threads in Python is easy.

To create a new thread, use threading.Thread(). You can pass into it the kwarg (keyword argument) target with a value of whatever function you would like to run on that thread. But only pass in the name of the function, not its value (meaning, for our purposes, write_genre and not write_genre()). To pass arguments, pass in "kwargs" (which takes a dict of your kwargs) or "args" (which takes an iterable containing your args -- in this case, a list).

Creating a thread is not the same as starting a thread, however. To start your thread, use {the name of your thread}.start(). Starting a thread means "beginning its execution."

Lastly, when we join threads with thread.join(), all we're doing is ensuring the thread has finished before continuing on with our code.

Threads

But what exactly is a thread?

A thread is a way of allowing your computer to break up a single process/program into many lightweight pieces that execute in parallel. Somewhat confusingly, Python's standard implementation of threading limits threads to executing one at a time due to something called the Global Interpreter Lock (GIL). The GIL is necessary because the memory management of CPython (Python's default implementation) is not thread-safe. Because of this limitation, threading in Python is concurrent, but not parallel. To get around this, Python has a separate multiprocessing module that is not limited by the GIL and spins up separate processes, enabling parallel execution of your code. Using the multiprocessing module is nearly identical to using the threading module.

More information about Python's GIL and thread safety can be found on Real Python and in the official Python docs.
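
For illustration (this snippet is not from the original article), the threading snippet above could be rewritten with multiprocessing almost verbatim:

import multiprocessing

processes = []

for i in range(5):
    # Same target/args interface as threading.Thread.
    process = multiprocessing.Process(
        target=write_genre,
        args=[f"./multiprocessing/new_file{i}.txt"],
    )
    process.start()
    processes.append(process)

for process in processes:
    process.join()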

We'll take a more in-depth look at multiprocessing in Python shortly.

Before we show the potential speed improvement over non-threaded code, I took the liberty of also creating a non-threaded version of the same program (again, available on GitHub). Instead of creating a new thread and joining each one, it calls write_genre in a for loop that iterates five times.

To compare speed benchmarks, I also imported the time library to time the execution of our scripts:

Starting...
Writing "binary indoremix" to "./sync/new_file0.txt"...
Writing "slavic aggro polka fusion" to "./sync/new_file1.txt"...
Writing "israeli new wave" to "./sync/new_file2.txt"...
Writing "byzantine motown" to "./sync/new_file3.txt"...
Writing "dutch hate industrialtune" to "./sync/new_file4.txt"...
Time to complete synchronous read/writes: 1.42 seconds

Upon running the script, we see that it takes my computer around 1.42 seconds (along with classic music genres such as "dutch hate industrialtune"). Not too bad.

Now let's run the version that uses threading:

Starting...
Writing "college k-dubstep" to "./threading/new_file2.txt"...
Writing "swiss dirt" to "./threading/new_file0.txt"...
Writing "bop idol alternative" to "./threading/new_file4.txt"...
Writing "ethertrio" to "./threading/new_file1.txt"...
Writing "beach aust shanty français" to "./threading/new_file3.txt"...
Time to complete threading read/writes: 0.77 seconds

The first thing that might stand out to you is the functions not being completed in order: 2 - 0 - 4 - 1 - 3

This is because of the asynchronous nature of threading: as one function waits, another one begins, and so on. Because we're able to continue performing tasks while we're waiting on others to finish (either due to networking or file I/O operations), you may also have noticed that we cut our time roughly in half: 0.77 seconds. Whereas this might not seem like a lot now, it's easy to imagine the very real case of building a web application that needs to write much more data to a file or interact with much more complex web services.

So, if threading is so great, why don't we end the article here?

Because there are even better ways to perform tasks concurrently.

Asyncio

Let's take a look at an example using asyncio. For this method, we're going to install aiohttp using pip. This will allow us to make non-blocking requests and receive responses using the async/await syntax that will be introduced shortly. It also has the extra benefit of a function that converts a JSON response without needing to import the json library. We'll also install and import aiofiles, which allows non-blocking file operations. Other than aiohttp and aiofiles, import asyncio, which comes with the Python standard library.

"Non-blocking" means a program will allow other threads to continue running while it's waiting. This is opposed to "blocking" code, which stops execution of your program completely. Normal, synchronous I/O operations suffer from this limitation.

You can find the code for this example here.

Once we have our imports in place, let's take a look at the asynchronous version of the write_genre function from our asyncio example:

async def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    async with aiohttp.ClientSession() as session:
        async with session.get("https://binaryjazz.us/wp-json/genrenator/v1/genre/") as response:
            genre = await response.json()

    async with aiofiles.open(file_name, "w") as new_file:
        print(f'Writing "{genre}" to "{file_name}"...')
        await new_file.write(genre)

For those not familiar with the async/await syntax that can be found in many other modern languages, async declares that a function, for loop, or with statement must be used asynchronously. To call an async function, you must either use the await keyword from another async function or call create_task() directly from the event loop, which can be grabbed from asyncio.get_event_loop() -- i.e., loop = asyncio.get_event_loop().

Additionally:

  1. async with allows awaiting async responses and file operations.
  2. async for (not used here) iterates over an asynchronous stream.

The Event Loop

Event loops are constructs inherent to asynchronous programming that allow performing tasks asynchronously. As you're reading this article, I can safely assume you're probably not too familiar with the concept. However, even if you've never written an async application, you have experience with event loops every time you use a computer. Whether your computer is listening for keyboard input, you're playing online multiplayer games, or you're browsing Reddit while you have files copying in the background, an event loop is the driving force that keeps everything working smoothly and efficiently. In its purest essence, an event loop is a process that waits around for triggers and then performs specific (programmed) actions once those triggers are met. They often return a "promise" (JavaScript syntax) or "future" (Python syntax) of some sort to denote that a task has been added. Once the task is finished, the promise or future returns a value passed back from the called function (assuming the function does return a value).

The idea of performing a function in response to another function is called a "callback."

For another take on callbacks and events, here's a great answer on Stack Overflow.

Here's a walkthrough of our function:

We're using async with to open our client session asynchronously. The aiohttp.ClientSession() class is what allows us to make HTTP requests and remain connected to a source without blocking the execution of our code. We then make an async request to the Genrenator API and await the JSON response (a random music genre). In the next line, we use async with again with the aiofiles library to asynchronously open a new file to write our new genre to. We print the genre, then write it to the file.

Unlike regular Python scripts, programming with asyncio pretty much enforces* using some sort of "main" function.

*Unless you're using the deprecated "yield" syntax with the @asyncio.coroutine decorator, which will be removed in Python 3.10.

This is because you need to use the "async" keyword in order to use the "await" syntax, and the "await" syntax is the only way to actually run other async functions.

Here's our main function:

async def main():
    tasks = []

    for i in range(5):
        tasks.append(write_genre(f"./async/new_file{i}.txt"))

    await asyncio.gather(*tasks)

As you can see, we've declared it with "async." We then create an empty list called "tasks" to house our async tasks (calls to Genrenator and our file I/O). We append our tasks to our list, but they are not actually run yet. The calls don't actually get made until we schedule them with await asyncio.gather(*tasks). This runs all of the tasks in our list and waits for them to finish before continuing with the rest of our program. Lastly, we use asyncio.run(main()) to run our "main" function. The .run() function is the entry point for our program, and it should generally only be called once per process.

For those not familiar, the * in front of tasks is called "argument unpacking." Just as it sounds, it unpacks our list into a series of arguments for our function. Our function is asyncio.gather() in this case.
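
Putting it together, the entry point described above is just:

if __name__ == "__main__":
    asyncio.run(main())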

And that's all we need to do. Now, running our program (the source of which includes the same timing functionality of the synchronous and threading examples)...

Writing "albuquerque fiddlehaus" to "./async/new_file1.txt"...
Writing "euroreggaebop" to "./async/new_file2.txt"...
Writing "shoedisco" to "./async/new_file0.txt"...
Writing "russiagaze" to "./async/new_file4.txt"...
Writing "alternative xylophone" to "./async/new_file3.txt"...
Time to complete asyncio read/writes: 0.71 seconds

...we see it's even faster still. And, in general, the asyncio method will always be a bit faster than the threading method. This is because when we use the "await" syntax, we essentially tell our program "hold on, I'll be right back," but our program keeps track of how long it takes us to finish what we're doing. Once we're done, our program will know, and will pick back up as soon as it's able. Threading in Python allows asynchronicity, but our program could theoretically skip around different threads that may not yet be ready, wasting time if there are threads ready to continue running.

So when should I use threading, and when should I use asyncio?

When you're writing new code, use asyncio. If you need to interface with older libraries or those that don't support asyncio, you might be better off with threading.

Testing asyncio with pytest

It turns out testing async functions with pytest is as easy as testing synchronous functions. Just install the pytest-asyncio package with pip, mark your tests with the async keyword, and apply a decorator that lets pytest know it's asynchronous: @pytest.mark.asyncio. Let's look at an example.

First, let's write an arbitrary async function in a file called hello_asyncio.py:

import asyncio


async def say_hello(name: str):
    """ Sleeps for two seconds, then prints 'Hello, {{ name }}!' """
    try:
        if type(name) != str:
            raise TypeError("'name' must be a string")
        if name == "":
            raise ValueError("'name' cannot be empty")
    except (TypeError, ValueError):
        raise

    print("Sleeping...")
    await asyncio.sleep(2)
    print(f"Hello, {name}!")

The function takes a single string argument: name. Upon ensuring that name is a non-empty string, our function asynchronously sleeps for two seconds, then prints "Hello, {name}!" to the console.

The difference between asyncio.sleep() and time.sleep() is that asyncio.sleep() is non-blocking.

Now let's test it with pytest. In the same directory as hello_asyncio.py, create a file called test_hello_asyncio.py, then open it in your favorite text editor.

Let's start with our imports:

import pytest # Note: pytest-asyncio does not require a separate import

from hello_asyncio import say_hello

Then we'll create a test with proper input:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
@pytest.mark.asyncio
async def test_say_hello(name):
    await say_hello(name)

Things to note:

  • The @pytest.mark.asyncio decorator lets pytest work asynchronously
  • Our test uses the async syntax
  • We're awaiting our async function as we would if we were running it outside of a test

Now let's run our test with the verbose -v option:

pytest -v
...
collected 3 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED    [ 33%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED     [ 66%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED       [100%]

Looks good. Next we'll write a couple of tests with bad input. Back inside of test_hello_asyncio.py, let's create a class called TestSayHelloThrowsExceptions:

class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    @pytest.mark.asyncio
    async def test_say_hello_value_error(self, name):
        with pytest.raises(ValueError):
            await say_hello(name)

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    @pytest.mark.asyncio
    async def test_say_hello_type_error(self, name):
        with pytest.raises(TypeError):
            await say_hello(name)

Again, we decorate our tests with @pytest.mark.asyncio, mark our tests with the async syntax, then call our function with await.

Run the tests again:

pytest -v
...
collected 7 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED                                    [ 14%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED                                     [ 28%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED                                       [ 42%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_value_error[] PASSED        [ 57%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[19] PASSED       [ 71%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name1] PASSED    [ 85%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name2] PASSED    [100%]

Without pytest-asyncio

As an alternative to pytest-asyncio, you can create a pytest fixture that yields an asyncio event loop:

import asyncio
import pytest

from hello_asyncio import say_hello


@pytest.fixture
def event_loop():
    loop = asyncio.new_event_loop()  # a fresh loop for each test
    yield loop
    loop.close()  # clean up so loops don't leak between tests
Then, rather than using the async/await syntax, you create your tests as you would normal, synchronous tests:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
def test_say_hello(event_loop, name):
    event_loop.run_until_complete(say_hello(name))


class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    def test_say_hello_value_error(self, event_loop, name):
        with pytest.raises(ValueError):
            event_loop.run_until_complete(say_hello(name))

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    def test_say_hello_type_error(self, event_loop, name):
        with pytest.raises(TypeError):
            event_loop.run_until_complete(say_hello(name))

If you're interested, here's a more advanced tutorial on asyncio testing.

Further Reading

If you want to learn more about what distinguishes Python's implementation of threading vs asyncio, here's a great article from Medium.

For even better examples and explanations of threading in Python, here's a video by Corey Schafer that goes more in-depth, including using the concurrent.futures library.

Lastly, for a massive deep-dive into asyncio itself, here's an article from Real Python completely dedicated to it.

Bonus: One more library you might be interested in is called Unsync, especially if you want to easily convert your current synchronous code into asynchronous code. To use it, you install the library with pip, import it with from unsync import unsync, then decorate any currently synchronous function with @unsync to make it asynchronous. To await it and get its return value (which you can do anywhere -- it doesn't have to be in an async/unsync function), just call .result() after the function call.
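
Here's a minimal sketch of what that looks like (our own example, not from the Unsync docs; it assumes pip install unsync and an invented function slow_add):

import time

from unsync import unsync


@unsync
def slow_add(a, b):
    """A formerly synchronous function; @unsync now runs it in a background thread."""
    time.sleep(1)
    return a + b


# Calling the decorated function returns immediately with an "Unfuture"
future = slow_add(1, 2)

print("Doing other work while slow_add runs...")

# .result() blocks until the value is ready -- callable from any code, async or not
print(future.result())  # 3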

Parallelism

What is parallelism?

Parallelism is very much related to concurrency. In fact, parallelism is a subset of concurrency: whereas a concurrent process performs multiple tasks at the same time whether or not they have its full attention, a parallel process is physically performing multiple tasks all at the same time. A good example would be driving, listening to music, and eating the BLT we made in the last section, all at once.

concurrent vs. parallel

Because they don't require a lot of intensive effort, you can do them all at once without having to wait on anything or divert your attention away.

Now let's take a look at how to implement this in Python. We could use the multiprocessing library, but let's use the concurrent.futures library instead -- it eliminates the need to manage the number of processes manually. Because the major benefit of multiprocessing comes when you perform multiple CPU-heavy tasks, we're going to raise each of the numbers from 1 million (1000000) up to, but not including, 1000016 to its own power -- i.e., compute pow(i, i) for each i.

You can find the code for this example here.

Apart from time for benchmarking, the only import we'll need is concurrent.futures:

import concurrent.futures
import time


if __name__ == "__main__":
    pow_list = [i for i in range(1000000, 1000016)]

    print("Starting...")
    start = time.time()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(pow, i, i) for i in pow_list]

    for f in concurrent.futures.as_completed(futures):
        print("okay")

    end = time.time()
    print(f"Time to complete: {round(end - start, 2)}")

Because I'm developing on a Windows machine, I'm using if __name__ == "__main__". This is necessary because Windows does not have the fork system call inherent to Unix systems. Because Windows doesn't have this capability, it resorts to launching a new interpreter for each process, and each new interpreter imports the main module. Without the guard, that import re-runs your entire program -- including the code that spawns processes -- causing recursive chaos to ensue.

So, taking a look at our main block: we use a list comprehension to create a list of the numbers from 1000000 through 1000015, we open a ProcessPoolExecutor with concurrent.futures, and we use another list comprehension with executor.submit() to start executing our processes, collecting the results in a list called "futures."

We could also use ThreadPoolExecutor() if we wanted to use threads instead -- concurrent.futures is versatile.

And this is where the asynchronicity comes in: the "futures" list does not actually contain the results from running our functions. Instead, it contains "futures," which are similar to the JavaScript idea of "promises." In order to allow our program to continue running, we get back these futures, each one a placeholder for a value. If we try to print a future, depending on whether it's finished running or not, we'll get back a state of "pending" or "finished." Once it's finished, we can get the return value (assuming there is one) using f.result(), where f is the future in question.

We then iterate through our list of futures, but instead of printing our values, we simply print "okay." This is just because of how massive the resulting calculations come out to be.
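
To make the future-to-result relationship concrete, here's a small variation of ours on the loop above: it prints each future object so you can see its state, then uses .result() to report the size of the enormous value instead of the value itself (the bit_length() trick is ours, not from the original script):

import concurrent.futures

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(pow, i, i) for i in range(1000000, 1000004)]

    for f in concurrent.futures.as_completed(futures):
        print(f)  # e.g. <Future at 0x... state=finished returned int>
        print(f"The result is a {f.result().bit_length()}-bit integer")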

Just as before, I built a comparison script that does this synchronously. And, just as before, you can find it on GitHub.

Running our control program, which also includes functionality for timing our program, we get:

Starting...
okay
...
okay
Time to complete: 54.64

Wow. 54.64 seconds is quite a long time. Let's see if our version with multiprocessing does any better:

Starting...
okay
...
okay
Time to complete: 6.24

Our time has been significantly reduced. We're at about 1/9th of our original time.

So what would happen if we used threading for this instead?

I'm sure you can guess -- it wouldn't be much faster than doing it synchronously. In fact, it might be slower because it still takes a little time and effort to spin up new threads. But don't take my word for it, here's what we get when we replace ProcessPoolExecutor() with ThreadPoolExecutor():

Starting...
okay
...
okay
Time to complete: 53.83

As I mentioned earlier, threading allows your applications to focus on new tasks while others are waiting. In this case, we're never sitting idly by. Multiprocessing, on the other hand, spins up entirely new processes, usually on separate CPU cores, ready to do whatever you ask completely in tandem with whatever else your script is doing. This is why the multiprocessing version taking roughly 1/9th of the time makes sense -- I have 8 cores in my CPU.
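
If you're curious what that number is on your own machine, ProcessPoolExecutor() with no max_workers argument defaults to exactly this count (a one-liner sketch of ours):

import multiprocessing

# The default worker count for ProcessPoolExecutor()
print(multiprocessing.cpu_count())  # 8 on my machine, hence the ~8-9x speedup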

Now that we've talked about concurrency and parallelism in Python, we can finally set the terms straight. If you're having trouble distinguishing between the terms, you can safely and accurately think of our previous definitions of "parallelism" and "concurrency" as "parallel concurrency" and "non-parallel concurrency" respectively.

Further Reading

Real Python has a great article on concurrency vs parallelism.

Engineer Man has a good video comparison of threading vs multiprocessing.

Corey Schafer also has a good video on multiprocessing in the same spirit as his threading video.

If you only watch one video, watch this excellent talk by Raymond Hettinger. He does an amazing job explaining the differences between multiprocessing, threading, and asyncio.

Combining Asyncio with Multiprocessing

What if I need to combine many I/O operations with heavy calculations?

We can do that too. Say you need to scrape 100 web pages for a specific piece of information, and then you need to save that piece of info in a file for later. We can separate the compute power across each of our computer's cores by making each process scrape a fraction of the pages.

For this script, let's install Beautiful Soup to help us easily scrape our pages: pip install beautifulsoup4. This time we actually have quite a few imports. Here they are, and here's why we're using them:

import asyncio                         # Gives us async/await
import concurrent.futures              # Allows creating new processes
import time
from math import floor                 # Helps divide up our requests evenly across our CPU cores
from multiprocessing import cpu_count  # Returns our number of CPU cores

import aiofiles                        # For asynchronously performing file I/O operations
import aiohttp                         # For asynchronously making HTTP requests
from bs4 import BeautifulSoup          # For easy webpage scraping

You can find the code for this example here.

First, we're going to create an async function that makes a request to Wikipedia to get back random pages. We'll scrape each page we get back for its title using BeautifulSoup, and then we'll append it to a given file; we'll separate each title with a tab. The function will take two arguments:

  1. num_pages - Number of pages to request and scrape for titles
  2. output_file - The file to append our titles to

async def get_and_scrape_pages(num_pages: int, output_file: str):
    """
    Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
    articles, then scrapes each page for its title and appends it to {{ output_file }},
    separating each title with a tab: "\\t"

    #### Arguments
    ---
    num_pages: int -
        Number of random Wikipedia pages to request and scrape

    output_file: str -
        File to append titles to
    """
    async with \
    aiohttp.ClientSession() as client, \
    aiofiles.open(output_file, "a+", encoding="utf-8") as f:

        for _ in range(num_pages):
            async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
                if response.status > 399:
                    # I was getting a 429 Too Many Requests at a higher volume of requests
                    response.raise_for_status()

                page = await response.text()
                soup = BeautifulSoup(page, features="html.parser")
                title = soup.find("h1").text

                await f.write(title + "\t")

        await f.write("\n")

We asynchronously open both an aiohttp ClientSession and our output file. The mode, a+, means append to the file and create it if it doesn't already exist. Encoding our strings as utf-8 ensures we don't get an error if our titles contain international characters. If we get an error response, we'll raise it instead of continuing (at higher request volumes I was getting a 429 Too Many Requests). We asynchronously get the text from our response, parse the title out of it with BeautifulSoup, and asynchronously append it to our file. After we append all of our titles, we append a newline: "\n".

Our next function is the one we'll start in each new process; it wraps our async function so that each process can run it in its own event loop:

def start_scraping(num_pages: int, output_file: str, i: int):
    """ Starts an async process for requesting and scraping Wikipedia pages """
    print(f"Process {i} starting...")
    asyncio.run(get_and_scrape_pages(num_pages, output_file))
    print(f"Process {i} finished.")

Now for our main function. Let's start with some constants (and our function declaration):

def main():
    NUM_PAGES = 100 # Number of pages to scrape altogether
    NUM_CORES = cpu_count() # Our number of CPU cores (including logical cores)
    OUTPUT_FILE = "./wiki_titles.tsv" # File to append our scraped titles to

    PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
    PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % NUM_CORES # The final core also picks up the remainder

And now the logic:

    futures = []

    with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
        for i in range(NUM_CORES - 1):
            new_future = executor.submit(
                start_scraping, # Function to perform
                # v Arguments v
                num_pages=PAGES_PER_CORE,
                output_file=OUTPUT_FILE,
                i=i
            )
            futures.append(new_future)

        futures.append(
            executor.submit(
                start_scraping,
                PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES-1
            )
        )

    concurrent.futures.wait(futures)

We create a list to store our futures, then we create a ProcessPoolExecutor, setting its max_workers equal to our number of cores. We iterate over a range equal to our number of cores minus one, each time submitting a new process that runs our start_scraping function, and we append the resulting future to our futures list. Our final core will potentially have extra work to do: it scrapes the same number of pages as each of the other cores, plus the remainder from dividing our total number of pages by our number of CPU cores. For example, with 100 pages and 8 cores, the first seven processes scrape 12 pages each while the final one scrapes 12 + 100 % 8 = 16.

Make sure to actually run your main function:

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Time to complete: {round(time.time() - start, 2)} seconds.")

After running the program with my 8-core CPU (along with benchmarking code):

This version (asyncio with multiprocessing):

Time to complete: 5.65 seconds.

Multiprocessing only:

Time to complete: 8.87 seconds.

asyncio only:

Time to complete: 47.92 seconds.

Completely synchronous:

Time to complete: 88.86 seconds.

I'm actually quite surprised to see that the improvement of asyncio with multiprocessing over just multiprocessing wasn't as great as I thought it would be.

Recap: When to use multiprocessing vs asyncio or threading

  1. Use multiprocessing when you need to do many heavy calculations and you can split them up.
  2. Use asyncio or threading when you're performing I/O operations -- communicating with external resources or reading/writing from/to files.
  3. Multiprocessing and asyncio can be used together, but a good rule of thumb is to fork a process before you thread/use asyncio instead of the other way around -- threads are relatively cheap compared to processes.

Async/Await in Other Languages

async/await and similar syntax exist in other languages as well, and in some of those languages, the implementation can differ drastically.

.NET: From F# to C#

The first programming language (back in 2007) to use the async syntax was Microsoft's F#. While it doesn't exactly use await to wait on a function call, it uses specific syntax like let! and do! along with proprietary Async functions included in the System module.

You can find more about async programming in F# on Microsoft's F# docs.

Their C# team then built upon this concept, and that's where the async/await keywords that we're now familiar with were born:

using System;

// Allows the "Task" return type
using System.Threading.Tasks;

public class Program
{
    // Declare an async function with "async"
    private static async Task<string> ReturnHello()
    {
        return "hello world";
    }

    // Main can be async -- no problem
    public static async Task Main()
    {
        // await an async string
        string result = await ReturnHello();

        // Print the string we got asynchronously
        Console.WriteLine(result);
    }
}

Run it on .NETFiddle

We ensure that we're using System.Threading.Tasks as it includes the Task type, and, in general, the Task type is needed for an async function to be awaited. The cool thing about C# is that you can make your main function asynchronous just by declaring it with async, and you won't have any issues.

If you're interested in learning more about async/await in C#, Microsoft's C# docs have a good page on it.

JavaScript

First introduced in ES2017, the async/await syntax is essentially an abstraction over JavaScript promises (which are similar to Python futures). Unlike Python, however, so long as you're not awaiting it, you can call an async function normally without a specific entry point like Python's asyncio.run():

// Declare a function with async
async function returnHello(){
    return "hello world";
}

async function printSomething(){
    // await an async string
    const result = await returnHello();

    // print the string we got asynchronously
    console.log(result);
}

// Run our async code
printSomething();

Run it on JSFiddle

See MDN for more information about async/await in JavaScript.
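
For contrast, here's a short sketch of ours showing the Python side: calling an async function without awaiting it merely creates a coroutine object, and nothing runs until you hand it to the event loop:

import asyncio


async def return_hello():
    return "hello world"


# Merely creates a coroutine object (and triggers a "never awaited" RuntimeWarning)
print(return_hello())  # <coroutine object return_hello at 0x...>

# Actually runs it on an event loop
print(asyncio.run(return_hello()))  # hello world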

Rust

Rust now also allows the use of the async/await syntax, and it works similarly to Python, C#, and JavaScript:

// Allows blocking synchronous code to run async code
use futures::executor::block_on;

// Declare an async function with "async"
async fn return_hello() -> String {
    "hello world".to_string()
}

// Code that awaits must also be declared with "async"
async fn print_something(){
    // await an async String
    let result: String = return_hello().await;

    // Print the string we got asynchronously
    println!("{0}", result);
}

fn main() {
    // Block the current synchronous execution to run our async code
    block_on(print_something());
}

Run it on Rust Play

In order to use async functions, we first have to add futures = "0.3" to our Cargo.toml. We then import the block_on function with use futures::executor::block_on -- block_on is necessary in order to run our async function from our synchronous main function.

You can find more information about async/await in Rust in the Rust docs.

Go

Instead of the traditional async/await syntax inherent to all the previous languages we've covered, Go uses "goroutines" and "channels." You can think of a channel as being similar to a Python future. In Go, you generally send a channel as an argument to a function, then use go to run the function concurrently. Whenever you need to make sure the function has finished, you use the <- syntax, which you can think of as the more common await syntax. If your goroutine (the function you're running asynchronously) has a return value, it can be grabbed this way.

package main

import "fmt"

// "chan" makes the return value a string channel instead of a string
func returnHello(result chan string){
    // Gives our channel a value
    result <- "hello world"
}

func main() {
    // Creates a string channel
    result := make(chan string)

    // Starts execution of our goroutine
    go returnHello(result)

    // Awaits and prints our string
    fmt.Println(<- result)
}

Run it in the Go Playground

For more information on concurrency in Go, check out An Introduction to Programming in Go by Caleb Doxsey.

Ruby

Similar to Python, Ruby also has the Global Interpreter Lock limitation. What it doesn't have is concurrency built into the language. However, there is a community-created gem that allows concurrency in Ruby, and you can find its source on GitHub.

Java

Like Ruby, Java doesn't have the async/await syntax built in, but it does have concurrency capabilities using the java.util.concurrent module. However, Electronic Arts wrote an Async library that allows the use of await as a method. It's not exactly the same as Python/C#/JavaScript/Rust, but it's worth looking into if you're a Java developer interested in this kind of functionality.

C++

While C++ also doesn't have the async/await syntax, it does have the ability to use futures to run code concurrently via the <future> header:

#include <iostream>
#include <string>

// Necessary for futures
#include <future>

// No async declaration needed
std::string return_hello() {
    return "hello world";
}

int main ()
{
    // Declares a string future
    std::future<std::string> fut = std::async(return_hello);

    // Awaits the result of the future
    std::string result = fut.get();

    // Prints the string we got asynchronously
    std::cout << result << '\n';
}

Run it on C++ Shell

There's no need to declare a function with any keyword to denote whether or not it can or should be run asynchronously. Instead, you declare your initial future wherever you need it with std::future<{{ function return type }}> and set it equal to std::async(), passing the name of the function you want to run asynchronously along with any arguments it takes -- i.e., std::async(do_something, 1, 2, "string"). To await the value of the future, use the .get() syntax on it.

You can find documentation for async in C++ on cplusplus.com.

Recap

Whether you're dealing with asynchronous network or file operations or you're performing numerous complex calculations, there are a few different ways to maximize your code's efficiency.

If you're using Python, you can use asyncio or threading to make the most of I/O operations, or the multiprocessing module for CPU-intensive code.

Also remember that the concurrent.futures module can be used in place of either threading or multiprocessing.

If you're using another programming language, chances are there's an implementation of async/await for it too.

Source: https://testdriven.io

#python #concurrency #asyncio 

Applying Concurrency, Parallelism, and Asyncio to Speed Up Python

Примените Concurrency, Parallelism и Asyncio для ускорения Python

Что такое параллелизм и параллелизм и как они применимы к Python?

Есть много причин, по которым ваши приложения могут работать медленно. Иногда это происходит из-за плохого алгоритмического дизайна или неправильного выбора структуры данных. Однако иногда это происходит из-за не зависящих от нас сил, таких как аппаратные ограничения или особенности сети. Вот тут-то и подходят параллелизм и параллелизм. Они позволяют вашим программам делать несколько вещей одновременно, либо одновременно, либо тратя как можно меньше времени на ожидание загруженных задач.

Независимо от того, имеете ли вы дело с внешними веб-ресурсами, чтением и записью в несколько файлов или вам нужно несколько раз использовать функцию с интенсивными вычислениями с различными параметрами, эта статья должна помочь вам максимизировать эффективность и скорость вашего кода.

Во-первых, мы углубимся в то, что такое параллелизм и параллелизм и как они вписываются в область Python, используя стандартные библиотеки, такие как многопоточность, многопроцессорность и асинхронность. asyncВ последней части этой статьи реализация / в Python будет сравниваться awaitс тем, как они реализованы в других языках.

Вы можете найти все примеры кода из этой статьи в репозитории concurrency-parallelism-and-asyncio на GitHub.

Чтобы работать с примерами в этой статье, вы уже должны знать, как работать с HTTP-запросами.

Цели

К концу этой статьи вы должны быть в состоянии ответить на следующие вопросы:

  1. Что такое параллелизм?
  2. Что такое нить?
  3. Что это значит, когда что-то не блокируется?
  4. Что такое цикл событий?
  5. Что такое обратный вызов?
  6. Почему метод asyncio всегда немного быстрее, чем метод потоков?
  7. Когда следует использовать многопоточность, а когда следует использовать asyncio?
  8. Что такое параллелизм?
  9. В чем разница между параллелизмом и параллелизмом?
  10. Можно ли совместить asyncio с многопроцессорностью?
  11. Когда следует использовать многопроцессорность, а не асинхронность или многопоточность?
  12. В чем разница между многопроцессорностью, asyncio и concurrency.futures?
  13. Как я могу протестировать asyncio с помощью pytest?

параллелизм

Что такое параллелизм?

Эффективным определением параллелизма является «способность выполнять несколько задач одновременно». Однако это немного вводит в заблуждение, поскольку задачи могут выполняться или не выполняться в одно и то же время. Вместо этого процесс может начаться, а затем, когда он ожидает завершения определенной инструкции, переключиться на новую задачу, чтобы вернуться только после того, как он больше не ждет. Как только одна задача завершена, она снова переключается на незавершенную задачу, пока все они не будут выполнены. Задачи начинаются асинхронно, выполняются асинхронно и затем асинхронно завершаются.

параллелизм, а не параллельный

Если это сбивает вас с толку, давайте вместо этого придумаем аналогию: скажем, вы хотите создать BLT . Во-первых, вам нужно бросить бекон в сковороду на среднем огне. Пока бекон готовится, вы можете достать помидоры и листья салата и начать их готовить (мыть и нарезать). Все это время вы продолжаете проверять и время от времени переворачиваете свой бекон.

На этом этапе вы начали одну задачу, а затем тем временем начали и завершили еще две, все еще ожидая выполнения первой.

В конце концов, вы кладете свой хлеб в тостер. Пока он поджаривается, вы продолжаете проверять свой бекон. Когда кусочки готовы, вы вытаскиваете их и кладете на тарелку. Как только ваш хлеб поджарится, вы намазываете его выбранной пастой для сэндвичей, а затем можете начать выкладывать слоями помидоры, листья салата, а затем, когда все готово, бекон. Только после того, как все приготовлено, подготовлено и выложено слоями, вы можете положить последний кусок тоста на бутерброд, нарезать его (по желанию) и съесть.

Поскольку это требует от вас одновременного выполнения нескольких задач, создание BLT по своей сути является параллельным процессом, даже если вы не уделяете все свое внимание каждой из этих задач одновременно. Во всех смыслах и целях в следующем разделе мы будем называть эту форму параллелизма просто параллелизмом. Мы будем различать его позже в этой статье.

По этой причине параллелизм отлично подходит для процессов с интенсивным вводом-выводом — задач, включающих ожидание веб-запросов или операций чтения/записи файлов.

В Python существует несколько различных способов достижения параллелизма. Сначала мы рассмотрим библиотеку потоков.

Для наших примеров в этом разделе мы собираемся создать небольшую программу на Python, которая пять раз выбирает случайный музыкальный жанр из API Genrenator Binary Jazz , выводит жанр на экран и помещает каждый в отдельный файл.

Для работы с многопоточностью в Python вам потребуется единственный импорт threading, но для этого примера я также импортировал urllibдля работы с HTTP-запросами, timeчтобы определить, сколько времени требуется для выполнения функций, и jsonчтобы легко преобразовать возвращаемые данные json. через Genrenator API.

Вы можете найти код для этого примера здесь .

Начнем с простой функции:

def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    req = Request("https://binaryjazz.us/wp-json/genrenator/v1/genre/", headers={"User-Agent": "Mozilla/5.0"})
    genre = json.load(urlopen(req))

    with open(file_name, "w") as new_file:
        print(f"Writing '{genre}' to '{file_name}'...")
        new_file.write(genre)

Изучая приведенный выше код, мы делаем запрос к Genrenator API, загружаем его ответ JSON (случайный музыкальный жанр), распечатываем его, а затем записываем в файл.

Без заголовка «User-Agent» вы получите 304.

Что нас действительно интересует, так это следующий раздел, где происходит фактическая многопоточность:

threads = []

for i in range(5):
    thread = threading.Thread(
        target=write_genre,
        args=[f"./threading/new_file{i}.txt"]
    )
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

Сначала мы начинаем со списка. Затем мы повторяем пять раз, каждый раз создавая новый поток. Затем мы запускаем каждый поток, добавляем его в наш список «потоков», а затем проходим по нашему списку в последний раз, чтобы присоединиться к каждому потоку.

Объяснение: Создавать потоки в Python очень просто.

Чтобы создать новый поток, используйте threading.Thread(). Вы можете передать в него kwarg (аргумент ключевого слова) targetсо значением любой функции, которую вы хотите запустить в этом потоке. Но передавать только имя функции, а не ее значение (имеющееся в виду для наших целей, write_genreа не write_genre()). Чтобы передать аргументы, передайте «kwargs» (который принимает dict ваших kwargs) или «args» (который принимает итерацию, содержащую ваши аргументы — в данном случае список).

Однако создание потока — это не то же самое, что запуск потока. Чтобы начать тему, используйте {the name of your thread}.start(). Запуск потока означает «начало его выполнения».

Наконец, когда мы объединяем потоки с помощью thread.join(), все, что мы делаем, — это обеспечиваем завершение потока, прежде чем продолжить работу с нашим кодом.

Потоки

Но что такое нить?

Поток — это способ, позволяющий вашему компьютеру разбить один процесс/программу на множество легковесных частей, которые выполняются параллельно. Несколько сбивает с толку то, что стандартная реализация многопоточности в Python ограничивает возможность выполнения потоков только по одному из-за так называемой глобальной блокировки интерпретатора (GIL). GIL необходим, потому что управление памятью CPython (реализация Python по умолчанию) не является потокобезопасным. Из-за этого ограничения многопоточность в Python является одновременной, но не параллельной. Чтобы обойти это, в Python есть отдельный multiprocessingмодуль, не ограниченный GIL, который запускает отдельные процессы, обеспечивая параллельное выполнение вашего кода. Использование multiprocessingмодуля почти идентично использованию threadingмодуля.

Дополнительную информацию о GIL Python и безопасности потоков можно найти в официальной документации Real Python и Python .

Вскоре мы более подробно рассмотрим многопроцессорность в Python.

Прежде чем мы покажем потенциальное улучшение скорости по сравнению с беспотоковым кодом, я позволил себе также создать непоточную версию той же программы (опять же, доступную на GitHub ). Вместо того, чтобы создавать новый поток и присоединяться к каждому из них, он вместо этого вызывает write_genreцикл for, который повторяется пять раз.

Чтобы сравнить тесты скорости, я также импортировал timeбиблиотеку для измерения времени выполнения наших скриптов:

Starting...
Writing "binary indoremix" to "./sync/new_file0.txt"...
Writing "slavic aggro polka fusion" to "./sync/new_file1.txt"...
Writing "israeli new wave" to "./sync/new_file2.txt"...
Writing "byzantine motown" to "./sync/new_file3.txt"...
Writing "dutch hate industrialtune" to "./sync/new_file4.txt"...
Time to complete synchronous read/writes: 1.42 seconds

После запуска сценария мы видим, что мой компьютер занимает около 1,49 секунды (наряду с классическими музыкальными жанрами, такими как «голландская ненависть индастриалтюн»). Не так уж плохо.

Теперь давайте запустим версию, использующую многопоточность:

Starting...
Writing "college k-dubstep" to "./threading/new_file2.txt"...
Writing "swiss dirt" to "./threading/new_file0.txt"...
Writing "bop idol alternative" to "./threading/new_file4.txt"...
Writing "ethertrio" to "./threading/new_file1.txt"...
Writing "beach aust shanty français" to "./threading/new_file3.txt"...
Time to complete threading read/writes: 0.77 seconds

Первое, что может вас заинтересовать, это то, что функции выполняются не по порядку: 2 - 0 - 4 - 1 - 3.

Это связано с асинхронным характером многопоточности: пока одна функция ожидает, начинается другая и так далее. Поскольку мы можем продолжать выполнять задачи, пока ждем завершения других (из-за сетевых или файловых операций ввода-вывода), вы также могли заметить, что мы сократили наше время примерно вдвое: 0,77 секунды. Хотя сейчас это может показаться не таким уж большим, легко представить себе вполне реальный случай создания веб-приложения, которое должно записывать гораздо больше данных в файл или взаимодействовать с гораздо более сложными веб-сервисами.

Итак, если многопоточность — это так здорово, почему бы нам не закончить статью на этом?

Потому что есть еще лучшие способы одновременного выполнения задач.

Асинкио

Давайте рассмотрим пример использования asyncio. Для этого метода мы собираемся установить aiohttp с помощью pip. Это позволит нам делать неблокирующие запросы и получать ответы, используя синтаксис async/ await, который вскоре будет представлен. Он также имеет дополнительное преимущество функции, которая преобразует ответ JSON без необходимости импортировать jsonбиблиотеку. Мы также установим и импортируем файлы aiofiles , которые позволяют выполнять неблокирующие операции с файлами. Кроме aiohttpand aiofiles, import asyncio, который входит в стандартную библиотеку Python.

«Неблокирующий» означает, что программа позволит другим потокам продолжать работу, пока она ожидает. Это противоположно «блокирующему» коду, который полностью останавливает выполнение вашей программы. Обычные синхронные операции ввода-вывода страдают от этого ограничения.

Вы можете найти код для этого примера здесь .

Когда у нас есть импорт, давайте взглянем на асинхронную версию write_genreфункции из нашего примера asyncio:

async def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    async with aiohttp.ClientSession() as session:
        async with session.get("https://binaryjazz.us/wp-json/genrenator/v1/genre/") as response:
            genre = await response.json()

    async with aiofiles.open(file_name, "w") as new_file:
        print(f'Writing "{genre}" to "{file_name}"...')
        await new_file.write(genre)

For those not familiar with the async/await syntax that can be found in many other modern languages, async declares that a function, for loop, or with statement must be used asynchronously. To call an async function, you must either use the await keyword from another async function or call create_task() directly from the event loop, which can be grabbed from asyncio.get_event_loop() -- i.e., loop = asyncio.get_event_loop().

Additionally:

  1. async with allows awaiting async responses and file operations.
  2. async for (not used here) iterates over an asynchronous stream.

The Event Loop

Циклы событий — это конструкции, присущие асинхронному программированию, которые позволяют выполнять задачи асинхронно. Поскольку вы читаете эту статью, я могу с уверенностью предположить, что вы, вероятно, не слишком знакомы с этой концепцией. Однако, даже если вы никогда не писали асинхронное приложение, у вас есть опыт работы с циклами событий каждый раз, когда вы используете компьютер. Независимо от того, прослушивает ли ваш компьютер ввод с клавиатуры, играете ли вы в многопользовательские онлайн-игры или просматриваете Reddit во время копирования файлов в фоновом режиме, цикл событий является движущей силой, обеспечивающей бесперебойную и эффективную работу. В чистом виде цикл событий — это процесс, который ожидает триггеров, а затем выполняет определенные (запрограммированные) действия, как только эти триггеры встречаются. Они часто возвращают «обещание» (синтаксис JavaScript) или «будущее». (синтаксис Python) для обозначения того, что задача была добавлена. После завершения задачи обещание или будущее возвращает значение, переданное из вызываемой функции (при условии, что функция действительно возвращает значение).

Идея выполнения функции в ответ на другую функцию называется «обратным вызовом».

Чтобы еще раз взглянуть на обратные вызовы и события, вот отличный ответ на Stack Overflow .

Вот пошаговое руководство по нашей функции:

Мы используем async withдля асинхронного открытия нашего клиентского сеанса. Класс aiohttp.ClientSession()— это то, что позволяет нам делать HTTP-запросы и оставаться на связи с источником, не блокируя выполнение нашего кода. Затем мы делаем асинхронный запрос к Genrenator API и ждем ответа JSON (случайный музыкальный жанр). В следующей строке мы async withснова используем aiofilesбиблиотеку, чтобы асинхронно открыть новый файл, чтобы записать в него наш новый жанр. Печатаем жанр, потом пишем в файл.

В отличие от обычных скриптов Python, программирование с помощью asyncio в значительной степени требует * использования какой-то «основной» функции.

*Если вы не используете устаревший синтаксис «yield» с декоратором @asyncio.coroutine, который будет удален в Python 3.10 .

Это связано с тем, что вам нужно использовать ключевое слово «async», чтобы использовать синтаксис «ожидания», а синтаксис «ожидания» — единственный способ фактически запустить другие асинхронные функции.

Вот наша основная функция:

async def main():
    tasks = []

    for i in range(5):
        tasks.append(write_genre(f"./async/new_file{i}.txt"))

    await asyncio.gather(*tasks)

Как видите, мы объявили это с помощью «async». Затем мы создаем пустой список под названием «задачи» для размещения наших асинхронных задач (вызовы Genrenator и наш файловый ввод-вывод). Мы добавляем наши задачи в наш список, но на самом деле они еще не запущены. Звонки на самом деле не совершаются, пока мы не запланируем их с помощью await asyncio.gather(*tasks). Это запускает все задачи в нашем списке и ждет их завершения, прежде чем продолжить остальную часть нашей программы. Наконец, мы используем asyncio.run(main())для запуска нашей «основной» функции. Функция .run()является точкой входа для нашей программы, и обычно ее следует вызывать только один раз для каждого процесса .

Для тех, кто не знаком, *перед задачами называется «распаковка аргументов». Как это ни звучит, он распаковывает наш список в ряд аргументов для нашей функции. Наша функция asyncio.gather()в этом случае.

И это все, что нам нужно сделать. Теперь запустим нашу программу (источник которой включает в себя те же функции синхронизации, что и примеры синхронного и многопоточного выполнения)...

Writing "albuquerque fiddlehaus" to "./async/new_file1.txt"...
Writing "euroreggaebop" to "./async/new_file2.txt"...
Writing "shoedisco" to "./async/new_file0.txt"...
Writing "russiagaze" to "./async/new_file4.txt"...
Writing "alternative xylophone" to "./async/new_file3.txt"...
Time to complete asyncio read/writes: 0.71 seconds

...мы видим, что это еще быстрее. И вообще метод asyncio всегда будет немного быстрее, чем метод threading. Это связано с тем, что когда мы используем синтаксис «ожидания», мы, по сути, говорим нашей программе «подожди, я сейчас вернусь», но наша программа отслеживает, сколько времени нам потребуется, чтобы закончить то, что мы делаем. Как только мы закончим, наша программа узнает об этом и возобновит работу, как только сможет. Потоки в Python допускают асинхронность, но наша программа теоретически может пропускать разные потоки, которые могут быть еще не готовы, что приводит к потере времени, если есть потоки, готовые к продолжению выполнения.

Итак, когда я должен использовать многопоточность и когда я должен использовать asyncio?

Когда вы пишете новый код, используйте asyncio. Если вам нужно взаимодействовать со старыми библиотеками или теми, которые не поддерживают asyncio, вам может быть лучше использовать многопоточность.

Тестирование asyncio с помощью pytest

Оказывается, тестировать асинхронные функции с помощью pytest так же просто, как тестировать синхронные функции. Просто установите пакет pytest-asyncio с помощью pip, отметьте свои тесты asyncключевым словом и примените декоратор, который дает pytestпонять, что он асинхронный: @pytest.mark.asyncio. Давайте посмотрим на пример.

Во-первых, давайте напишем произвольную асинхронную функцию в файле с именем hello_asyncio.py :

import asyncio


async def say_hello(name: str):
    """ Sleeps for two seconds, then prints 'Hello, {{ name }}!' """
    try:
        if type(name) != str:
            raise TypeError("'name' must be a string")
        if name == "":
            raise ValueError("'name' cannot be empty")
    except (TypeError, ValueError):
        raise

    print("Sleeping...")
    await asyncio.sleep(2)
    print(f"Hello, {name}!")

Функция принимает один строковый аргумент: name. Убедившись, что nameэто строка длиной больше единицы, наша функция асинхронно приостанавливается на две секунды, а затем выводит "Hello, {name}!"на консоль.

Разница между asyncio.sleep()и time.sleep()в том, что asyncio.sleep()он не блокирует.

Теперь давайте проверим это с помощью pytest. В том же каталоге, что и hello_asyncio.py, создайте файл с именем test_hello_asyncio.py, а затем откройте его в своем любимом текстовом редакторе.

Начнем с нашего импорта:

import pytest # Note: pytest-asyncio does not require a separate import

from hello_asyncio import say_hello

Затем мы создадим тест с правильным вводом:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
@pytest.mark.asyncio
async def test_say_hello(name):
    await say_hello(name)

Что следует отметить:

  • Декоратор позволяет pytest @pytest.mark.asyncioработать асинхронно
  • В нашем тесте используется asyncсинтаксис
  • Мы awaitзапускаем нашу асинхронную функцию так, как если бы мы запускали ее вне теста.

Теперь давайте запустим наш тест с подробной -vопцией:

pytest -v
...
collected 3 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED    [ 33%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED     [ 66%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED       [100%]

Выглядит неплохо. Далее мы напишем пару тестов с плохим входом. Вернувшись внутрь test_hello_asyncio.py , давайте создадим класс с именем TestSayHelloThrowsExceptions:

class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    @pytest.mark.asyncio
    async def test_say_hello_value_error(self, name):
        with pytest.raises(ValueError):
            await say_hello(name)

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    @pytest.mark.asyncio
    async def test_say_hello_type_error(self, name):
        with pytest.raises(TypeError):
            await say_hello(name)

Опять же, мы украшаем наши тесты с помощью @pytest.mark.asyncio, помечаем наши тесты asyncсинтаксисом, а затем вызываем нашу функцию с помощью await.

Запустите тесты еще раз:

pytest -v
...
collected 7 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED                                    [ 14%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED                                     [ 28%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED                                       [ 42%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_value_error[] PASSED        [ 57%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[19] PASSED       [ 71%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name1] PASSED    [ 85%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name2] PASSED    [100%]

Без pytest-asyncio

В качестве альтернативы pytest-asyncio вы можете создать фикстуру pytest, которая создает цикл событий asyncio:

import asyncio
import pytest

from hello_asyncio import say_hello


@pytest.fixture
def event_loop():
    loop = asyncio.get_event_loop()
    yield loop

Затем вместо использования синтаксиса async/ awaitвы создаете свои тесты, как обычные синхронные тесты:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
def test_say_hello(event_loop, name):
    event_loop.run_until_complete(say_hello(name))


class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    def test_say_hello_value_error(self, event_loop, name):
        with pytest.raises(ValueError):
            event_loop.run_until_complete(say_hello(name))

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    def test_say_hello_type_error(self, event_loop, name):
        with pytest.raises(TypeError):
            event_loop.run_until_complete(say_hello(name))

If you're interested, here's a more advanced tutorial on asyncio testing.

Further Reading

If you want to learn more about what distinguishes Python's implementation of threading vs asyncio, here's a great article from Medium.

For even better examples and explanations of threading in Python, here's a video by Corey Schafer that goes more in-depth, including using the concurrent.futures library.

Lastly, for a massive deep-dive into asyncio itself, here's an article from Real Python completely dedicated to it.

Bonus: One more library you might be interested in is called Unsync, especially if you want to easily convert your current synchronous code into asynchronous code. To use it, you install the library with pip, import it with from unsync import unsync, then decorate whatever currently synchronous function with @unsync to make it asynchronous. To await it and get its return value (which you can do anywhere -- it doesn't have to be in an async/unsync function), just call .result() after the function call.

Parallelism

What is parallelism?

Параллелизм очень сильно связан с параллелизмом. На самом деле, параллелизм — это подмножество параллелизма: в то время как параллельный процесс выполняет несколько задач одновременно, независимо от того, отвлекается ли на них все внимание или нет, параллельный процесс физически выполняет несколько задач одновременно. Хорошим примером может быть вождение, прослушивание музыки и одновременная поедание BLT, которое мы сделали в последнем разделе.

одновременный и параллельный

Поскольку они не требуют больших интенсивных усилий, вы можете выполнять их все сразу, не ожидая ничего и не отвлекая внимания.

Теперь давайте посмотрим, как это реализовать на Python. Мы могли бы использовать multiprocessingбиблиотеку, но давайте concurrent.futuresвместо этого воспользуемся библиотекой — она устраняет необходимость управлять количеством процессов вручную. Поскольку основное преимущество многопроцессорной обработки возникает, когда вы выполняете несколько ресурсоемких задач, мы собираемся вычислить квадраты от 1 миллиона (1000000) до 1 миллиона и 16 (1000016).

Вы можете найти код для этого примера здесь .

Единственный импорт, который нам понадобится, это concurrent.futures:

import concurrent.futures
import time


if __name__ == "__main__":
    pow_list = [i for i in range(1000000, 1000016)]

    print("Starting...")
    start = time.time()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(pow, i, i) for i in pow_list]

    for f in concurrent.futures.as_completed(futures):
        print("okay")

    end = time.time()
    print(f"Time to complete: {round(end - start, 2)}")

Поскольку я разрабатываю на компьютере с Windows, я использую if __name__ == "main". Это необходимо, поскольку в Windows нет forkсистемного вызова, присущего системам Unix . Поскольку Windows не имеет такой возможности, она прибегает к запуску нового интерпретатора для каждого процесса, пытающегося импортировать основной модуль. Если основной модуль не существует, он перезапускает всю вашу программу, вызывая рекурсивный хаос.

Итак, взглянув на нашу основную функцию, мы используем понимание списка для создания списка от 1 миллиона до 1 миллиона и 16, мы открываем ProcessPoolExecutor с concurrent.futures и используем понимание списка и ProcessPoolExecutor().submit()начинаем выполнять наши процессы и бросать их в список под названием «фьючерсы».

Мы также могли бы использовать ThreadPoolExecutor(), если бы вместо этого хотели использовать потоки — concurrent.futures универсален.

И здесь проявляется асинхронность: список «результаты» на самом деле не содержит результатов выполнения наших функций. Вместо этого он содержит «фьючерсы», которые аналогичны идее «обещаний» в JavaScript. Чтобы наша программа продолжала работать, мы возвращаем эти фьючерсы, которые представляют собой заполнитель для значения. Если мы попытаемся напечатать будущее, в зависимости от того, завершено оно или нет, мы либо вернемся в состояние «ожидание», либо «завершено». После завершения мы можем получить возвращаемое значение (при условии, что оно есть), используя var.result(). В этом случае наша переменная будет «результатом».

Затем мы повторяем наш список фьючерсов, но вместо того, чтобы печатать наши значения, мы просто печатаем «хорошо». Это просто из-за того, насколько массивными получаются результирующие вычисления.

Как и прежде, я создал скрипт сравнения, который делает это синхронно. И, как и прежде, вы можете найти его на GitHub .

Запустив нашу управляющую программу, которая также включает в себя функции синхронизации нашей программы, мы получаем:

Starting...
okay
...
okay
Time to complete: 54.64

Ух ты. 54,64 секунды — это довольно много. Посмотрим, будет ли лучше наша версия с многопроцессорностью:

Starting...
okay
...
okay
Time to complete: 6.24

Наше время значительно сократилось. Мы находимся примерно в 1/9 от нашего первоначального времени.

Так что же произойдет, если вместо этого мы будем использовать потоки?

Я уверен, вы можете догадаться - это будет не намного быстрее, чем синхронное выполнение. На самом деле, это может быть медленнее, потому что для создания новых потоков по-прежнему требуется немного времени и усилий. Но не верьте мне на слово, вот что мы получим, если заменим ProcessPoolExecutor()на ThreadPoolExecutor():

Starting...
okay
...
okay
Time to complete: 53.83

Как я упоминал ранее, многопоточность позволяет вашим приложениям сосредоточиться на новых задачах, пока другие ждут. В этом случае мы никогда не сидим сложа руки. С другой стороны, многопроцессорность запускает совершенно новые сервисы, обычно на отдельных ядрах ЦП, готовые делать все, что вы попросите, полностью в тандеме с тем, что делает ваш скрипт. Вот почему многопроцессорная версия, занимающая примерно 1/9 времени, имеет смысл — у меня 8 ядер в моем процессоре.

Теперь, когда мы поговорили о параллелизме и параллелизме в Python, мы можем, наконец, прояснить термины. Если у вас возникли проблемы с различием между терминами, вы можете безопасно и точно думать о наших предыдущих определениях «параллелизм» и «параллелизм» как «параллельный параллелизм» и «непараллельный параллелизм» соответственно.

Дальнейшее чтение

В Real Python есть отличная статья о параллелизме и параллелизме .

У Engineer Man есть хорошее видео, сравнивающее многопоточность и многопроцессорность .

У Кори Шафера также есть хорошее видео о многопроцессорности в том же духе, что и его видео о многопоточности.

Если вы смотрите только одно видео, посмотрите это превосходное выступление Рэймонда Хеттингера . Он проделывает потрясающую работу, объясняя различия между многопроцессорностью, многопоточностью и асинхронностью.

Сочетание Asyncio с многопроцессорностью

Что делать, если мне нужно объединить множество операций ввода-вывода с тяжелыми вычислениями?

Мы тоже можем это сделать. Скажем, вам нужно очистить 100 веб-страниц для определенной части информации, а затем вам нужно сохранить эту часть информации в файле на потом. Мы можем разделить вычислительную мощность между каждым из ядер нашего компьютера, заставив каждый процесс очищать часть страниц.

Для этого скрипта давайте установим Beautiful Soup , который поможет нам легко очищать наши страницы: pip install beautifulsoup4. На этот раз у нас на самом деле довольно много импорта. Вот они, и вот почему мы их используем:

import asyncio                         # Gives us async/await
import concurrent.futures              # Allows creating new processes
import time
from math import floor                 # Helps divide up our requests evenly across our CPU cores
from multiprocessing import cpu_count  # Returns our number of CPU cores

import aiofiles                        # For asynchronously performing file I/O operations
import aiohttp                         # For asynchronously making HTTP requests
from bs4 import BeautifulSoup          # For easy webpage scraping

Вы можете найти код для этого примера здесь .

Во-первых, мы собираемся создать асинхронную функцию, которая отправляет запрос в Википедию на получение случайных страниц. Мы будем очищать заголовок каждой страницы, который получим, с помощью BeautifulSoup, а затем добавим его в заданный файл; мы будем отделять каждый заголовок табуляцией. Функция будет принимать два аргумента:

  1. num_pages — количество страниц для запроса и очистки заголовков
  2. output_file - файл, в который добавляются наши заголовки.
async def get_and_scrape_pages(num_pages: int, output_file: str):
    """
    Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
    articles, then scrapes each page for its title and appends it to {{ output_file }},
    separating each title with a tab: "\\t"

    #### Arguments
    ---
    num_pages: int -
        Number of random Wikipedia pages to request and scrape

    output_file: str -
        File to append titles to
    """
    async with \
    aiohttp.ClientSession() as client, \
    aiofiles.open(output_file, "a+", encoding="utf-8") as f:

        for _ in range(num_pages):
            async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
                if response.status > 399:
                    # I was getting a 429 Too Many Requests at a higher volume of requests
                    response.raise_for_status()

                page = await response.text()
                soup = BeautifulSoup(page, features="html.parser")
                title = soup.find("h1").text

                await f.write(title + "\t")

        await f.write("\n")

Мы оба асинхронно открываем aiohttp ClientSessionи наш выходной файл. Режим a+означает добавление к файлу и создание его, если он еще не существует. Кодирование наших строк как utf-8 гарантирует, что мы не получим ошибку, если наши заголовки содержат международные символы. Если мы получим ответ об ошибке, мы поднимем его вместо продолжения (при больших объемах запросов я получал 429 Too Many Requests). Мы асинхронно получаем текст из нашего ответа, затем разбираем заголовок и асинхронно добавляем его в наш файл. После того, как мы добавим все наши заголовки, мы добавим новую строку: "\n".

Наша следующая функция — это функция, которую мы будем запускать с каждым новым процессом, чтобы разрешить его асинхронный запуск:

def start_scraping(num_pages: int, output_file: str, i: int):
    """ Starts an async process for requesting and scraping Wikipedia pages """
    print(f"Process {i} starting...")
    asyncio.run(get_and_scrape_pages(num_pages, output_file))
    print(f"Process {i} finished.")

Теперь о нашей основной функции. Начнем с некоторых констант (и объявления нашей функции):

def main():
    NUM_PAGES = 100 # Number of pages to scrape altogether
    NUM_CORES = cpu_count() # Our number of CPU cores (including logical cores)
    OUTPUT_FILE = "./wiki_titles.tsv" # File to append our scraped titles to

    PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
    PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % PAGES_PER_CORE # For our final core

А теперь логика:

    futures = []

    with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
        for i in range(NUM_CORES - 1):
            new_future = executor.submit(
                start_scraping, # Function to perform
                # v Arguments v
                num_pages=PAGES_PER_CORE,
                output_file=OUTPUT_FILE,
                i=i
            )
            futures.append(new_future)

        futures.append(
            executor.submit(
                start_scraping,
                PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES-1
            )
        )

    concurrent.futures.wait(futures)

Мы создаем массив для хранения наших фьючерсов, затем мы создаем ProcessPoolExecutor, устанавливая его max_workersравным нашему количеству ядер. Мы перебираем диапазон, равный нашему количеству ядер минус 1, запуская новый процесс с нашей start_scrapingфункцией. Затем мы добавляем к нему наш список фьючерсов. У нашего последнего ядра потенциально будет дополнительная работа, поскольку оно будет очищать количество страниц, равное каждому из наших других ядер, но дополнительно будет очищать количество страниц, равное остатку, который мы получили при делении нашего общего количества страниц для очистки. по общему количеству ядер процессора.

Убедитесь, что ваша основная функция действительно запущена:

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Time to complete: {round(time.time() - start, 2)} seconds.")

После запуска программы на моем 8-ядерном процессоре (вместе с кодом бенчмаркинга):

Эта версия ( asyncio с многопроцессорностью ):

Time to complete: 5.65 seconds.

Только многопроцессорность :

Time to complete: 8.87 seconds.

только асинкио :

Time to complete: 47.92 seconds.

Полностью синхронно :

Time to complete: 88.86 seconds.

I'm actually quite surprised to see that the improvement of asyncio with multiprocessing over just multiprocessing wasn't as great as I thought it would be.

Recap: When to use multiprocessing vs asyncio or threading

  1. Use multiprocessing when you need to do many heavy calculations and you can split them up.
  2. Use asyncio or threading when you're performing I/O operations -- communicating with external resources or reading/writing from/to files.
  3. Multiprocessing and asyncio can be used together, but a good rule of thumb is to fork a process before you thread/use asyncio instead of the other way around -- threads are relatively cheap compared to processes.

Async/Await in Other Languages

async/await and similar syntax also exist in other languages, and in some of those languages, the implementation can differ drastically.

.NET: From F# to C#

The first programming language (back in 2007) to use the async syntax was Microsoft's F#. Whereas it doesn't exactly use await to wait on a function call, it uses specific syntax like let! and do! along with proprietary Async functions included in the System module.

You can find more about async programming in F# on Microsoft's F# docs.

Their C# team then built upon this concept, and that's where the async/await keywords that we're now familiar with were born:

using System;

// Allows the "Task" return type
using System.Threading.Tasks;

public class Program
{
    // Declare an async function with "async"
    private static async Task<string> ReturnHello()
    {
        return "hello world";
    }

    // Main can be async -- no problem
    public static async Task Main()
    {
        // await an async string
        string result = await ReturnHello();

        // Print the string we got asynchronously
        Console.WriteLine(result);
    }
}

Run it on .NETFiddle

We ensure that we're using System.Threading.Tasks, as it includes the Task type, and, in general, the Task type is needed for an async function to be awaited. The cool thing about C# is that you can make your main function asynchronous just by declaring it with async, and you won't have any issues.
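For comparison, here's a rough Python equivalent of the C# program above -- a sketch, with asyncio.run() standing in for C#'s async Main:

import asyncio

# Declare an async function with "async"
async def return_hello() -> str:
    return "hello world"

async def main() -> None:
    # await an async string
    result = await return_hello()

    # Print the string we got asynchronously
    print(result)

# Python's entry point for async code; plays the role of C#'s async Main
asyncio.run(main())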

If you're interested in learning more about async/await in C#, Microsoft's C# docs have a good page on it.

JavaScript

First introduced in ES2017, the async/await syntax is essentially an abstraction over JavaScript promises (which are similar to Python futures). Unlike Python, however, so long as you're not awaiting, you can call an async function normally, without a special entry point like Python's asyncio.run():

// Declare a function with async
async function returnHello(){
    return "hello world";
}

async function printSomething(){
    // await an async string
    const result = await returnHello();

    // print the string we got asynchronously
    console.log(result);
}

// Run our async code
printSomething();

Run it on JSFiddle

See MDN for more information on async/await in JavaScript.
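To see that difference from the Python side: calling an async function "normally" in Python doesn't execute it -- it just builds a coroutine object that must eventually be awaited or handed to the event loop. A small sketch:

import asyncio

async def return_hello() -> str:
    return "hello world"

coro = return_hello()      # nothing has run yet -- this is just a coroutine object
print(coro)                # <coroutine object return_hello at 0x...>

print(asyncio.run(coro))   # only now does it actually execute: prints "hello world"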

Rust

Rust now also allows the use of the async/await syntax, and it works similarly to Python, C#, and JavaScript:

// Allows blocking synchronous code to run async code
use futures::executor::block_on;

// Declare an async function with "async"
async fn return_hello() -> String {
    "hello world".to_string()
}

// Code that awaits must also be declared with "async"
async fn print_something(){
    // await an async String
    let result: String = return_hello().await;

    // Print the string we got asynchronously
    println!("{0}", result);
}

fn main() {
    // Block the current synchronous execution to run our async code
    block_on(print_something());
}

Run it on Rust Play

To use async functions, we must first add futures = "0.3" to our Cargo.toml. Then we import the block_on function with use futures::executor::block_on -- block_on is necessary in order to run our async function from our synchronous main function.

You can find more information about async/await in Rust in the Rust docs.

Go

Rather than the traditional async/await syntax inherent to all the previous languages we've covered, Go uses "goroutines" and "channels." You can think of a channel as similar to a Python future. In Go, you generally send a channel as an argument to a function, then use go to run the function concurrently. Whenever you need to ensure the function has finished, you use the <- syntax, which you can think of as the more common await syntax. If your goroutine (the function you're running asynchronously) has a return value, it can be grabbed this way.

package main

import "fmt"

// "chan" makes the return value a string channel instead of a string
func returnHello(result chan string){
    // Gives our channel a value
    result <- "hello world"
}

func main() {
    // Creates a string channel
    result := make(chan string)

    // Starts execution of our goroutine
    go returnHello(result)

    // Awaits and prints our string
    fmt.Println(<- result)
}

Run it on the Go Playground

For more information on concurrency in Go, see An Introduction to Programming in Go by Caleb Doxsey.
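For a loose Python analogue of the Go program above, an asyncio.Queue can play the role of the channel and create_task() the role of the go statement -- a sketch, not a one-to-one mapping:

import asyncio

async def return_hello(result: asyncio.Queue) -> None:
    # Gives our "channel" a value
    await result.put("hello world")

async def main() -> None:
    result = asyncio.Queue()                   # roughly: result := make(chan string)
    asyncio.create_task(return_hello(result))  # roughly: go returnHello(result)
    print(await result.get())                  # roughly: fmt.Println(<- result)

asyncio.run(main())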

Ruby

Similar to Python, Ruby also has the Global Interpreter Lock limitation. What it doesn't have is concurrency built into the language. However, there is a community-created gem that allows concurrency in Ruby, and you can find its source code on GitHub.

Java

Like Ruby, Java doesn't have built-in async/await syntax, but it does have concurrency capabilities using the java.util.concurrent module. However, Electronic Arts wrote an async library that allows the use of await as a method. It's not exactly the same as Python/C#/JavaScript/Rust, but it's worth looking into if you're a Java developer interested in this kind of functionality.

C++

While C++ also doesn't have the async/await syntax, it does have the ability to use futures to run code concurrently via the <future> header:

#include <iostream>
#include <string>

// Necessary for futures
#include <future>

// No async declaration needed
std::string return_hello() {
    return "hello world";
}

int main ()
{
    // Declares a string future
    std::future<std::string> fut = std::async(return_hello);

    // Awaits the result of the future
    std::string result = fut.get();

    // Prints the string we got asynchronously
    std::cout << result << '\n';
}

Run it on C++ Shell

There's no need to declare a function with any keyword to denote whether or not it can or should be run asynchronously. Instead, you declare your initial future whenever you need it with std::future<{{ function return type }}> and set it equal to std::async(), including the name of the function you want to perform asynchronously along with any arguments it takes -- i.e., std::async(do_something, 1, 2, "string"). To await the value of the future, use the .get() syntax on it.
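That pattern maps almost directly onto Python's concurrent.futures, which we used earlier -- submit() plays the role of std::async() and .result() the role of .get(). A sketch:

from concurrent.futures import ThreadPoolExecutor

# No async declaration needed
def return_hello() -> str:
    return "hello world"

with ThreadPoolExecutor() as executor:
    fut = executor.submit(return_hello)  # like std::async(return_hello)
    result = fut.result()                # like fut.get() -- blocks until ready
    print(result)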

You can find documentation for async in C++ on cplusplus.com.

Summary

Whether you're dealing with asynchronous network or file operations or you're performing numerous complex calculations, there are a few different ways to maximize your code's efficiency.

If you're using Python, you can use asyncio or threading to make the most of I/O operations, or the multiprocessing module for CPU-intensive code.

Also remember that the concurrent.futures module can be used in place of either threading or multiprocessing.

If you're using another programming language, chances are there's an implementation of async/await for it too.

Source: https://testdriven.io

#python #concurrency #asyncio 

Apply Concurrency, Parallelism, and Asyncio to Speed Up Python
Shayna  Lowe

Shayna Lowe

1660198260

Appliquer La Concurrence, Le Parallélisme Et L'asyncio Pour Accélérer

Que sont la concurrence et le parallélisme, et comment s'appliquent-ils à Python ?

Il existe de nombreuses raisons pour lesquelles vos applications peuvent être lentes. Parfois, cela est dû à une mauvaise conception algorithmique ou à un mauvais choix de structure de données. Parfois, cependant, cela est dû à des forces indépendantes de notre volonté, telles que des contraintes matérielles ou les bizarreries du réseau. C'est là que la concurrence et le parallélisme s'intègrent. Ils permettent à vos programmes de faire plusieurs choses à la fois, soit en même temps, soit en perdant le moins de temps possible à attendre des tâches occupées.

Que vous ayez affaire à des ressources Web externes, que vous lisiez et écriviez dans plusieurs fichiers, ou que vous ayez besoin d'utiliser plusieurs fois une fonction gourmande en calculs avec différents paramètres, cet article devrait vous aider à maximiser l'efficacité et la vitesse de votre code.

Tout d'abord, nous allons approfondir ce que sont la concurrence et le parallélisme et comment ils s'intègrent dans le domaine de Python en utilisant des bibliothèques standard telles que le threading, le multitraitement et l'asyncio. La dernière partie de cet article comparera l'implémentation de async/ de Python awaitavec la façon dont d'autres langages les ont implémentés.

Vous pouvez trouver tous les exemples de code de cet article dans le référentiel concurrency-parallelism-and-asyncio sur GitHub.

Pour parcourir les exemples de cet article, vous devez déjà savoir comment utiliser les requêtes HTTP.

Objectifs

À la fin de cet article, vous devriez être en mesure de répondre aux questions suivantes :

  1. Qu'est-ce que la concurrence ?
  2. Qu'est-ce qu'un fil ?
  3. Qu'est-ce que cela signifie quand quelque chose ne bloque pas ?
  4. Qu'est-ce qu'une boucle événementielle ?
  5. Qu'est-ce qu'un rappel ?
  6. Pourquoi la méthode asyncio est-elle toujours un peu plus rapide que la méthode threading ?
  7. Quand devez-vous utiliser le threading et quand devez-vous utiliser asyncio ?
  8. Qu'est-ce que le parallélisme ?
  9. Quelle est la différence entre concurrence et parallélisme ?
  10. Est-il possible de combiner l'asyncio avec le multitraitement ?
  11. Quand devriez-vous utiliser le multitraitement par rapport à l'asyncio ou au threading ?
  12. Quelle est la différence entre multiprocessing, asyncio et concurrency.futures ?
  13. Comment puis-je tester asyncio avec pytest?

Concurrence

Qu'est-ce que la concurrence ?

Une définition efficace de la simultanéité est "être capable d'effectuer plusieurs tâches à la fois". C'est un peu trompeur, car les tâches peuvent ou non être effectuées exactement au même moment. Au lieu de cela, un processus peut démarrer, puis une fois qu'il attend une instruction spécifique pour se terminer, passer à une nouvelle tâche, pour revenir une fois qu'il n'attend plus. Une fois qu'une tâche est terminée, il repasse à une tâche inachevée jusqu'à ce qu'elles aient toutes été exécutées. Les tâches démarrent de manière asynchrone, sont exécutées de manière asynchrone, puis se terminent de manière asynchrone.

concurrence, pas parallèle

Si cela vous a déconcerté, pensons plutôt à une analogie : Supposons que vous vouliez faire un BLT . Tout d'abord, vous voudrez jeter le bacon dans une casserole à feu moyen-doux. Pendant la cuisson du bacon, vous pouvez sortir vos tomates et laitues et commencer à les préparer (laver et couper). Pendant tout ce temps, vous continuez à vérifier et à retourner de temps en temps votre bacon.

À ce stade, vous avez commencé une tâche, puis commencé et terminé deux autres entre-temps, tout en attendant toujours la première.

Finalement, vous mettez votre pain dans un grille-pain. Pendant qu'il grille, vous continuez à vérifier votre bacon. Au fur et à mesure que les pièces sont terminées, vous les sortez et les placez sur une assiette. Une fois votre pain grillé, vous y appliquez la pâte à tartiner de votre choix, puis vous pouvez commencer à superposer vos tomates, votre laitue, puis, une fois la cuisson terminée, votre bacon. Ce n'est qu'une fois que tout est cuit, préparé et en couches que vous pouvez placer le dernier morceau de pain grillé sur votre sandwich, le trancher (facultatif) et le manger.

Parce qu'il vous oblige à effectuer plusieurs tâches en même temps, la création d'un BLT est par nature un processus simultané, même si vous n'accordez pas toute votre attention à chacune de ces tâches en même temps. À toutes fins utiles, pour la section suivante, nous désignerons cette forme de concurrence par « concurrence ». Nous le différencierons plus tard dans cet article.

Pour cette raison, la simultanéité est idéale pour les processus gourmands en E/S, c'est-à-dire les tâches qui impliquent d'attendre des requêtes Web ou des opérations de lecture/écriture de fichiers.

En Python, il existe plusieurs façons d'obtenir la concurrence. La première que nous allons examiner est la bibliothèque de threads.

Pour nos exemples dans cette section, nous allons construire un petit programme Python qui récupère cinq fois un genre musical aléatoire de l'API Genrenator de Binary Jazz , imprime le genre à l'écran et place chacun dans son propre fichier.

Pour travailler avec le threading en Python, la seule importation dont vous aurez besoin est threading, mais pour cet exemple, j'ai également importé urllibpour travailler avec des requêtes HTTP, timepour déterminer combien de temps les fonctions prennent pour se terminer et jsonpour convertir facilement les données json renvoyées depuis l'API Genrenator.

Vous pouvez trouver le code de cet exemple ici .

Commençons par une fonction simple :

def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    req = Request("https://binaryjazz.us/wp-json/genrenator/v1/genre/", headers={"User-Agent": "Mozilla/5.0"})
    genre = json.load(urlopen(req))

    with open(file_name, "w") as new_file:
        print(f"Writing '{genre}' to '{file_name}'...")
        new_file.write(genre)

En examinant le code ci-dessus, nous faisons une demande à l'API Genrenator, chargeons sa réponse JSON (un genre musical aléatoire), l'imprimons, puis l'écrivons dans un fichier.

Sans l'en-tête "User-Agent", vous recevrez un 304.

Ce qui nous intéresse vraiment, c'est la section suivante, où le threading réel se produit :

threads = []

for i in range(5):
    thread = threading.Thread(
        target=write_genre,
        args=[f"./threading/new_file{i}.txt"]
    )
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

Nous commençons d'abord par une liste. Nous procédons ensuite à une itération cinq fois, en créant un nouveau thread à chaque fois. Ensuite, nous démarrons chaque thread, l'ajoutons à notre liste de "threads", puis parcourons notre liste une dernière fois pour rejoindre chaque thread.

Explication : Créer des threads en Python est facile.

Pour créer un nouveau fil, utilisez threading.Thread(). Vous pouvez y passer le kwarg (argument de mot clé) targetavec une valeur de la fonction que vous souhaitez exécuter sur ce thread. Mais ne transmettez que le nom de la fonction, pas sa valeur (c'est-à-dire, pour nos besoins, write_genreet non write_genre()). Pour passer des arguments, passez "kwargs" (qui prend un dict de vos kwargs) ou "args" (qui prend un itérable contenant vos arguments - dans ce cas, une liste).

Cependant, créer un thread n'est pas la même chose que démarrer un thread. Pour démarrer votre fil, utilisez {the name of your thread}.start(). Démarrer un thread signifie "démarrer son exécution".

Enfin, lorsque nous rejoignons des threads avec thread.join(), tout ce que nous faisons est de nous assurer que le thread est terminé avant de continuer avec notre code.

Fils

Mais qu'est-ce qu'un fil exactement ?

Un thread est un moyen de permettre à votre ordinateur de décomposer un processus/programme unique en plusieurs éléments légers qui s'exécutent en parallèle. De manière quelque peu déroutante, l'implémentation standard de Python du threading limite les threads à ne pouvoir s'exécuter qu'un seul à la fois en raison de quelque chose appelé le Global Interpreter Lock (GIL). Le GIL est nécessaire car la gestion de la mémoire de CPython (l'implémentation par défaut de Python) n'est pas thread-safe. En raison de cette limitation, le threading en Python est simultané, mais pas parallèle. Pour contourner ce problème, Python dispose d'un multiprocessingmodule séparé non limité par le GIL qui exécute des processus séparés, permettant l'exécution parallèle de votre code. L'utilisation du multiprocessingmodule est presque identique à l'utilisation du threadingmodule.

Plus d'informations sur le GIL de Python et la sécurité des threads peuvent être trouvées sur Real Python et la documentation officielle de Python .

Nous examinerons plus en détail le multitraitement en Python sous peu.

Avant de montrer l'amélioration potentielle de la vitesse par rapport au code non-thread, j'ai pris la liberté de créer également une version non-thread du même programme (là encore, disponible sur GitHub ). Au lieu de créer un nouveau thread et de joindre chacun d'eux, il appelle write_genreà la place une boucle for qui itère cinq fois.

Pour comparer les benchmarks de vitesse, j'ai aussi importé la timelibrairie pour chronométrer l'exécution de nos scripts :

Starting...
Writing "binary indoremix" to "./sync/new_file0.txt"...
Writing "slavic aggro polka fusion" to "./sync/new_file1.txt"...
Writing "israeli new wave" to "./sync/new_file2.txt"...
Writing "byzantine motown" to "./sync/new_file3.txt"...
Writing "dutch hate industrialtune" to "./sync/new_file4.txt"...
Time to complete synchronous read/writes: 1.42 seconds

Lors de l'exécution du script, nous constatons qu'il faut environ 1,49 seconde à mon ordinateur (ainsi que des genres musicaux classiques tels que "dutch hate industrialtune"). Pas mal.

Exécutons maintenant la version qui utilise le threading :

Starting...
Writing "college k-dubstep" to "./threading/new_file2.txt"...
Writing "swiss dirt" to "./threading/new_file0.txt"...
Writing "bop idol alternative" to "./threading/new_file4.txt"...
Writing "ethertrio" to "./threading/new_file1.txt"...
Writing "beach aust shanty français" to "./threading/new_file3.txt"...
Time to complete threading read/writes: 0.77 seconds

The first thing that might stand out to you is that the functions aren't completed in order: 2 - 0 - 4 - 1 - 3

This is due to the asynchronous nature of threading: as one function waits, another begins, and so on. Because we're able to continue performing tasks while we're waiting on others to finish (whether due to networking or file I/O operations), you may also have noticed that we cut our time roughly in half: 0.77 seconds. Whereas this might not seem like much now, it's easy to imagine the very real case of building a web application that needs to write far more data to a file or interact with far more complex web services.

So, if threading is so great, why don't we end the article here?

Because there are even better ways to perform tasks concurrently.

Asyncio

Let's take a look at an example using asyncio. For this method, we're going to install aiohttp using pip. This will allow us to make non-blocking requests and receive responses using the async/await syntax that will be introduced shortly. It also has the added benefit of a function that converts a JSON response without needing to import the json library. We'll also install and import aiofiles, which allows non-blocking file operations. Other than aiohttp and aiofiles, import asyncio, which comes with the Python standard library.

"Non-blocking" means a program will allow other threads to continue running while it's waiting. This is opposed to "blocking" code, which stops execution of your program completely. Normal, synchronous I/O operations suffer from this limitation.

You can find the code for this example here.

Once we have our imports in place, let's take a look at the asynchronous version of the write_genre function from our asyncio example:

async def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    async with aiohttp.ClientSession() as session:
        async with session.get("https://binaryjazz.us/wp-json/genrenator/v1/genre/") as response:
            genre = await response.json()

    async with aiofiles.open(file_name, "w") as new_file:
        print(f'Writing "{genre}" to "{file_name}"...')
        await new_file.write(genre)

For those unfamiliar with the async/await syntax found in many other modern languages: async declares that a function, for loop, or with statement must be used asynchronously. To call an async function, you must either use the await keyword from another async function, or call create_task() directly from the event loop, which can be grabbed from asyncio.get_event_loop() -- i.e., loop = asyncio.get_event_loop().

Additionally:

  1. async with allows awaiting async responses and file operations.
  2. async for (not used here) iterates over an asynchronous stream; see the sketch below.
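
As a quick illustration of the second point, here's a minimal sketch (with a hypothetical countdown generator, not code from the article) of iterating over an asynchronous stream with async for:

import asyncio


async def countdown(n):
    """An async generator -- an asynchronous stream of values."""
    while n > 0:
        await asyncio.sleep(0.1)
        yield n
        n -= 1


async def main():
    async for value in countdown(3):  # consume the stream as values arrive
        print(value)

asyncio.run(main())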

The Event Loop

Event loops are constructs inherent to asynchronous programming that allow performing tasks asynchronously. As you're reading this article, I can safely assume you're probably not too familiar with the concept. However, even if you've never written an async application, you have experience with event loops every time you use a computer. Whether your computer is listening for keyboard input, you're playing online multiplayer games, or you're browsing Reddit while you have files copying in the background, an event loop is the driving force that keeps everything working smoothly and efficiently. In its purest essence, an event loop is a process that waits around for triggers and then performs specific (programmed) actions once those triggers are met. They often return a "promise" (JavaScript syntax) or "future" (Python syntax) of some sort to denote that a task has been added. Once the task is finished, the promise or future returns a value passed back from the called function (assuming the function does return a value).

The idea of performing a function in response to another function is called a "callback."

For another take on callbacks and events, here's a great answer on Stack Overflow.
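
To make the trigger/callback idea concrete, here's a minimal sketch (not from the article) using asyncio's event loop: we schedule a callback that fulfills a future, and execution resumes once that trigger fires:

import asyncio


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()  # a placeholder for a value

    # Schedule a callback: after one second, the loop sets the future's result
    loop.call_later(1, future.set_result, "task finished")

    print(await future)  # resumes as soon as the callback fires

asyncio.run(main())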

Here's a walkthrough of our function:

We're using async with to open our client session asynchronously. The aiohttp.ClientSession() class is what allows us to make HTTP requests and remain connected to a source without blocking the execution of our code. We then make an async request to the Genrenator API and await the JSON response (a random music genre). In the next line, we use async with again with the aiofiles library to asynchronously open a new file to write our new genre to. We print the genre, then write it to the file.

Unlike regular Python scripts, programming with asyncio pretty much enforces* using some sort of "main" function.

*Unless you're using the deprecated "yield" syntax with the @asyncio.coroutine decorator, which will be removed in Python 3.10.

This is because you need to use the "async" keyword in order to use the "await" syntax, and the "await" syntax is the only way to actually run other async functions.

Here's our main function:

async def main():
    tasks = []

    for i in range(5):
        tasks.append(write_genre(f"./async/new_file{i}.txt"))

    await asyncio.gather(*tasks)

As you can see, we've declared it with "async." We then create an empty list called "tasks" to house our async tasks (calls to Genrenator and our file I/O). We append our tasks to our list, but they aren't actually run yet. The calls don't actually get made until we schedule them with await asyncio.gather(*tasks). This runs all of the tasks in our list and waits for them to finish before continuing with the rest of our program. Lastly, we use asyncio.run(main()) to run our "main" function. The .run() function is the entry point for our program, and it should generally only be called once per process.

For those not familiar, the * in front of tasks is called "argument unpacking." Just as it sounds, it unpacks our list into a series of arguments for our function. Our function is asyncio.gather() in this case.
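
If unpacking is new to you, here's a tiny standalone example (the add function is just for illustration):

def add(a, b, c):
    return a + b + c


numbers = [1, 2, 3]
print(add(*numbers))  # unpacked into add(1, 2, 3) -- prints 6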

And that's all we need to do. Now, running our program (the source of which includes the same timing functionality as the synchronous and threading examples)...

Writing "albuquerque fiddlehaus" to "./async/new_file1.txt"...
Writing "euroreggaebop" to "./async/new_file2.txt"...
Writing "shoedisco" to "./async/new_file0.txt"...
Writing "russiagaze" to "./async/new_file4.txt"...
Writing "alternative xylophone" to "./async/new_file3.txt"...
Time to complete asyncio read/writes: 0.71 seconds

...we see it's even faster still. And, in general, the asyncio method will always be a bit faster than the threading method. This is because when we use the "await" syntax, we essentially tell our program "hold on, I'll be right back," but our program keeps track of how long it takes us to finish what we're doing. Once we're done, our program will know and will pick back up as soon as it's able. Threading in Python allows asynchronicity, but our program could theoretically skip around to different threads that may not yet be ready, wasting time if there are threads ready to keep running.

So, when should I use threading, and when should I use asyncio?

When you're writing new code, use asyncio. If you need to interface with older libraries or those that don't support asyncio, you may be better off with threading.

Testing asyncio with pytest

It turns out testing async functions with pytest is as easy as testing synchronous functions. Just install the pytest-asyncio package with pip, mark your tests with the async keyword, and apply a decorator that lets pytest know it's asynchronous: @pytest.mark.asyncio. Let's look at an example.

First, let's write an arbitrary async function in a file called hello_asyncio.py:

import asyncio


async def say_hello(name: str):
    """ Sleeps for two seconds, then prints 'Hello, {{ name }}!' """
    try:
        if type(name) != str:
            raise TypeError("'name' must be a string")
        if name == "":
            raise ValueError("'name' cannot be empty")
    except (TypeError, ValueError):
        raise

    print("Sleeping...")
    await asyncio.sleep(2)
    print(f"Hello, {name}!")

The function takes a single string argument: name. After ensuring that name is a non-empty string, our function asynchronously sleeps for two seconds, then prints "Hello, {name}!" to the console.

The difference between asyncio.sleep() and time.sleep() is that asyncio.sleep() is non-blocking.
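
A quick way to see that difference: because asyncio.sleep() hands control back to the event loop, two awaited sleeps can overlap. A minimal sketch:

import asyncio
import time


async def main():
    start = time.time()
    # Both sleeps run concurrently since neither blocks the event loop
    await asyncio.gather(asyncio.sleep(2), asyncio.sleep(2))
    print(f"Elapsed: {round(time.time() - start, 2)} seconds")  # ~2, not 4

asyncio.run(main())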

Now let's test it with pytest. In the same directory as hello_asyncio.py, create a file called test_hello_asyncio.py, then open it in your favorite text editor.

Let's start with our imports:

import pytest # Note: pytest-asyncio does not require a separate import

from hello_asyncio import say_hello

Then we'll create a test with proper input:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
@pytest.mark.asyncio
async def test_say_hello(name):
    await say_hello(name)

Things to note:

  • The @pytest.mark.asyncio decorator lets pytest work asynchronously
  • Our test uses the async syntax
  • We await our async function as we would if we were running it outside of a test

Now let's run our test with the verbose -v option:

pytest -v
...
collected 3 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED    [ 33%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED     [ 66%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED       [100%]

Looks good. Next we'll write a couple of tests with bad input. Back inside of test_hello_asyncio.py, let's create a class called TestSayHelloThrowsExceptions:

class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    @pytest.mark.asyncio
    async def test_say_hello_value_error(self, name):
        with pytest.raises(ValueError):
            await say_hello(name)

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    @pytest.mark.asyncio
    async def test_say_hello_type_error(self, name):
        with pytest.raises(TypeError):
            await say_hello(name)

Again, we decorate our tests with @pytest.mark.asyncio, mark our tests with the async syntax, then call our function with await.

Run the tests again:

pytest -v
...
collected 7 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED                                    [ 14%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED                                     [ 28%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED                                       [ 42%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_value_error[] PASSED        [ 57%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[19] PASSED       [ 71%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name1] PASSED    [ 85%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name2] PASSED    [100%]

Without pytest-asyncio

Alternatively to pytest-asyncio, you can create a pytest fixture that yields an asyncio event loop:

import asyncio
import pytest

from hello_asyncio import say_hello


@pytest.fixture
def event_loop():
    loop = asyncio.get_event_loop()
    yield loop

Then, rather than using the async/await syntax, you create your tests as you would normal, synchronous tests:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
def test_say_hello(event_loop, name):
    event_loop.run_until_complete(say_hello(name))


class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    def test_say_hello_value_error(self, event_loop, name):
        with pytest.raises(ValueError):
            event_loop.run_until_complete(say_hello(name))

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    def test_say_hello_type_error(self, event_loop, name):
        with pytest.raises(TypeError):
            event_loop.run_until_complete(say_hello(name))

If you're interested, here's a more advanced tutorial on asyncio testing.

Further Reading

If you want to learn more about what distinguishes Python's implementation of threading vs asyncio, here's a great article from Medium.

For even better examples and explanations of threading in Python, here's a video by Corey Schafer that goes more in-depth, including using the concurrent.futures library.

Lastly, for a massive deep-dive into asyncio itself, here's an article from Real Python completely dedicated to it.

Bonus: One more library you might be interested in is called Unsync, especially if you want to easily convert your current synchronous code into asynchronous code. To use it, you install the library with pip, import it with from unsync import unsync, then decorate whatever currently synchronous function with @unsync to make it asynchronous. To await it and get its return value (which you can do anywhere -- it doesn't have to be in an async/unsync function), just call .result() after the function call.
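
Here's a minimal sketch of that workflow, assuming the unsync package is installed (the slow_double function is just for illustration):

import time

from unsync import unsync


@unsync
def slow_double(x):
    time.sleep(1)  # a blocking call, now run off the main thread
    return x * 2


task = slow_double(21)  # returns immediately with a future-like object
print(task.result())  # blocks until the result is ready: 42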

Parallelism

What is parallelism?

Parallelism is very much related to concurrency. In fact, parallelism is a subset of concurrency: whereas a concurrent process performs multiple tasks at the same time whether they're being given full attention or not, a parallel process is physically performing multiple tasks all at the same time. A good example would be driving, listening to music, and eating the BLT we made in the last section, all at the same time.

[Diagram: concurrent and parallel]

Because they don't require a lot of intensive effort, you can do them all at once without having to wait on anything or divert your attention away.

Now let's take a look at how to implement this in Python. We could use the multiprocessing library, but let's use the concurrent.futures library instead -- it eliminates the need to manage the number of processes manually. Because the major benefit of multiprocessing occurs when you perform multiple CPU-heavy tasks, we're going to compute i raised to the power of i for every i from 1 million (1000000) to 1 million and sixteen (1000016).

You can find the code for this example here.

The only import we'll actually need is concurrent.futures (time is just there for benchmarking):

import concurrent.futures
import time


if __name__ == "__main__":
    pow_list = [i for i in range(1000000, 1000016)]

    print("Starting...")
    start = time.time()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(pow, i, i) for i in pow_list]

    for f in concurrent.futures.as_completed(futures):
        print("okay")

    end = time.time()
    print(f"Time to complete: {round(end - start, 2)}")

Because I develop on a Windows machine, I'm using if __name__ == "__main__". This is necessary because Windows does not have the fork system call inherent to Unix systems. Because Windows doesn't have this capability, it resorts to launching a new interpreter with each process that tries to import the main module. If the main module doesn't exist, it reruns your entire program, causing recursive chaos to ensue.

So, taking a look at our main function, we use a list comprehension to create a list of the numbers from 1 million to 1 million and sixteen, we open a ProcessPoolExecutor with concurrent.futures, and we use a list comprehension and ProcessPoolExecutor().submit() to start executing our processes and throw them into a list called "futures."

We could also use ThreadPoolExecutor() if we wanted to use threads instead -- concurrent.futures is versatile.

And this is where the asynchronicity comes in: the "futures" list doesn't actually contain the results of running our functions. Instead, it contains "futures," which are similar to the JavaScript idea of "promises." To allow our program to continue running, we get back these futures, each of which represents a placeholder for a value. If we try to print a future, depending on whether or not it's finished running, we'll get back a state of either "pending" or "finished." Once it's finished, we can get the return value (assuming there is one) using future.result().

We then iterate over our list of futures, but instead of printing our values, we simply print "okay." This is just because of how massive the resulting calculations are.
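
To make the pending/finished distinction concrete, here's a minimal sketch (separate from the article's script) that submits a single job and inspects its future:

import concurrent.futures


if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        future = executor.submit(pow, 2, 10)  # a placeholder, not a value
        print(future)  # e.g., <Future at 0x... state=running>
        print(future.result())  # blocks until finished: 1024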

Just as before, I built a comparison script that does this synchronously. And, just as before, you can find it on GitHub.

Running our control program, which also includes functionality for timing our program, we get:

Starting...
okay
...
okay
Time to complete: 54.64

Wow. 54.64 seconds is quite a while. Let's see if our version with multiprocessing does any better:

Starting...
okay
...
okay
Time to complete: 6.24

Our time has been significantly reduced. We're down to about 1/9th of our original time.

So what would happen if we used threading for this instead?

I'm sure you can guess -- it wouldn't be much faster than doing it synchronously. In fact, it might be slower, because it still takes a little time and effort to spin up new threads. But don't take my word for it; here's what we get when we replace ProcessPoolExecutor() with ThreadPoolExecutor():

Starting...
okay
...
okay
Time to complete: 53.83
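
For reference, that run only required swapping the executor class; everything else, including pow_list and the timing code, stays the same:

    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(pow, i, i) for i in pow_list]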

As I mentioned before, threading allows your applications to focus on new tasks while others are waiting. In this case, we're never sitting idly by. Multiprocessing, on the other hand, spins up entirely new processes, usually on separate CPU cores, ready to do whatever you ask completely in tandem with whatever else your script is doing. This is why the multiprocessing version taking roughly 1/9th of the time makes sense -- I have 8 cores in my CPU.

Now that we've talked about concurrency and parallelism in Python, we can finally set the terms straight. If you're having trouble distinguishing between them, you can safely and accurately think of our previous definitions of "parallelism" and "concurrency" as "parallel concurrency" and "non-parallel concurrency," respectively.

Further Reading

Real Python has a great article on concurrency vs parallelism.

Engineer Man has a good video comparison of threading vs multiprocessing.

Corey Schafer also has a good video on multiprocessing in the same spirit as his threading video.

If you only watch one video, watch this excellent talk by Raymond Hettinger. He does an amazing job explaining the differences between multiprocessing, threading, and asyncio.

Combining Asyncio with Multiprocessing

What if I need to combine many I/O operations with heavy calculations?

We can do that too. Say you need to scrape 100 web pages for a specific piece of information, and then you need to save that piece of information in a file for later. We can separate the compute power across each of our computer's cores by making each process scrape a fraction of the pages.

For this script, let's install Beautiful Soup to help us easily scrape our pages: pip install beautifulsoup4. This time we actually have quite a few imports. Here they are, and here's why we're using them:

import asyncio                         # Gives us async/await
import concurrent.futures              # Allows creating new processes
import time
from math import floor                 # Helps divide up our requests evenly across our CPU cores
from multiprocessing import cpu_count  # Returns our number of CPU cores

import aiofiles                        # For asynchronously performing file I/O operations
import aiohttp                         # For asynchronously making HTTP requests
from bs4 import BeautifulSoup          # For easy webpage scraping

You can find the code for this example here.

First, we're going to create an async function that makes a request to Wikipedia to get back random pages. We'll scrape each page we get back for its title using BeautifulSoup, and then append it to a given file, separating each title with a tab. The function will take two arguments:

  1. num_pages - Number of pages to request and scrape for titles
  2. output_file - The file to append our titles to

async def get_and_scrape_pages(num_pages: int, output_file: str):
    """
    Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
    articles, then scrapes each page for its title and appends it to {{ output_file }},
    separating each title with a tab: "\\t"

    #### Arguments
    ---
    num_pages: int -
        Number of random Wikipedia pages to request and scrape

    output_file: str -
        File to append titles to
    """
    async with \
    aiohttp.ClientSession() as client, \
    aiofiles.open(output_file, "a+", encoding="utf-8") as f:

        for _ in range(num_pages):
            async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
                if response.status > 399:
                    # I was getting a 429 Too Many Requests at a higher volume of requests
                    response.raise_for_status()

                page = await response.text()
                soup = BeautifulSoup(page, features="html.parser")
                title = soup.find("h1").text

                await f.write(title + "\t")

        await f.write("\n")

We're asynchronously opening both an aiohttp ClientSession and our output file. The mode, a+, means append to the file and create it if it doesn't already exist. Encoding our strings as utf-8 ensures we don't get an error if our titles contain international characters. If we get an error response, we'll raise it instead of continuing (at high request volumes I was getting a 429 Too Many Requests). We asynchronously get the text from our response, then we parse out the title and asynchronously append it to our file. After we append all of our titles, we append a newline: "\n".

Our next function is the one we'll start within each new process to allow running it asynchronously:

def start_scraping(num_pages: int, output_file: str, i: int):
    """ Starts an async process for requesting and scraping Wikipedia pages """
    print(f"Process {i} starting...")
    asyncio.run(get_and_scrape_pages(num_pages, output_file))
    print(f"Process {i} finished.")

Now for our main function. Let's start with some constants (and our function declaration):

def main():
    NUM_PAGES = 100 # Number of pages to scrape altogether
    NUM_CORES = cpu_count() # Our number of CPU cores (including logical cores)
    OUTPUT_FILE = "./wiki_titles.tsv" # File to append our scraped titles to

    PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
    PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % NUM_CORES # For our final core

And now the logic:

    futures = []

    with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
        for i in range(NUM_CORES - 1):
            new_future = executor.submit(
                start_scraping, # Function to perform
                # v Arguments v
                num_pages=PAGES_PER_CORE,
                output_file=OUTPUT_FILE,
                i=i
            )
            futures.append(new_future)

        futures.append(
            executor.submit(
                start_scraping,
                PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES-1
            )
        )

    concurrent.futures.wait(futures)

We create an array to store our futures, then we create a ProcessPoolExecutor, setting its max_workers equal to our number of cores. We iterate over a range equal to our number of cores minus 1, running a new process with our start_scraping function, and then append it to our futures list. Our final core will potentially have extra work to do: it will scrape a number of pages equal to each of the other cores, plus a number of pages equal to the remainder left over when dividing our total number of pages to scrape by our total number of CPU cores.

Make sure to actually run your main function:

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Time to complete: {round(time.time() - start, 2)} seconds.")

After running the program with my 8-core CPU (along with the benchmarking code):

This version (asyncio with multiprocessing):

Time to complete: 5.65 seconds.

Multiprocessing only:

Time to complete: 8.87 seconds.

asyncio only:

Time to complete: 47.92 seconds.

Fully synchronous:

Time to complete: 88.86 seconds.

I'm actually fairly surprised to see that the improvement of asyncio with multiprocessing over plain multiprocessing wasn't as great as I expected.

Recap: When to use multiprocessing vs asyncio or threading

  1. Use multiprocessing when you need to do many heavy calculations and you can split them up.
  2. Use asyncio or threading when you're performing I/O operations -- communicating with external resources or reading/writing from/to files.
  3. Multiprocessing and asyncio can be used together, but a good rule of thumb is to fork a process before you thread/use asyncio instead of the other way around -- threads are relatively cheap compared to processes.

Async/Await in Other Languages

async/await and similar syntax also exist in other languages, and in some of those languages, the implementation can differ drastically.

.NET: F# to C#

The first programming language (back in 2007) to use the async syntax was Microsoft's F#. Whereas it doesn't exactly use await to wait on a function call, it uses specific syntax like let! and do! along with the proprietary Async functions included in the System module.

You can learn more about async programming in F# in Microsoft's F# docs.

Their C# team then built upon this concept, and that's where the async/await keywords that we're now familiar with were born:

using System;

// Allows the "Task" return type
using System.Threading.Tasks;

public class Program
{
    // Declare an async function with "async"
    private static async Task<string> ReturnHello()
    {
        return "hello world";
    }

    // Main can be async -- no problem
    public static async Task Main()
    {
        // await an async string
        string result = await ReturnHello();

        // Print the string we got asynchronously
        Console.WriteLine(result);
    }
}

Run it on .NET Fiddle

We make sure we're using System.Threading.Tasks, as it includes the Task type, and, in general, the Task type is needed for an async function to be awaited. The cool thing about C# is that you can make your main function asynchronous just by declaring it with async, and you won't have any issues.

If you're curious to learn more about async/await in C#, Microsoft's C# docs have a good page on it.

JavaScript

First introduced in ES6, the async/await syntax is essentially an abstraction over JavaScript promises (which are similar to Python futures). Unlike Python, however, as long as you're not awaiting it, you can call an async function normally without a specific entry-point function like Python's asyncio.run():

// Declare a function with async
async function returnHello(){
    return "hello world";
}

async function printSomething(){
    // await an async string
    const result = await returnHello();

    // print the string we got asynchronously
    console.log(result);
}

// Run our async code
printSomething();

Run it on JSFiddle

See MDN for more information on async/await in JavaScript.

Rust

Rust now also allows the use of the async/await syntax, and it works similarly to Python, C#, and JavaScript:

// Allows blocking synchronous code to run async code
use futures::executor::block_on;

// Declare an async function with "async"
async fn return_hello() -> String {
    "hello world".to_string()
}

// Code that awaits must also be declared with "async"
async fn print_something(){
    // await an async String
    let result: String = return_hello().await;

    // Print the string we got asynchronously
    println!("{0}", result);
}

fn main() {
    // Block the current synchronous execution to run our async code
    block_on(print_something());
}

Run it on Rust Play

In order to use async functions, we must first add futures = "0.3" to our Cargo.toml. We then import the block_on function with use futures::executor::block_on -- block_on is necessary for running our async function from our synchronous main function.

You can find more information on async/await in Rust in the Rust docs.

Go

Rather than the traditional async/await syntax inherent to all of the previous languages we've covered, Go uses "goroutines" and "channels." You can think of a channel as being similar to a Python future. In Go, you generally send a channel as an argument to a function, then use go to run the function concurrently. Whenever you need to ensure the function has finished, you use the <- syntax, which you can think of as something like the more common await syntax. If your goroutine (the function you're running asynchronously) has a return value, it can be grabbed this way.

package main

import "fmt"

// "chan" makes the return value a string channel instead of a string
func returnHello(result chan string){
    // Gives our channel a value
    result <- "hello world"
}

func main() {
    // Creates a string channel
    result := make(chan string)

    // Starts execution of our goroutine
    go returnHello(result)

    // Awaits and prints our string
    fmt.Println(<- result)
}

Run it in the Go Playground

For more information on concurrency in Go, check out An Introduction to Programming in Go by Caleb Doxsey.

Ruby

Similar to Python, Ruby also has the Global Interpreter Lock limitation. What it doesn't have is concurrency built into the language. However, there is a community-created gem that allows concurrency in Ruby, and you can find its source on GitHub.

Java

Like Ruby, Java doesn't have the async/await syntax built in, but it does have concurrency capabilities using the java.util.concurrent package. However, Electronic Arts wrote an Async library that allows the use of await as a method. It's not exactly the same as Python/C#/JavaScript/Rust, but it's worth looking into if you're a Java developer interested in this sort of functionality.

C++

Although C++ also doesn't have the async/await syntax, it does have the ability to use futures to run code concurrently via the <future> header:

#include <iostream>
#include <string>

// Necessary for futures
#include <future>

// No async declaration needed
std::string return_hello() {
    return "hello world";
}

int main ()
{
    // Declares a string future
    std::future<std::string> fut = std::async(return_hello);

    // Awaits the result of the future
    std::string result = fut.get();

    // Prints the string we got asynchronously
    std::cout << result << '\n';
}

Run it on C++ Shell

There's no need to declare a function with any keyword to denote whether or not it can and should be run asynchronously. Instead, you declare your initial future whenever you need it with std::future<{{ function return type }}> and set it equal to std::async(), including the name of the function you want to perform asynchronously along with any arguments it takes -- i.e., std::async(do_something, 1, 2, "string"). To await the value of the future, use the .get() syntax on it.

You can find documentation for async in C++ on cplusplus.com.

Summary

Whether you're working with asynchronous network or file operations or you're performing numerous complex calculations, there are a few different ways to maximize your code's efficiency.

If you're using Python, you can use asyncio or threading to make the most out of I/O operations, or the multiprocessing module for CPU-heavy code.

Also remember that the concurrent.futures module can be used in place of either threading or multiprocessing.

If you're using another programming language, chances are there's an implementation of async/await for it too.

Source: https://testdrive.io

#python #concurrency #asyncio 

Applying Concurrency, Parallelism, and asyncio to Speed Up Python

Aplicar Concurrencia, Paralelismo Y Asincio Para Acelerar Python

¿Qué son la concurrencia y el paralelismo, y cómo se aplican a Python?

Hay muchas razones por las que sus aplicaciones pueden ser lentas. A veces, esto se debe a un diseño algorítmico deficiente o a una elección incorrecta de la estructura de datos. A veces, sin embargo, se debe a fuerzas fuera de nuestro control, como restricciones de hardware o las peculiaridades de las redes. Ahí es donde encajan la concurrencia y el paralelismo. Permiten que sus programas hagan varias cosas a la vez, ya sea al mismo tiempo o perdiendo el menor tiempo posible esperando en tareas ocupadas.

Ya sea que esté tratando con recursos web externos, leyendo y escribiendo en varios archivos, o necesite usar una función de cálculo intensivo varias veces con diferentes parámetros, este artículo debería ayudarlo a maximizar la eficiencia y la velocidad de su código.

Primero, profundizaremos en qué son la simultaneidad y el paralelismo y cómo encajan en el ámbito de Python utilizando bibliotecas estándar como subprocesamiento, multiprocesamiento y asyncio. La última parte de este artículo comparará la implementación de async/ de Python awaitcon la forma en que otros lenguajes la han implementado.

Puede encontrar todos los ejemplos de código de este artículo en el repositorio de concurrency-parallelism-and-asyncio en GitHub.

Para trabajar con los ejemplos de este artículo, ya debería saber cómo trabajar con solicitudes HTTP.

Objetivos

Al final de este artículo, debería poder responder las siguientes preguntas:

  1. ¿Qué es la concurrencia?
  2. ¿Qué es un hilo?
  3. ¿Qué significa cuando algo no bloquea?
  4. ¿Qué es un bucle de eventos?
  5. ¿Qué es una devolución de llamada?
  6. ¿Por qué el método asyncio siempre es un poco más rápido que el método de subprocesamiento?
  7. ¿Cuándo debe usar subprocesos y cuándo debe usar asyncio?
  8. ¿Qué es el paralelismo?
  9. ¿Cuál es la diferencia entre concurrencia y paralelismo?
  10. ¿Es posible combinar asyncio con multiprocesamiento?
  11. ¿Cuándo debería usar multiprocesamiento vs asyncio o subprocesos?
  12. ¿Cuál es la diferencia entre multiprocesamiento, asyncio y concurrency.futures?
  13. ¿Cómo puedo probar asyncio con pytest?

concurrencia

¿Qué es la concurrencia?

Una definición efectiva de concurrencia es "ser capaz de realizar múltiples tareas a la vez". Sin embargo, esto es un poco engañoso, ya que las tareas pueden o no realizarse exactamente al mismo tiempo. En cambio, un proceso podría comenzar, luego, una vez que está esperando que finalice una instrucción específica, cambiar a una nueva tarea, solo para regresar una vez que ya no esté esperando. Una vez que finaliza una tarea, cambia de nuevo a una tarea sin terminar hasta que se hayan realizado todas. Las tareas comienzan de forma asíncrona, se realizan de forma asíncrona y luego finalizan de forma asíncrona.

concurrencia, no paralelo

Si eso te resultó confuso, pensemos en una analogía: digamos que quieres hacer un BLT . Primero, querrás tirar el tocino en una sartén a fuego medio-bajo. Mientras se cocina el tocino, puedes sacar los tomates y la lechuga y comenzar a prepararlos (lavarlos y cortarlos). Mientras tanto, continúas revisando y ocasionalmente volteando tu tocino.

En este punto, ha comenzado una tarea y luego comenzó y completó dos más mientras tanto, todo mientras todavía está esperando la primera.

Eventualmente pones tu pan en una tostadora. Mientras se tuesta, continúas revisando tu tocino. A medida que se terminan las piezas, las saca y las coloca en un plato. Una vez que el pan haya terminado de tostarse, se le aplica la crema para untar de su elección, y luego puede comenzar a colocar capas sobre los tomates, la lechuga y luego, una vez que haya terminado de cocinarse, el tocino. Solo una vez que todo esté cocido, preparado y en capas, puede colocar la última tostada en su sándwich, cortarlo (opcional) y comerlo.

Debido a que requiere que realice varias tareas al mismo tiempo, hacer un BLT es inherentemente un proceso simultáneo, incluso si no está prestando toda su atención a cada una de esas tareas a la vez. Para todos los efectos, en la siguiente sección, nos referiremos a esta forma de concurrencia simplemente como "concurrencia". Lo diferenciaremos más adelante en este artículo.

Por esta razón, la simultaneidad es ideal para procesos intensivos de E/S, tareas que implican esperar solicitudes web u operaciones de lectura/escritura de archivos.

En Python, hay algunas formas diferentes de lograr la concurrencia. Lo primero que veremos es la biblioteca de subprocesos.

Para nuestros ejemplos en esta sección, vamos a construir un pequeño programa de Python que toma cinco veces un género musical aleatorio de la API Genrenator de Binary Jazz , imprime el género en la pantalla y coloca cada uno en su propio archivo.

Para trabajar con subprocesos en Python, la única importación que necesitará es threading, pero para este ejemplo, también importé urllibpara trabajar con solicitudes HTTP, timepara determinar cuánto tardan las funciones en completarse y jsonpara convertir fácilmente los datos json devueltos. de la API de Genrenator.

Puede encontrar el código para este ejemplo aquí .

Comencemos con una función simple:

def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    req = Request("https://binaryjazz.us/wp-json/genrenator/v1/genre/", headers={"User-Agent": "Mozilla/5.0"})
    genre = json.load(urlopen(req))

    with open(file_name, "w") as new_file:
        print(f"Writing '{genre}' to '{file_name}'...")
        new_file.write(genre)

Examinando el código anterior, hacemos una solicitud a la API de Genrenator, cargamos su respuesta JSON (un género musical aleatorio), la imprimimos y luego la escribimos en un archivo.

Sin el encabezado "User-Agent", recibirá un 304.

Lo que realmente nos interesa es la siguiente sección, donde ocurre el enhebrado real:

threads = []

for i in range(5):
    thread = threading.Thread(
        target=write_genre,
        args=[f"./threading/new_file{i}.txt"]
    )
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

Primero comenzamos con una lista. Luego procedemos a iterar cinco veces, creando un nuevo hilo cada vez. A continuación, comenzamos cada subproceso, lo agregamos a nuestra lista de "subprocesos" y luego iteramos sobre nuestra lista una última vez para unir cada subproceso.

Explicación: crear hilos en Python es fácil.

Para crear un hilo nuevo, utilice threading.Thread(). Puede pasarle el kwarg (argumento de palabra clave) targetcon un valor de cualquier función que le gustaría ejecutar en ese hilo. Pero solo pase el nombre de la función, no su valor (es decir, para nuestros propósitos, write_genrey no write_genre()). Para pasar argumentos, pase "kwargs" (que toma un dict de sus kwargs) o "args" (que toma un iterable que contiene sus argumentos, en este caso, una lista).

Sin embargo, crear un hilo no es lo mismo que iniciar un hilo. Para iniciar su hilo, utilice {the name of your thread}.start(). Comenzar un hilo significa "comenzar su ejecución".

Por último, cuando unimos hilos con thread.join(), todo lo que hacemos es asegurarnos de que el hilo haya terminado antes de continuar con nuestro código.

Hilos

Pero, ¿qué es exactamente un hilo?

Un subproceso es una forma de permitir que su computadora divida un solo proceso/programa en muchas piezas livianas que se ejecutan en paralelo. De manera un tanto confusa, la implementación estándar de subprocesos de Python limita los subprocesos a solo poder ejecutar uno a la vez debido a algo llamado Bloqueo global de intérprete (GIL). El GIL es necesario porque la administración de memoria de CPython (la implementación predeterminada de Python) no es segura para subprocesos. Debido a esta limitación, los subprocesos en Python son concurrentes, pero no paralelos. Para evitar esto, Python tiene un multiprocessingmódulo separado no limitado por GIL que activa procesos separados, lo que permite la ejecución paralela de su código. El uso del multiprocessingmódulo es casi idéntico al uso del threadingmódulo.

Puede encontrar más información sobre GIL de Python y la seguridad de subprocesos en Real Python y en los documentos oficiales de Python .

Echaremos un vistazo más profundo al multiprocesamiento en Python en breve.

Antes de mostrar la mejora potencial de la velocidad con respecto al código sin subprocesos, me tomé la libertad de crear también una versión sin subprocesos del mismo programa (nuevamente, disponible en GitHub ). En lugar de crear un nuevo subproceso y unir cada uno de ellos, llama write_genrea un bucle for que itera cinco veces.

Para comparar puntos de referencia de velocidad, también importé la timebiblioteca para cronometrar la ejecución de nuestros scripts:

Starting...
Writing "binary indoremix" to "./sync/new_file0.txt"...
Writing "slavic aggro polka fusion" to "./sync/new_file1.txt"...
Writing "israeli new wave" to "./sync/new_file2.txt"...
Writing "byzantine motown" to "./sync/new_file3.txt"...
Writing "dutch hate industrialtune" to "./sync/new_file4.txt"...
Time to complete synchronous read/writes: 1.42 seconds

Al ejecutar el script, vemos que mi computadora tarda alrededor de 1,49 segundos (junto con géneros musicales clásicos como "Dutch Hat Industrialtune"). No está mal.

Ahora ejecutemos la versión que usa subprocesos:

Starting...
Writing "college k-dubstep" to "./threading/new_file2.txt"...
Writing "swiss dirt" to "./threading/new_file0.txt"...
Writing "bop idol alternative" to "./threading/new_file4.txt"...
Writing "ethertrio" to "./threading/new_file1.txt"...
Writing "beach aust shanty français" to "./threading/new_file3.txt"...
Time to complete threading read/writes: 0.77 seconds

Lo primero que le puede llamar la atención es que las funciones no se completan en orden: 2 - 0 - 4 - 1 - 3

Esto se debe a la naturaleza asincrónica de los subprocesos: mientras una función espera, comienza otra, y así sucesivamente. Debido a que podemos continuar realizando tareas mientras esperamos que otros terminen (ya sea debido a operaciones de red o de E/S de archivos), también puede haber notado que reducimos nuestro tiempo aproximadamente a la mitad: 0,77 segundos. Si bien esto puede no parecer mucho ahora, es fácil imaginar el caso real de crear una aplicación web que necesita escribir muchos más datos en un archivo o interactuar con servicios web mucho más complejos.

Entonces, si la creación de subprocesos es tan buena, ¿por qué no terminamos el artículo aquí?

Porque hay formas aún mejores de realizar tareas al mismo tiempo.

Asíncio

Echemos un vistazo a un ejemplo usando asyncio. Para este método, instalaremos aiohttp usando pip. Esto nos permitirá realizar solicitudes sin bloqueo y recibir respuestas utilizando la sintaxis async/ awaitque se presentará en breve. También tiene el beneficio adicional de una función que convierte una respuesta JSON sin necesidad de importar la jsonbiblioteca. También instalaremos e importaremos aiofiles , lo que permite operaciones de archivo sin bloqueo. Aparte de aiohttpand aiofiles, import asyncio, que viene con la biblioteca estándar de Python.

"Sin bloqueo" significa que un programa permitirá que otros subprocesos continúen ejecutándose mientras espera. Esto se opone al código de "bloqueo", que detiene la ejecución de su programa por completo. Las operaciones de E/S sincrónicas normales sufren esta limitación.

Puede encontrar el código para este ejemplo aquí .

Una vez que tengamos nuestras importaciones en su lugar, echemos un vistazo a la versión asíncrona de la write_genrefunción de nuestro ejemplo asyncio:

async def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    async with aiohttp.ClientSession() as session:
        async with session.get("https://binaryjazz.us/wp-json/genrenator/v1/genre/") as response:
            genre = await response.json()

    async with aiofiles.open(file_name, "w") as new_file:
        print(f'Writing "{genre}" to "{file_name}"...')
        await new_file.write(genre)

Para aquellos que no están familiarizados con la sintaxis async/ awaitque se puede encontrar en muchos otros lenguajes modernos, asyncdeclara que una función, forbucle o withdeclaración debe usarse de forma asíncrona. Para llamar a una función asíncrona, debe usar la awaitpalabra clave de otra función asíncrona o llamar create_task()directamente desde el bucle de eventos, que se puede obtener de asyncio.get_event_loop(), es decir, loop = asyncio.get_event_loop().

Además:

  1. async withpermite esperar respuestas asíncronas y operaciones de archivo.
  2. async for(no se usa aquí) itera sobre una secuencia asíncrona .

El bucle de eventos

Los bucles de eventos son construcciones inherentes a la programación asíncrona que permiten realizar tareas de forma asíncrona. Mientras lee este artículo, puedo asumir con seguridad que probablemente no esté muy familiarizado con el concepto. Sin embargo, incluso si nunca ha escrito una aplicación asíncrona, tiene experiencia con bucles de eventos cada vez que usa una computadora. Ya sea que su computadora esté escuchando la entrada del teclado, esté jugando juegos multijugador en línea o esté navegando en Reddit mientras tiene archivos que se copian en segundo plano, un bucle de eventos es la fuerza impulsora que mantiene todo funcionando sin problemas y de manera eficiente. En su esencia más pura, un bucle de eventos es un proceso que espera desencadenantes y luego realiza acciones específicas (programadas) una vez que se cumplen esos desencadenantes. A menudo devuelven una "promesa" (sintaxis de JavaScript) o "futuro" (sintaxis de Python) de algún tipo para indicar que se ha agregado una tarea. Una vez que finaliza la tarea, la promesa o el futuro devuelve un valor pasado desde la función llamada (suponiendo que la función devuelva un valor).

La idea de realizar una función en respuesta a otra función se denomina "devolución de llamada".

Para otra versión de las devoluciones de llamada y los eventos, aquí hay una excelente respuesta en Stack Overflow .

Aquí hay un tutorial de nuestra función:

Estamos usando async withpara abrir nuestra sesión de cliente de forma asíncrona. La aiohttp.ClientSession()clase es lo que nos permite realizar solicitudes HTTP y permanecer conectados a una fuente sin bloquear la ejecución de nuestro código. Luego hacemos una solicitud asíncrona a la API de Genrenator y esperamos la respuesta JSON (un género musical aleatorio). En la siguiente línea, usamos async withnuevamente con la aiofilesbiblioteca para abrir de forma asincrónica un nuevo archivo para escribir nuestro nuevo género. Imprimimos el género, luego lo escribimos en el archivo.

A diferencia de las secuencias de comandos regulares de Python, la programación con asyncio prácticamente impone* el uso de algún tipo de función "principal".

* A menos que esté usando la sintaxis obsoleta de "rendimiento" con el decorador @asyncio.coroutine, que se eliminará en Python 3.10 .

Esto se debe a que necesita usar la palabra clave "async" para usar la sintaxis "await", y la sintaxis "await" es la única forma de ejecutar realmente otras funciones asíncronas.

Aquí está nuestra función principal:

async def main():
    tasks = []

    for i in range(5):
        tasks.append(write_genre(f"./async/new_file{i}.txt"))

    await asyncio.gather(*tasks)

Como puede ver, lo hemos declarado con "async". Luego creamos una lista vacía llamada "tareas" para albergar nuestras tareas asíncronas (llamadas a Genrenator y nuestro archivo I/O). Agregamos nuestras tareas a nuestra lista, pero en realidad aún no se ejecutan. En realidad, las llamadas no se realizan hasta que las programamos con await asyncio.gather(*tasks). Esto ejecuta todas las tareas de nuestra lista y espera a que finalicen antes de continuar con el resto de nuestro programa. Por último, usamos asyncio.run(main())para ejecutar nuestra función "principal". La .run()función es el punto de entrada de nuestro programa y, por lo general , solo debe llamarse una vez por proceso .

Para aquellos que no estén familiarizados, el *frente de las tareas se llama "desempaquetado de argumentos". Tal como suena, descomprime nuestra lista en una serie de argumentos para nuestra función. Nuestra función es asyncio.gather()en este caso.

Y eso es todo lo que tenemos que hacer. Ahora, ejecutando nuestro programa (cuya fuente incluye la misma funcionalidad de sincronización de los ejemplos sincrónicos y de subprocesamiento)...

Writing "albuquerque fiddlehaus" to "./async/new_file1.txt"...
Writing "euroreggaebop" to "./async/new_file2.txt"...
Writing "shoedisco" to "./async/new_file0.txt"...
Writing "russiagaze" to "./async/new_file4.txt"...
Writing "alternative xylophone" to "./async/new_file3.txt"...
Time to complete asyncio read/writes: 0.71 seconds

... vemos que es aún más rápido aún. Y, en general, el método asyncio siempre será un poco más rápido que el método de subprocesamiento. Esto se debe a que cuando usamos la sintaxis "aguardar", esencialmente le decimos a nuestro programa "espere, vuelvo enseguida", pero nuestro programa realiza un seguimiento de cuánto tiempo nos lleva terminar lo que estamos haciendo. Una vez que hayamos terminado, nuestro programa lo sabrá y se reanudará tan pronto como sea posible. La creación de subprocesos en Python permite la asincronía, pero nuestro programa teóricamente podría omitir diferentes subprocesos que aún no estén listos, perdiendo el tiempo si hay subprocesos listos para continuar ejecutándose.

Entonces, ¿cuándo debo usar subprocesos y cuándo debo usar asyncio?

Cuando esté escribiendo código nuevo, use asyncio. Si necesita interactuar con bibliotecas más antiguas o aquellas que no son compatibles con asyncio, es posible que esté mejor con subprocesos.

Probando asyncio con pytest

Resulta que probar funciones asíncronas con pytest es tan fácil como probar funciones síncronas. Simplemente instale el paquete pytest-asyncio con pip, marque sus pruebas con la asyncpalabra clave y aplique un decorador que permita pytestsaber que es asíncrono: @pytest.mark.asyncio. Veamos un ejemplo.

Primero, escribamos una función asíncrona arbitraria en un archivo llamado hello_asyncio.py :

import asyncio


async def say_hello(name: str):
    """ Sleeps for two seconds, then prints 'Hello, {{ name }}!' """
    try:
        if type(name) != str:
            raise TypeError("'name' must be a string")
        if name == "":
            raise ValueError("'name' cannot be empty")
    except (TypeError, ValueError):
        raise

    print("Sleeping...")
    await asyncio.sleep(2)
    print(f"Hello, {name}!")

La función toma un solo argumento de cadena: name. Al asegurarnos de que namees una cadena con una longitud superior a uno, nuestra función duerme de forma asíncrona durante dos segundos y luego se imprime "Hello, {name}!"en la consola.

La diferencia entre asyncio.sleep()y time.sleep()es que asyncio.sleep()no bloquea.

Ahora vamos a probarlo con pytest. En el mismo directorio que hello_asyncio.py, cree un archivo llamado test_hello_asyncio.py, luego ábralo en su editor de texto favorito.

Comencemos con nuestras importaciones:

import pytest # Note: pytest-asyncio does not require a separate import

from hello_asyncio import say_hello

Luego crearemos una prueba con la entrada adecuada:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
@pytest.mark.asyncio
async def test_say_hello(name):
    await say_hello(name)

Cosas a tener en cuenta:

  • El @pytest.mark.asynciodecorador permite que pytest funcione de forma asíncrona.
  • Nuestra prueba usa la asyncsintaxis
  • Estamos awaitejecutando nuestra función asíncrona como lo haríamos si la estuviéramos ejecutando fuera de una prueba

-vAhora ejecutemos nuestra prueba con la opción detallada :

pytest -v
...
collected 3 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED    [ 33%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED     [ 66%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED       [100%]

Se ve bien. A continuación, escribiremos un par de pruebas con una entrada incorrecta. De vuelta dentro de test_hello_asyncio.py , creemos una clase llamada TestSayHelloThrowsExceptions:

class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    @pytest.mark.asyncio
    async def test_say_hello_value_error(self, name):
        with pytest.raises(ValueError):
            await say_hello(name)

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    @pytest.mark.asyncio
    async def test_say_hello_type_error(self, name):
        with pytest.raises(TypeError):
            await say_hello(name)

Nuevamente, decoramos nuestras pruebas con @pytest.mark.asyncio, marcamos nuestras pruebas con la asyncsintaxis, luego llamamos a nuestra función con await.

Vuelva a ejecutar las pruebas:

pytest -v
...
collected 7 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED                                    [ 14%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED                                     [ 28%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED                                       [ 42%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_value_error[] PASSED        [ 57%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[19] PASSED       [ 71%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name1] PASSED    [ 85%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name2] PASSED    [100%]

Without pytest-asyncio

As an alternative to pytest-asyncio, you can create a pytest fixture that yields an asyncio event loop:

import asyncio
import pytest

from hello_asyncio import say_hello


@pytest.fixture
def event_loop():
    loop = asyncio.new_event_loop()  # a fresh loop per test keeps tests isolated
    yield loop
    loop.close()  # clean up once the test is done

Then, instead of using the async/await syntax, you create your tests as you would normal, synchronous tests:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
def test_say_hello(event_loop, name):
    event_loop.run_until_complete(say_hello(name))


class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    def test_say_hello_value_error(self, event_loop, name):
        with pytest.raises(ValueError):
            event_loop.run_until_complete(say_hello(name))

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    def test_say_hello_type_error(self, event_loop, name):
        with pytest.raises(TypeError):
            event_loop.run_until_complete(say_hello(name))

If you're interested, here's a more advanced tutorial on asyncio testing.

Further Reading

If you want to learn more about what distinguishes Python's implementation of threading vs asyncio, here's a great article from Medium.

For even better examples and explanations of threading in Python, here's a video by Corey Schafer that goes more in-depth, including using the concurrent.futures library.

Lastly, for a massive deep-dive into asyncio itself, here's an article from Real Python completely dedicated to it.

Bonus: One more library you might be interested in is called Unsync, especially if you want to easily convert your current synchronous code into asynchronous code. To use it, you install the library with pip, import it with from unsync import unsync, then decorate any currently synchronous function with @unsync to make it asynchronous. To await it and get its return value (which you can do anywhere -- it doesn't have to be in an async/unsync function), just call .result() after the function call.
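
Going off that description alone, a minimal sketch might look like this (slow_square is an illustrative name, and this assumes unsync's default behavior of running regular functions on a thread pool):

import time

from unsync import unsync


@unsync
def slow_square(n: int) -> int:
    time.sleep(1)   # simulated blocking work; @unsync moves it off the main thread
    return n * n


tasks = [slow_square(i) for i in range(4)]   # all four start right away
print([task.result() for task in tasks])     # block only when we ask for the values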

Parallelism

What is parallelism?

Parallelism is very much related to concurrency. In fact, parallelism is a subset of concurrency: whereas a concurrent process performs multiple tasks at the same time whether or not they all have its full attention, a parallel process is physically performing multiple tasks at literally the same time. A good example would be driving, listening to music, and eating the BLT we made in the last section, all at the same time.

[Image: concurrent and parallel]

Because they don't require a lot of intensive effort, you can do them all at once without having to wait on anything or divert your attention away.

Now let's take a look at how to implement this in Python. We could use the multiprocessing library, but let's use the concurrent.futures library instead -- it eliminates the need to manage the number of processes manually. Because the major benefit of multiprocessing occurs when you perform multiple CPU-heavy tasks, we're going to raise each of the integers from 1 million (1000000) up to, but not including, 1 million and 16 (1000016) to the power of itself.

You can find the code for this example here.

The only imports we'll need are concurrent.futures and time (for benchmarking):

import concurrent.futures
import time


if __name__ == "__main__":
    pow_list = [i for i in range(1000000, 1000016)]

    print("Starting...")
    start = time.time()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(pow, i, i) for i in pow_list]

    for f in concurrent.futures.as_completed(futures):
        print("okay")

    end = time.time()
    print(f"Time to complete: {round(end - start, 2)}")

Because I'm developing on a Windows machine, I'm using if __name__ == "__main__". This is necessary because Windows does not have the fork system call inherent to Unix systems. Because Windows doesn't have this capability, it resorts to launching a new interpreter for each process, which re-imports the main module. If the import isn't guarded, it reruns your entire program, causing recursive chaos to ensue.

So, taking a look at our main block, we use a list comprehension to create a list of the numbers from 1 million to 1 million and 16, we open a ProcessPoolExecutor with concurrent.futures, and we use another list comprehension together with ProcessPoolExecutor().submit() to start executing our processes, collecting the returned futures in a list called "futures".

We could also use ThreadPoolExecutor() if we wanted to use threads instead -- concurrent.futures is versatile.

And this is where the asynchronicity comes in: the "futures" list doesn't actually contain the results of running our functions. Instead, it contains "futures", which are similar to the JavaScript idea of "promises". To allow our program to continue running, we get back these futures, each representing a placeholder for a value. If we try to print a future, depending on whether it's finished running or not, we'll get back a state of "pending" or "finished". Once it's finished, we can get the return value (assuming there is one) using f.result().

We then iterate over our list of futures, but instead of printing our values, we simply print "okay". This is just because of how massive the resulting calculations turn out to be.
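
Here's a minimal sketch of that behavior, using a deliberately tiny calculation so the result prints quickly:

import concurrent.futures

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        future = executor.submit(pow, 2, 10)
        print(future)           # shows the future's state, e.g. running or finished
        print(future.result())  # 1024 -- blocks until the value is ready
        print(future.done())    # True now that the computation has completed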

Just as before, I built a comparison script that does this synchronously. And, just as before, you can find it on GitHub.

Running our control program, which also includes functionality for timing our program, we get:

Starting...
okay
...
okay
Time to complete: 54.64

Wow. 54.64 seconds is quite a long time. Let's see if our version with multiprocessing does any better:

Starting...
okay
...
okay
Time to complete: 6.24

Our time has been significantly reduced. We're at about 1/9th of our original time.

So what would happen if we used threading for this instead?

I'm sure you can guess -- it wouldn't be much faster than doing it synchronously. In fact, it might be slower because it still takes a little time and effort to spin up new threads. But don't take my word for it, here's what we get when we replace ProcessPoolExecutor() with ThreadPoolExecutor():

Starting...
okay
...
okay
Time to complete: 53.83

As I mentioned earlier, threading allows your applications to focus on new tasks while others are waiting. In this case, we're never sitting idly by. Multiprocessing, on the other hand, spins up entirely new processes, usually on separate CPU cores, ready to do whatever you ask completely in tandem with whatever else your script is doing. This is why the multiprocessing version taking roughly 1/9th of the time makes sense -- I have 8 cores in my CPU.

Now that we've talked about concurrency and parallelism in Python, we can finally set the terms straight. If you're having trouble distinguishing between the terms, you can safely and accurately think of our previous definitions of "parallelism" and "concurrency" as "parallel concurrency" and "non-parallel concurrency" respectively.

Further Reading

Real Python has a great article on concurrency vs parallelism.

Engineer Man has a good video comparison of threading vs multiprocessing.

Corey Schafer also has a good video on multiprocessing in the same spirit as his threading video.

If you only watch one video, watch this excellent talk by Raymond Hettinger. He does an amazing job explaining the differences between multiprocessing, threading, and asyncio.

Combining Asyncio with Multiprocessing

What if I need to combine many I/O operations with heavy calculations?

We can do that too. Say you need to scrape 100 web pages for a specific piece of information, and then you need to save that piece of info in a file for later. We can separate the compute power across each of our computer's cores by making each process scrape a fraction of the pages.

For this script, let's install Beautiful Soup to help us easily scrape our pages: pip install beautifulsoup4. This time we actually have quite a few imports. Here they are, and here's why we're using them:

import asyncio                         # Gives us async/await
import concurrent.futures              # Allows creating new processes
import time
from math import floor                 # Helps divide up our requests evenly across our CPU cores
from multiprocessing import cpu_count  # Returns our number of CPU cores

import aiofiles                        # For asynchronously performing file I/O operations
import aiohttp                         # For asynchronously making HTTP requests
from bs4 import BeautifulSoup          # For easy webpage scraping

You can find the code for this example here.

First, we're going to create an async function that makes a request to Wikipedia to get back random pages. We'll scrape each page we get back for its title using BeautifulSoup, and then we'll append it to a given file; we'll separate each title with a tab. The function will take two arguments:

  1. num_pages - Number of pages to request and scrape for titles
  2. output_file - The file to append our titles to

async def get_and_scrape_pages(num_pages: int, output_file: str):
    """
    Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
    articles, then scrapes each page for its title and appends it to {{ output_file }},
    separating each title with a tab: "\\t"

    #### Arguments
    ---
    num_pages: int -
        Number of random Wikipedia pages to request and scrape

    output_file: str -
        File to append titles to
    """
    async with \
    aiohttp.ClientSession() as client, \
    aiofiles.open(output_file, "a+", encoding="utf-8") as f:

        for _ in range(num_pages):
            async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
                if response.status > 399:
                    # I was getting a 429 Too Many Requests at a higher volume of requests
                    response.raise_for_status()

                page = await response.text()
                soup = BeautifulSoup(page, features="html.parser")
                title = soup.find("h1").text

                await f.write(title + "\t")

        await f.write("\n")

We're opening both an aiohttp ClientSession and our output file asynchronously. The mode, a+, means append to the file and create it if it doesn't already exist. Encoding our strings as utf-8 ensures we won't get an error if our titles contain international characters. If we get an error response, we'll raise it instead of continuing (at high request volumes I was getting a 429 Too Many Requests). We asynchronously get the text of our response, parse out the title, and asynchronously append it to our file. After appending all of our titles, we append a newline: "\n".

Our next function is the one we'll start in each new process, allowing it to run asynchronously:

def start_scraping(num_pages: int, output_file: str, i: int):
    """ Starts an async process for requesting and scraping Wikipedia pages """
    print(f"Process {i} starting...")
    asyncio.run(get_and_scrape_pages(num_pages, output_file))
    print(f"Process {i} finished.")

Now for our main function. Let's start with some constants (and our function declaration):

def main():
    NUM_PAGES = 100 # Number of pages to scrape altogether
    NUM_CORES = cpu_count() # Our number of CPU cores (including logical cores)
    OUTPUT_FILE = "./wiki_titles.tsv" # File to append our scraped titles to

    PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
    PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % NUM_CORES  # the final core picks up the leftover pages

And now the logic:

    futures = []

    with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
        for i in range(NUM_CORES - 1):
            new_future = executor.submit(
                start_scraping, # Function to perform
                # v Arguments v
                num_pages=PAGES_PER_CORE,
                output_file=OUTPUT_FILE,
                i=i
            )
            futures.append(new_future)

        futures.append(
            executor.submit(
                start_scraping,
                PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES-1
            )
        )

    concurrent.futures.wait(futures)

We create an array to store our futures, then create a ProcessPoolExecutor, setting its max_workers equal to our number of cores. We iterate over a range equal to our number of cores minus one, each time submitting a new process running our start_scraping function and appending the returned future to our list. Our final core will potentially have extra work to do: it scrapes a number of pages equal to each of the other cores, plus the remainder left over when dividing our total number of pages by our total number of CPU cores.
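
With the constants above, the split works out like this (assuming the 8-core machine from the benchmarks):

NUM_PAGES = 100
NUM_CORES = 8                          # assuming the author's 8-core machine
PAGES_PER_CORE = 100 // 8              # 12 pages each for the first 7 cores
PAGES_FOR_FINAL_CORE = 12 + 100 % 8    # 12 + 4 = 16 pages for the last core
assert 7 * 12 + 16 == NUM_PAGES        # 84 + 16 == 100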

Make sure you actually run your main function:

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Time to complete: {round(time.time() - start, 2)} seconds.")

After running the program with my 8-core CPU (along with the benchmarking code):

This version (asyncio with multiprocessing):

Time to complete: 5.65 seconds.

Multiprocessing only:

Time to complete: 8.87 seconds.

asyncio only:

Time to complete: 47.92 seconds.

Completely synchronous:

Time to complete: 88.86 seconds.

I'm actually quite surprised to see that the improvement of asyncio with multiprocessing over plain multiprocessing wasn't as great as I thought it would be.

Recap: When to use multiprocessing vs. asyncio or threading

  1. Use multiprocessing when you need to do many heavy calculations and you can split them up.
  2. Use asyncio or threading when you're performing I/O operations -- communicating with external resources or reading/writing from/to files.
  3. Multiprocessing and asyncio can be used together, but a good rule of thumb is to fork a process before you thread/use asyncio instead of the other way around -- threads are relatively cheap compared to processes.

Async/Await in Other Languages

async/await and similar syntax also exist in other languages, and in some of those languages, the implementation can differ drastically.

.NET: From F# to C#

The first programming language (back in 2007) to use the async syntax was Microsoft's F#. While it doesn't exactly use await to wait on a function call, it uses specific syntax like let! and do! along with proprietary Async functions included in the System module.

You can find more information about asynchronous programming in F# in Microsoft's F# docs.

Their C# team then built upon this concept, and that's where the async/await keywords that we're now familiar with were born:

using System;

// Allows the "Task" return type
using System.Threading.Tasks;

public class Program
{
    // Declare an async function with "async"
    private static async Task<string> ReturnHello()
    {
        return "hello world";
    }

    // Main can be async -- no problem
    public static async Task Main()
    {
        // await an async string
        string result = await ReturnHello();

        // Print the string we got asynchronously
        Console.WriteLine(result);
    }
}

Run it on .NET Fiddle

We make sure we're using System.Threading.Tasks, as it includes the Task type -- and, in general, the Task type is needed for an async function to be awaited. The cool thing about C# is that you can make your main function asynchronous just by declaring it with async, and you won't have any issues.

If you're interested in learning more about async/await in C#, Microsoft's C# docs have a good page on it.

JavaScript

First introduced in ES6, the async/await syntax is essentially an abstraction over JavaScript promises (which are similar to Python futures). Unlike Python, however, as long as you're not awaiting, you can call an async function normally, without a specific entry point like Python's asyncio.run():

// Declare a function with async
async function returnHello(){
    return "hello world";
}

async function printSomething(){
    // await an async string
    const result = await returnHello();

    // print the string we got asynchronously
    console.log(result);
}

// Run our async code
printSomething();

Run it on JSFiddle

Check out MDN for more information on async/await in JavaScript.

Rust

Rust now also allows the use of the async/await syntax, and it works similarly to Python, C#, and JavaScript:

// Allows blocking synchronous code to run async code
use futures::executor::block_on;

// Declare an async function with "async"
async fn return_hello() -> String {
    "hello world".to_string()
}

// Code that awaits must also be declared with "async"
async fn print_something(){
    // await an async String
    let result: String = return_hello().await;

    // Print the string we got asynchronously
    println!("{0}", result);
}

fn main() {
    // Block the current synchronous execution to run our async code
    block_on(print_something());
}

Run it on the Rust Playground

To use async functions, we must first add futures = "0.3" to our Cargo.toml. We then import the block_on function with use futures::executor::block_on -- block_on is necessary for running our async function from our synchronous main function.

You can find more information about async/await in Rust in the Rust docs.

Go

Instead of the traditional async/await syntax inherent to all the previous languages we've covered, Go uses "goroutines" and "channels". You can think of a channel as being similar to a Python future. In Go, you generally send a channel as an argument to a function, then use the go keyword to run the function concurrently. Whenever you need to make sure the function has finished, you use the <- syntax, which you can think of as the more common await syntax. If your goroutine (the function you're running asynchronously) has a return value, it can be grabbed this way.

package main

import "fmt"

// "chan" makes the return value a string channel instead of a string
func returnHello(result chan string){
    // Gives our channel a value
    result <- "hello world"
}

func main() {
    // Creates a string channel
    result := make(chan string)

    // Starts execution of our goroutine
    go returnHello(result)

    // Awaits and prints our string
    fmt.Println(<- result)
}

Run it on the Go Playground

For more information on concurrency in Go, check out the book An Introduction to Programming in Go by Caleb Doxsey.

Ruby

Similar to Python, Ruby also has the Global Interpreter Lock limitation. What it doesn't have is concurrency built into the language. However, there is a community-created gem that allows concurrency in Ruby, and you can find its source on GitHub.

Java

Like Ruby, Java doesn't have the async/await syntax built in, but it does have concurrency capabilities using the java.util.concurrent package. However, Electronic Arts wrote an Async library that allows the use of await as a method. It's not exactly the same as Python/C#/JavaScript/Rust, but it's worth looking into if you're a Java developer interested in this sort of functionality.

C++

While C++ also doesn't have the async/await syntax, it does have the ability to use futures to run code concurrently via the <future> header:

#include <iostream>
#include <string>

// Necessary for futures
#include <future>

// No async declaration needed
std::string return_hello() {
    return "hello world";
}

int main ()
{
    // Declares a string future
    std::future<std::string> fut = std::async(return_hello);

    // Awaits the result of the future
    std::string result = fut.get();

    // Prints the string we got asynchronously
    std::cout << result << '\n';
}

Run it on C++ Shell

There's no need to declare a function with any keyword to denote whether or not it can or should be run asynchronously. Instead, you declare your initial future wherever you need it with std::future<{{ function return type }}> and set it equal to std::async(), passing in the name of the function you want to run asynchronously along with any arguments it takes -- i.e., std::async(do_something, 1, 2, "string"). To await the value of the future, call .get() on it.

You can find documentation for async in C++ on cplusplus.com.

Summary

Whether you're working with asynchronous network or file operations or performing numerous complex calculations, there are a few different ways to maximize your code's efficiency.

If you're using Python, you can use asyncio or threading to make the most of I/O operations, or the multiprocessing module for CPU-intensive code.

Also remember that the concurrent.futures module can be used in place of either threading or multiprocessing.

If you're using another programming language, chances are there's an implementation of async/await for it as well.

Source:  https://testdriven.io

#python #concurrency #asyncio 

Apply Concurrency, Parallelism, and Asyncio to Speed Up Python

Apply Concurrency, Parallelism, and Asyncio to Speed Up Python

What are concurrency and parallelism, and how do they apply to Python?

There are many reasons your applications can be slow. Sometimes this is due to poor algorithmic design or the wrong choice of data structure. Sometimes, however, it's due to forces outside of our control, such as hardware constraints or the quirks of networking. That's where concurrency and parallelism fit in. They allow your programs to do multiple things at once, either at the same time or by wasting as little time as possible waiting on busy tasks.

Whether you're dealing with external web resources, reading from and writing to multiple files, or need to use a calculation-intensive function multiple times with different parameters, this article should help you maximize the efficiency and speed of your code.

First, we'll dig into what concurrency and parallelism are and how they fit into the realm of Python using standard libraries such as threading, multiprocessing, and asyncio. The last portion of this article will compare Python's implementation of async/await with how other languages have implemented it.

You can find all the code examples from this article in the concurrency-parallelism-and-asyncio repo on GitHub.

To work through the examples in this article, you should already know how to work with HTTP requests.

Objectives

By the end of this article, you should be able to answer the following questions:

  1. What is concurrency?
  2. What is a thread?
  3. What does it mean when something is non-blocking?
  4. What is an event loop?
  5. What is a callback?
  6. Why is the asyncio method always a bit faster than the threading method?
  7. When should you use threading, and when should you use asyncio?
  8. What is parallelism?
  9. What's the difference between concurrency and parallelism?
  10. Is it possible to combine asyncio with multiprocessing?
  11. When should you use multiprocessing vs. asyncio or threading?
  12. What's the difference between multiprocessing, asyncio, and concurrent.futures?
  13. How can I test asyncio with pytest?

Concurrency

What is concurrency?

An effective definition for concurrency is "being able to perform multiple tasks at once". This is a bit misleading, though, as the tasks may or may not actually be performed at exactly the same time. Instead, a process might start, then, once it's waiting on a specific instruction to finish, switch to a new task, only to come back once it's no longer waiting. Once one task is finished, it switches again to an unfinished task until they have all been performed. Tasks start asynchronously, get performed asynchronously, and then finish asynchronously.

[Image: concurrent, not parallel]

If that was confusing to you, let's think of an analogy instead: say you want to make a BLT. First, you'll throw the bacon in a pan on medium-low heat. While the bacon cooks, you can get out your tomatoes and lettuce and start preparing them (washing and cutting). All the while, you keep checking on, and occasionally flipping, your bacon.

At this point, you've started one task, then started and completed two more in the meantime, all while you're still waiting on the first.

Eventually you put your bread in a toaster. While it's toasting, you keep checking on your bacon. As pieces get finished, you pull them out and place them on a plate. Once your bread is done toasting, you apply your sandwich spread of choice, and then you can start layering on your tomatoes, lettuce, and then, once it's done cooking, your bacon. Only once everything is cooked, prepared, and layered can you place the last piece of toast onto your sandwich, slice it (optional), and eat it.

Because it requires you to perform multiple tasks at the same time, making a BLT is inherently a concurrent process, even if you aren't giving your full attention to each of those tasks all at once. For all intents and purposes, for the next section we'll refer to this form of concurrency as just "concurrency". We'll differentiate it later on in this article.

For this reason, concurrency is great for I/O-intensive processes -- tasks that involve waiting on web requests or file read/write operations.

In Python, there are a few different ways to achieve concurrency. The first we'll take a look at is the threading library.

For our examples in this section, we're going to build a small Python program that grabs a random music genre from Binary Jazz's Genrenator API five times, prints the genre to the screen, and puts each one into its own file.

To work with threading in Python, the only import you'll need is threading, but for this example I've also imported urllib to work with HTTP requests, time to determine how long the functions take to complete, and json to easily convert the json data returned from the Genrenator API.

You can find the code for this example here.

Let's start with a simple function:

import json
from urllib.request import Request, urlopen


def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    req = Request("https://binaryjazz.us/wp-json/genrenator/v1/genre/", headers={"User-Agent": "Mozilla/5.0"})
    genre = json.load(urlopen(req))

    with open(file_name, "w") as new_file:
        print(f"Writing '{genre}' to '{file_name}'...")
        new_file.write(genre)

Looking at the code above, we're making a request to the Genrenator API, loading its JSON response (a random music genre), printing it, then writing it to a file.

Without the "User-Agent" header you will receive a 304.

What we're really interested in is the next section, where the actual threading happens:

threads = []

for i in range(5):
    thread = threading.Thread(
        target=write_genre,
        args=[f"./threading/new_file{i}.txt"]
    )
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

We start with a list. We then proceed to iterate five times, creating a new thread each time. Next, we start each thread, append it to our "threads" list, and then iterate over our list one last time to join each thread.

Explanation: Creating threads in Python is easy.

To create a new thread, use threading.Thread(). You can pass into it the kwarg (keyword argument) target with a value of whatever function you would like to run on that thread. But only pass in the name of the function, not its value (meaning, for our purposes, write_genre and not write_genre()). To pass arguments, pass in "kwargs" (which takes a dict of your kwargs) or "args" (which takes an iterable containing your args -- in this case, a list).

Creating a thread is not the same as starting a thread, however. To start your thread, use {the name of your thread}.start(). Starting a thread means "starting its execution".

Lastly, when we join threads with thread.join(), all we're doing is ensuring the thread has finished before continuing on with our code.
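
As a minimal sketch of those two ways of passing arguments (reusing write_genre from above; the file names are illustrative):

import threading

# Positional arguments go in "args" (any iterable works; a list is typical)...
t1 = threading.Thread(target=write_genre, args=["./threading/new_file_a.txt"])

# ...while keyword arguments go in "kwargs" as a dict
t2 = threading.Thread(target=write_genre, kwargs={"file_name": "./threading/new_file_b.txt"})

t1.start()
t2.start()
t1.join()   # wait for both threads to finish before moving on
t2.join()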

Threads

But what exactly is a thread?

A thread is a way of allowing your computer to break up a single process/program into many lightweight pieces that execute in parallel. Somewhat confusingly, Python's standard implementation of threading limits threads to only being able to execute one at a time due to something called the Global Interpreter Lock (GIL). The GIL is necessary because CPython's (Python's default implementation) memory management is not thread-safe. Because of this limitation, threading in Python is concurrent, but not parallel. To get around this, Python has a separate multiprocessing module not limited by the GIL that spins up separate processes, enabling parallel execution of your code. Using the multiprocessing module is nearly identical to using the threading module, as the sketch below shows.

More information about Python's GIL and thread safety can be found on Real Python and in the official Python docs.

We'll take a more in-depth look at multiprocessing in Python shortly.
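
To show just how similar the two modules are, here's a minimal sketch that swaps the threading example above over to processes (the directory name is illustrative):

import multiprocessing

if __name__ == "__main__":   # guard required on platforms that spawn, such as Windows
    processes = []

    for i in range(5):
        process = multiprocessing.Process(
            target=write_genre,
            args=[f"./multiprocessing/new_file{i}.txt"]
        )
        process.start()
        processes.append(process)

    for process in processes:
        process.join()   # wait for every process to finish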

Before we show the potential speed improvement over non-threaded code, I took the liberty of also creating a non-threaded version of the same program (again, available on GitHub). Instead of creating a new thread and joining each one, it instead calls write_genre in a for loop that iterates five times.

To compare speed benchmarks, I also imported the time library to time the execution of our scripts:

Starting...
Writing "binary indoremix" to "./sync/new_file0.txt"...
Writing "slavic aggro polka fusion" to "./sync/new_file1.txt"...
Writing "israeli new wave" to "./sync/new_file2.txt"...
Writing "byzantine motown" to "./sync/new_file3.txt"...
Writing "dutch hate industrialtune" to "./sync/new_file4.txt"...
Time to complete synchronous read/writes: 1.42 seconds

Upon running the script, we see that my computer takes around 1.42 seconds (along with classic music genres such as "dutch hate industrialtune"). Not too bad.

Now let's run the version that uses threading:

Starting...
Writing "college k-dubstep" to "./threading/new_file2.txt"...
Writing "swiss dirt" to "./threading/new_file0.txt"...
Writing "bop idol alternative" to "./threading/new_file4.txt"...
Writing "ethertrio" to "./threading/new_file1.txt"...
Writing "beach aust shanty français" to "./threading/new_file3.txt"...
Time to complete threading read/writes: 0.77 seconds

The first thing that might stand out to you is the functions not being completed in order: 2 - 0 - 4 - 1 - 3

This is due to the asynchronous nature of threading: as one function waits, another one begins, and so on. Because we're able to continue performing tasks while we're waiting on others to finish (whether due to networking or file I/O operations), you may also have noticed that we cut our time roughly in half: 0.77 seconds. Whereas this might not seem like a lot now, it's easy to imagine the very real case of building a web application that needs to write much more data to a file or interact with much more complex web services.

So, if threading is so great, why don't we end the article here?

Because there are even better ways to perform tasks concurrently.

Asyncio

Let's take a look at an example using asyncio. For this method, we're going to install aiohttp using pip. This will allow us to make non-blocking requests and receive responses using the async/await syntax that will be introduced shortly. It also has the added benefit of a function that converts a JSON response without needing to import the json library. We'll also install and import aiofiles, which allows non-blocking file operations. Other than aiohttp and aiofiles, import asyncio, which comes with the Python standard library.

"Non-blocking" means a program will allow other threads to continue running while it's waiting. This is opposed to "blocking" code, which stops execution of your program completely. Normal, synchronous I/O operations suffer from this limitation.

You can find the code for this example here.

Once we have our imports in place, let's take a look at the asynchronous version of the write_genre function from our asyncio example:

async def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """

    async with aiohttp.ClientSession() as session:
        async with session.get("https://binaryjazz.us/wp-json/genrenator/v1/genre/") as response:
            genre = await response.json()

    async with aiofiles.open(file_name, "w") as new_file:
        print(f'Writing "{genre}" to "{file_name}"...')
        await new_file.write(genre)

For those not familiar with the async/await syntax found in many other modern languages: async declares that a function, for loop, or with statement must be used asynchronously. To call an async function, you must either use the await keyword from another async function, or call create_task() directly from the event loop, which can be grabbed from asyncio.get_event_loop() -- i.e., loop = asyncio.get_event_loop().

Additionally:

  1. async with allows awaiting async responses and file operations.
  2. async for (not used here) iterates over an asynchronous stream.
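
Here's a minimal, self-contained sketch of those pieces (fetch_data is an illustrative stand-in for real non-blocking I/O):

import asyncio


async def fetch_data() -> str:
    await asyncio.sleep(0.1)   # stand-in for real non-blocking I/O
    return "data"


async def main():
    result = await fetch_data()              # await from another async function...
    loop = asyncio.get_event_loop()
    task = loop.create_task(fetch_data())    # ...or schedule it on the event loop
    print(result, await task)


asyncio.run(main())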

The Event Loop

Event loops are constructs inherent to asynchronous programming that allow performing tasks asynchronously. As you're reading this article, I can safely assume you're probably not too familiar with the concept. However, even if you've never written an async application, you have experience with event loops every time you use a computer: whether your computer is listening for keyboard input, you're playing online multiplayer games, or you're browsing Reddit while files copy in the background, an event loop is the driving force that keeps everything working smoothly and efficiently. In its purest essence, an event loop is a process that waits around for triggers and then performs specific (programmed) actions once those triggers are met. They often return a "promise" (JavaScript syntax) or "future" (Python syntax) of some sort to denote that a task has been added. Once the task is finished, the promise or future returns a value passed back from the called function (assuming the function does return a value).

The idea of performing a function in response to another function is called a "callback".

For another take on callbacks and events, here's a great answer on Stack Overflow.
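
In asyncio terms, a minimal sketch of a callback looks like this (compute is an illustrative name): add_done_callback() registers a function for the event loop to invoke once the task completes.

import asyncio


async def compute() -> int:
    await asyncio.sleep(0.1)
    return 42


async def main():
    task = asyncio.create_task(compute())
    # The callback fires once the task finishes and receives the completed task
    task.add_done_callback(lambda t: print(f"Callback got: {t.result()}"))
    await task


asyncio.run(main())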

Here's a step-by-step walkthrough of our function:

We're using async with to open our client session asynchronously. The aiohttp.ClientSession() class is what allows us to make HTTP requests and remain connected to a source without blocking the execution of our code. We then make an async request to the Genrenator API and await the JSON response (a random music genre). In the next line, we use async with again with the aiofiles library to asynchronously open a new file to write our new genre to. We print the genre, then write it to the file.

Unlike regular Python scripts, programming with asyncio pretty much enforces* using some sort of "main" function.

*Unless you're using the deprecated "yield" syntax with the @asyncio.coroutine decorator, which will be removed in Python 3.10.

This is because you need to use the "async" keyword in order to use the "await" syntax, and the "await" syntax is the only way to actually run other async functions.

Here's our main function:

async def main():
    tasks = []

    for i in range(5):
        tasks.append(write_genre(f"./async/new_file{i}.txt"))

    await asyncio.gather(*tasks)

As you can see, we've declared it with "async". We then create an empty list called "tasks" to house our async tasks (calls to Genrenator and our file I/O). We append our tasks to our list, but they aren't actually run yet. The calls don't get made until we schedule them with await asyncio.gather(*tasks). This runs all of the tasks in our list and waits for them to finish before continuing with the rest of our program. Lastly, we use asyncio.run(main()) to run our "main" function. The .run() function is the entry point for our program, and it should generally only be called once per process.

For those not familiar, the * in front of tasks is called "argument unpacking". Just as it sounds, it unpacks our list into a series of arguments for our function. Our function is asyncio.gather() in this case.
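
A quick sketch of what the unpacking does (add and args are illustrative):

def add(a, b, c):
    return a + b + c


args = [1, 2, 3]
print(add(*args))   # equivalent to add(1, 2, 3) -> prints 6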

And that's all we need to do. Now, running our program (the source of which includes the same timing functionality as the synchronous and threading examples)...

Writing "albuquerque fiddlehaus" to "./async/new_file1.txt"...
Writing "euroreggaebop" to "./async/new_file2.txt"...
Writing "shoedisco" to "./async/new_file0.txt"...
Writing "russiagaze" to "./async/new_file4.txt"...
Writing "alternative xylophone" to "./async/new_file3.txt"...
Time to complete asyncio read/writes: 0.71 seconds

...we see it's even faster still. And, in general, the asyncio method will always be a bit faster than the threading method. This is because when we use the "await" syntax, we essentially tell our program, "hold on, I'll be right back", but our program keeps track of how long it takes us to finish what we're doing. Once we're done, our program will know and will pick back up as soon as it's able. Threading in Python allows asynchronicity, but our program could theoretically skip around between different threads that may not yet be ready, wasting time if there are threads ready to continue running.

So when should I use threading, and when should I use asyncio?

When you're writing new code, use asyncio. If you need to interface with older libraries or those that don't support asyncio, you may be better off with threading.
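
There's also a middle ground worth knowing about: asyncio can hand a blocking call off to a thread pool with loop.run_in_executor(), so old-style blocking code doesn't stall the event loop. A minimal sketch (blocking_call stands in for an older library function):

import asyncio
import time


def blocking_call() -> str:
    time.sleep(1)   # stands in for an older library call that blocks
    return "done"


async def main():
    loop = asyncio.get_running_loop()
    # None means "use the default thread pool executor"
    result = await loop.run_in_executor(None, blocking_call)
    print(result)


asyncio.run(main())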

Testing asyncio with pytest

It turns out testing async functions with pytest is as easy as testing synchronous functions. Just install the pytest-asyncio package with pip, mark your tests with the async keyword, and apply a decorator that lets pytest know it's asynchronous: @pytest.mark.asyncio. Let's look at an example.

First, let's write an arbitrary async function in a file called hello_asyncio.py:

import asyncio


async def say_hello(name: str):
    """ Sleeps for two seconds, then prints 'Hello, {{ name }}!' """
    try:
        if type(name) != str:
            raise TypeError("'name' must be a string")
        if name == "":
            raise ValueError("'name' cannot be empty")
    except (TypeError, ValueError):
        raise

    print("Sleeping...")
    await asyncio.sleep(2)
    print(f"Hello, {name}!")

The function takes a single string argument: name. After ensuring that name is a non-empty string, our function asynchronously sleeps for two seconds, then prints "Hello, {name}!" to the console.

The difference between asyncio.sleep() and time.sleep() is that asyncio.sleep() is non-blocking.

Now let's test it with pytest. In the same directory as hello_asyncio.py, create a file called test_hello_asyncio.py, then open it in your favorite text editor.

Let's start with our imports:

import pytest # Note: pytest-asyncio does not require a separate import

from hello_asyncio import say_hello

Then we'll create a test with proper input:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
@pytest.mark.asyncio
async def test_say_hello(name):
    await say_hello(name)

Things to note:

  • The @pytest.mark.asyncio decorator lets pytest work asynchronously.
  • Our test uses the async syntax.
  • We await our async function just as we would if we were running it outside of a test.

Now let's run our test with the verbose -v option:

pytest -v
...
collected 3 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED    [ 33%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED     [ 66%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED       [100%]

Looks good. Next, we'll write a couple of tests with bad input. Back inside test_hello_asyncio.py, let's create a class called TestSayHelloThrowsExceptions:

class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    @pytest.mark.asyncio
    async def test_say_hello_value_error(self, name):
        with pytest.raises(ValueError):
            await say_hello(name)

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    @pytest.mark.asyncio
    async def test_say_hello_type_error(self, name):
        with pytest.raises(TypeError):
            await say_hello(name)

Again, we decorate our tests with @pytest.mark.asyncio, mark our tests with the async syntax, then call our function with await.

Run the tests again:

pytest -v
...
collected 7 items

test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED                                    [ 14%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED                                     [ 28%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED                                       [ 42%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_value_error[] PASSED        [ 57%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[19] PASSED       [ 71%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name1] PASSED    [ 85%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name2] PASSED    [100%]

Without pytest-asyncio

As an alternative to pytest-asyncio, you can create a pytest fixture that yields an asyncio event loop:

import asyncio
import pytest

from hello_asyncio import say_hello


@pytest.fixture
def event_loop():
    loop = asyncio.new_event_loop()  # a fresh loop per test keeps tests isolated
    yield loop
    loop.close()  # clean up once the test is done

Then, rather than using the async/await syntax, you create your tests as you would normal, synchronous tests:

@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
def test_say_hello(event_loop, name):
    event_loop.run_until_complete(say_hello(name))


class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    def test_say_hello_value_error(self, event_loop, name):
        with pytest.raises(ValueError):
            event_loop.run_until_complete(say_hello(name))

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    def test_say_hello_type_error(self, event_loop, name):
        with pytest.raises(TypeError):
            event_loop.run_until_complete(say_hello(name))

If you're interested, here's a more advanced tutorial on asyncio testing.

Further Reading

If you want to learn more about what distinguishes Python's implementation of threading vs. asyncio, here's a great article from Medium.

For even better examples and explanations of threading in Python, here's a video by Corey Schafer that goes more in-depth, including using the concurrent.futures library.

Lastly, for a massive deep-dive into asyncio itself, here's an article from Real Python completely dedicated to it.

Bonus: One more library you might be interested in is called Unsync, especially if you want to easily convert your current synchronous code into asynchronous code. To use it, you install the library with pip, import it with from unsync import unsync, then decorate any currently synchronous function with @unsync to make it asynchronous. To await it and get its return value (which you can do anywhere -- it doesn't have to be in an async/unsync function), just call .result() after the function call.

Parallelism

What is parallelism?

Parallelism is very much related to concurrency. In fact, parallelism is a subset of concurrency: whereas a concurrent process performs multiple tasks at the same time whether or not they all have its full attention, a parallel process is physically performing multiple tasks at literally the same time. A good example would be driving, listening to music, and eating the BLT we made in the last section, all at the same time.

[Image: concurrent and parallel]

Because these tasks don't require a lot of intensive effort, you can do them all at once without having to wait on anything or divert your attention away.

Now let's take a look at how to implement this in Python. We could use the multiprocessing library, but let's use the concurrent.futures library instead -- it eliminates the need to manage the number of processes manually. Because the major benefit of multiprocessing occurs when you perform multiple CPU-heavy tasks, we're going to raise each of the integers from 1 million (1000000) up to, but not including, 1 million and 16 (1000016) to the power of itself.

You can find the code for this example here.

The only imports we'll need are concurrent.futures and time (for benchmarking):

import concurrent.futures
import time


if __name__ == "__main__":
    pow_list = [i for i in range(1000000, 1000016)]

    print("Starting...")
    start = time.time()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(pow, i, i) for i in pow_list]

    for f in concurrent.futures.as_completed(futures):
        print("okay")

    end = time.time()
    print(f"Time to complete: {round(end - start, 2)}")

Because I'm developing on a Windows machine, I'm using if __name__ == "__main__". This is necessary because Windows does not have the fork system call inherent to Unix systems. Because Windows doesn't have this capability, it resorts to launching a new interpreter for each process, which re-imports the main module. If the import isn't guarded, it reruns your entire program, causing recursive chaos to ensue.

So, taking a look at our main block, we use a list comprehension to create a list of the numbers from 1 million to 1 million and 16, we open a ProcessPoolExecutor with concurrent.futures, and we use another list comprehension together with ProcessPoolExecutor().submit() to start executing our processes, collecting the returned futures in a list called "futures".

We could also use ThreadPoolExecutor() if we wanted to use threads instead -- concurrent.futures is versatile.

And this is where the asynchronicity comes in: the "futures" list doesn't actually contain the results of running our functions. Instead, it contains "futures", which are similar to the JavaScript idea of "promises". To allow our program to continue running, we get back these futures, each representing a placeholder for a value. If we try to print a future, depending on whether it's finished running or not, we'll get back a state of "pending" or "finished". Once it's finished, we can get the return value (assuming there is one) using f.result().

We then iterate over our list of futures, but instead of printing our values, we simply print "okay". This is just because of how massive the resulting calculations turn out to be.

Just as before, I built a comparison script that does this synchronously. And, just as before, you can find it on GitHub.

Running our control program, which also includes functionality for timing our program, we get:

Starting...
okay
...
okay
Time to complete: 54.64

Wow. 54.64 seconds is quite a long time. Let's see if our version with multiprocessing does any better:

Starting...
okay
...
okay
Time to complete: 6.24

Our time has been significantly reduced. We're at about 1/9th of our original time.

So what would happen if we used threading for this instead?

I'm sure you can guess -- it wouldn't be much faster than doing it synchronously. In fact, it might be slower, because it still takes a little time and effort to spin up new threads. But don't take my word for it; here's what we get when we replace ProcessPoolExecutor() with ThreadPoolExecutor():

Starting...
okay
...
okay
Time to complete: 53.83

As I mentioned earlier, threading allows your applications to focus on new tasks while others are waiting. In this case, we're never sitting idly by. Multiprocessing, on the other hand, spins up entirely new processes, usually on separate CPU cores, ready to do whatever you ask completely in tandem with whatever else your script is doing. This is why the multiprocessing version taking roughly 1/9th of the time makes sense -- I have 8 cores in my CPU.

Now that we've talked about concurrency and parallelism in Python, we can finally set the terms straight. If you're having trouble distinguishing between the terms, you can safely and accurately think of our previous definitions of "parallelism" and "concurrency" as "parallel concurrency" and "non-parallel concurrency", respectively.

Further Reading

Real Python has a great article on concurrency vs. parallelism.

Engineer Man has a good video comparison of threading vs. multiprocessing.

Corey Schafer also has a good video on multiprocessing, in the same spirit as his threading video.

If you only watch one video, watch this excellent talk by Raymond Hettinger. He does an amazing job explaining the differences between multiprocessing, threading, and asyncio.

Combining Asyncio with Multiprocessing

What if I need to combine many I/O operations with heavy calculations?

We can do that too. Say you need to scrape 100 web pages for a specific piece of information, and then you need to save that piece of info in a file for later. We can separate the compute power across each of our computer's cores by having each process scrape a fraction of the pages.

For this script, let's install Beautiful Soup to help us easily scrape our pages: pip install beautifulsoup4. This time we actually have quite a few imports. Here they are, and here's why we're using them:

import asyncio                         # Gives us async/await
import concurrent.futures              # Allows creating new processes
import time
from math import floor                 # Helps divide up our requests evenly across our CPU cores
from multiprocessing import cpu_count  # Returns our number of CPU cores

import aiofiles                        # For asynchronously performing file I/O operations
import aiohttp                         # For asynchronously making HTTP requests
from bs4 import BeautifulSoup          # For easy webpage scraping

You can find the code for this example here.

First, we're going to create an async function that makes a request to Wikipedia to get back random pages. We'll scrape each page we get back for its title using BeautifulSoup, and then we'll append it to a given file, separating each title with a tab. The function will take two arguments:

  1. num_pages - Number of pages to request and scrape for titles
  2. output_file - The file to append our titles to

async def get_and_scrape_pages(num_pages: int, output_file: str):
    """
    Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
    articles, then scrapes each page for its title and appends it to {{ output_file }},
    separating each title with a tab: "\\t"

    #### Arguments
    ---
    num_pages: int -
        Number of random Wikipedia pages to request and scrape

    output_file: str -
        File to append titles to
    """
    async with \
    aiohttp.ClientSession() as client, \
    aiofiles.open(output_file, "a+", encoding="utf-8") as f:

        for _ in range(num_pages):
            async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
                if response.status > 399:
                    # I was getting a 429 Too Many Requests at a higher volume of requests
                    response.raise_for_status()

                page = await response.text()
                soup = BeautifulSoup(page, features="html.parser")
                title = soup.find("h1").text

                await f.write(title + "\t")

        await f.write("\n")

We're opening both an aiohttp ClientSession and our output file asynchronously. The mode, a+, means append to the file and create it if it doesn't already exist. Encoding our strings as utf-8 ensures we won't get an error if our titles contain international characters. If we get an error response, we'll raise it instead of continuing (at high request volumes I was getting a 429 Too Many Requests). We asynchronously get the text of our response, parse out the title, and asynchronously append it to our file. After appending all of our titles, we append a newline: "\n".

Our next function is the one we'll start in each new process, allowing it to run asynchronously:

def start_scraping(num_pages: int, output_file: str, i: int):
    """ Starts an async process for requesting and scraping Wikipedia pages """
    print(f"Process {i} starting...")
    asyncio.run(get_and_scrape_pages(num_pages, output_file))
    print(f"Process {i} finished.")

Now for our main function. Let's start with some constants (and our function declaration):

def main():
    NUM_PAGES = 100 # Number of pages to scrape altogether
    NUM_CORES = cpu_count() # Our number of CPU cores (including logical cores)
    OUTPUT_FILE = "./wiki_titles.tsv" # File to append our scraped titles to

    PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
    PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % NUM_CORES  # the final core picks up the leftover pages

And now the logic:

    futures = []

    with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
        for i in range(NUM_CORES - 1):
            new_future = executor.submit(
                start_scraping, # Function to perform
                # v Arguments v
                num_pages=PAGES_PER_CORE,
                output_file=OUTPUT_FILE,
                i=i
            )
            futures.append(new_future)

        futures.append(
            executor.submit(
                start_scraping,
                PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES-1
            )
        )

    concurrent.futures.wait(futures)

We create an array to store our futures, then create a ProcessPoolExecutor, setting its max_workers equal to our number of cores. We iterate over a range equal to our number of cores minus one, each time submitting a new process running our start_scraping function and appending the returned future to our list. Our final core will potentially have extra work to do: it scrapes a number of pages equal to each of the other cores, plus the remainder left over when dividing our total number of pages by our total number of CPU cores.

Make sure you actually run your main function:

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Time to complete: {round(time.time() - start, 2)} seconds.")

Depois de executar o programa com minha CPU de 8 núcleos (junto com o código de benchmark):

This version (async with multiprocessing):

Time to complete: 5.65 seconds.

Multiprocessing only:

Time to complete: 8.87 seconds.

Async only:

Time to complete: 47.92 seconds.

Fully synchronous:

Time to complete: 88.86 seconds.

I'm actually quite surprised to see that the improvement of async with multiprocessing over plain multiprocessing wasn't as large as I thought it would be.

Recap: When to Use Multiprocessing vs. Async or Threading

  1. Use multiprocessing when you need to do many heavy computations and can split them up.
  2. Use async or threading when you're performing I/O operations -- communicating with external resources or reading/writing from/to files.
  3. Multiprocessing and async can be used together, but a good rule of thumb is to fork a process before threading/using async rather than the other way around -- threads are relatively cheap compared to processes.

Async/Await in Other Languages

async/await and similar syntax also exist in other languages, and in some of those languages the implementation can differ drastically.

.NET: F# to C#

The first programming language (back in 2007) to use the async syntax was Microsoft's F#. While it doesn't use await exactly to wait on a function call, it uses specific syntax like let! and do! along with proprietary Async functions included in the System module.

You can find more information about asynchronous programming in F# in Microsoft's F# docs.

Their C# team then built upon this concept, and that's where the async/await keywords we're familiar with were born:

using System;

// Allows the "Task" return type
using System.Threading.Tasks;

public class Program
{
    // Declare an async function with "async"
    private static async Task<string> ReturnHello()
    {
        return "hello world";
    }

    // Main can be async -- no problem
    public static async Task Main()
    {
        // await an async string
        string result = await ReturnHello();

        // Print the string we got asynchronously
        Console.WriteLine(result);
    }
}

Run it on .NETFiddle

We make sure we're using System.Threading.Tasks, as it includes the Task type, and, in general, the Task type is needed for an async function to be awaited. The neat thing about C# is that you can make your main function asynchronous just by declaring it with async, and you won't have any problems.

If you're interested in learning more about async/await in C#, Microsoft's C# docs have a good page on it.

JavaScript

First introduced in ES2017, the async/await syntax is essentially an abstraction on top of JavaScript promises (which are similar to Python futures). Unlike Python, however, as long as you're not awaiting, you can call an async function as-is without a specific function like Python's asyncio.run():

// Declare a function with async
async function returnHello(){
    return "hello world";
}

async function printSomething(){
    // await an async string
    const result = await returnHello();

    // print the string we got asynchronously
    console.log(result);
}

// Run our async code
printSomething();

Run it on JSFiddle

Check out MDN for more info on async/await in JavaScript.

Rust

Rust now also allows the use of the async/await syntax, and it works in a similar manner to Python, C#, and JavaScript:

// Allows blocking synchronous code to run async code
use futures::executor::block_on;

// Declare an async function with "async"
async fn return_hello() -> String {
    "hello world".to_string()
}

// Code that awaits must also be declared with "async"
async fn print_something(){
    // await an async String
    let result: String = return_hello().await;

    // Print the string we got asynchronously
    println!("{0}", result);
}

fn main() {
    // Block the current synchronous execution to run our async code
    block_on(print_something());
}

Run it on Rust Play

In order to use async functions, we must first add futures = "0.3" to our Cargo.toml. We then import the block_on function with use futures::executor::block_on -- block_on is necessary to run our async function from our synchronous main function.

You can find more info on async/await in Rust in the Rust docs.

Go

Instead of the traditional async/await syntax inherent to all the previous languages we've covered, Go uses "goroutines" and "channels". You can think of a channel as being similar to a Python future. In Go, you generally send a channel as an argument to a function, then use go to run the function concurrently. Whenever you need to ensure the function has finished, you use the <- syntax, which can be thought of as the more common await syntax. If your goroutine (the function you're running asynchronously) has a return value, it can be grabbed this way.

package main

import "fmt"

// "chan" makes the return value a string channel instead of a string
func returnHello(result chan string){
    // Gives our channel a value
    result <- "hello world"
}

func main() {
    // Creates a string channel
    result := make(chan string)

    // Starts execution of our goroutine
    go returnHello(result)

    // Awaits and prints our string
    fmt.Println(<- result)
}

Run it in the Go Playground

For more info on concurrency in Go, check out An Introduction to Programming in Go by Caleb Doxsey.

Ruby

Similar to Python, Ruby also has the Global Interpreter Lock limitation. What it doesn't have is concurrency built into the language. However, there is a community-created gem that enables concurrency in Ruby, and you can find its source on GitHub.

Java

Like Ruby, Java doesn't have built-in async/await syntax, but it does have concurrency capabilities using the java.util.concurrent module. However, Electronic Arts wrote an Async library that allows the use of await as a method. It's not exactly the same as Python/C#/JavaScript/Rust, but it's worth looking into if you're a Java developer interested in this kind of functionality.
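For a taste of what the standard library offers, here's a minimal sketch using CompletableFuture from java.util.concurrent (standard-library code only -- not the Electronic Arts library mentioned above):

import java.util.concurrent.CompletableFuture;

public class Main
{
    public static void main(String[] args)
    {
        // supplyAsync runs the supplier on a background thread pool
        CompletableFuture<String> future =
            CompletableFuture.supplyAsync(() -> "hello world");

        // join() blocks until the future completes, roughly like awaiting it
        System.out.println(future.join());
    }
}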

C++

While C++ also doesn't have the async/await syntax, it does have the ability to use futures to run code concurrently using the <future> header:

#include <iostream>
#include <string>

// Necessary for futures
#include <future>

// No async declaration needed
std::string return_hello() {
    return "hello world";
}

int main ()
{
    // Declares a string future
    std::future<std::string> fut = std::async(return_hello);

    // Awaits the result of the future
    std::string result = fut.get();

    // Prints the string we got asynchronously
    std::cout << result << '\n';
}

Run it on C++ Shell

There's no need to declare a function with any keyword to denote whether or not it can be run asynchronously. Instead, you declare your initial future wherever you need it with std::future<{{ function return type }}> and set it equal to std::async(), passing the name of the function you want to run asynchronously along with any arguments it takes -- i.e., std::async(do_something, 1, 2, "string"). To await the future's value, call .get() on it.

You can find documentation for async in C++ on cplusplus.com.

Summary

Whether you're working with asynchronous network or file operations or performing numerous complex computations, there are a few different ways to maximize your code's efficiency.

If you're using Python, you can use asyncio or threading to make the most of I/O operations, or the multiprocessing module for CPU-intensive code.

Also remember that the concurrent.futures module can be used in place of threading or multiprocessing.

If you're using another programming language, chances are there's an implementation of async/await for it too.

Source: https://testdrive.io

#python #concurrency #asyncio 

Applying Concurrency, Parallelism, and Async to Speed Up Python

Mastering Concurrency in Go: Using Select, Goroutines, and Channels

In this article, we're going to talk about how to build concurrent programs by combining select, goroutines, and channels in Golang.

I'd recommend reading these two articles first to get familiar with the concepts of concurrency, channels, and goroutines.

Select

From the Tour of Go documentation:

"The select statement lets a goroutine wait on multiple communication operations.

A select blocks until one of its cases can run, then it executes that case. It chooses one at random if multiple are ready."
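As a minimal sketch of that blocking-and-choosing behavior before diving into the larger example (the channel names here are just illustrative):

package main

import "fmt"

func main() {
	a := make(chan string)
	b := make(chan string)

	go func() { a <- "from a" }()
	go func() { b <- "from b" }()

	// Blocks until a case is ready; picks one at random if both are ready
	select {
	case msg := <-a:
		fmt.Println(msg)
	case msg := <-b:
		fmt.Println(msg)
	}
}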

API Server Responses

We're going to investigate how we can use select to grab the response from the fastest API call. Let's dive into some code to understand select and its powerful features.

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	// "io/ioutil", "log", and "time" were imported in the original but are
	// unused here, and Go refuses to compile unused imports; re-add "time"
	// for the timeout variant shown later.
)

const (
	API_KEY              = "f32ee7b348msh230c75aaf106721p1366a6jsn952b266f7ae5"
	API_GOOGLE_NEWS_HOST = "google-news.p.rapidapi.com"
	API_FREE_NEWS_HOST   = "free-news.p.rapidapi.com"
	GOOGLE_NEWS_URL      = "https://google-news.p.rapidapi.com/v1/top_headlines?lang=en&country=US"
	FREE_NEWS_URL        = "https://free-news.p.rapidapi.com/v1/search?lang=en&q=Elon"
)

var (
	google = make(chan News)
	free   = make(chan News)
)

type Article struct {
	Title   string `json:"title"`
	Link    string `json:"link"`
	Id      string `json:"id"`
	MongoId string `json:"_id"`
}

type News struct {
	Source   string
	Articles []*Article `json:"articles"`
}

type Function struct {
	f       func(news chan<- News)
	channel chan News
}

func main() {
	functions := []*Function{
		{f: googleNews, channel: google},
		{f: freeNews, channel: free},
	}
	quickestApiResponse(functions)
}

func quickestApiResponse(functions []*Function) {
	var articles []*Article

	for _, function := range functions {
		function.Run()
	}

	select {
	case googleNewsResponse := <-google:
		fmt.Printf("Source: %s\n", googleNewsResponse.Source)
		articles = googleNewsResponse.Articles
	case freeNewsReponse := <-free:
		fmt.Printf("Source: %s\n", freeNewsReponse.Source)
		articles = freeNewsReponse.Articles
	}

	fmt.Printf("Articles %v\n", articles)
}

func googleNews(google chan<- News) {
	req, err := http.NewRequest("GET", GOOGLE_NEWS_URL, nil)
	if err != nil {
		fmt.Printf("Error initializing request%v\n", err.Error())
		return
	}

	req.Header.Add("X-RapidAPI-Key", API_KEY)
	req.Header.Add("X-RapidAPI-Host", API_GOOGLE_NEWS_HOST)
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Printf("Error making request %v\n", err.Error())
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != 200 {
		fmt.Printf("Google News Response StatusCode %v Status %v\n", resp.StatusCode, resp.Status)
		return
	}

	googleNewsArticles := News{Source: "GoogleNewsApi"}
	if err := json.NewDecoder(resp.Body).Decode(&googleNewsArticles); err != nil {
		fmt.Printf("Error decoding body %v\n", err.Error())
		return
	}

	fmt.Printf("Google Articles %v\n", googleNewsArticles)
	fmt.Printf("Google Articles Size %d\n", len(googleNewsArticles.Articles))
	google <- googleNewsArticles
}

func freeNews(free chan<- News) {
	req, err := http.NewRequest("GET", FREE_NEWS_URL, nil)
	if err != nil {
		fmt.Printf("Error initializing request%v\n", err.Error())
		return
	}

	req.Header.Add("X-RapidAPI-Key", API_KEY)
	req.Header.Add("X-RapidAPI-Host", API_FREE_NEWS_HOST)
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Printf("Error making request %v\n", err.Error())
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != 200 {
		fmt.Printf("Free News Response StatusCode %v Status %v\n", resp.StatusCode, resp.Status)
		return
	}

	var freeNewsArticles News
	if err := json.NewDecoder(resp.Body).Decode(&freeNewsArticles); err != nil {
		fmt.Printf("Error decoding body %v\n", err.Error())
		return
	}

	freeNewsArticles.Source = "FreeNewsApi"
	fmt.Printf("Free Articles %v\n", freeNewsArticles)
	fmt.Printf("Free Articles Size %d\n", len(freeNewsArticles.Articles))
	free <- freeNewsArticles
}

func (a *Article) GetId() string {
	if a.Id == "" {
		return a.MongoId
	}

	return a.Id
}

func (f *Function) Run() {
	go f.f(f.channel)
}

The implementation above focuses on highlighting how a select waits until one of its cases runs.

It's important to understand the different parts of this example, so let's go through them one by one.

Before looking at the select logic, let's examine how the API calls are made.

The Function struct represents a single API call. Its attributes are a function f that takes a channel of type News (notice how this function's signature already enforces that the channel is treated as send-only) and, second, a channel of type News. Once the API call has been executed and the response parsed, this channel is used to send the results.

The News struct is the object that holds the articles and records which source they came from.

In main, we initialize a slice of Function with two elements: the first holds the googleNews function and uses the google channel, and the second holds the freeNews function and uses the free channel.

Since both API calls fetch news, the channels are of the same type, but there is one per function.

Then come the implementations of these two APIs, googleNews and freeNews. Each makes an HTTP request to its respective URL and parses the response; once that's done, the news is sent over its respective channel.

Now let's focus on the quickestApiResponse function. Its purpose is to set the articles variable to the response from the fastest API. We run each function by calling its Run method, which starts a new goroutine for the function and passes it the channel. Note that these API calls need to run in separate goroutines because we don't want to execute them sequentially.

Next, the select waits for either the google or the free channel to send a response. Once either API call sends its response over its respective channel, the select runs the code in that case and ignores the other. This effectively sets articles to the response from the fastest API call.

Let's run the program and check the output:

API server response output

FreeNewsApi ran faster!

This logic can be applied to many other use cases, letting a program run multiple goroutines, communicate over channels, and wait on them with select.

One more thing we can implement in this example is enforcing some kind of timeout: if the API calls exceed the limit, we leave the articles empty. The code below achieves this by adding one more case to the select.

const (
  API_MAX_TIMEOUT      = 3 * time.Second 
)

func quickestApiResponse(functions []*Function) {
	var articles []*Article

	for _, function := range functions {
		function.Run()
	}

	select {
	case googleNewsResponse := <-google:
		fmt.Printf("Source: %s\n", googleNewsResponse.Source)
		articles = googleNewsResponse.Articles
	case freeNewsReponse := <-free:
		fmt.Printf("Source: %s\n", freeNewsReponse.Source)
		articles = freeNewsReponse.Articles
	case <-time.After(API_MAX_TIMEOUT):
		fmt.Println("Time out! API calls took too long!!")
	}

	fmt.Printf("Articles %v\n", articles)
}

time.After returns a receive-only channel of time.Time values and sends the current time on it once the specified duration has passed. Notice that here we aren't assigning this channel's value to a variable; that's because we don't care about the data the channel sends, only about receiving the signal. If we sleep for three seconds in both APIs, we'll see the timeout case execute and the other two cases get ignored.

API server response timeout

Running a Recurring Process

Let's see how we can use select to run a recurring process. For this program, we'll have the following scenario:

The program should let us pass in any function as the recurring process, along with when that process should start running and the interval time between each run.

Below is the initial code; let's take a look:

package chaptereight

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/brianvoe/gofakeit/v6"
)

type PendingUserNotifications map[int][]*Notification
type Notification struct {
	Content string
	UserId  int
}

func sendUserBatchNotificationsEmail(userId int, notifications []*Notification) {
	fmt.Printf("Sending email to user with userId %d for pending notifications %v\n", userId, notifications)
}

func handlePendingUsersNotifications(pendingNotifications PendingUserNotifications, handler func(userId int, notifications []*Notification)) {
	for userId, notifications := range pendingNotifications {
		handler(userId, notifications)
		delete(pendingNotifications, userId)
	}
}

func collectNewUsersNotifications(notifications PendingUserNotifications) {
	randomNotifications := getRandomNotifications()
	if len(randomNotifications) > 0 {
		notifications[randomNotifications[0].UserId] = randomNotifications
	}
}

func getRandomNotifications() (notifications []*Notification) {
	rand.Seed(time.Now().UnixNano())
	userId := rand.Intn(100-10+1) + 10
	numOfNotifications := rand.Intn(5-0+1) + 0
	fmt.Printf("numOfNotifications %v\n", numOfNotifications)
	for i := 0; i < numOfNotifications; i++ {
		notifications = append(notifications, &Notification{Content: gofakeit.Paragraph(1, 2, 10, " "), UserId: userId})
	}

	return
}

The code above reflects the task we want to run. We have two main functions: collectNewUsersNotifications and handlePendingUsersNotifications. The first is meant to collect all new user notifications. Ideally this function would look for unread notifications in a database, but for the sake of this example we're simulating the arrival of random notifications for certain users.

Notifications are created using the Notification struct, which has only two fields: one for the content and one for the user ID.

The collection function uses the PendingUserNotifications type to store the notifications. This type is a map where the key is an integer representing the user ID and the value is a slice of Notification.

After collecting all the notifications, we use the handlePendingUsersNotifications function to iterate over them and run a handler function on each one. After each user's notifications are processed, they're deleted from the map. The handler we use in this case is sendUserBatchNotificationsEmail; its purpose is to send the user an email with all of their pending notifications so they can take a look.

Now let's focus on how to run this task on a recurring basis using select. As mentioned earlier, we need to account for the following:

  • Allow passing in an interval time
  • Allow passing in the process's start time
  • Allow the caller to cancel/stop the recurring process whenever they need to

The code below shows how to achieve this:

package main

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/brianvoe/gofakeit/v6"
)

type PendingUserNotifications map[int][]*Notification
type ProcessHandler func()
type Notification struct {
	Content string
	UserId  int
}
type RecurringProcess struct {
	name      string
	interval  time.Duration
	startTime time.Time
	handler   func()
	stop      chan struct{}
}

func main() {
	pendingNotificationsProcess()
}

func pendingNotificationsProcess() {
	process := &RecurringProcess{}
	notifications := PendingUserNotifications{}
	handler := func() {
		collectNewUsersNotifications(notifications)
		handlePendingUsersNotifications(notifications, sendUserBatchNotificationsEmail, process)
	}
	interval := 10 * time.Second
	startTime := time.Now().Add(3 * time.Minute)
	process = createRecurringProcess("Pending User Notifications", handler, interval, startTime)

	<-process.stop
}

func sendUserBatchNotificationsEmail(userId int, notifications []*Notification) {
	fmt.Printf("Sending email to user with userId %d for pending notifications %v\n", userId, notifications)
}

func handlePendingUsersNotifications(pendingNotifications PendingUserNotifications, handler func(userId int, notifications []*Notification), process *RecurringProcess) {
	userNotificationCount := 0
	for userId, notifications := range pendingNotifications {
		userNotificationCount++
		handler(userId, notifications)
		delete(pendingNotifications, userId)
	}

	if userNotificationCount == 0 {
		process.Cancel()
	}
}

func collectNewUsersNotifications(notifications PendingUserNotifications) {
	randomNotifications := getRandomNotifications()
	if len(randomNotifications) > 0 {
		notifications[randomNotifications[0].UserId] = randomNotifications
	}
}

func getRandomNotifications() (notifications []*Notification) {
	rand.Seed(time.Now().UnixNano())
	userId := rand.Intn(100-10+1) + 10
	numOfNotifications := rand.Intn(5-0+1) + 0
	fmt.Printf("numOfNotifications %v\n", numOfNotifications)
	for i := 0; i < numOfNotifications; i++ {
		notifications = append(notifications, &Notification{Content: gofakeit.Paragraph(1, 2, 10, " "), UserId: userId})
	}

	return
}

func createRecurringProcess(name string, handler ProcessHandler, interval time.Duration, startTime time.Time) *RecurringProcess {
	process := &RecurringProcess{
		name:      name,
		interval:  interval,
		startTime: startTime,
		handler:   handler,
		stop:      make(chan struct{}),
	}

	go process.Start()

	return process
}

func (p *RecurringProcess) Start() {
	// Placeholder ticker with a nil channel: receiving from a nil channel
	// blocks forever, so that select case stays idle until a real ticker
	// is assigned once the start timer fires.
	ticker := &time.Ticker{C: nil}
	defer func() { ticker.Stop() }()

	if p.startTime.Before(time.Now()) {
		p.startTime = time.Now()
	}
	startTicker := time.NewTimer(time.Until(p.startTime))

	for {
		select {
		case <-startTicker.C:
			ticker = time.NewTicker(p.interval)
			fmt.Println("Starting recurring process")
			p.handler()
		case <-ticker.C:
			fmt.Println("Next run")
			p.handler()
		case <-p.stop:
			fmt.Println("Stoping recurring process")
			return
		}
	}
}

func (p *RecurringProcess) Cancel() {
	close(p.stop)
}

We introduced a new struct to represent a recurring process: RecurringProcess. It contains the following fields:

  • name -- the name of the process
  • interval -- the interval time between each run
  • startTime -- the time at which the process will start
  • handler -- a handler function to call on each run
  • stop -- a channel for stopping the process

In the pendingNotificationsProcess function, we initialize a new recurring process and the notifications map. The handler we use is a function that wraps both collectNewUsersNotifications and handlePendingUsersNotifications. Notice that we pass the process into handlePendingUsersNotifications, because it will be needed to stop the process.

We also specify the interval and the start time.

Then we call createRecurringProcess. This function creates the recurring process and also starts it; note the go process.Start() call, which uses a goroutine to kick the process off.

Back in pendingNotificationsProcess, we block the main goroutine by reading from the stop channel with <-process.stop, meaning the main goroutine stays blocked until that channel is closed (or a value is sent on it).

Let's look at the Start function, which contains all the logic for running the recurring process.

This function uses the startTicker variable to kick off the recurring process at the start time. If the start time is in the past, the process starts immediately.

time.NewTimer sends the current time on its channel once the specified duration has elapsed, and this lets us start the process. That's why the first case of the select waits for that channel to receive the signal.

We also have the ticker variable, which is a time.Ticker. A ticker in Go sends ticks on its channel at the specified interval. Once the startTicker.C channel delivers its signal, we assign a new ticker with the interval to the ticker variable and call the handler function.

After this, ticker starts receiving ticks in the second select case, and every time it receives one, the handler function is called as well.

In the last case of the select, we wait until a signal arrives on the stop channel, then stop the process by simply returning.

Notice how the select sits inside an infinite for loop. This is because we want to keep looping until one of the cases explicitly breaks out of it. Every time we receive a tick, the second case runs, and then we enter the same loop again, where the select once more waits for one of its cases to run.

To stop the process, we added some logic to handlePendingUsersNotifications: we count the notifications, and if at some point there were no pending ones, the program cancels the process. The Cancel function closes the stop channel, and this makes the program terminate.
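Why does a single close unblock both the select case in Start and the <-process.stop read in main? Because a receive from a closed channel always proceeds immediately, yielding the zero value. A minimal sketch of that behavior (the stop and done names are just illustrative):

package main

import "fmt"

func main() {
	stop := make(chan struct{})
	done := make(chan struct{})

	go func() {
		<-stop // unblocks as soon as stop is closed
		fmt.Println("worker saw stop")
		close(done)
	}()

	close(stop) // a closed channel yields to every receiver
	<-stop      // also unblocks immediately
	<-done
	fmt.Println("main saw stop")
}

This is why closing a channel, rather than sending on it, is the idiomatic way to broadcast cancellation to any number of goroutines.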

Let's run the program and see how it works:

Program output

Great, the program works as expected. This is just one example of how to run a recurring process, and it could serve as the base code for implementing something more complex. You can build sophisticated programs with select.

Conclusion

Building concurrent programs can be challenging at first, especially if you're struggling to understand how goroutines, channels, and select work.

I hope this article has left you less confused and that you've found some use cases where you can put select to work.

Thanks for reading, and stay tuned for more.

This story was originally published at https://betterprogramming.pub/concurrency-with-select-goroutines-and-channels-9786e0c6be3c

#concurrency #go 

Mastering Concurrency in Go: Using Select, Goroutines, and Channels
