Concurrent computing in Julia with actors.
Actors implements the Actor Model of computation:
An actor ... in response to a message it receives, can concurrently:
- send a finite number of messages to other actors;
- create a finite number of new actors;
- designate the behavior to be used for the next message it receives.
Actors makes concurrency easy to understand and reason about, and it integrates well with Julia's multi-threading and distributed computing. It provides an API for writing reactive applications.
The following example defines two behavior functions, greet and hello, and spawns two actors with them. sayhello will forward a message to greeter, get a greeting string back, and deliver it as a result:
julia> using Actors
julia> import Actors: spawn
julia> greet(greeting, msg) = greeting*", "*msg*"!" # a greetings server
greet (generic function with 1 method)
julia> hello(greeter, to) = request(greeter, to) # a greetings client
hello (generic function with 1 method)
julia> greeter = spawn(greet, "Hello") # start the server with a greet string
Link{Channel{Any}}(Channel{Any}(sz_max:32,sz_curr:0), 1, :default)
julia> sayhello = spawn(hello, greeter) # start the client with a link to the server
Link{Channel{Any}}(Channel{Any}(sz_max:32,sz_curr:0), 1, :default)
julia> request(sayhello, "World") # request the client
"Hello, World!"
julia> request(sayhello, "Kermit")
"Hello, Kermit!"
Please look into the manual for more information and more serious examples.
Actors is part of the Julia GitHub group JuliaActors. Please join!
Author: JuliaActors
Source Code: https://github.com/JuliaActors/Actors.jl
License: MIT license
AMPHP is a collection of event-driven libraries for PHP designed with fibers and concurrency in mind. amphp/sync specifically provides synchronization primitives such as locks and semaphores for asynchronous and concurrent programming.
This package can be installed as a Composer dependency.
composer require amphp/sync
The weak link when managing concurrency is humans; so amphp/sync provides abstractions to hide some complexity.
Mutual exclusion can be achieved using Amp\Sync\synchronized() and any Mutex implementation, or by manually using the Mutex instance to acquire a Lock.
As long as the resulting Lock object isn't released using Lock::release() or by being garbage collected, the holder of the lock can exclusively run some code, as long as all other parties running the same code also acquire a lock before doing so.
function writeExclusively(Amp\Sync\Mutex $mutex, string $filePath, string $data) {
$lock = $mutex->acquire();
try {
Amp\File\write($filePath, $data);
} finally {
$lock->release();
}
}
The same can be written more concisely with Amp\Sync\synchronized():
function writeExclusively(Amp\Sync\Mutex $mutex, string $filePath, string $data) {
Amp\Sync\synchronized($mutex, fn () => Amp\File\write($filePath, $data));
}
Semaphores are another synchronization primitive in addition to mutual exclusion.
Instead of providing exclusive access to a single party, they provide access to a limited set of N parties at the same time. This makes them great to control concurrency, e.g. limiting an HTTP client to X concurrent requests, so the HTTP server doesn't get overwhelmed.
Similar to Mutex, Lock instances can be acquired using Semaphore::acquire(). Please refer to the Mutex documentation for additional usage documentation; they're basically equivalent, except that a Mutex is always a Semaphore with a count of exactly one party.
In many cases you can use amphp/pipeline instead of directly using a Semaphore.
Given you have a list of URLs you want to crawl, let's discuss a few possible approaches. For simplicity, we will assume a fetch function already exists, which takes a URL and returns the HTTP status code (which is everything we want to know for these examples).
Simple loop using non-blocking I/O, but no concurrency while fetching the individual URLs; starts the second request as soon as the first completed.
$urls = [...];
$results = [];
foreach ($urls as $url) {
$results[$url] = fetch($url);
}
var_dump($results);
Almost the same loop, but awaiting all operations at once; starts all requests immediately. Might not be feasible with too many URLs.
$urls = [...];
$results = [];
foreach ($urls as $url) {
$results[$url] = Amp\async(fetch(...), $url);
}
$results = Amp\Future\await($results);
var_dump($results);
Splitting the jobs into chunks of ten; all requests within a chunk are made concurrently, but each chunk sequentially, so the timing for each chunk depends on the slowest response; starts the eleventh request as soon as the first ten requests completed.
$urls = [...];
$results = [];
foreach (\array_chunk($urls, 10) as $chunk) {
$futures = [];
foreach ($chunk as $url) {
$futures[$url] = Amp\async(fetch(...), $url);
}
$results = \array_merge($results, Amp\Future\await($futures));
}
var_dump($results);
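A Semaphore offers a middle ground between the last two approaches: instead of fixed chunks, it caps how many fetches are in flight at once and starts a new one as soon as another finishes. A brief sketch, assuming amphp/sync's LocalSemaphore and the fetch function from above:
$urls = [...];
$semaphore = new Amp\Sync\LocalSemaphore(10); // at most 10 concurrent fetches
$futures = [];
foreach ($urls as $url) {
    $futures[$url] = Amp\async(function () use ($semaphore, $url) {
        $lock = $semaphore->acquire(); // waits until one of the 10 slots frees up
        try {
            return fetch($url);
        } finally {
            $lock->release();
        }
    });
}
$results = Amp\Future\await($futures);
var_dump($results);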
TODO: Link to example of amphp/pipeline
amphp/sync follows the semver semantic versioning specification like all other amphp packages.
If you discover any security related issues, please email me@kelunik.com
instead of using the issue tracker.
Author: amphp
Source Code: https://github.com/amphp/sync
License: MIT license
amphp/parallel provides true parallel processing for PHP using multiple processes or native threads, without blocking and with no extensions required.
To be as flexible as possible, this library comes with a collection of non-blocking concurrency tools that can be used independently as needed, as well as an "opinionated" worker API that allows you to assign units of work to a pool of worker threads or processes.
This package can be installed as a Composer dependency.
composer require amphp/parallel
The basic usage of this library is to submit blocking tasks to be executed by a worker pool in order to avoid blocking the main event loop.
<?php
require __DIR__ . '/../vendor/autoload.php';
use Amp\Parallel\Worker;
use Amp\Promise;
$urls = [
'https://secure.php.net',
'https://amphp.org',
'https://github.com',
];
$promises = [];
foreach ($urls as $url) {
$promises[$url] = Worker\enqueueCallable('file_get_contents', $url);
}
$responses = Promise\wait(Promise\all($promises));
foreach ($responses as $url => $response) {
\printf("Read %d bytes from %s\n", \strlen($response), $url);
}
file_get_contents is just used as an example of a blocking function here. If you just want to fetch multiple HTTP resources concurrently, it's better to use amphp/http-client, our non-blocking HTTP client.
The functions you call must be predefined or autoloadable by Composer so they also exist in the worker processes. Instead of simple callables, you can also enqueue Task instances with Amp\Parallel\Worker\enqueue().
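As a hedged sketch of the Task approach (based on the Task interface in the library version the example above targets; verify against the current docs), a task that downloads one URL in a worker might look like this:
use Amp\Parallel\Worker\Environment;
use Amp\Parallel\Worker\Task;

// Hypothetical task: runs file_get_contents inside a worker process.
class FetchTask implements Task
{
    private $url;

    public function __construct(string $url)
    {
        $this->url = $url;
    }

    public function run(Environment $environment)
    {
        // A blocking call is fine here: it runs in the worker, not the main event loop.
        return file_get_contents($this->url);
    }
}

// Usage: $promise = Amp\Parallel\Worker\enqueue(new FetchTask('https://amphp.org'));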
Documentation can be found on amphp.org/parallel as well as in the ./docs
directory.
amphp/parallel follows the semver semantic versioning specification like all other amphp packages.
If you discover any security related issues, please email me@kelunik.com
instead of using the issue tracker.
Want to hack on the source? A Vagrant box is provided with the repository to give a common development environment for running concurrent threads and processes, and comes with a bunch of handy tools and scripts for testing and experimentation.
Starting up and logging into the virtual machine is as simple as
vagrant up && vagrant ssh
Once inside the VM, you can install PHP extensions with Pickle, switch versions with newphp VERSION
, and test for memory leaks with Valgrind.
Author: amphp
Source Code: https://github.com/amphp/parallel
License: MIT license
As explained here and here, the map type in Go doesn't support concurrent reads and writes. concurrent-map provides a high-performance solution to this by sharding the map, with minimal time spent waiting for locks.
Prior to Go 1.9, there was no concurrent map implementation in the stdlib. In Go 1.9, sync.Map was introduced. The new sync.Map has a few key differences from this map. The stdlib sync.Map is designed for append-only scenarios, so if you want to use the map for something more like an in-memory db, you might benefit from using our version. You can read more about it in the golang repo, for example here and here.
Install the package:
go get "github.com/orcaman/concurrent-map/v2"
Import it:
import (
	"github.com/orcaman/concurrent-map/v2"
)
The package is now imported under the "cmap" namespace.
// Create a new map.
m := cmap.New[string]()
// Sets item within map, sets "bar" under key "foo"
m.Set("foo", "bar")
// Retrieve item from map.
bar, ok := m.Get("foo")
// Removes item under key "foo"
m.Remove("foo")
For more examples have a look at concurrent_map_test.go.
Running tests:
go test "github.com/orcaman/concurrent-map/v2"
Contributions are highly welcome. In order for a contribution to be merged, please follow the guidelines in the repository. We aim to keep concurrent-map as simple as possible and as similar to the native map as possible; please keep this in mind when opening issues.
Author: orcaman
Source Code: https://github.com/orcaman/concurrent-map
License: MIT license
This package provides a class to crawl links on a website. Under the hood Guzzle promises are used to crawl multiple urls concurrently.
Because the crawler can execute JavaScript, it can crawl JavaScript rendered sites. Under the hood Chrome and Puppeteer are used to power this feature.
This package can be installed via Composer:
composer require spatie/crawler
The crawler can be instantiated like this:
use Spatie\Crawler\Crawler;
Crawler::create()
->setCrawlObserver(<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>)
->startCrawling($url);
The argument passed to setCrawlObserver must be an object that extends the \Spatie\Crawler\CrawlObservers\CrawlObserver abstract class:
namespace Spatie\Crawler\CrawlObservers;
use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\UriInterface;
abstract class CrawlObserver
{
/**
* Called when the crawler will crawl the url.
*
* @param \Psr\Http\Message\UriInterface $url
*/
public function willCrawl(UriInterface $url): void
{
}
/**
* Called when the crawler has crawled the given url successfully.
*
* @param \Psr\Http\Message\UriInterface $url
* @param \Psr\Http\Message\ResponseInterface $response
* @param \Psr\Http\Message\UriInterface|null $foundOnUrl
*/
abstract public function crawled(
UriInterface $url,
ResponseInterface $response,
?UriInterface $foundOnUrl = null
): void;
/**
* Called when the crawler had a problem crawling the given url.
*
* @param \Psr\Http\Message\UriInterface $url
* @param \GuzzleHttp\Exception\RequestException $requestException
* @param \Psr\Http\Message\UriInterface|null $foundOnUrl
*/
abstract public function crawlFailed(
UriInterface $url,
RequestException $requestException,
?UriInterface $foundOnUrl = null
): void;
/**
* Called when the crawl has ended.
*/
public function finishedCrawling(): void
{
}
}
You can set multiple observers with setCrawlObservers:
Crawler::create()
->setCrawlObservers([
<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>,
<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>,
...
])
->startCrawling($url);
Alternatively, you can set multiple observers one by one with addCrawlObserver:
Crawler::create()
->addCrawlObserver(<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>)
->addCrawlObserver(<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>)
->addCrawlObserver(<class that extends \Spatie\Crawler\CrawlObservers\CrawlObserver>)
->startCrawling($url);
By default, the crawler will not execute JavaScript. This is how you can enable the execution of JavaScript:
Crawler::create()
->executeJavaScript()
...
In order to make it possible to get the body HTML after the JavaScript has been executed, this package depends on our Browsershot package. This package uses Puppeteer under the hood. Here are some pointers on how to install it on your system.
Browsershot will make an educated guess as to where its dependencies are installed on your system. By default, the Crawler will instantiate a new Browsershot instance. You may find the need to set a custom created instance using the setBrowsershot(Browsershot $browsershot) method.
Crawler::create()
->setBrowsershot($browsershot)
->executeJavaScript()
...
Note that the crawler will still work even if you don't have the system dependencies required by Browsershot. These system dependencies are only required if you're calling executeJavaScript().
You can tell the crawler not to visit certain urls by using the setCrawlProfile method. That method expects an object that extends Spatie\Crawler\CrawlProfiles\CrawlProfile:
/*
* Determine if the given url should be crawled.
*/
public function shouldCrawl(UriInterface $url): bool;
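A minimal custom profile might look like this (a sketch of ours; the class name and rule are hypothetical):
use Psr\Http\Message\UriInterface;
use Spatie\Crawler\CrawlProfiles\CrawlProfile;

// Hypothetical profile: skip everything under /admin.
class IgnoreAdminPages extends CrawlProfile
{
    public function shouldCrawl(UriInterface $url): bool
    {
        return ! str_starts_with($url->getPath(), '/admin');
    }
}

Crawler::create()
    ->setCrawlProfile(new IgnoreAdminPages())
    ->startCrawling($url);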
This package comes with three CrawlProfiles out of the box:
- CrawlAllUrls: this profile will crawl all urls on all pages, including urls to an external site.
- CrawlInternalUrls: this profile will only crawl the internal urls on the pages of a host.
- CrawlSubdomains: this profile will only crawl the internal urls and its subdomains on the pages of a host.
By default, the crawler will respect robots data. It is possible to disable these checks like so:
Crawler::create()
->ignoreRobots()
...
Robots data can come from either a robots.txt file, meta tags, or response headers. More information on the spec can be found here: http://www.robotstxt.org/.
Parsing robots data is done by our package spatie/robots-txt.
By default, the crawler will reject all links containing the rel="nofollow" attribute. It is possible to disable these checks like so:
Crawler::create()
->acceptNofollowLinks()
...
In order to respect robots.txt rules for a custom User Agent you can specify your own custom User Agent.
Crawler::create()
->setUserAgent('my-agent')
You can add your specific crawl rule group for 'my-agent' in robots.txt. This example disallows crawling the entire site for crawlers identified by 'my-agent'.
// Disallow crawling for my-agent
User-agent: my-agent
Disallow: /
To improve the speed of the crawl, the package concurrently crawls 10 urls by default. If you want to change that number, you can use the setConcurrency method.
Crawler::create()
->setConcurrency(1) // now all urls will be crawled one by one
By default, the crawler continues until it has crawled every page it can find. This behavior might cause issues if you are working in an environment with limitations such as a serverless environment.
The crawl behavior can be controlled with the following two options:
- setTotalCrawlLimit: this limit defines the maximal count of URLs to crawl.
- setCurrentCrawlLimit: this defines how many URLs are processed during the current crawl.
Let's take a look at some examples to clarify the difference between these two methods.
The setTotalCrawlLimit method allows you to limit the total number of URLs to crawl, no matter how often you call the crawler.
$queue = <your selection/implementation of a queue>;
// Crawls 5 URLs and ends.
Crawler::create()
->setCrawlQueue($queue)
->setTotalCrawlLimit(5)
->startCrawling($url);
// Doesn't crawl further as the total limit is reached.
Crawler::create()
->setCrawlQueue($queue)
->setTotalCrawlLimit(5)
->startCrawling($url);
The setCurrentCrawlLimit method will set a limit on how many URLs will be crawled per execution. This piece of code will process 5 pages with each execution, without a total limit of pages to crawl.
$queue = <your selection/implementation of a queue>;
// Crawls 5 URLs and ends.
Crawler::create()
->setCrawlQueue($queue)
->setCurrentCrawlLimit(5)
->startCrawling($url);
// Crawls the next 5 URLs and ends.
Crawler::create()
->setCrawlQueue($queue)
->setCurrentCrawlLimit(5)
->startCrawling($url);
Both limits can be combined to control the crawler:
$queue = <your selection/implementation of a queue>;
// Crawls 5 URLs and ends.
Crawler::create()
->setCrawlQueue($queue)
->setTotalCrawlLimit(10)
->setCurrentCrawlLimit(5)
->startCrawling($url);
// Crawls the next 5 URLs and ends.
Crawler::create()
->setCrawlQueue($queue)
->setTotalCrawlLimit(10)
->setCurrentCrawlLimit(5)
->startCrawling($url);
// Doesn't crawl further as the total limit is reached.
Crawler::create()
->setCrawlQueue($queue)
->setTotalCrawlLimit(10)
->setCurrentCrawlLimit(5)
->startCrawling($url);
You can use the setCurrentCrawlLimit to break up long-running crawls. The following example demonstrates a (simplified) approach: it's made up of an initial request and any number of follow-up requests continuing the crawl.
To start crawling across different requests, you will need to create a new queue of your selected queue-driver. Start by passing the queue-instance to the crawler. The crawler will start filling the queue as pages are processed and new URLs are discovered. Serialize and store the queue reference after the crawler has finished (using the current crawl limit).
// Create a queue using your queue-driver.
$queue = <your selection/implementation of a queue>;
// Crawl the first set of URLs
Crawler::create()
->setCrawlQueue($queue)
->setCurrentCrawlLimit(10)
->startCrawling($url);
// Serialize and store your queue
$serializedQueue = serialize($queue);
For any following requests you will need to unserialize your original queue and pass it to the crawler:
// Unserialize queue
$queue = unserialize($serializedQueue);
// Crawls the next set of URLs
Crawler::create()
->setCrawlQueue($queue)
->setCurrentCrawlLimit(10)
->startCrawling($url);
// Serialize and store your queue
$serialized_queue = serialize($queue);
The behavior is based on the information in the queue. The limits only work as described if the same queue instance is passed in. When a completely new queue is passed in, the limits of previous crawls -- even for the same website -- won't apply.
An example with more details can be found here.
By default, the crawler continues until it has crawled every page of the supplied URL. If you want to limit the depth of the crawler, you can use the setMaximumDepth method.
Crawler::create()
->setMaximumDepth(2)
Most html pages are quite small. But the crawler could accidentally pick up on large files such as PDFs and MP3s. To keep memory usage low in such cases the crawler will only use the responses that are smaller than 2 MB. If, when streaming a response, it becomes larger than 2 MB, the crawler will stop streaming the response. An empty response body will be assumed.
You can change the maximum response size.
// let's use a 3 MB maximum.
Crawler::create()
->setMaximumResponseSize(1024 * 1024 * 3)
In some cases you might get rate-limited when crawling too aggressively. To circumvent this, you can use the setDelayBetweenRequests() method to add a pause between every request. This value is expressed in milliseconds.
Crawler::create()
->setDelayBetweenRequests(150) // After every page crawled, the crawler will wait for 150ms
By default, every found page will be downloaded (up to setMaximumResponseSize() in size) and parsed for additional links. You can limit which content types should be downloaded and parsed by calling setParseableMimeTypes() with an array of allowed types.
Crawler::create()
->setParseableMimeTypes(['text/html', 'text/plain'])
This will prevent downloading the body of pages that have different mime types, like binary files, audio/video, ... that are unlikely to have links embedded in them. This feature mostly saves bandwidth.
When crawling a site, the crawler will put urls to be crawled in a queue. By default, this queue is stored in memory using the built-in ArrayCrawlQueue.
When a site is very large you may want to store that queue elsewhere, maybe a database. In such cases, you can write your own crawl queue.
A valid crawl queue is any class that implements the Spatie\Crawler\CrawlQueues\CrawlQueue interface. You can pass your custom crawl queue via the setCrawlQueue method on the crawler.
Crawler::create()
->setCrawlQueue(<implementation of \Spatie\Crawler\CrawlQueues\CrawlQueue>)
By default, the crawler will set the base url scheme to http if none is set. You have the ability to change that with setDefaultScheme.
Crawler::create()
->setDefaultScheme('https')
Please see CHANGELOG for more information on what has changed recently.
Please see CONTRIBUTING for details.
First, install the Puppeteer dependency, or your tests will fail.
npm install puppeteer
To run the tests you'll have to start the included node based server first in a separate terminal window.
cd tests/server
npm install
node server.js
With the server running, you can start testing.
composer test
If you've found a bug regarding security please mail security@spatie.be instead of using the issue tracker.
You're free to use this package, but if it makes it to your production environment we highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using.
Our address is: Spatie, Kruikstraat 22, 2018 Antwerp, Belgium.
We publish all received postcards on our company website.
Author: Spatie
Source Code: https://github.com/spatie/crawler
License: MIT license
In today's post we will learn about 5 popular concurrency libraries for Rust.
What is Concurrency?
Concurrency is the occurrence of multiple events within overlapping time frames, but not simultaneously. On a computer system, concurrency is implemented in the paradigm called concurrent computing.
The three main types of concurrent computing are threading, asynchrony, and preemptive multitasking. Each method has its own special precautions which must be taken to prevent race conditions, where multiple threads or processes access the same shared data in memory in improper order.
When working with databases, concurrency controls help make sure each transaction on the database takes place in a particular order rather than at the same time. This keeps the transactions from working at the same time, which could cause data to become incorrect or corrupt the database.
Table of contents:
- Crossbeam
- Archery
- Rayon
- Coroutine
- Coio
Support for parallelism and low-level concurrency in Rust.
This crate provides a set of tools for concurrent programming:
- AtomicCell, a thread-safe mutable memory location. (no_std)
- AtomicConsume, for reading from primitive atomic types with "consume" ordering. (no_std)
- deque, work-stealing deques for building task schedulers.
- ArrayQueue, a bounded MPMC queue that allocates a fixed-capacity buffer on construction. (alloc)
- SegQueue, an unbounded MPMC queue that allocates small buffers, segments, on demand. (alloc)
- epoch, an epoch-based garbage collector. (alloc)
- channel, multi-producer multi-consumer channels for message passing.
- Parker, a thread parking primitive.
- ShardedLock, a sharded reader-writer lock with fast concurrent reads.
- WaitGroup, for synchronizing the beginning or end of some computation.
- Backoff, for exponential backoff in spin loops. (no_std)
- CachePadded, for padding and aligning a value to the length of a cache line. (no_std)
- scope, for spawning threads that borrow local variables from the stack.
Features marked with (no_std) can be used in no_std environments.
Features marked with (alloc) can be used in no_std environments, but only if the alloc feature is enabled.
The main crossbeam crate just re-exports tools from smaller subcrates:
- crossbeam-channel provides multi-producer multi-consumer channels for message passing.
- crossbeam-deque provides work-stealing deques, which are primarily intended for building task schedulers.
- crossbeam-epoch provides epoch-based garbage collection for building concurrent data structures.
- crossbeam-queue provides concurrent queues that can be shared among threads.
- crossbeam-utils provides atomics, synchronization primitives, scoped threads, and other utilities.
There is one more experimental subcrate that is not yet included in crossbeam:
- crossbeam-skiplist provides concurrent maps and sets based on lock-free skip lists.
Add this to your Cargo.toml:
[dependencies]
crossbeam = "0.8"
Crossbeam supports stable Rust releases going back at least six months, and every time the minimum supported Rust version is increased, a new minor version is released. Currently, the minimum supported Rust version is 1.38.
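As a quick sketch of two of these tools working together (a minimal example of ours, not from the README), here is crossbeam::scope combined with an unbounded channel:
use crossbeam::channel::unbounded;

fn main() {
    let (tx, rx) = unbounded();

    // Scoped threads may borrow from the enclosing stack frame.
    crossbeam::scope(|s| {
        for id in 0..4 {
            let tx = tx.clone();
            s.spawn(move |_| {
                tx.send(id).unwrap(); // multi-producer: each thread sends its id
            });
        }
        drop(tx); // drop the last sender so the receiver loop below terminates
        for msg in rx.iter() {
            println!("got {}", msg);
        }
    })
    .unwrap();
}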
Library to abstract over the Rc/Arc pointer types.
Archery is a Rust library that offers a way to abstract over the Rc and Arc smart pointers. This allows you to create data structures where the pointer type is parameterizable, so you can avoid the overhead of Arc when you don't need to share data across threads.
In languages that support higher-kinded polymorphism this would be simple to achieve without any library, but Rust does not support that yet. To mimic higher-kinded polymorphism, Archery implements the approach suggested by Joshua Liebow-Feeser in "Rust has higher kinded types already… sort of". While other approaches exist, they seem to always offer poor ergonomics for the user.
To use Archery, add the following to your Cargo.toml:
[dependencies]
archery = "<version>"
Archery defines a SharedPointer that receives the kind of pointer as a type parameter. This gives you a convenient and ergonomic way to abstract the pointer type away.
Declare a data structure with the pointer kind as a type parameter bounded by SharedPointerKind:
use archery::*;
struct KeyValuePair<K, V, P: SharedPointerKind> {
pub key: SharedPointer<K, P>,
pub value: SharedPointer<V, P>,
}
impl<K, V, P: SharedPointerKind> KeyValuePair<K, V, P> {
fn new(key: K, value: V) -> KeyValuePair<K, V, P> {
KeyValuePair {
key: SharedPointer::new(key),
value: SharedPointer::new(value),
}
}
}
To use it just plug-in the kind of pointer you want:
let pair: KeyValuePair<_, _, RcK> =
KeyValuePair::new("António Variações", 1944);
assert_eq!(*pair.value, 1944);
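Swapping in Arc-backed pointers for thread-safe sharing is just a different type parameter. A brief sketch (ours, not from the README), reusing the KeyValuePair type above with Archery's ArcK kind:
// Same structure, but now backed by Arc, so it can cross thread boundaries.
let pair: KeyValuePair<_, _, ArcK> = KeyValuePair::new("Amália Rodrigues", 1920);

std::thread::spawn(move || {
    assert_eq!(*pair.value, 1920);
})
.join()
.unwrap();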
A data parallelism library for Rust.
Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel one. It also guarantees data-race freedom. (You may also enjoy this blog post about Rayon, which gives more background and details about how it works, or this video, from the Rust Belt Rust conference.) Rayon is available on crates.io, and API Documentation is available on docs.rs.
Rayon makes it drop-dead simple to convert sequential iterators into parallel ones: usually, you just change your foo.iter() call into foo.par_iter(), and Rayon does the rest:
use rayon::prelude::*;
fn sum_of_squares(input: &[i32]) -> i32 {
input.par_iter() // <-- just change that!
.map(|&i| i * i)
.sum()
}
Parallel iterators take care of deciding how to divide your data into tasks; it will dynamically adapt for maximum performance. If you need more flexibility than that, Rayon also offers the join and scope functions, which let you create parallel tasks on your own. For even more control, you can create custom threadpools rather than using Rayon's default, global threadpool.
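For instance, here's a hedged sketch of rayon::join (our own example, not from the Rayon docs): it splits one computation into two closures that Rayon may run on different threads.
// Recursive parallel sum using rayon::join; each half may run on another thread.
fn parallel_sum(slice: &[i32]) -> i32 {
    if slice.len() <= 1024 {
        return slice.iter().sum(); // small inputs: plain sequential sum
    }
    let (left, right) = slice.split_at(slice.len() / 2);
    let (a, b) = rayon::join(|| parallel_sum(left), || parallel_sum(right));
    a + b
}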
Rayon is available on crates.io. The recommended way to use it is to add a line into your Cargo.toml such as:
[dependencies]
rayon = "1.5"
To use the Parallel Iterator APIs, a number of traits have to be in scope. The easiest way to bring those things into scope is to use the Rayon prelude. In each module where you would like to use the parallel iterator APIs, just add:
use rayon::prelude::*;
Rayon currently requires rustc 1.46.0 or greater.
Rayon can work on the Web via WebAssembly, but requires an adapter and some project configuration to account for differences between WebAssembly threads and threads on the other platforms.
Check out wasm-bindgen-rayon docs for more details.
Rayon is an open source project! If you'd like to contribute to Rayon, check out the list of "help wanted" issues. These are all (or should be) issues that are suitable for getting started, and they generally include a detailed set of instructions for what to do. Please ask questions if anything is unclear! Also, check out the Guide to Development page on the wiki. Note that all code submitted in PRs to Rayon is assumed to be licensed under Rayon's dual MIT/Apache2 licensing.
To see Rayon in action, check out the rayon-demo directory, which includes a number of demos of code using Rayon. For example, run this command to get a visualization of an nbody simulation. To see the effect of using Rayon, press s to run sequentially and p to run in parallel.
> cd rayon-demo
> cargo run --release -- nbody visualize
For more information on demos, try:
> cd rayon-demo
> cargo run --release -- --help
Coroutine Library in Rust.
[dependencies]
coroutine = "0.8"
Basic usage of Coroutine
extern crate coroutine;
use std::usize;
use coroutine::asymmetric::Coroutine;
fn main() {
let coro: Coroutine<i32> = Coroutine::spawn(|me,_| {
for num in 0..10 {
me.yield_with(num);
}
usize::MAX
});
for num in coro {
println!("{}", num.unwrap());
}
}
This program will print the following to the console:
0
1
2
3
4
5
6
7
8
9
18446744073709551615
For more detail, please run cargo doc --open.
Features:
- Basic single-threaded coroutine support
- Asymmetric coroutines
- Symmetric coroutines
- Thread-safe: a coroutine can only be resumed by one thread at a time
Basically it supports the arm, i686, mips, mipsel, and x86_64 platforms, but we have only tested on:
- OS X 10.10.*, x86_64, nightly
- ArchLinux, x86_64, nightly
Coroutine I/O for Rust.
Coio provides coroutine scheduling with a work-stealing algorithm.
WARN: Possibly crashes because of TLS inline; check #56 for more detail!
Note: You must use nightly Rust to build this project.
[dependencies.coio]
git = "https://github.com/zonyitoo/coio-rs.git"
extern crate coio;
use coio::Scheduler;
fn main() {
Scheduler::new()
.run(|| {
for _ in 0..10 {
println!("Heil Hydra");
Scheduler::sched(); // Yields the current coroutine
}
})
.unwrap();
}
extern crate coio;
use std::io::{Read, Write};
use coio::net::TcpListener;
use coio::{spawn, Scheduler};
fn main() {
// Spawn a coroutine for accepting new connections
Scheduler::new().with_workers(4).run(move|| {
let acceptor = TcpListener::bind("127.0.0.1:8080").unwrap();
println!("Waiting for connection ...");
for stream in acceptor.incoming() {
let (mut stream, addr) = stream.unwrap();
println!("Got connection from {:?}", addr);
// Spawn a new coroutine to handle the connection
spawn(move|| {
let mut buf = [0; 1024];
loop {
match stream.read(&mut buf) {
Ok(0) => {
println!("EOF");
break;
},
Ok(len) => {
println!("Read {} bytes, echo back", len);
stream.write_all(&buf[0..len]).unwrap();
},
Err(err) => {
println!("Error occurs: {:?}", err);
break;
}
}
}
println!("Client closed");
});
}
}).unwrap();
}
Thank you for following this article.
Rust Concurrency Explained
What are concurrency and parallelism, and how do they apply to Python?
You can find all the code examples from this article in the concurrency-parallelism-and-asyncio repo on GitHub.
Source: https://testdriven.io
There are many reasons your applications can be slow. Sometimes this is due to poor algorithmic design or the wrong choice of data structure. Sometimes, however, it's due to forces outside of our control, such as hardware constraints or the quirks of networking. That's where concurrency and parallelism fit in. They allow your programs to do multiple things at once, either at the same time or by wasting as little time as possible waiting on busy tasks.
Whether you're dealing with external web resources, reading from and writing to multiple files, or need to use a calculation-intensive function multiple times with different parameters, this article should help you maximize the efficiency and speed of your code.
First, we'll dig into what concurrency and parallelism are and how they fit into the realm of Python using standard libraries such as threading, multiprocessing, and asyncio. The last portion of this article will compare Python's implementation of async/await with how other languages have implemented it.
To work through the examples in this article, you should already know how to work with HTTP requests.
By the end of this article, you should be able to answer the following questions:
What is concurrency?
An effective definition for concurrency is "being able to perform multiple tasks at once". This is a bit misleading though, as the tasks may or may not actually be performed at exactly the same time. Instead, a process might start, then once it's waiting on a specific instruction to finish, switch to a new task, only to come back once it's no longer waiting. Once one task is finished, it switches again to an unfinished task until they have all been performed. Tasks start asynchronously, get performed asynchronously, and then finish asynchronously.
If that was confusing to you, let's instead think of an analogy: Say you want to make a BLT. First, you'll throw the bacon in a pan on medium-low heat. While the bacon's cooking, you can get out your tomatoes and lettuce and start preparing (washing and cutting) them. All the while, you continue checking on and occasionally flipping your bacon.
At this point, you've started a task, and then started and completed two more in the meantime, all while you're still waiting on the first.
Eventually you put your bread in a toaster. While it's toasting, you continue checking on your bacon. As pieces get finished, you pull them out and place them on a plate. Once your bread is done toasting, you apply to it your sandwich spread of choice, and then you can start layering on your tomatoes and lettuce, and then, once it's done cooking, your bacon. Only once everything is cooked, prepared, and layered can you place the last piece of toast onto your sandwich, slice it (optionally), and eat it.
Because it requires you to perform multiple tasks at the same time, making a BLT is inherently a concurrent process, even if you are not giving your full attention to each of those tasks all at once. For all intents and purposes, for the next section we will refer to this form of concurrency as just "concurrency." We'll differentiate it later on in this article.
For this reason, concurrency is great for I/O-intensive processes -- tasks that involve waiting on web requests or file read/write operations.
In Python, there are a few different ways to achieve concurrency. The first we'll take a look at is the threading library.
For our examples in this section, we're going to build a small Python program that grabs a random music genre from Binary Jazz's Genrenator API five times, prints the genre to the screen, and puts each one into its own file.
To work with threading in Python, the only import you'll need is threading, but for this example I've also imported urllib to work with HTTP requests, time to determine how long the functions take to complete, and json to easily convert the JSON data returned from the Genrenator API.
You can find the code for this example here.
Let's start with a simple function:
def write_genre(file_name):
"""
Uses genrenator from binaryjazz.us to write a random genre to the
name of the given file
"""
req = Request("https://binaryjazz.us/wp-json/genrenator/v1/genre/", headers={"User-Agent": "Mozilla/5.0"})
genre = json.load(urlopen(req))
with open(file_name, "w") as new_file:
print(f"Writing '{genre}' to '{file_name}'...")
new_file.write(genre)
Examining the code above, we're making a request to the Genrenator API, loading its JSON response (a random music genre), printing it, then writing it to a file.
Without the "User-Agent" header, you will receive a 304.
What we're really interested in is the next section, where the actual threading happens:
threads = []
for i in range(5):
thread = threading.Thread(
target=write_genre,
args=[f"./threading/new_file{i}.txt"]
)
thread.start()
threads.append(thread)
for thread in threads:
thread.join()
We first start with a list. We then proceed to iterate five times, creating a new thread each time. Next, we start each thread and append it to our "threads" list, and then iterate over our list one last time to join each thread.
Explanation: Creating threads in Python is easy.
To create a new thread, use threading.Thread(). You can pass into it the kwarg (keyword argument) target with a value of whatever function you would like to run on that thread. But only pass in the name of the function, not its value (meaning, for our purposes, write_genre and not write_genre()). To pass arguments, pass in "kwargs" (which takes a dict of your kwargs) or "args" (which takes an iterable containing your args -- in this case, a list).
Creating a thread is not the same as starting a thread, however. To start your thread, use {the name of your thread}.start(). Starting a thread means "starting its execution."
Lastly, when we join threads with thread.join(), all we're doing is ensuring the thread has finished before continuing on with our code.
But what exactly is a thread?
A thread is a way of allowing your computer to break up a single process/program into many lightweight pieces that execute in parallel. Somewhat confusingly, Python's standard implementation of threading limits threads to only being able to execute one at a time due to something called the Global Interpreter Lock (GIL). The GIL is necessary because memory management in CPython (Python's default implementation) is not thread-safe. Because of this limitation, threading in Python is concurrent, but not parallel. To get around this, Python has a separate multiprocessing module not limited by the GIL that spins up separate processes, enabling parallel execution of your code. Using the multiprocessing module is nearly identical to using the threading module.
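As a minimal illustration of that similarity (our own sketch, not from the article's repo), the code has the same shape as the threading example:
import multiprocessing

def work(i):
    print(i * i)

if __name__ == "__main__":
    # Same pattern as with threading, but each worker is a separate OS process.
    processes = [multiprocessing.Process(target=work, args=(i,)) for i in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()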
More information about Python's GIL and thread safety can be found on Real Python and in the official Python docs.
We'll take a more in-depth look at multiprocessing in Python shortly.
Before we show the potential speed improvement over non-threaded code, I took the liberty of also creating a non-threaded version of the same program (again, available on GitHub). Instead of creating a new thread and joining each one, it instead calls write_genre in a for loop that iterates five times.
To compare speed benchmarks, I also imported the time library to time the execution of our scripts:
Starting...
Writing "binary indoremix" to "./sync/new_file0.txt"...
Writing "slavic aggro polka fusion" to "./sync/new_file1.txt"...
Writing "israeli new wave" to "./sync/new_file2.txt"...
Writing "byzantine motown" to "./sync/new_file3.txt"...
Writing "dutch hate industrialtune" to "./sync/new_file4.txt"...
Time to complete synchronous read/writes: 1.42 seconds
Upon running the script, we see that it takes my computer around 1.42 seconds (along with classic music genres such as "dutch hate industrialtune"). Not too bad.
Now let's run the version that uses threading:
Starting...
Writing "college k-dubstep" to "./threading/new_file2.txt"...
Writing "swiss dirt" to "./threading/new_file0.txt"...
Writing "bop idol alternative" to "./threading/new_file4.txt"...
Writing "ethertrio" to "./threading/new_file1.txt"...
Writing "beach aust shanty français" to "./threading/new_file3.txt"...
Time to complete threading read/writes: 0.77 seconds
The first thing that might stand out to you is the functions not being completed in order: 2 - 0 - 4 - 1 - 3
This is because of the asynchronous nature of threading: as one function waits, another one begins, and so on. Because we're able to continue performing tasks while we're waiting on others to finish (either due to networking or file I/O operations), you may also have noticed that we cut our time roughly in half: 0.77 seconds. Whereas this might not seem like a lot now, it's easy to imagine the very real case of building a web application that needs to write much more data to a file or interact with much more complex web services.
So, if threading is so great, why don't we end the article here?
Because there are even better ways to perform tasks concurrently.
Let's take a look at an example using asyncio. For this method, we're going to install aiohttp using pip. This will allow us to make non-blocking requests and receive responses using the async/await syntax that will be introduced shortly. It also has the extra benefit of a function that converts a JSON response without needing to import the json library. We'll also install and import aiofiles, which allows non-blocking file operations. Other than aiohttp and aiofiles, import asyncio, which comes with the Python standard library.
"Non-blocking" means a program will allow other threads to continue running while it's waiting. This is opposed to "blocking" code, which stops execution of your program completely. Normal, synchronous I/O operations suffer from this limitation.
You can find the code for this example here.
Once we have our imports in place, let's take a look at the asynchronous version of the write_genre function from our asyncio example:
async def write_genre(file_name):
"""
Uses genrenator from binaryjazz.us to write a random genre to the
name of the given file
"""
async with aiohttp.ClientSession() as session:
async with session.get("https://binaryjazz.us/wp-json/genrenator/v1/genre/") as response:
genre = await response.json()
async with aiofiles.open(file_name, "w") as new_file:
print(f'Writing "{genre}" to "{file_name}"...')
await new_file.write(genre)
For those not familiar with the async/await syntax that can be found in many other modern languages, async declares that a function, for loop, or with statement must be used asynchronously. To call an async function, you must either use the await keyword from another async function or call create_task() directly from the event loop, which can be grabbed from asyncio.get_event_loop() -- i.e., loop = asyncio.get_event_loop().
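In Python 3.7+, you usually don't need to grab the loop yourself; here's a minimal sketch of both calling styles:
import asyncio

async def main():
    # Schedule a coroutine on the running loop without awaiting it immediately...
    task = asyncio.create_task(asyncio.sleep(1))
    # ...then use await (only legal inside an async function) to wait for it.
    await task

asyncio.run(main())  # starts the event loop and runs main() to completion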
Additionally:
- async with allows awaiting async responses and file operations.
- async for (not used here) iterates over an asynchronous stream.
Event loops are constructs inherent to asynchronous programming that allow performing tasks asynchronously. As you're reading this article, I can safely assume you're probably not too familiar with the concept. However, even if you've never written an async application, you have experience with event loops every time you use a computer. Whether your computer is listening for keyboard input, you're playing online multiplayer games, or you're browsing Reddit while you have files copying in the background, an event loop is the driving force that keeps everything working smoothly and efficiently. In its purest essence, an event loop is a process that waits around for triggers and then performs specific (programmed) actions once those triggers are met. They often return a "promise" (JavaScript syntax) or "future" (Python syntax) of some sort to denote that a task has been added. Once the task is finished, the promise or future returns a value passed back from the called function (assuming the function does return a value).
The idea of performing a function in response to another function is called a "callback."
For another take on callbacks and events, here's a great answer on Stack Overflow.
Here's a walkthrough of our function:
We're using async with to open our client session asynchronously. The aiohttp.ClientSession() class is what allows us to make HTTP requests and remain connected to a source without blocking the execution of our code. We then make an async request to the Genrenator API and await the JSON response (a random music genre). In the next line, we use async with again with the aiofiles library to asynchronously open a new file to write our new genre to. We print the genre, then write it to the file.
Unlike regular Python scripts, programming with asyncio pretty much enforces* using some sort of "main" function.
*Unless you're using the deprecated "yield" syntax with the @asyncio.coroutine decorator, which will be removed in Python 3.10.
This is because you need to use the "async" keyword in order to use the "await" syntax, and the "await" syntax is the only way to actually run other async functions.
Here's our main function:
async def main():
tasks = []
for i in range(5):
tasks.append(write_genre(f"./async/new_file{i}.txt"))
await asyncio.gather(*tasks)
As you can see, we've declared it with "async." We then create an empty list called "tasks" to house our async tasks (calls to Genrenator and our file I/O). We append our tasks to our list, but they are not actually run yet. The calls don't actually get made until we schedule them with await asyncio.gather(*tasks). This runs all of the tasks in our list and waits for them to finish before continuing with the rest of our program. Lastly, we use asyncio.run(main()) to run our "main" function. The .run() function is the entry point for our program, and it should generally only be called once per process.
For those not familiar, the * in front of tasks is called "argument unpacking." Just as it sounds, it unpacks our list into a series of arguments for our function. Our function is asyncio.gather() in this case.
And that's all we need to do. Now, running our program (the source of which includes the same timing functionality of the synchronous and threading examples)...
Writing "albuquerque fiddlehaus" to "./async/new_file1.txt"...
Writing "euroreggaebop" to "./async/new_file2.txt"...
Writing "shoedisco" to "./async/new_file0.txt"...
Writing "russiagaze" to "./async/new_file4.txt"...
Writing "alternative xylophone" to "./async/new_file3.txt"...
Time to complete asyncio read/writes: 0.71 seconds
...we see it's even faster still. And, in general, the asyncio method will always be a bit faster than the threading method. This is because when we use the "await" syntax, we essentially tell our program "hold on, I'll be right back," but our program keeps track of how long it takes us to finish what we're doing. Once we're done, our program will know, and will pick back up as soon as it's able. Threading in Python allows asynchronicity, but our program could theoretically skip around different threads that may not yet be ready, wasting time if there are threads ready to continue running.
So when should I use threading, and when should I use asyncio?
When you're writing new code, use asyncio. If you need to interface with older libraries or those that don't support asyncio, you might be better off with threading.
It turns out testing async functions with pytest is as easy as testing synchronous functions. Just install the pytest-asyncio package with pip, mark your tests with the async keyword, and apply a decorator that lets pytest know it's asynchronous: @pytest.mark.asyncio. Let's look at an example.
First, let's write an arbitrary async function in a file called hello_asyncio.py:
import asyncio
async def say_hello(name: str):
""" Sleeps for two seconds, then prints 'Hello, {{ name }}!' """
try:
if type(name) != str:
raise TypeError("'name' must be a string")
if name == "":
raise ValueError("'name' cannot be empty")
except (TypeError, ValueError):
raise
print("Sleeping...")
await asyncio.sleep(2)
print(f"Hello, {name}!")
The function takes a single string argument: name. Upon ensuring that name is a non-empty string, our function asynchronously sleeps for two seconds, then prints "Hello, {name}!" to the console.
The difference between asyncio.sleep() and time.sleep() is that asyncio.sleep() is non-blocking.
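A quick sketch of why that matters: two concurrent asyncio.sleep(2) calls finish in about two seconds total, not four.
import asyncio
import time

async def main():
    start = time.time()
    # Both sleeps wait on the same event loop concurrently.
    await asyncio.gather(asyncio.sleep(2), asyncio.sleep(2))
    print(f"elapsed: {time.time() - start:.1f}s")  # ~2.0, not 4.0

asyncio.run(main())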
Now let's test it with pytest. In the same directory as hello_asyncio.py, create a file called test_hello_asyncio.py, then open it in your favorite text editor.
Let's start with our imports:
import pytest # Note: pytest-asyncio does not require a separate import
from hello_asyncio import say_hello
Then we'll create a test with proper input:
@pytest.mark.parametrize("name", [
"Robert Paulson",
"Seven of Nine",
"x Æ a-12"
])
@pytest.mark.asyncio
async def test_say_hello(name):
await say_hello(name)
Things to note:
- The @pytest.mark.asyncio decorator lets pytest work asynchronously.
- Our test uses the async syntax.
- We're awaiting our async function as we would if we were running it outside of a test.
Now let's run our test with the verbose -v option:
pytest -v
...
collected 3 items
test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED [ 33%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED [ 66%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED [100%]
Looks good. Next we'll write a couple of tests with bad input. Back inside of test_hello_asyncio.py, let's create a class called TestSayHelloThrowsExceptions:
class TestSayHelloThrowsExceptions:
@pytest.mark.parametrize("name", [
"",
])
@pytest.mark.asyncio
async def test_say_hello_value_error(self, name):
with pytest.raises(ValueError):
await say_hello(name)
@pytest.mark.parametrize("name", [
19,
{"name", "Diane"},
[]
])
@pytest.mark.asyncio
async def test_say_hello_type_error(self, name):
with pytest.raises(TypeError):
await say_hello(name)
Again, we decorate our tests with @pytest.mark.asyncio, mark our tests with the async syntax, then call our function with await.
Run the tests again:
pytest -v
...
collected 7 items
test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED [ 14%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED [ 28%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED [ 42%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_value_error[] PASSED [ 57%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[19] PASSED [ 71%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name1] PASSED [ 85%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name2] PASSED [100%]
Alternatively to pytest-asyncio, you can create a pytest fixture that yields an asyncio event loop:
import asyncio
import pytest
from hello_asyncio import say_hello
@pytest.fixture
def event_loop():
loop = asyncio.get_event_loop()
yield loop
Then, rather than using the async/await syntax, you create your tests as you would normal, synchronous tests:
@pytest.mark.parametrize("name", [
"Robert Paulson",
"Seven of Nine",
"x Æ a-12"
])
def test_say_hello(event_loop, name):
event_loop.run_until_complete(say_hello(name))
class TestSayHelloThrowsExceptions:
@pytest.mark.parametrize("name", [
"",
])
def test_say_hello_value_error(self, event_loop, name):
with pytest.raises(ValueError):
event_loop.run_until_complete(say_hello(name))
@pytest.mark.parametrize("name", [
19,
{"name", "Diane"},
[]
])
def test_say_hello_type_error(self, event_loop, name):
with pytest.raises(TypeError):
event_loop.run_until_complete(say_hello(name))
If you're interested, here's a more advanced tutorial on asyncio testing.
If you want to learn more about what distinguishes Python's implementation of threading vs asyncio, here's a great article from Medium.
For even better examples and explanations of threading in Python, here's a video by Corey Schafer that goes more in-depth, including using the concurrent.futures library.
Lastly, for a massive deep-dive into asyncio itself, here's an article from Real Python completely dedicated to it.
Bonus: One more library you might be interested in is called Unsync, especially if you want to easily convert your current synchronous code into asynchronous code. To use it, you install the library with pip, import it with from unsync import unsync, then decorate whatever currently synchronous function you have with @unsync to make it asynchronous. To await it and get its return value (which you can do anywhere -- it doesn't have to be in an async/unsync function), just call .result() after the function call.
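A hedged sketch of that workflow (assuming the unsync API as described above):
from unsync import unsync
import time

@unsync
def slow_square(n):  # still written synchronously...
    time.sleep(1)
    return n * n

# ...but calls now return immediately; block only when collecting results.
tasks = [slow_square(i) for i in range(4)]
print([t.result() for t in tasks])  # finishes in about 1 second, not 4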
What is parallelism?
Parallelism is very much related to concurrency. In fact, parallelism is a subset of concurrency: whereas a concurrent process performs multiple tasks at the same time whether they're being given full attention or not, a parallel process is physically performing multiple tasks all at the same time. A good example would be driving, listening to music, and eating the BLT we made in the last section, all at the same time.
Because they don't require a lot of intensive effort, you can do them all at once without having to wait on anything or divert your attention away.
Now let's take a look at how to implement this in Python. We could use the multiprocessing library, but let's use the concurrent.futures library instead -- it eliminates the need to manage the number of processes manually. Because the major benefit of multiprocessing happens when you perform multiple CPU-heavy tasks, we're going to raise each number from 1 million (1000000) through 1 million and 16 (1000016) to its own power (the code below submits pow(i, i) for each i).
You can find the code for this example here.
Aside from time for benchmarking, the only import we'll need is concurrent.futures:
import concurrent.futures
import time
if __name__ == "__main__":
pow_list = [i for i in range(1000000, 1000016)]
print("Starting...")
start = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
futures = [executor.submit(pow, i, i) for i in pow_list]
for f in concurrent.futures.as_completed(futures):
print("okay")
end = time.time()
print(f"Time to complete: {round(end - start, 2)}")
Because I'm developing on a Windows machine, I'm using if __name__ == "__main__". This is necessary because Windows does not have the fork system call inherent to Unix systems. Because Windows doesn't have this capability, it resorts to launching a new interpreter with each process that tries to import the main module. If the main module doesn't exist, it reruns your entire program, causing recursive chaos to ensue.
So taking a look at our main function, we use a list comprehension to create a list of numbers from 1 million to 1 million and 16, we open a ProcessPoolExecutor with concurrent.futures, and we use a list comprehension and ProcessPoolExecutor().submit() to start executing our processes and throwing them into a list called "futures."
We could also use ThreadPoolExecutor() if we wanted to use threads instead -- concurrent.futures is versatile.
And this is where the asynchronicity comes in: the "futures" list does not actually contain the results from running our functions. Instead, it contains "futures", which are similar to the JavaScript idea of "promises". In order to allow our program to continue running, we get back these futures that represent a placeholder for a value. If we try to print the future, depending on whether it's finished running or not, we'll either get back a state of "pending" or "finished". Once it's finished, we can get the return value (assuming there is one) using var.result() -- here, that's f.result() inside the loop.
We then iterate through our list of futures, but instead of printing our values, we're simply printing out "okay." This is just because of how massive the resulting calculations come out to be.
Just as before, I built a comparison script that does this synchronously. And, just as before, you can find it on GitHub.
Running our control program, which also includes functionality for timing our program, we get:
Starting...
okay
...
okay
Time to complete: 54.64
Wow. 54.64 seconds is quite a long time. Let's see if our version with multiprocessing does any better:
Starting...
okay
...
okay
Time to complete: 6.24
Our time has been significantly reduced. We're at about 1/9th of our original time.
So what would happen if we used threading for this instead?
I'm sure you can guess -- it wouldn't be much faster than doing it synchronously. In fact, it might be slower because it still takes a little time and effort to spin up new threads. But don't take my word for it, here's what we get when we replace ProcessPoolExecutor()
with ThreadPoolExecutor()
:
Starting...
okay
...
okay
Time to complete: 53.83
As I mentioned earlier, threading allows your applications to focus on new tasks while others are waiting. In this case, we're never sitting idly by. Multiprocessing, on the other hand, spins up totally new processes, usually on separate CPU cores, ready to do whatever you ask completely in tandem with whatever else your script is doing. This is why the multiprocessing version taking roughly 1/9th of the time makes sense -- I have 8 cores in my CPU.
Now that we've talked about concurrency and parallelism in Python, we can finally set the terms straight. If you're having trouble distinguishing between the terms, you can safely and accurately think of our previous definitions of "parallelism" and "concurrency" as "parallel concurrency" and "non-parallel concurrency" respectively.
Real Python has a great article on concurrency vs parallelism.
Engineer Man has a good video comparison of threading vs multiprocessing.
Corey Schafer also has a good video on multiprocessing in the same spirit as his threading video.
If you only watch one video, watch this excellent talk by Raymond Hettinger. He does an amazing job explaining the differences between multiprocessing, threading, and asyncio.
What if I need to combine many I/O operations with heavy calculations?
We can do that too. Say you need to scrape 100 web pages for a specific piece of information, and then you need to save that piece of info in a file for later. We can separate the compute power across each of our computer's cores by making each process scrape a fraction of the pages.
For this script, let's install Beautiful Soup to help us easily scrape our pages: pip install beautifulsoup4. This time we actually have quite a few imports. Here they are, and here's why we're using them:
import asyncio # Gives us async/await
import concurrent.futures # Allows creating new processes
import time
from math import floor # Helps divide up our requests evenly across our CPU cores
from multiprocessing import cpu_count # Returns our number of CPU cores
import aiofiles # For asynchronously performing file I/O operations
import aiohttp # For asynchronously making HTTP requests
from bs4 import BeautifulSoup # For easy webpage scraping
You can find the code for this example here.
First, we're going to create an async function that makes a request to Wikipedia to get back random pages. We'll scrape each page we get back for its title using BeautifulSoup, and then we'll append it to a given file; we'll separate each title with a tab. The function will take two arguments:
async def get_and_scrape_pages(num_pages: int, output_file: str):
    """
    Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
    articles, then scrapes each page for its title and appends it to {{ output_file }},
    separating each title with a tab: "\\t"

    #### Arguments
    ---
    num_pages: int -
        Number of random Wikipedia pages to request and scrape

    output_file: str -
        File to append titles to
    """
    async with \
    aiohttp.ClientSession() as client, \
    aiofiles.open(output_file, "a+", encoding="utf-8") as f:
        for _ in range(num_pages):
            async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
                if response.status > 399:
                    # I was getting a 429 Too Many Requests at a higher volume of requests
                    response.raise_for_status()

                page = await response.text()
                soup = BeautifulSoup(page, features="html.parser")
                title = soup.find("h1").text

                await f.write(title + "\t")

        await f.write("\n")
We asynchronously open both an aiohttp ClientSession and our output file. The mode, a+, means append to the file and create it if it doesn't already exist. Encoding our strings as utf-8 ensures we don't get an error if our titles contain international characters. If we get an error response, we'll raise it instead of continuing (at high request volumes I was getting a 429 Too Many Requests). We asynchronously get the text from our response, then we parse the title and asynchronously append it to our file. After we append all of our titles, we append a new line: "\n".
Our next function is the one we'll start in each new process, allowing the scraping to run asynchronously within that process:
def start_scraping(num_pages: int, output_file: str, i: int):
    """ Starts an async process for requesting and scraping Wikipedia pages """
    print(f"Process {i} starting...")
    asyncio.run(get_and_scrape_pages(num_pages, output_file))
    print(f"Process {i} finished.")
Now for our main function. Let's start with some constants (and our function declaration):
def main():
    NUM_PAGES = 100  # Number of pages to scrape altogether
    NUM_CORES = cpu_count()  # Our number of CPU cores (including logical cores)
    OUTPUT_FILE = "./wiki_titles.tsv"  # File to append our scraped titles to

    PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
    PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % NUM_CORES  # The final core also picks up the remainder pages
And now the logic:
    futures = []

    with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
        for i in range(NUM_CORES - 1):
            new_future = executor.submit(
                start_scraping,  # Function to perform
                # v Arguments v
                num_pages=PAGES_PER_CORE,
                output_file=OUTPUT_FILE,
                i=i
            )
            futures.append(new_future)

        futures.append(
            executor.submit(
                start_scraping,
                PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES - 1
            )
        )

    concurrent.futures.wait(futures)
We create a list to store our futures, then we create a ProcessPoolExecutor, setting its max_workers equal to our number of cores. We iterate over a range equal to our number of cores minus one, submitting a new process that runs our start_scraping function each time, and we append each resulting future to our futures list. Our final core potentially has extra work to do: it scrapes the same number of pages as each of the other cores, plus a number of pages equal to the remainder we got when dividing our total number of pages by our number of CPU cores.
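To make the split concrete: with NUM_PAGES = 100 and NUM_CORES = 8, each of the first seven processes scrapes floor(100 / 8) = 12 pages, while the final process scrapes 12 + (100 % 8) = 16, for 7 × 12 + 16 = 100 pages in total.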
Make sure to actually run your main function:
if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Time to complete: {round(time.time() - start, 2)} seconds.")
After running the program with my 8-core CPU (along with benchmarking code):
This version (asyncio with multiprocessing):
Time to complete: 5.65 seconds.
Just multiprocessing:
Time to complete: 8.87 seconds.
Just asyncio:
Time to complete: 47.92 seconds.
Synchronous:
Time to complete: 88.86 seconds.
I'm actually quite surprised to see that the improvement of asyncio with multiprocessing over just multiprocessing wasn't as great as I thought it would be.
async/await and similar syntax also exist in other languages, and in some of those languages, the implementation can differ drastically.
The first programming language (back in 2007) to use the async syntax was Microsoft's F#. While it doesn't exactly use await to wait on a function call, it uses specific syntax like let! and do! along with proprietary Async functions included in the System module.
You can find more about async programming in F# on Microsoft's F# docs.
Their C# team then built upon this concept, and that's where the async/await keywords that we're now familiar with were born:
using System;

// Allows the "Task" return type
using System.Threading.Tasks;

public class Program
{
    // Declare an async function with "async"
    private static async Task<string> ReturnHello()
    {
        return "hello world";
    }

    // Main can be async -- no problem
    public static async Task Main()
    {
        // await an async string
        string result = await ReturnHello();

        // Print the string we got asynchronously
        Console.WriteLine(result);
    }
}
We ensure that we're using System.Threading.Tasks, as it includes the Task type, and, in general, the Task type is needed for an async function to be awaited. The cool thing about C# is that you can make your main function asynchronous just by declaring it with async, and you won't have any issues.
If you're interested in learning more about async/await in C#, Microsoft's C# docs have a good page on it.
First introduced in ES2017, the async/await syntax is essentially an abstraction over JavaScript promises (introduced in ES6), which are similar to Python futures. Unlike Python, however, so long as you're not awaiting, you can call an async function normally without a dedicated entry point like Python's asyncio.run():
// Declare a function with async
async function returnHello(){
    return "hello world";
}

async function printSomething(){
    // await an async string
    const result = await returnHello();

    // print the string we got asynchronously
    console.log(result);
}

// Run our async code
printSomething();
See MDN for more information about async/await in JavaScript.
Rust now also allows the use of the async/await syntax, and it works similarly to Python, C#, and JavaScript:
// Allows blocking synchronous code to run async code
use futures::executor::block_on;

// Declare an async function with "async"
async fn return_hello() -> String {
    "hello world".to_string()
}

// Code that awaits must also be declared with "async"
async fn print_something(){
    // await an async String
    let result: String = return_hello().await;

    // Print the string we got asynchronously
    println!("{0}", result);
}

fn main() {
    // Block the current synchronous execution to run our async code
    block_on(print_something());
}
To use async functions, we must first add futures = "0.3" to our Cargo.toml. We then import the block_on function with use futures::executor::block_on -- block_on is necessary to run our async function from our synchronous main function.
You can find more information about async/await in Rust in the Rust docs.
Instead of the traditional async/await syntax inherent to all the previous languages we've covered, Go uses "goroutines" and "channels." You can think of a channel as being similar to a Python future. In Go, you generally send a channel as an argument to a function, then use go to run the function concurrently. Whenever you need to make sure the function has finished, you use the <- syntax, which you can think of as the more common await syntax. If your goroutine (the function you're running asynchronously) has a return value, it can be grabbed this way.
package main

import "fmt"

// "chan" makes the return value a string channel instead of a string
func returnHello(result chan string){
    // Gives our channel a value
    result <- "hello world"
}

func main() {
    // Creates a string channel
    result := make(chan string)

    // Starts execution of our goroutine
    go returnHello(result)

    // Awaits and prints our string
    fmt.Println(<- result)
}
For more information on concurrency in Go, check out An Introduction to Programming in Go by Caleb Doxsey.
Similar to Python, Ruby also has the Global Interpreter Lock limitation. What it doesn't have is concurrency built into the language itself. However, there is a community-created gem that enables concurrency in Ruby, and you can find its source on GitHub.
Like Ruby, Java doesn't have built-in async/await syntax, but it does have concurrency capabilities using the java.util.concurrent module. However, Electronic Arts wrote an Async library that allows the use of await as a method. It's not exactly the same as Python/C#/JavaScript/Rust, but it's worth looking into if you're a Java developer interested in this kind of functionality.
Although C++ also doesn't have async/await syntax, it does have the ability to use futures to run code concurrently via the <future> header:
#include <iostream>
#include <string>

// Necessary for futures
#include <future>

// No async declaration needed
std::string return_hello() {
    return "hello world";
}

int main ()
{
    // Declares a string future
    std::future<std::string> fut = std::async(return_hello);

    // Awaits the result of the future
    std::string result = fut.get();

    // Prints the string we got asynchronously
    std::cout << result << '\n';
}
There's no need to declare a function with any keyword to indicate whether or not it can or should be run asynchronously. Instead, you declare your initial future wherever you need it with std::future<{{ function return type }}> and set it equal to std::async(), passing in the name of the function you want to perform asynchronously along with any arguments it takes -- i.e., std::async(do_something, 1, 2, "string"). To await the value of the future, call .get() on it.
You can find documentation for async in C++ on cplusplus.com.
Whether you're working with asynchronous network or file operations, or you're performing numerous complex calculations, there are a few different ways to maximize your code's efficiency.
If you're using Python, you can use asyncio or threading to make the most of I/O operations, or the multiprocessing module for CPU-intensive code.
Also remember that the concurrent.futures module can be used in place of either threading or multiprocessing.
If you're using another programming language, chances are there's an implementation of async/await for it too.
Source: https://testdriven.io
1660212900
Что такое параллелизм и параллелизм и как они применимы к Python?
Есть много причин, по которым ваши приложения могут работать медленно. Иногда это происходит из-за плохого алгоритмического дизайна или неправильного выбора структуры данных. Однако иногда это происходит из-за не зависящих от нас сил, таких как аппаратные ограничения или особенности сети. Вот тут-то и подходят параллелизм и параллелизм. Они позволяют вашим программам делать несколько вещей одновременно, либо одновременно, либо тратя как можно меньше времени на ожидание загруженных задач.
Независимо от того, имеете ли вы дело с внешними веб-ресурсами, чтением и записью в несколько файлов или вам нужно несколько раз использовать функцию с интенсивными вычислениями с различными параметрами, эта статья должна помочь вам максимизировать эффективность и скорость вашего кода.
Во-первых, мы углубимся в то, что такое параллелизм и параллелизм и как они вписываются в область Python, используя стандартные библиотеки, такие как многопоточность, многопроцессорность и асинхронность. async
В последней части этой статьи реализация / в Python будет сравниваться await
с тем, как они реализованы в других языках.
Вы можете найти все примеры кода из этой статьи в репозитории concurrency-parallelism-and-asyncio на GitHub.
Чтобы работать с примерами в этой статье, вы уже должны знать, как работать с HTTP-запросами.
К концу этой статьи вы должны быть в состоянии ответить на следующие вопросы:
Что такое параллелизм?
Эффективным определением параллелизма является «способность выполнять несколько задач одновременно». Однако это немного вводит в заблуждение, поскольку задачи могут выполняться или не выполняться в одно и то же время. Вместо этого процесс может начаться, а затем, когда он ожидает завершения определенной инструкции, переключиться на новую задачу, чтобы вернуться только после того, как он больше не ждет. Как только одна задача завершена, она снова переключается на незавершенную задачу, пока все они не будут выполнены. Задачи начинаются асинхронно, выполняются асинхронно и затем асинхронно завершаются.
Если это сбивает вас с толку, давайте вместо этого придумаем аналогию: скажем, вы хотите создать BLT . Во-первых, вам нужно бросить бекон в сковороду на среднем огне. Пока бекон готовится, вы можете достать помидоры и листья салата и начать их готовить (мыть и нарезать). Все это время вы продолжаете проверять и время от времени переворачиваете свой бекон.
На этом этапе вы начали одну задачу, а затем тем временем начали и завершили еще две, все еще ожидая выполнения первой.
В конце концов, вы кладете свой хлеб в тостер. Пока он поджаривается, вы продолжаете проверять свой бекон. Когда кусочки готовы, вы вытаскиваете их и кладете на тарелку. Как только ваш хлеб поджарится, вы намазываете его выбранной пастой для сэндвичей, а затем можете начать выкладывать слоями помидоры, листья салата, а затем, когда все готово, бекон. Только после того, как все приготовлено, подготовлено и выложено слоями, вы можете положить последний кусок тоста на бутерброд, нарезать его (по желанию) и съесть.
Поскольку это требует от вас одновременного выполнения нескольких задач, создание BLT по своей сути является параллельным процессом, даже если вы не уделяете все свое внимание каждой из этих задач одновременно. Во всех смыслах и целях в следующем разделе мы будем называть эту форму параллелизма просто параллелизмом. Мы будем различать его позже в этой статье.
По этой причине параллелизм отлично подходит для процессов с интенсивным вводом-выводом — задач, включающих ожидание веб-запросов или операций чтения/записи файлов.
В Python существует несколько различных способов достижения параллелизма. Сначала мы рассмотрим библиотеку потоков.
Для наших примеров в этом разделе мы собираемся создать небольшую программу на Python, которая пять раз выбирает случайный музыкальный жанр из API Genrenator Binary Jazz , выводит жанр на экран и помещает каждый в отдельный файл.
Для работы с многопоточностью в Python вам потребуется единственный импорт threading
, но для этого примера я также импортировал urllib
для работы с HTTP-запросами, time
чтобы определить, сколько времени требуется для выполнения функций, и json
чтобы легко преобразовать возвращаемые данные json. через Genrenator API.
Вы можете найти код для этого примера здесь .
Начнем с простой функции:
def write_genre(file_name):
"""
Uses genrenator from binaryjazz.us to write a random genre to the
name of the given file
"""
req = Request("https://binaryjazz.us/wp-json/genrenator/v1/genre/", headers={"User-Agent": "Mozilla/5.0"})
genre = json.load(urlopen(req))
with open(file_name, "w") as new_file:
print(f"Writing '{genre}' to '{file_name}'...")
new_file.write(genre)
Изучая приведенный выше код, мы делаем запрос к Genrenator API, загружаем его ответ JSON (случайный музыкальный жанр), распечатываем его, а затем записываем в файл.
Без заголовка «User-Agent» вы получите 304.
Что нас действительно интересует, так это следующий раздел, где происходит фактическая многопоточность:
threads = []
for i in range(5):
thread = threading.Thread(
target=write_genre,
args=[f"./threading/new_file{i}.txt"]
)
thread.start()
threads.append(thread)
for thread in threads:
thread.join()
Сначала мы начинаем со списка. Затем мы повторяем пять раз, каждый раз создавая новый поток. Затем мы запускаем каждый поток, добавляем его в наш список «потоков», а затем проходим по нашему списку в последний раз, чтобы присоединиться к каждому потоку.
Объяснение: Создавать потоки в Python очень просто.
Чтобы создать новый поток, используйте threading.Thread()
. Вы можете передать в него kwarg (аргумент ключевого слова) target
со значением любой функции, которую вы хотите запустить в этом потоке. Но передавать только имя функции, а не ее значение (имеющееся в виду для наших целей, write_genre
а не write_genre()
). Чтобы передать аргументы, передайте «kwargs» (который принимает dict ваших kwargs) или «args» (который принимает итерацию, содержащую ваши аргументы — в данном случае список).
Однако создание потока — это не то же самое, что запуск потока. Чтобы начать тему, используйте {the name of your thread}.start()
. Запуск потока означает «начало его выполнения».
Наконец, когда мы объединяем потоки с помощью thread.join()
, все, что мы делаем, — это обеспечиваем завершение потока, прежде чем продолжить работу с нашим кодом.
Но что такое нить?
Поток — это способ, позволяющий вашему компьютеру разбить один процесс/программу на множество легковесных частей, которые выполняются параллельно. Несколько сбивает с толку то, что стандартная реализация многопоточности в Python ограничивает возможность выполнения потоков только по одному из-за так называемой глобальной блокировки интерпретатора (GIL). GIL необходим, потому что управление памятью CPython (реализация Python по умолчанию) не является потокобезопасным. Из-за этого ограничения многопоточность в Python является одновременной, но не параллельной. Чтобы обойти это, в Python есть отдельный multiprocessing
модуль, не ограниченный GIL, который запускает отдельные процессы, обеспечивая параллельное выполнение вашего кода. Использование multiprocessing
модуля почти идентично использованию threading
модуля.
Дополнительную информацию о GIL Python и безопасности потоков можно найти в официальной документации Real Python и Python .
Вскоре мы более подробно рассмотрим многопроцессорность в Python.
Прежде чем мы покажем потенциальное улучшение скорости по сравнению с беспотоковым кодом, я позволил себе также создать непоточную версию той же программы (опять же, доступную на GitHub ). Вместо того, чтобы создавать новый поток и присоединяться к каждому из них, он вместо этого вызывает write_genre
цикл for, который повторяется пять раз.
Чтобы сравнить тесты скорости, я также импортировал time
библиотеку для измерения времени выполнения наших скриптов:
Starting...
Writing "binary indoremix" to "./sync/new_file0.txt"...
Writing "slavic aggro polka fusion" to "./sync/new_file1.txt"...
Writing "israeli new wave" to "./sync/new_file2.txt"...
Writing "byzantine motown" to "./sync/new_file3.txt"...
Writing "dutch hate industrialtune" to "./sync/new_file4.txt"...
Time to complete synchronous read/writes: 1.42 seconds
После запуска сценария мы видим, что мой компьютер занимает около 1,49 секунды (наряду с классическими музыкальными жанрами, такими как «голландская ненависть индастриалтюн»). Не так уж плохо.
Теперь давайте запустим версию, использующую многопоточность:
Starting...
Writing "college k-dubstep" to "./threading/new_file2.txt"...
Writing "swiss dirt" to "./threading/new_file0.txt"...
Writing "bop idol alternative" to "./threading/new_file4.txt"...
Writing "ethertrio" to "./threading/new_file1.txt"...
Writing "beach aust shanty français" to "./threading/new_file3.txt"...
Time to complete threading read/writes: 0.77 seconds
Первое, что может вас заинтересовать, это то, что функции выполняются не по порядку: 2 - 0 - 4 - 1 - 3.
Это связано с асинхронным характером многопоточности: пока одна функция ожидает, начинается другая и так далее. Поскольку мы можем продолжать выполнять задачи, пока ждем завершения других (из-за сетевых или файловых операций ввода-вывода), вы также могли заметить, что мы сократили наше время примерно вдвое: 0,77 секунды. Хотя сейчас это может показаться не таким уж большим, легко представить себе вполне реальный случай создания веб-приложения, которое должно записывать гораздо больше данных в файл или взаимодействовать с гораздо более сложными веб-сервисами.
Итак, если многопоточность — это так здорово, почему бы нам не закончить статью на этом?
Потому что есть еще лучшие способы одновременного выполнения задач.
Давайте рассмотрим пример использования asyncio. Для этого метода мы собираемся установить aiohttp с помощью pip
. Это позволит нам делать неблокирующие запросы и получать ответы, используя синтаксис async
/ await
, который вскоре будет представлен. Он также имеет дополнительное преимущество функции, которая преобразует ответ JSON без необходимости импортировать json
библиотеку. Мы также установим и импортируем файлы aiofiles , которые позволяют выполнять неблокирующие операции с файлами. Кроме aiohttp
and aiofiles
, import asyncio
, который входит в стандартную библиотеку Python.
«Неблокирующий» означает, что программа позволит другим потокам продолжать работу, пока она ожидает. Это противоположно «блокирующему» коду, который полностью останавливает выполнение вашей программы. Обычные синхронные операции ввода-вывода страдают от этого ограничения.
Вы можете найти код для этого примера здесь .
Когда у нас есть импорт, давайте взглянем на асинхронную версию write_genre
функции из нашего примера asyncio:
async def write_genre(file_name):
"""
Uses genrenator from binaryjazz.us to write a random genre to the
name of the given file
"""
async with aiohttp.ClientSession() as session:
async with session.get("https://binaryjazz.us/wp-json/genrenator/v1/genre/") as response:
genre = await response.json()
async with aiofiles.open(file_name, "w") as new_file:
print(f'Writing "{genre}" to "{file_name}"...')
await new_file.write(genre)
For those not familiar with the async
/await
syntax that can be found in many other modern languages, async
declares that a function, for
loop, or with
statement must be used asynchronously. To call an async function, you must either use the await
keyword from another async function or call create_task()
directly from the event loop, which can be grabbed from asyncio.get_event_loop()
-- i.e., loop = asyncio.get_event_loop()
.
Additionally:
async with
allows awaiting async responses and file operations.async for
(not used here) iterates over an asynchronous stream.Циклы событий — это конструкции, присущие асинхронному программированию, которые позволяют выполнять задачи асинхронно. Поскольку вы читаете эту статью, я могу с уверенностью предположить, что вы, вероятно, не слишком знакомы с этой концепцией. Однако, даже если вы никогда не писали асинхронное приложение, у вас есть опыт работы с циклами событий каждый раз, когда вы используете компьютер. Независимо от того, прослушивает ли ваш компьютер ввод с клавиатуры, играете ли вы в многопользовательские онлайн-игры или просматриваете Reddit во время копирования файлов в фоновом режиме, цикл событий является движущей силой, обеспечивающей бесперебойную и эффективную работу. В чистом виде цикл событий — это процесс, который ожидает триггеров, а затем выполняет определенные (запрограммированные) действия, как только эти триггеры встречаются. Они часто возвращают «обещание» (синтаксис JavaScript) или «будущее». (синтаксис Python) для обозначения того, что задача была добавлена. После завершения задачи обещание или будущее возвращает значение, переданное из вызываемой функции (при условии, что функция действительно возвращает значение).
Идея выполнения функции в ответ на другую функцию называется «обратным вызовом».
Чтобы еще раз взглянуть на обратные вызовы и события, вот отличный ответ на Stack Overflow .
Вот пошаговое руководство по нашей функции:
Мы используем async with
для асинхронного открытия нашего клиентского сеанса. Класс aiohttp.ClientSession()
— это то, что позволяет нам делать HTTP-запросы и оставаться на связи с источником, не блокируя выполнение нашего кода. Затем мы делаем асинхронный запрос к Genrenator API и ждем ответа JSON (случайный музыкальный жанр). В следующей строке мы async with
снова используем aiofiles
библиотеку, чтобы асинхронно открыть новый файл, чтобы записать в него наш новый жанр. Печатаем жанр, потом пишем в файл.
В отличие от обычных скриптов Python, программирование с помощью asyncio в значительной степени требует * использования какой-то «основной» функции.
*Если вы не используете устаревший синтаксис «yield» с декоратором @asyncio.coroutine, который будет удален в Python 3.10 .
Это связано с тем, что вам нужно использовать ключевое слово «async», чтобы использовать синтаксис «ожидания», а синтаксис «ожидания» — единственный способ фактически запустить другие асинхронные функции.
Вот наша основная функция:
async def main():
tasks = []
for i in range(5):
tasks.append(write_genre(f"./async/new_file{i}.txt"))
await asyncio.gather(*tasks)
Как видите, мы объявили это с помощью «async». Затем мы создаем пустой список под названием «задачи» для размещения наших асинхронных задач (вызовы Genrenator и наш файловый ввод-вывод). Мы добавляем наши задачи в наш список, но на самом деле они еще не запущены. Звонки на самом деле не совершаются, пока мы не запланируем их с помощью await asyncio.gather(*tasks)
. Это запускает все задачи в нашем списке и ждет их завершения, прежде чем продолжить остальную часть нашей программы. Наконец, мы используем asyncio.run(main())
для запуска нашей «основной» функции. Функция .run()
является точкой входа для нашей программы, и обычно ее следует вызывать только один раз для каждого процесса .
Для тех, кто не знаком,
*
перед задачами называется «распаковка аргументов». Как это ни звучит, он распаковывает наш список в ряд аргументов для нашей функции. Наша функцияasyncio.gather()
в этом случае.
И это все, что нам нужно сделать. Теперь запустим нашу программу (источник которой включает в себя те же функции синхронизации, что и примеры синхронного и многопоточного выполнения)...
Writing "albuquerque fiddlehaus" to "./async/new_file1.txt"...
Writing "euroreggaebop" to "./async/new_file2.txt"...
Writing "shoedisco" to "./async/new_file0.txt"...
Writing "russiagaze" to "./async/new_file4.txt"...
Writing "alternative xylophone" to "./async/new_file3.txt"...
Time to complete asyncio read/writes: 0.71 seconds
...мы видим, что это еще быстрее. И вообще метод asyncio всегда будет немного быстрее, чем метод threading. Это связано с тем, что когда мы используем синтаксис «ожидания», мы, по сути, говорим нашей программе «подожди, я сейчас вернусь», но наша программа отслеживает, сколько времени нам потребуется, чтобы закончить то, что мы делаем. Как только мы закончим, наша программа узнает об этом и возобновит работу, как только сможет. Потоки в Python допускают асинхронность, но наша программа теоретически может пропускать разные потоки, которые могут быть еще не готовы, что приводит к потере времени, если есть потоки, готовые к продолжению выполнения.
Итак, когда я должен использовать многопоточность и когда я должен использовать asyncio?
Когда вы пишете новый код, используйте asyncio. Если вам нужно взаимодействовать со старыми библиотеками или теми, которые не поддерживают asyncio, вам может быть лучше использовать многопоточность.
Оказывается, тестировать асинхронные функции с помощью pytest так же просто, как тестировать синхронные функции. Просто установите пакет pytest-asyncio с помощью pip
, отметьте свои тесты async
ключевым словом и примените декоратор, который дает pytest
понять, что он асинхронный: @pytest.mark.asyncio
. Давайте посмотрим на пример.
Во-первых, давайте напишем произвольную асинхронную функцию в файле с именем hello_asyncio.py :
import asyncio
async def say_hello(name: str):
""" Sleeps for two seconds, then prints 'Hello, {{ name }}!' """
try:
if type(name) != str:
raise TypeError("'name' must be a string")
if name == "":
raise ValueError("'name' cannot be empty")
except (TypeError, ValueError):
raise
print("Sleeping...")
await asyncio.sleep(2)
print(f"Hello, {name}!")
Функция принимает один строковый аргумент: name
. Убедившись, что name
это строка длиной больше единицы, наша функция асинхронно приостанавливается на две секунды, а затем выводит "Hello, {name}!"
на консоль.
Разница между
asyncio.sleep()
иtime.sleep()
в том, чтоasyncio.sleep()
он не блокирует.
Теперь давайте проверим это с помощью pytest. В том же каталоге, что и hello_asyncio.py, создайте файл с именем test_hello_asyncio.py, а затем откройте его в своем любимом текстовом редакторе.
Начнем с нашего импорта:
import pytest # Note: pytest-asyncio does not require a separate import
from hello_asyncio import say_hello
Затем мы создадим тест с правильным вводом:
@pytest.mark.parametrize("name", [
"Robert Paulson",
"Seven of Nine",
"x Æ a-12"
])
@pytest.mark.asyncio
async def test_say_hello(name):
await say_hello(name)
Что следует отметить:
@pytest.mark.asyncio
работать асинхронноasync
синтаксисawait
запускаем нашу асинхронную функцию так, как если бы мы запускали ее вне теста.Теперь давайте запустим наш тест с подробной -v
опцией:
pytest -v
...
collected 3 items
test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED [ 33%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED [ 66%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED [100%]
Выглядит неплохо. Далее мы напишем пару тестов с плохим входом. Вернувшись внутрь test_hello_asyncio.py , давайте создадим класс с именем TestSayHelloThrowsExceptions
:
class TestSayHelloThrowsExceptions:
@pytest.mark.parametrize("name", [
"",
])
@pytest.mark.asyncio
async def test_say_hello_value_error(self, name):
with pytest.raises(ValueError):
await say_hello(name)
@pytest.mark.parametrize("name", [
19,
{"name", "Diane"},
[]
])
@pytest.mark.asyncio
async def test_say_hello_type_error(self, name):
with pytest.raises(TypeError):
await say_hello(name)
Опять же, мы украшаем наши тесты с помощью @pytest.mark.asyncio
, помечаем наши тесты async
синтаксисом, а затем вызываем нашу функцию с помощью await
.
Запустите тесты еще раз:
pytest -v
...
collected 7 items
test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED [ 14%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED [ 28%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED [ 42%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_value_error[] PASSED [ 57%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[19] PASSED [ 71%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name1] PASSED [ 85%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name2] PASSED [100%]
В качестве альтернативы pytest-asyncio вы можете создать фикстуру pytest, которая создает цикл событий asyncio:
import asyncio
import pytest
from hello_asyncio import say_hello
@pytest.fixture
def event_loop():
loop = asyncio.get_event_loop()
yield loop
Затем вместо использования синтаксиса async
/ await
вы создаете свои тесты, как обычные синхронные тесты:
@pytest.mark.parametrize("name", [
"Robert Paulson",
"Seven of Nine",
"x Æ a-12"
])
def test_say_hello(event_loop, name):
event_loop.run_until_complete(say_hello(name))
class TestSayHelloThrowsExceptions:
@pytest.mark.parametrize("name", [
"",
])
def test_say_hello_value_error(self, event_loop, name):
with pytest.raises(ValueError):
event_loop.run_until_complete(say_hello(name))
@pytest.mark.parametrize("name", [
19,
{"name", "Diane"},
[]
])
def test_say_hello_type_error(self, event_loop, name):
with pytest.raises(TypeError):
event_loop.run_until_complete(say_hello(name))
If you're interested, here's a more advanced tutorial on asyncio testing.
If you want to learn more about what distinguishes Python's implementation of threading vs asyncio, here's a great article from Medium.
For even better examples and explanations of threading in Python, here's a video by Corey Schafer that goes more in-depth, including using the concurrent.futures
library.
Lastly, for a massive deep-dive into asyncio itself, here's an article from Real Python completely dedicated to it.
Bonus: One more library you might be interested in is called Unsync, especially if you want to easily convert your current synchronous code into asynchronous code. To use it, you install the library with pip, import it with from unsync import unsync
, then decorate whatever currently synchronous function with @unsync
to make it asynchronous. To await it and get its return value (which you can do anywhere -- it doesn't have to be in an async/unsync function), just call .result()
after the function call.
What is parallelism?
Параллелизм очень сильно связан с параллелизмом. На самом деле, параллелизм — это подмножество параллелизма: в то время как параллельный процесс выполняет несколько задач одновременно, независимо от того, отвлекается ли на них все внимание или нет, параллельный процесс физически выполняет несколько задач одновременно. Хорошим примером может быть вождение, прослушивание музыки и одновременная поедание BLT, которое мы сделали в последнем разделе.
Поскольку они не требуют больших интенсивных усилий, вы можете выполнять их все сразу, не ожидая ничего и не отвлекая внимания.
Теперь давайте посмотрим, как это реализовать на Python. Мы могли бы использовать multiprocessing
библиотеку, но давайте concurrent.futures
вместо этого воспользуемся библиотекой — она устраняет необходимость управлять количеством процессов вручную. Поскольку основное преимущество многопроцессорной обработки возникает, когда вы выполняете несколько ресурсоемких задач, мы собираемся вычислить квадраты от 1 миллиона (1000000) до 1 миллиона и 16 (1000016).
Вы можете найти код для этого примера здесь .
Единственный импорт, который нам понадобится, это concurrent.futures
:
import concurrent.futures
import time
if __name__ == "__main__":
pow_list = [i for i in range(1000000, 1000016)]
print("Starting...")
start = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
futures = [executor.submit(pow, i, i) for i in pow_list]
for f in concurrent.futures.as_completed(futures):
print("okay")
end = time.time()
print(f"Time to complete: {round(end - start, 2)}")
Поскольку я разрабатываю на компьютере с Windows, я использую
if __name__ == "main"
. Это необходимо, поскольку в Windows нетfork
системного вызова, присущего системам Unix . Поскольку Windows не имеет такой возможности, она прибегает к запуску нового интерпретатора для каждого процесса, пытающегося импортировать основной модуль. Если основной модуль не существует, он перезапускает всю вашу программу, вызывая рекурсивный хаос.
Итак, взглянув на нашу основную функцию, мы используем понимание списка для создания списка от 1 миллиона до 1 миллиона и 16, мы открываем ProcessPoolExecutor с concurrent.futures и используем понимание списка и ProcessPoolExecutor().submit()
начинаем выполнять наши процессы и бросать их в список под названием «фьючерсы».
Мы также могли бы использовать
ThreadPoolExecutor()
, если бы вместо этого хотели использовать потоки — concurrent.futures универсален.
И здесь проявляется асинхронность: список «результаты» на самом деле не содержит результатов выполнения наших функций. Вместо этого он содержит «фьючерсы», которые аналогичны идее «обещаний» в JavaScript. Чтобы наша программа продолжала работать, мы возвращаем эти фьючерсы, которые представляют собой заполнитель для значения. Если мы попытаемся напечатать будущее, в зависимости от того, завершено оно или нет, мы либо вернемся в состояние «ожидание», либо «завершено». После завершения мы можем получить возвращаемое значение (при условии, что оно есть), используя var.result()
. В этом случае наша переменная будет «результатом».
Затем мы повторяем наш список фьючерсов, но вместо того, чтобы печатать наши значения, мы просто печатаем «хорошо». Это просто из-за того, насколько массивными получаются результирующие вычисления.
Как и прежде, я создал скрипт сравнения, который делает это синхронно. И, как и прежде, вы можете найти его на GitHub .
Запустив нашу управляющую программу, которая также включает в себя функции синхронизации нашей программы, мы получаем:
Starting...
okay
...
okay
Time to complete: 54.64
Ух ты. 54,64 секунды — это довольно много. Посмотрим, будет ли лучше наша версия с многопроцессорностью:
Starting...
okay
...
okay
Time to complete: 6.24
Наше время значительно сократилось. Мы находимся примерно в 1/9 от нашего первоначального времени.
Так что же произойдет, если вместо этого мы будем использовать потоки?
Я уверен, вы можете догадаться - это будет не намного быстрее, чем синхронное выполнение. На самом деле, это может быть медленнее, потому что для создания новых потоков по-прежнему требуется немного времени и усилий. Но не верьте мне на слово, вот что мы получим, если заменим ProcessPoolExecutor()
на ThreadPoolExecutor()
:
Starting...
okay
...
okay
Time to complete: 53.83
Как я упоминал ранее, многопоточность позволяет вашим приложениям сосредоточиться на новых задачах, пока другие ждут. В этом случае мы никогда не сидим сложа руки. С другой стороны, многопроцессорность запускает совершенно новые сервисы, обычно на отдельных ядрах ЦП, готовые делать все, что вы попросите, полностью в тандеме с тем, что делает ваш скрипт. Вот почему многопроцессорная версия, занимающая примерно 1/9 времени, имеет смысл — у меня 8 ядер в моем процессоре.
Теперь, когда мы поговорили о параллелизме и параллелизме в Python, мы можем, наконец, прояснить термины. Если у вас возникли проблемы с различием между терминами, вы можете безопасно и точно думать о наших предыдущих определениях «параллелизм» и «параллелизм» как «параллельный параллелизм» и «непараллельный параллелизм» соответственно.
В Real Python есть отличная статья о параллелизме и параллелизме .
У Engineer Man есть хорошее видео, сравнивающее многопоточность и многопроцессорность .
У Кори Шафера также есть хорошее видео о многопроцессорности в том же духе, что и его видео о многопоточности.
Если вы смотрите только одно видео, посмотрите это превосходное выступление Рэймонда Хеттингера . Он проделывает потрясающую работу, объясняя различия между многопроцессорностью, многопоточностью и асинхронностью.
Что делать, если мне нужно объединить множество операций ввода-вывода с тяжелыми вычислениями?
Мы тоже можем это сделать. Скажем, вам нужно очистить 100 веб-страниц для определенной части информации, а затем вам нужно сохранить эту часть информации в файле на потом. Мы можем разделить вычислительную мощность между каждым из ядер нашего компьютера, заставив каждый процесс очищать часть страниц.
Для этого скрипта давайте установим Beautiful Soup , который поможет нам легко очищать наши страницы: pip install beautifulsoup4
. На этот раз у нас на самом деле довольно много импорта. Вот они, и вот почему мы их используем:
import asyncio # Gives us async/await
import concurrent.futures # Allows creating new processes
import time
from math import floor # Helps divide up our requests evenly across our CPU cores
from multiprocessing import cpu_count # Returns our number of CPU cores
import aiofiles # For asynchronously performing file I/O operations
import aiohttp # For asynchronously making HTTP requests
from bs4 import BeautifulSoup # For easy webpage scraping
Вы можете найти код для этого примера здесь .
Во-первых, мы собираемся создать асинхронную функцию, которая отправляет запрос в Википедию на получение случайных страниц. Мы будем очищать заголовок каждой страницы, который получим, с помощью BeautifulSoup
, а затем добавим его в заданный файл; мы будем отделять каждый заголовок табуляцией. Функция будет принимать два аргумента:
async def get_and_scrape_pages(num_pages: int, output_file: str):
"""
Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
articles, then scrapes each page for its title and appends it to {{ output_file }},
separating each title with a tab: "\\t"
#### Arguments
---
num_pages: int -
Number of random Wikipedia pages to request and scrape
output_file: str -
File to append titles to
"""
async with \
aiohttp.ClientSession() as client, \
aiofiles.open(output_file, "a+", encoding="utf-8") as f:
for _ in range(num_pages):
async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
if response.status > 399:
# I was getting a 429 Too Many Requests at a higher volume of requests
response.raise_for_status()
page = await response.text()
soup = BeautifulSoup(page, features="html.parser")
title = soup.find("h1").text
await f.write(title + "\t")
await f.write("\n")
Мы оба асинхронно открываем aiohttp ClientSession
и наш выходной файл. Режим a+
означает добавление к файлу и создание его, если он еще не существует. Кодирование наших строк как utf-8 гарантирует, что мы не получим ошибку, если наши заголовки содержат международные символы. Если мы получим ответ об ошибке, мы поднимем его вместо продолжения (при больших объемах запросов я получал 429 Too Many Requests). Мы асинхронно получаем текст из нашего ответа, затем разбираем заголовок и асинхронно добавляем его в наш файл. После того, как мы добавим все наши заголовки, мы добавим новую строку: "\n".
Наша следующая функция — это функция, которую мы будем запускать с каждым новым процессом, чтобы разрешить его асинхронный запуск:
def start_scraping(num_pages: int, output_file: str, i: int):
""" Starts an async process for requesting and scraping Wikipedia pages """
print(f"Process {i} starting...")
asyncio.run(get_and_scrape_pages(num_pages, output_file))
print(f"Process {i} finished.")
Теперь о нашей основной функции. Начнем с некоторых констант (и объявления нашей функции):
def main():
NUM_PAGES = 100 # Number of pages to scrape altogether
NUM_CORES = cpu_count() # Our number of CPU cores (including logical cores)
OUTPUT_FILE = "./wiki_titles.tsv" # File to append our scraped titles to
PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % PAGES_PER_CORE # For our final core
А теперь логика:
futures = []
with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
for i in range(NUM_CORES - 1):
new_future = executor.submit(
start_scraping, # Function to perform
# v Arguments v
num_pages=PAGES_PER_CORE,
output_file=OUTPUT_FILE,
i=i
)
futures.append(new_future)
futures.append(
executor.submit(
start_scraping,
PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES-1
)
)
concurrent.futures.wait(futures)
Мы создаем массив для хранения наших фьючерсов, затем мы создаем ProcessPoolExecutor
, устанавливая его max_workers
равным нашему количеству ядер. Мы перебираем диапазон, равный нашему количеству ядер минус 1, запуская новый процесс с нашей start_scraping
функцией. Затем мы добавляем к нему наш список фьючерсов. У нашего последнего ядра потенциально будет дополнительная работа, поскольку оно будет очищать количество страниц, равное каждому из наших других ядер, но дополнительно будет очищать количество страниц, равное остатку, который мы получили при делении нашего общего количества страниц для очистки. по общему количеству ядер процессора.
Убедитесь, что ваша основная функция действительно запущена:
if __name__ == "__main__":
start = time.time()
main()
print(f"Time to complete: {round(time.time() - start, 2)} seconds.")
После запуска программы на моем 8-ядерном процессоре (вместе с кодом бенчмаркинга):
Эта версия ( asyncio с многопроцессорностью ):
Time to complete: 5.65 seconds.
Time to complete: 8.87 seconds.
Time to complete: 47.92 seconds.
Time to complete: 88.86 seconds.
На самом деле я очень удивлен, увидев, что улучшение asyncio с многопроцессорностью по сравнению с просто многопроцессорностью оказалось не таким значительным, как я думал.
async
/ await
и подобный синтаксис также существует в других языках, и в некоторых из этих языков его реализация может сильно отличаться.
Первым языком программирования (еще в 2007 году), использовавшим этот async
синтаксис, был Microsoft F#. В то время как он точно не использует await
ожидание вызова функции, он использует особый синтаксис, такой как let!
и do!
наряду с проприетарными Async
функциями, включенными в System
модуль.
Дополнительные сведения об асинхронном программировании на F# можно найти в документации Microsoft по F# .
Затем их команда C# построила эту концепцию, и именно здесь родились ключевые слова async
/ , с await
которыми мы теперь знакомы:
using System;
// Allows the "Task" return type
using System.Threading.Tasks;
public class Program
{
// Declare an async function with "async"
private static async Task<string> ReturnHello()
{
return "hello world";
}
// Main can be async -- no problem
public static async Task Main()
{
// await an async string
string result = await ReturnHello();
// Print the string we got asynchronously
Console.WriteLine(result);
}
}
Мы гарантируем, что мы, using System.Threading.Tasks
поскольку он включает Task
тип, и, как правило, Task
тип необходим для ожидания асинхронной функции. Самое классное в C# то, что вы можете сделать свою основную функцию асинхронной, просто объявив ее с помощью async
, и у вас не будет никаких проблем.
Если вы хотите узнать больше о
async
/await
в C#, в документации Microsoft по C# есть хорошая страница.
async
Синтаксис / , впервые представленный в ES6, по await
сути представляет собой абстракцию обещаний JavaScript (которые аналогичны фьючерсам Python). Однако, в отличие от Python, пока вы не ждете, вы можете вызывать асинхронную функцию в обычном режиме без специальной функции, такой как Python asyncio.start()
:
// Declare a function with async
async function returnHello(){
return "hello world";
}
async function printSomething(){
// await an async string
const result = await returnHello();
// print the string we got asynchronously
console.log(result);
}
// Run our async code
printSomething();
См. MDN для получения дополнительной информации о
async
/await
в JavaScript .
Теперь Rust также позволяет использовать синтаксис async
/ await
и работает аналогично Python, C# и JavaScript:
// Allows blocking synchronous code to run async code
use futures::executor::block_on;
// Declare an async function with "async"
async fn return_hello() -> String {
"hello world".to_string()
}
// Code that awaits must also be declared with "async"
async fn print_something(){
// await an async String
let result: String = return_hello().await;
// Print the string we got asynchronously
println!("{0}", result);
}
fn main() {
// Block the current synchronous execution to run our async code
block_on(print_something());
}
Чтобы использовать асинхронные функции, мы должны сначала добавить futures = "0.3"
в наш Cargo.toml . Затем мы импортируем block_on
функцию с use futures::executor::block_on
-- block_on
это необходимо для запуска нашей асинхронной функции из нашей синхронной main
функции.
Вы можете найти больше информации о
async
/await
в Rust в документации Rust.
Вместо традиционного синтаксиса async
/ await
, присущего всем предыдущим рассмотренным нами языкам, в Go используются «горутины» и «каналы». Вы можете думать о канале как о будущем Python. В Go вы обычно отправляете канал в качестве аргумента функции, а затем используете go
для одновременного запуска функции. Всякий раз, когда вам нужно убедиться, что функция завершила свое выполнение, вы используете <-
синтаксис, который вы можете считать более распространенным await
синтаксисом. Если ваша горутина (функция, которую вы запускаете асинхронно) имеет возвращаемое значение, ее можно получить таким образом.
package main
import "fmt"
// "chan" makes the return value a string channel instead of a string
func returnHello(result chan string){
// Gives our channel a value
result <- "hello world"
}
func main() {
// Creates a string channel
result := make(chan string)
// Starts execution of our goroutine
go returnHello(result)
// Awaits and prints our string
fmt.Println(<- result)
}
Запустите его на игровой площадке Go
Дополнительные сведения о параллелизме в Go см . в статье «Введение в программирование на Go » Калеба Докси.
Подобно Python, Ruby также имеет ограничение Global Interpreter Lock. Чего у него нет, так это параллелизма, встроенного в язык. Тем не менее, есть созданный сообществом гем, который позволяет параллелизм в Ruby, и вы можете найти его исходный код на GitHub .
Как и в Ruby, в Java нет встроенного синтаксиса async
/ , но есть возможности параллелизма с использованием модуля. Однако Electronic Arts написала асинхронную библиотеку , позволяющую использовать в качестве метода. Это не совсем то же самое, что Python/C#/JavaScript/Rust, но на него стоит обратить внимание, если вы являетесь Java-разработчиком и заинтересованы в такой функциональности.awaitjava.util.concurrentawait
Хотя C++ также не имеет синтаксиса async
/ await
, у него есть возможность использовать фьючерсы для одновременного запуска кода с использованием futures
модуля:
#include <iostream>
#include <string>
// Necessary for futures
#include <future>
// No async declaration needed
std::string return_hello() {
return "hello world";
}
int main ()
{
// Declares a string future
std::future<std::string> fut = std::async(return_hello);
// Awaits the result of the future
std::string result = fut.get();
// Prints the string we got asynchronously
std::cout << result << '\n';
}
Нет необходимости объявлять функцию с каким-либо ключевым словом, чтобы указать, может и должна ли она выполняться асинхронно. Вместо этого вы объявляете свое начальное будущее всякий раз, когда вам это нужно, std::future<{{ function return type }}>
и устанавливаете его равным std::async()
, включая имя функции, которую вы хотите выполнить асинхронно, вместе с любыми аргументами, которые она принимает, т . Е. std::async(do_something, 1, 2, "string")
. Чтобы дождаться значения будущего, используйте для него .get()
синтаксис.
Вы можете найти документацию по асинхронности в C++ на сайте cplusplus.com.
Независимо от того, работаете ли вы с асинхронными сетевыми или файловыми операциями или выполняете множество сложных вычислений, существует несколько различных способов максимизировать эффективность вашего кода.
Если вы используете Python, вы можете использовать asyncio
или threading
максимально использовать операции ввода-вывода или multiprocessing
модуль для кода, интенсивно использующего ЦП.
Также помните, что
concurrent.futures
модуль можно использовать вместо любогоthreading
илиmultiprocessing
.
Если вы используете другой язык программирования, скорее всего, для него тоже есть реализация async
/ .await
Источник: https://testdriven.io
1660205580
什麼是並發和並行性,它們如何應用於 Python?
您的應用程序運行緩慢的原因有很多。有時這是由於算法設計不佳或數據結構選擇錯誤造成的。然而,有時,這是由於我們無法控制的力量,例如硬件限製或網絡的怪癖。這就是並發性和並行性適合的地方。它們允許您的程序同時執行多項操作,或者同時或通過浪費最少的時間等待繁忙的任務。
無論您是處理外部 Web 資源、讀取和寫入多個文件,還是需要多次使用不同參數的計算密集型函數,本文都應幫助您最大限度地提高代碼的效率和速度。
首先,我們將深入研究什麼是並發和並行性,以及它們如何使用標準庫(如線程、多處理和異步)融入 Python 領域。本文的最後一部分將比較 Python 對async
/的實現await
與其他語言的實現方式。
您可以在 GitHub 上的concurrency-parallelism-and-asyncio 存儲庫中找到本文中的所有代碼示例。
要完成本文中的示例,您應該已經知道如何處理 HTTP 請求。
在本文結束時,您應該能夠回答以下問題:
什麼是並發?
An effective definition for concurrency is "being able to perform multiple tasks at once". This is a bit misleading though, as the tasks may or may not actually be performed at exactly the same time. Instead, a process might start, then once it's waiting on a specific instruction to finish, switch to a new task, only to come back once it's no longer waiting. Once one task is finished, it switches again to an unfinished task until they have all been performed. Tasks start asynchronously, get performed asynchronously, and then finish asynchronously.
If that was confusing to you, let's instead think of an analogy: Say you want to make a BLT. First, you'll want to throw the bacon in a pan on medium-low heat. While the bacon's cooking, you can get out your tomatoes and lettuce and start preparing (washing and cutting) them. All the while, you continue checking on and occasionally flipping over your bacon.
At this point, you've started a task, and then started and completed two more in the meantime, all while you're still waiting on the first.
Eventually you put your bread in a toaster. While it's toasting, you continue checking on your bacon. As pieces get finished, you pull them out and place them on a plate. Once your bread is done toasting, you apply to it your sandwich spread of choice, and then you can start layering on your tomatoes, lettuce, and then, once it's done cooking, your bacon. Only once everything is cooked, prepared, and layered can you place the last piece of toast onto your sandwich, slice it (optional), and eat it.
Because it requires you to perform multiple tasks at the same time, making a BLT is inherently a concurrent process, even if you are not giving your full attention to each of those tasks all at once. For all intents and purposes, for the next section, we'll refer to this form of concurrency as just "concurrency." We'll differentiate it later on in this article.
For this reason, concurrency is great for I/O-intensive processes -- tasks that involve waiting on web requests or file read/write operations.
In Python, there are a few different ways to achieve concurrency. The first we'll take a look at is the threading library.
For our examples in this section, we're going to build a small Python program that grabs a random music genre from Binary Jazz's Genrenator API five times, prints the genre to the screen, and puts each one into its own file.
To work with threading in Python, the only import you'll need is threading
, but for this example, I've also imported urllib
to work with HTTP requests, time
to determine how long the functions take to complete, and json
to easily convert the json data returned from the Genrenator API.
You can find the code for this example here.
Let's start with a simple function:
def write_genre(file_name):
"""
Uses genrenator from binaryjazz.us to write a random genre to the
name of the given file
"""
req = Request("https://binaryjazz.us/wp-json/genrenator/v1/genre/", headers={"User-Agent": "Mozilla/5.0"})
genre = json.load(urlopen(req))
with open(file_name, "w") as new_file:
print(f"Writing '{genre}' to '{file_name}'...")
new_file.write(genre)
Examining the code above, we're making a request to the Genrenator API, loading its JSON response (a random music genre), printing it, then writing it to a file.
Without the "User-Agent" header you will receive a 304.
What we're really interested in is the next section, where the actual threading happens:
threads = []
for i in range(5):
thread = threading.Thread(
target=write_genre,
args=[f"./threading/new_file{i}.txt"]
)
thread.start()
threads.append(thread)
for thread in threads:
thread.join()
We first start with a list. We then proceed to iterate five times, creating a new thread each time. Next, we start each thread, append it to our "threads" list, and then iterate over our list one last time to join each thread.
Explanation: Creating threads in Python is easy.
To create a new thread, use threading.Thread()
. You can pass into it the kwarg (keyword argument) target
with a value of whatever function you would like to run on that thread. But only pass in the name of the function, not its value (meaning, for our purposes, write_genre
and not write_genre()
). To pass arguments, pass in "kwargs" (which takes a dict of your kwargs) or "args" (which takes an iterable containing your args -- in this case, a list).
Creating a thread is not the same as starting a thread, however. To start your thread, use {the name of your thread}.start()
. Starting a thread means "starting its execution."
Lastly, when we join threads with thread.join()
, all we're doing is ensuring the thread has finished before continuing on with our code.
But what exactly is a thread?
A thread is a way of allowing your computer to break up a single process/program into many lightweight pieces that execute in parallel. Somewhat confusingly, Python's standard implementation of threading limits threads to only being able to execute one at a time due to something called the Global Interpreter Lock (GIL). The GIL is necessary because CPython's (Python's default implementation) memory management is not thread-safe. Because of this limitation, threading in Python is concurrent, but not parallel. To get around this, Python has a separate multiprocessing module not limited by the GIL that spins up separate processes, enabling parallel execution of your code. Using the multiprocessing module is nearly identical to using the threading module.
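To make that concrete, here's a minimal sketch of the same fan-out/join loop written against multiprocessing instead. Note that this is our own illustration, not code from the article's repo, so the placeholder function body and file names are made up:

import multiprocessing


def write_genre(file_name):
    # Hypothetical stand-in for the write_genre function shown earlier;
    # any top-level (picklable) function works as a process target
    with open(file_name, "w") as new_file:
        new_file.write("placeholder genre")


if __name__ == "__main__":  # keeps child processes from re-running this block on Windows
    processes = []
    for i in range(5):
        process = multiprocessing.Process(
            target=write_genre,
            args=[f"new_file{i}.txt"]
        )
        process.start()
        processes.append(process)

    for process in processes:
        process.join()

The only structural differences from the threading version are the class name and the __main__ guard.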
More info about Python's GIL and thread safety can be found on Real Python and Python's official docs.
We'll take a more in-depth look at multiprocessing in Python shortly.
Before we show the potential speed improvement over non-threaded code, I took the liberty of also creating a non-threaded version of the same program (again, available on GitHub). Instead of creating a new thread and joining each one, it instead calls write_genre in a for loop that iterates five times.
To compare speed benchmarks, I also imported the time library to time the execution of our scripts:
Starting...
Writing "binary indoremix" to "./sync/new_file0.txt"...
Writing "slavic aggro polka fusion" to "./sync/new_file1.txt"...
Writing "israeli new wave" to "./sync/new_file2.txt"...
Writing "byzantine motown" to "./sync/new_file3.txt"...
Writing "dutch hate industrialtune" to "./sync/new_file4.txt"...
Time to complete synchronous read/writes: 1.42 seconds
Upon running the script, we see that it takes my computer around 1.42 seconds (along with classic music genres such as "dutch hate industrialtune"). Not too bad.
Now let's run the version that uses threading:
Starting...
Writing "college k-dubstep" to "./threading/new_file2.txt"...
Writing "swiss dirt" to "./threading/new_file0.txt"...
Writing "bop idol alternative" to "./threading/new_file4.txt"...
Writing "ethertrio" to "./threading/new_file1.txt"...
Writing "beach aust shanty français" to "./threading/new_file3.txt"...
Time to complete threading read/writes: 0.77 seconds
The first thing that might stand out to you is the functions not being completed in order: 2 - 0 - 4 - 1 - 3
This is because of the asynchronous nature of threading: as one function waits, another one begins, and so on. Because we're able to continue performing tasks while we're waiting on others to finish (either due to networking or file I/O operations), you may also have noticed that we cut our time roughly in half: 0.77 seconds. Whereas this might not seem like a lot now, it's easy to imagine the very real case of building a web application that needs to write much more data to a file or interact with much more complex web services.
So, if threading is so great, why don't we end the article here?
Because there are even better ways to perform tasks concurrently.
Let's take a look at an example using asyncio. For this method, we're going to install aiohttp using pip. This will allow us to make non-blocking requests and receive responses using the async/await syntax that will be introduced shortly. It also has the extra benefit of a function that converts a JSON response without needing to import the json library. We'll also install and import aiofiles, which allows non-blocking file operations. Other than aiohttp and aiofiles, import asyncio, which comes with the Python standard library.
"Non-blocking" means a program will allow other threads to continue running while it's waiting. This is opposed to "blocking" code, which stops execution of your program completely. Normal, synchronous I/O operations suffer from this limitation.
You can find the code for this example here.
Once we have our imports in place, let's take a look at the asynchronous version of the write_genre function from our asyncio example:
async def write_genre(file_name):
    """
    Uses genrenator from binaryjazz.us to write a random genre to the
    name of the given file
    """
    async with aiohttp.ClientSession() as session:
        async with session.get("https://binaryjazz.us/wp-json/genrenator/v1/genre/") as response:
            genre = await response.json()

    async with aiofiles.open(file_name, "w") as new_file:
        print(f'Writing "{genre}" to "{file_name}"...')
        await new_file.write(genre)
For those not familiar with the async/await syntax that can be found in many other modern languages, async declares that a function, for loop, or with statement must be used asynchronously. To call an async function, you must either use the await keyword from another async function or call create_task() directly from the event loop, which can be grabbed from asyncio.get_event_loop() -- i.e., loop = asyncio.get_event_loop().
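As a quick illustration of those two call styles, here's a sketch of our own (fetch_genre is a made-up stand-in for a real network call):

import asyncio


async def fetch_genre() -> str:
    await asyncio.sleep(0.1)  # made-up stand-in for a real network call
    return "some genre"


async def main():
    # Option 1: await the coroutine directly
    genre = await fetch_genre()

    # Option 2: schedule it as a task on the running event loop,
    # then await the task when we actually need its value
    task = asyncio.create_task(fetch_genre())
    another_genre = await task

    print(genre, another_genre)


asyncio.run(main())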
Additionally:
- async with allows awaiting async responses and file operations.
- async for (not used here) iterates over an asynchronous stream.

Event loops are constructs inherent to asynchronous programming that allow performing tasks asynchronously. As you're reading this article, I can safely assume you're probably not too familiar with the concept. However, even if you've never written an async application, you have experience with event loops every time you use a computer. Whether your computer is listening for keyboard input, you're playing online multiplayer games, or you're browsing Reddit while you have files copying in the background, an event loop is the driving force that keeps everything working smoothly and efficiently. In its purest essence, an event loop is a process that waits around for triggers and then performs specific (programmed) actions once those triggers are met. They often return a "promise" (JavaScript syntax) or "future" (Python syntax) of some sort to denote that a task has been added. Once the task is finished, the promise or future returns a value passed back from the called function (assuming the function does return a value).
The idea of performing a function in response to another function is called a "callback."
For another take on callbacks and events, here's a great answer on Stack Overflow.
Here's a walkthrough of our function:
We're using async with to open our client session asynchronously. The aiohttp.ClientSession() class is what allows us to make HTTP requests and remain connected to a source without blocking the execution of our code. We then make an async request to the Genrenator API and await the JSON response (a random music genre). In the next line, we use async with again with the aiofiles library to asynchronously open a new file to write our new genre to. We print the genre, then write it to the file.
Unlike regular Python scripts, programming with asyncio pretty much enforces* using some sort of "main" function.
*Unless you're using the deprecated "yield" syntax with the @asyncio.coroutine decorator, which will be removed in Python 3.10.
This is because you need to use the "async" keyword in order to use the "await" syntax, and the "await" syntax is the only way to actually run other async functions.
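For reference, here's a small sketch of our own showing the deprecated generator-based style next to its modern equivalent (function names are made up):

import asyncio

# The old, deprecated style (shown commented out; see the note above):
# @asyncio.coroutine
# def old_style():
#     yield from asyncio.sleep(1)

# The modern async/await equivalent:
async def new_style():
    await asyncio.sleep(1)

asyncio.run(new_style())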
Here's our main function:
async def main():
    tasks = []

    for i in range(5):
        tasks.append(write_genre(f"./async/new_file{i}.txt"))

    await asyncio.gather(*tasks)
As you can see, we've declared it with "async." We then create an empty list called "tasks" to house our async tasks (calls to Genrenator and our file I/O). We append our tasks to our list, but they are not actually run yet. The calls don't actually get made until we schedule them with await asyncio.gather(*tasks). This runs all of the tasks in our list and waits for them to finish before continuing with the rest of our program. Lastly, we use asyncio.run(main()) to run our "main" function. The .run() function is the entry point for our program, and it should generally only be called once per process.
For those not familiar, the * in front of tasks is called "argument unpacking." Just as it sounds, it unpacks our list into a series of arguments for our function. Our function is asyncio.gather() in this case.
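For instance, here's a toy example of our own:

def add(a, b, c):
    return a + b + c


args = [1, 2, 3]
print(add(*args))  # unpacks to add(1, 2, 3) and prints 6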
And that's all we need to do. Now, running our program (the source of which includes the same timing functionality of the synchronous and threading examples)...
Writing "albuquerque fiddlehaus" to "./async/new_file1.txt"...
Writing "euroreggaebop" to "./async/new_file2.txt"...
Writing "shoedisco" to "./async/new_file0.txt"...
Writing "russiagaze" to "./async/new_file4.txt"...
Writing "alternative xylophone" to "./async/new_file3.txt"...
Time to complete asyncio read/writes: 0.71 seconds
...we see it's even faster still. And, in general, the asyncio method will always be a bit faster than the threading method. This is because when we use the "await" syntax, we essentially tell our program "hold on, I'll be right back," but our program keeps track of how long it takes us to finish what we're doing. Once we're done, our program will know, and will pick back up as soon as it's able. Threading in Python allows asynchronicity, but our program could theoretically skip around different threads that may not yet be ready, wasting time if there are threads ready to continue running.
So when should I use threading, and when should I use asyncio?
When you're writing new code, use asyncio. If you need to interface with older libraries or those that don't support asyncio, you might be better off with threading.
It turns out testing async functions with pytest is as easy as testing synchronous functions. Just install the pytest-asyncio package with pip, mark your tests with the async keyword, and apply a decorator that lets pytest know it's asynchronous: @pytest.mark.asyncio. Let's look at an example.
First, let's write an arbitrary async function in a file called hello_asyncio.py:
import asyncio


async def say_hello(name: str):
    """ Sleeps for two seconds, then prints 'Hello, {{ name }}!' """
    try:
        if type(name) != str:
            raise TypeError("'name' must be a string")
        if name == "":
            raise ValueError("'name' cannot be empty")
    except (TypeError, ValueError):
        raise

    print("Sleeping...")
    await asyncio.sleep(2)
    print(f"Hello, {name}!")
The function takes a single string argument: name. Upon ensuring that name is a non-empty string, our function asynchronously sleeps for two seconds, then prints "Hello, {name}!" to the console.
The difference between asyncio.sleep() and time.sleep() is that asyncio.sleep() is non-blocking.
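Here's a small sketch of our own that makes the difference visible: because asyncio.sleep() yields control back to the event loop, three one-second naps can overlap:

import asyncio
import time


async def napper(i: int):
    await asyncio.sleep(1)  # yields to the event loop while "sleeping"
    print(f"napper {i} woke up")


async def main():
    start = time.time()
    await asyncio.gather(*(napper(i) for i in range(3)))
    # The three sleeps overlap, so this reports roughly 1 second, not 3
    print(f"elapsed: {time.time() - start:.2f} seconds")


asyncio.run(main())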
Now let's test it with pytest. In the same directory as hello_asyncio.py, create a file called test_hello_asyncio.py, then open it in your favorite text editor.
Let's start with our imports:
import pytest # Note: pytest-asyncio does not require a separate import
from hello_asyncio import say_hello
Then we'll create a test with proper input:
@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
@pytest.mark.asyncio
async def test_say_hello(name):
    await say_hello(name)
Things to note:
- The @pytest.mark.asyncio decorator lets pytest work asynchronously
- Our test is marked with the async syntax
- We're awaiting our async function as we would if we were running it outside of a test

Now let's run our test with the verbose -v option:
pytest -v
...
collected 3 items
test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED [ 33%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED [ 66%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED [100%]
Looks good. Next we'll write a couple of tests with bad input. Back inside of test_hello_asyncio.py, let's create a class called TestSayHelloThrowsExceptions:
class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    @pytest.mark.asyncio
    async def test_say_hello_value_error(self, name):
        with pytest.raises(ValueError):
            await say_hello(name)

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    @pytest.mark.asyncio
    async def test_say_hello_type_error(self, name):
        with pytest.raises(TypeError):
            await say_hello(name)
Again, we decorate our tests with @pytest.mark.asyncio, mark our tests with the async syntax, then call our function with await.
Run the tests again:
pytest -v
...
collected 7 items
test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED [ 14%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED [ 28%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED [ 42%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_value_error[] PASSED [ 57%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[19] PASSED [ 71%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name1] PASSED [ 85%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name2] PASSED [100%]
As an alternative to pytest-asyncio, you can create a pytest fixture that yields an asyncio event loop:
import asyncio
import pytest
from hello_asyncio import say_hello
@pytest.fixture
def event_loop():
    loop = asyncio.get_event_loop()
    yield loop
Then, rather than using the async/await syntax, you create your tests as you would normal, synchronous tests:
@pytest.mark.parametrize("name", [
    "Robert Paulson",
    "Seven of Nine",
    "x Æ a-12"
])
def test_say_hello(event_loop, name):
    event_loop.run_until_complete(say_hello(name))


class TestSayHelloThrowsExceptions:
    @pytest.mark.parametrize("name", [
        "",
    ])
    def test_say_hello_value_error(self, event_loop, name):
        with pytest.raises(ValueError):
            event_loop.run_until_complete(say_hello(name))

    @pytest.mark.parametrize("name", [
        19,
        {"name", "Diane"},
        []
    ])
    def test_say_hello_type_error(self, event_loop, name):
        with pytest.raises(TypeError):
            event_loop.run_until_complete(say_hello(name))
If you're interested, here's a more advanced tutorial on asyncio testing.
If you want to learn more about what distinguishes Python's implementation of threading vs asyncio, here's a great article from Medium.
For even better examples and explanations of threading in Python, here's a video by Corey Schafer that goes more in-depth, including using the concurrent.futures library.
Lastly, for a massive deep-dive into asyncio itself, here's an article from Real Python completely dedicated to it.
Bonus: One more library you might be interested in is called Unsync, especially if you want to easily convert your current synchronous code into asynchronous code. To use it, you install the library with pip, import it with from unsync import unsync, then decorate any currently synchronous function with @unsync to make it asynchronous. To await it and get its return value (which you can do anywhere -- it doesn't have to be in an async/unsync function), just call .result() after the function call.
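Here's a minimal sketch of what that looks like, going off the decorator-and-.result() API described above (slow_add is a made-up example function):

import time

from unsync import unsync


@unsync
def slow_add(a, b):
    time.sleep(1)  # ordinary blocking code, now run off the main thread
    return a + b


unfuture = slow_add(1, 2)  # returns immediately with a future-like object
print(unfuture.result())   # blocks until the value is ready, then prints 3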
What is parallelism?
Parallelism is very much related to concurrency. In fact, parallelism is a subset of concurrency: whereas a concurrent process performs multiple tasks at the same time whether or not they're receiving its full attention, a parallel process is physically performing multiple tasks all at the same time. A good example would be driving, listening to music, and eating the BLT we made in the last section at the same time.
Because they don't require a lot of intensive effort, you can do them all at once without having to wait on anything or divert your attention away.
Now let's take a look at how to implement this in Python. We could use the multiprocessing library, but let's use the concurrent.futures library instead -- it eliminates the need to manage the number of processes manually. Because the major benefit of multiprocessing comes when you perform multiple CPU-heavy tasks, we're going to raise each number from 1 million (1000000) to 1 million and 16 (1000016) to its own power (that's what pow(i, i) below computes).
You can find the code for this example here.
The only imports we'll need are concurrent.futures and time (the latter just for benchmarking):
import concurrent.futures
import time


if __name__ == "__main__":
    pow_list = [i for i in range(1000000, 1000016)]

    print("Starting...")
    start = time.time()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(pow, i, i) for i in pow_list]

        for f in concurrent.futures.as_completed(futures):
            print("okay")

    end = time.time()
    print(f"Time to complete: {round(end - start, 2)}")
Because I'm developing on a Windows machine, I'm using if __name__ == "__main__". This is necessary because Windows does not have the fork system call inherent to Unix systems. Because Windows doesn't have this capability, it resorts to launching a new interpreter with each process that tries to import the main module. If the main module doesn't exist, it reruns your entire program, causing recursive chaos to ensue.
So taking a look at our main function, we use a list comprehension to create a list from 1 million to 1 million and 16, we open a ProcessPoolExecutor with concurrent.futures, and we use list comprehension and ProcessPoolExecutor().submit() to start executing our processes and throwing them into a list called "futures."
We could also use ThreadPoolExecutor() if we wanted to use threads instead -- concurrent.futures is versatile.
And this is where the asynchronicity comes in: the "futures" list does not actually contain the results from running our functions. Instead, it contains "futures," which are similar to the JavaScript idea of "promises." In order to allow our program to continue running, we get back these futures, which represent a placeholder for a value. If we try to print a future, depending on whether it's finished running or not, we'll either get back a state of "pending" or "finished." Once it's finished, we can get the return value (assuming there is one) using {the future}.result() -- in our loop above, that would be f.result().
We then iterate through our list of futures, but instead of printing our values, we're simply printing out "okay." This is just because of how massive the resulting calculations come out to be.
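To see that pending/finished behavior in isolation, here's a tiny, self-contained sketch of our own using the same executor API:

import concurrent.futures


def square(n):
    return n * n


if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        future = executor.submit(square, 12)
        print(future.done())    # may print False while the task is still pending
        print(future.result())  # blocks until the task finishes, then prints 144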
Just as before, I built a comparison script that does this synchronously. And, just as before, you can find it on GitHub.
Running our control program, which also includes functionality for timing our program, we get:
Starting...
okay
...
okay
Time to complete: 54.64
Wow. 54.64 seconds is quite a long time. Let's see if our version with multiprocessing does any better:
Starting...
okay
...
okay
Time to complete: 6.24
Our time has been significantly reduced. We're at about 1/9th of our original time.
So what would happen if we used threading for this instead?
I'm sure you can guess -- it wouldn't be much faster than doing it synchronously. In fact, it might be slower because it still takes a little time and effort to spin up new threads. But don't take my word for it; here's what we get when we replace ProcessPoolExecutor() with ThreadPoolExecutor():
Starting...
okay
...
okay
Time to complete: 53.83
As I mentioned earlier, threading allows your applications to focus on new tasks while others are waiting. In this case, we're never sitting idly by. Multiprocessing, on the other hand, spins up totally new services, usually on separate CPU cores, ready to do whatever you ask it completely in tandem with whatever else your script is doing. This is why the multiprocessing version taking roughly 1/9th of the time makes sense -- I have 8 cores in my CPU.
Now that we've talked about concurrency and parallelism in Python, we can finally set the terms straight. If you're having trouble distinguishing between the terms, you can safely and accurately think of our previous definitions of "parallelism" and "concurrency" as "parallel concurrency" and "non-parallel concurrency" respectively.
Real Python has a great article on concurrency vs parallelism.
Engineer Man has a good video comparison of threading vs multiprocessing.
Corey Schafer also has a good video on multiprocessing in the same spirit as his threading video.
If you only watch one video, watch this excellent talk by Raymond Hettinger. He does an amazing job explaining the differences between multiprocessing, threading, and asyncio.
What if I need to combine many I/O operations with heavy calculations?
We can do that too. Say you need to scrape 100 web pages for a specific piece of information, and then you need to save that piece of info in a file for later. We can separate the compute power across each of our computer's cores by making each process scrape a fraction of the pages.
For this script, let's install Beautiful Soup to help us easily scrape our pages: pip install beautifulsoup4. This time we actually have quite a few imports. Here they are, and here's why we're using them:
import asyncio # Gives us async/await
import concurrent.futures # Allows creating new processes
import time
from math import floor # Helps divide up our requests evenly across our CPU cores
from multiprocessing import cpu_count # Returns our number of CPU cores
import aiofiles # For asynchronously performing file I/O operations
import aiohttp # For asynchronously making HTTP requests
from bs4 import BeautifulSoup # For easy webpage scraping
You can find the code for this example here.
First, we're going to create an async function that makes a request to Wikipedia to get back random pages. We'll scrape each page we get back for its title using BeautifulSoup, and then we'll append it to a given file; we'll separate each title with a tab. The function will take two arguments:
async def get_and_scrape_pages(num_pages: int, output_file: str):
    """
    Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
    articles, then scrapes each page for its title and appends it to {{ output_file }},
    separating each title with a tab: "\\t"

    #### Arguments
    ---
    num_pages: int -
        Number of random Wikipedia pages to request and scrape

    output_file: str -
        File to append titles to
    """
    async with \
    aiohttp.ClientSession() as client, \
    aiofiles.open(output_file, "a+", encoding="utf-8") as f:

        for _ in range(num_pages):
            async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
                if response.status > 399:
                    # I was getting a 429 Too Many Requests at a higher volume of requests
                    response.raise_for_status()

                page = await response.text()
                soup = BeautifulSoup(page, features="html.parser")
                title = soup.find("h1").text

            await f.write(title + "\t")

        await f.write("\n")
We're both asynchronously opening an aiohttp ClientSession and our output file. The mode, a+, means append to the file and create it if it doesn't already exist. Encoding our strings as utf-8 ensures we don't get an error if our titles contain international characters. If we get an error response, we'll raise it instead of continuing (at high request volumes I was getting a 429 Too Many Requests). We asynchronously get the text from our response, then we parse out the title and asynchronously append it to our file. After we append all of our titles, we append a new line: "\n".
Our next function is the function we'll start with each new process to allow running it asynchronously:
def start_scraping(num_pages: int, output_file: str, i: int):
    """ Starts an async process for requesting and scraping Wikipedia pages """
    print(f"Process {i} starting...")
    asyncio.run(get_and_scrape_pages(num_pages, output_file))
    print(f"Process {i} finished.")
Now for our main function. Let's start with some constants (and our function declaration):
def main():
    NUM_PAGES = 100  # Number of pages to scrape altogether
    NUM_CORES = cpu_count()  # Our number of CPU cores (including logical cores)
    OUTPUT_FILE = "./wiki_titles.tsv"  # File to append our scraped titles to

    PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
    # The final core takes the leftovers: the remainder of NUM_PAGES / NUM_CORES
    PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % NUM_CORES
And now the logic:
    futures = []

    with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
        for i in range(NUM_CORES - 1):
            new_future = executor.submit(
                start_scraping,  # Function to perform
                # v Arguments v
                num_pages=PAGES_PER_CORE,
                output_file=OUTPUT_FILE,
                i=i
            )
            futures.append(new_future)

        futures.append(
            executor.submit(
                start_scraping,
                PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES-1
            )
        )

    concurrent.futures.wait(futures)
We create an array to store our futures, then we create a ProcessPoolExecutor, setting its max_workers equal to our number of cores. We iterate over a range equal to our number of cores minus 1, running a new process with our start_scraping function. We then append it to our futures list. Our final core will potentially have extra work to do: it scrapes the same number of pages as each of the other cores, plus the remainder from dividing our total number of pages to scrape by our total number of CPU cores.
Make sure to actually run your main function:
if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Time to complete: {round(time.time() - start, 2)} seconds.")
After running the program with my 8-core CPU (along with benchmarking code):
This version (asyncio with multiprocessing):
Time to complete: 5.65 seconds.
Just multiprocessing:
Time to complete: 8.87 seconds.

Just asyncio:
Time to complete: 47.92 seconds.

Synchronous:
Time to complete: 88.86 seconds.
I'm actually quite surprised to see that the improvement of asyncio with multiprocessing over just multiprocessing wasn't as great as I thought it would be.
async/await and similar syntax also exist in other languages, and in some of those languages, their implementation can differ drastically.
The first programming language (back in 2007) to use the async syntax was Microsoft's F#. Whereas it doesn't exactly use await to wait on a function call, it uses specific syntax like let! and do! along with proprietary Async functions included in the System module.
You can find more about async programming in F# on Microsoft's F# docs.
Their C# team then built upon this concept, and that's where the async/await keywords that we're now familiar with were born:
using System;

// Allows the "Task" return type
using System.Threading.Tasks;

public class Program
{
    // Declare an async function with "async"
    private static async Task<string> ReturnHello()
    {
        return "hello world";
    }

    // Main can be async -- no problem
    public static async Task Main()
    {
        // await an async string
        string result = await ReturnHello();

        // Print the string we got asynchronously
        Console.WriteLine(result);
    }
}
We ensure that we're using System.Threading.Tasks, as it includes the Task type, and, in general, the Task type is needed for an async function to be awaited. The cool thing about C# is that you can make your main function asynchronous just by declaring it with async, and you won't have any issues.
If you're interested in learning more about async/await in C#, Microsoft's C# docs have a good page on it.
First introduced in ES6, the async/await syntax is essentially an abstraction over JavaScript promises (which are similar to Python futures). Unlike Python, however, so long as you're not awaiting, you can call an async function normally without a specific entry-point function like Python's asyncio.run():
// Declare a function with async
async function returnHello(){
    return "hello world";
}

async function printSomething(){
    // await an async string
    const result = await returnHello();

    // print the string we got asynchronously
    console.log(result);
}

// Run our async code
printSomething();
For more information on async/await in JavaScript, see MDN.
Rust now also allows the use of async/await syntax, and it works similarly to Python, C#, and JavaScript:
// Allows blocking synchronous code to run async code
use futures::executor::block_on;

// Declare an async function with "async"
async fn return_hello() -> String {
    "hello world".to_string()
}

// Code that awaits must also be declared with "async"
async fn print_something(){
    // await an async String
    let result: String = return_hello().await;

    // Print the string we got asynchronously
    println!("{0}", result);
}

fn main() {
    // Block the current synchronous execution to run our async code
    block_on(print_something());
}
In order to use async functions, we must first add futures = "0.3" to our Cargo.toml. We then import the block_on function with use futures::executor::block_on -- block_on is necessary for running our async function from our synchronous main function.
You can find more information about async/await in Rust in the Rust docs.
Instead of the traditional async/await syntax inherent to all the previous languages we covered, Go uses "goroutines" and "channels." You can think of a channel as being similar to a Python future. In Go, you generally send a channel as an argument to a function, then use go to run the function concurrently. Whenever you need to ensure the function has finished, you use the <- syntax, which you can think of as the more common await syntax. If your goroutine (the function you're running asynchronously) has a return value, it can be grabbed this way.
package main

import "fmt"

// "chan" makes the return value a string channel instead of a string
func returnHello(result chan string){
    // Gives our channel a value
    result <- "hello world"
}

func main() {
    // Creates a string channel
    result := make(chan string)

    // Starts execution of our goroutine
    go returnHello(result)

    // Awaits and prints our string
    fmt.Println(<- result)
}
For more information on Go concurrency, see An Introduction to Programming in Go by Caleb Doxsey.
Similar to Python, Ruby also has the Global Interpreter Lock limitation. What it doesn't have is concurrency built into the language. However, there is a community-created gem that allows concurrency in Ruby, and you can find its source code on GitHub.
Like Ruby, Java doesn't have built-in async/await syntax, but it does have concurrency capabilities using the java.util.concurrent module. However, Electronic Arts wrote an async library that allows the use of await as a method. It's not exactly the same as Python/C#/JavaScript/Rust, but it's worth looking into if you're a Java developer interested in this kind of functionality.
Although C++ also doesn't have async/await syntax, it does have the ability to use futures to run code concurrently using the futures module:
#include <iostream>
#include <string>

// Necessary for futures
#include <future>

// No async declaration needed
std::string return_hello() {
    return "hello world";
}

int main ()
{
    // Declares a string future
    std::future<std::string> fut = std::async(return_hello);

    // Awaits the result of the future
    std::string result = fut.get();

    // Prints the string we got asynchronously
    std::cout << result << '\n';
}
There's no need to declare a function with any keyword to denote whether or not it can or should be run asynchronously. Instead, you declare your initial future whenever you need it with std::future<{{ function return type }}> and set it equal to std::async(), including the name of the function you want to perform asynchronously along with any arguments it takes -- i.e., std::async(do_something, 1, 2, "string"). To await the value of the future, use the .get() syntax on it.
You can find documentation for async in C++ on cplusplus.com.
Whether you're dealing with asynchronous network or file operations or performing numerous complex calculations, there are a few different ways to maximize your code's efficiency.
If you're using Python, you can use asyncio or threading to make the most of I/O operations, or the multiprocessing module for CPU-intensive code.
Also remember that the concurrent.futures module can be used in place of either threading or multiprocessing.
If you're using a different programming language, chances are there's an implementation of async/await for it too.
1660198260
Que sont la concurrence et le parallélisme, et comment s'appliquent-ils à Python ?
Il existe de nombreuses raisons pour lesquelles vos applications peuvent être lentes. Parfois, cela est dû à une mauvaise conception algorithmique ou à un mauvais choix de structure de données. Parfois, cependant, cela est dû à des forces indépendantes de notre volonté, telles que des contraintes matérielles ou les bizarreries du réseau. C'est là que la concurrence et le parallélisme s'intègrent. Ils permettent à vos programmes de faire plusieurs choses à la fois, soit en même temps, soit en perdant le moins de temps possible à attendre des tâches occupées.
Que vous ayez affaire à des ressources Web externes, que vous lisiez et écriviez dans plusieurs fichiers, ou que vous ayez besoin d'utiliser plusieurs fois une fonction gourmande en calculs avec différents paramètres, cet article devrait vous aider à maximiser l'efficacité et la vitesse de votre code.
Tout d'abord, nous allons approfondir ce que sont la concurrence et le parallélisme et comment ils s'intègrent dans le domaine de Python en utilisant des bibliothèques standard telles que le threading, le multitraitement et l'asyncio. La dernière partie de cet article comparera l'implémentation de async
/ de Python await
avec la façon dont d'autres langages les ont implémentés.
Vous pouvez trouver tous les exemples de code de cet article dans le référentiel concurrency-parallelism-and-asyncio sur GitHub.
Pour parcourir les exemples de cet article, vous devez déjà savoir comment utiliser les requêtes HTTP.
À la fin de cet article, vous devriez être en mesure de répondre aux questions suivantes :
Qu'est-ce que la concurrence ?
Une définition efficace de la simultanéité est "être capable d'effectuer plusieurs tâches à la fois". C'est un peu trompeur, car les tâches peuvent ou non être effectuées exactement au même moment. Au lieu de cela, un processus peut démarrer, puis une fois qu'il attend une instruction spécifique pour se terminer, passer à une nouvelle tâche, pour revenir une fois qu'il n'attend plus. Une fois qu'une tâche est terminée, il repasse à une tâche inachevée jusqu'à ce qu'elles aient toutes été exécutées. Les tâches démarrent de manière asynchrone, sont exécutées de manière asynchrone, puis se terminent de manière asynchrone.
Si cela vous a déconcerté, pensons plutôt à une analogie : Supposons que vous vouliez faire un BLT . Tout d'abord, vous voudrez jeter le bacon dans une casserole à feu moyen-doux. Pendant la cuisson du bacon, vous pouvez sortir vos tomates et laitues et commencer à les préparer (laver et couper). Pendant tout ce temps, vous continuez à vérifier et à retourner de temps en temps votre bacon.
À ce stade, vous avez commencé une tâche, puis commencé et terminé deux autres entre-temps, tout en attendant toujours la première.
Finalement, vous mettez votre pain dans un grille-pain. Pendant qu'il grille, vous continuez à vérifier votre bacon. Au fur et à mesure que les pièces sont terminées, vous les sortez et les placez sur une assiette. Une fois votre pain grillé, vous y appliquez la pâte à tartiner de votre choix, puis vous pouvez commencer à superposer vos tomates, votre laitue, puis, une fois la cuisson terminée, votre bacon. Ce n'est qu'une fois que tout est cuit, préparé et en couches que vous pouvez placer le dernier morceau de pain grillé sur votre sandwich, le trancher (facultatif) et le manger.
Parce qu'il vous oblige à effectuer plusieurs tâches en même temps, la création d'un BLT est par nature un processus simultané, même si vous n'accordez pas toute votre attention à chacune de ces tâches en même temps. À toutes fins utiles, pour la section suivante, nous désignerons cette forme de concurrence par « concurrence ». Nous le différencierons plus tard dans cet article.
Pour cette raison, la simultanéité est idéale pour les processus gourmands en E/S, c'est-à-dire les tâches qui impliquent d'attendre des requêtes Web ou des opérations de lecture/écriture de fichiers.
En Python, il existe plusieurs façons d'obtenir la concurrence. La première que nous allons examiner est la bibliothèque de threads.
Pour nos exemples dans cette section, nous allons construire un petit programme Python qui récupère cinq fois un genre musical aléatoire de l'API Genrenator de Binary Jazz , imprime le genre à l'écran et place chacun dans son propre fichier.
Pour travailler avec le threading en Python, la seule importation dont vous aurez besoin est threading
, mais pour cet exemple, j'ai également importé urllib
pour travailler avec des requêtes HTTP, time
pour déterminer combien de temps les fonctions prennent pour se terminer et json
pour convertir facilement les données json renvoyées depuis l'API Genrenator.
Vous pouvez trouver le code de cet exemple ici .
Commençons par une fonction simple :
def write_genre(file_name):
"""
Uses genrenator from binaryjazz.us to write a random genre to the
name of the given file
"""
req = Request("https://binaryjazz.us/wp-json/genrenator/v1/genre/", headers={"User-Agent": "Mozilla/5.0"})
genre = json.load(urlopen(req))
with open(file_name, "w") as new_file:
print(f"Writing '{genre}' to '{file_name}'...")
new_file.write(genre)
En examinant le code ci-dessus, nous faisons une demande à l'API Genrenator, chargeons sa réponse JSON (un genre musical aléatoire), l'imprimons, puis l'écrivons dans un fichier.
Sans l'en-tête "User-Agent", vous recevrez un 304.
Ce qui nous intéresse vraiment, c'est la section suivante, où le threading réel se produit :
threads = []
for i in range(5):
thread = threading.Thread(
target=write_genre,
args=[f"./threading/new_file{i}.txt"]
)
thread.start()
threads.append(thread)
for thread in threads:
thread.join()
Nous commençons d'abord par une liste. Nous procédons ensuite à une itération cinq fois, en créant un nouveau thread à chaque fois. Ensuite, nous démarrons chaque thread, l'ajoutons à notre liste de "threads", puis parcourons notre liste une dernière fois pour rejoindre chaque thread.
Explication : Créer des threads en Python est facile.
Pour créer un nouveau fil, utilisez threading.Thread()
. Vous pouvez y passer le kwarg (argument de mot clé) target
avec une valeur de la fonction que vous souhaitez exécuter sur ce thread. Mais ne transmettez que le nom de la fonction, pas sa valeur (c'est-à-dire, pour nos besoins, write_genre
et non write_genre()
). Pour passer des arguments, passez "kwargs" (qui prend un dict de vos kwargs) ou "args" (qui prend un itérable contenant vos arguments - dans ce cas, une liste).
Cependant, créer un thread n'est pas la même chose que démarrer un thread. Pour démarrer votre fil, utilisez {the name of your thread}.start()
. Démarrer un thread signifie "démarrer son exécution".
Enfin, lorsque nous rejoignons des threads avec thread.join()
, tout ce que nous faisons est de nous assurer que le thread est terminé avant de continuer avec notre code.
Mais qu'est-ce qu'un fil exactement ?
Un thread est un moyen de permettre à votre ordinateur de décomposer un processus/programme unique en plusieurs éléments légers qui s'exécutent en parallèle. De manière quelque peu déroutante, l'implémentation standard de Python du threading limite les threads à ne pouvoir s'exécuter qu'un seul à la fois en raison de quelque chose appelé le Global Interpreter Lock (GIL). Le GIL est nécessaire car la gestion de la mémoire de CPython (l'implémentation par défaut de Python) n'est pas thread-safe. En raison de cette limitation, le threading en Python est simultané, mais pas parallèle. Pour contourner ce problème, Python dispose d'un multiprocessing
module séparé non limité par le GIL qui exécute des processus séparés, permettant l'exécution parallèle de votre code. L'utilisation du multiprocessing
module est presque identique à l'utilisation du threading
module.
Plus d'informations sur le GIL de Python et la sécurité des threads peuvent être trouvées sur Real Python et la documentation officielle de Python .
Nous examinerons plus en détail le multitraitement en Python sous peu.
Avant de montrer l'amélioration potentielle de la vitesse par rapport au code non-thread, j'ai pris la liberté de créer également une version non-thread du même programme (là encore, disponible sur GitHub ). Au lieu de créer un nouveau thread et de joindre chacun d'eux, il appelle write_genre
à la place une boucle for qui itère cinq fois.
Pour comparer les benchmarks de vitesse, j'ai aussi importé la time
librairie pour chronométrer l'exécution de nos scripts :
Starting...
Writing "binary indoremix" to "./sync/new_file0.txt"...
Writing "slavic aggro polka fusion" to "./sync/new_file1.txt"...
Writing "israeli new wave" to "./sync/new_file2.txt"...
Writing "byzantine motown" to "./sync/new_file3.txt"...
Writing "dutch hate industrialtune" to "./sync/new_file4.txt"...
Time to complete synchronous read/writes: 1.42 seconds
Lors de l'exécution du script, nous constatons qu'il faut environ 1,49 seconde à mon ordinateur (ainsi que des genres musicaux classiques tels que "dutch hate industrialtune"). Pas mal.
Exécutons maintenant la version qui utilise le threading :
Starting...
Writing "college k-dubstep" to "./threading/new_file2.txt"...
Writing "swiss dirt" to "./threading/new_file0.txt"...
Writing "bop idol alternative" to "./threading/new_file4.txt"...
Writing "ethertrio" to "./threading/new_file1.txt"...
Writing "beach aust shanty français" to "./threading/new_file3.txt"...
Time to complete threading read/writes: 0.77 seconds
La première chose qui pourrait vous surprendre est que les fonctions ne sont pas complétées dans l'ordre : 2 - 0 - 4 - 1 - 3
Cela est dû à la nature asynchrone du threading : lorsqu'une fonction attend, une autre commence, et ainsi de suite. Étant donné que nous pouvons continuer à effectuer des tâches pendant que nous attendons que les autres finissent (soit en raison de la mise en réseau ou des opérations d'E/S de fichiers), vous avez peut-être également remarqué que nous réduisons notre temps environ de moitié : 0,77 seconde. Bien que cela ne semble pas beaucoup maintenant, il est facile d'imaginer le cas très réel de la création d'une application Web qui doit écrire beaucoup plus de données dans un fichier ou interagir avec des services Web beaucoup plus complexes.
Donc, si le threading est si génial, pourquoi ne pas terminer l'article ici ?
Parce qu'il existe des moyens encore meilleurs d'effectuer des tâches simultanément.
Jetons un coup d'œil à un exemple utilisant asyncio. Pour cette méthode, nous allons installer aiohttp en utilisant pip
. Cela nous permettra de faire des requêtes non bloquantes et de recevoir des réponses en utilisant la syntaxe async
/ await
qui sera introduite prochainement. Il a également l'avantage supplémentaire d'une fonction qui convertit une réponse JSON sans avoir besoin d'importer la json
bibliothèque. Nous installerons et importerons également des fichiers aio , ce qui permet des opérations de fichiers non bloquantes. Autre que aiohttp
et aiofiles
, import asyncio
, qui est fourni avec la bibliothèque standard Python.
"Non bloquant" signifie qu'un programme permettra à d'autres threads de continuer à s'exécuter pendant qu'il attend. Cela s'oppose au code "bloquant", qui arrête complètement l'exécution de votre programme. Les opérations d'E/S normales et synchrones souffrent de cette limitation.
Vous pouvez trouver le code de cet exemple ici .
Une fois nos importations en place, examinons la version asynchrone de la write_genre
fonction de notre exemple asyncio :
async def write_genre(file_name):
"""
Uses genrenator from binaryjazz.us to write a random genre to the
name of the given file
"""
async with aiohttp.ClientSession() as session:
async with session.get("https://binaryjazz.us/wp-json/genrenator/v1/genre/") as response:
genre = await response.json()
async with aiofiles.open(file_name, "w") as new_file:
print(f'Writing "{genre}" to "{file_name}"...')
await new_file.write(genre)
Pour ceux qui ne connaissent pas la syntaxe async
/ await
que l'on trouve dans de nombreux autres langages modernes, async
déclare qu'une fonction, une for
boucle ou une with
instruction doit être utilisée de manière asynchrone. Pour appeler une fonction asynchrone, vous devez soit utiliser le mot- await
clé d'une autre fonction asynchrone, soit appeler create_task()
directement à partir de la boucle d'événements, qui peut être extraite de asyncio.get_event_loop()
-- c'est-à-dire loop = asyncio.get_event_loop()
.
En outre:
async with
permet d'attendre des réponses asynchrones et des opérations sur les fichiers.async for
(non utilisé ici) itère sur un flux asynchrone .Les boucles d'événements sont des constructions inhérentes à la programmation asynchrone qui permettent d'effectuer des tâches de manière asynchrone. Pendant que vous lisez cet article, je peux supposer que vous n'êtes probablement pas trop familier avec le concept. Cependant, même si vous n'avez jamais écrit d'application asynchrone, vous avez de l'expérience avec les boucles d'événements chaque fois que vous utilisez un ordinateur. Que votre ordinateur écoute les entrées au clavier, que vous jouiez à des jeux multijoueurs en ligne ou que vous naviguiez sur Reddit pendant que vous copiez des fichiers en arrière-plan, une boucle d'événements est la force motrice qui permet à tout de fonctionner de manière fluide et efficace. Dans son essence la plus pure, une boucle d'événements est un processus qui attend des déclencheurs, puis exécute des actions spécifiques (programmées) une fois que ces déclencheurs sont satisfaits. Ils renvoient souvent une "promesse" (syntaxe JavaScript) ou un "futur" (syntaxe Python) d'une sorte pour indiquer qu'une tâche a été ajoutée. Une fois la tâche terminée, la promesse ou le futur renvoie une valeur renvoyée par la fonction appelée (en supposant que la fonction renvoie une valeur).
L'idée d'exécuter une fonction en réponse à une autre fonction s'appelle un "rappel".
Pour une autre approche des rappels et des événements, voici une excellente réponse sur Stack Overflow .
Voici une présentation de notre fonction :
Nous utilisons async with
pour ouvrir notre session client de manière asynchrone. La aiohttp.ClientSession()
classe est ce qui nous permet de faire des requêtes HTTP et de rester connecté à une source sans bloquer l'exécution de notre code. Nous faisons ensuite une requête asynchrone à l'API Genrenator et attendons la réponse JSON (un genre musical aléatoire). Dans la ligne suivante, nous utilisons async with
à nouveau avec la aiofiles
bibliothèque pour ouvrir de manière asynchrone un nouveau fichier dans lequel écrire notre nouveau genre. Nous imprimons le genre, puis l'écrivons dans le fichier.
Contrairement aux scripts Python classiques, la programmation avec asyncio applique à peu près* l'utilisation d'une sorte de fonction "principale".
*Sauf si vous utilisez la syntaxe obsolète "yield" avec le décorateur @asyncio.coroutine, qui sera supprimé dans Python 3.10 .
En effet, vous devez utiliser le mot-clé "async" pour utiliser la syntaxe "wait", et la syntaxe "wait" est le seul moyen d'exécuter réellement d'autres fonctions asynchrones.
Voici notre fonction principale :
async def main():
tasks = []
for i in range(5):
tasks.append(write_genre(f"./async/new_file{i}.txt"))
await asyncio.gather(*tasks)
Comme vous pouvez le voir, nous l'avons déclaré avec "async". Nous créons ensuite une liste vide appelée "tâches" pour héberger nos tâches asynchrones (appels à Genrenator et nos E/S de fichiers). Nous ajoutons nos tâches à notre liste, mais elles ne sont pas encore exécutées. Les appels ne sont pas passés tant que nous ne les avons pas planifiés avec await asyncio.gather(*tasks)
. Cela exécute toutes les tâches de notre liste et attend qu'elles se terminent avant de continuer avec le reste de notre programme. Enfin, nous utilisons asyncio.run(main())
pour exécuter notre fonction "main". La .run()
fonction est le point d'entrée de notre programme, et elle ne doit généralement être appelée qu'une seule fois par processus .
Pour ceux qui ne sont pas familiers, le
*
devant des tâches est appelé "déballage des arguments". Tout comme cela sonne, il décompresse notre liste en une série d'arguments pour notre fonction. Notre fonction estasyncio.gather()
dans ce cas.
Et c'est tout ce que nous devons faire. Maintenant, exécutant notre programme (dont la source inclut la même fonctionnalité de synchronisation des exemples synchrones et de threading)...
Writing "albuquerque fiddlehaus" to "./async/new_file1.txt"...
Writing "euroreggaebop" to "./async/new_file2.txt"...
Writing "shoedisco" to "./async/new_file0.txt"...
Writing "russiagaze" to "./async/new_file4.txt"...
Writing "alternative xylophone" to "./async/new_file3.txt"...
Time to complete asyncio read/writes: 0.71 seconds
...on voit que c'est encore plus rapide. Et, en général, la méthode asyncio sera toujours un peu plus rapide que la méthode de threading. En effet, lorsque nous utilisons la syntaxe "attendre", nous disons essentiellement à notre programme "attendez, je reviens tout de suite", mais notre programme garde une trace du temps qu'il nous faut pour terminer ce que nous faisons. Une fois que nous aurons terminé, notre programme le saura et reprendra dès qu'il le pourra. Le threading en Python permet l'asynchronicité, mais notre programme pourrait théoriquement ignorer différents threads qui ne sont peut-être pas encore prêts, perdant du temps s'il y a des threads prêts à continuer à fonctionner.
Alors, quand dois-je utiliser le threading et quand dois-je utiliser asyncio ?
Lorsque vous écrivez un nouveau code, utilisez asyncio. Si vous avez besoin d'interfacer avec des bibliothèques plus anciennes ou celles qui ne prennent pas en charge l'asyncio, vous feriez peut-être mieux d'utiliser le threading.
Il s'avère que tester des fonctions asynchrones avec pytest est aussi simple que tester des fonctions synchrones. Installez simplement le package pytest-asyncio avec pip
, marquez vos tests avec le mot- async
clé et appliquez un décorateur qui indique pytest
qu'il est asynchrone : @pytest.mark.asyncio
. Prenons un exemple.
Commençons par écrire une fonction asynchrone arbitraire dans un fichier appelé hello_asyncio.py :
import asyncio
async def say_hello(name: str):
""" Sleeps for two seconds, then prints 'Hello, {{ name }}!' """
try:
if type(name) != str:
raise TypeError("'name' must be a string")
if name == "":
raise ValueError("'name' cannot be empty")
except (TypeError, ValueError):
raise
print("Sleeping...")
await asyncio.sleep(2)
print(f"Hello, {name}!")
La fonction prend un seul argument de chaîne : name
. Après s'être assuré qu'il name
s'agit d'une chaîne d'une longueur supérieure à un, notre fonction dort de manière asynchrone pendant deux secondes, puis imprime "Hello, {name}!"
sur la console.
La différence entre
asyncio.sleep()
ettime.sleep()
est qu'ilasyncio.sleep()
n'est pas bloquant.
Testons-le maintenant avec pytest. Dans le même répertoire que hello_asyncio.py, créez un fichier appelé test_hello_asyncio.py, puis ouvrez-le dans votre éditeur de texte préféré.
Commençons par nos importations :
import pytest # Note: pytest-asyncio does not require a separate import
from hello_asyncio import say_hello
Ensuite, nous allons créer un test avec une entrée appropriée :
@pytest.mark.parametrize("name", [
"Robert Paulson",
"Seven of Nine",
"x Æ a-12"
])
@pytest.mark.asyncio
async def test_say_hello(name):
await say_hello(name)
À noter :
@pytest.mark.asyncio
décorateur permet à pytest de fonctionner de manière asynchroneasync
syntaxeawait
utilisons notre fonction asynchrone comme nous le ferions si nous l'exécutions en dehors d'un testExécutons maintenant notre test avec l' -v
option verbose :
pytest -v
...
collected 3 items
test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED [ 33%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED [ 66%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED [100%]
Cela semble bon. Ensuite, nous allons écrire quelques tests avec une mauvaise entrée. De retour à l'intérieur de test_hello_asyncio.py , créons une classe appelée TestSayHelloThrowsExceptions
:
class TestSayHelloThrowsExceptions:
@pytest.mark.parametrize("name", [
"",
])
@pytest.mark.asyncio
async def test_say_hello_value_error(self, name):
with pytest.raises(ValueError):
await say_hello(name)
@pytest.mark.parametrize("name", [
19,
{"name", "Diane"},
[]
])
@pytest.mark.asyncio
async def test_say_hello_type_error(self, name):
with pytest.raises(TypeError):
await say_hello(name)
Encore une fois, nous décorons nos tests avec @pytest.mark.asyncio
, marquons nos tests avec la async
syntaxe, puis appelons notre fonction avec await
.
Relancez les tests :
pytest -v
...
collected 7 items
test_hello_asyncio.py::test_say_hello[Robert Paulson] PASSED [ 14%]
test_hello_asyncio.py::test_say_hello[Seven of Nine] PASSED [ 28%]
test_hello_asyncio.py::test_say_hello[x \xc6 a-12] PASSED [ 42%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_value_error[] PASSED [ 57%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[19] PASSED [ 71%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name1] PASSED [ 85%]
test_hello_asyncio.py::TestSayHelloThrowsExceptions::test_say_hello_type_error[name2] PASSED [100%]
Alternativement à pytest-asyncio, vous pouvez créer un appareil pytest qui génère une boucle d'événement asyncio :
import asyncio
import pytest
from hello_asyncio import say_hello
@pytest.fixture
def event_loop():
loop = asyncio.get_event_loop()
yield loop
Ensuite, plutôt que d'utiliser la syntaxe async
/ await
, vous créez vos tests comme vous le feriez pour des tests synchrones normaux :
@pytest.mark.parametrize("name", [
"Robert Paulson",
"Seven of Nine",
"x Æ a-12"
])
def test_say_hello(event_loop, name):
event_loop.run_until_complete(say_hello(name))
class TestSayHelloThrowsExceptions:
@pytest.mark.parametrize("name", [
"",
])
def test_say_hello_value_error(self, event_loop, name):
with pytest.raises(ValueError):
event_loop.run_until_complete(say_hello(name))
@pytest.mark.parametrize("name", [
19,
{"name", "Diane"},
[]
])
def test_say_hello_type_error(self, event_loop, name):
with pytest.raises(TypeError):
event_loop.run_until_complete(say_hello(name))
Si cela vous intéresse, voici un tutoriel plus avancé sur les tests asynchrones .
Si vous voulez en savoir plus sur ce qui distingue l'implémentation de Python du threading par rapport à l'asyncio, voici un excellent article de Medium .
Pour des exemples et des explications encore meilleurs sur le threading en Python, voici une vidéo de Corey Schafer qui va plus en profondeur, y compris l'utilisation de la concurrent.futures
bibliothèque.
Enfin, pour une plongée massive dans l'asyncio lui-même, voici un article de Real Python entièrement dédié à celui-ci.
Bonus : Une autre bibliothèque qui pourrait vous intéresser s'appelle Unsync , surtout si vous souhaitez convertir facilement votre code synchrone actuel en code asynchrone. Pour l'utiliser, vous installez la bibliothèque avec pip, l'importez avec from unsync import unsync
, puis décorez la fonction actuellement synchrone avec @unsync
pour la rendre asynchrone. Pour l'attendre et obtenir sa valeur de retour (ce que vous pouvez faire n'importe où - il n'est pas nécessaire qu'il soit dans une fonction async/unsync), appelez simplement .result()
après l'appel de la fonction.
Qu'est-ce que le parallélisme ?
Le parallélisme est très lié à la concurrence. En fait, le parallélisme est un sous-ensemble de la simultanéité : alors qu'un processus simultané exécute plusieurs tâches en même temps, qu'elles fassent l'objet d'une attention totale ou non, un processus parallèle exécute physiquement plusieurs tâches en même temps. Un bon exemple serait de conduire, d'écouter de la musique et de manger le BLT que nous avons préparé dans la dernière section en même temps.
Parce qu'ils ne nécessitent pas beaucoup d'efforts intensifs, vous pouvez les faire tous en même temps sans avoir à attendre quoi que ce soit ou à détourner votre attention.
Voyons maintenant comment implémenter cela en Python. Nous pourrions utiliser la multiprocessing
bibliothèque, mais utilisons concurrent.futures
plutôt la bibliothèque -- cela élimine le besoin de gérer manuellement le nombre de processus. Étant donné que le principal avantage du multitraitement se produit lorsque vous effectuez plusieurs tâches gourmandes en ressources processeur, nous allons calculer les carrés de 1 million (1000000) à 1 million et 16 (1000016).
Vous pouvez trouver le code de cet exemple ici .
La seule importation dont nous aurons besoin estconcurrent.futures
:
import concurrent.futures
import time
if __name__ == "__main__":
pow_list = [i for i in range(1000000, 1000016)]
print("Starting...")
start = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
futures = [executor.submit(pow, i, i) for i in pow_list]
for f in concurrent.futures.as_completed(futures):
print("okay")
end = time.time()
print(f"Time to complete: {round(end - start, 2)}")
Parce que je développe sur une machine Windows, j'utilise
if __name__ == "main"
. Cela est nécessaire car Windows ne dispose pas de l'fork
appel système inhérent aux systèmes Unix . Parce que Windows n'a pas cette capacité, il a recours au lancement d'un nouvel interpréteur avec chaque processus qui tente d'importer le module principal. Si le module principal n'existe pas, il relance tout votre programme, provoquant un chaos récursif.
Donc, en regardant notre fonction principale, nous utilisons une compréhension de liste pour créer une liste de 1 million à 1 million et 16, nous ouvrons un ProcessPoolExecutor avec concurrent.futures, et nous utilisons la compréhension de liste et ProcessPoolExecutor().submit()
pour commencer à exécuter nos processus et les lancer dans une liste appelée "futures".
Nous pourrions également utiliser
ThreadPoolExecutor()
si nous voulions utiliser des threads à la place - concurrent.futures est polyvalent.
Et c'est là qu'intervient l'asynchronicité : la liste des "résultats" ne contient pas réellement les résultats de l'exécution de nos fonctions. Au lieu de cela, il contient des "futurs" qui sont similaires à l'idée JavaScript de "promesses". Afin de permettre à notre programme de continuer à fonctionner, nous récupérons ces contrats à terme qui représentent un espace réservé pour une valeur. Si nous essayons d'imprimer le futur, selon qu'il est terminé ou non, nous retrouverons soit un état "en attente" soit "terminé". Une fois terminé, nous pouvons obtenir la valeur de retour (en supposant qu'il y en ait une) en utilisant var.result()
. Dans ce cas, notre var sera "résultat".
Nous parcourons ensuite notre liste de contrats à terme, mais au lieu d'imprimer nos valeurs, nous imprimons simplement "d'accord". C'est juste à cause de l'ampleur des calculs qui en résultent.
Just as before, I built a comparison script that does this synchronously. And, just as before, you can find it on GitHub.
Running our control program, which also includes functionality for timing itself, we get:
Starting...
okay
...
okay
Time to complete: 54.64
Wow. 54.64 seconds is quite a long time. Let's see if our version with multiprocessing does any better:
Starting...
okay
...
okay
Time to complete: 6.24
Our time has been significantly reduced -- we're at about 1/9th of our original time.
So what would happen if we used threading for this instead?
I'm sure you can guess -- it wouldn't be much faster than doing it synchronously. In fact, it might be slower, because it still takes a little time and effort to spin up new threads. But don't take my word for it; here's what we get when we replace ProcessPoolExecutor() with ThreadPoolExecutor():
Starting...
okay
...
okay
Time to complete: 53.83
As I mentioned earlier, threading allows your applications to focus on new tasks while others are waiting. In this case, we're never sitting idly by. Multiprocessing, on the other hand, spins up totally new processes, usually on separate CPU cores, ready to do whatever you ask completely in tandem with whatever else your script is doing. This is why the multiprocessing version taking roughly 1/9th of the time makes sense -- I have 8 cores in my CPU.
Now that we've talked about concurrency and parallelism in Python, we can finally set the terms straight. If you're having trouble distinguishing between the terms, you can safely and accurately think of our previous definitions of "parallelism" and "concurrency" as "parallel concurrency" and "non-parallel concurrency," respectively.
Real Python has a great article on concurrency vs parallelism.
Engineer Man has a good video comparison of threading vs multiprocessing.
Corey Schafer also has a good video on multiprocessing in the same spirit as his threading video.
If you only watch one video, watch this excellent talk by Raymond Hettinger. He does an amazing job explaining the differences between multiprocessing, threading, and asyncio.
What if I need to combine many I/O operations with heavy calculations?
We can do that too. Say you need to scrape 100 web pages for a specific piece of information, and then you need to save that piece of info in a file for later. We can separate the compute power across each of our computer's cores by making each process scrape a fraction of the pages.
For this script, let's install Beautiful Soup to help us easily scrape our pages: pip install beautifulsoup4. This time we actually have quite a few imports. Here they are, and here's why we're using them:
import asyncio # Gives us async/await
import concurrent.futures # Allows creating new processes
import time
from math import floor # Helps divide up our requests evenly across our CPU cores
from multiprocessing import cpu_count # Returns our number of CPU cores
import aiofiles # For asynchronously performing file I/O operations
import aiohttp # For asynchronously making HTTP requests
from bs4 import BeautifulSoup # For easy webpage scraping
You can find the code for this example here.
First, we're going to create an async function that makes requests to Wikipedia to get back random pages. We'll scrape each page we get back for its title using BeautifulSoup, and then we'll append it to a given file, separating each title with a tab. The function takes two arguments:
async def get_and_scrape_pages(num_pages: int, output_file: str):
"""
Makes {{ num_pages }} requests to Wikipedia to receive {{ num_pages }} random
articles, then scrapes each page for its title and appends it to {{ output_file }},
separating each title with a tab: "\\t"
#### Arguments
---
num_pages: int -
Number of random Wikipedia pages to request and scrape
output_file: str -
File to append titles to
"""
async with \
aiohttp.ClientSession() as client, \
aiofiles.open(output_file, "a+", encoding="utf-8") as f:
for _ in range(num_pages):
async with client.get("https://en.wikipedia.org/wiki/Special:Random") as response:
if response.status > 399:
# I was getting a 429 Too Many Requests at a higher volume of requests
response.raise_for_status()
page = await response.text()
soup = BeautifulSoup(page, features="html.parser")
title = soup.find("h1").text
await f.write(title + "\t")
await f.write("\n")
We're asynchronously opening both an aiohttp ClientSession and our output file. The mode, a+, means append to the file and create it if it doesn't already exist. Encoding our strings as utf-8 ensures we don't get an error if our titles contain international characters. If we get an error response, we'll raise it instead of continuing (at higher request volumes I was getting a 429 Too Many Requests). We asynchronously get the text of our response, then we parse out the title and asynchronously append it to our file. After appending all of our titles, we append a newline: "\n".
Our next function is the function we'll start with each new process to allow it to run asynchronously:
def start_scraping(num_pages: int, output_file: str, i: int):
""" Starts an async process for requesting and scraping Wikipedia pages """
print(f"Process {i} starting...")
asyncio.run(get_and_scrape_pages(num_pages, output_file))
print(f"Process {i} finished.")
Now for our main function. Let's start with some constants (and our function declaration):
def main():
NUM_PAGES = 100 # Number of pages to scrape altogether
NUM_CORES = cpu_count() # Our number of CPU cores (including logical cores)
OUTPUT_FILE = "./wiki_titles.tsv" # File to append our scraped titles to
PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)
    PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % NUM_CORES  # The final core also takes the remainder
And now the logic:
futures = []
with concurrent.futures.ProcessPoolExecutor(NUM_CORES) as executor:
for i in range(NUM_CORES - 1):
new_future = executor.submit(
start_scraping, # Function to perform
# v Arguments v
num_pages=PAGES_PER_CORE,
output_file=OUTPUT_FILE,
i=i
)
futures.append(new_future)
futures.append(
executor.submit(
start_scraping,
PAGES_FOR_FINAL_CORE, OUTPUT_FILE, NUM_CORES-1
)
)
concurrent.futures.wait(futures)
We create an array to store our futures, then we create a ProcessPoolExecutor, setting its max_workers equal to our number of cores. We iterate over a range equal to our number of cores minus 1, running a new process with our start_scraping function and appending it to our futures list. Our final core will potentially have extra work to do: it scrapes a number of pages equal to each of our other cores, but additionally scrapes a number of pages equal to the remainder we got when dividing our total number of pages to scrape by our total number of CPU cores.
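As a quick sanity check on that division of labor, the per-core page counts always add back up to NUM_PAGES. A small sketch using the example numbers from this article (100 pages, 8 cores):
from math import floor

NUM_PAGES = 100   # Total pages to scrape
NUM_CORES = 8     # Example core count
PAGES_PER_CORE = floor(NUM_PAGES / NUM_CORES)                  # 12
PAGES_FOR_FINAL_CORE = PAGES_PER_CORE + NUM_PAGES % NUM_CORES  # 12 + 4 = 16
# Seven cores scrape 12 pages each; the final core scrapes 16
assert (NUM_CORES - 1) * PAGES_PER_CORE + PAGES_FOR_FINAL_CORE == NUM_PAGES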
Make sure to actually run your main function:
if __name__ == "__main__":
start = time.time()
main()
print(f"Time to complete: {round(time.time() - start, 2)} seconds.")
After running the program with my 8-core CPU (along with the benchmarking code):
This version (asyncio with multiprocessing):
Time to complete: 5.65 seconds.
Just multiprocessing:
Time to complete: 8.87 seconds.
Just asyncio:
Time to complete: 47.92 seconds.
Synchronous:
Time to complete: 88.86 seconds.
I'm actually pretty surprised to see that the improvement of asyncio with multiprocessing over plain multiprocessing wasn't as great as I thought it would be.

async/await and similar syntax also exist in other languages, and in some of those languages its implementation can differ drastically.
The first programming language (back in 2007) to use the async syntax was Microsoft's F#. Rather than using await to wait on a function call, it uses specific syntax like let! and do! along with the proprietary Async functions included in the System module.
You can find more about async programming in F# in Microsoft's F# docs.
Microsoft's C# team then built upon that concept, and that's where the async/await keywords we're now familiar with were born:
using System;
// Allows the "Task" return type
using System.Threading.Tasks;
public class Program
{
// Declare an async function with "async"
private static async Task<string> ReturnHello()
{
return "hello world";
}
// Main can be async -- no problem
public static async Task Main()
{
// await an async string
string result = await ReturnHello();
// Print the string we got asynchronously
Console.WriteLine(result);
}
}
We make sure we're using System.Threading.Tasks, as it includes the Task type, and, in general, the Task type is needed for an async function to be awaited. The neat thing about C# is that you can make your main function asynchronous simply by declaring it with async, and you won't have any issues.
If you're interested in learning more about async/await in C#, Microsoft's C# docs have a good page on it.
First introduced in ES2017 (promises themselves arrived with ES6), the async/await syntax is essentially an abstraction over JavaScript promises (which are similar to Python futures). Unlike Python, however, as long as you're not awaiting it, you can call an async function normally without a special entry point like Python's asyncio.run():
// Declare a function with async
async function returnHello(){
return "hello world";
}
async function printSomething(){
// await an async string
const result = await returnHello();
// print the string we got asynchronously
console.log(result);
}
// Run our async code
printSomething();
See MDN for more info on async/await in JavaScript.
Rust now also allows the use of the async/await syntax, and it works similarly to Python, C#, and JavaScript:
// Allows blocking synchronous code to run async code
use futures::executor::block_on;
// Declare an async function with "async"
async fn return_hello() -> String {
"hello world".to_string()
}
// Code that awaits must also be declared with "async"
async fn print_something(){
// await an async String
let result: String = return_hello().await;
// Print the string we got asynchronously
println!("{0}", result);
}
fn main() {
// Block the current synchronous execution to run our async code
block_on(print_something());
}
In order to use async functions, we must first add futures = "0.3" to our Cargo.toml. We then import the block_on function with use futures::executor::block_on -- block_on is necessary for running our async function from our synchronous main function.
You can find more info on async/await in Rust in the Rust docs.
Rather than the traditional async/await syntax inherent to all the previous languages we've covered, Go uses "goroutines" and "channels." You can think of a channel as being similar to a Python future. In Go, you generally send a channel as an argument to a function, then use go to run the function concurrently. Whenever you need to make sure the function has finished, you use the <- syntax, which you can think of as the more common await syntax. If your goroutine (the function you're running asynchronously) has a return value, it can be grabbed this way.
package main
import "fmt"
// "chan" makes the return value a string channel instead of a string
func returnHello(result chan string){
// Gives our channel a value
result <- "hello world"
}
func main() {
// Creates a string channel
result := make(chan string)
// Starts execution of our goroutine
go returnHello(result)
// Awaits and prints our string
fmt.Println(<- result)
}
Run it in the Go Playground.
For more info on concurrency in Go, check out An Introduction to Programming in Go by Caleb Doxsey.
Similar to Python, Ruby also has the Global Interpreter Lock limitation. What it doesn't have is concurrency built into the language. However, there is a community-created gem that allows concurrency in Ruby, and you can find its source on GitHub.
Like Ruby, Java doesn't have the async/await syntax built in, but it does have concurrency capabilities via the java.util.concurrent package. However, Electronic Arts wrote an Async library that allows the use of await as a method. It's not exactly the same as Python/C#/JavaScript/Rust, but it's worth looking into if you're a Java developer interested in this kind of functionality.
While C++ also doesn't have the async/await syntax, it does have the ability to use futures to run code concurrently via the <future> header:
#include <iostream>
#include <string>
// Necessary for futures
#include <future>
// No async declaration needed
std::string return_hello() {
return "hello world";
}
int main ()
{
// Declares a string future
std::future<std::string> fut = std::async(return_hello);
// Awaits the result of the future
std::string result = fut.get();
// Prints the string we got asynchronously
std::cout << result << '\n';
}
There's no need to declare a function with any keyword to indicate whether or not it can or should be run asynchronously. Instead, you declare your initial future wherever you need it with std::future<{{ function return type }}> and set it equal to std::async(), passing the name of the function you want to run asynchronously along with any arguments it takes -- i.e., std::async(do_something, 1, 2, "string"). To await the value of the future, call .get() on it.
You can find documentation for async in C++ on cplusplus.com.
Whether you're working with asynchronous network or file operations or performing numerous complex calculations, there are a few different ways to maximize your code's efficiency.
If you're using Python, you can use asyncio or threading to make the most out of I/O operations, or the multiprocessing module for CPU-intensive code.
Also remember that the concurrent.futures module can be used in place of threading or multiprocessing.
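As a tiny illustration of that interchangeability, the same code runs on threads or on processes just by swapping the executor class. A sketch (work is just a stand-in function):
import concurrent.futures

def work(n):
    # A stand-in task; any picklable function works with either executor
    return n * n

if __name__ == "__main__":
    # Swap in ProcessPoolExecutor here and nothing else needs to change
    with concurrent.futures.ThreadPoolExecutor() as executor:
        print(list(executor.map(work, range(5))))  # [0, 1, 4, 9, 16]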
If you're using another programming language, chances are there's an implementation of async/await for it too.
Source: https://testdriven.io
1654385580
In this article, we're going to talk about how to build concurrent programs in Golang by combining select, goroutines, and channels.
I'd recommend reading these two articles first to get comfortable with the concepts of concurrency, channels, and goroutines.
Select
From the Tour of Go documentation:
"The select statement lets a goroutine wait on multiple communication operations. A select blocks until one of its cases can run, then it executes that case. It chooses one at random if multiple are ready."
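Before the article's main example, here's a minimal, self-contained sketch of that behavior (my own illustration, not from the article): two goroutines race, and select runs whichever case becomes ready first.
package main

import (
	"fmt"
	"time"
)

func main() {
	fast := make(chan string)
	slow := make(chan string)
	// Two goroutines answer at different speeds.
	go func() { time.Sleep(10 * time.Millisecond); fast <- "fast wins" }()
	go func() { time.Sleep(1 * time.Second); slow <- "slow wins" }()
	// select blocks until one case can run, then executes only that case.
	select {
	case msg := <-fast:
		fmt.Println(msg)
	case msg := <-slow:
		fmt.Println(msg)
	}
}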
API server response
We're going to investigate how we can use select to grab the response from the fastest API call. Let's dive into some code to understand select and its powerful features.
package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"log"
"net/http"
"time"
)
const (
API_KEY = "f32ee7b348msh230c75aaf106721p1366a6jsn952b266f7ae5"
API_GOOGLE_NEWS_HOST = "google-news.p.rapidapi.com"
API_FREE_NEWS_HOST = "free-news.p.rapidapi.com"
GOOGLE_NEWS_URL = "https://google-news.p.rapidapi.com/v1/top_headlines?lang=en&country=US"
FREE_NEWS_URL = "https://free-news.p.rapidapi.com/v1/search?lang=en&q=Elon"
)
var (
google = make(chan News)
free = make(chan News)
)
type Article struct {
Title string `json:"title"`
Link string `json:"link"`
Id string `json:"id"`
MongoId string `json:"_id"`
}
type News struct {
Source string
Articles []*Article `json:"articles"`
}
type Function struct {
f func(news chan<- News)
channel chan News
}
func main() {
functions := []*Function{
{f: googleNews, channel: google},
{f: freeNews, channel: free},
}
quickestApiResponse(functions)
}
func quickestApiResponse(functions []*Function) {
var articles []*Article
for _, function := range functions {
function.Run()
}
select {
case googleNewsResponse := <-google:
fmt.Printf("Source: %s\n", googleNewsResponse.Source)
articles = googleNewsResponse.Articles
case freeNewsReponse := <-free:
fmt.Printf("Source: %s\n", freeNewsReponse.Source)
articles = freeNewsReponse.Articles
}
fmt.Printf("Articles %v\n", articles)
}
func googleNews(google chan<- News) {
req, err := http.NewRequest("GET", GOOGLE_NEWS_URL, nil)
if err != nil {
fmt.Printf("Error initializing request%v\n", err.Error())
return
}
req.Header.Add("X-RapidAPI-Key", API_KEY)
req.Header.Add("X-RapidAPI-Host", API_GOOGLE_NEWS_HOST)
client := &http.Client{}
resp, err := client.Do(req)
if err != nil {
fmt.Printf("Error making request %v\n", err.Error())
return
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
fmt.Printf("Google News Response StatusCode %v Status %v\n", resp.StatusCode, resp.Status)
return
}
googleNewsArticles := News{Source: "GoogleNewsApi"}
if err := json.NewDecoder(resp.Body).Decode(&googleNewsArticles); err != nil {
fmt.Printf("Error decoding body %v\n", err.Error())
return
}
fmt.Printf("Google Articles %v\n", googleNewsArticles)
fmt.Printf("Google Articles Size %d\n", len(googleNewsArticles.Articles))
google <- googleNewsArticles
}
func freeNews(free chan<- News) {
req, err := http.NewRequest("GET", FREE_NEWS_URL, nil)
if err != nil {
fmt.Printf("Error initializing request%v\n", err.Error())
return
}
req.Header.Add("X-RapidAPI-Key", API_KEY)
req.Header.Add("X-RapidAPI-Host", API_FREE_NEWS_HOST)
client := &http.Client{}
resp, err := client.Do(req)
if err != nil {
fmt.Printf("Error making request %v\n", err.Error())
return
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
fmt.Printf("Free News Response StatusCode %v Status %v\n", resp.StatusCode, resp.Status)
return
}
var freeNewsArticles News
if err := json.NewDecoder(resp.Body).Decode(&freeNewsArticles); err != nil {
fmt.Printf("Error decoding body %v\n", err.Error())
return
}
freeNewsArticles.Source = "FreeNewsApi"
fmt.Printf("Free Articles %v\n", freeNewsArticles)
fmt.Printf("Free Articles Size %d\n", len(freeNewsArticles.Articles))
free <- freeNewsArticles
}
func (a *Article) GetId() string {
if a.Id == "" {
return a.MongoId
}
return a.Id
}
func (f *Function) Run() {
go f.f(f.channel)
}
The implementation above focuses on highlighting how a select waits until one of its cases runs.
It's important to understand the different parts of this example, so let's go through them one by one.
Before looking at the select logic, let's examine how the API calls are made.
The Function struct represents a single API call. Its attributes are a function f that takes a channel of type News (notice how this function's signature already enforces that the channel is treated as send-only) and a channel of type News; once the API call has run and the response has been parsed, this channel is used to send the results.
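As a quick aside (a sketch of my own, not the article's code), here is what that send-only signature buys you: inside the function the compiler allows sends but rejects receives.
package main

import "fmt"

// produce can only send on out; the chan<- type makes the compiler reject receives.
func produce(out chan<- string) {
	out <- "sent" // sending is allowed
	// _ = <-out  // uncommenting this fails to compile: receive from send-only channel
}

func main() {
	ch := make(chan string, 1) // a bidirectional channel converts to chan<- automatically
	produce(ch)
	fmt.Println(<-ch)
}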
The News struct is the object that holds the articles and records which source they came from.
In main, we initialize a slice of Function with two elements: the first holds the googleNews function and uses the google channel, and the second holds the freeNews function and uses the free channel.
Since both API calls fetch news, the channels are of the same type, but there is one per function (see the sketch below for a single-channel alternative).
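As a design note: since both channels carry News, a single shared channel would also work. Here's a hedged, self-contained sketch of that fan-in variant, with stand-in sleeps replacing the real API calls:
package main

import (
	"fmt"
	"time"
)

type News struct{ Source string }

func main() {
	shared := make(chan News)
	// Stand-in fetchers: sleeps replace the real API calls.
	go func() { time.Sleep(50 * time.Millisecond); shared <- News{Source: "GoogleNewsApi"} }()
	go func() { time.Sleep(10 * time.Millisecond); shared <- News{Source: "FreeNewsApi"} }()
	winner := <-shared // whichever sends first wins; the loser stays blocked until main exits
	fmt.Printf("Source: %s\n", winner.Source)
}
The trade-off is that with one shared channel you lose the per-source case in the select, which the article's version uses to handle each response separately.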
The googleNews and freeNews functions contain the implementations of these two API calls. Each makes an HTTP request to its respective URL and parses the response; once that's done, the news is sent over its respective channel.
Now let's focus on the quickestApiResponse method. The purpose of this method is to set the articles variable to the response from the fastest API. Each function is run by calling its Run method, which starts a new goroutine for the function and passes it the channel. It's important to note that these API calls have to run in separate goroutines, because we don't want to execute them sequentially.
The select then waits for either the google or the free channel to send a response. Once either API call sends its response over its respective channel, the select executes the code in that case and ignores the other. This effectively sets articles to the response of the fastest API call.
Let's run the program and look at the output:
API server response output (screenshot)
The FreeNewsApi ran faster!
This logic can be applied to many other use cases, allowing a program to run multiple goroutines, use channels to communicate between them, and use select to wait on them.
One more thing we can implement in this example is enforcing some kind of timeout: if the API calls exceed the limit, we leave the articles empty. The code below achieves this by adding one more case to the select.
const (
API_MAX_TIMEOUT = 3 * time.Second
)
func quickestApiResponse(functions []*Function) {
var articles []*Article
for _, function := range functions {
function.Run()
}
select {
case googleNewsResponse := <-google:
fmt.Printf("Source: %s\n", googleNewsResponse.Source)
articles = googleNewsResponse.Articles
case freeNewsReponse := <-free:
fmt.Printf("Source: %s\n", freeNewsReponse.Source)
articles = freeNewsReponse.Articles
case <-time.After(API_MAX_TIMEOUT):
fmt.Println("Time out! API calls took too long!!")
}
fmt.Printf("Articles %v\n", articles)
}
time.After returns a channel of type time.Time and sends the current time on it once the specified duration has passed. Notice that we don't assign this channel's value to a variable here: we don't care about the data the channel sends, only about receiving the signal. If we make both APIs sleep for three seconds, the timeout case executes and the other two cases are ignored.
API server response timeout (screenshot)
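To see that timeout behavior in isolation, here's a small self-contained sketch (stand-in workers instead of the real API calls): both workers sleep past the deadline, so only the time.After case can fire.
package main

import (
	"fmt"
	"time"
)

func main() {
	a := make(chan string)
	b := make(chan string)
	// Both stand-in "API calls" take longer than the one-second timeout.
	go func() { time.Sleep(3 * time.Second); a <- "a" }()
	go func() { time.Sleep(3 * time.Second); b <- "b" }()
	select {
	case r := <-a:
		fmt.Println("got", r)
	case r := <-b:
		fmt.Println("got", r)
	case <-time.After(1 * time.Second):
		// time.After's channel is ready first, so this case runs.
		fmt.Println("Time out! API calls took too long!!")
	}
}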
Running recurring processes
Let's see how we can use select to run a recurring process. For this program we have the following scenario: it must let us pass any function in as the recurring process, together with the time at which the process should start running and the interval between each run.
Below is the initial code; let's take a look:
package chaptereight
import (
"fmt"
"math/rand"
"time"
"github.com/brianvoe/gofakeit/v6"
)
type PendingUserNotifications map[int][]*Notification
type Notification struct {
Content string
UserId int
}
func sendUserBatchNotificationsEmail(userId int, notifications []*Notification) {
fmt.Printf("Sending email to user with userId %d for pending notifications %v\n", userId, notifications)
}
func handlePendingUsersNotifications(pendingNotifications PendingUserNotifications, handler func(userId int, notifications []*Notification)) {
for userId, notifications := range pendingNotifications {
handler(userId, notifications)
delete(pendingNotifications, userId)
}
}
func collectNewUsersNotifications(notifications PendingUserNotifications) {
randomNotifications := getRandomNotifications()
if len(randomNotifications) > 0 {
notifications[randomNotifications[0].UserId] = randomNotifications
}
}
func getRandomNotifications() (notifications []*Notification) {
rand.Seed(time.Now().UnixNano())
userId := rand.Intn(100-10+1) + 10
numOfNotifications := rand.Intn(5-0+1) + 0
fmt.Printf("numOfNotifications %v\n", numOfNotifications)
for i := 0; i < numOfNotifications; i++ {
notifications = append(notifications, &Notification{Content: gofakeit.Paragraph(1, 2, 10, " "), UserId: userId})
}
return
}
The code above reflects the task we want to run. We have two main functions, collectNewUsersNotifications and handlePendingUsersNotifications. The first is meant to collect all new user notifications; the ideal implementation would have this function look up unread notifications in a database, but for the sake of this example we're simulating receiving random notifications for certain users.
Notifications are created with the Notification struct, which has just two fields: one for the content and one for the user ID.
The collection function stores the notifications using the PendingUserNotifications type, a map whose keys are integers representing user IDs and whose values are slices of Notification.
Once all the notifications have been collected, handlePendingUsersNotifications iterates over them and runs a handler function for each user's notifications; after a user's notifications have been handled, they're deleted from the map. The handler we use in this case is sendUserBatchNotificationsEmail, whose purpose is to send the user an email containing all their pending notifications so they can review them.
Now let's focus on how to run this task repeatedly using select. As mentioned before, we need to account for when the process should start and the interval between runs. The code below shows how to achieve this:
package main
import (
"fmt"
"math/rand"
"time"
"github.com/brianvoe/gofakeit/v6"
)
type PendingUserNotifications map[int][]*Notification
type ProcessHandler func()
type Notification struct {
Content string
UserId int
}
type RecurringProcess struct {
name string
interval time.Duration
startTime time.Time
handler func()
stop chan struct{}
}
func main() {
pendingNotificationsProcess()
}
func pendingNotificationsProcess() {
process := &RecurringProcess{}
notifications := PendingUserNotifications{}
handler := func() {
collectNewUsersNotifications(notifications)
handlePendingUsersNotifications(notifications, sendUserBatchNotificationsEmail, process)
}
interval := 10 * time.Second
startTime := time.Now().Add(3 * time.Minute)
process = createRecurringProcess("Pending User Notifications", handler, interval, startTime)
<-process.stop
}
func sendUserBatchNotificationsEmail(userId int, notifications []*Notification) {
fmt.Printf("Sending email to user with userId %d for pending notifications %v\n", userId, notifications)
}
func handlePendingUsersNotifications(pendingNotifications PendingUserNotifications, handler func(userId int, notifications []*Notification), process *RecurringProcess) {
userNotificationCount := 0
for userId, notifications := range pendingNotifications {
userNotificationCount++
handler(userId, notifications)
delete(pendingNotifications, userId)
}
if userNotificationCount == 0 {
process.Cancel()
}
}
func collectNewUsersNotifications(notifications PendingUserNotifications) {
randomNotifications := getRandomNotifications()
if len(randomNotifications) > 0 {
notifications[randomNotifications[0].UserId] = randomNotifications
}
}
func getRandomNotifications() (notifications []*Notification) {
rand.Seed(time.Now().UnixNano())
userId := rand.Intn(100-10+1) + 10
numOfNotifications := rand.Intn(5-0+1) + 0
fmt.Printf("numOfNotifications %v\n", numOfNotifications)
for i := 0; i < numOfNotifications; i++ {
notifications = append(notifications, &Notification{Content: gofakeit.Paragraph(1, 2, 10, " "), UserId: userId})
}
return
}
func createRecurringProcess(name string, handler ProcessHandler, interval time.Duration, startTime time.Time) *RecurringProcess {
process := &RecurringProcess{
name: name,
interval: interval,
startTime: startTime,
handler: handler,
stop: make(chan struct{}),
}
go process.Start()
return process
}
func (p *RecurringProcess) Start() {
startTicker := &time.Timer{}
ticker := &time.Ticker{C: nil}
defer func() { ticker.Stop() }()
if p.startTime.Before(time.Now()) {
p.startTime = time.Now()
}
startTicker = time.NewTimer(time.Until(p.startTime))
for {
select {
case <-startTicker.C:
ticker = time.NewTicker(p.interval)
fmt.Println("Starting recurring process")
p.handler()
case <-ticker.C:
fmt.Println("Next run")
p.handler()
case <-p.stop:
fmt.Println("Stoping recurring process")
return
}
}
}
func (p *RecurringProcess) Cancel() {
close(p.stop)
}
We've introduced a new struct to represent a recurring process, RecurringProcess. It contains the following fields:
- name: the name of the process
- interval: the interval between each run
- startTime: the time at which the process starts
- handler: the handler function to call on each run
- stop: a channel used to stop the process
、新しい定期的なプロセスと通知をそれぞれ30行目と31行目に初期化します。使用するハンドラー関数は、collectNewUsersNotifications
とhandlePendingUsersNotifications
関数の両方を内部に持つ関数です。handlePendingUsersNotifications
ここで、プロセスを停止する必要があるため、プロセスをに渡していることに注意してください。
間隔と開始時間も指定しました。
Next, we call createRecurringProcess. This function creates the recurring process and starts it as well; note that it launches the process in its own goroutine with go process.Start().
Back in pendingNotificationsProcess, we block the main goroutine by reading from the stop channel; this means the main goroutine stays blocked until a message is sent on that channel (or it is closed).
Let's look at the Start function, which contains all the logic for running the recurring process.
This function uses the startTicker variable to kick the recurring process off at its start time; if the start time is in the past, the process starts immediately.
time.NewTimer sends the current time on its channel once the given duration has elapsed, which is what lets us start the process. That's why the first select case waits to receive a signal from that channel.
We also have the ticker variable, which is a time.Ticker. A ticker in Go sends ticks on its channel at the specified interval. Once the startTicker.C channel delivers its signal, we assign ticker a new ticker with our interval and call the handler function.
After this, the ticker starts receiving ticks in the second select case, and every time a tick arrives, the handler function is called as well.
In the last select case, we wait until a signal is sent and stop the process simply by returning.
Notice how the select sits inside an infinite for loop. That's because we want to keep looping until one of the cases explicitly breaks out of it. Every time we receive a tick, the second case runs, then we enter the same loop again and the select once more waits for one of its cases to execute.
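Stripped of the notification logic, the timer-then-ticker pattern described above can be sketched like this (a reduction of the article's Start function, with shortened durations and a fixed run count so it terminates):
package main

import (
	"fmt"
	"time"
)

func main() {
	// Fire once at the delayed start, then tick at a fixed interval.
	start := time.NewTimer(500 * time.Millisecond)
	ticker := &time.Ticker{} // its channel is nil, so that case blocks until the ticker is replaced
	runs := 0
	for runs < 3 {
		select {
		case <-start.C:
			ticker = time.NewTicker(200 * time.Millisecond)
			fmt.Println("Starting recurring process")
		case <-ticker.C:
			fmt.Println("Next run")
			runs++
		}
	}
	ticker.Stop()
}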
To stop the process, we added logic to handlePendingUsersNotifications that counts the user notifications; if there are no pending notifications, the program cancels the process. The Cancel function closes the stop channel, and that ends the program.
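Closing the channel rather than sending on it is the idiomatic broadcast in Go: a receive on a closed channel returns immediately, so every goroutine blocked on the stop channel is released at once. A minimal sketch of that property (not the article's code):
package main

import "fmt"

func main() {
	stop := make(chan struct{})
	go func() { close(stop) }() // cancel by closing rather than sending
	<-stop                      // a receive on a closed channel returns immediately
	fmt.Println("Stopping recurring process")
}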
Let's run the program and see how it works:
Program output (screenshot)
The program works as expected. This is just one example of how to run a recurring process; it could serve as base code for implementing something more complex. You can build elaborate programs with select.
Conclusion
Building concurrent programs can be hard at first, especially if you're struggling to understand how goroutines, channels, and select work.
I hope this article cleared up some of the confusion and left you with a few use cases where you can apply select.
Thanks for reading, and stay tuned.
This story was originally published at https://betterprogramming.pub/concurrency-with-select-goroutines-and-channels-9786e0c6be3c
1654385520
En este artículo, vamos a hablar sobre cómo crear programas simultáneos combinando select, goroutines y canales en Golang.
Recomendaría leer estos dos artículos primero para familiarizarse con los conceptos de simultaneidad, canales y goroutines.
Seleccione
De la documentación del recorrido Go:
“La
select
declaración permite que una rutina go espere en múltiples operaciones de comunicación.A
select
bloquea hasta que se puede ejecutar uno de sus casos, luego ejecuta ese caso. Elige uno al azar si hay varios listos”.
Respuesta del servidor API
Vamos a investigar cómo podemos usar select
para tomar la respuesta de la llamada API más rápida. Sumerjámonos en un poco de código para entender select
y sus poderosas características.
package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"log"
"net/http"
"time"
)
const (
API_KEY = "f32ee7b348msh230c75aaf106721p1366a6jsn952b266f7ae5"
API_GOOGLE_NEWS_HOST = "google-news.p.rapidapi.com"
API_FREE_NEWS_HOST = "free-news.p.rapidapi.com"
GOOGLE_NEWS_URL = "https://google-news.p.rapidapi.com/v1/top_headlines?lang=en&country=US"
FREE_NEWS_URL = "https://free-news.p.rapidapi.com/v1/search?lang=en&q=Elon"
)
var (
google = make(chan News)
free = make(chan News)
)
type Article struct {
Title string `json:"title"`
Link string `json:"link"`
Id string `json:"id"`
MongoId string `json:"_id"`
}
type News struct {
Source string
Articles []*Article `json:"articles"`
}
type Function struct {
f func(news chan<- News)
channel chan News
}
func main() {
functions := []*Function{
{f: googleNews, channel: google},
{f: freeNews, channel: free},
}
quickestApiResponse(functions)
}
func quickestApiResponse(functions []*Function) {
var articles []*Article
for _, function := range functions {
function.Run()
}
select {
case googleNewsResponse := <-google:
fmt.Printf("Source: %s\n", googleNewsResponse.Source)
articles = googleNewsResponse.Articles
case freeNewsReponse := <-free:
fmt.Printf("Source: %s\n", freeNewsReponse.Source)
articles = freeNewsReponse.Articles
}
fmt.Printf("Articles %v\n", articles)
}
func googleNews(google chan<- News) {
req, err := http.NewRequest("GET", GOOGLE_NEWS_URL, nil)
if err != nil {
fmt.Printf("Error initializing request%v\n", err.Error())
return
}
req.Header.Add("X-RapidAPI-Key", API_KEY)
req.Header.Add("X-RapidAPI-Host", API_GOOGLE_NEWS_HOST)
client := &http.Client{}
resp, err := client.Do(req)
if err != nil {
fmt.Printf("Error making request %v\n", err.Error())
return
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
fmt.Printf("Google News Response StatusCode %v Status %v\n", resp.StatusCode, resp.Status)
return
}
googleNewsArticles := News{Source: "GoogleNewsApi"}
if err := json.NewDecoder(resp.Body).Decode(&googleNewsArticles); err != nil {
fmt.Printf("Error decoding body %v\n", err.Error())
return
}
fmt.Printf("Google Articles %v\n", googleNewsArticles)
fmt.Printf("Google Articles Size %d\n", len(googleNewsArticles.Articles))
google <- googleNewsArticles
}
func freeNews(free chan<- News) {
req, err := http.NewRequest("GET", FREE_NEWS_URL, nil)
if err != nil {
fmt.Printf("Error initializing request%v\n", err.Error())
return
}
req.Header.Add("X-RapidAPI-Key", API_KEY)
req.Header.Add("X-RapidAPI-Host", API_FREE_NEWS_HOST)
client := &http.Client{}
resp, err := client.Do(req)
if err != nil {
fmt.Printf("Error making request %v\n", err.Error())
return
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
fmt.Printf("Free News Response StatusCode %v Status %v\n", resp.StatusCode, resp.Status)
return
}
var freeNewsArticles News
if err := json.NewDecoder(resp.Body).Decode(&freeNewsArticles); err != nil {
fmt.Printf("Error decoding body %v\n", err.Error())
return
}
freeNewsArticles.Source = "FreeNewsApi"
fmt.Printf("Free Articles %v\n", freeNewsArticles)
fmt.Printf("Free Articles Size %d\n", len(freeNewsArticles.Articles))
free <- freeNewsArticles
}
func (a *Article) GetId() string {
if a.Id == "" {
return a.MongoId
}
return a.Id
}
func (f *Function) Run() {
go f.f(f.channel)
}
La implementación anterior se enfoca en resaltar cómo una selección esperará hasta que se ejecute uno de sus casos.
Es importante entender las diferentes partes en este ejemplo, así que veámoslas una por una.
Antes de ver la lógica de selección, examinemos cómo se realizan las llamadas a la API.
La Function
estructura representa una sola llamada API, sus atributos son una función f
que toma un canal de tipo News
, observe cómo la firma de esta función ya impone que el canal será tratado como un send-only
canal, el segundo atributo es un canal de tipo News
, una vez que el Se ejecuta la llamada a la API y se analiza la respuesta; este canal se utilizará para enviar los resultados.
La News
estructura es el objeto para contener los artículos y de qué fuente provienen.
En la línea 43, inicializamos un segmento de Function
, con dos elementos, el primero tiene la googleNews
función y usa el google
canal, y el segundo usa la freeNews
función y usa el free
canal.
Dado que ambas llamadas a la API obtendrán noticias, los canales son del mismo tipo, pero uno para cada función.
En las líneas 69 y 102, tenemos las implementaciones de estas dos API. Cada uno hace una solicitud HTTP a sus respectivas URL y analiza la respuesta, una vez hecho esto, las noticias se envían a través de sus respectivos canales.
Centrémonos ahora en el quickestApiResponse
método. El propósito de este método es establecer la variable del artículo en la respuesta de la API más rápida. En la línea 54, cada función se ejecuta llamando al Run
método. Este método inicia una nueva rutina en la función y pasa el canal. Es importante tener en cuenta que estas llamadas API deben ejecutarse en una rutina separada porque no queremos ejecutarlas secuencialmente.
Luego, la selección esperará a que el canal google
o free
envíe una respuesta. Una vez que cualquiera de las llamadas a la API envíe la respuesta a través de su canal respectivo, la selección ejecutará el código en ese caso e ignorará el otro. Esto configurará efectivamente los artículos para la respuesta de la llamada API más rápida.
Let's run the program to see the result:
API server response output
The FreeNewsApi was faster!
This logic can be applied to many other use cases, letting a program run several goroutines, use channels to communicate, and use select to wait on them.
One more thing we can add to this example is some form of timeout: if the API calls exceed the limit, we leave the articles empty. The following code achieves this by adding one more case to the select.
const (
	API_MAX_TIMEOUT = 3 * time.Second
)

func quickestApiResponse(functions []*Function) {
	var articles []*Article
	for _, function := range functions {
		function.Run() // each API call runs in its own goroutine
	}
	select {
	case googleNewsResponse := <-google:
		fmt.Printf("Source: %s\n", googleNewsResponse.Source)
		articles = googleNewsResponse.Articles
	case freeNewsResponse := <-free:
		fmt.Printf("Source: %s\n", freeNewsResponse.Source)
		articles = freeNewsResponse.Articles
	case <-time.After(API_MAX_TIMEOUT):
		// Neither API answered in time; articles stays empty.
		fmt.Println("Time out! API calls took too long!!")
	}
	fmt.Printf("Articles %v\n", articles)
}
time.After returns a receive-only channel of type time.Time and sends the current time on it once the specified duration has elapsed. Notice that we don't assign the value from this channel to a variable; that's because we don't care about the data the channel carries, only about receiving the signal. If we sleep for three seconds in both APIs, we'll see the timeout case run and the other two cases ignored.
API server response timeout output
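One caveat worth knowing (not covered in the original example): each call to time.After allocates a new timer, and in Go versions before 1.23 that timer is not reclaimed until it fires, even after the select has moved on. If this select ran in a loop, an explicit time.NewTimer that you stop yourself would avoid that. A minimal sketch, as a drop-in replacement for the select inside quickestApiResponse above:

	timer := time.NewTimer(API_MAX_TIMEOUT)
	defer timer.Stop() // release the timer even if a response wins the race
	select {
	case googleNewsResponse := <-google:
		articles = googleNewsResponse.Articles
	case freeNewsResponse := <-free:
		articles = freeNewsResponse.Articles
	case <-timer.C:
		fmt.Println("Time out! API calls took too long!!")
	}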
Running Recurring Processes
Let's see how we can use select to run a recurring process. For this program, we have the following scenario:
The program must let us pass in any function as the recurring process, a time at which that process should start running, and a time interval between each run.
Below is the initial code; let's take a look:
package chaptereight

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/brianvoe/gofakeit/v6"
)

type PendingUserNotifications map[int][]*Notification

type Notification struct {
	Content string
	UserId  int
}

func sendUserBatchNotificationsEmail(userId int, notifications []*Notification) {
	fmt.Printf("Sending email to user with userId %d for pending notifications %v\n", userId, notifications)
}

func handlePendingUsersNotifications(pendingNotifications PendingUserNotifications, handler func(userId int, notifications []*Notification)) {
	for userId, notifications := range pendingNotifications {
		handler(userId, notifications)
		delete(pendingNotifications, userId)
	}
}

func collectNewUsersNotifications(notifications PendingUserNotifications) {
	randomNotifications := getRandomNotifications()
	if len(randomNotifications) > 0 {
		notifications[randomNotifications[0].UserId] = randomNotifications
	}
}

func getRandomNotifications() (notifications []*Notification) {
	rand.Seed(time.Now().UnixNano())
	userId := rand.Intn(100-10+1) + 10         // random user id between 10 and 100
	numOfNotifications := rand.Intn(5-0+1) + 0 // random count between 0 and 5
	fmt.Printf("numOfNotifications %v\n", numOfNotifications)
	for i := 0; i < numOfNotifications; i++ {
		notifications = append(notifications, &Notification{Content: gofakeit.Paragraph(1, 2, 10, " "), UserId: userId})
	}
	return
}
The code above reflects the task we want to run. There are two main functions: collectNewUsersNotifications and handlePendingUsersNotifications.
The first is meant to collect all new user notifications. Ideally this function would look up unread notifications in a database, but for the sake of this example we simulate receiving random notifications for certain users.
Notifications are created with the Notification struct, which has just two fields: one for the content and one for the user id.
The collect function uses the PendingUserNotifications type to store the notifications. This type is a map whose key is an integer representing the user id and whose value is a slice of Notification.
After collecting all the notifications, we use the handlePendingUsersNotifications function to iterate over them and run a handler function on each entry. Once each user's notifications have been processed, they are deleted from the map. The handler we use in this case is sendUserBatchNotificationsEmail; its purpose is to send the user an email with all their pending notifications so they can take a look.
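As a quick sanity check, one pass of the task (without any recurrence yet) could be driven like this hypothetical helper; runOnce is not part of the original listing:

// runOnce performs a single collect-and-handle pass over the notifications.
func runOnce() {
	notifications := PendingUserNotifications{}
	collectNewUsersNotifications(notifications)                                     // gather (simulated) unread notifications
	handlePendingUsersNotifications(notifications, sendUserBatchNotificationsEmail) // email each user, then clear the map
}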
Let's now focus on how to run this task on a recurring basis using select. As mentioned before, we have to allow for any function to be the recurring process, a start time for that process, and an interval between runs.
The following code shows how to achieve this:
package main

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/brianvoe/gofakeit/v6"
)

type PendingUserNotifications map[int][]*Notification

type ProcessHandler func()

type Notification struct {
	Content string
	UserId  int
}

type RecurringProcess struct {
	name      string
	interval  time.Duration
	startTime time.Time
	handler   func()
	stop      chan struct{}
}

func main() {
	pendingNotificationsProcess()
}

func pendingNotificationsProcess() {
	process := &RecurringProcess{}
	notifications := PendingUserNotifications{}
	handler := func() {
		collectNewUsersNotifications(notifications)
		handlePendingUsersNotifications(notifications, sendUserBatchNotificationsEmail, process)
	}
	interval := 10 * time.Second
	startTime := time.Now().Add(3 * time.Minute)
	process = createRecurringProcess("Pending User Notifications", handler, interval, startTime)
	// Block the main goroutine until the stop channel is closed.
	<-process.stop
}

func sendUserBatchNotificationsEmail(userId int, notifications []*Notification) {
	fmt.Printf("Sending email to user with userId %d for pending notifications %v\n", userId, notifications)
}

func handlePendingUsersNotifications(pendingNotifications PendingUserNotifications, handler func(userId int, notifications []*Notification), process *RecurringProcess) {
	userNotificationCount := 0
	for userId, notifications := range pendingNotifications {
		userNotificationCount++
		handler(userId, notifications)
		delete(pendingNotifications, userId)
	}
	// Nothing was pending on this run: cancel the recurring process.
	if userNotificationCount == 0 {
		process.Cancel()
	}
}

func collectNewUsersNotifications(notifications PendingUserNotifications) {
	randomNotifications := getRandomNotifications()
	if len(randomNotifications) > 0 {
		notifications[randomNotifications[0].UserId] = randomNotifications
	}
}

func getRandomNotifications() (notifications []*Notification) {
	rand.Seed(time.Now().UnixNano())
	userId := rand.Intn(100-10+1) + 10
	numOfNotifications := rand.Intn(5-0+1) + 0
	fmt.Printf("numOfNotifications %v\n", numOfNotifications)
	for i := 0; i < numOfNotifications; i++ {
		notifications = append(notifications, &Notification{Content: gofakeit.Paragraph(1, 2, 10, " "), UserId: userId})
	}
	return
}

func createRecurringProcess(name string, handler ProcessHandler, interval time.Duration, startTime time.Time) *RecurringProcess {
	process := &RecurringProcess{
		name:      name,
		interval:  interval,
		startTime: startTime,
		handler:   handler,
		stop:      make(chan struct{}),
	}
	go process.Start()
	return process
}

func (p *RecurringProcess) Start() {
	startTicker := &time.Timer{}
	// A receive on a nil channel blocks forever, so the ticker case is
	// inert until the start timer fires and a real ticker is assigned.
	ticker := &time.Ticker{C: nil}
	defer func() { ticker.Stop() }()
	if p.startTime.Before(time.Now()) {
		p.startTime = time.Now()
	}
	startTicker = time.NewTimer(time.Until(p.startTime))
	for {
		select {
		case <-startTicker.C:
			ticker = time.NewTicker(p.interval)
			fmt.Println("Starting recurring process")
			p.handler()
		case <-ticker.C:
			fmt.Println("Next run")
			p.handler()
		case <-p.stop:
			fmt.Println("Stopping recurring process")
			return
		}
	}
}

func (p *RecurringProcess) Cancel() {
	close(p.stop)
}
We introduced a new struct to represent a recurring process, RecurringProcess. It contains the following fields:
- name: the name of the process
- interval: the interval between each run
- startTime: the time at which the process starts
- handler: a handler function to call on each run
- stop: a channel used to stop the process
In the pendingNotificationsProcess function, we initialize a new recurring process and the notifications map. The handler we use is a closure that wraps both collectNewUsersNotifications and handlePendingUsersNotifications. Notice that we pass the process into handlePendingUsersNotifications, because it will be needed to stop the process.
We also specify the interval and the start time.
Then we call createRecurringProcess; this function builds the recurring process and also starts it, spinning up a goroutine that runs Start.
Back in pendingNotificationsProcess, we block the main goroutine by reading from the stop channel, so the main goroutine stays blocked until that channel delivers (here, until it is closed).
Let's take a look at the Start method, which contains all the logic for running the recurring process.
It uses the startTicker variable to kick the process off at the start time; if the start time has already passed, the process begins immediately.
time.NewTimer sends the current time on its channel once the specified duration has elapsed, and this is what lets us start the process. That's why the first case of the select waits for this channel to receive the signal.
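The distinction matters here: a time.Timer fires once, while a time.Ticker keeps firing at its interval. A minimal standalone illustration (not part of the program above):

package main

import (
	"fmt"
	"time"
)

func main() {
	timer := time.NewTimer(1 * time.Second)   // fires exactly once
	ticker := time.NewTicker(1 * time.Second) // fires every second
	defer ticker.Stop()

	<-timer.C
	fmt.Println("timer fired (it will not fire again)")
	for i := 0; i < 3; i++ {
		<-ticker.C
		fmt.Println("tick", i+1)
	}
}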
We also have a ticker variable, which is a time.Ticker. A running ticker sends ticks on its channel at the specified interval. Once the startTicker.C channel sends its signal, we assign a new ticker built with the interval to the ticker variable and also call the handler function.
After that, ticker starts receiving ticks in the second select case, and each time it receives one, the handler function is called again.
In the last case of the select, we wait for the stop signal and end the process by simply returning.
Notice how the select sits inside an infinite for loop. That's because we want to keep looping until one of the cases explicitly breaks out: each time a tick arrives, the second case runs, and then we re-enter the loop, where the select again waits for one of its cases to fire.
To stop the process, we added some logic to handlePendingUsersNotifications: we count the users with notifications, and if at any point there were none pending, the program cancels the process. The Cancel method closes the stop channel, and this makes the program terminate.
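Closing the channel (rather than sending a single value on it) works here because a receive on a closed channel returns immediately with the zero value, so one close acts as a broadcast: both the select inside Start and the <-process.stop in pendingNotificationsProcess unblock at once. A tiny standalone illustration:

package main

import (
	"fmt"
	"sync"
)

func main() {
	stop := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-stop // every receiver unblocks when stop is closed
			fmt.Println("goroutine", id, "stopped")
		}(i)
	}
	close(stop) // one close releases all waiting receivers
	wg.Wait()
}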
Let's run the program and see how it works:
Program output
Great, the program works as expected. This is just one example of running a recurring process; it can serve as the base code for something more complex. You can build sophisticated programs with select.
Conclusion
Writing concurrent programs can be challenging at first, especially if you struggle to understand how goroutines, channels, and select work.
I hope this article has left you less confused and that you have found some use cases where select fits.
Thanks for reading, and stay tuned for more.
This story was originally published at https://betterprogramming.pub/concurrency-with-select-goroutines-and-channels-9786e0c6be3c