1659571020
A Java native language binding for Hoverfly, a Go proxy which allows you to simulate http services in your unit tests. Another term for this is Service Virtualisation.
Full documentation is available here
<dependency>
<groupId>io.specto</groupId>
<artifactId>hoverfly-java</artifactId>
<version>0.14.0</version>
<scope>test</scope>
</dependency>
// Capture and output HTTP traffic to json file
@ClassRule
public static HoverflyRule hoverflyRule = HoverflyRule.inCaptureMode("simulation.json");
// After the capturing, switch to inSimulationMode to spin up a stub server
@ClassRule
public static HoverflyRule hoverflyRule = HoverflyRule.inSimulationMode(defaultPath("simulation.json"));
// Or you can use both approaches at once. If json file not present in capture mode, if present in simulation mode
@ClassRule
public static HoverflyRule hoverflyRule = HoverflyRule.inCaptureOrSimulationMode("simulation.json");
@ClassRule
public static HoverflyRule hoverflyRule = HoverflyRule.inSimulationMode(dsl(
service("www.my-test.com")
.get("/api/bookings/1")
.willReturn(created("http://localhost/api/bookings/1"))
));
@Test
public void shouldBeAbleToGetABookingUsingHoverfly() {
// When
final ResponseEntity<String> getBookingResponse = restTemplate.getForEntity("http://www.my-test.com/api/bookings/1", String.class);
// Then
assertEquals(bookFlightResponse.getStatusCode(), CREATED);
assertEquals(bookFlightResponse.getHeaders().getLocation(), "http://localhost/api/bookings/1");
}
Some code examples for the DSL are available here.
More code examples for the DSL using request matchers can be found here.
// Verify that at least one request to a specific endpoint with any query params
hoverflyRule.verify(service(matches("*.flight.*")).get("/api/bookings").anyQueryParams(), atLeastOnce());
// Verify that an external service/dependency was not called
hoverflyRule.verifyZeroRequestTo(service(matches("*.flight.*")));
// Verify all the stubbed requests were made at least once
hoverflyRule.verifyAll();
Contributions are welcome!
To submit a pull request you should fork the Hoverfly-Java repository, and make your change on a feature branch of your fork.
As of v0.10.2
, hoverfly binaries are no longer stored in the repository. You should run ./gradlew clean test
once to cache the binaries for development with your IDE.
If you have forked this project prior to v0.10.2
, please re-fork to get a slimmer version of the repository.
Feel free to raise an issues on Github.
Author: SpectoLabs
Source code: https://github.com/SpectoLabs/hoverfly-java
License: Apache-2.0, Unknown licenses found
#java #testing
1621492530
It is time to learn new test frameworks in 2021 to improve your code quality and decrease the time of your testing phase. Let’s explore 6 options for devs.
It is time to learn new test frameworks to improve your code quality and decrease the time of your testing phase. I have selected six testing frameworks that sound promising. Some have existed for quite a long time but I have not heard about them before.
At the end of the article, please tell me what you think about them and what your favorite ones are.
Robot Framework is a generic open-source automation framework. It can be used for test automation and robotic process automation (RPA).
Robot Framework is open and extensible and can be integrated with virtually any other tool to create powerful and flexible automation solutions. Being open-source also means that Robot Framework is free to use without licensing costs.
The RoboFramework is a framework** to write test cases and automation processes.** It means that it may replace** your classic combo Selenium + Cucumber + Gherkins**. To be more precise, the Cucumber Gherkins custom implementation you wrote will be handled by RoboFramework and Selenium invoked below.
For the Java developers, this framework can be executed with Maven or Gradle (but less mature for the latter solution).
#java #testing #test #java framework #java frameworks #testing and developing #java testing #robot framework #test framework #2021
1600135200
OpenJDk or Open Java Development Kit is a free, open-source framework of the Java Platform, Standard Edition (or Java SE). It contains the virtual machine, the Java Class Library, and the Java compiler. The difference between the Oracle OpenJDK and Oracle JDK is that OpenJDK is a source code reference point for the open-source model. Simultaneously, the Oracle JDK is a continuation or advanced model of the OpenJDK, which is not open source and requires a license to use.
In this article, we will be installing OpenJDK on Centos 8.
#tutorials #alternatives #centos #centos 8 #configuration #dnf #frameworks #java #java development kit #java ee #java environment variables #java framework #java jdk #java jre #java platform #java sdk #java se #jdk #jre #open java development kit #open source #openjdk #openjdk 11 #openjdk 8 #openjdk runtime environment
1648667160
🚀 LaravelS is
an out-of-the-box adapter
between Swoole and Laravel/Lumen.
Please Watch
this repository to get the latest updates.
_ _ _____
| | | |/ ____|
| | __ _ _ __ __ ___ _____| | (___
| | / _` | '__/ _` \ \ / / _ \ |\___ \
| |___| (_| | | | (_| |\ V / __/ |____) |
|______\__,_|_| \__,_| \_/ \___|_|_____/
Built-in Http/WebSocket server
Memory resident
Gracefully reload
Automatically reload after modifying code
Support Laravel/Lumen both, good compatibility
Simple & Out of the box
Which is the fastest web framework?
TechEmpower Framework Benchmarks
Dependency | Requirement |
---|---|
PHP | >= 5.5.9 Recommend PHP7+ |
Swoole | >= 1.7.19 No longer support PHP5 since 2.0.12 Recommend 4.5.0+ |
Laravel/Lumen | >= 5.1 Recommend 8.0+ |
1.Require package via Composer(packagist).
composer require "hhxsv5/laravel-s:~3.7.0" -vvv
# Make sure that your composer.lock file is under the VCS
2.Register service provider(pick one of two).
Laravel
: in config/app.php
file, Laravel 5.5+ supports package discovery automatically, you should skip this step
'providers' => [
//...
Hhxsv5\LaravelS\Illuminate\LaravelSServiceProvider::class,
],
Lumen
: in bootstrap/app.php
file
$app->register(Hhxsv5\LaravelS\Illuminate\LaravelSServiceProvider::class);
3.Publish configuration and binaries.
After upgrading LaravelS, you need to republish; click here to see the change notes of each version.
php artisan laravels publish
# Configuration: config/laravels.php
# Binary: bin/laravels bin/fswatch bin/inotify
4.Change config/laravels.php
: listen_ip, listen_port, refer Settings.
5.Performance tuning
Number of Workers: LaravelS uses Swoole's Synchronous IO
mode, the larger the worker_num
setting, the better the concurrency performance, but it will cause more memory usage and process switching overhead. If one request takes 100ms, in order to provide 1000QPS concurrency, at least 100 Worker processes need to be configured. The calculation method is: worker_num = 1000QPS/(1s/1ms) = 100, so incremental pressure testing is needed to calculate the best worker_num
.
Please read the notices carefully before running
, Important notices(IMPORTANT).
php bin/laravels {start|stop|restart|reload|info|help}
.Command | Description |
---|---|
start | Start LaravelS, list the processes by "ps -ef|grep laravels" |
stop | Stop LaravelS, and trigger the method onStop of Custom process |
restart | Restart LaravelS: Stop gracefully before starting; The service is unavailable until startup is complete |
reload | Reload all Task/Worker/Timer processes which contain your business codes, and trigger the method onReload of Custom process, CANNOT reload Master/Manger processes. After modifying config/laravels.php , you only have to call restart to restart |
info | Display component version information |
help | Display help information |
start
and restart
.Option | Description |
---|---|
-d|--daemonize | Run as a daemon, this option will override the swoole.daemonize setting in laravels.php |
-e|--env | The environment the command should run under, such as --env=testing will use the configuration file .env.testing firstly, this feature requires Laravel 5.2+ |
-i|--ignore | Ignore checking PID file of Master process |
-x|--x-version | The version(branch) of the current project, stored in $_ENV/$_SERVER, access via $_ENV['X_VERSION'] $_SERVER['X_VERSION'] $request->server->get('X_VERSION') |
Runtime
files: start
will automatically execute php artisan laravels config
and generate these files, developers generally don't need to pay attention to them, it's recommended to add them to .gitignore
.File | Description |
---|---|
storage/laravels.conf | LaravelS's runtime configuration file |
storage/laravels.pid | PID file of Master process |
storage/laravels-timer-process.pid | PID file of the Timer process |
storage/laravels-custom-processes.pid | PID file of all custom processes |
It is recommended to supervise the main process through Supervisord, the premise is without option
-d
and to setswoole.daemonize
tofalse
.
[program:laravel-s-test]
directory=/var/www/laravel-s-test
command=/usr/local/bin/php bin/laravels start -i
numprocs=1
autostart=true
autorestart=true
startretries=3
user=www-data
redirect_stderr=true
stdout_logfile=/var/log/supervisor/%(program_name)s.log
Demo.
gzip on;
gzip_min_length 1024;
gzip_comp_level 2;
gzip_types text/plain text/css text/javascript application/json application/javascript application/x-javascript application/xml application/x-httpd-php image/jpeg image/gif image/png font/ttf font/otf image/svg+xml;
gzip_vary on;
gzip_disable "msie6";
upstream swoole {
# Connect IP:Port
server 127.0.0.1:5200 weight=5 max_fails=3 fail_timeout=30s;
# Connect UnixSocket Stream file, tips: put the socket file in the /dev/shm directory to get better performance
#server unix:/yourpath/laravel-s-test/storage/laravels.sock weight=5 max_fails=3 fail_timeout=30s;
#server 192.168.1.1:5200 weight=3 max_fails=3 fail_timeout=30s;
#server 192.168.1.2:5200 backup;
keepalive 16;
}
server {
listen 80;
# Don't forget to bind the host
server_name laravels.com;
root /yourpath/laravel-s-test/public;
access_log /yourpath/log/nginx/$server_name.access.log main;
autoindex off;
index index.html index.htm;
# Nginx handles the static resources(recommend enabling gzip), LaravelS handles the dynamic resource.
location / {
try_files $uri @laravels;
}
# Response 404 directly when request the PHP file, to avoid exposing public/*.php
#location ~* \.php$ {
# return 404;
#}
location @laravels {
# proxy_connect_timeout 60s;
# proxy_send_timeout 60s;
# proxy_read_timeout 120s;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Real-PORT $remote_port;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_set_header Scheme $scheme;
proxy_set_header Server-Protocol $server_protocol;
proxy_set_header Server-Name $server_name;
proxy_set_header Server-Addr $server_addr;
proxy_set_header Server-Port $server_port;
# "swoole" is the upstream
proxy_pass http://swoole;
}
}
LoadModule proxy_module /yourpath/modules/mod_proxy.so
LoadModule proxy_balancer_module /yourpath/modules/mod_proxy_balancer.so
LoadModule lbmethod_byrequests_module /yourpath/modules/mod_lbmethod_byrequests.so
LoadModule proxy_http_module /yourpath/modules/mod_proxy_http.so
LoadModule slotmem_shm_module /yourpath/modules/mod_slotmem_shm.so
LoadModule rewrite_module /yourpath/modules/mod_rewrite.so
LoadModule remoteip_module /yourpath/modules/mod_remoteip.so
LoadModule deflate_module /yourpath/modules/mod_deflate.so
<IfModule deflate_module>
SetOutputFilter DEFLATE
DeflateCompressionLevel 2
AddOutputFilterByType DEFLATE text/html text/plain text/css text/javascript application/json application/javascript application/x-javascript application/xml application/x-httpd-php image/jpeg image/gif image/png font/ttf font/otf image/svg+xml
</IfModule>
<VirtualHost *:80>
# Don't forget to bind the host
ServerName www.laravels.com
ServerAdmin hhxsv5@sina.com
DocumentRoot /yourpath/laravel-s-test/public;
DirectoryIndex index.html index.htm
<Directory "/">
AllowOverride None
Require all granted
</Directory>
RemoteIPHeader X-Forwarded-For
ProxyRequests Off
ProxyPreserveHost On
<Proxy balancer://laravels>
BalancerMember http://192.168.1.1:5200 loadfactor=7
#BalancerMember http://192.168.1.2:5200 loadfactor=3
#BalancerMember http://192.168.1.3:5200 loadfactor=1 status=+H
ProxySet lbmethod=byrequests
</Proxy>
#ProxyPass / balancer://laravels/
#ProxyPassReverse / balancer://laravels/
# Apache handles the static resources, LaravelS handles the dynamic resource.
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f
RewriteRule ^/(.*)$ balancer://laravels%{REQUEST_URI} [P,L]
ErrorLog ${APACHE_LOG_DIR}/www.laravels.com.error.log
CustomLog ${APACHE_LOG_DIR}/www.laravels.com.access.log combined
</VirtualHost>
The Listening address of WebSocket Sever is the same as Http Server.
1.Create WebSocket Handler class, and implement interface WebSocketHandlerInterface
.The instant is automatically instantiated when start, you do not need to manually create it.
namespace App\Services;
use Hhxsv5\LaravelS\Swoole\WebSocketHandlerInterface;
use Swoole\Http\Request;
use Swoole\Http\Response;
use Swoole\WebSocket\Frame;
use Swoole\WebSocket\Server;
/**
* @see https://www.swoole.co.uk/docs/modules/swoole-websocket-server
*/
class WebSocketService implements WebSocketHandlerInterface
{
// Declare constructor without parameters
public function __construct()
{
}
// public function onHandShake(Request $request, Response $response)
// {
// Custom handshake: https://www.swoole.co.uk/docs/modules/swoole-websocket-server-on-handshake
// The onOpen event will be triggered automatically after a successful handshake
// }
public function onOpen(Server $server, Request $request)
{
// Before the onOpen event is triggered, the HTTP request to establish the WebSocket has passed the Laravel route,
// so Laravel's Request, Auth information are readable, Session is readable and writable, but only in the onOpen event.
// \Log::info('New WebSocket connection', [$request->fd, request()->all(), session()->getId(), session('xxx'), session(['yyy' => time()])]);
// The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
$server->push($request->fd, 'Welcome to LaravelS');
}
public function onMessage(Server $server, Frame $frame)
{
// \Log::info('Received message', [$frame->fd, $frame->data, $frame->opcode, $frame->finish]);
// The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
$server->push($frame->fd, date('Y-m-d H:i:s'));
}
public function onClose(Server $server, $fd, $reactorId)
{
// The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
}
}
2.Modify config/laravels.php
.
// ...
'websocket' => [
'enable' => true, // Note: set enable to true
'handler' => \App\Services\WebSocketService::class,
],
'swoole' => [
//...
// Must set dispatch_mode in (2, 4, 5), see https://www.swoole.co.uk/docs/modules/swoole-server/configuration
'dispatch_mode' => 2,
//...
],
// ...
3.Use SwooleTable
to bind FD & UserId, optional, Swoole Table Demo. Also you can use the other global storage services, like Redis/Memcached/MySQL, but be careful that FD will be possible conflicting between multiple Swoole Servers
.
4.Cooperate with Nginx (Recommended)
Refer WebSocket Proxy
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
upstream swoole {
# Connect IP:Port
server 127.0.0.1:5200 weight=5 max_fails=3 fail_timeout=30s;
# Connect UnixSocket Stream file, tips: put the socket file in the /dev/shm directory to get better performance
#server unix:/yourpath/laravel-s-test/storage/laravels.sock weight=5 max_fails=3 fail_timeout=30s;
#server 192.168.1.1:5200 weight=3 max_fails=3 fail_timeout=30s;
#server 192.168.1.2:5200 backup;
keepalive 16;
}
server {
listen 80;
# Don't forget to bind the host
server_name laravels.com;
root /yourpath/laravel-s-test/public;
access_log /yourpath/log/nginx/$server_name.access.log main;
autoindex off;
index index.html index.htm;
# Nginx handles the static resources(recommend enabling gzip), LaravelS handles the dynamic resource.
location / {
try_files $uri @laravels;
}
# Response 404 directly when request the PHP file, to avoid exposing public/*.php
#location ~* \.php$ {
# return 404;
#}
# Http and WebSocket are concomitant, Nginx identifies them by "location"
# !!! The location of WebSocket is "/ws"
# Javascript: var ws = new WebSocket("ws://laravels.com/ws");
location =/ws {
# proxy_connect_timeout 60s;
# proxy_send_timeout 60s;
# proxy_read_timeout: Nginx will close the connection if the proxied server does not send data to Nginx in 60 seconds; At the same time, this close behavior is also affected by heartbeat setting of Swoole.
# proxy_read_timeout 60s;
proxy_http_version 1.1;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Real-PORT $remote_port;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_set_header Scheme $scheme;
proxy_set_header Server-Protocol $server_protocol;
proxy_set_header Server-Name $server_name;
proxy_set_header Server-Addr $server_addr;
proxy_set_header Server-Port $server_port;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_pass http://swoole;
}
location @laravels {
# proxy_connect_timeout 60s;
# proxy_send_timeout 60s;
# proxy_read_timeout 60s;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Real-PORT $remote_port;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_set_header Scheme $scheme;
proxy_set_header Server-Protocol $server_protocol;
proxy_set_header Server-Name $server_name;
proxy_set_header Server-Addr $server_addr;
proxy_set_header Server-Port $server_port;
proxy_pass http://swoole;
}
}
5.Heartbeat setting
Heartbeat setting of Swoole
// config/laravels.php
'swoole' => [
//...
// All connections are traversed every 60 seconds. If a connection does not send any data to the server within 600 seconds, the connection will be forced to close.
'heartbeat_idle_time' => 600,
'heartbeat_check_interval' => 60,
//...
],
Proxy read timeout of Nginx
# Nginx will close the connection if the proxied server does not send data to Nginx in 60 seconds
proxy_read_timeout 60s;
6.Push data in controller
namespace App\Http\Controllers;
class TestController extends Controller
{
public function push()
{
$fd = 1; // Find fd by userId from a map [userId=>fd].
/**@var \Swoole\WebSocket\Server $swoole */
$swoole = app('swoole');
$success = $swoole->push($fd, 'Push data to fd#1 in Controller');
var_dump($success);
}
}
Usually, you can reset/destroy some
global/static
variables, or change the currentRequest/Response
object.
laravels.received_request
After LaravelS parsed Swoole\Http\Request
to Illuminate\Http\Request
, before Laravel's Kernel handles this request.
// Edit file `app/Providers/EventServiceProvider.php`, add the following code into method `boot`
// If no variable $events, you can also call Facade \Event::listen().
$events->listen('laravels.received_request', function (\Illuminate\Http\Request $req, $app) {
$req->query->set('get_key', 'hhxsv5');// Change query of request
$req->request->set('post_key', 'hhxsv5'); // Change post of request
});
laravels.generated_response
After Laravel's Kernel handled the request, before LaravelS parses Illuminate\Http\Response
to Swoole\Http\Response
.
// Edit file `app/Providers/EventServiceProvider.php`, add the following code into method `boot`
// If no variable $events, you can also call Facade \Event::listen().
$events->listen('laravels.generated_response', function (\Illuminate\Http\Request $req, \Symfony\Component\HttpFoundation\Response $rsp, $app) {
$rsp->headers->set('header-key', 'hhxsv5');// Change header of response
});
This feature depends on
AsyncTask
ofSwoole
, your need to setswoole.task_worker_num
inconfig/laravels.php
firstly. The performance of asynchronous event processing is influenced by number of Swoole task process, you need to set task_worker_num appropriately.
1.Create event class.
use Hhxsv5\LaravelS\Swoole\Task\Event;
class TestEvent extends Event
{
protected $listeners = [
// Listener list
TestListener1::class,
// TestListener2::class,
];
private $data;
public function __construct($data)
{
$this->data = $data;
}
public function getData()
{
return $this->data;
}
}
2.Create listener class.
use Hhxsv5\LaravelS\Swoole\Task\Task;
use Hhxsv5\LaravelS\Swoole\Task\Listener;
class TestListener1 extends Listener
{
/**
* @var TestEvent
*/
protected $event;
public function handle()
{
\Log::info(__CLASS__ . ':handle start', [$this->event->getData()]);
sleep(2);// Simulate the slow codes
// Deliver task in CronJob, but NOT support callback finish() of task.
// Note: Modify task_ipc_mode to 1 or 2 in config/laravels.php, see https://www.swoole.co.uk/docs/modules/swoole-server/configuration
$ret = Task::deliver(new TestTask('task data'));
var_dump($ret);
// The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
}
}
3.Fire event.
// Create instance of event and fire it, "fire" is asynchronous.
use Hhxsv5\LaravelS\Swoole\Task\Event;
$event = new TestEvent('event data');
// $event->delay(10); // Delay 10 seconds to fire event
// $event->setTries(3); // When an error occurs, try 3 times in total
$success = Event::fire($event);
var_dump($success);// Return true if sucess, otherwise false
This feature depends on
AsyncTask
ofSwoole
, your need to setswoole.task_worker_num
inconfig/laravels.php
firstly. The performance of task processing is influenced by number of Swoole task process, you need to set task_worker_num appropriately.
1.Create task class.
use Hhxsv5\LaravelS\Swoole\Task\Task;
class TestTask extends Task
{
private $data;
private $result;
public function __construct($data)
{
$this->data = $data;
}
// The logic of task handling, run in task process, CAN NOT deliver task
public function handle()
{
\Log::info(__CLASS__ . ':handle start', [$this->data]);
sleep(2);// Simulate the slow codes
// The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
$this->result = 'the result of ' . $this->data;
}
// Optional, finish event, the logic of after task handling, run in worker process, CAN deliver task
public function finish()
{
\Log::info(__CLASS__ . ':finish start', [$this->result]);
Task::deliver(new TestTask2('task2 data')); // Deliver the other task
}
}
2.Deliver task.
// Create instance of TestTask and deliver it, "deliver" is asynchronous.
use Hhxsv5\LaravelS\Swoole\Task\Task;
$task = new TestTask('task data');
// $task->delay(3);// delay 3 seconds to deliver task
// $task->setTries(3); // When an error occurs, try 3 times in total
$ret = Task::deliver($task);
var_dump($ret);// Return true if sucess, otherwise false
Wrapper cron job base on Swoole's Millisecond Timer, replace
Linux
Crontab
.
1.Create cron job class.
namespace App\Jobs\Timer;
use App\Tasks\TestTask;
use Swoole\Coroutine;
use Hhxsv5\LaravelS\Swoole\Task\Task;
use Hhxsv5\LaravelS\Swoole\Timer\CronJob;
class TestCronJob extends CronJob
{
protected $i = 0;
// !!! The `interval` and `isImmediate` of cron job can be configured in two ways(pick one of two): one is to overload the corresponding method, and the other is to pass parameters when registering cron job.
// --- Override the corresponding method to return the configuration: begin
public function interval()
{
return 1000;// Run every 1000ms
}
public function isImmediate()
{
return false;// Whether to trigger `run` immediately after setting up
}
// --- Override the corresponding method to return the configuration: end
public function run()
{
\Log::info(__METHOD__, ['start', $this->i, microtime(true)]);
// do something
// sleep(1); // Swoole < 2.1
Coroutine::sleep(1); // Swoole>=2.1 Coroutine will be automatically created for run().
$this->i++;
\Log::info(__METHOD__, ['end', $this->i, microtime(true)]);
if ($this->i >= 10) { // Run 10 times only
\Log::info(__METHOD__, ['stop', $this->i, microtime(true)]);
$this->stop(); // Stop this cron job, but it will run again after restart/reload.
// Deliver task in CronJob, but NOT support callback finish() of task.
// Note: Modify task_ipc_mode to 1 or 2 in config/laravels.php, see https://www.swoole.co.uk/docs/modules/swoole-server/configuration
$ret = Task::deliver(new TestTask('task data'));
var_dump($ret);
}
// The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
}
}
2.Register cron job.
// Register cron jobs in file "config/laravels.php"
[
// ...
'timer' => [
'enable' => true, // Enable Timer
'jobs' => [ // The list of cron job
// Enable LaravelScheduleJob to run `php artisan schedule:run` every 1 minute, replace Linux Crontab
// \Hhxsv5\LaravelS\Illuminate\LaravelScheduleJob::class,
// Two ways to configure parameters:
// [\App\Jobs\Timer\TestCronJob::class, [1000, true]], // Pass in parameters when registering
\App\Jobs\Timer\TestCronJob::class, // Override the corresponding method to return the configuration
],
'max_wait_time' => 5, // Max waiting time of reloading
// Enable the global lock to ensure that only one instance starts the timer when deploying multiple instances. This feature depends on Redis, please see https://laravel.com/docs/7.x/redis
'global_lock' => false,
'global_lock_key' => config('app.name', 'Laravel'),
],
// ...
];
3.Note: it will launch multiple timers when build the server cluster, so you need to make sure that launch one timer only to avoid running repetitive task.
4.LaravelS v3.4.0
starts to support the hot restart [Reload] Timer
process. After LaravelS receives the SIGUSR1
signal, it waits for max_wait_time
(default 5) seconds to end the process, then the Manager
process will pull up the Timer
process again.
5.If you only need to use minute-level
scheduled tasks, it is recommended to enable Hhxsv5\LaravelS\Illuminate\LaravelScheduleJob
instead of Linux Crontab, so that you can follow the coding habits of Laravel task scheduling and configure Kernel
.
// app/Console/Kernel.php
protected function schedule(Schedule $schedule)
{
// runInBackground() will start a new child process to execute the task. This is asynchronous and will not affect the execution timing of other tasks.
$schedule->command(TestCommand::class)->runInBackground()->everyMinute();
}
Via inotify
, support Linux only.
1.Install inotify extension.
2.Turn on the switch in Settings.
3.Notice: Modify the file only in Linux
to receive the file change events. It's recommended to use the latest Docker. Vagrant Solution.
Via fswatch
, support OS X/Linux/Windows.
1.Install fswatch.
2.Run command in your project root directory.
# Watch current directory
./bin/fswatch
# Watch app directory
./bin/fswatch ./app
Via inotifywait
, support Linux.
1.Install inotify-tools.
2.Run command in your project root directory.
# Watch current directory
./bin/inotify
# Watch app directory
./bin/inotify ./app
When the above methods does not work, the ultimate solution: set max_request=1,worker_num=1
, so that Worker
process will restart after processing a request. The performance of this method is very poor, so only development environment use
.
SwooleServer
in your project/**
* $swoole is the instance of `Swoole\WebSocket\Server` if enable WebSocket server, otherwise `Swoole\Http\Server`
* @var \Swoole\WebSocket\Server|\Swoole\Http\Server $swoole
*/
$swoole = app('swoole');
var_dump($swoole->stats());
$swoole->push($fd, 'Push WebSocket message');
SwooleTable
1.Define Table, support multiple.
All defined tables will be created before Swoole starting.
// in file "config/laravels.php"
[
// ...
'swoole_tables' => [
// Scene:bind UserId & FD in WebSocket
'ws' => [// The Key is table name, will add suffix "Table" to avoid naming conflicts. Here defined a table named "wsTable"
'size' => 102400,// The max size
'column' => [// Define the columns
['name' => 'value', 'type' => \Swoole\Table::TYPE_INT, 'size' => 8],
],
],
//...Define the other tables
],
// ...
];
2.Access Table
: all table instances will be bound on SwooleServer
, access by app('swoole')->xxxTable
.
namespace App\Services;
use Hhxsv5\LaravelS\Swoole\WebSocketHandlerInterface;
use Swoole\Http\Request;
use Swoole\WebSocket\Frame;
use Swoole\WebSocket\Server;
class WebSocketService implements WebSocketHandlerInterface
{
/**@var \Swoole\Table $wsTable */
private $wsTable;
public function __construct()
{
$this->wsTable = app('swoole')->wsTable;
}
// Scene:bind UserId & FD in WebSocket
public function onOpen(Server $server, Request $request)
{
// var_dump(app('swoole') === $server);// The same instance
/**
* Get the currently logged in user
* This feature requires that the path to establish a WebSocket connection go through middleware such as Authenticate.
* E.g:
* Browser side: var ws = new WebSocket("ws://127.0.0.1:5200/ws");
* Then the /ws route in Laravel needs to add the middleware like Authenticate.
* Route::get('/ws', function () {
* // Respond any content with status code 200
* return 'websocket';
* })->middleware(['auth']);
*/
// $user = Auth::user();
// $userId = $user ? $user->id : 0; // 0 means a guest user who is not logged in
$userId = mt_rand(1000, 10000);
// if (!$userId) {
// // Disconnect the connections of unlogged users
// $server->disconnect($request->fd);
// return;
// }
$this->wsTable->set('uid:' . $userId, ['value' => $request->fd]);// Bind map uid to fd
$this->wsTable->set('fd:' . $request->fd, ['value' => $userId]);// Bind map fd to uid
$server->push($request->fd, "Welcome to LaravelS #{$request->fd}");
}
public function onMessage(Server $server, Frame $frame)
{
// Broadcast
foreach ($this->wsTable as $key => $row) {
if (strpos($key, 'uid:') === 0 && $server->isEstablished($row['value'])) {
$content = sprintf('Broadcast: new message "%s" from #%d', $frame->data, $frame->fd);
$server->push($row['value'], $content);
}
}
}
public function onClose(Server $server, $fd, $reactorId)
{
$uid = $this->wsTable->get('fd:' . $fd);
if ($uid !== false) {
$this->wsTable->del('uid:' . $uid['value']); // Unbind uid map
}
$this->wsTable->del('fd:' . $fd);// Unbind fd map
$server->push($fd, "Goodbye #{$fd}");
}
}
For more information, please refer to Swoole Server AddListener
To make our main server support more protocols not just Http and WebSocket, we bring the feature multi-port mixed protocol
of Swoole in LaravelS and name it Socket
. Now, you can build TCP/UDP
applications easily on top of Laravel.
Create Socket
handler class, and extend Hhxsv5\LaravelS\Swoole\Socket\{TcpSocket|UdpSocket|Http|WebSocket}
.
namespace App\Sockets;
use Hhxsv5\LaravelS\Swoole\Socket\TcpSocket;
use Swoole\Server;
class TestTcpSocket extends TcpSocket
{
public function onConnect(Server $server, $fd, $reactorId)
{
\Log::info('New TCP connection', [$fd]);
$server->send($fd, 'Welcome to LaravelS.');
}
public function onReceive(Server $server, $fd, $reactorId, $data)
{
\Log::info('Received data', [$fd, $data]);
$server->send($fd, 'LaravelS: ' . $data);
if ($data === "quit\r\n") {
$server->send($fd, 'LaravelS: bye' . PHP_EOL);
$server->close($fd);
}
}
public function onClose(Server $server, $fd, $reactorId)
{
\Log::info('Close TCP connection', [$fd]);
$server->send($fd, 'Goodbye');
}
}
These Socket
connections share the same worker processes with your HTTP
/WebSocket
connections. So it won't be a problem at all if you want to deliver tasks, use SwooleTable
, even Laravel components such as DB, Eloquent and so on. At the same time, you can access Swoole\Server\Port
object directly by member property swoolePort
.
public function onReceive(Server $server, $fd, $reactorId, $data)
{
$port = $this->swoolePort; // Get the `Swoole\Server\Port` object
}
namespace App\Http\Controllers;
class TestController extends Controller
{
public function test()
{
/**@var \Swoole\Http\Server|\Swoole\WebSocket\Server $swoole */
$swoole = app('swoole');
// $swoole->ports: Traverse all Port objects, https://www.swoole.co.uk/docs/modules/swoole-server/multiple-ports
$port = $swoole->ports[0]; // Get the `Swoole\Server\Port` object, $port[0] is the port of the main server
foreach ($port->connections as $fd) { // Traverse all connections
// $swoole->send($fd, 'Send tcp message');
// if($swoole->isEstablished($fd)) {
// $swoole->push($fd, 'Send websocket message');
// }
}
}
}
Register Sockets.
// Edit `config/laravels.php`
//...
'sockets' => [
[
'host' => '127.0.0.1',
'port' => 5291,
'type' => SWOOLE_SOCK_TCP,// Socket type: SWOOLE_SOCK_TCP/SWOOLE_SOCK_TCP6/SWOOLE_SOCK_UDP/SWOOLE_SOCK_UDP6/SWOOLE_UNIX_DGRAM/SWOOLE_UNIX_STREAM
'settings' => [// Swoole settings:https://www.swoole.co.uk/docs/modules/swoole-server-methods#swoole_server-addlistener
'open_eof_check' => true,
'package_eof' => "\r\n",
],
'handler' => \App\Sockets\TestTcpSocket::class,
'enable' => true, // whether to enable, default true
],
],
About the heartbeat configuration, it can only be set on the main server
and cannot be configured on Socket
, but the Socket
inherits the heartbeat configuration of the main server
.
For TCP socket, onConnect
and onClose
events will be blocked when dispatch_mode
of Swoole is 1/3
, so if you want to unblock these two events please set dispatch_mode
to 2/4/5
.
'swoole' => [
//...
'dispatch_mode' => 2,
//...
];
Test.
TCP: telnet 127.0.0.1 5291
UDP: [Linux] echo "Hello LaravelS" > /dev/udp/127.0.0.1/5292
Register example of other protocols.
turn on WebSocket
, that is, set websocket.enable
to true
.Warning: The order of code execution in the coroutine is out of order. The data of the request level should be isolated by the coroutine ID. However, there are many singleton and static attributes in Laravel/Lumen, the data between different requests will affect each other, it's Unsafe
. For example, the database connection is a singleton, the same database connection shares the same PDO resource. This is fine in the synchronous blocking mode, but it does not work in the asynchronous coroutine mode. Each query needs to create different connections and maintain IO state of different connections, which requires a connection pool.
DO NOT
enable the coroutine, only the custom process can use the coroutine.
Support developers to create special work processes for monitoring, reporting, or other special tasks. Refer addProcess.
Create Proccess class, implements CustomProcessInterface.
namespace App\Processes;
use App\Tasks\TestTask;
use Hhxsv5\LaravelS\Swoole\Process\CustomProcessInterface;
use Hhxsv5\LaravelS\Swoole\Task\Task;
use Swoole\Coroutine;
use Swoole\Http\Server;
use Swoole\Process;
class TestProcess implements CustomProcessInterface
{
/**
* @var bool Quit tag for Reload updates
*/
private static $quit = false;
public static function callback(Server $swoole, Process $process)
{
// The callback method cannot exit. Once exited, Manager process will automatically create the process
while (!self::$quit) {
\Log::info('Test process: running');
// sleep(1); // Swoole < 2.1
Coroutine::sleep(1); // Swoole>=2.1: Coroutine & Runtime will be automatically enabled for callback().
// Deliver task in custom process, but NOT support callback finish() of task.
// Note: Modify task_ipc_mode to 1 or 2 in config/laravels.php, see https://www.swoole.co.uk/docs/modules/swoole-server/configuration
$ret = Task::deliver(new TestTask('task data'));
var_dump($ret);
// The upper layer will catch the exception thrown in the callback and record it in the Swoole log, and then this process will exit. The Manager process will re-create the process after 3 seconds, so developers need to try/catch to catch the exception by themselves to avoid frequent process creation.
// throw new \Exception('an exception');
}
}
// Requirements: LaravelS >= v3.4.0 & callback() must be async non-blocking program.
public static function onReload(Server $swoole, Process $process)
{
// Stop the process...
// Then end process
\Log::info('Test process: reloading');
self::$quit = true;
// $process->exit(0); // Force exit process
}
// Requirements: LaravelS >= v3.7.4 & callback() must be async non-blocking program.
public static function onStop(Server $swoole, Process $process)
{
// Stop the process...
// Then end process
\Log::info('Test process: stopping');
self::$quit = true;
// $process->exit(0); // Force exit process
}
}
Register TestProcess.
// Edit `config/laravels.php`
// ...
'processes' => [
'test' => [ // Key name is process name
'class' => \App\Processes\TestProcess::class,
'redirect' => false, // Whether redirect stdin/stdout, true or false
'pipe' => 0, // The type of pipeline, 0: no pipeline 1: SOCK_STREAM 2: SOCK_DGRAM
'enable' => true, // Whether to enable, default true
//'num' => 3 // To create multiple processes of this class, default is 1
//'queue' => [ // Enable message queue as inter-process communication, configure empty array means use default parameters
// 'msg_key' => 0, // The key of the message queue. Default: ftok(__FILE__, 1).
// 'mode' => 2, // Communication mode, default is 2, which means contention mode
// 'capacity' => 8192, // The length of a single message, is limited by the operating system kernel parameters. The default is 8192, and the maximum is 65536
//],
//'restart_interval' => 5, // After the process exits abnormally, how many seconds to wait before restarting the process, default 5 seconds
],
],
Note: The callback() cannot quit. If quit, the Manager process will re-create the process.
Example: Write data to a custom process.
// config/laravels.php
'processes' => [
'test' => [
'class' => \App\Processes\TestProcess::class,
'redirect' => false,
'pipe' => 1,
],
],
// app/Processes/TestProcess.php
public static function callback(Server $swoole, Process $process)
{
while ($data = $process->read()) {
\Log::info('TestProcess: read data', [$data]);
$process->write('TestProcess: ' . $data);
}
}
// app/Http/Controllers/TestController.php
public function testProcessWrite()
{
/**@var \Swoole\Process $process */
$process = app('swoole')->customProcesses['test'];
$process->write('TestController: write data' . time());
var_dump($process->read());
}
LaravelS
will pull theApollo
configuration and write it to the.env
file when starting. At the same time,LaravelS
will start the custom processapollo
to monitor the configuration and automaticallyreload
when the configuration changes.
Enable Apollo: add --enable-apollo
and Apollo parameters to the startup parameters.
php bin/laravels start --enable-apollo --apollo-server=http://127.0.0.1:8080 --apollo-app-id=LARAVEL-S-TEST
Support hot updates(optional).
// Edit `config/laravels.php`
'processes' => Hhxsv5\LaravelS\Components\Apollo\Process::getDefinition(),
// When there are other custom process configurations
'processes' => [
'test' => [
'class' => \App\Processes\TestProcess::class,
'redirect' => false,
'pipe' => 1,
],
// ...
] + Hhxsv5\LaravelS\Components\Apollo\Process::getDefinition(),
List of available parameters.
Parameter | Description | Default | Demo |
---|---|---|---|
apollo-server | Apollo server URL | - | --apollo-server=http://127.0.0.1:8080 |
apollo-app-id | Apollo APP ID | - | --apollo-app-id=LARAVEL-S-TEST |
apollo-namespaces | The namespace to which the APP belongs, support specify the multiple | application | --apollo-namespaces=application --apollo-namespaces=env |
apollo-cluster | The cluster to which the APP belongs | default | --apollo-cluster=default |
apollo-client-ip | IP of current instance, can also be used for grayscale publishing | Local intranet IP | --apollo-client-ip=10.2.1.83 |
apollo-pull-timeout | Timeout time(seconds) when pulling configuration | 5 | --apollo-pull-timeout=5 |
apollo-backup-old-env | Whether to backup the old configuration file when updating the configuration file .env | false | --apollo-backup-old-env |
Support Prometheus monitoring and alarm, Grafana visually view monitoring metrics. Please refer to Docker Compose for the environment construction of Prometheus and Grafana.
Require extension APCu >= 5.0.0, please install it by pecl install apcu
.
Copy the configuration file prometheus.php
to the config
directory of your project. Modify the configuration as appropriate.
# Execute commands in the project root directory
cp vendor/hhxsv5/laravel-s/config/prometheus.php config/
If your project is Lumen
, you also need to manually load the configuration $app->configure('prometheus');
in bootstrap/app.php
.
Configure global
middleware: Hhxsv5\LaravelS\Components\Prometheus\RequestMiddleware::class
. In order to count the request time consumption as accurately as possible, RequestMiddleware
must be the first
global middleware, which needs to be placed in front of other middleware.
Register ServiceProvider: Hhxsv5\LaravelS\Components\Prometheus\ServiceProvider::class
.
Configure the CollectorProcess in config/laravels.php
to collect the metrics of Swoole Worker/Task/Timer processes regularly.
'processes' => Hhxsv5\LaravelS\Components\Prometheus\CollectorProcess::getDefinition(),
Create the route to output metrics.
use Hhxsv5\LaravelS\Components\Prometheus\Exporter;
Route::get('/actuator/prometheus', function () {
$result = app(Exporter::class)->render();
return response($result, 200, ['Content-Type' => Exporter::REDNER_MIME_TYPE]);
});
Complete the configuration of Prometheus and start it.
global:
scrape_interval: 5s
scrape_timeout: 5s
evaluation_interval: 30s
scrape_configs:
- job_name: laravel-s-test
honor_timestamps: true
metrics_path: /actuator/prometheus
scheme: http
follow_redirects: true
static_configs:
- targets:
- 127.0.0.1:5200 # The ip and port of the monitored service
# Dynamically discovered using one of the supported service-discovery mechanisms
# https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
# - job_name: laravels-eureka
# honor_timestamps: true
# scrape_interval: 5s
# metrics_path: /actuator/prometheus
# scheme: http
# follow_redirects: true
# eureka_sd_configs:
# - server: http://127.0.0.1:8080/eureka
# follow_redirects: true
# refresh_interval: 5s
Start Grafana, then import panel json.
Supported events:
Event | Interface | When happened |
---|---|---|
ServerStart | Hhxsv5\LaravelS\Swoole\Events\ServerStartInterface | Occurs when the Master process is starting, this event should not handle complex business logic, and can only do some simple work of initialization . |
ServerStop | Hhxsv5\LaravelS\Swoole\Events\ServerStopInterface | Occurs when the server exits normally, CANNOT use async or coroutine related APIs in this event . |
WorkerStart | Hhxsv5\LaravelS\Swoole\Events\WorkerStartInterface | Occurs after the Worker/Task process is started, and the Laravel initialization has been completed. |
WorkerStop | Hhxsv5\LaravelS\Swoole\Events\WorkerStopInterface | Occurs after the Worker/Task process exits normally |
WorkerError | Hhxsv5\LaravelS\Swoole\Events\WorkerErrorInterface | Occurs when an exception or fatal error occurs in the Worker/Task process |
1.Create an event class to implement the corresponding interface.
namespace App\Events;
use Hhxsv5\LaravelS\Swoole\Events\ServerStartInterface;
use Swoole\Atomic;
use Swoole\Http\Server;
class ServerStartEvent implements ServerStartInterface
{
public function __construct()
{
}
public function handle(Server $server)
{
// Initialize a global counter (available across processes)
$server->atomicCount = new Atomic(2233);
// Invoked in controller: app('swoole')->atomicCount->get();
}
}
namespace App\Events;
use Hhxsv5\LaravelS\Swoole\Events\WorkerStartInterface;
use Swoole\Http\Server;
class WorkerStartEvent implements WorkerStartInterface
{
public function __construct()
{
}
public function handle(Server $server, $workerId)
{
// Initialize a database connection pool
// DatabaseConnectionPool::init();
}
}
2.Configuration.
// Edit `config/laravels.php`
'event_handlers' => [
'ServerStart' => [\App\Events\ServerStartEvent::class], // Trigger events in array order
'WorkerStart' => [\App\Events\WorkerStartEvent::class],
],
1.Modify bootstrap/app.php
and set the storage directory. Because the project directory is read-only, the /tmp
directory can only be read and written.
$app->useStoragePath(env('APP_STORAGE_PATH', '/tmp/storage'));
2.Create a shell script laravels_bootstrap
and grant executable permission
.
#!/usr/bin/env bash
set +e
# Create storage-related directories
mkdir -p /tmp/storage/app/public
mkdir -p /tmp/storage/framework/cache
mkdir -p /tmp/storage/framework/sessions
mkdir -p /tmp/storage/framework/testing
mkdir -p /tmp/storage/framework/views
mkdir -p /tmp/storage/logs
# Set the environment variable APP_STORAGE_PATH, please make sure it's the same as APP_STORAGE_PATH in .env
export APP_STORAGE_PATH=/tmp/storage
# Start LaravelS
php bin/laravels start
3.Configure template.xml
.
ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
laravel-s-demo:
Type: 'Aliyun::Serverless::Service'
Properties:
Description: 'LaravelS Demo for Serverless'
fc-laravel-s:
Type: 'Aliyun::Serverless::Function'
Properties:
Handler: laravels.handler
Runtime: custom
MemorySize: 512
Timeout: 30
CodeUri: ./
InstanceConcurrency: 10
EnvironmentVariables:
BOOTSTRAP_FILE: laravels_bootstrap
Under FPM mode, singleton instances will be instantiated and recycled in every request, request start=>instantiate instance=>request end=>recycled instance.
Under Swoole Server, All singleton instances will be held in memory, different lifetime from FPM, request start=>instantiate instance=>request end=>do not recycle singleton instance. So need developer to maintain status of singleton instances in every request.
Common solutions:
Write a XxxCleaner
class to clean up the singleton object state. This class implements the interface Hhxsv5\LaravelS\Illuminate\Cleaners\CleanerInterface
and then registers it in cleaners
of laravels.php
.
Reset
status of singleton instances by Middleware
.
Re-register ServiceProvider
, add XxxServiceProvider
into register_providers
of file laravels.php
. So that reinitialize singleton instances in every request Refer.
Known issues: a package of known issues and solutions.
Logging; if you want to output to the console, you can use stderr
, Log::channel('stderr')->debug('debug message').
Laravel Dump Server(Laravel 5.7 has been integrated by default).
Read request by Illuminate\Http\Request
Object, $_ENV is readable, $_SERVER is partially readable, CANNOT USE
$_GET/$_POST/$_FILES/$_COOKIE/$_REQUEST/$_SESSION/$GLOBALS.
public function form(\Illuminate\Http\Request $request)
{
$name = $request->input('name');
$all = $request->all();
$sessionId = $request->cookie('sessionId');
$photo = $request->file('photo');
// Call getContent() to get the raw POST body, instead of file_get_contents('php://input')
$rawContent = $request->getContent();
//...
}
Respond by Illuminate\Http\Response
Object, compatible with echo/vardump()/print_r(),CANNOT USE
functions dd()/exit()/die()/header()/setcookie()/http_response_code().
public function json()
{
return response()->json(['time' => time()])->header('header1', 'value1')->withCookie('c1', 'v1');
}
Singleton connection
will be resident in memory, it is recommended to turn on persistent connection
for better performance.
will
reconnect automatically immediately
after disconnect.// config/database.php
'connections' => [
'my_conn' => [
'driver' => 'mysql',
'host' => env('DB_MY_CONN_HOST', 'localhost'),
'port' => env('DB_MY_CONN_PORT', 3306),
'database' => env('DB_MY_CONN_DATABASE', 'forge'),
'username' => env('DB_MY_CONN_USERNAME', 'forge'),
'password' => env('DB_MY_CONN_PASSWORD', ''),
'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
'prefix' => '',
'strict' => false,
'options' => [
// Enable persistent connection
\PDO::ATTR_PERSISTENT => true,
],
],
],
won't
reconnect automatically immediately
after disconnect, and will throw an exception about lost connection, reconnect next time. You need to make sure that SELECT DB
correctly before operating Redis every time.// config/database.php
'redis' => [
'client' => env('REDIS_CLIENT', 'phpredis'), // It is recommended to use phpredis for better performance.
'default' => [
'host' => env('REDIS_HOST', 'localhost'),
'password' => env('REDIS_PASSWORD', null),
'port' => env('REDIS_PORT', 6379),
'database' => 0,
'persistent' => true, // Enable persistent connection
],
],
Avoid using global variables. If necessary, please clean or reset them manually.
Infinitely appending element into static
/global
variable will lead to OOM(Out of Memory).
class Test
{
public static $array = [];
public static $string = '';
}
// Controller
public function test(Request $req)
{
// Out of Memory
Test::$array[] = $req->input('param1');
Test::$string .= $req->input('param2');
}
Memory leak detection method
Modify config/laravels.php
: worker_num=1, max_request=1000000
, remember to change it back after test;
Add routing /debug-memory-leak
without route middleware
to observe the memory changes of the Worker
process;
Start LaravelS
and request /debug-memory-leak
until diff_mem
is less than or equal to zero; if diff_mem
is always greater than zero, it means that there may be a memory leak in Global Middleware
or Laravel Framework
;
After completing Step 3
, alternately
request the business routes and /debug-memory-leak
(It is recommended to use ab
/wrk
to make a large number of requests for business routes), the initial increase in memory is normal. After a large number of requests for the business routes, if diff_mem
is always greater than zero and curr_mem
continues to increase, there is a high probability of memory leak; If curr_mem
always changes within a certain range and does not continue to increase, there is a low probability of memory leak.
If you still can't solve it, max_request is the last guarantee.
Author: hhxsv5
Source Code: https://github.com/hhxsv5/laravel-s
License: MIT License
1657764000
You can use the data you can get from social media in a number of ways, like sentiment analysis (analyzing people's thoughts) on a specific issue or field of interest.
There are several ways you can scrape (or gather) data from Twitter. And in this article, we will look at two of those ways: using Tweepy and Snscrape.
We will learn a method to scrape public conversations from people on a specific trending topic, as well as tweets from a particular user.
Now without further ado, let’s get started.
Now, before we get into the implementation of each platform, let's try to grasp the differences and limits of each platform.
Tweepy is a Python library for integrating with the Twitter API. Because Tweepy is connected with the Twitter API, you can perform complex queries in addition to scraping tweets. It enables you to take advantage of all of the Twitter API's capabilities.
But there are some drawbacks – like the fact that its standard API only allows you to collect tweets for up to a week (that is, Tweepy does not allow recovery of tweets beyond a week window, so historical data retrieval is not permitted).
Also, there are limits to how many tweets you can retrieve from a user's account. You can read more about Tweepy's functionalities here.
Snscrape is another approach for scraping information from Twitter that does not require the use of an API. Snscrape allows you to scrape basic information such as a user's profile, tweet content, source, and so on.
Snscrape is not limited to Twitter, but can also scrape content from other prominent social media networks like Facebook, Instagram, and others.
Its advantages are that there are no limits to the number of tweets you can retrieve or the window of tweets (that is, the date range of tweets). So Snscrape allows you to retrieve old data.
But the one disadvantage is that it lacks all the other functionalities of Tweepy – still, if you only want to scrape tweets, Snscrape would be enough.
Now that we've clarified the distinction between the two methods, let's go over their implementation one by one.
Before we begin using Tweepy, we must first make sure that our Twitter credentials are ready. With that, we can connect Tweepy to our API key and begin scraping.
If you do not have Twitter credentials, you can register for a Twitter developer account by going here. You will be asked some basic questions about how you intend to use the Twitter API. After that, you can begin the implementation.
The first step is to install the Tweepy library on your local machine, which you can do by typing:
pip install git+https://github.com/tweepy/tweepy.git
Now that we’ve installed the Tweepy library, let’s scrape 100 tweets from a user called john
on Twitter. We'll look at the full code implementation that will let us do this and discuss it in detail so we can grasp what’s going on:
import tweepy
consumer_key = "XXXX" #Your API/Consumer key
consumer_secret = "XXXX" #Your API/Consumer Secret Key
access_token = "XXXX" #Your Access token key
access_token_secret = "XXXX" #Your Access token Secret key
#Pass in our twitter API authentication key
auth = tweepy.OAuth1UserHandler(
consumer_key, consumer_secret,
access_token, access_token_secret
)
#Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)
username = "john"
no_of_tweets =100
try:
#The number of tweets we want to retrieved from the user
tweets = api.user_timeline(screen_name=username, count=no_of_tweets)
#Pulling Some attributes from the tweet
attributes_container = [[tweet.created_at, tweet.favorite_count,tweet.source, tweet.text] for tweet in tweets]
#Creation of column list to rename the columns in the dataframe
columns = ["Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
#Creation of Dataframe
tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
print('Status Failed On,',str(e))
time.sleep(3)
Now let's go over each part of the code in the above block.
import tweepy
consumer_key = "XXXX" #Your API/Consumer key
consumer_secret = "XXXX" #Your API/Consumer Secret Key
access_token = "XXXX" #Your Access token key
access_token_secret = "XXXX" #Your Access token Secret key
#Pass in our twitter API authentication key
auth = tweepy.OAuth1UserHandler(
consumer_key, consumer_secret,
access_token, access_token_secret
)
#Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)
In the above code, we've imported the Tweepy library into our code, then we've created some variables where we store our Twitter credentials (The Tweepy authentication handler requires four of our Twitter credentials). So we then pass in those variable into the Tweepy authentication handler and save them into another variable.
Then the last statement of call is where we instantiated the Tweepy API and passed in the require parameters.
username = "john"
no_of_tweets =100
try:
#The number of tweets we want to retrieved from the user
tweets = api.user_timeline(screen_name=username, count=no_of_tweets)
#Pulling Some attributes from the tweet
attributes_container = [[tweet.created_at, tweet.favorite_count,tweet.source, tweet.text] for tweet in tweets]
#Creation of column list to rename the columns in the dataframe
columns = ["Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
#Creation of Dataframe
tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
print('Status Failed On,',str(e))
In the above code, we created the name of the user (the @name in Twitter) we want to retrieved the tweets from and also the number of tweets. We then created an exception handler to help us catch errors in a more effective way.
After that, the api.user_timeline()
returns a collection of the most recent tweets posted by the user we picked in the screen_name
parameter and the number of tweets you want to retrieve.
In the next line of code, we passed in some attributes we want to retrieve from each tweet and saved them into a list. To see more attributes you can retrieve from a tweet, read this.
In the last chunk of code we created a dataframe and passed in the list we created along with the names of the column we created.
Note that the column names must be in the sequence of how you passed them into the attributes container (that is, how you passed those attributes in a list when you were retrieving the attributes from the tweet).
If you correctly followed the steps I described, you should have something like this:
Image by Author
Now that we are done, let's go over one more example before we move into the Snscrape implementation.
In this method, we will be retrieving a tweet based on a search. You can do that like this:
import tweepy
consumer_key = "XXXX" #Your API/Consumer key
consumer_secret = "XXXX" #Your API/Consumer Secret Key
access_token = "XXXX" #Your Access token key
access_token_secret = "XXXX" #Your Access token Secret key
#Pass in our twitter API authentication key
auth = tweepy.OAuth1UserHandler(
consumer_key, consumer_secret,
access_token, access_token_secret
)
#Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)
search_query = "sex for grades"
no_of_tweets =150
try:
#The number of tweets we want to retrieved from the search
tweets = api.search_tweets(q=search_query, count=no_of_tweets)
#Pulling Some attributes from the tweet
attributes_container = [[tweet.user.name, tweet.created_at, tweet.favorite_count, tweet.source, tweet.text] for tweet in tweets]
#Creation of column list to rename the columns in the dataframe
columns = ["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
#Creation of Dataframe
tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
print('Status Failed On,',str(e))
The above code is similar to the previous code, except that we changed the API method from api.user_timeline()
to api.search_tweets()
. We've also added tweet.user.name
to the attributes container list.
In the code above, you can see that we passed in two attributes. This is because if we only pass in tweet.user
, it would only return a dictionary user object. So we must also pass in another attribute we want to retrieve from the user object, which is name
.
You can go here to see a list of additional attributes that you can retrieve from a user object. Now you should see something like this once you run it:
Image by Author.
Alright, that just about wraps up the Tweepy implementation. Just remember that there is a limit to the number of tweets you can retrieve, and you can not retrieve tweets more than 7 days old using Tweepy.
As I mentioned previously, Snscrape does not require Twitter credentials (API key) to access it. There is also no limit to the number of tweets you can fetch.
For this example, though, we'll just retrieve the same tweets as in the previous example, but using Snscrape instead.
To use Snscrape, we must first install its library on our PC. You can do that by typing:
pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git
Snscrape includes two methods for getting tweets from Twitter: the command line interface (CLI) and a Python Wrapper. Just keep in mind that the Python Wrapper is currently undocumented – but we can still get by with trial and error.
In this example, we will use the Python Wrapper because it is more intuitive than the CLI method. But if you get stuck with some code, you can always turn to the GitHub community for assistance. The contributors will be happy to help you.
To retrieve tweets from a particular user, we can do the following:
import snscrape.modules.twitter as sntwitter
import pandas as pd
# Created a list to append all tweet attributes(data)
attributes_container = []
# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:john').get_items()):
if i>100:
break
attributes_container.append([tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
# Creating a dataframe from the tweets list above
tweets_df = pd.DataFrame(attributes_container, columns=["Date Created", "Number of Likes", "Source of Tweet", "Tweets"])
Let's go over some of the code that you might not understand at first glance:
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:john').get_items()):
if i>100:
break
attributes_container.append([tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
# Creating a dataframe from the tweets list above
tweets_df = pd.DataFrame(attributes_container, columns=["Date Created", "Number of Likes", "Source of Tweet", "Tweets"])
In the above code, what the sntwitter.TwitterSearchScaper
does is return an object of tweets from the name of the user we passed into it (which is john).
As I mentioned earlier, Snscrape does not have limits on numbers of tweets so it will return however many tweets from that user. To help with this, we need to add the enumerate function which will iterate through the object and add a counter so we can access the most recent 100 tweets from the user.
You can see that the attributes syntax we get from each tweet looks like the one from Tweepy. These are the list of attributes that we can get from the Snscrape tweet which was curated by Martin Beck.
Credit: Martin Beck
More attributes might be added, as the Snscrape library is still in development. Like for instance in the above image, source
has been replaced with sourceLabel
. If you pass in only source
it will return an object.
If you run the above code, you should see something like this as well:
Image by Author
Now let's do the same for scraping by search.
import snscrape.modules.twitter as sntwitter
import pandas as pd
# Creating list to append tweet data to
attributes_container = []
# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('sex for grades since:2021-07-05 until:2022-07-06').get_items()):
if i>150:
break
attributes_container.append([tweet.user.username, tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
# Creating a dataframe to load the list
tweets_df = pd.DataFrame(attributes_container, columns=["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"])
Again, you can access a lot of historical data using Snscrape (unlike Tweepy, as its standard API cannot exceed 7 days. The premium API is 30 days.). So we can pass in the date from which we want to start the search and the date we want it to end in the sntwitter.TwitterSearchScraper()
method.
What we've done in the preceding code is basically what we discussed before. The only thing to bear in mind is that until works similarly to the range function in Python (that is, it excludes the last integer). So if you want to get tweets from today, you need to include the day after today in the "until" parameter.
Image of Author.
Now you know how to scrape tweets with Snscrape, too!
Now that we've seen how each method works, you might be wondering when to use which.
Well, there is no universal rule for when to utilize each method. Everything comes down to a matter preference and your use case.
If you want to acquire an endless number of tweets, you should use Snscrape. But if you want to use extra features that Snscrape cannot provide (like geolocation, for example), then you should definitely use Tweepy. It is directly integrated with the Twitter API and provides complete functionality.
Even so, Snscrape is the most commonly used method for basic scraping.
Conclusion
In this article, we learned how to scrape data from Python using Tweepy and Snscrape. But this was only a brief overview of how each approach works. You can learn more by exploring the web for additional information.
I've included some useful resources that you can use if you need additional information. Thank you for reading.
Source: https://www.freecodecamp.org/news/python-web-scraping-tutorial/
1657769340
如果您是数据爱好者,您可能会同意社交媒体是现实世界数据中最丰富的来源之一。像 Twitter 这样的网站充满了数据。
您可以通过多种方式使用从社交媒体获得的数据,例如针对特定问题或感兴趣领域的情绪分析(分析人们的想法)。
您可以通过多种方式从 Twitter 上抓取(或收集)数据。在本文中,我们将研究其中两种方式:使用 Tweepy 和 Snscrap。
我们将学习一种方法来抓取人们关于特定趋势主题的公开对话,以及来自特定用户的推文。
现在事不宜迟,让我们开始吧。
现在,在我们进入每个平台的实现之前,让我们尝试掌握每个平台的差异和限制。
Tweepy 是一个用于与 Twitter API 集成的 Python 库。因为 Tweepy 与 Twitter API 连接,除了抓取推文之外,您还可以执行复杂的查询。它使您能够利用 Twitter API 的所有功能。
但也有一些缺点——比如它的标准 API 只允许您收集长达一周的推文(也就是说,Tweepy 不允许恢复超过一周窗口的推文,因此不允许检索历史数据)。
此外,您可以从用户帐户中检索多少条推文也是有限制的。您可以在此处阅读有关 Tweepy 功能的更多信息。
Snscape 是另一种从 Twitter 上抓取信息的方法,不需要使用 API。Snscrape 允许您抓取基本信息,例如用户的个人资料、推文内容、来源等。
Snscape 不仅限于 Twitter,还可以从其他著名的社交媒体网络(如 Facebook、Instagram 等)中抓取内容。
它的优点是可以检索的推文数量或推文窗口(即推文的日期范围)没有限制。因此,Snscape 允许您检索旧数据。
但一个缺点是它缺乏 Tweepy 的所有其他功能——不过,如果你只想抓取推文,Snscrap 就足够了。
现在我们已经阐明了这两种方法之间的区别,让我们一一来看看它们的实现。
在我们开始使用 Tweepy 之前,我们必须首先确保我们的 Twitter 凭据已准备好。有了它,我们可以将 Tweepy 连接到我们的 API 密钥并开始抓取。
如果您没有 Twitter 凭据,您可以前往此处注册 Twitter 开发者帐户。您将被问及一些关于您打算如何使用 Twitter API 的基本问题。之后,您可以开始实施。
第一步是在你的本地机器上安装 Tweepy 库,你可以通过键入:
pip install git+https://github.com/tweepy/tweepy.git
现在我们已经安装了 Tweepy 库,让我们从john
Twitter 上调用的用户那里抓取 100 条推文。我们将查看完整的代码实现,让我们这样做并详细讨论它,以便我们了解发生了什么:
import tweepy
consumer_key = "XXXX" #Your API/Consumer key
consumer_secret = "XXXX" #Your API/Consumer Secret Key
access_token = "XXXX" #Your Access token key
access_token_secret = "XXXX" #Your Access token Secret key
#Pass in our twitter API authentication key
auth = tweepy.OAuth1UserHandler(
consumer_key, consumer_secret,
access_token, access_token_secret
)
#Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)
username = "john"
no_of_tweets =100
try:
#The number of tweets we want to retrieved from the user
tweets = api.user_timeline(screen_name=username, count=no_of_tweets)
#Pulling Some attributes from the tweet
attributes_container = [[tweet.created_at, tweet.favorite_count,tweet.source, tweet.text] for tweet in tweets]
#Creation of column list to rename the columns in the dataframe
columns = ["Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
#Creation of Dataframe
tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
print('Status Failed On,',str(e))
time.sleep(3)
现在让我们回顾一下上面代码块中的每一部分代码。
import tweepy
consumer_key = "XXXX" #Your API/Consumer key
consumer_secret = "XXXX" #Your API/Consumer Secret Key
access_token = "XXXX" #Your Access token key
access_token_secret = "XXXX" #Your Access token Secret key
#Pass in our twitter API authentication key
auth = tweepy.OAuth1UserHandler(
consumer_key, consumer_secret,
access_token, access_token_secret
)
#Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)
在上面的代码中,我们将 Tweepy 库导入到我们的代码中,然后我们创建了一些变量来存储我们的 Twitter 凭据(Tweepy 身份验证处理程序需要我们的四个 Twitter 凭据)。所以我们然后将这些变量传递给 Tweepy 身份验证处理程序并将它们保存到另一个变量中。
然后最后一个调用语句是我们实例化 Tweepy API 并传入 require 参数的地方。
username = "john"
no_of_tweets =100
try:
#The number of tweets we want to retrieved from the user
tweets = api.user_timeline(screen_name=username, count=no_of_tweets)
#Pulling Some attributes from the tweet
attributes_container = [[tweet.created_at, tweet.favorite_count,tweet.source, tweet.text] for tweet in tweets]
#Creation of column list to rename the columns in the dataframe
columns = ["Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
#Creation of Dataframe
tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
print('Status Failed On,',str(e))
在上面的代码中,我们创建了要从中检索推文的用户名(Twitter 中的@name)以及推文的数量。然后我们创建了一个异常处理程序来帮助我们以更有效的方式捕获错误。
之后,api.user_timeline()
返回我们在参数中选择的用户发布的最新推文的集合以及screen_name
您要检索的推文数量。
在下一行代码中,我们传入了一些我们想从每条推文中检索的属性,并将它们保存到一个列表中。要查看可以从推文中检索到的更多属性,请阅读此。
在最后一段代码中,我们创建了一个数据框,并传入了我们创建的列表以及我们创建的列的名称。
请注意,列名必须按照您将它们传递到属性容器的顺序(即,当您从推文中检索属性时,您如何在列表中传递这些属性)。
如果你正确地按照我描述的步骤,你应该有这样的:
作者图片
现在我们已经完成了,在我们进入 Snscrap 实现之前,让我们再看一个例子。
在这种方法中,我们将根据搜索检索推文。你可以这样做:
import tweepy
consumer_key = "XXXX" #Your API/Consumer key
consumer_secret = "XXXX" #Your API/Consumer Secret Key
access_token = "XXXX" #Your Access token key
access_token_secret = "XXXX" #Your Access token Secret key
#Pass in our twitter API authentication key
auth = tweepy.OAuth1UserHandler(
consumer_key, consumer_secret,
access_token, access_token_secret
)
#Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)
search_query = "sex for grades"
no_of_tweets =150
try:
#The number of tweets we want to retrieved from the search
tweets = api.search_tweets(q=search_query, count=no_of_tweets)
#Pulling Some attributes from the tweet
attributes_container = [[tweet.user.name, tweet.created_at, tweet.favorite_count, tweet.source, tweet.text] for tweet in tweets]
#Creation of column list to rename the columns in the dataframe
columns = ["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
#Creation of Dataframe
tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
print('Status Failed On,',str(e))
上面的代码与前面的代码类似,只是我们将 API 方法从 更改api.user_timeline()
为api.search_tweets()
。我们还添加tweet.user.name
了属性容器列表。
在上面的代码中,你可以看到我们传入了两个属性。这是因为如果我们只传入tweet.user
,它只会返回一个字典用户对象。所以我们还必须传入另一个我们想从用户对象中检索的属性,即name
.
您可以在此处查看可以从用户对象中检索的附加属性列表。现在,一旦您运行它,您应该会看到类似这样的内容:
图片由作者提供。
好的,这就是 Tweepy 的实现。请记住,您可以检索的推文数量是有限制的,并且您不能使用 Tweepy 检索超过 7 天的推文。
正如我之前提到的,Snscrape 不需要 Twitter 凭据(API 密钥)来访问它。您可以获取的推文数量也没有限制。
但是,对于这个示例,我们将只检索与上一个示例相同的推文,但使用 Snscape。
要使用 Snscrap,我们必须首先在我们的 PC 上安装它的库。您可以通过键入:
pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git
Snscrape 包括两种从 Twitter 获取推文的方法:命令行界面 (CLI) 和 Python Wrapper。请记住,Python Wrapper 目前没有文档记录——但我们仍然可以通过反复试验来度过难关。
在本例中,我们将使用 Python Wrapper,因为它比 CLI 方法更直观。但是,如果您遇到一些代码问题,您可以随时向 GitHub 社区寻求帮助。贡献者将很乐意为您提供帮助。
要检索特定用户的推文,我们可以执行以下操作:
import snscrape.modules.twitter as sntwitter
import pandas as pd
# Created a list to append all tweet attributes(data)
attributes_container = []
# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:john').get_items()):
if i>100:
break
attributes_container.append([tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
# Creating a dataframe from the tweets list above
tweets_df = pd.DataFrame(attributes_container, columns=["Date Created", "Number of Likes", "Source of Tweet", "Tweets"])
让我们复习一下你可能第一眼看不懂的一些代码:
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:john').get_items()):
if i>100:
break
attributes_container.append([tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
# Creating a dataframe from the tweets list above
tweets_df = pd.DataFrame(attributes_container, columns=["Date Created", "Number of Likes", "Source of Tweet", "Tweets"])
在上面的代码中,所做的sntwitter.TwitterSearchScaper
是从我们传递给它的用户名(即 john)中返回一个推文对象。
正如我之前提到的,Snscrape 对推文的数量没有限制,因此它会返回来自该用户的许多推文。为了解决这个问题,我们需要添加枚举函数,该函数将遍历对象并添加一个计数器,以便我们可以访问用户最近的 100 条推文。
您可以看到,我们从每条推文中获得的属性语法与 Tweepy 中的类似。这些是我们可以从 Martin Beck 策划的 Snscape 推文中获得的属性列表。
学分:马丁贝克
可能会添加更多属性,因为 Snscape 库仍在开发中。例如上图中的,source
已替换为sourceLabel
. 如果你只传入source
它会返回一个对象。
如果你运行上面的代码,你应该也会看到类似这样的东西:
作者图片
现在让我们对通过搜索进行抓取做同样的事情。
import snscrape.modules.twitter as sntwitter
import pandas as pd
# Creating list to append tweet data to
attributes_container = []
# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('sex for grades since:2021-07-05 until:2022-07-06').get_items()):
if i>150:
break
attributes_container.append([tweet.user.username, tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
# Creating a dataframe to load the list
tweets_df = pd.DataFrame(attributes_container, columns=["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"])
同样,您可以使用 Snscrape 访问大量历史数据(与 Tweepy 不同,因为它的标准 API 不能超过 7 天。高级 API 是 30 天。)。所以我们可以在方法中传入我们想要开始搜索的日期和想要结束的日期sntwitter.TwitterSearchScraper()
。
我们在前面的代码中所做的基本上就是我们之前讨论过的。唯一要记住的是,直到与 Python 中的范围函数类似(也就是说,它不包括最后一个整数)。因此,如果您想从今天开始获取推文,则需要在“直到”参数中包含今天之后的一天。
作者的形象。
现在您也知道如何使用 Snscape 抓取推文了!
现在我们已经了解了每种方法的工作原理,您可能想知道何时使用哪种方法。
好吧,对于何时使用每种方法没有通用规则。一切都取决于问题偏好和您的用例。
如果你想获得无穷无尽的推文,你应该使用 Snscrap。但是,如果您想使用 Snscrape 无法提供的额外功能(例如地理定位),那么您绝对应该使用 Tweepy。它直接与 Twitter API 集成并提供完整的功能。
即便如此,Snscrape 是最常用的基本刮削方法。
结论
在本文中,我们学习了如何使用 Tweepy 和 Snscrap 从 Python 中抓取数据。但这只是对每种方法如何工作的简要概述。您可以通过浏览网络了解更多信息以获取更多信息。
我提供了一些有用的资源,如果您需要更多信息,可以使用它们。感谢您的阅读。
来源:https ://www.freecodecamp.org/news/python-web-scraping-tutorial/