Hoverfly Java: Easy Creation of Stub Http Servers for Testing

Hoverfly Java

A Java native language binding for Hoverfly, a Go proxy that allows you to simulate HTTP services in your unit tests. Another term for this is Service Virtualisation.

Features

  • Simulation of HTTP/HTTPS services
  • Strict or loose HTTP request matching based on URL, method, body and header combinations
  • Fluent and expressive DSL for easy generation of simulated services
  • Automatic marshalling of objects into JSON during request / response body generation
  • Create simulations by capturing live traffic
  • Hoverfly is a proxy, so you don’t need to alter the host that you make requests to
  • Multiple hosts / services per single instance of Hoverfly
  • HTTPS automatically supported, no extra configuration required
  • Supports Mutual TLS authentication capture
  • Interoperable with standard Hoverfly JSON, making it easy to re-use data between Java and other native language bindings.
  • Use externally managed Hoverfly cluster for API simulations
  • Request verification
  • Response templating
  • Stateful capture / simulation

JUnit 5 extension
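
If you are on JUnit 5, the rule-based API is replaced by an extension. Below is a minimal sketch, assuming the separate hoverfly-java-junit5 module is on the test classpath and that its HoverflyExtension injects a running Hoverfly instance into test methods; names may differ slightly between versions.

import static io.specto.hoverfly.junit.core.SimulationSource.dsl;
import static io.specto.hoverfly.junit.dsl.HoverflyDsl.service;
import static io.specto.hoverfly.junit.dsl.ResponseCreators.success;

import io.specto.hoverfly.junit.core.Hoverfly;
import io.specto.hoverfly.junit5.HoverflyExtension;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

// Assumes the hoverfly-java-junit5 artifact is on the test classpath
@ExtendWith(HoverflyExtension.class)
class BookingApiJUnit5Test {

    @Test
    void shouldSimulateBookingService(Hoverfly hoverfly) {
        // Load a simulation for this test; the extension manages Hoverfly's lifecycle
        hoverfly.simulate(dsl(
            service("www.my-test.com")
                .get("/api/bookings/1")
                .willReturn(success("{\"bookingId\": 1}", "application/json"))
        ));

        // ... make HTTP calls to www.my-test.com through the Hoverfly proxy and assert on the responses
    }
}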

Documentation

Full documentation is available here

Maven Dependency

<dependency>
    <groupId>io.specto</groupId>
    <artifactId>hoverfly-java</artifactId>
    <version>0.14.0</version>
    <scope>test</scope>
</dependency>

Example

Create API simulation using capture mode

// Capture and output HTTP traffic to json file
@ClassRule
public static HoverflyRule hoverflyRule = HoverflyRule.inCaptureMode("simulation.json");


// After the capturing, switch to inSimulationMode to spin up a stub server
@ClassRule
public static HoverflyRule hoverflyRule = HoverflyRule.inSimulationMode(defaultPath("simulation.json"));

// Or you can use both approaches at once: capture mode if the json file is not present, simulation mode if it is present
@ClassRule
public static HoverflyRule hoverflyRule = HoverflyRule.inCaptureOrSimulationMode("simulation.json");

Create API simulation using DSL

@ClassRule
public static HoverflyRule hoverflyRule = HoverflyRule.inSimulationMode(dsl(
    service("www.my-test.com")
        .get("/api/bookings/1")
        .willReturn(created("http://localhost/api/bookings/1"))
));

@Test
public void shouldBeAbleToGetABookingUsingHoverfly() {
    // When
    final ResponseEntity<String> getBookingResponse = restTemplate.getForEntity("http://www.my-test.com/api/bookings/1", String.class);

    // Then
    assertEquals(getBookingResponse.getStatusCode(), CREATED);
    assertEquals(getBookingResponse.getHeaders().getLocation().toString(), "http://localhost/api/bookings/1");
}

Some code examples for the DSL are available here.
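
For instance, the automatic marshalling of objects into JSON can be used for both request and response bodies. A minimal sketch, assuming the json converter from HttpBodyConverter and a hypothetical Booking POJO defined only for illustration:

import static io.specto.hoverfly.junit.core.SimulationSource.dsl;
import static io.specto.hoverfly.junit.dsl.HoverflyDsl.service;
import static io.specto.hoverfly.junit.dsl.HttpBodyConverter.json;
import static io.specto.hoverfly.junit.dsl.ResponseCreators.success;

import io.specto.hoverfly.junit.rule.HoverflyRule;
import org.junit.ClassRule;

public class BookingJsonDslTest {

    // Hypothetical POJO, used only for this illustration
    public static class Booking {
        public String id = "1";
        public String origin = "London";
        public String destination = "Hong Kong";
    }

    @ClassRule
    public static HoverflyRule hoverflyRule = HoverflyRule.inSimulationMode(dsl(
        service("www.my-test.com")
            .post("/api/bookings")
            // The POJO is marshalled to JSON and matched against the request body
            .body(json(new Booking()))
            // The POJO is marshalled to JSON and returned as the response body
            .willReturn(success(json(new Booking())))
    ));
}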

More code examples for the DSL using request matchers can be found here.
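
As a flavour of loose matching, the sketch below assumes the HoverflyMatchers helpers (matches, startsWith, any); exact method names may vary between versions.

import static io.specto.hoverfly.junit.core.SimulationSource.dsl;
import static io.specto.hoverfly.junit.dsl.HoverflyDsl.service;
import static io.specto.hoverfly.junit.dsl.ResponseCreators.success;
import static io.specto.hoverfly.junit.dsl.matchers.HoverflyMatchers.any;
import static io.specto.hoverfly.junit.dsl.matchers.HoverflyMatchers.matches;
import static io.specto.hoverfly.junit.dsl.matchers.HoverflyMatchers.startsWith;

import io.specto.hoverfly.junit.rule.HoverflyRule;
import org.junit.ClassRule;

public class LooseMatchingDslTest {

    @ClassRule
    public static HoverflyRule hoverflyRule = HoverflyRule.inSimulationMode(dsl(
        // Glob match on the host, prefix match on the path, any value for the "page" query param
        service(matches("www.*-test.com"))
            .get(startsWith("/api/bookings"))
            .queryParam("page", any())
            .willReturn(success("{\"bookings\": []}", "application/json"))
    ));
}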

Verify requests

// Verify that at least one request was made to a specific endpoint, with any query params
hoverflyRule.verify(service(matches("*.flight.*")).get("/api/bookings").anyQueryParams(), atLeastOnce());

// Verify that an external service/dependency was not called
hoverflyRule.verifyZeroRequestTo(service(matches("*.flight.*")));

// Verify all the stubbed requests were made at least once
hoverflyRule.verifyAll();

Contributions

Contributions are welcome!

To submit a pull request you should fork the Hoverfly-Java repository, and make your change on a feature branch of your fork.

As of v0.10.2, hoverfly binaries are no longer stored in the repository. You should run ./gradlew clean test once to cache the binaries for development with your IDE.

If you have forked this project prior to v0.10.2, please re-fork to get a slimmer version of the repository.

Issues

Feel free to raise an issue on GitHub.

Download details:

Author: SpectoLabs
Source code: https://github.com/SpectoLabs/hoverfly-java
License: Apache-2.0


Joseph Murray

7 Test Frameworks To Follow in 2021 for Java/Fullstack Developers

It is time to learn new test frameworks in 2021 to improve your code quality and decrease the time of your testing phase. Let’s explore 6 options for devs.

It is time to learn new test frameworks to improve your code quality and decrease the time of your testing phase. I have selected six testing frameworks that sound promising. Some have existed for quite a long time but I have not heard about them before.

At the end of the article, please tell me what you think about them and what your favorite ones are.

Robot Framework

Robot Framework is a generic open-source automation framework. It can be used for test automation and robotic process automation (RPA).

Robot Framework is open and extensible and can be integrated with virtually any other tool to create powerful and flexible automation solutions. Being open-source also means that Robot Framework is free to use without licensing costs.

Robot Framework is a framework to write test cases and automation processes. It means that it may replace your classic combo of Selenium + Cucumber + Gherkin. To be more precise, the custom Cucumber/Gherkin implementation you wrote will be handled by Robot Framework, with Selenium invoked underneath.

For Java developers, this framework can be executed with Maven or Gradle (though the Gradle integration is less mature).


Tyrique Littel

How to Install OpenJDK 11 on CentOS 8

What is OpenJDK?

OpenJDK, or Open Java Development Kit, is a free, open-source implementation of the Java Platform, Standard Edition (Java SE). It contains the virtual machine, the Java Class Library, and the Java compiler. The difference between OpenJDK and the Oracle JDK is that OpenJDK is the source-code reference point for the open-source model, while the Oracle JDK is a continuation or advanced model of OpenJDK that is not open source and requires a license to use.

In this article, we will be installing OpenJDK on CentOS 8.


LaravelS: Glue for using Swoole in Laravel Or Lumen

🚀 LaravelS is an out-of-the-box adapter between Swoole and Laravel/Lumen.

Please Watch this repository to get the latest updates.

 _                               _  _____ 
| |                             | |/ ____|
| |     __ _ _ __ __ ___   _____| | (___  
| |    / _` | '__/ _` \ \ / / _ \ |\___ \ 
| |___| (_| | | | (_| |\ V /  __/ |____) |
|______\__,_|_|  \__,_| \_/ \___|_|_____/ 

Chinese documentation (中文文档)

Features

  • Built-in Http/WebSocket server
  • Multi-port mixed protocol
  • Custom process
  • Memory resident
  • Asynchronous event listening
  • Asynchronous task queue
  • Millisecond cron job
  • Common Components
  • Gracefully reload
  • Automatically reload after modifying code
  • Supports both Laravel and Lumen, good compatibility
  • Simple & out of the box

Benchmark

Which is the fastest web framework?

TechEmpower Framework Benchmarks

Requirements

Dependency       Requirement
PHP              >= 5.5.9 (PHP 7+ recommended)
Swoole           >= 1.7.19 (4.5.0+ recommended; PHP 5 no longer supported since Swoole 2.0.12)
Laravel/Lumen    >= 5.1 (8.0+ recommended)

Install

1.Require package via Composer(packagist).

composer require "hhxsv5/laravel-s:~3.7.0" -vvv
# Make sure that your composer.lock file is under the VCS

2.Register service provider(pick one of two).

Laravel: in the config/app.php file. Laravel 5.5+ supports package discovery automatically, so you can skip this step.

'providers' => [
    //...
    Hhxsv5\LaravelS\Illuminate\LaravelSServiceProvider::class,
],

Lumen: in bootstrap/app.php file

$app->register(Hhxsv5\LaravelS\Illuminate\LaravelSServiceProvider::class);

3.Publish configuration and binaries.

After upgrading LaravelS, you need to republish; click here to see the change notes of each version.

php artisan laravels publish
# Configuration: config/laravels.php
# Binary: bin/laravels bin/fswatch bin/inotify

4.Change config/laravels.php: listen_ip, listen_port, refer Settings.

5.Performance tuning

Adjust kernel parameters

Number of Workers: LaravelS uses Swoole's Synchronous IO mode. The larger the worker_num setting, the better the concurrency performance, but it also causes more memory usage and process-switching overhead. If one request takes 100ms, then to provide 1000 QPS of concurrency, at least 100 Worker processes need to be configured: worker_num = 1000 QPS / (1s / 100ms) = 100. Incremental pressure testing is needed to find the best worker_num.

Number of Task Workers

Run

Please read the notices carefully before running, Important notices(IMPORTANT).

  • Commands: php bin/laravels {start|stop|restart|reload|info|help}.
Command    Description
start      Start LaravelS; list the processes by "ps -ef|grep laravels"
stop       Stop LaravelS, and trigger the onStop method of Custom processes
restart    Restart LaravelS: stop gracefully before starting; the service is unavailable until startup is complete
reload     Reload all Task/Worker/Timer processes which contain your business code, and trigger the onReload method of Custom processes; CANNOT reload Master/Manager processes. After modifying config/laravels.php, you only have to call restart
info       Display component version information
help       Display help information
  • Boot options for the commands start and restart.
Option            Description
-d|--daemonize    Run as a daemon; this option will override the swoole.daemonize setting in laravels.php
-e|--env          The environment the command should run under, such as --env=testing which will use the configuration file .env.testing first; this feature requires Laravel 5.2+
-i|--ignore       Ignore checking the PID file of the Master process
-x|--x-version    The version (branch) of the current project, stored in $_ENV/$_SERVER; access via $_ENV['X_VERSION'], $_SERVER['X_VERSION'] or $request->server->get('X_VERSION')
  • Runtime files: start will automatically execute php artisan laravels config and generate these files, developers generally don't need to pay attention to them, it's recommended to add them to .gitignore.
File                                    Description
storage/laravels.conf                   LaravelS's runtime configuration file
storage/laravels.pid                    PID file of the Master process
storage/laravels-timer-process.pid      PID file of the Timer process
storage/laravels-custom-processes.pid   PID file of all custom processes

Deploy

It is recommended to supervise the main process through Supervisord; the prerequisite is to start without the -d option and to set swoole.daemonize to false.

[program:laravel-s-test]
directory=/var/www/laravel-s-test
command=/usr/local/bin/php bin/laravels start -i
numprocs=1
autostart=true
autorestart=true
startretries=3
user=www-data
redirect_stderr=true
stdout_logfile=/var/log/supervisor/%(program_name)s.log

Cooperate with Nginx (Recommended)

Demo.

gzip on;
gzip_min_length 1024;
gzip_comp_level 2;
gzip_types text/plain text/css text/javascript application/json application/javascript application/x-javascript application/xml application/x-httpd-php image/jpeg image/gif image/png font/ttf font/otf image/svg+xml;
gzip_vary on;
gzip_disable "msie6";
upstream swoole {
    # Connect IP:Port
    server 127.0.0.1:5200 weight=5 max_fails=3 fail_timeout=30s;
    # Connect UnixSocket Stream file, tips: put the socket file in the /dev/shm directory to get better performance
    #server unix:/yourpath/laravel-s-test/storage/laravels.sock weight=5 max_fails=3 fail_timeout=30s;
    #server 192.168.1.1:5200 weight=3 max_fails=3 fail_timeout=30s;
    #server 192.168.1.2:5200 backup;
    keepalive 16;
}
server {
    listen 80;
    # Don't forget to bind the host
    server_name laravels.com;
    root /yourpath/laravel-s-test/public;
    access_log /yourpath/log/nginx/$server_name.access.log  main;
    autoindex off;
    index index.html index.htm;
    # Nginx handles the static resources(recommend enabling gzip), LaravelS handles the dynamic resource.
    location / {
        try_files $uri @laravels;
    }
    # Response 404 directly when request the PHP file, to avoid exposing public/*.php
    #location ~* \.php$ {
    #    return 404;
    #}
    location @laravels {
        # proxy_connect_timeout 60s;
        # proxy_send_timeout 60s;
        # proxy_read_timeout 120s;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Real-PORT $remote_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header Scheme $scheme;
        proxy_set_header Server-Protocol $server_protocol;
        proxy_set_header Server-Name $server_name;
        proxy_set_header Server-Addr $server_addr;
        proxy_set_header Server-Port $server_port;
        # "swoole" is the upstream
        proxy_pass http://swoole;
    }
}

Cooperate with Apache

LoadModule proxy_module /yourpath/modules/mod_proxy.so
LoadModule proxy_balancer_module /yourpath/modules/mod_proxy_balancer.so
LoadModule lbmethod_byrequests_module /yourpath/modules/mod_lbmethod_byrequests.so
LoadModule proxy_http_module /yourpath/modules/mod_proxy_http.so
LoadModule slotmem_shm_module /yourpath/modules/mod_slotmem_shm.so
LoadModule rewrite_module /yourpath/modules/mod_rewrite.so
LoadModule remoteip_module /yourpath/modules/mod_remoteip.so
LoadModule deflate_module /yourpath/modules/mod_deflate.so

<IfModule deflate_module>
    SetOutputFilter DEFLATE
    DeflateCompressionLevel 2
    AddOutputFilterByType DEFLATE text/html text/plain text/css text/javascript application/json application/javascript application/x-javascript application/xml application/x-httpd-php image/jpeg image/gif image/png font/ttf font/otf image/svg+xml
</IfModule>

<VirtualHost *:80>
    # Don't forget to bind the host
    ServerName www.laravels.com
    ServerAdmin hhxsv5@sina.com

    DocumentRoot /yourpath/laravel-s-test/public;
    DirectoryIndex index.html index.htm
    <Directory "/">
        AllowOverride None
        Require all granted
    </Directory>

    RemoteIPHeader X-Forwarded-For

    ProxyRequests Off
    ProxyPreserveHost On
    <Proxy balancer://laravels>  
        BalancerMember http://192.168.1.1:5200 loadfactor=7
        #BalancerMember http://192.168.1.2:5200 loadfactor=3
        #BalancerMember http://192.168.1.3:5200 loadfactor=1 status=+H
        ProxySet lbmethod=byrequests
    </Proxy>
    #ProxyPass / balancer://laravels/
    #ProxyPassReverse / balancer://laravels/

    # Apache handles the static resources, LaravelS handles the dynamic resource.
    RewriteEngine On
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-d
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f
    RewriteRule ^/(.*)$ balancer://laravels%{REQUEST_URI} [P,L]

    ErrorLog ${APACHE_LOG_DIR}/www.laravels.com.error.log
    CustomLog ${APACHE_LOG_DIR}/www.laravels.com.access.log combined
</VirtualHost>

Enable WebSocket server

The listening address of the WebSocket server is the same as that of the Http server.

1.Create a WebSocket Handler class, and implement the interface WebSocketHandlerInterface. The instance is automatically instantiated at startup; you do not need to create it manually.

namespace App\Services;
use Hhxsv5\LaravelS\Swoole\WebSocketHandlerInterface;
use Swoole\Http\Request;
use Swoole\Http\Response;
use Swoole\WebSocket\Frame;
use Swoole\WebSocket\Server;
/**
 * @see https://www.swoole.co.uk/docs/modules/swoole-websocket-server
 */
class WebSocketService implements WebSocketHandlerInterface
{
    // Declare constructor without parameters
    public function __construct()
    {
    }
    // public function onHandShake(Request $request, Response $response)
    // {
           // Custom handshake: https://www.swoole.co.uk/docs/modules/swoole-websocket-server-on-handshake
           // The onOpen event will be triggered automatically after a successful handshake
    // }
    public function onOpen(Server $server, Request $request)
    {
        // Before the onOpen event is triggered, the HTTP request to establish the WebSocket has passed the Laravel route,
        // so Laravel's Request, Auth information are readable, Session is readable and writable, but only in the onOpen event.
        // \Log::info('New WebSocket connection', [$request->fd, request()->all(), session()->getId(), session('xxx'), session(['yyy' => time()])]);
        // The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
        $server->push($request->fd, 'Welcome to LaravelS');
    }
    public function onMessage(Server $server, Frame $frame)
    {
        // \Log::info('Received message', [$frame->fd, $frame->data, $frame->opcode, $frame->finish]);
        // The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
        $server->push($frame->fd, date('Y-m-d H:i:s'));
    }
    public function onClose(Server $server, $fd, $reactorId)
    {
        // The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
    }
}

2.Modify config/laravels.php.

// ...
'websocket'      => [
    'enable'  => true, // Note: set enable to true
    'handler' => \App\Services\WebSocketService::class,
],
'swoole'         => [
    //...
    // Must set dispatch_mode in (2, 4, 5), see https://www.swoole.co.uk/docs/modules/swoole-server/configuration
    'dispatch_mode' => 2,
    //...
],
// ...

3.Use SwooleTable to bind FD & UserId (optional), see the Swoole Table Demo. You can also use other global storage services, like Redis/Memcached/MySQL, but be careful that FDs may conflict between multiple Swoole servers.

4.Cooperate with Nginx (Recommended)

Refer WebSocket Proxy

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
upstream swoole {
    # Connect IP:Port
    server 127.0.0.1:5200 weight=5 max_fails=3 fail_timeout=30s;
    # Connect UnixSocket Stream file, tips: put the socket file in the /dev/shm directory to get better performance
    #server unix:/yourpath/laravel-s-test/storage/laravels.sock weight=5 max_fails=3 fail_timeout=30s;
    #server 192.168.1.1:5200 weight=3 max_fails=3 fail_timeout=30s;
    #server 192.168.1.2:5200 backup;
    keepalive 16;
}
server {
    listen 80;
    # Don't forget to bind the host
    server_name laravels.com;
    root /yourpath/laravel-s-test/public;
    access_log /yourpath/log/nginx/$server_name.access.log  main;
    autoindex off;
    index index.html index.htm;
    # Nginx handles the static resources(recommend enabling gzip), LaravelS handles the dynamic resource.
    location / {
        try_files $uri @laravels;
    }
    # Response 404 directly when request the PHP file, to avoid exposing public/*.php
    #location ~* \.php$ {
    #    return 404;
    #}
    # Http and WebSocket are concomitant, Nginx identifies them by "location"
    # !!! The location of WebSocket is "/ws"
    # Javascript: var ws = new WebSocket("ws://laravels.com/ws");
    location =/ws {
        # proxy_connect_timeout 60s;
        # proxy_send_timeout 60s;
        # proxy_read_timeout: Nginx will close the connection if the proxied server does not send data to Nginx in 60 seconds; At the same time, this close behavior is also affected by heartbeat setting of Swoole.
        # proxy_read_timeout 60s;
        proxy_http_version 1.1;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Real-PORT $remote_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header Scheme $scheme;
        proxy_set_header Server-Protocol $server_protocol;
        proxy_set_header Server-Name $server_name;
        proxy_set_header Server-Addr $server_addr;
        proxy_set_header Server-Port $server_port;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_pass http://swoole;
    }
    location @laravels {
        # proxy_connect_timeout 60s;
        # proxy_send_timeout 60s;
        # proxy_read_timeout 60s;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Real-PORT $remote_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header Scheme $scheme;
        proxy_set_header Server-Protocol $server_protocol;
        proxy_set_header Server-Name $server_name;
        proxy_set_header Server-Addr $server_addr;
        proxy_set_header Server-Port $server_port;
        proxy_pass http://swoole;
    }
}

5.Heartbeat setting

Heartbeat setting of Swoole

// config/laravels.php
'swoole' => [
    //...
    // All connections are traversed every 60 seconds. If a connection does not send any data to the server within 600 seconds, the connection will be forced to close.
    'heartbeat_idle_time'      => 600,
    'heartbeat_check_interval' => 60,
    //...
],

Proxy read timeout of Nginx

# Nginx will close the connection if the proxied server does not send data to Nginx in 60 seconds
proxy_read_timeout 60s;

6.Push data in controller

namespace App\Http\Controllers;
class TestController extends Controller
{
    public function push()
    {
        $fd = 1; // Find fd by userId from a map [userId=>fd].
        /**@var \Swoole\WebSocket\Server $swoole */
        $swoole = app('swoole');
        $success = $swoole->push($fd, 'Push data to fd#1 in Controller');
        var_dump($success);
    }
}

Listen events

System events

Usually, you can reset/destroy some global/static variables, or change the current Request/Response object.

laravels.received_request After LaravelS parsed Swoole\Http\Request to Illuminate\Http\Request, before Laravel's Kernel handles this request.

// Edit file `app/Providers/EventServiceProvider.php`, add the following code into method `boot`
// If no variable $events, you can also call Facade \Event::listen(). 
$events->listen('laravels.received_request', function (\Illuminate\Http\Request $req, $app) {
    $req->query->set('get_key', 'hhxsv5');// Change query of request
    $req->request->set('post_key', 'hhxsv5'); // Change post of request
});

laravels.generated_response After Laravel's Kernel handled the request, before LaravelS parses Illuminate\Http\Response to Swoole\Http\Response.

// Edit file `app/Providers/EventServiceProvider.php`, add the following code into method `boot`
// If no variable $events, you can also call Facade \Event::listen(). 
$events->listen('laravels.generated_response', function (\Illuminate\Http\Request $req, \Symfony\Component\HttpFoundation\Response $rsp, $app) {
    $rsp->headers->set('header-key', 'hhxsv5');// Change header of response
});

Customized asynchronous events

This feature depends on the AsyncTask of Swoole; you need to set swoole.task_worker_num in config/laravels.php first. The performance of asynchronous event processing is influenced by the number of Swoole task processes, so you need to set task_worker_num appropriately.

1.Create event class.

use Hhxsv5\LaravelS\Swoole\Task\Event;
class TestEvent extends Event
{
    protected $listeners = [
        // Listener list
        TestListener1::class,
        // TestListener2::class,
    ];
    private $data;
    public function __construct($data)
    {
        $this->data = $data;
    }
    public function getData()
    {
        return $this->data;
    }
}

2.Create listener class.

use Hhxsv5\LaravelS\Swoole\Task\Task;
use Hhxsv5\LaravelS\Swoole\Task\Listener;
class TestListener1 extends Listener
{
    /**
     * @var TestEvent
     */
    protected $event;
    
    public function handle()
    {
        \Log::info(__CLASS__ . ':handle start', [$this->event->getData()]);
        sleep(2);// Simulate the slow codes
        // Deliver task in CronJob, but NOT support callback finish() of task.
        // Note: Modify task_ipc_mode to 1 or 2 in config/laravels.php, see https://www.swoole.co.uk/docs/modules/swoole-server/configuration
        $ret = Task::deliver(new TestTask('task data'));
        var_dump($ret);
        // The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
    }
}

3.Fire event.

// Create instance of event and fire it, "fire" is asynchronous.
use Hhxsv5\LaravelS\Swoole\Task\Event;
$event = new TestEvent('event data');
// $event->delay(10); // Delay 10 seconds to fire event
// $event->setTries(3); // When an error occurs, try 3 times in total
$success = Event::fire($event);
var_dump($success); // Returns true on success, otherwise false

Asynchronous task queue

This feature depends on the AsyncTask of Swoole; you need to set swoole.task_worker_num in config/laravels.php first. The performance of task processing is influenced by the number of Swoole task processes, so you need to set task_worker_num appropriately.

1.Create task class.

use Hhxsv5\LaravelS\Swoole\Task\Task;
class TestTask extends Task
{
    private $data;
    private $result;
    public function __construct($data)
    {
        $this->data = $data;
    }
    // The logic of task handling, run in task process, CAN NOT deliver task
    public function handle()
    {
        \Log::info(__CLASS__ . ':handle start', [$this->data]);
        sleep(2);// Simulate the slow codes
        // The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
        $this->result = 'the result of ' . $this->data;
    }
    // Optional, finish event, the logic of after task handling, run in worker process, CAN deliver task 
    public function finish()
    {
        \Log::info(__CLASS__ . ':finish start', [$this->result]);
        Task::deliver(new TestTask2('task2 data')); // Deliver the other task
    }
}

2.Deliver task.

// Create instance of TestTask and deliver it, "deliver" is asynchronous.
use Hhxsv5\LaravelS\Swoole\Task\Task;
$task = new TestTask('task data');
// $task->delay(3);// delay 3 seconds to deliver task
// $task->setTries(3); // When an error occurs, try 3 times in total
$ret = Task::deliver($task);
var_dump($ret); // Returns true on success, otherwise false

Millisecond cron job

A cron job wrapper based on Swoole's millisecond timer, replacing Linux crontab.

1.Create cron job class.

namespace App\Jobs\Timer;
use App\Tasks\TestTask;
use Swoole\Coroutine;
use Hhxsv5\LaravelS\Swoole\Task\Task;
use Hhxsv5\LaravelS\Swoole\Timer\CronJob;
class TestCronJob extends CronJob
{
    protected $i = 0;
    // !!! The `interval` and `isImmediate` of cron job can be configured in two ways(pick one of two): one is to overload the corresponding method, and the other is to pass parameters when registering cron job.
    // --- Override the corresponding method to return the configuration: begin
    public function interval()
    {
        return 1000;// Run every 1000ms
    }
    public function isImmediate()
    {
        return false;// Whether to trigger `run` immediately after setting up
    }
    // --- Override the corresponding method to return the configuration: end
    public function run()
    {
        \Log::info(__METHOD__, ['start', $this->i, microtime(true)]);
        // do something
        // sleep(1); // Swoole < 2.1
        Coroutine::sleep(1); // Swoole>=2.1 Coroutine will be automatically created for run().
        $this->i++;
        \Log::info(__METHOD__, ['end', $this->i, microtime(true)]);

        if ($this->i >= 10) { // Run 10 times only
            \Log::info(__METHOD__, ['stop', $this->i, microtime(true)]);
            $this->stop(); // Stop this cron job, but it will run again after restart/reload.
            // Deliver task in CronJob, but NOT support callback finish() of task.
            // Note: Modify task_ipc_mode to 1 or 2 in config/laravels.php, see https://www.swoole.co.uk/docs/modules/swoole-server/configuration
            $ret = Task::deliver(new TestTask('task data'));
            var_dump($ret);
        }
        // The exceptions thrown here will be caught by the upper layer and recorded in the Swoole log. Developers need to try/catch manually.
    }
}

2.Register cron job.

// Register cron jobs in file "config/laravels.php"
[
    // ...
    'timer'          => [
        'enable' => true, // Enable Timer
        'jobs'   => [ // The list of cron job
            // Enable LaravelScheduleJob to run `php artisan schedule:run` every 1 minute, replace Linux Crontab
            // \Hhxsv5\LaravelS\Illuminate\LaravelScheduleJob::class,
            // Two ways to configure parameters:
            // [\App\Jobs\Timer\TestCronJob::class, [1000, true]], // Pass in parameters when registering
            \App\Jobs\Timer\TestCronJob::class, // Override the corresponding method to return the configuration
        ],
        'max_wait_time' => 5, // Max waiting time of reloading
        // Enable the global lock to ensure that only one instance starts the timer when deploying multiple instances. This feature depends on Redis, please see https://laravel.com/docs/7.x/redis
        'global_lock'     => false,
        'global_lock_key' => config('app.name', 'Laravel'),
    ],
    // ...
];

3.Note: multiple timers will be launched when building a server cluster, so you need to make sure that only one timer is launched to avoid running repetitive tasks.

4.LaravelS v3.4.0 starts to support the hot restart [Reload] Timer process. After LaravelS receives the SIGUSR1 signal, it waits for max_wait_time(default 5) seconds to end the process, then the Manager process will pull up the Timer process again.

5.If you only need to use minute-level scheduled tasks, it is recommended to enable Hhxsv5\LaravelS\Illuminate\LaravelScheduleJob instead of Linux Crontab, so that you can follow the coding habits of Laravel task scheduling and configure Kernel.

// app/Console/Kernel.php
protected function schedule(Schedule $schedule)
{
    // runInBackground() will start a new child process to execute the task. This is asynchronous and will not affect the execution timing of other tasks.
    $schedule->command(TestCommand::class)->runInBackground()->everyMinute();
}

Automatically reload after modifying code

Via inotify, support Linux only.

1.Install inotify extension.

2.Turn on the switch in Settings.

3.Notice: only file modifications made inside Linux will trigger the file change events. It's recommended to use the latest Docker. Vagrant Solution.

Via fswatch, support OS X/Linux/Windows.

1.Install fswatch.

2.Run command in your project root directory.

# Watch current directory
./bin/fswatch
# Watch app directory
./bin/fswatch ./app

Via inotifywait, support Linux.

1.Install inotify-tools.

2.Run command in your project root directory.

# Watch current directory
./bin/inotify
# Watch app directory
./bin/inotify ./app

When the above methods do not work, the ultimate solution is to set max_request=1, worker_num=1, so that the Worker process restarts after processing each request. The performance of this approach is very poor, so use it only in the development environment.

Get the instance of SwooleServer in your project

/**
 * $swoole is the instance of `Swoole\WebSocket\Server` if enable WebSocket server, otherwise `Swoole\Http\Server`
 * @var \Swoole\WebSocket\Server|\Swoole\Http\Server $swoole
 */
$swoole = app('swoole');
var_dump($swoole->stats());
$swoole->push($fd, 'Push WebSocket message');

Use SwooleTable

1.Define Table; multiple tables are supported.

All defined tables will be created before Swoole starts.

// in file "config/laravels.php"
[
    // ...
    'swoole_tables'  => [
        // Scene:bind UserId & FD in WebSocket
        'ws' => [// The Key is table name, will add suffix "Table" to avoid naming conflicts. Here defined a table named "wsTable"
            'size'   => 102400,// The max size
            'column' => [// Define the columns
                ['name' => 'value', 'type' => \Swoole\Table::TYPE_INT, 'size' => 8],
            ],
        ],
        //...Define the other tables
    ],
    // ...
];

2.Access Table: all table instances will be bound on SwooleServer, access by app('swoole')->xxxTable.

namespace App\Services;
use Hhxsv5\LaravelS\Swoole\WebSocketHandlerInterface;
use Swoole\Http\Request;
use Swoole\WebSocket\Frame;
use Swoole\WebSocket\Server;
class WebSocketService implements WebSocketHandlerInterface
{
    /**@var \Swoole\Table $wsTable */
    private $wsTable;
    public function __construct()
    {
        $this->wsTable = app('swoole')->wsTable;
    }
    // Scene:bind UserId & FD in WebSocket
    public function onOpen(Server $server, Request $request)
    {
        // var_dump(app('swoole') === $server);// The same instance
        /**
         * Get the currently logged in user
         * This feature requires that the path to establish a WebSocket connection go through middleware such as Authenticate.
         * E.g:
         * Browser side: var ws = new WebSocket("ws://127.0.0.1:5200/ws");
         * Then the /ws route in Laravel needs to add the middleware like Authenticate.
         * Route::get('/ws', function () {
         *     // Respond any content with status code 200
         *     return 'websocket';
         * })->middleware(['auth']);
         */
        // $user = Auth::user();
        // $userId = $user ? $user->id : 0; // 0 means a guest user who is not logged in
        $userId = mt_rand(1000, 10000);
        // if (!$userId) {
        //     // Disconnect the connections of unlogged users
        //     $server->disconnect($request->fd);
        //     return;
        // }
        $this->wsTable->set('uid:' . $userId, ['value' => $request->fd]);// Bind map uid to fd
        $this->wsTable->set('fd:' . $request->fd, ['value' => $userId]);// Bind map fd to uid
        $server->push($request->fd, "Welcome to LaravelS #{$request->fd}");
    }
    public function onMessage(Server $server, Frame $frame)
    {
        // Broadcast
        foreach ($this->wsTable as $key => $row) {
            if (strpos($key, 'uid:') === 0 && $server->isEstablished($row['value'])) {
                $content = sprintf('Broadcast: new message "%s" from #%d', $frame->data, $frame->fd);
                $server->push($row['value'], $content);
            }
        }
    }
    public function onClose(Server $server, $fd, $reactorId)
    {
        $uid = $this->wsTable->get('fd:' . $fd);
        if ($uid !== false) {
            $this->wsTable->del('uid:' . $uid['value']); // Unbind uid map
        }
        $this->wsTable->del('fd:' . $fd);// Unbind fd map
        $server->push($fd, "Goodbye #{$fd}");
    }
}

Multi-port mixed protocol

For more information, please refer to Swoole Server AddListener

To make our main server support more protocols, not just Http and WebSocket, we bring Swoole's multi-port mixed protocol feature into LaravelS and name it Socket. Now you can easily build TCP/UDP applications on top of Laravel.

Create Socket handler class, and extend Hhxsv5\LaravelS\Swoole\Socket\{TcpSocket|UdpSocket|Http|WebSocket}.

namespace App\Sockets;
use Hhxsv5\LaravelS\Swoole\Socket\TcpSocket;
use Swoole\Server;
class TestTcpSocket extends TcpSocket
{
    public function onConnect(Server $server, $fd, $reactorId)
    {
        \Log::info('New TCP connection', [$fd]);
        $server->send($fd, 'Welcome to LaravelS.');
    }
    public function onReceive(Server $server, $fd, $reactorId, $data)
    {
        \Log::info('Received data', [$fd, $data]);
        $server->send($fd, 'LaravelS: ' . $data);
        if ($data === "quit\r\n") {
            $server->send($fd, 'LaravelS: bye' . PHP_EOL);
            $server->close($fd);
        }
    }
    public function onClose(Server $server, $fd, $reactorId)
    {
        \Log::info('Close TCP connection', [$fd]);
        $server->send($fd, 'Goodbye');
    }
}

These Socket connections share the same Worker processes with your HTTP/WebSocket connections. So it won't be a problem at all if you want to deliver tasks, use SwooleTable, or even Laravel components such as DB, Eloquent and so on. At the same time, you can access the Swoole\Server\Port object directly via the member property swoolePort.

public function onReceive(Server $server, $fd, $reactorId, $data)
{
    $port = $this->swoolePort; // Get the `Swoole\Server\Port` object
}
namespace App\Http\Controllers;
class TestController extends Controller
{
    public function test()
    {
        /**@var \Swoole\Http\Server|\Swoole\WebSocket\Server $swoole */
        $swoole = app('swoole');
        // $swoole->ports: Traverse all Port objects, https://www.swoole.co.uk/docs/modules/swoole-server/multiple-ports
        $port = $swoole->ports[0]; // Get the `Swoole\Server\Port` object, $ports[0] is the port of the main server
        foreach ($port->connections as $fd) { // Traverse all connections
            // $swoole->send($fd, 'Send tcp message');
            // if($swoole->isEstablished($fd)) {
            //     $swoole->push($fd, 'Send websocket message');
            // }
        }
    }
}

Register Sockets.

// Edit `config/laravels.php`
//...
'sockets' => [
    [
        'host'     => '127.0.0.1',
        'port'     => 5291,
        'type'     => SWOOLE_SOCK_TCP,// Socket type: SWOOLE_SOCK_TCP/SWOOLE_SOCK_TCP6/SWOOLE_SOCK_UDP/SWOOLE_SOCK_UDP6/SWOOLE_UNIX_DGRAM/SWOOLE_UNIX_STREAM
        'settings' => [// Swoole settings:https://www.swoole.co.uk/docs/modules/swoole-server-methods#swoole_server-addlistener
            'open_eof_check' => true,
            'package_eof'    => "\r\n",
        ],
        'handler'  => \App\Sockets\TestTcpSocket::class,
        'enable'   => true, // whether to enable, default true
    ],
],

About the heartbeat configuration, it can only be set on the main server and cannot be configured on Socket, but the Socket inherits the heartbeat configuration of the main server.

For TCP socket, onConnect and onClose events will be blocked when dispatch_mode of Swoole is 1/3, so if you want to unblock these two events please set dispatch_mode to 2/4/5.

'swoole' => [
    //...
    'dispatch_mode' => 2,
    //...
];

Test.

TCP: telnet 127.0.0.1 5291

UDP: [Linux] echo "Hello LaravelS" > /dev/udp/127.0.0.1/5292

Register example of other protocols.

  • UDP
  • Http
  • WebSocket: The main server must turn on WebSocket, that is, set websocket.enable to true.

Coroutine

Swoole Coroutine

Warning: The order of code execution in the coroutine is out of order. The data of the request level should be isolated by the coroutine ID. However, there are many singleton and static attributes in Laravel/Lumen, the data between different requests will affect each other, it's Unsafe. For example, the database connection is a singleton, the same database connection shares the same PDO resource. This is fine in the synchronous blocking mode, but it does not work in the asynchronous coroutine mode. Each query needs to create different connections and maintain IO state of different connections, which requires a connection pool.

DO NOT enable the coroutine, only the custom process can use the coroutine.

Custom process

Supports developers in creating special worker processes for monitoring, reporting, or other special tasks. Refer to addProcess.

Create a Process class that implements CustomProcessInterface.

namespace App\Processes;
use App\Tasks\TestTask;
use Hhxsv5\LaravelS\Swoole\Process\CustomProcessInterface;
use Hhxsv5\LaravelS\Swoole\Task\Task;
use Swoole\Coroutine;
use Swoole\Http\Server;
use Swoole\Process;
class TestProcess implements CustomProcessInterface
{
    /**
     * @var bool Quit tag for Reload updates
     */
    private static $quit = false;

    public static function callback(Server $swoole, Process $process)
    {
        // The callback method cannot exit. Once exited, Manager process will automatically create the process 
        while (!self::$quit) {
            \Log::info('Test process: running');
            // sleep(1); // Swoole < 2.1
            Coroutine::sleep(1); // Swoole>=2.1: Coroutine & Runtime will be automatically enabled for callback().
             // Deliver task in custom process, but NOT support callback finish() of task.
            // Note: Modify task_ipc_mode to 1 or 2 in config/laravels.php, see https://www.swoole.co.uk/docs/modules/swoole-server/configuration
            $ret = Task::deliver(new TestTask('task data'));
            var_dump($ret);
            // The upper layer will catch the exception thrown in the callback and record it in the Swoole log, and then this process will exit. The Manager process will re-create the process after 3 seconds, so developers need to try/catch to catch the exception by themselves to avoid frequent process creation.
            // throw new \Exception('an exception');
        }
    }
    // Requirements: LaravelS >= v3.4.0 & callback() must be async non-blocking program.
    public static function onReload(Server $swoole, Process $process)
    {
        // Stop the process...
        // Then end process
        \Log::info('Test process: reloading');
        self::$quit = true;
        // $process->exit(0); // Force exit process
    }
    // Requirements: LaravelS >= v3.7.4 & callback() must be async non-blocking program.
    public static function onStop(Server $swoole, Process $process)
    {
        // Stop the process...
        // Then end process
        \Log::info('Test process: stopping');
        self::$quit = true;
        // $process->exit(0); // Force exit process
    }
}

Register TestProcess.

// Edit `config/laravels.php`
// ...
'processes' => [
    'test' => [ // Key name is process name
        'class'    => \App\Processes\TestProcess::class,
        'redirect' => false, // Whether redirect stdin/stdout, true or false
        'pipe'     => 0,     // The type of pipeline, 0: no pipeline 1: SOCK_STREAM 2: SOCK_DGRAM
        'enable'   => true,  // Whether to enable, default true
        //'num'    => 3   // To create multiple processes of this class, default is 1
        //'queue'    => [ // Enable message queue as inter-process communication, configure empty array means use default parameters
        //    'msg_key'  => 0,    // The key of the message queue. Default: ftok(__FILE__, 1).
        //    'mode'     => 2,    // Communication mode, default is 2, which means contention mode
        //    'capacity' => 8192, // The length of a single message, is limited by the operating system kernel parameters. The default is 8192, and the maximum is 65536
        //],
        //'restart_interval' => 5, // After the process exits abnormally, how many seconds to wait before restarting the process, default 5 seconds
    ],
],

Note: callback() cannot quit. If it quits, the Manager process will re-create the process.

Example: Write data to a custom process.

// config/laravels.php
'processes' => [
    'test' => [
        'class'    => \App\Processes\TestProcess::class,
        'redirect' => false,
        'pipe'     => 1,
    ],
],
// app/Processes/TestProcess.php
public static function callback(Server $swoole, Process $process)
{
    while ($data = $process->read()) {
        \Log::info('TestProcess: read data', [$data]);
        $process->write('TestProcess: ' . $data);
    }
}
// app/Http/Controllers/TestController.php
public function testProcessWrite()
{
    /**@var \Swoole\Process $process */
    $process = app('swoole')->customProcesses['test'];
    $process->write('TestController: write data' . time());
    var_dump($process->read());
}

Common components

Apollo

LaravelS will pull the Apollo configuration and write it to the .env file when starting. At the same time, LaravelS will start the custom process apollo to monitor the configuration and automatically reload when the configuration changes.

Enable Apollo: add --enable-apollo and Apollo parameters to the startup parameters.

php bin/laravels start --enable-apollo --apollo-server=http://127.0.0.1:8080 --apollo-app-id=LARAVEL-S-TEST

Support hot updates(optional).

// Edit `config/laravels.php`
'processes' => Hhxsv5\LaravelS\Components\Apollo\Process::getDefinition(),
// When there are other custom process configurations
'processes' => [
    'test' => [
        'class'    => \App\Processes\TestProcess::class,
        'redirect' => false,
        'pipe'     => 1,
    ],
    // ...
] + Hhxsv5\LaravelS\Components\Apollo\Process::getDefinition(),

List of available parameters.

  • apollo-server: Apollo server URL. Default: -. Demo: --apollo-server=http://127.0.0.1:8080
  • apollo-app-id: Apollo APP ID. Default: -. Demo: --apollo-app-id=LARAVEL-S-TEST
  • apollo-namespaces: The namespaces to which the APP belongs; multiple namespaces can be specified. Default: application. Demo: --apollo-namespaces=application --apollo-namespaces=env
  • apollo-cluster: The cluster to which the APP belongs. Default: default. Demo: --apollo-cluster=default
  • apollo-client-ip: IP of the current instance, can also be used for grayscale publishing. Default: local intranet IP. Demo: --apollo-client-ip=10.2.1.83
  • apollo-pull-timeout: Timeout (seconds) when pulling the configuration. Default: 5. Demo: --apollo-pull-timeout=5
  • apollo-backup-old-env: Whether to back up the old .env configuration file when updating it. Default: false. Demo: --apollo-backup-old-env

Prometheus

Supports Prometheus monitoring and alerting, with Grafana for visually viewing the monitoring metrics. Please refer to Docker Compose for the environment construction of Prometheus and Grafana.

Require extension APCu >= 5.0.0, please install it by pecl install apcu.

Copy the configuration file prometheus.php to the config directory of your project. Modify the configuration as appropriate.

# Execute commands in the project root directory
cp vendor/hhxsv5/laravel-s/config/prometheus.php config/

If your project is Lumen, you also need to manually load the configuration $app->configure('prometheus'); in bootstrap/app.php.

Configure global middleware: Hhxsv5\LaravelS\Components\Prometheus\RequestMiddleware::class. In order to count the request time consumption as accurately as possible, RequestMiddleware must be the first global middleware, which needs to be placed in front of other middleware.

Register ServiceProvider: Hhxsv5\LaravelS\Components\Prometheus\ServiceProvider::class.

Configure the CollectorProcess in config/laravels.php to collect the metrics of Swoole Worker/Task/Timer processes regularly.

'processes' => Hhxsv5\LaravelS\Components\Prometheus\CollectorProcess::getDefinition(),

Create the route to output metrics.

use Hhxsv5\LaravelS\Components\Prometheus\Exporter;

Route::get('/actuator/prometheus', function () {
    $result = app(Exporter::class)->render();
    return response($result, 200, ['Content-Type' => Exporter::REDNER_MIME_TYPE]);
});

Complete the configuration of Prometheus and start it.

global:
  scrape_interval: 5s
  scrape_timeout: 5s
  evaluation_interval: 30s
scrape_configs:
- job_name: laravel-s-test
  honor_timestamps: true
  metrics_path: /actuator/prometheus
  scheme: http
  follow_redirects: true
  static_configs:
  - targets:
    - 127.0.0.1:5200 # The ip and port of the monitored service
# Dynamically discovered using one of the supported service-discovery mechanisms
# https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
# - job_name: laravels-eureka
#   honor_timestamps: true
#   scrape_interval: 5s
#   metrics_path: /actuator/prometheus
#   scheme: http
#   follow_redirects: true
  # eureka_sd_configs:
  # - server: http://127.0.0.1:8080/eureka
  #   follow_redirects: true
  #   refresh_interval: 5s

Start Grafana, then import panel json.

Grafana Dashboard

Other features

Configure Swoole events

Supported events:

  • ServerStart (Hhxsv5\LaravelS\Swoole\Events\ServerStartInterface): occurs when the Master process is starting; this event should not handle complex business logic, and can only do some simple work of initialization.
  • ServerStop (Hhxsv5\LaravelS\Swoole\Events\ServerStopInterface): occurs when the server exits normally; CANNOT use async or coroutine related APIs in this event.
  • WorkerStart (Hhxsv5\LaravelS\Swoole\Events\WorkerStartInterface): occurs after the Worker/Task process is started, and the Laravel initialization has been completed.
  • WorkerStop (Hhxsv5\LaravelS\Swoole\Events\WorkerStopInterface): occurs after the Worker/Task process exits normally.
  • WorkerError (Hhxsv5\LaravelS\Swoole\Events\WorkerErrorInterface): occurs when an exception or fatal error occurs in the Worker/Task process.

1.Create an event class to implement the corresponding interface.

namespace App\Events;
use Hhxsv5\LaravelS\Swoole\Events\ServerStartInterface;
use Swoole\Atomic;
use Swoole\Http\Server;
class ServerStartEvent implements ServerStartInterface
{
    public function __construct()
    {
    }
    public function handle(Server $server)
    {
        // Initialize a global counter (available across processes)
        $server->atomicCount = new Atomic(2233);

        // Invoked in controller: app('swoole')->atomicCount->get();
    }
}
namespace App\Events;
use Hhxsv5\LaravelS\Swoole\Events\WorkerStartInterface;
use Swoole\Http\Server;
class WorkerStartEvent implements WorkerStartInterface
{
    public function __construct()
    {
    }
    public function handle(Server $server, $workerId)
    {
        // Initialize a database connection pool
        // DatabaseConnectionPool::init();
    }
}

2.Configuration.

// Edit `config/laravels.php`
'event_handlers' => [
    'ServerStart' => [\App\Events\ServerStartEvent::class], // Trigger events in array order
    'WorkerStart' => [\App\Events\WorkerStartEvent::class],
],

Serverless

Alibaba Cloud Function Compute

Function Compute.

1.Modify bootstrap/app.php and set the storage directory. Because the project directory is read-only, only the /tmp directory can be read and written.

$app->useStoragePath(env('APP_STORAGE_PATH', '/tmp/storage'));

2.Create a shell script laravels_bootstrap and grant executable permission.

#!/usr/bin/env bash
set +e

# Create storage-related directories
mkdir -p /tmp/storage/app/public
mkdir -p /tmp/storage/framework/cache
mkdir -p /tmp/storage/framework/sessions
mkdir -p /tmp/storage/framework/testing
mkdir -p /tmp/storage/framework/views
mkdir -p /tmp/storage/logs

# Set the environment variable APP_STORAGE_PATH, please make sure it's the same as APP_STORAGE_PATH in .env
export APP_STORAGE_PATH=/tmp/storage

# Start LaravelS
php bin/laravels start

3.Configure template.xml.

ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  laravel-s-demo:
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'LaravelS Demo for Serverless'
    fc-laravel-s:
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Handler: laravels.handler
        Runtime: custom
        MemorySize: 512
        Timeout: 30
        CodeUri: ./
        InstanceConcurrency: 10
        EnvironmentVariables:
          BOOTSTRAP_FILE: laravels_bootstrap

Important notices

Singleton Issue

Under FPM mode, singleton instances are instantiated and recycled on every request: request start => instantiate instance => request end => recycle instance.

Under Swoole Server, all singleton instances are held in memory and have a different lifetime from FPM: request start => instantiate instance => request end => the singleton instance is not recycled. So developers need to maintain the state of singleton instances in every request.

Common solutions:

Write an XxxCleaner class to clean up singleton object state. This class implements the interface Hhxsv5\LaravelS\Illuminate\Cleaners\CleanerInterface; then register it in the cleaners section of laravels.php.

Reset status of singleton instances by Middleware.

Re-register the ServiceProvider: add XxxServiceProvider to register_providers in the file laravels.php, so that singleton instances are reinitialized in every request. Refer to the documentation for details.

Cleaners

Configuration cleaners.

Known issues

Known issues: a package of known issues and solutions.

Debugging method

Logging: if you want to output to the console, you can use the stderr channel, e.g. Log::channel('stderr')->debug('debug message').

Laravel Dump Server(Laravel 5.7 has been integrated by default).

Read request

Read request by Illuminate\Http\Request Object, $_ENV is readable, $_SERVER is partially readable, CANNOT USE $_GET/$_POST/$_FILES/$_COOKIE/$_REQUEST/$_SESSION/$GLOBALS.

public function form(\Illuminate\Http\Request $request)
{
    $name = $request->input('name');
    $all = $request->all();
    $sessionId = $request->cookie('sessionId');
    $photo = $request->file('photo');
    // Call getContent() to get the raw POST body, instead of file_get_contents('php://input')
    $rawContent = $request->getContent();
    //...
}

Output response

Respond via the Illuminate\Http\Response object, compatible with echo/var_dump()/print_r(); CANNOT USE the functions dd()/exit()/die()/header()/setcookie()/http_response_code().

public function json()
{
    return response()->json(['time' => time()])->header('header1', 'value1')->withCookie('c1', 'v1');
}

Persistent connection

Singleton connections will be resident in memory; it is recommended to turn on persistent connections for better performance.

  1. Database connection: it will reconnect automatically immediately after a disconnect.
// config/database.php
'connections' => [
    'my_conn' => [
        'driver'    => 'mysql',
        'host'      => env('DB_MY_CONN_HOST', 'localhost'),
        'port'      => env('DB_MY_CONN_PORT', 3306),
        'database'  => env('DB_MY_CONN_DATABASE', 'forge'),
        'username'  => env('DB_MY_CONN_USERNAME', 'forge'),
        'password'  => env('DB_MY_CONN_PASSWORD', ''),
        'charset'   => 'utf8mb4',
        'collation' => 'utf8mb4_unicode_ci',
        'prefix'    => '',
        'strict'    => false,
        'options'   => [
            // Enable persistent connection
            \PDO::ATTR_PERSISTENT => true,
        ],
    ],
],
  2. Redis connection: it won't reconnect automatically immediately after a disconnect; it will throw an exception about the lost connection and reconnect next time. You need to make sure that you SELECT the DB correctly every time before operating on Redis.
// config/database.php
'redis' => [
    'client' => env('REDIS_CLIENT', 'phpredis'), // It is recommended to use phpredis for better performance.
    'default' => [
        'host'       => env('REDIS_HOST', 'localhost'),
        'password'   => env('REDIS_PASSWORD', null),
        'port'       => env('REDIS_PORT', 6379),
        'database'   => 0,
        'persistent' => true, // Enable persistent connection
    ],
],

About memory leaks

Avoid using global variables. If necessary, please clean or reset them manually.

Infinitely appending elements to a static/global variable will lead to OOM (Out of Memory).

class Test
{
    public static $array = [];
    public static $string = '';
}

// Controller
public function test(Request $req)
{
    // Out of Memory
    Test::$array[] = $req->input('param1');
    Test::$string .= $req->input('param2');
}

Memory leak detection method

Modify config/laravels.php: worker_num=1, max_request=1000000, remember to change it back after test;

Add routing /debug-memory-leak without route middleware to observe the memory changes of the Worker process;

Start LaravelS and request /debug-memory-leak until diff_mem is less than or equal to zero; if diff_mem is always greater than zero, it means that there may be a memory leak in Global Middleware or Laravel Framework;

After completing Step 3, alternately request the business routes and /debug-memory-leak (It is recommended to use ab/wrk to make a large number of requests for business routes), the initial increase in memory is normal. After a large number of requests for the business routes, if diff_mem is always greater than zero and curr_mem continues to increase, there is a high probability of memory leak; If curr_mem always changes within a certain range and does not continue to increase, there is a low probability of memory leak.

If you still can't solve it, max_request is the last guarantee.

Linux kernel parameter adjustment

Linux kernel parameter adjustment

Pressure test

Pressure test

Alternatives

Sponsor

PayPal

BTC

Gitee

Author: hhxsv5
Source Code: https://github.com/hhxsv5/laravel-s 
License: MIT License


Hoang Ha

How to Collect Data from Twitter Using Tweepy and Snscrape


You can use the data you can get from social media in a number of ways, like sentiment analysis (analyzing people's thoughts) on a specific issue or field of interest.

There are several ways you can scrape (or gather) data from Twitter. And in this article, we will look at two of those ways: using Tweepy and Snscrape.

We will learn a method to scrape public conversations from people on a specific trending topic, as well as tweets from a particular user.

Now without further ado, let’s get started.

Tweepy vs Snscrape – Introduction to Our Scraping Tools

Now, before we get into the implementation of each platform, let's try to grasp the differences and limits of each platform.

Tweepy

Tweepy is a Python library for integrating with the Twitter API. Because Tweepy is connected with the Twitter API, you can perform complex queries in addition to scraping tweets. It enables you to take advantage of all of the Twitter API's capabilities.

But there are some drawbacks – like the fact that its standard API only allows you to collect tweets for up to a week (that is, Tweepy does not allow recovery of tweets beyond a week window, so historical data retrieval is not permitted).

Also, there are limits to how many tweets you can retrieve from a user's account. You can read more about Tweepy's functionalities here.

Snscrape

Snscrape is another approach for scraping information from Twitter that does not require the use of an API. Snscrape allows you to scrape basic information such as a user's profile, tweet content, source, and so on.

Snscrape is not limited to Twitter, but can also scrape content from other prominent social media networks like Facebook, Instagram, and others.

Its advantages are that there are no limits to the number of tweets you can retrieve or the window of tweets (that is, the date range of tweets). So Snscrape allows you to retrieve old data.

But the one disadvantage is that it lacks all the other functionalities of Tweepy – still, if you only want to scrape tweets, Snscrape would be enough.

Now that we've clarified the distinction between the two methods, let's go over their implementation one by one.

How to Use Tweepy to Scrape Tweets

Before we begin using Tweepy, we must first make sure that our Twitter credentials are ready. With that, we can connect Tweepy to our API key and begin scraping.

If you do not have Twitter credentials, you can register for a Twitter developer account by going here. You will be asked some basic questions about how you intend to use the Twitter API. After that, you can begin the implementation.

The first step is to install the Tweepy library on your local machine, which you can do by typing:

pip install git+https://github.com/tweepy/tweepy.git

How to Scrape Tweets from a User on Twitter

Now that we’ve installed the Tweepy library, let’s scrape 100 tweets from a user called john on Twitter. We'll look at the full code implementation that will let us do this and discuss it in detail so we can grasp what’s going on:

import tweepy
import pandas as pd   # pandas is needed to build the dataframe below
import time           # time is needed for the pause in the exception handler

consumer_key = "XXXX" #Your API/Consumer key
consumer_secret = "XXXX" #Your API/Consumer Secret Key
access_token = "XXXX"    #Your Access token key
access_token_secret = "XXXX" #Your Access token Secret key

#Pass in our twitter API authentication key
auth = tweepy.OAuth1UserHandler(
    consumer_key, consumer_secret,
    access_token, access_token_secret
)

#Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)


username = "john"
no_of_tweets = 100


try:
    #The number of tweets we want to retrieve from the user
    tweets = api.user_timeline(screen_name=username, count=no_of_tweets)

    #Pulling some attributes from each tweet
    attributes_container = [[tweet.created_at, tweet.favorite_count, tweet.source, tweet.text] for tweet in tweets]

    #Creation of column list to rename the columns in the dataframe
    columns = ["Date Created", "Number of Likes", "Source of Tweet", "Tweet"]

    #Creation of Dataframe
    tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
    print('Status Failed On,', str(e))
    time.sleep(3)

Now let's go over each part of the code in the above block.

import tweepy
import pandas as pd   # pandas is needed to build the dataframe below
import time           # time is needed for the pause in the exception handler

consumer_key = "XXXX" #Your API/Consumer key 
consumer_secret = "XXXX" #Your API/Consumer Secret Key
access_token = "XXXX"    #Your Access token key
access_token_secret = "XXXX" #Your Access token Secret key

#Pass in our twitter API authentication key
auth = tweepy.OAuth1UserHandler(
    consumer_key, consumer_secret,
    access_token, access_token_secret
)

#Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)

In the above code, we've imported the libraries we need (Tweepy for the Twitter API, pandas for building the dataframe, and time for the pause in the exception handler), then we've created some variables to store our Twitter credentials (the Tweepy authentication handler requires four of them). We then pass those variables into the Tweepy authentication handler and save the result in another variable.

Then the last statement is where we instantiate the Tweepy API and pass in the required parameters.

username = "john"
no_of_tweets =100


try:
    #The number of tweets we want to retrieve from the user
    tweets = api.user_timeline(screen_name=username, count=no_of_tweets)
    
    #Pulling some attributes from each tweet
    attributes_container = [[tweet.created_at, tweet.favorite_count, tweet.source, tweet.text] for tweet in tweets]

    #Creation of column list to rename the columns in the dataframe
    columns = ["Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
    
    #Creation of Dataframe
    tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
    print('Status Failed On,',str(e))

In the above code, we set the name of the user (the @name on Twitter) we want to retrieve tweets from, as well as the number of tweets. We then wrapped the work in an exception handler to help us catch errors more gracefully.

After that, api.user_timeline() returns a collection of the most recent tweets posted by the user given in the screen_name parameter, limited by the number of tweets we want to retrieve.

In the next line of code, we passed in some attributes we want to retrieve from each tweet and saved them into a list. To see more attributes you can retrieve from a tweet, read this.

In the last chunk of code we created a dataframe, passing in the list we built along with the column names we defined.

Note that the column names must be listed in the same order as the attributes you appended to the attributes container (that is, the order in which you pulled those attributes from each tweet).

If you correctly followed the steps I described, you should have something like this:


Image by Author

Now that we are done, let's go over one more example before we move into the Snscrape implementation.

How to Scrape Tweets from a Text Search

In this method, we will be retrieving a tweet based on a search. You can do that like this:

import tweepy
import pandas as pd   # pandas is needed to build the dataframe below

consumer_key = "XXXX" #Your API/Consumer key
consumer_secret = "XXXX" #Your API/Consumer Secret Key
access_token = "XXXX"    #Your Access token key
access_token_secret = "XXXX" #Your Access token Secret key

#Pass in our twitter API authentication key
auth = tweepy.OAuth1UserHandler(
    consumer_key, consumer_secret,
    access_token, access_token_secret
)

#Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)


search_query = "sex for grades"
no_of_tweets = 150


try:
    #The number of tweets we want to retrieve from the search
    tweets = api.search_tweets(q=search_query, count=no_of_tweets)

    #Pulling some attributes from each tweet
    attributes_container = [[tweet.user.name, tweet.created_at, tweet.favorite_count, tweet.source, tweet.text] for tweet in tweets]

    #Creation of column list to rename the columns in the dataframe
    columns = ["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"]

    #Creation of Dataframe
    tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
    print('Status Failed On,', str(e))

The above code is similar to the previous code, except that we changed the API method from api.user_timeline() to api.search_tweets(). We've also added tweet.user.name to the attributes container list.

In the code above, you can see that we passed in a nested attribute. This is because if we only passed in tweet.user, it would return the whole user object. So we also specify the attribute we want to retrieve from that user object, which is name.

You can go here to see a list of additional attributes that you can retrieve from a user object. Now you should see something like this once you run it:


Image by Author.

Alright, that just about wraps up the Tweepy implementation. Just remember that there is a limit to the number of tweets you can retrieve, and you cannot retrieve tweets more than 7 days old using Tweepy.

How to Use Snscrape to Scrape Tweets

As I mentioned previously, Snscrape does not require Twitter credentials (API key) to access it. There is also no limit to the number of tweets you can fetch.

For this example, though, we'll just retrieve the same tweets as in the previous example, but using Snscrape instead.

To use Snscrape, we must first install its library on our PC. You can do that by typing:

pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git

How to Scrape Tweets from a User with Snscrape

Snscrape includes two methods for getting tweets from Twitter: the command line interface (CLI) and a Python Wrapper. Just keep in mind that the Python Wrapper is currently undocumented – but we can still get by with trial and error.
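
For reference, a typical CLI invocation looks roughly like this, writing a user's most recent tweets to a JSON Lines file (the flag and scraper names follow the snscrape README at the time of writing; treat them as assumptions if your version differs):

snscrape --jsonl --max-results 100 twitter-user john > john-tweets.json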

In this example, we will use the Python Wrapper because it is more intuitive than the CLI method. But if you get stuck with some code, you can always turn to the GitHub community for assistance. The contributors will be happy to help you.

To retrieve tweets from a particular user, we can do the following:

import snscrape.modules.twitter as sntwitter
import pandas as pd

# Created a list to append all tweet attributes(data)
attributes_container = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:john').get_items()):
    if i>100:
        break
    attributes_container.append([tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
    
# Creating a dataframe from the tweets list above 
tweets_df = pd.DataFrame(attributes_container, columns=["Date Created", "Number of Likes", "Source of Tweet", "Tweets"])

Let's go over some of the code that you might not understand at first glance:

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:john').get_items()):
    if i>100:
        break
    attributes_container.append([tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
    
  
# Creating a dataframe from the tweets list above 
tweets_df = pd.DataFrame(attributes_container, columns=["Date Created", "Number of Likes", "Source of Tweet", "Tweets"])

In the above code, sntwitter.TwitterSearchScraper returns an object of tweets matching the query we passed into it (in this case from:john, that is, tweets from the user john).

As I mentioned earlier, Snscrape has no limit on the number of tweets, so it would return every tweet from that user. To cap this, we wrap the generator in the enumerate function, which adds a counter as it iterates, and break out of the loop once we have the most recent 100 tweets from the user.
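
The enumerate/break pattern above is what this article uses; an equivalent and arguably more idiomatic way to take just the first 100 tweets from the generator is itertools.islice, sketched here:

import itertools

import snscrape.modules.twitter as sntwitter

# Take only the first 100 items from the (otherwise unbounded) generator
tweets = list(itertools.islice(sntwitter.TwitterSearchScraper('from:john').get_items(), 100))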

You can see that the attribute syntax we use on each tweet looks like the one from Tweepy. The list of attributes that we can get from a Snscrape tweet was curated by Martin Beck.


Credit: Martin Beck

More attributes might be added over time, as the Snscrape library is still in development. For instance, in the list above, source has been replaced with sourceLabel; if you pass in only source, it will return an object rather than the readable label.

If you run the above code, you should see something like this as well:


Image by Author

Now let's do the same for scraping by search.

How to Scrape Tweets from a Text Search with Snscrape

import snscrape.modules.twitter as sntwitter
import pandas as pd

# Creating list to append tweet data to
attributes_container = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('sex for grades since:2021-07-05 until:2022-07-06').get_items()):
    if i>150:
        break
    attributes_container.append([tweet.user.username, tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
    
# Creating a dataframe to load the list
tweets_df = pd.DataFrame(attributes_container, columns=["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"])

Again, you can access a lot of historical data using Snscrape (unlike Tweepy, whose standard API is limited to the last 7 days and whose premium API goes back 30 days). So we can pass in the date from which we want to start the search and the date on which we want it to end in the sntwitter.TwitterSearchScraper() method.

What we've done in the preceding code is basically what we discussed before. The only thing to bear in mind is that until works like Python's range function (that is, it excludes the end value). So if you want to get tweets from today, you need to put the day after today in the until parameter.
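
A small sketch of how you could build that query dynamically so until always points at tomorrow, using only the standard library plus the snscrape import shown earlier (the search phrase is just the example from above):

import datetime as dt

import snscrape.modules.twitter as sntwitter

today = dt.date.today()
since = today - dt.timedelta(days=7)    # start of the window: one week ago
until = today + dt.timedelta(days=1)    # 'until' is exclusive, so use tomorrow to include today

# str() on a date gives the ISO format (YYYY-MM-DD) that since:/until: expect
query = f"sex for grades since:{since} until:{until}"
scraper = sntwitter.TwitterSearchScraper(query)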


Image by Author.

Now you know how to scrape tweets with Snscrape, too!

When to use each approach

Now that we've seen how each method works, you might be wondering when to use which.

Well, there is no universal rule for when to use each method. Everything comes down to a matter of preference and your use case.

If you want to acquire an endless number of tweets, you should use Snscrape. But if you want to use extra features that Snscrape cannot provide (like geolocation, for example), then you should definitely use Tweepy. It is directly integrated with the Twitter API and provides complete functionality.

Even so, Snscrape is the most commonly used method for basic scraping.

Conclusion

In this article, we learned how to scrape data from Twitter using Python with Tweepy and Snscrape. But this was only a brief overview of how each approach works. You can learn more by exploring the web for additional information.

I've included some useful resources that you can use if you need additional information. Thank you for reading.

 Source: https://www.freecodecamp.org/news/python-web-scraping-tutorial/

#python #web 
