Gordon Taylor

Estimo: Evaluates How Long a Browser Will Take to Execute Your JavaScript Code

Estimo

Estimo is a tool for measuring the parse, compile, and execution time of JavaScript.

It can emulate CPU throttling, network conditions, Chrome device profiles, and more while measuring JavaScript performance.

Inspired by Size Limit. Thanks @ai and @aslushnikov for support.

Why?

JavaScript is the most expensive part of our frontend.

3.5 seconds to process 170 KB of JS and 0.1 second for 170 KB of JPEG. @Addy Osmani

Usage

JS API

const path = require('path')
const estimo = require('estimo')

;(async () => {
  const report = await estimo(path.join(__dirname, 'examples', 'example.js'))
  console.log(report)
})()

CLI

estimo -r ./examples/example.js

Output

[
  {
    name: 'example.js',
    parseHTML: 1.01,
    styleLayout: 34.08,
    paintCompositeRender: 1.4,
    scriptParseCompile: 1.04,
    scriptEvaluation: 8.11,
    javaScript: 9.15,
    garbageCollection: 0,
    other: 8.2,
    total: 53.84
  }
]

Fields Description

name - Resource name (file name or web URL).

parseHTML - Time spent on the ParseHTML and ParseAuthorStyleSheet events.

styleLayout - Time spent on the ScheduleStyleRecalculation, UpdateLayoutTree, InvalidateLayout, and Layout events.

paintCompositeRender - Time spent on the Animation, HitTest, PaintSetup, Paint, PaintImage, RasterTask, ScrollLayer, UpdateLayer, UpdateLayerTree, and CompositeLayers events.

scriptParseCompile - Time spent on the v8.compile, v8.compileModule, and v8.parseOnBackground events.

scriptEvaluation - Time spent on the EventDispatch, EvaluateScript, v8.evaluateModule, FunctionCall, TimerFire, FireIdleCallback, FireAnimationFrame, RunMicrotasks, and V8.Execute events.

javaScript - Combined time of the scriptParseCompile and scriptEvaluation event groups.

garbageCollection - Time spent on the MinorGC, MajorGC, BlinkGC.AtomicPhase, ThreadState::performIdleLazySweep, ThreadState::completeSweep, and BlinkGCMarking events.

other - Time spent on the MessageLoop::RunTask, TaskQueueManager::ProcessTaskFromWorkQueue, and ThreadControllerImpl::DoWork events.

total - Total time.

Time

All metrics are in milliseconds.

We measure system CPU time: the number of seconds the process has spent executing on the CPU.

Time spent waiting for its turn on the CPU is not included.

Multiple Runs

All results are time measurements, so they can vary from run to run depending on the resources available on your device.

You can use the runs option to run Estimo N times and get the median value as the result.

JS API

const report = await estimo(['/path/to/examples/example.js'], { runs: 10 })

CLI

estimo -r /path/to/examples/example.js -runs 10

Diff Mode

You can compare the metrics of an original file with those of its other versions to understand the performance impact.

The first file is taken as the baseline; all remaining files are compared against it.

JS API

const report = await estimo(['lib-1.0.5.js', 'lib-1.1.0.js'], { diff: true })

CLI

estimo -r lib-1.0.5.js lib-1.1.0.js -diff

Output

[
  {
    name: 'lib-1.0.5.js',
    parseHTML: 1.48,
    styleLayout: 44.61,
    paintCompositeRender: 2.19,
    scriptParseCompile: 1.21,
    scriptEvaluation: 9.63,
    javaScript: 10.84,
    garbageCollection: 0,
    other: 9.95,
    total: 69.06
  },
  {
    name: 'lib-1.1.0.js',
    parseHTML: 2.97,
    styleLayout: 61.02,
    paintCompositeRender: 2.11,
    scriptParseCompile: 2.11,
    scriptEvaluation: 19.28,
    javaScript: 21.39,
    garbageCollection: 0,
    other: 15.49,
    total: 102.98,
    diff: {
      parseHTML: '2.97 (+50.17% 🔺)',
      styleLayout: '61.02 (+26.9% 🔺)',
      paintCompositeRender: '2.11 (-3.8% 🔽)',
      scriptParseCompile: '2.11 (+42.66% 🔺)',
      scriptEvaluation: '19.28 (+50.06% 🔺)',
      javaScript: '21.39 (+49.33% 🔺)',
      garbageCollection: '0 (N/A)',
      other: '15.49 (+35.77% 🔺)',
      total: '102.98 (+32.94% 🔺)'
    }
  }
]

Additional Use Cases

CPU Throttling Rate

CPU throttling rate emulation allows you to simulate a slower CPU.

  • cpuThrottlingRate (default: 1) - Sets the CPU throttling rate. The number represents the slowdown factor (e.g., 2 is a "2x" slowdown).

JS API:

await estimo('/path/to/example.js', { cpuThrottlingRate: 4 })

CLI:

estimo -r ./examples/example.js -cpu 4

Network Emulation

Network emulation allows you to simulate specified network conditions.

JS API:

await estimo('/path/to/example.js', { emulateNetworkConditions: 'Slow 3G' })

CLI:

estimo -r ./examples/example.js -net Slow\ 3G

Chrome Device Emulation

Chrome device emulation allows you to simulate the conditions of a specific device.

JS API

const report = await estimo('/path/to/example.js', { device: 'Galaxy S5' })

CLI

estimo -r ./examples/example.js -device Galaxy\ S5

When using the CLI, escape spaces in device names (e.g. Galaxy\ S5).

Changing default timeout

You can specify how long Estimo should wait for a page to load.

  • timeout (default: 20000) - Sets timeout in ms.

JS API:

await estimo('/path/to/example.js', { timeout: 90000 })

CLI:

estimo -r ./examples/example.js -timeout 90000

Multiple Resources

JS API

const report = await estimo([
  '/path/to/libs/example.js',
  '/path/to/another/example/lib.js'
])

CLI

estimo -r /path/to/example.js https://unpkg.com/react@16/umd/react.development.js

Pages

You can use all features not only with JS files but with web pages too.

Navigation is considered finished when the load event is fired.

JS API

const report = await estimo('https://www.google.com/')

CLI

estimo -r https://www.google.com/

Install

npm i estimo

or

yarn add estimo

How?

It uses Puppeteer to generate Chrome timelines, which are then transformed into a human-readable shape by Tracium.

Your local Chrome will be used if it is suitable for Estimo.

Keep in mind that results depend on your device and its available resources.

Who Uses Estimo

Contributing

Pull requests, feature ideas and bug reports are very welcome. We highly appreciate any feedback.

Download Details:

Author: mbalabash
Source Code: https://github.com/mbalabash/estimo 
License: MIT license

#javascript #chrome #performance #headless 

Elian Harber

Go-sessions: The Sessions Manager for The Go Programming Language

Fast http sessions manager for Go.

Simple API with a robust set of features: immutability, shiftable expiration time, and databases such as badger and redis as back-end storage.

Quick view

import "github.com/kataras/go-sessions/v3"

sess := sessions.Start(http.ResponseWriter, *http.Request)
sess.
  ID() string
  Get(string) interface{}
  HasFlash() bool
  GetFlash(string) interface{}
  GetFlashString(string) string
  GetString(key string) string
  GetInt(key string) (int, error)
  GetInt64(key string) (int64, error)
  GetFloat32(key string) (float32, error)
  GetFloat64(key string) (float64, error)
  GetBoolean(key string) (bool, error)
  GetAll() map[string]interface{}
  GetFlashes() map[string]interface{}
  VisitAll(cb func(k string, v interface{}))
  Set(string, interface{})
  SetImmutable(key string, value interface{})
  SetFlash(string, interface{})
  Delete(string)
  Clear()
  ClearFlashes()

Installation

The only requirement is the Go programming language, version 1.14 or later.

$ go get github.com/kataras/go-sessions/v3

go.mod

module your_app

go 1.14

require (
    github.com/kataras/go-sessions/v3 v3.3.0
)

Features

Documentation

Take a look at the ./examples folder.

Outline

// Start starts the session for the particular net/http request
Start(w http.ResponseWriter, r *http.Request) Session
// ShiftExpiration moves the expiration date of a session to a new date
// using the session's default timeout configuration.
ShiftExpiration(w http.ResponseWriter, r *http.Request)
// UpdateExpiration changes the expiration date of a session to a new date
// using the timeout value passed via the `expires` parameter.
UpdateExpiration(w http.ResponseWriter, r *http.Request, expires time.Duration)
// Destroy kills the net/http session and removes the associated cookie
Destroy(w http.ResponseWriter, r *http.Request)

// StartFasthttp starts the session for the particular valyala/fasthttp request
StartFasthttp(ctx *fasthttp.RequestCtx) Session
// ShiftExpirationFasthttp moves the expiration date of a session to a new date
// using the session's default timeout configuration.
ShiftExpirationFasthttp(ctx *fasthttp.RequestCtx)
// UpdateExpirationFasthttp changes the expiration date of a session to a new date
// using the timeout value passed via the `expires` parameter.
UpdateExpirationFasthttp(ctx *fasthttp.RequestCtx, expires time.Duration)
// DestroyFasthttp kills the valyala/fasthttp session and removes the associated cookie
DestroyFasthttp(ctx *fasthttp.RequestCtx)

// DestroyByID removes the session entry
// from the server-side memory (and the database, if one is registered).
// The client's session cookie will still exist, but it will be reset on the next request.
//
// It's safe to use even if you are not sure whether a session with that id exists.
// Works for both net/http & fasthttp
DestroyByID(string)
// DestroyAll removes all sessions
// from the server-side memory (and the database, if one is registered).
// The client's session cookie will still exist, but it will be reset on the next request.
// Works for both net/http & fasthttp
DestroyAll()

// UseDatabase, optionally, adds a session database to the manager's provider;
// a session db doesn't have write access.
// See https://github.com/kataras/go-sessions/tree/master/sessiondb
UseDatabase(Database)

Configuration

// Config is the configuration for sessions. Please read it before using sessions.
Config struct {
    // Cookie string, the session's client cookie name, for example: "mysessionid"
    //
    // Defaults to "irissessionid".
    Cookie string

    // CookieSecureTLS should be set to true if the server is running over TLS
    // and you need the session cookie's "Secure" field to be set to true.
    //
    // Note: The user should fill the Decode configuration field in order for this to work.
    // Recommendation: You don't need this set to true; just fill the Encode and Decode fields
    // with a third-party library like securecookie, an example is provided in the _examples folder.
    //
    // Defaults to false.
    CookieSecureTLS bool

    // AllowReclaim allows you to
    // Destroy and Start a session within the same request handler.
    // It removes the cookie from both the `Request` and the `ResponseWriter` on `Destroy`,
    // and adds a new cookie to the `Request` on `Start`.
    //
    // Defaults to false.
    AllowReclaim bool

    // Encode the cookie value if not nil.
    // Should accept as first argument the cookie name (config.Cookie)
    //         as second argument the server's generated session id.
    // Should return the new session id; on error the session id is set to empty, which is invalid.
    //
    // Note: Errors are not printed, so you have to know what you're doing,
    // and remember: if you use AES it only supports key sizes of 16, 24 or 32 bytes.
    // You either need to provide exactly that amount or you derive the key from what you type in.
    //
    // Defaults to nil.
    Encode func(cookieName string, value interface{}) (string, error)
    // Decode the cookie value if not nil.
    // Should accept as first argument the cookie name (config.Cookie)
    //               as second argument the client's cookie value (the encoded session id).
    // Should return an error if decode operation failed.
    //
    // Note: Errors are not printed, so you have to know what you're doing,
    // and remember: if you use AES it only supports key sizes of 16, 24 or 32 bytes.
    // You either need to provide exactly that amount or you derive the key from what you type in.
    //
    // Defaults to nil.
    Decode func(cookieName string, cookieValue string, v interface{}) error

    // Encoding same as Encode and Decode but receives a single instance which
    // completes the "CookieEncoder" interface, `Encode` and `Decode` functions.
    //
    // Defaults to nil.
    Encoding Encoding

    // Expires is the duration after which the cookie expires (created_time.Add(Expires)).
    // If you want to delete the cookie when the browser closes, set it to -1.
    //
    // 0 means no expiration (effectively 24 years),
    // -1 means expire when the browser closes,
    // > 0 is the time.Duration after which the session cookie should expire.
    //
    // Defaults to unlimited life duration (0).
    Expires time.Duration

    // SessionIDGenerator should return a random session id.
    // By default a uuid implementation is used to generate ids,
    // but developers can change that with a simple assignment.
    SessionIDGenerator func() string

    // DisableSubdomainPersistence, when set to true, disallows your subdomains from accessing the session cookie.
    //
    // Defaults to false.
    DisableSubdomainPersistence bool
}
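The SessionIDGenerator field can be pointed at any function that returns a random string. A minimal sketch using only the standard library (the `secureID` name is illustrative, not part of the package):

```go
package main

import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
)

// secureID returns a 32-character hex session id built from
// 16 bytes of crypto/rand entropy.
func secureID() string {
    b := make([]byte, 16)
    if _, err := rand.Read(b); err != nil {
        panic(err) // entropy source failed: not recoverable
    }
    return hex.EncodeToString(b)
}

func main() {
    // Plugged into the manager it would look like:
    //   sessions.New(sessions.Config{SessionIDGenerator: secureID})
    fmt.Println(secureID())
}
```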

Usage NET/HTTP

Start returns a Session; the Session outline:

ID() string
Get(string) interface{}
HasFlash() bool
GetFlash(string) interface{}
GetString(key string) string
GetFlashString(string) string
GetInt(key string) (int, error)
GetInt64(key string) (int64, error)
GetFloat32(key string) (float32, error)
GetFloat64(key string) (float64, error)
GetBoolean(key string) (bool, error)
GetAll() map[string]interface{}
GetFlashes() map[string]interface{}
VisitAll(cb func(k string, v interface{}))
Set(string, interface{})
SetImmutable(key string, value interface{})
SetFlash(string, interface{})
Delete(string)
Clear()
ClearFlashes()

package main

import (
    "fmt"
    "net/http"
    "time"

    "github.com/kataras/go-sessions/v3"
)

type businessModel struct {
    Name string
}

func main() {
    app := http.NewServeMux()
    sess := sessions.New(sessions.Config{
        // Cookie string, the session's client cookie name, for example: "mysessionid"
        //
        // Defaults to "gosessionid"
        Cookie: "mysessionid",
        // it's time.Duration, from the time cookie is created, how long it can be alive?
        // 0 means no expire.
        // -1 means expire when browser closes
        // or set a value, like 2 hours:
        Expires: time.Hour * 2,
        // if you want to invalid cookies on different subdomains
        // of the same host, then enable it
        DisableSubdomainPersistence: false,
        // want to be crazy safe? Take a look at the "securecookie" example folder.
    })

    app.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("You should navigate to /set, /get, /delete, /clear or /destroy instead"))
    })
    app.HandleFunc("/set", func(w http.ResponseWriter, r *http.Request) {

        // set session values.
        s := sess.Start(w, r)
        s.Set("name", "iris")

        // verify it was set
        w.Write([]byte(fmt.Sprintf("All ok, session value set to: %s", s.GetString("name"))))

        // Set stores the value as-is:
        // if it's a slice or a map,
        // you will be able to change it directly via .Get!
        // Note that saving big data, slices, or maps in a session is not recommended,
        // but if you really need to, use `SetImmutable` instead of `Set`.
        // Use `SetImmutable` consistently; it's slower.
        // Read more about mutable and immutable Go types: https://stackoverflow.com/a/8021081
    })

    app.HandleFunc("/get", func(w http.ResponseWriter, r *http.Request) {
        // get a specific value as a string; if not found, an empty string is returned
        name := sess.Start(w, r).GetString("name")

        w.Write([]byte(fmt.Sprintf("The name on the /set was: %s", name)))
    })

    app.HandleFunc("/delete", func(w http.ResponseWriter, r *http.Request) {
        // delete a specific key
        sess.Start(w, r).Delete("name")
    })

    app.HandleFunc("/clear", func(w http.ResponseWriter, r *http.Request) {
        // removes all entries
        sess.Start(w, r).Clear()
    })

    app.HandleFunc("/update", func(w http.ResponseWriter, r *http.Request) {
        // updates expire date
        sess.ShiftExpiration(w, r)
    })

    app.HandleFunc("/destroy", func(w http.ResponseWriter, r *http.Request) {

        // destroy removes the entire session data and the cookie
        sess.Destroy(w, r)
    })
    // Note about Destroy:
    //
    // You can destroy a session outside of a handler too, using the:
    // mySessions.DestroyByID
    // mySessions.DestroyAll

    // remember: slices and maps are mutable by design.
    // `SetImmutable` makes sure that they will be stored and retrieved
    // as immutable, so you can't change them directly by mistake.
    //
    // Use `SetImmutable` consistently; it's slower than `Set`.
    // Read more about mutable and immutable Go types: https://stackoverflow.com/a/8021081
    app.HandleFunc("/set_immutable", func(w http.ResponseWriter, r *http.Request) {
        business := []businessModel{{Name: "Edward"}, {Name: "value 2"}}
        s := sess.Start(w, r)
        s.SetImmutable("businessEdit", business)
        businessGet := s.Get("businessEdit").([]businessModel)

        // try to change it; if we had used `Set` instead of `SetImmutable`, this
        // change would affect the underlying array of the session value "businessEdit", but now it will not.
        businessGet[0].Name = "Gabriel"

    })

    app.HandleFunc("/get_immutable", func(w http.ResponseWriter, r *http.Request) {
        valSlice := sess.Start(w, r).Get("businessEdit")
        if valSlice == nil {
            w.Header().Set("Content-Type", "text/html; charset=UTF-8")
            w.Write([]byte("please navigate to the <a href='/set_immutable'>/set_immutable</a> first"))
            return
        }

        firstModel := valSlice.([]businessModel)[0]
        // businessGet[0].Name is equal to Edward initially
        if firstModel.Name != "Edward" {
            panic("Report this as a bug, immutable data cannot be changed from the caller without re-SetImmutable")
        }

        w.Write([]byte(fmt.Sprintf("[]businessModel[0].Name remains: %s", firstModel.Name)))

        // the name should remain "Edward"
    })

    http.ListenAndServe(":8080", app)
}
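The pitfall that SetImmutable guards against comes from Go itself, not from the sessions package: slices and maps are handed out by reference. A self-contained sketch of the difference between sharing a slice (what `Set` does) and handing out a deep copy (what `SetImmutable` effectively does):

```go
package main

import "fmt"

type businessModel struct{ Name string }

func main() {
    stored := []businessModel{{Name: "Edward"}}

    // `Set`-like behaviour: the caller receives the same backing array,
    // so mutating the returned slice mutates the stored value too.
    shared := stored
    shared[0].Name = "Gabriel"
    fmt.Println(stored[0].Name) // "Gabriel": the stored value changed

    // `SetImmutable`-like behaviour: hand out a deep copy instead.
    stored = []businessModel{{Name: "Edward"}}
    copied := make([]businessModel, len(stored))
    copy(copied, stored)
    copied[0].Name = "Gabriel"
    fmt.Println(stored[0].Name) // "Edward": the stored value is intact
}
```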

Usage FASTHTTP

StartFasthttp returns the same object as Start, a Session.

ID() string
Get(string) interface{}
HasFlash() bool
GetFlash(string) interface{}
GetString(key string) string
GetFlashString(string) string
GetInt(key string) (int, error)
GetInt64(key string) (int64, error)
GetFloat32(key string) (float32, error)
GetFloat64(key string) (float64, error)
GetBoolean(key string) (bool, error)
GetAll() map[string]interface{}
GetFlashes() map[string]interface{}
VisitAll(cb func(k string, v interface{}))
Set(string, interface{})
SetImmutable(key string, value interface{})
SetFlash(string, interface{})
Delete(string)
Clear()
ClearFlashes()

We have only one simple example because the API is the same; the returned session is identical for both net/http and valyala/fasthttp.

Just append the word "Fasthttp"; the rest of the API remains the same as with net/http:

  • Start for net/http, StartFasthttp for valyala/fasthttp.
  • ShiftExpiration for net/http, ShiftExpirationFasthttp for valyala/fasthttp.
  • UpdateExpiration for net/http, UpdateExpirationFasthttp for valyala/fasthttp.
  • Destroy for net/http, DestroyFasthttp for valyala/fasthttp.

package main

import (
    "fmt"

    "github.com/kataras/go-sessions/v3"
    "github.com/valyala/fasthttp"
)

func main() {
    // set some values to the session
    setHandler := func(reqCtx *fasthttp.RequestCtx) {
        values := map[string]interface{}{
            "Name":   "go-sessions",
            "Days":   "1",
            "Secret": "dsads£2132215£%%Ssdsa",
        }

        sess := sessions.StartFasthttp(reqCtx) // init the session
        // sessions.StartFasthttp returns the same Session interface we saw before

        for k, v := range values {
            sess.Set(k, v) // fill the session with each key-value pair
        }
        reqCtx.WriteString("Session saved, go to /get to view the results")
    }

    // get the values from the session
    getHandler := func(reqCtx *fasthttp.RequestCtx) {
        sess := sessions.StartFasthttp(reqCtx) // init the session
        sessValues := sess.GetAll()            // get all values from this session

        reqCtx.WriteString(fmt.Sprintf("%#v", sessValues))
    }

    // clear all values from the session
    clearHandler := func(reqCtx *fasthttp.RequestCtx) {
        sess := sessions.StartFasthttp(reqCtx)
        sess.Clear()
    }

    // destroy the session: clears the values and removes the server-side entry
    // and the client-side session id cookie
    destroyHandler := func(reqCtx *fasthttp.RequestCtx) {
        sessions.DestroyFasthttp(reqCtx)
    }

    fmt.Println("Open a browser tab and navigate to localhost:8080/set")
    fasthttp.ListenAndServe(":8080", func(reqCtx *fasthttp.RequestCtx) {
        switch string(reqCtx.Path()) {
        case "/set":
            setHandler(reqCtx)
        case "/get":
            getHandler(reqCtx)
        case "/clear":
            clearHandler(reqCtx)
        case "/destroy":
            destroyHandler(reqCtx)
        default:
            reqCtx.WriteString("Please navigate to /set, /get, /clear or /destroy")
        }
    })
}

FAQ

If you'd like to discuss this package or ask questions about it, feel free to open an issue on the repository.

Versioning

Current: v3.3.0

Read more about Semantic Versioning 2.0.0

People

The author of go-sessions is @kataras.

Contributing

If you are interested in contributing to the go-sessions project, please make a PR.

Author: Kataras
Source Code: https://github.com/kataras/go-sessions 
License: MIT license

#go #golang #performance 

Rupert Beatty

Laravel-pjax: A Pjax Middleware for Laravel

A pjax middleware for Laravel

Pjax is a jQuery plugin that leverages ajax to speed up the loading time of your pages. It works by fetching only specific HTML fragments from the server, and updating only certain parts of the page on the client side.

The package provides a middleware that can return the response that the jQuery plugin expects.

There's a Vue-PJAX Adapter equivalent by @barnabaskecskes which doesn't require jQuery.

Spatie is a webdesign agency based in Antwerp, Belgium. You'll find an overview of all our open source projects on our website.

Installation

You can install the package via composer:

$ composer require spatie/laravel-pjax

Next, you must add the \Spatie\Pjax\Middleware\FilterIfPjax middleware to the kernel.

// app/Http/Kernel.php

...
protected $middleware = [
    ...
    \Spatie\Pjax\Middleware\FilterIfPjax::class,
];

Usage

The middleware provides the behaviour that the pjax plugin expects from the server:

An X-PJAX request header is set to differentiate a pjax request from normal XHR requests. If the request is pjax, the middleware skips the layout HTML and renders only the inner contents of the container.

Laravel cache busting tip

When using Laravel Mix to manage your frontend cache busting, you can use it to your advantage to bust pjax's cache. Simply include the mix method as the content of the x-pjax-version meta tag:

<meta http-equiv="x-pjax-version" content="{{ mix('/css/app.css') }}">

Multiple files:

<meta http-equiv="x-pjax-version" content="{{ mix('/css/app.css') . mix('/css/app2.css') }}">

This way, anytime your frontend's cache gets busted, pjax's cache gets automatically busted as well!

Changelog

Please see CHANGELOG for more information on what has changed recently.

Testing

$ composer test

Contributing

Please see CONTRIBUTING for details.

Security

If you've found a bug regarding security please mail security@spatie.be instead of using the issue tracker.

Credits

The middleware in this package was originally written by Jeffrey Way for the Laracasts-lesson on pjax. His original code can be found in this repo on GitHub.

Support us

We invest a lot of resources into creating best-in-class open source packages. You can support us by buying one of our paid products.

We highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using. You'll find our address on our contact page. We publish all received postcards on our virtual postcard wall.

Author: Spatie
Source Code: https://github.com/spatie/laravel-pjax 
License: MIT license

#laravel #javascript #php #performance 

Go-perfbook: Thoughts on Go Performance Optimization

go-perfbook

This document outlines best practices for writing high-performance Go code.

The first sections cover writing optimized code in any language. The later sections cover Go-specific techniques.

Writing and Optimizing Go code

While some discussions will be made for making individual services faster (caching, etc), designing performant distributed systems is beyond the scope of this work. There are already good texts on monitoring and distributed system design. Optimizing distributed systems encompasses an entirely different set of research and design trade-offs.

All the content will be licensed under CC-BY-SA.

This book is split into different sections:

  1. Basic tips for writing not-slow software
    • CS 101-level stuff
  2. Tips for writing fast software
    • Go-specific sections on how to get the best from Go
  3. Advanced tips for writing really fast software
    • For when your optimized code isn't fast enough

We can summarize these three sections as:

  1. "Be reasonable"
  2. "Be deliberate"
  3. "Be dangerous"

When and Where to Optimize

I'm putting this first because it's really the most important step. Should you even be doing this at all?

Every optimization has a cost. Generally, this cost is expressed in terms of code complexity or cognitive load -- optimized code is rarely simpler than the unoptimized version.

But there's another side that I'll call the economics of optimization. As a programmer, your time is valuable. There's the opportunity cost of what else you could be working on for your project, which bugs to fix, which features to add. Optimizing things is fun, but it's not always the right task to choose. Performance is a feature, but so is shipping, and so is correctness.

Choose the most important thing to work on. Sometimes it's not an actual CPU optimization, but a user-experience one. Something as simple as adding a progress bar, or making a page more responsive by doing computation in the background after rendering the page.

Sometimes this will be obvious: an hourly report that completes in three hours is probably less useful than one that completes in less than one.

Just because something is easy to optimize doesn't mean it's worth optimizing. Ignoring low-hanging fruit is a valid development strategy.

Think of this as optimizing your time.

You get to choose what to optimize and when to optimize. You can move the slider between "Fast Software" and "Fast Deployment".

People hear and mindlessly repeat "premature optimization is the root of all evil", but they miss the full context of the quote.

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

-- Knuth

Add: https://www.youtube.com/watch?time_continue=429&v=3WBaY61c9sE

  • don't ignore the easy optimizations
  • more knowledge of algorithms and data structures makes more optimizations "easy" or "obvious"

Should you optimize?

"Yes, but only if the problem is important, the program is genuinely too slow, and there is some expectation that it can be made faster while maintaining correctness, robustness, and clarity."

-- The Practice of Programming, Kernighan and Pike

Premature optimization can also hurt you by tying you into certain decisions. The optimized code can be harder to modify if requirements change and harder to throw away (sunk-cost fallacy) if needed.

BitFunnel performance estimation has some numbers that make this trade-off explicit. Imagine a hypothetical search engine needing 30,000 machines across multiple data centers. These machines have a cost of approximately $1,000 USD per year. If you can double the speed of the software, this can save the company $15M USD per year. Even a single developer spending an entire year to improve performance by only 1% will pay for itself.
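The arithmetic behind those figures is easy to verify (assuming, as the example does, that doubling the software's speed halves the required fleet):

```go
package main

import "fmt"

func main() {
    const machines = 30000
    const costPerMachine = 1000.0 // USD per machine per year

    total := machines * costPerMachine // $30M/yr fleet cost
    saved := total / 2                 // a 2x speedup halves the fleet

    fmt.Printf("fleet cost: $%.0fM/yr, saved by 2x speedup: $%.0fM/yr\n",
        total/1e6, saved/1e6)

    // Even a 1% improvement trims ~$300K/yr off the bill,
    // which easily pays for a developer-year.
    fmt.Printf("saved by 1%% speedup: $%.0fK/yr\n", total*0.01/1e3)
}
```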

In the vast majority of cases, the size and speed of a program is not a concern. The easiest optimization is not having to do it. The second easiest optimization is just buying faster hardware.

Once you've decided you're going to change your program, keep reading.

How to Optimize

Optimization Workflow

Before we get into the specifics, let's talk about the general process of optimization.

Optimization is a form of refactoring. But each step, rather than improving some aspect of the source code (code duplication, clarity, etc), improves some aspect of the performance: lower CPU, memory usage, latency, etc. This improvement generally comes at the cost of readability. This means that in addition to a comprehensive set of unit tests (to ensure your changes haven't broken anything), you also need a good set of benchmarks to ensure your changes are having the desired effect on performance. You must be able to verify that your change really is lowering CPU. Sometimes a change you thought would improve performance will actually turn out to have a zero or negative change. Always make sure you undo your fix in these cases.

What is the best comment in source code you have ever encountered? - Stack Overflow:

//
// Dear maintainer:
//
// Once you are done trying to 'optimize' this routine,
// and have realized what a terrible mistake that was,
// please increment the following counter as a warning
// to the next guy:
//
// total_hours_wasted_here = 42
//

The benchmarks you are using must be correct and provide reproducible numbers on representative workloads. If individual runs have too high a variance, it will make small improvements more difficult to spot. You will need to use benchstat or equivalent statistical tests and won't be able just to eyeball it. (Note that using statistical tests is a good idea anyway.) The steps to run the benchmarks should be documented, and any custom scripts and tooling should be committed to the repository with instructions for how to run them. Be mindful of large benchmark suites that take a long time to run: it will make the development iterations slower.
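In Go, such benchmarks are normally written as `BenchmarkXxx` functions and run with `go test -bench`, whose per-op numbers feed tools like benchstat. The same machinery is also available programmatically through `testing.Benchmark`; a minimal sketch comparing two string-building strategies (`concat` and `builder` are illustrative names):

```go
package main

import (
    "fmt"
    "strings"
    "testing"
)

// concat builds a string with repeated +=, allocating on every iteration.
func concat(parts []string) string {
    s := ""
    for _, p := range parts {
        s += p
    }
    return s
}

// builder uses strings.Builder, which amortizes allocations.
func builder(parts []string) string {
    var b strings.Builder
    for _, p := range parts {
        b.WriteString(p)
    }
    return b.String()
}

func main() {
    parts := make([]string, 100)
    for i := range parts {
        parts[i] = "x"
    }

    for name, fn := range map[string]func([]string) string{
        "concat":  concat,
        "builder": builder,
    } {
        r := testing.Benchmark(func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                fn(parts)
            }
        })
        // NsPerOp is the number benchstat compares across runs.
        fmt.Printf("%-8s %d ns/op\n", name, r.NsPerOp())
    }
}
```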

Note also that anything that can be measured can be optimized. Make sure you're measuring the right thing.

The next step is to decide what you are optimizing for. If the goal is to improve CPU, what is an acceptable speed? Do you want to improve the current performance by 2x? 10x? Can you state it as "a problem of size N in less than time T"? Are you trying to reduce memory usage? By how much? How much slower is acceptable for what change in memory usage? What are you willing to give up in exchange for lower space requirements?

Optimizing for service latency is a trickier proposition. Entire books have been written on how to performance test web servers. The primary issue is that for a single function, performance is fairly consistent for a given problem size. For web services, you don't have a single number. A proper web-service benchmark suite will provide a latency distribution for a given requests-per-second level. This talk gives a good overview of some of the issues: "How NOT to Measure Latency" by Gil Tene

TODO: See the later section on optimizing web services

The performance goals must be specific. You will (almost) always be able to make something faster. Optimizing is frequently a game of diminishing returns. You need to know when to stop. How much effort are you going to put into getting the last little bit of improvement? How much uglier and harder to maintain are you willing to make the code?

Dan Luu's previously mentioned talk on BitFunnel performance estimation shows an example of using rough calculations to determine if your target performance figures are reasonable.

Simon Eskildsen has a talk from SRECon covering this topic in more depth: Advanced Napkin Math: Estimating System Performance from First Principles

Finally, Jon Bentley's "Programming Pearls" has a chapter titled "The Back of the Envelope" covering Fermi problems. Sadly, these kinds of estimation skills got a bad rap thanks to their use in Microsoft-style "puzzle interview questions" in the 1990s and early 2000s.

For greenfield development, you shouldn't leave all benchmarking and performance numbers until the end. It's easy to say "we'll fix it later", but if performance is really important it will be a design consideration from the start. Any significant architectural changes required to fix performance issues will be too risky near the deadline. Note that during development, the focus should be on reasonable program design, algorithms, and data structures. Optimizing at lower-levels of the stack should wait until later in the development cycle when a more complete view of the system performance is available. Any full-system profiles you do while the system is incomplete will give a skewed view of where the bottlenecks will be in the finished system.

TODO: How to avoid/detect "Death by 1000 cuts" from poorly written software. Solution: "Premature pessimization is the root of all evil". This matches my Rule 1: Be deliberate. You don't need to make every line of code fast, but neither should you do wasteful things by default.

"Premature pessimization is when you write code that is slower than it needs to be, usually by asking for unnecessary extra work, when equivalently complex code would be faster and should just naturally flow out of your fingers."

-- Herb Sutter

Benchmarking as part of CI is hard due to noisy neighbours, and even differences between CI boxes if it's just you; it's hard to gate on performance metrics. A good middle ground is to have benchmarks run by the developer (on appropriate hardware) and included in the commit message for commits that specifically address performance. For those that are just general patches, try to catch performance degradations "by eye" in code review.

TODO: how to track performance over time?

Write code that you can benchmark. Profiling is something you can do on larger systems; with benchmarking, you want to test isolated pieces. You need to be able to extract and set up sufficient context so that benchmarks exercise enough of the system to be representative.

The difference between what your target is and the current performance will also give you an idea of where to start. If you need only a 10-20% performance improvement, you can probably get that with some implementation tweaks and smaller fixes. If you need a factor of 10x or more, then just replacing a multiplication with a left-shift isn't going to cut it. That's probably going to call for changes up and down your stack, possibly redesigning large portions of the system with these performance goals in mind.

Good performance work requires knowledge at many different levels, from system design, networking, hardware (CPU, caches, storage), algorithms, tuning, and debugging. With limited time and resources, consider which level will give the most improvement: it won't always be an algorithm or program tuning.

In general, optimizations should proceed from top to bottom. Optimizations at the system level will have more impact than expression-level ones. Make sure you're solving the problem at the appropriate level.

This book is mostly going to talk about reducing CPU usage, reducing memory usage, and reducing latency. It's good to point out that you can very rarely do all three. Maybe CPU time is lower, but now your program uses more memory. Maybe you need to reduce memory usage, but now the program will take longer.

Amdahl's Law tells us to focus on the bottlenecks. If you double the speed of a routine that only takes 5% of the runtime, that's only a 2.5% speedup in total wall-clock time. On the other hand, speeding up a routine that takes 80% of the time by only 10% will improve runtime by almost 8%. Profiles will help identify where time is actually spent.
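A quick back-of-the-envelope check of those two numbers. This is a sketch; the `amdahl` helper is my own name for the formula, not from any library:

```go
package main

import "fmt"

// amdahl returns the overall speedup when a fraction p of the
// runtime is accelerated by a factor s.
func amdahl(p, s float64) float64 {
	return 1 / ((1 - p) + p/s)
}

func main() {
	// Doubling the speed of a routine that takes 5% of the runtime:
	fmt.Printf("%.3f\n", amdahl(0.05, 2)) // 1.026 -- about 2.5% faster overall
	// Speeding up a routine that takes 80% of the time by 10%:
	fmt.Printf("%.3f\n", amdahl(0.80, 1.1)) // 1.078 -- almost 8% faster overall
}
```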

When optimizing, you want to reduce the amount of work the CPU has to do. Quicksort is faster than bubble sort because it solves the same problem (sorting) in fewer steps. It's a more efficient algorithm. You've reduced the work the CPU needs to do in order to accomplish the same task.

Program tuning, like compiler optimizations, will generally make only a small dent in the total runtime. Large wins will almost always come from an algorithmic change or data structure change, a fundamental shift in how your program is organized. Compiler technology improves, but slowly. Proebsting's Law says compilers double in performance every 18 years, a stark contrast with the (slightly misunderstood) interpretation of Moore's Law that doubles processor performance every 18 months. Algorithmic improvements work at larger magnitudes. Algorithms for mixed integer programming improved by a factor of 30,000 between 1991 and 2008. For a more concrete example, consider this breakdown of replacing a brute force geo-spatial algorithm described in an Uber blog post with a more specialized one better suited to the task. There is no compiler switch that will give you an equivalent boost in performance.

TODO: Optimizing floating point FFT and MMM algorithm differences in gttse07.pdf

A profiler might show you that lots of time is spent in a particular routine. It could be this is an expensive routine, or it could be a cheap routine that is just called many, many times. Rather than immediately trying to speed up that one routine, see if you can reduce the number of times it's called or eliminate it completely. We'll discuss more concrete optimization strategies in the next section.

The Three Optimization Questions:

  • Do we have to do this at all? The fastest code is the code that's never run.
  • If yes, is this the best algorithm?
  • If yes, is this the best implementation of this algorithm?

Concrete optimization tips

Jon Bentley's 1982 work "Writing Efficient Programs" approached program optimization as an engineering problem: Benchmark. Analyze. Improve. Verify. Iterate. A number of his tips are now done automatically by compilers. A programmer's job is to use the transformations compilers can't do.

There are summaries of the book:

and the program tuning rules:

When thinking of changes you can make to your program, there are two basic options: you can either change your data or you can change your code.

Data Changes

Changing your data means either adding to or altering the representation of the data you're processing. From a performance perspective, some of these will end up changing the O() associated with different aspects of the data structure. This may even include preprocessing the input to be in a different, more useful format.

Ideas for augmenting your data structure:

Extra fields

The classic example of this is storing the length of a linked list in a field in the root node. It takes a bit more work to keep it updated, but then querying the length becomes a simple field lookup instead of an O(n) traversal. Your data structure might present a similar win: a bit of bookkeeping during some operations in exchange for some faster performance on a common use case.
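As a sketch of that bookkeeping trade (all names here are illustrative, not from any real package):

```go
package main

import "fmt"

// A singly-linked list that trades a little bookkeeping on insert
// for an O(1) Len instead of an O(n) traversal.
type node struct {
	value int
	next  *node
}

type list struct {
	head   *node
	length int // maintained on every insert and remove
}

func (l *list) push(v int) {
	l.head = &node{value: v, next: l.head}
	l.length++ // the extra bookkeeping
}

func (l *list) Len() int {
	return l.length // a simple field lookup, no traversal
}

func main() {
	l := &list{}
	for i := 0; i < 5; i++ {
		l.push(i)
	}
	fmt.Println(l.Len()) // 5
}
```

A remove operation would decrement `length` the same way; the cost is a line or two per mutating method, paid once, in exchange for constant-time queries forever after.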

Similarly, storing pointers to frequently needed nodes instead of performing additional searches. This covers things like the "backwards" links in a doubly-linked list to make node removal O(1). Some skip lists keep a "search finger", where you store a pointer to where you just were in your data structure on the assumption it's a good starting point for your next operation.

Extra search indexes

Most data structures are designed for a single type of query. If you need two different query types, having an additional "view" onto your data can be large improvement. For example, a set of structs might have a primary ID (integer) that you use to look up in a slice, but sometimes need to look up with a secondary ID (string). Instead of iterating over the slice, you can augment your data structure with a map either from string to ID or directly to the struct itself.
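A sketch of that secondary "view": the `user`/`userStore` types below are hypothetical, but the shape is the common one, a slice as the primary store plus a map kept in sync on insert.

```go
package main

import "fmt"

// user is looked up primarily by integer ID (index into a slice),
// but sometimes by name.
type user struct {
	ID   int
	Name string
}

type userStore struct {
	byID   []*user          // primary view
	byName map[string]*user // secondary view onto the same data
}

func (s *userStore) add(u *user) {
	s.byID = append(s.byID, u)
	if s.byName == nil {
		s.byName = make(map[string]*user)
	}
	s.byName[u.Name] = u // keep the index in sync
}

func main() {
	s := &userStore{}
	s.add(&user{ID: 0, Name: "alice"})
	s.add(&user{ID: 1, Name: "bob"})
	// No O(n) scan over the slice needed:
	fmt.Println(s.byName["bob"].ID) // 1
}
```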

Extra information about elements

For example, keeping a bloom filter of all the elements you've inserted can let you quickly return "no match" for lookups. These need to be small and fast to not overwhelm the rest of the data structure. (If a lookup in your main data structure is cheap, the cost of the bloom filter will outweigh any savings.)

If queries are expensive, add a cache.

At a larger level, an in-process or external cache (like memcache) can help. It might be excessive for a single data structure. We'll cover more about caches below.

These sorts of changes are useful when the data you need is cheap to store and easy to keep up-to-date.

These are all clear examples of "do less work" at the data structure level. They all cost space. Most of the time if you're optimizing for CPU, your program will use more memory. This is the classic space-time trade-off.

It's important to examine how this tradeoff can affect your solutions -- it's not always straightforward. Sometimes a small amount of memory gives a significant speedup, sometimes the tradeoff is linear (2x memory usage == 2x performance speedup), and sometimes it's significantly worse: a huge amount of memory gives only a small speedup. Where you need to be on this memory/performance curve can affect which algorithm choices are reasonable. It's not always possible to just tune an algorithm parameter; different memory budgets might call for completely different algorithmic approaches.

Lookup tables also fall into this space-time trade-off. A simple lookup table might just be a cache of previously requested computations.

If the domain is small enough, the entire set of results can be precomputed and stored in a table. As an example, this is the approach taken by some fast popcount implementations, whereby the number of set bits in each possible byte is stored in a 256-entry table. A larger table could store the bits required for all 16-bit words. In this case, the table stores exact results.
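The 256-entry popcount table might look like this sketch, which fills the table once at startup and then answers each query with lookups:

```go
package main

import "fmt"

// popTable[i] holds the number of set bits in the byte i,
// precomputed once so each query is just an index operation.
var popTable [256]uint8

func init() {
	for i := 1; i < 256; i++ {
		popTable[i] = uint8(i&1) + popTable[i>>1]
	}
}

// popcount counts set bits one byte at a time via the table.
func popcount(x uint64) int {
	n := 0
	for x != 0 {
		n += int(popTable[x&0xff])
		x >>= 8
	}
	return n
}

func main() {
	fmt.Println(popcount(0xFF)) // 8
	fmt.Println(popcount(11))   // 3 (binary 1011)
}
```

In real Go code you'd reach for math/bits.OnesCount64, which compiles down to a single instruction on most CPUs; the table is shown only to illustrate the precomputation technique.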

A number of algorithms for trigonometric functions use lookup tables as a starting point for a calculation.

If your program uses too much memory, it's also possible to go the other way. Reduce space usage in exchange for increased computation. Rather than storing things, calculate them every time. You can also compress the data in memory and decompress it on the fly when you need it.

If the data you're processing is on disk, instead of loading everything into RAM, you could create an index for the pieces you need and keep that in memory, or pre-process the file into smaller workable chunks.

Small Memory Software is a book available online covering techniques for reducing the space used by your programs. While it was originally written targeting embedded developers, the ideas are applicable for programs on modern hardware dealing with huge amounts of data.

Rearrange your data

Eliminate structure padding. Remove extra fields. Use a smaller data type.

Change to a slower data structure

Simpler data structures frequently have lower memory requirements. For example, moving from a pointer-heavy tree structure to a slice and linear search instead.

Custom compression format for your data

Compression algorithms depend very heavily on what is being compressed. It's best to choose one that suits your data. If you have []byte, then something like snappy, gzip, or lz4 behaves well. For floating point data there is go-tsz for time series and fpc for scientific data. Lots of research has been done around compressing integers, generally for information retrieval in search engines. Examples range from delta encoding and varints to more complex schemes involving Huffman-encoded xor-differences. You can also come up with custom compression formats optimized for exactly your data.
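A minimal sketch of delta-plus-varint encoding for sorted integers, using the standard library's variable-length integer encoding. It assumes the input is non-decreasing (so deltas fit in a uint64) and skips error handling for brevity:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// compress stores each value as the difference from its predecessor;
// small deltas encode to very few bytes.
func compress(vals []uint64) []byte {
	buf := make([]byte, 0, len(vals))
	tmp := make([]byte, binary.MaxVarintLen64)
	var prev uint64
	for _, v := range vals {
		n := binary.PutUvarint(tmp, v-prev)
		buf = append(buf, tmp[:n]...)
		prev = v
	}
	return buf
}

// decompress reverses the process: read a varint delta, add it to
// the running value.
func decompress(buf []byte) []uint64 {
	var out []uint64
	var prev uint64
	for len(buf) > 0 {
		d, n := binary.Uvarint(buf)
		prev += d
		out = append(out, prev)
		buf = buf[n:]
	}
	return out
}

func main() {
	vals := []uint64{1000, 1005, 1012, 1050}
	enc := compress(vals)
	// 5 bytes instead of 32 for the raw uint64s:
	fmt.Println(len(enc), decompress(enc))
}
```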

Do you need to inspect the data or can it stay compressed? Do you need random access or only streaming? If you need access to individual entries but don't want to decompress the entire thing, you can compress the data in smaller blocks and keep an index indicating what range of entries are in each block. Access to a single entry just needs to check the index and unpack the smaller data block.

If your data is not just in-process but will be written to disk, what about data migration, or adding and removing fields? You'll now be dealing with raw []byte instead of nice structured Go types, so you'll need unsafe and to consider serialization options.

We will talk more about data layouts later.

Modern computers and the memory hierarchy make the space/time trade-off less clear. It's very easy for lookup tables to be "far away" in memory (and therefore expensive to access) making it faster to just recompute a value every time it's needed.

This also means that benchmarking will frequently show improvements that are not realized in the production system due to cache contention (e.g., lookup tables are in the processor cache during benchmarking but always flushed by "real data" when used in a real system). Google's Jump Hash paper in fact addressed this directly, comparing performance on both a contended and uncontended processor cache. (See graphs 4 and 5 in the Jump Hash paper.)

TODO: how to simulate a contended cache, show incremental cost TODO: sync.Map as a Go-ish example of cache-contention addressing

Another aspect to consider is data-transfer time. Generally network and disk access is very slow, and so being able to load a compressed chunk will be much faster than the extra CPU time required to decompress the data once it has been fetched. As always, benchmark. A binary format will generally be smaller and faster to parse than a text one, but at the cost of no longer being as human readable.

For data transfer, move to a less chatty protocol, or augment the API to allow partial queries. For example, an incremental query rather than being forced to fetch the entire dataset each time.

Algorithmic Changes

If you're not changing the data, the other main option is to change the code.

The biggest improvement is likely to come from an algorithmic change. This is the equivalent of replacing bubble sort (O(n^2)) with quicksort (O(n log n)) or replacing a linear scan through an array (O(n)) with a binary search (O(log n)) or a map lookup (O(1)).

This is how software becomes slow. Structures originally designed for one use are repurposed for something they weren't designed for. This happens gradually.

It's important to have an intuitive grasp of the different big-O levels. Choose the right data structure for your problem. You don't always have to shave cycles, but this prevents dumb performance issues that might otherwise not be noticed until much later.

The basic classes of complexity are:

O(1): a field access, array or map lookup

Advice: don't worry about it (but keep in mind the constant factor.)

O(log n): binary search

Advice: only a problem if it's in a loop

O(n): simple loop

Advice: you're doing this all the time

O(n log n): divide-and-conquer, sorting

Advice: still fairly fast

O(n*m): nested loop / quadratic

Advice: be careful and constrain your set sizes

Anything else between quadratic and subexponential

Advice: don't run this on a million rows

O(b ^ n), O(n!): exponential and up

Advice: good luck if you have more than a dozen or two data points

Link: http://bigocheatsheet.com

Let's say you need to search through an unsorted set of data. "I should use a binary search," you think, knowing that a binary search is O(log n), which is faster than the O(n) linear scan. However, a binary search requires that the data be sorted, which means you'll need to sort it first, which takes O(n log n) time. If you're doing lots of searches, then the upfront cost of sorting will pay off. On the other hand, if you're mostly doing lookups, maybe having an array was the wrong choice and you'd be better off paying the O(1) lookup cost for a map instead.
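The two options above in a sketch, using the standard sort package for the sort-then-binary-search path and a map for the O(1) path:

```go
package main

import (
	"fmt"
	"sort"
)

// found reports whether v is present in an already-sorted slice,
// using an O(log n) binary search.
func found(sorted []int, v int) bool {
	i := sort.SearchInts(sorted, v)
	return i < len(sorted) && sorted[i] == v
}

func main() {
	data := []int{42, 7, 19, 3, 88, 61}

	// Many searches: pay O(n log n) once to sort, then O(log n) per lookup.
	sorted := append([]int(nil), data...)
	sort.Ints(sorted)
	fmt.Println(found(sorted, 19), found(sorted, 20)) // true false

	// Mostly membership lookups: a map gives O(1) per query instead.
	set := make(map[int]bool, len(data))
	for _, v := range data {
		set[v] = true
	}
	fmt.Println(set[19], set[20]) // true false
}
```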

Being able to analyze your problem in terms of big-O notation also means you can figure out if you're already at the limit for what is possible for your problem, and if you need to change approaches in order to speed things up. For example, finding the minimum of an unsorted list is O(n), because you have to look at every single item. There's no way to make that faster.

If your data structure is static, then you can generally do much better than the dynamic case. It becomes easier to build an optimal data structure customized for exactly your lookup patterns. Solutions like minimal perfect hashing can make sense here, or precomputed bloom filters. This also makes sense if your data structure is "static" for long enough that you can amortize the up-front cost of construction across many lookups.

Choose the simplest reasonable data structure and move on. This is CS 101 for writing "not-slow software". This should be your default development mode. If you know you need random access, don't choose a linked-list. If you know you need in-order traversal, don't use a map. Requirements change and you can't always guess the future. Make a reasonable guess at the workload.

http://daslab.seas.harvard.edu/rum-conjecture/

Data structures for similar problems will differ in when they do a piece of work. A binary tree sorts a little at a time as inserts happen. An unsorted array is faster to insert into, but it's unsorted: to "finalize" it at the end, you need to do all the sorting at once.

When writing a package to be used by others, avoid the temptation to optimize upfront for every single use case. This will result in unreadable code. Data structures by design are effectively single-purpose. You can neither read minds nor predict the future. If a user says "Your package is too slow for this use case", a reasonable answer might be "Then use this other package over here". A package should "do one thing well".

Sometimes hybrid data structures will provide the performance improvement you need. For example, by bucketing your data you can limit your search to a single bucket. This still pays the theoretical cost of O(n), but the constant will be smaller. We'll revisit these kinds of tweaks when we get to program tuning.

Two things that people forget when discussing big-O notation:

One, there's a constant factor involved. Two algorithms which have the same algorithmic complexity can have different constant factors. Imagine looping over a list 100 times vs just looping over it once. Even though both are O(n), one has a constant factor that's 100 times higher.

These constant factors are why, even though merge sort, quicksort, and heapsort are all O(n log n), everybody uses quicksort because it's the fastest. It has the smallest constant factor.

The second thing is that big-O only says "as n grows to infinity". It talks about the growth trend, "As the numbers get big, this is the growth factor that will dominate the run time." It says nothing about the actual performance, or how it behaves with small n.

There's frequently a cut-off point below which a dumber algorithm is faster. A nice example comes from the Go standard library's sort package. Most of the time it's using quicksort, but it does a shell sort pass then insertion sort when the partition size drops below 12 elements.
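The shape of that hybrid can be sketched as below. The cutoff value and the Lomuto partitioning are simplified choices for illustration; the real sort package also does a shell sort pass, picks better pivots, and tracks recursion depth so it can fall back to heapsort:

```go
package main

import "fmt"

const cutoff = 12 // below this, the "dumber" algorithm wins (illustrative value)

// hybridSort: quicksort for large partitions, insertion sort once a
// partition is small enough.
func hybridSort(a []int) {
	if len(a) < cutoff {
		insertionSort(a)
		return
	}
	p := partition(a)
	hybridSort(a[:p])
	hybridSort(a[p+1:])
}

func insertionSort(a []int) {
	for i := 1; i < len(a); i++ {
		for j := i; j > 0 && a[j] < a[j-1]; j-- {
			a[j], a[j-1] = a[j-1], a[j]
		}
	}
}

// partition is a basic Lomuto partition around the last element.
func partition(a []int) int {
	pivot := a[len(a)-1]
	i := 0
	for j := 0; j < len(a)-1; j++ {
		if a[j] < pivot {
			a[i], a[j] = a[j], a[i]
			i++
		}
	}
	a[i], a[len(a)-1] = a[len(a)-1], a[i]
	return i
}

func main() {
	a := []int{5, 3, 8, 1, 9, 2, 7, 4, 6, 0, 11, 10, 15, 13, 14, 12}
	hybridSort(a)
	fmt.Println(a)
}
```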

For some algorithms, the constant factor might be so large that this cut-off point may be larger than all reasonable inputs. That is, the O(n^2) algorithm is faster than the O(n) algorithm for all inputs that you're ever likely to deal with.

This also means you need to know representative input sizes, both for choosing the most appropriate algorithm and for writing good benchmarks. 10 items? 1000 items? 1000000 items?

This also goes the other way: for example, choosing to use a more complicated data structure to give you O(n) scaling instead of O(n^2), even though the benchmarks for small inputs got slower. This also applies to most lock-free data structures. They're generally slower in the single-threaded case but more scalable when many threads are using them.

The memory hierarchy in modern computers confuses the issue here a little bit, in that caches prefer the predictable access of scanning a slice to the effectively random access of chasing a pointer. Still, it's best to begin with a good algorithm. We will talk about this in the hardware-specific section.

TODO: extending last paragraph, mention O() notation is a model where each operation has a fixed cost. That's a wrong assumption on modern hardware.

The fight may not always go to the strongest, nor the race to the fastest, but that's the way to bet. -- Rudyard Kipling

Sometimes the best algorithm for a particular problem is not a single algorithm, but a collection of algorithms specialized for slightly different input classes. This "polyalgorithm" quickly detects what kind of input it needs to deal with and then dispatches to the appropriate code path. This is what the sorting package mentioned above does: determine the problem size and choose a different algorithm. In addition to combining quicksort, shell sort, and insertion sort, it also tracks recursion depth of quicksort and calls heapsort if necessary. The string and bytes packages do something similar, detecting and specializing for different cases. As with data compression, the more you know about what your input looks like, the better your custom solution can be. Even if an optimization is not always applicable, complicating your code by determining that it's safe to use and executing different logic can be worth it.

This also applies to subproblems your algorithm needs to solve. For example, being able to use radix sort can have a significant impact on performance, or using quickselect if you only need a partial sort.

Sometimes rather than specialization for your particular task, the best approach is to abstract it into a more general problem space that has been well-studied by researchers. Then you can apply the more general solution to your specific problem. Mapping your problem into a domain that already has well-researched implementations can be a significant win.

Similarly, using a simpler algorithm means that its tradeoffs, analysis, and implementation details are more likely to be well studied and understood than those of more esoteric, exotic, and complex ones.

Simpler algorithms can also be faster. These two examples are not isolated cases: https://go-review.googlesource.com/c/crypto/+/169037 https://go-review.googlesource.com/c/go/+/170322/

TODO: notes on algorithm selection

TODO: improve worst-case behaviour at slight cost to average runtime linear-time regexp matching

While most algorithms are deterministic, there is a class of algorithms that use randomness as a way to simplify an otherwise complex decision-making step. Instead of having code that does the Right Thing, you use randomness to select a probably-not-bad thing. For example, a treap is a probabilistically balanced binary tree. Each node has a key, but is also assigned a random value. When inserting into the tree, the normal binary tree insertion path is followed, but the nodes also obey the heap property based on each node's randomly assigned weight. This simpler approach replaces otherwise complicated tree rotating solutions (like AVL and Red-Black trees) but still maintains a balanced tree with O(log n) insert/lookup "with high probability". Skip lists are another similar, simple data structure that uses randomness to produce "probably" O(log n) insertion and lookups.

Similarly, choosing a random pivot for quicksort can be simpler than a more complex median-of-medians approach to finding a good pivot, and the probability that bad pivots are continually (randomly) chosen and degrading quicksort's performance to O(n^2) is vanishingly small.

Randomized algorithms are classed as either "Monte Carlo" algorithms or "Las Vegas" algorithms, after two well known gambling locations. A Monte Carlo algorithm gambles with correctness: it might output a wrong answer (or in the case of the above, an unbalanced binary tree). A Las Vegas algorithm always outputs a correct answer, but might take a very long time to terminate.

Another well-known example of a randomized algorithm is the Miller-Rabin primality testing algorithm. Each iteration will output either "not prime" or "maybe prime". While "not prime" is certain, the "maybe prime" is correct with probability at least 1/2. That is, there are non-primes for which "maybe prime" will still be output. By running many iterations of Miller-Rabin, we can make the probability of failure (that is, outputting "maybe prime" for a composite number) as small as we'd like. If a number passes 200 iterations, then the probability that it is actually composite is at most 1/(2^200).
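The standard library exposes exactly this: math/big's ProbablyPrime runs the requested number of Miller-Rabin rounds (and, in modern Go versions, a Baillie-PSW test as well). A small sketch:

```go
package main

import (
	"fmt"
	"math/big"
)

// isProbablyPrime runs 20 Miller-Rabin iterations via math/big.
func isProbablyPrime(n int64) bool {
	return big.NewInt(n).ProbablyPrime(20)
}

func main() {
	fmt.Println(isProbablyPrime(1<<61 - 1)) // true: the Mersenne prime 2^61-1
	fmt.Println(isProbablyPrime(1<<62 - 1)) // false: 2^62-1 is composite
}
```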

Another area where randomness plays a part is called "The power of two random choices". While initially the research was applied to load balancing, it turned out to be widely applicable to a number of selection problems. The idea is that rather than trying to find the best selection out of a group of items, pick two at random and select the best from that. Returning to load balancing (or hash table chains), the power of two random choices reduces the expected load (or hash chain length) from O(log n) items to O(log log n) items. For more information, see The Power of Two Random Choices: A Survey of Techniques and Results
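A load-balancing sketch of the idea. The `pickServer` helper is hypothetical; the point is that it samples two backends rather than scanning all of them for the global minimum:

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickServer samples two backends at random and routes to the
// less-loaded of the pair ("the power of two random choices").
func pickServer(load []int) int {
	a := rand.Intn(len(load))
	b := rand.Intn(len(load))
	if load[b] < load[a] {
		return b
	}
	return a
}

func main() {
	load := make([]int, 10)
	for i := 0; i < 10000; i++ {
		load[pickServer(load)]++
	}
	// The maximum load stays very close to the 1000-per-server average,
	// far tighter than a single random choice would give.
	fmt.Println(load)
}
```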

TODO: randomized algorithms: other caching algorithms; statistical approximations (frequently depend on sample size and not population size)

TODO: batching to reduce overhead: https://lemire.me/blog/2018/04/17/iterating-in-batches-over-data-structures-can-be-much-faster/

TODO: - Algorithm Design Manual: http://algorist.com/algorist.html - How To Solve It By Computer - to what extent is this a "how to write algorithms" book? If you're going to change the code to speed it up, by definition you're writing new algorithms. Soo... maybe?

Benchmark Inputs

Real-world inputs rarely match the theoretical "worst case". Benchmarking is vital to understanding how your system behaves in production.

You need to know what class of inputs your system will be seeing once deployed, and your benchmarks must use instances pulled from that same distribution. As we've seen, different algorithms make sense at different input sizes. If your expected input range is <100, then your benchmarks should reflect that. Otherwise, an algorithm that is optimal at n=10^6 might not be the fastest for your actual workload.

Be able to generate representative test data. Different distributions of data can provoke different behaviours in your algorithm: think of the classic "quicksort is O(n^2) when the data is sorted" example. Similarly, interpolation search is O(log log n) for uniform random data, but O(n) worst case. Knowing what your inputs look like is the key to both representative benchmarks and for choosing the best algorithm. If the data you're using to test isn't representative of real workloads, you can easily end up optimizing for one particular data set, "overfitting" your code to work best with one specific set of inputs.

This also means your benchmark data needs to be representative of the real world. Using purely randomized inputs may skew the behaviour of your algorithm. Caching and compression algorithms both exploit skewed distributions not present in random data and so will perform worse, while a binary tree will perform better with random values as they will tend to keep the tree balanced. (This is the idea behind a treap, by the way.)

On the other hand, consider the case of testing a system with a cache. If your benchmark input consists of only a single query, then every request will hit the cache, giving a potentially very unrealistic view of how the system will behave in the real world with a more varied request pattern.

Also, note that some issues that are not apparent on your laptop might be visible once you deploy to production and are hitting 250k reqs/second on a 40 core server. Similarly, the behaviour of the garbage collector during benchmarking can misrepresent real-world impact. There are (rare) cases where a microbenchmark will show a slow-down, but real-world performance improves. Microbenchmarks can help nudge you in the right direction but being able to fully test the impact of a change across the entire system is best.

Writing good benchmarks can be difficult.

Use geometric mean to compare groups of benchmarks.
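The geometric mean is the right aggregate for ratios because a 2x win and a 0.5x loss cancel out exactly, which the arithmetic mean gets wrong. A sketch:

```go
package main

import (
	"fmt"
	"math"
)

// geomean computes the geometric mean of a set of benchmark speedup
// ratios (new/old), via the mean of the logs.
func geomean(ratios []float64) float64 {
	sum := 0.0
	for _, r := range ratios {
		sum += math.Log(r)
	}
	return math.Exp(sum / float64(len(ratios)))
}

func main() {
	// A 2x speedup and a 2x slowdown should average to "no change":
	fmt.Printf("%.2f\n", geomean([]float64{2.0, 0.5})) // 1.00
	// The arithmetic mean would misleadingly report (2.0+0.5)/2 = 1.25.
	fmt.Printf("%.2f\n", geomean([]float64{1.1, 1.2, 1.3})) // 1.20
}
```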

Evaluating Benchmark Accuracy:

Program Tuning

Program tuning used to be an art form, but then compilers got better. So now it turns out that compilers can optimize straightforward code better than complicated code. The Go compiler still has a long way to go to match gcc and clang, but it does mean that you need to be careful when tuning, and especially when upgrading Go versions, that your code doesn't become "worse". There are definitely cases where tweaks to work around the lack of a particular compiler optimization became slower once the compiler was improved.

My RC6 cipher implementation had a 10% speed up for the inner loop just by switching to encoding/binary and math/bits instead of my hand-rolled versions.

Similarly, the compress/bzip2 package was sped up by switching to simpler code that the compiler was better able to optimize.

If you are working around a specific runtime or compiler code generation issue, always document your change with a link to the upstream issue. This will allow you to quickly revisit your optimization once the bug is fixed.

Fight the temptation to cargo cult folklore-based "performance tips", or even over-generalize from your own experience. Each performance bug needs to be approached on its own merits. Even if something has worked previously, make sure to profile to ensure the fix is still applicable. Your previous work can guide you, but don't apply previous optimizations blindly.

Program tuning is an iterative process. Keep revisiting your code and seeing what changes can be made. Ensure you're making progress at each step. Frequently one improvement will enable others to be made. (Now that I'm not doing A, I can simplify B by doing C instead.) This means you need to keep looking at the entire picture and not get too obsessed with one small set of lines.

Once you've settled on the right algorithm, program tuning is the process of improving the implementation of that algorithm. In Big-O notation, this is the process of reducing the constants associated with your program.

All program tuning is either making a slow thing fast, or doing a slow thing fewer times. Algorithmic changes also fall into these categories, but we're going to be looking at smaller changes. Exactly how you do this varies as technologies change.

Making a slow thing fast might be replacing SHA1 or hash/fnv1 with a faster hash function. Doing a slow thing fewer times might be saving the result of the hash calculation of a large file so you don't have to do it multiple times.
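A sketch of the second idea, remembering the digest of content we've already hashed. The type and key scheme here are illustrative; real code might key on file path plus modification time, and must think about invalidation if the underlying content can change:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashCache remembers previously computed digests so a large input
// is only hashed once.
type hashCache struct {
	cache map[string][32]byte
}

func (h *hashCache) sum(name string, data []byte) [32]byte {
	if d, ok := h.cache[name]; ok {
		return d // cache hit: skip rehashing the large input
	}
	d := sha256.Sum256(data)
	if h.cache == nil {
		h.cache = make(map[string][32]byte)
	}
	h.cache[name] = d
	return d
}

func main() {
	var hc hashCache
	data := make([]byte, 1<<20)
	first := hc.sum("big-file", data)
	second := hc.sum("big-file", data) // served from the cache
	fmt.Println(first == second)       // true
}
```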

Keep comments. If something doesn't need to be done, explain why. Frequently when optimizing an algorithm you'll discover steps that don't need to be performed under some circumstances. Document them. Somebody else might think it's a bug and needs to be put back.

An empty program gives the wrong answer in no time at all.

It's easy to be fast if you don't have to be correct.

"Correctness" can depend on the problem. Heuristic algorithms that are mostly-right most of the time can be fast, as can algorithms which guess and improve allowing you to stop when you hit an acceptable limit.

Cache common cases:

We're all familiar with memcache, but there are also in-process caches. Using an in-process cache saves the cost of both the network call and the cost of serialization. On the other hand, this increases GC pressure as there is more memory to keep track of. You also need to consider eviction strategies, cache invalidation, and thread-safety. An external cache will generally handle eviction for you, but cache invalidation remains a problem. Thread-safety can also be an issue with external caches as it becomes effectively shared mutable state either between different goroutines in the same service or even different service instances if the external cache is shared.

A cache saves information you've just spent time computing in the hopes that you'll be able to reuse it again soon and save the computation time. A cache doesn't need to be complex. Even storing a single item -- the most recently seen query/response -- can be a big win, as seen in the time.Parse() example below.

With caches it's important to compare the cost (in terms of actual wall-clock and code complexity) of your caching logic to simply refetching or recomputing the data. The more complex algorithms that give higher hit rates are generally not cheap themselves. Randomized cache eviction is simple and fast and can be effective in many cases. Similarly, randomized cache insertion can limit your cache to only popular items with minimal logic. While these may not be as effective as the more complex algorithms, the big improvement will be adding a cache in the first place: choosing exactly which caching algorithm gives only minor improvements.
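As a sketch of how simple randomized eviction can be (the `randCache` type and its capacity policy here are illustrative, not from any particular library), this relies on Go's randomized map iteration order to pick an effectively random victim:

```go
package main

import "fmt"

// randCache is a fixed-capacity cache that evicts a random entry
// when full. Go's map iteration order is randomized, so taking the
// first key of a range is a cheap approximation of random eviction.
type randCache struct {
	max  int
	data map[string]string
}

func newRandCache(max int) *randCache {
	return &randCache{max: max, data: make(map[string]string, max)}
}

func (c *randCache) Get(k string) (string, bool) {
	v, ok := c.data[k]
	return v, ok
}

func (c *randCache) Put(k, v string) {
	if _, ok := c.data[k]; !ok && len(c.data) >= c.max {
		for victim := range c.data { // map order is randomized: pick any key
			delete(c.data, victim)
			break
		}
	}
	c.data[k] = v
}

func main() {
	c := newRandCache(2)
	c.Put("a", "1")
	c.Put("b", "2")
	c.Put("c", "3") // forces one random eviction
	fmt.Println(len(c.data)) // always 2
}
```

No recency tracking, no heap, no linked list -- which is the point: the big win is having a cache at all.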

It's important to benchmark your choice of cache eviction algorithm with real-world traces. If repeated requests are sufficiently rare in the real world, it can be more expensive to keep cached responses around than to simply recompute them when needed. I've had services where testing with production data showed even an optimal cache wasn't worth it; we simply didn't have enough repeated requests to justify the added complexity of a cache.

Your expected cache hit ratio is important. You'll want to export the ratio to your monitoring stack. Changing ratios will show a shift in traffic. Then it's time to revisit the cache size or the expiration policy.

A large cache can increase GC pressure. At the extreme (little or no eviction, caching all requests to an expensive function), this turns into memoization.

Program tuning:

Program tuning is the art of iteratively improving a program in small steps. Egon Elbre lays out his procedure:

  • Come up with a hypothesis as to why your program is slow.
  • Come up with N solutions to solve it
  • Try them all and keep the fastest.
  • Keep the second fastest just in case.
  • Repeat.

Tunings can take many forms.

  • If possible, keep the old implementation around for testing.
  • If not possible, generate sufficient golden test cases to compare output to.
  • "Sufficient" means including edge cases, as those are the ones likely to get affected by tuning as you aim to improve performance in the general case.
  • Exploit a mathematical identity:
    • Note that implementing and optimizing numerical calculations is almost its own field
    • "pay only for what you use, not what you could have used"
      • zero only part of an array, rather than the whole thing
    • best done in tiny steps, a few statements at a time
    • cheap checks before more expensive checks:
      • e.g., strcmp before regexp (q.v., bloom filter before query); "do expensive things fewer times"
    • common cases before rare cases i.e., avoid extra tests that always fail
    • unrolling still effective: https://play.golang.org/p/6tnySwNxG6O
      • code size vs. branch test overhead
    • using offsets instead of slice assignment can help with bounds checks, data dependencies, and code gen (less to copy in inner loop).
    • remove bounds checks and nil checks from loops: https://go-review.googlesource.com/c/go/+/151158
    • other tricks for the prove pass
    • this is where pieces of Hacker's Delight fall

Many folklore performance tips for tuning rely on poorly optimizing compilers and encourage the programmer to do these transformations by hand. Compilers have been using shifts instead of multiplying or dividing by a power of two for 15 years now -- nobody should be doing that by hand. Similarly, hoisting invariant calculations out of loops, basic loop unrolling, common sub-expression elimination and many others are all done automatically by gcc and clang and the like. Go's compiler does many of these and continues to improve. As always, benchmark before committing to the new version.

The transformations the compiler can't do rely on you knowing things about the algorithm, about your input data, about invariants in your system, and other assumptions you can make, and factoring that implicit knowledge into removing or altering steps in the data structure.

Every optimization codifies an assumption about your data. These must be documented and, even better, tested for. These assumptions are going to be where your program crashes, slows down, or starts returning incorrect data as the system evolves.

Program tuning improvements are cumulative. Five 3% improvements compound to roughly a 15% improvement. When making optimizations, it's worth it to think about the expected performance improvement. Replacing a hash function with a faster one is a constant-factor improvement.

Understanding your requirements and where they can be altered can lead to performance improvements. One issue that was presented in the #performance Gophers Slack channel was the amount of time spent creating a unique identifier for a map of string key/value pairs. The original solution was to extract the keys, sort them, and pass the resulting string to a hash function. The improved solution we came up with was to individually hash the keys/values as they were added to the map, then xor all these hashes together to create the identifier.
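A sketch of that incremental approach (the names and the choice of FNV-1a are illustrative): because xor is commutative and self-inverse, the identifier doesn't depend on insertion order and can be updated in O(1) when a pair changes:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashKV hashes a single key/value pair.
func hashKV(k, v string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(k))
	h.Write([]byte{0}) // separator so ("ab","c") != ("a","bc")
	h.Write([]byte(v))
	return h.Sum64()
}

// idMap maintains an order-independent identifier incrementally:
// xor is commutative, so insertion order doesn't matter, and
// replacing a pair just xors the old hash back out.
type idMap struct {
	m  map[string]string
	id uint64
}

func (s *idMap) Set(k, v string) {
	if old, ok := s.m[k]; ok {
		s.id ^= hashKV(k, old) // remove the old contribution
	}
	s.id ^= hashKV(k, v)
	s.m[k] = v
}

func main() {
	a := &idMap{m: map[string]string{}}
	a.Set("x", "1")
	a.Set("y", "2")

	b := &idMap{m: map[string]string{}}
	b.Set("y", "2")
	b.Set("x", "1")

	fmt.Println(a.id == b.id) // true: identifier is order-independent
}
```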

Here's an example of specialization.

Let's say we're processing a massive log file for a single day, and each line begins with a time stamp.

Sun  4 Mar 2018 14:35:09 PST <...........................>

For each line, we're going to call time.Parse() to turn it into an epoch. If profiling shows us time.Parse() is the bottleneck, we have a few options to speed things up.

The easiest is to keep a single-item cache of the previously seen time stamp and the associated epoch. As long as our log file has multiple lines for a single second, this will be a win. For the case of a 10 million line log file, this strategy reduces the number of expensive calls to time.Parse() from 10,000,000 to 86,400 -- one for each unique second.

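A minimal sketch of such a single-item cache (the layout string is an assumption matching the sample line above):

```go
package main

import (
	"fmt"
	"time"
)

// Assumed layout matching "Sun  4 Mar 2018 14:35:09 PST".
const layout = "Mon  2 Jan 2006 15:04:05 MST"

// lastStamp / lastEpoch form a single-item cache: log lines within
// the same second hit the cache and skip time.Parse entirely.
var (
	lastStamp string
	lastEpoch int64
)

func parseStamp(s string) (int64, error) {
	if s == lastStamp {
		return lastEpoch, nil // cache hit
	}
	t, err := time.Parse(layout, s)
	if err != nil {
		return 0, err
	}
	lastStamp, lastEpoch = s, t.Unix()
	return lastEpoch, nil
}

func main() {
	e1, _ := parseStamp("Sun  4 Mar 2018 14:35:09 PST")
	e2, _ := parseStamp("Sun  4 Mar 2018 14:35:09 PST") // cache hit
	fmt.Println(e1 == e2) // true
}
```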

Can we do more? Because we know exactly what format the timestamps are in and that they all fall in a single day, we can write custom time parsing logic that takes this into account. We can calculate the epoch for midnight, then extract hour, minute, and second from the timestamp string -- they'll all be in fixed offsets in the string -- and do some integer math.

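A sketch of the specialized version (the byte offsets assume the exact fixed-width format above; the midnight epoch would be computed once with time.Parse, and the constant here is hypothetical):

```go
package main

import "fmt"

// parseClock extracts hour, minute, and second from a fixed-format
// timestamp like "Sun  4 Mar 2018 14:35:09 PST" with integer math.
// Given the epoch for midnight, the line's epoch is just
// midnight + seconds-into-day.
func parseClock(s string) int64 {
	// Offsets assume the fixed "Sun  4 Mar 2018 14:35:09 PST" layout.
	h := int64(s[16]-'0')*10 + int64(s[17]-'0')
	m := int64(s[19]-'0')*10 + int64(s[20]-'0')
	sec := int64(s[22]-'0')*10 + int64(s[23]-'0')
	return h*3600 + m*60 + sec
}

func main() {
	const midnight = 1520150400 // hypothetical epoch for midnight that day
	offset := parseClock("Sun  4 Mar 2018 14:35:09 PST")
	fmt.Println(offset)            // 52509 (14h35m09s into the day)
	fmt.Println(midnight + offset) // the line's epoch
}
```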

In my benchmarks, this reduced the time parsing from 275ns/op to 5ns/op. (Of course, even at 275 ns/op, you're more likely to be blocked on I/O and not CPU for time parsing.)

The general algorithm is slow because it has to handle more cases. Your algorithm can be faster because you know more about your problem. But the code is more closely tied to exactly what you need. It's much more difficult to update if the time format changes.

Optimization is specialization, and specialized code is more fragile to change than general purpose code.

The standard library implementations need to be "fast enough" for most cases. If you have higher performance needs you will probably need specialized implementations.

Profile regularly to track the performance characteristics of your system, and be prepared to re-optimize as your traffic changes. Know the limits of your system and have good metrics that allow you to predict when you will hit those limits.

When the usage of your application changes, different pieces may become hotspots. Revisit previous optimizations and decide if they're still worth it, and revert to more readable code when possible. In one system, I had optimized process startup time with a complex mix of mmap, reflect, and unsafe. Once we changed how the system was deployed, this code was no longer required, and I replaced it with much more readable regular file operations.

TODO(dgryski): hash function work should fall here; manually inlining, removing structs, unrolling loops, removing bounds checks

Optimization workflow summary

All optimizations should follow these steps:

  1. determine your performance goals and confirm you are not meeting them
  2. profile to identify the areas to improve.
    • This can be CPU, heap allocations, or goroutine blocking.
  3. benchmark to determine the speedup your solution will provide using the built-in benchmarking framework (http://golang.org/pkg/testing/)
    • Make sure you're benchmarking the right thing on your target operating system and architecture.
  4. profile again afterwards to verify the issue is gone
  5. use https://godoc.org/golang.org/x/perf/benchstat or https://github.com/codahale/tinystat to verify that a set of timings are 'sufficiently' different for an optimization to be worth the added code complexity.
  6. use https://github.com/tsenart/vegeta for load testing http services (+ other fancy ones: k6, fortio, fbender)
    • if possible, test ramp-up/ramp-down in addition to steady-state load
  7. make sure your latency numbers make sense

TODO: mention github.com/aclements/perflock as cpu noise reduction tool

The first step is important. It tells you when and where to start optimizing. More importantly, it also tells you when to stop. Pretty much all optimizations add code complexity in exchange for speed. And you can always make code faster. It's a balancing act.

Garbage Collection

You pay for memory allocation more than once. The first is obviously when you allocate it. But you also pay every time the garbage collection runs.

Reduce/Reuse/Recycle. -- @bboreham

  • Stack vs. heap allocations
  • What causes heap allocations?
  • Understanding escape analysis (and the current limitation)
  • /debug/pprof/heap , and -base
  • API design to limit allocations:
    • allow passing in buffers so caller can reuse rather than forcing an allocation
    • you can even modify a slice in place carefully while you scan over it
    • passing in a struct could allow caller to stack allocate it
  • reducing pointers to reduce gc scan times
    • pointer-free slices
    • maps with both pointer-free keys and values
  • GOGC
  • buffer reuse (sync.Pool, or custom via go-slab, etc.)
  • slicing vs. offset: pointer writes while GC is running need writebarrier: https://github.com/golang/go/commit/b85433975aedc2be2971093b6bbb0a7dc264c8fd
  • use error variables instead of errors.New() / fmt.Errorf() at call site (performance or style? interface requires pointer, so it escapes to heap anyway)
  • use structured errors to reduce allocation (pass struct value), create string at error printing time
  • size classes
  • beware pinning larger allocation with smaller substrings or slices
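To illustrate the "allow passing in buffers" point above, an append-style API in the style of strconv.AppendInt lets the caller decide where the bytes go, so a single buffer can be reused across calls (AppendPair is a made-up example function, not a real API):

```go
package main

import (
	"fmt"
	"strconv"
)

// AppendPair appends "key=value" to dst and returns the extended
// slice, following the append-style convention used by strconv:
// the caller controls allocation and can reuse one buffer.
func AppendPair(dst []byte, key string, val int) []byte {
	dst = append(dst, key...)
	dst = append(dst, '=')
	return strconv.AppendInt(dst, int64(val), 10)
}

func main() {
	buf := make([]byte, 0, 64) // allocated once by the caller
	for i := 0; i < 3; i++ {
		buf = buf[:0] // reuse the same backing array each iteration
		buf = AppendPair(buf, "n", i)
		fmt.Println(string(buf))
	}
}
```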

Runtime and compiler

  • cost of calls via interfaces (indirect calls on the CPU level)
  • runtime.convT2E / runtime.convT2I
  • type assertions vs. type switches
  • defer
  • special-case map implementations for ints, strings
  • bounds check elimination
  • []byte <-> string copies, map optimizations
  • two-value range will copy an array, use the slice instead:
  • use string concatenation instead of fmt.Sprintf where possible; runtime has optimized routines for it
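To illustrate the two-value range point above (a small sketch; with a real, large array the copy is what you're avoiding):

```go
package main

import "fmt"

// sumArr iterates via a slice of the array. A two-value
// "for i, v := range *a" over the array value would copy all the
// elements into a temporary before iterating; a[:] avoids that copy.
func sumArr(a *[4]int) int {
	s := 0
	for _, v := range a[:] { // slice the array: no element copy
		s += v
	}
	return s
}

func main() {
	arr := [4]int{1, 2, 3, 4}
	fmt.Println(sumArr(&arr)) // 10
}
```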

Unsafe

Common gotchas with the standard library

  • time.After() leaks until it fires; use t := NewTimer(); t.Stop() / t.Reset()
  • Reusing HTTP connections...; ensure the body is drained (issue #?)
  • rand.Int() and friends are 1) mutex protected and 2) expensive to create
    • consider alternate random number generation (go-pcgr, xorshift)
  • binary.Read and binary.Write use reflection and are slow; do it by hand. (https://github.com/conformal/yubikey/commit/613e3b04ae2eeb78e6a19636b8ff8e9106d2e7bc)
  • use strconv instead of fmt if possible
  • Use strings.EqualFold(str1, str2) instead of strings.ToLower(str1) == strings.ToLower(str2) or strings.ToUpper(str1) == strings.ToUpper(str2) to efficiently compare strings if possible.
  • ...

Alternate implementations

Popular replacements for standard library packages:

  • encoding/json -> ffjson, easyjson, jingo (only encoder), etc
  • net/http
    • fasthttp (but incompatible API, not RFC compliant in subtle ways)
    • httprouter (has other features besides speed; I've never actually seen routing in my profiles)
  • regexp -> ragel (or other regular expression package)
  • serialization
  • database/sql -> has tradeoffs that affect performance
    • look for drivers that don't use it: jackx/pgx, crawshaw sqlite, ...
  • gccgo (benchmark!), gollvm (WIP)
  • container/list: use a slice instead (almost always)

cgo

cgo is not go -- Rob Pike

  • Performance characteristics of cgo calls
  • Tricks to reduce the costs: batching
  • Rules on passing pointers between Go and C
  • syso files (race detector, dev.boringssl)

Advanced Techniques

Techniques specific to the architecture running the code

introduction to CPU caches

  • performance cliffs
  • building intuition around cache-lines: sizes, padding, alignment
  • OS tools to view cache-misses (perf)
  • maps vs. slices
  • SOA vs AOS layouts: row-major vs. column major; when you have an X, do you need another X or do you need a Y?
  • temporal and spatial locality: use what you have and what's nearby as much as possible
  • reducing pointer chasing
  • explicit memory prefetching; frequently ineffective; lack of intrinsics means function call overhead (removed from runtime)
  • make the first 64 bytes of your struct count

branch prediction

Remove branches from inner loops: if a { for { } } else { for { } } instead of for { if a { } else { } }. Benchmark; this restructuring avoids a branch inside the loop, and the payoff depends on branch prediction.

Replace

if i % 2 == 0 { evens++ } else { odds++ }

with the "branch-free" version:

counts[i & 1]++

Benchmark; branch-free code is not always faster, and it's frequently harder to read.
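The evens/odds example above as a runnable sketch (as noted, benchmark it -- branch-free is not always faster):

```go
package main

import "fmt"

// countParity counts even and odd values branch-free: the low bit
// of v selects which counter to bump, leaving no data-dependent
// conditional branch for the predictor to miss on random input.
func countParity(vals []int) (evens, odds int) {
	var counts [2]int
	for _, v := range vals {
		counts[v&1]++
	}
	return counts[0], counts[1]
}

func main() {
	e, o := countParity([]int{1, 2, 3, 4, 5})
	fmt.Println(e, o) // 2 3
}
```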

sorting data can help improve performance via both cache locality and branch prediction, even taking into account the time it takes to sort

function call overhead: inliner is getting better

reduce data copies (including for repeated large lists of function params)

Comment about Jeff Dean's 2002 numbers (plus updates)

  • cpus have gotten faster, but memory hasn't kept up

TODO: little comment about code-aligment free optimization (or unoptimization)

Concurrency

  • Figure out which pieces can be done in parallel and which must be sequential
  • goroutines are cheap, but not free.
  • Optimizing multi-threaded code
    • false-sharing -> pad to cache-line size
    • true sharing -> sharding
  • Overlap with previous section on caches and false/true sharing
  • Lazy synchronization; it's expensive, so duplicating work may be cheaper
  • things you can control: number of workers, batch size

You need a mutex to protect shared mutable state. If you have lots of mutex contention, you need to either reduce the sharing or reduce the mutability. Two ways to reduce sharing are 1) shard the locks or 2) process independently and combine afterwards. To reduce mutability: make your data structure read-only. You can also reduce the time the data needs to be shared by shrinking the critical section -- hold the lock no longer than needed. Sometimes an RWMutex will be sufficient; it allows multiple readers in, although it is slower than a plain Mutex.

If you're sharding the locks, be careful of shared cache-lines. You'll need to pad to avoid cache-line bouncing between processors.

var stripe [8]struct{ sync.Mutex; _ [7]uint64 } // mutex is 64-bits; padding fills the rest of the cacheline
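Building on that striped mutex, a sharded counter might look like this sketch (the shard count and padding sizes are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// A sharded counter: each goroutine hashes to its own stripe, so
// there's no single contended lock, and the padding keeps stripes
// on separate cache lines (8 + 8 + 48 = 64 bytes per stripe).
var stripe [8]struct {
	sync.Mutex
	n int64
	_ [6]uint64 // pad the struct out to a 64-byte cache line
}

func incr(id int) {
	s := &stripe[id%len(stripe)]
	s.Lock()
	s.n++
	s.Unlock()
}

func total() int64 {
	var t int64
	for i := range stripe {
		stripe[i].Lock()
		t += stripe[i].n
		stripe[i].Unlock()
	}
	return t
}

func main() {
	var wg sync.WaitGroup
	for id := 0; id < 8; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				incr(id)
			}
		}(id)
	}
	wg.Wait()
	fmt.Println(total()) // 8000
}
```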

Don't do anything expensive in your critical section if you can help it. This includes blocking operations like I/O (cheap to write, but slow to execute).

TODO: how to decompose problem for concurrency TODO: reasons parallel implementation might be slower (communication overhead, best algorithm is sequential, ... )

Assembly

  • Stuff about writing assembly code for Go
  • compilers improve; the bar is high
  • replace as little as possible to make an impact; maintenance cost is high
  • good reasons: SIMD instructions or other things outside of what Go and the compiler can provide
  • very important to benchmark: improvements can be huge (10x for go-highway), zero (go-speck/rc6/farm32), or even negative (no inlining)
  • rebenchmark with new versions to see if you can delete your code yet
    • TODO: link to 1.11 patches removing asm code
  • always have pure-Go version (purego build tag): testing, arm, gccgo
  • brief intro to syntax
  • how to type the middle dot
  • calling convention: everything is on the stack, followed by the return values.
  • using opcodes unsupported by the asm (asm2plan9, but this is getting rarer)
  • notes about why inline assembly is hard: golang/go#26891
  • all the tooling to make this easier:
  • https://github.com/golang/go/wiki/AssemblyPolicy
  • Design of the Go Assembler: https://talks.golang.org/2016/asm.slide

Optimizing an entire service

Most of the time you won't be presented with a single CPU-bound routine. That's the easy case. If you have a service to optimize, you need to look at the entire system. Monitoring. Metrics. Log lots of things over time so you can see them getting worse and so you can see the impact your changes have in production.

tip.golang.org/doc/diagnostics.html

  • references for system design: SRE Book, practical distributed system design
  • extra tooling: more logging + analysis
  • The two basic rules: either speed up the slow things or do them less frequently.
  • distributed tracing to track bottlenecks at a higher level
  • query patterns for querying a single server instead of in bulk
  • your performance issues may not be your code, but you'll have to work around them anyway
  • https://docs.microsoft.com/en-us/azure/architecture/antipatterns/

Tooling

Introductory Profiling

This is a quick cheat-sheet for using the pprof tooling. There are plenty of other guides available on this. Check out https://github.com/davecheney/high-performance-go-workshop.

TODO(dgryski): videos?

  1. Introduction to pprof
  2. Writing and running (micro)benchmarks
    • small, like unit tests
    • profile, extract hot code to benchmark, optimize benchmark, profile.
    • -cpuprofile / -memprofile / -benchmem
    • 0.5 ns/op means it was optimized away -> how to avoid
    • tips for writing good microbenchmarks (remove unnecessary work, but add baselines)
  3. How to read pprof output
  4. What are the different pieces of the runtime that show up
    • malloc, gc workers
    • runtime._ExternalCode
  5. Macro-benchmarks (Profiling in production)
    • larger, like end-to-end tests
    • net/http/pprof, debug muxer
    • because it's sampling, hitting 10 servers at 100Hz is the same as hitting 1 server at 1000Hz
  6. Using -base to look at differences
  7. Memory options: -inuse_space, -inuse_objects, -alloc_space, -alloc_objects
  8. Profiling in production; localhost+ssh tunnels, auth headers, using curl
  9. How to read flame graphs

Tracer

Look at some more interesting/advanced tooling

Appendix: Implementing Research Papers

Tips for implementing papers: (For algorithm read also data structure)

  • Don't. Start with the obvious solution and reasonable data structures.

"Modern" algorithms tend to have lower theoretical complexities but high constant factors and lots of implementation complexity. One of the classic examples of this is Fibonacci heaps. They're notoriously difficult to get right and have a huge constant factor. There has been a number of papers published comparing different heap implementations on different workloads, and in general the 4- or 8-ary implicit heaps consistently come out on top. And even in the cases where Fibonacci heap should be faster (due to O(1) "decrease-key"), experiments with Dijkstra's depth-first search algorithm show it's faster when they use the straight heap removal and addition.

Similarly, treaps or skiplists vs. the more complex red-black or AVL trees. On modern hardware, the "slower" algorithm may be fast enough, or even faster.

The fastest algorithm can frequently be replaced by one that is almost as fast and much easier to understand.

-- Douglas W. Jones, University of Iowa

and

Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy.

Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures. -- "Notes on C Programming" (Rob Pike, 1989)

The added complexity has to be enough that the payoff is actually worth it. Another example is cache eviction algorithms. Different algorithms can have much higher complexity for only a small improvement in hit ratio. Of course, you may not be able to test this until you have a working implementation and have integrated it into your program.

Sometimes the paper will have graphs, but much like the trend towards publishing only positive results, these will tend to be skewed in favour of showing how good the new algorithm is.

  • Choose the right paper.
  • Look for the paper their algorithm claims to beat and implement that.

Frequently, earlier papers will be easier to understand and necessarily have simpler algorithms.

Not all papers are good.

Look at the context the paper was written in. Determine assumptions about the hardware: disk space, memory usage, etc. Some older papers make different tradeoffs that were reasonable in the 70s or 80s but don't necessarily apply to your use case. For example, what they determine to be "reasonable" memory vs. disk usage tradeoffs. Memory sizes are now orders of magnitude larger, and SSDs have altered the latency penalty for using disk. Similarly, some streaming algorithms are designed for router hardware, which can make it a pain to translate into software.

Make sure the assumptions the algorithm makes about your data hold.

This will take some digging. You probably don't want to implement the first paper you find.

Make sure you understand the algorithm. This sounds obvious, but it will be impossible to debug otherwise.

https://blizzard.cs.uwaterloo.ca/keshav/home/Papers/data/07/paper-reading.pdf

A good understanding may allow you to extract the key idea from the paper and possibly apply just that to your problem, which may be simpler than reimplementing the entire thing.

The original paper for a data structure or algorithm isn't always the best. Later papers may have better explanations.

Some papers release reference source code which you can compare against, but

  1. academic code is almost universally terrible
  2. beware licensing restrictions ("research purposes only")
  3. beware bugs; edge cases, error checking, performance etc.

Other resources on this topic:

Contributing

This is a work-in-progress book on Go performance.

There are different ways to contribute:

  1. add to or summarize the resources in TODO
  2. add bullet points or new topics to be covered
  3. write prose and flesh out the sections in the book

Eventually sample programs to optimize and exercises will be needed (maybe).

Coordination will be done in the #performance channel on the Gophers slack.

Multiple Language Versions

Author: dgryski
Source Code: https://github.com/dgryski/go-perfbook/ 
License: 

#go #golang #performance 

Go-perfbook: Thoughts on Go Performance Optimization

Go-critic: The Most Opinionated Go Source Code Linter for Code Audit

go-critic

Highly extensible Go source code linter providing checks currently missing from other linters.

Logo

There is never too much static code analysis. Try it out.

Features

  • Almost 100 diagnostics that check for bugs, performance and style issues
  • Extensible without re-compilation with dynamic rules
  • Includes #opinionated checks with very strict and specific requirements
  • Self-documented: gocritic doc <checkname> gives a checker description

Installation

For most users, using go-critic under golangci-lint is enough.

Precompiled go-critic binaries can be found on the releases page.

Instructions below show how to build go-critic from sources.

GO111MODULE=on go get -v -u github.com/go-critic/go-critic/cmd/gocritic

If the command above does not work, you can try cloning this repository under your GOPATH and run make gocritic.

On macOS, you can also install go-critic using MacPorts: sudo port install go-critic

Usage

Be sure the gocritic executable is in your $PATH.

Usage: gocritic [sub-command] [sub-command args...]

Run gocritic without arguments to get help output.

Supported sub-commands:
    check - run linter over specified targets
        $ gocritic check -help
        $ gocritic check -v -enable='paramTypeCombine,unslice' strings bytes
        $ gocritic check -v -enable='#diagnostic' -disable='#experimental,#opinionated' ./...
    version - print linter version
        $ gocritic version
    doc - get installed checkers documentation
        $ gocritic doc -help
        $ gocritic doc
        $ gocritic doc checkerName

check sub-command examples:

# Runs all stable checkers on `fmt` package:
gocritic check fmt

# Run all stable checkers on `pkg1` and `pkg2`
gocritic check pkg1 pkg2

# Run all stable checkers on `fmt` package and configure rangeExprCopy checker
gocritic check -@rangeExprCopy.sizeThreshold 128 fmt

# Runs specified checkers on `fmt` package:
gocritic check -enable elseif,paramName fmt

# Run all stable checkers on current dir and all its children recursively:
gocritic check ./...

# Like above, but without `appendAssign` check:
gocritic check -disable=appendAssign ./...

# Run all stable checkers on `foo.go` file:
gocritic check foo.go

# Run stable diagnostics over `strings` package:
gocritic check -enable='#diagnostic' -disable='#experimental' strings

# Run all stable and non-opinionated checks:
gocritic check -enableAll -disable='#experimental,#opinionated' ./src/...

To get a list of available checker parameters, run gocritic doc <checkerName>.

In place of a single checker name, a tag can be used. A tag is a named group of checkers.

Tags:

  • #diagnostic - kind of checks that detect various errors in code
  • #style - kind of checks that find style issues in code
  • #performance - kind of checks that detect potential performance issues in code
  • #experimental - check is under testing and development. Disabled by default
  • #opinionated - check can be unwanted for some people. Disabled by default
  • #security - kind of checks that find security issues in code. Disabled by default and empty, so will fail if enabled.

Documentation

The latest documentation is available at go-critic.com.

Contributing

This project aims to be contribution-friendly.

Our chats: English or Russian (Telegram website)

We're using an optimistic merging strategy most of the time. In short, this means that if your contribution has some flaws, we can still merge it and then fix them by ourselves. Experimental and work-in-progress checkers are isolated, so nothing bad will happen.

Code style is the same as in Go project, see CodeReviewComments.

See CONTRIBUTING.md for more details. It also describes how to develop a new checker for the linter.

Author: Go-critic
Source Code: https://github.com/go-critic/go-critic 
License: MIT license

#go #golang #linter 

Oral Brekke

Piscina: A Fast, Efficient Node.js Worker Thread Pool Implementation

piscina - the node.js worker pool

  • ✔ Fast communication between threads
  • ✔ Covers both fixed-task and variable-task scenarios
  • ✔ Supports flexible pool sizes
  • ✔ Proper async tracking integration
  • ✔ Tracking statistics for run and wait times
  • ✔ Cancellation Support
  • ✔ Supports enforcing memory resource limits
  • ✔ Supports CommonJS, ESM, and TypeScript
  • ✔ Custom task queues
  • ✔ Optional CPU scheduling priorities on Linux

Written in TypeScript.

For Node.js 12.x and higher.

Piscina API

Example

In main.js:

const path = require('path');
const Piscina = require('piscina');

const piscina = new Piscina({
  filename: path.resolve(__dirname, 'worker.js')
});

(async function() {
  const result = await piscina.runTask({ a: 4, b: 6 });
  console.log(result);  // Prints 10
})();

In worker.js:

module.exports = ({ a, b }) => {
  return a + b;
};

The worker may also be an async function or may return a Promise:

const { promisify } = require('util');
const sleep = promisify(setTimeout);

module.exports = async ({ a, b }) => {
  // Fake some async activity
  await sleep(100);
  return a + b;
};

ESM is also supported for both Piscina and workers:

import { Piscina } from 'piscina';

const piscina = new Piscina({
  // The URL must be a file:// URL
  filename: new URL('./worker.mjs', import.meta.url).href
});

(async function () {
  const result = await piscina.runTask({ a: 4, b: 6 });
  console.log(result); // Prints 10
})();

In worker.mjs:

export default ({ a, b }) => {
  return a + b;
};

Cancelable Tasks

Submitted tasks may be canceled using either an AbortController or an EventEmitter:

'use strict';

const Piscina = require('piscina');
const { AbortController } = require('abort-controller');
const { resolve } = require('path');

const piscina = new Piscina({
  filename: resolve(__dirname, 'worker.js')
});

(async function() {
  const abortController = new AbortController();
  try {
    const task = piscina.runTask({ a: 4, b: 6 }, abortController.signal);
    abortController.abort();
    await task;
  } catch (err) {
    console.log('The task was canceled');
  }
})();

To use AbortController, you will need to npm i abort-controller (or yarn add abort-controller).

Alternatively, any EventEmitter that emits an 'abort' event may be used as an abort controller:

'use strict';

const Piscina = require('piscina');
const EventEmitter = require('events');
const { resolve } = require('path');

const piscina = new Piscina({
  filename: resolve(__dirname, 'worker.js')
});

(async function() {
  const ee = new EventEmitter();
  try {
    const task = piscina.runTask({ a: 4, b: 6 }, ee);
    ee.emit('abort');
    await task;
  } catch (err) {
    console.log('The task was canceled');
  }
})();

Delaying Availability of Workers

A worker thread will not be made available to process tasks until Piscina determines that it is "ready". By default, a worker is ready as soon as Piscina loads it and acquires a reference to the exported handler function.

There may be times when the availability of a worker may need to be delayed longer while the worker initializes any resources it may need to operate. To support this case, the worker module may export a Promise that resolves the handler function as opposed to exporting the function directly:

async function initialize() {
  await someAsyncInitializationActivity();
  return ({ a, b }) => a + b;
}

module.exports = initialize();

Piscina will await the resolution of the exported Promise before marking the worker thread available.

Backpressure

When the maxQueue option is set, once the Piscina queue is full, no additional tasks may be submitted until the queue size falls below the limit. The 'drain' event may be used to receive notification when the queue is empty and all tasks have been submitted to workers for processing.

Example: Using a Node.js stream to feed a Piscina worker pool:

'use strict';

const { resolve } = require('path');
const Pool = require('piscina');

const pool = new Pool({
  filename: resolve(__dirname, 'worker.js'),
  maxQueue: 'auto'
});

const stream = getStreamSomehow();
stream.setEncoding('utf8');

let counter = 0;

pool.on('drain', () => {
  if (stream.isPaused()) {
    console.log('resuming...', counter, pool.queueSize);
    stream.resume();
  }
});

stream
  .on('data', (data) => {
    counter++;
    pool.runTask(data);
    if (pool.queueSize === pool.options.maxQueue) {
      console.log('pausing...', counter, pool.queueSize);
      stream.pause();
    }
  })
  .on('error', console.error)
  .on('end', () => {
    console.log('done');
  });

Additional Examples

Additional examples can be found in the GitHub repo at https://github.com/jasnell/piscina/tree/master/examples

Class: Piscina

Piscina works by creating a pool of Node.js Worker Threads to which one or more tasks may be dispatched. Each worker thread executes a single exported function defined in a separate file. Whenever a task is dispatched to a worker, the worker invokes the exported function and reports the return value back to Piscina when the function completes.

This class extends EventEmitter from Node.js.

Constructor: new Piscina([options])

  • The following optional configuration is supported:
    • filename: (string | null) Provides the default source for the code that runs the tasks on Worker threads. This should be an absolute path or an absolute file:// URL to a file that exports a JavaScript function or async function as its default export or module.exports. ES modules are supported.
    • minThreads: (number) Sets the minimum number of threads that are always running for this thread pool. The default is based on the number of available CPUs.
    • maxThreads: (number) Sets the maximum number of threads that are running for this thread pool. The default is based on the number of available CPUs.
    • idleTimeout: (number) A timeout in milliseconds that specifies how long a Worker is allowed to be idle, i.e. not handling any tasks, before it is shut down. By default, this is immediate.
    • maxQueue: (number | string) The maximum number of tasks that may be scheduled to run, but not yet running due to lack of available threads, at a given time. By default, there is no limit. The special value 'auto' may be used to have Piscina calculate the maximum as the square of maxThreads. When 'auto' is used, the calculated maxQueue value may be found by checking the options.maxQueue property.
    • concurrentTasksPerWorker: (number) Specifies how many tasks can share a single Worker thread simultaneously. The default is 1. This generally only makes sense to specify if there is some kind of asynchronous component to the task. Keep in mind that Worker threads are generally not built for handling I/O in parallel.
    • useAtomics: (boolean) Use the Atomics API for faster communication between threads. This is on by default.
    • resourceLimits: (object) See Node.js new Worker options
      • maxOldGenerationSizeMb: (number) The maximum size of each worker thread's main heap in MB.
      • maxYoungGenerationSizeMb: (number) The maximum size of a heap space for recently created objects.
      • codeRangeSizeMb: (number) The size of a pre-allocated memory range used for generated code.
    • env: (object) If set, specifies the initial value of process.env inside the worker threads. See Node.js new Worker options for details.
    • argv: (any[]) List of arguments that will be stringified and appended to process.argv in the worker. See Node.js new Worker options for details.
    • execArgv: (string[]) List of Node.js CLI options passed to the worker. See Node.js new Worker options for details.
    • workerData: (any) Any JavaScript value that can be cloned and made available as require('piscina').workerData. See Node.js new Worker options for details. Unlike regular Node.js Worker Threads, workerData must not specify any value requiring a transferList. This is because the workerData will be cloned for each pooled worker.
    • taskQueue: (TaskQueue) By default, Piscina uses a first-in-first-out queue for submitted tasks. The taskQueue option can be used to provide an alternative implementation. See Custom Task Queues for additional detail.
    • niceIncrement: (number) An optional value that decreases priority for the individual threads, i.e. the higher the value, the lower the priority of the Worker threads. This value is only used on Linux and requires the optional nice-napi module to be installed. See nice(2) for more details.

Use caution when setting resource limits. Setting limits that are too low may result in the Piscina worker threads being unusable.

Method: runTask(task[, transferList][, filename][, abortSignal])

Schedules a task to be run on a Worker thread.

  • task: Any value. This will be passed to the function that is exported from filename.
  • transferList: An optional list of objects that is passed to postMessage() when posting the task to the Worker; these objects are transferred rather than cloned.
  • filename: Optionally overrides the filename option passed to the constructor for this task. If no filename was specified to the constructor, this is mandatory.
  • abortSignal: An AbortSignal instance. If passed, this can be used to cancel a task. If the task is already running, the corresponding Worker thread will be stopped. (More generally, any EventEmitter or EventTarget that emits 'abort' events can be passed here.) Abortable tasks cannot share threads regardless of the concurrentTasksPerWorker option.

This returns a Promise for the return value of the (async) function call made to the function exported from filename. If the (async) function throws an error, the returned Promise will be rejected with that error. If the task is aborted, the returned Promise is rejected with an error as well.

Method: destroy()

Stops all Workers and rejects all Promises for pending tasks.

This returns a Promise that is fulfilled once all threads have stopped.

Event: 'error'

An 'error' event is emitted by instances of this class when:

  • Uncaught exceptions occur inside Worker threads that do not currently handle tasks.
  • Unexpected messages are sent from Worker threads.

All other errors are reported by rejecting the Promise returned from runTask(), including rejections reported by the handler function itself.

Event: 'drain'

A 'drain' event is emitted whenever the queueSize reaches 0.

Property: completed (readonly)

The current number of completed tasks.

Property: duration (readonly)

The length of time (in milliseconds) since this Piscina instance was created.

Property: options (readonly)

A copy of the options that are currently being used by this instance. This object has the same properties as the options object passed to the constructor.

Property: runTime (readonly)

A histogram summary object summarizing the collected run times of completed tasks. All values are expressed in milliseconds.

  • runTime.average {number} The average run time of all tasks
  • runTime.mean {number} The mean run time of all tasks
  • runTime.stddev {number} The standard deviation of collected run times
  • runTime.min {number} The fastest recorded run time
  • runTime.max {number} The slowest recorded run time

All properties following the pattern p{N} where N is a number (e.g. p1, p99) represent the percentile distributions of run time observations. For example, p99 is the 99th percentile indicating that 99% of the observed run times were faster or equal to the given value.

{
  average: 1880.25,
  mean: 1880.25,
  stddev: 1.93,
  min: 1877,
  max: 1882.0190887451172,
  p0_001: 1877,
  p0_01: 1877,
  p0_1: 1877,
  p1: 1877,
  p2_5: 1877,
  p10: 1877,
  p25: 1877,
  p50: 1881,
  p75: 1881,
  p90: 1882,
  p97_5: 1882,
  p99: 1882,
  p99_9: 1882,
  p99_99: 1882,
  p99_999: 1882
}

Property: threads (readonly)

An Array of the Worker instances used by this pool.

Property: queueSize (readonly)

The current number of tasks waiting to be assigned to a Worker thread.

Property: utilization (readonly)

A point-in-time ratio comparing the approximate total mean run time of completed tasks to the total runtime capacity of the pool.

A pool's runtime capacity is determined by multiplying the duration by the options.maxThreads count. This provides an absolute theoretical maximum aggregate compute time that the pool would be capable of.

The approximate total mean run time is determined by multiplying the mean run time of all completed tasks by the total number of completed tasks. This number represents the approximate amount of time the pool has been actively processing tasks.

The utilization is then calculated by dividing the approximate total mean run time by the capacity, yielding a fraction between 0 and 1.
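As an illustration of that calculation (the figures below are made up for the example):

```javascript
// Hypothetical figures for a pool that has been alive for 10 seconds.
const completed = 100;       // tasks finished so far
const meanRunTime = 50;      // ms, as reported by pool.runTime.mean
const duration = 10000;      // ms, as reported by pool.duration
const maxThreads = 4;        // from pool.options.maxThreads

const capacity = duration * maxThreads;        // 40000 ms of compute time
const totalRunTime = meanRunTime * completed;  // 5000 ms actually used
const utilization = totalRunTime / capacity;
console.log(utilization); // 0.125
```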

Property: waitTime (readonly)

A histogram summary object summarizing the collected times tasks spent waiting in the queue. All values are expressed in milliseconds.

  • waitTime.average {number} The average wait time of all tasks
  • waitTime.mean {number} The mean wait time of all tasks
  • waitTime.stddev {number} The standard deviation of collected wait times
  • waitTime.min {number} The fastest recorded wait time
  • waitTime.max {number} The longest recorded wait time

All properties following the pattern p{N} where N is a number (e.g. p1, p99) represent the percentile distributions of wait time observations. For example, p99 is the 99th percentile indicating that 99% of the observed wait times were faster or equal to the given value.

{
  average: 1880.25,
  mean: 1880.25,
  stddev: 1.93,
  min: 1877,
  max: 1882.0190887451172,
  p0_001: 1877,
  p0_01: 1877,
  p0_1: 1877,
  p1: 1877,
  p2_5: 1877,
  p10: 1877,
  p25: 1877,
  p50: 1881,
  p75: 1881,
  p90: 1882,
  p97_5: 1882,
  p99: 1882,
  p99_9: 1882,
  p99_99: 1882,
  p99_999: 1882
}

Static property: isWorkerThread (readonly)

Is true if this code runs inside a Piscina threadpool as a Worker.

Static property: version (readonly)

Provides the current version of this library as a semver string.

Static method: move(value)

By default, any value returned by a worker function will be cloned when returned back to the Piscina pool, even if that object is capable of being transferred. The Piscina.move() method can be used to wrap and mark transferable values so that they will be transferred rather than cloned.

The value may be any object supported by Node.js as transferable (e.g. ArrayBuffer, any TypedArray, or MessagePort), or any object implementing the Transferable interface.

const { move } = require('piscina');

module.exports = () => {
  return move(new ArrayBuffer(10));
}

The move() method will throw if the value is not transferable.

The object returned by the move() method should not be set as a nested value in an object. If it is, the move() wrapper itself will be cloned as opposed to transferring the object it wraps.

Interface: Transferable

Objects may implement the Transferable interface to create their own custom transferable objects. This is useful when an object being passed into or from a worker contains a deeply nested transferable object such as an ArrayBuffer or MessagePort.

Transferable objects expose two properties inspected by Piscina to determine how to transfer the object. These properties are named using the special static Piscina.transferableSymbol and Piscina.valueSymbol properties:

The Piscina.transferableSymbol property provides the object (or objects) that are to be included in the transferList.

The Piscina.valueSymbol property provides a surrogate value to transmit in place of the Transferable itself.

Both properties are required.

For example,

const {
  move,
  transferableSymbol,
  valueSymbol
} = require('piscina');

module.exports = () => {
  const obj = {
    a: { b: new Uint8Array(5) },
    c: new Uint8Array(10),

    get [transferableSymbol]() {
      // Transfer the two underlying ArrayBuffers
      return [this.a.b.buffer, this.c.buffer];
    },

    get [valueSymbol]() {
      return { a: { b: this.a.b }, c: this.c };
    }
  };
  return move(obj);
};

Custom Task Queues

By default, Piscina uses a simple array-based first-in-first-out (FIFO) task queue. When a new task is submitted and there are no available workers, tasks are pushed onto the queue until a worker becomes available.

If the default FIFO queue is not sufficient, user code may replace the task queue implementation with a custom implementation using the taskQueue option on the Piscina constructor.

Custom task queue objects must implement the TaskQueue interface, described below using TypeScript syntax:

interface Task {
  readonly [Piscina.queueOptionsSymbol] : object | null;
}

interface TaskQueue {
  readonly size : number;
  shift () : Task | null;
  remove (task : Task) : void;
  push (task : Task) : void;
}

An example of a custom task queue that uses a shuffled priority queue is available in examples/task-queue.

The special symbol Piscina.queueOptionsSymbol may be set as a property on tasks submitted to runTask() as a way of passing additional options on to the custom TaskQueue implementation. (Note that because the queue options are set as a property on the task, tasks with queue options cannot be submitted as JavaScript primitives).
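For reference, the default behavior can be reproduced with a small array-backed class satisfying the interface above. This is a sketch: an instance would be passed via the taskQueue constructor option, and per-task options (if needed) would be read from Piscina.queueOptionsSymbol inside push().

```javascript
// A minimal array-backed FIFO queue matching the TaskQueue interface.
class SimpleFifoQueue {
  constructor() {
    this.tasks = [];
  }

  get size() {
    return this.tasks.length;
  }

  push(task) {
    // Per-task options, if any, live at task[Piscina.queueOptionsSymbol].
    this.tasks.push(task);
  }

  shift() {
    return this.tasks.length > 0 ? this.tasks.shift() : null;
  }

  remove(task) {
    const index = this.tasks.indexOf(task);
    if (index !== -1) this.tasks.splice(index, 1);
  }
}
```

Usage would look like new Piscina({ filename, taskQueue: new SimpleFifoQueue() }).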

Current Limitations (Things we're working on / would love help with)

  • Improved Documentation
  • More examples
  • Benchmarks

Performance Notes

Workers are generally optimized for offloading synchronous, compute-intensive operations from the main Node.js event loop thread. While it is possible to perform asynchronous operations and I/O within a Worker, the performance advantages of doing so will be minimal.

Specifically, it is worth noting that asynchronous operations within Node.js, including I/O such as file system operations and CPU-bound tasks such as crypto operations or compression algorithms, are already performed in parallel by Node.js and libuv on a per-process level. This means there will be little performance benefit to moving such async operations into a Piscina worker (see examples/scrypt for an example).

Queue Size

Piscina provides the ability to configure the minimum and maximum number of worker threads active in the pool, as well as to set limits on the number of tasks that may be queued up waiting for a free worker. It is important to note that setting the maxQueue size too high relative to the number of worker threads can have a detrimental impact on performance and memory usage. Setting the maxQueue size too small can also be problematic, as doing so could cause your worker threads to become idle and be shut down. Our testing has shown that a maxQueue size of approximately the square of the maximum number of threads is generally sufficient and performs well for many cases, but this will vary significantly depending on your workload. It will be important to test and benchmark your worker pools to ensure you've effectively balanced queue wait times, memory usage, and worker pool utilization.

Queue Pressure and Idle Threads

The thread pool maintained by Piscina has both a minimum and maximum limit to the number of threads that may be created. When a Piscina instance is created, it will spawn the minimum number of threads immediately, then create additional threads as needed up to the limit set by maxThreads. Whenever a worker completes a task, a check is made to determine if there is additional work for it to perform. If there is no additional work, the thread is marked idle. By default, idle threads are shut down immediately, with Piscina ensuring that the pool always maintains at least the minimum number of threads.

When a Piscina pool is processing a stream of tasks (for instance, processing http server requests as in the React server-side rendering example in examples/react-ssr), if the rate at which new tasks are received and queued is not sufficient to keep workers from going idle and terminating, the pool can experience a thrashing effect -- excessively creating and terminating workers in a way that causes a net performance loss. There are a couple of strategies to avoid this churn:

Strategy 1: Ensure that the queue rate of new tasks is sufficient to keep workers from going idle. We refer to this as "queue pressure". If the queue pressure is too low, workers will go idle and terminate. If the queue pressure is too high, tasks will stack up, experience increased wait latency, and consume additional memory.

Strategy 2: Increase the idleTimeout configuration option. By default, idle threads terminate immediately. The idleTimeout option can be used to specify a longer period of time to wait for additional tasks to be submitted before terminating the worker. If the queue pressure is not maintained, this could result in workers sitting idle but those will have less of a performance impact than the thrashing that occurs when threads are repeatedly terminated and recreated.

Strategy 3: Increase the minThreads configuration option. This has the same basic effect as increasing the idleTimeout. If the queue pressure is not high enough, workers may sit idle indefinitely but there will be less of a performance hit.

In applications using Piscina, it will be most effective to use a combination of these three approaches, tuning the various configuration parameters to find the optimal combination for both the application workload and the capabilities of the deployment environment. No single set of options will work best for every case.

Thread priority on Linux systems

On Linux systems that support nice(2), Piscina is capable of setting the priority of every worker in the pool. To use this mechanism, an additional optional native addon dependency (nice-napi, npm i nice-napi) is required. Once nice-napi is installed, creating a Piscina instance with the niceIncrement configuration option will set the priority for the pool:

const Piscina = require('piscina');
const pool = new Piscina({
  filename: '/absolute/path/to/worker.js',
  niceIncrement: 20
});

The higher the niceIncrement, the lower the CPU scheduling priority will be for the pooled workers which will generally extend the execution time of CPU-bound tasks but will help prevent those threads from stealing CPU time from the main Node.js event loop thread. Whether this is a good thing or not depends entirely on your application and will require careful profiling to get correct.

The key metrics to pay attention to when tuning the niceIncrement are the sampled run times of the tasks in the worker pool (using the runTime property) and the delay of the Node.js main thread event loop.

Multiple Thread Pools and Embedding Piscina as a Dependency

Every Piscina instance creates a separate pool of threads that operates without any awareness of the others. When multiple pools are created in a single application, the various threads may contend with one another and with the Node.js main event loop thread, potentially causing an overall reduction in system performance.

Modules that embed Piscina as a dependency should make it clear via documentation that threads are being used. Ideally, such modules should also allow users to provide an existing Piscina instance as a configuration option in lieu of always creating their own.

Release Notes

1.6.1

  • Bug fix: Reject if AbortSignal is already aborted
  • Bug Fix: Use once listener for abort event

1.6.0

  • Add the niceIncrement configuration parameter.

1.5.1

  • Bug fixes around abortable task selection.

1.5.0

  • Added Piscina.move()
  • Added Custom Task Queues
  • Added utilization metric
  • Wait for workers to be ready before considering them as candidates
  • Additional examples

1.4.0

  • Added maxQueue = 'auto' to autocalculate the maximum queue size.
  • Added more examples, including an example of implementing a worker as a Node.js native addon.

1.3.0

  • Added the 'drain' event

1.2.0

  • Added support for ESM and file:// URLs
  • Added env, argv, execArgv, and workerData options
  • More examples

1.1.0

  • Added support for Worker Thread resourceLimits

1.0.0

  • Initial release!

The Team

Acknowledgements

Piscina development is sponsored by NearForm Research.

Author: Piscinajs
Source Code: https://github.com/piscinajs/piscina 
License: View license

#node #javascript #typescript #performance 

Piscina: A Fast, Efficient Node.js Worker Thread Pool Implementation
Annie  Emard

Annie Emard

1650597240

Go Critic: The Most Opinionated Go Source Code Linter for Code Audit

go-critic

Highly extensible Go source code linter providing checks currently missing from other linters.

There is never too much static code analysis. Try it out.

Features

  • Almost 100 diagnostics that check for bugs, performance and style issues
  • Extensible without re-compilation with dynamic rules
  • Includes #opinionated checks with very strict and specific requirements
  • Self-documented: gocritic doc <checkname> gives a checker description

Documentation

The latest documentation is available at go-critic.com.

Installation

For most users, using go-critic under golangci-lint is enough.

Precompiled go-critic binaries can be found on the releases page.

Instructions below show how to build go-critic from sources.

GO111MODULE=on go get -v -u github.com/go-critic/go-critic/cmd/gocritic

If the command above does not work, you can try cloning this repository under your GOPATH and run make gocritic.

On macOS, you can also install go-critic using MacPorts: sudo port install go-critic

Usage

Be sure gocritic executable is under your $PATH.

Usage: gocritic [sub-command] [sub-command args...]. Run gocritic without arguments to get help output.

Supported sub-commands:
    check - run linter over specified targets
        $ gocritic check -help
        $ gocritic check -v -enable='paramTypeCombine,unslice' strings bytes
        $ gocritic check -v -enable='#diagnostic' -disable='#experimental,#opinionated' ./...
    version - print linter version
        $ gocritic version
    doc - get installed checkers documentation
        $ gocritic doc -help
        $ gocritic doc
        $ gocritic doc checkerName

check sub-command examples:

# Runs all stable checkers on `fmt` package:
gocritic check fmt

# Run all stable checkers on `pkg1` and `pkg2`
gocritic check pkg1 pkg2

# Run all stable checkers on `fmt` package and configure rangeExprCopy checker
gocritic check -@rangeExprCopy.sizeThreshold 128 fmt

# Runs specified checkers on `fmt` package:
gocritic check -enable elseif,paramName fmt

# Run all stable checkers on current dir and all its children recursively:
gocritic check ./...

# Like above, but without `appendAssign` check:
gocritic check -disable=appendAssign ./...

# Run all stable checkers on `foo.go` file:
gocritic check foo.go

# Run stable diagnostics over `strings` package:
gocritic check -enable='#diagnostic' -disable='#experimental' strings

# Run all stable and non-opinionated checks:
gocritic check -enableAll -disable='#experimental,#opinionated' ./src/...

To get a list of available checker parameters, run gocritic doc <checkerName>.

In place of a single checker name, a tag can be used. A tag is a named group of checkers.

Tags:

  • #diagnostic - kind of checks that detect various errors in code
  • #style - kind of checks that find style issues in code
  • #performance - kind of checks that detect potential performance issues in code
  • #experimental - check is under testing and development. Disabled by default
  • #opinionated - check can be unwanted for some people. Disabled by default
  • #security - kind of checks that find security issues in code. Disabled by default; this group is currently empty, so enabling it will fail.

Contributing

This project aims to be contribution-friendly.

Our chats: English or Russian (Telegram website)

We're using an optimistic merging strategy most of the time. In short, this means that if your contribution has some flaws, we can still merge it and then fix them by ourselves. Experimental and work-in-progress checkers are isolated, so nothing bad will happen.

Code style is the same as in Go project, see CodeReviewComments.

See CONTRIBUTING.md for more details. It also describes how to develop a new checker for the linter.

Author: go-critic
Source Code: https://github.com/go-critic/go-critic
License: MIT License

#go 

Go Critic: The Most Opinionated Go Source Code Linter for Code Audit

How to Monitor App Performance with Instabug for iOS

In this video we will take a look at the performance monitoring platform – Instabug. Instabug offers a comprehensive suite of tools to monitor, diagnose, and address bugs/crashes/other app related interactions. You can even monitor app launch performance, usage statistics, network traffic, and more.

💻 Source Code: https://patreon.com/iOSAcademy

#swift  #performance  #iosdevelopers #ios 

How to Monitor App Performance with Instabug for iOS

WebdriverIO: Automated Testing Based on The WebDriver Protocol

 Next-gen browser and mobile automation test framework for Node.js.

WebdriverIO is a test automation framework that allows you to run tests based on the Webdriver protocol and Appium automation technology. It provides support for your favorite BDD/TDD test framework and will run your tests locally or in the cloud using Sauce Labs, BrowserStack, TestingBot or LambdaTest.

👩‍💻 👨‍💻 Contributing

You like WebdriverIO and want to help make it better? Awesome! Have a look into our Contributor Documentation to get started with setting up the repo.

If you're looking for issues to help out with, check out the issues labelled "good first pick". You can also reach out in our Gitter Channel if you have questions about where to start contributing.

🏢 WebdriverIO for Enterprise

Available as part of the Tidelift Subscription.

The maintainers of WebdriverIO and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source dependencies you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact dependencies you use. Learn more.

📦 Packages

This repository contains some of the core packages of the WebdriverIO project. There are many wonderful curated resources the WebdriverIO community has put together.

Did you build a WebdriverIO service or reporter? That's awesome! Please add it to our configuration wizard and docs (e.g. like in this example commit) as well as to our awesome-webdriverio list. Thank you! 🙏 ❤️

Core

  • webdriver - A Node.js bindings implementation for the W3C WebDriver and Mobile JSONWire Protocol
  • devtools - A Chrome DevTools protocol binding that maps WebDriver commands into Chrome DevTools commands using Puppeteer
  • webdriverio - Next-gen browser and mobile automation test framework for Node.js
  • @wdio/cli - A WebdriverIO testrunner command line interface

Helper

  • @wdio/config - A helper utility to parse and validate WebdriverIO options
  • @wdio/logger - A helper utility for logging of WebdriverIO packages
  • @wdio/protocols - Utility package providing information about automation protocols
  • @wdio/repl - A WDIO helper utility to provide a repl interface for WebdriverIO
  • @wdio/reporter - A WebdriverIO utility to help reporting all events
  • @wdio/runner - A WebdriverIO service that runs tests in arbitrary environments
  • @wdio/sync - A WebdriverIO plugin. Helper module to run WebdriverIO commands synchronously
  • @wdio/utils - A WDIO helper utility to provide several utility functions used across the project

Reporter

Services

Runner

Framework Adapters

Others

🤝 Project Governance

This project is maintained by awesome people following a common set of rules and treating each other with respect and appreciation.

👨‍🍳 👩‍🍳 Backers

Become a backer and show your support to our open source project.

💸 Sponsors

Does your company use WebdriverIO? Ask your manager or marketing team if your company would be interested in supporting our project. Support will allow the maintainers to dedicate more time for maintenance and new features for everyone. Also, your company's logo will show on GitHub - who doesn't want a little extra exposure? Here's the info.

🔰 Badge

Show the world you're using webdriver.io → tested with webdriverio

GitHub markup

[![tested with webdriver.io](https://img.shields.io/badge/tested%20with-webdriver.io-%23ea5906)](https://webdriver.io/)

HTML

<a href="https://webdriver.io/">
    <img alt="WebdriverIO" src="https://img.shields.io/badge/tested%20with-webdriver.io-%23ea5906">
</a>

Author: Webdriverio
Source Code: https://github.com/webdriverio/webdriverio 
License: MIT License

#node #javascript #automation #performance 

WebdriverIO: Automated Testing Based on The WebDriver Protocol
Desmond  Gerber

Desmond Gerber

1645338900

Sharp: The Fastest Module for Resizing JPEG, PNG, WebP and TIFF Images

sharp

The typical use case for this high speed Node.js module is to convert large images in common formats to smaller, web-friendly JPEG, PNG, WebP and AVIF images of varying dimensions.

Resizing an image is typically 4x-5x faster than using the quickest ImageMagick and GraphicsMagick settings due to its use of libvips.

Colour spaces, embedded ICC profiles and alpha transparency channels are all handled correctly. Lanczos resampling ensures quality is not sacrificed for speed.

As well as image resizing, operations such as rotation, extraction, compositing and gamma correction are available.

Most modern macOS, Windows and Linux systems running Node.js >= 12.13.0 do not require any additional install or runtime dependencies.

Documentation

Visit sharp.pixelplumbing.com for complete installation instructions, API documentation, benchmark tests and changelog.

Examples

npm install sharp
const sharp = require('sharp');

Callback

sharp(inputBuffer)
  .resize(320, 240)
  .toFile('output.webp', (err, info) => { ... });

Promise

sharp('input.jpg')
  .rotate()
  .resize(200)
  .jpeg({ mozjpeg: true })
  .toBuffer()
  .then( data => { ... })
  .catch( err => { ... });

Async/await

const semiTransparentRedPng = await sharp({
  create: {
    width: 48,
    height: 48,
    channels: 4,
    background: { r: 255, g: 0, b: 0, alpha: 0.5 }
  }
})
  .png()
  .toBuffer();

Stream

const roundedCorners = Buffer.from(
  '<svg><rect x="0" y="0" width="200" height="200" rx="50" ry="50"/></svg>'
);

const roundedCornerResizer =
  sharp()
    .resize(200, 200)
    .composite([{
      input: roundedCorners,
      blend: 'dest-in'
    }])
    .png();

readableStream
  .pipe(roundedCornerResizer)
  .pipe(writableStream);

Contributing

A guide for contributors covers reporting bugs, requesting features and submitting code changes.

Author: Lovell
Source Code: https://github.com/lovell/sharp 
License: Apache-2.0 License

#node #javascript #svg #performance 

Sharp: The Fastest Module for Resizing JPEG, PNG, WebP and TIFF Images

3 Tips for Future-Ready Contact Center Performance Management

Demand for customer support continues to grow not only in scale but in complexity. It is no surprise, then, that 8 out of 10 contact centers plan to staff up over the coming year to serve increased demand and replace employees they may have lost during the pandemic. Yet most of these organizations (62%, according to Ventana Research) still use spreadsheets to track new-hire agent performance.

This second statistic is troubling because it suggests contact center leaders are stuck using legacy technology and time-consuming methods to manage their most important asset: their people. Unfortunately, these outdated practices strain resources, drive up employee turnover, and usually produce mediocre results.

The key to stopping this domino effect is to move beyond traditional performance reviews. Instead, take a holistic, forward-looking approach to contact center performance management.

1. Embrace the role of the post-COVID contact center (and the post-COVID employee)

In a post-pandemic world, it is common knowledge that a great customer experience starts with great employee engagement. But it is important to understand why.

First, not only will demand for customer support increase, but support itself will become more demanding. The pandemic will not last forever, but the past few years have produced meaningful shifts in consumer behavior that are here to stay. For example, a recent analysis of more than one million service calls found that agents are dealing with notably more anxious and frustrated customers. The same analysis revealed that calls rated "difficult" had doubled compared with the previous year.

Second, employees increasingly demand to be seen as human beings, not just cogs in a machine. According to Harvard Business Review, in 2021:

  • 88% of surveyed workers now expect location and flexibility to play a role in new job opportunities.
  • 76% believe worker priorities will shift toward lifestyle (such as family) and proximity to a physical workspace.
  • 86% want to work for a company that measures outcomes rather than output.

This shift in the agent mindset means performance management programs must consider agents at the individual level. Integrating personal goals, agent well-being, and growth opportunities will become an essential part of performance management in the near future.

It also means contact center leaders and managers must welcome open, honest discussions with agents, in addition to measuring progress against personal goals and not just business goals.

2. Recalibrate performance reviews

If we acknowledge that agents are essential to a contact center's success, performance management must play an active role in day-to-day operations. This means moving away from measuring output annually or quarterly. Even before the pandemic, only 38% of hiring managers felt that annual reviews could keep pace with business needs.

Instead, to meet the demands of the modern contact center (and the agents who work in it), managers should focus on delivering utility. In other words, performance management needs to be useful, and for good reason.

In 2020, Gartner compared a sample set of companies whose performance management was perceived as highly useful against companies where it was not. They found that employee engagement was 14% higher and workplace performance 24% higher at the high-utility companies.

To reap the benefits of high-utility performance management, metrics must play an active role in everyday contact center operations. Daily shift huddles, toolbox talks, and regular reviews ensure that these metrics have context for individual agents. One-on-one sessions with employees demonstrate concern and reinforce good habits at each stage of career development. Feedback loops provide an additional means of keeping reviews highly useful.

A science-based "Plan-Do-Check-Act" feedback loop helps customer support teams put learning into action: it lets them learn from their own mistakes and identify good ideas that can be applied elsewhere.

Contact center performance management

As customer service continues to grow more virtual and interdependent, contact center agents receive less natural feedback from colleagues and managers. A correctly implemented feedback loop provides a healthy counterweight to this problem, allowing agents to stay in lockstep through their daily work.

3. Guide agents with behavior-based science

Another benefit of this broader approach to performance management is that it creates ample training opportunities.

Most contact centers offer ongoing training at every level (beyond new-hire training). But training time is limited: on average, contact centers allocate no more than 10 days per year to ongoing training across all levels. On top of the low frequency, 37% of contact centers leave the majority of employee training to self-study.

While certainly better than no training at all, this approach fails to capture the established benefits of continuous improvement, especially when such initiatives are grounded in behavioral science. Even the best-trained agents fall back on intuition and are unconsciously guided by biases and psychological fallacies when responding to customers in real time. In these situations, broad, decontextualized annual training offers little benefit.

Research by behavioral economists, psychologists, and neuroscientists has produced a far more effective approach to employee training. Their findings support the use of subtle, strategic interventions presented to agents as they interact with customers in real time (sometimes called real-time coaching). These "nudges" are designed to guide choices without restricting them. Considering how little it costs to implement such a behavioral approach, the results can be substantial.

For example, behavioral economists in the UK partnered with Virgin Airlines to see which behavior-based approach could best reduce overall fuel costs. 335 flight captains were divided into four groups. A control group was told only that their fuel use would be monitored. Another group received monthly feedback on their fuel usage. The last two groups each received a variation of the feedback "nudging" approach described above.

At the end of the experiment, all three groups outperformed the control. But the two groups that received behavioral nudges outperformed everything else. Overall, the experiment saved £3.3 million in fuel costs, dwarfing the cost of the experiment itself.

Creating real impact

Whether in the cockpit or the contact center, it is clear that modern performance management must happen in the moment to make an impact. Fortunately, the modern era has also brought many advances in artificial intelligence (AI) that help make this possible.

That's where Cresta comes in. Expertise.AI™ takes real-time performance management and behavioral "nudging" to a whole new level.

リンク: https://cresta.com/blog/contact-center-performance-management-tips

#ai #performance 

3 Performance Management Tips for the Contact Center of the Future
Oda Shields

1637370000

Learn About Code Abstraction and Performance

In this video, we will look at code abstractions, the performance of different programming languages, and how different code choices affect performance. Specific abstractions may cost some speed, but they can give you real improvements in readability and structure.

#performance

 

Learn About Code Abstraction and Performance
Nabunya Jane

1634263200

Mobile Device App Performance Testing Guide

As of the second quarter of 2020, the number of available apps on the Google Play Store was 3.6 million, followed by 1.82 million on the Apple App Store. This represents a 6% increase for the Google Play Store and a 6.95% increase for the Apple App Store over the previous quarter. Statista also projects a rise in the number of app downloads, to more than 350 billion by 2021.

#mobile #performance 

Mobile Device App Performance Testing Guide
Ruthie Blanda

1632474000

Learn About 7 Eloquent Performance Tips

Let's finish reviewing this week's challenge about Eloquent performance, with your "tips'n'tricks" and my opinion about them.

 

00:00 Intro
00:41 Write Tests Before Refactoring
03:55 Query Builder and Raw Queries
07:08 Load Only Columns You Need
08:27 SubSelect for belongsTo
09:32 Round the Average: PHP or DB?
12:04 Caching. Kind of.
13:31 Windows Web-Server Performance
- Challenge - Your Pull Requests: https://github.com/LaravelDaily/Laravel-Challenge-Movie-Table-Eloquent/pulls

#performance #laravel 

Learn About 7 Eloquent Performance Tips

Google Performance Monitoring for Firebase

Google Performance Monitoring for Firebase

A Flutter plugin to use the Google Performance Monitoring for Firebase API.

For Flutter plugins for other Firebase products, see README.md.

Usage

To use this plugin, first connect to Firebase by following the instructions for Android / iOS. Then add this plugin by following these instructions. See the example folder for details.

You can confirm that Performance Monitoring results appear in the Firebase Performance Monitoring console. Results should appear within a few minutes.

Define a Custom Trace

A custom trace is a report of performance data associated with some of the code in your app. To learn more about custom traces, see the Performance Monitoring overview.

final Trace myTrace = FirebasePerformance.instance.newTrace("test_trace");
await myTrace.start();

final Item item = cache.fetch("item");
if (item != null) {
  await myTrace.incrementMetric("item_cache_hit", 1);
} else {
  await myTrace.incrementMetric("item_cache_miss", 1);
}

await myTrace.stop();

Add monitoring for specific network requests

Performance Monitoring collects network requests automatically. Although this includes most network requests for your app, some might not be reported. To include specific network requests in Performance Monitoring, add the following code to your app:

class _MetricHttpClient extends BaseClient {
  _MetricHttpClient(this._inner);

  final Client _inner;

  @override
  Future<StreamedResponse> send(BaseRequest request) async {
    final HttpMetric metric = FirebasePerformance.instance
        .newHttpMetric(request.url.toString(), HttpMethod.Get);

    await metric.start();

    StreamedResponse response;
    try {
      response = await _inner.send(request);
      metric
        ..responsePayloadSize = response.contentLength
        ..responseContentType = response.headers['Content-Type']
        ..requestPayloadSize = request.contentLength
        ..httpResponseCode = response.statusCode;
    } finally {
      await metric.stop();
    }

    return response;
  }
}

class _MyAppState extends State<MyApp> {
.
.
.
  Future<void> testHttpMetric() async {
    final _MetricHttpClient metricHttpClient = _MetricHttpClient(Client());

    final Request request =
        Request("SEND", Uri.parse("https://www.google.com"));

    metricHttpClient.send(request);
  }
.
.
.
}

Getting Started

See the example directory for a complete sample app using Google Performance Monitoring for Firebase.

Issues and feedback

Please file FlutterFire specific issues, bugs, or feature requests in our issue tracker.

Plugin issues that are not specific to FlutterFire can be filed in the Flutter issue tracker.

To contribute a change to this plugin, please review our contribution guide and open a pull request.

Use this package as a library

Depend on it

Run this command:

With Flutter:

 $ flutter pub add firebase_performance

This will add a line like this to your package's pubspec.yaml (and run an implicit dart pub get):


dependencies:
  firebase_performance: ^0.7.0+8

Alternatively, your editor might support flutter pub get. Check the docs for your editor to learn more.

Import it

Now in your Dart code, you can use:

import 'package:firebase_performance/firebase_performance.dart';

example/lib/main.dart

// ignore_for_file: require_trailing_commas
// Copyright 2019 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

// @dart=2.9

import 'dart:async';

import 'package:http/http.dart';
import 'package:flutter/material.dart';
import 'package:pedantic/pedantic.dart';

import 'package:firebase_performance/firebase_performance.dart';

void main() => runApp(const MyApp());

void myLog(String msg) {
  print('My Log: $msg');
}

class MyApp extends StatefulWidget {
  const MyApp({Key key}) : super(key: key);

  @override
  _MyAppState createState() => _MyAppState();
}

class _MetricHttpClient extends BaseClient {
  _MetricHttpClient(this._inner);

  final Client _inner;

  @override
  Future<StreamedResponse> send(BaseRequest request) async {
    final HttpMetric metric = FirebasePerformance.instance
        .newHttpMetric(request.url.toString(), HttpMethod.Get);

    await metric.start();

    StreamedResponse response;
    try {
      response = await _inner.send(request);
      myLog(
          'Called ${request.url} with custom monitoring, response code: ${response.statusCode}');

      metric
        ..responsePayloadSize = response.contentLength
        ..responseContentType = response.headers['Content-Type']
        ..requestPayloadSize = request.contentLength
        ..httpResponseCode = response.statusCode;

      await metric.putAttribute('score', '15');
      await metric.putAttribute('to_be_removed', 'should_not_be_logged');
    } finally {
      await metric.removeAttribute('to_be_removed');
      await metric.stop();
    }

    unawaited(metric
        .getAttributes()
        .then((attributes) => myLog('Http metric attributes: $attributes')));

    String score = metric.getAttribute('score');
    myLog('Http metric score attribute value: $score');

    return response;
  }
}

class _MyAppState extends State<MyApp> {
  FirebasePerformance _performance = FirebasePerformance.instance;
  bool _isPerformanceCollectionEnabled = false;
  String _performanceCollectionMessage =
      'Unknown status of performance collection.';
  bool _trace1HasRan = false;
  bool _trace2HasRan = false;
  bool _customHttpMetricHasRan = false;

  @override
  void initState() {
    super.initState();
    _togglePerformanceCollection();
  }

  Future<void> _togglePerformanceCollection() async {
    await _performance
        .setPerformanceCollectionEnabled(!_isPerformanceCollectionEnabled);

    final bool isEnabled = await _performance.isPerformanceCollectionEnabled();
    setState(() {
      _isPerformanceCollectionEnabled = isEnabled;
      _performanceCollectionMessage = _isPerformanceCollectionEnabled
          ? 'Performance collection is enabled.'
          : 'Performance collection is disabled.';
    });
  }

  Future<void> _testTrace1() async {
    setState(() {
      _trace1HasRan = false;
    });

    final Trace trace = _performance.newTrace('test_trace_1');
    await trace.start();
    await trace.putAttribute('favorite_color', 'blue');
    await trace.putAttribute('to_be_removed', 'should_not_be_logged');

    for (int i = 0; i < 10; i++) {
      await trace.incrementMetric('sum', i);
    }

    await trace.removeAttribute('to_be_removed');
    await trace.stop();

    unawaited(trace
        .getMetric('sum')
        .then((sumValue) => myLog('test_trace_1 sum value: $sumValue')));
    unawaited(trace
        .getAttributes()
        .then((attributes) => myLog('test_trace_1 attributes: $attributes')));

    String favoriteColor = trace.getAttribute('favorite_color');
    myLog('test_trace_1 favorite_color: $favoriteColor');

    setState(() {
      _trace1HasRan = true;
    });
  }

  Future<void> _testTrace2() async {
    setState(() {
      _trace2HasRan = false;
    });

    final Trace trace = await FirebasePerformance.startTrace('test_trace_2');

    int sum = 0;
    for (int i = 0; i < 10000000; i++) {
      sum += i;
    }
    await trace.setMetric('sum', sum);
    await trace.stop();

    unawaited(trace
        .getMetric('sum')
        .then((sumValue) => myLog('test_trace_2 sum value: $sumValue')));

    setState(() {
      _trace2HasRan = true;
    });
  }

  Future<void> _testCustomHttpMetric() async {
    setState(() {
      _customHttpMetricHasRan = false;
    });

    final _MetricHttpClient metricHttpClient = _MetricHttpClient(Client());

    final Request request = Request(
      'SEND',
      Uri.parse('https://www.google.com'),
    );

    unawaited(metricHttpClient.send(request));

    setState(() {
      _customHttpMetricHasRan = true;
    });
  }

  Future<void> _testAutomaticHttpMetric() async {
    Response response = await get(Uri.parse('https://www.facebook.com'));
    myLog('Called facebook, response code: ${response.statusCode}');
  }

  @override
  Widget build(BuildContext context) {
    const textStyle = TextStyle(color: Colors.lightGreenAccent, fontSize: 25);
    return MaterialApp(
      home: Scaffold(
        appBar: AppBar(
          title: const Text('Firebase Performance Example'),
        ),
        body: Center(
          child: Column(
            children: <Widget>[
              Text(_performanceCollectionMessage),
              ElevatedButton(
                onPressed: _togglePerformanceCollection,
                child: const Text('Toggle Data Collection'),
              ),
              ElevatedButton(
                onPressed: _testTrace1,
                child: const Text('Run Trace One'),
              ),
              Text(
                _trace1HasRan ? 'Trace Ran!' : '',
                style: textStyle,
              ),
              ElevatedButton(
                onPressed: _testTrace2,
                child: const Text('Run Trace Two'),
              ),
              Text(
                _trace2HasRan ? 'Trace Ran!' : '',
                style: textStyle,
              ),
              ElevatedButton(
                onPressed: _testCustomHttpMetric,
                child: const Text('Run Custom HttpMetric'),
              ),
              Text(
                _customHttpMetricHasRan ? 'Custom HttpMetric Ran!' : '',
                style: textStyle,
              ),
              ElevatedButton(
                onPressed: _testAutomaticHttpMetric,
                child: const Text('Run Automatic HttpMetric'),
              ),
            ],
          ),
        ),
      ),
    );
  }
}

#firebase  #performance #flutter 

Google Performance Monitoring for Firebase