BigCache: Efficient Key/Value Cache for Gigabytes of Data

BigCache

Fast, concurrent, evicting in-memory cache written to keep a large number of entries without impact on performance. BigCache keeps entries on the heap but omits GC for them. To achieve that, it operates on byte slices, so entries will need to be (de)serialized in front of the cache in most use cases.

Requires Go 1.12 or newer.

Usage

Simple initialization

import "github.com/allegro/bigcache/v3"

cache, _ := bigcache.NewBigCache(bigcache.DefaultConfig(10 * time.Minute))

cache.Set("my-unique-key", []byte("value"))

entry, _ := cache.Get("my-unique-key")
fmt.Println(string(entry))
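
Because values are stored as raw byte slices, structured data has to be serialized before Set and deserialized after Get. Below is a minimal sketch using encoding/json (add "encoding/json" to the imports); the User type and the "user:1" key are illustrative, not part of the BigCache API.

type User struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
}

payload, _ := json.Marshal(User{ID: 1, Name: "alice"})
cache.Set("user:1", payload)

if raw, err := cache.Get("user:1"); err == nil {
    var u User
    json.Unmarshal(raw, &u)
    fmt.Println(u.Name)
}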

Custom initialization

When cache load can be predicted in advance, it is better to use custom initialization because additional memory allocation can be avoided that way.

import (
    "fmt"
    "log"
    "time"

    "github.com/allegro/bigcache/v3"
)

config := bigcache.Config{
    // number of shards (must be a power of 2)
    Shards: 1024,

    // time after which an entry can be evicted
    LifeWindow: 10 * time.Minute,

    // interval between removing expired entries (clean up)
    // If set to <= 0 then no action is performed.
    // Setting it to less than 1 second is counterproductive; bigcache has a one-second resolution.
    CleanWindow: 5 * time.Minute,

    // rps * lifeWindow, used only in initial memory allocation
    MaxEntriesInWindow: 1000 * 10 * 60,

    // max entry size in bytes, used only in initial memory allocation
    MaxEntrySize: 500,

    // prints information about additional memory allocation
    Verbose: true,

    // cache will not allocate more memory than this limit, value in MB
    // if the limit is reached then the oldest entries can be overwritten by new ones
    // 0 value means no size limit
    HardMaxCacheSize: 8192,

    // callback fired when the oldest entry is removed because of its expiration time or no space left
    // for the new entry, or because delete was called. A bitmask representing the reason will be returned.
    // The default value is nil, which means no callback; it also avoids unwrapping the oldest entry.
    OnRemove: nil,

    // OnRemoveWithReason is a callback fired when the oldest entry is removed because of its expiration time or no space left
    // for the new entry, or because delete was called. A constant representing the reason will be passed through.
    // The default value is nil, which means no callback; it also avoids unwrapping the oldest entry.
    // Ignored if OnRemove is specified.
    OnRemoveWithReason: nil,
}

cache, initErr := bigcache.NewBigCache(config)
if initErr != nil {
    log.Fatal(initErr)
}

cache.Set("my-unique-key", []byte("value"))

if entry, err := cache.Get("my-unique-key"); err == nil {
    fmt.Println(string(entry))
}

LifeWindow & CleanWindow

LifeWindow is a duration. Once it elapses, an entry is considered dead but is not yet deleted.

CleanWindow is the interval at which clean-up runs. On each pass, all dead entries are deleted, while entries still within their LifeWindow are kept. An interaction sketch follows below.
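
A minimal sketch of how the two interact, assuming DefaultConfig's one-second CleanWindow (check your version's defaults):

config := bigcache.DefaultConfig(2 * time.Second) // LifeWindow = 2s; DefaultConfig sets CleanWindow to 1s
cache, _ := bigcache.NewBigCache(config)

cache.Set("session", []byte("token"))

time.Sleep(4 * time.Second) // past the LifeWindow and at least one clean-up pass

if _, err := cache.Get("session"); err == bigcache.ErrEntryNotFound {
    fmt.Println("entry expired and was cleaned up")
}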

Benchmarks

Three caches were compared: bigcache, freecache and map. Benchmarks were run on an i7-6700K CPU @ 4.00GHz with 32GB of RAM on Ubuntu 18.04 LTS (kernel 5.2.12-050212-generic).

The benchmark source code can be found here.

Writes and reads

go version
go version go1.13 linux/amd64

go test -bench=. -benchmem -benchtime=4s ./... -timeout 30m
goos: linux
goarch: amd64
pkg: github.com/allegro/bigcache/v3/caches_bench
BenchmarkMapSet-8                         12999889           376 ns/op         199 B/op           3 allocs/op
BenchmarkConcurrentMapSet-8                4355726          1275 ns/op         337 B/op           8 allocs/op
BenchmarkFreeCacheSet-8                   11068976           703 ns/op         328 B/op           2 allocs/op
BenchmarkBigCacheSet-8                    10183717           478 ns/op         304 B/op           2 allocs/op
BenchmarkMapGet-8                         16536015           324 ns/op          23 B/op           1 allocs/op
BenchmarkConcurrentMapGet-8               13165708           401 ns/op          24 B/op           2 allocs/op
BenchmarkFreeCacheGet-8                   10137682           690 ns/op         136 B/op           2 allocs/op
BenchmarkBigCacheGet-8                    11423854           450 ns/op         152 B/op           4 allocs/op
BenchmarkBigCacheSetParallel-8            34233472           148 ns/op         317 B/op           3 allocs/op
BenchmarkFreeCacheSetParallel-8           34222654           268 ns/op         350 B/op           3 allocs/op
BenchmarkConcurrentMapSetParallel-8       19635688           240 ns/op         200 B/op           6 allocs/op
BenchmarkBigCacheGetParallel-8            60547064            86.1 ns/op         152 B/op           4 allocs/op
BenchmarkFreeCacheGetParallel-8           50701280           147 ns/op         136 B/op           3 allocs/op
BenchmarkConcurrentMapGetParallel-8       27353288           175 ns/op          24 B/op           2 allocs/op
PASS
ok      github.com/allegro/bigcache/v3/caches_bench    256.257s

Writes and reads in bigcache are faster than in freecache. Writes to map are the slowest.

GC pause time

go version
go version go1.13 linux/amd64

go run caches_gc_overhead_comparison.go

Number of entries:  20000000
GC pause for bigcache:  1.506077ms
GC pause for freecache:  5.594416ms
GC pause for map:  9.347015ms

go version
go version go1.13 linux/arm64

go run caches_gc_overhead_comparison.go
Number of entries:  20000000
GC pause for bigcache:  22.382827ms
GC pause for freecache:  41.264651ms
GC pause for map:  72.236853ms

The test shows how long GC pauses are for caches filled with 20 million entries. Bigcache and freecache have very similar GC pause times.
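
The measurement itself is straightforward; the linked benchmark source has the authoritative version, but a minimal sketch of the approach (fill the cache, force a collection, read the last pause from runtime.MemStats) looks roughly like this. The 1,000,000-entry count below is illustrative, not the 20 million used in the published numbers.

package main

import (
    "fmt"
    "runtime"
    "strconv"
    "time"

    "github.com/allegro/bigcache/v3"
)

func main() {
    cache, _ := bigcache.NewBigCache(bigcache.DefaultConfig(time.Hour))
    for i := 0; i < 1000000; i++ {
        cache.Set(strconv.Itoa(i), []byte("value"))
    }

    var ms runtime.MemStats
    runtime.GC() // force a collection so the pause reflects the filled cache
    runtime.ReadMemStats(&ms)
    fmt.Println("last GC pause:", time.Duration(ms.PauseNs[(ms.NumGC+255)%256]))
}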

Memory usage

You may encounter system memory reporting what appears to be an exponential increase; however, this is expected behaviour. The Go runtime allocates memory in chunks, or 'spans', and informs the OS when they are no longer required by changing their state to 'idle'. The spans remain part of the process resource usage until the OS needs to repurpose the address space. Further reading is available here.
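
If you want to verify this yourself, a minimal sketch that inspects the relevant runtime counters; HeapIdle spans still count toward the process footprint until HeapReleased catches up:

var ms runtime.MemStats
runtime.ReadMemStats(&ms)
fmt.Printf("HeapSys: %d MiB, HeapIdle: %d MiB, HeapReleased: %d MiB\n",
    ms.HeapSys>>20, ms.HeapIdle>>20, ms.HeapReleased>>20)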

How it works

BigCache relies on an optimization introduced in Go 1.5 (issue-9477). This optimization means that if a map contains no pointers in its keys and values, the GC will omit scanning its content. Therefore BigCache uses map[uint64]uint32, where keys are hashed and values are offsets of entries.

Entries are kept in byte slices, again to avoid GC scanning. A byte slice can grow to gigabytes without impact on performance because the GC only sees a single pointer to it.
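
To illustrate the idea, here is a simplified sketch, not BigCache's actual shard implementation: the index is a pointer-free map from hashed key to an offset in one large byte slice, so the GC has almost nothing to scan.

type shard struct {
    index   map[uint64]uint32 // hashedKey -> offset; no pointers, so GC skips its content
    entries []byte            // all entry bytes live in one slice (a single pointer for GC)
}

func (s *shard) set(hashedKey uint64, value []byte) {
    offset := uint32(len(s.entries))
    s.entries = append(s.entries, byte(len(value))) // toy 1-byte length header (values < 256 bytes in this sketch)
    s.entries = append(s.entries, value...)
    s.index[hashedKey] = offset // a colliding hash simply overwrites the old offset
}

func (s *shard) get(hashedKey uint64) ([]byte, bool) {
    offset, ok := s.index[hashedKey]
    if !ok {
        return nil, false
    }
    length := uint32(s.entries[offset])
    return s.entries[offset+1 : offset+1+length], true
}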

Collisions

BigCache does not handle collisions. When a new item is inserted and its hash collides with a previously stored item, the new item overwrites the previously stored value.

Bigcache vs Freecache

Both caches provide the same core features, but they reduce GC overhead in different ways: bigcache relies on map[uint64]uint32, while freecache implements its own mapping built on slices to reduce the number of pointers.

Results from the benchmark tests are presented above. One advantage of bigcache over freecache is that you don't need to know the size of the cache in advance, because when bigcache is full it can allocate additional memory for new entries instead of overwriting existing ones as freecache currently does. However, a hard maximum size can also be set in bigcache; check HardMaxCacheSize.

HTTP Server

This package also includes an easily deployable HTTP implementation of BigCache, which can be found in the server package.

More

BigCache's genesis is described in the allegro.tech blog post: writing a very fast cache service in Go

Author: Allegro
Source Code: https://github.com/allegro/bigcache 
License: Apache-2.0 License
