LLaMA-rs: Run LLaMA inference on CPU, with Rust

LLaMA-rs

Do the LLaMA thing, but now in Rust 🦀🚀🦙 

Gif showcasing language generation using llama-rs

LLaMA-rs is a Rust port of the llama.cpp project. This allows running inference for Facebook's LLaMA model on a CPU with good performance using full precision, f16 or 4-bit quantized versions of the model.

Just like its C++ counterpart, it is powered by the ggml tensor library, achieving the same performance as the original code.

Getting started

Make sure you have a Rust toolchain set up.

  1. Get a copy of the model's weights[^1]
  2. Clone the repository
  3. Build (cargo build --release)
  4. Run with cargo run --release -- <ARGS>

[^1]: The only legal source to get the weights at the time of writing is this repository. The choice of words also may or may not hint at the existence of other kinds of sources.

NOTE: Make sure to build and run in release mode. Debug builds are currently broken.

For example, you can try the following prompt:

cargo run --release -- -m /data/Llama/LLaMA/7B/ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is"

Q&A

Q: Why did you do this?

A: It was not my choice. Ferris appeared to me in my dreams and asked me to rewrite this in the name of the Holy crab.

Q: Seriously now

A: Come on! I don't want to get into a flame war. You know how it goes, something something memory something something cargo is nice, don't make me say it, everybody knows this already.

Q: I insist.

A: Sheesh! Okaaay. After seeing the huge potential of llama.cpp, the first thing I did was to see how hard it would be to turn it into a library to embed in my projects. I started digging into the code and realized the heavy lifting is done by ggml (a C library, easy to bind to Rust) and that the whole project was only around 2k lines of C++ code (not so easy to bind). After a couple of (failed) attempts to build an HTTP server into the tool, I realized I'd be much more productive if I just ported the code to Rust, where I'm more comfortable.

Q: Is this the real reason?

A: Haha. Of course not. I just like collecting imaginary internet points, in the form of little stars, that people seem to give to me whenever I embark on pointless quests for rewriting X thing, but in Rust.

Known issues / To-dos

Contributions welcome! Here's a few pressing issues:

  •  The code only sets the right CFLAGS on Linux, so inference will be very slow on every other OS until the build.rs script in ggml_raw is fixed.
  •  The quantization code has not been ported (yet). You can still use the quantized models with llama.cpp.
  •  The code needs to be "library"-fied. It is nice as a showcase binary, but the real potential for this tool is to allow embedding in other services (see the sketch after this list for what that could look like).
  •  No crates.io release. The name llama-rs is reserved and I plan to do this soon-ish.
  •  Debug builds are currently broken.
  •  Anything from the original C++ code.
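
As a rough sketch of what a "library"-fied API could look like (purely illustrative; the Model and InferenceParams names are hypothetical and not part of llama-rs today):

// Hypothetical embedding API; none of these names exist in the crate yet.
use llama_rs::{InferenceParams, Model};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a ggml-format model from disk.
    let model = Model::load("/data/Llama/LLaMA/7B/ggml-model-q4_0.bin")?;

    // Stream generated tokens back to the caller through a closure.
    let mut session = model.start_session(InferenceParams::default());
    session.infer("Tell me how cool the Rust programming language is", |token| {
        print!("{token}");
    })?;
    Ok(())
}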

Download Details:

Author: setzer22
Source Code: https://github.com/setzer22/llama-rs 
License: MIT license

#rust #cpu 


DLL: Fast Deep Learning Library (DLL) for C++

Deep Learning Library (DLL) 1.1

DLL is a library that aims to provide a C++ implementation of Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN), as well as their convolutional versions. It also has support for some more standard neural networks.

Features

  • Restricted Boltzmann Machine
    • Various units: Stochastic binary, Gaussian, Softmax and nRLU units
    • Contrastive Divergence and Persistence Contrastive Divergence
      • CD-1 learning by default
    • Momentum
    • Weight decay
    • Sparsity target
    • Train as Denoising autoencoder
  • Convolutional Restricted Boltzmann Machine
    • Standard version
    • Version with Probabilistic Max Pooling (Honglak Lee)
    • Binary and Gaussian visible units
    • Binary and ReLU hidden units for the standard version
    • Binary hidden units for the Probabilistic Max Pooling version
    • Training with CD-k or PCD-k (only for standard version)
    • Momentum, Weight Decay, Sparsity Target
    • Train as Denoising autoencoder
  • Deep Belief Network
    • Pretraining with RBMs
    • Fine tuning with Conjugate Gradient
    • Fine tuning with Stochastic Gradient Descent
    • Classification with SVM (libsvm)
  • Convolutional Deep Belief Network
    • Pretraining with CRBMs
    • Classification with SVM (libsvm)
  • Input data
    • Input data can be either in containers or in iterators
      • Even if iterators are supported for the SVM classifier, libsvm will move all the data into an in-memory structure.

Building

Note: When you clone the library, you need to clone the submodules as well, using the --recursive option.

The folder include must be included with the -I option, as well as the etl/include folder.

This library is completely header-only, there is no need to build it.

However, this library makes extensive use of C++11 and C++14, so a recent compiler is necessary to use it. Currently, this library is only tested with g++ 9.3.0.
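
For example, a minimal compile invocation might look like the following (a sketch; it assumes the repository was cloned recursively into ./dll and that my_net.cpp is your own source file):

# Header-only usage: just point -I at the include folders named above.
g++ -std=c++14 -Idll/include -Idll/etl/include -o my_net my_net.cpp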

If for some reason it does not work on one of the supported compilers, contact me and I'll fix it. It should work fine on recent versions of clang.

This has never been tested on Windows. While it should compile on Mingw, I don't expect Visual Studio to be able to compile it for now, although VS 2017 sounds promising. If you have problems compiling this library, I'd be glad to help, but cannot guarantee that this will work on other compilers.

If you want to use the GPU, you should use CUDA 8.0 or later and CUDNN 5.0.1 or later. I haven't tried other versions, but lower versions of CUDA, such as 7, should work, and higher versions as well. If you run into issues with different versions of CUDA and CUDNN, please open an issue on GitHub.


Download Details:

Author: Wichtounet
Source Code: https://github.com/wichtounet/dll 
License: MIT license

#machinelearning #cpluplus #performance #cpu #deeplearning

CPUTime.jl: Julia Module for CPU Timing

CPUTime.jl

A Julia package for measuring elapsed CPU time in Julia.

Installation

You should only use this package if you know what you're doing - CPU time on multi-core processors is a tricky beast. Please at least read the discussion in Issue #1 before proceeding. Once you've done that, to install call:

Pkg.add("CPUTime")

from the Julia command line.

Functions and Macros

The exported functions and macros, as well as their absolute time equivalents, are listed in the following table.

  Real time (Julia standard library) | CPU time (CPUTime.jl)
  ---------------------------------- | ---------------------
  time_ns()                          | CPUtime_us()
  tic()                              | CPUtic()
  toq()                              | CPUtoq()
  toc()                              | CPUtoc()
  @time                              | @CPUtime
  @elapsed                           | @CPUelapsed

Note that the finest resolution for CPU time is microseconds, as opposed to nanoseconds for absolute time.

Usage Example

using CPUTime

function add_and_sleep()
    x = 0
    for i in 1:10_000_000
        x += i
    end
    sleep(1)
    x
end

@time @CPUtime add_and_sleep()
elapsed CPU time: 0.000174 seconds
  1.005624 seconds (32 allocations: 1.109 KiB)
50000005000000
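
The macros can also capture the measured value instead of printing it. A small sketch using @CPUelapsed, which mirrors Base's @elapsed and should return seconds (at microsecond resolution):

using CPUTime

function busy_sum()
    s = 0
    for i in 1:10_000_000
        s += i
    end
    s
end

# CPU seconds spent in the call; time spent in sleep() would not be counted.
cpu_seconds = @CPUelapsed busy_sum()
println("elapsed CPU time: ", cpu_seconds, " seconds")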

Download Details:

Author: schmrlng
Source Code: https://github.com/schmrlng/CPUTime.jl 
License: View license

#julia #module #cpu


Now Get Geekbench 6 for Your PC, Tablet, or Phone

Now Get Geekbench 6 for Your PC, Tablet, or Phone

Geekbench is one of the most popular benchmarking tools, allowing you to test how different computers, tablets, and phones stack up in computing performance. Now there’s a brand new version available.

Primate Labs, the developer group behind the utility, has released Geekbench 6. Just like earlier versions, it can run benchmark tests for both CPU and GPU performance, giving you a final score that you can compare against other devices and hardware. However, the exact tests have been tweaked to better reflect how modern software works on a given device.

According to the developers, the CPU benchmark is calibrated against a baseline score of 2,500, which comes from a Dell Precision 3460 with a Core i7-12700 processor. It tests navigation with OpenStreetMap, opening various pages using a background web browser, rendering complex PDF documents, indexing and editing images, and even compiling code. There’s also a GPU benchmark that tests support for OpenCL, CUDA, Metal (on Apple devices), and Vulkan APIs — that last one is new in Geekbench 6.

[Image: Geekbench 6 on macOS]

Just like with previous Geekbench versions, higher scores are better. My M1 MacBook Air earned a CPU score of 2,300 on single-core performance and 8,538 on multi-core performance. After the test is complete, the results are uploaded online and opened in your web browser, which you can then share with others using the page link. Perfect for showing off the performance of your new custom-built PC or shiny new iPad.

Geekbench 6 is available for Mac, Windows, Linux, Android, iPhone, and iPad. The Mac version requires macOS 11 or later, at least 4 GB RAM, and either an Intel or Apple Silicon chip. The Windows version requires 4 GB RAM and Windows 10 or later — there’s apparently no support for ARM Windows, as Geekbench calls for a 64-bit Intel or AMD CPU.

You can download Geekbench 6 from the official site. There is a paid version available, but it’s only required if you want to perform automated testing, use a portable version, or keep your results offline.

Original article source at: https://www.howtogeek.com/

#pc #phone #cpu #explainer 


LLDebugtool: A Debugging Tool for Developers & Testers

LLDebugtool

A debugging tool for developers and testers that can help you analyze and manipulate data in non-Xcode situations.

Introduction

LLDebugTool is a debugging tool for developers and testers that can help you analyze and manipulate data in non-Xcode situations.

LLDebugToolSwift is the extension of LLDebugTool; it provides a Swift interface for LLDebugTool and is released alongside LLDebugTool.

If your project is an Objective-C project, you can use LLDebugTool; if your project is a Swift project or contains Swift files, you can use LLDebugToolSwift.

Choose LLDebugTool for your next project, or migrate over your existing projects—you'll be happy you did! 🎊🎊🎊

[Preview GIF]

What's new in 1.3.8.1

Removed the automatic version check.

  • Too many visits to cocoadocs.org caused it to block access from LLDebugTool, so this function was removed.

What can you do with LLDebugTool?

Check network requests or view log information for certain events at any time, without having to run under Xcode. This is useful for investigating problems reported by testers.

Easier filtering and searching of useful information.

Easier analysis of occasional problems.

Easier analysis of the cause of the crash.

Easier sharing, previewing, or removing sandbox files, which can be very useful in the development stage.

Easier observation of the app's memory, CPU, FPS and other information.

Take screenshots, tag and share.

More intuitive viewing of the view structure and dynamic modification of properties.

Determine UI elements and colors in your App more accurately.

Easy access to and comparison of point information.

Easy access to element borders and frames.

Quick entry for HTML debugging.

Mock location at any time.

Adding LLDebugTool to your project

CocoaPods

CocoaPods is the recommended way to add LLDebugTool to your project.

Objective - C

  1. Add a pod entry for LLDebugTool to your Podfile: pod 'LLDebugTool', '~> 1.0'.
  2. If you only want to use it in Debug mode, add the pod entry as pod 'LLDebugTool', '~> 1.0', :configurations => ['Debug']. Details are in Wiki/Use in Debug environment. If you want to pin a specific version, use pod 'LLDebugTool', '1.3.8.1', :configurations => ['Debug'].
  3. The recommended approach is to use multiple targets and add pod 'LLDebugTool', '~> 1.0' only to the Debug target. This has the advantage of not contaminating the code in the Product environment, and the tool can be integrated into an Archive built from the Debug environment (with :configurations => ['Debug'], it can only run through Xcode; it is not possible to Archive it into an app).
  4. Install the pod(s) by running pod install. If you can't find LLDebugTool or the newest release version, run pod repo update before pod install.
  5. Include LLDebugTool wherever you need it with #import "LLDebug.h", or add the import to your .pch file.

Swift

  1. Add a pod entry for LLDebugToolSwift to your Podfile: pod 'LLDebugToolSwift', '~> 1.0'.
  2. If you only want to use it in Debug mode, add the pod entry as pod 'LLDebugToolSwift', '~> 1.0', :configurations => ['Debug']. Details are in Wiki/Use in Debug environment. If you want to pin a specific version, use pod 'LLDebugToolSwift', '1.3.8.1', :configurations => ['Debug'].
  3. The recommended approach is to use multiple targets and add pod 'LLDebugToolSwift', '~> 1.0' only to the Debug target. This has the advantage of not contaminating the code in the Product environment, and the tool can be integrated into an Archive built from the Debug environment (with :configurations => ['Debug'], it can only run through Xcode; it is not possible to Archive it into an app).
  4. use_frameworks! must be added to your Podfile.
  5. Install the pod(s) by running pod install. If you can't find LLDebugToolSwift or the newest release version, run pod repo update before pod install.
  6. Include LLDebugTool wherever you need it with import LLDebugToolSwift.

Carthage

Carthage is a decentralized dependency manager that builds your dependencies and provides you with binary frameworks.

Objective - C

To integrate LLDebugTool into your Xcode project using Carthage, specify it in your Cartfile:

github "LLDebugTool"

Run carthage update to build the framework and drag the built LLDebugTool.framework into your Xcode project.

Swift

To integrate LLDebugToolSwift into your Xcode project using Carthage, specify it in your Cartfile:

github "LLDebugToolSwift"

Run carthage update to build the framework and drag the built LLDebugToolSwift.framework into your Xcode project.

Source files

Alternatively, you can directly add the source folder named LLDebugTool to your project.

Objective - C

  1. Download the latest code version or add the repository as a git submodule to your git-tracked project.
  2. Open your project in Xcode, then drag and drop the source folder named LLDebugTool. When you are prompted to "Choose options for adding these files", be sure to check "Copy items if needed".
  3. Integrate FMDB into your project. FMDB is an Objective-C wrapper around SQLite.
  4. Integrate Masonry into your project. Masonry is an Objective-C constraint library. There are no specific version requirements, but it is recommended that you use the latest version.
  5. Include LLDebugTool wherever you need it with #import "LLDebug.h", or add the import to your .pch file.

Swift

  1. Download the LLDebugTool latest code version or add the repository as a git submodule to your git-tracked project.
  2. Download the LLDebugToolSwift latest code version or add the repository as a git submodule to your git-tracked project.
  3. Open your project in Xcode, then drag and drop the source folders named LLDebugTool and LLDebugToolSwift. When you are prompted to "Choose options for adding these files", be sure to check "Copy items if needed".
  4. Integrate FMDB into your project. FMDB is an Objective-C wrapper around SQLite.
  5. Integrate Masonry into your project. Masonry is an Objective-C constraint library. There are no specific version requirements, but it is recommended that you use the latest version.
  6. Include LLDebugTool wherever you need it with import LLDebugToolSwift.

Usage

Get Started

You need to start LLDebugTool in "application:(UIApplication * )application didFinishLaunchingWithOptions:(NSDictionary * )launchOptions"; otherwise you will lose some information.

If you want to configure any parameters, you must do so before calling "startWorking". See LLConfig.h for more configuration details.

  • Quick Start

In Objective-C

#import "AppDelegate.h"
#import "LLDebug.h"

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // The default color configuration is green background and white text color. 

    // Start working.
    [[LLDebugTool sharedTool] startWorking];
    
    // Write your project code here.
    return YES;
}

In Swift

import LLDebugToolSwift

    func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
        // ####################### Start LLDebugTool #######################//
        // Use this line to start working.
        LLDebugTool.shared().startWorking()
        
        // Write your project code here.
        
        return true
    }
  • Start With Custom Config

In Objective-C

#import "AppDelegate.h"
#import "LLDebug.h"

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {

    // Start working with config.
    [[LLDebugTool sharedTool] startWorkingWithConfigBlock:^(LLConfig * _Nonnull config) {

        //####################### Color Style #######################//
        // Uncomment one of the following lines to change the color configuration.
        // config.colorStyle = LLConfigColorStyleSystem;
        // [config configBackgroundColor:[UIColor orangeColor] primaryColor:[UIColor whiteColor] statusBarStyle:UIStatusBarStyleDefault];

        //####################### User Identity #######################//
        // Use this line to tag user. More config please see "LLConfig.h".
        config.userIdentity = @"Miss L";

        //####################### Window Style #######################//
        // Uncomment one of the following lines to change the window style.
        // config.entryWindowStyle = LLConfigEntryWindowStyleNetBar;

    }];

    return YES;
}

In Swift

import LLDebugToolSwift

    func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
        
        // Start working with config.
        LLDebugTool.shared().startWorking { (config) in
            //####################### Color Style #######################//
            // Uncomment one of the following lines to change the color configuration.
            // config.colorStyle = .system
            // config.configBackgroundColor(.orange, textColor: .white, statusBarStyle: .default)
        
            //####################### User Identity #######################//
            // Use this line to tag user. More config please see "LLConfig.h".
            config.userIdentity = "Miss L";
        
            //####################### Window Style #######################//
            // Uncomment one of the following lines to change the window style.
            // config.windowStyle = .netBar
        
            //####################### Features #######################//
            // Uncomment this line to change the available features.
            // config.availables = .noneAppInfo
        }
        
        return true
    }

Network Request

You don't need to do anything; just calling "startWorking" will monitor most network requests, including those made with NSURLSession, NSURLConnection and AFNetworking. If some requests can't be monitored in certain cases, please open an issue and tell me.

Log

Print and save a log. See LLDebugToolMacros.h for more log macro details.

  • Save Log

In Objective-C

#import "LLDebug.h"

- (void)testNormalLog {
    // Insert an LLog where you want to print.
    LLog(@"Message you want to save or print.");
}

In Swift

import LLDebugToolSwift

    func testNormalLog() {
        // Insert an LLog where you want to print.
        LLog.log(message: "Message you want to save or print.")
    }
  • Save Log with event and level

In Objective-C

#import "LLDebug.h"

- (void)testEventErrorLog {
    // Insert an LLog_Error_Event where you want to print an event and level log.
    LLog_Error_Event(@"The event that you want to mark. such as bugA, taskB or processC.",@"Message you want to save or print.");
}

In Swift

import LLDebugToolSwift

    func testEventErrorLog() {
        // Insert an LLog_Error_Event where you want to print an event and level log.
        LLog.errorLog(message: "Message you want to save or print.", event: "The event that you want to mark. such as bugA, taskB or processC.")
    }

Crash

You don't need to do anything; just calling "startWorking" will intercept crashes, store the crash information, cause and stack traces, and also record the network requests and log information from that time.

AppInfo

LLDebugTool monitors the app's CPU, memory, and FPS. At the same time, you can quickly check various pieces of information about the app.

Sandbox

LLDebugTool provides a quick way to view and manipulate the sandbox; you can easily delete files/folders inside the sandbox, or share files/folders elsewhere via AirDrop. As long as Apple supports the file format, you can preview the files directly in LLDebugTool.

Screenshots

LLDebugTool provides screenshots with simple painting and marking, so issues can easily be recorded during testing or while UI designers debug the app.

Hierarchy

LLDebugTool provides a view structure tool for viewing or modifying elements' properties and information in non-debug mode.

Magnifier

LLDebugTool provides a magnifier tool for zooming into parts of the UI and viewing the color value at a specified pixel.

Ruler

LLDebugTool provides a convenient tool to display touch point information.

Widget Border

LLDebugTool provides a function to display element borders, making it easy to see a view's frame.

HTML

LLDebugTool can debug HTML pages through WKWebView, UIWebView or your customized ViewController in your app at any time.

Location

LLDebugTool provides a function to mock the location at any time.

More Usage

  • You can get more help by looking at the Wiki.
  • You can download and run the LLDebugToolDemo or LLDebugToolSwiftDemo to find more uses of LLDebugTool. The demo is built under macOS 10.15.1, Xcode 11.2.1, iOS 13.2.2 and CocoaPods 1.8.4. If there is any version compatibility problem, please let me know.

Requirements

LLDebugTool works on iOS 8+ and requires ARC to build. It depends on the following Apple frameworks, which should already be included with most Xcode templates:

UIKit

Foundation

SystemConfiguration

Photos

QuickLook

CoreTelephony

CoreLocation

MapKit

AVKit

Architecture

LLDebug.h

Public header file. You can import it in your .pch file.

DebugTool

LLDebugTool: used to start and stop LLDebugTool; you will need to look at this file.

LLConfig: used for custom colors, sizes, identification and other information. If you want to configure anything, focus on this file.

LLDebugToolMacros.h: quick macro definition file.

Components

  • Network: used to monitor network requests.
  • Log: used to quickly print and save logs.
  • Crash: used to collect crash information when the app crashes.
  • AppInfo: used to monitor the app's properties.
  • Sandbox: used to view and operate on sandbox files.
  • Screenshot: used to process and display screenshots.
  • Hierarchy: used to process and present the view structure.
  • Magnifier: provides the magnifying glass function.
  • Ruler: provides the ruler function.
  • Widget Border: provides the widget border function.
  • Function: used to show the available functions.
  • Html: used to dynamically test web views.
  • Location: used to mock the location.
  • Setting: used to dynamically set configs.

Communication

  • If you need help, open an issue.
  • If you'd like to ask a general question, open an issue.
  • If you found a bug, and can provide steps to reliably reproduce it, open an issue.
  • If you have a feature request, open an issue.
  • If you find anything wrong or anything you dislike, open an issue.
  • If you have some good ideas or some requests, send mail(llworkinggroup1992@gmail.com) to me.
  • If you want to contribute, submit a pull request.

Change-log

A brief summary of each LLDebugTool release can be found in the CHANGELOG.

Click to view the Chinese introduction.

Download Details:

Author: HDB-Li
Source Code: https://github.com/HDB-Li/LLDebugTool 
License: View license

#swift #ios #cpu #monitoring #objective-c #xcode 


Serve: Optimize and Scale Pytorch Models in Production

TorchServe

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.

Requires python >= 3.8

curl http://127.0.0.1:8080/predictions/bert -T input.txt

🚀 Quick start with TorchServe

# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu102

# Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly

🚀 Quick start with TorchServe (conda)

# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu102

# Latest release
conda install -c pytorch torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver

Getting started guide

🐳 Quick Start with Docker

# Latest release
docker pull pytorch/torchserve

# Nightly build
docker pull pytorch/torchserve-nightly

Refer to torchserve docker for details.
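
A typical end-to-end workflow looks roughly like this (a sketch; the model name my_model, the my_model.pt file and the handler choice are illustrative placeholders):

# 1. Package a trained (TorchScript) model into a .mar archive.
torch-model-archiver --model-name my_model --version 1.0 \
    --serialized-file my_model.pt --handler image_classifier \
    --export-path model_store

# 2. Start TorchServe and load the archive from the model store.
torchserve --start --model-store model_store --models my_model=my_model.mar

# 3. Send an inference request, as in the curl example above.
curl http://127.0.0.1:8080/predictions/my_model -T input.txt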

🤓 Learn More

https://pytorch.org/serve

🫂 Contributing

We welcome all contributions!

To learn more about how to contribute, see the contributor guide here.

⚖️ Disclaimer

This repository is jointly operated and maintained by Amazon, Meta and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to opensource@fb.com. For questions directed at Amazon, please send an email to torchserve@amazon.com. For all other questions, please open up an issue in this repository here.

TorchServe acknowledges the Multi Model Server (MMS) project, from which it was derived.

Download Details:

Author: Pytorch
Source Code: https://github.com/pytorch/serve 
License: Apache-2.0 license

#machinelearning #docker #kubernetes #cpu #deeplearning 


H2o4gpu: H2Oai GPU Edition

H2O4GPU

H2O4GPU is a collection of GPU solvers by H2Oai with APIs in Python and R. The Python API builds upon the easy-to-use scikit-learn API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn (i.e. import h2o4gpu as sklearn) with support for GPUs on selected (and ever-growing) algorithms. H2O4GPU inherits all the existing scikit-learn algorithms and falls back to CPU algorithms when the GPU algorithm does not support an important existing scikit-learn class option. The R package is a wrapper around the H2O4GPU Python package, and the interface follows standard R conventions for modeling.
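
For example, the drop-in pattern looks like this (a minimal sketch; KMeans is one of the supported solvers, as shown in the Test Installation section below):

import numpy as np
import h2o4gpu as sklearn  # drop-in replacement for scikit-learn

X = np.array([[1., 1.], [1., 4.], [1., 0.]])
# Runs the GPU solver when supported, falling back to the CPU algorithm otherwise.
model = sklearn.KMeans(n_clusters=2, random_state=1234).fit(X)
print(model.cluster_centers_)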

The DAAL library has been added for CPU; currently only the x86_64 architecture is supported.

Requirements

PC running Linux with glibc 2.17+

Install CUDA with bundled display drivers (CUDA 8, CUDA 9, CUDA 9.2, or CUDA 10).

Python shared libraries (e.g. On Ubuntu: sudo apt-get install libpython3.6-dev)

When installing, choose to link the CUDA install to /usr/local/cuda. Be sure to reboot after installing the new NVIDIA drivers.

Nvidia GPU with Compute Capability >= 3.5 (Capability Lookup).

For advanced features, like handling rows/32 > 2^16 (i.e., rows > 2,097,152) in K-means, Compute Capability >= 5.2 is needed.

For building the R package, libcurl4-openssl-dev, libssl-dev, and libxml2-dev are needed.

User Installation

Note: Installation steps mentioned below are for users planning to use H2O4GPU. See DEVEL.md for developer installation.

H2O4GPU can be installed using either PIP or Conda.

Prerequisites

Add to ~/.bashrc or environment (set appropriate paths for your OS):

export CUDA_HOME=/usr/local/cuda # or choose /usr/local/cuda9 for cuda9 and /usr/local/cuda8 for cuda8
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64/:$CUDA_HOME/lib/:$CUDA_HOME/extras/CUPTI/lib64
  • Install OpenBlas dev environment:
sudo apt-get install libopenblas-dev pbzip2

If you are building the h2o4gpu R package, it is necessary to install the following dependencies:

sudo apt-get -y install libcurl4-openssl-dev libssl-dev libxml2-dev

PIP install

Download the Python wheel file (For Python 3.6):

Start a fresh pyenv or virtualenv session.

Install the Python wheel file. NOTE: If you don't use a fresh environment, this will overwrite your py3nvml and xgboost installations to use our validated versions.

pip install h2o4gpu-0.3.0-cp36-cp36m-linux_x86_64.whl

Conda installation

Ensure you meet the Requirements and have installed the Prerequisites.

If you haven't already, install the conda package manager, and be sure to test your conda installation.

H2O4GPU packages for CUDA 8, CUDA 9 and CUDA 9.2 are available from the h2oai channel in Anaconda Cloud.

Create a new conda environment with H2O4GPU and all its dependencies using the following command (shown here for CUDA 10). For other CUDA versions, substitute the package name as needed. Note the requirement for the h2oai and conda-forge channels.

conda create -n h2o4gpuenv -c h2oai -c conda-forge -c rapidsai h2o4gpu-cuda10

Once the environment is created, activate it with source activate h2o4gpuenv.

To test, start an interactive python session in the environment and follow the steps in the Test Installation section below.

h2o4gpu R package

At this point, you should have installed the H2O4GPU Python package successfully. You can then go ahead and install the h2o4gpu R package via the following:

if (!require(devtools)) install.packages("devtools")
devtools::install_github("h2oai/h2o4gpu", subdir = "src/interface_r")

Detailed instructions can be found here.

Test Installation

To test your installation of the Python package, run the following code:

import h2o4gpu
import numpy as np

X = np.array([[1.,1.], [1.,4.], [1.,0.]])
model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
model.cluster_centers_

should give input/output of:

>>> import h2o4gpu
>>> import numpy as np
>>>
>>> X = np.array([[1.,1.], [1.,4.], [1.,0.]])
>>> model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
>>> model.cluster_centers_
array([[ 1.,  1.  ],
       [ 1.,  4.  ]])

To test your installation of the R package, try the following example that builds a simple XGBoost random forest classifier:

library(h2o4gpu)

# Setup dataset
x <- iris[1:4]
y <- as.integer(iris$Species) - 1

# Initialize and train the classifier
model <- h2o4gpu.random_forest_classifier() %>% fit(x, y)

# Make predictions
predictions <- model %>% predict(x)

Next Steps

For more examples using the Python API, please check out our Jupyter notebook demos. To run the demos using a local wheel, first download src/interface_py/requirements_runtime_demos.txt from the GitHub repo and do:

pip install -r src/interface_py/requirements_runtime_demos.txt

and then run the jupyter notebook demos.

For more examples using R API, please visit the vignettes.

Running Jupyter Notebooks

You can run Jupyter Notebooks with H2O4GPU in the two ways below.

Creating a Conda Environment

Ensure you have a machine that meets the Requirements and Prerequisites mentioned above.

Next, follow the Conda installation instructions mentioned above. Once you have activated the environment, you will need to downgrade tornado to version 4.5.3 (see issue #680). Start the Jupyter notebook, and navigate to the URL shown in the log output in your browser.

source activate h2o4gpuenv
conda install tornado==4.5.3
jupyter notebook --ip='*' --no-browser

Start a Python 3 kernel, and try the code in example notebooks

Using precompiled docker image

Requirements:

Download the Docker file (for linux_x86_64):

  • Bleeding edge (changes with every successful master branch build):

Load and run docker file (e.g. for bleeding-edge of cuda92):

jupyter notebook --generate-config
echo "c.NotebookApp.allow_remote_access = False >> ~/.jupyter/jupyter_notebook_config.py # Choose True if want to allow remote access
pbzip2 -dc h2o4gpu-0.3.0.10000-cuda92-runtime.tar.bz2 | nvidia-docker load
mkdir -p log ; nvidia-docker run --name localhost --rm -p 8888:8888 -u `id -u`:`id -g` -v `pwd`/log:/log -v /home/$USER/.jupyter:/jupyter --entrypoint=./run.sh opsh2oai/h2o4gpu-0.3.0.10000-cuda92-runtime &
find log -name jupyter* -type f -printf '%T@ %p\n' | sort -k1 -n | awk '{print $2}' | tail -1 | xargs cat | grep token | grep http | grep -v NotebookApp

Copy/paste the http link shown into your browser. If the "find" command doesn't work, look for the latest jupyter.log file and check its contents for the http link and token.

If the link shows no token or shows ... for token, try a token of "h2o" (without quotes). If running on your own host, the weblink will look like http://localhost:8888:token with token replaced by the actual token.

This container has a /demos directory which contains Jupyter notebooks and some data.

Plans

The vision is to develop fast GPU algorithms to complement the CPU algorithms in scikit-learn while keeping full scikit-learn API compatibility and scikit-learn CPU algorithm capability. The h2o4gpu Python module is to be used as a drop-in-replacement for scikit-learn that has the full functionality of scikit-learn's CPU algorithms.

Functions and classes will be gradually overridden by GPU-enabled algorithms (unless n_gpu=0 is set and we have no CPU algorithm except scikit-learn's). The CPU algorithms and code initially will be sklearn, but gradually those may be replaced by faster open-source codes like those in Intel DAAL.

This vision is currently accomplished by using the open-source scikit-learn and xgboost and overriding scikit-learn calls with our own GPU versions. In cases when our GPU class is currently incapable of an important scikit-learn feature, we revert to the scikit-learn class.

As noted above, there is an R API in development, which will be released as a stand-alone R package. All algorithms supported by H2O4GPU will be exposed in both Python and R in the future.

Another primary goal is to support all operations on the GPU via the GOAI initiative. This involves ensuring the GPU algorithms can take and return GPU pointers to data instead of going back to the host. In scikit-learn API language these are called fit_ptr, predict_ptr, transform_ptr, etc., where ptr stands for memory pointer.

RoadMap

2019 Q2:

  • A new processing engine that allows scaling beyond GPU memory limits
  • k-Nearest Neighbors
  • Matrix Factorization
  • Factorization Machines
  • API Support: GOAI API support
  • Data.table support

More precise information can be found in the milestones list.

Solver Classes

Among others, the solver can be used for the following classes of problems:

  • GLM: Lasso, Ridge Regression, Logistic Regression, Elastic Net Regularization
  • KMeans
  • Gradient Boosting Machine (GBM) via XGBoost
  • Singular Value Decomposition (SVD) + Truncated Singular Value Decomposition
  • Principal Components Analysis (PCA)

Benchmarks

Our benchmarking plan is to clearly highlight when modeling benefits from the GPU (usually complex models) or does not (e.g. one-shot simple models dominated by data transfer).

We have benchmarked h2o4gpu, scikit-learn, and h2o-3 on a variety of solvers. Some benchmarks have been performed for a few selected cases that highlight the GPU capabilities (i.e. compute or on-GPU memory operations dominate data transfer to GPU from host):

Benchmarks for GLM, KMeans, and XGBoost for CPU vs. GPU.

A suite of benchmarks is computed when running "make testperf" from a build directory. This takes all of our tests and benchmarks h2o4gpu against h2o-3. These will soon be presented as live commit-by-commit streaming plots on a website.

Contributing

Please refer to our CONTRIBUTING.md and DEVEL.md for instructions on how to build and test the project and how to contribute. The h2o4gpu Gitter chatroom can be used for discussion related to open source development.

GitHub issues are used for bugs, feature and enhancement discussion/tracking.

Questions

Please ask all code-related questions on StackOverflow using the "h2o4gpu" tag.

Questions related to the roadmap can be directed to the developers on Gitter.

Troubleshooting

FAQ

References

  1. Parameter Selection and Pre-Conditioning for a Graph Form Solver -- C. Fougner and S. Boyd
  2. Block Splitting for Distributed Optimization -- N. Parikh and S. Boyd
  3. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers -- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein
  4. Proximal Algorithms -- N. Parikh and S. Boyd

Download Details:

Author: h2oai
Source Code: https://github.com/h2oai/h2o4gpu 
License: Apache-2.0 license

#r #python #machinelearning #cpu #gpu 


Doc: Get Usage and Health Data About Your Node.js Process

doc    

Get usage and health data about your Node.js process.

doc is a small module that helps you collect health metrics about your Node.js process. It does that by using only the API available on Node itself (no native dependencies). It doesn't have any ties with an APM platform, so you are free to use anything you want for that purpose. Its API lets you access both computed and raw values, where possible.

Installation

latest stable version

$ npm i @dnlup/doc

latest development version

$ npm i @dnlup/doc@next

Usage

You can import the module by using either CommonJS or ESM.

By default doc returns a Sampler instance that collects metrics about cpu, memory usage, event loop delay and event loop utilization (only on Node versions that support it).

Importing with CommonJS

const doc = require('@dnlup/doc')

const sampler = doc() // Use the default options

sampler.on('sample', () => {
  doStuffWithCpuUsage(sampler.cpu.usage)
  doStuffWithMemoryUsage(sampler.memory)
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
})

Importing with ESM

import doc from '@dnlup/doc'

const sampler = doc()

sampler.on('sample', () => {
  doStuffWithCpuUsage(sampler.cpu.usage)
  doStuffWithMemoryUsage(sampler.memory)
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
})

Note

A Sampler holds a snapshot of the metrics taken at the specified sample interval. This behavior makes the instance stateful. On every tick, a new snapshot will overwrite the previous one.

Enable/disable metrics collection

You can disable the metrics that you don't need.

const doc = require('@dnlup/doc')

// Collect only the event loop delay
const sampler = doc({ collect: { cpu: false, memory: false } })

sampler.on('sample', () => {
  // `sampler.cpu` will be `undefined`
  // `sampler.memory` will be `undefined`
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
})

You can enable more metrics if you need them.

Garbage collection

const doc = require('@dnlup/doc')

const sampler = doc({ collect: { gc: true } })
sampler.on('sample', () => {
  doStuffWithCpuUsage(sampler.cpu.usage)
  doStuffWithMemoryUsage(sampler.memory)
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
  doStuffWithGarbageCollectionDuration(sampler.gc.pause)
})

Active handles

const doc = require('@dnlup/doc')

const sampler = doc({ collect: { activeHandles: true } })

sampler.on('sample', () => {
  doStuffWithCpuUsage(sampler.cpu.usage)
  doStuffWithMemoryUsage(sampler.memory)
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
  doStuffWithActiveHandles(sampler.activeHandles)
})

Examples

You can find more examples in the examples folder.

API

doc([options])

It creates a metrics Sampler instance with the given options.

Class: doc.Sampler

Metrics sampler.

It collects the selected metrics at a regular interval. A Sampler instance is stateful so, on each tick, only the values of the last sample are available. Each time the sampler emits the sample event, it will overwrite the previous one.

new doc.Sampler([options])

  • options <Object>
    • sampleInterval <number>: sample interval (ms) at which samples are taken; every sampleInterval ms a sample event is emitted. Default: 500 on Node < 11.10.0, 1000 otherwise. Under the hood, the package uses monitorEventLoopDelay when available to track the event loop delay, and this allows increasing the default sampleInterval.
    • autoStart <boolean>: automatically start collecting metrics. Default: true.
    • unref <boolean>: unref the timer used to schedule the sampling interval. Default: true.
    • gcOptions <Object>: Garbage collection options
    • eventLoopDelayOptions <Object>: Options to setup monitorEventLoopDelay. Default: { resolution: 10 }
    • collect <Object>: enable/disable the collection of specific metrics.
      • cpu <boolean>: enable cpu metric. Default: true.
      • resourceUsage <boolean>: enable resourceUsage metric. Default: false.
      • eventLoopDelay <boolean>: enable eventLoopDelay metric. Default: true.
      • eventLoopUtilization <boolean>: enable eventLoopUtilization metric. Default: true on Node version 12.19.0 and newer.
      • memory <boolean>: enable memory metric. Default: true.
      • gc <boolean>: enable garbage collection metric. Default: false.
      • activeHandles <boolean>: enable active handles collection metric. Default: false.

If options.collect.resourceUsage is set to true, options.collect.cpu will be set to false because the cpu metric is already available in the resource usage metric.
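
For example (a small sketch of the interaction described above):

const doc = require('@dnlup/doc')

// Enabling resourceUsage implicitly disables the separate cpu metric.
// It requires a Node version where process.resourceUsage() is available
// (see doc.resourceUsageSupported).
const sampler = doc({ collect: { resourceUsage: true } })

sampler.on('sample', () => {
  // `sampler.cpu` is undefined here; cpu usage comes from resourceUsage.
  console.log(sampler.resourceUsage.cpu) // cpu usage in percentage
  console.log(sampler.resourceUsage.raw) // raw process.resourceUsage() value
})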

Event: 'sample'

Emitted every sampleInterval; it signals that the sampler has collected new data.

sampler.start()

Start collecting metrics.

sampler.stop()

Stop collecting metrics.

sampler.cpu

Cpu metric instance.

sampler.resourceUsage

Resource usage metric instance.

sampler.eventLoopDelay

Event loop delay metric instance.

sampler.eventLoopUtilization

Event loop utilization metric instance.

sampler.gc

Garbage collector metric instance.

sampler.activeHandles

  • <number>

Number of active handles returned by process._getActiveHandles().

sampler.memory

  • <object>

Object returned by process.memoryUsage().

Class: CpuMetric

It exposes both computed and raw values of the cpu usage.

cpuMetric.usage

  • <number>

Cpu usage in percentage.

cpuMetric.raw

  • <object>

Raw value returned by process.cpuUsage().

Class: ResourceUsageMetric

It exposes both computed and raw values of the process resource usage.

resourceUsage.cpu

  • <number>

Cpu usage in percentage.

resourceUsage.raw

  • <object>

Raw value returned by process.resourceUsage().

Class: EventLoopDelayMetric

It exposes both computed and raw values about the event loop delay.

eventLoopDelay.computed

  • <number>

Event loop delay in milliseconds. On Node versions that support monitorEventLoopDelay, it computes this value using the mean of the Histogram instance. Otherwise, it uses a simple timer to calculate it.

eventLoopDelay.raw

  • <Histogram> | <number>

On Node versions that support monitorEventLoopDelay this exposes the Histogram instance. Otherwise, it exposes the raw delay value in nanoseconds.

eventLoopDelay.compute(raw)

  • raw <number> The raw value obtained using the Histogram API.
  • Returns <number> The computed delay value.

This function works only on Node versions that support monitorEventLoopDelay. It allows getting computed values of the event loop delay from statistics other than the mean of the Histogram instance.
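
For instance, one might compute the delay at a percentile instead of the mean (a sketch; it assumes a Node version with monitorEventLoopDelay, so eventLoopDelay.raw is a Histogram):

const doc = require('@dnlup/doc')

const sampler = doc()

sampler.on('sample', () => {
  // Histogram#percentile() is part of Node's perf_hooks Histogram API.
  const p99 = sampler.eventLoopDelay.raw.percentile(99)
  console.log(sampler.eventLoopDelay.compute(p99)) // delay in milliseconds
})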

Class: EventLoopUtilizationMetric

It exposes statistics about the event loop utilization.

eventLoopUtilization.idle

  • <number>

The idle value in the object returned by performance.eventLoopUtilization() during the sampleInterval window.

eventLoopUtilization.active

  • <number>

The active value in the object returned by performance.eventLoopUtilization() during the sampleInterval window.

eventLoopUtilization.utilization

  • <number>

The utilization value in the object returned by performance.eventLoopUtilization() during the sampleInterval window.

eventLoopUtilization.raw

  • <object>

Raw value returned by performance.eventLoopUtilization() during the sampleInterval window.

Class: GCMetric

It exposes the garbage collector activity statistics in the specified sampleInterval using hdr histograms.
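
For example, one might log the mean garbage collection pause on every sample (a sketch using the gcMetric.pause and GCEntry properties documented below):

const doc = require('@dnlup/doc')

const sampler = doc({ collect: { gc: true } })

sampler.on('sample', () => {
  // `pause` is a GCEntry; its timing values are expressed in nanoseconds.
  console.log(sampler.gc.pause.mean / 1e6, 'ms mean GC pause')
})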

new GCMetric(options)

  • options <object>: Configuration options

gcMetric.pause

It tracks the global activity of the garbage collector.

gcMetric.major

The activity of the operation of type major. It's present only if GCMetric has been created with the option aggregate equal to true.

See performanceEntry.kind.

gcMetric.minor

The activity of the operation of type minor. It's present only if GCMetric has been created with the option aggregate equal to true.

See performanceEntry.kind.

gcMetric.incremental

The activity of the operation of type incremental. It's present only if GCMetric has been created with the option aggregate equal to true.

See performanceEntry.kind.

gcMetric.weakCb

The activity of the operation of type weakCb. It's present only if GCMetric has been created with the option aggregate equal to true.

See performanceEntry.kind.

Class: GCEntry

It contains garbage collection data, represented with an hdr histogram. All timing values are expressed in nanoseconds.

new GCEntry()

The initialization doesn't require options. It is created internally by a GCMetric.

gcEntry.totalDuration

  • <number>

It is the total time of the entry in nanoseconds.

gcEntry.totalCount

  • <number>

It is the total number of operations counted.

gcEntry.mean

  • <number>

It is the mean value of the entry in nanoseconds.

gcEntry.max

  • <number>

It is the maximum value of the entry in nanoseconds.

gcEntry.min

  • <number>

It is the minimum value of the entry in nanoseconds.

gcEntry.stdDeviation

  • <number>

It is the standard deviation of the entry in nanoseconds.

gcEntry.summary

  • <object>

The hdr histogram summary. See https://github.com/HdrHistogram/HdrHistogramJS#record-values-and-retrieve-metrics.

gcEntry.getPercentile(percentile)

  • percentile <number>: Get a percentile from the histogram.
  • Returns <number> The percentile

See https://github.com/HdrHistogram/HdrHistogramJS#record-values-and-retrieve-metrics.

Class: GCAggregatedEntry

It extends GCEntry and contains garbage collection data plus the flags associated with it (see https://nodejs.org/docs/latest-v12.x/api/perf_hooks.html#perf_hooks_performanceentry_flags).

new GCAggregatedEntry()

The initialization doesn't require options. It is created internally by a GCMetric.

gcAggregatedEntry.flags

  • <object>

This object contains the various hdr histograms of each flag.

gcAggregatedEntry.flags.no

gcAggregatedEntry.flags.constructRetained

gcAggregatedEntry.flags.forced

gcAggregatedEntry.flags.synchronousPhantomProcessing

gcAggregatedEntry.flags.allAvailableGarbage

gcAggregatedEntry.flags.allExternalMemory

gcAggregatedEntry.flags.scheduleIdle

doc.eventLoopUtilizationSupported

  • <boolean>

It tells if the Node.js version in use supports the eventLoopUtilization metric.

doc.resourceUsageSupported

  • <boolean>

It tells if the Node.js version in use supports the resourceUsage metric.

doc.gcFlagsSupported

  • <boolean>

It tells if the Node.js version in use supports GC flags.

doc.errors

The errors object exports all the custom errors used by the module.

  Error                | Error Code            | Description
  -------------------- | --------------------- | -----------
  InvalidArgumentError | DOC_ERR_INVALID_ARG   | An invalid option or argument was used
  NotSupportedError    | DOC_ERR_NOT_SUPPORTED | A metric is not supported on the Node.js version used
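
A sketch of how these could be checked (the invalid option used here to trigger the error is hypothetical):

const doc = require('@dnlup/doc')

try {
  doc({ sampleInterval: -1 }) // hypothetical invalid option
} catch (err) {
  if (err instanceof doc.errors.InvalidArgumentError) {
    console.error(err.code) // 'DOC_ERR_INVALID_ARG'
  }
}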

Download Details:

Author: Dnlup
Source Code: https://github.com/dnlup/doc 
License: ISC license

#javascript #node #cpu #metrics 


Ruby: Parallel Processing Made Simple and Fast

Parallel

Run any code in parallel Processes (use all CPUs), Threads (speed up blocking operations), or Ractors (use all CPUs).
Best suited for map-reduce or e.g. parallel downloads/uploads.

Install

gem install parallel

Usage

# 2 CPUs -> work in 2 processes (a,b + c)
results = Parallel.map(['a','b','c']) do |one_letter|
  SomeClass.expensive_calculation(one_letter)
end

# 3 Processes -> finished after 1 run
results = Parallel.map(['a','b','c'], in_processes: 3) { |one_letter| SomeClass.expensive_calculation(one_letter) }

# 3 Threads -> finished after 1 run
results = Parallel.map(['a','b','c'], in_threads: 3) { |one_letter| SomeClass.expensive_calculation(one_letter) }

# 3 Ractors -> finished after 1 run
results = Parallel.map(['a','b','c'], in_ractors: 3, ractor: [SomeClass, :expensive_calculation])

Same can be done with each

Parallel.each(['a','b','c']) { |one_letter| ... }

or each_with_index, map_with_index, flat_map
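
For example (a quick sketch):

Parallel.each_with_index(['a', 'b', 'c']) do |item, index|
  puts "#{index}: #{item}"
end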

Produce one item at a time with a lambda (anything that responds to .call) or a Queue.

items = [1,2,3]
Parallel.each( -> { items.pop || Parallel::Stop }) { |number| ... }

Also supports any? or all?

Parallel.any?([1,2,3,4,5,6,7]) { |number| number == 4 }
# => true

Parallel.all?([1,2,nil,4,5]) { |number| number != nil }
# => false

Processes/Threads are workers; they grab the next piece of work when they finish.

Processes

  • Speedup through multiple CPUs
  • Speedup for blocking operations
  • Variables are protected from change
  • Extra memory used
  • Child processes are killed when your main process is killed through Ctrl+c or kill -2

Threads

  • Speedup for blocking operations
  • Variables can be shared/modified
  • No extra memory used

Ractors

  • Ruby 3.0+ only
  • Speedup for blocking operations
  • No extra memory used
  • Very fast to spawn
  • Experimental and unstable
  • start and finish hooks are called on main thread
  • Variables must be passed in Parallel.map([1,2,3].map { |i| [i, ARGV, local_var] }, ...
  • use Ractor.make_shareable to pass in global objects

ActiveRecord

Connection Lost

  • Multithreading needs connection pooling, forks need reconnects
  • Adjust connection pool size in config/database.yml when multithreading
# reproducibly fixes things (spec/cases/map_with_ar.rb)
Parallel.each(User.all, in_processes: 8) do |user|
  user.update_attribute(:some_attribute, some_value)
end
User.connection.reconnect!

# maybe helps: explicitly use connection pool
Parallel.each(User.all, in_threads: 8) do |user|
  ActiveRecord::Base.connection_pool.with_connection do
    user.update_attribute(:some_attribute, some_value)
  end
end

# maybe helps: reconnect once inside every fork
Parallel.each(User.all, in_processes: 8) do |user|
  @reconnected ||= User.connection.reconnect! || true
  user.update_attribute(:some_attribute, some_value)
end

NameError: uninitialized constant

A race happens when ActiveRecord models are autoloaded inside parallel threads in environments that lazy-load, like development, test, or migrations.

To fix, load the classes before the parallel block with either require '<modelname>' or ModelName.class.

Break

Parallel.map([1, 2, 3]) do |i|
  raise Parallel::Break # -> stops after all current items are finished
end
Parallel.map([1, 2, 3]) { |i| raise Parallel::Break, i if i == 2 } == 2

Kill

Only use if whatever is executing in the sub-command is safe to kill at any point

Parallel.map([1,2,3]) do |x|
  raise Parallel::Kill if x == 1 # -> stop all sub-processes, killing them instantly
  sleep 100 # Do stuff
end

Progress / ETA

# gem install ruby-progressbar

Parallel.map(1..50, progress: "Doing stuff") { sleep 1 }

# Doing stuff | ETA: 00:00:02 | ====================               | Time: 00:00:10

Use :finish or :start hook to get progress information.

  • :start has item and index
  • :finish has item, index, result

They are called on the main process and protected with a mutex.

Parallel.map(1..100, finish: -> (item, i, result) { ... do something ... }) { sleep 1 }

NOTE: If all you are trying to do is get the index, it is much more performant to use each_with_index instead.

Worker number

Use Parallel.worker_number to determine the worker slot in which your task is running.

Parallel.each(1..5, :in_processes => 2) { |i| puts "Item: #{i}, Worker: #{Parallel.worker_number}" }
Item: 1, Worker: 1
Item: 2, Worker: 0
Item: 3, Worker: 1
Item: 4, Worker: 0
Item: 5, Worker: 1

Tips

Here are a few notable options.

  •  [Benchmark/Test] Disable threading/forking with in_threads: 0 or in_processes: 0, great to test performance or to debug parallel issues (see the sketch after this list)
  • [Isolation] Do not reuse previous worker processes: isolation: true
  •  [Stop all processes with an alternate interrupt signal] 'INT' (from ctrl+c) is caught by default. Catch 'TERM' (from kill) with interrupt_signal: 'TERM'
  • [Process count via ENV] PARALLEL_PROCESSOR_COUNT=16 will use 16 instead of the number of processors detected. This is used to reconfigure a tool using parallel without inserting custom logic.
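
For instance, the debugging option from the first tip can be used like this (a sketch):

# Runs the block sequentially in the current process, which makes it much
# easier to debug issues that only appear when running in parallel.
results = Parallel.map([1, 2, 3], in_processes: 0) { |i| i * 2 }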

TODO

  • Replace Signal trapping with simple rescue Interrupt handler

Author: Grosser
Source Code: https://github.com/grosser/parallel 
License: MIT license

#ruby #parallel #cpu 


iStats: A Menubar App to Show CPU and Memory Usage for Mac

iStats

An Electron app in the Mac menubar that displays CPU and memory stats in a dropdown panel.

[App preview GIF and screenshots]

Download

Please download it from the releases page.

Components Used

  • os-usage - Node module to track Mac OS system usage in real time
  • menubar - high level way to create menubar desktop applications with electron

Run it on your Mac

  • Clone this repo https://github.com/ningt/iStats.git
  • cd iStats
  • run npm install && npm start

Author: Ningt
Source Code: https://github.com/ningt/iStats 
License: 

#electron #mac #cpu #javascript 

iStats: A Menubar App to Show CPU and Memory Usage for Mac
Oral  Brekke

Oral Brekke

1655913136

Doc: Get Usage and Health Data About Your Node.js Process

doc

Get usage and health data about your Node.js process.

doc is a small module that helps you collect health metrics about your Node.js process. It does that by using only the API available on Node itself (no native dependencies). It doesn't have any ties with an APM platform, so you are free to use anything you want for that purpose. Its API lets you access both computed and raw values, where possible.

Installation

latest stable version

$ npm i @dnlup/doc

latest development version

$ npm i @dnlup/doc@next

Usage

You can import the module by using either CommonJS or ESM.

By default doc returns a Sampler instance that collects metrics about cpu, memory usage, event loop delay and event loop utilization (only on Node versions that support it).

Importing with CommonJS

const doc = require('@dnlup/doc')

const sampler = doc() // Use the default options

sampler.on('sample', () => {
  doStuffWithCpuUsage(sampler.cpu.usage)
  doStuffWithMemoryUsage(sampler.memory)
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
})

Importing with ESM

import doc from '@dnlup/doc'

const sampler = doc()

sampler.on('sample', () => {
  doStuffWithCpuUsage(sampler.cpu.usage)
  doStuffWithMemoryUsage(sampler.memory)
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
})

Note

A Sampler holds a snapshot of the metrics taken at the specified sample interval. This behavior makes the instance stateful. On every tick, a new snapshot will overwrite the previous one.
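
For example, a minimal sketch of keeping your own history of samples by copying the values you care about on each tick, before the next snapshot replaces them:

const doc = require('@dnlup/doc')

const sampler = doc()
const history = []

sampler.on('sample', () => {
  // Copy what you need now: the next tick overwrites the snapshot
  history.push({
    cpu: sampler.cpu.usage,
    rss: sampler.memory.rss,
    eventLoopDelay: sampler.eventLoopDelay.computed
  })
})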

Enable/disable metrics collection

You can disable the metrics that you don't need.

const doc = require('@dnlup/doc')

// Collect only the event loop metrics
const sampler = doc({ collect: { cpu: false, memory: false } })

sampler.on('sample', () => {
  // `sampler.cpu` will be `undefined`
  // `sampler.memory` will be `undefined`
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
})

You can enable more metrics if you need them.

Garbage collection

const doc = require('@dnlup/doc')

const sampler = doc({ collect: { gc: true } })
sampler.on('sample', () => {
  doStuffWithCpuUsage(sampler.cpu.usage)
  doStuffWithMemoryUsage(sampler.memory)
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
  doStuffWithGarbageCollectionDuration(sampler.gc.pause)
})

Active handles

const doc = require('@dnlup/doc')

const sampler = doc({ collect: { activeHandles: true } })

sampler.on('sample', () => {
  doStuffWithCpuUsage(sampler.cpu.usage)
  doStuffWithMemoryUsage(sampler.memory)
  doStuffWithEventLoopDelay(sampler.eventLoopDelay.computed)
  doStuffWithEventLoopUtilization(sampler.eventLoopUtilization.utilization) // Available only on Node versions that support it
  doStuffWithActiveHandles(sampler.activeHandles)
})

Examples

You can find more examples in the examples folder.

API

doc([options])

It creates a metrics Sampler instance with the given options.

Class: doc.Sampler

Metrics sampler.

It collects the selected metrics at a regular interval. A Sampler instance is stateful so, on each tick, only the values of the last sample are available. Each time the sampler emits the sample event, it will overwrite the previous one.

new doc.Sampler([options])

  • options <Object>
    • sampleInterval <number>: the interval (in ms) at which samples are taken; every sampleInterval ms a sample event is emitted. Default: 500 on Node < 11.10.0, 1000 otherwise. Under the hood the package uses monitorEventLoopDelay, when available, to track the event loop delay, which allows a larger default sampleInterval.
    • autoStart <boolean>: start automatically to collect metrics. Default: true.
    • unref <boolean>: unref the timer used to schedule the sampling interval. Default: true.
    • gcOptions <Object>: Garbage collection options
    • eventLoopDelayOptions <Object>: Options to setup monitorEventLoopDelay. Default: { resolution: 10 }
    • collect <Object>: enable/disable the collection of specific metrics.
      • cpu <boolean>: enable cpu metric. Default: true.
      • resourceUsage <boolean>: enable resourceUsage metric. Default: false.
      • eventLoopDelay <boolean>: enable eventLoopDelay metric. Default: true.
      • eventLoopUtilization <boolean>: enable eventLoopUtilization metric. Default: true on Node version 12.19.0 and newer.
      • memory <boolean>: enable memory metric. Default: true.
      • gc <boolean>: enable garbage collection metric. Default: false.
      • activeHandles <boolean>: enable active handles collection metric. Default: false.

If options.collect.resourceUsage is set to true, options.collect.cpu will be set to false because the cpu metric is already available in the resource usage metric.
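
For example (a sketch; doStuffWithCpuUsage is the same kind of placeholder used in the examples above):

const doc = require('@dnlup/doc')

const sampler = doc({ collect: { resourceUsage: true } })

sampler.on('sample', () => {
  // `sampler.cpu` is undefined here because enabling resourceUsage
  // disables the cpu metric; read the cpu percentage from resourceUsage
  doStuffWithCpuUsage(sampler.resourceUsage.cpu)
})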

Event: 'sample'

Emitted every sampleInterval milliseconds, it signals that the sampler has collected new data.

sampler.start()

Start collecting metrics.

sampler.stop()

Stop collecting metrics.
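
A possible pattern, if you want to control the measurement window yourself, is to disable autoStart and drive the sampler manually (a sketch, with the usual placeholder callback):

const doc = require('@dnlup/doc')

const sampler = doc({ autoStart: false })

sampler.on('sample', () => {
  doStuffWithCpuUsage(sampler.cpu.usage)
})

sampler.start() // begin emitting 'sample' events

// ...later, when the measurement window is over:
sampler.stop()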

sampler.cpu

Cpu metric instance.

sampler.resourceUsage

Resource usage metric instance.

sampler.eventLoopDelay

Event loop delay metric instance.

sampler.eventLoopUtilization

Event loop utilization metric instance.

sampler.gc

Garbage collector metric instance.

sampler.activeHandles

  • <number>

Number of active handles returned by process._getActiveHandles().

sampler.memory

  • <object>

Object returned by process.memoryUsage().

Class: CpuMetric

It exposes both computed and raw values of the cpu usage.

cpuMetric.usage

  • <number>

Cpu usage in percentage.

cpuMetric.raw

  • <object>

Raw value returned by process.cpuUsage().

Class: ResourceUsageMetric

It exposes both computed and raw values of the process resource usage.

resourceUsage.cpu

  • <number>

Cpu usage in percentage.

resourceUsage.raw

  • <object>

Raw value returned by process.resourceUsage().

Class: EventLoopDelayMetric

It exposes both computed and raw values about the event loop delay.

eventLoopDelay.computed

  • <number>

Event loop delay in milliseconds. On Node versions that support monitorEventLoopDelay, it computes this value using the mean of the Histogram instance. Otherwise, it uses a simple timer to calculate it.

eventLoopDelay.raw

  • <Histogram> | <number>

On Node versions that support monitorEventLoopDelay this exposes the Histogram instance. Otherwise, it exposes the raw delay value in nanoseconds.

eventLoopDelay.compute(raw)

  • raw <number> The raw value obtained using the Histogram API.
  • Returns <number> The computed delay value.

This function works only on Node versions that support monitorEventLoopDelay. It lets you compute the event loop delay from statistics other than the mean of the Histogram instance.
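
For example, a sketch of extracting a percentile instead of the mean, assuming a Node version where monitorEventLoopDelay is available (so raw is a Histogram exposing percentile()):

const doc = require('@dnlup/doc')

const sampler = doc()

sampler.on('sample', () => {
  const histogram = sampler.eventLoopDelay.raw
  // 99th percentile of the raw histogram (nanoseconds), converted by
  // compute() into the same unit as eventLoopDelay.computed
  const p99 = sampler.eventLoopDelay.compute(histogram.percentile(99))
  doStuffWithEventLoopDelay(p99)
})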

Class: EventLoopUtilizationMetric

It exposes statistics about the event loop utilization.

eventLoopUtilization.idle

  • <number>

The idle value in the object returned by performance.eventLoopUtilization() during the sampleInterval window.

eventLoopUtilization.active

  • <number>

The active value in the object returned by performance.eventLoopUtilization() during the sampleInterval window.

eventLoopUtilization.utilization

  • <number>

The utilization value in the object returned by performance.eventLoopUtilization() during the sampleInterval window.

eventLoopUtilization.raw

  • <object>

Raw value returned by performance.eventLoopUtilization() during the sampleInterval window.

Class: GCMetric

It exposes the garbage collector activity statistics in the specified sampleInterval using hdr histograms.

new GCMetric(options)

  • options <object>: Configuration options

gcMetric.pause

It tracks the global activity of the garbage collector.

gcMetric.major

The activity of the operation of type major. It's present only if GCMetric has been created with the option aggregate equal to true.

See performanceEntry.kind.

gcMetric.minor

The activity of the operation of type minor. It's present only if GCMetric has been created with the option aggregate equal to true.

See performanceEntry.kind.

gcMetric.incremental

The activity of the operation of type incremental. It's present only if GCMetric has been created with the option aggregate equal to true.

See performanceEntry.kind.

gcMetric.weakCb

The activity of the operation of type weakCb. It's present only if GCMetric has been created with the option aggregate equal to true.

See performanceEntry.kind.
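
The exact shape of gcOptions is not documented here, so the following is only an illustration: it assumes gcOptions accepts the aggregate flag mentioned above, and doStuffWithMajorGCActivity is a hypothetical placeholder.

const doc = require('@dnlup/doc')

// NOTE: `gcOptions: { aggregate: true }` is an assumption based on the
// `aggregate` option mentioned above; check the module for the exact shape
const sampler = doc({ collect: { gc: true }, gcOptions: { aggregate: true } })

sampler.on('sample', () => {
  doStuffWithGarbageCollectionDuration(sampler.gc.pause)
  if (sampler.gc.major) {
    doStuffWithMajorGCActivity(sampler.gc.major)
  }
})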

Class: GCEntry

It contains garbage collection data, represented with an hdr histogram. All timing values are expressed in nanoseconds.

new GCEntry()

The initialization doesn't require options. It is created internally by a GCMetric.

gcEntry.totalDuration

  • <number>

It is the total time of the entry in nanoseconds.

gcEntry.totalCount

  • <number>

It is the total number of operations counted.

gcEntry.mean

  • <number>

It is the mean value of the entry in nanoseconds.

gcEntry.max

  • <number>

It is the maximum value of the entry in nanoseconds.

gcEntry.min

  • <number>

It is the minimum value of the entry in nanoseconds.

gcEntry.stdDeviation

  • <number>

It is the standard deviation of the entry in nanoseconds.

gcEntry.summary

  • <object>

The hdr histogram summary. See https://github.com/HdrHistogram/HdrHistogramJS#record-values-and-retrieve-metrics.

gcEntry.getPercentile(percentile)

  • percentile <number>: the percentile to retrieve from the histogram.
  • Returns <number> The requested percentile value

See https://github.com/HdrHistogram/HdrHistogramJS#record-values-and-retrieve-metrics.
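
For instance, a sketch that reads summary statistics from the pause entry on each sample, using only the properties documented above:

const doc = require('@dnlup/doc')

const sampler = doc({ collect: { gc: true } })

sampler.on('sample', () => {
  const pause = sampler.gc.pause
  console.log('gc operations:', pause.totalCount)
  console.log('mean pause (ns):', pause.mean)
  console.log('p99 pause (ns):', pause.getPercentile(99))
})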

Class: GCAggregatedEntry

It extends GCEntry and contains garbage collection data plus the flags associated with it (see https://nodejs.org/docs/latest-v12.x/api/perf_hooks.html#perf_hooks_performanceentry_flags).

new GCAggregatedEntry()

The initialization doesn't require options. It is created internally by a GCMetric.

gcAggregatedEntry.flags

  • <object>

This object contains the various hdr histograms of each flag.

gcAggregatedEntry.flags.no

gcAggregatedEntry.flags.constructRetained

gcAggregatedEntry.flags.forced

gcAggregatedEntry.flags.synchronousPhantomProcessing

gcAggregatedEntry.flags.allAvailableGarbage

gcAggregatedEntry.flags.allExternalMemory

gcAggregatedEntry.flags.scheduleIdle

doc.eventLoopUtilizationSupported

  • <boolean>

It tells if the Node.js version in use supports the eventLoopUtilization metric.

doc.resourceUsageSupported

  • <boolean>

It tells if the Node.js version in use supports the resourceUsage metric.

doc.gcFlagsSupported

  • <boolean>

It tells if the Node.js version in use supports GC flags.
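
These flags are handy for guarding optional metrics; for example (doStuffWithResourceUsage is a placeholder in the style of the examples above):

const doc = require('@dnlup/doc')

// Enable the resource usage metric only where the runtime supports it
const sampler = doc({
  collect: { resourceUsage: doc.resourceUsageSupported }
})

sampler.on('sample', () => {
  if (doc.resourceUsageSupported) {
    doStuffWithResourceUsage(sampler.resourceUsage.raw)
  }
})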

doc.errors

The errors object exports all the custom errors used by the module.

Error                  Error Code             Description
InvalidArgumentError   DOC_ERR_INVALID_ARG    An invalid option or argument was used
NotSupportedError      DOC_ERR_NOT_SUPPORTED  A metric is not supported on the Node.js version used
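
A sketch of how you might distinguish these errors when constructing a sampler from user-provided options; whether this exact option value throws is an assumption, the point is the instanceof check against doc.errors:

const doc = require('@dnlup/doc')

let sampler
try {
  // A negative interval is used only to illustrate the error path
  sampler = doc({ sampleInterval: -1 })
} catch (err) {
  if (err instanceof doc.errors.InvalidArgumentError) {
    console.error(err.code) // DOC_ERR_INVALID_ARG
  } else {
    throw err
  }
}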

Author: Dnlup
Source Code: https://github.com/dnlup/doc 
License: ISC license

#node #cpu #javascript 

Doc: Get Usage and Health Data About Your Node.js Process

Android Studio: See API Call, Memory and CPU Usage Information with Profiler

In this video, I show you how to see API call, memory, and CPU usage information using the Profiler in Android Studio. I hope you've enjoyed this video.

#android studio #api #cpu

Android Studio: See API Call, Memory and CPU Usage Information with Profiler

How to Fix A Slow WordPress Admin (Dashboard): 17 Tips!

A simple tutorial on speeding up a slow WordPress admin panel (dashboard).
Written tutorial: https://onlinemediamasters.com/slow-wordpress-admin-panel/

0:00 - Intro
00:27 - Disable object cache
00:57 - Remove high CPU plugins + theme
2:38 - Disable WordPress Heartbeat + increase cache lifespan
3:52 - Update technology
4:21 - Increase memory limit
4:42 - Use server-level caching
5:20 - Clean database
6:19 - Disable unused plugin modules
6:35 - Limit post revisions + autosaves
7:42 - Protect the wp-admin area
9:32 - Offload to CDNs
10:16 - Remove admin bloat
11:06 - Disable plugin data sharing
11:31 - Replace WordPress cron with a real cron job
12:06 - Check CPU usage + TTFB
13:34 - Use faster cloud hosting

A slow WordPress admin is almost always caused by high-CPU plugins and page builders, the object cache in W3 Total Cache, or cheap shared hosting. The whole point is to reduce the amount of CPU consumed by your website and plugins while using a powerful server. I would also never run heavy builders and plugins like Elementor, Divi, or WooCommerce on top of shared hosting.

Like and subscribe if you found this helpful :)

Peace out,
Tom

#wordpress #cpu

How to Fix A Slow WordPress Admin (Dashboard): 17 Tips!

Xgboost regression training on CPU and GPU in python

How to unlock the fast training of xgboost models in Python using a GPU

In this article, I walk through the steps needed to train xgboost models on a GPU instead of the default CPU.

Additionally, an analysis of how the training speeds are influenced by the sizes of the matrices and certain hyperparameters is presented as well.

Feel free to clone or fork all the code from here: https://github.com/Eligijus112/xgboost-regression-gpu.

In order to train machine learning models on a GPU, your machine needs a Graphics Processing Unit (GPU), that is, a graphics card. By default, machine learning frameworks look for a Central Processing Unit (CPU) inside the computer.

#machine-learning #python #gpu #regression #cpu #xgboost

Xgboost regression training on CPU and GPU in python

Mohamed Farid

Face Detection in 5 minutes - 70 FPS on CPU

Face detection is the topic of this video. We are going to talk about what it is, how it works, and how to implement 70+ frames-per-second face detection on CPU using the MediaPipe library…all in under 5 minutes.

First up, what is face detection? Well, it's quite intuitive: just detecting faces, right? The more formal definition is that face detection is a technology that enables computers to identify human faces in digital images. It is a type of object-class detection in which the task is to find the locations and sizes of all objects in an image that belong to a given class, in this case faces.

So the reason you'd be interested in using face detection is, as we mentioned earlier, to find out:

  1. How many faces you have in an image,
  2. The size of the face, which can also help determine how far the face is from your camera, and
  3. The location of the face in the image.

Manipulating objects, both physical and virtual, as in the invisibility shield project, is just one of many applications of face detection. The other application is to get people to like and subscribe to this video…haha just kidding.

Face detection is not to be confused with facial recognition, which is the process of identifying each individual rather than treating all faces as the same. We'll cover recognition in a separate video.

There are a massive number of ways in which you can implement face detection. Before deep learning, face detection used to be done using Haar Cascades.

Eventually it evolved into using machine learning for feature extraction with Histogram of Oriented Gradients, also known as HOG, used in the Dlib frontal face detector. Currently, face detection can be implemented with a Deep Neural Network (DNN) using a ResNet-10 architecture.

In our implementation we will use BlazeFace, which is an ultra-fast face detection model that is both lightweight and accurate.

In this tutorial we will be using the MediaPipe implementation of BlazeFace to achieve over 70 frames per second on a 720p video, and on CPU! That's right! No need for expensive CUDA-enabled GPUs.

Subscribe: https://www.youtube.com/c/AugmentedStartups/featured

#cpu #fps

Face Detection in 5 minutes - 70 FPS on CPU