Colossal-AI: Making large AI models cheaper, faster and more accessible.
Colossal-AI provides a collection of parallel components. We aim to let you write your distributed deep learning models just as you would write models on your laptop, and we provide user-friendly tools to kickstart distributed training and inference in a few lines.
- Parallelism strategies
- Heterogeneous Memory Management
- Friendly Usage
- Inference
- 24x larger model size on the same hardware
- Over 3x acceleration
- 2x faster training, or 50% longer sequence length
Please visit our documentation and examples for more details.
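As a rough illustration of what "a few lines" looks like in practice, here is a minimal sketch based on the engine-style API described in the Colossal-AI docs; exact entry points vary between Colossal-AI versions, so treat this as illustrative rather than canonical:

```python
# Illustrative only: a laptop-style PyTorch loop wrapped with Colossal-AI's
# launch/initialize helpers. Exact APIs differ between Colossal-AI versions.
import colossalai
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Read distributed settings from the environment set up by torchrun
    colossalai.launch_from_torch(config={})

    model = nn.Linear(32, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
    train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

    # Colossal-AI wraps everything into an engine that applies the configured
    # parallelism / memory-management strategies behind the usual training calls
    engine, train_loader, _, _ = colossalai.initialize(
        model, optimizer, criterion, train_dataloader=train_loader
    )

    engine.train()
    for x, y in train_loader:
        x, y = x.cuda(), y.cuda()
        engine.zero_grad()
        loss = engine.criterion(engine(x), y)
        engine.backward(loss)
        engine.step()

if __name__ == "__main__":
    main()
```

Assuming the file is saved as train.py, it would be launched with something like torchrun --nproc_per_node=<num_gpus> train.py.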
ColossalChat: An open-source solution for cloning ChatGPT with a complete RLHF pipeline. [code] [blog] [demo]
Acceleration of AIGC (AI-Generated Content) models such as Stable Diffusion v1 and Stable Diffusion v2.
Acceleration of AlphaFold Protein Structure
Requirements:
If you encounter any problems during installation, you may want to raise an issue in this repository.
You can easily install Colossal-AI with the following command. By default, we do not build PyTorch extensions during installation.
pip install colossalai
Note: only Linux is supported for now.
However, if you want to build the PyTorch extensions during installation, you can set CUDA_EXT=1:
CUDA_EXT=1 pip install colossalai
Otherwise, CUDA kernels will be built at runtime when you actually need them.
We also release a nightly version to PyPI on a weekly basis. This allows you to access unreleased features and bug fixes from the main branch. Installation can be done via
pip install colossalai-nightly
The nightly version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problems. :)
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install colossalai
pip install .
By default, we do not compile CUDA/C++ kernels; Colossal-AI will build them at runtime. If you want to install and enable CUDA kernel fusion (compulsory when using fused optimizers):
CUDA_EXT=1 pip install .
You can directly pull the docker image from our DockerHub page. The image is automatically uploaded upon release.
Run the following command to build a docker image from Dockerfile provided.
Building Colossal-AI from scratch requires GPU support; you need to use the Nvidia Docker Runtime as the default when running docker build. More details can be found here. We recommend installing Colossal-AI from our project page directly.
cd ColossalAI
docker build -t colossalai ./docker
Run the following command to start the docker container in interactive mode.
docker run -ti --gpus all --rm --ipc=host colossalai bash
Join the Colossal-AI community on Forum, Slack, and WeChat(微信) to share your suggestions, feedback, and questions with our engineering team.
Referring to the successful attempts of BLOOM and Stable Diffusion, any and all developers and partners with computing power, datasets, or models are welcome to join and build the Colossal-AI community, making efforts towards the era of big AI models!
You may contact us or participate in the following ways:
Thanks so much to all of our amazing contributors!
We leverage the power of GitHub Actions to automate our development, release and deployment workflows. Please check out this documentation on how the automated workflows are operated.
This project is inspired by some related projects (some by our team and some by other organizations). We would like to credit these amazing projects as listed in the Reference List.
To cite this project, you can use the following BibTeX citation.
@article{bian2021colossal,
title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
journal={arXiv preprint arXiv:2110.14883},
year={2021}
}
Colossal-AI has been accepted as official tutorials at top conferences such as SC, AAAI, PPoPP, CVPR, and ISC.
Prof. James Demmel (UC Berkeley): Colossal-AI makes training AI models efficient, easy, and scalable.
Author: hpcaitech
Source Code: https://github.com/hpcaitech/ColossalAI
License: Apache-2.0 license
Shelf makes it easy to create and compose web servers and parts of web servers. How?
See the Dart HTTP server documentation for more information. You may also want to look at package:shelf_router and package:shelf_static as examples of packages that build on and extend package:shelf.
See example/example.dart
import 'package:shelf/shelf.dart';
import 'package:shelf/shelf_io.dart' as shelf_io;
void main() async {
var handler =
const Pipeline().addMiddleware(logRequests()).addHandler(_echoRequest);
var server = await shelf_io.serve(handler, 'localhost', 8080);
// Enable content compression
server.autoCompress = true;
print('Serving at http://${server.address.host}:${server.port}');
}
Response _echoRequest(Request request) =>
Response.ok('Request for "${request.url}"');
A Handler is any function that handles a Request and returns a Response. It can either handle the request itself–for example, a static file server that looks up the requested URI on the filesystem–or it can do some processing and forward it to another handler–for example, a logger that prints information about requests and responses to the command line.
The latter kind of handler is called "middleware", since it sits in the middle of the server stack. Middleware can be thought of as a function that takes a handler and wraps it in another handler to provide additional functionality. A Shelf application is usually composed of many layers of middleware with one or more handlers at the very center; the Pipeline class makes this sort of application easy to construct.
Some middleware can also take multiple handlers and call one or more of them for each request. For example, a routing middleware might choose which handler to call based on the request's URI or HTTP method, while a cascading middleware might call each one in sequence until one returns a successful response.
Middleware that routes requests between handlers should be sure to update each request's handlerPath and url. This allows inner handlers to know where they are in the application so they can do their own routing correctly. This can be easily accomplished using Request.change():
// In an imaginary routing middleware...
var component = request.url.pathSegments.first;
var handler = _handlers[component];
if (handler == null) return Response.notFound(null);
// Create a new request just like this one but with whatever URL comes after
// [component] instead.
return handler(request.change(path: component));
An adapter is any code that creates Request objects, passes them to a handler, and deals with the resulting Response. For the most part, adapters forward requests from and responses to an underlying HTTP server; shelf_io.serve is this sort of adapter. An adapter might also synthesize HTTP requests within the browser using window.location and window.history, or it might pipe requests directly from an HTTP client to a Shelf handler.
An adapter must handle all errors from the handler, including the handler returning a null response. It should print each error to the console if possible, then act as though the handler returned a 500 response. The adapter may include body data for the 500 response, but this body data must not include information about the error that occurred. This ensures that unexpected errors don't result in exposing internal information in production by default; if the user wants to return detailed error descriptions, they should explicitly include middleware to do so.
An adapter should ensure that asynchronous errors thrown by the handler don't cause the application to crash, even if they aren't reported by the future chain. Specifically, these errors shouldn't be passed to the root zone's error handler; however, if the adapter is run within another error zone, it should allow these errors to be passed to that zone. The following function can be used to capture only errors that would otherwise be top-leveled:
/// Run [callback] and capture any errors that would otherwise be top-leveled.
///
/// If [this] is called in a non-root error zone, it will just run [callback]
/// and return the result. Otherwise, it will capture any errors using
/// [runZoned] and pass them to [onError].
void catchTopLevelErrors(
void Function() callback,
void Function(Object error, StackTrace stackTrace) onError,
) {
if (Zone.current.inSameErrorZone(Zone.root)) {
return runZonedGuarded(callback, onError);
} else {
return callback();
}
}
An adapter that knows its own URL should provide an implementation of the Server interface.
When implementing an adapter, some rules must be followed. The adapter must not pass the url or handlerPath parameters to Request; it should only pass requestedUri. If it passes the context parameter, all keys must begin with the adapter's package name followed by a period. If multiple headers with the same name are received, the adapter must collapse them into a single header separated by commas as per RFC 2616 section 4.2.
If the underlying request uses a chunked transfer coding, the adapter must decode the body before passing it to Request and should remove the Transfer-Encoding header. This ensures that message bodies are chunked if and only if the headers declare that they are.
An adapter must not add or modify any entity headers for a response.
If none of the following conditions are true, the adapter must apply chunked transfer coding to a response's body and set its Transfer-Encoding header to chunked:
- The Content-Type header indicates the MIME type multipart/byteranges.
- The Transfer-Encoding header is set to anything other than identity.

Adapters may find the addChunkedEncoding() middleware useful for implementing this behavior, if the underlying server doesn't implement it manually.
When responding to a HEAD request, the adapter must not emit an entity body. Otherwise, it shouldn't modify the entity body in any way.
An adapter should include information about itself in the Server header of the response by default. If the handler returns a response with the Server header set, that must take precedence over the adapter's default header.
An adapter should include the Date header with the time the handler returns a response. If the handler returns a response with the Date header set, that must take precedence.
Run this command:
With Dart:
$ dart pub add shelf
With Flutter:
$ flutter pub add shelf
This will add a line like this to your package's pubspec.yaml (and run an implicit dart pub get):
dependencies:
shelf: ^1.4.0
Alternatively, your editor might support dart pub get or flutter pub get. Check the docs for your editor to learn more.
Now in your Dart code, you can use:
import 'package:shelf/shelf.dart';
Author: tools.dart.dev
Source Code: https://github.com/dart-lang/shelf/tree/master/pkgs/shelf
This is the repo for the Code Alpaca project, which aims to build and share an instruction-following LLaMA model for code generation. This repo is fully based on Stanford Alpaca and only changes the data used for training; the training approach is the same.
The Code Alpaca models are fine-tuned from a 7B and 13B LLaMA model on 20K instruction-following data generated by the techniques in the Self-Instruct [1] paper, with some modifications that we discuss in the next section. Evals are still a todo.
The model is not finetuned to be safe and harmless, so be cautious.
Current release contains the data generation procedure, dataset, and training code. Model weights aren't part of the release for now, to respect OpenAI TOS and LLaMA license.
[1]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560
data/code_alpaca_20k.json contains the 20K instruction-following examples used for fine-tuning the Code Alpaca model. This JSON file is a list of dictionaries; each dictionary contains the following fields:
- instruction: str, describes the task the model should perform. Each of the 20K instructions is unique.
- input: str, optional context or input for the task. For example, when the instruction is "Amend the following SQL query to select distinct elements", the input is the SQL query. Around 40% of the examples have an input.
- output: str, the answer to the instruction as generated by text-davinci-003.

We used the following prompts for fine-tuning the model:
- for examples with a non-empty input field:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:

- for examples with an empty input field:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:

During inference (e.g., for the web demo), we use the user instruction with an empty input field (second option).
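To make the format concrete, here is a minimal sketch (not part of the repo) that loads the dataset and applies the templates above to a single example:

```python
# Minimal sketch: apply the prompt templates above to data/code_alpaca_20k.json.
import json

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(example: dict) -> str:
    """Choose the template based on whether the example has a non-empty input field."""
    if example.get("input"):
        return PROMPT_WITH_INPUT.format(**example)
    return PROMPT_NO_INPUT.format(instruction=example["instruction"])

with open("data/code_alpaca_20k.json") as f:
    examples = json.load(f)

print(build_prompt(examples[0]))
```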
## Data Generation Process
<details>
<summary> <strong> Running the code </strong> </summary>
1. Set environment variables `OPENAI_API_KEY` to your OpenAI API key.
2. Install the dependencies with `pip install -r requirements.txt`.
3. Run `python -m generate_instruction generate_instruction_following_data` to generate the data.
</details>
Data generation pipeline had minor changes from [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- Modified prompt to focus on code generation/editing/optimization tasks instead of general tasks.
- Modified seed tasks to only be related to code generation.
This produced an instruction-following dataset with 20K examples obtained at a much lower cost (less than $200). A smaller 2K-sample dataset, which was used to de-risk the approach and the quality of the model, is also included.
## Fine-tuning
We fine-tuned the models using standard Hugging Face training code and DeepSpeed with the following hyperparameters:
| Hyperparameter | Value |
|----------------|-------|
| Learning rate | 2e-5 |
| Epochs | 3 |
| Max length | 512 |
| Weight decay | 0 |
Given Hugging Face hasn't officially supported the LLaMA models, we fine-tuned LLaMA with Hugging Face's transformers library by installing it from a particular fork (i.e. this [PR](https://github.com/huggingface/transformers/pull/21955) to be merged).
The hash of the specific commit we installed was `68d640f7c368bcaaaecfc678f11908ebbd3d6176`.
The code runs on 8x A100 80GB GPUs, but can also run on 8x A100 40GB or 4x A100 with a lower batch size and more gradient accumulation steps. To get the GPUs, I suggest using [Lambda Labs](https://cloud.lambdalabs.com/login?redirect_to=/instances?), best pricing for the best hardware.
To reproduce the fine-tuning runs for LLaMA, first install the requirements
```bash
pip install -r requirements.txt
```

Then, install the particular fork of Hugging Face's transformers library.
Below is a command that fine-tunes LLaMA-7B with our dataset on a machine with 8 A100 80G GPUs in FSDP full_shard mode. We were able to reproduce a model of similar quality as the one we hosted in our demo with the following command using Python 3.10. Replace <your_random_port> with a port of your own, <your_path_to_hf_converted_llama_ckpt_and_tokenizer> with the path to your converted checkpoint and tokenizer (following instructions in the PR), and <your_output_dir> with where you want to store your outputs.
torchrun --nproc_per_node=8 --master_port=<your_random_port> train.py \
--model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
--data_path ./data/code_alpaca_20k.json \
--fp16 True \
--output_dir <your_output_dir> \
--num_train_epochs 3 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 500 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--deepspeed ds_config.json \
--tf32 False
Note the given training script is meant to be simple and easy to use, and is not particularly optimized.
For convenience I have included convert_to_hf.py to convert LLaMA checkpoints to Hugging Face compatible checkpoints. (This file is taken from the Hugging Face transformers repo.)
Cite this repo if you want to, or don't, both are fine.
@misc{codealpaca,
author = {Sahil Chaudhary},
title = {Code Alpaca: An Instruction-following LLaMA model for code generation},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/sahil280114/codealpaca}},
}
Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2] and the Stanford Alpaca repo.
The repo contains the data generation procedure, the dataset, and the training code. A demo for the model can be found at https://code-alpaca-demo.vercel.app/
Author: Sahil280114
Source Code: https://github.com/sahil280114/codealpaca
License: Apache-2.0 license
MMD model dances on your chrome with WebGL. (MMD(MikuMikuDance) is a 3D CG animation tool)
Probably only Windows Chrome. I haven't checked other platforms.
Probably not yet. I haven't checked.
Boot up your chrome with "--allow-file-access-from-files"
Only .pmd for now. .pmx and .x support will come soon.
Choose a light model. Reduce the number of models shown. Turn off Physics, Stage, Edge, and Post-effect.
No WebGL 3D libraries used, yeah!
Author: Takahirox
Source Code: https://github.com/takahirox/mmd-viewer-js
This is the repo for the Stanford Alpaca project, which aims to build and share an instruction-following LLaMA model. The repo contains:
The current Alpaca model is fine-tuned from a 7B LLaMA model [1] on 52K instruction-following data generated by the techniques in the Self-Instruct [2] paper, with some modifications that we discuss in the next section. In a preliminary human evaluation, we found that the Alpaca 7B model behaves similarly to the text-davinci-003 model on the Self-Instruct instruction-following evaluation suite [2].
Alpaca is still under development, and there are many limitations that have to be addressed. Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless. We thus encourage users to be cautious when interacting with Alpaca, and to report any concerning behavior to help improve the safety and ethical considerations of the model.
Our initial release contains the data generation procedure, dataset, and training recipe. We intend to release the model weights if we are given permission to do so by the creators of LLaMA. For now, we have chosen to host a live demo to help readers better understand the capabilities and limits of Alpaca, as well as a way to help us better evaluate Alpaca's performance on a broader audience.
Please read our release blog post for more details about the model, our discussion of the potential harm and limitations of Alpaca models, and our thought process for releasing a reproducible model.
[1]: LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. https://arxiv.org/abs/2302.13971v1
[2]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560
alpaca_data.json contains the 52K instruction-following examples we used for fine-tuning the Alpaca model. This JSON file is a list of dictionaries; each dictionary contains the following fields:
- instruction: str, describes the task the model should perform. Each of the 52K instructions is unique.
- input: str, optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article. Around 40% of the examples have an input.
- output: str, the answer to the instruction as generated by text-davinci-003.

We used the following prompts for fine-tuning the Alpaca model:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
During inference (e.g., for the web demo), we use the user instruction with an empty input field (second option).
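As an illustrative check of the data format (not part of the repo), the following snippet loads alpaca_data.json and measures how many examples carry a non-empty input field, which should land near the ~40% figure mentioned above:

```python
# Quick look at the released data: count examples with a non-empty input field.
import json

with open("alpaca_data.json") as f:
    examples = json.load(f)

with_input = sum(1 for ex in examples if ex.get("input"))
print(f"{len(examples)} examples, "
      f"{with_input / len(examples):.1%} with a non-empty input field")
```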
We built on the data generation pipeline from self-instruct and made the following modifications:
- We used text-davinci-003 to generate the instruction data instead of davinci.
- We wrote a new prompt (prompt.txt) that explicitly gave the requirement of instruction generation to text-davinci-003. Note: there is a slight error in the prompt we used, and future users should incorporate the edit in #24.

This produced an instruction-following dataset with 52K examples obtained at a much lower cost (less than $500). In a preliminary study, we also found our 52K generated data to be much more diverse than the data released by self-instruct. We plot the figure below (in the style of Figure 2 in the self-instruct paper) to demonstrate the diversity of our data. The inner circle of the plot represents the root verb of the instructions, and the outer circle represents the direct objects.
We fine-tune our models using standard Hugging Face training code with the following hyperparameters:

| Hyperparameter | Value |
|----------------|-------|
| Batch size | 128 |
| Learning rate | 2e-5 |
| Epochs | 3 |
| Max length | 512 |
| Weight decay | 0 |
Given Hugging Face hasn't officially supported the LLaMA models, we fine-tuned LLaMA with Hugging Face's transformers library by installing it from a particular fork (i.e. this PR to be merged). The hash of the specific commit we installed was 68d640f7c368bcaaaecfc678f11908ebbd3d6176.
To reproduce our fine-tuning runs for LLaMA, first install the requirements
pip install -r requirements.txt
Then, install the particular fork of Hugging Face's transformers library.
Below is a command that fine-tunes LLaMA-7B with our dataset on a machine with 4 A100 80G GPUs in FSDP full_shard mode. We were able to reproduce a model of similar quality as the one we hosted in our demo with the following command using Python 3.10. Replace <your_random_port> with a port of your own, <your_path_to_hf_converted_llama_ckpt_and_tokenizer> with the path to your converted checkpoint and tokenizer (following instructions in the PR), and <your_output_dir> with where you want to store your outputs.
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
--model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
--data_path ./alpaca_data.json \
--bf16 True \
--output_dir <your_output_dir> \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
--tf32 True
fsdp_transformer_layer_cls_to_wrap must be set to the name of the specific decoder layer. The LLaMA Hugging Face PR is not stable: earlier commits used the name LLaMADecoderLayer for their decoder layer (the commit hash our code is based on uses this name), while more recent commits use LlamaDecoderLayer (note the casing difference). Not setting fsdp_transformer_layer_cls_to_wrap to the correct name will lead to drastic slowdowns in training.
The same script also works for OPT fine-tuning. Here's an example for fine-tuning OPT-6.7B:
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
--model_name_or_path "facebook/opt-6.7b" \
--data_path ./alpaca_data.json \
--bf16 True \
--output_dir <your_output_dir> \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'OPTDecoderLayer' \
--tf32 True
Note that the given training script is meant to be simple and easy to use, and is not particularly optimized. To run on more GPUs, you may prefer to turn down gradient_accumulation_steps to keep a global batch size of 128. The global batch size has not been tested for optimality.
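For reference, the effective global batch size is per-device batch size × gradient accumulation steps × number of GPUs (4 × 8 × 4 = 128 in the command above); a small helper like the sketch below can be used to pick gradient_accumulation_steps for a different GPU count:

```python
# Keep the effective global batch size fixed when changing the number of GPUs.
def grad_accum_steps(global_batch: int, per_device_batch: int, num_gpus: int) -> int:
    assert global_batch % (per_device_batch * num_gpus) == 0, "not evenly divisible"
    return global_batch // (per_device_batch * num_gpus)

print(grad_accum_steps(128, 4, 4))  # 8, as in the command above
print(grad_accum_steps(128, 4, 8))  # 4, when doubling the GPU count
```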
All grad students below contributed equally and the order is determined by random draw.
All advised by Tatsunori B. Hashimoto. Yann is also advised by Percy Liang and Xuechen is also advised by Carlos Guestrin.
Please cite the repo if you use the data or code in this repo.
@misc{alpaca,
author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
title = {Stanford Alpaca: An Instruction-following LLaMA model},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2].
We thank Yizhong Wang for his help in explaining the data generation pipeline in Self-Instruct and providing the code for the parse analysis plot. We thank Yifan Mai for helpful support, and members of the Stanford NLP Group as well as the Center for Research on Foundation Models (CRFM) for their helpful feedback.
Author: tatsu-lab
Source Code: https://github.com/tatsu-lab/stanford_alpaca
License: Apache-2.0 license
The structure of the Chrono git repository was changed as follows:
- The default branch is now main (previously develop)
- The master branch, now obsolete, was deleted
- Releases are located in branches named release/*.* and have tags of the form *.*.*
Project CHRONO
Distributed under a permissive BSD license, Chrono is an open-source multi-physics package used to model and simulate:
Chrono provides a mature and stable code base that continues to be augmented with new features and modules. The core functionality of Chrono provides support for the modeling, simulation, and visualization of rigid and flexible multibody systems with additional capabilities offered through optional modules. These modules provide support for additional classes of problems (e.g., granular dynamics and fluid-solid interaction), modeling and simulation of specialized systems (such as ground vehicles), co-simulation, run-time visualization, post-processing, interfaces to external linear solvers, or specialized parallel computing algorithms (multi-core, GPU, and distributed) for large-scale simulations.
Used in many different scientific and engineering problems by researchers from academia, industry, and government, Chrono has mature and sophisticated support for multibody dynamics, finite element analysis, granular dynamics, fluid-solid interaction, ground vehicle simulation and vehicle-terrain interaction.
Implemented almost entirely in C++, Chrono also provides Python and C# APIs. The build system is based on CMake. Chrono is platform-independent and is actively tested on Linux, Windows, and MacOS using a variety of compilers.
Author: Projectchrono
Source Code: https://github.com/projectchrono/chrono
License: BSD-3-Clause license
Inference of Facebook's LLaMA model in pure C/C++
The main goal is to run the model using 4-bit quantization on a MacBook
This was hacked in an evening - I have no idea if it works correctly. Please do not make conclusions about the models based on the results from this implementation. For all I know, it can be completely wrong. This project is for educational purposes and is not going to be maintained properly. New features will probably be added mostly through community contributions, if any.
Supported platforms:
Here is a typical run using LLaMA-7B:
make -j && ./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)
make: Nothing to be done for `default'.
main: seed = 1678486056
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
main: prompt: 'Building a website can be done in 10 simple steps:'
main: number of tokens in prompt = 15
1 -> ''
8893 -> 'Build'
292 -> 'ing'
263 -> ' a'
4700 -> ' website'
508 -> ' can'
367 -> ' be'
2309 -> ' done'
297 -> ' in'
29871 -> ' '
29896 -> '1'
29900 -> '0'
2560 -> ' simple'
6576 -> ' steps'
29901 -> ':'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000
Building a website can be done in 10 simple steps:
1) Select a domain name and web hosting plan
2) Complete a sitemap
3) List your products
4) Write product descriptions
5) Create a user account
6) Build the template
7) Start building the website
8) Advertise the website
9) Provide email support
10) Submit the website to search engines
A website is a collection of web pages that are formatted with HTML. HTML is the code that defines what the website looks like and how it behaves.
The HTML code is formatted into a template or a format. Once this is done, it is displayed on the user's browser.
The web pages are stored in a web server. The web server is also called a host. When the website is accessed, it is retrieved from the server and displayed on the user's computer.
A website is known as a website when it is hosted. This means that it is displayed on a host. The host is usually a web server.
A website can be displayed on different browsers. The browsers are basically the software that renders the website on the user's screen.
A website can also be viewed on different devices such as desktops, tablets and smartphones.
Hence, to have a website displayed on a browser, the website must be hosted.
A domain name is an address of a website. It is the name of the website.
The website is known as a website when it is hosted. This means that it is displayed on a host. The host is usually a web server.
A website can be displayed on different browsers. The browsers are basically the software that renders the website on the user’s screen.
A website can also be viewed on different devices such as desktops, tablets and smartphones. Hence, to have a website displayed on a browser, the website must be hosted.
A domain name is an address of a website. It is the name of the website.
A website is an address of a website. It is a collection of web pages that are formatted with HTML. HTML is the code that defines what the website looks like and how it behaves.
The HTML code is formatted into a template or a format. Once this is done, it is displayed on the user’s browser.
A website is known as a website when it is hosted
main: mem per token = 14434244 bytes
main: load time = 1332.48 ms
main: sample time = 1081.40 ms
main: predict time = 31378.77 ms / 61.41 ms per token
main: total time = 34036.74 ms
And here is another demo of running both LLaMA-7B and whisper.cpp on a single M1 Pro MacBook:
Here are the steps for the LLaMA-7B model:
# build this repo
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
# obtain the original LLaMA model weights and place them in ./models
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model
# install Python dependencies
python3 -m pip install torch numpy sentencepiece
# convert the 7B model to ggml FP16 format
python3 convert-pth-to-ggml.py models/7B/ 1
# quantize the model to 4-bits
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
# run the inference
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
For the bigger models, there are a few extra quantization steps. For example, for LLaMA-13B, converting to FP16 format will create 2 ggml files, instead of one:
ggml-model-f16.bin
ggml-model-f16.bin.1
You need to quantize each of them separately like this:
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
Everything else is the same. Simply run:
./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
The number of files generated for each model is as follows:
7B -> 1 file
13B -> 2 files
30B -> 4 files
65B -> 8 files
When running the larger models, make sure you have enough disk space to store all the intermediate files.
If you want a more ChatGPT-like experience, you can run in interactive mode by passing -i as a parameter. In this mode, you can always interrupt generation by pressing Ctrl+C and enter one or more lines of text, which will be converted into tokens and appended to the current context. You can also specify a reverse prompt with the parameter -r "reverse prompt string". This will result in the user being prompted for input whenever the exact tokens of the reverse prompt string are encountered in the generation. A typical use is a prompt which makes LLaMA emulate a chat between multiple users, say Alice and Bob, combined with -r "Alice:".
Here is an example few-shot interaction, invoked with the command
./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 256 --repeat_penalty 1.0 --color -i -r "User:" \
-p \
"Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:"
Note the use of --color to distinguish between user input and generated text.
You can even disable the Accelerate framework by building with LLAMA_NO_ACCELERATE=1 make and the performance will be the same, since no BLAS calls are invoked by the current implementation.
Author: ggerganov
Source Code: https://github.com/ggerganov/llama.cpp
License: MIT license
High-speed download of LLaMA, Facebook's 65B parameter GPT model.
This repository contains a high-speed download of LLaMA, Facebook's 65B parameter model that was recently made available via torrent. (Discussion: Facebook LLAMA is being openly distributed via torrents)
It downloads all model weights (7B, 13B, 30B, 65B) in less than two hours on a Chicago Ubuntu server.
real 98m12.980s
user 8m8.916s
sys 5m7.259s
This works out to 40MB/s (235164838073 bytes in 5892 seconds).
Personally, I just wanted to curl the weights instead of dealing with a torrent. The fact that it's several times faster was just a nice bonus.
To download all model weights, cd into the directory you want them, then run this:
Linux:
curl -o- https://raw.githubusercontent.com/shawwn/llama-dl/56f50b96072f42fb2520b1ad5a1d6ef30351f23c/llama.sh | bash
Mac:
brew install bash
brew install wget
curl -o- https://raw.githubusercontent.com/shawwn/llama-dl/56f50b96072f42fb2520b1ad5a1d6ef30351f23c/llama.sh | $(brew --prefix)/bin/bash
(Sorry mac users; they use some array syntax in the script that isn't supported on the version of bash that ships with Mac.)
Running random bash scripts generally isn't a good idea, but I'll stake my personal reputation on the fact that this link is safe. (It points to a specific SHA-1 hash rather than https://raw.githubusercontent.com/shawwn/llama-dl/main/llama.sh so that it's still safe even in the event that my repo or account got compromised.)
219G (235164838073 bytes) total. Here's a file list with sizes for each.
I ran this:
mkdir LLaMA
cd LLaMA
time curl -o- https://raw.githubusercontent.com/shawwn/llama-dl/56f50b96072f42fb2520b1ad5a1d6ef30351f23c/llama.sh | bash
cd ..
webtorrent 'magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce'
Webtorrent began seeding immediately, which means every file is identical to what you would've gotten via the torrent. So this is just a faster version of the torrent.
Roughly 3.6x. As of March 4 2023, the torrent seems to download at around 11MB/s, which implies a download time of around 6 hours. (Help seed it, if you can.)
I doubt it. This is using the download link that was leaked in the original torrent. (i.e. the leaker accidentally leaked their own unique download link that Facebook sent them.)
Technically, it may be illegal to knowingly use a private download link that was intended for someone else. Realistically, Facebook would risk their ML reputation by going after people who are merely trying to use what they themselves advertise as "open source."
Update: Facebook shut off the link a couple hours after this repo went live. I mirrored everything to R2 and updated the script to point to that instead.
Note that LLaMA was released under a "non-commercial bespoke license". Interestingly, Nvidia had a similar arrangement for StyleGAN, but that didn't stop Artbreeder from using it anyway. Nvidia never seemed to care enough to go after them. But if you launch your own OpenAI API and start charging money, don't be surprised when Facebook's lawyers come knocking.
Update (March 7, 3:35 PM CST): Looking to inference from the model? See https://github.com/shawwn/llama-dl/issues/1#issuecomment-1458870564 to use the improved sampler. (Facebook's sampler was using poor defaults, so no one was able to get anything good out of the model till now.)
Update (March 5, 12:52 PM CST): @anitakirkovska let us use their fabulous llama photo. If you happen to like the new header image as much as I do, be sure to check out their AI newsletter and their tweets about us.
Update (March 5, 9:51 AM CST): HN user MacsHeadroom left a valuable comment:
I'm running LLaMA-65B on a single A100 80GB with 8bit quantization. $1.5/hr on vast.ai
The output is at least as good as davinci.
I think some early results are using bad repetition penalty and/or temperature settings. I had to set both fairly high to get the best results. (Some people are also incorrectly comparing it to chatGPT/ChatGPT API which is not a good comparison. But that's a different problem.)
I've had it translate, write poems, tell jokes, banter, write executable code. It does it all-- and all on a single card.
I was shocked that this script was distributed with the original torrent, and that no one seemed to notice (a) that it still works, and (b) is almost 20x faster than the torrent method. I was impatient and curious to try to run 65B on an 8xA100 cluster, so I didn't want to wait till tomorrow and started poking around, which is when I found this. I decided to just tweet it out and let you, fellow scientists and hackers, enjoy it before Facebook notices and shuts it off.
"Power to the people" is an overused trope, but as a research scientist, I feel it's important to let individual hackers be able to experiment with the same tools, techniques, and systems that professional ML researchers are fortunate to have access to. This is a tricky situation, because at some point between now and 10 years from now, this might become dangerous -- AI alarmists often ask "Would you want random people experimenting with nuclear weapons in their basement?" My answer is "No, but we're not there yet."
Word on Twitter is that LLaMA's samples seem worse than GPT-3 by a large margin, but then I realized no one has really been able to try the full 65B model yet, for a combination of reasons. (Mostly lack of access to 8xA100 hardware.) So I decided to try it out for myself and see.
Even if it's GPT-3 level, the fact is, LLaMA is already openly available. The torrent isn't going anywhere. So my own thoughts on this are mostly irrelevant; determined hackers can get it themselves anyway.
But for what it's worth, my personal opinion is that LLaMA probably isn't OpenAI-grade -- there's a big difference between training a model in an academic setting vs when your entire company depends on it for wide-scale commercial success. I wasn't impressed that 30B didn't seem to know who Captain Picard was.
People have already started decrying this leak as dangerous. But everyone used to say the same thing about 1.5B. (In fact, the allure of 1.5B's grandiose claims was what drove me to take ML seriously in 2019.) Turns out, four years later, no one really cares about 1.5B anymore, and it certainly didn't cause wide-scale societal harm. I doubt LLaMA will either.
2023 will be interesting. I can't wait for 2024.
Signed with love,
Shawn Presser
twitter: @theshawwn
HN: sillysaurusx
HN discussion | Twitter announcement
Author: Shawwn
Source Code: https://github.com/shawwn/llama-dl
License: GPL-3.0 license
Learn about data modeling tools to create, design and manage data models, allowing data scientists to access and use them more quickly.
Data science and data modeling techniques are interrelated. Data modeling tools allow data scientists to access and use data more quickly. By developing data models, data scientists can gain a better understanding of the data and its underlying relationships, which can then be used to construct predictive models and other data-driven solutions.
Data modeling is an important aspect of the software development process because it helps ensure that the database can efficiently store and retrieve data and can handle the expected volume and complexity of data. Data modeling tools are computer programs used to create, design, and manage data models. A data model is a graphical depiction of the structure of a database that describes the relationships between different types of data. Users can use data modeling tools to design, visualize, and edit data models. Database administrators, data analysts, and other IT professionals use them to build and describe database systems.
In this post I will explain several data modeling tools to consider as you allocate scarce resources for data science work.
The majority of these data modeling tools offer visual data modeling, reverse engineering, forward engineering, collaboration capabilities, easy integration, and data mapping. I have specifically mentioned only the unique features which make each of them different from the others.
1. Erwin Data Modeler
Erwin Data Modeler (ED Modeler) is a tool used for designing and analyzing data structures with standardized designs. It supports deployment of diagrammatic data regardless of its location and structure, and it offers automated features to generate schema and hybrid architecture.
2. ER/Studio
Idera's ER/Studio is a data modeling tool that allows for identification of data assets and sources across multiple database systems. It enables the creation and sharing of data models and lets you track them from start to finish.
3. SQL Database Modeler
SQL Database Modeler enables developers to create a SQL database online without having to write any code. It is easy to create and import scripts, and it works with both MS SQL Server and MySQL. Another advantage of SQL Database Modeler is that it allows several view modes.
4. Oracle SQL Developer Data Modeler
Oracle SQL Developer Data Modeler is another excellent free Database Modeling tool that helps businesses acquire, organise, and get insights from data while increasing productivity. It provides a wide range of Data Models, including Logical, Relational, Physical, and Multi-dimensional Data Type Models.
5. IBM Infosphere Data Architect
IBM Infosphere is well-known for its capacity to work on various data patterns and it aids in standardizing the interface across apps, databases, and servers. Aside from that, Infosphere supports cross-lifecycle work and helps organizations shorten time-to-market.
6. MySQL Workbench
When working with complex ER models, Workbench is a suitable choice. It was created specifically for MySQL DB and aids in the generation, execution, and optimization of SQL queries on all major operating systems, including Mac, Linux, and Windows.
7. Archi
Archi is heavily used in small and mid-segment organizations where data handling is required within small teams. It offers an elegant solution for providing visual data representation besides being low-cost.
Conclusion
The use of data modeling tools helps you organize and structure your data, making it more accessible and usable for your business. Through the use of the above tools, you will be able to improve data quality, data governance, better visualization, improved integration, faster data analysis, and reduced costs. By creating a proper data model, one can optimize the database design, which will result in better performance of data storage and retrieval operations.
Original article source at: https://www.kdnuggets.com/
Stable Diffusion was made possible thanks to a collaboration with Stability AI and Runway, and builds upon our previous work.
Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See this section below and the model card.
A suitable conda environment named ldm
can be created and activated with:
conda env create -f environment.yaml
conda activate ldm
You can also update an existing latent diffusion environment by running
conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
Stable Diffusion v1 refers to a specific configuration of the model architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and then finetuned on 512x512 images.
Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present in its training data. Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding model card.
The weights are available via the CompVis organization at Hugging Face under a license which contains specific use-based restrictions to prevent misuse and harm as informed by the model card, but otherwise remains permissive. While commercial use is permitted under the terms of the license, we do not recommend using the provided weights for services or products without additional safety mechanisms and considerations, since there are known limitations and biases of the weights, and research on safe and ethical deployment of general text-to-image models is an ongoing effort. The weights are research artifacts and should be treated as such.
The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.
We currently provide the following checkpoints:

- sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en. 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
- sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with estimated aesthetics score > 5.0, and additionally filtered to images with an original size >= 512x512 and an estimated watermark probability < 0.5. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).
- sd-v1-3.ckpt: Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
- sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling steps show the relative improvements of the checkpoints.
Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder. We provide a reference script for sampling, but there also exists a diffusers integration, which we expect to see more active community development.
We provide a reference sampling script, which incorporates
After obtaining the stable-diffusion-v1-*-original weights, link them:
mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
and sample with
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
By default, this uses a guidance scale of --scale 7.5, Katherine Crowson's implementation of the PLMS sampler, and renders images of size 512x512 (which it was trained on) in 50 steps. All supported arguments are listed below (type python scripts/txt2img.py --help).
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA]
[--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS] [--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT]
[--seed SEED] [--precision {full,autocast}]
optional arguments:
-h, --help show this help message and exit
--prompt [PROMPT] the prompt to render
--outdir [OUTDIR] dir to write results to
--skip_grid do not save a grid, only individual samples. Helpful when evaluating lots of samples
--skip_save do not save individual samples. For speed measurements.
--ddim_steps DDIM_STEPS
number of ddim sampling steps
--plms use plms sampling
--laion400m uses the LAION400M model
--fixed_code if enabled, uses the same starting code across samples
--ddim_eta DDIM_ETA ddim eta (eta=0.0 corresponds to deterministic sampling
--n_iter N_ITER sample this often
--H H image height, in pixel space
--W W image width, in pixel space
--C C latent channels
--f F downsampling factor
--n_samples N_SAMPLES
how many samples to produce for each given prompt. A.k.a. batch size
--n_rows N_ROWS rows in the grid (default: n_samples)
--scale SCALE unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
--from-file FROM_FILE
if specified, load prompts from this file
--config CONFIG path to config which constructs model
--ckpt CKPT path to checkpoint of model
--seed SEED the seed (for reproducible sampling)
--precision {full,autocast}
evaluate at this precision
Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints. For this reason use_ema=False is set in the configuration; otherwise the code will try to switch from non-EMA to EMA weights. If you want to examine the effect of EMA vs no EMA, we provide "full" checkpoints which contain both types of weights. For these, use_ema=False will load and use the non-EMA weights.
A simple way to download and sample Stable Diffusion is by using the diffusers library:
# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
use_auth_token=True
).to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
image = pipe(prompt)["sample"][0]
image.save("astronaut_rides_horse.png")
By using a diffusion-denoising mechanism as first proposed by SDEdit, the model can be used for different tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script, we provide a script to perform image modification with Stable Diffusion.
The following describes an example where a rough sketch made in Pinta is converted into a detailed artwork.
python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8
Here, strength is a value between 0.0 and 1.0 that controls the amount of noise added to the input image. Values that approach 1.0 allow for many variations but will also produce images that are not semantically consistent with the input.
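For those using the diffusers integration instead of the repository scripts, a rough equivalent sketch looks like this (argument names such as image vs. init_image have shifted between diffusers releases, so adjust to your installed version):

```python
# Rough diffusers-based img2img sketch; not the repository's reference script.
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
).to("cuda")

init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))
prompt = "A fantasy landscape, trending on artstation"

# strength in [0.0, 1.0] controls how much noise is added to the input image;
# higher values allow more variation but stray further from the input.
result = pipe(prompt=prompt, image=init_image, strength=0.8, guidance_scale=7.5)
result.images[0].save("fantasy_landscape.png")
```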
This procedure can, for example, also be used to upscale samples from the base model.
Our codebase for the diffusion models builds heavily on OpenAI's ADM codebase and https://github.com/lucidrains/denoising-diffusion-pytorch. Thanks for open-sourcing!
The implementation of the transformer encoder is from x-transformers by lucidrains.
@misc{rombach2021highresolution,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year={2021},
eprint={2112.10752},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer
CVPR '22 Oral | GitHub | arXiv | Project page
Author: CompVis
Source Code: https://github.com/CompVis/stable-diffusion
License: View license
As conversational AI becomes more prevalent in various industries, the demand for chatbots and virtual assistants continues to grow. To keep up with this trend, many businesses and developers are turning to ChatGPT models to create more sophisticated conversational AI experiences.
However, creating a ChatGPT model that delivers accurate, relevant, and engaging responses requires more than just training it on a large dataset. It also involves fine-tuning the model to optimize its performance based on your specific use case.
In this article, we’ll discuss how to fine-tune a ChatGPT model to deliver more accurate and personalized responses. We’ll cover everything from the basics of ChatGPT models to specific techniques you can use to improve your conversational AI. Let’s get started.
ChatGPT (Chat Generative Pre-trained Transformer) is a type of machine learning model used for conversational AI. It's based on the Transformer architecture, which was introduced by Google researchers in 2017 for language translation tasks.
The ChatGPT model is pre-trained on a large corpus of text data and then fine-tuned for specific tasks, such as answering customer queries or providing personalized recommendations. It uses a deep neural network to generate text responses that sound natural and human-like.
Fine-tuning a ChatGPT model involves retraining it on a smaller dataset that’s specific to your use case. Here are the steps you need to follow:
There are several pre-trained ChatGPT models available, such as GPT-2 and GPT-3. Choose the one that’s most appropriate for your use case based on the size of your dataset and the complexity of your task.
To fine-tune your ChatGPT model, you’ll need to collect a smaller dataset that’s specific to your use case. This dataset should be clean and well-structured, with a clear and consistent format.
Once you have your dataset, you can start training your ChatGPT model using transfer learning. Transfer learning is a technique that involves reusing pre-trained models and modifying them to perform new tasks.
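As an illustration of this transfer-learning step, here is a minimal sketch that fine-tunes GPT-2 locally with the Hugging Face transformers and datasets libraries. These libraries, the file name train.txt, and the hyperparameters are assumptions made for the example; they are not part of the OpenAI-based guide that follows.
# Minimal sketch: fine-tuning GPT-2 on your own text with Hugging Face Transformers.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Load your task-specific text data (one training example per line).
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=5e-5),
    train_dataset=tokenized["train"],
    # Causal language modeling: the collator pads batches and builds labels from the inputs.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()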
After training your model, you need to test it and evaluate its performance. Use a validation set to measure the accuracy and relevance of your model’s responses.
Based on the results of your evaluation, you can fine-tune your model by adjusting its hyperparameters, such as the learning rate and the number of epochs. You can also add more data to your training set or change the architecture of your model to improve its performance.
Here is a general guide on fine-tuning GPT-3 models using Python on Financial data.
First, you need to set up an OpenAI account and obtain access to the GPT-3 API. Make sure your Python environment is set up properly.
Install the openai module with pip:
pip install openai
Create a Python file, import openai and pandas, and set your OpenAI API key, replacing "YOUR_API_KEY" with your actual key:
import openai
import pandas as pd
# Authenticate with OpenAI API
openai.api_key = "YOUR_API_KEY"
Choose the base GPT-3 model you want to fine-tune. The legacy fine-tuning API works with base models such as ada, babbage, curie, and davinci.
model_engine = "davinci"  # You can choose any base model from the list provided by OpenAI
openai.Model.retrieve(model_engine)  # Optional: confirm the model name is valid
Fine-tune the GPT-3 model on financial data using a sample dataset. With the legacy OpenAI fine-tuning API (openai<1.0), you prepare a JSONL file of prompt/completion pairs, upload it, and create a fine-tune job. The column names below ("text" and "label") are placeholders for your own dataset:
import json
data = pd.read_csv("financial_data.csv")
# Create a training set by selecting a subset of the data
train_data = data.sample(frac=0.8, random_state=123)
# Create a test set using the remaining data
test_data = data.drop(train_data.index)
# Write the training set as JSONL prompt/completion pairs (the format the fine-tuning API expects)
with open("financial_train.jsonl", "w") as f:
    for _, row in train_data.iterrows():
        f.write(json.dumps({"prompt": str(row["text"]) + "\n\n###\n\n",
                            "completion": " " + str(row["label"])}) + "\n")
# Upload the file and start a fine-tune job (it runs asynchronously on OpenAI's side)
train_file = openai.File.create(file=open("financial_train.jsonl", "rb"), purpose="fine-tune")
fine_tune = openai.FineTune.create(training_file=train_file.id, model=model_engine)
# Once the job has finished (see the status check below), query the fine-tuned model on the test set
fine_tuned_model = openai.FineTune.retrieve(fine_tune.id).fine_tuned_model
test_strings = test_data["text"].tolist()
results = [openai.Completion.create(model=fine_tuned_model,
                                    prompt=str(s) + "\n\n###\n\n",
                                    max_tokens=2, temperature=0)  # increase max_tokens if your labels are longer
           for s in test_strings]
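The fine-tune job runs asynchronously on OpenAI's side, so it must reach the "succeeded" state before the fine-tuned model name is populated. A small sketch of polling the job with the same legacy API:
import time
# Poll the fine-tune job until it reaches a terminal state
while True:
    job = openai.FineTune.retrieve(fine_tune.id)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(30)
print("Fine-tune status:", job.status)
print("Fine-tuned model name:", job.fine_tuned_model)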
Evaluate the performance of the model by calculating its accuracy, precision, recall, and F1 score. Here is an example of how to evaluate the model on the test set:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Extract the predicted label text from each completion
generated_labels = [result.choices[0].text.strip() for result in results]
# Collect the ground-truth labels
actual_labels = [str(label) for label in test_data["label"].tolist()]
# Calculate the metrics (macro averaging handles multi-class string labels)
accuracy = accuracy_score(actual_labels, generated_labels)
precision = precision_score(actual_labels, generated_labels, average="macro", zero_division=0)
recall = recall_score(actual_labels, generated_labels, average="macro", zero_division=0)
f1 = f1_score(actual_labels, generated_labels, average="macro", zero_division=0)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
If the model’s performance is not satisfactory, you can fine-tune it further by adjusting the hyperparameters such as the learning rate, batch size, and number of epochs. You can experiment with different combinations of hyperparameters to find the optimal ones.
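With the legacy OpenAI fine-tuning API, these hyperparameters are passed when the fine-tune job is created. A minimal sketch reusing the uploaded training file from above; the specific values are illustrative, not recommendations:
# Re-run the fine-tune job with explicit hyperparameters
fine_tune = openai.FineTune.create(
    training_file=train_file.id,
    model=model_engine,
    n_epochs=4,                    # number of passes over the training data
    batch_size=8,                  # examples per gradient update
    learning_rate_multiplier=0.1,  # scales the base learning rate used for fine-tuning
)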
Here are some other examples of analyzing financial data with GPT-3.
The first step in analyzing financial data is to determine the overall sentiment. Sentiment analysis is a way to determine whether news or financial reports are positive or negative. Here is an example of using a GPT-3 model to run sentiment analysis over financial news via prompting:
import openai
import pandas as pd
# Authenticate with OpenAI API
openai.api_key = "YOUR_API_KEY"
# Load financial data into a DataFrame
df = pd.read_csv("financial_data.csv")
# Prompt GPT-3 for sentiment analysis of each news item
model_engine = "text-davinci-002"
base_prompt = "Sentiment analysis of financial data: "
fin_data = df["news"].tolist()
results = []
for data in fin_data:
    # Build a fresh prompt for each item so previous items don't leak into the request
    prompt = base_prompt + data
    response = openai.Completion.create(engine=model_engine, prompt=prompt, max_tokens=50)
    results.append(response.choices[0].text)
# Print the results
for result in results:
    print(result)
The next step in financial analysis is to predict stock prices. Here is an example of prompting a GPT-3 model with financial reports for stock price prediction:
import openai
import pandas as pd
# Authenticate with OpenAI API
openai.api_key = "YOUR_API_KEY"
# Load financial data into a DataFrame
df = pd.read_csv("financial_data.csv")
# Prompt GPT-3 with financial reports for stock price prediction
model_engine = "text-davinci-002"
base_prompt = "Predicting stock prices: "
fin_data = df["financial_reports"].tolist()
results = []
for data in fin_data:
    # Build a fresh prompt per report so earlier reports don't accumulate in the request
    prompt = base_prompt + data
    response = openai.Completion.create(engine=model_engine, prompt=prompt, max_tokens=50)
    results.append(response.choices[0].text)
# Print the results
for result in results:
    print(result)
Another important aspect of financial analysis is trend analysis. Here is an example of prompting a GPT-3 model with financial reports for trend analysis:
import openai
import pandas as pd
# Authenticate with OpenAI API
openai.api_key = "YOUR_API_KEY"
# Load financial data into a DataFrame
df = pd.read_csv("financial_data.csv")
# Prompt GPT-3 with financial reports for trend analysis
model_engine = "text-davinci-002"
base_prompt = "Analyzing financial trends: "
fin_data = df["financial_reports"].tolist()
results = []
for data in fin_data:
    # Build a fresh prompt per report so earlier reports don't accumulate in the request
    prompt = base_prompt + data
    response = openai.Completion.create(engine=model_engine, prompt=prompt, max_tokens=50)
    results.append(response.choices[0].text)
# Print the results
for result in results:
    print(result)
Note that you need to be very cautious when using financial data. Accessing, storing, or processing financial data may require proper certification and compliance with legal and regulatory requirements. Please consult with a financial or legal expert before using financial data.
Fine-tuning a GPT-3 model is the process of training the model on a specific task or domain by using a smaller dataset that is more specific to the task at hand. This process is used to optimize the performance of the model for a particular use case.
To fine-tune a GPT-3 model, you need to provide the model with a specific training dataset that is relevant to your task. You can then use this dataset to train the model on your specific task, by using a variety of techniques such as transfer learning and optimization algorithms.
The benefits of fine-tuning a GPT-3 model are that it allows you to optimize the performance of the model for a specific use case. This can lead to better accuracy, faster processing times, and a reduction in the amount of data required to train the model.
The time it takes to fine-tune a GPT-3 model can vary depending on the complexity of the task and the size of the training dataset. However, in general, fine-tuning a GPT-3 model can take anywhere from a few hours to a few days.
Some common techniques used for fine-tuning a GPT-3 model include transfer learning, which involves using a pre-trained model to train a new model on a specific task, and optimization algorithms such as gradient descent, which help to improve the accuracy and speed of the model. Other techniques include regularization, data augmentation, and early stopping.
Some best practices for fine-tuning a GPT-3 model include selecting a relevant training dataset, using transfer learning, optimizing the hyperparameters of the model, and monitoring the performance of the model during training. It is also important to have a clear understanding of the limitations of the model and to regularly test the model’s performance on new data.
Original article source at: https://www.cloudbooklet.com/
1676867040
Energon-AI is a service framework for large-scale model inference.
Models trained with Colossal-AI can be easily transferred to Energon-AI. Single-device models require manual coding work to introduce tensor parallelism and pipeline parallelism.
Install from source
$ git clone git@github.com:hpcaitech/EnergonAI.git
$ pip install -r requirements.txt
$ pip install .
Use docker
$ docker pull hpcaitech/energon-ai:latest
Download OPT model: To launch the distributed inference service quickly, you can download the checkpoint of OPT-125M here. You can get details for loading other sizes of models here.
Launch an HTTP service: To launch a service, we need to provide python scripts to describe the model type and related configurations, and start an http service. An OPT example is EnergonAI/examples/opt.
The entry point of the service is the bash script server.sh. The config of the service is in opt_config.py, which defines the model type, the checkpoint file path, the parallel strategy, and the HTTP settings. You can adapt it for your own case; for example, set the model class to opt_125M and set the correct checkpoint path as shown below. Set the tensor parallelism degree equal to the number of GPUs you will use.
model_class = opt_125M
checkpoint = 'your_file_path'
tp_init_size = 2  # set this equal to the number of GPUs you will use
Now, we can launch a service:
bash server.sh
Then open https://[ip]:[port]/docs in your browser and try it out!
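Once the service is up, you can also call it programmatically. Below is a minimal sketch using Python's requests library; the host, port, route, and payload fields are assumptions for illustration and should be checked against the routes listed on the /docs page of your running service.
import requests

# Hypothetical endpoint and payload; confirm the actual route and fields on the /docs page.
url = "http://127.0.0.1:8020/generation"
payload = {"prompt": "Question: Where were the 2004 Olympics held? Answer:", "max_tokens": 64}
response = requests.post(url, json=payload)
print(response.status_code)
print(response.json())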
You can find technical details in our blog and manuscript:
Build an online OPT service using Colossal-AI in 5 minutes
EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models
@misc{du2022energonai,
title={EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models},
author={Jiangsu Du and Ziming Liu and Jiarui Fang and Shenggui Li and Yongbin Li and Yutong Lu and Yang You},
year={2022},
eprint={2209.02341},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
If interested in making your own contribution to the project, please refer to Contributing for guidance.
Thanks so much!
Author: hpcaitech
Source Code: https://github.com/hpcaitech/EnergonAI
License: Apache-2.0 license
1672806840
An ER diagram is a database blueprint. It facilitates data visualization and database development processes. Read on to find out what ER diagrams are and how to create them in Vertabelo.
In this article, we’ll explain what an ER diagram is and what elements it contains. We’ll discuss ER diagram abstraction levels, such as conceptual, logical, and physical diagrams. You’ll learn how to build ER diagrams using various notations, such as Crow’s Foot notation, Barker’s notation, and more. Finally, we’ll show you several examples of ER diagrams in Vertabelo.
Let’s get started.
An entity-relationship diagram (ER diagram or ERD) uses symbols to describe entities and the relationships between them. Let’s look at a simple example before jumping into definitions.
Here, the entities are Books and Authors. The attributes of the Books entity are Title, Author, Genre, and ISBN. The attributes of the Authors entity are FirstName, LastName, Email, DateOfBirth, and CountryOfOrigin.
Also, there is a many-to-many relationship between the Books and Authors entities; a book can be written by one or more authors and an author can write one or more books.
For more details, check out our earlier article on ER DIAGRAMS.
There are three main levels of abstraction that ER diagrams usually follow (although sometimes the conceptual and logical levels are combined).
The conceptual diagram is the simplest version with the highest level of abstraction. Here we identify the entities (such as people, objects, or concepts) and the relationships between them. Business rules are the primary consideration at this level; technical details come later.
The next step is the logical diagram. It is built from the conceptual diagram by adding attributes to each entity and detailing the relationships (i.e. by what will become primary and foreign keys in the database). But we are still not considering a specific database platform and its technical requirements.
Here is an ARTICLE ON THE DATA TYPES AVAILABLE IN VERTABELO FOR LOGICAL MODELS.
The physical diagram is created from the logical model; it is the most precise, technical, and detailed ERD. Here we consider the target database engine and its exact data types and other features. The entities and attributes turn into tables and columns, respectively; the many-to-many relationships require additional tables to store them.
The physical model can be turned into an SQL script for database creation.
To learn more about the abstraction levels of ER diagrams, read our ARTICLE ON CONCEPTUAL, LOGICAL, AND PHYSICAL DATA MODELS.
Let’s move on to the components of ER diagrams.
Examples of entities include people, objects, or concepts. There are strong (independent) entities and weak (dependent) entities. A strong entity can exist on its own, but a weak entity is dependent on a strong entity and typically stores data associated with the strong entity. For example, a person entity is a strong entity; a person_details entity is a weak entity that extends the person entity.
Also, you may come across associative entities, which create associations between entities (i.e. storing data for a many-to-many relationship).
Here is an example of entities (without attributes) in a logical model:
In physical models, entities turn into tables that store data within a database.
Each entity contains a list of attributes. Attributes are characteristics that describe an entity. For example, a Student entity may contain the attributes StudentID, FirstName, LastName, and Email.
Here is an example of attributes in a logical model:
In physical models, attributes turn into table columns within a database:
You can learn more ABOUT THE DIFFERENCE BETWEEN ENTITIES AND ATTRIBUTES elsewhere in our blog.
ER diagrams also show the relationships between entities, but these are a bit more complex than entities and their attributes.
A relationship is a link between two entities. It defines how these entities are related to each other. Each relationship has a cardinality that indicates how many objects of one entity relate to how many objects in the other entity. Examples of cardinality include one-to-one, one-to-many, and many-to-many. Relationships between entities can also include inheritance or associations with their cardinalities.
The association relationship assigns a row from one table to a row from another table, including the cardinalities. Here, each student can be signed up for zero or more courses and each course can have zero or more students.
The inheritance relationship creates a parent-child relationship between tables. This is where attributes of a parent table are inherited by a child table (in addition to the child table's own attributes). Here, the Student table is a parent table to the BachelorStudent, MasterStudent, and PhDStudent tables.
Physical models are similar, except that the many-to-many relationship requires an additional table to store the relationship data.
Vertabelo uses various notations to ease the process of diagram creation and enhance the readability of diagrams. All notations use symbols to represent ERD entities and relationships – e.g. boxes to represent entities and lines to represent relationships. But the details are different for each one, so let’s go through each notation.
In addition to laying out entities and defining relationships, Crow’s Foot notation allows you to define the multiplicities (or cardinalities) of the relationships between the entities.
The available multiplicities are: exactly one, zero or one, one or many, and zero or many.
A line connecting two entities can have any of the abovementioned multiplicities at each end.
Here are MORE ELABORATE EXAMPLES OF CROW’S FOOT NOTATION.
Here is an example of a logical ER diagram in Crow’s Foot notation:
Here is an example of a physical ER diagram in Crow’s Foot notation:
Barker’s notation defines the implementation details for entities, attributes, and relationships. Let’s look at the details of this notation scheme.
Here are MORE DETAILS ABOUT BARKER’S NOTATION.
Here is an example of a physical ER diagram in Barker’s notation:
UML is a popular modeling language used throughout the computer science world. Let’s see how it defines diagram symbols:
To learn more, visit OUR ARTICLE ON UML NOTATION.
Here is an example of a logical ER diagram in UML notation:
Here is an example of a physical ER diagram in UML notation:
The Integration DEFinition for Information Modeling (IDEF1X) notation defines entities, attributes, and relationships. Let’s discuss the details.
You can read more about the DETAILS ABOUT THE IDEF1X NOTATION SYMBOLS HERE.
Here is an example of a logical ER diagram in IDEF1X notation:
Here is an example of a physical ER diagram in IDEF1X notation:
Each notation offers a distinct yet similar way to define relationships. To learn more about it, follow our article on the THEORY AND PRACTICE OF DATABASE CARDINALITIES.
Vertabelo lets you use different notations in creating your diagrams. For conceptual or logical models, we can use Crow’s Foot, UML, or IDEF1X. And for physical models, we can use Crow’s Foot, Barker’s, UML, or IDEF1X. Here’s HOW TO CHANGE THE DIAGRAM NOTATION IN VERTABELO.
That’s all you should know before creating ER diagrams in Vertabelo.
Vertabelo lets you build conceptual, logical, and physical ER diagrams using different notations that can be changed at any time. You can choose the one that suits your requirements.
Go ahead and try it out for yourself!
Original article source at: https://www.vertabelo.com/
1672736040
Our MLOps Zoomcamp course teaches practical aspects of productionizing ML services, from collecting requirements to model deployment and monitoring.
It is aimed at data scientists and ML engineers, as well as software engineers and data engineers interested in putting ML into production.
Course start: 16 May
The best way to get support is to use DataTalks.Club's Slack. Join the #course-mlops-zoomcamp channel.
To make discussions in Slack more organized:
(In October)
I want to start preparing for the course. What can I do?
If you haven't used Flask or Docker
If you have no previous experience with ML
I registered but haven't received an invite link. Is it normal?
Yes, we haven't automated it. You'll get an email from us eventually; don't worry.
If you want to make sure you don't miss anything:
Join the #course-mlops-zoomcamp channel
Is it going to be live?
No and yes. There will be two parts:
I just joined. Can I still get a certificate?
#course-mlops-zoomcamp channel
Author: DataTalksClub
Source Code: https://github.com/DataTalksClub/mlops-zoomcamp
1671807144
Enterprises are building better methods for monitoring, managing, and supervising AI systems in production as AI in business becomes more ubiquitous and AI budgets continue to climb around the world.
As more businesses successfully deploy and grow their AI models, model operators are becoming a vital business function. According to our survey of 100 AI-focused executives from prominent financial services firms, each company has an average of 270 models in production. One-fifth of respondents say all of their company's business units use AI regularly, while another 60% say at least some of them do. This isn't entirely surprising, given how far the companies we studied have progressed with AI. However, as the number of models utilized in these businesses grows, AI-focused CEOs face new problems.
Multi-Cloud ModelOps is the latest approach to operationalizing models in applications and synchronizing applications with model pipelines. Click to explore our Multi-Cloud ModelOps Benefits and Features.
Our findings, however, demonstrate that the maturity of the ModelOps procedures that businesses have in place varies substantially from one firm to the next.
Even simple activities, like inventorying the models currently deployed throughout their firms, are proving difficult for CEOs. Only 25% of survey respondents said their methods in place for this are "very effective."
"Building models that sit on a shelf and never get deployed into production has been a typical problem for analytics teams for a long time," says Glenn Hofmann, Chief Analytics Officer at New York Life Insurance Company.
The strategic power of AI has been established across various industries and companies. As a result, there has been a spike in model creation. However, investments in the people, procedures, and technologies needed to operationalize models – known as ModelOps – have lagged. That is beginning to change. More than half of the companies we studied currently have a ModelOps budget and that number is expected to climb to 90% in the next year. The main drivers for this increasing investment are risk management, cost reduction through automation, and enhanced visibility through improved monitoring.
Original article source at: https://www.xenonstack.com/