1679887042
This fine-tunes the GPT-J 6B model on the Alpaca dataset using a Databricks notebook. Please note that while GPT-J 6B is Apache 2.0 licensed, the Alpaca dataset is licensed under Creative Commons NonCommercial (CC BY-NC 4.0).
Add the dolly repo to Databricks (under Repos click Add Repo, enter https://github.com/databrickslabs/dolly.git, then click Create Repo).
Start a 12.2 LTS ML (includes Apache Spark 3.3.2, GPU, Scala 2.12) single-node cluster with node type having 8 A100 GPUs (e.g. Standard_ND96asr_v4 or p4d.24xlarge).
Open the train_dolly notebook in the dolly repo, attach to your GPU cluster, and run all cells. When training finishes, the notebook will save the model under /dbfs/dolly_training.
pyenv local 3.8.13
python -m venv .venv
. .venv/bin/activate
pip install -r requirements_dev.txt
./run_pytest.sh
Author: Databrickslabs
Source Code: https://github.com/databrickslabs/dolly
License: Apache-2.0 license
1624516500
According to a recent study, call centre agents spend approximately 82 percent of their total time looking at step-by-step guides, customer data, and knowledge base articles.
Traditionally, dialogue state tracking (DST) has served as a way to determine what a caller wants at a given point in a conversation. DST is the core part of a spoken dialogue system: it estimates a belief over the user’s possible goals at every dialogue turn. Unfortunately, the aspects of an agent’s work described above are not accounted for in popular DST benchmarks.
To reduce the burden on call centre agents and improve the SOTA of task-oriented dialogue systems, AI-powered customer service company ASAPP recently launched an action-based conversations dataset (ABCD). The dataset is designed to help develop task-oriented dialogue systems for customer service applications. ABCD consists of a fully labelled dataset with over 10,000 human dialogues containing 55 distinct user intents requiring sequences of actions constrained by company policies to accomplish tasks.
https://twitter.com/asapp/status/1397928363923177472
The dataset is currently available on GitHub.
1598475240
With its recent breakthrough, the pre-trained language model GPT-3, OpenAI has revolutionised the concept of machines writing code like humans, a step towards artificial general intelligence. Not only is it being used for writing code, but also for writing blogs, creating stories, and building websites and apps. In fact, in recent news, a college student created an entirely fake blog using GPT-3, which created a massive buzz in the ML community and trended on Hacker News.
1596458220
Models these days are very big, and most of us don’t have the resources to train them from scratch. Luckily, HuggingFace has generously provided pretrained models in PyTorch, and Google Colab allows free use of a GPU (for a fixed time). Otherwise, even fine-tuning on a dataset on my local machine without an NVIDIA GPU would take a significant amount of time. While the tutorial here is for GPT2, this can be done for any of the pretrained models provided by HuggingFace, and for any size too.
Go to Google Colab and create a new notebook.
Set the notebook to use a GPU by clicking Runtime > Change runtime type, then click Save.
We would normally run pip3 install transformers in Bash, but because this is Colab, we have to run it with a leading !
!pip3 install transformers
You can read more about WikiText data here. Overall, there’s WikiText-2 and WikiText-103. We’re going to use WikiText-2 because it’s smaller, and we have limits on how long we can run on the GPU and how much data we can load into memory in Colab. To download and unzip the data, run the following in a cell:
%%bash
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip
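For anyone who prefers to stay in Python rather than use a bash cell, the same two steps can be sketched with the standard library. This is a minimal sketch, not part of the original tutorial: the helper names download and unzip are hypothetical, and it assumes the URL from the wget command is still reachable.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the wget command above.
WIKITEXT2_URL = "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip"


def download(url: str, dest: Path) -> Path:
    """Fetch url into dest, skipping the download if dest already exists."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def unzip(archive: Path, out_dir: Path) -> list:
    """Extract archive into out_dir and return the names of its members."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

In a Colab cell this could be called as unzip(download(WIKITEXT2_URL, Path("wikitext-2-raw-v1.zip")), Path(".")). Skipping the download when the file already exists avoids re-fetching the archive each time the cell is re-run within the same session.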
1678240947
High-speed download of LLaMA, Facebook's 65B parameter GPT model.
This repository contains a high-speed download of LLaMA, Facebook's 65B parameter model that was recently made available via torrent. (Discussion: Facebook LLAMA is being openly distributed via torrents)
It downloads all model weights (7B, 13B, 30B, 65B) in less than two hours on a Chicago Ubuntu server.
real 98m12.980s
user 8m8.916s
sys 5m7.259s
This works out to 40MB/s (235164838073 bytes in 5892 seconds).
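The 40MB/s figure can be reproduced from the two numbers in that sentence. The sketch below simply redoes the division; it assumes "MB" means decimal megabytes (10^6 bytes), which is the reading that matches the quoted figure.

```python
# Figures quoted in the text above.
TOTAL_BYTES = 235_164_838_073
ELAPSED_SECONDS = 98 * 60 + 12  # "real 98m12.980s", truncated to whole seconds

bytes_per_second = TOTAL_BYTES / ELAPSED_SECONDS
mb_per_second = bytes_per_second / 1e6  # decimal megabytes (assumption)

print(f"{mb_per_second:.1f} MB/s")  # prints "39.9 MB/s"
```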
Personally, I just wanted to curl the weights instead of dealing with a torrent. The fact that it's several times faster was just a nice bonus.
To download all model weights, cd into the directory you want them, then run this:
Linux:
curl -o- https://raw.githubusercontent.com/shawwn/llama-dl/56f50b96072f42fb2520b1ad5a1d6ef30351f23c/llama.sh | bash
Mac:
brew install bash
brew install wget
curl -o- https://raw.githubusercontent.com/shawwn/llama-dl/56f50b96072f42fb2520b1ad5a1d6ef30351f23c/llama.sh | $(brew --prefix)/bin/bash
(Sorry mac users; they use some array syntax in the script that isn't supported on the version of bash that ships with Mac.)
Running random bash scripts generally isn't a good idea, but I'll stake my personal reputation on the fact that this link is safe. (It points to a specific SHA-1 hash rather than https://raw.githubusercontent.com/shawwn/llama-dl/main/llama.sh so that it's still safe even in the event that my repo or account got compromised.)
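Pinning to a commit protects against the script changing after the fact. Readers who would rather not pipe into bash at all can download the script to a file, read it, and only then run it. The following is a minimal sketch of that more cautious workflow, not something the repository provides: sha256_of and is_unchanged are hypothetical helpers, and the digest to compare against is one you would record yourself after reviewing the file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, reviewed_digest: str) -> bool:
    """True if the file still matches the digest recorded when it was reviewed."""
    return sha256_of(path) == reviewed_digest.lower()
```

Note that this uses SHA-256 over the file contents, which is a different thing from the git commit hash in the URL; it only confirms that a local copy is byte-for-byte the one you inspected.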
219G (235164838073 bytes) total. Here's a file list with sizes for each.
I ran this:
mkdir LLaMA
cd LLaMA
time curl -o- https://raw.githubusercontent.com/shawwn/llama-dl/56f50b96072f42fb2520b1ad5a1d6ef30351f23c/llama.sh | bash
cd ..
webtorrent 'magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce'
Webtorrent began seeding immediately, which means every file is identical to what you would've gotten via the torrent. So this is just a faster version of the torrent.
This is roughly 3.6x faster than the torrent. As of March 4 2023, the torrent seems to download at around 11MB/s, which implies a download time of around 6 hours. (Help seed it, if you can.)
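Both numbers follow from figures already quoted. As a rough check, assuming the torrent sustains 11MB/s for the whole transfer and that MB means 10^6 bytes:

```python
TOTAL_BYTES = 235_164_838_073   # total size quoted earlier
DIRECT_SECONDS = 5892           # measured direct download time quoted earlier
TORRENT_MB_PER_S = 11           # observed torrent speed (assumed constant)

torrent_seconds = TOTAL_BYTES / (TORRENT_MB_PER_S * 1e6)
torrent_hours = torrent_seconds / 3600
speedup = torrent_seconds / DIRECT_SECONDS

print(f"torrent: {torrent_hours:.1f} h, speedup: {speedup:.1f}x")
# prints "torrent: 5.9 h, speedup: 3.6x"
```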
I doubt anyone will get in trouble for using this. The script uses the download link that was leaked in the original torrent (i.e. the leaker accidentally leaked their own unique download link that Facebook sent them).
Technically, it may be illegal to knowingly use a private download link that was intended for someone else. Realistically, Facebook would risk their ML reputation by going after people who are merely trying to use what they themselves advertise as "open source."
Update: Facebook shut off the link a couple hours after this repo went live. I mirrored everything to R2 and updated the script to point to that instead.
Note that LLaMA was released under a "non-commercial bespoke license". Interestingly, Nvidia had a similar arrangement for StyleGAN, but that didn't stop Artbreeder from using it anyway. Nvidia never seemed to care enough to go after them. But if you launch your own OpenAI API and start charging money, don't be surprised when Facebook's lawyers come knocking.
Update (March 7, 3:35 PM CST): Looking to run inference with the model? See https://github.com/shawwn/llama-dl/issues/1#issuecomment-1458870564 to use the improved sampler. (Facebook's sampler was using poor defaults, so no one was able to get anything good out of the model till now.)
Update (March 5, 12:52 PM CST): @anitakirkovska let us use their fabulous llama photo. If you happen to like the new header image as much as I do, be sure to check out their AI newsletter and their tweets about us.
Update (March 5, 9:51 AM CST): HN user MacsHeadroom left a valuable comment:
I'm running LLaMA-65B on a single A100 80GB with 8bit quantization. $1.5/hr on vast.ai
The output is at least as good as davinci.
I think some early results are using bad repetition penalty and/or temperature settings. I had to set both fairly high to get the best results. (Some people are also incorrectly comparing it to chatGPT/ChatGPT API which is not a good comparison. But that's a different problem.)
I've had it translate, write poems, tell jokes, banter, write executable code. It does it all-- and all on a single card.
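The single-card claim in the comment above is plausible on weight storage alone. Here is a back-of-the-envelope sketch, assuming exactly 65 billion parameters and counting only the weights. Activations, the attention cache, and framework overhead are ignored, so real usage is higher.

```python
PARAMS = 65e9          # nominal parameter count (assumption: exactly 65B)
GPU_MEMORY_GB = 80     # A100 80GB, treated here as 80 decimal gigabytes


def weight_gb(bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal GB."""
    return PARAMS * bytes_per_param / 1e9


fp16_gb = weight_gb(2)  # 16-bit weights
int8_gb = weight_gb(1)  # 8-bit quantized weights

print(f"fp16: {fp16_gb:.0f} GB, int8: {int8_gb:.0f} GB")
# prints "fp16: 130 GB, int8: 65 GB"
```

Under these assumptions the 16-bit weights alone exceed one 80GB card, while the 8-bit weights leave some headroom, which is consistent with the commenter needing quantization to fit on a single A100.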
I was shocked that this script was distributed with the original torrent, and that no one seemed to notice (a) that it still works, and (b) that it is almost 20x faster than the torrent method. I was impatient and curious to try to run 65B on an 8xA100 cluster, so I didn't want to wait till tomorrow and started poking around, which is when I found this. I decided to just tweet it out and let you, fellow scientists and hackers, enjoy it before Facebook notices and shuts it off.
"Power to the people" is an overused trope, but as a research scientist, I feel it's important to let individual hackers be able to experiment with the same tools, techniques, and systems that professional ML researchers are fortunate to have access to. This is a tricky situation, because at some point between now and 10 years from now, this might become dangerous -- AI alarmists often ask "Would you want random people experimenting with nuclear weapons in their basement?" My answer is "No, but we're not there yet."
Word on Twitter is that LLaMA's samples seem worse than GPT-3 by a large margin, but then I realized no one has really been able to try the full 65B model yet, for a combination of reasons. (Mostly lack of access to 8xA100 hardware.) So I decided to try it out for myself and see.
Even if it's GPT-3 level, the fact is, LLaMA is already openly available. The torrent isn't going anywhere. So my own thoughts on this are mostly irrelevant; determined hackers can get it themselves anyway.
But for what it's worth, my personal opinion is that LLaMA probably isn't OpenAI-grade -- there's a big difference between training a model in an academic setting vs when your entire company depends on it for wide-scale commercial success. I wasn't impressed that 30B didn't seem to know who Captain Picard was.
People have already started decrying this leak as dangerous. But everyone used to say the same thing about GPT-2 1.5B. (In fact, the allure of 1.5B's grandiose claims was what drove me to take ML seriously in 2019.) Turns out, four years later, no one really cares about 1.5B anymore, and it certainly didn't cause wide-scale societal harm. I doubt LLaMA will either.
2023 will be interesting. I can't wait for 2024.
Signed with love,
Shawn Presser
twitter: @theshawwn
HN: sillysaurusx
HN discussion | Twitter announcement
Author: Shawwn
Source Code: https://github.com/shawwn/llama-dl
License: GPL-3.0 license