Spotlight uses PyTorch to build both deep and shallow recommender models. By providing a slew of building blocks for loss functions (various pointwise and pairwise ranking losses), representations (shallow factorization representations, deep sequence models), and utilities for fetching (or generating) recommendation datasets, it aims to be a tool for rapid exploration and prototyping of new recommender models.
See the full documentation for details.
To install with conda:
conda install -c maciejkula -c pytorch spotlight
To fit an explicit feedback model on the MovieLens dataset:
from spotlight.cross_validation import random_train_test_split
from spotlight.datasets.movielens import get_movielens_dataset
from spotlight.evaluation import rmse_score
from spotlight.factorization.explicit import ExplicitFactorizationModel
dataset = get_movielens_dataset(variant='100K')
train, test = random_train_test_split(dataset)
model = ExplicitFactorizationModel(n_iter=1)
model.fit(train)
rmse = rmse_score(model, test)
To fit an implicit ranking model with a BPR pairwise loss on the MovieLens dataset:
from spotlight.cross_validation import random_train_test_split
from spotlight.datasets.movielens import get_movielens_dataset
from spotlight.evaluation import mrr_score
from spotlight.factorization.implicit import ImplicitFactorizationModel
dataset = get_movielens_dataset(variant='100K')
train, test = random_train_test_split(dataset)
model = ImplicitFactorizationModel(n_iter=3,
                                   loss='bpr')
model.fit(train)
mrr = mrr_score(model, test)
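Once fitted, the model can also be used to score items for a given user. A minimal sketch, assuming the predict method of the factorization models (which, when item_ids is omitted, is assumed to return a score for every item in the dataset):
import numpy as np
# scores over all items for user 42
scores = model.predict(42)
# ids of the ten highest-scoring items
top_items = np.argsort(-scores)[:10]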
Recommendations can be seen as a sequence prediction task: given the items a user has interacted with in the past, what will be the next item they interact with? Spotlight provides a range of models and utilities for fitting next-item recommendation models. For example:
from spotlight.cross_validation import user_based_train_test_split
from spotlight.datasets.synthetic import generate_sequential
from spotlight.evaluation import sequence_mrr_score
from spotlight.sequence.implicit import ImplicitSequenceModel
dataset = generate_sequential(num_users=100,
                              num_items=1000,
                              num_interactions=10000,
                              concentration_parameter=0.01,
                              order=3)
train, test = user_based_train_test_split(dataset)
train = train.to_sequence()
test = test.to_sequence()
model = ImplicitSequenceModel(n_iter=3,
                              representation='cnn',
                              loss='bpr')
model.fit(train)
mrr = sequence_mrr_score(model, test)
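As with the factorization models, a fitted sequence model can score candidate next items. A minimal sketch, assuming the sequence models' predict method accepts an array of item ids the user has interacted with and returns one score per item in the catalogue:
import numpy as np
# score every item as the potential next interaction, given an
# illustrative interaction history [1, 2, 3]
next_scores = model.predict(np.array([1, 2, 3]))
predicted_next = int(np.argmax(next_scores))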
Spotlight offers a slew of popular datasets, including MovieLens 100K, 1M, 10M, and 20M. It also incorporates utilities for creating synthetic datasets. For example, generate_sequential generates a Markov-chain-derived interaction dataset, where the next item a user chooses is a function of their previous interactions:
from spotlight.datasets.synthetic import generate_sequential
# Concentration parameter governs how predictable the chain is;
# order determines the order of the Markov chain.
dataset = generate_sequential(num_users=100,
                              num_items=1000,
                              num_interactions=10000,
                              concentration_parameter=0.01,
                              order=3)
Please cite Spotlight if it helps your research. You can use the following BibTeX entry:
@misc{kula2017spotlight,
  title={Spotlight},
  author={Kula, Maciej},
  year={2017},
  publisher={GitHub},
  howpublished={\url{https://github.com/maciejkula/spotlight}},
}
Spotlight is meant to be extensible: pull requests are welcome. Development progress is tracked on Trello: have a look at the outstanding tickets to get an idea of what would be a useful contribution.
We accept implementations of new recommendation models into the Spotlight model zoo: if you've just published a paper describing your new model, or have an implementation of a model from the literature, make a PR!
Author: maciejkula
Source Code: https://github.com/maciejkula/spotlight
License: MIT License
Learn about the importance of dropout regularization and how to apply it in the PyTorch deep learning framework in Python.
Since model performance can be significantly improved by fine-tuning its hyperparameters, tuning a model involves identifying the optimal parameters that will deliver better performance than the model's default hyperparameters. We can use several techniques for hyperparameter tuning; one of them is dropout.
In this tutorial, we introduce dropout regularization for neural networks. We first explore the background and motivation for adopting dropout, and then describe how dropout works in theory and how to implement it with the PyTorch library in Python.
We will also plot the loss on the test set over time for a neural network with and without dropout.
For the demonstration, we will use the CIFAR-10 dataset, which is available in the torchvision library. Let's install all the dependencies of this tutorial:
$ pip install matplotlib==3.4.3 numpy==1.21.5 torch==1.10.1 torchvision
If you are on Colab, you don't need to install anything, as everything comes pre-installed. Import the necessary libraries:
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
Let's make sure we train on the GPU if CUDA is available:
# defining our device, 'cuda:0' if CUDA is available, 'cpu' otherwise
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device
device(type='cuda', index=0)
Let's build the transform pipeline for our CIFAR-10 dataset:
# make the transform pipeline, converting to tensor and normalizing
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# the batch size during training
batch_size = 64
Next, we load the training dataset:
train_dataset = torchvision.datasets.CIFAR10(root="./data", train=True,
download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
shuffle=True, num_workers=2)
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
170499072/? [00:06<00:00, 29640614.08it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
We pass True to train to indicate the training dataset, and we also set download to True to download the dataset into the specified data folder. After that, we create our DataLoader, passing the batch_size and setting shuffle to True.
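As a quick sanity check, we can pull a single batch from the loader and inspect its shape; CIFAR-10 images are 3x32x32, so with our batch size of 64 we expect the following:
# fetch one batch from the training loader
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 3, 32, 32])
print(labels.shape)  # torch.Size([64])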
Similarly, for the test set:
test_dataset = torchvision.datasets.CIFAR10(root="./data", train=False,
download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size,
shuffle=False, num_workers=2)
Files already downloaded and verified
This time we set train to False to get the test set.
Below are the classes available in the CIFAR-10 dataset:
# the CIFAR-10 classes
classes = ('plane', 'car', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Now that we have the dataset ready for training, let's build a neural network without dropout:
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
# switch to GPU if available
net.to(device)
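As a quick check before training, we can count the trainable parameters of this small network:
# total number of trainable parameters (62,006 for this architecture)
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_params)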
Initializing the cross-entropy loss and the SGD optimizer:
import torch.optim as optim
# defining the loss and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Next, let's make a function that accepts the neural network, the loss criterion, and a data loader, and calculates the overall loss:
def get_test_loss(net, criterion, data_loader):
    """A simple function that iterates over `data_loader` to calculate the overall loss"""
    testing_loss = []
    with torch.no_grad():
        for data in data_loader:
            inputs, labels = data
            # get the data to GPU (if available)
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = net(inputs)
            # calculate the loss for this batch
            loss = criterion(outputs, labels)
            # add the loss of this batch to the list
            testing_loss.append(loss.item())
    # calculate the average loss
    return sum(testing_loss) / len(testing_loss)
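Note that this function runs the network in whatever mode it is currently in. For a network containing dropout layers, you would normally switch to evaluation mode before measuring the loss (so that dropout is disabled) and back to training mode afterwards:
net.eval()   # disable dropout (and similar train-only behavior)
test_loss = get_test_loss(net, criterion, test_loader)
net.train()  # re-enable it before resuming training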
We will need this function during training. Let's start training now:
training_loss, testing_loss = [], []
running_loss = []
i = 0
for epoch in range(150):  # 150 epochs
    for data in train_loader:
        inputs, labels = data
        # get the data to GPU (if available)
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        # forward pass
        outputs = net(inputs)
        # backward pass
        loss = criterion(outputs, labels)
        loss.backward()
        # update gradients
        optimizer.step()
        running_loss.append(loss.item())
        i += 1
        if i % 1000 == 0:
            avg_train_loss = sum(running_loss) / len(running_loss)
            avg_test_loss = get_test_loss(net, criterion, test_loader)
            # clear the list
            running_loss.clear()
            # for logging & plotting later
            training_loss.append(avg_train_loss)
            testing_loss.append(avg_test_loss)
            print(f"[{epoch:2d}] [it={i:5d}] Train Loss: {avg_train_loss:.3f}, Test Loss: {avg_test_loss:.3f}")
print("Done training.")
Output:
[ 1] [it= 1000] Train Loss: 2.273, Test Loss: 2.118
[10] [it= 8000] Train Loss: 1.312, Test Loss: 1.326
[20] [it=16000] Train Loss: 1.031, Test Loss: 1.120
[30] [it=24000] Train Loss: 0.854, Test Loss: 1.043
[40] [it=32000] Train Loss: 0.718, Test Loss: 1.051
[51] [it=40000] Train Loss: 0.604, Test Loss: 1.085
[60] [it=47000] Train Loss: 0.521, Test Loss: 1.178
[70] [it=55000] Train Loss: 0.425, Test Loss: 1.370
[80] [it=63000] Train Loss: 0.348, Test Loss: 1.518
[93] [it=73000] Train Loss: 0.268, Test Loss: 1.859
[99] [it=78000] Train Loss: 0.248, Test Loss: 2.036
[109] [it=86000] Train Loss: 0.200, Test Loss: 2.351
[120] [it=94000] Train Loss: 0.161, Test Loss: 2.610
[130] [it=102000] Train Loss: 0.142, Test Loss: 2.976
[140] [it=110000] Train Loss: 0.117, Test Loss: 3.319
[149] [it=117000] Train Loss: 0.095, Test Loss: 3.593
Done training.
Deep neural networks contain multiple non-linear hidden layers, which makes them highly expressive models capable of learning complicated correlations between their inputs and outputs. With limited training data, however, many of these complicated associations will be the result of sampling noise; they will exist in the training set but not in real test data, even when both are drawn from the same distribution. Fitting all feasible alternative neural networks on the same dataset and averaging the predictions of each model is one method of reducing overfitting.
That is infeasible, but it can be approximated using a small group of distinct models, known as an ensemble.
Individually trained networks are expensive when the networks are massive. Combining many models is most useful when the individual models differ from one another, and neural network models can be made distinct by having different architectures or by being trained on separate data. Training many architectures is hard, because determining the ideal hyperparameters for each one is a demanding task, and training each huge network requires a significant amount of computation.
Moreover, large networks usually require large amounts of training data, and there may not be enough data to train separate networks on different subsets of it. Thus, even with the ensemble approximation, there remains the difficulty that multiple models need to be fitted and stored, which can be challenging when the models are huge and take days or weeks to train and tune.
During training, the capacity of the network is reduced, because the outputs of a layer under dropout are randomly subsampled. Consequently, dropout may call for a wider network.
When training a neural network with dropout, the outputs of a given layer are randomly ignored, or dropped out. This has the effect of making the layer look like a layer with a different number of nodes and a different connectivity to the preceding layer.
Each update to a layer during training is performed with a different view of the configured layer. Dropout makes the training process noisy, forcing nodes within a layer to probabilistically take on more or less responsibility for the inputs.
On the training data, neurons learn to correct mistakes made by their peers, a process known as co-adaptation. As a result, the network's ability to fit the training data improves dramatically. Because the co-adaptations are tailored to the peculiarities of the training data, they do not generalize to the test data, and the model becomes more brittle.
Nitish Srivastava, Ilya Sutskever, Geoffrey Hinton, Alex Krizhevsky, and Ruslan Salakhutdinov were the first to propose dropout as a way to minimize the effects of neural co-adaptation.
Source: the paper.
Dropout is a technique that addresses both of the problems mentioned above. It avoids overfitting and provides an efficient way of approximately combining an indefinitely large number of distinct neural network topologies.
Dropout refers to removing units (both hidden and visible) from a neural network. Dropping a unit means temporarily removing it from the network, including all of its incoming and outgoing connections. Dropout mimics sparse activation of a given layer, which, as a side effect, pushes the network towards learning a sparse representation. It can be used instead of activity regularization to encourage sparse representations in autoencoder models.
Dropout adds a new hyperparameter: the probability of retaining a unit, p. This hyperparameter controls the degree of dropout.
There is no dropout when p = 1, and low values of p mean a lot of dropout. Typical values of p for hidden units range from 0.5 to 0.8.
The type of input influences the choice for the input layers. A typical value for real-valued inputs (image patches or audio frames) is 0.8.
The choice of p for hidden layers is tied to the number of hidden units n. A smaller p demands a larger n, which slows down training and can lead to underfitting.
When p is large, dropout may not be enough to avoid overfitting. At a given layer, units will have a 70% chance of remaining active and a 30% risk of being dropped if the retention probability at that layer is 0.7.
Note that PyTorch and other deep learning frameworks use a dropout rate instead of a retention rate p: a retention rate of 70% means a dropout rate of 30%, and so on.
During training, dropout is used solely to increase the network's robustness to variations in the training dataset. During testing, you will probably want to use the whole network.
Notably, dropout is not used on test data or during production inference. As a consequence, you will have more connections and activations in each neuron during inference than during training.
Neurons in the subsequent layer would therefore be overexcited if you used a 50% dropout rate during training and removed two out of every four neurons in a layer: the values produced by these neurons would be 50% larger than necessary. The retention probability (1 - dropout rate) is used to scale down the weights of the over-activated neurons during test and inference time. The figure below, created by Datascience Stackexchange contributor Dmytro Prylipko, beautifully shows how this works:
First introduced by Geoffrey Hinton and his colleagues in their 2012 paper titled "Improving neural networks by preventing co-adaptation of feature detectors", dropout has been applied to a wide range of problem types, including photo classification (CIFAR-10), handwritten digit recognition (MNIST), and speech recognition (TIMIT).
Some interesting facts about the applications of dropout to the MNIST dataset can be drawn from that paper.
When dropout was presented in the 2014 journal publication "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", it was tried on a wide range of speech recognition, computer vision, and text classification problems and was found to improve performance on all of them. That paper also describes the procedure for training dropout neural networks.
Using deep convolutional neural networks with dropout regularization, Alex Krizhevsky et al., in their 2012 publication "ImageNet Classification with Deep Convolutional Neural Networks", produced state-of-the-art results for image classification on the ImageNet dataset.
The dropout approach is a great illustration of how PyTorch has made coding simple and straightforward.
With two lines of code, we can achieve our goal, which seems difficult at first. We simply need to add an extra dropout layer when building our model; the torch.nn.Dropout() class will be used for this.
This class randomly deactivates some of the elements of the input tensor during training. The parameter p gives the probability of a neuron being deactivated. It has a default value of 0.5, meaning that half of the neurons will drop out. During training the outputs are scaled by a factor of 1/(1-p), while during evaluation the module simply computes the identity function.
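A tiny experiment makes this behavior concrete: in training mode, the surviving elements are scaled up by 1/(1-p), while in evaluation mode the layer is the identity:
import torch
import torch.nn as nn
drop = nn.Dropout(p=0.5)
x = torch.ones(8)
drop.train()    # training mode: roughly half the elements are zeroed,
print(drop(x))  # and the survivors are scaled to 1/(1-0.5) = 2.0
drop.eval()     # evaluation mode: the identity function
print(drop(x))  # prints all ones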
Take a closer look at our model architecture. We will use the program below to build an updated version of the baseline model.
In this scenario, a dropout layer is applied after each stage, with decreasing probability.
class NetDropout(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.do1 = nn.Dropout(0.2)  # 20% Probability
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.do2 = nn.Dropout(0.2)  # 20% Probability
        self.fc2 = nn.Linear(120, 84)
        self.do3 = nn.Dropout(0.1)  # 10% Probability
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.do1(x)
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.do2(x)
        x = F.relu(self.fc2(x))
        x = self.do3(x)
        x = self.fc3(x)
        return x
net_dropout = NetDropout()
net_dropout.to(device)
Let's run the training as before:
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net_dropout.parameters(), lr=0.001, momentum=0.9)
training_loss_d, testing_loss_d = [], []
running_loss = []
i = 0
for epoch in range(150):  # 150 epochs
    for data in train_loader:
        inputs, labels = data
        # get the data to GPU (if available)
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        # forward pass
        outputs = net_dropout(inputs)
        # backward pass
        loss = criterion(outputs, labels)
        loss.backward()
        # update gradients
        optimizer.step()
        running_loss.append(loss.item())
        i += 1
        if i % 1000 == 0:
            avg_train_loss = sum(running_loss) / len(running_loss)
            avg_test_loss = get_test_loss(net_dropout, criterion, test_loader)
            # clear the list
            running_loss.clear()
            # for logging & plotting later
            training_loss_d.append(avg_train_loss)
            testing_loss_d.append(avg_test_loss)
            print(f"[{epoch:2d}] [it={i:5d}] Train Loss: {avg_train_loss:.3f}, Test Loss: {avg_test_loss:.3f}")
print("Done training.")
[ 1] [it= 1000] Train Loss: 2.302, Test Loss: 2.298
[10] [it= 8000] Train Loss: 1.510, Test Loss: 1.489
[20] [it=16000] Train Loss: 1.290, Test Loss: 1.318
[30] [it=24000] Train Loss: 1.167, Test Loss: 1.214
[40] [it=32000] Train Loss: 1.085, Test Loss: 1.154
[49] [it=39000] Train Loss: 1.025, Test Loss: 1.141
[60] [it=47000] Train Loss: 0.979, Test Loss: 1.113
[70] [it=55000] Train Loss: 0.936, Test Loss: 1.082
[80] [it=63000] Train Loss: 0.902, Test Loss: 1.088
[90] [it=71000] Train Loss: 0.880, Test Loss: 1.087
[99] [it=78000] Train Loss: 0.856, Test Loss: 1.090
[109] [it=86000] Train Loss: 0.843, Test Loss: 1.094
[120] [it=94000] Train Loss: 0.818, Test Loss: 1.102
[130] [it=102000] Train Loss: 0.805, Test Loss: 1.090
[140] [it=110000] Train Loss: 0.796, Test Loss: 1.094
[149] [it=117000] Train Loss: 0.785, Test Loss: 1.115
Done training.
Now let's plot the test loss of both networks (with and without dropout):
import matplotlib.pyplot as plt
# plot both benchmarks
plt.plot(testing_loss, label="no dropout")
plt.plot(testing_loss_d, label="with dropout")
# make the legend on the plot
plt.legend()
plt.title("The Cross-entropy loss of the MNIST test data with and w/o Dropout")
plt.show()
Output:
As you can see, the test loss of the neural network without dropout started to increase after about 20 epochs, clearly overfitting. With dropout introduced, the test loss keeps decreasing over time.
After applying the dropout technique, we notice a slight improvement. Dropout is not the only technique that can be used to avoid overfitting; other techniques exist, such as weight decay or early stopping, as sketched below.
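Weight decay (L2 regularization), for instance, can be enabled in PyTorch directly through the optimizer. A minimal sketch; the weight_decay value below is an arbitrary illustration, not a tuned setting:
import torch.optim as optim
# the weight_decay argument adds L2 regularization to every update
optimizer = optim.SGD(net_dropout.parameters(), lr=0.001,
                      momentum=0.9, weight_decay=1e-4)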
You can get the complete code for this tutorial in the Colab notebook here.
Original article source at https://www.thepythoncode.com
#pytorch #python #deeplearning
Khi huấn luyện mạng nơ-ron có tính năng bỏ qua, kết quả đầu ra của lớp cụ thể bị bỏ qua hoặc bị loại bỏ một cách ngẫu nhiên. Điều này có tác dụng làm cho lớp có số lượng nút và độ kết nối khác với lớp trước đó.
Mỗi bản cập nhật cho một lớp trong quá trình đào tạo được thực hiện với một cái nhìn riêng biệt về lớp đã được định cấu hình. Việc bỏ học làm cho quá trình đào tạo trở nên ồn ào, khiến các nút trong một lớp phải chịu trách nhiệm nhiều hơn hoặc ít hơn đối với các đầu vào trên cơ sở xác suất.
Trên dữ liệu đào tạo, các tế bào thần kinh học cách sửa chữa các lỗi do đồng nghiệp của chúng mắc phải, một quá trình được gọi là đồng thích ứng. Do đó, khả năng của mạng để phù hợp với dữ liệu đào tạo được cải thiện đáng kể. Bởi vì các đồng điều chỉnh được điều chỉnh cho phù hợp với đặc thù của dữ liệu đào tạo, chúng sẽ không khái quát hóa thành dữ liệu thử nghiệm; do đó, nó trở nên dễ bay hơi hơn.
Nitish Srivastava, Ilya Sutskever, Geoffrey Hinton, Alex Krizhevsky và Ruslan Salakhutdinov là những người đầu tiên đề xuất bỏ học để giảm thiểu tác động của đồng thích ứng thần kinh.
Nguồn: báo .
Bỏ học là một kỹ thuật giải quyết cả hai vấn đề được đề cập ở trên trong mạng nơ-ron. Nó tránh trang bị quá nhiều và cho phép kết hợp xấp xỉ hiệu quả của một số lượng lớn vô hạn các cấu trúc liên kết mạng nơ-ron duy nhất.
Bỏ học đề cập đến việc loại bỏ các đơn vị (cả ẩn và rõ ràng) khỏi mạng nơ-ron. Bỏ một thiết bị ra ngoài có nghĩa là tạm thời xóa nó khỏi mạng, bao gồm tất cả các kết nối đến và đi của nó. Việc bỏ học bắt chước một kích hoạt thưa thớt từ một lớp nhất định, điều này thúc đẩy mạng học cách biểu diễn thưa thớt như một hiệu ứng phụ. Nó có thể được sử dụng thay vì quy định hóa hoạt động để khuyến khích các đại diện thưa thớt trong các mô hình tự động mã hóa.
Bỏ học thêm một siêu tham số mới: khả năng giữ một đơn vị p. Siêu tham số này kiểm soát mức độ Bỏ học.
Không có học sinh bỏ học khi p = 1 và giá trị p thấp cho thấy có rất nhiều học sinh bỏ học. Đối với các đơn vị ẩn, giá trị p điển hình nằm trong khoảng từ 0,5 đến 0,8.
Loại đầu vào ảnh hưởng đến việc lựa chọn các lớp đầu vào. Giá trị điển hình cho các đầu vào có giá trị thực (bản vá hình ảnh hoặc khung âm thanh) là 0,8.
Sự lựa chọn p cho các lớp ẩn được liên kết với số đơn vị ẩn n. P nhỏ hơn đòi hỏi n lớn, điều này làm chậm quá trình đào tạo và dẫn đến tình trạng thiếu trang bị.
Khi p lớn, có thể không có đủ Số lượng bỏ học để tránh bị quá tải. Trong một lớp cụ thể, các Đơn vị sẽ có 70% cơ hội vẫn hoạt động và 30% nguy cơ bị loại bỏ nếu xác suất duy trì trong lớp đó là 0,7%.
Xin lưu ý rằng PyTorch và các khuôn khổ học tập sâu khác sử dụng tỷ lệ bỏ học thay vì tỷ lệ duy trì p, tỷ lệ duy trì 70% có nghĩa là tỷ lệ bỏ học 30%, v.v.
Khi đào tạo, học viên bỏ học chỉ được sử dụng để tăng khả năng của mạng để chống lại những thay đổi trong tập dữ liệu đào tạo. Trong quá trình thử nghiệm, bạn có thể muốn sử dụng toàn bộ mạng.
Đáng chú ý, việc bỏ học không được sử dụng với dữ liệu thử nghiệm hoặc trong quá trình suy luận sản xuất. Kết quả là bạn sẽ có nhiều kết nối và kích hoạt hơn trong tế bào thần kinh của mình trong quá trình suy luận hơn là trong quá trình luyện tập.
Do đó, các tế bào thần kinh trong lớp tiếp theo sẽ bị kích thích quá mức nếu bạn sử dụng tỷ lệ bỏ học 50% trong quá trình đào tạo và loại bỏ hai trong số bốn tế bào thần kinh trong một lớp. Như vậy, các giá trị do các nơ-ron này tạo ra sẽ lớn hơn 50% so với mức cần thiết. Xác suất duy trì (1 - tỷ lệ bỏ học) được sử dụng để giảm trọng lượng của các tế bào thần kinh bị kích hoạt quá mức trong thời gian kiểm tra và suy luận. Hình dưới đây, được tạo ra bởi thành viên Dmytro Prylipko của Datascience Stackexchange , mô tả tuyệt vời cách thức hoạt động của nó:
Lần đầu tiên được giới thiệu bởi Geoffrey Hinton và các đồng nghiệp trong bài báo năm 2012 của họ có tiêu đề Cải thiện mạng nơ-ron bằng cách ngăn chặn sự đồng điều chỉnh của các bộ phát hiện tính năng , tính năng bỏ học đã được áp dụng cho một loạt các dạng vấn đề, bao gồm phân loại ảnh (CIFAR-10), nhận dạng chữ số viết tay (MNIST ) và nhận dạng giọng nói (TIMIT).
Bài báo này có thể suy ra một số sự thật thú vị về các ứng dụng bỏ học trên tập dữ liệu MNIST.
Khi tính năng bỏ học được giới thiệu trong một ấn phẩm tạp chí năm 2014 mang tên Bỏ học: Cách đơn giản để ngăn mạng thần kinh bị trang bị quá mức , nó đã được thử nghiệm trên nhiều vấn đề về nhận dạng giọng nói, thị giác máy tính và phân loại văn bản và phát hiện ra rằng nó đã cải thiện hiệu suất trên tất cả chúng. . Bài báo này mô tả quy trình huấn luyện lưới thần kinh bỏ học.
Sử dụng mạng nơ-ron phức hợp sâu với quy định bỏ lớp, Alex Krizhevsky và cộng sự, trong ấn phẩm năm 2012 của họ " Phân loại ImageNet với Mạng nơ-ron sâu ", đã tạo ra kết quả hiện đại để phân loại hình ảnh trên tập dữ liệu ImageNet.
Phương pháp Bỏ học là một minh họa tuyệt vời về cách PyTorch đã làm cho việc viết mã trở nên đơn giản và dễ hiểu.
Với hai dòng mã, chúng ta có thể đạt được mục tiêu của mình, mà thoạt đầu có vẻ là một khó khăn. Chúng tôi chỉ cần thêm một lớp bỏ trang bổ sung khi phát triển mô hình của mình. Lớp torch.nn.Dropout()
sẽ được sử dụng để làm điều này.
Một số phần tử tensor đầu vào bị vô hiệu hóa ngẫu nhiên bởi lớp này trong quá trình đào tạo. Tham số p cho biết khả năng một tế bào thần kinh bị vô hiệu hóa. Tùy chọn này có giá trị mặc định là 0,5, ngụ ý rằng một nửa số nơ-ron sẽ bị loại bỏ. Các kết quả đầu ra được chia tỷ lệ theo hệ số 1/1-p, điều này chỉ ra rằng mô-đun chỉ đơn thuần tính toán chức năng nhận dạng trong quá trình đánh giá.
Hãy xem xét kỹ hơn kiến trúc của mô hình của chúng tôi. Chúng tôi sẽ sử dụng chương trình dưới đây để cập nhật phiên bản của mô hình cơ sở.
Trong trường hợp này, bỏ học được áp dụng sau mỗi bước theo thứ tự giảm dần.
class NetDropout(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.do1 = nn.Dropout(0.2) # 20% Probability
self.fc1 = nn.Linear(16 * 5 * 5, 120)
self.do2 = nn.Dropout(0.2) # 20% Probability
self.fc2 = nn.Linear(120, 84)
self.do3 = nn.Dropout(0.1) # 10% Probability
self.fc3 = nn.Linear(84, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = self.do1(x)
x = torch.flatten(x, 1) # flatten all dimensions except batch
x = F.relu(self.fc1(x))
x = self.do2(x)
x = F.relu(self.fc2(x))
x = self.do3(x)
x = self.fc3(x)
return x
net_dropout = NetDropout()
net_dropout.to(device)
Hãy thực hiện đào tạo như trước:
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net_dropout.parameters(), lr=0.001, momentum=0.9)
training_loss_d, testing_loss_d = [], []
running_loss = []
i = 0
for epoch in range(150): # 10 epochs
for data in train_loader:
inputs, labels = data
# get the data to GPU (if available)
inputs, labels = inputs.to(device), labels.to(device)
optimizer.zero_grad()
# forward pass
outputs = net_dropout(inputs)
# backward pass
loss = criterion(outputs, labels)
loss.backward()
# update gradients
optimizer.step()
running_loss.append(loss.item())
i += 1
if i % 1000 == 0:
avg_train_loss = sum(running_loss) / len(running_loss)
avg_test_loss = get_test_loss(net_dropout, criterion, test_loader)
# clear the list
running_loss.clear()
# for logging & plotting later
training_loss_d.append(avg_train_loss)
testing_loss_d.append(avg_test_loss)
print(f"[{epoch:2d}] [it={i:5d}] Train Loss: {avg_train_loss:.3f}, Test Loss: {avg_test_loss:.3f}")
print("Done training.")
[ 1] [it= 1000] Train Loss: 2.302, Test Loss: 2.298
[10] [it= 8000] Train Loss: 1.510, Test Loss: 1.489
[20] [it=16000] Train Loss: 1.290, Test Loss: 1.318
[30] [it=24000] Train Loss: 1.167, Test Loss: 1.214
[40] [it=32000] Train Loss: 1.085, Test Loss: 1.154
[49] [it=39000] Train Loss: 1.025, Test Loss: 1.141
[60] [it=47000] Train Loss: 0.979, Test Loss: 1.113
[70] [it=55000] Train Loss: 0.936, Test Loss: 1.082
[80] [it=63000] Train Loss: 0.902, Test Loss: 1.088
[90] [it=71000] Train Loss: 0.880, Test Loss: 1.087
[99] [it=78000] Train Loss: 0.856, Test Loss: 1.090
[109] [it=86000] Train Loss: 0.843, Test Loss: 1.094
[120] [it=94000] Train Loss: 0.818, Test Loss: 1.102
[130] [it=102000] Train Loss: 0.805, Test Loss: 1.090
[140] [it=110000] Train Loss: 0.796, Test Loss: 1.094
[149] [it=117000] Train Loss: 0.785, Test Loss: 1.115
Done training.
Bây giờ, hãy lập biểu đồ kiểm tra mất mát của cả hai mạng (có và không có bỏ mạng):
import matplotlib.pyplot as plt
# plot both benchmarks
plt.plot(testing_loss, label="no dropout")
plt.plot(testing_loss_d, label="with dropout")
# make the legend on the plot
plt.legend()
plt.title("The Cross-entropy loss of the MNIST test data with and w/o Dropout")
plt.show()
Đầu ra:
Như bạn có thể thấy, sự mất mát thử nghiệm của mạng nơ-ron không bị bỏ rơi bắt đầu tăng lên sau khoảng 20 kỷ, rõ ràng là quá mức. Trong khi khi giới thiệu tình trạng bỏ học, nó tiếp tục giảm theo thời gian.
Sau khi áp dụng kỹ thuật bỏ học, chúng tôi nhận thấy một chút cải thiện. Bỏ học không phải là kỹ thuật duy nhất có thể được sử dụng để tránh mặc quá sức. Có những kỹ thuật khác, chẳng hạn như giảm trọng lượng hoặc dừng sớm.
Bạn có thể lấy mã hoàn chỉnh cho hướng dẫn này trên sổ tay Colab tại đây .
Nguồn bài viết gốc tại https://www.thepythoncode.com
#pytorch #python #deeplearning
1652164644
Learn the importance of dropout regularization and how to apply it in PyTorch Deep learning framework in Python.
Model performance can often be considerably improved by fine-tuning hyperparameters, which means identifying settings that outperform the model's defaults. Several techniques also exist for improving how well a model generalizes; one of them is dropout.
In this tutorial, we will present dropout regularization for neural networks. We first explore the background and motivation for adopting dropout, followed by a description of how dropout works theoretically and how to implement it with the PyTorch library in Python.
We will also see a plot of the loss on the testing set through time on the neural network with and without dropout.
For demonstration, we'll be using the CIFAR-10 dataset that is available in the torchvision library. Let's install all the dependencies of this tutorial:
$ pip install matplotlib==3.4.3 numpy==1.21.5 torch==1.10.1 torchvision
If you're on Colab, you don't have to install anything, as everything comes pre-installed. Importing the necessary libraries:
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
Let's make sure we train using GPU if CUDA is available:
# defining our device, 'cuda:0' if CUDA is available, 'cpu' otherwise
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device
device(type='cuda', index=0)
Let's make the transform pipeline for our CIFAR-10 dataset:
# make the transform pipeline, converting to tensor and normalizing
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# the batch size during training
batch_size = 64
Next, loading the training dataset:
train_dataset = torchvision.datasets.CIFAR10(root="./data", train=True,
download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
shuffle=True, num_workers=2)
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
170499072/? [00:06<00:00, 29640614.08it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
We pass True to train to indicate the training dataset, and we also set download to True to download the dataset into the specified data folder. After that, we make our DataLoader, pass the batch_size, and set shuffle to True.
Similarly, for the test set:
test_dataset = torchvision.datasets.CIFAR10(root="./data", train=False,
download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size,
shuffle=False, num_workers=2)
Files already downloaded and verified
This time we set train to False to get the testing set.
Below are the available classes in the CIFAR-10 dataset:
# the CIFAR-10 classes
classes = ('plane', 'car', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Now that we have the dataset ready for training, let's make a neural network without dropout:
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.fc1 = nn.Linear(16 * 5 * 5, 120)
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = torch.flatten(x, 1) # flatten all dimensions except batch
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
net = Net()
# switch to GPU if available
net.to(device)
Initializing the cross-entropy loss and the SGD optimizer:
import torch.optim as optim
# defining the loss and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Next, let's make a function that accepts the neural network, the loss, and the data loader to calculate the overall loss:
def get_test_loss(net, criterion, data_loader):
"""A simple function that iterates over `data_loader` to calculate the overall loss"""
testing_loss = []
with torch.no_grad():
for data in data_loader:
inputs, labels = data
# get the data to GPU (if available)
inputs, labels = inputs.to(device), labels.to(device)
outputs = net(inputs)
# calculate the loss for this batch
loss = criterion(outputs, labels)
# add the loss of this batch to the list
testing_loss.append(loss.item())
# calculate the average loss
return sum(testing_loss) / len(testing_loss)
We'll be needing this function during training. Note that for a network containing dropout layers you would normally also call net.eval() before measuring the test loss (and net.train() afterwards) so that dropout is disabled during evaluation; we keep the tutorial's loop as-is. Let's start training now:
training_loss, testing_loss = [], []
running_loss = []
i = 0
for epoch in range(150): # 150 epochs
for data in train_loader:
inputs, labels = data
# get the data to GPU (if available)
inputs, labels = inputs.to(device), labels.to(device)
optimizer.zero_grad()
# forward pass
outputs = net(inputs)
# backward pass
loss = criterion(outputs, labels)
loss.backward()
# update gradients
optimizer.step()
running_loss.append(loss.item())
i += 1
if i % 1000 == 0:
avg_train_loss = sum(running_loss) / len(running_loss)
avg_test_loss = get_test_loss(net, criterion, test_loader)
# clear the list
running_loss.clear()
# for logging & plotting later
training_loss.append(avg_train_loss)
testing_loss.append(avg_test_loss)
print(f"[{epoch:2d}] [it={i:5d}] Train Loss: {avg_train_loss:.3f}, Test Loss: {avg_test_loss:.3f}")
print("Done training.")
Output:
[ 1] [it= 1000] Train Loss: 2.273, Test Loss: 2.118
[10] [it= 8000] Train Loss: 1.312, Test Loss: 1.326
[20] [it=16000] Train Loss: 1.031, Test Loss: 1.120
[30] [it=24000] Train Loss: 0.854, Test Loss: 1.043
[40] [it=32000] Train Loss: 0.718, Test Loss: 1.051
[51] [it=40000] Train Loss: 0.604, Test Loss: 1.085
[60] [it=47000] Train Loss: 0.521, Test Loss: 1.178
[70] [it=55000] Train Loss: 0.425, Test Loss: 1.370
[80] [it=63000] Train Loss: 0.348, Test Loss: 1.518
[93] [it=73000] Train Loss: 0.268, Test Loss: 1.859
[99] [it=78000] Train Loss: 0.248, Test Loss: 2.036
[109] [it=86000] Train Loss: 0.200, Test Loss: 2.351
[120] [it=94000] Train Loss: 0.161, Test Loss: 2.610
[130] [it=102000] Train Loss: 0.142, Test Loss: 2.976
[140] [it=110000] Train Loss: 0.117, Test Loss: 3.319
[149] [it=117000] Train Loss: 0.095, Test Loss: 3.593
Done training.
Deep neural networks include several non-linear hidden layers, making them highly expressive models capable of learning complex correlations between inputs and outputs. However, with minimal training data, many of these complex associations will result from sampling noise; thus, they will exist in the training set but not in the actual test data, even if they are derived from the same distribution. Fitting all feasible alternative neural networks on the same dataset and averaging the predictions from each model is one method for reducing overfitting.
Doing this exactly is infeasible, but it can be approximated by using a small collection of distinct models known as an ensemble.
Training separate networks is costly when the networks are massive. Combining many models is most useful when the individual models differ from one another; for neural nets, that means giving them different architectures or training them on separate data. Training many architectures is difficult, because determining ideal hyperparameters for each one is a demanding undertaking, and training each huge network demands a significant amount of computing.
Furthermore, large networks often need a considerable quantity of training data, and there may not be enough data to train separate networks on different subsets. So even with the ensemble approximation there is a difficulty: it requires various models to be fit and stored, which may be impractical if the models are huge and take days or weeks to train and tune.
During training, the network's capacity is reduced because the outputs of a layer under dropout are randomly subsampled. Consequently, a network trained with dropout may need to be wider.
When training neural networks with dropout, specific layer outputs are ignored or dropped out at random. This has the effect of making the layer look like a layer with a different number of nodes and a different connectivity to the preceding layer.
Each update to a layer during training is made with a distinct view of the configured layer. Dropout makes the training process noisy, causing nodes within a layer to take on more or less responsibility for the inputs on a probabilistic basis.
On the training data, neurons learn to correct errors committed by their peers, a process known as co-adaptation. This dramatically improves the network's ability to fit the training data. But because the co-adaptations are tailored to the particularities of the training data, they do not generalize to the test data, so the network overfits.
Nitish Srivastava, Ilya Sutskever, Geoffrey Hinton, Alex Krizhevsky, and Ruslan Salakhutdinov were the first to propose dropout to minimize the effects of neural co-adaptation.
Source: the paper.
Dropout is a technique that addresses both issues mentioned above. It avoids overfitting and enables the efficient approximate combination of an exponentially large number of distinct neural network topologies.
Dropout refers to removing units (both hidden and visible) from a neural network. Dropping a unit out means temporarily removing it from the network, including all of its incoming and outgoing connections. Dropout mimics a sparse activation from a given layer, which pushes the network to learn a sparse representation as a side effect. It can be employed instead of activity regularization to encourage sparse representations in autoencoder models.
Dropout adds a new hyperparameter: the probability of keeping a unit, p. This hyperparameter controls the degree of dropout.
There is no dropout when p = 1, and low p values mean heavy dropout. For hidden units, typical p values range from 0.5 to 0.8.
For input layers, the choice of p depends on the kind of input. A typical value for real-valued inputs (image patches or audio frames) is 0.8.
The choice of p for hidden layers is linked to the number of hidden units n. A smaller p demands a larger n, which slows down training and can lead to underfitting.
When p is large, there may not be enough dropout to avoid overfitting. For example, if the retention probability in a particular layer is 0.7, each unit in that layer has a 70% chance of remaining active and a 30% risk of being dropped.
Please note that PyTorch and other deep learning frameworks use a dropout rate rather than a keep rate p: a 70% keep rate means a 30% dropout rate, and so on.
During training, dropout is used solely to make the network more robust to variations in the training dataset. At test time, you'll want to use the whole network: dropout is not applied to test data or during production inference. As a result, each neuron receives inputs from more connections and activations during inference than it did during training.
The neurons in the subsequent layer would therefore be overexcited: if you use a 50% dropout rate during training, removing two out of every four neurons in a layer, then at inference time the values fed forward are roughly twice as large as those seen during training. To compensate, the retention probability (1 - dropout rate) is used to scale down the weights of the over-activated neurons during test and inference time. The figure below, created by Datascience Stackexchange member Dmytro Prylipko, beautifully depicts how this works:
First introduced by Geoffrey Hinton and colleagues in their 2012 paper titled Improving neural networks by preventing co-adaptation of feature detectors, dropout has been applied to a wide range of problem types, including image classification (CIFAR-10), handwritten digit recognition (MNIST), and speech recognition (TIMIT).
From that paper, one can deduce some interesting facts about applying dropout to the MNIST dataset.
When dropout was presented in a 2014 journal publication entitled Dropout: A Simple Way to Prevent Neural Networks from Overfitting, it was tried on a wide variety of speech recognition, computer vision, and text classification problems and was found to improve performance on all of them. That paper also describes the procedure for training dropout neural nets.
Using deep convolutional neural networks with dropout regularization, Alex Krizhevsky, et al., in their 2012 publication "ImageNet Classification with Deep Convolutional Neural Networks", produced state-of-the-art results for image classification on the ImageNet dataset.
The Dropout approach is an excellent illustration of how PyTorch has made coding simple and straightforward.
With two lines of code, we can achieve our objective, which at first seems difficult. We just need to add an extra dropout layer when developing our model, using the torch.nn.Dropout class.
This class randomly deactivates some of the elements of the input tensor during training. The parameter p is the probability of a neuron being deactivated; it defaults to 0.5, meaning half of the neurons will drop out. During training the outputs are scaled by a factor of 1/(1-p), and during evaluation the module simply computes the identity function.
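To see both behaviors concretely, here is a minimal sketch (not part of the original tutorial) that applies the same Dropout layer in training mode and in evaluation mode:
import torch
import torch.nn as nn
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)
drop.train()    # training mode: elements zeroed at random, survivors scaled by 1/(1-p) = 2
print(drop(x))  # e.g. tensor([[0., 2., 2., 0., 2., 0., 2., 2.]]) (pattern is random)
drop.eval()     # evaluation mode: the identity function
print(drop(x))  # tensor([[1., 1., 1., 1., 1., 1., 1., 1.]])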
Let's take a closer look at our model's architecture. We will use the program below to build a dropout version of the baseline model. In this version, dropout is applied after the convolutional block and after each fully connected layer, with decreasing probabilities.
class NetDropout(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.do1 = nn.Dropout(0.2) # 20% Probability
self.fc1 = nn.Linear(16 * 5 * 5, 120)
self.do2 = nn.Dropout(0.2) # 20% Probability
self.fc2 = nn.Linear(120, 84)
self.do3 = nn.Dropout(0.1) # 10% Probability
self.fc3 = nn.Linear(84, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = self.do1(x)
x = torch.flatten(x, 1) # flatten all dimensions except batch
x = F.relu(self.fc1(x))
x = self.do2(x)
x = F.relu(self.fc2(x))
x = self.do3(x)
x = self.fc3(x)
return x
net_dropout = NetDropout()
net_dropout.to(device)
Let's do the training as before:
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net_dropout.parameters(), lr=0.001, momentum=0.9)
training_loss_d, testing_loss_d = [], []
running_loss = []
i = 0
for epoch in range(150): # 150 epochs
for data in train_loader:
inputs, labels = data
# get the data to GPU (if available)
inputs, labels = inputs.to(device), labels.to(device)
optimizer.zero_grad()
# forward pass
outputs = net_dropout(inputs)
# backward pass
loss = criterion(outputs, labels)
loss.backward()
# update gradients
optimizer.step()
running_loss.append(loss.item())
i += 1
if i % 1000 == 0:
avg_train_loss = sum(running_loss) / len(running_loss)
avg_test_loss = get_test_loss(net_dropout, criterion, test_loader)
# clear the list
running_loss.clear()
# for logging & plotting later
training_loss_d.append(avg_train_loss)
testing_loss_d.append(avg_test_loss)
print(f"[{epoch:2d}] [it={i:5d}] Train Loss: {avg_train_loss:.3f}, Test Loss: {avg_test_loss:.3f}")
print("Done training.")
[ 1] [it= 1000] Train Loss: 2.302, Test Loss: 2.298
[10] [it= 8000] Train Loss: 1.510, Test Loss: 1.489
[20] [it=16000] Train Loss: 1.290, Test Loss: 1.318
[30] [it=24000] Train Loss: 1.167, Test Loss: 1.214
[40] [it=32000] Train Loss: 1.085, Test Loss: 1.154
[49] [it=39000] Train Loss: 1.025, Test Loss: 1.141
[60] [it=47000] Train Loss: 0.979, Test Loss: 1.113
[70] [it=55000] Train Loss: 0.936, Test Loss: 1.082
[80] [it=63000] Train Loss: 0.902, Test Loss: 1.088
[90] [it=71000] Train Loss: 0.880, Test Loss: 1.087
[99] [it=78000] Train Loss: 0.856, Test Loss: 1.090
[109] [it=86000] Train Loss: 0.843, Test Loss: 1.094
[120] [it=94000] Train Loss: 0.818, Test Loss: 1.102
[130] [it=102000] Train Loss: 0.805, Test Loss: 1.090
[140] [it=110000] Train Loss: 0.796, Test Loss: 1.094
[149] [it=117000] Train Loss: 0.785, Test Loss: 1.115
Done training.
Now let's plot the test loss of both networks (with and without dropout):
import matplotlib.pyplot as plt
# plot both benchmarks
plt.plot(testing_loss, label="no dropout")
plt.plot(testing_loss_d, label="with dropout")
# make the legend on the plot
plt.legend()
plt.title("The Cross-entropy loss of the CIFAR-10 test data with and w/o Dropout")
plt.show()
Output:
As you can see, the test loss of the neural network without dropout starts to increase after about 20 epochs, a clear sign of overfitting. With dropout introduced, the test loss keeps decreasing over time.
After applying the dropout technique, we notice a slight improvement. Dropout is not the only technique that can be used to avoid overfitting; there are others, such as weight decay or early stopping.
You can get the complete code for this tutorial on the Colab notebook here.
Original article source at https://www.thepythoncode.com
#pytorch #python #deeplearning
1652025600
Reinforcement learning algorithms in RLlib and PyTorch.
pip install raylab
Raylab provides agents and environments to be used with a normal RLlib/Tune setup. You can pass an agent's name (from the Algorithms section) to raylab info list to list its top-level configurations:
raylab info list SoftAC
learning_starts: 0
Hold this number of timesteps before first training operation.
policy: {}
Sub-configurations for the policy class.
wandb: {}
Configs for integration with Weights & Biases.
Accepts arbitrary keyword arguments to pass to `wandb.init`.
The defaults for `wandb.init` are:
* name: `_name` property of the trainer.
* config: full `config` attribute of the trainer
* config_exclude_keys: `wandb` and `callbacks` configs
* reinit: True
Don't forget to:
* install `wandb` via pip
* login to W&B with the appropriate API key for your
team/project.
* set the `wandb/project` name in the config dict
Check out the Quickstart for more information:
`https://docs.wandb.com/quickstart`
You can add the --rllib flag to get the descriptions for all the options common to RLlib agents (or Trainers).
Launching experiments can be done via the command line using raylab experiment, passing a file path with an agent's configuration through the --config flag. The following command uses the cartpole example configuration file to launch an experiment using the vanilla Policy Gradient agent from the RLlib library.
raylab experiment PG --name PG -s training_iteration 10 --config examples/PG/cartpole_defaults.py
You can also launch an experiment from a Python script normally using Ray and Tune. The following shows how you may use Raylab to perform an experiment comparing different types of exploration for the NAF agent.
import ray
from ray import tune
import raylab
def main():
raylab.register_all_agents()
raylab.register_all_environments()
ray.init()
tune.run(
"NAF",
local_dir="data/NAF",
stop={"timesteps_total": 100000},
config={
"env": "CartPoleSwingUp-v0",
"exploration_config": {
"type": tune.grid_search([
"raylab.utils.exploration.GaussianNoise",
"raylab.utils.exploration.ParameterNoise"
])
}
},
num_samples=10,
)
if __name__ == "__main__":
main()
One can then visualize the results using raylab dashboard, passing the local_dir used in the experiment. The dashboard lets you filter and group results in a quick way.
raylab dashboard data/NAF/
You can find the best checkpoint according to a metric (episode_reward_mean by default) using raylab find-best.
raylab find-best data/NAF/
Finally, you can pass a checkpoint to raylab rollout to see the returns collected by the agent and render it if the environment supports a visual render() method. For example, you can use the output of the find-best command to see the best agent in action.
raylab rollout $(raylab find-best data/NAF/) --agent NAF
Paper                                              | Agent Name
-------------------------------------------------- | ------------------------
Actor Critic using Kronecker-factored Trust Region | ACKTR
Trust Region Policy Optimization                   | TRPO
Normalized Advantage Function                      | NAF
Stochastic Value Gradients                         | SVG(inf)/SVG(1)/SoftSVG
Soft Actor-Critic                                  | SoftAC
Streamlined Off-Policy (DDPG)                      | SOP
Model-Based Policy Optimization                    | MBPO
Model-based Action-Gradient-Estimator              | MAGE
For a high-level description of the available utilities, run raylab --help:
Usage: raylab [OPTIONS] COMMAND [ARGS]...
RayLab: Reinforcement learning algorithms in RLlib.
Options:
--help Show this message and exit.
Commands:
dashboard Launch the experiment dashboard to monitor training progress.
episodes Launch the episode dashboard to monitor state and action...
experiment Launch a Tune experiment from a config file.
find-best Find the best experiment checkpoint as measured by a metric.
info View information about an agent's config parameters.
rollout Wrap `rllib rollout` with customized options.
test-module Launch dashboard to test generative models from a checkpoint.
The project is structured as follows
raylab
|-- agents # Trainer and Policy classes
|-- cli # Command line utilities
|-- envs # Gym environment registry and utilities
|-- logger # Tune loggers
|-- policy # Extensions and customizations of RLlib's policy API
| |-- losses # RL loss functions
| |-- modules # PyTorch neural network modules for TorchPolicy
|-- pytorch # PyTorch extensions
|-- utils # miscellaneous utilities
Author: 0xangelo
Source Code: https://github.com/0xangelo/raylab
License: MIT license
#pytorch #streamlit
1652004000
🚀 This project was created using the Made With ML boilerplate template. Check it out to start creating your own ML applications.
virtualenv -p python3.6 venv
source venv/bin/activate
pip install -r requirements.txt
pip install torch==1.4.0
python text_classification/utils.py
python text_classification/train.py \
--data-url https://raw.githubusercontent.com/madewithml/lessons/master/data/news.csv --lower --shuffle --use-glove
uvicorn text_classification.app:app --host 0.0.0.0 --port 5000 --reload
GOTO: http://localhost:5000/docs
python text_classification/predict.py --text 'The Canadian government officials proposed the new federal law.'
curl "http://localhost:5000/predict" \
-X POST -H "Content-Type: application/json" \
-d '{
"inputs":[
{
"text":"The Wimbledon tennis tournament starts next week!"
},
{
"text":"The Canadian government officials proposed the new federal law."
}
]
}' | json_pp
import json
import requests
headers = {
'Content-Type': 'application/json',
}
data = {
"experiment_id": "latest",
"inputs": [
{
"text": "The Wimbledon tennis tournament starts next week!"
},
{
"text": "The Canadian minister signed in the new federal law."
}
]
}
response = requests.post('http://0.0.0.0:5000/predict',
headers=headers, data=json.dumps(data))
results = json.loads(response.text)
print (json.dumps(results, indent=2, sort_keys=False))
streamlit run text_classification/streamlit.py
GOTO: http://localhost:8501
pytest
docker build -t text-classification:latest -f Dockerfile .
docker run -d -p 5000:5000 -p 6006:6006 --name text-classification text-classification:latest
Set `WANDB_API_KEY` as an environment variable.
text-classification/
├── datasets/ - datasets
├── logs/ - directory of log files
| ├── errors/ - error log
| └── info/ - info log
├── tests/ - unit tests
├── text_classification/ - ml scripts
| ├── app.py - app endpoints
| ├── config.py - configuration
| ├── data.py - data processing
| ├── models.py - model architectures
| ├── predict.py - prediction script
| ├── streamlit.py - streamlit app
| ├── train.py - training script
| └── utils.py - load embeddings and utilities
├── wandb/ - wandb experiment runs
├── .dockerignore - files to ignore on docker
├── .gitignore - files to ignore on git
├── CODE_OF_CONDUCT.md - code of conduct
├── CODEOWNERS - code owner assignments
├── CONTRIBUTING.md - contributing guidelines
├── Dockerfile - dockerfile to containerize app
├── LICENSE - license description
├── logging.json - logger configuration
├── Procfile - process script for Heroku
├── README.md - this README
├── requirements.txt - requirements
├── setup.sh - streamlit setup for Heroku
└── sweeps.yaml - hyperparameter wandb sweeps config
python text_classification/train.py \
--data-url https://raw.githubusercontent.com/madewithml/lessons/master/data/news.csv --lower --shuffle --data-size 0.1 --num-epochs 3
python text_classification/train.py \
--data-url https://raw.githubusercontent.com/madewithml/lessons/master/data/news.csv --lower --shuffle
python text_classification/train.py \
--data-url https://raw.githubusercontent.com/madewithml/lessons/master/data/news.csv --lower --shuffle --use-glove --freeze-embeddings
python text_classification/train.py \
--data-url https://raw.githubusercontent.com/madewithml/lessons/master/data/news.csv --lower --shuffle --use-glove
End-to-end topics that will be covered in subsequent lessons.
• Build image
docker build -t madewithml:latest -f Dockerfile .
• Run container if using CMD ["python", "app.py"]
or ENTRYPOINT [ "/bin/sh", "entrypoint.sh"]
docker run -p 5000:5000 --name madewithml madewithml:latest
• Get inside container if using CMD ["/bin/bash"]
docker run -p 5000:5000 -it madewithml /bin/bash
• Run container with mounted volume
docker run -p 5000:5000 -v $PWD:/root/madewithml/ --name madewithml madewithml:latest
• Other flags
-d: detached
-ti: interactive terminal
• Clean up
docker stop $(docker ps -a -q) # stop all containers
docker rm $(docker ps -a -q) # remove all containers
docker rmi $(docker images -a -q) # remove all images
Author: madewithml
Source Code: https://github.com/madewithml/e2e-ml-app-pytorch
License: MIT license
1651996800
Simple example of usage of streamlit and FastAPI for ML model serving, described in this blogpost and PyConES 2020 video.
When developing simple APIs that serve machine learning models, it can be useful to have both a backend (with API documentation) for other applications to call and a frontend for users to experiment with the functionality.
In this example, we serve an image semantic segmentation model using FastAPI for the backend service and streamlit for the frontend service. docker-compose orchestrates the two services and allows communication between them.
To run the example in a machine running Docker and docker-compose, run:
docker-compose build
docker-compose up
To visit the FastAPI documentation of the resulting service, visit http://localhost:8000 with a web browser.
To visit the streamlit UI, visit http://localhost:8501.
Logs can be inspected via:
docker-compose logs
To deploy the app, one option is deployment on Heroku (with Dockhero). To do so:
• rename docker-compose.yml to dockhero-compose.yml
• create an app (<my-app>) on a Heroku account
• install the DockHero plugin with heroku plugins:install dockhero
• run heroku dh:compose up -d --app <my-app> to deploy the app
• to visit the FastAPI documentation, open the DockHero URL on port 8000 at /docs, e.g. http://dockhero-<named-assigned-to-my-app>-12345.dockhero.io:8000/docs (not https)
• visit port 8501 to visit the streamlit interface
• inspect the logs via heroku logs -p dockhero --app <my-app>
To modify and debug the app, development in containers can be useful (and kind of fun!).
Author: davidefiocco
Source Code: https://github.com/davidefiocco/streamlit-fastapi-model-serving
License: MIT license
1651856040
PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. torchnlp extends PyTorch to provide you with basic text data processing functions.
Logo by Chloe Yeo, Corporate Sponsorship by WellSaid Labs
Make sure you have Python 3.6+ and PyTorch 1.0+. You can then install pytorch-nlp using pip:
pip install pytorch-nlp
Or to install the latest code via:
pip install git+https://github.com/PetrochukM/PyTorch-NLP.git
The complete documentation for PyTorch-NLP is available via our ReadTheDocs website.
Within an NLP data pipeline, you'll want to implement these basic steps:
Load the IMDB dataset, for example:
from torchnlp.datasets import imdb_dataset
# Load the imdb training dataset
train = imdb_dataset(train=True)
train[0] # RETURNS: {'text': 'For a movie that gets..', 'sentiment': 'pos'}
Load a custom dataset, for example:
from pathlib import Path
from torchnlp.download import download_file_maybe_extract
directory_path = Path('data/')
train_file_path = Path('trees/train.txt')
download_file_maybe_extract(
url='http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip',
directory=directory_path,
check_files=[train_file_path])
open(directory_path / train_file_path)
Don't worry, we'll handle caching for you!
Tokenize and encode your text as a tensor.
For example, a WhitespaceEncoder breaks text into tokens whenever it encounters a whitespace character.
from torchnlp.encoders.text import WhitespaceEncoder
loaded_data = ["now this ain't funny", "so don't you dare laugh"]
encoder = WhitespaceEncoder(loaded_data)
encoded_data = [encoder.encode(example) for example in loaded_data]
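The encoder also exposes its vocabulary, which the embedding snippet further below relies on; as a quick illustrative check (a sketch, not from the original README), you can inspect what was produced:
# a quick look at what the encoder produced
print(encoder.vocab_size)   # number of tokens in the vocabulary
print(encoded_data[0])      # each encoded example is a torch.Tensor of token ids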
With your loaded and encoded data in hand, you'll want to batch your dataset.
import torch
from torchnlp.samplers import BucketBatchSampler
from torchnlp.utils import collate_tensors
from torchnlp.encoders.text import stack_and_pad_tensors
encoded_data = [torch.randn(2), torch.randn(3), torch.randn(4), torch.randn(5)]
train_sampler = torch.utils.data.sampler.SequentialSampler(encoded_data)
train_batch_sampler = BucketBatchSampler(
train_sampler, batch_size=2, drop_last=False, sort_key=lambda i: encoded_data[i].shape[0])
batches = [[encoded_data[i] for i in batch] for batch in train_batch_sampler]
batches = [collate_tensors(batch, stack_tensors=stack_and_pad_tensors) for batch in batches]
PyTorch-NLP builds on top of PyTorch's existing torch.utils.data.sampler, torch.stack and default_collate to support sequential inputs of varying lengths!
With your batch in hand, you can use PyTorch to develop and train your model using gradient descent. For example, check out this example code for training on the Stanford Natural Language Inference (SNLI) Corpus.
PyTorch-NLP has a couple more NLP focused utility packages to support you! 🤗
Now that you've set up your pipeline, you may want to ensure that some functions run deterministically. Wrap any code that's random with fork_rng and you'll be good to go, like so:
import random
import numpy
import torch
from torchnlp.random import fork_rng
with fork_rng(seed=123): # Ensure determinism
print('Random:', random.randint(1, 2**31))
print('Numpy:', numpy.random.randint(1, 2**31))
print('Torch:', int(torch.randint(1, 2**31, (1,))))
This will always print:
Random: 224899943
Numpy: 843828735
Torch: 843828736
Now that you've computed your vocabulary, you may want to make use of pre-trained word vectors to set your embeddings, like so:
import torch
from torchnlp.encoders.text import WhitespaceEncoder
from torchnlp.word_to_vector import GloVe
encoder = WhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])
vocab_set = set(encoder.vocab)
pretrained_embedding = GloVe(name='6B', dim=100, is_include=lambda w: w in vocab_set)
embedding_weights = torch.Tensor(encoder.vocab_size, pretrained_embedding.dim)
for i, token in enumerate(encoder.vocab):
embedding_weights[i] = pretrained_embedding[token]
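With the weight matrix built, one way to plug it into a model (a sketch using standard PyTorch, not a torchnlp API) is via nn.Embedding.from_pretrained:
import torch.nn as nn
# load the pre-trained GloVe weights built above into an embedding layer;
# freeze=False keeps the embeddings trainable during fine-tuning
embedding = nn.Embedding.from_pretrained(embedding_weights, freeze=False)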
For example, from the neural network package, apply the state-of-the-art LockedDropout:
import torch
from torchnlp.nn import LockedDropout
input_ = torch.randn(6, 3, 10)
dropout = LockedDropout(0.5)
# Apply a LockedDropout to `input_`
dropout(input_) # RETURNS: torch.FloatTensor (6x3x10)
Compute common NLP metrics such as the BLEU score.
from torchnlp.metrics import get_moses_multi_bleu
hypotheses = ["The brown fox jumps over the dog 笑"]
references = ["The quick brown fox jumps over the lazy dog 笑"]
# Compute BLEU score with the official BLEU perl script
get_moses_multi_bleu(hypotheses, references, lowercase=True) # RETURNS: 47.9
For more, longer examples can be found in examples/.
Need more help? We are happy to answer your questions via Gitter Chat.
We've released PyTorch-NLP because we found a lack of basic toolkits for NLP in PyTorch. We hope that other organizations can benefit from the project. We are thankful for any contributions from the community.
Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to PyTorch-NLP.
torchtext and PyTorch-NLP differ in the architecture and feature set; otherwise, they are similar. torchtext and PyTorch-NLP provide pre-trained word vectors, datasets, iterators and text encoders. PyTorch-NLP also provides neural network modules and metrics. From an architecture standpoint, torchtext is object orientated with external coupling while PyTorch-NLP is object orientated with low coupling.
AllenNLP is designed to be a platform for research. PyTorch-NLP is designed to be a lightweight toolkit.
If you find PyTorch-NLP useful for an academic publication, then please use the following BibTeX to cite it:
@misc{pytorch-nlp,
author = {Petrochuk, Michael},
title = {PyTorch-NLP: Rapid Prototyping with PyTorch Natural Language Processing (NLP) Tools},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/PetrochukM/PyTorch-NLP}},
}
Author: PetrochukM
Source Code: https://github.com/PetrochukM/PyTorch-NLP
License: BSD-3-Clause License
1651845120
Overview
PyText is a deep-learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale. It achieves this by providing simple and extensible interfaces and abstractions for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We are using PyText in Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale.
Core PyText features:
Installing PyText
To get started on a Cloud VM, check out our guide.
Get the source code:
$ git clone https://github.com/facebookresearch/pytext
$ cd pytext
Create a virtualenv and install PyText:
$ python3 -m venv pytext_venv
$ source pytext_venv/bin/activate
(pytext_venv) $ pip install pytext-nlp
Detailed instructions and more installation options can be found in our Documentation. If you encounter issues with missing dependencies during installation, please refer to OS Dependencies.
Train your first text classifier
For this first example, we'll train a CNN-based text-classifier that classifies text utterances, using the examples in tests/data/train_data_tiny.tsv. The data and config files can be obtained either by cloning the repository or by downloading the files manually from GitHub.
(pytext_venv) $ pytext train < demo/configs/docnn.json
By default, the model is created in /tmp/model.pt.
Now you can export your model as a caffe2 net:
(pytext_venv) $ pytext export < demo/configs/docnn.json
You can use the exported caffe2 model to predict the class of raw utterances like this:
(pytext_venv) $ pytext --config-file demo/configs/docnn.json predict <<< '{"text": "create an alarm for 1:30 pm"}'
More examples and tutorials can be found in Full Documentation.
Join the community
License
PyText is BSD-licensed, as found in the LICENSE file.
Author: facebookresearch
Source Code: https://github.com/facebookresearch/pytext
License: View license
1651836960
This repository trains the Deep Convolutional GAN in both Pytorch and Tensorflow on Anime-Faces dataset. It is tested with:
Cuda-11.1
Cudnn-8.0
The PyTorch and TensorFlow scripts require numpy, tensorflow, and torch. To get the versions of these packages you need for the program, use pip (make sure pip is upgraded: python3 -m pip install -U pip):
pip3 install -r requirements.txt
├── PyTorch
│ ├── DCGAN_Anime_Pytorch.ipynb
│ └── dcgan_anime_pytorch.py
└── TensorFlow
├── DCGAN_Anime_Tensorflow.ipynb
└── dcgan_anime_tesnorflow.py
To train the Deep Convolutional GAN with PyTorch, please go into the PyTorch folder and execute the Jupyter Notebook.
To train the Deep Convolutional GAN with TensorFlow, please go into the TensorFlow folder and execute the Jupyter Notebook.
Link: https://github.com/spmallick/learnopencv/tree/master/Deep-Convolutional-GAN
1651777740
The pipeline is based on pytorch-lightning framework.
Check your CUDA version with nvcc --version, then run pip install torch==1.5.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html (or pip install torch==1.5.1, depending on your CUDA version) to install PyTorch. Finally, run pip install -r requirements.txt.
Please follow these instructions to prepare the dataset: run split_food-101.py to split Food-101 into train/test folders. This script will parse train.txt and test.txt and copy the images into the corresponding sub-folders. Note that we hard-coded the classes which we are going to use.
To train the model, launch:
python main.py --gpus [gpus_number] --max_epochs [epoch_number] --data-root [path_to_dataset] --amp_level [optimization_level] (only for mixed-precision with apex)
You can turn on any trick by adding the corresponding key (an example command combining several of these follows the list):
--use-smoothing for label smoothing;
--use-mixup for mixup augmentation;
--use-cosine-scheduler for Cosine LR Scheduler;
--use-knowledge-distillation for Knowledge Distillation;
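For instance, a hypothetical run combining two of the tricks (the GPU count, epoch count, and dataset path here are illustrative, not from the original instructions) might look like:
python main.py --gpus 1 --max_epochs 30 --data-root ./food-101 --use-mixup --use-cosine-scheduler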
Note: If you want to train the model on GPU, you should always use the --gpus key; without it, the pytorch lightning console log will show that the GPU is used, but the training will be performed on CPU.
For Knowledge Distillation, please, download the teacher weights from Dropbox.
Run python main.py --help to see all possible arguments. This command will also show you the arguments for the pytorch lightning Trainer. Please see the official documentation for details about them.
To evaluate a trained model, launch:
python main.py --gpus [gpus_number] --data-root [path_to_dataset] -e --checkpoint [path_to_checkpoint]
You can also train a colab-based version of this model:
Link: https://github.com/spmallick/learnopencv/tree/master/Bag-Of-Tricks-For-Image-Classification
1651650381
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch.
The main novelty seems to be an extra layer of indirection with the prior network (whether it is an autoregressive transformer or a diffusion network), which predicts an image embedding based on the text embedding from CLIP. Specifically, this repository will only build out the diffusion prior network, as it is the best performing variant (but which incidentally involves a causal transformer as the denoising network 😂)
This model is SOTA for text-to-image for now.
There was enough interest for a Jax version. I will also eventually extend this to text to video, once the repository is in a good place.
$ pip install dalle2-pytorch
Training DALL-E 2 is a 3-step process, with the training of CLIP being the most important.
To train CLIP, you can either use x-clip package, or join the LAION discord, where a lot of replication efforts are already underway.
This repository will demonstrate integration with x-clip for starters.
import torch
from dalle2_pytorch import CLIP
clip = CLIP(
dim_text = 512,
dim_image = 512,
dim_latent = 512,
num_text_tokens = 49408,
text_enc_depth = 1,
text_seq_len = 256,
text_heads = 8,
visual_enc_depth = 1,
visual_image_size = 256,
visual_patch_size = 32,
visual_heads = 8,
use_all_token_embeds = True, # whether to use fine-grained contrastive learning (FILIP)
decoupled_contrastive_learning = True, # use decoupled contrastive learning (DCL) objective function, removing positive pairs from the denominator of the InfoNCE loss (CLOOB + DCL)
extra_latent_projection = True, # whether to use separate projections for text-to-image vs image-to-text comparisons (CLOOB)
use_visual_ssl = True, # whether to do self supervised learning on images
visual_ssl_type = 'simclr', # can be either 'simclr' or 'simsiam', depending on using DeCLIP or SLIP
use_mlm = False, # use masked language learning (MLM) on text (DeCLIP)
text_ssl_loss_weight = 0.05, # weight for text MLM loss
image_ssl_loss_weight = 0.05 # weight for image self-supervised learning loss
).cuda()
# mock data
text = torch.randint(0, 49408, (4, 256)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()
# train
loss = clip(
text,
images,
return_loss = True # needs to be set to True to return contrastive loss
)
loss.backward()
# do the above with as many texts and images as possible in a loop
Then, you will need to train the decoder, which learns to generate images based on the image embedding coming from the trained CLIP above
import torch
from dalle2_pytorch import Unet, Decoder, CLIP
# trained clip from step 1
clip = CLIP(
dim_text = 512,
dim_image = 512,
dim_latent = 512,
num_text_tokens = 49408,
text_enc_depth = 1,
text_seq_len = 256,
text_heads = 8,
visual_enc_depth = 1,
visual_image_size = 256,
visual_patch_size = 32,
visual_heads = 8
).cuda()
# unet for the decoder
unet = Unet(
dim = 128,
image_embed_dim = 512,
cond_dim = 128,
channels = 3,
dim_mults=(1, 2, 4, 8)
).cuda()
# decoder, which contains the unet and clip
decoder = Decoder(
unet = unet,
clip = clip,
timesteps = 100,
image_cond_drop_prob = 0.1,
text_cond_drop_prob = 0.5
).cuda()
# mock images (get a lot of this)
images = torch.randn(4, 3, 256, 256).cuda()
# feed images into decoder
loss = decoder(images)
loss.backward()
# do the above for many many many many steps
# then it will learn to generate images based on the CLIP image embeddings
Finally, the main contribution of the paper. The repository offers the diffusion prior network. It takes the CLIP text embeddings and tries to generate the CLIP image embeddings. Again, you will need the trained CLIP from the first step
import torch
from dalle2_pytorch import DiffusionPriorNetwork, DiffusionPrior, CLIP
# get trained CLIP from step one
clip = CLIP(
dim_text = 512,
dim_image = 512,
dim_latent = 512,
num_text_tokens = 49408,
text_enc_depth = 6,
text_seq_len = 256,
text_heads = 8,
visual_enc_depth = 6,
visual_image_size = 256,
visual_patch_size = 32,
visual_heads = 8,
).cuda()
# setup prior network, which contains an autoregressive transformer
prior_network = DiffusionPriorNetwork(
dim = 512,
depth = 6,
dim_head = 64,
heads = 8
).cuda()
# diffusion prior network, which contains the CLIP and network (with transformer) above
diffusion_prior = DiffusionPrior(
net = prior_network,
clip = clip,
timesteps = 100,
cond_drop_prob = 0.2
).cuda()
# mock data
text = torch.randint(0, 49408, (4, 256)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()
# feed text and images into diffusion prior network
loss = diffusion_prior(text, images)
loss.backward()
# do the above for many many many steps
# now the diffusion prior can generate image embeddings from the text embeddings
In the paper, they actually used a recently discovered technique, from Jonathan Ho himself (original author of DDPMs, the core technique used in DALL-E v2) for high resolution image synthesis.
This can easily be used within this framework, like so:
import torch
from dalle2_pytorch import Unet, Decoder, CLIP
# trained clip from step 1
clip = CLIP(
dim_text = 512,
dim_image = 512,
dim_latent = 512,
num_text_tokens = 49408,
text_enc_depth = 6,
text_seq_len = 256,
text_heads = 8,
visual_enc_depth = 6,
visual_image_size = 256,
visual_patch_size = 32,
visual_heads = 8
).cuda()
# 2 unets for the decoder (a la cascading DDPM)
unet1 = Unet(
dim = 32,
image_embed_dim = 512,
cond_dim = 128,
channels = 3,
dim_mults = (1, 2, 4, 8)
).cuda()
unet2 = Unet(
dim = 32,
image_embed_dim = 512,
cond_dim = 128,
channels = 3,
dim_mults = (1, 2, 4, 8, 16)
).cuda()
# decoder, which contains the unet(s) and clip
decoder = Decoder(
clip = clip,
unet = (unet1, unet2), # insert both unets in order of low resolution to highest resolution (you can have as many stages as you want here)
image_sizes = (256, 512), # resolutions, 256 for first unet, 512 for second. these must be unique and in ascending order (matches with the unets passed in)
timesteps = 1000,
image_cond_drop_prob = 0.1,
text_cond_drop_prob = 0.5
).cuda()
# mock images (get a lot of this)
images = torch.randn(4, 3, 512, 512).cuda()
# feed images into decoder, specifying which unet you want to train
# each unet can be trained separately, which is one of the benefits of the cascading DDPM scheme
loss = decoder(images, unet_number = 1)
loss.backward()
loss = decoder(images, unet_number = 2)
loss.backward()
# do the above for many steps for both unets
Finally, to generate the DALL-E2 images from text, insert the trained DiffusionPrior as well as the Decoder (which wraps CLIP, the causal transformer, and the unet(s))
from dalle2_pytorch import DALLE2
dalle2 = DALLE2(
prior = diffusion_prior,
decoder = decoder
)
# send the text as a string if you want to use the simple tokenizer from DALLE v1
# or you can do it as token ids, if you have your own tokenizer
texts = ['glistening morning dew on a flower petal']
images = dalle2(texts) # (1, 3, 256, 256)
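As the comments above note, you can also feed in token ids directly if you have your own tokenizer; for example (random ids shown purely for illustration):
# alternatively, pass pre-tokenized text as token ids
token_ids = torch.randint(0, 49408, (1, 256)).cuda()
images = dalle2(token_ids) # (1, 3, 256, 256)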
That's it!
Let's see the whole script below
import torch
from dalle2_pytorch import DALLE2, DiffusionPriorNetwork, DiffusionPrior, Unet, Decoder, CLIP
clip = CLIP(
dim_text = 512,
dim_image = 512,
dim_latent = 512,
num_text_tokens = 49408,
text_enc_depth = 6,
text_seq_len = 256,
text_heads = 8,
visual_enc_depth = 6,
visual_image_size = 256,
visual_patch_size = 32,
visual_heads = 8
).cuda()
# mock data
text = torch.randint(0, 49408, (4, 256)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()
# train
loss = clip(
text,
images,
return_loss = True
)
loss.backward()
# do above for many steps ...
# prior networks (with transformer)
prior_network = DiffusionPriorNetwork(
dim = 512,
depth = 6,
dim_head = 64,
heads = 8
).cuda()
diffusion_prior = DiffusionPrior(
net = prior_network,
clip = clip,
timesteps = 100,
cond_drop_prob = 0.2
).cuda()
loss = diffusion_prior(text, images)
loss.backward()
# do above for many steps ...
# decoder (with unet)
unet1 = Unet(
dim = 128,
image_embed_dim = 512,
cond_dim = 128,
channels = 3,
dim_mults=(1, 2, 4, 8)
).cuda()
unet2 = Unet(
dim = 16,
image_embed_dim = 512,
cond_dim = 128,
channels = 3,
dim_mults = (1, 2, 4, 8, 16)
).cuda()
decoder = Decoder(
unet = (unet1, unet2),
image_sizes = (128, 256),
clip = clip,
timesteps = 100,
image_cond_drop_prob = 0.1,
text_cond_drop_prob = 0.5,
condition_on_text_encodings = False # set this to True if you wish to condition on text during training and sampling
).cuda()
for unet_number in (1, 2):
    loss = decoder(images, unet_number = unet_number) # this can optionally be decoder(images, text) if you wish to condition on the text encodings as well, though it was hinted in the paper it didn't do much
    loss.backward()
# do above for many steps
dalle2 = DALLE2(
prior = diffusion_prior,
decoder = decoder
)
images = dalle2(
['cute puppy chasing after a squirrel'],
cond_scale = 2. # classifier free guidance strength (> 1 would strengthen the condition)
)
# save your image (in this example, of size 256x256)
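To actually write that tensor to disk, one option is torchvision's save_image; a minimal sketch, assuming the returned pixel values are already in the [0, 1] range (the filename is arbitrary):
from torchvision.utils import save_image
# save the first (and only) image in the batch; assumes values in [0, 1]
save_image(images[0], 'puppy_squirrel.png')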
Everything in this readme should run without error.
You can also train the decoder on images larger than the size at which CLIP was trained (say 512x512 images with a CLIP trained at 256x256); the images will simply be resized down to CLIP's image resolution when computing the image embeddings.
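For instance, you could keep the 256x256 CLIP from step one but have the decoder denoise at 512x512. A minimal sketch, assuming the single-unet case accepts image_sizes just like the cascading examples elsewhere in this readme:
import torch
from dalle2_pytorch import Unet, Decoder
# assumes `clip` is the 256x256 CLIP trained in step one
unet = Unet(
    dim = 128,
    image_embed_dim = 512,
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 2, 4, 8)
).cuda()
decoder = Decoder(
    unet = (unet,),
    image_sizes = (512,), # decoder works at 512x512, larger than CLIP's 256x256
    clip = clip,
    timesteps = 100
).cuda()
images = torch.randn(4, 3, 512, 512).cuda()
loss = decoder(images) # images are downsized internally for the CLIP image embeddings
loss.backward()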
For the layperson, no worries, training will all be automated into a CLI tool, at least for small scale training.
It is likely, when scaling up, that you would first preprocess your images and text into corresponding embeddings before training the prior network. You can do so easily by simply passing in image_embed, text_embed, and optionally text_encodings and text_mask. Working example below
import torch
from dalle2_pytorch import DiffusionPriorNetwork, DiffusionPrior, CLIP
# get trained CLIP from step one
clip = CLIP(
dim_text = 512,
dim_image = 512,
dim_latent = 512,
num_text_tokens = 49408,
text_enc_depth = 6,
text_seq_len = 256,
text_heads = 8,
visual_enc_depth = 6,
visual_image_size = 256,
visual_patch_size = 32,
visual_heads = 8,
).cuda()
# setup prior network, which contains an autoregressive transformer
prior_network = DiffusionPriorNetwork(
dim = 512,
depth = 6,
dim_head = 64,
heads = 8
).cuda()
# diffusion prior network, which contains the CLIP and network (with transformer) above
diffusion_prior = DiffusionPrior(
net = prior_network,
clip = clip,
timesteps = 100,
cond_drop_prob = 0.2,
condition_on_text_encodings = False # this probably should be true, but just to get LAION started
).cuda()
# mock data
text = torch.randint(0, 49408, (4, 256)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()
# precompute the text and image embeddings
# here using the diffusion prior class, but could be done with CLIP alone
clip_image_embeds = diffusion_prior.clip.embed_image(images).image_embed
clip_text_embeds = diffusion_prior.clip.embed_text(text).text_embed
# feed text and images into diffusion prior network
loss = diffusion_prior(
text_embed = clip_text_embeds,
image_embed = clip_image_embeds
)
loss.backward()
# do the above for many many many steps
# now the diffusion prior can generate image embeddings from the text embeddings
You can also go completely CLIP-less, in which case you will need to pass in the image_embed_dim to the DiffusionPrior on initialization
import torch
from dalle2_pytorch import DiffusionPriorNetwork, DiffusionPrior
# setup prior network, which contains an autoregressive transformer
prior_network = DiffusionPriorNetwork(
dim = 512,
depth = 6,
dim_head = 64,
heads = 8
).cuda()
# diffusion prior, which wraps just the prior network above (no CLIP in this CLIP-less setup)
diffusion_prior = DiffusionPrior(
net = prior_network,
image_embed_dim = 512, # this needs to be set
timesteps = 100,
cond_drop_prob = 0.2,
condition_on_text_encodings = False # this probably should be true, but just to get LAION started
).cuda()
# mock precomputed text and image embeddings
# in practice these would come from your own encoder, since no CLIP is attached here
clip_image_embeds = torch.randn(4, 512).cuda()
clip_text_embeds = torch.randn(4, 512).cuda()
# feed text and images into diffusion prior network
loss = diffusion_prior(
text_embed = clip_text_embeds,
image_embed = clip_image_embeds
)
loss.backward()
# do the above for many many many steps
# now the diffusion prior can generate image embeddings from the text embeddings
Although there is the possibility they are using an unreleased, more powerful CLIP, you can use one of the released ones, if you do not wish to train your own CLIP from scratch. This will also allow the community to more quickly validate the conclusions of the paper.
To use a pretrained OpenAI CLIP, simply import OpenAIClipAdapter and pass it into the DiffusionPrior or Decoder like so
import torch
from dalle2_pytorch import DALLE2, DiffusionPriorNetwork, DiffusionPrior, Unet, Decoder, OpenAIClipAdapter
# openai pretrained clip - defaults to ViT-B/32
clip = OpenAIClipAdapter()
# mock data
text = torch.randint(0, 49408, (4, 256)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()
# prior networks (with transformer)
prior_network = DiffusionPriorNetwork(
dim = 512,
depth = 6,
dim_head = 64,
heads = 8
).cuda()
diffusion_prior = DiffusionPrior(
net = prior_network,
clip = clip,
timesteps = 100,
cond_drop_prob = 0.2
).cuda()
loss = diffusion_prior(text, images)
loss.backward()
# do above for many steps ...
# decoder (with unet)
unet1 = Unet(
dim = 128,
image_embed_dim = 512,
cond_dim = 128,
channels = 3,
dim_mults=(1, 2, 4, 8)
).cuda()
unet2 = Unet(
dim = 16,
image_embed_dim = 512,
cond_dim = 128,
channels = 3,
dim_mults = (1, 2, 4, 8, 16)
).cuda()
decoder = Decoder(
unet = (unet1, unet2),
image_sizes = (128, 256),
clip = clip,
timesteps = 100,
image_cond_drop_prob = 0.1,
text_cond_drop_prob = 0.5,
condition_on_text_encodings = False # set this to True if you wish to condition on text during training and sampling
).cuda()
for unet_number in (1, 2):
    loss = decoder(images, unet_number = unet_number) # this can optionally be decoder(images, text) if you wish to condition on the text encodings as well, though it was hinted in the paper it didn't do much
    loss.backward()
# do above for many steps
dalle2 = DALLE2(
prior = diffusion_prior,
decoder = decoder
)
images = dalle2(
['a butterfly trying to escape a tornado'],
cond_scale = 2. # classifier free guidance strength (> 1 would strengthen the condition)
)
# save your image (in this example, of size 256x256)
Now you'll just have to worry about training the Prior and the Decoder!
This repository decides to take the next step and offer DALL-E v2 combined with latent diffusion, from Rombach et al.
You can use it as follows. Latent diffusion can be limited to just the first U-Net in the cascade, or to any number you wish.
The repository also comes equipped with all the necessary settings to recreate ViT-VQGAN from the Improved VQGAN paper. Furthermore, the vector quantization library also comes equipped to do residual or multi-headed quantization, which I believe will give an even further boost in performance to the autoencoder.
import torch
from dalle2_pytorch import Unet, Decoder, CLIP, VQGanVAE
# trained clip from step 1
clip = CLIP(
dim_text = 512,
dim_image = 512,
dim_latent = 512,
num_text_tokens = 49408,
text_enc_depth = 1,
text_seq_len = 256,
text_heads = 8,
visual_enc_depth = 1,
visual_image_size = 256,
visual_patch_size = 32,
visual_heads = 8
)
# 3 unets for the decoder (a la cascading DDPM)
# first two unets are doing latent diffusion
# vqgan-vae must be trained beforehand
vae1 = VQGanVAE(
dim = 32,
image_size = 256,
layers = 3,
layer_mults = (1, 2, 4)
)
vae2 = VQGanVAE(
dim = 32,
image_size = 512,
layers = 3,
layer_mults = (1, 2, 4)
)
unet1 = Unet(
dim = 32,
image_embed_dim = 512,
cond_dim = 128,
channels = 3,
sparse_attn = True,
sparse_attn_window = 2,
dim_mults = (1, 2, 4, 8)
)
unet2 = Unet(
dim = 32,
image_embed_dim = 512,
channels = 3,
dim_mults = (1, 2, 4, 8, 16),
cond_on_image_embeds = True,
cond_on_text_encodings = False
)
unet3 = Unet(
dim = 32,
image_embed_dim = 512,
channels = 3,
dim_mults = (1, 2, 4, 8, 16),
cond_on_image_embeds = True,
cond_on_text_encodings = False,
attend_at_middle = False
)
# decoder, which contains the unet(s) and clip
decoder = Decoder(
clip = clip,
vae = (vae1, vae2), # latent diffusion for unet1 (vae1) and unet2 (vae2), but not for the last unet3
unet = (unet1, unet2, unet3), # insert unets in order of low resolution to highest resolution (you can have as many stages as you want here)
image_sizes = (256, 512, 1024), # resolutions, 256 for first unet, 512 for second, 1024 for third
timesteps = 100,
image_cond_drop_prob = 0.1,
text_cond_drop_prob = 0.5
).cuda()
# mock images (get a lot of this)
images = torch.randn(1, 3, 1024, 1024).cuda()
# feed images into decoder, specifying which unet you want to train
# each unet can be trained separately, which is one of the benefits of the cascading DDPM scheme
with decoder.one_unet_in_gpu(1):
    loss = decoder(images, unet_number = 1)
    loss.backward()
with decoder.one_unet_in_gpu(2):
    loss = decoder(images, unet_number = 2)
    loss.backward()
with decoder.one_unet_in_gpu(3):
    loss = decoder(images, unet_number = 3)
    loss.backward()
# do the above for many steps for all three unets
# then it will learn to generate images based on the CLIP image embeddings
# chaining the unets from lowest resolution to highest resolution (thus cascading)
mock_image_embed = torch.randn(1, 512).cuda()
images = decoder.sample(mock_image_embed) # (1, 3, 1024, 1024)
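The residual quantization mentioned above lives in lucidrains' separate vector-quantize-pytorch library; a rough, self-contained sketch of it in isolation (how it is wired into VQGanVAE may differ):
import torch
from vector_quantize_pytorch import ResidualVQ
# 8 codebooks applied in sequence, each quantizing the residual left by the previous one
residual_vq = ResidualVQ(
    dim = 256,
    num_quantizers = 8,
    codebook_size = 1024
)
x = torch.randn(1, 1024, 256)
quantized, indices, commit_loss = residual_vq(x) # (1, 1024, 256), (1, 1024, 8), (1, 8)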
Training the Decoder may be confusing, as one needs to keep track of an optimizer for each of the Unet(s) separately. Each Unet will also need its own corresponding exponential moving average. The DecoderTrainer hopes to make this simple, as shown below
import torch
from dalle2_pytorch import DALLE2, Unet, Decoder, CLIP, DecoderTrainer
clip = CLIP(
dim_text = 512,
dim_image = 512,
dim_latent = 512,
num_text_tokens = 49408,
text_enc_depth = 6,
text_seq_len = 256,
text_heads = 8,
visual_enc_depth = 6,
visual_image_size = 256,
visual_patch_size = 32,
visual_heads = 8
).cuda()
# mock data
text = torch.randint(0, 49408, (4, 256)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()
# decoder (with unet)
unet1 = Unet(
dim = 128,
image_embed_dim = 512,
text_embed_dim = 512,
cond_dim = 128,
channels = 3,
dim_mults=(1, 2, 4, 8)
).cuda()
unet2 = Unet(
dim = 16,
image_embed_dim = 512,
text_embed_dim = 512,
cond_dim = 128,
channels = 3,
dim_mults = (1, 2, 4, 8, 16),
cond_on_text_encodings = True
).cuda()
decoder = Decoder(
unet = (unet1, unet2),
image_sizes = (128, 256),
clip = clip,
timesteps = 1000,
condition_on_text_encodings = True
).cuda()
decoder_trainer = DecoderTrainer(
decoder,
lr = 3e-4,
wd = 1e-2,
ema_beta = 0.99,
ema_update_after_step = 1000,
ema_update_every = 10,
)
for unet_number in (1, 2):
    loss = decoder_trainer(images, text = text, unet_number = unet_number) # use the decoder_trainer forward
    loss.backward()
    decoder_trainer.update(unet_number) # update the specific unet as well as its exponential moving average
# after much training
# you can sample from the exponentially moving averaged unets as so
mock_image_embed = torch.randn(4, 512).cuda()
images = decoder_trainer.sample(mock_image_embed, text = text) # (4, 3, 256, 256)
Eventually, the CLI tool mentioned above should make generation as simple as
$ dream 'sharing a sunset at the summit of mount everest with my dog'
Once the image is generated, it will be saved to the same directory from which the command was invoked.
@misc{ramesh2022,
title = {Hierarchical Text-Conditional Image Generation with CLIP Latents},
author = {Aditya Ramesh and Prafulla Dhariwal and Alex Nichol and Casey Chu and Mark Chen},
year = {2022}
}
@misc{crowson2022,
author = {Katherine Crowson},
url = {https://twitter.com/rivershavewings}
}
@misc{rombach2021highresolution,
title = {High-Resolution Image Synthesis with Latent Diffusion Models},
author = {Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year = {2021},
eprint = {2112.10752},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
@inproceedings{Liu2022ACF,
title = {A ConvNet for the 2020s},
author = {Zhuang Liu and Hanzi Mao and Chaozheng Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
year = {2022}
}
@article{shen2019efficient,
author = {Zhuoran Shen and Mingyuan Zhang and Haiyu Zhao and Shuai Yi and Hongsheng Li},
title = {Efficient Attention: Attention with Linear Complexities},
journal = {CoRR},
year = {2018},
url = {http://arxiv.org/abs/1812.01243},
}
@inproceedings{Tu2022MaxViTMV,
title = {MaxViT: Multi-Axis Vision Transformer},
author = {Zhengzhong Tu and Hossein Talebi and Han Zhang and Feng Yang and Peyman Milanfar and Alan Conrad Bovik and Yinxiao Li},
year = {2022}
}
@article{Yu2021VectorquantizedIM,
title = {Vector-quantized Image Modeling with Improved VQGAN},
author = {Jiahui Yu and Xin Li and Jing Yu Koh and Han Zhang and Ruoming Pang and James Qin and Alexander Ku and Yuanzhong Xu and Jason Baldridge and Yonghui Wu},
journal = {ArXiv},
year = {2021},
volume = {abs/2110.04627}
}
@article{Shleifer2021NormFormerIT,
title = {NormFormer: Improved Transformer Pretraining with Extra Normalization},
author = {Sam Shleifer and Jason Weston and Myle Ott},
journal = {ArXiv},
year = {2021},
volume = {abs/2110.09456}
}
Creating noise from data is easy; creating data from noise is generative modeling. - Yang Song's paper
Download Details:
Author: lucidrains
Source Code: https://github.com/lucidrains/DALLE2-pytorch
License: MIT
#pytorch
1651649644
YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset, and represents Ultralytics' open-source research into future vision AI methods, incorporating lessons learned and best practices evolved over thousands of hours of research and development.
Documentation
See the YOLOv5 Docs for full documentation on training, testing and deployment.
Quick Start Examples
Install
Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7.
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Inference
YOLOv5 PyTorch Hub inference. Models download automatically from the latest YOLOv5 release.
import torch
# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s') # or yolov5n - yolov5x6, custom
# Images
img = 'https://ultralytics.com/images/zidane.jpg' # or file, Path, PIL, OpenCV, numpy, list
# Inference
results = model(img)
# Results
results.print() # or .show(), .save(), .crop(), .pandas(), etc.
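To work with the detections programmatically, results can also be converted to a pandas DataFrame:
# detections for the first image as a DataFrame:
# columns are xmin, ymin, xmax, ymax, confidence, class, name
df = results.pandas().xyxy[0]
print(df)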
Inference with detect.py
detect.py runs inference on a variety of sources, downloading models automatically from the latest YOLOv5 release and saving results to runs/detect.
python detect.py --source 0 # webcam
img.jpg # image
vid.mp4 # video
path/ # directory
path/*.jpg # glob
'https://youtu.be/Zgi9g1ksQHc' # YouTube
'rtsp://example.com/media.mp4' # RTSP, RTMP, HTTP stream
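For example, to run the small model on a single image (the yolov5s.pt weights are downloaded automatically):
python detect.py --weights yolov5s.pt --source img.jpg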
Training
The commands below reproduce YOLOv5 COCO results. Models and datasets download automatically from the latest YOLOv5 release. Training times for YOLOv5n/s/m/l/x are 1/2/4/6/8 days on a V100 GPU (Multi-GPU times faster). Use the largest --batch-size possible, or pass --batch-size -1 for YOLOv5 AutoBatch. Batch sizes shown for V100-16GB.
python train.py --data coco.yaml --cfg yolov5n.yaml --weights '' --batch-size 128
python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 64
python train.py --data coco.yaml --cfg yolov5m.yaml --weights '' --batch-size 40
python train.py --data coco.yaml --cfg yolov5l.yaml --weights '' --batch-size 24
python train.py --data coco.yaml --cfg yolov5x.yaml --weights '' --batch-size 16
Tutorials
Integrations
Weights and Biases | Roboflow ⭐ NEW
---|---
Automatically track and visualize all your YOLOv5 training runs in the cloud with Weights & Biases | Label and export your custom datasets directly to YOLOv5 for training with Roboflow
Why YOLOv5
YOLOv5-P5 640 Figure
Figure Notes
Reproduce the figure data with:
python val.py --task study --data coco.yaml --iou 0.7 --weights yolov5n6.pt yolov5s6.pt yolov5m6.pt yolov5l6.pt yolov5x6.pt
Model | size (pixels) | mAP val 0.5:0.95 | mAP val 0.5 | Speed CPU b1 (ms) | Speed V100 b1 (ms) | Speed V100 b32 (ms) | params (M) | FLOPs @640 (B)
---|---|---|---|---|---|---|---|---
YOLOv5n | 640 | 28.0 | 45.7 | 45 | 6.3 | 0.6 | 1.9 | 4.5
YOLOv5s | 640 | 37.4 | 56.8 | 98 | 6.4 | 0.9 | 7.2 | 16.5
YOLOv5m | 640 | 45.4 | 64.1 | 224 | 8.2 | 1.7 | 21.2 | 49.0
YOLOv5l | 640 | 49.0 | 67.3 | 430 | 10.1 | 2.7 | 46.5 | 109.1
YOLOv5x | 640 | 50.7 | 68.9 | 766 | 12.1 | 4.8 | 86.7 | 205.7
YOLOv5n6 | 1280 | 36.0 | 54.4 | 153 | 8.1 | 2.1 | 3.2 | 4.6
YOLOv5s6 | 1280 | 44.8 | 63.7 | 385 | 8.2 | 3.6 | 12.6 | 16.8
YOLOv5m6 | 1280 | 51.3 | 69.3 | 887 | 11.1 | 6.8 | 35.7 | 50.0
YOLOv5l6 | 1280 | 53.7 | 71.3 | 1784 | 15.8 | 10.5 | 76.8 | 111.4
YOLOv5x6 + TTA | 1280 / 1536 | 55.0 / 55.8 | 72.7 / 72.7 | 3136 / - | 26.2 / - | 19.4 / - | 140.7 / - | 209.8 / -
Table Notes
Reproduce mAP: python val.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65
Reproduce speed: python val.py --data coco.yaml --img 640 --task speed --batch 1
Reproduce TTA: python val.py --data coco.yaml --img 1536 --iou 0.7 --augment
Contact
For YOLOv5 bugs and feature requests please visit GitHub Issues. For business inquiries or professional support requests please visit https://ultralytics.com/contact.
Download Details:
Author: ultralytics
Source Code: https://github.com/ultralytics/yolov5
License: GPL-3.0 License
#yolo #pytorch