This is a project that I did for the Deep Learning in Computer Vision course at my home university, the Hong Kong University of Science and Technology, in Spring 2019. Back then, I was slowly getting to know the world of fashion and style: how to dress better and keep up with modern trends. I started watching runway shows from high-fashion luxury brands whose names even a random person on the street knows: Dior, Gucci, Louis Vuitton, Chanel, Hermes, Giorgio Armani, Cartier, Burberry, and many more. The more I watched, the more immersed I became in the fashion world. When the time came to pick a final project topic for my Computer Vision course, I thought: why not create a deep learning system that can generate good-looking, creative high-fashion clothing? That would be interesting to visualize, right?

And so my teammate and I created DeepStyle.

What is DeepStyle?

In short, DeepStyle is a custom deep learning framework that can generate high-fashion clothing items. It can serve as inspiration for fashion designers and can also help predict the next trendy items in the fashion industry: DeepStyle takes in images of currently trendy fashion and creates new items from them, as a way to anticipate future trends.

Our research consists of two parts: building a high-luxury fashion database and using AI to generate similar fashion items. For the first part, we need a reliable source from which to gather high-luxury fashion images from runways. Beyond that, we also want a model that can identify the clothing and crop out the rest of the image, because ultimately it'd be weird to generate fake models and audience members in the background 😂. Once the images are cropped to contain only the clothing item itself, we can feed them into another model that generates new clothing items from scratch. Cropping is essential to remove as much noise as possible.
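As a rough illustration of the cropping step, here is a minimal sketch. It assumes images arrive as NumPy arrays in H x W x C layout and boxes as (x1, y1, x2, y2) pixel coordinates. The function name `crop_and_prepare`, the 64 x 64 target size, and the [-1, 1] scaling (which matches a generator with a tanh output, a common GAN choice) are all assumptions for illustration, not details of DeepStyle itself.

```python
import numpy as np

def crop_and_prepare(image, box, size=64):
    """Hypothetical helper: crop an H x W x C uint8 image to box=(x1, y1, x2, y2)
    and return a size x size float32 array scaled to [-1, 1]."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    # Clamp the box to the image bounds and round outward to whole pixels.
    x1, x2 = int(max(0, np.floor(x1))), int(min(w, np.ceil(x2)))
    y1, y2 = int(max(0, np.floor(y1))), int(min(h, np.ceil(y2)))
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box does not overlap the image")
    crop = image[y1:y2, x1:x2]
    # Nearest-neighbour resize using integer index arrays (no extra dependencies).
    rows = (np.arange(size) * crop.shape[0] / size).astype(int)
    cols = (np.arange(size) * crop.shape[1] / size).astype(int)
    resized = crop[rows[:, None], cols[None, :]]
    # Map pixel values from [0, 255] to [-1, 1].
    return resized.astype(np.float32) / 127.5 - 1.0
```

In practice a library resize with interpolation (for example Pillow) would give smoother crops; nearest-neighbour indexing just keeps the sketch dependency-free.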

The Framework

After a brief analysis of what we were trying to build, here is a rough framework.


DeepStyle Framework

The first part of DeepStyle contains Faster R-CNN, an object detection model that uses a Region Proposal Network (RPN) to propose candidate object regions, which made it both fast (close to real time) and state-of-the-art in accuracy when it was published. You can read the official paper for more details. We will train our Faster R-CNN on the DeepFashion Database, which was released by the Chinese University of Hong Kong.
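To make the Region Proposal Network a little more concrete: the RPN slides over the backbone's feature map and, at every cell, scores a fixed set of reference boxes called anchors. The sketch below generates those anchors with the defaults reported in the Faster R-CNN paper (three scales with areas 128², 256², 512², three aspect ratios, and a total stride of 16 for a VGG-16 backbone). The function names are hypothetical, and placing anchor centres at the middle of each cell is a simplification rather than the reference implementation's exact convention.

```python
import numpy as np

def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors centred at (0, 0) as (x1, y1, x2, y2)."""
    anchors = []
    for ratio in ratios:          # ratio = height / width
        for scale in scales:      # scale ** 2 = anchor area
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def grid_anchors(feat_h, feat_w, stride=16, **kwargs):
    """Tile the base anchors over every cell of a feat_h x feat_w feature map."""
    base = base_anchors(**kwargs)                       # (A, 4)
    # Centre of each feature-map cell, in input-image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                        # each (H, W)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)  # (H*W, 1, 4)
    return (shifts + base[None]).reshape(-1, 4)         # (H*W*A, 4)
```

For a 38 x 50 feature map this yields 38 × 50 × 9 = 17,100 anchors; the RPN then predicts an objectness score and box offsets for each one.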

A quick intro to the DeepFashion Database: it is the **largest fashion dataset** to date, with around 800k diverse fashion images covering various backgrounds, angles, lighting conditions, and more. The dataset provides four benchmarks for different purposes, and the one we use for our project is the Category and Attribute Prediction benchmark. This benchmark has 289,222 clothing images, each annotated with bounding box coordinates and the corresponding clothing category.


Category and Attribute Prediction benchmark of DeepFashion Database
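Before training, the bounding boxes and category labels have to be joined per image. The sketch below assumes the benchmark's annotation lists follow the layout we remember from the release (first line: record count; second line: column header; then one image path per line followed by integers, e.g. `x_1 y_1 x_2 y_2` for boxes or `category_label` for categories). Treat that layout, and the helper names `parse_deepfashion_list` and `load_boxes_and_labels`, as assumptions to check against the files you actually download.

```python
def parse_deepfashion_list(text, n_fields):
    """Parse an annotation list assumed to look like:
    line 1: record count, line 2: column header,
    then '<image path> <n_fields integers>' per line."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    records = {}
    for line in lines[2:]:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        # int() accepts zero-padded strings such as "072".
        records[parts[0]] = [int(v) for v in parts[1:1 + n_fields]]
    if len(records) != count:
        raise ValueError(f"expected {count} records, found {len(records)}")
    return records

def load_boxes_and_labels(bbox_text, category_text):
    """Join boxes and category labels into {image: ((x1, y1, x2, y2), label)}."""
    boxes = parse_deepfashion_list(bbox_text, 4)
    labels = parse_deepfashion_list(category_text, 1)
    return {name: (tuple(boxes[name]), labels[name][0])
            for name in boxes if name in labels}
```

The resulting dictionary maps each image path to one ground-truth box and label, which is the per-image target a detector like Faster R-CNN is trained against.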

After training the Faster R-CNN on the DeepFashion database, the network will be able to predict where the clothing piece is in any test image. This is where the Pinterest database comes in: we can build a scraper to collect runway images of several large luxury brands from Pinterest and use them as test images for our Faster R-CNN. We chose Pinterest because it provides many clean, high-quality images and is easy to scrape.
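At inference time a detector usually returns several candidate boxes per image, so some rule is needed to decide which one is "the" clothing piece. A simple option, sketched below, is to keep the highest-scoring box above a confidence threshold. The input format mirrors the `boxes`, `scores`, and `labels` keys that torchvision's detection models return, but uses plain Python lists; the function name and the 0.7 threshold are hypothetical choices, not DeepStyle's documented settings.

```python
def best_clothing_box(detections, score_threshold=0.7):
    """Return the most confident detection as a dict, or None if no box
    clears score_threshold. `detections` holds parallel lists under the
    keys "boxes" ((x1, y1, x2, y2) each), "scores", and "labels"."""
    candidates = [
        (score, box, label)
        for box, score, label in zip(
            detections["boxes"], detections["scores"], detections["labels"]
        )
        if score >= score_threshold
    ]
    if not candidates:
        return None  # nothing confident enough: skip this image
    score, box, label = max(candidates, key=lambda c: c[0])
    return {"box": tuple(box), "score": score, "label": label}
```

Discarding low-confidence images rather than cropping them anyway trades dataset size for cleaner GAN training data.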

After inference, the bounding boxes of the Pinterest images will be predicted, and everything outside them can be cropped out since we only need the specific item. We then pass the crops to our Fashion GAN, which will be implemented using DCGAN, or Deep Convolutional Generative Adversarial Network.

A quick primer on GANs: a Generative Adversarial Network contains two main components, the generator and the discriminator. The generator works hard to create images that look real, while the discriminator tries to distinguish real images from fake ones. Over the course of training, the generator gets better at producing realistic images while the discriminator gets better at telling what's real and what's fake. The final equilibrium is reached when the discriminator can no longer tell whether the images produced by the generator are real or fake.
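The tug-of-war described above is usually expressed as two binary cross-entropy losses computed from the discriminator's output probabilities. The NumPy sketch below assumes the standard formulation with the common non-saturating generator loss; the post doesn't state the exact loss DeepStyle used, and the function names are illustrative.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)

def discriminator_loss(d_real, d_fake):
    """BCE for the discriminator: push D(x) towards 1 on real images
    and D(G(z)) towards 0 on generated ones."""
    d_real, d_fake = np.asarray(d_real), np.asarray(d_fake)
    return -np.mean(np.log(d_real + EPS)) - np.mean(np.log(1.0 - d_fake + EPS))

def generator_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) towards 1."""
    return -np.mean(np.log(np.asarray(d_fake) + EPS))

# At the equilibrium the discriminator outputs 0.5 for everything.
print(discriminator_loss([0.5] * 4, [0.5] * 4))  # ~1.386 (= 2 ln 2)
print(generator_loss([0.5] * 4))                 # ~0.693 (= ln 2)
```

When the discriminator can no longer tell real from fake, it outputs 0.5 for every image, and its loss settles at 2 ln 2 ≈ 1.386. A very confident discriminator (D(x) ≈ 0.99, D(G(z)) ≈ 0.01) instead drives its own loss toward zero and the generator's loss up to about 4.6.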

The final result is the set of images produced by the DCGAN. And hopefully, they will look high fashion!
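For a sense of how the DCGAN actually produces those images: in the original DCGAN paper, the generator projects a 100-dimensional noise vector to a 4 x 4 x 1024 feature map and then applies four fractionally-strided (transposed) convolutions to reach a 64 x 64 RGB image. The sketch below only traces those shapes. Kernel size 4, stride 2, and padding 1 are assumptions borrowed from common PyTorch DCGAN implementations, and the post doesn't say which resolution DeepStyle generated.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution
    (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

def dcgan_generator_shapes(z_dim=100, base_channels=1024, start=4,
                           out_channels=3, n_up=4):
    """Trace (channels, height, width) through a DCGAN-style generator."""
    shapes = [(z_dim, 1, 1)]
    # Project and reshape the noise vector into a small, deep feature map.
    channels, size = base_channels, start
    shapes.append((channels, size, size))
    for i in range(n_up):
        size = conv_transpose_out(size)  # each layer doubles the resolution
        # Halve the channels each step; the last layer emits RGB.
        channels = out_channels if i == n_up - 1 else channels // 2
        shapes.append((channels, size, size))
    return shapes

for shape in dcgan_generator_shapes():
    print(shape)  # (100, 1, 1), (1024, 4, 4), ... (3, 64, 64)
```

Doubling the resolution at each step while halving the channels is what lets a compact noise vector grow into a full-size clothing image.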
