We started Project Creaite last winter to explore the capabilities of deep learning and computer vision algorithms in creative work such as fashion and product design. The first version focused on creating product pictures for items from the fashion vertical, such as apparel and accessories. We have finally found some time to write about it and share what we did in this blog post.
As we all know, Artificial Intelligence has finally arrived in its narrow form and is ready to help solve specific problems for which we have readily available data. We have all seen machine learning algorithms doing all sorts of things, from speech recognition to visual search to powering autonomous cars. We have even seen algorithms playing games such as Chess, Atari titles, and the ancient Chinese game of Go, and beating the world champions at them. Most of these accomplishments are driven by the discriminative class of machine learning algorithms. Recently, a different type of algorithm, the generative model, has shown a lot of promise. At Artifacia Research we have spent some time studying generative models in detail. In phase one of Project Creaite, we used an encoder-decoder scheme to get promising results, and a few months ago we built a prototype that can come up with new designs for fashion products after being fed enough examples. We think a similar approach can be applied to images and models from other creative fields, and, combined with techniques such as 3D modelling, it can be used to augment product design across domains.
In deep learning, the majority of the problems you hear about are supervised in nature. The AI research community, re-energised by the power of deep learning algorithms, has recently started focusing much more on unsupervised learning and reinforcement learning in order to advance the state of existing AI systems. In unsupervised learning, the main challenge is representing the data as useful features. Auto-encoders were introduced for exactly this purpose: as the name suggests, an auto-encoder encodes unlabelled data so that the encoding can be reused for many problems we wish to solve.
High-level View of an Encoder-Decoder Network
Training an auto-encoder is a very interesting process. It is basically a neural network whose output is the same as its input. The architecture is as follows: an input layer is followed by a few hidden layers, and after a certain depth the hidden layers mirror the same architecture in reverse until the final layer is the same size as the input layer. We then pass in the data whose embedding we wish to learn.
Illustration of the Underlying Process
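To make the architecture concrete, here is a minimal sketch of such a network in plain NumPy: a single bottleneck layer whose output layer matches the input size, trained to reconstruct its input. The layer sizes, learning rate, and toy data are illustrative assumptions, not our actual network (which operated on fashion product images).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 16, 4  # illustrative sizes: 16-d "images", 4-d bottleneck

# Toy data that lives on a low-dimensional manifold, standing in for images.
latent = rng.random((200, n_hidden))
mix = rng.normal(0.0, 1.0, (n_hidden, n_in))
X = 1.0 / (1.0 + np.exp(-(latent @ mix)))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W_enc = rng.normal(0, 0.1, (n_in, n_hidden))   # encoder weights
W_dec = rng.normal(0, 0.1, (n_hidden, n_in))   # decoder weights
lr, losses = 0.5, []
for _ in range(500):
    H = sigmoid(X @ W_enc)        # encode: compress input to the bottleneck
    X_hat = sigmoid(H @ W_dec)    # decode: reconstruct the input
    err = X_hat - X
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the mean-squared reconstruction error.
    d_out = err * X_hat * (1.0 - X_hat)
    d_hid = (d_out @ W_dec.T) * H * (1.0 - H)
    W_dec -= lr * (H.T @ d_out) / len(X)
    W_enc -= lr * (X.T @ d_hid) / len(X)

embedding = sigmoid(X @ W_enc)    # the learned features for downstream use
```

Because the target is the input itself, no labels are needed; the reconstruction error drops as the bottleneck learns a compressed representation of the data.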
This whole network can easily be split into two parts: an encoder network and a decoder network. The encoder encodes the data, and this encoding can be used as a set of useful features. Increasing the depth of the network lets you learn increasingly high-level feature embeddings of the data, but it can also over-fit the data. This over-fitting is tackled by a recent variant called the variational auto-encoder.
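The variational auto-encoder achieves this regularisation by having the encoder predict a mean and a variance for each latent dimension, sampling through the "reparameterization trick", and penalising the latent distribution's divergence from a unit Gaussian. A tiny sketch of those two ingredients (the `mu` and `log_var` values are made-up stand-ins for encoder outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])       # encoder-predicted latent mean (made up)
log_var = np.array([0.0, -2.0])  # encoder-predicted latent log-variance (made up)

# Reparameterization trick: sample z = mu + sigma * eps, so the randomness
# sits in eps and gradients can flow through mu and log_var.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence term that pulls the latent distribution toward N(0, I),
# which is what keeps the latent space from over-fitting to the data.
kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
```

The total VAE loss is the reconstruction error plus this KL term; the closer `mu` and `log_var` drift toward a unit Gaussian, the smaller the penalty.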
The decoder network, as the name suggests, can be used for decoding. But there is a twist. Once the entire auto-encoder is trained, we can take the decoder on its own and do really interesting things with it. Instead of passing it an embedding produced by the encoder, we can prepare a pseudo embedding of random numbers and ask the decoder to produce an image from it. We can also constrain the pseudo embedding to lie in the same numeric range as the real embeddings and draw it from a Gaussian distribution.
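In code, that idea is just a forward pass of the decoder on a Gaussian sample. The sketch below uses randomly initialised weights where a trained decoder would go, so the shapes and flow are what matter here, not the output values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_out = 4, 16

# Stand-in for trained decoder weights; in practice these come from
# training the full auto-encoder and then detaching the decoder half.
W_dec = rng.normal(0, 0.1, (n_hidden, n_out))

# A "pseudo embedding": a random vector drawn from a Gaussian instead of
# an encoder output for a real image.
z = rng.standard_normal(n_hidden)

# Decoder forward pass: the sigmoid keeps outputs in (0, 1), like pixel
# intensities of a generated image.
x_gen = 1.0 / (1.0 + np.exp(-(z @ W_dec)))
```

Every fresh draw of `z` decodes to a different output vector, which is exactly how the decoder produces designs it never saw during training.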
Example results from our network
The images produced by our encoder-decoder network are not exactly beautiful, but they are surreal to look at. Producing natural images of the quality our cameras capture is the goal of a field of study called generative models. Generative models come in several flavors: some use adversarial networks, others use variational auto-encoders, and so on. In the adversarial setup, the generative model tries to learn the distribution of the input data and to fool a discriminative model, while the discriminative model defends itself by continuously learning from its own mistakes. This contest ultimately leads the generative model to capture the data distribution, and with that knowledge it can generate many new pictures that resemble natural images.
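The adversarial game can be sketched at its simplest with one-parameter models on 1-D data; real GANs replace these with deep networks, and every name and hyper-parameter below is an illustrative assumption. The discriminator is trained to tell real samples from generated ones, while the generator is trained to fool it:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w):
    """Probability that sample x is 'real' (a one-parameter logistic model)."""
    return 1.0 / (1.0 + np.exp(-w * x))

def generator(z, v):
    """Shift Gaussian noise by a learned offset v to mimic the real data."""
    return z + v

w, v, lr = 0.5, 0.0, 0.1
real_mean = 2.0  # the true distribution the generator must learn to imitate
for _ in range(300):
    real = rng.normal(real_mean, 0.5, 64)        # samples from the real data
    fake = generator(rng.standard_normal(64), v)  # samples from the generator
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    grad_w = np.mean((discriminator(real, w) - 1.0) * real) \
           + np.mean(discriminator(fake, w) * fake)
    w -= lr * grad_w
    # Generator step (non-saturating loss): push D(fake) toward 1.
    grad_v = np.mean((discriminator(fake, w) - 1.0) * w)
    v -= lr * grad_v
```

As the two updates alternate, the generator's offset `v` drifts toward the real mean: once fake and real samples look alike, the discriminator can no longer tell them apart, which is the equilibrium the adversarial game aims for.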
We at Artifacia Research believe that further development of this technology can bring a lot of efficiency to fashion design in particular and product design in general. We are really excited by the possibilities of deep learning and AI, and we will be investing our time to take this project to the next level very soon.
G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Paper link
D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. Paper link
Note: This post is a work in progress and we are going to add more illustrative example images and related research material to it soon.