Neural Style Transfer
Neural Style Transfer in deep learning
What is Neural Style Transfer
Use a ‘Content’ image C and a ‘Style’ image S to generate a new image G, which has C’s content and S’s style.
What are Deep ConvNets learning?
One way to see this is by visualization.
From shallow layers to deep layers, pick a unit and find the image patches that maximize the unit’s activation (receptive field).
Then repeat for other units.
- For shallow layers, the hidden units may be activated most by images of edges (e.g.: vertical/horizontal edges in colors). —— simple
- For deep layers, the units may be activated most by images of entire objects (e.g.: images of dogs, flowers). —— complicated
Paper (Visualizing what the ConvNets are learning)
Visualizing and understanding convolutional networks
Neural Style Transfer Algorithm
By minimizing a cost function, we can generate the image $G$ we want.
- Define a cost function $J(G)$
- Initialize G randomly
- Use gradient descent to minimize J(G)
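The three steps above can be sketched as a runnable toy. Note this is a pixel-space stand-in of my own: the real algorithm measures content/style through ConvNet activations, but here $J(G)$ is replaced by simple pixel distances to C and S so the gradient-descent loop is self-contained.

```python
import numpy as np

def generate(C, S, alpha=1.0, beta=1.0, lr=0.1, steps=200, seed=0):
    """Toy NST loop: minimize alpha*||G-C||^2 + beta*||G-S||^2 by gradient descent.
    (Pixel-space stand-in for the real feature-based costs.)"""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal(C.shape)      # step 2: initialize G randomly
    for _ in range(steps):                # step 3: gradient descent on J(G)
        grad = 2 * alpha * (G - C) + 2 * beta * (G - S)   # dJ/dG
        G -= lr * grad
    return G
```

With `alpha == beta`, this toy cost is minimized at the pixel-wise average of C and S, which the loop converges to.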
Cost Function
$
J(G) = α \cdot J_{content}(C, G) + β \cdot J_{style}(S, G)
$
$J_{content}(C, G)$ measures how similar the $content$ of image G is to that of image C
$J_{style}(S, G)$ measures how similar the $style$ of image G is to that of image S
The Content cost function: $J_{content}(C, G)$
Target: Keep the high level feature structure of content image C
Tool: Use ConvNets to extract features (select some conv layers’ activation outputs as the content representation)
Choose a layer $l$ to compute the content cost: $l$ should be in the middle of the network (neither too shallow nor too deep).
- Use hidden layer $l$’s activation to compute the content cost.
- Use a pre-trained ConvNet. (E.g., VGG network).
- Let $a^{[l](C)}$ and $a^{[l](G)}$ be the activation of layer $l$ on the images C and G.
- If $a^{[l](C)}$ and $a^{[l](G)}$ are similar, C and G have similar content.
$
J_{content}(C, G) = \|a^{[l](C)} - a^{[l](G)}\|^2
$
i.e., the squared L2 norm of the element-wise difference between $a^{[l](C)}$ and $a^{[l](G)}$.
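A minimal NumPy sketch of this content cost (the activation shapes are illustrative; in practice $a^{[l]}$ comes from a pre-trained ConvNet such as VGG):

```python
import numpy as np

def content_cost(a_C, a_G):
    """Squared L2 distance between layer-l activations of C and G.
    a_C, a_G: activation tensors of shape (n_H, n_W, n_C)."""
    return np.sum((a_C - a_G) ** 2)
```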
The Style cost function: $J_{style}(S, G)$
If we use layer $l$’s activation to measure $Style$, define $Style$ as the correlation between activations across $channels$.
For a given layer’s activation, suppose the first channel detects feature $a$ and the second channel detects feature $b$.
If the first and second channels are highly correlated, then whenever feature $a$ appears somewhere in an image, feature $b$ probably appears there too.
By comparing these channel correlations in the style image and in the generated image, we can measure how similar the style of the generated image is to that of the style image.
Style Matrix (Gram Matrix)
Let $a_{i,j,k}^{[l]}$ = activation at $(i,j,k)$. The style matrix $G^{[l]}$ has shape $n_C^{[l]} \times n_C^{[l]}$.
- For the Style image: $G_{kk'}^{[l](S)} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a_{ijk}^{[l](S)} \cdot a_{ijk'}^{[l](S)}$
- For the Generated image: $G_{kk'}^{[l](G)} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a_{ijk}^{[l](G)} \cdot a_{ijk'}^{[l](G)}$
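In NumPy, the Gram matrix is just an unrolled matrix product (a sketch; the activation shape is illustrative):

```python
import numpy as np

def gram_matrix(a):
    """Style (Gram) matrix of one layer's activation.
    a: (n_H, n_W, n_C) -> (n_C, n_C); entry (k, k') = sum_ij a[i,j,k] * a[i,j,k']."""
    n_H, n_W, n_C = a.shape
    A = a.reshape(n_H * n_W, n_C)   # unroll the spatial dimensions
    return A.T @ A
```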
Formula
$
For\ one\ layer:\quad
J_{style}^{[l]}(S, G) = \frac{1}{(2 n_H^{[l]} n_W^{[l]} n_C^{[l]})^2} \|G^{[l](S)} - G^{[l](G)}\|_F^2 = \frac{1}{(2 n_H^{[l]} n_W^{[l]} n_C^{[l]})^2} \sum_{k=1}^{n_C^{[l]}} \sum_{k'=1}^{n_C^{[l]}} (G_{kk'}^{[l](S)} - G_{kk'}^{[l](G)})^2
$
$
For\ all\ layers:\quad
J_{style}(S, G) = \sum_{l} \lambda^{[l]} J_{style}^{[l]}(S, G),\ where\ \lambda^{[l]}\ is\ a\ hyperparameter
$
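A NumPy sketch of the per-layer style cost and the weighted sum over layers (helper names and shapes are my own; in practice the activations come from several layers of a pre-trained ConvNet):

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """J_style^[l]: normalized squared Frobenius distance between Gram matrices.
    a_S, a_G: activations of shape (n_H, n_W, n_C) for one layer."""
    n_H, n_W, n_C = a_S.shape
    G_S = a_S.reshape(-1, n_C).T @ a_S.reshape(-1, n_C)
    G_G = a_G.reshape(-1, n_C).T @ a_G.reshape(-1, n_C)
    return np.sum((G_S - G_G) ** 2) / (2 * n_H * n_W * n_C) ** 2

def style_cost(acts_S, acts_G, lambdas):
    """Weighted sum of per-layer style costs over a list of layers."""
    return sum(lam * layer_style_cost(a_S, a_G)
               for lam, a_S, a_G in zip(lambdas, acts_S, acts_G))
```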
Paper (Not too hard)
A neural algorithm of artistic style. Images on slide generated by Justin Johnson.