Neural Style Transfer
Neural Style Transfer in deep learning
What is Neural Style Transfer
Use a ‘Content’ image C and a ‘Style’ image S to generate a new image G, which has C’s content and S’s style.
What are Deep ConvNets learning?
One way to see this is by visualization.
From shallow layers to deep layers, pick a unit and find the image patches that maximize the unit’s activation (receptive field).
Then repeat for other units.
- For shallow layers, the hidden units may be activated most by images of edges (e.g.: vertical/horizontal edges in colors). —— simple
- For deep layers, the units may be activated most by images of entire objects (e.g.: images of dogs, flowers). —— complicated
Paper (Visualizing what the ConvNets are learning)
Visualizing and understanding convolutional networks
Neural Style Transfer Algorithm
By minimizing a cost function, we can generate the image $G$ we want.
- Define a cost function $J(G)$
- Initialize G randomly
- Use gradient descent to minimize J(G)
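The three steps above can be sketched as a runnable toy. Note this is a pixel-space stand-in of my own: the real algorithm measures content/style through ConvNet activations, but here $J(G)$ is replaced by simple pixel distances to C and S so the gradient-descent loop is self-contained.

```python
import numpy as np

def generate(C, S, alpha=1.0, beta=1.0, lr=0.1, steps=200, seed=0):
    """Toy NST loop: minimize alpha*||G-C||^2 + beta*||G-S||^2 by gradient descent.
    (Pixel-space stand-in for the real feature-based costs.)"""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal(C.shape)      # step 2: initialize G randomly
    for _ in range(steps):                # step 3: gradient descent on J(G)
        grad = 2 * alpha * (G - C) + 2 * beta * (G - S)   # dJ/dG
        G -= lr * grad
    return G
```

With `alpha == beta`, this toy cost is minimized at the pixel-wise average of C and S, which the loop converges to.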
Cost Function
$
J(G) = α \cdot J_{content}(C, G) + β \cdot J_{style}(S, G)
$
$J_{content}(C, G)$ measures how similar the $content$ of image G is to that of image C
$J_{style}(S, G)$ measures how similar the $style$ of image G is to that of image S
The Content cost function: $J_{content}(C, G)$
Target: Keep the high level feature structure of content image C
Tool: Use ConvNets to extract features (select some conv layers’ activation outputs as the content representation)
Choose a layer $l$ to compute the content cost: $l$ should be in the middle of the network (neither too shallow nor too deep).
- Use hidden layer $l$’s activation to compute the content cost.
- Use a pre-trained ConvNet. (E.g., VGG network).
- Let $a^{[l](C)}$ and $a^{[l](G)}$ be the activation of layer $l$ on the images C and G.
- If $a^{[l](C)}$ and $a^{[l](G)}$ are similar, C and G have similar content.
$
J_{content}(C, G) = \|a^{[l](C)} - a^{[l](G)}\|^2
$
i.e., the squared L2 norm of the element-wise difference between $a^{[l](C)}$ and $a^{[l](G)}$.
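A minimal NumPy sketch of this content cost (the activation shapes are illustrative; in practice $a^{[l]}$ comes from a pre-trained ConvNet such as VGG):

```python
import numpy as np

def content_cost(a_C, a_G):
    """Squared L2 distance between layer-l activations of C and G.
    a_C, a_G: activation tensors of shape (n_H, n_W, n_C)."""
    return np.sum((a_C - a_G) ** 2)
```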
The Style cost function: $J_{style}(S, G)$
If we use layer $l$’s activation to measure $Style$, define $Style$ as the correlation between activations across $channels$.
For a given layer’s activation, suppose the first channel detects feature $a$ and the second channel detects feature $b$.
If the first and second channels are highly correlated, then whenever feature $a$ appears somewhere in an image, feature $b$ probably appears there too.
By comparing these channel correlations in the style image and in the generated image, we can measure how similar the style of the generated image is to that of the style image.
Style Matrix (Gram Matrix)
Let $a_{i,j,k}^{[l]}$ = activation at $(i,j,k)$. The style matrix $G^{[l]}$ has shape $n_C^{[l]} \times n_C^{[l]}$.
- For the Style image: $G_{kk'}^{[l](S)} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a_{ijk}^{[l](S)} \cdot a_{ijk'}^{[l](S)}$
- For the Generated image: $G_{kk'}^{[l](G)} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a_{ijk}^{[l](G)} \cdot a_{ijk'}^{[l](G)}$
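In NumPy, the Gram matrix is just an unrolled matrix product (a sketch; the activation shape is illustrative):

```python
import numpy as np

def gram_matrix(a):
    """Style (Gram) matrix of one layer's activation.
    a: (n_H, n_W, n_C) -> (n_C, n_C); entry (k, k') = sum_ij a[i,j,k] * a[i,j,k']."""
    n_H, n_W, n_C = a.shape
    A = a.reshape(n_H * n_W, n_C)   # unroll the spatial dimensions
    return A.T @ A
```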
Formula
$
For\ one\ layer:\quad
J_{style}^{[l]}(S, G) = \frac{1}{(2 n_H^{[l]} n_W^{[l]} n_C^{[l]})^2} \|G^{[l](S)} - G^{[l](G)}\|_F^2 = \frac{1}{(2 n_H^{[l]} n_W^{[l]} n_C^{[l]})^2} \sum_{k=1}^{n_C^{[l]}} \sum_{k'=1}^{n_C^{[l]}} (G_{kk'}^{[l](S)} - G_{kk'}^{[l](G)})^2
$
$
For\ all\ layers:\quad
J_{style}(S, G) = \sum_{l} \lambda^{[l]} J_{style}^{[l]}(S, G),\ where\ \lambda^{[l]}\ is\ a\ hyperparameter
$
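A NumPy sketch of the per-layer style cost and the weighted sum over layers (helper names and shapes are my own; in practice the activations come from several layers of a pre-trained ConvNet):

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """J_style^[l]: normalized squared Frobenius distance between Gram matrices.
    a_S, a_G: activations of shape (n_H, n_W, n_C) for one layer."""
    n_H, n_W, n_C = a_S.shape
    G_S = a_S.reshape(-1, n_C).T @ a_S.reshape(-1, n_C)
    G_G = a_G.reshape(-1, n_C).T @ a_G.reshape(-1, n_C)
    return np.sum((G_S - G_G) ** 2) / (2 * n_H * n_W * n_C) ** 2

def style_cost(acts_S, acts_G, lambdas):
    """Weighted sum of per-layer style costs over a list of layers."""
    return sum(lam * layer_style_cost(a_S, a_G)
               for lam, a_S, a_G in zip(lambdas, acts_S, acts_G))
```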
Paper (Not too hard)
A neural algorithm of artistic style. Images on slide generated by Justin Johnson.