At Imaginea, we run a social network for typoholics called Fontli as our designers have a passion for the field. Folks share typography that they catch in the wild or work that they’ve created themselves. Members ask others for font identification and tips, and tag what they’re able to identify themselves.
Given that we’re into typography, we would love a system that can take a picture of some type and apply its style to text of our own choosing. Deep Convolutional Neural Networks (DCNNs) have recently achieved great results on image transformation tasks, most notably artistic style transfer. Since DCNNs can capture the style of one image and transfer it onto another, we wanted to explore them and build a system that transfers the style of typography. We call this system Deep Type.
In this post we share our experiments with known style transfer techniques applied to the context of typography.
- CNNs in practice
- Deep Type - Core Functionality
- Initial experiments on Fast-Neural-Style
- Experiments on fine tuning Slow-Neural Style Parameters
- Final tuned Parameters
- Results with tuned parameters
- Applying tuned parameters of Slow-Neural Style on Fast-Neural Style
Deep Type is built on the core ideas of CNNs. Many Convolutional Neural Network architectures have proven themselves, with strong results, on the image classification challenge hosted by ImageNet.
All these architectures work by passing the input image through a stack of layers, where each layer extracts and retains some features of the input image.
The lower (initial) layers usually extract colors and edges. As we go deeper, the higher layers extract more complex shapes, which are in turn used to recognize the objects in the image and then classify them.
The lower layers preserve more spatial information about the input image, so we can reconstruct the input from them with minimal spatial loss. The higher layers lose spatial information but preserve the content of the input image in the form of features.
Given a content image and a typographic style image, the algorithm extracts the style (texture/high-level pattern) from the style image and renders the content image with the extracted texture, as shown in the figure below.
Leon A. Gatys et al. published a paper, A Neural Algorithm of Artistic Style, explaining an approach for style transfer. All our experiments are carried out on jcjohnson's implementation of the key technique in this paper. It works as follows:
The style image is passed through the network, and correlations between feature channels are found via a Gram matrix, calculated by taking the inner product of the vectorised feature maps at a given set of layers. This Gram matrix represents the extracted texture information of the style image.
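A minimal NumPy sketch of the Gram-matrix computation described above (the feature-map shape is illustrative):

```python
import numpy as np

def gram_matrix(feature_map):
    """Correlations between feature channels: G = F F^T on the
    vectorised feature map, normalised by its size."""
    c, h, w = feature_map.shape
    f = feature_map.reshape(c, h * w)  # vectorise each channel
    return f @ f.T / (c * h * w)       # (c, c) Gram matrix

fm = np.random.rand(64, 56, 56)        # one layer's feature map
g = gram_matrix(fm)
print(g.shape)  # (64, 64): spatial layout discarded, only texture kept
```

Because the spatial dimensions are summed out by the inner product, the Gram matrix captures which features co-occur (texture) while discarding where they occur, which is exactly what makes it a style representation.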
The content image is passed through the network; since the higher layers hold more abstract information about the image, the feature map at a chosen higher layer is taken as the extracted content of the content image.
White noise is passed through the network, and a new image is generated by combining aspects of the content and style images via a parameterised ratio; the loss is computed by comparing the Gram matrices and the content feature maps. This loss is minimised through gradient descent, repeating the process for a given number of iterations. Each iteration yields a stylised image that tries to match both the style reconstruction and the content reconstruction. The output of the final iteration is the generated stylised image.
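The optimisation loop can be sketched as follows (assuming PyTorch). To keep the snippet self-contained we use an identity function as a stand-in for the VGG feature extractor; a real implementation extracts features from several layers of a pretrained network.

```python
import torch

def gram(f):  # style representation of a feature map (see above)
    c, h, w = f.shape
    v = f.reshape(c, h * w)
    return v @ v.t() / (c * h * w)

extract = lambda img: img  # stand-in for a VGG feature extractor

torch.manual_seed(0)
content_target = torch.rand(3, 32, 32)            # from content image
style_target_gram = gram(torch.rand(3, 32, 32))   # from style image

generated = torch.rand(3, 32, 32, requires_grad=True)  # white noise
optimizer = torch.optim.Adam([generated], lr=0.05)

content_weight, style_weight = 1.0, 200.0  # the style:content ratio
losses = []
for _ in range(100):
    optimizer.zero_grad()
    feats = extract(generated)
    content_loss = torch.mean((feats - content_target) ** 2)
    style_loss = torch.mean((gram(feats) - style_target_gram) ** 2)
    loss = content_weight * content_loss + style_weight * style_loss
    loss.backward()
    optimizer.step()  # nudge the generated image toward both targets
    losses.append(loss.item())
```

Each step moves the generated image a little closer to matching both the content feature map and the style Gram matrix, with the weights controlling the trade-off between the two.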
The figure below illustrates the working of the neural style transfer algorithm:
In practice, there are two ways in which neural style transfer is achieved. They are:
- Slow-neural styling
- Fast-neural styling
| Slow Neural Style | Fast Neural Style |
|---|---|
| No training required | Requires training |
| Easy to tweak parameters and see the results of the changes | Takes quite a lot of time, as the model must be retrained after every change before the result can be seen |
| Requires a lot of compute power (GPU) | Requires less compute power (a reasonable CPU configuration is sufficient) |
| Easily scalable to multiple styles | One model can represent only one style; it cannot be scaled across style images |
| Results depend on the current style image and the current content image | Results depend on the training data and the current content image |
Using a typographic image as the content image, we checked whether the result is comparable to that obtained with a non-typographic content image.
We see that the style isn't carried over to the plain white/transparent background of the typographic content image.
Preprocessing the content image can help spread the style across the whole image.
Adding a background to the content image makes the style spread across the result image.
Results show that adding a background template to the plain typographic image produces output with the style spread across the image, which supports our hypothesis.
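The preprocessing step amounts to alpha-compositing the typographic image onto a backdrop so the style transfer has non-empty background pixels to attach to. A sketch using Pillow (the sizes, colours, text, and filename are made up for illustration):

```python
from PIL import Image, ImageDraw

# Stand-in typographic image: transparent canvas with black text.
text_img = Image.new("RGBA", (256, 128), (0, 0, 0, 0))
draw = ImageDraw.Draw(text_img)
draw.text((20, 50), "DEEP TYPE", fill=(0, 0, 0, 255))

# Backdrop: a plain light-grey background (a texture works too).
background = Image.new("RGBA", text_img.size, (200, 200, 200, 255))

# Composite so the background is no longer empty, then save as the
# content image fed to the style-transfer pipeline.
content = Image.alpha_composite(background, text_img).convert("RGB")
content.save("content_with_background.png")
```

Swapping the solid grey for a textured template gives the style even more structure to latch onto, matching what we observed above.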
Besides the background of the content image, which we observed affects the quality of the result, many other parameters influence the output. Some of them are:
- Style Image
- Content Layers (Convolution layers used for the content reconstruction loss)
- Style Layers (Convolution layers used for the style reconstruction loss)
- Content-Weight (weight for the content reconstruction loss)
- Style-Weight (Weights used for style reconstruction loss)
- Number of iterations (how many iterations the algorithm runs while constructing the result image)
Since training Fast-Neural Style is a time-consuming process, we first tuned Slow-Neural Style and then applied the tuned parameters to Fast-Neural Style, assuming that parameters tuned on Slow-Neural Style would produce the expected results with Fast-Neural Style.
Of all the factors listed above, we concentrated on the style-weight/content-weight (style:content ratio), the number of iterations, different models, and the style image as the initial parameters for our experiments.
| Tuning parameter | Hypothesis | Observation | Tuned value |
|---|---|---|---|
| Style:content ratio | Blending of content into the style increases with the style:content ratio | With increase in style weight: | Style:content ratio = 200:1 |
| Iterations | Blending of content into the style increases with the number of iterations | As the number of iterations increases, the content gets more stylised but blending decreases | Iterations = 500 |
| Different models | vgg19 performs better than vgg16 | vgg16 outperforms vgg19; vgg19 may need different parameter values to give good results | Model = vgg16 |
- Style to content ratio → 200:1
- Number of iterations → 500
- Model → vgg16
- All other parameters at their defaults
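Assuming jcjohnson's neural-style implementation is being used, the tuned values map onto its command-line flags roughly as follows (the VGG-16 model/prototxt and image paths are placeholders):

```
th neural_style.lua \
  -content_image type.png \
  -style_image texture.jpg \
  -model_file models/vgg16.caffemodel \
  -proto_file models/vgg16.prototxt \
  -content_weight 1 -style_weight 200 \
  -num_iterations 500 \
  -output_image out.png
```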
Since Fast-Neural Style uses a different notion of iterations than Slow-Neural Style, we ran the Fast-Neural Style experiments at different iteration counts along with the tuned parameters of Slow-Neural Style.
| Style Image | Slow-Neural Style | Fast-Neural Style (2000 iterations) | Fast-Neural Style (20000 iterations) |
|---|---|---|---|
- For a given set of parameters, the results of Slow-Neural Style do not match the results of Fast-Neural Style.
- As the number of iterations increases, blending of style with content increases and the visibility of the content (text) decreases.
- Fast-Neural Style needs typographic content images with some background to produce results with the style spread across the image.
- In Fast-Neural Style, increasing the number of iterations does not produce a result image with both good style blending and text visibility.
- The behavior of Slow-Neural Style differs from that of Fast-Neural Style, and Fast-Neural Style needs separate parameter tuning to achieve the expected results.
(Watch this blog for more updates on our work.)
- A Neural Algorithm of Artistic Style — https://arxiv.org/abs/1508.06576
- Perceptual Losses for Real-Time Style Transfer and Super-Resolution — https://arxiv.org/abs/1603.08155
- Instance Normalization: The Missing Ingredient for Fast Stylization — https://arxiv.org/abs/1607.08022