Deep Type

At Imaginea, we run a social network for typoholics called Fontli as our designers have a passion for the field. Folks share typography that they catch in the wild or work that they’ve created themselves. Members ask others for font identification and tips, and tag what they’re able to identify themselves.

Given that we’re into typography, we would love to have a system that can take a picture of some type and apply its style to text of our own choice! We know that Deep Convolutional Neural Networks (DCNNs) have recently been achieving great results in image transformation tasks, most notably in artistic style transfer. Since these DCNNs are capable of capturing the style of one image and transferring it onto another, we wanted to explore them and use them to build a new system that transfers the style of typography. We call this system Deep Type.

In this post we share our experiments with known style transfer techniques applied to the context of typography.

Contents

  1. CNNs in practice
  2. Deep Type - Core Functionality
    1. How it works
    2. Style Construction
    3. Content Construction
    4. Generating stylized image
    5. Fast-Neural-Style vs. Slow-Neural-Style
  3. Initial experiments on Fast-Neural-Style
    1. Experiment 1
    2. Experiment 2
  4. Experiments on fine-tuning Slow-Neural Style Parameters
  5. Final tuned Parameters
  6. Results with tuned parameters
  7. Applying tuned parameters of Slow-Neural Style to Fast-Neural Style
  8. Summary
  9. References
  10. Credits

CNNs in practice

Deep Type is built on the core ideas of CNNs. Many Convolutional Neural Network architectures have proven their strength on the image classification challenge hosted by ImageNet.

All these architectures work by passing the input image through a stack of layers, where each layer extracts and retains certain features of the input image.

The lower (initial) layers usually extract colors and edges. As we move up, the higher layers extract more complex shapes, which in turn are used to recognize the objects in the image and classify them.

The lower (initial) layers also preserve more spatial information about the input image, so the input can be reconstructed from them with minimal spatial loss. The higher layers lose spatial information but preserve the content of the input image in the form of features.
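
To make this concrete, here is a minimal sketch (in PyTorch/torchvision, not the framework our experiments ran on) of reading off activations at a low layer and a high layer of a pre-trained VGG-16. The layer indices and the input file name are assumptions made for illustration.

```python
# A minimal sketch: comparing a low-layer and a high-layer activation of VGG-16.
# Assumes torchvision's pre-trained VGG-16; layer indices and file name are illustrative.
import torch
from PIL import Image
from torchvision import models, transforms

vgg = models.vgg16(weights="DEFAULT").features.eval()  # on older torchvision: pretrained=True

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

def features_at(image, layer_indices):
    """Return the activations of the requested layers of vgg16.features."""
    x = preprocess(image).unsqueeze(0)
    outputs = {}
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in layer_indices:
                outputs[i] = x
    return outputs

img = Image.open("some_type.jpg").convert("RGB")    # any input image
acts = features_at(img, layer_indices={3, 22})      # 3 = relu1_2 (low), 22 = relu4_3 (high)
for i, a in acts.items():
    # Lower layers keep a large spatial grid (edges/colors); higher layers trade
    # spatial detail for many more feature channels.
    print(i, tuple(a.shape))
```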

Deep Type - Core Functionality

Given a content image and a typographic style image, the algorithm extracts the style (texture/high-level pattern) from the style image and renders the content image with the extracted texture, as shown in the figure below.

Content-Image Style-Image Result
Content Style1 Result1
Content Style2 Result2
Content Style3 Result3
How it works

Leon A. Gatys et al. published a paper, A Neural Algorithm of Artistic Style, explaining the approach to style transfer. All our experiments are carried out on jcjohnson’s implementation, which is based on the key technique used in this paper. It works as follows:

Key_Technique

Style Construction

The style image is passed through the network and the correlations between its feature maps are computed as a Gram matrix, obtained by taking the inner product of the vectorized feature maps at a given set of layers. This Gram matrix represents the texture information extracted from the style image.
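
As an illustration, here is a minimal sketch of the Gram matrix computation (in PyTorch, not the Torch code we actually ran; the normalization by the number of elements is a common convention we assume here):

```python
import torch

def gram_matrix(feature_map):
    """Gram matrix of a feature map: inner products between the vectorized channels."""
    b, c, h, w = feature_map.size()
    f = feature_map.view(b, c, h * w)          # vectorize each channel
    gram = torch.bmm(f, f.transpose(1, 2))     # channel-to-channel correlations
    return gram / (c * h * w)                  # normalize by the number of elements

# The Gram matrices of the style image's feature maps, one per chosen layer,
# are what the algorithm keeps as its representation of the style/texture.
```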

Content Construction

The content image is passed through the network and, since the higher layers hold more abstract information about the image, the feature map at a chosen higher layer is taken as the content extracted from the content image.

Generating stylized image

A white-noise image is passed through the network and a new image is generated by combining aspects of the content and style images via a parameterized ratio; the loss is computed by comparing the Gram matrices and the content feature map against their targets. This loss is minimized through gradient descent, repeating the process for a given number of iterations. Each iteration produces a stylized image that tries to match both the style construction and the content construction, and the output of the final iteration is taken as the generated stylized image.

The figure below illustrates the working of the neural style transfer algorithm:

working.png
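
Putting the pieces together, the sketch below condenses the whole loop into a single PyTorch function. Our experiments actually ran on jcjohnson’s Torch implementation; the layer choices, the Adam optimizer, and the default weights here are assumptions made to keep the sketch short (the original code uses L-BFGS and different defaults).

```python
# A condensed sketch of slow neural style: optimize the pixels of a noise image
# so that its high-layer features match the content image and its Gram matrices
# match the style image. Layer indices and hyperparameters are illustrative.
import torch
import torch.nn.functional as F
from torchvision import models

vgg = models.vgg16(weights="DEFAULT").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = (3, 8, 15, 22)   # relu1_2, relu2_2, relu3_3, relu4_3 (assumed choice)
CONTENT_LAYER = 22              # a higher layer, as described above

def extract(x):
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS or i == CONTENT_LAYER:
            feats[i] = x
    return feats

def gram(f):
    b, c, h, w = f.size()
    f = f.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def stylize(content_img, style_img, style_weight=200.0, content_weight=1.0, iterations=500):
    """content_img / style_img: normalized 1x3xHxW tensors."""
    with torch.no_grad():
        content_target = extract(content_img)[CONTENT_LAYER]
        style_targets = {i: gram(f) for i, f in extract(style_img).items() if i in STYLE_LAYERS}

    x = torch.randn_like(content_img).requires_grad_(True)   # start from white noise
    optimizer = torch.optim.Adam([x], lr=0.05)
    for _ in range(iterations):
        optimizer.zero_grad()
        feats = extract(x)
        content_loss = F.mse_loss(feats[CONTENT_LAYER], content_target)
        style_loss = sum(F.mse_loss(gram(feats[i]), style_targets[i]) for i in STYLE_LAYERS)
        loss = content_weight * content_loss + style_weight * style_loss
        loss.backward()
        optimizer.step()
    return x.detach()   # the image produced by the final iteration
```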

In practice, there are two ways in which neural style transfer is achieved. They are:

  • Slow-neural styling
  • Fast-neural styling
Fast-Neural-Style vs. Slow-Neural-Style
Slow Neural Style | Fast Neural Style
No training required | Requires training
Easy to tweak parameters and see the results of the changes | Takes quite a lot of time, as the model has to be retrained with the changes before the result can be seen
Requires a lot of compute power (GPU) | Requires less compute power (a reasonable CPU configuration is sufficient)
Easily scalable to multiple styles | One model can represent only one style; it cannot be scaled across style images
Results depend on the current style image and the current content image | Results depend on the training data and the current content image
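
The practical difference comes from where the optimization happens: Slow-Neural Style optimizes the output image directly for every content/style pair, while Fast-Neural Style trains a feed-forward transformation network once per style and then styles any image with a single forward pass. The sketch below only illustrates that idea; the real fast-neural-style model is deeper (residual blocks, learned down/upsampling) and is trained on a large set of content images.

```python
# A rough sketch of the fast-neural-style idea: a small image-to-image network,
# trained once per style, replaces the per-image optimization of slow neural style.
# The architecture here is deliberately simplified and assumed, not the real model.
import torch
import torch.nn as nn

class TransformNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # downsample
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2,
                               padding=1, output_padding=1), nn.ReLU(),         # upsample
            nn.Conv2d(32, 3, kernel_size=9, padding=4),
        )

    def forward(self, x):
        return self.body(x)

# Training (done once per style, the expensive part):
#   for each batch of content images x:
#       y = net(x)
#       loss = content_weight * content_loss(y, x) + style_weight * style_loss(y, style_image)
# Inference (the fast part): stylized = net(content_image), a single forward pass per image.
```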

Initial experiments on Fast-Neural-Style

Experiment 1
Hypothesis

Using a typographic image as the content image will give results comparable to using a non-typographic content image.

Results
Style-Image Content-image Result
candy hoovertowernight hoovertowernight_candy
candy fontli fontli-30k-gen
Observation

We see that the style isn’t carried over to the plain white/transparent background of the content image (the typographic image).

Conclusion

Some kind of preprocessing of the content image could help spread the style over the whole image.

Experiment 2
Hypothesis

Adding a background to the content image makes the style spread across the result image.

Results
Style-Image Processing-Template Content-image Result
candy fontli fontli-30k-gen
candy clear-sky fontli-with-clear-sky fontli-30k-clear-sky-gen
candy clouds fontli-with-clouds fontli-30k-clouds-gen
Observation

The results show that adding a background template to the plain typographic image produces results with the style spread across the image, which supports our hypothesis.
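
For illustration, here is a minimal sketch of that preprocessing step using Pillow (the file names are hypothetical; any background template can be substituted):

```python
# Composite a typographic image with a transparent background over a background template.
from PIL import Image

text_img = Image.open("fontli_text.png").convert("RGBA")      # hypothetical file names
background = Image.open("clouds_template.jpg").convert("RGBA")
background = background.resize(text_img.size)

composite = Image.alpha_composite(background, text_img)       # text on top of the template
composite.convert("RGB").save("fontli_with_clouds.jpg")       # new content image for styling
```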

Note

While we observed that the quality of the resulting image depends on the background of the content image, there are many other parameters that affect the results. Some of them are:

  • Network
  • Style Image
  • Content Layers (Convolution layers used for the content reconstruction loss)
  • Style Layers (Convolution layers used for the style reconstruction loss)
  • Content-Weight (Weight used for each content reconstruction loss)
  • Style-Weight (Weight used for each style reconstruction loss)
  • Number of iterations (Number of iterations the algorithm is run for while constructing the resulting image)

Since training Fast-Neural Style is a time-consuming process, we first tuned Slow-Neural Style and then applied the tuned parameters to Fast-Neural Style, assuming that parameters tuned on Slow-Neural Style would produce the expected results with Fast-Neural Style.

Experiments on fine-tuning Slow-Neural Style Parameters

Out of all the above-mentioned factors influencing the result, we concentrated on the style-weight/content-weight ratio (style-content ratio), the number of iterations, different models, and the style image as the initial parameters for our experiments.
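
The sweep itself can be pictured roughly as follows. The grid values are illustrative only; the actual runs used jcjohnson’s neural-style scripts, and stylize() refers to the sketch shown earlier in this post.

```python
# An illustrative parameter sweep (not the actual experiment script): vary the
# style/content ratio and the iteration count, keeping everything else fixed.
results = {}
for ratio in (1, 10, 100, 200, 500):        # style_weight : content_weight (illustrative grid)
    for n_iter in (200, 500, 1000):         # illustrative iteration counts
        results[(ratio, n_iter)] = stylize(
            content_img, style_img,
            style_weight=float(ratio), content_weight=1.0,
            iterations=n_iter,
        )
# Each result image is then inspected visually for style blending and text legibility.
```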

Experiments
  • Style-content ratio
    Hypothesis: With an increase in the style-content ratio, the blending of content into the style will increase.
    Observation: With an increase in style weight, the style dominates the content; the content is clear for lower style weights and starts to vanish beyond a specific style/content ratio; the content gets deformed as the style weight increases.
    Tuned value: style-content ratio = 200:1
  • Iterations
    Hypothesis: Blending of the content into the style increases with the number of iterations.
    Observation: As the number of iterations increases, the content gets more stylized but the blending decreases.
    Tuned value: iterations = 500
  • Models
    Hypothesis: vgg19 performs better than vgg16.
    Observation: vgg16 outperforms vgg19; vgg19 may need different parameter values to give good results.
    Tuned value: model = vgg16

Final tuned Parameters

  • Style to content ratio → 200:1
  • Number of iterations → 500
  • Model → vgg16
  • All other parameters are left at their defaults

Results with tuned parameters

Content Image Used

fontli-with-clouds

Style-Image Result-Image
Painting1 Painting1out_500
Painting7 Painting7out_500
Painting10 Painting10out_500
Painting12 Painting12out_500

Applying tuned parameters of Slow-Neural Style to Fast-Neural Style

Note

Since Fast-Neural Style counts iterations differently from Slow-Neural Style, we ran the Fast-Neural-Style experiment at different iteration counts along with the tuned parameters of Slow-Neural Style.

Content Image Used

fontli-with-clouds

Style-Image Slow-Neural-Style Fast-Neural-Style (2000 iterations) Fast-Neural-Style (20000 iterations)
Wallpaper15 Wallpaper6out_500 Wallpaper15_1_200_2000 Wallpaper15_1_200_20000
Animation13 Animation4out_500 Animation13_1_200_2000 Animation13_1_200_20000
Painting31 Painting16out_500 Painting31_1_200_2000.png Painting31_1_200_20000
Photograph11 Photograph2out_500 Photograph11_1_200_2000.png Photograph11_1_200_20000
Sketch3 Sketch3out_500 Sketch3_1_200_2000.png Sketch3_1_200_20000
Observation
  • For a given set of parameters, the results of Slow-Neural Style do not match the results of Fast-Neural Style.
  • As the number of iterations increases, the blending of style with content increases and the visibility of the content (text) decreases.

Summary

  • Fast-Neural Style needs typographic content images with some background to produce results with the style spread across the image.
  • In Fast-Neural Style, increasing the number of iterations does not produce a result image with both good style blending and good text visibility.
  • The behavior of Slow-Neural Style differs from that of Fast-Neural Style, and Fast-Neural Style needs its parameters tuned separately to achieve the expected results.

(Watch this blog for more updates on our work.)

References

  1. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. A Neural Algorithm of Artistic Style. arXiv:1508.06576, 2015.
  2. Justin Johnson. neural-style, a Torch implementation of the Gatys et al. algorithm. https://github.com/jcjohnson/neural-style
  3. Justin Johnson. fast-neural-style, feed-forward style transfer in Torch. https://github.com/jcjohnson/fast-neural-style

Credits

Manoj Kumar, R&D Engineer
Irfan Basha, R&D Engineer