Curtis Wigington, Chris Tensmeyer et al. (2018),
Start, Follow, Read: End-to-End Full-Page Handwriting Recognition



This work jointly learns text detection, segmentation, and recognition with a model built from three components: the Start of Line (SOL) network, the Line Follower (LF) network, and the Handwriting Recognition (HWR) network. The three networks are pretrained separately and then trained jointly using only ground-truth transcriptions (with line breaks).
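As a rough illustration of how the three components chain together at inference time, here is a minimal sketch. The function `read_page` and its three callable parameters (`find_starts`, `trace_line`, `recognize`) are hypothetical stand-ins for the SOL, LF, and HWR networks; they are not names from the paper.

```python
from typing import Any, Callable, Iterable


def read_page(
    page: Any,
    find_starts: Callable[[Any], Iterable[Any]],   # stand-in for the SOL network
    trace_line: Callable[[Any, Any], Any],         # stand-in for the LF network
    recognize: Callable[[Any], str],               # stand-in for the HWR network
) -> str:
    """Run the three stages in order and join the line transcriptions."""
    transcriptions = []
    for start in find_starts(page):
        line_image = trace_line(page, start)   # dewarped line image
        transcriptions.append(recognize(line_image))
    # One transcription per detected line, separated by line breaks.
    return "\n".join(transcriptions)
```

Joining the per-line outputs with line breaks mirrors the form of the ground-truth transcriptions used for joint training.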


Start of Line Network: This is a region proposal network that detects the starting points of text lines. For every $16 \times 16$ patch of the input image, the network densely predicts $x$ and $y$ offsets, a scale, a rotation angle, and a probability of occurrence.
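To make the dense prediction concrete, the sketch below turns a grid of per-patch outputs into a list of start-of-line candidates in image coordinates. It assumes that the offsets are given in pixels relative to the top-left corner of each patch and that candidates are kept by a simple probability threshold; both are assumptions for illustration, and the names `StartOfLine` and `decode_sol` are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple

PATCH = 16  # each prediction covers one 16 x 16 patch of the input image

# Per-patch prediction: (dx, dy, scale, angle, prob)
Cell = Tuple[float, float, float, float, float]


class StartOfLine(NamedTuple):
    x: float
    y: float
    scale: float
    angle: float
    prob: float


def decode_sol(grid: Sequence[Sequence[Cell]], threshold: float = 0.5) -> List[StartOfLine]:
    """Convert a grid of per-patch predictions into image-space candidates."""
    candidates = []
    for row, cells in enumerate(grid):
        for col, (dx, dy, scale, angle, prob) in enumerate(cells):
            if prob < threshold:
                continue  # assumed filtering rule: drop unlikely patches
            # Assumed convention: offsets are pixels from the patch's top-left corner.
            candidates.append(
                StartOfLine(col * PATCH + dx, row * PATCH + dy, scale, angle, prob)
            )
    # Most confident candidates first.
    return sorted(candidates, key=lambda s: s.prob, reverse=True)
```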

Line Follower: This network follows the handwriting line in incremental steps and outputs a dewarped text line image. Instead of segmenting text lines with bounding boxes, the LF network segments polygonal regions and can follow and straighten arbitrarily curved text. The LF is a recurrent network: given the current position and rotation angle $(x_i, y_i, \theta_i)$, it predicts the next position and angle $(x_{i+1}, y_{i+1}, \theta_{i+1})$. This process is repeated until the line reaches the image edge.
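The step-by-step following can be sketched as a simple loop over poses $(x, y, \theta)$. In the sketch below, `predict_next` is a hypothetical stand-in for the learned LF network, and `straight_step` is a toy replacement that just moves a fixed distance along the current angle; the `max_steps` guard is an added safety limit, not something described in the paper.

```python
import math
from typing import Callable, List, Tuple

Pose = Tuple[float, float, float]  # (x, y, theta in radians)


def follow_line(
    start: Pose,
    predict_next: Callable[[Pose], Pose],  # stand-in for the LF network
    width: int,
    height: int,
    max_steps: int = 1000,                 # assumed safety limit
) -> List[Pose]:
    """Repeatedly predict the next pose until it leaves the image."""
    path = [start]
    for _ in range(max_steps):
        nxt = predict_next(path[-1])
        x, y, _theta = nxt
        if not (0 <= x < width and 0 <= y < height):
            break  # reached the image edge
        path.append(nxt)
    return path


def straight_step(pose: Pose, step: float = 16.0) -> Pose:
    """Toy predictor: advance a fixed distance along the current angle."""
    x, y, theta = pose
    return (x + step * math.cos(theta), y + step * math.sin(theta), theta)
```

The resulting sequence of poses traces the line; sampling the image along that path is what would yield the dewarped line image.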

Handwriting Recognition: This is a CNN-BiLSTM network that transcribes the dewarped line images.

Post Processing: To correct recognition errors, an HMM-based 10-gram character-level language model (LM) is used. Character-level LMs typically correct out-of-vocabulary words better than word-level LMs.
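To show how a character-level LM can prefer a plausible transcription over a misrecognized one, here is a deliberately simplified sketch. It is a count-based n-gram model with add-one smoothing used to rescore candidate strings, not the HMM-based 10-gram decoder described above; the class `CharNGramLM`, the function `rescore`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


class CharNGramLM:
    """Count-based character n-gram model with add-one smoothing."""

    def __init__(self, n: int, corpus: Iterable[str]) -> None:
        self.n = n
        self.ngram_counts: Counter = Counter()
        self.context_counts: Counter = Counter()
        self.vocab = set()
        for line in corpus:
            padded = "^" * (n - 1) + line + "$"  # start padding and end marker
            self.vocab.update(padded)
            for i in range(n - 1, len(padded)):
                context = padded[i - n + 1:i]
                self.ngram_counts[(context, padded[i])] += 1
                self.context_counts[context] += 1

    def log_prob(self, text: str) -> float:
        padded = "^" * (self.n - 1) + text + "$"
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            num = self.ngram_counts[(context, padded[i])] + 1
            den = self.context_counts[context] + v
            total += math.log(num / den)
        return total


def rescore(candidates: Sequence[str], lm: CharNGramLM) -> str:
    """Pick the candidate transcription the LM finds most likely."""
    return max(candidates, key=lm.log_prob)
```

Because the model works on characters rather than whole words, it can still assign a sensible score to a word it never saw in training, as long as its character sequences are familiar.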