Adam W. Harley, Alex Ufkes, and Konstantinos G. Derpanis (2015)
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

Paper: http://www.cs.cmu.edu/~aharley/icdar15/

The experiments in this paper show that CNNs trained on natural-image datasets such as ImageNet transfer well to document analysis tasks such as document image classification and document retrieval.
rvl-cdip_dataset

Document classification using AlexNet

  • Considers a classification task on RVL-CDIP, a document image dataset consisting of 400,000 documents in 16 classes, as shown in the above figure.

  • Uses an AlexNet pre-trained on the ImageNet dataset for transfer learning and achieves an accuracy of about 90 percent on RVL-CDIP.

  • Studies whether enforcing region-specific feature learning in CNNs improves document classification, using an ensemble of CNNs as shown below.

  • Specifically, 5 CNNs are used, which take as input the a) whole image, b) header, c) footer, d) left body, and e) right body of the documents.

  • Observes no significant improvement from explicitly enforcing such region-specific features compared to a single CNN that receives the whole image, and thereby concludes that CNNs seem to learn those features implicitly.

cnn-ensemble
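
The five inputs to the region-based ensemble can be illustrated with a small cropping sketch. This is only an assumption-laden illustration: the function names region_boxes and crop, and the header and footer fractions, are hypothetical and are not taken from the paper, which may define the regions differently.

```python
def region_boxes(height, width, header_frac=0.25, footer_frac=0.25):
    """Return (top, bottom, left, right) boxes for the five regions.

    The fractions are illustrative assumptions, not the paper's values.
    """
    header_rows = int(height * header_frac)
    footer_rows = int(height * footer_frac)
    body_top = header_rows
    body_bottom = height - footer_rows
    mid = width // 2
    return {
        "whole": (0, height, 0, width),
        "header": (0, header_rows, 0, width),
        "footer": (height - footer_rows, height, 0, width),
        "left_body": (body_top, body_bottom, 0, mid),
        "right_body": (body_top, body_bottom, mid, width),
    }


def crop(image, box):
    """Crop a 2D image stored as a list of pixel rows."""
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 8x6 "image" whose pixel value encodes its (row, column) position.
image = [[r * 10 + c for c in range(6)] for r in range(8)]
boxes = region_boxes(height=8, width=6)
regions = {name: crop(image, box) for name, box in boxes.items()}
```

In an ensemble of this kind, each cropped region would be resized to the network's input size and passed to its own CNN, and the per-region predictions would then be combined.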

Document Image Retrieval

  • Given a test document, retrieval was performed by computing the Euclidean distance between the test document descriptor and every descriptor
    of the training set.

  • The sorted distances were then used to rank the images of the training data, and return a sorted list of documents for each test query.

  • A retrieved document is said to be relevant if it belongs to the same class as the test document. Results obtained for a few test samples are shown below.

retreival

  • Query images are shown in the first column, and the top ten retrievals are shown in the following columns in order. Retrievals from the same class are shown with a green border; retrievals from a different class are shown with a red border. Retrievals from other classes are considered incorrect, but they are often good retrievals nonetheless.
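
The retrieval procedure described above can be sketched in a few lines. This is a minimal illustration, assuming the descriptors are plain lists of floats (for example, CNN feature vectors); the function names, the toy descriptors, and the class labels are hypothetical and do not come from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, train_descriptors, top_k=10):
    """Rank training documents by distance to the query, nearest first."""
    distances = [(euclidean(query, d), i) for i, d in enumerate(train_descriptors)]
    distances.sort()
    return [i for _, i in distances[:top_k]]


def precision_at_k(query_label, retrieved, train_labels):
    """Fraction of retrieved documents that share the query's class."""
    if not retrieved:
        return 0.0
    hits = sum(1 for i in retrieved if train_labels[i] == query_label)
    return hits / len(retrieved)


# Toy data: four training descriptors with hypothetical class labels.
train_descriptors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.1, 0.1]]
train_labels = ["letter", "memo", "invoice", "letter"]

ranked = retrieve([0.0, 0.0], train_descriptors, top_k=3)
score = precision_at_k("letter", ranked, train_labels)
```

A retrieval counts as relevant only when its label matches the query's label, which mirrors the green and red borders in the figure.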

Datasets used