Extracting text from PDF documents is a common pre-processing step for text analysis and NLP work. The main challenge extraction tools face is that a PDF combines text, graphics and tabular structures, all encoded in a form designed for printing rather than for recovering the logical structure of the content.
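A PDF content stream places text as fragments at page coordinates rather than as paragraphs or table cells, so an extractor has to rebuild reading order itself. The sketch below is a toy illustration that is not tied to any particular PDF library. It assumes fragments have already been decoded into hypothetical `(x, y, text)` tuples and groups them into lines by baseline position. Real tools must also handle multiple columns, rotated text, hyphenation and tables, all of which this ignores.

```python
from typing import List, Tuple

# A text fragment as a PDF might position it: x, y in page coordinates
# (origin at the bottom-left, y grows upward) plus the string itself.
# This tuple layout is an assumption made for illustration.
Fragment = Tuple[float, float, str]


def reconstruct_lines(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[str]:
    """Group positioned fragments into lines, top to bottom, left to right.

    Fragments whose baselines differ by at most `y_tolerance` from the first
    fragment of the current line are treated as part of that line.
    """
    # Sort top-to-bottom (descending y), then left-to-right (ascending x).
    ordered = sorted(fragments, key=lambda f: (-f[1], f[0]))
    lines: List[List[Fragment]] = []
    for frag in ordered:
        if lines and abs(lines[-1][0][1] - frag[1]) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, order fragments by x and join them with spaces.
    return [
        " ".join(text for _, _, text in sorted(line, key=lambda f: f[0]))
        for line in lines
    ]


# Fragments arrive in drawing order, not reading order.
stream = [(300, 700, "world"), (72, 650, "Second"), (72, 700.5, "Hello"), (150, 650, "line")]
print(reconstruct_lines(stream))
```

The tolerance matters because fragments on the same visual line often have slightly different baselines. A fixed value like `2.0` is a simplification; real extractors usually derive it from font size.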
i-tagger - Neural-network-based deep learning models and tools for sequence tagging. Developing a model for the dataset at hand requires a lot of trial and error, and with existing projects we found it difficult to support different datasets and models in a modular way. i-tagger eases preprocessing, training and prediction. https://github.com/Imaginea/i-tagger
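To show what the preprocessing stage of a sequence-tagging pipeline typically involves, here is a minimal sketch. It is not i-tagger's API or data format. It assumes a CoNLL-style text file with one `token tag` pair per line and blank lines between sentences, a layout common for tagging datasets, and every function name is hypothetical. It parses sentences, builds word and tag vocabularies, and maps each sentence to integer ids that a neural tagger could consume.

```python
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style 'token ... tag' lines; blank lines separate sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # First column is the token, last column is the tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def build_vocab(items: List[str], reserved: List[str]) -> Dict[str, int]:
    """Assign ids to reserved symbols first, then to items in order of first appearance."""
    vocab = {sym: i for i, sym in enumerate(reserved)}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab)
    return vocab


def encode(sentence: Sentence, word_vocab: Dict[str, int],
           tag_vocab: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Map tokens and tags to ids; unseen tokens fall back to <UNK>."""
    unk = word_vocab["<UNK>"]
    word_ids = [word_vocab.get(tok, unk) for tok, _ in sentence]
    tag_ids = [tag_vocab[tag] for _, tag in sentence]
    return word_ids, tag_ids


data = "John B-PER\nlives O\nin O\nBerlin B-LOC\n\nHe O\nleft O\n"
sents = read_conll(data)
words = build_vocab([w for s in sents for w, _ in s], ["<PAD>", "<UNK>"])
tags = build_vocab([t for s in sents for _, t in s], ["<PAD>"])
print(encode(sents[0], words, tags))  # ([2, 3, 4, 5], [1, 2, 2, 3])
```

Keeping parsing, vocabulary building and encoding as separate steps is one way to swap in a different dataset format without touching the model code, which is the kind of modularity the project aims for.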