Problem Definition

To design a unified architecture that performs several NLP tasks: predicting part-of-speech tags, chunks, named entity tags and semantic roles, and identifying semantically similar words.


  • A single CNN-based architecture for several related NLP tasks, ranging from identifying part-of-speech tags to assigning semantic roles.
  • The first layer extracts features for each word.
  • The second layer extracts features from the sentence treating it as a sequence with local and global structure (i.e., it is not treated like a bag of words). The following layers are classical NN layers.
  • Uses look-up-table-based word embeddings.
  • A stack of Time-Delay Neural Networks (TDNNs) is used to handle variable-length input sequences.
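
The look-up table and a single time-delay layer can be sketched as follows. This is a minimal pure-Python illustration, not the paper's implementation: the function names, the "<pad>" token, the random initialisation and the max-over-time step used to obtain a fixed-size vector are all assumptions made for the example.

```python
import random


def make_lookup_table(vocab, dim, rng):
    """Assign each word a small random vector (a stand-in for a learned embedding)."""
    return {word: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for word in vocab}


def tdnn_layer(seq, weights, bias):
    """Apply the same linear filter to every window of k consecutive vectors.

    seq:     list of T vectors, each of size d_in
    weights: list of n_out rows, each of length k * d_in
    Returns a list of T - k + 1 vectors of size n_out.
    """
    d_in = len(seq[0])
    k = len(weights[0]) // d_in
    out = []
    for t in range(len(seq) - k + 1):
        # Concatenate the k vectors in the current window.
        window = [x for vec in seq[t:t + k] for x in vec]
        out.append([sum(w * x for w, x in zip(row, window)) + b
                    for row, b in zip(weights, bias)])
    return out


def max_over_time(seq):
    """Collapse a variable-length sequence into one vector (assumed pooling step)."""
    return [max(column) for column in zip(*seq)]


def encode(sentence, table, weights, bias):
    """Embed a non-empty sentence, pad it, and return a fixed-size feature vector."""
    d_in = len(table["<pad>"])
    k = len(weights[0]) // d_in
    pad = [table["<pad>"]] * (k // 2)  # hypothetical padding token
    vectors = pad + [table[word] for word in sentence] + pad
    return max_over_time(tdnn_layer(vectors, weights, bias))
```

Because the same filter weights are reused at every position, sentences of different lengths pass through the same layer, and the pooling step yields a vector whose size depends only on the number of filters.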


Training details

  • The entire network is trained jointly on all the tasks using weight-sharing, an instance of multitask learning as shown in Figure 2.
  • All the tasks use labeled data except the language model, which is learnt from unlabeled text.
  • Training is achieved in a stochastic manner by looping over the tasks:
    1. Select the next task.
    2. Select a random training example for this task.
    3. Update the NN for this task by taking a gradient step with respect to this example.
    4. Go to 1.
  • Note that the labeled data for training each task can come from completely different datasets.
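
The training loop described above can be sketched on a toy model. This is only an illustration under stated assumptions: the model (one shared linear layer plus one scalar head per task), the squared-error loss, the round-robin task order and every name in the code are hypothetical and are not taken from the paper.

```python
import random


def train_multitask(datasets, dim, steps, lr=0.05, seed=0):
    """Alternate single-example gradient steps over tasks with shared weights.

    datasets: dict mapping task name -> list of (x, y) pairs, x a list of `dim` floats
    Returns the shared weights, the per-task heads and per-task update counts.
    """
    rng = random.Random(seed)
    shared = [rng.uniform(-0.5, 0.5) for _ in range(dim)]     # shared by all tasks
    heads = {task: rng.uniform(0.5, 1.0) for task in datasets}  # one per task
    tasks = list(datasets)
    updates = {task: 0 for task in tasks}

    for step in range(steps):
        task = tasks[step % len(tasks)]       # select the next task (round-robin)
        x, y = rng.choice(datasets[task])     # random training example for this task
        h = sum(w * xi for w, xi in zip(shared, x))
        err = heads[task] * h - y             # prediction minus target
        # Gradients of the squared error with respect to the head and shared weights.
        grad_head = 2 * err * h
        grad_shared = [2 * err * heads[task] * xi for xi in x]
        heads[task] -= lr * grad_head         # only this task's head is updated
        shared = [w - lr * g for w, g in zip(shared, grad_shared)]
        updates[task] += 1                    # then go back to task selection
    return shared, heads, updates


# Each task brings its own labeled examples; the datasets need not overlap.
toy_data = {
    "pos": [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)],
    "chunk": [([1.0, 1.0], 0.5)],
}
shared, heads, updates = train_multitask(toy_data, dim=2, steps=10)
```

Every step changes the shared weights regardless of which task was selected, while each task-specific head changes only when its own task is drawn; this is where the weight sharing takes effect.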


Data set details

  • PropBank dataset for semantic role labeling experiments.
  • Penn Treebank for part-of-speech tagging and chunking experiments.
  • Language models were trained on Wikipedia.