Goal: design a unified architecture that performs various NLP tasks: predicting part-of-speech tags, chunks, named entity tags, and semantic roles, and finding semantically similar words.
- A single CNN-based architecture handles several related NLP tasks, ranging from identifying part-of-speech tags to assigning semantic roles.
- The first layer extracts features for each word.
- The second layer extracts features from the sentence treating it as a sequence with local and global structure (i.e., it is not treated like a bag of words). The following layers are classical NN layers.
- Uses a look-up-table-based word embedding for each word.
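A minimal sketch of what a look-up-table embedding amounts to, with illustrative sizes (the vocabulary size `V`, dimension `d`, and the `embed` helper are assumptions, not taken from the paper): each word index simply selects a row of a trainable matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4                            # illustrative vocabulary size and dimension
W = rng.normal(scale=0.1, size=(V, d))  # the embedding table (learned during training)

def embed(word_indices):
    """Map a sequence of word indices to their embedding vectors."""
    return W[np.asarray(word_indices)]  # shape: (len(word_indices), d)

sent = [3, 1, 7]          # a 3-word sentence encoded as word indices
X = embed(sent)
print(X.shape)            # (3, 4)
```

Because the table is just a matrix indexed by word id, gradients from any task flow back into the same shared rows.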
- A stack of Time-Delay Neural Networks (TDNNs) is used to handle variable-length input sequences.
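One TDNN layer can be sketched as a 1-D convolution over the word-feature sequence; the kernel size `k` and dimensions below are assumed for illustration. The same weights apply at every position, so any sentence of length `n >= k` is handled, producing `n - k + 1` outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, k = 4, 6, 3                          # illustrative sizes
K = rng.normal(scale=0.1, size=(k, d_in, d_out))  # shared convolution weights

def tdnn(X):
    """Apply a time-delay (convolutional) layer to X of shape (n, d_in)."""
    n = X.shape[0]
    return np.stack([
        np.einsum('kd,kdo->o', X[t:t + k], K)     # weighted sum over one window
        for t in range(n - k + 1)
    ])                                            # shape: (n - k + 1, d_out)

print(tdnn(rng.normal(size=(5, d_in))).shape)     # (3, 6)
print(tdnn(rng.normal(size=(9, d_in))).shape)     # (7, 6)
```

Stacking such layers (with a final pooling step) collapses a variable-length sentence into a fixed-size feature vector for the classical NN layers that follow.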
- The entire network is trained jointly on all the tasks using weight-sharing, an instance of multitask learning as shown in Figure 2.
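Hard weight-sharing can be sketched as follows, with assumed shapes and names (`shared`, `heads`, and the tag counts are illustrative, not from the paper): the lower layers compute one feature vector reused by every task, while each task keeps its own output layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 4, 8                                          # illustrative sizes
shared = {'W1': rng.normal(scale=0.1, size=(d, h))}  # layer shared by all tasks
heads = {                                            # per-task output layers
    'pos':   rng.normal(scale=0.1, size=(h, 45)),    # e.g. 45 POS tags
    'chunk': rng.normal(scale=0.1, size=(h, 23)),    # e.g. 23 chunk labels
}

def forward(x, task):
    hidden = np.tanh(x @ shared['W1'])   # same features for every task
    return hidden @ heads[task]          # task-specific scores

x = rng.normal(size=(d,))
print(forward(x, 'pos').shape, forward(x, 'chunk').shape)  # (45,) (23,)
```

Gradient steps on any task update its own head plus the shared parameters, which is how training on one task can help the others.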
- All the tasks use labeled data except the language model which is learnt from unlabeled text
- Training is achieved in a stochastic manner by looping over the tasks:
    1. Select the next task.
    2. Select a random training example for this task.
    3. Update the NN for this task by taking a gradient step with respect to this example.
    4. Go to 1.
- It is worth noting that the labeled training data for each task can come from completely different datasets.