A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning (2008)
September 16, 2023
A Unified Architecture for Natural Language Processing is an instance of multitask learning. The first layer is a lookup table that stores a fixed-size embedding for each word of a fixed dictionary. The second layer is a Time-Delay Neural Network (TDNN) layer, which extracts features from the sentence by treating it as a sequence with local structure. The third layer takes the maximum over time of each output feature of the second layer, which yields a fixed-size vector regardless of sentence length. The following layers are classical NN layers. The lookup table is shared among the tasks, while the other layers can be task-specific.
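The layer stack described above can be sketched in a few lines of NumPy. This is only an illustrative forward pass, not the paper's implementation: the sizes (`VOCAB_SIZE`, `EMBED_DIM`, `WINDOW`, `N_FEATURES`), the `tanh` nonlinearity, the softmax output, and the helper names `make_task_params` and `forward` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE, EMBED_DIM = 100, 8
WINDOW, N_FEATURES = 3, 16

# Layer 1: lookup table of word embeddings, shared by every task.
lookup_table = rng.normal(scale=0.1, size=(VOCAB_SIZE, EMBED_DIM))

def make_task_params(n_classes):
    """Task-specific parameters: TDNN weights plus a linear output layer."""
    return {
        "W_tdnn": rng.normal(scale=0.1, size=(WINDOW * EMBED_DIM, N_FEATURES)),
        "b_tdnn": np.zeros(N_FEATURES),
        "W_out": rng.normal(scale=0.1, size=(N_FEATURES, n_classes)),
        "b_out": np.zeros(n_classes),
    }

def forward(word_ids, params):
    """Return class probabilities for one sentence (needs len(word_ids) >= WINDOW)."""
    # Layer 1: map word indices to embeddings, shape (T, EMBED_DIM).
    embedded = lookup_table[np.asarray(word_ids)]
    # Layer 2 (TDNN): apply the same linear map to every window of
    # WINDOW consecutive words, shape (T - WINDOW + 1, N_FEATURES).
    n_windows = len(word_ids) - WINDOW + 1
    windows = np.stack([embedded[t:t + WINDOW].ravel() for t in range(n_windows)])
    features = windows @ params["W_tdnn"] + params["b_tdnn"]
    # Layer 3: max over time gives a fixed-size vector, shape (N_FEATURES,).
    pooled = features.max(axis=0)
    # Following layers: a nonlinearity, a linear layer and a softmax.
    logits = np.tanh(pooled) @ params["W_out"] + params["b_out"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Sentences of different lengths produce outputs of the same size.
params = make_task_params(n_classes=4)
print(forward([1, 2, 3, 4, 5], params).shape)
print(forward(list(range(20)), params).shape)
```

Because the maximum is taken over the time axis, the classifier input has `N_FEATURES` entries whatever the sentence length, which is what lets ordinary fully connected layers follow.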
Training is done online by looping over the tasks: for each task in turn, a training example is selected at random, and the weights of the model are updated by taking a gradient step with respect to that example.
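The training procedure can be sketched as follows. To keep the hand-written gradients short, this sketch makes several simplifying assumptions that are not in the paper: the TDNN layer is omitted (max over time is applied directly to the embeddings), each task has a single linear softmax head, and the data is random placeholder data. The names `sgd_step`, `make_head`, `toy_dataset`, `train` and the constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, EMBED_DIM, LR = 50, 8, 0.1  # hypothetical sizes and learning rate

# Shared across tasks: the lookup table of word embeddings.
lookup_table = rng.normal(scale=0.1, size=(VOCAB_SIZE, EMBED_DIM))

def make_head(n_classes):
    """Task-specific linear classifier on top of the pooled features."""
    return {"W": rng.normal(scale=0.1, size=(EMBED_DIM, n_classes)),
            "b": np.zeros(n_classes)}

def sgd_step(word_ids, label, head):
    """One gradient step on a single example; returns the loss before the update."""
    word_ids = np.asarray(word_ids)
    embedded = lookup_table[word_ids]        # (T, EMBED_DIM)
    winners = embedded.argmax(axis=0)        # time step that wins the max, per feature
    cols = np.arange(EMBED_DIM)
    pooled = embedded[winners, cols]         # max over time
    logits = pooled @ head["W"] + head["b"]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    loss = -np.log(probs[label])

    # Backpropagate the cross-entropy loss.
    d_logits = probs.copy()
    d_logits[label] -= 1.0
    d_pooled = head["W"] @ d_logits          # computed before W is updated
    head["W"] -= LR * np.outer(pooled, d_logits)
    head["b"] -= LR * d_logits
    # Only the embedding entries that won the max receive a gradient.
    lookup_table[word_ids[winners], cols] -= LR * d_pooled
    return loss

def toy_dataset(n_examples, n_classes):
    """Random sentences with random labels (placeholder data only)."""
    return [(rng.integers(VOCAB_SIZE, size=int(rng.integers(3, 10))),
             int(rng.integers(n_classes)))
            for _ in range(n_examples)]

tasks = {
    "task_a": {"head": make_head(2), "data": toy_dataset(20, 2)},
    "task_b": {"head": make_head(3), "data": toy_dataset(20, 3)},
}

def train(tasks, n_rounds):
    for _ in range(n_rounds):
        for task in tasks.values():          # loop over the tasks
            # Pick a random training example for this task ...
            word_ids, label = task["data"][int(rng.integers(len(task["data"])))]
            # ... and take one gradient step on it.
            sgd_step(word_ids, label, task["head"])

train(tasks, n_rounds=200)
```

Every step updates the shared `lookup_table` together with the head of the current task only, so the embeddings receive training signal from all tasks while each head sees only its own data.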
Remark
The online updates can be replaced with mini-batch updates, where each gradient step uses several examples of the selected task instead of one.
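One way to write the two update rules side by side is given below. The notation is an assumption introduced here, not taken from the paper: $\theta$ denotes the parameters touched by the selected task $t$ (the shared lookup table plus that task's own layers), $\eta$ the learning rate, $\ell_t$ the per-example loss of task $t$, and $B$ a mini-batch of examples drawn at random from task $t$.

```latex
% Online update: one randomly selected example (x_i, y_i) of task t
\theta \leftarrow \theta - \eta \, \nabla_\theta \, \ell_t(x_i, y_i; \theta)

% Mini-batch update: average the gradient over a batch B of examples of task t
\theta \leftarrow \theta - \frac{\eta}{|B|} \sum_{(x_i, y_i) \in B} \nabla_\theta \, \ell_t(x_i, y_i; \theta)
```

With $|B| = 1$ the mini-batch rule reduces to the online rule.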