[Paper Review] GPT1: Improving Language Understanding by Generative Pre-Training, Technical report, OpenAI, 2018
Goal
Learn a universal representation that transfers with little adaptation to a wide range of tasks
Challenge
- Leveraging more than word-level information from unlabeled data is difficult: 1) it is unclear which optimization objectives are most effective for learning transferable representations 2) there is no consensus on the most effective way to transfer these learned representations to the target task
- Limitations of other existing pretrained LMs (feature-based approaches): 1) restricted to short-range context (ELMo is LSTM-based) 2) a separate task-specific architecture must be added for each downstream task
Solution:
- two-stage semi-supervised approach: a combination of unsupervised pre-training and supervised fine-tuning (objectives sketched after this list) 1) generative pre-training of a LM on a diverse corpus of unlabeled text (unsupervised) 2) then, discriminative fine-tuning on each specific task (supervised)
- task-aware input transformations during fine-tuning for effective transfer with minimal changes to the model architecture
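The two training objectives and their combination, restated from the report (λ weights the auxiliary LM loss during fine-tuning):

```latex
% Objectives as defined in the GPT-1 technical report
\begin{align*}
  % Unsupervised pre-training: left-to-right LM likelihood over unlabeled corpus U,
  % with context window size k and model parameters \Theta
  L_1(\mathcal{U}) &= \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta) \\
  % Supervised fine-tuning: predict label y from input tokens x^1 ... x^m
  % using the final transformer block's activation and a new output layer
  L_2(\mathcal{C}) &= \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m) \\
  % Combined fine-tuning objective: the auxiliary LM term (weight \lambda)
  % improves generalization and speeds up convergence
  L_3(\mathcal{C}) &= L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
\end{align*}
```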
Method:
- use a Transformer decoder as the LM to better capture long-range dependencies
- task-specific input adaptations for robust transfer performance (traversal-style, sketched below)
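A minimal sketch of the traversal-style input transformations: structured inputs are converted into ordered token sequences bracketed by start/extract tokens and joined by a delimiter, so the pretrained Transformer needs no task-specific architecture. The helper names below are illustrative (not from any released code); the `<s>` / `<e>` / `$` tokens follow the paper's convention.

```python
# Hedged sketch of the traversal-style input transformations (GPT-1, Section 3.3).
# Function names are hypothetical; only the token-ordering scheme is from the paper.

START, EXTRACT, DELIM = "<s>", "<e>", "$"

def classification_input(text_tokens):
    # Single sequence: start token + text + extract token
    return [START] + text_tokens + [EXTRACT]

def entailment_input(premise_tokens, hypothesis_tokens):
    # Concatenate premise and hypothesis with a delimiter in between
    return [START] + premise_tokens + [DELIM] + hypothesis_tokens + [EXTRACT]

def similarity_inputs(text_a, text_b):
    # No inherent ordering: build both orderings; the two final hidden states
    # are added element-wise before the linear output layer
    return [
        [START] + text_a + [DELIM] + text_b + [EXTRACT],
        [START] + text_b + [DELIM] + text_a + [EXTRACT],
    ]

def multiple_choice_inputs(context_tokens, answer_token_lists):
    # One sequence per candidate answer; each is scored independently and
    # normalized with a softmax over the candidates
    return [
        [START] + context_tokens + [DELIM] + answer + [EXTRACT]
        for answer in answer_token_lists
    ]

# Example: an entailment pair becomes one contiguous token sequence
# entailment_input(["a", "man", "sleeps"], ["someone", "is", "asleep"])
# -> ['<s>', 'a', 'man', 'sleeps', '$', 'someone', 'is', 'asleep', '<e>']
```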
Evaluation:
- Evaluated on NLI, QA & commonsense reasoning, sentence similarity, and classification tasks
- Effect of number of layers transferred: transfer performance improves as more pretrained layers are transferred
source: Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding with unsupervised learning. Technical report, OpenAI. https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf