GPT1: Improving Language Understanding by Generative Pre-Training, Technical report, OpenAI, 2018
Goal: learn a universal representation that transfers with little adaptation to a wide range of tasks
Challenge: leveraging more than word-level information from unlabeled data is hard because 1) it is unclear which optimization objectives are most effective for transfer 2) there is no consensus on the most effective way to transfer these learned representations to the target task
Limitations of other existing pretrained LMs (feature-based approach): 1) restricted to short-range dependencies (ELMo is LSTM-based) 2) additional architecture is needed for each downstream task
Solution :
- two-stage semi-supervised approach: a combination of unsupervised pre-training and supervised fine-tuning 1) generative pre-training of an LM on a diverse corpus of unlabeled text (unsupervised) 2) then, discriminative fine-tuning on each specific task (supervised)
- task-aware input transformation during fine-tuning for effective transfer w/ minimal changes of model architecture
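- use a Transformer decoder as the LM to capture long-term dependencies
- task-specific input adaptations derived from traversal-style approaches (structured input is processed as a single contiguous token sequence) for robust transfer performance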
- Evaluation on NLI, QA & commonsense reasoning, sentence similarity, and classification tasks
- Effect of number of layers transferred: transfer performance improves as more pre-trained layers are transferred
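
The task-aware input transformations in the list above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' code: the token strings `<s>`, `$`, `<e>` and all function names are hypothetical, and inputs are taken to be already tokenized. The idea is that every task is flattened into one contiguous token sequence, so the same pre-trained Transformer can be reused with only a linear output layer added on top.

```python
from typing import List

# Hypothetical special tokens: a start token, a delimiter token, and an
# extract token whose final hidden state would feed the linear output layer.
START, DELIM, EXTRACT = "<s>", "$", "<e>"

def classification_input(text: List[str]) -> List[str]:
    """Single text: wrap it in the start and extract tokens."""
    return [START, *text, EXTRACT]

def entailment_input(premise: List[str], hypothesis: List[str]) -> List[str]:
    """Premise and hypothesis joined by the delimiter token."""
    return [START, *premise, DELIM, *hypothesis, EXTRACT]

def similarity_inputs(a: List[str], b: List[str]) -> List[List[str]]:
    """Sentence pairs have no inherent order, so build both orderings;
    each is processed independently and the two representations are combined."""
    return [entailment_input(a, b), entailment_input(b, a)]

def multiple_choice_inputs(context: List[str],
                           answers: List[List[str]]) -> List[List[str]]:
    """QA / commonsense reasoning: one sequence per candidate answer;
    the model scores each and the scores are normalized with a softmax."""
    return [[START, *context, DELIM, *ans, EXTRACT] for ans in answers]

if __name__ == "__main__":
    print(entailment_input(["a", "man", "sleeps"], ["he", "rests"]))
    print(similarity_inputs(["hi"], ["hello"]))
```

Because only the input layout changes per task, the model architecture itself stays almost untouched during fine-tuning.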
source: Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding with unsupervised learning. Technical report, OpenAI.