
[Paper Review] GPT-1: Improving Language Understanding by Generative Pre-Training, Technical report, OpenAI, 2018

Goal

Learn a universal representation that transfers with little adaptation to a wide range of tasks

Challenge

  • Leveraging more than word-level information from unlabeled data is challenging: 1) it is unclear which optimization objectives are most effective for learning transferable representations, and 2) there is no consensus on the most effective way to transfer these learned representations to the target task

  • Limitations of earlier pretrained LMs (feature-based approaches): 1) restricted to short-range context (e.g., ELMo is LSTM-based), and 2) an additional task-specific architecture is needed for each downstream task

Solution:

  • two-stage semi-supervised approach, a combination of unsupervised pre-training and supervised fine-tuning (see the objective sketch after this list): 1) generative pre-training of a LM on a diverse corpus of unlabeled text (unsupervised), 2) then discriminative fine-tuning on each specific task (supervised)
  • task-aware input transformations during fine-tuning for effective transfer with minimal changes to the model architecture
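
The paper formalizes the two stages with three objectives: the LM likelihood L1 for pre-training, the supervised likelihood L2 for fine-tuning, and a combined objective L3 that keeps the LM loss as an auxiliary task (k is the context window size, λ the auxiliary-objective weight):

```latex
% Stage 1: unsupervised pre-training, maximize the LM likelihood over unlabeled corpus U
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)

% Stage 2: supervised fine-tuning on labeled examples (x, y), x = x^1 \ldots x^m
L_2(\mathcal{C}) = \sum_{(x,\,y)} \log P(y \mid x^1, \ldots, x^m)

% Fine-tuning with the LM loss retained as an auxiliary objective
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})
```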

Method:

  • use a Transformer decoder as the LM to better capture long-range dependencies
  • task-specific input adaptations (traversal style) that convert structured inputs into ordered token sequences for robust transfer performance, as sketched below
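
A minimal sketch of the traversal-style transformations the paper describes; the token strings (`<s>`, `<e>`, `$`) and function names here are illustrative assumptions, not the paper's code:

```python
# Traversal-style input transformations: structured inputs are flattened
# into a single ordered token sequence so one pretrained architecture can
# handle every task. Token strings below are illustrative placeholders.

START, EXTRACT, DELIM = "<s>", "<e>", "$"

def classification(text: list[str]) -> list[str]:
    # Single-sequence tasks: wrap the text with start/extract tokens.
    return [START, *text, EXTRACT]

def entailment(premise: list[str], hypothesis: list[str]) -> list[str]:
    # Concatenate premise and hypothesis with a delimiter token.
    return [START, *premise, DELIM, *hypothesis, EXTRACT]

def similarity(a: list[str], b: list[str]) -> list[list[str]]:
    # No inherent ordering between the two sentences: build both orderings;
    # the paper adds the two final representations before the output layer.
    return [
        [START, *a, DELIM, *b, EXTRACT],
        [START, *b, DELIM, *a, EXTRACT],
    ]

def multiple_choice(context: list[str],
                    answers: list[list[str]]) -> list[list[str]]:
    # QA / commonsense reasoning: one sequence per candidate answer; each
    # is scored independently and the scores are softmax-normalized.
    return [[START, *context, DELIM, *ans, EXTRACT] for ans in answers]

# Example: an entailment input ready for the Transformer.
print(entailment(["a", "man", "sleeps"], ["a", "person", "rests"]))
```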

Evaluation:

  • Evaluated on NLI, QA & commonsense reasoning, sentence similarity, and classification tasks
  • Effect of the number of layers transferred: transfer performance improves as more pretrained layers are transferred to the target task (a sketch of this ablation follows)
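
A minimal sketch of what transferring k layers could look like, under the assumption (mine, not spelled out in this note) that non-transferred blocks are freshly initialized before fine-tuning; all names are hypothetical:

```python
from typing import Callable, List

def transfer_layers(pretrained_blocks: List[object], k: int,
                    fresh_block: Callable[[], object]) -> List[object]:
    """Build a task model that reuses the first k pretrained blocks.

    Blocks beyond k are created from scratch; the whole stack is then
    fine-tuned on the target task. Both arguments are hypothetical
    stand-ins for real model components.
    """
    return [pretrained_blocks[i] if i < k else fresh_block()
            for i in range(len(pretrained_blocks))]

# Example: transfer 6 of 12 blocks (strings stand in for modules).
stack = transfer_layers([f"pretrained_{i}" for i in range(12)], k=6,
                        fresh_block=lambda: "random_init")
```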

Source: Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. Technical report, OpenAI. https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf
