[Paper Review] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019
Goal
Propose a powerful pre-trained language model (LM) for general language representation
Challenge
- Unidirectional LMs restrict the power of pre-trained representations; a standard left-to-right conditional LM objective cannot be used for a bidirectional approach
Solutions
bidirectional LM
Method
- Transformer encoder as the LM
- Pre-train with MLM (Masked Language Modeling) and NSP (Next Sentence Prediction); see the MLM sketch after this list
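A minimal sketch of the MLM input corruption described in the paper (15% of tokens selected for prediction; of those, 80% replaced by [MASK], 10% by a random token, 10% kept unchanged). The MASK_ID, VOCAB_SIZE, and toy token ids below are illustrative assumptions, not BERT's actual preprocessing code.

```python
# MLM corruption sketch: the 15% / 80-10-10 rates follow the BERT paper;
# token ids and constants here are hypothetical.
import random

MASK_ID = 103          # assumed [MASK] token id
VOCAB_SIZE = 30522     # BERT-base WordPiece vocabulary size

def mask_tokens(token_ids, mask_prob=0.15):
    """Return (corrupted_ids, labels); labels use -100 where no prediction is made."""
    corrupted, labels = [], []
    for tid in token_ids:
        if random.random() < mask_prob:
            labels.append(tid)                 # predict the original token at this position
            r = random.random()
            if r < 0.8:
                corrupted.append(MASK_ID)      # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                corrupted.append(tid)          # 10%: keep the original token
        else:
            corrupted.append(tid)
            labels.append(-100)                # ignore marker (common convention)
    return corrupted, labels

# Example: corrupt a toy sequence of token ids
ids, lbls = mask_tokens([7592, 1010, 2026, 3899, 2003, 10140])
print(ids, lbls)
```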
Evaluation:
- Performance evaluation on GLUE (General Language Understanding Evaluation), SQuAD (Stanford Question Answering Dataset), and SWAG (Situations With Adversarial Generations)
- Ablation studies on the effect of pre-training tasks and of model size
- Comparison with the feature-based approach on the NER task
Etc.
- The Transformer decoder uses masked self-attention internally (masking so that subsequent positions cannot be attended to, e.g., at step t the positions from step t+1 onward are hidden), so it cannot perform MLM the way BERT needs; this seems to be why the encoder is used (see the sketch below)
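To illustrate the point above, a small sketch contrasting the decoder's subsequent-position (causal) mask with the encoder's fully-visible attention mask that MLM relies on. The 1 = may attend / 0 = blocked convention is an assumption for illustration.

```python
# Causal vs. bidirectional attention masks (illustrative convention: 1 = may attend, 0 = blocked).
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular: position t can attend only to positions <= t,
    # i.e. subsequent positions are masked out (as in the Transformer decoder).
    return np.tril(np.ones((seq_len, seq_len), dtype=int))

def bidirectional_mask(seq_len):
    # Encoder self-attention: every position can attend to every position,
    # which lets MLM use both left and right context.
    return np.ones((seq_len, seq_len), dtype=int)

print(causal_mask(4))
print(bidirectional_mask(4))
```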