[Paper Review] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019
Goal
Propose a powerful pre-trained language model (LM) for general language representation
Challenge
- Unidirectional LMs restrict the power of pre-trained representations; a standard left-to-right conditional LM objective cannot be used for a bidirectional approach
Solutions
bidirectional LM
Method
- Transformer encoder as the LM
- Pre-train with MLM (Masked Language Modeling) and NSP (Next Sentence Prediction); see the MLM sketch after this list
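A minimal sketch of the MLM input corruption described in the paper (15% of tokens selected for prediction; of those, 80% replaced by [MASK], 10% by a random token, 10% kept unchanged). The MASK_ID, VOCAB_SIZE, and toy token ids below are illustrative assumptions, not BERT's actual preprocessing code.

```python
# MLM corruption sketch: the 15% / 80-10-10 rates follow the BERT paper;
# token ids and constants here are hypothetical.
import random

MASK_ID = 103          # assumed [MASK] token id
VOCAB_SIZE = 30522     # BERT-base WordPiece vocabulary size

def mask_tokens(token_ids, mask_prob=0.15):
    """Return (corrupted_ids, labels); labels use -100 where no prediction is made."""
    corrupted, labels = [], []
    for tid in token_ids:
        if random.random() < mask_prob:
            labels.append(tid)                 # predict the original token at this position
            r = random.random()
            if r < 0.8:
                corrupted.append(MASK_ID)      # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                corrupted.append(tid)          # 10%: keep the original token
        else:
            corrupted.append(tid)
            labels.append(-100)                # ignore marker (common convention)
    return corrupted, labels

# Example: corrupt a toy sequence of token ids
ids, lbls = mask_tokens([7592, 1010, 2026, 3899, 2003, 10140])
print(ids, lbls)
```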
Evaluation:
- Performance evaluation on GLUE (General Language Understanding Evaluation), SQuAD (Stanford Question Answering Dataset), and SWAG (Situations With Adversarial Generations)
- Ablation studies on the effect of pre-training tasks and of model size
- Comparison with the feature-based approach on the NER task
Etc.
- The Transformer decoder uses masked self-attention internally (masking so that subsequent positions cannot be attended to, e.g., at step t the positions from step t+1 onward are hidden), so it cannot perform MLM the way BERT needs; this seems to be why the encoder is used (see the sketch below)
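To illustrate the point above, a small sketch contrasting the decoder's subsequent-position (causal) mask with the encoder's fully-visible attention mask that MLM relies on. The 1 = may attend / 0 = blocked convention is an assumption for illustration.

```python
# Causal vs. bidirectional attention masks (illustrative convention: 1 = may attend, 0 = blocked).
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular: position t can attend only to positions <= t,
    # i.e. subsequent positions are masked out (as in the Transformer decoder).
    return np.tril(np.ones((seq_len, seq_len), dtype=int))

def bidirectional_mask(seq_len):
    # Encoder self-attention: every position can attend to every position,
    # which lets MLM use both left and right context.
    return np.ones((seq_len, seq_len), dtype=int)

print(causal_mask(4))
print(bidirectional_mask(4))
```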