BERT Paper Reading Notes
Questions and notes recorded while reading the BERT paper.
Process Analysis
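- masked language model: randomly mask some of the input tokens, then predict the original vocabulary id of each masked token from its surrounding context.
- next sentence prediction: jointly pretrains text-pair representations by predicting whether sentence B is the sentence that actually follows sentence A.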
- Unsupervised feature-based approaches
- Pre-trained word embeddings
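The next sentence prediction item can be illustrated with a minimal sketch of how training pairs are built. It assumes the setup the paper describes: 50% of the time sentence B is the true next sentence (`IsNext`) and 50% of the time it is a random sentence from the corpus (`NotNext`). The function names `make_nsp_pairs` and `pack_pair` and the `other_sentences` pool are hypothetical illustrations, and real BERT works on WordPiece tokens rather than plain strings.

```python
import random


def make_nsp_pairs(document, other_sentences, seed=None):
    """Build (sentence_a, sentence_b, label) examples from one document.

    `other_sentences` is a hypothetical pool of sentences drawn from other
    documents, so a random B is very unlikely to be the true continuation.
    """
    rng = random.Random(seed)
    examples = []
    for i in range(len(document) - 1):
        sent_a = document[i]
        if rng.random() < 0.5:
            # 50%: B really follows A in the document
            examples.append((sent_a, document[i + 1], "IsNext"))
        else:
            # 50%: B is a random sentence from elsewhere in the corpus
            examples.append((sent_a, rng.choice(other_sentences), "NotNext"))
    return examples


def pack_pair(tokens_a, tokens_b):
    """Pack a sentence pair as [CLS] A [SEP] B [SEP] with segment ids 0/1."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # [CLS], A and the first [SEP] belong to segment 0; B and the final [SEP] to segment 1
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


doc = ["the man went to the store", "he bought a gallon of milk", "then he went home"]
pool = ["penguins are flightless birds", "the stock market fell today"]
for example in make_nsp_pairs(doc, pool, seed=0):
    print(example)

print(pack_pair(["my", "dog"], ["it", "barks"]))
# (['[CLS]', 'my', 'dog', '[SEP]', 'it', 'barks', '[SEP]'], [0, 0, 0, 0, 1, 1, 1])
```

The binary `IsNext`/`NotNext` label is what forces the model to encode the relationship between the two segments. The segment ids are what tell it which sentence each token belongs to.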
Problems
- What is a masked language model?
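A masked language model is trained to recover tokens that have been hidden from its input, using context on both sides of each gap. The sketch below shows only the input-corruption step. It assumes the scheme the paper describes: 15% of token positions are selected for prediction, and of those, 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged. `MASK_ID`, `mask_tokens` and the toy vocabulary are hypothetical. The sketch also ignores special tokens such as `[CLS]`/`[SEP]`, which a real implementation would never select.

```python
import random

MASK_ID = 0  # hypothetical id of the [MASK] token in a toy vocabulary


def mask_tokens(token_ids, vocab_size, mask_prob=0.15, seed=None):
    """Corrupt a sequence of token ids for masked-LM training.

    Returns (inputs, labels): labels[i] holds the original id at positions
    selected for prediction and None everywhere else (no loss there).
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [None] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # not selected: the model is not asked to predict it
        labels[i] = tok  # target: the original vocab id
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID  # 80% of selected: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(1, vocab_size)  # 10%: random non-[MASK] id
        # remaining 10%: keep the original token in the input
    return inputs, labels


inputs, labels = mask_tokens([5, 17, 42, 8, 99, 23, 61, 7], vocab_size=1000, seed=1)
print(inputs)
print(labels)
```

The model then sees `inputs`, and the loss is computed only at positions where `labels` is not `None`. The paper motivates the random and unchanged cases by noting that `[MASK]` never appears during fine-tuning. Not always masking the selected tokens reduces that mismatch.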