Introduction
A friend suggested I try this model, and only then did I learn that many Chinese NLP competitions use it, so here is a quick write-up.
The main contributions of this model are:
- Conducted extensive empirical studies, with detailed analysis of how Chinese pre-trained language models perform across various tasks
- Addresses a drawback of standard BERT pre-training: the [MASK] token never appears in downstream tasks, so the pre-training task does not match the actual task; MacBERT instead replaces masked tokens with similar words
- Releases pre-trained models to improve Chinese NLP (a quick loading sketch follows below)
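As a quick way to try the released checkpoints, here is a minimal loading sketch with HuggingFace transformers. The checkpoint name `hfl/chinese-macbert-base` is my assumption (the one published on the Hugging Face Hub); since MacBERT keeps BERT's architecture, the plain BERT classes are used here.

```python
# Minimal sketch: load a released MacBERT checkpoint with HuggingFace transformers.
# Assumes the Hub checkpoint name "hfl/chinese-macbert-base"; MacBERT keeps BERT's
# architecture, so the standard BertTokenizer / BertModel classes apply.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-macbert-base")
model = BertModel.from_pretrained("hfl/chinese-macbert-base")

inputs = tokenizer("今天天氣真好", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768]) for the base model
```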
Method
- Uses WWM (whole word masking) together with N-gram masking to select the tokens to replace (see the sketch after this list)
- The ratios for unigram through 4-gram are 40%, 30%, 20%, and 10%
- The original BERT masks with the [MASK] token, but [MASK] never appears in fine-tuning / downstream tasks, which makes the pre-training task inconsistent with fine-tuning, so MacBERT replaces masked tokens with similar words instead
- A similar word is obtained using the Synonyms toolkit (Wang and Hu, 2017), which is based on word2vec
- If an N-gram is selected for masking, similar words are found for each word individually; in rare cases where no similar word exists, it falls back to random word replacement
- 15% of the input words are selected for masking
- 80% of those are replaced with similar words
- 10% are replaced with a random word
- the remaining 10% keep the original words
- The paper also includes a comparison table of pre-trained language models (AE: Auto-Encoder, AR: Auto-Regressive, T: Token, S: Segment, P: Position, W: Word, E: Entity, Ph: Phrase, WWM: Whole Word Masking, NM: N-gram Masking, NSP: Next Sentence Prediction, SOP: Sentence Order Prediction, MLM: Masked LM, PLM: Permutation LM, Mac: MLM as correction)
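To make the masking recipe above concrete, here is a minimal sketch of the selection and replacement rules: n-gram lengths sampled with the 40/30/20/10% ratios, roughly 15% of the words covered, and the 80/10/10 similar/random/unchanged split. The names `mac_masking`, `get_similar_word`, and the `vocab` parameter are mine; `get_similar_word` is a hypothetical stand-in for the Synonyms toolkit lookup, and the whole-word-masking bookkeeping (mapping words back to WordPiece sub-tokens) is omitted.

```python
import random

def get_similar_word(word, vocab):
    """Hypothetical stand-in for the Synonyms toolkit lookup used in the paper.
    Returns None when no similar word is available."""
    return None  # e.g. plug in a synonyms.nearby(word) lookup here

def mac_masking(words, vocab, mask_rate=0.15):
    """Sketch of MacBERT-style masking: pick n-gram spans covering ~15% of the
    words, replace 80% of them with similar words, 10% with random words,
    and leave 10% unchanged."""
    words = list(words)
    n_to_mask = max(1, round(len(words) * mask_rate))
    covered = set()

    while len(covered) < n_to_mask:
        # n-gram length sampled with the 40/30/20/10% ratios from the paper.
        n = random.choices([1, 2, 3, 4], weights=[0.4, 0.3, 0.2, 0.1])[0]
        start = random.randrange(len(words))
        span = [i for i in range(start, min(start + n, len(words))) if i not in covered]
        if not span:
            continue
        roll = random.random()
        for i in span:
            covered.add(i)
            if roll < 0.8:
                # 80%: replace with a similar word; fall back to a random word
                # when no similar word is found (as the paper describes).
                words[i] = get_similar_word(words[i], vocab) or random.choice(vocab)
            elif roll < 0.9:
                # 10%: replace with a random word.
                words[i] = random.choice(vocab)
            # Remaining 10%: keep the original word.
    return words

print(mac_masking("我 喜歡 吃 蘋果 派".split(), vocab=["我", "你", "他", "吃", "蘋果", "喝"]))
```

In the actual pipeline the spans are chosen at the Chinese word level (that is the whole word masking part) and then propagated to the sub-tokens; the sketch works directly on a word list to keep the logic visible.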
Experiments
The experimental results look seriously impressive; the paper reports MacBERT outperforming the other Chinese pre-trained models on most of the evaluated tasks.
Conclusion
Tested it myself and it works well.