BERT – Embedding Algorithm

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Pretty…I will keep it

Basic Overview:

BERT is a pre-trained neural network that has learned to take in sentences with some words obscured and predict those obscured words (their vocabulary ids, at least). It is a deep, bidirectional network, and it can be fine-tuned to a specific question-answering or other language task with a “small” amount of training data. As of this writing, it is part of the ensemble method that is winning the SQuAD challenge, Stanford’s question-answering dataset.
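That masked-word pre-training objective can be sketched in a few lines. This is a toy illustration, not BERT’s actual code (BERT works on WordPiece token ids and uses a roughly 15% masking rate; the whitespace “tokenizer” below is a stand-in):

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with [MASK], returning the
    corrupted sequence and the positions/originals to predict."""
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # the model must recover this token
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat sat on the mat".split()
corrupted, targets = mask_tokens(tokens, mask_prob=0.5)
# During pre-training, the network reads `corrupted` and is trained to
# predict the original token at every position listed in `targets`.
```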

BERT is meant to be used for transfer learning (freezing the weights of a pre-trained neural network, adding a few layers on top, then training just those last few layers on a specific task). It is NOT a set of embeddings.
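A rough sketch of that transfer-learning recipe, in pure NumPy: a random projection stands in for the pre-trained encoder (real BERT weights would go here), and only a small logistic-regression head on top is trained. Everything below is illustrative, not BERT’s fine-tuning code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained encoder: these weights stay FROZEN.
W_frozen = rng.normal(size=(8, 4))

def encode(x):
    """Fixed feature extractor playing the role of pre-trained BERT."""
    return np.tanh(x @ W_frozen)

def fine_tune_head(X, y, lr=0.5, steps=500):
    """Train only a logistic-regression head on frozen features."""
    H = encode(X)
    w = np.zeros(H.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(H @ w)))
        w -= lr * H.T @ (p - y) / len(y)  # gradient step on `w` only
    return w

# Toy binary task: label depends on the first input feature.
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float)
w = fine_tune_head(X, y)
preds = (1 / (1 + np.exp(-(encode(X) @ w))) > 0.5).astype(float)
```

The point of the recipe is that the expensive part (the encoder) is reused as-is, so the task-specific training touches only a handful of parameters.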

The model architecture is a multi-layer bidirectional Transformer encoder. We’ll find out what a Transformer encoder is in a moment; an implementation is available in the tensor2tensor library.
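In the meantime, the core operation of a Transformer encoder layer, scaled dot-product self-attention, can be sketched in NumPy. This cartoon omits multiple heads, layer normalization, residual connections, and the feed-forward sublayer, so it is not the tensor2tensor implementation:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X
    of shape (seq_len, d_model). Every position attends to every other
    position in both directions -- this is what makes the encoder
    'bidirectional'."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over positions
    return weights @ V                             # weighted mix of all positions

rng = np.random.default_rng(0)
seq_len, d = 5, 16
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # one output vector per token
```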
