BERT – Embedding Algorithm

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

https://arxiv.org/abs/1810.04805


Basic Overview:

BERT is a pre-trained neural network that has learned to take in sentences with some words obscured and predict those obscured words (their dictionary ids, at least). It is a deep, bidirectional network, and it can be fine-tuned to a specific question-answering or other language task with a “small” amount of training data. As of this writing, it is part of the ensemble method leading the SQuAD challenge, the question-answering dataset out of Stanford.
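To make the masked-word idea concrete, here is a minimal sketch using the Hugging Face transformers library (my assumption for illustration; the original BERT release is TensorFlow code, and any implementation would do):

```python
# Sketch: obscure one word and ask a pre-trained BERT for its dictionary id.
# Assumes the Hugging Face "transformers" package and PyTorch are installed.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and take the highest-scoring vocabulary id there.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # should print something like "paris"
```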

BERT is meant to be used for transfer learning (freezing the weights of a pre-trained network, adding a few layers on top, then training just those last few layers for a specific task). It is NOT a set of embeddings.
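A minimal sketch of that transfer-learning setup, again assuming PyTorch and the Hugging Face transformers package: freeze BERT's weights, bolt a small classification head on top, and hand the optimizer only the head's parameters.

```python
# Sketch of "fix the pre-trained weights, train a few layers on top".
import torch
import torch.nn as nn
from transformers import BertModel

class FrozenBertClassifier(nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for param in self.bert.parameters():
            param.requires_grad = False  # freeze the pre-trained weights
        # The only trainable part: a small head on the [CLS] representation.
        self.head = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_vector = outputs.last_hidden_state[:, 0]  # [CLS] token vector
        return self.head(cls_vector)

model = FrozenBertClassifier()
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)  # only the head is trained
```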

The model architecture is a multi-layer bidirectional Transformer encoder. We’ll find out what a Transformer encoder is in a moment; a reference implementation is available in the tensor2tensor library, and The Annotated Transformer is a readable walkthrough: http://nlp.seas.harvard.edu/2018/04/03/attention.html
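For a rough sense of what "multi-layer bidirectional Transformer encoder" means, here is a sketch using PyTorch's built-in encoder modules with BERT-base-sized hyperparameters (an illustration on my part, not BERT's actual implementation):

```python
# A stack of Transformer encoder layers sized roughly like BERT-base:
# 12 layers, hidden size 768, 12 attention heads, feed-forward size 3072.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(
    d_model=768, nhead=12, dim_feedforward=3072, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=12)

# Every token attends to every other token -- left and right context alike --
# which is what makes the encoder "bidirectional".
token_embeddings = torch.randn(1, 16, 768)  # (batch, sequence length, hidden size)
contextual_embeddings = encoder(token_embeddings)
print(contextual_embeddings.shape)  # torch.Size([1, 16, 768])
```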
