[Pytorch][BERT] Understanding the BERT Source Code — Table of Contents

- BERT
  - 📑 BERT Config
  - 📑 BERT Tokenizer
  - 📑 BERT Model
    - 📑 BERT Input
    - 📑 BERT Output 👀
    - 📑 BERT Embedding
    - 📑 BERT Pooler
    - 📑 BERT Encoder
      - 📑 BERT Layer
        - 📑 BERT SelfAttention
        - 📑 BERT SelfOutput
BaseModel output values
```python
class BertModel(BertPreTrainedModel):
    def forward(...):
        ...
        return BaseModelOutputWithPoolingAndCrossAttentions(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            past_key_values=encoder_outputs.past_key_values,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
            cross_attentions=encoder_outputs.cross_attentions,
        )
```
1. What is the BaseModelOutputWithPoolingAndCrossAttentions class?
: It simply wraps the outputs in a dictionary-like form so they can be indexed (not that important).
: Has a `__getitem__` that allows indexing by integer or slice (like a tuple) or strings (like a dictionary) that will ignore the None attributes.
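For example, the wrapper can be accessed by attribute, by string key, or by position. Here is a minimal sketch using dummy tensors (the shapes and values are illustrative assumptions, not a real model's outputs):

```python
import torch
from transformers.modeling_outputs import BaseModelOutputWithPoolingAndCrossAttentions

# Dummy tensors standing in for a real model's outputs.
out = BaseModelOutputWithPoolingAndCrossAttentions(
    last_hidden_state=torch.zeros(1, 8, 768),  # (batch_size, sequence_length, hidden_size)
    pooler_output=torch.zeros(1, 768),         # (batch_size, hidden_size)
)

print(out.last_hidden_state.shape)  # attribute access -> torch.Size([1, 8, 768])
print(out["pooler_output"].shape)   # string key, like a dictionary
print(out[0].shape)                 # integer index, like a tuple
print(list(out.keys()))             # None fields are skipped: ['last_hidden_state', 'pooler_output']
```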
2. The final output values!
| # | name | type | description |
|---|------|------|-------------|
| 1 | last_hidden_state | torch.FloatTensor | Sequence of hidden-states at the output of the model's last layer. Shape: (batch_size, sequence_length, hidden_size). |
| 2 | pooler_output | torch.FloatTensor | Last-layer hidden state of the first token ([CLS]) after further processing through the layers used for the auxiliary pretraining task. Shape: (batch_size, hidden_size). For the BERT family of models, this is the classification token passed through a linear layer and a tanh activation; the linear layer weights are trained with the next sentence prediction (classification) objective during pretraining. |
| 3 | hidden_states | tuple(torch.FloatTensor) | (optional) Hidden states of the model at the output of each layer, plus the optional initial embedding output. Returned when output_hidden_states=True is passed or when config.output_hidden_states=True. One tensor for the embedding output (if the model has an embedding layer) plus one for each layer, each of shape (batch_size, sequence_length, hidden_size). |
| 4 | attentions | tuple(torch.FloatTensor), one per layer | (optional) Per-head attention weights after the attention softmax, used to compute the weighted average in the self-attention heads. Returned when output_attentions=True is passed or when config.output_attentions=True. Shape: (batch_size, num_heads, sequence_length, sequence_length). |
| 5 | cross_attentions | tuple(torch.FloatTensor), one per layer | (optional) Attention weights of the decoder's cross-attention layers, after the attention softmax, used to compute the weighted average in the cross-attention heads. Returned when output_attentions=True and config.add_cross_attention=True are passed, or when config.output_attentions=True. Shape: (batch_size, num_heads, sequence_length, sequence_length). |
| 6 | past_key_values | tuple(tuple(torch.FloatTensor)) of length config.n_layers | Precomputed key and value hidden states of the attention blocks (the self-attention keys and values), which can be used to speed up decoding. Shape: (batch_size, num_heads, sequence_length - 1, embed_size_per_head). If past_key_values is used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length). |
| 7 | use_cache | bool | (optional) If set to True, past_key_values are returned and can be used to speed up decoding (see past_key_values). Note: this is a flag passed to the model, not an output field. |
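To see these fields concretely, here is a minimal sketch of a forward pass that requests the optional outputs and prints their shapes. It assumes the Hugging Face transformers package and the bert-base-uncased checkpoint (both assumptions on my part, not part of the original post):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, BERT!", return_tensors="pt")
with torch.no_grad():
    # Request the optional outputs explicitly.
    outputs = model(**inputs, output_hidden_states=True, output_attentions=True)

print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
print(outputs.pooler_output.shape)      # torch.Size([1, 768])
print(len(outputs.hidden_states))       # 13 = embedding output + 12 encoder layers
print(outputs.attentions[0].shape)      # torch.Size([1, 12, seq_len, seq_len]) per layer
```

Note that cross_attentions and past_key_values stay None here: they only appear when BERT is configured as a decoder with cross-attention (config.add_cross_attention=True) and when caching is enabled (use_cache=True), respectively.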