
[Pytorch][BERT] Understanding the BERT Source Code_⑨ BERT Model Outputs

by Hyen4110 2022. 10. 28.

[Pytorch][BERT] Understanding the BERT Source Code: Table of Contents

BERT
  📑 BERT Config
  📑 BERT Tokenizer
  📑 BERT Model
    📑 BERT Input
    📑 BERT Output 👀
    📑 BERT Embedding
    📑 BERT Pooler
    📑 BERT Encoder
      📑 BERT Layer
        📑 BERT SelfAttention
        📑 BERT SelfOutput

BaseModel outputs

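class BertModel(BertPreTrainedModel):
    def forward(...):
        # ... (embedding and encoder computation omitted in this excerpt)
        # encoder_outputs: output of BertEncoder
        # sequence_output: encoder_outputs[0], the hidden states of the last layer
        # pooled_output:   self.pooler(sequence_output), or None if there is no pooler

        return BaseModelOutputWithPoolingAndCrossAttentions(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            past_key_values=encoder_outputs.past_key_values,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
            cross_attentions=encoder_outputs.cross_attentions,
        )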
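In the excerpt above, sequence_output is the output of the last encoder layer, and pooled_output is derived from it by BertPooler: take the vector at position 0 ([CLS]), apply a dense layer, then tanh. The sketch below reproduces that computation in plain Python with nested lists instead of tensors, so it runs without PyTorch. The function name toy_bert_pooler and the toy weights are hypothetical, chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def toy_bert_pooler(hidden_states: List[List[Vector]],
                    weight: List[Vector], bias: Vector) -> List[Vector]:
    """Hypothetical pure-Python sketch of BERT's pooler: dense layer + tanh on token 0.

    hidden_states: batch_size x seq_len x hidden_size (nested lists)
    weight:        hidden_size x hidden_size, laid out (out, in) like nn.Linear
    bias:          hidden_size
    Returns:       batch_size x hidden_size
    """
    pooled = []
    for sequence in hidden_states:
        cls_vector = sequence[0]  # first token ([CLS]) of the last layer
        # Dense layer: y_j = sum_i W[j][i] * x_i + b_j
        dense = [sum(w * x for w, x in zip(row, cls_vector)) + b
                 for row, b in zip(weight, bias)]
        pooled.append([math.tanh(v) for v in dense])
    return pooled


# Toy example: batch_size=1, seq_len=2, hidden_size=2, identity weight, zero bias.
last_hidden_state = [[[1.0, -2.0], [5.0, 5.0]]]
pooled = toy_bert_pooler(last_hidden_state,
                         weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(pooled)  # values ≈ [[0.7616, -0.9640]], i.e. [[tanh(1.0), tanh(-2.0)]]
```

Because only position 0 is read, changing any later token's vector leaves pooled_output unchanged. The real module performs the same three steps on tensors: hidden_states[:, 0], an nn.Linear(hidden_size, hidden_size), and nn.Tanh.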

 

1. What is the BaseModelOutputWithPoolingAndCrossAttentions class?

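: A dataclass-style container (a subclass of transformers' ModelOutput) that lets you access the outputs by attribute, by key like a dictionary, or by position like a tuple (not especially important for understanding the model itself).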

: Has a `__getitem__` that allows indexing by integer or slice (like a tuple) or strings (like a dictionary) that will ignore the None attributes.
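To make that indexing behaviour concrete, here is a minimal sketch assuming a stripped-down output class. ToyOutput is hypothetical and keeps only four fields; the real ModelOutput in transformers is an OrderedDict subclass with more machinery, and this sketch only aims to mirror its three access styles and the skipping of None fields.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Tuple


@dataclass
class ToyOutput:
    """Hypothetical, simplified stand-in for transformers' ModelOutput."""

    last_hidden_state: Any = None
    pooler_output: Any = None
    hidden_states: Optional[Tuple[Any, ...]] = None
    attentions: Optional[Tuple[Any, ...]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Keep declaration order and drop fields whose value is None.
        return {f.name: getattr(self, f.name)
                for f in fields(self) if getattr(self, f.name) is not None}

    def to_tuple(self) -> Tuple[Any, ...]:
        return tuple(self.to_dict().values())

    def __getitem__(self, key):
        if isinstance(key, str):
            return self.to_dict()[key]  # dictionary-style: out["pooler_output"]
        return self.to_tuple()[key]     # tuple-style: out[0], out[:2]


# Strings stand in for tensors; hidden_states and attentions stay None.
out = ToyOutput(last_hidden_state="sequence_output", pooler_output="pooled_output")
print(out[0])                # sequence_output
print(out["pooler_output"])  # pooled_output
print(out[:2])               # ('sequence_output', 'pooled_output')
print(out.attentions)        # None
```

Because None fields are skipped, an integer index refers to the n-th non-None field, so its meaning can shift with the options you pass (in this sketch, out[1] would be hidden_states if pooler_output were None). Attribute or string access is the more robust choice.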

 

2. The final outputs!

  No. | Name | Type | Description
1 | last_hidden_state | torch.FloatTensor | hidden states output by the model's last layer
: shape (batch_size, sequence_length, hidden_size)

: Sequence of hidden-states at the output of the last layer of the model.
2 | pooler_output | torch.FloatTensor | last-layer hidden state of the first token ([CLS]), further passed through the pooler (a linear layer followed by tanh)
: shape (batch_size, hidden_size)

: Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task.
: E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
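3 | hidden_states | tuple(torch.FloatTensor) | (optional) hidden states of every layer of the model
: returned when output_hidden_states=True is passed or when config.output_hidden_states=True
: shape of each element (batch_size, sequence_length, hidden_size); for BERT the tuple has config.num_hidden_layers + 1 elements (the embedding output first, then one per layer), so hidden_states[-1] is the same as last_hidden_state

: Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
: Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.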
4 | attentions | tuple(torch.FloatTensor), one per layer | (optional) attention weights of each head
: returned when output_attentions=True is passed or when config.output_attentions=True
: shape of each element (batch_size, num_heads, sequence_length, sequence_length)

: Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
5 | cross_attentions | tuple(torch.FloatTensor), one per layer | (optional) attention weights of the decoder's cross-attention
: returned when output_attentions=True and config.add_cross_attention=True are passed, or when config.output_attentions=True
: shape of each element (batch_size, num_heads, sequence_length, sequence_length); the last dimension is the length of the encoder sequence being attended to

: Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
6 | past_key_values | tuple(tuple(torch.FloatTensor)) of length config.n_layers (config.num_hidden_layers for BERT) | (optional) previously computed key and value states of the self-attention blocks, reused to speed up decoding
: shape of each key/value tensor (batch_size, num_heads, sequence_length - 1, embed_size_per_head) when fed back in as input, where sequence_length - 1 counts the tokens already processed and embed_size_per_head = hidden_size / num_heads; the returned cache also includes the current step's tokens

: Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
: If 'past_key_values' are used, the user can optionally input only the last 'decoder_input_ids' (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all 'decoder_input_ids' of shape (batch_size, sequence_length).
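7 | use_cache | bool | (optional) if set to True, past_key_values is returned
: use_cache is an argument of forward() (defaulting to config.use_cache), not a field of the returned object; in BertModel it only takes effect when config.is_decoder=True

: If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).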
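The shapes listed above can be tied to concrete numbers with a small helper. bert_output_shapes is hypothetical and only does the bookkeeping, assuming a BERT-base style configuration (hidden_size=768, 12 layers, 12 heads, hence embed_size_per_head = 768 / 12 = 64) and, for the past_key_values entry, a decoder-mode model without cross-attention (one key/value pair per layer).

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def bert_output_shapes(batch_size: int, seq_len: int, hidden_size: int = 768,
                       num_layers: int = 12, num_heads: int = 12) -> Dict[str, object]:
    """Hypothetical helper: expected output shapes for a BERT-base style config."""
    head_dim = hidden_size // num_heads  # embed_size_per_head
    layer_state: Shape = (batch_size, seq_len, hidden_size)
    attn: Shape = (batch_size, num_heads, seq_len, seq_len)
    kv: Shape = (batch_size, num_heads, seq_len, head_dim)
    return {
        "last_hidden_state": layer_state,
        "pooler_output": (batch_size, hidden_size),
        # Embedding output first, then one entry per encoder layer.
        "hidden_states": (layer_state,) * (num_layers + 1),
        # One attention-weight tensor per layer.
        "attentions": (attn,) * num_layers,
        # One (key, value) pair per layer, assuming no cross-attention.
        "past_key_values": ((kv, kv),) * num_layers,
    }


shapes = bert_output_shapes(batch_size=2, seq_len=16)
print(shapes["last_hidden_state"])      # (2, 16, 768)
print(shapes["pooler_output"])          # (2, 768)
print(len(shapes["hidden_states"]))     # 13
print(shapes["attentions"][0])          # (2, 12, 16, 16)
print(shapes["past_key_values"][0][0])  # (2, 12, 16, 64)
```

hidden_states has 13 entries for 12 layers because the embedding output comes first; attentions has exactly one entry per layer.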
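The attentions and past_key_values rows describe two sides of the same computation: the post-softmax weights that average the value vectors, and the cached keys and values that let a decoder skip recomputing them for earlier tokens. The pure-Python sketch below shows both for a single head. The projections in project_kv are hypothetical stand-ins for the learned linear layers, the query projection is omitted (the token vector is used directly as the query), and the loop is a toy causal decoder rather than BERT's own implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, Vector]:
    """Single-head scaled dot-product attention for one query position.

    Returns (weights, output): weights are the post-softmax probabilities
    (what `attentions` reports per head); output is the weighted average of values.
    """
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, output


def project_kv(x: Vector) -> Tuple[Vector, Vector]:
    # Hypothetical stand-ins for the key/value linear projections.
    return [xi * 0.5 for xi in x], [xi + 1.0 for xi in x]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Without a cache: re-project keys/values for the whole prefix at every step.
no_cache = []
for t in range(len(tokens)):
    kv = [project_kv(x) for x in tokens[: t + 1]]
    no_cache.append(attend(tokens[t], [k for k, _ in kv], [v for _, v in kv])[1])

# With a cache (the idea behind past_key_values): project only the new token.
past_keys, past_values, with_cache = [], [], []
for x in tokens:
    k, v = project_kv(x)
    past_keys.append(k)
    past_values.append(v)
    with_cache.append(attend(x, past_keys, past_values)[1])

print(no_cache == with_cache)  # True
```

Both loops produce identical outputs; the cached version projects only one new token per step, which is where the decoding speed-up comes from.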

 

 
