
[Pytorch][BERT] Understanding the BERT Source Code ⑨ BERT model outputs

by Hyen4110 2022. 10. 28.

[Pytorch][BERT] Understanding the BERT Source Code: Table of Contents

BERT
  📑 BERT Config
  📑 BERT Tokenizer
  📑 BERT Model
    📑 BERT Input
    📑 BERT Output 👀
    📑 BERT Embedding
    📑 BERT Pooler
    📑 BERT Encoder
      📑 BERT Layer
        📑 BERT SelfAttention
        📑 BERT SelfOutput

BertModel output values

class BertModel(BertPreTrainedModel):
    def forward(...):

        # ... (intermediate computation omitted)

        return BaseModelOutputWithPoolingAndCrossAttentions(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            past_key_values=encoder_outputs.past_key_values,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
            cross_attentions=encoder_outputs.cross_attentions,
        )
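
A minimal sketch of how this output object is obtained in practice with the Hugging Face transformers library. The checkpoint name "bert-base-uncased" and the input sentence are just example choices, not something fixed by the source code above.

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Hello, BERT!", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)  # BaseModelOutputWithPoolingAndCrossAttentions

    print(type(outputs).__name__)           # BaseModelOutputWithPoolingAndCrossAttentions
    print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768) for bert-base
    print(outputs.pooler_output.shape)      # (1, 768)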

 

1. What is the BaseModelOutputWithPoolingAndCrossAttentions class?

: It wraps the outputs in a dictionary-like structure so they can be indexed (not particularly important).

: Has a __getitem__ that allows indexing by integer or slice (like a tuple) or by string (like a dictionary), ignoring any attributes that are None.
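
A quick illustration of that indexing behavior, assuming `outputs` is the object returned by the BertModel forward pass in the sketch above:

    outputs.last_hidden_state      # attribute access
    outputs["last_hidden_state"]   # string key, like a dictionary
    outputs[0]                     # integer index, like a tuple

    # None attributes are skipped: without output_hidden_states/output_attentions,
    # only last_hidden_state and pooler_output show up.
    print(outputs.keys())
    print(len(outputs.to_tuple()))  # 2 in the default case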

 

2. The final output values!

Each entry below lists the field name, its type, and a description.

1. last_hidden_state (torch.FloatTensor)
: hidden_state of the last layer of the model
: shape (batch_size, sequence_length, hidden_size)
: Sequence of hidden-states at the output of the last layer of the model.

2. pooler_output (torch.FloatTensor)
: last hidden_state of the first token ([CLS])
: shape (batch_size, hidden_size)
: Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task.
: E.g. for the BERT family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.

3. hidden_states (tuple(torch.FloatTensor))
: (optional) hidden_state of every layer of the model (see the shape check sketched after this list)
: returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`
: shape (batch_size, sequence_length, hidden_size)
: Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer).
: Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.

4. attentions (tuple(torch.FloatTensor), one for each layer)
: (optional) attention weights for each head
: returned when `output_attentions=True` is passed or when `config.output_attentions=True`
: shape (batch_size, num_heads, sequence_length, sequence_length)
: Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

5. cross_attentions (tuple(torch.FloatTensor), one for each layer)
: attention weights of the decoder's cross-attention layers
: returned when `output_attentions=True` and `config.add_cross_attention=True` are passed, or when `config.output_attentions=True`
: shape (batch_size, num_heads, sequence_length, sequence_length)
: Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.

6. past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers)
: previously computed hidden_states (the keys and values of self-attention), used to speed up decoding!
: shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)
: Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
: If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all `decoder_input_ids` of shape (batch_size, sequence_length).

7. use_cache (bool)
: (optional) when set to True, `past_key_values` are returned
: If set to True, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`).
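
A small sanity-check sketch for the optional fields and shapes above, assuming the same `tokenizer`, `model`, and `inputs` as in the first sketch. hidden_states and attentions are None unless explicitly requested.

    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True, output_attentions=True)

    # embedding output + one per encoder layer -> 13 entries for bert-base (12 layers)
    print(len(outputs.hidden_states))       # 13
    print(outputs.hidden_states[-1].shape)  # (1, seq_len, 768), same as last_hidden_state

    # one attention map per layer, shape (batch_size, num_heads, seq_len, seq_len)
    print(len(outputs.attentions))          # 12
    print(outputs.attentions[0].shape)      # (1, 12, seq_len, seq_len)

    # pooler_output is the [CLS] hidden state passed through the pooler (Linear + tanh)
    manual_pooled = model.pooler(outputs.last_hidden_state)
    print(torch.allclose(manual_pooled, outputs.pooler_output))  # True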

 

 
