```python
logits = self.lm_head(outputs.hidden_states[early_exit_layer])
```
I think you guys should apply the `model.norm` layer to `hidden_states[early_exit_layer]` here, because `model.norm` is only applied to the last hidden state. See:
```python
hidden_states = self.norm(hidden_states)
```
DoLa/transformers-4.28.1/src/transformers/models/llama/modeling_llama.py, Line 703 in dc88907
See also DoLa/transformers-4.28.1/src/transformers/models/llama/modeling_llama.py, Line 594 in dc88907.