98 - Analyzing Information Flow In Transformers, With Elena Voita

NLP Highlights

What function do the different attention heads serve in multi-headed attention models? In this episode, Lena describes how to use attribution methods to assess the importance and contribution of different heads in severa…
