Tag: AI (Artificial Intelligence)

Posted 2024-04-29 | 10 minutes read (About 1510 words)

Karpathy's 300-line mini-GPT

Karpathy implemented a mini-GPT in 300 lines of code, which makes it a very good example to learn from.
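GPT-style models are built on causally masked self-attention: the token at position t may only look at positions 0 through t. The sketch below illustrates just that masking idea in plain Python. It is not Karpathy's code; it is a minimal single-head version that assumes Q = K = V = X (no learned projections) and uses hypothetical names such as `causal_self_attention`.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(X):
    """Single-head causal self-attention with Q = K = V = X (no learned projections).

    X is a list of T vectors of length d. Position t may only attend to
    positions 0..t, so no token can look at tokens that come after it.
    """
    d = len(X[0])
    out = []
    for t, q in enumerate(X):
        # Scaled dot-product scores against the visible keys only (the causal mask).
        scores = [sum(a * b for a, b in zip(q, X[s])) / math.sqrt(d) for s in range(t + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible values.
        out.append([sum(w * X[s][j] for s, w in enumerate(weights)) for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = causal_self_attention(X)
print(Y[0])  # the first token can only see itself: [1.0, 0.0]
```

Because of the mask, changing a later token never changes the outputs at earlier positions, which is what lets a GPT be trained to predict every next token in parallel.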

Posted 2024-04-29 | 12 minutes read (About 1766 words)

The encoder-decoder architecture was originally designed for seq2seq learning, but it has a drawback.
For the encoder:

Posted 2024-04-29 | 8 minutes read (About 1206 words)

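Given queries, keys, and values, attention pooling combines them: each query is compared with the keys, and the resulting weights are used to aggregate the values.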
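To make attention pooling concrete, here is its simplest nonparametric form: Nadaraya-Watson kernel regression with a Gaussian kernel. Choosing this particular kernel is an assumption for illustration; the post itself may use a different example. Scalar queries, keys, and values keep the sketch short.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(query, keys, values):
    """Nadaraya-Watson attention pooling with a Gaussian kernel.

    Each key gets weight softmax(-(query - key)^2 / 2): keys closer to the
    query receive more weight, and the output is the weighted sum of values.
    """
    scores = [-((query - k) ** 2) / 2 for k in keys]
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values))

keys = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 10.0, 20.0, 30.0]
print(attention_pool(1.5, keys, values))  # about 15.0: keys are symmetric around 1.5
```

A query far to the right, such as 100.0, puts nearly all of its weight on the key 3.0, so the output approaches 30.0.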

Posted 2024-04-29 | 8 minutes read (About 1125 words)

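Self-attention means that the query, key, and value all come from the same input X.
In other words, each token attends to every token in the sequence (itself included), scores them by similarity, and computes its output as the correspondingly weighted sum of the values.
This is how the final self-attention structure comes about.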
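Here is a minimal plain-Python sketch of scaled dot-product self-attention. Queries, keys, and values are all projections of the same sequence X; the projection matrices `Wq`, `Wk`, `Wv` are hypothetical inputs. With identity matrices the sketch reduces exactly to Q = K = V = X.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: queries, keys and values are all
    projections of the same input sequence X (a list of vectors)."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(K[0])
    out = []
    for q in Q:
        # Every token scores every token (including itself) by similarity.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

I = [[1.0, 0.0], [0.0, 1.0]]  # identity projections, i.e. Q = K = V = X
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X, I, I, I)
```

One property worth noticing: without positional information, reordering the input tokens just reorders the outputs in the same way.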

Posted 2024-04-29 | 3 minutes read (About 410 words)

Code for training a seq2seq model
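Training seq2seq on padded batches usually requires a loss that ignores the padding positions. The plain-Python sketch below assumes per-token probability distributions have already been computed. `sequence_mask` and `masked_ce_loss` are hypothetical names, and averaging over the valid tokens only is one design choice among several.

```python
import math

def sequence_mask(valid_lens, max_len):
    """1.0 for real tokens, 0.0 for padding, per sequence in the batch."""
    return [[1.0 if t < n else 0.0 for t in range(max_len)] for n in valid_lens]

def masked_ce_loss(probs, labels, valid_lens):
    """Per-sequence average cross-entropy, skipping padded positions.

    probs[b][t] is a probability distribution over the vocabulary,
    labels[b][t] is the target token id, and valid_lens[b] >= 1 is the
    number of real (non-padding) tokens in sequence b.
    """
    max_len = len(labels[0])
    mask = sequence_mask(valid_lens, max_len)
    losses = []
    for b in range(len(labels)):
        # Padding positions are skipped entirely, so their probabilities never matter.
        token_losses = [-math.log(probs[b][t][labels[b][t]])
                        for t in range(max_len) if mask[b][t]]
        losses.append(sum(token_losses) / len(token_losses))
    return losses

probs = [
    [[0.5, 0.5], [0.5, 0.5], [0.0, 1.0]],  # last step is padding (valid length 2)
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
]
labels = [[0, 1, 0], [0, 1, 0]]
valid_lens = [2, 3]
print(masked_ce_loss(probs, labels, valid_lens))
```

The first sequence's padded step assigns probability 0.0 to its label, yet the loss stays finite because that position is masked out.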

Posted 2024-04-29 | 14 minutes read (About 2115 words)

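From the previous example we know that the function a, the attention scoring function, is mainly used to measure the similarity between a query and a key.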
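Two common choices for the scoring function a(q, k) are scaled dot-product attention, a(q, k) = q·k / √d, and additive attention, a(q, k) = w_vᵀ tanh(W_q q + W_k k). The plain-Python sketch below implements both. In a real model the parameters `W_q`, `W_k`, `w_v` are learned; here they are simply passed in by the caller as an assumption.

```python
import math

def scaled_dot_product_score(q, k):
    """a(q, k) = q . k / sqrt(d); q and k must have the same length d."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))

def additive_score(q, k, W_q, W_k, w_v):
    """a(q, k) = w_v . tanh(W_q q + W_k k); q and k may have different lengths.

    W_q is h x len(q), W_k is h x len(k), and w_v has length h,
    all given as nested lists.
    """
    hidden = [math.tanh(sum(wq * x for wq, x in zip(rq, q)) + sum(wk * y for wk, y in zip(rk, k)))
              for rq, rk in zip(W_q, W_k)]
    return sum(v * h for v, h in zip(w_v, hidden))

print(scaled_dot_product_score([1.0, 0.0], [1.0, 0.0]))  # 1/sqrt(2), about 0.7071
```

The dot-product form has no parameters but needs queries and keys of equal length. The additive form adds learnable parameters and can compare vectors of different sizes.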

Posted 2024-04-29 | 10 minutes read (About 1516 words)

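Multi-head self-attention combined with the encoder-decoder architecture essentially makes up the Transformer architecture.
However, the Transformer also adds a few other components on top of this, such as positional encoding, residual connections with layer normalization, and position-wise feed-forward networks.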
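One such component is positional encoding. Self-attention by itself ignores token order, so the Transformer adds position information to its inputs. The plain-Python sketch below implements the sinusoidal encoding from the original Transformer, P[i, 2j] = sin(i / 10000^(2j/d)) and P[i, 2j+1] = cos(i / 10000^(2j/d)). The function name `positional_encoding` is hypothetical.

```python
import math

def positional_encoding(num_steps, d_model):
    """Sinusoidal positional encoding: one d_model-dimensional vector per position.

    Even columns use sin and odd columns use cos, with frequencies that
    decrease geometrically across the embedding dimensions.
    """
    P = [[0.0] * d_model for _ in range(num_steps)]
    for i in range(num_steps):
        for j in range(0, d_model, 2):  # j plays the role of 2j in the formula
            angle = i / (10000 ** (j / d_model))
            P[i][j] = math.sin(angle)
            if j + 1 < d_model:  # guard for an odd d_model
                P[i][j + 1] = math.cos(angle)
    return P

P = positional_encoding(50, 8)
print(P[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

These vectors are added element-wise to the token embeddings before the first attention layer.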

Posted 2024-04-09 | a minute read (About 192 words)

We have seen two examples, GRU and LSTM, but neither of them really increases the number of layers.
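One way to add depth is to stack recurrent layers, so that the hidden-state sequence of one layer becomes the input sequence of the next. The plain-Python sketch below assumes simple tanh RNN cells; GRU or LSTM cells could be stacked the same way. `rnn_layer` and `deep_rnn` are hypothetical names.

```python
import math

def rnn_layer(inputs, W_xh, W_hh, b_h):
    """Run one tanh RNN layer over a sequence; return the hidden state at every step."""
    h = [0.0] * len(b_h)
    outputs = []
    for x in inputs:
        # The right-hand side uses the previous h before it is reassigned.
        h = [math.tanh(sum(w * xi for w, xi in zip(W_xh[r], x))
                       + sum(w * hi for w, hi in zip(W_hh[r], h)) + b_h[r])
             for r in range(len(b_h))]
        outputs.append(h)
    return outputs

def deep_rnn(inputs, layers):
    """Stack RNN layers: layer l's hidden-state sequence is layer l+1's input sequence."""
    seq = inputs
    for W_xh, W_hh, b_h in layers:
        seq = rnn_layer(seq, W_xh, W_hh, b_h)
    return seq

layer = ([[1.0]], [[0.5]], [0.0])  # hypothetical 1-unit layer: (W_xh, W_hh, b_h)
out = deep_rnn([[1.0], [0.0]], [layer, layer])  # two stacked layers, two time steps
```

The output has one vector per time step, taken from the top layer.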

Posted 2024-04-09 | 3 minutes read (About 435 words)

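In latent variable models, the hidden state has trouble with two things: preserving long-term information and skipping short-term input.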
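LSTM addresses this with a separate memory cell controlled by input, forget, and output gates. Below is a minimal plain-Python sketch of a single LSTM time step, assuming the standard gate equations. The `params` layout and the `gate` helper are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W_x, W_h, b, x, h):
    # Computes W_x x + W_h h + b, with matrices given as lists of rows.
    return [sum(w * xi for w, xi in zip(W_x[r], x)) + sum(w * hi for w, hi in zip(W_h[r], h)) + b[r]
            for r in range(len(b))]

def lstm_step(x, h, c, params):
    """One LSTM time step. params maps 'i', 'f', 'o', 'c' to (W_x, W_h, b)."""
    i = [sigmoid(z) for z in affine(*params['i'], x, h)]          # input gate
    f = [sigmoid(z) for z in affine(*params['f'], x, h)]          # forget gate
    o = [sigmoid(z) for z in affine(*params['o'], x, h)]          # output gate
    c_tilde = [math.tanh(z) for z in affine(*params['c'], x, h)]  # candidate memory
    # New memory: keep part of the old cell (f) and write part of the candidate (i).
    c_new = [fj * cj + ij * ctj for fj, cj, ij, ctj in zip(f, c, i, c_tilde)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def gate(bias):
    # Hypothetical 1-unit gate with zero weights, so only the bias matters.
    return ([[0.0]], [[0.0]], [bias])

params = {'i': gate(-20.0), 'f': gate(20.0), 'o': gate(0.0), 'c': gate(0.0)}
h1, c1 = lstm_step([1.0], [0.0], [5.0], params)
print(c1)  # forget gate ~1 and input gate ~0: the cell stays close to 5.0
```

With the forget gate near 1 and the input gate near 0, the cell carries old information forward and ignores the current input. This is exactly the long-term preservation and short-term skipping described above.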

Posted 2024-04-09 | 3 minutes read (About 477 words)

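A gated RNN is a modern variant of the RNN.
When computing gradients through an RNN over many time steps, the gradients often vanish or explode.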
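The GRU, one common gated RNN, uses an update gate and a reset gate to decide how much of the old hidden state to keep. Below is a minimal plain-Python sketch of a single GRU time step, assuming the standard equations H = Z ⊙ H_prev + (1 − Z) ⊙ H̃. The `params` layout and the `gate` helper are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W_x, W_h, b, x, h):
    # Computes W_x x + W_h h + b, with matrices given as lists of rows.
    return [sum(w * xi for w, xi in zip(W_x[r], x)) + sum(w * hi for w, hi in zip(W_h[r], h)) + b[r]
            for r in range(len(b))]

def gru_step(x, h, params):
    """One GRU time step. params maps 'z', 'r', 'h' to (W_x, W_h, b)."""
    z = [sigmoid(v) for v in affine(*params['z'], x, h)]  # update gate
    r = [sigmoid(v) for v in affine(*params['r'], x, h)]  # reset gate
    # Candidate state: the reset gate scales how much of the old state is used.
    h_reset = [rj * hj for rj, hj in zip(r, h)]
    h_tilde = [math.tanh(v) for v in affine(*params['h'], x, h_reset)]
    # z close to 1 copies the old state forward unchanged.
    return [zj * hj + (1.0 - zj) * htj for zj, hj, htj in zip(z, h, h_tilde)]

def gate(bias):
    # Hypothetical 1-unit gate with zero weights, so only the bias matters.
    return ([[0.0]], [[0.0]], [bias])

h_keep = gru_step([1.0], [0.8], {'z': gate(20.0), 'r': gate(0.0), 'h': gate(0.0)})
h_new = gru_step([1.0], [0.8], {'z': gate(-20.0), 'r': gate(0.0), 'h': gate(0.0)})
```

When the update gate is close to 1, the state passes through almost unchanged. Along such a path the gradient is not repeatedly squashed, which is why gating helps with vanishing gradients. When the update gate is close to 0, the state is replaced by the candidate, which here is tanh(0) = 0.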