TheLittleBaby
(GPT Architecture)

Specifications

Each self-attention head has three learnable weight matrices: $W_Q$, $W_K$ and $W_V$. They are randomly initialized at the start of training. Backpropagation then updates them so that the model learns meaningful attention patterns:
  • self.q_proj = nn.Linear(n_emb, head_size, bias=False) -> $W_Q$
  • self.k_proj = nn.Linear(n_emb, head_size, bias=False) -> $W_K$
  • self.v_proj = nn.Linear(n_emb, head_size, bias=False) -> $W_V$
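The sketch below builds these three parameters without a framework. It assumes matrices are stored as plain Python lists of rows with shape n_emb x head_size. The helper `init_projection`, the toy sizes and the seed are hypothetical. The uniform bound of 1/sqrt(n_emb) is meant to mirror PyTorch's default nn.Linear weight initialization, as we understand it. Note that nn.Linear itself stores its weight transposed, as head_size x n_emb.

```python
import math
import random


def init_projection(n_emb, head_size, rng):
    """Hypothetical helper: random n_emb x head_size weight matrix (list of rows).

    Values are drawn uniformly from [-1/sqrt(n_emb), 1/sqrt(n_emb)].
    """
    bound = 1.0 / math.sqrt(n_emb)
    return [[rng.uniform(-bound, bound) for _ in range(head_size)]
            for _ in range(n_emb)]


n_emb, head_size = 3, 2      # toy sizes, chosen only for illustration
rng = random.Random(0)       # fixed seed so the sketch is reproducible

# One independent matrix per projection; training would update them via backprop.
W_Q = init_projection(n_emb, head_size, rng)
W_K = init_projection(n_emb, head_size, rng)
W_V = init_projection(n_emb, head_size, rng)

print(len(W_Q), "x", len(W_Q[0]))  # 3 x 2
```

Keeping the matrices as n_emb x head_size means a product X · W can be read left to right, with one row of X per token.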
Once the model is initialized, here’s the flow:

The input embeddings $X$ are passed in, one row per token (e.g. from a tokenized sentence).

The model computes the queries, keys and values:
  • Q = self.q_proj(x) -> $Q = X W_Q$
  • K = self.k_proj(x) -> $K = X W_K$
  • V = self.v_proj(x) -> $V = X W_V$
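Each projection is an ordinary matrix product. The minimal plain-Python sketch below uses the worked example's 6x3 input and a made-up 3x2 weight in place of a trained $W_Q$. The `matmul` helper is hypothetical. K and V follow the same pattern with $W_K$ and $W_V$.

```python
def matmul(A, B):
    """Hypothetical helper: (n x m) @ (m x p) for matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    # zip(*B) iterates over the columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Input embeddings X: 6 tokens x n_emb = 3 (values from the worked example)
X = [
    [0.43, 0.15, 0.89],
    [0.55, 0.87, 0.66],
    [0.57, 0.85, 0.64],
    [0.22, 0.58, 0.33],
    [0.77, 0.25, 0.10],
    [0.05, 0.80, 0.55],
]

W_Q = [[0.1, 0.2],   # hypothetical n_emb x head_size = 3x2 weights
       [0.3, 0.4],
       [0.5, 0.6]]

Q = matmul(X, W_Q)   # (6x3) @ (3x2) -> 6x2: one query vector per token
print(len(Q), "x", len(Q[0]))
```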

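Compute the raw attention scores as the dot product of every Query vector with every Key vector. This is the same as multiplying Q by the transpose of K:
  • $\text{score}_{ij} = Q_i K_j^\top$, or for the whole matrix, $\text{scores} = Q K^\top$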
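Computing every query-key dot product at once is the same as forming $Q K^\top$. The small sketch below uses plain lists and toy 2x2 matrices. `attention_scores` is an illustrative name, not part of the original design.

```python
def attention_scores(Q, K):
    """Raw scores: scores[i][j] = Q[i] . K[j], i.e. Q @ K^T (n_q x n_k)."""
    return [[sum(q * k for q, k in zip(q_row, k_row)) for k_row in K]
            for q_row in Q]


Q = [[1.0, 0.0],   # toy query vectors
     [0.0, 1.0]]
K = [[1.0, 2.0],   # toy key vectors
     [3.0, 4.0]]
scores = attention_scores(Q, K)
print(scores)  # [[1.0, 3.0], [2.0, 4.0]]
```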
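Apply softmax to each row of scores. This converts each row into probabilities that sum to 1:
  • $\text{attention\_weights}_{ij} = \mathrm{softmax}_j(\text{score}_{ij}) = \frac{\exp(\text{score}_{ij})}{\sum_k \exp(\text{score}_{ik})}$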
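Here is a minimal row-wise softmax in plain Python (the helper name is hypothetical). It subtracts each row's maximum before exponentiating. This is a standard numerical-stability trick and leaves the result unchanged.

```python
import math


def softmax_rows(scores):
    """Apply softmax independently to each row, so every row sums to 1."""
    weights = []
    for row in scores:
        m = max(row)                          # stability: exp never overflows
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


weights = softmax_rows([[1.0, 2.0, 3.0],
                        [0.0, 0.0, 0.0]])
for row in weights:
    print([round(w, 4) for w in row])
```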
Compute each output vector as a weighted sum of the Value vectors, using the attention weights as coefficients:
  • $\text{output}_i = \sum_j \text{attention\_weights}_{ij} \, V_j$
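The weighted sum is the matrix product of the attention weights with V. Below is a plain-Python sketch with a hypothetical helper and toy values.

```python
def weighted_sum(weights, V):
    """output[i][c] = sum_j weights[i][j] * V[j][c], i.e. weights @ V."""
    d = len(V[0])
    return [[sum(w * v_row[c] for w, v_row in zip(w_row, V)) for c in range(d)]
            for w_row in weights]


V = [[1.0, 0.0],     # value vector of token 0
     [0.0, 1.0]]     # value vector of token 1
# One query that attends 25% to token 0 and 75% to token 1.
output = weighted_sum([[0.25, 0.75]], V)
print(output)  # [[0.25, 0.75]]
```

Each row of attention weights sums to 1, so every output row is a weighted average of the value vectors. Each of its entries therefore stays between the smallest and largest value in that column of V, which gives a quick sanity check for hand-computed outputs.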

Koureas Stavros

Text: Your, journey, starts, with, one, step

(X) Input, 6x3 (one 3-dimensional embedding per token)
Your     [0.43, 0.15, 0.89]
journey  [0.55, 0.87, 0.66]
starts   [0.57, 0.85, 0.64]
with     [0.22, 0.58, 0.33]
one      [0.77, 0.25, 0.10]
step     [0.05, 0.80, 0.55]

Emb 6x3 (Q, here equal to X)
[0.43, 0.15, 0.89]
[0.55, 0.87, 0.66]
[0.57, 0.85, 0.64]
[0.22, 0.58, 0.33]
[0.77, 0.25, 0.10]
[0.05, 0.80, 0.55]

Emb 3x6 ($K^\top$, the transpose of X)
[0.43, 0.55, 0.57, 0.22, 0.77, 0.05]
[0.15, 0.87, 0.85, 0.58, 0.25, 0.80]
[0.89, 0.66, 0.64, 0.33, 0.10, 0.55]

Emb 6x6 (Att Scores = $Q K^\top$; rows and columns follow the tokens Your, journey, starts, with, one, step)
[0.9995, 0.9544, 0.9422, 0.4753, 0.4576, 0.6310]
[0.9544, 1.4950, 1.4754, 0.8434, 0.7070, 1.0865]
[0.9422, 1.4754, 1.4570, 0.8296, 0.7154, 1.0605]
[0.4753, 0.8434, 0.8296, 0.4937, 0.3474, 0.6565]
[0.4576, 0.7070, 0.7154, 0.3474, 0.6654, 0.2935]
[0.6310, 1.0865, 1.0605, 0.6565, 0.2935, 0.9450]


Emb 6x6 (Att Weights = softmax of each Att Scores row; each row sums to 1 up to rounding)
[0.2098, 0.2006, 0.1981, 0.1242, 0.1220, 0.1452]
[0.1385, 0.2379, 0.2333, 0.1240, 0.1082, 0.1581]
[0.1390, 0.2369, 0.2326, 0.1242, 0.1108, 0.1565]
[0.1435, 0.2074, 0.2046, 0.1462, 0.1263, 0.1720]
[0.1526, 0.1958, 0.1975, 0.1367, 0.1879, 0.1295]
[0.1385, 0.2184, 0.2128, 0.1420, 0.0988, 0.1896]

Emb 6x3 (V, here equal to X)
[0.43, 0.15, 0.89]
[0.55, 0.87, 0.66]
[0.57, 0.85, 0.64]
[0.22, 0.58, 0.33]
[0.77, 0.25, 0.10]
[0.05, 0.80, 0.55]


Emb 6x3 (Output = Att Weights × V)
[0.4421, 0.5931, 0.5790]
[0.4419, 0.6515, 0.5683]
[0.4431, 0.6496, 0.5671]
[0.4304, 0.6298, 0.5510]
[0.4671, 0.5910, 0.5266]
[0.4177, 0.6503, 0.5645]
Flow: Q (Query) × $K^\top$ (Key) -> matrix multiplication -> Att Scores -> normalization (softmax, per row) -> Att Weights × V (Value) -> matrix multiplication -> Output
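The whole pipeline can be recomputed from X with the same simplification Q = K = V = X, as a sanity check on the worked example's numbers. This is a plain-Python sketch. The function name `self_attention` is hypothetical. The `scale` and `causal` flags are additions for comparison and are off by default to match the diagram. GPT-style attention usually divides the scores by the square root of head_size and masks out future tokens before the softmax.

```python
import math

# Input embeddings from the worked example (Q = K = V = X in the diagram)
X = [
    [0.43, 0.15, 0.89],  # Your
    [0.55, 0.87, 0.66],  # journey
    [0.57, 0.85, 0.64],  # starts
    [0.22, 0.58, 0.33],  # with
    [0.77, 0.25, 0.10],  # one
    [0.05, 0.80, 0.55],  # step
]


def self_attention(X, scale=False, causal=False):
    """Hypothetical single-head self-attention with Q = K = V = X.

    Returns (scores, weights, output) as lists of rows.
    """
    n, d = len(X), len(X[0])
    # Att Scores: X @ X^T  (6x3 times 3x6 -> 6x6)
    scores = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
              for i in range(n)]
    if scale:
        # Scaled dot-product attention divides by sqrt(head_size); here d = 3.
        scores = [[s / math.sqrt(d) for s in row] for row in scores]
    if causal:
        # Decoder-style mask: token i may not attend to later tokens j > i.
        scores = [[s if j <= i else -math.inf for j, s in enumerate(row)]
                  for i, row in enumerate(scores)]
    # Att Weights: softmax per row (row max subtracted for stability)
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Output: Att Weights @ V  (6x6 times 6x3 -> 6x3)
    output = [[sum(w * X[j][c] for j, w in enumerate(row)) for c in range(d)]
              for row in weights]
    return scores, weights, output


scores, weights, output = self_attention(X)
for row in output:
    print([round(v, 4) for v in row])
```

With `causal=True` the first token can only attend to itself, so its weight row becomes [1.0, 0.0, 0.0, 0.0, 0.0, 0.0].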

Q (Query) = $X W_Q$ (here equal to X)
[0.43, 0.15, 0.89]
[0.55, 0.87, 0.66]
[0.57, 0.85, 0.64]
[0.22, 0.58, 0.33]
[0.77, 0.25, 0.10]
[0.05, 0.80, 0.55]

K (Key) = $X W_K$ (here equal to X)
[0.43, 0.15, 0.89]
[0.55, 0.87, 0.66]
[0.57, 0.85, 0.64]
[0.22, 0.58, 0.33]
[0.77, 0.25, 0.10]
[0.05, 0.80, 0.55]

V (Value) = $X W_V$ (here equal to X)
[0.43, 0.15, 0.89]
[0.55, 0.87, 0.66]
[0.57, 0.85, 0.64]
[0.22, 0.58, 0.33]
[0.77, 0.25, 0.10]
[0.05, 0.80, 0.55]

Projections

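$W_Q$, $W_K$, $W_V$

These are normally randomly initialized. For simplicity, this walkthrough skips the learned projection and uses Q = K = V = X. That is equivalent to setting $W_Q$, $W_K$ and $W_V$ to the 3x3 identity matrix. The 6x3 input itself cannot serve as the weight, because $X W$ requires $W$ to have 3 rows, one per embedding dimension.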
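The input itself cannot be used as the weight matrix: X is 6x3, so X · W needs a W with 3 rows. The 3x3 identity is the weight that actually leaves X unchanged. The plain-Python sketch below shows both cases; the `matmul` helper is hypothetical.

```python
def matmul(A, B):
    """Hypothetical helper: (n x m) @ (m x p) for matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError(f"shape mismatch: {len(A)}x{len(A[0])} @ {len(B)}x{len(B[0])}")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


X = [
    [0.43, 0.15, 0.89],
    [0.55, 0.87, 0.66],
    [0.57, 0.85, 0.64],
    [0.22, 0.58, 0.33],
    [0.77, 0.25, 0.10],
    [0.05, 0.80, 0.55],
]
n_emb = len(X[0])

# 3x3 identity: used as W_Q, W_K and W_V it makes Q = K = V = X
I = [[1.0 if r == c else 0.0 for c in range(n_emb)] for r in range(n_emb)]
Q = matmul(X, I)
print(Q == X)

# Using the 6x3 input as the weight fails: inner dimensions 3 and 6 differ
try:
    matmul(X, X)
    shape_error = None
except ValueError as err:
    shape_error = str(err)
    print("rejected:", shape_error)
```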

X input, Emb 6x3
[0.43, 0.15, 0.89]
[0.55, 0.87, 0.66]
[0.57, 0.85, 0.64]
[0.22, 0.58, 0.33]
[0.77, 0.25, 0.10]
[0.05, 0.80, 0.55]

Matrix multiplication: X input × $W_{QKV}$

$W_{QKV}$, Emb 6x3 (shown with the input's values for simplicity)
[0.43, 0.15, 0.89]
[0.55, 0.87, 0.66]
[0.57, 0.85, 0.64]
[0.22, 0.58, 0.33]
[0.77, 0.25, 0.10]
[0.05, 0.80, 0.55]


Result: QKV, split into Q, K, V
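One common reading of a single $W_{QKV}$ matrix is the three weight matrices placed side by side, giving n_emb x (3 · head_size). One matrix product then yields Q, K and V together, and the result is split by columns. GPT-2-style code such as nanoGPT's `c_attn` layer fuses the projections in a similar way. Whether the diagram means exactly this is an assumption. The helper names, sizes and random weights below are hypothetical.

```python
import random


def matmul(A, B):
    """Hypothetical helper: (n x m) @ (m x p) for matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rand_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def split_qkv(QKV, head_size):
    """Split each row of the fused n x (3*head_size) result into Q, K and V."""
    Q = [row[:head_size] for row in QKV]
    K = [row[head_size:2 * head_size] for row in QKV]
    V = [row[2 * head_size:] for row in QKV]
    return Q, K, V


X = [
    [0.43, 0.15, 0.89],
    [0.55, 0.87, 0.66],
    [0.57, 0.85, 0.64],
    [0.22, 0.58, 0.33],
    [0.77, 0.25, 0.10],
    [0.05, 0.80, 0.55],
]
n_emb, head_size = 3, 2
rng = random.Random(42)
W_Q, W_K, W_V = (rand_matrix(n_emb, head_size, rng) for _ in range(3))

# Fused weight: concatenate the three matrices column-wise -> n_emb x (3*head_size)
W_QKV = [q + k + v for q, k, v in zip(W_Q, W_K, W_V)]

QKV = matmul(X, W_QKV)               # a single product: 6 x 6
Q, K, V = split_qkv(QKV, head_size)
print(len(QKV), "x", len(QKV[0]))
```

The fused version yields exactly the same Q, K and V as three separate products. It simply does the work in one larger multiplication.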