Every transformer block has two sub-layers: attention and the feed-forward network (FFN). The FFN expands the representation to a wider hidden dimension, then contracts it back to the original size.
FFN(x) = W2 × GELU(W1 × x + b1) + b2

- W1: d_model → d_ff (typically 4×)
- W2: d_ff → d_model
- GELU activation (smoother than ReLU)
With random weights, verify that the output shape matches the input shape.
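The shape check can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference solution: the sizes `d_model = 8` and `d_ff = 32`, the Gaussian weight scale, the zero biases, and the helper names `random_matrix` and `linear` are all hypothetical choices made for the example.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def random_matrix(rows, cols, rng):
    """rows x cols matrix of small Gaussian weights (illustrative init only)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def linear(W, b, x):
    """Compute W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def ffn(x, W1, b1, W2, b2):
    """FFN(x) = W2 * GELU(W1 * x + b1) + b2."""
    hidden = [gelu(h) for h in linear(W1, b1, x)]  # expand: length d_ff
    return linear(W2, b2, hidden)                  # contract: length d_model


rng = random.Random(0)   # fixed seed so runs are repeatable
d_model, d_ff = 8, 32    # assumed sizes, with d_ff = 4 * d_model

W1, b1 = random_matrix(d_ff, d_model, rng), [0.0] * d_ff
W2, b2 = random_matrix(d_model, d_ff, rng), [0.0] * d_model

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
y = ffn(x, W1, b1, W2, b2)

# The output has the same length as the input.
assert len(y) == len(x) == d_model
```

W1 is stored with shape (d_ff, d_model) and W2 with shape (d_model, d_ff), so each row of a matrix produces one output component. For a batch of token vectors, the same `ffn` would be applied to each vector independently.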
Round to **5 decimal places**.
Test Cases (2 visible · 1 hidden)
Case 1: GELU at 0
Input: gelu(0.0)
Expected: 0.0
Case 2: GELU at 1
Input: gelu(1.0)
Expected: 0.84134
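The expected value in Case 2 matches the exact, error-function form of GELU, GELU(x) = 0.5 · x · (1 + erf(x / √2)). The sketch below reproduces both visible cases under that assumption. The `gelu_tanh` function is included only for comparison: it is the widely used tanh approximation, which differs from the exact form around the fourth decimal place at x = 1 and so would not round to the expected value.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh approximation of GELU, shown for comparison only."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


print(round(gelu(0.0), 5))       # 0.0
print(round(gelu(1.0), 5))       # 0.84134
print(round(gelu_tanh(1.0), 5))  # close to, but not equal to, 0.84134
```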