Used in every Transformer layer (GPT, BERT, LLaMA) to stabilize training. Normalizes across the **feature dimension** of a single sample.
LayerNorm(x) = (x - μ) / √(σ² + ε)Where μ = mean(x), σ² = variance(x), ε = 1e-8
layer_norm([1.0, 2.0, 3.0]) → [-1.22474, 0.0, 1.22474]
Round to **5 decimal places**.
Test Cases (2 visible · 2 hidden)
Case 1: Basic 3-element
Input: layer_norm([1.0,2.0,3.0])
Expected: [-1.22474, 0.0, 1.22474]
Case 2: All zeros
Input: layer_norm([0.0,0.0,0.0])
Expected: [0.0, 0.0, 0.0]
⌘↵ Run · ⌘⇧↵ Submit