Create the causal attention mask used in decoder-only transformers.
mask[i][j] = 1 if j ≤ i else 0Token at position i can attend to all positions 0..i but NOT future positions.
seq_len = 3: [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
Return as a list of lists of integers.
Test Cases (2 visible · 2 hidden)
Case 1: 3x3 mask
Input: causal_mask(3)
Expected: [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
Case 2: 2x2 mask
Input: causal_mask(2)
Expected: [[1, 0], [1, 1]]
⌘↵ Run · ⌘⇧↵ Submit