Implement the core attention operation.
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) × V
- Q: (n_q × d_k), K: (n_k × d_k), V: (n_k × d_v)
- Output: (n_q × d_v), each value rounded to 5 decimal places
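
A minimal sketch of one way to implement the formula above, using plain Python lists and only the standard library. It assumes the softmax is taken row-wise (one distribution per query) and that d_k is read from the width of K. The max-subtraction step is a numerical-stability choice made here, not something the problem statement requires.

```python
import math


def dot_product_attention(Q, K, V):
    """Scaled dot-product attention on nested lists, rounded to 5 decimals."""
    d_k = len(K[0])          # shared query/key dimension
    d_v = len(V[0])          # value dimension
    scale = math.sqrt(d_k)   # the sqrt(d_k) divisor from the formula

    output = []
    for q in Q:
        # Scaled dot product of this query with every key: one row of QK^T / sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]

        # Row-wise softmax; subtracting the max leaves the result unchanged
        # but keeps exp() from overflowing on large scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Weighted sum of the value rows gives one output row of length d_v.
        row = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)]
        output.append([round(x, 5) for x in row])
    return output
```

Each output row is a convex combination of the rows of V, so with non-negative values every row of the result sums to the same total as the corresponding weighted rows of V, which is a quick sanity check when debugging.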
Test Cases (2 visible · 1 hidden)
Case 1: Single query
Input: dot_product_attention([[1,0]],[[1,0],[0,1]],[[1,0],[0,1]])
Expected: [[0.66976, 0.33024]]
Case 2: Two queries
Input: dot_product_attention([[1,0],[0,1]],[[1,0],[0,1]],[[2,0],[0,2]])
Expected: [[1.33952, 0.66048], [0.66048, 1.33952]]