Implement a single-head self-attention layer.
This problem asks you to implement the forward pass of a single-head self-attention layer. Starting from an input sequence, you project it into query, key, and value representations, compute scaled dot-product attention scores, normalize each row of scores with a softmax, and use the resulting weights to form context-aware outputs as weighted sums of the values. The key challenges are keeping the matrix shapes consistent, scaling the scores by the square root of the key dimension, and keeping the softmax numerically stable.
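
The steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference solution: the names `softmax`, `self_attention`, `X`, `W_q`, `W_k`, and `W_v` are hypothetical, the input is assumed to be a single unbatched sequence of shape `(seq_len, d_model)`, and bias terms, masking, and an output projection are left out because the problem statement does not mention them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Softmax along `axis`, shifted by the max for numerical stability."""
    # Subtracting the max leaves the result unchanged mathematically
    # but keeps np.exp from overflowing on large scores.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention forward pass (assumed interface).

    X:   (seq_len, d_model) input sequence
    W_q: (d_model, d_k) query projection
    W_k: (d_model, d_k) key projection
    W_v: (d_model, d_v) value projection
    Returns (output, weights) with shapes (seq_len, d_v) and (seq_len, seq_len).
    """
    Q = X @ W_q  # (seq_len, d_k)
    K = X @ W_k  # (seq_len, d_k)
    V = X @ W_v  # (seq_len, d_v)

    d_k = Q.shape[-1]
    # Scaled dot-product scores: entry (i, j) compares query i with key j.
    scores = (Q @ K.T) / np.sqrt(d_k)  # (seq_len, seq_len)

    # Normalize over keys so each query's weights sum to 1.
    weights = softmax(scores, axis=-1)

    # Each output row is a weighted sum of the value rows.
    output = weights @ V  # (seq_len, d_v)
    return output, weights
```

Returning the attention weights alongside the output is a design choice made here to make shape and normalization checks easy; a solution that only needs the output can drop the second return value.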