Methods
Integrated Gradients
Attribute a model's output to its input features by integrating gradients along a straight-line path from a baseline input to the actual input.
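For a baseline x' and model F, the attribution to feature i is IG_i(x) = (x_i - x'_i) * ∫₀¹ ∂F(x' + α(x - x'))/∂x_i dα. The sketch below approximates that integral with a midpoint Riemann sum in NumPy. The helper name `integrated_gradients`, its `grad_fn` argument, and the toy quadratic model with a hand-written gradient are illustrative assumptions, not this library's API. A real model would supply gradients through autodiff.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann approximation of integrated gradients.

    grad_fn (hypothetical): callable returning dF/dx at a given input.
    """
    # Midpoints of `steps` equal sub-intervals of the path parameter alpha in [0, 1]
    alphas = (np.arange(steps) + 0.5) / steps
    # Gradients at points along the straight-line path from baseline to x
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    avg_grad = grads.mean(axis=0)
    # Scale the path-averaged gradient by the input difference
    return (x - baseline) * avg_grad

# Toy model F(x) = sum(w * x**2), whose gradient is 2 * w * x
w = np.array([1.0, -2.0, 0.5])
F = lambda v: float(np.sum(w * v**2))
grad_F = lambda v: 2 * w * v

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_F, x, baseline)
```

Because this toy gradient is linear along the path, the midpoint sum is exact here, and the attributions add up to F(x) - F(baseline), which is the completeness property of integrated gradients.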
Steering Vectors
Modify model behavior by adding vectors to intermediate activations.
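A toy NumPy illustration of activation steering follows. The two-layer network, the `forward` and `mean_diff_vector` helpers, and the choice to build the vector as a difference of mean hidden activations between two input sets are assumptions for illustration. The difference of means is one common construction, not necessarily the one this library uses.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # toy layer 1: input dim 4 -> hidden dim 8
W2 = rng.normal(size=(8, 2))   # toy layer 2: hidden dim 8 -> output dim 2

def forward(x, steer=None, alpha=1.0):
    h = np.tanh(x @ W1)            # intermediate activations
    if steer is not None:
        h = h + alpha * steer      # steering: add a scaled fixed vector to the activations
    return h @ W2

def mean_diff_vector(xs_pos, xs_neg):
    """Hypothetical helper: difference of mean hidden activations between two input sets."""
    return np.tanh(xs_pos @ W1).mean(axis=0) - np.tanh(xs_neg @ W1).mean(axis=0)

xs_pos = rng.normal(loc=1.0, size=(16, 4))
xs_neg = rng.normal(loc=-1.0, size=(16, 4))
v = mean_diff_vector(xs_pos, xs_neg)

x = rng.normal(size=(1, 4))
baseline_out = forward(x)
steered_out = forward(x, steer=v, alpha=2.0)
```

The scale `alpha` controls how strongly the intervention pushes the downstream computation. Because the toy second layer is linear, the output shift is exactly `alpha * v @ W2`.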
Linear Probing
Train linear classifiers on a model's internal representations to test what information they encode.
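As a sketch of the idea, the NumPy block below fits a logistic-regression probe by gradient descent on synthetic "hidden states" in which a binary property is linearly encoded, then scores it on held-out examples. The function names, hyperparameters, and synthetic data are assumptions, not this library's interface.

```python
import numpy as np

def train_linear_probe(H, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit p(y=1 | h) = sigmoid(h @ w + b) by full-batch gradient descent."""
    n, d = H.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))   # predicted probabilities
        err = p - y                               # per-example log-loss gradient w.r.t. the logit
        w -= lr * (H.T @ err / n + l2 * w)        # L2-regularised weight update
        b -= lr * err.mean()
    return w, b

def probe_accuracy(H, y, w, b):
    return float(((H @ w + b > 0).astype(int) == y).mean())

rng = np.random.default_rng(0)
H = rng.normal(size=(400, 16))         # stand-in for hidden states (n examples x d dims)
direction = rng.normal(size=16)
y = (H @ direction > 0).astype(int)    # a property that is linearly encoded in H

train, test = slice(0, 300), slice(300, 400)
w, b = train_linear_probe(H[train], y[train])
acc = probe_accuracy(H[test], y[test], w, b)
```

Held-out accuracy well above chance suggests the property is linearly decodable from the representation. Comparing against probes on control tasks or random representations helps separate encoding from probe capacity.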
Bilinear Probing
Train bilinear probes on paired layer representations to capture interaction structure.
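One plausible form of a bilinear probe scores a pair of representations (a from one layer, b from another) as a^T W b + c and trains it with a logistic loss. The sketch below uses that parameterization, which is an assumption since the exact form used here is not stated. The synthetic label is the XOR of two signs, one read from each layer, so it is invisible to a linear probe on either representation alone but expressible as a rank-1 bilinear form.

```python
import numpy as np

def bilinear_scores(A, B, W, c):
    # score_n = a_n^T W b_n + c for each paired example n
    return np.einsum("ni,ij,nj->n", A, W, B) + c

def train_bilinear_probe(A, B, y, lr=0.1, epochs=500, l2=1e-3):
    """Logistic probe p(y=1 | a, b) = sigmoid(a^T W b + c) fit by gradient descent."""
    n = A.shape[0]
    W, c = np.zeros((A.shape[1], B.shape[1])), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-bilinear_scores(A, B, W, c)))
        err = p - y
        # d(score)/dW = outer(a, b), so the gradient is an err-weighted mean of outer products
        W -= lr * (np.einsum("n,ni,nj->ij", err, A, B) / n + l2 * W)
        c -= lr * err.mean()
    return W, c

rng = np.random.default_rng(0)
n, d = 2500, 6
A = rng.normal(size=(n, d))   # stand-in for layer-i representations
B = rng.normal(size=(n, d))   # stand-in for paired layer-j representations
u, v = rng.normal(size=d), rng.normal(size=d)
# Interaction label: positive when the two layers' projections share a sign
y = ((A @ u) * (B @ v) > 0).astype(int)

train, test = slice(0, 2000), slice(2000, n)
W, c = train_bilinear_probe(A[train], B[train], y[train])
acc = float(((bilinear_scores(A[test], B[test], W, c) > 0).astype(int) == y[test]).mean())
```

The probe is still linear in its parameters (it is logistic regression on the flattened outer product of a and b), so training stays convex while the decision rule captures multiplicative interactions between layers.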
Dimension Estimation
Estimate the intrinsic dimension of data manifolds using TwoNN, Local PCA, and related methods.
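TwoNN uses only each point's two nearest-neighbor distances. The ratio mu = r2 / r1 follows a Pareto distribution whose shape parameter is the intrinsic dimension d. The sketch below uses the maximum-likelihood estimate d = N / sum(log mu), a common variant of the original estimator's linear fit to the empirical CDF. The function name and the brute-force distance computation are illustrative choices, not this library's implementation.

```python
import numpy as np

def twonn_dimension(X):
    """TwoNN intrinsic-dimension estimate (maximum-likelihood variant)."""
    # Brute-force pairwise Euclidean distances; O(N^2) memory, fine for a sketch
    sq = np.sum(X**2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    np.fill_diagonal(D, np.inf)        # exclude each point's zero distance to itself
    r = np.sort(D, axis=1)[:, :2]      # first and second nearest-neighbour distances
    mu = r[:, 1] / r[:, 0]
    return len(X) / np.sum(np.log(mu))

rng = np.random.default_rng(0)
# 2-D uniform data linearly embedded in a 10-D ambient space
Z = rng.uniform(size=(1000, 2))
X = Z @ rng.normal(size=(2, 10))
d_hat = twonn_dimension(X)
```

Here `d_hat` should come out close to 2, the dimension of the underlying manifold, rather than the ambient dimension 10. Duplicate points (r1 = 0) would need to be removed before calling a function like this.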
Representation Similarity
Compare latent representations using centered kernel alignment (CKA), with room to add further similarity metrics.
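For reference, linear CKA between column-centered representations X and Y of the same n inputs is ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F). The NumPy sketch below implements that formula. The function name is hypothetical, and kernel (e.g. RBF) CKA is not covered.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between X (n x d1) and Y (n x d2), rows aligned by input."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature over the n inputs
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))    # random orthogonal matrix

same = linear_cka(X, X)                            # identical representations
rotated = linear_cka(X, 3.0 * X @ Q)               # rotated and rescaled copy
unrelated = linear_cka(X, rng.normal(size=(500, 32)))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so `same` and `rotated` are both 1 up to floating-point error, while the independent random representation scores much lower.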