Methods

Integrated Gradients

Attribute a model's prediction to its input features by integrating gradients along a path from a baseline input to the actual input.

notebooks/methods/integrated-gradients.ipynb
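
As a rough illustration of the idea (not the notebook's code or any library's API), integrated gradients can be sketched in plain NumPy. The function name `integrated_gradients`, the toy model `f`, and its hand-written gradient `grad_f` are assumptions made for this example; a real model would supply gradients through an autodiff framework.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn is a hypothetical callable returning df/dx at a single point.
    """
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from the baseline to the input.
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_fn(p) for p in path])
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)

# Toy model (an assumption for illustration): f(x) = sum(x^2), gradient 2x.
f = lambda v: float(np.sum(v ** 2))
grad_f = lambda v: 2.0 * v

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(grad_f, x, baseline)
```

A useful sanity check is completeness: the attributions should sum to approximately `f(x) - f(baseline)`.
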
Steering Vectors

Modify model behavior by adding vectors to intermediate activations.

notebooks/methods/steering-vectors.ipynb
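
A minimal sketch of the mechanism, assuming a toy two-layer network with random weights rather than a real model. The names `forward`, `hidden`, and `steer` are hypothetical, and building the vector as a difference of mean activations is one common choice, not necessarily the one used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))  # toy input-to-hidden weights
W2 = rng.normal(size=(8, 3))  # toy hidden-to-output weights

def hidden(x):
    """Intermediate activation of the toy network."""
    return np.tanh(x @ W1)

def forward(x, steering=None, scale=1.0):
    h = hidden(x)
    if steering is not None:
        # Steering: shift the intermediate activation by a fixed vector.
        h = h + scale * steering
    return h @ W2

# Assumed recipe: difference of mean activations between two input sets.
pos = rng.normal(loc=1.0, size=(16, 4))
neg = rng.normal(loc=-1.0, size=(16, 4))
steer = hidden(pos).mean(axis=0) - hidden(neg).mean(axis=0)

x = rng.normal(size=(1, 4))
base = forward(x)
steered = forward(x, steering=steer, scale=2.0)
```

Because the layer after the intervention is linear in this toy model, the output shifts by exactly `2.0 * steer @ W2`; in a deep nonlinear model the effect is not this simple.
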
Linear Probing

Train classifiers on model representations to understand what information is encoded.

notebooks/methods/linear-probing.ipynb
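
The following sketch shows the general shape of a linear probe on synthetic data. The array `reps` stands in for frozen model activations, and the least-squares classifier is a simplifying assumption chosen to keep the example dependency-free; logistic regression is a more typical choice in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 16

# Stand-in for frozen representations; the label is linearly encoded.
reps = rng.normal(size=(n, d))
direction = rng.normal(size=d)
labels = (reps @ direction > 0).astype(float)

def add_bias(X):
    """Append a constant column so the probe can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

train, test = slice(0, 300), slice(300, None)

# Fit the probe by least squares against +/-1 targets.
w, *_ = np.linalg.lstsq(add_bias(reps[train]), 2 * labels[train] - 1, rcond=None)

# Held-out accuracy indicates how linearly decodable the label is.
pred = (add_bias(reps[test]) @ w > 0).astype(float)
accuracy = (pred == labels[test]).mean()
```

High held-out accuracy suggests the information is linearly decodable from the representation; it does not by itself show that the model uses that information.
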
Bilinear Probing

Train bilinear probes on paired layer representations to capture interaction structure.

notebooks/methods/bilinear-probing.ipynb
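
One way to picture a bilinear probe is a score of the form x^T W y, where x and y are representations of the same example at two layers. The sketch below fits W by least squares on synthetic, noise-free data; the variable names, the dimensions, and the least-squares fit are assumptions for illustration and may differ from the notebook's training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dx, dy = 500, 4, 5

X = rng.normal(size=(n, dx))  # stand-in for layer A representations
Y = rng.normal(size=(n, dy))  # stand-in for layer B representations

# Synthetic targets generated by a known bilinear form x^T W y.
W_true = rng.normal(size=(dx, dy))
targets = np.einsum("ni,ij,nj->n", X, W_true, Y)

# A bilinear form is linear in the flattened outer product of (x, y).
features = np.einsum("ni,nj->nij", X, Y).reshape(n, dx * dy)
w_flat, *_ = np.linalg.lstsq(features, targets, rcond=None)
W_hat = w_flat.reshape(dx, dy)
```

The entry `W_hat[i, j]` weights the interaction between feature `i` of the first layer and feature `j` of the second, which is the structure a purely linear probe on either layer cannot express.
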
Dimension Estimation

Estimate the intrinsic dimension of data manifolds using TwoNN, Local PCA, and related methods.

notebooks/methods/dimension-estimation.ipynb
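
As a hedged illustration of TwoNN, the sketch below uses the maximum-likelihood form of the estimator: the ratio of each point's second to first nearest-neighbor distance is modeled as Pareto-distributed with shape equal to the intrinsic dimension. The function name `twonn_dimension` and the synthetic data are assumptions; the published method also describes a linear-fit variant that discards the largest ratios.

```python
import numpy as np

def twonn_dimension(points):
    """Maximum-likelihood TwoNN estimate of intrinsic dimension."""
    # Dense pairwise distances; fine for small samples only.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude self-distances
    nearest = np.sort(dists, axis=1)[:, :2]
    mu = nearest[:, 1] / nearest[:, 0]  # second / first neighbor distance
    # log(mu) is exponential with rate d, so the MLE is N / sum(log mu).
    return len(points) / np.log(mu).sum()

rng = np.random.default_rng(0)
latent = rng.uniform(size=(600, 2))  # 2-D latent coordinates
embed = rng.normal(size=(2, 10))     # linear embedding into 10-D
points = latent @ embed
estimate = twonn_dimension(points)
```

On this data the estimate should land near 2 even though the ambient dimension is 10. Duplicate points would make the first-neighbor distance zero and need to be removed beforehand.
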
Representation Similarity

Compare latent representations using centered kernel alignment (CKA), with room for additional similarity metrics.

notebooks/methods/representation-similarity.ipynb
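
For orientation, linear CKA between two representation matrices (rows are the same examples, columns are features) can be sketched as follows. The function name `linear_cka` and the synthetic matrices are assumptions for this example, and kernel variants of CKA are not covered.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (examples x features) matrices."""
    # Center each feature so the comparison ignores mean offsets.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 12))

# A rotated and rescaled copy of A, plus an unrelated random matrix.
Q, _ = np.linalg.qr(rng.normal(size=(12, 12)))
rotated = 3.0 * A @ Q
unrelated = rng.normal(size=(200, 12))

same = linear_cka(A, rotated)
different = linear_cka(A, unrelated)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so `same` comes out at 1 up to floating-point error, while `different` is much lower.
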