tdhook.metrics
Classes
Module Contents
- class tdhook.metrics.InfidelityMetric(n_perturb_samples=10)
- Parameters:
n_perturb_samples (int) – number of perturbations sampled to estimate infidelity (default: 10)
- __call__(module, original_data)
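Compute infidelity: the discrepancy between attribution-weighted input perturbations and the resulting changes in model output, estimated from n_perturb_samples perturbations. Lower values indicate more faithful attributions.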
- Parameters:
module (tdhook.modules.HookedModule)
original_data (tensordict.TensorDict)
- Return type:
tensordict.TensorDict
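For intuition, here is a minimal pure-Python sketch of the quantity InfidelityMetric describes, using the common squared-error form. For each sampled perturbation I, it compares the attribution-weighted perturbation (the dot product of I with the attributions) against the output change f(x) - f(x - I), squares the difference, and averages over samples. The Gaussian perturbation distribution, the function name `infidelity`, and the `noise_std` and `seed` parameters are assumptions for illustration. This is not tdhook's implementation and does not use HookedModule or TensorDict.

```python
import random
from typing import Callable, Sequence


def infidelity(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    attributions: Sequence[float],
    n_perturb_samples: int = 10,
    noise_std: float = 0.1,  # assumption: Gaussian perturbation scale
    seed: int = 0,  # hypothetical: fixed seed for reproducibility
) -> float:
    """Monte Carlo estimate of E_I[(I . attributions - (f(x) - f(x - I)))^2]."""
    rng = random.Random(seed)
    fx = f(x)
    total = 0.0
    for _ in range(n_perturb_samples):
        # Sample one perturbation I (assumed Gaussian, one value per feature).
        pert = [rng.gauss(0.0, noise_std) for _ in x]
        # Attribution-weighted perturbation: dot product of I with the attributions.
        weighted = sum(p * a for p, a in zip(pert, attributions))
        # Change in model output when I is subtracted from the input.
        delta = fx - f([xi - pi for xi, pi in zip(x, pert)])
        # Squared discrepancy between predicted and actual output change.
        total += (weighted - delta) ** 2
    return total / n_perturb_samples
```

For a linear model f(x) = w . x, the weights w are an exact attribution, so their infidelity is zero up to floating-point rounding, while uninformative attributions (e.g. all zeros) score strictly higher.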
- class tdhook.metrics.SensitivityMetric(perturb_radius=0.02)
- Parameters:
perturb_radius (float) – maximum magnitude of the input perturbation (default: 0.02)
- __call__(module, original_data)
Compute sensitivity as the relative change in the explanation when the input is perturbed within perturb_radius. Lower values indicate more stable explanations.
- Parameters:
module (tdhook.modules.HookedModule)
original_data (tensordict.TensorDict)
- Return type:
tensordict.TensorDict
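Similarly, here is a minimal sketch of a relative sensitivity measure. It perturbs the input with uniform noise in [-perturb_radius, perturb_radius] per feature, recomputes the explanation, and reports the largest ratio ||phi(y) - phi(x)|| / ||phi(x)|| over the samples. Several choices here are assumptions: taking the maximum rather than the mean, the uniform noise, the L2 norm, and the `n_perturb_samples` and `seed` parameters. SensitivityMetric's documented constructor takes only perturb_radius, and `max_sensitivity` is a hypothetical name, not part of tdhook.

```python
import math
import random
from typing import Callable, Sequence


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(vi * vi for vi in v))


def max_sensitivity(
    explain: Callable[[Sequence[float]], Sequence[float]],
    x: Sequence[float],
    perturb_radius: float = 0.02,
    n_perturb_samples: int = 10,  # hypothetical: not a documented SensitivityMetric parameter
    seed: int = 0,  # hypothetical: fixed seed for reproducibility
) -> float:
    """Largest relative change in the explanation over random perturbations of x."""
    rng = random.Random(seed)
    base = explain(x)
    base_norm = l2_norm(base)
    if base_norm == 0.0:
        raise ValueError("explanation at x has zero norm; relative change is undefined")
    worst = 0.0
    for _ in range(n_perturb_samples):
        # Assumption: uniform noise per feature, i.e. a point in an L-infinity ball around x.
        y = [xi + rng.uniform(-perturb_radius, perturb_radius) for xi in x]
        diff = [a - b for a, b in zip(explain(y), base)]
        # Relative change of the explanation; keep the worst case seen so far.
        worst = max(worst, l2_norm(diff) / base_norm)
    return worst
```

An explanation that ignores the input (such as the constant gradient of a linear model) has sensitivity 0, while an input-dependent explanation such as gradient x input changes by at most a fraction comparable to perturb_radius when the features are of order 1.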