tdhook.metrics

Classes

  • InfidelityMetric

  • SensitivityMetric

Module Contents

class tdhook.metrics.InfidelityMetric(n_perturb_samples=10)
Parameters:

n_perturb_samples (int)

n_perturb_samples = 10

__call__(module, original_data)

Compute infidelity as the difference between attribution-weighted perturbations and model output changes.

Parameters:
  • module

  • original_data

Return type:

tensordict.TensorDict
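
The description above can be illustrated with a minimal, standalone sketch. It does not use tdhook or tensordict, and it is not the library's implementation. It assumes the common formulation of infidelity, the mean over perturbations I of (I · attribution − (f(x) − f(x − I)))², with Gaussian noise as the perturbation. The names `infidelity`, `linear_model`, and `noise_std` are hypothetical.

```python
import random


def infidelity(f, x, attribution, n_perturb_samples=10, noise_std=0.1, seed=0):
    """Monte Carlo estimate of the mean of (I . attribution - (f(x) - f(x - I)))**2."""
    rng = random.Random(seed)
    fx = f(x)
    total = 0.0
    for _ in range(n_perturb_samples):
        # Perturbation I: Gaussian noise (an assumed distribution).
        pert = [rng.gauss(0.0, noise_std) for _ in x]
        # Attribution-weighted perturbation.
        weighted = sum(p * a for p, a in zip(pert, attribution))
        # Change in model output when the perturbation is subtracted from x.
        delta = fx - f([xi - p for xi, p in zip(x, pert)])
        total += (weighted - delta) ** 2
    return total / n_perturb_samples


# Hypothetical linear model: its gradient (the weights) is an exact attribution.
weights = [2.0, -1.0, 0.5]


def linear_model(x):
    return sum(w * xi for w, xi in zip(weights, x))


x = [1.0, 2.0, 3.0]
exact = infidelity(linear_model, x, attribution=weights)
wrong = infidelity(linear_model, x, attribution=[0.0, 0.0, 0.0])
print(exact, wrong)
```

For a linear model the gradient attribution reproduces every output change, so `exact` is zero up to floating-point error, while the all-zero attribution yields a strictly larger value.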

_perturb_data(data, in_keys)

Add random noise to create perturbations.

Parameters:
  • data (tensordict.TensorDict)

  • in_keys (List[str])

Return type:

tensordict.TensorDict

class tdhook.metrics.SensitivityMetric(perturb_radius=0.02)
Parameters:

perturb_radius (float)

perturb_radius = 0.02

__call__(module, original_data)

Compute sensitivity as the relative change in explanation when input is perturbed.

Parameters:
  • module

  • original_data

Return type:

tensordict.TensorDict
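
As a rough illustration of "relative change in explanation", the sketch below perturbs an input within a radius and compares explanations. It is standalone and not the tdhook implementation. The L2 norm, the maximum over several samples, and the uniform noise are all assumptions, and `sensitivity`, `explain`, and `n_samples` are hypothetical names.

```python
import math
import random


def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))


def sensitivity(explain, x, perturb_radius=0.02, n_samples=10, seed=0):
    """Largest relative change ||E(x + d) - E(x)|| / ||E(x)|| over sampled d."""
    rng = random.Random(seed)
    base = explain(x)
    # Guard against division by zero for an all-zero explanation.
    base_norm = max(l2_norm(base), 1e-12)
    worst = 0.0
    for _ in range(n_samples):
        # Each coordinate moves by at most perturb_radius (assumed uniform noise).
        x_pert = [xi + rng.uniform(-perturb_radius, perturb_radius) for xi in x]
        diff = [a - b for a, b in zip(explain(x_pert), base)]
        worst = max(worst, l2_norm(diff) / base_norm)
    return worst


def quadratic_grad(x):
    # Gradient explanation of f(x) = sum(x_i ** 2); it changes with the input.
    return [2.0 * xi for xi in x]


def constant_grad(x):
    # Gradient explanation of a linear model; it ignores the input.
    return [2.0, -1.0, 0.5]


x = [1.0, 2.0, 3.0]
s_quadratic = sensitivity(quadratic_grad, x)
s_constant = sensitivity(constant_grad, x)
print(s_quadratic, s_constant)
```

An explanation that does not depend on the input has zero sensitivity. For the quadratic example the value is bounded by `perturb_radius * sqrt(len(x)) / l2_norm(x)`.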

_perturb_data(data, in_keys)

Add random noise within the perturbation radius.

Parameters:
  • data (tensordict.TensorDict)

  • in_keys (List[str])

Return type:

tensordict.TensorDict
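
A radius-bounded perturbation restricted to selected keys might look like the sketch below. A plain dict of lists stands in for `tensordict.TensorDict`, and the uniform distribution is an assumption, since the description only says "random noise within the perturbation radius". The function `perturb_data` and its `seed` argument are hypothetical and are not the library's method.

```python
import random


def perturb_data(data, in_keys, perturb_radius=0.02, seed=None):
    """Return a copy of data with bounded noise added to the entries in in_keys."""
    rng = random.Random(seed)
    perturbed = dict(data)  # shallow copy; keys outside in_keys keep their values
    for key in in_keys:
        # Build a new list so the original entry is left unmodified.
        perturbed[key] = [
            v + rng.uniform(-perturb_radius, perturb_radius) for v in data[key]
        ]
    return perturbed


data = {"input": [0.5, -1.0, 2.0], "label": [1]}
noisy = perturb_data(data, in_keys=["input"], seed=0)
print(noisy)
```

Only the entries listed in `in_keys` change, and each value moves by at most `perturb_radius`.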