tdhook.latent.steering_vectors

Classes

SteeringVectors

Steering vectors [24].

ActivationAddition

Factory for creating hooking contexts.

Module Contents

class tdhook.latent.steering_vectors.SteeringVectors(modules_to_steer, steer_fn)

Bases: tdhook.contexts.HookingContextFactory

Steering vectors [24].

Parameters:
  • modules_to_steer (List[str])

  • steer_fn (Callable)

_modules_to_steer
_steer_fn
_hook_module(module)
Parameters:

module (tdhook.modules.HookedModule)

Return type:

tdhook.hooks.MultiHookHandle
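
The call signature expected of steer_fn is not documented on this page. As a rough illustration only, the sketch below assumes a callable that receives a module name together with that module's output and returns the output to use in its place. It uses plain Python lists in place of tensors and does not call tdhook; make_additive_steer_fn and apply_steering are hypothetical names introduced for this example.

```python
from typing import Callable, Dict, List

Vector = List[float]
SteerFn = Callable[[str, Vector], Vector]


def make_additive_steer_fn(vectors: Dict[str, Vector], scale: float = 1.0) -> SteerFn:
    """Build a hypothetical steer_fn that adds a fixed vector per module."""

    def steer_fn(module_key: str, output: Vector) -> Vector:
        vector = vectors.get(module_key)
        if vector is None:
            # No steering vector registered for this module: pass through.
            return output
        return [o + scale * v for o, v in zip(output, vector)]

    return steer_fn


def apply_steering(
    outputs: Dict[str, Vector], modules_to_steer: List[str], steer_fn: SteerFn
) -> Dict[str, Vector]:
    """Apply steer_fn only to the outputs of modules listed in modules_to_steer."""
    return {
        key: steer_fn(key, out) if key in modules_to_steer else out
        for key, out in outputs.items()
    }


# Assumed setup: two module outputs, of which only "layer.1" is steered.
steer_fn = make_additive_steer_fn({"layer.1": [0.5, -0.5]}, scale=2.0)
outputs = {"layer.0": [1.0, 1.0], "layer.1": [1.0, 1.0]}
steered = apply_steering(outputs, ["layer.1"], steer_fn)
```

Keeping the module selection (modules_to_steer) separate from the transformation (steer_fn) mirrors the two constructor parameters listed above. How tdhook actually invokes steer_fn should be checked against the source.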

class tdhook.latent.steering_vectors.ActivationAddition(modules_to_steer, positive_key='positive', negative_key='negative', steer_key='steer', clean_intermediate_keys=True, cache_callback=None)

Bases: tdhook.contexts.HookingContextFactory

Factory for creating hooking contexts.

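Parameters:
  • modules_to_steer

  • positive_key (default: 'positive')

  • negative_key (default: 'negative')

  • steer_key (default: 'steer')

  • clean_intermediate_keys (default: True)

  • cache_callback (default: None)

_modules_to_steer
_positive_key = 'positive'
_negative_key = 'negative'
_steer_key = 'steer'
_clean_intermediate_keys = True
_cache_callback = None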
_prepare_module(module, in_keys, out_keys, extra_relative_path)
Parameters:
  • module

  • in_keys

  • out_keys

  • extra_relative_path

Return type:

tensordict.nn.TensorDictModuleBase

_hook_module(module)
Parameters:

module (tdhook.modules.HookedModule)

Return type:

tdhook.hooks.MultiHookHandle

_compute_steering_vectors(td)
Parameters:

td (tensordict.TensorDict)

Return type:

tensordict.TensorDict
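
The body of _compute_steering_vectors is not shown on this page. In the activation-addition technique generally, a steering vector is the difference between a module's activations on a positive input and on a negative input, and it is then added (optionally scaled) to the activations of the input being steered. The sketch below illustrates that arithmetic with plain dictionaries and lists standing in for a TensorDict. The function names, the nested-dictionary layout, and the coefficient parameter are assumptions made for this example, not tdhook's API; only the key names 'positive', 'negative', and 'steer' come from the constructor defaults above.

```python
from typing import Dict, List

Vector = List[float]
Activations = Dict[str, Vector]


def compute_steering_vectors(
    cache: Dict[str, Activations],
    positive_key: str = "positive",
    negative_key: str = "negative",
) -> Activations:
    """Return positive minus negative activations for every cached module."""
    positive, negative = cache[positive_key], cache[negative_key]
    return {
        module: [p - n for p, n in zip(positive[module], negative[module])]
        for module in positive
    }


def add_steering(
    activations: Activations, steering_vectors: Activations, coefficient: float = 1.0
) -> Activations:
    """Add the scaled steering vector to each module that has one."""
    return {
        module: [
            a + coefficient * s
            for a, s in zip(acts, steering_vectors.get(module, [0.0] * len(acts)))
        ]
        for module, acts in activations.items()
    }


# Assumed cache layout: one entry per key, each mapping module name to activations.
cache = {
    "positive": {"layer.1": [3.0, 1.0]},
    "negative": {"layer.1": [1.0, 2.0]},
    "steer": {"layer.1": [0.0, 0.5]},
}
vectors = compute_steering_vectors(cache)
steered = add_steering(cache["steer"], vectors, coefficient=0.5)
```

In this layout, discarding the "positive" and "negative" entries once the vectors are computed would plausibly correspond to clean_intermediate_keys=True, though that reading is an inference from the parameter name rather than documented behaviour.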