alien.models.pytorch package

Submodules

alien.models.pytorch.last_layer module

Model wrapper for last layer linearization.

class alien.models.pytorch.last_layer.LastLayerPytorchLinearization(model=None, X=None, y=None, **kwargs)[source]

Bases: LastLayerLinearizableRegressor

Last layer linearization for Pytorch-based models.

predict_with_embedding(X)[source]

Forward pass which returns the output of the penultimate layer along with the output of the last layer. If the last layer is not known yet, it will be determined when this function is called for the first time.

Parameters:

X – one batch of data to use as input for the forward pass
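A hedged usage sketch, assuming the method returns the pair (prediction, penultimate-layer embedding); the toy network, shapes, and names below are illustrative, not part of the API:

    import torch
    from alien.models.pytorch.last_layer import LastLayerPytorchLinearization

    net = torch.nn.Sequential(
        torch.nn.Linear(16, 32),
        torch.nn.ReLU(),
        torch.nn.Linear(32, 1),
    )
    model = LastLayerPytorchLinearization(model=net)

    X_batch = torch.randn(8, 16)
    preds, embedding = model.predict_with_embedding(X_batch)  # return order assumed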

last_layer_embedding(X)[source]

Returns the activations of the last layer before the output.

linearization()[source]

Finds the last-layer linearization of the model in its current state.

Returns:

weights, bias
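Continuing the sketch above, and assuming weights has the (out_features, in_features) layout of torch.nn.Linear:

    weights, bias = model.linearization()
    phi = model.last_layer_embedding(X_batch)  # penultimate-layer activations
    y_lin = phi @ weights.T + bias             # should agree with model.predict_linear(X_batch)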

set_last_layer(last_layer_name: str) None[source]

Set the last layer of the model by its name. This sets the forward hook to get the output of the penultimate layer.

Parameters:

last_layer_name – the name of the last layer (fixed in model.named_modules()).
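For example, candidate names can be read off from model.named_modules(); the name '2' below comes from the toy Sequential in the sketch above:

    for name, module in net.named_modules():
        print(name, type(module).__name__)  # '', '0', '1', '2' for the toy Sequential
    model.set_last_layer('2')               # the final torch.nn.Linear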

find_last_layer(X: ArrayLike)[source]

Automatically determines the last layer of the model with one forward pass. It assumes that the last layer is the same for every forward pass and that it is an instance of torch.nn.Linear. Might not work with every architecture, but is tested with all PyTorch torchvision classification models (besides SqueezeNet, which has no linear last layer).

Parameters:

X – batch of samples used to find last layer.

Returns:

Returns the output of the forward pass, so as not to waste computation.

covariance(X)

Returns the covariance of the epistemic uncertainty between all rows of X. This is where memory bugs often appear, because of the large matrices involved.
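Illustrative only: for a batch of m inputs, the result holds pairwise covariances over the m rows, so memory grows roughly quadratically with batch size (the exact shape is not pinned down by the docstring):

    cov = model.covariance(X_batch)  # pairwise epistemic covariances over the 8-row batch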

covariance_linear(X, block_size=1000)
property data
property embedding_method
find_method()
fit(X=None, y=None, reinitialize=None, fit_uncertainty=True, **kwargs)

Fits the model to the given training data. If X and y are not specified, this method looks for self.X and self.y. If fit() finds an X but not a y, it treats X as a combined dataset data, and then uses X, y = data.X, data.y. If we can’t find data.X and data.y, we instead use X, y = data[:-1], data[-1].

fit() should also fit any accompanying uncertainty model.

Parameters:
  • reinitialize – If True, reinitializes model weights before fitting. If False, starts training from previous weight values. If not specified, uses self.reinitialize.

  • fit_uncertainty – If True, a call to fit() will also call fit_uncertainty(). Defaults to True.
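For concreteness, a sketch of the three calling conventions described above (X_train, y_train, and data are placeholders):

    model.fit(X_train, y_train)   # explicit inputs and targets
    model.fit(data)               # combined dataset: uses data.X and data.y if present,
                                  # otherwise X, y = data[:-1], data[-1]
    model.fit()                   # falls back to self.X and self.y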

fit_model(X=None, y=None, **kwargs)

Fit just the model component, and not the uncertainties (if these are computed separately)

fit_uncertainty(X=None, y=None)

Fit just the uncertainties (if these need additional fitting beyond just the model)

initialize(init_seed=None, sample_input=None)

(Re)initializes the model weights. If self.reinitialize is True, this should be called at the start of every fit(), and this should be the default behaviour of fit().

input_embedding(X)
static load(path)

Loads a model. This particular implementation only works if save(path) hasn’t been overloaded.

method_names = {0: ['_embedding', 'embed', 'embeddings'], 1: ['last_layer_embedding', 'last_layer_embed', 'embed_last_layer', 'last_layer'], 2: ['input_embedding']}
property ndim

The number of axes in the feature space. Equal to len(self.shape). Most commonly equal to 1. If training data have been specified, then self.ndim == X.ndim - 1.

This property is used by any methods which use the @flatten_batch decorator.
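For instance, continuing the sketch above (the relationships below follow directly from the definitions of shape and ndim):

    X_train = torch.randn(1000, 16)  # 1000 samples, 16 features
    # Once this is the training data: model.shape == (16,) and model.ndim == 1.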

abstract predict(X, return_std_dev=False)

Applies the model to input(s) X (with the last self.ndim axes corresponding to each sample), and returns prediction(s).

Parameters:

return_std_dev – if True, returns a tuple (prediction, std_dev)
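Typical usage on a concrete subclass (predict is abstract here):

    y_pred = model.predict(X_batch)
    y_pred, std = model.predict(X_batch, return_std_dev=True)  # documented tuple form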

predict_ensemble(X, multiple=1.0)

Returns a correlated ensemble of predictions for samples X.

Ensembles are correlated only over the last batch dimension, corresponding to axis (-1 - self.ndim) of X. Earlier dimensions have no guarantee of correlation.

Parameters:

multiple – standard deviation will be multiplied by this

predict_linear(X)

Use the model’s linearization for a prediction.

predict_samples(X, n=1, multiple=1.0, use_covariance_for_ensemble=None)

Makes a prediction for the batch X, randomly selected from this model’s posterior distribution. Gives an ensemble of predictions, with shape (len(X), n).
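The documented output shape makes the sketch simple:

    samples = model.predict_samples(X_batch, n=20)
    # samples has shape (len(X_batch), 20): one column per posterior draw.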

save(path)

Saves the model. May well be overloaded by subclasses, if they contain non-picklable components (or pickling would be inefficient).

For any subclass, the save() and load() methods should be compatible with each other.
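A round-trip sketch ('model.pkl' is a placeholder path):

    model.save("model.pkl")
    restored = LastLayerPytorchLinearization.load("model.pkl")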

property shape

The shape of the feature space. Can either be specified directly, or inferred from training data, in which case self.shape == X.shape[1:], i.e., the first (batch) dimension is dropped.

This property is used by any methods which use the @flatten_batch decorator.

std_dev(X, **kwargs)

Returns the (epistemic) standard deviation of the model on input X.

alien.models.pytorch.lightning module

alien.models.pytorch.pytorch module

alien.models.pytorch.pytorch.init_weights(module)[source]
alien.models.pytorch.pytorch.init_bias(module, torch)[source]
class alien.models.pytorch.pytorch.PytorchRegressor(model=None, X=None, y=None, **kwargs)[source]

Bases: LastLayerPytorchLinearization, MCDropoutRegressor, LinearizableLaplaceRegressor

Parameters:

trainer

Specifies how the model will be trained. May be:

  • 'model' — calls self.model.fit

  • 'lightning' — trains with a pytorch-lightning trainer

  • a trainer object — calls trainer.fit

  • None — chooses from the above, in order, if available
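A minimal construction sketch, assuming the documented keyword arguments; the network and data below are placeholders:

    import torch
    from alien.models.pytorch.pytorch import PytorchRegressor

    net = torch.nn.Sequential(
        torch.nn.Linear(16, 64),
        torch.nn.ReLU(),
        torch.nn.Dropout(0.2),
        torch.nn.Linear(64, 1),
    )
    X_train, y_train = torch.randn(1000, 16), torch.randn(1000)

    reg = PytorchRegressor(model=net, X=X_train, y=y_train)
    reg.fit()  # trainer=None: a training route is chosen automatically, as above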

fix_dropouts()[source]

Retools dropouts for MC dropout prediction
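For orientation, the general MC-dropout idea in plain PyTorch terms (this illustrates the technique, not ALIEN’s internals):

    # Keep dropout stochastic at inference, then aggregate several forward passes.
    net.eval()
    for m in net.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()  # re-enable only the dropout layers
    X_batch = torch.randn(8, 16)
    with torch.no_grad():
        draws = torch.stack([net(X_batch) for _ in range(20)])  # 20 stochastic passes
    mean, std = draws.mean(dim=0), draws.std(dim=0)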

get_lightning_trainer(random_seed=None, **kwargs)[source]

Returns a LightningTrainer object built from the current model.

fit_model(X=None, y=None, reinitialize=None, init_seed=None, **kwargs)[source]

Fit just the model component, and not the uncertainties (if these are computed separately)

initialize(init_seed=None, sample_input=None)[source]

(Re)initializes the model weights. If self.reinitialize is True, this should be called at the start of every fit(), and this should be the default behaviour of fit().

predict(X, return_std_dev=False, convert_dtype=True)[source]

Applies the model to input(s) X (with the last self.ndim axes corresponding to each sample), and returns prediction(s).

Parameters:

return_std_dev – if True, returns a tuple (prediction, std_dev)

covariance(X)

Returns the covariance of the epistemic uncertainty between all rows of X. This is where memory bugs often appear, because of the large matrices involved.

covariance_ensemble(X: ArrayLike)

Compute covariance from the ensemble of predictions

covariance_laplace(X, **kwargs)

Computes covariance using the Laplace approximation

covariance_linear(X, block_size=1000)
property data
property embedding_method
find_last_layer(X: ArrayLike)

Automatically determines the last layer of the model with one forward pass. It assumes that the last layer is the same for every forward pass and that it is an instance of torch.nn.Linear. Might not work with every architecture, but is tested with all PyTorch torchvision classification models (besides SqueezeNet, which has no linear last layer).

Parameters:

X – batch of samples used to find last layer.

Returns:

Returns the output of the forward pass, so as not to waste computation.

find_method()
fit(X=None, y=None, reinitialize=None, fit_uncertainty=True, **kwargs)

Fits the model to the given training data. If X and y are not specified, this method looks for self.X and self.y. If fit() finds an X but not a y, it treats X as a combined dataset data, and then uses X, y = data.X, data.y. If we can’t find data.X and data.y, we instead use X, y = data[:-1], data[-1].

fit() should also fit any accompanying uncertainty model.

Parameters:
  • reinitialize – If True, reinitializes model weights before fitting. If False, starts training from previous weight values. If not specified, uses self.reinitialize.

  • fit_uncertainty – If True, a call to fit() will also call fit_uncertainty(). Defaults to True.

fit_laplace(X=None, y=None, lamb=None)

Fits the Laplace approximation to the (last layer of) the model.
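A hedged sketch; the role of lamb is not documented here (a prior-precision-style parameter is an assumption):

    reg.fit_laplace()                      # fit the last-layer Laplace approximation
    cov = reg.covariance_laplace(X_batch)  # then query Laplace-based covariances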

fit_uncertainty(X=None, y=None)

Fit just the uncertainties (if these need additional fitting beyond just the model)

input_embedding(X)
last_layer_embedding(X)

Returns the activations of the last layer before the output.

linearization()

Finds the last-layer linearization of the model in its current state.

Returns:

weights, bias

static load(path)

Loads a model. This particular implementation only works if save(path) hasn’t been overloaded.

method_names = {0: ['_embedding', 'embed', 'embeddings'], 1: ['last_layer_embedding', 'last_layer_embed', 'embed_last_layer', 'last_layer'], 2: ['input_embedding']}
property ndim

The number of axes in the feature space. Equal to len(self.shape). Most commonly equal to 1. If training data have been specified, then self.ndim == X.ndim - 1.

This property is used by any methods which use the @flatten_batch decorator.

predict_ensemble(X, **kwargs)

Returns an ensemble of predictions.

Parameters:

multiple – standard deviation will be multiplied by this

predict_linear(X)

Use the model’s linearization for a prediction.

predict_samples(X, n=1, multiple=1.0, convert_dtype=True)[source]

Makes a prediction for the batch X, randomly selected from this model’s posterior distribution. Gives an ensemble of predictions, with shape (len(X), n).

predict_with_embedding(X)

Forward pass which returns the output of the penultimate layer along with the output of the last layer. If the last layer is not known yet, it will be determined when this function is called for the first time.

Parameters:

X – one batch of data to use as input for the forward pass

save(path)

Saves the model. May well be overloaded by subclasses, if they contain non-picklable components (or pickling would be inefficient).

For any subclass, the save() and load() methods should be compatible with each other.

set_last_layer(last_layer_name: str) None

Set the last layer of the model by its name. This sets the forward hook to get the output of the penultimate layer.

Parameters:

last_layer_name – the name of the last layer (fixed in model.named_modules()).

property shape

The shape of the feature space. Can either be specified directly, or inferred from training data, in which case self.shape == X.shape[1:], i.e., the first (batch) dimension is dropped.

This property is used by any methods which use the @flatten_batch decorator.

std_dev(X, **kwargs)

Returns the (epistemic) standard deviation of the model on input X.

std_dev_ensemble(X)

Returns the (epistemic) standard deviation of the model on input X.

property dtype

alien.models.pytorch.training_limits module

alien.models.pytorch.training_limits.get_training_limit(fn)[source]

*Decorator*

Modifies a function so that it can take any of a number of different naive arguments to specify limits to pytorch training. The inner (wrapped) function will see only an argument training_limit, which receives an instance of the fancy TrainingLimit class.

Thus, the inner fn must have an argument named training_limit (or **kwargs).

The outer, decorated fn may take additional optional kwargs:

'sample_limit', 'batch_limit', 'epoch_limit', 'samples', 'batches', 'epochs'

If these are given, they determine the value of training_limit according to a calculation that favors them in the order provided (though you may only want to provide one).
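A usage sketch under the stated contract (model, data, and the function body are hypothetical):

    from alien.models.pytorch.training_limits import get_training_limit

    @get_training_limit
    def train(model, data, training_limit=None, **kwargs):
        # receives a TrainingLimit instance assembled from the caller's kwargs
        ...

    train(model, data, epochs=10)  # 'epochs' is folded into training_limit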

class alien.models.pytorch.training_limits.TrainingLimit(min_samples: int = 0, min_epochs: float = 0, samples: int | None = None, epochs: float | None = None, batches: int | None = None, max_samples: float = inf, max_epochs: float = inf)[source]

Bases: object

Encapsulates the computation of training limits, which may depend on things like dataset length.

min_samples: int = 0
min_epochs: float = 0
samples: int | None = None
epochs: float | None = None
batches: int | None = None
max_samples: float = inf
max_epochs: float = inf
sample_limit(length=None)[source]
batch_limit(batch_size=None, length=None)[source]
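A construction sketch; the exact resolution rules of sample_limit() and batch_limit() are not spelled out above, so the comments are assumptions:

    from alien.models.pytorch.training_limits import TrainingLimit

    limit = TrainingLimit(epochs=5, max_samples=10_000)
    n_samples = limit.sample_limit(length=1500)                # presumably min(5 * 1500, 10_000)
    n_batches = limit.batch_limit(batch_size=32, length=1500)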
class alien.models.pytorch.training_limits.StdLimit(**kwargs)[source]

Bases: TrainingLimit

batch_limit(batch_size=None, length=None)
batches: int | None = None
epochs: float | None = None
max_epochs: float = inf
max_samples: float = inf
min_epochs: float = 0
min_samples: int = 0
sample_limit(length=None)
samples: int | None = None

alien.models.pytorch.utils module

Helper functions for Pytorch models.

alien.models.pytorch.utils.as_tensor(x)[source]

Returns the tensor version of x (i.e., x itself if it is already ArrayLike, or x.data otherwise).

alien.models.pytorch.utils.dropout_forward(self, x)[source]
alien.models.pytorch.utils.submodules(module, include_names=True, skip=frozenset({}))[source]

Iterates over the submodules of module (paired with their names, if include_names is True). A submodule is returned only once, on its first occurrence in a depth-first traversal.

Any modules in skip (given either as the actual module, or its name) will be skipped, along with all their submodules.

Parameters:
  • module (torch.nn.Module) – The module whose submodules we will iterate through.

  • include_names (bool) – If True, the iterator yields pairs (name, submodule); otherwise it yields just submodule. (The name returned is the one under which the submodule is first indexed in the tree; if the submodule is unnamed, its name is returned as None.)

  • skip – A collection of modules to skip. May contain the modules themselves and/or their names.
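A usage sketch (the 'encoder' entry in skip is hypothetical):

    from alien.models.pytorch.utils import submodules

    for name, sub in submodules(net, include_names=True, skip={"encoder"}):
        print(name, type(sub).__name__)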

Module contents