alien.benchmarks

Submodules

alien.benchmarks.metrics module

Module for computing metrics and plotting: confidence intervals, RMSE, scatter plots.

alien.benchmarks.metrics.conf_int(confidence_level, standard_error, len_x: int | ArrayLike | None = None)[source]

Compute a confidence interval using a normal or t-distribution.

Parameters:
  • confidence_level (float) – Confidence level, e.g., 0.95 for a 95% interval.

  • standard_error (float | ArrayLike) – Standard error(s) of the estimate.

  • len_x (int | ArrayLike, optional) – Sample size (or data whose length gives the sample size); if provided, a t-distribution is used, otherwise a normal distribution. Defaults to None.

Returns:

The half-width of the confidence interval at the given confidence level.

Return type:

float | ArrayLike

alien.benchmarks.metrics.sem(x: ArrayLike, axis=0) ArrayLike[source]

Compute standard error of the mean.

Parameters:
  • x (ArrayLike) – Array to compute the SEM for.

  • axis (int, optional) – Axis of x along which to compute. Defaults to 0.

Returns:

Array of computed SEMs.

Return type:

ArrayLike
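
A minimal usage sketch combining sem and conf_int. Treating the return value of conf_int as the half-width of the interval is an assumption, since the docstring above leaves it unspecified:

    import numpy as np
    from alien.benchmarks.metrics import sem, conf_int

    # Five runs of a benchmark, four measurement points each (rows = runs).
    runs = np.array([
        [0.91, 0.88, 0.85, 0.83],
        [0.93, 0.90, 0.86, 0.84],
        [0.90, 0.87, 0.84, 0.82],
        [0.92, 0.89, 0.85, 0.83],
        [0.94, 0.91, 0.87, 0.85],
    ])

    mean = runs.mean(axis=0)
    err = sem(runs, axis=0)  # standard error of the mean, per column

    # Assumed: conf_int returns the half-width, so the interval is mean ± half_width.
    half_width = conf_int(0.95, err, len_x=runs.shape[0])
    print(mean - half_width, mean + half_width)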

class alien.benchmarks.metrics.Score(x=array([], dtype=float64), y=array([], dtype=float64), err=None, name: str | None = None, file_path: str | None = None, axes=None, plot_args: Dict | None = None)[source]

Bases: object

default_filename = '*.pickle'
append(x_val: float, y_val: float)[source]

Append a point to self.x and self.y.

Parameters:
  • x_val (float) – x value to append

  • y_val (float) – y value to append

save(file_path: str | None = None)[source]

Save Score object to given filepath.

Parameters:

file_path (Optional[str], optional) – Path to save object. Defaults to None.

static load(file_path: str)[source]

Load a Score object from a file.

Parameters:

file_path (str) – Path of the file to load from.

Returns:

The loaded Score object.

Return type:

Score

static load_many(*scores, filename='*.pickle') List[source]

Loads many Scores at once, and returns them as a list.

Parameters:

*scores – several filenames, or one filename with wildcards, or one folder name (which will have wildcards appended)

Returns:

List of loaded Score objects.

Return type:

list[Score]

static average_runs(*args, length: str = 'longest', err=True, name: str | None = None, file_path: str | None = None, save: bool = False, filename='*.pickle')[source]

Average several runs into a single Score.

Parameters:
  • args – Score objects to average, or filenames/folders from which to load them (as in load_many).

  • length (str, optional) – How to reconcile runs of different lengths. Defaults to “longest”.

  • name (Optional[str], optional) – Name for the averaged Score. Defaults to None.

  • file_path (Optional[str], optional) – File path to save object. Defaults to None.

  • save (bool, optional) – whether to save returned object. Defaults to False.

Raises:

NotImplementedError – If an option is requested that is not yet implemented.

Returns:

The averaged Score.

Return type:

Score
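
A sketch of the basic Score workflow using the methods documented above. The file paths are hypothetical, and two details are assumptions inferred from the docstrings rather than stated by them: that save() with no argument falls back to the file_path given at construction, and that average_runs accepts a wildcard pattern like load_many.

    from alien.benchmarks.metrics import Score

    # Record a metric as a function of the number of labelled samples.
    score = Score(name="run_0", file_path="runs/run_0/score.pickle")
    for n_samples, rmse in [(20, 1.4), (40, 1.1), (60, 0.9)]:
        score.append(n_samples, rmse)
    score.save()  # assumed to fall back to the file_path given at construction

    # Later: load a single run back, or average several saved runs.
    run_0 = Score.load("runs/run_0/score.pickle")
    avg = Score.average_runs("runs/run_*/score.pickle", name="average")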

class alien.benchmarks.metrics.TopScore(x=array([], dtype=float64), y=array([], dtype=float64), err=None, name: str | None = None, file_path: str | None = None, axes=None, plot_args: Dict | None = None)[source]

Bases: Score

compute(x, labels: ArrayLike, average_over: int = 1)[source]

Compute top score.

Parameters:
  • x (_type_) – x value (e.g., number of labelled samples) at which to record the score.

  • labels (ArrayLike) – Label values from which to compute the top score.

  • average_over (int, optional) – Number of top label values to average over. Defaults to 1.

Returns:

The computed top score.

Return type:

float

append(x_val: float, y_val: float)

Append a point to self.x and self.y.

Parameters:
  • x_val (float) – x value to append

  • y_val (float) – y value to append

static average_runs(*args, length: str = 'longest', err=True, name: str | None = None, file_path: str | None = None, save: bool = False, filename='*.pickle')

Average several runs into a single Score.

Parameters:
  • args – Score objects to average, or filenames/folders from which to load them (as in load_many).

  • length (str, optional) – How to reconcile runs of different lengths. Defaults to “longest”.

  • name (Optional[str], optional) – Name for the averaged Score. Defaults to None.

  • file_path (Optional[str], optional) – File path to save object. Defaults to None.

  • save (bool, optional) – whether to save returned object. Defaults to False.

Raises:

NotImplementedError – If an option is requested that is not yet implemented.

Returns:

The averaged Score.

Return type:

Score

default_filename = '*.pickle'
static load(file_path: str)

Load a Score object from a file.

Parameters:

file_path (str) – Path of the file to load from.

Returns:

The loaded Score object.

Return type:

Score

static load_many(*scores, filename='*.pickle') List

Loads many Scores at once, and returns them as a list.

Parameters:

*scores – several filenames, or one filename with wildcards, or one folder name (which will have wildcards appended)

Returns:

List of loaded Score objects.

Return type:

list[Score]

save(file_path: str | None = None)

Save Score object to given filepath.

Parameters:

file_path (Optional[str], optional) – Path to save object. Defaults to None.
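
A hypothetical sketch of TopScore, assuming compute records the best label value seen so far (averaged over the top average_over values) against an x value such as the number of labelled samples; the docstring above leaves these details unspecified:

    import numpy as np
    from alien.benchmarks.metrics import TopScore

    top = TopScore(name="best_found")

    # After a selection round, record the top score over the labels gathered so far.
    labels_so_far = np.array([0.2, 0.7, 0.4, 0.9, 0.6])
    top.compute(len(labels_so_far), labels_so_far, average_over=3)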

class alien.benchmarks.metrics.RMSE(*args, scatter=None, axes: Tuple = ('samples', 'RMSE'), **kwargs)[source]

Bases: Score

compute(a0=None, a1=None)[source]
static from_folder(folder: str, name: str | None = None, file_path: str | None = None, save: bool = False)[source]
Parameters:
  • folder (str) – Folder to load from.

  • name (Optional[str], optional) – Name for the resulting Score. Defaults to None.

  • file_path (Optional[str], optional) – File path to save the object. Defaults to None.

  • save (bool, optional) – whether to save the returned object. Defaults to False.
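
A hypothetical sketch of from_folder, assuming it builds an RMSE curve from the files saved in a run directory (the folder layout shown is illustrative only):

    from alien.benchmarks.metrics import RMSE

    rmse = RMSE.from_folder(
        "runs/run_0",                        # hypothetical run directory
        name="run_0_rmse",
        file_path="runs/run_0/rmse.pickle",  # where to save the result
        save=True,
    )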

append(x_val: float, y_val: float)

Append a point to self.x and self.y.

Parameters:
  • x_val (float) – x value to append

  • y_val (float) – y value to append

static average_runs(*args, length: str = 'longest', err=True, name: str | None = None, file_path: str | None = None, save: bool = False, filename='*.pickle')

Average several runs into a single Score.

Parameters:
  • args – Score objects to average, or filenames/folders from which to load them (as in load_many).

  • length (str, optional) – How to reconcile runs of different lengths. Defaults to “longest”.

  • name (Optional[str], optional) – Name for the averaged Score. Defaults to None.

  • file_path (Optional[str], optional) – File path to save object. Defaults to None.

  • save (bool, optional) – whether to save returned object. Defaults to False.

Raises:

NotImplementedError – If an option is requested that is not yet implemented.

Returns:

The averaged Score.

Return type:

Score

default_filename = '*.pickle'
static load(file_path: str)

Load a Score object from a file.

Parameters:

file_path (str) – Path of the file to load from.

Returns:

The loaded Score object.

Return type:

Score

static load_many(*scores, filename='*.pickle') List

Loads many Scores at once, and returns them as a list.

Parameters:

*scores – several filenames, or one filename with wildcards, or one folder name (which will have wildcards appended)

Returns:

List of loaded Score objects.

Return type:

list[Score]

save(file_path: str | None = None)

Save Score object to given filepath.

Parameters:

file_path (Optional[str], optional) – Path to save object. Defaults to None.

class alien.benchmarks.metrics.Scatter(labels=None, preds=None, errs=None, model=None, test=None, samples=None, name=None, axes=None, file=None, plot_args: Dict | None = None)[source]

Bases: object

compute(get_errs=None, samples=None)[source]
RMSE()[source]
plot(show_errors=True, axes=None, show=True, show_diagonal=True, block=True)[source]
save(file=None)[source]
static load(file)[source]
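
A minimal sketch of the Scatter workflow, assuming labels are ground-truth values and preds/errs the model's predictions and predicted uncertainties; the return value of RMSE() is assumed to be a scalar:

    import numpy as np
    from alien.benchmarks.metrics import Scatter

    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    y_pred = np.array([1.1, 1.8, 3.3, 3.9])
    y_err = np.array([0.2, 0.3, 0.4, 0.2])

    scatter = Scatter(labels=y_true, preds=y_pred, errs=y_err, name="test-set parity")
    print(scatter.RMSE())                   # scalar RMSE of preds vs. labels (assumed)
    scatter.plot(show_errors=True, show_diagonal=True)
    scatter.save("scatter.pickle")          # file name hypothetical
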
alien.benchmarks.metrics.plot_scores(*XY, xlabel='Compounds / Number', ylabel='Error / RMSE', show_err=True, confidence=0.95, grid=True, ticks=True, tight_layout=True, dpi=800, figsize=(5, 4), fontsize=12, xlim=None, ylim=None, xmin=None, xmax=None, ymin=None, ymax=None, show=True, block=True, save=False, file_path=None, title=None, name=None, legend=None, axes=None, **kwargs)[source]

Plots one or more scores and returns the matplotlib.axes.Axes it draws on. You can use this to modify the plot after the fact.

Parameters:
  • xlabel – Label for x-axis

  • ylabel – Label for y-axis

  • show_err – If True, shows error bands when available

  • confidence – The confidence threshold of the error bands

  • grid – If True, show a dashed gray grid

  • ticks – If True, show ticks on the axes

  • tight_layout – If True, calls matplotlib’s tight_layout

  • dpi – DPI for saved figures

  • figsize – Size of the figure in matplotlib units

  • fontsize – Font size for legend, axis labels

  • xlim – Can be either an ordered pair (xmin, xmax), or a dictionary {'xmin':xmin, 'xmax':xmax}. In fact, the dictionary may have any subset of the arguments to matplotlib.axes.Axes.set_xlim.

  • ylim – Like xlim, but with (ymin, ymax).

  • xmin, xmax, ymin, ymax – Alternatively, you can pass plot limits directly as kwargs.

  • show – Whether to call matplotlib.pyplot.show

  • block – Whether the plot display should be blocking. Defaults to True.

  • save – If save == True, or if file_path is given, saves the figure to a file. If file_path is specified, uses that filename. If file_path is not specified, builds a filename by sanitizing title or name.

  • file_path – See above

  • title, name – title and name are synonyms, specifying the plot title

  • legend – Whether or not to show a legend

  • axes – You can specify matplotlib axes to plot into

Additional keyword arguments are passed to the plot function.

Note about titles/filenames: If you just want to give a name for the purpose of saving to a unique file, specify file_path. If you also want to show a title, there is no need to specify file_path; you can just specify title or name.
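
A sketch of plotting several averaged runs together, assuming plot_scores accepts Score objects as its positional arguments (file paths hypothetical):

    from alien.benchmarks.metrics import Score, plot_scores

    cov = Score.average_runs("runs/covariance/run_*/score.pickle", name="covariance")
    rnd = Score.average_runs("runs/random/run_*/score.pickle", name="random")

    ax = plot_scores(
        cov, rnd,
        xlabel="Compounds / Number",
        ylabel="Error / RMSE",
        confidence=0.95,
        title="Retrospective benchmark",
        save=True, file_path="plots/benchmark.png",
    )
    ax.set_ylim(0, 2)  # the returned Axes can be modified after the fact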

alien.benchmarks.oracle module

class alien.benchmarks.oracle.Oracle[source]

Bases: object

abstract get_label(x, remove=False)[source]
get_labels(x, remove=False)[source]
class alien.benchmarks.oracle.SetOracle(*args, **kwargs)[source]

Bases: SetSampleGenerator, Oracle

get_label(x, remove=False)[source]
generate_sample()

Generates and returns a single sample.

generate_samples(N=inf, reshuffle=False)

Generates and returns N samples.

Parameters:

N – usually an integer. Different generators will interpret N == inf in different ways; typically the generator will return “all” samples, perhaps as an iterable.

get_labels(x, remove=False)
property labels
remove_data_indices(indices)

Remove data indices and shift self.pointer accordingly.

remove_sample(sample)

Single-sample version of remove_samples.

remove_samples(samples)

‘Removes’ or, rather, hides samples from this generator. Hidden samples are still stored in self.data, but will not appear in any future calls to generate_samples.

reshuffle()

Reshuffles current indices.
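
A minimal sketch of a custom Oracle: a toy subclass that looks labels up in a dictionary (the subclass and its data are hypothetical, not part of the library):

    from alien.benchmarks.oracle import Oracle

    class DictOracle(Oracle):
        """Toy oracle backed by a dictionary of known labels."""

        def __init__(self, label_map):
            self.label_map = dict(label_map)

        def get_label(self, x, remove=False):
            label = self.label_map[x]
            if remove:
                del self.label_map[x]
            return label

    oracle = DictOracle({"sample_a": 1.2, "sample_b": 0.7})
    print(oracle.get_label("sample_a"))  # 1.2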

alien.benchmarks.retrospective module

alien.benchmarks.retrospective.run_experiments(X, y, model, runs_dir, overwrite_old_runs=True, n_initial=None, batch_size=20, num_samples=inf, selector=None, selector_args=None, fit_args=None, n_runs=10, ids=None, save_ids=True, random_seed=1, split_seed=421, test_size=0.2, timestamps=None, stop_samples=None, stop_rmse=None, stop_frac=None, peek_score=0, test_samples_x=None, test_samples_y=None)[source]
Parameters:
  • runs_dir – directory to store the runs and results of this training (each run in separate subdirectories)

  • n_initial – number of samples to randomly select for initial training data

  • batch_size – number of samples selected per batch

  • num_samples – number of samples (drawn from the sample pool X) to select from. Default is inf, which takes all of the samples available in X.

  • selector – the selector to use for batch selection, either given by one of the strings ‘covariance’, ‘random’, ‘expected improvement’/’ei’, ‘greedy’, or passed as an actual SampleSelector instance. Defaults to ‘covariance’.

  • selector_args

    a dictionary passed as kwargs to the selector constructor. The following constructor arguments are already automatically included, and don’t need to be included in this dictionary:

    model, labelled_samples, samples, num_samples, batch_size

  • fit_args – a dictionary passed as kwargs each time model.fit(…) is called. Typically, this is model- or framework-specific; so, eg., different arguments would be appropriate for pytorch models, DeepChem models, etc.

  • n_runs – the number of overall runs (each starting from a random initial selection) to do (for averaging)

  • random_seed – random seed for most random number generation

  • split_seed – random seed for shuffling and splitting of data

  • test_size – the size of the test/validation set to take from X,y. If test_size >= 1, then takes that many samples. If 0 < test_size < 1, takes that fraction of the dataset size.

  • stop_samples – if this is not None, stops an experiment run when this many samples are labelled. Defaults to None

  • stop_rmse – if this is not None, stops an experiment run when this RMSE has been reached. Defaults to None

  • stop_frac – if this is not None, stops an experiment run when the RMSE has moved this fraction of the way from the RMSE after the first round to the RMSE trained on the whole dataset. We suggest something like 0.85 if you want to use this feature. Defaults to None

alien.benchmarks.uncertainty_metrics module

alien.benchmarks.uncertainty_metrics.KL_divergence(preds, std_devs, y, noise=None, normalize=True)[source]

Computes the KL-divergence from the predicted distribution (using preds and std_devs, assuming normal distributions) to the ground-truth distribution (all of the probability mass on the true y values). In other words, this tells you how much information would be gained by learning the true values. Averaged over sample points.

Lower is generally better. Not only does this penalize uncertainties which are poorly calibrated (i.e., where the actual error distribution has a different standard deviation), but it also penalizes uncertainties which are not as specific as they could be, i.e., which fail to discriminate between certain and uncertain predictions.

Parameters:
  • preds – predicted values

  • std_devs – predicted uncertainties, given as standard deviations

  • y – true values

  • normalize – if True (the default), normalizes with respect to the RMSE
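
A small sketch of scoring predicted uncertainties against ground truth: well-calibrated standard deviations should give a lower (better) KL-divergence than overconfident ones:

    import numpy as np
    from alien.benchmarks.uncertainty_metrics import KL_divergence

    rng = np.random.default_rng(0)
    y = rng.normal(size=200)                  # true values
    preds = y + 0.3 * rng.normal(size=200)    # predictions with ~0.3 error
    std_devs = np.full(200, 0.3)              # claimed uncertainties

    well_calibrated = KL_divergence(preds, std_devs, y)
    overconfident = KL_divergence(preds, std_devs / 10, y)
    print(well_calibrated, overconfident)     # the first should be lower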

alien.benchmarks.uncertainty_metrics.best_multiple(preds, std_devs, y, noise=None, max_precision=5)[source]

Does a simple binary search to find which multiple of std_devs gives the lowest KL-divergence score. To converge, assumes there is only one local minimum. (I expect this to be true, but I will have to check.)

Arguments preds, std_devs, y and noise are as in KL_divergence, except this will compute the KL-divergence for multiples of std_devs.

Parameters:

max_precision – the number of interval splits the score must be adjacent to before returning the value. Defaults to 5.

alien.benchmarks.uncertainty_metrics.binary_optimize(fn, *args, mode='max', start=1.0, max_precision=5, max_iterations=50, **kwargs)[source]

Does a simple binary search to find which scalar value gives the best (max/min) value of fn. Optimization converges to a local max/min.

Each search iteration starts with the previously-explored value with the best score, and looks on either side of it. If the best score is at the beginning of the current list of values, it looks at half this value on the low side; if at the end of the list, it looks at twice this value on the high side. If the best value is somewhere in the middle, it divides the interval on either side in half.

Parameters:

max_precision – the number of interval splits the score must be adjacent to before returning the value. Defaults to 5.
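
A sketch of binary_optimize on a simple one-dimensional function. The calling convention (fn receives the scalar being optimized as its first argument) and the return value (the best scalar found) are assumptions, since the docstring does not spell them out:

    from alien.benchmarks.uncertainty_metrics import binary_optimize

    # A smooth function of one positive scalar, maximized at s = 2.
    def fn(s):
        return -(s - 2.0) ** 2

    best = binary_optimize(fn, mode="max", start=1.0, max_precision=5)
    print(best)  # assumed to approach 2.0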

alien.benchmarks.uncertainty_metrics.search_lower(i, vals, precision, max_precision=5)[source]
alien.benchmarks.uncertainty_metrics.search_upper(i, vals, precision, max_precision=5)[source]
alien.benchmarks.uncertainty_metrics.plot_errors(preds, std_devs, y, noise=None, show=True, axes=None, **kwargs)[source]

Module contents