alien.benchmarks

Submodules

alien.benchmarks.metrics module

Module for computing metrics and plotting: confidence intervals, RMSE, scatter plots.

alien.benchmarks.metrics.conf_int(confidence_level, standard_error, len_x: int | ArrayLike | None = None)[source]

Compute a confidence interval using a normal or t-distribution.

Parameters:
  • confidence_level (float) – Confidence level, e.g., 0.95 for a 95% interval.

  • standard_error (float | ArrayLike) – Standard error(s) of the estimate.

  • len_x (int | ArrayLike, optional) – Sample size (or data whose length gives the sample size); if provided, a t-distribution is used, otherwise a normal distribution. Defaults to None.

Returns:

The half-width of the confidence interval at the given confidence level.

Return type:

float | ArrayLike

alien.benchmarks.metrics.sem(x: ArrayLike, axis=0) ArrayLike[source]

Compute standard error of the mean.

Parameters:
  • x (ArrayLike) – Array to compute the SEM for.

  • axis (int, optional) – Axis of x along which to compute. Defaults to 0.

Returns:

Array of computed SEMs.

Return type:

ArrayLike
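
A minimal usage sketch combining sem and conf_int. Treating the return value of conf_int as the half-width of the interval is an assumption, since the docstring above leaves it unspecified:

    import numpy as np
    from alien.benchmarks.metrics import sem, conf_int

    # Five runs of a benchmark, four measurement points each (rows = runs).
    runs = np.array([
        [0.91, 0.88, 0.85, 0.83],
        [0.93, 0.90, 0.86, 0.84],
        [0.90, 0.87, 0.84, 0.82],
        [0.92, 0.89, 0.85, 0.83],
        [0.94, 0.91, 0.87, 0.85],
    ])

    mean = runs.mean(axis=0)
    err = sem(runs, axis=0)  # standard error of the mean, per column

    # Assumed: conf_int returns the half-width, so the interval is mean ± half_width.
    half_width = conf_int(0.95, err, len_x=runs.shape[0])
    print(mean - half_width, mean + half_width)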

class alien.benchmarks.metrics.Score(x=array([], dtype=float64), y=array([], dtype=float64), err=None, name: str | None = None, file_path: str | None = None, axes=None, plot_args: Dict | None = None)[source]

Bases: object

default_filename = '*.pickle'
append(x_val: float, y_val: float)[source]

Append a point to self.x and self.y.

Parameters:
  • x_val (float) – x value to append

  • y_val (float) – y value to append

save(file_path: str | None = None)[source]

Save Score object to given filepath.

Parameters:

file_path (Optional[str], optional) – Path to save object. Defaults to None.

static load(file_path: str)[source]

Load a Score object from a file.

Parameters:

file_path (str) – Path of the file to load from.

Returns:

The loaded Score object.

Return type:

Score

static load_many(*scores, filename='*.pickle') List[source]

Loads many Scores at once, and returns them as a list.

Parameters:

*scores – several filenames, or one filename with wildcards, or one folder name (which will have wildcards appended)

Returns:

List of loaded Score objects.

Return type:

list[Score]

static average_runs(*args, length: str = 'longest', err=True, name: str | None = None, file_path: str | None = None, save: bool = False, filename='*.pickle')[source]

Average several runs into a single Score.

Parameters:
  • args – Score objects to average, or filenames/folders from which to load them (as in load_many).

  • length (str, optional) – How to reconcile runs of different lengths. Defaults to “longest”.

  • name (Optional[str], optional) – Name for the averaged Score. Defaults to None.

  • file_path (Optional[str], optional) – File path to save object. Defaults to None.

  • save (bool, optional) – whether to save returned object. Defaults to False.

Raises:

NotImplementedError – If an option is requested that is not yet implemented.

Returns:

The averaged Score.

Return type:

Score
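
A sketch of the basic Score workflow using the methods documented above. The file paths are hypothetical, and two details are assumptions inferred from the docstrings rather than stated by them: that save() with no argument falls back to the file_path given at construction, and that average_runs accepts a wildcard pattern like load_many.

    from alien.benchmarks.metrics import Score

    # Record a metric as a function of the number of labelled samples.
    score = Score(name="run_0", file_path="runs/run_0/score.pickle")
    for n_samples, rmse in [(20, 1.4), (40, 1.1), (60, 0.9)]:
        score.append(n_samples, rmse)
    score.save()  # assumed to fall back to the file_path given at construction

    # Later: load a single run back, or average several saved runs.
    run_0 = Score.load("runs/run_0/score.pickle")
    avg = Score.average_runs("runs/run_*/score.pickle", name="average")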

class alien.benchmarks.metrics.TopScore(x=array([], dtype=float64), y=array([], dtype=float64), err=None, name: str | None = None, file_path: str | None = None, axes=None, plot_args: Dict | None = None)[source]

Bases: Score

compute(x, labels: ArrayLike, average_over: int = 1)[source]

Compute top score.

Parameters:
  • x (_type_) – x value (e.g., number of labelled samples) at which to record the score.

  • labels (ArrayLike) – Label values from which to compute the top score.

  • average_over (int, optional) – Number of top label values to average over. Defaults to 1.

Returns:

The computed top score.

Return type:

float

append(x_val: float, y_val: float)

Append a point to self.x and self.y.

Parameters:
  • x_val (float) – x value to append

  • y_val (float) – y value to append

static average_runs(*args, length: str = 'longest', err=True, name: str | None = None, file_path: str | None = None, save: bool = False, filename='*.pickle')

Average several runs into a single Score.

Parameters:
  • args – Score objects to average, or filenames/folders from which to load them (as in load_many).

  • length (str, optional) – How to reconcile runs of different lengths. Defaults to “longest”.

  • name (Optional[str], optional) – Name for the averaged Score. Defaults to None.

  • file_path (Optional[str], optional) – File path to save object. Defaults to None.

  • save (bool, optional) – whether to save returned object. Defaults to False.

Raises:

NotImplementedError – If an option is requested that is not yet implemented.

Returns:

The averaged Score.

Return type:

Score

default_filename = '*.pickle'
static load(file_path: str)

Load a Score object from a file.

Parameters:

file_path (str) – Path of the file to load from.

Returns:

The loaded Score object.

Return type:

Score

static load_many(*scores, filename='*.pickle') List

Loads many Scores at once, and returns them as a list.

Parameters:

*scores – several filenames, or one filename with wildcards, or one folder name (which will have wildcards appended)

Returns:

List of loaded Score objects.

Return type:

list[Score]

save(file_path: str | None = None)

Save Score object to given filepath.

Parameters:

file_path (Optional[str], optional) – Path to save object. Defaults to None.
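
A hypothetical sketch of TopScore, assuming compute records the best label value seen so far (averaged over the top average_over values) against an x value such as the number of labelled samples; the docstring above leaves these details unspecified:

    import numpy as np
    from alien.benchmarks.metrics import TopScore

    top = TopScore(name="best_found")

    # After a selection round, record the top score over the labels gathered so far.
    labels_so_far = np.array([0.2, 0.7, 0.4, 0.9, 0.6])
    top.compute(len(labels_so_far), labels_so_far, average_over=3)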

class alien.benchmarks.metrics.RMSE(*args, scatter=None, axes: Tuple = ('samples', 'RMSE'), **kwargs)[source]

Bases: Score

compute(a0=None, a1=None)[source]
static from_folder(folder: str, name: str | None = None, file_path: str | None = None, save: bool = False)[source]
Parameters:
  • folder (str) – Folder to load from.

  • name (Optional[str], optional) – Name for the resulting Score. Defaults to None.

  • file_path (Optional[str], optional) – File path to save the object. Defaults to None.

  • save (bool, optional) – whether to save the returned object. Defaults to False.
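
A hypothetical sketch of from_folder, assuming it builds an RMSE curve from the files saved in a run directory (the folder layout shown is illustrative only):

    from alien.benchmarks.metrics import RMSE

    rmse = RMSE.from_folder(
        "runs/run_0",                        # hypothetical run directory
        name="run_0_rmse",
        file_path="runs/run_0/rmse.pickle",  # where to save the result
        save=True,
    )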

append(x_val: float, y_val: float)

Append a point to self.x and self.y.

Parameters:
  • x_val (float) – x value to append

  • y_val (float) – y value to append

static average_runs(*args, length: str = 'longest', err=True, name: str | None = None, file_path: str | None = None, save: bool = False, filename='*.pickle')

Average several runs into a single Score.

Parameters:
  • args – Score objects to average, or filenames/folders from which to load them (as in load_many).

  • length (str, optional) – How to reconcile runs of different lengths. Defaults to “longest”.

  • name (Optional[str], optional) – Name for the averaged Score. Defaults to None.

  • file_path (Optional[str], optional) – File path to save object. Defaults to None.

  • save (bool, optional) – whether to save returned object. Defaults to False.

Raises:

NotImplementedError – If an option is requested that is not yet implemented.

Returns:

The averaged Score.

Return type:

Score

default_filename = '*.pickle'
static load(file_path: str)

Load a Score object from a file.

Parameters:

file_path (str) – Path of the file to load from.

Returns:

The loaded Score object.

Return type:

Score

static load_many(*scores, filename='*.pickle') List

Loads many Scores at once, and returns them as a list.

Parameters:

*scores – several filenames, or one filename with wildcards, or one folder name (which will have wildcards appended)

Returns:

List of loaded Score objects.

Return type:

list[Score]

save(file_path: str | None = None)

Save Score object to given filepath.

Parameters:

file_path (Optional[str], optional) – Path to save object. Defaults to None.

class alien.benchmarks.metrics.Scatter(labels=None, preds=None, errs=None, model=None, test=None, samples=None, name=None, axes=None, file=None, plot_args: Dict | None = None)[source]

Bases: object

compute(get_errs=None, samples=None)[source]
RMSE()[source]
plot(show_errors=True, axes=None, show=True, show_diagonal=True, block=True)[source]
save(file=None)[source]
static load(file)[source]
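
A minimal sketch of the Scatter workflow, assuming labels are ground-truth values and preds/errs the model's predictions and predicted uncertainties; the return value of RMSE() is assumed to be a scalar:

    import numpy as np
    from alien.benchmarks.metrics import Scatter

    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    y_pred = np.array([1.1, 1.8, 3.3, 3.9])
    y_err = np.array([0.2, 0.3, 0.4, 0.2])

    scatter = Scatter(labels=y_true, preds=y_pred, errs=y_err, name="test-set parity")
    print(scatter.RMSE())                   # scalar RMSE of preds vs. labels (assumed)
    scatter.plot(show_errors=True, show_diagonal=True)
    scatter.save("scatter.pickle")          # file name hypothetical
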
alien.benchmarks.metrics.plot_scores(*XY, xlabel='Compounds / Number', ylabel='Error / RMSE', show_err=True, confidence=0.95, grid=True, ticks=True, tight_layout=True, dpi=800, figsize=(5, 4), fontsize=12, xlim=None, ylim=None, xmin=None, xmax=None, ymin=None, ymax=None, show=True, block=True, save=False, file_path=None, title=None, name=None, legend=None, axes=None, **kwargs)[source]

Plots one or more scores and returns the matplotlib.axes.Axes it draws on. You can use this to modify the plot after the fact.

Parameters:
  • xlabel – Label for x-axis

  • ylabel – Label for y-axis

  • show_err – If True, shows error bands when available

  • confidence – The confidence threshold of the error bands

  • grid – If True, show a dashed gray grid

  • ticks – If True, show ticks on the axes

  • tight_layout – If True, calls matplotlib’s tight_layout

  • dpi – DPI for saved figures

  • figsize – Size of the figure in matplotlib units

  • fontsize – Font size for legend, axis labels

  • xlim – Can be either an ordered pair (xmin, xmax), or a dictionary {'xmin':xmin, 'xmax':xmax}. In fact, the dictionary may have any subset of the arguments to matplotlib.axes.Axes.set_xlim.

  • ylim – Like xlim, but with (ymin, ymax).

  • xmin, xmax, ymin, ymax – Alternatively, you can pass plot limits directly as kwargs.

  • show – Whether to call matplotlib.pyplot.show

  • block – Whether the plot display should be blocking. Defaults to True.

  • save – If save == True, or if file_path is given, saves the figure to a file. If file_path is specified, uses that filename. If file_path is not specified, builds a filename by sanitizing title or name.

  • file_path – See above

  • title, name – title and name are synonyms, specifying the plot title

  • legend – Whether or not to show a legend

  • axes – You can specify matplotlib axes to plot into

Additional keyword arguments are passed to the plot function.

Note about titles/filenames: If you just want to give a name for the purpose of saving to a unique file, specify file_path. If you also want to show a title, there is no need to specify file_path; you can just specify title or name.
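
A sketch of plotting several averaged runs together, assuming plot_scores accepts Score objects as its positional arguments (file paths hypothetical):

    from alien.benchmarks.metrics import Score, plot_scores

    cov = Score.average_runs("runs/covariance/run_*/score.pickle", name="covariance")
    rnd = Score.average_runs("runs/random/run_*/score.pickle", name="random")

    ax = plot_scores(
        cov, rnd,
        xlabel="Compounds / Number",
        ylabel="Error / RMSE",
        confidence=0.95,
        title="Retrospective benchmark",
        save=True, file_path="plots/benchmark.png",
    )
    ax.set_ylim(0, 2)  # the returned Axes can be modified after the fact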

alien.benchmarks.oracle module

class alien.benchmarks.oracle.Oracle[source]

Bases: object

abstract get_label(x, remove=False)[source]
get_labels(x, remove=False)[source]
class alien.benchmarks.oracle.SetOracle(*args, **kwargs)[source]

Bases: SetSampleGenerator, Oracle

get_label(x, remove=False)[source]
generate_sample()

Generates and returns a single sample.

generate_samples(N=inf, reshuffle=False)

Generates and returns N samples.

Parameters:

N – usually an integer. Different generators will interpret N == inf in different ways; typically the generator will return “all” samples, perhaps as an iterable.

get_labels(x, remove=False)
property labels
remove_data_indices(indices)

Remove data indices and shift self.pointer accordingly.

remove_sample(sample)

Single-sample version of remove_samples.

remove_samples(samples)

‘Removes’ or, rather, hides samples from this generator. Hidden samples are still stored in self.data, but will not appear in any future calls to generate_samples.

reshuffle()

Reshuffles current indices.
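
A minimal sketch of a custom Oracle: a toy subclass that looks labels up in a dictionary (the subclass and its data are hypothetical, not part of the library):

    from alien.benchmarks.oracle import Oracle

    class DictOracle(Oracle):
        """Toy oracle backed by a dictionary of known labels."""

        def __init__(self, label_map):
            self.label_map = dict(label_map)

        def get_label(self, x, remove=False):
            label = self.label_map[x]
            if remove:
                del self.label_map[x]
            return label

    oracle = DictOracle({"sample_a": 1.2, "sample_b": 0.7})
    print(oracle.get_label("sample_a"))  # 1.2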

alien.benchmarks.retrospective module

alien.benchmarks.retrospective.run_experiments(X, y, model, runs_dir, overwrite_old_runs=True, n_initial=None, batch_size=20, num_samples=inf, selector=None, selector_args=None, fit_args=None, n_runs=10, ids=None, save_ids=True, random_seed=1, split_seed=421, test_size=0.2, timestamps=None, stop_samples=None, stop_rmse=None, stop_frac=None, peek_score=0, test_samples_x=None, test_samples_y=None)[source]
Parameters:
  • runs_dir – directory to store the runs and results of this training (each run in separate subdirectories)

  • n_initial – number of samples to randomly select for initial training data

  • batch_size – number of samples selected per batch

  • num_samples – number of samples (drawn from the sample pool X) to select from. Default is inf, which takes all of the samples available in X.

  • selector – the selector to use for batch selection, either given by one of the strings ‘covariance’, ‘random’, ‘expected improvement’/’ei’, ‘greedy’, or passed as an actual SampleSelector instance. Defaults to ‘covariance’.

  • selector_args

    a dictionary passed as kwargs to the selector constructor. The following constructor arguments are already automatically included, and don’t need to be included in this dictionary:

    model, labelled_samples, samples, num_samples, batch_size

  • fit_args – a dictionary passed as kwargs each time model.fit(…) is called. Typically, this is model- or framework-specific; so, eg., different arguments would be appropriate for pytorch models, DeepChem models, etc.

  • n_runs – the number of overall runs (each starting from a random initial selection) to do (for averaging)

  • random_seed – random seed for most random number generation

  • split_seed – random seed for shuffling and splitting of data

  • test_size – the size of the test/validation set to take from X,y. If test_size >= 1, then takes that many samples. If 0 < test_size < 1, takes that fraction of the dataset size.

  • stop_samples – if this is not None, stops an experiment run when this many samples are labelled. Defaults to None

  • stop_rmse – if this is not None, stops an experiment run when this RMSE has been reached. Defaults to None

  • stop_frac – if this is not None, stops an experiment run when the RMSE has moved this fraction of the way from the RMSE after the first round to the RMSE trained on the whole dataset. We suggest something like 0.85 if you want to use this feature. Defaults to None

alien.benchmarks.uncertainty_metrics module

alien.benchmarks.uncertainty_metrics.KL_divergence(preds, std_devs, y, noise=None, normalize=True)[source]

Computes the KL-divergence from the predicted distribution (using preds and std_devs, assuming normal distributions) to the ground-truth distribution (all of the probability mass on the true y values). In other words, this tells you how much information would be gained by learning the true values. Averaged over sample points.

Lower is generally better. Not only does this penalize uncertainties which are poorly calibrated (i.e., where the actual error distribution has a different standard deviation), but it also penalizes uncertainties which are not as specific as they could be, i.e., which fail to discriminate between certain and uncertain predictions.

Parameters:
  • preds – predicted values

  • std_devs – predicted uncertainties, given as standard deviations

  • y – true values

  • normalize – if True (the default), normalizes with respect to the RMSE
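
A small sketch of scoring predicted uncertainties against ground truth: well-calibrated standard deviations should give a lower (better) KL-divergence than overconfident ones:

    import numpy as np
    from alien.benchmarks.uncertainty_metrics import KL_divergence

    rng = np.random.default_rng(0)
    y = rng.normal(size=200)                  # true values
    preds = y + 0.3 * rng.normal(size=200)    # predictions with ~0.3 error
    std_devs = np.full(200, 0.3)              # claimed uncertainties

    well_calibrated = KL_divergence(preds, std_devs, y)
    overconfident = KL_divergence(preds, std_devs / 10, y)
    print(well_calibrated, overconfident)     # the first should be lower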

alien.benchmarks.uncertainty_metrics.best_multiple(preds, std_devs, y, noise=None, max_precision=5)[source]

Does a simple binary search to find which multiple of std_devs gives the lowest KL-divergence score. To converge, assumes there is only one local minimum. (I expect this to be true, but I will have to check.)

Arguments preds, std_devs, y and noise are as in KL_divergence, except this will compute the KL-divergence for multiples of std_devs.

Parameters:

max_precision – the number of interval splits the score must be adjacent to before returning the value. Defaults to 5.

alien.benchmarks.uncertainty_metrics.binary_optimize(fn, *args, mode='max', start=1.0, max_precision=5, max_iterations=50, **kwargs)[source]

Does a simple binary search to find which scalar value gives the best (max/min) value of fn. Optimization converges to a local max/min.

Each search iteration starts with the previously-explored value with the best score, and looks on either side of it. If the best score is at the beginning of the current list of values, it looks at half this value on the low side; if at the end of the list, it looks at twice this value on the high side. If the best value is somewhere in the middle, it divides the interval on either side in half.

Parameters:

max_precision – the number of interval splits the score must be adjacent to before returning the value. Defaults to 5.
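
A sketch of binary_optimize on a simple one-dimensional function. The calling convention (fn receives the scalar being optimized as its first argument) and the return value (the best scalar found) are assumptions, since the docstring does not spell them out:

    from alien.benchmarks.uncertainty_metrics import binary_optimize

    # A smooth function of one positive scalar, maximized at s = 2.
    def fn(s):
        return -(s - 2.0) ** 2

    best = binary_optimize(fn, mode="max", start=1.0, max_precision=5)
    print(best)  # assumed to approach 2.0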

alien.benchmarks.uncertainty_metrics.search_lower(i, vals, precision, max_precision=5)[source]
alien.benchmarks.uncertainty_metrics.search_upper(i, vals, precision, max_precision=5)[source]
alien.benchmarks.uncertainty_metrics.plot_errors(preds, std_devs, y, noise=None, show=True, axes=None, **kwargs)[source]

Module contents