alien.benchmarks
Submodules
alien.benchmarks.metrics module
Module for computing metrics and plotting: confidence intervals, RMSE, scatter plots.
- alien.benchmarks.metrics.conf_int(confidence_level, standard_error, len_x: int | ArrayLike | None = None)[source]
Compute a confidence interval using a normal or t-distribution.
- Parameters:
confidence_level (float) – the desired confidence level (e.g. 0.95)
standard_error (ArrayLike) – the standard error(s) to convert into a confidence interval
len_x (int, optional) – the sample size, which determines whether a normal or t-distribution is used (and its degrees of freedom). Defaults to None.
- Returns:
The computed confidence interval.
- Return type:
ArrayLike
- alien.benchmarks.metrics.sem(x: ArrayLike, axis=0) ArrayLike [source]
Compute standard error of the mean.
- Parameters:
x (ArrayLike) – array to compute SEM for.
axis (int, optional) – Axis of x along which to compute the SEM. Defaults to 0.
- Returns:
Array of computed SEMs.
- Return type:
ArrayLike
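A minimal sketch of how these two helpers might be used together, with made-up numbers. The interpretation of len_x as the sample size governing the t-distribution follows the docstring above but is not confirmed beyond that:

```python
import numpy as np
from alien.benchmarks.metrics import sem, conf_int

# Hypothetical RMSE values from 5 independent runs, at 3 batch sizes each
rmse_runs = np.array([
    [0.92, 0.81, 0.74],
    [0.95, 0.83, 0.71],
    [0.90, 0.80, 0.76],
    [0.93, 0.85, 0.73],
    [0.91, 0.79, 0.75],
])

# Standard error of the mean across runs (axis 0)
se = sem(rmse_runs, axis=0)

# 95% confidence interval; passing len_x (the number of runs) lets conf_int
# use a t-distribution rather than a normal one, per the docstring
ci = conf_int(0.95, se, len_x=rmse_runs.shape[0])
print(se, ci)
```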
- class alien.benchmarks.metrics.Score(x=array([], dtype=float64), y=array([], dtype=float64), err=None, name: str | None = None, file_path: str | None = None, axes=None, plot_args: Dict | None = None)[source]
Bases:
object
- default_filename = '*.pickle'
- append(x_val: float, y_val: float)[source]
Append point to self.x and self.y
- Parameters:
x_val (float) – x value to append
y_val (float) – y value to append
- save(file_path: str | None = None)[source]
Save Score object to given filepath.
- Parameters:
file_path (Optional[str], optional) – Path to save object. Defaults to None.
- static load(file_path: str)[source]
Load a Score object from a file.
- Parameters:
file_path (str) – File path of the saved Score object.
- Returns:
The loaded Score object.
- Return type:
Score
- static load_many(*scores, filename='*.pickle') List [source]
Loads many Scores at once, and returns a list.
- Parameters:
*scores – several filenames, or one filename with wildcards, or one folder name (which will have wildcards appended)
- Returns:
A list of the loaded Score objects.
- Return type:
list[Score]
- static average_runs(*args, length: str = 'longest', err=True, name: str | None = None, file_path: str | None = None, save: bool = False, filename='*.pickle')[source]
Average several runs into a single Score.
- Parameters:
args – the runs to average
length (str, optional) – how to handle runs of different lengths. Defaults to “longest”.
err (bool, optional) – whether to compute error estimates across the runs. Defaults to True.
name (Optional[str], optional) – name for the averaged Score. Defaults to None.
file_path (Optional[str], optional) – File path to save object. Defaults to None.
save (bool, optional) – whether to save returned object. Defaults to False.
- Raises:
NotImplementedError – if an unsupported length option is given
- Returns:
The averaged Score.
- Return type:
Score
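A hedged sketch of the Score workflow documented above. File paths are placeholders, and the assumptions that save() falls back to self.file_path and that average_runs accepts loaded Score objects as positional arguments are not confirmed by the docstrings:

```python
from alien.benchmarks.metrics import Score

# Build up a score curve point by point
score = Score(name="run-0", file_path="runs/run-0/score.pickle")
score.append(x_val=10, y_val=0.92)   # e.g. RMSE 0.92 after 10 labelled samples
score.append(x_val=20, y_val=0.85)
score.save()  # assumption: with no argument, saves to self.file_path

# Load a single run back, or several at once via wildcards
one = Score.load("runs/run-0/score.pickle")
many = Score.load_many("runs/run-*/score.pickle")

# Average the runs into a single curve (assumption: Score objects are
# accepted as the positional args to average_runs)
avg = Score.average_runs(*many, length="longest", name="average")
```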
- class alien.benchmarks.metrics.TopScore(x=array([], dtype=float64), y=array([], dtype=float64), err=None, name: str | None = None, file_path: str | None = None, axes=None, plot_args: Dict | None = None)[source]
Bases:
Score
- compute(x, labels: ArrayLike, average_over: int = 1)[source]
Compute the top score.
- Parameters:
x – x value(s) at which to record the score
labels (ArrayLike) – the labels from which to take the top score
average_over (int, optional) – number of top labels to average over. Defaults to 1.
- Returns:
The computed top score.
- Return type:
float
- append(x_val: float, y_val: float)
Append point to self.x and self.y
- Parameters:
x_val (float) – x value to append
y_val (float) – y value to append
- static average_runs(*args, length: str = 'longest', err=True, name: str | None = None, file_path: str | None = None, save: bool = False, filename='*.pickle')
Average several runs into a single Score.
- Parameters:
args – the runs to average
length (str, optional) – how to handle runs of different lengths. Defaults to “longest”.
err (bool, optional) – whether to compute error estimates across the runs. Defaults to True.
name (Optional[str], optional) – name for the averaged Score. Defaults to None.
file_path (Optional[str], optional) – File path to save object. Defaults to None.
save (bool, optional) – whether to save returned object. Defaults to False.
- Raises:
NotImplementedError – if an unsupported length option is given
- Returns:
The averaged Score.
- Return type:
Score
- default_filename = '*.pickle'
- static load(file_path: str)
Load a Score object from a file.
- Parameters:
file_path (str) – File path of the saved Score object.
- Returns:
The loaded Score object.
- Return type:
Score
- static load_many(*scores, filename='*.pickle') List
Loads many Scores at once, and returns a list.
- Parameters:
*scores – several filenames, or one filename with wildcards, or one folder name (which will have wildcards appended)
- Returns:
A list of the loaded Score objects.
- Return type:
list[Score]
- save(file_path: str | None = None)
Save Score object to given filepath.
- Parameters:
file_path (Optional[str], optional) – Path to save object. Defaults to None.
- class alien.benchmarks.metrics.RMSE(*args, scatter=None, axes: Tuple = ('samples', 'RMSE'), **kwargs)[source]
Bases:
Score
- static from_folder(folder: str, name: str | None = None, file_path: str | None = None, save: bool = False)[source]
Build an RMSE score from results saved in a folder.
- Parameters:
folder (str) – folder containing the saved results to read
name (Optional[str], optional) – name for the resulting RMSE score. Defaults to None.
file_path (Optional[str], optional) – file path to save the object. Defaults to None.
save (bool, optional) – whether to save the returned object. Defaults to False.
- append(x_val: float, y_val: float)
Append point to self.x and self.y
- Parameters:
x_val (float) – x value to append
y_val (float) – y value to append
- static average_runs(*args, length: str = 'longest', err=True, name: str | None = None, file_path: str | None = None, save: bool = False, filename='*.pickle')
Average several runs into a single Score.
- Parameters:
args – the runs to average
length (str, optional) – how to handle runs of different lengths. Defaults to “longest”.
err (bool, optional) – whether to compute error estimates across the runs. Defaults to True.
name (Optional[str], optional) – name for the averaged Score. Defaults to None.
file_path (Optional[str], optional) – File path to save object. Defaults to None.
save (bool, optional) – whether to save returned object. Defaults to False.
- Raises:
NotImplementedError – if an unsupported length option is given
- Returns:
The averaged Score.
- Return type:
Score
- default_filename = '*.pickle'
- static load(file_path: str)
Load a Score object from a file.
- Parameters:
file_path (str) – File path of the saved Score object.
- Returns:
The loaded Score object.
- Return type:
Score
- static load_many(*scores, filename='*.pickle') List
Loads many Scores at once, and returns a list.
- Parameters:
*scores – several filenames, or one filename with wildcards, or one folder name (which will have wildcards appended)
- Returns:
A list of the loaded Score objects.
- Return type:
list[Score]
- save(file_path: str | None = None)
Save Score object to given filepath.
- Parameters:
file_path (Optional[str], optional) – Path to save object. Defaults to None.
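A hedged sketch using RMSE.from_folder as documented above; the folder layout is assumed to be the per-run results written by the benchmark runs (see run_experiments below), which the docstring does not spell out:

```python
from alien.benchmarks.metrics import RMSE

# Assumption: "runs/" contains results saved by earlier benchmark runs
rmse = RMSE.from_folder("runs/", name="covariance-selector", save=False)

# RMSE is a Score, so the inherited methods apply
rmse.save("results/covariance_rmse.pickle")
```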
- class alien.benchmarks.metrics.Scatter(labels=None, preds=None, errs=None, model=None, test=None, samples=None, name=None, axes=None, file=None, plot_args: Dict | None = None)[source]
Bases:
object
- alien.benchmarks.metrics.plot_scores(*XY, xlabel='Compounds / Number', ylabel='Error / RMSE', show_err=True, confidence=0.95, grid=True, ticks=True, tight_layout=True, dpi=800, figsize=(5, 4), fontsize=12, xlim=None, ylim=None, xmin=None, xmax=None, ymin=None, ymax=None, show=True, block=True, save=False, file_path=None, title=None, name=None, legend=None, axes=None, **kwargs)[source]
Returns the matplotlib.axes.Axes it is drawing on. You can use this to modify the plot after the fact.
- Parameters:
xlabel – Label for the x-axis
ylabel – Label for the y-axis
show_err – If True, shows error bands when available
confidence – The confidence threshold of the error bands
grid – If True, shows a dashed gray grid
ticks – If True, shows ticks on the axes
tight_layout – If True, calls matplotlib’s tight_layout
dpi – DPI for saved figures
figsize – Size of the figure in matplotlib units
fontsize – Font size for legend and axis labels
xlim – Can be either an ordered pair (xmin, xmax), or a dictionary {'xmin': xmin, 'xmax': xmax}. In fact, the dictionary may have any subset of the arguments to matplotlib.axes.Axes.set_xlim.
ylim – Like xlim, but with (ymin, ymax).
xmin, xmax, ymin, ymax – Alternatively, you can pass plot limits directly as kwargs.
show – Whether to call matplotlib.pyplot.show
block – Whether the plot display should be blocking. Defaults to True.
save – If save == True, or if file_path is given, saves the figure to a file. If file_path is specified, uses that filename; if not, builds a filename by sanitizing title or name.
file_path – See save above.
title, name – title and name are synonyms, specifying the plot title
legend – Whether or not to show a legend
axes – You can specify matplotlib axes to plot into
Additional keyword arguments are passed to the plot function.
Note about titles/filenames: If you just want to give a name for the purpose of saving to a unique file, specify file_path. If you also want to show a title, there’s no need to specify file_path; you can just specify title or name.
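A hedged sketch of a typical call; it assumes Score objects can be passed directly as the *XY positional arguments (the signature leaves this open), and the file names are placeholders:

```python
from alien.benchmarks.metrics import Score, plot_scores

# Averaged curves saved earlier (placeholder paths)
covariance = Score.load("results/covariance_avg.pickle")
random_sel = Score.load("results/random_avg.pickle")

ax = plot_scores(
    covariance, random_sel,      # assumption: Scores are valid *XY arguments
    xlabel="Compounds / Number",
    ylabel="Error / RMSE",
    confidence=0.95,
    title="Retrospective benchmark",
    save=True,
    file_path="plots/benchmark.png",
    show=False,
)

# plot_scores returns the Axes, so the figure can still be adjusted afterwards
ax.set_yscale("log")
```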
alien.benchmarks.oracle module
- class alien.benchmarks.oracle.SetOracle(*args, **kwargs)[source]
Bases:
SetSampleGenerator, Oracle
- generate_sample()
Generates and returns a single sample
- generate_samples(N=inf, reshuffle=False)
Generates and returns N samples.
- Parameters:
N – usually an integer. Different generators will interpret N == inf in different ways. It will typically return “all” samples, perhaps as an iterable.
- get_labels(x, remove=False)
- property labels
- remove_data_indices(indices)
Remove data indices and shift self.pointer accordingly
- remove_sample(sample)
Single-sample version of remove_samples
- remove_samples(samples)
‘Removes’ or, rather, hides samples from this generator. Hidden samples are still stored in self.data, but will not appear in any future calls to generate_samples.
- reshuffle()
Reshuffles current indices
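A hedged sketch of using a SetOracle in a retrospective benchmark. The constructor is only documented as (*args, **kwargs) above, so the SetOracle(X, y) call is an assumption; the method calls follow the docstrings:

```python
import numpy as np
from alien.benchmarks.oracle import SetOracle

X = np.random.rand(100, 8)  # hypothetical feature pool
y = np.random.rand(100)     # hypothetical ground-truth labels

oracle = SetOracle(X, y)    # assumption: a pool of samples plus their known labels

batch = oracle.generate_samples(N=20)            # draw 20 samples from the pool
labels = oracle.get_labels(batch, remove=True)   # look up labels and hide those samples
```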
alien.benchmarks.retrospective module
- alien.benchmarks.retrospective.run_experiments(X, y, model, runs_dir, overwrite_old_runs=True, n_initial=None, batch_size=20, num_samples=inf, selector=None, selector_args=None, fit_args=None, n_runs=10, ids=None, save_ids=True, random_seed=1, split_seed=421, test_size=0.2, timestamps=None, stop_samples=None, stop_rmse=None, stop_frac=None, peek_score=0, test_samples_x=None, test_samples_y=None)[source]
- Parameters:
runs_dir – directory to store the runs and results of this training (each run in separate subdirectories)
n_initial – number of samples to randomly select for initial training data
batch_size – number of samples selected for batch
num_samples – number of samples (drawn from the sample pool X) to select from. Default is inf, which takes all of the samples available in X.
selector – the selector to use for batch selection, either given by one of the strings ‘covariance’, ‘random’, ‘expected improvement’/’ei’, ‘greedy’, or passed as an actual SampleSelector instance. Defaults to ‘covariance’.
selector_args –
a dictionary passed as kwargs to the selector constructor. The following constructor arguments are already automatically included, and don’t need to be included in this dictionary:
model, labelled_samples, samples, num_samples, batch_size
fit_args – a dictionary passed as kwargs each time model.fit(…) is called. Typically, this is model- or framework-specific; so, eg., different arguments would be appropriate for pytorch models, DeepChem models, etc.
n_runs – the number of overall runs (each starting from a random initial selection) to do (for averaging)
random_seed – random seed for most RNG generation
split_seed – random seed for shuffling and splitting of data
test_size – the size of the test/validation set to take from X, y. If test_size >= 1, takes that many samples; if 0 < test_size < 1, takes that fraction of the dataset size.
stop_samples – if this is not None, stops an experiment run when this many samples are labelled. Defaults to None
stop_rmse – if this is not None, stops an experiment run when this RMSE has been reached. Defaults to None
stop_frac – if this is not None, stops an experiment run when the RMSE has moved this fraction of the way from the RMSE after the first round to the RMSE trained on the whole dataset. We suggest something like .85, if you want to use this feature. Defaults to None
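A hedged sketch of a run_experiments call using the documented keyword arguments; the dataset is random placeholder data and model stands in for whatever ALIEN-compatible model wrapper you are benchmarking:

```python
import numpy as np
from alien.benchmarks.retrospective import run_experiments

X = np.random.rand(500, 16)   # placeholder sample pool
y = np.random.rand(500)       # placeholder labels
model = ...                   # placeholder: an ALIEN-compatible model wrapper

run_experiments(
    X, y, model,
    runs_dir="runs/",        # each run is written to its own subdirectory
    n_initial=50,            # random initial training set
    batch_size=20,           # samples selected per round
    selector="covariance",   # or 'random', 'expected improvement'/'ei', 'greedy'
    n_runs=5,                # independent runs, for averaging
    test_size=0.2,           # 20% held out as test/validation set
    stop_frac=0.85,          # stop after 85% of the achievable RMSE improvement
)
```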
alien.benchmarks.uncertainty_metrics module
- alien.benchmarks.uncertainty_metrics.KL_divergence(preds, std_devs, y, noise=None, normalize=True)[source]
Computes the KL-divergence from the predicted distribution (using preds and std_devs, assuming normal distributions) to the ground-truth distribution (all of the probability mass on the true y values). In other words, this tells you how much information would be gained by learning the true values. Averaged over sample points.
Lower is generally better. This penalizes uncertainties that are poorly calibrated (i.e., where the actual error distribution has a different standard deviation), and also uncertainties that are not as specific as they could be, i.e., that fail to discriminate between certain and uncertain predictions.
- Parameters:
preds – predicted values
std_devs – predicted uncertainties, given as standard deviations
y – true values
normalize – if True (the default), normalizes with respect to the RMSE
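A minimal sketch with made-up predictions, uncertainties, and true values:

```python
import numpy as np
from alien.benchmarks.uncertainty_metrics import KL_divergence

preds    = np.array([1.0, 2.1, 2.9, 4.2])   # predicted values
std_devs = np.array([0.3, 0.2, 0.4, 0.5])   # predicted uncertainties (std devs)
y        = np.array([1.1, 2.0, 3.1, 4.0])   # true values

# Average KL-divergence from the predicted normal distributions to the
# ground truth; normalize=True (the default) normalizes w.r.t. the RMSE.
score = KL_divergence(preds, std_devs, y)
print(score)
```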
- alien.benchmarks.uncertainty_metrics.best_multiple(preds, std_devs, y, noise=None, max_precision=5)[source]
Does a simple binary search to find which multiple of std_devs gives the lowest KL-divergence score. To converge, assumes there is only one local minimum. (I expect this to be true, but I will have to check.)
Arguments preds, std_devs, y and noise are as in KL_divergence, except this will compute the KL-divergence for multiples of std_devs.
- Parameters:
max_precision – the number of interval splits the score must be adjacent to before returning the value. Defaults to 5.
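Continuing the sketch above, best_multiple can serve as a quick calibration check; the assumption that it returns the best multiple itself follows from the description but is not stated explicitly:

```python
from alien.benchmarks.uncertainty_metrics import best_multiple

# Search for the scalar c such that c * std_devs minimizes the KL-divergence;
# a value far from 1 suggests the uncertainties are mis-scaled.
c = best_multiple(preds, std_devs, y, max_precision=5)
```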
- alien.benchmarks.uncertainty_metrics.binary_optimize(fn, *args, mode='max', start=1.0, max_precision=5, max_iterations=50, **kwargs)[source]
Does a simple binary search to find which scalar value gives the best (max/min) value of fn. Optimization converges to a local max/min.
Each search iteration starts with the previously explored value with the best score, and looks on either side of it. If the best score is at the beginning of the current list of values, it looks at half this value on the low side; if at the end of the list, it looks at twice this value on the high side. If the best value is somewhere in the middle, it divides the interval on either side in half.
- Parameters:
max_precision – the number of interval splits the score must be adjacent to before returning the value. Defaults to 5.
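A minimal sketch, assuming fn is called on the scalar being searched (plus any extra *args/**kwargs) and that the best scalar found is returned:

```python
from alien.benchmarks.uncertainty_metrics import binary_optimize

# Maximize a simple unimodal function of a positive scalar; the search
# halves/doubles outward and bisects inward around the best value so far.
best_s = binary_optimize(lambda s: -(s - 3.0) ** 2, mode="max", start=1.0)
```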