ALP#
- class frlearn.data_descriptors.ALP(dissimilarity: str or float or Callable[[np.array], float] or Callable[[np.array, np.array], float] = 'boscovich', k: int or Callable[[int], float] or None = <function log_multiple.<locals>._f>, l: int or Callable[[int], float] or None = <function log_multiple.<locals>._f>, scale_weights: Callable[[int], np.array] | None = LinearWeights(), localisation_weights: Callable[[int], np.array] | None = LinearWeights(), nn_search: NeighbourSearchMethod = <frlearn.neighbours.neighbour_search_methods.KDTree object>, max_array_size: int = 67108864, preprocessors=(<frlearn.statistics.feature_preprocessors.IQRNormaliser object>,))#
Implementation of the Average Localised Proximity (ALP) data descriptor [1]. Expresses the proximity of a query instance to the target data, by localising its nearest neighbour distances against the local nearest neighbour distances in the target data.
- Parameters:
- dissimilarity: str or float or (np.array -> float) or ((np.array, np.array) -> float) = ‘boscovich’
The dissimilarity measure to use.
A vector size measure
np.array -> floatinduces a dissimilarity measure through application toy - x. A float is interpreted as Minkowski size with the corresponding value forp. For convenience, a number of popular measures can be referred to by name.The default is the Boscovich norm (also known as cityblock, Manhattan or taxicab norm).
- kint or (int -> float) or None = 5.5 * log n
How many nearest neighbour distances / localised proximities to consider. Corresponds to the scale at which proximity is evaluated. Should be either a positive integer, or a function that takes the target class size
nand returns a float, or None, which is resolved asn. All such values are rounded to the nearest integer in[1, n].- lint or (int -> float) or None = 6 * log n
How many nearest neighbours to use for determining the local ith nearest neighbour distance, for each
i <= k. Lower values correspond to more localisation. Should be either a positive integer, or a function that takes the target class sizenand returns a float, or None, which is resolved asn. All such values are rounded to the nearest integer in[1, n].- scale_weights(int -> np.array) or None = LinearWeights()
Weights to use for calculating the soft maximum of localised proximities. Determines to which extent scales with high localised proximity are emphasised.
- localisation_weights(int -> np.array) or None = LinearWeights()
Weights to use for calculating the local ith nearest neighbour distance, for each
i <= k. Determines to which extent nearer neighbours dominate.- max_array_sizeint = 2**26
Maximum array size to use. For a query set of size
q, calculating local distances requires an array of size[q, l, k], which can be too large to fit in memory. If the size of this array is larger thanmax_array_size, a query set is batch-processed, which is slower. TODO: determine maximum array size dynamically, investigate lowering float precision- preprocessorsiterable = (IQRNormaliser(), )
Preprocessors to apply. The default interquartile range normaliser rescales all features to ensure that they all have the same interquartile range.
Notes
kandlare the two principal hyperparameters that can be tuned to increase performance. Its default values are based on the empirical evaluation in [1].References
- class Model#