ALP#
- class frlearn.data_descriptors.ALP(dissimilarity: str or float or Callable[[np.array], float] or Callable[[np.array, np.array], float] = 'boscovich', k: int or Callable[[int], float] or None = <function log_multiple.<locals>._f>, l: int or Callable[[int], float] or None = <function log_multiple.<locals>._f>, scale_weights: Callable[[int], np.array] | None = LinearWeights(), localisation_weights: Callable[[int], np.array] | None = LinearWeights(), nn_search: NeighbourSearchMethod = <frlearn.neighbours.neighbour_search_methods.KDTree object>, max_array_size: int = 67108864, preprocessors=(<frlearn.statistics.feature_preprocessors.IQRNormaliser object>,))#
Implementation of the Average Localised Proximity (ALP) data descriptor [1]. Expresses the proximity of a query instance to the target data, by localising its nearest neighbour distances against the local nearest neighbour distances in the target data.
- Parameters:
- dissimilarity: str or float or (np.array -> float) or ((np.array, np.array) -> float) = ‘boscovich’
The dissimilarity measure to use.
A vector size measure
np.array -> float
induces a dissimilarity measure through application toy - x
. A float is interpreted as Minkowski size with the corresponding value forp
. For convenience, a number of popular measures can be referred to by name.The default is the Boscovich norm (also known as cityblock, Manhattan or taxicab norm).
- kint or (int -> float) or None = 5.5 * log n
How many nearest neighbour distances / localised proximities to consider. Corresponds to the scale at which proximity is evaluated. Should be either a positive integer, or a function that takes the target class size
n
and returns a float, or None, which is resolved asn
. All such values are rounded to the nearest integer in[1, n]
.- lint or (int -> float) or None = 6 * log n
How many nearest neighbours to use for determining the local ith nearest neighbour distance, for each
i <= k
. Lower values correspond to more localisation. Should be either a positive integer, or a function that takes the target class sizen
and returns a float, or None, which is resolved asn
. All such values are rounded to the nearest integer in[1, n]
.- scale_weights(int -> np.array) or None = LinearWeights()
Weights to use for calculating the soft maximum of localised proximities. Determines to which extent scales with high localised proximity are emphasised.
- localisation_weights(int -> np.array) or None = LinearWeights()
Weights to use for calculating the local ith nearest neighbour distance, for each
i <= k
. Determines to which extent nearer neighbours dominate.- max_array_sizeint = 2**26
Maximum array size to use. For a query set of size
q
, calculating local distances requires an array of size[q, l, k]
, which can be too large to fit in memory. If the size of this array is larger thanmax_array_size
, a query set is batch-processed, which is slower. TODO: determine maximum array size dynamically, investigate lowering float precision- preprocessorsiterable = (IQRNormaliser(), )
Preprocessors to apply. The default interquartile range normaliser rescales all features to ensure that they all have the same interquartile range.
Notes
k
andl
are the two principal hyperparameters that can be tuned to increase performance. Its default values are based on the empirical evaluation in [1].References
- class Model#