cardinal.clustering

Classes

class cardinal.clustering.GMMLikelihoodSampler(clustering, batch_size)

GMM (Gaussian Mixture Model) based query sampler. To increase diversity, samples can be selected by maximum likelihood.

Parameters

clustering – A clustering algorithm matching the sklearn interface
batch_size – Number of samples to draw when predicting.

clustering_: The fitted clustering estimator.

fit(X, y=None) → GMMLikelihoodSampler

Does nothing; this method is unsupervised.

Parameters

X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array) → array

Clusters the samples and selects the ones with the highest likelihood.

Parameters

X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).
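As a rough illustration of likelihood-based selection, the sketch below scores every pool sample under each component of an already fitted mixture and keeps, for each component, the sample with the highest density. It is pure Python and is not the library's implementation: the diagonal covariances, the one-sample-per-component rule, and the helper names `log_gaussian` and `select_by_likelihood` are all assumptions made for the example.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def select_by_likelihood(X, means, variances):
    """Return, for each component, the index of its most likely sample."""
    selected = []
    for mean, var in zip(means, variances):
        scores = [log_gaussian(x, mean, var) for x in X]
        # Skip indices already taken so the batch has no duplicates.
        best = max(
            (i for i in range(len(X)) if i not in selected),
            key=lambda i: scores[i],
        )
        selected.append(best)
    return selected


# Toy pool with two well separated groups (assumed data).
pool = [[0.1, 0.0], [0.9, 1.1], [5.0, 5.2], [4.1, 3.9], [1.0, 1.0]]
means = [[1.0, 1.0], [5.0, 5.0]]        # assumed fitted component means
variances = [[1.0, 1.0], [1.0, 1.0]]    # assumed diagonal variances
batch = select_by_likelihood(pool, means, variances)
print(batch)
```

Here the number of components plays the role of batch_size, so the returned list has one index per component.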

class cardinal.clustering.GMMSampler(batch_size, **gmm_args)

Selects the samples with the highest likelihood under the GMM clusters.

Parameters: batch_size – Number of samples to draw when predicting.

class cardinal.clustering.IncrementalMiniBatchKMeansSampler(batch_size, **kmeans_args)

Selects the samples closest to the MiniBatchKMeans centroids.

Parameters: batch_size – Number of samples to draw when predicting.

fit(X, y=None) → KCentroidSampler

Does nothing; this method is unsupervised.

Parameters

X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array, sample_weight: Optional[array] = None, recenter_every=None) → array

Clusters the samples and selects the ones closest to the centroids.

Parameters

X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).

class cardinal.clustering.KCenterGreedy(embedding_fun, batch_size, metric='euclidean')

KCenter greedy query sampler. Selects the sample furthest from the already selected ones, adds it to the selection, and repeats until batch_size is reached.

Parameters: batch_size – Number of samples to draw when predicting.
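The greedy loop described above can be sketched in a few lines of plain Python. This is an illustrative reading of the description, not the library's code: it assumes Euclidean distance (the default `metric` in the signature), assumes the labeled points act as the initial centers, and the function name `k_center_greedy` is hypothetical.

```python
import math


def k_center_greedy(pool, labeled, batch_size):
    """Greedily pick pool indices that are furthest from selected points."""
    # Distance from each pool point to its nearest already-selected point.
    min_dist = [min(math.dist(p, q) for q in labeled) for p in pool]
    selected = []
    for _ in range(batch_size):
        # Take the pool point furthest from everything selected so far.
        idx = max(range(len(pool)), key=lambda i: min_dist[i])
        selected.append(idx)
        # Refresh nearest-selected distances with the new center.
        min_dist = [
            min(d, math.dist(p, pool[idx])) for d, p in zip(min_dist, pool)
        ]
    return selected


labeled = [[0.0, 0.0]]  # assumed already-labeled point
pool = [[1.0, 0.0], [10.0, 0.0], [0.0, 6.0], [9.0, 1.0]]
picked = k_center_greedy(pool, labeled, batch_size=2)
print(picked)
```

Note that the point at (9, 1) is skipped in the second round even though it is far from the origin, because it sits next to the point chosen in the first round.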

fit(X, y=None) → KCenterGreedy

Does nothing; this method is unsupervised.

Parameters

X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array, sample_weight: Optional[array] = None) → array

Clusters the samples and selects the ones closest to the centroids.

Parameters

X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).

class cardinal.clustering.KCentroidSampler(clustering, batch_size)

KCentroid-based query sampler. To increase diversity, samples can be selected with a centroid-based clustering.

Parameters

clustering – A clustering algorithm matching the sklearn interface
batch_size – Number of samples to draw when predicting.

clustering_: The fitted clustering estimator.

fit(X, y=None) → KCentroidSampler

Does nothing; this method is unsupervised.

Parameters

X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array, sample_weight: Optional[array] = None) → array

Clusters the samples and selects the ones closest to the centroids.

Parameters

X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).
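To make the "closest to centroids" step concrete, here is a minimal pure-Python sketch. It assumes the centroids have already been produced by some fitted clusterer and only shows the final lookup; the function name `closest_to_centroids` and the toy data are hypothetical, and sample weights are ignored.

```python
import math


def closest_to_centroids(X, centroids):
    """Return the index of the sample nearest to each centroid."""
    selected = []
    for c in centroids:
        # Exclude samples already picked so each index appears once.
        candidates = [i for i in range(len(X)) if i not in selected]
        selected.append(min(candidates, key=lambda i: math.dist(X[i], c)))
    return selected


pool = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [5.0, 5.0], [4.6, 4.4]]
centroids = [[0.15, 0.1], [4.5, 4.5]]  # assumed output of a fitted clusterer
picked = closest_to_centroids(pool, centroids)
print(picked)
```

With one centroid per requested sample, the result has batch_size indices, one per cluster, which is what gives the batch its diversity.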

class cardinal.clustering.KMeansSampler(batch_size, **kmeans_args)

Selects the samples closest to the KMeans centroids.

Parameters: batch_size – Number of samples to draw when predicting.

class cardinal.clustering.MiniBatchKMeansSampler(batch_size, **kmeans_args)

Selects the samples closest to the MiniBatchKMeans centroids.

Parameters: batch_size – Number of samples to draw when predicting.

class cardinal.clustering.TwoStepGMMSampler(beta: int, classifier, batch_size: int, assume_fitted: bool = False, verbose: int = 0, **gmm_args)

GMM sampler using a margin uncertainty sampler as preselector.

fit(X: array, y: Optional[array] = None) → TwoStepGMMSampler

Fits the first query sampler.

Parameters

X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array) → array

Selects the samples using uncertainty preselection and the GMM sampler.

Parameters

X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).

class cardinal.clustering.TwoStepIWKMeansSampler(beta: int, classifier, batch_size: int, assume_fitted: bool = False, verbose: int = 0, **kmeans_args)

class cardinal.clustering.TwoStepKCentroidSampler(kcentroid_sampler, beta: int, classifier, batch_size: int, assume_fitted: bool = False, verbose: int = 0, **kmeans_args)

KMeans sampler using a margin uncertainty sampler as preselector.

fit(X: array, y: array = None) → TwoStepKMeansSampler

Fits the first query sampler.

Parameters

X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array, sample_weight: Optional[array] = None) → array

Selects the samples using uncertainty preselection and the KMeans sampler.

Parameters

X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).
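The two-step samplers combine an uncertainty filter with a diversity step. The sketch below shows one plausible shape of that pipeline in plain Python. It is built on assumptions rather than on the library's source: it assumes the preselector keeps the `beta * batch_size` samples with the smallest margin between the two most probable classes, and the names `margin_preselect`, `two_step_select`, and `take_first` are hypothetical. The diversity step is passed in as a callable so that any clustering-based selector could be plugged in.

```python
def margin_preselect(probas, n_keep):
    """Indices of the n_keep samples with the smallest top-two margin."""
    margins = []
    for row in probas:
        top1, top2 = sorted(row, reverse=True)[:2]
        margins.append(top1 - top2)
    return sorted(range(len(probas)), key=lambda i: margins[i])[:n_keep]


def two_step_select(probas, beta, batch_size, diversity_select):
    """Preselect beta * batch_size uncertain samples, then diversify."""
    # Assumption: beta scales batch_size to size the candidate set.
    candidates = margin_preselect(probas, beta * batch_size)
    # diversity_select returns positions within the candidate list.
    local = diversity_select(candidates, batch_size)
    # Map those positions back to indices in the original pool.
    return [candidates[j] for j in local]


def take_first(candidates, k):
    """Stand-in for the clustering step: keep the first k candidates."""
    return list(range(k))


# Assumed class probabilities from a fitted classifier on a 5-sample pool.
probas = [[0.9, 0.1], [0.55, 0.45], [0.5, 0.5], [0.8, 0.2], [0.6, 0.4]]
picked = two_step_select(probas, beta=2, batch_size=2, diversity_select=take_first)
print(picked)
```

The index mapping in the last line of `two_step_select` matters: the second step only sees the preselected subset, so its output has to be translated back before it can be used against the full pool.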