cardinal.clustering

Classes

class cardinal.clustering.GMMLikelihoodSampler(clustering, batch_size)[source]

GMM (Gaussian Mixture Model) based query sampler. To increase diversity, samples can be selected by maximum likelihood under the fitted mixture.

Parameters
  • clustering – A clustering algorithm matching the sklearn interface

  • batch_size – Number of samples to draw when predicting.

clustering_

The fitted clustering estimator.

fit(X, y=None) GMMLikelihoodSampler[source]

Does nothing; this method is unsupervised.

Parameters
  • X – Labeled samples of shape (n_samples, n_features).

  • y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array) array[source]

Clusters the samples and selects the ones with the highest likelihood.

Parameters
  • X – Pool of unlabeled samples of shape (n_samples, n_features).

  • sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).
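
The likelihood-based selection can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: it assumes a mixture with diagonal covariances whose parameters are already known, and it assumes one sample is taken per component. The helper names log_density and most_likely_per_component are hypothetical.

```python
import math


def log_density(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def most_likely_per_component(pool, means, variances):
    """For each component, return the index of the pool sample it scores highest.

    Assumption: one pick per component; the same index may appear twice
    if two components favor the same sample.
    """
    return [
        max(range(len(pool)), key=lambda i: log_density(pool[i], mean, var))
        for mean, var in zip(means, variances)
    ]


# Toy pool with two obvious groups and hand-written mixture parameters.
pool = [(0.0, 0.0), (0.4, -0.3), (4.0, 4.0), (3.5, 4.4)]
means = [(0.1, 0.1), (3.9, 4.1)]
variances = [(1.0, 1.0), (0.5, 0.5)]

selected = most_likely_per_component(pool, means, variances)
print(selected)
```

In practice the component parameters would come from the fitted clustering estimator rather than being written by hand.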

class cardinal.clustering.GMMSampler(batch_size, **gmm_args)[source]

Selects the samples with the highest likelihood under the GMM clusters.

Parameters

batch_size – Number of samples to draw when predicting.

class cardinal.clustering.IncrementalMiniBatchKMeansSampler(batch_size, **kmeans_args)[source]

Selects the samples closest to the MiniBatchKMeans centroids.

Parameters

batch_size – Number of samples to draw when predicting.

fit(X, y=None) KCentroidSampler[source]

Does nothing; this method is unsupervised.

Parameters
  • X – Labeled samples of shape (n_samples, n_features).

  • y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array, sample_weight: Optional[array] = None, recenter_every=None) array[source]

Clusters the samples and selects the ones closest to the centroids.

Parameters
  • X – Pool of unlabeled samples of shape (n_samples, n_features).

  • sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).

class cardinal.clustering.KCenterGreedy(embedding_fun, batch_size, metric='euclidean')[source]

KCenter greedy query sampler. Selects the sample furthest from those already selected, adds it to the selected set, and repeats until batch_size is reached.

Parameters

batch_size – Number of samples to draw when predicting.

fit(X, y=None) KCenterGreedy[source]

Does nothing; this method is unsupervised.

Parameters
  • X – Labeled samples of shape (n_samples, n_features).

  • y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array, sample_weight: Optional[array] = None) array[source]

Clusters the samples and selects the ones closest to the centroids.

Parameters
  • X – Pool of unlabeled samples of shape (n_samples, n_features).

  • sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).
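
The greedy loop described for KCenterGreedy can be sketched in plain Python as follows. This is an illustration only, not the library's code: it works on raw feature vectors with Euclidean distance, ignores embedding_fun, and assumes the already selected set is passed in explicitly. The function name k_center_greedy is hypothetical.

```python
import math


def k_center_greedy(pool, selected, batch_size):
    """Greedy k-center: repeatedly pick the pool point furthest from the chosen set.

    pool: list of candidate feature vectors.
    selected: non-empty list of already selected feature vectors.
    Returns indices into pool.
    """
    # Distance from each pool point to its nearest already selected point.
    min_dist = [min(math.dist(p, s) for s in selected) for p in pool]
    chosen = []
    for _ in range(min(batch_size, len(pool))):
        # The not-yet-chosen point furthest from the selection is the next center.
        idx = max(
            (i for i in range(len(pool)) if i not in chosen),
            key=min_dist.__getitem__,
        )
        chosen.append(idx)
        # Adding a center can only shrink the nearest-center distances.
        min_dist = [min(d, math.dist(p, pool[idx])) for d, p in zip(min_dist, pool)]
    return chosen


pool = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
labeled = [(0.0, 0.0)]
picked = k_center_greedy(pool, labeled, batch_size=2)
print(picked)
```

Keeping a running minimum distance per pool point avoids recomputing all pairwise distances at every step.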

class cardinal.clustering.KCentroidSampler(clustering, batch_size)[source]

KCentroid based query sampler. To increase diversity, samples can be selected with a centroid-based clustering.

Parameters
  • clustering – A clustering algorithm matching the sklearn interface

  • batch_size – Number of samples to draw when predicting.

clustering_

The fitted clustering estimator.

fit(X, y=None) KCentroidSampler[source]

Does nothing; this method is unsupervised.

Parameters
  • X – Labeled samples of shape (n_samples, n_features).

  • y – Labels of shape (n_samples).

Returns

The object itself

select_samples(X: array, sample_weight: Optional[array] = None) array[source]

Clusters the samples and selects the ones closest to the centroids.

Parameters
  • X – Pool of unlabeled samples of shape (n_samples, n_features).

  • sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).
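
The "closest to centroids" step can be sketched in plain Python, assuming the centroids are already available (for instance from a clustering fitted with as many clusters as batch_size, which is an assumption here). The function name closest_to_centroids is hypothetical, and sample weights are ignored.

```python
import math


def closest_to_centroids(pool, centroids):
    """Return, for each centroid, the index of the nearest pool sample.

    Assumes len(centroids) <= len(pool).
    """
    selected = []
    for c in centroids:
        # Skip samples already taken so each centroid yields a distinct index.
        candidates = [i for i in range(len(pool)) if i not in selected]
        selected.append(min(candidates, key=lambda i: math.dist(pool[i], c)))
    return selected


pool = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (9.0, 0.0)]
centroids = [(0.1, 0.0), (5.0, 5.0), (9.0, 0.5)]
indices = closest_to_centroids(pool, centroids)
print(indices)
```

Because each centroid contributes one sample, the returned batch covers every cluster instead of concentrating on a single dense region.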

class cardinal.clustering.KMeansSampler(batch_size, **kmeans_args)[source]

Selects the samples closest to the KMeans centroids.

Parameters

batch_size – Number of samples to draw when predicting.

class cardinal.clustering.MiniBatchKMeansSampler(batch_size, **kmeans_args)[source]

Selects the samples closest to the MiniBatchKMeans centroids.

Parameters

batch_size – Number of samples to draw when predicting.

class cardinal.clustering.TwoStepGMMSampler(beta: int, classifier, batch_size: int, assume_fitted: bool = False, verbose: int = 0, **gmm_args)[source]

GMM sampler using a margin uncertainty sampler as a preselector.

fit(X: array, y: Optional[array] = None) TwoStepGMMSampler[source]

Fits the first query sampler

Parameters
  • X – Labeled samples of shape [n_samples, n_features].

  • y – Labels of shape [n_samples].

Returns

The object itself

select_samples(X: array) array[source]

Selects samples using uncertainty preselection followed by the GMM sampler.

Parameters
  • X – Pool of unlabeled samples of shape (n_samples, n_features).

  • sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).

class cardinal.clustering.TwoStepIWKMeansSampler(beta: int, classifier, batch_size: int, assume_fitted: bool = False, verbose: int = 0, **kmeans_args)[source]

class cardinal.clustering.TwoStepKCentroidSampler(kcentroid_sampler, beta: int, classifier, batch_size: int, assume_fitted: bool = False, verbose: int = 0, **kmeans_args)[source]

KMeans sampler using a margin uncertainty sampler as a preselector.

fit(X: array, y: array = None) TwoStepKMeansSampler[source]

Fits the first query sampler

Parameters
  • X – Labeled samples of shape [n_samples, n_features].

  • y – Labels of shape [n_samples].

Returns

The object itself

select_samples(X: array, sample_weight: Optional[array] = None) array[source]

Selects samples using uncertainty preselection followed by the KMeans sampler.

Parameters
  • X – Pool of unlabeled samples of shape (n_samples, n_features).

  • sample_weight – Weight of the samples of shape (n_samples), optional.

Returns

Indices of the selected samples of shape (batch_size).
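
The two-step pattern (uncertainty preselection, then a diversity sampler) can be sketched in plain Python. This is a hedged illustration, not the library's code: it assumes beta multiplies batch_size to give the number of preselected samples, it takes class probabilities as input instead of a classifier, and the second step is a placeholder callable. The names margin_preselect, two_step_select and take_first are hypothetical.

```python
def margin_preselect(probas, n):
    """Indices of the n samples with the smallest top-two probability margin."""
    def margin(p):
        top, second = sorted(p, reverse=True)[:2]
        return top - second
    return sorted(range(len(probas)), key=lambda i: margin(probas[i]))[:n]


def two_step_select(pool, probas, beta, batch_size, second_step):
    # Step 1: keep the beta * batch_size most uncertain samples
    # (assumed meaning of beta, not confirmed by the reference above).
    kept = margin_preselect(probas, beta * batch_size)
    subset = [pool[i] for i in kept]
    # Step 2: run the diversity sampler on the subset, then map its
    # subset-relative indices back to indices into the full pool.
    return [kept[j] for j in second_step(subset, batch_size)]


def take_first(subset, k):
    """Placeholder for a clustering-based sampler such as KMeans."""
    return list(range(k))


probas = [(0.9, 0.1), (0.55, 0.45), (0.5, 0.5), (0.7, 0.3), (0.52, 0.48)]
pool = [(float(i),) for i in range(len(probas))]

kept = margin_preselect(probas, 3)
batch = two_step_select(pool, probas, beta=2, batch_size=2, second_step=take_first)
print(kept, batch)
```

The index remapping in the last line of two_step_select matters: the second sampler only sees the preselected subset, so its output has to be translated before it is returned as indices into the original pool.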