cardinal.clustering¶
Classes¶
- class cardinal.clustering.GMMLikelihoodSampler(clustering, batch_size)[source]¶
GMM (Gaussian Mixture Model) based query sampler. To increase diversity, samples can be selected by maximum likelihood.
- Parameters
clustering – A clustering algorithm matching the sklearn interface
batch_size – Number of samples to draw when predicting.
- clustering_¶
The fitted clustering estimator.
- fit(X, y=None) GMMLikelihoodSampler [source]¶
Does nothing; this method is unsupervised.
- Parameters
X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).
- Returns
The object itself
- select_samples(X: array) array [source]¶
Clusters the samples and selects the ones with the highest likelihood.
- Parameters
X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.
- Returns
Indices of the selected samples of shape (batch_size).
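The likelihood-based selection described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes the mixture components are already fitted, that they have diagonal covariances, and that one sample is drawn per component. The helper names `gaussian_log_pdf` and `select_by_likelihood` are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def select_by_likelihood(X, means, variances):
    """For each component, pick the not-yet-selected sample with the highest log-density.

    Assumption: one sample per component, so len(means) plays the role of batch_size.
    """
    selected = []
    for mean, var in zip(means, variances):
        candidates = [i for i in range(len(X)) if i not in selected]
        best = max(candidates, key=lambda i: gaussian_log_pdf(X[i], mean, var))
        selected.append(best)
    return selected


# Toy pool with two obvious groups and one outlier.
X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.0, 6.0], [9.9, 0.1]]
means = [[0.0, 0.0], [5.0, 5.0]]          # assumed to come from a fitted mixture
variances = [[1.0, 1.0], [1.0, 1.0]]
indices = select_by_likelihood(X, means, variances)
```

With the library itself, the components would come from the fitted estimator exposed as `clustering_` rather than being written by hand.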
- class cardinal.clustering.GMMSampler(batch_size, **gmm_args)[source]¶
Selects the samples with the highest likelihood under the GMM clusters.
- Parameters
batch_size – Number of samples to draw when predicting.
- class cardinal.clustering.IncrementalMiniBatchKMeansSampler(batch_size, **kmeans_args)[source]¶
Selects the samples closest to the MiniBatchKMeans centroids.
- Parameters
batch_size – Number of samples to draw when predicting.
- fit(X, y=None) KCentroidSampler [source]¶
Does nothing; this method is unsupervised.
- Parameters
X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).
- Returns
The object itself
- select_samples(X: array, sample_weight: Optional[array] = None, recenter_every=None) array [source]¶
Clusters the samples and selects the ones closest to the centroids.
- Parameters
X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.
- Returns
Indices of the selected samples of shape (batch_size).
- class cardinal.clustering.KCenterGreedy(embedding_fun, batch_size, metric='euclidean')[source]¶
KCenter greedy query sampler. Selects the sample furthest from the already selected ones, adds it to the selection, and repeats until batch_size is reached.
- Parameters
batch_size – Number of samples to draw when predicting.
- fit(X, y=None) KCenterGreedy [source]¶
Does nothing; this method is unsupervised.
- Parameters
X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).
- Returns
The object itself
- select_samples(X: array, sample_weight: Optional[array] = None) array [source]¶
Greedily selects the samples furthest from the already selected ones.
- Parameters
X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.
- Returns
Indices of the selected samples of shape (batch_size).
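The greedy loop described for KCenterGreedy can be sketched as follows. This is an illustration under stated assumptions, not the library's code: samples are assumed to already be embeddings (so `embedding_fun` is not modeled), the metric is Euclidean, and the already selected samples are passed explicitly. The function name `k_center_greedy` is hypothetical.

```python
import math


def k_center_greedy(X_pool, X_selected, batch_size):
    """Repeatedly pick the pool sample furthest from everything selected so far."""
    # Distance from each pool sample to its nearest already-selected sample.
    min_dist = [min(math.dist(x, s) for s in X_selected) for x in X_pool]
    chosen = []
    for _ in range(batch_size):
        idx = max(range(len(X_pool)), key=lambda i: min_dist[i])
        chosen.append(idx)
        # The new pick now counts as selected: shrink distances accordingly.
        for i, x in enumerate(X_pool):
            min_dist[i] = min(min_dist[i], math.dist(x, X_pool[idx]))
    return chosen


X_selected = [[0.0, 0.0]]
X_pool = [[1.0, 0.0], [5.0, 0.0], [9.0, 0.0], [5.0, 5.0]]
chosen = k_center_greedy(X_pool, X_selected, batch_size=2)
```

Because a picked sample gets distance zero to itself, it is not picked again as long as the pool still contains points at a nonzero distance from the selection.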
- class cardinal.clustering.KCentroidSampler(clustering, batch_size)[source]¶
KCentroid-based query sampler. To increase diversity, samples can be selected with a centroid-based clustering.
- Parameters
clustering – A clustering algorithm matching the sklearn interface
batch_size – Number of samples to draw when predicting.
- clustering_¶
The fitted clustering estimator.
- fit(X, y=None) KCentroidSampler [source]¶
Does nothing; this method is unsupervised.
- Parameters
X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).
- Returns
The object itself
- select_samples(X: array, sample_weight: Optional[array] = None) array [source]¶
Clusters the samples and selects the ones closest to the centroids.
- Parameters
X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.
- Returns
Indices of the selected samples of shape (batch_size).
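The "closest to centroids" step can be illustrated with a short sketch. It assumes the centroids are already known (in the library they would come from the fitted `clustering_` estimator), uses Euclidean distance, ignores `sample_weight`, and draws one sample per centroid. The function name `closest_to_centroids` is hypothetical.

```python
import math


def closest_to_centroids(X, centroids):
    """Return, for each centroid, the index of the nearest not-yet-selected sample."""
    selected = []
    for c in centroids:
        candidates = [i for i in range(len(X)) if i not in selected]
        best = min(candidates, key=lambda i: math.dist(X[i], c))
        selected.append(best)
    return selected


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
     [10.0, 10.0], [11.0, 10.0], [10.0, 12.0]]
centroids = [[0.4, 0.3], [10.3, 10.5]]  # assumed output of a clustering step
indices = closest_to_centroids(X, centroids)
```

Selecting one real sample per centroid is what spreads the batch across clusters, since centroids themselves are generally not members of the pool.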
- class cardinal.clustering.KMeansSampler(batch_size, **kmeans_args)[source]¶
Selects the samples closest to the KMeans centroids.
- Parameters
batch_size – Number of samples to draw when predicting.
- class cardinal.clustering.MiniBatchKMeansSampler(batch_size, **kmeans_args)[source]¶
Selects the samples closest to the MiniBatchKMeans centroids.
- Parameters
batch_size – Number of samples to draw when predicting.
- class cardinal.clustering.TwoStepGMMSampler(beta: int, classifier, batch_size: int, assume_fitted: bool = False, verbose: int = 0, **gmm_args)[source]¶
GMM sampler using a margin uncertainty sampler as a preselector.
- fit(X: array, y: Optional[array] = None) TwoStepGMMSampler [source]¶
Fits the first query sampler.
- Parameters
X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).
- Returns
The object itself
- select_samples(X: array) array [source]¶
Selects the samples using uncertainty preselection and the GMM sampler.
- Parameters
X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.
- Returns
Indices of the selected samples of shape (batch_size).
- class cardinal.clustering.TwoStepIWKMeansSampler(beta: int, classifier, batch_size: int, assume_fitted: bool = False, verbose: int = 0, **kmeans_args)[source]¶
- class cardinal.clustering.TwoStepKCentroidSampler(kcentroid_sampler, beta: int, classifier, batch_size: int, assume_fitted: bool = False, verbose: int = 0, **kmeans_args)[source]¶
K-centroid sampler using a margin uncertainty sampler as a preselector.
- fit(X: array, y: array = None) TwoStepKMeansSampler [source]¶
Fits the first query sampler.
- Parameters
X – Labeled samples of shape (n_samples, n_features).
y – Labels of shape (n_samples).
- Returns
The object itself
- select_samples(X: array, sample_weight: Optional[array] = None) array [source]¶
Selects the samples using uncertainty preselection and the KMeans sampler.
- Parameters
X – Pool of unlabeled samples of shape (n_samples, n_features).
sample_weight – Weight of the samples of shape (n_samples), optional.
- Returns
Indices of the selected samples of shape (batch_size).
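The two-step samplers above combine a margin uncertainty preselection with a diversity-based second step. The sketch below illustrates that flow only. It assumes that `beta` multiplies `batch_size` to give the number of preselected samples, which the reference does not state, and it swaps in a simple farthest-first pick as a stand-in for the KMeans or GMM second step. All function names are hypothetical.

```python
import math


def margin(probas):
    """Difference between the two largest class probabilities (small = uncertain)."""
    top = sorted(probas, reverse=True)
    return top[0] - top[1]


def preselect_by_margin(probas, n_keep):
    """Indices of the n_keep samples with the smallest margin."""
    order = sorted(range(len(probas)), key=lambda i: margin(probas[i]))
    return order[:n_keep]


def farthest_first(X, k):
    """Stand-in second step (not KMeans): start at index 0, then add the farthest point."""
    chosen = [0]
    while len(chosen) < k:
        rest = [i for i in range(len(X)) if i not in chosen]
        nxt = max(rest, key=lambda i: min(math.dist(X[i], X[j]) for j in chosen))
        chosen.append(nxt)
    return chosen


def two_step_select(X, probas, batch_size, beta, second_step):
    """Preselect beta * batch_size uncertain samples, then diversify among them."""
    kept = preselect_by_margin(probas, beta * batch_size)  # assumed meaning of beta
    local = second_step([X[i] for i in kept], batch_size)
    return [kept[j] for j in local]  # map back to indices in the full pool


X = [[0.0], [0.1], [5.0], [5.1], [9.0], [9.2]]
probas = [[0.9, 0.1], [0.55, 0.45], [0.5, 0.5],
          [0.6, 0.4], [0.95, 0.05], [0.52, 0.48]]
result = two_step_select(X, probas, batch_size=2, beta=2, second_step=farthest_first)
```

The final index mapping matters: the second step only sees the preselected subset, so its output has to be translated back before it can be used against the original pool.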