cardinal.uncertainty

Functions

cardinal.uncertainty.confidence_score(classifier, X: ndarray) → ndarray

Measure the confidence score of a model for a set of samples.

Parameters
  • classifier – The classifier for which the labels are to be queried.

  • X – The pool of samples to query from.

Returns

The confidence score for each sample.
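As an illustration of what a confidence-based score can look like, the sketch below works directly on a matrix of class probabilities such as the output of a scikit-learn style predict_proba. The helper name lowest_confidence_scores is hypothetical, and returning 1 minus the top probability (so that higher means more uncertain) is an assumption, not a statement about how the library orients its score.

```python
import numpy as np


def lowest_confidence_scores(proba: np.ndarray) -> np.ndarray:
    """Hypothetical helper: 1 - probability of the predicted class."""
    # proba has shape (n_samples, n_classes), one row of class probabilities per sample.
    return 1.0 - proba.max(axis=1)


proba = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain prediction
])
scores = lowest_confidence_scores(proba)
```

With this orientation, the second sample receives the larger score because its top class probability is lower.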

cardinal.uncertainty.entropy_score(classifier, X: ndarray) → ndarray

Entropy sampling query strategy. Uses the entropy of the predicted class probabilities as the score.

This strategy selects the samples with the highest entropy in their prediction probabilities.

Parameters
  • classifier – The classifier for which the labels are to be queried.

  • X – The pool of samples to query from.

  • n_instances – Number of samples to be queried.

Returns

The entropy score for each sample.
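The entropy of a row of class probabilities can be sketched as follows. The helper name shannon_entropy_scores is hypothetical, and the use of the natural logarithm and of clipping to avoid log(0) are assumptions made for this example only.

```python
import numpy as np


def shannon_entropy_scores(proba: np.ndarray) -> np.ndarray:
    """Hypothetical helper: Shannon entropy (in nats) of each row of probabilities."""
    # Clip so that zero probabilities do not produce log(0).
    p = np.clip(proba, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


proba = np.array([
    [1.0, 0.0, 0.0],        # all mass on one class: entropy close to 0
    [1 / 3, 1 / 3, 1 / 3],  # uniform over 3 classes: entropy equals log(3)
])
entropies = shannon_entropy_scores(proba)
```

A uniform prediction maximizes the entropy, so it would be ranked first by a highest-entropy strategy.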

cardinal.uncertainty.margin_score(classifier, X: ndarray) → ndarray

Compute the difference between the probabilities of the two most probable classes for each sample.

This strategy takes the probabilities of the top two classes and uses their difference as a score for selection.

Parameters
  • classifier – The classifier for which the labels are to be queried.

  • X – The pool of samples to query from.

Returns

The margin score for each sample.
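The difference between the top two class probabilities can be sketched as below. The helper name top_two_margin is hypothetical. It returns the raw difference, so a small value means the model hesitates between two classes; whether the library returns this value or a transformed version of it is not stated here.

```python
import numpy as np


def top_two_margin(proba: np.ndarray) -> np.ndarray:
    """Hypothetical helper: gap between the two largest probabilities of each row."""
    # np.partition puts the two largest values of each row in the last two columns.
    part = np.partition(proba, -2, axis=1)
    return part[:, -1] - part[:, -2]


proba = np.array([
    [0.90, 0.05, 0.05],  # clear winner: large margin
    [0.40, 0.35, 0.25],  # two close candidates: small margin
])
margins = top_two_margin(proba)
```

Using np.partition avoids a full sort of each row, which is enough because only the two largest values are needed.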

Classes

class cardinal.uncertainty.ConfidenceSampler(classifier, batch_size: int, strategy: str = 'top', assume_fitted: bool = False, verbose: int = 0)

Selects samples with lowest prediction confidence.

Lowest confidence sampling looks at the probability of the class predicted by the classifier and selects the samples where this probability is the lowest.

Parameters
  • classifier – Classifier used to determine the prediction confidence. The object must comply with the scikit-learn interface and expose a predict_proba method.

  • batch_size – Number of samples to draw when predicting.

  • assume_fitted – If True, the classifier is assumed to be already fitted and is not refit.

  • verbose – The verbosity level. Defaults to 0.

classifier_

The fitted classifier.

fit(X: array, y: array) → ConfidenceSampler

Fit the estimator on labeled samples.

Parameters
  • X – Labeled samples of shape (n_samples, n_features).

  • y – Labels of shape (n_samples,).

Returns

The object itself.

score_samples(X: array) → array

Scores unlabeled samples. The scores are used to select the samples to annotate.

Parameters

X – Samples to evaluate, of shape (n_samples, n_features).

Returns

The score of each sample according to lowest confidence estimation.
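The fit and score_samples workflow described above can be sketched with stand-in classes. Everything in this block is hypothetical: CentroidClassifier is a toy classifier that only mimics the fit / predict_proba interface, SketchConfidenceSampler is not the library's implementation, and reading the 'top' strategy as "keep the batch_size highest scores" is an assumption.

```python
import numpy as np


class CentroidClassifier:
    """Hypothetical stand-in for a scikit-learn style classifier."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Softmax over negative squared distances to each class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        e = np.exp(-(d - d.min(axis=1, keepdims=True)))
        return e / e.sum(axis=1, keepdims=True)


class SketchConfidenceSampler:
    """Illustrative only; mirrors the documented parameters, not the real class."""

    def __init__(self, classifier, batch_size, assume_fitted=False):
        self.classifier = classifier
        self.batch_size = batch_size
        self.assume_fitted = assume_fitted

    def fit(self, X, y):
        # Refit the classifier unless the caller states it is already fitted.
        if not self.assume_fitted:
            self.classifier.fit(X, y)
        self.classifier_ = self.classifier
        return self

    def score_samples(self, X):
        # Assumed orientation: higher score = lower confidence in the predicted class.
        return 1.0 - self.classifier_.predict_proba(X).max(axis=1)


X_labeled = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]])
y_labeled = np.array([0, 0, 1, 1])
X_pool = np.array([[0.1, 0.0], [2.1, 2.0], [4.1, 4.0]])

sampler = SketchConfidenceSampler(CentroidClassifier(), batch_size=1)
sampler.fit(X_labeled, y_labeled)
scores = sampler.score_samples(X_pool)

# Assumed meaning of a 'top' strategy: keep the batch_size highest scores.
selected = np.argsort(scores)[::-1][: sampler.batch_size]
```

The pool sample halfway between the two class centroids gets the highest score, so it is the one selected for annotation.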

class cardinal.uncertainty.EntropySampler(classifier, batch_size: int, strategy: str = 'top', assume_fitted: bool = False, verbose: int = 0)

Selects samples with greatest entropy among all class probabilities.

Greatest entropy sampling measures the uncertainty of the model over all classes through the entropy of the predicted class probabilities. The samples with the highest entropy are selected.

Parameters
  • classifier – Classifier used to determine the prediction confidence. The object must comply with the scikit-learn interface and expose a predict_proba method.

  • batch_size – Number of samples to draw when predicting.

  • assume_fitted – If True, the classifier is assumed to be already fitted and is not refit.

  • verbose – The verbosity level. Defaults to 0.

classifier_

The fitted classifier.

fit(X: array, y: array) → EntropySampler

Fit the estimator on labeled samples.

Parameters
  • X – Labeled samples of shape (n_samples, n_features).

  • y – Labels of shape (n_samples,).

Returns

The object itself.

score_samples(X: array) → array

Scores unlabeled samples. The scores are used to select the samples to annotate.

Parameters

X – Samples to evaluate, of shape (n_samples, n_features).

Returns

The score of each sample according to the entropy of its predicted class probabilities.

class cardinal.uncertainty.MarginSampler(classifier, batch_size: int, strategy: str = 'top', assume_fitted: bool = False, verbose: int = 0)

Selects samples with greatest confusion between the top two classes.

Smallest margin sampling uses the difference in predicted probability between the top two classes to select the samples on which the model hesitates the most, that is, those with the smallest difference.

Parameters
  • classifier – Classifier used to determine the prediction confidence. The object must comply with the scikit-learn interface and expose a predict_proba method.

  • batch_size – Number of samples to draw when predicting.

  • assume_fitted – If True, the classifier is assumed to be already fitted and is not refit.

  • verbose – The verbosity level. Defaults to 0.

classifier_

The fitted classifier.

fit(X: array, y: array) → MarginSampler

Fit the estimator on labeled samples.

Parameters
  • X – Labeled samples of shape (n_samples, n_features).

  • y – Labels of shape (n_samples,).

Returns

The object itself.

score_samples(X: array) → array

Scores unlabeled samples. The scores are used to select the samples to annotate.

Parameters

X – Samples to evaluate, of shape (n_samples, n_features).

Returns

The score of each sample according to the margin between its top two class probabilities.
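The three criteria described in this module (lowest confidence, smallest margin, greatest entropy) do not always agree on which sample is the most uncertain. The sketch below computes each criterion by hand on two made-up probability rows; it does not call the library, and the variable names are illustrative only.

```python
import numpy as np

proba = np.array([
    [0.5, 0.5, 0.0],  # sample 0: tie between two classes, third class ruled out
    [0.4, 0.3, 0.3],  # sample 1: weak winner, mass spread over all classes
])

# Lowest confidence: a smaller top probability means more uncertainty.
top = proba.max(axis=1)

# Smallest margin: a smaller gap between the two best classes means more uncertainty.
ordered = np.sort(proba, axis=1)
margin = ordered[:, -1] - ordered[:, -2]

# Greatest entropy: a larger entropy means more uncertainty (clipped to avoid log(0)).
p = np.clip(proba, 1e-12, 1.0)
entropy = -(p * np.log(p)).sum(axis=1)

pick_confidence = int(np.argmin(top))
pick_margin = int(np.argmin(margin))
pick_entropy = int(np.argmax(entropy))
```

On this data the margin criterion picks sample 0 (an exact tie), while the confidence and entropy criteria pick sample 1, which is one reason to compare strategies on a given task rather than assume they are interchangeable.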