mealy.error_analyzer
Classes
- class mealy.error_analyzer.ErrorAnalyzer(primary_model, feature_names=None, param_grid=None, probability_threshold=0.5, random_state=65537)
ErrorAnalyzer analyzes the errors of a prediction model on a test set.
It uses the model predictions and the ground-truth target to compute the model errors on the test set. It then trains a decision tree, called an Error Analyzer Tree, on the same test set, using the model errors as the target. Each node of the tree defines a segment of samples whose errors can be studied individually.
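For a classifier, the error target described above can be sketched as a plain mismatch indicator between predictions and ground truth. The helper `compute_error_labels` is hypothetical and not part of mealy; this section does not say how errors are defined for regression models, so the sketch covers classification only.

```python
def compute_error_labels(y_true, y_pred):
    """Return 1 where the prediction is wrong and 0 where it is correct.

    Hypothetical helper (classification only): an "error" is a mismatch
    between the predicted label and the true label.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [int(t != p) for t, p in zip(y_true, y_pred)]


y_true = ["cat", "dog", "dog", "cat", "bird"]
y_pred = ["cat", "cat", "dog", "cat", "dog"]
errors = compute_error_labels(y_true, y_pred)
print(errors)  # [0, 1, 0, 0, 1]
```

These 0/1 labels are the target on which the Error Analyzer Tree is trained.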
- Parameters
primary_model (sklearn.base.BaseEstimator or sklearn.pipeline.Pipeline) – the sklearn model to analyze: either an estimator, or a Pipeline containing a ColumnTransformer with the preprocessing steps and an estimator as its last step.
feature_names (list of str) – list of feature names. Defaults to None.
param_grid (dict) – sklearn.tree.DecisionTreeClassifier hyperparameter values for the grid search. Defaults to None.
random_state (int) – random seed. Defaults to 65537.
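A primary_model of the Pipeline form described above (a ColumnTransformer for preprocessing, an estimator as the last step) might look like the sketch below. The toy data, the column layout and the `param_grid` values are assumptions chosen for illustration; the grid keys are ordinary `DecisionTreeClassifier` hyperparameters.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data: column 0 is numeric, column 1 is a small integer category code.
rng = np.random.RandomState(0)
X = np.column_stack([rng.normal(size=200), rng.randint(0, 3, size=200)])
y = (X[:, 0] + (X[:, 1] == 2) > 0.5).astype(int)

# Preprocessing lives in a ColumnTransformer; the estimator is the last step.
preprocessing = ColumnTransformer([
    ("num", StandardScaler(), [0]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [1]),
])
primary_model = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", LogisticRegression()),
])
primary_model.fit(X, y)

# Illustrative hyperparameter grid for the Error Analyzer Tree.
param_grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [10, 20]}
```

The fitted pipeline and grid could then be passed as `ErrorAnalyzer(primary_model, param_grid=param_grid)`, using only the constructor arguments documented here.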
- _error_tree
the estimator used to train the Error Analyzer Tree
- Type
DecisionTreeClassifier
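This section does not describe how the grid search over param_grid is run. As an assumption, the sketch below uses a plain `GridSearchCV` over a `DecisionTreeClassifier` to select the error tree; the error labels are synthetic stand-ins, not the output of a real primary model.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(65537)
X_test = rng.normal(size=(300, 4))
# Synthetic error target: pretend the primary model mostly fails when feature 0 is large.
errors = (X_test[:, 0] + 0.3 * rng.normal(size=300) > 1.0).astype(int)

param_grid = {"max_depth": [2, 3, 4], "min_samples_leaf": [5, 20]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=65537),
    param_grid=param_grid,
    cv=3,
)
search.fit(X_test, errors)

# The refitted best estimator plays the role of the error tree.
error_tree = search.best_estimator_
print(search.best_params_)
```

Capping `max_depth` in the grid keeps the resulting segments few and readable, which matters more here than raw predictive accuracy.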
- evaluate(X, y, output_format='str')
Evaluate the performance of the ErrorAnalyzer on the given test data and labels, and return summary metrics about the Error Analyzer Tree.
- Parameters
X (numpy.ndarray or pandas.DataFrame) – feature data from a test set, used to evaluate the primary predictor and to train an Error Analyzer Tree.
y (numpy.ndarray or pandas.DataFrame) – target data from a test set, used to evaluate the primary predictor and to train an Error Analyzer Tree.
output_format (string) – Return format used for the report. Valid values are ‘dict’ or ‘str’. Defaults to ‘str’.
- Returns
dictionary or string report storing different metrics about the Error Analyzer Tree.
- Return type
dict or str
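The metrics that evaluate reports are not listed in this section. The hypothetical `summarize` helper below only illustrates the output_format switch: it computes two assumed metrics (the primary model's error rate and how often the error tree reproduces the error labels) and returns them either as a dict or as a formatted string.

```python
def summarize(errors, predicted_errors, output_format="str"):
    """Hypothetical summary report; not mealy's actual set of metrics."""
    if output_format not in ("dict", "str"):
        raise ValueError("output_format must be 'dict' or 'str'")
    n = len(errors)
    metrics = {
        # Fraction of test samples the primary model got wrong.
        "primary_model_error_rate": sum(errors) / n,
        # Fraction of samples whose wrong/correct status the error tree reproduces.
        "error_tree_accuracy": sum(int(e == p) for e, p in zip(errors, predicted_errors)) / n,
    }
    if output_format == "dict":
        return metrics
    return "\n".join(f"{name}: {value:.3f}" for name, value in metrics.items())


errors = [0, 1, 0, 0, 1]
predicted = [0, 1, 0, 1, 1]
print(summarize(errors, predicted))
```

With these inputs the string report reads `primary_model_error_rate: 0.400` followed by `error_tree_accuracy: 0.800`.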
- fit(X, y)
Fit the Error Analyzer Tree.
Trains the Error Analyzer Tree, a decision tree that discriminates between the samples the primary model predicts correctly and those it predicts wrongly (errors).
- Parameters
X (numpy.ndarray or pandas.DataFrame) – feature data from a test set, used to evaluate the primary predictor and to train an Error Analyzer Tree.
y (numpy.ndarray or pandas.DataFrame) – target data from a test set, used to evaluate the primary predictor and to train an Error Analyzer Tree.
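The fitting step can be sketched with plain scikit-learn, assuming a classification task: predict with the primary model on the test set, mark mismatches as errors, then train a decision tree on the same test set with those errors as the target. This is not mealy's implementation; the tree settings (`max_depth=3`, `min_samples_leaf=10`) are arbitrary and no grid search is performed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# The primary model is trained beforehand; only its test-set predictions are used.
primary_model = LogisticRegression().fit(X_train, y_train)

# Error target: 1 where the primary model is wrong on the test set, 0 otherwise.
errors = (primary_model.predict(X_test) != y_test).astype(int)

# Train the error tree on the same test set, with the errors as the target.
error_tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10, random_state=0)
error_tree.fit(X_test, errors)

# Each leaf is a segment of the test set; apply() maps every sample to its leaf id.
leaf_ids = error_tree.apply(X_test)
print(np.bincount(leaf_ids))
```

Training on the test set rather than the training set is what lets the tree describe where the primary model fails on unseen data.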
- get_error_leaf_summary(leaf_selector=None, add_path_to_leaves=False, output_format='dict', rank_by='total_error_fraction')
Return summary information regarding leaves.
- Parameters
leaf_selector (None, int or array-like) – The leaves whose information will be returned:
* int: Only return information of the leaf with the corresponding id
* array-like: Only return information of the leaves corresponding to these ids
* None (default): Return information of all the leaves
add_path_to_leaves (bool) – Whether to add the path from the root of the tree down to each selected leaf. Defaults to False.
output_format (string) – Return format used for the report. Valid values are ‘dict’ or ‘str’. Defaults to ‘dict’.
rank_by (str) – Ranking criterion for the leaves. Valid values are:
* ‘total_error_fraction’ (default): rank by the fraction of the total error that falls in the node
* ‘purity’: rank by the purity (ratio of wrongly predicted samples over the total number of samples in the node)
* ‘class_difference’: rank by the difference between the number of wrongly and correctly predicted samples in the node
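The leaf_selector and rank_by options can be illustrated with a small pure-Python sketch. The helper `rank_leaves` and its `leaf_counts` input (leaf id mapped to the number of wrongly and correctly predicted samples) are hypothetical; the sketch also assumes that leaves are ranked in descending order of each criterion, which this section does not state explicitly.

```python
import numbers


def rank_leaves(leaf_counts, leaf_selector=None, rank_by="total_error_fraction"):
    """Hypothetical re-implementation of the documented selection and ranking.

    leaf_counts maps leaf id -> (n_wrong, n_correct) for the primary model.
    """
    total_wrong = sum(wrong for wrong, _ in leaf_counts.values())

    # Normalize leaf_selector: None -> all leaves, int -> one leaf, else an iterable of ids.
    if leaf_selector is None:
        selected = list(leaf_counts)
    elif isinstance(leaf_selector, numbers.Integral):
        selected = [leaf_selector]
    else:
        selected = list(leaf_selector)

    def score(leaf_id):
        wrong, correct = leaf_counts[leaf_id]
        if rank_by == "total_error_fraction":
            return wrong / total_wrong if total_wrong else 0.0
        if rank_by == "purity":
            return wrong / (wrong + correct)
        if rank_by == "class_difference":
            return wrong - correct
        raise ValueError(f"unknown rank_by: {rank_by!r}")

    # Highest score first (assumed ranking direction).
    return sorted(selected, key=score, reverse=True)


leaf_counts = {3: (30, 70), 4: (8, 2), 6: (12, 48)}
print(rank_leaves(leaf_counts))                    # [3, 6, 4]
print(rank_leaves(leaf_counts, rank_by="purity"))  # [4, 3, 6]
```

The three criteria can disagree: leaf 3 holds most of the errors (30 of 50), while leaf 4 is small but almost entirely wrong (8 of 10), so it ranks first by purity and by class difference.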
- Returns
list of reports (as dictionaries or strings) with information about each selected leaf.
- Return type
list
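The exact path format produced by add_path_to_leaves is not described here. As an assumption, the sketch below walks scikit-learn's low-level `tree_` arrays to list the split conditions from the root down to a given leaf, using sklearn's convention that a sample goes to the left child when its feature value is `<=` the threshold. The helper `path_to_leaf` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def path_to_leaf(tree, leaf_id, feature_names):
    """Return the split conditions from the root down to leaf_id (hypothetical helper)."""
    t = tree.tree_
    # Map each child to (parent, went_left); leaves have both children equal to -1.
    parent = {}
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left != right:
            parent[left] = (node, True)
            parent[right] = (node, False)
    conditions = []
    node = leaf_id
    # Climb from the leaf to the root, recording the split taken at each step.
    while node in parent:
        p, went_left = parent[node]
        op = "<=" if went_left else ">"
        conditions.append(f"{feature_names[t.feature[p]]} {op} {t.threshold[p]:.2f}")
        node = p
    return list(reversed(conditions))


X, y = make_classification(n_samples=200, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
names = [f"f{i}" for i in range(X.shape[1])]
leaf = tree.apply(X[:1])[0]
path = path_to_leaf(tree, leaf, names)
print(path)
```

The root (node 0) has no parent, so its path is empty; every other node's path has one condition per level of depth.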