mealy.error_visualizer

Classes

class mealy.error_visualizer.ErrorVisualizer(error_analyzer)

ErrorVisualizer provides plotting utilities for analyzing the Error Tree of a fitted ErrorAnalyzer.

Parameters: error_analyzer (ErrorAnalyzer) – fitted ErrorAnalyzer representing the performance of a primary model.

plot_error_tree(size=(50, 50))

Plot the Error Tree, the decision tree fitted by the ErrorAnalyzer, as a graph.

Parameters: size (tuple) – size of the output plot.
Returns: graph of the Error Analyzer Tree.
Return type: graphviz.Source
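A typical call sequence might look like the sketch below. It assumes a fitted scikit-learn-style classifier and held-out data supplied by the caller. The helper name plot_and_save_error_tree, its arguments, and the output file name are hypothetical, and the ErrorAnalyzer constructor and fit call are assumed to follow the usage shown in the mealy README rather than anything stated on this page.

```python
def plot_and_save_error_tree(model, X_test, y_test, feature_names,
                             out_path="error_tree"):
    """Hypothetical helper: fit an Error Tree and render it to a PNG file."""
    # Imported inside the function so this sketch can be loaded even when
    # mealy is not installed.
    from mealy.error_analyzer import ErrorAnalyzer
    from mealy.error_visualizer import ErrorVisualizer

    # Assumption: ErrorAnalyzer takes the primary model plus the feature
    # names, and is fitted on held-out data.
    analyzer = ErrorAnalyzer(model, feature_names=feature_names)
    analyzer.fit(X_test, y_test)

    # ErrorVisualizer expects an already fitted ErrorAnalyzer.
    visualizer = ErrorVisualizer(analyzer)
    graph = visualizer.plot_error_tree(size=(50, 50))  # graphviz.Source

    # graphviz.Source.render writes the file and returns its path.
    # It needs the Graphviz binaries to be installed on the system.
    return graph.render(out_path, format="png", cleanup=True)
```

In a notebook, the returned graphviz.Source object is usually displayed inline, so the explicit render step is only needed when the plot has to be saved to disk.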

plot_feature_distributions_on_leaves(leaf_selector=None, top_k_features=5, show_global=True, show_class=True, rank_leaves_by='total_error_fraction', nr_bins=10, figsize=(15, 10))

Return feature distribution plots at the selected leaves.

The leaf_selector argument determines the leaves for which the distributions are plotted. By default (None), no specific leaves are selected, so the distributions are plotted for all the leaves. The leaves are ranked according to the criterion set via the rank_leaves_by argument.

The features are sorted by their importance in the Error Tree. The more important a feature is, the more strongly it is correlated with the errors. The number of feature distributions to plot is set via top_k_features.
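The ordering and selection of features can be illustrated with a small pure-Python sketch. It is not mealy's implementation: the function names, the feature names, and the importance values are hypothetical. The sign convention for top_k_features follows the description of that parameter on this page.

```python
def rank_features(feature_names, importances):
    """Return the feature names sorted from most to least important."""
    ranked = sorted(zip(feature_names, importances),
                    key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked]


def select_top_k(ranked_features, top_k_features):
    """Apply the top_k_features convention to an importance-ranked list."""
    if top_k_features == 0:
        # 0 means "keep every feature".
        return list(ranked_features)
    # Positive k keeps the first k features; negative k drops the last |k|.
    return list(ranked_features[:top_k_features])


# Hypothetical importances, e.g. as read from a fitted decision tree.
names = ["age", "income", "city", "tenure"]
importances = [0.1, 0.5, 0.0, 0.4]

ranked = rank_features(names, importances)
print(select_top_k(ranked, 2))   # the two most important features
print(select_top_k(ranked, -1))  # everything except the least important one
print(select_top_k(ranked, 0))   # all features
```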

Parameters

leaf_selector (None, int or array-like) – the leaves whose information will be returned
  * int: Only plot the feature distributions for the leaf matching the id
  * array-like of int: Only plot the feature distributions for the leaves matching the ids
  * None (default): Plot the feature distributions for all the leaves
top_k_features (int) – Number of features to plot per node.
  * If a positive integer k is given, the distributions of the k most important features are plotted
  * If a negative integer k is given, the distributions of all but the |k| least important features are plotted
  * If k is 0, all the feature distributions are plotted
show_global (bool) – Whether to plot the feature distributions for the whole data (global baseline) along with the ones for the leaf samples.
show_class (bool) – Whether to show the proportion of Wrongly and Correctly predicted samples for each bin.
rank_leaves_by (str) – Ranking criterion for the leaves. Valid values are:
  * ‘total_error_fraction’: rank by the fraction of total error in the node
  * ‘purity’: rank by the purity (ratio of wrongly predicted samples over the total number of node samples)
  * ‘class_difference’: rank by the difference between the numbers of wrongly and correctly predicted samples in a node
nr_bins (int) – Number of bins in the feature distribution plots. Defaults to 10.
figsize (tuple of float) – Tuple of size 2 for the size of the plots as (width, height) in inches. Defaults to (15, 10).
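To make the three rank_leaves_by criteria concrete, the sketch below scores a few toy leaves from their counts of wrongly and correctly predicted samples. It is an illustration only: the helper names, the leaf ids, and the counts are hypothetical, and ranking from highest to lowest score is an assumption, not something this page states.

```python
def leaf_score(n_wrong, n_correct, total_errors, criterion):
    """Score one leaf under a rank_leaves_by criterion."""
    if criterion == "total_error_fraction":
        # Share of all errors that fall into this leaf.
        return n_wrong / total_errors
    if criterion == "purity":
        # Wrongly predicted samples over all samples in the leaf.
        return n_wrong / (n_wrong + n_correct)
    if criterion == "class_difference":
        # Wrongly predicted minus correctly predicted samples.
        return n_wrong - n_correct
    raise ValueError(f"Unknown criterion: {criterion!r}")


def rank_leaves(leaf_counts, rank_leaves_by="total_error_fraction"):
    """Return leaf ids sorted by score, highest first (assumed order)."""
    # Assumes at least one wrongly predicted sample overall.
    total_errors = sum(n_wrong for n_wrong, _ in leaf_counts.values())
    return sorted(
        leaf_counts,
        key=lambda leaf_id: leaf_score(*leaf_counts[leaf_id],
                                       total_errors, rank_leaves_by),
        reverse=True,
    )


# Hypothetical leaves: leaf id -> (wrongly predicted, correctly predicted).
leaf_counts = {3: (40, 10), 5: (10, 0), 8: (50, 150)}

for criterion in ("total_error_fraction", "purity", "class_difference"):
    print(criterion, rank_leaves(leaf_counts, criterion))
```

With these toy counts each criterion produces a different ordering, which is why the choice of rank_leaves_by can change which leaves appear first.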