mealy.error_visualizer

Classes

class mealy.error_visualizer.ErrorVisualizer(error_analyzer)

ErrorVisualizer provides visual utilities for analyzing the Error Tree of an ErrorAnalyzer.

Parameters

error_analyzer (ErrorAnalyzer) – fitted ErrorAnalyzer representing the performance of a primary model.

plot_error_tree(size=(50, 50))

Plot the graph of the Error Tree, the decision tree fitted by the ErrorAnalyzer.

Parameters

size (tuple) – size of the output plot. Defaults to (50, 50).

Returns

graph of the Error Analyzer Tree.

Return type

graphviz.Source
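Because the return value is a graphviz.Source, it can be written to disk with that object's render method. The following is a minimal sketch, assuming `visualizer` is an ErrorVisualizer built from a fitted ErrorAnalyzer. The helper name `save_error_tree`, the smaller default size, and the PNG format are illustrative choices, not part of mealy. Rendering also requires the Graphviz binaries to be installed on the system.

```python
def save_error_tree(visualizer, output_stem, size=(20, 20), image_format="png"):
    """Plot the Error Tree and render it to a file (hypothetical helper).

    visualizer   -- an ErrorVisualizer wrapping a fitted ErrorAnalyzer
    output_stem  -- output path without extension, e.g. "reports/error_tree"
    """
    # plot_error_tree returns a graphviz.Source object.
    graph = visualizer.plot_error_tree(size=size)
    # graphviz.Source.render writes the image and returns its path;
    # cleanup=True removes the intermediate DOT source file.
    return graph.render(output_stem, format=image_format, cleanup=True)
```

In a notebook, returning the graphviz.Source from a cell is usually enough to display it inline, so a helper like this is only needed when the image has to be saved.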

plot_feature_distributions_on_leaves(leaf_selector=None, top_k_features=5, show_global=True, show_class=True, rank_leaves_by='total_error_fraction', nr_bins=10, figsize=(15, 10))

Return feature distribution plots at the selected leaves.

The leaf_selector argument determines the leaves for which the distributions are plotted. By default (None), no specific leaves are selected, so the distributions are plotted for all the leaves. The leaves are ranked according to the criterion set via the rank_leaves_by argument.

The features are sorted by their importance in the Error Tree. The more important a feature is, the more strongly it correlates with the errors. The number of feature distributions to plot is set via top_k_features.
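The selection rule for top_k_features (positive, negative, or zero) can be illustrated with plain list slicing. This is a sketch of the documented semantics only, not mealy's implementation: the helper names and the example importances are hypothetical.

```python
from typing import Dict, List


def rank_features_by_importance(importances: Dict[str, float]) -> List[str]:
    """Return feature names sorted from most to least important."""
    return sorted(importances, key=importances.get, reverse=True)


def select_top_features(ranked_features: List[str], top_k_features: int = 5) -> List[str]:
    """Apply the documented top_k_features rule to a ranked feature list."""
    if top_k_features == 0:
        # 0 means "plot every feature".
        return list(ranked_features)
    # Positive k keeps the first k features; negative k drops the last |k|.
    return list(ranked_features[:top_k_features])


# Made-up importances, for illustration only.
importances = {"age": 0.50, "income": 0.30, "tenure": 0.15, "region": 0.05}
ranked = rank_features_by_importance(importances)

print(select_top_features(ranked, 2))   # ['age', 'income']
print(select_top_features(ranked, -1))  # ['age', 'income', 'tenure']
print(select_top_features(ranked, 0))   # ['age', 'income', 'tenure', 'region']
```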

Parameters
  • leaf_selector (None, int or array-like) – The leaves whose information will be returned.
      * int: only plot the feature distributions for the leaf matching the id.
      * array-like of int: only plot the feature distributions for the leaves matching the ids.
      * None (default): plot the feature distributions for all the leaves.

  • top_k_features (int) – Number of features to plot per node. Defaults to 5.
      * If a positive integer k is given, the distributions of the first k features (first in the sense of their importance) are plotted.
      * If a negative integer k is given, the distributions of all but the last |k| features (last in the sense of their importance) are plotted.
      * If k is 0, all the feature distributions are plotted.

  • show_global (bool) – Whether to plot the feature distributions for the whole data (global baseline) along with the ones for the leaf samples.

  • show_class (bool) – Whether to show the proportion of Wrongly and Correctly predicted samples for each bin.

  • rank_leaves_by (str) – Ranking criterion for the leaves. Defaults to 'total_error_fraction'. Valid values are:
      * 'total_error_fraction': rank by the fraction of the total error in the node.
      * 'purity': rank by the purity (ratio of wrongly predicted samples over the total number of node samples).
      * 'class_difference': rank by the difference between the numbers of wrongly and correctly predicted samples in a node.

  • nr_bins (int) – Number of bins in the feature distribution plots. Defaults to 10.

  • figsize (tuple of float) – Tuple of size 2 for the size of the plots as (width, height) in inches. Defaults to (15, 10).
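To make the three rank_leaves_by criteria concrete, the sketch below computes each score from per-leaf counts of wrongly and correctly predicted samples. It is not mealy's implementation. The helper `rank_leaves`, the leaf ids, and the counts are hypothetical, and it assumes that leaves are ordered from highest to lowest score and that 'total_error_fraction' means the share of all wrong predictions that fall in the leaf.

```python
from typing import Dict, List, Tuple

VALID_CRITERIA = ("total_error_fraction", "purity", "class_difference")


def rank_leaves(
    leaf_counts: Dict[int, Tuple[int, int]],
    rank_leaves_by: str = "total_error_fraction",
) -> List[int]:
    """Rank leaf ids by the chosen criterion, highest score first (assumed order).

    leaf_counts maps a leaf id to (n_wrong, n_correct) for that leaf.
    """
    if rank_leaves_by not in VALID_CRITERIA:
        raise ValueError(f"rank_leaves_by must be one of {VALID_CRITERIA}")

    total_wrong = sum(wrong for wrong, _ in leaf_counts.values())

    def score(leaf_id: int) -> float:
        wrong, correct = leaf_counts[leaf_id]
        if rank_leaves_by == "total_error_fraction":
            # Share of all wrongly predicted samples that land in this leaf.
            return wrong / total_wrong if total_wrong else 0.0
        if rank_leaves_by == "purity":
            # Wrongly predicted samples over all samples in the leaf.
            n_samples = wrong + correct
            return wrong / n_samples if n_samples else 0.0
        # class_difference: wrong minus correct sample counts.
        return float(wrong - correct)

    return sorted(leaf_counts, key=score, reverse=True)


# Made-up counts: leaf id -> (wrongly predicted, correctly predicted).
counts = {3: (40, 10), 5: (50, 150), 8: (10, 0)}

print(rank_leaves(counts, "total_error_fraction"))  # [5, 3, 8]
print(rank_leaves(counts, "purity"))                # [8, 3, 5]
print(rank_leaves(counts, "class_difference"))      # [3, 8, 5]
```

The same counts give three different orderings. Leaf 5 holds the most errors overall, leaf 8 contains only errors, and leaf 3 has the largest surplus of errors over correct predictions, so the choice of criterion changes which leaf is plotted first.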