Plots

table_evaluator.plots.cdf(data_r, data_f, xlabel: str = 'Values', ylabel: str = 'Cumulative Sum', ax=None, show: bool = True)

Plot the cumulative distribution function (CDF) of the real and fake data on an optionally given ax. If no ax is given, the CDF is plotted and shown.

Parameters:
  • data_r (pd.Series) – Series with real data.

  • data_f (pd.Series) – Series with fake data.

  • xlabel (str) – Label to put on the x-axis.

  • ylabel (str) – Label to put on the y-axis.

  • ax (matplotlib.axes.Axes | None) – The axis to plot on. If None, a new figure is created.

  • show (bool) – Whether to display the plot. Defaults to True.

Returns:

The axis with the plot if show is False, otherwise None.

Return type:

matplotlib.axes.Axes | None
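To make the plotted quantity concrete, the following is a minimal sketch of an empirical CDF in plain Python. It is not the library's implementation: the helper name empirical_cdf and the toy data are hypothetical, and the y-axis scaling used by cdf (labelled 'Cumulative Sum' by default) may differ from the 0 to 1 fractions used here.

```python
from typing import List, Sequence, Tuple


def empirical_cdf(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return the sorted values and the cumulative fraction at each sorted position."""
    xs = sorted(values)
    n = len(xs)
    # Position i (0-based) in sorted order covers (i + 1) of the n observations.
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


# Hypothetical toy data standing in for data_r and data_f.
real = [1.0, 2.0, 2.0, 3.0, 5.0]
fake = [1.5, 2.5, 2.5, 4.0, 4.5]

real_x, real_y = empirical_cdf(real)
fake_x, fake_y = empirical_cdf(fake)

# Drawing (real_x, real_y) and (fake_x, fake_y) on the same axis gives the
# kind of comparison this function is meant for: the closer the two curves,
# the closer the two distributions.
```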

table_evaluator.plots.plot_correlation_comparison(evaluators: List, annot: bool = False, show: bool = False)

Plot the correlation differences of multiple TableEvaluator objects.

Parameters:
  • evaluators (List[TableEvaluator]) – List of TableEvaluator objects.

  • annot (bool) – Whether to annotate the plots with numbers.

table_evaluator.plots.plot_correlation_difference(real: DataFrame, fake: DataFrame, plot_diff: bool = True, cat_cols: list | None = None, annot: bool = False, fname: str | None = None, show: bool = True)

Plot the association matrices for the real DataFrame and the fake DataFrame, and optionally the difference between them. Supports continuous and categorical (e.g. Male, Female) data types. All object and category dtypes are treated as categorical columns if cat_cols is not passed.

  • Continuous - Continuous: Uses Pearson’s correlation coefficient

  • Continuous - Categorical: Uses the correlation ratio (https://en.wikipedia.org/wiki/Correlation_ratio) for both continuous - categorical and categorical - continuous pairs.

  • Categorical - Categorical: Uses Theil’s U, an asymmetric correlation metric for Categorical associations

Parameters:
  • real (pd.DataFrame) – DataFrame with real data.

  • fake (pd.DataFrame) – DataFrame with synthetic data.

  • plot_diff (bool) – Whether to also plot the difference between the two association matrices.

  • cat_cols (Optional[List[str]]) – List of Categorical columns.

  • annot (bool) – Whether to annotate the plot with numbers indicating the associations.
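The two less familiar measures named in the description, the correlation ratio and Theil's U, can be sketched in plain Python as follows. This is an illustration of the textbook definitions under stated assumptions, not the code the library calls: all function names and data are hypothetical, the correlation ratio is returned as eta rather than eta squared, and returning 1.0 for a constant x in theils_u is a convention chosen for this sketch.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def correlation_ratio(categories: Sequence[Hashable], values: Sequence[float]) -> float:
    """Correlation ratio (eta) of a continuous variable grouped by a categorical one."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    overall_mean = sum(values) / len(values)
    # Between-group sum of squares, weighted by group size.
    between = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values())
    # Total sum of squares around the overall mean.
    total = sum((v - overall_mean) ** 2 for v in values)
    return math.sqrt(between / total) if total > 0 else 0.0


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(x) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Conditional entropy H(x | y) in nats."""
    n = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    # p(x, y) * log(p(x, y) / p(y)), with both probabilities written as counts.
    return -sum((c / n) * math.log(c / y_counts[yv]) for (_, yv), c in xy_counts.items())


def theils_u(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """U(x | y): share of the uncertainty in x that is removed by knowing y."""
    h_x = entropy(x)
    if h_x == 0:
        return 1.0  # sketch convention: a constant x carries no uncertainty
    return (h_x - conditional_entropy(x, y)) / h_x


# Hypothetical toy data.
sex = ["M", "M", "F", "F"]
height = [180.0, 182.0, 165.0, 167.0]
eta = correlation_ratio(sex, height)

coarse = ["a", "a", "b", "b"]
fine = ["p", "q", "r", "s"]
u_coarse_given_fine = theils_u(coarse, fine)  # fine fully determines coarse
u_fine_given_coarse = theils_u(fine, coarse)  # coarse only narrows fine down
```

The last two values differ, which illustrates why the description calls Theil's U asymmetric: an association matrix built from it is generally not symmetric in its categorical - categorical entries.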

table_evaluator.plots.plot_mean_std(real: DataFrame, fake: DataFrame, ax=None, fname=None, show: bool = True)

Plot the means and standard deviations of each dataset.

Parameters:
  • real – DataFrame containing the real data

  • fake – DataFrame containing the fake data

  • ax – Axis to plot on. If None, a new figure is made.

  • fname – If not None, saves the plot with this file name.
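As a rough illustration of the statistics being compared, the sketch below computes per-column means and standard deviations for two small tables in plain Python. The helper column_mean_std and the data are hypothetical, the sample standard deviation is an assumption of this sketch, and any transformation or layout the library applies when plotting is not covered here.

```python
import statistics
from typing import Dict, List, Tuple


def column_mean_std(table: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each column name to its (mean, sample standard deviation)."""
    return {
        name: (statistics.mean(col), statistics.stdev(col))
        for name, col in table.items()
    }


# Hypothetical toy tables standing in for the real and fake DataFrames.
real = {"age": [20.0, 30.0, 40.0], "income": [10.0, 20.0, 60.0]}
fake = {"age": [22.0, 30.0, 38.0], "income": [15.0, 25.0, 50.0]}

real_stats = column_mean_std(real)
fake_stats = column_mean_std(fake)

# One (real, fake) pair per column. If the fake data matches the real data
# well, the two numbers in each pair are close to each other.
mean_pairs = [(real_stats[c][0], fake_stats[c][0]) for c in real]
std_pairs = [(real_stats[c][1], fake_stats[c][1]) for c in real]
```

In this toy example the means agree for both columns while the fake standard deviations are smaller, the kind of mismatch a mean and standard deviation comparison is meant to expose.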

table_evaluator.plots.plot_mean_std_comparison(evaluators: List, show: bool = True)

Plot comparison between the means and standard deviations from each evaluator in evaluators.

Parameters:

evaluators – List of TableEvaluator objects to compare.

table_evaluator.plots.plot_var_cor(x: DataFrame | ndarray, ax=None, return_values: bool = False, **kwargs) → ndarray | None

Given a DataFrame, plot the correlation between its columns. The function assumes all data is numeric and continuous. It masks the top half of the correlation matrix, since the matrix is symmetric and that half duplicates the bottom half.

Decommissioned in favor of the dython associations function.

Parameters:
  • x – DataFrame to plot data from.

  • ax – Axis on which to plot the correlations.

  • return_values – Whether to return the correlation matrix after plotting.

  • kwargs – Keyword arguments that are passed to sns.heatmap.

Returns:

The correlation matrix of x as an np.ndarray if return_values is True, otherwise None.
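The masking idea can be sketched without any plotting library. The example below builds a Pearson correlation matrix in plain Python and blanks out the entries above the diagonal. The names pearson and masked_correlation are hypothetical, and keeping the diagonal visible is a choice made for this sketch, not a statement about what plot_var_cor does.

```python
import math
from typing import Dict, List, Optional, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def masked_correlation(table: Dict[str, List[float]]) -> List[List[Optional[float]]]:
    """Correlation matrix with every entry above the diagonal replaced by None."""
    cols = list(table.values())
    k = len(cols)
    return [
        [pearson(cols[i], cols[j]) if j <= i else None for j in range(k)]
        for i in range(k)
    ]


# Hypothetical toy data: b rises with a, c falls as a rises.
data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
}
matrix = masked_correlation(data)
```

Because the correlation of columns i and j equals that of j and i, the masked entries carry no information that is not already present below the diagonal.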