- table_evaluator.metrics.column_correlations(dataset_a: DataFrame, dataset_b: DataFrame, categorical_columns: list[str] | None, theil_u=True)
Column-wise correlation calculation between dataset_a and dataset_b.
- Parameters:
dataset_a (pd.DataFrame) – First DataFrame
dataset_b (pd.DataFrame) – Second DataFrame
categorical_columns (list[str]) – The columns containing categorical values
theil_u (bool) – Whether to use Theil’s U. If False, use Cramér’s V.
- Returns:
Mean correlation between all columns.
- Return type:
- table_evaluator.metrics.euclidean_distance(y_true: ndarray | Series, y_pred: ndarray | Series) float
Returns the Euclidean distance between y_true and y_pred.
- Parameters:
y_true (numpy.ndarray) – The ground truth values.
y_pred (numpy.ndarray) – The predicted values.
- Returns:
The Euclidean distance.
- Return type:
float
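The arithmetic behind a Euclidean distance can be sketched in plain Python. This is an illustration only, not the library's implementation: the name `euclidean_distance_sketch` is hypothetical, and the library function operates on NumPy arrays or pandas Series rather than lists.

```python
import math


def euclidean_distance_sketch(y_true, y_pred):
    """Square root of the sum of squared element-wise differences."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Differences are 3, 4 and 0, so the distance is sqrt(9 + 16 + 0).
dist = euclidean_distance_sketch([1.0, 2.0, 3.0], [4.0, 6.0, 3.0])
print(dist)  # 5.0
```

Unlike a mean-based error, this value grows with the number of elements, so it is only comparable across inputs of the same length.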
- table_evaluator.metrics.jensenshannon_distance(colname: str, real_col: Series, fake_col: Series, bins: int = 25) Dict[str, Any]
Calculate the Jensen-Shannon distance between real and fake data columns.
This function bins the data, calculates probability distributions, and then computes the Jensen-Shannon distance between these distributions.
- Parameters:
colname (str) – Name of the column being analyzed.
real_col (pd.Series) – Series containing the real data.
fake_col (pd.Series) – Series containing the fake data.
bins (int, optional) – Number of bins to use for discretization. Defaults to 25.
- Returns:
- A dictionary containing:
’col_name’: Name of the column.
’js_distance’: The calculated Jensen-Shannon distance.
- Return type:
Dict[str, Any]
The number of bins is capped at the length of the real column to avoid empty bins.
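The bin, normalise, compare sequence described above can be sketched in plain Python. Several details here are assumptions, not taken from the library: equal-width bins over the combined range of both columns, base-2 logarithms (so the distance lies between 0 and 1), and the hypothetical name `js_distance_sketch`.

```python
import math


def js_distance_sketch(real, fake, bins=25):
    """Jensen-Shannon distance between two binned samples (assumed binning)."""
    # Cap the bin count at the number of real observations.
    bins = max(1, min(bins, len(real)))
    lo = min(min(real), min(fake))
    hi = max(max(real), max(fake))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant column

    def histogram(values):
        counts = [0] * bins
        for v in values:
            # Values on the top edge are placed in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p, q = histogram(real), histogram(fake)
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; empty bins of `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


same = js_distance_sketch([1, 2, 3, 4], [1, 2, 3, 4])
disjoint = js_distance_sketch([0, 0, 0, 0], [1, 1, 1, 1])
result = {"col_name": "age", "js_distance": disjoint}  # shape of the documented return value
print(same, disjoint)  # 0.0 1.0
```

Identical samples give a distance of 0, and samples that share no bins give the maximum of 1 under the base-2 assumption.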
- table_evaluator.metrics.js_distance_df(real: DataFrame, fake: DataFrame, numerical_columns: List[str]) DataFrame
Calculate Jensen-Shannon distances between real and fake data for numerical columns.
This function computes the Jensen-Shannon distance for each numerical column in parallel using joblib’s Parallel and delayed functions.
- Parameters:
real (pd.DataFrame) – DataFrame containing the real data.
fake (pd.DataFrame) – DataFrame containing the fake data.
numerical_columns (List[str]) – List of column names to compute distances for.
- Returns:
- A DataFrame with column names as index and Jensen-Shannon distances as values.
- Return type:
pd.DataFrame
- Raises:
AssertionError – If the columns in real and fake DataFrames are not identical.
- table_evaluator.metrics.kolmogorov_smirnov_test(col_name: str, real_col: Series, fake_col: Series) Dict[str, Any]
Perform Kolmogorov-Smirnov test on real and fake data columns.
- Parameters:
col_name (str) – Name of the column being tested.
real_col (pd.Series) – Series containing the real data.
fake_col (pd.Series) – Series containing the fake data.
- Returns:
- A dictionary containing:
’col_name’: Name of the column.
’statistic’: The KS statistic.
’p-value’: The p-value of the test.
’equality’: ‘identical’ if p-value > 0.01, else ‘different’.
- Return type:
Dict[str, Any]
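To make the 'statistic' and 'equality' fields concrete, here is a minimal plain-Python sketch. The two-sample KS statistic is the largest gap between the two empirical distribution functions. The names `ks_statistic_sketch` and `equality_label` are hypothetical, and the p-value computation is deliberately left out because this reference does not say how the library obtains it.

```python
from bisect import bisect_right


def ks_statistic_sketch(real, fake):
    """Largest absolute gap between the two empirical CDFs."""
    real_sorted, fake_sorted = sorted(real), sorted(fake)

    def ecdf(sorted_values, x):
        # Fraction of observations less than or equal to x.
        return bisect_right(sorted_values, x) / len(sorted_values)

    points = real_sorted + fake_sorted  # the gap can only change at observed values
    return max(abs(ecdf(real_sorted, x) - ecdf(fake_sorted, x)) for x in points)


def equality_label(p_value, threshold=0.01):
    """Apply the documented rule: 'identical' only if p-value > 0.01."""
    return "identical" if p_value > threshold else "different"


print(ks_statistic_sketch([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0
print(ks_statistic_sketch([1, 2, 3, 4], [5, 6, 7, 8]))  # 1.0
print(equality_label(0.5), equality_label(0.01))  # identical different
```

Note that a p-value of exactly 0.01 is labelled 'different', since the documented comparison is strict.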
- table_evaluator.metrics.mean_absolute_error(y_true: ndarray, y_pred: ndarray) floating[Any]
Returns the mean absolute error between y_true and y_pred.
- Parameters:
y_true – numpy.ndarray with the ground truth values.
y_pred – numpy.ndarray with the predicted values.
- Returns:
Mean absolute error (float).
- table_evaluator.metrics.mean_absolute_percentage_error(y_true: ndarray | Series, y_pred: ndarray | Series)
Returns the mean absolute percentage error between y_true and y_pred. Raises ValueError if y_true contains zero values.
- Parameters:
y_true (numpy.ndarray) – The ground truth values.
y_pred (numpy.ndarray) – The predicted values.
- Returns:
Mean absolute percentage error.
- Return type:
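The zero check exists because each error is divided by the corresponding true value. A minimal plain-Python sketch of that behaviour follows. The name `mape_sketch` is hypothetical, and returning a fraction rather than a value scaled by 100 is an assumption, since this reference does not state the scale.

```python
def mape_sketch(y_true, y_pred):
    """Mean of |true - pred| / |true|, returned as a fraction (assumed scale)."""
    if any(t == 0 for t in y_true):
        # Division by a zero ground-truth value is undefined.
        raise ValueError("y_true contains zero values; MAPE is undefined")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Both predictions are off by 10% of the true value.
error = mape_sketch([100.0, 200.0], [110.0, 180.0])
print(round(error, 6))  # 0.1

try:
    mape_sketch([0.0, 1.0], [0.5, 1.0])
except ValueError as exc:
    print(f"rejected: {exc}")
```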
- table_evaluator.metrics.rmse(y_true: ndarray | Series, y_pred: ndarray | Series) ndarray | Series
Returns the root mean squared error between y_true and y_pred.
- Parameters:
y_true – numpy.ndarray with the ground truth values.
y_pred – numpy.ndarray with the predicted values.
- Returns:
Root mean squared error (float).
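As a rough illustration of how the root mean squared error differs from the mean absolute error, the plain-Python sketch below computes both on the same inputs. The names `rmse_sketch` and `mae_sketch` are hypothetical and are not the library's implementations, which operate on NumPy arrays or pandas Series.

```python
import math


def rmse_sketch(y_true, y_pred):
    """Square root of the mean squared element-wise difference."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae_sketch(y_true, y_pred):
    """Mean of the absolute element-wise differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]  # a single prediction is off by 4

print(rmse_sketch(y_true, y_pred))  # 2.0
print(mae_sketch(y_true, y_pred))   # 1.0
```

Squaring before averaging makes the single large error weigh more heavily, which is why the RMSE here is twice the MAE.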