Evaluator

BerryDBRAGEvaluator.eval(test_params, metrics_names, metrics_args=None, metrics_processor=None)

This method evaluates the given test cases using the specified metrics.

Parameters:

  • test_params (Dict[str, Any]): A dictionary containing the test parameters (see the first example after this parameter list). It can include:

    1. The dataset used for testing, under the key "test_data".
    2. The name of the test suite, under the key "test_suite_name". (Optional)
    3. The name of the run, under the key "run_name". Use this to differentiate multiple runs against the same test suite. (Optional)
    
  • metrics_names (Union[str, List[str]]): The names of the metrics to be used for evaluation. Redundant metrics are flattened in the code. Metrics can be passed in any of the following ways:

    1. A single metric name as a string.
    2. Multiple metrics as a list of individual metric names.
    3. A single metrics collection name as a string.
    4. Multiple metrics collections as a list of collection names.
    5. A combination of metric names and metrics collection names as a list.
    
  • metrics_args (Optional[Dict[str, Any]], default None): Metric parameters, such as the threshold, the model to use for the evaluation, and whether or not to include a reason, can be passed as a Dict.

  • metrics_processor (optional): A custom metrics processor function (see the custom processor example below). If not provided, the default metrics processor is used, which upserts all the metrics to EvalMetricsDB.
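
Example (constructing the parameters):

The following is a minimal sketch of how the parameters described above might be assembled. The test_params keys come from the list above; the test-case schema, the metric and collection names, and the metrics_args key names are illustrative assumptions rather than confirmed BerryDB identifiers.

    # The "test_data" schema shown here is an assumption; use whatever structure
    # your test cases actually follow.
    test_params = {
        "test_data": [
            {"query": "What is BerryDB?", "expected_answer": "..."},
        ],
        "test_suite_name": "rag-regression-suite",  # optional
        "run_name": "baseline-run",                 # optional, differentiates runs within the same suite
    }

    # metrics_names accepts any of the shapes listed above; these names are placeholders.
    metrics_names = "answer_relevancy"                           # 1. a single metric name
    metrics_names = ["answer_relevancy", "faithfulness"]         # 2. a list of metric names
    metrics_names = "rag_quality_collection"                     # 3. a single collection name
    metrics_names = ["rag_quality_collection", "faithfulness"]   # 5. metrics and collections mixed

    # metrics_args key names are assumed for illustration (threshold, evaluation model,
    # whether to include a reason).
    metrics_args = {
        "threshold": 0.7,
        "model": "gpt-4o-mini",
        "include_reason": True,
    }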
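Example (custom metrics processor):

If you want to handle the computed metrics yourself instead of having them upserted to EvalMetricsDB by the default processor, you can pass your own callable. The signature below (a function that receives the computed metrics as a dictionary) is an assumption; confirm the exact contract expected by the evaluator.

    from typing import Any, Dict

    def my_metrics_processor(metrics: Dict[str, Any]) -> None:
        # Assumed contract: receives the computed metrics and is responsible for
        # persisting or reporting them in place of the default upsert to EvalMetricsDB.
        for metric_name, value in metrics.items():
            print(f"{metric_name}: {value}")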

Returns:

  • Dict: The resulting metrics for the evaluated test cases.
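
Example (running an evaluation):

A hedged end-to-end sketch that reuses the parameters and processor defined in the examples above. The construction of the evaluator is not covered in this section, so evaluator is assumed to be an already-initialized BerryDBRAGEvaluator instance.

    # evaluator is assumed to be an already-initialized BerryDBRAGEvaluator.
    results = evaluator.eval(
        test_params=test_params,
        metrics_names=["answer_relevancy", "faithfulness"],
        metrics_args=metrics_args,
        metrics_processor=my_metrics_processor,  # omit to use the default processor
    )

    # eval() returns a Dict containing the resulting metrics.
    print(results)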