Evaluator¶
- BerryDBRAGEvaluator.eval(test_params, metrics_names, metrics_args=None, metrics_processor=None)¶
This method evaluates the given test cases using the specified metrics.
Parameters:
test_params (Dict[str, Any]): A dictionary containing the test parameters. This can include:
1. The dataset used for testing, under the key "test_data".
2. The name of the test suite, under the key "test_suite_name". (Optional)
3. The name of the run, under the key "run_name". If you have multiple runs against the same test suite, use run_name to differentiate them. (Optional)
A minimal sketch of such a dictionary is shown after this list.
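The sketch below illustrates one possible test_params dictionary. Only the keys "test_data", "test_suite_name", and "run_name" come from the documentation above; the shape of each record inside "test_data" is an assumption for illustration and depends on your test suite.

```python
# Hypothetical sketch; record fields inside "test_data" are assumptions.
test_params = {
    "test_data": [
        {
            "query": "What is BerryDB?",
            "answer": "BerryDB is a ...",           # generated answer to evaluate
            "expected_answer": "BerryDB is a ...",  # reference answer
            "context": ["...retrieved passage..."], # retrieved context
        }
    ],
    "test_suite_name": "rag-smoke-tests",  # optional
    "run_name": "run-001",                 # optional, distinguishes repeated runs
}
```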
metrics_names (Union[str, List[str]]): The names of the metrics to be used for evaluation. Redundant metrics are flattened (deduplicated) in the code. Metrics can be passed in any of the following ways (see the sketch after this list):
1. A single metric name as a string.
2. Multiple metrics as a list of individual metric names.
3. A single metrics collection name.
4. Multiple metrics collections as a list of individual collection names.
5. A combination of metric names and metrics collection names as a list.
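The following sketch shows the accepted shapes for metrics_names. The metric and collection names used here are placeholders, not names defined by BerryDB; substitute the names available in your deployment.

```python
# Placeholder metric / collection names for illustration only.
metrics_names = "answer_relevancy"                              # 1. single metric
metrics_names = ["answer_relevancy", "faithfulness"]            # 2. list of metrics
metrics_names = "rag_core_collection"                           # 3. single collection
metrics_names = ["rag_core_collection", "safety_collection"]    # 4. list of collections
metrics_names = ["answer_relevancy", "safety_collection"]       # 5. mix of both
```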
metrics_args (Optional[Dict[str, Any]]): Metric parameters, such as the threshold, the model to use for the evaluation, and whether to include a reason, can be passed as a dict. Defaults to None.
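A hedged sketch of a metrics_args dictionary follows. The key names below are assumptions drawn from the parameters mentioned above (threshold, evaluation model, reason flag), not confirmed keys.

```python
# Hypothetical key names; the documentation only states that a threshold,
# an evaluation model, and an include-reason flag can be supplied here.
metrics_args = {
    "threshold": 0.7,        # minimum score for a metric to pass
    "model": "gpt-4o",       # model used to run the evaluation
    "include_reason": True,  # attach an explanation to each score
}
```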
metrics_processor: A custom metrics processor function. If not provided, the default metrics processor will be used, which upserts all the metrics to EvalMetricsDB.
Returns:
Dict: A dictionary containing the resulting evaluation metrics.
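Putting the pieces together, a minimal end-to-end call might look like the sketch below. The import path and constructor arguments are assumptions, since only the eval signature is documented here; the metric names are placeholders.

```python
# Assumed import path and constructor; only eval(...) is documented above.
from berrydb import BerryDBRAGEvaluator

evaluator = BerryDBRAGEvaluator(api_key="...")  # hypothetical constructor

results = evaluator.eval(
    test_params=test_params,
    metrics_names=["answer_relevancy", "faithfulness"],  # placeholder names
    metrics_args=metrics_args,
    # metrics_processor omitted: the default processor upserts metrics to EvalMetricsDB
)
print(results)  # dict of computed metrics
```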