Evaluator

BerryDBRAGEvaluator.eval(test_params, metrics_names, metrics_args=None, metrics_processor=None)

This method evaluates the given test cases using the specified metrics.

Parameters:

  • test_params (Dict[str, Any]): A dictionary containing the test parameters (see the first example after this parameter list). It can include:

    1. The dataset used for testing, under the key "test_data".
    2. The name of the test suite, under the key "test_suite_name". (Optional)
    3. The name of the run, under the key "run_name". Use this to differentiate multiple runs against the same test suite. (Optional)
    
  • metrics_names (Union[str, List[str]]): The names of the metrics to be used for evaluation. Redundant metrics are flattened in the code. Metrics can be passed in any of the following ways:

    1. A single metric name as a string.
    2. Multiple metrics as a list of individual metric names.
    3. A single metrics collection name as a string.
    4. Multiple metrics collections as a list of collection names.
    5. A combination of metric names and metrics collection names as a list.
    
  • metrics_args (Optional[Dict[str, Any]], default None): Metric parameters, such as the threshold, the model to use for the evaluation, and whether or not to include a reason, can be passed as a Dict.

  • metrics_processor (optional): A custom metrics processor function (see the custom processor example below). If not provided, the default metrics processor is used, which upserts all the metrics to EvalMetricsDB.
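
Example (constructing the parameters):

The following is a minimal sketch of how the parameters described above might be assembled. The test_params keys come from the list above; the test-case schema, the metric and collection names, and the metrics_args key names are illustrative assumptions rather than confirmed BerryDB identifiers.

    # The "test_data" schema shown here is an assumption; use whatever structure
    # your test cases actually follow.
    test_params = {
        "test_data": [
            {"query": "What is BerryDB?", "expected_answer": "..."},
        ],
        "test_suite_name": "rag-regression-suite",  # optional
        "run_name": "baseline-run",                 # optional, differentiates runs within the same suite
    }

    # metrics_names accepts any of the shapes listed above; these names are placeholders.
    metrics_names = "answer_relevancy"                           # 1. a single metric name
    metrics_names = ["answer_relevancy", "faithfulness"]         # 2. a list of metric names
    metrics_names = "rag_quality_collection"                     # 3. a single collection name
    metrics_names = ["rag_quality_collection", "faithfulness"]   # 5. metrics and collections mixed

    # metrics_args key names are assumed for illustration (threshold, evaluation model,
    # whether to include a reason).
    metrics_args = {
        "threshold": 0.7,
        "model": "gpt-4o-mini",
        "include_reason": True,
    }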
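Example (custom metrics processor):

If you want to handle the computed metrics yourself instead of having them upserted to EvalMetricsDB by the default processor, you can pass your own callable. The signature below (a function that receives the computed metrics as a dictionary) is an assumption; confirm the exact contract expected by the evaluator.

    from typing import Any, Dict

    def my_metrics_processor(metrics: Dict[str, Any]) -> None:
        # Assumed contract: receives the computed metrics and is responsible for
        # persisting or reporting them in place of the default upsert to EvalMetricsDB.
        for metric_name, value in metrics.items():
            print(f"{metric_name}: {value}")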

Returns:

  • Dict: The resulting metrics for the evaluated test cases.
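
Example (running an evaluation):

A hedged end-to-end sketch that reuses the parameters and processor defined in the examples above. The construction of the evaluator is not covered in this section, so evaluator is assumed to be an already-initialized BerryDBRAGEvaluator instance.

    # evaluator is assumed to be an already-initialized BerryDBRAGEvaluator.
    results = evaluator.eval(
        test_params=test_params,
        metrics_names=["answer_relevancy", "faithfulness"],
        metrics_args=metrics_args,
        metrics_processor=my_metrics_processor,  # omit to use the default processor
    )

    # eval() returns a Dict containing the resulting metrics.
    print(results)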