Running AISBench with a Custom Configuration File
By default, AISBench is invoked from the command line with three options: --models selects the model task, --datasets selects the dataset task, and --summarizer selects the result presentation task. AISBench also accepts a single custom configuration file that combines the configuration for all three task types, which makes it possible to run custom task combinations.
Usage Instructions
ais_bench ais_bench/configs/{model_type}_examples/{task_config_filename}
# Example:
ais_bench ais_bench/configs/api_examples/infer_vllm_api_general.py
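The path pattern above can be assembled programmatically when several configuration files need to be run from a script. The following is a minimal sketch, not part of AISBench: `build_command` is a hypothetical helper, and the only things taken from this document are the `ais_bench` executable name and the `ais_bench/configs/{model_type}_examples/` layout.

```python
from pathlib import PurePosixPath


def build_command(model_type, task_config_filename):
    """Build the argument list for one custom-config run (hypothetical helper)."""
    config_path = (
        PurePosixPath("ais_bench/configs")
        / f"{model_type}_examples"
        / task_config_filename
    )
    return ["ais_bench", str(config_path)]


cmd = build_command("api", "infer_vllm_api_general.py")
print(" ".join(cmd))

# To actually launch the run (requires AISBench to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Returning a list rather than a single string means the command can be passed to `subprocess.run` without shell quoting.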
Example of Using a Custom Configuration File
Editing the Example Content
The following example shows how to evaluate two service interfaces (v1/chat/completions and v1/completions) on the GSM8K and MATH datasets. See the sample file demo_infer_vllm_api.py:
from mmengine.config import read_base
from ais_bench.benchmark.partitioners import NaivePartitioner
from ais_bench.benchmark.runners.local_api import LocalAPIRunner
from ais_bench.benchmark.tasks import OpenICLInferTask
from ais_bench.benchmark.models import VLLMCustomAPIChat
with read_base():
    from ais_bench.benchmark.configs.summarizers.example import summarizer
    from ais_bench.benchmark.configs.datasets.gsm8k.gsm8k_gen_0_shot_cot_str import gsm8k_datasets as gsm8k_0_shot_cot_str
    from ais_bench.benchmark.configs.datasets.math.math500_gen_0_shot_cot_chat_prompt import math_datasets as math500_gen_0_shot_cot_chat
    from ais_bench.benchmark.configs.models.vllm_api.vllm_api_general import models as vllm_api_general
# Use only a subset of samples for demo testing
gsm8k_0_shot_cot_str[0]['abbr'] = 'demo_' + gsm8k_0_shot_cot_str[0]['abbr']
gsm8k_0_shot_cot_str[0]['reader_cfg']['test_range'] = '[0:8]'
math500_gen_0_shot_cot_chat[0]['abbr'] = 'demo_' + math500_gen_0_shot_cot_chat[0]['abbr']
math500_gen_0_shot_cot_chat[0]['reader_cfg']['test_range'] = '[0:8]'
# Specify the dataset list; add different dataset configurations by concatenation
datasets = gsm8k_0_shot_cot_str + math500_gen_0_shot_cot_chat
# Specify the model configuration list
models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr='demo-vllm-api-general-chat',
        path="",
        model="",
        request_rate=0,
        retry=2,
        host_ip="localhost",  # Specify the IP address of the inference service
        host_port=8080,  # Specify the port of the inference service
        max_out_len=512,
        batch_size=1,
        generation_kwargs=dict(
            temperature=0.5,
            top_k=10,
            top_p=0.95,
            seed=None,
            repetition_penalty=1.03,
        ),
    )
]
work_dir = 'outputs/demo_api-vllm-general-chat/'
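The "demo subset" edits in the configuration (prefixing `abbr` and setting `reader_cfg['test_range']`) repeat for every dataset, so they can be factored into a small helper. The sketch below is an illustration under assumptions: `make_demo_subset` is a hypothetical function, and the dataset entries are plain-dict stand-ins with illustrative values rather than the real AISBench presets. Whether helper functions are acceptable inside an actual AISBench configuration file depends on the config loader and is not confirmed here.

```python
import copy


def make_demo_subset(dataset_cfgs, test_range="[0:8]", prefix="demo_"):
    """Return copies of dataset configs restricted to a small sample range."""
    demo_cfgs = []
    for cfg in dataset_cfgs:
        demo = copy.deepcopy(cfg)  # leave the original preset untouched
        demo["abbr"] = prefix + demo["abbr"]
        demo.setdefault("reader_cfg", {})["test_range"] = test_range
        demo_cfgs.append(demo)
    return demo_cfgs


# Stand-in configs (assumption: real presets are lists of dict-like entries
# that carry at least 'abbr' and 'reader_cfg', as the example above implies).
gsm8k_cfgs = [{"abbr": "gsm8k", "reader_cfg": {}}]
math_cfgs = [{"abbr": "math_prm800k_500", "reader_cfg": {}}]

# Concatenate the lists, as in the configuration file
datasets = make_demo_subset(gsm8k_cfgs) + make_demo_subset(math_cfgs)
print([d["abbr"] for d in datasets])
print(datasets[0]["reader_cfg"]["test_range"])
```

Copying before modifying keeps the imported presets unchanged, which matters if the same preset is reused elsewhere with its full sample range.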
Executing the Custom Task Combination
After modifying the configuration file, run the following command to start the accuracy evaluation:
ais_bench ais_bench/configs/api_examples/demo_infer_vllm_api_general_chat.py
Output Results
dataset                  version   metric    mode   demo-vllm-api-general-chat  demo-vllm-api-general
-----------------------  --------  --------  -----  --------------------------  ---------------------
demo_gsm8k               401e4c    accuracy  gen    62.50                       62.50
demo_math_prm800k_500    c4b6f0    accuracy  gen    50.00                       62.50
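To compare scores across models in a script, the printed summary can be parsed into a nested dictionary. The sketch below is a hypothetical post-processing helper, not an AISBench API. It assumes the layout shown above: four leading columns (dataset, version, metric, mode) followed by one numeric column per model, with no whitespace inside any cell.

```python
def parse_summary(text):
    """Parse a whitespace-separated summary table into {dataset: {model: score}}."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    model_names = lines[0].split()[4:]  # columns after dataset/version/metric/mode
    results = {}
    for ln in lines[1:]:
        if set(ln.strip()) <= {"-", " "}:
            continue  # skip the dashed separator row
        cells = ln.split()
        results[cells[0]] = {
            name: float(score) for name, score in zip(model_names, cells[4:])
        }
    return results


summary_text = """
dataset version metric mode demo-vllm-api-general-chat demo-vllm-api-general
----------------------- -------- -------- ----- -------------------------- ---------------------
demo_gsm8k 401e4c accuracy gen 62.50 62.50
demo_math_prm800k_500 c4b6f0 accuracy gen 50.00 62.50
"""

scores = parse_summary(summary_text)
print(scores["demo_math_prm800k_500"])
```

A missing score printed as a non-numeric placeholder would make `float` raise, so real output may need extra handling.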
List of Preset Custom Configuration File Samples
| Filename | Description |
|---|---|
| | Evaluates the |
| | Evaluates the |
| | Evaluates the |
| | Evaluates the |
| | Evaluates the |
| | Evaluates using the inference interface of a Hugging Face base model on the GSM8K dataset. The prompt format is a string, and the dataset path is customized. |
| | Evaluates using the inference interface of a Hugging Face chat model on the GSM8K dataset. The prompt format is a string, and the dataset path is customized. |
Note: To evaluate other datasets using the above custom configuration files, import additional datasets from ais_bench/configs/api_examples/all_dataset_configs.py.