User Configuration Parameters

AISBench Benchmark lets you customize the inference mode and evaluation process in two ways: command line interface (CLI) parameters and a configuration constant file.

Command Line Parameters

The basic invocation format is as follows, where `[OPTIONS]` stands for the command line parameters:

ais_bench [OPTIONS]

Parameter Description

Based on the execution scenario, command line parameters are divided into three categories:

Common Parameters
Accuracy Evaluation Parameters (effective only when --mode is set to all, infer, eval, or viz)
Performance Evaluation Parameters (effective only when --mode is set to perf or perf_viz)

Common Parameters are not restricted by the execution mode and can be specified in every mode. Accuracy Evaluation Parameters and Performance Evaluation Parameters have no effect outside the `--mode` values listed for them above.
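
The mode gating described here can be summarized in a few lines. The sketch below is illustrative only: `active_parameter_groups` is a hypothetical helper, not part of AISBench, and it encodes nothing beyond the mode-to-category mapping stated in this section.

```python
# Hypothetical helper (not part of AISBench): maps a --mode value to the
# parameter categories that take effect, as described in this section.
ACCURACY_MODES = {"all", "infer", "eval", "viz"}
PERFORMANCE_MODES = {"perf", "perf_viz"}


def active_parameter_groups(mode="all"):
    """Return the parameter categories that apply to the given mode."""
    groups = ["common"]  # common parameters apply in every mode
    if mode in ACCURACY_MODES:
        groups.append("accuracy")
    elif mode in PERFORMANCE_MODES:
        groups.append("performance")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return groups


print(active_parameter_groups("infer"))     # ['common', 'accuracy']
print(active_parameter_groups("perf_viz"))  # ['common', 'performance']
```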

Common Parameters

These parameters apply in all modes and can be combined with accuracy or performance parameters.

Parameter	Description	Example
`--models`	Specifies the name of the model inference backend task (corresponding to a pre-implemented default model configuration file under the path `ais_bench/benchmark/configs/models`). Multiple task names are supported; this parameter is mutually exclusive with the `config` parameter. For details, refer to 📚 Supported Models	`--models vllm_api_general`
`--datasets`	Specifies the name of the dataset task (corresponding to a pre-implemented default dataset configuration file under the path `ais_bench/benchmark/configs/datasets`). Multiple dataset names are supported; this parameter is mutually exclusive with the `config` parameter. For details, refer to 📚 Supported Dataset Types	`--datasets gsm8k_gen`
`--summarizer`	Specifies the name of the result summary task (corresponding to a pre-implemented default configuration file under the path `ais_bench/benchmark/configs/summarizers`). For details, refer to 📚 Supported Result Summary Tasks	`--summarizer medium`
`--mode` or `-m`	Running mode, optional values: `all`, `infer`, `eval`, `viz`, `perf`, `perf_viz`; default value is `all`. For details, refer to 📚 Running Mode Description.	`--mode infer` `-m all`
`--reuse` or `-r`	Specifies the timestamp of an existing run in the working directory; execution continues from it and overwrites the original results. Combined with `--mode`, it can resume interrupted inference, or compute accuracy and print visualization results from existing inference results. If the flag is given without a value, the latest timestamp under `--work-dir` is selected automatically.	`--reuse 20250126_144254` `-r 20250126_144254`
`--work-dir` or `-w`	Specifies the evaluation working directory for saving output results. The default path is `outputs/default`.	`--work-dir /path/to/work` `-w /path/to/work`
`--config-dir`	The folder path where the configuration files for `models`, `datasets`, and `summarizers` are stored. The default path is `ais_bench/benchmark/configs`.	`--config-dir /xxx/xxx`
`--debug`	Enables Debug mode; disabled by default. In Debug mode, all logs are printed directly to the terminal.	`--debug`
`--dry-run`	Enables Dry Run mode, which only prints logs to the screen without actually running the task; disabled by default.	`--dry-run`
`--max-workers-per-gpu`	Reserved parameter; not currently supported.	`--max-workers-per-gpu 1`
`--merge-ds`	Enables merged inference for datasets of the same type (running multiple datasets for the same task together).	`--merge-ds`
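
As a rough illustration of how the common parameters combine, the sketch below assembles an `ais_bench` command line from the flags in the table above. `build_command` is a hypothetical helper, not part of AISBench; it only builds and prints the command and does not run it. The model, dataset, and timestamp values are the examples from the table.

```python
import shlex


# Hypothetical helper (not part of AISBench): builds an argument list using
# only flags documented in the Common Parameters table.
def build_command(model, dataset, mode="all", work_dir=None, reuse=None,
                  dry_run=False):
    argv = ["ais_bench", "--models", model, "--datasets", dataset,
            "--mode", mode]
    if work_dir is not None:
        argv += ["--work-dir", work_dir]
    if reuse is not None:
        argv += ["--reuse", reuse]  # timestamp of an existing run
    if dry_run:
        argv.append("--dry-run")    # print logs only, do not run the task
    return argv


# Recompute accuracy from the inference results of an earlier run.
cmd = build_command("vllm_api_general", "gsm8k_gen",
                    mode="eval", reuse="20250126_144254")
print(shlex.join(cmd))
# ais_bench --models vllm_api_general --datasets gsm8k_gen --mode eval --reuse 20250126_144254
```

Keeping the command as a list until the final `shlex.join` call avoids quoting mistakes if a path such as the working directory contains spaces.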

Accuracy Evaluation Parameters

Effective only when the mode is all, infer, eval, or viz.

Parameter	Description	Example
`--max-num-workers`	Number of parallel tasks, in the range `[1, number of CPU cores]`; the default is `1`. Has no effect in Continuous Batch or performance mode.	`--max-num-workers 2`
`--dump-eval-details`	Dumps details of the evaluation process; disabled by default.	`--dump-eval-details`
`--dump-extract-rate`	Dumps evaluation speed data; disabled by default.	`--dump-extract-rate`
`--disable-cb`	Disables Continuous Batch (CB) inference, which applies only to service-oriented API-type models. CB is enabled by default; passing this flag disables it. With CB enabled, multiple processes run concurrently, with a maximum concurrency of 500 per process. With CB disabled, execution falls back to single-process mode and `--max-num-workers` takes effect.	`--disable-cb`
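
The interaction between `--max-num-workers`, `--disable-cb`, and the running mode can be easy to misread, so here is a minimal sketch of the rules in the table above. Both functions are hypothetical helpers, not AISBench code, and the `api_model` argument is an assumption used to model the statement that CB applies only to service-oriented API-type models.

```python
import os

PERFORMANCE_MODES = {"perf", "perf_viz"}


def check_max_num_workers(value, cpu_count=None):
    """Validate the documented range [1, number of CPU cores]."""
    cpu_count = cpu_count or os.cpu_count() or 1
    if not 1 <= value <= cpu_count:
        raise ValueError(f"--max-num-workers must be in [1, {cpu_count}]")
    return value


def max_num_workers_applies(mode, disable_cb, api_model=True):
    """Return True when --max-num-workers has an effect."""
    if mode in PERFORMANCE_MODES:
        return False  # no effect in performance mode
    cb_active = api_model and not disable_cb  # CB is on by default
    return not cb_active


print(max_num_workers_applies("infer", disable_cb=False))  # False
print(max_num_workers_applies("infer", disable_cb=True))   # True
print(max_num_workers_applies("perf", disable_cb=True))    # False
```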

Performance Evaluation Parameters

Effective only when the mode is perf or perf_viz.

Parameter	Description	Example
`--num-prompts`	Specifies the number of data samples used for dataset evaluation. Must be a positive integer. If the value exceeds the total number of dataset samples, or if no value is specified, the entire dataset is used.	`--num-prompts 500`
`--pressure`	Enables performance pressure testing mode; disabled by default. Effective only when `--mode perf` is set. For details on pressure testing, refer to 📚 Enabling Steady-State Testing with Stress Testing.	`--pressure`
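
The fallback behavior of `--num-prompts` can be written as a small function. This is a sketch of the rule stated in the table, not the tool's actual implementation; `effective_num_prompts` and the dataset size of 1000 are hypothetical.

```python
# Hypothetical helper (not part of AISBench): number of samples actually
# evaluated for a given --num-prompts value.
def effective_num_prompts(dataset_size, num_prompts=None):
    if num_prompts is None:
        return dataset_size  # flag omitted: use the entire dataset
    if not isinstance(num_prompts, int) or num_prompts <= 0:
        raise ValueError("--num-prompts must be a positive integer")
    # Values above the dataset size fall back to the entire dataset.
    return min(num_prompts, dataset_size)


print(effective_num_prompts(1000, 500))   # 500
print(effective_num_prompts(1000, 5000))  # 1000
print(effective_num_prompts(1000))        # 1000
```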

Configuration Constant File Parameters

Some global constants apply regardless of task type. Keeping their default values is recommended; if customization is required, edit the constant file `global_consts.py`.

The currently supported parameter configurations are as follows:

Parameter Name	Description	Value Range / Requirements
`WORKERS_NUM`	Number of processes used for sending requests. The default value is 0, which means automatic allocation based on the maximum number of concurrent requests configured by the user.	[0, number of CPU cores]
`CUSTOM_PACKAGE_DIR`	Specifies the directory path of custom Python packages. The Benchmark tool will load user-defined packages from this directory.	Must be a local path accessible to the user, pointing to the folder containing custom packages
`PRESSURE_TIME`	Duration of pressure testing, in seconds. Effective only when `--pressure` is specified.	`[1, 86400]` (i.e., 1 second to 24 hours)
`CONNECTION_ADD_RATE`	Rate at which concurrent threads are created: the number of new concurrent threads added per second until the maximum concurrency limit is reached. Effective only when `--pressure` is specified.	`> 0.1` (Unit: threads per second)
`MAX_CHUNK_SIZE`	Maximum cache size for a single chunk returned by the streaming inference model backend. The default value is 65535 bytes (one byte less than 64 KiB).	`(0, 16777216]` (Unit: Byte)
`REQUEST_TIME_OUT`	Timeout for the client to wait for a response after sending a request. The default value is `None`, meaning the client waits indefinitely for the model to return results.	`None` or `>0` (Unit: seconds)
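
Before editing `global_consts.py`, it may help to check candidate values against the ranges in the table above. The sketch below does only that: `validate_consts` is a hypothetical helper, not part of AISBench, and the `PRESSURE_TIME` and `CONNECTION_ADD_RATE` values are illustrative rather than documented defaults. `CUSTOM_PACKAGE_DIR` is left out because its only requirement is that the path be accessible locally.

```python
import os


def validate_consts(consts, cpu_count=None):
    """Return messages for values outside the documented ranges."""
    cpu_count = cpu_count or os.cpu_count() or 1
    problems = []
    if not 0 <= consts["WORKERS_NUM"] <= cpu_count:
        problems.append(f"WORKERS_NUM must be in [0, {cpu_count}]")
    if not 1 <= consts["PRESSURE_TIME"] <= 86400:
        problems.append("PRESSURE_TIME must be in [1, 86400] seconds")
    if not consts["CONNECTION_ADD_RATE"] > 0.1:
        problems.append("CONNECTION_ADD_RATE must be > 0.1 threads per second")
    if not 0 < consts["MAX_CHUNK_SIZE"] <= 16777216:
        problems.append("MAX_CHUNK_SIZE must be in (0, 16777216] bytes")
    timeout = consts["REQUEST_TIME_OUT"]
    if timeout is not None and not timeout > 0:
        problems.append("REQUEST_TIME_OUT must be None or > 0 seconds")
    return problems


example = {
    "WORKERS_NUM": 0,            # documented default (automatic allocation)
    "PRESSURE_TIME": 60,         # illustrative value, not a documented default
    "CONNECTION_ADD_RATE": 1.0,  # illustrative value, not a documented default
    "MAX_CHUNK_SIZE": 65535,     # documented default
    "REQUEST_TIME_OUT": None,    # documented default (wait indefinitely)
}
print(validate_consts(example, cpu_count=8))  # []
```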