Introduction to Evaluation Scenarios

Accuracy Evaluation

Service-Oriented Accuracy Evaluation

Function Description: Evaluate the prediction accuracy of a model deployed as a service on specific datasets.
Requirements: The model has been deployed, and its actual service capabilities need to be tested.
Supported Items:
- Model Tasks: 📚 Service-Oriented Inference Backend
- Dataset Tasks: 📚 Open-Source Datasets and 📚 Custom Datasets

After selecting a model task and a dataset task that fit your needs, see the following guide for detailed usage of this scenario: 📚 Service-Oriented Accuracy Evaluation Guide
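To make "prediction accuracy on a dataset" concrete, here is a minimal sketch of an exact-match accuracy score. It is an illustration only: the function name `exact_match_accuracy` and the sample data are hypothetical, and the metric actually applied depends on the dataset task you select.

```python
def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions equal to their reference answer.

    Hypothetical helper for illustration; leading and trailing whitespace
    is ignored when comparing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


# Made-up model outputs and reference answers.
preds = ["4", "Paris ", "blue"]
refs = ["4", "Paris", "green"]

accuracy = exact_match_accuracy(preds, refs)
print(f"accuracy = {accuracy:.3f}")
```

In this sample two of the three answers match, so the score is two thirds.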

Pure Model Accuracy Evaluation

Function Description: Evaluate the accuracy of a locally loaded model (not deployed as a service) on different datasets.
Requirements: Offline model weights and an environment in which the model can be loaded and run are available.
Supported Items:
- Model Tasks: 📚 Local Model Backend
- Dataset Tasks: 📚 Open-Source Datasets and 📚 Custom Datasets

After selecting a model task and a dataset task that fit your needs, see the following guide for detailed usage of this scenario: 📚 Pure Model Accuracy Evaluation Guide

Performance Evaluation

Service-Oriented Performance Evaluation

Function Description: Evaluate the runtime efficiency (throughput and latency) of a model service in a real deployment environment.
Requirements: The model inference service must support access via a streaming interface.
Supported Items:
- Model Tasks: Streaming interface types in 📚 Service-Oriented Inference Backend
- Dataset Tasks: All data types in 📚 Supported Dataset Types
Note: The cache occupied by a performance evaluation grows in proportion to the context length of the requests and the number of requests, so it usually increases with the evaluation duration.
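The proportionality in the note can be sketched as simple arithmetic. This is a rough estimate under stated assumptions: the function `estimate_cache_bytes` and the per-token cost `bytes_per_token` are hypothetical placeholders, not values taken from the tool, so measure your own runs before relying on any number.

```python
def estimate_cache_bytes(num_requests, avg_context_tokens, bytes_per_token=64):
    """Rough cache estimate: requests x context length x per-token cost.

    bytes_per_token is an assumed constant for illustration only.
    """
    return num_requests * avg_context_tokens * bytes_per_token


MIB = 1024 * 1024

# 1000 requests with an average context of 2048 tokens.
small_run = estimate_cache_bytes(1000, 2048)
# Twice as many requests (for example, a run twice as long at the same rate).
long_run = estimate_cache_bytes(2000, 2048)

print(f"small run: {small_run / MIB:.1f} MiB")
print(f"long run:  {long_run / MIB:.1f} MiB")
```

Doubling either the number of requests or the context length doubles the estimate, which is why longer evaluations tend to occupy more cache.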

After selecting a model task and a dataset task that fit your needs, see the following guide for detailed usage of this scenario: 📚 Service-Oriented Performance Evaluation Guide
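As a minimal sketch of how throughput and latency can be derived from a streamed response, the example below works from per-token arrival timestamps. The `StreamedRequest` class, the `summarize` function, and the metric names are assumptions made for illustration and are not the tool's own API or report format. It also suggests why a streaming interface matters: time to first token can only be measured if tokens arrive individually.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class StreamedRequest:
    """Timing record for one request (hypothetical structure)."""
    send_time: float          # seconds, when the request was sent
    token_times: List[float]  # seconds, arrival time of each streamed token


def summarize(requests):
    """Compute simple latency and throughput figures for a batch of requests."""
    # Time to first token: delay between sending and the first streamed token.
    ttft = [r.token_times[0] - r.send_time for r in requests]
    # End-to-end latency: delay between sending and the last streamed token.
    e2e = [r.token_times[-1] - r.send_time for r in requests]
    total_tokens = sum(len(r.token_times) for r in requests)
    # Wall-clock span from the first send to the last received token.
    wall = max(r.token_times[-1] for r in requests) - min(r.send_time for r in requests)
    return {
        "mean_ttft_s": mean(ttft),
        "mean_e2e_latency_s": mean(e2e),
        "output_tokens_per_s": total_tokens / wall,
        "requests_per_s": len(requests) / wall,
    }


# Two made-up requests, four output tokens each.
sample = [
    StreamedRequest(send_time=0.0, token_times=[0.5, 1.0, 1.5, 2.0]),
    StreamedRequest(send_time=1.0, token_times=[1.5, 2.5, 3.5, 4.0]),
]
stats = summarize(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```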