Introduction to Evaluation Scenarios
Accuracy Evaluation
Service-Oriented Accuracy Evaluation
Function Description: Evaluate the prediction accuracy of a model deployed as a service on specific datasets.
Requirements: The model is already deployed as a service, and its actual serving capability needs to be tested.
Supported Items:
Model Tasks: 📚 Service-Oriented Inference Backend
Dataset Tasks: 📚 Open-Source Datasets and 📚 Custom Datasets
After selecting a model task and a dataset task that match your needs, see the 📚 Service-Oriented Accuracy Evaluation Guide for detailed usage of this scenario.
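As a rough illustration of what an accuracy evaluation measures, the sketch below scores a list of model outputs against reference answers using exact match. It is not this tool's implementation or API: the function name `exact_match_accuracy`, the normalization (trim whitespace, lowercase), and the sample data are all assumptions made for the example, and real datasets typically define their own scoring rules.

```python
def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions equal to their reference.

    Assumption: answers are compared after trimming whitespace and lowercasing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


# Hypothetical answers returned by a deployed model service,
# paired with the dataset's reference answers.
predictions = ["Paris", " 4 ", "blue"]
references = ["paris", "4", "red"]

accuracy = exact_match_accuracy(predictions, references)
print(f"accuracy = {accuracy:.3f}")
```

The same scoring applies whether the predictions come from a service or from a locally loaded model; only the way the predictions are obtained differs.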
Pure Model Accuracy Evaluation
Function Description: Evaluate the accuracy of locally loaded models (non-service-oriented) on different datasets.
Requirements: Offline model weights and a deployment environment.
Supported Items:
Model Tasks: 📚 Local Model Backend
Dataset Tasks: 📚 Open-Source Datasets and 📚 Custom Datasets
After selecting a model task and a dataset task that match your needs, see the 📚 Pure Model Accuracy Evaluation Guide for detailed usage of this scenario.
Performance Evaluation
Service-Oriented Performance Evaluation
Function Description: Evaluate the operational efficiency (such as throughput and latency) of a model service in a real deployment environment.
Requirements: The model inference service must support access via a streaming interface.
Supported Items:
Model Tasks: Streaming interface types in 📚 Service-Oriented Inference Backend
Dataset Tasks: All data types in 📚 Supported Dataset Types
Note: The cache space used by performance evaluation is proportional to both the context length of the requests and the number of requests, so it usually grows as the evaluation runs longer.
After selecting a model task and a dataset task that match your needs, see the 📚 Service-Oriented Performance Evaluation Guide for detailed usage of this scenario.
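To make the throughput and latency terms concrete, here is a minimal sketch of how such figures could be derived from the arrival times of streamed chunks. It is an illustration under stated assumptions, not this tool's implementation: the `StreamedRequest` class, the `summarize` function, and the metric names are hypothetical, and each streamed chunk is assumed to carry exactly one output token.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StreamedRequest:
    send_time: float          # time the request was sent, in seconds
    chunk_times: list[float]  # arrival time of each streamed chunk, in seconds


def summarize(requests):
    """Compute simple latency and throughput figures for a batch of requests.

    Assumption: one streamed chunk corresponds to one output token.
    """
    if not requests or any(not r.chunk_times for r in requests):
        raise ValueError("every request needs at least one streamed chunk")

    # Time to first token: delay until the first chunk arrives.
    ttft = [r.chunk_times[0] - r.send_time for r in requests]
    # End-to-end latency: delay until the last chunk arrives.
    latency = [r.chunk_times[-1] - r.send_time for r in requests]

    total_tokens = sum(len(r.chunk_times) for r in requests)
    wall_time = max(r.chunk_times[-1] for r in requests) - min(
        r.send_time for r in requests
    )
    if wall_time <= 0:
        raise ValueError("wall-clock duration must be positive")

    return {
        "mean_ttft_s": mean(ttft),
        "mean_latency_s": mean(latency),
        "output_tokens_per_s": total_tokens / wall_time,
    }


# Hypothetical timings for two overlapping requests.
requests = [
    StreamedRequest(send_time=0.0, chunk_times=[0.5, 1.0, 1.5, 2.0]),
    StreamedRequest(send_time=1.0, chunk_times=[1.5, 2.0, 2.5, 3.0, 3.5, 4.0]),
]
summary = summarize(requests)
print(summary)
```

Per-chunk timestamps are only observable when the service streams its output, which is one way to see why a figure like time to first token depends on a streaming interface.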