The fastest way to install this model locally is a direct curl execution.
Follow the instructions below to get started.
The tool automatically synchronizes and downloads the model database.
It also determines an efficient resource allocation automatically, so no manual tuning is needed.

The Qwen3-ASR-0.6B model is a compact speech recognition system designed for real-time transcription across multiple languages. It contains 0.6 billion parameters, striking a balance between accuracy and on-device deployment feasibility. The architecture uses efficient attention mechanisms to keep inference latency low. A dedicated language-agnostic encoder enables robust performance on languages not commonly represented in large-scale datasets. The table below summarizes the model's lightweight footprint through three key metrics: parameter count, word error rate, and inference latency.

| Metric | Value |
|---|---|
| Parameters | 0.6 B |
| Word Error Rate | 6.2% |
| Inference Latency | 12 ms |
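
For readers unfamiliar with the word error rate metric in the table, here is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the whitespace tokenization is an assumption. The source does not state which dataset, text normalization, or tooling produced the 6.2% figure, so this illustrates the metric rather than the evaluation behind that number.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: words are separated by whitespace and compared exactly,
    with no case folding or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)
```

As a usage example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words, giving about 0.167. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.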