Topology-Inspired Distributed AI Optimizer Performance Verification Results
| Experiment Condition | p50 (ms) | p95 (ms) | p99 (ms) | TTFT (ms) | E2E p50 (ms) | tokens/s/GPU | mJ/token | W (avg) | GPU Util (%) | $/1M tok | Quality Δ (%) | SLO Compliance (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama-70B, Batch=4, Concurrent=8, Prompt=512tokens, vLLM | 45 | 89 | 156 | 23 | 68 | 1,247 | 0.12 | 285 | 87 | 0.34 | +0.10 | 98.20 |
| Llama-70B, Batch=8, Concurrent=16, Prompt=1024tokens, vLLM | 67 | 134 | 198 | 31 | 98 | 1,156 | 0.14 | 312 | 92 | 0.38 | +0.20 | 96.80 |

| Experiment Condition | Baseline p95 (ms) | Optimizer p95 (ms) | Δ p95 (%) | Baseline p99 (ms) | Optimizer p99 (ms) | Δ p99 (%) | Baseline mJ/token | Optimizer mJ/token | Δ mJ/token (%) |
|---|---|---|---|---|---|---|---|---|---|
| Llama-70B, Batch=4, Concurrent=8, Prompt=512tokens, vLLM | — | 89 | — | — | 156 | — | — | 0.12 | — |
| Llama-70B, Batch=8, Concurrent=16, Prompt=1024tokens, vLLM | — | 134 | — | — | 198 | — | — | 0.14 | — |
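
The Δ columns in the comparison table cannot be filled in until baseline measurements exist. As a minimal sketch, assuming Δ is meant as the relative change of the optimizer value against the baseline (a common convention, not confirmed by this page), the calculation could look like the following. The function name `relative_delta_pct` and the baseline value in the usage lines are hypothetical.

```python
from typing import Optional


def relative_delta_pct(baseline: Optional[float], optimized: float) -> Optional[float]:
    """Return the percent change of `optimized` relative to `baseline`.

    Negative values mean the optimizer is lower than the baseline
    (an improvement for latency and energy metrics).
    Returns None when the baseline is missing or zero, matching the
    empty Δ cells in the table.
    """
    if baseline is None or baseline == 0:
        return None
    return (optimized - baseline) / baseline * 100.0


# 89 ms is the optimizer p95 from the table; the 100.0 ms baseline is a
# made-up placeholder for illustration, NOT a measured value.
print(relative_delta_pct(100.0, 89.0))

# With no baseline recorded, the delta stays undefined.
print(relative_delta_pct(None, 89.0))
```

Returning `None` for a missing baseline keeps an unmeasured comparison visibly empty rather than reporting a misleading zero.
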

Note: The assumptions behind the cost estimates should be linked from a separate document or sheet.
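
The $/1M tok column depends on pricing assumptions that are not stated on this page. One common way to derive such a figure, shown here only as a sketch, is to divide an hourly GPU price by the number of tokens a GPU produces per hour. The function name `cost_per_million_tokens` and the hourly price used below are hypothetical; the actual costing method may include other factors, such as utilization, power, or amortization.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_s_per_gpu: float) -> float:
    """Estimate USD per one million generated tokens for a single GPU.

    Assumes cost is driven only by a flat hourly GPU price and a
    steady per-GPU throughput.
    """
    if tokens_per_s_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_s_per_gpu * 3600.0
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical hourly price, chosen only for illustration.
HYPOTHETICAL_GPU_HOURLY_USD = 1.50

# 1,247 tokens/s/GPU is the throughput reported in the first table row.
estimate = cost_per_million_tokens(HYPOTHETICAL_GPU_HOURLY_USD, 1247)
print(f"{estimate:.2f} USD per 1M tokens")
```

Keeping the price as an explicit named input makes it easy to recompute the column when the linked cost assumptions change.
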
Detailed experiment data and reproduction methods are available on request.