PoC Results

Topology-Inspired Distributed AI Optimizer Performance Verification Results

Status: PoC completed

Core KPI (Essential Metrics)

| Experiment Condition | p50 (ms) | p95 (ms) | p99 (ms) | TTFT (ms) | E2E p50 (ms) | tokens/s/GPU | mJ/token | W (avg) | GPU Util (%) | $/1M tok | Quality Δ (%) | SLO Compliance (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama-70B, Batch=4, Concurrent=8, Prompt=512 tokens, vLLM | 45 | 89 | 156 | 236 | 8 | 1,247 | 0.12 | 285 | 87 | 0.34 | +0.10 | 98.20 |
| Llama-70B, Batch=8, Concurrent=16, Prompt=1024 tokens, vLLM | 67 | 134 | 198 | 319 | 8 | 1,156 | 0.14 | 312 | 92 | 0.38 | +0.20 | 96.80 |
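The latency percentiles reported above can be recomputed from the raw per-request benchmark logs. A minimal sketch, assuming a simple list of per-request latencies (the sample data and the nearest-rank percentile choice are illustrative, not the actual log schema or estimator used in the PoC):

```python
def percentile(samples, q):
    """Nearest-rank percentile (q in [0, 100]) over a list of samples."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(q / 100 * (len(s) - 1))))
    return s[k]

# Illustrative per-request latencies in ms (not real PoC data).
latencies_ms = [41, 44, 45, 46, 47, 50, 52, 89, 93, 156]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Different percentile estimators (nearest-rank vs. linear interpolation) can shift tail values slightly on small samples, so the estimator should be fixed and documented alongside the logs.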

Comparative Experiment: Same Environment Baseline vs Optimizer

| Experiment Condition | Baseline p95 (ms) | Optimizer p95 (ms) | Δ p95 (%) | Baseline p99 (ms) | Optimizer p99 (ms) | Δ p99 (%) | Baseline mJ/token | Optimizer mJ/token | Δ mJ/token (%) |
|---|---|---|---|---|---|---|---|---|---|
| Llama-70B, Batch=4, Concurrent=8, Prompt=512 tokens, vLLM | 89 | | | 156 | | | 0.12 | | |
| Llama-70B, Batch=8, Concurrent=16, Prompt=1024 tokens, vLLM | 134 | | | 198 | | | 0.14 | | |
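Once the optimizer runs land, the Δ columns follow directly from each baseline/optimizer pair. A minimal sketch of that computation (the sign convention, negative meaning the optimizer improved the metric, and the 80 ms optimizer value are assumptions for illustration):

```python
def pct_delta(baseline, optimized):
    """Relative change in percent: (optimized - baseline) / baseline * 100.
    Negative values mean the optimizer reduced the metric, which is an
    improvement for latency and energy metrics."""
    return (optimized - baseline) / baseline * 100.0

# Baseline p95 = 89 ms comes from the table; 80 ms is a hypothetical
# optimizer result used only to show the calculation.
delta_p95 = pct_delta(89, 80)  # ≈ -10.1, i.e. ~10% lower p95
```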

Calculation Notes

  • mJ/token = ∫Power(W)dt ÷ Output Token Count × 1e3 (reported unit: mJ)
  • $/1M tokens = (GPU·Time Cost + Power Cost + Others) ÷ (Total Tokens/1e6)
  • Scaling Efficiency(%) = (N-GPU Throughput ÷ 1-GPU Throughput) ÷ N × 100

Note: cost-estimation assumptions should be linked from a separate document or spreadsheet.
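The three formulas above translate directly into code. A minimal sketch (function names are mine, and the trapezoidal approximation of ∫Power(W)dt assumes uniformly spaced power samples):

```python
def energy_from_samples(power_w, dt_s):
    """Trapezoidal approximation of ∫Power(W)dt, in joules,
    over uniformly spaced samples taken dt_s seconds apart."""
    return sum((a + b) / 2 * dt_s for a, b in zip(power_w, power_w[1:]))

def mj_per_token(energy_joules, output_tokens):
    """mJ/token = ∫Power(W)dt [J] ÷ output token count × 1e3."""
    return energy_joules / output_tokens * 1e3

def cost_per_1m_tokens(gpu_time_cost, power_cost, other_cost, total_tokens):
    """$/1M tokens = (GPU-time cost + power cost + others) ÷ (total tokens / 1e6)."""
    return (gpu_time_cost + power_cost + other_cost) / (total_tokens / 1e6)

def scaling_efficiency(n_gpu_throughput, one_gpu_throughput, n):
    """Scaling Efficiency(%) = (N-GPU throughput ÷ 1-GPU throughput) ÷ N × 100."""
    return (n_gpu_throughput / one_gpu_throughput) / n * 100.0
```

For example, a run that draws a constant 100 W for 2 s (200 J) while emitting 1,000 tokens comes out to 200 mJ/token, and 8 GPUs delivering 7× the single-GPU throughput score 87.5% scaling efficiency.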

Evidence Artifacts

Data Artifacts

  • Benchmark logs (CSV/JSON)
  • Power logs and metrics
  • Reproducible Docker/notebooks
  • Experiment settings and parameters

Document Artifacts

  • Model/proof overview
  • Comparative experiment graphs
  • Reproducible code snippets
  • Performance analysis reports

Detailed experiment data and reproduction instructions are available upon request.

Contact Us