- info
- AI engineer
- github repo
- LLM inference & ML systems
- contributing to vLLM (and llm-compressor), SGLang, and ONNX Runtime: so far on quantization, SD, and attention kernels (my PRs)
- Pinned Posts
- 2026-04-21 Exploring torch.compile
- 2025-12-02 Int4 to FP16 dequantization optimization
- 2024-09-27 Matryoshka Representation Learning Review
- all posts
- 2026-04-21 Exploring torch.compile
- 2026-02-07 FFT on GPU
- 2025-12-28 Data_format
- 2025-12-02 Int4 to FP16 dequantization optimization
- See all...
- categories
- Under construction...
- TIP
- Work in progress...