Hacker News

Tepix, 3 hours ago

Sounds good. I saw that you use the FP8 version of the model. Do you also quantize the KV cache?
