#1 Show HN: Needle - We Distilled Gemini Tool Calling into a 26M Model
Cactus Compute distilled Google's Gemini 3.1 into Needle, a 26-million-parameter model that specializes in function calling: extracting tool invocations from natural language on resource-constrained devices such as phones and watches. It outperforms much larger models such as FunctionGemma-270m and Qwen-0.6B on single-shot tool-use tasks, and it runs at 6,000 tokens/second prefill and 1,200 tokens/second decode. Weights are open on Hugging Face.
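To put the throughput figures in perspective, here is a back-of-the-envelope latency estimate. It is only a sketch: the function name and the request size (300 prompt tokens, 30 output tokens) are made up for illustration, and it assumes the quoted rates hold steady and that nothing else (tokenization, model load, scheduling) adds time.

```python
def estimate_latency_ms(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tps: float = 6000.0,  # prefill rate quoted in the post
    decode_tps: float = 1200.0,   # decode rate quoted in the post
) -> float:
    """Rough latency in milliseconds for a single tool call.

    Prefill processes the whole prompt; decode emits the output
    token by token. Total time is the sum of the two phases.
    """
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return (prefill_s + decode_s) * 1000.0


# Hypothetical request: tool schema plus user query in about 300 tokens,
# and a short tool invocation of about 30 tokens as output.
latency = estimate_latency_ms(300, 30)
print(f"{latency:.0f} ms")  # 50 ms prefill + 25 ms decode = 75 ms
```

Under those assumptions a single tool call would land well under a tenth of a second, which is the kind of budget that matters on a phone or watch.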