#1 Gemma 4 12B: A unified, encoder-free multimodal model
Google released Gemma 4 12B, an open-weight multimodal model that processes both images and text without a separate encoder, an architectural choice that simplifies deployment. It is designed to run on consumer-grade hardware for local inference, putting vision-language capability within reach of developers without cloud budgets.


