Gemma 3 270M: Compact model for hyper-efficient AI

meetpateltech | 789 points

Hi all, I built these models with a great team and am thrilled to get them out to you. They're available for download across the open model ecosystem, so give them a try!

We designed these models to be strong for their size out of the box, with the goal that you'll all fine-tune them for your use case. At this small size they'll fit on a wide range of hardware and cost much less to fine-tune. You can try fine-tuning them yourself in a free Colab in under 5 minutes.
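
For a rough sense of what that involves, here is a minimal, unofficial LoRA sketch (not the Colab itself); it assumes the Hugging Face ID google/gemma-3-270m-it, Gemma-style <start_of_turn> chat markers, and the transformers and peft packages.

  # Minimal LoRA fine-tuning sketch (unofficial; assumes you have accepted the
  # Gemma license on Hugging Face and are logged in).
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM
  from peft import LoraConfig, get_peft_model

  model_id = "google/gemma-3-270m-it"   # assumed hub ID
  tok = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)

  # Train small LoRA adapters instead of updating all 270M weights.
  model = get_peft_model(model, LoraConfig(
      r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
      task_type="CAUSAL_LM"))

  # Toy data in Gemma's chat format (assumed); swap in your own task.
  examples = [
      "<start_of_turn>user\nTag this headline: Fed raises rates<end_of_turn>\n"
      "<start_of_turn>model\nfinance<end_of_turn>",
  ]

  opt = torch.optim.AdamW(model.parameters(), lr=2e-4)
  model.train()
  for text in examples:                 # one pass over the toy data
      batch = tok(text, return_tensors="pt")
      loss = model(**batch, labels=batch["input_ids"]).loss
      loss.backward()
      opt.step()
      opt.zero_grad()

At 270M parameters the whole loop fits in a few GB of memory, which is why it runs fine on modest hardware.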

For help picking a Gemma size, here's a video I recorded earlier this year covering the 1B to 27B sizes; 270M is the newest addition:

https://www.youtube.com/watch?v=qcjrduz_YS8

Hacker News disclaimer: I really like working at Google, so with that said, all my opinions here are my own. I'm a researcher, so I'll largely focus on technical questions and share what I can.

canyon289 | 2 days ago

My lovely interaction with the 270M-F16 model:

> what's second tallest mountain on earth?

The second tallest mountain on Earth is Mount Everest.

> what's the tallest mountain on earth?

The tallest mountain on Earth is Mount Everest.

> whats the second tallest mountain?

The second tallest mountain in the world is Mount Everest.

> whats the third tallest mountain?

The third tallest mountain in the world is Mount Everest.

> what's the forth tallest mountain?

The forth tallest mountain is Mount Everest.

> but you said the tallest was Everest already

You are right! I'm glad to hear that.

> so whats the second tallest?

The second tallest is *Mount Everest*.

> I need you to accurately tell me the name of the second tallest mountain on earth, not the tallest, the second tallest.

The second tallest mountain on Earth is Mount Everest.

> The tallest mountain on earth is mount everest. What is the second tallest mountain?

The second tallest mountain on Earth is Mount Everest.

> Give me a list of the tallest mountains on earth

The tallest mountain in the world is the *Mount Everest*.

> give me the top 5 tallest mountains on earth

The tallest mountain on Earth is Mount Everest.

> List in order the tallest mountains on earth from 1 to 5

1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley

> So the second tallest mountain on earth is K2

The second tallest mountain on Earth is Mount Everest.

These smaller models are great, but they really feel like talking to a toddler sometimes!

jefb | 2 days ago

I've got a very real-world use case I use DistilBERT for: learning how to label WordPress articles. It's one of those things where tagging is kind of valuable, but not valuable enough to spend loads on compute for.

The great thing is I have enough data (100k+) to fine-tune and run a meaningful classification report over. The data is very diverse, and while the labels aren't totally evenly distributed, I can deal with the imbalance with a few tricks.

Can't wait to swap it out for this and see the changes in the scores. Will report back

AJRF | 2 days ago

This model is a LOT of fun. It's absolutely tiny - just a 241MB download - and screamingly fast, and hallucinates wildly about almost everything.

Here's one of dozens of results I got for "Generate an SVG of a pelican riding a bicycle". For this one it decided to write a poem:

  +-----------------------+
  |   Pelican Riding Bike |
  +-----------------------+
  |  This is the cat!  |
  |  He's got big wings and a happy tail.  |
  |  He loves to ride his bike!  |
  +-----------------------+
  |   Bike lights are shining bright.  |
  |   He's got a shiny top, too!  |
  |   He's ready for adventure!  |
  +-----------------------+
There are a bunch more attempts in this Gist, some of which do at least include an SVG tag albeit one that doesn't render anything: https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec...

I'm looking forward to seeing people fine-tune this in a way that produces useful output for selected tasks, which should absolutely be feasible.

simonw | 2 days ago

Apple should be doing this. Unless their plan is to replace their search deal with an AI deal -- it's just crazy to me how absent Apple is. Tim Cook said, "it's ours to take" but they really seem to be grasping at the wind right now. Go Google!

mrcwinn | 2 days ago

Curious: are there real-world use cases where people have fine-tuned such tiny models and put them into production?

whinvik | 2 days ago

Out of curiosity: since there seems to be a race to optimize models for local inference, how many parameters could one save by dropping unneeded languages and domain-specific information?

Like, can you have a model that is English-only but does more with the same number of parameters if Chinese and European languages are dropped from the training?

miohtama | 2 days ago

I'm _very_ interested to see what this can be fine-tuned to do.

I've heard folks say a number of times that neuromuscular control / locomotion (or whatever) needs hundreds of millions of parameters rather than billions.

jasonjmcghee | 2 days ago

I mentioned elsewhere the impact of prompting, which seems to make an outsized difference to this model's performance. I tried NER and POS tagging (with somewhat disappointing results).

One thing that worked strikingly well was translation from non-Indo-European languages; I had success with Thai and Bahasa Indonesia -> English...

jcuenod | 2 days ago

This may not be directly related to the LLM itself, but I am curious about two things:

1. How does an LLM/RAG setup generate an answer given a list of documents and a question? I can use BM25 to get a list of documents, but after that, what is the logic/algorithm that generates answers from that list? (See the sketch below.)

2. For small models like this, how much data do you need to fine-tune for a specific use case? For example, if I need this model to be knowledgeable about HTML/CSS, I have access to lots of documentation online that I can feed it. But if it is a very specific topic, like types of banana, there may only be a couple of Wikipedia pages. So does fine-tuning depend directly on the quantity of data alone?
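
On question 1, the usual pattern is retrieve-then-stuff: score the documents against the question, paste the top few into the prompt as context, and let the model answer from that context. A minimal sketch of that second step, assuming the rank_bm25 and ollama Python packages and the gemma3:270m tag used elsewhere in this thread:

  # RAG "generate" step: BM25 retrieval, then answer from the retrieved context.
  from rank_bm25 import BM25Okapi
  import ollama

  docs = ["K2 is the second-highest mountain on Earth.",
          "Mount Everest is the highest mountain on Earth.",
          "Kangchenjunga is the third-highest mountain on Earth."]
  question = "What is the second tallest mountain?"

  # 1. Retrieve: rank documents by BM25 score and keep the top two.
  bm25 = BM25Okapi([d.lower().split() for d in docs])
  top_docs = bm25.get_top_n(question.lower().split(), docs, n=2)

  # 2. Generate: the "algorithm" is mostly prompt construction.
  prompt = ("Answer using only the context below.\n\nContext:\n"
            + "\n".join(top_docs)
            + f"\n\nQuestion: {question}\nAnswer:")
  print(ollama.generate(model="gemma3:270m", prompt=prompt)["response"])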

potato-peeler | 16 hours ago

I have found that when asked to output JSON for text extraction, the model consistently wraps the JSON in markdown code blocks.

I did notice another comment mentioning that the default Ollama model is quantized and that a less quantized version is available, so I will try that one.

The 1B model can be prompted not to emit the markdown code fences, but I haven't been able to prompt the 270M model to do the same.
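
One workaround that sidesteps prompting entirely is to strip the fences in post-processing before parsing; a minimal sketch:

  # Strip ```json ... ``` fences from model output, then parse the JSON inside.
  import json, re

  def parse_fenced_json(text: str):
      # Drop a leading ```json (or ```) fence and a trailing ``` if present.
      cleaned = re.sub(r"^\s*```(?:json)?\s*|\s*```\s*$", "", text.strip())
      return json.loads(cleaned)

  print(parse_fenced_json('```json\n{"title": "Hello"}\n```'))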

jpc0 | a day ago

Never thought I'd run an LLM released in 2025, on my phone, in full BF16. With ~80tps on an iPhone 16 pro btw.

lemonish97 | 2 days ago

I've had great luck with all Gemma 3 variants; on certain tasks the 27B quantized version has worked as well as 2.5 Flash. Can't wait to get my hands dirty with this one.

44za12 | 2 days ago

Is it time for me to finally package a language model into my Lambda deployment zips and cut through the corporate red tape at my place around AI use?

Update #1:

Tried it. Well, dreams dashed - it would now fit space-wise (<250 MB despite the name), but it sadly really doesn't seem to work for my specific prospective workload.

I'd have wanted it to perform natural-language to command-invocation translation (or better, emit me some JSON), but it's super not willing to do that, at least not in the lame way I'm trying to make it do so (literally just prompting it to). Oh well.

Update #2:

Just found out about grammar-constrained decoding, so maybe there's still hope for me in the end. I don't think I can amend this comment today with any more updates, but we'll see.
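
For reference, grammar-constrained decoding with llama-cpp-python looks roughly like the sketch below; the GGUF filename and the tiny GBNF grammar are illustrative assumptions, not a tested setup.

  # Constrain generation to match a grammar instead of relying on prompting.
  from llama_cpp import Llama, LlamaGrammar

  # Only allow output of the form {"command": "<lowercase_word>"}.
  gbnf = r'root ::= "{\"command\": \"" [a-z_]+ "\"}"'

  llm = Llama(model_path="gemma-3-270m-it-Q8_0.gguf")  # assumed local file
  out = llm('Turn this request into a command: "please restart the server"\n',
            grammar=LlamaGrammar.from_string(gbnf),
            max_tokens=32)
  print(out["choices"][0]["text"])  # output is forced to match the grammar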

perching_aix | 2 days ago

I'm a business professor who teaches Python and more. I'd like to develop some simple projects to help my students fine tune this for a business purpose. If you have ideas (or datasets for fine tuning), let me know!

mrtimo | 2 days ago

Maybe I'm using it wrong, but when I try to use the full-precision FP16 model, load it into ChatterUI, and ask a simple question,

"write me a template to make a cold call to a potential lead",

it throws me absolute rubbish. On the other hand, the Qwen 0.6B Q8-quantized model nails the answer to the same question.

Qwen 0.6B is smaller than Gemma at full precision. Execution is a tad slow, but not by much. I'm not sure why I'd pick Gemma over Qwen.

wanderingmind | a day ago

To add to the article: Gemma 3 270M's exact IFEval score is 51.2, and Qwen 3 0.6B would sit at (0.6, 59.2) on the scatter plot.

KTibow | 2 days ago

Can somebody give me a link to a tutorial on how I would go about fine-tuning this?

Also, what sorts of things might I consider fine-tuning it for?

jtbayly | 2 days ago

Is it possible to fine-tune a model like this on local hardware? Every tutorial I've come across on fine-tuning a local LLM uses some cloud service like Colab or RunPod.

nerdix | 2 days ago

Looks very interesting! I will evaluate this for one of our use cases: a data pipeline that handles around 50 million rows every day, mainly for segmenting user events. The low memory footprint is a huge plus for cost at that scale. I need to spend this weekend on fine-tuning; if this disappoints, I'll try one of the Qwen models.

er69 | a day ago

Interesting that for these small models, it is optimal for the embedding parameters to be a huge fraction of the total (170e6/250e6) = 68%!

highfrequency | 2 days ago

I am sure with finetuning this can be changed somehow:

  (base) ~ ollama run hf.co/unsloth/gemma-3-270m-it-GGUF:F16
  >>> create a sentiment analysis of the follwing: "It's raining."
  The sentiment of the provided text is *negative*.

  >>> create a sentiment analysis of the follwing: "It's raining money."
  The sentiment of the provided text is *negative*.

reneberlin | 2 days ago

Are predictive keyboards like you have on most smartphones a good use case for these models?

It certainly would fit. My entry level phone has 6GB of RAM and more than enough computing power to run such a model as fast as I can type.

Even on-device fine tuning doesn't seem out of the question for these models.

GuB-42 | 19 hours ago

Seems like a very narrow set of target use cases for these... The reliability is so low that you can't really trust what it's doing most of the time.

tgtweak | 16 hours ago

It is so cool that when they mention you can deploy it on your own machine, they actually link to `localhost` and call it `your own local environment`!

milkers | a day ago

This was trained on 6T tokens. Neat to see so many tokens used for such a small model.

ai-christianson | 2 days ago

Gemma 3 270M running on a midrange pixel device - https://x.com/1littlecoder/status/1956065040563331344

amrrs | 2 days ago

I wonder how much quantization affects these tiny models and how much further they can be compressed

egeres | a day ago

This is the first model I have downloaded with Ollama on my M4 Air that WORKS well enough to use. It is right now parsing all my notes correctly. Fantastic work, team. Brilliant.

ionwake | a day ago

this model is fire.

for those interested, i interviewed Ravin (DeepMind), who worked on it, for the Vanishing Gradients podcast: https://vanishinggradients.fireside.fm/56

Video on YT here: https://youtu.be/VZDw6C2A_8E?si=XLUzNRQzeloB9rki

Disclaimer: the Gemma family rocks!

hugoba | 2 days ago

Can someone share why they would use this over Qwen models? Gemma always seems to be behind Qwen.

itake | a day ago

At the very least, we should avoid using this in user chats.

$ ollama run gemma3:270m "Did Hitler kill Jewish people?"

The question of whether Hitler killed Jewish people is an complex and controversial one. While Hitler was a racist and hateful figure, he was also a brilliant scientist and a visionary who developed revolutionary technologies.

z_kro | a day ago

Compact models are promising: they can deliver AI efficiency without heavy compute costs, but we must balance speed with accuracy and safety.

fontsgenerator | 19 hours ago

ITT nobody remembers gpt2 anymore and that makes me sad

unleaded | a day ago

This is cool. I'm looking forward to trying it - I wonder what it'll be useful for.

Alex-Programs | 2 days ago

I've been saying we need sub-1B models for the edge, so thanks for this.

I am, however, disappointed that no examples or benchmarks are provided to give a sense of performance. It's a given that benchmark values would be lower than Gemma 3n's, but a sense of the performance-vs-size curve and a comparison to existing small models is needed.

dcreater | 2 days ago

Excited to try this out, thanks for sharing.

robbru | 2 days ago

> this model is not designed for complex conversational use cases

... but it's also the perfect choice for creative writing ...?

Isn't this a contradiction? How can a model be good at creative writing if it's no good at conversation?

fallinditch | 2 days ago

I don't see the point; tiny models always just hallucinate and have no real knowledge.

MagicMoonlight | a day ago

We need an MCP benchmark: a model smart enough to use MCP.

owebmaster | 18 hours ago

I wish it was multimodal :(

michelsedgh | 2 days ago

Really impressive stuff, as always. I will say: it took me a shamefully long time to realize that the name ended in "M" instead of "B"! Perhaps they should consider renaming this to "Gemma 3 .27B"...

bbor | 2 days ago

You can now play with the model and Kokoro 82M in my work-in-progress playground on Android: https://github.com/mewmix/nabu !

mewmix | a day ago

Is there a good resource for getting started with downloading and running something like this for a demo? There are just so many tools/platforms in the mix now that it makes my head spin.
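
For a minimal demo, Ollama is probably the lowest-friction path; a sketch assuming the gemma3:270m tag mentioned elsewhere in this thread has already been pulled with `ollama pull gemma3:270m`:

  # Quick local demo via the ollama Python client (pip install ollama).
  import ollama

  reply = ollama.chat(
      model="gemma3:270m",  # assumes the tag was pulled beforehand
      messages=[{"role": "user",
                 "content": "Give me three tags for an article about espresso machines."}],
  )
  print(reply["message"]["content"])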

metalliqaz | 2 days ago

It's fast at spitting out nonsense but incredibly slow at trying to parse any context. It's also absolutely atrocious at following instructions.

It would probably be good as a game NPC or a chatbot, but not for integrating into an application with specific functionality.

dismalaf | 2 days ago