What's the strongest AI model you can train on a laptop in five minutes?

ingve | 556 points

Optimized small-model training is important not only for accessibility but also for the scientific study of LLMs. It's like the use of simple organisms such as yeast in biology: if we hope to ever understand LLMs and have more control over their behavior, we also need to study the simplest possible transformers that exhibit the behaviors of interest from the larger models.

jebarker | 2 days ago

Instead of time it should be energy: what is the best model you can train with a given budget in joules? Then the MBP and the H100 are on a more even footing.
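
Rough numbers, assuming ballpark sustained draws of ~80 W for a MacBook Pro under load and the 700 W TDP of an H100 SXM (both figures are assumptions):

```python
# Five minutes of laptop training, expressed as an energy budget.
LAPTOP_W, H100_W = 80, 700      # assumed sustained power draws, in watts
budget_j = LAPTOP_W * 5 * 60    # 80 W * 300 s = 24,000 J

h100_s = budget_j / H100_W      # the same joules buy ~34 s of H100 time
print(f"{budget_j} J is about {h100_s:.0f} s on an H100")
```

So a joules-normalized benchmark would give the H100 roughly half a minute to beat the laptop's five.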

zarzavat | 2 days ago

Let the AI efficiency olympics begin!

On a laptop, on a desktop, on a phone?

Train for 5 minutes, an hour, a day, a week?

On a boat? With a goat?

aniijbod | 2 days ago

> Paris, France is a city in North Carolina. It is the capital of North Carolina, which is officially major people in Bhugh and Pennhy. The American Council Mastlandan, is the city of Retrea. There are different islands, and the city of Hawkeler: Law is the most famous city in The Confederate. The country is Guate.

I love the phrase "officially major people"! I wonder how it could be put to use in everyday speech?

LorenDB | 2 days ago

I suspect one can go a lot further by adopting some tweaks from the GPT-2 speedrun effort [0]: at minimum Muon, a better init, and careful learning-rate tuning.

[0]: https://github.com/KellerJordan/modded-nanogpt
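
For the curious, the core of Muon is an approximate orthogonalization of each 2D gradient. A minimal sketch of the Newton-Schulz iteration, adapted from memory of the modded-nanogpt repo (the real version also adds momentum and runs in bfloat16, so treat this as illustrative):

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that approximately orthogonalizes G,
    # pushing its singular values toward 1 without an explicit SVD.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)           # normalize so the iteration converges
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```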

tootyskooty | 2 days ago

Feels like there should be value in building smaller, more specialized models - maybe even doing so on-demand. I don't always want a model that knows Polish and astrophysics and Shakespeare; I want one that runs really fast and is laser-focused on the domain I'm working in.

I want to be able to say to a large general purpose LLM: “write a script that trains a model that is optimized for <useful task>” and then run that model.

Edit: well gosh darn. Within the edit window for this comment, Google goes and launches Gemma 3 270M.

jl6 | 2 days ago

AI is a broad term; Karpathy's zero-to-hero series trains one in a Jupyter notebook. You can build some pretty powerful networks to de-duplicate database rows right on your laptop, too. Data de-duplication and master data management (MDM) in general are pretty useful in large businesses.
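
It doesn't even have to be a network: as a sketch, a character n-gram TF-IDF baseline already catches many near-duplicate rows on a laptop (the rows and threshold below are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

rows = [
    "ACME Corp, 123 Main St",
    "Acme Corporation, 123 Main Street",
    "Globex LLC, 9 Oak Ave",
]

# Character n-grams are robust to abbreviations and typos.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
sim = cosine_similarity(vec.fit_transform(rows))

dupes = [(i, j) for i in range(len(rows))
         for j in range(i + 1, len(rows)) if sim[i, j] > 0.6]
print(dupes)  # likely [(0, 1)]
```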

chasd00 | 2 days ago

I like this scenario for a future James Bond movie. Bond has to have an AI pretend to be him in chat to stall the bad guys while he sneaks around the back, but the state-of-the-art Bond persona bot that Q gave him, in its own hardware enclosure, was smashed up in the previous fight scene.

Bond has only minutes to train an AI model strong enough to pretend to be him and fool his targets long enough for him to gain entry to their impregnable fortress. Can he do it?!?

bryanrasmussen | 2 days ago

The most powerful Macbook Pro currently has 16 CPU cores, 40 GPU cores, and 128 GB of RAM (and a 16-core “neural engine” specifically designed to accelerate machine learning). Technically, it is a laptop, but it could just as well be a computer optimized for AI.

l5870uoo9y | 2 days ago

I looked up the most expensive laptop with an RTX 5090: https://marketplace.nvidia.com/en-us/consumer/gaming-laptops...

$5599.00 https://marketplace.nvidia.com/en-us/consumer/gaming-laptops...

Although you can get one with lower specs and the same GPU for $3,899.99:

https://marketplace.nvidia.com/en-us/consumer/gaming-laptops...

initramfs | 2 days ago

This is evocative of “cramming”, a paper from a few years ago, where the author tried to find the best model they could train for a day on a modern laptop: https://arxiv.org/abs/2212.14034

lsb | 2 days ago

At which point is a simple Markov chain the same or better?

Aperocky | 2 days ago

But supposing you have a real specific need to train, is the training speed still relevant? Or do the resources spent on gathering and validating the data set dwarf the actual CPU/GPU usage?

nottorp | 2 days ago

Not the point of the exercise obviously, but at five minutes' training I wonder how this would compare to a Markov chain bot.
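
For reference, the Markov baseline is a few lines and "trains" in seconds (corpus.txt stands in for whatever training text you have):

```python
import random
from collections import defaultdict

def train(text: str) -> dict:
    # Word-level bigram table: each word maps to its observed successors.
    table = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table: dict, start: str, n: int = 30) -> str:
    out = [start]
    for _ in range(n):
        successors = table.get(out[-1])
        if not successors:
            break
        out.append(random.choice(successors))
    return " ".join(out)

table = train(open("corpus.txt").read())
print(generate(table, "Paris"))
```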

wowczarek | 2 days ago

I love seeing explorations like this, which highlight that easily accessible hardware can do better than most people think with modern architectures. For many novel scientific tasks, you really don't need an H100 to make progress using deep learning over classical methods.

hodgehog11 | 2 days ago

"Paris, France is a city in North Carolina. It is the capital of North Carolina."

If only we had a technology that didn't hallucinate and instead reported "I don't know". Then small models would be far more useful. Part of the need for insanely huge LLMs is to get coverage so broad that they don't have to make things up.

It would be nice to be able to train a customer service bot on a laptop in a reasonable length of time. But it will screw up badly outside its area of competence, which will happen frequently.
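
Token-level confidence is only a weak proxy for factuality, but a crude abstention rule illustrates the idea; a sketch, with an arbitrary threshold and an assumed logits interface:

```python
import torch
import torch.nn.functional as F

def next_token_or_abstain(logits: torch.Tensor, threshold: float = 0.3):
    # Greedy decoding that refuses to continue when the model's top
    # next-token probability falls below the threshold.
    probs = F.softmax(logits, dim=-1)
    p, tok = probs.max(dim=-1)
    if p.item() < threshold:
        return None      # caller emits "I don't know" and stops generating
    return int(tok.item())
```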

Animats | 2 days ago

This is awesome - thanks for sharing. Appreciate the small-scale but comprehensive studies testing out different architectures, model sizes and datasets.

Would be curious to see a version of your model-size comparison chart where training continues until perplexity plateaus / the model begins to overfit. For example: are your larger models performing worse because they are overfitting to a small dataset, or because you are comparing model sizes at a fixed five-minute compute budget, so the large models just don't get to learn very much in that time?

(Also interesting would be learning-curve comparisons across architectures/param counts.)
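
Patience-based early stopping would disentangle the two; a sketch, assuming a hypothetical `model(x, y)` call that returns the batch loss:

```python
import math
import torch

def train_to_plateau(model, opt, train_batches, val_batches, patience=3):
    # Train past the five-minute budget until validation perplexity
    # stops improving for `patience` consecutive epochs.
    best, bad = float("inf"), 0
    while bad < patience:
        model.train()
        for x, y in train_batches:
            opt.zero_grad()
            model(x, y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            losses = [model(x, y).item() for x, y in val_batches]
        val = sum(losses) / len(losses)
        print(f"val perplexity: {math.exp(val):.2f}")
        if val < best - 1e-3:
            best, bad = val, 0
        else:
            bad += 1
    return best
```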

highfrequency | 2 days ago

If only AI models were trained to connect to a data source (e.g. SQL) and use it to answer questions, instead of just training on the data, it could reduce model size a lot.
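
That is, let the database do the remembering. A toy sketch of the division of labor, with the model's query generation stubbed out and a hypothetical `countries` table in facts.db:

```python
import sqlite3

def answer(question: str, db: str = "facts.db") -> str:
    # In a real system a small model would translate `question` into SQL;
    # here the query is hard-coded for illustration.
    sql = "SELECT capital FROM countries WHERE name = ?"
    with sqlite3.connect(db) as conn:
        row = conn.execute(sql, ("France",)).fetchone()
    return row[0] if row else "no answer found"

print(answer("What is the capital of France?"))  # "Paris", not North Carolina
```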

iamgopal | 2 days ago

I'd be happy with an AI that can just "train" on me: Just see what I do, learn from the repetitive tasks I do, and then do them quicker. An agent that is basically me x 10.

Start blank with no corporate-controlled/crippled state and just become me.

In fact, that might be the only way to let computers appear to grow faster into the future, even if their internal hardware only gets minor incremental improvements: Have your shit done before you sit down to do it.

Razengan | 2 days ago

AI is sorely lacking a demoscene

jarmitage | 2 days ago

The bigger question, or maybe even realization, is that with this architecture there is no way to build a capable model that runs on a laptop or phone, which means there will never be serious local compute and servers become ever more important. Thinking about how ML itself works, reducing model size while retaining capability will just never happen.

yalogin | 2 days ago

Am I missing where the GitHub link is for this, or did the author not release sources? It'd be fun to reproduce this on a different machine, and play around with other architectures and optimizers that weren't mentioned in the article...

remexre | 2 days ago

Here's an International Obfuscated C Code Contest entry that trains a toy LSTM model:

https://www.ioccc.org/2019/mills/index.html

I suppose if you only have 5 minutes this is probably about the level you'd get.

hnfong | 2 days ago

Readers: I'm looking for quick, toy AI exercises that can be done on a laptop and help the doer build confidence in AI concepts (learning by doing, and all that).

The OP fits the bill.

If you can suggest other such exercises, please share in reply to this post.

Thank you.

profsummergig | 2 days ago

The idea of tracking and optimizing this reminds me of similar efforts a few years ago, especially for image models, via DAWNBench.

https://dawnd9.sites.stanford.edu/dawnbench

jasonjmcghee | 2 days ago

What about overnight on a desktop with a higher-end Nvidia gaming GPU? Asking for a friend.

indoordin0saur | 2 days ago

How far can you go by improving the curriculum? Start simple. Find a shorter and shorter sequence of examples that gives you the best result. What is the shortest sequence that reaches some target perplexity? Why?
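
A minimal curriculum is just a sort, with length as the usual first stand-in for difficulty (a sketch; the alternative keys named in the comment are suggestions):

```python
def curriculum(examples: list[str]) -> list[str]:
    # Present short (presumed easy) sequences first; swap the key for
    # vocabulary rarity or reference-model perplexity to experiment.
    return sorted(examples, key=len)
```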

raindear | 2 days ago

On a laptop you can fine-tune or train small transformer models and classical ML models, but large-scale deep-learning models require datacenter GPUs, long runtimes, or cloud resources.

business_liveit | 2 days ago

It is difficult to make it small and powerful at the same time. Unless there is a major technological breakthrough, the L3 to L5

Sharon_Q | 2 days ago

Probably something like a small logistic regression or a tiny GPT-2 variant (117M parameters) on a small dataset—anything beyond that will choke on RAM, VRAM, or time. Five minutes on a laptop = toy models, not miracles.

fontsgenerator | 2 days ago

I feel like such a 2-million-parameter model might make a much better keyboard autocorrect or IDE autocomplete than what I'm using now.

kragen | 2 days ago

Depends on how much weight you can support on your lap

quux | 2 days ago

Any reason to upgrade an M2 16 GB MacBook to an M4 ..GB (or 2026 M5) for local LLMs? Due an upgrade soon, and perhaps it would be educational to run these things more easily locally?

mhogers | 2 days ago

A useful trick would be to start from an existing model instead of trying to train it from a random starting point.
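
With Hugging Face transformers, for instance, warm-starting is one line; a sketch of a single fine-tuning step from the smallest public GPT-2 checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # pretrained weights, not random init

batch = tok("Paris, France is the capital of France.", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # causal-LM loss
out.loss.backward()                              # gradients for one fine-tuning step
```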

charcircuit | 2 days ago

An idea worth exploring: if specialized models can be trained quickly on small datasets, they could be used as tools by bigger models.

simianwords | 2 days ago

I'd be interested in which implementation of D3PM was used (and failed). Diffusion models are more data-efficient than their AR LLM counterparts but less compute-efficient at training time, so it'd be interesting to know whether, with more time to converge, the diffusion approach does succeed. I guess I'll try :)
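
For anyone else reproducing this: the absorbing-state ("masking") variant of D3PM has a forward process that is simple to sketch, with each token independently replaced by [MASK] at a rate that grows with the timestep (simplified here from the paper):

```python
import torch

def absorb_corrupt(tokens: torch.Tensor, t: float, mask_id: int) -> torch.Tensor:
    # Forward process of the absorbing-state variant: independently replace
    # each token with [MASK] with probability t; at t=1.0 everything is masked.
    # The denoiser is then trained to recover the original tokens.
    keep = torch.rand(tokens.shape) >= t
    return torch.where(keep, tokens, torch.full_like(tokens, mask_id))
```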

pilooch | 2 days ago

It would have been useful to see the exact steps needed to replicate the result.

andrewstuart | 2 days ago

This would be more interesting if it weren't about (L)LMs.

panarchy | 2 days ago

I would've liked to see some xLSTMs.

erikqu | 2 days ago

Now imagine what you could do in 6 minutes!

But honestly I really like the short turnaround times. Makes it easy to experiment with different parameters and develop an intuition for what they do.

yunusabd | 2 days ago

Which laptop, though?

pjmlp | 2 days ago

There was https://sortbenchmark.org, and now we need something similar for AI: best model per joule, per one cent, per minute.

trhway | 2 days ago

Siri.

dileeparanawake | 2 days ago

You could train an unbeatable tic-tac-toe AI on your laptop in five minutes. It doesn't get any stronger than that.

I know, I know. I’m intentionally misinterpreting the OP’s clear intent (the stuff of comedy). And normally a small joke like this wouldn’t be worth the downvotes…

But, I think there’s a deeper double meaning in this brave new world of prompt engineering. Most chat isn’t all that precise without some level of assumed shared context:

These days the meaning of the phrase "AI" has shifted from the classical definition (all algorithms welcome); now "AI" usually means LLMs and their derivatives.

schaefer | 2 days ago

Right now, Qwen3 4B

fswd | 2 days ago

Thanks.

lamuswawir | 2 days ago

The best LLMs on the planet right now are Gemini Pro 2.5 and Gemini Flash 2.5; nothing comes close to these.

Once you set up a good system prompt on these, nothing really compares.

Most of the models you see with high benchmarks are not even comparable on real tasks.

Qwen3 or DeepSeek R1 aren't even 1/10 as good as Gemini Pro 2.5.

faangguyindia | 2 days ago