With regard to LLMs, there is a collision between Apple's extremely good chip design capabilities and Apple's insistence that Siri never say anything that isn't 100% scripted and 100% certain not to be bad. Up until now, they've chosen to limit Siri's functionality rather than leave anything to chance.
LLMs will absolutely be able to run locally, but whether Apple will be able to stop worrying and love the model remains to be seen.
you should put [June 2022] in the title
I find it fascinating that this was released after the June 2022 launch of the M2 chipset and line of products, and yet Apple had no desire to show relative performance of M2 vs. M1 here - even in the simultaneous announcement here: https://machinelearning.apple.com/research/neural-engine-tra...
It's fascinating to me that at least one of two things is true: either (a) Apple has lost its ability to coordinate "hype" between its teams, or (b) the difference between comparable tiers, e.g. the M1 Max vs. the M2 Max, is so negligible that it wouldn't look good in an announcement like this.
Has anyone run inference for LLMs or other transformer models on comparable M1 and M2 Macs? Are there good benchmarks for this specific workload?
Reminded me of one of the cooler uses of old iPhones [1]. These old phones are going to continue being useful long after their initial life as a phone, as long as Apple doesn't act like Apple and lock everything down.
1 - https://findthatmeme.com/blog/2023/01/08/image-stacks-and-ip...
The bottleneck with compute at the edge is (and will be) model size (both app download time and storage space on device).
Stable Diffusion sits at about 2GB for fp16, Whisper Medium at 1.53GB, LLAMA is 120GB.
Sure, Apple can ship an optimized model (<2-4GB) as part of the OS, but what if a capable app maker wants to ship one? Users will not be happy with an app sized at >1GB.
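For a rough sense of where sizes like these come from, here is a back-of-the-envelope sketch: weight storage is approximately parameter count times bytes per parameter. The parameter counts below are approximate public figures that I am assuming for illustration (they are not from the comment above), and the helper name `model_size_gb` is made up. Real checkpoints also carry some metadata and may mix precisions.

```python
# Rough weight-storage estimate: parameters x bytes per parameter.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Approximate parameter counts (assumptions for illustration only).
APPROX_PARAMS = {
    "Stable Diffusion v1 (UNet + text encoder + VAE)": 1.0e9,
    "Whisper medium": 769e6,
    "LLaMA 65B": 65e9,
}


def model_size_gb(n_params: float, dtype: str = "fp16") -> float:
    """Return the approximate size of the weights in decimal gigabytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n_params in APPROX_PARAMS.items():
    fp16 = model_size_gb(n_params, "fp16")
    int8 = model_size_gb(n_params, "int8")
    print(f"{name}: ~{fp16:.2f} GB in fp16, ~{int8:.2f} GB in int8")
```

Under these assumptions the fp16 estimates land in the same ballpark as the figures quoted above, and 8-bit quantization would roughly halve them, which still leaves the largest models far above what most users would accept as an app download.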
Weird, I just read a tweet arguing that Apple will be launching its own secure and private LLM that runs on device (edge compute).
As someone entirely at sea with the rapid pace of development in this sphere:
1. Is this a new LLM from Apple?
2. Is this a way to optimize running LLMs like Llama locally on M1 macs?
3. Something else altogether?
Maybe Apple will have a bigger effect on AI adoption than any other company.
Local inference is huge for anything that requires even a little bit of privacy.
The github repo has only 5 commits in it and the last one is August 9, 2022. It looks abandoned.
As much as I love the progress in AI (also see Microsoft's recent Office Copilot) - I seriously think that government needs to step in to regulate the fair use of training data.
Though this is colored by my personal beliefs, I believe that in order to maximise innovation, it's the role of government to implement regulations that support a competitive commercial environment. Without this, monopolies will form around hard-to-obtain resources, innovation will stagnate and consumers will be subject to exploitation.
Currently, data is mostly acquired without user consent and is accessible retroactively. Companies own that data, they can trade it and they can use it however they want to. You as a consumer have no say in this and it's virtually impossible to live a normal life without being the subject of analysis.
While it's incredible that companies can produce undeniably valuable products like Copilot - ultimately - they will profit from these products. The irony is they built them from data sourced from you, likely from something you paid for (MS Word, etc).
The key ingredient in these products is training data. If you wanted to compete with them, no matter how capable you are as an engineer, you could never make Copilot without the same scale of data Microsoft has gathered.
I don't know what kind of regulation would even out the playing field, but I wouldn't mind being compensated for my role in creating these highly profitable products.
While the GitHub repo contains the code, the article describing the optimisations is here: https://machinelearning.apple.com/research/neural-engine-tra....
TL;DR: execution of PyTorch models on Apple's Neural Engine, plus standard data-oriented optimisations (changing matrix layout, chunking to optimise temporal cache locality, and minimising redundant memory copies)
Note that the most recent commit to this repo is 7 months ago.
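On the "changing matrix layout" item in the TL;DR above: as I understand the linked article, the idea is to move activations from the usual (batch, sequence, channels) layout to (batch, channels, 1, sequence) and to express linear layers as 1x1 convolutions. The sketch below is not Apple's code. It uses NumPy as a stand-in for PyTorch, with made-up tensor sizes, only to show that the two formulations compute the same values.

```python
import numpy as np

rng = np.random.default_rng(0)
B, S, C_in, C_out = 2, 5, 4, 3  # hypothetical small sizes for illustration

x_bsc = rng.standard_normal((B, S, C_in))    # usual (batch, seq, channels) layout
weight = rng.standard_normal((C_out, C_in))  # linear-layer weight, (out, in)
bias = rng.standard_normal(C_out)

# Linear layer applied on the (B, S, C) layout.
y_linear = x_bsc @ weight.T + bias           # shape (B, S, C_out)

# The same activations rearranged into the (B, C, 1, S) layout.
x_bc1s = x_bsc.transpose(0, 2, 1)[:, :, None, :]  # shape (B, C_in, 1, S)

# A 1x1 convolution mixes channels independently at every position,
# which is exactly what the linear layer does per token.
y_conv = np.einsum("oc,bchw->bohw", weight, x_bc1s)
y_conv = y_conv + bias[None, :, None, None]  # shape (B, C_out, 1, S)

# Convert back to (B, S, C_out) and compare the two results.
y_back = y_conv[:, :, 0, :].transpose(0, 2, 1)
layouts_agree = bool(np.allclose(y_linear, y_back))
print("layouts agree:", layouts_agree)
```

The math is identical either way. The claimed benefit is in how the data sits in memory for the accelerator, which this sketch does not measure.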
Not sure why Apple is throwing this code over the fence, (or catapulting it over the balustrade -- to continue the castle metaphor used here) and not also selling server form factor Apple silicon based devices for data centers. They left out an "issues" pane on the github repo, so it is intended to be quite the unidirectional act. I am not sure that Apple will remain proportionally huge with respect to other big tech while they squander their development on boutique mass market products, while ignoring the vast growth they could achieve if they expanded into first class cloud computing markets.
Is there also something available to make use of the ANE during _training_? E.g fine-tuning BERT on an M1 Mac in a couple of hours?
(This here only applies to inference, right?)
The race is on and ecosystems are moving fast.
As a newbie to this space, I see this mentioning PyTorch. I was looking at Whisper earlier today and, somewhat impressively, was reminded that my M1 Pro has a CPU fan. Is it realistic to think it would be a modest amount of work to install this in my local venv and use the NPU instead?
In 2 years, LLMs with capabilities similar to today's ChatGPT will be running locally on newer phones and Macs.
So, is Stable Diffusion working finally on TPU or not? DiffusionBee uses GPU and running this https://github.com/apple/ml-stable-diffusion with CPU_AND_NE just segfaults
This is great. I cannot wait to try it on my laptop, as I like to do dev locally. But I don't understand the deployment part: besides on device, how would you deploy this on a server, say, given that Apple servers are not something cloud providers offer?
Latest commit seems to be "on Aug 9, 2022"
Has anyone used this and adapted it to run the LLaMA or Alpaca models?
Add 2022 to the title
Is it attention free?
How do I run this on a Mac?
I'd say within 5 years Apple will have optimized Apple silicon and their tech, along with language model improvements, such that you will be able to get GPT-4 level performance on the iPhone 19 with inference happening entirely locally.
OpenAI is doing great work and is serious competition, but I think many underestimate big tech. Once they're properly motivated they'll catch up quickly. I think we can agree that OpenAI is a sufficient motivator.