I think there is a slight disconnect here between making AI systems which are smart and AI systems which are useful. It’s a very old fallacy in AI: pretending tools which assist human intelligence by solving human problems must themselves be intelligent.
The utility of big datasets was indeed surprising, but that skepticism came from recognizing that the scaling paradigm must be a dead end: vertebrates across the board require several orders of magnitude less data to learn new things. Methods for giving ANNs “common sense” are essentially identical to the old LISP expert systems: hard-wiring the answers to specific common-sense questions in either code or training data, even though fish and lizards can rapidly make common-sense deductions about man-made objects they couldn't possibly have seen in their evolutionary histories. Even spiders show generalization abilities seemingly absent in transformers: they spin webs inside human homes, with all their unnatural geometry.
Again, it is surprising that the ImageNet stuff worked as well as it did. Deep learning is undoubtedly a useful way to build applications, just like Lisp was. But I think we are about as close to AGI as we were in the 80s, since we have made zero progress on common sense: in the 80s we already knew that Big Data could only poorly emulate common sense, and that's where we still are today.
> “Pre-ImageNet, people did not believe in data,” Li said in a September interview at the Computer History Museum. “Everyone was working on completely different paradigms in AI with a tiny bit of data.”
That's baloney. The old ML adage "there's no data like more data" is as old as mankind itself.
I think neural nets are just a subset of machine learning techniques.
I wonder what would have happened if we had poured the same amount of money, talent, and hardware into SVMs, random forests, KNN, etc.
I'm not saying that transformers, LLMs, deep learning, and the other great things that have happened in the neural-network space aren't very valuable, because they are.
But I think in the future we should also study other options which might be better suited than neural networks for some classes of problems.
Can a very large and expensive LLM do sentiment analysis or classification? Yes, it can. But so can simple SVMs and KNN, and sometimes they do it even better.
I saw some YouTube coders doing calls to OpenAI's o1 model for some very simple classification tasks. That isn't the best tool for the job.
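As an illustration of how little machinery such a task needs, here is a minimal sketch using scikit-learn (the tiny dataset and labels are invented for the example):

```python
# Toy sentiment classification with a linear SVM instead of an LLM call.
# Assumes scikit-learn is installed; the four training texts are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "loved the movie, great acting",
    "fantastic product, works as advertised",
    "terrible service, never again",
    "boring plot and bad dialogue",
]
train_labels = ["pos", "pos", "neg", "neg"]

# TF-IDF features plus a linear SVM: trains in milliseconds on a laptop CPU.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_texts, train_labels)

print(clf.predict(["what a great experience", "awful, waste of money"]))
```

On a real labeled dataset this kind of pipeline is often a perfectly adequate (and far cheaper) baseline to compare against an LLM call.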
I'm surprised that the article doesn't mention that one of the key factors that enabled deep learning was the use of ReLU as the activation function in the early 2010s. ReLU behaves a lot better than the logistic sigmoid that we used until then.
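A small sketch of the "behaves a lot better" part, in plain NumPy (my own toy numbers, just to show the gradient scales involved):

```python
# Compare the gradients of the logistic sigmoid and ReLU (illustrative only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # never exceeds 0.25; tiny for large |x|

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for any positive input

x = np.array([-4.0, -1.0, 0.5, 4.0])
print(sigmoid_grad(x))  # roughly [0.018, 0.197, 0.235, 0.018]
print(relu_grad(x))     # [0. 0. 1. 1.]

# Backprop multiplies one such factor per layer, so sigmoid gradients shrink
# exponentially with depth, while ReLU passes them through unchanged wherever
# the unit is active.
```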
The deep learning boom caught deep-learning researchers by surprise because deep-learning researchers don't understand their craft well enough to predict essential properties of their creations.
A model is grown, not crafted like a computer program, which makes it hard to predict. (More precisely, a big growth phase follows the crafting phase.)
“Nvidia invented the GPU in 1999” is wrong on many fronts.
Arguably the November 1996 launch of 3dfx's Voodoo Graphics kickstarted interest in consumer GPUs and OpenGL.
After reading that, it's hard to take the author seriously on the rest of the claims.
The article mentions Support Vector Machines being the hot topic in 2008. Is anyone still using/researching these?
I often wonder how many useful technologies could exist if trends had gone a different way. Where would we be if neural nets hadn't caught on and SVMs and expert systems had?
I took some AI courses around the same time as the author, and I remember the professors were actually big proponents of neural nets, but they believed the missing piece was some new genius learning algorithm rather than just scaling up with more data.
So the AI boom of the last 12 years was made possible by three visionaries who pursued unorthodox ideas in the face of widespread criticism.
I'd argue that Mikolov's word2vec was instrumental in the current AI revolution. It demonstrated the ease of extracting meaning from text in a mathematical way and led directly to all the advancements we have today with LLMs. And ironically, it didn't require a GPU.
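For anyone who hasn't seen it, the classic word2vec demo is word analogies via vector arithmetic. A minimal sketch using gensim (assuming it is installed; the pre-trained Google News vectors are a large one-time download):

```python
# Word analogies via vector arithmetic, the classic word2vec demo.
# "word2vec-google-news-300" is the pre-trained Google News model that
# gensim's downloader can fetch (a large download on first use).
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# king - man + woman should land near queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Everything here runs on a plain CPU -- as noted above, no GPU required.
```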
Three legs to the stool - the NN algorithms, the data, and the labels. I think the first two are obvious but think about how much human time and care went into labeling millions of images...
It wasn't "almost everyone", it was straight up everyone.
logic is data and data is logic
Lesson: ignore detractors. Especially if their argument is "don't be a tall poppy."
I can’t be the only one who has watched this all unfold with a sense of inevitability, surely.
When the first serious CUDA-based ML demos started appearing a decade or so ago, it was, at least to me, pretty clear that this would lead to AGI in 10-15 years - and here we are. It was the same sort of feeling as when I first saw the WWW aged 11, and knew that this was going to eat the world - and here we are.
The thing that flummoxes me is that now that we are so obviously on this self-reinforcing cycle, how many are still insistent that AI will amount to nothing.
I am reminded of how the internet was just a fad - although this is going to have an even greater impact on how we live, and our economies.
It was basically useful to average people and wasn't just some way to steal and resell data or dump ad after ad on us. A lot of dark patterns really ruin services.
The article credits two academics (Hinton, Fei-Fei Li) and a CEO (Jensen Huang). But really it was three academics.
Jensen Huang, reasonably, was desperate for any market that could suck up more compute, which he could pivot to from GPUs for gaming when gaming saturated its ability to use compute. Screen resolutions and visible polygons and texture maps only demand so much compute; it's an S-curve like everything else. So from a marketing/market-development and capital investment perspective I do think he deserves credit. Certainly the Intel guys struggled to similarly recognize it (and to execute even on plain GPUs.)
But... the technical/academic insight of the CUDA/GPU vision in my view came from Ian Buck's "Brook" PhD thesis at Stanford under Pat Hanrahan (Pixar+Tableau co-founder, Turing Award Winner) and Ian promptly took it to Nvidia where it was commercialized under Jensen.
For a good telling of this under-told story, see one of Hanrahan's lectures at MIT: https://www.youtube.com/watch?v=Dk4fvqaOqv4
Corrections welcome.