U-Net CNN in APL: Exploring Zero-Framework, Zero-Library Machine Learning

tosh | 94 points

It's neat to see ongoing Co-dfns work from Aaron and others! For anyone interested in very cool, esoteric, yet serious programming, there are a number of YouTube videos online: https://www.youtube.com/playlist?list=PLDU0iEj6f8duXzmgnlGX4....

sctb | 10 months ago

Impressively concise implementation, and a really interesting paper! The benchmark looks quite questionable, though. They use fp64, while any sane person would use at least fp32, if not fp16. They use batch size 1, while one would normally use the largest batch that fits in memory, dropping to 1 only for much bigger models or inputs. And they measure time including transfers to/from the GPU, while those would normally be interleaved with GPU operations. Not sure what the results would look like in a more realistic setup, but getting within 2x of PyTorch even under those conditions still looks impressive!
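
For comparison, a more conventional timing setup in PyTorch would look something like this (a minimal sketch; the layer and shapes here are illustrative, not the paper's actual benchmark):

```python
# Minimal sketch of a more conventional GPU benchmark in PyTorch.
# Assumes a CUDA device; the layer and shapes are illustrative only.
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Conv2d(3, 64, kernel_size=3, padding=1).to(device).half()  # fp16, not fp64
x = torch.randn(16, 3, 256, 256, device=device, dtype=torch.half)     # batch 16, not 1

with torch.no_grad():
    # Warm-up so one-time CUDA initialization isn't counted.
    for _ in range(10):
        model(x)

    # Time GPU work with CUDA events, excluding host<->device transfers.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        model(x)
    end.record()
    torch.cuda.synchronize()
    print(f"{start.elapsed_time(end) / 100:.3f} ms per forward pass")
```

Since the CUDA events are recorded on the GPU stream, this measures kernel time without the one-time host-to-device transfer, and the warm-up keeps lazy CUDA initialization out of the numbers.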

lopuhin | 10 months ago

Interesting Futhark mention as well in the Related work section:

> Another approach to GPU-based array programming with an APL focus is the TAIL/Futhark system [8], which is a compiler chain taking APL to the TAIL (Typed Array Intermediate Language) and then compiling TAIL code using the Futhark GPU compiler backend.

fulafel | 10 months ago

I wonder what it would look like in kdb+?

natas | 10 months ago

Nobody should write a backward pass by hand.
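
To illustrate the point, here is a minimal PyTorch sketch (not code from the paper) of what autodiff buys you:

```python
# Minimal sketch: letting autograd derive the backward pass.
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # forward pass: y = sum of x_i squared

y.backward()         # backward pass generated automatically
print(x.grad)        # tensor([2., 4., 6.])

# Hand-written equivalent of the same gradient, dy/dx_i = 2*x_i:
print(2 * x.detach())  # tensor([2., 4., 6.])
```

With a framework, the gradient code is generated from the forward pass; without one, every layer's backward pass has to be derived and implemented manually.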

mlajtos | 10 months ago