Biggest shell programs
Oh, no, now I have to go dig out some of mine....
The first really big one I wrote was the ~7000 line installer for the Entrust CA and directory, which ran on, well, all Unixes at that time. It didn't initially, of course, but it grew with customer demand.
The installation itself wasn't especially complicated, but upgrades were, a little, and this was back when every utility on every Unix had slight variations.
Much of the script was figuring out and managing those differences, much was error detection and recovery and rollback, some was a very primitive form of package and dependency management....
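A minimal sketch of the error-detection-and-rollback pattern that sort of installer leans on (the paths and steps here are hypothetical, not the actual Entrust code):

    #!/bin/sh
    # Hypothetical upgrade skeleton: snapshot first, restore on any failed step.
    BACKUP=/tmp/product-backup.$$

    rollback() {
        echo "upgrade failed, restoring previous version" >&2
        [ -d "$BACKUP" ] && cp -pR "$BACKUP"/. /opt/product/
        exit 1
    }
    trap rollback INT TERM

    mkdir -p "$BACKUP" && cp -pR /opt/product/. "$BACKUP"/ || rollback

    step() { "$@" || rollback; }      # every step either succeeds or rolls back
    step cp new/bin/server /opt/product/bin/server
    step /opt/product/bin/server -selftest

    trap - INT TERM
    echo "upgrade complete"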
DEC's Unix (the other one, not Ultrix) was the most baffling. It took me days to realize that all command line utilities truncated their output at column width. Every single one. Over 30 years later and that one still stands out.
Every release of HP-UX had breaking changes, and we covered 6.5 to 11, IIRC. I barely remember Ultrix or the Novell one or Next, or Sequent. I do remember AIX as being weird but I don't remember why. And of course even Sun's three/four OS's had their differences (SunOS pre 4.1.3; 4.1.3; Solaris pre 2; and 2+) but they had great FMs. The best.
At one point I considered writing an interpreter for my scripting language Lil in bash to maximize portability, but quickly realized that floating-point arithmetic would be extremely painful (can't even necessarily depend on bc/dc being available in every environment) and some of the machines in my arsenal have older versions of bash with very limited support for associative arrays. My compromise was to instead target AWK, which is a much more pleasant general-purpose language than most shells, and available in any POSIX environment: https://beyondloom.com/blog/lila.html
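For what it's worth, the two features that tipped the decision are one-liners in any POSIX awk; a trivial illustration:

    awk 'BEGIN {
        pi = 3.14159                      # floating point just works
        area["circle"] = pi * 2 * 2       # associative arrays are built in
        area["square"] = 4 * 4
        for (k in area) printf "%s: %.2f\n", k, area[k]
    }'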
As someone who has written and maintained large Perl programs at various points in my career, I can say there is a reason why people do this. Languages like Java and Python work fine when interfaces and formats are well defined and you have essentially zero OS interaction; that is, you use JSON/XML/YAML or interact with a database or other programs via http(s). This creates an ideal situation where those languages can shine.
When the work is heavy on text processing and OS interaction, languages like Java and Python are a giant pain, and you begin to notice how shell and Perl make that kind of work a breeze.
That means nearly every automation task: chaotic non-standard interfaces, text/log files, and other data formats that are not structured (or at least not well enough). Add to this Perl's commitment to backwards compatibility, its large install base, and its performance, and you have practically no alternative to Perl for these kinds of tasks.
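A hedged illustration of the kind of one-liner glue I mean (log format and field positions assumed):

    # Top ten client IPs producing 5xx responses in an Apache-style access log
    perl -lane 'print $F[0] if $F[8] >= 500' access.log |
        sort | uniq -c | sort -rn | head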
I have long believed that a big reason for so much manual drudgery these days, with large companies hiring thousands of people to do trivially automatable tasks, is that Perl usage dropped. People attempt to use Python or Java for big automation tasks and quit soon enough when faced with the sheer verbosity and volume of code they would have to churn out and maintain to get it done.
I think the main problem with writing large programs as bash scripts is that shell scripting languages were never really designed for complexity. They excel at orchestrating small commands and gluing together existing tools in a quick, exploratory way. But when you start pushing beyond a few hundred lines of Bash, you run into a series of limitations that make long-term maintenance and scalability a headache.
First, there’s the issue of readability. Bash's syntax can become downright cryptic as it grows. Variable scoping rules are subtle, error handling is primitive, and string handling quickly becomes messy. These factors translate into code that’s harder to maintain and reason about. As a result, future maintainers are likely to waste time deciphering what’s going on, and they’ll also have a harder time confidently making changes.
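Two classic examples of those scoping and error-handling subtleties, as a small self-contained sketch (not taken from any particular script):

    #!/bin/bash
    set -e

    # Gotcha 1: 'local var=$(cmd)' hides cmd's failure behind local's exit status.
    fetch() {
        local out=$(false)          # false fails, but the whole line's status is 0
        echo "fetch kept going anyway"
    }
    fetch

    # Gotcha 2: set -e is ignored for commands tested in an 'if' or joined with && / ||.
    if false; then :; fi
    false || true
    echo "and the script kept going too"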
Next, there’s the lack of robust tooling. With more mature languages, you get static analysis tools, linters, and debuggers that help you spot common mistakes early on. For bash, most of these are either missing or extremely limited. Without these guardrails, large bash programs are more prone to silent errors, regressions, and subtle bugs.
Then there’s testing. While you can test bash scripts, the process is often more cumbersome. Complex logic or data structures make it even trickier. Plus, handling edge cases—like whitespace in filenames or unexpected environment conditions—means you end up writing a ton of defensive code that’s painful to verify thoroughly.
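A small example of the defensive style this forces, assuming bash and a find that supports -print0 (NUL-delimited names so whitespace and newlines can't split anything):

    #!/bin/bash
    # The naive version breaks on spaces:  for f in $(find . -name '*.log'); ...
    find . -name '*.log' -print0 |
    while IFS= read -r -d '' f; do
        printf 'archiving %s\n' "$f"
    done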
Finally, the ecosystem just isn’t built for large-scale Bash development. You lose out on modularity, package management, standardized dependency handling, and all the other modern development patterns that languages like Python or Go provide. Over time, these deficits accumulate and slow you down.
I think using Bash for one-off tasks or simple automation is fine — it's what it’s good at. But when you start thinking of building something substantial, you’re usually better off reaching for a language designed for building and maintaining complex applications. It saves time in the long run, even if the initial learning curve or setup might be slightly higher.
I'm pretty sure the largest handwritten shell program I used back in the day on a regular basis was abcde (A Better CD Encoder)[1] which clocks in at ~5500 LOC.[2]
[2] https://git.einval.com/cgi-bin/gitweb.cgi?p=abcde.git;a=blob...
Many of these programs are true gems; the rkhunter script, for instance, is both nice code (it can be improved) and a treasure trove of information*.
Note that much of the code size of these scripts is dedicated to ensuring that the right utilities exist across the various platforms and perform as expected with their various command line options. This is the worst pain point of any serious shell script author, even worse than signals and subprocesses (unless one enjoys the pain).
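A sketch of the kind of capability probing that eats all those lines (the utility names here are just common examples):

    # Pick whichever awk implementation this system actually ships
    if command -v gawk >/dev/null 2>&1; then AWK=gawk
    elif command -v nawk >/dev/null 2>&1; then AWK=nawk
    else AWK=awk
    fi

    # Same dance for hashing: GNU md5sum vs. BSD/macOS md5
    if command -v md5sum >/dev/null 2>&1; then
        hash_file() { md5sum "$1" | cut -d' ' -f1; }
    elif command -v md5 >/dev/null 2>&1; then
        hash_file() { md5 -q "$1"; }
    else
        echo "no md5 utility found" >&2; exit 1
    fi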
*Information that, I would argue, would be less transparent if rkhunter had been written in a "proper" programming language. It might be shoved off into records in data structures to be retrieved later; actions might be complex combinations of various functions (or, woe, methods and classes) on nested data structures; logging could be JSON-Bourned into pieces and compressed into some database to be accessed via yet other methods, and so on.
Shell scripts, precisely due to the lack of such complex tools, tend to "spill the beans" on what is happening. This makes rkhunter, for instance, a decent documentation of various exploits and rootkits without having to dig into file upon file, structure upon structure, DB upon DB.
The FreeBSD Update client is about 3600 lines of sh code. Not huge compared to some of the other programs mentioned here, but I'm inclined to say that "tool for updating an entire operating system" is a pretty hefty amount of functionality.
The code which builds the updates probably adds up to more lines, but that's split across many files.
It’s “only” 7.1K LoC, but my favorite is the “acme.sh” script, which is used to issue and renew certs from Let's Encrypt.
https://github.com/acmesh-official/acme.sh/blob/master/acme....
Sometimes shell is the only thing you can guarantee is available and life is such that you have to have portability, but in general, if you've got an enormous shell app, you might want to rethink your life choices. :/
If you're looking for a tool to simplify the building of big shell programs, I highly recommend using argc (https://github.com/sigoden/argc). It's a powerful Bash CLI framework that significantly simplifies the process of developing feature-rich command-line interfaces.
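If I recall the argc README correctly, it works roughly like this: comment tags describe the CLI, and a single eval line turns them into argument parsing, help text, and completions (treat the details as approximate):

    #!/usr/bin/env bash
    # @cmd Upload a file
    # @arg target!            File to upload
    upload() { echo "uploading $argc_target"; }

    # @cmd Download a file
    # @flag -f --force        Overwrite an existing file
    # @arg source!            File to download
    download() { echo "downloading $argc_source"; }

    eval "$(argc --argc-eval "$0" "$@")"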
Back when I worked on mod_pagespeed we wrote shell scripts for our end-to-end tests. This was expedient when getting started, but then we just kept using it long past when we should have switched away. At one point I got buy-in for switching to python, but (inexperience) I thought the right way to do it was to build up a parallel set of tests in python and then switch over once everything had been ported. This, of course, didn't work out. If I were doing this now I'd do it incrementally, since there's no reason you can't have a mix of shell and python during the transition.
I count 10k lines of hand-written bash in the system tests:
$ git clone git@github.com:apache/incubator-pagespeed-mod.git
$ git clone git@github.com:apache/incubator-pagespeed-ngx.git
$ find incubator-pagespeed-* | \
grep sh$ | \
grep system_test | \
xargs cat | \
wc -l
10579
I just added winetricks (22k LoC shell script) https://github.com/Winetricks/winetricks
Don't know about the biggest, although it was quite big, but the best shell program I ever wrote was in REXX, for a couple of IBM 4381s running VM/CMS, and it did distributed printing across a number of physical sites. It saved us a ton of money, since it only needed a cheap serial terminal and printer, when IBM wanted to charge us an ungodly amount for their own printers and associated comms. One of the pieces of software I'm most proud of (written in the mid 1980s), to this day.
My largest was probably an order of magnitude smaller than most of these, but it checked whether my VPN was up and started it if not. (It also restarted various media-based Docker containers.)
If it was up, it would run a speed check and record the result for the IP the VPN was using, then compare that speed to the running average using a standard deviation and z-score. It would then calculate how long to wait before recycling the VPN client: slow VPN endpoints cycled quickly, faster ones waited longer before cycling. Speeds more than about a standard deviation outside the mean shortened the delay relative to the last one; speeds within 1 z expanded the delay before the next check.
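Roughly the arithmetic involved, as a sketch (the file name and thresholds are made up; the real script's numbers were surely different):

    # speeds.log holds one measured Mbps value per line, newest last
    delay=$(awk '{ s += $1; q += $1 * $1; n++; last = $1 }
        END {
            m  = s / n
            sd = sqrt(q / n - m * m)
            z  = (sd > 0) ? (last - m) / sd : 0
            # unusually slow sample -> check again soon; unusually fast -> stretch it out
            if      (z < -1) d = 300
            else if (z >  1) d = 3600
            else             d = 1200
            print d
        }' speeds.log)
    sleep "$delay"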
Another one about that size would, based on the current time, scrape the local weather and sunup/sundown times for my lat/long and determine how long to wait before turning on an outdoor hose, and how long to run it, via X10, with a switch on a laptop that used a serial port to talk to the X10 devices. The hose fed a sprinkler on my roof that sprayed the roof down to cool it off. Hotter (and sunnier) weather meant longer runs and shorter waits, and vice versa. I live in the US South, where shedding those BTUs via evaporation made a real difference in my air-conditioning power use.
I'm writing a ticketing manager for the terminal entirely in bash. Reasonably non-trivial project, and it's been pretty enjoyable working "exclusively" with bash. ("exclusively" here used in quotes, because the whole point of a shell scripting language is to act as a glue between smaller programs or core utilities in the first place, which obviously may well have been written in other languages. but you get the point).
Having said that, if I were to start experimenting with an altogether different shell, I would be very tempted to try jshell!
Incidentally, I hate when projects say stuff like "Oils is our upgrade path from bash to a better language and runtime". Whether a change of this kind is an "upgrade" is completely subjective, and the wording is unnecessarily haughty / dismissive. And very often you realise that projects who say that kind of thing are basically just using the underlying tech wrongly, and trying to reinvent the wheel.
Honestly, I've almost developed a knee reflex to seeing the words "upgrade" and "better" in this kind of context by now. Oils may be a cool project but that description is not making me want to find out more about it.
On a macOS machine, this:
$ file /usr/bin/* | grep "shell script" | cut -f1 -d':' | xargs wc -l | sort -n
gives me: 6431 /usr/bin/tkcon
but that's another Tk script disguised as a shell script; the next is: 1030 /usr/bin/dtruss
which is a shell script wrapper around dtrace.

Since the topic is shell, can I shamelessly ask a question?
I'm an SRE for a service everyone has heard of. I have inadvertently pasted multi-line text into my terminal prompt multiple times now, and the shell has attempted to run each line as a command. I see there is a way to disable this at the shell for each client, but what about at the server level? That way I could enforce it as a policy and not have to protect every single user (including myself) individually. Said differently, I want to keep everyone who sshes into a prod machine from being able to paste and execute multiple lines, but not forbid paste entirely.
The only thing I could think of would be to recompile bash and detect whether the input was from a tty. If so, require at least 200ms between commands, and error out if the threshold is exceeded. This would still allow the first pasted command to run, however.
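One possibility (my own suggestion, not something battle-tested here): newer bash/readline can treat a paste as inert, editable text via bracketed paste, and readline falls back to /etc/inputrc when a user has no ~/.inputrc, so a line like this baked into the machine image covers most users without recompiling anything (needs bash 4.4+/readline 7; users with their own ~/.inputrc would bypass it):

    # /etc/inputrc -- system-wide readline defaults
    set enable-bracketed-paste on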
I love exploring things like this. The demo for the ble.sh interactive text editor made me chuckle with delight.
I think, for sport, I could wrap all the various mulle-sde and mulle-bashfunction files back into one and make it > 100K lines. It wouldn't even be cheating, because it naturally fractalized over time from a monolithic script into multiple sub-projects with sub-components.
Biggest I know of is https://github.com/sonic2kk/steamtinkerlaunch/blob/master/st...
27k lines/24k loc
FireHOL is another pretty big one, around 20k lines. It's a neat firewall configuration tool with its own custom config format.
I would add Bash Forth to that. String-threaded concatenative programming!
Sometimes I do things I know are cursed for the sheer entertainment of being able to say it worked. E.g. my one absurdly complex R script that would write ungodly long bash scripts based on the output of various domain specific packages.
It began:
# Yeah yeah I know
Why is ReaR not on this list?
https://relax-and-recover.org/
This is the equivalent of the "Ignite" tool under HP-UX.
I think around a decade ago, I tried installing a copy of Mathematica and the installer from Wolfram was a bash program that was over a GB in size.
I tried opening it up just to look at it and most text editors just absolutely choked on it. I can't remember, but it was either Vim xor Emacs that could finally handle opening it.
Everything you can do in `git gui` is actually a silly shell script, but that works for me.
I feel like this merits having a Computer Benchmarks Game for different shells.
Around 2000, my build/install script had to simulate some kind of OO-like inheritance. There was Python, but no one understood it (and even fewer had it installed), so: bash. Aliases had priority over functions, which had priority over whatever executables were found in PATH. So there you go: a whole three levels of it, with the lowest (PATH) level being changeable.
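A tiny sketch of how those three levels stack in bash (the names are made up):

    #!/bin/bash
    shopt -s expand_aliases            # aliases are off by default in scripts

    # Lowest priority: whatever "deploy" is found on PATH.
    # Middle: a function shadows the PATH executable.
    deploy() { echo "function deploy"; }
    # Highest: an alias shadows the function.
    alias deploy='echo "alias deploy"'

    deploy                             # -> alias deploy
    unalias deploy
    deploy                             # -> function deploy
    unset -f deploy
    deploy                             # -> falls through to PATH, if anything is there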
Most shell script installers are works of art
Surprised not to see Arch Linux’s makepkg on the list, btw.
Would love to see the same for batch on Windows
Okay, so when I worked at Sony about 25 years ago, I got assigned this project to fix our order management system, which was extremely slow, and kept crashing.
I jumped in and started digging around, and to my horror, the OMS was a giant set of shell scripts running on an AIX server, which evolved over a decade and was abandoned. It was over 50,000 lines of code! It was horrendous and shit kept timing out everywhere -- orders, payments, and other information were moved from server to server over FTP, parsed with complicated sed/awk, and inventory was tracked in text files (also FTPd around.)
At the time, perl seemed like the most practical way for me to migrate the mess -- I rewrote all of the shell piece by piece, starting with the simplest pieces and replacing them with small perl modules as part of a larger perl application, refactoring along the way. It took me 3 months and I moved the whole thing to about 5000 lines of perl, and it ran 10-100x faster with almost none of the failures in the original system.
As terrible as it was, it's one of the most satisfying things I've ever done. :-)