Converting a large mathematical software package written in C++ to C++20 modules
Oh, it’s Wolfgang. In computational math he has a focus on research software that few others are able to match; he (and the deal.II team more generally) received an award for it at the last SIAM CSE. Generally a great writer, looking forward to reading this.
A few points:
1) Modules only really help with time spent parsing, not time spent doing codegen. They can actually hurt codegen performance, because they make more definitions available for inlining and global optimizations even in non-LTO builds. For this reason it's likely best to compare with ThinLTO enabled in both configurations.
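Concretely, that comparison would look something like this (a hedged sketch assuming a recent Clang and its standard-modules workflow; the file and module names are made up):

    # Baseline build (headers), with ThinLTO:
    clang++ -std=c++20 -flto=thin -c consumer.cc -o consumer.o

    # Modules build, same ThinLTO setting: precompile the BMI,
    # then compile it to an object file.
    clang++ -std=c++20 -flto=thin --precompile math.cppm -o math.pcm
    clang++ -std=c++20 -flto=thin -c math.pcm -o math.o
    clang++ -std=c++20 -flto=thin -fmodule-file=math=math.pcm -c consumer.cc -o consumer.o

    # Link with ThinLTO in both cases so the inlining/global-opt
    # opportunities are comparable.
    clang++ -flto=thin math.o consumer.o -o app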
2) When your dependencies aren't yet modularized, you tend to get pretty big global module fragments, inflating both the size of your BMIs and the parsing time. Header units are supposed to partially address this, but right now no build system supports them properly (except perhaps MSBuild?). Clang is also pretty bad at pruning unused declarations from the global module fragment, which makes this worse still.
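To illustrate the global-module-fragment problem (a minimal sketch, not code from the article; the dependency header is hypothetical): everything #include'd between `module;` and the export declaration lands in the GMF, and therefore in the BMI:

    // math_tools.cppm
    module;                    // start of the global module fragment

    // Unmodularized dependencies: every declaration these headers
    // drag in ends up in the BMI, used or not.
    #include <vector>
    #include "legacy_solver.h" // hypothetical heavyweight dependency

    export module math_tools;  // end of the GMF, start of the purview

    export std::vector<double> solve(const std::vector<double> &rhs);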
I really wonder whether LLMs are helpful in this case. This kind of task should be the forte of LLMs: well-defined syntax and requirements, abundant training material available, and outputs that are verifiable and validatable.
Perhaps we should use LLMs to convert all the legacy programs written in Fortran or COBOL into modern languages.
I would like to see a comparison between modules and precompiled headers. I have a suspicion that precompiled headers could provide the same build-time gains with much less work.
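For reference, a minimal PCH setup needs no changes to the sources themselves, which is where the "much less work" would come from (a sketch; the header name and contents are made up):

    // pch.h: aggregate the heavy, rarely-changing includes.
    #include <algorithm>
    #include <map>
    #include <string>
    #include <vector>

    // Build the PCH once; every TU that includes pch.h first then
    // reuses the preparsed state instead of reparsing the headers:
    //   g++     -std=c++20 -x c++-header pch.h -o pch.h.gch
    //   clang++ -std=c++20 -x c++-header pch.h -o pch.h.pch
    //           (then pass -include-pch pch.h.pch per TU)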
To be fair, C++’s modules make no sense, just like its namespaces that span multiple translation units.
It’s just more heavy, clunky abstraction for the sake of abstraction.
The code block styling is less than ideal.
Thanks to the author for doing some solid work in providing data points for modules. For those like me looking for the headline metric, here it is in the conclusion.
So, alas, underwhelming in this iteration; it perhaps speaks to the 'module-fication' of existing source code (deal.II dates from the '90s, I believe) rather than designing for modules from scratch. More work might be needed in structuring the source code into modules; I have seen good speedups (more than 10%) from just PCHs, forward declarations, etc. Good data point and rich analysis, nevertheless.
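(For the forward-declaration part, the classic pattern is to keep heavy headers out of other headers entirely; a generic sketch, not actual deal.II code:)

    // solver.h: forward-declare instead of #include "matrix.h",
    // so edits to matrix.h no longer ripple through every TU
    // that includes solver.h.
    class SparseMatrix;   // enough for references, pointers, parameters

    class Solver {
    public:
      void factorize(const SparseMatrix &A);
    };

    // solver.cc is the only file that needs the full definition:
    // #include "matrix.h"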