Poetry from dirty OCR
MaysonL | 63 points
OCR is hard, but maybe we can make some real progress on it now with modern AI. A context-smart church records handwriting transcriber would be pretty great.
vintermann | 3 months ago
I've pored over (ok, grepped) ~500GB of Chronicling America data to find lines that meet my low standard for nonsense, basically ones that match egrep "[^a-zA-Z0-9 ]{3,}"
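For anyone who wants to try the same filter outside the shell, here is a minimal Python sketch of that pattern. The function name `noisy_lines` and the sample lines are made up for illustration; only the regular expression comes from the comment above, and this is not necessarily how the original search was run.

```python
import re

# Same pattern as the egrep expression quoted above: three or more
# consecutive characters that are not ASCII letters, digits, or spaces.
NOISE = re.compile(r"[^a-zA-Z0-9 ]{3,}")


def noisy_lines(lines):
    """Yield lines that contain a run of 3+ non-alphanumeric, non-space characters.

    Hypothetical helper, not part of any library.
    """
    for line in lines:
        # grep matches against the line without its newline, so strip it here;
        # otherwise "\n" would count toward the run of noise characters.
        text = line.rstrip("\n")
        if NOISE.search(text):
            yield text


# Made-up sample input standing in for OCR output.
sample = [
    "The quick brown fox\n",
    "tbe q%#@ck br0wn f.,;x\n",
    "Yes, sir, we got a parrot.\n",
]
matches = list(noisy_lines(sample))
print(matches)
```

Passing an open file object instead of `sample` would stream a large file line by line rather than loading it all into memory.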
I'm super curious to know how fast this was. grep is generally very fast, and this should be doable on a normal computer, though it might take a little while.
version_five | 3 months ago
Spent a load of time doing OCR and dealing with its failures... this is absolutely wonderful, thanks for sharing!
chaps | 3 months ago
Yes, sir, we got a parrot.
BubbleRings | 3 months ago
This reminds me of the experiment to run paint splatters through OCR and check whether the result is valid Perl code (spoiler: 93% evaluated just fine).
https://www.mcmillen.dev/sigbovik/