Poetry from dirty OCR

MaysonL | 63 points

This reminds me of the experiment that ran paint splatters through OCR and checked whether the result was valid Perl code (spoiler: 93% evaluated just fine).


eliaspro | a year ago

OCR is hard, but maybe we can make some real progress on it now with modern AI. A context-smart church records handwriting transcriber would be pretty great.

vintermann | a year ago

  I've pored over (ok, grepped) ~500GB of Chronicling America data to find lines that meet my low standard for nonsense, basically ones that match egrep "[^a-zA-Z0-9 ]{3,}".

I'm super curious to know how fast this was. grep is generally very fast and this should be doable on a normal computer, though it might take a little while.
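
For anyone who wants to try the same filter outside of grep, here is a rough Python sketch of it. The pattern is copied from the quoted egrep command; the function name `noisy_lines` and the sample lines are made up for illustration, and this is not the author's actual pipeline.

```python
import re

# Same character class as the quoted egrep pattern: a run of three or more
# characters that are not ASCII letters, digits, or spaces.
NOISE = re.compile(r"[^a-zA-Z0-9 ]{3,}")


def noisy_lines(lines):
    """Yield the lines that contain at least one run matched by NOISE.

    `noisy_lines` is a hypothetical helper name, not something from the article.
    """
    for line in lines:
        # Drop the trailing newline so it is not counted as part of a run,
        # which mirrors grep's line-at-a-time matching.
        text = line.rstrip("\n")
        if NOISE.search(text):
            yield text


# Invented sample input; the real data set is ~500GB of OCR text.
sample = [
    "The quick brown fox\n",
    "tbe q~^|ck br;wn f0x\n",
    "Yes, sir, we got a parrot.\n",
]

matches = list(noisy_lines(sample))
```

Because it works on any iterable of lines, passing an open file object streams the input instead of loading it all into memory. No claim here about how it compares to grep on speed.
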
version_five | a year ago

Spent a load of time doing OCR and dealing with its failures... this is absolutely wonderful, thanks for sharing!

chaps | a year ago

Yes, sir, we got a parrot.

BubbleRings | a year ago