
OCR for Puppy Linux

Posted: Sat 01 Oct 2011, 15:41
by tronkel
Here is a pet package called puppyocr, based on the well-established Tesseract optical character recognition engine. The pet is 1.9 MB.

I have included a command-line interface wrapper written in C++ that makes the task of using OCR a bit friendlier than the raw Tesseract command-line interface.

To use it, install the pet and then type "puppyocr" (no quotes) in a terminal and simply follow the prompts there.

When prompted, type the name (including the extension) of a 'tif' file stored in your home folder.

You can use mtPaint or GIMP to create a 'tif' image file from a scanner.

When prompted for the name of the output file, you can use any name you like. The output file is created in your home folder with a 'txt' extension.

Edit: replaced with an updated version that checks for a suitable input file. If none is found, the program exits with a warning.
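
For anyone curious what a wrapper like this has to do, here is a minimal sketch in Python. It is not the puppyocr source (that is written in C++ and is not shown in this thread); the function names, prompt texts and the exact "suitable input" test are my own assumptions. The only thing taken from Tesseract itself is the basic call `tesseract <image> <output base>`, where Tesseract adds the 'txt' extension to the output base on its own.

```python
import subprocess
from pathlib import Path


def is_suitable_input(image_name, home=None):
    """True if the named file exists in the home folder and has a tif/tiff extension.

    This check is an assumption; the real puppyocr test may differ.
    """
    home = Path(home) if home is not None else Path.home()
    path = home / image_name
    return path.is_file() and path.suffix.lower() in (".tif", ".tiff")


def build_tesseract_command(image_name, output_name, home=None):
    """Return the argument list for: tesseract <image> <output base>."""
    home = Path(home) if home is not None else Path.home()
    # Tesseract appends ".txt" itself, so drop one the user may have typed
    # to avoid ending up with "notes.txt.txt".
    if output_name.lower().endswith(".txt"):
        output_name = output_name[:-4]
    return ["tesseract", str(home / image_name), str(home / output_name)]


def main():
    """Prompt for the input and output names, then run Tesseract."""
    image_name = input("Name of the tif file in your home folder: ").strip()
    if not is_suitable_input(image_name):
        print("Warning: no suitable input file found. Exiting.")
        return 1
    output_name = input("Name for the output file: ").strip()
    return subprocess.call(build_tesseract_command(image_name, output_name))

# Call main() to run the prompts interactively (needs tesseract installed).
```

Keeping the command building separate from the prompting makes it easy to check the resulting command line without actually running Tesseract.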

Enjoy!

Posted: Mon 03 Oct 2011, 04:18
by TheAsterisk!
Thanks! I was just thinking about using something like this earlier today.

I've downloaded it, and I'll give it a try sometime later this week!

An error occurred.

Posted: Sun 10 May 2015, 00:53
by Pelo
An error occurred... Is the link dead?

Posted: Sun 10 May 2015, 01:06
by starhawk
That would be the forum attachment limit biting you, Pelo. When Flash and/or John Murga set the maximum attachment size to 256kb, it actually deleted every previously attached file that was over that limit.

Sorry, it's gone...

Posted: Sun 10 May 2015, 06:26
by saintless
Newer version available here:
http://akita.scottjarvis.com/puppyocr-1.22.pet
And mirrored here:
http://smokey01.com/saintless/Fredx181/ ... r-1.22.pet
Maybe this will also help (Edit: No, the download links do not work there):
http://www.murga-linux.com/puppy/viewto ... 7f975e7829

Thanks, saintless

Posted: Sat 23 May 2015, 11:12
by Pelo
The pet is stored in my toolbox.
With PuppyOCR I read old documents from 1800 to 1900 about the history of France. In spite of errors, 95 percent of the text is recognized. Don't ask for too much: fifteen lines at a time are enough, a whole page doesn't work well. Often these documents were scanned from books and the edges are cut off.
PuppyOCR does what it can, and it does as well as the others.
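
If you want to feed a page to the OCR a few lines at a time rather than whole, one way is to cut the scan into horizontal bands first. Here is a small Python sketch that only works out where to cut; the function name, the pixel sizes and the overlap are assumptions for illustration, and how many pixels make up fifteen lines depends on your scan resolution. The boxes are in (left, upper, right, lower) order, which is the order an image library such as Pillow expects for cropping.

```python
def band_boxes(width, height, band_height, overlap=0):
    """Return (left, upper, right, lower) boxes covering the page from top to bottom.

    Consecutive bands share `overlap` pixels so that a line of text cut by one
    band edge appears whole in the next band.
    """
    if band_height <= 0 or not 0 <= overlap < band_height:
        raise ValueError("need band_height > 0 and 0 <= overlap < band_height")
    boxes = []
    upper = 0
    while upper < height:
        lower = min(upper + band_height, height)
        boxes.append((0, upper, width, lower))
        if lower == height:
            break  # reached the bottom of the page
        upper = lower - overlap
    return boxes


# Example: a 1000 x 2500 pixel scan cut into bands of 1000 pixels
# with 100 pixels of overlap.
for box in band_boxes(1000, 2500, 1000, overlap=100):
    print(box)
```

Each band can then be saved as its own 'tif' file and given to puppyocr one after the other.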