PeasyScan Image Scanner Program

Argolance
Posts: 3767
Joined: Sun 06 Jan 2008, 22:57
Location: PORT-BRILLET (Mayenne - France)

#106 Post by Argolance »

Hello,
:)
rcrsn51 wrote:Peasyscan generates some large, temporary PNM image files in /root. They are deleted when the program terminates. Maybe they should be placed elsewhere.
Yes indeed! If the program does not terminate properly for any reason, the image is not deleted and may cause problems with a small, nearly full pupsave. So, just out of curiosity, why /root/scan (by default)?

Regards.

rcrsn51
Posts: 13096
Joined: Tue 05 Sep 2006, 13:50
Location: Stratford, Ontario

#107 Post by rcrsn51 »

That was the original program in 2010. Now the large pnm file is stored in /tmp. Only the final png file is stored in /root after you save it.

#108 Post by Argolance »

rcrsn51 wrote:That was the original program in 2010. Now the large pnm file is stored in /tmp. Only the final png file is stored in /root after you save it.
OK! Thanks.

#109 Post by Argolance »

Hello,
- When scanning an image for OCR, I noticed that PeasyScan searches for the "tessdata" directory inside /usr/share/, while it is (usually?) inside /usr/share/tesseract-ocr/. So the conversion to text is not done.

Code:

ls: cannot access /usr/share/tessdata/*.traineddata: No such file or directory
pnmtotiff: computing colormap...
pnmtotiff: Too many colors - proceeding to write a 24-bit RGB file.
pnmtotiff: If you want an 8-bit palette file, try doing a 'pnmquant 256'.
Error opening data file /usr/share/tesseract-ocr/tessdata/.traineddata.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language '.traineddata'
Tesseract couldn't load any languages!

If I make a symbolic link /usr/share/tessdata pointing to /usr/share/tesseract-ocr/tessdata, it runs well.
I therefore had a look at the PeasyScan script and changed the line:

Code:

LANGUAGE=$(basename $(ls -1 /usr/share/tessdata/*.traineddata | head -n1) .traineddata)

to:

Code:

LANGUAGE=$(basename $(ls -1 /usr/share/tesseract-ocr/tessdata/*.traineddata | head -n1) .traineddata)

And now all is OK!

- When scanning an image to PDF, the generated file has no .pdf extension unless one is typed as part of the name of the scanned image in the name field.
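
One possible way to handle the missing extension is to append it only when the name does not already end with it. The helper below is a minimal sketch; the function name ensure_ext is hypothetical and is not part of the PeasyScan script.

```shell
#!/bin/sh
# Hypothetical helper (not from PeasyScan): print the file name with the
# given extension appended, unless the name already ends with it.
ensure_ext() {
    name=$1
    ext=$2
    case "$name" in
        *."$ext") printf '%s\n' "$name" ;;        # already has the extension
        *)        printf '%s\n' "$name.$ext" ;;   # append it
    esac
}

ensure_ext "scan" pdf        # prints scan.pdf
ensure_ext "scan.pdf" pdf    # prints scan.pdf
```

The same check would work for the other output types by passing a different extension.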

Small suggestion: would it be possible to display the text file using defaulttextviewer at the end of the OCR process, and likewise the PDF file using defaultpdfviewer? I think that is what the user expects, rather than the image, which is not really wanted in this case. :roll:

I think this could be useful.

Regards.

#110 Post by rcrsn51 »

Argolance wrote:- When scanning an image for OCR, I noticed that PeasyScan searches for the "tessdata" directory inside /usr/share/, while it is (usually?) inside /usr/share/tesseract-ocr/. So the conversion to text is not done.
Where did you get your "tessdata" package? On page 1, I gave this instruction:
3. Copy the file xxx.traineddata to /usr/share/tessdata
Argolance wrote:Small suggestion: would it be possible to display the text file using defaulttextviewer at the end of the OCR process, and likewise the PDF file using defaultpdfviewer? I think that is what the user expects, rather than the image, which is not really wanted in this case.
Try this: Between lines 114 and 115, insert

Code:

 defaulttexteditor "$SAVEFILENAME"

I think that seeing the image is still useful. Its quality determines how well the OCR works.

#111 Post by Argolance »

Hello,
rcrsn51 wrote:Where did you get your "tessdata" package? On page 1, I have given the instruction:
3. Copy the file xxx.traineddata to /usr/share/tessdata
I simply installed Tesseract and all its required dependencies from the PPM...
rcrsn51 wrote:Try this: Between lines 114 and 115, insert

Code:

 defaulttexteditor "$SAVEFILENAME"

That is what I did for my own use, and likewise for the PDF file: with some minor changes, it works quite well.

Now, as a simple user:
  • Something is a bit confusing:
    - "Select the image format"? PDF and TXT are not "images" as such.
    - "Name the scanned image as"? The name "scan" is given not only to scanned images but to all the *.png, *.jpg, *.pdf and *.txt files, so "scan" is only their base name?
In the basic recipe for using PeasyScan, you first mention:
1. Select the image format.
This has to come first when automating scans, because everything is done in one pass. But I noticed that when scanning a single document, it is possible to select the image format (the "Output file type") after scanning, just before saving. So from the same scanned image it is possible to get the full range of image, PDF or text files (provided that the script is adapted for that; I did this for my own use too).
This can be useful.

Regards.

#112 Post by rcrsn51 »

When I added OCR to PeasyScan in 2010, I included my own Tesseract 3.00 package. It's still on page 1. Its default location is /usr/share/tessdata, so PeasyScan is written to look there.

If you get a Tesseract package elsewhere, I can't predict where the language files will be located. So you need to provide the link into /usr/share/tessdata.
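
For a package that installs its language files somewhere else, the link can be created as sketched below. This is only an illustration: the function name link_tessdata is hypothetical, and the source path in the usage comment is the one reported earlier in this thread, which may differ on other systems.

```shell
#!/bin/sh
# Hypothetical helper: make the directory where PeasyScan looks ($2) a
# symbolic link to the directory where the package put tessdata ($1).
link_tessdata() {
    src=$1
    dst=$2
    # Never overwrite an existing directory or link.
    if [ -e "$dst" ] || [ -L "$dst" ]; then
        echo "$dst already exists, leaving it alone"
        return 0
    fi
    if [ ! -d "$src" ]; then
        echo "$src not found" >&2
        return 1
    fi
    ln -s "$src" "$dst"
}

# Example, to be run as root (paths as reported in this thread):
# link_tessdata /usr/share/tesseract-ocr/tessdata /usr/share/tessdata
```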

I agree that the phrases are confusing. The original versions of PeasyScan only saved to graphics files, so they made sense then.

How about "Select the output format" and "Name the output file as"?

#113 Post by Argolance »

rcrsn51 wrote:When I added OCR to PeasyScan in 2010, I included my own Tesseract 3.00 package. It's still on page 1. Its default location is /usr/share/tessdata, so PeasyScan is written to look there.
When I installed Tesseract, as is the case for all PPM users, Puppy didn't ask me where to copy the tessdata folder; it put it inside the /usr/share/tesseract-ocr directory, which seems to be the default/usual one. In any case, I think it would be appropriate for your script to take this into account and search this directory too?
rcrsn51 wrote:If you get a Tesseract package elsewhere, I can't predict where the language files will be located. So you need to provide the link into /usr/share/tessdata.
It is not 'elsewhere' but very probably 'where' most (ToOpPy) users usually find the package and install it from. :wink:
rcrsn51 wrote:I agree that the phrases are confusing. The original versions of PeasyScan only saved to graphics files, so they made sense then.
Your script, as always, is very interesting, and as simple and efficient as possible... but it looks a bit rough for my taste! :oops: :)
So I took the liberty of dressing it up a little, and had fun doing it! I only hope I have not impaired its functions!
rcrsn51 wrote:How about "Select the output format" and "Name the output file as"?
See the pictures below; these are the choices I made...

Regards.
Attachments
171116_190816_376x42_easyshot.png
171116_190837_598x299_easyshot.png
Last edited by Argolance on Thu 16 Nov 2017, 18:10, edited 1 time in total.

#114 Post by rcrsn51 »

I have updated my version to look for the language files in both places.
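
The updated script itself is not shown in this thread. As a rough idea of what a two-location lookup could look like, here is a hypothetical sketch (the function name find_ocr_language and the variable ocr_lang are made up for illustration). It also avoids the empty '.traineddata' language name seen in the error output earlier when no file matches.

```shell
#!/bin/sh
# Hypothetical sketch, not rcrsn51's actual code: print the language name of
# the first *.traineddata file found in the listed directories.
find_ocr_language() {
    for dir in "$@"; do
        for f in "$dir"/*.traineddata; do
            # An unmatched glob stays literal, so check the file really exists.
            if [ -f "$f" ]; then
                basename "$f" .traineddata
                return 0
            fi
        done
    done
    return 1
}

# The two directories mentioned in this thread, in order of preference.
if ocr_lang=$(find_ocr_language /usr/share/tessdata /usr/share/tesseract-ocr/tessdata); then
    echo "OCR language: $ocr_lang"
else
    echo "No .traineddata file found" >&2
fi
```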

charlie6
Posts: 1230
Joined: Mon 30 Jun 2008, 04:03
Location: Saint-Gérard / Walloon part of Belgium

tiff2pdf: invalid option -F

#115 Post by charlie6 »

Hi Bill,
hope you are doing well!

Using the much-appreciated peasyscan version 2.12 on Dpup Stretch-7.5, saving to PDF reports:

Code:

# peasyscan
tiff2pdf: invalid option -- 'F'
LIBTIFF, Version 4.0.8
...
 -f: set PDF "Fit Window" user preference
...
tiff2pdf: invalid option -- 'F'
LIBTIFF, Version 4.0.8

Editing /usr/local/bin/peasyscan and replacing -F with -f after the two instances of tiff2pdf fixes the issue.
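
For anyone whose tiff2pdf build rejects -F as reported here, the manual edit could be scripted. The sketch below is only an illustration: the function name patch_tiff2pdf_flag is made up, and it assumes the option appears as a separate word on lines that mention tiff2pdf. Going by the usage text quoted above, -f sets the PDF "Fit Window" viewer preference, so the substitution may silence the error without producing the same page layout.

```shell
#!/bin/sh
# Hypothetical helper: on lines that mention tiff2pdf, change a standalone
# -F option to -f. The original file is kept as <script>.bak.
patch_tiff2pdf_flag() {
    script=$1
    cp -p "$script" "$script.bak" || return 1
    # Writing through a redirect keeps the script's existing permissions.
    sed -E '/tiff2pdf/ s/(^|[[:space:]])-F([[:space:]]|$)/\1-f\2/g' \
        "$script.bak" > "$script"
}

# Example, to be run as root:
# patch_tiff2pdf_flag /usr/local/bin/peasyscan
```

Restoring the original is then a matter of copying the .bak file back.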

HTH
Charlie

Re: tiff2pdf: invalid option -F

#116 Post by rcrsn51 »

charlie6 wrote:

Code:

# peasyscan
tiff2pdf: invalid option -- 'F'
LIBTIFF, Version 4.0.8
...
 -f: set PDF "Fit Window" user preference
...
tiff2pdf: invalid option -- 'F'
LIBTIFF, Version 4.0.8

Editing /usr/local/bin/peasyscan and replacing -F with -f after the two instances of tiff2pdf fixes the issue.

Thanks. The developers of libtiff are always changing something.

I have added an Update to the main page that explains this issue.

/usr/local/peasyscan/config4BLACK&WHITE.cfg; «Auto» key

#117 Post by charlie6 »

Hi,
just for the record ... :roll: this might also help those who want a high-contrast black & white scan, even from an original drawn with a common graphite pencil (hardness HB2).

Everything that follows applies when using the «Auto» button in PeasyScan's GUI (see the screenshot below), NOT when using the «Start» button.

reference man pages:
scanimage https://www.systutorials.com/docs/linux ... scanimage/
and
gamma4scanimage http://www.pkill.info/linux/man/1-gamma4scanimage/
https://www.systutorials.com/docs/linux ... scanimage/

Here is the content of a pixma160_4B&W_config.cfg file for the Canon MP160 scanner.
NB: the URI has to be determined by running scanimage in a terminal, as described after the listing.

export URI="pixma:04A91714_F30F67"
export SOURCE="flatbed"
export MODE="lineart"
export RESOLUTION="300"
export PAPER="A4"
export LANGUAGE="fra"
export OTHER="--custom-gamma=yes --gamma-table [0]1-[4095]255 --gamma=0.4"

I got URI="pixma:04A91714_F30F67" by typing in a terminal:

# scanimage -h

or better, the following, which reports all the scanimage options available for the current scanner:

# scanimage -A
All options specific to device `pixma:04A91714_F30F67':
  Scan mode:
    --resolution auto||75|150|300|600dpi [75]
        Sets the resolution of the scanned image.
    --mode auto|Color|Gray|Lineart [Color]
        Selects the scan mode (e.g., lineart, monochrome, or color).
    --source Flatbed [Flatbed]
        Selects the scan source (such as a document-feeder). Set source before
        mode and resolution. Resets mode and resolution to auto values.
    --button-controlled[=(yes|no)] [no]
        When enabled, scan process will not start immediately. To proceed,
        press "SCAN" button (for MP150) or "COLOR" button (for other models).
        To cancel, press "GRAY" button.
  Gamma:
    --custom-gamma[=(auto|yes|no)] [yes]
        Determines whether a builtin or a custom gamma-table should be used.
    --gamma-table auto|0..255,...
        Gamma-correction table. In color mode this option equally affects the
        red, green, and blue channels simultaneously (i.e., it is an intensity
        gamma table).
    --gamma auto|0.299988..5 [2.2]
        Changes intensity of midtones
  Geometry:
    -l auto|0..216.069mm [0]
        Top-left x position of scan area.
    -t auto|0..297.011mm [0]
        Top-left y position of scan area.
    -x auto|0..216.069mm [216.069]
        Width of scan-area.
    -y auto|0..297.011mm [297.011]
        Height of scan-area.
  Buttons:
    --button-update
        Update button state
    --button-1 <int> [0] [read-only]
        Button 1
    --button-2 <int> [0] [read-only]
        Button 2
    --original <int> [0] [read-only]
        Type of original to scan
    --target <int> [0] [read-only]
        Target operation type
    --scan-resolution <int> [0] [read-only]
        Scan resolution
  Extras:
    --threshold auto|0..100% (in steps of 1) [inactive]
        Select minimum-brightness to get a white point
    --threshold-curve auto|0..127 (in steps of 1) [inactive]
        Dynamic threshold curve, from light to dark, normally 50-65
#
--gamma=0.4: values less than 1 increase the contrast.
The scanner-specific minimum and maximum values are given above (see «--gamma auto|0.299988..5»): 0.299988 and 5.

--gamma-table [0]1-[4095]255: this syntax defines a kind of XY gamma curve, with X ranging from 0 to 4095 and Y ranging from 1 to 255 (for details, read the links referenced above :roll: ). Just try some other ranges by trial and error, for example [0]0-[4095]125 or [0]2-[4095]145, to see what happens, looking at the "image.pnm" output of the scanimage console command given below. That is faster for testing than editing PeasyScan's config.cfg file each time.

The value 4095 is specific to the Pixma MP160 scanner. It can be read from the error reported by the following console command:

# scanimage --mode Gray --custom-gamma=yes --gamma-table [0]0-[99999]255 >image.pnm
scanimage: option --gamma-table: index 99999 out of range [0..4095]

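
The upper bound can also be picked out of that error line automatically. The helper below is a small sketch; the function name gamma_table_max is hypothetical, and it assumes the error message has exactly the wording shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: read scanimage's error text on stdin and print the
# upper bound from "... out of range [0..N]".
gamma_table_max() {
    sed -n 's/.*out of range \[0\.\.\([0-9][0-9]*\)].*/\1/p'
}

# Check against the error line quoted in this post:
echo 'scanimage: option --gamma-table: index 99999 out of range [0..4095]' \
    | gamma_table_max    # prints 4095

# With a scanner attached, the probe command could feed it directly
# (stderr goes to the pipe, the image data is discarded):
# max=$(scanimage --mode Gray --custom-gamma=yes \
#       --gamma-table '[0]0-[99999]255' 2>&1 >/dev/null | gamma_table_max)
```

Quoting the table argument keeps the shell from treating the square brackets as a file name pattern.
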
Note: the scanner-specific "Extras:" options above appear to be specific to MODE="Color", and so inactive when MODE=lineart; I have not needed to investigate this so far.

MODE="gray" gives a continuous grayscale image, whose file size is larger than in lineart mode; this is also not investigated.

Have fun, :wink: HTH
Charlie
Attachments
peasyscanGUI.png
...using PeasyScan's «Auto» key

#118 Post by rcrsn51 »

Version 2.13 posted above. See the Update note about optional compression when saving to PDF.

#119 Post by rcrsn51 »

@Charlie:

I looked at libtiff v4.0.8 and I cannot replicate your problem with the "-F" option.

It continues to work as advertised - it expands the image to fill the PDF page.

Bill
