Posted: Sun 31 Jan 2010, 19:12
by edoc
disciple wrote: pdf2txt - just a script http://comp.eonworks.com/scripts/conver ... 40418.html
Would it be really hard to create an icon onto which a .pdf file could be dragged & dropped to convert the whole thing to text?

Would a script like this just ignore charts and graphs and images?

Posted: Sun 31 Jan 2010, 20:13
by edoc
disciple wrote: Pdf2html looks like it extracts images and text :) http://freshmeat.net/projects/pdf2html/
Looking at the URL for this one it appears to have been abandoned 9 years ago.

Was it not functioning correctly and not deemed worth the effort to pursue or was it replaced by a different utility?

Posted: Sun 31 Jan 2010, 23:08
by disciple
Would it be really hard to create an icon onto which a .pdf file could be dragged & dropped to convert the whole thing to text?
No, it would be really easy of course. It would be harder to add an icon to pdfshuffler to do it, but I'm still not sure if that's what you meant.
It would be worth comparing the output produced by the different alternatives though, as I think you'll find some are significantly better than others.
Would a script like this just ignore charts and graphs and images?
AFAIK everything but pdf2html would ignore images, although I'm guessing most of them would pick up the text from charts, which wouldn't be pretty.
I used to use a pdf2html on Windows, which produced relatively good results - I'm not sure if it picked up images or if it is the same one as this pdf2html. There are also all sorts of other options, like online converters and sending a pdf to a friend with Adobe Acrobat to use the built-in .doc export (which sometimes produces corrupt .docs that freeze M$ Word :) ). Corel WordPerfect does PDF import these days, and there is also a plugin for OpenOffice that does it, although I think it just imports to OOo Draw.
Was it not functioning correctly and not deemed worth the effort to pursue or was it replaced by a different utility?
I've got no idea - give it a try :)
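
For anyone who wants to try the drag-and-drop idea, here is a minimal sketch, not a finished tool. It assumes pdftotext (from xpdf or poppler) is installed, which is not one of the scripts linked above, and it assumes your file manager passes a dropped file to an executable script as an argument. The script name pdf2txt-drop and the helper txt_name_for are made up for illustration.

```shell
#!/bin/sh
# pdf2txt-drop (hypothetical name): convert each PDF given as an argument to text.
# Assumes pdftotext is on the PATH. Only text is extracted; images are ignored.

# Work out the output name: replace a trailing .pdf/.PDF with .txt,
# otherwise just append .txt.
txt_name_for() {
    case "$1" in
        *.pdf|*.PDF) printf '%s.txt\n' "${1%.*}" ;;
        *)           printf '%s.txt\n' "$1" ;;
    esac
}

# Dropped files arrive as "$@"; with no arguments the loop does nothing.
for pdf in "$@"; do
    out=$(txt_name_for "$pdf")
    # -layout asks pdftotext to keep the page layout as far as it can.
    pdftotext -layout "$pdf" "$out" || echo "Could not convert: $pdf" >&2
done
```

Make the script executable and put it (or a link to it) on the desktop; the text file is written next to the original pdf.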

Thanks

Posted: Tue 01 Jun 2010, 20:54
by Frank Cox
I converted the pdfshuffler squash file to 4 for Puppy 4.31 and installed it, and everything worked well, including the menu listing.

It is a cool and useful program, but unless I missed something it cannot edit the files, only shuffle them. Is that correct? It is presented as a replacement for pdfeditor.

I am using a program called ImageScan, which works fine but only seems to be able to produce single-page pdfs. Are you familiar with any Puppy-friendly scanner software that will create multi-page pdfs?

Also, is there a pet for pdfeditor or another utility that will edit the files themselves?

Posted: Wed 02 Jun 2010, 07:28
by disciple
No, it is not a replacement for pdfedit. As I said, it lets you combine pdfs, rearrange, delete, rotate and crop pages... but not edit the content of pages. I mentioned pdfedit because a lot of people use it when they should be using pdfshuffler, which is much faster and more reliable.

It sounds like you want gscan2pdf, which we were discussing here

AFAIK pdfedit is the only option for editing the content of pdf pages. There are .pet packages on the forum, but I haven't seen a recent one. There were a lot of major bugfixes early last year (I was using it in Cygwin), so I strongly recommend you run the latest version. You might like to try an old .pet before putting effort into building the latest version, to make sure it actually does what you need... although I guess you can probably find a recent build from some other distro that will work.

Can you post a link to this ImageScan program? If it is good, it would be trivial to combine its single page pdfs together...
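
As a sketch of how the combining could be done from the command line, assuming Ghostscript (gs) is installed: the function name combine_pdfs and the file names are made up, and this is untested against ImageScan output.

```shell
#!/bin/sh
# combine_pdfs OUTPUT INPUT1 INPUT2 ...
# Joins the input PDFs, in the order given, into OUTPUT using Ghostscript.
# Assumes gs is on the PATH; the function name is hypothetical.
combine_pdfs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: combine_pdfs OUTPUT INPUT1 INPUT2 [...]" >&2
        return 2
    fi
    out=$1
    shift
    # The pdfwrite device writes the pages of every input into a single PDF.
    gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
}

# Example (hypothetical file names):
#   combine_pdfs scan-all.pdf scan-page1.pdf scan-page2.pdf scan-page3.pdf
```

If pdftk is installed, `pdftk page1.pdf page2.pdf cat output all.pdf` does the same job, and pdfshuffler will do it from the GUI.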

Incidentally, Final Page (the competition for pdfshuffler) should be buildable in some of the newest Puppies.

ImageScan

Posted: Wed 02 Jun 2010, 17:33
by Frank Cox
ImageScan works with Epson printers; I am not sure if it works with others. This is the link for the Epson version; they also have lots of printer drivers for Epson there.
It won't allow you to save in ps, only pdf and most image formats.

It would be sweet if efax would send a format I can save in.

Posted: Tue 14 Sep 2010, 09:44
by disciple
There's a fork of Pdfshuffler some of you might be interested in; unfortunately it requires pygconf.
http://code.google.com/p/pdfsnip/

Posted: Sun 19 Sep 2010, 01:07
by disciple
Aha!
I found an old Debian package of pygconf that seems to work even with my gconf-dbus package.
http://snapshot.debian.org/archive/debi ... 3_i386.deb
Just extract the gconf.so to /usr/lib/python2.5/site-packages/gtk-2.0/gconf.so

To install Pdfsnip extract it and run `python setup.py clean install`. It should now work.
The main advantage of Pdfsnip is that it has a zoom feature... but this is slow and seems unreliable :(
It might be better to add a feature to Pdfshuffler to enlarge just a single page (in a separate window or pane).
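
The pygconf and Pdfsnip install steps above can be sketched roughly as follows. This assumes dpkg-deb is available to unpack the .deb; install_gconf_so is a made-up helper name, and pygconf.deb and pdfsnip-src are placeholders for whatever you actually downloaded and extracted.

```shell
#!/bin/sh
# Sketch of the install steps. Assumptions: dpkg-deb is available to unpack
# the .deb, and install_gconf_so is a made-up helper name.

# Copy the first gconf.so found under an unpacked package into a destination dir.
install_gconf_so() {
    # $1 = directory the .deb was unpacked into, $2 = destination directory
    so=$(find "$1" -type f -name gconf.so | head -n 1)
    if [ -z "$so" ]; then
        echo "gconf.so not found under $1" >&2
        return 1
    fi
    mkdir -p "$2" && cp "$so" "$2/gconf.so"
}

# Usage (pygconf.deb stands for the package downloaded from the link above;
# pdfsnip-src stands for wherever you extracted Pdfsnip):
#   dpkg-deb -x pygconf.deb /tmp/pygconf
#   install_gconf_so /tmp/pygconf /usr/lib/python2.5/site-packages/gtk-2.0
#   cd pdfsnip-src && python setup.py clean install
```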

Re: (Possibly) a small alternative to Pdfshuffler

Posted: Sat 25 Sep 2010, 18:28
by musher0
disciple wrote:BTW when Puppy's gtk and stuff gets updated there's an alternative to Pdfshuffler (without the Python dependencies, and written in vala) that's worth trying http://freshmeat.net/projects/final-page
Hi, disciple.

No joy compiling "Final Page" on wary 0.7: I'm told that gconf-tools are missing from the dev-x package... Oh, well... This is not the first time that this has happened while trying to compile some source in Puppy. Which is why I occasionally use ready-made binaries from other distros (it's much easier now that we have .deb and Slackware package capabilities). In any case, maybe Puppy should revisit its dev-x sfs's and make them more inclusive? Only my two cents worth!

Incidentally, many, many thanks for your useful contributions (messages, research and files) to pdf file processing in Puppy. Very useful, it saved me a lot of time.

BFN.

Posted: Sun 26 Sep 2010, 01:06
by disciple
People might also be interested in a third tool along the same lines - PdfMod. Unfortunately this is a Gnome/Mono app.

Pdfjumbler - another drag and drop pdfeditor

Posted: Fri 25 May 2012, 04:00
by disciple
Wow - I didn't realise how crowded the market was on Linux these days.

There is a fourth tool along the same lines called Pdfjumbler. This one is Java based, so it is easy to run on Windows as well, and you don't need to actually install it. I haven't tested extensively, but it looks quite good... possibly the best Java program I've seen.

You can use it to combine pdfs and rearrange and delete pages. No rotate or crop (there is a feature request for rotate; perhaps you would like to implement it ;) )

I also see that pdfsam (also Java based) has greatly improved its UI for rearranging pages and stuff. My preferred tool is still pdfshuffler though.

Briss - gui for cropping pdfs (Java based)

Posted: Thu 31 May 2012, 02:28
by disciple
Another interesting Java based tool - just for cropping:
http://sourceforge.net/projects/briss/
This is a small application to crop PDF files. It helps the user to decide what should be cropped by creating an overlay of similar pages (=>all pages within a pdf having the same size, orientation(even/odd)).

Java guis for editing PDFs

Posted: Fri 01 Jun 2012, 03:45
by disciple
Two things to note with the Java tools:

- I haven't tested them on Linux, but at least on Windows they seem to write the entire output file to memory before writing it to disk or something. And Java limits itself to a small amount of memory by default, so with a large PDF you can get an "out of heap space" error. To allow them to use more memory you can start them like this (-Xms sets the initial heap size and -Xmx the maximum):

java -Xms128m -Xmx1024m -jar pdfjumbler-0.16.jar

- Briss just changes the viewable extents; it does not actually get rid of the cropped information. If you want to do that, try printing the cropped pdf to a new file, using the CUPS-pdf virtual printer that comes in Puppy. (Actually, new versions of the GTK built-in "print-to-file" feature allow you to save as PDF, too. I'm not certain if that will throw away the information or not.)

Re: (Possibly) a small alternative to Pdfshuffler

Posted: Fri 01 Jun 2012, 03:58
by disciple
musher0 wrote:
disciple wrote:BTW when Puppy's gtk and stuff gets updated there's an alternative to Pdfshuffler (without the Python dependencies, and written in vala) that's worth trying http://freshmeat.net/projects/final-page
Hi, disciple.

No joy compiling "Final Page" on wary 0.7...
For future readers: it is currently not possible to build final-page.
There are two problems with it:
- It seems the build system is set up to compile from the c files, not the vala files. I think the original author has written parts of it in vala, then used valac to translate these to c, and then modified them. If this is correct, then you couldn't build directly from the vala source anyway. (hmmm. then why were some of the last commits to .vala files? What did I miss when I was trying to build it?)
- It needs a lot of fixes to build with a recent vala and other dependencies (hmmm. does this make sense if the build system starts from the C? Is the problem changes in GTK, not vala?)

Re: Briss - gui for cropping pdfs (Java based)

Posted: Thu 21 Jun 2012, 06:23
by disciple
disciple wrote:Another interesting Java based tool - just for cropping:
http://sourceforge.net/projects/briss/
This is a small application to crop PDF files. It helps the user to decide what should be cropped by creating an overlay of similar pages (=>all pages within a pdf having the same size, orientation(even/odd)).
Very similar to Briss, and also Java, is Pdf scissors

EDIT 20191021
I just successfully cropped a study Bible with almost 2000 pages on Windows using Briss, and it was very fast. Pdfscissors failed for some reason: it took quite a lot longer and appeared to succeed, but the pages were not actually cropped. I don't know if it was to do with the file size or something.
I get the same results with a couple of other New Testaments; I guess perhaps Briss is succeeding because it is shipped on Windows with a newer version of iText than pdfscissors is...

more pdf editing

Posted: Mon 25 Jun 2012, 03:36
by disciple
Stapler - a Python alternative to Pdftk (which is Java), i.e. a command line tool.

pyPdf-GUI - a PyGTK alternative to the Pdftk gui. Can add watermarks (although this can corrupt the page).

Posted: Fri 03 Aug 2012, 22:52
by headfound
disciple - just wanted to say thank you for making and keeping alive such a useful thread! Cheers (where's the +1 Like button?) :D

Posted: Thu 12 Sep 2013, 05:36
by disciple
I've seen this one before, but I don't seem to have posted it here. Looks like a command line alternative to briss and PDF scissors.
PDFCrop is a Perl script that crops the white margins of PDF pages and rescales them to fit a standard size sheet of paper.

Posted: Thu 12 Sep 2013, 05:37
by disciple
A plugin for pdfsam to do the same thing.
Didn't seem to reliably detect the right place to crop in the one PDF I tried it with...