Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
joinPdf
Page 1 of 2 [29 Posts]
disciple

Joined: 20 May 2006
Posts: 6428
Location: Auckland, New Zealand

PostPosted: Wed 24 Oct 2007, 23:56    Post subject:  joinPdf
Subject description: small gui to combine pdf files
 

(Screenshots at the bottom)
Updated 30 August 2008 to work with spaces in paths.

Works two ways:
1 - you call it without any input arguments, and you get a gui to specify the output file, a folder with the input files, and whether or not you want to view the file it creates.
2 - you call it from the command line with input files and/or folders, or drop things on it or use it from the rox "open with" menu or something. It will pop up a dialogue asking where you want to save it, and whether or not you want to view the file it creates (if you want to be able to use it from the console without popping up a dialogue then I'm sure you're smart enough to modify it to do that).

It will join all the .pdf files in a folder recursively, sorted in alphanumeric path order. e.g. if I do
Code:
joinPdf /~/pdf
for a hypothetical folder, it will join all the files together in this order:
    /~/pdf/1.pdf
    /~/pdf/2/4.pdf
    /~/pdf/2/5/1.pdf
    /~/pdf/2/5/a.pdf
    /~/pdf/2/5/b.pdf
    /~/pdf/2/5/c.pdf
    /~/pdf/3.pdf
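
A minimal sketch of how that ordering comes about, assuming it is nothing more than `find` piped through `sort` (which is what the script below does). The throwaway tree, the empty stand-in files and the `LC_ALL=C` setting are assumptions added for the demo; the script itself sorts in whatever locale you run it in.

```shell
#!/bin/bash
# Build a throwaway tree matching the hypothetical folder above.
# Empty files stand in for real pdfs; notes.txt shows non-pdfs are skipped.
demo=$(mktemp -d)
mkdir -p "$demo/pdf/2/5"
touch "$demo/pdf/1.pdf" "$demo/pdf/3.pdf" "$demo/pdf/2/4.pdf" \
      "$demo/pdf/2/5/1.pdf" "$demo/pdf/2/5/a.pdf" \
      "$demo/pdf/2/5/b.pdf" "$demo/pdf/2/5/c.pdf" \
      "$demo/pdf/2/notes.txt"

# Sort the full paths, so a subfolder is slotted in by its name,
# not listed before or after the loose files.
order=$(cd "$demo" && find pdf -name '*.pdf' | LC_ALL=C sort)
echo "$order"

rm -rf "$demo"
```

The paths come out in the same order as the list above, with everything under folder 2 landing between 1.pdf and 3.pdf.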

However, if I run it from the command line with more than one input argument (files or folders or both), then it will join them in the order I specify. So if I do
Code:
joinPdf z b.pdf a
then it will first do the pdfs in folder z, in alphanumeric order, then it will add b.pdf, and then it will add the pdfs in folder a in alphanumeric order.
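
A rough sketch of that behaviour, assuming each argument is listed and sorted on its own and the results are simply appended in the order given. The helper name `list_inputs` and the demo tree are made up for illustration; `LC_ALL=C` is only there to make the demo repeatable.

```shell
#!/bin/bash
# list_inputs: hypothetical helper. Prints the pdfs for each argument in the
# order the arguments were given; only the contents of each folder are sorted.
list_inputs() {
 local i
 for i in "$@"; do
  find "$i" -name '*.pdf' | LC_ALL=C sort
 done
}

# Demo on a throwaway tree with folders z and a, and a loose file b.pdf.
demo=$(mktemp -d)
mkdir "$demo/z" "$demo/a"
touch "$demo/z/2.pdf" "$demo/z/1.pdf" "$demo/b.pdf" "$demo/a/x.pdf"

joined_order=$(cd "$demo" && list_inputs z b.pdf a)
echo "$joined_order"

rm -rf "$demo"
```

Folder z comes first even though it sorts last alphabetically, because the arguments themselves are never sorted.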

Requires gtkdialog3 (included in recent Puppies) and ghostscript of course.
Has no error handling to deal with ghostscript errors.
Doesn't check whether you want to overwrite if you specify an output file that already exists - just overwrites automatically.
Works recursively with pdfs in any depth of folders.
Isn't fazed by folders containing files that aren't pdfs.
Does make sure the output file has a .pdf extension but not a .pdf.pdf extension.
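
A small sketch of the extension handling described in the last point, assuming only a trailing .pdf or .PDF counts as an existing extension. The helper name `normalise_pdf_name` is hypothetical and not part of the script.

```shell
#!/bin/bash
# normalise_pdf_name: hypothetical helper. Strips one trailing .pdf or .PDF
# if present, then appends .pdf, so the result never ends in .pdf.pdf.
normalise_pdf_name() {
 local name=$1
 case "$name" in
  *.pdf|*.PDF) name=${name%.*} ;;
 esac
 printf '%s\n' "$name.pdf"
}

normalise_pdf_name "/root/my report"
normalise_pdf_name "/root/my report.pdf"
normalise_pdf_name "/root/SCAN.PDF"
```

Using the shell's own pattern matching here avoids running the name through `echo` unquoted, so names with runs of spaces survive intact.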

The full code at the moment:
Code:
#! /bin/bash
# Script for Puppy Linux to combine pdf files.
# Version 4 by disciple, 30 August 2008.
# http://www.murga-linux.com/puppy/viewtopic.php?p=149208#149208
# Currently has NO ERROR HANDLING
# You may experience errors if you have pdfs that are broken or are not pdfs at all.
# I'm not sure, but because it doesn't delete the temporary directory at the start, you may also get unexpected results if you use "View the file afterwards", crash your Pdf viewer, and then join some more pdfs.
# Yes, the use of all those symlinks is an ugly hack, but I couldn't get gs to join files with spaces in the path otherwise :(

# Set defaults
INPUTFOLDER="`pwd`"
OUTPUTFILE="`pwd`/combined.pdf"

# Set temporary directory
TEMPFOLDER=/tmp/joinPdfdir
mkdir -p "$TEMPFOLDER"

# Initialise filecount
FILECOUNT=100

export MAIN_DIALOG="
<window title=\"Puppy's pdf joining\" icon-name=\"gtk-file\">
 <vbox>
  <text>
   <label>You can recursively join all the pdfs in a directory and any number of subdirectories in normal alphanumeric order.</label>
  </text>
  <text>
   <label>Make sure you name the files and folders appropriately so they are joined in the order you want.</label>
  </text>
  <text>
   <label>e.g. Pdfs in a subfolder called \"A\" will come after a file called \"1.pdf\", and before a file called \"B.pdf\"</label>
  </text>
  <frame Location of input files>
   <hbox>
    <entry accept=\"directory\">
     <variable>INPUTFOLDER</variable>
     <input>echo '$INPUTFOLDER'</input>
    </entry>
    <button>
     <input file stock=\"gtk-open\"></input>
     <action type=\"fileselect\">INPUTFOLDER</action>
     <action>refresh:INPUTFOLDER</action>
    </button>
   </hbox>
  </frame>
  <frame Output file>
   <hbox>
    <entry accept=\"savefilename\">
     <variable>OUTPUTFILE</variable>
     <input>echo '$OUTPUTFILE'</input>
    </entry>
    <button>
     <input file stock=\"gtk-open\"></input>
     <action type=\"fileselect\">OUTPUTFILE</action>
    </button>
   </hbox>
  </frame>
  <checkbox>
   <label>View the file afterwards</label>
   <default>true</default>
   <variable>VIEWOUTPUT</variable>
  </checkbox>
  <hbox>
   <button>
    <input file stock=\"gtk-ok\"></input>
    <label>Join pdfs</label>
    <action type=\"exit\">JOIN-NOW</action>
   </button>
   <button>
    <input file stock=\"gtk-dialog-info\"></input>
    <label>Help</label>
    <action>gtkdialog3 -c --program HELP_DIALOG</action>
   </button>
   <button cancel></button>
  </hbox>
 </vbox>
 </window>"

export HELP_DIALOG="
<window title=\"joinPdf info\" icon-name=\"gtk-dialog-info\">
<vbox>
 <text>
 <label>If you run joinPdf from the command line with inputs, it will pop up a dialogue to ask you what you want to save the combined file as, and will then join them as you would expect.  If you specify more than one input (file or folder), it will join them in the order that you specify, and things that it joins recursively are sorted globally (per input).  They are deliberately not sorted folders first and then files.</label>
 </text>
 <text>
 <label>\"\"</label>
 </text>
 <text>
 <label>The file chooser for the output file can only choose a directory - you need to add a filename, but the script will sort out the .pdf extension.</label>
 </text>
 <hbox>
  <button>
   <label>\"Visit forum thread\"</label>
   <action>defaultbrowser http://www.murga-linux.com/puppy/viewtopic.php?p=149208#149208</action>
  </button>
  <button ok></button>
 </hbox>
</vbox>
</window>
"

export OUTPUT_FILE_DIALOG="
<window title=\"joinPdf\" icon-name=\"gtk-file\">
<vbox>
 <text>
 <label>What would you like to save the output file as?</label>
 </text>
 <hbox>
  <entry accept=\"savefilename\">
   <variable>OUTPUTFILE</variable>
   <input>echo '$OUTPUTFILE'</input>
  </entry>
  <button>
   <input file stock=\"gtk-open\"></input>
   <action type=\"fileselect\">OUTPUTFILE</action>
  </button>
 </hbox>
 <hbox>
 <button ok>
  <action type=\"exit\">JOIN-NOW</action>
 </button>
 <button cancel></button>
 </hbox>
 <checkbox>
  <label>View the file afterwards</label>
  <default>true</default>
  <variable>VIEWOUTPUT</variable>
 </checkbox>
</vbox>
</window>
"

# Show gui if run without input arguments.
if [ "$#" -eq 0 ]; then
 MAINGUI="$(gtkdialog3 -c --program MAIN_DIALOG)"
 # gtkdialog prints NAME="value" lines; field 2 (split on ") is the value.
 if [ "$(echo "$MAINGUI" | grep '^EXIT=' | cut -f 2 -d '"')" != "JOIN-NOW" ]; then
  # Cancelled: tidy up so nothing stale is left for the next run.
  rm -rf "$TEMPFOLDER"
  exit 0
 fi
 INPUTFOLDER="$(echo "$MAINGUI" | grep '^INPUTFOLDER=' | cut -f 2 -d '"')"
 OUTPUTFILE="$(echo "$MAINGUI" | grep '^OUTPUTFILE=' | cut -f 2 -d '"')"
 VIEWOUTPUT="$(echo "$MAINGUI" | grep '^VIEWOUTPUT=' | cut -f 2 -d '"')"
 find "$INPUTFOLDER" -name '*.pdf' | sort > "$TEMPFOLDER/files.txt"

# Just combine the pdfs if run with input arguments.
else
 # Get input filenames
 for i in "$@"
  do
   find "$i" -name '*.pdf' | sort >> "$TEMPFOLDER/files.txt"
  done
 # Get output filename
 OUTPUTFILEGUI="$(gtkdialog3 --program=OUTPUT_FILE_DIALOG --center)"
 if [ "$(echo "$OUTPUTFILEGUI" | grep '^EXIT=' | cut -f 2 -d '"')" != "JOIN-NOW" ]; then
  # Cancelled: remove the list we just wrote, or the next run would append to it.
  rm -rf "$TEMPFOLDER"
  exit 0
 fi
 OUTPUTFILE="$(echo "$OUTPUTFILEGUI" | grep '^OUTPUTFILE=' | cut -f 2 -d '"')"
 VIEWOUTPUT="$(echo "$OUTPUTFILEGUI" | grep '^VIEWOUTPUT=' | cut -f 2 -d '"')"
fi

# Make sure output file has exactly one .pdf extension
case "$OUTPUTFILE" in
 *.pdf|*.PDF) OUTPUTFILE="${OUTPUTFILE%.*}" ;;
esac
OUTPUTFILE="$OUTPUTFILE.pdf"

# Symlink files for us to join
# FILECOUNT starts at 100 so the link names (101, 102, ...) sort in join order.
while IFS= read -r line
do FILECOUNT=$((FILECOUNT+1))
 ln -s "$(realpath "$line")" "$TEMPFOLDER/$FILECOUNT"
done < "$TEMPFOLDER/files.txt"

# Remove list
rm -f "$TEMPFOLDER/files.txt"

# Join files together
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$OUTPUTFILE" "$TEMPFOLDER"/*

# View output file
if [ "$VIEWOUTPUT" = "true" ]
 then
 rox "$OUTPUTFILE"
fi

# Remove temporary directory
rm -rf "$TEMPFOLDER"
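
A possible alternative to the symlink hack, offered only as a sketch: collect the paths in a bash array so that each one reaches gs as a single argument, spaces and all. The helper name `build_gs_args` and the sample paths are made up, and this has not been tested against the ghostscript shipped in older Puppies, so treat it as an assumption that gs accepts such paths.

```shell
#!/bin/bash
# build_gs_args: hypothetical helper. Fills the GS_ARGS array with the same
# gs options the script uses, followed by one array element per input file.
# The file list is read from stdin, one path per line.
build_gs_args() {
 local out=$1 line
 GS_ARGS=(-dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite "-sOutputFile=$out")
 while IFS= read -r line; do
  GS_ARGS+=("$line")
 done
}

# Demo with made-up paths containing spaces. A here-document is used rather
# than a pipe so the function runs in this shell and GS_ARGS is kept.
build_gs_args "/tmp/out file.pdf" <<'EOF'
/tmp/my docs/1.pdf
/tmp/my docs/2 b.pdf
EOF

# Show each argument on its own line to confirm nothing was split.
printf '[%s]\n' "${GS_ARGS[@]}"

# To actually join the files you would then run:
#   gs "${GS_ARGS[@]}" || echo "gs reported an error" >&2
```

In the script, the list would come from files.txt instead of the here-document.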

Screenshots: b1.jpg (8.63 KB), a.jpg (20.81 KB)

Last edited by disciple on Sat 30 Aug 2008, 03:03; edited 4 times in total
muggins

Joined: 20 Jan 2006
Posts: 6683
Location: lisbon

PostPosted: Thu 25 Oct 2007, 04:24    Post subject:  

i'll be interested to try it out. until your post i didn't realise that you could use GS to join pdf files...i've been using pdftk. i'll see how the output .pdf compares from both, e.g. final file size.
disciple

Joined: 20 May 2006
Posts: 6428
Location: Auckland, New Zealand

PostPosted: Thu 25 Oct 2007, 20:08    Post subject:  

I forgot to test with a recent Puppy. If muggins doesn't post anything I'll do that after my next exam.
wingruntled

Joined: 20 Feb 2007
Posts: 287
Location: Great Lakes

PostPosted: Thu 25 Oct 2007, 20:29    Post subject:  

disciple
In all these years I have never run into a situation where I needed to "combine" a PDF. Edit a protected one, yes, but combine, no. What is this used for?
muggins

Joined: 20 Jan 2006
Posts: 6683
Location: lisbon

PostPosted: Thu 25 Oct 2007, 21:37    Post subject:  

if i've dloaded a book, where the various chapters are available as separate .pdfs, i'm in the habit of using pdftk to concatenate them all into one .pdf. one side effect of doing so is that the resultant .pdf is smaller in size than the sum of the parts. i usually go further, to reduce disk usage, by making the resultant .pdf into a self-extracting archive.
wingruntled

Joined: 20 Feb 2007
Posts: 287
Location: Great Lakes

PostPosted: Thu 25 Oct 2007, 22:38    Post subject:  

muggins

The Books are protectrd and, well????
I can do it in MS but not in Linux. But then again "we" broke DRM and got shut down. :(
john biles


Joined: 17 Sep 2006
Posts: 1408
Location: Australia

PostPosted: Thu 25 Oct 2007, 23:35    Post subject:  

Hello disciple,
Any chance of a .pet or tar.gz?
Or do you just copy the large file into somewhere?

_________________
Legacy OS 2.1 LTS Released! Install me on a new! EXT2 Partition with 500Mb of swap and I'll be happy. Razz
Legacy OS 4 Released! Install to newer legacy hardware / early EeePC's Very Happy
muggins

Joined: 20 Jan 2006
Posts: 6683
Location: lisbon

PostPosted: Thu 25 Oct 2007, 23:42    Post subject:  

wingruntled:

you'll have to excuse my ignorance. i sat for english, for my higher school certificate, at the lowest level possible & failed, but i've got no idea what "The Books are protectrd and, well????" means. perhaps i'll have to dig out some joyce books from the library, a uni course in semiotics &/or deconstructionism, and grow some peyote, to get your drift.

johnbiles:

i just copied & pasted it, as a script, to /usr/bin & gave it executable permission. i still haven't gotten around to trying it on any pdf's yet, but will do for sure, and report back results.
disciple

Joined: 20 May 2006
Posts: 6428
Location: Auckland, New Zealand

PostPosted: Fri 26 Oct 2007, 04:00    Post subject:  

Q - Why would we want to join PDFs?
A - Sometimes I have a whole lot of small pdfs from some lecturer, e.g. several sheets of tutorial questions, and also answers, and I want to print them out all together, say two to a page and double-sided, so I can cram for a test. I imagine it is possible to print multiple files like this with some obscure command, but it is easier to join them together. (BTW I also find it easiest to use GTKLPQ (do a forum search for it) to do stuff like printing two-up, but I guess if ePDF is now in Puppy it might have a proper printer interface).
Also, sometimes I want to join several types of file together into one pdf to electronically submit an assignment. Until I discovered the gs command I used pdf995 (a windows pdf printer) and pdfedit995 (which lets you append a file to the last one you print), but now I can use jcoder's (AD-FREE!) pdf printer and this.
Or, like muggins, you might find it convenient to join together the parts of some (maybe official) document you downloaded.

At work (a civil engineering consultancy) we always archive big reports and manuals and stuff as PDF, and the main reason we use Adobe Acrobat is for joining files together. As far as I know, Acrobat can't do what this script does, which I think is far more efficient than joining files manually in Acrobat, but of course with this if you have two pages in the wrong order you can't just click and swap them around. I designed the script specifically for this sort of task, where you have a lot of files you want to join together that might be in different folders for different sections and subsections, and stuff, and it is natural to name the files in an organised way anyway.

Anyway, with this script I no longer feel the need to get pdfedit to compile in my Puppy :)
-----------------------
john biles - Yes - just put that code in a file in /usr/bin or /usr/local/bin (I think these places are most appropriate), and make it executable.
If someone wants to make a dotpup with a roxapp and a menu entry, feel free. I didn't really feel inclined to take it further at the moment since I can't get it to work with paths with spaces.
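
A minimal sketch of that install step. The helper name `install_script` is hypothetical, and the demo copies a placeholder into a throwaway directory so it can be tried without root; for real use the destination would be /usr/bin or /usr/local/bin.

```shell
#!/bin/bash
# install_script: hypothetical helper. Copies a script into a bin directory
# and makes it executable for everyone.
install_script() {
 local src=$1 destdir=$2
 cp "$src" "$destdir/" && chmod 755 "$destdir/$(basename "$src")"
}

# Demo: a one-line placeholder stands in for the saved joinPdf script.
demo=$(mktemp -d)
mkdir "$demo/bin"
printf '#!/bin/bash\necho joinPdf placeholder\n' > "$demo/joinPdf"

install_script "$demo/joinPdf" "$demo/bin"

# Run the installed copy to confirm it is executable.
installed_output=$("$demo/bin/joinPdf")
echo "$installed_output"

rm -rf "$demo"
```

With the real script saved as joinPdf in the current folder, the equivalent call would be `install_script joinPdf /usr/local/bin`.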
-----------------------
Muggins - it would depend why they are smaller when you join them together. If it is because they are saved in a more "compressed" way somehow, or have embedded fonts removed or something, then my script won't make them smaller - I think it just joins the raw pdfs together. But if one pdf is inherently smaller than two pdfs half the size... well that seems funny. What happens if you print them again? Because you can change some sort of quality settings in the CUPS pdf printer. Maybe you could join them together and then print them out even smaller... but I guess pdftk does quality settings anyway.

I like my script. It is useful and light and easy to use. I just wish it would work with spaces...
john biles


Joined: 17 Sep 2006
Posts: 1408
Location: Australia

PostPosted: Fri 26 Oct 2007, 20:14    Post subject:  

Hello disciple,
Your script doesn't work in Puppy 2.14.
It complains about "line 130: gtkdialog3: command not found".

Looks like you made it for Puppy 3.01? Can it be made to run in the Puppy 2 series?

disciple

Joined: 20 May 2006
Posts: 6428
Location: Auckland, New Zealand

PostPosted: Fri 26 Oct 2007, 22:43    Post subject:  

Ah. Apparently gtkdialog3 was introduced with Puppy 2.15 - you can get the pet here.

I've actually only tested the script in Grafpup 104, which I use even though it's as old as the hills - so you'll have no problem with the gui itself. I just probably should have checked that the same gs command still works in newer puppies.

You also need gtkdialog3 for the newer versions of Pfind and several of the other wonderful guis people have done. Personally I think every user of an older Puppy should get gtkdialog3 just for the sake of Pfind :)
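
A sketch of a check that could run before anything else, so a missing program is reported up front instead of as "command not found" partway through. The helper name `missing_deps` is made up; the four program names are the ones the script calls that a stock system might lack.

```shell
#!/bin/bash
# missing_deps: hypothetical helper. Prints each named command that is not
# on the PATH, one per line, and returns non-zero if any is missing.
missing_deps() {
 local cmd status=0
 for cmd in "$@"; do
  if ! command -v "$cmd" >/dev/null 2>&1; then
   printf '%s\n' "$cmd"
   status=1
  fi
 done
 return $status
}

# Programs joinPdf calls that may not be installed.
if ! missing=$(missing_deps gtkdialog3 gs realpath rox); then
 # $missing is left unquoted on purpose so the names print on one line.
 echo "joinPdf needs these programs installed first:" $missing >&2
fi
```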
muggins

Joined: 20 Jan 2006
Posts: 6683
Location: lisbon

PostPosted: Wed 31 Oct 2007, 19:56    Post subject:  

disciple,

i just remembered your program, & how i said i'd give it a test & compare it with the results of using pdftk.

well i tried it on 7 different pdf chapters, from the same book. as separate .pdfs, they totalled 372kbytes in size.

joining them with:

pdftk a1.pdf a2.pdf a3.pdf a4.pdf a5.pdf a6.pdf a7.pdf cat output test.pdf

the resultant file was 339k. using your program, 332k

While these size reductions seem quite small, I'd imagine for larger files the space savings would be quite significant. Plus I only need to open one file to read the book, rather than seven.
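
For reference, the savings in that test work out to roughly 9% with pdftk and 11% with joinPdf. A quick sketch of the arithmetic, using the sizes quoted above; the helper name `pct_saved` is made up.

```shell
#!/bin/bash
# pct_saved: hypothetical helper. Prints how much smaller AFTER is than
# BEFORE, as a percentage with one decimal place.
pct_saved() {
 LC_ALL=C awk -v before="$1" -v after="$2" \
  'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

pct_saved 372 339   # pdftk:   prints 8.9
pct_saved 372 332   # joinPdf: prints 10.8
```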
muggins

Joined: 20 Jan 2006
Posts: 6683
Location: lisbon

PostPosted: Wed 31 Oct 2007, 20:05    Post subject:  

Plus regarding gtkdialog3, I've tried extracting, and then running, some .pets that needed it on pup1.08, and once I installed gtkdialog3 they worked without problems.
disciple

Joined: 20 May 2006
Posts: 6428
Location: Auckland, New Zealand

PostPosted: Wed 31 Oct 2007, 21:29    Post subject:  

My script isn't 8MB or something either :)

That size difference is interesting, but it would be strange if it scaled up for bigger files. If you're so worried about size, why haven't you checked out pdftk's claim to be able to recompress pdf's?

If for some reason there is an advantage to pdftk, it would actually be very simple to adapt the gui to use it, or for that matter to do other things. Incidentally, I tried the Pdftk gui on windows a long time ago, and could never figure it out. I hadn't been enlightened about the command line :) This is just my attempt to make something a little more user friendly than it.
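
A sketch of what that adaptation might look like, assuming the only thing that changes is the command run at the end. The helper name `build_join_cmd` and the idea of a backend switch are made up for illustration; the gs options are the ones from the script and the pdftk syntax is the one muggins posted.

```shell
#!/bin/bash
# build_join_cmd: hypothetical helper. Fills the JOIN_CMD array with either
# the gs command from joinPdf or the pdftk command quoted earlier in the thread.
# Usage: build_join_cmd BACKEND OUTPUTFILE INPUT...
build_join_cmd() {
 local backend=$1 out=$2
 shift 2
 case "$backend" in
  gs)    JOIN_CMD=(gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite "-sOutputFile=$out" "$@") ;;
  pdftk) JOIN_CMD=(pdftk "$@" cat output "$out") ;;
  *)     echo "unknown backend: $backend" >&2; return 1 ;;
 esac
}

# Demo: build the pdftk form and show it without running it.
build_join_cmd pdftk test.pdf a1.pdf a2.pdf
echo "${JOIN_CMD[*]}"

# To run whichever command was built: "${JOIN_CMD[@]}"
```

Keeping the command in an array means input paths with spaces are passed through untouched whichever backend is chosen.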
muggins

Joined: 20 Jan 2006
Posts: 6683
Location: lisbon

PostPosted: Wed 31 Oct 2007, 22:38    Post subject:  

it was on windows that i first used pdftk using, not the commandline, but:

http://www.paehl.de/pdf/?GUI_for_PDFTK