joinPdf

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

joinPdf

#1 Post by disciple »

EDIT
See this post for latest version including screenshots
http://www.murga-linux.com/puppy/viewto ... 000#996000
/EDIT

(Screenshots at the bottom)
Updated 30 August 2008 to work with spaces in paths.

Works two ways:
1 - you call it without any input arguments, and you get a gui to specify the output file, a folder with the input files, and whether or not you want to view the file it creates.
2 - you call it from the command line with input files and/or folders, drop things on it, or use it from the rox "open with" menu. It will pop up a dialogue asking where you want to save the output and whether or not you want to view the file it creates. (If you want to use it from the console without a dialogue popping up, I'm sure you're smart enough to modify it to do that.)

It will join all the .pdf files in a folder recursively, sorted in alphanumeric path order. e.g. if I do

Code:

joinPdf /~/pdf
for a hypothetical folder, it will join all the files together in this order:
  • /~/pdf/1.pdf
    /~/pdf/2/4.pdf
    /~/pdf/2/5/1.pdf
    /~/pdf/2/5/a.pdf
    /~/pdf/2/5/b.pdf
    /~/pdf/2/5/c.pdf
    /~/pdf/3.pdf
However, if I run it from the command line with more than one input argument (files or folders or both), then it will join them in the order I specify. So if I do

Code:

joinPdf z b.pdf a
then it will first do the pdfs in folder z, in alphanumeric order, then it will add b.pdf, and then it will add the pdfs in folder a in alphanumeric order.
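
The ordering rule can be sketched in a few lines of shell. This is only an illustration, not a copy of the script further down: `list_pdfs` is a made-up name, `LC_ALL=C` is added here just to make the sort order predictable (the script itself calls plain `sort`), and filenames are assumed not to contain newlines.

```shell
#!/bin/sh
# list_pdfs (hypothetical helper): print the .pdf files that would be joined,
# one per line. Each argument is searched and sorted on its own, so the
# arguments keep the order in which they were given.
list_pdfs() {
    for input in "$@"; do
        # find also accepts a plain file, so files and folders are treated alike.
        # LC_ALL=C makes the sort order the same regardless of locale.
        find "$input" -name '*.pdf' | LC_ALL=C sort
    done
}

# Example: pdfs in folder z first, then b.pdf, then the pdfs in folder a.
# list_pdfs z b.pdf a
```

Because each input is sorted separately, nothing is ever reordered across inputs; only the contents of a folder are sorted.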

Requires gtkdialog3 (included in recent Puppies) and ghostscript of course.
Has no error handling to deal with ghostscript errors.
Doesn't check whether you want to overwrite if you specify an output file that already exists - just overwrites automatically.
Works recursively with pdfs in any depth of folders.
Isn't fazed by folders containing files that aren't pdfs.
Does make sure the output file has a .pdf extension but not a .pdf.pdf extension.
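
Two of the points in this list are easy to show in isolation. The sketch below is an assumption-laden illustration rather than the script's actual code: `ensure_pdf_ext` and `safe_to_write` are hypothetical names, and the overwrite check is something the script as posted does not do.

```shell
#!/bin/sh
# ensure_pdf_ext (hypothetical helper): strip one trailing .pdf or .PDF,
# then add .pdf, so "report" and "report.pdf" both become "report.pdf".
ensure_pdf_ext() {
    name=$1
    case $name in
        *.pdf|*.PDF) name=${name%.???} ;;
    esac
    printf '%s.pdf\n' "$name"
}

# safe_to_write (hypothetical helper): succeed only if the target does not
# exist yet. The script as posted has no such check and simply overwrites.
safe_to_write() {
    [ ! -e "$1" ]
}

# Example:
# out=$(ensure_pdf_ext "$HOME/combined")
# safe_to_write "$out" || echo "$out already exists"
```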

The full code at the moment:

Code:

#! /bin/bash
# Script for Puppy Linux to combine pdf files.
# Version 4 by disciple, 30 August 2008.
# http://www.murga-linux.com/puppy/viewtopic.php?p=149208#149208
# Currently has NO ERROR HANDLING
# You may experience errors if you have pdfs that are broken or are not pdfs at all.
# I'm not sure, but because it doesn't delete the temporary directory at the start, you may also get unexpected results if you use "View the file afterwards", crash your Pdf viewer, and then join some more pdfs.
# Yes, the use of all those symlinks is an ugly hack, but I couldn't get gs to join files with spaces in the path otherwise :(

# Set defaults
INPUTFOLDER="`pwd`"
OUTPUTFILE="`pwd`/combined.pdf"

# Set temporary directory
TEMPFOLDER=/tmp/joinPdfdir
mkdir -p "$TEMPFOLDER"

# Initialise filecount
FILECOUNT=100

export MAIN_DIALOG="
<window title=\"Puppy's pdf joining\" icon-name=\"gtk-file\">
 <vbox>
  <text>
   <label>You can recursively join all the pdfs in a directory and any number of subdirectories in normal alphanumeric order.</label>
  </text>
  <text>
   <label>Make sure you name the files and folders appropriately so they are joined in the order you want.</label>
  </text>
  <text>
   <label>e.g. Pdfs in a subfolder called "A" will come after a file called "1.pdf", and before a file called "B.pdf"</label>
  </text>
  <frame Location of input files>
   <hbox>
    <entry accept=\"directory\">
     <variable>INPUTFOLDER</variable>
     <input>echo '$INPUTFOLDER'</input>
    </entry>
    <button>
     <input file stock=\"gtk-open\"></input>
     <action type=\"fileselect\">INPUTFOLDER</action>
     <action>refresh:INPUTFOLDER</action>
    </button>
   </hbox>
  </frame>
  <frame Output file>
   <hbox>
    <entry accept=\"savefilename\">
     <variable>OUTPUTFILE</variable>
     <input>echo '$OUTPUTFILE'</input>
    </entry>
    <button>
     <input file stock=\"gtk-open\"></input>
     <action type=\"fileselect\">OUTPUTFILE</action>
    </button>
   </hbox>
  </frame>
  <checkbox>
   <label>View the file afterwards</label>
   <default>true</default>
   <variable>VIEWOUTPUT</variable>
  </checkbox>
  <hbox>
   <button>
    <input file stock=\"gtk-ok\"></input>
    <label>Join pdfs</label>
    <action type=\"exit\">JOIN-NOW</action>
   </button>
   <button>
    <input file stock=\"gtk-dialog-info\"></input>
    <label>Help</label>
    <action>gtkdialog3 -c --program HELP_DIALOG</action>
   </button>
   <button cancel></button>
  </hbox>
 </vbox>
 </window>"

export HELP_DIALOG="
<window title=\"joinPdf info\" icon-name=\"gtk-dialog-info\">
<vbox>
 <text>
 <label>If you run joinPdf from the command line with inputs, it will pop up a dialogue to ask you what you want to save the combined file as, and will then join them as you would expect.  If you specify more than one input (file or folder), it will join them in the order that you specify, and things that it joins recursively are sorted globally (per input).  They are deliberately not sorted folders first and then files.</label>
 </text>
 <text>
 <label>\"\"</label>
 </text>
 <text>
 <label>The file chooser for the output file can only choose a directory - you need to add a filename, but the script will sort out the .pdf extension.</label>
 </text>
 <hbox>
  <button>
   <label>\"Visit forum thread\"</label>
   <action>defaultbrowser http://www.murga-linux.com/puppy/viewtopic.php?p=149208#149208</action>
  </button>
  <button ok></button>
 </hbox>
</vbox>
</window>
"

export OUTPUT_FILE_DIALOG="
<window title=\"joinPdf\" icon-name=\"gtk-file\">
<vbox>
 <text>
 <label>What would you like to save the output file as?</label>
 </text>
 <hbox>
  <entry accept=\"savefilename\">
   <variable>OUTPUTFILE</variable>
   <input>echo '$OUTPUTFILE'</input>
  </entry>
  <button>
   <input file stock=\"gtk-open\"></input>
   <action type=\"fileselect\">OUTPUTFILE</action>
  </button>
 </hbox>
 <hbox>
 <button ok>
  <action type=\"exit\">JOIN-NOW</action>
 </button>
 <button cancel></button>
 </hbox>
 <checkbox>
  <label>View the file afterwards</label>
  <default>true</default>
  <variable>VIEWOUTPUT</variable>
 </checkbox>
</vbox>
</window>
"

# Show gui if run without input arguments.
if [ "$#" -eq 0 ]; then
 MAINGUI="`gtkdialog3 -c --program MAIN_DIALOG`"
 if [ "`echo "$MAINGUI" | grep EXIT | cut -f 2 -d '\"' | sed 's/\"//g'`" != "JOIN-NOW" ]; then
  exit 0
 fi
 INPUTFOLDER="`echo "$MAINGUI" | grep INPUTFOLDER | cut -f 2 -d '"' | sed 's/\"//g' `"
 OUTPUTFILE="`echo "$MAINGUI" | grep OUTPUTFILE | cut -f 2 -d '"' | sed 's/\"//g' `"
 VIEWOUTPUT="`echo "$MAINGUI" | grep VIEWOUTPUT | cut -f 2 -d '"' | sed 's/\"//g' `"
 find "$INPUTFOLDER" -name '*.pdf' | sort > $TEMPFOLDER/files.txt

# Just combine the pdfs if run with input arguments.
else
 # Get input filenames
 for i in "$@"
  do
   find "$i" -name '*.pdf' | sort >> $TEMPFOLDER/files.txt
  done
 # Get output filename
 OUTPUTFILEGUI="`gtkdialog3 --program=OUTPUT_FILE_DIALOG --center`"
 OUTPUTFILE="`echo "$OUTPUTFILEGUI" | grep OUTPUTFILE | cut -f 2 -d '"' | sed 's/\"//g' `"
 if [ "`echo "$OUTPUTFILEGUI" | grep EXIT | cut -f 2 -d '\"' | sed 's/\"//g'`" != "JOIN-NOW" ]; then
  exit 0
 fi
 VIEWOUTPUT="`echo "$OUTPUTFILEGUI" | grep VIEWOUTPUT | cut -f 2 -d '"' | sed 's/\"//g' `"
fi

# Make sure output file has an extension
# (strip an existing .pdf extension, in any capitalisation, so we never end up with .pdf.pdf)
OUTPUTFILE="${OUTPUTFILE%.[pP][dD][fF]}.pdf"

# Symlink files for us to join
# IFS= and -r keep leading spaces and backslashes in file names intact
while IFS= read -r line
do FILECOUNT=$(($FILECOUNT+1))
 ln -s "`realpath "$line"`" "$TEMPFOLDER/$FILECOUNT"
done < $TEMPFOLDER/files.txt

# Remove list
rm -f $TEMPFOLDER/files.txt

# Join files together
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$OUTPUTFILE" $TEMPFOLDER/*

# View output file
if [ "$VIEWOUTPUT" = "true" ]
 then
 rox "$OUTPUTFILE"
fi

#remove temporary directory
rm -rf $TEMPFOLDER
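
The worry in the script's header comments about a leftover temporary directory could probably be avoided by giving every run its own directory and removing it on exit. The following is a minimal sketch under assumptions: `with_tempdir` is a hypothetical helper that is not part of the script, `mktemp -d` is assumed to be available, and none of this has been tested on Puppy.

```shell
#!/bin/sh
# with_tempdir (hypothetical helper): run a command with a fresh private
# temporary directory appended as its last argument, and remove that
# directory afterwards. Assumes mktemp -d is available.
with_tempdir() {
    (
        tmp=$(mktemp -d) || exit 1
        # Runs when the subshell exits, whether the command succeeded or not.
        trap 'rm -rf "$tmp"' EXIT
        "$@" "$tmp"
    )
}

# Example: list the (empty) temporary directory, which is then removed.
# with_tempdir ls -la
```

Since the directory name is different on every run, symlinks left over from an earlier crashed run cannot end up in a later join.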

Attachments
b1.jpg
a.jpg
Last edited by disciple on Fri 15 Jun 2018, 07:23, edited 5 times in total.

muggins
Posts: 6724
Joined: Fri 20 Jan 2006, 10:44
Location: hobart

#2 Post by muggins »

i'll be interested to try it out. until your post i didn't realise that you could use GS to join pdf files...i've been using pdftk. i'll see how the output .pdf compares from both, e.g. final file size.

disciple

#3 Post by disciple »

I forgot to test with a recent Puppy. If muggins doesn't post anything I'll do that after my next exam.

wingruntled

#4 Post by wingruntled »

disciple
In all these years I have never run into a situation where I needed to "combine" a PDF. Edit a protected one, yes, but combine, no. What is this used for?

muggins

#5 Post by muggins »

if i've dloaded a book, where the various chapters are available as separate .pdfs, i'm in the habit of using pdftk to concatenate them all into one .pdf. one side effect of doing so is that the resultant .pdf is smaller in size than the sum of the parts. i usually go further, to reduce disk usage, by making the resultant .pdf into a self-extracting archive.

wingruntled

#6 Post by wingruntled »

muggins

The Books are protectrd and, well????
I can do it in MS but not in Linux. But then again "we" broke DRM and got shut down. :(

john biles
Posts: 1458
Joined: Sun 17 Sep 2006, 14:05
Location: Australia

#7 Post by john biles »

Hello disciple,
Any chance of a .pet or .tar.gz?
Or do you just copy the whole script into a file somewhere?
Legacy OS 2017 has been released.

muggins

#8 Post by muggins »

wingruntled:

you'll have to excuse my ignorance. i sat for english, for my higher school certificate, at the lowest level possible & failed, but i've got no idea what "The Books are protectrd and, well????" means. perhaps i'll have to dig out some joyce books from the library, a uni course in semiotics &/or deconstructionalism, and grow some peyote, to get your drift.

johnbiles:

i just copied & pasted it, as a script, to /usr/bin & gave it executable permission. i still haven't gotten around to trying it on any pdf's yet, but will do for sure, and report back results.

disciple

#9 Post by disciple »

Q - Why would we want to join PDFs?
A - Sometimes I have a whole lot of small pdfs from some lecturer, e.g. several sheets of tutorial questions, and also answers, and I want to print them out all together, say two to a page and double-sided, so I can cram for a test. I imagine it is possible to print multiple files like this with some obscure command, but it is easier to join them together. (BTW I also find it easiest to use GTKLPQ (do a forum search for it) to do stuff like printing two-up, but I guess if ePDF is now in Puppy it might have a proper printer interface).
Also, sometimes I want to join several types of file together into one pdf to electronically submit an assignment. Until I discovered the gs command I used pdf995 (a windows pdf printer) and pdfedit995 (which lets you append a file to the last one you print), but now I can use jcoder's (AD-FREE!) pdf printer and this.
Or, like muggins, you might find it convenient to join together the parts of some (maybe official) document you downloaded.

At work (a civil engineering consultancy) we always archive big reports and manuals and stuff as PDF, and the main reason we use Adobe Acrobat is for joining files together. As far as I know, Acrobat can't do what this script does, which I think is far more efficient than joining files manually in Acrobat, but of course with this if you have two pages in the wrong order you can't just click and swap them around. I designed the script specifically for this sort of task, where you have a lot of files you want to join together that might be in different folders for different sections and subsections, and stuff, and it is natural to name the files in an organised way anyway.

Anyway, with this script I no longer feel the need to get pdfedit to compile in my Puppy :)
-----------------------
john biles - Yes - just put that code in a file in /usr/bin or /usr/local/bin (I think these places are most appropriate), and make it executable.
If someone wants to make a dotpup with a roxapp and a menu entry, feel free. I didn't really feel inclined to take it further at the moment since I can't get it to work with paths with spaces.
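
For anyone unsure what "put it in a file and make it executable" amounts to, here is a minimal sketch. `install_script` is a hypothetical helper, and the file name `joinPdf.sh` in the example is an assumption about what the pasted code was saved as.

```shell
#!/bin/sh
# install_script (hypothetical helper): copy a script into a bin directory
# under the name joinPdf and mark it executable.
install_script() {
    src=$1
    bindir=${2:-/usr/local/bin}
    cp "$src" "$bindir/joinPdf" && chmod +x "$bindir/joinPdf"
}

# Example, assuming the code was saved as ./joinPdf.sh:
# install_script ./joinPdf.sh /usr/local/bin
```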
-----------------------
Muggins - it would depend why they are smaller when you join them together. If it is because they are saved in a more "compressed" way somehow, or have embedded fonts removed or something, then my script won't make them smaller - I think it just joins the raw pdfs together. But if one pdf is inherently smaller than two pdfs half the size... well that seems funny. What happens if you print them again? Because you can change some sort of quality settings in the CUPS pdf printer. Maybe you could join them together and then print them out even smaller... but I guess pdftk does quality settings anyway.

I like my script. It is useful and light and easy to use. I just wish it would work with spaces...

john biles

#10 Post by john biles »

Hello disciple,
Your script doesn't work in Puppy 2.14
Complains about "line 130: gtkdialog3: command not found"

Looks like you made it for Puppy 3.01? Can it be made to run in the Puppy 2 series?

disciple

#11 Post by disciple »

Ah. Apparently gtkdialog3 was introduced with Puppy 2.15 - you can get the pet here.

I've actually only tested the script in Grafpup 104, which I use even though it's as old as the hills - so you'll have no problem with the gui itself. I just probably should have checked that the same gs command still works in newer puppies.

You also need gtkdialog3 for the newer versions of Pfind and several of the other wonderful guis people have done. Personally I think every user of an older Puppy should get gtkdialog3 just for the sake of Pfind :)

muggins

#12 Post by muggins »

disciple,

i just remembered your program, & how i said i'd give it a test & compare it with the results of using pdftk.

well i tried it on 7 different pdf chapters, from the same book. as separate .pdfs, they totalled 372kbytes in size.

joining them with:

pdftk a1.pdf a2.pdf a3.pdf a4.pdf a5.pdf a6.pdf a7.pdf cat output test.pdf

the resultant file was 339k. using your program, 332k.

While these size reductions seem quite small, I'd imagine for larger files the space savings would be quite significant. Plus I only need to open one file to read the book, rather than seven.
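
To put those figures in proportion, the saving can be worked out as a percentage of the original 372k. The helper name `percent_saved` is made up for this sketch; the three sizes are the ones quoted above.

```shell
#!/bin/sh
# percent_saved (hypothetical helper): print how much smaller the joined file
# is than the separate files, as a percentage with one decimal place.
percent_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

percent_saved 372 339   # pdftk: prints 8.9
percent_saved 372 332   # joinPdf (ghostscript): prints 10.8
```

So for this one book the two tools are within about two percentage points of each other; a single sample says nothing about how that scales to bigger files.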

muggins

#13 Post by muggins »

Plus regarding gtkdialog3, I've tried extracting, and then running, some .pets that needed it on pup1.08, and once I installed gtkdialog3 they worked without problems.

disciple

#14 Post by disciple »

My script isn't 8MB or something either :)

That size difference is interesting, but it would be strange if it scaled up for bigger files. If you're so worried about size, why haven't you checked out pdftk's claim to be able to recompress pdf's?

If for some reason there is an advantage to pdftk, it would actually be very simple to adapt the gui to use it, or for that matter to do other things. Incidentally, I tried the Pdftk gui on windows a long time ago, and could never figure it out. I hadn't been enlightened about the command line :) This is just my attempt to make something a little more user friendly than it.

muggins

#15 Post by muggins »

it was on windows that i first used pdftk, using not the command line but this gui:

http://www.paehl.de/pdf/?GUI_for_PDFTK

disciple

#16 Post by disciple »

That's the one.

OK. I just updated the script. Changes:
-A bit extra in the help windows
-Quoted the ROX command - the script now fully works with spaces in the path/name of the output file
-Changed the defaults so they are set to the working directory. (I really was silly - I was using /temp instead of /tmp :))
-Changed the file selector for the output file to accept folders instead of files, so you can choose an empty directory, but you have to manually add the file name, so you can't just click on an existing file.

I'll just go and remove that "apology dialog". It is a leftover from an earlier stage of the design process.

disciple

#17 Post by disciple »

OK, I sorted out the output file selection - that works great now.
If I can only get spaces in the input files to work...

john biles

#18 Post by john biles »

Hello disciple,
Just letting you know that I will be including joinPDF in the Bugfix / Update of TEENpup 2.14.1 which will be released soon. :D

disciple

#19 Post by disciple »

Sweet - hopefully some people will find it useful.

I've given up on trying to get it to work with spaces in the input files, as I can't figure out why gs can handle quoted inputs manually typed at the command line, but not from a script :(
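
One possible explanation, which is only a guess since the thread never settles it: quote characters stored inside a variable are ordinary characters to the shell, so a list of "quoted" names built up as a string is still split on spaces when it is expanded. A way around that without symlinks is to collect the names as separate arguments. The sketch below assumes one filename per line in the list; `join_listed` is a hypothetical helper, and the gs line in the example reuses the options from the script in the first post with an assumed output name.

```shell
#!/bin/sh
# join_listed (hypothetical helper): read filenames from a list (one per line)
# and pass each one to a command as a single argument, spaces included.
# Usage: join_listed LISTFILE COMMAND [ARGS...]
join_listed() {
    list=$1
    shift
    # Collect the command and then every listed file in the positional
    # parameters; "$@" later expands to exactly one word per entry.
    while IFS= read -r file; do
        set -- "$@" "$file"
    done < "$list"
    "$@"
}

# Example with ghostscript (out.pdf is an assumed output name):
# join_listed files.txt gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=out.pdf
```

Whether this cures the particular gs problem described here is untested; it only guarantees that a name like "my file.pdf" reaches the command as one argument.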

vovchik
Posts: 1507
Joined: Tue 24 Oct 2006, 00:02
Location: Ukraine

spaces in filenames

#20 Post by vovchik »

Dear disciple,

I know this is a crude suggestion, but you could use a routine like the one below, but rather than using "mv" as in the attached script, replace it with "ln" to create symlinks in a temporary dir. You could then pass the symlinks to gs and remove them and the tmp dir after processing the pdfs. It's not a particularly elegant solution, I admit, but I think it just might do the trick.

I do find your program useful as I compose music and output scores to pdf files (for the moment, using Windows in a virtual machine). The parts - sometimes over 25 separate files for various instruments - also have to be included at the end of the main pdf (conductor's score) to simplify printing and emailing. JoinPdf does exactly what I need for that operation, so thanks.

With kind regards,
vovchik

Code:

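#!/bin/bash
# Replace spaces with underscores in every file and folder name below $1.
# Split command output on newlines only, so names with spaces stay intact.
IFS='
'
# Deepest level below $1 (sorted numerically, so that 10 comes after 9).
j=`find "$1" -printf "%d\n" | sort -nu | tail -n 1`
echo "Max dir depth:" $j
# Work from the top level down, so parent folders are renamed before their contents.
for (( i=1; i<=j ; i++ ))
do
for name in `find "$1" -mindepth $i -maxdepth $i -iname "* *" -printf "%p\n"`
do
newname=`echo "$name" | tr " " "_"`
echo "$name" "$newname"
mv "$name" "$newname"
done
done
##########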
