Posted: Thu 01 Nov 2007, 04:36
by disciple
That's the one.

OK. I just updated the script. Changes:
-A bit extra in the help windows
-Quoted the ROX command - the script now fully works with spaces in the path/name of the output file
-Changed the defaults so they are set to the working directory. (I really was silly - I was using /temp instead of /tmp :))
-Changed the file selector for the output file to accept folders instead of files. This means you can choose an empty directory, but you have to type the file name manually, and you can't just click on an existing file.

I'll just go and remove that "apology dialog". It is a leftover from an earlier stage of the design process.

Posted: Thu 01 Nov 2007, 07:30
by disciple
OK, I sorted out the output file selection - that works great now.
If I can only get spaces in the input files to work...

Posted: Thu 01 Nov 2007, 13:57
by john biles
Hello disciple,
Just letting you know that I will be including joinPDF in the Bugfix / Update of TEENpup 2.14.1 which will be released soon. :D

Posted: Thu 01 Nov 2007, 20:53
by disciple
Sweet - hopefully some people will find it useful.

I've given up on trying to get it to work with spaces in the input files, as I can't figure out why gs can handle quoted inputs manually typed at the command line, but not from a script :(
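
A likely explanation, assuming the script builds the gs input list as a single string: quotes stored inside a variable are ordinary characters, so the shell never re-parses them and instead splits the value on every space. A bash array keeps each name as one argument. This is a minimal sketch; `show_args` is a hypothetical helper that prints each argument it receives in brackets:

```bash
#!/bin/bash
# show_args (hypothetical) prints each argument it receives inside [brackets]
show_args() { printf '[%s]\n' "$@"; }

# Quotes typed inside a string are ordinary characters once stored,
# so the unquoted expansion is split on every space:
inputs='"my file.pdf" "other file.pdf"'
show_args $inputs
# ["my]
# [file.pdf"]
# ["other]
# [file.pdf"]

# A bash array keeps each filename as exactly one argument:
inputs=("my file.pdf" "other file.pdf")
show_args "${inputs[@]}"
# [my file.pdf]
# [other file.pdf]
```

The same `"${array[@]}"` expansion can be placed directly after the other gs options on the command line.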

spaces in filenames

Posted: Thu 01 Nov 2007, 22:05
by vovchik
Dear disciple,

I know this is a crude suggestion, but you could use a routine like the one below. Rather than using "mv" as in the attached script, though, replace it with "ln -s" to create symlinks in a temporary dir. You could then pass the symlinks to gs and remove them and the tmp dir after processing the pdfs. It's not a particularly elegant solution, I admit, but I think it just might do the trick.

I do find your program useful as I compose music and output scores to pdf files (for the moment, using Windows in a virtual machine). The parts - sometimes over 25 separate files for various instruments - also have to be included at the end of the main pdf (conductor's score) to simplify printing and emailing. JoinPdf does exactly what I need for that operation, so thanks.

With kind regards,
vovchik

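#!/bin/bash
# Replace spaces with underscores in the names of all files and directories
# under the directory given as $1 (default: the current directory).
dir=${1:-.}
# Deepest level in the tree; -n so that depth 10 sorts after depth 9
j=$(find "$dir" -printf '%d\n' | sort -nu | tail -n 1)
echo "Max dir depth: $j"
# Work from the top down, re-running find at each depth so that entries
# inside already-renamed directories are found under their new paths
for (( i=1; i<=j; i++ ))
do
    while IFS= read -r -d '' name
    do
        base=${name##*/}
        newname="${name%/*}/${base// /_}"   # only the last path component is renamed
        echo "$name" "$newname"
        mv -- "$name" "$newname"
    done < <(find "$dir" -mindepth "$i" -maxdepth "$i" -name '* *' -print0)
done
##########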
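
The "ln -s instead of mv" idea might look roughly like this minimal sketch, assuming bash, a private directory from `mktemp -d`, and a hypothetical `make_links` helper. Each input is linked under a name with no spaces, and a trap removes the links and the directory on exit; `rm -rf` deletes the links themselves, never the PDFs they point to.

```bash
#!/bin/bash
# Sketch of the "ln -s instead of mv" idea: link every input into a private
# temporary directory under a name with no spaces, give those links to gs,
# then remove the links and the directory. make_links is a hypothetical helper.

make_links() {            # usage: make_links DESTDIR FILE...
    local dest=$1 f target base
    shift
    for f in "$@"; do
        # Absolute target, so the link resolves from inside DESTDIR
        case $f in
            /*) target=$f ;;
            *)  target=$PWD/$f ;;
        esac
        base=${f##*/}
        ln -s -- "$target" "$dest/${base// /_}"   # spaces -> underscores
    done
}

tmpdir=$(mktemp -d) || exit 1
# On exit, delete the links and the directory (rm does not follow the
# links, so the original PDFs are untouched)
trap 'rm -rf -- "$tmpdir"' EXIT

make_links "$tmpdir" "$@"
# Example join (commented out; the output name is illustrative):
# gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=joined.pdf "$tmpdir"/*.pdf
```

Two inputs whose names differ only in spaces versus underscores would collide here, and the `*` glob returns the links in alphabetical order rather than the order the files were given in.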

Posted: Fri 02 Nov 2007, 00:22
by disciple
vovchik wrote: "I do find your program useful"
That's nice to know.

I had thought of doing what you suggest; I was just waiting a while in case someone else had a look and figured out how to do it properly :)
It should be possible to do - I can do a script with the specific filenames in it, and that works, so I think I don't quite understand what's happening when I try to replace newlines from the sort output with spaces.
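
One way to skip the newline-to-space conversion entirely, sketched under the assumptions of bash 4 or later (for `mapfile`) and no newlines inside filenames: read the sorted list straight into an array, one element per line. `collect_pdfs` and `PDFS` are hypothetical names.

```bash
#!/bin/bash
# Read the sorted, newline-separated list straight into an array instead of
# converting the newlines to spaces. Needs bash 4+ for mapfile, and assumes
# no filename contains a newline. collect_pdfs and PDFS are hypothetical names.

collect_pdfs() {          # usage: collect_pdfs DIR   (fills the array PDFS)
    mapfile -t PDFS < <(find "$1" -type f -name '*.pdf' | sort)
}

if [ -d "${1-}" ]; then
    collect_pdfs "$1"
    printf 'Found %d PDF(s)\n' "${#PDFS[@]}"
    # "${PDFS[@]}" expands to one argument per file, spaces and all:
    # gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=joined.pdf "${PDFS[@]}"
fi
```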

Posted: Sat 08 Dec 2007, 23:26
by disciple
muggins thinks this approach might help deal with spaces in the name/path. It is too complicated for me, but someone else may be able to do something with it :)

Posted: Sat 08 Dec 2007, 23:51
by muggins
Sorry Disciple,

I should have clarified. The only bit that deals with filenames containing spaces could be summarised as:

# ......whatever pre-processing is done here......

for file in "$directory"/*
do
   if [ -d "$file" ]; then          # Is it a directory?
      # If so, the script calls itself to process that directory. The subshell
      # means we end up back where we started, even if cd fails.
      ( cd "$file" && scriptname )
   else
      filename=${file%.pdf}          # Name without the .pdf extension
      xxx "$filename.pdf"            # Whatever processing your script does; the double quotes keep names with spaces in one piece
   fi
done

I hope that's a bit clearer. All the rest of that other script was just to do with checking whether the user passed any arguments to the script, and with telling .pdf files apart from other files.

Posted: Sun 09 Dec 2007, 00:27
by disciple
Yes, but I couldn't figure out how to integrate that with the recursive approach of my script :(
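
If bash 4 or later is available, its `globstar` option gives recursion without the script calling itself: `**` matches any number of directory levels, and each match stays a single word however many spaces it contains. This is a swapped-in technique, not muggins's approach, and `find_pdfs_glob` is a hypothetical name:

```bash
#!/bin/bash
# Recursive search without recursion in the script: with globstar, ** matches
# any number of directory levels (including none). Needs bash 4+;
# find_pdfs_glob and PDFS are hypothetical names.
shopt -s globstar nullglob    # nullglob: no match gives an empty list, not the pattern

find_pdfs_glob() {        # usage: find_pdfs_glob DIR   (fills the array PDFS)
    local f
    PDFS=()
    for f in "$1"/**/*.pdf; do
        if [ -f "$f" ]; then      # skip directories whose names end in .pdf
            PDFS+=("$f")
        fi
    done
}

if [ -d "${1-}" ]; then
    find_pdfs_glob "$1"
    printf '%s\n' "${PDFS[@]}"
fi
```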

Posted: Mon 25 Aug 2008, 10:34
by disciple
We can create a temporary folder somewhere, and as we find each pdf, symlink it into that folder, with a name that is the current count, and then we join like this:
~# gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="/output path/output file.pdf" ~/.joinpdf/*.pdf
I checked, and gs does join files given with a * wildcard, so now we just need to figure out how to create all those symlinks. The shell expands the * in alphabetical order, so the numbers need leading zeros to come out in the right order. I figure no one will be joining more than 100 files, so we can start at 01. Or, to be really sure, we could do 001, 002, 003...
Or maybe it would be easier to start at 100 so we don't have to figure out a way to prefix the zeros: we just somehow get a count and add 100.
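
A minimal sketch of the zero-padding idea, assuming bash and a hypothetical `link_numbered` helper: `printf '%03d'` pads the counter to three digits, so the alphabetical order of the glob matches the numeric join order for up to 999 files. Relative paths are made absolute so the links still resolve from inside the folder.

```bash
#!/bin/bash
# Zero-padded numbering: printf '%03d' turns 1, 2, ... 10 into 001, 002, ... 010,
# so a plain * glob returns the links in join order (for up to 999 files).
# link_numbered is a hypothetical helper.

link_numbered() {         # usage: link_numbered DESTDIR FILE...
    local dest=$1 n=0 f
    shift
    for f in "$@"; do
        case $f in
            /*) ;;                 # already absolute
            *)  f=$PWD/$f ;;       # make relative paths absolute for the link
        esac
        n=$((n + 1))
        ln -s -- "$f" "$dest/$(printf '%03d' "$n").pdf"
    done
}

# Example (commented out; ~/.joinpdf is the folder used in the gs command above):
# mkdir -p ~/.joinpdf && link_numbered ~/.joinpdf "$@"
# gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="/output path/output file.pdf" ~/.joinpdf/*.pdf
```

Adding 100 to the count instead (`$((n + 100))`) gives the same ordering for up to 900 files without any padding.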

Updated joinPdf

Posted: Sat 30 Aug 2008, 07:08
by disciple
OK, fixed. New version in the first post.
Thanks Pizzasgood for the help.

Posted: Thu 06 Aug 2009, 08:17
by disciple
You might also like to try pdfshuffler.

Posted: Thu 27 Aug 2009, 09:28
by disciple
I've made an 11MB .sfs to make it easy to install pdfshuffler and its dependencies - python, pygtk etc. http://www.murga-linux.com/puppy/viewtopic.php?t=45826

JoinPDF

Posted: Fri 16 Aug 2013, 22:21
by MikeyBrat
Works great!
Thanks.

-
Puppy Precise 5.6.1 frugal usb (among other puppies)

Posted: Tue 27 Jun 2017, 05:20
by disciple
Attached is an update.
1. I wanted a fully automatic command line only version. This is now `joinpdf` while the gui version is now `gjoinpdf`. I guess I could have done it in just the one script by checking $0, but I didn't want to spend the time on it.
2. The pdfs I'm dealing with these days tend to be too broken to work well with ghostscript, so I'm using pdfunite from poppler, which performs much better anyway. You can comment and uncomment a line if you want to use gs still (I guess ideally it should just check for pdfunite).
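
Doing both modes in one script by checking `$0` could look like this sketch. It assumes the gui name is installed as a symlink to the same file (for example `ln -s joinpdf gjoinpdf`); `run_cli` and `run_gui` are hypothetical placeholders for the two code paths:

```bash
#!/bin/bash
# One script, two names: choose the mode from the name it was invoked as.
# run_cli and run_gui are hypothetical placeholders for the real code paths.
run_cli() { echo "command line mode: $*"; }
run_gui() { echo "gui mode: $*"; }

dispatch() {              # usage: dispatch INVOKED_NAME ARGS...
    local name=${1##*/}   # basename, so /usr/local/bin/gjoinpdf also matches
    shift
    case $name in
        gjoinpdf) run_gui "$@" ;;
        *)        run_cli "$@" ;;
    esac
}

dispatch "$0" "$@"
```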

Posted: Thu 27 Jul 2017, 04:24
by disciple
disciple wrote: "I'm using pdfunite from poppler"
Even though they're both java based, I was thinking about the possibility of instead using sejda (command line version of pdfsam) or pdftk, to create the pdf with bookmarks.
I found that there is now a very similar project which already creates the pdf with bookmarks. It is using gs, so it will have the performance problems I was complaining about, whereas sejda or pdfsam wouldn't.
https://github.com/bronson/pdfdir

Posted: Fri 15 Jun 2018, 07:20
by disciple
Hi guys,
I have reworked both scripts a reasonable amount.
Changes include:
- searches for pdfs by mime type rather than file extension (should be more reliable - I assume slower, but it can still join 960 files pretty quickly here!).
- less Puppy specific (gui version doesn't assume rox).
- uses natsort if available, otherwise coreutils `sort -V`, with plain sort as last resort.
- uses pdfunite if available, otherwise gs.
- at least one gui bugfix (I'm rather surprised no one reported the bug - does everybody have the same use pattern as me?).
- gui related error checking, cosmetic enhancements, and ability to enter the program you want to use to view (or postprocess :) ) the output file.
- main gui isn't blocked by the help window or by the browser launched to visit this thread.
- better handling of temporary files (multiple instances won't mess with each other, and it should be more portable).
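
The "use X if available, otherwise Y" pattern in the list above can be sketched with `command -v`. This is an illustration, not the actual joinpdf code. It assumes that the `natsort` command (from the Python natsort package) reads names from stdin when given no arguments, and that `file --mime-type -b` prints a bare MIME type such as `application/pdf`. The helper names are hypothetical:

```bash
#!/bin/bash
# Tool fallbacks and MIME-based detection; all helper names are hypothetical.

have() { command -v "$1" >/dev/null 2>&1; }

# Sort newline-separated names: natsort, then GNU `sort -V`, then plain sort.
sort_names() {
    if have natsort; then
        natsort                                  # assumed to read stdin
    elif sort -V </dev/null >/dev/null 2>&1; then
        sort -V                                  # this sort understands -V
    else
        sort
    fi
}

# True if FILE looks like a PDF by its content rather than its extension.
is_pdf() {
    [ "$(file --mime-type -b -- "$1" 2>/dev/null)" = application/pdf ]
}

# Join with pdfunite (INPUT... OUTPUT) if present, otherwise with ghostscript.
join_pdfs() {             # usage: join_pdfs OUTPUT INPUT...
    local out=$1
    shift
    if have pdfunite; then
        pdfunite "$@" "$out"
    else
        gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
    fi
}
```

Checking `sort -V` by running it on empty input avoids guessing which sort implementation is installed.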

Posted: Tue 27 Nov 2018, 17:44
by greengeek
Just getting this on my list for testing.

Posted: Sat 06 Apr 2019, 11:45
by disciple
Note in case anyone has an issue with the latest version using gawk to generate the output file name:
Some systems these days may include mawk rather than gawk by default. Mawk seems to work fine, too. In fact I tested it with a busybox awk and on OSX with what I think is the original Unix awk, and it seemed to work everywhere.
So I'm not 100% sure - maybe the awk I had available at the time was some obscure or old crippled implementation, or maybe the internationalisation of gawk helped with accented letters in gjoinpdf or something, or maybe it was unnecessary and it would have worked with plain awk all along.