Pdfshuffler .sfs - Edit pdfs :) fantastic!

Word processors, spreadsheets, presentations, translation, etc.
disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

attach files to a pdf

#91 Post by disciple »

If you are interested in "pdf portfolios" i.e. pdfs with attached files, pdfdetach can extract them. But mutool can do that and also attach them in the first place.

mutool does several other useful things - I haven't used it extensively, but I assume it is good as it is a brother of mupdf.
Last edited by disciple on Wed 03 Apr 2019, 05:18, edited 2 times in total.
Do you know a good gtkdialog program? Please post a link here

Classic Puppy quotes

ROOT FOREVER
GTK2 FOREVER

#92 Post by disciple »

Origami-pdf is another tool that may be worth mentioning, although it is in ruby, and I'm not sure if it is currently being developed anywhere:
Features
• Create PDF documents from scratch.
• Parse existing documents, modify them and recompile them.
• Explore documents at the object level, going deep into the document structure, uncompressing PDF object streams and deobfuscating names and strings.
• High-level operations, such as encryption/decryption, signature, file attachments...
• A GTK interface to quickly browse into the document contents.
There is a similar python project that seems to be dead: https://github.com/jesparza/peepdf
And a Java one that is alive: https://github.com/itext/rups/

#93 Post by disciple »

Since I'm recording my most important knowledge about pdfs in this thread:

Qpdf is the best option if you need to remove restrictions (e.g. can't print, can't edit, or can't copy text) from pdfs, which in most cases doesn't require a password. N.B. this doesn't help if you have a pdf where the text has not been encoded according to a standard character set (which was a problem using the old "cups-pdf" virtual printer in Puppy i.e. copied text was gibberish because the characters in the pdf had all been randomly remapped).
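
For anyone who wants to script the restriction removal, here is a minimal Python sketch wrapping the qpdf command line. It assumes qpdf is installed and on the PATH and that the pdf has no user (open) password; the function names and the file names in the usage note are made up for illustration.

```python
import shutil
import subprocess


def qpdf_decrypt_command(src, dst):
    """Build the qpdf command line that writes an unrestricted copy of src to dst."""
    return ["qpdf", "--decrypt", src, dst]


def remove_restrictions(src, dst):
    """Run qpdf to write an unrestricted copy; raises if qpdf is missing or fails."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf not found on PATH")
    subprocess.run(qpdf_decrypt_command(src, dst), check=True)
```

Usage would be something like remove_restrictions("locked.pdf", "unlocked.pdf"). The original file is left untouched; qpdf writes a new one.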

#94 Post by disciple »

Pdfshuffler users may want to note that the version some have treated as an unofficial upstream has now forked as pdfarranger, I guess because there has been a little activity on the original upstream lately.

Re: attach files to a pdf

#95 Post by disciple »

disciple wrote:If you are interested in "pdf portfolios" i.e. pdfs with attached files, pdfdetach can extract them. But mutool can do that and also attach them in the first place.
Poppler now also has a pdfattach.

Sejda has an option to unpack attachments, and an option to create a "portfolio/collection of attachments". I'm not sure whether or not that is actually different from attaching a file with mutool or pdfattach.

#96 Post by disciple »

Some PDFs have named pages i.e. if you open them in some viewers (e.g. Adobe's) instead of displaying the logical page number they display a number in a different format (perhaps i, ii, iii... in the preface and 1, 2, 3 in the main body of the document), and/or some other text.

I spent quite some time looking for tools which can handle this. I have now discovered that these are called "page labels", and sejda has a tool to apply them to a pdf. I'm not sure if there are any free tools which preserve page labels (e.g. when splitting or merging pdfs), or can list the labels of pages in a pdf. Apparently poppler has supported page labels for a very long time, but tools like pdfunite don't seem to preserve them...

A document can contain more than one page with the same label, which I guess complicates things, and I think rather than being attached to individual pages, they are defined as a kind of metadata that says "starting from this logical page, number using this format".
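
To make the "starting from this logical page, number using this format" idea concrete, here is a rough Python sketch of how a viewer might turn such ranges into a label for every page. The style codes D, r and R (decimal, lowercase roman, uppercase roman) are the ones the PDF specification uses; the function names, the tuple layout and the fallback to a plain page number when no range applies are my own assumptions, not taken from any particular tool.

```python
def to_roman(n):
    """Lowercase roman numeral for a positive integer."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_labels(num_pages, ranges):
    """Expand label ranges into one label per page.

    ranges is a list of (first_page_index, style, prefix, start) tuples,
    sorted by first_page_index, where the index is 0-based. Each range
    applies from its first page until the next range begins.
    """
    labels = []
    for page in range(num_pages):
        current = None
        for r in ranges:
            if r[0] <= page:
                current = r  # last range starting at or before this page
        if current is None:
            labels.append(str(page + 1))  # assumed fallback: logical page number
            continue
        first, style, prefix, start = current
        n = start + (page - first)
        if style == "D":
            number = str(n)
        elif style == "r":
            number = to_roman(n)
        elif style == "R":
            number = to_roman(n).upper()
        else:
            number = ""  # no numeric part, the label is just the prefix
        labels.append(prefix + number)
    return labels


# Three preface pages numbered i, ii, iii followed by a body numbered from 1.
print(page_labels(6, [(0, "r", "", 1), (3, "D", "", 1)]))
# Two ranges that both restart at A-1, so labels repeat within one document.
print(page_labels(4, [(0, "D", "A-", 1), (2, "D", "A-", 1)]))
```

The second call shows how duplicate labels arise: nothing stops two ranges from using the same prefix and start number.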

What I would really like is a way to create bookmarks matching the page labels, and vice versa, and to split a document based on page labels, or the page label prefix.

#97 Post by disciple »

disciple wrote:I'm not sure if there are any free tools which preserve page labels (e.g. when splitting or merging pdfs), or can list the labels of pages in a pdf
...
What I would really like is a way to create bookmarks matching the page labels, and vice versa, and to split a document based on page labels, or the page label prefix.
Ah, knowing the right terminology helps.
You can get information about the page labels with pdftk:

Code:

# pdftk Drawing1.pdf dump_data
InfoBegin
InfoKey: ModDate
InfoValue: D:20190718221353
InfoBegin
InfoKey: CreationDate
InfoValue: D:20190718221353
InfoBegin
InfoKey: Title
InfoValue: sill name (2)
InfoBegin
InfoKey: Creator
InfoValue: AutoCAD 2019 - English 2019 (23.0s (LMS Tech))
InfoBegin
InfoKey: Producer
InfoValue: pdfplot15.hdi 15.00.152.000
NumberOfPages: 2
BookmarkBegin
BookmarkTitle: Sheets and Views
BookmarkLevel: 1
BookmarkPageNumber: 0
BookmarkBegin
BookmarkTitle: Random name
BookmarkLevel: 2
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: sill name (2)
BookmarkLevel: 2
BookmarkPageNumber: 2
PageMediaBegin
PageMediaNumber: 1
PageMediaRotation: 0
PageMediaRect: 0 0 1191 842
PageMediaDimensions: 1191 842
PageMediaBegin
PageMediaNumber: 2
PageMediaRotation: 0
PageMediaRect: 0 0 1191 842
PageMediaDimensions: 1191 842
PageLabelBegin
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelPrefix: [1] Random name
PageLabelNumStyle: NoNumber
PageLabelBegin
PageLabelNewIndex: 2
PageLabelStart: 1
PageLabelPrefix: [2] sill name (2)
PageLabelNumStyle: NoNumber
So it wouldn't be too hard to script a solution for the splitting, or listing the page labels.
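
As a starting point for such a script, here is a minimal Python sketch that pulls the PageLabel records out of pdftk's dump_data text and works out which pages each label covers. It only assumes the output format shown above; the function names are hypothetical, and the sample text is trimmed from the dump in this post.

```python
def parse_page_labels(dump_text):
    """Return one dict per PageLabelBegin record in pdftk dump_data output."""
    records = []
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line == "PageLabelBegin":
            current = {}
            records.append(current)
        elif line.endswith("Begin"):
            current = None  # some other record type (Info, Bookmark, PageMedia)
        elif current is not None and line.startswith("PageLabel"):
            key, _, value = line.partition(":")
            current[key] = value.strip()
    return records


def label_page_ranges(records, num_pages):
    """Return (prefix, first_page, last_page) per record, pages 1-based."""
    starts = [int(r["PageLabelNewIndex"]) for r in records]
    ranges = []
    for i, rec in enumerate(records):
        # A label range runs until the page before the next range starts.
        last = starts[i + 1] - 1 if i + 1 < len(records) else num_pages
        ranges.append((rec.get("PageLabelPrefix", ""), starts[i], last))
    return ranges


SAMPLE = """\
NumberOfPages: 2
PageLabelBegin
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelPrefix: [1] Random name
PageLabelNumStyle: NoNumber
PageLabelBegin
PageLabelNewIndex: 2
PageLabelStart: 1
PageLabelPrefix: [2] sill name (2)
PageLabelNumStyle: NoNumber
"""

for prefix, first, last in label_page_ranges(parse_page_labels(SAMPLE), 2):
    print(f"pages {first}-{last}: {prefix}")
```

For splitting, each (first, last) pair could then be passed to pdftk's cat operation to write that range to its own file. Note that the sketch strips whitespace around values, so a prefix with meaningful trailing spaces would lose them.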

#98 Post by disciple »

disciple wrote:
disciple wrote:Does anybody by any chance know of a Linux program to change the default view settings in a pdf e.g. change from "continuous view" to "single page" view, or "fit width" to "100%" or "fit page"?
I haven't tried it, but I think there's a good chance Softmaker's new "Flexipdf Basic" would run in Wine. This functionality is provided, albeit in a rather strange place: File>Preferences>Loading, in the bottom section.
Ah, sejda looks like the best command line option to change settings like this.

#99 Post by disciple »

It looks like the latest alternative to pdfmod, pdfshuffler etc is pdfslicer.
It uses gtkmm3 :(
The backend is qpdf, so I imagine it will do the best job at handling the most pdfs.

#100 Post by disciple »

disciple wrote:Pdfshuffler users may want to note that the version some have treated as an unofficial upstream has now forked as pdfarranger, I guess because there has been a little activity on the original upstream lately.
Also note that the latest version of pdfarranger uses pikepdf (a python interface to libqpdf) as its backend if it is installed, rather than PyPDF2. I believe this will be better (check out the matrix on the pikepdf web page comparing it to PyPDF2), although I haven't done any testing.

The latest version also introduces undo/redo.

tallboy
Posts: 1760
Joined: Tue 21 Sep 2010, 21:56
Location: Drøbak, Norway

#101 Post by tallboy »

disciple wrote:Does anybody by any chance know of a Linux program to change the default view settings in a pdf e.g. change from "continuous view" to "single page" view, or "fit width" to "100%" or "fit page"?
Xpdf does that. It can be set as the default in its prefs, and be modified on the spot in a document dialog window.
I have only used Xpdf and pdftk (don't use the pdftk-1.41-static pet) for the last 10-15 years. In my view, Xpdf produces the cleanest and best looking fonts in a .pdf.

I also often drag a .pdf file that I want to edit to Abiword, which sometimes opens it like any other text document for editing. It depends on the origin of the document; for example, a .pdf printout of a browser page can very often be modified in Abi later. I have always given it a try.
BTW: I just tested my own claim, and opened a 90-page Huawei .pdf user manual (downloaded from Huawei as a .pdf) in Abi, along with some .pdf email attachments and bills. All very editable. I'm afraid I am an Abi-lover. :oops:

Hmm, on second thoughts (the other cell awakened), I have only been using Xpdf and pdftk, as long as I have been using Linux, some 20 years now... :lol:
True freedom is a live Puppy on a multisession CD/DVD.

#102 Post by disciple »

Correct me if I'm wrong, but I think you misunderstand what I was asking.
Pdf viewers commonly allow you to configure defaults for the viewer, and it sounds like that is what you're describing i.e. it affects every pdf you open in that viewer. I am talking about the settings in the actual pdf i.e. if I change them using sejda or flexipdf and send the file to someone else, it affects how the file opens in their viewer, assuming their viewer is set up to respect the settings.

#103 Post by disciple »

Another couple of tools along similar lines as something like pdfsam (although less mature) i.e. non-wysiwyg gui utilities:

https://github.com/muriloventuroso/pdftricks (vala/gtk3/ghostscript)
https://gitlab.com/scarpetta/pdfmixtool (c++/qt5/podofo, although looking at qpdf now)

#104 Post by tallboy »

disciple wrote:...I think you misunderstand what I was asking.
Yup! :lol:

Re: attach files to a pdf

#105 Post by disciple »

disciple wrote:
disciple wrote:If you are interested in "pdf portfolios" i.e. pdfs with attached files, pdfdetach can extract them. But mutool can do that and also attach them in the first place.
Poppler now also has a pdfattach.
Which only attaches one file at a time, and you can't attach it in place i.e. you need to write out the pdf to a new file.
I tested it for sending Windows executables via email (both Outlook and Gmail block executables, and at least Gmail blocks e.g. zip files these days). Success. Dealing with it seems nice and simple.
Sejda has an option to unpack attachments, and an option to create a "portfolio/collection of attachments". I'm not sure whether or not that is actually different from attaching a file with mutool or pdfattach.
If I create a pdf portfolio with sejda, when opening it in adobe reader it complains that it needs to install flash, although it seems to work without it. I'm not sure if pdf portfolios always use Flash, or if it is just the way they've chosen to implement it in sejda. I guess for pdfs to support Flash it must be written into the standard, which seems rather stupid as one day soon (if not already) most people won't have Flash...

The versions of mutool I have to hand don't actually seem to have the portfolio feature... perhaps it is a compile time option?
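
Since pdfattach takes one attachment per run and always writes a new pdf, attaching several files means chaining runs through intermediate files. A minimal Python sketch of that chaining follows, assuming poppler's pdfattach is on the PATH and is called as pdfattach input.pdf file output.pdf; the function names and the step0.pdf style intermediate names are invented for illustration.

```python
import os
import subprocess
import tempfile


def plan_attach_commands(src_pdf, attachments, out_pdf, workdir):
    """Build one pdfattach command per attachment, each reading the previous output."""
    commands = []
    current = src_pdf
    for i, attachment in enumerate(attachments):
        is_last = i == len(attachments) - 1
        # Only the final run writes to the real output; earlier runs use temp files.
        target = out_pdf if is_last else os.path.join(workdir, f"step{i}.pdf")
        commands.append(["pdfattach", current, attachment, target])
        current = target
    return commands


def attach_all(src_pdf, attachments, out_pdf):
    """Attach every file in attachments to src_pdf, writing the result to out_pdf."""
    with tempfile.TemporaryDirectory() as workdir:
        for command in plan_attach_commands(src_pdf, attachments, out_pdf, workdir):
            subprocess.run(command, check=True)
```

Something like attach_all("cover.pdf", ["tool.exe", "notes.zip"], "bundle.pdf") would then produce one pdf carrying both files, and pdfdetach can pull them out again at the other end.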

#106 Post by disciple »

I haven't tried the linux version lately, but the Windows Foxit Reader has a good interface for attaching files. Even Adobe Reader can attach files on Windows, although the interface isn't good.

#107 Post by disciple »

https://github.com/arrufat/pdftag
Gui to edit pdf metadata, written in vala and uses poppler

#108 Post by disciple »

People might find these snippets from http://www.imagemagick.org/Usage/formats/#ps interesting:
Multi-paged PDF Documents...

You can use perl to combine multiple PDF files, without resorting to IM and its rasterization problem...

Code:

#!/usr/bin/perl
#  Script   pdf-combiner.pl
use strict;
use warnings;
use PDF::Reuse;

prFile('combo.pdf'); # Output.
for (qw/a b c d/) # Inputs.
{
  prImage("result_$_.pdf");
  prPage();
}
prEnd();
You can also use a JAVA toolkit to merge IM generated images into a PDF producing a better PDF than a simpler one that IM will generate...

Code:

#!/bin/bash

# Convert each JPEG to a single-page PDF (quoted so names with spaces work).
for x in ./*.jpeg
do
    echo "$x to ${x}.pdf"
    convert "$x" -quality 75 "${x}.pdf"
done

echo Merging...
java tool.pdf.Merge *.pdf

rcrsn51
Posts: 13096
Joined: Tue 05 Sep 2006, 13:50
Location: Stratford, Ontario

#109 Post by rcrsn51 »

Can you please clarify this? Is the objective to merge some individual PDFs into one file? Or is it to encapsulate some JPEG images into a PDF?

#110 Post by disciple »

disciple wrote:Another program I don't think I've mentioned, particularly for doing ocr on scanned pdfs, is the Windows freeware "pdf-xchange viewer", which apparently runs well in Wine.
I know there are some other topics here about Linux OCR engines and guis, but I thought I'd mention ocrmypdf, which is probably the easiest solution for adding a layer of OCRed text to a raster pdf. It is from the same author as pikepdf, which is basically a python wrapper library for qpdf.

EDIT

FWIW I did some testing with ocrmypdf.
IIRC the ocr backend it uses is tesseract. Recognition was perfect except for white space; so more accurate than pdf-xchange, which I had handy for a comparison.
It shrinks test files from the scanner at my work a bit. If I install jbig2enc (which requires leptonica) it shrinks monochrome test files even more.

I wanted to know how to remove scanned text so I converted to a new pdf using pdftocairo, which removed the text and made the file a lot bigger, so presumably it reencoded without jbig2. Interestingly, if I rerun that output through ocrmypdf the result is even smaller. I was dealing with a very small single page file though, so metadata and stuff might show as a big difference in size which wouldn't be noticeable with a large file.
Last edited by disciple on Thu 24 Oct 2019, 20:04, edited 1 time in total.