Convert folder of pdf to txt?

Branarchist
Posts: 7
Joined: Tue 27 Nov 2007, 05:04

#1 Post by Branarchist

Hey All:

Trying to change a directory of pdf files into txt files using the pdftotext utility installed with xpdf through the puppyfiles office section. I've tried

ls *.pdf | pdftotext

with no success; it just prints the pdftotext help text. I've used pdftotext to convert single files successfully, so I know that part works. Do I need to use xargs or something?

Thanks,
-- Brandan

MU
Posts: 13649
Joined: Wed 24 Aug 2005, 16:52
Location: Karlsruhe, Germany

#2 Post by MU

ls *.pdf | while IFS= read -r a; do pdftotext "$a"; done

I haven't tried it myself, but this is a common way to batch-process a list of files.
I often use it with programs other than pdftotext.
Mark
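
A variant of the same idea, as a rough sketch: let the shell expand the *.pdf pattern directly instead of piping ls into read, so names with spaces arrive intact. The function name convert_dir is made up for illustration; the only thing assumed about pdftotext is that, given a single argument, it writes foo.txt next to foo.pdf.

```shell
#!/bin/sh
# Sketch: convert every *.pdf in one directory (not recursive).
# convert_dir is a hypothetical helper name, not part of xpdf.
convert_dir() {
	dir=${1:-.}
	for pdf in "$dir"/*.pdf; do
		# With no matches the pattern stays literal, so skip non-files.
		[ -f "$pdf" ] || continue
		# Given one argument, pdftotext writes foo.txt next to foo.pdf.
		pdftotext "$pdf"
	done
}
```

Call it as convert_dir /xxx, or with no argument to work on the current directory.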

muggins
Posts: 6724
Joined: Fri 20 Jan 2006, 10:44
Location: hobart

#3 Post by muggins

Branarchist,

If you save this script as /usr/bin/pdfs2txt, and give it executable permission with chmod +x /usr/bin/pdfs2txt, then it will recursively run pdftotext on any .pdf files found, including any with spaces in their names.

You can either run it with no arguments from inside the target directory, or pass it the target directory as an argument, e.g.

cd /xxx
pdfs2txt

or

pdfs2txt /xxx


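#!/bin/sh
# Recursively convert every PDF under a directory to text with pdftotext.
# Usage: pdfs2txt [directory]   (defaults to the current directory)

if [ "$#" -eq 0 ]; then
	directory=`pwd`
elif [ "$#" -eq 1 ]; then
	# Resolve the argument to an absolute path.
	cd "$1" || exit 1
	directory=`pwd`
else
	echo "wrong number of arguments!"
	exit 1
fi

for file in "$directory"/*
do
	if [ -d "$file" ]; then
		# Recurse into subdirectories; the subshell leaves our own cwd alone.
		( cd "$file" && pdfs2txt )
	elif [ -f "$file" ] && [ "$(head -c 4 "$file")" = "%PDF" ]; then
		# Detect PDFs by their first four bytes rather than by extension.
		filename=${file%.pdf}
		# Quote the input and name the output file explicitly.
		pdftotext -layout -raw -eol unix "$file" "$filename.txt"
	fi
done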
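
If a self-calling script is more than you need, find can do the directory walk. This is only a sketch: convert_tree is a made-up name, and unlike the script above it picks files by their .pdf extension rather than by checking for the %PDF header.

```shell
#!/bin/sh
# Sketch: recursive conversion with find instead of a self-calling script.
# convert_tree is a hypothetical helper name.
convert_tree() {
	# -exec passes each path as a single argument, so spaces are safe.
	# Given only an input file, pdftotext writes foo.txt beside foo.pdf.
	find "${1:-.}" -type f -name '*.pdf' -exec pdftotext -layout {} \;
}
```

Run convert_tree /xxx, or convert_tree on its own to start from the current directory.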
