Dear Jose,
(1) You can go to PDFZone at
http://www.pdfzone.com/products/software/toolinfo_all.asp
for a full list of PDF conversion tools. Some of them may work for your
needs.
(2) You can also use the following ps2ascii script (it requires
Ghostscript), which may work with simple PDF files:
-----------------------------------------------------------------------
#!/bin/sh
# Extract ASCII text from a PostScript file. Usage:
# ps2ascii [infile.ps [outfile.txt]]
# If outfile is omitted, output goes to stdout.
# If both infile and outfile are omitted, ps2ascii acts as a filter,
# reading from stdin and writing on stdout.
trap "rm -f _temp_.err _temp_.out" 0 1 2 15
# Each gs command is one logical line; the backslash continues it.
# "$1" and "$2" are quoted so file names containing spaces work.
if [ $# -eq 0 ]; then
    gs -q -dNODISPLAY -dNOBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f \
        ps2ascii.ps - -c quit
elif [ $# -eq 1 ]; then
    gs -q -dNODISPLAY -dNOBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f \
        ps2ascii.ps "$1" -c quit
else
    gs -q -dNODISPLAY -dNOBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f \
        ps2ascii.ps "$1" -c quit >"$2"
fi
-----------------------------------------------------------------------
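If you have a whole directory of PDF files to convert, a small wrapper
along the following lines might save some typing. This is only a sketch:
the function name convert_all and the CONVERTER variable are made up for
illustration, and it assumes the ps2ascii script above is on your PATH
and is called as "ps2ascii infile outfile".

```shell
#!/bin/sh
# Sketch: run a text extractor over every *.pdf file in a directory.
# CONVERTER is a hypothetical knob (default: ps2ascii). Whatever it names
# must accept two arguments: the input file and the output file.
convert_all() {
    dir=${1:-.}
    converter=${CONVERTER:-ps2ascii}
    for pdf in "$dir"/*.pdf; do
        # If the glob matched nothing, $pdf is the literal pattern: skip it.
        [ -f "$pdf" ] || continue
        txt="${pdf%.pdf}.txt"
        if "$converter" "$pdf" "$txt"; then
            echo "converted: $pdf"
        else
            echo "failed: $pdf" >&2
        fi
    done
}
```

After sourcing the file, "convert_all ./papers" would write a .txt file
next to each .pdf file in ./papers. Setting CONVERTER to another tool
that takes the same two arguments lets you compare converters on the
same set of files.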
(3) You can try the pstotext utility from
http://www.research.digital.com/SRC/virtualpaper/pstotext.html
This also requires Aladdin Ghostscript. It's supposed to work for both
PostScript and PDF conversion, although I've found that it fails for
PDF documents of a complex technical nature.
(4) We are currently evaluating a (commercial) PDF-to-text tool called
Argus (it's in the PDFZone list). It seems to work fairly well, and its
main advantage is that it is configurable. As with other tools, though,
it seems to have problems when the document includes a lot of equations
and tables, and we're trying to find a way around that.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Dr George Demetriou
Dept. of Computer Science       Room:   219
The University of Sheffield     Tel:    +44 (0) 114 2221894
Regent Court                    FAX:    +44 (0) 114 2229237
211 Portobello Street           e-mail: demetri@dcs.shef.ac.uk
Sheffield, S1 4DP, UK
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%