[Linux] converting PDF to DOC?
Larry Kagan
linux@flux.org
Sun, 03 Jun 2007 21:53:33 -0400
There is one other option, but it's not pretty: you can use gocr
(optical character recognition).
1. Open the PDF in GIMP.
2. Save it in PGM file format (or use ImageMagick: $ convert mydoc.pdf
mydoc.pgm).
3. Run gocr on the PGM file ($ gocr mydoc.pgm > mydoc.txt).
4. Open mydoc.txt in OpenOffice.org and save it as an MS Word document.
5. Read through and fix all the characters that were not recognized
properly (which could be quite a lot).
6. Re-format the document (bullets, underlines, bold, italic, etc.).
7. Crop, save, and copy the embedded images from the PDF into the new doc file.
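
Steps 2 and 3 could be scripted roughly as follows. This is only a sketch, not something tested against your file: ocr_pdf is a made-up helper name, the 300 DPI density is an assumed value (OCR usually wants more resolution than the default), and the %03d pattern is there because a multi-page PDF yields one image per page.

```shell
#!/bin/sh
# Sketch of steps 2-3: rasterize a PDF with ImageMagick, then OCR each page with gocr.
# ocr_pdf is a hypothetical helper name; it is not part of either tool.
ocr_pdf() {
    pdf=$1
    base=${pdf%.pdf}
    # -density 300 renders at 300 DPI (assumed value, adjust to taste).
    # %03d writes one zero-padded file per page: mydoc-000.pgm, mydoc-001.pgm, ...
    convert -density 300 "$pdf" "$base-%03d.pgm" || return 1
    # Start with an empty text file, then append each page's OCR output in order.
    : > "$base.txt"
    for page in "$base"-*.pgm; do
        [ -e "$page" ] || continue
        gocr "$page" >> "$base.txt"
    done
}
```

Calling ocr_pdf mydoc.pdf should leave the recognized text in mydoc.txt, ready for step 4. ImageMagick relies on Ghostscript to read PDFs, so that needs to be installed as well.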
This is obviously a project, and probably more work than it's worth, but
only you can decide that.
Good Luck
Larry