[Linux] converting PDF to DOC?
Adam Glass
linux@flux.org
Fri, 22 Jun 2007 11:32:52 -0400
Robert's method is probably the best you'll be able to do, but note that
it inserts an image of the text into the Word document, not the text
itself. That means the text can't be edited, only read.

Some scanners include optical character recognition (OCR) software. You
could print the PDF, scan the pages, and then run the OCR software on the
scans. Sometimes this works well, sometimes not, but the result is text
that can be pasted into a Word document and then edited.
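
On Robert's question below about the fuzzy text: a minimal sketch, assuming
the fuzziness comes from the rasterization resolution (ImageMagick reads a
PDF at 72 dpi unless told otherwise). The function name pdf_to_jpg and the
values 300 and 90 are only illustrative, not something I have tuned for the
1040 form.

```shell
# Hypothetical helper: rasterize a PDF to JPEG at a higher resolution.
# Usage: pdf_to_jpg FILE.pdf [DPI]     (DPI defaults to 300)
pdf_to_jpg() {
    density="${2:-300}"
    base="${1%.pdf}"
    # -density goes before the input file so it applies while the PDF is read;
    # -quality sets the JPEG compression quality of the output.
    convert -density "$density" "$1" -quality 90 "$base.jpg"
}

# Example call (commented out so the snippet can be sourced safely):
# pdf_to_jpg f1040.pdf
```

A higher density gives larger images, so they will need more shrinking in
step 3, but the text should stay sharper after resizing.
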
--Adam
On 6/21/07, Robert Citek <robert.citek@gmail.com> wrote:
>
> On 06/03/2007 11:21 AM, StephenW wrote:
> > I have a PDF document that a friend wants in DOC format.
> >
> > She is on the other side of the world and is not at all computer
> > literate and therefore does not have Adobe Reader on her computer.
> > Is there any way to convert it other than printing out the document
> > and scanning it back into WORD (25 pages)?
>
> Did you ever find a solution?
>
> Here's one solution which worked for me using the IRS's 1040 pdf as an
> example:
>
> 1) get the pdf from the IRS using your favorite browser or wget
> $ wget http://www.irs.gov/pub/irs-pdf/f1040.pdf
>
> 2) convert to jpg using ImageMagick's convert. This will create one jpg
> for each page.
> $ convert f1040.pdf f1040.jpg
>
> 3) open OpenOffice, insert each image into a new page (Insert >
> Picture > From File ... ), and resize the images to fit the page by
> dragging the image handles.
>
> 4) save as a Word document (File > Save as > File Type > Microsoft Word
> 97/2000/XP)
>
> The text was a bit fuzzy, but it worked. Anyone have any suggestions on
> how to make the text more readable? Perhaps some special option to
> convert?
>
> I did this on a laptop running Ubuntu Dapper Drake 6.06. I had to
> install ImageMagick with 'sudo apt-get install imagemagick'
>
> Odd that she would have Word (a for-purchase product) and not have
> Acrobat Reader (a freely available product).
>
> Regards,
> - Robert
>
> _______________________________________________
> Linux mailing list
> Linux@flux.org
> http://www.flux.org/mailman/listinfo/linux
>