[Linux] converting PDF to DOC?

Larry Kagan linux@flux.org
Fri, 22 Jun 2007 19:41:06 -0400


> Yes, with a heavy helping of "good luck".  Here's a full pipeline:
>
> $ wget 'http://cm.bell-labs.com/cm/cs/who/dmr/pdfs/man11.pdf' -O - |
>  convert - pnm:- |
>  gocr - |
>  head -20 |
>  tail
>
> e_DPSlS     __ key arlle n_et , , ,
>
> DESCBTRI0h_   ar matnt_ln8 gro_FL or rll_8 c_bInad tnto a 8Ln_
> '_lg archive ftle,  JtL maln use lL to cre_te aid
> _             update lljary tlle8 aL uged bY tt.e loader,  Jt
> can b8 used, t_ugh, tor any slmll_ __po8e,
>
> kg Lg Dng ch8racter rrn_. the Let gd t  , opti0n-
> ally concat_nated wlth v.  ?g1  tL the __cnive
> rtle.  The n__eL ar8 co\code(0144)atltuent ftles tn the
>
>   

Oh look at that... my mother used to read that story to me at bedtime.
;)  I have only used this process once, recently, in a proof-of-concept for
an application.  My results were about 80% accurate.  The original document
was faxed in and stored as a TIFF.  I converted the image to PGM with
ImageMagick's convert and then ran it through gocr.  Obviously, your
results may vary.
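
For reference, a minimal sketch of that TIFF-to-text process. It assumes ImageMagick's convert and gocr are both on the PATH; the function name ocr_fax and the file names in the usage comment are hypothetical, not the ones from the original proof-of-concept.

```shell
#!/bin/sh
# Sketch only: OCR a scanned fax image with ImageMagick and gocr.
# ocr_fax is a hypothetical helper name.
#   $1 = input image (for example a faxed TIFF)
#   $2 = output text file
ocr_fax() {
    # "pgm:-" tells convert to write a PGM (greyscale) image to stdout;
    # "gocr -" reads that image from stdin and prints the recognized text.
    convert "$1" pgm:- | gocr - > "$2"
}

# Example call (hypothetical file names):
#   ocr_fax fax.tiff fax.txt
```

A multi-page TIFF would probably need to be handled one page at a time; ImageMagick lets you select a single page with an index such as 'fax.tiff[0]'.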

Larry