[Linux] converting PDF to DOC?
Larry Kagan
linux@flux.org
Fri, 22 Jun 2007 19:41:06 -0400
> Yes, with a heavy helping of "good luck". Here's a full pipeline:
>
> $ wget 'http://cm.bell-labs.com/cm/cs/who/dmr/pdfs/man11.pdf' -O - |
> convert - pnm:- |
> gocr - |
> head -20 |
> tail
>
> e_DPSlS __ key arlle n_et , , ,
>
> DESCBTRI0h_ ar matnt_ln8 gro_FL or rll_8 c_bInad tnto a 8Ln_
> '_lg archive ftle, JtL maln use lL to cre_te aid
> _ update lljary tlle8 aL uged bY tt.e loader, Jt
> can b8 used, t_ugh, tor any slmll_ __po8e,
>
> kg Lg Dng ch8racter rrn_. the Let gd t , opti0n-
> ally concat_nated wlth v. ?g1 tL the __cnive
> rtle. The n__eL ar8 co\code(0144)atltuent ftles tn the
>
>
Oh look at that... my mother used to read that story to me at bedtime.
;) I have only used this process once, recently, in a proof of concept for
an application. My results were about 80% accurate. The original document
was faxed in and stored as a TIFF. I converted the image to PGM with
convert and then ran it through gocr. Obviously, your results may vary.
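
For anyone who wants to try it, here is a rough sketch of those two steps
as a shell function. It assumes ImageMagick's convert and gocr are both
installed. The function name ocr_fax and the file names are made up for
illustration, and this is not the exact command line I ran.

```shell
#!/bin/sh
# Sketch only: OCR a scanned fax using ImageMagick's convert and gocr.
# ocr_fax and its arguments are hypothetical names, not from the original setup.
ocr_fax() {
    in=$1     # input image, e.g. fax.tiff
    out=$2    # where to write the recognized text, e.g. fax.txt

    # convert renders the image as PGM on stdout; "[0]" selects the first
    # page in case the TIFF holds several. gocr then reads the image from
    # stdin ("-"), the same way the quoted pipeline calls it.
    convert "${in}[0]" pgm:- | gocr - > "$out"
}
```

Calling it would look like `ocr_fax fax.tiff fax.txt`. Piping straight from
convert into gocr avoids leaving an intermediate .pgm file around; writing
the PGM to disk first works just as well if you want to inspect it.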
Larry