ocrodjvu

ocrodjvu is a wrapper for OCR systems that allows you to perform OCR on DjVu files.

Author: Jakub Wilk

License: GNU General Public License, version 2

Version: 0.7.19 released on 2014-11-11

Acknowledgment:

Since May 2009, ocrodjvu development has been supported by the Polish Ministry of Science and Higher Education's grant no. N N519 384036 (2009–2012).

Example:
$ wget -q 'https://ocropus.googlecode.com/svn/trunk/data/pages/alice_1.png'
$ gm convert -threshold 50% 'alice_1.png' 'alice.pbm'
$ cjb2 'alice.pbm' 'alice.djvu'
$ ocrodjvu --in-place 'alice.djvu'
Processing 'alice.djvu':
- Page #1
$ djvused -e print-txt 'alice.djvu' | head -n10
(page 0 0 2488 3507
 (line 470 2922 1383 2978
  (word 470 2927 499 2976 "1")
  (word 588 2926 787 2978 "Down")
  (word 817 2925 927 2977 "the")
  (word 959 2922 1383 2976 "Rabbit-Hole"))
 (line 465 2803 2073 2856
  (word 465 2819 569 2856 "Alice")
  (word 592 2819 667 2841 "was")
  (word 690 2808 896 2854 "beginning")
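
To pull just the recognized words out of output like the above, a small filter may be enough. The following is a minimal sketch and not part of ocrodjvu: the function name extract_words is made up for illustration, and it assumes that each word entry sits on its own line, as in the output shown here. Escaped characters inside a word are passed through exactly as djvused prints them.

```shell
# Hypothetical helper (not part of ocrodjvu or djvused): read print-txt
# output on standard input and print the text of each (word ...) entry,
# one word per line. Assumes one word entry per input line.
extract_words() {
    # Match: optional indent, "(word", four integer coordinates, the quoted
    # text, then any number of closing parentheses up to the end of the line.
    sed -n 's/^ *(word [0-9]* [0-9]* [0-9]* [0-9]* "\(.*\)")*$/\1/p'
}

# Demonstration on a fragment copied from the example output.
extract_words <<'EOF'
(page 0 0 2488 3507
 (line 470 2922 1383 2978
  (word 470 2927 499 2976 "1")
  (word 588 2926 787 2978 "Down")
  (word 817 2925 927 2977 "the")
  (word 959 2922 1383 2976 "Rabbit-Hole"))
EOF
# prints "1", "Down", "the" and "Rabbit-Hole", one per line
```

With a real file, the same filter could be applied as: djvused -e print-txt 'alice.djvu' | extract_words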