Tesseract release notes June 30 2009 - V2.04
Integrated patches for portability and to remove some of the
"access" macros.
Removed dependence on lua from the viewer making it a *lot*
faster. Also the viewer now compiles and works (on Linux.)
Fixed the following issues:
1, 63, 67, 71, 76, 79, 81, 82, 84, 106, 108, 111, 112, 128, 129, 130, 133, 135,
142, 143, 145, 146, 147, 153, 154, 160, 165, 169, 170, 175, 177, 187, 192,
195, 199, 201, 205, 209.
This is the last version to support VC++6!
This may also be the last version to compile without leptonica!
Windows version now outputs to stderr by default, fixing a lot of the problems with lack of visible meaningful error messages.

Tesseract release notes April 22 2008 - V2.03
2.02 was unrunnable, due to a last-minute "simple" change.
2.03 fixes the problem and also adds an include check for leptonica
to make it more usable.

Tesseract release notes April 21 2008 - V2.02
Improvements to clustering, training and classifier.
Major internationalization improvements for large-character-set
languages, e.g. Kannada.
Removed some compiler warnings.
Added multipage tiff support for training and running.
Updated graphics output to talk to new java-based viewer.
Added ability to save n-best lists.
Added leptonica support for more file types.
Improved Init/End to make them safe.
Reduced memory use of dictionaries.
Added some new APIs to TessBaseAPI.
Fixed namespace collisions with jpeg library (INT32).
Portability fixes for Windows for new code.
Updates to autoconf system for new code.

Tesseract release notes August 27 2007 - V2.01
Fixed UTF8 input problems with box file reader.
Fixed various infinite loops and crashes in dawg code.
Removed include of config_auto.h from host.h.
Added automatic wctype encoding to unicharset_extractor.
Fixed dawg table too full error.
Removed svn files from tarball.
Added new functions to tessdll.
Increased maximum utf8 string in a classification result to 8.

Tesseract release notes July 17, 2007 - V2.00

First release of the International version.
This version recognizes the following languages:
English - eng
French  - fra
Italian - ita
German  - deu
Spanish - spa
Dutch   - nld
The language codes follow ISO 639-2. The default language is English.
To recognize another language:
tesseract inputimage outputbase -l langcode
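
As a rough sketch of the command form above, the following shell snippet prints one command line per supported language code. The input name page.tif and the page_<code> output names are made up for illustration; nothing is run, the commands are only printed.

```shell
# Print the tesseract command line for one image and one language code.
# The form is the one given in these notes:
#   tesseract inputimage outputbase -l langcode
ocr_command() {
    printf 'tesseract %s %s -l %s\n' "$1" "$2" "$3"
}

# The six language codes shipped with 2.00.
for lang in eng fra ita deu spa nld; do
    # page.tif is a hypothetical input image; the output base is
    # suffixed with the language code so the results do not collide.
    ocr_command page.tif "page_$lang" "$lang"
done
```

Piping the printed lines to sh would run them, assuming tesseract and the matching data files are installed.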

To train on a new language, see separate documentation.
More languages will be appearing over time.

List of changes in this release:
  Converted internal character handling to UTF8.
  Trained with 6 languages.
  Added unicharset_extractor, wordlist2dawg.
  Added boxfile creation mode.
  Added UNLV regression test capability.
  Fixed problems with copyright and registered symbols.
  Fixed extern "C" declarations problem.
  Made some improvements to consistency of accuracy across platforms.
  Added vc++ express support.

Instructions for downloading and building version 2.00.
Things have changed quite a bit since the previous versions so please read carefully.
*All users*
The tarballs are split into pieces.
tesseract-2.00.tar.gz contains all the source code.
tesseract-2.00.<lang>.tar.gz contains the data files for <lang>. You need at least one of these or tesseract will not work.
tesseract-2.00.exe.tar.gz is not for the 'exe' language. It contains the Windows executables. They are built with VC++ express and come with absolutely no warranty. If they work for you then great, otherwise get visual C++ express (and the platform sdk) and build from the source.
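
To make the naming scheme concrete, here is a small sketch that lists the tarballs needed for a source build plus a chosen set of languages. The function name tarballs_for is invented for this example; the file names follow the pattern described above.

```shell
# Print the source tarball plus one data tarball per requested language.
# tarballs_for is a hypothetical helper name, not part of the distribution.
tarballs_for() {
    printf 'tesseract-2.00.tar.gz\n'
    for lang in "$@"; do
        printf 'tesseract-2.00.%s.tar.gz\n' "$lang"
    done
}

# Example: source code plus English and German data files.
tarballs_for eng deu
```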

*Non-windows users*
As with 1.04, this version works with make install.
*New* there is a tesseract.spec for making rpms. (Thanks to Andrew Ziem for the help.)
It might work with your OS if you know how to do that sort of thing.
If you are linking to the libraries, as with Ocropus, there is now a single master
library called libtesseract_full.a.

*Windows users*
If you are building from the sources, there are still dsw and dsp files for vc++6 and also
sln and vcproj files for vc++ express.
The dll has been updated to allow input of non-binary images. (Thanks to Glen of Jetsoft.)


Tesseract release notes May 15, 2007 - V1.04.

=== Windows users only ===
Added a dll interface for windows. Thanks to Glen at Jetsoft for contributing
this. To use the dll, include tessdll.h, import tessdll.lib and put tessdll.dll
somewhere where the system can find it. There is also a small dlltest program
to test the dll. Run with:
dlltest phototest.tif phototest.txt
It will output the text from phototest.tif with bounding box information.
**New for Windows** the distribution now includes tesseract.exe and tessdll.dll
which *might* work out of the box! There are no guarantees as you need
VC++6 versions of mfc and crt (at least) for it to work. (Batteries not
included, and certainly no installshield.)

== Important note for anyone building with make: i.e. anyone except devstudio
users ==
This release includes new standardization for the data directory. To enable
Tesseract to find its data files, you must either run:
./configure
make
make install
to move the data files to the standard place, or put:
export TESSDATA_PREFIX="directory in which your tessdata resides/"
(or the equivalent, such as setenv) in your .profile or similar to set the
environment variable. Note that the directory must end in a /
HAVING tesseract and tessdata IN THE SAME DIRECTORY DOES NOT WORK ANY MORE.
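
The trailing slash is easy to forget, so a .profile might guard against it as in the sketch below. The helper name with_trailing_slash and the path /usr/local/share are assumptions made for this example; substitute the directory that contains your tessdata directory.

```shell
# Append a trailing slash only when one is missing.
# with_trailing_slash is a hypothetical helper, not part of Tesseract.
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Assumed location: the directory that CONTAINS tessdata,
# not the tessdata directory itself.
TESSDATA_PREFIX=$(with_trailing_slash "/usr/local/share")
export TESSDATA_PREFIX
echo "$TESSDATA_PREFIX"
```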

== All users ==
Fixed a bunch of name collisions - mostly with stl.
Made some preliminary changes for unicode compatibility. Includes a new data
file (unicharset) and renaming of the other data files to eng.* to support
different languages.
There are also several other minor bug fixes and portability improvements
for 64 bit, the latest visual studio compiler etc. Thanks to all who have
contributed these fixes.

NOTE: This is likely to be the last English-only release!
Apologies in advance to non-windows users for bloating the distribution with
windows executables. This will probably get fixed in the next release with
the multi-language capability, since that will also bloat the distribution.


Tesseract release notes Feb 2, 2007 - V1.03.
Added mftraining and cntraining. Using an image with a box file, tesseract
generates .tr output files. cntraining runs on the .tr files to make
normproto that lives in tessdata. mftraining runs on the .tr files to
make inttemp and pffmtable in tessdata. These are the main data files
that tesseract uses to recognize characters. At present, the code to make
dictionary files is not yet available, nor are any sample box files or
rebuilt inttemp or documentation to create any of these. Recognition is
still limited to the ASCII set, but when this problem is fixed, documentation
will follow.
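
The two training steps described above might be scripted roughly as follows. This is only a sketch: it assumes that cntraining and mftraining both take the .tr files as plain arguments, the file names font1.tr and font2.tr are invented, and the step that produces the .tr files is left out because these notes do not give its command line. By default the script only prints the commands (dry run).

```shell
# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run both training tools over the same set of .tr files.
# Assumption: both tools accept the .tr files as positional arguments.
train_from_tr() {
    run cntraining "$@"   # makes normproto
    run mftraining "$@"   # makes inttemp and pffmtable
}

# font1.tr and font2.tr are hypothetical file names.
train_from_tr font1.tr font2.tr
```

The resulting normproto, inttemp and pffmtable then belong in tessdata, as described above.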

Added a new API with adaptive thresholding for grey and color images.
See ccmain/baseapi.h/cpp for details. The main program has been converted
to use the API as an example. See main() in ccmain/tesseractmain.cpp for
details. The API is designed to make it easy to add subclasses with ability
to output the bounding boxes etc from the internal structures. The adaptive
thresholding improves accuracy (most of the time) on non-binary images.

Many memory leaks have been fixed. There are no known leaks left from using
the API correctly.

The adaptive classifier was not operating correctly. This bug, and several
others, have been fixed, including poor chopping, an indefinite (if not quite
infinite) loop in the number parser, and a couple of crash bugs. Thanks to
all that have contributed bugs and bug fixes.

It is now possible to build without any of the graphics support to save code
size using #define GRAPHICS_DISABLED. There is also a new EMBEDDED define
for use on operating systems with limited library support.

64-bit and Mac OSX buildability is now included in the mainline source tree.
Thanks to all that have contributed patches and comments to help with that.
1.03 is also endian-independent, apart from the tiff i/o, so if you use
libtiff, the code should run on all platforms, even if you get/create new
data files of a different endianness.

Some of the bug fixes improve accuracy, and so do some of the changes to
DangAmbigs and user-words.

Tesseract release notes, Oct 4 2006 - V1.02.
Removed dependency on aspirin. *All* code is now licensed under Apache2.0.

Tesseract release notes, Sep 7 2006 - V1.01.

Fixes for this release:
Added mfcpch.cpp and getopt.cpp for VC++.
Fixed problem with greyscale images and no libtiff.
Stopped debug window from being used for the usage output.
Fixed load of inttemp for big-endian architectures.
Fixed some Mac compilation issues.

This version should read uncompressed 8 bit grey and 24 bit color tiffs
without having to have libtiff. It does a dumb threshold though, so don't
expect good results from poor contrast or images of natural scenes etc.

If you just run tesseract with no command line args you should now get a
sensible usage message on linux, with or without X-windows.

If you can get it to compile on a PPC Mac, it may now run correctly,
although not all the build issues are fixed yet.

Building Tesseract:
Windows:
Unpack the tar.gz archive
Open tesseract.dsw in DevStudio (preferably version 6, higher versions will be more difficult)
Set Win32 - Release as the active configuration.
Build.
Copy tesseract.exe from bin.rel up one directory level.
Run tesseract phototest.tif phototest
This will create phototest.txt.

Linux:
Unpack the tar.gz archive
./configure
make
Copy tesseract from ccmain up one directory level (or create a symbolic link)
Run tesseract phototest.tif phototest
This will create phototest.txt.