How to run UNLV tests.

The scripts in this directory make it possible to duplicate the tests
published in the Fourth Annual Test of OCR Accuracy.
See http://www.isri.unlv.edu/downloads/AT-1995.pdf
but first you have to get the tools and data from UNLV:

Step 1: To download the images, go to
http://www.isri.unlv.edu/ISRI/OCRtk
and get 3b.tgz, Bb.tgz, Mb.tgz and Nb.tgz.

Step 2: Extract the files. It doesn't really matter where
in your filesystem you put them, but they must go under a common
root so you have directories 3, B, M and N in, for example,
/users/me/ISRI-OCRtk.

Step 3: Reorganize the files.
The lack of tif extensions on the images is inconvenient, so there
is a script to reorganize the data to match the rest of the test
scripts.
cd to /users/me/ISRI-OCRtk or wherever 3, B, M and N ended up and run
/blah/blah/tesseract-ocr/testing/reorgdata.sh 3B
This makes directories doe3.3B, bus.3B, mag.3B and news.3B.
You can now get rid of 3, B, M and N unless you want to get some of the
other scanning resolutions out of them.

Step 4: Download the ISRI toolkit from:
http://www.isri.unlv.edu/downloads/ftk-1.0.tgz

Step 5: If they work for you, use the binaries directly from the bin
directory and put them in tesseract-ocr/testing/unlv;
otherwise build the tools yourself and put them there.

Step 6: cd back to your main tesseract-ocr dir and build tesseract.

Step 7: Run testing/runalltests.sh with the root data dir and testname:
testing/runalltests.sh /users/me/ISRI-OCRtk tess2.0
and go to the gym, have lunch, etc. while it runs.

Step 8: There should be a file
testing/reports/tess2.0.summary that contains the final summarized accuracy
report and comparison with the 1995 results.
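
Since the run in Step 7 is slow, it may be worth checking the data layout
first. The following is a minimal sketch, not part of the tesseract-ocr
testing scripts: check_unlv_root is a hypothetical helper that assumes the
four directory names listed in Step 3 are all the run needs under the root.

```shell
#!/bin/sh
# Hypothetical pre-flight check; NOT one of the tesseract-ocr testing scripts.
# Assumption: after Step 3 the root holds doe3.3B, bus.3B, mag.3B and news.3B.

# check_unlv_root ROOT
# Prints each missing directory to stderr and returns non-zero if any is absent.
check_unlv_root() {
    root=$1
    missing=0
    for d in doe3.3B bus.3B mag.3B news.3B; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected directory is present.
    [ "$missing" -eq 0 ]
}
```

With the function defined in your shell, it could guard the test run, e.g.
check_unlv_root /users/me/ISRI-OCRtk && testing/runalltests.sh /users/me/ISRI-OCRtk tess2.0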