Lines Matching full:of

3 The Project Gutenberg Etext of LOC WORKSHOP ON ELECTRONIC TEXTS
25 Library of Congress
36 TABLE OF CONTENTS
61 Dorothy Twohig, The Papers of George Washington
63 Maria L. Lebron, The Online Journal of Current Clinical Trials
79 Session IV. Image Capture, Text Capture, Overview of Text and
82 A) Principal Methods for Image Capture of Text:
83 direct scanning, use of microform
98 D) Text Conversion: OCR vs. rekeying, standards of accuracy
99 and use of imperfect texts, service bureaus
125 Appendix III: Directory of Participants
134 opportunity to learn about areas of human activity unknown to me a scant
147 The Workshop on Electronic Texts (1) drew together representatives of
149 experiences, and, in particular, methods of placing and presenting
152 form a new nation, or, to put it another way, the diversity of projects
158 attendees represented a variety of formal, informal, figurative, and
169 * Study of the use of digital materials by scholars and others
172 sequence of presentations.
176 any computerized reproduction or version of a document, book,
180 (2) The Workshop was held at the Library of Congress on 9-10 June
182 The document that follows represents a summary of the presentations
190 discussed in the context of imaging. Anne KENNEY and Lynne PERSONIUS
191 explained how the concept of a faithful copy and the user-friendliness of
194 Cornell project are creating digital image sets of older books in the
204 flexible note as she endorsed the creation and dissemination of a variety
205 of types of digital copies. Do not be too narrow in defining what counts
212 In part, BATTIN's position reflected the unsettled nature of image-format
213 standards, and attendees could hear echoes of this unsettledness in the
214 comments of various speakers. For example, Jean BARONAS reviewed the
215 status of several formal standards moving through committees of experts;
216 and Clifford LYNCH encouraged the use of a new guideline for transmitting
219 Memory project highlighted some of the challenges to the actual creation
220 or interchange of images, including difficulties in converting
222 progress of a master plan for a project at Yale University to convert
225 The Workshop offered rather less of an imaging practicum than planned,
227 KENNEY's presentation and in the discussion of arcana such as
231 (3) Although there is a sense in which any reproductions of
233 field have developed particular guidelines for the creation of
236 (4) Titles and affiliations of presenters are given at the
237 beginning of their respective talks and in the Directory of
243 The sections of the Workshop that dealt with machine-readable text tended
246 presentation on the Text Encoding Initiative's (TEI) implementation of
250 discussion focused on the value of the finished product, what the
255 kinds of markup were distinguished: 1) procedural markup, which
256 describes the features of a text (e.g., dots on a page), and 2)
257 descriptive markup, which describes the structure or elements of a
260 The TEI proponents emphasized the importance of texts to scholarship.
264 a written or printed item (e.g., a particular edition of a book) is
265 merely a representation of the abstraction we call a text. To concern
266 ourselves with faithfully reproducing a printed instance of the text,
268 of a representation ("images as simulacra for the text"). The TEI proponents'
270 for example, photographs of the Acropolis to accompany a Greek text.
272 By the end of the Workshop, SPERBERG-McQUEEN confessed to having been
279 the present conference at the Library of Congress had compelled them to
280 reevaluate their perspective on the usefulness of text as images.
286 the case of the Papers of George Washington, Dorothy TWOHIG explained
287 that the digital version will provide a not-quite-perfect rendering of
290 Members of the American Memory team and the staff of NAL's Text
292 searchable texts. In the case of American Memory, contractors produce
294 "reference" versions of written or printed originals. End users who need
295 faithful copies or perfect renditions must refer to accompanying sets of
296 digital facsimile images or consult copies of the originals in a nearby
297 library or archive. American Memory staff argued that the high cost of
299 access to large parts of its collections.
302 THE MACHINE-READABLE TEXT: METHODS OF CONVERSION
304 Although the Workshop did not include a systematic examination of the
308 merging of multiple optical character recognition systems that will
309 reduce errors from an unacceptable rate of 5 characters out of every
310 1,000 to an unacceptable rate of 2 characters out of every 1,000.
312 Pamela ANDRE presented an overview of NAL's Text Digitization Program and
314 purchased hardware and software capable of performing optical character
318 and/or creating abstracts or summaries of texts. NAL reckoned costs at
319 $7 per page. By way of contrast, Ricky ERWAY explained that American
322 and quality of results, as opposed to methods of conversion. ERWAY noted
326 the most time-consuming aspect of contracting out conversion. ERWAY
333 The topic of dissemination proper emerged at various points during the
336 highlighted the virtues of Internet today and of the network that will
338 vision of an information democracy in which millions of citizens freely
339 find and use what they need. LYNCH noted that a lack of standards
341 by BESSER. LARSEN addressed the issues of network scalability and
342 modularity and commented upon the difficulty of anticipating the effects
343 of growth in orders of magnitude. BROWNRIGG talked about the ability of
346 shortcomings and incongruities of present-day computer networks. For
348 traffic consists of personal communication (E-mail). 2) Large bodies of
352 4) Machine-readable texts are commonplace, but the capability of the
354 A glimpse of a multimedia future for networks, however, was provided by
355 Maria LEBRON in her overview of the Online Journal of Current Clinical
356 Trials (OJCCT), and the process of scholarly publishing on-line.
358 The contrasting form of the CD-ROM disk was never systematically
359 analyzed, but attendees could glean an impression from several of the
361 demonstrated recently published disks, while the descriptions of the
362 IBYCUS version of the Papers of George Washington and Chadwyck-Healey's
363 Patrologia Latina Database (PLD) told of disks to come. According to
365 Migne's definitive collection of Latin texts to machine-readable form.
367 on-line future, the possibility of rolling up one's sleeves for a session
376 Although concerned with the technicalities of production, the Workshop
377 never lost sight of the purposes and uses of electronic versions of
379 the problematical matter of digital preservation, while the TEI proponents
382 She placed the phenomenon of electronic texts within the context of
388 80 percent of these devoted to topics in the social sciences and the
395 Toward the end of the Workshop, Michael LESK presented a corollary to
396 MICHELSON's talk, reporting the results of an experiment that compared
397 the work of one group of chemistry students using traditional printed
403 DALY provided an anecdotal account of the revolutionizing impact of the
404 new technology on his previous methods of research in the field of classics.
406 made by MICHELSON concerning the positive effects of the sudden and radical
409 Susan VECCIA and Joanne FREEMAN delineated the use of electronic
410 materials outside the university. The most interesting aspect of their
420 advice during a lively discussion of this subject. But uncertainty
421 remains concerning the price of copyright in a digital medium, because a
422 solution remains to be worked out concerning management and synthesis of
423 copyrighted and out-of-copyright pieces of a database.
425 As moderator of the final session of the Workshop, Prosser GIFFORD directed
426 discussion to future courses of action and the potential role of LC in
438 * A network version of American Memory should be developed or
441 Given the current dearth of digital data that is appealing and
443 network version of American Memory could do much to help make
446 * Concerning the thorny issue of electronic deposit, LC should
447 initiate a catalytic process in terms of distributed
455 in which LC would be dealing with a minimal number of publishers
457 concept of on-line publishing, determining, among other things,
460 * Since a number of projects are planning to carry out
466 This would reduce the possibility of multiple institutions digitizing
478 attendees learned a great deal, and plan to select and employ elements of
483 On the imaging side, one confronts a proliferation of competing
484 data-interchange standards and a lack of consensus on the role of digital
485 facsimiles in preservation. In the realm of machine-readable texts, one
487 and high costs. These latter problems, of course, represent a special
489 press, "to put the [contents of the] Library of Congress on line." In
490 the words of one participant, there was "no solution to the economic
492 to be a lot of work to transform the information industry, and so far the
505 GIFFORD * Origin of Workshop in current Librarian's desire to make LC's
507 of greater interconnectedness *
510 After welcoming participants on behalf of the Library of Congress,
512 GIFFORD, director for scholarly programs, Library of Congress, located
513 the origin of the Workshop on Electronic Texts in a conversation he had
515 some of the issues faced by AM. On the assumption that numerous other
517 together as many of these people as possible to ask the same questions
518 together. In a deeper sense, GIFFORD said, the origin of the Workshop
519 lay in the desire of the current Librarian of Congress, James H.
520 Billington, to make the collections of the Library, especially those
521 offering unique or unusual testimony on aspects of the American
522 experience, available to a much wider circle of users than those few
524 emphasis of AM, from the outset, has been on archival collections of the
532 that the prospect of greater interconnectedness than ever before would
534 development of systems of shared and distributed responsibilities to
535 avoid duplication and to ensure accuracy and preservation of unique
536 materials; and 3) agreement on the necessary standards and development of
540 participants reflect from the outset upon the sorts of outcomes they
549 FLEISCHHAUER * Core of Workshop concerns preparation and production of
550 materials * Special challenge in conversion of textual materials *
555 Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress,
557 of the work of converting or preparing materials and that the core of
561 term, in which AM not only has wrestled with the issue of what is the
562 best course to pursue but also has faced a variety of technical
565 FLEISCHHAUER remarked AM's endeavors to deal with a wide range of library
567 and pictorial collections of various sorts, especially collections of
568 photographs. In the course of these efforts, AM kept coming back to
570 etc. Text posed the greatest conversion challenge of all. Thus, the
571 genesis of the Workshop, which reflects the problems faced by AM. These
573 and archive business deal with collections made up of fragile and rare
575 bound materials of the late nineteenth century. These are precious
576 cultural artifacts, however, as well as interesting sources of
583 by the question of acceptable level of accuracy. One hundred percent
584 accuracy is tremendously expensive. On the other hand, the output of
589 Questions of quality arose concerning images as well. FLEISCHHAUER
590 contrasted the extremely high level of quality of the digital images in
596 networks have begun to signal that for various forms of media a
600 through the network because of its size. FLEISCHHAUER referred, of
606 There probably would be times when the historical authenticity of an
621 In thinking about issues related to reproduction of materials and seeing
624 surveyed the several groups represented: 1) the world of images (image
625 users and image makers); 2) the world of text and scholarship and, within
627 delightful irony in the fact that some of the most advanced thinkers on
629 3) the network world; and 4) the general world of library science, which
633 Lucile Packard Foundation for its support of the meeting, the American
635 Demonstration Lab, and the Office of Special Events. He expressed the
637 Packard's work and the work of the foundation had sponsored a number of
645 DALY * Acknowledgements * A new Latin authors disk * Effects of the new
646 technology on previous methods of research *
649 Serving as moderator, James DALY acknowledged the generosity of all the
650 presenters for giving of their time, counsel, and patience in planning
651 the Workshop, as well as of members of the American Memory project and
652 other Library of Congress staff, and the David and Lucile Packard
656 in the Humanities (CETH) and the Department of Classics at Rutgers
665 at once the revolutionizing impact of the new technology on his previous
666 methods of research. Had this disk been available two or three years
668 Book 10 of Virgil's Aeneid for Cambridge University Press, he would not
671 concordances to key Latin authors, an almost equal number of lexica to
673 were lacking, numerous editions of authors antedating and postdating Virgil.
675 Nor, when checking each of the average six to seven words contained in
678 mechanical process of flipping through these concordances, lexica, and
681 the Thesaurus Linguae Latinae. Instead of devoting countless hours, or
682 the bulk of his research time, to gathering data concerning Virgil's use
683 of words, DALY--now freed by PHI's Latin authors disk from the
685 would have been able to devote that same bulk of time to analyzing and
689 DALY argued that this reversal in his style of work, made possible by the
691 research. Indeed, even in the course of his browsing the Latin authors
693 capabilities suggested to him several new avenues of research into
694 Virgil's use of sound effects. This anecdotal account, DALY maintained,
702 texts within the context of broader trends within information technology
703 and scholarly communication * Evaluation of the prospects for the use of
704 electronic texts * Relationship of electronic texts to processes of
711 information technology trends affecting the conduct of scholarly
713 * The trend toward greater connectivity * Effects of these trends * Key
714 transformations taking place * Summary of principal arguments *
720 of both information technology and scholarship trends. This
723 relevant to scholarship; 2) the key trends in the use of currently
730 on the scholarly use of technology.
732 MICHELSON sought to place the phenomenon of electronic texts within the
733 context of broader trends within information technology and scholarly
734 communication. She argued that electronic texts are of most use to
741 Evaluation of the prospects for the use of electronic texts includes two
742 elements: 1) an examination of the ways in which researchers currently
744 an analysis of key information technology trends that are affecting the
745 long-term conduct of scholarly communication. MICHELSON limited her
746 discussion of the use of electronic texts to the practices of humanists
749 MICHELSON examined the nature of the current relationship of electronic
751 maintained were, essentially, five processes of scholarly communication
755 generation of scholars and students. This examination would produce a
756 clearer understanding of the synergy among these five processes that
757 fuels the tendency of the use of electronic resources for one process to
758 stimulate its use for other processes of scholarly communication.
760 For the first process of scholarly communication, the identification of
762 supplement traditional word-of-mouth searches for sources among their
763 colleagues with new forms of electronic searching. So, for example,
764 instead of having to visit the library, researchers are able to explore
765 descriptions of holdings in their offices. Furthermore, if their own
768 universities of California, Michigan, Pennsylvania, and Wisconsin.
770 empowerment to scholars by presenting a comprehensive means of browsing
773 The second process of communication involves communication among
774 scholars. Beyond the most common methods of communication, scholars are
775 using E-mail and a variety of new electronic communications formats
779 education networks. Moreover, the global spread of E-mail has been so
784 include more than 700 conferences, with about 80 percent of these devoted
785 to topics in the social sciences and humanities. The rate of growth of
789 humanities were added to this directory of listings. Scholars have
794 education, and gifted and talented education. The appeal to scholars of
797 with peers at the front end of the research process.
799 Interpretation and analysis of sources constitutes the third process of
800 scholarly communication that MICHELSON discussed in terms of texts and
804 ends of this continuum. At one end, quantitative analysis involves the
805 use of mathematical processes such as a count of frequencies and
806 distributions of occurrences or, on a higher level, regression analysis.
807 At the other end of the continuum, qualitative analysis typically
809 interpretation or the building of theory. Aspects of this work involve
810 the processing--either manual or computational--of large and sometimes
811 massive amounts of textual sources, although the use of nontextual
815 Scholars have discovered that many of the methods of interpretation and
820 of advanced technologies, computers can recognize patterns, analyze text,
823 must rely on manual interpretation of data. But if scholars are to use
828 of the numerous textual conversion projects organized by scholars around
831 converting the extant ancient texts of classical Greece. (Editor's note:
832 according to the TLG Newsletter of May 1992, TLG was in use in thirty-two
838 years, humanities scholars have initiated a number of projects to
843 In a second effort to facilitate the sharing of converted text, scholars
845 Humanities (CETH). The center estimates that there are 8,000 series of
849 electronic library, and preparing bibliographic descriptions of the
857 American and French Research on the Treasury of the French Language
860 noting that: 1) increasing numbers of humanities scholars in the library
861 community are recognizing the importance to the advancement of
862 scholarship of retrospective conversion of source materials in the arts
867 The fourth process of scholarly communication is dissemination of
869 research and education networks to engineer a new type of publication:
873 to thirty-six during the past year (July 1991 to June 1992). Most of
879 process. Beyond scholarly journals, MICHELSON remarked the delivery of
884 that the copyright and fees issues impeding the delivery of full text on
887 The final process of scholarly communication is curriculum development
888 and instruction, and this involves the use of computer information
889 technologies in two areas. The first is the development of
892 the analysis of sources in the classroom, etc. The Perseus Project, a
894 civilization, is a good example of the way in which entire curricula are
898 to build upon the work of others, will be resolved before too long.
903 The second aspect of electronic learning involves the use of research and
906 locations and rely on the availability of electronic instructional
908 state departments of education because of their demonstrated capacity to
909 bring advanced specialized course work and an array of experts to many
912 development in many of the remaining states.
917 in general, are being infused into each of the five processes described
919 The use of electronic resources for one process tends to stimulate its
920 use for other processes, because the chief course of movement is toward a
922 includes on-line availability of key bibliographies, scholarly feedback,
927 developing an electronic concordance of the works of Saint Thomas Aquinas
929 beginning of this on-line transition but, for at least some humanities
933 networks are becoming the new medium of scholarly communication. The
939 transformations as a result of the emergence and growing prominence of
942 MICHELSON next turned to the second element of the framework she proposed
943 at the outset of her talk for evaluating the prospects for electronic
945 of scholarly communication over the next decade: 1) end-user computing
950 consumes the computation. The emergence of personal computers, along
951 with a host of other forces, such as ubiquitous computing, advances in
953 of computation to do their own computing, and is thus rendering obsolete
956 The trend toward end-user computing is significant to consideration of
959 competent in the use of electronic media. By avoiding programmer
962 researcher's perspective on the nature of research itself, that is, the
963 kinds of questions that can be posed, the analytical methodologies that
964 can be used, the types and amount of sources that are appropriate for
967 computation are being infused into all processes of humanities
977 collaborate in all phases of research.
979 The combination of the trend toward end-user computing and the trend
980 toward connectivity suggests that the scholarly use of electronic
982 established feature of scholarship. The effects of these trends, along
989 In summary, MICHELSON emphasized four points: 1) A portion of humanities
993 processes of scholarly communication. 3) The humanities scholars'
994 working context is in the process of changing from print technology to
997 changes are occurring in conjunction with the development of a new
1003 texts are best understood in terms of the relationship to other
1004 electronic resources and the growing prominence of network-mediated
1006 to be integrated into the on-line network of electronic resources that
1008 of portions of the scholarly record should be a key strategy as information
1014 VECCIA * AM's evaluation project and public users of electronic resources
1016 implementation of AM * Characteristics of the six public libraries
1017 selected * Characteristics of AM's users in these libraries * Principal
1022 American Memory, Library of Congress, gave a joint presentation. First,
1023 by way of introduction, VECCIA explained her and FREEMAN's roles in
1025 assisted with the evaluation project of AM, placing AM collections in a
1026 variety of different sites around the country and helping to organize and
1027 implement that project. FREEMAN has been an associate coordinator of AM
1029 preparing some of the electronic exhibits and printed historical
1032 of electronic resources. Notwithstanding a fairly structured evaluation
1034 terms of numbers, etc., because they felt it was too early in the
1037 AM is an electronic archive of primary source materials from the Library
1038 of Congress, selected collections representing a variety of formats--
1040 and soon, pamphlets and books. In terms of the design of this system,
1044 teachers so that they may begin using the content of the system at once.
1047 users of AM, limiting her remarks to public libraries, because FREEMAN
1050 involves testing of the Macintosh implementation of AM. Since the
1051 primary goal of this evaluation is to determine the most appropriate
1053 makes evaluation difficult because of the varying degrees of technology
1054 literacy among the sites. AM is situated in forty-four locations, of
1060 VECCIA focused the remainder of her talk on the six public libraries, one
1061 of which doubles as a state library. They represent a range of
1062 geographic areas and a range of demographic characteristics. For
1064 one in a suburban setting. A range of technical expertise is to be found
1065 among these facilities as well. For example, one is an "Apple library of
1070 appreciative of the work that AM has been doing. VECCIA characterized
1072 general readers; of the students who use AM in the public libraries,
1076 people interested in the content and historical connotations of these
1080 people seem comfortable with either IBM or Macintosh, although most of
1084 What kinds of things do users do with AM? In a public library there are
1088 described a patron of a rural public library who comes in every day on
1090 collection image by image. At the end of his hour he makes an electronic
1095 of the older, retired people in the community, who ordinarily would not
1096 use "those things,"--computers. Another example of adult learning in
1097 public libraries is book groups, one of which, in particular, is using AM
1098 as part of its reading on industrialization, integration, and urbanization
1103 to use AM to prepare an exhibit on toys of the past. These two examples
1104 emphasize the mission of the public library as a cultural institution,
1108 numbers came in one afternoon to use AM for entertainment. A number of
1110 Detroit collection, which was essentially a collection of images used on
1111 postcards around the turn of the century. Train buffs are similarly
1112 interested because that was a time of great interest in railroading.
1113 People, it was found, relate to things that they know of firsthand. For
1115 observers reported that the older people with personal remembrances of
1116 the turn of the century were gravitating to the Detroit collection.
1118 integration of electronic tools and ideas--that people learn best when
1125 staff of a major local public library in the South to think about ways to
1126 make its own collection of photographs more accessible to the public.
1133 itself * Computer anxiety * Access and availability of the system *
1134 Hardware * Strengths gained through the use of archival resources in
1139 resource made up of primary materials with very little interpretation,
1144 grades of elementary school through high school, greeted the announcement
1150 several strengths of this type of material in a school environment as
1151 opposed to a highly structured resource that offers a limited number of
1155 environment. There is often some difficulty in developing a sense of
1157 and assume that, because AM comes from the Library of Congress, all of
1158 American history is now at their fingertips. As a result of that sort of
1160 nothing of use to them when they look for one or two things and do not
1162 a sense of what the system contains. Some students grope toward the idea
1163 of an archive, a new idea to them, since they have not previously
1164 experienced what it means to have access to a vast body of somewhat
1169 teachers and students to gain a sense of what it is they are viewing.
1171 know that it is a postcard from the turn of the century, a panoramic
1172 photograph, or even machine-readable text of an eighteenth-century
1175 environment to grasp. Because of that, it occasionally becomes difficult
1178 FREEMAN also noted the obvious fear of the computer, which constitutes a
1184 believe they lack complete control. FREEMAN related the example of
1190 A final question raised by FREEMAN concerned access and availability of
1191 the system. She noted the occasional existence of a gap in communication
1202 A related issue in the school context concerned the number of
1203 workstations available at any one location. Centralization of equipment
1208 Another issue was hardware. As VECCIA observed, a range of sites exists,
1210 first computer for the primary purpose of using it in conjunction with
1214 newer piece of hardware, they must learn how to use that also; at an
1217 computer. All of these small issues raise one large question, namely,
1222 that were gained through the use of archival resources in schools, including:
1226 written reports, a documentary, a turn-of-the-century newspaper--
1234 * This sort of system is overcoming the isolation between disciplines
1246 another positive outcome--a high level of personal involvement of
1250 * Perhaps the most ironic strength of these kinds of archival
1251 electronic resources is that many of the teachers AM interviewed
1256 just that. Ironically, however, this lack of structure produces
1257 some of the confusion to which the newness of these kinds of
1258 resources may also contribute. The key to effective use of archival
1265 DISCUSSION * Nothing known, quantitatively, about the number of
1269 the manner and extent of the use of supporting materials in print
1270 provided by AM to await completion of evaluative study * A listener's
1271 reflections on additional applications of electronic texts * Role of
1278 LESK asked if MICHELSON could give any quantitative estimate of the
1279 number of humanities scholars who must see or want to see the original,
1280 or the best possible version of the material, versus those who typically
1288 genealogical or avocational research and the kind of professional
1294 Council of Learned Societies (ACLS), and what it showed was that 50
1295 percent of humanities scholars at that time were using computers. That
1296 constitutes the extent of our knowledge.
1298 Concerning AM's strategy for orienting people toward the scope of
1301 particularly in the schools, what has been made of their efforts. Within
1304 intended to offer a student user a sense of what a broadside is and what
1309 supporting materials in print provided by AM at the request of local
1313 and in this way gain a better understanding of the contents. But again,
1314 reaching firm conclusions concerning the manner and extent of their use
1318 Administration (NARA) as a result of the increasing emphasis on
1323 role and what it can do. In terms of changes and initiatives that NARA
1327 DALY's opening comments on how he could have used a Latin collection of
1329 would be unwilling to do that. But as he thought of that in terms of the
1330 original meaning of research--that is, having already mastered these texts,
1332 the electronic format made a lot of sense. GREENFIELD could envision
1333 growing numbers of scholars learning the new technologies for that very
1334 aspect of their scholarship and for convenience's sake.
1336 Listening to VECCIA and FREEMAN, GREENFIELD thought of an additional
1337 application of electronic texts. He realized that AM could be used as a
1340 before. Thus, AM is leading them, in theory, to a vast body of
1341 information and giving them a superficial overview of it, enabling them
1342 to select parts of it. GREENFIELD asked if any evidence exists that this
1348 FREEMAN conceded the correctness of GREENFIELD's observation as applied
1350 system, play with it, find some things of interest, and then walk away.
1351 But in the relatively controlled situation of a school library, much will
1353 viewed the situation not as one of fine-tuning research skills but of
1358 FREEMAN concluded that introducing the idea of following one's own path
1359 of inquiry, which is essentially what research entails, involves more
1361 observation that the individual teacher and the use of a creative
1363 Some schools and some teachers are making excellent use of the nature
1364 of critical thinking and teaching skills, she said.
1375 moderator of the "show-and-tell" session. She noted that a
1379 MYLONAS * Overview and content of Perseus * Perseus' primary materials
1381 aspects of Perseus * Tools to use with the Greek text * Prepared indices
1383 close study of words and concepts * Navigating Perseus by tracing down
1388 gave an overview of Perseus, a large, collaborative effort based at
1398 Consisting entirely of primary materials, Perseus includes ancient Greek
1399 texts and translations of those texts; catalog entries--that is, museum
1402 other sources. The number of objects and the objects for which catalog
1403 entries exist are accompanied by thousands of color images, which
1404 constitute a major feature of the database. Perseus contains
1405 approximately 30 megabytes of text, an amount that will double in
1408 navigation easier, the goal being to build part of the electronic
1412 The demonstration of Perseus will show only a fraction of the real work
1413 that has gone into it, because the project had to face the dilemma of
1416 material in? Since Perseus decided to opt for very high quality, all of
1421 compatible with the guidelines of the Text Encoding Initiative (TEI) when
1426 archival forms, consist of the best available slides, which are being
1427 digitized. Much of the catalog material exists in database form--a form
1430 comes in: All of this rich, well-marked-up information is stripped of
1431 much of its content; the images are converted into bit-maps and the text
1438 they appear, none of which information is in Perseus on the CD.
1440 Of the numerous multimedia aspects of Perseus, MYLONAS focused on the
1441 textual. Part of what makes Perseus such a pleasure to use, MYLONAS
1449 primary material in a kind of electronic library, an electronic sandbox,
1461 of an unfamiliar word in Greek after subjecting it to Perseus'
1464 Because vast amounts of indexing support all of the primary material, one
1465 can find out where else all forms of a particular Greek word appear--
1467 since the story of Prometheus has to do with the origins of sacrifice, a
1471 indexed the definitions of its dictionary)--the string sacrifice appears
1472 in the definitions of these sixty-five words. One may then find out
1473 where any of those words is used in the work(s) of a particular author.
1476 All of the indices driving this kind of usage were originally devised for
1477 speed, MYLONAS observed; in other words, all that kind of information--
1478 all forms of all words, where they exist, the dictionary form they belong
1482 are full-text searches in Perseus, much of the work is done behind the
1484 scenes, MYLONAS pointed out that without the SGML forms of the text, it
1485 could not be done effectively. Much of this indexing is based on the
1488 It was found that one of the things many of Perseus' non-Greek-reading
1490 of words and concepts via this kind of English-Greek word search, by which
1494 the words in the Greek but, of course, reading across in the English.
1499 perform a similar kind of index retrieval on the database of
1504 red-figure vase from the Boston Museum of Fine Arts--one can perform this
1505 kind of navigation very easily by tracing down indices. MYLONAS
1506 illustrated several generic scenes of sacrifice on vases. The features
1508 better means of retrieval.
1510 MYLONAS closed by looking at one of the pictures and noting again that
1511 one can do a great deal of research using the iconography as well as the
1513 highly interested in Greek concepts of foreigners and representations of
1514 non-Greeks. So they performed a great deal of research, both with texts
1522 DISCUSSION * Indexing and searchability of all English words in Perseus *
1523 Several features of Perseus 1.0 * Several levels of customization
1526 emphasis of Perseus *
1535 their descriptions--in short, in all of Perseus.
1541 one is interested in and selecting an area of information one is
1544 Since Perseus was developed in HyperCard, several levels of customization
1556 The Perseus Project has an evaluation team at the University of Maryland
1561 to use vast amounts of primary data may not exist. One documented effect
1563 being done by the same person instead of by three different people.
1568 Plutarch), via small gray underscoring (on the screen) of linked
1571 To different extents, most of the production work was done at Harvard,
1572 where the people and the equipment are located. Much of the
1574 the main challenge and the emphasis of Perseus is the gathering of
1577 Systems-building is definitely not the main concern. Thus, much of the
1590 Running PLD under a variety of retrieval softwares * Encoding the
1592 of user documentation * Limitations of the CD-ROM version *
1596 software interpretation of the Patrologia Latina Database (PLD). PLD's
1597 principal focus from the beginning of the project about three-and-a-half
1599 CALALUCA suggested, conversion of the text will be the major contribution
1603 project, but instead had relied upon a great deal of homework and
1604 marketing to accomplish the task of conversion.
1606 Ever since the possibilities of computer-searching have emerged, scholars
1607 in the field of late ancient and early medieval studies (philosophers,
1608 theologians, classicists, and those studying the history of natural law
1609 and the history of the legal development of Western civilization) have
1610 been longing for a fully searchable version of Western literature, for
1611 example, all the texts of Augustine and Bernard of Clairvaux and
1619 trouble--which is far greater with CD-ROM than with the production of
1621 of the hurdles to using electronic information that some publishers have
1624 The PLD project was based on the principle that computer-searching of
1629 The basic rule in converting PLD was to do no harm, to avoid the sins of
1630 intrusion in such a database: no introduction of newer editions, no
1631 on-the-spot changes, no eradicating of all possible falsehoods from an
1633 this discipline, but simply the beginning. The conversion of PLD has
1635 What about networking? Can the rights of a database be protected?
1636 Should one protect the rights of a database? How can it be made
1639 Those converting PLD also tried to avoid the sins of omission, that is,
1640 excluding portions of the collections or whole sections. What about the
1641 images? PLD is full of images; some are extremely pious
1642 nineteenth-century representations of the Fathers, while others contain
1643 highly interesting elements. The goal was to cover all the text of Migne
1644 (including notes, in Greek and in Hebrew, the latter of which, in
1656 PLD as a database that can, and should, be run under a variety of
1658 Consequently, the need to produce a CD-ROM of PLD, as well as to develop
1659 software that could handle some 1.3 gigabytes of heavily encoded text,
1660 developed out of conversations with collection development and reference
1662 pedestrian but also capable of incorporating the most detailed
1664 encoding and conversion of the data will prove the most enduring
1665 testament to the value of the project.
1667 The encoding of the database was also a hard-fought issue: Did the
1675 decisions for him or her. Essentially, the goal of encoding was to
1680 CALALUCA demonstrated a portion of Volume 160, because it had the most
1682 Technologies of Providence, RI, and is called Dynatext. The software
1685 Viewing a table of contents on the screen, the audience saw how Dynatext
1692 CALALUCA also demonstrated how a user can perform a variety of searches
1693 and quickly move to any part of a volume; the look-up screen provides
1696 CALALUCA argued that one of the major difficulties is not the software.
1698 a broad spectrum of computer sophistication, user documentation proves
1702 words of virtus and how one would be able to find its contents throughout
1704 many of the applications in the retrieval software being written for it
1705 will exceed the capabilities of the software employed now for the CD-ROM
1706 version. The CD-ROM faces genuine limitations, in terms of speed and
1707 comprehensiveness, in the creation of a retrieval software to run it.
1722 Various search and retrieval capabilities * Illustration of automatic
1727 IBM prototypes of AM * Multimedia aspects of AM *
1730 A demonstration of American Memory by its coordinator, Carl FLEISCHHAUER,
1731 and Ricky ERWAY, associate coordinator, Library of Congress, concluded
1732 the morning session. Beginning with a collection of broadsides from the
1734 collection in a presentable form at the time of the Workshop, FLEISCHHAUER
1735 highlighted several of the problems with which AM is still wrestling.
1737 broadsides but also the full text with illustrations of a set of
1741 of interpretation to introduce collections. In the present case, the
1749 the "go to" pull-down allowed the user in effect to jump out of Toolbook,
1752 Librarian. This was the Windows version of Personal Librarian, a
1758 other forms of the same root) and a truncated search. One of Personal
1766 While in the text of one of the broadside documents, FLEISCHHAUER
1770 written as on-line records right into one of the Library's mainframe
1772 who massaged them somewhat to display them in the manner shown. One of
1778 of the screen). Although extremely limited in its ability to translate
1780 on screen; a fairly easy thing to do, but it is one of the ways in which
1784 of AM, with accuracy being one of the places where project staff have
1786 FLEISCHHAUER cited the example of the standard of the rekeying industry,
1792 number of people who would look at those images and the number who would
1793 work only with the text. If the implication of LESK's question was
1795 reduced the value of the strategy for images.
1798 demonstrated several images derived from a scan of a preservation
1799 microfilm that AM had made. He awarded a grade of C at best, perhaps a
1800 C minus or a C plus, for how well it worked out. Indeed, the matter of
1802 in particular, scanning from microfilm, was one of the factors that drove
1804 example, was one of the issues that AM in its ignorance had not reckoned
1807 Further, the handling of images of the sort shown, in a desktop computer
1808 environment, involved a considerable amount of zooming and scrolling.
1815 scenario, he proceeded to illustrate other features of Personal Librarian
1818 of the search window pops the words that have been highlighted into the
1824 them with one Boolean operator and then a couple of words in another set
1825 of parentheses and asks for things within so many words of others.
1827 Until they became acquainted recently with some of the work being done in
1828 classics, the AM staff had not realized that a large number of the
1833 more of searching for concepts and ideas than for particular words.
1838 prototype built by AM contains a greater diversity of formats. Echoing a
1844 FLEISCHHAUER demonstrated several additional examples of the prototype
1846 kind of reading-room graphic suggests how one would be able to go around
1847 to different materials. AM contains a large number of photographs in
1855 phonograph records of political speeches that were made during and
1857 hours of audio, as AM has digitized it, which occupy 150 megabytes on a
1859 FLEISCHHAUER proceeded to a transcript of a speech with the audio
1863 Considerable value has been added beyond what the Library of Congress
1870 about the medium, he thought, than about AM's presentation of it.
1874 turn-of-the-century footage seemed to represent the most appropriate
1875 collections from the Library of Congress in motion pictures. These were
1879 contains about fifty titles and pieces of film from that period, all of
1886 DISCUSSION * Using the frame-grabber in AM * Volume of material processed
1887 and to be processed * Purpose of AM within LC * Cataloguing and the
1888 nature of AM's material * SGML coding and the question of quality versus
1897 digitize a single frame of the movie or one of the photographs. It
1908 kind of user than another.
1910 Concerning the total volume of material that has been processed in this
1912 all of them photographic. In the Macintosh environment, for example,
1915 500 political cartoons in the form of drawings. The motion pictures, as
1918 AM also has a manuscript collection, the life history portion of one of
1923 has recycled a fair amount of the work done by LC's Prints and
1925 the 1980s. For example, a special division of LC has tooled up and
1926 thought through all the ramifications of electronic presentation of
1928 The purpose of AM within the Library, it is hoped, is to catalyze several
1929 of the other special collection divisions which have no particular
1934 heavily weighted toward the description of monograph and serial
1935 materials, but is much thinner when one enters the world of manuscripts
1941 Publishing collection of 25,000 pictures. In the case of the Federal
1943 information from twenty-six different states, AM with the assistance of
1944 Karen STUART of the Manuscript Division will attempt to find some way not
1948 is conservative and clings to cataloguing, though of course visitors tout
1950 perhaps one need not have cataloguing or that much of it could be put aside.
1952 The matter of SGML coding, FLEISCHHAUER conceded, returned the discussion
1953 to the earlier treated question of quality versus quantity in the Library
1954 of Congress. Of course, text conversion can be done with 100-percent
1956 a tiny amount will be exposed, whereas permitting lower levels of
1963 TWOHIG * A contrary experience concerning electronic options * Volume of
1964 material in the Washington papers and a suggestion of David Packard *
1965 Implications of Packard's suggestion * Transcribing the documents for the
1966 CD-ROM * Accuracy of transcriptions * The CD-ROM edition of the Founding
1970 Finding encouragement in a comment of MICHELSON's from the morning
1972 options to do their work--Dorothy TWOHIG, editor, The Papers of George
1975 MICHELSON's. TWOHIG emphasized literary scholars' complete ignorance of
1979 After providing an overview of the five Founding Fathers projects
1982 the University of Virginia), TWOHIG observed that the Washington papers,
1983 like all of the projects, include both sides of the Washington
1988 Project (WPP) greeted David Packard's suggestion that the papers of the
1990 great benefit of American scholarship, via CD-ROM.
1993 the transcription of thousands of documents waiting to be put on disk in
1994 the WPP offices. Further, since the costs of collecting, editing, and
1996 running into the millions of dollars, and the considerable staffs
1997 involved in all of these projects were devoting their careers to
1999 revolutionary aspect: Transcriptions of the entire corpus of the
2001 college libraries, even high schools, at a fraction of the cost--
2003 press run of 1,000 of each volume of the published papers at $45-$150 per
2008 papers. TWOHIG stressed, however, that development of the Founding
2013 transcribe the 75,000 or so documents of the Washington papers remaining
2014 to be transcribed onto computer disks. Slides illustrated several of the
2015 problems encountered, for example, the present inability of CD-ROM to
2023 indication of the project's benefits in the ongoing use made by scholars
2024 of the search functions of the CD-ROM, particularly in reducing the time
2025 spent in manually turning the pages of the Washington papers.
2027 TWOHIG next furnished details concerning the accuracy of transcriptions.
2028 For instance, the insertion of thousands of documents on the CD-ROM
2030 original manuscript several times as in the case of documents that appear
2032 check for obvious typos, the misspellings of proper names, and other
2035 this process has met with opposition from some of the editors on the
2038 misspelling of proper names and other relatively minor editorial matters.
2040 Completion of all five Founding Fathers projects (i.e., retrievability
2041 and searchability of all of the documents by proper names, alternate
2042 spellings, or varieties of subjects) will provide one of the richest
2043 sources of this size for the history of the United States in the latter
2044 part of the eighteenth century. Further, publication on CD-ROM will
2050 negotiations with the publishers of the papers. At the moment, the
2052 developed out of the Thesaurus Linguae Graecae project and designed for
2053 the use of classical scholars. There are perhaps 400 IBYCUS computers in
2054 the country, most of which are in university classics departments.
2055 Ultimately, it is anticipated that the CD-ROM edition of the Founding
2064 DISCUSSION * Several additional features of WPP clarified *
2069 intellectual product consists in the electronic transcription of the
2071 marked up; (3) that cataloging and subject-indexing of the material
2081 LEBRON * Overview of the history of the joint project between AAAS and
2084 electronic publishing * How AAAS and OCLC arrived at the subject of
2085 clinical trials * Advantages of the electronic format and other features
2086 of OJCCT * An illustrated tour of the journal *
2089 Maria LEBRON, managing editor, The Online Journal of Current Clinical
2090 Trials (OJCCT), presented an illustrated overview of the history of the
2091 joint project between the American Association for the Advancement of
2095 three years ago and combines the strengths of these two disparate
2096 organizations. In short, OJCCT represents the process of scholarly
2100 with traditional publishing on hard copy--for example, peer review of
2102 noted in particular the implications of citation counts for tenure
2110 economic limitations such as the storage costs of maintaining back issues
2114 not a bulletin board or E-mail, forms of electronic transmission of
2116 of what the journal is attempting to do. OJCCT, which publishes
2117 peer-reviewed medical articles dealing with the subject of clinical
2121 Next, LEBRON described how AAAS and OCLC arrived at the subject of
2123 not require halftones but can satisfy the needs of its audience with line
2125 dissemination of high-quality research results. Clinical trials are
2126 research activities that involve the administration of a test treatment
2130 board, editorial content, and the types of articles it publishes
2134 Among the advantages of the electronic format are faster dissemination of
2135 information, including raw data, and the absence of space constraints
2140 accurate transcription. Other features of OJCCT include on-screen alerts
2141 that allow linkage of subsequently published documents to the original
2142 documents; on-line searching by subject, author, title, etc.; indexing of
2146 days of indexing of all articles published in the journal;
2153 speedy editorial process and the coding of the document using SGML tags
2155 tour of the journal, its search-and-retrieval capabilities in particular,
2157 and the importance of on-screen alerts to the medical profession re
2165 DISCUSSION * Additional features of OJCCT *
2182 published in it. Thus the table of contents grows bigger. The date
2183 of publication serves to distinguish between currently published
2200 employed by many of the hard-copy journals. The process still
2204 maintained on the computer permanently and subscribers, as part of
2206 of everything published during that year; in addition, reprints can
2209 dissemination of the information.
2213 opposed to downloading the whole thing and printing it out; a mix of
2214 both types of users likely will result.
2221 Developing a network application an underlying assumption of the project
2222 * Details of the scanning process * Print-on-demand copies of books *
2223 Future plans include development of a browsing tool *
2233 ago; mass storage and the dramatic savings that result from it in terms of
2236 of information; and, of course, digital technologies, whose applicability to
2243 has provided a significant amount of hardware, the CLASS Project has been
2247 library and information technologies. The focus of the project has been
2250 such books were the result of developments in papermaking around the
2251 beginning of the Industrial Revolution. The papermaking process was
2252 changed so that a significant amount of acid was introduced into the
2255 One of the advantages for technology and for the CLASS Project is that
2256 the information in brittle books is mostly out of copyright and thus
2259 material. Acknowledging the familiarity of those working in preservation
2261 done: the primary preservation technology used today is photocopying of
2262 brittle material. Saving the intellectual content of the material is the
2267 An underlying assumption of the CLASS Project from the beginning was
2271 workstation, and a printer is located in another building. All of the
2273 in the on-line catalogue. In fact, a record for each of these electronic
2274 books is stored in the RLIN database so that a record exists of what is
2279 assumption is that the preferred means of finding the material will be by
2283 because this is a preservation application, is the placing of the pages
2285 be used with some sort of a document feeder, but because of this
2290 that all of the image, all of the information, has been captured. Then,
2295 in effect, the equivalent of preservation photocopies. Thus, the project
2296 has a library of digital books. In essence, CLASS is scanning and
2300 TIFF files on an optical filing system that is composed of a database
2302 stores 64 twelve-inch platters. A very-high-resolution printed copy of
2309 to print on demand--to make their own copies of books. (PERSONIUS
2310 distributed copies of an engineering journal published by engineering
2311 students at Cornell around 1900 as an example of what a print-on-demand
2312 copy of material might be like. This very cheap copy would be available
2315 PERSONIUS then attempted to illustrate a very early prototype of
2317 developed a prototype of a view station that can send images across the
2326 have a printed copy of it.
2333 selecting books for scanning * Compression and decompression of images *
2343 any of those platforms will retrieve books; a further operating
2351 added at the advice of Cornell's legal staff with the caveat that it
2363 appropriately--a kind of autosegmentation that would enable the
2370 in need of preservation, the mathematics library and the mathematics
2395 project is capturing an image that is of sufficient resolution to be
2410 projects, Library of Congress, and moderator of this session, first noted
2411 the blessed but somewhat awkward circumstance of having four very
2415 members of the audience would join the discussion. He stressed the
2416 subtitle of this particular session, "Options for Dissemination," and,
2417 concerning CD-ROMs, the importance of determining when it would be wise
2418 to consider dissemination in CD-ROM versus networks. A shopping list of
2424 on networks, identifying the pool of existing networks, determining how a
2425 producer would choose between networks, and identifying the elements of
2431 Information Service (NTIS), in the case of government. The pros and cons
2437 marketing and dissemination that some would seek. There is the body of
2438 commercial publishers that do possess that kind of expertise in
2441 matters such as distribution and marketing. Such are some of the options
2442 for publishing in the case of CD-ROM.
2444 In the case of technical and design issues, which are also important,
2457 sword * Publishing information on a CD-ROM in the present world of
2459 Examples demonstrated earlier in the day as a set of insular information
2461 necessary * Project NEEDS and the issues of information reuse and active
2462 versus passive use * X-Windows as a way of differentiating between
2464 of networked multimedia information * Need for good, real-time delivery
2465 protocols * The question of presentation integrity in client-server
2469 Clifford LYNCH, director, Library Automation, University of California,
2475 and more subtle. He invited the members of the audience to extrapolate,
2477 sort of a world of electronic information--scholarly, archival,
2482 Putting the issue of CD-ROM in perspective before getting into
2496 slowness but the two-edged sword of having the retrieval application and
2498 typical CD-ROM publication model. It is not a case of publishing data
2499 but of distributing a typically stand-alone, typically closed system,
2502 cases of integrating data on one disk with that on another. Most CD-ROM
2504 in the present world of immature standards and lack of understanding of
2511 institution such as the University of California has vendors who will
2514 magnetic tape, regardless of how many people may use it concurrently,
2524 Given that context, LYNCH described the examples demonstrated as a set of
2535 extremely difficult in the environments under discussion with copies of
2538 The notion of layering also struck LYNCH as lurking in several of the
2540 information archives without a significant amount of navigation built in.
2546 databases as well as a database of Renaissance culture). This ability to
2547 organize resources, to build things out of multiple other things on the
2548 network or select pieces of it, represented for LYNCH one of the key
2549 aspects of network information.
2554 produce a database of engineering courseware as well as the components
2555 that can be used to develop new courseware. In a number of the existing
2556 applications, LYNCH said, the issue of reuse (how much one can take apart
2558 raised the issue of active versus passive use, one aspect of which is
2561 was uncertain how these resources would be used by the vast majority of
2564 LYNCH next said a few words about X-Windows as a way of differentiating
2565 between network access and networked information. A number of the
2573 graphical version of remote log-in across the network. X-type applications
2576 LYNCH next discussed barriers to the distribution of networked multimedia
2577 information. The heart of the problem is a lack of standards to provide
2582 useful tool kit of exchange formats for basic texts is only now being
2583 assembled. The synchronization of content streams (i.e., synchronizing a
2590 highly important in this context is the notion of networked digital
2591 object IDs, the ability of one object on the network to point to another
2600 extensive control over the integrity of the presentation; strange
2602 thought must be given to what guarantees integrity of presentation. Part
2603 of that is related to where one draws the boundaries around a networked
2604 information service. This question of presentation integrity in
2613 costs therefore can be amortized among large numbers of users. In this
2615 place to start, because it tends to have a longer life span than much of
2617 example, that American Memory fits many of the criteria outlined. He
2620 as a way of helping the American educational system.
2622 LYNCH closed by noting that the kinds of applications demonstrated struck
2623 him as excellent justifications of broad-scale networking for K-12, but
2630 DISCUSSION * Dearth of genuinely interesting applications on the network
2631 a slow-changing situation * The issue of the integrity of presentation in
2640 once one goes outside high-end science and the group of those who need
2641 access to supercomputers, there is a great dearth of genuinely
2643 slowly, with some of the scientific databases and scholarly discussion
2645 of Wide Area Information Servers (WAIS) and some of the databases that
2646 are being mounted there. However, many of those things do not seem to
2648 students of LYNCH's acquaintance would not qualify as devotees of serious
2651 Concerning the issue of the integrity of presentation, LYNCH believed
2652 that a couple of information providers have laid down the law at least on
2654 Library of Medicine feels strongly that one needs to employ the
2666 CD-ROM software does not network for a variety of reasons, LYNCH said.
2674 from enough of their customers.
2679 BESSER * Implications of disseminating images on the network; planning
2680 the distribution of multimedia documents poses two critical
2683 implications for networking * Transmission of megabyte size images
2685 trends for compression * A disadvantage of using X-Windows * A project at
2689 Howard BESSER, School of Library and Information Science, University of
2691 broad implications of disseminating them on the network. He argued that
2692 planning the distribution of multimedia documents posed two critical
2693 implementation problems, which he framed in the form of two questions:
2695 have for viewing of the material? and 2) How can one deliver a
2696 sufficiently robust set of information in an accessible format in a
2697 reasonable amount of time? Depending on whether network or CD-ROM is the
2698 medium used, this question raises different issues of storage,
2701 Concerning the design of platforms (e.g., sound, gray scale, simple
2705 workstations would simply have less functionality. He urged members of
2707 layered functionality across a wide variety of platforms.
2710 large a machine to design for situations when the largest number of users
2711 have the lowest level of the machine, and one desires higher
2712 functionality. BESSER then proceeded to the question of file size and
2714 For example, a digital color image that fills the screen of a standard
2715 mega-pel workstation (Sun or Next) will require one megabyte of storage
2716 for an eight-bit image or three megabytes of storage for a true color or
2718 computational procedures in which no data is lost in the process of
2720 maintained) might bring storage down to a third of a megabyte per image,
2721 but not much further than that. The question of size makes it difficult
2722 to fit an appropriately sized set of these images on a single disk or to
2725 With these full screen mega-pel images that constitute a third of a
2727 a standard CD-ROM represents approximately 60 percent of that. Storing
2728 images the size of a PC screen (just 8 bit color) increases storage
2729 capacity to 4,000-12,000 images per gigabyte; 60 percent of that gives
2730 one the size of a CD-ROM, which in turn creates a major problem. One
2737 questions of disk access, remote display, and current telephone
2738 connection speed make transmission of megabyte-size images impractical.
2742 issues of how much one is willing to lose in the compression process and
2744 known is that compression entails some loss of data. BESSER urged that
2746 example, what kind of images are needed for what kind of disciplines, and
2747 what kind of image quality is needed for a browsing tool, an intermediate
2756 offer promise. These issues of compression and decompression, BESSER
2757 argued, resembled those raised earlier concerning the design of different
2758 platforms. Gauging the capabilities of potential users constitutes a
2764 Imagequery, especially the advantages and disadvantages of using
2767 networked system. Finally, BESSER described a project of Jim Wallace at
2776 enough is known concerning the value of images.
2783 to determine quality of images users will tolerate *
2789 LESK argued that the photographers were far ahead of BESSER: It is
2793 licensing agreements on any sort of reasonable terms. LESK had heard
2795 some image in some kind of educational production for $100 per image, but
2798 responded that a consortium of photographers, headed by a former National
2799 Geographic photographer, had started assembling its own collection of
2800 electronic reproductions of images, with the money going back to the
2805 from video. BESSER urged the launching of a study to determine what
2815 LARSEN * Issues of scalability and modularity * Geometric growth of the
2818 Effects of implementation of the Z39.50 protocol for information
2819 retrieval on the library system * The trade-off between volumes of data
2820 and its potential usage * A snapshot of current trends *
2824 of Maryland at College Park, first addressed the issues of scalability
2825 and modularity. He noted the difficulty of anticipating the effects of
2826 orders-of-magnitude growth, reflecting on the twenty years of experience
2827 with the Arpanet and Internet. Recalling the day's demonstrations of
2832 LARSEN focused on the geometric growth of the Internet from its inception
2834 that rapid growth. To illustrate the issue of scalability, LARSEN
2844 building layers of communication protocols, as BESSER pointed out.
2845 By layering both physically and logically, a sense of scalability is
2852 through September 1991--of the number of networks that comprise the
2853 Internet. This growth has been sustained largely by the availability of
2855 log-on (telnet). LARSEN also reviewed the growth in the kind of traffic
2857 of a larger population of users and increasing use per user. Today one sees
2862 LARSEN then illustrated a model of a library's roles and functions in a
2863 network environment. He noted, in particular, the placement of on-line
2867 fundamental questions of networked information in order to build
2871 LARSEN supported the role of the library system as the access point into
2872 the nation's electronic collections. Implementation of the Z39.50
2876 conformant with Z39.50 in a manner that is familiar to University of
2878 secondary content into primary content. (The notion of how one links
2882 projects supporting the ordering of materials across the network, LARSEN
2883 revisited the issue of transmitting high-density, high-resolution color
2884 images across the network and the large amounts of bandwidth they
2888 LARSEN illustrated the trade-off between volumes of data in bytes or
2889 orders of magnitude and the potential usage of that data. He discussed
2891 of information), and what one could do with a network supporting
2893 environment includes a composite of data-transmission requirements,
2896 construction, and operation of multigigabyte networks.
2902 LARSEN concluded by offering a snapshot of current trends: continuing
2903 geometric growth in network capacity and number of users; slower
2904 development of applications; and glacial development and adoption of
2912 radio and the development of MELVYL in 1980-81 in the Division of Library
2913 Automation at the University of California * Design criteria for packet
2917 infrastructure of radios that do not move around *
2921 polled the audience in order to seek out regular users of the Internet as
2930 be extremely frustrating. He suggested that because of economics and
2931 physical barriers we were beginning to create a world of haves and have-nots
2932 in the process of scholarly communication, even in the United States.
2934 BROWNRIGG detailed the development of MELVYL in academic year 1980-81 in
2935 the Division of Library Automation at the University of California, in
2936 order to underscore the issue of access to the system, which at the
2938 network, which at that time entailed use of satellite technology, that is,
2940 from the State of California's microwave system. The installation of
2942 formed part of a larger problem involving politics and financial resources).
2944 of distributing the signal throughout the campus. The solution involved
2946 which combined the basic notion of packet-switching with radio. The project
2955 detail research and development of the technology, how it is being
2961 have been built, he continued, and are in the process of being
2971 people, like those in the audience for the price of a VCR could purchase
2973 to the Internet, and partake of all its services, with no need for an FCC
2975 presented several details of a demonstration project currently taking
2978 nodes running at backbone speeds, and 100 of these nodes will be libraries,
2982 BROWNRIGG next explained Part 15.247, a new rule within Title 47 of the
2983 Code of Federal Regulations enacted by the FCC in 1985. This rule
2985 build a radio that would run at no more than one watt of output power and
2986 use a fairly exotic method of modulating the radio wave called spread
2987 spectrum. Spread spectrum in fact permits the building of networks so
3000 reimplement it so that one can have a WAIS server on a Mac instead of a
3003 project, which has a team of about twelve people, will run through 1993
3006 law. Thus, the need is to create an infrastructure of radios that do not
3023 unlike what falls in most parts of the United States.
3027 SESSION IV. IMAGE CAPTURE, TEXT CAPTURE, OVERVIEW OF TEXT AND
3030 William HOOTON, vice president of operations, I-NET, moderated this session.
3033 KENNEY * Factors influencing development of CXP * Advantages of using
3034 digital technology versus photocopy and microfilm * A primary goal of
3035 CXP; publishing challenges * Characteristics of copies printed * Quality
3036 of samples achieved in image capture * Several factors to be considered
3037 in choosing scanning * Emphasis of CXP on timely and cost-effective
3038 production of black-and-white printed facsimiles * Results of producing
3039 microfilm from digital files * Advantages of creating microfilm * Details
3040 concerning production * Costs * Role of digital technology in library
3044 Anne KENNEY, associate director, Department of Preservation and
3048 would be important, at least for the present generation of users and
3049 equipment. She described three factors that influenced development of
3050 the project: 1) Because the project has emphasized the preservation of
3051 deteriorating brittle books, the quality of what was produced had to be
3060 quality reproduction of a deteriorating original than conventional
3063 loss of quality, as opposed to the situation with light-lens processes,
3065 subsequent generation of an image. 3) A digital image can be manipulated
3066 in a number of ways to improve image capture; for example, Xerox has
3069 reproduction of both. (With light-lens technology, one must choose which
3075 faint documents. 5) On-screen inspection can take place at the time of
3077 substantially reduce the number of retakes required in quality control.
3079 A primary goal of CXP has been to evaluate the paper output printed on
3081 scanned images at a rate of 135 pages a minute. KENNEY recounted several
3082 publishing challenges to represent faithful and legible reproductions of
3084 captured. For example, many of the deteriorating volumes in the project
3086 languages such as Japanese, in which the buildup of characters comprised
3087 of varying strokes is difficult to reproduce at lower resolutions; a
3088 surprising number of them came with annotations and mathematical
3093 and toner requirements for proper adhesion of print to page, as described
3095 the archival equivalent of preservation photocopy.
3097 KENNEY then discussed several samples of the quality achieved in the
3098 project that had been distributed in a handout, for example, a copy of a
3099 print-on-demand version of the 1911 Reed lecture on the steam turbine,
3102 capabilities of scanning to photocopy for a standard test target, the
3106 Conceding the simplistic nature of her review of the quality of scanning
3107 to photocopy, KENNEY described it as one representation of the kinds of
3119 cost-effective manner of printed facsimiles that consisted largely of
3122 the process of compressing [and decompressing] an image--the exact
3125 Telephone) compression. CXP was getting compression ratios of about
3130 version, it appears 1) that other combinations of spatial resolution with
3137 Among CXP's findings concerning the production of microfilm from digital
3140 resulting film was faithful to the image capture of the digital files,
3142 lecture were superior to that of the light-lens film, the resolution
3146 based on definition of quality for a preservation copy. Although making
3148 to investigate the issue over the course of the next year.
3150 KENNEY concluded this portion of her talk with a discussion of the
3151 advantages of creating film: it can serve as a primary backup and as a
3158 * Development and testing of a moderately-high resolution production
3159 scanning workstation represented a third goal of CXP; to date, 1,000
3170 significantly with subsequent iterations of the software from Xerox;
3171 a three-month time-and-cost study of scanning found that the average
3176 scanning sample pages to identify a default range of settings for
3182 of the total pages scanned.
3185 four years, the cost of storing and refreshing the digital files every
3186 four years, and the cost of printing and binding, book-cloth binding, a
3191 Of course, with scanning, in addition to the paper facsimile, one is left
3192 with a digital file from which subsequent copies of the book can be
3193 produced for a fraction of the cost of photocopy, with readers afforded
3194 choices in the form of these copies.
3198 included the means of disseminating reprints of books that are in demand
3205 KENNEY stressed that the focus of CXP has been on obtaining high quality
3206 in a production environment. The use of digital technology is viewed as
3212 ANDRE * Overview and history of NATDP * Various agricultural CD-ROM
3219 presented an overview of NATDP, which has been underway at NAL the last
3221 defined agricultural information as a broad range of material going from
3226 NATDP began in late 1986 with a meeting of representatives from the
3227 land-grant library community to deal with the issue of electronic
3228 information. NAL and forty-five of these libraries banded together to
3237 installed at NAL), NATDP has done a variety of things, concerning which
3242 Over the four years of this project, four separate CD-ROM products on
3245 Thus, NATDP has gained comparative information in terms of those relative
3246 costs. Each of these products contained the full ASCII text as well as
3247 page images of the material, or between 4,000 and 6,000 pages of material
3253 The third phase of NATDP focused on delivery mechanisms other than
3254 CD-ROM. At the suggestion of Clifford LYNCH, who was a technical
3256 Internet and initiated a project with the help of North Carolina State
3257 University, in which fourteen of the land-grant university libraries are
3262 success had led to its extension. (ANDRE noted that one of the first
3265 of choice after a lengthy evaluation.)
3269 1) An arrangement with the American Society of Agronomy--a
3271 about 1908--to scan and create bit-mapped images of its journal.
3276 right to use this material in support of its program.
3279 to try to do the same thing--put the journals of particular interest
3282 2) An extension of the earlier product on aquaculture.
3286 images of Carver's papers, letters, and drawings.
3288 It was anticipated that all of these products would appear no more than
3298 filtering * Image capture from microform: the papers and letters of
3304 details of NATDP, including her primary responsibility, scanning and
3308 processing of the material is nearly identical, in which NATDP is also
3317 (database design occurs in the process of preparing the material for
3319 defining the contents--what will constitute a record, what kinds of
3320 fields will be captured in terms of author, title, etc.); 3) perform a
3321 certain amount of markup on the paper publications. NAL performs this
3322 task record by record, preparing work sheets or some other sort of
3325 Part of this process also involves determining NATDP's file and directory
3334 capture requires greater care because of the quality of the image: it
3336 just for the capture of photographs.
3339 including the title of the book and the title of the chapter, which will
3341 front of a full-text record so that it is searchable.
3344 bound publications in the case of NATDP, however, because often they are
3348 separating of the images. After performing optical character
3353 ZIDAR next illustrated the kinds of adjustments that one can make when
3359 Though adequate for capturing text that is all of a standard size, 300
3360 dpi is unsuitable for any kind of photographic material or for very small
3361 text. Many scanners allow for different image formats, TIFF, of course,
3374 ZIDAR emphasized the importance of de-skewing and filtering as
3378 is extremely time-consuming. The same holds for filtering of
3382 reels from a sixty-seven-reel set of the papers and letters of George
3389 Unfortunately, the process of scanning from microfilm was not an
3394 OCR could not be performed from the scanned images of the frames. The
3398 microfilm, none of which seemed to affect the quality of the image; but
3399 also on none of them could OCR be performed.
3402 factors in mind. ZIDAR noted two factors that influenced the quality of
3403 the images: 1) the inherent quality of the original and 2) the amount of
3411 completed in summer 1991 and by the end of summer 1992 the disk was
3417 account of the nature of the material, and therefore some of the frames
3421 page, which was extremely time-consuming. The quality of the images
3422 scanned from the printout of the microfilm compared unfavorably with that
3423 of the original images captured directly from the microfilm. The
3428 ZIDAR. The type of equipment that one would purchase for a scanning
3441 prove critical to the success or failure of one's system. In addition to
3447 Finally, ZIDAR stressed the importance of buying an open system that allows
3454 digital imagery (POB) * The place of electronic tools in the library of
3455 the future * The uses of images and an image library * Primary input from
3457 hypotheses guiding POB * Use of vendor selection process to facilitate
3459 results of process for Yale * Key factor distinguishing vendors *
3460 Components, design principles, and some estimated costs of POB * Role of
3462 quality and cost * Factors affecting the usability of complex documents
3466 Donald WATERS, head of the Systems Office, Yale University Library,
3467 reported on the progress of a master plan for a project at Yale to
3469 that POB was in an advanced stage of planning, WATERS detailed, in
3470 particular, the process of selecting a vendor partner and several key
3472 He commented first on the vision that serves as the context of POB and
3475 WATERS sees the library of the future not necessarily as an electronic
3479 context of this vision. Several roles for electronic tools include
3480 serving as: indirect sources of electronic knowledge or as "finding"
3482 documents and archives); direct sources of recorded knowledge; full-text
3483 images; and various kinds of compound sources of recorded knowledge (the
3484 so-called compound documents of Hypertext, mixed text and image,
3493 While input will come from a variety of sources, POB is considering
3499 The purpose and scope of POB focus on imaging. Though related to CXP,
3500 POB has two features which distinguish it: 1) scale--conversion of
3506 a modest incremental cost of microfilm. 3) Capturing and storing documents
3512 use a vendor selection process to facilitate a good deal of the
3514 confirming the validity of the plan, establishing the cost of the project
3525 and with the support of the Commission on Preservation and Access, each
3527 project and then to submit a formal proposal for the completion of the
3529 pay the loser. The results for Yale of involving a vendor included:
3530 broad involvement of Yale staff across the board at a relatively low
3533 understanding of the factors that affect corporate response to markets
3535 view of the imaging markets.
3539 internal complexity of the company also was an important factor. POB was
3542 the clear winner. WATERS then described the components of the proposal,
3543 the design principles, and some of the costs estimated for the process.
3563 The costs proposed for start-up assumed the existence of the Yale network
3565 at $1 million over the three phases. At the end of the project, the annual
3571 view of the imaging markets: the management of complex documents in
3574 useful for developing that market because of the qualities of the
3575 material. For example, much of it is out of copyright. The resolution
3576 of key issues such as the quality of scanning and image browsing also
3577 will affect development of that market.
3580 context of rapid change, several factors affect quality and cost, to
3582 levels of resolution that can be achieved. POB believes it can bring
3589 operator for handling material, the ways of integrating quality control
3592 at the point of scanning. Thus, thanks to Xerox, POB anticipates having
3598 of the material, including subsequent OCR, storage, printing, and
3600 This facility, WATERS said, is perhaps the weakest aspect of imaging
3601 technology and the most in need of development.
3603 A variety of factors affect the usability of complex documents in image
3604 form, among them: 1) the ability of the system to handle the full range
3605 of document types, not just monographs but serials, multi-part
3606 monographs, and manuscripts; 2) the location of the database of record
3611 internal structure of the document accessible to the reader; and finally,
3612 5) the physical presentation on the CRT of those documents. POB is ready
3633 * HOLMES commented on the unsuccessful experience of NARA in
3639 rates obtained by substituting the make and model of scanners in
3640 NARA's recent test of an "intelligent" character-recognition product
3641 for a new company. In the selection of hardware and software,
3646 * Danny Cohen and Alan Katz of the University of Southern California
3649 format for Internet distribution of monochrome bit-mapped images,
3658 from scanning microfilm, for example, with that device, that set of
3662 important. Most of the problems discussed today have been solved in
3664 cognizant of various experiences, this is not to say that it will
3667 * At NAL, the through-put rate of the scanning process for paper,
3682 THOMA * Illustration of deficiencies in scanning and storage process *
3690 of Medicine (NLM), illustrated several of the deficiencies discussed by
3691 the previous speakers. He introduced the topic of special problems by
3692 noting the advantages of electronic imaging. For example, it is regenerable
3696 One of the difficulties discussed in the scanning and storage process was
3698 things for maps, medical X-rays, or broadcast television. In the case of
3699 documents, THOMA said, image quality boils down to legibility of the
3700 textual parts, and fidelity in the case of gray or color photo print-type
3705 Better image quality entails at least four different kinds of costs: 1)
3707 greater number of elements costs more; 2) time costs that translate to
3715 But while resolution takes care of the issue of legibility in image
3724 THOMA offered an example of extremely poor contrast, which resulted from
3725 the fact that the stock was a heavy red. This is the sort of image that
3731 age. This was also a case of contrast deficiency, and correction was
3735 but it comes with dark areas. Though THOMA did not have a slide of the
3743 characteristics. The neighbors of a pixel determine where the threshold
3746 THOMA showed an example of a page that had been made deficient by a
3747 variety of techniques, including a burn mark, coffee stains, and a yellow
3748 marker. Application of a fixed-thresholding scheme, THOMA argued, might
3749 take care of several deficiencies on the page but not all of them.
3751 removes most of the deficiencies so that at least the text is legible.
3759 that was distributed by CXP, THOMA noticed that the dithered image of the
3761 extreme example of deterioration in the text in which compounded
3769 nonprint materials with an example of a grayish page from a medical text,
3770 which was reproduced to show all of the gray that appeared in the
3771 original. Dithering provided a reproduction of all the gray in the
3772 original of another example from the same text.
3774 THOMA finally illustrated the problem of bordering, or page-edge,
3777 reasons: 1) the aesthetics of the image; after all, if the image is to
3778 be preserved, one does not necessarily want to keep all of its
3782 point of scanning window the part of the image that is desirable and
3783 automatically turn all of the pixels out of that picture to white.
3792 Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress,
3800 Devoting the remainder of his brief presentation to dithering,
3804 algorithm that forms part of the same Kurzweil Xerox scanner; it
3817 DISCUSSION * Relative use as a criterion for POB's selection of books to
3821 During the discussion period, WATERS noted that one of the criteria for
3825 coherent bodies of material will increase usage or whether POB should
3829 approach would be to provide a large body of intellectually coherent
3831 in microfilm. POB would seek material that was out of copyright.
3836 BARONAS * Origin and scope of AIIM * Types of documents produced in
3837 AIIM's standards program * Domain of AIIM's standardization work * AIIM's
3839 Categories of EIM standardization where AIIM standards are being
3843 Jean BARONAS, senior manager, Department of Standards and Technology,
3856 terminology of standards and of the technology it uses; 2) methods of
3858 users to evaluate and measure quality; 4) the features of apparatus used
3861 BARONAS noted that three types of documents are produced in the AIIM
3868 followed by the number and title of the standard.
3870 BARONAS then illustrated the domain of AIIM's standardization work. For
3871 example, AIIM is the administrator of the U.S. Technical Advisory Group
3877 BARONAS described AIIM's structure, including its board of directors, its
3878 standards board of twelve individuals active in the image-management
3880 its National Standards Council, which is comprised of the members of a
3881 number of organizations who vote on every AIIM standard before it is
3887 BARONAS illustrated the procedures of TC 171, which covers all aspects of
3890 member countries of TC 171 can simultaneously work on the development of
3896 new density ranges and new methods of evaluating film images in the
3906 BARONAS next outlined the four categories of EIM standardization in which
3909 conversion of documents. She detailed several of the main projects of
3910 each: 1) in the category of image transfer and retrieval, a bi-level
3913 the images are compressed using G3 and G4 compression; 2) the category of
3932 BATTIN * The implications of standards for preservation * A major
3935 preservation of the human record * Near-term prognosis for reliable
3938 world and the politics of reproduction * Need to redefine the concept of
3939 archival and to begin to think in terms of life cycles * Cooperation and
3941 of preserving text and image * General principles to be adopted in a
3946 (CPA), addressed the implications of standards for preservation. She
3947 listed several areas where the library profession and the analog world of
3953 development of national and international collaborative programs,
3954 nevertheless, a pervasive mistrust of other people's standards remains a
3957 The zeal to achieve perfection, regardless of the cost, has hindered
3963 with the preservation of the human record, that is, the provision of
3964 access to recorded knowledge in a multitude of media as far into the
3967 yesterday, if set too soon they can hinder creativity, expansion of
3968 capability, and the broadening of access. The characteristics of
3970 imagery. And the nature of digital technology implies continuing
3980 One is the continuing assurance of access to knowledge originally
3985 produced or captured raw data; and 2) the application of digital
3986 technologies to the reformatting of materials originally published on a
3989 The preservation of electronic media requires a reconceptualizing of our
3991 may last far longer than any of us envision today. BATTIN urged the
3992 necessity of shifting focus from assessing, measuring, and setting
3993 standards for the permanence of the medium to the concept of managing
3994 continuing access to information stored on a variety of media and
3995 requiring a variety of ever-changing hardware and software for access--a
4000 1) standards in the real world and 2) the politics of reproduction.
4003 concept of archive and to begin to think in terms of life cycles. In
4005 cavalier attitude toward life cycles. The transient nature of the
4007 concept of life cycles in place of permanency.
4010 to ensure efficient exchange of information. Moreover, during this
4015 In terms of cooperation, particularly in the university setting, BATTIN
4017 directions. The CPA has catalyzed a small group of universities called
4022 instead of waiting for something that is officially blessed. Continuing
4023 to apply analog values and definitions of standards to the digital
4024 environment, BATTIN said, will effectively lead to forfeiture of the
4025 benefits of digital technology to research and scholarship.
4027 Under the second rubric, the politics of reproduction, BATTIN reiterated
4030 expressed more dramatically than in the conversion of brittle books to
4033 information is not lost through reproduction. In the analog world of
4034 photocopies and microfilm, the issue of fidelity to the original becomes
4035 paramount, as do issues of "Whose fidelity?" and "Whose original?"
4038 conducted by the CPA on the problems of preserving text and image.
4039 Discussions with scholars, librarians, and curators in a variety of
4040 disciplines dependent on text and image generated a variety of concerns,
4041 for example: 1) Copy what is, not what the technology is capable of.
4042 This is very important for the history of ideas. Scholars wish to know
4045 presentation. 2) The fidelity of reproduction--what is good enough, what
4046 can we afford, and the difference it makes--issues of subjective versus
4048 users. Restricting the definition of primary user to the one in whose
4050 reality that these printed books have had a host of other users from a
4051 host of other disciplines, who not only were looking for very different
4052 things, but who also shared values very different from those of the
4053 primary user. 4) The relationship of the standard of reproduction to new
4054 capabilities of scholarship--the browsing standard versus an archival
4060 enhanced access, degrees of fidelity, and costs?
4063 reproduction process, and add to the long list of technical standards
4065 analyze the costs that are attached to the different levels of standards
4069 foreseeable future, BATTIN urged adoption of the following general
4072 * Strive to understand the changing information requirements of
4074 the process of research and scholarly communication in order to meet
4088 provides for total convertibility: OCR, scanning of microfilm,
4091 * Work closely with the generators of information and the builders
4092 of networks and databases to ensure that continuing accessibility is
4097 take advantage of that which is being standardized for the rest of
4101 rather than perfecting the longevity of a particular medium.
4111 under the auspices of AIIM. TIFF is a company product, not a standard,
4125 becoming an increasingly important part of the imaging business. Many
4127 and more character data as part of their imaging system. Re the issue of
4130 Does one attempt to eliminate the use of operators where possible?
4142 LESK * Roles of participants in CORE * Data flow * The scanning process *
4143 The image interface * Results of experiments involving the use of
4144 electronic resources and traditional paper copies * Testing the issue of
4154 feature of page segmentation, and 2) the use made of the text and the
4159 of the most important chemistry journals in the United States), CORE is
4161 quarter of the pages by square inch are made up of images of
4162 quasi-pictorial material; dealing with the graphic components of the
4163 pages is extremely important. LESK described the roles of participants
4165 journals on microfilm, and some of the definitions of the files; 2) at
4167 performs experiments on the users of chemical abstracts, and supplies the
4168 indexing and numerous magnetic tapes; 3) Cornell provides the site of the
4176 prefer to have more gray level, because one of the ACS journals prints on
4187 usually of the titles of articles contained in an issue--that derives
4192 LESK next presented the results of an experiment conducted by Dennis Egan
4193 and involving thirty-six students at Cornell, one third of them
4195 majors, and one third graduate chemistry students. A third of them
4197 abstracts on paper. A third received image displays of the pictures of
4209 paper took more than fifteen minutes on average, and yet most of them
4214 In the browsing study, the students were given a list of eight topics,
4215 told to imagine that an issue of the Journal of the American Chemical
4224 was not. (The reason, of course, was that they were performing word
4228 This question also contained a trick to test the issue of serendipity.
4229 The students were given another list of eight topics and instructed,
4230 without taking a second look at the journal, to recall how many of this
4231 new list of eight topics were in this particular issue. This was an
4238 (LESK gave a parenthetical illustration of the learning curve of students
4242 using print, but by the third of the three sessions in the series had
4244 better means of finding what one wants to read; reading speeds, once the
4245 object of the search has been found, are about the same.
4247 Almost none of the students could perform the hard task--the analogous
4248 transformation. (It would require the expertise of organic chemists to
4265 twenty man-years of work in programming and polishing, was not winning,
4273 ERWAY * Most challenging aspect of working on AM * Assumptions guiding
4274 AM's approach * Testing different types of service bureaus * AM's
4276 Additional factors influencing AM's approach to coding * Results of AM's
4278 * Quality control the most time-consuming aspect of contracting out
4282 To Ricky ERWAY, associate coordinator, American Memory, Library of
4283 Congress, the constant variety of conversion projects taking place
4284 simultaneously represented perhaps the most challenging aspect of working
4286 conversion but a tool kit of solutions to apply to LC's varied
4288 process of converting text to machine-readable form, and the variety of
4293 to perform the conversion in-house. Because of the variety of formats and
4294 types of texts, to capitalize the equipment and have the talents and
4299 several types of operations take place at the same time. 2) AM was not a
4302 mattered little to AM. What mattered were cost and accuracy of results.
4304 AM considered different types of service bureaus and selected three to
4305 perform several small tests in order to acquire a sense of the field.
4308 eighteenth-century printed broadsides on microfilm. On none of these
4316 means one or two errors per page. The initial batch of test samples
4326 to retain any of the intellectual content represented by the formatting
4327 of the document (which would be lost if one performed a straight ASCII
4329 but were used without the benefit of document-type definitions. AM found
4333 coding included: 1) the inability of any known microcomputer-based
4334 user-retrieval software to take advantage of SGML coding; and 2) the
4335 multiple inconsistencies in format of the older documents, which
4340 The five text collections that AM has converted or is in the process of
4341 converting include a collection of eighteenth-century broadsides, a
4342 collection of pamphlets, two typescript document collections, and a
4343 collection of 150 books.
4345 ERWAY next reviewed the results of AM's experience with rekeying, noting
4346 again that because the bulk of AM's materials are historical, the quality
4347 of the text often does not lend itself to OCR. While non-English
4350 are nearly incapable of converting handwritten text. Another
4351 disadvantage of working with overseas keyers is that they are much less
4358 retaining the image, even if they perform OCR. Thus, questions of image
4359 format and storage media were somewhat novel to many of them. ERWAY also
4361 their inability to perform text conversion from the kind of microfilm
4365 aspect of contracting out conversion. AM has been attempting to perform
4377 means a third keying or another complete run-through of the text.
4382 in some of the OCR technology, some of the processes, and some of the
4385 the day that the entire contents of LC are available on-line.
4391 conversion * Per page cost of performing OCR * Typical problems
4397 the question of why one attempts to perform full-text conversion: 1)
4398 Text in an image can be read by a human but not by a computer, so of
4408 actual cost per average-size page of approximately $7. NAL scans the
4412 Praising the celerity of her student workers, ZIDAR observed that editing
4428 key it from scratch. NAL has also experimented with partial editing of
4438 concluded that rekeying of text may be the best route to take, in spite
4439 of numerous problems with quality control and cost.
4447 Distinction between the structure of a document and its representation
4452 comments about modifying an image before one reaches the point of
4454 significant amount of redundant data, such as form-type data, numerous
4455 companies today are working on various kinds of form renewal, prior to
4460 terms of either capital investment or service, and determines the quality
4461 of the remainder of one's system, because it determines the character of
4467 everything, including building of the database. The project undertaken
4470 for the software and building of the database. The Acid Rain Project--a
4471 two-disk set produced by the University of Vermont, consisting of
4473 including keying of the text, which was double keyed, scanning of the
4474 images, and building of the database. The in-house project offered
4475 considerable ease of convenience and greater control of the process. On
4482 increase the costs. Thus, conversion of the text, including the coding,
4486 precluded the necessity of going through the request-for-proposal process
4494 ZIDAR detailed the elements that constitute the previously noted cost of
4496 of errors, and spell-checkings, which though they may sound easy to
4497 perform require, in fact, a great deal of time. Reformatting text also
4498 takes a while, but a significant amount of NAL's expenses are for equipment,
4499 which was extremely expensive when purchased because it was one of the few
4500 systems on the market. The costs of equipment are being amortized over
4503 HOCKEY raised a general question concerning OCR and the amount of editing
4504 required (substantial in her experience) to generate the kind of
4507 extend the previous question about the cost-benefit of adding or exerting
4511 original materials quickly from the hands of the people performing the
4524 of document elements, and which they hoped to extend, WEIBEL said that in
4525 fact one can recognize the major elements of a document with a fairly
4526 high degree of reliability, at least as good as OCR. STEVENS drew a
4528 for a document-type definition the structure of the document), and what
4530 forms of emphasis. Thus, two different components are at work, one being
4531 the structure of the document itself (its logic), and the other being its
4539 HOCKEY * Text in ASCII and the representation of electronic text versus
4540 an image * The need to look at ways of using markup to assist retrieval *
4548 what one can do with a text in ASCII and the representation of electronic
4552 use markup and methods of preparing the text to take full advantage of
4553 the capability of the computer. That would lead to a discussion of what
4557 would lead to issues of improving intellectual access.
4559 HOCKEY urged the need to look at ways of using markup to facilitate retrieval,
4564 She pressed the desideratum of going beyond Boolean searches and performing
4565 more sophisticated searching, which the insertion of more markup in the text
4570 electronic text, which was developed through the use of computers in the
4575 compilation of dictionaries, language studies, and language analysis, in
4576 which people have built up archives of text and have begun to recognize
4579 byproduct of what one wants to do, but to structure it inside the computer
4586 keying of texts, more automated ways of developing data * Project ADAPT
4588 Advantages of SGML * Data should be free of procedural markup;
4590 Storage requirements and costs for putting a lot of information on line *
4598 keying of texts, one would like to move toward much more automated ways
4599 of developing data.
4603 indexing and also a little bit of automatic formatting and tagging of
4606 WEIBEL's principal concern at the moment. This project is an example of
4611 WEIBEL cited the Online Journal of Current Clinical Trials as an example
4612 of de novo electronic publishing, that is, a form in which the primary
4613 form of the information is electronic.
4615 Project ADAPT, then, which OCLC completed a couple of years ago and in
4620 retroconversion of materials, will make it possible to accomplish more.
4626 of intelligent character recognition and asserted that what is wanted is
4628 merging of multiple optical character recognition systems that will
4629 reduce errors from an unacceptable rate of 5 characters out of every
4630 1,000 to an unacceptable rate of 2 characters out of every 1,000, but it
4637 the on-line system in virtually all of the reference products at OCLC.
4645 WEIBEL next outlined the critical role of SGML for a variety of purposes,
4646 for example, as noted by HOCKEY, in the world of extremely large
4648 WEIBEL argued that by building the structure of the data in (i.e., the
4649 structure of the data originally on a printed page), it becomes easy to
4651 where the title or author is, or what the sections of that document would be.
4655 The second big advantage of SGML is that it gives one the ability to
4660 the elements of a document.
4666 By keeping one's database free of that kind of contamination, one can
4668 that are not cramped by built-in notions of what should be italic and
4671 subsequent illustrated examples of markup, WEIBEL acknowledged the common
4676 printed version of the document.
4678 WEIBEL next illustrated an extremely cluttered screen dump of OCLC's
4680 the screen. (He noted parenthetically that he had become a supporter of
4681 X-Windows as a result of the progress of the CORE Project.) WEIBEL also
4682 illustrated the two major parts of the interface: 1) a control box that
4683 allows one to generate lists of items, which resembles a small table of
4685 viewer, which is a separate process in and of itself. He demonstrated
4691 Given the constraints of time, WEIBEL omitted a large number of ancillary
4693 what will be required to put a lot of things on line. Since it is
4694 extremely expensive to reconvert all of this data, especially if it is
4699 go back and look at bit-maps of pages), one can get 10,000 journals of
4701 approximately 135 gigabytes of storage, which is not all that much,
4703 be required. WEIBEL calculated the costs of storing this information as
4705 approximately $1 million to buy in terms of hardware. One also needs a
4708 year for a supported terabyte of data.
4714 supporting publication of the journal * Cost of building tagged text into
4721 supported the publication of the journal. Although they are not tagged
4733 cost of building tagged text into the database, which is small.
4739 of recognizing that all representation is encoding * Dealing with
4740 complicated representations of text entails the need for a grammar of
4741 documents * Variety of forms of formal grammars * Text as a bit-mapped
4744 reusability and longevity of data * TEI conformance explicitly allows
4745 extension or modification of the TEI tag set * Administrative background
4746 of the TEI * Several design goals for the TEI tag set * An absolutely
4747 fixed requirement of the TEI Guidelines * Challenges the TEI has
4749 issue of reproducibility or processability * The issue of images as
4750 simulacra for the text redux * One's model of text determines what one's
4755 Text Encoding Initiative (TEI), University of Illinois-Chicago, first drew
4762 argued, leads to the recognition of two things: 1) The topic description
4764 of pros and cons of text-coding unless what one means is pros and cons of
4766 computer without some sort of encoding; images are one way of encoding text,
4768 information loss, that is, there is no perfect reproduction of a text that
4770 What is the most useful representation of text for a serious work?
4771 This depends on what kind of serious work one is talking about.
4774 information and fairly complex manipulation of the textual material.
4777 as part of one's representation of the text. Thus, one needs to store the
4778 structure in the text. To deal with complicated representations of text,
4779 one needs somehow to control the complexity of the representation of a text;
4780 that means one needs a way of finding out whether a document and an
4781 electronic representation of a document is legal or not; and that
4782 means one needs a grammar of documents.
4784 SPERBERG-McQUEEN discussed the variety of forms of formal grammars,
4786 argued that these grammars correspond to different models of text that
4787 different developers have. For example, one implicit model of the text
4791 distinguished several kinds of text that have a sort of hierarchical
4799 displays was the model of text as a bit-mapped image, an image of a page,
4804 electronic form. Many of their problems stem from the fact that they are
4806 page, thus making them representations of representations.
4808 In this situation of increasingly complicated textual information and the
4810 of the need for good textual grammars), one has the introduction of SGML.
4813 general document-type declarations that can handle all sorts of text.
4815 will ensure the kind of reusability and longevity of data discussed earlier.
4816 It offers a way to stay alive in the state of permanent technological
4820 that do some work in controlling the complexity of the textual object but
4822 Fundamental to the notion of the TEI is that TEI conformance allows one
4826 SPERBERG-McQUEEN next outlined the administrative background of the TEI.
4828 for the encoding and interchange of machine-readable text. It is
4831 Literary and Linguistic Computing. Representatives of numerous other
4833 of affiliated projects that have provided assistance by testing drafts of
4836 Among the design goals for the TEI tag set, the scheme first of all must
4837 meet the needs of research, because the TEI came out of the research
4840 emerging standards. In 1990, version 1.0 of the Guidelines was released
4844 been the lack of adequate internal or external documentation for many
4846 contain few fixed requirements, but one of them is this: There must
4848 1) a bibliographic description of the electronic object one is talking
4852 Version 2.0 of the Guidelines was scheduled to be completed in fall 1992
4859 level of markup that people are using now to tag only chapter, section,
4862 detailed markup which many people foresee as the future destination of
4864 present home of numerous electronic texts in specialized areas.
4867 unable to support the kind of applications that draw people who have
4876 The question of how people will tag the text is in large part a function
4877 of their reaction to what SPERBERG-McQUEEN termed the issue of
4879 one wants to work with. Perhaps a more useful concept than that of
4880 reproducibility or recoverability is that of processability, that is,
4885 SPERBERG-McQUEEN returned at length to the issue of images as simulacra
4887 than images of pages of particular editions of the text are needed,
4891 of the text such as its layout on the page, which is not always
4893 pieces of information such as the very important lexical ties between the
4894 English and Latin versions of Comenius's bilingual text, for example.
4896 what a scanned image of the text will accomplish. For example, in order
4897 to study the transmission of texts, information concerning the text
4900 much of the information that one would need if studying those books as
4905 because they do not show up, and on a couple of the marginal marks one
4906 loses half of the mark because the pen is very light and the scanner
4907 failed to pick it up, and so what is clearly a checkmark in the margin of
4908 the original becomes a little scoop in the margin of the facsimile.
4910 also true of light-lens photography, and are remarked here because it is
4912 image of this page with good contrast, we are not replacing the
4917 beyond those who spend all of their time studying text, because one's
4918 model of text determines what one's software can do with a text. Good
4923 provide software with a better model of the text they can make a killing.
4928 DISCUSSION * Implications of different DTDs and tag sets * ODA versus SGML *
4932 Neither AAP (i.e., Association of American Publishers) nor CALS (i.e.,
4935 handle that. Given this state of affairs and assuming that the
4937 other two types, then an institution like the Library of Congress, which
4938 might receive all of their publications, would have to be able to handle
4939 three different types of document definitions and tag sets and be able to
4944 Much of the ODA standard is easier to read and clearer at first reading
4955 for all kinds of information, though more megalomaniacal in attempting to
4956 cover all sorts of documents. The other advantage is that the model of
4957 text represented by SGML is simply an order of magnitude richer and more
4958 flexible than the model of text offered by ODA. Both offer hierarchical
4959 structures, but SGML recognizes that the hierarchical model of the text
4960 that one is looking at may not have been in the minds of the designers,
4963 ODA is not really aiming for the kind of document that the TEI wants to
4964 encompass. The TEI can handle the kind of material ODA has, as well as a
4965 significantly broader range of material. ODA seems to be very much
4973 Responsibilities of a publisher * Reproduction of Migne's Latin series
4980 Chadwyck-Healey, Inc., spoke from the perspective of a publisher re
4981 text-encoding, rather than as one qualified to discuss methods of
4986 text files (such as PLD), one cannot avoid making personal judgments of
4990 notions have become axioms for him in the consideration of future sources
4992 as any other kind of publishing, and questions of if and how to encode
4993 the data are simply a consequence of that prior decision; 2) all
4998 of it. Finding the specialist to advise in this process is the core of
5001 responsibility of a publisher is to represent the desires of scholars and
5007 production of the Patrologia series in the mid-nineteenth century.
5012 for theologians. It is a bedrock source for the study of Western
5016 offered direct judgments on the question of appropriateness of these
5023 community asserted the need for normative tagging structures of important
5026 impact of electronic text sources on 80 or 90 or 100 doctoral
5029 one edition of Ambrose's De Anima, and they also understand that the
5033 schemes every single discrete area of a text that might someday be
5041 bounds of private research, though exporting tag files from a CD-ROM
5046 it of the interchangeability and portability these important texts should
5048 searching require care in text selection and strongly support encoding of
5054 Chadwyck-Healey and the board it offers the widest possible array of
5056 by urging the encoding of all important text sources in whatever way
5060 final release of the TEI Guidelines.)
5066 The TEI and the issue of interchangeability of standards * A
5068 LC in the event that a multiplicity of DTDs develops * Producing images
5075 favor of creating texts with markup and on trends in encoding. In the
5078 millions of words of data. It therefore becomes important to consider
5081 toward this end include building on a computer version of a dictionary
5083 about the semantic structure or semantic field of a word, its grammatical
5087 in creating: 1) machine-readable versions of dictionaries that can be
5091 a dynamic tool for searching mechanisms; 2) large bodies of text to study
5095 seen much interest in studying the structure of printed dictionaries
5097 many words from those is only partial, one or two definitions of the
5098 common or the usual meaning of a word, and then numerous definitions of
5102 current interest in developing large bodies of text in computer-readable
5106 compilation of 100 million words of British English: about 10 percent of
5109 adjectives, or other parts of speech. This tagging can then be used by
5110 programs which will begin to learn a bit more about the structure of the
5114 refine the tagging process and thus the bigger body of text one can build
5118 recommended the development of software tools that will help one begin to
5120 images of that text in that format and to using more intelligence to help
5123 HOCKEY posited the need to think about common methods of text-encoding
5124 for a long time to come, because building these large bodies of text is
5140 SPERBERG-McQUEEN replied that the published drafts of the TEI had met
5142 one to handle X or Y or Z. Particular concerns of the affiliated
5143 projects have led, in practice, to discussions of how extensions are to
5144 be made; the primary concern of any project has to be how it can be
5148 from the beginning, because none of it is required and very little is
5153 projects in a set of twenty TEI-conformant projects will not necessarily
5154 tag the material in the same way. One result of the TEI will be that the
5155 easiest problems will be solved--those dealing with the external form of
5158 the adoption of a common notation, the differences in the underlying
5159 conceptions of what is interesting about texts become more visible.
5160 The success of a standard like the TEI will lie in the ability of
5161 the recipient of interchanged texts to use some of what it contains
5168 the example of the MARC records, namely, the formats that are the same
5172 STEVENS opined that the producers of the information will set the terms
5174 of their products), creating a situation that will be problematical for
5175 an institution like the Library of Congress, which will have to deal with
5176 the DTDs in the event that a multiplicity of them develops. Thus,
5185 of many people that merely by producing images, POB was not really
5191 with a set of images.
5194 consisting wholly of images. At first sight, organizing graphic images
5196 advantages of the scheme WATERS described would be precisely that
5197 ability to move into something that is more of a multimedia document:
5198 a combination of transcribed text and page images. WEIBEL concurred in
5206 OCR or even just through keying. For WATERS, the labor of composing the
5207 document and saying this set of documents or this set of images belongs
5212 structure of the documents, though. They have some recommendations about
5216 That in no way contradicts the use of AAP tag sets.
5220 needed, and a fairly long critique of the naming conventions, which has
5221 led to a very different style of naming in the TEI. He stressed the
5222 importance of the opposition between prescriptive markup, the kind that a
5235 environment * Review of copyright law in the United States * The notion
5236 of the public good and the desirability of incentives to promote it *
5238 of copyright holders * Publishers' concerns in today's electronic
5239 environment * Compulsory licenses * The price of copyright in a digital
5246 Marybeth PETERS, policy planning adviser to the Register of Copyrights,
5247 Library of Congress, made several general comments and then opened the
5248 floor to discussion of subjects of interest to the audience.
5250 Having attended several sessions in an effort to gain a sense of what
5255 then, from a copyright point of view, one is creating something and
5259 immediately arises about the status of the materials in question.
5266 in other countries. Thus, one must consider all of the places a
5269 demanding discussion of what one is doing.
5274 dissemination of intellectual works for the good of society as a whole;
5279 emotional issue. The United States has never accepted the notion of the
5280 natural right of an author so much as it has accepted the notion of the
5281 public good and the desirability of incentives to promote it. This state
5282 of affairs, however, has created strains on the international level and
5283 is the reason for several of the differences in the laws that we have.
5284 Today the United States protects almost every kind of work that can be
5285 called an expression of an author. The standard for gaining copyright
5288 minimal amount of authorship. One can also acquire copyright protection
5289 for making a new version of preexisting material, provided it manifests
5290 some spark of creativity.
5296 results of a process called declicking, in which one mechanically removes
5298 hand, the choice to record a song digitally and to increase the sound of
5299 violins or to bring up the tympani constitutes the results of conversion
5301 the United States, one generally needs the permission of the copyright
5303 -material is a matter of contract. In the absence of a contract, the
5311 work done by a federal employee as part of his or her official duties is
5313 of doubt concerning whether or not the work is in the public domain
5320 issue in the United States, they may be in different parts of the world,
5321 where most countries previously employed a copyright term of the life of
5324 PETERS next reviewed the economics of copyright holding. Simply,
5325 economic rights are the rights to control the reproduction of a work in
5326 any form. They belong to the author, or in the case of a work made for
5329 one of the most significant rights of authors, particularly in an
5333 all rights of distribution are extinguished with the sale of that copy.
5334 The key is that it must be sold. A number of companies overcome this
5337 of a work. The fourth right, and one very important in a digital world,
5338 is a right of public performance, which means the right to show the work
5339 sequentially. For example, copyright owners control the showing of a
5341 side of public performance is something called the right of public
5343 to very limited visual works of art, but in theory may apply under
5344 contract and other principles. Moral rights may include the right of an
5345 author to have his or her name on a work, the right of attribution, and
5346 the right to object to distortion or mutilation--the right of integrity.
5349 preservation; to use of material for scholarly and research purposes when
5350 the user does not make multiple copies; and to the generation of
5351 facsimile copies of unpublished works by libraries for themselves and
5353 distributor of the product for the entire world. In today's electronic
5358 example, from access and use. Hence, the development of site licenses
5359 and other kinds of agreements to cover what publishers believe they
5363 Noting that the United States is a member of the Berne Convention and
5365 She also defined compulsory licenses. A compulsory license, of which the
5370 succeeded in providing for use of a work. Often overlooked when one
5373 PETERS, the price of copyright in a digital medium, whatever solution is
5387 practical matter, if one believes she or he has made enough of those
5404 employee, if written as part of official duties, is not
5406 by a National Institutes of Health grantee (i.e., someone who
5410 author retains copyright. If a provision of the contract, grant, or
5415 * An enhanced electronic copy of a print copy of an older reference
5417 material is a purely mechanical rendition of the original work, and
5421 it. For example, Congress recently passed into law the concept of
5427 * Concerning whether or not the United States keeps track of when
5441 Society of Composers, Authors, and Publishers), and BMI (i.e., Broadcast
5443 due. Of course, people ought not to copy a creative product without
5444 paying for it; there should be some compensation. But the truth of the
5447 until he becomes a big guy. That is true of every author, every
5448 composer, everyone, and, unfortunately, is part of life.
5450 Copyright always originates with the author, except in cases of works
5459 and work out deals. With regard to use of a work, it usually is much
5467 The notion of copyright law is that it resides with the individual, but
5474 original publisher to try to control all of the versions and all of the
5485 Desiderata in planning the long-term development of something * Questions
5486 surrounding the issue of electronic deposit * Discussion of electronic
5487 deposit as an allusion to the issue of standards * Need for a directory
5488 of preservation projects in digital form and for access to their
5489 digitized files * CETH's catalogue of machine-readable texts in the
5491 Need for LC to deal with the concept of on-line publishing * LC's Network
5492 Development Office exploring the limits of MARC as a standard in terms
5493 of handling electronic information * Magnitude of the problem and the
5496 point * Development of a network version of AM urged * A step toward AM's
5497 construction of some sort of apparatus for network access * A delicate
5498 and agonizing policy question for LC * Re the issue of electronic
5499 deposit, LC urged to initiate a catalytic process in terms of distributed
5502 people to think through long-term cooperation * Clarification of the
5506 In his role as moderator of the concluding session, GIFFORD raised two
5508 commonalities among those of us that have been here for two days so that
5509 we can see courses of action that should be taken in the future? And, if
5512 the Library of Congress in all this? Of course, the Library of Congress
5513 holds a rather special status in a number of these matters, because it is
5517 Describing himself as an uninformed observer of the technicalities of the
5522 dimension, the accessibility of the processability, the portability of
5528 had contributed anything in the way of bringing together a different group
5529 of people from those who normally appear on the workshop circuit.
5533 coming together of people working on texts and not images. Attempting to
5537 stage it can be interpreted into text, and find a common way of building
5542 In planning the long-term development of something, which is what is
5544 of discussing the technical aspects of how one does it but particularly
5545 of thinking about what the people who use the stuff will want to do.
5547 electronic text or material that nobody ever thought of in the beginning.
5549 LESK, in response to the question concerning the role of the Library of
5550 Congress, remarked the often suggested desideratum of having electronic
5551 deposit: Since everything is now computer-typeset, an entire decade of
5555 absence of PETERS, GIFFORD replied that the question was being
5556 actively considered but that that was only one dimension of the problem.
5557 Another dimension is the whole question of the integrity of the original
5563 software, one had to submit a paper copy of the first and last twenty
5564 pages of code--something that represented the work but did not include
5566 measure, LC has claimed the right to demand electronic versions of
5572 keep track of the appropriate computers, software, and media? The situation
5579 Macintosh version and the IBM-compatible version of software. It does
5589 connection, GIFFORD reiterated the need to work out some sense of
5590 distributive responsibility for a number of these issues, which
5595 to serve as a depository of tapes in an electronic manuscript standard.
5603 ANDRE viewed this discussion as an allusion to the issue of standards.
5612 plans of a number of projects to carry out preservation by creating
5616 the impression that many of these institutions would be willing to make
5622 scrutiny because it seemed to be connected to some of the basic issues of
5623 cataloging and distribution of records. It would be foolish, given the
5624 amount of work that all of us have to do and our meager resources, to
5633 somebody who is thinking of performing preservation activity on that work
5636 for preservation purposes but for the convenience of people looking for
5637 this material. She endorsed LYNCH's dictum that duplication of this
5640 HOCKEY informed the Workshop about one major current activity of CETH,
5641 namely a catalogue of machine-readable texts in the humanities. Held on
5643 to digitized images of text. She is exploring ways to improve the
5651 the National Copyright Depository of Electronic Materials. Of course
5654 to that set of problems and returned the discussion to the issue raised
5655 by LYNCH--whether or not putting the kind of records that both BATTIN and
5658 some kind of directory for these kinds of materials. In a situation
5661 suggested, RLIN is helpful, but it is not helpful in the case of a local,
5664 patron, even though one did not digitize it, if it is out of copyright.
5666 amount of real-time look-up, which would be awkward at best, or
5673 That represents LC's new form of library loan. Perhaps LC's new on-line
5674 catalogue is an amalgamation of all these catalogues on line. LYNCH
5687 this catalogue, because that raises the question of what constitutes a
5689 in OCLC's Office of Research is also wrestling with this particular
5691 contended that a majority of texts in the humanities are in the hands
5692 of either a small number of large research institutions or individuals
5698 issue involved the responsibility of a publisher. The fact that someone
5707 LEBRON expressed puzzlement at the variety of ways electronic publishing
5708 has been viewed. Much of what has been discussed throughout these two
5711 Sooner or later LC will have to deal with the concept of on-line
5718 sometimes realized. Lacking clear answers to all of these questions
5720 in helping to define some of them for quite a while.
5723 among other things, to explore the limits of MARC as a standard in terms
5724 of handling electronic information. GREENFIELD also noted that Rebecca
5726 Information Science (ASIS) summarizing several of the discussion papers
5727 that were coming out of the Network Development Office. GREENFIELD said
5729 of feedback received today concerning the difficulties of identifying and
5731 would be aware of that and somehow contribute to that conversation.
5733 Noting two of LC's roles, first, to act as a repository of record for
5740 limited set of formats, and then develop mechanisms for allowing people
5743 LC does that with most of its bibliographic records, BESSER said, which
5745 as most of LC's books are available in some form through interlibrary
5754 the magnitude of the problem of what to keep and what to select. GIFFORD
5761 direction and defining how LC could do so, for example, in areas of
5762 standardization or distribution of responsibility.
5764 FLEISCHHAUER added that AM was fully engaged, wrestling with some of the
5765 questions that pertain to the conversion of older historical materials,
5766 which would be one thing that the Library of Congress might do. Several
5769 networking of bibliographic information, as well as preservation itself.
5772 database, LYNCH urged development of a network version of AM, or
5773 consideration of making the data in it available to people interested in
5774 doing network multimedia. On account of the current great shortage of
5776 problems, this course of action could have a significant effect on making
5780 LC's Office of Information Technology Services that attempts to associate
5781 digital images of photographs with cataloguing information in ways that
5783 construction of some sort of apparatus for access. Further, AM has
5796 Returning the discussion to what she viewed as the vital issue of
5798 process in terms of distributed responsibility, that is, bring together
5801 issues of how we deal with the management of electronic information will
5808 a minimal number of publishers and minimal copyright problems.
5810 GRABER remarked on the recent development in the scientific community of a
5813 in the humanities. Although the National Library of Medicine found only
5818 troubles with the commercial publishers of electronic media in acquiring
5820 they would not be able to cover their costs and would lose control of
5827 request one copy of that, or two copies if it is the only version, and
5828 can request copies of software, but that fails to address magazines or
5831 GIFFORD acknowledged the thorny nature of this issue, which he illustrated
5832 with the example of the cumbersome process involved in putting a copy of a
5835 of Workshop participants in thinking through a number of these problems.
5841 her perspective on the usefulness of text as images. MYLONAS framed the
5842 issues in a series of questions: How do we acquire machine-readable
5843 text? Do we take pictures of it and perform OCR on it later? Is it
5845 FLEISCHHAUER agreed with MYLONAS's framing of strategic questions, adding
5846 that a large institution such as LC probably has to do all of those
5853 of the health and of the immaturity of the field, and more time would
5858 the preservation of knowledge for the future, not simply for particular
5859 research use. In the case of Perseus, MYLONAS said, the assumption was
5862 archival copy for purposes of preservation in the case of, say, the Bill
5863 of Rights, in the sense that the scanned images are effectively the
5883 Library of Congress
5900 Fleischhauer, Coordinator, American Memory, Library of
5906 Broad description of the range of electronic information.
5907 Characterization of who uses it and how it is or may be used.
5918 of Congress (Beyond the scholar)
5925 Each presentation to consist of a fifteen-minute
5936 2. Other humanities projects employing the emerging norms of
5944 Ricky Erway, Associate Coordinator, Library of Congress
5947 Institute: The Papers of George Washington, University
5948 of Virginia
5953 full-text searchability: The Online Journal of Current
5955 of Science
5958 6. A project that offers facsimile images of pages but omits
5981 Librarian for Special Projects, Library of Congress
5982 Clifford A. Lynch, Director, Library Automation, University of
5984 Howard Besser, School of Library and Information Science,
5985 University of Pittsburgh
5986 Ronald L. Larsen, Associate Director of Libraries for
5987 Information Technology, University of Maryland at College
6002 9:00 AM Session IV. Image Capture, Text Capture, Overview of Text and
6005 Moderator: William L. Hooton, Vice President of Operations,
6008 A) Principal Methods for Image Capture of Text:
6010 Use of microform
6012 Anne R. Kenney, Assistant Director, Department of Preservation
6025 Carl Fleischhauer, Coordinator, American Memory, Library of
6028 National Library of Medicine (NLM)
6033 11:00 AM Session IV. Image Capture, Text Capture, Overview of Text and
6038 Jean Baronas, Senior Manager, Department of Standards and
6046 Standards of accuracy and use of imperfect texts
6053 Ricky Erway, Associate Coordinator, American Memory, Library of
6065 Discussion of approaches to structuring text for the computer;
6066 pros and cons of text coding, description of methods in
6067 practice, and comparison of text-coding methods.
6073 University of Illinois-Chicago
6081 Marybeth Peters, Policy Planning Adviser to the Register of
6082 Copyrights, Library of Congress
6090 anything? What should the Library of Congress do next, if
6093 Library of Congress
6106 Avra MICHELSON Forecasting the Use of Electronic Texts by
6110 to be used by the non-scientific scholarly community. Many of the
6114 The speaker assesses 1) current scholarly use of information technology
6118 current use of electronic texts is explored broadly within the context of
6119 scholarly communication. From the perspective of scholarly
6120 communication, the work of humanities and social sciences scholars
6121 involves five processes: 1) identification of sources, 2) communication
6122 with colleagues, 3) interpretation and analysis of data, 4) dissemination
6123 of research findings, and 5) curriculum development and instruction. The
6124 extent to which computation currently permeates aspects of scholarly
6125 communication represents a viable indicator of the prospects for
6128 The discussion of current practice is balanced by an analysis of key
6129 trends in the scholarly use of information technology. These include the
6131 framework for forecasting the use of electronic texts through this
6132 millennium. The presentation concludes with a summary of the ways in
6134 electronic texts, and the implications of that use for information
6138 Use of American Memory in Public and
6141 This joint discussion focuses on nonscholarly applications of electronic
6142 library materials, specifically addressing use of the Library of Congress
6143 American Memory (AM) program in a small number of public and school
6144 libraries throughout the United States. AM consists of selected Library
6145 of Congress primary archival materials, stored on optical media
6149 represent a variety of formats including photographs, graphic arts,
6153 In 1991, the Library of Congress began a nationwide evaluation of AM in
6154 different types of institutions. Test sites include public libraries,
6157 Joanne FREEMAN will discuss their observations on the use of AM by the
6161 VECCIA will comment on the overall goals of the evaluation project, and
6162 the types of public and school libraries included in this study. Her
6163 comments on nonscholarly use of AM will focus on the public library as a
6165 and informal education. FREEMAN will discuss the use of AM in school
6167 questions about the use of electronic resources, as well as definite
6168 benefits gained by the "nonscholar." Topics will include the problem of
6171 awakened through use of electronic resources.
6179 available version of its hypertextual database of multimedia materials on
6181 composed of readers at the student and scholar levels. As such, it must
6183 contain enough detail to serve the different needs of its users. In
6193 without thinking about the restrictions of the delivery system. We have
6197 of the data. [A discussion of these solutions as of two years ago is in
6200 The Computerization of Classical Databases, J. Solomon and T. Worthen
6201 (eds.), University of Arizona Press, in press.]
6203 Much of the work on Perseus is focused on collecting and converting the
6205 provide means of access to the information, in order to make it usable,
6210 avoid favoring any one type of use by allowing multiple forms of access
6213 The way text is handled exemplifies some of these principles. All text
6214 in Perseus is tagged using SGML, following the guidelines of the Text
6220 content of the texts, and greatly speeds all the processing performed on
6227 within a text, and that all versions of our texts, regardless of delivery
6232 Together with the index, the Greek-English Lexicon, and the index of all
6233 the English words in the definitions of the lexicon, the morphological
6234 analyses comprise a set of linguistic tools that allow users of all
6239 detailed morphological studies of word use by using the morphological
6240 analyses of the texts. Because these tools were not designed for any one
6259 complexity of the projects present problems for electronic publishers,
6260 but surmountable ones if they remain abreast of the latest possibilities
6263 The issues that had to be addressed before the commencement of the
6266 1. Editorial selection (or exclusion) of materials in each
6283 6. How does the emergence of national and international education
6284 networks affect the use and viability of research projects
6290 From new notions of "scholarly fair use" to the future of optical media,
6298 In spring 1988 the editors of the papers of George Washington, John
6300 approached by classics scholar David Packard on behalf of the Packard
6301 Humanities Foundation with a proposal to produce a CD-ROM edition of the
6302 complete papers of each of the Founding Fathers. This electronic edition
6305 that our CD-ROM edition of Washington's Papers will be substantially
6307 the next ten years or so, similar CD-ROM editions of the Franklin, Adams,
6308 Jefferson, and Madison papers also will be available. At the Library of
6310 experience of the Washington Papers in producing the CD-ROM edition, but
6314 edition will provide immense possibilities for the searching of documents
6315 for information in a way never possible before. The kind of technical
6317 soon revolutionize historical research and the production of historical
6318 documents. Unfortunately, much of this new technology is not being used
6319 in the planning stages of historical projects, simply because many
6320 historians are aware only in the vaguest way of its existence. At least
6322 editions, simply because they are not aware of the possibilities of
6323 electronic alternatives and the advantages of the new technology in terms
6324 of flexibility and research potential compared to microfilm. In fact,
6325 too many of us in history and literature are still at the stage of
6327 progress presently, and an equal number of literary projects. While the
6329 are ways in which electronic technology can be of service to both.
6331 Since few of the editors involved in the Founding Fathers CD-ROM editions
6333 of our experience how many of these electronic innovations can be used
6334 successfully by scholars who are novices in the world of new technology.
6335 One of the major concerns of the sponsors of the multitude of new
6337 volumes. Most of these editions are being published in small quantities
6338 and the publishers' price for them puts them out of the reach not only of
6339 individual scholars but of most public libraries and all but the largest
6344 What attracted us most to the CD-ROM edition of The Papers of George
6346 edition of all of the 135,000 documents we have collected available in an
6350 edition will carry none of the explanatory annotation that appears in the
6351 published volumes, we also feel that the use of the CD-ROM will lead many
6354 In addition to ignorance of new technical advances, I have found that too
6357 work. I intend to discuss some of the arguments traditionalists are
6358 advancing to resist technology, ranging from distrust of the speed with
6360 better than CD-ROM) to suspicion of the technical language used to
6365 The Online Journal of Current Clinical Trials, a joint venture of the
6366 American Association for the Advancement of Science (AAAS) and the Online
6369 This presentation will discuss the genesis and start-up period of the
6370 journal. Topics of discussion will include historical overview,
6371 day-to-day management of the editorial peer review, and manuscript
6372 tagging and publication. A demonstration of the journal and its features
6378 Corporation, with the support of the Commission on Preservation and
6382 project goes beyond that, however, to investigate some of the issues
6393 of the development and testing process for enhancements to the CLASS
6394 software system. The collaborative nature of this relationship is
6398 A digital library of 1,000 volumes (or approximately 300,000 images) has
6400 The library includes a collection of select mathematics monographs that
6403 various capabilities of the scanning system.
6405 One project objective is to provide users of the Cornell library and the
6406 library staff with the ability to request facsimiles of digitized images
6409 by a committee of Cornell librarians and computer professionals. This
6417 the development of a network-resident image conversion and delivery
6423 During the show-and-tell session of the Workshop on Electronic Texts, a
6424 prototype view station will be demonstrated. In addition, a display of
6427 overview of the project will include a slide presentation that
6428 constitutes a "tour" of the preservation digitizing process.
6430 The final network-connected version of the viewing station will provide
6432 and will also provide the capability of viewing images directly. This
6436 The Joint Study in Digital Preservation has generated a great deal of
6438 fortunately, this project serves to raise a vast number of other issues
6439 surrounding the use of digital technology for the preservation and use of
6447 What do we have to consider in building and distributing databases of
6449 a variety of concerns that need to be addressed before a multimedia
6452 In the past it has not been feasible to implement databases of visual
6453 materials in shared-user environments because of technological barriers.
6454 Each of the two basic models for multi-user multimedia databases has
6457 incredibly complex (and expensive) infrastructure. The economies of
6462 The digital multimedia storage model has required vast amounts of storage
6464 cost of such a large amount of storage space made this model a
6469 consider in building digitally stored multi-user databases of visual
6472 can become commonplace and useful to a large number of people.
6474 The key problem is the vast size of multimedia documents, and how this
6476 Anything slower than T-1 speed is impractical for files of 1 megabyte or
6486 This necessitates compression, which itself raises a number of other
6489 lose? To date there has been only one significant study done of
6496 University of California at Berkeley image database) demonstrates the
6497 utility of a client-server topology, but also points to the limitation of
6502 capacity problems while doing nothing to address problems of
6505 We need to examine the effects on network throughput of moving
6513 which they can be accessed by a variety of application software.
6516 issue of providing access to these multimedia documents in
6525 years shows no sign of abating. Roughly speaking, each five-year period
6526 has yielded an order-of-magnitude improvement in price and performance of
6536 Growth in both the population of network users and the volume of network
6538 percent per month. This flood of capacity and use, likened by some to
6540 for libraries. Libraries must anticipate the future implications of this
6546 deployment, and use of this infrastructure. The emerging infrastructure
6550 five years will witness substantial development of the information
6551 infrastructure of the network.
6554 have a fundamental understanding of and appreciation for computer
6567 "Knowledge Navigator" described by John Sculley of Apple. But the reality
6568 of library service has been less visionary and the leap to the electronic
6574 of shared research and development in order to make the collective vision
6575 more concrete. The program is working toward the creation of large,
6576 indexed, publicly available electronic image collections of published
6578 plan is the result of the first stage of the program, which has been an
6579 investigation of the information technologies available to support such
6580 an effort, the economic parameters of electronic service compared to
6585 The strategic plan envisions a combination of publicly searchable access
6588 management-control system. This combination of technology and
6599 accreditation issues; in public libraries, the potential of electronic
6604 the moment, dominated by a sense of library limits. The continued
6605 expansion and rapid growth of local academic library collections is now
6610 municipal service in a time when the budgets of safety and health
6617 technology-intensive initiatives that offer the potential of decreased
6618 labor costs can provoke the opposition of library staff.
6623 mostly with judicious use of information technologies. The advances in
6625 continuing precipitous drop in computing costs, the growth of the
6629 For example, OCLC has become one of the largest computer network
6631 of more than 6,000 libraries worldwide. On-line public access catalogs
6632 now serve millions of users on more than 50,000 dedicated terminals in
6633 the United States alone. The University of California MELVYL on-line
6636 become the largest group of customers of CD-ROM publishing technology;
6640 This march of technology continues and in the next decade will result in
6642 clear is that libraries can now go beyond automation of their order files
6643 and catalogs to automation of their collections themselves--and it is
6656 recording of 1,000 brittle books as 600-dpi digital images and the
6657 production, on demand, of high-quality and archivally sound paper
6659 Preservation and Access, also investigated some of the issues surrounding
6663 Anne Kenney will focus on some of the issues surrounding direct scanning
6667 cost analysis; storage formats, protocols, and standards; and the use of
6671 highly acceptable for creating paper replacements of deteriorating
6672 originals. The 1,000 scanned volumes provided an array of image-capture
6674 embrittled material, and that defy the use of text-conversion processes.
6678 languages, and a proliferation of illustrated material embedded in text.
6683 The Xerox prototype scanning system provided a number of important
6690 time-and-cost study conducted during the last three months of this
6691 project confirmed the economic viability of digital scanning, and these
6694 From the outset, the Cornell Xerox Project was predicated on the use of
6695 nonproprietary standards and the use of common protocols when standards
6705 structure and a networked image file-server, both of which will be
6708 The presentation will conclude with a discussion of some of the issues
6709 surrounding the use of this technology as a preservation tool (storage,
6715 raster scanning of printed materials. Since 1987, the Library has
6718 libraries. An overview of the project will be presented, giving its
6721 An in-depth discussion of NATDP will follow, including a description of
6722 the scanning process, from the gathering of the printed materials to the
6723 archiving of the electronic pages. The type of equipment required for a
6724 stand-alone scanning workstation and the importance of file management
6730 the usefulness of converting microfilm to electronic images in order to
6731 improve access. With the cooperation of Tuskegee University, NAL has
6732 selected three reels of microfilm from a collection of sixty-seven reels
6733 containing the papers, letters, and drawings of George Washington Carver.
6735 specialized microfilm scanner. The selection, filming, and indexing of
6742 state of planning and organization. The Yale Library has selected a
6745 of risk and uncertainty as well as key issues to be addressed during the
6746 life of the project. The Yale Library is now poised to decide what
6750 The proposal that Yale accepted for the implementation of Project Open
6751 Book will provide at the end of three phases a conversion subsystem,
6755 implementation assumes the existence of Yale's campus Ethernet network
6759 estimates for the facilities management of the storage devices and image
6765 Yale staff to generate a detailed analysis of requirements for Project
6766 Open Book. Each vendor used the results of the requirements analysis to
6769 primary vendor partner but also revealed much about the state of the
6775 Project Open Book is focused specifically on the conversion of images
6778 the Yale Library emphasized features of the technology that affect the
6779 technical quality of digital image production and the costs of creating
6780 and storing the image library: What levels of digital resolution can be
6781 achieved by scanning microfilm? How does variation in the quality of
6783 affect the quality of the digital images? What technologies can an
6789 The actual and expected uses of digital images--storage, browsing,
6792 for readers to browse image documents is perhaps the weakest aspect of
6793 imaging technology and most in need of development. As it defined its
6795 of usability for image documents: Does the system have sufficient
6796 flexibility to handle the full range of document types, including
6799 document uniquely for storage and retrieval? Where is the database of
6801 How are basic internal structures of documents, such as pagination, made
6806 microfilm is more than adequate as a medium for preserving the content of
6808 it is increasingly clear that the challenge of digital image technology
6809 and the key to the success of efforts like Project Open Book is to
6810 provide a means of both preserving and improving access to those
6817 In the use of electronic imaging for document preservation, there are
6830 legibility of print and sufficient fidelity in the pseudo-halftoned gray
6834 Compound images consisting of both two-toned text and gray-scale
6835 illustrations must be processed appropriately to retain the quality of
6849 interchangeable among a variety of systems. The applications of
6857 (ANSI) standards developer with more than twenty committees composed of
6861 in the development of new International Organization for Standardization
6864 This presentation describes the development of AIIM's EIM standards and a
6866 of imaging industries including capture, recording, processing,
6879 Characteristics of standards for digital imagery:
6881 * Nature of digital technology implies continuing volatility.
6893 Significant potential and attractiveness of digital technology as a
6896 Productive use of digital imagery for preservation requires a
6897 reconceptualizing of preservation principles in a volatile,
6900 Concept of managing continuing access in the digital environment
6901 rather than focusing on the permanence of the medium and long-term
6908 * Remove the burden of "archival copy" from paper artifacts.
6919 Stuart WEIBEL The Role of SGML Markup in the CORE Project (6)
6921 The emergence of high-speed telecommunications networks as a basic
6922 feature of the scholarly workplace is driving the demand for electronic
6923 document delivery. Three distinct categories of electronic
6927 1.) Conversion of paper or microfilm archives to electronic format
6928 2.) Conversion of electronic files to formats tailored to
6933 OCLC has experimental or product development activities in each of these
6934 areas. Among the challenges that lie ahead is the integration of these
6935 three types of information stores in coherent distributed systems.
6938 the conversion of large text and graphics collections for which
6941 1980 for its twenty journals. This collection of some 250 journal-years
6945 The use of Standard Generalized Markup Language (SGML) offers the means
6946 to capture the structural richness of the original articles in a way that
6947 will support a variety of retrieval, navigation, and display options
6950 An SGML document consists of text that is marked up with descriptive tags
6951 that specify the function of a given element within the document. As a
6955 formalized map of article structure allows the user interface design to
6957 toward interoperability. Demonstration of this separability is a part of
6958 the CORE project, wherein user interface designs born of very different
6969 A major on-line file of chemical journal literature complete with
6970 graphics is being developed to test the usability of fully electronic
6971 access to documents, as a joint project of Cornell University, the
6974 Digital Equipment Corporation, Sony Corporation of America, and Apple
6977 indexing of the articles from Chemical Abstracts Documents is available
6979 used. Our goals are (1) to assess the effectiveness and acceptability of
6981 identify the most desirable functions of the user interface to an
6982 electronic system of journals, including in particular a comparison of
6984 chemistry students on a variety of tasks suggest that searching tasks are
6986 that for reading all versions of the articles are roughly equivalent.
6992 will be related and compared with the experience of having text rekeyed.
7004 authors and disseminators of works includes the right to do or authorize
7007 addition, copyright owners of sound recordings and computer programs have
7008 the right to control rental of their works. These rights are not
7009 unlimited; there are a number of exceptions and limitations.
7012 Copyright owners want to control uses of their work and be paid for any
7018 difficult to deal with. Questions concerning the integrity of the work
7019 and the status of the changed version under the copyright law are to be
7027 Appendix III: DIRECTORY OF PARTICIPANTS
7042 Department of Standards and Technology
7090 Library of Congress
7096 Library of Congress
7106 Library of Congress
7114 Library of Congress
7138 Department of Preservation and Conservation
7148 University of Maryland at College Park
7156 The Online Journal of Current Clinical Trials
7175 University of California,
7176 Office of the President
7194 Department of the Classics
7221 Register of Copyrights
7222 Library of Congress
7230 University of Illinois at Chicago
7239 National Library of Medicine
7247 The Papers of George Washington
7249 University of Virginia
7256 Library of Congress
7281 Library of Congress
7300 Division of Research
7311 Library of Congress
7324 Division of Preservation and Access
7366 Division of Preservation and Access
7393 U.S. Department of Education
7451 Office of Special Projects LM 612
7511 editing required the removal of diacritics, underlining, and fonts such
7516 [A few of the italics (when used for emphasis) were replaced by CAPS mh]
7518 *End of The Project Gutenberg Etext of LOC WORKSHOP ON ELECTRONIC TEXTS