III: Small: Assisted Emulation for Digital Preservation
Indiana University, Bloomington IN
Abstract
For the past 20 years, CD-ROMs have been the primary medium for distributing key economic, scientific, environmental, and societal data, as well as educational and scholarly work. More than 150,000 titles have been published, including thousands distributed by the United States and other governments. Yet no viable strategy has been developed to ensure that these materials will remain accessible to future generations of scholars. In the short term, these materials are subject to physical degradation that will ultimately make them unreadable; in the long term, technological obsolescence will make their contents unusable. This project will develop practical techniques that combine off-the-shelf emulators with virtualization software to ensure the long-term viability of CD-ROM materials. Although emulation has been widely discussed as a preservation strategy, it suffers from a fundamental flaw: future users are unlikely to be familiar with legacy software environments and will find such software increasingly difficult to use. Furthermore, the user communities for many of these materials are sparse and widely distributed, so the necessary technical knowledge is unlikely to be available to library patrons. The key objective of this project is to develop the technology and processes needed to mitigate these flaws and to enable large-scale deployment of emulation by libraries and archives. The project will develop automation technologies that capture the technical knowledge needed to install and perform common actions with legacy CD-ROM materials, in the form of scripts that customize "generic" emulation environments on the fly. The long-term vision is a distributed CD-ROM collection, developed by a community of libraries, that enables client workstations to access preserved CD-ROM images through customized emulation environments.
The project will explore the costs of developing the scripts needed to automate the use of specific CD-ROMs, as well as the technologies needed to enable libraries to pool their resources into a distributed network of preserved CD-ROM materials. The project is structured as a two-year pilot study that will develop automation tools, apply these tools to a large, representative set of several thousand CD-ROM titles, evaluate the performance of this approach in a distributed environment, disseminate the tools and scripts as software artifacts, and provide statistics for planning the large-scale preservation of CD-ROM materials. The research proposed here will enable libraries and archives to solve a growing problem while reducing the resources required to maintain their collections of removable media. It lays a foundation for libraries and archives to pool their intellectual resources, offering access to virtual media collections through shared emulators and community-generated scripts. The materials whose preservation this project will enable include key scientific and societal data published by the United States and other governments, as well as cultural and educational materials from many sources. The project will also have a significant impact on undergraduate science education through direct mentoring of undergraduate research assistants, who will have the opportunity to write and present scholarly work.