CHS: Small: Human-Directed Optical Music Recognition

$504,197 | FY2015 | CSE | NSF

Indiana University, Bloomington IN

Investigators

Christopher S Raphael (contact), Erik A Stolterman

Abstract

Vast quantities of character-encoded text form the foundation for the information retrieval revolution of recent decades. In contrast, very little symbolically represented music exists, preventing music from fully participating in the 21st century. The International Music Score Library Project (IMSLP) is a large and rapidly growing open library of public domain machine-printed classical music scores, actively used by many musicians, scholars, and researchers around the world. Optical music recognition (OMR) forms the natural bridge between the IMSLP and the missing symbolic music data. While there has been active OMR research since the 1960s, the state of the art is still not sufficiently well developed to create symbolic data from realistic documents, such as those found on the IMSLP. This is because music notation contains a thicket of special cases, exceptions to general rules, image pathologies, and interpretation challenges, whose recognition requires a deep level of content understanding. With this in mind, the PI has developed prototype software named Ceres for supporting a hybrid human-computer team, in which machine and person partner in a collaborative recognition effort. The human guides the computer through the recognition task, identifying and providing crucial missing pieces of information, while allowing the computer to fill in the details, consistent with the human guidance. The ultimate goal is to build a Wikipedia-like community centered around Ceres and the IMSLP with the mission of creating a definitive, open access, symbolic music library that distributes music scores electronically and globally, allowing for adaptive display and automatic transformation and registration of scores with audio and video.
The prevalence of symbolic music data would open up a world of possibilities to music-science researchers, including systems for music information retrieval, expressive performance, musical accompaniment, transcription and arranging, performance assistance, and many others. Last but not least, the symbolic music library would enable innovative commercial applications; tablet computers will likely be the sheet music "delivery system" of the future, allowing automatic page turning, performance feedback, and various kinds of content-based annotation. The challenge of integrating both human and algorithmic intelligence to create a tractable and efficient OMR solution constitutes the heart of this project. The PI's approach is to adopt the interface paradigm of constrained optimization; the human uses domain understanding to supply crucial missing pieces of information when needed, and the computer uses this guidance to re-recognize and reinterpret subject to these user-supplied constraints. A preliminary experiment conducted by the PI using a medium-difficulty test set showed a 17% error rate for his prototype system, accounting for both false positives and false negatives at the primitive level. The human-computer interface is where the recognition results become tangible and subject to manipulation, so its design is critical; this is an area where the PI expects to make contributions to HCI in general. The PI argues that to be useful for OMR the interface should be almost completely open, providing a set of tools and options, and imposing only the minimal required structure (e.g., staff recognition must be verified before page structure is identified, while the latter must be verified before it is worth continuing to the symbol recognition phase). The interface development strategy will be one of iterative refinement, with the evaluation of each version involving time- and effort-oriented metrics as well as open-ended user comments.
For example, the Ceres user interface (the null hypothesis for the current project) superimposes the recognized results on the original image, making discrepancies readily apparent (in contrast to other systems that present the original and recognized notation side by side, which is cognitively more difficult to compare), though even better solutions may be possible. Other aspects of the work will include exploration of the roles of visualization (including directing the user's attention) and music playback (hearing the score). OMR is just one of many computer vision problems that fall into the constrained optimization category, and the approach also applies to natural language processing, machine listening, and others; the essential process to be explored here would extend to these domains as well, providing a general template with far-reaching significance. For OMR the constraints are individual pixel labels, but for other problem domains they could equally well refer to the labelling of individual samples, words, or whatever fundamental units compose the data. In this way, the approach uses a generic and flexible view of human input that does not require the human to understand the inner workings of the recognition processes.
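
As an illustration only, the constrained-optimization paradigm described above can be sketched as choosing the best-scoring interpretation that agrees with every pixel label the user has supplied. This is a toy model under stated assumptions, not the Ceres implementation: the `Hypothesis` class, the `recognize` function, the label names, and the scores are all hypothetical, and a real recognizer would search a far larger space than a two-item list.

```python
from dataclasses import dataclass


# Hypothetical toy model for illustration; not taken from Ceres.
@dataclass
class Hypothesis:
    name: str
    score: float   # recognizer's score; higher is better (assumed convention)
    labels: dict   # (row, col) pixel -> label this hypothesis assigns to it


def consistent(hyp, constraints):
    """True if the hypothesis agrees with every user-supplied pixel label."""
    return all(hyp.labels.get(pixel) == label
               for pixel, label in constraints.items())


def recognize(hypotheses, constraints):
    """Return the best-scoring hypothesis satisfying the constraints, or None."""
    feasible = [h for h in hypotheses if consistent(h, constraints)]
    return max(feasible, key=lambda h: h.score, default=None)


# Two competing readings of the same image region (made-up scores).
hypotheses = [
    Hypothesis("quarter note", 0.9, {(10, 4): "solid_notehead", (2, 4): "stem"}),
    Hypothesis("half note", 0.7, {(10, 4): "open_notehead", (2, 4): "stem"}),
]

# With no guidance, the recognizer picks its top-scoring reading.
unconstrained = recognize(hypotheses, {})

# The user labels a single pixel; the recognizer re-recognizes subject to it.
corrected = recognize(hypotheses, {(10, 4): "open_notehead"})
```

In this sketch the user never touches scores or model internals; a single labelled pixel is enough to rule out the higher-scoring but wrong reading, which mirrors the claim that the human need not understand the inner workings of the recognizer.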
