POSE: Phase I: Scoping the Ecosystem of Skyhook Data Management
University of California-Santa Cruz, Santa Cruz, CA
Investigators
Abstract
New, fast-moving open source ecosystems centered on big data and data science have emerged from the successful business models of the hyperscale computing industries. This Phase I project explores sustainable and effective pathways for establishing open source as an alternative translation path for technologies, using Skyhook as a pilot project. The project will coordinate a series of workshops that convene open source experts and community leaders with diverse backgrounds to build expertise in open tech transfer within the university. An important focus of these workshops is fostering a diverse community and encouraging participation from historically excluded communities. The project seeks to impact the adoption of Skyhook technology for reproducible research prototyping and as a teaching tool in classrooms, and to establish open source as a viable translation path for technologies at research universities.

Apache Arrow is an in-memory representation of columnar data that has created a wide-ranging and rapidly growing open source ecosystem for efficient data processing, with bindings for many programming languages (e.g., C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust). Because of this common representation, data can move efficiently, without conversion, between the ecosystem's processing engines running on different systems. Skyhook aims to become a research prototyping ecosystem and a blueprint for efficiently embedding data processing libraries in storage systems and computational storage devices while enabling processing and storage ecosystems to evolve independently.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.