CSR: Medium: Eidetic Systems
Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI
Investigators
Abstract
The vast majority of state produced by a typical computer is generated, consumed, and then lost forever. Lost state includes process address spaces, deleted files, interprocess communication, and input received from the network. With lost state comes lost value: users cannot recover detailed information about past computations that would be useful for auditing, forensics, debugging, error tracking, and many other purposes. To solve this problem, this project is developing eidetic computer systems that can recall, on demand, any past state that existed on a computer, including all versions of all files, transient application state, and network communication. Further, eidetic computer systems can explain the provenance of current and past state at byte granularity: for example, they can answer questions such as "Where did this data come from and what steps were used to derive the data?" or "What state or outputs did this data influence?" The proposed work is deploying several important system features that utilize eidetic systems: (1) Total recall: a computer system can recover not just any prior data saved to disk (as in versioning file systems), but also any prior application state, transient output, or computation state. (2) Complete provenance: for any data object, the system can provide the history of how that data was produced, including inter-process channels and the computations that transformed the data. (3) Rich queries: users can reason over the entire history of execution to detect anomalies, recover workflows, and improve productivity. (4) Replay-structured storage: network usage can be reduced by deterministically regenerating file data at endpoints, and storage usage can be reduced by opportunistically deduplicating logs of non-deterministic inputs rather than file data. (5) Privacy-preserving replay: provenance can enable comprehensive deletion policies in which all values derived from sensitive data are identified and redacted.
View original record on NSF Award Search →