Privacy Preserving Stream Analytics at Scale
Recent years have seen unprecedented growth in networked devices and services that continuously collect detailed information about individuals. Such unbounded data streams are now collected by a wide range of systems in diverse domains, such as health, agriculture, transportation, operational insight, and smart cities. This growth is largely driven by the rising demand for instrumentation: individuals and organizations continuously log metrics that report system state to support better diagnosis, forecasting, decision making, and resource allocation.

This trend raises the problem of keeping user data private. Users today typically entrust their data to a third-party storage or application provider, and there is growing concern that this model leaves them vulnerable to privacy violations through misuse of their data by those providers, whether deliberate or inadvertent. The numerous recent reports of data breaches and misuse suggest these concerns are well founded.

A frequently advocated remedy is end-to-end encryption, in which data is encrypted at the source so that cloud storage and service providers never see it in the clear. However, this approach can severely limit users' ability to compute on and share their data, which in turn limits the value that can be extracted from large-scale datasets. In this project, we explore a new approach to designing privacy-preserving stream processing systems that follow the end-to-end encryption paradigm while retaining their functionality and performance over encrypted data.
• TimeCrypt (Published in USENIX NSDI’20): A new encrypted time-series database that meets the scalability and low-latency requirements of time-series workloads. TimeCrypt protects data confidentiality yet preserves data utility by efficiently supporting a rich set of queries and analytics over encrypted time-series data.
• Droplet (Published in USENIX Security’20): A new decentralized access control service that enables data owners to securely and selectively share their encrypted data while guaranteeing confidentiality even in the presence of unauthorized parties and compromised data servers.
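The bullets above do not describe the underlying cryptography. As a rough illustration of how a server can sum ciphertexts it cannot read, and how an owner might share only a slice of a stream, the Python sketch below combines two textbook ideas: additively homomorphic symmetric encryption with telescoping keys (c_i = m_i + k_i - k_{i+1} mod M, in the style of Castelluccia et al.) and a GGM-style hash tree that derives one key per time slot. It is a sketch under stated assumptions, not the TimeCrypt or Droplet implementation. The names `MODULUS`, `DEPTH`, `derive_node`, `leaf_key`, `encrypt`, `aggregate`, and `decrypt_sum` are hypothetical, and the code omits integrity protection and other hardening.

```python
import hashlib
import hmac
import secrets

# Hypothetical parameters, chosen only for illustration (not from TimeCrypt/Droplet).
MODULUS = 2**64  # plaintexts and ciphertexts live in Z_M
DEPTH = 16       # the key tree covers 2**DEPTH time slots


def _child(node_key: bytes, bit: int) -> bytes:
    """Derive the left (0) or right (1) child key via HMAC-SHA256."""
    return hmac.new(node_key, bytes([bit]), hashlib.sha256).digest()


def derive_node(root: bytes, prefix: str) -> bytes:
    """Return the node key reached by following a bit-string path such as '0110'."""
    key = root
    for ch in prefix:
        key = _child(key, int(ch))
    return key


def leaf_key(node: bytes, node_prefix: str, index: int) -> int:
    """Per-slot key for `index`, derived from the node key located at `node_prefix`."""
    if not 0 <= index < 2**DEPTH:
        raise ValueError("slot index outside the key tree")
    path = format(index, f"0{DEPTH}b")
    if not path.startswith(node_prefix):
        # A holder of this node key cannot derive keys outside its subtree.
        raise PermissionError("slot is outside the shared subtree")
    key = derive_node(node, path[len(node_prefix):])
    return int.from_bytes(key[:8], "big") % MODULUS


def encrypt(root: bytes, index: int, value: int) -> int:
    """c_i = m_i + k_i - k_{i+1} (mod M); the keys telescope when ciphertexts are summed."""
    k_i = leaf_key(root, "", index)
    k_next = leaf_key(root, "", index + 1)
    return (value + k_i - k_next) % MODULUS


def aggregate(ciphertexts: list[int]) -> int:
    """Server side: add ciphertexts without holding any key."""
    return sum(ciphertexts) % MODULUS


def decrypt_sum(node: bytes, node_prefix: str, start: int, end: int, agg: int) -> int:
    """Recover m_start + ... + m_end from the sum of c_start..c_end.

    That sum equals the plaintext total plus k_start - k_{end+1}, so only
    the two boundary keys are needed.
    """
    k_start = leaf_key(node, node_prefix, start)
    k_after = leaf_key(node, node_prefix, end + 1)
    return (agg - k_start + k_after) % MODULUS


# Usage: the owner encrypts a short stream under a random master secret.
root = secrets.token_bytes(32)
readings = [21, 22, 20, 23, 25, 24, 22, 21]
cts = [encrypt(root, i, m) for i, m in enumerate(readings)]

# The server aggregates slots 2..5 blindly; the owner decrypts only the sum.
total = decrypt_sum(root, "", 2, 5, aggregate(cts[2:6]))
print(total)  # 92

# Share slots 0..15 only, by handing out a single inner-node key.
shared_prefix = "0" * (DEPTH - 4)
shared = derive_node(root, shared_prefix)
print(decrypt_sum(shared, shared_prefix, 2, 5, aggregate(cts[2:6])))  # 92
```

In this sketch, handing a recipient the key of one tree node lets them derive every slot key beneath it and nothing outside it, so sharing a time interval costs one key per covering subtree. Note that decrypting a sum over slots [s, e] also needs the key of slot e + 1, so a shared subtree must extend one slot past the range being queried.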