10 May 2019 Managing training data from untrusted partners using self-generating policies
When training data for machine learning is obtained from many different sources, not all of which can be trusted, it is difficult to determine which training data to accept and which to reject. A policy-based approach to data curation, in which the policies are generated after examining the properties of the offered data, can provide a way to accept only selected data for creating a machine learning model. In this paper, we discuss the challenges associated with generating policies that can manage training data from different sources. An efficient policy generation scheme needs to determine the order in which information is received, needs a way to determine the trustworthiness of each partner, needs a way to assess quickly which data subset can add value to a complex model, and must address several other issues. After providing an overview of the challenges, we propose approaches to solve them and study the properties of those approaches.
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Dinesh Verma, Seraphin Calo, Shonda Witherspoon, Irene Manotas, Elisa Bertino, Amani Abu Jabal, Geeth de Mel, Ananthram Swami, Greg Cirincione, and Gavin Pearson "Managing training data from untrusted partners using self-generating policies", Proc. SPIE 11006, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, 110060P (10 May 2019).
