26 November 2022 Generating traceable adversarial text examples by watermarking in the semantic space
Mingjie Li, Hanzhou Wu, Xinpeng Zhang
Abstract

Adversarial examples have been shown to reveal the vulnerability of deep neural network (DNN) models, and they can be used to evaluate a model's performance and further improve its robustness. Because text data is discrete, generating adversarial examples is more difficult in the natural language processing (NLP) domain than in the image domain. One challenge is that the generated adversarial text examples should remain grammatically correct and semantically similar to the original texts. In this paper, we propose an end-to-end adversarial text generation model that generates high-quality adversarial text examples. Moreover, the adversarial text examples generated by the proposed model are embedded with watermarks, which can mark and trace the source of the generated examples and prevent the model from being maliciously or illegally used. Experimental results show that the attack success rate of the proposed model still exceeds 88% even on the AG’s News dataset, where generating adversarial text examples is more difficult. The quality of the adversarial text examples generated by the proposed model is also higher than that of the baseline models. At the same time, because the generated adversarial text examples are embedded with strongly robust watermarks, the model can be better protected.

© 2022 SPIE and IS&T
Mingjie Li, Hanzhou Wu, and Xinpeng Zhang "Generating traceable adversarial text examples by watermarking in the semantic space," Journal of Electronic Imaging 31(6), 063034 (26 November 2022). https://doi.org/10.1117/1.JEI.31.6.063034
Received: 23 June 2022; Accepted: 8 November 2022; Published: 26 November 2022
KEYWORDS
Digital watermarking

Data modeling

Semantics

Silver

Performance modeling

Education and training

Computer programming
