Cookies are used for analytics and marketing. Enable JavaScript to manage preferences, or see our Privacy Policy.

Abstracts

Add abstract

Want to add your dissertation abstract to this database? It only takes a minute!

Search abstract

Search for abstracts by subject, author or institution

Share this abstract

dissertation.com
on Facebook

Trojan Attacks and Defenses on Deep Neural Networks

by Yingqi Liu

Institution:	Purdue University
Department:	Computer Sciences
Degree:
Year:	2022
Keywords:	Cybersecurity and privacy not elsewhere classified; Adversarial machine learning; Adversarial Machine Learning; Backdoor Attack; Trojan Attack; AI safety; Robust Machine Learning; Artificial Intelligence
Posted:	3/25/2025
Record ID:	2321080
Full text PDF:	https://doi.org/10.25394/pgs.21312327.v1

Abstract

With the fast spread of machine learning techniques, sharing and adopting public deep neural networks become very popular. As deep neural networks are not intuitive for human to understand, malicious behaviors can be injected into deep neural networks undetected. We call it trojan attack or backdoor attack on neural networks. Trojaned models operate normally when regular inputs are provided, and misclassify to a specific output label when the input is stamped with some special pattern called trojan trigger. Deploying trojaned models can cause various severe consequences including endangering human lives (in applications like autonomous driving). Trojan attacks on deep neural networks introduce two challenges. From the attacker's perspective, since the training data or training process is usually not accessible to the attacker, the attacker needs to find a way to carry out the trojan attack without access to training data. From the user's perspective, the user needs to quickly scan the online public deep neural networks and detect trojaned models. We try to address these challenges in this dissertation. For trojan attack without access to training data, We propose to invert the neural network to generate a general trojan trigger, and then retrain the model with reverse-engineered training data to inject malicious behaviors to the model. The malicious behaviors are only activated by inputs stamped with the trojan trigger. To scan and detect trojaned models, we develop a novel technique that analyzes inner neuron behaviors by determining how output activation change when we introduce different levels of stimulation to a neuron. A trojan trigger is then reverse-engineered through an optimization procedure using the stimulation analysis results, to confirm that a neuron is truly compromised. Furthermore, for complex trojan attacks, we propose a novel complex trigger detection method. It leverages a novel symmetric feature differencing method to distinguish features of injected complex triggers from natural features. For trojan attacks on NLP models, we propose a novel backdoor scanning technique. It transforms a subject model to an equivalent but differentiable form. It then inverts a distribution of words denoting their likelihood in the trigger and applies a novel word discriminativity analysis to determine if the subject model is particularly discriminative for the presence of likely trigger words.