Adversarial samples are inputs to machine learning classifiers that an attacker has designed to cause the model to make a mistake. At Centroida we managed to design such adversarial samples that work against widely used online DL vision services.
The techniques we used for ‘fooling’ the online classifiers do not involve sending any queries except the final test ones containing the modified / perturbed input. To implement the generation techniques, we used several architectures pre-trained on ImageNet, namely VGG16, ResNet50, and VGG19. We also included a pre-trained SSD300 model for detecting objects in the input images.
Adversarial Samples Generation
Since we did not assume any access to the oracle model (the online service), we decided that substituting the oracle with one of the widely used DL vision architectures and crafting adversarial samples against it would yield good results, because several recent papers demonstrate the transferability of adversarial samples across different architectures. A likely reason for this transferability is that the online oracle was either trained by fine-tuning one of the above-mentioned architectures, or a similar design was used when training it from scratch.
The algorithm we used consists of the following steps:
1. Choosing substitute classifier
For the purpose of this article, let us assume that the chosen classifier is VGG19 pre-trained on ImageNet.
2. Cropping object
Given an input image, we use an object detection model (for this experiment, SSD300) to crop the object of interest from the image.
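The cropping step can be sketched as follows. The `(x_min, y_min, x_max, y_max)` box format and the `crop_object` helper are assumptions for illustration, not the exact SSD300 output format:

```python
import numpy as np

def crop_object(image, box):
    """Crop the detected object from the image given a bounding box.

    `box` is (x_min, y_min, x_max, y_max) in pixel coordinates, as a
    detector such as SSD300 might return after scaling to image size.
    """
    x_min, y_min, x_max, y_max = box
    return image[y_min:y_max, x_min:x_max]

# Toy example: a 300x300 RGB image and a hypothetical detection box.
image = np.zeros((300, 300, 3), dtype=np.uint8)
crop = crop_object(image, (50, 80, 200, 260))
print(crop.shape)  # (180, 150, 3)
```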
Crop Used for generating perturbation:
Image with perturbation applied (Adversarial Image):
3. Modifying crop
The object crop (the wolf in the previous image) is passed to our algorithm, which performs the following steps:
- Classify the crop with the above-mentioned architectures that were not chosen as the substitute, in this case ResNet50 and VGG16
- Obtain the direction of the gradient in which the probability of the correct class will decrease
- Obtain masks (perturbation vectors) for the ResNet50 and VGG16 models by multiplying the direction by an epsilon (a parameter controlling the perceptibility of the perturbation); the masks are then aggregated and added to the crop
We repeat the steps in 3 until both ResNet50 and VGG16 misclassify the crop by assigning a large enough probability to one of the wrong classes (any of the 999 remaining classes, all different from the correct one). The last step is testing the perturbed crop against the substitute network chosen in step 1. If the crop is misclassified, we proceed to the next step; otherwise we repeat the modification procedure with an increased epsilon (the allowed perturbation).
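The modification loop can be sketched numerically. Below, two toy linear softmax classifiers stand in for ResNet50 and VGG16, and the sign-of-gradient step with a growing epsilon is an illustrative assumption, not our exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def input_gradient(W, x, true_class):
    """Gradient of the true-class cross-entropy loss w.r.t. the input
    for a linear softmax classifier p = softmax(W @ x):
    dL/dx = W^T (p - onehot(true_class))."""
    p = softmax(W @ x)
    p[true_class] -= 1.0
    return W.T @ p

# Two correlated toy "models" over 10 classes; in the real pipeline
# these would be ResNet50 and VGG16.
n_classes, dim = 10, 64
base = rng.normal(size=(n_classes, dim))
models = [base + 0.05 * rng.normal(size=base.shape) for _ in range(2)]

x = rng.normal(size=dim)
true_class = int(np.argmax(models[0] @ x))

eps = 0.01
adv = x.copy()
# Repeat until both surrogate models misclassify the input.
while any(int(np.argmax(W @ adv)) == true_class for W in models):
    # Aggregate the signed gradient masks from both models and
    # add them to the current input.
    mask = sum(eps * np.sign(input_gradient(W, adv, true_class))
               for W in models)
    adv = adv + mask
    eps *= 1.5  # allow a larger perturbation on the next pass

assert all(int(np.argmax(W @ adv)) != true_class for W in models)
```

In the real pipeline the same check is then run against the substitute network (VGG19), and the loop restarts with a larger epsilon if it still predicts the correct class.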
After that, we place the perturbed crop back into the image, replacing the original one.
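The substitution step amounts to writing the perturbed pixels back at the original location. A minimal sketch, assuming the box coordinates match those used when the crop was extracted:

```python
import numpy as np

def paste_crop(image, crop, box):
    """Replace the original region of the image with the perturbed crop.

    `box` is (x_min, y_min, x_max, y_max), the same coordinates used
    when the crop was extracted.
    """
    out = image.copy()
    x_min, y_min, x_max, y_max = box
    out[y_min:y_max, x_min:x_max] = crop
    return out

# Toy example: paste a white perturbed crop into a black image.
image = np.zeros((300, 300, 3), dtype=np.uint8)
perturbed_crop = np.full((180, 150, 3), 255, dtype=np.uint8)
adv_image = paste_crop(image, perturbed_crop, (50, 80, 200, 260))
```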
In the end, we have a perturbed image that we can feed to the oracle classifier to check whether our attack is successful.
The algorithm can be used to hide specific objects in an image before feeding it to an oracle model. Moreover, our technique can be used to make a classifier ‘hallucinate’ certain objects appearing in the image. As long as we are able to crop the object of interest, the outlined technique can be applied directly, since it does not involve the additional cost of sending queries to the online model.
Below are several adversarial samples (before and after) that show a significant difference in classification results once they are uploaded to online machine learning classifiers.
- Microsoft Vision API
Example of misclassifying the wolf picture