Code

The primary goal of this project was to determine whether images containing both objects and scenes are treated as belonging to the object category or the scene category. To address this question, I employed neural networks such as ResNet and AlexNet, pretrained on object classification (ImageNet) or scene classification (Places365). I then fine-tuned these networks on a custom dataset of object and scene images to adapt their feature extraction to this task.

For evaluation, I used a separate test set containing images of objects, images of scenes, and images containing both. I extracted features from each layer of the networks and, using representational similarity analysis (RSA), assessed which model (control, scene, or object) best explained the variance in the feature space across layers. This involved computing representational dissimilarity matrices (RDMs) for each network. For instance, the RSA results indicated that in the last layers of AlexNet trained on ImageNet, images containing both objects and scenes are more similar to scenes than to objects.
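The RDM and RSA computations described above can be sketched as follows. This is a minimal version that assumes features have already been extracted as a (conditions × units) array; it uses correlation distance (1 − Pearson r) for the RDMs and the Spearman correlation between RDM upper triangles as the RSA score, which are common choices but not necessarily the exact ones used here. The toy model RDM is purely illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def compute_rdm(features):
    """RDM as pairwise correlation distance (1 - Pearson r)
    between the feature vectors of each condition (rows)."""
    return squareform(pdist(features, metric="correlation"))

def rsa_score(neural_rdm, model_rdm):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(neural_rdm, k=1)
    rho, p = spearmanr(neural_rdm[iu], model_rdm[iu])
    return rho, p

# Toy example: 6 conditions (2 objects, 2 scenes, 2 'both'), 100 units.
rng = np.random.default_rng(0)
features = rng.standard_normal((6, 100))
neural_rdm = compute_rdm(features)

# Hypothetical binary scene-model RDM: scenes and 'both' images predicted
# to be similar to each other (0), dissimilar to everything else (1).
model_rdm = np.ones((6, 6))
np.fill_diagonal(model_rdm, 0)
model_rdm[2:, 2:] = 0

rho, p = rsa_score(neural_rdm, model_rdm)
```

A high positive rho would mean the network's feature space is organized the way the candidate model predicts.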

However, the overall results were not statistically significant, indicating that the models could not reliably explain the variance in the feature space. This suggests that a better-controlled image set or a different approach may be needed to improve predictions for the 'both' condition.

Example image set

Example feature extraction (AlexNet trained on ImageNet)

RDMs for models and features

RSA results

Acknowledgement

  • This project was part of my research internship in the Object Vision Group at CIMeC. I would like to thank and acknowledge Dr. Stefania Bracci’s supervision during my training.