Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals
🔥INFO
Blog: 2025/07/22 by IgniSavium
- Title: Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals
- Authors: Yu-Ting Lan, Wei-Long Zheng et al. (SJTU)
- Published: August 2023
- Comment: arXiv
- URL: https://arxiv.org/abs/2308.02510
🥜TLDR: EEG-to-image reconstruction with the Stable Diffusion framework: EEG signals are decoded into fine-grained image silhouette saliency maps and coarse-grained CLIP text embeddings of image descriptions.
Motivation
This research addresses the challenge of reconstructing high-resolution visual stimuli from EEG signals, a task complicated by the temporal nature and noise of EEG data. It improves on previous approaches by introducing a multi-level (actually two-level) semantic decoding method that enhances image quality and semantic accuracy. Earlier methods either struggled with low resolution or failed to capture semantic details.
Model
Pixel-level granularity
First, train a simple EEG feature extractor $ f_\theta $ by clustering EEG features with an anchor/positive/negative objective (a: anchor; p: positive; n: negative).
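The anchor/positive/negative notation suggests a triplet-style objective. Below is a minimal sketch in plain Python, not the paper's exact loss: the hinge margin, the squared Euclidean distance, the function names, and the toy vectors (standing in for outputs of $ f_\theta $) are all assumptions made for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, margin is a free parameter).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy features: anchor and positive share a class, negative does not.
anchor = [0.0, 1.0]
positive = [0.1, 0.9]
negative = [1.0, 0.0]

loss_good = triplet_loss(anchor, positive, negative)  # well-separated triplet
loss_bad = triplet_loss(anchor, negative, positive)   # roles swapped on purpose
```

Minimizing this quantity pulls same-class EEG features together and pushes different-class features apart, which is the clustering effect described above.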
Next, train a GAN saliency map generator that takes the features from $ f_\theta $ above as input.
Notation: ms = mode-seeking regularization; SSIM = structural similarity index measure.
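To make the two terms concrete, here is a hedged sketch of how they are commonly computed. It is not taken from the paper: the mode-seeking term follows the usual "image distance over latent distance" ratio, written as a loss to minimize, and the SSIM is a simplified single-window (global) variant rather than the usual sliding-window version. The function names, the L1 distances, `eps`, and the toy flattened images are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def mode_seeking_loss(image_1, image_2, latent_1, latent_2, eps=1e-5):
    """Inverse of the (image distance / latent distance) ratio.

    A small loss means two different noise vectors lead to clearly
    different generated images, i.e. less mode collapse.
    """
    image_gap = mean([abs(a - b) for a, b in zip(image_1, image_2)])
    latent_gap = mean([abs(a - b) for a, b in zip(latent_1, latent_2)])
    return 1.0 / (image_gap / latent_gap + eps)


def global_ssim(x, y, data_range=1.0):
    """SSIM computed over the whole image as a single window."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean([(a - mu_x) ** 2 for a in x])
    var_y = mean([(b - mu_y) ** 2 for b in y])
    cov_xy = mean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy data: flattened 2x2 "saliency maps" with values in [0, 1].
latent_1, latent_2 = [0.0, 1.0], [1.0, 0.0]
map_a = [0.0, 0.5, 1.0, 0.5]
map_b = [1.0, 0.5, 0.0, 0.5]

collapsed = mode_seeking_loss(map_a, map_a, latent_1, latent_2)
diverse = mode_seeking_loss(map_a, map_b, latent_1, latent_2)
ssim_same = global_ssim(map_a, map_a)
ssim_diff = global_ssim(map_a, map_b)
```

In a training loop, an SSIM term would typically enter the objective as `1 - SSIM` between the generated and ground-truth saliency maps, so that higher structural similarity lowers the loss.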
Sample-level granularity
Map the EEG signal to the CLIP text embedding of the corresponding image description, trained with an L2 loss.
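The regression step can be sketched as follows. This is an illustrative toy, not the paper's architecture: a single linear layer stands in for the EEG-to-embedding network, the 3-dimensional EEG feature and 2-dimensional target are made-up stand-ins for real EEG features and a real CLIP text embedding, and the learning rate is arbitrary.

```python
def linear_map(weights, eeg_features):
    """Project an EEG feature vector to embedding space (one row per output)."""
    return [sum(w * x for w, x in zip(row, eeg_features)) for row in weights]


def l2_loss(predicted, target):
    """Mean squared error between predicted and target embeddings."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def sgd_step(weights, eeg_features, target, lr=0.1):
    """One gradient-descent step on the L2 loss for the linear map."""
    predicted = linear_map(weights, eeg_features)
    n = len(target)
    # d(loss)/d(w_ij) = (2 / n) * (pred_i - target_i) * x_j
    return [
        [w - lr * (2.0 / n) * (p - t) * x for w, x in zip(row, eeg_features)]
        for row, p, t in zip(weights, predicted, target)
    ]


eeg_features = [1.0, -0.5, 0.25]   # toy EEG feature vector (assumption)
target_embedding = [0.2, -0.1]     # toy stand-in for a CLIP text embedding
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

loss_before = l2_loss(linear_map(weights, eeg_features), target_embedding)
weights = sgd_step(weights, eeg_features, target_embedding)
loss_after = l2_loss(linear_map(weights, eeg_features), target_embedding)
```

After one step the prediction moves toward the target embedding, so `loss_after` is smaller than `loss_before`.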
Evaluation
Qualitative Results
Performance
IS: Inception Score
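For reference, the Inception Score is the exponential of the average KL divergence between each image's predicted class distribution p(y|x) and the marginal distribution p(y). The sketch below computes it from a list of class-probability rows; in practice those rows come from a pretrained Inception network, whereas here they are hypothetical hand-written values and the function name is an assumption.

```python
import math


def inception_score(class_probs):
    """exp(mean over images of KL(p(y|x) || p(y))).

    `class_probs` is a list of per-image class-probability rows,
    each summing to 1.
    """
    n_images = len(class_probs)
    n_classes = len(class_probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [
        sum(row[k] for row in class_probs) / n_images for k in range(n_classes)
    ]
    total_kl = 0.0
    for row in class_probs:
        for p, m in zip(row, marginal):
            if p > 0.0:  # the term 0 * log(0 / m) is taken as 0
                total_kl += p * math.log(p / m)
    return math.exp(total_kl / n_images)


# Confident and diverse predictions: the score reaches the class count.
score_sharp = inception_score([[1.0, 0.0], [0.0, 1.0]])
# Uninformative predictions: the score falls to its minimum of 1.
score_flat = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

A higher score therefore rewards images that are individually recognizable (sharp p(y|x)) and collectively varied (broad p(y)).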
Cross-Subject Consistency
🤔Since data from all subjects is included in the training set, this cross-subject consistency experiment is somewhat pointless.
Ablation
Insufficient visual structural information (the saliency map in this case) holds back the benefit of more informative captions (BLIP captions vs. simple label captions in this case).
🤔Reflections
The feature extractor for EEG signals is poorly trained, resulting in GAN-generated saliency maps that are derived from ambiguous EEG features. These maps are highly inadequate for capturing the overall visual structure of the original image, including attributes such as position, size, and orientation.
An alternative approach is to map EEG signals directly into the latent space of the diffusion model (e.g., the latent states encoded from the original images) [already utilized by Yu Takagi and Shinji Nishimoto with fMRI].