Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion
🔥INFO
Blog: 2025/07/30 by IgniSavium
- Title: Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion
- Authors: Dongyang Li et al. (Southern University of Science and Technology)
- Published: March 2024
- Comment: NeurIPS 2024
- URL: https://arxiv.org/abs/2403.07721
🥜TLDR: diffusion prior + SDXL-Turbo + IP-Adapter
Motivation
This paper addresses the limited practicality of EEG-based visual decoding and its performance gap relative to fMRI-based methods. It introduces a zero-shot EEG-to-image reconstruction framework that reaches state-of-the-art results through a tailored EEG encoder and a two-stage generation strategy (a diffusion prior converts \(Z_{EEG}\) into \(Z_{img}\)), overcoming earlier obstacles such as low signal quality and limited model design.
Model
Architecture
EEG Encoder (Channel-wise Transformer + Synthesis Module):
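A minimal PyTorch sketch of this kind of encoder: channel-wise self-attention treats each EEG channel's time course as one token, and a small synthesis head maps the pooled representation into the embedding space. Layer sizes, channel/timepoint counts, and module names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ChannelwiseEEGEncoder(nn.Module):
    """Sketch of a channel-wise Transformer EEG encoder.

    Assumed input shape: (batch, n_channels, n_timepoints).
    Self-attention runs across EEG channels (each channel's time course
    is one token); a small "synthesis" head then maps the pooled
    representation into the target embedding space (Z_EEG).
    """

    def __init__(self, n_channels=63, n_timepoints=250, embed_dim=1024,
                 n_heads=8, n_layers=2):
        super().__init__()
        # Each channel's raw time series is linearly projected to a token.
        self.token_proj = nn.Linear(n_timepoints, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.channel_transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        # "Synthesis module": pool over channels and project to Z_EEG.
        self.synthesis = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Linear(embed_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x):                      # x: (B, C, T)
        tokens = self.token_proj(x)            # (B, C, D), one token per channel
        tokens = self.channel_transformer(tokens)
        pooled = tokens.mean(dim=1)            # average over channels
        return self.synthesis(pooled)          # Z_EEG in R^D
```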
Training for image reconstruction uses a weighted sum of two losses (see the sketch after this list):
- a contrastive loss aligning the EEG encoder output with the CLIP visual encoder output (\(R^{1024}\) in ViT-L/14)
- an MSE loss that regresses EEG features onto the image's VAE latent
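A hedged sketch of that weighted objective, assuming a CLIP-style symmetric InfoNCE contrastive term plus a simple MSE term against the image's VAE latent; the weight `alpha` and the temperature are placeholders, not the paper's values.

```python
import torch
import torch.nn.functional as F

def training_loss(z_eeg, z_clip, z_vae_pred, z_vae, temperature=0.07, alpha=0.9):
    """Weighted sum of a contrastive (EEG <-> CLIP image) loss and an MSE
    loss against the image's VAE latent. `alpha` is an assumed weight."""
    # Symmetric InfoNCE over the batch, as in CLIP.
    z_eeg = F.normalize(z_eeg, dim=-1)
    z_clip = F.normalize(z_clip, dim=-1)
    logits = z_eeg @ z_clip.t() / temperature           # (B, B) similarity matrix
    labels = torch.arange(z_eeg.shape[0], device=z_eeg.device)
    contrastive = 0.5 * (F.cross_entropy(logits, labels) +
                         F.cross_entropy(logits.t(), labels))
    # Regress the EEG-derived prediction onto the image's VAE latent.
    mse = F.mse_loss(z_vae_pred, z_vae)
    return alpha * contrastive + (1 - alpha) * mse
```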
Two-stage EEG-guided image generation pipeline using conditional diffusion models (cf. DALL·E 2):
- Stage I – EEG-Conditioned Diffusion: EEG embeddings condition a diffusion model (a U-Net trained with classifier-free guidance) to generate CLIP image embeddings; a sketch follows this list.
- Stage II – Image Synthesis: The generated CLIP embedding guides image generation with pre-trained SDXL (Stable Diffusion XL) and IP-Adapter (an image-prompt adapter whose weights condition SD on image embeddings), optionally accelerated with SDXL-Turbo; a sketch also follows this list.
- Low-Level Pipeline: To recover fine visual details (e.g., contours, posture), EEG features are aligned with VAE latents. Only the latent-space loss works reliably; full image-level losses are unstable and memory-intensive.
- Semantic-Level Pipeline: EEG-derived image features generate captions via GIT, which can optionally guide generation as text prompts. Because of potential semantic drift, this step is not always applied.
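Below is a rough sketch of the Stage I prior in the spirit of the DALL·E 2 diffusion prior: a denoiser predicts the noise added to CLIP image embeddings conditioned on \(Z_{EEG}\), and the condition is randomly dropped during training so classifier-free guidance can be applied at sampling time. The MLP denoiser and crude timestep encoding are simplifications (the paper conditions a U-Net); all names and hyperparameters here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingPriorDenoiser(nn.Module):
    """Toy denoiser for the Stage I prior: predicts the noise added to a
    CLIP image embedding, conditioned on Z_EEG and the timestep.
    (An MLP stand-in for the paper's U-Net, kept small for brevity.)"""

    def __init__(self, dim=1024, hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim * 2 + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, noisy_z_img, z_eeg, t):
        t_emb = t.float().unsqueeze(-1) / 1000.0          # crude timestep encoding
        return self.net(torch.cat([noisy_z_img, z_eeg, t_emb], dim=-1))


def prior_training_step(model, z_img, z_eeg, alphas_cumprod, p_uncond=0.1):
    """One denoising-loss step; the EEG condition is dropped with prob
    p_uncond so classifier-free guidance can be used at inference.

    Example schedule (assumed), 1000 steps:
        betas = torch.linspace(1e-4, 0.02, 1000)
        alphas_cumprod = torch.cumprod(1 - betas, dim=0)
    """
    B = z_img.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=z_img.device)
    a = alphas_cumprod[t].unsqueeze(-1)                    # (B, 1)
    noise = torch.randn_like(z_img)
    noisy = a.sqrt() * z_img + (1 - a).sqrt() * noise      # forward diffusion
    drop = (torch.rand(B, 1, device=z_img.device) < p_uncond).float()
    eps = model(noisy, z_eeg * (1 - drop), t)              # zero out condition when dropped
    return F.mse_loss(eps, noise)
```

At sampling time, classifier-free guidance combines the two predictions as \(\epsilon = \epsilon_{\varnothing} + s\,(\epsilon_{c} - \epsilon_{\varnothing})\) with guidance scale \(s\), which is what lets the EEG condition steer the generated CLIP embedding.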
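And a sketch of Stage II using the public diffusers interface: SDXL-Turbo plus IP-Adapter, fed directly with the CLIP image embedding produced by the prior. The checkpoint names are the publicly released ones (not necessarily those used in the paper), and the `ip_adapter_image_embeds` argument and its expected tensor layout follow recent diffusers releases, so treat this as an assumption to check against your installed version.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load SDXL-Turbo and attach IP-Adapter weights (public Hugging Face
# checkpoints; the paper's exact checkpoints may differ).
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)
pipe.set_ip_adapter_scale(0.8)  # how strongly the image embedding steers generation

# z_img: CLIP image embedding produced by the Stage I prior from Z_EEG,
# shape (1, emb_dim). Random placeholder here; recent diffusers versions
# accept precomputed embeddings via ip_adapter_image_embeds (a list with
# one tensor per adapter) when classifier-free guidance is disabled.
z_img = torch.randn(1, 1280, dtype=torch.float16, device="cuda")  # placeholder

image = pipe(
    prompt="",                                  # no text prompt; EEG embedding only
    ip_adapter_image_embeds=[z_img.unsqueeze(1)],
    num_inference_steps=1,                      # SDXL-Turbo runs in 1-4 steps
    guidance_scale=0.0,                         # Turbo is run without CFG
).images[0]
image.save("eeg_reconstruction.png")
```

SDXL-Turbo trades a small amount of quality for 1-4 step sampling, which is why the paper treats it as an optional acceleration of the Stage II synthesis.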