Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

🔥INFO

Blog: 2025/07/30 by IgniSavium

Title: Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion
Authors: Dongyang Li et al. (Southern University of Science and Technology)
Published: March 2024
Comment: NIPS2024
URL: https://arxiv.org/abs/2403.07721

🥜TLDR: diffusion prior + SDXL-Turbo + IP-Adapter

Motivation

This paper aims to address the limited practicality and performance gap of EEG-based visual decoding compared to fMRI-based methods by introducing a novel zero-shot EEG-to-image reconstruction framework that achieves state-of-the-art results through a tailored encoder and two-stage generation (prior diffusion to convert \(Z_{EEG}\) to \(Z_{img}\)) strategy, overcoming previous challenges like low signal quality and limited model design.

Model

Architecture

EEG Encoder (Channel-wise Transformer + Synthesis Module):

Image Reconstruction training uses weighted loss of these two:

contrastive loss between EEG-Encoder and CLIP Visual Encoder (\(R^{1024}\) in ViT-14/Large)
MSE to encoder EEG into image's VAE latent

Two-stage EEG-guided image generation pipeline using conditional diffusion models (ref. DALLE-2):

Stage I – EEG-Conditioned Diffusion: EEG embeddings condition a diffusion model to generate CLIP image embeddings using a U-Net and classifier-free guidance.

Stage II – Image Synthesis: The generated CLIP embedding guides image generation using pre-trained SDXL(Stable Diffusion) and IP-Adapter (ImagePrompt: adapter weights for image-embedding-conditioned SD), with optional acceleration via SDXL-Turbo.
Low-Level Pipeline: To recover fine visual details (e.g., contours, posture), EEG featurevs are aligned with VAE latents. Only latent-space loss works reliably; full image-level losses are unstable and memory-intensive.
Semantic-Level Pipeline: EEG-derived image features generate captions via GIT, optionally guiding generation via text prompts. Due to potential semantic drift, this step is not always applied.

Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

🔥INFO

Motivation

Model

Architecture

Evaluation

Overall Performance

Reconstruction Performance

Temporal and Spatial Interpretability

Retrieval and Classification Performance