[A Simple Paper Summary by an Ordinary Grad Student] Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts (ACMMM 2025)


평범한 필기장 (Ordinary Notebook)


junseok-rh 2025. 9. 23. 18:13

Paper : https://arxiv.org/abs/2504.12782


Abstract

Existing anchor-free methods disrupt the sampling trajectory and cause visual artifacts, while anchor-based methods depend on the choice of anchor concepts. To overcome both problems, the paper introduces ANT, which Automatically guides deNoising Trajectories.

1. Method

1.1 Insights into the Denoising Process

The paper finds that applying CFG during the early sampling stage and then reversing the condition direction term of CFG during the mid-to-late sampling stage changes the detailed content while preserving the image's fundamental structure. In other words, samples are kept from converging to the unwanted concept while still staying on the natural image manifold.

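In this guidance rule, $\mathbf{\delta}(\mathbf{c}) = \epsilon_\theta(z_t,t,\mathbf{c}) - \epsilon_\theta(z_t,t)$ is the condition direction term, i.e. the difference between the conditional and unconditional noise predictions.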
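The guidance rule itself appears only as an image in the original post. A reconstruction consistent with the description above, assuming timesteps count down from $T$ to $0$, a guidance scale $w$, and a switch point $t^\prime$ (the paper's exact notation and boundary convention may differ):

```latex
\tilde{\epsilon}_\theta(z_t, t, \mathbf{c}) =
\begin{cases}
\epsilon_\theta(z_t, t) + w\,\mathbf{\delta}(\mathbf{c}), & t > t^\prime \quad \text{(early stage: standard CFG)} \\
\epsilon_\theta(z_t, t) - w\,\mathbf{\delta}(\mathbf{c}), & t \le t^\prime \quad \text{(mid-to-late stage: reversed condition term)}
\end{cases}
```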

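As the figure shows, when $t^\prime$ is set appropriately, specific features and details are removed while the naturalness of the generated image is preserved. This is because, during the early stage of denoising, samples follow the correct score function and are guided onto a plausible data manifold; in the later stage, the guidance steers the sample away from a specific mode within that manifold. If $t^\prime$ comes too early, the image loses its structural integrity; if it comes too late, the sample has already entered the concept-specific mode, so only fine details are affected.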
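A minimal, framework-free sketch of this switched guidance, to make the role of $t^\prime$ concrete. The `Denoiser` interface, the function names, the toy denoiser, and the convention that timesteps count down with the early stage at `t > t_switch` (`t_switch` standing in for $t^\prime$) are all assumptions for illustration, not the paper's code.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical denoiser interface: (z_t, t, cond) -> predicted noise; cond=None is unconditional.
Denoiser = Callable[[Vector, int, Optional[str]], Vector]


def switched_guidance(eps: Denoiser, z_t: Vector, t: int, cond: str,
                      w: float, t_switch: int) -> Vector:
    """CFG for t > t_switch, reversed condition term for t <= t_switch.

    Timesteps are assumed to count down from T (pure noise) to 0.
    """
    eps_uncond = eps(z_t, t, None)
    eps_cond = eps(z_t, t, cond)
    # delta(c) = eps(z_t, t, c) - eps(z_t, t): the condition direction term.
    delta = [c - u for c, u in zip(eps_cond, eps_uncond)]
    # Early stage pushes toward the condition; mid-to-late stage pushes away from it.
    sign = 1.0 if t > t_switch else -1.0
    return [u + sign * w * d for u, d in zip(eps_uncond, delta)]


def guidance_schedule(timesteps: List[int], t_switch: int) -> List[str]:
    """Label each sampling step as 'cfg' or 'flipped' under the same convention."""
    return ["cfg" if t > t_switch else "flipped" for t in timesteps]


if __name__ == "__main__":
    # Toy denoiser standing in for a real UNet.
    toy = lambda z, t, c: [1.0, 2.0] if c is not None else [0.5, 0.5]
    print(switched_guidance(toy, [0.0, 0.0], 900, "concept", 2.0, 500))  # [1.5, 3.5]
    print(switched_guidance(toy, [0.0, 0.0], 100, "concept", 2.0, 500))  # [-0.5, -2.5]
    print(guidance_schedule([999, 750, 500, 250, 0], 500))
```

Moving `t_switch` toward `T` flips more steps (risking the loss of structure described above); moving it toward `0` flips only the last few steps, which can only touch fine details.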

1.2 Trajectory-Aware Loss Function

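As discussed in the previous section, the adjustment should be confined to the score function field of the mid-to-late stages. This way, even when the finetuned model is conditioned on the erased concept, samples still converge onto an appropriate manifold. The paper's finetuning objective, Eq. (3), combines four terms, described below.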

In Eq. (3), $\theta$ denotes the parameters being finetuned and $\theta^*$ the parameters of the original model. At each iteration, $t_1$ and $t_2$ are sampled and the whole loss is computed at once.
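A minimal sketch of the per-iteration timestep draw, assuming uniform sampling within each stage, timesteps running from `num_train_steps - 1` (pure noise) down to `0`, and the early stage defined as `t > t_switch` with `t_switch` playing the role of $t^\prime$. The function name and the default of 1000 training steps are hypothetical.

```python
import random
from typing import Tuple


def sample_timestep_pair(rng: random.Random, t_switch: int,
                         num_train_steps: int = 1000) -> Tuple[int, int]:
    """Draw an early-stage t1 (for L_preserve) and a mid-to-late-stage t2 (for L_erase).

    Assumes t > t_switch is the early stage and both draws are uniform in their stage.
    """
    if not 0 <= t_switch < num_train_steps - 1:
        raise ValueError("t_switch must leave at least one early-stage timestep")
    t1 = rng.randrange(t_switch + 1, num_train_steps)  # early stage
    t2 = rng.randrange(0, t_switch + 1)                # mid-to-late stage
    return t1, t2


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_timestep_pair(rng, t_switch=500))
```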

Early-stage preservation

The first term, $\mathcal{L}_{preserve}$, keeps the predicted conditional score function pointing toward the natural data modes during the early stage, which maintains the integrity of the early-stage score function field. As a result, when the finetuned model is sampled with the erased concept as the condition, the generated samples move smoothly toward the natural image manifold.

Mid-to-late-stage erasure

The second term, $\mathcal{L}_{erase}$, focuses on steering the predicted conditional score function away from the undesirable mode during the later stage. This differs from the ESD loss, which applies its objective at every timestep.
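To show where the two objectives diverge, here is a sketch of the regression targets built from the frozen model's predictions. The ESD target (negative guidance at every timestep) is standard; the ANT-style target is an assumption that mirrors the sampling rule of §1.1 (keep the CFG direction early, reverse it later) with a hypothetical scale `eta`. The paper's exact targets may differ.

```python
from typing import List

Vector = List[float]


def guided_target(eps_uncond_frozen: Vector, eps_cond_frozen: Vector,
                  eta: float, sign: float) -> Vector:
    """Frozen-model target eps*(z_t, t) + sign * eta * delta*(c)."""
    return [u + sign * eta * (c - u) for u, c in zip(eps_uncond_frozen, eps_cond_frozen)]


def ant_target(eps_uncond_frozen: Vector, eps_cond_frozen: Vector,
               eta: float, t: int, t_switch: int) -> Vector:
    """Assumed ANT-style target: CFG direction early (L_preserve), reversed later (L_erase)."""
    sign = 1.0 if t > t_switch else -1.0
    return guided_target(eps_uncond_frozen, eps_cond_frozen, eta, sign)


def esd_target(eps_uncond_frozen: Vector, eps_cond_frozen: Vector, eta: float) -> Vector:
    """ESD-style target: negative guidance regardless of the timestep."""
    return guided_target(eps_uncond_frozen, eps_cond_frozen, eta, -1.0)


if __name__ == "__main__":
    u, c = [0.0, 1.0], [1.0, 1.0]
    print(ant_target(u, c, 1.0, t=900, t_switch=500))  # early: [1.0, 1.0]
    print(ant_target(u, c, 1.0, t=100, t_switch=500))  # late:  [-1.0, 1.0]
    print(esd_target(u, c, 1.0))                       # any t: [-1.0, 1.0]
```

The two targets only disagree at early timesteps, which is exactly where, per the post, pushing away from the concept disrupts the sampling trajectory.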

Unconditional score function preservation

Because the unconditional score function $\epsilon_\theta(z_t,t)$ represents a general direction toward the approximate center of all data modes, modifying it can affect many concepts. To address this, the paper adds the third and fourth terms of Eq. (3), which align the finetuned model's unconditional outputs with those of the original model.
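Putting the four terms together, a minimal sketch of the full objective on flattened noise predictions. Assumptions not confirmed by the post: equal weights for all terms, mean squared error, a guidance-style target with hypothetical scale `eta` for the first two terms, and the names `NoisePreds` and `ant_loss`.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]


@dataclass
class NoisePreds:
    cond: Vector    # eps(z_t, t, c) for the concept being erased
    uncond: Vector  # eps(z_t, t)


def mse(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def ant_loss(student_t1: NoisePreds, frozen_t1: NoisePreds,
             student_t2: NoisePreds, frozen_t2: NoisePreds,
             eta: float) -> Dict[str, float]:
    """Four-term loss; t1 is an early-stage timestep, t2 a mid-to-late-stage one.

    student_* come from the finetuned model (theta), frozen_* from the original (theta*).
    """
    # Term 1 (L_preserve): at t1, follow the frozen model's CFG direction.
    target1 = [u + eta * (c - u) for u, c in zip(frozen_t1.uncond, frozen_t1.cond)]
    # Term 2 (L_erase): at t2, follow the reversed direction.
    target2 = [u - eta * (c - u) for u, c in zip(frozen_t2.uncond, frozen_t2.cond)]
    terms = {
        "preserve": mse(student_t1.cond, target1),
        "erase": mse(student_t2.cond, target2),
        # Terms 3 and 4: unconditional outputs stay equal to the original model's.
        "uncond_t1": mse(student_t1.uncond, frozen_t1.uncond),
        "uncond_t2": mse(student_t2.uncond, frozen_t2.uncond),
    }
    terms["total"] = sum(terms.values())
    return terms


if __name__ == "__main__":
    frozen = NoisePreds(cond=[1.0], uncond=[0.0])
    not_erased = ant_loss(NoisePreds([1.0], [0.0]), frozen,
                          NoisePreds([1.0], [0.0]), frozen, eta=1.0)
    print(not_erased)  # only the "erase" term is non-zero
```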

1.3 The Heavy Hitters Among the Parameters

Rather than finetuning all parameters, the paper selects and finetunes only specific ones, so choosing which parameters to update is critical. For this, the paper proposes a concept-specific saliency map strengthened by prompt and seed augmentation. Because a saliency map varies with the prompt context and the random seed, the paper takes the intersection of many saliency maps.

GPT-4 is used to generate diverse prompts, which are combined with random seeds and then used to compute gradient maps with respect to the model parameters. Thresholding these gradients yields a saliency map for each prompt-seed combination.

The final concept-specific saliency map $\mathbf{M}^*$ is the intersection of these saliency maps.
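A minimal sketch of the thresholding and intersection steps on flattened per-parameter gradients. It assumes the threshold is applied to absolute gradient values with a fixed cutoff; how the gradients are computed (which loss, which layers) and the actual threshold are not given in the post, and the function names and numbers are hypothetical.

```python
from typing import List, Sequence


def saliency_mask(grad: Sequence[float], threshold: float) -> List[bool]:
    """Binary saliency map for one (prompt, seed) pair: |grad| >= threshold."""
    return [abs(g) >= threshold for g in grad]


def intersect_masks(masks: Sequence[Sequence[bool]]) -> List[bool]:
    """M*: keep a parameter only if every per-(prompt, seed) map selects it."""
    if not masks:
        raise ValueError("need at least one saliency map")
    return [all(bits) for bits in zip(*masks)]


if __name__ == "__main__":
    # One flattened gradient vector per (prompt, seed) combination (toy numbers).
    grads = [
        [0.90, 0.10, 0.50],
        [0.80, 0.70, 0.05],
        [0.95, 0.60, 0.40],
    ]
    masks = [saliency_mask(g, threshold=0.3) for g in grads]
    print(intersect_masks(masks))  # [True, False, False]
```

Only the first parameter is salient under every prompt and seed, so it is the only one kept; this is how the intersection filters out parameters that look important only for a particular context.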

Finally, finetuning updates only the parameters selected by $\mathbf{M}^*$.
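A sketch of a mask-restricted update, $\theta \leftarrow \theta - \eta\,(\mathbf{M}^* \odot \nabla_\theta \mathcal{L})$, written with plain SGD for simplicity. The optimizer actually used in the paper may differ (e.g. Adam), and `masked_sgd_step` is a hypothetical name.

```python
from typing import List, Sequence


def masked_sgd_step(params: Sequence[float], grads: Sequence[float],
                    mask: Sequence[bool], lr: float) -> List[float]:
    """One gradient step that only changes parameters selected by the saliency mask M*."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]


if __name__ == "__main__":
    print(masked_sgd_step([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [True, False, True], lr=0.5))
    # [0.5, 2.0, 2.5]: the unmasked middle parameter is untouched
```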

1.4 Boosting the Performance of Multi-Concept Erasure Frameworks

The paper's trajectory-aware loss function can be integrated into MACE, an existing multi-concept erasure framework. To erase a given concept, MACE computes weights in closed form and replaces the original weights with them; the paper skips this stage and instead replaces MACE's attention loss, which is used to finetune a LoRA for each concept, with its trajectory-aware loss. This removes MACE's dependence on the Grounded-SAM model. After training a LoRA for each concept, the paper fuses them into the cross-attention layers with an objective function defined over the following embeddings:

$e^f_j$ : token embeddings associated with the concept to be erased
$e^p_j$ : prior-preservation token embeddings, unrelated to the concept to be erased

This objective finds a single $\mathbf{W}^*$ that integrates the multiple LoRA matrices, and it can be solved in closed form.
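The fusion objective appears only as an image in the original post. Assuming it follows MACE's formulation, $\min_{\mathbf{W}^*} \sum_i \sum_j \lVert \mathbf{W}^* e^f_j - (\mathbf{W} + \Delta\mathbf{W}_i) e^f_j \rVert^2 + \lambda \sum_j \lVert \mathbf{W}^* e^p_j - \mathbf{W} e^p_j \rVert^2$, where $\Delta\mathbf{W}_i$ is the LoRA update for concept $i$ and the inner first sum runs over that concept's embeddings, setting the gradient to zero gives $\mathbf{W}^* = \big(\sum_{i,j} (\mathbf{W}+\Delta\mathbf{W}_i) e^f_j e^{f\top}_j + \lambda \sum_j \mathbf{W} e^p_j e^{p\top}_j\big)\big(\sum_{i,j} e^f_j e^{f\top}_j + \lambda \sum_j e^p_j e^{p\top}_j\big)^{-1}$. The sketch below implements that closed form in pure Python so it runs without dependencies; $\Delta\mathbf{W}_i$, $\lambda$ (`lam`) and the function names are assumptions, and a real implementation would use a linear-algebra library instead of the hand-written inverse.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vec = Sequence[float]


def outer(u: Vec, v: Vec) -> Matrix:
    return [[a * b for b in v] for a in u]


def matvec(m: Matrix, v: Vec) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add_scaled(acc: Matrix, m: Matrix, scale: float = 1.0) -> None:
    for i, row in enumerate(m):
        for j, val in enumerate(row):
            acc[i][j] += scale * val


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("embedding Gram matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fuse_loras(w: Matrix, lora_deltas: Sequence[Matrix],
               erase_embs: Sequence[Sequence[Vec]], preserve_embs: Sequence[Vec],
               lam: float) -> Matrix:
    """Closed-form W* for the assumed MACE-style fusion objective.

    w: original projection (d_out x d_in); lora_deltas[i]: LoRA update for concept i;
    erase_embs[i]: embeddings e^f_j of concept i; preserve_embs: embeddings e^p_j.
    """
    d_out, d_in = len(w), len(w[0])
    target_cross = [[0.0] * d_in for _ in range(d_out)]  # sum of (target output) e^T
    gram = [[0.0] * d_in for _ in range(d_in)]           # weighted sum of e e^T
    for delta, embs in zip(lora_deltas, erase_embs):
        w_i = [[a + b for a, b in zip(rw, rd)] for rw, rd in zip(w, delta)]
        for e in embs:
            add_scaled(target_cross, outer(matvec(w_i, e), e))
            add_scaled(gram, outer(e, e))
    for e in preserve_embs:
        add_scaled(target_cross, outer(matvec(w, e), e), lam)
        add_scaled(gram, outer(e, e), lam)
    return matmul(target_cross, inverse(gram))
```

In a toy 2-D case where the LoRA only changes how the erased embedding $[1, 0]$ is projected, the fused $\mathbf{W}^*$ maps that embedding like the LoRA-adapted weight and the preservation embedding $[0, 1]$ like the original weight.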

2. Experiments

All experiments use Stable Diffusion 1.4 as the base model.

2.1 Erasing NSFW Content

Experiments use the I2P and MS-COCO datasets.

2.2 Erasing Celebrity

Experiments use the 200-celebrity dataset from MACE.

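$H_c = \frac{2}{(1-Acc_e)^{-1} + (Acc_p)^{-1}}$, where $Acc_e$ is the celebrity-classification accuracy on images generated for the erased celebrities and $Acc_p$ the accuracy for the preserved ones. $H_c$ is the harmonic mean of $1-Acc_e$ and $Acc_p$, so higher is better.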
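A small helper for the celebrity metric, computing the harmonic mean of erasure efficacy $1 - Acc_e$ and preservation $Acc_p$, i.e. $2 / ((1-Acc_e)^{-1} + (Acc_p)^{-1})$. It assumes accuracies are given as fractions in $[0, 1]$; returning 0 when either component is 0 matches the limit of the harmonic mean. The function name is hypothetical.

```python
def celebrity_harmonic_mean(acc_erased: float, acc_preserved: float) -> float:
    """H_c = 2 / ((1 - Acc_e)^-1 + Acc_p^-1), with accuracies as fractions in [0, 1]."""
    forget = 1.0 - acc_erased
    if forget <= 0.0 or acc_preserved <= 0.0:
        return 0.0  # harmonic mean tends to 0 when either component does
    return 2.0 / (1.0 / forget + 1.0 / acc_preserved)


if __name__ == "__main__":
    print(celebrity_harmonic_mean(0.0, 1.0))  # perfect erasure and preservation: 1.0
    print(celebrity_harmonic_mean(0.5, 0.5))  # 0.5
```

The harmonic mean penalizes imbalance: a method that erases perfectly but destroys the preserved celebrities (or vice versa) scores near zero.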

2.3 Erasing Art Style

Experiments use the 200-artist dataset from MACE.

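$H_a = CLIP_p - CLIP_e$, where $CLIP_e$ and $CLIP_p$ are the CLIP scores of images generated for the erased and preserved artists, respectively; higher is better.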

2.4 Ablation Study

Take Away

Applying negative guidance at the later steps is the key idea.
It is a bit disappointing that multi-concept erasure simply reuses the MACE approach...
