- Programmers
- coding test
- sound-to-image generation
- novel view synthesis
- text2room
- magic clothing
- objectdrop
- 3d gaussian splatting
- text-to-video diffusion
- instructany2pix
- Python
- instructnerf2nerf
- paper review
- Visual Autoregressive
- VirtualTryOn
- DP
- transformer
- 3d generation
- text-to-image diffusion
- dreamfusion
- visiontransformer
- diffusion
- Naver Boostcamp AI Tech 6th
- autoregressive
- 3d editing
- ViT
- BOJ
- 프로그래머스
- coding test (코테)
List: 2024/05 (7 posts)
평범한 필기장
[SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models](https://arxiv.org/abs/2405.00878)
> We are witnessing a revolution in conditional image synthesis with the recent success of large scale text-to-image generation methods. This success also opens up new opportunities in controlling the generation and editing process using multi-modal input. W..

Preview: 1. Introdu..
[InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following](https://arxiv.org/abs/2312.06738)
> The ability to provide fine-grained control for generating and editing visual imagery has profound implications for computer vision and its applications. Previous works have explored extending controllability in two directions: instruction tuning with text..

https://github.com/jack..
[PEEKABOO: Interactive Video Generation via Masked-Diffusion](https://arxiv.org/abs/2312.07509)
> Modern video generation models like Sora have achieved remarkable success in producing high-quality videos. However, a significant limitation is their inability to offer interactive control to users, a feature that promises to open up unprecedented applica..

Preview: 1. Introduction. This paper introduces PEEKABOO. ..
[Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos](https://arxiv.org/abs/2304.01186)
> Generating text-editable and pose-controllable character videos have an imperious demand in creating various digital human. Nevertheless, this task has been restricted by the absence of a comprehensive dataset featuring paired video-pose captions and the g..

Preview: 1. Introduction. Text-to..
[Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance](https://arxiv.org/abs/2403.17377)
> Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques..

Preview: 1. Introduction. Diffusion models ..
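The classifier-free guidance (CFG) mentioned in this abstract combines a conditional and an unconditional noise estimate at each sampling step. A minimal sketch of that combination, with toy stand-in values (the function name and numbers are illustrative; real estimates come from the denoising network, not from this snippet):

```python
import numpy as np

def cfg_noise_estimate(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate the conditional noise
    estimate away from the unconditional one.
    eps_hat = eps_uncond + w * (eps_cond - eps_uncond)."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-in noise estimates; a guidance scale w > 1 pushes the
# combined estimate past the purely conditional one.
eps_u = np.array([0.1, -0.2])
eps_c = np.array([0.3, 0.0])
print(cfg_noise_estimate(eps_u, eps_c, 2.0))
```

With `guidance_scale = 1.0` this reduces to the plain conditional estimate; larger scales trade diversity for fidelity, which is exactly the trade-off the abstract refers to.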
[GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models](https://arxiv.org/abs/2112.10741)
> Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and c..

Preview: 1. Intro..
[Magic Clothing: Controllable Garment-Driven Image Synthesis](https://arxiv.org/abs/2404.09512)
> We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task. Aiming at generating customized characters wearing the target garments with diverse text prompts, the image controll..

Preview: 1. Introduction. Summarizing this paper's main contributions ..