Datasets

| Dataset | Avg Dura | Actors | Emotions | Views | Intensities | Resolution | Clips |
|---|---|---|---|---|---|---|---|
| SAVEE | 7min21s | 4 | 7 | 1 | 1 | 1280*1024 | 480 |
| RAVDESS | 3min42s | 24 | 8 | 1 | 2 | 1920*1080 | 7356 |
| GRID | 18min54s | 34 | 1 | – | – | 720*576 | 34000 |
| Lombard | 4min1s | 54 | – | 2 | – | 854*480 | 10800 |
| CREMA-D | – | 91 | 6 | 1 | 3 (1/12) | 1280*720 | 7442 |
| MEAD | 38min57s | 60 | 8 | 7 | 3 | 1920*1080 | 281400 |
| Mocap | – | 14 | 1 | 1 | – | 3D blendshape | 865 (English) / 925 (Chinese) |

Papers

Each entry below lists the paper name, affiliation, year, notes (dataset / model / loss / input), and links.

EXPRESSIVE SPEECH-DRIVEN FACIAL ANIMATION WITH CONTROLLABLE EMOTIONS
(3D)
Tsinghua University, University of Wellington, 2023
link

SPACEx: Speech-driven Portrait Animation with Controllable Expression
(emotion-feature control with controllable intensity; worth referencing)
NVIDIA, 2022
Datasets: VoxCeleb2, RAVDESS, MEAD
Model: Speech2Landmarks + pose generation + Landmarks2Latents + FiLM (emotion control) + face-vid2vid generator (worth borrowing; see the FiLM sketch below)
Input: image + audio + emotion feature
link
Demo1
Demo2
Demo3

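For context, a minimal PyTorch sketch of FiLM-style emotion conditioning as listed in the model: an emotion embedding predicts a per-channel scale and shift applied to intermediate features, so rescaling the embedding modulates expression intensity. All names and dimensions (EmotionFiLM, emb_dim, feat_dim) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class EmotionFiLM(nn.Module):
    """Feature-wise Linear Modulation: an emotion embedding predicts a
    per-channel scale (gamma) and shift (beta) for intermediate features."""
    def __init__(self, emb_dim: int, feat_dim: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(emb_dim, 2 * feat_dim)

    def forward(self, feats: torch.Tensor, emotion_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, feat_dim) per-frame features; emotion_emb: (B, emb_dim)
        gamma, beta = self.to_gamma_beta(emotion_emb).chunk(2, dim=-1)
        return gamma.unsqueeze(1) * feats + beta.unsqueeze(1)

film = EmotionFiLM(emb_dim=64, feat_dim=256)
feats = torch.randn(2, 30, 256)   # 30 frames of landmark/latent features
emotion = torch.randn(2, 64)      # emotion embedding; scaling it scales intensity
out = film(feats, emotion)        # (2, 30, 256)
```
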
Expressive Talking Head Generation with Granular Audio Visual Control
(emotion controlled via a reference video)
Baidu, CUHK, 2022
Datasets: VoxCeleb2, MEAD
Model: ID encoder + pose encoder + emotional encoder + content encoder + audio encoder + generator G; image segmentation + landmark detection + masking
Loss: contrastive loss L_c (enforces consistency between the video- and audio-feature distributions; see the sketch below) + L_l1 + L_vgg + L_GAN
Input: image + pose video + emotion video + content video/audio
link

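The contrastive term L_c could look like the following InfoNCE-style sketch, one common way to enforce the video/audio feature consistency described above; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(video_feats: torch.Tensor, audio_feats: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over a batch: the i-th video clip and i-th audio clip form a
    positive pair; all other pairings in the batch act as negatives."""
    v = F.normalize(video_feats, dim=-1)   # (B, D)
    a = F.normalize(audio_feats, dim=-1)   # (B, D)
    logits = v @ a.t() / temperature       # (B, B) cosine similarities
    targets = torch.arange(v.size(0), device=v.device)
    # symmetric: match video->audio and audio->video
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```
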
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model
(emotion controlled via a reference video; worth referencing)
Nanjing University, CUHK, University of Sydney, Monash University, SenseTime, Tsinghua University, 2022
Datasets: LRW, MEAD (2020), RAVDESS, CFD, CREMA-D
Model:
Keypoint detector E_k (from the First Order Motion Model)
Audio2Facial-Dynamics (A2FD) module: image encoder E_I, audio encoder E_a, pose-sequence encoder E_p, LSTM decoder, flow estimator F, image generator G
Implicit Emotion Displacement Learner: emotion extractor E_e, displacement predictor P_d (see the sketch below)
Loss:
A2FD module: keypoint loss L_kp, perceptual loss L_per
Implicit Emotion Displacement Learner: L_kp
Input: image + audio + predefined head-pose sequence + emotion video
link
code

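To make the implicit-emotion-displacement idea concrete, a hedged sketch under assumed shapes: P_d predicts per-keypoint offsets from an emotion feature and adds them to the audio-driven (emotion-agnostic) keypoints before the flow estimator consumes them. The class name, hidden size, and keypoint count are hypothetical.

```python
import torch
import torch.nn as nn

class DisplacementPredictor(nn.Module):
    """Predicts per-keypoint 2D offsets from an emotion feature, applied on
    top of the audio-driven keypoints from the A2FD module."""
    def __init__(self, emo_dim: int, n_kp: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emo_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_kp * 2),
        )
        self.n_kp = n_kp

    def forward(self, neutral_kp: torch.Tensor, emo_feat: torch.Tensor) -> torch.Tensor:
        # neutral_kp: (B, T, n_kp, 2); emo_feat: (B, emo_dim)
        disp = self.net(emo_feat).view(-1, 1, self.n_kp, 2)
        return neutral_kp + disp  # emotional keypoints for the flow estimator

pred = DisplacementPredictor(emo_dim=64, n_kp=10)
kp = torch.randn(2, 30, 10, 2)
emo = torch.randn(2, 64)
out = pred(kp, emo)  # (2, 30, 10, 2)
```
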
Audio-Driven Emotional Video Portraits
(emotion controlled via emotional audio; worth referencing)
Nanjing University, SenseTime, CUHK, NTU, Tsinghua University, 2021
Datasets: MEAD (emotion), LRW (pre-training the content module)
Model: Cross-Reconstructed Emotion Disentanglement + Audio-to-Landmark + 3D-Aware Keypoint Alignment + Edge-to-Video
Alignment: 3D-Aware Keypoint Alignment
Loss:
1. Cross-reconstructed emotion disentanglement: cross-reconstruction loss + self-reconstruction loss + classification loss + content loss (see the sketch below)
2. Edge-to-Video translation network
Input: emotional audio + driving video
link
code

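A sketch of the cross-reconstruction and self-reconstruction terms, assuming user-provided content/emotion encoders E_c and E_e, a decoder D, and training pairs x12/x21 that combine the content of one clip with the emotion of the other; the paper additionally uses classification and content losses not shown here.

```python
import torch
import torch.nn.functional as F

def cross_reconstruction_loss(E_c, E_e, D, x1, x2, x12, x21):
    """x1, x2: clips carrying (content1, emotion1) and (content2, emotion2).
    x12: target with content1 + emotion2; x21: target with content2 + emotion1.
    E_c / E_e: content / emotion encoders; D: decoder taking both codes."""
    c1, c2 = E_c(x1), E_c(x2)
    e1, e2 = E_e(x1), E_e(x2)
    # cross reconstruction: recombine content and emotion across clips
    l_cross = F.l1_loss(D(c1, e2), x12) + F.l1_loss(D(c2, e1), x21)
    # self reconstruction: a clip's own codes must reproduce it
    l_self = F.l1_loss(D(c1, e1), x1) + F.l1_loss(D(c2, e2), x2)
    return l_cross + l_self
```
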
Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation
(3D; text-driven blendshapes with emotion labels)
NetEase Fuxi AI Lab, University of Sydney, 2021
Dataset: Mocap
Model:
Speaker-independent stage: G_hed, G_upp, G_mou
Speaker-specific stage: G_ldmk, G_vid
Face and background mask generation
Loss:
Speaker-independent: L_1 + L_adv + L_ssim (see the sketch below)
Speaker-specific: L_adv + L_perc + L_img + L_face
Input: text + emotion label + driving video
link
demo

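One plausible way to assemble the speaker-independent objective L_1 + L_adv + L_ssim, treating the stage outputs as images purely for illustration; the loss weights and the pytorch_msssim dependency are assumptions, not from the paper.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # assumed third-party dependency

def speaker_independent_loss(pred, target, disc_logits_fake,
                             w_l1=10.0, w_adv=1.0, w_ssim=1.0):
    """pred/target: (B, C, H, W) outputs in [0, 1];
    disc_logits_fake: discriminator logits on generated outputs."""
    l1 = F.l1_loss(pred, target)
    # non-saturating adversarial term for the generator
    l_adv = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    l_ssim = 1.0 - ssim(pred, target, data_range=1.0)
    return w_l1 * l1 + w_adv * l_adv + w_ssim * l_ssim
```
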
3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head
(3D)
Prof. Xia, Institute of Computing Technology (ICT), CAS, 2021
Data: RAVDESS, LBG
Model: DeepSpeech + VOCA; 3DMM + FaceScape model (see the sketch below)
Input: audio + 3D mesh
link

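A rough sketch of a DeepSpeech+VOCA-style regressor: per-frame audio features (DeepSpeech emits 29-dimensional character logits) regress per-vertex offsets added to a neutral template mesh (FLAME meshes, as used by VOCA, have 5023 vertices). The module name and hidden size are illustrative.

```python
import torch
import torch.nn as nn

class AudioToMeshOffsets(nn.Module):
    """Regress per-vertex 3D offsets from DeepSpeech-style audio features
    and add them to a neutral template mesh (VOCA-style)."""
    def __init__(self, audio_dim=29, n_vertices=5023, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_vertices * 3),
        )
        self.n_vertices = n_vertices

    def forward(self, audio_feat, template):
        # audio_feat: (B, audio_dim) per-frame features; template: (n_vertices, 3)
        offsets = self.net(audio_feat).view(-1, self.n_vertices, 3)
        return template.unsqueeze(0) + offsets

model = AudioToMeshOffsets()
template = torch.zeros(5023, 3)
verts = model(torch.randn(2, 29), template)  # (2, 5023, 3)
```
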
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis
(3D; emotion controlled via a style video)
Tsinghua University, HiSilicon, 2021
Dataset: Ted-HD (834 clips)
Model: deep 3D reconstruction + DeepSpeech + ResNet encoder-decoder + U-Net + pix2pixHD
Input: image + style video + audio
link

Speech Driven Talking Face Generation from a Single Image and an Emotion Condition
(emotion-label control)
University of Rochester, 2021
Dataset: CREMA-D
Model: image encoder + speech encoder + noise encoder + emotion encoder (see the conditioning sketch below)
Loss: mouth-region mask loss + perceptual loss + frame-discriminator loss + emotion-discriminator loss
Input: image + audio + emotion label (one-hot)
link
code

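A minimal sketch of the one-hot emotion conditioning implied by the input list: the label is embedded and concatenated with the image, speech, and noise codes before decoding. Dimensions and names are illustrative; the six categories match CREMA-D's emotion set.

```python
import torch
import torch.nn as nn

class EmotionConditionedFusion(nn.Module):
    """Embed a one-hot emotion label and fuse it with the other latent codes."""
    def __init__(self, n_emotions=6, emo_dim=16, img_dim=128, spe_dim=128, noise_dim=8):
        super().__init__()
        self.emo_embed = nn.Linear(n_emotions, emo_dim)
        self.fuse = nn.Linear(img_dim + spe_dim + noise_dim + emo_dim, 256)

    def forward(self, img_code, speech_code, noise_code, emotion_onehot):
        e = self.emo_embed(emotion_onehot.float())
        z = torch.cat([img_code, speech_code, noise_code, e], dim=-1)
        return self.fuse(z)  # latent fed to the frame decoder

fusion = EmotionConditionedFusion()
one_hot = torch.eye(6)[torch.tensor([2, 5])]   # two emotion labels as one-hot rows
z = fusion(torch.randn(2, 128), torch.randn(2, 128),
           torch.randn(2, 8), one_hot)         # (2, 256)
```
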
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation
(dataset)
SenseTime, Carnegie Mellon University, Chinese Academy of Sciences, NTU, 2020
Dataset publicly available: link
code

Realistic Speech-Driven Facial Animation with GANs
(speech-driven; the datasets include emotion)
Samsung AI Centre, Cambridge, UK, 2019
Datasets: GRID, TCD-TIMIT, CREMA-D
Model: identity encoder + content encoder + noise generator + frame decoder
Loss: frame-discriminator loss + sequence-discriminator loss + synchronization-discriminator loss (see the sketch below)
Input: audio + image
link

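A hedged sketch of how the generator could combine the three adversarial terms, assuming user-supplied frame, sequence, and synchronization discriminators that return logits; equal weighting is an assumption.

```python
import torch
import torch.nn.functional as F

def adv_term(logits):
    # non-saturating GAN loss: push the discriminator's logits toward "real"
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

def generator_adv_loss(frame_d, seq_d, sync_d, fake_frames, fake_video, audio):
    """frame_d scores individual frames, seq_d scores the full clip,
    sync_d scores audio-video synchronization; each returns logits."""
    return (adv_term(frame_d(fake_frames)) +
            adv_term(seq_d(fake_video)) +
            adv_term(sync_d(fake_video, audio)))
```
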
Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks
University of Texas, 2018
Learns the relationship between emotion and lip movements with a conditional sequential GAN.
link

ExprGAN: Facial Expression Editing with Controllable Expression Intensity
University of Maryland, 2017
link
