Awesome talking face generation

papers & codes

2023

| title | venue | paper | code | dataset | keywords |
| --- | --- | --- | --- | --- | --- |
| Audio-Visual Face Reenactment | WACV (23) | paper | code | | |

2022

| title | venue | paper | code | dataset | keywords |
| --- | --- | --- | --- | --- | --- |
| Compressing Video Calls using Synthetic Talking Heads | BMVC (22) | paper | | | application |
| EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model | SIGGRAPH (22) | paper | | | emotion |
| Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis | ECCV (22) | paper | code | | |
| Expressive Talking Head Generation with Granular Audio-Visual Control | CVPR (22) | paper | | | |
| Talking Face Generation With Multilingual TTS | CVPR (22) | paper | code | | |
| Deep Learning for Visual Speech Analysis: A Survey | | paper | | | survey |
| StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN | | paper | code | | StyleGAN |
| Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation | ECCV (22) | paper | code (coming soon) | | NeRF |
| Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation | | paper | | | |
| SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory | AAAI (22) | paper (temp) | | LRW, LRS2, BBC News | |
| DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering | | paper | | | NeRF |
| Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos | | paper | | | |
| Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions | | paper | | | |
| DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation | | paper | | | |
| Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion | | paper | | | |
| StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation | | paper | | | |

2021

| title | venue | paper | code | dataset |
| --- | --- | --- | --- | --- |
| Parallel and High-Fidelity Text-to-Lip Generation | | paper | | |
| [Survey] Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis | | paper | | |
| FaceFormer: Speech-Driven 3D Facial Animation with Transformers | CVPR (22) | paper | code | |
| Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices | | paper | code | |
| FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning | ICCV | paper | code | |
| Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis | | paper | code | |
| Audio-Driven Emotional Video Portraits | CVPR | paper | code | MEAD, LRW |
| LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization | CVPR | paper | | |
| Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation | CVPR | paper | code | VoxCeleb2, LRW |
| Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset | CVPR | paper | code | HDTF |
| MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement | ICCV | paper | code (coming soon) | |
| AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | ICCV | paper | code | |
| Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation | AAAI | paper | code (coming soon) | Mocap dataset |
| Visual Speech Enhancement Without A Real Visual Stream | | paper | | |
| Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary | | paper | code | |
| Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion | IJCAI | paper | code | VoxCeleb, GRID, LRW |
| 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head | | paper | | |
| AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | | paper | | VoxCeleb2, Obama |

2020

| title | venue | paper | code | dataset |
| --- | --- | --- | --- | --- |
| What comprises a good talking-head video generation?: A Survey and Benchmark | | paper | code | |
| Speech Driven Talking Face Generation from a Single Image and an Emotion Condition | | paper | code | CREMA-D |
| A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild | ACMMM | paper | code | LRS2 |
| Talking-head Generation with Rhythmic Head Motion | ECCV | paper | code | CREMA, GRID, VoxCeleb, LRS3 |
| MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation | ECCV | paper | code | VoxCeleb2, AffectNet |
| Neural Voice Puppetry: Audio-driven Facial Reenactment | ECCV | paper | | |
| Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars | ECCV | paper | code | |
| HeadGAN: Video-and-Audio-Driven Talking Head Synthesis | | paper | | VoxCeleb2 |
| MakeItTalk: Speaker-Aware Talking Head Animation | | paper | code, code | VoxCeleb2, VCTK |
| Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose | | paper | code | ImageNet, FaceWarehouse, LRW |
| Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks | | paper | | |
| Speech-Driven Facial Animation Using Polynomial Fusion of Features | | paper | | LRW |
| Animating Face using Disentangled Audio Representations | WACV | paper | | |
| Everybody’s Talkin’: Let Me Talk as You Want | | paper | | |
| Multimodal Inputs Driven Talking Face Generation With Spatial-Temporal Dependency | | paper | | |
| Speech Driven Talking Face Generation from a Single Image and an Emotion Condition | | paper | | |

2019

| title | venue | paper | code | dataset |
| --- | --- | --- | --- | --- |
| Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss | CVPR | paper | code | VGG Face, LRW |

datasets

metrics

  • PSNR (peak signal-to-noise ratio)
  • SSIM (structural similarity index measure)
  • LMD (landmark distance error)
  • LRA (lip-reading accuracy) 
  • FID (Fréchet inception distance)
  • LSE-D (Lip Sync Error – Distance)
  • LSE-C (Lip Sync Error – Confidence)
  • LPIPS (Learned Perceptual Image Patch Similarity) 
  • NIQE (Natural Image Quality Evaluator) 
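
As a rough illustration of how two of the simpler metrics above are usually computed, here is a minimal sketch of PSNR and LMD in plain Python. The function names, the flat pixel-list image format, and the unnormalized per-landmark averaging in LMD are assumptions made for this example; individual papers in the list may normalize or average differently.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel sequences."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def landmark_distance(reference, generated):
    """Mean Euclidean distance between corresponding (x, y) landmarks.

    Assumption: no normalization by face size; some papers apply one.
    """
    if len(reference) != len(generated) or not reference:
        raise ValueError("landmark lists must be non-empty and the same length")
    total = sum(math.dist(p, q) for p, q in zip(reference, generated))
    return total / len(reference)


if __name__ == "__main__":
    print(psnr([0, 0], [255, 255]))
    print(landmark_distance([(0, 0), (0, 0)], [(3, 4), (0, 0)]))
```

Higher PSNR and lower LMD indicate a closer match to the reference. The learned metrics (FID, LPIPS, LSE-D, LSE-C) depend on pretrained networks and are not sketched here.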
