kinglier Tech Storage Box / Computer Vision 2022-03-01 2023-01-05

papers & code

2023

title	venue	paper	code	dataset	keywords
Audio-Visual Face Reenactment	WACV(23)	paper	code

2022

title	venue	paper	code	dataset	keywords
Compressing Video Calls using Synthetic Talking Heads	BMVC(22)	paper			application
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model	SIGGRAPH(22)	paper			emotion
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis	ECCV(22)	paper	code
Expressive Talking Head Generation with Granular Audio-Visual Control	CVPR(22)	paper	–
Talking Face Generation With Multilingual TTS	CVPR(22)	paper	code	–	–
Deep Learning for Visual Speech Analysis: A Survey	–	paper	–	–	survey
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN	–	paper	code	–	stylegan
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation	ECCV(22)	paper	code(coming soon)		NeRF
Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation	–	paper	–	–	–
SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory	AAAI(22)	paper(temp)	–	LRW, LRS2, BBC News	–
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering		paper			NeRF
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos		paper
Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions		paper
DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation		paper
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion		paper
StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation	–	paper	–	–

2021

title	venue	paper	code	dataset
Parallel and High-Fidelity Text-to-Lip Generation		paper
[Survey] Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis	–	paper	–	–
FaceFormer: Speech-Driven 3D Facial Animation with Transformers	CVPR(22)	paper	code	–
Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices	–	paper	code	–
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning	ICCV	paper	code	–
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis	–	paper	code	–
Audio-Driven Emotional Video Portraits	CVPR	paper	code	MEAD, LRW
LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization	CVPR	paper	–	–
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation	CVPR	paper	code	VoxCeleb2, LRW
Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset	CVPR	paper	code	HDTF
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement	ICCV	paper	code(coming soon)	–
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis	ICCV	paper	code	–
Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation	AAAI	paper	code(coming soon)	Mocap dataset
Visual Speech Enhancement Without A Real Visual Stream	–	paper	–	–
Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary	–	paper	code	–
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion	IJCAI	paper	code	VoxCeleb, GRID, LRW
3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head	–	paper	–	–
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person	–	paper	–	VoxCeleb2, Obama

2020

title	venue	paper	code	dataset
What comprises a good talking-head video generation?: A Survey and Benchmark	–	paper	code	–
Speech Driven Talking Face Generation from a Single Image and an Emotion Condition	–	paper	code	CREMA-D
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild	ACMMM	paper	code	LRS2
Talking-head Generation with Rhythmic Head Motion	ECCV	paper	code	CREMA-D, GRID, VoxCeleb, LRS3
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation	ECCV	paper	code	VoxCeleb2, AffectNet
Neural Voice Puppetry: Audio-driven Facial Reenactment	ECCV	paper	–	–
Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars	ECCV	paper	code	–
HeadGAN: Video-and-Audio-Driven Talking Head Synthesis	–	paper	–	VoxCeleb2
MakeItTalk: Speaker-Aware Talking Head Animation	–	paper	code, code	VoxCeleb2, VCTK
Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose	–	paper	code	ImageNet, FaceWarehouse, LRW
Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks	–	paper	–	–
Speech-Driven Facial Animation Using Polynomial Fusion of Features	–	paper	–	LRW
Animating Face using Disentangled Audio Representations	WACV	paper	–
Everybody’s Talkin’: Let Me Talk as You Want	–	paper	–	–
Multimodal Inputs Driven Talking Face Generation With Spatial-Temporal Dependency	–	paper	–	–

2019

title	venue	paper	code	dataset
Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss	CVPR	paper	code	VGG Face, LRW

datasets

MEAD link
HDTF link
CREMA-D link
VoxCeleb link
LRS2 link
LRW link
GRID link
BIWI link
SAVEE link

metrics

PSNR (peak signal-to-noise ratio)
SSIM (structural similarity index measure)
LMD (landmark distance error)
LRA (lip-reading accuracy)
FID (Fréchet inception distance)
LSE-D (Lip Sync Error – Distance)
LSE-C (Lip Sync Error – Confidence)
LPIPS (Learned Perceptual Image Patch Similarity)
NIQE (Natural Image Quality Evaluator)
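PSNR compares a generated frame with its ground-truth frame through the mean squared error: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. A minimal NumPy sketch, assuming 8-bit frames (so `max_val=255`) of identical shape; the function name `psnr` is illustrative and not taken from any paper's code:

```python
import numpy as np


def psnr(reference: np.ndarray, generated: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two frames of identical shape."""
    if reference.shape != generated.shape:
        raise ValueError("frames must have the same shape")
    # Mean squared error over every pixel (and channel), computed in float64
    # so that uint8 inputs do not wrap around when subtracted.
    diff = reference.astype(np.float64) - generated.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

For a whole video, the per-frame values are typically averaged over all paired frames; higher is better.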
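SSIM combines luminance, contrast and structure terms: SSIM(x, y) = ((2·μx·μy + C1)(2·σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (0.01·L)² and C2 = (0.03·L)² for data range L. The sketch below evaluates this formula once over the whole image, which is a simplifying assumption: the original definition averages it over local windows (an 11×11 Gaussian window with σ = 1.5), and library routines such as `skimage.metrics.structural_similarity` also work on local windows, so numbers from this sketch will not match published results. The name `global_ssim` is hypothetical.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM over the whole image (simplified, not the windowed metric)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    # Stabilising constants from the SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)
```

Identical images score 1.0; the score drops as structure diverges, and a negative covariance (e.g. an inverted image) can push it below zero.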
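LMD measures how far facial landmarks in the generated frames lie from the corresponding landmarks in the ground-truth frames, often restricted to the mouth. A minimal sketch, assuming landmarks were already extracted by some face-landmark detector into arrays of shape (frames, points, 2) in a shared coordinate frame; `MOUTH_68` follows the common 68-point annotation scheme, where 0-based indices 48 to 67 cover the mouth. The function and constant names are hypothetical.

```python
import numpy as np

# Mouth points in the 68-point landmark convention (0-based indices 48..67).
MOUTH_68 = slice(48, 68)


def landmark_distance(pred, gt, points=slice(None)) -> float:
    """Mean Euclidean distance between corresponding landmarks.

    pred, gt: arrays of shape (num_frames, num_points, 2).
    points: which landmark indices to compare (all of them by default).
    """
    pred = np.asarray(pred, dtype=np.float64)[:, points]
    gt = np.asarray(gt, dtype=np.float64)[:, points]
    if pred.shape != gt.shape or pred.shape[-1] != 2:
        raise ValueError("expected matching (frames, points, 2) arrays")
    # Distance per frame and per point, then averaged over both axes.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```

Lower is better. Papers can differ in whether they align or normalize faces first and in which points they use, so LMD values are only comparable under the same protocol.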
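FID compares the distribution of generated frames with that of real frames in a feature space (conventionally Inception-v3 pooling features) by fitting a Gaussian to each set: FID = ‖μr − μg‖² + Tr(Σr + Σg − 2(Σr·Σg)^(1/2)). The sketch below covers only the Fréchet distance step and assumes feature vectors have already been extracted. Instead of a general matrix square root, it uses the fact that Tr((Σr·Σg)^(1/2)) equals the sum of the square roots of the eigenvalues of Σr·Σg, which are non-negative for covariance matrices. Reference tools such as pytorch-fid compute the square root with `scipy.linalg.sqrtm` and use a fixed Inception checkpoint, so reportable numbers should come from such a tool; `frechet_distance` here is a hypothetical name.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: arrays of shape (num_samples, feature_dim).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    # atleast_2d keeps the 1-D feature case working, where np.cov returns a scalar.
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # Tr((cov_r @ cov_g)^(1/2)) = sum of sqrt(eigenvalues of cov_r @ cov_g);
    # those eigenvalues are real and >= 0 in exact arithmetic, so clip round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = float(np.sqrt(np.clip(eigvals.real, 0.0, None)).sum())
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

Shifting every feature by a constant vector leaves the covariances unchanged, so the distance reduces to the squared norm of the shift; lower FID is better.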
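LSE-D and LSE-C were introduced with Wav2Lip ("A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild", listed in the 2020 table) and are computed from a pretrained SyncNet's per-frame video and audio embeddings. A sketch, assuming the offset search of the commonly used SyncNet evaluation script: for each audio/video offset the embedding distance is averaged over frames; LSE-D is the smallest of these mean distances (lower is better) and LSE-C is their median minus that minimum (higher is better). The embeddings are assumed inputs (SyncNet itself is not included), the ±15-frame window is an assumed default, and `lse_scores` is a hypothetical name.

```python
import numpy as np


def lse_scores(video_emb: np.ndarray, audio_emb: np.ndarray, vshift: int = 15):
    """Return (LSE-D, LSE-C, offset) from per-frame SyncNet-style embeddings.

    video_emb, audio_emb: arrays of shape (num_frames, dim), assumed to come
    from a pretrained SyncNet, which is not part of this sketch.
    """
    video_emb = np.asarray(video_emb, dtype=np.float64)
    num_frames = video_emb.shape[0]
    win = 2 * vshift + 1
    # Zero-pad the audio in time so every video frame can be compared with
    # the audio frames from -vshift to +vshift around it.
    padded = np.pad(np.asarray(audio_emb, dtype=np.float64), ((vshift, vshift), (0, 0)))
    dists = np.empty((num_frames, win))
    for i in range(num_frames):
        dists[i] = np.linalg.norm(padded[i:i + win] - video_emb[i], axis=1)
    # Mean distance over all frames for each candidate offset.
    mean_per_offset = dists.mean(axis=0)
    best = int(np.argmin(mean_per_offset))
    lse_d = float(mean_per_offset[best])
    lse_c = float(np.median(mean_per_offset) - lse_d)
    offset = vshift - best  # sign convention assumed from that script
    return lse_d, lse_c, offset
```

With perfectly aligned embeddings the best offset is 0 and LSE-D is 0; published scores average these per-video values over the test set.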
