[CVPR’25]GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
[CVPR’25]Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
[CVPR’25]ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping
[CVPR’25]Roger: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
[CVPR’25]DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
[CVPR’25]ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping
[CVPR’25]Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
[CVPR’25]UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
[CVPR’25]LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion
[CVPR’25]How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
[CVPR’25]Pose-Guided Temporal Enhancement for Robust Low-Resolution Hand Reconstruction
[CVPR’25]HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
[25/2/4]Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation
[25/2/4]Can You Move These Over There? An LLM-based VR Mover for Supporting Object Manipulation
[25/2/1]RoboGrasp: A Universal Grasping Policy for Robust Robotic Control
[25/2/1]Dexterous Cable Manipulation: Taxonomy, Multi-Fingered Hand Design, and Long-Horizon Manipulation
[25/2/1]Shape from Semantics: 3D Shape Generation from Multi-View Semantics
[25/1/31]SampleLLM: Optimizing Tabular Data Synthesis in Recommendations
[25/1/29]Synthesizing Grasps and Regrasps for Complex Manipulation Tasks
[25/1/29]Hand-Object Contact Detection using Grasp Quality Metrics
[25/1/23]You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations
[25/1/3]TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes
[25/1/1]DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models
[AAAI’25]Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics
[AAAI’25]Diffgrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model
[AAAI’25]RAGG: Retrieval-Augmented Grasp Generation Model
[AAAI’25]Single-view Image to Novel-view Generation for Hand Object Interactions
[AAAI’25]Hand1000: Generating Realistic Hands from Text with Only 1,000 Images
[AAAI’25]HandDiffuse: Generative Controllers for Two-Hand Interactions via Diffusion Models
[AAAI’25]QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects
[AAAI’25]HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation
[AAAI’25]RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance
[AAAI’25]IDseq: Decoupled and Sequentially Detecting and Grounding Multi-modal Media Manipulation
[AAAI’25]PoseLLaVA: Pose Centric Multimodal LLM for Fine-Grained 3D Pose Manipulation
[ICLR’25 8866]Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
[ICLR’25 866]Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild
[ICLR’25 8666]PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
[ICLR’25 8666]Generation and Comprehension Hand-in-Hand: Vision-guided Expression Diffusion for Boosting Referring Expression Generation and Comprehension
[ICLR’25 666]DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
[ICLR’25 6666]Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
[ICLR’25 8888]Data Scaling Laws in Imitation Learning for Robotic Manipulation
[ICLR’25 8665]EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation
[ICLR’25 6666]GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
[NeurIPS’24]HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness
[NeurIPS’24]Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
[NeurIPS’24]Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars
[NeurIPS’24 7655]Grasp as You Say: Language-guided Dexterous Grasp Generation
[NeurIPS’24]Omnigrasp: Simulated Humanoid Grasping on Diverse Objects
[ACM MM’24]HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting
[ACM MM’24]Learning Context with Priors for 3D Interacting Hand-Object Pose Estimation
[ACM MM’24]Decoupling Heterogeneous Features for Robust 3D Interacting Hand Poses Estimation
[ACM MM’24]ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models
[ECCV’24]Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics
[ECCV’24]Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation
[ECCV’24]GraspXL: Generating Grasping Motions for Diverse Objects at Scale
[ECCV’24]Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
[ECCV’24 Oral]UGG: Unified Generative Grasping
[ECCV’24 Oral]SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
[CVPR’24]Dexterous Grasp Transformer
[CVPR’24]Single-View Scene Point Cloud Human Grasp Generation
[CVPR’24]Physics-Aware Hand-Object Interaction Denoising
[CVPR’24]Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
[CVPR’24]BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
[CVPR’24]GEARS: Local Geometry-aware Hand-object Interaction Synthesis
[CVPR’24]HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
[CVPR’24]BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image
[CVPR’24]InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
[CVPR’24]GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation
[CVPR’24]HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
[CVPR’24]OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
[CVPR’24]G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
[CVPR’24]TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
[IJCAI’24]RealDex: Towards Human-like Grasping for Robotic Dexterous Hand
[3DV’25]FastGrasp: Efficient Grasp Synthesis with Diffusion
[CoRL’24]Towards Open-World Grasping with Large Vision-Language Models
[WACV’25]A Versatile and Differentiable Hand-Object Interaction Representation
[RAL’24-12]Bimanual Grasp Synthesis for Dexterous Robot Hands
[24/12/28]SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis
[24/12/25]GeoMatch++: Morphology Conditioned Geometry Matching for Multi-Embodiment Grasping
[24/12/22]DreamOmni: Unified Image Generation and Editing
[24/12/21]BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization
[24/12/20]Dexterous Manipulation Based on Prior Dexterous Grasp Pose Knowledge
[24/12/18]ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping
[24/12/18]Embedding high-resolution touch across robotic hands enables adaptive human-like grasping
[24/12/18]Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
[24/12/14]Grasp What You Want: Embodied Dexterous Grasping System Driven by Your Voice
[24/12/11]Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation
[24/12/11]Grasp Diffusion Network: Learning Grasp Generators from Partial Point Clouds with Diffusion Models in SO(3)xR3
[24/12/10]Stereo Hand-Object Reconstruction for Human-to-Robot Handover
[24/12/6]BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects
[24/12/4]GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
[24/11/28]HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
[24/11/21]EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild
[24/11/14]UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos
[24/11/14]Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation
[24/10/16]GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction
[24/10/10]RegionGrasp: A Novel Task for Contact Region Controllable Hand Grasp Generation
[24/9/14]ManiDext: Hand-Object Manipulation Synthesis via Continuous Correspondence Embeddings and Residual-Guided Diffusion
[24/9/12]Hand-Object Interaction Pretraining from Videos
[24/8/29]3D Whole-body Grasp Synthesis with Directional Controllability
[24/7/31]EAR: Phrase-Based Hand-Object Interaction Anticipation
[24/7/13]DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Method for Multi-Dexterous Robotic Hands
[24/6/27]CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
[RSS’24]HRP: Human Affordances for Robotic Pre-Training
[ECCV’24]Text2Place: Affordance-aware Text Guided Human Placement
[ECCV’24]AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
[CVPR’24]An Interactive Navigation Method with Effect-oriented Affordance
[CVPR’24]Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
[CVPR’24]LASO: Language-guided Affordance Segmentation on 3D Object
[CVPR’24]One-Shot Open Affordance Learning with Foundation Models
[NeurIPS’24]GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance
[ARXIV’24/12]Improving Vision-Language-Action Models via Chain-of-Affordance
[ARXIV’24/12]Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
[ARXIV’24/12]ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation?
[ARXIV’24/12]GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
[ARXIV’24/12]TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances
[ARXIV’24/12]AffordDP: Generalizable Diffusion Policy with Transferable Affordance
[ARXIV’24/12]SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
[ARXIV’24/11]GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
[ARXIV’24/11]GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping
[ARXIV’24/11]RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation
[ARXIV’24/10]PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model
[ARXIV’24/9]UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
[ARXIV’24/9]PLATO: Planning with LLMs and Affordances for Tool Manipulation
[ARXIV’24/9]Affordance-based Robot Manipulation with Flow Matching
[ARXIV’24/8]Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
[ICLR’25 865*]HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
[CVPR’23]ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
[ICCV’21]H2O: Two Hands Manipulating Objects for First Person Interaction Recognition