[CVPR’25]GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
[CVPR’25]Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
[CVPR’25]ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping
[CVPR’25]Roger: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
[CVPR’25]DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
[CVPR’25]ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping
[CVPR’25]Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
[CVPR’25]UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
[CVPR’25]LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion
[CVPR’25]How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
[CVPR’25]Pose-Guided Temporal Enhancement for Robust Low-Resolution Hand Reconstruction
[CVPR’25]HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
[25/2/4]Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation
[25/2/4]Can You Move These Over There? An LLM-based VR Mover for Supporting Object Manipulation
[25/2/1]RoboGrasp: A Universal Grasping Policy for Robust Robotic Control
[25/2/1]Dexterous Cable Manipulation: Taxonomy, Multi-Fingered Hand Design, and Long-Horizon Manipulation
[25/2/1]Shape from Semantics: 3D Shape Generation from Multi-View Semantics
[25/1/31]SampleLLM: Optimizing Tabular Data Synthesis in Recommendations
[25/1/29]Synthesizing Grasps and Regrasps for Complex Manipulation Tasks
[25/1/29]Hand-Object Contact Detection using Grasp Quality Metrics
[25/1/23]You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations
[25/1/3]TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes
[25/1/1]DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models
[AAAI’25]Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics
[AAAI’25]Diffgrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model
[AAAI’25]RAGG: Retrieval-Augmented Grasp Generation Model
[AAAI’25]Single-view Image to Novel-view Generation for Hand Object Interactions
[AAAI’25]Hand1000: Generating Realistic Hands from Text with Only 1,000 Images
[AAAI’25]HandDiffuse: Generative Controllers for Two-Hand Interactions via Diffusion Models
[AAAI’25]QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects
[AAAI’25]HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation
[AAAI’25]RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance
[AAAI’25]IDseq: Decoupled and Sequentially Detecting and Grounding Multi-modal Media Manipulation
[AAAI’25]PoseLLaVA: Pose Centric Multimodal LLM for Fine-Grained 3D Pose Manipulation
[ICLR’25 8866]Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
[ICLR’25 866]Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild
[ICLR’25 8666]PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
[ICLR’25 8666]Generation and Comprehension Hand-in-Hand: Vision-guided Expression Diffusion for Boosting Referring Expression Generation and Comprehension
[ICLR’25 666]DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
[ICLR’25 6666]Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
[ICLR’25 8888]Data Scaling Laws in Imitation Learning for Robotic Manipulation
[ICLR’25 8665]EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation
[ICLR’25 6666]GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
[NeurIPS’24]HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness
[NeurIPS’24]Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
[NeurIPS’24]Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars
[NeurIPS’24 7655]Grasp as You Say: Language-guided Dexterous Grasp Generation
[NeurIPS’24]Omnigrasp: Simulated Humanoid Grasping on Diverse Objects
[ACM MM’24]HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting
[ACM MM’24]Learning Context with Priors for 3D Interacting Hand-Object Pose Estimation
[ACM MM’24]Decoupling Heterogeneous Features for Robust 3D Interacting Hand Poses Estimation
[ACM MM’24]ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models
[ECCV’24]Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics
[ECCV’24]Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation
[ECCV’24]GraspXL: Generating Grasping Motions for Diverse Objects at Scale
[ECCV’24]Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
[ECCV’24 Oral]UGG: Unified Generative Grasping
[ECCV’24 Oral]SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
[CVPR’24]Dexterous Grasp Transformer
[CVPR’24]Single-View Scene Point Cloud Human Grasp Generation
[CVPR’24]Physics-Aware Hand-Object Interaction Denoising
[CVPR’24]Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
[CVPR’24]BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
[CVPR’24]GEARS: Local Geometry-aware Hand-object Interaction Synthesis
[CVPR’24]HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
[CVPR’24]BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image
[CVPR’24]InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
[CVPR’24]GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation
[CVPR’24]HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
[CVPR’24]OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
[CVPR’24]G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
[CVPR’24]TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
[IJCAI’24]RealDex: Towards Human-like Grasping for Robotic Dexterous Hand
[3DV’25]FastGrasp: Efficient Grasp Synthesis with Diffusion
[CoRL’24]Towards Open-World Grasping with Large Vision-Language Models
[WACV’25]A Versatile and Differentiable Hand-Object Interaction Representation
[RAL’24-12]Bimanual Grasp Synthesis for Dexterous Robot Hands
[24/12/28]SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis
[24/12/25]GeoMatch++: Morphology Conditioned Geometry Matching for Multi-Embodiment Grasping
[24/12/22]DreamOmni: Unified Image Generation and Editing
[24/12/21]BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization
[24/12/20]Dexterous Manipulation Based on Prior Dexterous Grasp Pose Knowledge
[24/12/18]ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping
[24/12/18]Embedding high-resolution touch across robotic hands enables adaptive human-like grasping
[24/12/18]Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
[24/12/14]Grasp What You Want: Embodied Dexterous Grasping System Driven by Your Voice
[24/12/11]Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation
[24/12/11]Grasp Diffusion Network: Learning Grasp Generators from Partial Point Clouds with Diffusion Models in SO(3)xR3
[24/12/10]Stereo Hand-Object Reconstruction for Human-to-Robot Handover
[24/12/6]BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects
[24/12/4]GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
[24/11/28]HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
[24/11/21]EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild
[24/11/14]UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos
[24/11/14]Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation
[24/10/16]GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction
[24/10/10]RegionGrasp: A Novel Task for Contact Region Controllable Hand Grasp Generation
[24/9/14]ManiDext: Hand-Object Manipulation Synthesis via Continuous Correspondence Embeddings and Residual-Guided Diffusion
[24/9/12]Hand-Object Interaction Pretraining from Videos
[24/8/29]3D Whole-body Grasp Synthesis with Directional Controllability
[24/7/31]EAR: Phrase-Based Hand-Object Interaction Anticipation
[24/7/13]DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Method for Multi-Dexterous Robotic Hands
[24/6/27]CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
[RSS’24]HRP: Human Affordances for Robotic Pre-Training
[ECCV’24]Text2Place: Affordance-aware Text Guided Human Placement
[ECCV’24]AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
[CVPR’24]An Interactive Navigation Method with Effect-oriented Affordance
[CVPR’24]Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
[CVPR’24]LASO: Language-guided Affordance Segmentation on 3D Object
[CVPR’24]One-Shot Open Affordance Learning with Foundation Models
[NeurIPS’24]GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance
[ARXIV’24/12]Improving Vision-Language-Action Models via Chain-of-Affordance
[ARXIV’24/12]Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
[ARXIV’24/12]ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation?
[ARXIV’24/12]GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
[ARXIV’24/12]TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances
[ARXIV’24/12]AffordDP: Generalizable Diffusion Policy with Transferable Affordance
[ARXIV’24/12]SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
[ARXIV’24/11]GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
[ARXIV’24/11]GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping
[ARXIV’24/11]RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation
[ARXIV’24/10]PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model
[ARXIV’24/9]UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
[ARXIV’24/9]PLATO: Planning with LLMs and Affordances for Tool Manipulation
[ARXIV’24/9]Affordance-based Robot Manipulation with Flow Matching
[ARXIV’24/8]Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
[ICLR’25 865*]HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
[CVPR’23]ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
[ICCV’21]H2O: Two Hands Manipulating Objects for First Person Interaction Recognition