Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks

Abstract

Reinforcement Learning and Imitation Learning approaches rely on policy learning strategies that struggle to generalize from only a few examples of a task. In this work, we propose a language-conditioned semantic search-based method that produces an online search-based policy from an available demonstration dataset of state-action trajectories. We directly acquire actions from the most similar manipulation trajectories found in the dataset. Our approach surpasses the baselines on the CALVIN benchmark and exhibits strong zero-shot adaptation capabilities. This suggests that our online search-based policy approach could be extended to tasks typically addressed by Imitation Learning or Reinforcement Learning-based policies.

Framework

Overview of our framework. We obtain a binary mask of the object of interest in the static and gripper camera views, find the most similar state in the dataset, and then clone the corresponding actions.
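
The retrieval step described above can be illustrated with a minimal sketch. It assumes each stored state has already been reduced to a numeric feature vector (for example, derived from the object masks) and that similarity is measured by Euclidean distance. The page specifies neither the descriptor nor the metric, so `Step`, `find_most_similar`, and `clone_actions` are hypothetical names used for illustration only, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Step:
    # Hypothetical state descriptor; how it is computed is an assumption.
    features: Sequence[float]
    # Action recorded in the demonstration at this state.
    action: Sequence[float]


Trajectory = List[Step]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_most_similar(
    dataset: List[Trajectory], query: Sequence[float]
) -> Tuple[int, int, float]:
    """Return (trajectory index, step index, distance) of the closest stored state."""
    best = None
    best_dist = math.inf
    for ti, trajectory in enumerate(dataset):
        for si, step in enumerate(trajectory):
            d = distance(step.features, query)
            if d < best_dist:
                best_dist = d
                best = (ti, si)
    if best is None:
        raise ValueError("dataset contains no states")
    return best[0], best[1], best_dist


def clone_actions(
    dataset: List[Trajectory], query: Sequence[float], horizon: int
) -> List[Sequence[float]]:
    """Copy up to `horizon` actions, starting at the most similar stored state."""
    ti, si, _ = find_most_similar(dataset, query)
    return [step.action for step in dataset[ti][si:si + horizon]]
```

In a control loop, `clone_actions` would be called with the descriptor of the current observation, and the search would be repeated once the cloned actions have been executed or the current state drifts from the retrieved trajectory. When to re-search is a design choice this sketch leaves open.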

Environments

Overview of the four environments in CALVIN. During inference, the Search-Based Policy searches for the most similar state in environments A, B and C with respect to the current state from environment D.

Language Embeddings

Visualization of the clustered natural language instructions, projected into a 2D space with PCA. For clustering, we fit K-Means to the 768-dimensional train embeddings generated by the fine-tuned GTE model and set the number of clusters to k = 34, the total number of tasks. Each data point represents a unique natural language instruction corresponding to a task, and the cluster labels denote the respective tasks.

Train

Test
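
The clustering and projection behind this visualization can be sketched roughly as follows. This is a NumPy-only illustration of K-Means (Lloyd's algorithm with random initialization) and a PCA projection to 2D. It is not the authors' code, and the helper names `kmeans` and `pca_2d` are hypothetical. In practice the input would be the 768-dimensional instruction embeddings, with `k` set to 34.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points (simple random init).
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()

    def assign(c: np.ndarray) -> np.ndarray:
        # Squared distances via ||x||^2 - 2 x.c + ||c||^2, shape (n, k).
        d2 = (x ** 2).sum(axis=1)[:, None] - 2.0 * x @ c.T + (c ** 2).sum(axis=1)[None, :]
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers), centers


def pca_2d(x: np.ndarray) -> np.ndarray:
    """Project rows of x onto their first two principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

Given an array `embeddings` of shape `(n_instructions, 768)`, calling `kmeans(embeddings, k=34)` and `pca_2d(embeddings)` would yield the cluster labels and the 2D coordinates for a scatter plot. Random initialization can converge to a poor local optimum, so a library implementation with multiple restarts may be preferable in practice.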

BibTeX

@article{sheikh2023sbp,
  title     = {Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks},
  author    = {Sheikh, Jannik and Melnik, Andrew and Nandi, Gora Chand and Haschke, Robert},
  journal   = {arXiv preprint arXiv:2312.05925},
  year      = {2023},
}
