Pollen vision library

Simple and unified interface to zero-shot computer vision models curated for robotics use cases.

Check out our Hugging Face space for an online demo, or try pollen-vision in a Colab notebook!

Get started in very few lines of code!

Perform zero-shot object detection and segmentation on a live video stream from your webcam with the following code:

import cv2

from pollen_vision.vision_models.object_detection import OwlVitWrapper
from pollen_vision.vision_models.object_segmentation import MobileSamWrapper
from pollen_vision.utils import Annotator, get_bboxes


owl = OwlVitWrapper()
sam = MobileSamWrapper()
annotator = Annotator()

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:  # no frame available (camera missing or disconnected)
        break

    # zero-shot object detection | put your classes here
    predictions = owl.infer(frame, ["paper cups"])
    bboxes = get_bboxes(predictions)

    masks = sam.infer(frame, bboxes=bboxes)  # zero-shot object segmentation
    annotated_frame = annotator.annotate(frame, predictions, masks=masks)

    cv2.imshow("frame", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press "q" to quit
        break

# release the camera and close the preview window on exit
cap.release()
cv2.destroyAllWindows()
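
Zero-shot detectors can return low-confidence boxes, so you may want to filter the predictions before passing them to `get_bboxes`. The sketch below is one way to do that. It assumes each prediction is a dict with `score`, `label` and `box` keys; this README does not state that layout, so inspect the actual output of `owl.infer` before relying on it. `filter_predictions` is a hypothetical helper, not part of pollen-vision, and the example data is made up.

```python
from typing import Any, Dict, List, Optional


def filter_predictions(
    predictions: List[Dict[str, Any]],
    min_score: float = 0.3,
    labels: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Keep predictions with score >= min_score and, if given, a label in labels.

    ASSUMPTION: each prediction looks like
    {"score": float, "label": str, "box": {...}}.
    """
    kept = []
    for pred in predictions:
        if pred["score"] < min_score:
            continue
        if labels is not None and pred["label"] not in labels:
            continue
        kept.append(pred)
    return kept


# Made-up predictions, for illustration only
example = [
    {"score": 0.82, "label": "paper cups",
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.12, "label": "paper cups",
     "box": {"xmin": 300, "ymin": 40, "xmax": 340, "ymax": 90}},
]
confident = filter_predictions(example, min_score=0.3)
print(len(confident), "prediction(s) kept")
```

In the webcam loop, this would sit between `owl.infer(...)` and `get_bboxes(...)`, provided the prediction layout matches the assumption.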

Supported models

We continue to work on adding new models that could be useful for robotics perception applications.

We chose to focus on zero-shot models because they are easier to use and deploy. Zero-shot models can recognize or segment objects based on text queries, without needing to be fine-tuned on annotated datasets.

Right now, we support:

Object detection

  • Yolo-World for zero-shot object detection and localization
  • Owl-Vit for zero-shot object detection and localization
  • Recognize-Anything for zero-shot object detection (without localization)

Object segmentation

  • Mobile-SAM for (fast) zero-shot object segmentation

Monocular depth estimation

  • Depth Anything for (non-metric) monocular depth estimation

Below is an example of combining Owl-Vit and Mobile-SAM to detect and segment objects in a point cloud, all live. (Note: this example applies no temporal or spatial filtering of any kind; it displays the raw outputs of the models, computed independently on each frame.)

pc_segmentation_doc3-2024-02-26_17.07.20.mp4
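
Because the models run independently on each frame, raw boxes can jitter from one frame to the next. If you want steadier output, one simple option is an exponential moving average over the box coordinates. The sketch below is not part of pollen-vision: `BoxSmoother` is a hypothetical helper, it tracks a single object only, and it assumes boxes are given as `[xmin, ymin, xmax, ymax]`.

```python
from typing import List, Optional, Sequence


class BoxSmoother:
    """Exponential moving average over one [xmin, ymin, xmax, ymax] box.

    Hypothetical helper: handles a single tracked object and does no
    association between detections.
    """

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest box
        self.state: Optional[List[float]] = None

    def update(self, box: Sequence[float]) -> List[float]:
        """Blend the new box into the running estimate and return it."""
        if self.state is None:
            # First observation: nothing to blend with yet
            self.state = [float(v) for v in box]
        else:
            self.state = [
                self.alpha * float(new) + (1.0 - self.alpha) * old
                for new, old in zip(box, self.state)
            ]
        return list(self.state)


smoother = BoxSmoother(alpha=0.5)
first = smoother.update([0, 0, 100, 100])
second = smoother.update([10, 10, 110, 110])
print(second)
```

A smaller `alpha` gives smoother but laggier boxes; `alpha=1.0` reproduces the raw per-frame output.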

We also provide wrappers for the Luxonis cameras that we use internally. They give easy access to the features most relevant to our robotics applications (RGB-D, onboard H.264 encoding, and onboard stereo rectification).

Installation

Note: This package has been tested on Ubuntu 22.04 and macOS (with an M1 Pro processor), with Python 3.10.

Git LFS

This repository uses Git LFS to store large files. You need to install it before cloning the repository.

Ubuntu

sudo apt-get install git-lfs

macOS

brew install git-lfs

One-line installation

You can install the package directly from the repository without having to clone it first with:

pip install "pollen-vision[vision] @ git+https://github.com/pollen-robotics/pollen-vision.git@main"

Note: here we install the package with the vision extra, which includes the vision models. You can also install the depthai_wrapper extra to use the Luxonis depthai wrappers.

Install from source

Clone this repository and then install the package either in "production" mode or "dev" mode.

👉 We recommend using a virtual environment to avoid conflicts with other packages.

After cloning the repository, you can either install everything with:

pip install ".[all]"

or install only the modules you want:

pip install ".[depthai_wrapper]"
pip install ".[vision]"

To add "dev" mode dependencies (CI/CD, testing, etc.):

pip install -e ".[dev]"
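
After installing, a quick sanity check can confirm which parts of the package are visible to your interpreter. The sketch below uses only the standard library. The module paths are taken from the import lines in the example at the top of this README, and `is_importable` is a hypothetical helper, not part of pollen-vision.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if module_name can be located by the import system.

    Note: for a dotted name, find_spec imports the parent packages,
    but not the final module itself.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed
        return False


print("Python", ".".join(str(v) for v in sys.version_info[:3]))
for name in (
    "pollen_vision",
    "pollen_vision.vision_models.object_detection",
    "pollen_vision.vision_models.object_segmentation",
):
    print(f"{name}: {'found' if is_importable(name) else 'missing'}")
```

If the vision modules show up as missing, the `vision` extra was probably not installed.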

Luxonis depthai specific information

If this is the first time you are using Luxonis cameras on this computer, you need to set up the udev rules:

echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger

Gradio demo

Test the demo online

A Gradio demo is available on Pollen Robotics' Hugging Face space. It lets you test the models on your own images without installing anything.

Run the demo locally

If you want to run the demo locally, you can install the dependencies with the following command:

pip install "pollen_vision[gradio]"

You can then run the demo locally on your machine with:

python pollen-vision/gradio/app.py

Examples

Vision models wrappers

Check our example notebooks!

Luxonis depthai wrappers

Check our example scripts!
