Generate Grasp Pose from Point Cloud Detection Results

Functional Description

This operator generates pick poses for detected objects by combining 3D point cloud information with 2D image detection results. It first matches and aligns the input list of point clouds with the input list of 2D detection results. It then calculates the depth (Z-coordinate) of the pick point from the aligned point cloud, and determines the XY coordinates and the rotation angle about the Z-axis from the geometric information of the 2D detection result (such as its minimum bounding rectangle). The final poses are output in the robot’s base coordinate system.

Use Cases

  • Robot Picking: In vision-guided robot picking tasks, it’s necessary to calculate precise 6D pick poses for identified objects so that the robot can pick them accurately.

  • Pose Estimation: Estimate the complete pose of an object in space by combining the category and position information from 2D detection with the depth and shape information from the 3D point cloud.

Inputs and Outputs

Input Items

Camera coordinate system point cloud: A list of point clouds, typically point cloud clusters corresponding to the objects to be picked, obtained by filtering or clustering the point clouds extracted from the detection results. The point clouds must be in the camera coordinate system and must not contain NaN values.

Image: The original image (color or grayscale) corresponding to the point cloud, used to project the point cloud back to pixel coordinates.

Detection results: Output result list from a 2D detection operator, where each result contains information such as the object’s bounding box/polygon, score, category, angle, etc.

Camera intrinsic parameters: The 3x3 camera intrinsic matrix.

Camera distortion: The camera’s distortion coefficient vector.

Hand-eye calibration matrix: 4x4 transformation matrix from the camera coordinate system to the robot’s base coordinate system.
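The roles of the intrinsic matrix and the hand-eye matrix can be sketched as follows. All values here are hypothetical, chosen only to show how the 3x3 intrinsics project a camera-frame point to pixel coordinates and how the 4x4 hand-eye matrix moves the same point into the robot base frame (lens distortion is ignored in this sketch):

```python
import numpy as np

# Hypothetical 3x3 intrinsic matrix (fx, fy, cx, cy).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical 4x4 hand-eye matrix (camera frame -> robot base frame).
T_base_cam = np.eye(4)
T_base_cam[:3, 3] = [0.5, 0.0, 1.0]   # camera offset from the base origin (m)

p_cam = np.array([0.1, -0.05, 0.8])   # one point in the camera frame (m)

# Project into pixel coordinates with the intrinsics (no distortion here).
uv = K @ p_cam
u, v = uv[0] / uv[2], uv[1] / uv[2]

# Transform into the robot base frame with the homogeneous hand-eye matrix.
p_base = (T_base_cam @ np.append(p_cam, 1.0))[:3]
```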

Output Items

Grasp pose information:

A list of calculated pick poses. Each element is a dictionary containing:

  • pose: The pick pose of the object, in the robot’s base coordinate system.

  • score: The confidence score of the corresponding 2D detection result.

  • class_name: The category of the corresponding 2D detection result.

  • uuid: The unique identifier of the corresponding 2D detection result.

  • polygon: The polygon contour of the corresponding 2D detection result.

  • object_info: A dictionary containing information such as the object’s dimensions and area. These are physical (real-world) values, in mm and mm^2 respectively.

  • points_number: The index of the corresponding input point cloud in the original list.
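For illustration, one element of the output list might look like the following. Every value below is made up; only the field names follow the descriptions above, and the pose encoding shown is a hypothetical example:

```python
# Hypothetical example of one output element; all values are illustrative.
grasp = {
    "pose": [0.52, -0.03, 0.31, 0.0, 0.0, 1.57],  # hypothetical pose encoding
    "score": 0.93,                                # 2D detection confidence
    "class_name": "box",
    "uuid": "a1b2c3d4",
    "polygon": [[100, 120], [220, 120], [220, 200], [100, 200]],
    "object_info": {"length": 120.0, "width": 80.0, "area": 9600.0},  # mm, mm^2
    "points_number": 0,
}
grasp_list = [grasp]

# A typical consumer picks the result with the highest detection confidence.
best = max(grasp_list, key=lambda g: g["score"])
```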

Parameter Descriptions

Use detection result angle

Parameter Description

Determines whether the rotation angle (around the Z-axis) of the final output pose directly uses the angle provided by the 2D detection result or is recalculated based on the point cloud and the minimum bounding rectangle of the detection result.

Tuning Description

  • Disabled (default): Does not directly use the angle from the detection result. The operator calculates the line connecting the midpoints of the short sides of the detection result’s minimum bounding rectangle, projects it to the camera coordinate system, transforms it to the robot’s base coordinate system, and calculates the angle between this line and the X-axis of the base coordinate system as the Rz rotation of the final pose.

  • Enabled: Directly uses the angle value provided by each instance in the input "Detection results" as the Rz rotation of the final pose. This is generally only meaningful for rotated-box (oriented) detection; other detection results report an angle of 0.
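The disabled-state calculation can be sketched in 2D as follows. This simplified sketch measures the angle of the line joining the midpoints of the rectangle’s short sides directly in the image plane; the operator itself additionally projects this line into the camera frame and then into the robot base frame before measuring the angle against the base X-axis. The function name is hypothetical:

```python
import numpy as np

def rz_from_min_rect(corners):
    """Angle (rad, in [0, pi)) of a rectangle's long axis vs. the X-axis.

    The long axis is the line joining the midpoints of the two short
    sides. `corners` are the four rectangle corners in order.
    """
    c = np.asarray(corners, dtype=float)
    # Lengths of two adjacent edges decide which sides are "short".
    e01 = np.linalg.norm(c[1] - c[0])
    e12 = np.linalg.norm(c[2] - c[1])
    if e01 >= e12:
        # Edges 1-2 and 3-0 are the short sides.
        m1, m2 = (c[1] + c[2]) / 2, (c[3] + c[0]) / 2
    else:
        # Edges 0-1 and 2-3 are the short sides.
        m1, m2 = (c[0] + c[1]) / 2, (c[2] + c[3]) / 2
    d = m2 - m1
    # Fold into [0, pi): the line has no preferred direction.
    return np.arctan2(d[1], d[0]) % np.pi
```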

Alignment strategy

Parameter Description

Selects the method for matching and aligning the input point cloud clusters with the 2D detection results.

Tuning Description

  • Center Point Alignment (default): Calculates the distance between the center point of the point cloud projected onto the image and the center point of the detection result mask, matching the pair with the shortest distance. This method is fast and suitable for scenarios where objects are widely spaced and center points are clearly distinguishable.

  • Mask Intersection over Union (IoU) Alignment: Projects the point cloud onto the image to generate a mask, calculates its IoU with the detection result mask, and matches the pair with the largest IoU. This method considers shape overlap and may be more robust when objects are closely packed or partially occluded, but it is more computationally intensive.
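Both matching criteria can be sketched on binary masks as follows; the function names and mask sizes are hypothetical:

```python
import numpy as np

def center_distance(mask_a, mask_b):
    """Euclidean distance between the centroids of two binary masks."""
    ca = np.argwhere(mask_a).mean(axis=0)
    cb = np.argwhere(mask_b).mean(axis=0)
    return float(np.linalg.norm(ca - cb))

def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union else 0.0

# Two overlapping 5x5 squares: one stands in for the projected point
# cloud, the other for a detection mask.
proj_mask = np.zeros((10, 10), dtype=bool)
proj_mask[0:5, 0:5] = True
det_mask = np.zeros((10, 10), dtype=bool)
det_mask[0:5, 2:7] = True
```

A matcher would pair each projected point cloud with the detection that minimizes center_distance (Center Point Alignment) or maximizes mask_iou (IoU Alignment).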

Calculation method of the z value of the capture point

Parameter Description

Selects how to calculate the Z-coordinate value of the final pick pose based on the aligned point cloud cluster.

Tuning Description

  • Mean (default): Uses the average Z-coordinate of all points in the point cloud cluster. Suitable for good quality point clouds with few outliers.

  • Median: Uses the median Z-coordinate of all points in the point cloud cluster. Less sensitive to outliers, resulting in a more robust result; recommended when the point cloud has more noise.

  • Mask Center Circular Area Mean: At the center of the detection result’s minimum bounding rectangle, takes a circular area defined by "Circle radius scale" and calculates the mean Z-coordinate of the point cloud within that area.
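The three strategies can be sketched on a small cluster as follows. The values are illustrative, the cluster centre stands in for the rectangle centre, and the radius is hard-coded rather than derived from "Circle radius scale":

```python
import numpy as np

# A small illustrative cluster (N x 3, camera frame, metres); the last
# point is a depth outlier.
cloud = np.array([[0.00, 0.00, 0.80],
                  [0.02, 0.00, 0.82],
                  [0.00, 0.02, 0.81],
                  [0.20, 0.20, 1.50]])

z_mean = cloud[:, 2].mean()        # Mean: pulled upward by the outlier
z_median = np.median(cloud[:, 2])  # Median: robust to the outlier

# Mask Center Circular Area Mean (simplified): average only the points
# whose XY position lies inside a circle around the centre.
center_xy = cloud[:, :2].mean(axis=0)
radius = 0.08                      # hypothetical radius (m)
in_circle = np.linalg.norm(cloud[:, :2] - center_xy, axis=1) <= radius
z_circle = cloud[in_circle, 2].mean()
```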

Circle radius scale

Parameter Description

Effective when "Calculation method of the z value of the capture point" is set to "Mask Center Circular Area Mean". Defines the radius of the circular area used to calculate the Z-value, which is a ratio relative to half the length of the short side of the detection result’s minimum bounding rectangle.

Tuning Description

This parameter controls the size of the object surface area referenced when calculating the depth Z-value.

  • A smaller ratio considers points in a very small area at the object’s center, which better reflects the height at the center itself but is more susceptible to noise.

  • A larger ratio considers points over a wider area, which might be smoother but will include more surface undulations.

Parameter Range

[0, 1], Default: 0.3
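For example, assuming a minimum bounding rectangle whose short side is 60 px, the default scale of 0.3 gives:

```python
short_side = 60.0                     # hypothetical short side length (px)
scale = 0.3                           # "Circle radius scale" (default)
radius = scale * (short_side / 2.0)   # sampling radius in pixels (about 9 px)
```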