Thanks for the proposed solution. In our application we would need:
(1) marker tracking
(2) Main camera for MR capture
(3) CV camera for timestamps, camera transform, and intrinsics
So your solution probably cannot be applied here. Fortunately we do not need to track the marker continuously, so we decided to disable marker tracking before enabling video capture, and so far that seems to work fine. A rough sketch of the ordering is below.
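For reference, this is roughly the sequence we use. It is a minimal sketch based on the MLMarkerTracker and MLCamera APIs in the Magic Leap Unity SDK; StartMRCaptureAsync() is a placeholder for our own capture-setup code, not an SDK call.

using System.Threading.Tasks;
using UnityEngine.XR.MagicLeap;

public class CaptureSequencer
{
    public async Task BeginCaptureAsync()
    {
        // Stop marker scanning first so the tracker is idle before the
        // capture pipeline starts using the camera.
        await MLMarkerTracker.StopScanningAsync();

        // Only then set up and start the MR video capture
        // (our own code, elided here).
        await StartMRCaptureAsync();
    }

    private Task StartMRCaptureAsync()
    {
        // Placeholder for our setup: MLCamera.CreateAndConnectAsync(...),
        // PrepareCapture, starting the video capture, etc.
        return Task.CompletedTask;
    }
}

With that workaround in place, a few small related questions: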
- With some simple tests I was fairly convinced that the CV camera callback contains almost everything we need to compute the world-to-screen-point conversion (the transform and FOV seem fine, and we only need to pass in the Main (MR capture) camera's width and height in the code below). When it comes to the principal point, however, it sits around (width/2, height/2) for any camera resolution we specify, but shows slight offsets from time to time. Since we are not very strict about the accuracy of the projected gaze points, is it reasonable to just use (width/2, height/2)?
public static Vector2 WorldPointToPixel(Vector3 worldPoint, int width, int height, MLCameraBase.IntrinsicCalibrationParameters parameters, Matrix4x4 cameraTransformationMatrix)
{
    // Step 1: Convert the world-space point to camera space
    Vector3 cameraSpacePoint = cameraTransformationMatrix.inverse.MultiplyPoint(worldPoint);

    // Step 2: Project the camera-space point onto the normalized image plane
    Vector2 normalizedImagePoint = new Vector2(cameraSpacePoint.x / cameraSpacePoint.z, cameraSpacePoint.y / cameraSpacePoint.z);

    // Step 3: Scale by the FOV so the visible frustum maps to [-1, 1].
    // parameters.FOV is treated as the vertical FOV; the horizontal FOV
    // is derived from the aspect ratio.
    float verticalFOVRad = parameters.FOV * Mathf.Deg2Rad;
    float aspectRatio = width / (float)height;
    float horizontalFOVRad = 2f * Mathf.Atan(Mathf.Tan(verticalFOVRad / 2f) * aspectRatio);
    normalizedImagePoint.x /= Mathf.Tan(horizontalFOVRad / 2f);
    normalizedImagePoint.y /= Mathf.Tan(verticalFOVRad / 2f);

    // Step 4: Convert normalized image coordinates to pixel coordinates.
    // A point at the edge of the FOV has a normalized coordinate of +/-1,
    // so it must be scaled by half the resolution, not the full resolution.
    // Variant using the reported principal point instead of the image center:
    // Vector2 pixelPosition = new Vector2(
    //     normalizedImagePoint.x * (width / 2f) + parameters.PrincipalPoint.x,
    //     normalizedImagePoint.y * (height / 2f) + parameters.PrincipalPoint.y);
    Vector2 pixelPosition = new Vector2(
        normalizedImagePoint.x * (width / 2f) + width / 2f,
        normalizedImagePoint.y * (height / 2f) + height / 2f);
    // Note: depending on whether the consumer measures pixels from the top
    // of the image, the y coordinate may need to be flipped (height - y).
    return pixelPosition;
}
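Incidentally, since IntrinsicCalibrationParameters also exposes FocalLength and PrincipalPoint directly, a variant that skips the FOV math entirely is the standard pinhole projection below. It assumes the focal length and principal point are in pixels at the resolution the intrinsics were reported for; mapping the result onto a different MR-capture resolution by simple proportional scaling is only safe if the two streams share the same crop, which is exactly my second question below.

public static Vector2 WorldPointToPixelPinhole(Vector3 worldPoint, MLCameraBase.IntrinsicCalibrationParameters parameters, Matrix4x4 cameraTransformationMatrix)
{
    // Camera space, as in Step 1 above
    Vector3 p = cameraTransformationMatrix.inverse.MultiplyPoint(worldPoint);
    // Pinhole model: u = fx * (x / z) + cx, v = fy * (y / z) + cy
    return new Vector2(
        parameters.FocalLength.x * (p.x / p.z) + parameters.PrincipalPoint.x,
        parameters.FocalLength.y * (p.y / p.z) + parameters.PrincipalPoint.y);
}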
- This might be a dumb question, but are the "vertical" resolution options for MR capture (e.g., 648 × 720, 972 × 1080) essentially center crops of the "horizontal" resolution options (e.g., 960 × 720, 1440 × 1080)? I'm asking because this appears to be the case in some camera shots, and the FOV returned is always 75 regardless of the orientation. Is this expected, i.e., is there no real "extended vertical FOV" when choosing the more "vertical" resolutions? A quick sanity check I did on this is sketched below.
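Under the assumption (also baked into the code above) that the reported 75 is the vertical FOV, the crop hypothesis predicts that the "vertical" modes keep the same vertical FOV and simply lose horizontal FOV. A quick check of the implied horizontal FOVs:

using UnityEngine;

public static class FovCropCheck
{
    // Horizontal FOV implied by a vertical FOV and an aspect ratio
    static float HorizontalFovDeg(float verticalFovDeg, int width, int height)
    {
        float vRad = verticalFovDeg * Mathf.Deg2Rad;
        return 2f * Mathf.Atan(Mathf.Tan(vRad / 2f) * (width / (float)height)) * Mathf.Rad2Deg;
    }

    public static void Run()
    {
        Debug.Log(HorizontalFovDeg(75f, 960, 720)); // ~91.3 degrees
        Debug.Log(HorizontalFovDeg(75f, 648, 720)); // ~69.3 degrees
    }
}

Both modes would share the same 75-degree vertical FOV while the 648 × 720 mode covers a narrower horizontal slice, which matches what I see in the shots; I would appreciate confirmation, though.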
Thanks!