Getting ScreenSpace Coordinates from World Coordinates

I have the coordinates of a square in world space and want to convert them to screen space (is it screen space? I'm confused about which space corresponds to what we see when wearing a headset) so that I can crop that region out of the camera feed and use it for further processing. I'm currently using Unity's Camera.main.WorldToScreenPoint, and I'm seeing significant shifts on the height axis. Which of the cameras am I using with that call, and is this the best workflow for what I'm trying to do? Can I do it with Magic Leap's CV Camera?

Any help would be appreciated. Thanks in advance!

@zhhqu0305 You will need to use the RGB Camera's intrinsic and extrinsic values to raycast from pixel to world point properly.

You can see a simple implementation here:
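
Roughly, that conversion back-projects the pixel through the camera intrinsics (focal length and principal point) and then applies the camera pose for that frame. Here is an untested sketch of the idea; double-check the exact intrinsics field names against the API, since the FocalLength value here is an assumption:

// Sketch: turn a pixel coordinate into a world-space ray using the RGB camera
// intrinsics and the camera pose returned for that frame.
Ray PixelToWorldRay(Vector2 pixel, Vector2 focalLength, Vector2 principalPoint, Matrix4x4 cameraPose)
{
    // Back-project the pixel onto the normalized image plane (z = 1 in camera space).
    Vector3 dirCameraSpace = new Vector3(
        (pixel.x - principalPoint.x) / focalLength.x,
        (pixel.y - principalPoint.y) / focalLength.y,
        1f);

    // Move the ray origin and direction into world space using the frame pose.
    Vector3 origin = cameraPose.MultiplyPoint(Vector3.zero);
    Vector3 direction = cameraPose.MultiplyVector(dirCameraSpace).normalized;
    return new Ray(origin, direction);
}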

Thanks for your reply! It's a helpful implementation, but in my case I only need to go from world points to pixel space, so I assumed Camera.main.WorldToScreenPoint should do the job. The problem is that I'm using the Vuforia camera, and the VuforiaBehaviour.Instance.CameraDevice.GetCameraImage(PIXEL_FORMAT) method returns a 1920*1080 array, while Camera.main.pixelWidth and Camera.main.pixelHeight are both 1 when I print them out. That leaves the screen space coordinates effectively normalized, which is especially a problem on the height axis. For example, if I have the four corners of a square in world space as:

08-14 13:03:28.468380  7105  7126 I Unity   : S2: topLeft(0.66, -0.02, 0.16)
08-14 13:03:28.468406  7105  7126 I Unity   : S2: topRight(0.74, -0.02, 0.07)
08-14 13:03:28.468431  7105  7126 I Unity   : S2: bottomLeft(0.66, -0.13, 0.15)
08-14 13:03:28.468456  7105  7126 I Unity   : S2: bottomRight(0.73, -0.13, 0.07)

The corresponding screen space coordinates I would get are:

08-14 13:03:28.468495  7105  7126 I Unity   : S5: topLeft(0.44, 0.59, 0.51)
08-14 13:03:28.468520  7105  7126 I Unity   : S5: topRight(0.63, 0.58, 0.52)
08-14 13:03:28.468545  7105  7126 I Unity   : S5: bottomLeft(0.43, 0.40, 0.53)
08-14 13:03:28.468571  7105  7126 I Unity   : S5: bottomRight(0.62, 0.39, 0.54)

So the distances along the height and width dimensions are essentially the same, which does not account for the actual 1920*1080 camera image that is captured. I'm having trouble getting the correct positions of these coordinates in screen space -- do you think I should move the post to the Vuforia channel? Or are you aware of other methods that could get the job done without using the Vuforia camera?

Thanks!

We are investigating a bug that causes the device's screen width and height to not be reported correctly. If you need the screen size, I recommend using Unity's XRSettings.eyeTextureHeight and XRSettings.eyeTextureWidth.

You might also want to use the viewport position instead, and then work with those normalized values rather than using the pixel position directly.

Converting world space points to pixel space involves projecting the 3D point onto the 2D image plane defined by the camera's intrinsic parameters. This is essentially the reverse process of the raycasting operation we discussed earlier.

Here is an untested function that converts a world space point to a pixel point based on the Magic Leap CV camera image. Hopefully this will help you get started:

public Vector2 WorldPointToPixel(Vector3 worldPoint, int width, int height, MLCameraBase.IntrinsicCalibrationParameters parameters, Matrix4x4 cameraTransformationMatrix)
{
    // Step 1: Convert the world space point to camera space
    Vector3 cameraSpacePoint = cameraTransformationMatrix.inverse.MultiplyPoint(worldPoint);

    // Step 2: Project the camera space point onto the normalized image plane
    Vector2 normalizedImagePoint = new Vector2(cameraSpacePoint.x / cameraSpacePoint.z, cameraSpacePoint.y / cameraSpacePoint.z);

    // Step 3: Adjust for FOV
    float verticalFOVRad = parameters.FOV * Mathf.Deg2Rad;
    float aspectRatio = width / (float)height;
    float horizontalFOVRad = 2 * Mathf.Atan(Mathf.Tan(verticalFOVRad / 2) * aspectRatio);

    normalizedImagePoint.x /= Mathf.Tan(horizontalFOVRad / 2);
    normalizedImagePoint.y /= Mathf.Tan(verticalFOVRad / 2);

    // Step 4: Convert normalized image coordinates to pixel coordinates
    Vector2 pixelPosition = new Vector2(
        normalizedImagePoint.x * width + parameters.PrincipalPoint.x,
        normalizedImagePoint.y * height + parameters.PrincipalPoint.y
    );

    return pixelPosition;
}

This function first transforms the world space point into camera space, then projects it onto the normalized image plane. After adjusting for FOV and aspect ratio, it finally transforms the normalized coordinates into pixel coordinates using the camera's intrinsic parameters.
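
In practice you would call it from the raw video frame callback, where the per-frame intrinsics and the pose looked up by timestamp are available. A rough, untested sketch of that wiring (the field holding the world point is just a placeholder):

// Sketch: project a stored world-space point into the current camera frame.
private Vector3 _worldPoint; // the world-space point you want to project (placeholder)

void RawVideoFrameAvailable(MLCamera.CameraOutput output, MLCamera.ResultExtras extras, MLCameraBase.Metadata metadataHandle)
{
    if (extras.Intrinsics.HasValue &&
        MLCVCamera.GetFramePose(extras.VCamTimestamp, out Matrix4x4 cameraTransform).IsOk)
    {
        int width = (int)output.Planes[0].Width;
        int height = (int)output.Planes[0].Height;
        Vector2 pixel = WorldPointToPixel(_worldPoint, width, height, extras.Intrinsics.Value, cameraTransform);
        Debug.Log($"Projected pixel: {pixel}");
    }
}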

Thanks for this code example! I'm trying to test it right now, but I'm still a bit worried that the screen size I retrieve from XRSettings.eyeTextureHeight/Width is 1440 * 1760, which is still different from the Vuforia camera image (1920 * 1080). At this point, I guess I would first have to switch to image capture using MLCamera? In that case, is there a way for me to access the camera output besides RawVideoFrameAvailable or OnCaptureDataReceived? I do need to crop the square out of the camera data, but I don't need to do it more than once per second.

Also, I noticed that for video capture there are 15, 30, and 60 FPS options, but what about OnCaptureDataReceived? Which one would you recommend using?

Thanks!

I might have misunderstood the initial question. Are you trying to convert the world coordinate to a screen space position, or are you trying to convert it to a pixel position in the camera image captured through the RGB camera (the same camera used by the Vuforia plugin)?

If you are trying to get the pixel position based on the RGB camera image, note that only one process can access a stream at a time. The Vuforia plugin uses the CV camera stream, so your application will use the Main Camera stream to capture the images. Alternatively, you can try to capture an image from the CV camera stream instead of receiving the video frames.

Here is a simple script to capture Main camera video frames: Simple Camera Example | MagicLeap Developer Documentation

Wait so is Vuforia using the RGB camera or the CV camera?

I'm currently using VuforiaBehaviour.Instance.CameraDevice.GetCameraImage to obtain the image. I assume that this comes from the CV camera stream, if that's what you mean? The problem is that by using that method I'm getting a 1920*1080 array, while the screen points I get from Camera.main.WorldToScreenPoint are still normalized values (since Camera.main.pixelWidth and pixelHeight are both 1), so I cannot get the correct pixel positions of the four corners.

Following the code snippets you provided, I'm now trying to use MLCamera instead, since I only need the cropped image. But after you pointed this out, maybe I could also use the Vuforia camera intrinsics to get the pixel position and proceed with that? I attached a screenshot of the Vuforia CameraDevice intrinsics -- maybe those could do the job too? So does the problem come from Camera.main.WorldToScreenPoint not doing its job correctly?


It seems all the required components are there, but I couldn't get the camera transformation the way MLCVCamera.GetFramePose(resultExtras.VCamTimestamp, out Matrix4x4 cameraTransform) provides it. Is this the same as Camera.main.worldToCameraMatrix?

Also, may I ask what the difference is between the CV camera and the main camera? There seem to be some setting differences, but are the streams different?

Regarding the separate image streams:

See the MLCamera Overview guide for information about the cameras. It is the same physical camera that can be accessed by two processes at the same time.
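
In code, the only difference between the two streams is the identifier you connect with; everything else goes through the same API. A minimal sketch (the CV identifier value is assumed to be MLCamera.Identifier.CV, mirroring the Main identifier used in the Simple Camera example):

// Sketch: connect to the CV stream instead of the Main stream.
MLCamera.ConnectContext context = MLCamera.ConnectContext.Create();
context.CamId = MLCamera.Identifier.CV;   // MLCamera.Identifier.Main for the Main stream
context.Flags = MLCamera.ConnectFlag.CamOnly;
MLCamera cvCamera = MLCamera.CreateAndConnect(context);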

Regarding the Image size:

I assume Vuforia is using the CV camera stream and capturing images at the 1920x1080 resolution.

Regarding the Vuforia Camera:

I'm not familiar with Vuforia, but if it provides a separate Unity Camera that has the same intrinsic parameters as the physical camera used to capture the images, you can get the pixel position using the WorldToViewportPoint function and then multiply the normalized values by the known resolution (1920x1080):

public Vector2 WorldToPixelPoint(Camera camera, Vector3 worldPosition, Vector2 resolution)
{
    // Convert world position to viewport position
    Vector3 viewportPos = camera.WorldToViewportPoint(worldPosition);

    // Scale viewport position with the specified resolution to get pixel position
    return new Vector2(viewportPos.x * resolution.x, viewportPos.y * resolution.y);
}

I'm not sure if Vuforia sets the camera's aspect ratio to match the resolution of the image or to match the aspect ratio of the display. If the aspect ratios do not match, the pixel positions might be offset, and you may need to add additional logic that scales the viewport position accordingly. For example:

float cameraAspectRatio = camera.aspect;
float targetAspectRatio = (float)resolution.x / resolution.y;

// Adjust the viewport position based on the difference in aspect ratios
if (cameraAspectRatio > targetAspectRatio)
{
    viewportPos.x *= targetAspectRatio / cameraAspectRatio;
}
else
{
    viewportPos.y *= cameraAspectRatio / targetAspectRatio;
}

If camera.aspect does not return the correct aspect ratio, consider using eyeTextureWidth and eyeTextureHeight to calculate the display's aspect ratio manually:

float cameraAspectRatio = (float)XRSettings.eyeTextureWidth / XRSettings.eyeTextureHeight;

Thanks for pointing this out. I couldn't figure out how the conversion could possibly work -- if you look at the coordinates of the four corners in viewport space in my previous posts, you can see that:

08-14 13:03:28.468495  7105  7126 I Unity   : S5: topLeft(0.44, 0.59, 0.51)
08-14 13:03:28.468520  7105  7126 I Unity   : S5: topRight(0.63, 0.58, 0.52)
08-14 13:03:28.468545  7105  7126 I Unity   : S5: bottomLeft(0.43, 0.40, 0.53)
08-14 13:03:28.468571  7105  7126 I Unity   : S5: bottomRight(0.62, 0.39, 0.54)

Here the top-to-bottom and left-to-right distances are both around 0.2. The image array I have is 1920x1080, while the aspect ratio I have is either 1:1 (from camera.aspect) or 1440:1760 (from XRSettings). I have a physical square whose width:height ratio is of course 1:1, and no matter how I convert these values I cannot get equal width and height on the screen. I tried directly multiplying the height by 1920 or the width by 1080; the former resulted in something placed high up, and the latter gave a small image at the top-left of the actual image. Multiplying by 1920 clearly did give the correct size of the square I have, but I'm not sure how the y coordinates are mapped, and as a result I cannot get the desired output.

On the other hand, I implemented a customized world-to-screen function as follows:

	private Vector2 WorldPointToPixel(Vector3 worldPoint, int width, int height, Intrinsics intrinsics)
	{
		// Step 1: Convert the world space point to camera space
		Vector3 cameraSpacePoint = Camera.main.worldToCameraMatrix.inverse.MultiplyPoint(worldPoint);

		// Step 2: Project the camera space point onto the normalized image plane
		Vector2 normalizedImagePoint = new Vector2(cameraSpacePoint.x / cameraSpacePoint.z, cameraSpacePoint.y / cameraSpacePoint.z);

		// Step 3: Adjust for FOV
		float verticalFOVRad = intrinsics.FieldOfViewInDeg.y * Mathf.Deg2Rad;
		float horizontalFOVRad = intrinsics.FieldOfViewInDeg.x * Mathf.Deg2Rad;

		normalizedImagePoint.x /= Mathf.Tan(horizontalFOVRad / 2);
		normalizedImagePoint.y /= Mathf.Tan(verticalFOVRad / 2);

		// Step 4: Convert normalized image coordinates to pixel coordinates
		Vector2 pixelPosition = new(
			normalizedImagePoint.x * width + intrinsics.PrincipalPoint.x,
			normalizedImagePoint.y * height + intrinsics.PrincipalPoint.y
		);

		return pixelPosition;
	}

I don't see detailed documentation for the Vuforia camera intrinsics, so this is a bold guess. However, when I pass in width 1920 and height 1080 I get:

08-14 16:41:04.358614 22513 22534 I Unity   : S2: topLeft(-0.21, -0.13, 0.34)
08-14 16:41:04.358641 22513 22534 I Unity   : S2: topRight(-0.10, -0.13, 0.38)
08-14 16:41:04.358670 22513 22534 I Unity   : S2: bottomLeft(-0.19, -0.23, 0.30)
08-14 16:41:04.358699 22513 22534 I Unity   : S2: bottomRight(-0.09, -0.23, 0.34)
08-14 16:41:04.358742 22513 22534 I Unity   : S4: topLeft from Vuforia(1308.82, 780.45)
08-14 16:41:04.358760 22513 22534 I Unity   : S4: topRight from Vuforia(1047.43, 625.14)
08-14 16:41:04.358778 22513 22534 I Unity   : S4: bottomLeft from Vuforia(1268.04, 1673.10)
08-14 16:41:04.358796 22513 22534 I Unity   : S4: bottomRight from Vuforia(1001.67, 1484.95)

Given that my image array is 1920*1080, clearly this won't work. I'm still trying to find what's going wrong, but am I on the right track? Should I just move to MLCamera for simplicity?

To disambiguate: the camera used by Vuforia is not the Unity Main Camera. The Unity Main Camera represents the display and has a resolution, aspect ratio, and position that are different from those of the RGB camera used by the Vuforia capture.

Does Vuforia return what seem like correct intrinsic and extrinsic values for the RGB camera at the front of the headset?

Vuforia did return something along those lines -- check the figure I attached from Vuforia:


I don't know if this looks correct, but I have FieldOfViewInDeg reported as: x=64.7041, y=39.22261.

I'm now testing with MLCamera, but the app seems to crash after a few seconds. I'm trying to debug it right now... I have a log like this:
logcat.zip (144.3 KB)
My code:

using System;
using System.Collections;
using System.Runtime.CompilerServices;
using UnityEngine;
using UnityEngine.XR.MagicLeap;
using Vuforia;

public class SimpleCamera : MonoBehaviour
{
    [SerializeField, Tooltip("Desired width for the camera capture")]
    private int captureWidth = 1920;
    [SerializeField, Tooltip("Desired height for the camera capture")]
    private int captureHeight = 1080;
    [SerializeField, Tooltip("The renderer to show the camera capture on RGB format")]
    private Renderer _screenRendererRGB = null;

    //The identifier can either target the Main or CV cameras.
    private MLCamera.Identifier _identifier = MLCamera.Identifier.Main;
    private MLCamera _camera;
    //Is true if the camera is ready to be connected.
    private bool _cameraDeviceAvailable;

    private MLCamera.CaptureConfig _captureConfig;

    private Texture2D _videoTextureRgb;
    //The camera capture state
    bool _isCapturing;


    void OnEnable()
    {
        //This script assumes that camera permissions were already granted.
        StartCoroutine(EnableMLCamera());

    }

    void OnDisable()
    {
        StopCapture();
    }

    //Waits for the camera to be ready and then connects to it.
    private IEnumerator EnableMLCamera()
    {
        //Checks the main camera's availability.
        while (!_cameraDeviceAvailable)
        {
            MLResult result = MLCamera.GetDeviceAvailabilityStatus(_identifier, out _cameraDeviceAvailable);
            if (result.IsOk == false || _cameraDeviceAvailable == false)
            {
                // Wait until camera device is available
                yield return new WaitForSeconds(1.0f);
            }
        }
        ConnectCamera();
    }

    private void ConnectCamera()
    {
        //Once the camera is available, we can connect to it.
        if (_cameraDeviceAvailable)
        {
            MLCamera.ConnectContext connectContext = MLCamera.ConnectContext.Create();
            connectContext.CamId = _identifier;
            //MLCamera.Identifier.Main is the only camera that can access the virtual and mixed reality flags
            connectContext.Flags = MLCamera.ConnectFlag.CamOnly;
            connectContext.EnableVideoStabilization = true;

            _camera = MLCamera.CreateAndConnect(connectContext);
            if (_camera != null)
            {
                Debug.Log("Camera device connected");
                ConfigureCameraInput();
                SetCameraCallbacks();
            }
        }
    }

    private void ConfigureCameraInput()
    {
        //Gets the stream capabilities the selected camera. (Supported capture types, formats and resolutions)
        MLCamera.StreamCapability[] streamCapabilities = MLCamera.GetImageStreamCapabilitiesForCamera(_camera, MLCamera.CaptureType.Video);

        if (streamCapabilities.Length == 0)
            return;

        //Set the default capability stream
        MLCamera.StreamCapability defaultCapability = streamCapabilities[0];

        //Try to get the stream that most closely matches the target width and height
        if (MLCamera.TryGetBestFitStreamCapabilityFromCollection(streamCapabilities, captureWidth, captureHeight,
                MLCamera.CaptureType.Video, out MLCamera.StreamCapability selectedCapability))
        {
            defaultCapability = selectedCapability;
        }


        //Initialize a new capture config.
        _captureConfig = new MLCamera.CaptureConfig();
        //Set YUV_420_888 video as the output
        MLCamera.OutputFormat outputFormat = MLCamera.OutputFormat.YUV_420_888;
        //Set the Frame Rate to 30fps
        _captureConfig.CaptureFrameRate = MLCamera.CaptureFrameRate._30FPS;
        //Initialize a camera stream config.
        //The Main Camera can support up to two stream configurations
        _captureConfig.StreamConfigs = new MLCamera.CaptureStreamConfig[1];
        _captureConfig.StreamConfigs[0] = MLCamera.CaptureStreamConfig.Create(
            defaultCapability, outputFormat
        );
        StartVideoCapture();
    }

    private void StartVideoCapture()
    {
        MLResult result = _camera.PrepareCapture(_captureConfig, out MLCamera.Metadata metaData);
        if (result.IsOk)
        {
            //Trigger auto exposure and auto white balance
            _camera.PreCaptureAEAWB();
            //Starts video capture. This call can also be called asynchronously 
            //Images capture uses the CaptureImage function instead.
            result = _camera.CaptureVideoStart();
            _isCapturing = MLResult.DidNativeCallSucceed(result.Result, nameof(_camera.CaptureVideoStart));
            if (_isCapturing)
            {
                Debug.Log("Video capture started!");
            }
            else
            {
                Debug.LogError($"Could not start camera capture. Result : {result}");
            }
        }
    }

    private void StopCapture()
    {
        if (_isCapturing)
        {
            _camera.CaptureVideoStop();
        }

        _camera.Disconnect();
        _camera.OnRawVideoFrameAvailable -= RawVideoFrameAvailable;
        _isCapturing = false;
    }

    //Assumes that the capture configure was created with a Video CaptureType
    private void SetCameraCallbacks()
    {
        //Provides frames in either YUV/RGBA format depending on the stream configuration
        _camera.OnRawVideoFrameAvailable += RawVideoFrameAvailable;
    }

    void RawVideoFrameAvailable(MLCamera.CameraOutput output, MLCamera.ResultExtras extras, MLCameraBase.Metadata metadataHandle)
    {
        var imgTarget = AllVariables.Instance.mImageTarget;

        if (imgTarget == null || imgTarget.TargetStatus.Status != Status.TRACKED)
        {
            return;
        }
        if (output.Format == MLCamera.OutputFormat.YUV_420_888)
        {
            var imagePlane = output.Planes[0];
            int actualWidth = (int)(imagePlane.Width * imagePlane.PixelStride);
            // obtain corners of the puzzle in world space
            AllVariables.Instance.ComputeCornerCoordinates();
            var corners = AllVariables.Instance.corners;
            Debug.Log("C2: topLeft" + corners[0]);
            Debug.Log("C2: topRight" + corners[1]);
            Debug.Log("C2: bottomLeft" + corners[2]);
            Debug.Log("C2: bottomRight" + corners[3]);
            if (MLCVCamera.GetFramePose(extras.VCamTimestamp, out Matrix4x4 cameraTransform).IsOk)
            {
                Debug.Log("Actual width is " + actualWidth);
                Debug.Log("imagePlane.Stride is " + imagePlane.PixelStride);
                Debug.Log("imagePlane.Width is " + imagePlane.Width);
                Debug.Log("imagePlane.Height is " + imagePlane.Height);

                var topLeftScreen = WorldPointToPixel(corners[0], (int)actualWidth, (int)imagePlane.Height, extras.Intrinsics.Value, cameraTransform);
                var topRightScreen = WorldPointToPixel(corners[1], (int)actualWidth, (int)imagePlane.Height, extras.Intrinsics.Value, cameraTransform);
                var bottomLeftScreen = WorldPointToPixel(corners[2], (int)actualWidth, (int)imagePlane.Height, extras.Intrinsics.Value, cameraTransform);
                var bottomRightScreen = WorldPointToPixel(corners[3], (int)actualWidth, (int)imagePlane.Height, extras.Intrinsics.Value, cameraTransform);
                Debug.Log("C5: topLeft" + topLeftScreen);
                Debug.Log("C5: topRight" + topRightScreen);
                Debug.Log("C5: bottomLeft" + bottomLeftScreen);
                Debug.Log("C5: bottomRight" + bottomRightScreen);


                var minX = Mathf.Max(Mathf.FloorToInt(Mathf.Min(topLeftScreen.x, bottomLeftScreen.x)) - 10, 0);
                var maxX = Mathf.Min(Mathf.CeilToInt(Mathf.Max(topRightScreen.x, bottomRightScreen.x)) + 10, actualWidth);
                var minY = Mathf.Max(Mathf.FloorToInt(Mathf.Min(bottomLeftScreen.y, bottomRightScreen.y)) - 10, 0);
                var maxY = Mathf.Min(Mathf.CeilToInt(Mathf.Max(topLeftScreen.y, topRightScreen.y)) + 10, imagePlane.Height);
                Debug.Log("C3: minX: " + minX);
                Debug.Log("C3: maxX: " + maxX);
                Debug.Log("C3: minY: " + minY);
                Debug.Log("C3: maxY: " + maxY);
                AllVariables.Instance.puzzleHeight = (int)(maxY - minY);
                AllVariables.Instance.puzzleWidth = maxX - minX;
                if (imagePlane.Stride != actualWidth)
                {

                    byte[] yChannel = new byte[(int)((maxX - minX) * (maxY - minY))];
                    for (int i = minY; i < maxY; i++)
                    {

                        Buffer.BlockCopy(imagePlane.Data, (int)(i * imagePlane.Stride + minX), yChannel,
                            i * (maxX - minX), maxX - minX);
                    }
                    Debug.Log("Hello we have a channel Y data in if with shape: " + yChannel.Length);
                    AllVariables.Instance.puzzleImage = yChannel;

                    Debug.Log("Ending one update");



                }
                else
                {
                    byte[] yChannel = imagePlane.Data;
                    Debug.Log("Hello we have a channel Y data in else with shape: " + yChannel.Length);
                }
            }
        }
    }

    public Vector2 WorldPointToPixel(Vector3 worldPoint, int width, int height, MLCameraBase.IntrinsicCalibrationParameters parameters, Matrix4x4 cameraTransformationMatrix)
    {
        // Step 1: Convert the world space point to camera space
        Vector3 cameraSpacePoint = cameraTransformationMatrix.inverse.MultiplyPoint(worldPoint);

        // Step 2: Project the camera space point onto the normalized image plane
        Vector2 normalizedImagePoint = new Vector2(cameraSpacePoint.x / cameraSpacePoint.z, cameraSpacePoint.y / cameraSpacePoint.z);

        // Step 3: Adjust for FOV
        float verticalFOVRad = parameters.FOV * Mathf.Deg2Rad;
        float aspectRatio = width / (float)height;
        float horizontalFOVRad = 2 * Mathf.Atan(Mathf.Tan(verticalFOVRad / 2) * aspectRatio);

        normalizedImagePoint.x /= Mathf.Tan(horizontalFOVRad / 2);
        normalizedImagePoint.y /= Mathf.Tan(verticalFOVRad / 2);

        // Step 4: Convert normalized image coordinates to pixel coordinates
        Vector2 pixelPosition = new Vector2(
            normalizedImagePoint.x * width + parameters.PrincipalPoint.x,
            normalizedImagePoint.y * height + parameters.PrincipalPoint.y
        );

        return pixelPosition;
    }
}

Update: It still crashes sometimes, but I was able to get some results. Basically, using the conversion function I have, I got:

08-14 20:12:53.206271 25938 25959 I Unity   : C2: topLeft(-0.16, 0.03, 0.36)
08-14 20:12:53.206321 25938 25959 I Unity   : C2: topRight(-0.05, 0.03, 0.37)
08-14 20:12:53.206343 25938 25959 I Unity   : C2: bottomLeft(-0.16, -0.08, 0.37)
08-14 20:12:53.206367 25938 25959 I Unity   : C2: bottomRight(-0.05, -0.08, 0.38)
08-14 20:12:53.206477 25938 25959 I Unity   : Actual width is 1920
08-14 20:12:53.206500 25938 25959 I Unity   : imagePlane.Stride is 1
08-14 20:12:53.206513 25938 25959 I Unity   : imagePlane.Width is 1920
08-14 20:12:53.206529 25938 25959 I Unity   : imagePlane.Height is 1080
08-14 20:12:53.206561 25938 25959 I Unity   : C5: topLeft(802.76, 254.34)
08-14 20:12:53.206580 25938 25959 I Unity   : C5: topRight(1123.74, 201.58)
08-14 20:12:53.206599 25938 25959 I Unity   : C5: bottomLeft(750.44, -65.84)
08-14 20:12:53.206617 25938 25959 I Unity   : C5: bottomRight(1071.02, -118.72)
08-14 20:12:53.206632 25938 25959 I Unity   : C3: minX: 740
08-14 20:12:53.206646 25938 25959 I Unity   : C3: maxX: 1134
08-14 20:12:53.206661 25938 25959 I Unity   : C3: minY: 0
08-14 20:12:53.206681 25938 25959 I Unity   : C3: maxY: 265

And the image I get is higher up relative to the actual square I have. I assume the y axis is also flipped for the MLCamera output, but why am I getting negative values here?

I recommend creating a test script that does the world-to-pixel conversion separately from the Vuforia project you are using. That project would let you place objects at a known point and debug the pixel positions. You can use the Camera Capture app to verify that the point you are tracking is inside the frustum of the camera capture and is not what is contributing to the negative pixel positions.

Once you verify that the conversion is correct, you can try to integrate it into your Vuforia project.

You can also try to use the intrinsic values provided by Vuforia. I'm not a Vuforia expert, but you might be able to use the VuforiaBehaviour.Instance.World.OnStateUpdated event to get a callback when a new image and pose are captured. Then you might be able to use the camera intrinsic values without using a separate camera stream.

That said, I would first verify that the world-to-pixel position is reported correctly.
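
Something along these lines would work as the test scene (a rough sketch; SimpleCamera and its single-argument WorldPointToPixel are placeholders for whatever capture/conversion script you end up using):

using UnityEngine;

// Sketch: place this on an empty GameObject, assign a marker Transform at a known
// world position, and compare the logged pixel value with where the marker shows up
// in the captured image.
public class WorldToPixelTest : MonoBehaviour
{
    public Transform marker;          // object placed at a known world position
    public SimpleCamera simpleCamera; // capture/conversion script (placeholder reference)

    private float _timer;

    void Update()
    {
        _timer += Time.deltaTime;
        if (_timer < 1f) return; // log roughly once per second
        _timer = 0f;

        Vector2 pixel = simpleCamera.WorldPointToPixel(marker.position);
        Debug.Log($"Marker world {marker.position} -> pixel {pixel}");
    }
}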

Yes, sure, thanks for your advice! I'm currently trying with Vuforia disabled. Maybe I'll just start with a 50*50 square in the center of the image.

I was not able to reproduce the World To Pixel alignment issue. I can confirm that the following scripts work when used in the Vuforia Image Targets Example. It's important to note that the Camera inside Unity does not represent the RGB camera on the headset, but rather the user's eyes. This means that you cannot use Camera.WorldToScreen, Camera.ScreenToWorld, or any other variation that references that camera directly. For the same reason, you cannot use the CameraDevice class provided by Vuforia, because it does not provide the camera pose at the time the image was captured, which is different from MainCamera.Transform.Position.

This is the script that demonstrates getting the Image Target Corners :

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Vuforia;

public class World2PixelExample : MonoBehaviour
{
    public ImageTargetBehaviour imageTargetBehaviour;
    public SimpleCamera SimpleCamera;

    void Start()
    {
        if (imageTargetBehaviour)
        {
            imageTargetBehaviour.OnTargetStatusChanged += OnTargetStatusChanged;
        }
        else
        {
            Debug.LogError("Image Target Behaviour not found.");
        }
    }

    private void OnTargetStatusChanged(ObserverBehaviour observerBehaviour, TargetStatus status)
    {

        Debug.Log("World To Pixel Status Update");
        if (status.Status == Status.TRACKED && status.StatusInfo == StatusInfo.NORMAL)
        {
            Vector3 targetSize = imageTargetBehaviour.GetSize();
            Debug.Log("targetSize " + targetSize);

            Vector3 topLeft = new Vector3(-targetSize.x / 2, 0, -targetSize.y / 2);
            Vector3 topRight = new Vector3(targetSize.x / 2, 0, -targetSize.y / 2);
            Vector3 bottomLeft = new Vector3(-targetSize.x / 2, 0, targetSize.y / 2);
            Vector3 bottomRight = new Vector3(targetSize.x / 2, 0, targetSize.y / 2);

            // Transform the corner positions to world coordinates
            topLeft = imageTargetBehaviour.transform.TransformPoint(topLeft);
            topRight = imageTargetBehaviour.transform.TransformPoint(topRight);
            bottomLeft = imageTargetBehaviour.transform.TransformPoint(bottomLeft);
            bottomRight = imageTargetBehaviour.transform.TransformPoint(bottomRight);
         
            Debug.Log($"Top Left: {topLeft}");
            Debug.Log($"Top Right: {topRight}");
            Debug.Log($"Bottom Left: {bottomLeft}");
            Debug.Log($"Bottom Right: {bottomRight}");

   

            // Convert world coordinates to screen pixel positions
            Vector2 topLeftPixel = SimpleCamera.WorldPointToPixel(topLeft);
            Debug.Log($"topLeftPixel {topLeftPixel}");

            Vector2 topRightPixel = SimpleCamera.WorldPointToPixel(topRight);
            Debug.Log($"topRightPixel {topRightPixel}");

            Vector2 bottomLeftPixel = SimpleCamera.WorldPointToPixel(bottomLeft);
            Debug.Log($"bottomLeftPixel {bottomLeftPixel}");

            Vector2 bottomRightPixel = SimpleCamera.WorldPointToPixel(bottomRight);
            Debug.Log($"bottomRightPixel {bottomRightPixel}");

        }
    }
}

This is the script that manages the Camera Data and stores the RGB Camera intrinsic and extrinsic parameters

using System;
using System.Collections;
using UnityEngine;
using UnityEngine.XR.MagicLeap;
using Vuforia;

public class SimpleCamera : MonoBehaviour
{
    [SerializeField, Tooltip("Desired width for the camera capture")]
    private int captureWidth = 1280;
    [SerializeField, Tooltip("Desired height for the camera capture")]
    private int captureHeight = 720;
    //Not used in this example

    //  [SerializeField, Tooltip("The renderer to show the camera capture on RGB format")]
    //  private Renderer _screenRendererRGB = null;

    //The identifier can either target the Main or CV cameras.
    private MLCamera.Identifier _identifier = MLCamera.Identifier.Main;
    private MLCamera _camera;
    //Is true if the camera is ready to be connected.
    private bool _cameraDeviceAvailable;

    private MLCamera.CaptureConfig _captureConfig;

    private Texture2D _videoTextureRgb;
    //The camera capture state
    bool _isCapturing;

    private MLCamera.CameraOutput _lastCameraOutput;
    private MLCamera.ResultExtras _lastExtras;
    private Matrix4x4 _lastTransform;

    private bool permissionCameraGranted = false;
    private readonly MLPermissions.Callbacks permissionCallbacks = new MLPermissions.Callbacks();

    private void Awake()
    {
        permissionCallbacks.OnPermissionGranted += OnPermissionGranted;
    }

    private void OnDestroy()
    {
        permissionCallbacks.OnPermissionGranted -= OnPermissionGranted;
    }

    void Start()
    {
        MLPermissions.RequestPermission(MLPermission.Camera, permissionCallbacks);
        MLPermissions.RequestPermission(MLPermission.SpatialMapping, permissionCallbacks);
        MLPermissions.RequestPermission(MLPermission.SpatialAnchors, permissionCallbacks);
        StartCoroutine(EnableMLCamera());
    }

    private void OnPermissionGranted(string permission)
    {
        permissionCameraGranted = true;
        Debug.Log($"{permission} granted.");
    }

    void OnDisable()
    {
        StopCapture();
    }

    //Waits for the camera to be ready and then connects to it.
    private IEnumerator EnableMLCamera()
    {
        while (!permissionCameraGranted)
        {
            yield return null;
        }

        VuforiaApplication.Instance.Initialize();

        yield return new WaitForSeconds(5);


        //Checks the main camera's availability.
        while (!_cameraDeviceAvailable)
        {
            MLResult result = MLCamera.GetDeviceAvailabilityStatus(_identifier, out _cameraDeviceAvailable);
            if (result.IsOk == false || _cameraDeviceAvailable == false)
            {
                // Wait until camera device is available
                yield return new WaitForSeconds(1.0f);
            }
        }

        yield return ConnectCamera();
    }

    private IEnumerator ConnectCamera()
    {
        //Once the camera is available, we can connect to it.
        if (_cameraDeviceAvailable)
        {
            MLCamera.ConnectContext connectContext = MLCamera.ConnectContext.Create();
            connectContext.CamId = _identifier;
            //MLCamera.Identifier.Main is the only camera that can access the virtual and mixed reality flags
            connectContext.Flags = MLCamera.ConnectFlag.CamOnly;

            var createAndConnectAsync = MLCamera.CreateAndConnectAsync(connectContext);

            while (!createAndConnectAsync.IsCompleted)
            {
                yield return null;
            }

            _camera = createAndConnectAsync.Result;

            if (_camera != null)
            {
                Debug.Log("Camera device connected");
                if (TryGetCaptureConfig(out _captureConfig))
                {
                    Debug.Log("Camera Config Created. Starting Video Capture");
                    yield return StartVideoCapture();
                }
                else
                {
                    Debug.LogError("Cannot Create Capture Config");
                    yield break;
                }
            }
        }
        yield return null;
    }

    private bool TryGetCaptureConfig(out MLCameraBase.CaptureConfig captureConfig)
    {
        captureConfig = new MLCameraBase.CaptureConfig();

        //Gets the stream capabilities the selected camera. (Supported capture types, formats and resolutions)
        MLCamera.StreamCapability[] streamCapabilities = MLCamera.GetImageStreamCapabilitiesForCamera(_camera, MLCamera.CaptureType.Video);

        if (streamCapabilities.Length == 0)
           return false;

        //Set the default capability stream
        MLCamera.StreamCapability defaultCapability = streamCapabilities[0];

        //Try to get the stream that most closely matches the target width and height
        if (MLCamera.TryGetBestFitStreamCapabilityFromCollection(streamCapabilities, captureWidth, captureHeight,
                MLCamera.CaptureType.Video, out MLCamera.StreamCapability selectedCapability))
        {
            defaultCapability = selectedCapability;
        }

        //Initialize a new capture config.
        captureConfig = new MLCamera.CaptureConfig();
        //Set RGBA video as the output
        MLCamera.OutputFormat outputFormat = MLCamera.OutputFormat.RGBA_8888;
        //Set the Frame Rate to 30fps
        captureConfig.CaptureFrameRate = MLCamera.CaptureFrameRate._30FPS;
        //Initialize a camera stream config.
        //The Main Camera can support up to two stream configurations
        captureConfig.StreamConfigs = new MLCamera.CaptureStreamConfig[1];
        captureConfig.StreamConfigs[0] = MLCamera.CaptureStreamConfig.Create(defaultCapability, outputFormat);
        return true;
    }
    private IEnumerator StartVideoCapture()
    {
        MLResult result = _camera.PrepareCapture(_captureConfig, out MLCamera.Metadata metaData);
        if (result.IsOk)
        {
            //Assume this is done by Vuforia
            // _camera.PreCaptureAEAWB();

            //Images capture uses the CaptureImage function instead.
            var captureVideoStartAsync = _camera.CaptureVideoStartAsync();
            while (!captureVideoStartAsync.IsCompleted)
            {
                yield return null;
            }

            result = captureVideoStartAsync.Result;
            _isCapturing = MLResult.DidNativeCallSucceed(result.Result, nameof(_camera.CaptureVideoStart));
            if (_isCapturing)
            {
                Debug.Log("Video capture started!");
                _camera.OnRawVideoFrameAvailable += RawVideoFrameAvailable;

            }
            else
            {
                Debug.LogError($"Could not start camera capture. Result : {result}");
            }
        }
    }

    private void StopCapture()
    {
        if (_isCapturing)
        {
            _camera.CaptureVideoStop();
            _camera.OnRawVideoFrameAvailable -= RawVideoFrameAvailable;
        }

        _camera?.Disconnect();
        _isCapturing = false;
    }

    void RawVideoFrameAvailable(MLCamera.CameraOutput output, MLCamera.ResultExtras extras, MLCameraBase.Metadata metadataHandle)
    {
        _lastCameraOutput = output;
        _lastExtras = extras;

        if (MLCVCamera.GetFramePose(extras.VCamTimestamp, out Matrix4x4 cameraTransform).IsOk)
        {
            _lastTransform = cameraTransform;
        }

        //Additional logic to render the image
        // if (output.Format == MLCamera.OutputFormat.RGBA_8888)
        // {
        //     UpdateRGBTexture(ref _videoTextureRgb, output.Planes[0], _screenRendererRGB);
        // }
    }

    public Vector2 WorldPointToPixel(Vector3 worldPoint)
    {
        if (_lastExtras.Intrinsics.HasValue)
        {
            int width = (int)_lastCameraOutput.Planes[0].Width;
            int height = (int)_lastCameraOutput.Planes[0].Height;
            return WorldPointToPixel(worldPoint, width, height, _lastExtras.Intrinsics.Value, _lastTransform);
        }
        Debug.Log("No Intrinsic value");
        return new Vector2(0, 0);
    }

    private Vector2 WorldPointToPixel(Vector3 worldPoint, int width, int height, MLCameraBase.IntrinsicCalibrationParameters parameters, Matrix4x4 cameraTransformationMatrix)
    {
        // Step 1: Convert the world space point to camera space
        Vector3 cameraSpacePoint = cameraTransformationMatrix.inverse.MultiplyPoint(worldPoint);

        // Step 2: Project the camera space point onto the normalized image plane
        Vector2 normalizedImagePoint = new Vector2(cameraSpacePoint.x / cameraSpacePoint.z, cameraSpacePoint.y / cameraSpacePoint.z);

        // Step 3: Adjust for FOV
        float verticalFOVRad = parameters.FOV * Mathf.Deg2Rad;
        float aspectRatio = width / (float)height;
        float horizontalFOVRad = 2 * Mathf.Atan(Mathf.Tan(verticalFOVRad / 2) * aspectRatio);

        normalizedImagePoint.x /= Mathf.Tan(horizontalFOVRad / 2);
        normalizedImagePoint.y /= Mathf.Tan(verticalFOVRad / 2);

        // Step 4: Convert normalized image coordinates to pixel coordinates
        Vector2 pixelPosition = new Vector2(
            normalizedImagePoint.x * width + parameters.PrincipalPoint.x,
            normalizedImagePoint.y * height + parameters.PrincipalPoint.y
        );

        return pixelPosition;
    }

    private void UpdateRGBTexture(ref Texture2D videoTextureRGB, MLCamera.PlaneInfo imagePlane, Renderer renderer)
    {

        if (videoTextureRGB != null &&
            (videoTextureRGB.width != imagePlane.Width || videoTextureRGB.height != imagePlane.Height))
        {
            Destroy(videoTextureRGB);
            videoTextureRGB = null;
        }

        if (videoTextureRGB == null)
        {
            videoTextureRGB = new Texture2D((int)imagePlane.Width, (int)imagePlane.Height, TextureFormat.RGBA32, false);
            videoTextureRGB.filterMode = FilterMode.Bilinear;

            Material material = renderer.material;
            material.mainTexture = videoTextureRGB;
            material.mainTextureScale = new Vector2(1.0f, -1.0f);
        }

        int actualWidth = (int)(imagePlane.Width * imagePlane.PixelStride);

        if (imagePlane.Stride != actualWidth)
        {
            var newTextureChannel = new byte[actualWidth * imagePlane.Height];
            for (int i = 0; i < imagePlane.Height; i++)
            {
                Buffer.BlockCopy(imagePlane.Data, (int)(i * imagePlane.Stride), newTextureChannel, i * actualWidth, actualWidth);
            }
            videoTextureRGB.LoadRawTextureData(newTextureChannel);
        }
        else
        {
            videoTextureRGB.LoadRawTextureData(imagePlane.Data);
        }
        videoTextureRGB.Apply();
    }
}

I am trying to do exactly what this function is meant for. For the most part it works fine, but there is a lot of vertical error in the conversion. When I move my head up and down, the pixel coordinate does not correspond well (vertically) to where it should be. However, when I move my head left and right, the pixel coordinate stays where I expect it to be (and follows the 3D object well). Any idea how I can correct this vertical error?

My intuition is to see if the y-axis value is reversed.
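
For example, a quick check would be to mirror the projected y value around the image height before using it and see whether the vertical tracking improves:

// Sketch: test for a flipped y axis by mirroring the projected pixel vertically.
// pixel is the result of WorldPointToPixel; imageHeight is the captured image height.
Vector2 flipped = new Vector2(pixel.x, imageHeight - pixel.y);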


The direction of the vertical movement is correct (which, in my opinion, indicates that the y value is not reversed). The value itself changes drastically with slight up or down movement (again, in the right direction), which is what is causing the error. I notice that verticalFOVRad does not take the aspect ratio into account, while horizontalFOVRad does. Can I somehow incorporate the aspect ratio into verticalFOVRad?
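
For comparison, the standard pinhole form scales each axis by its own focal length instead of deriving one field of view from the other, which sidesteps the aspect-ratio question entirely. A sketch, assuming the intrinsics expose a per-axis focal length alongside the principal point used above (worth verifying against the MLCamera docs):

// Sketch: pinhole projection written with per-axis focal lengths (fx, fy).
Vector2 WorldPointToPixelFocal(Vector3 worldPoint, Vector2 focalLength, Vector2 principalPoint, Matrix4x4 cameraPose)
{
    // World space -> camera space using the frame pose.
    Vector3 p = cameraPose.inverse.MultiplyPoint(worldPoint);

    // Standard pinhole model: u = fx * x / z + cx, v = fy * y / z + cy.
    return new Vector2(
        focalLength.x * (p.x / p.z) + principalPoint.x,
        focalLength.y * (p.y / p.z) + principalPoint.y);
}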