Processing the depth frames

Hi all. I have created a Unity application that records depth images together with the extrinsic and intrinsic data for each image. My goal is to convert the depth images to point clouds and transform them into a common coordinate space using the extrinsic data. I haven't found any example code for doing this, but so far I think I have managed to do the depth -> point cloud conversion properly; at least the point clouds look reasonable. I have also used the distortion parameters to undistort the data (OpenCV getOptimalNewCameraMatrix + undistort). Unfortunately, the part where I transform the point cloud from the camera frame to the world frame is not working.

Is there an example implementation of this in Python, C++, or C#? Or instructions on how to apply the camera pose (extrinsic data) to a point cloud in the camera coordinate frame?

Hi @paul.kemppi, we plan to provide a sample of this once the depth sensor becomes available via the OpenXR APIs in Unity. In the meantime, here is a snippet that shows how to build a lookup table that undistorts the pixel positions of the depth image.

 private Vector2[,] projTable = null;

    private void CreateProjectionTable(IntrinsicParameters intrinsics)
    {
        Vector2Int resolution = intrinsics.Resolution;
        Vector2 focal = intrinsics.FocalLength;
        Vector2 center = intrinsics.PrincipalPoint;
        DistortionParameters distortion = intrinsics.Distortion;

        projTable = new Vector2[resolution.y, resolution.x];

        for (int y = 0; y < resolution.y; ++y)
        {
            for (int x = 0; x < resolution.x; ++x)
            {
                Vector2 uv = new Vector2(x, y) / resolution;
                uv = Undistort(uv, distortion);
                projTable[y, x] = ((uv * resolution) - center) / focal;
            }
        }
    }

    private Vector2 Undistort(Vector2 uv, DistortionParameters d)
    {
        Vector2 xy = uv - half2;
        float r2 = Vector2.Dot(xy, xy);
        float r4 = r2 * r2;
        float r6 = r4 * r2;

        Vector2 xy_rd = xy * (1 + ((float)d.k1 * r2) + ((float)d.k2 * r4) + ((float)d.k3 * r6));

        float xtd = (2 * (float)d.p1 * xy.x * xy.y) + ((float)d.p2 * (r2 + (2 * xy.x * xy.x)));
        float ytd = (2 * (float)d.p2 * xy.x * xy.y) + ((float)d.p1 * (r2 + (2 * xy.y * xy.y)));
        Vector2 xy_td = new Vector2(xtd, ytd);

        return (xy_rd + xy_td) + half2;
    }

Thank you for the quick reply. I am also using world camera frames in my work and undistorting them with OpenCV. Should I use the above functions instead? The distortion for the world camera also contains a k4 parameter, so should I just add:

float r8 = r6 * r2;

And then:

Vector2 xy_rd = xy * (1 + ((float)d.k1 * r2) + ((float)d.k2 * r4) + ((float)d.k3 * r6) + ((float)d.k4 * r8));

Do you mind creating a new ticket for the world camera and its distortion? This will help us track requests from our customers and helps us prioritize some of our new APIs and samples.

I created a new ticket for the world camera. Could you also clarify 1) what half2 is in your function, and 2) how to apply the created projTable to get an undistorted image?

Thank you, here is some additional information

    private static readonly Vector2 half2 = Vector2.one * 0.5f;

Here is how you would apply the projection table, where d is the depth value from the depth camera, x and y are the pixel coordinates, and xy is the value from the projection table.

 Vector2 xy = projTable[y, x];
 Vector3 cp = new Vector3(xy.x, xy.y, 1).normalized * d;
 Vector3 wp = cameraToWorld.MultiplyPoint3x4(cp);

Further questions related to this:

  1. Is the depth (d) here the pixel value from DepthImage?

"Depth map stores the depth data from the depth camera. Depth is represented in meters and gives the radial distance of the real world location from the depth camera coordinate frame."

  2. Is the cameraToWorld constructed from Position and Rotation?

  3. Is the output (wp) a world point corresponding to the depth pixel? By storing these values line by line in a text file (.xyz), should I see the undistorted, world-frame point cloud in e.g. CloudCompare? (A sketch of my export is below.)
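
Here is a minimal sketch of that export, assuming worldPoints holds the wp values computed with the snippet above:

using (var writer = new System.IO.StreamWriter("pointcloud.xyz"))
{
    foreach (Vector3 wp in worldPoints)
    {
        // One "x y z" line per point, written with '.' as the decimal separator.
        writer.WriteLine(string.Format(System.Globalization.CultureInfo.InvariantCulture,
            "{0} {1} {2}", wp.x, wp.y, wp.z));
    }
}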

If these assumptions are correct, there is still something wrong in these functions (or in my implementation), as all the points fall on a single line. The depth values make sense (0...7.5 m) and all the other input parameters are verified to be meaningful (w, h, fx, fy, cx, cy, k1, k2, p1, p2, k3).

Here is the current code for the conversion and a compressed sample depth image in CSV format.

Depth2PointCloud.cs (5.2 KB)
VTT_XR_MagicLeap_DepthFrame_03_20_2024_01_16_21.zip (352.3 KB)


Paul and kbabilinski,

I am working on a project that also uses the depth sensor data, and we need to access specific pixels in the depth image in order to use them to process other images. Could you explain how you gain access to the DepthImage and save it to a CSV in your work?

Best,
Cooper

Hi Cooper,

Check out the example scene for acquiring the data: Depth Camera | MagicLeap Developer Documentation

Like in the example, you can request depth frames in the Update() function of your script:


// Note: timeout, isFrameAvailable, lastData and captureFlag are fields declared elsewhere in the script.
void Update()
{
    var result = MLDepthCamera.GetLatestDepthData(timeout, out MLDepthCamera.Data data);

    isFrameAvailable = result.IsOk;
    if (result.IsOk)
    {
        lastData = data;
    }

    if (lastData == null)
    {
        return;
    }

    switch (captureFlag)
    {
        case MLDepthCamera.CaptureFlags.DepthImage:
            if (lastData.DepthImage != null)
            {
                // This is the depth array of floats containing the depth in meters (radial distance of the real world location from the depth camera coordinate frame)
                float[] depth = ConvertBytesToFloats(lastData.DepthImage.Value.Data);
            }
            break;
    }
}

public static float[] ConvertBytesToFloats(byte[] array)
{
    float[] floatArr = new float[array.Length / 4];
    for (int i = 0; i < floatArr.Length; i++)
    {
        floatArr[i] = BitConverter.ToSingle(array, i * 4);
    }
    return floatArr;
}
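
Once you have the float array, writing it to a CSV file is straightforward. Here is a minimal sketch; the width parameter is assumed to come from the frame's buffer info or intrinsics:

public static void SaveDepthToCsv(float[] depth, int width, string path)
{
    var sb = new System.Text.StringBuilder();
    for (int i = 0; i < depth.Length; i++)
    {
        // Write values in invariant culture so the decimal separator is always '.'.
        sb.Append(depth[i].ToString(System.Globalization.CultureInfo.InvariantCulture));
        // End the line at the end of each image row, otherwise separate values with a comma.
        sb.Append((i + 1) % width == 0 ? "\n" : ",");
    }
    System.IO.File.WriteAllText(path, sb.ToString());
}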

Returning to the original question: I would be extremely happy to get a working example showing how to convert the depth image into a point cloud in the world frame.

Paul

The camera-to-world matrix can be obtained by setting a transform to the position and rotation of the depth camera and calling transform.localToWorldMatrix.
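
A minimal sketch (framePosition and frameRotation stand for the Position and Rotation delivered with the depth frame, and helperTransform is an assumed, unscaled helper object):

// Option 1: move a helper transform to the depth camera pose, then read its matrix.
helperTransform.SetPositionAndRotation(framePosition, frameRotation);
Matrix4x4 cameraToWorld = helperTransform.localToWorldMatrix;

// Option 2: build the equivalent matrix directly, without a scene object.
Matrix4x4 cameraToWorldDirect = Matrix4x4.TRS(framePosition, frameRotation, Vector3.one);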

Thank you for your patience as I look for a more complete example.

Here is a script that sends an array of world points to a "point cloud renderer". I have not included the point cloud renderer code, but the script should be enough to demonstrate how to obtain the undistorted depth data. A few things to note:

You may want to check whether each depth point falls into the user's field of view and exclude any depth points that are not visible. This will ensure that the depth image aligns with the world more closely. Although the point cloud is undistorted, some skewing occurs at the edges of the image.

Pseudocode:

private bool IsPointInFrustum(Vector3 point, Plane[] frustumPlanes)
{
    foreach (var plane in frustumPlanes)
    {
        if (plane.GetDistanceToPoint(point) < 0)
        {
            // Point is outside this plane, and thus outside the frustum
            return false;
        }
    }
    // Point is inside all planes, and thus inside the frustum
    return true;
}
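
A usage sketch (Camera.main and worldPoint are assumptions here; the check is intended to run on the main thread, since Camera APIs are not thread-safe):

// Grab the current view frustum from the main camera and test a world point against it.
Plane[] frustumPlanes = GeometryUtility.CalculateFrustumPlanes(Camera.main);
if (IsPointInFrustum(worldPoint, frustumPlanes))
{
    // Keep the point; it falls inside the user's current field of view.
}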

You may want to consider checking the user's movement to prevent capturing data while the device is in motion. You could do this by storing the previous position of the depth camera and comparing it with the position in the new frame.

Pseudocode:

// Combining the pose position with its forward and up vectors means that
// rotating the head also changes "position", so rotation counts as movement too.
var position = info.Pose.position + info.Pose.forward + info.Pose.up;
float speed = Vector3.Distance(position, lastPosition) / Time.deltaTime;
if (speed < 0.1f) StartCoroutine(Process());
lastPosition = position;

Here is the test script that projects the depth image as an array of points. When using the script, press the Bumper to initiate capturing the point cloud. The point cloud will stop being processed when the bumper is released.

Note the point cloud renderer is not provided.

using System.Collections;
using System.Threading.Tasks;
using Unity.Collections;
using UnityEngine;
using UnityEngine.XR.MagicLeap;

/// <summary>
/// Converts the depth images from the Magic Leap 2 to an array of 3d points and sends the points to be rendered by a PointCloudRender script.
/// Points will only be submitted to the PointCloudRenderer when the bumper button is pressed.
/// </summary>
public class DepthPointCloudTest : MonoBehaviour
{
    [Header("Depth Camera Settings")] [SerializeField]
    private float minDepth = 0.2f;

    [SerializeField] private float maxDepth = 10f;
    [SerializeField] private bool useConfidenceFilter = false;
    [SerializeField] private MLDepthCamera.Stream depthStream = MLDepthCamera.Stream.LongRange;

    [Header("Rendering")]
    // Custom class that renders a vector3 array as 3d points
    public PointCloudRenderer pointCloudRenderer;

    private MagicLeapInputs magicLeapInputs;
    private MagicLeapInputs.ControllerActions controllerActions;
    private Vector3[] cachedDepthPoints;
    private bool isDepthCameraRunning = false;
    private Vector2[,] cachedProjectionTable;
    private Texture2D depthTexture;
    private Texture2D confidenceTexture;

    private static readonly Vector2 Half2 = Vector2.one * 0.5f;

    private void Awake()
    {
        InitializeMagicLeapInputs();
    }

    private void OnEnable()
    {
        CheckAndRequestDepthCameraPermission();
    }

    private void OnDisable()
    {
        StopDepthCamera();
    }

    private IEnumerator Start()
    {
        yield return ProcessDepthData();
    }

    private void InitializeMagicLeapInputs()
    {
        magicLeapInputs = new MagicLeapInputs();
        magicLeapInputs.Enable();
        controllerActions = new MagicLeapInputs.ControllerActions(magicLeapInputs);
        controllerActions.Enable();
    }

    /// <summary>
    /// Checks for depth camera permissions and requests them if not already granted.
    /// Starts the depth camera after granting the permission.
    /// </summary>
    private void CheckAndRequestDepthCameraPermission()
    {
        if (MLPermissions.CheckPermission(MLPermission.DepthCamera).IsOk)
        {
            StartDepthCamera();
        }
        else
        {
            MLPermissions.Callbacks callbacks = new MLPermissions.Callbacks();
            callbacks.OnPermissionGranted += _ => StartDepthCamera();
            MLPermissions.RequestPermission(MLPermission.DepthCamera, callbacks);
        }
    }

    /// <summary>
    /// Starts the depth camera if it's not already running by setting its configuration and connecting to it.
    /// </summary>
    private void StartDepthCamera()
    {
        if (isDepthCameraRunning)
        {
            Debug.LogWarning("DepthCamera: Already running.");
            return;
        }

        MLDepthCamera.Settings settings = ConfigureDepthCameraSettings();
        MLDepthCamera.SetSettings(settings);

        Debug.Log("DepthCamera: StartDepthCamera() - Settings set");

        MLResult result = MLDepthCamera.Connect();

        if (result.IsOk)
        {
            isDepthCameraRunning = true;
            Debug.Log("DepthCamera: Connected.");
        }
        else
        {
            Debug.LogError($"DepthCamera: Connection failed with error: {result}.");
        }
    }

    /// <summary>
    /// Sets the settings for the depth camera to preset values, including stream configuration and exposure settings.
    /// </summary>
    /// <returns>Returns the configured settings to be applied when starting the depth camera</returns>
    private MLDepthCamera.Settings ConfigureDepthCameraSettings()
    {
        uint flags = (uint)(MLDepthCamera.CaptureFlags.DepthImage | MLDepthCamera.CaptureFlags.Confidence);
        MLDepthCamera.StreamConfig longConfig = new MLDepthCamera.StreamConfig
        {
            FrameRateConfig = MLDepthCamera.FrameRate.FPS_5,
            Flags = flags,
            Exposure = 1600
        };

        MLDepthCamera.StreamConfig shortConfig = new MLDepthCamera.StreamConfig
        {
            FrameRateConfig = MLDepthCamera.FrameRate.FPS_30,
            Flags = flags,
            Exposure = 375
        };

        return new MLDepthCamera.Settings
        {
            Streams = depthStream,
            StreamConfig = new[] { longConfig, shortConfig }
        };
    }

    /// <summary>
    /// Stops the depth camera if it is running.
    /// </summary>
    private void StopDepthCamera()
    {
        if (!isDepthCameraRunning)
        {
            Debug.LogWarning("DepthCamera: Not running.");
            return;
        }

        Debug.Log($"DepthCamera: StopDepthCamera() - Stopping depthTexture camera");

        MLResult result = MLDepthCamera.Disconnect();

        if (result.IsOk)
        {
            isDepthCameraRunning = false;
            Debug.Log("DepthCamera: Disconnected.");
        }
        else
        {
            Debug.LogError($"DepthCamera: Disconnection failed with error: {result}.");
        }
    }

    /// <summary>
    /// Coroutine for processing depth data continuously while the depth camera is running. 
    /// It waits for the depth camera to start, then enters a loop to fetch and process the latest depth data
    /// at regular intervals, updating textures and calculating point clouds based on the depth and confidence data.
    /// </summary>
    private IEnumerator ProcessDepthData()
    {
        // Wait until the depth camera has started before proceeding.
        yield return new WaitUntil(() => isDepthCameraRunning);

        // Continue processing depth data as long as the depth camera is running.
        while (isDepthCameraRunning)
        {
            MLDepthCamera.Data data;

            // Loop until valid depth data is received, including both depth image and confidence values.
            while (!MLDepthCamera.GetLatestDepthData(1, out data).IsOk || !data.DepthImage.HasValue ||
                   !data.ConfidenceBuffer.HasValue)
            {
                // Wait until the next frame before trying again.
                yield return null;
            }

            // Prepare and update the textures for both depth and confidence data using the latest valid data received.
            depthTexture = CreateOrUpdateTexture(depthTexture, data.DepthImage.Value);
            confidenceTexture = CreateOrUpdateTexture(confidenceTexture, data.ConfidenceBuffer.Value);

            // Check if the controller's bumper is pressed to trigger point cloud calculation.
            if (controllerActions.Bumper.IsPressed())
            {
                Debug.Log("DepthCloud: Calculate Point Cloud - Started");

                // Create a transformation matrix from the depth camera's position and rotation to world space.
                Matrix4x4 cameraToWorldMatrix = new Matrix4x4();
                cameraToWorldMatrix.SetTRS(data.Position, data.Rotation, Vector3.one);

                // Calculate the point cloud based on the current depth data and camera position.
                yield return CalculatePointCloud(data.Intrinsics, cameraToWorldMatrix);
                
            }
            
            yield return null;
        }
    }

    /// <summary>
    /// Calculates the point cloud from depth and confidence textures using camera intrinsics and a transformation matrix.
    /// </summary>
    /// <param name="intrinsics">The depth camera's intrinsic parameters./param>
    /// <param name="cameraToWorldMatrix">A transform matrix based on the depth camera's position and rotation.</param>
    private IEnumerator CalculatePointCloud(MLDepthCamera.Intrinsics intrinsics, Matrix4x4 cameraToWorldMatrix)
    {
        var depthData = depthTexture.GetRawTextureData<float>();
        var confidenceData = confidenceTexture.GetRawTextureData<float>();
        Vector2Int resolution = new Vector2Int((int)intrinsics.Width, (int)intrinsics.Height);
        
        Task t = Task.Run(() =>
        { 
            // Ensure the projection table is calculated and cached to avoid recomputation.
            if (cachedProjectionTable == null) 
            { 
                cachedProjectionTable = CreateProjectionTable(intrinsics); 
                Debug.Log("DepthCloud: Projection Table Created");
            }

            // Process depth points to populate the cachedDepthPoints array with world positions.
            ProcessDepthPoints(ref cachedDepthPoints, depthData, confidenceData, resolution, cameraToWorldMatrix);

        });

        yield return new WaitUntil(() => t.IsCompleted);

        Debug.Log("DepthCloud: Updating Renderer");
        // Update the point cloud renderer with the newly calculated points.
        pointCloudRenderer.UpdatePointCloud(cachedDepthPoints);
    }

    /// <summary>
    /// Processes range data from a sensor to generate a point cloud. This function transforms sensor range data,
    /// which measures the distance to each point in the sensor's view, into 3D world coordinates. It takes into account
    /// the resolution of the range data, a confidence map for filtering purposes, and the camera's position and orientation
    /// in the world to accurately map each point from sensor space to world space.
    /// (Range Image = distance to point, Depth Image = distance to plane)
    /// </summary>
    /// <param name="depthPoints">Reference to an array of Vector3 points that will be populated with the world coordinates of each depth point. This array is directly modified to contain the results.</param>
    /// <param name="depthData">A NativeArray of float values representing the range data from the depth sensor. Each float value is the distance from the sensor to the point in the scene.</param>
    /// <param name="confidenceData">A NativeArray of float values representing the confidence for each point in the depthData array. This is used to filter out unreliable data points based on a confidence threshold.</param>
    /// <param name="resolution">The resolution of the depth sensor's output, given as a Vector2Int where x is the width and y is the height of the depth data array.</param>
    /// <param name="cameraToWorldMatrix">A Matrix4x4 representing the transformation from camera space to world space. This is used to translate each point's coordinates into the global coordinate system.</param>
    private void ProcessDepthPoints(ref Vector3[] depthPoints, NativeArray<float> depthData, NativeArray<float> confidenceData, Vector2Int resolution, Matrix4x4 cameraToWorldMatrix)
    {
        // Initialize or resize the depthPoints array based on the current resolution, if necessary.
        if (depthPoints == null || depthPoints.Length != depthData.Length)
        {
            depthPoints = new Vector3[resolution.x * resolution.y];
            Debug.Log("DepthCloud: Initializing New Depth Array");
        }

        Debug.Log($"DepthCloud: Processing Depth. Resolution : {resolution.x} x {resolution.y}");

        // Iterate through each pixel in the depth data.
        for (int y = 0; y < resolution.y; ++y)
        {
            for (int x = 0; x < resolution.x; ++x)
            {
                // Calculate the linear index based on x, y coordinates.
                int index = x + (resolution.y - y - 1) * resolution.x;
                float depth = depthData[index];

                // Skip processing if depth is out of range or confidence is too low (if filter is enabled).
                // Confidence comes directly from the sensor pipeline and is represented as a float ranging from
                // [-1.0, 0.0] for long range and [-0.1, 0.0] for short range, where 0 is highest confidence. 
                if (depth < minDepth || depth > maxDepth || (useConfidenceFilter && confidenceData[index] < -0.1f))
                {
                    //Set the invalid points to be positioned at 0,0,0
                    depthPoints[index] = Vector3.zero;
                    continue;
                }

                // Use the cached projection table to find the UV coordinates for the current point.
                Vector2 uv = cachedProjectionTable[y, x];
                // Transform the UV coordinates into a camera space point.
                Vector3 cameraPoint = new Vector3(uv.x, uv.y, 1).normalized * depth;
                // Convert the camera space point into a world space point.
                Vector3 worldPoint = cameraToWorldMatrix.MultiplyPoint3x4(cameraPoint);

                // Store the world space point in the depthPoints array.
                depthPoints[index] = worldPoint;
            }
        }
    }

    /// <summary>
    /// Creates a new Texture2D or updates an existing one using data from a depth camera's frame buffer.
    /// </summary>
    /// <param name="texture">The current texture to update. If this is null, a new texture will be created.</param>
    /// <param name="frameBuffer">The frame buffer from the MLDepthCamera containing the raw depth data.</param>
    /// <returns>The updated or newly created Texture2D populated with the frame buffer's depth data.</returns>
    private Texture2D CreateOrUpdateTexture(Texture2D texture, MLDepthCamera.FrameBuffer frameBuffer)
    {
        if (texture == null)
        {
            texture = new Texture2D((int)frameBuffer.Width, (int)frameBuffer.Height, TextureFormat.RFloat, false);
        }

        texture.LoadRawTextureData(frameBuffer.Data);
        texture.Apply();
        return texture;
    }

    /// <summary>
    /// Creates a projection table mapping 2D pixel coordinates to normalized device coordinates (NDC),
    /// accounting for lens distortion.
    /// </summary>
    /// <param name="intrinsics">The intrinsic parameters of the depth camera, including resolution,
    /// focal length, principal point, and distortion coefficients.</param>
    private Vector2[,] CreateProjectionTable(MLDepthCamera.Intrinsics intrinsics)
    {
        // Convert the camera's resolution from intrinsics to a Vector2Int for easier manipulation.
        Vector2Int resolution = new Vector2Int((int)intrinsics.Width, (int)intrinsics.Height);
        // Initialize the projection table with the same dimensions as the camera's resolution.
        Vector2[,] projectionTable = new Vector2[resolution.y, resolution.x];

        // Iterate over each pixel in the resolution.
        for (int y = 0; y < resolution.y; ++y)
        {
            for (int x = 0; x < resolution.x; ++x)
            {
                // Normalize the current pixel coordinates to a range of [0, 1] by dividing
                // by the resolution. This converts pixel coordinates to UV coordinates.
                Vector2 uv = new Vector2(x, y) / new Vector2(resolution.x, resolution.y);

                // Apply distortion correction to the UV coordinates. This step compensates
                // for the lens distortion inherent in the depth camera's optics.
                Vector2 correctedUV = Undistort(uv, intrinsics.Distortion);

                // Convert the corrected UV coordinates back to pixel space, then shift
                // them based on the principal point and scale by the focal length to
                // achieve normalized device coordinates (NDC). These coordinates are
                // useful for mapping 2D image points to 3D space.
                projectionTable[y, x] = ((correctedUV * new Vector2(resolution.x, resolution.y)) - intrinsics.PrincipalPoint) / intrinsics.FocalLength;
            }
        }
        //Return the created projection Table
        return projectionTable;
    }

    /// <summary>
    /// Applies distortion correction to a UV coordinate based on given distortion parameters.
    /// </summary>
    /// <param name="uv">The original UV coordinate to undistort.</param>
    /// <param name="distortionParameters">Distortion parameters containing radial and tangential distortion coefficients.</param>
    /// <returns>The undistorted UV coordinate.</returns>
    private Vector2 Undistort(Vector2 uv, MLDepthCamera.DistortionCoefficients distortionParameters)
    {
        // Calculate the offset from the center of the image.
        Vector2 offsetFromCenter = uv - Half2;

        // Compute radial distance squared (r^2), its fourth power (r^4), and its sixth power (r^6) for radial distortion correction.
        float rSquared = Vector2.Dot(offsetFromCenter, offsetFromCenter);
        float rSquaredSquared = rSquared * rSquared;
        float rSquaredCubed = rSquaredSquared * rSquared;

        // Apply radial distortion correction based on the distortion coefficients.
        Vector2 radialDistortionCorrection = offsetFromCenter * (float)(1 + distortionParameters.K1 * rSquared + distortionParameters.K2 * rSquaredSquared + distortionParameters.K3 * rSquaredCubed);

        // Compute tangential distortion correction.
        float tangentialDistortionCorrectionX = (float)((2 * distortionParameters.P1 * offsetFromCenter.x * offsetFromCenter.y) + (distortionParameters.P2 * (rSquared + 2 * offsetFromCenter.x * offsetFromCenter.x)));
        float tangentialDistortionCorrectionY = (float)((2 * distortionParameters.P2 * offsetFromCenter.x * offsetFromCenter.y) + (distortionParameters.P1 * (rSquared + 2 * offsetFromCenter.y * offsetFromCenter.y)));
        Vector2 tangentialDistortionCorrection = new Vector2(tangentialDistortionCorrectionX, tangentialDistortionCorrectionY);

        // Combine the radial and tangential distortion corrections and adjust back to original image coordinates.
        return (radialDistortionCorrection + tangentialDistortionCorrection) + Half2;
    }
}
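
The PointCloudRenderer itself is not provided, but as a rough sketch, a Vector3 array can be drawn as a point-topology mesh. Only the UpdatePointCloud(Vector3[]) method is assumed by the script above; everything else here (the mesh setup, the material) is illustrative and not the renderer used internally:

using System.Collections.Generic;
using UnityEngine;

/// <summary>
/// Minimal illustrative point cloud renderer: draws a Vector3 array as a point-topology mesh.
/// Attach to a GameObject with a MeshFilter and MeshRenderer (any unlit material will do).
/// </summary>
[RequireComponent(typeof(MeshFilter), typeof(MeshRenderer))]
public class PointCloudRenderer : MonoBehaviour
{
    private Mesh mesh;

    private void Awake()
    {
        // 32-bit indices, since a full depth frame has far more than 65535 points.
        mesh = new Mesh { indexFormat = UnityEngine.Rendering.IndexFormat.UInt32 };
        GetComponent<MeshFilter>().mesh = mesh;
    }

    public void UpdatePointCloud(Vector3[] points)
    {
        // Rebuild the mesh as a list of points, one index per vertex.
        int[] indices = new int[points.Length];
        for (int i = 0; i < indices.Length; i++)
        {
            indices[i] = i;
        }

        mesh.Clear();
        mesh.SetVertices(new List<Vector3>(points));
        mesh.SetIndices(indices, MeshTopology.Points, 0);
    }
}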

Hi kbabilinski,

Thank you for the example code. I got the conversion working now. There is definitely some filtering to be done, but otherwise the result looks good. For some reason, I had to swap x and y at the end, to avoid the room being mirrored. This ticket can be closed now.

Edit: the mirroring is of course due to the fact that Unity uses a left-handed coordinate system, while I was viewing the point cloud in CloudCompare, which uses a right-handed coordinate system.
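
For reference, the swap amounts to a one-line change in the .xyz export sketched earlier in this thread (wp is the world point):

// Swapping x and y in the export is a reflection, which compensates for the handedness
// difference between Unity (left-handed) and CloudCompare (right-handed).
writer.WriteLine(string.Format(System.Globalization.CultureInfo.InvariantCulture,
    "{0} {1} {2}", wp.y, wp.x, wp.z));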

Regards,
Paul


Happy to hear that you got it working. I will keep an eye out for any additional posts from you.

I am working on providing more sensor samples once the OpenXR Magic Leap Pixel Sensor API goes live. (It should provide a more uniform way of accessing our pixel sensors: world, depth, and eye.)

