Locating ML2 physical camera in OpenXR

Hi,

I would like to locate the ML2 physical camera in OpenXR.

I'm currently locating it using a VIEW reference space. However, the location returned by xrLocateSpace is slightly off. From what I understand, this is "expected", as the camera "origin" is the left eye [1], not the real camera origin. What would then be the best way to precisely locate the ML2 physical camera?

For the HoloLens 2 headset, Microsoft provides XR_MSFT_spatial_graph_bridge. Is there something similar for the ML2 headset?

Or am I better off sticking with MLCVCameraGetFramePose? Since the returned pose will be in the MLPerception "world", with a different time reference, a different world origin and possibly a different coordinate system than OpenXR, how do I then interop between MLPerception and OpenXR data?

Thanks.

[1] Hologram Composition Offset - #3 by kbabilinski

Hi @emaschino, great question.

To obtain the position of the RGB camera, use the MLCVCameraGetFramePose API. Instead of doing a custom conversion, use the UNBOUNDED space. This will make sure that both the headset and the camera pose reference the same origin. Otherwise you will need to convert between UNBOUNDED and the other space.
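
For reference, creating the UNBOUNDED space looks roughly like this (a sketch; it assumes the XR_MSFT_unbounded_reference_space extension was enabled at instance creation and that `session` is your XrSession):

  // Sketch: create an UNBOUNDED reference space.
  XrReferenceSpaceCreateInfo space_info{XR_TYPE_REFERENCE_SPACE_CREATE_INFO};
  space_info.referenceSpaceType = XR_REFERENCE_SPACE_TYPE_UNBOUNDED_MSFT;
  space_info.poseInReferenceSpace.orientation.w = 1.0f;  // identity pose in the space
  XrSpace unbounded_space = XR_NULL_HANDLE;
  if (XR_FAILED(xrCreateReferenceSpace(session, &space_info, &unbounded_space))) {
    // handle error
  }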

Thanks @kbabilinski for your answer.

I'm not fully getting all the points in your explanation, though. Would you mind clarifying them?

My app already uses an UNBOUNDED reference space as its "world" reference, and I'm locating the headset's VIEW space in this UNBOUNDED base space; basically, I'm doing xrLocateSpace(viewSpace, unboundedSpace, timestamp, &location). So your suggestion is not to use a VIEW space for the headset camera at all, but just to use the pose returned by MLCVCameraGetFramePose, without a call to xrLocateSpace, is that it? MLPerception poses and OpenXR locations are thus in the same coordinate space, with the same world origin?

Regarding frame timestamps in MLCameraResultExtras::vcam_timestamp (expressed in µs according to the Unity API documentation, nothing stated for the Native API but I imagine the same), they seem to be quite different from the OpenXR runtime ones. So, to synchronize between them, I'm recording the system monotonic clock when a new frame arrives and converting the timespec time with xrConvertTimespecTimeToTimeKHR from the XR_KHR_convert_timespec_time extension. Is this the way to interop between MLTime and XrTime timestamps, to further use them with xrLocateSpace and/or MLCVCameraGetFramePose?
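
For reference, here is roughly what I'm doing today (the extension function is retrieved via xrGetInstanceProcAddr; error handling omitted):

  // XR_KHR_convert_timespec_time function, loaded once after instance creation.
  PFN_xrConvertTimespecTimeToTimeKHR pfnConvertTimespecTimeToTimeKHR = nullptr;
  xrGetInstanceProcAddr(instance, "xrConvertTimespecTimeToTimeKHR",
                        reinterpret_cast<PFN_xrVoidFunction*>(&pfnConvertTimespecTimeToTimeKHR));

  // Recorded when the camera frame callback fires (not at exposure time).
  timespec frame_time{};
  clock_gettime(CLOCK_MONOTONIC, &frame_time);

  XrTime frame_xrtime = 0;
  pfnConvertTimespecTimeToTimeKHR(instance, &frame_time, &frame_xrtime);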

I was able to track down some additional information regarding this:

You will need to use a combination of the XR_KHR_convert_timespec_time OpenXR extension and the ml_time.h time conversion functions.

I'm recording the system monotonic clock when a new frame arrives

Instead you should only be using the timestamp from the camera metadata.

To go from XrTime to MLTime:
xrConvertTimeToTimespecTimeKHR

MLTime to XrTime:
MLTimeConvertMLTimeToSystemTime


Also, I recommend using the Android NDK Camera API instead of MLCamera. The timestamp from the Android NDK Camera API is returned in microseconds, so it will need to be converted to nanoseconds by hand.

  static void OnCaptureCompleted(void *context, ACameraCaptureSession *session, ACaptureRequest *request, const ACameraMetadata *result) {
    if (context == nullptr) {
      return;
    }
    // ML_CONTROL_CAMERA_MLTIME_TIMESTAMPS exposes the frame's MLTime timestamps
    // through the NDK capture-result metadata.
    ACameraMetadata_const_entry entry;
    if (ACameraMetadata_getConstEntry(result, ML_CONTROL_CAMERA_MLTIME_TIMESTAMPS, &entry) != ACAMERA_OK) {
      return;
    }
    // Index 1 holds the exposure timestamp. It is reported in microseconds
    // (on pre-1.6.0 OS builds), so convert it to nanoseconds before using it as an MLTime.
    int64_t exposure_timestamp_us = entry.data.i64[1];
    int64_t exposure_timestamp_ns = exposure_timestamp_us * 1000;
  }

If you use MLCamera, MLCameraResultExtras::vcam_timestamp is in ns, as it's already an MLTime.


If you want to locate the camera in some space that makes sense in OpenXR, follow the steps below. You will need to enable the XR_MSFT_unbounded_reference_space extension, and you should replace <referential for camera> with your target space:

  1. Receive the camera frame callback
  2. Pass the frame timestamp in nanoseconds (MLTime) to MLCVCameraGetFramePose
  3. Convert that timestamp from MLTime to XrTime as described above
  4. Call xrLocateSpace with:
    .baseSpace = <referential for camera>,
    .space = MSFT_UnboundedReferenceSpace,
    .time = timestamp (converted to XrTime)
  5. Multiply <result of xrLocateSpace> * <result of MLCVCameraGetFramePose>; that gives you base_space_T_vcam (see the sketch below)
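
A minimal sketch of the whole sequence, assuming the MLCVCamera/head-tracking handles, the UNBOUNDED space, the target base space and the converted timestamps already exist (exact enum/struct names per ml_cv_camera.h; the pose composition is written out by hand instead of using a math library):

  // Step 2: camera pose in ML world-origin coordinates (world_T_vcam).
  MLTransform world_T_vcam = {};
  if (MLCVCameraGetFramePose(cv_camera_handle, head_handle, MLCVCameraID_ColorCamera,
                             frame_mltime, &world_T_vcam) != MLResult_Ok) {
    return;
  }

  // Step 4: world origin (UNBOUNDED space) expressed in the chosen base space (base_T_world).
  XrSpaceLocation loc{XR_TYPE_SPACE_LOCATION};
  if (XR_FAILED(xrLocateSpace(unbounded_space, base_space, frame_xrtime, &loc)) ||
      !(loc.locationFlags & XR_SPACE_LOCATION_POSITION_VALID_BIT) ||
      !(loc.locationFlags & XR_SPACE_LOCATION_ORIENTATION_VALID_BIT)) {
    return;
  }

  // Step 5: base_T_vcam = base_T_world * world_T_vcam, i.e.
  //   q = q_a * q_b   and   p = p_a + rotate(q_a, p_b)
  const XrQuaternionf qa = loc.pose.orientation;
  const MLQuaternionf qb = world_T_vcam.rotation;
  const XrQuaternionf q = {qa.w * qb.x + qa.x * qb.w + qa.y * qb.z - qa.z * qb.y,
                           qa.w * qb.y - qa.x * qb.z + qa.y * qb.w + qa.z * qb.x,
                           qa.w * qb.z + qa.x * qb.y - qa.y * qb.x + qa.z * qb.w,
                           qa.w * qb.w - qa.x * qb.x - qa.y * qb.y - qa.z * qb.z};
  const MLVec3f pb = world_T_vcam.position;
  // rotate(qa, pb) = pb + 2 * cross(qa.xyz, cross(qa.xyz, pb) + qa.w * pb)
  const XrVector3f t = {qa.y * pb.z - qa.z * pb.y + qa.w * pb.x,
                        qa.z * pb.x - qa.x * pb.z + qa.w * pb.y,
                        qa.x * pb.y - qa.y * pb.x + qa.w * pb.z};
  const XrPosef base_T_vcam = {
      q,
      {loc.pose.position.x + pb.x + 2.0f * (qa.y * t.z - qa.z * t.y),
       loc.pose.position.y + pb.y + 2.0f * (qa.z * t.x - qa.x * t.z),
       loc.pose.position.z + pb.z + 2.0f * (qa.x * t.y - qa.y * t.x)}};  // camera pose in the base space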

To go from XrTime to MLTime:
xrConvertTimeToTimespecTimeKHR

MLTime to XrTime:
MLTimeConvertMLTimeToSystemTime

As I can't find any reference to XrTime in the above links (only MLTime and system time), I'm not sure I got your point correctly. What I did:

  • retrieve the camera MLTime timestamp to pass to MLCVCameraGetFramePose;
  • convert this MLTime timestamp to timespec time with MLTimeConvertMLTimeToSystemTime;
  • convert the resulting timespec time to XrTime with xrConvertTimespecTimeToTimeKHR to pass to xrLocateSpace.
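
In code, that chain looks roughly like this (error handling omitted; `extras` is the MLCameraResultExtras from the camera callback, `instance` is my XrInstance, and the exact ml_time.h signatures are assumed):

  // Camera frame timestamp as MLTime (nanoseconds).
  MLTime frame_mltime = extras->vcam_timestamp;

  // MLTime -> timespec (ml_time.h).
  timespec frame_ts{};
  MLTimeConvertMLTimeToSystemTime(frame_mltime, &frame_ts);

  // timespec -> XrTime (XR_KHR_convert_timespec_time, loaded via xrGetInstanceProcAddr).
  XrTime frame_xrtime = 0;
  xrConvertTimespecTimeToTimeKHR(instance, &frame_ts, &frame_xrtime);

  // frame_mltime is passed to MLCVCameraGetFramePose, frame_xrtime to xrLocateSpace.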

Is this correct?

In the following logs, I'm recording the original camera MLTime ("MLTime =" lines), its conversion to timespec ("timespec =" lines) and to XrTime ("XrTime =" lines), as well as the locations returned by MLCVCameraGetFramePose ("MLCVCameraGetFramePose =" lines) and xrLocateSpace ("xrLocateSpace =" lines). However, I'm occasionally getting the following exception from MLCVCameraGetFramePose:

2023-12-18 15:00:27.058  1031-2560  ml_camera_client        com.mycompany.myapp    D  Ready for raw video frame callback
2023-12-18 15:00:27.058  1031-2560  utility                 com.mycompany.myapp    I  MLTime = 1050517926488000
2023-12-18 15:00:27.058  1031-2560  utility                 com.mycompany.myapp    I  MLCVCameraGetFramePose: orientation = (-0.250627, 0.489080, 0.121367, 0.826594), position = (-0.214779, 0.067654, 0.024862)
2023-12-18 15:00:27.058  1031-2560  utility                 com.mycompany.myapp    I  timespec = 1050473273482136
2023-12-18 15:00:27.058  1031-2560  utility                 com.mycompany.myapp    I  XrTime = 1050473273482136
2023-12-18 15:00:27.058  1031-2560  utility                 com.mycompany.myapp    I  xrLocateSpace: orientation = (-0.245247, 0.491186, 0.121783, 0.826897), position = (-0.173523, 0.066953, 0.041910)
2023-12-18 15:00:27.266  1031-2560  ml_camera_client        com.mycompany.myapp    D   released video buffer 558 timestamp = 1050473323618816
2023-12-18 15:00:27.269  1031-1049  ml_camera_client        com.mycompany.myapp    D   got video buffer 559 timestamp = 1050473349627761
2023-12-18 15:00:27.269  1031-1054  ml_camera_client        com.mycompany.myapp    D  capturecompletecount = 562 timestamp = 1050473806111016
2023-12-18 15:00:27.270  1031-1054  ml_camera_client        com.mycompany.myapp    D  capture complete for 1050473806111016
2023-12-18 15:00:27.273  1031-2563  ml_camera_client        com.mycompany.myapp    D  Ready for raw video frame callback
2023-12-18 15:00:27.273  1031-2563  utility                 com.mycompany.myapp    I  MLTime = 1050517959812000
2023-12-18 15:00:29.230  1031-2563  com.mycompany.myapp     com.mycompany.myapp    E  leapcore/frameworks/perception/data_sources/include/pad/xpad_data_source.h(107) GetClosestTimestampedData():
                                                                                                    ERR:  exception: Data Not Found for timestamp: 1050517959812us
2023-12-18 15:00:29.239  1031-2563  com.mycompany.myapp     com.mycompany.myapp    E  leapcore/frameworks/perception/data_sources/include/pad/xpad_data_source.h(107) GetClosestTimestampedData():
                                                                                                    ERR: [stack trace begin]
                                                                                                    #00 pc 0000000000020432  /system/lib64/libml_perception_head_tracking.so (_ZNK2ml10perception11datasources14XPadDataSourceINS_3pil2ss17world_pose_resultEE25GetClosestTimestampedDataERKNSt3__16chrono8durationIxNS7_5ratioILl1ELl1000000000EEEEE+3026)
                                                                                                    #01 pc 0000000000017942  /system/lib64/libml_perception_head_tracking.so (_ZNK2ml10perception10components13head_tracking12HeadTracking7GetPoseERKNSt3__16chrono8durationIxNS4_5ratioILl1ELl1000000000EEEEE+834)
                                                                                                    #02 pc 000000000002cbe0  /system/lib64/libperception.magicleap.so (_ZZ22MLCVCameraGetFramePoseENK3$_2clEv+368)
                                                                                                    #03 pc 000000000002c70e  /system/lib64/libperception.magicleap.so (MLCVCameraGetFramePose+222)

From the native API documentation (CV Camera | MagicLeap Developer Documentation): "The camera tracker only caches a limited set of poses. Retrieve the camera pose as soon as the timestamp is available else the API may return MLResult_PoseNotFound." Nowhere is it stated that an exception can occur. Should I guard against it?

Similarly, it sometimes happens in xrLocateSpace:

2023-12-18 15:58:17.115 25492-25844 ml_camera_client        com.mycompany.myapp    D  Ready for raw video frame callback
2023-12-18 15:58:17.115 25492-25844 utility                 com.mycompany.myapp    I  MLTime = 1053987885382000
2023-12-18 15:58:17.115 25492-25844 utility                 com.mycompany.myapp    I  timespec = 1053943138692888
2023-12-18 15:58:17.115 25492-25844 utility                 com.mycompany.myapp    I  XrTime = 1053943138692888
2023-12-18 15:58:17.118 25492-25844 com.mycompany.myapp     com.mycompany.myapp    E  leapcore/frameworks/perception/data_sources/include/pad/xpad_data_source.h(107) GetClosestTimestampedData():
                                                                                                    ERR:  exception: Data Not Found for timestamp: 1053987885260us
2023-12-18 15:58:17.135 25492-25844 com.mycompany.myapp     com.mycompany.myapp    E  leapcore/frameworks/perception/data_sources/include/pad/xpad_data_source.h(107) GetClosestTimestampedData():
                                                                                                    ERR: [stack trace begin]
                                                                                                    #00 pc 0000000000020432  /system/lib64/libml_perception_head_tracking.so (_ZNK2ml10perception11datasources14XPadDataSourceINS_3pil2ss17world_pose_resultEE25GetClosestTimestampedDataERKNSt3__16chrono8durationIxNS7_5ratioILl1ELl1000000000EEEEE+3026)
                                                                                                    #01 pc 0000000000017942  /system/lib64/libml_perception_head_tracking.so (_ZNK2ml10perception10components13head_tracking12HeadTracking7GetPoseERKNSt3__16chrono8durationIxNS4_5ratioILl1ELl1000000000EEEEE+834)
                                                                                                    #02 pc 000000000001a229  /system/lib64/libml_perception_head_tracking.so (_ZNSt3__110__function6__funcIZNK2ml10perception10components13head_tracking8internal4$_12clERNS5_12HeadTrackingEEUlNS_6chrono8durationIxNS_5ratioILl1ELl1000000000EEEEEE_NS_9allocatorISF_EEFNS3_10pose_query15NodeQueryResultI9MLPoseExtEESE_EEclEOSE_+41)
                                                                                                    #03 pc 0000000000011f4f  /system/lib64/libml_perception_session.so (_ZNK2ml10perception10pose_query9PoseQueryINS_2IdE9MLPoseExtKNS1_19ml_pose_identity_fnMUlRKS3_E_EKNS1_18ml_pose_combine_fnMUlRKS4_SA_E_EKNS1_17ml_pose_invert_fnMUlSA_E_EE7GetPoseES6_S6_NSt3__16chrono8durationIxNSG_5ratioILl1ELl1000000000EEEEE+431)
                                                                                                    #04 pc 000000000000e6e1  /system/lib64/libml_perception_session.so (_ZNK2ml10perception10components7session7Session7GetPoseERKNS_2IdES6_NSt3__16chrono8durationIxNS7_5ratioILl1ELl1000000000EEEEE+273)
                                                                                                    #05 pc 00000000001a2882  /system/lib64/libopenxr_runtime.magicleap.so (_ZN2ml3oxr5space17OXR_xrLocateSpaceINS_8PlatformILNS_14TargetPlatformE2EEEEE8XrResultP9XrSpace_TS8_lP15XrSpaceLocation+834)
                                                                                                    #06 pc 00000000000b67dd  /data/app/com.mycompany.myapp-8QHNA-tW_b-zWAd-Mx0vWQ==/base.apk!libopenxr_loaderd.so (offset 0x42d000) (xrLocateSpace+173)

Any idea what's going wrong?

@kbabilinski,

It looks like the exception thrown in GetClosestTimeStampData() that I reported in my previous post was already seen before [1].

Based on your comments there: my app runs the camera at 30 FPS, not 60, so that's probably not the reason for too-old timestamps. However, having to use both the Perception system and the OpenXR runtime to locate the camera, by multiplying the pose returned by xrLocateSpace with the one returned by MLCVCameraGetFramePose as you detailed previously, adds noticeable resource usage. I wish I could use OpenXR on its own (my initial objective), but it seems that interop with the ML SDK is needed for what I'm trying to achieve.

So for now, the only way I've found to accurately locate the camera without triggering the exception in GetClosestTimeStampData() is to record the system monotonic clock in the OnVideoAvailable callback (rather than using the camera timestamp in MLCameraResultExtras::vcam_timestamp), convert it to MLTime with MLTimeConvertSystemTimeToMLTime, and pass it to MLCVCameraGetFramePose to get the camera pose, without relying on OpenXR's xrLocateSpace at all, which is suboptimal. This kind of works: the returned camera locations are accurate, but the location updates are really laggy. I know this may come from the timestamps being recorded in the OnVideoAvailable callback (i.e. too new) rather than at exposure time as in MLCameraResultExtras::vcam_timestamp, but it's the only solution I've found so far. And I'm pretty sure this is not the right solution...
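
Concretely, that workaround looks roughly like this (error handling omitted; handles created elsewhere; exact ml_time.h/ml_cv_camera.h signatures assumed):

  // In the OnVideoAvailable callback: use "now" instead of the frame's exposure time.
  timespec now{};
  clock_gettime(CLOCK_MONOTONIC, &now);

  MLTime now_mltime = 0;
  MLTimeConvertSystemTimeToMLTime(&now, &now_mltime);

  MLTransform world_T_vcam = {};
  MLCVCameraGetFramePose(cv_camera_handle, head_handle, MLCVCameraID_ColorCamera,
                         now_mltime, &world_T_vcam);  // accurate but laggy, as described above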

[1] Crash on GetClosestTimeStampData()

Yes that is correct.

In the following logs, I'm recording the original camera MLTime ("MLTime =" lines), its conversion to timespec ("timespec =" lines) and to XrTime ("XrTime =" lines), as well as the locations returned by MLCVCameraGetFramePose and xrLocateSpace. However, I'm occasionally getting the following exception from MLCVCameraGetFramePose

We maintain a buffer of 500 ms worth of historical head poses. If more than 500 ms elapse between when the camera capture was taken and when you query for the pose, that error will happen.

The exception that you are getting should be caught and handled by our API, which should then return MLResult_PoseNotFound. However, we are actively tracking a bug that prevents this from working properly. While we work on resolving the issue, I recommend guarding the calling code so it doesn't query the head pose data source if the timestamp is older than 500 ms and thus guaranteed to fail and throw. This avoids the performance impact of the exception, while the app still sees the same API behavior of MLResult_PoseNotFound being returned.
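
A rough sketch of such a guard, assuming the frame timestamp is already an MLTime in nanoseconds (exact ml_time.h signatures assumed):

  // Skip pose queries for frames older than the ~500 ms head-pose history buffer.
  constexpr int64_t kPoseHistoryNs = 500LL * 1000 * 1000;  // 500 ms in nanoseconds

  timespec now_ts{};
  clock_gettime(CLOCK_MONOTONIC, &now_ts);
  MLTime now_mltime = 0;
  MLTimeConvertSystemTimeToMLTime(&now_ts, &now_mltime);

  if (now_mltime - frame_mltime > kPoseHistoryNs) {
    // Too old: the query would fail (and currently throw) inside the runtime.
    // Treat it the same as MLResult_PoseNotFound and drop this frame.
    return;
  }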

Regarding the last point: we are currently working on an OpenXR way of locating the camera. Locating the camera via OpenXR using the "now" time will result in inaccurate camera poses, so I recommend calling MLCVCameraGetFramePose and xrLocateSpace immediately upon receiving the camera frame callback.

I've switched from the MLCamera module to my own Android NDK Camera2-based implementation, using ML_CONTROL_CAMERA_MLTIME_TIMESTAMPS as the frame timestamp, as you explained earlier. Many thanks, this did the trick: I'm no longer hitting the crash in GetClosestTimeStampData() due to too-old timestamps, and I can accurately locate the camera with MLCVCameraGetFramePose alone.

I'm not getting this point, though. Aren't both MLCVCameraGetFramePose and xrLocateSpace supposed to return the from-camera-to-world transform? Each in its own world origin and coordinate system, I agree. But why does multiplying the results of the two functions yield the final OpenXR camera location?

Happy to hear that you were able to resolve the GetClosestTimeStampData issue.

Regarding xrLocateSpace: MLCVCameraGetFramePose gives you the position of the camera in the world origin coordinate system, which is not a concept in OpenXR. However, it happens to align with our implementation of XR_MSFT_unbounded_reference_space, so we take advantage of that to let the user express the position of the camera in any OpenXR space they choose. Note: OpenXR's LOCAL reference space does not have the same origin as the world origin in our CAPI.

Sorry, I'm still puzzled :stuck_out_tongue:

In our app, the ML2 physical camera is located with a VIEW (not LOCAL) reference space, using XR_MSFT_unbounded_reference_space as the base reference space.

If MLCVCameraGetFramePose gives the position of the ML2 physical camera in a world origin coordinate system that happens to align with XR_MSFT_unbounded_reference_space, the returned pose is thus the "same" as above, the only difference being that xrLocateSpace actually gives the location of the device's left eye (IIRC) whereas MLCVCameraGetFramePose gives the location of the camera. Is this correct?

I can then understand that multiplying the pose returned by xrLocateSpace with the one returned by MLCVCameraGetFramePose (or rather the inverse pose, as suggested by a colleague) will give us the transform between the camera and the device's left eye, but I don't get its use here. Unless it's for computing the camera <-> left eye transform once and applying it each time to the pose returned by xrLocateSpace, to obtain the camera pose rather than the left eye pose. Is this what you had in mind? Is a single call to MLCVCameraGetFramePose, to compute this camera <-> left eye transform once and for all, sufficiently accurate?

Not sure if I understand but let me try to clarify.

If MLCVCameraGetFramePose gives the position of ML2 physical camera in world origin coordinate system that happens to align with XR_MSFT_unbounded_reference_space, the returned pose is thus the "same" as above, the only difference being that xrLocateSpace actually gives the location of the device left eye (IIRC) whereas MLCVCameraGetFramePose gives the location of the camera. Is this correct?

If you are calling xrLocateSpace(.base_space = view, .space = XR_MSFT_unbounded_reference_space, ...), this will give you a pose relative to the average of the left and right display referentials, which ends up being somewhere between the two displays and a bit ahead of the user's eyes (not the left eye).

I can then understand that multiplying the pose returned by xrLocateSpace with the one returned by MLCVCameraGetFramePose (or rather the inverse pose) will give us the transform between the camera and the device's left eye, but I don't get its use here. Is it for computing the camera <-> left eye transform once and applying it each time to the pose returned by xrLocateSpace, to obtain the camera pose rather than the left eye pose? Is this what you had in mind? Is a single call to MLCVCameraGetFramePose, to compute this camera <-> left eye transform once and for all, sufficiently accurate?

I recommend calling both every frame, because MLCVCameraGetFramePose gives you the RGB camera position in the world origin coordinate system (the RGB camera moves relative to the world origin as the headset moves through physical space), and xrLocateSpace(.base_space = view, .space = XR_MSFT_unbounded_reference_space) gives the position of the world origin in the view coordinate system (the view moves relative to the world origin as the headset moves around). Combining the two gives you the RGB camera relative to the view.
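
In transform notation, this is the same composition as the numbered steps earlier (with unbounded acting as the ML world origin):

  world_T_rgb  = MLCVCameraGetFramePose(...)                                 // rgb camera in world origin
  view_T_world = xrLocateSpace(.space = unbounded, .base_space = view, ...)  // world origin in view
  view_T_rgb   = view_T_world * world_T_rgb                                  // rgb camera in view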

This will become much easier in an upcoming version of the Magic Leap OpenXR API that includes support for pixel sensors.

With the release of Magic Leap SDK v1.6.0, MLHeadTrackingCreate and MLHeadTrackingDestroy are now flagged as deprecated. They were used to create/destroy the head tracker passed to the MLCVCameraGetFramePose function (second argument: MLHandle head_handle). Indeed, as discussed above, just calling xrLocateSpace doesn't give an accurate RGB camera location. So how should I deal with the deprecation?

I had a look at the newly introduced XR_ML_pixel_sensor extension you're referring to, but couldn't find how it relates to RGB camera location. Is it through the creation of an OpenXR space for the RGB camera via xrCreatePixelSensorSpaceML, and then simply locating the RGB camera with a direct call to xrLocateSpace(worldSpace, rgbCameraSpace)? Or will I still have to compose the RGB camera location with the headset location, as with MLCVCameraGetFramePose?

Good question, I have reached out to the team to get more information about this. In the meantime note that the previous method will continue to work.

I just confirmed that you are correct about the overall mechanics of getting the camera pose using the OpenXR API.

Thanks for checking/confirming. Is there some documentation besides what can be found in the XR_ML_pixel_sensor.h add-on header? For example, the XrPixelSensorCreateSpaceInfoML structure passed to the xrCreatePixelSensorSpaceML function expects an offset. How/where do I know this offset? Is creating a pixel sensor space sufficient for locating it, or do I also have to configure and start the sensor with xrConfigurePixelSensorAsyncML and xrStartPixelSensorAsyncML/xrStopPixelSensorAsyncML before it becomes locatable? In the context of the ML2 camera, won't this xrStartPixelSensorAsyncML/xrStopPixelSensorAsyncML conflict with the NDK Camera2 API for retrieving the camera frames?

How/where do I know this offset?

This is just an OpenXR thing that allows you to pass in an (arbitrary, app-chosen) offset when creating a new XrSpace. In this case I don't see any use for it, since it's almost always going to be identity. Let me discuss with the team to make sure that I'm not missing anything, and then remove it.

Is creating a pixel sensor space sufficient for locating it, or do I also have to configure and start the sensor with xrConfigurePixelSensorAsyncML and xrStartPixelSensorAsyncML/xrStopPixelSensorAsyncML before it becomes locatable?

Configuring/starting the sensor is not a precondition to creating an XrSpace for that sensor. All you need to do is successfully connect to it, and then it should be locatable.

won’t this xrStartPixelSensorAsyncML/xrStopPixelSensorAsyncML conflict with NDK Camera2 API for retrieving the camera frames?

Currently we do not support getting frames from the RGB camera via OpenXR, so you will need to use the NDK APIs.

I took some time to implement a frame locator leveraging the OpenXR XR_ML_pixel_sensor extension rather than the ML SDK's MLCVCameraGetFramePose.

It's working, but I once again have the offset I was describing at the very beginning of this thread.

When creating the sensor, I've xrStringToPath'ed "/pixelsensor/picture/center" as the .sensor member of the XrPixelSensorCreateInfoML struct. I think this is the correct one for the ML2 physical camera, as the other available sensors relate to the depth, eye and world cameras.

When creating the sensor space, I've only set the w component of the .offset.orientation member of the XrPixelSensorCreateSpaceInfoML struct to 1.0; that's all, no additional transform here.

So basically, xrLocateSpace(space = sensor_reference_space, baseSpace = XR_MSFT_unbounded_reference_space) gives the (visually, at least) same result as xrLocateSpace(space = VIEW_reference_space, baseSpace = XR_MSFT_unbounded_reference_space), and is not as good/accurate as the pose returned by MLCVCameraGetFramePose.

Am I missing something?

XR_REFERENCE_SPACE_TYPE_VIEW maps to the rig display id in our OpenXR implementation. This is the average of the locations of the two displays. That location is fairly close to the location of the video camera (the video camera is slightly to the left of and above the center of the two displays), so I would expect the two calls to xrLocateSpace with the space's and baseSpace's you indicated to visually appear like they're giving the user a similar pose. But they should definitely be different poses.

One thing that we forgot to mention in the release notes is that the 1.6.0 OS returns the timestamps in nanoseconds instead of microseconds, so the manual time conversion does not have to be performed.
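
Relative to the earlier OnCaptureCompleted sketch, that means the `* 1000` conversion only applies to pre-1.6.0 OS builds, roughly like this (the os_reports_nanoseconds flag is just a placeholder for however you detect the OS version):

  const int64_t raw = entry.data.i64[1];
  // Pre-1.6.0 OS: value is in microseconds. 1.6.0+: already in nanoseconds.
  const int64_t exposure_timestamp_ns = os_reports_nanoseconds ? raw : raw * 1000;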