What could we know about the frame camToWorld transformation accuracy, reliability, and FrameTimestamps?

For each frame there is a "camToWorld transformation" / "pose"; in the case of the CV camera it can be queried with an API call.

In the optimal case the pose is calculated for the same moment the frame was acquired.

In practice, how close is it to the optimal case?

Could we know how the pose is calculated or estimated?
E.g. is it determined every 1/x s and each frame gets the closest one, or is an estimate computed somehow?

Are there timestamps used for such an estimation? Can we access them? What type of synchronization is used for this?
How do speed and acceleration affect the accuracy and reliability?

About the frame timestamps
The Frame "MLTime FrameTimestamps" of each sensor/camera seems to be independent from eachother. They starting point are not the same and the frequency is also different.
How could we get common timestamps which would mean the same time on the sam bases/epoch for all sensor/camera we use?

Even the MLTime.ConvertMLTimeToSystemTime result does not seem to be such a common timestamp.

Could you clarify what you mean by this?

For synchronizing cameras, we recommend checking out this solution.

@etucker :
For the second point:

I tried to use the ConvertMLTimeToSystemTime method, but it seems to me that it does not return a common "system" time. Here are results from a recording:

    TimeStampSys     ElapsedTSSeconds FrameType        FrameNumber     
    191901473814     191,901473814    DepthShortRange  169             
    400564877422     200,282438711    WorldNormalRight 468             
    191865591241     191,865591241    CvVideo          205             
    400564877950     200,282438975    WorldNormalLeft  468             
    400564877810     200,282438905    WorldNormalCente 468             
    191941472628     191,941472628    DepthShortRange  170             
    191932267410     191,93226741     CvVideo          206             
    191981473928     191,981473928    DepthShortRange  171             
    191998928693     191,998928693    CvVideo          207             
    192021473486     192,021473486    DepthShortRange  172             
    400684877816     200,342438908    WorldNormalCente 471             
    400684877891     200,3424389455   WorldNormalRight 471             
    400684877825     200,3424389125   WorldNormalLeft  471             
    192061472488     192,061472488    DepthShortRange  173             
    192065604554     192,065604554    CvVideo          208             
    192101473610     192,10147361     DepthShortRange  174             
    400764876805     200,3824384025   WorldNormalLeft  473             
    400764879484     200,382439742    WorldNormalRight 473             
    400764876841     200,3824384205   WorldNormalCente 473             
    192141473692     192,141473692    DepthShortRange  175             
    192132263060     192,13226306     CvVideo          209             
    192181472653     192,181472653    DepthShortRange  176             
    192198930657     192,198930657    CvVideo          210             
    400884877325     200,4424386625   WorldNormalLeft  476             

The TimeStampSys values are calculated by calling ConvertMLTimeToSystemTime. Investigating an hour-long recording, from the timestamp differences between frames of the same type we see that the world camera's system-time tick is half a nanosecond, while the other cameras' system-time tick is 1 nanosecond.
Using this rule I calculated the second column, ElapsedTSSeconds.
From this we can see that there are 8-9 second differences between frame timestamps of different types that follow each other.
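For example, 400564877422 × 0.5 ns = 200.282438711 s for the WorldNormalRight frame above, while 191901473814 × 1 ns = 191.901473814 s for the neighbouring DepthShortRange frame, i.e. about 8.4 s apart even though these rows were logged right after each other.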

In the scene we recorded there was a stopwatch; with this we checked that the recording does not contain time shifts, so the differences come from the timestamps only.

Based on this we guess that the starting point (0 ns) can be different for each sensor.

So the question is:
Given the TimeStampSys values for each frame, how can we find, for one frame, the frames of all the other types that are closest to it in real time?

Can we calculate the real-time difference between timestamps if the frame types are different?

For your 1st question:
E.g. in the case of the CV camera it is documented that poses are cached and the frame timestamp is used for getting the pose.
But how?
The timestamp could be the key in the cache, or it could be mapped to some global time that is used to select the "nearest in time" pose, or to interpolate a pose between the previous and next poses using velocity, acceleration, etc.
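
For reference, this is roughly how we obtain the pose today (a minimal sketch; the class name is hypothetical, and we assume the MLCVCamera.GetFramePose overload that takes the frame's MLTime timestamp from ResultExtras — how the SDK resolves that timestamp internally is exactly what we are asking about):

    using UnityEngine;
    using UnityEngine.XR.MagicLeap;

    public class CvPoseExample : MonoBehaviour
    {
        // Sketch: inside the CV frame callback we pass the frame's MLTime timestamp
        // to MLCVCamera.GetFramePose. Whether the SDK uses it as a cache key, picks
        // the nearest cached pose, or interpolates between neighbouring poses is unclear to us.
        void OnCaptureRawVideoFrameAvailable(MLCamera.CameraOutput capturedFrame,
                                             MLCamera.ResultExtras resultExtras,
                                             MLCamera.Metadata metadataHandle)
        {
            MLResult result = MLCVCamera.GetFramePose(resultExtras.VCamTimestamp, out Matrix4x4 camToWorld);
            if (result.IsOk)
            {
                // camToWorld is the pose the SDK associates with this frame's timestamp.
            }
        }
    }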

Could you perhaps provide more details on your use case? Are you trying to synchronize all of the data from the different cameras? This will help us get you towards a practical solution. If you are not comfortable sharing here, feel free to reach out to us directly.

What frequency are you capturing the CV video at? You are correct, the pixel streams are not synchronized. Do you mind sharing information about your implementation? All sensor timestamps should be coming from the same clock.

Hi

It is not a must to get data from the cameras at the same moment (if that is what you mean by syncing them), but when processing them on a server we need to collect all the camera/sensor data that is closest to a given moment, or closest to each other.
Moreover, we need to process data coming from external cameras and sensors too.

So first we need to know which frame from e.g. the Depth camera is the closest in time to e.g. a given CV camera frame (see the sketch below).
Next we need to convert those timestamps to e.g. UTC ticks to get the closest-in-time information from external sources.
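
If all the timestamps were on a common base, the matching itself would just be a nearest-neighbour search; a minimal sketch (FrameMeta and the field names are hypothetical placeholders for what we store on the server):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical per-frame metadata as stored on the server side.
    public class FrameMeta
    {
        public long SysTimeStampNs;  // only comparable if all streams share one base/epoch
        public string FrameType;     // "CvVideo", "DepthShortRange", "WorldNormalCenter", ...
        public long FrameNumber;
    }

    public static class FrameMatcher
    {
        // Returns the candidate frame whose timestamp is closest to the reference frame.
        // This only gives a meaningful answer if both timestamp streams come from the
        // same clock and epoch - which is exactly what we cannot confirm today.
        public static FrameMeta FindClosest(FrameMeta reference, IEnumerable<FrameMeta> candidates)
        {
            return candidates
                .OrderBy(f => Math.Abs(f.SysTimeStampNs - reference.SysTimeStampNs))
                .FirstOrDefault();
        }
    }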

Let's look at a situation:

  • We capture CV video frames in CamOnly mode at 720p @ 15 FPS but process only 10 FPS, capture DepthShortRange @ 5 FPS, and WorldNormalCenter @ 10 FPS

  • A callback function is called by ML when a CV frame is ready for processing.

    OnCaptureRawVideoFrameAvailable(MLCamera.CameraOutput capturedFrame, MLCamera.ResultExtras resultExtras, MLCamera.Metadata metadataHandle)
    
    • this puts the frame into CVQ if it is needed to reach 10 FPS and there is free capacity
  • A GetDepthFrames task calls MLDepthCamera.GetLatestDepthData to get the latest frame, puts it into DepthQ and sleeps as much as necessary to reach the required FPS (see the sketch after this list).

  • A GetWorldFrames task calls worldCamera.GetLatestWorldCameraData to get the latest frame, puts it into WorldQ and sleeps as much as necessary to reach the required FPS.

  • A CV Compressor task reads CVQ, compresses the frame and sends it to WriterQ.

  • A Depth Compressor task reads DepthQ, compresses the frame and sends it to WriterQ.

  • A World Compressor task reads WorldQ, compresses the frame and sends it to WriterQ.

  • The Writer task reads WriterQ and sends frame data and metadata to the target, e.g. to a server.
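
As a rough illustration of one of these capture tasks (names are hypothetical; FetchLatestDepthFrame() stands in for the actual MLDepthCamera.GetLatestDepthData call, and the BlockingCollection plays the role of DepthQ, drained later by the Depth Compressor task):

    using System.Collections.Concurrent;
    using System.Diagnostics;
    using System.Threading;
    using System.Threading.Tasks;

    // Hypothetical frame container; in the real app this holds the depth data plus its timestamps.
    public class DepthFrame { public long Timestamp; public byte[] Data; }

    public class DepthCapture
    {
        // Plays the role of DepthQ; the Depth Compressor task reads from it.
        public readonly BlockingCollection<DepthFrame> DepthQ =
            new BlockingCollection<DepthFrame>(boundedCapacity: 8);

        // Hypothetical wrapper around the MLDepthCamera.GetLatestDepthData call.
        DepthFrame FetchLatestDepthFrame() => null;

        // Sketch of the GetDepthFrames task described above: poll, enqueue, sleep to hold the FPS.
        public Task RunGetDepthFrames(int targetFps = 5) => Task.Run(() =>
        {
            int framePeriodMs = 1000 / targetFps;
            while (!DepthQ.IsAddingCompleted)
            {
                var sw = Stopwatch.StartNew();

                DepthFrame frame = FetchLatestDepthFrame();
                if (frame != null)
                    DepthQ.TryAdd(frame);

                // Sleep only for the remainder of the frame period to approximate targetFps.
                int remaining = framePeriodMs - (int)sw.ElapsedMilliseconds;
                if (remaining > 0)
                    Thread.Sleep(remaining);
            }
        });
    }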

Currently the following timestamps are sent to the server for each frame:

  • FrameData.Timestamp
  • SysTimeStamp: MLTime.ConvertMLTimeToSystemTime(frameData.TimeStamp, out long sysTimeStamp);
  • UtcTicks (The moment when the CV callback function or timed World/Depth request received the FrameData.)

Based on the SysTimeStamps we see (as I sent earlier in this thread), it seems that the starting point of the clock is different for each frame type. So from this we cannot tell which DepthFrame system timestamp is closest to which CvVideo system timestamp in real-world time.

Thank you for that detailed information. I have reached out to the relevant team for guidance and will keep you posted as soon as I hear back.

Still working on this, and I will report back as soon as I have more information. We are investigating if there is a possible issue with ConvertMLTimeToSystemTime assuming a unified time step size, which could be causing the offset that you are seeing.

Thank you for your patience.


Would it be possible to get a sample of the script / project you are using to test with?

Hi,
I attached a minimized Unity app that gets the timestamps the same way as our full app and reproduces the same differences.

When opening it in Unity, if you are asked to enter safe mode and see 'protobuf' errors, remove the "mlcamtest\Library\PackageCache\com.e7.protobuf-unity@948ab23d09" directory and wait for a successful recompilation.

You may need to set the project target to Android, e.g. when the Magic Leap Project Setup Tool window opens.

The code is in \Assets\Vartid\Scripts\RecorderMenu.cs.
I also attach a screenshot showing the results. It can be seen that if we compare half of the World SysTs to the Depth SysTs, the difference is 8-9 seconds, and the World SysTs counts twice as fast as the Depth SysTs.
The CV SysTs is close to the Depth SysTs, but it would be good to know whether it really represents the same real time or is just numerically close to the depth value.

mlcamtest.zip (253.8 KB)


You may need to edit Packages/manifest.json and set the path for "com.magicleap.unitysdk" before opening the Unity project.
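
For example (the path is a placeholder; point it at wherever the Magic Leap Unity SDK package lives on your machine):

    {
      "dependencies": {
        "com.magicleap.unitysdk": "file:C:/path/to/com.magicleap.unitysdk"
      }
    }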

The protobuf error on opening the project is also solved if the protobuf entry is removed from this file before opening the project.