Best/Quickest approach to create fiducial marker tracking app with OpenCV?

Hi, I'm brand new to Magic Leap development and just got the ML2 two days ago. I'm looking for advice on the best approach to take for my app, such as using Unity vs. the Native API via Android Studio.

95% of my programming experience is with C++ and Qt in Visual Studio, and we would be pairing the Magic Leap with an existing Qt application and code base running on a separate Windows 10/11 tablet.

The primary use of the ML2 would be using the camera's video feed to track fiducial markers. We've previously had a lot of success and high accuracy with OpenCV's ArUco codes and would like to use the same on the ML2.
So the workflow would be:

  1. Capture 4k video feed on ML2
  2. Run OpenCV's ArUco detectMarkers function to detect all markers in the scene and their corner positions
  3. Send the marker positions to the Windows machine over UDP/TCP (see the sketch after this list)
  4. Run pose estimation on the marker positions and perform other processing using about a dozen C++ libraries and existing code that would be impractical to port or run on the headset
  5. Send computation results back to ML2 over UDP/TCP for basic HUD/display information
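
To make step 3 concrete, I'm picturing something small like the sketch below (illustrative only; the packet layout, port, and IP are placeholders we haven't settled on): pack each detected marker's ID plus its four pixel corners into one UDP datagram.

  // Illustrative sketch only: [int32 id][4 x (float32 x, float32 y)] per marker, one datagram per frame.
  #include <arpa/inet.h>
  #include <sys/socket.h>
  #include <cstdint>
  #include <vector>
  #include <opencv2/core.hpp>

  void SendMarkersUdp(int sock, const sockaddr_in &dest,
                      const std::vector<int> &markerIds,
                      const std::vector<std::vector<cv::Point2f>> &markerCorners)
  {
      std::vector<uint8_t> payload;
      for (size_t i = 0; i < markerIds.size(); i++)
      {
          int32_t id = static_cast<int32_t>(markerIds[i]);
          const uint8_t *p = reinterpret_cast<const uint8_t *>(&id);
          payload.insert(payload.end(), p, p + sizeof(id));
          for (const cv::Point2f &c : markerCorners[i])
          {
              float xy[2] = {c.x, c.y};
              const uint8_t *q = reinterpret_cast<const uint8_t *>(xy);
              payload.insert(payload.end(), q, q + sizeof(xy));
          }
      }
      sendto(sock, payload.data(), payload.size(), 0,
             reinterpret_cast<const sockaddr *>(&dest), sizeof(dest));
  }

  // One-time setup (placeholder address/port for the Windows tablet):
  // int sock = socket(AF_INET, SOCK_DGRAM, 0);
  // sockaddr_in dest{};
  // dest.sin_family = AF_INET;
  // dest.sin_port = htons(5005);
  // inet_pton(AF_INET, "192.168.1.50", &dest.sin_addr);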

I've only dabbled in Android Studio, have never used it for native C/C++ projects, and have no experience with Unity, so I'm hoping someone more experienced can suggest which approach would be best for this type of application.

I've also had a lot of trouble trying to modify one of the ML2 sample Android Studio projects to build and deploy with OpenCV. I don't think I need any other third-party libraries, but OpenCV is a must for the initial image processing and corner detection, as Wi-Fi video streaming would be too impractical and too low quality for our purposes.

Thanks

Hi @alex.teepe,

While we have had developers successfully build OpenCV applications for the ML2, we do not currently have a public sample project with this implementation.
However, we do have a native fiducial marker tracking API, which you can find an example for in the Magic Leap Hub Package Manager.

Is it a hard requirement for you to use OpenCV, or do you just need marker tracking in general?

We also support Vuforia Engine in Unity, which has a marker tracking library, as well as our own Unity guides and examples for marker tracking.

We suggest using one of the existing marker tracking methods if you want to get started quickly and don't have to use OpenCV.

Hi @kvlasova, thanks for your response. OpenCV isn't a hard requirement for us; it was just the simplest and most robust library to start prototyping with, and mostly it gives us finer control over the image processing steps.

I spent most of the day testing the fiducial marker sample project and trying different combinations of camera settings on the custom profile to get maximum update rate and accuracy, but was still having trouble achieving the required accuracy.

Our application involves tracking stationary reference marker(s) at a maximum of 4 feet away, and a handheld tool with marker(s) affixed to it, averaging about 2 feet away from the user. Both the reference and tool markers are required to be in view at the same time. In addition, we need the positional accuracy of the markers to be < 1 mm of error.

On our laptop + OpenCV prototype tracking app (using a 1080p USB webcam), we realized we would need multiple markers on both the reference and the tool to get the desired accuracy, so our current setup uses 4 co-planar 35x35 mm ArUco markers on each. The X,Y accuracy is surprisingly good, but the depth dimension struggles (about +/- 1 mm of noise); with a 5-sample averaging window on the position, we get acceptable, usable results.
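
The averaging itself is nothing fancy; conceptually it's just a small sliding window like the sketch below (illustrative only, not our production code; the class name is made up).

  #include <deque>
  #include <opencv2/core.hpp>

  // Sliding-window average of the last N reported 3D positions.
  class PositionSmoother
  {
  public:
      explicit PositionSmoother(size_t window = 5) : mWindow(window) {}

      cv::Point3d Add(const cv::Point3d &sample)
      {
          mSamples.push_back(sample);
          if (mSamples.size() > mWindow)
              mSamples.pop_front();

          cv::Point3d sum(0.0, 0.0, 0.0);
          for (const cv::Point3d &s : mSamples)
              sum += s;
          return sum * (1.0 / static_cast<double>(mSamples.size()));
      }

  private:
      size_t mWindow;
      std::deque<cv::Point3d> mSamples;
  };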

However, we only achieve this accuracy because we track 4 markers together as a single unit. Each corner on the 4 markers has a known 3D position on a mesh, and we feed each corner position and known 3D coordinate into solvePnP (OpenCV: Perspective-n-Point (PnP) pose computation) to get accurate results.
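
In rough terms, that step looks like the sketch below (illustrative only, not our exact code; EstimateRigPose and rigCorners are made-up names, and I'm assuming OpenCV's usual corner ordering of top-left, top-right, bottom-right, bottom-left).

  #include <array>
  #include <map>
  #include <vector>
  #include <opencv2/calib3d.hpp>
  #include <opencv2/core.hpp>

  // Pose of a 4-marker rig from all detected corners in one shot.
  // rigCorners maps marker ID -> its 4 corner positions in the rig's local frame (mm),
  // measured from the physical layout.
  bool EstimateRigPose(const std::vector<int> &ids,
                       const std::vector<std::vector<cv::Point2f>> &corners,
                       const std::map<int, std::array<cv::Point3f, 4>> &rigCorners,
                       const cv::Mat &cameraMatrix, const cv::Mat &distCoeffs,
                       cv::Vec3d &rvec, cv::Vec3d &tvec)
  {
      std::vector<cv::Point3f> objectPoints;
      std::vector<cv::Point2f> imagePoints;
      for (size_t i = 0; i < ids.size(); i++)
      {
          auto it = rigCorners.find(ids[i]);
          if (it == rigCorners.end())
              continue; // this marker isn't part of the rig
          for (int c = 0; c < 4; c++)
          {
              objectPoints.push_back(it->second[c]);
              imagePoints.push_back(corners[i][c]);
          }
      }
      if (objectPoints.size() < 4)
          return false; // not enough correspondences for a pose

      return cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs,
                          rvec, tvec, false, cv::SOLVEPNP_ITERATIVE);
  }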

The downside to the ML marker tracking API is that it seems to only track markers individually, rather than tracking a configuration of markers together as a single unit, and the single-marker accuracy isn't enough for our use case. Additionally, the Magic Leap seems to report the markers' global coordinates in the room/world rather than coordinates relative to the camera view, which I believe adds computational overhead and extra error when the headset moves around. Since we use a reference marker that acts as the "origin" in our scene, and only care about the relative transform between the reference marker(s) and the tool marker(s), we would only need the 2D image from the main 4k camera on the Magic Leap.
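
To be concrete about what we actually need: given camera-frame poses for the reference rig and the tool rig (e.g. from solvePnP as above), the relative transform is just the composition below, with no dependency on headset/world tracking (illustrative sketch; the names are placeholders).

  #include <opencv2/calib3d.hpp>
  #include <opencv2/core.hpp>

  // Computes R_rel, t_rel such that X_ref = R_rel * X_tool + t_rel,
  // given camera-frame poses (rvec/tvec) of the reference and tool rigs.
  void RelativePose(const cv::Vec3d &rvecRef, const cv::Vec3d &tvecRef,
                    const cv::Vec3d &rvecTool, const cv::Vec3d &tvecTool,
                    cv::Matx33d &R_rel, cv::Vec3d &t_rel)
  {
      cv::Matx33d R_ref, R_tool;
      cv::Rodrigues(rvecRef, R_ref);   // camera <- reference rotation
      cv::Rodrigues(rvecTool, R_tool); // camera <- tool rotation

      R_rel = R_ref.t() * R_tool;               // reference <- tool rotation
      t_rel = R_ref.t() * (tvecTool - tvecRef); // reference <- tool translation
  }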

Do you provide an API where we could simply get the 2D corner positions of each Aruco or April Tag marker in the main camera's video feed? That's all we would be using OpenCV for.

Thanks

Why not stream the ML2 video to your Windows machine?

@jaime.cisneros We've used the "Device Stream" option in the Magic Leap Hub, and the video compression really diminished the image quality, with a lot of artifacts, glitches, and dropped frames, and that was at only 1080p resolution. Our accuracy requirements call for the crisp native 4k video feed at 30 or 60 fps, which seems impossible to stream reliably over Wi-Fi.

Another possibility is OpenCV for Unity: OpenCV for Unity | Integration | Unity Asset Store. You can try the example Android application to see if it works on your device: Releases · EnoxSoftware/OpenCVForUnity · GitHub

If anyone's wondering, I did finally get OpenCV for Android compiling and running natively inside one of the samples. It took about a week of fighting nonstop with Android Studio, and Gradle in particular.

I'll post the steps I took to get it working here for anyone else who needs this (and for my future self if I forget)

The best instructions, and the only resource that worked for me, were from this repo. I loosely followed the "How to create the native OpenCV project from scratch" section, but started from one of the ML samples instead of a blank project and skipped the Java portion.

I used the latest OpenCV 4.7.0, downloading the Android package, which includes the ArUco module:
https://opencv.org/releases/

I used the Camera-Preview sample as the base project to start from, and modified its CMakeLists.txt to add the OpenCV include directories and library location.

[screenshot: CMakeLists.txt with the OpenCV include directory and library path added]

Edit the build.gradle file's externalNativeBuild -> cmake -> arguments to use -DANDROID_STL=c++_shared instead of -DANDROID_STL=c++_static. The project will compile even if you don't, but it will fail to load the library on launch, since the OpenCV libraries require the shared STL, not the static one.
[screenshot: build.gradle externalNativeBuild cmake arguments with -DANDROID_STL=c++_shared]

Then, for good measure, add all of the camera permissions and features to the AndroidManifest.xml:

<uses-permission android:name="android.permission.CAMERA"/>
<uses-feature android:name="android.hardware.camera"/>
<uses-feature android:name="android.hardware.camera.autofocus"/>
<uses-feature android:name="android.hardware.camera.front"/>
<uses-feature android:name="android.hardware.camera.front.autofocus"/>

And then the build error that had me stumped for days was

The server may not support the client's requested TLS protocol versions: (TLSv1.2, TLSv1.3). You may need to configure the client to allow other protocols to be used.

There were dozens of answers on how to fix this, but none of them worked for me until I finally edited the gradle.properties file and added the line
systemProp.https.protocols=TLSv1.2
to stop Gradle from trying to use TLSv1.3, which some hosts apparently don't accept and which was causing Gradle to inexplicably fail on the most basic steps because it was unable to resolve the hosts it downloads dependencies from.
[screenshot: gradle.properties with the TLS protocol line added]

And finally, I was able to edit the Camera-Preview sample to convert the camera's image data into a cv::Mat and run corner detection on it:

  static void OnVideoAvailable(const MLCameraOutput *output, const MLHandle metadata_handle,
                               const MLCameraResultExtras *extra, void *data)
  {
      CameraPreviewApp* pThis = reinterpret_cast<CameraPreviewApp*>(data);

      // edited camera settings to read 4k images at 30 fps
      auto frame = output->planes[0];

      // initialize cv::Mat using a pointer to the image buffer inside of frame.
      // (No Copying occurs)
      // (But data is stored as RGBA, while opencv thinks it's BGRA, though that doesn't matter for my purposes)
      cv::Mat mat = cv::Mat(frame.height, frame.width, CV_8UC4, frame.data);

      // optional initialization by copying data
      // (memcpy Takes about 5ms)
      //cv::Mat mat = cv::Mat(frame.height, frame.width, CV_8UC4);
      //memcpy(mat.data, frame.data, frame.size);

      // (if you explicitly convert from RGBA to BGRA, it takes about 15 ms on the Magic Leap 2)
      //cv::cvtColor(mat, mat, cv::COLOR_RGBA2BGRA);

      // (if you convert from RGBA to grayscale, it takes about 3 ms)
      cv::cvtColor(mat, mGrayscaleImage, cv::COLOR_RGBA2GRAY);


      // I run camera calibration on pre-collected images in background on startup,
      // and wait until calibration finishes running before doing aruco detection.
      if (pThis->mbCameraCalibrated)
      {
          std::vector<int> markerIds;
          std::vector<std::vector<cv::Point2f>> markerCorners, rejectedCandidates;
          ALOGI("attempting to find markers\n");

          // Detect marker(s) and corners
          // This takes around 50-80 ms on 4k images on Magic Leap 2
          pThis->mArucoDetector.detectMarkers(mGrayscaleImage, markerCorners, markerIds, rejectedCandidates);

          std::map<int, std::vector<cv::Point2f>> markers;
          for (int i = 0; i < (int)markerIds.size(); i++)
          {
              int markerID = markerIds[i];
              std::vector<cv::Point2f> corners = markerCorners[i];

              markers.emplace(markerID, corners);
              ALOGI("detected marker: %d", markerID);
          }

          // drawing markers only works on 1 or 3 component images, so remove alpha and add back after.
          cv::Mat tmp;
          cv::cvtColor(mat, tmp, cv::COLOR_RGBA2RGB);
          cv::aruco::drawDetectedMarkers(tmp, markerCorners, markerIds);
          cv::cvtColor(tmp, mat, cv::COLOR_RGB2RGBA);
      }

      if (!pThis->is_frame_available_) {
          memcpy(pThis->framebuffer_.data(),
                 mat.data, // was output->planes[0].data; display the cv image instead of the raw camera footage
                 output->planes[0].size);

          pThis->is_frame_available_ = true;
      } else {
          // When running with ZI, as the video needs to be transferred from device to host, lots of frame
          // dropping is expected. So don't flood the log with this message.
#ifdef ML_LUMIN
          ALOGW("%s() dropped a frame! This should never happen, apart from the app startup/teardown phase!", __func__);
#endif
      }
  }

Benchmark results on the Magic Leap 2:

  • Copying the ML2's 4k camera frame into a cv::Mat with memcpy: ~5 ms
  • cv::cvtColor() with cv::COLOR_RGBA2BGRA to convert the camera's RGBA to OpenCV's expected BGRA: ~15 ms (optional depending on your usage)
  • cv::cvtColor() with cv::COLOR_RGBA2GRAY to convert RGBA to grayscale: ~3 ms
  • cv::aruco::ArucoDetector::detectMarkers(): 50-80 ms depending on the frame

Based on detectMarkers() alone, the ML2 can only detect markers with OpenCV's ArUco module at around 12.5-20 fps at 4k, so I will likely need to run that part asynchronously on a separate thread. The accuracy of the detected corners looks spot on, though, so that's good.
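
For anyone curious, the threading I have in mind is roughly the sketch below (illustrative only, not what's running yet; the class name and dictionary choice are placeholders): the camera callback hands the latest grayscale frame to a single worker thread and never blocks on the 50-80 ms detectMarkers() call, dropping frames while detection is busy.

  #include <atomic>
  #include <condition_variable>
  #include <mutex>
  #include <thread>
  #include <vector>
  #include <opencv2/core.hpp>
  #include <opencv2/objdetect/aruco_detector.hpp>

  class AsyncArucoDetector
  {
  public:
      AsyncArucoDetector()
          : mDetector(cv::aruco::getPredefinedDictionary(cv::aruco::DICT_4X4_50)),
            mWorker([this] { Loop(); }) {}

      ~AsyncArucoDetector()
      {
          {
              std::lock_guard<std::mutex> lock(mMutex);
              mRunning = false;
          }
          mCondition.notify_one();
          mWorker.join();
      }

      // Called from the camera callback; cheap, never waits on detection.
      void SubmitFrame(const cv::Mat &gray)
      {
          std::lock_guard<std::mutex> lock(mMutex);
          gray.copyTo(mPending); // keep our own copy; the camera buffer gets reused
          mHasFrame = true;
          mCondition.notify_one();
      }

  private:
      void Loop()
      {
          while (mRunning)
          {
              cv::Mat frame;
              {
                  std::unique_lock<std::mutex> lock(mMutex);
                  mCondition.wait(lock, [this] { return mHasFrame || !mRunning; });
                  if (!mRunning)
                      break;
                  frame = mPending.clone();
                  mHasFrame = false;
              }

              std::vector<int> ids;
              std::vector<std::vector<cv::Point2f>> corners, rejected;
              mDetector.detectMarkers(frame, corners, ids, rejected);
              // ... hand ids/corners off to pose estimation or the network sender here ...
          }
      }

      cv::aruco::ArucoDetector mDetector;
      std::mutex mMutex;
      std::condition_variable mCondition;
      cv::Mat mPending;
      bool mHasFrame = false;
      std::atomic<bool> mRunning{true};
      std::thread mWorker;
  };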

Hope this helps anyone else wanting to use OpenCV or ArUco detection. It took me a really long time to get this working.
