Implementing a YOLOv8 model in a Unity Magic Leap 2 project

I'm seeking to integrate object detection into an existing Unity application for Magic Leap 2. I'm currently using Barracuda as my machine learning library along with a custom-trained YOLOv8 ONNX model. While I've successfully conducted inference both in Unity and on the device, I'm facing challenges in interpreting the outputs.

Could anyone recommend a Unity-supported package for inferencing with YOLOv8? Alternatively, is there a more effective approach to implementing object detection using a trained YOLO model in ONNX format?

@sfernandez, I would greatly appreciate it if you could take a look at this topic. Thank you for your assistance and our past collaborations! :blush:

Does this help?

Have fun with it, let me know what you think.

I would recommend checking out Unity Sentis (Use AI models in Unity Runtime | Unity) for the ONNX format.

I tried using that solution, but I received a strange output. The values I'm extracting from the model are center, top-left, bottom-right, and confidence. All these values seem small, so I'm unsure how to convert them into bounding boxes. Additionally, the confidence values should be within the range of 0 to 1, but I'm getting values above 1.
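For reference (hedged, since it depends on how the model was exported): the usual Ultralytics YOLOv8 ONNX export emits a single tensor of shape [1, 4 + num_classes, 8400], where the first four channels are cx, cy, w, h in input-pixel units and the remaining channels are per-class scores already in [0, 1], with no separate objectness channel. If that is the layout here, indexing the first five values along the last axis would return the cx of five different anchors rather than cx, cy, w, h, confidence, which could explain pixel-scale numbers where a 0–1 confidence is expected. A minimal language-agnostic decoding sketch in Python/NumPy, assuming that layout (`decode_yolov8` and `conf_thresh` are illustrative names, not part of any library):

```python
import numpy as np

def decode_yolov8(output, conf_thresh=0.25):
    """Decode a raw YOLOv8 detection tensor.

    Assumes the common Ultralytics ONNX export layout
    [1, 4 + num_classes, 8400]: cx, cy, w, h in input pixels,
    followed by per-class scores already in [0, 1].
    """
    preds = output[0]                 # (4 + nc, n_anchors)
    boxes = preds[:4].T               # (n_anchors, 4) as cx, cy, w, h
    scores = preds[4:].T              # (n_anchors, nc)
    class_ids = scores.argmax(axis=1)
    confs = scores.max(axis=1)
    keep = confs >= conf_thresh       # drop low-confidence anchors
    cx, cy, w, h = boxes[keep].T
    x1, y1 = cx - w / 2, cy - h / 2   # top-left corner
    x2, y2 = cx + w / 2, cy + h / 2   # bottom-right corner
    return np.stack([x1, y1, x2, y2], axis=1), confs[keep], class_ids[keep]
```

Non-maximum suppression would still be needed afterwards to merge overlapping detections of the same object.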

image

Yeah, I am not quite the YOLO expert. It feels like you're putting the wrong values in, or reading the wrong values out of, a tensor. But that is more of a hunch. What does Netron display about the inputs and outputs?

These are the Netron results:
image

The data are read as a Barracuda.Tensor. This is the exact code for reading the data:

Center = new Vector2(tensorData[0, 0, 0, 0], tensorData[0, 0, 0, 1]);
Size = new Vector2(tensorData[0, 0, 0, 2], tensorData[0, 0, 0, 3]);
TopLeft = Center - Size / 2;
BottomRight = Center + Size / 2;
Confidence = tensorData[0, 0, 0, 4];

As input I pass the image texture at a resolution of 640x640, with preprocessing done basically the same way as in the example.
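For comparison, the typical YOLO input preparation is: resize to 640x640, scale pixel values from 0–255 to [0, 1], and reorder from HWC to NCHW. A language-agnostic sketch in Python/NumPy, using nearest-neighbor resizing for brevity (`preprocess` is an illustrative name; real pipelines usually use bilinear resizing and may letterbox to preserve aspect ratio):

```python
import numpy as np

def preprocess(image_hwc_uint8, size=640):
    """Typical YOLO input prep: resize to size x size (nearest-neighbor
    here for simplicity), scale to [0, 1], and reorder HWC -> NCHW."""
    h, w, _ = image_hwc_uint8.shape
    ys = np.arange(size) * h // size              # source row per output row
    xs = np.arange(size) * w // size              # source col per output col
    resized = image_hwc_uint8[ys][:, xs]          # (size, size, 3)
    chw = resized.astype(np.float32).transpose(2, 0, 1) / 255.0
    return chw[None]                              # (1, 3, size, size)
```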

Like I said, I am not a YOLO expert, and I have no idea whether you are actually using more of my code or not. Have you made sure the model is exported with opset 9?

In YOLOv8, the center and size are normalized by the resolution, so when converting to top-left or bottom-right coordinates, you have to multiply the center and size by the resolution.
Here is a MATLAB script I wrote to convert top-left/bottom-right to center/size; you can reverse it to get what you want, which is x1, y1, x2, y2:

%reso is the resolution, e.g. [640 480]
%box is the bounding box [topleftx, toplefty, width, height]
% x, y are the normalized center, w, h are normalized width and height
% x1, y1, x2, y2 are coords of topleft/bottomright (non normalized)
function [x,y,w,h] = convert2yolo(reso, box)
    dw = 1./reso(1);
    dh = 1./reso(2);
    w = box(3);
    h = box(4);
    x1 = box(1);
    y1 = box(2);
    x2 = x1 + w;
    y2 = y1 + h;
    x = (x1+x2)/2;
    y = (y1+y2)/2;
    x = x*dw;
    w = w*dw;
    y = y*dh;
    h = h*dh;
end
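Reversing the MATLAB script above is straightforward. A sketch of the inverse mapping in Python, for a normalized center/size box back to non-normalized top-left/bottom-right pixel coordinates (`yolo_to_corners` is an illustrative name):

```python
def yolo_to_corners(reso, x, y, w, h):
    """Inverse of convert2yolo above: map a normalized center (x, y)
    and normalized size (w, h) back to non-normalized top-left (x1, y1)
    and bottom-right (x2, y2). reso is (width, height), e.g. (640, 480)."""
    cx, cy = x * reso[0], y * reso[1]   # de-normalize center
    bw, bh = w * reso[0], h * reso[1]   # de-normalize size
    x1, y1 = cx - bw / 2, cy - bh / 2
    x2, y2 = cx + bw / 2, cy + bh / 2
    return x1, y1, x2, y2
```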

I managed to solve this issue by flipping the image pixels.
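A likely reason this works, for anyone hitting the same thing: Unity's Texture2D stores pixel row 0 at the bottom-left, while most inference pipelines expect row 0 at the top, so a vertical flip before inference lines the image up with what the model saw during training. A minimal NumPy sketch of the flip (`flip_vertically` is an illustrative name):

```python
import numpy as np

def flip_vertically(pixels_hwc):
    """Reverse the row order of an HWC pixel array, converting between
    bottom-left-origin (Unity Texture2D) and top-left-origin layouts."""
    return pixels_hwc[::-1].copy()
```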