We have been trying out a new iPad Pro with LiDAR and ARKit 3.5. We were keen to learn what kind of data ARKit can provide and what exactly has changed since the last version. I would like to share the results with you.
ARKit solves the mapping of virtual objects onto real-world surfaces via anchors. Every anchor carries information about its transform (position, orientation, and scale) in the virtual 3D world space. Using this information, we can render our entities at a well-fitting position, orientation, and scale, so when they are rendered over the video stream coming from the camera, they look like they are really there.
The first version only allowed developers to put objects on horizontal planes. Step by step, the API has been extended, and now (in the last major release at the time of writing, which is ARKit 3) we have four types of anchors: ARPlaneAnchor (vertical and horizontal planes), ARImageAnchor (pre-trained image), ARObjectAnchor (pre-trained 3D object), and ARFaceAnchor (human face).
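For illustration, this is roughly how a session delegate can tell the anchor types apart. It is a minimal sketch – the printed messages stand in for whatever handling your app actually needs:
{% c-block language="swift" %}
import ARKit

/* A minimal sketch of an ARSessionDelegate method that distinguishes the anchor types. */
func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
    for anchor in anchors {
        switch anchor {
        case let planeAnchor as ARPlaneAnchor:
            print("Plane anchor, alignment: \(planeAnchor.alignment)")
        case let imageAnchor as ARImageAnchor:
            print("Image anchor for: \(imageAnchor.referenceImage.name ?? "unnamed image")")
        case let objectAnchor as ARObjectAnchor:
            print("Object anchor for: \(objectAnchor.referenceObject.name ?? "unnamed object")")
        case let faceAnchor as ARFaceAnchor:
            print("Face anchor, currently tracked: \(faceAnchor.isTracked)")
        default:
            /* Whatever its type, every anchor carries a transform in world space. */
            print("Other anchor at \(anchor.transform.columns.3)")
        }
    }
}
{% c-block-end %}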
ARKit 3.5 introduced a new type of anchor – ARMeshAnchor. As you may have guessed from its name, an ARMeshAnchor carries more than just a transform – by collecting data from the LiDAR sensor, it also provides information about the geometry of its surroundings.
The geometry provided by an ARMeshAnchor is accessible via its var geometry: ARMeshGeometry { get } property.
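Before any ARMeshAnchor shows up, the session has to run a world-tracking configuration with scene reconstruction enabled. A minimal sketch (the delegate method below only logs what each anchor's geometry contains):
{% c-block language="swift" %}
import ARKit

/* A minimal sketch: turn on scene reconstruction so the session starts producing ARMeshAnchors. */
func startSceneReconstruction(on session: ARSession) {
    /* Scene reconstruction is only available on LiDAR-equipped devices, so check support first. */
    guard ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) else {
        print("Scene reconstruction is not supported on this device")
        return
    }
    let configuration = ARWorldTrackingConfiguration()
    configuration.sceneReconstruction = .mesh /* or .meshWithClassification */
    session.run(configuration)
}

/* In ARSessionDelegate, every ARMeshAnchor then exposes its geometry. */
func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
    for case let meshAnchor as ARMeshAnchor in anchors {
        let geometry = meshAnchor.geometry
        print("Mesh anchor \(meshAnchor.identifier): \(geometry.vertices.count) vertices, \(geometry.faces.count) faces")
    }
}
{% c-block-end %}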
Let's now look more closely at the new ARMeshGeometry class (https://developer.apple.com/documentation/arkit/armeshgeometry):
{% c-block language="swift" %}
/**
 A three-dimensional shape that represents the geometry of a mesh.
 */
@available(iOS 13.4, *)
open class ARMeshGeometry : NSObject, NSSecureCoding {

    /** The vertices of the mesh. */
    open var vertices: ARGeometrySource { get }

    /** The normals of the mesh. */
    open var normals: ARGeometrySource { get }

    /** A list of all faces in the mesh. */
    open var faces: ARGeometryElement { get }

    /** Classification for each face in the mesh. */
    open var classification: ARGeometrySource? { get }
}
{% c-block-end %}
Vertices, normals, and classification are represented by a new class, ARGeometrySource. The Apple documentation describes it as mesh data in a buffer-based array, with the type of data in the buffer described by an MTLVertexFormat.
So an ARGeometrySource points to an MTLBuffer holding an array of count vectors. Each vector occupies a fixed number of bytes (stride), and its layout is described by format.
By using a cut-and-try method, I found out that for vertices and normals the format is MTLVertexFormat.float3 (i.e., three floats representing the X, Y, and Z coordinates of the vertex/vector, respectively).
The format for classification is MTLVertexFormat.uchar, which represents the raw value of the ARMeshClassification enumeration.
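To read individual values back, you have to walk the Metal buffers yourself. Here is a sketch of two helpers based on the formats described above (the method names are mine, not part of the API):
{% c-block language="swift" %}
import ARKit

extension ARMeshGeometry {
    /* Reads the vertex at the given index, assuming the observed float3 layout (3 x Float32). */
    func vertex(at index: Int) -> (Float, Float, Float) {
        assert(vertices.format == .float3, "Expected three floats per vertex")
        let pointer = vertices.buffer.contents()
            .advanced(by: vertices.offset + vertices.stride * index)
        return pointer.assumingMemoryBound(to: (Float, Float, Float).self).pointee
    }

    /* Reads the classification of the given face, assuming the observed uchar layout. */
    func classification(ofFaceWithIndex index: Int) -> ARMeshClassification? {
        guard let classificationSource = classification else { return nil }
        assert(classificationSource.format == .uchar, "Expected one byte per face")
        let pointer = classificationSource.buffer.contents()
            .advanced(by: classificationSource.offset + classificationSource.stride * index)
        let rawValue = Int(pointer.assumingMemoryBound(to: UInt8.self).pointee)
        return ARMeshClassification(rawValue: rawValue)
    }
}
{% c-block-end %}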
What I also found out was that the number of normals is the same as the number of vertices, not faces. This matches neither the documentation, which describes the normals property as rays that define which direction is outside for each face, nor the general meaning of a normal.
The next data type is ARGeometryElement, which is used to describe faces. It also wraps a Metal buffer (MTLBuffer), containing an array of arrays of vertex indices. Each element in the buffer represents a face; each face is represented by a fixed number of indices (indexCountPerPrimitive), and each index refers to a vertex.
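Extracting the vertex indices of a single face then looks like this – again a sketch, assuming 4-byte (UInt32) indices (the helper asserts that assumption):
{% c-block language="swift" %}
import ARKit

extension ARMeshGeometry {
    /* Returns the vertex indices that make up a single face, assuming 4-byte (UInt32) indices. */
    func vertexIndices(ofFaceWithIndex faceIndex: Int) -> [UInt32] {
        assert(faces.bytesPerIndex == MemoryLayout<UInt32>.size, "Expected UInt32 vertex indices")
        let indicesPerFace = faces.indexCountPerPrimitive
        let pointer = faces.buffer.contents()
        var indices = [UInt32]()
        indices.reserveCapacity(indicesPerFace)
        for offsetWithinFace in 0..<indicesPerFace {
            let byteOffset = (faceIndex * indicesPerFace + offsetWithinFace) * faces.bytesPerIndex
            indices.append(pointer.advanced(by: byteOffset).assumingMemoryBound(to: UInt32.self).pointee)
        }
        return indices
    }
}
{% c-block-end %}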
What I found interesting to explore is how ARKit assigns ARMeshAnchors to the feature points. RealityKit provides a visualisation for debugging purposes, but it only shows a grid based on the geometries of all anchors combined; it is not possible to tell what a single anchor's geometry looks like.
Generating and presenting a virtual object based on mesh data from the LiDAR in real time would be very cool, right?
Unfortunately, procedural meshes in RealityKit are still not fully supported. You can definitely generate primitives using MeshResource.generateBox and similar, but not a complex mesh. The geometry provided by an ARMeshAnchor is a set of faces, and you won't be able to represent it with primitives. That is not a surprise – RealityKit is still quite new and on its path to maturity.
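For context, this is roughly the extent of the procedural API at the time – primitive generators only, nothing that accepts arbitrary vertex and face buffers (a sketch; the function itself is just an illustration):
{% c-block language="swift" %}
import RealityKit
import UIKit

/* A sketch of RealityKit's procedural mesh support at the time: primitives only. */
func makePlaceholderBox() -> ModelEntity {
    let mesh = MeshResource.generateBox(size: 0.1) /* a 10 cm cube */
    let material = SimpleMaterial(color: .red, isMetallic: false)
    return ModelEntity(mesh: mesh, materials: [material])
}
{% c-block-end %}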
There is still one way in RealityKit, though. You can generate an MDLAsset via the Model I/O framework, export it to the usdz format, and import it back into RealityKit via ModelEntity.load(contentsOf:withName:). However, you may experience high latency in real-time use due to the I/O operations on the file system.
Where I really did succeed with runtime mesh generation was SceneKit, which allows you to dynamically create an SCNGeometry and assign it to an SCNNode.
With a few handy extensions the code can look like this:
{% c-block language="swift" %}
// MARK: - ARSCNViewDelegate

func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
    /*
     Create a node for a new ARMeshAnchor.
     We are only interested in anchors that provide a mesh.
     */
    guard let meshAnchor = anchor as? ARMeshAnchor else {
        return nil
    }
    /* Generate an SCNGeometry (explained further below) */
    let geometry = SCNGeometry(arGeometry: meshAnchor.geometry)
    /* Assign a random color to each ARMeshAnchor/SCNNode to be able to distinguish them in the demo */
    geometry.firstMaterial?.diffuse.contents = colorizer.assignColor(to: meshAnchor.identifier)
    /* Create the node & assign the geometry */
    let node = SCNNode()
    node.name = "DynamicNode-\(meshAnchor.identifier)"
    node.geometry = geometry
    return node
}

func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    /* Update the node's geometry when the mesh or position changes */
    guard let meshAnchor = anchor as? ARMeshAnchor else {
        return
    }
    /* Regenerate the geometry */
    let newGeometry = SCNGeometry(arGeometry: meshAnchor.geometry)
    /* Assign the same color (the colorizer stores an id <-> color map internally) */
    newGeometry.firstMaterial?.diffuse.contents = colorizer.assignColor(to: meshAnchor.identifier)
    /* Replace the node's geometry with the new one */
    node.geometry = newGeometry
}
{% c-block-end %}
Conversion of ARMeshGeometry to SCNGeometry is pretty straightforward as the structures are very similar:
{% c-block language="swift" %}
extension SCNGeometry {
    convenience init(arGeometry: ARMeshGeometry) {
        let verticesSource = SCNGeometrySource(arGeometry.vertices, semantic: .vertex)
        let normalsSource = SCNGeometrySource(arGeometry.normals, semantic: .normal)
        let faces = SCNGeometryElement(arGeometry.faces)
        self.init(sources: [verticesSource, normalsSource], elements: [faces])
    }
}

extension SCNGeometrySource {
    convenience init(_ source: ARGeometrySource, semantic: Semantic) {
        self.init(buffer: source.buffer,
                  vertexFormat: source.format,
                  semantic: semantic,
                  vertexCount: source.count,
                  dataOffset: source.offset,
                  dataStride: source.stride)
    }
}

extension SCNGeometryElement {
    convenience init(_ source: ARGeometryElement) {
        let pointer = source.buffer.contents()
        let byteCount = source.count * source.indexCountPerPrimitive * source.bytesPerIndex
        let data = Data(bytesNoCopy: pointer, count: byteCount, deallocator: .none)
        self.init(data: data,
                  primitiveType: .of(source.primitiveType),
                  primitiveCount: source.count,
                  bytesPerIndex: source.bytesPerIndex)
    }
}

extension SCNGeometryPrimitiveType {
    static func of(_ type: ARGeometryPrimitiveType) -> SCNGeometryPrimitiveType {
        switch type {
        case .line:
            return .line
        case .triangle:
            return .triangles
        }
    }
}
{% c-block-end %}
And here is the result:
As we can see, ARKit produces roughly square meshes of approximately 1 m × 1 m. Some of them may overlap with each other.
Similarly to the early releases of RealityKit (the new framework aimed primarily at augmented reality), where many features were missing, ARKit 3.5 arrived with a simple and quite small API that covers a big enhancement: one more anchor type and a few new classes in a computer-graphics style. I personally found it more interesting to examine these new entities in a practical way. It helped me figure out how to work with them and also revealed interesting facts which the documentation does not describe. The first finding described here can be explained as a way in which Apple simplifies a complex problem. But it is only the first part of what I discovered – stay tuned to read more.