Innovations in BIM-based Localization
In classic SLAM, mapping and localization are performed simultaneously: the map is built incrementally while the robot's pose is estimated within it.


However, in the construction context, the map only needs to be built once and can then support long-term operations. Moreover, BIM is becoming more prevalent than CAD. Against this backdrop, the following three studies focus on BIM-based localization, semantic consistency, and geometric consistency.
Towards BIM-based Localization
H. Yin, J. M. Liew, W. L. Lee, M. H. Ang, and J. K. W. Yeoh, “Towards BIM-based robot localization: a real-world case study,” presented at the 39th International Symposium on Automation and Robotics in Construction, Jul. 2022, doi: 10.22260/ISARC2022/0012.
This work presents a workflow that localizes robots in BIM-generated maps, freeing them from the computational complexity and global inconsistency of online SLAM.
Mapping
For mapping, the authors propose a three-step pipeline that converts the BIM of each individual floor into a point-cloud map.
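As a rough illustration of what such a BIM-to-map conversion can look like, the sketch below assumes the per-floor geometry has already been exported as a triangle mesh (e.g., from IFC) and uses Open3D to sample it into a point-cloud map; the file names, tooling, and sampling parameters are assumptions rather than the paper's exact pipeline.

```python
# Sketch: turn an exported per-floor BIM mesh into a point-cloud map.
# "floor_3.obj" and all parameter values are illustrative placeholders.
import open3d as o3d

def bim_floor_to_map(mesh_path: str, n_points: int = 500_000) -> o3d.geometry.PointCloud:
    mesh = o3d.io.read_triangle_mesh(mesh_path)
    mesh.compute_vertex_normals()                     # normals are useful for later point-to-plane ICP
    cloud = mesh.sample_points_uniformly(number_of_points=n_points)
    return cloud.voxel_down_sample(voxel_size=0.05)   # keep the map lightweight

if __name__ == "__main__":
    o3d.io.write_point_cloud("floor_3_map.pcd", bim_floor_to_map("floor_3.obj"))
```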

Localization
A point-to-plane ICP-based pose estimation algorithm is used to track the robot pose against the BIM-generated map.
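In its standard form (notation assumed here, not taken from the paper), point-to-plane ICP estimates the pose by minimizing the residuals along the map surface normals:

$$
\hat{\mathbf{T}} = \arg\min_{\mathbf{T} \in SE(3)} \sum_{i} \left( \mathbf{n}_i^{\top} \left( \mathbf{T}\,\mathbf{p}_i - \mathbf{q}_i \right) \right)^{2},
$$

where $\mathbf{p}_i$ is a scan point, $\mathbf{q}_i$ is its nearest point in the BIM-generated map, and $\mathbf{n}_i$ is the map surface normal at $\mathbf{q}_i$.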
On the NUS dataset, localization achieves translation errors below 0.2 m and rotation errors below 2° relative to DLO. However, deviations between as-planned and as-built conditions can cause sudden drift.


Why not DLO?
Direct LiDAR Odometry (DLO) achieves even higher accuracy than other LiDAR SLAM systems, to the point where the authors use it as a proxy for ground truth in their experiments. So why not rely entirely on DLO?
Despite its superior accuracy, DLO is still an odometry system built on incremental scan matching, so its estimates accumulate drift and lack global consistency. More importantly, DLO cannot leverage the semantic information contained in BIM models.
Towards Semantic Consistency
H. Yin, Z. Lin, and J. K. W. Yeoh, “Semantic localization on BIM-generated maps using a 3D LiDAR sensor,” Automation in Construction, vol. 146, p. 104641, Feb. 2023, doi: 10.1016/j.autcon.2022.104641.
This work introduces a semantic ICP that guides LiDAR-based localization toward both geometric and semantic consistency, improving accuracy in structured environments.

Mapping
Each semantic object in the BIM is represented by an axis-aligned bounding box, parameterized by two corner points.
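Written out (the symbols are assumed here, not the paper's notation), a labeled box is

$$
\mathcal{B} = \left( \mathbf{p}_{\min}, \mathbf{p}_{\max}, \ell \right), \qquad
\mathbf{x} \in \mathcal{B} \iff \mathbf{p}_{\min} \le \mathbf{x} \le \mathbf{p}_{\max} \ \text{(componentwise)},
$$

where $\ell$ is the semantic class of the object (e.g., wall or column).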

Localization
Following a coarse-to-fine strategy, the refined result is obtained in three steps. First, standard ICP is iterated to obtain a coarse alignment. Then, based on that result, the input points are labeled with the semantic bounding boxes. Finally, using these labels, a semantic ICP refines the registration.
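A minimal sketch of this loop, assuming Open3D for the ICP steps and a simple axis-aligned box test for the labeling; the box values, distance thresholds, and the way labels gate the refinement (here a plain semantic filter) are illustrative assumptions rather than the paper's exact algorithm.

```python
# Sketch of a coarse-to-fine, label-aware registration loop (illustrative only).
import numpy as np
import open3d as o3d

# Labeled axis-aligned boxes exported from the BIM: (p_min, p_max, label).
BOXES = [
    (np.array([0.0, 0.0, 0.0]), np.array([10.0, 0.3, 3.0]), "wall"),
    (np.array([4.0, 2.0, 0.0]), np.array([4.4, 2.4, 3.0]), "column"),
]

def label_points(points: np.ndarray) -> np.ndarray:
    """Assign each point the label of the first box containing it ('' if none)."""
    labels = np.full(len(points), "", dtype=object)
    for p_min, p_max, name in BOXES:
        inside = np.all((points >= p_min) & (points <= p_max), axis=1)
        labels[inside & (labels == "")] = name
    return labels

def register(scan: o3d.geometry.PointCloud,
             bim_map: o3d.geometry.PointCloud,
             init: np.ndarray) -> np.ndarray:
    if not bim_map.has_normals():
        bim_map.estimate_normals()  # point-to-plane ICP needs target normals
    p2pl = o3d.pipelines.registration.TransformationEstimationPointToPlane()
    # 1) Coarse alignment with standard ICP.
    coarse = o3d.pipelines.registration.registration_icp(scan, bim_map, 1.0, init, p2pl)
    # 2) Label the coarsely aligned scan points with the BIM boxes.
    T = coarse.transformation
    aligned = np.asarray(scan.points) @ T[:3, :3].T + T[:3, 3]
    labels = label_points(aligned)
    # 3) Refine using only points that received a semantic label (a simple semantic filter).
    keep = np.flatnonzero(labels != "").tolist()
    fine = o3d.pipelines.registration.registration_icp(
        scan.select_by_index(keep), bim_map, 0.3, T, p2pl)
    return fine.transformation
```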

In the experiments, Cartographer, a state-of-the-art offline 2D SLAM system, is chosen as a proxy for ground truth. Compared to standard ICP, the semantic filter improves accuracy, but the gap between as-designed and as-built conditions remains unresolved. The conversion from BIM to a semantic map may also be inaccurate, which can reduce robustness, and the method still depends on a reasonable initial pose guess for ICP.
Towards Geometric Consistency
Z. Qiao et al., “Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform,” IEEE Transactions on Automation Science and Engineering, pp. 1–1, 2025, doi: 10.1109/TASE.2025.3549176.


A submap is composed of a sequence of point cloud inputs.
Each voxel is parameterized by a Gaussian distribution.
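Concretely, for the points $\{\mathbf{p}_i\}_{i=1}^{N_v}$ falling in voxel $v$ (standard notation, assumed here), the Gaussian parameters are

$$
\boldsymbol{\mu}_v = \frac{1}{N_v} \sum_{i=1}^{N_v} \mathbf{p}_i, \qquad
\boldsymbol{\Sigma}_v = \frac{1}{N_v} \sum_{i=1}^{N_v} \left( \mathbf{p}_i - \boldsymbol{\mu}_v \right) \left( \mathbf{p}_i - \boldsymbol{\mu}_v \right)^{\top}.
$$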
To determine whether the points in a voxel form a plane, the eigenvalues of the covariance matrix are analyzed.
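A common criterion (threshold symbol assumed): with the eigenvalues of $\boldsymbol{\Sigma}_v$ sorted as $\lambda_1 \ge \lambda_2 \ge \lambda_3$, the voxel is treated as planar when the smallest eigenvalue is negligible relative to the others,

$$
\frac{\lambda_3}{\lambda_2} < \tau,
$$

and the plane normal is taken as the eigenvector associated with $\lambda_3$.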
After this, voxels with similar normals and similar point-to-plane distances are merged into the same plane. The wall points are then projected onto the ground plane to form a 2D image at a fixed pixel scale.
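A minimal sketch of this projection, assuming the wall points arrive as an (N, 3) array; the pixel scale below is a placeholder, not the value used in the paper.

```python
# Sketch: rasterize wall points onto the ground plane as a 2D occupancy image.
import numpy as np

def project_to_ground(wall_points: np.ndarray, pixel_scale: float = 0.1) -> np.ndarray:
    """wall_points: (N, 3) array; pixel_scale: meters per pixel (placeholder value)."""
    xy = wall_points[:, :2]
    origin = xy.min(axis=0)
    pix = np.floor((xy - origin) / pixel_scale).astype(int)   # grid coordinates
    image = np.zeros(pix.max(axis=0) + 1, dtype=np.uint8)
    image[pix[:, 0], pix[:, 1]] = 1                           # mark occupied cells
    return image
```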

A triangle descriptor is then built from triplets of the extracted corner points, so that the submap and the BIM map can be matched in a rotation- and translation-invariant way.
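One generic way to define such a descriptor (notation assumed): for a triplet of corner points $\mathbf{c}_1, \mathbf{c}_2, \mathbf{c}_3$, use the sorted side lengths

$$
\mathbf{d} = \operatorname{sort}\left( \lVert \mathbf{c}_1 - \mathbf{c}_2 \rVert,\ \lVert \mathbf{c}_2 - \mathbf{c}_3 \rVert,\ \lVert \mathbf{c}_3 - \mathbf{c}_1 \rVert \right),
$$

which is unchanged by any rotation or translation of the submap.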

Based on the resolution of the discretized pose space, each matched descriptor pair casts a vote for a candidate planar pose (x, y, yaw), and the accumulator cell with the most votes yields the global transformation between the submap and the BIM.
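As a rough sketch of such a voting step (the bin resolutions are placeholders and the accumulator layout is an assumption, not the paper's implementation):

```python
# Sketch: vote candidate (tx, ty, yaw) hypotheses into a discretized accumulator.
import numpy as np

def hough_vote(candidates: np.ndarray,
               xy_res: float = 0.5,
               yaw_res: float = np.deg2rad(2.0)) -> np.ndarray:
    """candidates: (N, 3) array of (tx, ty, yaw), one hypothesis per matched descriptor pair."""
    res = np.array([xy_res, xy_res, yaw_res])
    bins = np.floor(candidates / res).astype(int)
    cells, counts = np.unique(bins, axis=0, return_counts=True)  # vote accumulation
    return (cells[np.argmax(counts)] + 0.5) * res                # center of the winning cell
```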

To determine the optimal transformation aligning a LiDAR submap with the BIM model, an occupancy-aware confidence score is proposed. The BIM wall point cloud provides the occupancy reference against which each aligned submap is evaluated, and a penalty term penalizes ground submap points that contradict this occupancy. The final confidence score combines these components, and the candidate transformation with the highest score is accepted.
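One generic way to write such a combination (the weights and notation are assumed, not taken from the paper):

$$
c(\mathbf{T}) = s_{\mathrm{occ}}(\mathbf{T}) - \lambda\, s_{\mathrm{pen}}(\mathbf{T}),
$$

where $s_{\mathrm{occ}}$ measures how well the transformed submap agrees with the BIM wall occupancy, $s_{\mathrm{pen}}$ penalizes ground points that land in occupied regions, $\lambda$ balances the two terms, and the candidate $\mathbf{T}$ with the highest $c(\mathbf{T})$ is selected.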

Unsolved Problems
The authors use a clever confidence computation to keep the differences between as-designed and as-built conditions from degrading the registration. This handles the gap implicitly. Do we need to express it explicitly? Can we find an explicit representation of such gaps?

The second problem is that LiDAR degenerates in long corridors, which leads to inconsistent or failed estimation. Should we enlarge the submap to avoid this? Can we use a topological graph to handle it explicitly? Or can we fuse information from other sensors, such as cameras or radar?