I'm working on object recognition with 6 DoF pose estimation.

I use local descriptors (SHOT or PFH), Hough voting for the recognition
pipeline, and Harris3D or ISS3D for keypoints.

Object detection works great in real time using a Kinect v2. The main problem
is that every frame gives me a different transformation matrix, sometimes with
a translation error of around 8 cm between consecutive frames.
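
To put a number on the jitter, the difference between the poses of two consecutive frames could be measured roughly as follows. This is only a sketch: it assumes each pose is a row-major 4x4 homogeneous matrix with the translation in the last column, and `Mat4`, `makePose`, `translationJitter` and `rotationJitterDeg` are hypothetical names chosen for illustration, not part of any library.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Assumed layout: row-major 4x4 homogeneous transform, rotation in the
// upper-left 3x3 block, translation in the last column.
using Mat4 = std::array<std::array<double, 4>, 4>;

const double kPi = 3.14159265358979323846;

// Hypothetical helper: pose with a rotation about the z axis (yaw, in
// degrees) and a translation (tx, ty, tz).
Mat4 makePose(double yawDeg, double tx, double ty, double tz) {
    const double a = yawDeg * kPi / 180.0;
    const double c = std::cos(a);
    const double s = std::sin(a);
    Mat4 m = {{{c, -s, 0.0, tx},
               {s,  c, 0.0, ty},
               {0.0, 0.0, 1.0, tz},
               {0.0, 0.0, 0.0, 1.0}}};
    return m;
}

// Euclidean distance between the translation parts of two poses
// (result is in the same unit as the input translations).
double translationJitter(const Mat4& a, const Mat4& b) {
    const double dx = a[0][3] - b[0][3];
    const double dy = a[1][3] - b[1][3];
    const double dz = a[2][3] - b[2][3];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Angle in degrees of the relative rotation R_a^T * R_b.
double rotationJitterDeg(const Mat4& a, const Mat4& b) {
    // trace(R_a^T * R_b) equals the sum of element-wise products.
    double trace = 0.0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            trace += a[i][j] * b[i][j];
    // Clamp against rounding before acos.
    const double c = std::max(-1.0, std::min(1.0, (trace - 1.0) / 2.0));
    return std::acos(c) * 180.0 / kPi;
}
```

Logging both values per frame while the object and the sensor stay still would show whether the instability is mostly in translation, in rotation, or in both.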

Is this caused by unstable keypoints?

I was thinking about adding ICP registration after the local pipeline to
refine the pose estimate.
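
To illustrate the refinement idea, here is a deliberately simplified, translation-only variant of ICP in plain C++. It is a sketch under stated assumptions, not a full implementation: real ICP also estimates rotation (usually via SVD), and a library version such as `pcl::IterativeClosestPoint` would normally be used instead, assuming the pipeline is PCL-based. `Point3`, `sqDist` and `refineTranslationIcp` are hypothetical names, and the brute-force nearest-neighbour search stands in for a k-d tree.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Point3 { double x, y, z; };

// Squared Euclidean distance between two points.
static double sqDist(const Point3& a, const Point3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Translation-only ICP (simplified). Starting from the coarse translation
// `t` (e.g. taken from the descriptor-based pose), repeatedly:
//   1. move the model points by the current translation,
//   2. pair each one with its nearest scene point,
//   3. reject pairs farther apart than maxCorrDist,
//   4. shift the translation by the mean residual of the kept pairs.
Point3 refineTranslationIcp(const std::vector<Point3>& model,
                            const std::vector<Point3>& scene,
                            Point3 t, double maxCorrDist, int maxIterations) {
    assert(!model.empty() && !scene.empty());
    const double maxSq = maxCorrDist * maxCorrDist;
    for (int it = 0; it < maxIterations; ++it) {
        Point3 sum{0.0, 0.0, 0.0};
        std::size_t kept = 0;
        for (const Point3& m : model) {
            const Point3 p{m.x + t.x, m.y + t.y, m.z + t.z};
            double best = std::numeric_limits<double>::max();
            const Point3* nearest = nullptr;
            for (const Point3& s : scene) {  // brute force, O(N*M)
                const double d = sqDist(p, s);
                if (d < best) { best = d; nearest = &s; }
            }
            if (best > maxSq) continue;  // outlier rejection
            sum.x += nearest->x - p.x;
            sum.y += nearest->y - p.y;
            sum.z += nearest->z - p.z;
            ++kept;
        }
        if (kept == 0) break;  // no usable correspondences
        const double n = static_cast<double>(kept);
        const Point3 delta{sum.x / n, sum.y / n, sum.z / n};
        t.x += delta.x;
        t.y += delta.y;
        t.z += delta.z;
        if (sqDist(delta, Point3{0.0, 0.0, 0.0}) < 1e-12) break;  // converged
    }
    return t;
}
```

The important design point is the same as in full ICP: the coarse pose from the local pipeline is only the initial guess, and the correspondence distance threshold has to be larger than the expected coarse error (here, several centimetres) but small enough to reject clutter.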

How can I solve this problem?


