Performance questions


Performance questions

lllegrand
This post has NOT been accepted by the mailing list yet.
Greetings,

I am interested in (among other things) doing fast registration of 3D scans to build object models.  Our current implementation is not very flexible and not very fast.  All indications are that a re-spin based on pcl would be a vast improvement, so I have spent the last couple of days going through the tutorials and trying to understand the code.

The pcl definitely seems flexible.  However, as I sketch out my proposed perception processing graphs, I am concerned about the efficiency of the resulting computations.  Since I am new to all of this, it is certainly possible that (a) I am doing things all wrong, (b) I have misunderstood how things work, or (c) the things that I am concerned about actually don't take much time.  I welcome any schooling you can offer.

Here are some questions:
- It seems much of the flexibility of pcl comes from the idea of creating processing nodes/nodelets that can be mixed and matched in different ways to do various jobs.  The way that cloud data is passed between nodes/nodelets is via the sensor_msgs::PointCloud2, but it seems that most useful processing requires the cloud data to be represented as a pcl::PointCloud.  So each time a node gets data, it must first convert to pcl::PointCloud.  Then when it is done processing, if it wants to publish a resulting cloud, it must convert it to sensor_msgs::PointCloud2.  Doesn't it take a lot of cycles to do all of these convert-to-then-back-again's?  I could reduce the number of conversions by cramming more functionality into a single node, but then I lose flexibility.  Can you offer advice about the best way to balance these concerns?  BTW, I saw in one post that there is an effort to make pcl::PointCloud publishable.  Is that work still going on?  Any estimate of the ETA?

- As I go through the pairwise_incremental_registration tutorial, it seems like there is a lot of KdTree building/rebuilding.  For example, we build the KdTree when we compute the normals of the source and target clouds, then we rebuild essentially the same KdTrees when we initialize the pcl::IterativeClosestPointNonLinear object.  Further, we rebuild the source KdTree for each ICP iteration.  Aren't these KdTrees closely related to each other?  Doesn't it take a lot of cycles to do all of this rebuilding?  Is there any way to reuse the KdTrees once they are built?


Performance questions

lllegrand

(Second attempt.  I think my first attempt got stuck because I was not a full fledged subscriber, or some such.  Apologies if this is a repost.)

 



_______________________________________________
[hidden email] / http://pcl.ros.org
https://code.ros.org/mailman/listinfo/pcl-users

Re: Performance questions

lllegrand

I retract this comment:

>As I go through the  pairwise_incremental_registration tutorial, it seems like there is a lot of KdTree building/rebuilding.  For example, we build the KdTree when we compute the normals of the source and target clouds, then we rebuild essentially the same KdTrees when we initialize the pcl::IterativeClosestPointNonLinear object.

After further inspection, I now understand that registration is happening in 4D (x,y,z,curvature) so the second building of the KdTrees is required.  Duh.

However, this comment might still be valid:

>Further, we rebuild the source KdTree for each ICP iteration.  Aren't these KdTrees closely related to each other?  Doesn't it take a lot of cycles to do all of this rebuilding?  Is there any way to reuse the KdTrees once they are built?

 

 



Re: Performance questions

garratt
In reply to this post by lllegrand
On Wed, 2010-11-10 at 14:49 -0800, Legrand, Louis L wrote:
> Greetings,
>
>  
>
> I am interested in (among other things) doing fast registration of 3d
> scans to build object models.
Cool beans! I am also working toward that goal. I would be interested in
learning more about how you approach the problem.

> Here are some questions:
>
> - It seems much of the flexibility of pcl comes from the idea of
> creating processing nodes/nodelets that can be mixed and matched in
> different ways to do various jobs.  The way that cloud data is passed
> between nodes/nodelets is via the sensor_msgs::PointCloud2, but it
> seems that most useful processing requires the cloud data to be
> represented as a pcl::PointCloud.  So each time a node gets data, it
> must first convert to pcl::PointCloud.  Then when it is done
> processing, if it wants to publish a resulting cloud, it must convert
> it to sensor_msgs::PointCloud2.  Doesn't it take a lot of cycles to do
> all of these convert-to-then-back-again's?  I could reduce the number
> of conversions by cramming more functionality into a single node, but
> then I lose flexibility.  Can you offer advice about the best way to
> balance these concerns?  BTW, I saw in one post that there is an
> effort to make pcl::PointCloud publishable.  Is that work still going
> on?  Any estimate of the ETA?
>
I can't help you much here, since I personally go the route of cramming
everything into one node (and making libraries to allow the sharing of
code).


>  
>
> - As I go through the  pairwise_incremental_registration tutorial, it
> seems like there is a lot of KdTree building/rebuilding.  For example,
> we build the KdTree when we compute the normals of the source and
> target clouds, then we rebuild essentially the same KdTrees when we
> initialize the pcl::IterativeClosestPointNonLinear object.  Further,
> we rebuild the source KdTree for each ICP iteration.  Aren't these
> KdTrees closely related to each other?  Doesn't it take a lot of
> cycles to do all of this rebuilding?  Is there any way to reuse the
> KdTrees once they are built?

In my experience, building a kdtree (for flann) takes about 0.4 ms / 1000 pts.
Running k nearest neighbors takes about 1 us / 1000 pts in the cloud,
with an overhead of around 1 us per k.

For example, running KNN with k=30:

cloud size (pts)   time/search (s)
1000               0.000036
2000               0.000041
8000               0.000063
64000              0.000102
512000             0.000167

And running KNN with k=1:

cloud size (pts)   time/search (s)
1000               0.000010
2000               0.000012
8000               0.000012
64000              0.000020
512000             0.000130

I kid you not, I was actually testing this today...
The times are from the system clock, so they're pretty noisy, but you get
the idea.  (CPU speed: 2.8 GHz)

For something like segmentation, I found that I was doing on the order
of 10000 searches and building 2 kdtrees, so the searching time far
outweighs the tree build time.  A big factor is the size of the
underlying point cloud, so if you can segment out your object before
doing operations like finding normals, it can be a big help.
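
To put rough numbers on the build-versus-search trade-off, here is a small back-of-the-envelope sketch. It only multiplies out the figures quoted in this post (about 0.4 ms per 1000 points to build a tree, and the measured k=30 per-search time). The struct and function names are made up for illustration and are not part of PCL or FLANN.

```cpp
#include <cassert>
#include <cstddef>

// Totals for one processing run, in seconds.
struct CostEstimate {
  double build_seconds;   // total time spent building kd-trees
  double search_seconds;  // total time spent in nearest-neighbor queries
};

// Hypothetical helper: scales the per-unit figures quoted in the post.
// The build rate (0.4 ms per 1000 points) is treated as a rough estimate.
CostEstimate estimateCost(std::size_t cloud_size, std::size_t num_trees,
                          std::size_t num_searches, double seconds_per_search) {
  assert(seconds_per_search >= 0.0);
  const double build_seconds_per_point = 0.4e-3 / 1000.0;
  CostEstimate cost;
  cost.build_seconds = build_seconds_per_point * cloud_size * num_trees;
  cost.search_seconds = seconds_per_search * num_searches;
  return cost;
}
```

For a 64000-point cloud, 2 tree builds and 10000 searches at 0.000102 s each, estimateCost(64000, 2, 10000, 0.000102) gives roughly 0.05 s of building against roughly 1 s of searching, which matches the observation that search time dominates.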

As for reusing trees, I have had some mysterious errors crop up when
using many flann kdtrees in the same thread.  I don't want to make any
accusations, because I have not tested it thoroughly, but it might be
something to keep in mind...

I'm currently testing a faster segmentation algorithm for pcl, as well
as some icp methods.  I would be interested in hearing if other people
are also working on performance boosts for parts of pcl.

cheers,
Garratt

Re: Performance questions

Radu B. Rusu
Administrator
In reply to this post by lllegrand
Dear Louis,


On 11/10/2010 02:49 PM, Legrand, Louis L wrote:
> (Second attempt.  I think my first attempt got stuck because I was not a
> full fledged subscriber, or some such. Apologies if this is a repost.)

Our apologies for that - it's one of the few things that keeps SPAM messages away. ;)

> Greetings,
>
> I am interested in (among other things) doing fast registration of 3d
> scans to build object models. Our current implementation is not very
> flexible and not very fast. All indications are that a re-spin based on
> pcl would be a vast improvement, so I have spent the last couple of days
> going through the tutorials and trying to understand the code.

Great! Before I even comment on the rest of your e-mail, I must say that we are currently in the process of
revamping/finishing the registration library in PCL, so within a few days we hope to have a final implementation for PCL
1.0 that works well and is fast.

> The pcl definitely seems flexible. However, as I sketch out my proposed
> perception processing graphs, I am concerned about the efficiency of the
> resulting computations. Since I am new to all of this, it is certainly
> possible that (a) I am doing things all wrong, (b) I have misunderstood
> how things work, or (c) the things that I am concerned about actually
> don't take much time. I welcome any schooling you can offer.


That is our mistake: we advertise nodelets as a way to speed up development and prototyping of algorithms (PCL ones,
but not only those) in ROS. However, we should improve our documentation and be more explicit about that, as PCL the
library has nothing to do with nodelets. That's one of the main reasons why we have a PCL C++ library and a PCL_ROS
package that uses the algorithms in PCL and wraps them up as nodelets.


> Here are some questions:
>
> - It seems much of the flexibility of pcl comes from the idea of
> creating processing nodes/nodelets that can be mixed and matched in
> different ways to do various jobs. The way that cloud data is passed
> between nodes/nodelets is via the sensor_msgs::PointCloud2, but it seems
> that most useful processing requires the cloud data to be represented as
> a pcl::PointCloud. So each time a node gets data, it must first convert
> to pcl::PointCloud. Then when it is done processing, if it wants to
> publish a resulting cloud, it must convert it to
> sensor_msgs::PointCloud2. Doesn't it take a lot of cycles to do all of
> these convert-to-then-back-again's? I could reduce the number of
> conversions by cramming more functionality into a single node, but then
> I lose flexibility. Can you offer advice about the best way to balance
> these concerns? BTW, I saw in one post that there is an effort to make
> pcl::PointCloud publishable. Is that work still going on? Any estimate
> of the ETA?


You are 100% correct. The first implementation was done as you mentioned it: by converting back to
sensor_msgs/PointCloud2. Until recently it wasn't possible to publish pcl::PointCloud<T>, but now we're happy to
announce that we're there!

Patrick (CC-ed) is leading the efforts for the pcl::PointCloud<T> subscriber/publisher mechanisms, and he'll be happy to
provide an ETA and more information on that.


> - As I go through the pairwise_incremental_registration tutorial, it
> seems like there is a lot of KdTree building/rebuilding. For example, we
> build the KdTree when we compute the normals of the source and target
> clouds, then we rebuild essentially the same KdTrees when we initialize
> the pcl::IterativeClosestPointNonLinear object. Further, we rebuild the
> source KdTree for each ICP iteration. Aren't these KdTrees closely
> related to each other? Doesn't it take a lot of cycles to do all of this
> rebuilding? Is there any way to reuse the KdTrees once they are built?

Good point - the tutorial is very simple and needs to be improved. Efficiency was definitely not a main point on our
list when we wrote that. We should definitely reuse the trees. Since you're working on this subject, would you be
willing to help us out, provided that you have some free time?
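
To make the "reuse the trees" idea concrete, here is a minimal sketch of the pattern: build a search tree once over a cloud that does not change, then query it repeatedly (for example, once per ICP iteration). This is a tiny stand-in kd-tree written only for illustration. It is not PCL's or FLANN's implementation, and SimpleKdTree and matchAll are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Point = std::array<double, 3>;

// Minimal 3D kd-tree stored implicitly in a reordered copy of the points.
class SimpleKdTree {
 public:
  explicit SimpleKdTree(std::vector<Point> points) : pts_(std::move(points)) {
    assert(!pts_.empty());
    build(0, pts_.size(), 0);  // the only place where the tree is built
  }

  // Returns the stored point closest to `query` (Euclidean distance).
  Point nearest(const Point& query) const {
    std::size_t best = 0;
    double best_d2 = std::numeric_limits<double>::max();
    search(0, pts_.size(), 0, query, best, best_d2);
    return pts_[best];
  }

 private:
  // Recursively place the median along `axis` at the middle of [lo, hi).
  void build(std::size_t lo, std::size_t hi, int axis) {
    if (hi - lo <= 1) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    std::nth_element(pts_.begin() + lo, pts_.begin() + mid, pts_.begin() + hi,
                     [axis](const Point& a, const Point& b) {
                       return a[axis] < b[axis];
                     });
    build(lo, mid, (axis + 1) % 3);
    build(mid + 1, hi, (axis + 1) % 3);
  }

  void search(std::size_t lo, std::size_t hi, int axis, const Point& q,
              std::size_t& best, double& best_d2) const {
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    double d2 = 0.0;
    for (int i = 0; i < 3; ++i) {
      d2 += (pts_[mid][i] - q[i]) * (pts_[mid][i] - q[i]);
    }
    if (d2 < best_d2) { best_d2 = d2; best = mid; }
    const double delta = q[axis] - pts_[mid][axis];
    const int next = (axis + 1) % 3;
    // Visit the side containing the query first, then the other side only
    // if the splitting plane is closer than the best match found so far.
    if (delta < 0.0) {
      search(lo, mid, next, q, best, best_d2);
      if (delta * delta < best_d2) search(mid + 1, hi, next, q, best, best_d2);
    } else {
      search(mid + 1, hi, next, q, best, best_d2);
      if (delta * delta < best_d2) search(lo, mid, next, q, best, best_d2);
    }
  }

  std::vector<Point> pts_;
};

// Looks up the closest target point for every source point, using a tree
// that was built earlier and is passed in by reference (no rebuild here).
std::vector<Point> matchAll(const SimpleKdTree& target_tree,
                            const std::vector<Point>& source) {
  std::vector<Point> matches;
  matches.reserve(source.size());
  for (const Point& p : source) matches.push_back(target_tree.nearest(p));
  return matches;
}
```

The point of the design is that the construction cost is paid once in the constructor, while matchAll can be called as many times as needed with a freshly transformed source cloud. This only holds as long as the cloud being searched stays fixed between calls.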



Cheers,
Radu.

Re: Performance questions

Radu B. Rusu
Administrator
In reply to this post by garratt
Marius, is there anything that prevents us from using multiple FLANN trees in the same thread?


On 11/10/2010 08:59 PM, Garratt wrote:
> On Wed, 2010-11-10 at 14:49 -0800, Legrand, Louis L wrote:
>
> as for reusing trees, I have had some mysterious errors crop up when
> using many flann kdtrees in the same thread.  I don't want to make any
> accusations, because I have not tested it thoroughly, but it might be
> something to keep in mind...
>

Thanks,
Radu.

Re: Performance questions

Marius Muja
No, I don't see any reason why there would be any problem with
multiple FLANN trees in the same thread.

Marius


On Mon, Nov 15, 2010 at 10:36 AM, Radu Bogdan Rusu
<[hidden email]> wrote:

> Marius, is there anything that prevents us from using multiple FLANN trees
> in the same thread?
>
>
> On 11/10/2010 08:59 PM, Garratt wrote:
>>
>> On Wed, 2010-11-10 at 14:49 -0800, Legrand, Louis L wrote:
>>
>> as for reusing trees, I have had some mysterious errors crop up when
>> using many flann kdtrees in the same thread.  I don't want to make any
>> accusations, because I have not tested it thoroughly, but it might be
>> something to keep in mind...
>>
>
> Thanks,
> Radu.
>
>

Re: Performance questions

Patrick Mihelich
In reply to this post by Radu B. Rusu
>> - It seems much of the flexibility of pcl comes from the idea of
>> creating processing nodes/nodelets that can be mixed and matched in
>> different ways to do various jobs. The way that cloud data is passed
>> between nodes/nodelets is via the sensor_msgs::PointCloud2, but it seems
>> that most useful processing requires the cloud data to be represented as
>> a pcl::PointCloud. So each time a node gets data, it must first convert
>> to pcl::PointCloud. Then when it is done processing, if it wants to
>> publish a resulting cloud, it must convert it to
>> sensor_msgs::PointCloud2. Doesn't it take a lot of cycles to do all of
>> these convert-to-then-back-again's? I could reduce the number of
>> conversions by cramming more functionality into a single node, but then
>> I lose flexibility. Can you offer advice about the best way to balance
>> these concerns? BTW, I saw in one post that there is an effort to make
>> pcl::PointCloud publishable. Is that work still going on? Any estimate
>> of the ETA?
>
> You are 100% correct. The first implementation was done as you
> mentioned it: by converting back to sensor_msgs/PointCloud2. Until
> recently it wasn't possible to publish pcl::PointCloud<T>, but now
> we're happy to announce that we're there!
>
> Patrick (CC-ed) is leading the efforts for the pcl::PointCloud<T>
> subscriber/publisher mechanisms, and he'll be happy to provide an
> ETA and more information on that.

Yes, pcl::PointCloud<T> will be usable as a first-class message type soon. At that point there should be essentially zero overhead to using pcl::PointCloud<T> and passing them within a process.

This work has been driving some enhancements to roscpp, and the last bit of the puzzle happened in ROS trunk recently. I'm guessing that many people are running PCL "unstable" on top of cturtle, so my plan is to put the final subscriber/publisher integration in a branch. I should get to that in the next week or two. Then I'll leave it up to Radu when to merge those changes into trunk/a release.
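
As a rough illustration of why passing a cloud within a process can be close to free, the sketch below contrasts two hand-off styles: converting the cloud to a flat byte buffer and back (every point is copied twice), versus handing the consumer a shared pointer to the same data (no point is copied). Everything here is a stand-in for illustration: Cloud, Blob, toBlob, fromBlob and countPoints are hypothetical names, and none of this is the roscpp or PCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

struct PointXYZ { float x, y, z; };

// Stand-in for a typed point cloud such as pcl::PointCloud<T>.
struct Cloud { std::vector<PointXYZ> points; };
using CloudConstPtr = std::shared_ptr<const Cloud>;

// Stand-in for a serialized message: just a flat byte buffer.
struct Blob { std::vector<unsigned char> data; };

// Typed cloud -> bytes. Copies every point.
Blob toBlob(const Cloud& cloud) {
  Blob blob;
  blob.data.resize(cloud.points.size() * sizeof(PointXYZ));
  if (!blob.data.empty()) {
    std::memcpy(blob.data.data(), cloud.points.data(), blob.data.size());
  }
  return blob;
}

// Bytes -> typed cloud. Copies every point again.
Cloud fromBlob(const Blob& blob) {
  assert(blob.data.size() % sizeof(PointXYZ) == 0);
  Cloud cloud;
  cloud.points.resize(blob.data.size() / sizeof(PointXYZ));
  if (!blob.data.empty()) {
    std::memcpy(cloud.points.data(), blob.data.data(), blob.data.size());
  }
  return cloud;
}

// A consumer that receives the cloud by shared pointer reads the
// producer's buffer directly; only the pointer is passed along.
std::size_t countPoints(const CloudConstPtr& cloud) {
  return cloud->points.size();
}
```

With the pointer hand-off, the cost of passing a cloud between two in-process stages does not depend on the number of points, whereas the convert-and-convert-back route grows linearly with cloud size.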

Cheers,
Patrick


Re: Performance questions

Radu B. Rusu
Administrator
Patrick,

Thanks for the info. We should make the modifications in trunk, as these should not drive any changes other than to
our current pcl::Subscriber/Publisher infrastructure (one would hope ;) )

Cheers,
Radu.


On 11/15/2010 03:46 PM, Patrick Mihelich wrote:

>         - It seems much of the flexibility of pcl comes from the idea of
>         creating processing nodes/nodelets that can be mixed and matched in
>         different ways to do various jobs. The way that cloud data is passed
>         between nodes/nodelets is via the sensor_msgs::PointCloud2, but it
>         seems that most useful processing requires the cloud data to be
>         represented as a pcl::PointCloud. So each time a node gets data, it
>         must first convert to pcl::PointCloud. Then when it is done
>         processing, if it wants to publish a resulting cloud, it must
>         convert it to sensor_msgs::PointCloud2. Doesn't it take a lot of
>         cycles to do all of these convert-to-then-back-again's? I could
>         reduce the number of conversions by cramming more functionality
>         into a single node, but then I lose flexibility. Can you offer
>         advice about the best way to balance these concerns? BTW, I saw in
>         one post that there is an effort to make pcl::PointCloud
>         publishable. Is that work still going on? Any estimate of the ETA?
>
>
>
>     You are 100% correct. The first implementation was done as you
>     mentioned it: by converting back to sensor_msgs/PointCloud2. Until
>     recently it wasn't possible to publish pcl::PointCloud<T>, but now
>     we're happy to announce that we're there!
>
>     Patrick (CC-ed) is leading the efforts for the pcl::PointCloud<T>
>     subscriber/publisher mechanisms, and he'll be happy to provide an
>     ETA and more information on that.
>
>
> Yes, pcl::PointCloud<T> will be usable as a first-class message type
> soon. At that point there should be essentially zero overhead to using
> pcl::PointCloud<T> and passing them within a process.
>
> This work has been driving some enhancements to roscpp, and the last bit
> of the puzzle happened in ROS trunk recently. I'm guessing that many
> people are running PCL "unstable" on top of cturtle, so my plan is to
> put the final subscriber/publisher integration in a branch. I should get
> to that in the next week or two. Then I'll leave it up to Radu when to
> merge those changes into trunk/a release.
>
> Cheers,
> Patrick