pcl_ros templating and 0.4 release

Radu B. Rusu
Administrator
I've been struggling for the past few days to find a solution to the pcl_ros problem. Even if you are not a user of
pcl_ros, you must have noticed the excessive compile times and large memory usage (unless you always used the binaries
and you never depended on pcl_ros).

The problem lies in the sheer amount of templated code that PCL_ROS uses. Besides PCL code, it also pulls in a lot of
standard C++ templates, boost, dynamic_reconfigure, roscpp, message_filters, etc.

Coupled with aggressive optimization settings (RelWithDebInfo + O3), a single compiler process can use up to 1-1.2GB of
RAM. We noticed this when our internal machines at Willow Garage were changed to 6GB of RAM, 4 cores (+4 HT) and no
swap. Since <rosmake>'s default behavior is to start as many GCC processes as the system reports cores (4+4=8 in this
case), and because one GCC process can take ~1GB of RAM, our desktop machines started running out of memory, as
8 x 1GB > 6GB :)

A temporary solution was to set ROS_PARALLEL_JOBS=-j2 in pcl_ros/Makefile, so that only 2 compiler processes are
started. However, this solution is not satisfactory, and in fact there is an open ticket for <rosmake> that attempts to
calm down its aggressive default -j<NRCORES> behavior (https://code.ros.org/trac/ros/ticket/2920).
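
The arithmetic above (8 x 1GB > 6GB) can be turned into a rough rule for picking a -j value. The sketch below is only an
illustration: the function name max_safe_jobs and its parameters are made up for this example, it is not what rosmake
does, and it ignores the memory needed by the OS and other processes.

```cpp
#include <algorithm>

// Hypothetical helper (not part of rosmake or ROS): largest number of
// parallel compiler jobs whose combined peak memory fits in RAM,
// capped at the reported core count and never lower than 1.
int max_safe_jobs(int total_ram_mb, int per_job_mb, int reported_cores) {
  if (per_job_mb <= 0 || reported_cores <= 0) {
    return 1;  // Nonsensical input: fall back to a serial build.
  }
  const int by_memory = total_ram_mb / per_job_mb;  // Integer division rounds down.
  return std::max(1, std::min(by_memory, reported_cores));
}
```

With the numbers from this post (6GB = 6144MB, 8 reported cores), a 1GB-per-job estimate gives 6 jobs, and the
pessimistic 1.2GB estimate (about 1229MB) gives 4, both well below the 8 that rosmake would start.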


So we began discussing alternative solutions. We had implemented some virtual multiple inheritance that made the code
nice and clean, and we thought that was the cause. But after removing it completely and rewriting the code, we realized
that it had nothing to do with the problem, as the memory consumption dropped by only 100M or less.


One thing that I did observe is that the compiler optimizations matter _a lot_. So I did some tests, and it turns out
that for a part of pcl_ros:

* compiling something with -O0 results in 595M RAM usage, 33 seconds
* compiling something with -O3 results in 744M RAM usage, 52 seconds

You see where this is going, right? Most of what we do on a daily basis, when we prototype, can easily be built and
tested with a cheaper build configuration. Only when we want to deploy the code and really squeeze more optimizations
out of the compiler should we pay for the longer build. The default ROS build system sets
CMAKE_BUILD_TYPE/ROS_BUILD_TYPE to RelWithDebInfo, which enables -O2 (not as bad as -O3, but definitely far
from -O0). So keep this in mind when compiling against heavily templated code.



However, good news! I am 90% done with new architectural changes in pcl_ros, which use delegation and remove the
templates completely. In the process, I also removed the virtual inheritance in PCL, which means that right now PCL_ROS
in trunk will not compile (on purpose) until I commit everything.
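
The actual pcl_ros changes are not shown in this thread. As a minimal sketch of the general idea only, and with every
name hypothetical (Cloud, FilterImpl, PassThroughImpl and FilterNode are not PCL or pcl_ros types), a non-templated
wrapper can forward its work to an object behind a small abstract interface:

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a point cloud; not a PCL or ROS type.
struct Cloud {
  std::vector<float> values;
};

// Abstract interface: the wrapper below only depends on this class,
// so it does not need to see any heavily templated implementation code.
class FilterImpl {
 public:
  virtual ~FilterImpl() = default;
  virtual Cloud apply(const Cloud& in) const = 0;
};

// One concrete implementation; in a real project this would live in its
// own .cpp file so its dependencies are compiled only once.
class PassThroughImpl : public FilterImpl {
 public:
  PassThroughImpl(float lo, float hi) : lo_(lo), hi_(hi) {}
  Cloud apply(const Cloud& in) const override {
    Cloud out;
    for (float v : in.values) {
      if (v >= lo_ && v <= hi_) out.values.push_back(v);  // Keep values inside [lo, hi].
    }
    return out;
  }

 private:
  float lo_;
  float hi_;
};

// Non-templated wrapper that delegates to whatever implementation it was given.
class FilterNode {
 public:
  explicit FilterNode(std::unique_ptr<FilterImpl> impl) : impl_(std::move(impl)) {}
  Cloud process(const Cloud& in) const { return impl_->apply(in); }

 private:
  std::unique_ptr<FilterImpl> impl_;
};
```

The design trade-off is one virtual call per processed cloud in exchange for the wrapper no longer being a template
itself, so code that includes only the wrapper's header has less to instantiate.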

Once committed, we might need some help testing the nodelets, to make sure that everything is fine. I'll try my best to
test all, but I can't think about every possible scenario. Unit tests for nodelets are something we should implement as
well, but I don't know if I have any time for that at the moment.

Because of all these major changes, I propose that we number the next release as 0.4.

--
Cheers,
Radu.

_______________________________________________
[hidden email] / http://pcl.ros.org
https://code.ros.org/mailman/listinfo/pcl-users

Re: pcl_ros templating and 0.4 release

Radu B. Rusu
Administrator
One thing that I forgot to mention:

* the ProjectInliers nodelet changed from accepting input on "~inliers" to "~indices", to be consistent with the rest of
the Filters in PCL_ROS.

-  sub_indices_filter_.subscribe (*pnh_, "inliers", max_queue_size_);
+  sub_indices_filter_.subscribe (*pnh_, "indices", max_queue_size_);

This means that we will need to update our launch scripts with the new release of point_cloud_perception. Naming it
"~inliers" until now was a mistake.

Cheers,
Radu.


Re: pcl_ros templating and 0.4 release

Radu B. Rusu
Administrator
The new changes to pcl_ros are complete. Here is what the compile times look like (machine specs irrelevant).

Before (-j1):
------
RelWithDebInfo + O3 : 3m55.611s
RelWithDebInfo      : 3m29.543s
Debug               : 2m30.485s

max memory usage: 1214M

After (-j1):
-----
RelWithDebInfo + O3 : 3m20.376s
RelWithDebInfo      : 2m48.064s
Debug               : 2m0.452s

max memory usage: 978M

With -j2 you can get down to 1m8.151s, and with -j4 to 0m42.846s. I think that's decent considering that a lot of things
get compiled there.
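
To put the -j1 numbers above in relative terms, the timings can be converted to seconds and compared. The sketch below
is only an illustration; parse_minutes_seconds and percent_reduction are helper names invented for this example, and
the parser assumes the "XmY.YYYs" format used in the listing.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a duration such as "3m55.611s" into seconds.
double parse_minutes_seconds(const std::string& text) {
  int minutes = 0;
  double seconds = 0.0;
  // sscanf returns the number of fields it converted; both are required.
  if (std::sscanf(text.c_str(), "%dm%lfs", &minutes, &seconds) != 2) {
    throw std::invalid_argument("expected a duration like 3m55.611s");
  }
  return 60.0 * minutes + seconds;
}

// Hypothetical helper: percentage by which `after` is smaller than `before`.
double percent_reduction(double before, double after) {
  return 100.0 * (before - after) / before;
}
```

Applied to the listing, percent_reduction(parse_minutes_seconds("3m55.611s"), parse_minutes_seconds("3m20.376s")) comes
out at about 15% for RelWithDebInfo + O3, and the same calculation gives about 20% for both RelWithDebInfo and Debug.
The peak memory drop from 1214M to 978M is about 19%.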

It's not magic, but it's an improvement. I'll try one more thing: more template specializations in PCL, and see if that
improves things even more.

Though PCL is full of unit tests, PCL_ROS doesn't have any yet. This means two things: 1) we need to start building them
if we want PCL_ROS to stabilize; 2) with the new changes, we might have introduced a bug or two (though hopefully none).
I'll do my best to test the new code within the next couple of days, but I would also appreciate any help that I can get.

Cheers,
Radu.

