Presently, YetiSim uses boost::shared_ptr for all pointer operations. Profiling has revealed that incrementing and decrementing reference counts accounts for 50% of runtime. Smart pointers provide exception safety and result in fairly safe code with minimal effort. Unfortunately, the runtime cost has proven to be rather high. It could be argued that this is the cost of safety, but execution time is also highly important.
A few people have been surprised that the runtime cost of shared_ptr could be so high. The reality is that YetiSim is a specialized application: most of the work it performs consists of moving pointers around within internal data structures as simulation state changes, and this accounts for the high cost. Each copy of a shared_ptr shares ownership of the pointed-to object, so the object is not destroyed while any copy still exists. That shared ownership is not strictly required in YetiSim, so the performance cost buys no corresponding advantage.
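To make the cost concrete, here is a minimal sketch of the difference between queues that hold shared_ptr copies and queues that hold non-owning raw pointers. It uses std::shared_ptr as a stand-in for boost::shared_ptr (the reference-counting behaviour is the same for this purpose), and the names Entity, EntityStore, SharedQueue, and RawQueue are hypothetical, not YetiSim types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a simulation entity; not a YetiSim type.
struct Entity {
    int id;
};

// Assumed design: one owning container keeps every entity alive for the run.
using EntityStore = std::vector<std::unique_ptr<Entity>>;

// Shared-ownership queue: each stored copy holds one reference.
using SharedQueue = std::vector<std::shared_ptr<Entity>>;

// Non-owning queue: storing a raw pointer touches no reference count.
using RawQueue = std::vector<Entity*>;

// Places the same entity in n queue slots and reports the reference count.
long shared_count_after(std::size_t n) {
    std::shared_ptr<Entity> e = std::make_shared<Entity>(Entity{1});
    SharedQueue q;
    for (std::size_t i = 0; i < n; ++i) {
        q.push_back(e);  // copying the shared_ptr increments the count
    }
    return e.use_count();  // one reference per slot, plus e itself
}

// Moves a non-owning pointer from one queue to another, as a scheduler
// might when an entity changes state. Ownership stays with the store.
std::size_t move_raw(EntityStore& store) {
    store.push_back(std::unique_ptr<Entity>(new Entity{2}));
    RawQueue pending;
    pending.push_back(store.back().get());
    RawQueue running;
    running.push_back(pending.back());  // plain pointer copy, no counting
    pending.pop_back();
    return running.size();
}
```

In the shared version, every copy into a queue and every later destruction of that copy updates the reference count. In the raw version, the store alone decides when an entity is destroyed, which is only safe if no queue outlives the store.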
Strictly speaking, YetiSim would work just fine without revising the design to reduce shared_ptr usage. Another hidden disadvantage of shared_ptr may be how it affects parallelism. It may be that, internally, a shared_ptr needs to lock a mutex when it is used, since otherwise its reference count might be corrupted by threaded code. I’m not sure about the internals of shared_ptr in threaded code, but this issue was hinted at on #boost. I feel that the potential performance gains from redesigning the classes which use shared_ptr justify the attention that such a redesign would require.
I would like to hide the use of real pointers within YetiSim, so that users do not see the added complexity. The use of pointers in C++ is not a super-advanced concept, but I want to provide as clean an interface to users as possible. The target users of YetiSim will not be C++ gurus; they will be people who just need to write a simulation. Thus, where possible, complexity has been shifted to YetiSim rather than to the library's users.
Another bottleneck that has not yet shown itself, but surely is present, is the allocation of TaskContext objects used during parallel runs. The TaskContext structures are used by tbb::parallel_reduce for the join step, in which changes to the MasterScheduler are merged together. These structures have a clear() member function which resets them for later use, but presently they are deleted rather than reused. It would be better to keep TaskContext objects in a pool, so that they can be allocated in larger chunks and reused instead of being created and deleted each time.
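A pool along these lines might look like the following minimal sketch. The TaskContext here is a hypothetical stand-in (only the clear() member comes from the description above), and TaskContextPool with its acquire() and release() functions is an invented name, not existing YetiSim code. The sketch is deliberately not thread-safe; a pool used from concurrently running tbb::parallel_reduce bodies would need a lock or one pool per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for YetiSim's TaskContext; only clear() is
// taken from the text, the member data is a placeholder.
struct TaskContext {
    std::vector<int> changes;
    void clear() { changes.clear(); }
};

// Minimal chunked pool (assumed design, not thread-safe).
class TaskContextPool {
public:
    explicit TaskContextPool(std::size_t chunk = 16)
        : chunk_(chunk ? chunk : 1) {}

    // Hands out a cleared context, allocating a new chunk when empty.
    TaskContext* acquire() {
        if (free_.empty()) {
            grow();
        }
        TaskContext* ctx = free_.back();
        free_.pop_back();
        return ctx;
    }

    // Resets the context and makes it available for reuse.
    void release(TaskContext* ctx) {
        ctx->clear();
        free_.push_back(ctx);
    }

    std::size_t allocated() const { return chunks_.size() * chunk_; }
    std::size_t available() const { return free_.size(); }

private:
    // Allocates chunk_ contexts in one block and adds them to the free list.
    void grow() {
        chunks_.push_back(
            std::unique_ptr<TaskContext[]>(new TaskContext[chunk_]));
        TaskContext* base = chunks_.back().get();
        for (std::size_t i = 0; i < chunk_; ++i) {
            free_.push_back(base + i);
        }
    }

    std::size_t chunk_;                                   // contexts per chunk
    std::vector<std::unique_ptr<TaskContext[]>> chunks_;  // owns the storage
    std::vector<TaskContext*> free_;                      // currently unused
};
```

The pool owns all storage through chunks_, so contexts are freed only when the pool itself is destroyed. Code that would otherwise delete a TaskContext after the join step would call release() instead.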
It may also be prudent to build a pooling mechanism for the simulation entities themselves, in case a large number of simulation entities are created and destroyed at runtime. This would require the user to implement some interfaces directly, so it can wait for a while. There may well be better ways to do this anyway. This is an issue that I will examine more closely, but not for a little while.