YetiSim Blog

Blogs about simulation and developing YetiSim.

Archive for the 'YetiSim' Category

Complete State Machine Redesign

I first created the state machine construct for YetiSim in August 2007. Since that time, I’ve learned more about C++ and Threading Building Blocks. I’ve also had time to think about the design I chose, and read posts on the TBB forums from others who could benefit from a state machine construct too. I have realized that the state machine construct could be generalized for others to use, and that YetiSim could benefit from the separated component too. I would hope that if others find the state machine useful they would help write test cases, and improve the state machine.

I have undertaken the task of completely redesigning the state machine construct for contribution to official TBB.  After this has been done, YetiSim should be reduced to a few template arguments to the new state machine construct.  This also permits me to focus on the simulation aspects of YetiSim, and not on the implementation details of the state machine since it is packaged as a separate project.

I will be blogging here with more details as the redesign progresses.  Stay tuned for more updates.


Posted by AJ Guillon  (January 17, 2008)    |    Comments (0)

YetiSim 0.2.0 Released: Major Speed Improvements

Today I committed a substantial number of changes to YetiSim to the SVN repository, and have incremented the release version to 0.2.0.  This release boasts a 2X - 3X speed increase over 0.1.0.  You can download this snapshot from the home page.

A number of interfaces both at the user and internal library level have been changed.  The use of boost::shared_ptr has been entirely eliminated due to performance reasons.  The MemoryManager class has been introduced, and all memory management operations are confined to this class.  Admittedly, this class is fairly simple right now but that is because the tbb::scalable_allocator already does pooling for us, and probably does a pretty good job of memory management already.

This release is a step backward in terms of exception safety, however this will be addressed soon.  Enjoy the new and improved YetiSim.  Further performance improvements are on their way, so stay tuned.


Posted by AJ Guillon  (December 24, 2007)    |    Comments (0)

Pointer and Reference Semantics

I have been refactoring YetiSim to move away from boost::shared_ptr to the new MemoryManager class, and replacing the shared_ptrs with references and pointers.  Where should references be returned or passed as arguments, versus pointers? What meaning should be inferred by the programmer based on this? For example, what does the prototype X& foo(Y& y) mean vs. X* foo(Y* y).  There must be a global standard for how to interpret pointers and references in prototypes in YetiSim, otherwise chaos will ensue.

After some discussions on #tbb, I decided to create and adhere to specific conventions within YetiSim.  These are the conventions for arguments:

  • References as constructor arguments:  A reference that is used within a constructor argument may be stored by the constructed class.  The class does not assume ownership of the referenced data, but may store an internal reference to it for future use.  Be careful of this situation for class destruction.
  • References as function arguments:  If a reference has been passed as an argument to a function, then the function is free to use that argument for the duration of the call.  Unless explicit in the documentation, the reference will not be saved for future use.  There should be very few, if any, cases of functions saving references for later use.
  • Pointers as a constructor argument: When a pointer is used in a constructor, ownership of the data is transferred to the constructed class.  This class is now responsible to calling appropriate MemoryManager functions to delete the data.  The pointer may naturally be saved for future use, however when passing the data to other functions or classes it should now be done by reference to convey ownership is not being passed.
  • Pointers as a function argument:  The function which has been called via a pointer assumes responsibility for ownership.  The function may save the pointer for future use, but the function (or class if a member function) is now responsible for deleting the data by calling appropriate MemoryManager member functions.

Similarly there are conventions for the return part of the function prototype.

  • References as a return type: If a reference is returned, then the object may be freely used, but ownership is not acquired.  The reference may be stored for future use, but this has to be done with care in case the reference disappears.  There must be a guarantee that the object cannot hold the reference if the referenced object disappears.  A good example of this, is  the ThreadOfExecution class.  It holds StateMachineExecution references, but part of the garbage collection of a StateMachineExecution is to delete the ThreadOfExecution objects.  Thus, a ThreadOfExecution could never reference a defunct StateMachineExecution object.
  • Pointers as a return type: This represents a transfer of ownership from the called function.  It means that the called function will no longer be responsible for the destruction of the object.  If you called the function that returned a pointer, you now own that memory.  You are now responsible for its destruction, or passing ownership to someone else.

Another complexity is the use of references to access an object that might have been destroyed after throwing an exception.  This complexity must be part of the exception handling strategy, and must be considered within the design itself.  For now, I ignore this problem because it will have to be dealt with in the exception handling strategy.

So in summary, a pointer means transfer of ownership.  A reference means that you can just use the object freely, storing references to it or otherwise so long as you know that it won’t be destroyed when you don’t intend for it to be.  These are the global semantics for YetiSim references and pointers.  If there are changes, I will edit this blog or post a comment.


Posted by AJ Guillon  (December 13, 2007)    |    Comments (0)

YetiSim State Machine Branch Prediction

A few years ago, I read about the Intel Itanium processor.  One of the ideas that stuck with me was branch prediction.  The concept in my head is likely different now than what the processor actually does, but the idea is what is important here.  When a branch is encountered if we guess an execution path, we can start processing as if we should proceed down that path.  If our guess is wrong, we take a minor hit in performance.  A correct guess could result in a substantial performance gain.

The YetiSim state machine has guarded transitions, that are a lot like branches.  If we are able to guess which branch we will proceed along, then we could begin processing as if that is the correct path.  This is an example of how YetiSim performance can improve by further refinement of the implementation, without affecting the user interface.  I like this capability within YetiSim, since I believe simulation users (like processor users) should not be overly affected by the new architecture.  Readers knowledgeable about the Itanium could certainly point out problems with my analogy right now, but let’s just stay focused on YetiSim.

Branch prediction within YetiSim has a natural mapping to the state machine execution.  Statistics would have to be gathered during execution to improve the predictions made.  Fortunately,  YetiSim will be bundled already with just the tools required.  This is not something that I would do however, I would have to ask someone with more statistics experience than me to build the models for the branch predictions.  This is something that I would like to experiment with, after YetiSim becomes more stable.  The goal is to construct a stable simulator right now, not to wander around with a never ending list of desired features.  Still, this is a glimpse of the possible future of YetiSim.

A more immediate use of the branch statistics, is to dynamically adjust the order of branch evaluation so that the list of guards to consider is sorted by most frequently taken transitions.  This will reap some benefits at minor cost, and is easily implemented.

I’m not entirely sure about the details of the branch prediction, I have some ideas that I will outline but will not discuss in detail right now.  One of the ideas I have been toying with, is to separate out data from YetiSim so that it could be stored in a database.  Alternatively, it could fall back to just ordinary memory allocated from the heap.  The main idea is that if users restricted themselves to using YetiSim data storage constructs, that data could be saved elsewhere.  The execution of state machines could be captured with a data snapshot, and a database which provides transaction support could be used.  Only the correct path is committed to the database, the other transactions are silently discarded.  I’m not a database expert, someone else would have to provide the details here… but this is a general idea which could be of use.  Unfortunately it has disadvantages too… classes must use YetiSim data storage constructs.

Anyways, here’s a glimpse of what I’m thinking about.


Posted by AJ Guillon  (December 12, 2007)    |    Comments (0)

Adjusting Interfaces and Introducing MemoryManager

A new class for memory management has been implemented for YetiSim, which is going to replace boost::shared_ptr.  This class has not yet been added to SVN.  I am migrating existing code to use the new MemoryManager singleton, and the design of the classes are being adjusted as I go along.  Rather than committing a broken YetiSim to trunk, or branching, I’ve decided to just do all the changes locally and introduce them all at once.

This represents a major change to YetiSim, since the class design is solidifying.  Real destructors are to be implemented, and new / delete calls are entirely confined to the MemoryManager class.  The library interface will consist entirely of references, while internal operations will use pointers or references.  The introduction of the MemoryManager has forced me to reconsider memory management for each class, and carefully document assumptions that have been made.  I read an analogy somewhere (maybe Meyers, maybe some docs, don’t remember) that dealing with memory is like dealing with toxic chemicals.  They are sometimes essential, however they must be treated with care.  I like this analogy, and I think about it a lot while I code.

Don’t get me wrong, boost::shared_ptr was very useful at the time, and perhaps my ignorance of Boost smart pointers is part of the reason I have migrated back to relatively simple references and pointers.  The use of smart pointers helped me create an initial version of YetiSim, and execute some simple simulations.  They were very useful for prototyping, and I feel that the overall quality of YetiSim has benefited from my experience with them.  But I also believe that they belong outside of the YetiSim library at a user level for now, and even then users must be careful of performance pitfalls.  The low overhead of boost::shared_ptr becomes astronomical in very large systems, with constant pointer operations.


Posted by AJ Guillon  (December 12, 2007)    |    Comments (0)

The scalable_allocator: Reducing Implementation Work

I decided last night that I needed to implement a memory pool for YetiSim to boost efficiency. What I did not know, was that Threading Building Blocks already hid some of the details of pooling within the scalable_allocator. A response on the TBB forum indicated that the scalable_allocator was already capable of pooling, and that perhaps I did not need to implement my own classes for this purpose.

I already knew about the scalable_allocator, but there were some details that I did not know. First, that it implemented any form of pooling at all. You can learn more from this link. Another point that I did not know, was that it could be used as the allocator within an STL container. I knew that the cache_aligned_allocator could do this, but I was unaware that the scalable_allocator was also usable. This brings up a question: Does the cache_aligned_allocator pool memory in the same manner as the scalable_allocator? I would imagine it does, but I will have to investigate to find out.

This greatly simplifies the design of my MemoryManager class, in fact the first implementation of the class will merely wrap create and delete calls. At a later point, other magic may be done, however for now if the scalable_allocator works as promised, my memory management will be very simple. I still feel that it is good design to separate out memory management concerns, in case adjustments are required later.


Posted by AJ Guillon  (December 7, 2007)    |    Comments (0)

Improving Memory Usage: The Proposed MemoryManager Class

As discussed in my previous blog, YetiSim’s memory management has formed a bottleneck.  I’m not entirely sure how to deal with this problem, so I’m considering abstracting the problem by creating a helper class called MemoryManager.  This would form  a common interface for memory management within YetiSim.  Hence future enhancements could be hidden behind a well defined interface.  Also, I think that issues of parallelism could be better handled in one place.

This would mean migrating from boost::shared_ptr to just regular old pointers.  This should be exception safe, because the MemoryManager is responsible for creating and deleting objects.  Hence, if an exception is thrown memory should not be lost because the MemoryManager class could be used to recover it, or to delete it if not required.  A memory pool could easily be implemented with the class, and the complexities of memory management could be entirely abstracted.

The MemoryManager would be implemented as a singleton class, just like the MasterScheduler.

Another possible advantage of abstracting memory management, is the migration of YetiSim to execution on clusters.  I’m not a cluster expert, however I know that managing memory that is spread across clusters would be different than managing the memory of a single computer.  One of the primary goals of YetiSim, is to be able to take a simulation from a single computer to execution on a large cluster.  This is a topic for another blog, but this longterm goal adds support to the argument for implementing a class to abstract the memory management.

I’m going to work on designing the memory management class tonight, and implement as much of it as possible.


Posted by AJ Guillon  (December 6, 2007)    |    Comments (0)

Smart Pointers, and Memory Bottlenecks

Presently, YetiSim uses boost::shared_ptr for all pointer operations. Code execution profiling has revealed that incrementing and decrementing reference counts accounts for 50% of runtime. Smart pointers provide exception safety, and result in fairly safe code with minimal effort. Unfortunately the runtime cost has proven to be rather high. It could be argued that this is the cost of safety, however execution time is also highly important.

A few people have been surprised that the runtime cost of shared_ptr could be so high, and the reality is that YetiSim is a specialized application. Pointers are moved around within internal data structures frequently as simulation state changes, and this accounts for the high cost. The majority of the work performed by YetiSim, is done by moving pointers around. Each copy of a shared_ptr shares ownership of the pointed object, so that it does not disappear. The shared ownership is not strictly required in YetiSim, hence the high performance cost without advantage.

Strictly speaking, YetiSim would perform just fine without revising the design to reduce shared_ptr usage. Another hidden disadvantage of shared_ptrs may be how they affect parallelism. It may be that internally, shared_ptrs require the lock of a mutex for their use, otherwise the shared_ptr might be corrupted by threaded code. I’m not sure about the internals of the shared_ptr in threaded code, however this issue was hinted at on #boost. I feel that the potential performance gains achieved by redesigning classes which use shared_ptr justify the attention that such a redesign would require.

I would like to hide the use of real pointers within YetiSim, so that users do not see the added complexity. The use of pointers in C++ is not a super-advanced concept, but I want to provide a clean interface to users as much as possible. The target users of YetiSim will not be C++ gurus, they will be people who just need to write a simulation. Thus where possible, complexity has been shifted to YetiSim rather than to the library users.

Another bottleneck that has not yet shown itself, but surely is present, is the allocation of TaskContext objects used during parallel runs. The TaskContext structures are used by tbb::parallel_reduce for the join step, in which changes to the MasterScheduler are merged together. These structures have a clear() member function which resets them for later use, however presently they are deleted rather than reused. It would be better for TaskContext objects to be kept in a pool, so that objects could be allocated in larger chunks for use, and also reused rather than creating and deleting them.

It may also be prudent to build a pooling mechanism for the simulation entities themselves, if a large number of simulation entities were to be created and destroyed at runtime. This would require some interfaces to be implemented by the user directly, so this can wait for a while. It may be there are better ways to do this anyways. This is an issue that I will examine closer, but not for a little while.


Posted by AJ Guillon  (December 5, 2007)    |    Comments (0)

Code Profiling - Where is YetiSim spending time?

VTune still does not work, but I’m working on that. In the mean time, Zach showed me how to use gprof. The results for a simulation run, of 100,000 clocks gave:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
14.99 17.35 17.35 137532699 0.00 0.00 boost::detail::atomic_exchange_and_add(int*, int)
12.22 31.49 14.15 114224959 0.00 0.00 boost::detail::atomic_increment(int*)
4.48 36.67 5.18 31662490 0.00 0.00 boost::detail::atomic_conditional_increment(int*)
4.06 41.37 4.70 std::bad_alloc::bad_alloc()
2.95 44.79 3.42 117077426 0.00 0.00 boost::detail::shared_count::~shared_count()
2.94 48.20 3.41 5342911 0.00 0.00 yeti::ThreadOfExecution::executeNodeLogic()
2.54 51.14 2.95 97465810 0.00 0.00 boost::detail::shared_count::shared_count(boost::detail::shared_count const&)
2.36 53.87 2.73 131417618 0.00 0.00 boost::detail::sp_counted_base::release()

The data is probably pretty messy to read, but the important finding is that the majority of execution time is spent on shared_ptr operations. This indicates that the internals of YetiSim have to be adjusted, so that as Zach suggests, we use const shared_ptr<T>& where possible.

I’ll be looking at the code in detail to determine how this problem can be addressed, and how performance can be improved. I suspect the solution will be to create containers which actually own the object, and use const references everywhere else. This solution would give the benefits of a shared_ptr, however it should help performance time.


Posted by AJ Guillon  (December 4, 2007)    |    Comments (0)

YetiSim Nightly Builds Available

YetiSim now has nightly builds available.  The latest code from trunk will always be available from http://www.yetisim.org/nightly/yetisim-nightly-build.tar.gz.  In addition, the source documentation at http://doxygen.yetisim.org will always reflect the documentation of the nightly build… that is until YetiSim has a stable release.


Posted by AJ Guillon  (November 30, 2007)    |    Comments (0)