Yesterday I discussed a problem with processing interdependent entities with TBB. This doesn’t just pertain to simulations and to the execution_graph, but in general. There is no provided high-level construct which allows you to easily process elements of a range with interdependencies. Let me see if I can say this more clearly, say I have to process a collection of entities, but I also have to reprocess some of those entities over again in the algorithm. Perhaps I am processing 2D cells on a map, and have to complete a second pass to process the boundaries. This could certainly be done with the task_scheduler, and likely could be done with just the range and partitioner.
I am interested in a high level construct which would allow me to partition a set of elements into units which could be processed in parallel. This construct should then allow me to partition the set again to make another pass, and so on until I deem that the processing is completed. In this way, the problems discussed in Part 3 of my blog series on the execution_graph could be resolved. I know I promised code yesterday, but I’ll just show an idea today.
In order to ensure I’m conveying my ideas effectively, I’ve prepared some diagrams. Suppose I have a set of elements to process in some manner. I’ll represent this set as blobs of ink on a digital page, as shown below.

In order to process these elements, in the first pass I might partition the space into three segments. I do not have to process the entire space at once, just the segments which interest me. I’ve drawn lines around the elements I want to process. These three partitions should be processed in parallel.

Again after processing these segments I might not have completed my task. Next I will partition the space into another three segments which can be processed independently. Note that this time one of the segments only contains a single element to process, this is fine and entirely possible.

For the last step, I will partition the space one final time into two large segments. Once again these two segments will be processed in parallel.

I call the construct which will support this (tentatively), parallel_set_for. This conveys that we have a set of elements which are to be processed. We process subsets of this larger set independently, then partition the larger space into subsets again for further processing until the task is completed (as opposed to partitioning the subsets into smaller ones).
In contrast with the TBB parallel_for construct which simply divides elements for individual processing, the parallel_set_for construct might not even process all elements. Some might not affect computation so they might be omitted. Furthermore, the parallel_for construct is designed to divide work to keep all processors busy and simply complete after all elements have been processed. The parallel_set_for construct would proceed in steps, where independent subsets are selected and then processed.
In comparison to parallel_for, this processing method could form a bottleneck. If partitions are not selected properly, effort might be wasted because some partitions could have been processed in a previous step. In addition, a single element which is being processed might hold up multiple independent partitions which could process next without affecting results. I think that this could be compensated for with good interface design.
Well, that is what I’ve thought up so far. I welcome your comments. This is not a fully thought out idea, just an idea which sprung to mind today that I wanted to share with you. Good night!