My parallel programming research has culminated with the design of a few elementary data structures, each of which will be implemented optimally for specific parallel hardware. Each data structure interface will operate on the same data, and the parallel hardware to use for execution is selected at runtime. This design provides users with a high level abstraction which is guaranteed to execute efficiently at a low level. The next step is to prove whether or not the result of composing these data structures in arbitrary ways is still efficient.
The basic data structures I have provided are:
- Tuples
- Lists (with a twist!)
- Unordered Collection
- 1D, 2D, 3D collections
Tuples:
Tuples are provided in Python, and are also a concept provided in functional programming languages. They are an ordered collection with a fixed size, and are conceptually similar to mathematical vector objects. My tuples are homogeneous, that is all elements are of the same type. Large collections of tuples are ideally processed using vector operations. Tuples are not as generic as lists, because they have a fixed size.
Lists (with a twist!):
Lists are a key concept in most languages, and are perhaps the most elementary of data constructs. In C arrays provide a list concept, although of a fixed size. Python and functional languages such as Haskell provide lists directly. The equivalent in C++ is the vector template class. So why bother creating another list representation when my underlying language provides it? Simple: my lists correspond directly to cache lines. I do not provide the concept of random insert, cache lines (I mean, lists) are too simple for that. Instead you can delete something from memory, at which point the list will mark that area of memory as vacant (I can use a bitmask to keep track of what’s empty). The use of these lists should provide a good indicator of the cache behavior.
Unordered Collection:
Ordering is such a sequential thing to do! If I don’t care about the order of elements within a collection, I can just insert an element into some random position. I can construct an unordered collection by creating a bucket to hold objects, and use thread local storage to give each thread a sub-bucket. Each thread can then insert elements into its local bucket, and the full container is the union of all local storages. This type of container is useful when I’m constructing something like a pointer container. The only reason for the use of a collection in a pointer container is to hold references for RAII, so there is really no need for this container to have any ordering.
1D, 2D, and 3D Variables:
These are funny variables that are meant to ease real programming. How many times have you tried to do something in 3D, and got confused with indices? What about parallelizing table accesses? Threading Building Blocks provides the blocked_range2d template, but it would be nice to have a native type that it works with simply. I am providing 1D, 2D and 3D variables to allow the user to supply my parallel programming library with their intent. The use of these types indicate to the library that a collection is more than just an array of arrays of arrays, instead it indicates that this is a dimensional collection.
I’ve outlined briefly the data constructs which I am providing in my parallel library as primitives. Advanced data structures will be implemented as compositions of these types. Ideally the composition of these types will be just as efficient and parallelizable as the types themselves. I’d of course appreciate feedback.