next up previous [pdf]

Next: GPGPU Up: CPU Previous: Property compression

Cache-oblivious approaches

Cache optimization techniques, such as finding the optimal sub-domain for domain decomposition, are very CPU-specific. Cache-oblivious approaches attempt to find a solution that is independent of cache size, resulting in a minimum number of cache misses. These approaches generally use a divide and conquer scheme. Imagine the 1-D version of algorithm 2 and say we want to calculate the 10th position at $x$ at the 5th time sample $s(10,5)$. Ignoring velocity for a minute, we need $s(9:11,4)$ and $s(10,3)$. To obtain $s(9:11,4)$, we need $s(8:12,3)$ and $s(9:11,2)$. To obtain $s(8:12,3)$ we need $s(7:13,2)$ and $s(8:12,1)$. This concept is demonstrated in Figure 3. In addition, for all of these calculations, we will need $v(8:12)$. If we then do the progression of calculating $s(8:12,3)$, the L1 cache will have everything we need to update $s(9:11,4)$ and $s(10,5)$ without suffering a single cache miss.

oblivious
oblivious
Figure 3.
The concept behind oblivious cache. By reading all of the values at the bottom at the bottom of each pyramid the solution can be updated at all the location at the second level of the pyramid. The second level provides all that is needed to update the third level. Finally all the elements on the third level provide everything that is need to create the element at the top of the pyramid. This technique minimizes cache misses by reformulating the algorithm to maximize data reuse.[NR]
[pdf] [png]


next up previous [pdf]

Next: GPGPU Up: CPU Previous: Property compression

2009-10-16