Next: RTM Up: Clapp et al.: Hardware Previous: Clapp et al.: Hardware

Introduction

Over the last few years, it has become more difficult to assess which hardware delivers the most cost-effective solution for demanding imaging tasks. The days of ever-faster CPUs are over. What was once a clear choice of hardware has given way to many-core technologies and a proliferation of alternatives. In particular, accelerators such as General-Purpose Graphics Processing Units (GPGPUs) and Field Programmable Gate Arrays (FPGAs) have emerged as strong contenders for the hardware platform of choice. Because these architectures differ so radically, it has also become increasingly difficult to evaluate which platform is optimal for a given application, and an apples-to-apples comparison is no longer possible. Using reverse time migration (RTM) as an example, we demonstrate that a reasonable assessment of the alternatives available today requires careful optimization for each platform, with the involvement of hardware, computer-science, and algorithm specialists.

At the core of the RTM algorithm is a modeling kernel. The simplicity of this kernel has led to high-performance implementations on Field Programmable Gate Arrays (FPGAs) (Nemeth et al., 2008), General-Purpose Graphics Processing Units (GPGPUs) (Micikevicius, 2008), and conventional processors. There has been significant debate about which platform produces the most efficient code, both in terms of runtime alone and when code maintainability is factored in. One of the greatest difficulties in comparing these three platforms is that, while a naive, unoptimized RTM algorithm could be implemented on each platform and compared directly, an optimized version differs significantly from platform to platform because of the strengths and weaknesses of each architecture.

In this paper, we discuss some of the algorithmic and optimization decisions involved in implementing RTM on each platform, and we show how these choices are often tied directly to the characteristics of the underlying architecture. We begin by describing the basic RTM algorithm. We then discuss the strengths and weaknesses of each platform for RTM, along with the algorithmic approaches and optimization techniques suited to each. We conclude by discussing which general lessons can be drawn from the three implementation approaches and to what degree the implementations are architecture-specific.


2009-10-16