Magnetic resonance (MR) image reconstruction has reached a bottleneck where further speed improvement from the algorithmic perspective is difficult. However, some clinical practices such as real-time surgery monitoring demand faster reconstruction than what is currently available. For such dynamic imaging applications, radial sampling in k-space (i.e., projection acquisition) has recently revived owing to its fast image acquisition, relatively good signal-to-noise ratio, and better resistance to motion artifacts compared with the conventional Cartesian scan. Concurrently, using the graphics processing unit (GPU) to improve algorithm performance has become increasingly popular. In this paper, an efficient GPU implementation of the fast Fourier transform (FFT) is first described in detail, since the FFT is an important part of virtually all MR image reconstruction algorithms. We then evaluate the speed and image quality of GPU implementations of two reconstruction algorithms suited to projection acquisition. The first is the look-up table based gridding algorithm; the second is the filtered backprojection method widely used in computed tomography. Our results show that the GPU implementation is up to 100 times faster than a conventional CPU implementation with comparable image quality.
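As a point of reference for the gridding step mentioned above, the following is a minimal CPU sketch of non-Cartesian gridding reconstruction in Python/NumPy. It is not the look-up table based GPU implementation described in the paper: the triangular interpolation kernel, the grid size, and the omission of density compensation and deapodization are simplifying assumptions made here purely for illustration.

```python
import numpy as np

def grid_radial_kspace(samples, kx, ky, grid_size=256, kernel_width=4):
    """Simplified gridding: accumulate non-Cartesian k-space samples onto a
    Cartesian grid with a separable triangular kernel, then inverse FFT.
    A practical implementation would use a Kaiser-Bessel kernel, density
    compensation, and deapodization."""
    grid = np.zeros((grid_size, grid_size), dtype=np.complex128)
    # Map k-space coordinates in [-0.5, 0.5) to grid index coordinates.
    gx = (kx + 0.5) * grid_size
    gy = (ky + 0.5) * grid_size
    half = kernel_width / 2.0
    for s, x, y in zip(samples, gx, gy):
        for ix in range(int(np.floor(x - half)), int(np.ceil(x + half)) + 1):
            for iy in range(int(np.floor(y - half)), int(np.ceil(y + half)) + 1):
                if 0 <= ix < grid_size and 0 <= iy < grid_size:
                    # Separable triangular interpolation weight.
                    wx = max(0.0, 1.0 - abs(ix - x) / half)
                    wy = max(0.0, 1.0 - abs(iy - y) / half)
                    grid[iy, ix] += s * wx * wy
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid)))
```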
Quantization of the accumulated diffused error (ADE) is an effective means to reduce on-chip storage in a hardware implementation of error diffusion. A simple uniform quantizer can yield a factor of 2 savings with no apparent loss in image quality. Nonuniform quantizers with memory that depend on the quantizer index or various features13 can yield even greater savings -- up to a factor of 4, with essentially no loss in image quality. However, these quantizers depend on the trainability of the tone-dependent error diffusion (TDED) framework to achieve this level of quality. In addition, the design of the quantizers must be coupled to that of the TDED parameters in either a sequential or iterative fashion.
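As an illustration of the kind of uniform quantizer described above, the sketch below rounds each buffered error to a coarser step so that fewer bits are needed per stored value. The function name, the step sizes, and the 8-bit error range are assumptions made here for illustration; the actual quantizer design in the paper is coupled to the TDED parameters.

```python
import numpy as np

def quantize_ade(error, step=2.0):
    """Round an accumulated diffused error to the nearest multiple of `step`.

    With 8-bit errors, step=2 halves the number of representable levels
    (roughly one bit saved per buffered value); step=4 saves two bits.
    """
    return step * np.round(error / step)

# Example: a few floating-point errors stored with half the levels.
errors = np.array([-93.4, -1.2, 0.7, 64.9])
print([quantize_ade(e) for e in errors])
```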
Error diffusion1–3 is a popular halftoning algorithm extensively used in digital printing. It renders different tone levels by adaptively modulating the local dot density. Moreover, because of its random dot placement, error diffusion is free of Moiré artifacts when rendering an image with strong periodic components. This makes it very attractive for rendering scanned images, which often have strong embedded periodic screen frequencies.

However, one potential drawback of error diffusion for high-speed printing applications is its computational load. Unlike screening algorithms,4,5 which only require one threshold operation per pixel, error diffusion must also compute and diffuse the filtered pixel errors to the neighboring pixels.
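To make the per-pixel cost concrete, here is a minimal serial error diffusion loop in Python/NumPy. The classic Floyd-Steinberg weights are an assumption used only for illustration; the algorithms discussed in this work use tone-dependent weights and thresholds instead.

```python
import numpy as np

def error_diffuse(img):
    """Floyd-Steinberg error diffusion for a grayscale image in [0, 255].

    Each pixel requires a threshold operation plus a weighted diffusion of
    its quantization error to four unprocessed neighbors, which is the extra
    computational load relative to point-wise screening."""
    work = img.astype(np.float64).copy()
    out = np.zeros_like(work)
    h, w = work.shape
    for y in range(h):
        for x in range(w):
            old = work[y, x]
            new = 255.0 if old >= 128.0 else 0.0
            out[y, x] = new
            err = old - new
            # Standard Floyd-Steinberg weights (7, 3, 5, 1) / 16.
            if x + 1 < w:
                work[y, x + 1] += err * 7 / 16
            if y + 1 < h and x > 0:
                work[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:
                work[y + 1, x] += err * 5 / 16
            if y + 1 < h and x + 1 < w:
                work[y + 1, x + 1] += err * 1 / 16
    return out.astype(np.uint8)
```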
In practice, it may be desirable to implement error diffusion in parallel to speed up the computation. One scenario is shown in Figure 1. The input image is first split equally into four stripes. Each image stripe is then fed to a DSP chip programmed to run error diffusion. Each DSP chip runs error diffusion independently, without synchronization or communication between processors. The halftone outputs from the four DSP chips are finally merged to form the whole halftone image. While this can speed up the algorithm by a factor of four, one potential problem with this parallel implementation is that dot clusters or holes can be very visible along the stripe boundaries in the merged halftone image. This is because the pixel error cannot be diffused across the stripe boundary, so the "blue noise" characteristics of the halftone texture are destroyed near the stripe boundaries. These artifacts are most visible in midtone areas, and somewhat less visible in the shadow areas. In highlight areas the dots are sparse, so these boundary artifacts are much less visible.
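The striped parallelization of Figure 1 can be sketched as follows, reusing the error_diffuse routine from the previous snippet. Running the stripes in separate worker processes stands in for the four independent DSP chips, and horizontal stripes are assumed here; because no error crosses a stripe boundary, the seams exhibit exactly the boundary artifacts described above.

```python
import numpy as np
from multiprocessing import Pool

def halftone_in_stripes(img, n_stripes=4):
    """Split the image into horizontal stripes, halftone each stripe
    independently (one worker per stripe, standing in for one DSP chip
    each), and merge the results.  No error is diffused across a stripe
    boundary, so dot clusters or holes may appear along the seams."""
    stripes = np.array_split(img, n_stripes, axis=0)
    with Pool(processes=n_stripes) as pool:
        halftoned = pool.map(error_diffuse, stripes)  # error_diffuse: previous sketch
    return np.vstack(halftoned)

if __name__ == "__main__":
    # Smooth ramp test image; boundary artifacts are most visible in midtones.
    ramp = np.tile(np.linspace(0, 255, 512), (512, 1))
    halftone = halftone_in_stripes(ramp)
```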
Li and Allebach recently proposed parameter-trainable tone-dependent error diffusion (TDED), which yields outstanding halftone quality among error-diffusion-based algorithms. In TDED, the tone-dependent weights and thresholds, as well as a halftone bitmap for threshold modulation, are implemented as look-up tables (LUTs), which consume on-chip memory. In addition, the diffused errors must be buffered in on-chip memory and, in most cases, transferred to off-chip memory. However, off-chip memory access considerably degrades system performance. In this paper, we propose two approaches to improve memory efficiency. First, we use deterministic bit flipping to replace threshold modulation and linearize the weights and thresholds of TDED. This reduces the memory requirement to a few constants, rather than full LUTs, and generates halftones whose quality is nearly indistinguishable from that of standard TDED. Second, we propose a block-based processing strategy that significantly reduces off-chip memory access. We devise a novel scan path that enables our algorithm to process any input image block by block without yielding block-boundary artifacts. Special filters are designed and optimized for the block diagonals so that the resulting halftone quality is comparable to that of standard TDED.
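The memory argument behind linearizing the tone-dependent parameters can be illustrated schematically as follows. The slope and intercept values are placeholders, not the optimized constants from the paper, and the deterministic bit flipping rule is omitted; the point is only that a linear parameterization replaces a full 256-entry LUT with a handful of constants.

```python
import numpy as np

# Standard TDED stores one set of diffusion weights (and thresholds) per
# input tone in look-up tables: here, 256 tones x 4 weights of on-chip memory.
NUM_TONES = 256
weight_lut = np.zeros((NUM_TONES, 4))  # full LUT (schematic, unpopulated)

# Linearized variant: each weight is a linear function of the input tone,
# so only two constants per weight (slope and intercept) need to be stored.
slopes = np.zeros(4)                                 # placeholder coefficients
intercepts = np.array([7.0, 3.0, 5.0, 1.0]) / 16.0   # placeholder: Floyd-Steinberg values

def linearized_weights(tone):
    """Return four diffusion weights for an input tone in [0, 255]."""
    w = slopes * (tone / 255.0) + intercepts
    return w / w.sum()  # keep the weights normalized
```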