Experience with ADI-FDTD techniques on the Cray MTA supercomputer

Harry F. Jordan; Shahid Bokhari; Shawn Staker; Jon R. Sauer; Mona A. ElHelbawy; Melinda J. Piket-May

doi:10.1117/12.434878

27 July 2001

Proceedings Volume 4528, Commercial Applications for High-Performance Computing; (2001) https://doi.org/10.1117/12.434878
Event: ITCom 2001: International Symposium on the Convergence of IT and Communications, 2001, Denver, CO, United States

Abstract

Finite-difference time-domain (FDTD) simulations are important to the design cycle for optical communications devices. High spatial resolution is essential, and the Courant condition limits the time step, so the problem requires a level of high-performance computing usually available only at a remote center, while model definition and result visualization can be done locally. Recent application of the alternating direction implicit (ADI) method to FDTD removes the Courant condition, promising larger time steps and meaningful turnaround for simulations. At each time step, tridiagonal systems of equations are solved along single dimensions of a 3D problem, but all three dimensions are involved in each time step. Thus, for a distributed-memory multiprocessor, no partition of the data prevents tridiagonal systems from crossing processors unless the data are remapped every time step. Likewise, for cache-based or vector computers, tridiagonal systems are accessed with a stride of NxN at every time step for an NxNxN grid. There is plenty of parallelism, however, because NxN tridiagonal systems can be solved simultaneously. This makes the problem well suited to a machine like the Cray multithreaded architecture (MTA), which has a large, flat memory and uses parallelism to hide memory latency. A Cray MTA implementation of the ADI-FDTD code executes serial tridiagonal solvers in parallel on multiple threads and successfully hides memory latency, achieving just over one floating-point operation per clock cycle per processor for a 200x200x200 grid on an 8-processor system at the San Diego Supercomputer Center. The 8-processor speed is 2.06 Gflop/s and the efficiency is 98%. Comparing one MTA processor with a 250 MHz clock to a 500 MHz Alpha processor, the MTA is three times as fast for a 50x50x50 grid. A vectorized version of the code run on one Cray T90 processor is three times faster than one MTA processor for a 100x100x100 grid.
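The access pattern described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MTA code: the function names, the flat row-major storage layout, and the constant coefficients (-1, 4, -1) are assumptions chosen for illustration, and the coefficients are not the actual ADI-FDTD update coefficients. The sketch stores an NxNxN grid in one flat array, so lines along the three dimensions have strides 1, N, and NxN, and it runs a serial Thomas-algorithm solver once for each of the NxN independent lines along the chosen dimension.

```python
def thomas_strided(a, b, c, d, x, start, stride, n):
    """Solve one tridiagonal system along a strided line of a flat array.

    a, b, c: sub-, main-, and super-diagonal (length n; a[0], c[n-1] unused).
    d: flat right-hand side; x: flat output; both indexed start + i*stride.
    """
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[start] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[start + i * stride] - a[i] * dp[i - 1]) / m
    x[start + (n - 1) * stride] = dp[n - 1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[start + i * stride] = dp[i] - cp[i] * x[start + (i + 1) * stride]


def line_starts(n, axis):
    """Return (stride, starts) for the n*n independent lines along an axis."""
    strides = (1, n, n * n)                    # assumed flat layout
    s1, s2 = [s for k, s in enumerate(strides) if k != axis]
    starts = [p * s1 + q * s2 for p in range(n) for q in range(n)]
    return strides[axis], starts


def solve_along_axis(a, b, c, d, n, axis):
    """Solve all n*n tridiagonal systems along one axis of an n^3 grid."""
    stride, starts = line_starts(n, axis)
    x = [0.0] * (n ** 3)
    for start in starts:                       # each iteration is independent
        thomas_strided(a, b, c, d, x, start, stride, n)
    return x


def apply_along_axis(a, b, c, x, n, axis):
    """Multiply by the tridiagonal matrix along one axis (used for checking)."""
    stride, starts = line_starts(n, axis)
    d = [0.0] * (n ** 3)
    for start in starts:
        for i in range(n):
            k = start + i * stride
            d[k] = b[i] * x[k]
            if i > 0:
                d[k] += a[i] * x[k - stride]
            if i < n - 1:
                d[k] += c[i] * x[k + stride]
    return d


if __name__ == "__main__":
    n = 4
    a, b, c = [-1.0] * n, [4.0] * n, [-1.0] * n   # illustrative coefficients
    x_true = [float((7 * i) % 5) for i in range(n ** 3)]
    for axis in range(3):
        d = apply_along_axis(a, b, c, x_true, n, axis)
        x = solve_along_axis(a, b, c, d, n, axis)
        err = max(abs(u - v) for u, v in zip(x, x_true))
        print("axis", axis, "stride", line_starts(n, axis)[0], "max error", err)
```

The loop in `solve_along_axis` is sequential here, but no iteration depends on another, which is the parallelism the abstract refers to: each line could be given to its own thread while the solver within a line stays serial.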

Citation

Harry F. Jordan, Shahid Bokhari, Shawn Staker, Jon R. Sauer, Mona A. ElHelbawy, and Melinda J. Piket-May "Experience with ADI-FDTD techniques on the Cray MTA supercomputer", Proc. SPIE 4528, Commercial Applications for High-Performance Computing, (27 July 2001); https://doi.org/10.1117/12.434878
