We address the discrepancy between the low arithmetic complexity of nonuniform fast Fourier transform (NUFFT) algorithms and their high latency in practical use with large data sets, especially in multi-dimensional domains. The execution time of a NUFFT can be two orders of magnitude longer than its arithmetic complexity predicts. We examine the architectural factors behind this latency, primarily the uneven distribution of latency in memory references across the levels of the memory hierarchy. We then introduce an effective approach that reduces the latency substantially by exploiting the geometric features of the sample translation stage and making memory references local. The restructured NUFFT algorithms compute efficiently both sequentially and in parallel. We present experimental results and the improvements obtained for radially encoded magnetic resonance image reconstruction.