The extended Kalman filter is one of the most widely used techniques for state estimation of nonlinear systems. Its
two steps, forecast and data assimilation, involve many matrix operations, including multiplication and inversion.
As recent graphics processing units (GPUs) have been shown to provide substantial speedup in matrix operations, we
explore in this work a GPU-based implementation of the extended Kalman filter. The Compute Unified Device Architecture
(CUDA) on Nvidia GeForce GTX 590 GPU hardware is used for comparison with a single-threaded CPU
counterpart. Experiments were conducted on typical large-scale over-determined systems with thousands of components
in states and measurements. Within the GPU memory limit, a speedup of 1386x is achieved for a system with
5000 measurement components and 3750 state components. The speedup profile for various
combinations of measurement and state sizes serves as a useful reference for future implementations of the extended
Kalman filter in real large-scale applications.
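
To make concrete where the matrix multiplications and the inversion arise in the two steps, the following is a minimal CPU-side sketch of one filter cycle. It assumes the standard textbook form of the extended Kalman filter, which the text above does not spell out, and it uses NumPy rather than CUDA. The function name ekf_step, the argument names, and the toy model are illustrative assumptions, not the implementation evaluated in this work.

```python
import numpy as np


def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
    """One EKF cycle in textbook form: forecast, then data assimilation.

    All names here are hypothetical and chosen for illustration only.
    """
    # Forecast: propagate the state and its covariance through the model.
    x_pred = f(x)
    F = F_jac(x)                          # Jacobian of f at the current estimate
    P_pred = F @ P @ F.T + Q              # dense matrix multiplications

    # Data assimilation: correct the forecast with the measurement z.
    H = H_jac(x_pred)                     # Jacobian of h at the forecast
    S = H @ P_pred @ H.T + R              # innovation covariance (m x m)
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain; the matrix inversion
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(x.size) - K @ H) @ P_pred
    return x_new, P_new


# Toy over-determined problem: 4 measurement components, 3 state components.
dt = 0.1
H_lin = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0]])

x0 = np.zeros(3)
P0 = np.eye(3)
z = H_lin @ np.array([0.5, -0.2, 0.1])    # noise-free synthetic measurement

x1, P1 = ekf_step(
    x0, P0, z,
    f=lambda x: x + dt * np.sin(x),                       # mildly nonlinear model
    F_jac=lambda x: np.eye(3) + dt * np.diag(np.cos(x)),  # its Jacobian
    h=lambda x: H_lin @ x,                                # linear measurement map
    H_jac=lambda x: H_lin,
    Q=1e-4 * np.eye(3),
    R=1e-2 * np.eye(4),
)
```

In this sketch the cost is dominated by the products forming P_pred and S and by the inversion of the m x m matrix S, which are the operations a GPU implementation would be expected to accelerate. In practice a linear solve is often preferred over an explicit inverse; the explicit call is kept here only to mirror the operations named above.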