To support a broad dynamic range and a high degree of precision, many of 3D rendering's fundamental algorithms have traditionally been performed in floating-point arithmetic. However, fixed-point representation is preferable to floating-point representation in graphics applications on embedded devices, where performance is of paramount importance while the dynamic range and precision requirements are limited by the small display sizes (current PDA displays are 640 × 480 (VGA), and cell-phone displays are even smaller). In this paper we analyze the efficiency of a CORDIC-augmented Sandbridge
processor when implementing a vertex processor in software using fixed-point arithmetic. A CORDIC-based solution for vertex processing exhibits a number of advantages over classical Multiply-and-Accumulate solutions. First, since a single primitive is used to describe the computation, the code can easily be vectorized and multithreaded, and thus fits the major Sandbridge architectural features. Second, since a CORDIC iteration consists of only a shift operation followed by an addition, the computation may be deeply pipelined. We first outline the Sandbridge architecture extension, which encompasses a CORDIC functional unit and the associated instructions. We then consider rigid-body rotation, lighting, exponentiation, vector normalization, and perspective division, which are some of the most important data-intensive 3D graphics kernels, and propose a scheme to implement them on the CORDIC-augmented Sandbridge processor. Preliminary results indicate that the performance improvement obtained with the extended instruction set ranges from 3× to 10× (with the exception of rigid-body rotation).
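
To make the shift-and-add structure of a CORDIC iteration concrete, the following is a minimal sketch in plain C of a circular, rotation-mode CORDIC that rotates a 2D fixed-point vector. It is an illustration only, not the Sandbridge CORDIC instruction or the scheme proposed in the paper: the Q16.16 format, the 16-iteration count, and all names (`cordic_rotate_q16`, `cordic_atan_q16`, `CORDIC_INV_GAIN_Q16`) are assumptions introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed format: Q16.16 fixed point (hypothetical choice for this sketch). */
#define Q16_ONE              (1 << 16)
#define CORDIC_ITERS         16
/* 1/K in Q16.16, where K = prod sqrt(1 + 2^(-2i)) ~= 1.64676 is the CORDIC gain. */
#define CORDIC_INV_GAIN_Q16  39797
/* Sum of the table below: convergence range, about +/-1.743 rad. */
#define CORDIC_MAX_ANGLE_Q16 114247

/* atan(2^-i) in Q16.16 radians, for i = 0..15. */
static const int32_t cordic_atan_q16[CORDIC_ITERS] = {
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256,   128,   64,    32,   16,   8,    4,    2
};

/*
 * Rotate (*x, *y) counter-clockwise by `angle` (Q16.16 radians).
 * Each iteration uses only shifts and additions/subtractions.
 * Note: right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift.
 */
static void cordic_rotate_q16(int32_t *x, int32_t *y, int32_t angle)
{
    assert(angle >= -CORDIC_MAX_ANGLE_Q16 && angle <= CORDIC_MAX_ANGLE_Q16);

    int32_t xi = *x, yi = *y, z = angle;

    for (int i = 0; i < CORDIC_ITERS; i++) {
        int32_t dx = yi >> i;   /* y * 2^-i */
        int32_t dy = xi >> i;   /* x * 2^-i */
        if (z >= 0) {           /* rotate by +atan(2^-i) */
            xi -= dx;
            yi += dy;
            z  -= cordic_atan_q16[i];
        } else {                /* rotate by -atan(2^-i) */
            xi += dx;
            yi -= dy;
            z  += cordic_atan_q16[i];
        }
    }

    /* Compensate the constant gain K with one multiply per component.
     * A real pipeline might instead fold 1/K into an earlier scaling step. */
    *x = (int32_t)(((int64_t)xi * CORDIC_INV_GAIN_Q16) >> 16);
    *y = (int32_t)(((int64_t)yi * CORDIC_INV_GAIN_Q16) >> 16);
}
```

For example, calling `cordic_rotate_q16(&x, &y, 51472)` with `x = Q16_ONE` and `y = 0` rotates the unit vector by roughly π/4, leaving both components close to 0.7071 in Q16.16. Because every iteration has the same shift-add form and differs only in the shift count and table entry, the loop maps naturally onto a pipelined or vectorized implementation, which is the property the abstract refers to.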