The steady progress in semiconductor technology allows the manufacturing of ever smaller structures and image sensors with ever shrinking pixel sizes. One might get the impression that pixel size is limited only by technology and that even smaller pixels are desirable. Today, consumer products with very small pixels are already on the market, and devices with even smaller pixels are in production.1 In comparison, photoreceptors in the human eye are reported to be larger than 3 μm.2
The general modeling of light is well understood,3 and simulation with commercial tools such as ISET4 is possible. In contrast, this letter addresses parameters such as aperture and pixel size and their photometric consequences for the amount of light available in a digital video camera system. A key design criterion is the resulting image quality. With small pixels, only a few photons hit a single pixel during an exposure period, and the signal-to-noise power ratio (SNR) is poor due to shot noise.5 Independent of all technological limitations, this physical bound limits the performance of today’s video cameras.
Image Acquisition Model
The scene radiates a certain amount of light, described by an average radiance $L_e$ in object space. The sensor receives an effective amount of light equivalent to a cone with solid angle $\Omega$, as shown in Fig. 1(a). This cone is defined by a sphere of radius equal to the focal length $f$ and a circular aperture disk with diameter $D$; the solid angle $\Omega$ thus follows from $f$ and $D$.6
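To make the geometry concrete, the following Python sketch computes the solid angle of such a cone. It assumes (our assumption, not necessarily the exact expression of the letter) that the cone apex sits at distance $f$ from a flat aperture disk of diameter $D$, and it compares the exact cone formula with the common paraxial approximation $\pi D^2/(4f^2)$. The function names and the 4 mm, f/2 example are hypothetical.

```python
import math


def solid_angle_exact(f: float, d: float) -> float:
    """Solid angle (sr) of a cone with apex at distance f from a flat
    circular disk of diameter d (assumed geometry)."""
    half_angle = math.atan(d / (2.0 * f))
    # 2*pi*(1 - cos(a)) rewritten as 4*pi*sin^2(a/2) to avoid cancellation
    return 4.0 * math.pi * math.sin(half_angle / 2.0) ** 2


def solid_angle_paraxial(f: float, d: float) -> float:
    """Small-angle approximation: aperture area divided by f**2."""
    return math.pi * d**2 / (4.0 * f**2)


if __name__ == "__main__":
    f, d = 4e-3, 2e-3  # hypothetical 4 mm lens at f/2
    exact = solid_angle_exact(f, d)
    approx = solid_angle_paraxial(f, d)
    print(f"exact {exact:.4f} sr, paraxial {approx:.4f} sr")
```

For an f/2 lens, the paraxial value overestimates the exact cone by less than 5%, so the simpler form is usually adequate for estimates of this kind.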
Part of the sensor area is used for interconnects and transistors, so only a fraction of it is sensitive to light. Figure 1(b) shows pixels of size $p$. The ratio of active to total area is expressed as an effective sensor fill factor $FF$. Even with clever manufacturing such as micro-lenses or backside illumination, $FF \le 1$ holds. A single pixel thus captures a certain amount of radiant power (radiant flux) from the sensor irradiance.7 The incident photons generate electrons, which are then collected in the pixel. Although we work with the average number of electrons $N$, the charge is still quantized, and the actual number of electrons is subject to shot noise due to the random occurrence of photon arrivals. For $N$ electrons, the associated shot noise has strength $\sqrt{N}$.5 As the signal power is represented by $N^2$ and the noise power by $N$, the SNR thus calculates to $\mathrm{SNR} = N^2/N = N$. Additional noise sources exist,8 but they are neglected in the ideal case.
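A minimal numerical sketch of this chain, from radiance to mean electron count to shot-noise SNR, for monochromatic light. It assumes lossless on-axis optics, so that the sensor irradiance is simply $L_e\Omega$ (no lens transmission or vignetting), a quantum efficiency `qe` for the photon-to-electron conversion, and an exposure time `t_exp`; these parameters and the function names are hypothetical choices for illustration.

```python
import math

H = 6.62607015e-34  # Planck constant in J*s (exact SI value)
C = 2.99792458e8    # speed of light in m/s (exact SI value)


def mean_electrons(radiance, omega, pixel_size, fill_factor, qe, wavelength, t_exp):
    """Mean electron count N per pixel for monochromatic light.

    Assumes lossless on-axis optics: sensor irradiance = radiance * omega.
    """
    irradiance = radiance * omega                      # W/m^2
    flux = irradiance * fill_factor * pixel_size**2    # W on the active area
    photon_energy = H * C / wavelength                 # J per photon
    return qe * flux * t_exp / photon_energy


def shot_noise_snr(n_electrons):
    """Power SNR for Poisson shot noise: N^2 / (sqrt(N))^2 = N."""
    signal_power = n_electrons**2
    noise_power = math.sqrt(n_electrons) ** 2
    return signal_power / noise_power
```

Because the electron count grows with the square of the pixel size, halving the pixel pitch cuts the shot-noise SNR by a factor of four.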
The SNR is directly visible in the final images. To answer the original question, we combine the above equations to obtain the minimum pixel size that reaches a required SNR.
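Setting the shot-noise SNR equal to $N$ and solving the electron-count relation for the pixel size gives a closed form for the minimum pixel size. The sketch below is our reconstruction under stated assumptions (lossless on-axis optics so that irradiance equals $L_e\Omega$, monochromatic light, exposure time `t_exp`, quantum efficiency `qe`), not necessarily the letter's exact equation; all names are hypothetical.

```python
import math

H = 6.62607015e-34  # Planck constant, J*s
C = 2.99792458e8    # speed of light, m/s


def min_pixel_size(snr_target, radiance, omega, fill_factor, qe, wavelength, t_exp):
    """Smallest pixel pitch (m) whose mean electron count N reaches snr_target.

    With pure shot noise SNR = N, and (assuming lossless optics)
    N = qe * radiance * omega * FF * p^2 * t * lambda / (h * c); solve for p.
    """
    photons_per_joule = wavelength / (H * C)
    electrons_per_m2 = qe * radiance * omega * fill_factor * t_exp * photons_per_joule
    return math.sqrt(snr_target / electrons_per_m2)
```

The square root is the key design consequence: to halve the minimum pixel size at constant SNR, four times as much light per unit area is needed.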
Results for Ideal System
At first, we assume ideal technology and the luminance of a typical indoor scene.9 At the peak sensitivity of the human eye, a wavelength of about 555 nm (a frequency of 540 THz), the SI unit candela is defined10 as a radiant intensity of 1/683 W/sr. For a luminance $L_v$, the radiance in object space is then $L_e = L_v/(683\,\mathrm{lm/W})$,5 and we use this value in the model. For green light at this wavelength, the combined equation then yields the minimum pixel size.
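The photometric-to-radiometric conversion used here reduces to a division by 683 lm/W, which is valid only for monochromatic light at the candela's reference frequency of 540 THz. A minimal Python sketch; the names are hypothetical, and no scene luminance value is assumed.

```python
C = 2.99792458e8   # speed of light, m/s
K_CD = 683.0       # luminous efficacy at 540 THz, lm/W (SI definition of the candela)
F_CD = 540e12      # reference frequency of the candela definition, Hz


def luminance_to_radiance(luminance_cd_m2):
    """Radiance in W/(sr*m^2) of monochromatic 540 THz light with the given luminance."""
    return luminance_cd_m2 / K_CD


# Vacuum wavelength of the reference frequency, in nm (approximately 555 nm)
reference_wavelength_nm = C / F_CD * 1e9
```

For polychromatic light this shortcut no longer holds; the conversion then requires integrating the spectrum against the luminosity function.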
The influence of different apertures is shown in Fig. 2: with larger aperture diameters, even smaller pixels can be used. The luminance can be varied as well. In practice, human color perception (photopic vision) starts at a certain minimum luminance,9 and daylight exterior scenes are typically much brighter.9 The resulting minimum pixel sizes thus range from 5 to 0.09 μm, as shown in Fig. 3.
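Both trends follow from the square-root dependence of the minimum pixel size on the collected light. Assuming the paraxial solid angle $\Omega \approx \pi D^2/(4f^2)$ and all other parameters fixed, $p_{\min}$ scales as $f/(D\sqrt{L})$. The hypothetical helpers below rescale a reference value under that simple assumption.

```python
import math


def scaled_min_pixel_size(p_ref, luminance_ratio=1.0, aperture_ratio=1.0):
    """Rescale a reference minimum pixel size.

    Assumes p_min is proportional to 1 / (D * sqrt(L)), i.e. the shot-noise
    limit with a paraxial solid angle and all other parameters fixed.
    luminance_ratio = L_new / L_ref, aperture_ratio = D_new / D_ref.
    """
    return p_ref / (aperture_ratio * math.sqrt(luminance_ratio))


def luminance_ratio_for(p_ref, p_new):
    """Luminance ratio needed to move the limit from p_ref to p_new (fixed aperture)."""
    return (p_ref / p_new) ** 2
```

Under this simple scaling, spanning 5 μm down to 0.09 μm with a fixed aperture would correspond to a luminance ratio of about $(5/0.09)^2 \approx 3.1 \times 10^3$.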
Up to now, we have used monochromatic light only. We now extend the model to include the spectral distribution of light. Again, we start from a scene with the same luminance. The light now originates from a light bulb, modeled as a black body at color temperature $T$ with the spectral radiance11

$$L_{e,\lambda}(\lambda, T) = \frac{2hc^2}{\lambda^5}\,\frac{1}{\exp\!\left(\frac{hc}{\lambda k_B T}\right) - 1},$$

where $h$ is Planck's constant, $c$ the speed of light, and $k_B$ Boltzmann's constant. Figure 4 shows the resulting normalized spectral radiances for typical color temperatures.
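Planck's law is straightforward to evaluate numerically. The sketch below samples the black-body spectral radiance over the visible range and scales it to a peak of 1 for plotting; this peak normalization is our assumption and does not reproduce the letter's own normalization in Fig. 4. Function names and the default grid are hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def planck_radiance(wavelength, temperature):
    """Black-body spectral radiance in W/(sr*m^2*m) at wavelength (m) and T (K)."""
    a = 2.0 * H * C**2 / wavelength**5
    return a / math.expm1(H * C / (wavelength * K_B * temperature))


def normalized_spectrum(temperature, start_nm=380, stop_nm=780, step_nm=5):
    """Spectral radiance on a wavelength grid, scaled so the largest sample is 1.

    Returns a list of (wavelength_nm, relative_radiance) pairs.
    """
    grid = [nm * 1e-9 for nm in range(start_nm, stop_nm + 1, step_nm)]
    values = [planck_radiance(w, temperature) for w in grid]
    peak = max(values)
    return [(w * 1e9, v / peak) for w, v in zip(grid, values)]
```

For incandescent color temperatures around 2856 K the emission peak lies in the infrared near 1 μm, so the normalized curve rises monotonically across the visible range and blue light is comparatively scarce.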
Today, most cameras capture scenes for later viewing by humans. The camera should therefore create a representation of the scene that is similar to that of the human visual system. We simulate an ideal camera whose spectral sensitivity curves are based on the Stockman and Sharpe cone measurements of the human eye.12 The corresponding spectral sensitivity functions for long (L), medium (M), and short (S) wavelengths are shown in Fig. 5. We assume ideal color filters and a material without any attenuation, i.e., a quantum efficiency of 100% at the peak of each curve.
Table 1 shows the resulting minimum pixel sizes for the radiometric simulation. The luminosity case with monochromatic light at 555 nm corresponds to the ideal simulation above. The simulations with the L and M cones deviate by less than 10% from the luminosity case, which is plausible given the high similarity of the respective sensitivity curves. Capturing blue light (short wavelengths, cone S), however, requires larger pixels: at short wavelengths, individual photons carry more energy, so a given radiant flux contains fewer of them. This explains the inferior performance of the blue color channels in typical digital cameras. The extreme case of observing monochromatic green light with the short-wavelength sensitivity leads to even fewer detected photons and would require 26 μm pixels. In general, the monochromatic calculation is only slightly optimistic and gives a good approximation of the radiometric computation.
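The photon-energy argument can be quantified in isolation: one joule of radiation contains $\lambda/(hc)$ photons, so at equal radiant flux the collected charge scales with $p^2\lambda$ and the required pixel size with $1/\sqrt{\lambda}$. The sketch below isolates this effect only; the lower weighting of the S-cone sensitivity is not modeled, and 450 nm is an illustrative choice. Names are hypothetical.

```python
H = 6.62607015e-34  # Planck constant, J*s
C = 2.99792458e8    # speed of light, m/s


def photons_per_joule(wavelength):
    """Number of photons carried by 1 J of monochromatic radiation."""
    return wavelength / (H * C)


def relative_pixel_size(wavelength, reference_wavelength):
    """Pixel-size factor needed at `wavelength` to collect as many photons as at
    `reference_wavelength` for the same radiant flux per unit area.
    Since N scales with p^2 * lambda, p scales with 1/sqrt(lambda)."""
    return (reference_wavelength / wavelength) ** 0.5
```

At 450 nm, a joule carries about 81% of the photons it carries at 555 nm, so the pixel must be about 11% larger from photon energy alone; the much larger penalty in Table 1 also reflects the sensitivity curves.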
Table 1. Minimum pixel sizes (in μm) based on radiometric calculations for light sources with black body radiation of temperature T and for a monochromatic light source.

| Light source | Cone L | Cone M | Cone S | Luminosity |
|---|---|---|---|---|
Results with Current Technology
The above numbers represent the theoretical limit for ideal sensors, which real-world cameras do not reach. For example, Hannebauer et al.13 report a highly optimized three-layer stacked image sensor whose pixels achieve a high fill factor and quantum efficiency, but only with many (costly) optimizations. In current 1.4 μm consumer-grade sensors, backside illumination (BSI) enables a fill factor close to 100%.14 For color imaging, however, the spectral sensitivity is attenuated, and OmniVision14 and Aptina15 report correspondingly reduced peak quantum efficiencies. In scientific CMOS sensors, the combined readout noise is reported to be very low16 and can thus be neglected compared with a signal of 1000 electrons. Combining these assumptions for fill factor, quantum efficiency, and readout noise leads to a minimum pixel size larger than in the ideal case. With mass-market sensors and additional noise,8 even larger pixels are required.
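The following sketch estimates how a non-ideal fill factor, quantum efficiency, and readout noise enlarge the minimum pixel size relative to the ideal sensor. It assumes that shot noise (variance $N$) and Gaussian readout noise (variance $\sigma_r^2$) are independent, so their powers add and $\mathrm{SNR} = N^2/(N + \sigma_r^2)$; this noise model and the function names are assumptions, and the example values are illustrative only.

```python
import math


def electrons_needed(snr_target, read_noise_e=0.0):
    """Mean signal electrons N so that N^2 / (N + sigma_r^2) = snr_target.

    Positive root of N^2 - S*N - S*sigma_r^2 = 0.
    """
    s = snr_target
    return 0.5 * (s + math.sqrt(s * s + 4.0 * s * read_noise_e**2))


def pixel_size_penalty(snr_target, fill_factor, qe, read_noise_e=0.0):
    """Factor by which the minimum pixel size grows relative to the ideal sensor
    (fill factor 1, QE 1, no read noise). The collected charge scales with
    p^2 * fill_factor * qe, so p scales with sqrt(N / (fill_factor * qe))."""
    n_ideal = snr_target
    n_real = electrons_needed(snr_target, read_noise_e)
    return math.sqrt(n_real / (n_ideal * fill_factor * qe))
```

With an illustrative readout noise of 2 electrons and a 1000-electron target, the read-noise contribution to the penalty stays around 0.2%, consistent with neglecting it; a fill factor or quantum efficiency below one has a far larger effect.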
These small pixels also run into another technological limit: decreasing full well capacity. For example, the full well capacity reported by Aptina15 leaves only a small dynamic range between noise visibility5 and overexposure. As a result, most of the image will still look noisy. This technological challenge, however, could be addressed with multiple readouts during the exposure.17
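The usable range described here is simply the ratio between the full well capacity and the smallest signal that is still free of visible noise. A minimal sketch, reporting it as a ratio, in decibels (using the 20·log10 convention common for sensor dynamic range), and in photographic stops; the function name and example numbers are hypothetical.

```python
import math


def usable_dynamic_range(full_well_e, min_signal_e):
    """Ratio, decibels and photographic stops between the smallest signal that is
    still free of visible noise and the full well capacity."""
    ratio = full_well_e / min_signal_e
    return ratio, 20.0 * math.log10(ratio), math.log2(ratio)
```

With an illustrative full well of 8000 electrons and a noise-visibility threshold of 1000 electrons, only three stops (about 18 dB) remain between noise visibility and overexposure.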
Another limitation comes from optical diffraction: even with ideal optics, the achievable resolution of a camera system is limited. The Sparrow criterion suggests3 that there is no gain in resolution below a critical pixel size, which scales with the wavelength and the f-number; we evaluate it for the aperture and wavelength of our example. Achieving this limit, however, is challenging, especially in the off-axis field, and leads to expensive optics. A further decrease of the f-number requires a dramatic increase in technological effort and tighter tolerances for optics manufacturers.
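The diffraction bound can be sketched with the commonly quoted Sparrow separation for an aberration-free circular aperture under incoherent illumination, approximately $0.947\,\lambda N$ with f-number $N = f/D$. How the letter maps this separation onto its critical pixel size is not reproduced here; the function name and the example values are hypothetical.

```python
SPARROW_COEFF = 0.947  # commonly quoted Sparrow factor for a circular aperture


def sparrow_distance(wavelength, f_number):
    """Sparrow two-point separation in the image plane (same unit as wavelength).

    Uses the incoherent, aberration-free approximation d = 0.947 * lambda * N.
    """
    return SPARROW_COEFF * wavelength * f_number
```

At 555 nm and f/2 this gives a separation of about 1.05 μm; halving the f-number halves it, which is what drives the cost of faster optics.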
In this photometric analysis, we have examined the number of photons per pixel. With small pixels, the image quality is limited by shot noise, and for indoor scenarios, current video cameras are surprisingly close to this fundamental limit. We estimate that even with ideal technology, pixels below the minimum size derived above will not capture enough light to generate visually pleasing videos. Current technology is far from perfect, and even with optimistic assumptions, the resulting limit is close to the pixel sizes of current sensors. However, for other imaging scenarios, such as outdoor daylight still photography, there is still plenty of room at the bottom.