Edwin Land coined the word “Retinex” in 1964.1 He used it to describe the theoretical need for three independent color channels to explain human color constancy. The word was a contraction of “retina” and “cortex.” A Retinex is a theoretical spectral channel that makes spatial comparisons between scene regions so as to calculate “Lightness” sensations (the monochromatic range of appearances between light and dark in each channel).
Land had enthusiastically experimented with two-color projections in the late 1950s and early 1960s.2 By that time, he had hundreds of patents on many different photographic systems. He was well aware of the possibilities, and limitations, of silver halide photography. Before his “Red and White” light projection experiments, he accepted the standard explanation of color, namely, color was the result of the local quanta catches of receptors with different spectral sensitivities. Human color vision was thought to behave the way that color film did, in that color was a local phenomenon that resulted from spectral responses within each very small image segment. At that time, Land thought that the quanta catches of the triplet of retinal cones in a small retinal region generated color appearances.
An accidental observation, made by a colleague, in a late-night experiment changed everything Land “knew” about color. The colleague remarked that there was more color than expected from mixtures of photographic separations using red and white lights. Land responded: “Oh yes, that is adaptation.” At 2 o’clock in the morning, Land sat up in bed, and said: “Adaptation, what adaptation?” He immediately returned to the lab to repeat the experiment. For the rest of his life, human color vision was a favorite research topic.
What was it that Land had seen, so briefly, that made him return to the lab in the middle of the night? Human trichromatic color theory and film have always been linked. When Thomas Young made his famous suggestion of human trichromacy in 1802, his colleague at the Royal Institution, Humphry Davy, was studying a black and white photographic system. Young was the editor of the Institution’s journal that described Davy’s work.3 Young was well aware of silver halide’s response to light.
That night, Land realized there was nothing he could do with a locally responsive silver-halide system to make film behave the way that vision did. The color appearances in those projections could not be understood from the quanta catches of receptors in a tiny local region. He realized that human color appearances are fundamentally different: spatial comparisons control color sensations.
This was a startling observation made by a man whose company was about to bet its future on instant color film. If vision had a different mechanism, what was it? How do humans process information from different parts of the visual spectrum?
Color constancy provided the important clue to the answer. Land’s careful study of color appearance in three-color illumination led to the observation that spectral apparent lightnesses of an object in narrowband light were constant in variable amounts of illumination. The essential new idea was that spatial interaction of postreceptor neural processes depended on scene content, not the absolute amount of light. Film’s color separations recorded and reproduced the relative amount of light. Vision used spatial image processing to calculate monochromatic lightness appearances of each spectral channel. Land replaced the spectral response of spots of light on the retina with the spatial comparisons of the entire retina for each spectral sensitivity. Land coined the word “Retinex” to describe the three independent spatial mechanisms that explain color constancy.1
Color is the comparison of L, M, S Retinex monochromatic lightnesses.
Human Visual Pathway
Figure 1 illustrates the human visual pathway that begins with the visual pigments located in the distal tips of the cone and rod receptors in the retina (red ellipse). The quanta catch of these visual pigments initiates the spectral response to light. The receptors provide only the first response to the image on the retina. Appearance is the result of spatial processing along the entire visual pathway.
John Dowling greatly expanded the work of Hecht and Wald by describing the complex retinal spatial interactions.4 Berson5 has recently shown spatial modulation from melanopsin photopigment in ganglion cells. In 1953, Kuffler6 and Barlow7 showed that retinal cells make spatial comparisons. Hubel and Wiesel8 and DeValois and DeValois9 found spatial comparison cells in the cortex. Zeki10 found color constancy cells in cortical area V4. The dominant theme in research on the human visual pathway over the past 80 years has been the documentation of human spatial mechanisms at every stage along the visual pathway. Vision is a spatial process.
Vision Ratio-Making Sense
In 1974, Land wrote in his Friday Evening Discourse at the Royal Institution: “This Discourse is about a generally unrecognized animal sense—the ratio-making sense. It is the ratio-making sense which processes the radiation reaching our eyes in such a way as to discover the constant properties of objects in relation to the radiation falling on them.”11
Land put forward the idea that spatial comparisons, not receptor quanta catches, are the important stimuli for vision. Of course, quanta catches, as the first input step, play a role, but ratios of quanta catches play a much more fundamental role in synthesizing appearance.
Perhaps Land’s greatest contribution to vision research is the remarkable legacy of fascinating, simple but elegant, experiments. His “Red and White” projections, “Color and Black & White Mondrians,” changed the requirements of vision theories. Scenes required different mechanisms from quanta catch models. This paper will review Land’s and others’ experiments that help us understand humans’ unique spatial vision.
The best description of the original spatial algorithms that calculated lightness is found in the original literature:
Each of these articles describes important aspects of the model. In order to predict lightness in the “B&W Mondrian” and other test targets, the model varies the number and direction of paths. It includes a gradient threshold and a reset step that introduces normalization. Experiments showed that the reset step is the most interesting. Reset is key to the successful compression of HDR images. Frankle and McCann’s 1983 patent16 replaced paths with an array processor that calculated ratio, product, reset, and average using a multiresolution algorithm. This algorithm could calculate lightness predictions for an image in seconds in 1980. This led to the algorithmic Zoom Processing17 with O(N) computational efficiency. It is an extremely fast computational model and is even more efficient when combined with special purpose hardware. Sobol’s modification18 was incorporated into a line of commercial digital cameras. Review papers document the advances in the original Land and McCann Retinex theory and image processing algorithms over the past 50 years.17,19–21
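The ratio, threshold, product, and reset steps named above can be illustrated with a minimal sketch. It assumes a single one-dimensional path through a list of radiances; the function name `path_lightness` and the threshold value are hypothetical choices made for illustration, and the sketch omits the averaging over many paths and the multiresolution processing described in the cited literature.

```python
import math

def path_lightness(radiances, threshold=0.02):
    """Sequential ratio-threshold-product-reset along one path.

    `threshold` is an assumed value, applied to |log(ratio)|.
    Returns the running product at each position on the path.
    """
    product = 1.0                      # the path starts at an assumed maximum
    outputs = [product]
    for previous, current in zip(radiances, radiances[1:]):
        ratio = current / previous     # ratio of adjacent radiances
        if abs(math.log(ratio)) < threshold:
            ratio = 1.0                # gradient threshold: ignore slow changes
        product *= ratio               # sequential product along the path
        if product > 1.0:
            product = 1.0              # reset: a new maximum becomes the reference
        outputs.append(product)
    return outputs

edges = path_lightness([20, 100, 50, 10])      # a path that crosses sharp edges
gradient = path_lightness([100, 99, 98, 97])   # a path along a shallow gradient
```

In this sketch the path across edges resets when it reaches the radiance of 100, so later areas are reported relative to that maximum, while the shallow gradient is removed entirely. The first area on the path keeps its starting value of 1.0 because this one path had not yet met the maximum, which suggests why a single path is not sufficient on its own.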
Figure 2 shows a map of Land and McCann (1971) papers and patents that incorporate the original ratio-threshold-product-reset algorithms. Reference 22 is a web page with links to the full text of those papers.
For a comprehensive review of “Land and McCann” algorithms and their implementation, see Ref. 21 (Chapter 32).
Two Distinct Parts: Model Vision and Make Reproductions
From the very beginning, the Retinex algorithm had two distinct, but related parts:
• First, develop a model of human vision based on detailed measurements of human sensations generated by complex real-life scenes.
• Second, use that model of vision as the basis of calculating human sensations and writing those sensations on film.
Cameras require many improvements to mimic human vision, namely, cameras need to have color constancy and HDR scene compression. A successful model of spatial color vision can calculate color constancy in HDR scenes and write those sensations on LDR media. However, color photography research has shown that people prefer enhanced sensations over accurate reproductions, so color and tone-scale enhancements are needed to meet consumer preferences.
Over the past 5 decades of growth in digital imaging, there has been a parallel growth in spatial image processing.
This paper serves as a historical introduction to the Retinex at 50. This paper reviews the original vision experiments, updated to the present. In particular, it describes measurements of spatial vision to serve as ground truth for vision models.
Outline of the Paper
Section 1 (above) reviewed the early history and motivation of Retinex algorithms. It also provided an outline with links to the Land and McCann Retinex literature.
Section 2 describes the experimental basis of Retinex models of Color Constancy from the original Color Mondrian through recent 3-D Mondrians. An object’s triplet of L, M, S lightnesses predicts its color.
Section 3 describes Land’s early exploration of appearance in HDR targets using his Black and White Mondrian experiment. This experiment led to Land and McCann’s model of calculated lightness. It introduces the need for observer data to define the spatial properties of a model of lightness. It describes the use of observer data to understand the spatial processing of human vision, including appearance in HDR scenes influenced by intraocular glare.
Section 4 provides an introductory framework of additional Retinexes that have different goals, algorithms, and image processing properties.
Section 5 summarizes “The Retinex Idea.”
Color Mondrians and Color Constancy
The Retinex algorithm began as a model of color vision. Its three independent (L, M, S) spatial color channels were needed to explain Land’s Color Mondrian experiments. Figure 3 illustrates Land’s Double Mondrian experiment. He used this demonstration in his Ives Medal Address to the Optical Society of America in 1968. At the top is a photograph of the two side-by-side, identical Mondrians, and the two independent sets of long-, middle-, and short-wave illuminating projectors with adjustable intensities. This top photograph shows the apparatus in uniform illumination. The circular papers are the areas of interest: green in the left- and red in the right-Mondrian.
Land adjusted independently the left L, M, S illuminations on the green circle and the right L, M, S illuminations on the red circle. He adjusted the overall uniform illumination on each side so that the green paper in the left Mondrian and the red paper in the right had identical radiances. Appearance did not correlate with quanta catch. The expanded experiments showed that a single triplet of quanta catches can appear as any color, at any location in the Color Mondrian.11,12
To understand how human vision does this, Land studied the Mondrians in each waveband. Figure 3(b) illustrates a portion of the two Mondrians in long-wave illumination. In Land’s experiment, the circular green paper in the left Mondrian had the same radiance as the circular red paper in the right Mondrian. The green circle reflected a smaller percentage of long-wave light than the red circle. To make the left-green circle have the same long-wave radiance as the right-red circle, the L illumination on the left had to be increased.
Figure 3(b) illustrates more long-wave illumination on the left Mondrian. Land recognized that a common, everyday phenomenon was happening here. We all have observed that when a cloud passes in front of the sun, we have less light falling on that scene. Nevertheless, the appearance of that scene changes only a small amount. Figure 3(b) illustrates a small darkening of all papers on the right caused by less illumination. The lightnesses of corresponding Mondrian papers in both Mondrians are nearly constant. In Land’s experiment, the green circle appears dark, and the red circle appears light in long-wave illumination when they have identical radiances.
In Fig. 3(c), the green circle on the left Mondrian reflected more middle-wave light than the red circle on the right. In that case, the right Mondrian had increased middle-wave illumination. Again, increased uniform illumination of corresponding Mondrian papers makes very small increases in apparent lightness for all papers. Again, the lightnesses of all corresponding Mondrian papers in middle-wave light were nearly constant in variable illumination. The spatial relationships of the appearances of the two Mondrians were nearly constant. The green paper appeared lighter, and the red paper appeared darker in middle-wave illumination when they had identical radiances.
These observations explained to Land why vision has color constancy, while film does not. Color appearance correlates with the relative visual lightness in long-, middle-, and short-wave light. The Retinex is a theoretical independent channel that calculates the apparent monochromatic lightness of each image segment, for each spectral waveband. Color appearance correlates with three Retinex lightnesses (Fig. 4).
Quantitative Model of Color Constancy versus Observed Match Data
McCann et al.14 measured color sensations in Color Mondrian color constancy experiments. The experiments used five sets of combinations of L, M, S narrowband illuminations. They showed that in uniform illumination, color sensations correlated with the paper’s reflectance using cone spectral sensitivities. They designed a triplet of spectral filters (L-cone, M-cone, S-cone) that modified a telephotometer’s spectral response to match that of human cone pigments. Using those three filters, they measured the relative cone quanta catches, and the cone reflectances of all of the papers used in the experiment. L-cone quanta catch was the L-cone-sensitivity meter readings from each paper in combined L, M, S illumination. Since cone-sensitivity spectra are so broad, each cone response includes some contribution from each L, M, S light. L-cone response is the sum of L light plus crosstalk contributions from M and S light. Cone reflectance values are the ratio of quanta catch values of (each paper/white paper) in each combined L, M, S illumination. Cone reflectance values change with changes in the relative amounts of L, M, S illuminations.
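The definitions of cone quanta catch and cone reflectance given above can be made concrete with a small sketch. All numbers are hypothetical: the sensitivity triplets, the paper reflectances, and the illumination amounts are invented for illustration and are not the measured values from the experiments. The catch is modeled simply as a sum over the three narrowband lights.

```python
def cone_catch(sensitivity, illumination, reflectance):
    """Quanta catch of one cone type: sum over the L, M, S narrowband lights."""
    return sum(s * e * r for s, e, r in zip(sensitivity, illumination, reflectance))

def cone_reflectance(sensitivity, illumination, paper, white):
    """Ratio of the paper's catch to the white paper's catch in the same light."""
    return (cone_catch(sensitivity, illumination, paper)
            / cone_catch(sensitivity, illumination, white))

# Hypothetical values, ordered as (L light, M light, S light).
L_CONE_BROAD = (1.0, 0.5, 0.0)    # assumed L-cone with crosstalk from M light
L_CONE_NARROW = (1.0, 0.0, 0.0)   # idealized L-cone with no crosstalk
RED_PAPER = (0.8, 0.1, 0.1)
WHITE_PAPER = (0.9, 0.9, 0.9)
ILLUM_EQUAL = (1.0, 1.0, 1.0)
ILLUM_MORE_M = (1.0, 4.0, 1.0)    # four times as much middle-wave light

broad_equal = cone_reflectance(L_CONE_BROAD, ILLUM_EQUAL, RED_PAPER, WHITE_PAPER)
broad_more_m = cone_reflectance(L_CONE_BROAD, ILLUM_MORE_M, RED_PAPER, WHITE_PAPER)
narrow_equal = cone_reflectance(L_CONE_NARROW, ILLUM_EQUAL, RED_PAPER, WHITE_PAPER)
narrow_more_m = cone_reflectance(L_CONE_NARROW, ILLUM_MORE_M, RED_PAPER, WHITE_PAPER)
```

With the assumed broad sensitivity, the L-cone reflectance of the red paper falls when middle-wave light is added, because the crosstalk term grows faster for the white paper than for the red one. With the idealized narrow sensitivity the ratio is unchanged, which is the sense in which crosstalk limits constancy in this toy calculation.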
McCann et al.14 measured appearances (matches) and cone reflectances in five different illuminants. In all cases, color-constant appearances correlated with cone reflectance values for that illumination. In some cases, the change in illumination caused enough cone crosstalk to predict specific departures from perfect constancy. Observer data correlated with the predicted departures. Apparent color constancy is limited by cone crosstalk. Apparent color constancy does not correlate with the surface reflectance of objects (measured with narrowband spectra), but rather with calculated L, M, S ratios of a paper’s cone spectral response divided by a white paper’s cone spectral responses.
Furthermore, McCann et al.14 successfully modeled color sensations using the spatial algorithm described by Land and McCann.12 This quantitative study provides important data on the limits of color constancy. It is an important set of ground-truth data for models of human color constancy.
Measurements of the Effects of Adaptation in Color Constancy
Additional color matching experiments showed that receptor adaptation cannot explain color appearance [see Ref. 21 (Chapter 27)]. These Color Mondrian experiments modified the surround to compensate for changes in scene averages caused by adjustments in overall illumination. Not only did the different color samples have constant radiances, but they also had constant average scene radiances. Receptor adaptation cannot account for these color constancy experiments. Likewise, Grayworld and von Kries normalization cannot account for human color constancy.
Switching Color Constancy “OFF” and “ON”
Another experiment shut off color constancy in a complex scene. As proposed by Vadim Maximov, the experiment made two sets of papers with correlated reflectances, shifted in color space. The experiment used illumination with spectra that shifted the combined radiances to be identical. This complex scene made by the combination of reflectances and illuminations creates two displays with identical quanta catch. Identical quanta catches over the entire field of view generated identical sensations. Even though we should expect color constancy in a complex scene, these two complex displays shut constancy off. Introducing new maxima turned constancy back on [see Ref. 21 (Chapter 28)].
McCann23 made a pair of Maximov Shoe Boxes. Each was a cardboard shoebox approximately , with plastic lenses, a piece of diffuse drafting vellum on top, and a Kodak Wratten Color Correction filter, as illustrated in Fig. 5, left. The simplified Mondrians with five or six papers are called Tatami, after Japanese floor mats.
In principle, it is easy to do (Fig. 5). Imagine two Maximov shoeboxes: one for the upper Tatami and one for the lower. Select two filters that attenuate the color spectra but do not reduce the light at any wavelength to zero. The experiment used Wratten Color Correction filters: CC40R and CC40C. These filters have different effects on appearances depending on how they are used. When the filters are viewed side-by-side on a lightbox, they appear as high chroma red and cyan areas, surrounded by the light-box white. They look like high-chroma papers. When the 40R filter is held close to one eye, the appearance of the room has a pale pink cast. Replacing 40R with 40C makes the color cast cyan. The room colors are almost constant. When viewed side-by-side, they are highly colored, but in a color constancy experiment, they generate small changes in appearance.
Two complex scenes with identical quanta catch
The experiment demands pairs of colored papers that have color differences equal to that of the Wratten Filters. Papers with such demanding specifications had to be manufactured to fit the measurements. Digital control of local printed areas was not generally available in 1990. McCann used an early digital xerographic Canon CLC 500 printer to make two Tatami with identical colorimetric shifts for all pairs of corresponding papers. The colored papers are shifted by the same amount in CIEXYZ space. The amount of the shift is equal and opposite to the shift caused by changing from a Wratten 40R to a Wratten 40C.
The experiment was to compare the color appearances in the two shoe boxes. One (Fig. 5, top) illustrates Tatami A with five colors that were shifted away from red in CC40C illumination; the other (Fig. 5, bottom) illustrates five colors shifted toward red in CC40R illumination. The papers were carefully manufactured to have the exact opposite shift in chromaticity as that caused by the filters.
Ordinarily, illumination has little or no noticeable effect. When we viewed the two Tatami side-by-side on a table in a room, there was very little change in appearance alternating the two filters.
When viewed in the Maximov Shoeboxes, the different sets of reflectances, in different illuminations, changed in appearance from looking different, to looking the same. Tatami A looked the same as Tatami B (Fig. 5, right). The result was that the color constancy mechanism for complex images was shut off using this pair of Maximov Shoeboxes. Despite the fact that the reflectances were different, the color appearances were the same.
Why did Maximov’s boxes turn off color constancy? The answer is that both Tatami have to look identical because every pixel in their entire fields of view had identical cone quanta catches. The sets of papers were made to shift the entire image as much as the filters did. When viewed in isolation, the quanta catch for both were the same, everywhere in the field of view. Whenever two images have identical quanta catches everywhere, they look the same. It was a challenge to find a set of papers that all shifted the same amount. The reward for this control experiment was shutting off color constancy.
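The construction described above can be sketched numerically under simplifying assumptions. The sketch treats each paper and each filter as a triplet of (L, M, S) band values and takes the radiance in a band as reflectance times filter transmission. The transmissions and reflectances are hypothetical and are not the measured Wratten CC40C and CC40R data; the helper names are likewise invented for illustration.

```python
import math

# Hypothetical (L, M, S) band transmissions standing in for the two filters.
FILTER_C = (0.6, 0.9, 0.9)   # cyan-like: attenuates long-wave light
FILTER_R = (0.9, 0.6, 0.6)   # red-like: attenuates middle- and short-wave light

def radiance(reflectance, transmission):
    """Band-by-band radiance of a paper seen through a filter."""
    return tuple(r * t for r, t in zip(reflectance, transmission))

def compensating_paper(reflectance, own_filter, other_filter):
    """Reflectance that gives the same radiance under `other_filter`
    as `reflectance` gives under `own_filter`."""
    return tuple(r * own / other
                 for r, own, other in zip(reflectance, own_filter, other_filter))

# Hypothetical Tatami A papers, viewed in the cyan-like illumination.
tatami_a = [(0.6, 0.2, 0.2), (0.2, 0.5, 0.2), (0.2, 0.2, 0.5),
            (0.5, 0.5, 0.2), (0.4, 0.4, 0.4)]
# Tatami B papers are "manufactured" to cancel the change of filter.
tatami_b = [compensating_paper(p, FILTER_C, FILTER_R) for p in tatami_a]

matches = [
    all(math.isclose(x, y) for x, y in zip(radiance(a, FILTER_C),
                                           radiance(b, FILTER_R)))
    for a, b in zip(tatami_a, tatami_b)
]
```

Every entry of `matches` is true even though no paper in `tatami_b` has the same reflectance as its partner in `tatami_a`. In this toy setting the two displays therefore deliver identical radiances everywhere, which is the condition the text identifies for the displays to look the same.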
New maxima restore constancy
Figure 6 introduced a white band around the central patch. If the white influences the appearance of all colors in the field of view, then the corresponding areas in the new Tatami Aw and Bw should no longer match in the Shoeboxes. If this is true, then it shows that color constancy is the result of spatial comparisons.
If the fundamental determinant of color appearance is the quanta catch at a pixel, then the small white frame should have only a small effect on appearance. Except for whites, every other pixel in the field of view is identical in Tatami A and Aw as well as B and Bw. Consider the change in appearance caused by the new whites in Aw and Bw (Fig. 6, right), compared to Tatami A and B (Fig. 5, right). Introducing white reflectances in different spectral illuminations in both Tatami revived color constancy.
Two careful observations are important here:
• First, the whites in Aw and Bw do not look exactly the same. Aw looks reddish in the CC40R box and coolish in the CC40C box. The influence of the illuminant shift is visible.
• Second, the two sets of five original papers look almost the same as they do in the room.
The whites still have a reddish, or coolish, cast depending on the illumination.
Nevertheless, the striking conclusion is that the introduction of white to both displays brought color constancy back to this complex scene.23 Extended experiments showed that any new maximum in any of the L, M, S cone responses turned constancy back on (Ref. 24).
These results support the early Retinex mechanisms using calculations that reset to the maxima in each waveband.14 As well, observers noted the changes in color appearance of the white papers. That observation supports the hypothesis that small appearance changes are due to changes of overall quanta catches [Refs. 21 (Chapter 21), 23, 24].
The changes in color appearances are consistent with the colors expected by normalizing each receptor set independently to a maximum reference. In other words, the colors observed are consistent with the Retinex Color Theory.
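Normalizing each receptor set independently to its own maximum can be sketched as follows. This is a bare per-channel normalization, not the full path-based Retinex calculation, and the quanta-catch triplets are hypothetical values chosen so that two papers have identical catches in both boxes while the added whites do not.

```python
def normalize_to_maxima(scene):
    """Divide each (L, M, S) catch by the scene maximum of its own channel."""
    maxima = [max(pixel[c] for pixel in scene) for c in range(3)]
    return [tuple(pixel[c] / maxima[c] for c in range(3)) for pixel in scene]

# Hypothetical catches that are identical in both boxes (constancy "off").
shared_catches = [(0.36, 0.18, 0.18), (0.12, 0.45, 0.18)]

# Hypothetical catches of a white frame seen in each box.
white_in_c_box = (0.6, 0.9, 0.9)
white_in_r_box = (0.9, 0.6, 0.6)

without_white = normalize_to_maxima(shared_catches)
with_white_a = normalize_to_maxima(shared_catches + [white_in_c_box])
with_white_b = normalize_to_maxima(shared_catches + [white_in_r_box])
```

Without the white frame, both boxes give the same normalized values because their inputs are identical. Once each box contains its own white, the channel maxima differ between boxes, so the same catches normalize to different triplets. In this sketch the first paper becomes roughly (0.6, 0.2, 0.2) in one box and (0.4, 0.3, 0.3) in the other.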
Measurements of Departures from “Perfect Constancy”
McCann25,26 made extensive measurements of changes in color appearance with changes in spectral content of 27 illuminations. The experiments used R, G, B LEDs inside a diffusing hemisphere dome. Each illuminant was generated by having the experimenter turn on 1, 2, or 4 LEDs in each spectral band. Three spectral LEDs at three light levels made 27 different combinations of illumination.
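The count of 27 follows from three independent choices among three light levels. A short enumeration, with variable names chosen here for illustration, makes the set of illuminants explicit:

```python
from itertools import product

LED_COUNTS = (1, 2, 4)   # number of LEDs switched on in one spectral band

# Each illuminant is a (red, green, blue) triplet of LED counts.
illuminants = list(product(LED_COUNTS, repeat=3))

print(len(illuminants))  # 27
```

The list runs from (1, 1, 1) to (4, 4, 4), so relative band intensities span a 4:1 range in each band.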
Observers matched two chromatic and one achromatic samples in all illuminants. Observers reported that the achromatic paper was nearly constant in all spectral illuminants. However, the chromatic samples showed a small but distinctive shift in appearance matches to the Munsell Book. That signature shift correlates with changes in spatial edge ratios due to the overlap in spectral sensitivity of cone photopigments.14 That signature was distinctly different from predictions made by an incomplete adaptation model.27
Color Mondrians in Illumination with Edges
All of the Color Constancy experiments described above used flat Mondrians in uniform illuminations. The Mondrian used in Ref. 14 is shown in Fig. 7(a). Recent experiments28 measured appearances in nonuniform illuminations that had sharp shadows, which created edges in illumination. Human visual appearance mechanisms treat edges in illumination the same way they treat edges in reflectance. The 3-D Mondrian experiments used blocks of wood [Figs. 7(b) and 7(c)].
All the 3-D Mondrian facets had one of 11 paints (R,G,B,Y,M,C,W,GL,GM,GD,K). The observers were informed that all blues had the same blue painted surfaces, etc. They were asked to measure changes in appearances of individual blue facets compared to a ground truth blue sample mounted in front of the 3-D Mondrians. The set of facets included each paint in nearly uniform (LDR) and in directional (HDR) illuminations. They were asked to quantify the degree of color constancy in more real-life illuminations. Figure 7(b) used an integrating illumination box (LDR illumination) that attempted to make uniform illumination. Observers reported that many facets with the same paint appeared nearly constant. Other facets with that paint did not. Figure 7(c) used two different white lights hitting the 3-D Mondrian from different directions (HDR illumination). These illuminants created sharp shadows. In HDR illumination observers reported many large departures from color constancy. Color appearance correlates with the edges in the retinal image, not with the reflectance of each painted surface.28
Carinna Parraman made a unique contribution. She painted the appearance of the two 3-D Mondrians in watercolors. She made two paintings by painstakingly reproducing the appearance of each facet (matching its sensation). The watercolor paintings were made using uniform illumination on the watercolor paper. Figure 8(a) shows her painting of the 3-D Mondrian in LDR illumination; and Fig. 8(b) shows the 3-D Mondrian in HDR illumination. She quantified her matching sensations of each scene segment by painting it and then measured sensations by measuring the reflectance of the watercolor painting.28
Although tedious and demanding great skill in painting, this is an important advance in measuring appearance of HDR scenes. Parraman matched the entire complex scene with watercolor paints. When she measured the reflectance of each individual facet, she converted her sensations to a ground truth color value for each facet. A successful model of vision must predict these painted apparent reflectance sensation values for each facet.
In summary, the 3-D Mondrian experiments measured the limits of color constancy. While departures from ideal (perfect) color constancy are very small in uniform illumination, constancy erodes with the increase of spatial structure in illumination. Color sensations of identical surface reflectances change in real-world illumination. Edges in illumination are processed in the same manner as edges in reflectance. Cone quanta catch cannot discriminate between radiances modified by reflectance and radiances modified by illumination.
Summary: A Model of Human Color Vision
The body of work in Sec. 2 using Color Mondrians provides an extensive dataset for ground truth information for Color Constancy models. The experiments provide observer data for models of human vision that include:
• Quantitative matches (Munsell Book) —Color Mondrians in constant average radiances [Ref. 21 (Chapter 27)]
In retrospect, these quantitative data on the limits of observer color constancy are very important. One cannot just assume perfect color constancy when modeling human vision. Color sensations do not correlate with surface reflectances in complex natural scenes. That model needs to account for the fact that color constancy varies with scene content. Edges in illumination have the same visual impact as edges in reflectance. Universally effective spatial algorithms must mimic human spatial mechanisms. After all, reproductions are made solely for human viewing.
Black and White Mondrians—Lightness Constancy
When Land realized that human vision was a spatial mechanism, he approached image reproduction in a new way. He thought that reproduction of real scenes must incorporate a spatial model of vision.29 The idea evolved to the sequence of capturing scene information; then, spatial processing to calculate visual sensation; then, writing sensations on film [Refs. 21 (Chapter 32), 16, 30, 31].
In 1968, Land and McCann extended Retinex Theory to include nonuniform illumination using the Black and White Mondrian experiment.12 Here, gradients of illumination made near-white and near-black papers have the same retinal luminance (Fig. 9). Despite equal cone quanta catches, the white paper looked white and the black paper looked black. The Retinex lightness algorithm added thresholds and reset normalization to its spatial comparison mechanism. Spatial comparisons successfully modeled sensations. Land’s Black and White Mondrian was the first quantitative study of appearance in high-dynamic range (HDR) imaging. It used a range of illuminations falling on the scene that was equal to the range of reflectances of objects in the scene. It asked observers the sensation question, namely, “What is the appearance of the papers?”
The Black and White Mondrian makes a number of important points about human vision.
• White and black reflectances can have identical radiances in nonuniform illumination.
• Identical radiances can have any sensation (white to black).
• In a complex scene, radiance cannot predict appearance.
• The appearance of an area cannot predict the radiance of that area.
• Tone scale maps, using single pixels, cannot improve HDR images.
Tone scales can only improve regions of an image. If white and black reflectances have the same digital value, a single-pixel tone scale map cannot make changes in different directions. It cannot make the white area lighter, while making the black area darker. Improving an HDR scene reproduction requires spatial modifications [Ref. 21 (Chapter 31)].
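The argument about single-pixel tone scales can be checked with a minimal sketch. The reflectance and illumination numbers are hypothetical, chosen only so that a white paper in dim light and a black paper in bright light have the same luminance; the three tone curves are arbitrary examples.

```python
def luminance(reflectance_percent, illumination):
    """Luminance modeled as reflectance times illumination (relative units)."""
    return reflectance_percent * illumination

# Hypothetical gradient of illumination across the Mondrian.
white_in_dim_light = luminance(90, 10)      # 90% paper, 10 units of light
black_in_bright_light = luminance(9, 100)   # 9% paper, 100 units of light

def tone_map(pixels, curve):
    """A single-pixel tone scale: each output depends only on that pixel."""
    return [curve(value) for value in pixels]

# Three arbitrary tone curves: compressive, expansive with clipping, inverting.
curves = [
    lambda v: v ** 0.5,
    lambda v: min(v * 2, 1000),
    lambda v: 1000 - v,
]

outputs = [tone_map([white_in_dim_light, black_in_bright_light], c)
           for c in curves]
still_equal = [white_out == black_out for white_out, black_out in outputs]
```

Both papers enter with the same value, so every curve returns the same value for both and `still_equal` is true for all three. Any function of the pixel value alone behaves this way; separating the two areas requires information from other locations in the image.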
The Black and White Mondrian also points out a serious concern. One can never just look at a picture to evaluate the success of a computational algorithm’s output. Algorithm analysis requires study of the output numerical values. When we look at an output image (Visual Inspection), human spatial image processing transforms radiance information into sensations. Since radiance does not correlate with appearance, a pixel’s appearance tells you nothing about the numerical content of the output image. One cannot evaluate the computational success, or failure, of an algorithm by inspecting a processed image on a display. Human observations, while inspecting the display image, add vision’s own spatial transformations.14 Obviously, one has to use human observers to measure observer preferences for the most desirable camera images, but the evaluation of computational imaging requires an actual analysis of the numerical output values, without human signal processing.
Extending Measurements of Appearance
One of Edwin Land’s greatest talents was his unique ability to think of critical experiments. His experiments tested the fundamental principles of a hypothesis or theory. As described above, Land used Color and Black and White Mondrian experiments as an exploration of the imaging properties of vision. These simple combinations of measurements of reflectance, illumination, and human sensations made an essential contribution to our thinking about appearance.
Can we add to Land’s experiments with additional tests, which inform us about the fundamental mechanisms of vision and provide additional ground truths for our models? Can we use the quantitative measurements of human responses to scenes to better test our models?
Surrounds and averages
What are the important properties of an image’s digital content? Should we look to image averages, contrast ranges, histograms, or other metrics of scene content?
Following the modeling protocol described in the 1960s,14 McCann et al. measured the appearances of lightnesses using many types of scene contents. This set of targets included variations in reflectances, uniform and gradient illuminations, and visual phenomena in order to study vision’s spatial properties. An essential part of this study was to include test targets in which appearances did not correlate with reflectances. Figure 10 shows a series of 15 black-and-white test targets used to evaluate lightness models. The targets were transparencies with a dynamic range of , with angular subtends of . The targets included variations in scene average luminance, gradients in illumination, variations of simultaneous contrast, extremes in background, and combinations of edges and gradients. The entire scene of calibrated luminances was the input to each spatial vision model. Observers matched the lightness of all the areas in all targets. Models of appearance calculated sensations using scene radiances as input. The results compared calculated sensations for all image segments with corresponding observer matches. Observer inspection of processed images and observer preferences were not part of the evaluation. The results showed that all these design parameters shown in Fig. 10 have small influences on matching sensations. Scene averages, contrast ranges, histograms, or other metrics of scene content were not critical factors for modeling matching sensations. We were able to fit all these experiments using a single set of model parameters. The fits of the Simultaneous Contrast, Albers, and Gradients with Edges data were the most sensitive to these model parameters [Ref. 21 (Chapters 32 and 35)].
Spatial relationships versus image statistics
Robert Savoy made a set of six targets with identical histograms; namely, he used constant areas of a dark-gray test patch, and constant areas of maximum luminance (white) and minimum luminance (black) surrounds in a dark room. Figure 11 (top) shows the spatial arrangement of six scenes made from identical pixel populations.32 The displays had a constant 2.5 deg dark-gray square at the center.
The background around test area T was constant (0.1% transmission), with the exception of the addition of a fixed number of maximum luminance pixels (1.0% transmission) in a variety of spatial arrangements.
Figure 11 (middle row) shows the measurements of the variable appearance of test area T from identical pixel populations. The same pixel populations are just rearranged in their spatial locations. All six targets had the same-size constant luminance central square area, labeled T.
In Fig. 11 (left target), all the maximum radiance pixels surround the test square. Observers matched T to Lightness 1.5, nearly black.
In Fig. 11 (right target), all the maximum radiance pixels are adjacent to the test square on only one side. Observers matched the test square to Lightness 3.9, near to middle gray (Lightness 5.0). Other spatial arrangements gave intermediate matches. Despite identical histograms, lightness varied over 30% of the range from white to black when viewed separately.
The set of six targets has different spatial positions of maximum luminance pixels and different adjacent stimuli. Asymmetry, contiguity, and enclosure are important. There is no simple rule that explains this spatial data. The only direct conclusion is that neither scene averages (Grayworld) nor the population of luminances (histogram) controls appearance.
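The point that identical pixel populations can have different spatial arrangements can be illustrated with a minimal sketch. It is not Savoy's actual target geometry: the 7 x 7 grid, the two layouts, and the test-patch value `TEST` are hypothetical placeholders; only the 0.1% and 1.0% transmission values come from the text.

```python
from collections import Counter

# Percent transmission. BLACK and WHITE follow the text; TEST is a placeholder.
BLACK, WHITE, TEST = 0.1, 1.0, 0.3

def make_scene(white_cells, size=7, test_cell=(3, 3)):
    """Black background, one test cell, and white pixels at the given cells."""
    grid = [[BLACK] * size for _ in range(size)]
    for r, c in white_cells:
        grid[r][c] = WHITE
    r, c = test_cell
    grid[r][c] = TEST
    return grid

def histogram(grid):
    """Pixel population: how many pixels have each value."""
    return Counter(v for row in grid for v in row)

def mean(grid):
    cells = [v for row in grid for v in row]
    return sum(cells) / len(cells)

def white_neighbours(grid, cell=(3, 3)):
    """Count white pixels among the 8 neighbours of the test cell."""
    r0, c0 = cell
    return sum(
        grid[r0 + dr][c0 + dc] == WHITE
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )

# Layout 1: the 8 white pixels enclose the test cell.
enclosed = make_scene([(r, c) for r in (2, 3, 4) for c in (2, 3, 4) if (r, c) != (3, 3)])
# Layout 2: the same 8 white pixels sit in a block on one side of the test cell.
one_side = make_scene([(r, c) for r in (2, 3, 4, 5) for c in (4, 5)])

print(histogram(enclosed) == histogram(one_side))
print(white_neighbours(enclosed), white_neighbours(one_side))
```

Any metric computed from the histogram alone, including the scene average, returns the same value for both layouts, so it cannot distinguish the two scenes.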
Local image statistics
There are a number of studies that provide a challenge to models of vision using local statistics. One study measures the appearance of a central gray square with eight surround squares.33 Half of the surrounding squares are white, the other half black. The experiment measures the sensations of the central gray in all the combinations of spatial arrangements.
Figure 12 is a plot of segment pattern versus log matching luminance (LML). The graph plots the eight-white elements; all 14 patterns with 4 white and 4 black elements; and 8-black elements in the surround.33 They are sorted from left to right in order of increasing average LML. The two lowest LML values are from all-white, and 0 of 4 adjacent blacks. The next two patterns have one adjacent black, and the following seven LML values have two adjacent blacks. The next four LML values have three adjacent blacks, with more variability than previous patterns. The highest matching luminance is for the eight black squares.
Contrast is the psychophysical term used to describe the observation that a gray test area looks lighter when adjacent to black areas. The range of contrast effect from all-white to all-black surrounds is identified with small images of them on the vertical axis (Fig. 12). When we varied the eight half-white and half-black surrounding areas, we measured matching luminances that nearly covered the entire contrast range.
The adjacent segments have more influence than the diagonal segments on matching luminance. The data from the 14 test targets with 4-white and 4-black elements correlate with the number and location of gray-black edges/gray-white edges.33 Those data do not correlate with the constant average luminance of the surround (Grayworld) and the constant pixel-luminance histogram of the test target.
All of these detailed studies [Ref. 21 (Chapters 20 to 25)] point out that the spatial organization of boundaries is in control of sensations. Lightness appearance correlates with:
• spatial comparisons at edges;
• the direction of the spatial comparison;
• the enclosure by areas of higher luminance;
• the angular subtend of areas;
• and the separation from local maxima.
Scene statistics can neither account for observer matches nor model appearance.
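The structure of the eight-square surround experiment can be sketched by enumeration. The code below is an illustrative assumption, not the procedure of the cited study: it lists every placement of four black squares among the eight surround positions, so it counts more patterns than the 14 that were tested. The position names are hypothetical labels.

```python
from itertools import combinations
from collections import Counter

ADJACENT = ("N", "E", "S", "W")       # share an edge with the central gray square
DIAGONAL = ("NE", "SE", "SW", "NW")   # touch the central square only at a corner
POSITIONS = ADJACENT + DIAGONAL

def surround_patterns(n_black=4):
    """Yield each surround as the set of positions that are black."""
    for black in combinations(POSITIONS, n_black):
        yield frozenset(black)

def adjacent_blacks(pattern):
    """Number of black squares that share an edge with the center."""
    return sum(1 for p in pattern if p in ADJACENT)

def white_fraction(pattern):
    """Fraction of surround squares that are white (a surround statistic)."""
    return (len(POSITIONS) - len(pattern)) / len(POSITIONS)

patterns = list(surround_patterns())
by_adjacent = Counter(adjacent_blacks(p) for p in patterns)
fractions = {white_fraction(p) for p in patterns}

print(len(patterns), sorted(by_adjacent.items()), fractions)
```

Every pattern has the same white fraction, and therefore the same surround average and histogram, while the number of adjacent black squares ranges from zero to four. A surround statistic is constant across the set; the count of adjacent edges is not.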
Simultaneous contrast is the familiar demonstration that surrounds affect appearance. Figure 13 illustrates the test target. This simple experiment uses two identical gray papers on white and black surrounds. Observers report that gray-on-white appears darker than the same gray-on-black. What makes the experiment more interesting is the fact that the retinal stimulus of the apparently darker square is higher than the other. When we consider intraocular glare, the white surround scatters light into its gray square, yet it looks darker. Why does more light look darker? Two powerful spatial mechanisms, “intraocular glare” and postquanta catch “neural contrast” tend to cancel each other. Neural contrast is slightly stronger than “glare” for this target. It overcompensates glare, making the gray-on-white darker.
The effects of intraocular glare are hard to see, except in severe clinical cases. Nevertheless, glare limits the range of light that reaches our retinas. Depending on the scene, the amount of glare can vary from very small to very large. A scene composed of just stars at night has little glare, while a beach scene will have an extremely low range of light on the retina. Despite this limit on the range of light on the retina, observers report that they see the richest, deepest blacks under high-average-luminance, high-glare conditions.
A set of HDR test targets with almost 6 log units of dynamic range was used to study the role of intraocular scatter [Ref. 21 (Chapters 14 to 19)]. The test targets have different backgrounds covering maximal to minimal glare. The target with half-white and half-black surround is shown in Fig. 14. Using Vos and van den Berg’s Glare Spread Function,34 it is possible to calculate the radiance image on the retina. The dynamic range of its retinal image is 2.0 log units. Depending on the content of the surround, the dynamic range of the retinal image changes from 1.5 to 4.0 log units.
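A toy calculation can illustrate why the retinal range depends on scene content. The sketch below is not the Vos and van den Berg Glare Spread Function: it assumes a uniform veil equal to a hypothetical fraction (`veil_fraction`) of the mean scene luminance, added to every pixel. The two scenes and all numbers are illustrative placeholders.

```python
import math

def log_range(values):
    """Dynamic range in log10 units: log10(max / min)."""
    return math.log10(max(values) / min(values))

def add_uniform_veil(scene, veil_fraction=0.01):
    """Toy glare model: add the same veil to every pixel.

    veil_fraction is a hypothetical parameter, not a measured property of the eye.
    """
    veil = veil_fraction * sum(scene) / len(scene)
    return [v + veil for v in scene]

# Two scenes with the same 6 log unit range but different amounts of white.
half_white = [1.0] * 50 + [1e-6] * 50
mostly_black = [1.0] * 1 + [1e-6] * 99

for name, scene in (("half white", half_white), ("mostly black", mostly_black)):
    retinal = add_uniform_veil(scene)
    print(f"{name}: scene {log_range(scene):.1f}, retina {log_range(retinal):.1f} log units")
```

Even in this crude model, the scene with more white area ends up with a smaller range after the veil is added, although both scenes have the same range before it.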
Young observers, with low levels of intraocular glare, were asked to make magnitude estimates of appearances of test areas in Fig. 15. Given the endpoints of sensations (, and ), the observers estimated the appearance of 40 gray squares, in 20 pairs of squares. The vertical axis in Fig. 15 is the magnitude estimates of lightness. The plots of the retinal response functions (retinal luminance versus lightness appearance) show markedly different functions depending on scene content.
The envelope of visual response functions is measured by these experiments. There is no single visual response function to light. The response varies with the specific scene content.
Intraocular glare causes large changes in the dynamic range of light on the retina as the result of scene content. This is illustrated in Fig. 16. The first powerful spatial process is optical. Glare from all parts of the scene reduces the retinal light range of a beach scene to very low levels. Nevertheless, apparent contrast is highest when retinal range is lowest. The second powerful spatial process is neural; it is performed by post-quanta-catch spatial processes.
The combination of these two processes is a cancelation of scene-dependent glare by scene-dependent neural contrast. The first spatial mechanism introduces substantial changes to the optical image, and the second mechanism transforms the neural response. Remarkably, the resulting sensations minimize the effects of intraocular glare. They show only small residual differences in appearance. Objects appear more constant because of the powerful post-quanta-catch neural processing.
Summary: Observer Data Defines a Model of Spatial Vision
The ensemble of Lightness experiments reviewed in Sec. 3 measures important properties of human vision. This ensemble reveals vision’s unique pair of spatial-image-processing mechanisms, one optical and one neural. Section 2 documents the need for three independent color Retinex channels, each with spatial lightness rendering.
To understand, and improve, our image reproduction algorithms, we must understand how human vision processes our reproductions. If a reproduction has to reproduce what we see in all scenes, then that process must have a sophisticated model of human spatial vision.
Retinex Scene Reproduction Algorithms
Land initiated the idea that we needed a model of spatial vision to make better reproductions. That model needed to capture the wide range of scene radiances as input, spatially compare them to calculate sensations, and then display them.16,30,31
The Land and McCann Retinex Reproduction Algorithm has four ideas as its foundation. They are analogous to the four legs of a table.
1. Retinex is a model of human vision. The idea was to make better reproductions by incorporating an algorithm that mimicked vision. The first leg was extensive measurements of appearance in a wide variety of scenes in which appearances did not correlate with luminances [Refs. 13 and 21 (Chapter 35)].
2. The Land and McCann (L&M) Reset: As described in Ref. 12, it was an accident. The Retinex analog electronic circuit had a reset introduced by the electronics that acted to normalize the output.35 When we modeled the reset’s properties, we learned that it acted to normalize the different values reported on different paths. We found empirically that the reset and the right path length were the only parameters needed to model all of our difficult appearance test targets. We also found that the threshold, a logical operation designed to remove gradients, did not mimic vision (McCann).17,20 Using the Land and McCann Reset, we learned that we could successfully mimic vision. We used that data to set the parameters of our reproduction model.
3. Computational efficiency: In the 1970s, any attempt to perform electronic image processing had to be extremely efficient. By 1975, we abandoned 1-D paths and moved to experimenting with 2-D array processors to implement our algorithms.16 The L&M Reset was extremely efficient as a design feature. The Zoom Multiresolution implementation17 is O(N) in big-O notation.
4. Sensation versus perception: In 1980, at the AIC conference in Cambridge, United Kingdom, L&M Retinex made a major clarification of our language about the Retinex model. We turned to the JOSA definitions of Sensation and Perception.36 We wanted to differentiate our bottom-up model (sensation) from Helmholtz’s idea of discounting the illumination to recognize surface reflectance. Using the OSA definitions, perception implies recognition, implies top-down lightness generation, implies Helmholtz, not Land. That lecture36 described in detail that the Retinex problem of calculating lightnesses was about predicting the sensations caused by gradients and edges. For sensations, reflectance and illumination describe the physics of the stimulus but are not always correlated with apparent sensations. Perception experiments can measure a human’s ability to recognize objects and estimate their reflectances and illuminations. Perception experiments, beyond the scope of this paper, generate different data from sensation experiments.36
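The reset described in the second leg can be sketched for a single 1-D path. This is a minimal illustration under stated assumptions, not the published implementation: it works in log units, uses no threshold, starts the path at 0 (that is, it assumes the first pixel is the maximum until a ratio proves otherwise), and omits the averaging over many paths of a chosen length. The function name and the example radiances are hypothetical.

```python
import math

def path_lightness(radiances):
    """Ratio-product-reset along one path, in log10 units.

    Each output is the accumulated log ratio relative to the largest
    radiance met so far on the path; 0.0 stands for the maximum.
    """
    log_product = 0.0          # assumption: the starting pixel is treated as the maximum
    outputs = [log_product]
    for previous, current in zip(radiances, radiances[1:]):
        # Ratio at the step between neighbours, accumulated as a sum of logs.
        log_product += math.log10(current / previous)
        # Reset: the running product may not exceed the maximum.
        if log_product > 0.0:
            log_product = 0.0
        outputs.append(log_product)
    return outputs

# Hypothetical radiances along one path.
print(path_lightness([10.0, 100.0, 50.0, 5.0]))
```

On this path the step from 10 to 100 triggers the reset, so the later pixels are reported relative to 100 rather than to the arbitrary starting pixel. A single path still assigns the wrong value to pixels visited before the maximum, which is why different paths report different values that need to be combined.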
An important additional problem is that the spatial algorithm that mimics vision resides in the middle of the scene-reproduction processing pipeline. Assuming that the model successfully calculates sensations, we still have the practical problem of transforming that 2-D array of sensations into the appropriate signal for the reproduction media device. The print or display device needs an image that is calibrated for its conversion process from digits to light, viewed by the observer. That postspatial process also requires chroma and tone-scale enhancements to suit consumers’ preferences.16
Unfortunately, it can be much more convenient to take a shortcut. If the goal is simply to make a better scene reproduction, one can take a photograph of a scene, apply a spatial algorithm, and send that processed image to the output device. This shortcut removes two tedious tasks:
• First, it omits camera calibration to capture accurate radiance information.
• Second, it replaces the task of matching sensations with just asking the observer to evaluate the output. Which image looks best? Or, does the image appear to have the desired improvement?
Many authors have used this approach. There is no doubt that their algorithms have made improved renditions of the images that they selected. But, are these algorithms successful models of vision? Do these algorithms provide a general solution to the problems of scene reproduction? Or, are they simply singular examples of trial-and-error image manipulations?
The biggest problem with the visual inspection technique is that it does not include a discussion of the role of human vision in the algorithm’s evaluation process. If vision is a powerful spatial image processing mechanism, then what are the specific effects of using vision to measure success? Looking at the algorithm’s output image means that the observer is applying those same spatial image processing mechanisms a second time in looking at the experiment.
It is a mistake to use observer preferences to evaluate the accuracy of a model of vision. It fails to separate the model’s spatial processing from subsequent human spatial processing. Is the human processing the source of the improvement, rather than the digital algorithmic processing?
Additional Retinex Algorithms
In the Art and Science of HDR Imaging, Section F: HDR Image Processing,21 McCann and Rizzi attempt to discriminate between all the different Retinexes and related algorithms. That description took about 100 pages to cover the history and make clear distinctions between algorithms.
McCann and Rizzi defined and differentiated the following: Land and McCann Retinexes (L&M); Frankle and McCann Retinex (a 2-D implementation of L&M); Designator Retinexes (Land’s new sampling technique, which does not discuss Reset); Andy Moore’s Resistive grids (Land’s Designator); NASA Retinex (Jobson et al. extension of Designator); Gamut Retinex (L&M spatial gamut mapping); Milano Retinex (Rizzi et al.); Kotera Retinex (an extension of NASA Retinex); Sobol Retinex (an extension of L&M Retinex used in a line of HP cameras); Variational Retinex (Provenzi, Morel, Wilson & Cowan; Bertalmio and others). A key issue is whether Retinex algorithms can be formalized or require implementation by iterative processes. Provenzi et al.37 use ratio-product-reset-average steps in the Milano Retinex. They used a different reset process that allows formalization. Morel et al.38 state that the original Land and McCann Retinex reset cannot be formalized in PDE. These distinctions are important21 but beyond the scope of this review.
The intent of this review paper in Retinex at 50 is to focus on the underlying Land and McCann model of vision. The principles of many other Retinexes are covered in other papers in this Special Edition.
Discrimination between Spatial Algorithms
The dual challenge of Retinex continues today. How do we model human vision? How do we make better reproductions using that model? The answer to that challenge will be determined by the ground-truth data that we decide are important in evaluating images.
The quality of ground-truth selection will determine the quality of the algorithms. We need to get beyond simple evaluation principles of observer preference, color balance, and HDR compression. The Retinex approach studied human vision to understand its mechanisms. By thoughtfully collecting sets of difficult scene content, we can improve our ability to discriminate between moderately successful algorithms for some scenes and excellent algorithms for all scenes. In recent decades, the number and diversity of spatial algorithms have expanded dramatically. However, visual inspection of processed images lacks the discrimination to identify superior algorithms.
The original Retinex process used measured sensations created by a collection of challenging scene content: color constancy, gradients in illumination, constant spatial statistics, and illumination with edges. Each of these scenes provides a different challenge for a model of human vision. A successful model of vision should be able to predict observer matches in all these scene contents.
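Scoring a model against observer matches can be sketched with the two Savoy matches quoted earlier (Lightness 1.5 and 3.9 for targets with identical histograms). The helper names are hypothetical. The sketch assumes only that a model driven by the histogram alone must return the same prediction for both targets.

```python
# Observer matches for two targets with identical pixel populations (from the text).
observer_matches = {"white_encloses_test": 1.5, "white_on_one_side": 3.9}

def worst_case_error(predictions, matches):
    """Largest absolute difference between prediction and observer match."""
    return max(abs(predictions[name] - matches[name]) for name in matches)

def best_constant_prediction(matches):
    """The single value that minimizes the worst-case error over all targets."""
    values = list(matches.values())
    return (min(values) + max(values)) / 2

# A histogram-only model sees identical inputs, so it predicts one value for both.
constant = best_constant_prediction(observer_matches)
histogram_model = {name: constant for name in observer_matches}

print(constant, worst_case_error(histogram_model, observer_matches))
```

Under that assumption, no histogram-only model can have a worst-case error below half the spread of the observer matches on this pair of targets, however its parameters are tuned.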
Retinex Falls between Colorimetry and Perception of the Surface of Objects
Colorimetry makes the unspecified assumption that spatial processes are absent from vision. While everyone agrees that quanta catch is necessary in a model of vision, no one should argue that it is sufficient. Human color vision is a spatial process.
There is an equally bad underlying “perception” assumption, namely, that “Objects Appear Constant” in all complex scenes. Here, the pendulum has swung to the opposite extreme. The underlying assumption is that a surface’s reflectance controls its appearance. Unfortunately, many authors mistakenly cite Land’s experiments as evidence for this idea. Some even cite Land’s experiments as evidence that spatial image processing can “discount the illumination,” so as to separate illumination from reflectance. Retinex does not do that. That notion is incompatible with Land’s writings:
• The last sentence in Land’s Ives Medal Address: “the function of retinex theory is to tell how the eye can ascertain reflectance in a field in which the illumination is unknowable and the reflectance is unknown.”12
• In the discussion of the “biological correlate of reflectance” [Refs. 11, 13, 15, 21 (Chapter 32)], Land cited many examples of test stimuli in which lightness did not correlate with physical reflectance.
Just as we cannot think that cone quanta catch alone can predict color, we cannot think that all objects always appear constant. Both the “Colorimetry” and the “Objects Appear Constant” assumptions are incompatible with accurate measurements of vision.
The origin of the word “Retinex” was the observation that color appearance in complex scenes correlated with the triplet of apparent lightnesses in L, M, S illuminations. Regardless of the cause of the lightness changes, when two identical physical objects look different, color appearances correlate with their L, M, S lightnesses [see Ref. 21 (Chapter 27)].
Figure 17 (top left) shows two identical sets of nine red squares. When the same sets of nine squares are surrounded by yellow and blue stripes, the left and right sets no longer have the same color (top center). On the left side, the red patches fall on top of the yellow stripes; and on the right side, they fall on blue stripes. The left patches appear a purple red, while the right ones appear a yellow orange. In other words, the left patches appear more blue, and the right ones more yellow.39
In Fig. 17 (bottom), the apparent lightnesses of the sets of red squares are different:
• in the L separation, the squares are lighter on the right;
• in the M separation, the squares are lighter on the right;
• in the S separation, the squares are darker on the right.
Land’s Retinex Theory predicts that whenever L and M separations are lighter and the S separation is darker, then that patch will appear more yellow. Whenever the S separation is lighter, and L and M separations are darker, then those squares will appear more blue. Colors correlate with L, M, S lightnesses.
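The two cases stated above can be written as a small rule. This is a sketch of the verbal prediction only, not a Retinex model: the function name and the signed-difference inputs are hypothetical, and no magnitudes are implied.

```python
def predicted_color_shift(d_l, d_m, d_s):
    """Map lightness differences in the L, M, S separations to a color shift.

    Each argument is the difference in apparent lightness between two
    physically identical patches (positive = lighter) in one separation.
    Only the two cases stated in the text are covered.
    """
    if d_l > 0 and d_m > 0 and d_s < 0:
        return "more yellow"
    if d_l < 0 and d_m < 0 and d_s > 0:
        return "more blue"
    return "not covered by the two stated cases"

# Fig. 17, right squares relative to left: lighter in L and M, darker in S.
print(predicted_color_shift(+1, +1, -1))
# Left squares relative to right: the signs are reversed.
print(predicted_color_shift(-1, -1, +1))
```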
Land’s Retinex Theory predicts that color in complex scenes correlates with the apparent lightnesses in long-, middle- and short-wave light. The triplet of Retinex Lightnesses, rather than the triplet of surface reflectances, predicts color appearance. That prediction still stands after more than 50 years.
The Retinex Theory of Color led to a wide variety of spatial image algorithms discussed in this Retinex at 50—Special Issue. Land introduced the idea that a model of spatial vision should be the foundation of spatial image processing algorithms that make better scene reproductions. Furthermore, the measurements of observer sensations should be the ground truth used to design and to evaluate the success of these algorithms. This paper reviews the ground truth measurements that can help us model vision. Furthermore, these ground truth data help us find the general solution for image reproductions for all types of scenes.40
This body of work would not have been possible without the inspiration of Edwin Land and Ansel Adams; the teaching of Nigel Daw and John Dowling; and the wonderful collaboration with Jeanne and Steve Benton, Vaito Eloranta, Ed Purcell, Tom Taylor, Suzanne McKee, John Hall, Jay Scarpetti, Jon Frankle, Bill Roberson, Hugo Liebmann, Bob Savoy, Allan Stiehl, Bill Wray, Karen Houston, Jim Burkhardt, Jay Thornton, Bob Sobol, Marzia Pezzetti, Ale Rizzi, Ivar Farup, Carinna Parraman, and Vassilios Vonikakis; and the special collaboration with Mary McCann.
John J. McCann received a degree in biology from Harvard College in 1964. He worked in and managed the Vision Research Laboratory at Polaroid from 1961 to 1996. He has studied human color vision, digital-image processing, large-format instant photography, and reproductions of fine art. His publications/patents have studied Retinex theory, color constancy, color from rod/cone interactions at low-light levels, appearance with scattered light, and HDR imaging. He is an IS&T and Optical Society of America (OSA) Fellow. He is a past president of IS&T and Artists Foundation, Boston. He is the IS&T/OSA-2002 Land Medalist and IS&T-2005 Honorary Member.