This is a historical translation of the seminal paper by H. Gamo, originally published in Oyo Buturi (Applied Physics, a journal of The Japan Society of Applied Physics) Vol. 25, pp. 431–443, 1956. English translation by Kenji Yamazoe, with further editing by the translator and Anthony Yen.
Since optical systems have distinctive features as compared to electrical communication systems, some formulation should be prepared for the optical image in order to use it in information theory of optical systems. In this paper the following formula for the intensity distribution of the image by an optical system having a given aperture constant α in the absence of both aberration and defect in focusing is obtained by considering the nature of illumination, namely coherent, partially coherent, and incoherent:
Discussion of the relationship between the object and the image of an optical system using information theory is a recent topic. Some basic studies such as describing the optical system using the response function have been done. However, entropy or noise is not discussed enough, though it is important in information theory. Information theory has been successfully applied to electrical communication, but the way it is applied is not directly applicable to an optical system. Hence, we need to formulate the information theory taking specific properties of the optical system into account.
The properties specific to the optical system are the following ones. (1) Observable quantity is not the amplitude of the wave but its intensity represented by the square of the amplitude; the intensity must be positive. (2) When we regard the optical system as a spatial filter, the phase and amplitude of the response function defined in the optical system are independent physical quantities; in the electric circuit, they are restricted by causality because they change over time. (3) Information contained in the image depends on the coherence of the illumination; for example, it is possible to obtain the information of amplitude and phase of coherent illumination; however, the phase information cannot be obtained when the illumination is incoherent. (4) If we regard the optical system as a communication channel, noise is an important quantity in deriving its capacity; in the optical system, noise can be stray light, disturbed observation due to random movement of the medium, graininess of a film, perception, and so on; these factors will add additional intensity to the nominal intensity or may partially reduce the nominal intensity; the resultant intensity will, however, not be negative. (5) Optical system behaves like a multi-dimensional spatial filter; in most cases, it is sufficiently described by a two-dimensional spatial filter.
These properties should be considered when we formulate information theory for the optical imaging system. If we do not take them into account, we cannot finalize the formulation of the information theory for the optical imaging system and may only get an insight that might help us understand the optical system.
In what follows, the author examines an aberration-free optical imaging system taking into account the first and second properties described above. In particular, the author will discuss how information contained in the image changes with respect to the third property, i.e., the coherence of illumination, using the sampling theorem that is often used in communication theory. Distribution of the image intensity will be described by a positive-definite Hermitian matrix, termed “intensity matrix,” and the physical meaning of the image will be revealed from the properties of the matrix. For this analysis, “phase coherence factor” introduced by H. H. Hopkins et al. will be used.
“The sampling theorem” used hereafter is briefly reviewed. We assume an optical imaging system in Fig. 1 consisting of a light source, an object, lens, and an image.
Let be an incident wave from a point on the source to a point on the object plane. When the complex transmission of the object is , the complex amplitude of the wave after object is . As shown in Fig. 1, if we define as the maximum half-angle of a light cone to be captured by the lens and to be the wavelength, then corresponds to the role of bandwidth of the electrical communication system. If we Fourier transform the wave , we obtain
In Eq. (1), represents the directional cosine of the plane wave; thus, is the amplitude of the plane wave that propagates with the angle of . Here, that forms the image is band-limited due to the aperture of the optical system. Since behaves like the bandwidth of the optical system, as briefly noted above,
The amplitude of the wave on the image plane is given by the inverse Fourier transform of , which has the bandwidth of . Thus, from Eq. (3)
On comparing Eq. (3′) and Eq. (4), we notice that is obtained by multiplying to which is the image amplitude at . Thus,
Therefore, the image amplitude obtained through the optical system with the bandwidth of due to its numerical aperture can be determined if the image amplitude is sampled at an interval of . The function set that appears in Eq. (5), i.e., forms a complete orthogonal system. Hence, with any integer and ,10). This result corresponds to the sampling theorem applied to a band-limited electrical system, with the only difference that the amplitude to be sampled is a complex number. The complex amplitude originates from the second property of the optical imaging system noted in Sec. 1. Although this result is obtained with an aberration-free one-dimensional imaging system, we can extend this idea and apply it to a two-dimensional imaging system with a square aperture. If we assign the basis function set to be , any band-limited complex amplitude can be represented by a series expansion. A rigorous treatment of the sampling for a circular aperture is not straightforward, but the sampling grid of a circular aperture is the same as that of the square aperture. For example, given a circle, we can define a sampling grid using circumscribed squares, and the amplitude distribution can be determined by the sampled values, though these sampled values are not fully independent of each other.
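The sampling theorem reviewed above can be checked numerically. The following is a minimal sketch, not the author's calculation: it assumes a normalized cutoff frequency `W` (a hypothetical stand-in for the bandwidth set by the aperture), a made-up band-limited complex test amplitude, and a truncated sampling series with sinc basis functions.

```python
import numpy as np

W = 1.0                 # assumed cutoff spatial frequency (hypothetical units)
step = 1.0 / (2.0 * W)  # sampling interval implied by the bandwidth

def amplitude(x):
    """Made-up complex amplitude whose spectrum is confined to |nu| <= W."""
    return ((1.0 + 2.0j) * np.sinc(W * (x - 0.3)) ** 2
            + (0.5 - 1.0j) * np.sinc(W * (x + 1.1)) ** 2)

def reconstruct(x, n_max=200):
    """Sum the sampling series over the samples at n * step, |n| <= n_max."""
    n = np.arange(-n_max, n_max + 1)
    samples = amplitude(n * step)  # complex sampled values
    # np.sinc(u) = sin(pi u) / (pi u), so each column is one sampling function
    basis = np.sinc(2.0 * W * x[:, None] - n[None, :])
    return basis @ samples

x = np.linspace(-2.0, 2.0, 81)
max_error = np.max(np.abs(reconstruct(x) - amplitude(x)))
print(max_error)  # small; limited only by truncating the series
```

The samples are complex, which is the one difference from the electrical case noted in the text.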
The above discussion holds when the illumination is a point source; in other words, it holds only when the illumination is coherent. Even then, its physical meaning is revealed only when we can measure the phase and amplitude independently by some appropriate means, such as a phase-difference measurement. This is due to the first property noted in Sec. 1: only the intensity can be measured directly. In the following, a method to analyze the physical meaning of the image intensity is introduced that can be applied even when the illumination is partially coherent or incoherent.
Intensity Matrix Related to the Phase Coherence Factor
We are able to obtain the image intensity distribution by Eq. (5) in Sec. 2. Letting be the image intensity at a point on the image plane with coordinate ,
We now consider the image intensity from a generalized source of a finite extent with an arbitrary intensity at each point. Letting be the point source intensity at a point with coordinate , the image intensity at a point is given by
Since point sources are mutually incoherent, the total intensity is the sum of the intensities formed by the individual point sources. This fact can be explained from the statistical viewpoint as follows. Intensity is found by taking the time average of the square of the amplitude of the light wave. Since each point source is uncorrelated with all the others, the time average of the square of the superimposed light waves is equal to the sum of the time averages of the squares of the individual light waves.
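The statistical argument above can be illustrated with a small simulation. This is a sketch under stated assumptions: two waves with fixed amplitudes `a1` and `a2` (hypothetical values) and independent, uniformly random phases stand in for two mutually incoherent point sources, and an average over random draws stands in for the time average.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

# Two waves with fixed amplitudes and independent random phases (assumed model).
a1, a2 = 1.0, 0.7
wave1 = a1 * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n_samples))
wave2 = a2 * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n_samples))

# Average of the squared modulus of the superposition ...
intensity_of_sum = np.mean(np.abs(wave1 + wave2) ** 2)
# ... versus the sum of the individual averaged intensities.
sum_of_intensities = np.mean(np.abs(wave1) ** 2) + np.mean(np.abs(wave2) ** 2)

print(intensity_of_sum, sum_of_intensities)
```

The cross term averages toward zero as the number of draws grows, so the two printed values agree up to statistical fluctuation.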
The main feature of this paper will be based on Eq. (7) which has a quadratic form of variables and with coefficients . Since the intensity cannot be negative, this is a positive quadratic form. The matrix with elements is termed “intensity matrix” in this paper. (The intensity matrix has properties similar to those of the information matrix proposed for general systems by D. M. MacKay.7 Although N. Wiener8 proposed the coherency matrix by assuming time-varying light waves, it is not applied to an optical imaging system.) Once this matrix is given, we can determine the image intensity distribution. In this sense, the intensity matrix contains all the information of the image intensity distribution. In what follows, the physical property of the intensity matrix will be examined.
The element of the intensity matrix is expressed using the Hopkins’ phase coherence factor. The use of the phase coherence factor is beneficial for our purpose because it has been studied in detail. First, Eq. (7′) is changed to (See Sec. 8.)
Equation (8) shows the relationship of to the phase coherence factor. Inside Eq. (8), we single out the integral with respect to the source coordinate , and set it as . Then
This value can be regarded as a correlation between light wave amplitudes at points and on the object plane. If the amplitude at each point generated by a point source at are and , we obtain
When the intensity at is and at is ,
Here, is the phase coherence factor which is by definition given by
When the light source illuminates the object, the intensity (, 2) is the square of the absolute value of complex amplitude of the incident light at , which is represented as . Therefore,
Equation (11) is our first formulation. We discuss a few examples of the intensity matrix in the next section.
Examples of the Intensity Matrix
When the light source is coherent, the phase coherence factor is 1 regardless of or .2 Therefore, by substituting into Eq. (11), the integral can be separated into a product of two integrals to yield
The matrix element is a product of the complex amplitude on the image plane at and . This result simply produces Eq. (6) of Sec. 3, but analyzing the meaning of the above matrix will give us an important insight for the remaining part of this paper. In this coherent source case, a remarkable feature of the matrix is that its rank is 1 and its single eigenvalue determines . The rank of a matrix is said to be when the determinant of sub-matrix with the order lower than or equal to is non-zero but the determinant of all sub-matrices with the order higher than is 0. If the matrix element is given by Eq. (13), we may notice that the determinant of all sub-matrices with the order higher than 1 is 0. The characteristic equation that gives the eigenvalue is
In this case, the phase coherence factor is
When this is substituted into Eq. (11), the element of the intensity matrix for the incoherent source is given by the single integral as
Equation (14) is connected to the equation that gives the image from an incoherent source with intensity , which can be expressed as an integral by the incoherent imaging formula
According to Eq. (14′), a single integral yields the image, whereas the expression by the intensity matrix seems to involve more integrations, which gives the impression of extra computational cost. This point is discussed at the end.
The simplest example is an object with uniform brightness. If we set where is a constant
[The intensities I(y) in Eq. (14′) and I(y) from Eq. (15) must be the same (See Sec. 9).] Our next example is simple and basic, in which the object has a sinusoidal amplitude over the object plane. The brightness on the object plane is
In this case, the element of the intensity matrix is represented by the following integral
We now change the variables. Letting and ,
This type of definite integral will appear in the following but since the result may not be found in standard integral tables, the principal result together with its derivation is listed in Sec. 10. According to Sec. 10, when , i.e., ,
When or equivalently , is 0. Because is 0 above a certain spatial frequency threshold, the response function for the image intensity in Eq. (14) has a triangular shape1 and vanishes beyond the bandwidth of .
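The triangular response can be illustrated with the standard fact that the incoherent response of a uniform one-dimensional (slit) pupil is the autocorrelation of the pupil. The sketch below assumes a discretized pupil of `2 * half_width + 1` grid points (hypothetical names) and is independent of the integrals of Sec. 10.

```python
import numpy as np

half_width = 100                         # pupil half-width in grid points (assumed)
pupil = np.ones(2 * half_width + 1)      # uniform, aberration-free slit pupil

# The incoherent response is the autocorrelation of the pupil function.
response = np.correlate(pupil, pupil, mode="full")
response = response / response.max()

# Frequency shift in grid units; the support is twice as wide as the pupil.
shift = np.arange(-2 * half_width, 2 * half_width + 1)
triangle = 1.0 - np.abs(shift) / pupil.size

print(np.allclose(response, triangle))
```

The response falls linearly from its central maximum and reaches zero at twice the pupil half-width, which is the frequency threshold above which a sinusoidal brightness no longer contributes.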
We now consider the rank and eigenvalues of the intensity matrix. First, when the object brightness is uniform, the intensity matrix in Eq. (15) is a diagonal matrix with non-zero diagonal elements. Next, when the object brightness changes sinusoidally, especially when or equivalently , the intensity matrix is also a diagonal matrix with the diagonal element of . These examples show that incoherent illumination gives the greatest matrix rank in contrast to coherent illumination which gives the minimum matrix rank of 1. (When the object size is finite, the object intensity or complex transmittance can be represented by a Fourier series; hence we may be able to apply the results of Secs. 4.2 and 4.3 to each Fourier term.)
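The contrast in rank between the two limiting cases can be reproduced numerically. The sketch below makes two assumptions that are not spelled out in the surviving text: the image coordinate `t` is normalized so that one unit equals one sampling interval, and the incoherent matrix element for uniform brightness is the integral of the product of two sampling functions sin π(t − n) / π(t − n). The vector `u` of sampled coherent amplitudes is made up for illustration.

```python
import numpy as np

# Normalized image coordinate: one unit = one sampling interval (assumption).
dt = 0.05
t = np.arange(-2000.0, 2000.0, dt)
n = np.arange(-2, 3)
basis = np.sinc(t[None, :] - n[:, None])     # one row per sampling function

# Incoherent source, uniform brightness (assumed form of the matrix element):
# A_nm = integral of u_n(t) u_m(t) dt, evaluated here as a Riemann sum.
A_incoherent = (basis * dt) @ basis.T

# Coherent source: A_nm = u_n * conj(u_m) for sampled complex amplitudes u_n.
u = np.array([0.5 + 0.2j, -1.0 + 0.3j, 0.8 - 0.6j, 0.1 + 1.2j, 0.4 - 0.9j])
A_coherent = np.outer(u, u.conj())

rank_incoherent = int(np.linalg.matrix_rank(A_incoherent, tol=1e-10))
rank_coherent = int(np.linalg.matrix_rank(A_coherent, tol=1e-10))
coherent_eigenvalues = np.linalg.eigvalsh(A_coherent)

print(rank_incoherent, rank_coherent)
print(coherent_eigenvalues[-1], np.sum(np.abs(u) ** 2))
```

Under these assumptions the incoherent matrix is close to the identity (diagonal, full rank), while the coherent matrix has rank 1 and its single non-zero eigenvalue equals the sum of the sampled intensities.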
Partially Coherent Source
We consider the one-dimensional case, under which the phase coherence factor is defined in Eq. (12) as
Let us consider the intensity matrix element for an image formed by uniform object transmittance. Substituting in Eq. (17) into Eq. (11), and furthermore letting ,10)
Next, we consider an object with a sinusoidal transmittance. Substituting in Eq. (17) into Eq. (11), and furthermore letting ,10 and only the results are shown here:
We may be aware of a few interesting points from the above example of partially coherent illumination. First, for incoherent illumination, when or , leading to no contribution to the image. Whereas for partially coherent illumination, when or , meaning more non-zero matrix elements than incoherent illumination. Second, if and as in case (iii), we will obtain the same result as that of incoherent illumination. Lastly, if and as in case (iv), the result reduces to the coherent case in the limit of . Deriving the rank of the intensity matrix with the elements given above in order to calculate its eigenvalue is an extremely difficult problem, except in some special cases. This fact implies that the image formation with the intensity matrix may be limited in practice. However, since the intensity matrix itself has interesting general properties, it can be useful in clarifying and organizing the physical concept of optical image formation.
Up to this point, we have considered an object of sinusoidal transmittance or brightness. If the object has an arbitrary distribution of transmittance or brightness, its transmittance or brightness can be expressed as a sum of periodic terms using the Fourier integral. Since each term can be treated as is explained above, we can obtain the elements of the intensity matrix when the object has an arbitrary transmittance or brightness distribution.
General Properties of the Intensity Matrix
The intensity matrix was derived by squaring the amplitude obtained from the sampling theorem applied to a band-limited system. In electrical communication, since the square of the amplitude corresponds to the electric power, we may be able to derive an equation similar to the intensity matrix. However, this idea is not as crucial as the intensity matrix in an optical imaging system because the phase coherence factor explained in Sec. 1 is a unique feature only for an optical imaging system.
For detailed discussion in the following, Eq. (7) is repeated here. The intensity is
If are regarded as vectors in a multi-dimensional space, the linear sum is another vector. The intensity is then given by the inner product of these two vectors
The degrees of freedom of the image can be equated with the number of dimensions of the multi-dimensional space introduced above, which is exactly the number of sampling points used to express the image under consideration. If the area of the image is , . Toraldo di Francia defined the degrees of freedom for a coherent source in the same way as above, but gave a different definition for an incoherent source. In comparison, it seems mathematically consistent, regardless of the coherence of the source, to define the degrees of freedom as the number of dimensions to be used to determine the image. The source coherence is contained in the intensity matrix. Therefore, one can separate the number of the dimensions determined by the numerical aperture of the imaging system from the source coherence.
Six important physical properties of the intensity matrix will be discussed next. The first physical property is related to the fact that the only observable quantity is the intensity, which must be real and positive. This fundamental fact leads to the following mathematical property:
If a matrix is a Hermitian matrix, its elements have the following property
Next, let us consider the direct relationship between the elements of the intensity matrix and the observable physical quantity. The first relationship is:
ii. Diagonal elements of the intensity matrix are equal to the intensities sampled at the interval of , and the trace of the intensity matrix is equal to the integrated image intensity over the image plane.
The first half of the statement can be understood from Eq. (7) in which is 1 at the sampling point of and zero at other sampling points. For the second half of the statement, the integrated image is obtained by taking the integral all over the image plane; therefore it is proved by integrating Eq. (7) with reference to Eq. (5′),
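Properties i and ii can be checked on a made-up example. The sketch assumes a normalized coordinate `t` in which the sampling functions are sin π(t − n) / π(t − n) and in which the integrated intensity equals the trace without an extra scale factor; the matrix `A` is a random positive semi-definite Hermitian matrix, not one derived from a physical source.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
B = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = B @ B.conj().T        # Hermitian and positive semi-definite by construction

def intensity(t, A):
    """I(t) = sum over n, m of A[n, m] u_n(t) u_m(t) with sinc sampling functions."""
    n = np.arange(A.shape[0])
    u = np.sinc(t[:, None] - n[None, :])
    return np.einsum("tn,nm,tm->t", u, A, u).real

# Diagonal elements equal the intensities at the sampling points t = 0 .. N-1.
sampled = intensity(np.arange(N, dtype=float), A)

# The trace equals the integrated intensity (Riemann sum over a long interval).
dt = 0.25
t = np.arange(-3000.0, 3000.0, dt)
I = intensity(t, A)
integrated = np.sum(I) * dt

print(sampled, np.diag(A).real)
print(integrated, np.trace(A).real)
```

The intensity also stays non-negative everywhere, as a positive quadratic form requires.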
The source coherence is contained in the property of the intensity matrix, which will be more explicitly expressed by matrix diagonalization with unitary transformation. If intensity is expressed as the inner product of vectors as in Eq. (21), let be expressed by basis vectors . These vectors are orthogonal, for example, , , and so on. With these basis vectors, vector is given by
Now, orthogonal transformation of the basis will change Eq. (21) into the simplest form. The transformed vectors must be eigenvectors of the intensity matrix. That is, the eigenvectors will satisfy the following condition
For to have non-trivial solutions, the determinant of the matrix formed by the coefficients has to be zero. The resulting equation is called the characteristic equation whose solutions are eigenvalues . If each eigenvalue is substituted into Eq. (26), we will have a set of simultaneous equations whose solution is the corresponding eigenvector to the inserted eigenvalue. By letting the components of eigenvector corresponding to eigenvalue be (), the square of is given by . Hereinafter, we normalize the square of to be 1. In addition, it is proved that the eigenvectors are mutually orthogonal.5 Therefore, the orthonormal condition can be written as
Let us return to our original intention. Suppose that the original basis vectors are transformed to eigenvectors . Then, vector can be represented by a new vector , i.e.,
If we denote it by the vector components
The coefficients in Eq. (28), i.e., the vector components obtained by transforming vector satisfy the following condition due to the orthogonality in Eq. (27)
If Eq. (31) is written explicitly,
The square of vector is
As vector is expressed in Eq. (29),
Because of the orthogonal property, we obtain . Thus, the matrix transformation used here preserves the square of the absolute value (norm). This is proved by Eq. (27) in conjunction with Eq. (29). The constant norm property with Eq. (31′) proves
Equations (27) and (34) can be simplified with an identity matrix as
In general, transformation that satisfies Eq. (35) is called a unitary transformation and matrix is called a unitary matrix.
With the unitary transformation by matrix , we can examine how the image in Eqs. (7) or (21) is transformed. As in Eq. (28), vector is written by the linear sum of transformed basis vectors . Then,
If we use the property of eigenvalue and eigenvector in Eq. (25) and the orthogonality in Eq. (27), we obtain
Since Eqs. (31) or (31′) shows the explicit form of ,
If this result is written in the matrix form,
From Eq. (37), results in a diagonal matrix in which off-diagonal elements are zero. Letting the diagonal matrix be ,
Diagonal elements of the diagonal matrix are eigenvalues .
Next, because of Eqs. (35) and (38)
Among the results shown above, Eqs. (37) and (40) are useful in revealing the physical properties of the intensity matrix. Thus, Eqs. (37) and (40) will be examined in more detail.
iii. The image intensity is given by eigenvalues and eigenfunctions () of the intensity matrix as
When the light source is coherent, only one eigenvalue is non-zero. As shown in Sec. 4.2, in special cases where the object brightness is uniform or the object transmittance changes sinusoidally across the object plane, we can obtain Eq. (37) without the unitary transformation. When the object brightness is uniform, all eigenvalues have the same value.
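The diagonalization described above can be traced step by step with `numpy.linalg.eigh`. This is a sketch on a random Hermitian matrix, assuming the same normalized sinc sampling functions as before; the name `phi` for the eigenfunctions is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
B = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = B @ B.conj().T                     # stand-in intensity matrix

# Columns of U are orthonormal eigenvectors; eigenvalues are real.
eigenvalues, U = np.linalg.eigh(A)

unitary_ok = np.allclose(U.conj().T @ U, np.eye(N))
diagonal_ok = np.allclose(U.conj().T @ A @ U, np.diag(eigenvalues))

# Spectral expansion: A = sum_i lambda_i v_i v_i^dagger.
A_rebuilt = (U * eigenvalues) @ U.conj().T

# Intensity from the matrix directly and from eigenvalues and eigenfunctions.
t = np.linspace(-3.0, 6.0, 200)
u = np.sinc(t[:, None] - np.arange(N)[None, :])   # sampling functions (assumed)
direct = np.einsum("tn,nm,tm->t", u, A, u).real
phi = u @ U.conj()                                # phi_i(t) = sum_n conj(U[n, i]) u_n(t)
via_eigen = (np.abs(phi) ** 2) @ eigenvalues

print(unitary_ok, diagonal_ok, np.allclose(direct, via_eigen))
```

Each term of the eigen-expansion is a non-negative intensity weighted by a non-negative eigenvalue, so the image reads as an incoherent superposition of mutually orthogonal coherent images.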
As a result, analyzing how the eigenvalue distributes reveals the degree of coherence of the light source, which is involved in image formation. When eigenvalues are , the sum of all eigenvalues is equal to the trace of the intensity matrix and also equal to the integrated image intensity on the image plane. Let us consider the following quantity
If the light source is coherent, we always obtain and if the light source is incoherent with uniform intensity, takes the maximum value
The value of changes continuously from 1 to 0 as the light source gradually changes its coherent state from coherent to incoherent. The quantity is an interesting quantity because we are able to measure coherence with . Therefore, we define the degree of coherence using . (The author noted in Ref. 9 that can be used for an object with uniform transmittance or brightness. However, according to Eq. (18′), from a light source with is the same as from incoherent source. We may need to investigate this point more carefully.)
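Since the defining formula did not survive in this text, the following sketch assumes an entropy-like measure of the normalized eigenvalues, d = −Σ p_i log p_i with p_i = λ_i / Σ λ_j, which is the eigenvalue form of −Tr(ρ log ρ) for ρ = A / Tr A. That choice, and the function name `eigenvalue_entropy`, are assumptions rather than the author's stated definition.

```python
import numpy as np

def eigenvalue_entropy(A):
    """Assumed measure: -sum p log p over the normalized eigenvalues of A."""
    lam = np.linalg.eigvalsh(A)
    p = np.clip(lam, 0.0, None) / np.trace(A).real
    p = p[p > 1e-15]                   # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log(p)))

N = 5
u = np.array([0.5 + 0.2j, -1.0 + 0.3j, 0.8 - 0.6j, 0.1 + 1.2j, 0.4 - 0.9j])
A_coherent = np.outer(u, u.conj())     # rank 1: a single non-zero eigenvalue
A_incoherent = np.eye(N)               # uniform incoherent case: equal eigenvalues

d_coherent = eigenvalue_entropy(A_coherent)
d_incoherent = eigenvalue_entropy(A_incoherent)

# One possible normalization running from 1 (coherent) to 0 (incoherent).
coherence_coherent = 1.0 - d_coherent / np.log(N)
coherence_incoherent = 1.0 - d_incoherent / np.log(N)

print(d_coherent, d_incoherent, np.log(N))
```

Under this assumption the measure vanishes for a coherent source and reaches log N for a uniform incoherent one; intermediate eigenvalue distributions fall in between.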
According to von Neumann,6 is given with the matrix as
Next, let us summarize the result of Eq. (40).
iv. A positive definite Hermitian matrix of order , in our case the element of intensity matrix , is given by positive real numbers together with orthonormal vectors () as
The number of the variables to define the intensity matrix is determined by the eigenvalues and eigenvectors. The vector component is in general complex, so that it can be represented by . The number of the variables that determines the vector components is . As a result, the number of the variables that determines the intensity matrix of order is . Here, Eq. (27) defines the orthonormal condition, which is a set of
Note that the simplest case for the intensity matrix is obtained when the light source is coherent because we may set the eigenvalues 0 except the single eigenvalue . In this case, only one vector decides the matrix elements, so that the number of independent variables is
The number of independent variables is a fundamental quantity of the intensity matrix that depends on the coherence of the light source. However, the number of independent variables is not always meaningful for image intensity as is discussed next. Rather, the image intensity is determined by sampling points where is the number of sampling points for a coherent source. If the sampled values could be arbitrary, the situation would be simpler; however, they need to be the sampled image intensity expressed in Eq. (7), which is obtained by the intensity matrix. Equation (7) introduces correlation among sampled values. Thus, the property of the intensity matrix explained here is necessary to derive the amount of information of the image intensity, which will be explained elsewhere.
vi. Regardless of the coherence of the light source, image intensity in Eq. (7) is determined by the sampled values with a sampling step of , and the following relationship between the sampled values and the intensity matrix is derived
The sampling step is derived from the Fourier transform of the image in Eq. (7), in which the Fourier transform of has a bandwidth twice as wide as that of the Fourier transform of . Outside of the bandwidth, the Fourier transform of is 0. The equation is explicitly shown below by referring to the appendix
If or ,
When , we will obtain the following.
Since the bandwidth of is limited to , the sampling theorem in Sec. 2 leads to the series expansion of Eq. (46).
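The doubled bandwidth of the intensity can be verified numerically. In the sketch below the coordinate `t` is normalized so that the amplitude sampling interval is 1 (an assumption), the matrix `A` is a random stand-in, and the intensity is rebuilt from its own samples taken at half that interval.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4
B = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = B @ B.conj().T                     # stand-in intensity matrix

def intensity(t):
    """Quadratic form of Eq. (7) with assumed sinc sampling functions."""
    u = np.sinc(t[:, None] - np.arange(N)[None, :])
    return np.einsum("tn,nm,tm->t", u, A, u).real

# Intensity samples at half the amplitude sampling interval.
k = np.arange(-400, 401)
samples = intensity(k / 2.0)

# Sampling series for a function whose bandwidth is doubled.
t = np.linspace(-3.0, 6.0, 91)
rebuilt = np.sinc(2.0 * t[:, None] - k[None, :]) @ samples
max_error = np.max(np.abs(rebuilt - intensity(t)))

print(max_error)  # small; limited by truncating the series
```

The intensity samples are real and non-negative, but they are not free parameters: they are tied to one another through the intensity matrix.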
In Eq. (46), rewriting the -th sampled value as ,
Also, the integrated intensity is rewritten using Eq. (23) as
The results presented in this paper are an expanded version of the presentation given at a symposium on “Application of the information theory to optics” organized by The Japan Society of Applied Physics on April 6, 1956. This paper explains the physical meaning of the intensity matrix introduced here. This work is just the beginning; we may need to work more on, for example, the application of the intensity matrix to calculating the amount of information, extension of the intensity matrix to two-dimensional imaging, evaluation of the change of the intensity matrix by phase difference, examination of the intensity matrix with aberration, and so on. The author wishes to present these topics at another opportunity.
In this paper, the discussion is limited to finite dimensions; however, for completeness of the study, it should be discussed with infinite dimensions. Therefore, we need to utilize the concept of Hilbert space, but the author was not able to reach this point in this short paper.
The author thanks Professor Hidetoshi Takahashi of the Department of Physics of the University of Tokyo and Associate Professor Kazuo Miyake of Tokyo University of Education for helpful discussions and Professor Hiroshi Kubota of the Institute of Industrial Science of the University of Tokyo, who encouraged the author to apply the information theory to optics and provided the related references.
Since the submission of the manuscript, the author has noticed that D. Gabor of the United Kingdom independently proposed how to generally express the image by a Hermitian matrix. [Information Theory, Third London Symposium, edited by Colin Cherry (1956, Butterworths Scientific Publications) 4. Optical Transmission by D. Gabor, pp. 26-33.]
Although the concrete derivation differs from the one introduced in this paper, the author appreciates that he reached fundamentally the same conclusion. Those interested may read his paper together with this paper. The author mentioned in this paper that the intensity matrix for a generalized case would be a very difficult problem. However, since then, the author was able to conclude that the intensity matrix can be generalized by matrix transformation of the object or can be defined with the existence of aberration in the imaging optics, and reported these findings in a physics seminar at the University of Tokyo. The details will be published elsewhere. Gabor does not point these out explicitly; nevertheless, our concepts are in agreement.
Proof of Eq. (8).
Let the object complex transmittance be and complex amplitude of the incident light be , the Fourier transform of the light amplitude at the object plane is given by Eq. (1),
On the other hand, the light amplitude in the image plane is given by Eq. (4)
Substituting Eq. (1) into Eq. (4) followed by changing the order of integration gives
We obtain and by substituting , into the above equation; inserting them into Eq. (7′) yields Eq. (8).
Substituting the intensity matrix of Eq. (15) into Eq. (7) gives
This result leads to the requirement of . This relationship can be obtained by the series expansion of . [See, for example, Magnus, Oberhettinger, Formeln und Sätze für die speziellen Funktionen der mathematischen Physik (Springer 1948) p.215.] Similarly, if the object brightness changes periodically, substituting Eq. (16) into Eq. (7) yields
On the other hand, if is derived by Eq. (14),
Since the above two equations are identical, we obtain the one we need.
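The requirement for uniform brightness can be checked numerically. The relation itself is missing from this text, so the sketch assumes it is the identity Σ_n [sin π(t − n) / π(t − n)]² = 1 in a normalized coordinate `t`, which is what a diagonal intensity matrix with equal elements must satisfy for the image to be uniform.

```python
import numpy as np

# Truncated sum over sampling points n = -100000 .. 100000.
n = np.arange(-100_000, 100_001)
t = np.array([0.0, 0.13, 0.5, 0.77, 2.4])     # arbitrary test positions

# np.sinc(u) = sin(pi u) / (pi u)
partial_sum = np.sum(np.sinc(t[:, None] - n[None, :]) ** 2, axis=1)

print(partial_sum)  # each entry is close to 1
```

The truncated sum differs from 1 only by the neglected tail, which shrinks in inverse proportion to the number of terms kept.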
Partially Coherent Illumination
If the object transmittance is uniform, the image intensity is obtained by putting from Eqs. (18′) and (18″) into Eq. (7)
If the illumination is incoherent, we have Eq. (14′) and when the illumination is partially coherent, we have
If we set and substitute in Eq. (8′) and in Eq. (17) into the above equation, we will obtain the following equation with the help of the integration in Eq. (18)
Therefore, we obtain
If we set a variable ,
The integration can be decomposed into four terms each of which has the following result
To arrive at the above result, we have to choose an integration path such that is cancelled by a loop with an infinite radius. In addition, this integral has a pole on the real axis so that we have to take the Cauchy principal value. Note that if the residue on the real axis is , the result of the complex integral is affected by , where is the residue inside the integration path. [Whittaker, Modern Analysis p.117 (1935).] With these procedures, we obtain the integral shown at the beginning.
In Eq. (16), we need to calculate the following integral;
This integral can be carried out in a similar way to the one shown above, i.e., we have to evaluate complex integrals with an exponential function. As a result,
Equation (18) has the following definite integral,
The above results reduce the integration of Eq. (19) to a single integral, giving the following results
Among the above four results, (iii) and (iv) can be calculated using the results of 10.2 and 10.4. For (ii),
For (b) and (c), we can use the formula shown already. However, we need some more calculation for (a) and (d).
In the end, the four integrals are