Cervical cancer is one of the most common female cancers, second only to breast cancer,1 with a combined worldwide incidence of almost half a million new cases annually. It is the most common cancer of the female genital tract in India, with approximately 100 000 new cases occurring each year. This accounts for about 20% of all new cases diagnosed worldwide annually.2 Squamous cell carcinoma is the most common malignant cervical tumor, but the incidence of adenocarcinomas has been rising during the past few decades.3 Cervical cancer progresses in stages from dysplasia through CIN I, CIN II, CIN III, and finally to invasive cervical carcinoma.4, 5, 6, 7 At present, the Pap smear is the standard screening technique, and this is said to be responsible for a 70% decrease of cervical cancer deaths in advanced countries. However, false negative results are reported to be high (about 15 to 25%) for early detection.8 Over the last several years, Raman spectroscopy9, 10, 11, 12 and fluorescence spectroscopy 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 of tissues have been developed as useful techniques for the discrimination of normal/malignant conditions and early detection of cervical malignancy. These optical methods are fast, highly objective, can be applied repeatedly in vitro/in vivo, and give direct information on the biochemical changes as the cells become malignant.
Many molecular and functional changes take place at the cellular level in neoplasia before morphologic or other alterations appear, and continue thereafter.4 Many of the biochemical changes, such as immune responses and production of degradative enzymes/fetal antigens, will be reflected in blood circulation.23 It is also known that in many epithelial cancers, even in the early stages, some of the abnormal cells escape into the blood stream and migrate to different locations. Clinically useful cancer signature materials (biomarkers) should thus be observable in an easily accessible body fluid such as serum. It is therefore likely that the protein profile of serum samples can provide a diagnostic tool for early detection of cervical cancer. Proteomics is increasingly recognized as a very efficient area for possible early diagnosis of many diseases, like cancer and hypertension, which remain clinically silent for long periods.24, 25
Though laser-induced fluorescence and Raman spectroscopy of tissue have also been developed for discrimination between normal and malignant conditions, protein profiling has some advantages over these spectroscopic methods. The spectroscopic methods are applied to tissue samples, which are always heterogeneous, and mistakes can happen if the probe is placed at the wrong site. Biopsy is guided by physical examination (colposcopy), and errors like “past pointing” are possible, so that the sample may not come from the correct location. Protein profiling is done on homogeneous samples (serum), sampling is easy, and the sample is always representative of the subject irrespective of where it is obtained. Though laser-induced fluorescence (LIF) and Raman are quite fast (a few minutes per subject) compared to high performance liquid chromatography (HPLC) protein profiling, a major advantage of the HPLC protein profile method is that it gives much more information on changes that take place during the various stages of the disease. LIF gives broad structureless spectra, and only limited information can be derived on biochemical changes that take place during the progression of the disease. Though Raman spectra are better in this context, we still cannot get detailed information, since all biopolymers of a given class (say, proteins) will have very similar Raman spectra and it is very difficult to judge whether changes have taken place in protein composition, etc.
Protein profiling and proteomics have been suggested for early detection of diseases that remain clinically silent for a long period of time.26, 27, 28, 29 Proteomics is the study of functional genomics at the protein level. Several techniques like 2-D gel electrophoresis, SELDI, MALDI-TOF, and protein chips are available in proteomics.24, 25, 29 However, all these methods have several disadvantages compared to the HPLC-based technique.30
High performance liquid chromatography combined with laser-induced fluorescence detection (HPLC-LIF) is a highly sensitive technique for separation and detection of complex mixtures of proteins or other biomolecules,26, 27, 28, 30 even at subfemtomole levels. In the present work, we have employed HPLC-LIF for protein profile analysis of serum samples from normal healthy volunteers and cervical cancer patients. The results show that there are noticeable differences between the two types of serum samples, even on visual inspection. We have applied, to the best of our knowledge for the first time, the method of principal component analysis (PCA) to the protein profiles of serum samples. This study has shown that serum protein profiles provide an objective method for the diagnosis of cervical cancer. The results of our studies are presented and discussed.
Materials and Methods
High Performance Liquid Chromatography Laser-Induced Fluorescence System
An HP 1100 gradient HPLC system with G1322A degasser, G1311A pump, and a manual injector (model number 7725, Rheodyne, Perkin Elmer, Massachusetts, USA) coupled to a Vydac 219TP52 biphenyl reversed phase narrow bore column (diphenyl, , , ), California, USA, was used for the separation of proteins. The effluent from the column was sent into a capillary flow cell fabricated in our laboratory. The cell is made of quartz capillary ( I.D., O.D. Hewlett Packard, G1600-64311) connected to the column with appropriate sleeves (Upchurch Scientific, F-130X, Washington, USA). It is mounted on a precision mount for accurate positioning and alignment for excitation and collection of fluorescence. Fluorescence excitation of proteins in the sample was achieved by illumination with the laser emission from a frequency doubled laser (Innova 90C FreD, Coherent, California, USA), focused onto the capillary cell. The fluorescence was collected and focused onto the entrance slit of a monochromator (Jobin Yvon DH10 SPEX, New Jersey, USA), set at . The fluorescence was detected by a photomultiplier (Hamamatsu R 453, New Jersey, USA) operated at , coupled through a preamplifier (EG&G model 5113, Maryland, USA) to a lock-in amplifier (EG&G model 7265). The fluorescence was chopped with an EG&G model 651 chopper at the entrance slit at , for lock-in detection.
Sampling and Sample Preparation
Normal samples of serum were collected from volunteers who were judged to be clinically normal (with respect to cervical cancer) and who were age matched as far as possible. None of the normal subjects had any diagnosed disease. The age of the normal subjects varied from 22 to 77. Malignant samples were collected from subjects diagnosed with cervical cancer at the Department of Obstetrics and Gynecology, Kasturba Hospital, Manipal, India. Samples from 25 normal and 33 malignant subjects were analyzed. All samples were collected after informed consent. Ethical clearance of the Institute Ethical Committee was obtained for this work.
Blood samples were transported to the laboratory immediately after collection. The samples were stored at room temperature in an upright position for about . The separated liquid portion was centrifuged at for . The serum thus obtained was subjected to HPLC immediately. If storage was necessary, serum samples were stored at in the deep freezer. They were passively thawed to room temperature just before use. It was seen that the chromatograms of normal samples did not show any changes, even when recorded after several weeks of storage in the deep freeze.28 Table 1 gives details of the samples.
Table 1. Samples taken for analysis.
|Sample number||Age (years)||Class|
|1 to 25||22 to 77||Normal|
|26 to 37||37 to 62||Stage 2B|
|38 to 54||37 to 72||Stage 3B|
The samples were diluted times with HPLC grade water. of the diluted sample was injected into the HPLC system, which was fitted with a loop. The sample was then eluted under a gradient run with [water, 0.1% v/v trifluoroacetic acid (TFA)] 70 to 40%, [acetonitrile (ACN), 0.1% v/v TFA], 30 to 60% in , followed by 60 to 100% in . The 100% ACN run was continued for another , taking the total run time to . After each run, the column was regenerated with HPLC grade water containing 0.1% TFA for .
To facilitate intercomparison of protein profiles recorded over several months, a rigorous protocol was followed for data analysis. All the chromatograms were subjected to a background correction with a polynomial fit to remove the background fluorescence. Background in the chromatographic runs comes from several sources, such as laser light scattered from the walls of the capillary, Rayleigh scattering from the eluent, PMT dark count, fluorescence of the acetonitrile-TFA complex on excitation with , fluorescence of biomolecules that are not effectively separated by the biphenyl column, etc. All these contributions remain more or less the same from run to run. The background thus has the same shape from run to run, starting with about (30% acetonitrile-TFA) in our case and increasing to about at the end of the run (60% acetonitrile-TFA). The background variation is a continuous function (no sudden discontinuities) and so can be expressed with a polynomial of the form $B(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 +$ higher terms (negligible), where $t$ is the time of observation (retention time). We can give four or more points on the chromatogram, at places free from sample contribution. These points can then be fitted to the polynomial to derive the constant coefficients $a_0$, $a_1$, etc. The background can then be calculated at any point of time and subtracted from the total signal at that time to give the background-free chromatogram.
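The background correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in this work: the function name `subtract_background`, the cubic degree, the anchor times, and the synthetic chromatogram are all hypothetical.

```python
import numpy as np

def subtract_background(t, signal, anchor_times, degree=3):
    """Fit a polynomial to the signal at peak-free anchor times and subtract it."""
    t = np.asarray(t, dtype=float)
    signal = np.asarray(signal, dtype=float)
    # Index of the recorded point closest to each chosen anchor time.
    idx = np.array([np.argmin(np.abs(t - a)) for a in anchor_times])
    # Rescale time to [0, 1] so the least-squares fit stays well conditioned.
    u = (t - t.min()) / (t.max() - t.min())
    coeffs = np.polyfit(u[idx], signal[idx], degree)
    background = np.polyval(coeffs, u)
    return signal - background, background

# Synthetic run (made-up numbers): a slowly rising background plus one
# Gaussian peak at 1500 s.
t = np.linspace(0.0, 3000.0, 3001)
true_background = 2.0 + 1.0e-3 * t + 2.0e-7 * t**2
peak = 5.0 * np.exp(-0.5 * ((t - 1500.0) / 20.0) ** 2)
anchors = [0, 500, 1000, 2000, 2500, 3000]  # hypothetical peak-free times
corrected, background = subtract_background(t, true_background + peak, anchors)
```

The anchor times must lie in regions free from sample contribution; an anchor placed on a peak would pull the fitted background upward and distort the corrected chromatogram.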
Minor shifts in peak positions from run to run were corrected by calibration of all the chromatograms along the time scale, using mean values of protein peaks of species like transferrin, human serum albumin (HSA), creatine kinase, etc., common to all samples. Presence of these species in all samples provided convenient internal standards for calibration of all chromatograms to the same time scale across the entire range. The chromatograms were then normalized with respect to an HSA peak at . Because of the relatively constant intensity of this peak in all samples, it has been used for normalization of all chromatograms for comparison purposes.
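One possible way to carry out the time-scale calibration and normalization is sketched below. It assumes a straight-line relation between the observed and mean retention times of the internal-standard peaks, which is a simplification chosen here for illustration. The function name, the reference times, and the synthetic run are hypothetical.

```python
import numpy as np

def align_and_normalize(t, signal, observed_refs, target_refs, norm_time,
                        half_window=10.0):
    """Map a run onto the common time scale and normalize to one peak."""
    t = np.asarray(t, dtype=float)
    signal = np.asarray(signal, dtype=float)
    # Straight-line map from this run's retention times to the common scale.
    slope, intercept = np.polyfit(observed_refs, target_refs, 1)
    corrected_t = slope * t + intercept
    # Resample the chromatogram onto the common time grid.
    aligned = np.interp(t, corrected_t, signal)
    # Normalize to the height of the reference peak near norm_time.
    window = np.abs(t - norm_time) <= half_window
    return aligned / aligned[window].max()

# Synthetic run whose three reference peaks are slightly shifted
# (all positions and heights are made-up illustration values).
t = np.arange(0.0, 3001.0)
target_refs = np.array([500.0, 1500.0, 2500.0])   # mean positions over all runs
observed_refs = 1.002 * target_refs + 5.0         # positions in this run
heights = [1.0, 4.0, 2.0]
run = sum(h * np.exp(-0.5 * ((t - c) / 15.0) ** 2)
          for h, c in zip(heights, observed_refs))
aligned = align_and_normalize(t, run, observed_refs, target_refs, norm_time=1500.0)
```

After this step the reference peaks of every run fall at the same retention times and the normalization peak has unit height, so chromatograms recorded months apart can be compared point by point.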
To see whether information could be obtained on the changes in serum profiles, in a first step, analysis of the HPLC peaks was attempted using curve resolution techniques. For this, band shapes were decided by fitting single peaks with different types of functions (Gaussian, Lorentzian, Voigt, etc.), and the function that gave the best fit for the single peaks (in the present case, Gaussian) was used for the other regions also. Difference chromatograms were also computed to identify any significant differences in the protein profiles of different classes of samples.
For precise classification of serum protein profiles by an objective mathematical model, the chromatograms of a sufficiently large number of samples (15 to 30) were subjected to PCA (PLS PLUS/IQ software, Galactic Corporation, Salem, New Hampshire). In our method of PCA, the mean of all samples in the dataset is first formed. The difference of each sample from this mean is then calculated to give the variation of that sample from the mean. With $n$ samples each having $p$ data points, we thus get an $n \times p$ matrix of these variations. Because all the samples contain more or less the same components (HSA, transferrin, immunoglobulins, etc.), the large amount of data can be represented by a much smaller set of component peaks, their contributions to the chromatogram varying from sample to sample, depending on their concentrations. In matrix language, this implies that the matrix of variations discussed previously is highly redundant. It will have only a few significant eigenvectors (principal components), and the eigenvalues will rapidly come down to almost zero after the first few. Solving the eigenvalue-eigenvector problem gives us the principal components (factors), percent variance (contribution of the factors to the variations in the dataset), and scores of factors for each sample. The scores for a given sample correspond to the contribution of each principal component to the variation of that sample from the mean. It is therefore possible to simulate the original chromatogram of any sample by multiplying the eigenvectors with their respective scores for that sample and adding these products to the mean of the dataset.
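The PCA procedure described above (mean, variations from the mean, factors, percent variance, scores, and simulation of a chromatogram from its scores) can be illustrated with NumPy. The sketch uses a singular value decomposition, which yields the same eigenvectors as the eigenvalue problem. It is not the PLS PLUS/IQ implementation, and the function names and synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_fit(X, n_factors):
    """PCA of an n x p matrix of chromatograms (one chromatogram per row)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    deviations = X - mean                      # n x p matrix of variations
    _, s, Vt = np.linalg.svd(deviations, full_matrices=False)
    factors = Vt[:n_factors]                   # principal components, one per row
    scores = deviations @ factors.T            # n x f matrix of scores
    eigenvalues = s ** 2
    percent_variance = 100.0 * eigenvalues / eigenvalues.sum()
    return mean, factors, scores, percent_variance

def reconstruct(mean, factors, scores):
    """Simulate chromatograms as the mean plus score-weighted factors."""
    return mean + scores @ factors

# Synthetic data set: 20 "samples" built from the same three component
# peaks in varying concentrations (all numbers are made up).
rng = np.random.default_rng(0)
t = np.arange(500.0)
components = np.array([np.exp(-0.5 * ((t - c) / 12.0) ** 2)
                       for c in (100.0, 250.0, 400.0)])
concentrations = 1.0 + 0.2 * rng.random((20, 3))
X = concentrations @ components
mean, factors, scores, percent_variance = pca_fit(X, n_factors=3)
X_simulated = reconstruct(mean, factors, scores)
```

Because the synthetic samples contain only three independently varying components, three factors reproduce every chromatogram; real serum profiles need more factors and leave a small residual.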
Runs were made with 20, 15, 12, 9, and 7 factors. Seven factors were found to contribute more than 95% of the variance for the combined data. An analysis of the combined data, though it gives very good discrimination between normal and malignant cases (see below), has several drawbacks. To overcome these drawbacks, PCA was finally done with protein profiles of standard sets for routine diagnostic applications, discussed in detail later.
Results and Discussion
Figure 1 shows typical protein profiles of serum samples from subjects judged as clinically “normal.” A visual inspection of all normal samples shows more or less the same pattern, irrespective of age, physiological state (menopause, pregnancy, married, with/without children, etc.), life style, or social status.
Figure 2 shows the results for typical cases of malignant subjects. The malignant samples show noticeable changes from normal, in the peaks around region. The main differences are (1) a shoulder is observed for peak in some of the cases, and (2) new peaks are observed in the region in some cases. These differences between normal and other classes of chromatograms can be seen better by looking at difference chromatograms, discussed later. From the prior discussion it is seen that even a routine protein profiling of serum may give an indication of possible cervical malignancy.
Many groups31, 32, 33 have used PCA scores of Raman/fluorescence spectra of normal and malignant tissue samples combined for discrimination between normal and malignant conditions. We have shown9, 11, 18 that a better approach for classification of tissue spectra as belonging to normal, inflammatory, premalignant, or malignant is to use the technique of matching several parameters from PCA for test samples to corresponding parameters of certified calibration sets from each class of samples. To decide whether such a situation exists for protein profiles by HPLC also, and to optimize conditions of PCA for discrimination, PCA was run first with all the samples combined, irrespective of whether they belong to the normal or malignant group. The PCA was performed with 12 factors, as mentioned earlier. Figure 3 shows the variations in eigenvalues and percent variance with increasing factor numbers. From Fig. 3 it is clear that seven factors are adequate to represent the data, since they contribute more than 95% of the total variance in the dataset.
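The choice of the number of factors from the cumulative percent variance can be sketched as follows. Only the 63% contribution of factor 1 is taken from this work; the remaining percent-variance values are hypothetical numbers chosen for illustration, and the function name is an assumption.

```python
import numpy as np

def factors_for_variance(percent_variance, threshold=95.0):
    """Smallest number of leading factors whose cumulative variance reaches threshold."""
    cumulative = np.cumsum(percent_variance)
    # Index of the first cumulative value at or above the threshold.
    first = int(np.searchsorted(cumulative, threshold))
    return min(first + 1, len(cumulative))

# Hypothetical percent-variance values for a 12-factor run; only the
# first entry (63%) is a figure reported in the text.
percent_variance = [63.0, 12.0, 8.0, 5.0, 4.0, 2.5, 1.5, 1.0, 1.0, 1.0, 0.5, 0.5]
n_significant = factors_for_variance(percent_variance)
```

With these illustrative values the cumulative variance first exceeds 95% at the seventh factor.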
The 58 samples shown in Table 1 were used in this PCA. The sample number versus score plot is shown in Fig. 4 for scores of factor 1, which contributes 63% to the variance. It is seen that the normal and malignant groups form clusters falling in different ranges of factor 1 scores, with very few exceptions. Scores of 22 out of 25 normal samples lie on the positive side of the plot, and three lie on the negative side. Similarly, 70% of the malignant samples have their scores on the negative side of the plot. So it is clear from the plot that the score values can discriminate to a very good extent between normal and malignant samples.
It may be noted that PCA determines the factors necessary to express the data to a desired level of accuracy. It is therefore possible that factors other than the first or a combination of two/three factors may give better discrimination. However, in the present case this has not happened. (Data are not shown.) We have done cluster analysis of serum samples also.34 The plot of scores of PCA is similar to cluster analysis. But in our method of PCA, we can simulate a chromatogram of any sample (e.g., stage 1) with factors of any calibration set (e.g., stage 2), thus accounting for all stage 2 components in the stage 1 sample. If we now subtract the simulated chromatogram of the sample from its actual chromatogram, we can get more correct information, even on small differences between the two, since the simulation will account only for amounts of components that can be generated by factors of the stage 2 calibration set. These will be canceled on subtraction, while all peaks not produced by simulation will show up.
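The simulation-and-subtraction idea can be sketched as below: a test chromatogram is simulated with the mean and factors of a calibration set, and the simulated profile is subtracted from the actual one. Everything here is a hypothetical illustration (function names, peak positions, concentrations), not data from this study.

```python
import numpy as np

def calibration_model(X, n_factors):
    """Mean and leading PCA factors of a calibration set (rows = chromatograms)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_factors]

def simulate(sample, mean, factors):
    """Simulate a sample with calibration factors; return simulation and difference."""
    scores = (sample - mean) @ factors.T
    simulated = mean + scores @ factors
    return simulated, sample - simulated

t = np.arange(1000.0)

def gauss(center, width=10.0):
    return np.exp(-0.5 * ((t - center) / width) ** 2)

# Hypothetical calibration set: two component peaks in varying amounts.
angles = 2 * np.pi * np.arange(15) / 15
calibration = np.array([(1 + 0.1 * np.cos(a)) * gauss(300)
                        + (1 + 0.1 * np.sin(a)) * gauss(600) for a in angles])
mean, factors = calibration_model(calibration, n_factors=2)

# Test sample with the same two components plus an extra peak at 800.
test_sample = 1.05 * gauss(300) + 0.95 * gauss(600) + 0.5 * gauss(800)
simulated, difference = simulate(test_sample, mean, factors)
```

In this toy case the two shared peaks cancel on subtraction and only the extra peak at 800 remains in the difference chromatogram, which is the behavior described in the text.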
The simple approach of using scores for discrimination of sample types has several disadvantages. To begin with, it may not discriminate very well between normal and premalignant types (or between premalignant and malignant classes), because of possible closer similarities between chromatograms of such two classes of samples. Second, in a PCA with all types of samples, the factors have to account for all chromatograms, and so even factor number 1 may contain contributions from all types: normal, premalignant, and malignant. The discrimination between types will be thus diluted considerably. Third, in a PCA with all types combined, the results can become weighted more strongly toward that class that has more representation in it, and so may not give a correct picture for the other types of samples.
In view of this, we have developed a more reliable approach for diagnosis, combining all the information that is available from a PCA. The third property mentioned before tells us that if PCA is done with only one class of samples, say normal, then the factors will be weighted highly for that class and will correctly account for only protein profiles of that class. Profiles from any other class of samples will be rejected as not belonging to that class with a high degree of probability.
A much more important aspect of such an approach is that, like in any analytical technique where standards with calibration curves are used for routine analysis, profiles of a set of clinically/pathologically diagnosed samples can be used as a standard calibration set. This standard calibration set can be subjected to PCA to derive parameters that will be highly characteristic for samples of that type. Any test sample can then be added to the set and the corresponding parameters for the test sample can be compared to the mean parameters for the set to decide whether the test sample belongs to that set, and if so, with what statistical probability.
A third advantage of this method is that sets of standard calibration profiles can be prepared for each class—normal, premalignant (CIN I, CIN II, CIN III, etc.), and malignant (different stages). Any test sample can be compared against each of these calibration sets and a decision can be made more accurately, depending on with which set it matches best.
It should also be mentioned here that calibration sets can be prepared in a main hospital, where qualified clinicians/pathologists are available and a large number of subjects may be examined in a reasonable period, say, in a few months to a year. These calibration sets can be supplied to any small hospital/clinic, and clinicians there can carry out objective diagnosis on their own with a blood sample routinely collected.
The possible errors that may arise from visual examination (fatigue factor in Pap smear and inexperience in colposcopy) are reduced in optical methods entirely based on recording of an optical signal and analysis of the data with standard mathematical procedures by a computer. No visual decision making is involved, and the system (HPLC instrument system plus a computer) is completely blind as to what sample is given for analysis.
Our diagnostic approach thus consists of the following steps. Record protein profile (chromatogram) from a reasonably large number of subjects who are clinically/pathologically diagnosed as normal, premalignant, or malignant. Take all the profiles from samples, clinically/pathologically diagnosed as belonging to one class, and do a PCA. Determine the number of significant factors required to represent that class. Check and remove any “outliers.” (In practice, a very small number, one or two, in a given class may stand out as not belonging to that class, for reasons such as instrument malfunction, ill-defined disease condition, etc.). From the set, randomly select 15 to 20 profiles to form the standard calibration set. Do PCA with this standard set to determine the scores of significant factors and other parameters. Test every member of the set by rotating them out one at a time for membership of the set within a desired range of standard deviation of the chosen parameters. Reject any that do not match to form the final calibration set. Do this for every class, where sufficient numbers of clinically/pathologically certified samples are available. It is helpful to remember that, since usually a maximum of 4 to 6 factors are quite sufficient to determine more than 95% of the variations from the mean for a given class, if a minimum of 10 to 12 chromatograms are available in a class, they are enough to form a calibration set. Once calibration sets are prepared for the different classes of samples, any test sample profile can be matched against each of the calibration sets. The sample is diagnosed as belonging to that class with which it matches best.
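The "rotating out" test of calibration-set members can be sketched as a leave-one-out loop. The sketch below uses only the Mahalanobis distance of the PCA scores as the membership parameter and a cut-off of 2.5; the function name, the cut-off choice, and the synthetic calibration set (eleven similar profiles and one deliberately different one) are assumptions made for illustration.

```python
import numpy as np

def leave_one_out_distances(X, n_factors):
    """Mahalanobis distance of each member from a PCA model of the others."""
    X = np.asarray(X, dtype=float)
    distances = []
    for i in range(len(X)):
        rest = np.delete(X, i, axis=0)            # rotate sample i out
        mean = rest.mean(axis=0)
        _, _, Vt = np.linalg.svd(rest - mean, full_matrices=False)
        factors = Vt[:n_factors]
        S = (rest - mean) @ factors.T             # scores of remaining members
        M = S.T @ S / (len(rest) - 1)
        s = (X[i] - mean) @ factors.T             # scores of the rotated-out member
        distances.append(float(np.sqrt(s @ np.linalg.solve(M, s))))
    return np.array(distances)

t = np.arange(1000.0)

def gauss(center, width=10.0):
    return np.exp(-0.5 * ((t - center) / width) ** 2)

# Hypothetical set: 11 profiles with small concentration changes, plus one
# profile (index 11) whose concentrations differ strongly from the rest.
angles = 2 * np.pi * np.arange(11) / 11
profiles = [(1 + 0.05 * np.cos(a)) * gauss(300)
            + (1 + 0.05 * np.sin(a)) * gauss(600) for a in angles]
profiles.append(1.5 * gauss(300) + 1.5 * gauss(600))
distances = leave_one_out_distances(np.array(profiles), n_factors=2)
outliers = np.where(distances > 2.5)[0]
```

Members flagged in this way would be removed before the final calibration set is formed; in practice the residual and the scores would be checked in the same loop.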
In addition to the scores and spectral residual, a statistical measure known as the Mahalanobis distance is also used for the match-no match test described above. This is calculated as the distance of the test sample point as measured from the mean of all the remaining points in the class. The distance is scaled in units of standard deviation for the range of variation in the class in all dimensions, and then used to assign a probability weight to the sample in terms of standard deviation.
Any sample that lies outside a desired range of standard deviation from the mean can be considered to be out of the group. The range can be decided by the clinician, who can set a cut-off. In Tables 2 and 3 a sample is considered “no match” if the Mahalanobis distance is greater than 2.5. Since the Mahalanobis distance is in units of standard deviation, a value above this cut-off gives a probability of less than 1% of that sample belonging to the set with which it is matched. If the clinician wants a less rigorous cut-off, a higher value (e.g., 3) can be set; a sample exceeding it then has a probability of less than 0.1% of belonging to the set.
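The Mahalanobis distance matrix equation is $M = S^{T}S/(n-1)$, where $M$ is an $f \times f$ Mahalanobis matrix, $S$ is an $n \times f$ matrix of training sample PCA scores, $n$ is the number of samples, and $f$ is the number of PCA factors. The Mahalanobis distance for a test sample is then given by $D^{2} = s_{test}\,M^{-1}\,s_{test}^{T}$, where $s_{test}$ is the $1 \times f$ row vector of PCA scores of the test sample and $D^{2}$ is the square of the Mahalanobis distance in terms of standard deviations of the set, represented by $M$.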
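A minimal NumPy sketch of this calculation is given below. It assumes the usual form of the Mahalanobis matrix, the scatter of the training scores divided by one less than the number of samples; the function name and the small score matrix are hypothetical.

```python
import numpy as np

def mahalanobis_distance(train_scores, test_scores):
    """Distance of a test sample's PCA scores from a training set's scores."""
    S = np.asarray(train_scores, dtype=float)   # n x f matrix of training scores
    s = np.asarray(test_scores, dtype=float)    # f scores of the test sample
    n = S.shape[0]
    M = S.T @ S / (n - 1)                       # f x f Mahalanobis matrix
    d_squared = s @ np.linalg.solve(M, s)       # s M^-1 s^T without explicit inverse
    return float(np.sqrt(d_squared))

# Hypothetical mean-centered scores of four training samples on two factors.
train_scores = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
d = mahalanobis_distance(train_scores, [1.0, 0.0])
```

Here the training variance along factor 1 is 2/3, so a test score of 1 on that factor lies at a distance of the square root of 1.5, about 1.22 standard deviations.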
There is one problem regarding the discrimination purely based on PCA scores. It is that any “impurities” that are in the unknown spectra, but were not present in the training samples, will not appear in the score calculations. To overcome this problem and get more specific diagnoses, residuals can be included in the vectors. The purpose of the protein profiling technique thus is to classify any test sample as belonging to only one of the well-defined sets, without or with limited prior knowledge. For this, standard calibration models are built from profiles of sets of samples, clinically/pathologically certified as normal, premalignant, and malignant-different stages. The parameters defined earlier—Mahalanobis distance, scores of factors, and squared residuals—are then used to discriminate between the sample types. This gives a very sensitive discrimination and shows how well an unknown test sample matches or does not match with any given calibration set.
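The role of the residual can be illustrated with a sketch of a combined match/no match test. The function below is an assumption-laden illustration: the residual limit of 1.0 is an arbitrary value chosen for the synthetic data, and the limit tests actually applied by the software used in this work are not reproduced here.

```python
import numpy as np

def match_test(sample, calibration, n_factors, max_distance=2.5, max_residual=1.0):
    """Match a sample to a calibration set using M distance and squared residual."""
    X = np.asarray(calibration, dtype=float)
    x = np.asarray(sample, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    factors = Vt[:n_factors]
    S = (X - mean) @ factors.T
    M = S.T @ S / (len(X) - 1)
    s = (x - mean) @ factors.T
    distance = float(np.sqrt(s @ np.linalg.solve(M, s)))
    # Sum of squared differences between actual and simulated chromatogram.
    residual = float(np.sum((x - mean - s @ factors) ** 2))
    return (distance <= max_distance and residual <= max_residual), distance, residual

t = np.arange(1000.0)

def gauss(center, width=10.0):
    return np.exp(-0.5 * ((t - center) / width) ** 2)

# Hypothetical calibration set and two test samples; the second contains
# an extra peak ("impurity") that no calibration sample has.
angles = 2 * np.pi * np.arange(12) / 12
calibration = np.array([(1 + 0.05 * np.cos(a)) * gauss(300)
                        + (1 + 0.05 * np.sin(a)) * gauss(600) for a in angles])
clean = 1.01 * gauss(300) + 0.99 * gauss(600)
contaminated = clean + 0.5 * gauss(800)
clean_match, d_clean, r_clean = match_test(clean, calibration, n_factors=2)
cont_match, d_cont, r_cont = match_test(contaminated, calibration, n_factors=2)
```

Both test samples have practically the same scores and therefore the same Mahalanobis distance, but only the residual reveals the extra peak, so the contaminated sample is rejected.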
It is quite reasonable to expect (and this has been verified experimentally) that all normal samples will more or less give the same profile. On the other hand, it is possible that malignant samples in different stages (stages 2, 3, etc.) may differ slightly from each other depending on the stage of the disease. However, it is highly probable that all such samples will be quite different from normal. Also, it is very likely that they may have some common features in their profiles because of conditions common in malignancy. Therefore, if all malignant samples are clubbed together as a class separate from normal, it is possible that in PCA some factors will contribute to variations common to all of them, while a few other factors may take care of intragroup variations. As mentioned earlier, it is also to be noted that in a mixed sample lot, the factors will contribute to common variations with weights corresponding to number of members in the subgroups. Hence if we have a sufficiently large number of samples of different stages, the factors will adjust to represent all members of the set. To test this hypothesis in the present work, we first formed standard model calibration sets of normal and malignant groups by randomly selecting 15 samples respectively from the normal and malignant groups, disregarding the stage of malignancy.
PCA was then done first with the normal calibration set. All normal and malignant samples were then matched with this calibration set. The 15 normal samples forming the standard set were tested for match/no match by rotating them out one by one (retrospectively), and the remaining ten normal samples were tested against the standard set as unknown samples (prospectively). All the normal samples matched with the normal standard set. This corresponds to a specificity [true negative/(true negative + false positive)] of 100%. When the 33 malignant samples were matched against the normal standard set, only two malignant samples (29 and 33) showed a match. The result is thus better than that seen from the score versus sample number plot (Fig. 4), where nine malignant samples were in the range of the normal cluster. The match/no match test was then performed with the malignant standard set. None of the normal samples matched with the malignant set. Samples 29 and 33, which matched with normal, matched here also. Only two malignant samples, 44 and 53, did not match with the malignant set. The sensitivity [true positive/(true positive + false negative)] is thus about 94%. These results are shown in Table 2.
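The quoted figures follow from the standard definitions, as the short calculation below shows. The counts are taken from the text (25 normal samples, none matching the malignant set; 33 malignant samples, of which two did not match the malignant set); the function names are illustrative.

```python
def sensitivity(true_positive, false_negative):
    """Fraction of malignant samples correctly identified as malignant."""
    return true_positive / (true_positive + false_negative)

def specificity(true_negative, false_positive):
    """Fraction of normal samples correctly identified as normal."""
    return true_negative / (true_negative + false_positive)

# 33 malignant samples: 31 matched the malignant set, 2 (44 and 53) did not.
sens = sensitivity(true_positive=31, false_negative=2)
# 25 normal samples: none matched the malignant set.
spec = specificity(true_negative=25, false_positive=0)
print(f"sensitivity = {100 * sens:.0f}%, specificity = {100 * spec:.0f}%")
```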
Table 2. Discrimination analysis with normal and malignant standard sets.
|Discrimination analysis with normal standard set|
|Sample||Match||M distance||Limit tests||Spec residual|
|1 to 25||Yes||0.63 to 2.3||Pass||1.83 to 4.03|
|26 to 28||No||5.97 to 8.07||Fail||7.70 to 10.26|
|30 to 32||No||3.96 to 5.34||Fail||6.02 to 7.52|
|34 to 58||No||2.58 to 14.36||Fail||3.61 to 19.45|
|Discrimination analysis with malignant standard set|
|Sample||Match||M distance||Limit tests||Spec residual|
|1 to 25||No||2.86 to 11.95||Fail||4.00 to 18.16|
|26 to 43||Yes||0.62 to 2.27||Pass||0.33 to 5.12|
|45 to 52||Yes||0.74 to 1.48||Pass||0.88 to 4.01|
|54 to 58||Yes||1.04 to 2.13||Pass||1.12 to 5.12|
As mentioned earlier, to observe the changes from one stage to another, difference chromatograms were calculated for serum samples from different stages. The changes from normal serum for the different types of samples are shown in Fig. 5. The differences here are given by the mean of the calibration set of normal samples minus the chromatogram of the mean of any given class simulated with the normal calibration set. In Fig. 5a, the mean of all normal samples is the test sample and the differences are practically zero (less than ±0.000 000 03), as they should be. The difference chromatograms for the different stages, on the other hand, show noticeable differences of fairly large magnitude from the mean normal for all the malignant samples. The important point to note is that, in addition to the large differences from the mean normal, the mean profiles for different stages of the disease also showed appreciable differences between them, as seen from the peaks marked with their retention times.
Discrimination between the Stages of Samples
To see whether discrimination of different stages of samples is possible, we carried out match/no match PCA with calibration sets of stage 2 and stage 3 samples, for which a reasonable number (13 and 17, respectively) of pathologically certified samples were available. PCA was carried out using two samples of stage 1, 13 of stage 2, 17 of stage 3, and one of stage 4. With ten samples of stage 2 forming a calibration set, all stage 2 samples and also the stage 4 sample showed a match. None of the stage 3 and stage 1 samples showed a match. Out of 17 stage 3 samples, 12 were taken to form a calibration set. It was found that 11 samples from stage 2 did not match with this calibration set; the two exceptions were samples 29 and 33. Three samples (42, 44, and 48) from the stage 3 group also did not match with the calibration set of stage 3. This preliminary study shows that it may be possible to discriminate between different stages of the disease by protein profile analysis. Table 3 shows the match/no match results. Figure 6 shows the spectral residual versus M-distance plot in this case. It can be seen that even with two parameters, the residual and Mahalanobis distance, there is a clear separation of stage 2 and stage 3 samples when either of these calibration sets is used to classify samples. The samples of the calibration set all cluster together, while samples from the other stages are widely separated from the calibration set.
Table 3. Discrimination analysis with stage 2 and stage 3 malignant standard sets.
|Discrimination with stage 2 malignant standard set|
|Sample||M distance||Limit tests||Spec residual|
|26 to 37||0.72 to 1.11||Pass||0.02 to 1.03|
|38 to 54||3.87 to 37.91||Fail||1.76 to 13.56|
|56, 57||8.80 to 14.61||Fail||4.0 to 5.9|
|Discrimination with stage 3 malignant standard set|
|Sample||M distance||Limit tests||Spec residual|
|26 to 28||2.95 to 16.34||Fail||3.98 to 17.22|
|30 to 32||2.96 to 4.21||Fail||1.74 to 5.10|
|34 to 37||2.92 to 10.61||Fail||0.67 to 11.67|
|38 to 41||0.79 to 1.17||Pass||0.62 to 10.50|
|45 to 47||0.83 to 1.18||Pass||0.61 to 5.81|
|49 to 54||0.75 to 1.65||Pass||1.07 to 19.06|
|55 to 58||2.77 to 10.61||Fail||3.97 to 11.67|
Curve Resolution Studies
To see whether additional information can be obtained on the different stages of cervical cancer serum, we carried out curve resolution studies of the regions involved. The peaks were resolved using the Gaussian function. The two unresolved peaks in the 1300 to 1800 s region were resolved by this method. In stage 4 samples, additional peaks were observed. These curve fit results are shown in Fig. 7a for the normal sample, and the results for the samples in different stages are given in Table 4. The data in Table 4 illustrate the very good reproducibility in the different chromatograms for similar peaks, as indicated by the peak positions, intensities (heights), and half widths for many of the peaks. It may be noted that as a result of the calibration procedure, the variation in peak positions from run to run is minimized. For example, for the region peaks, the peak positions for the four different types of samples vary only about ± , less than 0.04%. The area under the curve corresponding to the third peak decreases continuously in going from normal to stage 4 (Table 4).
Table 4. Curve-fitting values for the protein profiles of the serum samples. Peak numbers 1 to 4 (region 1300 to 1800 s) and 5, 6, and 7 (region 2300 to 2600 s).
Similar curve fit results for the other regions are shown in Fig. 7b for the normal sample, and the results for the different stages are given in Table 4. These peaks also showed noticeable differences between the different stages. From Table 4, for peaks 5, 6, and 7, it is clearly seen that the half-widths and peak intensities change for many component proteins, even when their retention times are relatively unchanged. These variations provide the basis for discrimination in PCA. In our method, decisions are based on PCA and on match/no match of the parameters derived from it. The parameters used include the scores of factors and the squared residual, (chromatogram minus chromatogram simulated using PCA factors and scores)². The elution time, peak height, and width are built into the factors derived from the PCA, and so automatically enter the scores of factors and the simulated chromatograms. All these data are thus used in the analysis and decision making.
In view of such differences between normal and different classes of serum, it may be tempting to use the corresponding proteins as tumor markers for diagnosis, prognosis, and follow-up in therapy. However, it is well recognized4, 35 that many tumor markers are useful only as indicators and do not provide an unambiguous diagnosis, because they may also be present under other conditions, such as pregnancy. The present method, on the other hand, makes use of the entire protein profile and is expected to be much more reliable, since it uses all observed variations in the sample.
The minor discrepancies observed between the pathological and HPLC-LIF protein profiling results can arise from several factors. These include: (1) experimental variations (errors) in pathology and HPLC-LIF, (2) data processing errors, and (3) inherent variations from sample to sample.
The experimental variations in pathology can come from sampling errors like “past pointing” in colposcopy,36 the fatigue factor in examining large numbers of slides, or the inexperience of the pathologist. In the present work, however, both the calibration standards and the test samples of malignant conditions consist of samples diagnosed by biopsy and pathology. Since the pathology results are based on morphological changes, a diagnosis of malignancy is highly unlikely to be in error. Variations in HPLC-LIF can come from blood collection and serum preparation procedures, storage effects, contamination, or errors in HPLC runs. However, the fact that all the normal samples matched one another very well, even though the samples were collected and run over a period of more than a year and different samples were stored for periods ranging from a few hours to weeks, suggests that experimental variations in HPLC-LIF may not be responsible for any discrepancies.
In many developing countries, where advanced medical facilities are restricted to major towns/hospitals, what is required is a system usable as a screening method for susceptible populations. The system should be able to perform as a diagnostic tool for suspect cases, using readily available samples, the quality of which will not depend on the personal experience of the examining physician. It should also be able to do follow-up without the need for repeated biopsy and pathology. The system has to be fairly rugged, relatively fast (reasonable number of samples per day), usable by a clinical technician (to be used in a single clinician/surgeon’s office or in a small hospital in a village), and should have specificity and sensitivity comparable to currently available techniques.
We feel that the present technique meets all these requirements and can be used in a routine manner in small clinics and hospitals without the need for qualified pathologists to examine cytological smears and biopsy tissue samples. The technique is also highly objective and operator-independent, and will serve as a very good complementary method to conventional screening techniques, reducing errors due to the “fatigue factor,” sampling errors, any lack of experience of the pathologist, and consequent subjective diagnosis. The protein profiles of the cervical cancer and normal serum samples contain much information on the structural or conformational changes occurring within the proteins. We have seen that significant changes occur between the protein profiles of normal and malignant samples. These changes may be due to changes in either structure or conformation produced during the process of carcinogenesis. The analysis of 58 normal and malignant samples using PCA was found to give very good results, with 100% specificity and 94% sensitivity.
Analysis of protein profiles of serum by the HPLC-LIF technique provides a method for screening for early detection of cancer. The method is minimally invasive (only routine blood sample required), relatively fast ( /sample), highly objective (no visual observation or personal judgment), and can be used even by a technician in a small clinic/hospital. It can be used for initial screening, follow-up in therapy, and early detection of any recurrence, and can be applied repeatedly, since no biopsy is involved.
The work was done under the project “Study of the protein profile of the clinical samples for the early diagnosis of female cancers,” Department of Science and Technology, Government of India, Project number SR/S2/LOP/05/2003.