New remote sensing image fusion for exploring spatiotemporal evolution of urban land use and land cover

Abstract. Evaluating land use and cover change is a vital component of research on climate change, ecological evolution, and the long-term development of human civilization. Land use and cover change (LUCC) research based on remote sensing (RS) image data has become an essential and widely used approach. Given the scarcity of high-spatial-resolution imagery in urban remote sensing and the low accuracy and efficiency of urban land use classification, a new satellite image fusion method based on the nonsubsampled contourlet transform (NSCT), a pulse-coupled neural network (PCNN), and intensity–hue–saturation (IHS) theory is proposed. An improved convolutional neural network is then used to classify the fused images from 2000 to 2020 and to investigate in depth the spatiotemporal evolution of urban LUCC in Zhengzhou, Henan, China. The findings show that urbanized land in Zhengzhou has expanded dramatically over the last 20 years, with its share rising from 9% in 2000 to 22% in 2020. The comprehensive dynamic degree and the single dynamic degree of land use vary across districts and counties, and the comprehensive index of land use degree shows more evident regional disparities. The findings can reveal the inherent conflicting interaction mechanisms of the human-land system and provide data to support urban-related research.

Building on previous studies, the main research ideas of this work are as follows. Using NSCT and PCNN theory, we propose a remote sensing (RS) data fusion method and apply it to images from several Landsat series covering the Zhengzhou area between 2000 and 2020. Fusing the bands of the original RS images yields an image dataset with high spatial resolution that better preserves the spectral information of the original multispectral bands. We then classify the fused imagery with a convolutional neural network, integrate time-series socioeconomic data with the spatial statistical analysis tools of ArcGIS, and investigate in depth the characteristics of Zhengzhou's land-use change over time and space. The results are intended to serve as a scientific basis for future research on the human-land-environment system, urban land development planning, and sustainable urban development.

Study Area
Zhengzhou (34°44′N, 113°37′E) is the capital of Henan Province and the province's economic, political, and cultural center. Located in northern Henan Province, it is bordered by the Yellow River to the north, Mount Song to the west, and the vast Huanghuai Plain to the southeast. Zhengzhou, also known as the Shang Capital, was an important hub in the development of Chinese culture and is a member of the World Historical Urban Partnership. It lies in the northern temperate zone and has a continental monsoon climate with four distinct seasons. Figure 1 shows a schematic map of the study area.

Data
The main data source for this study was Landsat satellite imagery, available from the US Geological Survey website, covering seven time points. The images used are listed in Table 1.

Methods
This work combines the advantages of intensity-hue-saturation (IHS), NSCT, and PCNN in image fusion and presents a satellite image fusion method based on the characteristics of the multispectral and panchromatic bands of Landsat data. The method improves the spatial resolution of the images while keeping the original spectral information of the multispectral bands intact. The multispectral image was first converted by an IHS transformation, and NSCT was then used to decompose the panchromatic band and the extracted intensity (I) component. Because the decomposed low- and high-frequency coefficients have different characteristics, different fusion rules were applied to them: high-frequency coefficients were fused with rules based on an improved PCNN, whereas low-frequency coefficients were fused with rules derived from fuzzy logic. The low- and high-frequency sub-band components are reconstructed by the inverse NSCT, and the fused image is generated by the inverse IHS transformation. Following the national standard of the People's Republic of China, Classification of Land Use Status (GB/T 21010-2017), and the actual land cover in the study area, we classified land use into arable land, woodland, urbanized land, water, and other land. Classification was performed with a convolutional neural network; its overall accuracy of 95% meets the requirements of the study. The technical workflow is shown in Fig. 2.

The NSCT was proposed in 2006 (Ref. 33). Its central idea is to decompose images at multiple scales and orientations using a nonsubsampled pyramid filter bank (NSPFB) and a nonsubsampled directional filter bank (NSDFB). The NSPFB performs the multiscale decomposition, which gives the transform its multiresolution property.
The NSDFB then decomposes the sub-band image at each scale into different directions, which gives the transform its multidirectional property and yields images at different scales and orientations. A K-level decomposition produces one low-frequency image and $\sum_{k=1}^{K} 2^{l_k}$ high-frequency images, where $l_k$ is the number of directional decomposition levels at the kth scale. In addition to being multiscale, multidirectional, and well localized in both the spatial and frequency domains, the NSCT is translation invariant and produces sub-band images of the same size as the original. As a result, the edge and contour information of the original image is preserved more faithfully, and the texture details and directional cues of multisensor images can be represented well. A schematic representation is shown in Fig. 3.
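To make the sub-band count concrete, the following Python sketch evaluates one low-frequency image plus $\sum_{k=1}^{K} 2^{l_k}$ high-frequency images for the directional levels [0, 2, 3, 4] used in the fusion experiment. The helper name `nsct_subband_count` is hypothetical, and the sketch only counts sub-bands; it does not perform the transform.

```python
def nsct_subband_count(levels):
    """Return (low, high): the number of low- and high-frequency NSCT sub-bands.

    levels[k] is l_k, the number of directional decomposition levels at
    scale k, so that scale contributes 2**l_k directional (high-frequency)
    sub-bands. The NSPFB always leaves exactly one low-frequency image.
    """
    if any(l < 0 for l in levels):
        raise ValueError("directional levels must be non-negative")
    high = sum(2 ** l for l in levels)
    return 1, high


# Directional levels reported for the fusion experiment: [0, 2, 3, 4]
low, high = nsct_subband_count([0, 2, 3, 4])
print(low, high)  # 1 29
```

A level of 0 means that scale is not split directionally and contributes a single sub-band, so the four scales give 1 + 4 + 8 + 16 = 29 high-frequency sub-bands; because the NSCT is nonsubsampled, each of them has the same size as the input image.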

Fuzzy logic
Conventional multiscale image fusion methods combine the coefficients of the source images with equal weights. Human vision, however, attends to central and peripheral regions, and to individual pixels in multi-image fusion, with differing degrees of interest. Fuzzy logic can help resolve the uncertainty caused by this ambiguity in the fusion process: it is a convenient and flexible tool for constructing fuzzy concepts from membership functions that tolerate imprecise input. Unlike crisp sets, which force a choice between two possibilities and do not overlap, fuzzy sets can overlap. In image fusion, there is ambiguity in the relationship between the corresponding pixels of different target scenes and the image as a whole. Fuzzy sets are useful for expressing such vague notions, and the uncertainty can be quantified by constructing fuzzy membership functions. A fuzzy set is described as follows:

$$A = \{(x, u_A(x)) \mid x \in U\}, \quad u_A(x) \in [0, 1] \qquad (1)$$

where U denotes the universe of discourse (that is, the set of objects), x is an element of the universe of discourse, A is a fuzzy set in U, and $u_A(x)$ is the degree of membership of x in A, with a value between 0 and 1. Gaussian, generalized bell-shaped, and triangular membership functions are often employed. In this work, we use the Gaussian membership function as a weighting function in the image fusion rules. The Gaussian membership function may be written as

$$u_A(x) = \exp\left[-\frac{(x - u)^2}{2\sigma^2}\right] \qquad (2)$$

where u is the function's center and σ, which is normally positive, determines the function's width.
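The Gaussian membership function in Eq. (2) can be evaluated element-wise with NumPy, as in the minimal sketch below. The function name is hypothetical; the sketch simply shows that membership is 1 at the center u and decays symmetrically at a rate set by σ.

```python
import numpy as np


def gaussian_membership(x, center, sigma):
    """Gaussian membership function u_A(x) = exp(-(x - center)^2 / (2 sigma^2)).

    Returns values in (0, 1]; the membership equals 1 at x == center and
    decays symmetrically, more slowly for larger sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


print(gaussian_membership([0.0, 1.0, 2.0], center=0.0, sigma=1.0))
# approximately [1.0, 0.6065, 0.1353]
```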

PCNN modeling
Eckhorn 34 proposed the neuron model underlying the PCNN to describe how cells in the visual cortex of cats and other small mammals process visual signals. PCNNs have since been applied to many image processing tasks, including segmentation, edge extraction, target identification, and image fusion. 35 A PCNN is a feedback network formed by interconnecting many neurons, each of which consists of a receptive field, a modulation field, and a pulse generator. Because the scope of application of the standard PCNN model is limited by several factors, this study uses a commonly adopted modified PCNN model. 36 The model is

$$
\begin{cases}
F_{ij}(n) = S_{ij} \\
L_{ij}(n) = e^{-\alpha_L} L_{ij}(n-1) + V_L \sum_{k,l} W_{ijkl} Y_{kl}(n-1) \\
U_{ij}(n) = F_{ij}(n)\,[1 + \beta L_{ij}(n)] \\
\theta_{ij}(n) = e^{-\alpha_\theta} \theta_{ij}(n-1) + V_\theta Y_{ij}(n-1) \\
Y_{ij}(n) = \begin{cases} 1, & U_{ij}(n) > \theta_{ij}(n) \\ 0, & \text{otherwise} \end{cases}
\end{cases}
\qquad (3)
$$

where (i, j) denotes the coordinates of the neuron (pixel), n is the iteration number, $S_{ij}$ is the external input stimulus, here the gray level of pixel (i, j), and $W_{ijkl}$ is the connection weight between neuron (i, j) and its neighboring neuron (k, l). $F_{ij}$, $L_{ij}$, $\theta_{ij}$, and $U_{ij}$ are the feedback input, linking input, dynamic threshold, and internal activity of the neuron, respectively. $V_L$ and $V_\theta$ are the amplitude coefficients of the linking input and the dynamic threshold, and $\alpha_L$ and $\alpha_\theta$ are their decay time constants, respectively. β is the linking strength, and $Y_{ij}$ is the neuron output; $Y_{ij}(n) = 1$ means that pixel (i, j) fires once at iteration n. When a PCNN is used to process images, it is a two-dimensional single-layer network in which the number of neurons equals the number of pixels and each neuron corresponds one-to-one to a pixel.
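A minimal NumPy sketch of the iteration in Eq. (3) follows; it counts how often each neuron fires, which is the quantity PCNN-based fusion rules typically compare. The 3 × 3 linking kernel W and all parameter values are illustrative assumptions, since the paper's PCNN settings are not listed in this section, and the function names are hypothetical.

```python
import numpy as np


def _linking_sum(Y, W):
    """Weighted sum of the previous outputs in each 3x3 neighborhood (zero padding)."""
    h, w = Y.shape
    P = np.pad(Y, 1)  # zeros outside the image
    out = np.zeros_like(Y)
    for di in range(3):
        for dj in range(3):
            out += W[di, dj] * P[di:di + h, dj:dj + w]
    return out


def pcnn_fire_counts(S, beta=0.2, alpha_L=1.0, alpha_theta=0.2,
                     V_L=1.0, V_theta=20.0, n_iter=200):
    """Run the simplified PCNN of Eq. (3) and return each neuron's firing count.

    S is the stimulus (one neuron per pixel or coefficient). The defaults
    are illustrative assumptions, not the values used in the paper.
    """
    S = np.asarray(S, dtype=float)
    W = np.array([[0.707, 1.0, 0.707],
                  [1.0, 0.0, 1.0],
                  [0.707, 1.0, 0.707]])  # assumed linking weights W_ijkl
    L = np.zeros_like(S)
    theta = np.zeros_like(S)
    Y = np.zeros_like(S)
    T = np.zeros_like(S)  # accumulated firing counts
    for _ in range(n_iter):
        F = S                                                 # feeding input F_ij(n) = S_ij
        L = np.exp(-alpha_L) * L + V_L * _linking_sum(Y, W)   # uses Y(n-1)
        U = F * (1.0 + beta * L)                              # internal activity
        theta = np.exp(-alpha_theta) * theta + V_theta * Y    # uses Y(n-1)
        Y = (U > theta).astype(float)                         # output Y(n)
        T += Y
    return T


S = np.array([[0.0, 0.2],
              [0.5, 1.0]])
print(pcnn_fire_counts(S))
```

Brighter stimuli let the decaying threshold be crossed sooner, so they fire more often; a zero stimulus never fires.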

Low-frequency component fusion rules
During image fusion, the relationship between the source images and the fused image is uncertain because the images are combined through a many-to-one mapping. Moreover, blurring is exacerbated when contour information becomes entangled with noise in the image. The low-frequency sub-band carries the approximation of the image, that is, its average intensity and contour features. An appropriate fusion rule is therefore needed to handle this uncertain relationship, and the Gaussian membership function is well suited to describing the information in the low-frequency sub-band. In this study, the Gaussian membership function was used as an adaptive weighting function in the fusion rule for the low-frequency band. The weighting function is given by Eq. (4), where L(i, j) denotes the low-frequency sub-band coefficient; u and σ are the median and standard deviation of a source image's low-frequency sub-band, respectively; and k is the Gaussian function adjustment parameter, set to the empirical value k = 0.8 determined by the control-variable method. The fusion rule for the low-frequency coefficients is given by Eq. (5), where $L_A(i, j)$ and $L_B(i, j)$ are the low-frequency sub-band coefficients of images A and B, and $L_F(i, j)$ is the fused low-frequency sub-band coefficient.
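A minimal Python sketch of a Gaussian-weighted low-frequency rule follows. It rests on two assumptions that the text does not spell out: each coefficient's weight is its Gaussian membership, centered on the sub-band median u with width kσ, and the fused coefficient is the weight-normalized average of $L_A$ and $L_B$. The paper's Eqs. (4) and (5) may differ in detail, and the function names are hypothetical.

```python
import numpy as np


def gaussian_weight(L, k=0.8):
    """Assumed adaptive weight: Gaussian membership of each coefficient.

    Center u = median of the sub-band; width = k * sigma, where sigma is the
    sub-band's standard deviation (the functional form is an assumption).
    """
    L = np.asarray(L, dtype=float)
    u = np.median(L)
    width = max(k * L.std(), 1e-12)  # guard against a constant sub-band
    return np.exp(-((L - u) ** 2) / (2.0 * width ** 2))


def fuse_low_frequency(LA, LB, k=0.8, eps=1e-12):
    """Weight-normalized average of two low-frequency sub-bands (assumed rule)."""
    LA = np.asarray(LA, dtype=float)
    LB = np.asarray(LB, dtype=float)
    wA = gaussian_weight(LA, k)
    wB = gaussian_weight(LB, k)
    total = wA + wB
    weighted = (wA * LA + wB * LB) / np.maximum(total, eps)
    # Fall back to the plain average where both weights have underflowed
    return np.where(total > eps, weighted, 0.5 * (LA + LB))


LA = np.array([[10.0, 12.0], [14.0, 40.0]])
LB = np.array([[11.0, 11.0], [13.0, 20.0]])
print(fuse_low_frequency(LA, LB))
```

Because the weights are non-negative and normalized, each fused coefficient lies between the two source coefficients; a coefficient far from its sub-band's median receives a smaller weight.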

High-frequency component fusion rules
The high-frequency sub-bands carry the detail information of the image.
Common fusion methods focus on the attributes of single pixels or groups of pixels, and aggregate measures such as gradient, variance, maximum absolute value, and regional energy often lose valuable context. High-frequency coefficient fusion is therefore based on the PCNN to better fuse the properties of the source images. PCNN fusion implementations commonly use the gray value of a single pixel as the feedback input, ignoring any correlation between neighboring pixels. The sum of local Laplacian energy is a good measure of the edge information in an image because it quantifies the differences between neighboring pixels. In this study, the input to the PCNN model was the improved sum of local Laplacian energy of the high-frequency coefficients. The sum of local Laplacian energy is computed over a local window, where s is the variable step (distance) between coefficients. The linking strength adjusts the weight of the linking channel in the internal activity term, and hence the firing period of the central neuron, and it reflects variations in the coefficients. Allowing it to take values between 0 and 1 is crucial for image fusion. In the traditional PCNN model, the linking strength of every neuron is set by trial and error, yet it is unrealistic to assume that all neurons in the human visual system share the same linking coefficient; the coefficient should instead vary with the neuron's input and position. 37 Measures such as spatial frequency (SF), directional information, and standard deviation are commonly used to tune the linking strength of PCNN neurons adaptively. Consistent with how the human visual system processes information, SF reflects the local features and fine details of an image.
However, the original SF ignores the diagonal directions and therefore loses important visual detail. The modified spatial frequency (MSF) extends the original SF: the total gradient energy is obtained by adding the gradient energies in the two diagonal directions (DF) to those in the row (RF) and column (CF) directions.
MSF captures image information precisely and can be used to assess image clarity or activity level. In this work, MSF is used as the linking strength of the PCNN and is computed over M × N image blocks.
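The two measures that drive the high-frequency rule can be sketched in Python as follows. The explicit formulas are assumptions, since they are not reproduced in the text: the sum of local Laplacian energy uses the common sum-modified-Laplacian form with step s, squared and summed over a 3 × 3 window, and the MSF adds the main- and secondary-diagonal gradient energies (each weighted by $1/\sqrt{2}$) to the row and column terms. The function names are hypothetical. In the scheme described above, the Laplacian-energy map would serve as the PCNN stimulus $S_{ij}$, and an MSF computed in a sliding window would supply β.

```python
import numpy as np


def sum_modified_laplacian(C, step=1, radius=1):
    """Assumed local Laplacian energy: squared modified Laplacian summed over a window."""
    C = np.asarray(C, dtype=float)
    h, w = C.shape
    P = np.pad(C, step, mode="edge")
    center = P[step:step + h, step:step + w]
    up = P[0:h, step:step + w]                       # C(i - s, j)
    down = P[2 * step:2 * step + h, step:step + w]   # C(i + s, j)
    left = P[step:step + h, 0:w]                     # C(i, j - s)
    right = P[step:step + h, 2 * step:2 * step + w]  # C(i, j + s)
    ml = np.abs(2 * center - up - down) + np.abs(2 * center - left - right)
    energy = ml ** 2
    # Sum the energy over a (2*radius + 1) x (2*radius + 1) window
    E = np.pad(energy, radius)
    out = np.zeros_like(C)
    for di in range(2 * radius + 1):
        for dj in range(2 * radius + 1):
            out += E[di:di + h, dj:dj + w]
    return out


def modified_spatial_frequency(block):
    """Assumed MSF of an M x N block: row, column, and two weighted diagonal terms."""
    B = np.asarray(block, dtype=float)
    M, N = B.shape
    wd = 1.0 / np.sqrt(2.0)  # assumed diagonal weight
    rf2 = np.sum((B[:, 1:] - B[:, :-1]) ** 2) / (M * N)
    cf2 = np.sum((B[1:, :] - B[:-1, :]) ** 2) / (M * N)
    mdf2 = wd * np.sum((B[1:, 1:] - B[:-1, :-1]) ** 2) / (M * N)
    sdf2 = wd * np.sum((B[1:, :-1] - B[:-1, 1:]) ** 2) / (M * N)
    return float(np.sqrt(rf2 + cf2 + mdf2 + sdf2))


C = np.zeros((5, 5))
C[2, 2] = 1.0  # a single bright coefficient
print(sum_modified_laplacian(C)[2, 2], modified_spatial_frequency(C))
```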

Fusion results
To illustrate the effectiveness of the strategy, we fused the multispectral and panchromatic bands of a single image acquired by the Landsat Enhanced Thematic Mapper over a selected area of Zhengzhou. The experiment was carried out in MATLAB 2014a on a 400 × 400 pixel image. It used the 9-7 filter for scale decomposition, the dmaxflat7 filter for directional decomposition, and directional decomposition levels of [0, 2, 3, 4]; the PCNN model parameters were as follows:

Visual evaluation
The fusion results in Fig. 4 show that the image fused by the proposed method, Fig. 4(f), preserves most of the information in the multispectral image while considerably enhancing local details. The result in (f) is better than those obtained with the other techniques, (b)-(e): images fused with methods (b) and (c) show obvious spectral distortion, whereas those fused with methods (d) and (e) retain more of the spatial information of the source image.

Objective evaluation
The average gradient, SF, standard deviation, and information entropy are frequently used as quality indicators for image fusion. A higher entropy of the fused image indicates that it carries more information. Objective indicators of the fusion results obtained by the different approaches are shown in Table 2.
Table 2 shows no marked difference in information entropy among the methods, indicating that all of them capture useful information from the source images. The proposed algorithm also performs well on the average gradient and SF, which are good measures of image clarity.
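For reference, the sketch below computes the quality indicators named above with NumPy, using common textbook definitions; the paper does not state its exact formulas, so these definitions and the function names are assumptions. Standard deviation is simply `np.std` of the fused image.

```python
import numpy as np


def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over the (M-1) x (N-1) grid."""
    F = np.asarray(img, dtype=float)
    dx = F[:-1, 1:] - F[:-1, :-1]   # horizontal differences
    dy = F[1:, :-1] - F[:-1, :-1]   # vertical differences
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))


def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) from row and column first differences."""
    F = np.asarray(img, dtype=float)
    M, N = F.shape
    rf2 = np.sum((F[:, 1:] - F[:, :-1]) ** 2) / (M * N)
    cf2 = np.sum((F[1:, :] - F[:-1, :]) ** 2) / (M * N)
    return float(np.sqrt(rf2 + cf2))


def information_entropy(img, levels=256):
    """Shannon entropy (bits) of the gray-level histogram of an 8-bit image."""
    g = np.asarray(img, dtype=np.int64).ravel()
    p = np.bincount(g, minlength=levels) / g.size
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


fused = np.array([[0, 255], [255, 0]])
print(average_gradient(fused), spatial_frequency(fused),
      float(np.std(fused)), information_entropy(fused))
```

Higher average gradient and SF indicate sharper detail, while higher entropy indicates that the fused image carries more information.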

Classification result
In this study, we use a convolutional neural network to classify the fused image data of Zhengzhou. The network structure is shown in Fig. 5. Seventy percent of the dataset was used for training and the remaining 30% for testing. Figure 6 shows the training accuracy, validation accuracy, training loss, and validation loss of the classification model during training. Figures 6(a) and 6(b) show that the loss reaches its minimum after about 20 iterations and then remains essentially stable, indicating that the model converges rapidly and has learned the characteristics of the samples. The Kappa coefficient, average accuracy, and overall accuracy of the classification are all higher than 95%, meeting the requirements of the subsequent analysis. The classification results are shown in Fig. 7. Because of space limitations, results are shown for selected years only.
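The three accuracy figures quoted above are standard confusion-matrix statistics. The following Python sketch computes them from a confusion matrix of the test split; the row/column convention is an assumption, and the matrix in the example is made up for illustration and is not the paper's result.

```python
import numpy as np


def accuracy_metrics(cm):
    """Overall accuracy, average accuracy, and Cohen's kappa from a confusion matrix.

    cm[i, j] counts test samples whose reference class is i and whose
    predicted class is j (this row/column convention is an assumption).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    oa = np.trace(cm) / n
    # Average accuracy: mean of the per-class accuracies (recall of each reference class)
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))
    # Chance agreement estimated from the row and column marginals
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return float(oa), float(aa), float(kappa)


print(accuracy_metrics([[45, 5], [5, 45]]))
```

Kappa discounts the agreement expected by chance, so it is lower than overall accuracy whenever classes are confused.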

Quantity of land use
As shown in Fig. 8, Zhengzhou's urbanized land has grown substantially over the past two decades, from 9% of the total area in 2000 to 22% in 2020. This growth has led to a major decrease in arable land in the Zhengzhou area and an overall decline in the region's water bodies. Forest area fell between 2000 and 2006, a period that can be regarded as a deforestation crisis. The loss of forest cover was large, an estimated 52%; it was rapid at first and then slowed over this 7-year period. Since 2013, the total area of forest land has remained roughly stable. Figure 9(a)-9(l) shows how the shares of the land use types changed over time in 12 districts and counties of Zhengzhou: Dengfeng, Erqi, Gongyi, Guancheng, Huiji, Jinshui, Shangjie, Xinmi, Xinzheng, Xingyang, Zhongmu, and Zhongyuan. As Fig. 9 shows, Dengfeng City, Xinzheng City, Guancheng Hui District, Xingyang City, and Zhongmu County became steadily more urbanized over time. In the remaining districts and counties, the amount of urbanized land fluctuated considerably during the study period but continued to expand spatially. As urbanized areas expanded, they consumed a growing share of agricultural land, reducing arable land to varying degrees in all districts and counties over the past 20 years. Although the shares of forest and water fluctuated over time in different administrative divisions, overall they remained in a state of continual dynamic adjustment.

Zhengzhou's comprehensive land-use dynamic degree has been relatively stable over the last 20 years, ranging from 0.07 to 0.13, whereas the dynamic degree of urbanized land, ranging from 0.03 to 0.07, has remained in a state of moderate adjustment.
Compared with forest land, water bodies, and farmland, the dynamic degree of urbanized land is small. Forest land and water bodies changed most rapidly, with values ranging from 0.15 to 0.44 (Fig. 12).
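The section does not define the dynamic degrees it reports. A common formulation, assumed here, gives the single dynamic degree of one land type as $K = \frac{U_b - U_a}{U_a} \cdot \frac{1}{T}$ and the comprehensive dynamic degree as $LC = \frac{\sum_i |U_{b,i} - U_{a,i}|}{2 \sum_i U_{a,i}} \cdot \frac{1}{T}$, where $U_a$ and $U_b$ are areas at the start and end of a period of T years (often multiplied by 100%). The Python sketch below implements that assumed formulation with hypothetical areas; the function names are also hypothetical.

```python
def single_dynamic_degree(area_start, area_end, years):
    """K = (U_b - U_a) / U_a / T for one land use type (fraction per year)."""
    if area_start <= 0 or years <= 0:
        raise ValueError("area_start and years must be positive")
    return (area_end - area_start) / area_start / years


def comprehensive_dynamic_degree(areas_start, areas_end, years):
    """LC = sum_i |U_b,i - U_a,i| / (2 * sum_i U_a,i) / T (fraction per year).

    With a constant total area, every converted unit is counted once as a
    loss and once as a gain, hence the factor of 2.
    """
    changed = sum(abs(b - a) for a, b in zip(areas_start, areas_end))
    return changed / (2.0 * sum(areas_start)) / years


# Hypothetical areas (km^2) for [arable, urbanized] over a 10-year period
start, end = [100.0, 50.0], [80.0, 70.0]
print(single_dynamic_degree(start[1], end[1], 10))
print(comprehensive_dynamic_degree(start, end, 10))
```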

Rate of land cover change
Comprehensive land-use dynamics evolved differently across districts and counties over the past two decades. Compared with earlier periods, comprehensive land-use dynamics were markedly larger in the 2010 to 2013, 2013 to 2017, and 2017 to 2020 study periods. Except in Gongyi and Dengfeng, where comprehensive dynamics have remained relatively modest, comprehensive land-cover change peaked between 2010 and 2013. In Erqi District, land-use dynamics declined over the last three periods but remained higher than in the first three periods. Between 2000 and 2010, land-use dynamics were generally low, with notable change confined to a few districts and counties. In several study periods, land-use trends were similar across jurisdictions.

Degree of land use
The comprehensive index of land use degree clearly distinguishes the districts and counties from one another, as shown in Fig. 13. Specifically, the index of each district or county varied within a fixed range, and some districts and counties had wider ranges than others. Three groups of districts and counties have fairly similar ranges of the comprehensive index of land use (Fig. 14): Dengfeng, Gongyi, and Xinmi; Erqi, Guancheng, Jinshui, Huiji, Shangjie, and Zhongyuan; and Xinzheng, Xingyang, and Zhongmu.

Fig. 12 The evolution of comprehensive dynamics in different research periods in Zhengzhou districts and counties.

Fig. 13 Comprehensive index map of land use degrees in Zhengzhou districts and counties.
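The text does not give the formula for the comprehensive index of land use degree. A widely used formulation, assumed here, grades each land type by intensity of use and computes $L = 100 \sum_i A_i C_i$, where $A_i$ is the grade and $C_i$ the area share, so the index ranges from 100 to 400. The grade assignment in the sketch below (other land 1; forest and water 2; arable 3; urbanized 4), the function names, and the district areas are all assumptions for illustration.

```python
# Assumed grading (not stated in the text): other/unused 1; forest and water 2;
# arable 3; urbanized 4.
GRADE = {"other": 1, "forest": 2, "water": 2, "arable": 3, "urbanized": 4}


def land_use_degree_index(areas):
    """L = 100 * sum_i A_i * C_i, with A_i the grade and C_i the area share."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    return 100.0 * sum(GRADE[cls] * a / total for cls, a in areas.items())


def degree_index_change(areas_start, areas_end):
    """Change in the index between two dates; positive means more intensive use."""
    return land_use_degree_index(areas_end) - land_use_degree_index(areas_start)


# Hypothetical district areas (km^2)
a2000 = {"arable": 60.0, "forest": 20.0, "water": 5.0, "urbanized": 10.0, "other": 5.0}
a2020 = {"arable": 45.0, "forest": 18.0, "water": 4.0, "urbanized": 30.0, "other": 3.0}
print(land_use_degree_index(a2000), degree_index_change(a2000, a2020))
```

Under this formulation, converting farmland to urbanized land raises the index, so a negative change signals a shift toward less intensively used land types.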
More districts and counties experienced negative than positive changes in the comprehensive index of land use degree during two of the study periods.

Zhengzhou's land cover has changed noticeably over the past 20 years. Urbanized land has expanded at annual rates of between 10.59% and 21.26% at the expense of farmland, forests, and natural waterways, whose areas have shrunk or expanded in different periods. One example is the rate at which idle land was brought into use between 2000 and 2003. Forest and water areas decreased sharply, by 19.85% and 14.31%, respectively, and the largest decline in woodland, 38.27%, occurred between 2003 and 2006. Between 2006 and 2010, the areas of water and arable land decreased rapidly, with rates of change of 45.71% and 24.27%, respectively, whereas idle land was put to more effective use.

City center of gravity
The migration trajectory and rate of the center of gravity of urban land use directly reflect how the spatial pattern of urban land use changes and how humans use and transform land resources in space. They are therefore important for understanding decisions about the direction of economic and social development, as well as the strength and impact of government policies over time. The spatial analysis tools of ArcGIS (ESRI, Redlands, California) were used to estimate the trajectory of the center of gravity of each land type in Zhengzhou at different research stages. Figure 15 shows the results for (a) urbanized land, (b) forest land, (c) water bodies, and (d) agricultural land. As Fig. 15(a)-15(d) shows, the center of gravity of urbanized land lies in Erqi District, near the border between Xingyang City and Zhongyuan District.

Fig. 15 Characteristic map of the center-of-gravity shifts.
The center of gravity of forest land lay in Xinmi City, whereas that of water bodies lay in Zhongyuan District until recently, when it began a slow but steady migration to the northeast. The center of gravity of agricultural land was located near the boundary between Xinmi City and Erqi District.
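The center of gravity of a land type is usually computed as the area-weighted mean of patch centroids, $\bar{X} = \sum_i C_i X_i / \sum_i C_i$ (and likewise for $\bar{Y}$), which is what a weighted mean-center calculation in a GIS produces. The Python sketch below assumes that definition and projected coordinates, and derives the migration distance, mean annual rate, and bearing between two dates; the function names and the patch data are hypothetical.

```python
import math


def center_of_gravity(patches):
    """Area-weighted mean center of (area, x, y) patches in a projected CRS."""
    total = sum(a for a, _, _ in patches)
    if total <= 0:
        raise ValueError("total area must be positive")
    x = sum(a * px for a, px, _ in patches) / total
    y = sum(a * py for a, _, py in patches) / total
    return x, y


def gravity_migration(c_start, c_end, years):
    """Distance, mean annual rate, and bearing (degrees clockwise from north)."""
    dx = c_end[0] - c_start[0]
    dy = c_end[1] - c_start[1]
    dist = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return dist, dist / years, bearing


# Hypothetical urban patches: (area in km^2, easting in km, northing in km)
c2000 = center_of_gravity([(1.0, 0.0, 0.0), (3.0, 4.0, 0.0)])
c2020 = (c2000[0] + 3.0, c2000[1] + 4.0)
print(c2000, gravity_migration(c2000, c2020, 20))
```

A bearing between 0° and 90°, as in this example, corresponds to a northeastward shift like the one described for water bodies.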

Conclusions
A new image fusion technique based on NSCT, PCNN, and IHS theory is proposed to address the inadequate spatial resolution of urban RS images. Low-frequency components are fused with an adaptive fuzzy-logic rule. Because human vision is more sensitive to image edges and directions, high-frequency components are fused by feeding the improved sum of local Laplacian energy into the PCNN model to enhance the edge information of the image. Moreover, the linking strength of the PCNN is set adaptively from the modified SF, which enhances the features and detailed information of local image regions. The fusion results indicate that the proposed strategy is consistent with human visual perception, extracts the spectral characteristics of the input images, emphasizes target information more effectively, and increases the information content and clarity of the fused image. Classifying the fused RS images with the proposed convolutional neural network model yielded Kappa coefficient, average accuracy, and overall accuracy values above 95%, satisfying the requirements of the subsequent analysis. Zhengzhou has urbanized rapidly over the past 20 years, most notably through a dramatic reduction in arable land caused by the extraordinary growth of urbanized land. The water area remained largely stable over time, whereas woodland changed gradually. Different parts of the Zhengzhou metropolitan area have different land use patterns because of their distinct histories. Over the study period, there was little discernible change in the land-use dynamics of the Zhengzhou area. Zhengzhou's urbanization has accelerated since 2000, as shown by the increasingly active and intensive use of all available land.
The comprehensive index of land use degree makes the distinct regional characteristics of districts and counties more apparent. Specifically, the index of each district or county varied only within a certain range, and this range differed among districts and counties. An analysis of Zhengzhou's land use transfer matrix over the past two decades reveals a dramatic increase in urbanization and a significant shift in the city's land-use patterns. This expansion, however, came at the cost of land cleared for urban development. Zhengzhou's rapid urbanization has also led to the reclamation of formerly undeveloped plots and a consequent rise in land utilization, as shown by the city's shrinking supply of idle land. The spatial concentration of urban land has shifted over the past two decades.
Using the proposed RS image fusion approach and CNN-based classification of RS satellite images, this study compensates for the limited image resolution in urban RS and improves land use classification accuracy. It also examines in depth the spatial and temporal evolution of land cover in Zhengzhou over the last 20 years of rapid urbanization. This kind of research focuses on the characteristics of land use development during urbanization.