Detailed mapping of urban surfaces is one of the most challenging tasks in remote sensing because of the three-dimensional structure of cities, their spatial heterogeneity, and the spectral variability of urban materials. Urban satellite applications require high spatial, spectral, and temporal resolution, yet strict technical trade-offs exist among these three. Therefore, sophisticated methods that exploit data sources of both high spectral and high spatial resolution are needed. A hierarchical multiple endmember spectral mixture analysis (MESMA) approach is developed and applied to Sentinel-2 imagery for the detailed quantification of urban land cover, taking advantage of the high spatial resolution of WorldView-2. The case study is the urban and peri-urban area of Heraklion, Greece. The area-to-point regression kriging (ATPRK) method is applied to downscale Sentinel-2 bands from 10 and 20 m to 2 m (the WorldView-2 spatial resolution) and to create a spectral library (SL) of urban materials containing 180 spectra. The urban SL is then used in the hierarchical MESMA approach to estimate the abundances of 11 urban land cover classes from the original Sentinel-2 image. The estimated land cover fractions are validated against a very high-resolution (1 m) land cover map of the area. The results show that the proposed methodology can efficiently capture the complexity of urban land cover, and error analysis indicates good accuracy for all estimated class fractions. Moreover, the validation results indicate that the ATPRK fusion of Sentinel-2 and WorldView-2 bands produced reliable urban material spectra, suitable for advanced spectral analysis of Sentinel-2 imagery. The methodology is readily transferable to other cities, since it relies exclusively on Earth observation data, and it is suitable for multiple urban applications related to urban climate, urban sprawl, and urban regeneration.
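The core idea of MESMA, unmixing each pixel with many candidate endmember combinations drawn from a spectral library and keeping the best-fitting one, can be illustrated with a minimal, single-level sketch. This is not the authors' hierarchical implementation: the library format (a list of `(class_label, spectrum)` pairs), the restriction to one spectrum per class per model, the sum-to-one constrained least-squares solver, the selection of the model with the lowest RMSE, and the admissible fraction range `(-0.05, 1.05)` are all assumptions made for illustration, and the function names `scls_unmix` and `mesma` are hypothetical.

```python
import itertools
import numpy as np


def scls_unmix(pixel, E):
    """Sum-to-one constrained least-squares unmixing (illustrative sketch).

    pixel: (bands,) reflectance vector; E: (bands, k) endmember matrix.
    Returns fractions f of shape (k,) with sum(f) == 1, and the fit RMSE.
    """
    G = E.T @ E
    ones = np.ones(E.shape[1])
    a = np.linalg.solve(G, E.T @ pixel)  # unconstrained least-squares fractions
    b = np.linalg.solve(G, ones)
    # Lagrange-multiplier correction that enforces sum(f) == 1 exactly
    f = a - b * (ones @ a - 1.0) / (ones @ b)
    residual = pixel - E @ f
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    return f, rmse


def mesma(pixel, library, max_endmembers=2, frac_bounds=(-0.05, 1.05)):
    """Simplified (non-hierarchical) MESMA for one pixel.

    library: list of (class_label, spectrum) pairs (assumed format).
    Tries every combination of up to `max_endmembers` spectra, at most one
    per class, and returns (classes, fractions, rmse) for the admissible
    model with the lowest RMSE, or None if no model is admissible.
    """
    best = None
    for k in range(1, max_endmembers + 1):
        for combo in itertools.combinations(library, k):
            classes = tuple(label for label, _ in combo)
            if len(set(classes)) < k:
                continue  # skip models with two spectra of the same class
            E = np.column_stack([spectrum for _, spectrum in combo])
            f, rmse = scls_unmix(pixel, E)
            if np.any(f < frac_bounds[0]) or np.any(f > frac_bounds[1]):
                continue  # reject physically implausible fractions
            if best is None or rmse < best[2]:
                best = (classes, f, rmse)
    return best
```

For example, with a toy three-spectrum library and a pixel synthesized as 30 % of class `"A"` plus 70 % of class `"B"`, `mesma(pixel, library)` should recover the pair `("A", "B")` with fractions close to `[0.3, 0.7]`. Exhaustive enumeration grows combinatorially with library size and model complexity, which is one motivation for organizing the search hierarchically rather than testing all 180 spectra in every combination.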