A fundamental step in the generation of visually detailed 3D city models is the acquisition of high-fidelity 3D data. Typical approaches employ Digital Surface Model (DSM) representations, usually derived from airborne Lidar (Light Detection and Ranging) scanning or from image-based procedures. In this contribution, we focus on fusing data from both methods in order to enhance or complete them. In particular, we combine an existing Lidar and orthomosaic dataset (used as reference) with a new, higher-resolution aerial image acquisition (including both vertical and oblique imagery), which was carried out in the area of Kallithea, in Athens, Greece. In a preliminary step, a digital orthophoto and a DSM are generated from the aerial images in an arbitrary reference system, by employing a Structure from Motion and dense stereo matching framework. The image-to-Lidar registration is performed by extracting 2D features (SIFT and SURF) and matching them between the two orthophotos. The established point correspondences are assigned 3D coordinates through interpolation on the reference Lidar surface, are then backprojected onto the aerial images, and are finally matched with 2D image features located in the vicinity of the backprojected 3D points. These points then serve as Ground Control Points, with appropriate weights, for the final orientation and calibration of the images through a bundle adjustment solution. In this way, the aerial imagery, optimally aligned to the reference dataset, can be used to generate an enhanced and more accurately textured 3D city model.
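As an illustration of the step in which matched orthophoto points are assigned 3D coordinates by interpolation on the reference surface, the following is a minimal sketch only. It assumes a regular north-up DSM grid and bilinear interpolation; the abstract does not state which interpolation scheme is used, and the names `interpolate_dsm_height`, `lift_matches_to_3d`, `origin_xy` and `cell_size` are hypothetical.

```python
import math


def interpolate_dsm_height(dsm, origin_xy, cell_size, x, y):
    """Bilinearly interpolate a height from a regular DSM grid.

    Assumptions (not taken from the paper):
    - dsm is a list of rows, with row 0 the northernmost row,
    - origin_xy is the ground (X, Y) of the centre of cell dsm[0][0],
    - cell_size is the grid spacing in ground units,
    - the grid has at least 2 rows and 2 columns.
    Points outside the grid are clamped to the nearest edge.
    """
    n_rows, n_cols = len(dsm), len(dsm[0])

    # Ground coordinates -> fractional grid indices (Y decreases with row).
    col = (x - origin_xy[0]) / cell_size
    row = (origin_xy[1] - y) / cell_size

    # Upper-left cell of the 2x2 neighbourhood, kept inside the grid.
    c0 = min(max(int(math.floor(col)), 0), n_cols - 2)
    r0 = min(max(int(math.floor(row)), 0), n_rows - 2)

    # Fractional offsets inside that neighbourhood, clamped to [0, 1].
    dc = min(max(col - c0, 0.0), 1.0)
    dr = min(max(row - r0, 0.0), 1.0)

    # Interpolate along columns first, then along rows.
    top = (1 - dc) * dsm[r0][c0] + dc * dsm[r0][c0 + 1]
    bottom = (1 - dc) * dsm[r0 + 1][c0] + dc * dsm[r0 + 1][c0 + 1]
    return (1 - dr) * top + dr * bottom


def lift_matches_to_3d(dsm, origin_xy, cell_size, points_xy):
    """Attach an interpolated Z to each matched planimetric point (X, Y)."""
    return [
        (x, y, interpolate_dsm_height(dsm, origin_xy, cell_size, x, y))
        for x, y in points_xy
    ]


if __name__ == "__main__":
    # Toy 2x2 surface with 1 m spacing; values are illustrative only.
    toy_dsm = [[10.0, 20.0],
               [30.0, 40.0]]
    matches = [(100.5, 199.5), (100.0, 200.0)]
    for point in lift_matches_to_3d(toy_dsm, (100.0, 200.0), 1.0, matches):
        print(point)
```

In such a sketch, the resulting (X, Y, Z) tuples would play the role of the candidate control points that are subsequently backprojected onto the aerial images; weighting and outlier rejection are deliberately left out.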