Late gadolinium enhanced (LGE) cardiac magnetic resonance (CMR) imaging, the current benchmark for the assessment of myocardial viability, enables the identification and quantification of compromised myocardial tissue regions, which appear hyper-enhanced relative to the surrounding healthy myocardium. However, in LGE CMR images the reduced contrast between the left ventricle (LV) myocardium and the LV blood-pool hampers accurate delineation of the LV myocardium. Balanced Steady-State Free Precession (bSSFP) cine CMR imaging, on the other hand, provides high-resolution images well suited to accurate segmentation of the cardiac chambers. To generate patient-specific hybrid 3D and 4D anatomical models of the heart that identify and quantify the compromised myocardial tissue regions for revascularization therapy planning, in our previous work we presented a spatial transformer network (STN) based convolutional neural network (CNN) architecture for the registration of LGE and bSSFP cine CMR image datasets made available through the 2019 Multi-Sequence Cardiac Magnetic Resonance Segmentation Challenge (MS-CMRSeg). We performed supervised registration by leveraging region of interest (RoI) information from the manual annotations of the LV blood-pool, LV myocardium, and right ventricle (RV) blood-pool provided for both the LGE and bSSFP cine CMR images. To reduce the reliance on manual annotations when training such a network, we propose a joint deep learning framework consisting of three branches: an STN-based, RoI-guided CNN for registration of LGE and bSSFP cine CMR images, a U-Net model for segmentation of bSSFP cine CMR images, and a U-Net model for segmentation of LGE CMR images. All three branches share a multi-scale feature encoder, which is learned by optimizing the three branches of the network architecture simultaneously.
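The registration branch hinges on the STN's differentiable resampling step: a CNN regresses a 2x3 affine transform from the image pair, a sampling grid is built from that transform, and bilinear interpolation warps the moving (bSSFP) image toward the fixed (LGE) image. Below is a minimal NumPy sketch of that grid-and-sample mechanism only; the function names, the identity transform, and the random test image are illustrative assumptions, not code from the paper:

```python
import numpy as np

def affine_grid(theta, H, W):
    # Build a sampling grid in normalized coordinates [-1, 1] from a
    # 2x3 affine matrix, as in a spatial transformer network (STN).
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # (3, H*W)
    grid = theta @ coords                                        # (2, H*W)
    return grid.reshape(2, H, W)

def bilinear_sample(img, grid):
    # Bilinear resampling of img at the grid locations; in the full
    # network this operation is differentiable, so the registration
    # loss backpropagates through it to the transform-regressing CNN.
    H, W = img.shape
    x = (grid[0] + 1) * (W - 1) / 2
    y = (grid[1] + 1) * (H - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0
    x0, x1 = np.clip(x0, 0, W - 1), np.clip(x1, 0, W - 1)
    y0, y1 = np.clip(y0, 0, H - 1), np.clip(y1, 0, H - 1)
    return (img[y0, x0] * (1 - wx) * (1 - wy) + img[y0, x1] * wx * (1 - wy)
            + img[y1, x0] * (1 - wx) * wy + img[y1, x1] * wx * wy)

# Identity affine: the warped image should match the moving image.
# In the STN, this 2x3 matrix would instead be regressed by the CNN
# from the (moving bSSFP, fixed LGE) image pair.
theta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
moving = np.random.rand(8, 8)
warped = bilinear_sample(moving, affine_grid(theta, 8, 8))
```

In the joint framework described above, the same differentiable warp lets the registration loss, alongside the two segmentation losses, update the shared multi-scale feature encoder.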
Our experiments show that the registration results obtained by the joint deep learning framework trained on 25 of the 45 available image datasets are comparable to those obtained by the stand-alone STN-based CNN model trained on 35 of the 45 datasets, and significantly better than the results achieved by the stand-alone STN-based CNN model trained on 25 of the 45 datasets.