Similarity learning is one of the most fundamental tasks in image analysis. The ability to extract similar images in the medical domain as part of content-based image retrieval (CBIR) systems has been researched for many years. The vast majority of methods used in CBIR systems are based on hand-crafted feature descriptors. The approximation of a similarity mapping for medical images is difficult due to the big variety of pixel-level structures of interest. In fundus photography (FP) analysis, a subtle difference in e.g. lesions and vessels shape and size can result in a different diagnosis. In this work, we demonstrated how to learn a similarity function for image patches derived directly from FP image data without the need of manually designed feature descriptors. We used a convolutional neural network (CNN) with a novel architecture adapted for similarity learning to accomplish this task. Furthermore, we explored and studied multiple CNN architectures. We show that our method can approximate the similarity between FP patches more efficiently and accurately than the state-of- the-art feature descriptors, including SIFT and SURF using a publicly available dataset. Finally, we observe that our approach, which is purely data-driven, learns that features such as vessels calibre and orientation are important discriminative factors, which resembles the way how humans reason about similarity. To the best of authors knowledge, this is the first attempt to approximate a visual similarity mapping in FP.