The recent advent of radiomics has enabled the development of prognostic and predictive tools which use routine imaging, but a key question that still remains is how reproducible these features may be across multiple sites and scanners. This is especially relevant in the context of MRI data, where signal intensity values lack tissue specific, quantitative meaning, as well as being dependent on acquisition parameters (magnetic field strength, image resolution, type of receiver coil). In this paper we present the first empirical study of the reproducibility of 5 different radiomic feature families in a multi-site setting; specifically, for characterizing prostate MRI appearance. Our cohort comprised 147 patient T2w MRI datasets from 4 different sites, all of which were first pre-processed to correct acquisition-related for artifacts such as bias field, differing voxel resolutions, as well as intensity drift (non-standardness). 406 3D voxel wise radiomic features were extracted and evaluated in a cross-site setting to determine how reproducible they were within a relatively homogeneous non-tumor tissue region; using 2 different measures of reproducibility: Multivariate Coefficient of Variation and Instability Score. Our results demonstrated that Haralick features were most reproducible between all 4 sites. By comparison, Laws features were among the least reproducible between sites, as well as performing highly variably across their entire parameter space. Similarly, the Gabor feature family demonstrated good cross-site reproducibility, but for certain parameter combinations alone. These trends indicate that despite extensive pre-processing, only a subset of radiomic features and associated parameters may be reproducible enough for use within radiomics-based machine learning classifier schemes.