Image heterogeneity metrics such as textural features are an active area of research for evaluating clinical outcomes with positron emission tomography (PET) imaging and other modalities. However, the effects of stochastic image acquisition noise on these metrics are poorly understood. We performed a simulation study by generating 50 statistically independent PET images of the NEMA IQ phantom with realistic noise and resolution properties. Heterogeneity metrics based on gray-level intensity histograms, co-occurrence matrices, neighborhood difference matrices, and zone size matrices were evaluated within regions of interest surrounding the lesions. The impact of stochastic variability was evaluated with percent difference from the mean of the 50 realizations, coefficient of variation and estimated sample size for clinical trials. Additionally, sensitivity studies were performed to simulate the effects of patient size and image reconstruction method on the quantitative performance of these metrics. Complex trends in variability were revealed as a function of textural feature, lesion size, patient size, and reconstruction parameters. In conclusion, the sensitivity of PET textural features to normal stochastic image variation and imaging parameters can be large and is feature-dependent. Standards are needed to ensure that prospective studies that incorporate textural features are properly designed to measure true effects that may impact clinical outcomes.