Photoacoustic imaging (PAI) is an emerging technology with strong potential for clinical applications ranging from breast cancer detection to cerebral monitoring, owing to its ability to map blood oxygen saturation (SO2) in deep tissue using multispectral imaging. However, no well-validated consensus test methods currently exist for evaluating the oximetry-specific performance characteristics of PAI devices. We have developed a phantom-based flow system capable of rapid SO2 adjustment to serve as a test bed for identifying factors that affect SO2 measurement and for quantitatively characterizing device performance. The flow system comprises a peristaltic pump, a membrane oxygenator, oxygen and nitrogen gas, and in-line oxygen, pH, and temperature sensors that enable real-time estimation of SO2 reference values. Bovine blood was delivered through breast-relevant tissue phantoms containing vessel-mimicking fluid channels, which were imaged using a custom multispectral PAI system. Blood was periodically drawn for SO2 measurement with a clinical-grade CO-oximeter. We used this flow phantom system to evaluate the impact of device parameters (e.g., wavelength-dependent fluence corrections) and tissue parameters (e.g., fluid channel depth, blood SO2, spectral coloring artifacts) on oximetry measurement accuracy. The results revealed key challenges in PAI oximetry and device design trade-offs, which in turn allowed system performance to be optimized. This approach provides a robust benchtop test platform that can support PAI oximetry device optimization, performance validation, and clinical translation, and may inform the future development of consensus test methods for performance assessment of photoacoustic oximetry imaging systems.
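To make concrete how multispectral PAI data can yield an SO2 estimate, and why wavelength-dependent fluence correction matters, the following is a minimal sketch of linear spectral unmixing. It is not the processing used in this work. The function name `estimate_so2`, the wavelength set, and the relative fluence values are hypothetical, and the extinction coefficients are approximate, rounded values assumed for illustration (taken as representative of commonly tabulated hemoglobin spectra, which may differ for bovine blood).

```python
# Approximate molar extinction coefficients (cm^-1 M^-1), assumed for
# illustration only; a validated spectral table should be used in practice.
EXTINCTION = {
    750: (518.0, 1405.0),  # wavelength (nm): (HbO2, Hb)
    800: (816.0, 762.0),
    850: (1058.0, 691.0),
}


def estimate_so2(pa_amplitudes, fluence=None):
    """Estimate SO2 by least-squares unmixing of HbO2 and Hb.

    pa_amplitudes: dict mapping wavelength (nm) to photoacoustic amplitude.
    fluence: optional dict mapping wavelength (nm) to relative local fluence;
             if omitted, fluence is assumed equal at all wavelengths.
    """
    # Accumulate the 2x2 normal equations (E^T E) c = E^T mu.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for wl, amplitude in pa_amplitudes.items():
        e_oxy, e_deoxy = EXTINCTION[wl]
        phi = 1.0 if fluence is None else fluence[wl]
        mu = amplitude / phi  # fluence-corrected, proportional to absorption
        a11 += e_oxy * e_oxy
        a12 += e_oxy * e_deoxy
        a22 += e_deoxy * e_deoxy
        b1 += e_oxy * mu
        b2 += e_deoxy * mu

    # Solve the 2x2 system with Cramer's rule.
    det = a11 * a22 - a12 * a12
    c_oxy = (b1 * a22 - b2 * a12) / det
    c_deoxy = (a11 * b2 - a12 * b1) / det
    return c_oxy / (c_oxy + c_deoxy)


# Synthetic example: blood at SO2 = 0.8 seen through a hypothetical
# wavelength-dependent fluence (a simple model of spectral coloring).
true_so2 = 0.8
phi = {750: 1.0, 800: 0.9, 850: 0.8}
signal = {
    wl: phi[wl] * (true_so2 * e_oxy + (1.0 - true_so2) * e_deoxy)
    for wl, (e_oxy, e_deoxy) in EXTINCTION.items()
}

so2_corrected = estimate_so2(signal, fluence=phi)
so2_uncorrected = estimate_so2(signal)
```

In this synthetic case the fluence-corrected estimate recovers the simulated SO2, whereas the uncorrected estimate is biased, which illustrates the kind of spectral coloring error that a flow phantom with known reference SO2 can expose.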