A delay-and-sum beamformer for 3D imaging with row-column arrays is presented. It is written entirely in MATLAB, which makes it flexible and quick to modify for research, and all parts can run on either the CPU or the GPU. Dynamic apodization for row-column arrays is introduced and is supported in both transmit and receive. Delay calculations are simpler than in previous beamformers, and the 3D delay and apodization calculations are reduced to 2D problems to make them faster. Performance is evaluated on an Intel Xeon E5-2630 v4 CPU with 64 GB of RAM and an NVIDIA GeForce GTX 1080 Ti GPU with 11 GB of memory. A simulated 192+192 array images a volume of 96-by-96-by-45 wavelengths. The volume is sampled every 0.3 wavelength in the axial direction and every 0.5 wavelength in the lateral and elevation directions, giving 5.53 million sample points. A single-element synthetic aperture sequence with 192 emissions is used. The 192 volumes are beamformed in approximately 1 hour on the CPU and 5 minutes on the GPU, a speed-up of up to 12.2 times. For a smaller problem consisting of the three center planes of the volume, the beamforming time drops from about 109 to 24 seconds, a speed-up of 4.6 times. The GPU reaches around 5.0% of its peak floating-point throughput, which indicates a trade-off between ease of programming and high performance.
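The reduction of 3D delay calculations to 2D problems can be illustrated with a minimal sketch. It is written in Python for illustration only, since the beamformer itself is in MATLAB. The sketch assumes that each row and column element behaves as an infinitely long line element in the z = 0 plane, and it ignores finite element length, edge effects and any lens. Row elements run along x, and column elements run along y. The function names are hypothetical, and this is not the authors' delay model. Under these assumptions, the transmit path from a row element depends only on (y, z), and the receive path to a column element depends only on (x, z). The script also checks the quoted sample-point count, assuming extent/spacing samples per axis.

```python
import math


def points_per_axis(extent, spacing):
    """Samples along one axis, assuming extent/spacing samples per axis
    (this convention reproduces the 5.53 million points quoted above)."""
    return round(extent / spacing)


def transmit_path(y, z, y_row):
    """Distance (wavelengths) from a row element to the point (x, y, z).

    Assumption: the row element is an infinitely long line along x at
    y = y_row, z = 0, so x drops out and the problem is 2D in (y, z).
    """
    return math.hypot(y - y_row, z)


def receive_path(x, z, x_col):
    """Distance (wavelengths) from (x, y, z) back to a column element,
    assumed to be an infinitely long line along y at x = x_col, z = 0."""
    return math.hypot(x - x_col, z)


def two_way_path(x, y, z, y_row, x_col):
    """Transmit + receive path length; divide by the speed of sound for time."""
    return transmit_path(y, z, y_row) + receive_path(x, z, x_col)


# Volume from the abstract: 96 x 96 x 45 wavelengths.
nx = points_per_axis(96, 0.5)  # lateral
ny = points_per_axis(96, 0.5)  # elevation
nz = points_per_axis(45, 0.3)  # axial
n_volume = nx * ny * nz

# A full 3D delay table needs one entry per voxel for each element. Under the
# line-element assumption, a 2D (y, z) table per row element and a 2D (x, z)
# table per column element are enough.
n_tx_table = ny * nz
n_rx_table = nx * nz

print(nx, ny, nz, n_volume)              # 192 192 150 5529600
print(n_tx_table, n_rx_table)            # 28800 28800
print(two_way_path(0.0, 5.0, 12.0, 0.0, 0.0))  # 25.0
```

Under this model, each per-element table is 192 × 150 = 28,800 entries instead of 5,529,600. That is the kind of saving a 2D formulation can offer. The actual beamformer's delay and apodization handling may differ.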