In the last few years, a variety of multicore architectures have been used to parallelize image processing applications.
In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization
strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. The strategies
differ both in how Canny edge detection is parallelized and in how data is managed.
The two parallelization strategies examined are loop-level parallelism and domain decomposition. Loop-level
parallelism is achieved through the use of OpenMP [1], which distributes the iterations of a loop across
threads. Domain decomposition is the process of breaking down an image into subimages,
where each subimage is processed independently and in parallel. The results show that, for the same number
of threads, programmer-implemented domain decomposition achieves higher speed-ups than compiler-managed
loop-level parallelism implemented with OpenMP.
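
To make the contrast between the two strategies concrete, the following is a minimal sketch in C, not the implementation evaluated in this paper. It applies both strategies to a single simplified stage (a gradient magnitude computed from central differences) rather than the full Canny pipeline. The function names, the row-strip layout, and the use of POSIX threads for the programmer-managed version are all assumptions made for illustration; the abstract does not state which threading mechanism or subimage shape was used on the Tile64.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for one Canny stage: gradient magnitude
 * (|gx| + |gy| from central differences) for rows y0..y1-1.
 * Border pixels are set to 0. */
static void gradient_rows(const unsigned char *in, int *out,
                          int w, int h, int y0, int y1)
{
    assert(0 <= y0 && y0 <= y1 && y1 <= h);
    for (int y = y0; y < y1; y++) {
        for (int x = 0; x < w; x++) {
            if (y == 0 || y == h - 1 || x == 0 || x == w - 1) {
                out[y * w + x] = 0;
                continue;
            }
            int gx = in[y * w + x + 1] - in[y * w + x - 1];
            int gy = in[(y + 1) * w + x] - in[(y - 1) * w + x];
            out[y * w + x] = abs(gx) + abs(gy);
        }
    }
}

/* Loop-level parallelism: the OpenMP runtime divides the row loop
 * among threads. Without OpenMP enabled the pragma is ignored and
 * the loop runs sequentially with the same result. */
void gradient_loop_level(const unsigned char *in, int *out, int w, int h)
{
    #pragma omp parallel for
    for (int y = 0; y < h; y++)
        gradient_rows(in, out, w, h, y, y + 1);
}

/* Domain decomposition: the programmer splits the image into
 * horizontal strips (an assumed subimage shape) and gives each
 * strip to its own thread. */
typedef struct {
    const unsigned char *in;
    int *out;
    int w, h, y0, y1;
} strip_t;

static void *strip_worker(void *arg)
{
    strip_t *s = arg;
    gradient_rows(s->in, s->out, s->w, s->h, s->y0, s->y1);
    return NULL;
}

/* Returns 0 on success, -1 on invalid input or allocation/thread failure. */
int gradient_domain_decomp(const unsigned char *in, int *out,
                           int w, int h, int nthreads)
{
    if (nthreads < 1)
        return -1;
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    strip_t *strips = malloc((size_t)nthreads * sizeof *strips);
    if (!tid || !strips) {
        free(tid);
        free(strips);
        return -1;
    }
    int started = 0, rc = 0;
    for (int t = 0; t < nthreads; t++) {
        /* Strip t covers rows [h*t/n, h*(t+1)/n); strips never overlap. */
        strips[t] = (strip_t){ in, out, w, h,
                               (h * t) / nthreads, (h * (t + 1)) / nthreads };
        if (pthread_create(&tid[t], NULL, strip_worker, &strips[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    free(tid);
    free(strips);
    return rc;
}
```

In this sketch each strip writes only its own output rows, so no locking is needed, while the rows bordering a strip are read directly from the shared input image. On a tiled architecture the placement of those shared rows is a data-management decision in its own right, which the sketch does not attempt to model.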