As multithreaded and reconfigurable logic architectures play an increasing role in high-performance computing (HPC),
the scientific community is in need of new programming models for efficiently mapping existing applications to the new
parallel platforms. In this paper, we show how to effectively exploit tightly coupled fine-grained parallelism in architectures
such as GPUs and FPGAs to speed up applications described by uniform recurrence equations. We introduce the
concept of rolling partial-prefix sums to dynamically keep track of and resolve multiple dependencies without having to
evaluate intermediary values. Rolling partial-prefix sums are applicable in low-latency evaluation of dynamic programming
problems expressed as uniform or affine equations. To assess our approach, we consider two common problems in
computational biology: hidden Markov models (HMMER) for protein motif finding and the Smith-Waterman algorithm.
We present a platform-independent, linear-time solution to HMMER, which is traditionally solved in bilinear time, and a
platform-independent, sub-linear-time solution to Smith-Waterman, which is normally solved in linear time.
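
The general idea of resolving a chain of dependencies with a prefix operation, rather than by evaluating each intermediate value in turn, can be illustrated with a small sketch. This is not the rolling partial-prefix sum construction itself, only a generic max-plus scan under stated assumptions: the recurrence x[j] = max(x[j-1] + a, b[j]) with a constant step a is chosen for illustration, and the function names are hypothetical. Because x[j] equals the maximum over k <= j of b[k] + (j - k) * a, an inclusive prefix maximum over b[k] - k * a recovers every x[j], and that prefix maximum needs only about log2(n) rounds of independent element-wise work.

```python
from typing import List


def sequential_recurrence(b: List[int], a: int) -> List[int]:
    """Reference: x[0] = b[0], x[j] = max(x[j-1] + a, b[j]), evaluated one step at a time."""
    if not b:
        return []
    x = [b[0]]
    for j in range(1, len(b)):
        x.append(max(x[-1] + a, b[j]))
    return x


def prefix_max_scan(values: List[int]) -> List[int]:
    """Inclusive prefix maximum in ceil(log2(n)) rounds (Hillis-Steele style).

    Each round reads only the previous round's values, so on parallel
    hardware all elements of a round could be updated at once.
    """
    out = list(values)
    step = 1
    while step < len(out):
        prev = out
        out = [
            prev[j] if j < step else max(prev[j], prev[j - step])
            for j in range(len(prev))
        ]
        step *= 2
    return out


def scan_recurrence(b: List[int], a: int) -> List[int]:
    """Same result as sequential_recurrence, obtained through a prefix maximum."""
    # Remove the position-dependent term so the chain becomes a plain maximum.
    shifted = [b[k] - k * a for k in range(len(b))]
    running = prefix_max_scan(shifted)
    # Add the position-dependent term back for each output index.
    return [running[j] + j * a for j in range(len(b))]


if __name__ == "__main__":
    scores = [3, 0, 5, 1, 1, 9]  # illustrative values for b
    step_cost = -2               # illustrative constant a
    print(sequential_recurrence(scores, step_cost))
    print(scan_recurrence(scores, step_cost))
```

Both functions return the same list for the same input; the sequential version needs n dependent steps, whereas the scan version needs a logarithmic number of rounds when each round is executed in parallel.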