There are applications (such as Internet search engines) where short textual strings, for example abstracts or pieces of Web pages, need to be compressed independently of each other. The usual adaptive compression algorithms perform poorly on these short strings due to the lack of necessary data to learn.
In this manuscript, we introduce a compression algorithm targeting short text strings; e.g., containing a few hundred symbols. We also target natural language insensitivity, to facilitate its robust compression and fast implementation. The algorithm is based on the following findings. Applying the move-to-front transform (MTFT) after the Burrows-Wheeler transform (BWT) brings the short textual strings to a "normalized form" where the distribution of the resulting "ranks" has a shape similar over the set of natural language strings. This facilitates the use of a static coding method with few variations, which we call shortBWT, where no on-line learning is needed, to encode the ranks. Finally, for short strings, shortBWT runs very fast because the strings fit into the cache of most current computers.
The introduction for this paper will review the mathematical bases of BWT and MTF, it will also review our recently published metric for rapidly pre-characterizing the compressibility of such short textual strings when using these transforms.
In this work, a method of decision-feedback equalization is developed for a digital holographic channel that experiences moderate-to-severe imaging errors. Decision feedback is utilized, not only where the channel is well-behaved, but also near the edges of the camera grid that are subject to a high degree of imaging error. In addition to these effects, the channel is worsened by typical problems of holographic channels, including non-uniform illumination, dropouts, and stuck bits. The approach described in this paper builds on established methods for performing trained and blind equalization on time-varying channels. The approach is tested on experimental data sets. On most of these data sets, the method of equalization described in this work delivers at least an order of magnitude improvement in bit-error rate (BER) before error-correction coding (ECC). When ECC is introduced, the approach is able to recover stored data with no errors for many of the tested data sets. Furthermore, a low BER was maintained even over a range of small alignment perturbations in the system. It is believed that this equalization method can allow cost reductions to be made in page-memory systems, by allowing for a larger image area per page or less complex imaging components, without sacrificing the low BER required by data storage applications.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.