All traffic models for MPEG-like encoded variable bit rate (VBR) video can be categorized into (i) data rate models (DRMs) and (ii) frame size models (FSMs). Almost all proposed VBR traffic models are DRMs. Since DRMs generate only the data arrival rate, they are suitable for estimating average packet loss and ATM buffer overflow probabilities, but fail to identify such details as the percentage of frames affected. FSMs generate the sizes of individual MPEG frames and are therefore suited to studying the frame loss rate in addition to the data loss rate. Among the three previously proposed FSMs, (i) one generates frame sizes for full-length movies without preserving GOP periodicity; (ii) another generates frame sizes for full-length movies without preserving size-based video-segment transitions; and (iii) the third generates VBR video traffic for news videos from a scene-content description supplied to it, presupposing a proper segmentation. In this paper, we propose two segmentation techniques for VBR videos: (a) Equal Number of GOPs in all shot classes (ENG), and (b) Geometrically Increasing Interval Lengths for shot classes (GIIL). Each technique partitions the GOPs of a video into size-based shot classes. The frames in each class yield three data sets, one each for I-, B-, and P-type frames, and each data set can be modeled with an axis-shifted Gamma distribution. Markov renewal processes model the interclass transitions. We use Q-Q plots to show the visual similarity of model-generated VBR video data sets to the original data set, and a leaky-bucket simulation study to show the similarity of data and frame loss rates between model-generated videos and the original video. Our study of frame-based VBR video reveals that the GIIL segmentation technique separates the I-, B-, and P-frames into well-behaved shot classes whose statistical properties can be captured by Gamma-based models.
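An axis-shifted Gamma distribution is a three-parameter Gamma whose support starts at a nonzero shift rather than at zero. A minimal sketch of fitting one to a set of frame sizes, using SciPy's `loc` parameter as the axis shift; the frame sizes and parameter values below are synthetic stand-ins, not the paper's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic I-frame sizes (bytes): Gamma samples shifted by 2000,
# standing in for the per-frame-type, per-shot-class data sets.
true_shape, true_scale, true_shift = 3.0, 1500.0, 2000.0
sizes = rng.gamma(true_shape, true_scale, 5000) + true_shift

# SciPy's three-parameter fit: `loc` is exactly the axis shift.
shape, loc, scale = stats.gamma.fit(sizes)

# Draw model-generated frame sizes, e.g. as input to a
# leaky-bucket simulation.
model_sizes = stats.gamma.rvs(shape, loc=loc, scale=scale,
                              size=1000, random_state=rng)
```

A Q-Q plot of `model_sizes` against `sizes` (e.g. via `scipy.stats.probplot`) would then give the visual comparison described above.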
Kosko proposed the Bidirectional Associative Memory (BAM), in which pairs of patterns (A_i, B_i) are encoded; when one pattern of a pair is presented, the other is expected to be recalled. Irrespective of the number of pattern-pairs encoded, if the dimensions of A_i and B_i are n and m respectively, a correlation matrix with mn elements is required to encode them, and at least O(mn) computation time is required to recall a pattern. For practical applications, mn is a large number. Moreover, to guarantee correct recall of every encoded pattern, the correlation matrix may need to be augmented, which increases its size further. To overcome these problems, we propose a Three-Layer BAM (TLBAM) and two novel encoding methods that require smaller correlation matrices: to encode p pattern-pairs, only p(m + n) elements are necessary, so recall time is also reduced. For instance, to encode three pattern-pairs from a recent paper (with n = 288, m = 280, and p = 3), a correlation matrix of 288 × 280 = 80,640 elements is required, and this encoding does not recall all three pairs correctly. With one augmentation method, the modified correlation matrix has 89,600 elements for correct recall of all three pairs; another augmentation method requires a modified correlation matrix of 81,208 elements. The novel encodings proposed here require two correlation matrices with only 288 × 3 + 3 × 280 = 1,704 elements.
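The storage comparison above can be checked directly. The sketch below builds Kosko's single correlation matrix as a sum of outer products and compares its element count with the p(m + n) figure for the two smaller matrices; the bipolar patterns are random placeholders, and this is not the paper's TLBAM encoding algorithm itself:

```python
import numpy as np

n, m, p = 288, 280, 3                  # dimensions from the example above

rng = np.random.default_rng(1)
A = rng.choice([-1, 1], size=(p, n))   # bipolar patterns A_i
B = rng.choice([-1, 1], size=(p, m))   # bipolar patterns B_i

# Kosko's BAM: W = sum_i A_i^T B_i, a single n x m correlation matrix.
W = A.T @ B

print(W.size)          # 80640 elements for the single matrix
print(p * (m + n))     # 1704 elements for the two-matrix encoding
```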
This paper is motivated by the work of Laden and Keefe and addresses the topic of pitch-class recognition. A neural net with one hidden layer is trained to recognize all thirty-six major, minor, and diminished chords that can be built over a chromatic scale starting and ending on C. A harmonic-complex representation is chosen for the chords: each tone is represented by five partial harmonics, so a three-note chord consists of fifteen partials. The net is trained with the error backpropagation algorithm, and the effects of different learning rates and hidden-layer sizes are studied. Experiments with a technique known as Bold Driver to speed up learning are also conducted. Following the existing work, we examine the recognition of incomplete patterns, that is, chords with some harmonics missing. The recognition performance of the system can be significantly improved by adding noise during training and by using voting networks; the number of epochs needed to recognize all chords can also be drastically reduced.
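A minimal sketch of assembling the fifteen-partial input for one chord. The exact encoding (e.g. how partials map to network inputs) is an assumption here; the sketch only illustrates the "five partials per tone, three tones per chord" structure:

```python
import numpy as np

def partials(f0: float, k: int = 5) -> np.ndarray:
    """Frequencies of the first k partial harmonics of a tone."""
    return f0 * np.arange(1, k + 1)

# C major triad: C4, E4, G4 (equal-tempered frequencies in Hz;
# the choice of octave is an assumption for illustration).
chord_tones = [261.63, 329.63, 392.00]

# Concatenate the partials of the three tones into one input vector.
features = np.concatenate([partials(f) for f in chord_tones])
print(features.shape)   # (15,) -- fifteen partials per chord
```

Dropping entries from `features` (or zeroing them) gives the incomplete patterns, i.e. chords with some harmonics missing, studied above.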
This work concentrates on a novel method for the empirical estimation of the generalization ability of neural networks. Given a set of training (and testing) data, one can choose a network architecture (number of layers, number of neurons in each layer, etc.), an initialization method, and a learning algorithm to obtain a network. One measure of performance of the trained network is how closely its actual output approximates the desired output for an input it has never seen before. Current methods provide a single number that estimates the generalization ability of the network; however, this number offers no further insight into the contributing factors when generalization ability is poor. The proposed method uses a set of parameters to define generalization ability: the values of these parameters together provide an estimate of generalization ability, while the value of each individual parameter indicates the contribution of a factor such as the network architecture, the initialization method, or the training data set. Furthermore, a method has been developed to verify the validity of the estimated parameter values.
There are several models for neurons and their interconnections. Among them, feedforward artificial neural networks (FFANNs) are very popular because they are quite simple. However, to make them truly reliable and smart information-processing systems, characteristics such as learning speed, local minima, and generalization ability need to be better understood. Difficulties such as long learning time and local minima may matter less than generalization ability, because in many applications a network needs only one training, after which it may be used for a long time. The generalization ability of ANNs is of great interest for both theoretical understanding and practical use, because it measures how closely a learning system's actual output approximates the desired output for an input it has never seen. We investigate novel techniques for the systematic initialization (as opposed to purely random initialization) of FFANN architectures for possible improvement of their generalization ability. Our preliminary work has successfully employed the row vectors of Hadamard matrices to generate initializations; this initialization method has produced networks with better generalization ability.
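A minimal sketch of the Hadamard-row idea: build a Hadamard matrix by the Sylvester construction and take its rows as initial weight vectors for a hidden layer. The layer sizes and the scaling factor are assumptions for illustration, not the paper's prescription:

```python
import numpy as np

def hadamard(k: int) -> np.ndarray:
    """Sylvester construction: a 2^k x 2^k matrix of +/-1 entries."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

n_inputs, n_hidden = 8, 4          # toy layer sizes (power-of-two inputs)
H = hadamard(3)                    # 8 x 8 Hadamard matrix

# Systematic initialization: first n_hidden rows as hidden-unit weight
# vectors, scaled down (the 0.1 factor is an arbitrary assumption).
W_init = 0.1 * H[:n_hidden, :]

# Rows of a Hadamard matrix are mutually orthogonal: H H^T = 8 I.
print(np.allclose(H @ H.T, 8 * np.eye(8)))  # True
```

The appeal of such rows as initial weights is that, unlike purely random vectors, they are exactly orthogonal, so the hidden units start out maximally decorrelated.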