Digital video and audio systems have become the prevailing trend in multimedia because they offer high quality and are easy to process. However, because a huge amount of information must be carried over limited bandwidth, data compression plays an important role in such systems. For example, a 176 x 144 monochrome sequence at 10 frames/sec requires about 2 Mbps. This wastes channel resources and limits the applications. MPEG (Moving Picture Experts Group) standardizes video codec schemes that achieve a high compression ratio while providing good quality. MPEG-1 is used for frame sizes of about 352 x 240 at 30 frames per second, and MPEG-2 provides scalability and can be applied to higher-definition scenes such as HDTV (high definition television).
On the other hand, some applications, such as videophone and video-conferencing, are concerned with very low bit rates. Because the channel bandwidth of the telephone network is very limited, a very high compression ratio is required. For a channel bandwidth as low as 28.8 Kbps (V.42), where full-duplex operation is necessary and 4 Kbps is reserved for one-way speech, the effective bandwidth available for video is only about 10 Kbps. As a result, the digital video signal must be compressed by a factor of about 200. Conventional codec schemes such as H.261 perform poorly when the bit rate is about 64 Kbps. Therefore, MPEG-4 is being developed to satisfy this demand.
To encode the video signal, motion-compensated (MC) methods based on fixed-block algorithms have been widely used in standard systems. However, they have disadvantages, such as the block effect, that degrade performance. Many approaches have been proposed to fit these very low bit-rate applications, for instance model-based coding² and analysis-synthesis coding.⁴ Model-based coding is a very popular research topic in very low bit-rate systems.
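The bit-rate figures quoted above can be checked with a short calculation. The sketch below is only an illustration: it assumes 8 bits per pel for the monochrome sequence and assumes that full-duplex operation splits the 28.8 Kbps channel equally between the two directions, neither of which is stated explicitly in the text. All variable names are chosen here for illustration.

```python
# Raw bit rate of the 176 x 144 monochrome sequence at 10 frames/sec.
WIDTH, HEIGHT = 176, 144   # pels per line, lines per frame
BITS_PER_PEL = 8           # assumption: 8-bit luminance samples
FRAME_RATE = 10            # frames per second

raw_bps = WIDTH * HEIGHT * BITS_PER_PEL * FRAME_RATE

# Channel budget for the telephone-network case.
MODEM_BPS = 28_800         # total channel rate quoted in the text
SPEECH_BPS = 4_000         # reserved for one-way speech

# Assumption: full duplex shares the channel equally between both directions.
per_direction_bps = MODEM_BPS // 2
video_bps = per_direction_bps - SPEECH_BPS

compression_ratio = raw_bps / video_bps

print(f"raw video rate : {raw_bps / 1e6:.2f} Mbps")
print(f"video budget   : {video_bps / 1e3:.1f} Kbps")
print(f"required ratio : about {compression_ratio:.0f}:1")
```

Under these assumptions the raw rate is slightly above 2 Mbps and the video budget slightly above 10 Kbps, which is consistent with the compression factor of roughly 200 stated in the text.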
In model-based coding, the encoded pictures are largely restricted to slowly moving human faces, but very little information is transmitted because only a few motion parameters of the model need to be sent. However, the technique is still in its infancy because analyzing and extracting these parameters from a moving sequence is difficult and complex. Besides, the moving sequence is limited to specific patterns: if an unexpected pattern appears in the picture, say a raised hand, the system may cost much more in both complexity and bit rate. From a commercial standpoint, most customers may not accept a "synthesized" countenance of a friend or relative on the screen.
The other branch is "object-oriented analysis-synthesis coding". In this approach, the contents of the picture are classified into background, model-compliance objects, and model-failure objects. A model-compliance object is coded by its motion and shape, while a model-failure object is coded by its color (including luminance and chrominance) and shape. Coding the background is unnecessary. The head and shoulders usually belong to the model-compliance objects, and facial details such as the eyes and mouth are model-failure objects. Compared with the model-based approach, this method is insensitive to the variety of picture patterns; however, its performance is not yet good enough to meet practical demands, and the complex analysis is a major burden. Much research has recently been devoted to this topic, and many modifications and adaptive schemes have been proposed.
Object-oriented coding is free of the block effect. It is also an efficient approach for coding video-conferencing scenes, because a simple head-and-shoulders figure occupies most of the picture. From the viewpoint of image analysis, the object-oriented approach carries more direct information than conventional waveform coding, and thus it can satisfy the requirements.
In this paper, a pseudo object-oriented coding system is proposed. It is based on pel-wise motion estimation and an arbitrarily shaped transform. Instead of a block-matching scheme, motion estimation is realized by a modified optical flow algorithm (MOFA). Because of the quasi-homogeneous property of the motion field, objects can be extracted by a simpler segmentation. An arbitrarily shaped transform (AST) is then applied to these objects. In brief, MOFA reduces temporal redundancy while AST reduces spatial redundancy. Since AST is applied to the motion field rather than the color field, the system is quite different from conventional coding systems.
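To illustrate why a quasi-homogeneous motion field permits simple segmentation, the following sketch groups pels into regions by growing 4-connected areas whose neighbouring motion vectors are nearly equal. It is not the segmentation used in the proposed system, which is not specified in this section. The function name `segment_motion_field`, the tolerance `tol`, and the toy motion field are all hypothetical and chosen only for illustration.

```python
from collections import deque


def segment_motion_field(field, tol=0.5):
    """Label 4-connected regions of similar motion (illustrative sketch only).

    field: 2-D list of (dx, dy) motion vectors, one per pel.
    tol:   largest per-component difference allowed between neighbouring pels
           of the same region (hypothetical parameter).
    Returns (labels, region_count), where labels has the same shape as field.
    """
    rows, cols = len(field), len(field[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # pel already belongs to a region
            # Start a new region and grow it breadth-first.
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                dx, dy = field[r][c]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1:
                        ndx, ndy = field[nr][nc]
                        if abs(ndx - dx) <= tol and abs(ndy - dy) <= tol:
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
            next_label += 1
    return labels, next_label


# Toy pel-wise motion field: a static background with one moving 2 x 3 patch.
field = [[(0.0, 0.0)] * 6 for _ in range(4)]
for r in (1, 2):
    for c in (2, 3, 4):
        field[r][c] = (2.0, 1.0)

labels, region_count = segment_motion_field(field)
for row in labels:
    print(row)
print("regions found:", region_count)
```

Because every pel inside an object carries nearly the same motion vector, a plain region-growing pass of this kind is enough to separate the moving patch from the background in the toy example; a real motion field would be noisier and would need a more careful similarity criterion.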