This work addresses the problem of representing an image sequence as a set of octrees. The purpose is to generate a flexible data structure to model video signals, for applications such as motion estimation, video coding and/or analysis. An image sequence can be represented as a 3-dimensional causal signal, which becomes a 3 dimensional array of data when the signal has been digitized. If it is desirable to track long-term spatio-temporal correlation, a series of octree structures may be embedded on this 3D array. Each octree looks at a subset of data in the spatio-temporal space. At the lowest level (leaves of the octree), adjacent pixels of neighboring frames are captured. A combination of these is represented at the parent level of each group of 8 children. This combination may result in a more compact representation of the information of these pixels (coding application) or in a local estimate of some feature of interest (e.g., velocity, classification, object boundary). This combination can be iterated bottom-up to get a hierarchical description of the image sequence characteristics. A coding strategy using such data structure involves the description of the octree shape using one bit per node except for leaves of the tree located at the lowest level, and the value (or parametric model) assigned to each one of these leaves. Experiments have been performed to represent Common Image Format (CIF) sequences.