Indexing, retrieving, and delivering the visual and spatio-temporal properties of video objects requires efficient data models and sound operations on those models. However, most object-based video data models address only a single aspect of those properties. In this paper, we present an efficient video object representation method that captures the visual, spatial, and temporal properties of objects in a video in the form of a unified abstract data type. The proposed data type is a polygon mesh, named the video object mesh, which is defined in a spatio-temporal domain. Depending on the needs of the application, the contour of an object is approximated by a polygon. Using the contour and color information of the object, content-based triangulation is performed, so that a video object in a frame is modeled as a two-dimensional polygon mesh. Color information is embedded at each vertex of the mesh for further use. By using motion analysis, the corresponding vertex in the adjacent frame is identified and connected to the vertex being analyzed. These processes continue until the video object disappears. The result is a three-dimensional polygon mesh that models both location-variant and location-invariant motion, which cannot be captured by traditional trajectory-based motion models. The proposed model is also useful for camera motion analysis, since the surface shape of a video object mesh carries partial information about camera motion.
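To make the structure of the data type concrete, the following is a minimal sketch of how a video object mesh might be stored. It is an illustration under stated assumptions, not the representation used in this paper: the names `Vertex`, `VideoObjectMesh`, `add_frame`, `link`, and `trajectory` are hypothetical, a simple fan triangulation stands in for the content-based triangulation, and the vertex correspondences between adjacent frames are assumed to be supplied by a separate motion analysis step.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    """A mesh vertex in the spatio-temporal domain (hypothetical layout)."""
    x: float
    y: float
    frame: int      # temporal coordinate
    color: tuple    # color information embedded at the vertex, e.g. (r, g, b)


@dataclass
class VideoObjectMesh:
    vertices: list = field(default_factory=list)
    triangles: list = field(default_factory=list)       # vertex-id triples within one frame
    temporal_links: dict = field(default_factory=dict)  # vertex id -> id in the next frame

    def add_frame(self, frame, contour, colors):
        """Add the polygonal contour of the object in one frame; return new vertex ids."""
        if len(contour) != len(colors):
            raise ValueError("one color is required per contour point")
        ids = []
        for (x, y), color in zip(contour, colors):
            self.vertices.append(Vertex(x, y, frame, color))
            ids.append(len(self.vertices) - 1)
        # Assumption: fan triangulation as a placeholder for content-based
        # triangulation. It is only valid for convex contours.
        for i in range(1, len(ids) - 1):
            self.triangles.append((ids[0], ids[i], ids[i + 1]))
        return ids

    def link(self, src, dst):
        """Connect a vertex to its corresponding vertex in the adjacent frame."""
        if self.vertices[dst].frame != self.vertices[src].frame + 1:
            raise ValueError("temporal links must join adjacent frames")
        self.temporal_links[src] = dst

    def trajectory(self, start):
        """Follow temporal links from a vertex until the object disappears."""
        path = []
        current = start
        while current is not None:
            v = self.vertices[current]
            path.append((v.frame, v.x, v.y))
            current = self.temporal_links.get(current)
        return path


# Usage: a red square that moves one unit to the right between two frames.
mesh = VideoObjectMesh()
red = (255, 0, 0)
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
f0 = mesh.add_frame(0, square, [red] * 4)
f1 = mesh.add_frame(1, [(x + 1, y) for x, y in square], [red] * 4)
for a, b in zip(f0, f1):
    mesh.link(a, b)
print(mesh.trajectory(f0[0]))  # [(0, 0, 0), (1, 1, 0)]
```

In this sketch the intra-frame triangles and the inter-frame links together form the three-dimensional mesh, and a per-vertex trajectory is recovered by walking the links. Because every vertex carries its own link, vertices of the same object can move differently from one another, which a single object-level trajectory would not record.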