Multiple target tracking in video is an important problem in many emerging applications. It is also a challenging one because of the coalescence phenomenon, in which the tracker associates more than one trajectory with some targets while losing track of others. Coalescence can cause the tracker to fail, especially when similar targets move close to one another or present partial or complete occlusions. Existing approaches are mainly based on a joint state space representation of the multiple targets being tracked and are therefore confronted by combinatorial complexity arising from the intrinsic high dimensionality of that representation. In this paper, we propose a novel distributed framework with linear complexity for this problem. The basic idea is a collaborative inference mechanism in which the estimate of each individual target state is determined not only by its own observation and dynamics but also through interaction and collaboration with the state estimates of other targets. This leads to a competition mechanism that enables different but spatially adjacent targets to compete for the common image observations. The theoretical foundation of the new approach is a well-designed Markov network whose structure can change with time. To perform inference on such a Markov network, we conduct a probabilistic variational analysis, which reveals a mean field approximation to the posterior density of each target and thereby provides a computationally efficient solution to this difficult inference problem. Compared with existing solutions, the proposed approach stands out for its linear computational cost and its excellent performance in dealing with the coalescence problem, as demonstrated in extensive experiments.
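
The collaborative inference idea can be illustrated with a minimal sketch. It is not the formulation used in this paper: it applies the generic mean-field update for a pairwise Markov network to a hypothetical one-dimensional grid of ten positions, and every name and number in it (`GRID`, `local_evidence`, `mean_field`, the likelihood values, the Gaussian dynamics prior, `penalty`, `radius`) is an assumption chosen for illustration. Two similar targets share one image likelihood with a strong peak at position 3 and a weaker peak at position 7. Each target's belief is repeatedly reweighted by how much probability mass the other target places nearby, so the two targets compete for the same observation. For simplicity the toy links every pair of targets rather than only spatially adjacent ones.

```python
import math

GRID = range(10)  # hypothetical 1-D image positions (assumption)


def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def local_evidence(center, likelihood, sigma=2.0):
    """Image likelihood times an assumed Gaussian dynamics prior around `center`."""
    return [likelihood[x] * math.exp(-(x - center) ** 2 / (2 * sigma ** 2))
            for x in GRID]


def mean_field(local, penalty=5.0, radius=1, sweeps=20):
    """Generic mean-field sweeps with a repulsive pairwise potential.

    log psi(x_i, x_j) = -penalty if |x_i - x_j| <= radius, else 0 (assumption).
    """
    beliefs = [normalize(l) for l in local]  # start from independent posteriors
    for _ in range(sweeps):
        for i in range(len(beliefs)):
            updated = []
            for x in GRID:
                # Expected overlap: mass the other targets place near x.
                overlap = sum(beliefs[j][y]
                              for j in range(len(beliefs)) if j != i
                              for y in GRID if abs(x - y) <= radius)
                updated.append(local[i][x] * math.exp(-penalty * overlap))
            beliefs[i] = normalize(updated)
    return beliefs


def argmax(belief):
    """Grid position with the highest belief."""
    return max(GRID, key=lambda x: belief[x])


# Shared likelihood for two similar-looking targets (illustrative values).
likelihood = [0.05] * 10
likelihood[3] = 1.0
likelihood[7] = 0.6

# Target A is predicted near position 3, target B near position 5.
local = [local_evidence(3, likelihood), local_evidence(5, likelihood)]

independent = [argmax(normalize(l)) for l in local]
collaborative = [argmax(q) for q in mean_field(local)]
print(independent, collaborative)
```

On this toy input the independent estimates both land on position 3, which mimics coalescence, while the mean-field estimates settle on positions 3 and 7.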