Most applications on mobile devices require self-localization of the device. Because GPS cannot be used in indoor environments, the position of a mobile device is estimated autonomously using an IMU. However, the IMUs on such devices have low accuracy, so self-localization in indoor environments remains challenging. Image-based self-localization methods have also been developed, and their accuracy is improving. This paper develops a self-localization method for indoor environments without GPS by integrating the sensors on a mobile device, such as the IMU and cameras. The proposed method consists of observation, forecasting, and filtering. The position and velocity of the mobile device are defined as the state vector. Observation corresponds to the data from the IMU and camera (observation vector), forecasting to the motion model of the mobile device (system model), and filtering to the tracking method based on inertial surveying, the coplanarity condition, and the inverse depth model (observation model). In the forecasting step, the position of the tracked device is predicted by the system model, which is assumed to be a linear motion model. In the filtering step, the predicted position is updated against the new observation data based on its likelihood. The optimization in the filtering step corresponds to maximum a posteriori (MAP) estimation. A particle filter is used to carry out the computation through the forecasting and filtering steps. The proposed method is applied to data acquired by mobile devices in indoor environments, and the experiments confirm the high performance of the method.
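The forecasting and filtering steps described above can be illustrated with a minimal particle filter sketch. This is not the paper's implementation. It assumes a 2D state (x, y, vx, vy), a constant-velocity system model driven by random acceleration, and a plain Gaussian likelihood on noisy position measurements that stands in for the observation model built from inertial surveying, the coplanarity condition, and the inverse depth model. All function names, noise levels, and the simulated trajectory are hypothetical, and the highest-weight particle is used only as a crude approximation of the MAP estimate.

```python
import numpy as np


def forecast(particles, dt, accel_noise, rng):
    """Forecasting step: propagate each particle with a linear
    (constant-velocity) motion model plus random acceleration."""
    accel = rng.normal(0.0, accel_noise, size=(particles.shape[0], 2))
    out = particles.copy()
    out[:, :2] += particles[:, 2:] * dt + 0.5 * accel * dt**2  # position
    out[:, 2:] += accel * dt                                   # velocity
    return out


def filter_step(particles, observation, obs_noise, rng):
    """Filtering step: weight particles by a Gaussian likelihood of the
    observed position (an assumed stand-in observation model), keep the
    highest-weight particle as a rough MAP-like estimate, then resample."""
    diff = particles[:, :2] - observation
    log_w = -0.5 * np.sum(diff**2, axis=1) / obs_noise**2
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    w /= w.sum()
    estimate = particles[np.argmax(w)].copy()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], estimate


def run_demo(steps=50, n_particles=2000, seed=0):
    """Track a simulated device moving at constant velocity."""
    rng = np.random.default_rng(seed)
    dt = 0.1
    truth = np.array([0.0, 0.0, 1.0, 0.5])  # hypothetical x, y, vx, vy
    particles = rng.normal(truth, [0.5, 0.5, 0.2, 0.2],
                           size=(n_particles, 4))
    estimate = truth.copy()
    for _ in range(steps):
        truth[:2] += truth[2:] * dt                     # true motion
        obs = truth[:2] + rng.normal(0.0, 0.1, size=2)  # noisy position
        particles = forecast(particles, dt, accel_noise=0.5, rng=rng)
        particles, estimate = filter_step(particles, obs,
                                          obs_noise=0.1, rng=rng)
    return truth, estimate


if __name__ == "__main__":
    truth, estimate = run_demo()
    print("true position:     ", truth[:2])
    print("estimated position:", estimate[:2])
```

In a real system the likelihood in `filter_step` would be replaced by one derived from the IMU and camera observations. The resample-every-step scheme shown here is the simplest choice and is used only to keep the sketch short.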