Today, mobile devices (smartphones, tablets, etc.) are widespread and of high importance for their users. Their performance as well as versatility increases over time. This leads to the opportunity to use such devices for more specific tasks like image processing in an industrial context. For the analysis of images requirements like image quality (blur, illumination, etc.) as well as a defined relative position of the object to be inspected are crucial. Since mobile devices are handheld and used in constantly changing environments the challenge is to fulfill these requirements. We present an approach to overcome the obstacles and stabilize the image capturing process such that image analysis becomes significantly improved on mobile devices. Therefore, image processing methods are combined with sensor fusion concepts. The approach consists of three main parts. First, pose estimation methods are used to guide a user moving the device to a defined position. Second, the sensors data and the pose information are combined for relative motion estimation. Finally, the image capturing process is automated. It is triggered depending on the alignment of the device and the object as well as the image quality that can be achieved under consideration of motion and environmental effects.