In this paper we present a strategy for handling multimodal signals from pen-based mobile devices in Human-Computer Interaction (HCI), focusing on the modalities of spoken and handwritten input. Each modality on its own is quite well understood, as the extensive literature demonstrates, although a number of challenges remain, such as improving recognition results. Among the potentials of multimodal HCI are improvements in recognition accuracy and robustness, as well as seamless human-machine communication based on fusing different modalities and exploiting the redundancies among them. However, such fusion of the two modalities still poses some problems. Open problems today include design approaches for fusion strategies; with the increasing number of mobile, pen-based computers, techniques for fusing handwriting and speech in particular appear to have great potential. Yet few publications to date address this potential. In this work we introduce a conceptual approach based on a model that describes a bimodal HCI process. We analyze four exemplary applications with respect to the structure of this model and highlight the open problems within these applications. Furthermore, we outline possible solutions to these challenges. Such a fusion model for HCI may simplify the development of seamless and intuitive user interfaces on pen-based mobile devices. For one of our application scenarios, a bimodal system for form data recording and recognition in medical or financial environments, we present first experimental results.