The effort required to create accurate ground-truth segmentation maps hinders advances in machine learning approaches to tumor delineation in clinical positron emission tomography (PET) scans. To address this challenge, we propose a fully convolutional network (FCN) model that automatically delineates tumor volumes from PET scans while relying only on weak annotations in the form of bounding boxes (without delineations) around tumor lesions. To achieve this, we propose a novel loss function that dynamically combines a supervised component, designed to leverage the training bounding boxes, with an unsupervised component inspired by the Mumford-Shah piecewise-constant level-set image segmentation model. The model is trained end-to-end with the proposed differentiable loss function and is validated on a public clinical PET dataset of head and neck tumors. Using only bounding box annotations as supervision, the model achieves results competitive with state-of-the-art supervised and semiautomatic segmentation approaches. Compared to a model trained with only bounding boxes (weak supervision), our proposed approach improves the Dice similarity by approximately 30% and reduces the unsigned distance error by approximately 7 mm. Moreover, after a post-processing step (morphological operations), the Dice similarity of our weakly supervised approach differs by only 7% from that of the fully supervised model on the segmentation task.
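The following is a minimal NumPy sketch of a loss of this kind, intended only to illustrate how a bounding-box term and a Mumford-Shah-style piecewise-constant term might be combined. It is not the formulation used in this work: the soft Dice form of the supervised term, the function names, and the fixed scalar weight `alpha` are all assumptions made for illustration, and the dynamic weighting scheme is not reproduced here. The sketch only evaluates loss values; for training, the same expressions would be written in an automatic differentiation framework.

```python
import numpy as np


def mumford_shah_term(image, prob, eps=1e-8):
    """Piecewise-constant region energy for a soft foreground mask `prob`.

    Assumed form (Chan-Vese style): each pixel is penalized by its squared
    distance to the mean intensity of the region it is assigned to.
    """
    c_in = (image * prob).sum() / (prob.sum() + eps)                # foreground mean
    c_out = (image * (1.0 - prob)).sum() / ((1.0 - prob).sum() + eps)  # background mean
    energy = prob * (image - c_in) ** 2 + (1.0 - prob) * (image - c_out) ** 2
    return float(energy.mean())


def box_term(prob, box_mask, eps=1e-8):
    """Soft Dice loss against the filled bounding box (assumed supervised term)."""
    intersection = (prob * box_mask).sum()
    return float(1.0 - 2.0 * intersection / (prob.sum() + box_mask.sum() + eps))


def combined_loss(image, prob, box_mask, alpha):
    """Weighted sum of both terms; `alpha` is a hypothetical fixed weight."""
    return alpha * box_term(prob, box_mask) + (1.0 - alpha) * mumford_shah_term(image, prob)


# Toy example: a bright 2x2 "lesion" inside a looser 4x4 bounding box.
image = np.zeros((8, 8))
image[3:5, 3:5] = 1.0
box_mask = np.zeros((8, 8))
box_mask[2:6, 2:6] = 1.0

tight_pred = image.copy()    # prediction that follows the lesion boundary
box_pred = box_mask.copy()   # prediction that simply fills the box

print(mumford_shah_term(image, tight_pred), mumford_shah_term(image, box_pred))
print(box_term(tight_pred, box_mask), box_term(box_pred, box_mask))
```

In this toy case the box term alone is minimized by filling the whole box, whereas the region term favors the prediction that follows the intensity boundary, which is the intuition behind combining the two.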