Content-based retrieval from video databases has spurred tremendous interest and produced a variety of approaches for modeling and querying video data. In this paper, we propose the use of Query Petri-nets (QPNs) for specifying multi-object spatio-temporal video queries. We elaborate on the expressive capabilities of QPNs, which render them suitable for specifying complex, inexact, and underspecified queries. We show that using QPNs makes it feasible to mix semantic annotation-based queries with image and video features in a unified and seamlessly integrated manner. We present our prototype implementation of a QPN-based visual interface and elaborate on its capabilities.