Computer vision and video analysis have gained a strong foothold in areas such as Intelligent Transportation Systems (ITS), advanced manufacturing and cartography. Building and validating accurate and reliable computer vision-based solutions requires recording and labelling vast volumes of video that cover the wide variety of scenarios these applications encounter. Although tools designed to build large video datasets already exist, there remains a severe lack of tools that effectively exploit this huge amount of video data. Furthermore, video annotation is still done mostly manually, with only a minimal set of tools available to aid the task. Even though significant work has been carried out to build specialised interfaces tailored to video annotation, there is still a high dependency on human annotators to manually annotate objects and events in a video scene.