Milestone Systems has introduced a new generative-AI plug-in for XProtect designed to help organisations interpret large volumes of video faster and more accurately. As video systems produce ever-growing amounts of footage, operators often face the challenge of manually reviewing clips, validating alarms and compiling reports. The new plug-in is intended to automate much of that work, reducing response times and easing operator fatigue.
The plug-in uses AI to automatically create structured incident summaries from selected video clips, helping teams cut down the time spent preparing documentation. It also offers automated event validation, analysing motion events and confirming whether an alert warrants attention. By reducing false positives, the system aims to streamline monitoring workflows and improve operational efficiency. Bookmarked footage is also summarised using natural-language descriptions, allowing operators to skim context quickly without watching each clip in full.
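Milestone has not published the plug-in's internals, so purely as an illustrative sketch, the validation-and-summary flow described above might look like the following. Every name here is hypothetical: `ask_vlm` stands in for whatever model call the plug-in actually makes, and the event fields are assumptions.

```python
from dataclasses import dataclass


@dataclass
class MotionEvent:
    # Hypothetical event shape; the real plug-in works with
    # XProtect's own event and bookmark objects.
    camera_id: str
    clip_url: str


def ask_vlm(clip_url: str) -> dict:
    # Stand-in for the plug-in's vision-language-model call: a real
    # deployment would send the clip to the model and receive a
    # structured judgement plus a natural-language summary back.
    return {"relevant": True, "summary": "Person enters loading bay."}


def validate_event(event: MotionEvent) -> tuple[bool, str]:
    """Return (should_alert, summary) for a motion event.

    Suppressing alerts the model judges irrelevant (e.g. foliage,
    headlights) is how a system like this would cut false positives.
    """
    verdict = ask_vlm(event.clip_url)
    return verdict["relevant"], verdict["summary"]
```

The summary string doubles as the skimmable natural-language description mentioned above, so an operator can triage the event without watching the clip.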
The integration works directly with the XProtect rule engine and can be deployed on-premises or in the cloud, supporting a range of compliance and deployment requirements.
Milestone built the solution on its Hafnia Vision Language Model, trained on 75,000 hours of ethically sourced video from Europe and the United States. It uses NVIDIA Cosmos Curator for data preparation and the NVIDIA Cosmos Reason VLM for inference, forming the foundation of what the company describes as one of the industry’s most advanced and compliance-focused video AI platforms.
Thomas Jensen, CEO of Milestone Systems, said the new capabilities will help cities and organisations working with traffic and surveillance systems to achieve greater efficiency and insight. He noted that partners will also be able to extend the platform with their own solutions now that AI capabilities are embedded within XProtect.
Early adopters, including the cities of Genoa in Italy and Dubuque in Iowa, are preparing to deploy the system to enhance traffic management.
Milestone is also launching a Vision Language Model as a Service, accessible through APIs. This will allow developers and integrators to build their own generative-AI video solutions independently of the underlying video management platform, broadening the potential applications across the wider surveillance ecosystem.
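Milestone has not yet published the service's API surface, so the endpoint, field names and authentication scheme below are all assumptions, not the real interface. A minimal client for an API of this kind might build its request like this:

```python
import json
import urllib.request

# Hypothetical endpoint; the real service URL has not been announced.
API_URL = "https://vlm.example.com/v1/summarize"


def build_summary_request(clip_url: str, api_key: str) -> urllib.request.Request:
    """Build a (hypothetical) clip-summarisation request.

    Assumes a JSON body with a clip reference and a bearer-token
    auth header; the actual service may differ in every detail.
    """
    payload = json.dumps({"clip_url": clip_url, "task": "summarize"}).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

Keeping the request construction separate from the network call makes a client like this easy to test offline, which matters when the upstream API is still evolving.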
