Milestone Systems releases VLM

Milestone Systems has released a new vision language model (VLM) designed to help security and traffic operators extract usable insight from large volumes of video more efficiently, as video surveillance environments continue to scale in size and complexity.

The VLM, which specialises in traffic understanding and is powered by NVIDIA Cosmos Reason, underpins two new offerings: a Video Summarization tool for Milestone’s XProtect Video Management Software, and a VLM as a Service (VLMaaS) aimed at developers and third-party integrators.

Addressing video review fatigue

Video surveillance systems capture vast amounts of footage, but reviewing and interpreting that data remains a largely manual and time-consuming process. Milestone said its new Video Summarization tool is designed to reduce this burden by automatically analysing footage and generating structured text summaries that describe what is happening in a scene.

Delivered as a generative AI-powered plug-in for the XProtect Smart Client, the tool allows operators to submit a short video clip along with a simple prompt. The system then produces a written summary within seconds, enabling faster review and investigation. Early testing cited by Milestone suggests the approach could reduce false alarm fatigue among operators by up to 30 per cent.

Rather than searching footage by timestamps or relying on manual tagging, users can search directly through summaries based on video content. Summaries can be bookmarked, filtered and reviewed within the existing XProtect workflow, and can also be triggered automatically through existing event rules and alarms.
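To make the idea of content-based search concrete, the following is a minimal Python sketch of filtering text summaries by keyword rather than by timestamp. It is not Milestone's implementation: the `Summary` record, its fields, the `search_summaries` helper and the sample entries are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Hypothetical record for one generated summary (not an XProtect type)."""
    camera: str
    text: str
    bookmarked: bool = False


def search_summaries(summaries, query, bookmarked_only=False):
    """Return summaries whose text contains every word of the query."""
    terms = query.lower().split()
    return [
        s for s in summaries
        if all(term in s.text.lower() for term in terms)
        and (s.bookmarked or not bookmarked_only)
    ]


# Invented sample data, for illustration only.
summaries = [
    Summary("Junction A", "A white van stops in the bus lane for two minutes.", True),
    Summary("Junction B", "Light traffic, no incidents observed."),
    Summary("Junction A", "A cyclist crosses against a red signal near the bus stop."),
]

hits = search_summaries(summaries, "bus lane")
for hit in hits:
    print(hit.camera, "-", hit.text)
```

A production system would more likely use full-text or semantic search, but the principle is the same: the query runs against a description of what happened, not against when it was recorded.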

Milestone said this allows operators to focus on relevant incidents while filtering out non-actionable motion or environmental noise. The Video Summarization tool is free to download and install, with usage charged only when prompts are sent to the model. Region-specific, sovereign VLMs are initially available for the US and EU, with additional regions planned.

Extending video intelligence beyond VMS

Alongside the XProtect plug-in, Milestone has introduced Hafnia VLM as a Service, providing API access to the same underlying video intelligence for developers, integrators and technology partners.

The service is intended to simplify the integration of advanced video understanding into third-party applications without requiring organisations to build, train or manage their own AI models. Milestone said developers can use the API to add generative video intelligence to existing platforms, whether for proof-of-concept projects or full-scale deployments.

According to the company, VLMaaS can cut development effort by a factor of up to 70 compared with fine-tuning a vision language model independently. The service is API-first, delivered over HTTPS, and accepts prompt-based instructions for traffic-related use cases. Models are currently fine-tuned for US and EU environments, with further regional support planned.
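As a rough illustration of what an API-first, prompt-based service delivered over HTTPS implies for an integrator, the Python sketch below builds (but does not send) a request carrying a short clip and a text prompt. Every specific here is an assumption: the article does not document the actual API, so the URL is a placeholder on a reserved domain, and the JSON field names, the bearer-token header and the `build_summary_request` helper are hypothetical.

```python
import base64
import json
import urllib.request

# Placeholder URL on a reserved domain; NOT a real Milestone or Hafnia endpoint.
API_URL = "https://vlm.example.invalid/v1/summaries"


def build_summary_request(clip_bytes, prompt, api_key):
    """Build an HTTPS POST request with a clip and a prompt (hypothetical schema)."""
    payload = {
        "prompt": prompt,  # assumed field name
        "video_base64": base64.b64encode(clip_bytes).decode("ascii"),  # assumed field name
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


# Two dummy bytes stand in for real video data.
request = build_summary_request(
    b"\x00\x01",
    "Describe any traffic incidents in this clip.",
    "demo-key",
)
print(request.get_method(), request.full_url)
```

The request is only constructed here; an integrator would pass it to `urllib.request.urlopen` (or an equivalent HTTP client) against the real endpoint and schema described in the vendor's documentation.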

Milestone emphasised that the model is trained using responsibly sourced data with auditable lineage and is designed to comply with GDPR and the EU AI Act. Pricing follows a pay-per-use model based on API calls, avoiding large upfront investments or custom training costs.

Implications for the surveillance market

The release reflects a broader shift in the CCTV and video management market toward AI tools that reduce operator workload rather than simply adding more analytics outputs. As surveillance deployments grow and traffic and city monitoring use cases expand, tools that summarise, prioritise and contextualise events are increasingly seen as necessary to maintain operational effectiveness.

For buyers, the development highlights how generative AI is moving from experimental analytics into day-to-day VMS workflows and developer platforms, with an emphasis on usability, governance and integration rather than raw detection capability alone.
