Cinematographic cut analysis tool
A cinematographic cut analysis tool built on computer vision and machine learning

The first cuts of a film are difficult territory for anyone other than the editor to inhabit. The material arrives raw, timings change from one session to the next, and the editing-software timeline offers very little surface for thinking. Handwritten notes get lost, verbal comments are forgotten, and the conversation between the creative team and the director stays fragmented. Editing Analytics starts from a concrete premise arising in that workspace: the need for a visual board, something like a digital post-it wall, for noting, classifying, and analyzing what happens within a cut while it is still taking shape.
The second premise is a research one: robust models exist for automatically detecting cuts in a film, but no consolidated tools exist for detecting sequences. Defining what a sequence is, and recognizing it automatically in an edit, is an open problem, and this prototype is a systematic attempt to address it.

The system operates with a two-layer architecture. In the frontend, an application organizes visual material in a grid and offers modules for annotation, cut comparison, statistical analysis, and temporal visualization of editing rhythm. In the backend, a unified Python server coordinates four different models depending on the type of analysis required. PySceneDetect acts as the base layer for shot detection, exposing three configurable detectors.
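The core idea behind threshold-based shot detection can be illustrated with a toy example. The sketch below is not PySceneDetect's actual implementation (its detectors work on HSL statistics, adaptive windows, and luma thresholds); it is a minimal stand-in that flags a cut wherever the mean pixel difference between consecutive frames spikes:

```python
import numpy as np

def detect_cuts(frames, threshold=30.0):
    """Flag a cut at frame i when the mean absolute pixel difference
    between frame i and frame i-1 exceeds the threshold (toy detector)."""
    cuts = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) - frames[i - 1].astype(float)).mean()
        if diff > threshold:
            cuts.append(i)
    return cuts

# Synthetic clip: 10 dark frames followed by 10 bright frames (one hard cut).
dark = [np.zeros((4, 4), dtype=np.uint8)] * 10
bright = [np.full((4, 4), 200, dtype=np.uint8)] * 10
print(detect_cuts(dark + bright))  # [10]
```

The configurable `threshold` plays the same role as the sensitivity parameters exposed by the real detectors: lower values catch soft transitions at the cost of false positives on fast camera movement.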
For the semantic layer, the system integrates YOLOv12, RT-DETR, and several vision-language models (VLMs). Identification of visual similarity between fragments combines two different representation architectures: DINOv2, a self-supervised image encoder, and CLIP, which generates joint image and text representations in the same vector space. A chain of clustering algorithms (HDBSCAN, K-Means, DBSCAN, Agglomerative Clustering, and Gaussian Mixture Models, with UMAP for dimensionality reduction) operates on these embeddings to detect groups of visually coherent shots. These clusters are the computational attempt to identify what an editor recognizes as a sequence.
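The embedding-then-cluster step can be sketched in a few lines. This is an assumption-laden miniature: synthetic vectors stand in for per-shot DINOv2 or CLIP embeddings, and K-Means (one of the algorithms in the chain) stands in for the full HDBSCAN/UMAP pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
# Stand-in for per-shot embeddings: two synthetic "sequences" of shots
# drawn around two distinct centroids in a 128-dimensional space.
seq_a = rng.normal(loc=0.0, scale=0.05, size=(6, 128))
seq_b = rng.normal(loc=1.0, scale=0.05, size=(6, 128))
embeddings = normalize(np.vstack([seq_a, seq_b]))  # unit vectors, cosine-friendly

# Cluster labels group visually coherent shots together.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print(labels)  # shots 0-5 share one label, shots 6-11 the other
```

In the real pipeline a density-based algorithm such as HDBSCAN is preferable precisely because the number of sequences is unknown in advance, whereas K-Means requires fixing it up front.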

Montage theory distinguishes between very different units of time: the shot as the minimum unit, the scene as a unit of place and action, and the sequence as a unit of narrative meaning. Walter Murch, in his reflections on the craft, noted that editing is the construction of rhythm, and that the rhythm of a film lives not in any individual shot but in the relationships between shots. André Bazin maintained that montage creates a reality that does not exist in the original material, while Eisenstein theorized the clash between shots as the engine of meaning itself. All these approaches share a difficulty: the sequence is a unit that exists only in the reading, not in the signal. That is exactly what makes its automatic detection so complicated. Vision models can measure differences in pixels, semantic vectors, or the presence of objects, but none of these metrics directly captures the narrative intention that binds a set of shots together. Editing Analytics approaches this impossibility by converting it into a research surface and proposing visualizations that let the creative team reason about it.

The prototype is used in active editorial work within Artefacto. During first-cut sessions, the system processes the video file exported from the editing room, extracts representative frames per shot, generates visual embeddings, and presents them grouped on the board. The team can add annotations per shot, compare versions of the same cut side by side, and visualize the temporal rhythm as a density curve throughout the film. The statistical analysis module calculates average shot duration, duration distribution, the temporal evolution of rhythm, and object-detection patterns.
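The shot statistics mentioned above reduce to simple arithmetic over the list of cut timestamps. A minimal sketch, with a hypothetical cut list (the function name and dictionary keys are illustrative, not the tool's actual API):

```python
import numpy as np

def shot_stats(boundaries_s):
    """Summary statistics from a sorted list of cut timestamps in seconds,
    where the first entry is the film's start and the last its end."""
    durations = np.diff(boundaries_s)  # per-shot durations
    return {
        "average_shot_s": float(durations.mean()),
        "median_shot_s": float(np.median(durations)),
        "shots_per_minute": 60.0 / float(durations.mean()),
    }

# Hypothetical 30-second excerpt with five shots.
stats = shot_stats([0.0, 4.0, 6.0, 13.0, 21.0, 30.0])
print(stats["average_shot_s"])  # 6.0
```

The density curve is the same data viewed differently: a histogram of cut positions over time, where peaks mark fast-cut passages and valleys mark long takes.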
The goal of the project is to have proprietary cut analysis tools—tools that serve to read what already exists within a film before deciding what to change.