Tech & Research

text2edit

In-browser transcription editor for documentary editing workflows

During the research for projects such as 16 Soles by Marta Ferrer and Artefactos de Guerra by Jorge Caballero, Artefacto ran into a specific problem in the editing room: moving from a transcribed interview to an editing structure quickly, in a controlled way, and without relying on proprietary software. text2edit was built to meet that need.

text2edit is a documentary editing tool that runs directly in the browser. It turns transcriptions produced by speech recognition models into editing decisions that can be exported to professional editing systems. The workflow starts with Whisper, OpenAI’s speech recognition model, which is run beforehand in Google Colab to produce a timestamped text file. That JSON or TXT file is then loaded into text2edit to begin the editorial process.
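As an illustration of the loading step, the sketch below reads a Whisper JSON transcript into a list of timestamped segments. It assumes the layout written by the open-source `openai-whisper` package (a top-level `segments` list whose items carry `start`, `end` and `text`, with times in seconds) and handles only the JSON case. text2edit's own loader, and its handling of the TXT variant, may differ; `Segment` and `load_whisper_json` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped span of speech (hypothetical model, not text2edit's)."""
    start: float  # seconds from the start of the source media
    end: float    # seconds from the start of the source media
    text: str


def load_whisper_json(raw: str) -> list[Segment]:
    """Parse Whisper JSON output into segments.

    Assumes a top-level "segments" list whose items have "start", "end"
    and "text" keys, as in openai-whisper's transcribe() result.
    """
    data = json.loads(raw)
    segments = []
    for item in data.get("segments", []):
        text = item["text"].strip()  # Whisper segment text usually starts with a space
        if not text:
            continue  # drop empty segments
        segments.append(Segment(float(item["start"]), float(item["end"]), text))
    # Keep source order even if the file was edited by hand.
    segments.sort(key=lambda s: s.start)
    return segments


sample = '''{"text": " Hola. Bon dia.", "language": "ca",
 "segments": [
  {"id": 0, "start": 0.0, "end": 1.2, "text": " Hola."},
  {"id": 1, "start": 1.2, "end": 2.8, "text": " Bon dia."},
  {"id": 2, "start": 2.8, "end": 3.0, "text": " "}
 ]}'''

for seg in load_whisper_json(sample):
    print(f"{seg.start:6.2f} -> {seg.end:6.2f}  {seg.text}")
```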

text2edit provides a text-based editing interface linked to the original video or audio file. Once the timestamped transcription is loaded, the tool shows two synchronized playback panels: one for the source material and one for the timeline being built. The editor selects text fragments, adds them to the timeline, and reorders or trims them on a visual canvas.
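The core of text-based editing can be modelled as a list of clips, each pointing back into the source by timestamps. This minimal sketch, using hypothetical names (`Clip`, `trim`, `record_positions`) and made-up sample text, shows fragments being selected, reordered and trimmed, then laid end to end to obtain their positions on the timeline. It is an assumption about how such a model could look, not text2edit's internal data structure.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """A fragment of source media referenced by timestamps (hypothetical model)."""
    src_in: float   # seconds into the source file
    src_out: float  # seconds into the source file
    text: str       # transcript text for this fragment

    @property
    def duration(self) -> float:
        return self.src_out - self.src_in


def trim(clip: Clip, head: float = 0.0, tail: float = 0.0) -> Clip:
    """Return a copy shortened by `head` seconds at the start and `tail` at the end."""
    new_in, new_out = clip.src_in + head, clip.src_out - tail
    if new_out <= new_in:
        raise ValueError("trim would leave an empty clip")
    return Clip(new_in, new_out, clip.text)


def record_positions(timeline: list[Clip]) -> list[tuple[float, float]]:
    """Place clips end to end and return (record_in, record_out) for each."""
    positions, cursor = [], 0.0
    for clip in timeline:
        positions.append((cursor, cursor + clip.duration))
        cursor += clip.duration
    return positions


# Made-up transcript fragments, e.g. as parsed from a Whisper file.
transcript = [
    Clip(0.0, 4.0, "Bon dia."),
    Clip(4.0, 9.5, "Vaig començar a treballar al port."),
    Clip(9.5, 12.0, "Això ho recordo bé."),
]

# Select fragment 2, then fragment 0; trim half a second off the head of the first.
timeline = [trim(transcript[2], head=0.5), transcript[0]]
for clip, (rec_in, rec_out) in zip(timeline, record_positions(timeline)):
    print(f"rec {rec_in:5.2f}-{rec_out:5.2f}  src {clip.src_in:5.2f}-{clip.src_out:5.2f}  {clip.text}")
```

Because every clip keeps its own source in and out points, reordering or trimming never touches the source material; only the record-side positions are recomputed.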

An LLM integration adds a spelling and grammar correction pass aimed specifically at Catalan, a language that widely used industry editing software such as Adobe Premiere Pro handles with errors or does not support at all. This makes text2edit useful for projects in minority languages, where textual accuracy directly affects the final subtitles and the editing decisions themselves.
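One way to run such a correction pass without breaking the link between text and media is to correct each timestamped segment separately and copy its timings unchanged. The sketch below assumes that design; the source does not say how text2edit structures its LLM requests. `corrector` is a hypothetical hook standing in for the LLM call, and the toy dictionary corrector exists only so the example runs.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def correct_segments(
    segments: list[Segment],
    corrector: Callable[[str], str],
) -> list[Segment]:
    """Apply a text corrector to each segment, leaving timestamps untouched.

    `corrector` is a hypothetical stand-in for the LLM spelling/grammar call.
    """
    corrected = []
    for seg in segments:
        new_text = corrector(seg.text).strip()
        # If the model returns nothing, keep the original text rather than lose it.
        corrected.append(replace(seg, text=new_text or seg.text))
    return corrected


# Toy corrector standing in for the LLM: fixes two missing Catalan accents.
FIXES = {"perque": "perquè", "tambe": "també"}


def toy_corrector(text: str) -> str:
    return " ".join(FIXES.get(word, word) for word in text.split())


segs = [Segment(0.0, 2.1, "ho vaig fer perque volia"), Segment(2.1, 3.0, "tambe")]
for seg in correct_segments(segs, toy_corrector):
    print(seg)
```

Correcting segment by segment keeps the timestamps valid by construction, at the cost of giving the model less context per request; sending longer passages would need an extra step to map the corrected text back onto the original segments.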

The system exports to multiple formats: FCP XML for Final Cut Pro, CMX 3600 EDL for professional editing systems, SRT for subtitling, and TXT for documentation.
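To make two of these formats concrete, the sketch below writes a CMX 3600 EDL and an SRT file from the same list of clips. It is a simplified illustration, not text2edit's exporter: it assumes a 25 fps non-drop-frame timeline starting at 00:00:00:00, a single placeholder reel name (`AX`) for every event, and hypothetical helper names. Real EDLs often add per-event comment lines such as source clip names, and many productions start the record timeline at 01:00:00:00.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    src_in: float   # seconds into the source file
    src_out: float  # seconds into the source file
    text: str       # transcript text, reused as the subtitle


def to_frames(seconds: float, fps: int) -> int:
    return round(seconds * fps)


def frames_to_tc(frames: int, fps: int) -> str:
    """Frame count -> non-drop-frame timecode HH:MM:SS:FF."""
    secs, ff = divmod(frames, fps)
    return f"{secs // 3600:02d}:{secs // 60 % 60:02d}:{secs % 60:02d}:{ff:02d}"


def to_srt_time(seconds: float) -> str:
    """Seconds -> SRT timestamp HH:MM:SS,mmm."""
    secs, ms = divmod(round(seconds * 1000), 1000)
    return f"{secs // 3600:02d}:{secs // 60 % 60:02d}:{secs % 60:02d},{ms:03d}"


def layout(clips: list[Clip], fps: int) -> list[tuple[int, int, int, int]]:
    """Quantize to whole frames and place clips end to end on the record side."""
    events, rec = [], 0
    for clip in clips:
        src_in, src_out = to_frames(clip.src_in, fps), to_frames(clip.src_out, fps)
        events.append((src_in, src_out, rec, rec + (src_out - src_in)))
        rec += src_out - src_in
    return events


def export_edl(clips: list[Clip], title: str, reel: str = "AX", fps: int = 25) -> str:
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    for n, (s_in, s_out, r_in, r_out) in enumerate(layout(clips, fps), start=1):
        tcs = " ".join(frames_to_tc(f, fps) for f in (s_in, s_out, r_in, r_out))
        # Event number, reel, track (AA/V = audio 1+2 and video), cut, then
        # source in/out and record in/out timecodes.
        lines.append(f"{n:03d}  {reel:<8} AA/V  C        {tcs}")
    return "\n".join(lines) + "\n"


def export_srt(clips: list[Clip], fps: int = 25) -> str:
    blocks = []
    events = layout(clips, fps)
    for n, (clip, (_, _, r_in, r_out)) in enumerate(zip(clips, events), start=1):
        timing = f"{to_srt_time(r_in / fps)} --> {to_srt_time(r_out / fps)}"
        blocks.append(f"{n}\n{timing}\n{clip.text}\n")
    return "\n".join(blocks)  # a blank line separates SRT cues


clips = [Clip(10.0, 12.0, "Això ho recordo bé."), Clip(0.0, 4.0, "Bon dia.")]
print(export_edl(clips, "text2edit sequence"))
print(export_srt(clips))
```

Quantizing to whole frames before computing record positions keeps each event's record duration identical to its source duration, so the EDL and the subtitles stay in step.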

Text-to-edit, understood as a method, proposes that the transcribed spoken word can serve as the editing unit before the editor touches the video. This shifts the axis of editorial decision-making from the frame to language. In documentary projects with a large volume of interviews, the shift affects how time is structured: who speaks, for how long, and what is left out.

The tool is active and open source under the MIT license, and it belongs to the set of technical resources Artefacto develops to make documentary research sustainable in languages and contexts that dominant platforms tend to overlook.

Code and documentation: https://github.com/jcaballeroramos/text2edit