An image becomes a three-dimensional space
An image becomes a three-dimensional space and responds to hand movements in real time.

Depth Cloud is an audiovisual experimentation prototype developed in the Artefacto Films laboratory that transforms still images into interactive three-dimensional point clouds. The project began as an aesthetic exploration within a broader line of research on spatial perception in cinema: what happens when a photograph or a film frame acquires depth, volume, and the capacity for physical response? The image stops being a flat surface and becomes navigable matter, a structure the body can manipulate with its hands in front of a screen.

The system works in three linked stages. In the first, the user uploads an image and Google’s Gemini multimodal language model generates a depth map—a grayscale representation where each pixel value encodes the estimated distance to the camera plane.
In the second stage, a conversion process transforms that map into a three-dimensional point cloud rendered in real time using WebGL shaders within a React component.
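A minimal sketch of this depth-to-geometry step, assuming the depth map arrives as a flat grayscale array of 0–255 values in row-major order (the function and parameter names are illustrative, not the project's actual code):

```typescript
interface Point3D {
  x: number;
  y: number;
  z: number;
}

// Turn a grayscale depth map into a point cloud: one point per pixel,
// with the pixel's brightness extruded into the z axis.
function depthMapToPointCloud(
  depth: Uint8ClampedArray, // one grayscale value per pixel
  width: number,
  height: number,
  depthScale = 1.0          // how far bright pixels extrude toward the viewer
): Point3D[] {
  const points: Point3D[] = [];
  for (let row = 0; row < height; row++) {
    for (let col = 0; col < width; col++) {
      const d = depth[row * width + col] / 255; // normalize to [0, 1]
      points.push({
        // center the cloud around the origin, in [-1, 1] on each axis
        x: (col / (width - 1)) * 2 - 1,
        y: 1 - (row / (height - 1)) * 2, // flip: image rows grow downward
        z: d * depthScale,               // brighter = closer to the camera
      });
    }
  }
  return points;
}
```

In a WebGL pipeline the same arithmetic would typically run per-vertex in a shader, with the depth map sampled as a texture; the CPU version above just makes the mapping explicit.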
In the third, MediaPipe Hands analyzes the user's camera feed and translates hand gestures into spatial transformations of the cloud: rotation, scaling, and explosive particle dispersion.
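One way this gesture mapping can work is to measure the distance between two fingertips and map it onto a transform parameter. The sketch below uses MediaPipe's 21-point hand model, in which landmark 4 is the thumb tip and landmark 8 the index fingertip; the mapping constants and function names are assumptions for illustration, not the project's implementation:

```typescript
interface Landmark {
  x: number; // MediaPipe returns normalized coordinates in [0, 1]
  y: number;
}

// Distance between thumb tip and index fingertip: a "pinch" gesture.
function pinchDistance(landmarks: Landmark[]): number {
  const thumb = landmarks[4];
  const index = landmarks[8];
  return Math.hypot(thumb.x - index.x, thumb.y - index.y);
}

// Map the pinch distance onto a clamped scale factor for the cloud.
// 0.4 is an illustrative threshold for a fully open pinch.
function pinchToScale(dist: number, minScale = 0.5, maxScale = 3.0): number {
  const t = Math.min(Math.max(dist / 0.4, 0), 1);
  return minScale + t * (maxScale - minScale);
}
```

Rotation and dispersion can follow the same pattern: read a scalar off the landmarks per frame (wrist angle, hand openness) and feed it into the shader as a uniform.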
The system also operates in an alternative mode where Gemini Nano generates voxelized scenes from the same input image, producing a discrete volumetric representation of the photographic space.
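The discrete volumetric representation of the voxel mode can be pictured as a quantization of the point cloud: snap each point to a cell in a fixed grid and keep the set of occupied cells. The sketch below is an assumption about how such a step could look, not the project's actual pipeline (which generates voxel scenes with Gemini Nano); the grid resolution and key format are illustrative:

```typescript
type VoxelKey = string; // "ix,iy,iz"

// Quantize points in [-1, 1]^3 into a discrete occupancy grid.
function voxelize(
  points: { x: number; y: number; z: number }[],
  resolution = 32 // cells per axis
): Set<VoxelKey> {
  const toIndex = (v: number) =>
    Math.min(Math.floor(((v + 1) / 2) * resolution), resolution - 1);
  const cells = new Set<VoxelKey>();
  for (const p of points) {
    cells.add(`${toIndex(p.x)},${toIndex(p.y)},${toIndex(p.z)}`);
  }
  return cells;
}
```

Nearby points collapse into the same cell, which is what gives the voxel mode its blocky, discrete reading of the photographic space.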

For more than a century, cinema organized the gaze from a fixed point: the immobile spectator in front of a flat projection. Depth Cloud puts that convention under direct tension. By converting the frame into a point cloud manipulable with the body, the project proposes that the cinematographic image holds a latent spatiality that computer vision models can make visible. At the same time, it invites reflection on the nature of depth generated by artificial intelligence: the depth estimation produced by Gemini is a statistical inference, an interpretation of space that the model builds from patterns learned across millions of images.

The prototype makes it possible to explore how a single tool can serve simultaneously as an instrument for film archive analysis, a device for interactive installations, and a platform for aesthetic experimentation with point clouds.
Application: https://depthcloud-hand-control-231178493219.us-west1.run.app/