Vector sketch animation generation
with differentiable motion trajectories

X. Zhu¹, X. Yang¹, S. Zheng¹, Z. Zhang², F. Gao¹, J. Huang³, J. Chen¹
Our method converts an object video (top) into a sketch animation using 2D vector graphics (bottom).
We propose a differentiable motion trajectory with a Bernstein basis (cross-frame curves, middle) to represent stroke control point movement across frames.

Overview

Abstract

Sketching is a direct and inexpensive means of visual expression. Although image-based sketching has been well studied, video-based sketch animation generation remains very challenging because of the temporal coherence requirement. In this paper, we propose a novel end-to-end automatic generation approach for vector sketch animation. To solve the flickering issue, we introduce a Differentiable Motion Trajectory (DMT) representation that describes the frame-wise movement of stroke control points using differentiable polynomial-based trajectories. DMT enables global semantic gradient propagation across multiple frames, significantly improving semantic consistency and temporal coherence and producing high-framerate output. DMT employs a Bernstein basis to balance the sensitivity of the polynomial parameters, thus achieving more stable optimization. Instead of implicit fields, we introduce sparse track points for explicit spatial modeling, which improves efficiency and supports long-duration video processing. Evaluations on the DAVIS and LVOS datasets demonstrate the superiority of our approach over state-of-the-art (SOTA) methods. Cross-domain validation on 3D models and text-to-video data confirms the robustness and compatibility of our approach.
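To make the trajectory representation concrete, here is a minimal sketch of a polynomial trajectory in the Bernstein basis for a single stroke control point. It is an illustration under stated assumptions, not the authors' implementation: the function names, the (x, y) coefficient layout and the normalization of time to [0, 1] are all hypothetical. Because the position is a weighted sum of the coefficients, it is differentiable with respect to each of them, and the same trajectory can be sampled at any number of output frames.

```python
from math import comb


def bernstein_basis(n, i, t):
    """Value of the i-th Bernstein polynomial of degree n at t in [0, 1]."""
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))


def trajectory_point(coeffs, t):
    """Position of one control point at normalized time t.

    coeffs is a list of (x, y) trajectory coefficients (hypothetical layout).
    The result is linear in coeffs, so gradients with respect to every
    coefficient are simply the basis values at t.
    """
    n = len(coeffs) - 1
    weights = [bernstein_basis(n, i, t) for i in range(n + 1)]
    x = sum(w * cx for w, (cx, _) in zip(weights, coeffs))
    y = sum(w * cy for w, (_, cy) in zip(weights, coeffs))
    return (x, y)


def sample_trajectory(coeffs, num_frames):
    """Sample the trajectory at num_frames evenly spaced times in [0, 1]."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    return [trajectory_point(coeffs, k / (num_frames - 1))
            for k in range(num_frames)]


# Illustrative degree-3 trajectory; the numbers are made up.
coeffs = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
frames = sample_trajectory(coeffs, 5)
```

Sampling the same coefficients with a larger `num_frames` yields a denser animation without re-optimizing, which is one way to read the high-framerate claim above.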

Framework

First, we obtain the tracking information from the video and initialize the DMT parameters. Then, these parameters are iteratively optimized so that the rasterized sketch animation is semantically and geometrically close to the input video and the stroke movement trajectories stay consistent with the tracking information.
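The iterative optimization can be sketched in a heavily simplified form. The example below fits the Bernstein coefficients of one coordinate of one control point to tracked positions by plain gradient descent. It covers only a tracking-consistency term; the semantic and geometric terms computed on the rasterized sketch are omitted. The mean-squared-error loss, the initialization from the first tracked position, the learning rate and all function names are assumptions made for illustration.

```python
from math import comb


def basis_row(n, t):
    """All Bernstein basis values of degree n at time t."""
    return [comb(n, i) * t ** i * (1.0 - t) ** (n - i) for i in range(n + 1)]


def consistency_loss(coeffs, track):
    """Mean squared distance between the trajectory and the tracked positions."""
    m = len(track)
    n = len(coeffs) - 1
    total = 0.0
    for k, p in enumerate(track):
        row = basis_row(n, k / (m - 1))
        total += (sum(b * c for b, c in zip(row, coeffs)) - p) ** 2
    return total / m


def fit_trajectory(track, degree, steps=3000, lr=0.5):
    """Gradient descent on the coefficients of one coordinate."""
    m = len(track)
    rows = [basis_row(degree, k / (m - 1)) for k in range(m)]
    # Assumed initialization: start every coefficient at the first track point.
    coeffs = [track[0]] * (degree + 1)
    for _ in range(steps):
        residuals = [sum(b * c for b, c in zip(row, coeffs)) - p
                     for row, p in zip(rows, track)]
        # Analytic gradient of the mean squared error w.r.t. each coefficient.
        grads = [2.0 / m * sum(r * row[i] for r, row in zip(residuals, rows))
                 for i in range(degree + 1)]
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs


# Made-up track: a point moving linearly from 2.0 to 5.0 over 9 frames.
track = [2.0 + 3.0 * k / 8 for k in range(9)]
coeffs = fit_trajectory(track, degree=3)
final_loss = consistency_loss(coeffs, track)
```

In a full pipeline the gradient would come from automatic differentiation through a differentiable rasterizer rather than from the closed-form expression used here.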

Comparison - Different Algorithms

Original Video Canny HED CLIPasso SketchVideo Fang et al. LiveSketch-MLP Ours

Comparison - Different Stroke Counts

This section compares conversion results under different stroke counts. Each comparison group includes up to five rows of results (from top to bottom: CLIPasso, SketchVideo, Fang et al., LiveSketch-MLP, and our algorithm). Results for Fang et al. are available for four groups only and were generously provided by Prof. Xiaonan Fang. We appreciate his support for our comparative experiments.

Input 4 Strokes 8 Strokes 16 Strokes 32 Strokes 64 Strokes

mallard-water

CLIPasso

SketchVideo

Fang et al.

LiveSketch-MLP

Ours

hike

CLIPasso

SketchVideo

Fang et al.

LiveSketch-MLP

Ours

soapbox

CLIPasso

SketchVideo

LiveSketch-MLP

Ours

train

CLIPasso

SketchVideo

LiveSketch-MLP

Ours

scooter-gray

CLIPasso

SketchVideo

LiveSketch-MLP

Ours

Input 4 Strokes 8 Strokes 16 Strokes 32 Strokes 64 Strokes

bear

CLIPasso

SketchVideo

Fang et al.

LiveSketch-MLP

Ours

flamingo

CLIPasso

SketchVideo

Fang et al.

LiveSketch-MLP

Ours

rollerblade

CLIPasso

SketchVideo

LiveSketch-MLP

Ours

stroller

CLIPasso

SketchVideo

LiveSketch-MLP

Ours

stunt

CLIPasso

SketchVideo

LiveSketch-MLP

Ours

Long Video Experiment

Original Video Tracking Info Vis 4 Strokes Result 8 Strokes Result 16 Strokes Result 32 Strokes Result Parameter Information

Parameter information, first video:
  • Video duration: 17s
  • DMT's highest degree: 199
  • Input Frames: 400
  • Input Frame Rate: 24fps
  • Output Frames: 400
  • Output Frame Rate: 24fps

Parameter information, second video:
  • Video duration: 50s
  • DMT's highest degree: 199
  • Input Frames: 300
  • Input Frame Rate: 6fps
  • Output Frames: 1200
  • Output Frame Rate: 24fps
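The 50s example above takes 300 input frames at 6fps and outputs 1200 frames at 24fps. The sketch below shows one way such resampling could work for a degree-199 trajectory: compute normalized output times, then evaluate the Bernstein-form polynomial at each of them. It uses de Casteljau's algorithm, a standard evaluation scheme built from repeated linear interpolation, which avoids forming large binomial coefficients at high degree. Whether the authors evaluate trajectories this way is not stated on this page; the helper names, the time normalization and the ramp coefficients are assumptions.

```python
def de_casteljau(coeffs, t):
    """Evaluate a Bernstein-form polynomial at t by repeated interpolation."""
    values = list(coeffs)
    while len(values) > 1:
        values = [(1.0 - t) * a + t * b for a, b in zip(values, values[1:])]
    return values[0]


def output_times(input_frames, input_fps, output_fps):
    """Normalized times in [0, 1] for the output frames of the same clip."""
    duration_s = input_frames / input_fps
    count = round(duration_s * output_fps)
    return [k / (count - 1) for k in range(count)]


# Figures from the second long-video example: 300 frames at 6fps -> 24fps.
times = output_times(300, 6, 24)

# Made-up degree-199 coefficients (200 values) forming a linear ramp; a
# Bernstein polynomial with coefficients i/n reproduces the identity t.
degree = 199
ramp = [i / degree for i in range(degree + 1)]

# Evaluate only every 100th output time to keep this example fast.
samples = [de_casteljau(ramp, t) for t in times[::100]]
```

Each evaluation costs on the order of the degree squared in operations, so a production implementation would likely vectorize this or precompute the basis values per output frame.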

3D to Sketch Animation

3D Animation Tracking Info Vis 4 Strokes Result 8 Strokes Result 16 Strokes Result 32 Strokes Result Parameter Information

Parameter information, first animation:
  • Model Vertex Count: 7,775
  • Model Polygon Count: 10,666
  • Animation duration: 2s
  • Animation Frames: 50
  • Animation Frame Rate: 24fps

Parameter information, second animation:
  • Model Vertex Count: 7,775
  • Model Polygon Count: 10,666
  • Animation duration: 2s
  • Animation Frames: 50
  • Animation Frame Rate: 24fps

Parameter information, third animation:
  • Model Vertex Count: 7,775
  • Model Polygon Count: 10,666
  • Animation duration: 2s
  • Animation Frames: 50
  • Animation Frame Rate: 24fps

Parameter information, fourth animation:
  • Model Vertex Count: 7,775
  • Model Polygon Count: 10,666
  • Animation duration: 21s
  • Animation Frames: 500
  • Animation Frame Rate: 24fps

Text-to-Sketch Animation

Text Prompt Video Generated by CogVideoX-2B Sketch Animation Converted by Our Method

"A cute Corgi"

8 fps

24 fps

"The goldenfish is gracefully moving through the water, its fins and tail fin gently propelling it forward with effortless agility"

8 fps

24 fps

"The wine in the wine glass sways from side to side."

8 fps

24 fps

Ablation Studies

1. Power basis vs. Bernstein basis

Original Video iter=50 iter=100 iter=200 iter=500 iter=1000

Power Basis

Bernstein Basis
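One way to build intuition for why the two bases behave differently during optimization is to express the same curve in both and compare coefficient magnitudes. The sketch below is not the paper's analysis. It uses the standard Bernstein-to-power conversion formula, and the zigzag coefficients and helper names are made up for illustration. A curve whose Bernstein coefficients all lie in [0, 1] needs power-basis coefficients with alternating signs and magnitudes in the thousands, so equal-sized gradient steps affect the two parameterizations very differently.

```python
from math import comb


def bernstein_to_power(b):
    """Convert Bernstein coefficients b_0..b_n to power coefficients a_0..a_n.

    Uses a_j = C(n, j) * sum_{i <= j} (-1)^(j - i) * C(j, i) * b_i.
    """
    n = len(b) - 1
    return [comb(n, j) * sum((-1) ** (j - i) * comb(j, i) * b[i]
                             for i in range(j + 1))
            for j in range(n + 1)]


def eval_bernstein(b, t):
    """Evaluate sum_i b_i * B_i^n(t)."""
    n = len(b) - 1
    return sum(comb(n, i) * t ** i * (1.0 - t) ** (n - i) * bi
               for i, bi in enumerate(b))


def eval_power(a, t):
    """Evaluate sum_j a_j * t^j with Horner's rule."""
    result = 0.0
    for coefficient in reversed(a):
        result = result * t + coefficient
    return result


# Made-up degree-10 example: Bernstein coefficients zigzag between 0 and 1.
b = [i % 2 for i in range(11)]
a = bernstein_to_power(b)

largest_bernstein = max(abs(v) for v in b)
largest_power = max(abs(v) for v in a)
```

Both coefficient lists describe the same curve (for example, `eval_bernstein(b, 0.5)` and `eval_power(a, 0.5)` agree), yet `largest_power` is several thousand times `largest_bernstein`.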

2. Comparison with and without Motion Heatmap in Initial Stroke Generation

Original Video CLIP Attention Map Motion Heatmap Prob Density Map & Sampling Points w/o Motion Heatmap Result without Motion Heatmap Prob Density Map & Sampling Points w/ Motion Heatmap Result with Motion Heatmap

3. Comparison with and without Consistency Loss

Original Video Tracking Data Visualization Without Consistency Loss With Consistency Loss

Citation

BibTeX

@article{zhu2025vector,
  title={Vector sketch animation generation with differentiable motion trajectories},
  author={Zhu, Xinding and Yang, Xinye and Zheng, Shuyang and Zhang, Zhexin and Gao, Fei and Huang, Jing and Chen, Jiazhou},
  journal={arXiv preprint arXiv:2509.25857},
  year={2025}
}