Vector sketch animation generation
with differentialable motion trajectories
We propose a differentiable motion trajectory with a Bernstein basis (cross-frame curves, middle) to represent stroke control point movement across frames.
Overview
Abstract
Sketching is a direct and inexpensive means of visual expression. Though image-based sketching has been well studied, video-based sketch animation generation is still very challenging due to the temporal coherence requirement. In this paper, we propose a novel end-to-end automatic generation approach for vector sketch animation. To solve the flickering issue, we introduce a Differentiable Motion Trajectory (DMT) representation that describes the frame-wise movement of stroke control points using differentiable polynomial-based trajectories. DMT enables global semantic gradient propagation across multiple frames, significantly improving the semantic consistency and temporal coherence, and producing high-framerate output. DMT employs a Bernstein basis to balance the sensitivity of polynomial parameters, thus achieving more stable optimization. Instead of implicit fields, we introduce sparse track points for explicit spatial modeling, which improves efficiency and supports long-duration video processing. Evaluations on DAVIS and LVOS datasets demonstrate the superiority of our approach over SOTA methods. Cross-domain validation on 3D models and text-to-video data confirms the robustness and compatibility of our approach.
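As a minimal sketch of the representation described in the abstract, and not the authors' implementation, the snippet below evaluates a Bernstein-basis trajectory for a single stroke control point. The function names, the `(degree + 1, 2)` coefficient layout, and the normalization of frame times to [0, 1] are assumptions made for illustration.

```python
from math import comb

import numpy as np


def bernstein_basis(degree, t):
    """Return a (len(t), degree + 1) matrix whose column i holds B_{i,degree}(t)."""
    t = np.asarray(t, dtype=float)[:, None]           # (T, 1)
    i = np.arange(degree + 1)[None, :]                # (1, degree + 1)
    binom = np.array([comb(degree, k) for k in range(degree + 1)], dtype=float)
    return binom[None, :] * t**i * (1.0 - t) ** (degree - i)


def evaluate_trajectory(coeffs, num_frames):
    """Sample one control point's 2D position at num_frames evenly spaced times.

    coeffs: (degree + 1, 2) array of trajectory coefficients (hypothetical layout).
    Returns a (num_frames, 2) array of x, y positions.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    t = np.linspace(0.0, 1.0, num_frames)             # normalized frame times
    return bernstein_basis(coeffs.shape[0] - 1, t) @ coeffs


# Illustrative cubic trajectory; the coefficient values are made up.
coeffs = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])
positions = evaluate_trajectory(coeffs, num_frames=24)
```

Because every frame position is a linear function of the same small set of coefficients, a loss computed on any frame yields gradients for the whole trajectory, which is one way to read the "global gradient propagation across multiple frames" claim. In this form the trajectory starts at the first coefficient and ends at the last one.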
Framework
Comparison - Different Algorithms
| Original Video | Canny | HED | CLIPasso | SketchVideo | Fang et al. | LiveSketch-MLP | Ours |
|---|---|---|---|---|---|---|---|
Comparison - Different Stroke Number
This section compares results under different stroke counts. Each comparison group has one row per method (from top to bottom: CLIPasso, SketchVideo, Fang et al., LiveSketch-MLP, and ours). Fang et al. rows appear only in the 4 groups for which results were available; these were generously provided by Prof. Xiaonan Fang. We appreciate his support for our comparative experiments.
| Input | 4 Strokes | 8 Strokes | 16 Strokes | 32 Strokes | 64 Strokes |
|---|---|---|---|---|---|
| mallard-water | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| Fang et al. | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
| hike | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| Fang et al. | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
| soapbox | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
| train | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
| scooter-gray | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
| Input | 4 Strokes | 8 Strokes | 16 Strokes | 32 Strokes | 64 Strokes |
|---|---|---|---|---|---|
| bear | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| Fang et al. | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
| flamingo | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| Fang et al. | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
| rollerblade | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
| stroller | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
| stunt | | | | | |
| CLIPasso | | | | | |
| SketchVideo | | | | | |
| LiveSketch-MLP | | | | | |
| Ours | | | | | |
Long Video Experiment
| Original Video | Tracking Info Vis | 4 Strokes Result | 8 Strokes Result | 16 Strokes Result | 32 Strokes Result | Parameter Information |
|---|---|---|---|---|---|---|
3D to Sketch Animation
| 3D Animation | Tracking Info Vis | 4 Strokes Result | 8 Strokes Result | 16 Strokes Result | 32 Strokes Result | Parameter Information |
|---|---|---|---|---|---|---|
Text-to-Sketch Animation
| Text Prompt | Video Generated by CogVideoX-2B | Sketch Animation Converted by Our Method |
|---|---|---|
| "A cute Corgi" | 8 fps | 24 fps |
| "The goldenfish is gracefully moving through the water, its fins and tail fin gently propelling it forward with effortless agility" | 8 fps | 24 fps |
| "The wine in the wine glass sways from side to side." | 8 fps | 24 fps |
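The table above pairs 8 fps source videos with 24 fps sketch animations. One plausible reading is that a continuous polynomial trajectory can simply be sampled more densely in time. The sketch below illustrates that idea with de Casteljau evaluation of a Bernstein-form curve. It is an assumption about how such resampling could work, not a description of the authors' pipeline, and `de_casteljau`, `resample`, and the coefficient values are hypothetical.

```python
import numpy as np


def de_casteljau(coeffs, t):
    """Evaluate a Bernstein-form curve at scalar t in [0, 1] by repeated interpolation."""
    pts = np.array(coeffs, dtype=float)
    while len(pts) > 1:
        # Each pass blends neighboring points and shortens the list by one.
        pts = (1.0 - t) * pts[:-1] + t * pts[1:]
    return pts[0]


def resample(coeffs, duration_s, fps):
    """Sample the trajectory at a chosen frame rate over a clip of duration_s seconds."""
    num_frames = int(round(duration_s * fps)) + 1     # include both endpoints
    times = np.linspace(0.0, 1.0, num_frames)         # normalized time per frame
    return np.stack([de_casteljau(coeffs, t) for t in times])


# Made-up cubic trajectory of one control point (x, y).
coeffs = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])
frames_8 = resample(coeffs, duration_s=2.0, fps=8)    # 17 samples
frames_24 = resample(coeffs, duration_s=2.0, fps=24)  # 49 samples
```

Every third 24 fps sample coincides with an 8 fps sample, so the denser output adds in-between positions without changing the motion at the original frame times.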
Ablation Studies
1. Power basis vs. Bernstein basis
| Original Video | iter=50 | iter=100 | iter=200 | iter=500 | iter=1000 |
|---|---|---|---|---|---|
| Power Basis | | | | | |
| Bernstein Basis | | | | | |
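To give some intuition for why the choice of basis can affect optimization stability, the sketch below compares the condition number of the sampling matrix for the power basis and the Bernstein basis on [0, 1]. This is a standard numerical illustration under assumed settings (50 evenly spaced time samples, hypothetical helper names), not the analysis or experiment from the paper.

```python
from math import comb

import numpy as np


def power_matrix(degree, t):
    """Columns are 1, t, t^2, ..., t^degree."""
    return np.vander(t, degree + 1, increasing=True)


def bernstein_matrix(degree, t):
    """Columns are the Bernstein polynomials B_{i,degree}(t)."""
    t = t[:, None]
    i = np.arange(degree + 1)[None, :]
    binom = np.array([comb(degree, k) for k in range(degree + 1)], dtype=float)
    return binom[None, :] * t**i * (1.0 - t) ** (degree - i)


def condition_numbers(degree, num_samples=50):
    """Return (power-basis, Bernstein-basis) 2-norm condition numbers."""
    t = np.linspace(0.0, 1.0, num_samples)
    return (
        float(np.linalg.cond(power_matrix(degree, t))),
        float(np.linalg.cond(bernstein_matrix(degree, t))),
    )


for degree in (3, 5, 8):
    cond_power, cond_bernstein = condition_numbers(degree)
    print(f"degree={degree}: power={cond_power:.3g}, bernstein={cond_bernstein:.3g}")
```

A smaller condition number means the coefficients influence the sampled positions more evenly, so gradient steps are less dominated by a few highly sensitive parameters. This is consistent with the abstract's statement that the Bernstein basis balances parameter sensitivity.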
2. Comparison with and without Motion Heatmap in initial Stroke Generation
| Original Video | CLIP Attention Map | Motion Heatmap | Prob Density Map & Sampling Points w/o Motion Heatmap | Result without Motion Heatmap | Prob Density Map & Sampling Points w/ Motion Heatmap | Result with Motion Heatmap |
|---|---|---|---|---|---|---|
3. Comparison with and without Consistency Loss
| Original Video | Tracking Data Visualization | Without Consistency Loss | With Consistency Loss |
|---|---|---|---|
Citation
BibTeX
@article{zhu2025vector,
title={Vector sketch animation generation with differentialable motion trajectories},
author={Zhu, Xinding and Yang, Xinye and Zheng, Shuyang and Zhang, Zhexin and Gao, Fei and Huang, Jing and Chen, Jiazhou},
journal={arXiv preprint arXiv:2509.25857},
year={2025}
}