Vector sketch animation generation
with differentialable motion trajectories

X. Zhu¹ X. Yang¹ S. Zheng¹ Z. Zhang² F. Gao¹ J. Huang³ J. Chen¹

¹ Zhejiang University of Technology ² Hangzhou Dianzi University ³ Zhejiang Gongshang University

Our method converts an object video (top) into a sketch animation using 2D vector graphics (bottom).
We propose a differentiable motion trajectory with a Bernstein basis (cross-frame curves, middle) to represent stroke control point movement across frames.

Overview

Abstract

Sketching is a direct and inexpensive means of visual expression. Though image-based sketching has been well studied, video-based sketch animation generation is still very challenging due to the temporal coherence requirement. In this paper, we propose a novel end-to-end automatic generation approach for vector sketch animation. To solve the flickering issue, we introduce a Differentiable Motion Trajectory (DMT) representation that describes the frame-wise movement of stroke control points using differentiable polynomial-based trajectories. DMT enables global semantic gradient propagation across multiple frames, significantly improving the semantic consistency and temporal coherence, and producing high-framerate output. DMT employs a Bernstein basis to balance the sensitivity of polynomial parameters, thus achieving more stable optimization. Instead of implicit fields, we introduce sparse track points for explicit spatial modeling, which improves efficiency and supports long-duration video processing. Evaluations on DAVIS and LVOS datasets demonstrate the superiority of our approach over SOTA methods. Cross-domain validation on 3D models and text-to-video data confirms the robustness and compatibility of our approach.
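As a rough illustration of the trajectory representation described above, the sketch below evaluates a Bernstein-basis polynomial trajectory for a single stroke control point. The function names, the normalization of time to [0, 1], and the sample coefficients are assumptions made for illustration; this is not the authors' implementation.

```python
from math import comb

import numpy as np


def bernstein_basis(degree, t):
    """Return B with B[i, k] = C(degree, k) * t_i**k * (1 - t_i)**(degree - k)."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    k = np.arange(degree + 1).reshape(1, -1)
    binom = np.array([comb(degree, j) for j in range(degree + 1)], dtype=float)
    return binom * t**k * (1.0 - t) ** (degree - k)


def trajectory(coeffs, t):
    """Positions of one control point at normalized times t in [0, 1].

    coeffs has shape (degree + 1, 2): one 2D coefficient per basis function.
    """
    degree = coeffs.shape[0] - 1
    return bernstein_basis(degree, t) @ coeffs


# Hypothetical cubic trajectory for a single control point.
coeffs = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])
t = np.linspace(0.0, 1.0, 5)
positions = trajectory(coeffs, t)  # shape (5, 2), one row per sampled time
```

Because every position is a linear combination of the coefficients, a loss computed on any frame yields gradients for the whole trajectory, which is the property the abstract relies on for cross-frame gradient propagation.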

Framework

First, we obtain tracking information from the video and use it to initialize the DMT parameters. Then, these parameters are iteratively optimized so that the rasterized sketch animation is semantically and geometrically close to the input video and the stroke movement trajectories stay consistent with the tracking information.
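One plausible way to carry out the initialization step is a least-squares fit of Bernstein coefficients to each tracked point. The sketch below assumes this; the page does not state how the parameters are actually initialized, and `init_dmt_from_track`, the synthetic track, and the chosen degree are hypothetical.

```python
from math import comb

import numpy as np


def bernstein_basis(degree, t):
    """Return B with B[i, k] = C(degree, k) * t_i**k * (1 - t_i)**(degree - k)."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    k = np.arange(degree + 1).reshape(1, -1)
    binom = np.array([comb(degree, j) for j in range(degree + 1)], dtype=float)
    return binom * t**k * (1.0 - t) ** (degree - k)


def init_dmt_from_track(track, degree):
    """Least-squares Bernstein coefficients for one tracked point.

    track has shape (num_frames, 2); frames are assumed evenly spaced in time.
    """
    t = np.linspace(0.0, 1.0, track.shape[0])
    basis = bernstein_basis(degree, t)
    coeffs, *_ = np.linalg.lstsq(basis, track, rcond=None)
    return coeffs


# Synthetic stand-in for one tracked point over 30 frames.
frames = 30
t = np.linspace(0.0, 1.0, frames)
track = np.stack([100.0 + 50.0 * t, 80.0 + 20.0 * np.sin(2.0 * np.pi * t)], axis=1)

coeffs = init_dmt_from_track(track, degree=5)
fit = bernstein_basis(5, t) @ coeffs
max_error = np.abs(fit - track).max()  # residual of the initial fit, in pixels
```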

Comparison - Different Algorithms

Original Video	Canny	HED	CLIPasso	SketchVideo	Fang et al.	LiveSketch-MLP	Ours

Comparison - Different Stroke Counts

This section compares conversion results under different stroke counts. Each comparison group includes up to five rows of results (from top to bottom: CLIPasso, SketchVideo, Fang et al., LiveSketch-MLP, and our algorithm). The four sets of results for Fang et al. were generously provided by Prof. Xiaonan Fang; we appreciate his support for our comparative experiments.

Each sequence is shown with the columns: Input, 4 Strokes, 8 Strokes, 16 Strokes, 32 Strokes, 64 Strokes.

Sequence	Methods (rows, top to bottom)
mallard-water	CLIPasso, SketchVideo, Fang et al., LiveSketch-MLP, Ours
hike	CLIPasso, SketchVideo, Fang et al., LiveSketch-MLP, Ours
soapbox	CLIPasso, SketchVideo, LiveSketch-MLP, Ours
train	CLIPasso, SketchVideo, LiveSketch-MLP, Ours
scooter-gray	CLIPasso, SketchVideo, LiveSketch-MLP, Ours
bear	CLIPasso, SketchVideo, Fang et al., LiveSketch-MLP, Ours
flamingo	CLIPasso, SketchVideo, Fang et al., LiveSketch-MLP, Ours
rollerblade	CLIPasso, SketchVideo, LiveSketch-MLP, Ours
stroller	CLIPasso, SketchVideo, LiveSketch-MLP, Ours
stunt	CLIPasso, SketchVideo, LiveSketch-MLP, Ours

Long Video Experiment

Original Video	Tracking Info Vis	4 Strokes Result	8 Strokes Result	16 Strokes Result	32 Strokes Result	Parameter Information
						Video duration: 17s; DMT's highest degree: 199; Input Frames: 400; Input Frame Rate: 24fps; Output Frames: 400; Output Frame Rate: 24fps
						Video duration: 50s; DMT's highest degree: 199; Input Frames: 300; Input Frame Rate: 6fps; Output Frames: 1200; Output Frame Rate: 24fps
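The second row lists 300 input frames at 6 fps and 1200 output frames at 24 fps. Since a polynomial trajectory is continuous in time, it can be sampled on a denser time grid than the input. The sketch below illustrates this with de Casteljau's algorithm, which is our choice of evaluation scheme and is not stated on this page; the random coefficients are placeholders for optimized DMT parameters.

```python
import numpy as np


def de_casteljau(coeffs, u):
    """Evaluate a Bernstein-form trajectory at normalized times u in [0, 1].

    coeffs has shape (degree + 1, 2); the result has shape (len(u), 2).
    """
    u = np.asarray(u, dtype=float).reshape(-1, 1, 1)
    pts = np.broadcast_to(coeffs, (u.shape[0],) + coeffs.shape)
    # Repeated linear interpolation between neighbouring coefficients.
    while pts.shape[1] > 1:
        pts = (1.0 - u) * pts[:, :-1] + u * pts[:, 1:]
    return pts[:, 0]


duration = 50.0                        # seconds, from the table row
in_times = np.arange(300) / 6.0        # input frame timestamps at 6 fps
out_times = np.arange(1200) / 24.0     # output frame timestamps at 24 fps

rng = np.random.default_rng(0)
coeffs = rng.normal(size=(200, 2))     # degree 199, placeholder values

in_pos = de_casteljau(coeffs, in_times / duration)
out_pos = de_casteljau(coeffs, out_times / duration)
```

Every fourth output sample coincides with an input frame; the three samples in between come from the same polynomial rather than from per-frame interpolation.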

3D to Sketch Animation

3D Animation	Tracking Info Vis	4 Strokes Result	8 Strokes Result	16 Strokes Result	32 Strokes Result	Parameter Information
						Model Vertex Count: 7,775; Model Polygon Count: 10,666; Animation duration: 2s; Animation Frames: 50; Animation Frame Rate: 24fps
						Model Vertex Count: 7,775; Model Polygon Count: 10,666; Animation duration: 2s; Animation Frames: 50; Animation Frame Rate: 24fps
						Model Vertex Count: 7,775; Model Polygon Count: 10,666; Animation duration: 2s; Animation Frames: 50; Animation Frame Rate: 24fps
						Model Vertex Count: 7,775; Model Polygon Count: 10,666; Animation duration: 21s; Animation Frames: 500; Animation Frame Rate: 24fps

Text-to-Sketch Animation

Text Prompt	Video Generated by CogVideoX-2B	Sketch Animation Converted by Our Method
"A cute Corgi"	8 fps	24 fps
"The goldenfish is gracefully moving through the water, its fins and tail fin gently propelling it forward with effortless agility"	8 fps	24 fps
"The wine in the wine glass sways from side to side."	8 fps	24 fps

Ablation Studies

1. Power basis vs. Bernstein basis

Original Video	iter=50	iter=100	iter=200	iter=500	iter=1000
			Power Basis
			Bernstein Basis
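The two bases span the same polynomials, so any difference in convergence comes from how the parameters are scaled and coupled. One way to probe this, offered here as a diagnostic sketch rather than the experiment shown above, is to compare the condition numbers of the two design matrices on the same time samples. The degree and sample count are arbitrary choices.

```python
from math import comb

import numpy as np


def power_basis(degree, t):
    """Design matrix with columns 1, t, t**2, ..., t**degree."""
    return np.vander(np.asarray(t, dtype=float), degree + 1, increasing=True)


def bernstein_basis(degree, t):
    """Design matrix with columns C(degree, k) * t**k * (1 - t)**(degree - k)."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    k = np.arange(degree + 1).reshape(1, -1)
    binom = np.array([comb(degree, j) for j in range(degree + 1)], dtype=float)
    return binom * t**k * (1.0 - t) ** (degree - k)


degree = 10
t = np.linspace(0.0, 1.0, 100)

# 2-norm condition number: ratio of largest to smallest singular value.
cond_power = np.linalg.cond(power_basis(degree, t))
cond_bernstein = np.linalg.cond(bernstein_basis(degree, t))

print(f"power basis:     {cond_power:.3e}")
print(f"Bernstein basis: {cond_bernstein:.3e}")
```

A smaller condition number means gradient steps affect all coefficients on a more comparable scale, which is consistent with the abstract's remark about balancing parameter sensitivity.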

2. Comparison with and without Motion Heatmap in Initial Stroke Generation

Original Video	CLIP Attention Map	Motion Heatmap	Prob Density Map & Sampling Points w/o Motion Heatmap	Result without Motion Heatmap	Prob Density Map & Sampling Points w/ Motion Heatmap	Result with Motion Heatmap
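The column headers above suggest that initial stroke positions are sampled from a probability density map, with or without a motion heatmap. The sketch below shows one way such sampling could work. The weighted-sum blend, the `motion_weight` parameter, and the synthetic maps are hypothetical; the page does not specify how the two maps are combined.

```python
import numpy as np


def density_map(attention, motion=None, motion_weight=0.5):
    """Blend the maps into a probability density that sums to 1.

    The weighted sum is a hypothetical blend, not taken from the source.
    """
    attention = attention / attention.sum()
    if motion is None:
        return attention
    motion = motion / motion.sum()
    density = (1.0 - motion_weight) * attention + motion_weight * motion
    return density / density.sum()


def sample_points(density, num_points, rng):
    """Draw distinct pixel locations (x, y) with probability given by density."""
    flat_idx = rng.choice(
        density.size, size=num_points, replace=False, p=density.ravel()
    )
    ys, xs = np.unravel_index(flat_idx, density.shape)
    return np.stack([xs, ys], axis=1)


rng = np.random.default_rng(0)
attention = np.ones((32, 32))      # uniform stand-in for a CLIP attention map
motion = np.zeros((32, 32))
motion[8:16, 20:28] = 1.0          # synthetic moving region (rows 8-15, cols 20-27)

pts_without = sample_points(density_map(attention), 16, rng)
pts_with = sample_points(density_map(attention, motion, motion_weight=1.0), 16, rng)
```

With `motion_weight=1.0` every sampled point falls inside the moving region; intermediate weights trade off semantic attention against motion.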

3. Comparison with and without Consistency Loss

Original Video	Tracking Data Visualization	Without Consistency Loss	With Consistency Loss
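A consistency term of this kind can be sketched as the mean squared distance between a control point's trajectory and a tracked point, minimized by gradient descent on the Bernstein coefficients. The exact form of the loss used in the paper is not given on this page, so the loss, the learning rate, and the synthetic track below are assumptions.

```python
from math import comb

import numpy as np


def bernstein_basis(degree, t):
    """Return B with B[i, k] = C(degree, k) * t_i**k * (1 - t_i)**(degree - k)."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    k = np.arange(degree + 1).reshape(1, -1)
    binom = np.array([comb(degree, j) for j in range(degree + 1)], dtype=float)
    return binom * t**k * (1.0 - t) ** (degree - k)


def consistency_loss(coeffs, basis, track):
    """Mean squared distance between trajectory samples and tracked positions."""
    residual = basis @ coeffs - track
    return np.mean(np.sum(residual**2, axis=1))


def consistency_grad(coeffs, basis, track):
    """Analytic gradient of consistency_loss with respect to coeffs."""
    residual = basis @ coeffs - track
    return 2.0 * basis.T @ residual / basis.shape[0]


t = np.linspace(0.0, 1.0, 40)
track = np.stack([10.0 * t, 5.0 * np.sin(np.pi * t)], axis=1)  # synthetic track
basis = bernstein_basis(6, t)

coeffs = np.zeros((7, 2))              # deliberately poor starting point
loss_before = consistency_loss(coeffs, basis, track)
for _ in range(200):
    # Plain gradient descent; the step size 0.5 is an illustrative choice.
    coeffs -= 0.5 * consistency_grad(coeffs, basis, track)
loss_after = consistency_loss(coeffs, basis, track)
```

In a full pipeline this term would be added to the semantic and geometric losses, so the trajectory follows the tracking data without being forced to match it exactly.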

Citation

BibTeX

@article{zhu2025vector,
  title={Vector sketch animation generation with differentialable motion trajectories},
  author={Zhu, Xinding and Yang, Xinye and Zheng, Shuyang and Zhang, Zhexin and Gao, Fei and Huang, Jing and Chen, Jiazhou},
  journal={arXiv preprint arXiv:2509.25857},
  year={2025}
}