Taylor-Series-Expansion-Based Vision Transformer Models
Summary
A new Taylor-Series-Expansion (TSE) based vision transformer approximates the nonlinear functions in transformer models using finite series. This reduces memory usage and improves deployment speed with minimal accuracy loss.
Area of Science:
- Computer Vision
- Machine Learning
- Mathematics
Background:
- Taylor-Series-Expansion (TSE) approximates nonlinear functions using finite series.
- Vision transformers are powerful but computationally intensive.
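To make the background concrete, the sketch below shows the core idea of a finite Taylor-series approximation: a nonlinear function such as `exp` is replaced by a truncated polynomial that needs only a fixed number of multiplications and additions. This is a generic illustration of TSE, not the paper's actual implementation.

```python
import math

def taylor_exp(x, order):
    """Approximate exp(x) with a finite Taylor series about 0:
    exp(x) ~ sum_{k=0}^{order} x^k / k!"""
    return sum(x ** k / math.factorial(k) for k in range(order + 1))

# A low-order expansion already tracks exp(x) closely near the expansion point,
# and the error shrinks as more terms are kept.
approx = taylor_exp(0.5, 4)
exact = math.exp(0.5)
```

Truncating the series trades a small, controllable approximation error for hardware-friendly arithmetic, which is the trade-off the study exploits.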
Purpose of the Study:
- To design a novel TSE-based vision transformer.
- To reduce memory burden and improve deployment efficiency of vision models.
Main Methods:
- Developed a TSE-based vision transformer that approximates the nonlinear functions of standard (naive) vision transformer models.
- Incorporated shared first-order TSE blocks, finite multiplications, and learnable TSE coefficients.
- Introduced a Taylor skip mechanism for dynamic expansion capability.
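The methods above can be sketched in miniature. The class below is a hypothetical stand-in for a first-order TSE block: the expansion coefficients `c0` and `c1` are learnable (here fitted by least squares rather than gradient training), and a `skip` flag mimics a Taylor skip mechanism that drops the first-order term at inference. The names and the GELU target are illustrative assumptions, not the paper's code.

```python
import math

def gelu(x):
    """Reference nonlinearity (tanh approximation of GELU), the function to approximate."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

class FirstOrderTSEBlock:
    """Hypothetical first-order TSE block: f(x) ~ c0 + c1 * (x - a),
    with c0 and c1 treated as learnable coefficients."""

    def __init__(self, a=0.0):
        self.a = a
        self.c0 = gelu(a)  # zeroth-order coefficient, initialised at f(a)
        self.c1 = 0.0      # first-order coefficient, learned from data

    def fit(self, xs):
        # Least-squares fit of c1 over sample inputs (a stand-in for training).
        num = sum((x - self.a) * (gelu(x) - self.c0) for x in xs)
        den = sum((x - self.a) ** 2 for x in xs)
        self.c1 = num / den

    def __call__(self, x, skip=False):
        # "Taylor skip": optionally drop the first-order term for dynamic expansion.
        return self.c0 if skip else self.c0 + self.c1 * (x - self.a)

block = FirstOrderTSEBlock()
block.fit([i / 10 - 1 for i in range(21)])  # calibrate on samples in [-1, 1]
```

Because the block reduces to a handful of multiply-adds, many such blocks can share coefficients and hardware paths, which is where the memory and latency savings come from.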
Main Results:
- Achieved significant deployment speedups (1.30-1.45×) on A100 and AGX Orin GPUs with negligible accuracy loss.
- Demonstrated benefits orthogonal to model compression techniques; combining the two further improves speedups (up to 3.61×).
- Validated on ImageNet classification, COCO detection, and ADE20K segmentation tasks.
Conclusions:
- TSE-based vision transformers offer an efficient alternative to traditional models.
- The approach effectively reduces memory footprint and enhances real-world deployment performance.
- This method presents a promising direction for optimizing deep learning vision models.