pytorch/ignite

Support for TorchSnapshot for efficient checkpoint saving and loading

Open

#2,752 opened on Oct 24, 2022

enhancement, help wanted

Description

🚀 Feature

TorchSnapshot is a performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind. Compared with torch.save/torch.load, it includes many optimizations that control memory usage and optimize checkpoint writing for DDP-style workloads. For more information, please check out the readme: https://github.com/pytorch/torchsnapshot#why-torchsnapshot

This could be a nice addition to Ignite, similar to the existing Checkpoint handler.
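
To make the idea concrete, here is a rough sketch of what such a handler might look like. `SnapshotHandler`, its `take_fn` parameter, and the `epoch_<n>` directory layout are hypothetical names chosen for illustration; nothing like this exists in Ignite today. The only TorchSnapshot call assumed is `Snapshot.take(path=..., app_state=...)`, and `to_save` is assumed to be a dict of objects exposing `state_dict`/`load_state_dict`, as with the existing Checkpoint handler.

```python
import os
from typing import Any, Callable, Dict, Optional


class SnapshotHandler:
    """Hypothetical Ignite handler that saves `to_save` via TorchSnapshot.

    Not an existing Ignite or TorchSnapshot class; illustration only.
    """

    def __init__(
        self,
        to_save: Dict[str, Any],
        dirname: str,
        take_fn: Optional[Callable[..., Any]] = None,
    ) -> None:
        self.to_save = to_save    # e.g. {"model": model, "optimizer": optimizer}
        self.dirname = dirname    # root directory for all snapshots
        self._take_fn = take_fn   # hypothetical hook, lets tests avoid real I/O
        self.last_path: Optional[str] = None

    def __call__(self, engine: Any) -> None:
        take = self._take_fn
        if take is None:
            # Imported lazily so the class can be defined without torchsnapshot.
            from torchsnapshot import Snapshot

            take = Snapshot.take
        # One snapshot directory per epoch (assumed layout).
        path = os.path.join(self.dirname, f"epoch_{engine.state.epoch}")
        take(path=path, app_state=self.to_save)
        self.last_path = path


# Possible usage with an Ignite trainer (assumes `trainer`, `model`, `optimizer` exist):
# from ignite.engine import Events
# handler = SnapshotHandler({"model": model, "optimizer": optimizer}, "/tmp/snapshots")
# trainer.add_event_handler(Events.EPOCH_COMPLETED, handler)
```

In a distributed run, the handler would presumably need to be attached on every rank rather than only on rank 0, since TorchSnapshot coordinates the write across processes. Restoring could mirror this with `Snapshot(path=...).restore(app_state=to_save)`.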

cc @yifuwang
