pytorch/ignite
Support for TorchSnapshot for efficient checkpoint saving and loading
Open
#2752 opened on Oct 24, 2022
enhancement, help wanted
Description
🚀 Feature
TorchSnapshot is a performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind. Compared with torch.save/torch.load, it includes many optimizations that limit memory usage and optimize checkpoint writing for DDP-style workloads. For more information, please check out the README: https://github.com/pytorch/torchsnapshot#why-torchsnapshot
This could be a nice addition to Ignite, similar to the existing Checkpoint handler.
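To make the request more concrete, here is a rough sketch of what such a handler might look like. `SnapshotHandler`, its arguments, and the `epoch_<n>` directory layout are hypothetical names made up for illustration and are not part of Ignite; the only TorchSnapshot call assumed is `Snapshot.take(path=..., app_state=...)` as shown in its README. The `take_fn` hook is also an assumption, added so the handler can be exercised without TorchSnapshot installed.

```python
import os
from typing import Any, Callable, Dict, Optional


def _take_with_torchsnapshot(path: str, app_state: Dict[str, Any]) -> Any:
    # Imported lazily so this module can be loaded without torchsnapshot installed.
    from torchsnapshot import Snapshot

    # app_state values are expected to expose state_dict()/load_state_dict(),
    # e.g. a model and an optimizer.
    return Snapshot.take(path=path, app_state=app_state)


class SnapshotHandler:
    """Hypothetical handler (not an Ignite API): takes one snapshot per call."""

    def __init__(
        self,
        dirname: str,
        to_save: Dict[str, Any],
        take_fn: Optional[Callable[[str, Dict[str, Any]], Any]] = None,
    ) -> None:
        self.dirname = dirname
        self.to_save = to_save
        # take_fn is an illustrative injection point; defaults to TorchSnapshot.
        self._take = take_fn or _take_with_torchsnapshot
        self.last_path: Optional[str] = None

    def __call__(self, engine: Any) -> None:
        # Ignite passes the engine to handlers; engine.state.epoch is the current epoch.
        path = os.path.join(self.dirname, f"epoch_{engine.state.epoch}")
        self._take(path, self.to_save)
        self.last_path = path
```

It could then be attached like any other handler, e.g. `trainer.add_event_handler(Events.EPOCH_COMPLETED, SnapshotHandler("/tmp/snapshots", {"model": model, "optimizer": optimizer}))`, and restoring would presumably go through `Snapshot(path=handler.last_path).restore(app_state=to_save)`. How this should interact with the options of the existing Checkpoint handler (`n_saved`, score functions, etc.) is left open.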
cc @yifuwang