alibaba/GraphScope

[BUG] Loading from large dataframe/large numpy requires holding all chunks in coordinator

Open

#2,342 opened on Dec 23, 2022

Labels: bug, component:coordinator, good first issue


Description

Describe the bug

It seems odd that the coordinator accumulates every chunk of the request stream into a list before sending anything to the analytical engine. When loading from a large dataframe or numpy array, this requires the coordinator pod to have enough available memory to hold all of the chunks at once.

https://github.com/alibaba/GraphScope/blob/b80a35599424580325a750e734f8a3b2dead2a5b/coordinator/gscoordinator/dag_manager.py#L77-L107
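To illustrate the difference, here is a minimal sketch, not the actual coordinator code. The names `forward_buffered` and `forward_streaming` are hypothetical, and chunks are modeled as plain `bytes` rather than the real request messages. The first function mirrors the pattern described above (collect everything, then forward); the second forwards each chunk as soon as it arrives, so only one chunk needs to be held at a time.

```python
from typing import Iterable, Iterator, List


def forward_buffered(request_stream: Iterable[bytes]) -> List[bytes]:
    """Hypothetical stand-in for the reported pattern:
    every chunk is held in memory before anything is forwarded."""
    chunks = []
    for chunk in request_stream:
        chunks.append(chunk)  # memory grows with the total data size
    return chunks


def forward_streaming(request_stream: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical alternative: pass each chunk on as it arrives,
    so the caller can send it to the engine immediately."""
    for chunk in request_stream:
        yield chunk  # only the current chunk is held here


def fake_stream(n: int, produced: list) -> Iterator[bytes]:
    """Toy request stream that records which chunks have been read."""
    for i in range(n):
        produced.append(i)
        yield bytes([i])
```

With the streaming variant, asking for the first output chunk reads only the first input chunk from `fake_stream`, whereas the buffered variant drains the whole stream before returning. Whether the analytical engine's interface can accept chunks incrementally is an assumption that would need to be checked against the linked code.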
