[BUG] Loading from large dataframe/large numpy requires holding all chunks in coordinator · alibaba/GraphScope#2342

(0 comments) (1 reaction) (0 assignees) HTML (301 forks) batch import

bug, component:coordinator, good first issue

Repository metrics

Stars: (2,401 stars)
PR merge metrics: (Avg merge 1m) (7 merged PRs in 30d)

Description

Describe the bug

It looks strange that the coordinator accumulates all chunks from the request stream into a list before sending them to the analytical engine. This requires the coordinator pod to have enough available memory to hold all of the chunks at once.

https://github.com/alibaba/GraphScope/blob/b80a35599424580325a750e734f8a3b2dead2a5b/coordinator/gscoordinator/dag_manager.py#L77-L107
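To illustrate the difference in general terms, here is a minimal sketch that contrasts buffering a chunk stream with forwarding it chunk by chunk. It is not the code in `dag_manager.py`: `request_stream`, `forward_accumulating`, `forward_streaming`, and `run` are hypothetical names, and a plain Python generator stands in for the real request stream.

```python
from typing import Callable, Iterable, Iterator, List


def request_stream(n_chunks: int, log: List[str]) -> Iterator[bytes]:
    """Hypothetical stand-in for the incoming request stream."""
    for i in range(n_chunks):
        log.append(f"recv {i}")
        yield f"chunk-{i}".encode()


def forward_accumulating(chunks: Iterable[bytes], send: Callable[[bytes], None]) -> None:
    """Buffer every chunk first; peak memory grows with the total payload."""
    buffered = list(chunks)  # the whole stream is held in memory here
    for chunk in buffered:
        send(chunk)


def forward_streaming(chunks: Iterable[bytes], send: Callable[[bytes], None]) -> None:
    """Forward each chunk as it arrives; only one chunk is held at a time."""
    for chunk in chunks:
        send(chunk)


def run(forward: Callable[..., None], n_chunks: int = 2) -> List[str]:
    """Record the order of receive and send events for a forwarding strategy."""
    log: List[str] = []

    def send(chunk: bytes) -> None:
        # Hypothetical stand-in for sending a chunk to the analytical engine.
        log.append(f"send {chunk.decode()}")

    forward(request_stream(n_chunks, log), send)
    return log


if __name__ == "__main__":
    print(run(forward_accumulating))
    print(run(forward_streaming))
```

In the accumulating version every chunk is received before the first one is sent, whereas the streaming version interleaves receiving and sending. Whether the analytical engine can consume chunks incrementally in this way is an assumption that would need to be checked against the actual code.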

Contributor guide

Research direction: Investigate the coordinator's request stream handling in dag_manager.py and explore options for streaming chunks directly to the analytical engine without accumulating them in memory.
Tech stack: python
Domain: backend
Issue type: Bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: Fresh
Clarity: Mostly clear
Prerequisites: Python, Distributed systems
Newbie friendliness: 40
