feathr-ai/feathr

[BUG] Feature Materialization hang in stage "RedisOutputUtils.scala:37" in local spark env

Open

#693 opened on Sep 22, 2022

Labels: bug, good first issue

Description

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the Feathr community.

Feathr version

0.7.2

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0): Mac OS
  • Python version: 3.9
  • Spark version, if reporting runtime issue: 3.3.0

Describe the problem

The feature generation job hangs in the Redis write stage (reported as "RedisOutputUtils.scala:37") in a local Spark environment, without any error message.

Tracking information

22/09/22 10:39:55 INFO TaskSchedulerImpl: Adding task set 8.0 with 3 tasks resource profile 0
22/09/22 10:39:55 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 10) (localhost, executor driver, partition 0, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO TaskSetManager: Starting task 1.0 in stage 8.0 (TID 11) (localhost, executor driver, partition 1, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO TaskSetManager: Starting task 2.0 in stage 8.0 (TID 12) (localhost, executor driver, partition 2, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO Executor: Running task 0.0 in stage 8.0 (TID 10)
22/09/22 10:39:55 INFO Executor: Running task 1.0 in stage 8.0 (TID 11)
22/09/22 10:39:55 INFO Executor: Running task 2.0 in stage 8.0 (TID 12)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 0, [0 - 5461] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 1, [5462 - 10922] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 2, [10923 - 16383] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 11:09:37 INFO BlockManagerInfo: Removed broadcast_7_piece0 on localhost:56004 in memory (size: 46.5 KiB, free: 434.4 MiB)
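
The log goes quiet after the three RedisKeysRDD lines and only resumes with an unrelated BlockManagerInfo message. As a rough way to quantify that stall, the following sketch computes the gap between those two timestamps. The helper name `silence_seconds` and the two constants are illustrative and not part of Feathr or Spark. The timestamp layout is assumed to be Spark's default `yy/MM/dd HH:mm:ss` pattern.

```python
from datetime import datetime

# Timestamps copied from the log above; layout assumed to be yy/MM/dd HH:mm:ss.
LAST_REDIS_LINE = "22/09/22 10:39:55"  # last RedisKeysRDD "Computing partition" line
NEXT_LOG_LINE = "22/09/22 11:09:37"    # BlockManagerInfo "Removed broadcast_7_piece0" line


def silence_seconds(start: str, end: str) -> int:
    """Return the number of whole seconds between two log timestamps."""
    fmt = "%y/%m/%d %H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


gap = silence_seconds(LAST_REDIS_LINE, NEXT_LOG_LINE)
print(f"No log output for {gap} s (about {gap // 60} min)")
```

For the timestamps above this reports a gap of 1782 seconds, i.e. roughly half an hour in which tasks 0.0 to 2.0 of stage 8.0 log neither progress nor failure.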

Code to reproduce bug

No response

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
  • Feature Registry Web UI: The Web UI for feature registry. Written in React

Contributor guide