trycua/cua

cua-driver get_window_state omits per-element geometry

Open

#1564 opened on May 18, 2026

Labels: enhancement, good first issue

Description

Summary

get_window_state returns tree_markdown and screenshot dimensions, but no per-element geometry (bounds, frame, x/y/width/height, AXPosition, AXSize, etc.).

As a result, downstream clients cannot map element indexes/refs to on-screen coordinates.

Reproduction

list_windows includes window bounds

cua-driver call list_windows '{"on_screen_only":false}'

Example result:

{
  "app_name": "Safari浏览器",
  "bounds": { "x": 0, "y": 0, "width": 1920, "height": 30 },
  "pid": 2395,
  "window_id": 3882
}

get_window_state does not include element geometry

cua-driver call get_window_state '{"pid":2395,"window_id":140}' --raw

Observed structuredContent:

{
  "bundle_id": "com.apple.Safari",
  "element_count": 474,
  "name": "Safari浏览器",
  "pid": 2395,
  "screenshot_height": 304,
  "screenshot_scale_factor": 2,
  "screenshot_width": 388,
  "tree_markdown": "...",
  "turn_id": 8
}

Missing fields:

  • bounds
  • frame
  • elements
  • AXPosition
  • AXSize
  • per-element x/y/width/height
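A quick way to confirm the gap is to check the returned payload for the geometry keys listed above. The following is a minimal sketch, assuming the structuredContent has already been captured as JSON (here it is pasted from the observed output, with tree_markdown elided as in the report). The helper name missing_geometry_keys is made up for illustration, and the check only looks at top-level keys, not inside tree_markdown.

```python
import json

# Observed structuredContent from this report (tree_markdown elided as in the original).
observed = json.loads("""
{
  "bundle_id": "com.apple.Safari",
  "element_count": 474,
  "name": "Safari浏览器",
  "pid": 2395,
  "screenshot_height": 304,
  "screenshot_scale_factor": 2,
  "screenshot_width": 388,
  "tree_markdown": "...",
  "turn_id": 8
}
""")

# Candidate geometry keys, taken from the "Missing fields" list.
GEOMETRY_KEYS = ("bounds", "frame", "elements", "AXPosition", "AXSize")


def missing_geometry_keys(payload: dict) -> list[str]:
    """Return the geometry-related keys absent from the top level of payload."""
    return [key for key in GEOMETRY_KEYS if key not in payload]


print(missing_geometry_keys(observed))
```

For the payload above, every candidate key is reported as missing.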

Expected

Either:

  1. return a structured element list with geometry, or
  2. expose a geometry map keyed by element index, or
  3. document explicitly that get_window_state does not provide per-element geometry.
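To make option 2 concrete, here is a rough sketch of how a client might consume a geometry map keyed by element index. Everything in it is hypothetical: the element_geometry field name, its x/y/width/height layout, and the sample values are invented for illustration and are not part of the current cua-driver API. The coordinate space (window-relative, screen, or screenshot pixels) is also left open and would need to be specified by the driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        """Return the midpoint of the rectangle."""
        return (self.x + self.width / 2, self.y + self.height / 2)


# Hypothetical response fragment: "element_geometry" does NOT exist today.
# Keys are strings because JSON object keys are always strings.
response = {
    "element_count": 2,
    "element_geometry": {
        "0": {"x": 10, "y": 20, "width": 100, "height": 40},
        "1": {"x": 120, "y": 20, "width": 60, "height": 40},
    },
}


def element_center(payload: dict, index: int) -> tuple[float, float]:
    """Look up an element by index and return the center of its rectangle."""
    raw = payload["element_geometry"][str(index)]
    return Rect(**raw).center()


print(element_center(response, 1))  # (150.0, 40.0)
```

A map keyed by index keeps tree_markdown unchanged for existing clients, while a structured element list (option 1) would carry the same rectangles inline with each element.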

Environment

  • macOS 26.5
  • CuaDriver.app version 0.2.0

Question

Is this intentional API design, or should get_window_state expose per-element geometry?
