5 Commits

Author SHA1 Message Date
Devin AI ed0069ed89 fix(agent-proxy): cancel upstream reader on abort so segment timer can actually interrupt parseSse
Root cause of "segment_end never written, response just runs until
Netlify kills it": on this runtime, aborting the fetch AbortSignal
was only effective during the request/header phase. Once you have
`res.body` and start reading from it, aborting that same signal did
NOT close the body stream, so any in-flight `reader.read()` hung
indefinitely.

So when the segment timer fired and called segmentAbort.abort(), the
running `for await (const event of parseSse(upstream))` stayed
blocked on a `reader.read()` that never resolved. The while loop
never exited, segment_end was never written, controller.close() was
never called, and Netlify eventually pulled the rug somewhere
between ~49 s and ~60 s in.

Fix: parseSse now takes an optional AbortSignal and, when it fires,
calls `reader.cancel()` directly. The WHATWG Streams Standard
guarantees that any pending read() resolves with `{done: true}`
after cancel, which lets the for-await exit promptly and the body
loop re-evaluate `while (!done && !segmenting)` → break →
writeJson(segment_end) → controller.close(). Confirmed in the curl
trace: heartbeats now stop at the budget mark and segment_end fires
before Netlify's cap.
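
A minimal sketch of the cancel-on-abort pattern described above, not
the actual implementation. `parseSseSketch` is a hypothetical stand-in
for parseSse; its signature and the simplified SSE framing (events
split on a blank line, only `data:` fields kept) are assumptions.

```typescript
// Sketch only: hypothetical stand-in for the repo's parseSse.
async function* parseSseSketch(
  body: ReadableStream<Uint8Array>,
  signal?: AbortSignal,
): AsyncGenerator<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  // Cancelling the reader settles any pending read() with { done: true },
  // which is what lets the loop below exit when the signal fires.
  const onAbort = () => {
    void reader.cancel().catch(() => {});
  };
  if (signal?.aborted) onAbort();
  else signal?.addEventListener("abort", onAbort, { once: true });

  let buffer = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      // Simplified framing: events are separated by a blank line.
      let sep: number;
      while ((sep = buffer.indexOf("\n\n")) !== -1) {
        const raw = buffer.slice(0, sep);
        buffer = buffer.slice(sep + 2);
        // Keep only data fields; strip the field name and leading whitespace.
        const data = raw
          .split("\n")
          .filter((line) => line.startsWith("data:"))
          .map((line) => line.slice(5).trimStart())
          .join("\n");
        if (data) yield data;
      }
    }
  } finally {
    signal?.removeEventListener("abort", onAbort);
    reader.releaseLock();
  }
}
```

With this shape, a caller passes `segmentAbort.signal` as the second
argument and the for-await loop ends shortly after `abort()` is called,
even when the upstream is idle.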

Co-Authored-By: alex <alex@semipublic.co>
2026-05-13 13:20:19 +00:00
Devin AI 25c4e2442c fix(agent-proxy): conservative segment budget + drop unnecessary upstream.cancel()
Three related fixes after re-testing on the deploy preview:

1. SEGMENT_BUDGET_MS: 54_000 -> 40_000.

   Empirically, Netlify Edge Function responses get cut anywhere
   between ~49 s and ~60 s wall-clock — significantly more variance
   than the docs imply. With a 54 s budget, our segment_end write +
   controller.close() sometimes lands AFTER the platform has already
   killed the response (curl: 'HTTP/2 stream 1 was not closed cleanly
   before end of the underlying stream'). 40 s gives ~10 s of headroom
   on the low end of the observed range while still keeping segment
   overhead reasonable (~1 segment per 40 s of brief).

2. HEARTBEAT_MS: 10_000 -> 5_000.

   Tighter keep-alive so the connection stays warm even during long
   model-thinking gaps where the agent isn't emitting events.

3. Drop the 'await upstream.cancel()' between backfill and tail.

   The upstream fetch's body is already cleaned up by
   segmentAbort.abort() (or will be by the runtime once we drop our
   reference). On Deno Edge, awaiting cancel() on an already-aborted
   body can hang, which prevents segment_end from being written.
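
The timing arithmetic behind the new constants, as a small sketch.
The constant values come from this message; `OBSERVED_MIN_CUT_MS` and
both helper functions are hypothetical names introduced only for
illustration, and the 49 s figure is the empirical low end quoted
above, not a documented platform limit.

```typescript
// Values taken from the commit message.
const SEGMENT_BUDGET_MS = 40_000;
const HEARTBEAT_MS = 5_000;
// Assumption: low end of the observed cut-off range (~49 s).
const OBSERVED_MIN_CUT_MS = 49_000;

/** Time between the segment timer firing and the earliest observed cut. */
function headroomMs(budgetMs: number, minCutMs: number): number {
  return minCutMs - budgetMs;
}

/** Upper bound on heartbeats in one segment, one every heartbeatMs. */
function heartbeatsPerSegment(budgetMs: number, heartbeatMs: number): number {
  return Math.floor(budgetMs / heartbeatMs);
}

const newHeadroom = headroomMs(SEGMENT_BUDGET_MS, OBSERVED_MIN_CUT_MS); // 9000
const oldHeadroom = headroomMs(54_000, OBSERVED_MIN_CUT_MS); // -5000
const beats = heartbeatsPerSegment(SEGMENT_BUDGET_MS, HEARTBEAT_MS); // 8
```

A negative headroom for the old 54 s budget matches the symptom: the
platform could cut the response before segment_end was written.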

Co-Authored-By: alex <alex@semipublic.co>
2026-05-13 13:16:44 +00:00
Devin AI df3c8e7d1c fix(agent-proxy): events.list uses opaque page cursor, not after_id
The Managed Agents events endpoint (`GET /v1/sessions/{id}/events`)
does NOT support filtering by event id. It returns an opaque
`next_page` cursor on each response and accepts it back via the
`page` query parameter; an `after_id=` filter returns 400 Bad
Request, which caused every segment resume to fail backfill (visible
as `{"type":"status","kind":"session_error","message":"Backfill
failed: events.list returned 400 Bad Request"}`).
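
Cursor pagination of this kind can be sketched as follows. Only the
`next_page` response field and the `page` query parameter come from
this message; the `data` field name, the `EventPage` and `FetchPage`
types, `listAllEventsSketch`, and the `maxPages` guard are
assumptions. The HTTP call is injected so the sketch makes no claim
about the real request shape.

```typescript
// Assumed response shape: `data` is a hypothetical field name.
interface EventPage<E> {
  data: E[];
  next_page?: string | null;
}

// Injected page fetcher: receives the opaque cursor, or undefined for
// the first page, and returns that page of events.
type FetchPage<E> = (page?: string) => Promise<EventPage<E>>;

async function listAllEventsSketch<E>(
  fetchPage: FetchPage<E>,
  maxPages = 100, // hypothetical guard against a cursor that never ends
): Promise<E[]> {
  const all: E[] = [];
  let cursor: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const res = await fetchPage(cursor);
    all.push(...res.data);
    if (!res.next_page) break; // no cursor means this was the last page
    cursor = res.next_page; // pass the cursor back unchanged
  }
  return all;
}
```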

Caught during testing of commit 8e44de5: resuming from a segment
boundary always returned 400 and the brief silently lost events from
the previous segment.

Changes:
- `listAllEvents` now paginates via `page` / `next_page` and pulls
  the full session history (limit=1000). The Anthropic API has no
  per-id filter, so the caller is responsible for skipping events
  already delivered.
- New `pastInitialId` flag at the top of the body loop: on resume,
  mute every event up to and including `initialLastEventId`
  (still adding them to `seenEventIds` so the live stream doesn't
  re-emit them), then start delivering. On a brand-new session the
  flag starts true and is a no-op.
- Safety fallback: if backfill completes without ever seeing
  `initialLastEventId` (stale cursor / truncated history), flip
  the flag to true so we don't get stuck muting forever — the live
  stream will start delivering whatever shows up next.
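
The muting rule can be sketched as a pure function over the backfilled
history. The names `pastInitialId`, `initialLastEventId`, and
`seenEventIds` come from this message; `AgentEvent`,
`eventsToDeliver`, and the `cursorFound` return field are hypothetical
and only illustrate the logic.

```typescript
// Assumption: every event carries a string `id`.
interface AgentEvent {
  id: string;
}

function eventsToDeliver<E extends AgentEvent>(
  backfill: E[],
  initialLastEventId: string | null,
  seenEventIds: Set<string>,
): { deliver: E[]; cursorFound: boolean } {
  // On a brand-new session there is nothing to mute.
  let pastInitialId = initialLastEventId === null;
  const deliver: E[] = [];
  for (const event of backfill) {
    if (seenEventIds.has(event.id)) continue;
    // Record muted events too, so the live stream does not re-emit them.
    seenEventIds.add(event.id);
    if (!pastInitialId) {
      // Mute everything up to and including the resume cursor.
      if (event.id === initialLastEventId) pastInitialId = true;
      continue;
    }
    deliver.push(event);
  }
  // Safety fallback: whether or not the cursor was found, the caller
  // should treat the flag as true from here on and deliver live events.
  return { deliver, cursorFound: pastInitialId };
}
```

For a history of e1, e2, e3 and a resume cursor of e2, only e3 is
delivered, while all three ids end up in `seenEventIds`.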

Co-Authored-By: alex <alex@semipublic.co>
2026-05-13 13:08:54 +00:00
Devin AI 8e44de5271 feat(agent-proxy): segment streaming responses at ~54s to bypass Netlify Edge's ~60s response cap
Netlify Edge Functions empirically cap a single streaming response at
~60s wall-clock, regardless of activity. Confirmed against Netlify's
own canonical SSE example (edge-functions-examples.netlify.app/sse)
which also cuts at +60.1s. The Anthropic Managed Agent session is
fine for several minutes; the cap is per-HTTP-response.

This commit splits a long brief across multiple HTTP responses, while
keeping the UX of one continuous stream:

netlify/edge-functions/agent-proxy.ts
- Accept either a new-session payload {stationName, stationLocation,
  stationWebsite} or a resume payload {sessionId, lastEventId,
  startedAt}.
- On resume, skip session creation + user.message send. Just reopen
  the live SSE stream and backfill via
  GET /v1/sessions/{id}/events?after_id=lastEventId (deduped by
  event.id), then keep tailing.
- Single AbortController per segment. A 54s timer aborts the upstream,
  the for-await loops exit, and we write one final NDJSON line:
  {type:'segment_end', sessionId, lastEventId, startedAt}.
- The 20-min OVERALL_BUDGET_MS is enforced via Date.now() - startedAt
  so it spans across all segments.
- Refactor main loop so every iteration is openStream + backfill +
  tail. Cleaner than the previous initial-stream + reconnect-only-on-
  drop pattern.
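
A sketch of the wire shapes listed above. The field names and the
20-minute budget come from this message; the field types (in
particular `startedAt` as epoch milliseconds and a nullable
`lastEventId`) and the helpers `isResume`, `overallBudgetExceeded`,
and `segmentEndLine` are assumptions for illustration.

```typescript
interface NewSessionPayload {
  stationName: string;
  stationLocation: string;
  stationWebsite: string;
}

interface ResumePayload {
  sessionId: string;
  lastEventId: string | null; // assumption: null if no event was seen yet
  startedAt: number; // assumption: epoch ms, compared against Date.now()
}

type ProxyRequest = NewSessionPayload | ResumePayload;

interface SegmentEnd extends ResumePayload {
  type: "segment_end";
}

const OVERALL_BUDGET_MS = 20 * 60 * 1000;

/** A request is a resume when it carries a sessionId. */
function isResume(p: ProxyRequest): p is ResumePayload {
  return "sessionId" in p;
}

/** The overall budget is measured from startedAt, so it spans segments. */
function overallBudgetExceeded(startedAt: number, now: number = Date.now()): boolean {
  return now - startedAt >= OVERALL_BUDGET_MS;
}

/** Final NDJSON line of a segment: one JSON object plus a newline. */
function segmentEndLine(resume: ResumePayload): string {
  const msg: SegmentEnd = { type: "segment_end", ...resume };
  return JSON.stringify(msg) + "\n";
}
```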

src/App.jsx
- readStream() now returns a {sessionId, lastEventId, startedAt}
  payload if it saw a segment_end, or null if the stream ended
  cleanly.
- handleSubmit() loops, reopening /api/agent-proxy with the resume
  payload until readStream returns null. Spinner/status state stays
  on across segments so the UI shows one continuous stream.
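
The client-side loop can be sketched like this. It is not the code in
src/App.jsx: `ResumeInfo`, `resumeFromNdjson`, `runAllSegments`, the
injected `openSegment` callback, and the `maxSegments` guard are all
hypothetical, and the sketch reads a whole segment as text instead of
streaming it.

```typescript
interface ResumeInfo {
  sessionId: string;
  lastEventId: string | null;
  startedAt: number;
}

/** Returns the resume payload if the NDJSON text contains a segment_end. */
function resumeFromNdjson(text: string): ResumeInfo | null {
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    let msg: unknown;
    try {
      msg = JSON.parse(line);
    } catch {
      continue; // ignore lines that are not valid JSON
    }
    if (
      typeof msg === "object" &&
      msg !== null &&
      (msg as { type?: unknown }).type === "segment_end"
    ) {
      const { sessionId, lastEventId, startedAt } = msg as ResumeInfo;
      return { sessionId, lastEventId, startedAt };
    }
  }
  return null; // stream ended cleanly
}

/** Reopens segments until one ends without segment_end; returns the count. */
async function runAllSegments(
  openSegment: (payload: unknown) => Promise<string>,
  firstPayload: unknown,
  maxSegments = 50, // hypothetical guard against an endless resume loop
): Promise<number> {
  let payload: unknown = firstPayload;
  let segments = 0;
  while (payload !== null && segments < maxSegments) {
    const text = await openSegment(payload);
    segments++;
    payload = resumeFromNdjson(text);
  }
  return segments;
}
```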

README.md
- Document the segmented-streaming protocol and why it exists.

Co-Authored-By: alex <alex@semipublic.co>
2026-05-13 13:00:54 +00:00
Devin AI d1c5be112e migrate agent-proxy to Netlify Edge Function so long sessions stream end-to-end
The reconnect + events.list backfill in c283d88 is correct but never ran:
the previous v2 Node Function was killed at ~27 s (well before the 20 min
reconnect budget could matter), so streams always died after the first MCP
tool batch.

Move the proxy to a Netlify Edge Function (Deno runtime) which has no
streaming-duration cap as long as we keep writing to the response body.
Same reconnect / backfill / dedupe-by-event-id pattern; same NDJSON wire
protocol to the browser. Implemented with plain fetch() against the
Anthropic REST API (npm packages on Edge are beta) so we have no SDK
runtime dependency.

Frontend now POSTs to /api/agent-proxy. The Anthropic SDK is removed
from the package; @netlify/edge-functions is added for ambient types.

Co-Authored-By: alex <alex@semipublic.co>
2026-05-13 12:40:31 +00:00