<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://blog.williamchong.cloud/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.williamchong.cloud/" rel="alternate" type="text/html" /><updated>2026-02-18T18:10:57+00:00</updated><id>https://blog.williamchong.cloud/feed.xml</id><title type="html">William Chong’s Cloud</title><subtitle>William Chong&apos;s blog for sharing small technical tips and tricks, as well as random rants.</subtitle><author><name>William Chong</name></author><entry><title type="html">Minimax TTS API Update: let’s vibe a TypeScript SDK</title><link href="https://blog.williamchong.cloud/code/2026/02/12/minimax-tts-api-improvements-eventsource-parser-and-sdk.html" rel="alternate" type="text/html" title="Minimax TTS API Update: let’s vibe a TypeScript SDK" /><published>2026-02-12T18:00:00+00:00</published><updated>2026-02-12T18:00:00+00:00</updated><id>https://blog.williamchong.cloud/code/2026/02/12/minimax-tts-api-improvements-eventsource-parser-and-sdk</id><content type="html" xml:base="https://blog.williamchong.cloud/code/2026/02/12/minimax-tts-api-improvements-eventsource-parser-and-sdk.html"><![CDATA[<p><img src="/assets/images/2026-02-13-minimax-tts-api-improvements-eventsource-parser-and-sdk/cover.png" alt="Minimax TTS API Improvements" /></p>

<h2 id="introduction">Introduction</h2>

<p>In the <a href="/code/2025/06/21/handling-minimax-tts-api-basic-and-streaming.html">previous post</a>, we implemented streaming text-to-speech with the Minimax API by manually parsing Server-Sent Events and handling the aggregated summary block at the end of the stream. While functional, that approach had two pain points: hand-rolled SSE parsing logic and the need to detect and discard the final summary chunk. Since then, three things have improved the situation significantly:</p>

<ol>
  <li>Minimax added <code class="language-plaintext highlighter-rouge">stream_options.exclude_aggregated_audio</code> to the API</li>
  <li>The <a href="https://github.com/rexxars/eventsource-parser"><code class="language-plaintext highlighter-rouge">eventsource-parser</code></a> library provides a robust, spec-compliant SSE parser</li>
  <li>Most importantly, I published <a href="https://github.com/williamchong/minimax-speech-ts"><code class="language-plaintext highlighter-rouge">minimax-speech-ts</code></a>, a TypeScript SDK that wraps the entire Minimax Speech API (Lol)</li>
</ol>

<h2 id="excluding-the-aggregated-audio">Excluding the Aggregated Audio</h2>

<p>In the previous post, we had to handle a summary block at the end of the stream that contained the complete aggregated audio data. We checked for <code class="language-plaintext highlighter-rouge">data.status === 1</code> to filter out the final chunk with <code class="language-plaintext highlighter-rouge">status === 2</code>. This was necessary because the API always sent the full concatenated audio as the last event, which we didn’t need when streaming.</p>

<p>Minimax has since added a <a href="https://platform.minimax.io/docs/api-reference/speech-t2a-http#body-stream-options-exclude-aggregated-audio"><code class="language-plaintext highlighter-rouge">stream_options.exclude_aggregated_audio</code></a> parameter. When set to <code class="language-plaintext highlighter-rouge">true</code>, the final chunk no longer contains the complete audio data. This means we no longer need to filter it out ourselves:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">$fetch</span><span class="o">&lt;</span><span class="nx">ReadableStream</span><span class="o">&gt;</span><span class="p">(</span>
  <span class="s2">`https://api.minimaxi.chat/v1/t2a_v2?GroupId=</span><span class="p">${</span><span class="nx">minimaxGroupId</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span>
  <span class="p">{</span>
    <span class="na">method</span><span class="p">:</span> <span class="dl">"</span><span class="s2">POST</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">headers</span><span class="p">:</span> <span class="p">{</span>
      <span class="na">Authorization</span><span class="p">:</span> <span class="s2">`Bearer </span><span class="p">${</span><span class="nx">minimaxAPIKey</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span>
    <span class="p">},</span>
    <span class="na">responseType</span><span class="p">:</span> <span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">body</span><span class="p">:</span> <span class="p">{</span>
      <span class="na">stream</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
      <span class="na">stream_options</span><span class="p">:</span> <span class="p">{</span>
        <span class="na">exclude_aggregated_audio</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span> <span class="c1">// no more summary block</span>
      <span class="p">},</span>
      <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">your text</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">speech-02-hd</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">voice_setting</span><span class="p">:</span> <span class="p">{</span>
        <span class="na">voice_id</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Chinese (Mandarin)_Warm_Bestie</span><span class="dl">"</span><span class="p">,</span>
        <span class="na">speed</span><span class="p">:</span> <span class="mf">0.95</span><span class="p">,</span>
        <span class="na">pitch</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span>
        <span class="na">emotion</span><span class="p">:</span> <span class="dl">"</span><span class="s2">neutral</span><span class="dl">"</span><span class="p">,</span>
      <span class="p">},</span>
      <span class="na">language_boost</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Chinese,Yue</span><span class="dl">"</span><span class="p">,</span>
    <span class="p">},</span>
  <span class="p">}</span>
<span class="p">);</span>
</code></pre></div></div>

<p>With this option, every chunk in the stream has <code class="language-plaintext highlighter-rouge">status: 1</code> and contains only an audio segment, except for the final chunk with <code class="language-plaintext highlighter-rouge">status: 2</code> that now has an empty audio field. This eliminates the need for the status-checking logic we previously had in <code class="language-plaintext highlighter-rouge">processEventData</code>.</p>
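
<p>As a rough sketch of what this means for the consumer, the snippet below walks a list of already-parsed payloads and hex-decodes whatever audio is present. The <code class="language-plaintext highlighter-rouge">T2AChunk</code> interface and the <code class="language-plaintext highlighter-rouge">decodeChunks</code> and <code class="language-plaintext highlighter-rouge">hexToBytes</code> helpers are illustrative names of mine. They cover only the fields discussed in this post, not the full response schema.</p>

```typescript
// Hypothetical, trimmed-down shape of one streamed payload (not the full schema).
interface T2AChunk {
  data?: { audio?: string; status?: number };
  base_resp?: { status_code?: number; status_msg?: string };
}

// Convert a hex string such as "0aff" into raw bytes.
function hexToBytes(hex: string): Uint8Array {
  const bytes = new Uint8Array(Math.floor(hex.length / 2));
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Keep every chunk that carries audio. The final chunk (status 2) has an
// empty audio field, so the truthiness check skips it without looking at status.
function decodeChunks(chunks: T2AChunk[]): Uint8Array[] {
  const segments: Uint8Array[] = [];
  for (const chunk of chunks) {
    if (chunk.data?.audio) {
      segments.push(hexToBytes(chunk.data.audio));
    }
  }
  return segments;
}

// Made-up sample data in the shape described above.
const sample: T2AChunk[] = [
  { data: { audio: "0a0b", status: 1 } },
  { data: { audio: "ff", status: 1 } },
  { data: { audio: "", status: 2 } }, // final chunk: empty audio
];
const decoded = decodeChunks(sample);
```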

<h2 id="replacing-manual-sse-parsing-with-eventsource-parser">Replacing Manual SSE Parsing with eventsource-parser</h2>

<p>In the previous implementation, we manually split the stream on <code class="language-plaintext highlighter-rouge">\n\n</code> boundaries and stripped <code class="language-plaintext highlighter-rouge">data: </code> prefixes. While this worked for Minimax’s specific formatting, it was fragile and made assumptions about the event structure. The <a href="https://github.com/rexxars/eventsource-parser"><code class="language-plaintext highlighter-rouge">eventsource-parser</code></a> library provides a spec-compliant SSE parser that handles all edge cases for us.</p>
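
<p>To make the fragility concrete: the SSE specification allows lines to end in LF, CR, or CRLF, so a parser that only splits on <code class="language-plaintext highlighter-rouge">\n\n</code> depends on the server choosing LF. The sketch below uses a hypothetical <code class="language-plaintext highlighter-rouge">naiveSplit</code> function as a stand-in for a hand-rolled parser (it is not the exact code from the previous post) and feeds it the same two events with both line endings.</p>

```typescript
// Stand-in for a hand-rolled parser: split on blank lines, strip the "data: " prefix.
function naiveSplit(raw: string): string[] {
  return raw
    .split("\n\n")
    .filter((block) => block.trim() !== "")
    .map((block) => block.replace(/^data: /, ""));
}

// The same two events, once with LF and once with CRLF line endings.
const lfStream = 'data: {"n":1}\n\ndata: {"n":2}\n\n';
const crlfStream = 'data: {"n":1}\r\n\r\ndata: {"n":2}\r\n\r\n';

// With LF the split finds both events.
const lfEvents = naiveSplit(lfStream);

// With CRLF the text never contains "\n\n", so both events stay merged
// in a single block that is no longer valid JSON.
const crlfEvents = naiveSplit(crlfStream);
```

<p>A spec-compliant parser is expected to treat both inputs identically, which is the main reason to delegate this step to a library.</p>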

<h3 id="using-eventsourceparserstream">Using EventSourceParserStream</h3>

<p>The library exposes an <a href="https://github.com/rexxars/eventsource-parser"><code class="language-plaintext highlighter-rouge">EventSourceParserStream</code></a> – a <code class="language-plaintext highlighter-rouge">TransformStream</code> that takes decoded text and outputs parsed SSE events. This is a direct replacement for our custom <code class="language-plaintext highlighter-rouge">TransformStream</code>:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="p">{</span> <span class="nx">EventSourceParserStream</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">eventsource-parser/stream</span><span class="dl">"</span><span class="p">;</span>

<span class="kd">const</span> <span class="nx">eventStream</span> <span class="o">=</span> <span class="nx">response</span>
  <span class="p">.</span><span class="nx">pipeThrough</span><span class="p">(</span><span class="k">new</span> <span class="nx">TextDecoderStream</span><span class="p">())</span>
  <span class="p">.</span><span class="nx">pipeThrough</span><span class="p">(</span><span class="k">new</span> <span class="nx">EventSourceParserStream</span><span class="p">());</span>
</code></pre></div></div>

<p>Each event emitted by <code class="language-plaintext highlighter-rouge">EventSourceParserStream</code> has a <code class="language-plaintext highlighter-rouge">data</code> property containing the event payload (with the <code class="language-plaintext highlighter-rouge">data: </code> prefix already stripped), an <code class="language-plaintext highlighter-rouge">event</code> property for the event type, and an <code class="language-plaintext highlighter-rouge">id</code> property for the event ID. For Minimax’s API, we only need <code class="language-plaintext highlighter-rouge">event.data</code>.</p>

<h3 id="simplified-transformstream">Simplified TransformStream</h3>

<p>With the SSE parsing handled by the library, our <code class="language-plaintext highlighter-rouge">TransformStream</code> now only needs to extract audio data from parsed events:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="p">{</span> <span class="nx">EventSourceParserStream</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">eventsource-parser/stream</span><span class="dl">"</span><span class="p">;</span>

<span class="kd">const</span> <span class="nx">audioTransform</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">TransformStream</span><span class="p">({</span>
  <span class="c1">// Named sseEvent so it does not shadow the H3 event passed to sendStream below</span>
  <span class="nx">transform</span><span class="p">(</span><span class="nx">sseEvent</span><span class="p">,</span> <span class="nx">controller</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">try</span> <span class="p">{</span>
      <span class="kd">const</span> <span class="nx">parsed</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">sseEvent</span><span class="p">.</span><span class="nx">data</span><span class="p">);</span>

      <span class="k">if</span> <span class="p">(</span><span class="nx">parsed</span><span class="p">.</span><span class="nx">base_resp</span><span class="p">?.</span><span class="nx">status_code</span> <span class="o">!==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="nx">controller</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">parsed</span><span class="p">.</span><span class="nx">base_resp</span><span class="p">?.</span><span class="nx">status_msg</span> <span class="o">||</span> <span class="dl">"</span><span class="s2">Unknown API error</span><span class="dl">"</span><span class="p">);</span>
        <span class="k">return</span><span class="p">;</span>
      <span class="p">}</span>

      <span class="k">if</span> <span class="p">(</span><span class="nx">parsed</span><span class="p">.</span><span class="nx">data</span><span class="p">?.</span><span class="nx">audio</span><span class="p">)</span> <span class="p">{</span>
        <span class="nx">controller</span><span class="p">.</span><span class="nx">enqueue</span><span class="p">(</span><span class="nx">Buffer</span><span class="p">.</span><span class="k">from</span><span class="p">(</span><span class="nx">parsed</span><span class="p">.</span><span class="nx">data</span><span class="p">.</span><span class="nx">audio</span><span class="p">,</span> <span class="dl">"</span><span class="s2">hex</span><span class="dl">"</span><span class="p">));</span>
      <span class="p">}</span>
    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
      <span class="c1">// Skip malformed events</span>
    <span class="p">}</span>
  <span class="p">},</span>
<span class="p">});</span>

<span class="kd">const</span> <span class="nx">audioStream</span> <span class="o">=</span> <span class="nx">response</span>
  <span class="p">.</span><span class="nx">pipeThrough</span><span class="p">(</span><span class="k">new</span> <span class="nx">TextDecoderStream</span><span class="p">())</span>
  <span class="p">.</span><span class="nx">pipeThrough</span><span class="p">(</span><span class="k">new</span> <span class="nx">EventSourceParserStream</span><span class="p">())</span>
  <span class="p">.</span><span class="nx">pipeThrough</span><span class="p">(</span><span class="nx">audioTransform</span><span class="p">);</span>

<span class="k">return</span> <span class="nx">sendStream</span><span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="nx">audioStream</span><span class="p">);</span>
</code></pre></div></div>

<p>Compare this with the previous implementation: the buffer management, <code class="language-plaintext highlighter-rouge">\n\n</code> splitting, <code class="language-plaintext highlighter-rouge">data: </code> prefix stripping, and <code class="language-plaintext highlighter-rouge">flush</code> logic are all gone. The <code class="language-plaintext highlighter-rouge">EventSourceParserStream</code> handles all of that correctly according to the SSE specification.</p>

<h2 id="why-build-a-new-sdk">Why Build a New SDK?</h2>

<p>Minimax does provide an official JavaScript SDK, but it’s an <a href="https://github.com/MiniMax-AI/MiniMax-MCP-JS">MCP (Model Context Protocol) server</a> – designed for AI agent tool-calling, not for direct use in application code. There is no official Node.js client library for calling the Minimax Speech API directly. If you want to integrate Minimax TTS into a server application, you’re left writing raw HTTP calls yourself, which is what I was doing in the <a href="/code/2025/06/21/handling-minimax-tts-api-basic-and-streaming.html">previous post</a>. That’s the gap <code class="language-plaintext highlighter-rouge">minimax-speech-ts</code> fills: a proper typed client library for the HTTP API.</p>

<h2 id="using-minimax-speech-ts">Using minimax-speech-ts</h2>

<p>While the above improvements already simplify the integration, there’s still boilerplate involved: constructing the request, handling authentication, converting hex to buffers, and managing error codes. I built <a href="https://github.com/williamchong/minimax-speech-ts"><code class="language-plaintext highlighter-rouge">minimax-speech-ts</code></a> to wrap all of this into a clean TypeScript SDK.</p>

<h3 id="installation">Installation</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>npm <span class="nb">install </span>minimax-speech-ts
</code></pre></div></div>

<h3 id="basic-usage">Basic Usage</h3>

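<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="nx">fs</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">node:fs</span><span class="dl">"</span><span class="p">;</span>
<span class="k">import</span> <span class="p">{</span> <span class="nx">MiniMaxSpeech</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">minimax-speech-ts</span><span class="dl">"</span><span class="p">;</span>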

<span class="kd">const</span> <span class="nx">client</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">MiniMaxSpeech</span><span class="p">({</span>
  <span class="na">apiKey</span><span class="p">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MINIMAX_API_KEY</span><span class="o">!</span><span class="p">,</span>
  <span class="na">groupId</span><span class="p">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MINIMAX_GROUP_ID</span><span class="p">,</span>
<span class="p">});</span>

<span class="c1">// Non-streaming: get complete audio as a Buffer</span>
<span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">client</span><span class="p">.</span><span class="nx">synthesize</span><span class="p">({</span>
  <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">your text</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">speech-02-hd</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">voiceSetting</span><span class="p">:</span> <span class="p">{</span>
    <span class="na">voiceId</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Chinese (Mandarin)_Warm_Bestie</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">speed</span><span class="p">:</span> <span class="mf">0.95</span><span class="p">,</span>
    <span class="na">pitch</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span>
    <span class="na">emotion</span><span class="p">:</span> <span class="dl">"</span><span class="s2">neutral</span><span class="dl">"</span><span class="p">,</span>
  <span class="p">},</span>
  <span class="na">languageBoost</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Chinese,Yue</span><span class="dl">"</span><span class="p">,</span>
<span class="p">});</span>

<span class="k">await</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">promises</span><span class="p">.</span><span class="nx">writeFile</span><span class="p">(</span><span class="dl">"</span><span class="s2">output.mp3</span><span class="dl">"</span><span class="p">,</span> <span class="nx">result</span><span class="p">.</span><span class="nx">audio</span><span class="p">);</span>
</code></pre></div></div>

<h3 id="streaming-usage">Streaming Usage</h3>

<p>The SDK returns a <code class="language-plaintext highlighter-rouge">ReadableStream&lt;Buffer&gt;</code> that handles all the SSE parsing, hex decoding, and error handling internally:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">stream</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">client</span><span class="p">.</span><span class="nx">synthesizeStream</span><span class="p">({</span>
  <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">your text</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">voiceSetting</span><span class="p">:</span> <span class="p">{</span> <span class="na">voiceId</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Chinese (Mandarin)_Warm_Bestie</span><span class="dl">"</span> <span class="p">},</span>
  <span class="na">streamOptions</span><span class="p">:</span> <span class="p">{</span> <span class="na">excludeAggregatedAudio</span><span class="p">:</span> <span class="kc">true</span> <span class="p">},</span>
<span class="p">});</span>

<span class="c1">// Use directly with sendStream in Nuxt/Nitro</span>
<span class="k">return</span> <span class="nx">sendStream</span><span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="nx">stream</span><span class="p">);</span>
</code></pre></div></div>
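
<p>Outside of a Nitro handler, for example in a test or a script that writes the audio to disk, the stream can be drained by hand. This is a minimal sketch, assuming a runtime that provides the web <code class="language-plaintext highlighter-rouge">ReadableStream</code> global (Node 18 or later). The <code class="language-plaintext highlighter-rouge">collectStream</code> helper and the fake stream are hypothetical and not part of the SDK.</p>

```typescript
// Hypothetical helper: read a web ReadableStream of byte chunks to completion
// and return everything as one contiguous Uint8Array.
async function collectStream(stream: ReadableStream) {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    total += value.byteLength;
  }
  // Concatenate all chunks into a single buffer.
  const merged = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return merged;
}

// Stand-in for the SDK stream so the sketch runs without network access.
const fakeAudio = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array([1, 2]));
    controller.enqueue(new Uint8Array([3]));
    controller.close();
  },
});
```

<p>Since a Node.js <code class="language-plaintext highlighter-rouge">Buffer</code> is a <code class="language-plaintext highlighter-rouge">Uint8Array</code>, the same helper should work on the stream returned by <code class="language-plaintext highlighter-rouge">synthesizeStream</code>, and the result can be handed to <code class="language-plaintext highlighter-rouge">fs.promises.writeFile</code>.</p>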

<p>The SDK uses a camelCase interface (<code class="language-plaintext highlighter-rouge">excludeAggregatedAudio</code>) and automatically converts to the snake_case wire format (<code class="language-plaintext highlighter-rouge">exclude_aggregated_audio</code>) expected by the API. It also provides typed error classes for different failure modes:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="p">{</span>
  <span class="nx">MiniMaxAuthError</span><span class="p">,</span>
  <span class="nx">MiniMaxRateLimitError</span><span class="p">,</span>
  <span class="nx">MiniMaxValidationError</span><span class="p">,</span>
<span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">minimax-speech-ts</span><span class="dl">"</span><span class="p">;</span>

<span class="k">try</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">stream</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">client</span><span class="p">.</span><span class="nx">synthesizeStream</span><span class="p">({</span> <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">hello</span><span class="dl">"</span> <span class="p">});</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">if</span> <span class="p">(</span><span class="nx">e</span> <span class="k">instanceof</span> <span class="nx">MiniMaxRateLimitError</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Back off and retry</span>
  <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">e</span> <span class="k">instanceof</span> <span class="nx">MiniMaxAuthError</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Invalid API key</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
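
<p>The "back off and retry" branch could be filled in with a small helper like the one below. This is a sketch under my own assumptions rather than something the SDK ships: <code class="language-plaintext highlighter-rouge">withBackoff</code>, <code class="language-plaintext highlighter-rouge">sleep</code>, and the default delays are hypothetical, and <code class="language-plaintext highlighter-rouge">FakeRateLimitError</code> stands in for <code class="language-plaintext highlighter-rouge">MiniMaxRateLimitError</code> so the example runs without the package installed.</p>

```typescript
// Resolve after the given number of milliseconds.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retry an async operation with exponential backoff.
// `shouldRetry` decides which errors are worth another attempt.
async function withBackoff(
  operation: () => unknown,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt >= maxAttempts || !shouldRetry(error)) throw error;
      // Delay doubles each round: baseDelayMs, then 2x, then 4x, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Stand-in for MiniMaxRateLimitError.
class FakeRateLimitError extends Error {}

// Fails twice with a rate-limit error, then succeeds on the third call.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new FakeRateLimitError("rate limited");
  return "ok";
};
```

<p>With the real SDK, the operation would be something like <code class="language-plaintext highlighter-rouge">() =&gt; client.synthesizeStream({ text: "hello" })</code> and the predicate <code class="language-plaintext highlighter-rouge">(e) =&gt; e instanceof MiniMaxRateLimitError</code>, so that auth and validation errors fail immediately instead of being retried.</p>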

<p>Beyond basic TTS, the SDK also covers voice cloning, voice design, async synthesis for long-form content, and voice management – the full Minimax Speech HTTP API surface. WebSocket support is planned for a future release.</p>

<h2 id="building-a-typescript-sdk-with-claude-code">Building a TypeScript SDK with Claude Code</h2>

<p>The <a href="https://github.com/williamchong/minimax-speech-ts"><code class="language-plaintext highlighter-rouge">minimax-speech-ts</code></a> library was built entirely using <a href="https://docs.anthropic.com/en/docs/build-with-claude/claude-code">Claude Code</a>, Anthropic’s CLI for Claude. The result is a ~1500-line TypeScript library with 79 tests, a single runtime dependency (<code class="language-plaintext highlighter-rouge">eventsource-parser</code>), dual ESM/CJS output, CI/CD, and TypeDoc-generated API documentation – built in about a day.</p>

<p>I’ve been developing a mental framework for effective AI-assisted coding that I think of as three iterative steps: <strong>Context</strong>, <strong>Limit</strong>, and <strong>Progress</strong>. The development of this SDK is a good concrete example of how it plays out in practice.</p>

<h3 id="step-1-context--feed-the-ai-the-right-information">Step 1: Context – Feed the AI the Right Information</h3>

<p>The quality of AI-generated code is directly bounded by the context it has access to. My initial prompt to Claude Code referenced:</p>

<ul>
  <li>The <a href="https://platform.minimax.io/docs/api-reference/speech-t2a-http">Minimax API documentation</a> for the complete API spec</li>
  <li>The <a href="https://github.com/MiniMax-AI/MiniMax-MCP-JS">MiniMax-MCP-JS</a> repository as a reference for how Minimax structures their APIs</li>
  <li>My <a href="/code/2025/06/21/handling-minimax-tts-api-basic-and-streaming.html">existing blog post</a> as context for the current “dumb” approach</li>
  <li>My other TypeScript libraries (<a href="https://github.com/williamchong/epubcheck-ts">epubcheck-ts</a>, <a href="https://github.com/williamchong/epub.ts">epub.ts</a>) as style reference</li>
</ul>

<p>The single most important factor was that Minimax provides their API documentation in markdown format (e.g. <a href="https://platform.minimax.io/docs/api-reference/speech-t2a-http.md"><code class="language-plaintext highlighter-rouge">speech-t2a-http.md</code></a>). Claude Code could fetch and read the full API specification directly, generating accurate type definitions, request/response handling, and validation logic without manual transcription. Pointing at my existing libraries meant it could match my preferences for project structure, tooling choices (tsup, vitest), and coding conventions without explicit configuration.</p>

<p>Without good context, the AI guesses. With good context, it builds.</p>

<h3 id="step-2-limit--set-boundaries-and-constraints">Step 2: Limit – Set Boundaries and Constraints</h3>

<p>Context alone produces code, but it doesn’t guarantee <em>correct</em> code. The limit step is about defining and enforcing constraints that keep the output within acceptable bounds.</p>

<p><strong>Tests as limits</strong>: Tests were written alongside the implementation from the very first feature commit. This meant Claude Code could run the 79 tests after each change and catch regressions immediately, covering all API methods, error classification, snake_case mapping, and streaming edge cases. Tests are the most concrete form of limit – they define exactly what “correct” means.</p>

<p><strong>Linting as limits</strong>: Adding ESLint with <code class="language-plaintext highlighter-rouge">typescript-eslint</code> in strict mode (including <code class="language-plaintext highlighter-rouge">no-explicit-any</code>) prevented the AI from taking shortcuts with loose types. TypeScript’s strict mode itself is a limit – it forces the generated code to handle nullability and type narrowing properly.</p>

<p><strong>Validation as limits</strong>: Client-side parameter validation (emotion/model compatibility, WAV format restrictions, required field checks) encodes domain-specific constraints that the AI learned from the API docs. The declarative <code class="language-plaintext highlighter-rouge">validate()</code> helper using <code class="language-plaintext highlighter-rouge">[condition, message]</code> tuples made these constraints explicit and testable.</p>
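
<p>For readers who have not seen the pattern, here is a minimal sketch of what a tuple-based validator can look like. It is my own reconstruction for illustration, so the SDK's actual <code class="language-plaintext highlighter-rouge">validate()</code> may differ in signature and semantics. In this version a truthy condition means the rule is violated, and <code class="language-plaintext highlighter-rouge">SketchRequest</code>, <code class="language-plaintext highlighter-rouge">validateRequest</code>, and the speed bounds are assumptions chosen for the example.</p>

```typescript
// Each rule pairs a "this is invalid" condition with the message to report.
type Rule = [condition: boolean, message: string];

// Throw on the first violated rule; do nothing if all rules pass.
function validate(rules: Rule[]): void {
  for (const [violated, message] of rules) {
    if (violated) throw new Error(message);
  }
}

// Hypothetical request shape, reduced to two fields.
interface SketchRequest {
  text?: string;
  speed?: number;
}

// All constraints for a request live in one flat, readable list.
function validateRequest(request: SketchRequest): void {
  validate([
    [!request.text, "text is required"],
    [
      request.speed !== undefined && (request.speed < 0.5 || request.speed > 2),
      "speed must be between 0.5 and 2",
    ],
  ]);
}
```

<p>Because every rule is a plain data entry, a test can assert on each message individually, and adding a constraint from the API docs is a one-line change.</p>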

<h3 id="step-3-progress--review-fix-and-ship">Step 3: Progress – Review, Fix, and Ship</h3>

<p>With context and limits in place, the final step is iterating toward completion: expanding scope, catching remaining issues, and preparing for release.</p>

<p><strong>Expanding scope</strong>: Starting from core <code class="language-plaintext highlighter-rouge">synthesize()</code> and <code class="language-plaintext highlighter-rouge">synthesizeStream()</code> methods, I asked Claude Code to expand to the full API surface – async synthesis, file upload, voice cloning, voice design, voice management. The limits from step 2 ensured each expansion didn’t break existing functionality.</p>

<p><strong><code class="language-plaintext highlighter-rouge">/review</code> for catching subtle bugs</strong>: Claude Code has a built-in <code class="language-plaintext highlighter-rouge">/review</code> command that performs a code review on the current codebase. Running <code class="language-plaintext highlighter-rouge">/review</code> after the main implementation caught several issues: a TypeScript overload ordering bug (where the general signature shadowed the specific <code class="language-plaintext highlighter-rouge">outputFormat: 'url'</code> overload), missing <code class="language-plaintext highlighter-rouge">traceId</code> propagation in error paths, and the need to URL-encode the <code class="language-plaintext highlighter-rouge">groupId</code> query parameter. These are exactly the kinds of subtle issues that slip through during development but get caught by a systematic review pass.</p>
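
<p>The overload ordering issue is worth a short illustration, because TypeScript resolves overloads in declaration order and picks the first signature that matches. The names below (<code class="language-plaintext highlighter-rouge">synthesizeSketch</code>, <code class="language-plaintext highlighter-rouge">UrlResult</code>, <code class="language-plaintext highlighter-rouge">HexResult</code>) are hypothetical and much simpler than the SDK's real signatures. The sketch only shows the corrected ordering, with the specific overload declared first.</p>

```typescript
interface UrlResult { kind: "url"; url: string }
interface HexResult { kind: "hex"; audio: string }

// Specific overload first: a literal "url" format yields a UrlResult.
function synthesizeSketch(options: { outputFormat: "url" }): UrlResult;
// General overload second: anything else yields the union.
function synthesizeSketch(options: { outputFormat?: "hex" | "url" }): UrlResult | HexResult;
// Implementation signature (not visible to callers).
function synthesizeSketch(options: { outputFormat?: "hex" | "url" }): UrlResult | HexResult {
  if (options.outputFormat === "url") {
    return { kind: "url", url: "https://example.invalid/audio.mp3" };
  }
  return { kind: "hex", audio: "00ff" };
}

// Resolves to the first overload, so `.url` is available without narrowing.
const viaUrl = synthesizeSketch({ outputFormat: "url" });
const link: string = viaUrl.url;

// Falls through to the general overload.
const viaHex = synthesizeSketch({});
```

<p>If the two overload declarations were swapped, the general signature would match the <code class="language-plaintext highlighter-rouge">"url"</code> call first, <code class="language-plaintext highlighter-rouge">viaUrl</code> would be typed as the union, and the <code class="language-plaintext highlighter-rouge">viaUrl.url</code> access would no longer compile.</p>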

<p><strong><code class="language-plaintext highlighter-rouge">AGENTS.md</code> and <code class="language-plaintext highlighter-rouge">.github/copilot-instructions.md</code> for ongoing progress</strong>: The library includes an <a href="https://github.com/williamchong/minimax-speech-ts/blob/master/AGENTS.md"><code class="language-plaintext highlighter-rouge">AGENTS.md</code></a> file that documents the architecture, key patterns, test patterns, and build commands. This file is symlinked as <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> so that Claude Code automatically picks it up as project instructions, while other AI coding tools (GitHub Copilot, Cursor, etc.) can read it via <code class="language-plaintext highlighter-rouge">AGENTS.md</code> – the emerging convention for AI agent context files. The symlink trick avoids maintaining two copies of the same content. The repository also has a <a href="https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions"><code class="language-plaintext highlighter-rouge">.github/copilot-instructions.md</code></a> file, generated by the GitHub Copilot agent itself from the existing codebase and <code class="language-plaintext highlighter-rouge">AGENTS.md</code>. GitHub Copilot automatically incorporates these instructions when performing <a href="https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions">code reviews on pull requests</a>, so its PR reviews are aware of the project’s specific patterns (e.g. “validate before fetch”, “camelCase public API with snake_case wire format”) rather than applying generic heuristics. These files feed back into step 1 – they become context for future AI-assisted development, creating a virtuous cycle.</p>

<p><strong>Shipping</strong>: Finally, Claude Code generated the README, CI workflow (Node 18/20/22), TypeDoc configuration, license, and npm metadata. Throughout the process, it maintained consistency in patterns like the camelCase-to-snake_case conversion and the error hierarchy.</p>
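
<p>To give a feel for the camelCase-to-snake_case pattern mentioned above, here is a minimal sketch of a recursive key converter. It is a hypothetical illustration, not the SDK's actual implementation: <code class="language-plaintext highlighter-rouge">camelToSnake</code> and <code class="language-plaintext highlighter-rouge">toWireFormat</code> are names I made up, and the sketch assumes the input is plain JSON-like data.</p>

```typescript
// Turn "excludeAggregatedAudio" into "exclude_aggregated_audio".
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (letter) => "_" + letter.toLowerCase());
}

// Recursively rename the keys of plain objects and arrays.
// Primitive values pass through unchanged.
function toWireFormat(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => toWireFormat(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: unknown } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[camelToSnake(key)] = toWireFormat(inner);
    }
    return out;
  }
  return value;
}

// The camelCase options from the streaming example, converted for the wire.
const wire = toWireFormat({
  voiceSetting: { voiceId: "Chinese (Mandarin)_Warm_Bestie" },
  streamOptions: { excludeAggregatedAudio: true },
});
```

<p>Only keys are rewritten; string values such as the voice ID are left untouched, which matters because they contain capital letters of their own.</p>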

<h3 id="the-takeaway">The Takeaway</h3>

<p>The <strong>Context → Limit → Progress</strong> cycle isn’t a one-shot sequence – it’s iterative. Each round of progress reveals new context needs and new constraints to enforce. The entire library went from empty directory to published npm package in a day, but the quality came from deliberately feeding the right context, setting clear limits, and iterating through review. I’ll explore this framework in more depth in a future post.</p>

<h2 id="conclusion">Conclusion</h2>

<p>The three improvements covered in this post each address a different layer of the integration:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">exclude_aggregated_audio</code></strong> eliminates the need to filter the summary block at the API layer</li>
  <li><strong><code class="language-plaintext highlighter-rouge">eventsource-parser</code></strong> replaces manual SSE parsing with a spec-compliant library</li>
  <li><strong><code class="language-plaintext highlighter-rouge">minimax-speech-ts</code></strong> wraps the entire API into a typed SDK with streaming, error handling, and camelCase ergonomics</li>
</ul>

<p>If you’re integrating Minimax TTS into a Node.js application, using the SDK is the most straightforward path. If you need more control or are integrating with a different runtime, the combination of <code class="language-plaintext highlighter-rouge">exclude_aggregated_audio</code> and <code class="language-plaintext highlighter-rouge">eventsource-parser</code> provides a clean foundation.</p>

<h2 id="additional-resources">Additional Resources</h2>

<h3 id="related-articles">Related Articles</h3>

<ul>
  <li><a href="/2025/06/22/handling-minimax-tts-api-basic-and-streaming/">Handling Minimax TTS API: Basic HTTP and Streaming</a> (Previous post)</li>
  <li><a href="/2023/10/13/convert-google-text-to-speech-to-nodejs-stream/">Convert Google text to speech API result to HTTP streamed response</a></li>
  <li><a href="/2023/10/15/convert-azure-text-to-speech-to-nodejs-stream/">Convert Azure text to speech API result to HTTP streamed response</a></li>
  <li><a href="/2025/09/02/convert-azure-text-to-speech-to-web-readable-stream/">Convert Azure text to speech API result to a web ReadableStream</a></li>
  <li><a href="/2025/06/06/convert-aws-polly-to-nodejs-stream/">Convert AWS Polly text to speech API result to HTTP streamed response</a></li>
</ul>

<h3 id="links">Links</h3>

<ul>
  <li><a href="https://github.com/williamchong/minimax-speech-ts">minimax-speech-ts on GitHub</a></li>
  <li><a href="https://www.npmjs.com/package/minimax-speech-ts">minimax-speech-ts on npm</a></li>
  <li><a href="https://williamchong.github.io/minimax-speech-ts/">minimax-speech-ts API Reference</a></li>
  <li><a href="https://github.com/rexxars/eventsource-parser">eventsource-parser on GitHub</a></li>
  <li><a href="https://platform.minimax.io/docs/api-reference/speech-t2a-http">Minimax TTS API Documentation</a></li>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/claude-code">Claude Code</a></li>
</ul>

<hr />

<p><em>By the way – guess what tool and process I used to write this blog post.</em></p>]]></content><author><name>William Chong</name></author><category term="code" /><category term="minimax" /><category term="text-to-speech" /><category term="tts" /><category term="api-integration" /><category term="server-sent-events" /><category term="sse" /><category term="streaming" /><category term="nodejs" /><category term="typescript" /><category term="sdk" /><category term="eventsource-parser" /><category term="claude-code" /><category term="ai-coding" /><category term="vibe-coding" /><summary type="html"><![CDATA[Follow-up to Minimax TTS streaming integration. How I vibe-coded minimax-speech-ts from API docs to npm package in a day using Claude Code, and the Context-Limit-Progress framework for AI-assisted coding.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2026-02-13-minimax-tts-api-improvements-eventsource-parser-and-sdk/cover.png" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2026-02-13-minimax-tts-api-improvements-eventsource-parser-and-sdk/cover.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Convert Azure text to speech API result to a web ReadableStream</title><link href="https://blog.williamchong.cloud/code/2025/09/01/convert-azure-text-to-speech-to-web-readable-stream.html" rel="alternate" type="text/html" title="Convert Azure text to speech API result to a web ReadableStream" /><published>2025-09-01T18:00:00+00:00</published><updated>2025-09-01T18:00:00+00:00</updated><id>https://blog.williamchong.cloud/code/2025/09/01/convert-azure-text-to-speech-to-web-readable-stream</id><content type="html" xml:base="https://blog.williamchong.cloud/code/2025/09/01/convert-azure-text-to-speech-to-web-readable-stream.html"><![CDATA[<p><img 
src="/assets/images/2025-09-02-convert-azure-text-to-speech-to-web-readable-stream/cover.png" alt="Azure text to speech API" /></p>

<h2 id="introduction">Introduction</h2>

<p><a href="/code/2023/10/14/convert-azure-text-to-speech-to-nodejs-stream.html">Previously</a>, we discussed the usage of the <a href="https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/text-to-speech">Azure Text-to-Speech API</a> and how to implement real-time audio streaming by converting the API’s output into a stream using Node.js PassThrough. In this tutorial, we will focus on converting the API’s output into a web-compatible <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream">ReadableStream</a>.</p>

<h2 id="why-web-readable-stream">Why web (readable) stream?</h2>

<p>While the Node.js stream module would work for most API use cases, using web streams provides some additional benefits:</p>

<ul>
  <li>
    <p><a href="https://developer.mozilla.org/en-US/docs/Web/API/Streams_API">Web streams</a> are the modern standard across all JavaScript environments. This means you can use the same stream handlers in server and client code. For example, <a href="https://developer.mozilla.org/en-US/docs/Web/API/TransformStream">TransformStream</a> handlers can move between API and browser code without modification, which gives you more flexibility when the design calls for it.</p>
  </li>
  <li>
    <p>Compatible with the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API">Fetch API</a>. A Fetch API response exposes its body as a <code class="language-plaintext highlighter-rouge">ReadableStream</code>, so using web streams allows for seamless integration with the Fetch API and other web platform features.</p>
  </li>
  <li>
    <p>Unified interface across different TTS providers: By using web streams, we can create a consistent interface for different TTS providers’ SDK or API, since output formats accepted in JavaScript can be converted to a <code class="language-plaintext highlighter-rouge">ReadableStream</code> one way or another.</p>
  </li>
</ul>
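
<p>To make the first point more concrete, here is a minimal sketch of a <code class="language-plaintext highlighter-rouge">TransformStream</code> handler that should run unchanged in Node.js and in the browser. The helper name <code class="language-plaintext highlighter-rouge">createByteCounter</code> and its callback are hypothetical and only serve as an illustration; they are not part of any SDK mentioned in this post.</p>

```typescript
// Hypothetical helper (not from any SDK): passes audio chunks through
// untouched while counting how many bytes have flowed by.
function createByteCounter(
  onTotal: (totalBytes: number) => void
): TransformStream<Uint8Array, Uint8Array> {
  let totalBytes = 0;
  return new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      totalBytes += chunk.byteLength;
      // Forward the chunk as-is to the readable side
      controller.enqueue(chunk);
    },
    flush() {
      // Called once the writable side is closed, before the readable side ends
      onTotal(totalBytes);
    },
  });
}
```

<p>Because the handler only relies on web stream primitives, the same function could be applied with <code class="language-plaintext highlighter-rouge">pipeThrough</code> on a server-side TTS stream or on a <code class="language-plaintext highlighter-rouge">fetch</code> response body in the browser.</p>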

<h2 id="pull-or-push">Pull or Push?</h2>

<p>Azure supports <a href="https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/pushaudiooutputstream"><code class="language-plaintext highlighter-rouge">PushAudioOutputStream</code></a> and <a href="https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/pullaudiooutputstream"><code class="language-plaintext highlighter-rouge">PullAudioOutputStream</code></a>. Both can be used to create a web-compatible <code class="language-plaintext highlighter-rouge">ReadableStream</code>. In the previous article we used <code class="language-plaintext highlighter-rouge">PushAudioOutputStream</code>, but let’s revisit the technical differences between the two.</p>

<ul>
  <li>
    <p><a href="https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/pushaudiooutputstream"><code class="language-plaintext highlighter-rouge">PushAudioOutputStream</code></a>: To create <code class="language-plaintext highlighter-rouge">PushAudioOutputStream</code>, we need to provide <a href="https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/pushaudiooutputstreamcallback"><code class="language-plaintext highlighter-rouge">PushAudioOutputStreamCallback</code></a>, which has to implement <code class="language-plaintext highlighter-rouge">function write(dataBuffer: ArrayBuffer)</code> and <code class="language-plaintext highlighter-rouge">function close()</code>.</p>
  </li>
  <li>
    <p><a href="https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/pullaudiooutputstream"><code class="language-plaintext highlighter-rouge">PullAudioOutputStream</code></a>: To create <code class="language-plaintext highlighter-rouge">PullAudioOutputStream</code>, we need to provide <a href="https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/pullaudiooutputstreamcallback"><code class="language-plaintext highlighter-rouge">PullAudioOutputStreamCallback</code></a>, which has to implement <code class="language-plaintext highlighter-rouge">function read(dataBuffer: ArrayBuffer): number</code> and <code class="language-plaintext highlighter-rouge">function close()</code>.</p>
  </li>
</ul>

<p>For our use case, we need to mimic a <code class="language-plaintext highlighter-rouge">fetch</code> streamed response, which returns a web standard <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream"><code class="language-plaintext highlighter-rouge">ReadableStream</code></a>. Comparing the interface of <code class="language-plaintext highlighter-rouge">ReadableStream</code>, which requires <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/ReadableStream#start"><code class="language-plaintext highlighter-rouge">start</code></a>, <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/ReadableStream#pull"><code class="language-plaintext highlighter-rouge">pull</code></a>, and <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/ReadableStream#cancel"><code class="language-plaintext highlighter-rouge">cancel</code></a> methods, we can see that <code class="language-plaintext highlighter-rouge">PullAudioOutputStream</code> is more aligned with the <code class="language-plaintext highlighter-rouge">ReadableStream</code> interface, with its <code class="language-plaintext highlighter-rouge">read</code> method corresponding to the <code class="language-plaintext highlighter-rouge">pull</code> method in <code class="language-plaintext highlighter-rouge">ReadableStream</code>.</p>

<h2 id="implementation">Implementation</h2>

<p>For implementation, first we’ll set up a <a href="https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/speechsynthesizer">speech synthesizer</a> similar to our <a href="/code/2023/10/14/convert-azure-text-to-speech-to-nodejs-stream.html">previous article</a>. Note the only difference is that we will be setting up a <code class="language-plaintext highlighter-rouge">PullAudioOutputStream</code> here.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">outputFormat</span> <span class="o">=</span>
  <span class="nx">sdk</span><span class="p">.</span><span class="nx">SpeechSynthesisOutputFormat</span><span class="p">.</span><span class="nx">Audio16Khz128KBitRateMonoMp3</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">pullStream</span> <span class="o">=</span> <span class="nx">sdk</span><span class="p">.</span><span class="nx">PullAudioOutputStream</span><span class="p">.</span><span class="nx">create</span><span class="p">();</span>
<span class="kd">const</span> <span class="nx">audioConfig</span> <span class="o">=</span> <span class="nx">sdk</span><span class="p">.</span><span class="nx">AudioConfig</span><span class="p">.</span><span class="nx">fromStreamOutput</span><span class="p">(</span><span class="nx">pullStream</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">speechConfig</span> <span class="o">=</span> <span class="nx">sdk</span><span class="p">.</span><span class="nx">SpeechConfig</span><span class="p">.</span><span class="nx">fromSubscription</span><span class="p">(</span>
  <span class="nx">azureSubscriptionKey</span><span class="p">,</span>
  <span class="nx">azureServiceRegion</span>
<span class="p">);</span>
<span class="nx">speechConfig</span><span class="p">.</span><span class="nx">speechSynthesisVoiceName</span> <span class="o">=</span> <span class="nx">voiceName</span><span class="p">;</span>
<span class="nx">speechConfig</span><span class="p">.</span><span class="nx">speechSynthesisOutputFormat</span> <span class="o">=</span> <span class="nx">outputFormat</span><span class="p">;</span>

<span class="kd">const</span> <span class="nx">synthesizer</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">sdk</span><span class="p">.</span><span class="nx">SpeechSynthesizer</span><span class="p">(</span><span class="nx">speechConfig</span><span class="p">,</span> <span class="nx">audioConfig</span><span class="p">);</span>
<span class="kd">function</span> <span class="nx">speakTextAsync</span><span class="p">(</span><span class="nx">synthesizer</span><span class="p">,</span> <span class="nx">text</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="k">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nx">synthesizer</span><span class="p">.</span><span class="nx">speakTextAsync</span><span class="p">(</span>
      <span class="nx">text</span><span class="p">,</span>
      <span class="kd">function</span> <span class="p">(</span><span class="nx">result</span><span class="p">)</span> <span class="p">{</span>
        <span class="nx">synthesizer</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
        <span class="nx">resolve</span><span class="p">(</span><span class="nx">result</span><span class="p">);</span>
      <span class="p">},</span>
      <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
        <span class="nx">synthesizer</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
        <span class="nx">reject</span><span class="p">(</span><span class="nx">err</span><span class="p">);</span>
      <span class="p">}</span>
    <span class="p">);</span>
  <span class="p">});</span>
<span class="p">}</span>
<span class="k">await</span> <span class="nx">speakTextAsync</span><span class="p">(</span><span class="nx">synthesizer</span><span class="p">,</span> <span class="nx">text</span><span class="p">);</span>
</code></pre></div></div>

<p>Then we’ll create a <code class="language-plaintext highlighter-rouge">ReadableStream</code> from the <code class="language-plaintext highlighter-rouge">PullAudioOutputStream</code>. We only need to implement <code class="language-plaintext highlighter-rouge">pull</code> and <code class="language-plaintext highlighter-rouge">cancel</code>. Since the read method requires an <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer"><code class="language-plaintext highlighter-rouge">ArrayBuffer</code></a>, we’ll create a new buffer for each read operation.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">readableStream</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">ReadableStream</span><span class="p">({</span>
  <span class="k">async</span> <span class="nx">pull</span><span class="p">(</span><span class="nx">controller</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">try</span> <span class="p">{</span>
      <span class="kd">const</span> <span class="nx">buffer</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">ArrayBuffer</span><span class="p">(</span><span class="mi">1024</span><span class="p">);</span>
      <span class="kd">const</span> <span class="nx">bytesRead</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">pullStream</span><span class="p">.</span><span class="nx">read</span><span class="p">(</span><span class="nx">buffer</span><span class="p">);</span>

      <span class="k">if</span> <span class="p">(</span><span class="nx">bytesRead</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="kd">const</span> <span class="nx">chunk</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Uint8Array</span><span class="p">(</span><span class="nx">buffer</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">bytesRead</span><span class="p">);</span>
        <span class="nx">controller</span><span class="p">.</span><span class="nx">enqueue</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span>
      <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="nx">pullStream</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
        <span class="nx">controller</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
      <span class="p">}</span>
    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="dl">"</span><span class="s2">[Speech] Error reading from pull stream:</span><span class="dl">"</span><span class="p">,</span> <span class="nx">error</span><span class="p">);</span>
      <span class="nx">controller</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">error</span><span class="p">);</span>
      <span class="nx">pullStream</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
    <span class="p">}</span>
  <span class="p">},</span>
  <span class="nx">cancel</span><span class="p">()</span> <span class="p">{</span>
    <span class="nx">pullStream</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
  <span class="p">},</span>
<span class="p">});</span>

<span class="k">return</span> <span class="nx">readableStream</span><span class="p">;</span>
</code></pre></div></div>

<p>One minor optimization we can make is to use the <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/ReadableStream#type"><code class="language-plaintext highlighter-rouge">type</code></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/ReadableStream#autoallocatechunksize"><code class="language-plaintext highlighter-rouge">autoAllocateChunkSize</code></a> options on the <code class="language-plaintext highlighter-rouge">ReadableStream</code>. This allows the stream to reuse buffers from existing allocations or allocate new ones as needed to minimize memory usage and improve performance.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">readableStream</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">ReadableStream</span><span class="p">({</span>
  <span class="na">type</span><span class="p">:</span> <span class="dl">"</span><span class="s2">bytes</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">autoAllocateChunkSize</span><span class="p">:</span> <span class="mi">1024</span><span class="p">,</span>
  <span class="k">async</span> <span class="nx">pull</span><span class="p">(</span><span class="nx">controller</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">try</span> <span class="p">{</span>
      <span class="kd">const</span> <span class="nx">byobRequest</span> <span class="o">=</span> <span class="nx">controller</span><span class="p">.</span><span class="nx">byobRequest</span><span class="p">;</span>
      <span class="k">if</span> <span class="p">(</span><span class="nx">byobRequest</span><span class="p">?.</span><span class="nx">view</span><span class="p">)</span> <span class="p">{</span>
        <span class="kd">const</span> <span class="nx">buffer</span> <span class="o">=</span> <span class="nx">byobRequest</span><span class="p">.</span><span class="nx">view</span><span class="p">.</span><span class="nx">buffer</span> <span class="k">as</span> <span class="nb">ArrayBuffer</span><span class="p">;</span>
        <span class="kd">const</span> <span class="nx">bytesRead</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">pullStream</span><span class="p">.</span><span class="nx">read</span><span class="p">(</span><span class="nx">buffer</span><span class="p">);</span>

        <span class="k">if</span> <span class="p">(</span><span class="nx">bytesRead</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
          <span class="nx">byobRequest</span><span class="p">.</span><span class="nx">respond</span><span class="p">(</span><span class="nx">bytesRead</span><span class="p">);</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
          <span class="nx">pullStream</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
          <span class="nx">controller</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
        <span class="p">}</span>
      <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="c1">// Fallback: allocate our own buffer</span>
        <span class="kd">const</span> <span class="nx">buffer</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">ArrayBuffer</span><span class="p">(</span><span class="mi">1024</span><span class="p">);</span>
        <span class="kd">const</span> <span class="nx">bytesRead</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">pullStream</span><span class="p">.</span><span class="nx">read</span><span class="p">(</span><span class="nx">buffer</span><span class="p">);</span>

        <span class="k">if</span> <span class="p">(</span><span class="nx">bytesRead</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
          <span class="kd">const</span> <span class="nx">chunk</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Uint8Array</span><span class="p">(</span><span class="nx">buffer</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">bytesRead</span><span class="p">);</span>
          <span class="nx">controller</span><span class="p">.</span><span class="nx">enqueue</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
          <span class="nx">pullStream</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
          <span class="nx">controller</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
        <span class="p">}</span>
      <span class="p">}</span>
    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="dl">"</span><span class="s2">[Speech] Error reading from pull stream:</span><span class="dl">"</span><span class="p">,</span> <span class="nx">error</span><span class="p">);</span>
      <span class="nx">controller</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">error</span><span class="p">);</span>
      <span class="nx">pullStream</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
    <span class="p">}</span>
  <span class="p">},</span>
  <span class="nx">cancel</span><span class="p">()</span> <span class="p">{</span>
    <span class="nx">pullStream</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
  <span class="p">},</span>
<span class="p">});</span>

<span class="k">return</span> <span class="nx">readableStream</span><span class="p">;</span>
</code></pre></div></div>

<p>Then we can proceed to read from the <code class="language-plaintext highlighter-rouge">readableStream</code> as needed.</p>
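
<p>As one possible way of consuming the stream, the sketch below drains a byte stream into a single <code class="language-plaintext highlighter-rouge">Uint8Array</code> using the standard reader interface. The helper name <code class="language-plaintext highlighter-rouge">collectStream</code> is hypothetical; in a real API handler you would more likely hand the stream straight to the response instead of buffering it.</p>

```typescript
// Hypothetical helper: reads a web ReadableStream to the end and
// concatenates all chunks into one Uint8Array.
async function collectStream(
  stream: ReadableStream<Uint8Array>
): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let totalLength = 0;
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      chunks.push(value);
      totalLength += value.byteLength;
    }
  } finally {
    // Release the lock so the stream is not left locked if reading fails
    reader.releaseLock();
  }

  // Copy every chunk into one contiguous buffer
  const result = new Uint8Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return result;
}
```

<p>Buffering like this gives up the latency benefit of streaming, so it is mostly useful for tests or for saving the audio to a file.</p>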

<h2 id="conclusion">Conclusion</h2>

<p>In this blog post, we explored how to convert Azure Text-to-Speech output into a web-readable stream. By leveraging the <a href="https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/pullaudiooutputstream"><code class="language-plaintext highlighter-rouge">PullAudioOutputStream</code></a> and the <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream">ReadableStream API</a>, we can create a flexible and efficient solution for handling audio data in web applications. This approach not only improves performance but also enhances maintainability by providing a clear separation of concerns between the TTS provider’s interface and the API response format.</p>]]></content><author><name>William Chong</name></author><category term="code" /><summary type="html"><![CDATA[Learn how to convert Azure Text-to-Speech API output into a web-compatible ReadableStream using PullAudioOutputStream for better performance and cross-platform compatibility.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2025-09-02-convert-azure-text-to-speech-to-web-readable-stream/cover.png" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2025-09-02-convert-azure-text-to-speech-to-web-readable-stream/cover.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Handling Minimax TTS API: Basic HTTP and Streaming</title><link href="https://blog.williamchong.cloud/code/2025/06/21/handling-minimax-tts-api-basic-and-streaming.html" rel="alternate" type="text/html" title="Handling Minimax TTS API: Basic HTTP and Streaming" /><published>2025-06-21T18:00:00+00:00</published><updated>2025-06-21T18:00:00+00:00</updated><id>https://blog.williamchong.cloud/code/2025/06/21/handling-minimax-tts-api-basic-and-streaming</id><content type="html" 
xml:base="https://blog.williamchong.cloud/code/2025/06/21/handling-minimax-tts-api-basic-and-streaming.html"><![CDATA[<p><img src="/assets/images/2025-06-22-handling-minimax-tts-api-basic-and-streaming/cover.png" alt="Minimax TTS API Integration" /></p>

<blockquote>
  <p><strong>Update</strong>: This post has a follow-up — <a href="/code/2026/02/12/minimax-tts-api-improvements-eventsource-parser-and-sdk.html">Minimax TTS API Update: let’s vibe a TypeScript SDK</a> — covering <code class="language-plaintext highlighter-rouge">eventsource-parser</code> for spec-compliant SSE parsing, the new <code class="language-plaintext highlighter-rouge">exclude_aggregated_audio</code> option, and the <a href="https://github.com/williamchong/minimax-speech-ts"><code class="language-plaintext highlighter-rouge">minimax-speech-ts</code></a> TypeScript SDK.</p>
</blockquote>

<h2 id="introduction">Introduction</h2>

<p>Continuing our series on integrating TTS (Text-to-Speech) API services from <a href="/2023/10/13/convert-google-text-to-speech-to-nodejs-stream/">Google</a>, <a href="/2023/10/15/convert-azure-text-to-speech-to-nodejs-stream/">Azure</a> and <a href="/2025/06/06/convert-aws-polly-to-nodejs-stream/">AWS</a>, the latest attempt in my TTS exploration is <a href="https://minimax.io">Minimax</a>. Minimax is a relatively new company and, unlike the previous providers, it is not a full-blown cloud service provider but rather a specialized API service for AI, video, and audio-related tasks.</p>

<p>Minimax provides three ways to interact with their <a href="https://www.minimax.io/platform/document/T2A%20V2?key=66719005a427f0c8a5701643#TJeyxusWAUP0l3tX67brbAyE">TTS API</a>: WebSocket, HTTP and MCP. Since I’m working on a server application, I’ll focus on the HTTP API, which offers two modes: streaming and non-streaming. In this post I’ll cover both approaches, with the goal of implementing a complete streaming API with low latency and high performance.</p>

<h2 id="integrating-minimax-tts-api">Integrating Minimax TTS API</h2>

<p>Unfortunately, Minimax doesn’t provide an official SDK, so we’ll use the HTTP API directly. The API is well documented, and we can use any HTTP client to interact with it. In this example, I’ll use the <code class="language-plaintext highlighter-rouge">$fetch</code> function from Nuxt 3, but you can use any HTTP client of your choice.</p>

<h3 id="the-non-streaming-api">The Non-streaming API</h3>

<p>To begin, we’ll implement the basic TTS API integration using the non-streaming API. This means we wait for the audio to be completely generated, then download the entire audio as a buffer. The implementation is simple and straightforward:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">$fetch</span><span class="p">(</span>
  <span class="s2">`https://api.minimaxi.chat/v1/t2a_v2?GroupId=</span><span class="p">${</span><span class="nx">minimaxGroupId</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span>
  <span class="p">{</span>
    <span class="na">method</span><span class="p">:</span> <span class="dl">"</span><span class="s2">POST</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">headers</span><span class="p">:</span> <span class="p">{</span>
      <span class="na">Authorization</span><span class="p">:</span> <span class="s2">`Bearer </span><span class="p">${</span><span class="nx">minimaxAPIKey</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span>
    <span class="p">},</span>
    <span class="na">body</span><span class="p">:</span> <span class="p">{</span>
      <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">your text</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">speech-02-hd</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">voice_setting</span><span class="p">:</span> <span class="p">{</span>
        <span class="na">voice_id</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Chinese (Mandarin)_Warm_Bestie</span><span class="dl">"</span><span class="p">,</span>
        <span class="na">speed</span><span class="p">:</span> <span class="mf">0.95</span><span class="p">,</span>
        <span class="na">pitch</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span>
        <span class="na">emotion</span><span class="p">:</span> <span class="dl">"</span><span class="s2">neutral</span><span class="dl">"</span><span class="p">,</span>
      <span class="p">},</span>
      <span class="na">language_boost</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Chinese,Yue</span><span class="dl">"</span><span class="p">,</span>
    <span class="p">},</span>
  <span class="p">}</span>
<span class="p">);</span>
<span class="kd">const</span> <span class="nx">audioHex</span> <span class="o">=</span> <span class="nx">response</span><span class="p">.</span><span class="nx">data</span><span class="p">.</span><span class="nx">audio</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">audioBuffer</span> <span class="o">=</span> <span class="nx">Buffer</span><span class="p">.</span><span class="k">from</span><span class="p">(</span><span class="nx">audioHex</span><span class="p">,</span> <span class="dl">"</span><span class="s2">hex</span><span class="dl">"</span><span class="p">);</span>
</code></pre></div></div>

<p>Here we’re calling the <a href="https://www.minimax.io/news/minimax-speech-02"><code class="language-plaintext highlighter-rouge">speech-02-hd</code> model</a> with a preference for <code class="language-plaintext highlighter-rouge">Chinese,Yue</code>. This is required in my case because Mandarin Chinese and Cantonese text often look very similar and the target spoken language cannot be reliably determined from the text alone. High-quality Cantonese voice support is one of the unique strengths of Minimax. Note that even if the voice is set to <code class="language-plaintext highlighter-rouge">Chinese (Mandarin)_Warm_Bestie</code>, the audio can still be generated in Cantonese, and any English inside the text will still be pronounced as English. This behaviour is similar to the Azure multilingual voices and is very convenient when handling multilingual texts.</p>

<h3 id="streaming-the-non-streaming-api">Streaming the Non-streaming API</h3>

<p>Even though the non-streaming API is not designed for streaming, we can still convert the complete received response into a stream. This allows users to start playing the audio as soon as our API starts sending the response, rather than waiting for the entire audio to be downloaded in the browser.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// continue from the previous code snippet</span>
<span class="kd">const</span> <span class="nx">audioBuffer</span> <span class="o">=</span> <span class="nx">Buffer</span><span class="p">.</span><span class="k">from</span><span class="p">(</span><span class="nx">audioHex</span><span class="p">,</span> <span class="dl">"</span><span class="s2">hex</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">stream</span> <span class="o">=</span> <span class="nx">Readable</span><span class="p">.</span><span class="k">from</span><span class="p">(</span><span class="nx">audioBuffer</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">sendStream</span><span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="nx">stream</span><span class="p">);</span>
</code></pre></div></div>
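
<p>One detail worth knowing: <code class="language-plaintext highlighter-rouge">Readable.from()</code> does not iterate a <code class="language-plaintext highlighter-rouge">Buffer</code>, so the snippet above emits the whole audio as one large chunk. If smaller chunks are preferred, a generator can slice the buffer first. The sketch below is only an illustration: <code class="language-plaintext highlighter-rouge">chunkBytes</code> and the 16 KiB default are my own assumptions, not something the Minimax API or Nuxt requires.</p>

```typescript
// Hypothetical helper: yields fixed-size views over `data` without copying.
// The name and the 16 KiB default chunk size are assumptions for illustration.
function* chunkBytes(
  data: Uint8Array,
  chunkSize: number = 16 * 1024
): Generator<Uint8Array> {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray() shares memory with `data`, so no bytes are duplicated
    yield data.subarray(offset, offset + chunkSize);
  }
}

// 40000 bytes split into two full chunks and one shorter tail
const sample = new Uint8Array(40000);
const chunks = Array.from(chunkBytes(sample));
console.log(chunks.map((c) => c.length)); // [ 16384, 16384, 7232 ]
```

<p>Since a <code class="language-plaintext highlighter-rouge">Buffer</code> is a <code class="language-plaintext highlighter-rouge">Uint8Array</code>, the result can be passed on as <code class="language-plaintext highlighter-rouge">Readable.from(chunkBytes(audioBuffer))</code>. Whether this makes a noticeable difference depends on the deployment, so treat it as an option rather than a requirement.</p>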

<p>Another optimization would be to receive the TTS API response as a stream and pipe it to the HTTP response. However, we would still be limited by the fact that the TTS API does not start responding until the entire audio is generated. A much better approach is to use the streamed version of the API.</p>

<h3 id="implement-real-streaming-using-the-streamed-api">Implement Real Streaming Using the Streamed API</h3>

<p>According to the <a href="https://www.minimax.io/platform/document/T2A%20V2?key=66719005a427f0c8a5701643#TJeyxusWAUP0l3tX67brbAyE">Minimax documentation</a>, the streamed API has the following response format:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>//end
{
    "data":{
        "audio":"hex audio_chunk1 + hex audio_chunk2 + hex audio_chunk3",
        "status":2
    },
     "extra_info":{
      ...
    },
    "trace_id":"01b8bf9bb7433cc75c18eee6cfa8fe21",
    "base_resp":{
        "status_code":0,
        "status_msg":""
    }
}
//third chunk
{
    "data":{
        "audio":"hex audio_chunk3",
        "status":1,
    },
    "trace_id":"01b8bf9bb7433cc75c18eee6cfa8fe21",
    "base_resp":{
        "status_code":0,
        "status_msg":""
    }
}
//second chunk
{
    "data":{
        "audio":"hex audio_chunk2",
        "status":1,
    },
    "trace_id":"01b8bf9bb7433cc75c18eee6cfa8fe21",
    "base_resp":{
        "status_code":0,
        "status_msg":""
    }
}
//first chunk
{
    "data":{
        "audio":"hex audio_chunk1",
        "status":1,
    },
    "trace_id":"01b8bf9bb7433cc75c18eee6cfa8fe21",
    "base_resp":{
        "status_code":0,
        "status_msg":""
    }
}
</code></pre></div></div>

<p>At first glance, this response format is not trivial to parse. Due to the nature of network streaming, we cannot assume that each JSON object arrives in a single chunk, so we need to keep reading the stream until we have a complete JSON object. Since a single read might also contain multiple JSON objects, we must track the matching of <code class="language-plaintext highlighter-rouge">{</code> and <code class="language-plaintext highlighter-rouge">}</code> braces while storing the response in a buffer. We also need to trim away any part of the next JSON object that might be included in the current buffer to ensure <code class="language-plaintext highlighter-rouge">JSON.parse()</code> doesn’t throw an error.</p>
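
<p>To make that concrete, here is a minimal sketch of what such brace tracking could look like if the response really were bare concatenated JSON. <code class="language-plaintext highlighter-rouge">extractJsonObjects</code> is a hypothetical helper written for this illustration, not part of any Minimax SDK, and it is not the approach used later in this post. Note that it has to skip braces inside string values, which is exactly the kind of detail that makes hand-rolled parsing fragile.</p>

```typescript
// Hypothetical helper for illustration only.
// Returns every complete top-level JSON object found in `buffer`, plus the
// unconsumed tail that should be kept until the next network chunk arrives.
function extractJsonObjects(buffer: string): { objects: string[]; rest: string } {
  const objects: string[] = [];
  let depth = 0; // current nesting level of { }
  let inString = false; // true while scanning inside a JSON string
  let escaped = false; // true right after a backslash inside a string
  let start = -1; // index where the current top-level object began
  let consumed = 0; // index just past the last complete object

  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue; // braces inside strings must not affect depth
    }
    if (ch === '"' && depth > 0) {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) {
        objects.push(buffer.slice(start, i + 1));
        consumed = i + 1;
      }
    }
  }
  return { objects, rest: buffer.slice(consumed) };
}

// First read: one complete object (with braces inside a string) and a partial one
const first = extractJsonObjects('{"a":"}{"}{"b":');
console.log(first.objects, first.rest); // [ '{"a":"}{"}' ] '{"b":'

// Second read: prepend the leftover, which completes the second object
const second = extractJsonObjects(first.rest + '{"c":1}}');
console.log(second.objects.map((o) => JSON.parse(o))); // [ { b: { c: 1 } } ]
```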

<p>The last chunk of the response is a summary block, which can be ignored for audio streaming purposes. It would be nice if the API allowed us to opt out of receiving the summary block, but since that’s not possible, we’ll simply drop it.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>data: {"data":{"audio":"(hex)...... \n\n
data: ......(hex)","status":1,},"trace_id":"01b8bf9bb7433cc75c18eee6cfa8fe21","base_resp":{"status_code":0,"status_msg":""}}\n\n
</code></pre></div></div>

<p>Fortunately, inspecting the actual streamed response shows that the server always sends each event starting with <code class="language-plaintext highlighter-rouge">data: </code> and ending with <code class="language-plaintext highlighter-rouge">\n\n</code>. This clearly indicates that the streamed API response is implemented as Server-Sent Events (SSE). This means we can make assumptions about parsing the streamed response, or even use a library that handles SSE for us. In this post, I’ll handle the response parsing manually, checking for <code class="language-plaintext highlighter-rouge">data: </code> and <code class="language-plaintext highlighter-rouge">\n\n</code> to extract the audio chunks.</p>

<p>Let’s implement the streaming API in our Nuxt 3 server:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">$fetch</span><span class="o">&lt;</span><span class="nx">ReadableStream</span><span class="o">&gt;</span><span class="p">(</span>
  <span class="s2">`https://api.minimaxi.chat/v1/t2a_v2?GroupId=</span><span class="p">${</span><span class="nx">minimaxGroupId</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span>
  <span class="p">{</span>
    <span class="na">method</span><span class="p">:</span> <span class="dl">"</span><span class="s2">POST</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">headers</span><span class="p">:</span> <span class="p">{</span>
      <span class="na">Authorization</span><span class="p">:</span> <span class="s2">`Bearer </span><span class="p">${</span><span class="nx">minimaxAPIKey</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span>
    <span class="p">},</span>
    <span class="na">responseType</span><span class="p">:</span> <span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">,</span> <span class="c1">// set this to receive a stream in `response`</span>
    <span class="na">body</span><span class="p">:</span> <span class="p">{</span>
      <span class="na">stream</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span> <span class="c1">// ask for streaming response</span>
      <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">your text</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">speech-02-hd</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">voice_setting</span><span class="p">:</span> <span class="p">{</span>
        <span class="na">voice_id</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Chinese (Mandarin)_Warm_Bestie</span><span class="dl">"</span><span class="p">,</span>
        <span class="na">speed</span><span class="p">:</span> <span class="mf">0.95</span><span class="p">,</span>
        <span class="na">pitch</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span>
        <span class="na">emotion</span><span class="p">:</span> <span class="dl">"</span><span class="s2">neutral</span><span class="dl">"</span><span class="p">,</span>
      <span class="p">},</span>
      <span class="na">language_boost</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Chinese,Yue</span><span class="dl">"</span><span class="p">,</span>
    <span class="p">},</span>
  <span class="p">}</span>
<span class="p">);</span>
</code></pre></div></div>

<p>To convert the received hex into a stream, we need to read the stream and parse the response as it comes in. We’ll use <a href="https://developer.mozilla.org/en-US/docs/Web/API/TransformStream"><code class="language-plaintext highlighter-rouge">TransformStream</code></a> since fetch returns a <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream"><code class="language-plaintext highlighter-rouge">ReadableStream</code></a> when <code class="language-plaintext highlighter-rouge">responseType</code> is set to <code class="language-plaintext highlighter-rouge">stream</code>. <a href="https://developer.mozilla.org/en-US/docs/Web/API/TransformStream"><code class="language-plaintext highlighter-rouge">TransformStream</code></a> allows us to convert the incoming stream into a format suitable for sending to the client in a streamed format.</p>

<p>First, we need to convert the <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream"><code class="language-plaintext highlighter-rouge">ReadableStream</code></a> (which contains uint8 array chunks) into a string format that we can parse. This is straightforward with the <a href="https://developer.mozilla.org/en-US/docs/Web/API/TextDecoderStream"><code class="language-plaintext highlighter-rouge">TextDecoderStream</code></a> API.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">decodedStream</span> <span class="o">=</span> <span class="nx">response</span><span class="p">.</span><span class="nx">pipeThrough</span><span class="p">(</span><span class="k">new</span> <span class="nx">TextDecoderStream</span><span class="p">());</span> <span class="c1">// Now the stream is in string format</span>
</code></pre></div></div>

<p>Next, we’ll create the actual <a href="https://developer.mozilla.org/en-US/docs/Web/API/TransformStream"><code class="language-plaintext highlighter-rouge">TransformStream</code></a> logic. We need to implement <code class="language-plaintext highlighter-rouge">start</code>, <code class="language-plaintext highlighter-rouge">transform</code>, and <code class="language-plaintext highlighter-rouge">flush</code> methods. The main logic will be in the <code class="language-plaintext highlighter-rouge">transform</code> method. Since we know a complete SSE event is separated by <code class="language-plaintext highlighter-rouge">\n\n</code>, we can use this to determine if we need to fetch more data or if we already have a complete event to process. Note that this assumes no <code class="language-plaintext highlighter-rouge">\n\n</code> is present in the audio data itself, which is a safe assumption for audio data in hex format.</p>

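<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">let</span> <span class="nx">buffer</span> <span class="o">=</span> <span class="dl">""</span><span class="p">;</span> <span class="c1">// holds partial SSE data between chunks</span>
<span class="kd">const</span> <span class="nx">processStream</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">TransformStream</span><span class="p">({</span>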
  <span class="nx">start</span><span class="p">()</span> <span class="p">{</span>
    <span class="nx">buffer</span> <span class="o">=</span> <span class="dl">""</span><span class="p">;</span>
  <span class="p">},</span>
  <span class="nx">transform</span><span class="p">(</span><span class="nx">chunk</span><span class="p">,</span> <span class="nx">controller</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">try</span> <span class="p">{</span>
      <span class="nx">buffer</span> <span class="o">+=</span> <span class="nx">chunk</span><span class="p">;</span>
      <span class="kd">let</span> <span class="nx">eventEndIndex</span> <span class="o">=</span> <span class="nx">buffer</span><span class="p">.</span><span class="nx">indexOf</span><span class="p">(</span><span class="dl">"</span><span class="se">\n\n</span><span class="dl">"</span><span class="p">);</span>
      <span class="k">while</span> <span class="p">(</span><span class="nx">eventEndIndex</span> <span class="o">!==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="kd">const</span> <span class="nx">event</span> <span class="o">=</span> <span class="nx">buffer</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">eventEndIndex</span><span class="p">).</span><span class="nx">trim</span><span class="p">();</span>
        <span class="nx">buffer</span> <span class="o">=</span> <span class="nx">buffer</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">eventEndIndex</span> <span class="o">+</span> <span class="mi">2</span><span class="p">);</span> <span class="c1">// '\n\n' is 2 characters long</span>
        <span class="k">if</span> <span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
          <span class="k">try</span> <span class="p">{</span>
            <span class="kd">const</span> <span class="nx">audioBuffer</span> <span class="o">=</span> <span class="nx">processEventData</span><span class="p">(</span><span class="nx">event</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="nx">audioBuffer</span><span class="p">)</span> <span class="p">{</span>
              <span class="nx">controller</span><span class="p">.</span><span class="nx">enqueue</span><span class="p">(</span><span class="nx">audioBuffer</span><span class="p">);</span>
            <span class="p">}</span>
          <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
            <span class="nx">controller</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span>
              <span class="p">(</span><span class="nx">error</span> <span class="k">as</span> <span class="nb">Error</span><span class="p">).</span><span class="nx">message</span> <span class="o">||</span> <span class="dl">"</span><span class="s2">TTS_PROCESSING_ERROR</span><span class="dl">"</span>
            <span class="p">);</span>
            <span class="k">return</span><span class="p">;</span>
          <span class="p">}</span>
        <span class="p">}</span>
        <span class="nx">eventEndIndex</span> <span class="o">=</span> <span class="nx">buffer</span><span class="p">.</span><span class="nx">indexOf</span><span class="p">(</span><span class="dl">"</span><span class="se">\n\n</span><span class="dl">"</span><span class="p">);</span>
      <span class="p">}</span>
    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">controller</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span>
        <span class="dl">"</span><span class="s2">TTS_PROCESSING_ERROR: Failed to process text-to-speech data</span><span class="dl">"</span>
      <span class="p">);</span>
      <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>
  <span class="p">},</span>
  <span class="nx">flush</span><span class="p">(</span><span class="nx">controller</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">buffer</span><span class="p">.</span><span class="nx">trim</span><span class="p">())</span> <span class="p">{</span>
      <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">try</span> <span class="p">{</span>
      <span class="kd">const</span> <span class="nx">audioBuffer</span> <span class="o">=</span> <span class="nx">processEventData</span><span class="p">(</span><span class="nx">buffer</span><span class="p">);</span>
      <span class="k">if</span> <span class="p">(</span><span class="nx">audioBuffer</span><span class="p">)</span> <span class="p">{</span>
        <span class="nx">controller</span><span class="p">.</span><span class="nx">enqueue</span><span class="p">(</span><span class="nx">audioBuffer</span><span class="p">);</span>
      <span class="p">}</span>
    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
      <span class="c1">// no op</span>
    <span class="p">}</span>
  <span class="p">},</span>
<span class="p">});</span>
</code></pre></div></div>
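
<p>The buffering idea is easier to check in isolation as a pure function. The sketch below is an assumption-laden illustration rather than the code used in the endpoint: <code class="language-plaintext highlighter-rouge">takeCompleteEvents</code> is a hypothetical name, and the sample payloads are made up. It simulates one event arriving in two network chunks, and a terminator that is itself split across chunks.</p>

```typescript
// Hypothetical helper mirroring the buffering logic: split off every complete
// event (terminated by a blank line) and return the unfinished remainder.
function takeCompleteEvents(buffer: string): { events: string[]; rest: string } {
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? ""; // the last piece has no terminator yet
  const events = parts.map((p) => p.trim()).filter((p) => p.length > 0);
  return { events, rest };
}

// Made-up chunks: the first event is cut in the middle of its JSON,
// and the second event's "\n\n" terminator is split across two chunks.
const networkChunks = ['data: {"n"', ':1}\n\ndata: {"n":2}\n', "\n"];

let pending = "";
const received: string[] = [];
for (const chunk of networkChunks) {
  const { events, rest } = takeCompleteEvents(pending + chunk);
  received.push(...events);
  pending = rest; // carry the incomplete tail over to the next chunk
}
console.log(received); // [ 'data: {"n":1}', 'data: {"n":2}' ]
```

<p>Keep in mind that the SSE format also permits <code class="language-plaintext highlighter-rouge">\r\n</code> and <code class="language-plaintext highlighter-rouge">\r</code> line endings. Searching only for <code class="language-plaintext highlighter-rouge">\n\n</code> relies on the server behaviour observed here, which is one more reason to consider a dedicated SSE parser for production use.</p>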

<p>In <code class="language-plaintext highlighter-rouge">processEventData</code>, we parse the actual event data, removing the SSE <code class="language-plaintext highlighter-rouge">data: </code> prefix and <code class="language-plaintext highlighter-rouge">\n\n</code> suffix. We extract the hex audio data into a <code class="language-plaintext highlighter-rouge">Buffer</code> and return it. The buffer is then sent to the client using <code class="language-plaintext highlighter-rouge">controller.enqueue()</code>.</p>

<p>We also check for <code class="language-plaintext highlighter-rouge">data.status === 1</code> to ensure the summary block at the end of the stream is ignored.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">processEventData</span><span class="p">(</span><span class="nx">eventData</span><span class="p">:</span> <span class="kr">string</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">dataMatch</span> <span class="o">=</span> <span class="nx">eventData</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="sr">/^data:</span><span class="se">\s</span><span class="sr">*</span><span class="se">(</span><span class="sr">.+</span><span class="se">)</span><span class="sr">$/m</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">dataMatch</span><span class="p">)</span> <span class="k">return</span> <span class="kc">null</span><span class="p">;</span>

  <span class="kd">const</span> <span class="nx">jsonStr</span> <span class="o">=</span> <span class="nx">dataMatch</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span>
  <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">jsonStr</span><span class="p">)</span> <span class="k">return</span> <span class="kc">null</span><span class="p">;</span>

  <span class="kd">const</span> <span class="nx">parsed</span><span class="p">:</span> <span class="nx">TTSChunk</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">jsonStr</span><span class="p">);</span> <span class="c1">// this might throw if the JSON is malformed</span>

  <span class="k">if</span> <span class="p">(</span><span class="nx">parsed</span><span class="p">.</span><span class="nx">base_resp</span><span class="p">?.</span><span class="nx">status_code</span> <span class="o">!==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">throw</span> <span class="nx">createError</span><span class="p">({</span>
      <span class="na">name</span><span class="p">:</span> <span class="dl">"</span><span class="s2">TTS_API_ERROR</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">message</span><span class="p">:</span> <span class="nx">parsed</span><span class="p">.</span><span class="nx">base_resp</span><span class="p">?.</span><span class="nx">status_msg</span> <span class="o">||</span> <span class="dl">"</span><span class="s2">Unknown API error</span><span class="dl">"</span><span class="p">,</span>
    <span class="p">});</span>
  <span class="p">}</span>

  <span class="k">if</span> <span class="p">(</span><span class="nx">parsed</span><span class="p">.</span><span class="nx">data</span><span class="p">.</span><span class="nx">status</span> <span class="o">===</span> <span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="nx">parsed</span><span class="p">.</span><span class="nx">data</span><span class="p">.</span><span class="nx">audio</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="nx">Buffer</span><span class="p">.</span><span class="k">from</span><span class="p">(</span><span class="nx">parsed</span><span class="p">.</span><span class="nx">data</span><span class="p">.</span><span class="nx">audio</span><span class="p">,</span> <span class="dl">"</span><span class="s2">hex</span><span class="dl">"</span><span class="p">);</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="kc">null</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
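
<p>A caveat on the hex decoding step: <code class="language-plaintext highlighter-rouge">Buffer.from(value, "hex")</code> does not throw on malformed input. It stops at the first invalid character and returns whatever it decoded so far, so a corrupted chunk would silently produce truncated audio. If you would rather fail loudly, a stricter decoder could look like the sketch below. <code class="language-plaintext highlighter-rouge">hexToBytes</code> is a hypothetical helper of my own, not something the Minimax API provides.</p>

```typescript
// Hypothetical strict decoder: throws on malformed input instead of truncating.
function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error("Invalid hex string");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    // each byte is encoded as exactly two hex digits
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

const decoded = hexToBytes("494433");
console.log(Array.from(decoded)); // [ 73, 68, 51 ], the ASCII bytes of "ID3"
```

<p>The trade-off is speed: the native <code class="language-plaintext highlighter-rouge">Buffer</code> decoder is likely faster, so a cheap alternative is to keep <code class="language-plaintext highlighter-rouge">Buffer.from</code> and only compare the decoded length against half the length of the hex string.</p>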

<p>Finally, we can pipe the transform stream to <code class="language-plaintext highlighter-rouge">sendStream</code> and send a binary audio stream to the client:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// continue from the previous code snippet</span>
<span class="c1">// decodedStream is a ReadableStream of strings</span>
<span class="c1">// processStream is a TransformStream that processes the SSE events</span>
<span class="nx">decodedStream</span><span class="p">.</span><span class="nx">pipeThrough</span><span class="p">(</span><span class="nx">processStream</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">sendStream</span><span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="nx">processStream</span><span class="p">.</span><span class="nx">readable</span><span class="p">);</span>
</code></pre></div></div>

<p>This approach involves some naive assumptions about event formatting, but it works reasonably well as an integration example and MVP for this task.</p>

<h2 id="conclusion">Conclusion</h2>

<p>In this post, we explored how to integrate the Minimax TTS API using both non-streaming and streaming approaches. The non-streaming method provides a straightforward way to get audio data, suitable for short texts or applications where latency isn’t a concern. However, for applications requiring real-time audio playback, the streaming approach, although more complex, offers a significant improvement in user experience by allowing audio to be played as it’s generated.</p>

<p>The streaming implementation demonstrates how to handle Server-Sent Events manually and convert hex audio chunks into a streamable format. This approach provides low latency and improved perceived performance, making it ideal for interactive applications where immediate audio feedback is important.</p>

<h2 id="additional-resources">Additional Resources</h2>

<h3 id="related-tts-integration-articles">Related TTS Integration Articles</h3>

<ul>
  <li><a href="/2023/10/13/convert-google-text-to-speech-to-nodejs-stream/">Convert Google text to speech API result to HTTP streamed response</a></li>
  <li><a href="/2023/10/15/convert-azure-text-to-speech-to-nodejs-stream/">Convert Azure text to speech API result to HTTP streamed response</a></li>
  <li><a href="/2025/06/06/convert-aws-polly-to-nodejs-stream/">Convert AWS Polly text to speech API result to HTTP streamed response</a></li>
  <li><a href="/2023/10/28/openai-api-to-http-streamed-response/">Convert OpenAI API stream to HTTP streamed response</a> (General streaming concepts)</li>
</ul>

<h3 id="documentation">Documentation</h3>

<ul>
  <li><a href="https://www.minimax.io/platform/document/T2A%20V2?key=66719005a427f0c8a5701643#TJeyxusWAUP0l3tX67brbAyE">Minimax TTS API Documentation</a></li>
  <li><a href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events">Server-Sent Events - MDN</a></li>
  <li><a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream">ReadableStream - MDN</a></li>
  <li><a href="https://developer.mozilla.org/en-US/docs/Web/API/TransformStream">TransformStream - MDN</a></li>
  <li><a href="https://developer.mozilla.org/en-US/docs/Web/API/TextDecoderStream">TextDecoderStream - MDN</a></li>
</ul>]]></content><author><name>William Chong</name></author><category term="code" /><category term="minimax" /><category term="text-to-speech" /><category term="tts" /><category term="api-integration" /><category term="server-sent-events" /><category term="sse" /><category term="streaming" /><category term="real-time" /><category term="audio" /><category term="nodejs" /><category term="javascript" /><summary type="html"><![CDATA[Complete guide to integrating Minimax Text-to-Speech API with both blocking and streaming approaches. Learn how to implement basic TTS integration and convert it to real-time streaming for better user experience.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2025-06-22-handling-minimax-tts-api-basic-and-streaming/cover.png" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2025-06-22-handling-minimax-tts-api-basic-and-streaming/cover.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Convert AWS Polly text to speech API result to HTTP streamed response</title><link href="https://blog.williamchong.cloud/code/2025/06/06/convert-aws-polly-to-nodejs-stream.html" rel="alternate" type="text/html" title="Convert AWS Polly text to speech API result to HTTP streamed response" /><published>2025-06-06T02:00:00+00:00</published><updated>2025-06-06T02:00:00+00:00</updated><id>https://blog.williamchong.cloud/code/2025/06/06/convert-aws-polly-to-nodejs-stream</id><content type="html" xml:base="https://blog.williamchong.cloud/code/2025/06/06/convert-aws-polly-to-nodejs-stream.html"><![CDATA[<p><img src="/assets/images/2025-06-06-convert-aws-polly-to-nodejs-stream/cover.jpg" alt="AWS Polly text to speech API" /></p>

<p>In previous articles, I’ve explored streaming implementations for <a href="/code/2023/10/13/convert-google-text-to-speech-to-nodejs-stream.html">Google’s Text-to-Speech API</a> and <a href="/code/2023/10/14/convert-azure-text-to-speech-to-nodejs-stream.html">Azure’s Text-to-Speech service</a>. Continuing this series, let’s see how to implement streaming with AWS Polly, Amazon’s text-to-speech service.</p>

<p>Spoiler: It’s trivial.</p>

<h2 id="aws-polly-javascript-sdk">AWS Polly JavaScript SDK</h2>

<p>The command we will use is <a href="https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/polly/command/SynthesizeSpeechCommand/">SynthesizeSpeechCommand</a> from the AWS SDK for JavaScript v3. The response from the <code class="language-plaintext highlighter-rouge">SynthesizeSpeechCommand</code> already includes an <code class="language-plaintext highlighter-rouge">AudioStream</code>. <code class="language-plaintext highlighter-rouge">AudioStream</code> has the methods <code class="language-plaintext highlighter-rouge">transformToByteArray</code>, <code class="language-plaintext highlighter-rouge">transformToString</code>, and <code class="language-plaintext highlighter-rouge">transformToWebStream</code>, which transform the audio stream into different formats. For HTTP streaming, we can just use <code class="language-plaintext highlighter-rouge">transformToWebStream()</code> to convert the AWS SDK audio stream into a web-compatible stream.</p>

<h2 id="example-code">Example code</h2>

<p>The following code converts a text input to an Ogg audio stream using AWS Polly’s <code class="language-plaintext highlighter-rouge">SynthesizeSpeechCommand</code>.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="nx">type</span> <span class="p">{</span>
  <span class="nx">LanguageCode</span><span class="p">,</span>
  <span class="nx">VoiceId</span><span class="p">,</span>
<span class="p">}</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">@aws-sdk/client-polly</span><span class="dl">'</span>
<span class="k">import</span> <span class="p">{</span>
  <span class="nx">PollyClient</span><span class="p">,</span>
  <span class="nx">SynthesizeSpeechCommand</span><span class="p">,</span>
<span class="p">}</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">@aws-sdk/client-polly</span><span class="dl">'</span>

<span class="kd">const</span> <span class="nx">client</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">PollyClient</span><span class="p">({</span>
  <span class="na">region</span><span class="p">:</span> <span class="nx">awsRegion</span><span class="p">,</span>
  <span class="na">credentials</span><span class="p">:</span> <span class="p">{</span>
    <span class="na">accessKeyId</span><span class="p">:</span> <span class="nx">awsAccessKeyId</span><span class="p">,</span>
    <span class="na">secretAccessKey</span><span class="p">:</span> <span class="nx">awsAccessKeySecret</span><span class="p">,</span>
  <span class="p">},</span>
<span class="p">})</span>

<span class="p">...</span>

<span class="kd">const</span> <span class="nx">command</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">SynthesizeSpeechCommand</span><span class="p">({</span>
  <span class="na">Text</span><span class="p">:</span> <span class="nx">text</span><span class="p">,</span>
  <span class="na">OutputFormat</span><span class="p">:</span> <span class="dl">'</span><span class="s1">ogg_vorbis</span><span class="dl">'</span><span class="p">,</span>
  <span class="na">VoiceId</span><span class="p">:</span> <span class="dl">'</span><span class="s1">Ruth</span><span class="dl">'</span> <span class="k">as</span> <span class="nx">VoiceId</span><span class="p">,</span>
  <span class="na">LanguageCode</span><span class="p">:</span> <span class="dl">'</span><span class="s1">en-US</span><span class="dl">'</span> <span class="k">as</span> <span class="nx">LanguageCode</span><span class="p">,</span>
  <span class="na">Engine</span><span class="p">:</span> <span class="dl">'</span><span class="s1">neural</span><span class="dl">'</span><span class="p">,</span>
  <span class="na">TextType</span><span class="p">:</span> <span class="dl">'</span><span class="s1">text</span><span class="dl">'</span><span class="p">,</span>
<span class="p">})</span>
<span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">client</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">command</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">response</span><span class="p">.</span><span class="nx">AudioStream</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">throw</span> <span class="nx">createError</span><span class="p">({</span>
    <span class="na">status</span><span class="p">:</span> <span class="mi">500</span><span class="p">,</span>
    <span class="na">message</span><span class="p">:</span> <span class="dl">'</span><span class="s1">SPEECH_SYNTHESIS_FAILED</span><span class="dl">'</span><span class="p">,</span>
  <span class="p">})</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">stream</span> <span class="o">=</span> <span class="nx">response</span><span class="p">.</span><span class="nx">AudioStream</span><span class="p">.</span><span class="nx">transformToWebStream</span><span class="p">()</span>
<span class="nx">setHeader</span><span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="dl">'</span><span class="s1">content-type</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">audio/ogg; codecs=vorbis</span><span class="dl">'</span><span class="p">)</span>
<span class="nx">setHeader</span><span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="dl">'</span><span class="s1">cache-control</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">public, max-age=3600</span><span class="dl">'</span><span class="p">)</span>
<span class="k">return</span> <span class="nx">sendStream</span><span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="nx">stream</span><span class="p">)</span>

</code></pre></div></div>

<h2 id="conclusion">Conclusion</h2>

<p>Since the AWS SDK already provides a stream-based response and helper methods to convert the audio stream, implementing HTTP streaming for AWS Polly is straightforward. Streaming reduces the delay between the request and the start of audio playback, which improves the overall user experience.</p>]]></content><author><name>William Chong</name></author><category term="code" /><category term="javascript" /><category term="nodejs" /><category term="aws" /><category term="polly" /><category term="text-to-speech" /><category term="audio-streaming" /><category term="speech-synthesis" /><summary type="html"><![CDATA[A simple guide on implementing HTTP streaming of AWS Polly text-to-speech output for improved user experience and reduced latency.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2025-06-06-convert-aws-polly-to-nodejs-stream/cover.jpg" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2025-06-06-convert-aws-polly-to-nodejs-stream/cover.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">E-voting, blockchain and zero-knowledge proof (1): Why are we still using paper to vote? What is an ideal e-voting system?</title><link href="https://blog.williamchong.cloud/technology/2025/05/04/evoting-blockchain-zk-introduction.html" rel="alternate" type="text/html" title="E-voting, blockchain and zero-knowledge proof (1): Why are we still using paper to vote? What is an ideal e-voting system?" 
/><published>2025-05-04T16:00:00+00:00</published><updated>2025-05-04T16:00:00+00:00</updated><id>https://blog.williamchong.cloud/technology/2025/05/04/evoting-blockchain-zk-introduction</id><content type="html" xml:base="https://blog.williamchong.cloud/technology/2025/05/04/evoting-blockchain-zk-introduction.html"><![CDATA[<p><img src="/assets/images/2025-05-05-evoting-blockchain-zk-introduction/cover.png" alt="E-voting, blockchain and zero-knowledge proof (1)" /></p>

<h2 id="background">Background</h2>

<p>In the year 2025, many of us are still using paper ballots and stamps to vote, still waiting for days for the tallying results, and even weeks for a final outcome if disputes trigger a recount. One would have thought technology would have solved this problem by now. However, e-voting remains a controversial topic and faces significant resistance. The reasons for opposition aren’t merely rooted in tradition or politics; many concerns relate to legitimate issues of security, integrity and privacy. Are we destined to count paper ballots for important decisions indefinitely?</p>

<p>A few years ago, I worked on a <a href="https://github.com/DASC7600-2022-E-Voting/E-Voting-App">project</a> implementing a secure e-voting system on blockchain. It was surprising to me that even in the web3 space—where trustless systems, public verifiability, zero knowledge and governance are all highly valued—e-voting remains a niche topic that has received limited attention and resources. In the end, I wasn’t able to complete a fully functional MVP. Here I would like to share some concepts and ideas I’ve learned, hoping they might benefit others interested in this field.</p>

<p>In this series of articles, we will explore the reasons why e-voting is not widely adopted, and how blockchain and zero-knowledge proof technologies can help address some of these challenges.</p>

<h2 id="why-not-e-voting">Why (not) e-voting?</h2>

<p>E-voting systems come in many forms, but they can primarily be divided into two categories: online voting and electronic voting machines (EVMs). Online voting enables citizens to cast their ballots via the internet, while EVMs are physical devices used for on-site electronic voting.</p>

<p>In the following sections, we’ll discuss the advantages and disadvantages of e-voting. Some points apply to both categories, while others are specific to one form.</p>

<h3 id="advantages-of-e-voting">Advantages of e-voting</h3>

<ul>
  <li>
    <p><strong>Speed &amp; Efficiency</strong>: E-voting can provide faster results, as the counting process is automated and can even be done in real-time.</p>
  </li>
  <li>
    <p><strong>Reduce human error</strong>: Automated tallying removes manual counting errors and makes recounts far less likely to be needed. Depending on the system design, errors in voter eligibility verification and even double voting can also be prevented.</p>
  </li>
  <li>
    <p><strong>Accessibility</strong>: E-voting machines and online voting platforms can be designed with accessibility in mind, featuring audio narration and adjustable fonts. In traditional voting, people with disabilities often must rely on poll workers for assistance.</p>
  </li>
  <li>
    <p><strong>Cost-effective</strong>: E-voting reduces costs associated with printing, mailing, and labor for counting and verifying votes.</p>
  </li>
  <li>
    <p><strong>Convenience</strong>: Remote voting allows citizens to vote from anywhere, which is especially beneficial for people overseas or with mobility challenges. Even for on-site voting, e-voting machines can reduce waiting times.</p>
  </li>
</ul>

<h3 id="disadvantages-of-e-voting">Disadvantages of e-voting</h3>

<ul>
  <li>
    <p><strong>Security</strong>: Online voting systems must operate on servers connected to the internet, making them vulnerable to hacking and cyber attacks. For example, a hacker could alter vote counts or manipulate election results.</p>
  </li>
  <li>
    <p><strong>Integrity</strong>: E-voting websites and hardware can be susceptible to supply chain attacks and tampering. For instance, a hacker could modify the user interface of a voting machine to mislead voters into selecting the wrong candidate or even change the vote count after the election.</p>
  </li>
  <li>
    <p><strong>Privacy</strong>: Ensuring votes are cast in private and that voter identities remain confidential is crucial. However, in the context of e-voting, it is challenging to guarantee that votes leave no digital fingerprints and are truly anonymous, as digital authentication methods are required to verify voter identities. Data breaches could expose voter identities and even the content of their votes.</p>
  </li>
  <li>
    <p><strong>Technical issues</strong>: Software bugs and hardware glitches can lead to errors in the voting process, such as miscounted votes or system crashes. Additionally, internet connectivity problems or hardware failures could prevent voters from casting their ballots.</p>
  </li>
  <li>
    <p><strong>Transparency</strong>: While paper ballots can be physically counted, audited, and observed by third parties, e-voting systems are often closed-source and proprietary, making it difficult to verify the integrity of such black-box systems. Even open-source or audited systems face challenges in ensuring that the deployed system remains untampered and uncompromised.</p>
  </li>
</ul>

<h2 id="what-would-an-ideal-e-voting-system-look-like">What would an ideal e-voting system look like?</h2>

<p>An ideal e-voting system should preserve the advantages of e-voting while mitigating most of the disadvantages. However, it is often impossible to eliminate all drawbacks, and design trade-offs must be made based on specific use cases.</p>

<p>To design a practical e-voting system that would gain widespread acceptance, let’s take a step back and analyze the characteristics of an ideal voting system.</p>

<ul>
  <li>
    <p><strong>Public verifiability</strong>:
The system should allow third parties to verify the following information:</p>

    <ul>
      <li>Number of eligible voters</li>
      <li>The start, end, and duration of the voting period</li>
      <li>The number of votes cast</li>
      <li>The number of votes deemed invalid, if any</li>
      <li>The tallying results</li>
    </ul>

    <p>This ensures that the entire voting process and results can be audited and verified by any third party, not just by a trusted authority.</p>
  </li>
  <li>
    <p><strong>Individual verifiability</strong>:
The system should allow voters to verify the following information. Whether this information can also be verified by third parties depends on design and implementation constraints:</p>

    <ul>
      <li>Whether one is eligible to vote</li>
      <li>Whether one’s vote has been cast</li>
      <li>Whether one’s vote has been counted</li>
    </ul>

    <p>This ensures that voters can confirm their votes are included in the tally.</p>
  </li>
  <li>
    <p><strong>Anonymity</strong>:
The system should guarantee the following against everyone, including the voting authority:</p>

    <ul>
      <li>No one can learn the choice on a cast ballot before tallying</li>
      <li>No vote can be linked to the identity of the voter who cast it</li>
      <li>No voter can prove to others that they voted for a specific candidate</li>
    </ul>

    <p>These concepts are often referred to as <a href="https://en.wikipedia.org/wiki/Secret_ballot">ballot secrecy</a> and receipt-freeness. Together they ensure that voters cannot be coerced or bribed to vote for a specific candidate (vote buying) and that their votes remain truly private.</p>
  </li>
  <li>
    <p><strong>High availability</strong>:
The system should possess the following properties to ensure voters can cast their ballots during the voting period:</p>

    <ul>
      <li>Capacity to handle the expected number of voters</li>
      <li>Resistance to DDoS attacks</li>
      <li>No single point of failure</li>
      <li>Disaster recovery plan</li>
    </ul>
  </li>
</ul>
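
<p>To make the public verifiability items more concrete, here is a minimal sketch of the consistency checks a third-party auditor might run over published election figures. The record shape and every field name (<code class="language-plaintext highlighter-rouge">eligibleVoters</code>, <code class="language-plaintext highlighter-rouge">votesCast</code>, <code class="language-plaintext highlighter-rouge">invalidVotes</code>, <code class="language-plaintext highlighter-rouge">tally</code>, <code class="language-plaintext highlighter-rouge">startsAt</code>, <code class="language-plaintext highlighter-rouge">endsAt</code>) are hypothetical, and a real system would derive these numbers from verifiable data rather than trust a single published summary.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical published summary of an election; every field name here
// is an assumption made for illustration.
function auditSummary(summary) {
  const problems = [];
  const counted = Object.values(summary.tally).reduce(function (sum, n) {
    return sum + n;
  }, 0);

  // More ballots than eligible voters would indicate ballot stuffing.
  if (summary.votesCast > summary.eligibleVoters) {
    problems.push("more votes cast than eligible voters");
  }
  // Every cast ballot must be either counted in the tally or marked invalid.
  if (counted + summary.invalidVotes !== summary.votesCast) {
    problems.push("tally plus invalid votes does not match votes cast");
  }
  // The voting period must end after it starts.
  if (Date.parse(summary.startsAt) >= Date.parse(summary.endsAt)) {
    problems.push("voting period is empty or reversed");
  }
  return problems;
}
</code></pre></div></div>

<p>An empty result only means the published numbers are internally consistent. It says nothing yet about whether each individual ballot was recorded as intended.</p>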

<h2 id="how-can-blockchain-and-zero-knowledge-proof-help">How can blockchain and zero-knowledge proof help?</h2>

<p>Looking at the above requirements, designing a system that satisfies all of them is highly challenging. How can one make something publicly verifiable while ensuring parts of it remain secret? How can secrecy be maintained in such a system if the secret holder might be incentivized to reveal their voting choice? Many existing systems—e-voting or otherwise—rely on trusting some form of authority to resolve these conflicts and act as the primary entity ensuring the system’s privacy and integrity. This reliance often becomes the source of fraud and disputes, as such authorities can be bribed, compromised, or even have their own incentives to commit fraud.</p>

<p>This is where blockchain and <a href="https://en.wikipedia.org/wiki/Zero-knowledge_proof">zero-knowledge proof</a> come into play. Blockchain is designed as a publicly verifiable ledger that eliminates the need for trusted authorities. It is replicated across many nodes, which makes it highly available and tamper-resistant. Zero-knowledge techniques, often used on top of blockchain, let a party prove that a statement about private data is true without revealing the data itself. These two technologies can be combined to create a system that is both publicly verifiable and private, while also ensuring the system’s integrity and availability.</p>
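
<p>A large part of what makes a blockchain ledger hard to alter is hash linking: each entry commits to the one before it. The sketch below is a toy illustration of that single idea in Node.js, not a real blockchain; it has no consensus, signatures, or networking, and the helper names (<code class="language-plaintext highlighter-rouge">append</code>, <code class="language-plaintext highlighter-rouge">verify</code>) and the payload shape are made up for this example.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { createHash } = require("node:crypto");

// Toy append-only ledger: each entry stores the hash of the previous entry,
// so changing any earlier entry breaks every hash after it.
function entryHash(prevHash, payload) {
  return createHash("sha256")
    .update(prevHash + JSON.stringify(payload))
    .digest("hex");
}

// Returns a new ledger with the payload appended as the latest entry.
function append(ledger, payload) {
  const prevHash =
    ledger.length === 0 ? "genesis" : ledger[ledger.length - 1].hash;
  const entry = {
    payload: payload,
    prevHash: prevHash,
    hash: entryHash(prevHash, payload),
  };
  return ledger.concat([entry]);
}

// Anyone holding a copy of the ledger can recompute the chain and detect edits.
function verify(ledger) {
  let prevHash = "genesis";
  for (const entry of ledger) {
    if (entry.prevHash !== prevHash) return false;
    if (entry.hash !== entryHash(prevHash, entry.payload)) return false;
    prevHash = entry.hash;
  }
  return true;
}
</code></pre></div></div>

<p>In a voting context the payloads would be encrypted or otherwise hidden ballots, which is where the zero-knowledge side comes in. The chain alone only shows that recorded entries were not modified afterwards.</p>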

<p>In the next article, we will discuss some technical and mathematical concepts related to these technologies and how they can be used to design a practical next-generation e-voting system.</p>]]></content><author><name>William Chong</name></author><category term="technology" /><category term="e-voting" /><category term="blockchain" /><category term="zero-knowledge-proof" /><category term="zk-snark" /><category term="cryptography" /><category term="decentralization" /><category term="election-security" /><category term="privacy" /><category term="web3" /><category term="digital-democracy" /><summary type="html"><![CDATA[Exploring why paper-based voting still prevails in 2025, the security and privacy challenges of electronic voting systems, and how blockchain and zero-knowledge proof technologies can create an ideal e-voting solution. This first article in the series introduces the limitations of existing systems and future development directions.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2025-05-05-evoting-blockchain-zk-introduction/cover.png" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2025-05-05-evoting-blockchain-zk-introduction/cover.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Fix Google Analytics (GA4) Events Not Firing in iFrame</title><link href="https://blog.williamchong.cloud/code/2025/02/19/fix-ga-events-in-iframe.html" rel="alternate" type="text/html" title="Fix Google Analytics (GA4) Events Not Firing in iFrame" /><published>2025-02-19T15:00:00+00:00</published><updated>2025-02-19T15:00:00+00:00</updated><id>https://blog.williamchong.cloud/code/2025/02/19/fix-ga-events-in-iframe</id><content type="html" xml:base="https://blog.williamchong.cloud/code/2025/02/19/fix-ga-events-in-iframe.html"><![CDATA[<p><img src="/assets/images/2025-02-19-fix-ga-events-in-iframe/0.png" alt="Debugging GA4 
Events" /></p>

<h2 id="background">Background</h2>

<p>Although I’m not a Google Cloud Platform or Google Analytics expert, I often encounter strange issues that neither ChatGPT nor StackOverflow can resolve. This is one of those cases.</p>

<p>We were working on an HTML widget designed to be embedded in partner websites as an <code class="language-plaintext highlighter-rouge">&lt;iframe&gt;</code>. Google Analytics 4 (GA4) tracking was added to the widget, but we noticed that the number of logged page view events was significantly lower than expected.</p>

<p>Initially, we suspected that ad-blockers were causing the problem. However, after comparing the GA4 data from the partner’s site, we ruled this out. The page views on the parent site were normal and much higher than those logged from the iframe.</p>

<p>It is worth mentioning that this issue is not about triggering iframe GA4 events from a cross-origin parent site, which can be resolved using <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage">postMessage</a>.</p>

<hr />

<h2 id="troubleshooting-the-issue-in-production">Troubleshooting the Issue in Production</h2>

<h3 id="initial-checks">Initial Checks</h3>

<p>To eliminate implementation issues, we first confirmed that GA4 events were being sent correctly when the widget was accessed directly (not embedded). The simplest way to do this was to open the widget’s <code class="language-plaintext highlighter-rouge">src</code> URL directly and check the network tab in the browser’s developer tools. By filtering network requests by the GA4 tag ID, we could easily inspect outgoing events from <code class="language-plaintext highlighter-rouge">gtag</code>.</p>

<p><img src="/assets/images/2025-02-19-fix-ga-events-in-iframe/1.png" alt="Network requests in page" /></p>

<p>Here, we see two requests: one for loading the <code class="language-plaintext highlighter-rouge">gtag</code> JavaScript and another for sending the <code class="language-plaintext highlighter-rouge">page_view</code> event. This confirms that GA4 events are triggered correctly when the widget is opened directly.</p>
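
<p>The same check can be scripted from the console instead of the network tab. This is a minimal sketch, assuming the browser reports the requests through the Resource Timing API (it may not list every beacon, so the network tab remains the reference). <code class="language-plaintext highlighter-rouge">findTagRequests</code> is a hypothetical helper and <code class="language-plaintext highlighter-rouge">G-XXXXXXXXXX</code> is a placeholder tag ID.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Returns the URLs of resource entries that mention the given GA4 tag ID.
// "entries" is expected to look like the output of
// performance.getEntriesByType("resource").
function findTagRequests(entries, tagId) {
  return entries
    .map(function (entry) {
      return entry.name;
    })
    .filter(function (url) {
      return url.includes(tagId);
    });
}

// In the browser console (G-XXXXXXXXXX is a placeholder tag ID):
// findTagRequests(performance.getEntriesByType("resource"), "G-XXXXXXXXXX");
</code></pre></div></div>

<p>Running it in both the top-level page and the iframe context makes it easy to compare which requests were actually issued in each.</p>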

<p>Next, we checked what happened when the widget was embedded in an <code class="language-plaintext highlighter-rouge">iframe</code>. By inspecting the <code class="language-plaintext highlighter-rouge">iframe</code> element on the parent site, we observed the following:</p>

<p><img src="/assets/images/2025-02-19-fix-ga-events-in-iframe/2.png" alt="Network requests in iframe" /></p>

<p>The <code class="language-plaintext highlighter-rouge">gtag.js</code> script was loaded correctly, but the <code class="language-plaintext highlighter-rouge">page_view</code> event was not being sent!</p>

<hr />

<h3 id="manually-triggering-ga4-events-from-the-iframe">Manually Triggering GA4 Events from the iFrame</h3>

<p>To determine whether the issue was related to JavaScript, we attempted to manually trigger a <code class="language-plaintext highlighter-rouge">page_view</code> event. By default, <code class="language-plaintext highlighter-rouge">gtag</code> automatically sends a <code class="language-plaintext highlighter-rouge">page_view</code> event upon initialization. To test if the automated event was being blocked, we manually triggered it with the following code:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">gtag</span><span class="p">);</span> <span class="c1">// Sanity check to ensure gtag is loaded</span>
<span class="nx">gtag</span><span class="p">(</span><span class="dl">"</span><span class="s2">event</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">page_view</span><span class="dl">"</span><span class="p">);</span>
</code></pre></div></div>

<p>First, we tested this in the non-embedded version of the widget:</p>

<p><img src="/assets/images/2025-02-19-fix-ga-events-in-iframe/3.png" alt="Console in page" /></p>

<p>As expected, after a few seconds, a new network call was triggered, indicating that the <code class="language-plaintext highlighter-rouge">page_view</code> event was sent successfully. The delay of a few seconds is due to the default <a href="https://support.google.com/analytics/answer/9322688?hl=en#zippy=%2Crealtime-report%2Cdebugview-report:~:text=Understand%20event%20grouping">GA4 event batching</a>.</p>

<p>Let’s try it in the embedded version. Note that we can select the iframe in the console to run the code within the <code class="language-plaintext highlighter-rouge">iframe</code> context.</p>

<p><img src="/assets/images/2025-02-19-fix-ga-events-in-iframe/4.png" alt="Console in iframe" /></p>

<p>Nothing happened! The event was not sent, indicating that something was blocking it.</p>

<h3 id="using-the-ga-debugger">Using the GA Debugger</h3>

<p>To dig deeper, we used the <a href="https://chromewebstore.google.com/detail/google-analytics-debugger/jnkmfdileelhofjcijamephohjechhna">Google Analytics Debugger Chrome extension</a>, which logs all GA events to the console. After installing and enabling the extension, we refreshed the page.</p>

<p>For sanity checks, we ran it on the non-embedded version first:</p>

<p><img src="/assets/images/2025-02-19-fix-ga-events-in-iframe/5.png" alt="GA Debugger console" /></p>

<p>The event was triggered successfully, and its details were logged in the console.</p>

<p>Next, we tested the embedded version:</p>

<p><img src="/assets/images/2025-02-19-fix-ga-events-in-iframe/6.png" alt="GA Debugger console in iframe" /></p>

<p>The event details were missing, and we saw two warnings in the console:</p>

<ul>
  <li>Unable to update session cookie.</li>
  <li>Unable to set cookie.</li>
</ul>

<p>It became clear that these cookie-related warnings were preventing the event from being sent.</p>

<h2 id="cookies-in-cross-origin-iframes-and-samesite">Cookies in Cross-Origin iFrames and SameSite</h2>

<p>The root cause of the problem was <a href="https://chromestatus.com/feature/5088147346030592">Chromium’s changes</a> to the <code class="language-plaintext highlighter-rouge">SameSite</code> cookie attribute. <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#samesitesamesite-value">SameSite</a> controls whether cookies are sent with cross-site requests, and Chromium now treats cookies that do not specify it as <code class="language-plaintext highlighter-rouge">SameSite=Lax</code>. In this case, since the iframe’s site differed from the parent site, the cookies that <code class="language-plaintext highlighter-rouge">gtag</code> tried to set were blocked by default.</p>

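<p>To mitigate this issue, the cookies must be set with <code class="language-plaintext highlighter-rouge">SameSite=None</code>, which allows them in cross-site contexts. Browsers only accept <code class="language-plaintext highlighter-rouge">SameSite=None</code> cookies that are also marked <code class="language-plaintext highlighter-rouge">Secure</code>.</p>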
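
<p>One way to confirm this independently of GA4 is to probe cookie writes directly from the iframe’s console context. The snippet below is a rough sketch: <code class="language-plaintext highlighter-rouge">canSetCookie</code> and the <code class="language-plaintext highlighter-rouge">cookie_probe</code> name are hypothetical, and the result depends on the browser and its third-party cookie settings, so no particular output is implied.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Tries to write a short-lived probe cookie with the given attributes and
// reports whether it can be read back. "doc" is a document-like object.
function canSetCookie(doc, attributes) {
  const name = "cookie_probe"; // hypothetical cookie name
  doc.cookie = name + "=1; Path=/; " + attributes;
  const stored = doc.cookie.split("; ").includes(name + "=1");
  // Expire the probe again so the check leaves nothing behind.
  doc.cookie = name + "=; Path=/; Max-Age=0; " + attributes;
  return stored;
}

// In the iframe's console context:
// canSetCookie(document, "SameSite=Lax");
// canSetCookie(document, "SameSite=None; Secure");
</code></pre></div></div>

<p>If the first call fails while the second succeeds, the cookie attributes are the likely culprit rather than the tracking code itself.</p>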

<h2 id="fixing-the-cookie-issue">Fixing the Cookie Issue</h2>

<p>To ensure proper cookie settings for <code class="language-plaintext highlighter-rouge">gtag</code> and GA4, the following configuration flags should be added during <code class="language-plaintext highlighter-rouge">gtag</code> initialization:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">gtag</span><span class="p">(</span><span class="dl">"</span><span class="s2">config</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">GA_MEASUREMENT_ID</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
  <span class="na">cookie_flags</span><span class="p">:</span> <span class="dl">"</span><span class="s2">SameSite=None; Secure</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">cookie_update</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>

<p>Here’s the result after implementing these flags:</p>

<p><img src="/assets/images/2025-02-19-fix-ga-events-in-iframe/7.png" alt="GA Debugger in iframe after fix" /></p>

<p>With this fix, the <code class="language-plaintext highlighter-rouge">page_view</code> events were successfully sent from the iframe.</p>
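
<p>If the widget can also be opened directly, the flags could be applied only when it is actually embedded. The following is a small sketch under that assumption: <code class="language-plaintext highlighter-rouge">buildGtagConfig</code> is a hypothetical helper, and comparing <code class="language-plaintext highlighter-rouge">window.self</code> with <code class="language-plaintext highlighter-rouge">window.top</code> is just one common way to detect framing, not something GA4 requires.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Builds the gtag config: the cross-site cookie flags are only added when
// the page runs inside a frame.
function buildGtagConfig(isEmbedded) {
  if (!isEmbedded) {
    return {};
  }
  return {
    cookie_flags: "SameSite=None; Secure",
    cookie_update: false,
  };
}

// In the widget page, after the standard gtag snippet has defined gtag():
// const isEmbedded = window.self !== window.top;
// gtag("config", "GA_MEASUREMENT_ID", buildGtagConfig(isEmbedded));
</code></pre></div></div>

<p>Applying the flags unconditionally, as shown earlier, is simpler and was enough in our case; the conditional variant only matters if the default cookie behavior should be kept for direct visits.</p>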

<h2 id="conclusion">Conclusion</h2>

<p>While I knew of the <code class="language-plaintext highlighter-rouge">SameSite</code> cookie changes, I didn’t expect them to affect simple GA4 <code class="language-plaintext highlighter-rouge">page_view</code> events from cross-origin iframes that don’t interact with their parent site. It would be helpful if GA4 provided better fallback mechanisms for events when cookies are blocked, or at the very least, documented this behavior more clearly.</p>

<p>This raises an interesting question: are advertising agencies that embed ads in iframes aware of this issue? I’m not sure. Hopefully, this post won’t degrade everyone’s privacy by revealing this workaround, but it might help others struggling with similar problems.</p>]]></content><author><name>William Chong</name></author><category term="code" /><category term="google-analytics" /><category term="ga4" /><category term="cookies" /><category term="samesite" /><category term="cross-origin" /><category term="iframe" /><category term="web-tracking" /><category term="debugging" /><category term="browser-security" /><category term="chrome" /><category term="third-party-cookies" /><category term="web-development" /><summary type="html"><![CDATA[A comprehensive troubleshooting guide explaining why Google Analytics (GA4) events fail to trigger in cross-origin iframes, with a proven solution using proper SameSite cookie settings to ensure accurate event tracking.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2025-02-19-fix-ga-events-in-iframe/0.png" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2025-02-19-fix-ga-events-in-iframe/0.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Integrating Vertex AI Search for Commerce Part 2: Collecting Realtime Data</title><link href="https://blog.williamchong.cloud/code/2025/02/01/vertex-ai-search-retail-2-collecting-real-time-events.html" rel="alternate" type="text/html" title="Integrating Vertex AI Search for Commerce Part 2: Collecting Realtime Data" /><published>2025-02-01T20:00:00+00:00</published><updated>2025-02-01T20:00:00+00:00</updated><id>https://blog.williamchong.cloud/code/2025/02/01/vertex-ai-search-retail-2-collecting-real-time-events</id><content type="html" 
xml:base="https://blog.williamchong.cloud/code/2025/02/01/vertex-ai-search-retail-2-collecting-real-time-events.html"><![CDATA[<p><img src="/assets/images/2025-02-02-vertex-ai-search-retail-2-collecting-real-time-events/0.png" alt="Google Cloud Vertex AI Search for Commerce" /></p>

<h2 id="background">Background</h2>

<p>In the <a href="/code/2024/11/16/vertex-ai-search-retail-1-importing-data.html">previous post</a>, we covered the initial steps to set up <a href="https://cloud.google.com/solutions/vertex-ai-search-commerce">Vertex AI Search for Commerce</a> (yes, they have changed their product name again, from Retail to Commerce). For some use cases, importing an existing product catalog and user data may be sufficient to build the data model.</p>

<p>However, if you lack enough existing data to train the model or need fresh data to confirm that your model is functioning as expected, you must feed user events to the Vertex AI Search service as they occur on your website. In this post, we will discuss how to integrate real-time event collection into your site.</p>

<h2 id="collecting-real-time-events">Collecting Real-time Events</h2>

<p>Google provides a <a href="https://cloud.google.com/retail/docs/record-events">documentation page</a> specifically on how to record user events. Generally, there are three methods: Google Tag Manager (GTM), the API, and the JavaScript Pixel. Each is covered in more detail on that page.</p>

<p>A list of recommended user events and their payloads can be found <a href="https://cloud.google.com/retail/docs/user-events#formats">here</a>.</p>

<p>If you are already using GA4 events, some of them can be converted automatically to retail user events, as detailed <a href="https://cloud.google.com/retail/docs/user-events#ga4-mapping">here</a>.</p>

<p>However, some retail user event types do not have a direct equivalent in GA4, which means you will need to either ignore those event types or send them manually through another method.</p>

<h3 id="the-easy-way---gtm--ga4">The Easy Way - GTM &amp; GA4</h3>

<p>If you already have GTM set up on your website, this should be the simplest method. The GTM console includes templates like <code class="language-plaintext highlighter-rouge">Variable - Ecommerce</code> and <code class="language-plaintext highlighter-rouge">Variable - Cloud Retail</code>. Refer to the GTM documentation for straightforward implementation.</p>

<p>Unfortunately, we do not have GTM set up on our website due to complications with running it alongside Content Security Policy (CSP), so I will skip this part.</p>

<h3 id="api---for-those-with-ga4-or-concerned-about-ad-blockers">API - For Those with GA4 or Concerned About Ad Blockers</h3>

<p>If you are familiar with implementing server-side events (like in <a href="https://developers.facebook.com/docs/marketing-api/conversions-api/">Meta’s Conversions API</a>), you can manually build your JSON payload and use the <a href="https://cloud.google.com/retail/docs/record-events#write"><code class="language-plaintext highlighter-rouge">userEvents.write</code> API to send events</a> to the Vertex AI Search service. This ensures that events are sent even if the user has an ad blocker or has disabled JavaScript, as long as the event corresponds to a server-side API call where you can hook in the request.</p>

<p>Additionally, if you are already using Google Analytics 4 (GA4), you can use a <code class="language-plaintext highlighter-rouge">prebuilt_rule</code> called <code class="language-plaintext highlighter-rouge">ga4_bq</code> to send a GA4 event payload directly to the <code class="language-plaintext highlighter-rouge">userEvents.write</code> API. While this is mentioned in the documentation, I’m unsure in what scenarios this would be applicable since you cannot directly retrieve a GA4 event payload from <code class="language-plaintext highlighter-rouge">gtm.js</code>. Please inspire me if you have ideas!</p>
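
<p>For the manual <code class="language-plaintext highlighter-rouge">userEvents.write</code> route described above, a server-side call could look roughly like the sketch below. It only builds the request so that it can be inspected before sending. The helper name <code class="language-plaintext highlighter-rouge">buildWriteRequest</code>, the project ID, and the access token are placeholders, the <code class="language-plaintext highlighter-rouge">global</code> location and <code class="language-plaintext highlighter-rouge">default_catalog</code> are assumptions, and the endpoint path should be checked against the linked documentation.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Builds the HTTP request for userEvents.write without sending it.
// projectId, accessToken and the event fields are placeholders.
function buildWriteRequest(projectId, accessToken, userEvent) {
  const url =
    "https://retail.googleapis.com/v2/projects/" +
    projectId +
    "/locations/global/catalogs/default_catalog/userEvents:write";
  return {
    url: url,
    options: {
      method: "POST",
      headers: {
        Authorization: "Bearer " + accessToken,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(userEvent),
    },
  };
}

// Usage on the server (Node.js 18+ ships a global fetch):
// const req = buildWriteRequest("my-project", token, {
//   eventType: "detail-page-view",
//   visitorId: "visitor-id",
//   eventTime: new Date().toISOString(),
//   productDetails: [{ product: { id: "123" } }],
// });
// await fetch(req.url, req.options);
</code></pre></div></div>

<p>Keeping the request builder separate from the network call makes it easy to log or unit test the payload, which helps when the service rejects an event because of a missing field.</p>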

<h3 id="javascript-pixel-tracking">JavaScript Pixel Tracking</h3>

<p>For me, the most suitable method appears to be using the JavaScript Pixel. Similar to GA4 or other pixel tracking services, I would simply include a script and call functions when a user event occurs.</p>

<p>The documentation provides a complete JavaScript example:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">var</span> <span class="nx">user_event</span> <span class="o">=</span> <span class="p">{</span>
  <span class="na">eventType</span><span class="p">:</span> <span class="dl">"</span><span class="s2">detail-page-view</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">visitorId</span><span class="p">:</span> <span class="dl">"</span><span class="s2">visitor-id</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">userInfo</span><span class="p">:</span> <span class="p">{</span>
    <span class="na">userId</span><span class="p">:</span> <span class="dl">"</span><span class="s2">user-id</span><span class="dl">"</span><span class="p">,</span>
  <span class="p">},</span>
  <span class="na">attributionToken</span><span class="p">:</span> <span class="dl">"</span><span class="s2">attribution-token</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">experimentIds</span><span class="p">:</span> <span class="dl">"</span><span class="s2">experiment-id</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">productDetails</span><span class="p">:</span> <span class="p">[</span>
    <span class="p">{</span>
      <span class="na">product</span><span class="p">:</span> <span class="p">{</span> <span class="na">id</span><span class="p">:</span> <span class="dl">"</span><span class="s2">123</span><span class="dl">"</span> <span class="p">},</span>
    <span class="p">},</span>
  <span class="p">],</span>
<span class="p">};</span>

<span class="kd">var</span> <span class="nx">_gre</span> <span class="o">=</span> <span class="nx">_gre</span> <span class="o">||</span> <span class="p">[];</span>
<span class="c1">// Credentials for project.</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">apiKey</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">api-key</span><span class="dl">"</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">logEvent</span><span class="dl">"</span><span class="p">,</span> <span class="nx">user_event</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">projectId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">project-id</span><span class="dl">"</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">locationId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">global</span><span class="dl">"</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">catalogId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">default_catalog</span><span class="dl">"</span><span class="p">]);</span>

<span class="p">(</span><span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
  <span class="kd">var</span> <span class="nx">gre</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="dl">"</span><span class="s2">script</span><span class="dl">"</span><span class="p">);</span>
  <span class="nx">gre</span><span class="p">.</span><span class="nx">type</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">text/javascript</span><span class="dl">"</span><span class="p">;</span>
  <span class="nx">gre</span><span class="p">.</span><span class="k">async</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
  <span class="nx">gre</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">https://www.gstatic.com/retail/v2_event.js</span><span class="dl">"</span><span class="p">;</span>
  <span class="kd">var</span> <span class="nx">s</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">getElementsByTagName</span><span class="p">(</span><span class="dl">"</span><span class="s2">script</span><span class="dl">"</span><span class="p">)[</span><span class="mi">0</span><span class="p">];</span>
  <span class="nx">s</span><span class="p">.</span><span class="nx">parentNode</span><span class="p">.</span><span class="nx">insertBefore</span><span class="p">(</span><span class="nx">gre</span><span class="p">,</span> <span class="nx">s</span><span class="p">);</span>
<span class="p">})();</span>
</code></pre></div></div>

<p>While this seems straightforward, I had a bad feeling looking at it. The example covers configuration, event formatting, and script loading, but why is the script tag created only after all the event formatting and function calls? And what function from <code class="language-plaintext highlighter-rouge">window._gre</code> should I invoke after the script has loaded?</p>

<h2 id="retail-pixel---it-doesnt-work-as-expected">Retail Pixel - It Doesn’t Work as Expected</h2>

<p>From the example code above, it seems that the retail pixel behaves like GTM or any other pixel tracking service. My expectations were:</p>

<ol>
  <li>A global variable under <code class="language-plaintext highlighter-rouge">window</code> is created as an array, in this case <code class="language-plaintext highlighter-rouge">_gre</code>.</li>
  <li>The array stores configuration and function calls before the script loads.</li>
  <li>The actual script loads asynchronously. Once loaded, it reads the <code class="language-plaintext highlighter-rouge">_gre</code> array and executes any configuration or function calls stored in it.</li>
  <li>Finally, <code class="language-plaintext highlighter-rouge">_gre</code> is replaced by a new object with the same name that implements functions for interacting with the retail API.</li>
</ol>

<p>However, parts of this process are not shown in the example code, particularly what to do after the script is loaded. One might expect to either call a new function like <code class="language-plaintext highlighter-rouge">_gre.logEvent</code>, or continue using <code class="language-plaintext highlighter-rouge">_gre.push</code> to send user events via the pixel. Unfortunately, such functionality does not exist, and the example code appears to encompass the entire functionality of the retail pixel.</p>
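
<p>For reference, the queue pattern I expected can be sketched roughly as follows. This is a hypothetical loader written for illustration, not the retail pixel's code: <code class="language-plaintext highlighter-rouge">installCommandQueue</code> and the handler names are my own assumptions.</p>

```javascript
// Hypothetical sketch of a GTM-style command queue; not the real pixel code.
function installCommandQueue(queue, handlers) {
  const run = (command) => {
    const [name, value] = command;
    if (typeof handlers[name] === "function") {
      handlers[name](value);
    }
  };

  // 1. Replay everything queued before the script loaded, emptying the queue.
  queue.splice(0).forEach(run);

  // 2. Hook push so later commands run immediately instead of piling up.
  queue.push = (...commands) => {
    commands.forEach(run);
    return queue.length;
  };

  return queue;
}

// Usage with a fake queue and illustrative handlers.
const config = {};
const sent = [];
const queue = [
  ["projectId", "project-id"],
  ["logEvent", { eventType: "home-page-view" }],
];

installCommandQueue(queue, {
  projectId: (value) => { config.projectId = value; },
  logEvent: (event) => { sent.push({ ...config, event }); },
});

// Pushed after "load": handled right away, with the stored config applied.
queue.push(["logEvent", { eventType: "detail-page-view" }]);
```

<p>With a loader like this, configuration is remembered and every later <code class="language-plaintext highlighter-rouge">push</code> is handled on the spot, which is the behavior the sections below find missing.</p>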

<h3 id="configuration-values-are-not-stored-in-pixel">Configuration Values Are Not Stored in Pixel</h3>

<p>Typically, one would expect configuration values like:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">'</span><span class="s1">apiKey</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">api-key</span><span class="dl">'</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">'</span><span class="s1">projectId</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">project-id</span><span class="dl">'</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">'</span><span class="s1">locationId</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">global</span><span class="dl">'</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">'</span><span class="s1">catalogId</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">default_catalog</span><span class="dl">'</span><span class="p">]);</span>
</code></pre></div></div>

<p>to be stored in the pixel object. Any subsequent calls to <code class="language-plaintext highlighter-rouge">logEvent</code> (or its <code class="language-plaintext highlighter-rouge">_gre.push</code> equivalent) would then be able to read these values. However, these values are not retained in the pixel object, meaning they must be provided each time a user event is sent.</p>

<h3 id="_grepush-is-not-hooked-by-the-script"><code class="language-plaintext highlighter-rouge">_gre.push</code> Is Not Hooked by the Script</h3>

<p>User events are sent by invoking <code class="language-plaintext highlighter-rouge">_gre.push(['logEvent', user_event])</code>. This works if <code class="language-plaintext highlighter-rouge">_gre.push</code> is called before the pixel JavaScript is loaded. However, calls made after the script has loaded have no effect, because the script does not hook <code class="language-plaintext highlighter-rouge">_gre.push</code>. GTM, by contrast, hooks the <code class="language-plaintext highlighter-rouge">.push</code> function of its data layer variables, so pushing something into the data layer triggers GTM events. Unfortunately, this is not the case for the retail pixel; you must manually trigger another function to send the user event after the script is loaded.</p>

<h3 id="_grelogevent-is-not-a-function"><code class="language-plaintext highlighter-rouge">_gre.logEvent</code> Is Not a Function</h3>

<p>Continuing from the previous point, if <code class="language-plaintext highlighter-rouge">.push</code> does not work, what would the trigger function be? It appears that <code class="language-plaintext highlighter-rouge">logEvent</code> is the name of the function to call. However, <code class="language-plaintext highlighter-rouge">_gre</code> is still a plain array after the script loads, so <code class="language-plaintext highlighter-rouge">_gre.logEvent</code> is not a function, and calling it results in an error.</p>

<h2 id="simple-analysis-of-the-retail-pixel">Simple Analysis of the Retail Pixel</h2>

<p>To resolve these mysteries, one must examine <a href="https://www.gstatic.com/retail/v2_event.js">the source code of the retail pixel</a>.</p>

<p>It turns out the function is indeed called <code class="language-plaintext highlighter-rouge">logEvent</code>, but it is encapsulated within a new variable called <code class="language-plaintext highlighter-rouge">cloud_retail</code>. This <code class="language-plaintext highlighter-rouge">cloud_retail.logEvent</code> function takes the entire <code class="language-plaintext highlighter-rouge">_gre</code> as a parameter, and is only invoked once after the script has loaded!</p>

<p>To send a user event after the script is loaded, you must call <code class="language-plaintext highlighter-rouge">_gre.push(['logEvent', user_event])</code> and then call <code class="language-plaintext highlighter-rouge">cloud_retail.logEvent(_gre)</code> yourself.</p>

<p>However, note that the <code class="language-plaintext highlighter-rouge">_gre</code> array is not cleared after the script loads, so if there are existing events in <code class="language-plaintext highlighter-rouge">_gre</code> before the script loads, they will also be sent if you call <code class="language-plaintext highlighter-rouge">logEvent(_gre)</code> again, resulting in duplicate events. For <code class="language-plaintext highlighter-rouge">cloud_retail.logEvent(_gre)</code> to function correctly, you must clear <code class="language-plaintext highlighter-rouge">_gre</code> after invoking it to prevent this.</p>

<p>Another issue arises: <code class="language-plaintext highlighter-rouge">_gre</code> or the pixel does not store configuration values like project ID or catalog ID, meaning these values must be passed again to <code class="language-plaintext highlighter-rouge">_gre</code>.</p>

<p>Thus, to log events using <code class="language-plaintext highlighter-rouge">_gre</code> and <code class="language-plaintext highlighter-rouge">cloud_retail.logEvent</code> naively, the code would look like this:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">_gre</span> <span class="o">=</span> <span class="p">[];</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">apiKey</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">api-key</span><span class="dl">"</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">projectId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">project-id</span><span class="dl">"</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">locationId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">global</span><span class="dl">"</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">catalogId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">default_catalog</span><span class="dl">"</span><span class="p">]);</span>
<span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">logEvent</span><span class="dl">"</span><span class="p">,</span> <span class="nx">user_event</span><span class="p">]);</span>
<span class="nx">cloud_retail</span><span class="p">.</span><span class="nx">logEvent</span><span class="p">(</span><span class="nx">_gre</span><span class="p">);</span>
</code></pre></div></div>
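
<p>The naive snippet can also be wrapped in a small helper that empties the queue in place after each send, so earlier events are not sent twice. This is only a sketch: <code class="language-plaintext highlighter-rouge">flushEvents</code> is a hypothetical helper, and <code class="language-plaintext highlighter-rouge">cloud_retail</code> is replaced by a stand-in (which I assume accepts several queued events at once) so the example runs outside a browser.</p>

```javascript
// Stand-in for the real pixel so this sketch runs outside a browser.
// On a real page, cloud_retail is defined by v2_event.js.
const received = [];
const cloud_retail = {
  logEvent(commands) {
    commands
      .filter(([name]) => name === "logEvent")
      .forEach(([, event]) => received.push(event));
  },
};

// Config is kept separately because the pixel does not remember it.
const config = [
  ["apiKey", "api-key"],
  ["projectId", "project-id"],
  ["locationId", "global"],
  ["catalogId", "default_catalog"],
];

// Hypothetical helper: send queued events, then empty the queue in place.
function flushEvents(queue) {
  if (queue.length === 0) return;
  cloud_retail.logEvent([...config, ...queue]);
  queue.length = 0; // clear without reassigning the global
}

const _gre = [];
_gre.push(["logEvent", { eventType: "home-page-view" }]);
flushEvents(_gre); // sends the first event

_gre.push(["logEvent", { eventType: "detail-page-view" }]);
flushEvents(_gre); // sends only the new event, no duplicate
```

<p>Setting <code class="language-plaintext highlighter-rouge">length = 0</code> clears the existing array instead of replacing it, so any other code holding a reference to <code class="language-plaintext highlighter-rouge">_gre</code> keeps working.</p>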

<h2 id="hack-or-actual-implementation">Hack, or “actual implementation”</h2>

<p>A more practical solution would be to just forget <code class="language-plaintext highlighter-rouge">_gre</code>, store these config values in a separate object, and pass it directly to <code class="language-plaintext highlighter-rouge">cloud_retail.logEvent</code> every time an event is sent.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">logEvent</span><span class="p">(</span><span class="nx">user_event</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// get api-key, project-id, global and default_catalog from somewhere</span>
  <span class="nx">cloud_retail</span><span class="p">.</span><span class="nx">logEvent</span><span class="p">([</span>
    <span class="p">[</span><span class="dl">"</span><span class="s2">apiKey</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">api-key</span><span class="dl">"</span><span class="p">],</span>
    <span class="p">[</span><span class="dl">"</span><span class="s2">projectId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">project-id</span><span class="dl">"</span><span class="p">],</span>
    <span class="p">[</span><span class="dl">"</span><span class="s2">locationId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">global</span><span class="dl">"</span><span class="p">],</span>
    <span class="p">[</span><span class="dl">"</span><span class="s2">catalogId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">default_catalog</span><span class="dl">"</span><span class="p">],</span>
    <span class="p">[</span><span class="dl">"</span><span class="s2">logEvent</span><span class="dl">"</span><span class="p">,</span> <span class="nx">user_event</span><span class="p">],</span>
  <span class="p">]);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Consequently, I developed this class for my Nuxt project:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nx">GoogleRetailPixel</span> <span class="p">{</span>
  <span class="nx">userId</span><span class="p">;</span>
  <span class="nx">visitorId</span><span class="p">;</span>
  <span class="nx">apiKey</span><span class="p">;</span>
  <span class="nx">projectId</span><span class="p">;</span>

  <span class="kd">constructor</span><span class="p">(</span><span class="nx">apiKey</span><span class="p">,</span> <span class="nx">projectId</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">this</span><span class="p">.</span><span class="nx">apiKey</span> <span class="o">=</span> <span class="nx">apiKey</span><span class="p">;</span>
    <span class="k">this</span><span class="p">.</span><span class="nx">projectId</span> <span class="o">=</span> <span class="nx">projectId</span><span class="p">;</span>
    <span class="nb">window</span><span class="p">.</span><span class="nx">_gre</span> <span class="o">=</span> <span class="nb">window</span><span class="p">.</span><span class="nx">_gre</span> <span class="o">||</span> <span class="p">[];</span>
    <span class="nb">window</span><span class="p">.</span><span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">apiKey</span><span class="dl">"</span><span class="p">,</span> <span class="nx">apiKey</span><span class="p">]);</span>
    <span class="nb">window</span><span class="p">.</span><span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">projectId</span><span class="dl">"</span><span class="p">,</span> <span class="nx">projectId</span><span class="p">]);</span>
    <span class="nb">window</span><span class="p">.</span><span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">locationId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">global</span><span class="dl">"</span><span class="p">]);</span>
    <span class="nb">window</span><span class="p">.</span><span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">catalogId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">default_catalog</span><span class="dl">"</span><span class="p">]);</span>
  <span class="p">}</span>

  <span class="nx">setUserId</span><span class="p">(</span><span class="nx">userId</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">this</span><span class="p">.</span><span class="nx">userId</span> <span class="o">=</span> <span class="nx">userId</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="nx">setVisitorId</span><span class="p">(</span><span class="nx">visitorId</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">this</span><span class="p">.</span><span class="nx">visitorId</span> <span class="o">=</span> <span class="nx">visitorId</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="nx">logEvent</span><span class="p">(</span><span class="nx">eventType</span><span class="p">,</span> <span class="nx">payload</span> <span class="o">=</span> <span class="p">{},</span> <span class="p">{</span> <span class="nx">attributionToken</span><span class="p">,</span> <span class="nx">experimentIds</span> <span class="p">}</span> <span class="o">=</span> <span class="p">{})</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">visitorId</span><span class="p">)</span> <span class="p">{</span>
      <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="kd">const</span> <span class="nx">event</span> <span class="o">=</span> <span class="p">{</span>
      <span class="nx">eventType</span><span class="p">,</span>
      <span class="nx">attributionToken</span><span class="p">,</span>
      <span class="nx">experimentIds</span><span class="p">,</span>
      <span class="na">visitorId</span><span class="p">:</span> <span class="k">this</span><span class="p">.</span><span class="nx">visitorId</span><span class="p">,</span>
      <span class="na">userInfo</span><span class="p">:</span> <span class="k">this</span><span class="p">.</span><span class="nx">userId</span>
        <span class="p">?</span> <span class="p">{</span>
            <span class="na">userId</span><span class="p">:</span> <span class="k">this</span><span class="p">.</span><span class="nx">userId</span><span class="p">,</span>
          <span class="p">}</span>
        <span class="p">:</span> <span class="kc">undefined</span><span class="p">,</span>
      <span class="p">...</span><span class="nx">payload</span><span class="p">,</span>
    <span class="p">};</span>
    <span class="c1">// HACK: cloud_retail does not replace _gre on init</span>
    <span class="c1">// cloud_retail only calls logEvent once on _gre and</span>
    <span class="c1">// it does not even clear _gre after that</span>
    <span class="k">if</span> <span class="p">(</span><span class="nb">window</span><span class="p">.</span><span class="nx">cloud_retail</span><span class="p">)</span> <span class="p">{</span>
      <span class="nb">window</span><span class="p">.</span><span class="nx">cloud_retail</span><span class="p">.</span><span class="nx">logEvent</span><span class="p">([</span>
        <span class="p">[</span><span class="dl">"</span><span class="s2">apiKey</span><span class="dl">"</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">apiKey</span><span class="p">],</span>
        <span class="p">[</span><span class="dl">"</span><span class="s2">projectId</span><span class="dl">"</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">projectId</span><span class="p">],</span>
        <span class="p">[</span><span class="dl">"</span><span class="s2">locationId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">global</span><span class="dl">"</span><span class="p">],</span>
        <span class="p">[</span><span class="dl">"</span><span class="s2">catalogId</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">default_catalog</span><span class="dl">"</span><span class="p">],</span>
        <span class="p">[</span><span class="dl">"</span><span class="s2">logEvent</span><span class="dl">"</span><span class="p">,</span> <span class="nx">event</span><span class="p">],</span>
      <span class="p">]);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="nb">window</span><span class="p">.</span><span class="nx">_gre</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="dl">"</span><span class="s2">logEvent</span><span class="dl">"</span><span class="p">,</span> <span class="nx">event</span><span class="p">]);</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With this implementation, I push events into <code class="language-plaintext highlighter-rouge">_gre</code> if the <code class="language-plaintext highlighter-rouge">cloud_retail</code> object is not yet loaded, and call <code class="language-plaintext highlighter-rouge">cloud_retail.logEvent</code> if it is. This prevents duplicate events from being sent while allowing the asynchronous loading of the retail pixel.</p>

<p>I have also wrapped this functionality <a href="https://www.npmjs.com/package/nuxt-gre-pixel-module">as a Nuxt 2 library</a>, just in case I need to use it in other projects.</p>

<p>Since the project requiring this is still running Nuxt 2, I haven’t written a Nuxt 3 version yet, but it should be straightforward and may happen if we migrate to Nuxt 3.</p>

<h2 id="confirming-event-transmission">Confirming Event Transmission</h2>

<p><img src="/assets/images/2025-02-02-vertex-ai-search-retail-2-collecting-real-time-events/0.png" alt="User Events Integration page" />
To confirm that the events are being sent, you can check the User Events Integration page in the Vertex AI Search console, specifically within the Event Tab on the Data Page. This section displays real-time event counts from the specified date to the present. You can also monitor the percentage of unjoined events to ensure you are sending correct product IDs or to determine if the product catalog needs updating or importing.</p>

<h2 id="conclusion">Conclusion</h2>

<p>It seems that using the retail pixel is not as straightforward as I initially thought. The pixel has limited functionality and does not appear as production-ready as other Google products or similar services named “pixel”. However, with a few tweaks and hacks, it can still function as expected.</p>

<p><img src="/assets/images/2025-02-02-vertex-ai-search-retail-2-collecting-real-time-events/1.png" alt="Data Error when training model" /></p>

<p>Now that we have real-time events being sent to the Vertex AI Search service, I hope that my model requirements will be met, allowing me to start training my model. However, even if I meet the data requirements stated on the model training page, the training could still fail midway due to insufficient data.</p>

<p>As this issue is challenging for me to debug as a Google Cloud user, I will likely wait to see if more data collected over time resolves the problem or if I can find the time to contact someone from Google Cloud to investigate. Stay tuned for any new posts in case I have further findings.</p>]]></content><author><name>William Chong</name></author><category term="code" /><category term="google-cloud" /><category term="vertex-ai" /><category term="retail-search" /><category term="commerce" /><category term="javascript-pixel" /><category term="tracking-implementation" /><category term="nuxt" /><category term="event-collection" /><category term="product-recommendation" /><category term="real-time-data" /><category term="e-commerce" /><summary type="html"><![CDATA[Learn how to implement real-time user event collection for Google Vertex AI Search for Commerce, with practical solutions to the challenges of the JavaScript Pixel tracking method and a ready-to-use implementation for Nuxt applications.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2025-02-02-vertex-ai-search-retail-2-collecting-real-time-events/0.png" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2025-02-02-vertex-ai-search-retail-2-collecting-real-time-events/0.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Integrating Vertex AI Search for commerce Part 1: Importing GA4 Data</title><link href="https://blog.williamchong.cloud/code/2024/11/16/vertex-ai-search-retail-1-importing-data.html" rel="alternate" type="text/html" title="Integrating Vertex AI Search for commerce Part 1: Importing GA4 Data" /><published>2024-11-16T08:00:00+00:00</published><updated>2024-11-16T08:00:00+00:00</updated><id>https://blog.williamchong.cloud/code/2024/11/16/vertex-ai-search-retail-1-importing-data</id><content type="html" 
xml:base="https://blog.williamchong.cloud/code/2024/11/16/vertex-ai-search-retail-1-importing-data.html"><![CDATA[<p><img src="/assets/images/2024-11-16-vertex-ai-search-retail-1-importing-data/cover.png" alt="Google Cloud Vertex AI Search for Commerce" /></p>

<h2 id="background">Background</h2>

<p>In one of our e-commerce projects, a feature we have always wanted is personalized item recommendations for users. Since we don’t have a dedicated data scientist, we don’t have the resources to build our own model in-house, and were looking for a suitable managed cloud service for this.</p>

<p>We looked into <a href="https://aws.amazon.com/personalize/">Amazon Personalize</a>, which seems easy and promising, but unfortunately, we didn’t have time to set up a new data pipeline just for it, and never got as far as setting up models.</p>

<p>Recently, I came across Google Cloud’s <a href="https://cloud.google.com/solutions/retail-product-discovery">Vertex AI Search for Commerce</a>, which seems to have seamless integration with <a href="https://www.google.com/retail/">Google Merchant Center</a> and <a href="https://marketingplatform.google.com/about/analytics/">Google Analytics</a> (GA4). This lowers integration costs, so I decided to give it a try. As it turns out, it is not that easy.</p>

<h2 id="importing-historical-data">Importing Historical Data</h2>

<p>To train a recommendation model in Vertex AI Search, we need data, including products and user events. User events must include proper IDs for products so that the model can learn the relationship between products. Events that contain invalid product IDs are called unjoined events, and will be ignored by the model.</p>
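
<p>Before importing anything, a rough local check for unjoined events can be sketched as below. The event shape (a <code class="language-plaintext highlighter-rouge">productDetails</code> array with <code class="language-plaintext highlighter-rouge">product.id</code>) follows the user event format used in this series; <code class="language-plaintext highlighter-rouge">unjoinedRate</code> is a hypothetical helper, and it does not claim to reproduce how Vertex AI Search computes its own unjoined percentage.</p>

```javascript
// Share of events that reference at least one product ID missing from the catalog.
// Events without productDetails (e.g. page views) are counted as joined here.
function unjoinedRate(events, catalogIds) {
  const known = new Set(catalogIds);
  const unjoined = events.filter((event) =>
    (event.productDetails || []).some((detail) => !known.has(detail.product.id))
  );
  return events.length === 0 ? 0 : unjoined.length / events.length;
}

// Usage: one of the two events points at a product that is not in the catalog.
const sampleEvents = [
  { eventType: "detail-page-view", productDetails: [{ product: { id: "123" } }] },
  { eventType: "detail-page-view", productDetails: [{ product: { id: "999" } }] },
];
const rate = unjoinedRate(sampleEvents, ["123"]); // 0.5
```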

<h2 id="importing-product-catalog">Importing Product Catalog</h2>

<p>There are a few ways to import product catalogs into Vertex AI Search. Here, we will cover two of them. Note that there are <a href="https://cloud.google.com/retail/docs/upload-catalog#import-bp">different limitations</a> for each import method.</p>

<h3 id="importing-product-catalog-from-google-merchant-center">Importing Product Catalog from Google Merchant Center</h3>

<p>If you already have Google Merchant Center set up either for shopping ads or Google Ads, you can easily <a href="https://cloud.google.com/retail/docs/upload-catalog#mc">import the product catalog from there</a>. This is the easiest way to import a product catalog, especially when you have <a href="https://developers.google.com/search/docs/appearance/structured-data/product">product structured data</a> already set up in your e-commerce site. Google Merchant Center will fetch all the products automatically from your website without any additional import procedure.</p>

<p>Sadly, the last time I used Google Merchant Center, all my products were disapproved because they were considered <a href="https://support.google.com/merchants/answer/6150006">unsupported shopping content</a>. After a while, they were completely removed from the product list, even though I didn’t want shopping ads, and they never reappeared. So I can’t use this method.</p>

<h3 id="importing-product-catalog-via-api">Importing Product Catalog via API</h3>

<p>For developers, <a href="https://cloud.google.com/retail/docs/upload-catalog#inline">importing via API</a> is the most flexible way to import a product catalog. The import request looks like this:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1">// projectNumber, accessToken and products are placeholders you supply;</span>
    <span class="c1">// accessToken can come from `gcloud auth print-access-token`.</span>
    <span class="kd">const</span> <span class="p">{</span> <span class="nx">data</span> <span class="p">}</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">axios</span><span class="p">.</span><span class="nx">post</span><span class="p">(</span><span class="s2">`https://retail.googleapis.com/v2/projects/</span><span class="p">${</span><span class="nx">projectNumber</span><span class="p">}</span><span class="s2">/locations/global/catalogs/default_catalog/branches/0/products:import`</span><span class="p">,</span> <span class="p">{</span>
      <span class="dl">"</span><span class="s2">inputConfig</span><span class="dl">"</span><span class="p">:</span> <span class="p">{</span>
        <span class="dl">"</span><span class="s2">productInlineSource</span><span class="dl">"</span><span class="p">:</span> <span class="p">{</span>
          <span class="dl">"</span><span class="s2">products</span><span class="dl">"</span><span class="p">:</span> <span class="nx">products</span><span class="p">,</span>
        <span class="p">}</span>
      <span class="p">}</span>
    <span class="p">},</span> <span class="p">{</span>
      <span class="na">headers</span><span class="p">:</span> <span class="p">{</span>
        <span class="dl">'</span><span class="s1">Authorization</span><span class="dl">'</span><span class="p">:</span> <span class="s2">`Bearer </span><span class="p">${</span><span class="nx">accessToken</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span>
      <span class="p">},</span>
    <span class="p">});</span>
</code></pre></div></div>

<p>To get your project number, use the following command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcloud projects list <span class="se">\</span>
<span class="nt">--filter</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>gcloud config get-value project<span class="si">)</span><span class="s2">"</span> <span class="se">\</span>
<span class="nt">--format</span><span class="o">=</span><span class="s2">"value(PROJECT_NUMBER)"</span>
</code></pre></div></div>

<p>However, there is one extra thing to add to make the API work. If you encounter the following error:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"error"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"code"</span><span class="p">:</span><span class="w"> </span><span class="mi">403</span><span class="p">,</span><span class="w">
    </span><span class="nl">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Your application is authenticating by using local Application Default Credentials. The retail.googleapis.com API requires a quota project, which is not set by default. To learn how to set your quota project, see https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds ."</span><span class="p">,</span><span class="w">
    </span><span class="nl">"status"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PERMISSION_DENIED"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Then you need to add the following header to your request:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="nx">headers</span><span class="p">:</span> <span class="p">{</span>
    <span class="dl">'</span><span class="s1">x-goog-user-project</span><span class="dl">'</span><span class="p">:</span> <span class="nx">$</span><span class="p">{</span><span class="nx">your</span><span class="o">-</span><span class="nx">project</span><span class="o">-</span><span class="nx">id</span><span class="p">},</span>
    <span class="p">...</span>
  <span class="p">}</span>
</code></pre></div></div>
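
<p>The same pair of headers can be built in Go as well. This is only a minimal sketch: <code class="language-plaintext highlighter-rouge">importHeaders</code> is a hypothetical helper name, the access token is assumed to be obtained separately (for example from <code class="language-plaintext highlighter-rouge">gcloud auth print-access-token</code>), and the quota project is assumed to be your project ID.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "net/http"

// importHeaders builds the headers for a Retail API call made with user credentials.
// accessToken and quotaProjectID are supplied by the caller (hypothetical helper).
func importHeaders(accessToken, quotaProjectID string) http.Header {
  h := http.Header{}
  h.Set("Authorization", "Bearer "+accessToken)
  // Names the project that quota is charged to, which avoids the 403 shown above.
  h.Set("x-goog-user-project", quotaProjectID)
  return h
}
</code></pre></div></div>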

<h2 id="importing-user-events">Importing User Events</h2>

<p>Like the product catalog, there are a few ways to <a href="https://cloud.google.com/retail/docs/import-user-events">import user events</a> into Vertex AI Search. We will only cover the Google Analytics 4 (GA4) data import here, since it requires the least effort for sites that already have GA4 set up.</p>

<h3 id="importing-ga4-data-from-bigquery">Importing GA4 Data from BigQuery</h3>

<p>Before we can import Google Analytics 4 (GA4) events into Vertex AI Search, we need to have the data in BigQuery. Follow <a href="https://cloud.google.com/retail/docs/import-user-events#bq-ga4">the guide</a> to set up the GA4 export to BigQuery. Normally, GA4 events are exported daily to a table named <code class="language-plaintext highlighter-rouge">analytics_123456789.events_20241116</code>, where <code class="language-plaintext highlighter-rouge">123456789</code> is your GA4 property ID and <code class="language-plaintext highlighter-rouge">20241116</code> is the date of the export.</p>

<p>Once the GA4 data is in BigQuery, we can import the table using the Vertex AI Search console. However, the web UI console can only import one table at a time. Since GA4 exports are split into one table per day, it would be tedious to import them one by one.</p>

<p>One simple solution is to merge all the daily tables into a single table and import that. However, this is only feasible if the historical data is small. A sample SQL query is as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="nv">`analytics_123456789.combined_events`</span> <span class="k">AS</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nv">`analytics_123456789.events_*`</span> <span class="k">WHERE</span> <span class="n">_TABLE_SUFFIX</span> <span class="k">BETWEEN</span> <span class="s1">'20230301'</span> <span class="k">AND</span> <span class="s1">'20230331'</span>

</code></pre></div></div>
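
<p>When the history is too large to merge, the daily tables have to be handled one by one, for example by a script that triggers one import per table. As a rough sketch of the first step, the Go function below enumerates the daily table names for a date range. It assumes the <code class="language-plaintext highlighter-rouge">events_YYYYMMDD</code> naming described above; <code class="language-plaintext highlighter-rouge">dailyTables</code> is a hypothetical helper, and the import call itself is left out.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "time"

// dailyTables lists the GA4 export tables for every day from first to last (inclusive),
// assuming tables are named analytics_PROPERTYID.events_YYYYMMDD.
func dailyTables(propertyID string, first, last time.Time) []string {
  var tables []string
  for d := first; !d.After(last); d = d.AddDate(0, 0, 1) {
    tables = append(tables, "analytics_"+propertyID+".events_"+d.Format("20060102"))
  }
  return tables
}
</code></pre></div></div>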

<h3 id="ga4-event-mapping">GA4 Event Mapping</h3>

<p>Many user events required in Vertex AI have a <a href="https://cloud.google.com/retail/docs/user-events#ga4-mapping">direct mapping</a> to GA4 events, especially e-commerce events.</p>

<p>Search-related events are trickier, but as long as <code class="language-plaintext highlighter-rouge">view_item_list</code> is set up properly with the <code class="language-plaintext highlighter-rouge">search_term</code> param set, it should be fine. Another way is to use <code class="language-plaintext highlighter-rouge">view_search_results</code>, which is an automated event if you have enabled GA4’s <a href="https://support.google.com/analytics/answer/9216061">enhanced measurement</a>. However, this requires the search term to be in the URL query string with predefined keys.</p>

<h3 id="what-about-home-page-view">What about <code class="language-plaintext highlighter-rouge">home-page-view</code>?</h3>

<p>There is no GA4 event that maps directly to the retail user event <code class="language-plaintext highlighter-rouge">home-page-view</code>. During import, a <code class="language-plaintext highlighter-rouge">page_view</code> with a path of <code class="language-plaintext highlighter-rouge">/</code> is automatically used as a substitute. This is not ideal if your homepage is not at <code class="language-plaintext highlighter-rouge">/</code>. For example, if your site has multiple locales, the home page might have paths like <code class="language-plaintext highlighter-rouge">/en</code> and <code class="language-plaintext highlighter-rouge">/zh</code>.</p>

<p>To import these events correctly, we have to query for them ourselves and write them into a separate table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="nv">`analytics_123456789.ga_homepage`</span> <span class="p">(</span>
  <span class="n">eventType</span> <span class="n">STRING</span><span class="p">,</span>
  <span class="n">visitorId</span> <span class="n">STRING</span><span class="p">,</span>
  <span class="n">userId</span> <span class="n">STRING</span><span class="p">,</span>
  <span class="n">eventTime</span> <span class="n">STRING</span>
<span class="p">);</span>

<span class="k">INSERT</span> <span class="k">INTO</span> <span class="nv">`analytics_123456789.ga_homepage`</span> <span class="p">(</span><span class="n">eventType</span><span class="p">,</span> <span class="n">visitorId</span><span class="p">,</span> <span class="n">userId</span><span class="p">,</span> <span class="n">eventTime</span><span class="p">)</span>
<span class="k">SELECT</span>
  <span class="nv">"home-page-view"</span> <span class="k">as</span> <span class="n">eventType</span><span class="p">,</span>
  <span class="n">user_pseudo_id</span> <span class="k">as</span> <span class="n">visitorId</span><span class="p">,</span>
  <span class="n">user_id</span> <span class="k">as</span> <span class="n">userId</span><span class="p">,</span>
  <span class="k">CAST</span><span class="p">(</span><span class="n">FORMAT_TIMESTAMP</span><span class="p">(</span><span class="nv">"%Y-%m-%dT%H:%M:%SZ"</span><span class="p">,</span><span class="n">timestamp_seconds</span><span class="p">(</span><span class="k">CAST</span> <span class="p">((</span><span class="n">event_timestamp</span><span class="o">/</span><span class="mi">1000000</span><span class="p">)</span> <span class="k">as</span> <span class="n">int64</span><span class="p">)))</span> <span class="k">as</span> <span class="n">STRING</span><span class="p">)</span> <span class="k">AS</span> <span class="n">eventTime</span>
<span class="k">FROM</span> <span class="nv">`analytics_123456789.combined_events`</span>
<span class="k">WHERE</span> <span class="n">event_name</span> <span class="o">=</span> <span class="s1">'page_view'</span> <span class="k">AND</span> <span class="nv">`event_params`</span><span class="p">[</span><span class="n">SAFE_OFFSET</span><span class="p">(</span><span class="mi">0</span><span class="p">)].</span><span class="nv">`key`</span> <span class="o">=</span> <span class="s1">'page_path'</span> <span class="k">AND</span> <span class="p">(</span><span class="nv">`event_params`</span><span class="p">[</span><span class="n">SAFE_OFFSET</span><span class="p">(</span><span class="mi">0</span><span class="p">)].</span><span class="nv">`value`</span><span class="p">.</span><span class="nv">`string_value`</span> <span class="o">=</span> <span class="s1">'/zh-Hant'</span> <span class="k">OR</span> <span class="nv">`event_params`</span><span class="p">[</span><span class="n">SAFE_OFFSET</span><span class="p">(</span><span class="mi">0</span><span class="p">)].</span><span class="nv">`value`</span><span class="p">.</span><span class="nv">`string_value`</span> <span class="o">=</span> <span class="s1">'/en'</span><span class="p">)</span>
</code></pre></div></div>
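
<p>To make the row-level mapping easier to follow, here is the same transformation sketched in Go for a single row. The names <code class="language-plaintext highlighter-rouge">UserEvent</code> and <code class="language-plaintext highlighter-rouge">toHomePageView</code> are hypothetical, and the set of homepage paths is an assumption you would adapt to your own site. One small difference from the SQL: this sketch drops sub-second precision, whereas the <code class="language-plaintext highlighter-rouge">CAST</code> to <code class="language-plaintext highlighter-rouge">int64</code> rounds to the nearest second.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "time"

// UserEvent mirrors the four columns of the ga_homepage table above.
type UserEvent struct {
  EventType string
  VisitorID string
  UserID    string
  EventTime string
}

// toHomePageView converts one GA4 page_view row into a home-page-view event.
// It reports false when pagePath is not one of the homepage paths.
func toHomePageView(pagePath, pseudoID, userID string, timestampMicros int64, homePaths map[string]bool) (UserEvent, bool) {
  if !homePaths[pagePath] {
    return UserEvent{}, false
  }
  // GA4 event_timestamp is in microseconds since the Unix epoch.
  t := time.UnixMicro(timestampMicros).UTC()
  return UserEvent{
    EventType: "home-page-view",
    VisitorID: pseudoID,
    UserID:    userID,
    EventTime: t.Format("2006-01-02T15:04:05Z"),
  }, true
}
</code></pre></div></div>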

<h2 id="import-ordering-of-event-and-product-catalog-matters">Import ordering of event and product catalog matters!</h2>

<p>Note that if you import events before the product catalog, the events will remain unjoined even if the catalog you import afterwards contains the correct product IDs. This is because events are joined to products when the user events are ingested, not when the catalog is updated.</p>

<p>In this case, you would have to trigger a <a href="https://cloud.google.com/retail/docs/manage-user-events#rejoin-event">user event rejoin job</a> for the historical events to be joined according to the new catalog.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
curl <span class="nt">-X</span> POST <span class="se">\</span>
    <span class="nt">-H</span> <span class="s2">"Authorization: Bearer </span><span class="si">$(</span>gcloud auth application-default print-access-token<span class="si">)</span><span class="s2">"</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s2">"Content-Type: application/json; charset=utf-8"</span> <span class="se">\</span>
    <span class="nt">--data</span> <span class="s2">"{
     'userEventRejoinScope': 'UNJOINED_EVENTS'
     }"</span> <span class="se">\</span>
    <span class="s2">"https://retail.googleapis.com/v2/projects/</span><span class="k">${</span><span class="nv">your</span><span class="p">-project-number</span><span class="k">}</span><span class="s2">/locations/global/catalogs/default_catalog/userEvents:rejoin"</span>

</code></pre></div></div>
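
<p>If you would rather trigger the rejoin from code, the curl command above translates to Go roughly as follows. This is a sketch using only the standard library; <code class="language-plaintext highlighter-rouge">newRejoinRequest</code> is a hypothetical helper, and the project number and access token are assumed to be supplied by the caller.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
  "net/http"
  "strings"
)

// newRejoinRequest builds the userEvents:rejoin call shown in the curl command above.
// projectNumber and accessToken are supplied by the caller (hypothetical helper).
func newRejoinRequest(projectNumber, accessToken string) (*http.Request, error) {
  url := "https://retail.googleapis.com/v2/projects/" + projectNumber +
    "/locations/global/catalogs/default_catalog/userEvents:rejoin"
  body := strings.NewReader(`{"userEventRejoinScope": "UNJOINED_EVENTS"}`)
  req, err := http.NewRequest(http.MethodPost, url, body)
  if err != nil {
    return nil, err
  }
  req.Header.Set("Authorization", "Bearer "+accessToken)
  req.Header.Set("Content-Type", "application/json; charset=utf-8")
  return req, nil
}
</code></pre></div></div>

<p>The returned request can then be sent with <code class="language-plaintext highlighter-rouge">http.DefaultClient.Do(req)</code>.</p>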

<h2 id="testing-out-the-model">Testing out the model</h2>

<p>Once the data is imported, we can start training the model. <a href="https://cloud.google.com/retail/docs/models#model-types">There are different models</a>, each suited for different use cases, e.g. “Recommended for you”, “Others you may like”, “Frequently bought together”, etc. Each has a minimum requirement for the data. The console will not allow training of the model if the data requirement is not met.</p>

<p>However, even if the requirements are met, model training can still fail with an <code class="language-plaintext highlighter-rouge">INSUFFICIENT_TRAINING_DATA</code> error. The message isn’t very helpful, but it likely relates to the quality of the training data. For instance, poor data quality or a high unjoined event rate could be the issue.</p>

<h3 id="trying-the-similar-product-model">Trying the Similar Product model</h3>

<p>Luckily, the “similar product” model <a href="https://cloud.google.com/retail/docs/create-models#import-reqs">only requires the product catalog</a> to be imported. To start training the model, go to the Model tab of the Vertex AI Search console and create a “similar product” model. The training will take a while.</p>

<p>After the model is finished, we need to create a serving config to use this model. Create one in the “Serving Configs” tab. There are some configurations in serving config that we can tweak, but the default config should be good enough for testing.</p>

<p>After the serving config is created, we can go to the “Evaluate” tab to test the model. Select the serving config we just created and pick a product ID as input. The model should return a list of other similar products.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Since my historical GA4 events do not meet the model requirement, I could not try the recommendation models that I am interested in. To improve the data quality, I will be implementing methods to <a href="https://cloud.google.com/retail/docs/record-events">collect real-time user data</a>. In the <a href="/code/2025/02/01/vertex-ai-search-retail-2-collecting-real-time-events.html">next post</a>, we will cover how to collect real-time user data.</p>]]></content><author><name>William Chong</name></author><category term="code" /><category term="google-cloud" /><category term="vertex-ai" /><category term="retail-search" /><category term="product-recommendation" /><category term="bigquery" /><category term="ga4" /><category term="e-commerce" /><category term="data-import" /><category term="machine-learning" /><category term="cloud-api" /><summary type="html"><![CDATA[Learn how to import historical data into Google Cloud's Vertex AI Search for Commerce. This comprehensive guide covers importing product catalogs using API and user events from Google Analytics(GA4) via BigQuery, including solutions for common import challenges.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2024-11-16-vertex-ai-search-retail-1-importing-data/cover.png" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2024-11-16-vertex-ai-search-retail-1-importing-data/cover.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Modifying `NewSingleHostReverseProxy` Response Data in Go without HTTP Errors</title><link href="https://blog.williamchong.cloud/code/2024/10/12/modifying-json-response-in-go.html" rel="alternate" type="text/html" title="Modifying `NewSingleHostReverseProxy` Response Data in Go without HTTP Errors" 
/><published>2024-10-12T20:00:00+00:00</published><updated>2024-10-12T20:00:00+00:00</updated><id>https://blog.williamchong.cloud/code/2024/10/12/modifying-json-response-in-go</id><content type="html" xml:base="https://blog.williamchong.cloud/code/2024/10/12/modifying-json-response-in-go.html"><![CDATA[<p><img src="/assets/images/2024-10-13-modifying-json-response-in-go/cover.png" alt="Modifying JSON response in Go" /></p>

<h2 id="background">Background</h2>

<p>Setting up an HTTP proxy in Go is straightforward with the built-in <a href="https://pkg.go.dev/net/http/httputil#NewSingleHostReverseProxy">NewSingleHostReverseProxy</a>. When used with <a href="https://github.com/gin-gonic/gin">Gin</a> as middleware, the code looks like this:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">router</span> <span class="o">:=</span> <span class="n">gin</span><span class="o">.</span><span class="n">New</span><span class="p">()</span>
<span class="n">proxy</span> <span class="o">:=</span> <span class="n">httputil</span><span class="o">.</span><span class="n">NewSingleHostReverseProxy</span><span class="p">(</span><span class="n">lcdURL</span><span class="p">)</span>
<span class="n">proxyHandler</span> <span class="o">:=</span> <span class="k">func</span><span class="p">(</span><span class="n">c</span> <span class="o">*</span><span class="n">gin</span><span class="o">.</span><span class="n">Context</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">proxy</span><span class="o">.</span><span class="n">ServeHTTP</span><span class="p">(</span><span class="n">c</span><span class="o">.</span><span class="n">Writer</span><span class="p">,</span> <span class="n">c</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">router</span><span class="o">.</span><span class="n">Use</span><span class="p">(</span><span class="n">proxyHandler</span><span class="p">)</span>
</code></pre></div></div>

<p>But what if we want to modify the proxied response before sending it back to the client? For example, we might want to hide PII before serving internal data to third parties. For simplicity, let’s assume the body is in JSON format.</p>
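
<p>To make the goal concrete, here is a minimal sketch of what such a modification could look like once the JSON body has been decoded into generic Go values. The function name <code class="language-plaintext highlighter-rouge">redact</code>, the placeholder string, and the choice of sensitive keys are all assumptions for illustration.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

// redact walks a decoded JSON value and replaces the value of every
// sensitive key with a placeholder, at any nesting depth.
func redact(v interface{}, sensitive map[string]bool) {
  switch t := v.(type) {
  case map[string]interface{}:
    for k, child := range t {
      if sensitive[k] {
        t[k] = "[REDACTED]"
        continue
      }
      redact(child, sensitive)
    }
  case []interface{}:
    for _, child := range t {
      redact(child, sensitive)
    }
  }
}
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">redact(jsonObject, map[string]bool{"email": true})</code> would then blank out every <code class="language-plaintext highlighter-rouge">email</code> field in the decoded object.</p>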

<h2 id="built-in-functions-vs-middleware">Built-in Functions vs. Middleware</h2>

<p>Sadly, <code class="language-plaintext highlighter-rouge">NewSingleHostReverseProxy</code> doesn’t directly support response modification. However, the <a href="https://pkg.go.dev/net/http/httputil#ReverseProxy">ReverseProxy</a> instance it returns has a <code class="language-plaintext highlighter-rouge">ModifyResponse</code> field, which lets us supply a function that modifies the response:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">proxy</span><span class="o">.</span><span class="n">ModifyResponse</span> <span class="o">=</span> <span class="k">func</span><span class="p">(</span><span class="n">r</span> <span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Response</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
  <span class="n">b</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadAll</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">Body</span><span class="p">)</span>
  <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">err</span>
  <span class="p">}</span>
  <span class="k">defer</span> <span class="n">r</span><span class="o">.</span><span class="n">Body</span><span class="o">.</span><span class="n">Close</span><span class="p">()</span>

  <span class="k">var</span> <span class="n">jsonObject</span> <span class="k">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="k">interface</span><span class="p">{}</span>
  <span class="n">err</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">jsonObject</span><span class="p">)</span>
  <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">err</span>
  <span class="p">}</span>

  <span class="c">// Modify jsonObject here</span>

  <span class="n">newBody</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">json</span><span class="o">.</span><span class="n">Marshal</span><span class="p">(</span><span class="n">jsonObject</span><span class="p">)</span>
  <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">err</span>
  <span class="p">}</span>

  <span class="n">r</span><span class="o">.</span><span class="n">Body</span> <span class="o">=</span> <span class="n">io</span><span class="o">.</span><span class="n">NopCloser</span><span class="p">(</span><span class="n">bytes</span><span class="o">.</span><span class="n">NewReader</span><span class="p">(</span><span class="n">newBody</span><span class="p">))</span>
  <span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>

<p>However, if you’re running the proxy alongside other API routes, you might want a unified way to modify all responses. Writing the rewrite logic as Gin middleware can achieve this.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">filterJsonBody</span><span class="p">()</span> <span class="n">gin</span><span class="o">.</span><span class="n">HandlerFunc</span> <span class="p">{</span>
  <span class="k">return</span> <span class="k">func</span><span class="p">(</span><span class="n">c</span> <span class="o">*</span><span class="n">gin</span><span class="o">.</span><span class="n">Context</span><span class="p">)</span> <span class="p">{</span>
    <span class="c">// Create our own writer</span>
    <span class="n">wb</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="n">copyWriter</span><span class="p">{</span>
      <span class="n">body</span><span class="o">:</span>           <span class="o">&amp;</span><span class="n">bytes</span><span class="o">.</span><span class="n">Buffer</span><span class="p">{},</span>
      <span class="n">ResponseWriter</span><span class="o">:</span> <span class="n">c</span><span class="o">.</span><span class="n">Writer</span><span class="p">,</span>
    <span class="p">}</span>

    <span class="c">// Inject it into gin context</span>
    <span class="n">c</span><span class="o">.</span><span class="n">Writer</span> <span class="o">=</span> <span class="n">wb</span>

    <span class="c">// Call the next handler</span>
    <span class="n">c</span><span class="o">.</span><span class="n">Next</span><span class="p">()</span>

    <span class="c">// Handle response modification at the end of handler chain</span>
    <span class="n">originBodyBytes</span> <span class="o">:=</span> <span class="n">wb</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">Bytes</span><span class="p">()</span>

    <span class="k">var</span> <span class="n">jsonObject</span> <span class="k">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="k">interface</span><span class="p">{}</span>
    <span class="n">err</span> <span class="o">:=</span> <span class="n">json</span><span class="o">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">originBodyBytes</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">jsonObject</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
      <span class="n">c</span><span class="o">.</span><span class="n">AbortWithError</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusInternalServerError</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
      <span class="k">return</span>
    <span class="p">}</span>

    <span class="c">// Modify jsonObject here</span>

    <span class="n">newBody</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">json</span><span class="o">.</span><span class="n">Marshal</span><span class="p">(</span><span class="n">jsonObject</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
      <span class="n">c</span><span class="o">.</span><span class="n">AbortWithError</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusInternalServerError</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
      <span class="k">return</span>
    <span class="p">}</span>

    <span class="n">wb</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="o">.</span><span class="n">Write</span><span class="p">(</span><span class="n">newBody</span><span class="p">)</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="k">type</span> <span class="n">copyWriter</span> <span class="k">struct</span> <span class="p">{</span>
  <span class="n">gin</span><span class="o">.</span><span class="n">ResponseWriter</span>
  <span class="n">body</span> <span class="o">*</span><span class="n">bytes</span><span class="o">.</span><span class="n">Buffer</span>
<span class="p">}</span>

<span class="k">func</span> <span class="p">(</span><span class="n">cw</span> <span class="o">*</span><span class="n">copyWriter</span><span class="p">)</span> <span class="n">Write</span><span class="p">(</span><span class="n">b</span> <span class="p">[]</span><span class="kt">byte</span><span class="p">)</span> <span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="n">cw</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">Write</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Add it at the beginning of the router handler chain, so that our <code class="language-plaintext highlighter-rouge">copyWriter</code> instance replaces the <code class="language-plaintext highlighter-rouge">Writer</code> in the Gin context for all routes.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">router</span> <span class="o">:=</span> <span class="n">gin</span><span class="o">.</span><span class="n">New</span><span class="p">()</span>
<span class="n">router</span><span class="o">.</span><span class="n">Use</span><span class="p">(</span><span class="n">filterJsonBody</span><span class="p">())</span>

<span class="c">// Other routes</span>
<span class="n">router</span><span class="o">.</span><span class="n">Use</span><span class="p">(</span><span class="n">proxyHandler</span><span class="p">)</span>
</code></pre></div></div>

<h2 id="http-error">HTTP Error?</h2>

<p>If you add the middleware as shown and then use curl to test the API, you may encounter errors like:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)</code></li>
  <li><code class="language-plaintext highlighter-rouge">curl: (18) transfer closed with x bytes remaining to read</code></li>
</ul>

<p>Which of these shows up depends on your infrastructure setup. For example, cloud hosting load balancers would likely use HTTP/2. The HTTP/2 error can be cryptic, but the curl error hints at a <code class="language-plaintext highlighter-rouge">Content-Length</code> header mismatch. One way to confirm the issue is to make curl show the HTTP headers by using <code class="language-plaintext highlighter-rouge">curl -v</code>. Indeed, the <code class="language-plaintext highlighter-rouge">Content-Length</code> header still carries the original body length instead of the modified one.</p>

<h3 id="fixing-the-content-length-header-in-modifyresponse-approach">Fixing the Content-Length Header in <code class="language-plaintext highlighter-rouge">ModifyResponse</code> Approach</h3>

<p>Fixing this issue in <code class="language-plaintext highlighter-rouge">ModifyResponse</code> is straightforward; we just need to set the <code class="language-plaintext highlighter-rouge">Content-Length</code> header to the new body length:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">r</span><span class="o">.</span><span class="n">ContentLength</span> <span class="o">=</span> <span class="kt">int64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">newBody</span><span class="p">))</span>
<span class="n">r</span><span class="o">.</span><span class="n">Header</span><span class="o">.</span><span class="n">Set</span><span class="p">(</span><span class="s">"Content-Length"</span><span class="p">,</span> <span class="n">strconv</span><span class="o">.</span><span class="n">Itoa</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">newBody</span><span class="p">)))</span>
</code></pre></div></div>
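
<p>Since the body and its length always have to change together, it can help to wrap the replacement in a small helper. The sketch below is one way to do that; <code class="language-plaintext highlighter-rouge">replaceBody</code> is a hypothetical name, not part of the standard library.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
  "bytes"
  "io"
  "net/http"
  "strconv"
)

// replaceBody swaps the response body and keeps both length fields in sync with it.
func replaceBody(r *http.Response, newBody []byte) {
  r.Body = io.NopCloser(bytes.NewReader(newBody))
  r.ContentLength = int64(len(newBody))
  r.Header.Set("Content-Length", strconv.Itoa(len(newBody)))
}
</code></pre></div></div>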

<h3 id="fixing-the-content-length-header-in-gin-middleware-approach">Fixing the Content-Length Header in Gin Middleware Approach</h3>

<p>In the middleware approach, ideally, we would also rewrite the <code class="language-plaintext highlighter-rouge">Content-Length</code> header with the new correct length. However, in the method we would override here, <code class="language-plaintext highlighter-rouge">WriteHeader</code>, we can’t directly access the <code class="language-plaintext highlighter-rouge">http.Response</code> object. Instead, a workaround is to use <code class="language-plaintext highlighter-rouge">Transfer-Encoding: chunked</code> to signal that the response body is sent in chunks. This way, we don’t need to set the <code class="language-plaintext highlighter-rouge">Content-Length</code> header at all:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">cw</span> <span class="o">*</span><span class="n">copyWriter</span><span class="p">)</span> <span class="n">WriteHeader</span><span class="p">(</span><span class="n">statusCode</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">cw</span><span class="o">.</span><span class="n">Header</span><span class="p">()</span><span class="o">.</span><span class="n">Del</span><span class="p">(</span><span class="s">"Content-Length"</span><span class="p">)</span>
  <span class="n">cw</span><span class="o">.</span><span class="n">Header</span><span class="p">()</span><span class="o">.</span><span class="n">Set</span><span class="p">(</span><span class="s">"Transfer-Encoding"</span><span class="p">,</span> <span class="s">"chunked"</span><span class="p">)</span>
  <span class="n">cw</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="o">.</span><span class="n">WriteHeader</span><span class="p">(</span><span class="n">statusCode</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
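
<p>If chunked encoding is not acceptable in your setup, a different option is to hold back the status code as well as the body, and only send the headers once the new body length is known. The following is a rough sketch of that idea using only <code class="language-plaintext highlighter-rouge">net/http</code> rather than Gin; <code class="language-plaintext highlighter-rouge">bufferingWriter</code> and <code class="language-plaintext highlighter-rouge">flush</code> are hypothetical names, and this is not the approach used in the rest of this post.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
  "bytes"
  "net/http"
  "strconv"
)

// bufferingWriter holds back the status code and the body written by later
// handlers, so nothing reaches the client until flush is called.
type bufferingWriter struct {
  http.ResponseWriter
  status int
  body   bytes.Buffer
}

// WriteHeader only records the status code.
func (bw *bufferingWriter) WriteHeader(status int) { bw.status = status }

// Write only buffers the original body.
func (bw *bufferingWriter) Write(b []byte) (int, error) { return bw.body.Write(b) }

// flush sends the recorded status and newBody with a matching Content-Length.
func (bw *bufferingWriter) flush(newBody []byte) error {
  if bw.status == 0 {
    bw.status = http.StatusOK
  }
  bw.Header().Set("Content-Length", strconv.Itoa(len(newBody)))
  bw.ResponseWriter.WriteHeader(bw.status)
  _, err := bw.ResponseWriter.Write(newBody)
  return err
}
</code></pre></div></div>

<p>The tradeoff is that the whole response stays in memory until the end of the handler chain, which the buffering middleware above already does for the body anyway.</p>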

<p>One downside of using <code class="language-plaintext highlighter-rouge">chunked</code> encoding is that some HTTP clients and proxies may not cache such responses properly. Since I use Nginx in my setup, which caches small chunked responses, this tradeoff is acceptable for my use case.</p>

<h3 id="conclusion">Conclusion</h3>

<p>Setting up a reverse proxy in Go is easy; the ability to modify the response allows even more flexibility in use cases. By using Gin middleware, we can easily apply a unified modification logic to all routes. However, be aware of the <code class="language-plaintext highlighter-rouge">Content-Length</code> header issue when modifying the response body, and choose the appropriate solution wisely based on your setup.</p>]]></content><author><name>William Chong</name></author><category term="code" /><category term="golang" /><category term="go" /><category term="reverse-proxy" /><category term="http" /><category term="json-manipulation" /><category term="middleware" /><category term="gin" /><category term="content-length" /><category term="http2" /><category term="response-modification" /><category term="web-development" /><category term="api-gateway" /><category term="data-transformation" /><summary type="html"><![CDATA[A comprehensive guide to intercepting and modifying JSON responses in Go reverse proxies, covering built-in ModifyResponse methods and custom Gin middleware approaches, with practical solutions for the Content-Length header issues that cause HTTP transfer errors.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2024-10-13-modifying-json-response-in-go/cover.png" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2024-10-13-modifying-json-response-in-go/cover.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Fixing Google Analytics (GA4) purchase funnel and Stripe Checkout</title><link href="https://blog.williamchong.cloud/code/2024/04/20/maintaining-ga-session-in-stripe-checkout.html" rel="alternate" type="text/html" title="Fixing Google Analytics (GA4) purchase funnel and Stripe Checkout" 
/><published>2024-04-20T20:00:00+00:00</published><updated>2024-04-20T20:00:00+00:00</updated><id>https://blog.williamchong.cloud/code/2024/04/20/maintaining-ga-session-in-stripe-checkout</id><content type="html" xml:base="https://blog.williamchong.cloud/code/2024/04/20/maintaining-ga-session-in-stripe-checkout.html"><![CDATA[<p><img src="/assets/images/2024-04-21-maintaining-ga-session-in-stripe-checkout/cover.png" alt="A GA4 purchase journey funnel" /></p>

<h2 id="background">Background</h2>

<p>Stripe Checkout and Google Analytics (GA4) are powerful and commonly used tools in e-commerce websites. Stripe Checkout allows you to collect payments with minimal code and a prebuilt UI, while GA helps analyze user behavior to improve sales.</p>

<h2 id="issue-0-purchase-events-in-ga-funnel">Issue: 0 purchase events in GA funnel</h2>

<p>However, I recently encountered a significant issue where GA’s purchase journey dashboard couldn’t properly plot the purchase funnel analysis when using Stripe Checkout. The number of events always dropped to 0 after the “begin checkout” step. Interestingly, the “Ecommerce Purchases” dashboard showed that the number of purchase events wasn’t actually 0. So, there must be something wrong with the purchase journey funnel.</p>

<h2 id="ga-sessions-stripe-checkout-and-third-party-domains">GA Sessions, Stripe Checkout and third-party domains</h2>

<p>In Google Analytics 4, events are attributed to different sessions. Users receive a unique ID cookie when they land on a GA-enabled site, and events fired under the same ID count towards the same session. A new session is created either when a new ID is received or when a time limit has passed since the last event from a particular ID.</p>

<p><a href="https://support.google.com/analytics/answer/9191807">Learn more about sessions</a></p>
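
<p>The session rule described above can be sketched as a small function. This is only an illustration of the rule, not GA internals: <code class="language-plaintext highlighter-rouge">startsNewSession</code> and <code class="language-plaintext highlighter-rouge">LastSeen</code> are hypothetical names, and the 30-minute value is GA4's default session timeout, which can be changed in the property settings.</p>

```typescript
// Hypothetical model of the session rule; names are illustrative, not GA internals.
const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // GA4's default session timeout (30 minutes)

interface LastSeen {
  clientId: string;      // ID of the client that sent the previous event
  lastEventAtMs: number; // timestamp of that event, in milliseconds
}

function startsNewSession(last: LastSeen | null, clientId: string, nowMs: number): boolean {
  // No previous event at all: this is the first session.
  if (last === null) return true;
  // A different ID is treated as a different user, hence a new session.
  if (last.clientId !== clientId) return true;
  // Same ID: a new session starts only once the timeout has elapsed.
  return nowMs - last.lastEventAtMs > SESSION_TIMEOUT_MS;
}
```

<p>Under this model, a user who comes back from a third-party page with a different ID always starts a new session, no matter how quickly they return.</p>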

<p>To understand the issue with the purchase journey funnel, we need to consider the nature of Stripe Checkout. With Stripe Checkout, there is an additional redirect step after the checkout begins. By default, Stripe Checkout is a hosted page under the domain checkout.stripe.com. Only after users complete or cancel the checkout process are they redirected back to the original site. This redirect process seems to change the session ID, which breaks the purchase journey funnel.</p>

<p><img src="/assets/images/2024-04-21-maintaining-ga-session-in-stripe-checkout/stripe.png" alt="Stripe Checkout hosted page on checkout.stripe.com" /></p>

<p>Why does the session ID change after redirection? One possible reason is the third-party cookie constraints imposed by modern browsers. Browsers now alter cookie behaviors for third-party domains to protect user privacy, such as limiting their lifetime or adding extra per-origin isolations. Since GA cookies are sent under the domain analytics.google.com and not under the user’s application domain, they are affected by these restrictions. The redirection to and from the Stripe Checkout page likely resets the GA cookie, creating a new session ID. As a result, GA thinks the final purchase step of the purchase journey funnel occurs under a different user session, leading to the correct event count but a broken funnel.</p>

<h2 id="fixing-session-with-cross-domain-measurement">Fixing session with cross-domain measurement</h2>

<p><img src="/assets/images/2024-04-21-maintaining-ga-session-in-stripe-checkout/0.png" alt="Cross-domain measure settings in GA4" /></p>

<p>To mitigate this issue and attribute the event to the correct session (thus fixing the purchase journey funnel), we need to inform GA explicitly that users redirected back from the Stripe Checkout page already have an existing session ID. GA4 actually has an automated solution for this issue called “cross-domain measurement.” It automatically adds a query string <code class="language-plaintext highlighter-rouge">_gl</code> when a user clicks on a URL under any domain that you also own. This <code class="language-plaintext highlighter-rouge">_gl</code> query string is a unique linker ID that allows the gtag script in the receiving domain to identify existing users and sessions without relying on the presence of the same cookie ID.</p>

<p>You can learn more about how cross-domain measurement works and configure a list of domains you own under “Configure your domains” in your “Google tag data stream settings.”</p>

<p><a href="https://support.google.com/analytics/answer/10071811">Learn more about cross-domain measurement</a></p>

<p>However, this automated solution does not work in the case of Stripe Checkout. We don’t own or control the gtag script under <code class="language-plaintext highlighter-rouge">checkout.stripe.com</code>, so it will neither read the <code class="language-plaintext highlighter-rouge">_gl</code> query string we send nor append a proper <code class="language-plaintext highlighter-rouge">_gl</code> query string when users are redirected back to our own site.</p>

<h2 id="making-cross-domain-measurement-work-manually">Making cross-domain measurement work manually</h2>

<p>Nevertheless, now that we understand how cross-domain measurement carries IDs from one domain to another, we can link the session manually.</p>

<p>Before sending the user to the Stripe Checkout page, collect the GA client ID and session ID with gtag’s <code class="language-plaintext highlighter-rouge">get</code> command. The following code snippet is an example from a Nuxt.js app, using the <code class="language-plaintext highlighter-rouge">Vue.$gtag</code> syntax:</p>

<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript">  <span class="nx">Vue</span><span class="p">.</span><span class="nx">$gtag</span><span class="p">.</span><span class="nx">query</span><span class="p">(</span><span class="dl">'</span><span class="s1">get</span><span class="dl">'</span><span class="p">,</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">GA_TRACKING_ID</span><span class="p">,</span> <span class="dl">'</span><span class="s1">client_id</span><span class="dl">'</span><span class="p">,</span> <span class="nx">id</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nx">store</span><span class="p">.</span><span class="nx">dispatch</span><span class="p">(</span><span class="dl">'</span><span class="s1">setGaClientId</span><span class="dl">'</span><span class="p">,</span> <span class="nx">id</span><span class="p">);</span>
  <span class="p">});</span>
  <span class="nx">Vue</span><span class="p">.</span><span class="nx">$gtag</span><span class="p">.</span><span class="nx">query</span><span class="p">(</span><span class="dl">'</span><span class="s1">get</span><span class="dl">'</span><span class="p">,</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">GA_TRACKING_ID</span><span class="p">,</span> <span class="dl">'</span><span class="s1">session_id</span><span class="dl">'</span><span class="p">,</span> <span class="nx">id</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nx">store</span><span class="p">.</span><span class="nx">dispatch</span><span class="p">(</span><span class="dl">'</span><span class="s1">setGaSessionId</span><span class="dl">'</span><span class="p">,</span> <span class="nx">id</span><span class="p">);</span>
  <span class="p">});</span></code></pre></figure>

<p>When creating the Stripe Checkout session, include both <code class="language-plaintext highlighter-rouge">ga_client_id</code> and <code class="language-plaintext highlighter-rouge">ga_session_id</code> as query parameters in both the <code class="language-plaintext highlighter-rouge">success_url</code> and the <code class="language-plaintext highlighter-rouge">cancel_url</code>:</p>

<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="kd">const</span> <span class="nx">checkoutPayload</span><span class="p">:</span> <span class="nx">Stripe</span><span class="p">.</span><span class="nx">Checkout</span><span class="p">.</span><span class="nx">SessionCreateParams</span> <span class="o">=</span> <span class="p">{</span>
  <span class="na">mode</span><span class="p">:</span> <span class="dl">'</span><span class="s1">payment</span><span class="dl">'</span><span class="p">,</span>
  <span class="na">success_url</span><span class="p">:</span> <span class="s2">`</span><span class="p">${</span><span class="nx">successUrl</span><span class="p">}</span><span class="s2">?ga_client_id=</span><span class="p">${</span><span class="nx">gaClientId</span><span class="p">}</span><span class="s2">&amp;ga_session_id=</span><span class="p">${</span><span class="nx">gaSessionId</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span>
  <span class="na">cancel_url</span><span class="p">:</span> <span class="s2">`</span><span class="p">${</span><span class="nx">cancelUrl</span><span class="p">}</span><span class="s2">?ga_client_id=</span><span class="p">${</span><span class="nx">gaClientId</span><span class="p">}</span><span class="s2">&amp;ga_session_id=</span><span class="p">${</span><span class="nx">gaSessionId</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span>
  <span class="p">...</span>
<span class="p">}</span></code></pre></figure>
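
<p>The template strings above assume that the base URLs carry no query string yet and that the IDs need no escaping. A slightly more defensive variant could build the URLs with the standard <code class="language-plaintext highlighter-rouge">URL</code> API instead. This is a minimal sketch; <code class="language-plaintext highlighter-rouge">withGaParams</code> and the example values are hypothetical:</p>

```typescript
// Hypothetical helper; the function name and sample values are illustrative only.
function withGaParams(baseUrl: string, gaClientId: string, gaSessionId: string): string {
  const url = new URL(baseUrl);
  // searchParams.set percent-encodes the values and keeps existing query parameters.
  url.searchParams.set('ga_client_id', gaClientId);
  url.searchParams.set('ga_session_id', gaSessionId);
  return url.toString();
}

const exampleSuccessUrl = withGaParams(
  'https://example.com/checkout/success?lang=en',
  '1234567890.1700000000',
  '1700000123',
);
console.log(exampleSuccessUrl);
```

<p>One caveat: touching <code class="language-plaintext highlighter-rouge">searchParams</code> re-serializes the whole query string, so a Stripe template placeholder such as <code class="language-plaintext highlighter-rouge">{CHECKOUT_SESSION_ID}</code> in the query would be percent-encoded. If the URL relies on that placeholder, append it after calling the helper.</p>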

<p>When initializing GA/gtag on our own site, check for the query string parameters <code class="language-plaintext highlighter-rouge">ga_client_id</code> and <code class="language-plaintext highlighter-rouge">ga_session_id</code>. If these values exist, we can assume that the user was redirected from the checkout flow and restore their <code class="language-plaintext highlighter-rouge">client_id</code> and <code class="language-plaintext highlighter-rouge">session_id</code> accordingly:</p>

<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="k">if</span> <span class="p">(</span><span class="nx">query</span><span class="p">.</span><span class="nx">ga_client_id</span> <span class="o">&amp;&amp;</span> <span class="nx">query</span><span class="p">.</span><span class="nx">ga_session_id</span><span class="p">)</span> <span class="p">{</span>
  <span class="nx">Vue</span><span class="p">.</span><span class="nx">$gtag</span><span class="p">.</span><span class="nx">config</span><span class="p">({</span>
    <span class="na">client_id</span><span class="p">:</span> <span class="nx">query</span><span class="p">.</span><span class="nx">ga_client_id</span><span class="p">,</span>
    <span class="na">session_id</span><span class="p">:</span> <span class="nx">query</span><span class="p">.</span><span class="nx">ga_session_id</span><span class="p">,</span>
  <span class="p">});</span>
<span class="p">}</span></code></pre></figure>
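
<p>In the snippet above, <code class="language-plaintext highlighter-rouge">query</code> comes from the router. Outside of Nuxt, the same check could be written against the raw query string. A minimal sketch, where <code class="language-plaintext highlighter-rouge">readGaIdsFromSearch</code> and <code class="language-plaintext highlighter-rouge">GaIds</code> are hypothetical names:</p>

```typescript
// Hypothetical helper; names are illustrative and not part of gtag or Nuxt.
interface GaIds {
  clientId: string;
  sessionId: string;
}

function readGaIdsFromSearch(search: string): GaIds | null {
  // URLSearchParams accepts the string with or without the leading "?".
  const params = new URLSearchParams(search);
  const clientId = params.get('ga_client_id');
  const sessionId = params.get('ga_session_id');
  // Restore only when both IDs are present, mirroring the check above.
  if (!clientId || !sessionId) return null;
  return { clientId, sessionId };
}
```

<p>In the browser this would be called as <code class="language-plaintext highlighter-rouge">readGaIdsFromSearch(window.location.search)</code>, and a non-null result would be passed to the same <code class="language-plaintext highlighter-rouge">config</code> call as above.</p>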

<p>This fixes the issue of lost sessions and restores the normal functioning of the purchase journey funnel.</p>

<h2 id="another-solution-server-side-event-recording">Another solution: Server-side event recording</h2>

<p>The server-side event recording solution is mentioned in the Stripe official documentation. It allows you to fire events directly from the server side. This approach involves sending the client ID of the checkout user to the Stripe Checkout session, which is then stored in the checkout metadata. Once the payment is successfully completed, the server can directly fire the purchase event with the corresponding IDs set.</p>

<p>While this solution can help in correctly logging a purchase event, it’s important to note that the session ID is not explicitly mentioned in the official guide. Therefore, it’s unclear whether this approach will work seamlessly with the purchase journey funnel.</p>

<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"> <span class="k">if</span> <span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">type</span> <span class="o">===</span> <span class="dl">"</span><span class="s2">checkout.session.completed</span><span class="dl">"</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Record metrics using the Google Analytics Measurement Protocol</span>
    <span class="c1">// See https://developers.google.com/analytics/devguides/collection/protocol/v1/devguide</span>
    <span class="kd">const</span> <span class="nx">params</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">URLSearchParams</span><span class="p">({</span>
      <span class="na">v</span><span class="p">:</span> <span class="dl">"</span><span class="s2">1</span><span class="dl">"</span><span class="p">,</span> <span class="c1">// Version</span>
      <span class="na">tid</span><span class="p">:</span> <span class="o">&lt;</span><span class="nx">GOOGLE_ANALYTICS_CLIENT_ID</span><span class="o">&gt;</span><span class="p">,</span> <span class="c1">// Tracking ID / Property ID.</span>
      <span class="na">cid</span><span class="p">:</span> <span class="nx">event</span><span class="p">.</span><span class="nx">data</span><span class="p">.</span><span class="nx">object</span><span class="p">.</span><span class="nx">metadata</span><span class="p">.</span><span class="nx">analyticsClientId</span><span class="p">,</span> <span class="c1">// Client ID</span>
      <span class="na">t</span><span class="p">:</span> <span class="dl">"</span><span class="s2">event</span><span class="dl">"</span><span class="p">,</span> <span class="c1">// Event hit type</span>
      <span class="na">ec</span><span class="p">:</span> <span class="dl">"</span><span class="s2">ecommerce</span><span class="dl">"</span><span class="p">,</span> <span class="c1">// Event Category</span>
      <span class="na">ea</span><span class="p">:</span> <span class="dl">"</span><span class="s2">purchase</span><span class="dl">"</span><span class="p">,</span> <span class="c1">// Event Action</span>
    <span class="p">});</span>

    <span class="nx">request</span><span class="p">(</span><span class="s2">`https://www.google-analytics.com/batch?</span><span class="p">${</span><span class="nx">params</span><span class="p">.</span><span class="nx">toString</span><span class="p">()}</span><span class="s2">`</span><span class="p">,</span> <span class="p">{</span>
      <span class="na">method</span><span class="p">:</span> <span class="dl">"</span><span class="s2">POST</span><span class="dl">"</span><span class="p">,</span>
    <span class="p">});</span>
  <span class="p">}</span></code></pre></figure>

<p>For more detailed information and implementation instructions, you can <a href="https://docs.stripe.com/payments/checkout/analyze-conversion-funnel#server-side-event-recording">refer to the official guide on server-side event recording provided by Stripe</a>.</p>
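
<p>Note that the snippet above posts to the Measurement Protocol v1 endpoint, which belongs to Universal Analytics. For a GA4 property, the equivalent is the GA4 Measurement Protocol, which takes a JSON body and lets us pass <code class="language-plaintext highlighter-rouge">session_id</code> as an event parameter. The following is an untested sketch of how such a request could be built; <code class="language-plaintext highlighter-rouge">buildGa4PurchaseRequest</code> and <code class="language-plaintext highlighter-rouge">PurchaseHit</code> are hypothetical names, and I have not verified that this alone repairs the purchase journey funnel:</p>

```typescript
// Hypothetical request builder for the GA4 Measurement Protocol; names are illustrative.
interface PurchaseHit {
  measurementId: string; // GA4 measurement ID of the web data stream, e.g. "G-XXXXXXX"
  apiSecret: string;     // Measurement Protocol API secret created for that data stream
  clientId: string;      // GA client ID saved before the redirect to Stripe
  sessionId: string;     // GA session ID saved before the redirect to Stripe
  transactionId: string;
  value: number;
  currency: string;
}

function buildGa4PurchaseRequest(hit: PurchaseHit): { url: string; body: string } {
  const query = new URLSearchParams({
    measurement_id: hit.measurementId,
    api_secret: hit.apiSecret,
  });
  const body = JSON.stringify({
    client_id: hit.clientId,
    events: [
      {
        name: 'purchase',
        params: {
          // Passing session_id is what ties the event to the existing session.
          session_id: hit.sessionId,
          engagement_time_msec: 1,
          transaction_id: hit.transactionId,
          value: hit.value,
          currency: hit.currency,
        },
      },
    ],
  });
  return { url: `https://www.google-analytics.com/mp/collect?${query.toString()}`, body };
}
```

<p>In the webhook handler, the result would be sent with something like <code class="language-plaintext highlighter-rouge">fetch(url, { method: 'POST', body })</code>. This assumes that both IDs were stored in the Checkout session metadata when the session was created.</p>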

<h2 id="extra-config-to-make-analytics-data-cleaner">Extra config to make analytics data cleaner</h2>

<p>Another issue related to GA4 and Stripe Checkout is that users redirected back from the Checkout page are counted as referral traffic by default in GA. This happens because returning users are misidentified as new sessions, leading to an increase in referral traffic. To mitigate this issue, you can exclude <code class="language-plaintext highlighter-rouge">checkout.stripe.com</code> as a referral source.</p>

<p><img src="/assets/images/2024-04-21-maintaining-ga-session-in-stripe-checkout/1.png" alt="Unwanted referrals settings in GA4" /></p>

<p>Follow this guide from Google to set up “unwanted referrals” in your Google tag data stream settings:</p>

<p><a href="https://support.google.com/analytics/answer/10327750">Identify unwanted referrals</a></p>

<p>By implementing these additional configurations, you can ensure cleaner analytics data and improve the accuracy of your funnel analysis.</p>]]></content><author><name>William Chong</name></author><category term="code" /><category term="google-analytics" /><category term="ga4" /><category term="stripe-checkout" /><category term="cross-domain-tracking" /><category term="session-persistence" /><category term="e-commerce" /><category term="conversion-funnel" /><category term="third-party-cookies" /><category term="client-id" /><category term="web-analytics" /><category term="referral-traffic" /><summary type="html"><![CDATA[A comprehensive guide to fixing broken GA4 purchase funnels when using Stripe Checkout by maintaining cross-domain sessions. Learn how to implement session persistence techniques and configure proper cross-domain measurement for accurate e-commerce tracking.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.williamchong.cloud/assets/images/2024-04-21-maintaining-ga-session-in-stripe-checkout/cover.png" /><media:content medium="image" url="https://blog.williamchong.cloud/assets/images/2024-04-21-maintaining-ga-session-in-stripe-checkout/cover.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>