William Chong’s Cloud

Modifying `NewSingleHostReverseProxy` Response Data in Go without HTTP Errors

2024-10-12T20:00:00+00:00

Background

Setting up an HTTP proxy in Go is straightforward with the built-in NewSingleHostReverseProxy. When used with Gin as middleware, the code looks like this:

router := gin.New()
proxy := httputil.NewSingleHostReverseProxy(lcdURL)
proxyHandler := func(c *gin.Context) {
  proxy.ServeHTTP(c.Writer, c.Request)
}
router.Use(proxyHandler)

But what if we want to modify the proxied response before sending it back to the client? For example, we might want to hide PII before serving internal data to third parties. For simplicity, let’s assume the body is in JSON format.

Built-in Functions vs. Middleware

Sadly, NewSingleHostReverseProxy doesn’t take a response-modification option directly. However, the ReverseProxy instance it returns exposes a ModifyResponse field, which we can set to a function that modifies the response:

proxy.ModifyResponse = func(r *http.Response) error {
  // Close the original body even if reading it fails
  defer r.Body.Close()
  b, err := io.ReadAll(r.Body)
  if err != nil {
    return err
  }

  var jsonObject map[string]interface{}
  err = json.Unmarshal(b, &jsonObject)
  if err != nil {
    return err
  }

  // Modify jsonObject here

  newBody, err := json.Marshal(jsonObject)
  if err != nil {
    return err
  }

  r.Body = io.NopCloser(bytes.NewReader(newBody))
  return nil
}

However, if you’re running the proxy alongside other API routes, you might want a unified way to modify all responses. Writing the rewrite logic as Gin middleware can achieve this.

func filterJsonBody() gin.HandlerFunc {
  return func(c *gin.Context) {
    // Create our own writer
    wb := &copyWriter{
      body:           &bytes.Buffer{},
      ResponseWriter: c.Writer,
    }

    // Inject it into gin context
    c.Writer = wb

    // Call the next handler
    c.Next()

    // Handle response modification at the end of handler chain
    originBodyBytes := wb.body.Bytes()

    var jsonObject map[string]interface{}
    err := json.Unmarshal(originBodyBytes, &jsonObject)
    if err != nil {
      c.AbortWithError(http.StatusInternalServerError, err)
      return
    }

    // Modify jsonObject here

    newBody, err := json.Marshal(jsonObject)
    if err != nil {
      c.AbortWithError(http.StatusInternalServerError, err)
      return
    }

    wb.ResponseWriter.Write(newBody)
  }
}

type copyWriter struct {
  gin.ResponseWriter
  body *bytes.Buffer
}

func (cw *copyWriter) Write(b []byte) (int, error) {
  return cw.body.Write(b)
}

Add it at the beginning of the router handler chain, so that our copyWriter instance would replace the Writer in the Gin context for all routes.

router := gin.New()
router.Use(filterJsonBody())

// Other routes
router.Use(proxyHandler)

HTTP Error?

If you add the middleware as shown and then use curl to test the API, you may encounter errors like:

HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
curl: (18) transfer closed with x bytes remaining to read

Which of these shows up depends on your infrastructure setup. For example, cloud hosting load balancers typically serve HTTP/2. The HTTP/2 error is cryptic, but the curl error hints at a Content-Length header mismatch. One way to confirm the issue is to make curl show the HTTP headers by using curl -v. Indeed, the Content-Length header is set to the original body length instead of the modified one.

Fixing the Content-Length Header in `ModifyResponse` Approach

Fixing this issue in ModifyResponse is straightforward; we just need to set the Content-Length header to the new body length:

r.ContentLength = int64(len(newBody))
r.Header.Set("Content-Length", strconv.Itoa(len(newBody)))

Fixing the Content-Length Header in Gin Middleware Approach

In the middleware approach, ideally, we would also rewrite the Content-Length header with the new correct length. However, in the method we would override here, WriteHeader, we can’t directly access the http.Response object. Instead, a workaround is to use Transfer-Encoding: chunked to signal that the response body is sent in chunks. This way, we don’t need to set the Content-Length header at all:

func (cw *copyWriter) WriteHeader(statusCode int) {
  cw.Header().Del("Content-Length")
  cw.Header().Set("Transfer-Encoding", "chunked")
  cw.ResponseWriter.WriteHeader(statusCode)
}

One downside of using chunked encoding is that some HTTP clients and proxies may not cache chunked responses properly. Since I use Nginx in my setup, which caches small chunked responses, this tradeoff is acceptable for my use case.

Conclusion

Setting up a reverse proxy in Go is easy, and the ability to modify the response makes it flexible enough for many more use cases. By using Gin middleware, we can easily apply unified modification logic to all routes. However, be aware of the Content-Length header issue when modifying the response body, and choose the solution that fits your setup.

Fixing Google Analytics (GA4) purchase funnel and Stripe Checkout

2024-04-20T20:00:00+00:00

Background

Stripe Checkout and Google Analytics (GA4) are powerful and commonly used tools in e-commerce websites. Stripe Checkout allows you to collect payments with minimal code and a prebuilt UI, while GA helps analyze user behavior to improve sales.

Issue: 0 purchase events in GA funnel

However, I recently encountered a significant issue where GA’s purchase journey dashboard couldn’t properly plot the purchase funnel analysis when using Stripe Checkout. The number of events always dropped to 0 after the “begin checkout” step. Interestingly, the “Ecommerce Purchases” dashboard showed that the number of purchase events wasn’t actually 0. So, there must be something wrong with the purchase journey funnel.

GA Sessions, Stripe Checkout and third party domains

In Google Analytics 4, events are attributed to different sessions. Users receive a unique ID cookie when they land on a GA-enabled site, and events fired under the same ID count towards the same session. A new session is created either when a new ID is received or when a time limit has passed since the last event from a particular ID.

Learn more about sessions

To understand the issue with the purchase journey funnel, we need to consider the nature of Stripe Checkout. In Stripe Checkout, there is an additional redirect step after the checkout begins. By default, Stripe Checkout is a hosted page under the domain checkout.stripe.com. Only after users complete or cancel the checkout process are they redirected back to the original site. This redirect process seems to change the session ID, which breaks the purchase journey funnel.

Why does the session ID change after redirection? One possible reason is the third-party cookie constraints imposed by modern browsers. Browsers now alter cookie behaviors for third-party domains to protect user privacy, such as limiting their lifetime or adding extra per-origin isolations. Since GA cookies are sent under the domain analytics.google.com and not under the user’s application domain, they are affected by these restrictions. The redirection to and from the Stripe Checkout page likely resets the GA cookie, creating a new session ID. As a result, GA thinks the final purchase step of the purchase journey funnel occurs under a different user session, leading to the correct event count but a broken funnel.

Fixing session with cross-domain measurement

To mitigate this issue and attribute the event to the correct session (thus fixing the purchase journey funnel), we need to inform GA explicitly that users redirected back from the Stripe Checkout page already have an existing session ID. GA4 actually has an automated solution for this issue called “cross-domain measurement.” It automatically adds a query string _gl when a user clicks on a URL under any domain that you also own. This _gl query string is a unique linker ID that allows the gtag script in the receiving domain to identify existing users and sessions without relying on the presence of the same cookie ID.

You can learn more about how cross-domain measurement works and configure a list of domains you own under “Configure your domains” in your “Google tag data stream settings.”

Learn more about cross-domain measurement

However, this automated solution does not work in the case of Stripe Checkout. We don’t own or control the gtag script under checkout.stripe.com, so it won’t utilize the _gl query string we send or send back the user with the proper _gl query string when they are redirected back to our own site.

Making cross-domain measurement work manually

Nevertheless, by understanding how to mitigate cross-domain issues, we can implement a manual link for session IDs.

Before sending the user to the Stripe Checkout page, collect the GA client ID and session ID using the gtag query. The following code snippet is an example under a Nuxt.js app, using the Vue.$gtag syntax:

  Vue.$gtag.query('get', process.env.GA_TRACKING_ID, 'client_id', id => {
    store.dispatch('setGaClientId', id);
  });
  Vue.$gtag.query('get', process.env.GA_TRACKING_ID, 'session_id', id => {
    store.dispatch('setGaSessionId', id);
  });

When creating the Stripe Checkout session, include the ga_client_id and ga_session_id respectively in the success_url and cancel_url:

const checkoutPayload: Stripe.Checkout.SessionCreateParams = {
  mode: 'payment',
  success_url: `${successUrl}?ga_client_id=${gaClientId}&ga_session_id=${gaSessionId}`,
  cancel_url: `${cancelUrl}?ga_client_id=${gaClientId}&ga_session_id=${gaSessionId}`,
  ...
}
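
Since the two IDs end up inside a URL, it can be safer to let the URL API handle the encoding rather than concatenating strings. The following is a minimal sketch, not the code used on the site: withGaParams is a hypothetical helper name and example.com is a placeholder domain. Only the ga_client_id and ga_session_id parameter names come from the steps above.

```typescript
// Hypothetical helper: appends the GA identifiers to a return URL.
// The parameter names ga_client_id / ga_session_id follow the article;
// the helper name and the example values are illustrative assumptions.
function withGaParams(baseUrl: string, gaClientId: string, gaSessionId: string): string {
  const url = new URL(baseUrl);
  // searchParams.set percent-encodes values and keeps existing query parameters
  url.searchParams.set("ga_client_id", gaClientId);
  url.searchParams.set("ga_session_id", gaSessionId);
  return url.toString();
}

const exampleSuccessUrl = withGaParams(
  "https://example.com/checkout/success?order=42",
  "1234567890.1700000000",
  "1700000123",
);
console.log(exampleSuccessUrl);
```

Unlike the template-string version, this also works when the base URL already carries its own query string.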

When initializing GA/gtag on our own site, check for the query string parameters ga_client_id and ga_session_id. If these values exist, we can assume that the user was redirected from the checkout flow and restore their client_id and session_id accordingly:

if (query.ga_client_id && query.ga_session_id) {
  Vue.$gtag.config({
    client_id: query.ga_client_id,
    session_id: query.ga_session_id,
  });
}
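
Because these values arrive through the URL, anyone can tamper with them, so a light sanity check before handing them to gtag seems reasonable. The sketch below is an assumption-laden illustration: readGaIds is a hypothetical name, and the expected formats (digits, a dot, digits for the client ID; digits only for the session ID) reflect what gtag typically returns rather than a documented contract.

```typescript
interface GaIds {
  clientId: string;
  sessionId: string;
}

// Hypothetical helper: extracts both IDs from a query string, or returns null
// when either one is missing or does not match the assumed format.
function readGaIds(search: string): GaIds | null {
  const params = new URLSearchParams(search);
  const clientId = params.get("ga_client_id");
  const sessionId = params.get("ga_session_id");
  if (!clientId || !sessionId) return null;
  // Assumed formats: "<digits>.<digits>" for client IDs, digits only for session IDs
  if (!/^\d+\.\d+$/.test(clientId) || !/^\d+$/.test(sessionId)) return null;
  return { clientId, sessionId };
}

const restored = readGaIds("?ga_client_id=1234567890.1700000000&ga_session_id=1700000123");
console.log(restored);
```

A null result simply means GA should be initialized as usual, without restoring anything.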

This fixes the issue of lost sessions and restores the normal functioning of the purchase journey funnel.

Another solution: Server side event recording

The server-side event recording solution is mentioned in the Stripe official documentation. It allows you to fire events directly from the server side. This approach involves sending the client ID of the checkout user to the Stripe Checkout session, which is then stored in the checkout metadata. Once the payment is successfully completed, the server can directly fire the purchase event with the corresponding IDs set.

While this solution can help in correctly logging a purchase event, it’s important to note that the session ID is not explicitly mentioned in the official guide. Therefore, it’s unclear whether this approach will work seamlessly with the purchase journey funnel.

if (event.type === "checkout.session.completed") {
  // Record metrics using the Google Analytics Measurement Protocol
  // See https://developers.google.com/analytics/devguides/collection/protocol/v1/devguide
  const params = new URLSearchParams({
    v: "1", // Version
    tid: <GOOGLE_ANALYTICS_CLIENT_ID>, // Tracking ID / Property ID.
    cid: event.data.object.metadata.analyticsClientId, // Client ID
    t: "event", // Event hit type
    ec: "ecommerce", // Event Category
    ea: "purchase", // Event Action
  });

  request(`https://www.google-analytics.com/batch?${params.toString()}`, {
    method: "POST",
  });
}
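
The snippet above targets version 1 of the Measurement Protocol. For GA4, a server-side purchase event could also carry the session ID as an event parameter. The sketch below only builds the request and does not send it. The PurchaseInput field names are hypothetical, and it assumes the session ID was stored in the Checkout metadata next to the client ID, which the official guide does not describe. Whether this alone repairs the purchase journey funnel is untested here.

```typescript
// All field names on this interface are hypothetical, chosen for the sketch
interface PurchaseInput {
  measurementId: string; // GA4 measurement ID of the data stream
  apiSecret: string; // Measurement Protocol API secret
  clientId: string; // client ID stored in the Checkout metadata
  sessionId: string; // assumed to be stored in the metadata as well
  transactionId: string;
  value: number;
  currency: string;
}

// Builds the URL and JSON body for a GA4 Measurement Protocol "purchase" event
function buildGa4PurchaseRequest(input: PurchaseInput): { url: string; body: string } {
  const query = new URLSearchParams({
    measurement_id: input.measurementId,
    api_secret: input.apiSecret,
  });
  const body = JSON.stringify({
    client_id: input.clientId,
    events: [
      {
        name: "purchase",
        params: {
          session_id: input.sessionId,
          transaction_id: input.transactionId,
          value: input.value,
          currency: input.currency,
        },
      },
    ],
  });
  return { url: `https://www.google-analytics.com/mp/collect?${query.toString()}`, body };
}

const ga4Request = buildGa4PurchaseRequest({
  measurementId: "G-EXAMPLE",
  apiSecret: "secret",
  clientId: "1234567890.1700000000",
  sessionId: "1700000123",
  transactionId: "order-42",
  value: 9.99,
  currency: "USD",
});
console.log(ga4Request.url);
```

The result would then be sent as a POST request with the body as JSON from the webhook handler.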

For more detailed information and implementation instructions, you can refer to the official guide on server-side event recording provided by Stripe.

Extra config to make analytics data cleaner

Another issue related to GA4 and Stripe Checkout is that users redirected back from the Checkout page are counted as referral traffic by default in GA. This happens because returning users are misidentified as new sessions, leading to an increase in referral traffic. To mitigate this issue, you can exclude checkout.stripe.com as a referral source.

Follow this guide from Google to set up “unwanted referrals” in your Google tag data stream settings:

Identify unwanted referrals

By implementing these additional configurations, you can ensure cleaner analytics data and improve the accuracy of your funnel analysis.

Concatenating ogg vorbis (.ogg) audio files on frontend

2024-01-01T18:00:00+00:00

Background

When working with the Azure Text-to-Speech API, there is no size limit for the input text, but the resulting output audio is limited to a maximum length of 10 minutes. If the input exceeds this limit, the output audio will be truncated, causing unexpected issues. This limitation is particularly problematic for scenarios involving long texts, such as whole chapters from ePub ebooks, which often surpass the threshold and result in undesired truncation.

Working around the output limit

An easy workaround for this limitation is to split the input into chunks that each produce less than 10 minutes of audio when making API calls to the Text-to-Speech service. The resulting audio files can then be concatenated into a single file. However, calculating the exact output audio length for a given plaintext input can be challenging, especially when dealing with bilingual texts like Chinese and English. In English, words are separated by spaces, making it relatively straightforward to estimate the length by counting the number of words. In Chinese, however, characters and words are not separated by spaces, making the issue more complex.

Fortunately, in this particular case, the input text consists of well-formatted paragraphs with new lines (or <br> and <p> tags in the case of HTML input). A simple approach would be to split the text by newline characters (\n) and merge the parts until the maximum character length is reached. We can set a more conservative maximum character length here, as having the audio files shorter than 10 minutes will not affect the final concatenated result.

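function splitText (text: string, maxLength = 2000) {
  const sections: string[] = []
  // Each entry is one paragraph (one line of the input), not a single word
  const paragraphs = text.split('\n')
  let currentSection = ''

  for (const paragraph of paragraphs) {
    // Start a new section when this paragraph would exceed the limit;
    // skip the push while the section is still empty to avoid blank sections
    if (currentSection && currentSection.length + paragraph.length + 1 > maxLength) {
      sections.push(currentSection.trim())
      currentSection = ''
    }
    currentSection += paragraph + '\n'
  }

  // Flush whatever is left after the last paragraph
  if (currentSection.trim()) {
    sections.push(currentSection.trim())
  }
  return sections
}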
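
One case the paragraph-based split does not cover is a single paragraph that is itself longer than maxLength: it still ends up as one oversized section. A possible follow-up step is to break such a paragraph at sentence boundaries. The sketch below is only an illustration: splitLongParagraph is a hypothetical helper, and the punctuation set (ASCII . ! ? plus the full-width 。！？ used in Chinese) is an assumption about the input.

```typescript
// Hypothetical helper: breaks one over-long paragraph at sentence boundaries.
// A single sentence longer than maxLength is still kept whole.
function splitLongParagraph(paragraph: string, maxLength = 2000): string[] {
  // Each match is a sentence with its trailing punctuation, or the unpunctuated tail
  const sentences = paragraph.match(/[^.!?。！？]*[.!?。！？]+|[^.!?。！？]+$/g) ?? [];
  const pieces: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current.length > 0 && current.length + sentence.length > maxLength) {
      pieces.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim().length > 0) pieces.push(current.trim());
  return pieces;
}

console.log(splitLongParagraph("One. Two. Three.", 10));
console.log(splitLongParagraph("你好。再见。", 3));
```

The same merge-until-full idea is reused here, only with sentences instead of paragraphs as the unit.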

By utilizing this method, the Azure Text-to-Speech API can be called with these split text sections. When making the API call, it’s recommended to select OGG as the audio output type, as OGG files are smaller in size compared to MP3 files. Following these steps, you will have a collection of OGG files, each shorter than 10 minutes.

Concatenating the audio on the frontend? A challenge

Concatenating audio files is relatively straightforward in a server environment, thanks to powerful tools like FFmpeg. It involves saving all the intermediate audio files on the server, merging them using FFmpeg, and then storing the resulting file until it is downloaded by the user.

However, in this project, a more challenging approach was taken: merging the OGG audio files on the frontend using JavaScript. This approach eliminates the need for server file storage, simplifies the architecture, reduces costs, and fits more seamlessly into the current API structure powered by Nuxt.js.

The challenge arises when trying to find a browser-side replacement for the powerful FFmpeg concat function.

Solution #1: Simple concatenation using `cat`

After conducting some research, it seems that the OGG specification allows for the direct concatenation of two OGG files to create a single OGG file with two logical streams. For example, in a Linux bash environment, one can simply use the cat command to achieve this.

`cat 1.ogg 2.ogg > result.ogg`

One prerequisite of this approach is that each OGG file must have a unique bitstream serial number. Fortunately, it appears that the Azure Text-to-Speech API returns OGG files with randomized serial numbers. Kudos to Azure!
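
To check this prerequisite rather than trust it, the serial number can be read straight from the first page header of each file. In the Ogg framing format (RFC 3533), a page starts with the capture pattern "OggS", and the serial number is a 32-bit little-endian integer at byte offset 14. The helper names below are hypothetical, and the sketch only inspects the first page of each file.

```typescript
// Reads the bitstream serial number from the first Ogg page header.
// Layout per RFC 3533: "OggS" at bytes 0-3, serial number at bytes 14-17 (little-endian).
function readOggSerial(bytes: Uint8Array): number {
  if (bytes.length < 27) throw new Error("too short to contain an Ogg page header");
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== "OggS") throw new Error("missing OggS capture pattern");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint32(14, true);
}

// True when no two files share a serial number
function hasUniqueSerials(files: Uint8Array[]): boolean {
  const serials = files.map(readOggSerial);
  return new Set(serials).size === serials.length;
}
```

In the browser, each Uint8Array could come from new Uint8Array(await blob.arrayBuffer()).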

Implementing this in the browser is surprisingly simple using Blob.

const sections = splitText(text, 2000)
const blobs = []
for (let i = 0; i < sections.length; i++) {
  const text = sections[i]
  const blob = await convertTextToAudio(text)
  blobs.push(blob) // blob is the ogg file of the text sections
}
const blob = new Blob(blobs, { type: 'audio/ogg' })

However, despite the merged OGG file being playable, most audio players are unable to properly seek within this track. I tested Chrome, VLC player, and Audacity, and none of them could display the correct total duration or seek the track correctly. Since the ability to seek is crucial for our audio ebook use case, this solution is not acceptable.

Solution #2: Web Audio and MediaStream Recording API

When seeking a solution for concatenating OGG files, the next thing I did was ask ChatGPT, which suggested using the AudioContext provided by the Web Audio API. After decoding each input OGG file with the decodeAudioData function, the decoded audio can be concatenated using audioContext.createBuffer and AudioBuffer. It is important to note that this approach assumes that both audio files have the same number of channels and sample rate.

// Function to concatenate two OGG files, provided by ChatGPT
function concatenateOGGFiles(file1, file2) {
  return new Promise((resolve, reject) => {
    // Create audio context
    const audioContext = new (window.AudioContext || window.webkitAudioContext)();

    // Load the first file
    const reader1 = new FileReader();
    reader1.onload = function(e) {
      audioContext.decodeAudioData(e.target.result, function(buffer1) {
        // Load the second file
        const reader2 = new FileReader();
        reader2.onload = function(e) {
          audioContext.decodeAudioData(e.target.result, function(buffer2) {
            // Create a new buffer for the concatenated audio
            const length = buffer1.length + buffer2.length;
            const audioBuffer = audioContext.createBuffer(buffer1.numberOfChannels, length, buffer1.sampleRate);

            // Copy the samples from the first buffer
            for (let channel = 0; channel < buffer1.numberOfChannels; channel++) {
              const channelData = buffer1.getChannelData(channel);
              audioBuffer.getChannelData(channel).set(channelData, 0);
            }

            // Copy the samples from the second buffer
            for (let channel = 0; channel < buffer2.numberOfChannels; channel++) {
              const channelData = buffer2.getChannelData(channel);
              audioBuffer.getChannelData(channel).set(channelData, buffer1.length);
            }

            resolve(audioBuffer);
          }, reject);
        };
        reader2.onerror = reject;
        reader2.readAsArrayBuffer(file2);
        }, reject);
    };
    reader1.onerror = reject;
    reader1.readAsArrayBuffer(file1);
  });
}
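
The core of this approach is the per-channel copy, which can be isolated from the AudioContext plumbing. The sketch below shows that step on plain Float32Array data. concatChannels is a hypothetical name. In the browser, the input arrays would come from getChannelData on each decoded AudioBuffer, and as noted above both inputs are assumed to share the same channel count and sample rate.

```typescript
// Joins two sets of per-channel samples end to end, channel by channel
function concatChannels(first: Float32Array[], second: Float32Array[]): Float32Array[] {
  if (first.length !== second.length) throw new Error("channel count mismatch");
  return first.map((samples, channel) => {
    const joined = new Float32Array(samples.length + second[channel].length);
    joined.set(samples, 0); // first clip starts at offset 0
    joined.set(second[channel], samples.length); // second clip follows directly
    return joined;
  });
}

const joinedChannels = concatChannels(
  [new Float32Array([0.5, -0.5])],
  [new Float32Array([0.25])],
);
console.log(Array.from(joinedChannels[0]));
```

Throwing on a channel mismatch makes the assumption explicit instead of silently dropping or zero-filling a channel.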

One issue remains: how to save the concatenated AudioBuffer as an OGG file. It turns out that another modern browser API, the MediaStream Recording API, comes in handy. The AudioBuffer can be sent to a MediaRecorder, and the result can be saved as an OGG file after the recording ends. Here is a snippet of example code from MDN’s MediaRecorder page.

const chunks = [];

mediaRecorder.onstop = (e) => {
  console.log("data available after MediaRecorder.stop() called.");

  const audio = document.createElement("audio");
  audio.controls = true;
  const blob = new Blob(chunks, { type: "audio/ogg; codecs=opus" });
  const audioURL = window.URL.createObjectURL(blob);
  audio.src = audioURL;
  console.log("recorder stopped");
};

mediaRecorder.ondataavailable = (e) => {
  chunks.push(e.data);
};

Unfortunately, Chrome does not support saving the file as OGG using MediaRecorder. In fact, only the .webm format is supported for audio. While sending an uncompressed .wav file is undesirable due to its large size, using .webm severely limits the compatibility of the resulting audio file. Since Chrome is a widely used browser, and we cannot ignore this limitation, I ultimately did not implement this approach.
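
Format support can be probed up front instead of discovered after recording. The sketch below picks the first supported MIME type from a preference list. pickMimeType is a hypothetical helper, and the support check is passed in as a function so the example also runs outside a browser. In a browser, the predicate would be (t) => MediaRecorder.isTypeSupported(t).

```typescript
// Returns the first candidate accepted by the predicate, or undefined if none is
function pickMimeType(
  candidates: string[],
  isSupported: (mimeType: string) => boolean,
): string | undefined {
  for (const candidate of candidates) {
    if (isSupported(candidate)) return candidate;
  }
  return undefined;
}

// Stand-in predicate mimicking a browser that only records WebM audio
const webmOnly = (mimeType: string) => mimeType.indexOf("audio/webm") === 0;
const chosenType = pickMimeType(
  ["audio/ogg; codecs=opus", "audio/webm; codecs=opus"],
  webmOnly,
);
console.log(chosenType);
```

With a predicate like this stand-in, the OGG preference is skipped and the WebM type is chosen, which is the situation described above.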

Final Solution: Bring FFmpeg to the browser side

Due to the absence of FFmpeg on the frontend, we attempted two alternative methods, both of which proved unsuccessful. However, what if we can actually use FFmpeg on the client side? This would resolve all the issues and provide a straightforward solution.

As it turns out, this is indeed possible with ffmpeg.wasm, thanks to powerful WebAssembly technologies that allow running FFmpeg in the browser.

By incorporating FFmpeg into the browser, the problem of concatenating OGG files simplifies to a single FFmpeg concat command. The resulting code would be similar to the following snippet. Please note that if you use Vite as your build system (as I do with Nuxt 3), you may need to apply additional configuration for baseURL, coreURL, and wasmURL due to a CORS issue.

import { FFmpeg } from '@ffmpeg/ffmpeg'
import { toBlobURL } from '@ffmpeg/util'

// Lazily created instance, so the wasm core is only downloaded once
let ffmpeg: FFmpeg | null = null

export async function concatOggs (files: Blob[]) {
  if (ffmpeg === null) {
    const baseURL = 'https://unpkg.com/@ffmpeg/core@0.12.4/dist/esm'
    ffmpeg = new FFmpeg()
    ffmpeg.on('log', ({ message }) => {
      console.log(message)
    })
    await ffmpeg.load({
      coreURL: await toBlobURL(`${baseURL}/ffmpeg-core.js`, 'text/javascript'),
      wasmURL: await toBlobURL(`${baseURL}/ffmpeg-core.wasm`, 'application/wasm')
    })
  }
  const inputPaths = []
  for (let i = 0; i < files.length; i++) {
    const file = files[i]
    // File objects carry a name; plain Blobs fall back to their index
    const name = (file as File).name || i.toString()
    // Write each input into FFmpeg's in-memory file system
    await ffmpeg.writeFile(name, new Uint8Array(await file.arrayBuffer()))
    inputPaths.push(`file ${name}`)
  }
  // The concat demuxer reads the list of inputs from a text file
  await ffmpeg.writeFile('concat_list.txt', inputPaths.join('\n'))
  await ffmpeg.exec(['-f', 'concat', '-safe', '0', '-i', 'concat_list.txt', 'output.ogg'])
  const data = await ffmpeg.readFile('output.ogg')
  return (
    new Blob([data], {
      type: 'audio/ogg'
    })
  )
}
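
One detail worth handling if the file names are not under your control: the concat list is parsed line by line, so names containing spaces or quotes need quoting. FFmpeg's concat demuxer accepts single-quoted paths, with an embedded single quote written as '\''. The sketch below builds such a list. buildConcatList is a hypothetical helper name.

```typescript
// Builds the contents of a concat list file, one quoted "file" directive per line
function buildConcatList(names: string[]): string {
  return names
    .map((name) => `file '${name.replace(/'/g, "'\\''")}'`)
    .join("\n");
}

console.log(buildConcatList(["0.ogg", "my chapter.ogg", "it's.ogg"]));
```

For inputs that are always named by index, as with plain Blobs, the unquoted form is sufficient.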

Once this function is called, the Web worker will handle the rest, and the concatenated OGG file will be available as a downloadable blob. Since FFmpeg recreates the metadata rather than simply concatenating the files, seeking and duration work correctly in the players I tested.

Additional remarks on FFmpeg license

It’s important to note the licensing implications of FFmpeg. FFmpeg is LGPL licensed, which means that if it is distributed directly to clients as WebAssembly, your web application may also need to be distributed under an LGPL-compatible license. While this may not be a significant concern for personal or open-source projects, it may not be acceptable in other cases.

Convert OpenAI API stream to HTTP streamed response

2023-10-27T20:00:00+00:00

Previously, we covered the implementation of HTTP streamed response for Google’s Text-to-Speech API and Azure Text-to-Speech API. In this post, we will explore the same technique for another hot topic: calling the OpenAI API and streaming ChatGPT responses word by word using HTTP streamed (chunked) responses.

Background

Why is a streamed response useful in this case? Waiting for ChatGPT to complete its answer is time-consuming. In fact, requesting the gpt-3.5-turbo-16k model to translate a piece of an article with approximately 400 tokens often requires more than 30 seconds to receive a complete response. If this is implemented in a web app, users will be staring at a loading screen for more than 30 seconds with nothing else to do.

Fortunately, ChatGPT, like most other mainstream generative models, generates responses word by word. These models predict the most probable next word based on the input text and the words generated so far. To enhance the user experience, it would be better to display this word-by-word generation process live to the user, allowing them to start reading immediately and feel engaged. This is the standard user experience for most AI programs nowadays.

Possible Solutions and Why HTTP?

To enable the UI or frontend to show the generative process in real time, we need to stream the response from the OpenAI API through our backend server to the frontend. The OpenAI API and its SDKs use Server-Sent Events, while WebSockets are another popular technique in this context. However, implementing these techniques requires additional knowledge and libraries.

Is there a simpler way? If we can accomplish this task using only HTTP and XHR, without relying on extra knowledge or libraries, it would be ideal. Fortunately, we can achieve this by utilizing HTTP chunked responses as a long-lived streaming connection. If you are familiar with older techniques, you might recognize this as a close relative of “long-polling,” often called HTTP streaming.

Backend, all about transformation

The OpenAI library supports a stream mode specifically for this purpose. However, in the sample code, only a blocking “for await...of” loop is used to send the output to stdout.

import OpenAI from 'openai';

const openai = new OpenAI();

async function main() {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Say this is a test' }],
    stream: true,
  });
  for await (const part of stream) {
    process.stdout.write(part.choices[0]?.delta?.content || '');
  }
}

main();

To make our API non-blocking and truly streaming, we need to pipe the stream to an HTTP response object instead of using a “for await...of” loop. However, in the provided code, we cannot easily achieve that since the output part from the stream is not directly useful. What we actually need is part.choices[0]?.delta?.content || ''. In previous articles, we used PassThrough to connect streamed input and output. However, in this case, we need to use Transform. In fact, PassThrough is just a Transform that does nothing. Here, we want to transform any output (JSON object) from the stream into words (part.choices[0]?.delta?.content || '') to be displayed to the users.

To use Transform, we define the transform function parameter and perform the transformation inside this function. Note that since the input piped into Transform is an object, we must also set objectMode: true. Otherwise, a “The "chunk" argument must be of type string or an instance of Buffer” error will be thrown when an object is received, as Node.js expects the input to be a string or a buffer.

The resulting code is as follows.

import { Readable, Transform } from 'node:stream'

// ... inside the API event handler
const response = await openai.chat.completions.create({
  messages: [
    {
      role: 'system',
      content: 'Translate the given blog post to Chinese please'
    },
    { role: 'user', content: text }
  ],
  model: 'gpt-3.5-turbo-16k',
  stream: true
})
// Wrap the SDK's async iterable in a Node.js object-mode stream
const stream = Readable.from(response)
const bufferStream = new Transform({
  objectMode: true,
  transform (chunk, _, callback) {
    // Keep only the generated text of each chunk
    const data = chunk as ChatCompletionChunk
    callback(null, data.choices[0]?.delta?.content || '')
  }
})
stream.pipe(bufferStream)
return sendStream(event, bufferStream)
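
The mapping done inside transform can be looked at on its own. The sketch below applies it to a few hand-written chunks. ChunkLike is a simplified stand-in for the SDK's ChatCompletionChunk type containing only the fields used here, and the sample chunks are made up for illustration.

```typescript
// Minimal stand-in for ChatCompletionChunk; only the fields read below
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Same expression as in the transform function: missing pieces become ""
function extractDelta(chunk: ChunkLike): string {
  return chunk.choices[0]?.delta?.content || "";
}

const sampleChunks: ChunkLike[] = [
  { choices: [{ delta: { content: "Hel" } }] },
  { choices: [{ delta: { content: "lo" } }] },
  { choices: [{ delta: {} }] }, // a chunk without text, e.g. at the end
  { choices: [] },
];
const assembled = sampleChunks.map(extractDelta).join("");
console.log(assembled);
```

Falling back to an empty string keeps the stream going when a chunk carries no text instead of emitting "undefined".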

Frontend, wth?

The above code covers the API part. However, to handle a chunked response, we also need special handling on the frontend. Conceptually, it’s simple: keep receiving the chunked response and display it as text on the DOM. However, things get more complicated in practice.

const { data, error } = await useFetch('/api/translate', {
  method: 'POST',
  body: {
    text: t,
    language: translateLocale.value,
    type
  },
  responseType: 'stream'
})
if (error.value) { throw error.value }
const stream = data.value as ReadableStream

In my case, I’m using Nuxt 3 to build my web app. The official way to make XHR calls in Nuxt 3 is by using useFetch. Setting responseType: 'stream' allows the response to be in stream mode, which is straightforward. However, the data returned in this case is a ReadableStream object. And here’s where the trouble begins.

Reading a ReadableStream in the browser environment is much more challenging compared to using a Node.js stream. Fortunately, Mozilla understands this pain and provides a specific document page with hints, especially regarding using a simple for await...of loop to save us the trouble of reading the stream.

In an ideal world, the code would look like this:

// in an ideal world
const stream = data.value as ReadableStream

for await (const chunk of stream) {
  // translateOutput.value is bound to a textarea as UI output
  translateOutput.value += chunk
}

However, this throws a “stream is not async iterable” error on Chrome. What’s going on? It turns out to be a bug in Chrome. It’s frustrating, but at least the bug report provides a workaround.

One last piece of the puzzle is that the received chunk is in bytes, represented as a Uint8Array in modern browsers. We need to convert it back to a UTF-8 string. In Node.js, this is trivial with just a toString('utf8') call. However, in the browser environment, we need some help from the TextDecoder class, which handles byte decoding. If you’re targeting older browsers, you might need a polyfill for TextDecoder.

Here’s our final frontend code:

const { data, error } = await useFetch('/api/translate', {
  method: 'POST',
  body: {
    text: t,
    language: translateLocale.value,
    type
  },
  responseType: 'stream'
})
if (error.value) { throw error.value }
const stream = data.value as ReadableStream
// should use await-for-of if not for https://bugs.chromium.org/p/chromium/issues/detail?id=929585
const reader = stream.getReader()
// Reuse one decoder so characters split across chunks are decoded correctly
const decoder = new TextDecoder()
let done = false
while (!done) {
  const chunk = await reader.read()
  done = chunk.done
  const value = chunk.value as Uint8Array | undefined
  // translateOutput.value is bound to a textarea as UI output
  translateOutput.value += decoder.decode(value, { stream: !done })
}
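
One subtlety when decoding chunk by chunk, especially for Chinese output: a UTF-8 character spans several bytes, and a chunk boundary can fall in the middle of one. TextDecoder's stream option buffers the incomplete bytes until the next call. The sketch below simulates such a split with a hand-picked boundary. The variable names and the sample text are illustrative only.

```typescript
const encoded = new TextEncoder().encode("你好"); // 6 bytes, 3 per character
const firstChunk = encoded.slice(0, 2); // cuts the first character short
const secondChunk = encoded.slice(2);

// A fresh decoder per chunk cannot join the split character
const naive =
  new TextDecoder().decode(firstChunk) + new TextDecoder().decode(secondChunk);

// One decoder with { stream: true } keeps the partial bytes between calls;
// the final decode() without arguments flushes anything left over
const streamingDecoder = new TextDecoder();
const streamed =
  streamingDecoder.decode(firstChunk, { stream: true }) +
  streamingDecoder.decode(secondChunk, { stream: true }) +
  streamingDecoder.decode();

console.log(naive === "你好", streamed === "你好");
```

The naive variant ends up with U+FFFD replacement characters where the split character should be, while the streaming variant reproduces the original text.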

Done… or not yet? NGINX and load balancers

The result works well, but then I encountered an additional infrastructure-related issue. The code works fine on my local machine, but on the production deployment, the frontend doesn’t receive anything from the stream. The text only shows up once the whole HTTP request is complete without chunking. This defeats the purpose of my work.

Fortunately, I quickly identified the issue. It turns out our NGINX proxy buffers chunked responses by default. Simply adding proxy_buffering off; fixed the issue. It’s also good to know that neither Cloudflare nor the Google Cloud load balancer does this by default. Otherwise, I would have had a major headache.

Another issue arose when we had to adjust the timeout for our Google Cloud HTTP load balancer running as our Kubernetes ingress, from 60 seconds to 1800 seconds. This timeout affects all HTTP connections in the cluster, regardless of whether they are HTTP chunked responses, server-sent events, or websockets.

Done (Real)

This task turned out to be more of a hassle than I initially thought. However, we now have a properly streaming chatbot using only HTTP protocols, without the need for extra libraries.

Convert Azure text to speech API result to HTTP streamed response

2023-10-14T18:40:00+00:00

Previously, we discussed using Google’s Text-to-Speech API (/code/2023/10/13/convert-google-text-to-speech-to-nodejs-stream.html) as a Node.js stream. Similarly, when using the Azure version of the API, it might be preferable to receive a streamed response over a buffer.

Unlike Google’s API, which only returns the entire audio as a single buffer, the Azure API allows us to set different output options using the AudioConfig class, as described in the AudioConfig documentation. These options include fromAudioFileOutput, fromDefaultSpeakerOutput, and fromStreamOutput. It’s important to note that AudioConfig is also used for input configuration in other scenarios, but we won’t cover that here.

The official Azure API guide provides a sample code snippet, which demonstrates the usage of fromAudioFileOutput. This approach writes the audio output as a file in the file system. However, this method has two drawbacks: first, similar to the entire buffer approach, we would need to wait for the file download to complete before proceeding, and second, after the response is sent, the files would need to be manually removed.

Fortunately, Azure also provides the fromStreamOutput function, as documented here, which allows us to use a stream as the output. Two types of streams are available: PullAudioOutputStream and PushAudioOutputStream. The pull stream requires the caller to invoke its read() method to obtain data, while the push stream uses the write() and close() methods of the callback object. In this case, we will use the PushAudioOutputStream and mimic the behavior of a write stream by utilizing PassThrough, as we did previously. It’s important to note that while PassThrough doesn’t have a close() method, calling end() serves the same purpose in this context.
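To make the push-stream adaptation concrete, here is a minimal, self-contained sketch that does not touch the Azure SDK at all. The names createPushTarget, fakeProducer and collect are hypothetical helpers of my own, fakeProducer merely stands in for whatever calls write() and close(), and the byte values are made up. It uses require so that it can be run directly with node.

```javascript
const { PassThrough } = require("node:stream");

// Hypothetical helper: builds the { write, close } callback object that a
// push-style producer expects, backed by a PassThrough the caller can read.
function createPushTarget() {
  const output = new PassThrough();
  const callbacks = {
    // The producer hands over an ArrayBuffer; wrap it in a Node.js Buffer.
    write: (arrayBuffer) => {
      output.write(Buffer.from(arrayBuffer));
    },
    // PassThrough has no close(); end() signals that no more data will come.
    close: () => {
      output.end();
    },
  };
  return { callbacks, output };
}

// Stand-in for the SDK (an assumption): pushes chunks one by one, then closes.
function fakeProducer(callbacks, chunks) {
  let i = 0;
  const timer = setInterval(() => {
    if (i < chunks.length) {
      callbacks.write(Uint8Array.from(chunks[i++]).buffer);
    } else {
      clearInterval(timer);
      callbacks.close();
    }
  }, 5);
}

// Reads the stream to completion and returns everything as a single Buffer.
async function collect(stream) {
  const parts = [];
  for await (const part of stream) parts.push(part);
  return Buffer.concat(parts);
}

// Usage: the consumer starts reading while the producer is still pushing.
const { callbacks, output } = createPushTarget();
fakeProducer(callbacks, [[1, 2], [3], [4, 5]]);
collect(output).then((buf) => console.log(buf)); // logs <Buffer 01 02 03 04 05>
```

The point of the sketch is that the readable side can be handed to a consumer before the producer has finished, which is what makes the PassThrough suitable as a streamed HTTP response body.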

Additionally, the speakTextAsync method of the SpeechSynthesizer class accepts callback functions instead of returning a promise. For convenience, we wrap it in a simple promise-returning helper of the same name.

Below is a code snippet from a Nuxt 3 project that demonstrates the integration with Azure’s Text-to-Speech API:

import sdk, { SpeechSynthesizer } from "microsoft-cognitiveservices-speech-sdk";
import { PassThrough } from "stream";

// Promise wrapper around the callback-based SpeechSynthesizer.speakTextAsync
function speakTextAsync(synthesizer, text) {
  return new Promise((resolve, reject) => {
    synthesizer.speakTextAsync(
      text,
      (result) => {
        synthesizer.close();
        resolve(result);
      },
      (err) => {
        synthesizer.close();
        reject(err);
      }
    );
  });
}

// ...

const bufferStream = new PassThrough();
const stream = sdk.PushAudioOutputStream.create({
  write: (a) => bufferStream.write(Buffer.from(a)),
  close: () => bufferStream.end(),
});
const audioConfig = sdk.AudioConfig.fromStreamOutput(stream);
const speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);

speechConfig.speechSynthesisVoiceName = LANG_TO_NAME[language];
speechConfig.speechSynthesisOutputFormat = sdk.SpeechSynthesisOutputFormat.Ogg16Khz16BitMonoOpus;

const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);
synthesizer.SynthesisCanceled = function (s, e) {
  const cancellationDetails = sdk.CancellationDetails.fromResult(e.result);
  let str = "(cancel) Reason: " + sdk.CancellationReason[cancellationDetails.reason];
  if (cancellationDetails.reason === sdk.CancellationReason.Error) {
      str += ": " + e.result.errorDetails;
  }
  console.error(str);
};

await speakTextAsync(synthesizer, text)
setHeader(event, 'content-type', 'audio/ogg; codecs=opus');
setHeader(event, 'content-disposition', 'attachment; filename="speech.ogg"');
return sendStream(event, bufferStream);

Convert Google text to speech API result to HTTP streamed response

2023-10-13T02:02:00+00:00

When using the Google Cloud Text-to-Speech API, the default behavior of the synthesizeSpeech() method, as described in the synthesizeSpeech() documentation, is to return the audioContent as a complete buffer.

However, if you want to enable streaming playback for long audio, you can convert the buffer into a streamed file response. To achieve this, you can use the PassThrough class from the Node.js Stream API, as outlined in the PassThrough documentation.

Here is a sample snippet from a Nuxt 3 project that demonstrates this:

import { TextToSpeechClient } from "@google-cloud/text-to-speech/build/src/v1";
import { PassThrough } from "stream";

// ...

const request = {
  input: { text },
  voice: { languageCode: language, ssmlGender: "NEUTRAL" },
  audioConfig: { audioEncoding: "OGG_OPUS" },
};
const [response] = await client.synthesizeSpeech(request);
// response.audioContent is a Buffer object
if (!response.audioContent) return { error: "No audio content" };

// PassThrough is both a write and read stream
const bufferStream = new PassThrough();
// set the whole buffer content as PassThrough input.
bufferStream.end(Buffer.from(response.audioContent));

// set mime types
setHeader(event, 'content-type', 'audio/ogg; codecs=opus');

// send the PassThrough output as HTTP streamed response
return sendStream(event, bufferStream);

How to workaround “Import failed” error in Medium with debugger

2023-05-15T16:00:00+00:00

Hacking the “Import your story” in debugger for proper backdate

Originally posted in LikeCoin medium publication.

Import story tool

Medium has a powerful feature that allows you to import websites you own into stories. All you need to do is provide a URL and press Import. Check it out if you haven’t tried it before.

You might ask, “Why should we use import though?” The Medium editor is so easy to use and powerful (kudos to the editor developers) that it can handle pasting from external sources very easily. It often takes a simple copy-and-paste to post any article from my WordPress site to Medium.

However, one significant difference is that the “import story” feature parses the published date and canonical link of the original website, then sets them accordingly in the Medium story. On the other hand, you cannot change the published date of a manually pasted story. The published date is set to the time you post the pasted story on Medium.

Backdating the published date is important when you are trying to sync articles in batches from an existing website to a Medium publication. You don’t want to bomb subscribers with notifications of stories that are months old, or flood the publication page with posts from 2022.

Import fail!

So when I saw this error while importing posts from our WordPress site, I knew I was screwed.

There is no useful error message as to why the import failed. Medium’s documentation simply states that if the service fails, it fails, and tells you to paste the content instead. As mentioned earlier, it is not feasible for me to manually post all the stories without backdating. A simple Google search reveals that it is possible to force a backdate, but it requires a placeholder page containing the published date metadata. For articles, this can be accomplished by pushing the placeholder page to a GitHub repository.

As a developer, I am too lazy to host a page and manually set the dates for each post I need to import. Let’s dig into the Medium import error page and see if we can find the actual cause of the failed import. We will be using Chrome’s developer tools.

The first thing to look at is the console; any JavaScript or API call errors should show up there. However, as seen in the image, only some boring messages about CSP are shown, so no luck in the console.

The second place to look is the Network tab. Since Medium uses a third-party service for parsing and importing external websites, one would expect some external API to be called. We can filter the network requests by “Fetch/XHR” to show only API calls, and see if anything interesting shows up. However, there are no failed HTTP requests. Most of the requests are analytics events. By inspecting the payloads and responses one by one, though, a particular API call seems interesting.

oh-noes

The endpoint is called /oh-noes and the request payload looks exactly like a JavaScript error stack. This seems to be some home-brew error logging API (we all know Sentry is expensive). The stack value inside the payload points to an “Import Error” thrown by the JavaScript file /main-posters.bundle.${hash}.js; the exact location in the file and the line number are also shown. By digging into the source of this file, we can trace the origin of the import error that is troubling us. To view the JS source in Chrome, go to the Sources tab of the developer tools, then find the JS file according to the path shown in the stack payload.

Debugger

It is an obfuscated JavaScript file, as expected in most modern web applications. All the variable names are shortened, and function names and structure are mangled for smaller size and better performance. Let us skip straight to the interesting part by searching for the function name QRa and the line number mentioned in the error payload above.

Finally, something promising shows up. We see words like “postHTML”, and also an “errorCode” that is set to 400, which hints that it is probably an HTTP error code. Going a few lines further down lets us see how Medium shows different error messages for some of the error codes it encounters.

As we can see above, there are three kinds of import errors: one for 400/404 errors, one for 403/500/504 errors, and one that catches all other errors. We can assume errCode is the HTTP error code encountered by the importer when crawling the target URL. Unfortunately, the error message we see on our import error page is the catch-all case. To find the actual errCode in our case, we want to know the state of the a.Ph variable during our import. To achieve this, we can set a breakpoint on QRa.

After setting a breakpoint, retry the import flow by refreshing the browser. The page execution will pause when it reaches the breakpoint we set. On the left we can see where the JavaScript execution was paused; on the right we can see the content (state) of all the variables when execution reaches the breakpoint. The variable we are interested in is a.Ph (note that it is case sensitive).

Unfortunately, we can see that errorCode is 0, which means we still don’t know why the import failed. However, a very interesting observation is that all the fields except postHTML are properly filled. As we can see in the source, having a.Ph.postHTML empty throws us into an error case. What if we fill in some random text for postHTML here? Actually, we can do that!
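The behaviour observed so far can be summarised in a small sketch. This is only my own reconstruction for illustration, not Medium’s actual code: classifyImportResult and the returned labels are hypothetical, only the errorCode and postHTML field names come from the debugger, and the assumption that a non-empty postHTML is checked first is inferred from the experiment that follows.

```javascript
// Hypothetical reconstruction of the importer's decision, for illustration only.
function classifyImportResult(payload) {
  const { errorCode, postHTML } = payload;
  // Assumption: a non-empty postHTML means the import is treated as successful.
  if (postHTML) return "ok";
  // The three error branches described above.
  if (errorCode === 400 || errorCode === 404) return "message-for-400-404";
  if (errorCode === 403 || errorCode === 500 || errorCode === 504) {
    return "message-for-403-500-504";
  }
  return "catch-all-message";
}

// Our case: errorCode is 0 and postHTML is empty, so we land in the catch-all.
console.log(classifyImportResult({ errorCode: 0, postHTML: "" }));
// After editing postHTML in the debugger, the error branch is never reached.
console.log(classifyImportResult({ errorCode: 0, postHTML: "random text" }));
```

Under these assumptions, the only thing standing between us and a successful import is the empty postHTML field, which is exactly what the next step exploits.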

Hacking values in debugger

Double-click the postHTML key and it becomes editable. Type in some random text enclosed in "" so that it is a proper string. Continue the page’s JavaScript execution by pressing continue in the Chrome developer tools popup.

It’s working, or not?

Voila! The error is gone! But what do we actually get as the result?

The text we just entered in postHTML would show up as the story content!

Well, that’s not ideal for an import, but remember that:

The importer actually crawled all the metadata required for the import successfully, except postHTML. That can be seen in the debugger.
The Medium editor is very powerful at handling pasted content.

Let’s try copying and pasting the content from the original site…

Perfect. The best part of this is the publication date and canonical link would automatically be set according to the target URL. No pushing and editing needed!

Result

The result of importing multiple articles from 2022 is shown below. You are welcome to browse our publication or our more up-to-date site.

Shoutout to Medium for the powerful and friendly story editor. But it would be nicer if the import story feature could show more useful error messages!

TL;DR

If your import fails with an unknown error, but you really need those backdates:

Import URL and fail once
Check the JS source of Import Error by viewing /oh-noes payload
Breakpoint on the origin of the Import Error
Set a random non-empty postHTML to bypass the error screen
Paste back the actual content
Publish!

Docker + HAProxy = PROXY protocol for Everything

2017-09-27T16:00:00+00:00

For the Cantonese version, please check here

Originally posted on Lakoo’s Medium on 2017-09-28, translated to English

MMORPG Experience Sharing / How to make any TCP network service in a container support PROXY protocol using Docker networking.

Background: Our mobile MMORPG Teon server is located in AWS Japan. Recently, some Taiwanese players have reported network issues and suspected problems with certain ISP routes. Relocating the server is too cumbersome, so we want to set up a proxy server in Taiwan to provide reliable and stable connections to all players.

Issue: The game server needs to record players’ IP addresses and cannot lose player IP information through proxy NAT.

In general, HTTP proxies use headers like X-Forwarded-For to preserve the original connection’s IP information, but this approach is not applicable in a pure TCP environment.

Traditional transparent proxy configurations require kernel support for TPROXY and modification of the default gateway. However, even when using a cross-AWS GCP VPN network, the EC2/VPC gateway settings do not support specifying a server outside of AWS as the internet gateway.

Solution:

Use HAProxy + PROXY protocol as a transparent proxy.

First, within the game server’s docker network, we launch an HAProxy container to act as the default gateway for the entire docker network.

haproxy:
  image: tombull/haproxy
  links:
    - game-server-container # game server's container name
  ports:
    - "8000:8000"
  cap_add:
    - ALL # ALL is for demo lazy purpose only
  environment:
    - HAPROXY_PORTS=8000
  networks:
    teon-net:
      ipv4_address: 172.20.0.10
  volumes:
    - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg
networks:
  teon-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
          gateway: 172.20.0.1

The image used is https://hub.docker.com/r/tombull/haproxy/, which already includes the necessary iptables rules. You just need to configure net.ipv4.ip_nonlocal_bind and capabilities such as NET_ADMIN.

Since docker-compose currently doesn’t have a way to specify individual container gateways, you need to enable NET_ADMIN within the game server’s container and run an ip command during container entry to set the gateway.

ip route delete default
ip route add default via 172.20.0.10

In HAProxy, open a port that accepts the PROXY protocol and overwrite the original connection’s IP when connecting to the game server, achieving the effect of a transparent proxy.

frontend game-proxy
  mode tcp
  option tcplog
  option clitcpka
  bind 0.0.0.0:8000 accept-proxy
  default_backend game-servers

backend game-servers
  option tcplog
  mode tcp
  source 0.0.0.0 usesrc clientip # overwrite src ip
  server game game-server-container:8080

Since the game server’s container specifies HAProxy as the default gateway, HAProxy will handle the subsequent NAT issues on inbound connections. As for outbound connections, testing showed that an additional iptables NAT rule is required:

# Not sure why this is needed for outbound
# `! -d` means do not apply for destination 172.20.0.0/16
iptables -t nat -A POSTROUTING ! -d 172.20.0.0/16 -o eth0 -j MASQUERADE

Finally, configure the relay on the Taiwan HAProxy:

backend reverse-proxy
  mode tcp
  server aws-haproxy aws-ip-address-here:8000 send-proxy

By connecting to the Taiwan HAProxy, anyone will be reverse-proxied to the AWS HAProxy and then connected to the game server. By controlling the number of HAProxy instances and the backend targets, you can preserve the original IP and have control over routing.
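For reference, here is a rough sketch of what travels between the two HAProxy instances. With send-proxy, HAProxy prepends a single human-readable PROXY protocol version 1 line to the TCP stream, and accept-proxy consumes it on the other side. The helper names buildProxyV1Header and parseProxyV1Header are hypothetical, the client address and port are made up, and the sketch only handles the TCP4/TCP6 form of the header, so treat it as an illustration rather than a complete implementation.

```javascript
// Builds a PROXY protocol v1 header line for an IPv4 TCP connection.
function buildProxyV1Header({ srcIp, dstIp, srcPort, dstPort }) {
  return `PROXY TCP4 ${srcIp} ${dstIp} ${srcPort} ${dstPort}\r\n`;
}

// Parses a v1 header from the start of a received Buffer and returns the
// original addresses plus whatever application data follows the header.
function parseProxyV1Header(data) {
  const end = data.indexOf("\r\n");
  if (end === -1) throw new Error("incomplete PROXY header");
  const fields = data.subarray(0, end).toString("ascii").split(" ");
  const [magic, family, srcIp, dstIp, srcPort, dstPort] = fields;
  if (magic !== "PROXY" || (family !== "TCP4" && family !== "TCP6")) {
    throw new Error("unsupported PROXY header");
  }
  return {
    family,
    srcIp,
    dstIp,
    srcPort: Number(srcPort),
    dstPort: Number(dstPort),
    payload: data.subarray(end + 2),
  };
}

// Usage with made-up addresses: a player at 203.0.113.7 reaching port 8000.
const header = buildProxyV1Header({
  srcIp: "203.0.113.7",
  dstIp: "172.20.0.10",
  srcPort: 51234,
  dstPort: 8000,
});
const received = Buffer.concat([Buffer.from(header, "ascii"), Buffer.from("hello")]);
console.log(parseProxyV1Header(received).srcIp); // 203.0.113.7
```

In the setup described here the game server never has to parse this line itself: the AWS HAProxy strips it and, through usesrc clientip, presents the original address as the connection’s source IP.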

The above architecture, implemented using Docker networking, offers portability advantages:

It avoids the need to create a separate VM for each backend or proxy just to manage iptables and routing, which reduces startup costs and makes rapid deployment and modification easier.
The architecture can be managed per docker-compose file/stack.
Conceptually, any TCP-based Docker application set up this way can be treated as a single application that supports the PROXY protocol, without having to consider network and proxy architecture issues.

Conclusion: The PROXY protocol, as a technology for preserving the source IP in proxies, has fewer limitations compared to the traditional TPROXY technique. However, it seems that support for PROXY protocol beyond the HTTP proxy level is still lacking, especially considering that AWS ELB already supports it (supported list).

The above solution using Docker with HAProxy to support the PROXY protocol aims to help everyone in environments other than web servers to easily benefit from the advantages of the PROXY protocol.

Additional reading:

https://www.haproxy.com/blog/haproxy/proxy-protocol/

https://www.haproxy.com/blog/preserve-source-ip-address-despite-reverse-proxies/

https://www.haproxy.com/blog/using-haproxy-with-the-proxy-protocol-to-better-secure-your-database/

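Magic! Anything and Everything Supports the PROXY Protocol!

2017-09-27T16:00:00+00:00

For the English version, please click here

Docker + HAProxy = PROXY protocol for EVERYONE

Originally posted on Lakoo’s medium on 2017-09-28

MMORPG Experience Sharing / How to make any TCP network service in a container support PROXY protocol, using Docker networking.

Background: The server of our mobile MMORPG Teon is located in AWS Japan. Recently, some Taiwanese players reported an unstable network, and we suspect problems with certain ISP routes. Relocating the server is too cumbersome, so we want to set up a proxy server in Taiwan to provide a reliable and stable connection for all players.

Issue: The game server needs to record players’ IP addresses and cannot lose that information after the proxy’s NAT.

HTTP proxies generally use headers like X-Forwarded-For to preserve the original connection’s IP information, but this does not apply in a pure TCP environment.

A traditional transparent proxy setup requires kernel support for TPROXY and a modified default gateway. However, even with a VPN network spanning AWS and GCP, the EC2/VPC gateway settings do not support specifying a server outside AWS as the internet gateway.

Solution:

Image from https://www.haproxy.com/blog/using-haproxy-with-the-proxy-protocol-to-better-secure-your-database/

Use HAProxy + PROXY protocol as a transparent proxy.

First, within the game server’s Docker network, launch an HAProxy container to act as the default gateway for the entire Docker network.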

haproxy:
  image: tombull/haproxy
  links:
    - game-server-container # game server's container name
  ports:
    - "8000:8000"
  cap_add:
    - ALL # ALL is for demo lazy purpose only
  environment:
    - HAPROXY_PORTS=8000
  networks:
    teon-net:
      ipv4_address: 172.20.0.10
  volumes:
    - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg
networks:
  teon-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
          gateway: 172.20.0.1

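The image used is https://hub.docker.com/r/tombull/haproxy/, which already has all the required iptables rules. You only need to set up net.ipv4.ip_nonlocal_bind and capabilities such as NET_ADMIN.

In addition, because docker-compose currently has no way to specify the gateway of an individual container, you need to enable NET_ADMIN in the game server’s container and run ip commands at container entry to set the gateway.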

ip route delete default
ip route add default via 172.20.0.10

In HAProxy, open a port that accepts the PROXY protocol and overwrite the source with the original connection’s IP when connecting to the game server. This gives the effect of a transparent proxy.

frontend game-proxy
  mode tcp
  option tcplog
  option clitcpka
  bind 0.0.0.0:8000 accept-proxy
  default_backend game-servers

backend game-servers
  option tcplog
  mode tcp
  source 0.0.0.0 usesrc clientip # overwrite src ip
  server game game-server-container:8080

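Meanwhile, because the game server’s container specifies HAProxy as its default gateway, HAProxy handles the subsequent NAT issues for inbound connections by itself. As for outbound connections, testing showed that an additional iptables NAT rule is needed:

# Not sure why this is needed for outbound
# `! -d` means do not apply for destination 172.20.0.0/16
iptables -t nat -A POSTROUTING ! -d 172.20.0.0/16 -o eth0 -j MASQUERADE

Finally, configure the relay on the Taiwan HAProxy:

backend reverse-proxy
  mode tcp
  server aws-haproxy aws-ip-address-here:8000 send-proxy

This way, anyone who connects to the Taiwan HAProxy is reverse-proxied to the AWS HAProxy and then connected to the game server. By controlling the number of HAProxy instances and the backend targets in between, you can preserve the original IP and control routing freely.

The main benefit of handling the above architecture with Docker networking is portability:

It avoids the need to create a separate VM for each backend or proxy just to manage iptables and routing, which reduces startup costs and makes rapid deployment and modification easier.
The architecture can be managed per docker-compose file/stack.
Conceptually, any Docker application using TCP that is set up as described above can be treated as a single application that supports the PROXY protocol, without having to consider network and proxy architecture issues.

Conclusion: As a technique for preserving the source IP through proxies, the PROXY protocol has fewer limitations than the traditional TPROXY technique. However, even now that AWS ELB has long supported the PROXY protocol, support beyond the HTTP proxy level still seems to be lacking (supported list).

I hope the above approach of using Docker together with HAProxy to support the PROXY protocol helps everyone benefit from the PROXY protocol easily in environments other than web servers.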

Additional reading:

https://www.haproxy.com/blog/haproxy/proxy-protocol/

https://www.haproxy.com/blog/preserve-source-ip-address-despite-reverse-proxies/

https://www.haproxy.com/blog/using-haproxy-with-the-proxy-protocol-to-better-secure-your-database/

Automated builds of an Android NDK project with CircleCI 2.0

2017-05-30T16:00:00+00:00

Automated build of Android NDK native app in CircleCI 2.0

Originally posted on Lakoo’s medium on 2017-05-31

tl;dr If you happen to be using compile SDK 25, NDK 14b and build tools 25.0.3 as well, you can take the image + config below and try them directly.

version: 2
jobs:
  build:
    docker:
      - image: lakoo/android-ndk:25-25.0.3-r14b
    working_directory: ~/app
    environment:
      TERM: dumb
    steps:
      - checkout
      - run:
          name: Assemble Stable Release
          command: ./gradlew assembleStableRelease
      - store_artifacts:
          path: app/build/outputs/apk/
          destination: apks/

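Main text:

Because of our predecessors’ odd obsession with performance, most of the Teon Android client code base is written in C with JNI and compiled with the NDK into native .so files. During development, Android Studio makes it just about possible to set up a working environment without much effort. But whenever we tried CI, either compilation was painfully slow, or the environment was always missing one thing or another, with no real control over what was installed. On top of that, the NDK toolchain is gigabytes in size, so build times were a nightmare.

CircleCI happened to release version 2.0 recently, which solves all of these problems at once:

1. It supports custom Docker base images as the foundation for CI. You add or remove whatever tools you need in your own Dockerfile, so no time is wasted downloading useless things, and you don’t have to wait forever for commonly used tools to be added to the CI.
2. Running on Docker, compiling with clang is blazingly fast.
3. CircleCI 2.0 is more responsive than the old version. Repeated, similar builds save a lot of the time otherwise spent waiting for VM init and per-step setup.

Assuming you already have a GitHub repo, here is a brief walkthrough of the setup:

1. First, prepare the base image needed to compile with the Android NDK. Taking https://hub.docker.com/r/lakoo/android-ndk, which Teon currently uses, as an example: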

FROM openjdk:8-jdk
...
RUN cd /opt/android-sdk-linux && \
    wget -q --output-document=sdk-tools.zip https://dl.google.com/android/repository/sdk-tools-linux-3859397.zip && \
    unzip sdk-tools.zip && \
    rm -f sdk-tools.zip && \
    echo y | sdkmanager "build-tools;25.0.3" "platforms;android-25" && \
    echo y | sdkmanager "extras;android;m2repository" "extras;google;m2repository" "extras;google;google_play_services"
RUN wget -q --output-document=android-ndk.zip https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip && \
    unzip android-ndk.zip && \
    rm -f android-ndk.zip && \
    mv android-ndk-r14b android-ndk-linux

The Dockerfile installs only the bare minimum: build tools, SDK, NDK, and the support library / Google services repositories.
Note that it is based on the popular library image openjdk:8-jdk, so docker pull can reuse layers already cached for other people’s builds and run faster.
Also, if a build machine has already pulled the same Docker image, it is cached and the download step is skipped. So if the image above suits you, feel free to use it; everybody benefits!

2. Write a suitable config.yml and put it in a folder named .circleci at the repo root.

version: 2
jobs:
  build:
    docker:
      - image: lakoo/android-ndk:25-25.0.3-r14b
    working_directory: ~/app
    environment:
      TERM: dumb
    steps:
      - checkout
      - run:
          name: Assemble Stable Release
          command: ./gradlew assembleStableRelease
      - store_artifacts:
          path: app/build/outputs/apk/
          destination: apks/

If you build repeatedly, consider adding a step to cache the libraries managed by Gradle:

- save_cache:
    key: teonclient--
    paths:
      - "~/.gradle"
      - "~/.m2"

To restore, just add a restore step with the same key after checkout:

- restore_cache:
    key: teonclient--

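3. Log in to CircleCI, pick the right GitHub project and start the first build. Once it succeeds, the resulting APK can be downloaded from the artifacts.

If the build succeeds, the APK appears under artifacts.

In the same way, ProGuard mappings and symbols can be stored as artifacts with a step. Note, however, that artifacts are not meant for permanent storage; they may be deleted after a while!

4. If needed, go to the project settings and enable Build forked pull requests. Every pull request will then trigger CircleCI automatically, and the result is shown back on GitHub.

For private repos, Build forked pull requests has to be enabled manually.

A green tick appears on success.

Summary: Since Teon started using CircleCI to build PRs automatically, things have become much more convenient, both for engineers reviewing PRs and for QA colleagues testing features. In fact, combining CircleCI environment variables with Gradle build variants allows many more variations to suit different workflows. I will write about that when I get the chance.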
