String Length & Bytes In JavaScript
• JavaScript, Node
It is tempting to assume that one character in a string is 1 byte, and for plain ASCII text encoded as UTF-8 that holds, but only for that case. How many bytes do you think ü is? It turns out it is 2 bytes in UTF-8. But if you run 'ü'.length it will return the string's length as 1. That is because length counts UTF-16 code units, not bytes, so what appears as a single character can be made up of multiple bytes of data. Usually, this isn't a big deal if you just need the length of a string, but if you actually need the size of a string in bytes, it is a big deal.
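To make the difference concrete, here is a small sketch that compares length with the UTF-8 byte count for a few characters. The helper name utf8ByteLength is made up for this illustration; it relies on the standard TextEncoder, which always encodes to UTF-8 and is available as a global in modern browsers and recent Node versions.

```javascript
// Hypothetical helper (not from any library): counts the bytes a string
// occupies once encoded as UTF-8.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// Escapes are used so the exact code points are unambiguous.
const samples = [
  'a',         // U+0061, ASCII
  '\u00FC',    // ü
  '\u20AC',    // € (euro sign)
  '\u{1F600}', // 😀, stored as a surrogate pair in UTF-16
];

const results = samples.map((s) => ({
  text: s,
  length: s.length,         // UTF-16 code units
  bytes: utf8ByteLength(s), // UTF-8 bytes
}));

for (const r of results) {
  console.log(`${r.text}: length=${r.length}, bytes=${r.bytes}`);
}
```

Only the ASCII character has matching values; the emoji even reports a length of 2 because it needs two UTF-16 code units.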
I bring this all up because I recently came across an issue with a library that was uploading data to AWS S3 and setting the size of the data itself instead of letting the AWS SDK compute it automatically. It was doing something like this:
return s3Client.putObject({
  Body: contents || '',
  Bucket: bucket,
  Key: fullKey,
  ContentLength: contents ? contents.length : 0,
  ContentType: CONTENT_TYPE_PLAIN_TEXT
});
Once I started uploading data that contained non-ASCII Unicode characters, I was getting "BadDigest: The Content-MD5 you specified did not match what we received." errors from AWS. I knew the AWS SDK would automatically compute the size of the data being uploaded, so things started working once I removed that property. It wasn't until then that I realized why it actually mattered in this case: contents.length counts UTF-16 code units rather than bytes, so the declared ContentLength was smaller than the number of bytes actually sent.
If you are using Node and need the true size, in bytes, of a string, you can get it with a Buffer, which encodes the string as UTF-8 by default.
Buffer.from('ü').length
// Returns 2
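If the ContentLength property had to stay in the upload call, one option would be to compute it from the encoded bytes instead of the string length. The sketch below is only an illustration: buildPutObjectParams is a hypothetical helper, the argument names mirror the snippet above, and the value of CONTENT_TYPE_PLAIN_TEXT is an assumption because its definition is not shown. It uses Buffer.byteLength, which returns the byte count for a given encoding without creating a Buffer first.

```javascript
// Assumed value; the real constant's definition is not shown in this post.
const CONTENT_TYPE_PLAIN_TEXT = 'text/plain';

// Hypothetical helper: builds the putObject parameters with a ContentLength
// measured in UTF-8 bytes rather than in UTF-16 code units.
function buildPutObjectParams(bucket, fullKey, contents) {
  const body = contents || '';
  return {
    Body: body,
    Bucket: bucket,
    Key: fullKey,
    ContentLength: Buffer.byteLength(body, 'utf8'),
    ContentType: CONTENT_TYPE_PLAIN_TEXT,
  };
}

// 'f', 'r' and ' ' are 1 byte each, 'ü' is 2 bytes and '😀' is 4 bytes.
const params = buildPutObjectParams('my-bucket', 'notes/hello.txt', 'f\u00FCr \u{1F600}');
console.log(params.ContentLength); // 9, while the string's length is 6
```

The resulting object could then be passed to s3Client.putObject as before. Removing the property and letting the SDK work out the size, as described above, remains the simpler fix.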