It’s not really about immigration, is it?

July 15, 2018

Last week in London, a taxi cab driver struck up a conversation with me while taking me to Victoria station. “Ah, you’re American are you?” he asked. “Trump’s gonna be in town in a few days, you know,” he added.

Oh yes. I knew. I had been amused earlier that morning when reading that the Mayor of London had approved the Baby Trump Blimp.

“I hear you’ve got quite the welcome planned for him,” I replied.

“Oh yes. He’s doing a great job, you know. Speaks his mind. Doesn’t take the piss from anybody. And, he’s actually doing something about the immigration problem over there.”

Oh, my. It was early morning, and I wasn’t really in the mood for this kind of conversation. So, I tried to mumble my way incoherently to something more neutral.

That got me a minute or so.

“Where in the States are you from?” he asked. Well, at least there’s a way back to a neutral conversation, I thought.

“I actually live in Berlin now.”

“Oh, really!” he exclaimed. “Hell of a lot of immigrants there now. I hear they’re just flooding in, aren’t they? How’s it like living there?”

Shit. Ok, let’s do this.

“Well, actually, it seems better than what you might be hearing. I know some of the locals are really unhappy about it. From my point of view, however, the only impact on my day-to-day life has been to make for long queues at the local government offices. And, really, given the reality of what they’ve come from, I can handle that.”

“Really?” he asked with surprise. “Still, I can’t imagine living with so many immigrants all around.”

Ah, for fuck’s sake.

“You know, since I’m American, I guess that makes me an immigrant as well,” I said. “My wife too. She’s from Greece.”

Silence.

Fifteen minutes later or so, we arrived at the train station. I paid. The cabbie drove off. And, I’ve been thinking about that conversation a lot in the days since and about how being white means I’m not one of those immigrants. Which means it’s not really about immigration now, is it?

Working with Azure blob md5 digests

June 21, 2018

Azure Blob Storage provides MD5 digests of files in base64 encoding. This causes a small issue because many command line tools — as well as AWS S3 — use hex encoding for their digests. You’ll need to do some conversions if you want to compare files using the two different digests.

What are MD5 digests and why would we care about comparing them? In a nutshell, they provide a short constant-sized checksum for a file. No matter how big a file is, you can get a summary that represents the state of the file using a standard function. Take a digest from a 4MB file (or a 4TB one!) on your local computer and one generated on a server of a file you want to compare it to, and you only need to transfer 16 bytes (32 characters when hex encoded) to tell if the contents are the same or not.

Well, that’s not strictly true. There are collisions in the MD5 algorithm that make it unacceptable for cryptographically secure purposes. When you know that you’re comparing what should be the same file on both sides of a network connection, however, it’s perfect for the job. Comparing digests instead of transferring entire files makes a directory sync process much faster.
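To make the constant-size property concrete, here is a minimal Ruby sketch. The two inputs and their sizes are made up purely for illustration; only the standard library's Digest::MD5 is used.

```ruby
require 'digest'

# Two inputs of very different sizes (chosen only for illustration).
small = "a"
large = "a" * 4_000_000

small_hex = Digest::MD5.hexdigest(small)
large_hex = Digest::MD5.hexdigest(large)

# Hex digests are always 32 characters, which is 16 raw bytes,
# no matter how large the input was.
puts small_hex.length
puts large_hex.length
puts Digest::MD5.digest(large).bytesize

# Different content gives a different digest, which is what makes
# the comparison useful.
puts small_hex == large_hex
```

Both lengths come out as 32 and the raw digest as 16 bytes, so comparing two files only ever means comparing two short strings.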

Ok. Enough about that. Let’s look at how to compare digests in the two different encodings, which is the whole purpose of this post.

Using shell

Running the command line md5 tool on macOS (Linux has the similar md5sum) against my website’s 404 page on my local filesystem gives:

$ md5 -q public/404.html 
"3b75d4041511720a85f535897008d14b"

If I look at that same file’s checksum as returned by Azure (I’m using the az and jq command line tools to fetch the properties of the blob and select out the right field from the returned JSON), I get:

$ az storage blob show -c \$web -n 404.html | 
    jq -r .properties.contentSettings.contentMd5
"O3XUBBURcgqF9TWJcAjRSw=="

There’s the difference between the hex and base64 encodings. Comparing those two digests won’t give us the results we want. Enter xxd, the command line hexdump tool, and base64, the command line base64 encoder:

$ md5 -q public/404.html | xxd -r -p | base64
"O3XUBBURcgqF9TWJcAjRSw=="

Perfect. Now we can compare digests. And, as you’d expect, we can run everything in reverse to go the other way (on Linux, the decode flag for base64 is -d rather than -D):

$ echo "O3XUBBURcgqF9TWJcAjRSw==" | base64 -D | xxd -p
"3b75d4041511720a85f535897008d14b"

Put all of the above together, and we can get the hex-encoded hash direct from Azure with:

$ az storage blob show -c \$web -n 404.html |  
    jq -r .properties.contentSettings.contentMd5 | 
    base64 -D | 
    xxd -p
"3b75d4041511720a85f535897008d14b"

Now, we’re cooking. If you’re a hardcore shell user, like my friend Nathan Herald (who taught me all about the magical command-line JSON processor that is jq), you can wrap these pipelines up in some bash functions and you’re set.

Using ruby

While I like proving things out using shell, the reason I went down this rabbit hole in the first place was to sync my website content with Azure Blob Storage. I use Rake to build my site, so let’s look at this in Ruby.

Grabbing the hex encoded digest of a file in Ruby is straightforward:

require 'digest'

path = "public/404.html"
hexdigest = Digest::MD5.hexdigest(File.read(path))
# hexdigest is "3b75d4041511720a85f535897008d14b"

Getting the base64 encoded version is just as easy:

base64digest = Digest::MD5.base64digest(File.read(path))
# base64digest is "O3XUBBURcgqF9TWJcAjRSw=="
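Both calls above read the whole file into memory first. For very large files, a sketch like the following may be gentler, since Digest::MD5.file reads the file in chunks. The helper name streamed_digests and the temporary file are only there for illustration.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: returns [hex, base64] digests for the file at `path`,
# reading it in chunks rather than loading it all into memory.
def streamed_digests(path)
  digest = Digest::MD5.file(path)
  [digest.hexdigest, digest.base64digest]
end

# A small temporary file stands in for a much larger one.
Tempfile.create("digest-demo") do |file|
  file.write("hello world")
  file.flush
  hex, base64 = streamed_digests(file.path)
  puts hex
  puts base64
end
```

The digests are identical to the ones produced from File.read, so the two approaches can be mixed freely.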

If you need to go between the two, however, because you’re comparing a digest you have from AWS S3 with one from Azure Blob Storage, it’s a bit more work. This requires getting comfortable with Ruby’s Array#pack and String#unpack methods. To go from hex to base64 encoding, you can do:

base64digest = [[hexdigest].pack("H*")].pack("m0")
# base64digest is "O3XUBBURcgqF9TWJcAjRSw=="

And, to do the reverse from base64 to hex:

base64digest.unpack("m")[0].unpack("H*")[0]
# returns "3b75d4041511720a85f535897008d14b"

These incantations aren’t exactly pretty and don’t really expose their intent well, but once figured out they work nicely and can be hidden away behind a function somewhere.
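As a rough idea of what hiding them away could look like, here is a small sketch. The function names hex_to_base64 and base64_to_hex are my own invention for this example. The pack directives are the same ones shown above, and unpack1 (Ruby 2.4 and later) stands in for unpack(...)[0].

```ruby
# Hypothetical helper: convert a hex-encoded digest to base64.
def hex_to_base64(hexdigest)
  # "H*" packs the hex string into raw bytes; "m0" base64 encodes
  # those bytes without adding line breaks.
  [[hexdigest].pack("H*")].pack("m0")
end

# Hypothetical helper: convert a base64-encoded digest to hex.
def base64_to_hex(base64digest)
  # "m" decodes base64 into raw bytes; "H*" renders them as hex.
  base64digest.unpack1("m").unpack1("H*")
end

puts hex_to_base64("3b75d4041511720a85f535897008d14b")
# prints O3XUBBURcgqF9TWJcAjRSw==
puts base64_to_hex("O3XUBBURcgqF9TWJcAjRSw==")
# prints 3b75d4041511720a85f535897008d14b
```

With these in place, the calling code can say what it means instead of repeating the pack and unpack directives.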

Putting things together

I mentioned that I went down this rabbit hole to upload my website to Azure Blob Storage. Here was the next step of proving out how to do that in Ruby, and an example of why using MD5 digests is useful:

require 'digest'
require 'azure/storage/blob'

path = "public/404.html"
data = File.read(path)
# Azure reports digests in base64, so compute the local one the same way
local_digest = Digest::MD5.base64digest(data)

client = Azure::Storage::Blob::BlobService.create
blob = client.get_blob_properties("$web", "404.html")
remote_digest = blob.properties[:content_md5]

# Only upload when the content has actually changed
if local_digest != remote_digest
  client.create_block_blob("$web", "404.html", data)
end

After this, the next step was to get all the hashes for both the remote and local files, compare them, and upload the ones that changed. However, that’s an exercise I’ll leave to the intrepid reader.
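For anyone who wants a head start, the comparison step might be sketched roughly like this. It is not the code I ended up with. The helper names local_digests and paths_to_upload are hypothetical, and the remote hash is assumed to have been built separately from each blob's content MD5. No Azure calls are made here.

```ruby
require 'digest'

# Hypothetical helper: map every file under `root` to its base64 MD5
# digest, keyed by path relative to `root`.
def local_digests(root)
  Dir.glob("**/*", base: root).each_with_object({}) do |rel, digests|
    full = File.join(root, rel)
    next unless File.file?(full)
    digests[rel] = Digest::MD5.file(full).base64digest
  end
end

# Hypothetical helper: list the paths that are missing remotely or whose
# digest differs. `remote` is assumed to map blob names to base64 digests.
def paths_to_upload(local, remote)
  local.reject { |path, digest| remote[path] == digest }.keys.sort
end

# Made-up digests for illustration only.
local  = { "404.html" => "aaa=", "index.html" => "bbb=", "new.html" => "ccc=" }
remote = { "404.html" => "aaa=", "index.html" => "zzz=" }
puts paths_to_upload(local, remote).inspect
# prints ["index.html", "new.html"]
```

Each path returned would then be handed to something like the create_block_blob call above. Deleting blobs that no longer exist locally is left out of this sketch.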

A new gig

June 20, 2018

After four years of working on to-dos, tasks, and lists of various sorts, I’m changing jobs. I’m joining the Microsoft ScaleUp program as a CTO in Residence. Even better, I’ll be working with Chad Fowler again.

What does “CTO in Residence” mean? Good question. The most defined part of my job will be to act as a “CTO on loan” for startups that are in Microsoft’s ScaleUp program in Europe and around the world. I’ll have peers in London, Beijing, Tel Aviv, Shanghai, Bangalore, Sydney, and Seattle, each of whom works with startups in their general area. Not only will this role let me reconnect with the world of startups in a big way, but I’ll also be a lot closer to the parts of the company that have been embracing Open Source to implement Microsoft’s “Intelligent Cloud + Intelligent Edge” vision.

This transition also marks three years at Microsoft for me. Given history, that’s something that’s still really strange for me to contemplate. But it’s working well so far, and the kinds of changes that make Microsoft plausibly a good home for GitHub are the same ones that keep me interested in sticking around. It’s weird, but exciting!