Since I abandoned Facebook, Twitter has become my primary source of tech news, and this week my feed is raging over two seemingly unrelated security/privacy incidents: Zoom’s zero-day and Superhuman’s email tracking scandal. I write “seemingly” because, despite these being two very different companies operating in two different markets (Zoom in video conference calls and Superhuman in email) and building very different products (Zoom is all about jump in, jump out - Superhuman is a workspace), these incidents stem from the same fundamental fault: the telephone experience.
Not so long ago, when people still called each other, all you had to do to talk to someone was click on the contact (or punch in the phone number), wait a few moments for the other side to answer and voilà, start talking. On the receiving end, you got a notification and connected with one click (or picked up the phone). Privacy expectations were pretty clear - either side could record the conversation, mute their side or disconnect, and each had a reasonable expectation that 3rd parties were not involved (other than trusted carriers). Each side knew that they were connected and could reasonably detect whether the other side was still engaged.
This model is so pervasive that almost every digital communication platform tries to mimic it - just pop open your video chat app and look at the controls: have they changed much conceptually? But the click-once-to-connect model, while easy from a local usability viewpoint, is fundamentally flawed at a broader level. Let’s analyze what happens on a public network where any two parties can connect to each other using this model, remembering how humans actually behave in the real world. One-click connection means minimal attention, which on a public system immediately clashes with an identity problem: how do we know who’s on the other side? Not surprisingly, the phone system is plagued with fraud and unsolicited calls. We’ve tried, and failed, to resolve these issues with caller ID and smartphone contact management - because such identity-mapping systems have obvious problems like name takeover, typosquatting, etc. (DNS, anyone?). But still, you have control over when and if a 2nd party is connected, right? Turns out that control isn’t so good - as pocket calls often remind us. What about trusted carriers and freedom from 3rd parties? That also turns out to be problematic, as this model doesn’t say anything about ownership - think corporate phones and switchboards. If a 3rd party pays for the communication, do you trust it? Do you trust the providers it chose to work with? What happens when multiple such parties communicate? Who owns the call: the participants or the client who paid?
Note that I haven’t written a word about the underlying technology - because these “bugs” are in the product experience model, not the code. Thus they will inevitably appear in some form on any system that tries to recreate that experience. What is happening today is a modern version of all the things we hated telephones for. And that was before attention and focus became the scarcest of resources.
It’s quite clear how video chats got here. But what does email tracking have to do with it? You see, the original “mail” experience never had this “tracking” feature. You sent a mail, and the only way you knew the recipient got your message was if they chose to respond; mail was strictly asynchronous. But as the one-click model prevailed, various companies tried to make email more phone-like - where you know the other side picked up. The email protocol doesn’t support this, of course, so they hacked it, exploiting email clients along the way - not unlike how Zoom hacked browser security policies to shave off user clicks. And what a surprise, it was abused… much the same way people used call-and-immediately-hang-up over phones to detect if you were home.
Make no mistake: the Zoom and Superhuman incidents are not code bugs; they are product bugs.