Artificial intelligence (AI) has achieved breakthroughs that directly affect documentary and investigative reporting, and any other video where participants need anonymity. Standard methods of cloaking identities through pixelation and audio adjustment are much less effective than they were even five years ago.
Lives may be at stake
AI, maybe you’ve heard of it. A recent application of AI is in an area called Image Super Resolution or ISR, which uses AI to scale up lossy or pixelated video to determine the most likely content that the video sensor caught. Hollywood uses a fake version all the time: the getaway car is caught on a grainy surveillance video, the hero barks “Enhance!” and, miraculously, the license number appears.
The reality is a bit different and scarier. A pixelated face and a disguised voice can be reverse engineered to determine an individual’s identity, leaving them open to reprisals.
A technology called convolutional neural networks is the tool of choice. They’re very good at sorting through possibilities quickly, and they never tire.
In the recent paper Amortised MAP Inference for Image Super-resolution, machine learning researchers Casper Kaae Sonderby, Jose Caballero, Lucas Theis, Wenzhe Shi and Ferenc Huszar, all of Twitter Cortex in London, describe a new approach to enhancing images.
Results are amazing. Starting with this image:
They get this:
Today the method works best on natural features, such as faces or trees. It can’t single one person out of the world’s billions. But if a repressive regime knows who the top 500 activists are, chances are good that an AI/ISR bot will identify them, despite pixelation.
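To see why a short list of suspects matters so much, here is a toy sketch in Python. It is not the method from the paper, which uses a trained neural network. It assumes the attacker already holds a photo of each of 500 candidates, uses random grids of numbers as stand-ins for real face photos, and simply pixelates every candidate the same way and picks the closest match. The names pixelate, identify and random_face, and the sizes and noise level, are all made up for illustration.

```python
import random

def pixelate(img, block):
    """Replace each block x block tile of a 2D grayscale image with its mean."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            mean = sum(img[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out

def distance(a, b):
    """Sum of squared pixel differences between two equally sized images."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def identify(pixelated_frame, gallery, block):
    """Pixelate every known candidate and return the name of the closest one."""
    return min(
        gallery,
        key=lambda name: distance(pixelate(gallery[name], block), pixelated_frame),
    )

def random_face(rng, size=16):
    """Hypothetical stand-in for a face photo: a size x size grid of values in [0, 1)."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]

rng = random.Random(0)

# Assumption: the attacker has one reference photo per known activist.
gallery = {f"activist_{i}": random_face(rng) for i in range(500)}

# The published frame: a slightly noisy view of one activist, then pixelated.
target = "activist_137"
noisy = [
    [min(1.0, max(0.0, v + rng.gauss(0, 0.02))) for v in row]
    for row in gallery[target]
]
observed = pixelate(noisy, 4)

guess = identify(observed, gallery, 4)
print(guess)
```

The pixelated frame keeps only 16 averaged values, far too few to pick a face out of billions, but plenty to tell 500 known candidates apart in this toy setup. Real footage is much messier than random grids, so treat this as an illustration of the closed-list problem, not a measure of how well any actual system performs.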
The research on ISR is vast and growing, and it wouldn’t surprise me if some of it were funded by security agencies here and abroad. There is similar work being done to undo audio cloaking as well.
What to do?
Clearly, pixelating faces no longer protects vulnerable people. I believe the most likely solution for video is to replace a person’s face with a generic face and then pixelate that. For voice, the words might have to be re-read by another person and then disguised, or perhaps converted to text and read by a computer voice.
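A minimal Python sketch of the replace-then-pixelate idea, under the same toy assumptions: images are small grids of numbers, the face location is already known, and the helper names paste and anonymize are hypothetical. The point it illustrates is that once the generic face is swapped in, the published pixels in that region no longer depend on whose face was there.

```python
import random

def pixelate(img, block):
    """Replace each block x block tile of a 2D grayscale image with its mean."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            mean = sum(img[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out

def paste(frame, patch, top, left):
    """Return a copy of frame with patch written at position (top, left)."""
    out = [row[:] for row in frame]
    for r, patch_row in enumerate(patch):
        out[top + r][left:left + len(patch_row)] = patch_row
    return out

def anonymize(frame, top, left, generic_face, block):
    """Swap in the generic face first, then pixelate only that region."""
    size = len(generic_face)
    swapped = paste(frame, generic_face, top, left)
    region = [row[left:left + size] for row in swapped[top:top + size]]
    return paste(swapped, pixelate(region, block), top, left)

rng = random.Random(1)

def noise(size):
    """Hypothetical stand-in for image content: random values in [0, 1)."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]

background = noise(12)
generic_face = noise(8)

# Two frames with the same background but two different people's faces.
frame_a = paste(background, noise(8), 2, 2)
frame_b = paste(background, noise(8), 2, 2)

out_a = anonymize(frame_a, 2, 2, generic_face, 4)
out_b = anonymize(frame_b, 2, 2, generic_face, 4)

# The two published frames are identical: nothing of either face survives.
print(out_a == out_b)
```

This only covers the face region. Hair, build, clothing and surroundings can still give someone away, so it is a sketch of the principle and not a complete anonymization recipe.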
But I’ll leave the solutions engineering to others.
The StorageMojo take
Most of the angst around AI is focused on human-level or greater intelligence (HAL 9000, Skynet) rather than highly specialized AI. But the latter is easier to develop and can deliver astounding results in limited domains.
Today about half of all Americans’ faces are in a police database. Videos are going to have to smudge or replace faces before pixelation to preserve anonymity.
At a high level this is simply using AI to add value to the imaging systems we already use. The payback scales with our cameras and storage. Expect many more analogous systems.
It’s the applications of these systems, and their unforeseen consequences, that we’ll want to have a say in. For example, what if the enhanced image looks like you – and you weren’t there?
Courteous comments welcome, of course.