The TLDR
Metadata is the data about your data. It’s not what you said in the message – it’s who you sent it to, when, from where, how often, and for how long. It’s not the content of your photo – it’s the GPS coordinates embedded in the file, the camera model, and the exact timestamp. Intelligence agencies, law enforcement, advertisers, and stalkers all know that metadata is often more valuable than content. The NSA’s former General Counsel, Stewart Baker, said it plainly: “metadata absolutely tells you everything about somebody’s life.” Former NSA Director Michael Hayden went further: “we kill people based on metadata.”
The Reality
People obsess over end-to-end encryption – making sure the content of their messages can’t be read. And that matters. But the part that’s actually used to track, profile, and target people is almost always the metadata.
Consider this: even if the content of your 3 AM phone call to a suicide hotline were fully encrypted, the fact that you called a suicide hotline, at 3 AM, from your home, for 47 minutes would still be visible, because that’s metadata. Anyone with access to your call records knows what happened without hearing a single word. An ACLU analysis demonstrated that metadata analysis of phone records could reveal a person’s medical conditions, religious practices, and political affiliations with startling accuracy.
The NSA’s bulk metadata collection program, revealed by Edward Snowden in 2013 and documented extensively in Glenn Greenwald’s No Place to Hide, collected phone metadata in bulk from major US carriers, covering the calls of millions of Americans: every number dialed, every call received, the duration, and the timestamp. Not the content. Just the metadata. That was considered sufficient for intelligence purposes.
When people say “I have nothing to hide,” they’re almost always thinking about content. They’re not thinking about the patterns.
“We Kill People Based on Metadata”
This isn’t hyperbole. The NSA’s role in US drone targeting relied on metadata – phone call patterns, location data, device co-location – to identify targets. Former NSA Director Michael Hayden confirmed this publicly. The metadata doesn’t tell you what someone said. It tells you where they sleep, who they associate with, what their daily patterns are, and when those patterns change. That’s enough.
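To make “where they sleep” concrete, here is a minimal Python sketch. It assumes a hypothetical list of (timestamp, cell tower ID) records, a simplified stand-in for carrier call records, and it defines night as 10 PM to 6 AM. The tower IDs and the function name are invented for illustration. Real analysts use far richer methods, but even a naive count like this separates a nighttime tower from a daytime one.

```python
from collections import Counter
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical call-record entries: (timestamp, cell tower ID).
# Real carrier records hold many more fields; these two are enough here.
Record = Tuple[datetime, str]


def likely_home_tower(records: List[Record], night_start: int = 22, night_end: int = 6) -> Optional[str]:
    """Return the tower that handles the most overnight activity, or None.

    'Overnight' means from night_start:00 until night_end:00 local time (an assumption).
    """
    night = Counter(
        tower
        for ts, tower in records
        if ts.hour >= night_start or ts.hour < night_end
    )
    if not night:
        return None
    return night.most_common(1)[0][0]


# Toy data: one tower dominates late-night activity, another dominates working hours.
records = [
    (datetime(2024, 3, 1, 23, 15), "TOWER-A"),
    (datetime(2024, 3, 2, 1, 40), "TOWER-A"),
    (datetime(2024, 3, 2, 9, 5), "TOWER-B"),
    (datetime(2024, 3, 2, 12, 30), "TOWER-B"),
    (datetime(2024, 3, 2, 14, 0), "TOWER-B"),
    (datetime(2024, 3, 2, 23, 50), "TOWER-A"),
]
print(likely_home_tower(records))  # TOWER-A
```

In this toy data, TOWER-B dominates working hours. That would make it the obvious first guess for a workplace.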
How It Works
Communication Metadata
Every communication system generates metadata, even when the content is encrypted. Here’s what various platforms can see:
Phone calls: Caller number, recipient number, call duration, timestamp, cell tower IDs (which give approximate location), IMEI of the device.
Text messages (SMS): Same as calls, plus message size. Your carrier logs all of this, and some carriers have kept these records for years. US carriers reported responding to about 1.3 million law enforcement demands for subscriber data in 2011 alone.
Email: Sender, recipient, CC/BCC, subject line, timestamp, IP address of sender, email client, attachment names and sizes. Email headers contain a wealth of routing information. Even with encrypted email (PGP/GPG), the headers, including To/From and usually the subject line, remain in plaintext.
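Here is a minimal sketch of that point, using Python’s standard email module. The header fields parse out as ordinary text even though the body is an encrypted blob. The sample message, the addresses, and the VISIBLE_FIELDS list are hypothetical, and the list is illustrative rather than exhaustive.

```python
from email import message_from_string
from typing import Dict, List

# Header fields that anyone holding the message can read even when the body
# is PGP-encrypted. Illustrative, not exhaustive.
VISIBLE_FIELDS = ("From", "To", "Cc", "Subject", "Date", "Received", "User-Agent")


def header_metadata(raw_message: str) -> Dict[str, List[str]]:
    """Collect the plaintext header fields of a raw RFC 5322 message."""
    msg = message_from_string(raw_message)
    found: Dict[str, List[str]] = {}
    for field in VISIBLE_FIELDS:
        values = msg.get_all(field)  # every occurrence; Received appears once per hop
        if values:
            found[field] = [str(v) for v in values]
    return found


# Hypothetical message; the body stands in for PGP ciphertext.
SAMPLE = "\n".join([
    "From: Alice <alice@example.org>",
    "To: Bob <bob@example.net>",
    "Subject: Lab results",
    "Date: Tue, 05 Mar 2024 03:12:44 +0000",
    "Received: from [203.0.113.7] by mail.example.org; Tue, 05 Mar 2024 03:12:45 +0000",
    "User-Agent: ExampleMail/1.0",
    "",
    "-----BEGIN PGP MESSAGE-----",
    "(ciphertext omitted)",
    "-----END PGP MESSAGE-----",
])

for field, values in header_metadata(SAMPLE).items():
    print(field, values)
```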
WhatsApp: Meta’s WhatsApp uses Signal Protocol for end-to-end encryption of message content. But Meta retains metadata: who you message, when, how often, your IP address, phone number, device identifiers, and your contacts list. Meta’s privacy policy confirms that WhatsApp shares metadata with other Meta companies for advertising purposes. The content is encrypted. The pattern of your life is not.
Signal: Designed specifically to minimize metadata. Signal doesn’t store your contact list on its servers, introduced sealed sender to hide who sent a message from Signal’s own servers, and, according to its published responses to subpoenas, can hand over little more than the date an account was created and the date it last connected. The contrast with WhatsApp is instructive – both use Signal Protocol for content encryption, but their metadata practices are opposite.
Telegram: “Secret chats” are end-to-end encrypted, but regular chats are not. Telegram’s cloud-based architecture means they hold message content for non-secret chats plus the full metadata graph of who talks to whom, when, and in which groups.
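A “metadata graph” needs no message content at all. This Python sketch assumes a hypothetical server log of (sender, recipient, timestamp) entries, which is not any platform’s real format. It ranks someone’s closest contacts purely by message counts.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical server-side log entries: (sender, recipient, unix timestamp).
# No message content appears anywhere in this analysis.
LogEntry = Tuple[str, str, int]


def contact_graph(log: Iterable[LogEntry]) -> Counter:
    """Count messages exchanged per unordered pair of participants."""
    edges: Counter = Counter()
    for sender, recipient, _ts in log:
        if sender != recipient:  # ignore notes-to-self
            edges[frozenset((sender, recipient))] += 1
    return edges


def closest_contacts(log: Iterable[LogEntry], person: str, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n people that `person` exchanges the most messages with."""
    scores: Counter = Counter()
    for pair, count in contact_graph(log).items():
        if person in pair:
            (other,) = pair - {person}
            scores[other] += count
    return scores.most_common(n)


log = [
    ("ana", "ben", 1709600000),
    ("ben", "ana", 1709600300),
    ("ana", "cara", 1709603600),
    ("ana", "ben", 1709690000),
    ("dev", "ana", 1709700000),
    ("ben", "cara", 1709710000),
]
print(closest_contacts(log, "ana"))  # ('ben', 3) ranks first
```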
File Metadata
EXIF in Photos
Nearly every photo taken with a smartphone embeds Exchangeable Image File Format (EXIF) data. This typically includes:
- GPS coordinates – latitude and longitude, often precise to within a few meters (whenever the camera app has location access)
- Timestamp – exact date and time the photo was taken
- Camera/phone model – make, model, lens information
- Orientation – which way the camera was held
- Thumbnail – a small preview image (which can survive even if you crop the main image)
- Software – what app took or edited the photo
The GPS data alone is enough to identify where you live (photos taken at home), where you work, what restaurants you visit, and who you spend time with (photos taken at the same coordinates as another person’s photos). ExifTool, the standard utility for reading and removing EXIF data, can extract over 20,000 different metadata tags from image files.
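EXIF stores GPS positions as degrees, minutes, and seconds plus an N/S/E/W reference. The sketch below converts those values to the decimal coordinates a map expects. The read_gps helper assumes the third-party Pillow library. Both function names are illustrative and are not part of any tool mentioned here.

```python
from typing import Optional, Sequence, Tuple


def dms_to_decimal(dms: Sequence[float], ref: str) -> float:
    """Convert EXIF degrees/minutes/seconds plus an N/S/E/W reference to decimal degrees."""
    degrees, minutes, seconds = (float(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value


def read_gps(path: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a photo's EXIF, or None if absent.

    Assumes Pillow is installed (pip install Pillow).
    """
    from PIL import Image  # imported here so dms_to_decimal works without Pillow

    with Image.open(path) as img:
        gps = img.getexif().get_ifd(0x8825)  # 0x8825 is the GPSInfo IFD tag
    # GPS tag IDs: 1 = LatitudeRef, 2 = Latitude, 3 = LongitudeRef, 4 = Longitude
    if not all(tag in gps for tag in (1, 2, 3, 4)):
        return None
    return dms_to_decimal(gps[2], gps[1]), dms_to_decimal(gps[4], gps[3])


print(dms_to_decimal((40, 26, 46), "N"))  # about 40.4461
print(dms_to_decimal((79, 58, 56), "W"))  # about -79.9822
# print(read_gps("photo.jpg"))  # needs Pillow and a real photo (hypothetical file name)
```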
Document Metadata
Microsoft Office documents, PDFs, and other file formats embed metadata that most people never think about:
- Author name – usually pulled from your OS account or Office license
- Organization – your company name from Office settings
- Revision history – how many times the document was edited
- Total editing time – cumulative time spent editing
- Last saved by – the name of the last person who saved the file
- Tracked changes – even “accepted” tracked changes can sometimes be recovered
- Comments – deleted comments may persist in the file structure
- Embedded objects – files pasted into documents can contain their own metadata chains
PDF metadata can include the software used to create the document, the original filename, and the author’s system username. Lawyers have been burned by this repeatedly – sending documents to opposing counsel with metadata that reveals editing history, internal comments, or the identity of contributors who were supposed to remain anonymous.
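Office’s .docx, .xlsx, and .pptx files are ZIP archives. Several of the fields listed above sit in plain XML inside docProps/core.xml and docProps/app.xml. Here is a minimal Python sketch, using only the standard library, that reads a few of them. The office_metadata name and the output labels are made up for illustration.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict

# XML namespaces used by Office Open XML document properties.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# (zip member, element path, label) for several of the fields listed above.
FIELDS = [
    ("docProps/core.xml", "dc:creator", "author"),
    ("docProps/core.xml", "cp:lastModifiedBy", "last_saved_by"),
    ("docProps/core.xml", "cp:revision", "revision"),
    ("docProps/app.xml", "ep:Company", "organization"),
    ("docProps/app.xml", "ep:TotalTime", "editing_minutes"),
]


def office_metadata(path_or_file) -> Dict[str, str]:
    """Read author, organization, revision and editing-time fields from an Office file."""
    found: Dict[str, str] = {}
    with zipfile.ZipFile(path_or_file) as zf:
        names = set(zf.namelist())
        parsed: Dict[str, ET.Element] = {}
        for member, xpath, label in FIELDS:
            if member not in names:
                continue
            if member not in parsed:  # parse each XML part only once
                parsed[member] = ET.fromstring(zf.read(member))
            node = parsed[member].find(xpath, NS)
            if node is not None and node.text:
                found[label] = node.text
    return found


# Usage (hypothetical file name):
# print(office_metadata("contract_draft.docx"))
```

TotalTime is recorded in minutes. Tracked changes and comments live elsewhere in the archive (for example, word/document.xml and word/comments.xml in a .docx), and this sketch does not inspect them.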
Audio and Video Metadata
Audio files contain embedded tags (ID3 in MP3, Vorbis comments in FLAC, INFO chunks in WAV): recording date, software, sometimes GPS. Video files contain even more: camera model, recording settings, GPS location (and, on some cameras, a GPS track recorded throughout the clip), audio channel information, and encoding details. A video uploaded to social media may be stripped of some metadata by the platform, but the platform itself receives, and may retain, the original file.
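The simplest audio tag format is ID3v1, a fixed 128-byte block at the very end of an MP3 file that starts with the bytes TAG. This Python sketch parses it. It assumes ID3v1 only. Modern files usually also carry an ID3v2 tag at the start of the file with many more fields (including full recording dates), and this sketch does not handle those.

```python
from typing import Dict, Optional


def read_id3v1(data: bytes) -> Optional[Dict[str, str]]:
    """Parse the ID3v1 tag from the raw bytes of an MP3 file, or return None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(start: int, length: int) -> str:
        # Fixed-width fields padded with NUL bytes or spaces; Latin-1 is customary.
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").strip()

    # Byte layout: "TAG"(3) title(30) artist(30) album(30) year(4) comment(30) genre(1)
    return {
        "title": text(3, 30),
        "artist": text(33, 30),
        "album": text(63, 30),
        "year": text(93, 4),
        "comment": text(97, 30),
    }


# Usage (hypothetical file name):
# with open("recording.mp3", "rb") as f:
#     print(read_id3v1(f.read()))
```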
Behavioral Metadata
Fitness App Profiles
Your fitness tracker generates a continuous stream of behavioral metadata: sleep patterns, heart rate variability throughout the day, step counts, workout times, GPS routes of your runs, and recovery metrics. Individually, these seem innocuous. Combined, they paint an extraordinarily detailed portrait.
Your Oura ring or fitness app knows when you sleep, when you wake up, your resting heart rate (which can indicate illness, stress, or pregnancy), your activity level, and your daily routine. Changes in these patterns can reveal life events – a breakup (disrupted sleep, changed routines), a new job (different wake times, different location patterns), an illness (elevated resting heart rate, reduced activity), or a pregnancy (specific HRV and temperature patterns).
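Detecting a change in routine can be as simple as comparing each day to a recent baseline. This Python sketch flags days when resting heart rate jumps well above the level of the previous four weeks. The 28-day window and the 2-standard-deviation threshold are arbitrary assumptions, not any device maker’s algorithm.

```python
from statistics import mean, stdev
from typing import List, Sequence


def flag_deviations(resting_hr: Sequence[float], baseline_days: int = 28, threshold: float = 2.0) -> List[int]:
    """Return the indexes of days whose resting heart rate is more than `threshold`
    sample standard deviations above the mean of the preceding `baseline_days` days."""
    flagged: List[int] = []
    for day in range(baseline_days, len(resting_hr)):
        window = resting_hr[day - baseline_days:day]
        mu, sigma = mean(window), stdev(window)
        # Skip perfectly flat baselines, where a z-score is undefined.
        if sigma > 0 and (resting_hr[day] - mu) / sigma > threshold:
            flagged.append(day)
    return flagged


# Four weeks hovering around 59 bpm, then a sudden jump on day 28.
hr = [58, 60] * 14 + [66, 59, 59]
print(flag_deviations(hr))  # [28]
```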
Smart Home Device Logs
Alexa logs every voice command, including false activations. Smart thermostat data reveals when you’re home and when you’re away. Smart lock logs show who enters your home and when. Robot vacuum maps reveal your floor plan. Individually, these are “convenience” features. Collectively, they’re a surveillance system you paid for and installed yourself.
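This Python sketch shows how little analysis it takes to turn a smart lock log into the hours when a home is likely empty. The event names are hypothetical, since real lock vendors use their own formats. The sketch also assumes a single occupant who locks the door from outside when leaving.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical smart lock log entries: (timestamp, event name).
Event = Tuple[datetime, str]


def away_windows(log: List[Event], min_away: timedelta = timedelta(hours=1)) -> List[Tuple[datetime, datetime]]:
    """Pair each "locked_outside" event with the next "unlocked_outside" event.

    Returns (left, returned) windows that last at least `min_away`.
    """
    windows: List[Tuple[datetime, datetime]] = []
    left_at: Optional[datetime] = None
    for ts, event in sorted(log):
        if event == "locked_outside":
            left_at = ts
        elif event == "unlocked_outside" and left_at is not None:
            if ts - left_at >= min_away:
                windows.append((left_at, ts))
            left_at = None
    return windows


log = [
    (datetime(2024, 5, 6, 8, 5), "locked_outside"),
    (datetime(2024, 5, 6, 17, 40), "unlocked_outside"),
    (datetime(2024, 5, 6, 19, 0), "locked_outside"),
    (datetime(2024, 5, 6, 19, 20), "unlocked_outside"),  # 20-minute errand, below min_away
]
for left, back in away_windows(log):
    print(f"away {left:%H:%M} to {back:%H:%M}")  # away 08:05 to 17:40
```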
How It Gets Exploited
John McAfee and the Photo That Gave Him Away
In 2012, Vice magazine published an “exclusive” interview with John McAfee while he was on the run from Belizean authorities. Vice posted a photo of McAfee taken with an iPhone, and the EXIF data embedded in it included GPS coordinates that placed him in Guatemala. He was arrested in Guatemala days later. One photo. One metadata field. Fugitive found.
Journalists Burned by Document Metadata
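In 2004, CBS News and Dan Rather aired a story about George W. Bush’s National Guard service based on memos whose authenticity was quickly questioned. The memos were photocopies, not digital files, so there was no embedded file metadata to inspect. Instead, the document’s own characteristics gave it away: its typography (proportional spacing and a superscripted “th”, among other details) closely matched the default output of Microsoft Word rather than a 1970s-era typewriter. CBS could not authenticate the memos, and the episode became one of the most high-profile journalism scandals in modern media.
More recently, whistleblower Reality Winner was identified in 2017 after a copy of the NSA document she leaked reached investigators. Creases on the pages suggested it had been printed, audit records showed that only six people had printed it, and she was one of them. The scan that was later published also carried printer tracking dots – microscopic yellow dots that many color laser printers embed on every page, encoding the printer’s serial number and the date and time of printing – so that information was readable by anyone who knew where to look.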
Strava Military Base Exposure
In November 2017, the fitness tracking app Strava published a global heatmap showing the aggregate GPS tracks of its members. In January 2018, a researcher pointed out that the map revealed the locations, layouts, and patrol routes of military bases in Afghanistan, Iraq, and Syria, some of them secret. Soldiers using Strava during their workouts had unknowingly mapped their bases in precise detail. The data was behavioral metadata – GPS coordinates over time – and it was published openly.
Instagram and Location Data
Instagram strips EXIF data from uploaded photos (since 2012), but the platform still receives the original file, and Meta’s privacy policy says it collects metadata about the content you upload, such as where a photo was taken. And if you geotagged the post through Instagram’s interface, that location is public. Stalkers have used Instagram geotags, combined with visual details in photos (identifiable landmarks, reflections, shadows indicating time of day), to locate people. The metadata you voluntarily add is as dangerous as the metadata you forget to remove.
What You Can Do
Strip EXIF Before Sharing
Make it a habit: before you share any photo outside of a trusted, private channel, strip the EXIF data. Tools:
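- ExifTool – Command line, works on most common file formats:
    exiftool -all= photo.jpg
  By default ExifTool keeps the untouched file as photo.jpg_original; add -overwrite_original to skip the backup.
- iOS: The Photos app now lets you strip location before sharing (tap Share > Options > toggle off Location)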
- Android: Google Photos strips some EXIF when sharing, but not all. Use a dedicated app or ExifTool
- Desktop: On Windows, right-click > Properties > Details > “Remove Properties and Personal Information”
See our EXIF removal guide for the full workflow.
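ExifTool is the practical choice. For the curious, here is a minimal Python sketch of what stripping EXIF means for a JPEG at the byte level. A JPEG is a sequence of marker segments, and EXIF (along with XMP and the embedded thumbnail) lives in APP1 segments, so dropping those segments removes it. The sketch assumes a well-formed JPEG. It deliberately leaves other metadata segments, such as APP13 (IPTC) and COM comments, in place, which is one reason a dedicated tool is safer. The function name is made up for illustration.

```python
def strip_jpeg_app1(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG with all APP1 segments (EXIF, XMP) removed.

    Minimal sketch: assumes a well-formed JPEG and keeps every other segment,
    including other metadata such as APP13 (IPTC) and COM comments.
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    out = bytearray(jpeg[:2])
    i = 2
    while i + 1 < len(jpeg):
        if jpeg[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = jpeg[i + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            i += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan or end of image: copy the rest verbatim
            out += jpeg[i:]
            break
        # Segment length is big-endian and counts its own two bytes, not the marker.
        length = int.from_bytes(jpeg[i + 2:i + 4], "big")
        if marker != 0xE1:  # keep everything except APP1
            out += jpeg[i:i + 2 + length]
        i += 2 + length
    return bytes(out)


# Usage (hypothetical file names):
# with open("photo.jpg", "rb") as f:
#     cleaned = strip_jpeg_app1(f.read())
# with open("photo_clean.jpg", "wb") as f:
#     f.write(cleaned)
```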
Choose Messaging Apps for Metadata Minimization
If content encryption is a lock on your diary, metadata minimization means nobody keeps a log of when you open it and whom you write about. Signal is the gold standard here – sealed sender, no contact list stored on its servers, minimal server-side metadata. WhatsApp encrypts your messages but shares your metadata with Meta. That’s a meaningful difference.
If you must use WhatsApp (and hundreds of millions of people have no practical alternative), understand what you’re trading: message content is protected, but your communication graph – who you talk to and when – is not.
Sanitize Documents Before Sending
Before sending any document externally:
- Word/Office: File > Info > Check for Issues > Inspect Document. Remove all metadata categories.
- PDF: Use a PDF editor to remove metadata, or print to a new PDF (which strips most metadata but not all).
- Google Docs: Exporting to PDF from Google Docs includes your Google account name as the author. Be aware.
Minimize Behavioral Metadata
- Review what your fitness apps share and with whom. Disable public Strava profiles. Turn off GPS tracking for workouts where you don’t need it (indoor workouts, for instance).
- Audit smart home device logs and voice history. Delete Alexa voice history regularly. Consider whether you need always-on listening at all.
- Review app permissions, especially location. Background location access is a metadata firehose.
Sources & Further Reading
- Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State – The definitive account of the NSA metadata program
- ACLU: NSA Surveillance – Analysis of what metadata reveals about individuals
- ExifTool by Phil Harvey – The standard tool for reading and writing file metadata
- EFF: Printer Tracking Dots – Which printers embed tracking codes
- Signal: Sealed Sender – How Signal minimizes communication metadata
- Strava Global Heatmap Incident – How fitness metadata exposed military bases
- NIST SP 800-188: De-Identifying Government Datasets – Federal guidance on metadata and re-identification risks
- WhatsApp Privacy Policy – What metadata WhatsApp collects and shares with Meta