Monday, November 7, 2016

650,000: the number of emails Trump said could not be reviewed, but he's wrong

Donald Trump has once again proven his complete lack of fitness for the presidency - or much of anything else for that matter. Simply put, he does not know s#!t about modern information technology and thus continues to misinform his audience about those 650,000 emails the FBI found on Weiner’s computer.

He said “You can’t review 650,000 new emails in 8 days.” (Business Insider)

When I was wearing my computer science hat, I routinely crunched data sets larger than that on my office PC - overnight! Government agencies like the FBI have far more powerful and much faster computers than I could ever dream of.

His ignorant claim has been bruited about in the media. Here are some choice citations.

From the BBC: Even Edward Snowden chimed in, saying that old laptops could do it in a matter of hours.

A longer version from Wired.com:

“You can’t review 650,000 emails in eight days,” Trump said Sunday in a campaign speech in Michigan hours after Comey’s latest update to Congress came out. “You can’t do it, folks. Hillary Clinton is guilty.” Trump supporter General Michael Flynn did the math on Twitter:

Wired then shows a tweet from General Flynn, one of Trump’s advisors: “There R 691,200 seconds in 8 days … a n email/second? IMPOSSIBLE” This guy is a total fool when it comes to information technology.
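Flynn's arithmetic is easy to check. The Python below redoes it: 8 days is indeed 691,200 seconds, so reading all 650,000 messages by hand would leave roughly one second per email. All that really shows is that nobody would do this by hand, which is the point the rest of the Wired piece makes.

```python
# Redo Flynn's arithmetic: seconds in 8 days and the per-email time budget.
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400
DAYS = 8
EMAILS = 650_000

seconds = DAYS * SECONDS_PER_DAY    # 691,200, the figure in the tweet
rate_needed = EMAILS / seconds      # emails per second if each were read by a person
budget = seconds / EMAILS           # seconds available per email

print(f"{seconds:,} seconds in {DAYS} days")
print(f"{rate_needed:.2f} emails/second needed")
print(f"{budget:.2f} seconds per email")
```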

But fortunately for Comey’s eyesight—and for Clinton’s presidential campaign—Trump is wrong: the FBI can review hundreds of thousands of emails in a week, using automated search and filtering tools rather than Flynn’s absurd notion of Comey reading the documents manually. “This is not rocket science,” says Jonathan Zdziarski, a forensics expert who’s consulted for law enforcement and worked as a systems administrator. “Eight days is more than enough time to pull this off in a responsible way.”

One former FBI forensics expert even tells WIRED he’s personally assessed far larger collections of data, far faster. “You can triage a dataset like this in a much shorter amount of time,” says the former agent, who asked to remain anonymous to avoid any political backlash. “We’d routinely collect terabytes of data in a search. I’d know what was important before I left the guy’s house.”

In this case in particular, forensics experts say, investigators’ jobs might even be particularly easy: Because the new collection of emails under investigation were taken from the laptop of Anthony Weiner, the husband of Clinton Aide Huma Abedin, only a portion of those emails would be messages sent to or from Clinton or anyone else on the campaign rather than those sent to or from Weiner’s contacts. Simple filtering by “to:” or “from:” could cut out hundreds of thousands of messages.
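To give a feel for how simple that "to:"/"from:" filter is, here is a minimal sketch using only Python's standard `email` module. It assumes each message is available as a raw RFC 822 text string, and the account addresses are made up for illustration; this is not what the FBI actually ran.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical addresses of interest; the real list is not public.
ACCOUNTS_OF_INTEREST = {"hrc@example.com", "aide@example.com"}


def involves_accounts(raw_message: str, accounts: set) -> bool:
    """True if any From/To/Cc address in the message is in `accounts`."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    addresses = {addr.lower() for _, addr in getaddresses(headers)}
    return not addresses.isdisjoint(accounts)


def filter_messages(raw_messages, accounts):
    """Keep only messages sent to or from one of the given accounts."""
    return [m for m in raw_messages if involves_accounts(m, accounts)]


mails = [
    "From: Hillary <hrc@example.com>\nTo: someone@example.org\nSubject: hi\n\nbody",
    "From: friend@example.org\nTo: other@example.org\nSubject: lunch\n\nbody",
]
print(len(filter_messages(mails, ACCOUNTS_OF_INTEREST)))  # 1
```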

Next, the agents could filter out duplicate emails from those they’d already analyzed in their months-long investigation earlier this year. According to multiple media reports, the vast majority of emails the FBI examined over the last week were, in fact, duplicates. Those copies could be spotted by their message ID, points out Zdziarski, a unique alphanumeric identifier for each email. Or if any duplicate messages somehow had different message IDs—say, because they had been copied into replies or forwarded—the FBI agents could use a forensics tool like Encase or AccessData Forensics Tool Kit to make cryptographic “hashes” of full messages or chunks of them. That hashing process converts portions of text into shorter character strings that uniquely represent the text: running a hash function on that same text will always produce the same short string of characters, but any tiny change in the text produces a different hash string. And that allows a program to quickly compare and match text samples.
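Here is a rough Python sketch of the two de-duplication passes just described. It treats a repeated Message-ID header as a duplicate and falls back to a SHA-256 hash of the whitespace-normalized plain-text body. That hash is a stand-in for whatever hashing EnCase or FTK actually do. Function names and sample data are hypothetical.

```python
import hashlib
from email import message_from_string


def plain_text_body(msg) -> str:
    """Concatenate every text/plain part of a parsed message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def fingerprint(text: str) -> str:
    """SHA-256 of the text with every run of whitespace collapsed to one space."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def new_messages(raw_messages, seen_ids: set, seen_hashes: set):
    """Yield only messages not already seen by Message-ID or by body hash.

    seen_ids / seen_hashes can be pre-loaded from an earlier investigation.
    """
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = (msg.get("Message-ID") or "").strip()
        body = plain_text_body(msg)
        digest = fingerprint(body) if body.strip() else None  # don't merge all empty bodies
        if (mid and mid in seen_ids) or (digest and digest in seen_hashes):
            continue  # duplicate of something already reviewed
        if mid:
            seen_ids.add(mid)
        if digest:
            seen_hashes.add(digest)
        yield raw


earlier_ids = {"<1@example.com>"}  # IDs reviewed in an earlier pass (hypothetical)
batch = [
    "Message-ID: <1@example.com>\n\nSeen before.",
    "Message-ID: <2@example.com>\n\nSame   words\nhere.",
    "Message-ID: <3@example.com>\n\nSame words here.",
]
print(len(list(new_messages(batch, earlier_ids, set()))))  # 1
```

Hashing the whole body only catches exact copies. Catching text quoted inside a reply would mean hashing smaller chunks, such as paragraphs, the same way, as the Wired piece mentions.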

In fact, according to the former agent who spoke with WIRED, the FBI has tools to quickly identify indicators of classified documents in a large corpus of data. Zdziarski compares those tools to the software that checks for plagiarism, but instead checks for matches or near-matches in text with a collection of classified material. And the FBI could also search for keywords to prioritize reading any new messages about subjects they’d already pursued in their previous investigation of Clinton’s emails.
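The plagiarism-checker comparison can be illustrated with a common textbook technique called word shingling. Each text is broken into overlapping k-word sequences, and the sketch measures what share of an email's shingles also appear in a reference document. This only illustrates the general idea, not the FBI's tool. The threshold of 0.5, k = 4, and all function names are assumptions. The keyword ranking at the end is the simplest possible version of "prioritize by subject."

```python
import re


def shingles(text: str, k: int = 4) -> set:
    """Overlapping k-word sequences from the lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def flag_near_matches(emails, references, threshold=0.5, k=4):
    """Return (email_index, reference_id, score), where score is the share of the
    email's shingles that also appear in the reference text."""
    ref_shingles = {rid: shingles(text, k) for rid, text in references.items()}
    hits = []
    for i, body in enumerate(emails):
        cand = shingles(body, k)
        if not cand:
            continue  # too short to compare
        for rid, ref in ref_shingles.items():
            score = len(cand & ref) / len(cand)
            if score >= threshold:
                hits.append((i, rid, score))
    return hits


def prioritize(emails, keywords):
    """Email indices ordered by total keyword occurrences, most first."""
    def score(body):
        lower = body.lower()
        return sum(lower.count(kw.lower()) for kw in keywords)
    return sorted(range(len(emails)), key=lambda i: score(emails[i]), reverse=True)


references = {"doc-A": "the quick brown fox jumps over the lazy dog near the river bank"}
emails = ["fyi the quick brown fox jumps over the lazy dog", "lunch on friday?"]
print(flag_near_matches(emails, references))  # one hit: email 0 vs 'doc-A', score 6/7
print(prioritize(["lunch", "server down; server reboot", "classified?"],
                 ["server", "classified"]))   # [1, 2, 0]
```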

The real question, wrote cybersecurity consultant Rob Graham in his blog, isn’t how the FBI managed to conclude its investigation in eight days. It’s how it managed to take so long. “Computer geeks have tools that make searching the emails extremely easy,” wrote Graham. “Given those emails, and a list of known email accounts from Hillary and associates, and a list of other search terms, it would take me only a few hours to reduce the workload from 650,000 emails to only a couple hundred, which a single person can read in less than a day.”
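Graham's workflow boils down to chaining the account filter, the de-duplication, and the search-term filter, then watching the count shrink at each step. This Python sketch assumes messages have already been parsed into a hypothetical `Email` record. The stage names and sample data are invented for illustration and say nothing about the real numbers.

```python
from dataclasses import dataclass


@dataclass
class Email:
    """Hypothetical pre-parsed message; not any real tool's data model."""
    message_id: str
    participants: set  # lowercased From/To/Cc addresses
    body: str


def triage(emails, accounts, terms):
    """Chain three reduction steps; return survivors plus the count after each step."""
    counts = [("collected", len(emails))]

    # 1. Keep only mail that involves a known account.
    stage = [e for e in emails if e.participants & accounts]
    counts.append(("involves a known account", len(stage)))

    # 2. Drop repeated Message-IDs, keeping the first copy.
    seen, unique = set(), []
    for e in stage:
        if e.message_id not in seen:
            seen.add(e.message_id)
            unique.append(e)
    counts.append(("unique Message-ID", len(unique)))

    # 3. Keep only mail that mentions at least one search term.
    lowered = [t.lower() for t in terms]
    hits = [e for e in unique if any(t in e.body.lower() for t in lowered)]
    counts.append(("mentions a search term", len(hits)))
    return hits, counts


sample = [
    Email("<1>", {"hrc@example.com"}, "About the server"),
    Email("<1>", {"hrc@example.com"}, "About the server"),  # duplicate copy
    Email("<2>", {"aide@example.com"}, "Lunch?"),
    Email("<3>", {"friend@example.org"}, "server talk"),    # no known account
]
hits, counts = triage(sample, {"hrc@example.com", "aide@example.com"}, ["server"])
print([n for _, n in counts])  # [4, 3, 2, 1]
```

Each stage is a single pass over the surviving messages. The work therefore grows with the size of the collection, not with anyone's reading speed.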

In other words, no, General Flynn, it’s not impossible to read an email in a second. That’s what computers are for.
