The Circle of HOPE (2018): "The Enron Email Corpus: Where the Bodies Are Buried?"
Saturday, July 21, 2018: 4:00 pm (Booth): As the biggest public-domain email database, the Enron email corpus details financial deception in the world’s largest energy trading company, deception that triggered what was, at the time in 2002, the most costly U.S. bankruptcy and the country’s most massive audit failure. What can Enron tell us today? This talk will invite a fresh perspective on how email has (and has not) changed since 2002. Can modern forensic methods find where any new email bodies are buried, even when scanning through the evidence of a previously closed case? The presentation highlights some funny and poignant examples of how humans in business suits write to each other when planning mischief. For the six years prior to its failure, Fortune Magazine had named Enron "America’s most innovative company." Enron’s former chief financial officer now lectures profitably to business groups and hedge funds under the new, self-appointed title of "chief loophole officer." More than 3,000 studies have dissected Enron’s email, yet they have failed to uncover some of its more fascinating forensic artifacts. Nearly two decades later, we revisit this trove to discover what modern tools can do with it. For instance, when the Federal Energy Regulatory Commission (FERC) originally released two terabytes (1.6 million emails and attachments), it claimed to have stripped all personal information. Yet a modern machine learning pipeline in 2018 can identify almost 50,000 previously unreported instances of personal information, including credit card numbers, bank accounts, and additional evidence that potentially harms the 99 percent of Enron employees who were never charged. At least one example of detectable malware (called "Joke-StressRelief") is still included in the official Enron corpus, along with 231 other executables that continue to accompany each download. This talk will further investigate whether machine learning, using email traffic alone, can predict all of the (eventually charged) persons of interest.
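As a rough illustration of the kind of personal-information scan described above, the sketch below looks for credit card numbers in message text. It is not the talk's machine learning pipeline, whose details the abstract does not give; it swaps in a much simpler technique, a regular expression for 13 to 16 digit runs followed by the standard Luhn checksum. The names `CANDIDATE`, `luhn_valid`, and `find_card_numbers` are hypothetical, and the sample text in the usage lines is made up (it uses a widely published test card number, not corpus data).

```python
import re
from typing import List

# Candidate pattern (an assumption for this sketch): 13 to 16 digits,
# optionally separated by single spaces or hyphens, not embedded in a
# longer digit run.
CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,15}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    # Made-up message body; the second number fails the checksum.
    sample = "Please charge 4111 1111 1111 1111, not 4111 1111 1111 1112."
    print(find_card_numbers(sample))
```

A checksum filter like this only reduces false positives among digit runs; bank accounts and other personal details mentioned in the abstract would need different detectors.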
The talk will also discuss how Hadoop distributed processing was deployed on multiple clustered virtual machines. More than 50 algorithms were analyzed for both accuracy (90-plus percent) and execution time. In compliance with new (June 2018) European privacy rules for explainable artificial intelligence, each algorithmic decision was reduced to human-understandable rules, and the email factors were ranked to show which might prove most predictive in future fraud and conspiracy investigations.
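To make the idea of human-understandable rules and a rank order of factors concrete, here is a minimal sketch. The abstract does not say which algorithms or features were used, so this assumes a simple one-rule ("decision stump") approach: for each feature, find the single threshold rule with the best training accuracy, then rank the features by that accuracy. The feature names, values, and labels in `RECORDS` are entirely made up for illustration and are not drawn from the Enron corpus.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: per-person email-traffic features and a label
# (True = person of interest). All names and numbers are invented.
RECORDS: List[Tuple[Dict[str, float], bool]] = [
    ({"sent_per_day": 40, "pct_after_hours": 0.30, "distinct_contacts": 120}, True),
    ({"sent_per_day": 35, "pct_after_hours": 0.25, "distinct_contacts": 60}, True),
    ({"sent_per_day": 10, "pct_after_hours": 0.05, "distinct_contacts": 80}, False),
    ({"sent_per_day": 12, "pct_after_hours": 0.10, "distinct_contacts": 150}, False),
    ({"sent_per_day": 38, "pct_after_hours": 0.08, "distinct_contacts": 90}, True),
    ({"sent_per_day": 8, "pct_after_hours": 0.20, "distinct_contacts": 70}, False),
]


def best_threshold_rule(records, feature):
    """Return (accuracy, threshold, direction) for the best single-threshold rule.

    A rule reads "feature >= threshold" or "feature < threshold" and
    predicts True when it holds. Ties keep the first rule found.
    """
    values = sorted({feats[feature] for feats, _ in records})
    best = (0.0, None, None)
    for t in values:
        for direction in (">=", "<"):
            correct = sum(
                ((feats[feature] >= t) if direction == ">=" else (feats[feature] < t)) == label
                for feats, label in records
            )
            acc = correct / len(records)
            if acc > best[0]:
                best = (acc, t, direction)
    return best


def rank_features(records):
    """Return (feature, accuracy, threshold, direction) tuples, best first."""
    features = sorted(records[0][0])
    rules = [(f,) + best_threshold_rule(records, f) for f in features]
    return sorted(rules, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for feature, acc, t, direction in rank_features(RECORDS):
        print(f"{feature} {direction} {t}: training accuracy {acc:.2f}")
```

Accuracy here is measured on the same records used to pick the rule, so it overstates real predictive power; a held-out set would be needed for any serious comparison.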