This paper presents findings from analyzing and extracting information from the Enron email archive. The analysis takes the role of an investigator, with the goal of extracting key evidence of the Enron Corporation's suspicious accounting practices.
The Enron email archive contains more than 500,000 emails from 159 personal email accounts. Reading all of these emails manually is impractical, and identifying patterns across key email accounts by hand would be harder still. SAS Text Analytics uses both statistical and linguistic technology to parse, explore and categorize the email collection, supporting key functions such as:
- Multisource information retrieval and integration.
- Advanced natural language processing to parse information.
- Term stemming and misspelling identification.
- Content categorization based on statistical and linguistic rules.
- Fact and entity extraction based on part-of-speech tagging and pattern recognition.
- Document filtering.
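To make a few of these functions more concrete, here is a minimal sketch of term stemming, misspelling identification and document filtering using only the Python standard library. It is a toy illustration under stated assumptions, not how SAS Text Analytics implements these features: the suffix list, the 0.8 similarity cutoff, the sample vocabulary and the function names `stem`, `correct` and `filter_emails` are all hypothetical.

```python
import re
from difflib import get_close_matches

# Hypothetical suffix list for a toy stemmer; real stemmers (e.g. Porter) use far richer rules.
SUFFIXES = ("ings", "ing", "ed", "es", "s")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def correct(token: str, vocabulary: list[str]) -> str:
    """Map a possibly misspelled token to its closest vocabulary entry, if one is close enough."""
    # The 0.8 cutoff is an assumed similarity threshold, not a tuned value.
    matches = get_close_matches(token, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else token


def filter_emails(emails: dict[str, str], query: str, vocabulary: list[str]) -> list[str]:
    """Return IDs of emails whose corrected, stemmed tokens include the stemmed query term."""
    target = stem(query.lower())
    hits = []
    for email_id, body in emails.items():
        stems = {stem(correct(tok, vocabulary)) for tok in tokenize(body)}
        if target in stems:
            hits.append(email_id)
    return hits
```

With a small sample, a query for "partnership" matches both the plural form and the misspelling "partnrship", while unrelated messages are filtered out:

```python
emails = {
    "a": "Restated earnings for the partnerships",
    "b": "Lunch plans Friday",
    "c": "partnrship accounting review",
}
vocabulary = ["partnership", "accounting", "earnings"]
print(filter_emails(emails, "partnership", vocabulary))  # ['a', 'c']
```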