
About this Item

Title
Enron email dataset.
Summary
"This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. The email dataset was later purchased by Leslie Kaelbling at MIT, and turned out to have a number of integrity problems. A number of folks at SRI, notably Melinda Gervasio, worked hard to correct these problems, and it is thanks to them (not me) that the dataset is available. The dataset here does not include attachments, and some messages have been deleted "as part of a redaction effort due to requests from affected employees". Invalid email addresses were converted to something of the form user@enron.com whenever possible (i.e., recipient is specified in some parse-able format like "Doe, John" or "Mary K. Smith") and to no_address@enron.com when no recipient was specified." -- Enron Email Dataset website / William W. Cohen.
Contributor Names
Enron Corp.
Cohen, William W., distributor.
United States. Federal Energy Regulatory Commission, compiler.
Created / Published
[Philadelphia, PA] : William W. Cohen, MLD, CMU, [2015]
Subject Headings
-  Enron Corp
-  Electronic mail messages
Genre
Data sets
Notes
-  Title from website
-  Downloaded by the Library of Congress on January 23, 2019.
Medium
Dataset.
Call Number/Physical Location
HE7551
Repository
Online Electronic Resource
Digital Id
https://hdl.loc.gov/loc.gdc/gdcdatasets.2018487913
Library of Congress Control Number
2018487913
Language
English
Online Format
compressed data
LCCN Permalink
https://lccn.loc.gov/2018487913

Rights & Access

The Library of Congress is providing access to The Selected Datasets Collection for educational and research purposes. The Library has obtained permission for the use of many materials in the Collection, and presents additional materials for educational and research purposes in accordance with fair use under United States copyright law. Researchers should watch for modern documents that may be copyrighted (for example, published in the United States less than 95 years ago, or unpublished and the author died less than 70 years ago).

You are responsible for deciding whether your use of the items in this collection is legal. You are also responsible for securing any permissions needed to use the items. You will need written permission from the copyright owners of materials not in the public domain for distribution, reproduction, or other use of protected items beyond that allowed by fair use or other statutory exemptions. Some content may be protected under international law. You may also need permission from holders of other rights, such as publicity and/or privacy rights.


Credit Line: Library of Congress, Digital Collections Management and Services Division

Cite This Item

Citations are generated automatically from bibliographic data as a convenience, and may not be complete or accurate.

Chicago citation style:

Enron Corp, and William W Cohen. Enron Email Dataset. [Philadelphia, PA: William W. Cohen, MLD, CMU, 2015] Software, E-Resource. https://www.loc.gov/item/2018487913/.

APA citation style:

Enron Corp & Cohen, W. W. (2015) Enron Email Dataset. [Philadelphia, PA: William W. Cohen, MLD, CMU] [Software, E-Resource] Retrieved from the Library of Congress, https://www.loc.gov/item/2018487913/.

MLA citation style:

Enron Corp, and William W Cohen. Enron Email Dataset. [Philadelphia, PA: William W. Cohen, MLD, CMU, 2015] Software, E-Resource. Retrieved from the Library of Congress, <www.loc.gov/item/2018487913/>.
