Collection | Software, E-Resource Enron email dataset.
About this Item
- Title
- Enron email dataset.
- Summary
- "This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. The email dataset was later purchased by Leslie Kaelbling at MIT, and turned out to have a number of integrity problems. A number of folks at SRI, notably Melinda Gervasio, worked hard to correct these problems, and it is thanks to them (not me) that the dataset is available. The dataset here does not include attachments, and some messages have been deleted "as part of a redaction effort due to requests from affected employees". Invalid email addresses were converted to something of the form user@enron.com whenever possible (i.e., recipient is specified in some parse-able format like "Doe, John" or "Mary K. Smith") and to no_address@enron.com when no recipient was specified." -- Enron Email Dataset website / William W. Cohen.
- Contributor Names
- Enron Corp.
- Cohen, William W., distributor.
- United States. Federal Energy Regulatory Commission, compiler.
- Created / Published
- [Philadelphia, PA] : William W. Cohen, MLD, CMU, [2015]
- Subject Headings
- - Enron Corp
- - Electronic mail messages
- Genre
- Data sets
- Notes
- - Title from website
- - Downloaded by the Library of Congress on January 23, 2019.
- Medium
- Dataset
- Call Number/Physical Location
- HE7551
- Repository
- s-Online Electronic Resource
- Digital Id
- https://hdl.loc.gov/loc.gdc/gdcdatasets.2018487913
- Library of Congress Control Number
- 2018487913
- Online Format
- compressed data
- LCCN Permalink
- https://lccn.loc.gov/2018487913
- Additional Metadata Formats
- MARCXML Record
- MODS Record
- Dublin Core Record
More Collections like this
- Collection: Simple English Wikipedia.
- - Simplewiki. The dataset is composed of the content of Simple Wikipedia including articles and revision history in XML. The XML dumps are in an Export format and compressed in bzip2 and .7z formats;...
- - Contributor: Wikimedia Foundation
- - Date: 2003
- Collection: Grand comics database dataset
- - GCD. A database of creator credits, story details, and other information useful to the readers and fans of comic books, minicomics, and fanzines.
- - Contributor: Grand Comics Database Project
- - Date: 1994
- Collection: National Enquirer Index and Database Files, 1977-
- - The National Enquirer Index and Database Files were created by Mike Handy and other retired Library of Congress staff and volunteers to index the microfilm holdings of National Enquirer at the Library...
- - Contributor: Handy, Mike
- - Date: 1977
- Collection: Meme Generator : collected datasets.
- - Meme Generator. | Meme Generator dataset
- - Meme Generator allows users to create and share image macros (featuring a picture, or artwork, superimposed with text) in the style of popular internet memes. The site also serves as a searchable...
- - Contributor: Library of Congress - American Folklife Center
- - Date: 2010
- Collection: Dinosaur comics.
- - This dataset was generated from content harvested from the Library of Congress's web archive of qwantz.com (Dinosaur Comics!): https://www.loc.gov/item/lcwaN0009953/. It includes minimal metadata about 3,325 image objects from the Dinosaur Comics! web...
- - Contributor: Library of Congress Web Archiving Program - North, Ryan
- - Date: 2019