Strategies for Cleaning Organizational s with an Application to Enron Dataset
|
|
|
- Brianna Miller
- 10 years ago
- Views:
Transcription
1 Strategies for Cleaning Organizational s with an Application to Enron Dataset Yingjie Zhou Mark Goldberg Malik Magdon-Ismail William A. Wallace Abstract Archived organizational datasets have been considered valuable data resources for various studies, such as spam detection, classification, Social Network Analysis (SNA), and text mining. Similar to other forms of raw data, data can be messy and needs to be cleaned before any analysis is conducted. However, few studies have presented investigation on the cleaning of archived organizational s. This paper examines the properties of organizational s and difficulties faced in the cleaning process. Cleaning strategies are then proposed to solve the identified problems. The strategies are applied to the Enron dataset. Contact: Yingjie Zhou Dept. of Decision Sciences and Engineering Systems Troy, NY Tel: Fax: Key Words: Organizational Cleaning, Enron Dataset Support: This paper was supported in part by National Science Foundation Grant No , Social Communication Networks for Early Warning in Disasters.
2 Strategies for Cleaning Organizational s with an Application to Enron Dataset Yingjie Zhou, Mark Goldberg, Malik Magdon-Ismail, and William A. Wallace Archived organizational datasets have been considered valuable data resources for various studies, such as spam detection, classification, Social Network Analysis (SNA), and text mining. Similar to other forms of raw data, data can be messy and needs to be cleaned before any analysis is conducted. However, few studies have presented investigation on the cleaning of archived organizational s. This paper examines the properties of organizational s and difficulties faced in the cleaning process. Cleaning strategies are then proposed to solve the identified problems. The strategies are applied to the Enron dataset. To simplify the denotations in this paper, lname, fname, mname and id will be used to represent an employee s last name, first name, middle name, and organizational ID respectively. Names having a middle name will be used in examples unless otherwise mentioned. All the code is written, compiled, and run in Perl [Wall, Christiansen & Orwant, 2000]. Properties of Organizational Data , one of the important communication tools, is faster than conventional mail but not as interactive as telephone and online chatting. It has become an important way for people to exchange thoughts, opinions, and feelings. In general, an internet message contains two parts: the header and the body. The header contains structured information, including sender, receiver(s), subject, and date. The body contains unstructured text, including the content of the and sometimes a signature block at the end; often the message may have quotations ( forwarded or replied to text) and attachment(s). Information such as who talks with whom at what time regarding what issues may be retrieved from the header and the body. Moreover, forwarded text gives the origin of the information, and the replied to text explains the intent of current . The signature normally gives the contact information of the sender and may be used to identify the sender s position in the organization. data has several favorable features for SNA research. First, s are formatted, and the format is usually defined and followed. An starts with the header, followed by a blank line, and then the body. The header usually includes several header fields, such as From, To, Subject, and Date. Each header field starts with a field name, followed by a :, then a space or a tab, and then field value. Second, s are easily collected and normally stored in a server. Assuming privacy of the ers has been ensured, data collection and storage are relatively inexpensive. Third, s are unobtrusive. They document the real past enabling researchers to trace past events. It is usually difficult to guarantee the unbiasedness and accuracy for other types of SNA data because of various factors affecting data collection. For example, the unbiasedness and accuracy of survey data depends on the design of the survey form, the understanding, and judgment of the clients. The employees information, such as position and responsibilities, are usually available and very helpful in interpreting results from the SNA. Despite the favorable properties, data has data cleaning problems like many other data types. Identifying these problems and proposing a thorough and systematic approach to solving them are the purposes of this paper. Difficulties in Cleaning Organizational s In general, organizational datasets have three problems. First, multiple addresses, names, or IDs exist for the same person. Even though the header name, such as To in an , makes itself clear, that is, the value followed is the information about the receiver(s), the value could be in an address form or name form. One person can be labeled with various name formats by other people. For example, one can be labeled as lname, fname by one colleague and fname mname lname by another. These optional names will not affect the actual delivery because the delivery address is supplied in the protocol, but the difficulty of mapping these names correctly is greatly increased. Moreover, one person may have multiple addresses from domains in addition to his/her organizational one, such as yahoo, gmail, hotmail, etc. Even within the same organizational domain, people may have several addresses. When the people who send and receive s are of interests in SNA research, mislabeling a person may lead to confusing or even wrong conclusions. Second, duplicate s exist. For example, when A sends an to B, that would be in the Outbox of A and Inbox of B simultaneously. If there are multiple recipients, each one of them would have a copy of that . Duplicate s should be removed if or word frequency is being studied, otherwise the number of s will be overestimated in a nonlinear way and the accuracy of results of SNA cannot be determined. Third, the content of the is difficult to extract. The content is normally mingled with the signature and the quotation. If the content of s is an essential part in the research, the irrelevant parts should be removed before any text mining techniques are applied. 1
3 Our work mainly focuses on the first and second problems identified above. The third problem will not be addressed in this paper. One approach for solving it has been proposed using the Support Vector Machine (SVM) method [Tang, Li, Cao & Tang, 2005]. Strategies for Cleaning an Organizational Dataset Strategies must be developed for cleaning the s in an organization. The strategies proposed consist of four tasks: entities identification, formats extraction, formats generalization, and duplicate s removal. Task 1, 2, and 3 are sequential, while Task 4 can be implemented independently. The first task is to identify the communicating entities. We assume that the archival organization s are organized by employee folders that may contain further subfolders, such as Inbox, Sent_Items, and Deleted_Items, etc. By observing the From header field in the folder Sent_Items or similar folders that contain s sent by the owner, his/her name and ID normally can be determined. The study subjects of SNA are therefore defined after each folder has been mapped to an employee of the organization. However, the From header field itself is not sufficient to detect all the addresses and optional names belonging to that employee. Therefore, the second task of the cleaning strategies is to collect all the addresses and optional names for each employee by examining the To, Cc, and Bcc header fields in addition to the From field. By matching the employee s first name, last name and ID in the header fields, we would collect many raw formats of addresses and optional names. The raw formats should be converted to the extracted formats by removing the unnecessary characters. Since we assume that all the s in one employee s folder are either sent from or to that employee, at least one of the fields should match that employee. If none of the fields in an do, that should be recorded for further investigation. In most cases that is sent to a list that the employee is in. As a result, a list of addresses and optional names is compiled for each employee when the second task is completed. The third task is then to generalize the formats of addresses and optional names of each employee by taking the union operation on the extracted formats of all employees. Extra attention should be paid to unusual names and addresses. The purpose of generalization is to make the parsing procedure applicable to all employees. The fourth task is to remove duplicate s from the entire dataset. Duplicate s exist because both sender and receiver(s) may save a copy of the , or occasionally one is mistakenly sent more than once. Any two s having the same recipients, day, and content are considered as duplicate s in our approach. The day, not the time, is used as one of the constraints because one could be sent more than once unintentionally. The generalized formats from the third task are used in parsing the From, To, Cc, and Bcc header fields to find the sender and recipient(s) for each unique identified by the fourth task. Various statistics and social networks based on frequencies can be computed and constructed. The cleaning strategies are general to organizational datasets although each dataset may have its own problems. The Enron dataset is used to test the effectiveness of cleaning strategies proposed in this paper. Introduction to the Enron Dataset In this section, a brief history of the Enron dataset is introduced, followed by the organization and the format of these s. The Enron dataset is valuable because it is one of the very few collections of organizational s that are publicly available. The s of this period ( ) record the dynamics of Enron, from glory to collapse. Enron was the World s Leading Energy Company; it declared bankruptcy in December 2001, which was followed by numerous investigations. During the investigations, the original Enron dataset, consisting of 92% of Enron s staff s, i.e. 619,446 messages in total, was posted to the web by the Federal Energy Regulatory Commission (FERC) in May of 2002 [Federal Energy Regulatory Commission, 2002]. Leslie Kaelbling at Massachusetts Institute of Technology (MIT) purchased the dataset, and found integrity problems with it. A group of researchers at SRI International worked on these problems for their Cognitive Assistant that Learns and Organizes (CALO) project, and the resulting dataset was sent to and posted by Professor William W. Cohen at Carnegie Mellon University (CMU) [Cohen, 2004]. This dataset is called the March 2, 2004 Version, which is widely accepted by many researchers. In this version, the attachments are excluded, and some messages have been deleted upon the request of Enron employees. The resulting corpus contains 517,431 messages organized into 150 folders. The folder s name is given as the employee s last name, followed by a dash, followed by the initial letter of the employee s first name. For example, folder allen-p is named after Enron employee Phillip K. Allen. Presumably we guess that each folder matches one employee, but this conjecture is not correct, which will be discussed in detail in the Data Cleaning Experiment section. Each employee folder contains subfolders, such as inbox, sent, _sent_mail, discussion_threads, 2
4 all_documents, deleted_items, and subfolders created by the employee. A large number of duplicate s exist in those folders. An Enron message contains the following header fields in order (the header field in parenthesis is optional): Message-ID, Date, From, ( To ), Subject, ( Cc ), Mime-Version, Content-Type, Content- Transfer-Encoding, ( Bcc ), X-From, X-To, X-cc, X-bcc, X-Folder, X-Origin, and X-FileName. The content is separated with the headers by a blank line. The signature and quotation are continued if they exist. The header field names initiated with an X- means that the field values are from the original message. The field values of From, To, Cc and Bcc are converted from those of X-From, X-To, X-Cc and X- Bcc correspondingly by the SRI researchers based on their rules. As we noted in the previous section, the field value could be in address form or name form; the SRI researchers recognized this problem and tried to convert into a uniform address form. However, the conversion process is not satisfactory as shown in the Data Cleaning Experiment section. An abridged Enron message allen-p\inbox\62 is shown in Figure 1 as an illustration. Unimportant header fields are removed; the content of the and quotation message is shortened as < CONTENT> and <Quotation Message> to save space. In Figure 1, lines from1 to 6 are the header, the body is from line 7 to 10, and line 11 is the abridged quotation message. 1 Date: Fri, 2 Nov :18: (PST) 2 From: [email protected] 3 To: [email protected] 4 Subject: RE: 5 X-From: Ratcliff, Renee </O=ENRON/OU=NA/CN=RECIPIENTS/CN=RRATCLI> 6 X-To: Allen, Phillip K. </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Pallen> 7 Phillip, 8 < CONTENT> 9 Thanks, 10 Renee 11 <Quotation Message> Figure 1: An Enron Message allen-p\inbox\62 Related Work on the Enron Data Cleaning Most data cleaning methods, such as the detection of outliers and treatment of missing data, are designed for numerical data. Although several products are available for cleaning, they are mainly concerned with the readability of individual s. For instance, ecleaner 2000 and Text Money Pro are able to clean forwarded messages by removing the > characters; they are also capable of removing line breaks, empty white space, and extra blank lines to improve readability [Boxer Software, 2004; Decker, 2000]. However, their capabilities do not satisfy the requirements for cleaning organizational archived s. Few studies have done a thorough investigation on cleaning archived s to our knowledge. As Carley and Skillicorn noted: The sheer size and complexity of the dataset resulted in massive amounts of time being spent simply cleaning the data; e.g., eliminating copies of messages, identifying when the same person had multiple ids, and so on. [Carley & Skillicorn, 2005]. Despite the difficulties in cleaning organization s, researchers have done part of the cleaning work for the Enron dataset. Previous research efforts on duplicate s removal for the Enron dataset are discussed first, followed by the mapping problem. The easiest way of removing duplicate s is to remove folders since some folders contain a large number of duplicate s [Klimt & Yang, 2004; Shetty & Adibi, 2004]. Although simply removing folders was fine for some research purposes, it was not adequate for the purpose of removing duplicate messages. Andrés Corrada- Emmanuel removed the duplicate s with a different approach. By calculating the MD5 digest of the body constrained by the same day, he concluded that the dataset contained 250,484 unique messages belonging to 149 employees [Corrada-Emmanuel, 2004]. This approach of identifying duplicate s is more rational than simply deleting folders. Therefore, the idea is adopted in our research by adding recipient(s) as another constraint. Chapanond, Krishnamoorthy, and Yener specifically discussed the problem of mapping multiple addresses and names to a person [Chapanond, Krishnamoorthy &Yener, 2005]. They extracted addresses from From and To fields, which were converted from X-From and X-To by the SRI researchers. However, they found that one person could have multiple addresses. For instance, Vince J Kaminski had addresses: [email protected], [email protected], [email protected], [email protected], 3
5 j. and They claimed that manual inspection was necessary to deal with the employees having the same last name or unexpected characters. It is not hard to notice that some of the addresses, such as j. are unusual. In addition, addresses for employees in an organization are normally assigned based on clearly defined rules. It is suspicious that one employee could have so many distinct addresses. We cannot help thinking that there is something wrong with the From and To fields. Indeed, we found that the conversion itself was flawed during our cleaning process. Although part of the cleaning work has been done in the previous research, a systematic way of cleaning the Enron dataset should be performed with improvement and corrections. Data Cleaning Experiment with the Enron Dataset The proposed strategies guide the Enron data cleaning experiment. Most of the employees involved in the Enron dataset are senior managers and traders. One assumption made in most previous research work is that one folder corresponds to one employee. Priebe, Conroy, Marchete, and Park, however, found 184 employees in Enron dataset [Priebe, Conroy, Marchete & Park, 2005], which was the number of unique addresses they obtained from the From header of s in the Sent boxes after removing some addresses that were clearly not related to the 150 employees. Therefore, our first task was to identify who were the employees. We found 156 employees after removing the secretaries, assistants, and people or addresses that were irrelevant to the employees by searching the From field of every in subfolder _sent_mail, sent and sent_items. Three cases of folder-employee matching exist in the Enron dataset: 1 folder to 1 employee, 1 folder to 2 employees, and 2 folders to 1 employee. Case I: One folder corresponds to one employee. For example, in subfolder _sent_mail, sent and sent_items of user folder allen-p we found 8 s sent by Ina Rangel, 1 by Pam Butler, and s by Phillip K Allen. By further examining the content of the s, Ina Rangel seemed to be an assistant of Allen K Phillip, but there was no clue to the identification of Pam Butler. In this case, we claim Phillip K. Allen is the only owner of folder allen-p. Case II: One folder corresponds to two employees. This case happens when two employees have same last name and same initial of first name. Sometimes two employees have the same first name but distinct middle names. For example, in folder campbell-l, we found 5 s sent by Robert J. Baker, 303 s by Larry Campbell, and 359 s by Larry F Campbell. The five s from Robert J. Baker can be ignored. However, Larry Campbell and Larry F Campbell are two different persons, which can be determined by the organizational IDs (lcampbe versus lcampbel) and content from campbell-l\all_documents\952. In this case, we claim that both Larry Campbell and Larry F Campbell are the owners of the folder campbell-l. The same situation applies to folder dean-c, gay-r, hernandez-j, hodge-j, mcconnell-m, scott-s, and taylor-m. Case III: Two folders correspond to one employee. Two such cases happen. In folders panus-s and phaniss, Stephanie Panus is the only sender, so we concluded that both folders belonged to her. Another two folders, whalley-g and whalley-l, were found to belong to Greg Whalley. We identified 156 Employees with their names and Enron IDs from Enron dataset. We then limited our scope to the s among these 156 employees. The second cleaning task was to investigate the recipient information of all s in each employee folder to find his/her addresses and optional names. As we said, the To, Cc, and Bcc fields were converted by the SRI researchers from the original fields X-To, Xcc, and X-bcc. The intention of conversion is good: all the messy formats are converted to a format like [email protected]. For example, Rick Buy was converted to [email protected]. However, even if the conversion is correct, various addresses have to be mapped to an employee [Chapanond, Krishnamoorthy & Yener, 2005]. Also, the rules for the conversion have flaws: The rules are aggressive; addresses from other domains were converted to Enron domain when there were multiple recipients. For example, Kilburn, Bobbi <[email protected]> in the X-To field was converted to [email protected], and Braintree Electric Light Dept. <[email protected]> was converted to [email protected] in message baughman-d\inbox\218. The conversion is obviously wrong. The rules may lead to confusing and incorrect results even for Enron employees. For example, Taylor, Mark E (Legal) </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Mtaylo1> in X-To was converted to legal <[email protected]> in message cash-m\sent_items\505, which doesn t make sense. Worse than that, two different names may be converted to the same address. For example, in message davisd\deleted_items\101, Davis, Mark Dana </O=ENRON/OU=NA/CN=RECIPIENTS/CN=MDAVIS> sent an to Davis, Dana </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Ddavis>, but the converted 4
6 results showed that sent an to the same address Employees who have the same last name and same initial of first name may be confounded. Useful information is lost during the conversion. For example, in message allen-p\sent_items\68, Mike Grigsby <Mike in X-To was converted to ; the information that Mike Grigsby worked in the department of Enron Capital and Trade (ECT) in Houston (HOU) was lost. The original header fields, therefore, were used in our cleaning procedure. Since X-bcc field was not provided in the Enron dataset, it was ignored in the cleaning process. The X-To and X-cc fields of all s in each employee folder were examined for collecting the addresses and optional names by matching the last name, first name, and Enron ID. We found that two formats dominated; one was fname mname lname, and the other was lname, fname mname. </o=enron/ou=na/cn=recipients/cn= id>. Other standard formats include and The raw formats contained various unnecessary characters associated with the real addresses. The extracted formats were obtained by removing these characters. For example, in folder allen-p, the raw formats found for Phillip K. Allen are shown in the left column of Table 1. The corresponding extracted formats are listed in the right column of Table 1. Table 1: Raw Formats and Extracted Formats for Employee Phillip K. Allen Raw Formats phillip k allen allen, phillip k. </o=enron/ou=na/cn=recipients/cn=pallen> [email protected] "phillip allen" <[email protected]> "[email protected]" <[email protected]> phillip <[email protected]> phillip allen <[email protected]> "allen, phillip k" <[email protected]> <[email protected]> [email protected] '[email protected]' phillip allen - home ( ) <[email protected]> "phillip allen" <[email protected]>@enron [email protected] [email protected] <[email protected]> [email protected] phillip k allen <[email protected]> Extracted Formats phillip k allen allen, phillip k. </o=enron/ou=na/cn=recipients/cn=pallen> [email protected] [email protected] [email protected] [email protected] [email protected] As we have identified in the first task, one folder may have two users. Interestingly, we found that not only were we confused by their similar names, but so were Enron employees themselves. They kept forwarding misdirected s to each other, and complained about the mess, such as scott-s\all_documents\1328, scotts\sent_items\389, and taylor-m\all_documents\1740. After we have finished the second cleaning task for all the 156 employees, 156 tables similar to Table 1 were created. In the third task, we compared these tables, and found that the extracted formats followed explicit rules although some employees might miss one format or another. The Union operator was taken to make a comprehensive list of all the formats. Missing formats might be found for individuals by comparing with the formats in the comprehensive list. For example, we found lname, fname <something> was also a common format. However, we failed to discover it for Phillip K. Allen from his folder allen-p, instead we obtained format allen, phillip </o=enron/ou=na/cn=recipients/cn=notesaddr/cn=ba4cd662-58db2db b8-5b412a> from other user folders. Same argument held for format phillip k allen/hou/ect. For those addresses outside of the Enron domain, we had to deal with them individually. Most of the time, these addresses were used to forward s from or to himself/herself. Occasionally they were used to communicate with colleagues. In conclusion, the standard formats that were general to all Enron employees were summarized as: (1) fname mname lname; (2) lname, fname mname. <...= id>; (3) lname, fname < >; (4) [email protected]; (5) [email protected]; and (6) [email protected]. Other formats, which were only applicable to one employee or a small group of employees, were listed only for those employees. The general formats together with the individual formats of all employees were used for parsing the headers. The fourth task is to remove the duplicate s. We suggest that any two s having the same recipients, day, and content be considered as duplicate s. Calculating MD5 Digest for these three elements, 252,830 5
7 unique messages were found. This number is slightly less than 250,484 when only day and content were used [Corrada-Emmanuel, 2004]. This method is very sensitive to the s with quoted messages. Therefore, the tools for cleaning the quotations could be used before removing the duplicate s to achieve better results. Upon completion of all four tasks, we have cleaned the s among 156 Enron employees identified from the Enron dataset distributed by William W. Cohen. By cleaned, we mean that we have removed the duplicate s, identified 156 Enron employees from 150 folders, mapped optional names and multiple IDs for each individual, and finally picked out the s among these 156 employees from the whole dataset. This subset of the s can then be used for the SNA. Conclusions and Future Work Data cleaning strategies for archived organizational s are proposed and applied to the Enron dataset. These strategies should provide guidance when cleaning organizational s. However, an organizational dataset may have its own unique problems. In general, the strategies are practical and served well in cleaning the Enron s. In the future, the strategies will be applied to other datasets to test their robustness. Machine Learning techniques will be used to distinguish persons having similar or even the same names by learning their social contacts and content. Finally, cleaning the content of the organizational s is also part of our future research. References [Boxer Software, 2004] Boxer Software, 2004, Ugly Messages: Beware the Text Monkey. Retrieved October 20, 2006, from [Carley & Skillicorn, 2005] Carley, Kathleen M. & David Skillicorn, 2005, Special Issue on Analyzing Large Scale Networks: The Enron Corpus. Computational & Mathematical Organization Theory, 11(3), , [Chapanond, Krishnamoorthy & Yener, 2005] Chapanond, Anurat, Mukkai S. Krishnamoorthy & Bülent Yener, 2005, Graph Theoretic and Spectral Analysis of Enron Data. Computational & Mathematical Organization Theory, 11(3), , [Cohen, 2004] Cohen, William W., 2004, Enron Dataset. Retrieved March 12, 2005, from [Corrada-Emmanuel, 2004] Corrada-Emmanuel, Andrés, 2004, Enron Dataset Research. Retrieved March 12, 2005, from [Decker, 2000] Decker, John, 2000, eclean Retrieved October 20, 2006, from [Federal Energy Regulatory Commission, 2002] Federal Energy Regulatory Commission, 2002, FERC: Information Released in Enron Investigation. Retrieved March 12, 2005, from [Klimt & Yang, 2004] Klimt, Bryan & Yiming Yang, 2004, Introducing the Enron Corpus. Conference on and Anti-Spam, Mountain View, CA, [Priebe, Conroy, Marchete & Park, 2005] Priebe, Carey E., John M. Conroy, David J. Marchette & Youngser Park, 2005, Scan Statistics on Enron Graphs. Computational & Mathematical Organization Theory, 11(3), , [Shetty & Adibi, 2004] Shetty, Jitesh & Jafar Adibi, 2004, The Enron Dataset Database Schema and Brief Statistical Report. Technical report, Information Sciences Institute, [Tang, Li, Cao & Tang, 2005] Tang, Jie, Hang Li, Yunbo Cao & Zhaohui Tang, 2005, Data Cleaning. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. Chicago, IL, August 21-24, [Wall, Christiansen & Orwant, 2000] Wall, Larry, Tom Christiansen & Jon Orwant, Programming Perl. 3 rd Edition, O'Reilly, July 14,
The Enron Corpus: A New Dataset for Email Classification Research
The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu
AHS Computing. Outlook Seminar Series Session #2
AHS Computing Outlook Seminar Series Session #2 Introduction Email templates Archiving email Creating Rules Automating Replies Sorting Junk Email Settings Email Templates File with predesigned, customized
Monitoring Corporate Password Sharing Using Social Network Analysis
Monitoring Corporate Password Sharing Using Social Network Analysis Andrew S. Patrick NRC Canada Paper to be presented at the International Sunbelt Social Network Conference, St. Pete Beach, Florida, January
How to Analyze Company Using Social Network?
How to Analyze Company Using Social Network? Sebastian Palus 1, Piotr Bródka 1, Przemysław Kazienko 1 1 Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370 Wroclaw, Poland {sebastian.palus,
Email Basics Workshop
Email Basics Workshop Electronic mail, or email, is the most frequently used service on the Internet. Being able to use email effectively and efficiently is a core skill required in today s world. WLAC
Message Archiving User Guide
Message Archiving User Guide Spam Soap, Inc. 3193 Red Hill Avenue Costa Mesa, CA 92626 United States p.866.spam.out f.949.203.6425 e. [email protected] www.spamsoap.com RESTRICTION ON USE, PUBLICATION,
Basic E- mail Skills. Google s Gmail. www.netliteracy.org
Email it s convenient, free and easy. Today, it is the most rapidly growing means of communication. This is a basic introduction to email and we use a conversational non- technical style to explain how
Microsoft Outlook 2010 contains a Junk E-mail Filter designed to reduce unwanted e-mail messages in your
Overview of the Junk E-mail Filter in Outlook (Much of this text is extracted from the Outlook Help files.) Microsoft Outlook 2010 contains a Junk E-mail Filter designed to reduce unwanted e-mail messages
MICROSOFT OUTLOOK 2010
MICROSOFT OUTLOOK 2010 George W. Rumsey Computer Resource Center 1525 East 53rd, Suite 906 Chicago, IL 60615 (773) 955-4455 www.computer-resource.com [email protected] What Is Outlook?... 1 Folders... 2
Managing Junk Mail. About the Junk Mail Filter
Managing Junk Mail Outlook can filter out certain types of messages and send them to a separate folder to keep your Inbox from being cluttered with junk mail. Outlook can also disable links in suspicious
About the Junk E-mail Filter
1 of 5 16/04/2007 11:28 AM Help and How-to Home > Help and How-to About the Junk E-mail Filter Applies to: Microsoft Office Outlook 2003 Hide All The Junk E-mail Filter in Outlook is turned on by default,
A Monograph on Data Mining Techniques for Email Forensics
A Monograph on Data Mining Techniques for Email Forensics Venkata Krishna Kota Central Research Laboratory Bharat Electronics Limited Bangalore, India [email protected] Abstract Email Forensics
Email. Electronic mail, or e-mail, is the most frequently used service on the Internet. Seema Sirpal Delhi University Computer Centre
Email Electronic mail, or e-mail, is the most frequently used service on the Internet Seema Sirpal Delhi University Computer Centre Why use Email You can send a message any time, any where. You can send
Using Outlook 2010 for Email
Using Outlook 2010 for Email Workbook Edition 1 June 2013 Document Reference: 3774 Contents Using Outlook 2010 for Email 1. Introduction Microsoft Outlook... 1 Outlook Basics... 1 2. The Ribbon Mail, Contacts
ModusMail Software Instructions.
ModusMail Software Instructions. Table of Contents Basic Quarantine Report Information. 2 Starting A WebMail Session. 3 WebMail Interface. 4 WebMail Setting overview (See Settings Interface).. 5 Account
A Guide to Information Technology Security in Trinity College Dublin
A Guide to Information Technology Security in Trinity College Dublin Produced by The IT Security Officer & Training and Publications 2003 Web Address: www.tcd.ie/itsecurity Email: [email protected] 1 2
To change the sending options, open the email to be sent and then go to the options tab on the ribbon.
Advanced email options When sending an email using Outlook, there are several additional options you can use to change the way the email is sent or how people can respond to it. The options which can be
BULLGUARD SPAMFILTER
BULLGUARD SPAMFILTER GUIDE Introduction 1.1 Spam emails annoyance and security risk If you are a user of web-based email addresses, then you probably do not need antispam protection as that is already
Analysis of Spam Filter Methods on SMTP Servers Category: Trends in Anti-Spam Development
Analysis of Spam Filter Methods on SMTP Servers Category: Trends in Anti-Spam Development Author André Tschentscher Address Fachhochschule Erfurt - University of Applied Sciences Applied Computer Science
MailEnable Web Mail End User Manual V 2.x
MailEnable Web Mail End User Manual V 2.x MailEnable Messaging Services for Microsoft Windows NT/2000/2003 MailEnable Pty. Ltd. 486 Neerim Road Murrumbeena VIC 3163 Australia t: +61 3 9569 0772 f: +61
Microsoft Outlook. KNOW HOW: Outlook. Using. Guide for using E-mail, Contacts, Personal Distribution Lists, Signatures and Archives
Trust Library Services http://www.mtwlibrary.nhs.uk http://mtwweb/cgt/library/default.htm http://mtwlibrary.blogspot.com KNOW HOW: Outlook Using Microsoft Outlook Guide for using E-mail, Contacts, Personal
SPAM UNDERSTANDING & AVOIDING
SPAM UNDERSTANDING & AVOIDING Modified: September 28, 2006 SPAM UNDERSTANDING & AVOIDING...5 What is Spam?...6 How to avoid Spam...6 How to view message headers...8 Setting up a spam rule...10 Checking
Catholic Archdiocese of Atlanta Outlook 2003 Training
Catholic Archdiocese of Atlanta Outlook 2003 Training Information Technology Department of the Archdiocese of Atlanta Table of Contents BARRACUDA SPAM FILTER... 3 WHAT IS THE SPAM FILTER MS OUTLOOK PLUG-IN?...
Center for Faculty Development and Support. Gmail Overview
Center for Faculty Development and Support Gmail Overview Table of Contents Gmail Overview... 1 Overview... 3 Learning Objectives... 3 Access Gmail Account... 3 Compose Mail... 4 Read and Reply Mail...
Bayesian Spam Filtering
Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary [email protected] http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating
Connecting to LUA s webmail
Connecting to LUA s webmail Effective immediately, the Company has enhanced employee remote access to email (Outlook). By utilizing almost any browser you will have access to your Company e-mail as well
Emailing from The E2 Shop System EMail address Server Name Server Port, Encryption Protocol, Encryption Type, SMTP User ID SMTP Password
Emailing from The E2 Shop System With recent releases of E2SS (at least 7.2.7.23), we will be allowing two protocols for EMail delivery. A new protocol for EMail delivery Simple Mail Transfer Protocol
ONE Mail Direct for Desktop Software
ONE Mail Direct for Desktop Software Version: 1 Document ID: 3931 Document Owner: ONE Mail Product Team Copyright Notice Copyright 2015, ehealth Ontario All rights reserved No part of this document may
Managing Junk E-mail Folder in Outlook 2010. This document contains instructions for users on how to manage spam filters in Microsoft Outlook 2010.
Ready Reference 041013 Introduction Managing Junk E-mail Folder in Outlook 2010 This document contains instructions for users on how to manage spam filters in Microsoft Outlook 2010. Manual Filtering of
How to make the Emails you Send with Outlook and Exchange Appear to Originate from Different Addresses
How to make the Emails you Send with Outlook and Exchange Appear to Originate from Different Addresses If you only have a single email address from which you send all your business and personal emails
New Developments in the Automatic Classification of Email Records. Inge Alberts, André Vellino, Craig Eby, Yves Marleau
New Developments in the Automatic Classification of Email Records Inge Alberts, André Vellino, Craig Eby, Yves Marleau ARMA Canada 2014 INTRODUCTION 2014 2 OUTLINE 1. Research team 2. Research context
Spam detection with data mining method:
Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,
Combining textual and non-textual features for e-mail importance estimation
Combining textual and non-textual features for e-mail importance estimation Maya Sappelli a Suzan Verberne b Wessel Kraaij a a TNO and Radboud University Nijmegen b Radboud University Nijmegen Abstract
How to Manage Email. Guidance for staff
How to Manage Email Guidance for staff 1 Executive Summary Aimed at Note Purpose Benefits staff Necessary skills to All staff who use email This guidance does NOT cover basic IT literacy skills. Staff
[email protected]
Email Electronic mail is the transmission of mainly text based messages across networks. This can be within a particular network - internal mail - or between networks - external mail. The most common network
Outlook 2010 Essentials
Outlook 2010 Essentials Training Manual SD35 Langley Page 1 TABLE OF CONTENTS Module One: Opening and Logging in to Outlook...1 Opening Outlook... 1 Understanding the Interface... 2 Using Backstage View...
Division of Student Affairs Email Quota Practices / Guidelines
Division of Student Affairs Email Quota Practices / Guidelines Table of Contents Quota Rules:... 1 Mailbox Organization:... 2 Mailbox Folders... 2 Mailbox Rules... 2 Mailbox Size Monitoring:... 3 Using
Email Basics. For more information on the Library and programs, visit www.bcpls.org BCPLS 08/10/2010 PEMA
Email Basics Email, short for Electronic Mail, consists of messages which are sent and received using the Internet. There are many different email services available that allow you to create an email account
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Essential Project Management Reports in Clinical Development Nalin Tikoo, BioMarin Pharmaceutical Inc., Novato, CA
Essential Project Management Reports in Clinical Development Nalin Tikoo, BioMarin Pharmaceutical Inc., Novato, CA ABSTRACT Throughout the course of a clinical trial the Statistical Programming group is
Microsoft Exchange 2010 Email Training. Microsoft Outlook 2007 Outlook Web App
Microsoft Exchange 2010 Email Training Microsoft Outlook 2007 Outlook Web App Table of Contents INTRODUCTION 1.1 What Does Microsoft Exchange Do? 1.2 Advantage/Disadvantage 1.3 Outlook 2007 vs. Outlook
Stores copies of items you sent to others, by default. Stores items created offline that you want to send the next time you are online
Outlook Folders: All new messages are stored in Inbox unless rules are created. Stores copies of items you sent to others, by default Stores temporarily deleted items until you permanently delete or retrieve
Table of Contents Chapter 1 INTRODUCTION TO MAILENABLE SOFTWARE... 3 MailEnable Webmail Introduction MailEnable Requirements and Getting Started
Webmail User Manual Table of Contents Chapter 1 INTRODUCTION TO MAILENABLE SOFTWARE... 3 MailEnable Webmail Introduction MailEnable Requirements and Getting Started Chapter 2 MAILENABLE KEY FEATURES OVERVIEW...
FORWARDING EMAIL (directed to a non-gcccd email account) Revised 3/22/13
FORWARDING EMAIL (directed to a non-gcccd email account) Revised 3/22/13 FORWARDING FROM OUTLOOK WEB ACCESS (easiest method) You can setup your email to auto forward to a non-gcccd account such as Yahoo,
WINDOWS LIVE MAIL FEATURES
WINDOWS LIVE MAIL Windows Live Mail brings a free, full-featured email program to Windows XP, Windows Vista and Windows 7 users. It combines in one package the best that both Outlook Express and Windows
Managing your e-mail accounts
Managing your e-mail accounts Introduction While at Rice University, you will receive an e-mail account that will be used for most of your on-campus correspondence. Other tutorials will tell you how to
HOW TO SAVE AND FILE LOTUS NOTES EMAILS
Email messages that are university records should be filed and retained with other records to which they relate. Saving emails to a unit s shared drive is an effective way to extract them from the email
How to make the Emails you Send from Outlook 2010 appear to Originate from different Email Addresses
How to make the Emails you Send from Outlook 2010 appear to Originate from different Email Addresses If you only use a single email address to send out all your business and personal emails then you're
Email Management CSCU9B2 CSCU9B2 1
Email Management CSCU9B2 CSCU9B2 1 Contents Email clients choosing and using Email message header and content Emailing to lists of people In and out message management Mime attachments and HTML email SMTP,
THUNDERBIRD SETUP (STEP-BY-STEP)
Jim McKnight www.jimopi.net Thunderbird_Setup.lwp revised 12-11-2013 (Note1: Not all sections have been updated for the latest version of Thunderbird available at the time I verified that Section. Each
Sender and Receiver Addresses as Cues for Anti-Spam Filtering Chih-Chien Wang
Sender and Receiver Addresses as Cues for Anti-Spam Filtering Chih-Chien Wang Graduate Institute of Information Management National Taipei University 69, Sec. 2, JianGuo N. Rd., Taipei City 104-33, Taiwan
Focus On echalk Email. Introduction. In This Guide. Contents:
Focus On echalk Email Introduction Email can be very useful in a school setting. For instance, instead of writing out a memo and delivering it to everyone s mailbox in the main office, you can simply send
Filtering E-mail for Spam: PC
Filtering E-mail for Spam: PC Last Revised: April 2003 Table of Contents Introduction... 1 Objectives... 1 Filtering E-mail for Spam... 2 What Is Spam?... 2 What Is UT Doing About Spam?... 2 What Can You
MS Visio Org Charts and Diagrams. The Essentials. Janet W. Lee Technology Training Coordinator Loyola Marymount University, Los Angeles
MS Visio Org Charts and Diagrams 2 Outlook Desktop: The Essentials Janet W. Lee Technology Training Coordinator Loyola Marymount University, Los Angeles http://its.lmu.edu/training February 2014 Compiled
How To Preserve Email Records In Mississippi
EMAIL MANAGEMENT GUIDELINES FOR COUNTIES AND MUNICIPALITIES 1. Purpose The purpose of these guidelines is to ensure that the electronic mail records of county and municipal government officials and employees
Outlook 2010 basics quick reference sheet
Outlook 2010 basics Microsoft Outlook 2010 is the world s leading personal information management and communications application. Outlook 2010 delivers complete e-mail, contact, calendar, and task functionality.
Getting Started Guide
Getting Started Guide Mulberry IMAP Internet Mail Client Versions 3.0 & 3.1 Cyrusoft International, Inc. Suite 780 The Design Center 5001 Baum Blvd. Pittsburgh PA 15213 USA Tel: +1 412 605 0499 Fax: +1
Core Essentials. Outlook 2010. Module 1. Diocese of St. Petersburg Office of Training [email protected]
Core Essentials Outlook 2010 Module 1 Diocese of St. Petersburg Office of Training [email protected] TABLE OF CONTENTS Topic One: Getting Started... 1 Workshop Objectives... 2 Topic Two: Opening and Closing
Archiving Your Mail in Outlook 2007
About Archiving Archiving Your Mail in Outlook 2007 All messages, contact information, calendars and other data you create in Microsoft Outlook 2007 are kept in your mailbox on the Exchange (FIUmail) server.
About junk e-mail protection
About junk e-mail protection Entourage 2008 Junk E-Mail Protection Entourage has a built-in junk mail filter that helps separate junk e-mail also called spam from legitimate messages. By default, the level
Using WinGate 6 Email. Concepts, Features, and Configurations.
WinGate Feature Guide Using WinGate 6 Email Concepts, Features, and Configurations. Neil Gooden Qbik New Zealand Limited Rev 1.0 December 2004 2 Introduction...3 Basic Email Concepts... 3 Differences in
GFI Product Guide. GFI MailArchiver Archive Assistant
GFI Product Guide GFI MailArchiver Archive Assistant The information and content in this document is provided for informational purposes only and is provided "as is" with no warranty of any kind, either
Guideline for E-mail Services
Guideline for E-mail Services Under the Policy on Information Technology, the Vice-President and Provost is authorized to establish guidelines for information technology services at the University of Toronto.
Chapter 3 ADDRESS BOOK, CONTACTS, AND DISTRIBUTION LISTS
Chapter 3 ADDRESS BOOK, CONTACTS, AND DISTRIBUTION LISTS 03Archer.indd 71 8/4/05 9:13:59 AM Address Book 3.1 What Is the Address Book The Address Book in Outlook is actually a collection of address books
Outlook 2013. And Outlook Web App For. Trent University
Outlook 2013 And Outlook Web App For Trent University Outlook 2013 and Outlook Web App for Trent University Page 2 of 127 Copyright 2013 Shaw Computer Systems Introduction to Outlook 2013 TABLE OF CONTENTS
Web Mail Classic Web Mail
April 14 Web Mail Classic Web Mail Version 2.2 Table of Contents 1 Technical Requirements... 4 2 Accessing your Web Mail... 4 3 Web Mail Features... 5 3.1 Home... 5 3.1.1 Mailbox Summary... 5 3.1.2 Announcements...
Enterprise Email Archive Managed Archiving & ediscovery Services User Manual
Enterprise Email Archive Managed Archiving & ediscovery Services User Manual Copyright (C) 2012 MessageSolution Inc. All Rights Reserved Table of Contents Chapter 1: Introduction... 3 1.1 About MessageSolution
SSWLHC-List Listserve Member Guide
SSWLHC-List Listserve Member Guide Table of Contents INTRODUCTION 3 WHAT IS A LISTSERVE? 3 COMMON TERMS 3 MAILMAN LISTSERVE SOFTWARE 4 LISTSERVE INTERFACES 4 WEB INTERFACE 5 EMAIL INTERFACE 5 UNSUBSCRIBING
School Mail System. - Access through Office 365 Exchange Online. User Guide FOR. Education Bureau (EDB)
School Mail System - Access through Office 365 Exchange Online User Guide FOR Education Bureau (EDB) Version: 1.0 May 2015 The Government of the Hong Kong Special Administrative Region The contents of
SPAMMING BOTNETS: SIGNATURES AND CHARACTERISTICS
SPAMMING BOTNETS: SIGNATURES AND CHARACTERISTICS INTRODUCTION BOTNETS IN SPAMMING WHAT IS AUTORE? FACING CHALLENGES? WE CAN SOLVE THEM METHODS TO DEAL WITH THAT CHALLENGES Extract URL string, source server
Tutorial for Horde email. Contents
Tutorial for Horde email Contents Basics 1. Starting Horde 2. Reading emails 3. Replying / Forwarding 4. New email 5. Attachments 6. Save as Draft 7. Address books Adding contact details and accessing
Outlook XP Email Only
Outlook XP Email Only Table of Contents OUTLOOK XP EMAIL 5 HOW EMAIL WORKS: 5 POP AND SMTP: 5 TO SET UP THE POP AND SMTP ADDRESSES: 6 TO SET THE DELIVERY PROPERTY: 8 STARTING OUTLOOK: 10 THE OUTLOOK BAR:
Archiving Your Mail in Outlook 2010
About Archiving Archiving Your Mail in Outlook 2010 All messages, contact information, calendars and other data you create in Microsoft Outlook 2010 are kept in your mailbox on the Exchange server. But
Cleaning Up Your Outlook Mailbox and Keeping It That Way ;-) Mailbox Cleanup. Quicklinks >>
Cleaning Up Your Outlook Mailbox and Keeping It That Way ;-) Whether you are reaching the limit of your mailbox storage quota or simply want to get rid of some of the clutter in your mailbox, knowing where
SONA SYSTEMS RESEARCHER DOCUMENTATION
SONA SYSTEMS RESEARCHER DOCUMENTATION Introduction Sona Systems is used for the scheduling and management of research participants and the studies they participate in. Participants, researchers, principal
Archiving and Managing Your Mailbox
Archiving and Managing Your Mailbox We Need You to Do Your Part We ask everyone to participate in routinely cleaning out their mailbox. Large mailboxes with thousands of messages impact backups and may
416 Agriculture Hall Michigan State University 517-355-3776 http://support.anr.msu.edu [email protected]
416 Agriculture Hall Michigan State University 517-355-3776 http://support.anr.msu.edu [email protected] Title: ANR TS How To Efficiently Remove Items In Outlook To Free Up Space Document No. - 162 Revision
How To Analyze Email Activity At Enron
Discovery of Email Communication Networks from the Enron Corpus with a Genetic Algorithm using Social Network Analysis Garnett Wilson and Wolfgang Banzhaf During the legal investigation of Enron Corporation,
Table of Contents. Introduction: 2. Settings: 6. Archive Email: 9. Search Email: 12. Browse Email: 16. Schedule Archiving: 18
MailSteward Manual Page 1 Table of Contents Introduction: 2 Settings: 6 Archive Email: 9 Search Email: 12 Browse Email: 16 Schedule Archiving: 18 Add, Search, & View Tags: 20 Set Rules for Tagging or Excluding:
NEVER guess an e-mail address. Your mail will nearly always go to the wrong person.
16. WebMail (E-mail) E-mail is a mechanism for sending messages and information between computer users. Individuals are identified by their e-mail address, which is used in much the same way as a postal
Basics of Microsoft Outlook/Email. Microsoft Outlook
Basics of Microsoft Outlook/Email Microsoft Outlook Workshop Outline for Improve Your Outlook Microsoft Outlook Contents Starting the application... 3 The Outlook 2010 window... 3 Expanding and minimizing
Life after Microsoft Outlook Google Apps
Welcome Welcome to Gmail! Now that you ve switched from Microsoft Outlook to, here are some tips on beginning to use Gmail. Google Apps What s Different? Here are some of the differences you ll notice
How to make sure you receive all emails from the University of Edinburgh
How to make sure you receive all emails from the University of Edinburgh To ensure that any email from The University of Edinburgh - or any address you choose - is not automatically sent to your junk or
Webmail with. Sun Convergence
Webmail with 09 08 1 TABLE OF CONTENTS TABLE OF CONTENTS 1 2 1.1 Getting started 2 1.2 Reading E-mail 4 Sorting 4 Searching 4 Opening a message 5 Writing a message 6 1.3 Sending E-mail 6 Message with an
EFFECTIVE DATE: JULY 1, 2010
Town of Florence POLICY TITLE: EMAIL RETENTION POLICY RESPONSIBLE DEPARTMENT: Town Clerk Office APPROVAL: EFFECTIVE DATE: JULY 1, 2010 AP / RESOLUTION NO.: 2010-02 REFERENCES: TOWN MANAGER SIGNATURE: TOWN
Email Archiving E-mail Compliance Storage Management Electronic Discovery
Email Archiving E-mail Compliance Storage Management Electronic Discovery archiver Athena www.athenaarchiver.com Athena Archiver is a next-generation email and instant message archiving system which enables
Introduction. How does email filtering work? What is the Quarantine? What is an End User Digest?
Introduction The purpose of this memo is to explain how the email that originates from outside this organization is processed, and to describe the tools that you can use to manage your personal spam quarantine.
1 Accessing E-mail accounts on the Axxess Mail Server
1 Accessing E-mail accounts on the Axxess Mail Server The Axxess Mail Server provides users with access to their e-mail folders through POP3, and IMAP protocols, or OpenWebMail browser interface. The server
Practical tips for managing e mail
E MAIL MANAGEMENT E mail messages both sent and received that provide evidence of a government transaction are considered public records. Agencies and locality Records Officers must ensure that e mail
