From 7b8c0d723e635f7a8cde3ceb4425426528ac8240 Mon Sep 17 00:00:00 2001
From: elijah
Date: Thu, 26 May 2016 11:19:05 -0700
Subject: added support for processing message headers

---
 README.md | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 (limited to 'README.md')

diff --git a/README.md b/README.md
index dd59a16..abd0714 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,74 @@ USAGE
   rake reset
   cat postfix.log.1 | bin/parse-email-logs
 
+DB NOTES
+============================================
+
+There is one record for each message delivery. This means that there might be
+many records with duplicate queue_ids. Some messages never get successfully
+delivered, and have status of 'deferred' or 'bounced'.
+
+A lot of messages never show up in the db, such as those rejected because they
+had viruses or blocked by RBLs.
+
+Emails addresses are cleaned and then hashed using HMAC. For example:
+
+  1. Elijah
+  2. elijah@riseup.net
+  3. 31b8edad2227cc37ecead62bb14dcfe9@ff437a33d77574732ae1e09add6cfe49
+
+The username and the domain parts are hashed separately.
+
+The fields:
+
+* id: sequence number. ignore it.
+
+* message_id: a hash of the actual message id in the headers. It might be empty
+  or missing.
+
+* queue_id: the id assigned to this delivery by postfix. messages with many
+  recipients might be spread across multiple queue_ids
+
+* first_seen_at: the first time a log entry with this queue_id appeared in the
+  logs.
+
+* date: the actual "Date" header.
+
+* sent_at:
+  * for incoming: the date header
+  * for outgoing: first_seen_at
+
+* received_at:
+  * for incoming: first_seen_at
+  * for outgoing: when the mx server logs status=sent
+
+* sender: the envelope sender, hashed
+
+* recipient: the envelope recipient, hashed.
+
+* from, to, cc, bcc: the addresses in respective headers, hashed.
+
+* message_size: the byte size of the entire message
+
+* spam_score: not currently gathered
+
+* subject_size: the number of characters in the "Subject" header
+
+* is_list: true if message was sent by a mailing list
+
+* is_outgoing: true if the message is outgoing
+
+* re_message_id: message id of another message that this message is in reply to
+
+* status: one of deferred, bounced, or sent. You can ignore all messages that
+  are not "sent". deferred messages might later get delivered, so we keep
+  these records when scanning the logs.
+
+* delay: I am not sure exactly what this is, but postfix logs it and it seems
+  interesting.
+
+* delays: again, not sure exactly what it is.
+
 NOTES
 ============================================
 
--
cgit v1.2.3
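
Review note: the address handling described under DB NOTES (clean the address, then HMAC the username and the domain separately) could be sketched roughly as below. This is not the project's code. The key, the SHA256 digest, the default domain, and the helper names `clean_address`, `hmac_hex`, and `hash_address` are all assumptions made for illustration; the README does not say which digest or cleaning rules are used, and its 32-character example hashes suggest a different or truncated digest.

```ruby
require 'openssl'

# Hypothetical values: the real key, digest, and default domain are not
# stated in the README.
SECRET_KEY     = 'example-secret'
DEFAULT_DOMAIN = 'riseup.net'

# Assumed cleaning step: trim whitespace, downcase, and append a default
# domain when the address has none (mirrors "Elijah" -> "elijah@riseup.net").
def clean_address(raw)
  address = raw.strip.downcase
  address.include?('@') ? address : "#{address}@#{DEFAULT_DOMAIN}"
end

# HMAC a single part of the address and return it as lowercase hex.
def hmac_hex(part)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), SECRET_KEY, part)
end

# Hash the username and the domain separately, keeping the "@" separator.
def hash_address(raw)
  user, domain = clean_address(raw).split('@', 2)
  "#{hmac_hex(user)}@#{hmac_hex(domain)}"
end
```

Hashing the two parts separately means all addresses at the same domain share the same domain hash, so records can still be grouped by domain without exposing it.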