Diffstat (limited to 'README.md')
-rw-r--r-- README.md | 68
1 files changed, 68 insertions, 0 deletions
diff --git a/README.md b/README.md
index dd59a16..abd0714 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,74 @@ USAGE
rake reset
cat postfix.log.1 | bin/parse-email-logs
+DB NOTES
+============================================
+
+There is one record for each message delivery. This means that there might be
+many records with duplicate queue_ids. Some messages never get successfully
+delivered, and have a status of 'deferred' or 'bounced'.
+
+Many messages never show up in the db, such as those rejected because they
+contained viruses or were blocked by RBLs.
+
+Email addresses are cleaned and then hashed using HMAC. For example:
+
+ 1. Elijah <elijah@riseup.net>
+ 2. elijah@riseup.net
+ 3. 31b8edad2227cc37ecead62bb14dcfe9@ff437a33d77574732ae1e09add6cfe49
+
+The username and the domain parts are hashed separately.
+
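The cleaning and hashing steps above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the key (`SECRET`), the helper names, the lowercasing step, and the choice of SHA256 are all assumptions. The README does not name the digest, and the 32-hex-character parts in the example suggest a shorter one, so the output here is longer than the sample.

```ruby
require 'openssl'

# Hypothetical key; the real secret is not given in the README.
SECRET = 'example-secret'

# Step 1 -> 2: reduce "Elijah <elijah@riseup.net>" to "elijah@riseup.net".
# Lowercasing is an assumption about what "cleaned" means.
def clean_address(raw)
  addr = raw[/<([^>]+)>/, 1] || raw
  addr.strip.downcase
end

# HMAC a single part of the address and return it as hex.
# SHA256 is a placeholder digest, not necessarily the one the project uses.
def hmac_hex(part)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), SECRET, part.to_s)
end

# Step 2 -> 3: hash the username and the domain separately, keeping the "@".
def hash_address(raw)
  user, domain = clean_address(raw).split('@', 2)
  "#{hmac_hex(user)}@#{hmac_hex(domain)}"
end
```

Because the two parts are hashed separately, two addresses at the same domain share the same hashed domain part, so records can still be grouped by domain without revealing it.
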
+The fields:
+
+* id: sequence number. ignore it.
+
+* message_id: a hash of the actual message id in the headers. It might be empty
+ or missing.
+
+* queue_id: the id assigned to this delivery by postfix. Messages with many
+ recipients might be spread across multiple queue_ids.
+
+* first_seen_at: the first time a log entry with this queue_id appeared in the
+ logs.
+
+* date: the actual "Date" header.
+
+* sent_at:
+ * for incoming: the date header
+ * for outgoing: first_seen_at
+
+* received_at:
+ * for incoming: first_seen_at
+ * for outgoing: when the MX server logs status=sent
+
+* sender: the envelope sender, hashed
+
+* recipient: the envelope recipient, hashed.
+
+* from, to, cc, bcc: the addresses in the respective headers, hashed.
+
+* message_size: the byte size of the entire message
+
+* spam_score: not currently gathered
+
+* subject_size: the number of characters in the "Subject" header
+
+* is_list: true if the message was sent by a mailing list
+
+* is_outgoing: true if the message is outgoing
+
+* re_message_id: the message id of the message that this one is a reply to
+
+* status: one of deferred, bounced, or sent. You can ignore all messages that
+ are not "sent". Deferred messages might later get delivered, so we keep
+ these records when scanning the logs.
+
+* delay: the total time in seconds, as logged by postfix, from when the message
+ entered the queue until this delivery attempt finished.
+
+* delays: the same delay broken down by stage, which postfix logs as a/b/c/d.
+
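As a rough illustration of how the sent_at, received_at, and status fields described above relate to each other, here is a small sketch. The `Delivery` struct and the `status_sent_at` member (standing in for the time the MX server logs status=sent) are hypothetical names for this example, not the project's schema or code.

```ruby
# Hypothetical in-memory stand-in for one db record.
Delivery = Struct.new(:queue_id, :status, :is_outgoing, :date,
                      :first_seen_at, :status_sent_at, keyword_init: true)

# sent_at: the Date header for incoming mail, first_seen_at for outgoing.
def sent_at(d)
  d.is_outgoing ? d.first_seen_at : d.date
end

# received_at: first_seen_at for incoming mail, the status=sent time for outgoing.
def received_at(d)
  d.is_outgoing ? d.status_sent_at : d.first_seen_at
end

# Keep only deliveries that actually succeeded.
def delivered(records)
  records.select { |d| d.status == 'sent' }
end
```

For analysis, `delivered(records)` drops the deferred and bounced rows, which matches the advice to ignore anything that is not "sent".
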
NOTES
============================================