I tried retrieving some old messages from the server, and they all come in one email, in digest format, up to 100 per email, as Graham said. Even if we retrieve "only" 100 messages (in one email) every hour, that should be more than enough for any mailing list.
However, more than just a digest, ezmlm also includes each email as an attachment, in HTML format (with an ezm extension). That HTML contains the main headers (Subject, From, Date, To) followed by the message body. It doesn't seem to have any message IDs though, which may be a problem. Do you guys think those headers are enough to reconstruct the archive? Perhaps with "flat" threads determined by the subject?
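To make the "flat threads by subject" idea concrete, here is a rough sketch in Python. It assumes each message has already been parsed into a dict with "Subject" and "Date" keys, and that the Date values are in the usual email date format; the function names are made up for illustration. It just strips leading "Re:"/"Fwd:" markers, groups by what is left, and sorts each group by date.

```python
import re
from collections import defaultdict
from email.utils import parsedate_to_datetime

# Matches a leading run of reply/forward markers such as "Re:", "RE: Re:", "Fwd:".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject):
    """Drop reply/forward prefixes, collapse whitespace, and lowercase."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()

def flat_threads(messages):
    """Group messages (dicts with 'Subject' and 'Date') into flat threads.

    Returns a dict mapping the normalized subject to a date-sorted list.
    Assumption: every Date header parses with parsedate_to_datetime.
    """
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg["Subject"])].append(msg)
    for msgs in threads.values():
        # Sort on the POSIX timestamp so differing time zones compare correctly.
        msgs.sort(key=lambda m: parsedate_to_datetime(m["Date"]).timestamp())
    return dict(threads)
```

Without message IDs this cannot tell apart two unrelated threads that happen to share a subject, and it will split a thread if someone edits the subject, but it may be good enough for browsing.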
Besides the ezm attachments, *some of* the message bodies are also included separately, as HTML attachments, with some kind of numeric ID as the file name. I haven't figured out yet how it chooses which messages to include like that. Perhaps plain text vs. HTML in the original message?
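For anyone who wants to poke at a digest themselves, something like the following Python sketch should separate the two kinds of attachments. It uses only the standard library email package. The assumptions are mine: that the digest is available as raw bytes, and that the per-message attachments really carry a file name ending in ".ezm" in their MIME headers. The function name is hypothetical.

```python
from email import message_from_bytes, policy

def split_attachments(raw_digest):
    """Split a digest's named attachments into (.ezm parts, everything else).

    Each list holds (filename, payload_bytes) tuples.
    Assumption: the per-message attachments have a filename ending in '.ezm'.
    """
    msg = message_from_bytes(raw_digest, policy=policy.default)
    ezm_parts, other_parts = [], []
    for part in msg.walk():
        name = part.get_filename()
        # Skip containers and parts without a file name (e.g. the digest body).
        if not name or part.is_multipart():
            continue
        data = part.get_payload(decode=True)  # decoded bytes of this part
        if name.lower().endswith(".ezm"):
            ezm_parts.append((name, data))
        else:
            other_parts.append((name, data))
    return ezm_parts, other_parts
```

Comparing the two lists for a few digests might show which original messages end up with a separate numeric-named HTML attachment.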
And about converting emails from my email client: first, while I probably have all the emails I really care about, it would be nice to get the complete archive (which I don't have). Second, the emails are in my Yahoo account; I'd have to run fetchyahoo or something, and I wonder if it can retrieve them in their original form. And third, I may have deleted some.
Anyway, let me know what you think about my findings above.
Thanks
Adrian