Archiving old messages

aditsu
Hi, I added a mailing list to Nabble and finally got it to archive messages (apparently there was an issue with X-No-Archive).
However, it's only archiving new messages, and not fetching the past ones.
The list server is using ezmlm, which supports commands like (list)-get.123_145@(server) to get messages 123 through 145, among others. Nabble could use these features to build the archive, but I guess it wasn't programmed to do it.
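To make the idea concrete, here is a minimal sketch of how the command addresses for a range of messages could be generated. The function name, the example list and host, and the chunk size of 100 are assumptions for illustration only (the 100-per-email figure is what was observed later in this thread, not a documented limit); only the (list)-get.FIRST_LAST@(server) address form comes from the description above.

```python
def ezmlm_get_addresses(list_name, host, first, last, chunk=100):
    """Yield ezmlm '-get' command addresses covering messages first..last.

    chunk is the assumed maximum number of messages per request.
    """
    if first < 1 or last < first or chunk < 1:
        raise ValueError("expected 1 <= first <= last and chunk >= 1")
    start = first
    while start <= last:
        end = min(start + chunk - 1, last)
        # Address form taken from the post: (list)-get.123_145@(server)
        yield f"{list_name}-get.{start}_{end}@{host}"
        start = end + 1


# Hypothetical list name and host, for illustration only.
for address in ezmlm_get_addresses("mylist", "example.org", 1, 250):
    print(address)
# mylist-get.1_100@example.org
# mylist-get.101_200@example.org
# mylist-get.201_250@example.org
```

Sending an (empty) email to each address in turn, with a pause between requests, would then fetch the whole back catalogue.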
The only way to archive old messages that I've seen mentioned seems to be sending an mbox file to support at nabble.com. That puts the burden on the user to provide the archive himself, in an arcane format which has 4 incompatible variations. I also sent several emails to that address but never got any reply. I wouldn't want to spend many days writing code and building that pesky mbox file, only to send it and find that it went to the proverbial /dev/null.

So... any suggestions?

Re: Archiving old messages

Will <Nabble>
Administrator
Yes, this seems like a good suggestion, so we'll take a look.

But on the other hand, not all mailing list software supports this. Also, the list admins may have some security feature to block frequent requests. Nabble could get blacklisted for offering this as a general feature.

To make sure the email command works, can you retrieve those emails to your local machine and see if there is any limit?

Also, to avoid waiting, you could try to find some software that turns the emails in your email client into an mbox file.
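As a rough sketch of what such a conversion could look like, the snippet below uses Python's standard mailbox module to pack a directory of exported messages into a single mbox file. It assumes the email client can export each message as a raw .eml file; the function name and paths are hypothetical, and the result is whichever mbox variant the Python library writes by default, which may or may not be the one Nabble expects.

```python
import mailbox
from pathlib import Path


def eml_dir_to_mbox(eml_dir, mbox_path):
    """Append every *.eml file in eml_dir to an mbox file; return the count."""
    box = mailbox.mbox(str(mbox_path))  # the file is created if missing
    count = 0
    try:
        for eml in sorted(Path(eml_dir).glob("*.eml")):
            # add() accepts raw bytes; the library writes the "From "
            # separator line for each message itself.
            box.add(eml.read_bytes())
            count += 1
    finally:
        box.close()  # flushes pending changes to disk
    return count
```

Calling eml_dir_to_mbox("exported_mail", "archive.mbox") would then produce one file that can be sent along.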

Re: Archiving old messages

aditsu
I tried retrieving some old messages from the server, and they all come in one email, in digest format, up to 100 per email, like Graham said. Even if we retrieve "only" 100 messages (in one email) every hour, it should be more than enough for any mailing list.

However, more than just a digest, ezmlm also includes each email as an attachment, in HTML format (with an ezm extension). That HTML contains the main headers (Subject, From, Date, To) followed by the message body. It doesn't seem to have any message IDs though, which may be a problem. Do you guys think those headers are enough to reconstruct the archive? Perhaps with "flat" threads determined by the subject?
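To illustrate what "flat" threads by subject might mean in practice, here is a minimal sketch that groups messages by a normalized subject and orders each group by date. It assumes each message has already been parsed into a dict with "Subject" and "Date" keys, and that every Date value is a standard email date with a timezone offset; the function names and sample data are made up for the example.

```python
import re
from collections import defaultdict
from email.utils import parsedate_to_datetime

# Leading reply/forward markers such as "Re:", "RE:", "Fwd:", "Re[2]:".
_PREFIX = re.compile(r"^\s*((re|fwd?)(\[\d+\])?:\s*)+", re.IGNORECASE)


def thread_key(subject):
    """Normalize a subject so replies share a key with the original post."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()


def flat_threads(messages):
    """Return {normalized subject: [messages sorted by Date]}."""
    threads = defaultdict(list)
    for msg in messages:
        threads[thread_key(msg["Subject"])].append(msg)
    for msgs in threads.values():
        msgs.sort(key=lambda m: parsedate_to_datetime(m["Date"]))
    return dict(threads)


# Made-up sample data, for illustration only.
sample = [
    {"Subject": "Re: Archiving old messages", "From": "will",
     "Date": "Tue, 02 Jan 2007 10:00:00 +0000"},
    {"Subject": "Archiving old messages", "From": "aditsu",
     "Date": "Mon, 01 Jan 2007 09:00:00 +0000"},
]
for key, msgs in flat_threads(sample).items():
    print(key, [m["From"] for m in msgs])
# archiving old messages ['aditsu', 'will']
```

Without message IDs this cannot tell apart two unrelated threads that happen to share a subject, so it is only an approximation of real threading.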

Besides the ezm attachments, *some of* the message bodies are also included separately, as html attachments, with some kind of numeric IDs for the file name. I haven't figured out yet how it chooses which messages to include like that. Perhaps plain text vs html in the original message?

As for converting emails from my email client: first, while I probably have all the emails I really care about, it would be nice to get the complete archive (which I don't have). Second, the emails are in my Yahoo account; I'd have to run fetchyahoo or something, and I wonder whether it can retrieve them in their original form. And third, I may have deleted some.

Anyway, let me know what you think about my findings above.

Thanks
Adrian