Discussion:
Fastest way to inject a lot of mail?
John Levine
2024-03-05 21:14:44 UTC
One of my clients has an application that builds several hundred customized
messages reporting what changed in the past day, and sends each one to a
list of people who have subscribed to it. (This isn't spam, they complain
when they don't get it.)

We currently send the mail by putting all the recipients on the bcc:
line and running /usr/sbin/sendmail -t and feeding it the message
through a pipe. By the time all the messages are done this takes a
while. Is there a faster way to do it? SMTP to 127.0.0.1? LMTP?
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Grant Taylor
2024-03-06 01:31:36 UTC
Hi John,
Post by John Levine
One of my clients has an application that builds several hundred
customized messages reporting what changed in the past day, and sends
each one to a list of people who have subscribed to it. (This isn't
spam, they complain when they don't get it.)
;-)
Post by John Levine
line and running /usr/sbin/sendmail -t and feeding it the message
through a pipe. By the time all the messages are done this takes a
while. Is there a faster way to do it? SMTP to 127.0.0.1? LMTP?
Please clarify -- I'm trying to understand / confirm -- are you sending
to multiple envelope recipients per customized report? Or are you sending
individual messages per recipient?

Also, what delivery mode are you using? Queued / interactive? (That
might not be the proper nomenclature.)
--
Grant. . . .
Marco Moock
2024-03-06 09:51:37 UTC
Post by John Levine
line and running /usr/sbin/sendmail -t and feeding it the message
through a pipe. By the time all the messages are done this takes a
while. Is there a faster way to do it? SMTP to 127.0.0.1? LMTP?
Can you find the reason for that?
You can check the logs to see when the mail from the MSP reached the MTA.
Are milters in use?
--
kind regards
Marco

Send spam to ***@cartoonies.org
John Levine
2024-03-06 20:59:04 UTC
Post by Marco Moock
Post by John Levine
line and running /usr/sbin/sendmail -t and feeding it the message
through a pipe. By the time all the messages are done this takes a
while. Is there a faster way to do it? SMTP to 127.0.0.1? LMTP?
Can you find the reason for that?
Yeah, because it's doing a lot of work sending tens of thousands of
messages. There are no milters on the system where they're injected.
They go through a smarthost with a DKIM signing milter but it seems
plenty fast.

In answer to another question, the number of recipients per message
varies but is typically between 10 and 50. At some point we should
redo it to do individual deliveries so we can customize them more
but not any time soon.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
HQuest
2024-03-08 00:59:38 UTC
Aside from the "TUNING" reading recommended by Claus, your workflow as I read it seems inefficient. You prepare the mail, fire up a sendmail MSA instance (just to wrap the message with proper mail headers), hand it off to another sendmail MTA or whatever (to add DKIM headers), and move it forward. If you already have a trusted MTA elsewhere, why not deliver the message straight into that MTA from the application itself?

If you are lazy (as I am), a rudimentary, poorly written, very insecure and extremely lazy bash script can do the job:
echo "${mail_message_complete_with_envelope_headers_and_ehlo}" > "/dev/tcp/$smtpsrv/$smtpport"

This assumes you trust the path enough to drop the authentication and STARTTLS steps to save precious CPU cycles. The DKIM header will still be added by the MTA, but by not spinning up multiple MSAs, each loading its dynamic libraries, you save quite some time getting messages moving.

Or if you really want to use sendmail as MSA because authentication/TLS/reasons, keep one running, and deliver the email message to it via TCP/IP. A fork() is always much faster than a full load.
Claus Aßmann
2024-03-08 05:36:21 UTC
Post by HQuest
echo ${mail_message_complete_with_envelope_headers_and_ehlo} >
/dev/tcp/$smtpsrv/$smtpport
That might fail due to "unauthorized PIPELINING".
HQuest
2024-03-08 12:48:58 UTC
Post by Claus Aßmann
That might fail due to "unauthorized PIPELINING".
My cron scripts and/or sendmail.cf would beg to differ, but my point is that it is much superior to deliver a message to a running MSA/MTA than spinning up a new copy of sendmail for every message to be delivered.
Claus Aßmann
2024-03-08 20:04:43 UTC
Post by HQuest
Post by Claus Aßmann
That might fail due to "unauthorized PIPELINING".
My cron scripts and/or sendmail.cf would beg to differ, but my point is
Do you run 8.18? Did you disable the extra checks?
What happens if your MTA is "too busy" (421) or replies with some
error to one of the commands?
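
The concern about 421 and other error replies comes down to the SMTP reply-code classes: a client that reads each reply can tell success from a temporary or permanent failure, whereas a script that writes the whole dialogue to the socket without reading anything back never finds out. A minimal sketch of that classification in Python follows; the function names are made up for illustration and are not part of sendmail or any library.

```python
def reply_code(line: str) -> int:
    """Extract the three-digit code from a reply line such as '421 4.3.2 busy'."""
    return int(line[:3])


def classify_reply(code: int) -> str:
    """Map an SMTP reply code to what the injecting client should do next."""
    if 200 <= code < 400:
        return "ok"      # 2yz: completed; 3yz: go ahead (e.g. 354 after DATA)
    if 400 <= code < 500:
        return "retry"   # transient failure, e.g. 421: try again later
    if 500 <= code < 600:
        return "fail"    # permanent failure: do not retry unchanged
    raise ValueError(f"not an SMTP reply code: {code}")
```

An injector built this way would requeue a message on "retry" and report it on "fail", instead of silently losing it.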
Post by HQuest
that it is much superior to deliver a message to a running MSA/MTA than
spinning up a new copy of sendmail for every message to be delivered.
That might be the case, but AFAICT it does not apply to the way the
OP is submitting mails (a single mail with a large list of recipients).
--
Note: please read the netiquette before posting. I will almost never
reply to top-postings which include a full copy of the previous
article(s) at the end because it's annoying, shows that the poster
is too lazy to trim his article, and it's wasting the time of all readers.
Grant Taylor
2024-03-07 01:30:17 UTC
Post by John Levine
By the time all the messages are done this takes a while.
Please quantify "all the messages" and "takes a while".

How many messages (SMTP envelopes)?

How long does it take (seconds / minutes / hours / days)?
--
Grant. . . .
Claus Aßmann
2024-03-07 06:06:35 UTC
Post by John Levine
Is there a faster way to do it? SMTP to 127.0.0.1? LMTP?
First you need to identify the bottleneck(s),
then you can work on solutions.

BTW: did you read the fine documentation?
(hint: "TUNING"...)
--
Note: please read the netiquette before posting. I will almost never
reply to top-postings which include a full copy of the previous
article(s) at the end because it's annoying, shows that the poster
is too lazy to trim his article, and it's wasting the time of all readers.
John Levine
2024-03-09 00:25:28 UTC
Post by Claus Aßmann
Post by John Levine
Is there a faster way to do it? SMTP to 127.0.0.1? LMTP?
First you need to identify the bottleneck(s),
then you can work on solutions.
Well, yeah, that's why I was wondering whether running the sendmail program
is likely to be slow.
Post by Claus Aßmann
BTW: did you read the fine documentation?
(hint: "TUNING"...)
I did and unless I missed something, it says nothing about injecting
mail via the sendmail command other than the obvious thing that you
want to queue rather than delivering synchronously.

So here's a question: I have on the order of 10,000 messages, each
with a dozen or so recipients. It's currently running the sendmail
command for each one. If I opened a connection to 127.0.0.1 and
did a sequence of MAIL FROM/RCPT TO/DATA, would that be faster? How
about if I did it with N processes in parallel for some modest N? It
currently takes about 6 hours on a moderately fast VPS.
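
The "one connection, a sequence of MAIL FROM/RCPT TO/DATA" idea could be tried with a few lines of Python's standard smtplib. This is only a sketch under assumptions: that an MTA listens on 127.0.0.1 port 25 and accepts mail from localhost without authentication, and that each job is an (envelope sender, recipient list, full message text) tuple. The names `Job`, `inject_all` and `inject_via_localhost` are hypothetical. Whether it actually beats running the sendmail command depends on where the bottleneck is.

```python
import smtplib
from typing import Iterable, List, Tuple

# (envelope sender, envelope recipients, full message text including headers)
Job = Tuple[str, List[str], str]


def inject_all(smtp, jobs: Iterable[Job]) -> List[Tuple[int, str]]:
    """Send every job over one already-open SMTP session.

    `smtp` is anything with smtplib.SMTP's sendmail() method. Each call is
    one MAIL FROM / RCPT TO / DATA transaction on the same connection.
    Returns (job index, error text) for the jobs that raised an error.
    """
    failures = []
    for index, (sender, recipients, text) in enumerate(jobs):
        try:
            smtp.sendmail(sender, recipients, text)
        except smtplib.SMTPException as exc:
            failures.append((index, str(exc)))
    return failures


def inject_via_localhost(jobs: Iterable[Job]) -> List[Tuple[int, str]]:
    # Assumption: a local MTA on 127.0.0.1:25 that relays for localhost.
    with smtplib.SMTP("127.0.0.1", 25) as smtp:
        return inject_all(smtp, jobs)
```

Running N copies of `inject_via_localhost`, each with its own slice of the jobs, would cover the "N processes in parallel" variant. Note that if the server drops the connection, the remaining jobs in that session will all show up as failures and need to be retried.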

If nobody has any idea, OK, but it's hard to believe I'm the first person
ever to wonder about this.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Grant Taylor
2024-03-09 01:39:50 UTC
Post by John Levine
So here's a question: I have on the order of 10,000 messages, each
with a dozen or so recipients.
That's quite a few discrete messages.
Post by John Levine
It currently takes about 6 hours on a moderately fast VPS.
Rough math, that's a little over 2 seconds per message.
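
Spelled out, using the figures given in the thread (about 10,000 messages in about 6 hours, a dozen or so recipients each); the variable names are just for illustration:

```python
messages = 10_000            # "on the order of 10,000 messages"
total_seconds = 6 * 60 * 60  # "about 6 hours"
recipients_per_message = 12  # "a dozen or so"

per_message = total_seconds / messages
per_recipient = per_message / recipients_per_message

print(f"{per_message:.2f} s per message")      # 2.16 s per message
print(f"{per_recipient:.2f} s per recipient")  # 0.18 s per recipient
```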

On one hand that seems a little slow, but on the other hand, maybe not.

How big are the messages? There's a big difference if it's a few kB of
text vs multiple MB of attachments.

Depending on the VPS and the disk(s) backing it, I could see how this
may be a disk I/O performance issue. This seems especially germane on a
VPS which is likely shared and may have disk I/O throttling.

I'd suggest looking at this from an OS performance perspective.

If it's Linux, `iostat -x 1` or `sar` or `nmon` are good candidates.

I don't remember, are there any milters in Sendmail?

What are you using for the DNS server? Is it local to the system or are
you dependent on something across the network. If it's across the
network, how far across the network is it?

Are there any errors in any logs?

I would naively think that Sendmail itself could handle messages quite a
bit faster. But I'm probably thinking about SMTP interface vs command
line forking.
--
Grant. . . .
John Levine
2024-03-09 03:19:52 UTC
Post by Grant Taylor
Post by John Levine
So here's a question: I have on the order of 10,000 messages, each
with a dozen or so recipients.
That's quite a few discrete messages.
Post by John Levine
It currently takes about 6 hours on a moderately fast VPS.
Rough math, that's a little over 2 seconds per message.
On one hand that seems a little slow, but on the other hand, maybe not.
How big are the messages? There's a big difference if it's a few kB of
text vs multiple MB of attachments.
Not large, plain text, maybe 10K.
Post by Grant Taylor
If it's Linux, `iostat -x 1` or `sar` or `nmon` are good candidates.
I don't remember, are there any milters in Sendmail?
Not on this machine.
Post by Grant Taylor
What are you using for the DNS server? Is it local to the system or are
you dependent on something across the network. If it's across the
network, how far across the network is it?
I'll have to check but I believe there's a local cache on the LAN. It's
sending it all to a smarthost so I wouldn't expect a lot of DNS traffic.
Post by Grant Taylor
Are there any errors in any logs?
No, it all works, just not terribly fast.
Post by Grant Taylor
I would naively think that Sendmail itself could handle messages quite a
bit faster. But I'm probably thinking about SMTP interface vs command
line forking.
Right. Hey, here's a question: if I injected the mail via SMTP to
127.0.0.1, would that be faster than forking and running sendmail?
Slower? Or am I the first person in sendmail's 35 year history to ask
this question?
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Grant Taylor
2024-03-09 03:32:01 UTC
Post by John Levine
Not large, plain text, maybe 10K.
ACK
Post by John Levine
I'll have to check but I believe there's a local cache on the LAN.
It's sending it all to a smarthost so I wouldn't expect a lot of
DNS traffic.
I'm inclined to agree with you. But I don't know what type of DNS
queries Sendmail might be doing. I think that a sniffer would answer
that in short order.
Post by John Levine
No, it all works, just not terribly fast.
ACK

Have you considered wrapping your call to the sendmail binary in time
and seeing how long things are taking?
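
If the application is easier to instrument than the shell, the same measurement can be taken around the call itself. A rough sketch in Python, assuming the message is already available as bytes and that the injector is the `/usr/sbin/sendmail -t` command mentioned at the start of the thread; `timed_inject` is a made-up helper name.

```python
import subprocess
import time


def timed_inject(message: bytes, cmd=("/usr/sbin/sendmail", "-t")):
    """Pipe one message to `cmd`; return (elapsed seconds, exit status)."""
    start = time.perf_counter()
    proc = subprocess.run(
        list(cmd),
        input=message,               # fed to the command on stdin
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, proc.returncode
```

Logging the elapsed time for every message, rather than only the total, would also show whether most injections are fast and a few are very slow.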

Is there a chance that the vast majority are very fast and after some
threshold something slows down considerably for a period of time?

Dare I say it, this is where more data tends to help.
Post by John Levine
Right. Hey, here's a question: if I injected the mail via SMTP to
127.0.0.1, would that be faster than forking and running sendmail?
Slower? Or am I the first person in sendmail's 35 year history to
ask this question?
I would be flabbergasted if you are the first to ask this. I suspect
many -> most that have are not paying attention to this newsgroup to answer.
--
Grant. . . .
Claus Aßmann
2024-03-09 06:22:03 UTC
Post by John Levine
Post by Grant Taylor
What are you using for the DNS server? Is it local to the system or are
I'll have to check but I believe there's a local cache on the LAN. It's
sending it all to a smarthost so I wouldn't expect a lot of DNS traffic.
Seems like a wrong expectation - or did you turn off DNS lookups?
It's explained in the fine documentation mentioned earlier:
* DNS Lookups

If it's one mail with lots of addresses: all of this is done
sequentially in one process - so hopefully all the data is in a
local cache.
--
Note: please read the netiquette before posting. I will almost never
reply to top-postings which include a full copy of the previous
article(s) at the end because it's annoying, shows that the poster
is too lazy to trim his article, and it's wasting the time of all readers.
John Levine
2024-03-09 19:38:25 UTC
Post by Claus Aßmann
Post by John Levine
I'll have to check but I believe there's a local cache on the LAN. It's
sending it all to a smarthost so I wouldn't expect a lot of DNS traffic.
Seems like a wrong expectation - or did you turn off DNS lookups?
* DNS Lookups
Ah, it's hiding in the TUNING file. I suppose I can turn off the
canonify stuff. The DNS caches are on the same LAN with ping times
under a millisecond so cache location is not likely to be a problem,
but the addresses in the list should all be real ones.

Does sendmail really replace CNAMEs in recipient host names? That's
been deprecated for 25 years.

I must say it's pretty impressive that sendmail's internal structure is so
opaque that nobody has any idea whether running the sendmail program is
likely to be faster or slower than TCP submission.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Andrzej Adam Filip
2024-03-09 07:29:22 UTC
Post by John Levine
Post by Claus Aßmann
Post by John Levine
Is there a faster way to do it? SMTP to 127.0.0.1? LMTP?
First you need to identify the bottleneck(s),
then you can work on solutions.
Well, yeah, that's why I was wondering whether running the sendmail program
is likely to be slow.
Post by Claus Aßmann
BTW: did you read the fine documentation?
(hint: "TUNING"...)
I did and unless I missed something, it says nothing about injecting
mail via the sendmail command other than the obvious thing that you
want to queue rather than delivering synchronously.
So here's a question: I have on the order of 10,000 messages, each
with a dozen or so recipients. It's currently running the sendmail
command for each one. If I opened a connection to 127.0.0.1 and
did a sequence of MAIL FROM/RCPT TO/DATA, would that be faster? How
about if I did it with N processes in parallel for some modest N? It
currently takes about 6 hours on a moderately fast VPS.
If nobody has any idea, OK, but it's hard to believe I'm the first person
ever to wonder about this.
Hints for injecting a few thousand emails into a local sendmail:
1. Inject multiple messages over the same SMTP connection
to avoid needless forking of local sendmail processes.
2. Group recipients by recipient domain
to reduce the number of outgoing SMTP sessions.
3. Use parallel SMTP injections with the VERB SMTP command
to avoid merely queuing due to high local system load.
VERB turns on "sequential delivery" with delivery progress reporting,
so your parallel submissions won't merely put messages in the queue for
later delivery. It may require allowing VERB from 127.0.0.1 *ONLY*.
Start with a few SMTP sessions. Consider increasing slowly to a few
dozen.
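
Hints 2 and 3 can be sketched in Python along these lines. All names here are hypothetical, and `send_chunk` stands for whatever function opens one SMTP session and injects the messages it is given; the VERB command itself is not shown.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_domain(recipients):
    """Hint 2: bucket recipient addresses by their (lower-cased) domain."""
    groups = defaultdict(list)
    for addr in recipients:
        _local, at, domain = addr.rpartition("@")
        groups[domain.lower() if at else ""].append(addr)
    return dict(groups)


def split_round_robin(items, n):
    """Deal items into n lists, one list per parallel SMTP session."""
    return [items[i::n] for i in range(n)]


def inject_parallel(jobs, n_sessions, send_chunk):
    """Hint 3: run send_chunk(chunk) for each chunk in its own thread."""
    chunks = [c for c in split_round_robin(list(jobs), n_sessions) if c]
    with ThreadPoolExecutor(max_workers=len(chunks) or 1) as pool:
        return list(pool.map(send_chunk, chunks))
```

Starting with a small `n_sessions` and raising it gradually matches the advice above to begin with a few sessions.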

AFAIR the Sympa mailing list manager provides some useful hints.
Queuing fast and sending out fast are not the same thing for mass mailing.

Anyway: expect a few surprises from the anti-spam measures of receiving
servers, fresh ones *too*. Short advice in a Usenet message is bound to be
incomplete, so do expect a few nasty surprises.

About the lack of recipes: people tend to avoid publishing recipes that are
too easy, since (incompetent) spammers could use them as well.
--
[Andrew] Andrzej A. Filip
Claus Aßmann
2024-03-09 08:38:01 UTC
Post by John Levine
Post by Claus Aßmann
(hint: "TUNING"...)
I did and unless I missed something, it says nothing about injecting
You are looking for an answer to one specific question - but maybe
your question does not address the actual problem?

As others have told you: if you don't know what's "slow", you won't
be able to solve the problem -- except maybe by trying different things.

So you could just try your alternative (one SMTP session, multiple
transactions) to see what happens.

PS: someone once complained that some MTA was slow sending mail
until I asked them about the actual data... which showed their
(outgoing) internet bandwidth was completely used by the MTA (because
they sent one RCPT per TA with "lots" of RCPTs and large mails).
That is, unless you know which bottleneck is actually "hit"
it's hard to tell what to do differently...
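
To see how one RCPT per transaction can eat bandwidth, here is a back-of-the-envelope calculation. It borrows the figures from elsewhere in this thread purely as an illustration (they are not from the anecdote above), treats "10K" as 10,000 bytes, counts message bodies only, and assumes all recipients of a message could share one transaction on the hop in question.

```python
messages = 10_000        # "on the order of 10,000 messages"
rcpts_per_message = 12   # "a dozen or so recipients"
body_bytes = 10_000      # "maybe 10K", taken here as 10,000 bytes

# One transaction per recipient: the body crosses the wire once per recipient.
one_rcpt_per_transaction = messages * rcpts_per_message * body_bytes
# One transaction per message: the body crosses the wire once.
one_transaction_per_message = messages * body_bytes

print(one_rcpt_per_transaction)     # 1200000000 bytes, about 1.2 GB
print(one_transaction_per_message)  # 100000000 bytes, about 100 MB
```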
--
Note: please read the netiquette before posting. I will almost never
reply to top-postings which include a full copy of the previous
article(s) at the end because it's annoying, shows that the poster
is too lazy to trim his article, and it's wasting the time of all readers.