ASP.NET Weblogs

Extreme JS

JS Greenwood's WebLog on architecture, .NET, processes, and life...

Another solution to spam

My previous solution to spam, detailed in another entry, was based on how I, as a service-oriented technical architect, naturally approach such problems.  Having had a bit of a think about it, and a bit of enlightenment that money isn't the only currency (CPU time, for one, is a currency of sorts), I've come up with a new zero-infrastructure solution:

  1. A plugin is developed for mail clients such as Outlook, Eudora, etc.  This is installed on all client machines
  2. This plugin is activated whenever an item is sent.  It takes a subset of:
    • The sender's e-mail address
    • The recipient's e-mail address
    • The subject line
    • The date-time stamp
  3. Using this data, it computes some mathematically intensive function that results in a value.  This should take a few seconds (of background processing) to compute.  The function doesn't involve public/private keys - it would be a freely available algorithm that simply takes on the order of 2 seconds to come up with a value.
  4. This value is appended to the outgoing mail as a header - this is the signature
  5. --- The mail is transmitted and received by the recipient ---
  6. A plugin in the recipient's mail-reader intercepts this header
  7. The function chosen for the computation must allow the "correctness" of the value to be determined within a fraction of a second rather than several seconds (there are formulae like this - I just can't remember them).  Again, this algorithm would be freely available
  8. The validity of the header determines the validity of the e-mail

Rather than creating and validating the signature at the client, certain mail-servers could do this - both inbound and potentially outbound (from certain trusted servers).  Individual users could set up rules as to whether or not they accept unsigned mails.

Why's this solution good?
Basically, it's realistically free for everyone and requires no infrastructure.  If there's one thing that all the file-sharing applications have proven, it's that de-centralised peer-to-peer systems can thrive.  In terms of implementation, this solution would take minimal time to develop as a plugin for mail clients - its triviality will lend itself to freeware implementations, leading to mail clients including it in the long term.  The fact that an e-mail takes an extra few seconds to send in the background wouldn't affect a normal user, but it would make sending signed bulk-mailings prohibitively expensive.  Companies that send genuine bulk mail-shots could just be added to an allow-list on an ad-hoc basis (e.g. when you sign up to a mailing list).
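To put a rough number on "prohibitively expensive", here is a back-of-the-envelope calculation. It assumes the 2 seconds per message mentioned above and a hypothetical run of 10 million messages on a single machine; the function name `cpu_days` is made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def cpu_days(messages: int, seconds_per_message: float) -> float:
    """Total single-CPU time, in days, needed to sign a batch of messages."""
    return messages * seconds_per_message / SECONDS_PER_DAY


# One ordinary user sending 50 mails a day spends under two minutes of CPU.
normal_user_seconds = 50 * 2

# A bulk sender signing 10 million mails needs about 231 CPU-days.
bulk_sender_days = cpu_days(10_000_000, 2)
```

The same per-message cost that is invisible to an individual adds up to months of machine time for a bulk sender, unless the work is spread over many machines.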

Thoughts?

Published Jul 23 2004, 01:15 AM by jsgreenwood

Comments

 

matt said:

http://www.wired.com/news/politics/0,1283,34520,00.html

It's nice to see these ideas are around - just needs some big player to start backing it.
July 22, 2004 8:24 PM
 

AndrewSeven said:

The problem I can see is: as processing power becomes more and more available, the 10 second time span will shrink, eventually reducing it to an acceptable (to spammers) time span.

On a side note, the first thing that jumped up at me was "zero-cost", I don't see how you can call this approach zero cost.
-Someone must develop the plug-ins
-Someone must invent the algorithm
-The delay/processing overhead of executing the algorithm must be based on using 100% of the available CPU of the fastest desktop (server?) machines available, which means that the machine's user can do no other work.




July 22, 2004 8:40 PM
 

Wim Hollebrandse said:

Hmm....

What about the BCC field?

E.g. what happens when the spammer calculates the particular signature once over the sender's e-mail, recipient (which can remain the same if real recipient is in BCC list), and then appends this perfectly correct 'signature' value to 10 million e-mails?

Regarding the datetime value, can't that be set to a constant value also, and the e-mail item then written to some spool directory to be picked up by the mail server?

Just thinking of how a spammer might approach and bypass this....

Cheers,
Wim
July 22, 2004 8:47 PM
 

JS Greenwood said:

Andrew,
The Moore's Law problem is something that would need to be overcome - but increasing the complexity of the calculation (which'll probably be some hash function) every few years wouldn't be a complex operation.
1. I could probably think back to all my maths lectures and come up with a valid algorithm, etc. in a couple of hours.
2. As for implementation - it's so trivial, it'd be freeware. I imagine I could develop an Outlook plug-in for this over a weekend.
3. Simple - launch the computation as a separate low-priority process/thread, allowing ongoing work

The main problem I actually see is in mailing from mobile devices where CPU time (and battery life) is at a premium.
July 22, 2004 8:51 PM
 

JS Greenwood said:

Wim,

Good questions. Solutions as follows:

1. BCC is a minor problem - it can be solved, though. You could include a signature header for each recipient (each computed separately), and then the recipient validates that THEIR e-mail address was one of the signatures computed.
2. The send-date should be within a predefined time-period - probably no more than a few hours in the future, and no more than a day or so in the past. You'd lose a tiny percentage due to mail-routing delays, but I think that's acceptable.
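The two checks above could look something like this on the receiving side. It is a sketch with assumed values: `MAX_FUTURE` and `MAX_PAST` are guesses at "a few hours" and "a day or so", and `is_valid` stands in for whatever signature check is eventually chosen. All names here are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable

MAX_FUTURE = timedelta(hours=3)  # assumed value for "a few hours in the future"
MAX_PAST = timedelta(days=1)     # assumed value for "a day or so in the past"


def send_date_in_window(sent: datetime, now: datetime) -> bool:
    """Reject stamps whose send-date is too old or too far ahead."""
    return now - MAX_PAST <= sent <= now + MAX_FUTURE


def addressed_to_me(stamps: Iterable[str], my_address: str,
                    is_valid: Callable[[str, str], bool]) -> bool:
    """Accept the mail if any one signature header was computed for my address.

    The headers carry only signature values, not addresses, so BCC
    recipients are not revealed to each other.
    """
    return any(is_valid(my_address, stamp) for stamp in stamps)
```

Checking costs the recipient one quick validation per signature header, while the sender pays the full computation once per recipient.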

As an aside, I think date + recipient + sender is a good combination of values to check on, personally.
July 22, 2004 8:57 PM
 

Dave said:

Not sure about how well this solution would be received by users. I (like thousands of people) have Norton running. When I send an email - it takes Norton a few seconds to scan my outgoing mail for viruses. This results in a high CPU load (not helped by the 2 messenger-style popup windows it shows) and a delay in sending the mail. This annoys me. It annoys me a LOT.

If, every time you send a mail, the CPU load ramps right up to generate the sig, it wouldn't take me long at all to disable the plug-in. Certainly if I was sending to more than one person, I couldn't justify it at all. I know you've stated that the computation could be done on mail servers instead of the client - but I can NOT see many mail server sys-admins welcoming this with open arms. It takes a fraction of a second to process an email thru a well-configured server. Even if the server was sending, say, 10,000 emails that were spam, it could still wade thru these faster than dealing with just a FEW emails that would take 2-3 seconds of CPU time each. Hence I cannot see mail servers doing anything that takes up that much processing time. Furthermore - a great deal of spam is sent using hacked / hijacked servers. How would your solution address this?

I know you can say something like "the amount of time saved not having to wade thru spam will more than compensate for the delay" - but there are lots of people out there (especially in a corporate environment) that don't publish email addresses, and hence get very few pieces of spam.

Good idea tho'. Don't see it working in practice.
July 24, 2004 12:36 PM
 

Rob Styles said:

James,

I don't want to seem too negative about this idea, as I think it's fantastic to have architects of your calibre trying to solve the problem, but I think you're on a hiding to nothing on this one.

The BCC problem isn't trivial - appending a signature for each email address would add very significant overhead, especially at ten seconds per recipient. And as you want to validate the authenticity of the signature, the algorithm has to be a two-way algorithm that is expensive to encode and trivial to decode. This is the antithesis of most cyphers, so I doubt many people have thought deeply about it, although you may find an existing cypher that could be run backwards. This is not a simple hashing algorithm, however.

The other major issue is that breaking this solution will have significant economic benefit. We already see spammers using bots to sign up web mail accounts, spam from them and then move on and we already see home machines being hijacked and assimilated into bot nets for spamming, so I'm not at all convinced that a ten second per recipient penalty will be a huge deterrent.

The other aspect of the economics, of course, is that somebody will build a hardware accelerator. As the algorithm will be public domain and very many people, enterprise mail administrators included, will not want the overhead of this on their main CPUs, accelerator cards would become available and would be used by the spammers.

Sorry mate, nice idea, but too many holes.

rob
July 27, 2004 4:34 AM
 

JS Greenwood said:

Rob,

Good to see you're reading. :)

Some good points there...

1. The more I think about it, the more I realise that the CPU hit can't be over 1 second or so (still enough to block mass-spammings, though). Most bulk-mailings will be one-off things (clubs, mailing-lists, etc.) where timeliness isn't that important, or internally within companies (where this won't apply). White-listing known senders can get round this, too.

2. Yeah, whatever you implement, people will try to circumvent it. The machines-being-hijacked is a very valid point. This is a separate problem, though, that hopefully will get sorted independently (I'll keep dreaming). I wouldn't worry about accelerator cards for this too much, more that Moore's law will play into it, meaning that the computation will become more and more trivial over time. However, there is a solution to that...

3. Coming up with an algorithm isn't that hard, actually... was chatting to a mate about it over the weekend. All you need is some kind of partial-hash-collisions (using standard routines like MD5, where complete collisions are really, REALLY unlikely). Checking a partial collision (of n-bits) would be trivial. :) Over time, you'd want to add one to the required collision amount for every 18 months or so (assuming that the law holds true ad infinitum). That could easily be built into the clients - to require ((year - 1994) / 1.5) bit-collisions or greater.
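The bit-count schedule in point 3 can be written down directly. This is a sketch of that one formula only: truncating to a whole number of bits, clamping at zero, and the `base_bits` parameter are assumptions added here, and the function names are made up.

```python
def required_bits(year: int, base_bits: int = 0) -> int:
    """Bits of partial collision a client would demand in a given year.

    One extra bit per 18 months since 1994, following the formula
    ((year - 1994) / 1.5). Truncation and base_bits are assumptions.
    """
    return base_bits + max(0, int((year - 1994) / 1.5))


def expected_attempts(bits: int) -> int:
    """Average number of hashes tried before `bits` leading bits match."""
    return 2 ** bits
```

Taken literally, the formula asks for 6 bits in 2004, which is about 64 attempts on average and finishes almost instantly on a desktop machine. Reaching the one-second target would presumably need a baseline on top, which is what the hypothetical `base_bits` argument is for; the formula then only governs how fast the requirement grows.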

In reality, none of this really matters, as until there's a critical mass of users, any solution is only a niche item. If Outlook had it implemented as standard (or even Hotmail), then the problem might be alleviated (if not solved) very quickly.

Basically, you don't have to make it impossible to circumvent, just costly. Direct-marketing works on the principle (like all marketing) that the profit from the return-rate will out-weigh the cost of the campaign. If you make mail cost an order of magnitude (or two/three) more, then mass-mailing might become far less common. Maybe. ;)
July 27, 2004 6:42 PM
