Cloaking email addresses in Umbraco
Publishing email addresses in clear text on a web page is convenient: visitors can read them, click the mailto link or copy them to their address book. The only problem is that bots crawling the web in search of email addresses can do the same. The addresses that these bots "harvest" are added to huge databases and used for sending spam. That's something you don't want.
In my case I had the email addresses stored in a separate data field in Umbraco, so it was fairly easy to cloak them. Using a combination of XSLT, C# and JavaScript I put together a system that hides the email address from crawler bots while still working as a clickable mailto link for human users.
In the head tag of the page template I added a reference to an external JavaScript file with two functions. It looks like this:
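function nospam(mail)
{
    // The domain is kept out of the page markup and only added on click.
    var domain = "acme";
    var country = ".com";
    // Declared with var so it does not leak into the global scope.
    var emailE = decodeROT13(mail) + '@' + domain + country;
    window.location.href = "mailto:" + emailE;
}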
function decodeROT13(email)
{
    var src = email;
    var dst = '';
    var len = src.length;
    var b;
    for (var ctr = 0; ctr < len; ctr++)
    {
        b = src.charCodeAt(ctr);
        // A-M and a-m move 13 places forward.
        if (((b > 64) && (b < 78)) || ((b > 96) && (b < 110)))
        {
            b = b + 13;
        }
        // N-Z and n-z move 13 places back.
        else if (((b > 77) && (b < 91)) || ((b > 109) && (b < 123)))
        {
            b = b - 13;
        }
        // Anything else (dots, digits, hyphens) is left unchanged.
        dst += String.fromCharCode(b);
    }
    return dst;
}
This is used together with an XSLT macro spiced with a little C#. The XSLT macro looks like this:
<!DOCTYPE xsl:Stylesheet [ <!ENTITY nbsp "&#x00A0;"> <!ENTITY raquo "&#x00BB;"> ]>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:umbraco.library="urn:umbraco.library"
    xmlns:tk="urn:thomaskahn-com:xslt"
    exclude-result-prefixes="msxml msxsl umbraco.library tk">
    <xsl:output method="xml" omit-xml-declaration="yes"/>
    <xsl:param name="currentPage"/>
    <xsl:template match="/">
        <xsl:variable name="email" select="$currentPage/data[@alias='emailAddress']"/>
        <xsl:variable name="justName" select="tk:removeDomain(string($email))"/>
        <a href="javascript:nospam('{$justName}');">Send an email</a>
    </xsl:template>
    <msxsl:script language="C#" implements-prefix="tk">
    <![CDATA[
    // Keep only the part before the @ and return it garbled with ROT13.
    public string removeDomain(string email)
    {
        char[] splitter = {'@'};
        string[] parsedMail = email.Split(splitter);
        return rot(parsedMail[0], 13);
    }
    // Rotate the letters A-Z and a-z by amt positions; other characters are left alone.
    public static string rot(string s, int amt)
    {
        char[] ch = s.ToCharArray();
        for (int i = 0; i < s.Length; i++)
        {
            if (ch[i] <= 'Z' && ch[i] >= 'A') ch[i] = (char)(((int)ch[i] - 'A' + amt) % 26 + 'A');
            else if (ch[i] <= 'z' && ch[i] >= 'a') ch[i] = (char)(((int)ch[i] - 'a' + amt) % 26 + 'a');
        }
        return new string(ch);
    }
    ]]>
    </msxsl:script>
</xsl:stylesheet>
The XSLT is mostly used for retrieving the email address and printing the finished result on the page. The interesting part is the C# code, which has two functions: removeDomain and rot.
removeDomain simply cuts an email address in half by splitting it on the @ character, so an address like "john.doe@acme.com" is split into "john.doe" and "acme.com". We keep only the first part (john.doe), which is then passed to the other function.
rot is the function used to garble the name. This is not encryption, since there are no keys involved: the function just rotates each letter in the input string a certain number of positions in the alphabet and leaves other characters, such as dots and digits, untouched. I chose 13 since ROT13 is a classic method for scrambling words, and with a 26-letter alphabet applying it a second time gives back the original text.
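As an illustration of the rotation, here is a minimal JavaScript sketch of the same idea as the C# rot function. The name rotate and the handling of negative amounts are assumptions made for this example and are not part of the original source files.

```javascript
// Rotate ASCII letters by `amount` positions, wrapping within the alphabet.
// Non-letters (dots, digits, hyphens) pass through unchanged.
function rotate(text, amount) {
  var shift = ((amount % 26) + 26) % 26; // normalise negative amounts to 0..25
  return text.replace(/[a-zA-Z]/g, function (ch) {
    var base = ch <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode((ch.charCodeAt(0) - base + shift) % 26 + base);
  });
}

var garbled = rotate('john.doe', 13);  // 'wbua.qbr'
var restored = rotate(garbled, 13);    // 'john.doe'
```

With any amount other than 13, decoding would need the opposite rotation (for example 5 forward and 5 back), which is one practical reason to stick with ROT13.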
The garbled string is then placed inside a javascript: link which is printed on the public page. When a user clicks this link, the JavaScript function nospam is called (see the JavaScript code above). Inside this function I've placed the domain (acme) and the domain suffix (.com). nospam takes the garbled name part of the email address and sends it to the JavaScript function decodeROT13, which ungarbles it. After that the @ character, the domain name and the suffix are added on. Finally, the script navigates to this email address using a mailto: prefix. What normally happens is that the email client you have configured for your browser opens with the email address already filled in the To field. Since the domain is hard-coded in the script, this only works when all addresses share the same domain.
I don't actually know how effective this is, but I hope it's enough to stop most email address harvesting bots from getting what they're after. Some might also argue that the whole ROT13 step is unnecessary, but I thought it was fun to do and an extra step towards protecting the email address.
Don't hesitate to comment or suggest improvements. This was pieced together in an hour, so I think there's room for improvement. Another challenge would be to make it work on email addresses that are embedded in rich text fields or other places where the email address is not accessible in the same way. Here's an interesting article on A List Apart with an elegant solution, except that it is done in PHP.
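The click flow described above can be sketched in a form that is easy to try outside a browser. This is only an illustration: buildMailto is a hypothetical helper that returns the mailto URL instead of navigating, and passing the domain as a parameter is an assumption, since the original nospam hard-codes it.

```javascript
// Compact ROT13, equivalent in effect to decodeROT13 for ASCII letters.
function rot13(text) {
  return text.replace(/[a-zA-Z]/g, function (ch) {
    var base = ch <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

// Hypothetical helper: ungarble the name part and assemble the mailto URL.
function buildMailto(garbledName, domain) {
  return 'mailto:' + rot13(garbledName) + '@' + domain;
}

// In the browser, the click handler would then do something like:
//   window.location.href = buildMailto('wbua.qbr', 'acme.com');
var url = buildMailto('wbua.qbr', 'acme.com'); // 'mailto:john.doe@acme.com'
```

Keeping the URL building separate from the navigation makes it possible to check the decoding without opening a mail client.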
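One possible direction for the rich text case, sketched in JavaScript purely as an illustration (in Umbraco the replacement would more likely run on the server). The address pattern is deliberately simplified, nospamFull is a hypothetical two-argument variant of nospam that would decode both the name and the domain, and addresses that are already wrapped in a mailto link are not handled.

```javascript
// Compact ROT13 for ASCII letters; other characters are left unchanged.
function rot13(text) {
  return text.replace(/[a-zA-Z]/g, function (ch) {
    var base = ch <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

// Simplified address pattern (an assumption for this sketch); it does not
// cover every valid email address.
var EMAIL_PATTERN = /([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})/g;

// Replace each plain-text address with a link to the hypothetical
// nospamFull(name, domain) function, garbling both parts.
function cloakAddresses(html) {
  return html.replace(EMAIL_PATTERN, function (match, name, domain) {
    return '<a href="javascript:nospamFull(\'' + rot13(name) + '\',\'' +
      rot13(domain) + '\');">Send an email</a>';
  });
}

var cloaked = cloakAddresses('Contact john.doe@acme.com today.');
```

Because each address in a rich text field can have its own domain, the domain is garbled and passed along here rather than hard-coded in the script.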
Source files can be downloaded here: emailEncryption.zip