Rant: Form Follies

Recently I ran across a blog that really got me going for a short time. It required Java, JavaScript, and cookies (including 3rd party cookies) just to view the page. A friend of mine got suspicious of the site and asked me to look at it. What I discovered was a lot of stupidity and a form that, quite frankly, was an invitation screaming “Hack My Site!” to malicious people. Of course, the site itself looked so amateurish I wasn’t sure it hadn’t already been hacked or, worse, was a malicious site in and of itself. But it got me thinking about poor coding practices, especially sites that rely primarily on client-side sanity checking and sanitization.
Ok, now this site was most likely malicious, so really bad code shouldn’t have surprised me. As such, I let my frustration over poor coding practices on the web go back to a very low simmer. But this morning I noticed that an article on Ajaxian had been hacked seriously, stuffed with tons of links to Viagra etc. They got it fixed soon enough (there were something like 4 comments pointing out the spam), but it brought this whole rant back and I decided to write it.

First of all, my tools of investigation were Firefox 3 with the following extensions: Firebug, JSView, Live HTTP Headers, Web Developer, and of course Greasemonkey. In some ways, all of them combined were like bringing a nuke to a water gun fight. The site I was looking at really was somewhat crude, code-wise.

I did all of the investigation in a virtual PC in case there was something malicious on the site itself, which is what my friend feared. I really was bugged by the requirement for Java when the only Java applet I saw was a news ticker showing fairly old news. It could have been a trojan, but <shrug> I didn’t worry about it since I was going to throw the virtual image away. Sadly, I did so without thinking, and now I can’t remember the site’s URL. I would like to go back to it to look at a couple of other things, but now I can’t.

The 3rd party cookies went to an IP address that resolved to a DSL connection. That should make one nervous too.

Ok, now to the form that got me thinking. The form was a simple newsletter sign-up form. It had 5 fields: first and last name, e-mail address (twice), and zip code. The alarms went off for me there too: too much information requested for a simple newsletter.

The form’s fields all called JavaScript that ran some sanity checking and a sanitization routine that deleted all non-alphanumeric characters except the @ and the period. I found the form frustrating in that it never told you what was going on. If you typed a non-alphanumeric character, it simply disappeared as soon as you typed it. When you left the first e-mail field, it checked that the address was validly formed; if it wasn’t, it put your cursor back into the now blank field without telling you why. The second e-mail field verified that it matched the first. Again, it didn’t tell you what was wrong; it just blanked the second field if it didn’t match. The submit button was dimmed until everything was ok, something I liked.

I also noticed that the action of the form didn’t quite match the website’s URL; it used the same IP as the cookie. This got me curious, so I quickly created a form duplicating the website’s form without all the JavaScript. Yep, the site was vulnerable to cross-site request forgery and apparently did absolutely no server-side sanity checking or sanitization, as I was able to input totally invalid data into each field, including semi-colons and apostrophes. The response page, humorously, tried and failed to display what I believe were results verifying my data, but I did see the first name, which was “Fred Fl;’ntstone”: neither the semi-colon nor the apostrophe had been converted to HTML entities or escaped as far as I could tell. It was probably vulnerable as well to cross-site scripting (XSS), code injection, etc.


First of all, this form was an example of a good user interface done poorly. Not providing feedback about what is wrong when you type something incorrectly is sad. I am getting more and more attuned to the goodness JavaScript can do by giving the user immediate data validation. Earlier today I filled out a form that waited until after I had submitted it to tell me that my e-mail addresses didn’t match. The blog’s form at least validated that the e-mail addresses matched before you submitted them. That was good. Not telling me what was going on was bad!

Second, it was apparent that they had done nothing to prevent cross-site request forgery. In my experience, this has been one of the most overlooked and most exploited problems a lot of forms have, especially contact forms. At my old ISP, we had many a customer whose contact form was used for spamming because of this exploit. Folks! Take steps to prevent this. It isn’t that hard.
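The basic defense is a per-session token in a hidden form field that a forged cross-site request can’t know. Here is a minimal sketch of the idea; the function and field names are my own invention, not from any particular framework:

```php
<?php
// Minimal CSRF-token sketch. Function and field names are my own.

// Called when rendering the form: make an unguessable token to
// store in the session and emit in a hidden field.
function make_form_token() {
    return sha1( uniqid( mt_rand(), true ) );
}

// Called when handling the POST: refuse any submission whose
// token is missing or does not match the one in the session.
function token_is_valid( $session_token, $posted_token ) {
    return $posted_token !== null && $posted_token === $session_token;
}

// Usage sketch (inside a page with session_start() already called):
//   $_SESSION['form_token'] = make_form_token();
//   ... emit <input type="hidden" name="form_token" value="..."> ...
//   on POST: if( ! token_is_valid( $_SESSION['form_token'],
//                                  $_POST['form_token'] ) ) die();
```

A spammer driving your contact form from another site never sees the token, so the forged POST fails the check.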

Of course, the spammers could take advantage of the cross-site request forgery because those forms had little or no data validation/sanitization on the server side (and none on the client side either).

I don’t care how much data validation one does on the client side via JavaScript, it is vulnerable. With the (im)proper tools, even simple ones such as Greasemonkey, client-side JavaScript can be fooled, manipulated, and downright abused. I don’t know how Ajaxian got hacked, but it would appear that they didn’t have proper server-side sanitization in place for data being submitted. I wouldn’t be surprised if Ajaxian relies heavily on the technology they promote and has neglected server-side data filtering. I do believe they would be very aware of XSS and have done everything they can to prevent it, but if they have neglected the server side of things…

Folks, be paranoid regarding data coming into your server from a form. Do not trust it: no matter the source, no matter that the form is password protected, no matter that only you submit it, do not trust it. Take every piece of data that you are going to use and throw it through a battery of tests, validations, sanitations, and eliminations.

I suggest that you go one step further and nullify any incoming data that is not expected or used. Rather dramatic, I know, but I am paranoid. In my php code, obviously, register_globals is off. I rarely use GETs, so I automatically set the superglobal $_GET to an empty array, along with $_REQUEST and a couple of other globals and vars such as $HTTP_GET_VARS that can be populated from data sent to the server by a web page. If I am not expecting a file upload, I also set $_FILES to an empty array if it is not empty. I do this all before I do any other coding.
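In code, that precaution looks something like this; a sketch of the practice just described, and the exact list of globals you reset is up to you:

```php
<?php
// Defensive reset, done before any other code runs.
// This page expects POST data only, so everything else gets emptied.
$_GET          = array();
$_REQUEST      = array();
$HTTP_GET_VARS = array();   // legacy long-array alias, still set on some configurations
if ( ! empty( $_FILES ) ) { // no file uploads expected here
    $_FILES = array();
}
```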

I do not rely on magic_quotes, especially since, at this time, that setting is slated to be deleted in php6. Rather, I escape and/or use htmlentities to do some generic data sanitization. For much of my form data, I pop it through one of two methods to do further paranoid stuff. An example of what I do is this:
$o_page           = new Page(); // an object that eventually outputs an html page

$a_allowed_fields = array( "first_name", "last_name", "email_address" );
$a_clean_data     = $o_page->clean_array_values( $_POST, $a_allowed_fields );
$first_name       = $o_page->make_alphanumeric( $a_clean_data[ "first_name" ] );
$last_name        = $o_page->make_alphanumeric( $a_clean_data[ "last_name" ] );
$email_address    = $o_page->make_internet_sane( $a_clean_data[ "email_address" ] );
if( $o_page->is_email_address( $email_address ) === FALSE ) {
  $email_address = "invalid_email_address@mydomain.com";
}
// ... other stuff

The clean_array_values method takes any associative array and an optional array listing the keys to clean. I have a couple three different versions of it. Here is an example:

public function clean_array_values( $array, $a_allowed_keys = array() ) /*{{{*/
{
  $a_clean = array();
  foreach( $array as $key => $value ) {
    if( is_array( $value ) ) {
      $a_clean[$key] = $this->clean_array_values( $value );
    } else {
      $value = trim( $value );
      if( count( $a_allowed_keys ) >= 1 ) {
        if( in_array( $key, $a_allowed_keys ) ) {
          $a_clean[$key] = htmlentities( $value, ENT_QUOTES );
        }
        // keys not in the allowed list are simply dropped
      } else {
        $a_clean[$key] = htmlentities( $value, ENT_QUOTES );
      }
    }
  }
  return $a_clean;
} /*}}}*/

The make_alphanumeric method is probably one of the standard code snippets floating around the web:

public function make_alphanumeric( $the_string ) /*{{{*/
{
  $the_string = str_replace( " ", "_", $the_string );
  return preg_replace( "/[^a-zA-Z0-9_\-]/", "", $the_string );
} /*}}}*/

The make_internet_sane method is very similar to the make_alphanumeric method, except it allows a few more non-alphanumeric characters that are legal in e-mail addresses and lower-cases all letters. The is_email_address method is a variation of the multitude of e-mail validation functions out on the Net.
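Roughly, the pair look something like this. These are simplified sketches in the same style, shown as plain functions rather than Page methods, not my production code:

```php
<?php
// Simplified sketches of the two methods described above.

// Lower-case everything and strip any character not commonly
// legal in an e-mail address.
function make_internet_sane( $the_string ) {
    $the_string = strtolower( trim( $the_string ) );
    return preg_replace( "/[^a-z0-9_\-.@+]/", "", $the_string );
}

// Deliberately simple check: one @, something on each side,
// and at least one dot in the domain part.
function is_email_address( $the_string ) {
    return (bool) preg_match(
        "/^[a-z0-9_\-.+]+@[a-z0-9\-]+(\.[a-z0-9\-]+)+$/",
        $the_string
    );
}
```

Note that after make_internet_sane, “Fred Fl;’ntstone”-style junk in an address field is reduced to the characters that could plausibly belong in an address, and is_email_address then decides whether what is left is worth keeping.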

Now if you are asking, “Why do all this when it can be done at the client?” you have missed my whole point. The client-side work is to help the user enter valid data, and that help should be clear to the user. The server-side work is to prevent the malicious attacks that bypass all the client-side work. Yes, it does slow down the processing of the form on the server. But I would rather have a small performance hit than a web site taken down completely by hackers!

Let me be clear about this rant. I believe client-side data validation is good and proper. But! You cannot rely on it alone to keep your site from being compromised. You must have the uber dishwasher of server-side data cleaning in place and use it.
