Rant: Form Follies

Recently I ran across a blog that really got me going for a short time. It required Java, JavaScript, and cookies, including 3rd party cookies, to view the web page. A friend of mine got suspicious of the site and asked me to look at it. What I discovered was a lot of stupidity and a form that, quite frankly, was an invitation to malicious people, screaming “Hack My Site!” Of course, the site itself was so amateurish looking that I wasn’t sure it hadn’t already been hacked or, worse, that it wasn’t a malicious site in and of itself. But it got me thinking about poor coding practices, especially sites that rely primarily on client-side sanity checking and sanitization.
OK, now this site was most likely malicious, so really bad code shouldn’t surprise me. As such, I let my frustration over poor coding practices on the web go back to a very low simmer. But this morning I noticed that Ajaxian had an article that had been seriously hacked, with tons of links to Viagra etc. They got it fixed soon enough (there were like 4 comments pointing out the spam), but it brought this whole rant back and I decided to write it.

First of all, my tool of investigation was Firefox 3 with the following extensions: Firebug, JSView, Live HTTP Headers, Web Developer, and of course Greasemonkey. In some ways, all of them combined were like throwing a nuke into a water gun fight. The site I was looking at really was somewhat crude, code-wise.

I did all of the investigation in a virtual PC in case there was something malicious on the site itself, which is what my friend feared. I was really bugged by the requirement for Java when the only Java applet I saw was a news ticker that had fairly old news. It could have been a trojan but <shrug> I didn’t worry about it since I was going to throw the virtual image away. Sadly, I did so without thinking and I can’t remember the site’s URL. I would like to go back to look at a couple of other things but now I can’t.

The 3rd party cookies went to an IP address that resolved to a DSL connection. That should make one nervous too.

OK, now to the form that got me thinking. The form was a simple newsletter sign-up form. It had 5 fields: first and last name, e-mail address twice, and zip code. The alarms went off for me there too: that is too much information to request for a simple newsletter.

The form’s fields all called a JavaScript function that ran some sanity checking and a sanitization routine that deleted all non-alphanumeric characters except the @ and period. I found the form frustrating in that it didn’t tell you what was going on. If you typed a non-alphanumeric character, it would simply disappear as soon as you typed it. When you left the first e-mail field, it checked that the value was a validly formed e-mail address. If it wasn’t valid, it put your cursor back into the now blank field without telling you what was going on. The second field verified that it matched the first. Again, it didn’t tell you what was wrong; it would just blank the second field if it didn’t match. The submit button was dimmed out until everything was OK, something I liked.

I also noticed that the action of the form didn’t quite match the website URL; it used the same IP as the cookie. This got me curious, so I quickly created a form duplicating the website’s form without all the JavaScript. Yep, the site was vulnerable to cross-site request forgery (CSRF) and apparently did absolutely no server-side sanity checking or sanitization, as I was able to input totally invalid data into each field, including semi-colons and apostrophes. The response page humorously tried, I believe, to display the results verifying my data and failed, but I did see the first name, which was “Fred Fl;‘ntstone”: neither the semi-colon nor the apostrophe had been changed to HTML entities or escaped as far as I could tell. It was probably vulnerable to cross-site scripting (XSS), code injection, etc. as well.

<RANT ON>

First of all, this form was an example of a good user interface done poorly. Not providing feedback about what was wrong when you type something incorrectly is sad. I am getting more and more attuned to the goodness that JavaScript can do by giving the user immediate data validation. Earlier today I filled out a form that waited until after I had submitted it to tell me that my e-mail addresses didn’t match. The blog’s form at least was validating that the e-mail addresses matched before you submitted them. That was good. Not telling me what was going on was bad!

Second, it was apparent that they had done nothing to prevent cross-site request forgery. In my experience, this has been one of the most overlooked and most exploited problems a lot of forms have, especially contact forms. At my old ISP, we had many a customer whose contact form was used for spamming because of this exploit. Folks! Take steps to prevent this. It isn’t that hard.

Of course, the spammers could take advantage of the cross-site forgery because those customers’ forms had little or no data validation/sanitization on the server side (and none on the client side either).
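One common way to take those steps against cross-site request forgery is a per-session token that the form must send back. The following is only a minimal sketch under some assumptions: it needs PHP 7 or later for random_bytes(), and the helper names csrf_token() and csrf_is_valid() as well as the field name csrf_token are made up for illustration, not taken from any site discussed here.

```php
<?php
// Hypothetical helpers for illustration only. Assumes PHP 7+.

// Create (or reuse) a per-session token to embed in a hidden form field.
function csrf_token(array &$session): string
{
    if (empty($session['csrf_token'])) {
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $session['csrf_token'];
}

// Compare the token posted with the form against the one in the session.
function csrf_is_valid(array $session, array $post): bool
{
    if (empty($session['csrf_token'])
        || !isset($post['csrf_token'])
        || !is_string($post['csrf_token'])) {
        return false;
    }
    // hash_equals() compares the two strings in constant time.
    return hash_equals($session['csrf_token'], $post['csrf_token']);
}
```

In a real page you would call session_start(), print csrf_token($_SESSION) into a hidden input, and refuse to process the submission unless csrf_is_valid($_SESSION, $_POST) returns true. A copy of the form hosted somewhere else has no way of knowing the token.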

I don’t care how much data validation one does on the client side via JavaScript, it is vulnerable. With the (im)proper tools, even simple ones such as Greasemonkey, client-side JavaScript can be fooled, manipulated, and downright abused. I don’t know how Ajaxian got hacked but it would appear that they certainly didn’t have proper server-side sanitization in place for data being submitted. I wouldn’t be surprised if Ajaxian relies a lot on the technology they promote and has neglected server-side data filtering. I do believe they would be very aware of XSS and have done everything they can to prevent it, but if they have neglected the server side of things…

Folks, be paranoid regarding data coming into your server from a form. Do not trust it. No matter the source, no matter that the form is password protected, no matter that only you submit it, do not trust it. Take every piece of data that you are going to use and throw it through a battery of tests, validations, sanitizations and eliminations.
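As a rough illustration of such a battery of tests, each expected field can be checked against a whitelist pattern before anything else touches it. This is a sketch, not a recipe: the function name validate_fields(), the field names, and the patterns (including the assumption of 5-digit US zip codes) are all hypothetical.

```php
<?php
// Hypothetical whitelist validator for illustration only.
// Returns the names of the fields that are missing or fail their pattern.
function validate_fields(array $input, array $rules): array
{
    $errors = array();
    foreach ($rules as $field => $pattern) {
        // Anything missing or not a plain string is treated as empty.
        $value = (isset($input[$field]) && is_string($input[$field]))
            ? $input[$field]
            : '';
        if (preg_match($pattern, $value) !== 1) {
            $errors[] = $field;
        }
    }
    return $errors;
}

// Assumed rules: letters, spaces and hyphens for names; 5 digits for the zip.
$rules = array(
    'first_name' => '/^[A-Za-z][A-Za-z \-]{0,49}\z/',
    'zip_code'   => '/^\d{5}\z/',
);
```

Calling validate_fields($_POST, $rules) gives back an empty array only when every field passes, so a name like “Fred Fl;'ntstone” is rejected on the server no matter what the client-side script did or did not do.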

I suggest that you go one step further and nullify any data coming in that is not expected or used. Rather dramatic, I know, but I am paranoid. In my PHP code, obviously, register_globals is off. I rarely use GETs. I automatically set the server global $_GET to an empty array, as well as $_REQUEST and a couple of other globals and other vars, such as $HTTP_GET_VARS, that can be set based on data sent to the server from a web page. If I am not expecting a file upload, I also set $_FILES to an empty array if it is not empty. I do this all before I do any other coding.
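A minimal sketch of that kind of housekeeping, for a script that only expects a POST and no uploads, might look like this. The helper keep_expected_keys() and the list of field names are assumptions for the example, not my actual code.

```php
<?php
// Hypothetical helper: keep only the keys the script expects, drop the rest.
function keep_expected_keys(array $input, array $expected_keys): array
{
    return array_intersect_key($input, array_flip($expected_keys));
}

// At the very top of a POST-only script, before any other code runs:
$_GET     = array();   // no query-string data is expected
$_REQUEST = array();   // never read the merged GET/POST/COOKIE array
$_FILES   = array();   // no file upload is expected
$_POST    = keep_expected_keys(
    $_POST,
    array('first_name', 'last_name', 'email_address')
);
```

Anything an attacker tacks onto the request beyond the three expected fields is gone before the rest of the script ever sees it.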

I do not rely on magic_quotes, especially since at this time, that setting is to be deleted in PHP 6. Rather, I escape and/or use htmlentities to do some generic data sanitization. For much of my form data, I pop it through one of two methods to do further paranoid stuff. An example of what I do is this:
<?php
$o_page           = new Page(); // an object that eventually outputs an html page

// only these keys are kept from $_POST; everything else is dropped
$a_allowed_fields = array( "first_name", "last_name", "email_address" );
$a_clean_data     = $o_page->clean_array_values( $_POST, $a_allowed_fields );
$first_name       = $o_page->make_alphanumeric( $a_clean_data[ "first_name" ] );
$last_name        = $o_page->make_alphanumeric( $a_clean_data[ "last_name" ] );
$email_address    = $o_page->make_internet_sane( $a_clean_data[ "email_address" ] );
if( $o_page->is_email_address( $email_address ) === FALSE ) {
  // fall back to a placeholder rather than keep an invalid address
  $email_address = "invalid_email_address@mydomain.com";
}
// ... other stuff
?>

The clean_array_values method takes any associative array and an optional array which lists the keys to keep and clean. I have two or three different versions of it. Here is an example:

public function clean_array_values( $array, $a_allowed_keys = array() ) /*{{{*/
{
  $a_clean = array();
  foreach( $array as $key=>$value ) {
    if( is_array( $value ) ) {
      // nested arrays are cleaned in full; the key filter is not passed down
      $a_clean[$key] = $this->clean_array_values( $value );
    } elseif( count( $a_allowed_keys ) === 0 || in_array( $key, $a_allowed_keys ) ) {
      // keep the value only if no filter was given or the key is allowed
      $a_clean[$key] = htmlentities( trim( $value ), ENT_QUOTES );
    }
  }
  return $a_clean;
} /*}}}*/

The make_alphanumeric method is probably one of the standard code snippets running around the web:

public function make_alphanumeric( $the_string ) /*{{{*/
{
  $the_string = str_replace( " ", "_", $the_string );
  return preg_replace( "/[^a-zA-Z0-9_\-]/", "", $the_string );
} /*}}}*/

The make_internet_sane method is very similar to the make_alphanumeric method except it allows for a few more non-alphanumeric characters that are allowed in e-mail addresses and changes all alpha to lower case. The is_email_address method is a variation of a multitude of functions out on the Net that validate e-mail addresses.
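For the curious, here is roughly what stand-ins for those two methods could look like as plain functions. This is not the code from my Page class: the exact set of extra characters allowed is an assumption, and the validity check leans on PHP’s built-in filter_var() (PHP 5.2 and later) instead of one of the hand-rolled functions mentioned above.

```php
<?php
// Hypothetical stand-ins for illustration only.

// Lower-case the string and strip everything except a small set of
// characters commonly seen in e-mail addresses (assumed set: _ . @ + -).
function make_internet_sane(string $the_string): string
{
    $the_string = strtolower(trim($the_string));
    return preg_replace('/[^a-z0-9_.@+\-]/', '', $the_string);
}

// Let PHP's built-in filter decide whether the address is well formed.
function is_email_address(string $email_address): bool
{
    return filter_var($email_address, FILTER_VALIDATE_EMAIL) !== false;
}
```

Cleaning comes first and validation second, so an address that was mangled into something invalid by the cleaning step still gets rejected.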

Now if you are asking, “Why do all this when it can be done at the client?” you have missed my whole point. The client-side work is to help the user enter valid data, and that help should be clear to the user. The server side is to prevent the malicious attacks that bypass all the client-side work. Yes, it does slow down the processing of the form on the server. But I would rather have a small performance hit than a web site taken down completely by hackers!

Let me be clear about this rant. I believe client-side data validation is good and proper. But! You cannot rely on it alone to keep your site from being compromised. You must have the uber dishwasher of server-side data cleaning in place and use it.
