Rant: Form Follies

Recently I ran across a blog that really got me going for a short time. It required Java, Javascript, and cookies, including 3rd party cookies, to view the web page. A friend of mine got suspicious of the site and asked me to look at it. What I discovered was a lot of stupidity and a form that, quite frankly, was an invitation screaming "Hack My Site!" to malicious people. Of course, the site itself looked so amateurish that I wasn't sure it hadn't already been hacked or, worse, was a malicious site in and of itself. But it got me thinking about poor coding practices, especially on sites that rely primarily on client-side sanity checking and sanitization.
Ok, now this site was most likely malicious, so having really bad code shouldn't surprise me. As such, I let my frustration over poor coding practices on the web go back to a very low simmer. But this morning I noticed that Ajaxian had an article that had been hacked seriously, with tons of links to Viagra etc. They got it fixed soon enough (there were about 4 comments pointing out the spam), but it brought this whole rant back and I decided to write it.

First of all, my tools of investigation were Firefox 3 with the following extensions: Firebug, JSView, Live HTTP Headers, Web Developer, and of course Greasemonkey. In some ways, all of them combined were like throwing a nuke into a water gun fight. The site I was looking at really was somewhat crude, code-wise.

I did all of the investigation in a virtual PC in case there was something malicious on the site itself, which is what my friend feared. I really was bugged by the requirement for Java when the only Java applet I saw was a news ticker with fairly old news. It could have been a trojan, but I didn't worry about it since I was going to throw the virtual image away. Sadly, I did so without thinking, and now I can't remember the site's URL. I would like to go back to look at a couple of other things, but I can't.

The 3rd party cookies went to an IP address that resolved to a DSL connection. That should make one nervous too.

Ok, now to the form that got me thinking. The form was a simple newsletter sign-up form. It had 5 fields: first and last name, e-mail address twice, and zip code. The alarms went off for me there too: too much information requested for a simple newsletter.

The form's fields all called a Javascript routine that ran some sanity checking and a sanitization routine that deleted all non-alphanumeric characters except the @ and period. I found the form frustrating in that it didn't tell you what was going on. If you typed a non-alphanumeric character, it would simply disappear as soon as you typed it. When you left the first e-mail field, it checked that the address was validly formed. If it wasn't, it put your cursor back into the now-blank field without telling you what was wrong. The second field verified that it matched the first. Again, it didn't tell you what was wrong; it would just blank the second field if it didn't match. The submit button was dimmed out until everything was ok, something I liked.

I also noticed that the action of the form didn't quite match the website's URL; it used the same IP as the cookie. This got me curious, so I quickly created a form duplicating the website's form without all the Javascript. Yep, the site was vulnerable to cross-site request forgery and apparently did absolutely no server side sanity checking or sanitization, as I was able to input totally invalid data into each field, including semi-colons and apostrophes. The response page humorously tried to display what I believe were results verifying my data and failed, but I did see the first name, which was "Fred Fl;'ntstone". Neither the semi-colon nor the apostrophe had been changed to HTML entities or escaped as far as I could tell. The site probably was vulnerable as well to cross-site scripting (XSS), code injection, etc.


First of all, this form was an example of a good user interface done poorly. Not providing feedback about what was wrong when you type something incorrectly is sad. I am getting more and more attuned to the good that Javascript can do by giving the user immediate data validation. Earlier today I filled out a form that waited until after I had submitted it to tell me that my e-mail addresses didn't match. The blog's form at least validated that the e-mail addresses matched before you submitted them. That was good. Not telling me what was going on was bad!

Second, it was apparent that they had done nothing to prevent cross-site request forgery. In my experience, this has been one of the most overlooked and most exploited problems a lot of forms have, especially contact forms. At my old ISP, we had many a customer whose contact form was used for spamming because of this exploit. Folks! Take steps to prevent this. It isn't that hard.
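One common way to do it is a per-session token in a hidden field. This is a minimal sketch of that idea; the function names and the session key are my own illustration, not anything from the site I was looking at:

```php
<?php
// Minimal CSRF-token sketch. The names here are hypothetical.
session_start();

// Generate (once per session) the token that gets embedded in the
// form as a hidden input named "csrf_token".
function csrf_token() {
    if ( empty( $_SESSION['csrf_token'] ) ) {
        // md5( uniqid() ) style is era-appropriate; newer PHP has
        // stronger sources of randomness.
        $_SESSION['csrf_token'] = md5( uniqid( mt_rand(), true ) );
    }
    return $_SESSION['csrf_token'];
}

// In the form handler, reject the POST outright when the submitted
// token does not match the one stored in the session.
function csrf_check( $submitted ) {
    return isset( $_SESSION['csrf_token'] )
        && $submitted === $_SESSION['csrf_token'];
}
```

The form page emits csrf_token() into the hidden field; the handler calls csrf_check( $_POST['csrf_token'] ) before touching anything else and bails if it fails. A forged POST from another site has no way to know the token, so the spammers' scripts bounce off.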

Of course, the spammers could take advantage of the cross-site request forgery because those forms had little or no data validation/sanitization on the server side (and none on the client side either).

I don't care how much data validation one does on the client side via Javascript; it is vulnerable. With the (im)proper tools, even simple ones such as Greasemonkey, client-side Javascript can be fooled, manipulated, and downright abused. I don't know how Ajaxian got hacked, but it would appear that they certainly didn't have proper server side sanitization in place for submitted data. I wouldn't be surprised if Ajaxian relies a lot on the technology they promote and has neglected server-side data filtering. I do believe they would be very aware of XSS and have done everything they can to prevent it, but if they have neglected the server side of things…

Folks, be paranoid regarding data coming into your server from a form. Do not trust it: no matter the source, no matter that the form is password protected, no matter that only you submit it, do not trust it. Take every piece of data that you are going to use and throw it through a battery of tests, validations, sanitations, and eliminations.

I suggest that you go one step further and nullify any incoming data that is not expected or used. Rather dramatic, I know, but I am paranoid. In my php code, obviously, register_globals is off. I rarely use GETs. I automatically set the server global $_GET to an empty array, as well as $_REQUEST and a couple of other globals and vars such as $HTTP_GET_VARS that can be set based on data sent to the server from a web page. If I am not expecting a file upload, I also set $_FILES to an empty array if it is not empty. I do all of this before I do any other coding.
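In practice that reset looks something like this sketch, for a script that only processes $_POST and takes no uploads (the $expecting_file_upload flag is my own illustration):

```php
<?php
// Paranoid reset of request data this script does not expect.
// Runs before any other code. Assumes register_globals is off.
$expecting_file_upload = false;

$_GET     = array();   // this script never reads GET parameters
$_REQUEST = array();   // nor the GET/POST/cookie merge

if ( isset( $HTTP_GET_VARS ) ) {
    $HTTP_GET_VARS = array();   // long-deprecated alias of $_GET
}

if ( ! $expecting_file_upload && ! empty( $_FILES ) ) {
    $_FILES = array();   // drop unexpected uploads on the floor
}
```

Anything a script never reads can never be abused, which is the whole point of the exercise.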

I do not rely on magic_quotes, especially since that setting is slated to be deleted in php6. Rather, I escape and/or use htmlentities to do some generic data sanitization. For much of my form data, I pop it through one of two methods that do further paranoid stuff. An example of what I do is this:
$o_page           = new Page(); // an object that eventually outputs an html page

$a_allowed_fields = array( "first_name", "last_name", "email_address" );
$a_clean_data     = $o_page->clean_array_values( $_POST, $a_allowed_fields );
$first_name       = $o_page->make_alphanumeric( $a_clean_data[ "first_name" ] );
$last_name        = $o_page->make_alphanumeric( $a_clean_data[ "last_name" ] );
$email_address    = $o_page->make_internet_sane( $a_clean_data[ "email_address" ] );
if( $o_page->is_email_address( $email_address ) === FALSE ) {
  $email_address = "invalid_email_address@mydomain.com";
}
// ... other stuff

The clean_array_values method takes any associative array and an optional array listing the keys to clean. I have a couple of different versions of it. Here is an example:

public function clean_array_values( $array, $a_allowed_keys = array() ) /*{{{*/
{
  $a_clean = array();
  foreach( $array as $key => $value ) {
    if( is_array( $value ) ) {
      $a_clean[$key] = $this->clean_array_values( $value );
    } else {
      $value = trim( $value );
      if( count( $a_allowed_keys ) >= 1 ) {
        if( in_array( $key, $a_allowed_keys ) ) {
          $a_clean[$key] = htmlentities( $value, ENT_QUOTES );
        }
        // keys not in the allowed list are silently dropped
      } else {
        $a_clean[$key] = htmlentities( $value, ENT_QUOTES );
      }
    }
  }
  return $a_clean;
} /*}}}*/

The make_alphanumeric method is probably one of the standard code snippets running around the web:

public function make_alphanumeric( $the_string ) /*{{{*/
{
  $the_string = str_replace( " ", "_", $the_string );
  return preg_replace( "/[^a-zA-Z0-9_\-]/", "", $the_string );
} /*}}}*/

The make_internet_sane method is very similar to the make_alphanumeric method, except that it allows a few more non-alphanumeric characters that are legal in e-mail addresses and changes all alpha characters to lower case. The is_email_address method is a variation on the multitude of functions out on the Net that validate e-mail addresses.
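I haven't shown the bodies of those two, so here is a plausible sketch of what they do, written as standalone functions rather than Page methods. The exact character list and the deliberately simple validation regex are my guesses based on the descriptions above:

```php
<?php
// Hypothetical sketches of the two helpers described above.

// Like make_alphanumeric, but keeps the characters legal in e-mail
// addresses (@ . - _ +) and lowercases everything.
function make_internet_sane( $the_string ) {
    $the_string = strtolower( trim( $the_string ) );
    return preg_replace( "/[^a-z0-9@._+\-]/", "", $the_string );
}

// A deliberately simple shape check: local@domain.tld, lower case
// only, since make_internet_sane has already lowercased the input.
function is_email_address( $the_string ) {
    return preg_match( "/^[a-z0-9._+\-]+@[a-z0-9.\-]+\.[a-z]{2,}$/",
                       $the_string ) === 1;
}
```

So "Fred;'@Example.COM" comes out of make_internet_sane as "fred@example.com", and anything that still fails is_email_address afterward gets replaced with the fallback address shown earlier.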

Now if you are asking, "Why do all this when it can be done at the client?" you have missed my whole point. The client-side work is there to help the user enter valid data, and that help should be clear to the user. The server-side work is there to stop the malicious attacks that bypass all the client-side work. Yes, it does slow down the processing of the form on the server. But I would rather take a small performance hit than have a web site taken down completely by hackers!

Let me be clear about this rant. I believe client-side data validation is good and proper. But! You cannot rely on it alone to keep your site from being compromised. You must have the uber dishwasher of server-side data cleaning in place, and you must use it.
