Richard Heyes (phpguru.org) User Agents (9.2.2005, 15:17 UTC)

...suck. How simple is that? A godawful mishmash of crap that somehow manages (most of the time) to identify a browser... They've been with us for years, since the first Mozilla browser I guess, and are likely to be here for a good while longer. There is some documentation explaining the format here and, more recently, here. But look at this...

Update: Aaaah crap. Forgot to enable comments on this article. So if you want(ed) to post a comment, now you can.

Link
Marco Tabini More about the php|a price drop (9.2.2005, 14:01 UTC)

Yesterday, we dropped the price of a print subscription to php|architect by $20 (Canadian) per year. The price goes down to about $78 CAD, or around $59.99 US, substantially less than it used to be. For international customers, the cost of shipping the magazine overseas has also gone down.

There are a number of reasons for this, but perhaps the primary one is the fact that our price no longer reflected the costs of producing the magazine. I spent a considerable portion of 2004 finding ways to reduce our production costs in any way imaginable without compromising the quality of our contents.

I ended up being surprisingly successful in my mission, particularly when it came to shipping, which was the expense that most worried me, since the Canadian postal service seems to be on a runaway train when it comes to costs: shipping a single issue to the US has gone up by something like 40% since we started publishing in September of '03!

Cost-cutting measures are generally used to improve the bottom line, but in our case I felt that the savings should be passed on to our readers, many of whom have already told me more than once that they thought php|a was too expensive. I never argued with them on this point, except to say that there was nothing I could do (and there wasn't at the time), because producing a magazine that is not supported by advertising and has a somewhat limited circulation due to its specialized nature was simply... well, expensive. Now that things have changed, it's time to pass the savings along.

Link
Wez Furlong 50 gmail accounts up for grabs (9.2.2005, 06:08 UTC)

[Update: 35 remaining]

Yes, 50.

If you're smart enough to figure out how to email me (NOT post a comment on the blog; I need a functional email address from you) to ask for one, you're welcome to it.

Link
Marcus Baker (The Last Craft?) How did Google get it wrong? (9.2.2005, 02:15 UTC)

If you run a blog or Wiki you will be only too aware of the Google PageRank™ system. In case you have been on a rather extended holiday and/or in a long coma, it's a system whereby your site climbs the search engine results page if lots of other people link to you. It's not quite that simple, but that's the gist. In competition with each other to promote sales of Viagra, or to get people hooked on gambling, various crooked characters deface public sites with gay abandon. They leave a trail of links pointing at their own sites, often with Chinese titles, all to boost their own PageRank. All to climb Google.

These comment spammers are not nice people.

They will happily destroy the content of a Wiki and overwrite every page. If they don't get every one, it's usually because their script is too stupid to keep track of the pages it has already written over, and so cannot reach the newly orphaned pages. These scripts hammer the site while they operate. Not only that, but the frequency of attacks is now at epidemic proportions. I get about three separate attacks a day on my blog and about five major attacks a day on this PageRank 7 wiki. Faced with the brutality and increasing frequency of these incursions, ISPs can take down servers believing they are under a denial of service attack. Even if they understand the phenomenon, such attacks cause too much server load for the value of having the small blog customer. ISPs are starting to ban the use of tools like WordPress and Movable Type on their end user accounts.

OK, it’s not just Google to blame here, but all of the search engines. It’s just that Google’s system is the most well known and this has historically made it the main spam target. In a tacit acknowledgement of this, Google have decided to help the bloggers. Er…sort of.

Their solution is to allow you to take away the PageRank value of selected links. If you maintain Wiki/blog software, comment field links should have a "rel" attribute (uh?) set to "nofollow". That way the spammers will lose the incentive to spam you, because they will get no benefit from the links they leave. "Drat," they say, as they abandon their get rich quick scheme and go off to earn an honest wage.
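For the record, this is roughly what the fix looks like in a blog template (a sketch of my own, not code from Google's announcement; the helper function and its arguments are made up):

<?php
// Illustrative sketch: print a commenter's link with rel="nofollow"
// so search engines assign it no PageRank value.
// The function name and arguments are hypothetical.
function comment_link($url, $author)
{
    // Escape output before embedding it in HTML.
    $url    = htmlspecialchars($url, ENT_QUOTES);
    $author = htmlspecialchars($author, ENT_QUOTES);
    return '<a href="' . $url . '" rel="nofollow">' . $author . '</a>';
}

print comment_link('http://example.com/', 'A. Commenter');
// Output: <a href="http://example.com/" rel="nofollow">A. Commenter</a>
?>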

The plan is so idiotic it's almost surreal. It obviously in no way penalises the spammers, who are playing a percentage game anyway. So what if a few spams are ploughed into stony ground? It does make the engine spider's life a little easier of course, because it can spend less time indexing blogs. Lucky old engines, poor old webmasters who are expected to upgrade all of their software. Software that has been heavily customised and, given that few of these applications are design masterpieces, heavily hacked. I certainly won't be upgrading when there is zero benefit. Even if I do, the new attribute has to survive RSS feeds and some old and not so smart news aggregators. Really, I won't have time anyway, because I am too busy fighting spam.

What's even more surreal, though, is that the software authors are jumping on board and working on adding this as a feature. There is even talk of making it part of the HTML standard. This attribute is about as useful as the blink tag.

Suppose the engines had tackled it differently. Suppose that when your site was spammed, you could dispatch the content of the spam straight to Google, Yahoo, etc. They could then ban all of the links promoted by the dubious posting. A sort of "SpamBack". This changes the market forces significantly from the pedlars' point of view. Far from ploughing less fertile ground, they are now ploughing a minefield. Rather than one hundred percent of everybody having to manually instruct the GoogleBot, all it would take would be a small percentage of spam-aware applications to fight back. The spammers could not risk dumb spamming for fear of tripping these alarms.

I bet there are other simple solutions as well.

So how did Google get it wrong? There are smart people in Google, so did they not allocate enough time to this? Perhaps they lost touch? Can you see blogger peons, working in lowly offices, from the hallowed windows of a "plex"? Perhaps the Google blog could explain, as it's hardly a public relations coup. Whole sites have sprung up against "nofollow".

Link
Chris Shiflett More on Filtering Input and Escaping Output (9.2.2005, 01:37 UTC)

In my previous blog entry, I summarized the two most important steps (in my opinion) that all PHP developers should take to help secure their applications:

  • Filter input
  • Escape output

These are essentially "the least you can do" in terms of security. I consider anything less to be negligent (we all make mistakes, but these mistakes should be the exception and not the norm).
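As a minimal sketch of both steps (the field name, whitelist, and query here are illustrative only, not from the previous entry):

<?php
// A minimal sketch of "filter input, escape output".
// The field name, whitelist, and query are illustrative only.

$clean = array();

// Step 1: filter input. Only accept a username that matches
// a strict whitelist; reject everything else.
if (isset($_POST['username'])
    && preg_match('/^[a-z0-9_]{1,32}$/', $_POST['username'])) {
    $clean['username'] = $_POST['username'];
} else {
    exit('Invalid username.');
}

// Step 2: escape output for its destination.
// For HTML:
print 'Hello, ' . htmlentities($clean['username'], ENT_QUOTES);

// For a MySQL query (assuming an open connection):
$sql = "SELECT * FROM users WHERE username = '"
     . mysql_real_escape_string($clean['username']) . "'";
?>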

To my surprise, this simple statement has already been misinterpreted, and this is what prompted me to try to clarify things. Robert Peake writes:

Chris Shiflett has an interesting post on his blog wherein he declares that all PHP security vulnerabilities come from either a lack of filtering input or escaping output.

I hope that's not what I said, especially since it is wrong. :-) Filtering input and escaping output certainly aren't going to protect you from everything, but these two steps can improve the security of your applications substantially with very little effort.

Of course, my simple list leaves out many details, and that's fine. As I mentioned before, this list provides a broad perspective that helps to keep you on track while you focus on the details. I'm trying to help you focus on what's most important, because it's not always practical to implement every safeguard that you know.

The challenge is identifying data that comes from some external source: what is input? Robert mentions something else that I want to correct:

What this really points out once again is that web applications written in PHP do not really need to focus on much more than absolutely everything that a malicious attacker could throw at you through GET, POST or COOKIES (unless they have access to your server ENVIRONMENT ... *shudder*). Once again this means that if register_globals is turned off, these variables can only make their way in neatly packaged into corresponding $_GET, $_POST, and $_COOKIE arrays (as well as $_SESSION).

It is true that all data in $_GET, $_POST, and $_COOKIE is sent from the client and therefore tainted. However, data within $_SESSION is not. This data is persisted on the server and never even exposed over the Internet (unless you have a custom session handler that specifically does this). If you filter data on input, then you will never store tainted data in a session variable. Therefore, you can trust $_SESSION.
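To make the distinction concrete, here is a minimal sketch (an illustration only; the 'lang' parameter and its whitelist are hypothetical). The data is filtered once, on input, so whatever later comes out of $_SESSION is already trusted:

<?php
// Illustrative sketch: filter on input, then trust $_SESSION.
// The 'lang' parameter and its whitelist are hypothetical.
session_start();

// $_GET is tainted, so filter before storing anything in the session.
if (isset($_GET['lang']) && in_array($_GET['lang'], array('en', 'de', 'fr'))) {
    $_SESSION['lang'] = $_GET['lang'];
}

// On later requests, $_SESSION['lang'] can only hold filtered data,
// because it is persisted on the server and never sent to the client.
$lang = isset($_SESSION['lang']) ? $_SESSION['lang'] : 'en';
?>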

$_SERVER contains a mixture. Some of this data is provided by the web server, and some is provided by the client. Try this simple quiz.

Where does the data in each of the following PHP variables originate?

1. $_SERVER['DOCUMENT_ROOT']
2. $_SERVER['HTTP_HOST']
3. $_SERVER['REQUEST_URI']
4. $_SERVER['SCRIPT_NAME']

Link
Christian Stocker (Bitflux Blog) XML error handling update (8.2.2005, 20:12 UTC)

Rob updated his patch for better XML error handling support in PHP 5.1.

You can now do something like this:

libxml_use_internal_errors(true);
$ret = $dom->load($file);
if (!$ret) {
  $errors = libxml_get_errors();
  foreach ($errors as $error) {
    print $error->message;
    if ($error->file) {
      print " in file " . $error->file . " line: " . $error->line;
    }
    print "\n";
  }
  libxml_clear_errors(); // clear the error buffer once handled
}

If you don't call libxml_use_internal_errors(), it works as it did in PHP 5.0, and the above code would print nothing.

It doesn't work 100% correctly for XSLT errors right now, but we will fix that before 5.1 comes out ;)

Update: The problems with some XSLT errors are now also fixed. Get the patch from the above location (or wait until it's committed into CVS).

Update 2: It's in CVS now.

Link
Sandro Zic LOTS of Workshops and Talks (8.2.2005, 18:52 UTC)

Soon, there will be LOTS of workshops and talks on Open Source in Switzerland. For the second time, this event takes place in Bern/Switzerland, from Feb 17-19.

This is my first time representing eZ systems at an Open Source event, and I am looking forward to it.

I am going to give several presentations on the eZ publish CMS: a workshop, a talk, and a demo. You can grab the first eZ publish Live-CDs there; I am going to bring them with me. They are bootable CDs, based on Mandrake Move, with a ready-made eZ publish pre-installed.

At the end of LOTS, I will moderate a panel discussion on "Open Events - The Open Source Bazaar". Basically, we will try to identify differences and similarities between Open Source and typical business events, how knowledge is shared at "Bazaars" like LOTS, etc.

Link
Christian Stocker (Bitflux Blog) XMLReader in PHP 5.1 (8.2.2005, 12:22 UTC)

Finally, XMLReader made it into the standard PHP distribution for the upcoming 5.1. No more fiddling with pecl (which works fine, btw ;) ), if you want XMLReader built in. It's not enabled by default, but it's in the sources and doesn't have more dependencies than DOM or SimpleXML (that is libxml2 2.6.x).

Documentation is a little bit sparse right now, but see my slides, my posts about Processing Large XML Documents (plus Update), the xmlreader interface tutorial from xmlsoft.org (this is for the C API, but the PHP API is very close, and it's a good introduction to the general idea of xmlreader), or just look at the sources and search for {{{ proto (generally a good idea if documentation is missing for an extension).

If you're still stuck in the ext/xml aka SAX world, it's definitely worth giving XMLReader a try (and it also works with PHP 5.0, if you install it from pecl).
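As a quick taste, here is a minimal pull-parsing loop (a sketch of my own, with a made-up file name; see the docs linked above for the real details):

<?php
// Minimal XMLReader sketch: stream through a document node by node.
// The file name is made up for the example.
$reader = new XMLReader();
$reader->open('large-document.xml');

while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT) {
        print "element: " . $reader->name . "\n";
    } elseif ($reader->nodeType == XMLReader::TEXT) {
        print "text: " . $reader->value . "\n";
    }
}

$reader->close();
?>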

Update: Rob's working on real documentation.

Link
Derick Rethans Why I Don't Use Debian's PHP Packages (8.2.2005, 08:11 UTC)

From their 4.3.10-3 changelog:

Enable Zend Thread Safety for all SAPIs, meaning that our modules are now compiled for ZTS APIs as well.

I couldn't believe that they did this, so I checked the source... their rules file indeed includes the --enable-experimental-zts switch. Tip: Compile your own PHP packages for Debian!

Link