gnegg programming with passion


15
Sep/09
2

Introducing sacy, the Smarty Asset Compiler

We all know how beneficial to the performance of a web application it can be to serve assets like CSS files and JavaScript files in larger chunks as opposed to smaller ones.

The main reason behind this is the latency incurred by requesting a resource from the server, plus the additional bandwidth consumed by the request metadata, which can grow quite large once you take cookies into account.

But knowing this, we also want to keep files separate during development to help us with the debugging and development process. We also want the deployment to not increase too much in difficulty, so we naturally dislike solutions that require additional scripts to run at deployment time.

And we certainly don't want to mess with the client-side caching that HTTP provides.

And maybe we're using Smarty and PHP.

So this is where sacy, the Smarty Asset Compiler plugin comes in.

The only thing you have to do during development (besides a one-time configuration of the plugin) is to wrap all your <link> tags with {asset_compile}....{/asset_compile}. The plugin will do everything else for you, where everything includes:

  • automatic detection of actually linked files
  • automatic detection of changed files
  • automatic minimizing of linked files
  • compilation of all linked files into one big file
  • linking that big file for your clients to consume. Because the file is still served by your webserver, there's no need for complicated handling of client-side caching methods (ETag, If-Modified-Since and friends): Your webserver does all that for you.
  • Because the cached file gets a new URL every time any of the corresponding source files change, you can be sure that requesting clients will retrieve the correct, up-to-date version of your assets.
  • sacy handles concurrency, without even blocking while one process is writing the compiled file (and of course without corrupting said file).
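
To make the last three points a bit more concrete, here is a minimal sketch of the general technique. This is not sacy's actual code: the function names asset_cache_key() and compile_assets() and the idea of one flat cache directory are assumptions made purely for illustration. The compiled file is named after the paths and modification times of its sources, and it is written under a temporary name before being renamed into place.

```php
<?php
// Hypothetical helper: derive a key that changes whenever any source
// file changes (more precisely: whenever a path or an mtime changes).
function asset_cache_key(array $files){
    $parts = array();
    foreach($files as $file){
        $parts[] = $file.':'.filemtime($file);
    }
    return md5(implode('|', $parts));
}

// Hypothetical helper: concatenate the sources into one file named after
// the cache key and return the path of that file.
function compile_assets(array $files, $cache_dir){
    $target = $cache_dir.'/'.asset_cache_key($files).'.css';
    if (file_exists($target)){
        return $target; // already compiled for exactly these sources
    }
    $css = '';
    foreach($files as $file){
        $css .= file_get_contents($file)."\n";
    }
    // Write to a unique temporary file in the same directory, then rename.
    // On POSIX systems a rename within one filesystem is atomic, so a
    // reader sees either no file or the complete file, never half of it.
    $tmp = tempnam($cache_dir, 'asset');
    file_put_contents($tmp, $css);
    chmod($tmp, 0644); // tempnam() creates the file readable by its owner only
    rename($tmp, $target);
    return $target;
}
```

Two processes compiling at the same time would each write their own temporary file and rename it to the same target. As both contain the same data, it does not matter which one wins, and neither has to wait for the other.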

sacy is released under the MIT license and ready to be used (though it currently only handles CSS files and ignores the media-attribute - stuff I'm going to change over the next few days).

Interested? Visit the project's page on GitHub or even better, fork it and help improve it!

26
Jun/08
0

Converting ogg streams into mp3 ones

This is just an announcement of my newest quick hack, which converts streams from webradios that use the ogg/vorbis format on the fly into the mp3 format, which is more widely supported by the various devices out there.

I have created a dedicated page for the project for those who are interested.

Also, I really got to like github.com, not as the commercial service they intend to be (I've already written about the stupidity of hosting your company trade secrets at a company in a foreign country with foreign legislation), but as a place to quickly and easily dump some code you want to be publicly available without going through all the hassle otherwise associated with public project hosting.

This is why this little script is hosted there and not here. As I'm using git, even if github goes away, I still have the full repository around to either self-host or let someone else host for me, which is a crucial requirement for me to outsource anything.

25
Jun/08
2

Simplest possible RPCs in PHP

After spending hours trying to find out why a particular combination of SoapClient in PHP itself and SOAP::Server from PEAR didn't consistently work together (sometimes, arrays passed around lost an arbitrary number of elements), I thought about what would be needed to make RPCs work from a PHP client to a PHP server.

I wanted nothing fancy and I certainly wanted as little overhead as humanly possible.

This is what I came up with for the server:

<?php
header('Content-Type: text/plain');
 
require_once('a/file/containing/a/class/you/want/to/expose.php');
 
$method = str_replace('/', '', $_SERVER['PATH_INFO']);
 
if ($_SERVER['REQUEST_METHOD'] != 'POST'){
   sendResponse(array('state' => 'error', 'cause' => 'unsupported HTTP method'));
}
 
$s = new MyServerObject();
$params = unserialize(file_get_contents('php://input'));
if ( ($res = call_user_func_array(array($s, $method), $params)) === false)
   sendResponse(array('state' => 'error', 'cause' => 'RPC failed'));
if (is_object($res))
   $res = get_object_vars($res);
sendResponse($res);
 
function sendResponse($resobj){
    echo serialize($resobj);
    exit;
 
}
 
?>

The client shown below is a bit more complex, mainly because it contains some HTTP protocol logic. That logic could possibly be reduced to 2-3 lines of code if I used the CURL library, but the client in this case does not have the luxury of having access to such functionality.

Also, I already had the function lying around (/me winks at domi), so that's what I used (as opposed to file_get_contents with a pre-prepared stream context). This way, we DO have the advantage of learning a bit about how HTTP works and we are totally self-contained.

<?php
class Client{
    function __call($name, $args){
        $req = $this->openHTTPRequest('http://localhost:5436/restapi.php/'.$name, 'POST', array('Content-Type' => 'text/plain'), serialize($args));
        $data = unserialize(stream_get_contents($req['handle']));
        fclose($req['handle']);
        return $data;
    }
    private function openHTTPRequest($url, $method = 'GET', $additional_headers = null, $data = null){
        $parts = parse_url($url);
 
        $fp = fsockopen($parts['host'], isset($parts['port']) ? $parts['port'] : 80);
        // append the query string only if the URL actually has one
        $path = $parts['path'].(isset($parts['query']) ? '?'.$parts['query'] : '');
        fprintf($fp, "%s %s HTTP/1.1\r\n", $method, $path);
        fputs($fp, "Host: ".$parts['host']."\r\n");
        if ($data){
            fputs($fp, 'Content-Length: '.strlen($data)."\r\n");
        }
        if (is_array($additional_headers)){
            foreach($additional_headers as $name => $value){
                fprintf($fp, "%s: %s\r\n", $name, $value);
            }
        }
        fputs($fp, "Connection: close\r\n\r\n");
        if ($data)
            fputs($fp, "$data\r\n");
 
        // read away header
        $header = array();
        $response = "";
        while(!feof($fp)) {
            $line = trim(fgets($fp, 1024));
            if (empty($response)){
                $response = $line;
                continue;
            }
            if (empty($line)){
                break;
            }
            list($name, $value) = explode(':', $line, 2);
            $header[strtolower(trim($name))] = trim($value);
        }
        return array('response' => $response, 'header' => $header, 'handle' => $fp);
   }
 
}
 
$client = new Client();
$result = $client->someMethod(array('data' => 'even arrays work'));
 
?>

What you can't pass around this way is objects (at least objects which are not of type stdClass), as both client and server would need to have access to the class definition. Also, this seriously lacks error handling. But it generally works much better than what SOAP ever could accomplish.
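
For comparison, the file_get_contents variant with a prepared stream context mentioned above could look roughly like this. It is a minimal sketch under assumptions: it needs allow_url_fopen to be enabled, and the helper names rpc_context() and rpc_call() are made up for this example.

```php
<?php
// Hypothetical helper: build the stream context for one RPC call.
function rpc_context(array $args){
    return stream_context_create(array('http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: text/plain\r\nConnection: close\r\n",
        'content' => serialize($args),
    )));
}

// Hypothetical helper: call $method on the endpoint at $base_url and
// return the unserialized response, or false on a transport error.
function rpc_call($base_url, $method, array $args){
    $raw = file_get_contents($base_url.'/'.$method, false, rpc_context($args));
    if ($raw === false){
        return false; // a real client would report more detail here
    }
    return unserialize($raw);
}
```

With the endpoint from the listing above, a call would be rpc_call('http://localhost:5436/restapi.php', 'someMethod', array(array('data' => 'even arrays work'))). The same limitations apply: both sides must be PHP and error handling is still thin.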

Naturally, I give up stuff when compared to SOAP or any «real» RPC solution:

  • This one works only with PHP
  • It has limitations on what data structures can be passed around, though that's alleviated by PHP's incredibly strong array support.
  • It relies heavily on PHP's loosely typed nature and thus probably isn't as robust.

Still, protocols like SOAP (or even any protocol with either «simple» or «lightweight» in its name) tend to be so complicated that it's incredibly hard if not impossible to create different implementations that still correctly work together in all cases.

In my case, I have to separate two pieces of the same application because of unstable third-party libraries which I would not want to have linked into every PHP instance running on that server. For that, the solution outlined above (plus some error handling code) works better than SOAP on so many levels:

  • it's easily debuggable. No need for wireshark or comparable tools
  • client and server are written by me, so they are under my full control
  • it works all the time
  • it relies on as little functionality of PHP as possible and the functionality it depends on is widely used and tested, so I can assume that it's reasonably bug-free (aside from my own bugs).
  • it's a whole lot faster than SOAP, though this does not matter at all in this case.
31
Aug/07
0

PHP 5.2.4

Today, the bugfix-release 5.2.4 of PHP has been released.

This is an interesting release, because it includes my fix for bug 42117 which I discovered and fixed a couple of weeks ago.

This means that with PHP 5.2.4 I will finally be able to bzip2-encode data as it is generated on the server and stream it out to the client, greatly speeding up our windows client.

Now I only need to wait for the updated gentoo package to update our servers.

27
Jul/07
1

PHP, stream filters, bzip2.compress

Maybe you remember that, more than a year ago, I had an interesting problem with stream filters.

The general idea is that I want to output bz2-compressed data to the client as the output is being assembled - or, more to the point: The PopScan Windows-Client supports the transmission of bzip2 encoded data which gets really interesting as the amount of data to be transferred increases.

Even more so: The transmitted data is in XML format which is very easily compressed - especially with bzip2.

Once you begin to transmit multiple megabytes of uncompressed XML-data, you begin to see the sense in jumping through a hoop or two to decrease the time needed to transmit the data.

On the receiving end, I have an elaborate construct capable of downloading, decompressing, parsing and storing data as it arrives over the network.

On the sending end though, I have been less lucky: Because of that problem I had, I was unable to stream out bzip2 compressed data as it was generated - the end of the file was sometimes missing. This is why I'm using ob_start() to gather all the output and then compress it with bzcompress() to send it out.

Of course this means that all the data must be assembled before it can be compressed and then sent to the client.

As we have more and more data to transmit, the client must wait longer and longer before the data begins to reach it.

And then comes the moment when the client times out.

So I finally really had to fix the problem. I could not believe that I was unable to compress and stream out data on the fly.

It turns out that I finally found the smallest possible amount of code to illustrate the problem in a non-hacky way:

So: This fails under PHP up until 5.2.3:

<?
$str = "BEGIN (%d)\n
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.
\nEND (%d)\n";

$h = fopen($_SERVER['argv'][1], 'w');
$f = stream_filter_append($h, "bzip2.compress", STREAM_FILTER_WRITE);
for($x=0; $x < 10000; $x++){
   fprintf($h, $str, $x, $x);

}
fclose($h);
echo "Written\n";
?>

Even worse though: It doesn't fail with a message, but it writes out a corrupt bzip2 file.

And it gets worse: With a small amount of data it works, but as the amount of data increases, it begins to fail - at different places depending on how you shuffle the data around.

The above script will write a bzip2 file which - when uncompressed - will end around iteration 9600.

So now that I had a small reproducible testcase, I could report a bug in PHP: Bug 42117.

After spending so many hours on a problem which in the end boiled down to a bug in PHP (I've looked everywhere, believe me. I also tried workarounds, but all to no avail), I just could not let the story end there.

Some investigation quickly turned up a wrong check for a return value in bz2_filter.c which I was able to patch up very, very quickly, so if you visit that bug above, you will find a patch correcting the problem.

Then, once I had finished patching PHP itself, hacking up the PHP code needed to stream out the compressed data as it arrives was easy. If you want, you can have a look at bzcomp.phps which demonstrates how to plug the compression into either the output buffer handling or something else that is quick, dirty and easier.

Oh, and if you are tempted to do this:

function ob($buf){
        return bzcompress($buf);
}

ob_start('ob');

... it won't do any good because you will still gobble up all the data before compressing. And this:

function ob($buf){
        return bzcompress($buf);
}

ob_start('ob', 32768);

will encode in chunks (good), but it will write a bzip2-end-of-stream marker after every chunk (bad), so neither will work.
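
What does work on a PHP with the fix applied is to hang the filter on the output stream itself, so there is one compressor for the whole response and one end-of-stream marker when the stream is closed. The following is only a minimal sketch of that idea and not the contents of bzcomp.phps; the helper names open_bzip2_writer() and write_rows() are made up for illustration.

```php
<?php
// Hypothetical helper: open $target for writing and attach the
// bzip2.compress filter, so everything written afterwards is compressed
// on the fly by a single compressor instance.
function open_bzip2_writer($target){
    $h = fopen($target, 'w');
    if ($h === false){
        throw new Exception("cannot open $target");
    }
    if (stream_filter_append($h, 'bzip2.compress', STREAM_FILTER_WRITE) === false){
        throw new Exception('bzip2.compress filter is not available');
    }
    return $h;
}

// Hypothetical helper: stand-in for whatever generates the real output.
function write_rows($h, $count){
    for($x = 0; $x < $count; $x++){
        fprintf($h, "row %d\n", $x);
    }
}
```

In a web request the target would be 'php://output': open it once, write the data as it is generated and call fclose() at the end, which is the point where the filter flushes its remaining buffer.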

Nothing more satisfying than to fix a bug in someone else's code. Now let's hope this gets applied to PHP itself so I don't have to manually patch my installations.

10
Aug/06
2

Profiling PHP with Xdebug and KCacheGrind

Profiling can provide real revelations.

Sometimes, you have that gut feeling that a certain code path is the performance bottleneck. Then you go ahead and fix that, only to see that the code is still slow.

This is when a profiler kicks in: It helps you determine the real bottlenecks, so you can start fixing them.

The PHP IDE I'm currently using, Zend Studio (it's the only PHP IDE filling my requirements on the Mac currently) does have a built-in profiler, but it's a real bitch to set up.

You need to install some binary component into your web server. Then the IDE should be able to debug and profile your application.

Emphasis on "should".

I got it to work once, but it broke soon after and I never really felt inclined to put more effort into this - even more so as I'm from time to time working with a snapshot version of PHP for which the provided binary component may not work at all.

There's an open source solution that works much better both in terms of information you can get out of it and in terms of ease of setup and use.

It's Xdebug.

On gentoo, installing is a matter of emerge dev-php5/xdebug and on other systems, pecl install xdebug might do the trick.

Configuration is easy too.

Xdebug generates profiling information in the cachegrind format known from valgrind, the incredible debugging and profiling tool.

And once you have that profiling information, you can use a tool like KCacheGrind to evaluate the data you've collected.

The tool provides some incredibly useful views of your code, making finding performance problems a joyful experience.

Best of all though is that I was able to compile KCacheGrind along with its dependencies on my MacBook Pro - another big advantage of having a real UNIX backend on your desktop.

By the way: Xdebug is also a debugger for PHP, though I've never used it for that as I never felt the need to step through PHP code. Because PHP code doesn't have to be compiled, it is often faster to instrument the code and just run the thing - especially once the code is spread over a multitude of files.

20
Jul/06
1

Template engines complexity

The current edition of the German computer magazine iX has an article comparing different template engines for PHP.

When I read it, the old discussion about Smarty providing too many flow controlling options sprang to my mind again, even though that article itself doesn't say anything about whether providing a rich template language is good or not.

Many purists out there keep telling us that no flow control whatsoever should be allowed in a template. The only thing a template should allow is to replace certain markers with some text. Nothing more.

Some other people insist that having blocks which are parsed in a loop is ok too. But all the options Smarty provides are out of the question, as they begin intermixing logic and design again.

I somewhat agree on that argument. But the problem is that if you are limited to simple replacements and maybe blocks, you begin to create logic in PHP which serves no other purpose than filling that specially created block structure.

What happens is that you end up with a layer of PHP (or whatever other language) code that's so closely tailored to the template (or even templates - the limitations of the block/replacement engines often require you to split a template into many partial files) that even the slightest changes in layout structure will require a rewrite in PHP.

Experience shows me that if you really intend to touch your templates to change the design, it won't suffice to change the order of some replacements here and there. You will be moving parts around and more often than not the new layout will force changes in the different blocks / template files (imagine marker {mark} moving from block HEAD to block FOOT).

So if you want to work with the stripped-down template engines while still keeping the layout easily exchangeable, you'll create layout classes in PHP which get called from the core. These in turn use tightly coupled code to fill the templates.

When you change the layout, you'll dissect the page layouts again, recreate the wealth of template files / blocks and then update your layout classes. This means that changing the layout does in fact require your PHP backend coders to work with the designers yet again.

Take Smarty.

Basically you can feed a template a defined representation of view data (or even better: Your model data) in unlimited complexity and in raw form. You want to have floating point numbers on your template represented with four significant digits? Not your problem with Smarty. The template guys can do the formatting. You just feed a float to the template.

In other engines, formatting numbers for example is considered backend logic and thus must be done in PHP.

This means that when the design requirement in my example changes and numbers must be formatted with 6 significant digits, the designer is stuck. He must refer back to you, the programmer.

Not with Smarty. Remember: You got the whole data in a raw representation. A Smarty template guy knows how to format numbers from within Smarty. He just makes the change (which is a presentation change only) right in the template. No need to bother the backend programmer.

Furthermore, look at complex structures. Let's say a shopping cart. With Smarty, the backend can push the whole internal representation of that cart to the template (maybe after some cleaning up - I usually pass an associative array of data to the template to have a unified way of working with model data over all templates). Now it's your Smarty guy's responsibility (and possibility) to do whatever job he has to do to format your model (the cart) in the way the current layout specification asks him to.

If the presentation of the cart changes (maybe some additional text info must be displayed which the template was not designed for in the first place), the model and the whole backend logic can stay the same. The template just uses the model object it's provided with to display that additional data.

Smarty is the template engine that allows you to completely decouple the layout from the business logic.

And let's face it: Layout DOES in fact contain logic: Alternating row colors, formatting numbers, displaying different texts if no entries could be found,...

When you remove logic from the layout, you will have to move it to the backend, which immediately means that you will need a backend worker whenever the layout logic changes (which it always does on redesigns).

Granted. Smarty isn't exactly easy to get used to for an HTML-only guy.

But think of it: They managed to learn to replace <font> tags in their code with something more reasonable (CSS), that works completely differently and follows a completely different syntax.

What I want to say is that your layout guys are not stupid. They are well capable of learning the little bits and pieces of logic you'd want to have in your presentation layer. Letting them have that responsibility means that you yourself can go back to the business logic once and for all. Your responsibility ends after pushing model objects to the view. The rest is the Smarty guy's job.

Being in the process of redesigning a fully Smarty-based application right now, I can tell you: It works. PHP does not need to get touched (mostly - design flaws exist everywhere). This is a BIG improvement over other stuff I've had to do before, which was using the way everyone is calling clean: PHPLIB templates. I still remember fixing up tons and tons of PHP code that was tightly coupled to the limited structure of the templates.

In my world, you can have one backend, no layout code in PHP and an unlimited number of layout templates. Interchangeable without changing anything in the PHP code. Without adding any PHP code when creating a new template.

Smarty is the only PHP template engine I know of that makes that dream come true.

Oh and btw, Smarty won the performance contest in that article by a wide margin over the second fastest entry. So bloat can't be used as an argument against Smarty. Even if it IS bloated, it's not slower than non-bloated engines. It's faster.

Filed under: PHP, Programming
13
Jul/06
0

Blogroll is back – on steroids

I finally got around to adding an excerpt of the list of blogs I'm regularly reading to the navigation bar to the right.

The list is somewhat special as it's auto-updating: It refreshes every 30 minutes and displays a list of blogs in descending order of last-updated-time.

Adding the blogroll was a multi step process:

At first, I thought adding the Serendipity blogroll plugin and pointing it to my Newsgator subscription list (I'm using Newsgator to always have an up-to-date read-status in both Net News Wire and FeedDemon) was enough, but unfortunately, that did not turn out to be the case.

First, the expat module of the PHP installation on this server has a bug making it unable to parse files with the Unicode byte order mark at the beginning (in UTF-8, three bytes that identify the encoding; in UTF-16, the mark also tells your machine whether the document is little- or big-endian). So it was clear that I had to do some restructuring of the OPML-feed (or patching around in the s9y plugin, or upgrading PHP).

Additionally, I wanted the list to be sorted in a way that the blogs with the most recent postings will be listed first.

My quickly hacked-together solution is this script which uses an RSS/Atom-parser I took from Wordpress, which means that the script is licensed under the GNU GPL (as the parser is).

I'm calling it from a cron-job once every 30 minutes (that's why the built-in cache is disabled in this configuration) to generate the OPML-file sorted by the individual feeds' update time stamps.

That OPML-file then is fed into the serendipity plugin.
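
The two core steps, getting rid of the byte order mark and ordering the feeds by their last update, can be sketched in a few lines. This is an illustration only and not the script linked above: the helper names and the shape of the feed arrays ('title', 'url' and an 'updated' Unix timestamp) are assumptions.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 byte order mark (EF BB BF).
function strip_utf8_bom($xml){
    if (substr($xml, 0, 3) === "\xEF\xBB\xBF"){
        return substr($xml, 3);
    }
    return $xml;
}

// Hypothetical helper: most recently updated feed first.
function sort_feeds_by_update(array $feeds){
    usort($feeds, function($a, $b){
        if ($a['updated'] == $b['updated']){
            return 0;
        }
        return $a['updated'] > $b['updated'] ? -1 : 1;
    });
    return $feeds;
}

// Hypothetical helper: render the sorted feeds as a minimal OPML document.
function feeds_to_opml(array $feeds){
    $out  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    $out .= "<opml version=\"1.1\">\n<head><title>Blogroll</title></head>\n<body>\n";
    foreach(sort_feeds_by_update($feeds) as $feed){
        $out .= sprintf("  <outline type=\"rss\" text=\"%s\" xmlUrl=\"%s\"/>\n",
            htmlspecialchars($feed['title'], ENT_QUOTES),
            htmlspecialchars($feed['url'], ENT_QUOTES));
    }
    return $out."</body>\n</opml>\n";
}
```

The 'updated' value would come from parsing each feed and taking the date of its newest entry, which is the part the real script delegates to the Wordpress parser.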

The only problem I now have is that the list unfairly favours the aggregated feeds, as these are updated much more often than individual persons' blogs. In the future I will thus either create a penalty for these feeds, remove them from the list or just plain show more feeds on the page.

Still, this was a fun hack to do and fulfills its purpose. Think of it: Whenever I add a feed in either Net News Wire or FeedDemon, it will automatically pop up on the blogroll on gnegg.ch - this is really nice.

On a side note: I could have used the Newsgator API to get the needed information faster and probably even without parsing the individual feeds. Still, I went the OPML-way as that's an open format, making the script useful for other people or for me should I ever change the service.

9
Feb/06
0

PHP Stream Filters

You know what I want? I want to append one of those nice and shiny PHP stream filters to the output stream.

I have this nice windows-application that receives a lot of XML-data that can be compressed with a very high compression factor. And as the windows application is for people with very limited bandwidth, this seems to be the perfect thing to do.

You know, I CAN compress all my output already. By doing something like this:

<?php
ob_start();
echo "stuff";
$c = ob_get_clean();
echo bzcompress($c);
?>

The problem with this approach is that the data is only sent to the client once it's assembled completely. bzip2 on the other hand is a stream compressor that is very well able to compress a stream of data and send it out as soon as a chunk is ready.

The windows client on the receiving end is certainly capable of doing that. As soon as bytes come in, it decompresses them chunk-wise and feeds them to an Expat based parser which will handle the extracted data. Now I want this to happen on the sending side as well.

The following code does work sometimes:

<?php
  $fh = fopen('php://stdout', 'w');
  stream_filter_append($fh, 'bzip2.compress', STREAM_FILTER_WRITE);
  fwrite($fh, "Stuff");
  fclose($fh);
?>

But sometimes it doesn't and produces an incomplete bzip2-stream.

I have a certain idea of why this is happening (no sending out of data to the filter on shutdown), but I can't prevent it. Sometimes the data is not put out which makes this method unusable.

I'm afraid to report this to bugs.php.net as I'm sure it's something PHP was not designed for and it'll get marked as BOGUS faster than I can spell 'gnegg'.

So this means that the windows-client just has to wait for the data being extracted, converted to xml and compressed.

*sigh*

(thinking of it, there may be this option of outputting data to a temp-file (with a filter attached to its handle) and then reading it out to the browser immediately afterwards. But come on, this can't be the solution, can it?)

Update: I've since tracked the problem to a bug in PHP itself for which I found a fix. My assumption that writing to a temporary file could help was wrong, as PHP itself does not check the return value of a bzlib function correctly and never writes out a half-full buffer on stream close, neither to the output stream nor to a file.

11
Jan/06
3

mp3act

When you have a home server, sooner or later your coworkers and friends (and if all is well even both in one person ;-) ) will want to have access to your library.

Cablecom, my ISP, has this nice 6000/600 service, so there's plenty of upstream for others to use in principle. And you know: Here in Switzerland, the private copy among friends is still legal.

Well, last Sunday it was time again. Richard wanted access to my large collection of audiobooks and if you know me (and you do as a reader of this blog), you'll know that I can't just give him those files on a DVD-R or something. No. A web-based mp3-library had to be found.

Last few times, I used Apache::MP3, but that grew kinda old on me. You know: It's a perl module and my home server does not have mod_perl installed. And I'm running Apache 2 for which Apache::MP3 is not ported yet AFAIK. And finally, I'm far more comfortable with PHP, so I wanted something written in that language so I could make a patch or two on my own.

I found mp3act, which is written in PHP and provides a very, very nice AJAX based interface. Granted, it breaks the back-button, but everything else is very well done.

And it's fast. Very fast.

Richard liked it, and Christoph is currently trying to install it on his windows server, though not as successfully as he would like. mp3act is quite Unix-only currently.

The project is in an early state of development and certainly has a rough edge here and there, but in the end, it's very well done, serves its purpose and is even easily modifiable (for me). Nice.

Filed under: PHP, Software