Feb/100
PHP 5.3 and friends on Karmic
I have been patient. For months I hoped that Ubuntu would sooner or later get PHP 5.3, a release I'm very much looking forward to, mainly because of the addition of anonymous inner functions to spell the death of create_function or even eval.
We didn't get 5.3 for Karmic, and who knows about Lucid. It's crazy that nearly one year after the release of 5.3, there is still debate on whether to include it in the next version of Ubuntu, which will be the current LTS release for the next four years. This is IMHO quite a disservice to PHP 5.3 adoption.
Anyways: We are in the process of releasing a huge update to PopScan that is heavily focussed on getting rid of cruft, increasing speed all over the place and increasing overall code quality. Especially the last part could benefit from having 5.3 and seeing that at this point PopScan already runs well on 5.3, I really wanted to upgrade.
In comes Al-Ubuntu-be, a coworker of mine, with his awesome Debian packaging skills: While there are already a few PPAs out there that contain a 5.3 package, Albe went the extra step and added not only PHP 5.3 but also quite a few other packages we depend upon that might be useful to my readers as well: packages like APC, memcache and imagick, plus xdebug for development.
While we can make no guarantees that these packages will be maintained heavily, they will get some security update treatment (though highly likely by version bumping as opposed to backporting).
So. If you are on Karmic (and later Lucid if it won't get 5.3) and want to run PHP 5.3 with APC and Memcache, head over to Albe's PPA.
Also, I'd like to take the opportunity to thank Albe for his efforts: Having a PPA with real .deb packages, as opposed to the self-compiled mess I would have produced, gives us a much nicer way of updating existing installations to 5.3 and an even nicer path back to the original packages once they come out. Thanks a lot.
Feb/100
Things I can’t do with an iPhone/iPad
- have a VoIP call going on when a mobile call/SMS arrives
- read Kindle ebooks (I can now, but knowing Apple's stance on "competing functionality", with the advent of iBooks, how long do you think this will last?)
- give it to our customers as another device to use with PopScan (it can't be locked down and there's no way to do centralized app deployment that doesn't go through Apple)
- plug in any peripheral that isn't Apple-sanctioned
- plug a peripheral and use it system-wide
- play a SNES ROM (or any other console rom)
- install Adblock (which especially hurts on the iPad)
- consistently use IM (background notifications don't work reliably)
The iPhone provides me with many advantages and thus I can live with its inherent restrictions (which are completely arbitrary - there's no technical reason for them), but I see no point in buying yet another locked-down device that does half of the stuff I'd want it to do and does it half-assed at that.
Also it's a shame that Apple obviously doesn't need any corporate customers (at least for a small company, I see no possibility).
I just hope the open and usable Mac computer remains. I would not know what to go back to. Windows? Never. Linux? Sure. But on what hardware?
Jan/100
How we use git
The following article was a comment I made on Hacker News, but as it's quite big and as I want to keep my stuff in a central place, I'm hereby reposting it, adding a bit of formatting and shameless self-promotion (i.e. links):
My company is working on a - by now - quite large web application. Initially (2004), I began with CVS and then moved to SVN and in the second half of last year, to git (after a one-year period of personal use of git-svn).
We deploy the application for our customers - sometimes to our own servers (both self-hosted and in the cloud) and sometimes to their machines.
Until the middle of last year, as a consequence of SVN's really crappy handling of branches (it can branch, but it fails at merging), we did very incremental development, adding features on customer request and bugfixes as needed, oftentimes uploading specific fixes to different sites and committing them to trunk, but rarely ever updating existing installations to trunk, in order to keep them stable.
Huge mess.
With the switch to git, we also introduced real release management, doing one feature release every six months and keeping the released versions on strict maintenance (for all intents and purposes - the web application is highly customizable and we do make exceptions in the customized parts so as to react to immediate feature wishes of clients).
What we are doing git-wise is the reverse of what the article shows: Bug-fixes are (usually) done on the release-branches, while all feature development (except for these customizations) is done on the main branch (we just use the git default name "master").
We branch off of master when another release date nears and then tag a specific revision of that branch as the "official" release.
There is a central gitosis repository which contains what is the "official" repository, but every one of us (4 people working on this - so we're small compared to other projects I guess) has their own gitorious clone which we heavily use for code-sharing and code review ("hey - look at this feature I've done here: Pull branch foobar from my gitorious repo to see...").
With this strict policy of (for all intents and purposes) "fixes only" and especially "no schema changes", we can even auto-update customer installations to the head of their respective release-branches which keeps their installations bug-free. This is a huge advantage over the mess we had before.
Now. As master develops and bug-fixes usually happen on the branch(es), how do we integrate them back into the mainline?
This is where the concept of the "Friday merge" comes in.
On Friday, my coworker or I usually merge all changes in the release-branches upwards until they reach master. Because it's only a week worth of code, conflicts rarely happen and if they do, we remember what the issue was.
If we do a commit on a branch that doesn't make sense on master because master has sufficiently changed or a better fix for the problem is in master, then we mark these with [DONTMERGE] in the commit message and revert them as part of the merge commit.
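A minimal sketch of how the [DONTMERGE] commits could be collected before such a merge, so they are not forgotten when it is time to revert them. This is an illustration only, not the tooling described in this post: the function names and the branch name "release-1.0" are hypothetical, and it assumes the log is produced with `git log --format="%H %s"` (full hash, a space, then the subject line).

```python
import subprocess

# Marker taken from the convention described above.
MARKER = "[DONTMERGE]"


def dontmerge_commits(log_output):
    """Return the hashes of all commits whose subject contains the marker.

    Expects one commit per line in the form "<hash> <subject>".
    """
    hashes = []
    for line in log_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        commit_hash, _, subject = line.partition(" ")
        if MARKER in subject:
            hashes.append(commit_hash)
    return hashes


def branch_only_log(branch, mainline="master"):
    """Hypothetical helper: log of commits on `branch` that are not yet in `mainline`."""
    result = subprocess.run(
        ["git", "log", "--format=%H %s", f"{mainline}..{branch}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Inside a repository, `dontmerge_commits(branch_only_log("release-1.0"))` would then list the commits to revert once the merge has been made.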
On the other hand, in case we come across a bug during development on master and we see how it would affect production systems badly (like a security flaw - not that they happen often), and if we have already devised a simple fix that is safe to apply to the branch(es), we fix it on master and then cherry-pick the fix onto the branches.
This concept of course heavily depends upon clean patches, which is another feature git excels at: Using features like interactive rebase and interactive add, we can actually create commits that
- Either do whitespace or functional changes. Never both.
- Only touch the lines absolutely necessary for any specific feature or bug
- Do one thing and only one.
- Contain a very detailed commit message explaining exactly what the change encompasses.
This, in turn, allows me to create extremely clean (and exhaustive) change logs and NEWS file entries.
Now some of these policies about commits were a bit painful to actually make everyone adhere to, but over time, I was able to convince everybody of the huge advantage clean commits provide even though it may take some time to get them into shape (also, you gain that time back once you have to do some blame-ing or other history digging).
Using branches with only bug-fixes and auto-deploying them, we can increase the quality of customer installations and using the concept of a "Friday merge", we make sure all bug-fixes end up in the development tree without each developer having to spend an awful long time to manually merge or without ending up in merge-hell where branches and master have diverged too much.
The addition of gitorious for easy exchange of half-baked features to make it easier to talk about code before it gets "official" helped to increase the code quality further.
git was a tremendous help with this and I would never in my life want to go back to the dark days.
I hope this additional insight might be helpful for somebody still thinking that SVN is probably enough.
Jan/100
linktrail – a failed startup – introduction
I guess it's inevitable. Good ideas may fail. And good ideas may be years ahead of their time. And of course, sometimes, people just don't listen.
But one never stops learning.
In the year 2000, I took part in a plan of a couple of guys to become the next Yahoo (Google wasn't quite there yet back then), or, to use the words we used on the site,
For these reasons, we have designed an online environment that offers a truly new way for people to store, manage and share their favourite online resources and enables them to engage in long-lasting relationships of collaboration and trust with other users.
The idea behind the project, called linktrail, was basically what would much later on be picked up by the likes of twitter, facebook (to some extent) and the various community based news sites.
The whole thing went down the drain, but the good thing is that I was able to legally salvage the source code, to install it on a personal server of mine and to publish the source code. And now that so many years have passed, it's probably time to tell the world about this, which is why I have decided to start this little series about the project. What is it? How was it made? And most importantly: Why did it fail? And consequently: What could we have done better?
But let's first start with the basics.
As I said, I was able to legally acquire the database and code (which is mostly written by me anyways) and to install the site on a server of mine, so let's get that out to start with. The site is available at linktrail.pilif.ch. What you see running there is the result of 6 months of programming by myself after a concept done by the guys I've worked with to create this.
What is linktrail?
If the tour we made back then is any good, then just taking it would probably be enough, but let me phrase it in my own words: The site is a collection of so-called trails, which in turn are small units, comparable to blogs, consisting of links, titles and descriptions. These micro-blogs are shown in a popup window (that's what we had back then) beside the browser window to allow quick navigation between the different links in the trail.
Trails are made by users, either by each user on their own or as a collaborative work between multiple users. The owner of a trail can hand out permissions to everybody or their friends (using a system quite similar to what we currently see on facebook for example)
A trail is placed in a directory of trails which was built around the directory structures we used back then, though by now, we would probably do this quite differently. Users can subscribe to trails they are interested in. In that case, they will be notified if a trail they are subscribed to is updated either by the owner or anybody else with the rights to update the trail.
Every user (called expert in the site's terms) has their profile page (here's mine) that lists the trails they created and the ones they are subscribed to.
The idea was for you as a user to find others with similar interests and form a community around those interests to collaborate on trails. An in-site messaging system helped users to communicate with each other: Aside from just sending plain text messages, it's possible to recommend trails (for easy one-click subscription).
linktrail was my first real programming project, basically 6 months after graduating in what the US would call high school. Combine that fact with the fact that it was created during the high times of the browser wars (year 2000, remember) with web standards basically non-existing, then you can imagine what a mess is running behind the scenes.
Still, the site works fine within those constraints.
In future posts, I will talk about the history of the project, about the technology behind the site, about special features and, of course, about why this all failed and what I would do differently - both in matters of code and organization.
If I have piqued your interest, feel free to have a look at the code of the site, which I just now converted from CVS (I started using CVS about 4 months into development, so the first commit is HUGE) to SVN to git and put up on github for public consumption. It's licensed under a BSD license, but I doubt that you'd find anything useful in this mess of PHP3(!) code (though it runs unchanged(!) on PHP5 - topic of another post I guess), HTML 3.2(!) tag soup and JavaScript hacks.
Oh and if you can read German, I have also converted the CVS repository that contained the concept papers that were written over time.
In preparation of this series of blog-posts, I have already made some changes to the code base (available at github):
- login after register now works
- warning about unencrypted(!) passwords in the registration form
- registering requires you to solve a reCAPTCHA.
Dec/090
JSONP. Compromised in 3…2…1…
To embed a vimeo video on some page, I had a look at their different methods for embedding and the easiest one seemed to be what is basically JSONP - a workaround for the usual restriction of disallowing AJAX over domain boundaries.
But did you know that JSONP not only works around the cross-domain restriction, but basically is one huge cross-site scripting exploit, and that there's nothing you can do about it?
You might have heard this and you might have found articles like this one, thinking that using libraries like that would make you safe. But that's an incorrect assumption. The solution provided in the article has it backwards and only helps to protect the originating site against itself, but it does not help at all to protect the calling site from the remote site.
You see, the idea behind JSONP is that you source the remote script using <script src="http://remote-service.example.com/script.js"> and the remote script then (after being loaded into your page and thus being part of your page) is supposed to call some callback of the original site (from a browser's standpoint it is part of the original site).
The problem is that you do not get control over the loading let alone content of that remote script. Because the cross-domain restrictions prevent you from making an AJAX request to a remote server, you are using the native HTML methods for cross domain requests (which should not have been allowed in the first place) and at that moment you relinquish all control over your site as that remotely loaded script runs in the context of your page, which is how you get around the cross domain restrictions - by loading the remote script into your page and executing it in the context of your page.
Because you never see that script until it is loaded, you cannot control what it can do.
Using JSONP is basically subjecting yourself to an XSS attack by giving the remote end complete control over your page.
And I'm not just talking about malicious remote sites... what if they themselves are vulnerable to some kind of attack? What if they were the target of a successful attack? You can't know and once you do know it's too late.
This is why I would recommend you never rely on JSONP and find other solutions for remote scripting: Use a local proxy that does sanitization (i.e. strict JSON parsing, which will save you), or rely on the cross-domain messaging that was added in later revisions of the upcoming HTML5 standard.
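To make the "local proxy with strict JSON parsing" idea a bit more concrete, here is a minimal sketch of the sanitization step such a proxy could perform on the remote response before handing anything to the browser. It is an illustration under assumptions, not a complete proxy: the function name and the callback name "cb" are made up, and the HTTP fetching is left out entirely.

```python
import json
import re


def _reject_constant(name):
    # json.loads accepts NaN/Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")


def extract_json_from_jsonp(body, expected_callback):
    """Return the parsed payload of a response shaped like callback(<json>);

    Anything that is not exactly one call to the expected callback with a
    strictly valid JSON argument raises ValueError, so no remote code is
    ever evaluated or passed on.
    """
    pattern = re.compile(
        r"\A\s*" + re.escape(expected_callback) + r"\s*\((.*)\)\s*;?\s*\Z",
        re.DOTALL,
    )
    match = pattern.match(body)
    if match is None:
        raise ValueError("response is not a single call to the expected callback")
    # json.loads only parses data; it never executes anything.
    return json.loads(match.group(1), parse_constant=_reject_constant)
```

The proxy would then re-serialize the result with `json.dumps` and serve it from the local domain, so the page only ever receives data, never a script.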
Oct/090
Sense of direction vs. field of view
Last Saturday, I bought the Metroid Prime Trilogy for the Wii. I didn't yet have the Wii Metroid and it's impossible for me to use the GameCube to play the old games as the distance between my couch and the receiver is too large for the GameCube's wired joypads. It has been a long while since I last played any of the 3D Metroids, and seeing the box in a store made me want to play them again.
So all in all, this felt like a good deal to me: Getting the third Prime plus the possibility to easily play the older two for the same price that they once asked for the third one alone.
Now I'm in the middle of the first game and I made a really interesting observation: My usually very good sense of direction seems to require a minimum sized field of view to get going: While playing on the GameCube, I was constantly busy looking at the map and felt unable to recognize even the simplest landmarks.
I spent the game in a constant state of feeling lost, not knowing where to go and forgetting how to get back to places where I had seen then-unreachable power-ups.
Now it might just be that I remember the world from my first playthrough, but this time, playing feels completely different to me: I constantly know where to go and where I am. Even with rooms that are very similar to each other, I constantly know where I am and how to get from point A to point B.
When I want to re-visit a place, I just go there. No looking at the map. No backtracking.
This is how I usually navigate the real world, so after so many years of feeling lost in 3D games, I'm finally able to find my way in them as well.
Of course I'm asking myself what has changed, and in the end it's either the generally larger screen size and wide-screen format of the Wii port or maybe the controls via the Wiimote, which feel much more natural. The next step for me will be to try and find out which it is by connecting the Wii to a smaller (but still wide) screen.
But aside from all that, Metroid just got even better - not that I believed that to be possible.
Sep/090
Programming language names
Today in the office, a discussion about the merits of Ruby compared to Python and the other way around (isn't it fun to have people around actually willing to discuss such issues?) led to us making fun of different programming languages by interjecting some sore points about them into their names.
The Skype conversation went roughly as follows (I removed some stuff for brevity but all the language names are intact):
thepilif: ja-long variable names and no function pointers-va really sucks
thepilif: though there's always C(*^~**<<)++
thepilif: and then there's always Del-Access violation at address 02E41C10. Read of address 02E41C10-phi
thepilif: or P-false==true-HP
Coworker: ok so for the sake of it i should add py thon
thepilif: or java-everything is global-script
thepilif: too bad it doesn't work for C
thepilif: C-sigsegv
thepilif: they know why they just chose one letter
Coworker: exactly, k&r are smart
Coworker: has-how the fuck do i do a print-skell?
Coworker: pe/(^$^)/rl
thepilif: or pe-module? object? hash? what's the difference-rl
Coworker: so we could say pe/$^/rl
thepilif: and ru-lets rewrite our syntax on the fly-by
Coworker: l(i(s(p)))
thepilif: can't you wrap this into another pair of ()?
thepilif: (l(i(s(p))))))
Coworker: yes even better
thepilif: and add the syntax error
thepilif: one too many )
Coworker: it's impossible to match them just by looking
thepilif: totally impossible. yes
Coworker: the human brain is no fucking pushdown automata
Coworker: but maybe the lisp people are
Coworker: vb! vb needs one
thepilif: visual-on error resume next-basic
thepilif: and of course brain-<<<<<******<<<>>>>-fuck
thepilif: c-tries to be dynamic, but var just doesn't cut it-#
thepilif: c-not quite java nor c(++)?-#
thepilif: though the first one feels better
thepilif: oh.. and of course HT-unknown error-ML
thepilif: as a tribute to IE6
thepilif: and of course la-no bugs but still not usable-tex
thepilif: sorry, Knuth
thepilif: and send-$*$_**^$$$-mail
So the question is: Do you have anything to add? Do you feel that we were overly unfair?
Sep/092
Introducing sacy, the Smarty Asset Compiler
We all know how beneficial to the performance of a web application it can be to serve assets like CSS files and JavaScript files in larger chunks as opposed to smaller ones.
The main reason behind this is the latency incurred by requesting a resource from the server, plus the additional bandwidth of the request metadata, which can grow quite large when you take cookies into account.
But knowing this, we also want to keep files separate during development to help us with the debugging and development process. We also want the deployment to not increase too much in difficulty, so we naturally dislike solutions that require additional scripts to run at deployment time.
And we certainly don't want to mess with the client-side caching that HTTP provides.
And maybe we're using Smarty and PHP.
So this is where sacy, the Smarty Asset Compiler plugin comes in.
The only thing (besides a one-time configuration of the plugin) you have to do during development is to wrap all your <link> tags with {asset_compile}....{/asset_compile} and the plugin will do everything else for you, where everything includes:
- automatic detection of actually linked files
- automatic detection of changed files
- automatic minimizing of linked files
- compilation of all linked files into one big file
- linking that big file for your clients to consume. Because the file is still served by your webserver, there's no need for complicated handling of client-side caching methods (ETag, If-Modified-Since and friends): Your webserver does all that for you.
- Because the cached file gets a new URL every time any of the corresponding source files change, you can be sure that requesting clients will retrieve the correct, up-to-date version of your assets.
- sacy handles concurrency, without even blocking while one process is writing the compiled file (and of course without corrupting said file).
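The two less obvious points of that list (a new URL whenever a source file changes, and writing the compiled file without locking or corruption) can be sketched in a few lines. This is not sacy's actual code, which is a PHP Smarty plugin; it is a Python illustration of one common way to get both properties, assuming the file name is derived from a hash of the source contents and that the bundle is written to a temporary file and then renamed into place. The function names and the "bundle-" prefix are made up, and minification is skipped.

```python
import hashlib
import os
import tempfile


def bundle_name(paths):
    """Derive the bundle file name from the contents of all source files."""
    digest = hashlib.sha1()
    for path in paths:
        with open(path, "rb") as handle:
            digest.update(handle.read())
        digest.update(b"\0")  # separator so file boundaries affect the hash
    return f"bundle-{digest.hexdigest()}.css"


def write_bundle(paths, out_dir):
    """Concatenate the sources into out_dir and return the bundle's path.

    The bundle is written to a temporary file first and then renamed, so a
    concurrent reader sees either no file or the complete file.
    """
    target = os.path.join(out_dir, bundle_name(paths))
    if os.path.exists(target):
        return target  # same contents were already compiled
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out:
            for path in paths:
                with open(path, "rb") as handle:
                    out.write(handle.read())
                out.write(b"\n")
        os.replace(tmp_path, target)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target
```

Because the name changes with the contents, the web server can serve the bundle with its normal caching headers and clients still pick up new versions as soon as the page links to the new name.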
sacy is released under the MIT license and ready to be used (though it currently only handles CSS files and ignores the media-attribute - stuff I'm going to change over the next few days).
Interested? Visit the project's page on GitHub or even better, fork it and help improve it!
Sep/090
Twisted Tornado
Lately, the net is all busy talking about the new web server released by FriendFeed last week and how their server basically does the same thing as the Twisted framework that was around so much longer. One blog entry ends with
Why Facebook/Friendfeed decided to create a new web server is completely beyond us.
Well. Let me add my two cents. Not from a Python perspective (I'm quite the Python newbie, only having completed one bigger project so far), but from a software development perspective. I feel qualified to add the cents because I've been there and done that.
When you start any project, you will be on the lookout for a framework or solution to base your work on. Often times, you already have some kind of idea of how you want to proceed and what the different requirements of your solution will be.
Of course, you'll be comparing existing requirements against the solutions around, but chances are that none of the existing solutions will match your requirements exactly, so you will be faced with changing them to match.
This involves not only the changes themselves but also other considerations:
- is it even possible to change an existing solution to match your needs?
- if the existing solution is an open source project, is there a chance of your changes being accepted upstream? (This is not a given, by the way.)
- if not, are you willing to back- and forward-port your changes as new upstream versions get released? Or are you willing to stick with the version for eternity, manually back-porting security-issues?
and most importantly
- what takes more time: Writing a tailor-made solution from scratch or learning how the closest-matching solution ticks to make it do what you want?
There is a very strong perception around that too many features mean bloat and that a simpler solution always trumps the complex one.
Have a look at articles like «Clojure 1, PHP 0» which compares a home-grown, tailor-made solution in one language to a complete framework in another and it seems to favor the tailor-made solution because it was more performant and felt much easier to maintain.
The truth is, you can't have it both ways:
Either you are willing to live with «bloat» and customize an existing solution, adding some features and not using others, or you are unwilling to accept any bloat and you will do a tailor-made solution that may be lacking in features, may reimplement other features of existing solutions, but will contain exactly the features you want. Thus it will not be «bloated».
FriendFeed decided to go the tailor-made route, but unlike the many other projects that go the tailor-made route every day (take Django's reimplementations of many existing Python technologies like templating and ORM as another example) and keep using the result internally, they actually went public.
Not with the intention to bad-mouth Twisted (though it kinda sounded that way due to bad choice of words), but with the intention of telling us: «Hey - here's the tailor-made implementation which we used to solve our problem - maybe it is or parts of it are useful to you, so go ahead and have a look».
Instead of complaining that reimplementation and a bit of NIH was going on, the community could embrace the offering and try to pick the interesting parts they see fitting for their implementation(s).
This kind of reinventing the wheel is a standard process that is going on all the time, both in the Free Software world and in the commercial software world. There's no reason to be concerned or alarmed. Instead we should be thankful for the groups that actually manage to put their code out for us to see - in so many cases, we never get a chance to see it and thus lose a chance at making our solutions better.
Aug/0915
Snow Leopard and PHP
Earlier versions of Mac OS X always had pretty outdated versions of PHP in their default installation, so what you usually did was to go to entropy.ch and fetch the packages provided there.
Now, after updating to Snow Leopard you'll notice that the entropy configuration has been removed and once you add it back in, you'll see Apache segfaulting and some missing symbol errors.
Entropy has not updated the packages for Snow Leopard yet, so you could have a look at the PHP that comes with stock Snow Leopard. This time it's even bleeding edge: Snow Leopard comes with PHP 5.3.0.
Unfortunately though, some vital extensions are missing, most notably for me, the PostgreSQL extension.
This time around though, Snow Leopard comes with a functioning PHP development toolset, so there's nothing stopping you from building it yourself. Here's how to get the official PostgreSQL extension working on Snow Leopard's stock PHP:
- Make sure that you have installed the current Xcode Tools. You'll need a working compiler for this.
- Make sure that you have installed PostgreSQL and know where it is on your machine. In my case, I've used the One-click installer from EnterpriseDB (which survived the update to 10.6).
- Now that Snow Leopard uses a full 64-bit userspace, we'll have to make sure that the PostgreSQL client library is available as a 64-bit binary - or even better, as a universal binary. Unfortunately, that's not the case with the one-click installer, so we'll have to fix that first:
- Download the sources of the PostgreSQL version you have installed from postgresql.org
- Open a terminal and use the following commands:
% tar xjf postgresql-[version].tar.bz2
% cd postgresql-[version]
% CFLAGS="-arch i386 -arch x86_64" ./configure --prefix=/usr/local/mypostgres
% make
make will fail sooner or later because the postgres build scripts can't handle building a universal binary server, but the compile will progress far enough for us to now build libpq. Let's do this:
% make -C src/interfaces
% sudo make -C src/interfaces install
% make -C src/include
% sudo make -C src/include install
% make -C src/bin
% sudo make -C src/bin install
- Download the php 5.3.0 source code from their website. I used the bzipped version.
- Open your Terminal and cd to the location of the download. Then use the following commands:
% tar -xjf php-5.3.0.tar.bz2
% cd php-5.3.0/ext/pgsql
% phpize
% ./configure --with-pgsql=/usr/local/mypostgres
% make -j8 # in case of one of these nice 8 core macs :p
% sudo make install
% cd /etc
% sudo cp php.ini.default php.ini
- Now edit your new php.ini and add the line
extension=pgsql.so
And that's it. Restart Apache (using apachectl or the System Preferences) and you'll have PostgreSQL support.
All in all this is a tedious process and it's the price we early adopters constantly have to pay.
If you want an honest recommendation on how to run PHP with PostgreSQL support on Snow Leopard, I'd say: Don't. Wait for the various 3rd party packages to get updated.