Sep/090
Twisted Tornado
Lately, the net has been busy talking about the new web server FriendFeed released last week and how it basically does the same thing as the Twisted framework, which has been around for much longer. One blog entry ends with:
Why Facebook/Friendfeed decided to create a new web server is completely beyond us.
Well, let me add my two cents. Not from a Python perspective (I'm quite the Python newbie, having completed only one bigger project so far), but from a software development perspective. I feel qualified to add them because I've been there and done that.
When you start any project, you will be on the lookout for a framework or solution to base your work on. Often, you already have some idea of how you want to proceed and what the requirements of your solution will be.
Of course, you'll be comparing your requirements against the existing solutions, but chances are that none of them will match exactly, so you will be faced with changing one of them to fit.
This involves not only the changes themselves but also other considerations:
- is it even possible to change an existing solution to match your needs?
- if the existing solution is an open source project, is there a chance of your changes being accepted upstream? (This is not a given, by the way.)
- if not, are you willing to back- and forward-port your changes as new upstream versions get released? Or are you willing to stick with one version for eternity, manually back-porting security fixes?
and most importantly
- what takes more time: writing a tailor-made solution from scratch or learning how the closest-matching solution ticks so you can make it do what you want?
There is a very strong perception that too many features mean bloat and that a simpler solution always trumps a complex one.
Have a look at articles like «Clojure 1, PHP 0», which compares a home-grown, tailor-made solution in one language to a complete framework in another and seems to favor the tailor-made solution because it performed better and felt much easier to maintain.
The truth is, you can't have it both ways:
Either you are willing to live with «bloat» and customize an existing solution, adding some features and not using others, or you are unwilling to accept any bloat and build a tailor-made solution that may lack features and may reimplement features of existing solutions, but will contain exactly the features you want. Thus it will not be «bloated».
FriendFeed decided to go the tailor-made route, but unlike the many projects that go that route every day (take Django's reimplementations of many existing Python technologies like templating and ORM as another example) and then keep using the result internally, they actually went public.
Not with the intention of bad-mouthing Twisted (though it kind of sounded that way due to a bad choice of words), but with the intention of telling us: «Hey - here's the tailor-made implementation we used to solve our problem - maybe it, or parts of it, are useful to you, so go ahead and have a look».
Instead of complaining that reimplementation and a bit of NIH was going on, the community could embrace the offering and pick the parts it sees fit for its own implementation(s).
This kind of reinventing the wheel is a standard process that goes on all the time, both in the Free Software world and in the commercial one. There's no reason to be concerned or alarmed. Instead we should be thankful for the groups that actually manage to put their code out for us to see - in so many cases, we never get a chance to see it and thus lose a chance at making our solutions better.
Apr/090
Do not change base library behavior
Modern languages like JavaScript or Ruby provide the programmer with an option to "reopen" any class to add behavior to it. In the case of Ruby and JavaScript, this is not constrained in any way: you are able to reopen any class - even the ones that come with the language itself - and there are no restrictions on the functionality of your extension methods.
Ruby at least knows of the concept of private methods and fields which you can't call from your additional methods, but that's just Ruby. JS knows of no such thing.
This provides awesome freedom to the users of these languages. Agreed. Miss a method on a class? Easy. Just implement that and call it from wherever you want.
This also helps to free you from things like
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(of)));
which wraps lots of small (but terribly inconveniently named) classes into each other to provide the needed functionality. In this example, what the author wanted was to read a file line-by-line. Why exactly do I need three objects for this? Separation of concerns is nice, but stuff like this makes learning a language needlessly complicated.
In the world of Ruby or JS, you would just extend FileInputStream with whatever functionality you need and then call that, creating code that is much easier to read.
FileInputStream.prototype.readLine = function(){ ... };
// ...
of.readLine();
// ...
And yet, if you are a library (as opposed to consumer code), this is a terrible, terrible thing to do!
We have already seen the kind of problems this causes: libraries adding functionality to existing classes create real problems when multiple libraries do the same thing and the consuming application uses both of them.
Let's say, for example, that your library A adds a method sum() to the generic Array class. Let's also say that your consumer also uses library B, which does the same thing.
What's the big deal, you might ask? It's pretty clear what sum() does, after all.
Is it? It probably is when the array contains only summable values. But what if there is, say, a string in the array you want to sum up? In your library, sum() could be defined as "summing up all the numeric values in the array, assuming 0 for non-numeric values". In the other library, sum() could be defined as "summing up all the numeric values in the array, throwing an exception if it encounters an invalid value".
If your consumer loads your library A first and later on that other library B, you will be calling B's Array#sum().
Now, because of your definition of sum(), you assume that it's pretty safe to call sum() on an array that contains mixed values. But because you are now calling B's sum(), you'll get an exception you certainly did not expect!
Loading B after A in the consumer caused A to break because both created the same method conforming to different specs.
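The scenario is easy to reproduce. Below is a minimal sketch with two hypothetical libraries (installLibraryA and installLibraryB are illustrative names, not real code); it deletes the patched method at the end so the demo leaves Array.prototype as it found it:

```javascript
// Hypothetical library A: sum() treats non-numeric values as 0.
function installLibraryA() {
  Array.prototype.sum = function () {
    return this.reduce((acc, v) => acc + (typeof v === "number" ? v : 0), 0);
  };
  return {
    // Library A's own code relies on its lenient definition of sum().
    total(values) {
      return values.sum();
    },
  };
}

// Hypothetical library B: sum() rejects non-numeric values.
function installLibraryB() {
  Array.prototype.sum = function () {
    return this.reduce((acc, v) => {
      if (typeof v !== "number") throw new TypeError("sum(): non-numeric value " + v);
      return acc + v;
    }, 0);
  };
}

// The consumer loads A first, then B: B's definition silently replaces A's.
const libA = installLibraryA();
installLibraryB();

let result;
try {
  result = libA.total([1, 2, "three"]);
} catch (e) {
  result = e.name; // A's code is now running B's sum()
}
console.log(result); // TypeError

delete Array.prototype.sum; // undo the global patch for this demo
```

Neither library is buggy on its own; the failure only appears in the combination, which is exactly why it is so hard to track down.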
Loading A after B would fix the problem in this case, but what, say, if both you and B implement Array#avg, but with reversed semantics this time around?
You see, there is no escape.
Altering classes in the global name space breaks any name spacing facility that may have been available in your language. Even if all your "usual" code lives in your own, unique name space, the moment you alter the global space, you break out of your small island and begin to compete with the rest of the world.
If you are a library, you cannot be sure that you are alone in that competition.
And even if you are a top level application you have to be careful not to break implementations of functions provided by libraries you either use directly or, even worse, indirectly.
If you need a real-life example, the following code in an (outdated) version of scriptaculous' effects.js broke jQuery, despite the latter being very, very careful to check if it can rely on the base functionality provided:
Array.prototype.call = function() {
  var args = arguments;
  this.each(function(f){ f.apply(this, args) });
}
Interestingly enough, Array#call wasn't used in the affected version of the library. This was a code artifact that actually did nothing but break a completely independent library (I did not have time to determine the exact nature of the breakage).
Not convinced? After all I was using an outdated version of scriptaculous and I should have updated (which is not an option if you have even more libraries dependent on bugs in exactly that version - unless you update all other components as well and then fix all the then broken unit tests).
Firefox 3.0 was the first browser to add document.getElementsByClassName, a method also implemented by Prototype. Of course the functionality in Firefox was slightly different from the implementation in Prototype, which now called the built-in version instead of its own, causing a lot of breakage all over the place.
So, dear library developers, please stay in your own namespace. You'll make our lives as consumers (and your own) so much easier.
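Checking whether a method already exists before defining it does not help either: the check passes, but the built-in version may follow a different spec. The simulation below uses hypothetical names (host, findByClass, loadLibrary) and is not Prototype's or Firefox's actual code, just the shape of the problem:

```javascript
// Hypothetical stand-in for a host object (think `document` in a newer
// browser) that now ships its own findByClass() returning an array-like object.
const host = {
  findByClass(name) {
    return { length: 1, 0: "node." + name }; // array-like, but not an Array
  },
};

// Hypothetical library: defines findByClass() only if the host lacks it,
// while its own code assumes the result is a real Array.
function loadLibrary(target) {
  if (typeof target.findByClass !== "function") {
    target.findByClass = (name) => ["node." + name];
  }
  return {
    labels(name) {
      // Written against the library's own definition: relies on Array#map.
      return target.findByClass(name).map((n) => n.toUpperCase());
    },
  };
}

const lib = loadLibrary(host);
let outcome;
try {
  outcome = lib.labels("item");
} catch (e) {
  outcome = e.name; // the host's object has no map()
}
console.log(outcome); // TypeError
```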
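A minimal sketch of the alternative, using hypothetical LibA and LibB objects: each library keeps its helpers on an object it owns, so both definitions of sum() coexist and load order stops mattering.

```javascript
// Library A's helpers live on its own (frozen) namespace object.
const LibA = Object.freeze({
  // Sum numeric entries, treating anything non-numeric as 0.
  sum(values) {
    let total = 0;
    for (const v of values) total += typeof v === "number" ? v : 0;
    return total;
  },
});

// Library B does the same, with its own, stricter spec.
const LibB = Object.freeze({
  // Sum numeric entries, rejecting anything non-numeric.
  sum(values) {
    let total = 0;
    for (const v of values) {
      if (typeof v !== "number") throw new TypeError("non-numeric value: " + v);
      total += v;
    }
    return total;
  },
});

const lenient = LibA.sum([1, 2, "three"]);
const strict = LibB.sum([1, 2, 3]);
console.log(lenient, strict, "sum" in Array.prototype); // 3 6 false
```

Calling LibA.sum(values) is a few characters longer than values.sum(), but nobody else can redefine it behind your back.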
Feb/090
All-time favourite tools – update
It has been more than four years since I last talked about my all-time favourite tools. I guess it's time for an update.
Surprisingly, I still stand behind the tools listed there: my love for Exim is still unchanged (it just got bigger lately - but that's for another post). PostgreSQL is cooler than ever and powers PopScan day in, day out without flaws.
Finally, I'm still using InnoSetup for my Windows Setup programs, though that has lost a bit of importance in my daily work as we're shifting more and more to the web.
Still. There are two more tools I must add to the list:
- jQuery is a JavaScript helper library that allows you to interact with the DOM of any webpage, hiding away browser incompatibilities. There are a couple of libraries out there which do the same thing, but only jQuery is such a pleasure to work with: it works flawlessly, provides one of the most beautiful APIs I've ever seen in any library and there are tons and tons of self-contained plug-ins out there that help you do whatever you would want to do on a web page.
jQuery is an integral part of making web applications equivalent to their desktop counterparts in matters of user interface fluidity and interactivity.
All while being such a nice API that I'm actually looking forward to doing the UI work, as opposed to the earlier days, which can most accurately be described as «UI sucks».
- git is my version control system of choice. There are many of them out there and I've tried the majority of them for one thing or another. But only git combines awesome backwards compatibility with what I've used before and what's still in use by my coworkers (SVN) with the ability to beautify commits, feature branches, very high execution speed and very easy sharing of patches.
No single day passes without me using git and running into a situation where I'm reminded of the incredible beauty that is git.
In four years, I have not seen any other tool that I've used as consistently and with as much joy as git and jQuery, so those two have certainly earned their spot in my heart.
Feb/090
Google Apps: Mail Routing
Just today, while beginning the evaluation of a Google Apps For Your Domain Premium account, I noticed something that may be obvious to all of you Google Apps users out there, but certainly isn't documented well enough for you to notice before you sign up:
Google Apps Premium has kick-ass mail routing functionality.
Not only can you configure Gmail to accept mail only from defined upstream servers, thus allowing you to keep the MX pointed at an already existing server where you can, for example, do alias resolution. No. You can also tell Gmail to send outgoing mail via an external relay.
This is ever so helpful, as it allows you to keep all the control you need over incoming email - for example if you have email-triggered applications running, or if you have email aliases (basically forwarders where xxx@domain.com is forwarded to yyy@other-domain.com), which Google Apps does not support.
Because you can keep your old MX, your existing applications keep working and your aliases continue to resolve.
Sending all outgoing mail via your relay, in turn, lets you get away without updating SPF records or forcing customers to change filters they may have set up for you.
This feature alone can decide between go and no-go when evaluating Google Apps, and I cannot understand why they do not emphasize it much more than they currently do.
Feb/090
My new friend: git rebase -i
Last summer, I was into making git commits look nice with the intent of pushing a really nice and consistent set of patches to the remote repository.
The idea is that a clean remote history is a convenience for my fellow developers and for myself. A clean history means very well-defined patches, should a merge of a branch be necessary in the future. It also means much easier hunting for regressions and generally more fun doing some archeology in the code.
My last post was about using git add -i to refine the commits going into the repository. But what if you screw up the commit anyway? What if you forget to add a new file and notice only some commits later?
This is where git rebase -i comes into play as this allows you to reorder your local commits and to selectively squash multiple commits into one.
Let's see how we would add a forgotten file to a commit a couple of commits ago.
- You add the forgotten file and commit it. The commit message doesn't really matter here.
- You use git log or gitk to find the id of the commit you want to add the forgotten file to. Let's say it's 6bd80e12707c9b51c5f552cdba042b7d78ea2824.
- Pick the first few characters (or the whole id), append ^ to refer to that commit's parent, and pass the result to git rebase -i:
% git rebase -i 6bd80e12^
Git will now open your favorite editor, displaying the list of commits since the revision you have given. It could look like this:
pick 6bd80e1 some commit message. This is where I have forgotten the file
pick 4c1d210 one more commit message
pick 5d2f4ed this is my forgotten file

# Rebase fc9a0c6..5d2f4ed onto fc9a0c6
#
# Commands:
#  p, pick = use commit
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#
The comment in the file says it all - just reorder the lines (however many there are in your case) so that the commit with the forgotten file comes right after the commit it belongs to, and change its command from pick to squash:
pick 6bd80e1 some commit message. This is where I have forgotten the file
squash 5d2f4ed this is my forgotten file
pick 4c1d210 one more commit message
Save the file. Git will now do some magic and open the text editor again where you can amend the commit message for the commit you squashed your file into. If it's really just a forgotten file, you'll probably keep the message the same.
One word of caution though: Do not do this on branches you have already pushed to a remote machine or otherwise shared with somebody else. git gets badly confused if it has to pull altered history.
Isn't it nice that after months you still find new awesomeness in your tool of choice?
I guess I'll have to update my all-time favorite tools list. It's from 2004, so it's probably ripe for that update.
Git rules.
Sep/0823
Automatic language detection
If you write a website, do not use Geolocation to determine the language to display to your user.
If you write a desktop application, do not use the region setting to determine the language to display to your user.
This is incredibly annoying for some of us, especially for me, which is why I'm ranting here.
The moment Google released their (awful) German translation for their RSS reader, I was served the German version just because I have a Swiss IP address.
Here in Switzerland, we actually speak one of three (or four, depending on who you ask) languages, so defaulting to German is probably not of much help for the people in the French-speaking part.
Additionally, there are many users fluent in (at least reading) English. We always prefer the original language if at all possible because generally, translations never quite work. Even if you have the best translators at work, translated texts never feel fluid. Especially not when you are used to the original version.
So, Google, what were you thinking to switch me over to the German version of the reader? I have been using the English version for more than a year, so clearly, I understood enough of that language to be able to use it. More than 90% of the RSS feeds I'm subscribed to are, in fact, in English. Can you imagine how pissed I was to see the interface changed?
This is even worse on the iPhone/iPod frontend because there, you don't even provide an option to change the language, aside from manually hacking the URL.
Or take desktop applications. I live in the German-speaking part of Switzerland. True. So naturally I have set my locale settings to Swiss German. You know: I want the correct number formatting, I want my weeks to start on Mondays. I want the correct currency. I want the 24-hour clock I'm used to.
Actually, I also want the German week and month names, because I will be using these in most of my letters and documents, which are, in fact, German too.
But my OS installation is English. I am used to English. I prefer English. Why do so many programs insist on using the locale setting to determine the display language? Do you developers think it's funny to have a mish-mash of languages on the screen? Don't you think that my using an English OS version may be an indication that I do not want to read your crappy German translation alongside the English user interface of my OS?
Don't you think that it feels really stupid to have a button in a German dialog box open another, English, dialog (the first one is from Chrome, the one that opens once you click "Zertifikate verwalten" (Manage certificates) is from Windows itself)?
In Chrome, I can at least fix the language - once I found the knob to turn. At first, it was easier for me to just delete the German localization file from the Chrome installation because, being completely unused to German UIs, I was unable to find the right setting.
This is really annoying and I see this particular problem being neglected on an incredibly large scale. I know that I am a minority, but the problem is so terribly easy to fix:
- All current browsers send an Accept-Language header. Unlike in earlier times, it is nowadays correctly preset in all the common browsers. Use that. Don't use my IP address.
- Instead of reading the locale setting in my OS, ask the OS for its UI language and use that to determine which localization to load (actually, this is the recommended way of doing things according to Microsoft's guidelines at least since Windows XP, which was released in 2001).
Using these two simple tricks, you help a minority without hindering the majority in any way and without additional development overhead!
Actually, you'll get away a lot cheaper than before. GeoIP is expensive if you want it to be accurate (and you do want that, don't you?), whereas there are ready-to-use libraries to determine the correct language even from the most complex Accept-Language header.
Asking the OS for the UI language isn't harder than asking it for the locale, so no overhead there either.
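To show how little work the header approach is, here is a minimal sketch of Accept-Language negotiation. The function names are hypothetical and the parser is deliberately simplified: it honors q-values and falls back from a regional tag like "de-CH" to "de", but ignores finer points of the HTTP specification (RFC 9110), such as a "q=0" entry excluding a more generic range. In production you would reach for one of the ready-made libraries instead.

```javascript
// Parse "en-US,en;q=0.9,de-CH;q=0.8" into tags sorted by preference.
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      let q = 1; // default weight when no q-value is given
      for (const param of params) {
        const [key, value] = param.trim().split("=");
        if (key === "q") q = Number(value);
      }
      return { tag: tag.trim().toLowerCase(), q, index };
    })
    .filter((entry) => entry.tag !== "" && entry.q > 0) // drops q=0 and unparsable q
    .sort((a, b) => b.q - a.q || a.index - b.index); // ties keep header order
}

// Return the first supported language the browser asks for, or the fallback.
function negotiateLanguage(header, supported, fallback) {
  const available = supported.map((lang) => lang.toLowerCase());
  for (const { tag } of parseAcceptLanguage(header || "")) {
    if (tag === "*") return fallback;
    const exact = available.indexOf(tag);
    if (exact !== -1) return supported[exact];
    const primary = available.indexOf(tag.split("-")[0]); // "de-ch" -> "de"
    if (primary !== -1) return supported[primary];
  }
  return fallback;
}

// A Swiss user whose browser prefers English still gets English:
console.log(negotiateLanguage("en-US,en;q=0.9,de-CH;q=0.8", ["de", "fr", "en"], "de")); // en
```

Note that the IP address never enters the decision: a user in Geneva with a French browser gets French, and a user in Zurich with an English browser gets English.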
Please, developers, please have mercy! Stop the annoyance! Stop it now!
Apr/080
Ubuntu 8.04
I'm sure that you have heard the news: Ubuntu 8.04 is out.
Congratulations to Canonical and their community for another fine release of a really nice Linux distribution.
What prompted me to write this entry, though, is the fact that I updated shion from 7.10 to 8.04 this afternoon. Over an SSH connection.
The whole process took about 10 minutes (including the download time) and was completely flawless. Everything kept working as it was before. After the reboot (which also went flawlessly), even OpenVPN came back up and connected to the office so I could have a look at how the update went.
This is very, very impressive. Updates are tricky. Especially considering that it's not one application that's updated, not even one OS. It's a seemingly random collection of various applications with their interdependencies, making it virtually impossible to test each and every configuration.
This shows that with a good foundation, everything is possible - even when you don't have the opportunity to test for each and every case.
Congratulations again, Ubuntu team!
Apr/080
Web service authentication
When reading an article about how to make google reader work with authenticated feeds, one big flaw behind all those web 2.0 services sprang to my mind: Authentication.
I know that there are efforts underway to standardise on a common method of service authentication, but we are nowhere near there yet.
Take Facebook: they ask you to enter your email account data into a form so they can send an invitation to all your friends. Or the article I was referring to: they want your account data for an authenticated feed to make it available in Google Reader.
But think of what you are giving away...
For your service provider to be able to interact with that other service, they need to store your password. Be it short term (Facebook, hopefully) or long term (any online feed reader with authentication support). They can (and do) assure you that they will store the data in encrypted form, but to be able to access the service in the end, they need the unencrypted password, thus requiring them not only to use reversible encryption, but also to keep the encryption key around.
Do you want a company in a country whose laws you are not familiar with to have access to all your account data? Do you want to give them the password to your personal email account? Or to everything else in case you share passwords?
People don't seem to get this problem as account data is freely given all over the place.
Efforts like OAuth are clearly needed, but as a web-based technology, they can't solve all the problems (what about email accounts, for example?).
But is this the right way? We can't even trust desktop applications. Personally, I think the good old username/password combination is at the end of its usefulness (was it ever really useful?). We need new, better, ways for proving our identity. Something that is easily passed around and yet cannot be copied.
SSL client certificates feel like an underused but very interesting option. Let's take two examples. The first one is your authenticated feed. The second one is your SSL-enabled email server. Let's say that you want to give a web service revocable access to both services without ever giving away personal information.
For the authenticated feed, the external service will present the feed server with its client side certificate which you have signed. By checking your signature, the authenticated feed knows your identity and by checking your CRL it knows whether you authorized the access or not. The service doesn't know your password and can't use your signature for anything but accessing that feed.
The same goes for the email server: the third-party service logs in with your username and the client certificate signed by you, but without a password. The service doesn't need to know your password, and in case they do something wrong, you revoke your signature and are done with it (I'm not sure whether mail servers support client certificates, but I gather they do as it's part of the SSL spec).
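On the feed server side, this could look roughly like the following minimal sketch, assuming Node.js's built-in https/tls modules. The function names and PEM inputs are hypothetical placeholders: the server's own key and certificate, your CA certificate (the key you sign service certificates with) and your certificate revocation list (CRL).

```javascript
// TLS options for a feed server that only talks to clients presenting a
// certificate signed by the user. All PEM inputs are hypothetical placeholders.
function feedServerTlsOptions({ serverKey, serverCert, userCaCert, userCrl }) {
  return {
    key: serverKey,
    cert: serverCert,
    ca: [userCaCert], // only client certificates signed by the user are trusted
    crl: userCrl, // certificates the user has revoked fail the handshake
    requestCert: true, // ask every client to present a certificate
    rejectUnauthorized: true, // drop clients whose certificate does not verify
  };
}

// Per-request check. With rejectUnauthorized, the TLS handshake has already
// verified the signature and the CRL; this double-checks and reports which
// verified service is reading the feed. No password is involved anywhere.
function authorizeFeedRequest(socket) {
  if (!socket.authorized) {
    return { status: 401, reason: String(socket.authorizationError) };
  }
  const peer = socket.getPeerCertificate();
  const service = peer && peer.subject ? peer.subject.CN : "unknown";
  return { status: 200, service };
}

// Usage (certificate material is hypothetical):
// const https = require("node:https");
// https.createServer(feedServerTlsOptions(pems), (req, res) => {
//   const auth = authorizeFeedRequest(req.socket);
//   res.writeHead(auth.status);
//   res.end(auth.status === 200 ? "feed goes here" : "no access");
// }).listen(8443);
```

Revoking a service's access then means adding its certificate to your CRL and reloading the server's TLS settings; no password ever has to be changed.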
Client side certificates already provide a standard means for secure authentication without ever passing a known secret around. Why isn't it used way more often these days?
Mar/080
Another new look
It has been a while since the last redesign of gnegg.ch, but is a new look after just a little more than one year of usage really needed?
The point is that I have changed blogging engines yet again. This time it's from Serendipity to WordPress.
What motivated the change?
Interestingly enough, if you ask me, s9y is clearly the better product than WordPress. If WordPress is Mac OS, then s9y is Linux: it has more features, it's based on cleaner code and it doesn't have any commercial backing at all. So the question remains: why switch?
Because that OSX/Linux-analogy also works the other way around: s9y is an ugly duckling compared to WP. External tools won't work (well) with s9y due to it not being known well enough. The amount of knobs to tweak is sometimes overwhelming and the available plugins are not nearly as polished as the WP ones.
All these are reasons that made me switch. I used an s9y-to-WP converter, but some heavy tweaking was needed to make it actually transfer category assignments and tags (the former didn't work, the latter wasn't even implemented). Unfortunately, the changes were too hackish to publish here, but it's quite easily done.
Aside from that, most of the site has survived the switch quite nicely (the permalinks are broken once again, though), so let's see how this goes.
