Tuesday, December 28, 2010

Why HD looks weird.

Very often I'm sitting next to an HD teevee, and people say 'that looks weird'.

Why?

There are a number of reasons, but the two main ones are:
  1. Compared to SD digital teevee: poor colour depth relative to the number of pixels.
  2. For a person used to analog teevee (PAL, or NTSC) rather than digital teevee: the digital compression used.

Compared to SD digital teevee - No increase in colour richness.

They increased the number of pixels shown per cm (or inch), but did not increase the richness of colour displayed.

On PAL for example, you may have 4:2:2 colour - with 8 bits per sample. If you come from the computer world you may think that means 8 bits per channel (RGB 888). However in the broadcast world they use the YUV colour space, with the colour channels subsampled: 4:2:2 means that for every four Y (brightness) samples there are only two U and two V (colour) samples, and each sample is usually only 8 (or sometimes 10) bits.

If the colour depth is the same, then why does it look weirder in HD? Imagine a rainbow going from left to right. PAL would be 768 pixels wide, and 'HD' 1920 pixels wide.

The same number of colours would be on the rainbow line, only the HD one would be kind of zoomed in.

The error is zoomed in - or enhanced.
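
Some rough arithmetic makes the 'zoomed in error' concrete. Assuming 256 levels (8 bits) per channel across the gradient:

levels = 256
for name, width in [("PAL", 768), ("HD", 1920)]:
    print name, "pixels per colour step:", width / float(levels)

# PAL pixels per colour step: 3.0
# HD pixels per colour step: 7.5

Each band of colour is two and a half times as wide on the HD screen, so the banding is easier to see.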

There are other issues too. Blu-ray only does 8 bit colour, and there are cameras that only output 8 bit colour... or worse. Then the equipment in various distribution centres, or the centre which puts the little logo in the left hand side of the screen, might operate at different colour resolutions (say YUV422... 8 bit). Then you have the compression codecs, and settings being used by the broadcasters. Then the link from your broadcast box to your LCD can down-sample to 8 bit. Finally the LCD internally may only do 8 bit, or the display panel only 8 bit.

If any one link in the chain from the camera to the LCD is 8 bit, then the final result will be 8 bit (or possibly even worse, if the conversions are done badly... eg a fast YUV422->RGB->YUV422 conversion is not lossless). Or the camera natively uses the Bayer colour space, which is converted to YUV422 by the interface, and then to RGB888 in the computer, then back to YUV422 in the encoder, through to the teevee right at the end. Each conversion to a different colour space is lossy.



Other reasons why it looks weird include enhancing the error of the frame intervals. Eg, film at 24fps somehow shown on a screen that only does 50Hz. We've gotten used to things being shown at one frame rate compared to another. The movement is different - and human beings can easily tell the difference.


Teevee is now very complex, and a lot of factors complicate what is being shown to you. Everything is different: the speed things are shown at, the colour depth used, the bandwidth used per teevee channel, the size of the display, the type of technology used to show it... even the glass on the screen is different.

Why does HD look weird?

All of those things have an effect, however the lack of extra colour depth is one of the main ones.


Compared to analog teevee - digital compression.



For a person used to analog teevee (PAL, or NTSC) rather than digital teevee, a major weirdness is the digital compression used. Digital teevee is compressed so that broadcasters can fit more channels in the space available to them. The compression changes the image so that, to the human eye... little detail is lost.

That's the aim anyway. However, compression is never perfect - and many people can see the weirdness in the compressed images.


The world of teevee is weird - but digital teevee is even weirder.

Tuesday, October 12, 2010

iphone web app development, from the trenches.


Dear reader,

I joined the iOS web developers for adventure and a chance to see the world but instead I am working in a mud hole, freezing my arse off, with a constant fear of death. I am writing this note in hope that it makes it out of the trenches. In case I do not.

It's bloody here in the iOS trenches and I feel my days are numbered. With these thoughts on my mind, I hope to share this with you.

Some of this stuff is not documented in the standard issue manual, or disseminated via the standard propaganda channels. I feel it ought to be of use to you.

Unfortunately it has been very hectic here, so the words will likely be rushed and detail will be lacking. I apologise for this, but I still think it will be useful (no brain rockets, just some Damn Useful Information).

In case I don't make it,
Your friend from the trenches,
René Dudfield.


ps. if you find this note, please consider commenting on the back with any other useful information your fellow iOS trench mates might find useful.


-------------- -------------- --------------


Going full screen requires a cache manifest on the iphone - otherwise the files do not save. The cache manifest is a horrible beast that has quite a few gotchas. Also, your app will only go fullscreen when run as a web app from the home screen, not when run through safari.

You can detect with javascript if your app is running through safari, or as a web app (window.navigator.standalone is true when running as a web app). This is most useful for asking the user if they want to put your app on their home screen.

Changing css opacity quickly is really slow on the iphone. This makes some jquery things slow - like show/hide.

Use the css animations, and transitions if you can. Just like animating opacity is slow with javascript on iphone, so is animating other css attributes.

There is a 5MB limit on the overall app cache size. Big files will not cache (like music and videos).

If you are using cache busting urls, it stuffs up the cache manifest. eg, mystyles.css?v=234 in your html, will make the cache not work correctly. I guess since the cache has just mystyles.css not mystyles.css?v=234.

For the cache manifest to work on iphone you need an html 5 doctype, and an html tag without an xmlns attribute.

Google chrome can be a good debugging tool for the cache manifest file. Since it shows you problems in the console.

Restarting the iphone can help clear the safari cache.

Safari 4, and iOS 4 have the audio tag - not earlier. Calling play on an Audio object after calling load does not really work. However if you set the autoplay attribute to true then it plays as soon as it can.

The audio can stutter a little bit if you are doing many other things at the time, or if the network slows, and it's not all loaded yet.

iphone can not play video inline on a webpage. It can only play fullscreen. iPad can play the video inline.

SVG on iphone is pretty quick. Canvas is not so quick... but you can still do basic things ok.

CSS media queries let you supply a different CSS if you are on an iphone/ipad/android etc.

Mouse events are a bit slow to react. You can use mouse events, but learning how to use the touch events will allow you to develop a much better experience. The touch start event, and the touch end event both come before a mouse event arrives. This means that it will always respond faster by using the touch events.

Make touchable things (buttons) big, and make them a similar size. If you have a really small button next to a really big button it might be impossible for the person to press the really small one. People can zoom in and out, and if the buttons are all a similar size, then they will be zoomed in at the right level to be able to press them.


Wednesday, September 29, 2010

Tweeting python packaging tips.

I've begun 'tweeting' python packaging tips. I'm hoping to go up to 100 useful python packaging tips. If you're a twit too, please feel free to join in on the conversations.


update: I wrote a little script to download all the python packaging related tweets from twitter. For some reason twitter is only giving me 32 of them... but I've written about 40 so far. Going to run through these at the london python dojo tonight in a mini talk.



setup.py build_ext --inplace to compile your extensions inside the source directory. Good for developing inplace

distribute 'setup.py develop' Installed pkg points to dev copy, quicker changes during dev. develop -u to remove

For pyrex/cython/swig packages, include the generated C code so people do not need to install cython/pyrex/swig.

For debugging info set DISTUTILS_DEBUG. os.environ['DISTUTILS_DEBUG'] = "on" OR export DISTUTILS_DEBUG=on

Don't put code in package/__init__.py file. Makes debugging/editing harder as there are lots of __init__.py files

Name the folder your package lives in after the package name, not src or lib. eg ./mypackage/

Make quick and dirty .deb .rpm packaging of any python package with checkinstall

Create man pages(unix docs) for scripts and command line programs you make available. See rst2man and help2man.

Install scripts (cmd line programs) with pkg http://docs.python.org/distutils/setupscript.html#installing-scripts

Including a short summary of changes at end of 'description' metadata lets people quickly see changes on pypi.

put a #hashtag in your description metadata, and your package will turn up on twitter under that.

Read other peoples python packages to see what they do well. Especially packages you use.

Distutils2 is the future of packaging. It's not ready yet though. http://pypi.python.org/pypi/Distutils2

setup.cfg is an (optional) setup config file. http://docs.python.org/distutils/configfile.html

Test on multiple python versions before release. Each py version is different (2.6,3,pypy,ironpython,jython).

Tests are good. See what you are not testing with the coverage package. http://nedbatchelder.com/code/coverage/

Check your package quality with Cheesecake. 'pip install Cheesecake' 'cheesecake_index -v -n yourpackage'.

Give credit, ♥, and props to your contributors in a thanks.txt file.

pep 386 specifies version scheme for Distutils - use it. http://www.python.org/dev/peps/pep-0386/

eg. the #django package is not at http://pypi.python.org/pypi/django . It is at http://pypi.python.org/pypi/Django

Packages are caseSensitive. Refer to it the same way it's installed, otherwise confusion installing & finding.

Check your package name is available on pypi, and that it follows the pep8 naming convention. All lowercase, etc.

There's no 'changes' metadata field. Put changes in the long_description field and CHANGES.txt so people know what's new or changed.

MANIFEST.in controls files to include/exclude (.bak .swp) http://docs.python.org/distutils/sourcedist.html

bdist_mpkg is the package required to make Mac OSX installers.

In your setup.py: from distutils.command.build import build;build.sub_commands.append(('test', None))

Running tests after build helps catch errors in distro packaged versions, and turns all users into testers.

Great guide http://diveintopython3.org/packaging.html Slightly dated (2009) since py packaging moves fast.

pep 345, latest spec for adding metadata to Python distributions. http://www.python.org/dev/peps/pep-0345/

readme.txt - Use .txt extension. Be reStructuredText. Use windows carriage returns/newlines because notepad sux.

The debian/ dir should only be in the Debian packaging repository. The debian maintainer takes care of it.

_module.so _module.dll _module.dynlib - compiled extensions should be named with a leading underscore.
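
To tie a few of these tips together, here's a minimal setup.py sketch (the package name and file names are made up for illustration):

from distutils.core import setup

setup(
    name='mypackage',  # all lowercase, referred to with the same case everywhere.
    version='0.1.0',   # a pep 386 friendly version scheme.
    description='An example package. #python',
    # a short summary of changes at the end lets people see what's new on pypi.
    long_description=open('readme.txt').read() + '\n' + open('CHANGES.txt').read(),
    # the folder the package lives in is named after the package, not src or lib.
    packages=['mypackage'],
    # command line programs get installed as scripts.
    scripts=['scripts/mypackage-cmd'],
)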

Sunday, September 26, 2010

Pokket mixer. A sound mixer from Berlin.



pokket mixer

I went to the big market at the Berlin park today, and saw this little sound mixer there. This dude and his girlfriend who were at the stall make them! They are his design too. Really cool buying electronics from the people who make them.

Did I mention I like buying small things that can fit in my pocket, and in my carry-on luggage?

It's passive - so it does not need to be powered. It's also very small, and seems to work quite well. The EQ works ok, with the normal Hi, Mid, Lo for each of the two channels. The sound quality seems pretty good (even when going out to high quality studio speakers). I've bought more expensive mixers with worse audio quality.

Monday, August 09, 2010

3g modems in finland? Any plans for a month or one week?

Dear Lazy web,

Are there any 3G usb modem plans available in Finland that I can get for one week or a month? I'd need to buy the modem too.

Hopefully it would work two hours' drive from Oulu.


Update: used a modem by Elisa. Worked well, even 3g. Closing comments because this post is getting spam every day - and blogger.com spam protection seems broken.

Monday, July 26, 2010

javascript (and jquery) templating

I just put up some code for doing templating with javascript (and jquery) - either on the server side or on the client side. If you open the html in your browser, it runs on the client side. If you first process it server side... then it runs server side (but not on the client side).

http://github.com/illume/nodejs_jquery_templating

This is a followup to 'you are using the wrong templating language' - as a proof of concept.

Why is this useful?

  • No need for a server to process the templates. Either process them server side or client side.
  • Data can be stored in a json file. No need for a database for testing. Just create json files in a text file.
  • Can reuse knowledge of javascript libraries (like jquery), rather than learning one of 798394 different templating languages.
  • Can keep one html file which front end web developers can edit without them needing a new template file.
  • Can use jquery plugins on server side too(validation, etc).

Seems to work ok so far... but it still needs a lot of polish before it'll be useful. If it works out ok, I'll try it out on some real projects (where the main server side language is python).

Wednesday, June 09, 2010

Let's make a shit JavaScript interpreter! Part one.




As a learning exercise, I've begun writing a javascript (ECMAScript) interpreter in python. It doesn't even really exist yet, and when it does it will run really slowly, and not support all js features.

So... let's make a "from scratch", all parsing, all dancing, shit interpreter of our very own!

Teaching something is a great way to learn. Also writing things on my blog always gets good 'comments', hints, tips, plenty of heart, and outright HATE from people. All useful and entertaining :)

Tokenising

So to start with, we need something to turn the .js files into a list of tokens. This type of program is called a tokeniser.

From some javascript like this:
function i_can_has_cheezbrgr () {return 'yum';};
Into a Token list something like this:
[
{"type":"name",
"value":"function",
"from":0,
"to":8},
{"type":"name",
"value":"i_can_has_cheezbrgr",
"from":9,
"to":28},
{"type":"operator",
"value":"(",
"from":29,
"to":30},
{"type":"operator",
"value":")",
"from":30,
"to":31},
{"type":"operator",
"value":"{",
"from":32,
"to":33},
{"type":"name",
"value":"return",
"from":33,
"to":39},
{"type":"string",
"value":"yum",
"from":40,
"to":45},
{"type":"operator",
"value":";",
"from":45,
"to":46},
{"type":"operator",
"value":"}",
"from":46,
"to":47},
{"type":"operator",
"value":";",
"from":47,
"to":48}
]
Wikipedia has a page on Parsing (also see List_of_unusual_articles for some other background information).

"Tokenization is the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input." -- wikipedia Lexical_analysis#Token page.

We can has vegetarian cheeseburger... but how can we parse javascript?

To the rescue, comes uncle Crockford the javascript guru of jslint fame. He wrote this lovely article: http://javascript.crockford.com/tdop/tdop.html. The ideas come from a 1973 paper called "Top Down Operator Precedence". The Crockford article is great, since it is free, short, and well written javascript. Unlike the 1973 paper it gets the ideas from... which is behind a paywall, long, and uses a 1973 language called "(l,(i,(s,(p))))".

As well as being short and simple... Phil Hassey used "Top Down Operator Precedence" and this article on his journey making tinypy.

Goat driven development



Just as Phil did with tinypy, I'm going to use Goat Driven Development. Well, I'm not even sure what Goat Driven Development is... so maybe not.

Another python-using dude, Fredrik Lundh, wrote some articles on "Simple Top-Down Parsing in Python" and "Top-Down Operator Precedence Parsing".

Also see Eli Bendersky's article on Top Down Operator Precedence.

So where to begin?

After reading those articles a few times... scratching my head 13 times, making 27 hums, a few haaarrrrs, one hrmmmm, and four lalalas...

light bulb: A brilliant plan!

Eli Bendersky implements a full tokeniser, and parser for simple expressions like "1 + 2 * 4".

Let's copy this approach, but simplify it even more. Our first step is to make a tokeniser for such an expression. That should be easy, right?


A Token data structure.

Uncle Doug Crockford uses this structure for a token.

// Produce an array of simple token objects from a string.
// A simple token object contains these members:
// type: 'name', 'string', 'number', 'operator'
// value: string or number value of the token
// from: index of first character of the token
// to: index of the last character + 1


Here's an example token from above:

{"type":"name",
"value":"i_can_has_cheezbrgr",
"from":9,
"to":28}



Writing the tokeniser

Often a tokeniser is generated... or written by hand.

Fredrik Lundh writes a simple tokeniser using a regular expression.

>>> import re
>>> program = "1 + 2"
>>> [(number, operator) for number, operator in
... re.compile("\s*(?:(\d+)|(.))").findall(program)]
[('1', ''), ('', '+'), ('2', '')]


This is a valid approach... but regexen blow up minds. Instead I'm going to write one using a state machine, in a big while loop with lots of ifs and elses.

Our homework

Write a tokeniser for simple expressions like "1 + 2 * 4". Output a list of tokens like the javascript one does... eg.

{"type":"name",
"value":"i_can_has_cheezbrgr",
"from":9,
"to":28}


Until next time...

Really, I have no idea what I'm doing... but that's never stopped me before! It's going to be a shit javascript, but it will be our shit javascript.

Sunday, June 06, 2010

My javascript reading list over the last few months

Over the last few months I've been fairly deep into javascript land. Both in my full time job, and in most of my coding side projects - I've been mostly doing javascript. Of course, like most web programmers I've been dabbling in javascript over the years... but never so intensely. On the way I've been collecting a 'reading list' of videos, articles, books, and projects.

Here are some of the good links from the last few months.


Crockford on javascript videos:
http://www.yuiblog.com/blog/2010/04/08/video-crockonjs-5/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+YahooUserInterfaceBlog+%28Yahoo!+User+Interface+Blog%29

akihabara arcade:
http://www.kesiev.com/akihabara/

canvas javascript games:
http://www.benjoffe.com/code/

javascript 3d engine:
http://github.com/mrdoob/three.js

Aves Engine: HTML/JavaScript Game Engine (youtube.com)
http://www.youtube.com/watch?v=Ol3qQ4CEUTo
http://ajaxian.com/archives/aves-game-engine?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+ajaxian+%28Ajaxian+Blog%29

css animations and transitions
http://webkit.org/blog/130/css-transforms/
http://webkit.org/blog/138/css-animation/

3d video navigation on iphone
http://ajaxian.com/archives/iads

apple web app documentation
http://developer.apple.com/safari/library/referencelibrary/GettingStarted/GS_iPhoneWebApp/index.html#//apple_ref/doc/uid/TP40008134
http://developer.apple.com/safari/library/samplecode/FingerTips/Introduction/Intro.html

apple visual effects guide
http://developer.apple.com/safari/library/documentation/InternetWeb/Conceptual/SafariVisualEffectsProgGuide/Introduction/Introduction.html#//apple_ref/doc/uid/TP40008032

apple multi touch events
http://developer.apple.com/safari/library/documentation/AppleApplications/Reference/SafariWebContent/HandlingEvents/HandlingEvents.html#//apple_ref/doc/uid/TP40006511-SW22

android, and APIs
http://www.thesearethedroids.com/2009/12/15/creating-android-apps-with-html-css-and-javascript/
http://www.phonegap.com/
http://developer.android.com/resources/articles/using-webviews.html

Safari on iPhone Graphics, Media, and Visual Effects Coding How-To's:
http://developer.apple.com/safari/library/codinghowtos/Mobile/GraphicsMediaAndVisualEffects/index.html

Preparing Your Web Content for iPad:
http://developer.apple.com/safari/library/technotes/tn2010/tn2262.html

gamequery
http://gamequery.onaluf.org/

javascript collision detection
http://www.lukewallin.co.uk/?go=engine

box2d physics javascript:
http://box2d-js.sourceforge.net/index2.html

render engine javascript game engine
http://www.renderengine.com/

vector maths library
http://sylvester.jcoglan.com/

wii javascript
http://en.wikipedia.org/wiki/Wii_Opera_SDK

game js, pygame alike.
http://code.google.com/p/gamejs/

physics:
http://www.queness.com/post/3296/8-amazing-javascript-experiments-of-physic-and-gravity-simulation

how to detect html5 things:
http://diveintohtml5.org/everything.html

javascript audio synth
http://acko.net/blog/javascript-audio-synthesis-with-html-5

audio with js:
http://www.phon.ucl.ac.uk/home/mark/audio/play.htm
- best one.
http://www.javascripter.net/faq/sound/play.htm
http://simplythebest.net/sounds/sound_guide.html

using flash from js for sound
http://www.schillmania.com/content/projects/soundmanager2/

cross browser css image rotation
http://samuli.hakoniemi.net/cross-browser-rotation-transformation-with-css/
http://snook.ca/archives/html_and_css/css-text-rotation

open source iphone game:
http://ajaxian.com/archives/golingo-a-great-titanium-mobile-web-game-open-sourced-for-you?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+ajaxian+%28Ajaxian+Blog%29

jsdom.
http://github.com/tmpvar/jsdom
http://www.yuiblog.com/blog/2010/05/20/video-insua/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+YahooUserInterfaceBlog+%28Yahoo!+User+Interface+Blog%29

a number of interesting projects
http://www.bramstein.com/projects/

image editor
http://pixlr.com/

chrome web development extensions
https://chrome.google.com/extensions/featured/web_dev

html5 ipad app
http://mir.aculo.us/2010/06/04/making-an-ipad-html5-app-making-it-really-fast/

particles
http://spielzeugz.de/html5/liquid-particles.html

Friday, May 14, 2010

You are using the wrong #templating system for #html.

Use javascript + json + DOM for html templating.

Front end developers know how to use this technology. They know it very well, so they will be way productive.

No weird new template language to learn and install on every single project.

No dependency on a rapidly changing server side backend... or database, or 3rd party API required.

No taking the html, and turning it into [$%%WEIRD TEMPLATE LANG%%$$] step. Instead you can use javascript, json and the DOM.

Being able to use the templating system either via the browser, or via the server side means there is no dependency between the front end and the back end. Front end developers just need an example json file to work from. Then the back end developer just needs to create the json file.

It does mean you are dependent on a DOM+javascript implementation on the server side if you want to do things like server side caching, and supporting Agents (browsers, bots etc) that do not use javascript on the client side.

Just json files. Just javascript. Just the DOM.

You are (currently) using the wrong templating system.

Wednesday, April 14, 2010

Validation through html forms.

Continuing on from a previous article I wrote earlier in the year... 'Using a html form as the model', this post describes using validation.

With html 5 comes a whole bunch of validation you can use in your forms. This makes an html form even more useful as a schema than html 4 forms, since you can use types like email, telephone, time, date, and such. You can also use things like minimum and maximum length, as well as the ever useful 'required' field.

You can read about html 5 forms here.

Tuesday, April 06, 2010

UX, rsi. ctrl+c considered harmful. If not done properly.

Ctrl+c, as well as a single handed touchpad move, is considered harmful.

Try and do combinations of keys, mouse moves, and gestures one finger at a time.

Why? RSI. RSI hurts people.

Please consider people when designing your user interfaces. Either avoid single handed moves, or warn against them.

Thanks.

When using computers, consider resting regularly, and stopping when you hurt (and before you hurt). Also consider using one hand or finger for each action. So rather than shift+A done with one hand, try doing it with two hands. Also consider moving your hand away from the qwerty middle keys to get a less stretchy press of the shift/ctrl/alt/cmd keys.

Thanks.

For more details on RSI, and RSI prevention, please see: http://www.rsi.deas.harvard.edu/spread.html.

Monday, March 15, 2010

better search engine

http://duckduckgo.com/

Friday, March 12, 2010

Memory usage of processes from python?

Is there a way to find the memory usage of python processes?

Trying to find some portable way of doing this. However, so far I think a new module might be needed...

I've got linux mostly covered, but maybe you know how with freebsd, OSX, windows(9x-7)?

So is there something built into python already? Is there an X-platform third party module already? Or a module just for one platform available?



update: here's the linux code I found and cleaned up a bit - memory_usage.py - if anyone is interested. bytes_resident = memory_usage.resident(). It reads /proc/PID/status... eg, like "$ cat /proc/PID/status | grep VmRSS" would.
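
For reference, a minimal sketch of that /proc approach (linux only, and roughly what memory_usage.resident() does):

import os

def resident_bytes(pid=None):
    # reads VmRSS from /proc/<pid>/status, returning bytes.
    if pid is None:
        pid = os.getpid()
    for line in open("/proc/%d/status" % pid):
        if line.startswith("VmRSS:"):
            # the line looks like: 'VmRSS:      1234 kB'
            return int(line.split()[1]) * 1024

print resident_bytes()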

pympler: 'Pympler is a development tool to measure, monitor and analyze the memory behavior of Python objects in a running Python application.'

psutil: 'psutil is a module providing an interface for retrieving information on running processes and system utilization (CPU, memory) in a portable way by using Python, implementing many functionalities offered by tools like ps, top and Windows task manager.'

dowser: 'Dowser is a CherryPy application that displays sparklines of Python object counts, and allows you to trace their referents. This helps you track memory usage and leaks in any Python program, but especially CherryPy sites.'

syrupy: 'Syrupy is a Python script that regularly takes snapshots of the memory and CPU load of one or more running processes, so as to dynamically build up a profile of their usage of system resources.'

Some non-pythony memory tools: valgrind memcheck, massif and cachegrind (linux), MallocDebug (osx)

Wednesday, March 10, 2010

how I recovered a friend's MacOSX drive

MacOSX can sometimes corrupt a drive (like most OSen).

I've seen it happen a few times, in a couple of different ways. One way is if you interrupt some file transfers. Like by pressing 'stop' on a big transfer - this can trash your partition table. You'll likely see vfs errors in your log at this point. Anyway... linux, ubuntu and 'testdisk' came to the rescue. testdisk found the partition for me, and wrote it back... luckily it worked and my friend and I did a happy dance.

The HFS+ partition was saved.

Sunday, March 07, 2010

gnome multimedia keys via dbus-python

Ever wanted to get your multimedia keys to do something other than play multimedia? You could get these key events any number of ways. One way is through dbus. Here is an example: gnome_multimedia_keys.py

Here is a version which does not block your mainloop. So it's useful for integrating with other libraries (like pygame :) I made a dbus with pygame example too.

Saturday, March 06, 2010

Ideas for Super Surfaces in pygame.

pygame already has a sub-surface, which is part of a larger surface. Sub-surfaces refer to the same pixels as the surface they come from, and share the same interface as a Surface. It's good for doing sprite sheets, where you save one file with many smaller images - but can then manipulate them as if you had loaded the images as smaller separate surfaces.
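
As a quick reminder, this is what that looks like (spritesheet.png is a made up file name):

import pygame

sheet = pygame.image.load("spritesheet.png")
# a 32x32 sprite cut from the top left corner - it shares pixels with sheet.
sprite = sheet.subsurface(pygame.Rect(0, 0, 32, 32))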

However, sometimes we would like to operate on a whole bunch of smaller surfaces stuck together. This is what I'd like to call a Super Surface - a collection of smaller surfaces which can act as one big surface. It's a complementary idea to the sub-surface, and intuitively it should be there... but it's not yet.

Unfortunately it's a lot harder to code a Super Surface compared to a sub-surface. Since a Super Surface would need to have all the surface affecting routines changed to work with it.

For example, everything in the draw modules would need to be redone. So would all of the surface methods. So when you draw a line on a super surface, it should affect all of the sub-surfaces.

How can Super Surfaces be implemented.

Still trying to think of a simple/clever way of implementing it efficiently, without having to recode all of the drawing/blitting routines. However, just like how sub-surfaces in pygame can simplify code immensely - so too will Super Surfaces.

One method might be to do a rect translation, and then broadcast the translated method calls over the sub-surfaces of the Super Surface. So we translate the rects from the Super Surface's coordinate space into each sub-surface's coordinate space.
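
A hypothetical sketch of that translate-and-broadcast idea in python (a real version would live in C, and would need to cover every surface routine - fill is just the simplest example):

import pygame

class SuperSurface(object):
    def __init__(self, pieces):
        # pieces is a list of (rect, surface) pairs, where each rect
        # gives the sub-surface's position in Super Surface coordinates.
        self.pieces = pieces

    def fill(self, color, rect):
        rect = pygame.Rect(rect)
        for placement, surf in self.pieces:
            overlap = rect.clip(placement)
            if overlap.width and overlap.height:
                # translate from Super Surface coordinates into this
                # sub-surface's own coordinates, then broadcast the call.
                surf.fill(color, overlap.move(-placement.x, -placement.y))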

I'm not so sure if we can get a function pointer to a method from within the function pointer itself with the python C API. This would simplify it, because the translate-and-broadcast functionality could be implemented in one place and reused within all of the various surface affecting routines automatically. Something to research...

Common methods for implementing tiling engines.

Super Surfaces share a lot of the same properties a tiling engine. So how are tiling engines implemented?

A common way for tiling engines to work is to have the images being placed in a grid, and for each of the sub-surfaces to be the same size. So the image at [0][0] in the array is at those coordinates. This simplifies the implementation, but is not very flexible. A slightly more complex way to implement it is to have each sub surface have its position stored relative to the super surface. This complicates the clipping a little, and for speedy use it really requires a spatial hash... like a quadtree.

So I'm thinking of doing the more free form implementation... probably just a naive implementation to start with... then perhaps a quadtree version as an improvement. It would be better to use a spatial hash that is more efficient at moving elements - but that could be a third improvement.

Use cases for Super Surfaces

Use cases include viewing massive images. Much graphics hardware has texture size limits, so it is common to combine many smaller textures when drawing larger images.

Another common use case is for 'tile engines'. Where tile engines show a larger image made up of many smaller images.

Allowing different image formats for different parts of an image is another use case. For example, a piece of the image which is just black only needs a few bytes to represent it... it does not need 32 bits per pixel. Whereas the part of the image which includes a person's face with 50% transparency will require 32 bits per pixel for sure. This will save memory, and increase speed. It increases speed because using less memory bandwidth often means faster operations. Also, less complicated blending - or more optimized blitting - can happen (eg, fast 8 bit Run Length Encoding blitters). This should be possible with no extra effort from the programmer too.

Finally, by using smaller sub-surfaces it in effect becomes a tiling engine automatically. Tiling in graphics is used to split operations up easily. This gives memory, and speed advantages. By processing a tile at a time - you only need to have those tiles in memory at that time. It also gives easy parallelism, since the separate tiles are a form of data parallelism (which is the easiest type). For some hardware, it is possible to use parallelism when the surfaces are kept separate.

I'm interested in any comments, especially if you've made a tiling engine like this before?

Wednesday, March 03, 2010

Why bzr and launchpad? launchpad is open source

Why bzr and launchpad? Bzr AND Launchpad are open source.

launchpad: open source
github: closed source
sourceforge: closed source (was open source in the past)
bitbucket: closed source
googlecode: closed source

You can submit changes to launchpad at: dev.launchpad.net. You can also submit launchpad bugs and feature requests against it if you don't want to make the patch yourself.

When there is a good open source alternative, I always choose the good open source option. I initially had problems with bzr a couple of years ago... but it has been quite good to me over the last couple of months. So I'm moving over all of my projects from other version control hosting services to launchpad.

Of course bzr and launchpad are also written in python (with selected optional C optimizations), so that makes for happy hacking :)


update: reported a bug here about the 'can not find source code easily on launchpad' issue. Any other issues with launchpad?

Monday, March 01, 2010

London code dojo - 4th March '10 18:30 – 21:30 (ish).

Details, and booking here: http://ldnpydojo.eventwax.com/7th-london-python-code-dojo

"""What is a coding dojo? This is a coding dojo.

Last time we attempted to refactor the various adventure game solutions from the January dojo. Whilst interesting, perhaps refactoring isn’t that exciting an activity for a dojo :-). Nevertheless, people seemed to be having fun and we did achieve our goal of a “one true” adventure game code base that allows us to define and navigate around a game world. The code can be found here: http://github.com/ntoll/code-dojo/tree/master/adventure/week2/.

After discussion at the end of the February dojo (and later in the pub) we decided that this time round we’re going to try another small-groups based exercise with a “show and tell” at the end as we continue to build the world’s greatest adventure game. Problems we might want to tackle include: a command parser, keeping track of game state/score/objects, NPCs/AI, authoring tools, turning it into a MUD (and so the list goes on…)

Photos from all the dojos so far can be found here: http://divvyshot.com/event/IsJtx/

Free pizza and beer will be provided.

Participants get the chance to win a cool book (thanks O’Reilly)."""

Thursday, February 25, 2010

uh0h, I made a logo. What do you think?

Made a quick little logo for a website called...







Tried a few different versions... showed my girlfriend a few, and stuck with this one in the end. Made it with a simple DejaVu Sans Mono Bold font. Ya for the DejaVu font project. Modified the zero (0) a bit, then inverted the colors (ya! negative space). Played around with the letter spacing... applied an old school gaussian blur filter to the right side, resized for web... and done!

What do you think?



The brief is 'something quick for yet-another-culture blog/zine called uh0h... as in uh oh I dropped a hammer on my foot.'.




update: Added some results in this image below, which is updated with a pygame/freetype script. The idea is that the results in the image can be updated without the rss feed getting updated. Hoping to clean the script up and add it to pywebsite. The same technique seems to be used by some wordpress blogs with their 'comments:3' image links - so I think it will be useful for all sorts of things shown on blogs where frequent updates are required without having to update the post.

Wednesday, February 24, 2010

svn merging is easy...

Subversion (svn) merging is easy... iff you are using a modern svn version (1.6.x).
Here it is in short:

$ cd /dir/of/your-branch
$ svn merge ^/trunk
Where ^/trunk is the url of the branch you want to merge from. For more details, have a look in the svn book basic merging section.

Also this article on svn merging explains it fairly well.

The svn 1.6 release notes and 1.5 release notes also talk about various svn updates including merge enhancements amongst other goodies (like improved python bindings).

Really, merging is not too bad in subversion now. I know this post won't stop everyone from using 2004 reasons when arguing over version control systems... but whatever.

Ok, bzr, hg, and git all have lots of nice features - but merging in svn is pretty easy now so please bash(zsh) it for other reasons if you must.

Now, let's move back to the vi VS emacs argument ;)

Friday, February 19, 2010

The secret to my web development productivity...

Can you see the secret to my web development productivity in this photo?




No it's not the red cup of coffee.

It's not the pieces of sticky tape on my laptop.

Follow the cable from my laptop...

and have a look under the desk.

...

... I'll wait whilst you have a look before telling you the answer.

...

That's right!!!

...

it's a joypad.

Using python and pygame, I've made a little app which listens for joystick events, and then deploys the website I'm working on.

With a mash, kick or prod of my foot, up goes my website. Deployed.

Deploy is just a fancy word for 'upload my website, do database migrations, restart app servers, run tests, rollback if there are failures... etc'.
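
A rough sketch of the idea (the real, slightly sillier version is linked in the update below):

import subprocess
import pygame

pygame.init()
pygame.joystick.init()
joystick = pygame.joystick.Joystick(0)
joystick.init()

# hypothetical deploy script - replace with whatever does your deploy.
deploy_command = ["./deploy.sh"]

while True:
    # block until something happens on the joypad.
    event = pygame.event.wait()
    if event.type == pygame.JOYBUTTONDOWN:
        subprocess.call(deploy_command)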


For 5 british pounds, $7.8 USD, or 5.74 euros, you can get one of these joypads delivered to most places these days.

It's giving me too much pleasure pressing it every time I want to upload a new version of the website. Most live updates EVER today - and I have the joypad to thank for it.

The joypad is the secret to my web development productivity. Please don't tell anyone.



update: here's a slightly cleaned up, and slightly silly version of 'Joy to Deploy'...

Check out using bzr revision control:
    bzr co lp:joytodeploy
Or you can view joytodeploy.py source code.


joytodeploy.py deployment_program

for example:
joytodeploy.py deploy.sh with great justice

joytodeploy.py echo 'ninjas are better than pirates'

Thursday, February 18, 2010

Genshi templates - header/footer templates, and including templates.

How to include a Genshi template inside of another genshi template?

Use "py:match" inside the template you want to include into other pages (eg. a sitelayout.html).

Then you can use "xi:include" in the pages you want the files included in. (eg. best_picture_of_my_cat_today.html)

Now, it's not a "simple include this template here" kind of thing. The py:match can find various parts of the file, and then replace them how it likes. For example, a page layout template will match the header and footer with "py:match".

It is best explained with examples, and in the documentation:
  • pylons genshi example
  • Includes section of the genshi documentation where it explains "xi:include".
  • Genshi documentation where it explains py:match.


Hopefully this explains the genshi way of including a template into a template.

    Wednesday, February 17, 2010

    My fling with engine X.

    Projects that have X in the name are cool. Engines are cool (especially steam powered ones). Spelling engine 'ngin' gains 17.5 l33t points too. All combined, this makes the name nginx super schwet!

    I've been using nginx for a while now, and have found it quite good so far. Been moving a couple of websites to it, fooling around with it a lot.

    So far I've been able to use it for everything I've tried. Some of my apache configs are fairly long... so that is saying quite a bit. On my low memory (512MB) server it has saved quite a bit of memory - even though I've only moved over a couple of websites. Along with the cherrypy memory reduction work I did recently, my server has a bit more room to breathe... (and for me to waste memory on hosting other websites! ya!).

    Nginx has a good reputation as being rock solid - so I hope it holds true for me. Then perhaps I can get rid of apache completely (on this one server). I've tried to replace apache with other web servers before... but always come up with a reason to move back to apache. Either some application uses a feature that the other server does not support, or the other server is just not as robust as apache. I don't like fixing, or looking at servers... I just want them to work without hassle. I'm not afraid of working on, and learning about, a new webserver... it's just that some webservers are too high maintenance.

    Fastcgi is one way nginx allows you to host php and python websites. Nginx can also be used as a reverse proxy server. Personally I like to host python websites with cherrypy and a reverse proxy, and use fastcgi with php. The nginx proxy_pass configuration seems to work quite well... as does its simple-to-setup load balancing.
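
    For example, a minimal (hypothetical) reverse proxy setup for a python app listening on port 8080 looks something like this:

    upstream mysite {
        server 127.0.0.1:8080;
    }

    server {
        listen 80;
        server_name mysite.example.com;

        location / {
            proxy_pass http://mysite;
        }
    }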

    Like all good software, I love to check out how it was made. Reading the nginx source code is a breath of fresh air. Despite Igor [the main author] being Russian, the code is written in English C (with tiny smatterings of asm/perl)... with very few comments. It is a modular, and very clean code base. It doesn't seem to have any unittests... but it's still quality software.

    Our relationship is still a fling really. Before I invite nginx to meet my parents, I'll give it few more months. Ok, my love letter to nginx is done now.

    Thursday, February 04, 2010

    python - unifying c types from different packages.

    Python already has a number of objects to represent c types. However, there is a need to improve interoperability between systems using these c types. Below I explain the need, and discuss existing efforts to address this need. Then ways to transparently translate between the various type systems without each system needing to know about each other are also discussed.

    In the ctypes library - you can represent an unsigned 32bit integer with ctypes.c_uint32.

    In the array, and struct modules there are different array type codes. For example, 'L' represents a C unsigned long - a minimum of 4 bytes on 32bit systems, and often 8 bytes on 64bit systems.
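
    For example, a quick look at the 'L' type code with the standard modules (the sizes here assume a typical 64bit linux - they vary by platform):

    >>> import array, struct
    >>> array.array('L', [1, 2, 3]).itemsize
    8
    >>> struct.calcsize('L')
    8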

    numpy, cython, pyopengl and other python extensions have their own types representing c types too. Most extensions which link up to languages which use static typing represent basic c types to python in some way.

    Not only libraries, but various compilers and translation tools also use c types. For example tinypyC++, cython, swig, etc. Also type inference is done from things like shedskin, and rpython - but they represent types internally with their own type objects.

    Standardising on one set of c type objects or string codes would give some compatibility advantages. However, that will be hard to change for backwards compatibility reasons. A mapping between the various types should provide plenty of the advantages. For example, to be able to translate from a ctypes to a numpy type should be fairly simple.

    Here you can see that numpy, ctypes and the python.array module already have integration:

    >>> import numpy, cython, ctypes, OpenGL.GL

    >>> numpy.array([1,2,3,4.2], ctypes.c_uint32)
    array([1, 2, 3, 4], dtype=uint32)

    >>> numpy.array([1,2,3,4.2], numpy.uint32)
    array([1, 2, 3, 4], dtype=uint32)

    >>> numpy.array([1,2,3,4.2], 'L')
    array([1, 2, 3, 4], dtype=uint32)

    >>> numpy.array([1,2,3,4.2], OpenGL.GL.GLuint)
    array([1, 2, 3, 4], dtype=uint32)

    >>> numpy.array([1,2,3,4.2], cython.uint)
    ------------------------------------------------------------
    Traceback (most recent call last):
    File "", line 1, in
    TypeError: data type not understood

    Pretty cool hey? Numpy already knows about many of the type variables available in the python ecosystem. With the notable exception of cython.

    I think there is a need to try and standardise use of c type variables - so that more code can interoperate without each system needing to know about each other systems type objects. Alternatively a translation layer can be made in place.

    For example an adaptor something like this:
    # this registers two types which are the same.
    >>> type_registry.register_types(numpy.uint32,
    ... cython.uint)

    # here numpy.array does not know about cython directly,
    # but can look at the registered type we just did to get it from there.
    >>> numpy.array([1,2,3,4.2], cython.uint)
    array([1, 2, 3, 4], dtype=uint32)

    # if numpy does not know about the adaptor registry then we can still
    # use the registry, if only in a more ugly - non transparent way
    # by calling a translate function directly:
    >>> numpy.array([1,2,3,4.2],
    ... type_registry.translate(cython.uint))
    array([1, 2, 3, 4], dtype=uint32)
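
    A minimal sketch of how such an adaptor registry could be implemented (type_registry, like in the example above, is hypothetical - no such package exists yet):

    class TypeRegistry(object):
        def __init__(self):
            self._standard_for = {}

        def register_types(self, standard_type, other_type):
            # remember that other_type means the same thing as standard_type.
            self._standard_for[other_type] = standard_type

        def translate(self, a_type):
            # fall back to the type itself if it has not been registered.
            return self._standard_for.get(a_type, a_type)

    type_registry = TypeRegistry()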

    Instead of an adaptor, a magic variable could be used which would contain the 'standard c type variable' from python. For example - cython.uint.__ctype__ == ctypes.c_uint32. Then numpy could look for a __ctype__ variable and use that - without having to be extended for every system that is made. One problem with a magic variable over registered types is that some python objects can not have those magic variables assigned. For example, try adding a __ctype__ variable to an int instance - it won't work.

    Either the adaptor, or the magic variable would let cython - and other systems use their own type objects and still have a way to translate the types to the standard python c type variables (when/if they are chosen).

    A simple mapping (with a dict) from a package to the standard c type objects/type codes is another method that could be used. This will allow a package to fairly easily hook into the eco system. For example cython could have a __c_type_mappings__ magic variable at the top level of its package. Then another package looking to translate the type could look to the package for this __c_type_mappings__ variable. The advantage of this is that many times variables can be injected into a package but not into extension types in the package. On the other hand this feels icky.

    The c types from the ctypes package seem to be a fairly good choice for this. The PyOpenGL 3.x series uses ctypes as its types. eg, OpenGL.GL.GLuint == ctypes.c_uint32. Except ctypes is a fairly big dependency just for a few types.

    The buffer pep 3118 being introduced into python/numpy to make buffer sharing between libraries is a similar use case. However it involves sharing instances of differently typed buffers - and has quite clever semantics for many use cases. The formats from that pep could also probably be used to share type information.

    The buffer protocol pep specifies extra format strings over the ones specified in the python.array module. So as to be able to specify a more complete set of type, and memory layouts. So rather than using the ctypes types, it probably makes sense to use the new buffer protocol format codes specified in pep 3118. As they are just strings without any further dependencies on the rest of the ctypes machinery (eg libffi etc). Of course, if you are using ctypes already - then depending on it is not a problem.

    Of course ctypes.c_uint32 is more descriptive than 'L' to many people, so the format codes (eg 'L') should just be used for specification. People should still use their own type objects - but provide a translation to their format codes as specified in pep 3118 the new buffer protocol.

    The codes specified in pep 3118 will probably need to be expanded as more c types need to be described. For example bit fields and bit depths of types are not described in the pep. Many systems specify the bit depth of the type - numpy, ctypes, opengl, etc. For example they use 'uint32' rather than 'unsigned int'. Also bit fields are becoming more common in C so they should be added to the type code formats in someway too.

    In conclusion there is a need for interoperability of c types from various python extensions, libraries and python compilers. The pep 3118 format codes, and ctypes types are good candidates to work with for standard c type objects/codes. Adaptor registries, simple mappings, and/or magic variable names could be used to enhance interoperability.


    Sunday, January 17, 2010

    good bye Vivienne

    will miss my little sister. wish we had more time together. so sad. shit

    Friday, January 15, 2010

    Jquery 1.4

    Another quality jquery release... jquery 1.4.

    Some very handy changes, my favourite probably being able to use functions to set values. In short, some of the changes are:

  • setting values with functions
  • html5 form support
  • better JSON/JSONP and ajax functionality
  • easier element construction (just pass a dict of attributes)
  • reverse indexing like in python, where -1 is the last item.
  • better animation for multiple attributes at once
  • performance improvements, code cleanups, and bug fixing

  • Full details in the jquery 1.4 release notes.

    Wednesday, January 13, 2010

    worker pool optimization - batching apis, for some tasks.

    What are worker pools anyway?

    Worker pools are an optimization to the problem of processing many things in parallel. Rather than have a worker for every item you want to process, you spread the load between the available workers. As an example, creating 1,000,000 processes to add 1,000,000 numbers together is a bit heavy weight. You probably want to divide it up between 8 processes for your 8 core machine. This is the optimization worker pools do.

    With a usual worker pool there is a queue of jobs/data for the workers to work on. The workers are usually threads, processes or separate machines which do the work passed to them.

    So for 1000 items on that queue, there are around 1000 accesses to that queue. Usually the queue has to be thread safe or process safe, so that pieces of data are not sent to many workers at once.

    This can be an efficient method to use for some types of data. For example, if each job can take different amounts of time, like IO tasks over the internet... this is not optimal, but pretty good.

    Problem work loads for typical worker pools.

    Let's assume that the tasks are fairly easy to measure the average(or median if you like) time of each task. So either not IO tasks, or fairly equal length tasks. Then the central queue idea starts to fall down for the following reasons.

    What if the cost of starting a new job is quite high? Like if starting each job happened over a machine with a 200ms network latency (say using a HTTP call to the other side of the planet). Or if a new process needs to be spawned for each task ( say with exec or fork ).

    Or if the cost of accessing the queue is quite high? Like if you have a lock on the queue (eg a GIL) and lots of workers. Then the contention on that lock will be quite high.

    What if there are a lot more items than 1000? Like if there are 10,000,000 items? With so many items, it is worth trying to reduce or avoid that cost of accessing the queue all together.

    How to optimize the worker pool for these problems?

    The obvious solution is to divide the items up into chunks first, and then feed those big chunks of work to each worker. Luckily the obvious solution works quite well! It's trivial to divide a whole list of things into roughly equal size chunks quite quickly (a python one liner *1).

    An example of how to improve your worker pool.

    Here is some example code to transform a pygame.threads.tmap command that uses a worker pool to do its work off a central worker queue, into one that first divides the work into roughly equal parts. Mentally replace pygame.threads.tmap with your own worker pool map function to get the same effect.

    #file: divide_and_map.py
    import operator

    # Here's our one liner divider, as two lines.
    def divide_it(l, num_parts):
        return [ l[i:i+num_parts] for i in xrange(0, len(l), num_parts)]

    # Here is our_map which transforms a map into
    # one which takes bigger pieces.
    def our_map(old_map, f, work_to_do, num_workers):
        bigger_pieces = divide_it(work_to_do, len(work_to_do)//num_workers+1)
        parts = old_map(lambda parts: map(f, parts), bigger_pieces)
        return reduce(operator.add, parts)

    # now an example of how it can speed things up.
    if __name__ == "__main__":
        import pygame, pygame.threads, time

        # use 8 worker threads for our worker queue.
        num_workers = 8
        # Use the pygame threaded map function as our
        # normal worker queue.
        old_map = pygame.threads.tmap
        pygame.threads.init(num_workers)

        # make up a big list of work to do.
        work_to_do = list(range(100000))

        # a minimal function to run on all of the items of data.
        f = lambda x:x+1

        # We time our normal worker queue method.
        t3 = time.time()
        r = pygame.threads.tmap(f, work_to_do)
        t4 = time.time()

        # We use our new map function to divide the data up first.
        t1 = time.time()
        r = our_map(old_map, f, work_to_do, num_workers)
        t2 = time.time()
        del r

        print "dividing the work up time:%s:" % (t2-t1)
        print "normal threaded worker queue map time:%s:" % (t4-t3)

    $ python divide_and_map.py
    dividing the work up time:0.0565769672394:
    normal threaded worker queue map time:6.26608109474:


    For our contrived example we have 100,000 pieces of data to work through. If you created a thread for each piece of data it would surely take forever. Which is why people often use a worker queue. However a normal worker queue can still be improved upon.

    Results for this contrived example made to make this technique look good?

    We get a 100x speedup by dividing the work up in this way. This won't work for all types of data and functions... but for certain cases as mentioned above, it is a great improvement. Not bad for something that could be written in one line of python!*1

    It's an interesting case of how massaging your data to use Batching API design techniques gives good results. It also shows how writing parallel code can be sped up with knowledge of the data you are processing.

    *1 - Well it could be done in one line if we were functional ninjas... but for sane reading it is split up into 12 lines.

    Tuesday, January 12, 2010

    mini languages that non programmers can understand

    There are hopefully a number of mini text based programming languages that non-programmers can understand. But what are they?

    One that I've used in the past is something like this:

    name:Bob gender:male/female age:22

    Which would parse into a python/javascript data structure like this:
    {name: 'Bob',
    gender: 'male/female',
    age: '22'
    }

    It's surprisingly common in things like search engines. Grandmas who occasionally check their email might not get it (but many do I'm sure!)... but I think a lot of others do. For things like search it is ok, if people know the magic 'terms'. If they do not know the terms, then they can just enter text to search normally. The mini language is used by advanced users.

    This is quite good for single line free form data entry. Since people only need to know the concept that you have 'key:value'. It's slightly easier than using urls, since separators can be different things.
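
    A tiny sketch of a parser for this kind of 'key:value' mini language, where anything without a ':' is kept as free form search text:

    def parse_mini(text):
        data = {}
        free_text = []
        for part in text.split():
            if ":" in part:
                key, _, value = part.partition(":")
                data[key] = value
            else:
                free_text.append(part)
        return data, " ".join(free_text)

    >>> data, extra = parse_mini("name:Bob gender:male/female age:22")
    >>> data['name'], data['age']
    ('Bob', '22')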

    csv files


    Next up are comma separated files - csv.
    For example:
    name,gender,age
    Bob,female,22
    These are like spreadsheets. Many people seem to be able to edit these quite fine - especially if they have a spreadsheet program to do the editing.

    URLS

    URLs are a mini language. With things like # anchors, query strings, and even paths being used by people all the time.

    ?name=Bob&age=22&gender=female

    ini files

    Common as configuration files.
    [heading]
    key=value

    substitution templates

    Common for web sites, and email systems.

    Hi ${name},

    ${age} ${gender}
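
    In python, for example, this is exactly what string.Template does:

    >>> from string import Template
    >>> Template("Hi ${name}, ${age} ${gender}").substitute(
    ...     name="Bob", age="22", gender="female")
    'Hi Bob, 22 female'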

    basic html and other mark up languages

    Quite a lot of people know bits of html. 'How do I do a new line? oh, that's right: brrrrrrrr.'

    However, I think that html is on the edge of too complicated. Modern html is especially complicated.

    Things like markdown, bbcode, and wiki languages all fall into this category. The languages can sometimes have only 5-10 elements - which make them easy to learn the basics.

    Older Wiki language text could look just like text someone would write in notepad. However modern ones - like html - now have all sorts of ${}[]==++ characters with special meanings.



    Are there any other mini languages which are easy to understand for non-programmers?

    Sunday, January 10, 2010

    pypy svn jit cheat sheet.

    A quick cheat sheet for trying out pypy jit. This is for a ubuntu/debian 32bit machine translating the jit version of pypy-c.
    # install dependencies for debian/ubuntu.
    sudo apt-get install python-dev libz-dev libbz2-dev libncurses-dev libexpat1-dev libssl-dev libgc-dev libffi-dev

    # download from svn.
    svn co http://codespeak.net/svn/pypy/trunk pypy-trunk

    # Translate/compile the jit. This part can take a while.
# Don't worry. Relax, have a home brew.
    cd pypy-trunk/pypy/translator/goal
    ./translate.py -Ojit targetpypystandalone.py

    # Or for the low memory pypy-c use this translate instead...
    #./translate.py --gcremovetypeptr targetpypystandalone --objspace-std-withsharingdict
    The pypy getting started documentation has more about it if you're interested.

    pypy has most standard modules up to python2.5... and some from newer versions.

    I didn't need any external compiled extension modules... just sqlite, and cherrypy. So for this project I could use pypy! *happy dance*

    Didn't take long to port my code to it. I only had a couple of issues, which people in the #pypy channel on freenode helped me with.

There is an alpha version of sqlite3 - one which uses ctypes - included with pypy as the 'pysqlite2' module. It seemed to work well enough for me, and passed all my tests.
import sqlite3
# for pypy, since the sqlite3 package is empty... but they have a pysqlite2.
if not hasattr(sqlite3, "register_converter"):
    from pysqlite2 import dbapi2 as sqlite3
Another issue I had: I was using 'x is not y' to compare two integers in one function. cpython caches small integers (roughly -5 to 256 in recent versions), so they share the same identity. Since my numbers were always small, this code happened to work in cpython. However, pypy doesn't have that problem/feature - so I just used != instead.
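
Here's roughly what that gotcha looks like (a sketch, run under cpython; int() is used so the values aren't folded into shared constants at compile time):

a = int('100')
b = int('100')
print a is b    # True in cpython: small ints are cached
c = int('1000')
d = int('1000')
print c is d    # False: distinct objects outside the cache
print c == d    # True: compare values with ==/!=, not is/is not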

    I think those were the only two things I had to change. All my tests were passing, and it was the end of the day, so I went down the pub to see a friend visiting from Tokyo who was having a birthday.

    Friday, January 08, 2010

    Unladen swallow review. Still needs work.

Tried out unladen swallow on two work loads today, after the announcement that they are going to try to bring it into core. So I finally got around to trying it (last time the build failed). An hour or so later the build finished and I could try it out - the C++ llvm takes ages to compile, which is what took most of the extra time. What follows is a review of unladen swallow as it stands today.

The good part? Extensions work (mostly)! w00t. I could compile the C extension pygame, and run things with it.

Now to run code I care about - my work loads - to see if their numbers hold true for me.
cherrypy webserver benchmark: crash
pygame tests: some crashes, mostly work.
pygame.examples.testsprite: random pauses in the animation.


The crashes I've found so far seem to be thread related. The cherrypy benchmark, and some of the pygame tests, both use threads - so I'm guessing that's it.

Random pauses in applications are a big FAIL. Animations fail to work, and user interactions pause or stutter. Web requests can take longer for unknown reasons, etc. I'm not sure what causes the pauses, but they be there (arrrr, pirate noise).

LLVM is a big, fast moving dependency written in another language with a whole other runtime (C++). Unladen swallow uses a bundled version of it, since they often need the latest fixes. This might make it difficult for OS packagers. Or LLVM might stabilise soon, and it could be a non-issue. Depending on C++ is a big issue for some people, since some applications and environments can not use C++.

The speed of unladen swallow? Slower than normal python for *my* benchmarks. Well, I couldn't benchmark some things because they crash with unladen... so, eh. I might be able to track these problems down, but I just can't see the benefit so far. The programs I've tried do not go faster, so I'm not going to bother.

Python 3 seems to be 80% of the speed for IO type programs like web servers (cherrypy) - see the benchmarks in my previous post. Unladen swallow seems only 10-20% slower for pygame games, but the random pauses make it unusable.

Python2.x + psyco is still way faster on both these work loads - 20%-100% faster than python2.6 alone. Psyco and stackless are both still being developed, and both seem to give better results than unladen swallow. Selective optimisation with tools like shedskin, tinypyC++, rpython, or cython can give you 20x speedups - so for many, writing code in a subset of python to get the speedups is worth it. Other people will be happy to write the 1% of their program that needs the speed in C. That is the good thing about unladen swallow: you should be able to keep using any C/C++/fortran extensions.

Unladen-swallow has a google reality distortion bubble around it. They only benchmark programs they care about, and others are ignored. There are other people's reports of their programs going slower, or no faster, but the response seems to be 'that is not our goal'. This is fine for them - they are doing the work, and they want their own work loads to go faster. However, I'm not sure that ignoring the rest of the python community's work loads is a good idea if they are considering moving it into trunk.

It's too early to declare unladen-swallow done and good, imho. Better research needs to go into it before declaring it an overall win at all. Outside review should be done to see if it actually makes things quicker/better for people. For my work loads, and for other people's work loads, it is worse. It also adds a dependency on C++ libraries - which is a no-no for some python uses. Extra dependencies also increase the startup time: startup with unladen swallow is 33% slower compared to python for me (time python -c "import time").

Let's look at one of their benchmarks - html5lib. See the issue 'html5lib no quicker or slower than CPython'. They arranged the benchmark so unladen-swallow is run 10 times, to allow unladen swallow to warm up - since cpython is faster on the first run.

[Chart: blue = unladen-swallow, red = cpython 2.6. Time (y) over 10 runs (x).]

Notice how jumpy the performance of unladen is on the other runs? This might be related to the random pauses unladen swallow has. I don't like this style of benchmark, which does not account for the first run - many times you only want to run code over a set of data once.

    When looking at their benchmark numbers, consider how they structure their benchmarks. It's always good to try benchmarking on your own workloads, rather than believing benchmarks from vendors.

Memory usage is higher with unladen swallow. It takes around twice as much memory just to start the interpreter. The extra C++ memory management libraries, the extra set of byte code, and then the extra machine code for everything all take their toll. Memory usage is very important for servers and embedded systems - and for most other types of programs too. The main bottleneck is often not the cpu, but memory, disk, and other IO. So they are trading (theoretically) better cpu speed for worse memory use - and since memory is often the bottleneck, the runtimes will often be slower for lots of work loads.

It seems python2.6 will still be faster than unladen swallow for many people's work loads. If unladen does not make other people's programs and work loads go faster - or work at all - it will not be a carrot. As people's programs work, and go faster, with python2.6/2.7, it will be a stick*.

Unladen swallow has not (yet) reached its 5x faster goal, and for many work loads it is still slower or the same speed. For these reasons, I think it's too early to think about incorporating unladen swallow into python.

    * (ps... ok, that made no sense, sorry. Sticks and carrots?!?... donkeys like carrots, but so do ponies. I don't think we should hit people with sticks. Also people don't like carrots as much as perhaps chocolate or beer. Perhaps all this time hitting people with sticks and trying to get them to do things with carrots is the problem. Python 3 has heaps of cool things in it already... but more cool things always helps! Beer and chocolate would probably work best.)

    Thursday, January 07, 2010

    Using a html form as the model.

I've mentioned this technique before, but I think it is worth repeating more clearly, without any other code clouding the main message: 'All you need is love... html'.

An HTML form can describe a model of your data quite well. This lets you go from html form design to a working Create Read Update Delete (CRUD) system. A CRUD can be thought of as a nice admin interface for editing your data (or a CMS, if you prefer that acronym).

    For example:
    <form action='savepage' method='post'>
    title:<input type='text' name='title'>
    <br>textarea:
    <textarea name='content'></textarea>
    <input type='submit' name='submit'></form>
That's all you need to create a CRUD. Things like validation can be defined in the html easily enough, then implemented in your code to be checked server side and client side. Parse the form to get the model, and go from there.
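
As an illustration, here's a minimal sketch (my own code, not from any particular framework) of pulling the model - the field names and types - out of a form, using python 2's stdlib HTMLParser (html.parser on python 3):

from HTMLParser import HTMLParser

class FormModelParser(HTMLParser):
    # Collects (name, type) pairs from input/textarea elements.
    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = []
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('name'):
            self.fields.append((attrs['name'], attrs.get('type', 'text')))
        elif tag == 'textarea' and attrs.get('name'):
            self.fields.append((attrs['name'], 'textarea'))

parser = FormModelParser()
parser.feed("""<form action='savepage' method='post'>
title:<input type='text' name='title'>
<br>textarea:
<textarea name='content'></textarea>
<input type='submit' name='submit'></form>""")
print parser.fields
# [('title', 'text'), ('content', 'textarea'), ('submit', 'submit')]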

The benefits are simplicity, and that designers can make a form and pretty quickly get to a working system. No need to edit sql, java, php, python, etc - just normal html forms.

Another benefit is more Rapid Application Development (RAD). From the html design you can quickly move to a working app - especially in a work flow where the designers and clients mock up various forms. It also stops blockages in the production pipeline: blockages that happen while waiting for a python/php/java programmer to implement the model.

    Multiple file uploads with html 5, cherrypy 3.2 and firefox 3.6beta5.

Here's an example of uploading multiple files with HTML 5 and newer browsers, like firefox 3.6 beta5. You can shift/ctrl select multiple files, and even drag and drop them in. This makes for a much nicer user experience when uploading files - rather than having to select one at a time, or load some java/flash thing.

It uses the unreleased cherrypy 3.2, with its new request entity parsing tool hooks. See http://www.cherrypy.org/wiki/RequestBodies for details of the new control allowed over the whole process. It's a lot easier to make custom request entity parsing behaviour now, and in a much less hacky way than before.

    http://rene.f0o.com/~rene/stuff/cherry_multi_fileupload.tar.gz

    With the tool in there, files come in as a list of files instead.

    Wednesday, January 06, 2010

    Oldest python file in your home directory?

    Feeling just a little nostalgic this time of year.

    Just made a little script to find the oldest python files on your hard drive.
    http://rene.f0o.com/~rene/stuff/oldest_python.py

    oldest_python.py [path]
    oldest_python.py mystuff/python
    oldest_python.py
    oldest_python.py ~
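
The script itself is at the url above, but the idea is roughly this (a sketch of my approach, keyed on file modification time):

import os, sys, time

def python_files_by_age(root):
    # Walk the tree and collect (mtime, path) for every .py file.
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.py'):
                path = os.path.join(dirpath, name)
                try:
                    results.append((os.path.getmtime(path), path))
                except OSError:
                    pass  # broken symlinks and the like
    results.sort()  # oldest first
    return results

if __name__ == '__main__':
    root = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser('~')
    for mtime, path in python_files_by_age(root)[:20]:
        print time.strftime('%Y-%m-%d', time.localtime(mtime)), path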
Update: in the comments, Lennart mentions a unixy way of finding the oldest files:
    find . -name '*.py' -printf "%T+ %p \n" | sort | more

With that I found some really old python files of mine... woh! The oldest ones are dated 1998. There are older C, C++, haskell, javascript, java, pascal, prolog, asm, sql, perl, etc files - and heaps of other old junk - but those are the first ones I could find written in python.

I guess that means I've been programming python for around 12 years now. Python was at version 1.4 or so, and 1.5 was released not long after. New style objects did not exist, and it was not too uncommon to be able to segfault the interpreter (ping could easily crash linux & windows in those days, so python was doing pretty well).

    So what did some of the older python files do?

A cut-up writing tool was the oldest one I found. Cut-up writing is a technique that was used a lot in the 90s (before then, and to this day as well). The idea is that you cut up various pieces of writing, and move the pieces around to get ideas.

After that came a script to randomly ping different web hosts (by http) every few minutes. In 1998 it was common for ISPs to disconnect your modem if you were idle for a while. So this script would ping one of a random selection of hosts and then go back to sleep. I remember this being the first thing I wrote in python. It took me less than an hour to learn enough python to do this little script - and that one hour probably saved me $300/year or so in telephone bills.

    After that there is a script to convert all file names to lower case. Useful for bringing the contents of FAT drives onto a linux box.

Then there was some thread testing code. Threads these days are way better, with much better tools, better implementations, and more known about how to use them appropriately. Using threads for IO in those days was pretty crazy... but apache used processes! Python had a wicked set of async IO servers in its toolbox, which were pretty darn cool in their day.

Finally, there were some mp3 making tools - to convert my massive collection of CDs. This was when some machines could barely play mp3s without crackling. It seems my tool used various linux tools to make the job easier: rip cds, get their names, convert them to mp3s. Using some old scsi drives I could go about my business without my machine slowing down completely and becoming unusable.

    What are your oldest python files? http://rene.f0o.com/~rene/stuff/oldest_python.py


    Tuesday, January 05, 2010

    Fifth London code dojo 6.30pm Thursday 7th of Jan.

    Here's the london dojo google calendar for much of 2010.

    More details, and signup here: http://ldnpydojo.eventwax.com/5th-london-python-code-dojo/. The idea is to practice, learn and teach python skills.