Tuesday, July 22, 2008

Going to speak at DjangoCon!

Wow, my name is on the DjangoCon schedule on Sunday, 2:25 pm. Jim Baker told me about it just a few days ago. This may be a bit surprising, but it looks like the whole DjangoCon is a bit surprising.

I can hardly believe that I'm going to be there, and I'm extremely happy to have this opportunity to meet the Django community and to show what we are doing on this GSoC-funded project of getting Django running on Jython and integrating it with some cool JVM stuff.

Well, I still have to do all the paperwork (I don't have a US visa yet; this is going to be my first visit to the country), and there is not much time to do it. Not to mention that I have to practice my spoken English. But we humans are optimistic by nature, and I think that such optimism lets us do most of the great things we do!

Sunday, July 20, 2008

The Jython Import Logic

Motivation


I think the coolest feature of Jython is the seamless integration with Java. Let's say you have the following Java class:

package com.leosoto;

public class HelloWorld {
    public void hello() {
        System.out.println("Hello World");
    }

    public void hello(String name) {
        System.out.printf("Hello %s!", name);
    }
}

If the class is on the classpath when you start Jython, using it from Python code is straightforward:

>>> from com.leosoto import HelloWorld
>>> h = HelloWorld()
>>> h.hello()
Hello World
>>> h.hello("joe")
Hello joe!

Now, did you know that if the class was not on the classpath, we could also package it in a jar, and the following would also work?

>>> import sys
>>> sys.path.append('/path/to/helloworld.jar')
>>> from com.leosoto import HelloWorld

Until yesterday, I didn't know!

Since part of my GSoC project is to come up with a way to package Django projects in a single distributable war file, I've spent a complete day reading and playing with the Jython import logic, and here is what I got.

Not much different than Python, right?


First of all, Jython is an implementation of the Python language. So the import mechanism strictly follows what is known as PEP 302: import hooks. I don't want to repeat what is documented there, but a quick explanation is in order:

  • First, try custom importers registered on sys.meta_path. If one of them is capable of importing the requested module, we are done.

  • For each entry of sys.path:

    • Find the first hook registered on sys.path_hooks that can handle the path entry (for example, zipimport is registered there and handles all "*.zip" paths)

    • If an importer hook was found, try it. If the importer loaded the module, we are done.

    • If an importer hook was not found, use the builtin import logic (good old *.py files inside directories). If the module is successfully loaded, we are done.
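The steps above can be sketched as a toy model in plain Python. This is only a sketch under my own assumptions: resolve, ToyZipImporter and the fake module tables are made-up names for illustration, not the interpreter's real code. Only the protocol (path hooks that raise ImportError for entries they can't handle, importers that expose find_module) comes from PEP 302.

```python
# Toy model of the PEP 302 lookup order; it never touches the real sys module.

class ToyZipImporter(object):
    """Hypothetical path hook: accepts only '*.zip' entries."""
    archives = {"libs.zip": ["zipped_mod"]}  # made-up archive contents

    def __init__(self, entry):
        if not entry.endswith(".zip"):
            # PEP 302: a hook refuses a path entry by raising ImportError
            raise ImportError(entry)
        self.entry = entry

    def find_module(self, name, path=None):
        # Return a loader (here: self) or None when the module is unknown
        if name in self.archives.get(self.entry, []):
            return self
        return None


def resolve(name, meta_path, path, path_hooks, directories):
    """Return a label telling which mechanism would import `name`."""
    # 1. Importers on meta_path get the first chance
    for importer in meta_path:
        if importer.find_module(name) is not None:
            return "meta_path"
    # 2. Otherwise, walk the path entries in order
    for entry in path:
        importer = None
        for hook in path_hooks:
            try:
                importer = hook(entry)
                break  # the first hook that accepts the entry wins
            except ImportError:
                continue
        if importer is not None:
            if importer.find_module(name) is not None:
                return "hook:" + entry
        elif name in directories.get(entry, []):
            # No hook accepted the entry: builtin logic (*.py in directories)
            return "builtin:" + entry
    raise ImportError("No module named " + name)


directories = {"/lib": ["plain_mod"]}  # made-up directory listing
path = ["/lib", "libs.zip"]
print(resolve("plain_mod", [], path, [ToyZipImporter], directories))
print(resolve("zipped_mod", [], path, [ToyZipImporter], directories))
```

The first call is answered by the builtin logic for "/lib", the second one by the zip hook for "libs.zip"; a name nobody knows ends in ImportError.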



How are java classes loaded then?


With a built-in import hook, naturally ;-)

If you start CPython and look at sys.path_hooks you get:

>>> import sys
>>> sys.path_hooks
[<type 'zipimport.zipimporter'>]

On Jython, the result is slightly different:

>>> import sys
>>> sys.path_hooks
[<type 'JavaImporter'>, <type 'zipimport.zipimporter'>]

The JavaImporter only recognizes the '__classpath__' entry on sys.path, so it is fired after looking at path components before '__classpath__'. This gives us some control over which namespaces will end up containing python modules and which will contain java packages/classes, if some conflict occurs (such as the very real issue of having the 'test' python module and the 'test' java package). Naturally, the '__classpath__' entry is added automagically to sys.path on Jython startup.

But...

>>> sys.path_hooks = []
>>> sys.path_importer_cache = {}
>>> del sys.modules['java']
>>> import java
>>> dir(java)
['__name__', 'applet', 'awt', 'beans' ... ]

This should have failed, after removing the JavaImporter hook and all the involved caches. Well, there is also some magic going on here...

Jython, JavaImporter and Java Packages


When an import is going to fail (that is, after searching on all sys.path entries and having no results), Jython tries to load a java package or java class. But wasn't that the task of the JavaImporter?

Well, sort of. Half of that job is the responsibility of the JavaImporter. The other half is managed by the SysPackageManager, which keeps in memory a tree of discovered Java packages.

When the Jython interpreter starts, the SysPackageManager looks for all jars and directories on the classpath and builds the tree of Java packages. You can also explicitly add a Java package to the PackageManager by calling sys.add_package("package.which.was.not.autodiscovered"). This is useful in environments where Jython is not allowed to look at the system classpath, or doesn't get the right information (as may be the case when running inside a JavaEE container).

Back to JavaImporter, its job is to just look into the SysPackageManager's loaded packages and check if the requested name is present there.
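To make the idea concrete, here is a rough sketch in plain Python (not Jython's actual Java implementation) of how a set of known packages could be derived from a jar, followed by the kind of membership check described above. The function names and the throwaway jar are my own inventions for illustration; only zipfile and tempfile are real standard library modules.

```python
import os
import tempfile
import zipfile


def java_packages_in_archive(archive_path):
    """Collect the dotted names of every package holding a .class file."""
    packages = set()
    with zipfile.ZipFile(archive_path) as archive:
        for entry in archive.namelist():
            if not entry.endswith(".class"):
                continue
            parts = entry.split("/")[:-1]  # drop the class file name
            # Register the package and all of its parents
            for depth in range(1, len(parts) + 1):
                packages.add(".".join(parts[:depth]))
    return packages


def is_known_java_package(name, packages):
    """The lookup half of the job: a plain membership check."""
    return name in packages


# Build a throwaway jar so the sketch runs anywhere
jar_path = os.path.join(tempfile.mkdtemp(), "helloworld.jar")
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("com/leosoto/HelloWorld.class", b"")
    jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")

known = java_packages_in_archive(jar_path)
print(sorted(known))
print(is_known_java_package("com.leosoto", known))
```

A flat set is used here instead of a real tree just to keep the sketch short.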

And here is the magic


Another way to get packages loaded into SysPackageManager is to add a zip or jar to sys.path. The next time the import logic runs, it automatically adds the contents of the new jar (or zip) to the tree of known Java packages.

This is a little weird, because if you have the following in your sys.path:

['__classpath__', '/foo', 'foo.jar']

Then, if Java packages in foo.jar conflict with Python modules from /foo, the Java packages will prevail, because the '__classpath__' entry comes before '/foo', and so the JavaImporter will do its magic first.
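A toy model of that precedence, in plain Python, may help. It is a sketch under my own assumptions: winner, python_dirs and java_packages are hypothetical names, and the model assumes the contents of foo.jar were already registered as Java packages by the time the lookup runs.

```python
def winner(name, sys_path, python_dirs, java_packages):
    """Return 'java', 'python' or None for `name`, walking sys_path in order."""
    for entry in sys_path:
        if entry == "__classpath__":
            # JavaImporter answers for every package known at this point,
            # including those that came from jars listed later on the path
            if name in java_packages:
                return "java"
        elif name in python_dirs.get(entry, ()):
            return "python"
    return None


python_dirs = {"/foo": {"test"}}  # made-up: /foo holds a Python module 'test'
java_packages = {"test"}          # pretend foo.jar contributed a 'test' package

print(winner("test", ["__classpath__", "/foo", "foo.jar"], python_dirs, java_packages))
print(winner("test", ["/foo", "__classpath__", "foo.jar"], python_dirs, java_packages))
```

With '__classpath__' first the Java package wins; moving '/foo' ahead of it lets the Python module win, no matter where foo.jar itself sits.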

And the other bit of magic is what we have already seen: Jython makes a last attempt to load a Java package, or to be more precise, to add a package to the SysPackageManager if the imported name is known to the JVM as a class or package name. If this operation is successful, the module is directly imported by this Jython builtin import logic addition (no way to go back to the JavaImporter at this time).

Some observations


Here ends the objective part of this post. What follows now are my observations on the whole process:

  • I don't quite understand why Jython tries to load Java classes or packages at the end of the import logic, after trying the standard procedure. It seems like such a fallback would make the calls to sys.add_package unnecessary, but then, why does add_package exist? And, in any case, I think that JavaImporter should do this.

  • The confusing situation of jar files (and java classes) in sys.path is well... confusing. The good news is that namespace conflicts aren't that common in practice, so just remembering that all java "modules" come from the magic '__classpath__' element is enough.

  • It would be nice if the Jython standard loader were installed on the meta_path. Then, JavaImporter could be added there too, just after the default Python code loader. This way, we would have a clearer precedence rule (Python modules first, Java packages/classes later), instead of the current "first Python modules before __classpath__, followed by Java "modules", followed by Python modules after __classpath__, followed by Java "modules" which weren't registered yet on the PackageManager".

That's all


OK, that was a long post. Now that I've dumped all that info here, I can go back to coding and try to make distributable WAR files for django projects, containing the complete Jython, modjy and Django runtime.

Tuesday, July 15, 2008

Jython 2.5 Alpha Released!

As announced by Frank Wierzbicki: Jython 2.5 alpha is out.

If you work with Java and love the Python programming language, this is a good opportunity to test this great Python implementation, integrate it with some of your Java programs and tell us how it went.

Monday, July 14, 2008

My New Django/Jython Developer Workflow

Both the Django and Jython projects are fast-moving targets these days. That's a good thing: both projects are rapidly approaching big milestones, Django 1.0 and Jython 2.5. But that also means that if you are trying to patch both codebases to integrate them, your task gets a little more complicated.

So I don't keep private Mercurial branches of both projects anymore, because that makes it quite hard to update patches and keep them as separate units. The new solution is two new Mercurial repositories: django.patches and jython.patches. They contain Mercurial queues and correspond to the .hg/patches directory you put inside the repositories containing the Mercurial mirror of each project.

Translated to command line, this is what you have to do in order to get the current code for Django on Jython:

$ hg clone https://hg.leosoto.com/django.svn.trunk
$ hg clone https://hg.leosoto.com/jython.svn.asm
$ hg clone https://hg.leosoto.com/django.patches django.svn.trunk/.hg/patches
$ hg clone https://hg.leosoto.com/jython.patches jython.svn.asm/.hg/patches
$ hg --cwd django.svn.trunk qpush -a
$ hg --cwd jython.svn.asm qpush -a

This way, I always try to keep my patches updated so they apply cleanly to each project's latest svn version.

I also have a Hudson instance on my machine, which runs the Django test suite on CPython using all backends. Once that run is finished, it runs the suite again after applying my Django patches (to make sure I'm not breaking anything). Finally, it runs the suite one more time, using Jython and the postgresql_zxjdbc backend. This should replace the dojstatus site, once I discover a way to publish the results I get from Hudson without installing Hudson on my hosting.

Thursday, July 10, 2008

By the way... I'm a Jython committer! :)

I've been so busy these last weeks (half-time job, Summer of Code, and university final exams and labs!) that I even forgot to mention that Frank gave me committer access to the Jython project two weeks ago.

I want to publicly thank all the Jython core developers for the confidence they have placed in my work, especially Jim Baker, my GSoC mentor. Also Phillipe Jenvey, Nicholas Riley and obviously Frank Wierzbicki have been extremely helpful, guiding me when I needed help.

As I now commit most of my Jython patches directly to the SVN repository, I'm going to deprecate the jython.doj Mercurial repository. I'll post about it soon, along with a new recipe to get Django running on top of Jython, using the asm branch (i.e., the upcoming 2.5 version).

Saturday, July 5, 2008

The Devil is in the Details

It's a cliché, but really, look at this commit message I just wrote. It's for a supposedly simple change I made to Jython to get '%d' % foo and '%f' % bar working in some corner cases[1]:

StringFormatter: '%d' and '%f' support for the __int__ and __float__ protocol respectively.

The implementation is more convoluted than it should be, because we have PyString implementing __float__ and __int__ at the "java level" but not at the "python level". For string formatting, only "python level" __float__ and __int__ must be supported.

Also, now that __int__ can return a PyLong, this case needs special care. Basically formatInteger now can call formatLong if a PyLong is found as the result of __int__. Then, as formatLong can also be called from formatInteger, __hex__, __oct__ and __str__ conversions were moved inside formatLong.

Finally, test_format_jy was changed to stop checking that we don't support big floats on '%d' (CPython doesn't, but that seems a limitation of the specific implementation and I can't imagine a program that could break on Jython because we *support* it).
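For reference, this is the kind of code the change is about: an object that is not a number itself, but knows how to convert to one. A minimal sketch (Temperature is a hypothetical class of mine, not something from Django or Jython), which works the same on CPython:

```python
class Temperature(object):
    """Hypothetical class that is not a number but converts to one."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __int__(self):
        # Used by '%d' when the operand is not already an integer
        return int(self.degrees)

    def __float__(self):
        # Used by '%f' when the operand is not already a float
        return float(self.degrees)


t = Temperature(21.5)
print('%d' % t)
print('%f' % t)
```

This prints 21 and 21.500000: the formatting operator asks the object for its integer or float value instead of rejecting it.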


Python is wonderful, but there are a lot of details which make it tricky to implement. It's nice to see that, when we play the role of Python users, we aren't exposed too much to these subtleties. In fact, I'd say that it is one of the languages with the best "user interface" I've seen.

[1] Naturally, I find these corner cases when running and testing the Django codebase. Well, that's one of the points of my SoC project: test how CPython-compliant Jython is, and fix it when it isn't :).

Tuesday, July 1, 2008

We Should Master Regular Expressions

When someone tells us that every programmer should know regular expressions, it's not only about using them to validate or match input in our programs. After all, it seems like many of us can live with split() and replace() and some ad-hoc code instead of learning regexps.

My point today is that they are also useful while coding. I just needed to replace every piece of code that looked like this:


<mx:RemoteObject id="grabaMuestreosRemote"
...
fault="Alert.show('Problemas al grabar los muestreos')"/>

To:

<mx:RemoteObject id="grabaMuestreosRemote"
...
fault="reportFault('Problemas al grabar los muestreos', event.fault)"/>


The change is on the last line, replacing the alert with slightly more involved logic (which lives inside the reportFault function).

Solution? Find/Replace, using regexps (this was done with Eclipse, but every reasonable editor has this feature):

Find: fault="Alert.show\('([^']*)'\)"

Replace With: fault="reportFault('$1', event.fault)"

Quick explanation: \( and \) match literal parentheses; they are escaped because parentheses have their own special meaning in regexps: capturing. And that is what the non-escaped ones are used for here, capturing the message string inside the quotes, in '([^']*)'. That means: a single quote (') followed by any character which is not a single quote ([^']), repeated 0 or more times (*), followed by a single quote ('). So the non-escaped parentheses are used to capture (i.e., remember, store) what was found inside the quotes. Later, you use the captured value by specifying $1 in the replacement text. If you have more captures, they are labeled $2, $3 and so on.

By the way, I'm not a regexp master. In fact, I admit to frequently resorting to ad-hoc code, especially if I'm in a hurry.

But the simple exercise of summing all the time spent on writing ad-hoc code, plus the time wasted doing non-trivial find & replace by hand, has convinced me to learn them, and hopefully master them.