Motivation
I think the coolest feature of Jython is its seamless integration with Java. Let's say you have the following Java class:
package com.leosoto;
public class HelloWorld {
    public void hello() {
        System.out.println("Hello World");
    }
    public void hello(String name) {
        System.out.printf("Hello %s!", name);
    }
}
If the class is on the classpath when you start Jython, using it from Python code is straightforward:
>>> from com.leosoto import HelloWorld
>>> h = HelloWorld()
>>> h.hello()
Hello World
>>> h.hello("joe")
Hello joe!
Now, did you know that if the class is not on the classpath, you can also package it in a jar, and the following will work too:
>>> import sys
>>> sys.path.append('/path/to/helloworld.jar')
>>> from com.leosoto import HelloWorld
Until yesterday, I didn't know!
Since part of my GSoC project is to come up with a way to package Django projects in a single distributable war file, I've spent a complete day reading and playing with the Jython import logic, and here is what I found.
Not much different than Python, right?
First of all, Jython is an implementation of the Python language, so the import mechanism strictly follows what is known as PEP 302: import hooks. I don't want to repeat what is documented there, but a quick explanation is in order:
- First, try the custom importers registered on sys.meta_path. If one of them is capable of importing the requested module, we are done.
- For each entry of sys.path:
  - Find the first hook registered on sys.path_hooks that can handle the path entry (for example, zipimport is registered there and handles all "*.zip" paths).
  - If an importer hook was found, try it. If the importer loaded the module, we are done.
  - If no importer hook was found, use the built-in import logic (good old *.py files inside directories). If the module is successfully loaded, we are done.
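To make the order of those steps concrete, here is a minimal sketch in plain Python. It is a toy model, not the interpreter's real code: FakeZipImporter, fake_builtin_find, resolve and the paths and module names are all made up for illustration, and "loading" a module is reduced to returning a label that says which mechanism found it.

```python
class FakeZipImporter(object):
    """Hypothetical stand-in for zipimport: accepts only '*.zip' entries."""

    # Pretend archive contents, keyed by sys.path entry (made up for the demo).
    contents = {"/libs/extra.zip": ["zipped_mod"]}

    def __init__(self, entry):
        # PEP 302: a path hook raises ImportError for entries it can't handle.
        if not entry.endswith(".zip"):
            raise ImportError("not a zip path: %s" % entry)
        self.entry = entry

    def find_module(self, name):
        if name in self.contents.get(self.entry, []):
            return "zip hook (%s)" % self.entry
        return None


def fake_builtin_find(name, entry):
    """Hypothetical stand-in for the built-in search of *.py files."""
    directories = {"/src": ["plain_mod"]}
    if name in directories.get(entry, []):
        return "builtin logic (%s)" % entry
    return None


def resolve(name, meta_path, path, path_hooks):
    """Return a label for whichever mechanism would import `name`, or None."""
    # 1. Importers on meta_path get the first chance.
    for importer in meta_path:
        found = importer.find_module(name)
        if found is not None:
            return found
    # 2. Otherwise walk the path entries in order.
    for entry in path:
        importer = None
        for hook in path_hooks:
            try:
                importer = hook(entry)  # first hook that accepts the entry
                break
            except ImportError:
                continue
        if importer is not None:
            found = importer.find_module(name)
        else:
            found = fake_builtin_find(name, entry)
        if found is not None:
            return found
    return None  # a real interpreter would raise ImportError here


demo_path = ["/src", "/libs/extra.zip"]
print(resolve("plain_mod", [], demo_path, [FakeZipImporter]))
print(resolve("zipped_mod", [], demo_path, [FakeZipImporter]))
```

Note that when a hook claims a path entry but its importer doesn't find the module, the sketch moves on to the next entry without trying the built-in logic on that entry, which is how I read the steps above.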
How are java classes loaded then?
With a built-in import hook, naturally ;-)
If you start CPython and look at sys.path_hooks you get:
>>> import sys
>>> sys.path_hooks
[<type 'zipimport.zipimporter'>]
On Jython, the result is slightly different:
>>> import sys
>>> sys.path_hooks
[<type 'JavaImporter'>, <type 'zipimport.zipimporter'>]
The JavaImporter only recognizes the '__classpath__' entry on sys.path, so it fires only after the path entries that come before '__classpath__' have been searched. This gives us some control over which namespaces will end up containing Python modules and which will contain Java packages/classes if a conflict occurs (such as the very real issue of having both the 'test' Python module and the 'test' Java package). Naturally, the '__classpath__' entry is added automagically to sys.path on Jython startup.
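The "only recognizes one entry" part is plain PEP 302 behavior: a path hook refuses an entry by raising ImportError. Here is a rough sketch of that idea. MagicEntryImporter and first_hook_for are hypothetical names, and this is not how Jython's JavaImporter is actually written (the real one lives in Java code); it only shows the protocol.

```python
MAGIC_ENTRY = "__classpath__"


class MagicEntryImporter(object):
    """Hypothetical path hook that claims exactly one sys.path entry."""

    def __init__(self, path_entry):
        # Refuse every entry except the magic one, as PEP 302 prescribes.
        if path_entry != MAGIC_ENTRY:
            raise ImportError("only handles %r" % MAGIC_ENTRY)

    def find_module(self, fullname, path=None):
        # A real importer would look the name up somewhere; this one finds nothing.
        return None


def first_hook_for(entry, hooks):
    """Return an importer from the first hook that accepts `entry`, else None."""
    for hook in hooks:
        try:
            return hook(entry)
        except ImportError:
            pass
    return None


print(first_hook_for("__classpath__", [MagicEntryImporter]) is not None)
print(first_hook_for("/foo", [MagicEntryImporter]) is None)
```

Both lines print True: the hook claims '__classpath__' and turns down an ordinary directory, which is then left to the other hooks or to the built-in logic.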
But...
>>> sys.path_hooks = []
>>> sys.path_importer_cache = {}
>>> del sys.modules['java']
>>> import java
>>> dir(java)
['__name__', 'applet', 'awt', 'beans' ... ]
This should have failed, after removing the JavaImporter hook and all the involved caches. Well, there is also some magic going on here...
Jython, JavaImporter and Java Packages
When an import is about to fail (that is, after all sys.path entries have been searched with no result), Jython tries to load a Java package or Java class. But wasn't that the task of the JavaImporter?
Well, sort of. Half of that job is the responsibility of the JavaImporter. The other half is managed by the SysPackageManager, which keeps in memory a tree of discovered Java packages.
When the Jython interpreter starts, the SysPackageManager looks at all the jars and directories on the classpath and builds the tree of Java packages. You can also explicitly add a Java package to the PackageManager by calling sys.add_package("package.which.was.not.autodiscovered"). This is useful in environments where Jython is not allowed to look at the system classpath, or doesn't get the right information (as may be the case when running inside a JavaEE container).
Back to the JavaImporter: its job is just to look into the SysPackageManager's loaded packages and check whether the requested name is present there.
And here is the magic
Another way to get packages loaded into the SysPackageManager is to add a zip or jar to sys.path. The next time the import logic runs, it automatically adds the contents of the new jar (or zip) to the tree of known Java packages.
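To give an idea of what "adding the contents of a jar to the tree of known packages" could involve, here is a small sketch that lists the Java packages found inside a jar using only the standard zipfile module. The packages_in_jar function is my own illustration, not the SysPackageManager's actual algorithm, and the jar is built in memory with an empty fake class file so the example needs nothing on disk.

```python
import io
import zipfile


def packages_in_jar(jar_file):
    """Return the sorted Java package names that contain .class files."""
    packages = set()
    with zipfile.ZipFile(jar_file) as jar:
        for entry in jar.namelist():
            # "com/leosoto/HelloWorld.class" belongs to package "com.leosoto".
            if entry.endswith(".class") and "/" in entry:
                directory = entry.rsplit("/", 1)[0]
                packages.add(directory.replace("/", "."))
    return sorted(packages)


# Build a tiny in-memory jar (a jar is just a zip file with a manifest).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr("com/leosoto/HelloWorld.class", b"")
    jar.writestr("META-INF/MANIFEST.MF", b"Manifest-Version: 1.0\n")
buffer.seek(0)

print(packages_in_jar(buffer))  # prints ['com.leosoto']
```

In Jython itself you never call anything like this by hand: the registration happens as a side effect of the import logic noticing the jar on sys.path.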
This is a little weird, because if you have the following in your sys.path:
['__classpath__', '/foo', 'foo.jar']
Then, if Java packages in foo.jar conflict with Python modules from /foo, the Java packages will prevail, because the '__classpath__' entry comes before '/foo' and the JavaImporter will do its magic there.
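The precedence can be reproduced with a toy model. Everything in this sketch is an assumption made for illustration (the resolve function, the dictionaries describing what each entry contains, and the two-phase structure); it also ignores that a zip on sys.path can provide Python modules through zipimport. It only mirrors the behavior described above: jars feed a shared registry, and only '__classpath__' answers for Java names.

```python
def resolve(name, sys_path, directory_modules, jar_packages):
    """Say where `name` would come from under the toy rules, or return None."""
    # Phase 1: every jar/zip on the path feeds one shared registry of
    # Java packages, regardless of its position in sys_path.
    known_java = set()
    for entry in sys_path:
        if entry.endswith((".jar", ".zip")):
            known_java.update(jar_packages.get(entry, ()))
    # Phase 2: walk the entries in order; only '__classpath__' consults
    # the registry, plain directories answer for their own modules.
    for entry in sys_path:
        if entry == "__classpath__":
            if name in known_java:
                return "java package (via __classpath__)"
        elif name in directory_modules.get(entry, ()):
            return "python module from %s" % entry
    return None


dirs = {"/foo": ["test"]}     # /foo contains a Python module named 'test'
jars = {"foo.jar": ["test"]}  # foo.jar contains a Java package named 'test'

print(resolve("test", ["__classpath__", "/foo", "foo.jar"], dirs, jars))
print(resolve("test", ["/foo", "__classpath__", "foo.jar"], dirs, jars))
```

With the first ordering the Java package wins even though foo.jar is listed after /foo; moving '/foo' ahead of '__classpath__' lets the Python module win.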
And the other bit of magic is what we have already seen: Jython makes a last attempt to load a Java package or, to be more precise, to add a package to the SysPackageManager if the imported name is known to the JVM as a class or package name. If this operation is successful, the module is imported directly by this Jython addition to the built-in import logic (there is no way to go back to the JavaImporter at this point).
Some observations
Here ends the objective part of this post. What follows are my observations on the whole process:
- I don't quite understand why Jython tries to load Java classes or packages at the end of the import logic, after trying the standard procedure. It seems like such a fallback would make the calls to sys.add_package unnecessary, but then, why does add_package exist? And, in any case, I think that the JavaImporter should do this.
- The confusing situation of jar files (and Java classes) in sys.path is, well... confusing. The good news is that namespace conflicts aren't that common in practice, so just remembering that all Java "modules" come from the magic '__classpath__' element is enough.
- It would be nice if the Jython standard loader were installed on the meta_path. Then the JavaImporter could be added there too, just after the default Python code loader. This way we would have a clearer precedence rule (Python modules first, Java packages/classes later) instead of the current one: first the Python modules before __classpath__, followed by Java "modules", followed by the Python modules after __classpath__, followed by Java "modules" which weren't yet registered on the PackageManager.
That's all
OK, that was a long post. Now that I've dumped all that info here, I can go back to coding and try to make distributable WAR files for Django projects, containing the complete Jython, modjy and Django runtime.


4 comments:
Hi,
I am still struggling with the whole logic.
I've tried your example and it works, but if you change the name of the package from com.leosoto to org.leosoto it does not work anymore (at least for me).
I have found the "org" name a bit tricky to handle via the sys.path.append way.
Any comment on that?
I have more about it:
1) use the "org" name
2) add the jar file to the sys.path
3) import org ----> nothing happens
3b) try some from org.leosoto import .... ----> nothing happens
4) import xyz ----> error, but the jar file is finally processed
5) classes are available!
Does it surprise you or is it normal?
It seems it has been fixed recently
http://fisheye3.atlassian.com/changelog/jython/?cs=5956
Haven't tried it yet. Waiting for a new beta.
Mario:
You're right, the behavior seems caused by that bug.
If you don't want to wait for the next beta (which is coming soon anyway), I'd recommend downloading the sources and building Jython yourself. It is a straightforward process (just running ant).