2006-08-26
An alternative to String#intern()
Ethan blogged about the usage of String#intern in the Xerces XML parser. Let's not delve into the discussion of whether interning is particularly useful in this context. What does String#intern do? Basically, it implements a global weak symbol table that unfortunately populates the permspace. As an alternative, let's see what it takes to implement #intern at the application level. Since I'm lazy, let's stipulate that you do not actually need the weakness property if you have dedicated symbol tables that you can drop and that can then be GCed as a whole. Luckily, ConcurrentHashMap already implements everything we need in that case, so it's pretty easy:
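import java.util.concurrent.ConcurrentHashMap;

public class SymbolTable<T> {
    ConcurrentHashMap<T, T> chm = new ConcurrentHashMap<T, T>();

    // Returns the canonical instance equal to t, registering t if it is the first one seen.
    public T unique(T t) {
        T first = chm.putIfAbsent(t, t);
        return (first == null) ? t : first;
    }
}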
Not only does this little class not bloat your permspace, it's actually faster than using String#intern(). Just say no to #intern.
Rémi Forax reminds me that String.intern() is handy for identity comparisons against constants. Just don't let yourself be fooled into thinking that '==' makes things faster: interning the string in the first place requires you to hash the string and compare it to the interned version.
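The same identity trick works with an application-level table. Here's a minimal sketch, assuming a SymbolTable like the one above (repeated so the snippet compiles on its own); SymbolTableDemo and the "element" constant are made up for illustration:

```java
import java.util.concurrent.ConcurrentHashMap;

// Application-level symbol table, same idea as the class in the post.
class SymbolTable<T> {
    private final ConcurrentHashMap<T, T> chm = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to t, registering t if none exists yet.
    T unique(T t) {
        T first = chm.putIfAbsent(t, t);
        return (first == null) ? t : first;
    }
}

// Hypothetical demo class, not part of the original post.
class SymbolTableDemo {
    public static void main(String[] args) {
        SymbolTable<String> symbols = new SymbolTable<>();

        // The constant is made unique once, up front.
        String elementName = symbols.unique("element");

        // new String(...) forces a distinct object, like a parser building names from a char buffer.
        String parsed = new String("element".toCharArray());

        System.out.println(parsed == elementName);                 // false: equal, but distinct objects
        System.out.println(symbols.unique(parsed) == elementName); // true: same canonical object
    }
}
```

Note that the call to unique() still hashes the string and compares it once, so the '==' afterwards only pays off if you compare the same symbol many times.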
2006-08-25
Parsing object streams with JAXB
JAXB is a wonderful tool to bind XML to objects. Unfortunately, used naively, it has the same problem as DOM: you need to load the whole document into memory before you can start consuming the result. If you have a document with one root element and tens of thousands of children, it is a good idea to combine object binding for each child with streaming for the whole list.
I'm using a simple solution to do that. It's a list implementation that does not store its children but notifies a callback handler instead. Use that list in the parent bean and you're all set:
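import java.util.List;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlRootElement(name="parent")
@XmlType(name="", propOrder={"children"})
public class Parent {
    // The list forwards each child to a callback instead of storing it.
    @XmlElement(name="child")
    List<Child> children = new CallbackList<Child>();
}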
Now, whenever JAXB has finished constructing a child object, it calls #add on my list. The object is handed to the callback immediately and becomes eligible for garbage collection afterwards.
You may have noticed that at this point the parent is not fully constructed yet and is unknown to the child, and thus inaccessible to the callback handler. Also, I haven't found a proper way to inject objects into JAXB, so the CallbackList needs to find its callback handler through a ThreadLocal, which is not exactly pretty. But the net result is great: blazing performance combined with the comfort of binding.
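For the curious, such a list could look roughly like this. It's a sketch under assumptions, not the actual class: the HANDLER field, the use of java.util.function.Consumer (Java 8) as the handler type, and the demo class are all invented for illustration, and JAXB itself is left out so the snippet runs standalone:

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical write-only list: add() forwards to a handler, nothing is stored.
class CallbackList<E> extends AbstractList<E> {
    // Handler for the current thread; has to be installed before unmarshalling starts.
    static final ThreadLocal<Consumer<Object>> HANDLER = new ThreadLocal<>();

    @Override
    public boolean add(E element) {
        Consumer<Object> handler = HANDLER.get();
        if (handler == null) {
            throw new IllegalStateException("no callback handler installed for this thread");
        }
        handler.accept(element); // hand the element over instead of keeping a reference
        return true;
    }

    @Override
    public E get(int index) {
        throw new IndexOutOfBoundsException("CallbackList does not store elements");
    }

    @Override
    public int size() {
        return 0; // always looks empty to readers
    }
}

// Hypothetical demo: plain strings stand in for the child beans JAXB would add.
class CallbackListDemo {
    public static void main(String[] args) {
        List<Object> seen = new ArrayList<>();
        CallbackList.HANDLER.set(seen::add);
        try {
            List<String> children = new CallbackList<>();
            children.add("first");
            children.add("second");
            System.out.println(seen + " " + children.size()); // [first, second] 0
        } finally {
            CallbackList.HANDLER.remove(); // don't leak the handler into pooled threads
        }
    }
}
```

Clearing the ThreadLocal in a finally block matters if the thread is reused, for example in a servlet container.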
Next morning: alright, I just discovered Unmarshaller.Listener. We can use it a) to install a write-only sink as the list implementation and b) to listen for child objects without the thread-local. And all that without having to change any of the generated classes. Neat!