Java Generics and Collections
19 years, 19 Java versions later: Maurice Naftalin & Stuart Marks discuss the second edition of Java Generics and Collections, exploring how immutability, streams, and sequenced collections have transformed Java development.
Transcript
Stuart Marks: Hello and welcome to the Goto Book Club. I'm Stuart Marks, I work on the OpenJDK and I'm here with Maurice Naftalin. Maurice, why don't you tell us about yourself and about the new book you have coming out?
Maurice Naftalin: I'm Maurice Naftalin, and I've been working in software development for a very long time. About half of that I've specialized in Java. The book that we're going to look at today is the second edition of a book that I wrote with Phil Wadler quite a long time ago. We wrote this book for Java 5 when generics first came out, and the Collections Framework was by then about seven years old. This was a long time ago, and we've had time to look at some experiences since then. That's what the new edition is about. Stuart and I wrote this together. Phil and I are still the main authors, and Stuart was the technical editor, but I think of him as being close to having been a coauthor.
Java Generics and Collections: A 19-Year Journey from Java 5 to Java 25
Stuart Marks: That's very flattering. So you said it was for Java 5, and we are almost at the release of Java 25 now. The version numbering has changed around, but that really has been a gap of nearly 20 years since the publication of the first edition. What led up to creating the new edition?
Why Now? The Catalyst for a Second Edition
Maurice Naftalin: It's quite nice. I don't know if this is the first time I've noticed it, but it's 19 years since the first edition was published, and it's 19 versions of Java later, from Java 5 to Java 24 right now, about to be Java 25. Quite a lot changed. The thing that really convinced O'Reilly that it was worthwhile having a new edition was the advent of sequenced collections, which you may have heard of. Stuart, you could say something about that now.
Stuart Marks: Sequenced Collections is a new set of interfaces that was added to the Java Collections Framework and was integrated in Java 21. I did that work, which is why Maurice is suggesting that I talk about it.
Maurice Naftalin: The other reason that I pushed it towards O'Reilly was that, ever since I wrote about streams in a book in 2013, I had thought this book would require revision. The collections part would require revision because streams have changed the way that we think about bulk data processing in Java.
It turned out that wasn't as big a change to the book as I expected. The basics of working with collections haven't changed greatly. Most of what happened was that I found alternative ways of writing many of the examples using streams, and some of those are in the book as well.
But then what surprised me also was the extent to which, although all the code from the first edition was still valid apart from certain mistakes, the styles had altered a great deal. For example, the advent of records meant that many of the examples looked quite different. The fact that we no longer use constructors for the wrapper classes meant we had lots of examples that used new Integer, which is now deprecated and may be deprecated again. That's up to Doctor Deprecated.
Stuart Marks: That's one of my other roles, to deal with deprecation in the platform. Sometimes I have an alter ego, Doctor Deprecated, but unfortunately he's not here right now.
Maurice Naftalin: Well, in that case, we can badmouth him because Doctor Deprecated has deprecated the Integer constructor, and it's liable to be deprecated again later on as far as I understand it. But I took those examples out, and that required quite a lot of changes to the code. Then I went back, and once you start working with something and rewriting the examples, you look at the original material and you think, "Actually, I think this could have been improved. This could be better." So I ended up doing quite a substantial rewrite of many of the chapters, even in the generics part of the book, which I thought originally would be able to be left fairly untouched.
The Shift Toward Immutability and Unmodifiable Collections
Stuart Marks: Another thing that has changed a lot over the years is the emphasis in Java programming on immutability and unmodifiability. I'll take this opportunity to mention another enhancement I made to the Collections Framework, which is a set of factory methods and implementations that support unmodifiable collections. That went in a long time ago now. That was in Java 9: List.of, Set.of, and Map.of. Those appear in the examples a lot as well, don't they?
Maurice Naftalin: I should have mentioned those. Using constructors for collection instances when you know that you aren't going to want to modify them doesn't look like modern Java anymore. Examples needed changing in that way as well. Generally speaking, you would look at the first edition now and say this doesn't feel like the same language that we're writing in anymore, even though it's quite valid.
If you add together the changes I mentioned—records, the unmodifiable collections, the deprecation of the wrapper class constructors—altogether you put that all together and it looked pretty antiquated. So there had to be many changes there. I think probably of those, the one that is the most significant in terms of the overall thrust of the book would be the unmodifiability, the move towards immutability and the introduction of the unmodifiable collections.
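As a rough illustration of the style shift described here (not an example taken from the book), the sketch below combines a record, the Java 9 factory methods, and `Integer.valueOf` in place of the deprecated wrapper constructor. The names `ModernStyleDemo`, `Task`, and `sampleTasks` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; the names are illustrative only.
class ModernStyleDemo {
    // A record declares fields, constructor, accessors, equals, hashCode and toString in one line.
    record Task(String name, int hours) {}

    // An unmodifiable factory method instead of new ArrayList<>() followed by add() calls.
    static List<Task> sampleTasks() {
        return List.of(new Task("design", 3), new Task("code", 5));
    }

    public static void main(String[] args) {
        // Integer.valueOf (or plain autoboxing) replaces the deprecated new Integer(42).
        Integer boxed = Integer.valueOf(42);
        Set<String> tags = Set.of("generics", "collections");
        Map<String, Integer> hoursByTask = Map.of("design", 3, "code", 5);

        System.out.println(boxed + " " + sampleTasks());
        System.out.println(tags.contains("generics") + " " + hoursByTask.get("code"));
    }
}
```

Collections created this way reject modification, so any later `add` call fails with `UnsupportedOperationException` rather than silently changing shared data.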
Stuart Marks: There's definitely a stylistic change in the way Java programs have been written over the years. Certainly the Collections Framework has always supported mutable collections, and it's possible to write a program that adds elements to a collection and then removes them and then searches them and inserts them and so forth. People could still do that, probably. But I think with the advent of streams, if you want to transform the elements of a collection, typically the style is to take the elements, run them through a stream, and then deposit the resulting elements into a different collection. You're ending up copying things, and that goes hand in hand with unmodifiability because you create a new version of something with different contents instead of editing things in place.
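The copy-instead-of-edit style described above might look like the following minimal sketch. `StreamCopyDemo` and `upperCased` are hypothetical names; `Stream.toList()` requires Java 16 or later.

```java
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class StreamCopyDemo {
    // Produces a new list; the source list is left untouched.
    static List<String> upperCased(List<String> names) {
        return names.stream()
                    .map(String::toUpperCase)
                    .toList(); // Stream.toList() returns an unmodifiable list
    }

    public static void main(String[] args) {
        List<String> names = List.of("ann", "bob");
        System.out.println(upperCased(names)); // the transformed copy
        System.out.println(names);             // the original, unchanged
    }
}
```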
Usage Guidance: Ownership and the Anemic Domain Model
Stuart Marks: That leads us to one of the topics we wanted to cover, which is covered in one of the new chapters of the book, which is usage guidance. Since it has been such a long time since the first edition was published, we now have quite a bit more experience with using the Collections Framework. Do you want to mention some things about what that chapter covers?
Maurice Naftalin: One thing I should say in answer to the point you just made is that we are probably doing less mutation in Java than in the past. I don't think there was any idea of trying to reduce mutation in Java 1.0 or 1.1 or something. But by the time it came to Java 5, with the advent of the concurrent collections, immutability was being seen as much more useful. There's really been quite a steady trend towards adopting the features of functional languages into Java, and not only the features, but also the ideas behind functional languages.
I've given talks—you might have seen one that I did with José Paumard on how Java has borrowed from functional languages. But it's important to say that Java is not a functional language. It wasn't designed as one, and it never will be one. We will always be doing mutation. That's baked into the DNA of the language. But that's mixing the metaphors a bit. Once you bake your DNA, it doesn't work so well anymore.
Stuart Marks: That's right.
Maurice Naftalin: As for the features of the guidance chapter, I do think the guidance chapter was new in this edition. This edition fleshes out the bare bones of the first edition. I think probably one of the most important features, one that's relatively novel that you won't find anywhere else, is thinking about the ownership of collections.
We really wanted to take the issue of encapsulation, which lies at the root of object-oriented programming, obviously, to a somewhat deeper level than just the general idea of, "Well, we'll use getters and setters instead of exposing state directly."
We discover, or rather we explain, how an object should own its state entirely. We look at the work that's required in order to ensure that the state is never exposed to clients so that they might subsequently, either by design or by malice, change it and put the object, as it were, out of its own control.
Stuart Marks: Most Java programmers are familiar with the idea of encapsulation. If you have an object with a private field, then of course you will want to have methods—getters and setters, but also other methods that will update that field in a very controlled way. Java programmers are very comfortable with the idea of fields being private to an object and encapsulating fields and whatnot.
But if you stick a collection into a field, a collection is now a different object, and that's somehow confusing. What you're doing is if you just have a getter, you're just passing around a reference, and it's very easy if you're not careful for an object to lose control over its internals if it simply has references to internal collections.
There are some examples in the book of inadvertent aliasing of collections between objects and loss of encapsulation and whatnot. There are a couple obvious things—well, again, obvious—there are some things that have been in the platform for a very long time, which is the ability to wrap a collection in an unmodifiable wrapper to prevent somebody from making unintended modifications, and then also making defensive copies.
It seems a little strange, but I think we did some searching, and I don't think that topic is actually adequately covered anywhere else, even though it's been an issue since day one. Personally, when we were discussing this material, I thought, "Well, this is so obvious. People obviously need to know that they need to make a copy here, right? It's obvious"—I keep saying that word. But every so often I would stumble across code, even in the JDK, that gets it wrong. That convinced me that it's worthwhile to actually write this stuff down, even though it's a bit of Java lore, it's worth writing down in detail and justifying it.
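The two long-standing techniques mentioned above, defensive copies and unmodifiable wrappers, can be sketched as follows. This is a minimal illustration under assumed names (`Team`, `membersView`, `membersSnapshot`), not code from the book or the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class that owns its internal list.
class Team {
    private final List<String> members;

    Team(List<String> initialMembers) {
        // Defensive copy on the way in: the caller keeps no alias to our state.
        this.members = new ArrayList<>(initialMembers);
    }

    void addMember(String name) {
        members.add(name);
    }

    // Read-only view: reflects later changes, but callers cannot modify it.
    List<String> membersView() {
        return Collections.unmodifiableList(members);
    }

    // Defensive copy on the way out: an unmodifiable snapshot of the current contents.
    List<String> membersSnapshot() {
        return List.copyOf(members);
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("ann"));
        Team team = new Team(source);
        source.add("mallory"); // changes the caller's list only, not the team
        List<String> snapshot = team.membersSnapshot();
        team.addMember("bob");
        System.out.println(team.membersView()); // sees the later addition
        System.out.println(snapshot);           // does not
    }
}
```

A plain getter returning `members` directly would hand every caller a reference through which the team could be modified behind its back.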
Maurice Naftalin: I think the basic problem comes from the fact that even though in principle everybody understands the difference between shallow copy and deep copy of an object, I don't think they actually think about this very seriously when it comes to collections for some reason. I got comments on the book, including—by the way, I got a comment from Heinz Kabutz, who I know understands the difference between shallow and deep copy. He's very expert, and he was saying, "Well, why did you use the word unmodifiable for these collections instead of immutable?"
This is actually a central question to what we're talking about here, because if you don't make the distinction between what we call unmodifiable in the book—which means shallowly immutable, that is, the contents of a collection can't directly be altered, but things that it refers to can be altered—if you don't make that distinction, then you're going to have problems.
If you don't get this difference between shallow immutability, let's call it, even though we call it unmodifiability, and real immutability, then full encapsulation doesn't mean anything to you. Because full encapsulation, if you want to be serious about encapsulation, then you probably want to make it impossible for a client to mutate any part of the state graph that they get passed a reference to, not just at the top level.
That issue means that to take encapsulation really seriously requires more thought than it usually gets. If you don't take it really seriously, then you're opening yourself to the corruption of the internal state of your objects.
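The distinction between unmodifiable (shallowly immutable) and truly immutable can be made concrete with a nested list. The sketch below is illustrative only; `ShallowDemo` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ShallowDemo {
    // The outer list is unmodifiable, but the inner list it refers to is still mutable.
    static List<List<String>> unmodifiableOuter() {
        List<String> inner = new ArrayList<>(List.of("a"));
        return List.of(inner);
    }

    // Copies every level, so neither the outer list nor the inner lists can be changed.
    static List<List<String>> deeplyUnmodifiable(List<List<String>> source) {
        return source.stream().map(List::copyOf).toList();
    }

    public static void main(String[] args) {
        List<List<String>> shallow = unmodifiableOuter();
        shallow.get(0).add("b"); // allowed: only the top level is protected
        System.out.println(shallow);

        List<List<String>> deep = deeplyUnmodifiable(shallow);
        System.out.println(deep);
    }
}
```

Here the elements are strings, which are themselves immutable, so copying two levels is enough; with mutable element types the copy would have to go deeper.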
Stuart Marks: We've been talking about encapsulation and internal state and thinking about things a lot. One of the items that's covered in the usage guidance chapter is something called the anemic domain model. Did you want to mention some thoughts about that?
Maurice Naftalin: Well, the anemic domain model—that's a term that comes from a different field of object-oriented design. I think Martin Fowler invented this in the first place, and he used it to talk about domain objects which don't actually embody domain logic. Without domain logic, all of the logic of manipulating these objects has to reside in the service layer of an application.
The corresponding problem, if you've got collections, is to have simply something like, say for example, a set of lists or a list of lists or a list of lists of lists. But generally speaking, often when you want to define a type like that, you are actually defining a domain object, and you probably want to have fields within it. Rather than simply being a bare collection like this, you probably want to define a class.
Instead of saying—the example that we use in the book, for instance, is that a project is a set of tasks. So you might say then that a list of projects was a list of sets of tasks, but that may be inadequate. It's actually quite likely to be inadequate if a project needs to have extra information attached to it. The example in the book is that a project might have a total duration, which is the summed duration of all the individual tasks.
Now, you could calculate that duration. It's a dynamic property, so you could calculate it every time you need it from the duration of the individual tasks. But if that's inefficient, then you're better off defining a project object and having a named type which is more than simply a set of tasks.
This problem that I think you mentioned in the first place, Stuart, because I think you've seen it in the use of the Collections Framework, which is that people just simply use these bare collection types, probably corresponds to the anemic domain type problem and is solved in the same way by actually defining proper domain objects.
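A named project type along these lines might look like the sketch below. It is not the book's code: `ProjectDemo`, `Project`, `Task`, and the choice to cache the total as tasks are added are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical domain types; not taken from the book.
class ProjectDemo {
    record Task(String name, Duration duration) {}

    // A named type that owns its set of tasks instead of exposing a bare Set<Task>.
    static final class Project {
        private final String name;
        private final Set<Task> tasks = new LinkedHashSet<>();
        private Duration total = Duration.ZERO; // kept in step with the task set

        Project(String name) {
            this.name = name;
        }

        String name() {
            return name;
        }

        void addTask(Task task) {
            // Set.add returns false for a duplicate, so the total is not double-counted.
            if (tasks.add(task)) {
                total = total.plus(task.duration());
            }
        }

        Duration totalDuration() {
            return total;
        }

        Set<Task> tasks() {
            return Set.copyOf(tasks); // unmodifiable snapshot, never the internal set
        }
    }

    public static void main(String[] args) {
        Project project = new Project("launch");
        project.addTask(new Task("design", Duration.ofHours(2)));
        project.addTask(new Task("build", Duration.ofHours(3)));
        System.out.println(project.name() + " " + project.totalDuration());
    }
}
```

A list of projects is then a `List<Project>` rather than a `List<Set<Task>>`, and the logic for maintaining the total lives in one place.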
Stuart Marks: As the maintainer of the Collections Framework, the way I see this is that I get requests from people who want to have features added to the collection implementations. Take something like LinkedHashMap: there's a lot going on in there. It's a map, so it's a mapping from keys to values, but it also maintains ordering. Also, there are some facilities in there so that the order is updated and maintained automatically.
The temptation is for people to say, "Oh, I need a LinkedHashMap, but all I need is a variation of a LinkedHashMap that changes the order automatically when something happens," or "I want to get a notification." The extension of this is "I want a collection, but I want a callback or a notification when something changes."
To me that's a smell, which is it sounds like you're passing around a collection and expecting something to happen when somebody modifies the collection, and you shouldn't do that. You just wrap one of your own objects around the collection and have your object contain the collection. Then you can define exactly the right methods on it that you want, that do exactly what you want. That's the antidote for the anemic domain model.
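For the "collection with a callback" request, wrapping the collection in an object of your own could be sketched like this. `Inbox`, `post`, and the `Consumer` callback are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical wrapper: the list is private, and the only way to add to it goes through post().
class Inbox {
    private final List<String> messages = new ArrayList<>();
    private final Consumer<String> onPost;

    Inbox(Consumer<String> onPost) {
        this.onPost = onPost;
    }

    void post(String message) {
        messages.add(message);
        onPost.accept(message); // the notification lives here, not inside the collection
    }

    List<String> messages() {
        return List.copyOf(messages); // callers get a snapshot, never the internal list
    }

    public static void main(String[] args) {
        Inbox inbox = new Inbox(m -> System.out.println("received: " + m));
        inbox.post("hello");
        System.out.println(inbox.messages());
    }
}
```

Because no caller ever holds the internal list, there is no path that modifies it without triggering the callback.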
Maurice Naftalin: That actually overlaps quite nicely with something that we also discussed at length in the design retrospective chapter, which is how fine-grained, how specialized do you want your individual collection classes to be? The people that are writing to you are, by implication, wanting a really vast framework consisting of lots of very highly specialized collection classes.
One of the issues that we thought about in the book was, "Well, what are you really looking for in a collections framework?" We definitely rejected the kind of framework that that sort of modification would lead to.
Extension Points and the Fragile Base Class Problem
Stuart Marks: Also, there are some facilities in the Collections Framework that facilitate extension and modification. That's another thing we spend some time on in the usage chapter, which is the fragile base class problem. Sometimes people do want a variation on one of the existing collections, and so they reach for the first thing that they're accustomed to, which is subclassing and inheritance. They can easily get themselves into trouble that way.
They override methods until the implementation does what they want. Unfortunately, that exposes them to fragility. If the superclass changes, like when they upgrade the JDK, the behavior of superclasses might change and affect subclasses, and then that will potentially break the program.
The Collections Framework contains a set of abstract classes with "abstract" in the name, and they are intended to be overridden and subclassed. If you want to provide customized behavior, that's the right extension point to do that, and that avoids the fragile base class problem.
Maurice Naftalin: I knew about these abstract classes for a long time, and I also knew about the fragile base class problem. But it wasn't really until shortly before writing the book that I realized quite how well-designed the abstract classes are in the Collections Framework. Josh Bloch, who wrote the Collections Framework and then went on to write Effective Java, explains in Effective Java how important it is that classes intended for extension should be designed specifically to be extended.
If you go and look in the Collections Framework, you see that he really implemented his own advice very capably. They really are very well designed, and it's very easy to extend those and get new collection classes which behave in exactly the way that you want and with exactly the difference that you want from the standard implementations. It's a really nice piece of work, and it's not nearly well-known enough.
Stuart Marks: As a maintainer of the Collections Framework, I do get feature requests, and occasionally I'll look at one and say, "Well, I can understand why somebody wants to do that, but the Collections Framework doesn't need to be enhanced to provide that feature. It's easy to provide it yourself by subclassing AbstractList and writing this method." Then it's easy for programmers to define whatever custom features they want by extending the framework that way.
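As a minimal sketch of that extension point, the hypothetical `RangeList` below subclasses `AbstractList` and implements only `get` and `size`. Everything else (iteration, `contains`, `indexOf`, `equals`, and an `add` that throws `UnsupportedOperationException`) is inherited.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.Objects;

// Hypothetical read-only list of consecutive integers, computed on demand.
class RangeList extends AbstractList<Integer> {
    private final int start;
    private final int size;

    RangeList(int start, int endExclusive) {
        if (endExclusive < start) {
            throw new IllegalArgumentException("endExclusive < start");
        }
        this.start = start;
        this.size = endExclusive - start;
    }

    @Override
    public Integer get(int index) {
        Objects.checkIndex(index, size); // throws IndexOutOfBoundsException when out of range
        return start + index;
    }

    @Override
    public int size() {
        return size;
    }

    public static void main(String[] args) {
        List<Integer> range = new RangeList(3, 6);
        System.out.println(range);             // toString is inherited
        System.out.println(range.contains(4)); // so is contains
    }
}
```

Because the subclass relies only on the documented contract of `AbstractList` rather than overriding methods of a concrete class such as `ArrayList`, it is much less exposed to the fragile base class problem.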
Design Retrospective: The Unsupported Operation Exception Controversy
Stuart Marks: You did mention another one of the new chapters, which is the design retrospective. I think one of the big topics we covered in that was this notion of the lack of read-only or so-called immutable interfaces and the use of UnsupportedOperationException. I think we probably should mention that since that is a fairly big topic.
Maurice Naftalin: We wanted to. It's a big topic because people still keep on arguing about it. Nearly 30 years after the introduction of the collection classes, people are still saying, "Why do we have UnsupportedOperationException?" I think of this as the unsupported operation exception problem, because people hate this.
They say, "Well, an interface ought to say what you're able to do." They cite the interface segregation principle, which is a standard principle from object-oriented design, which says that clients shouldn't have to know more about the capability of a library than what they need. In other words, the interface should be narrow enough that it's useful in its totality to a client.
Of course, UnsupportedOperationException—the idea does indeed break that principle. We had to look fairly carefully at how you might have evaded this problem. There's a significant number of conflicting forces involved in the design of the Collections Framework. We talked about this a little bit. For example, you want to keep it reasonably small. You don't want to have UnsupportedOperationException. You'd like to have the names of the types reflect their function and so on.
It turns out that none of the different alternatives that you might consider is anything like perfect. Indeed, the other frameworks that we looked at, Guava, for example, or the Eclipse framework, made very different design choices. These choices are not bad ones, but none of them solves all of the problems that we're talking about, because some of these are just conflicting problems.
Stuart Marks: That's definitely something that people have complained about over the years. When the original Collections Framework was introduced in JDK 1.2—that would have been in 1998, so it's not quite 30 years, but it's getting up there—it was several years before the first edition was published.
Just to recap for people who aren't totally up on this: we have some collections that are modifiable and some that are unmodifiable, but they use the same interface. If you have a list, then you can add things to it. But if you have a non-modifiable list, you still have the same list interface, and it has an add method. What happens if you call the add method on it? Well, it throws UnsupportedOperationException.
This does on the face of it violate an object-oriented design principle, the interface segregation principle. But I think it raises a bunch of other questions, such as: how do you know whether or not you can modify a particular collection? Shouldn't there be a way to check to see whether you can modify the collection?
I think that actually leads back to something we discussed a bit earlier, which was ownership of collections. You should only be modifying things that you own. If you own it, you know that it's modifiable. So in practice, this is not much of a problem unless your ownership of collections is out of control.
Occasionally people will complain about this, and they'll say things like, "Well, I'm trying to modify a collection, but sometimes it throws UnsupportedOperationException." That raises a whole bunch of deeper questions, which is where did that collection come from? If you got it from somewhere else, maybe you're actually trying to modify somebody else's internal state.
It's unfortunate because the type system is actually not helping you solve this problem. You need to use coding conventions to deal with this.
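The situation recapped above can be shown in a few lines. In this sketch (hypothetical names `SameInterfaceDemo` and `tryAdd`), both arguments have the static type `List<String>`, and only the runtime behaviour differs. It demonstrates the problem and is not a recommended way of probing for modifiability.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class SameInterfaceDemo {
    // The parameter type says nothing about whether add() will work.
    static boolean tryAdd(List<String> list, String element) {
        try {
            list.add(element);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAdd(new ArrayList<>(), "x")); // a modifiable list
        System.out.println(tryAdd(List.of(), "x"));         // an unmodifiable list
    }
}
```

Under the ownership convention, code that created the list knows which kind it has, and code that merely received the list should not be calling `add` on it in the first place.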
Maurice Naftalin: Some people think that coding conventions are kind of inferior, that type constraints ought to tie everything down. But one of the things that we found in writing this part of the book was that type hierarchies are only one way of constraining what collections should be able to do.
Contracts are actually very powerful—what you're calling coding conventions, but what I prefer to call contracts because it sounds better and it's more legal and it's better defined. Of course, they should be properly documented.
An interesting example of the way that constraints can move between a type hierarchy and a contract is to do with sequenced collections. Having a defined order was an idea that was enforced only by contract in many of the collections that became sequenced collections. But when that interface was introduced, the types that implemented it became constrained by the formal type system. Still, the formal type system is only one way of enforcing semantic constraints, and quite often it's not even necessarily the best one.
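A small sketch of what the type-level constraint makes possible: a method that accepts any collection with a defined encounter order, whether a list or a `LinkedHashSet`. It requires Java 21 or later; `SequencedDemo` and `ends` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.SequencedCollection;

// Hypothetical demo class; requires Java 21 or later.
class SequencedDemo {
    // Before Java 21 there was no common supertype offering getFirst() and getLast().
    static <T> String ends(SequencedCollection<T> collection) {
        return collection.getFirst() + ".." + collection.getLast();
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        LinkedHashSet<String> set = new LinkedHashSet<>(list);
        System.out.println(ends(list));
        System.out.println(ends(set));
        System.out.println(ends(set.reversed())); // reversed() gives a reverse-ordered view
    }
}
```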
Stuart Marks: A classic example, which we also cover in the book, is immutability. It's very easy to say "I have this type and it's immutable," but at least in Java, there's really nothing preventing somebody from making a subclass and adding a mutator method and making the thing mutable.
That's not really something that you can enforce in a type hierarchy unless you add additional constraints on it, such as making the class final or making a sealed interface, which are actually—well, it's always been possible to make a class final, but the notion of a sealed interface is something that's actually relatively recent. I don't remember exactly when that went in.
But it used to be the case that the instance creation of interfaces was out of the control of the library authors, and anybody could implement an interface. But that's not true anymore. On the other hand, if you seal an interface, that also means that you're shutting off third-party extensions. There's a trade-off there. In many cases, we do want frameworks such as the Collections Framework to be extensible by third parties.
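To make the point concrete, here is a hedged sketch with hypothetical types: an apparently immutable class that a subclass makes mutable, next to a sealed interface whose implementations are closed off. Sealed types became a standard feature in Java 17.

```java
// Hypothetical demo class; requires Java 17 or later.
class ImmutabilityDemo {
    // Looks immutable, but is neither final nor sealed.
    static class Point {
        private final int x;

        Point(int x) {
            this.x = x;
        }

        int x() {
            return x;
        }
    }

    // A subclass can add mutable state and change what x() reports.
    static class SneakyPoint extends Point {
        int offset;

        SneakyPoint(int x) {
            super(x);
        }

        @Override
        int x() {
            return super.x() + offset;
        }
    }

    // Only the permitted types may implement Shape; records are implicitly final.
    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    public static void main(String[] args) {
        SneakyPoint sneaky = new SneakyPoint(1);
        Point point = sneaky; // clients see only the "immutable" type
        System.out.println(point.x());
        sneaky.offset = 5;
        System.out.println(point.x()); // the observed value has changed
    }
}
```

The cost is the one noted above: nobody outside the `permits` list can add another `Shape`, so sealing trades third-party extension for control.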
Stuart Marks: Let's see. There's one small issue we can tackle before we wrap this up, which is the idea of null handling in collections.
The Null Handling Mess
Maurice Naftalin: Oh no, that is too painful. This is perhaps not the most enjoyable topic to write about. Well, I mean, really, I should be used to it. You're like the Collections Framework. Of all the things, of all the design issues that we talked about, I think we go into some detail on the issue of the type hierarchy and the ways of constraining collection behavior. I think it's possible to make a reasonably good case to defend the design decisions that were made in that area.
However, when it comes to null, I think you would have to be an extremely good lawyer to defend the different, the inconsistent treatment of null in the Collections Framework. It's really not its proudest aspect.
Stuart Marks: It's quite far from its proudest moment. Just a brief history: the so-called legacy collections, Vector and Hashtable, which came in JDK 1.0, disallowed nulls. Then the main Collections Framework introduced in 1.2 allowed nulls in most instances—HashMap allows nulls. Then the concurrent collections in Java 5 disallowed nulls.
We've more or less kept with that. But the fact is that we now have multiple implementations of the same interfaces that treat nulls differently, and the contradictions and inconsistencies in the specs are rife, too many to list here. As you said, it's not the proudest moment of the collection design. But this is one of the messes that has built up over the past 25-plus years.
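The inconsistency shows up even among implementations of the same `Map` interface. The sketch below (hypothetical names `NullHandlingDemo` and `acceptsNullValue`) tries to store a null value in three of the maps mentioned above.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; the names are illustrative only.
class NullHandlingDemo {
    // Same interface, same call; whether it succeeds depends on the implementation.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap: " + acceptsNullValue(new HashMap<>()));
        System.out.println("Hashtable: " + acceptsNullValue(new Hashtable<>()));
        System.out.println("ConcurrentHashMap: " + acceptsNullValue(new ConcurrentHashMap<>()));
    }
}
```

`HashMap` stores the null, while `Hashtable` and `ConcurrentHashMap` throw `NullPointerException`.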
A Lasting Achievement
Maurice Naftalin: I think we should try to end this conversation on a more cheerful note than the treatment of nulls in the Collections Framework. Looking at the design retrospective on the whole, I think that the decisions that were made—I mean, on the whole, the introduction, the design of the Collections Framework, which was done in one go, really, by Josh Bloch—is pretty remarkable. It was a considerable achievement, and it stood up very well for nearly 30 years.
So I think we can say—I think we should look at the good points of this. I'd like to say about the book that I think we've been able to document that in a lot more detail than is usually done and shine some light into some not well-understood corners of the framework that could be pretty valuable for the development community in the future.
I feel like the book overall is a worthwhile piece of communication with the community about both generics and collections, looking at them in the light of a lot of experience since they were introduced. So I feel like you should buy the book. We've talked about some of the topics here, only a few of the topics in very broad outline. There's a lot of detail on all of those. I would recommend if this is of any interest to you, you should take a look at it.
Stuart Marks: You're quite right on both of those points. The Collections API has held up quite well over multiple decades. When I look at other APIs which have come and gone over the years, I think people still make effective use of the Collections Framework today. There are still ways we can continue to evolve it so that it stays useful. Yes, we cover a lot of additional details in the book, so I recommend you go out and find the book wherever popular books are sold.
Maurice Naftalin: Well, all right. Thanks very much for the conversation, and thanks, obviously, for all of your work on the book as well, Stuart.
Stuart Marks: You're the author, and you put the lion's share of the effort into it. Congratulations on delivering the second edition of this book after so many years and so many updates.
Maurice Naftalin: Thanks very much. Thank you.
Stuart Marks: Bye bye.
About the speakers
Stuart Marks (interviewer)
Consulting Member Of Technical Staff
Maurice Naftalin (author)
Java Champion & Author