What's wrong with this picture?
class PageStoreDB {
Page getPage(int pageId) throws PageNotFound, IOException {//...
boolean deletePage(int pageId) throws IOException {//...
boolean pageExists(int pageId) throws IOException {//...
//etc...
}There might well be more than I can see, even in this simple snippet, but I can see at least two problems--one pretty obvious and one maybe not so obvious. Obvious (assuming we're in Java): PageStoreDB should be an interface, to make it easier to mock for tests, swap out implementations, etc. Less obvious: I don't think pageId should be an int.
Almost everyone makes it an int, of course. I've done it plenty of times myself, because it's natural--usually you're using an id to read from a database, and the id in the database is typically an int. As programmers we also tend to think of ints as "primitive"--about as simple as you can get. I'd like to suggest that this isn't so. What if our snippet looked like this?
class PageID {
PageID invertSign(PageID id) {//...
PageID add(PageID id) {//...
PageID subtract(PageID id) {//...
PageID multiply(PageID id) {//...
PageID divide(PageID id) {//...
PageID leftShift(PageID id) {//...
PageID arithmeticRightShift(PageID id) {//...
PageID logicalRightShift(PageID id) {//...
PageID greaterThan(PageID id) {//...
PageID lessThan(PageID id) {//...
PageID equalTo(PageID id) {//...
PageID notEqualTo(PageID id) {//...
PageID lessThanOrEqualTo(PageID id) {//...
//...and much, much more...
}
class PageStoreDB {
Page getPage(PageID pageId) throws PageNotFound,
IOException {//...
boolean deletePage(PageID pageId) throws IOException {//...
boolean pageExists(PageID pageId) throws IOException {//...
//etc...
}
You'd probably think the programmer who created this was borderline insane. About the only methods in PageID that you'll actually use are equalTo and notEqualTo. The rest are not just superfluous, they're distracting and potentially dangerous. Do you really want to be able to logically right shift an identifier by accident? But this is exactly the interface you're getting when you use an int for an identifier. What you're really looking for in an identifier from the client side is something more like this:
abstract class PageID {
abstract boolean equals(Object o);
}
The rest of the implementation details, mapping this object representation to the id field in the database, are exactly that--implementation details--and should be handled in implementing classes in the PageStoreDB's purview.
So, you say, is that it? I have to write at least two classes to handle the simple identification of a Page, a job I could do with no work using an int, and all I get in return is that I can't accidentally right shift my id? I still think you could argue that was enough to justify the work, but I think we actually get more.
Let's say we're using ints for our ids and, somewhere down the road, we have to start merging pages in the page store--that is, we want to essentially forward one id to another id. With the int approach, we have some problems. Just testing id1 == id2 doesn't really work any more. You have to start writing methods like PageStoreDB.idForwardsTo, and remembering to call them when you want to compare ids. If we're using the class as an id, however, it's pretty easy to handle. We have each object keep a list of forwards, and consult these in the equals method. This lets us maintain a nice invariant of our system--that all ids that identify the same page test as equal to each other--without altering client code, because we encapsulated early on.
We get a similar benefit if we stop identifying pages with ints in the database, perhaps because generating them in sequential fashion is causing a bottleneck, and start using uuids instead. Very little code needs to be changed, and all of it is in the provider class, where it belongs, instead of in the client code.
There are a host more benefits as well: logging, checking for goodness, control over creation. If and as these become needed, they can all be done in one place.
I'm coming to believe that the basic principle here applies pretty widely: namely, that you shouldn't use "primitive" types to represent just about anything in your system. Use a URL class or an AnchorText class instead of a raw string, for instance, or an enum with custom methods for a set of bit fields. It doesn't take much extra work once you get into the habit, and the aggregate work you save yourself down the road is usually more than worth it.


Dharmesh Shah 1:00 PM on September 16, 2010
This is a simple, but great point.
In my experience, it's seemingly small design decisions that make a big, big difference in the long-term in term.
What's really great about this types of "smarter" choices is that the effort required is modest to it the right way (the big O is constant), but the payback is continuous. Just my kind of investments.
I contrast this to other decisions that though do provide for long-term flexibility come at a high short-term cost. I'm not a fan of large "excess inventory" (in terms of lines of code).
Thanks for the article.
Owen Raccuglia 12:49 AM on September 18, 2010
This is cool. I'll be watching my code to see where this would help me.
Philip Jacob 7:48 AM on September 30, 2010
We model our IDs using objects at StyleFeeder as well. It makes tracing their use through the codebase easier than Strings or primitives. And it also helps us do some validation because we know our IDs are one of three lengths (legacy reasons), so we can find out pretty quickly if something's not going to work out.
Scott Faria 11:31 AM on January 27, 2011
Echoing the "simple, but great point" feeling. This kind of pattern is really common and lots of people get it wrong by passing around primitives as identifier. An identifier should never be something mutable or ambiguous.
A small nitpick but it actually quite important: Any time you override equals you need to think about overriding hashCode as well. Your PageID subclasses are going to be causing lots of collisions if they find their way into a HashSet, HashMap, etc.