Design Pattern? nested data types

0rthodontist · Aug 2, 2006

Last Spring our team had a problem designing one part of our plugin (Java, but what's relevant is that it is OOP). We had to store a lot of users, who are organized into sessions. The sessions are in turn organized into courses, and all the courses have to be stored somewhere also. The solution we used, which was mostly satisfactory, was:
1. The singleton UserList contains a flat list of all users, and was originally intended to be the single point of access to user operations
2. The singleton CourseList contains a list of all Courses
3. Each Course contains a list of Sessions for that course
4. Each Session contains a list of users in that session

The problem is that there are operations that need to be done at various levels in this hierarchy. For example, if you log a user out of an individual Session, this could be done at the Course level, since the Session the user was in might become empty and have to be removed. But it could also be done at the UserList level, which originally was going to be an abstraction for the whole framework. The laziest thing to do is make the operation available at the lowest level of the hierarchy that it could be, while still allowing for all the necessary operations. But it does not seem right that the user of the framework should be required to guess the correct level. For example, they might try u.getCourse().removeUserFromSession(); or they might try u.getSession().removeUserFromSession(); or they might try CourseList.getInstance().removeUserFromSession(u); - if they hadn't designed it, it wouldn't be obvious which was the right choice. This looks like a problem of insufficient data hiding. On the other hand, a flat design with everything at the UserList level can too quickly become bloated--had we flattened the design, UserList would have had about 30 methods in it. Also, a top-level UserList.getInstance().removeUserFromSession(u) would have to in turn call a protected method like u.getCourse().removeUserFromSession(), so it is sort of redundant. What we settled on doing was basically a haphazard setup, with methods available wherever we happened to think of when writing them originally, and anyone who wanted to use the framework had to be moderately familiar with it.

Is there a formal design pattern which handles this situation? The one that sounded the most promising was "chain of responsibility" but that turned out not to apply very closely.

chroot · Aug 2, 2006

You don't need a pattern. You need an enclosing class.

The reason this is confusing is that you're exposing too much of the internal mechanism of how you're storing the data to the potential users of that data. Users don't give a damn how you store the data -- they only care about the operations they wish to make on the data.

You just need one public class, perhaps called CourseManager, which presents a simple, coherent interface to its users. CourseManager would have overloaded methods like removeUser(User, Session) and removeUser(User, Course).

The user of CourseManager does not need to know how you've chosen to store the lists and their associations internally.
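A minimal sketch of what such an enclosing class could look like. Only the name CourseManager and the two removeUser overloads come from the suggestion above; every other name (addCourse, addSession, getCourse, size, the String ids) is a hypothetical placeholder, and the real plugin's types would differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class User {
    private final String name;
    User(String name) { this.name = name; }
    String getName() { return name; }
}

class CourseManager {
    // Callers may hold Session and Course handles and call their accessors,
    // but the lists inside them are private to CourseManager.
    static final class Session {
        private final String id;
        private final List<User> users = new ArrayList<User>();
        private Session(String id) { this.id = id; }
        String getID() { return id; }
        int size() { return users.size(); }
    }

    static final class Course {
        private final String id;
        private final Map<String, Session> sessions = new LinkedHashMap<String, Session>();
        private Course(String id) { this.id = id; }
        String getID() { return id; }
    }

    private final Map<String, Course> courses = new LinkedHashMap<String, Course>();

    Course addCourse(String courseId) {
        Course c = new Course(courseId);
        courses.put(courseId, c);
        return c;
    }

    // Returns null if no course has that id.
    Course getCourse(String courseId) { return courses.get(courseId); }

    Session addSession(Course course, String sessionId) {
        Session s = new Session(sessionId);
        course.sessions.put(sessionId, s);
        return s;
    }

    void addUser(User user, Session session) { session.users.add(user); }

    // Remove the user from one specific session.
    boolean removeUser(User user, Session session) {
        return session.users.remove(user);
    }

    // Remove the user from every session of the course that contains them.
    boolean removeUser(User user, Course course) {
        boolean removed = false;
        for (Session s : course.sessions.values()) {
            removed |= removeUser(user, s);
        }
        return removed;
    }
}

class FacadeDemo {
    public static void main(String[] args) {
        CourseManager m = new CourseManager();
        CourseManager.Course cs101 = m.addCourse("CS101");
        CourseManager.Session s = m.addSession(cs101, "2");
        User warren = new User("Warren");
        m.addUser(warren, s);
        System.out.println(m.removeUser(warren, cs101));
    }
}
```

The caller picks an overload by what it happens to have in hand (a Session or a Course), and never has to guess which level of the hierarchy owns the operation.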

- Warren

0rthodontist · Aug 2, 2006

Well, that was what UserList was originally intended to be. However, the flat design you suggest did not work very well for our database manager, which was about 40 undifferentiated functions and a huge mess to wade through.

I think a hierarchy is useful for someone trying to use the framework, but it has to be made intuitive--and the simplest place to put a method often is an unintuitive place for the user. Someone wanting to remove a user from a session would probably first think of the session for that. But the Session is an inconvenient place to have the removal method, and putting it there would only serve an aesthetic purpose, with the real work still being done in the enclosing Course.

chroot · Aug 2, 2006

I'm not suggesting a flat design. I'm suggesting that you abstract the details of your data structure (lists of lists of objects) from the actions users will want to make upon the data.

I hardly see 40 different atomic operations here -- I see more like 5 or 10, with strictly defined behavior. The user might have to perform two or three atomic operations to complete a given task, and that's okay.

Besides, a name like "UserList" is a horrible name for a module that has any kind of intelligence.

- Warren

0rthodontist · Aug 2, 2006

Well, if the user has to do more than one atomic operation to get a single thing done, then, if they aren't familiar with the framework, they have the power to screw things up and perform nonsensical operations.

It really would be about 30 methods. We also have to deal with Slides and Scribbles, which though not strictly part of the user framework, do have to be associated with the sessions. For example, the following methods are public for a Course in an old revision of our program (I don't have our final program with me at the moment):
Course(String, int)
addSyncsession(String, UserConnection)
getDefaultSession()
getID()
getName()
getSession(int)
getSessions()
getSynchronousSessions()
registerUser(String, int, boolean)
removeUserFromSession(UserConnection)
setName(String)

chroot · Aug 2, 2006

Okay, why are you showing me methods from Course? The user ideally won't have access to Course's methods. The user would ideally have to go through the CourseManager to manipulate the data structure, rather than mucking with elements of the data structure directly. (Though it would make sense for the accessor methods like getName() to remain public.)

Perhaps you're "dumbing down" your situation to make it understandable on a forum, but I see no reason at all why you cannot (or should not) abstract out the atomic actions the user will make on the entire data structure, then make the data structure itself private.

- Warren

0rthodontist · Aug 2, 2006

It's just that there is a cumbersome number of methods. If everything is controlled through one class, it would be like the database manager. Being able to deal with individual courses when it's intuitive to do seems better than going through one big interface to everything. What kind of atomic operations are you suggesting?

chroot · Aug 2, 2006

You've already seen the problems inherent in exposing what is essentially just a data structure to the outside world. It isn't "intuitive" for the methods that act upon that data structure to be spread throughout all the levels of the data structure.

I think you've gotten a case of "objectitis" -- a common condition in which programmers try to make everything an abstract object, even when it doesn't make much sense. Just because you're writing in an object-oriented language doesn't necessarily mean every object has to have a public API.

The atomic operations I'm talking about are:

- Finding a course.
- Finding a session.
- Adding a user.
- Removing a user.

Some pseudocode might look like the following:

Code:

// Add a new user to session 2 of course CS101
Session s = courseManager.getSession("CS101", "2");
courseManager.addUser(new User("Warren"), s);

Code:

// Print out all the sessions of course CS101
Enumeration e = courseManager.enumerateSessions("CS101");
while (e.hasMoreElements())
{
    Session s = (Session) e.nextElement();
    System.out.println(s.toString());
}

etc.

- Warren

0rthodontist · Aug 2, 2006

There are many more methods than that. A user can also be removed from a system, removed from a course but still in the system, added to a course, added as the leader of a new session, moved to an existing session, registered with a course, and registered with the system, in addition to all of the other methods that sessions and courses have that do not directly move a user around. Where do they all go?

I think a case could be made for putting the "ambiguous" methods, like removing a user from a session, into a single flat class, and retaining the others, like getting the course ID, with their respective obvious objects. But this is only good if you have a clear way to characterise ambiguously-located methods.

Also, this is a networked application that's supposed to be able to handle a lot of different connections, so breaking operations like moving a user into "remove user" and "add user" could cause synchronization problems.

chroot · Aug 2, 2006

0rthodontist said:

There are many more methods than that. A user can also be removed from a system, removed from a course but still in the system, added to a course, added as the leader of a new session, moved to an existing session, registered with a course, and registered with the system, in addition to all of the other methods that sessions and courses have that do not directly move a user around. Where do they all go?

In my opinion, they all go into the CourseManager, broken up into a small number of atomic, easily-verified operations.

I'm not saying that determining the appropriate atomic operations is easy, but it's certainly no more difficult than designing the painful interface you originally proposed. At most, I'd say there are a half dozen atomic operations here, and any complex operations ("Add a new leader to a session") could be easily broken up into several atoms executed in sequence ("Create a new user, add the new user to the system, find a session, add the user to the session, promote the user to leader status in that session").

The best way to handle an unwieldy collection of interrelated methods is to reduce them to a small set of distinct atomic methods, and let the user call several in sequence.

Of course, this is just my opinion.

0rthodontist said:

I think a case could be made for putting the "ambiguous" methods, like removing a user from a session, into a single flat class, and retaining the others, like getting the course ID, with their respective obvious objects. But this is only good if you have a clear way to characterise ambiguously-located methods.

Again, I think a better approach is to provide a handful of atomic operations, and put them all into one place. I feel like I keep repeating myself, so you must not like that solution.

0rthodontist said:

Also, this is a networked application that's supposed to be able to handle a lot of different connections, so breaking operations like moving a user into "remove user" and "add user" could cause synchronization problems.

This sounds like a perfect opportunity to implement a mutex operation. Whenever a user needs to modify the structure with a sequence of atomic operations, she just locks the structure first, granting her temporary exclusive access to the structure. You're going to have to implement exclusivity anyway, if you intend the structure to be networked and thus necessarily thread-safe.
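A rough sketch of that locking idea, assuming Java 5's java.util.concurrent.locks.ReentrantLock. The lock()/unlock() methods, newSession(), contains() and the use of plain String user names are all hypothetical; the thread does not say how the plugin would expose its lock. Because the lock is reentrant, each atom can take it on its own and a caller can also hold it across a whole sequence.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

class CourseManager {
    static final class Session {
        private final Set<String> users = new HashSet<String>();
        private Session() { }
    }

    // One mutex guards the whole structure (an assumption of this sketch).
    private final ReentrantLock mutex = new ReentrantLock();

    // Callers bracket a sequence of atoms with lock()/unlock().
    void lock() { mutex.lock(); }
    void unlock() { mutex.unlock(); }

    Session newSession() { return new Session(); }

    // Each atom also locks on its own, so it is safe when called alone.
    boolean addUser(String user, Session s) {
        mutex.lock();
        try { return s.users.add(user); } finally { mutex.unlock(); }
    }

    boolean removeUser(String user, Session s) {
        mutex.lock();
        try { return s.users.remove(user); } finally { mutex.unlock(); }
    }

    boolean contains(String user, Session s) {
        mutex.lock();
        try { return s.users.contains(user); } finally { mutex.unlock(); }
    }
}

class MoveDemo {
    // A "move" built by the caller from two atoms under a single lock, so no
    // other thread can observe the user in neither session.
    static boolean move(CourseManager m, String user,
                        CourseManager.Session from, CourseManager.Session to) {
        m.lock();
        try {
            if (!m.removeUser(user, from)) {
                return false; // user was not in the source session
            }
            m.addUser(user, to);
            return true;
        } finally {
            m.unlock();
        }
    }

    public static void main(String[] args) {
        CourseManager m = new CourseManager();
        CourseManager.Session a = m.newSession();
        CourseManager.Session b = m.newSession();
        m.addUser("ann", a);
        System.out.println(move(m, "ann", a, b));
    }
}
```

Declaring each method synchronized protects the individual atoms in the same way, but it does not by itself make a remove-then-add sequence atomic; the caller still needs to hold one lock across both calls.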

- Warren

0rthodontist · Aug 2, 2006

chroot said:

Again, I think a better approach is to provide a handful of atomic operations, and put them all into one place. I feel like I keep repeating myself, so you must not like that solution.

Well, such a handful would not be small. You wouldn't want to put getting the session ID and the course ID in the same place, even if you do want to do so with user movements.

chroot said:

This sounds like a perfect opportunity to implement a mutex operation. Whenever a user needs to modify the structure with a sequence of atomic operations, she just locks the structure first, granting her temporary exclusive access to the structure. You're going to have to implement exclusivity anyway, if you intend the structure to be networked and thus necessarily thread-safe.

OK, we just declared all of the methods synchronized, but I guess you could use the other way and that would work.

Still, I have serious doubt that this can be reduced to a few atomic operations. In addition to those under Course, here are the public methods under Session (in the old version):
addSlide(Slide)
getCurrentSlideID()
getID()
getLeader()
getName()
getSlide(int)
getUsers()
isDefault()
isEmpty()
logSession()
setCurrentSlideID(String)
setLeader(UserConnection)
setName(String)
writeToSession(Response)
writeToSession(Response, UserConnection)

those under UserList:
getInstance()
addConnection(UserConnection)
addUser(UserConnection, int)
getUser(String)
moveUser(Session, UserConnection)
newSession(UserConnection, String)
removeUser(UserConnection)

and those under CourseList:
CourseList()
addCourse(Course)
addCourse(String, int)
getCourse(int)
getCoursebyIndex(int)
removeCourse(int)
size()

Just for convenience, the ones under Course are:

0rthodontist said:

Course(String, int)
addSyncsession(String, UserConnection)
getDefaultSession()
getID()
getName()
getSession(int)
getSessions()
getSynchronousSessions()
registerUser(String, int, boolean)
removeUserFromSession(UserConnection)
setName(String)

I don't think all these can be condensed into any single class of reasonable size, even if you do make some of the atomics smaller.

One other thing I was thinking about is that maybe moving a user around should be handled directly by the user class, which would have privileges to make all the changes to other classes in the user framework by itself. The user class actually was mostly left out of the user framework because it was made earlier as part of another package, as the UserConnection, before the need to organize users by course and session was fully understood.

chroot · Aug 2, 2006

Let's see here... the accessor methods can remain under their subtype classes... in other words, Session will still have its own getID(), getLeader(), and similar methods. I don't think it'd be a problem to leave setID(), setLeader(), and the other setters there, too.

The only methods that would go into CourseManager are those that actually modify the data structure as a whole. So now, from your list, we're down to a handful:

addSlide(Slide)
addCourse(Course)
removeCourse(int)
addConnection(UserConnection)
newSession(UserConnection, String)
addUser(UserConnection, int)
removeUser(UserConnection)

Basically, you just wind up with a triplet of get, add, and remove atoms for each type of data -- Courses, Users, Sessions, etc. They're all rigidly defined, easy to verify and debug. Moving data from one part of the tree to another becomes a sequence of atoms: remove followed by add.

Each time you add an element to the data structure, you provide the location to which you want to add it. The various methods for searching the data structure (getCourse(), getUser(), getSession(), etc.) can be used to find parts of the structure. Then, when the user adds new data, she provides the location where that data should be added in terms of an element she got from getCourse(), getSession(), etc.

I still see a half dozen atoms, in total.
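To make the get/add/remove idea concrete, here is a small sketch under simplifying assumptions: courses and sessions are keyed by String ids (as in the earlier pseudocode), users are plain Strings rather than UserConnection objects, and the method names are illustrative rather than the plugin's real API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CourseManager {
    static final class Session {
        private final Set<String> users = new HashSet<String>();
        private Session() { }
        boolean contains(String user) { return users.contains(user); }
    }

    // courseId -> (sessionId -> Session); the nesting stays private.
    private final Map<String, Map<String, Session>> courses =
            new HashMap<String, Map<String, Session>>();

    // "add" atoms
    void addCourse(String courseId) {
        if (!courses.containsKey(courseId)) {
            courses.put(courseId, new HashMap<String, Session>());
        }
    }

    Session addSession(String courseId, String sessionId) {
        Map<String, Session> sessions = courses.get(courseId);
        if (sessions == null) {
            throw new IllegalArgumentException("unknown course: " + courseId);
        }
        Session s = new Session();
        sessions.put(sessionId, s);
        return s;
    }

    // "get" atom: all searching lives here; returns null if nothing matches.
    Session getSession(String courseId, String sessionId) {
        Map<String, Session> sessions = courses.get(courseId);
        return sessions == null ? null : sessions.get(sessionId);
    }

    // The location is always a handle obtained from a "get" atom.
    boolean addUser(String user, Session where) { return where.users.add(user); }

    // "remove" atom
    boolean removeUser(String user, Session where) { return where.users.remove(user); }
}

class AtomsDemo {
    public static void main(String[] args) {
        CourseManager m = new CourseManager();
        m.addCourse("CS101");
        m.addSession("CS101", "1");
        m.addSession("CS101", "2");
        m.addUser("Warren", m.getSession("CS101", "1"));

        // A move is just two atoms in sequence: remove, then add.
        CourseManager.Session from = m.getSession("CS101", "1");
        CourseManager.Session to = m.getSession("CS101", "2");
        if (m.removeUser("Warren", from)) {
            m.addUser("Warren", to);
        }
        System.out.println(to.contains("Warren"));
    }
}
```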

- Warren

0rthodontist · Aug 2, 2006

Well, there are definitely 3 more that fit your description of modifying the structure, namely
addSyncSession()
registerUser(String, int, boolean)
removeUserFromSession(UserConnection)
Also, some of the operations are a little more involved than just changing the data structure. For example, removeCourse will also boot everyone logged into that course, and newSession has to appoint the user as the session leader.

Overall though, 10 basic methods plus a couple others (getInstance, maybe getUser) is very reasonable for an access class. But all the other methods (like the various getters) would have to remain in place in their respective classes.

Let's say that instead of the classes I have, I had twenty other kinds of containers that have their own add and remove operations. Then it probably wouldn't be practical to have a single class to do all data alteration.

chroot · Aug 2, 2006

I think a few of those methods you listed, like "removeUserFromSession" would be better implemented as a sequence of atomic operations -- get user, get session, remove user from session.

Some of the atomic methods, like removeCourse(), do indeed make non-trivial changes to the data structure. This is why I suggest keeping the methods as atomic as possible, so they can be easily verified.

removeCourse() shouldn't include code to actually traverse the data structure to find a course -- that should be in getCourse() (or findCourse() or whatever you want to call it).

I agree that all the get and set functions that pertain to a specific piece of data (User, Course, Session) should remain in their respective classes. The only methods you need in CourseManager are those that manipulate the structure of the tree, not just the content of its leaves.

If you really had twenty additional kinds of data to store, I'd suggest exploding the entire thing into several distinct trees, and using some kind of relational strategy to associate them. Of course, at that point, you're effectively designing a database, and I would probably just dust off a copy of MySQL.

- Warren

0rthodontist · Aug 2, 2006

No, removeUserFromSession can't be implemented as any sequence of atomic operations you've mentioned. It actually is one of the more complex operations, since it must close the session if the session is empty, or if the session is not empty and the user is the leader, it must pass leadership to another user.
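Roughly, the rules just described might look like the sketch below. This is not the plugin's actual code: users are plain Strings standing in for UserConnection, the default session is not modeled, and the choice of successor (the longest-standing remaining member) is an assumed policy, since all that is required is that leadership passes to another user.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class Session {
    private final List<String> users = new ArrayList<String>();
    private String leader; // null once the session is empty

    Session(String leader) {
        this.leader = leader;
        users.add(leader);
    }

    void add(String user) { users.add(user); }
    String getLeader() { return leader; }
    boolean isEmpty() { return users.isEmpty(); }

    // Returns true if the user was in this session.
    boolean remove(String user) {
        if (!users.remove(user)) {
            return false;
        }
        if (users.isEmpty()) {
            leader = null;
        } else if (user.equals(leader)) {
            leader = users.get(0); // assumed policy: oldest remaining member
        }
        return true;
    }
}

class Course {
    private final List<Session> sessions = new ArrayList<Session>();

    Session newSession(String leader) {
        Session s = new Session(leader);
        sessions.add(s);
        return s;
    }

    int sessionCount() { return sessions.size(); }

    // The Course owns this operation because only it can close a session.
    boolean removeUserFromSession(String user) {
        for (Iterator<Session> it = sessions.iterator(); it.hasNext();) {
            Session s = it.next();
            if (s.remove(user)) {
                if (s.isEmpty()) {
                    it.remove(); // close the now-empty session
                }
                return true;
            }
        }
        return false;
    }
}
```

Note that Session.remove is still reachable by callers here, so someone could bypass the Course and leave an empty session behind, which is exactly the ambiguity under discussion.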

Using a database definitely is an option, but within OOP I think that you could standardize the location of the operations. If the only things to worry about are "remove" and "add" operations, then you can naturally locate them in the object that contains those things being removed or added. If that object might have to change the state of other objects in the hierarchy, then it should have that privilege even though it might be lower in the hierarchy than they are.
