Data hiding considered harmful?

0rthodontist · Oct 28, 2006

I've been thinking about functional programming and what really makes it different from imperative programming. It's not that you can prove things more easily about programs written in it--that may be true, but most programmers do not prove the correctness of most of the code they write so it can't be the central feature. I think that the difference is that data is "up front" and easily visible as the arguments to a function or the return value of the function. This makes it easier to understand. I recently had the misfortune of trying to figure out how a disorganized Python program works. This program had a central loop that just called a series of functions with no arguments. The functions depended entirely on side effects, setting member variables instead of returning values (and worse, in Python you don't have to declare your member variables before you use them; they were getting declared within the functions, not in the initializer). It was very difficult to read--even determining the type of a lot of things was a challenge, since they were being read from external data and Python doesn't have an explicit type system.
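To make the contrast concrete, here is a minimal sketch in Java. The class names, fields, and numbers are invented for illustration and are not taken from the Python program described above. The first class passes everything through fields; the second passes the same data through arguments and return values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example: data flows through fields, so call sites reveal nothing.
class HiddenFlow {
    private List<Integer> readings; // first assigned inside load(), not in a constructor
    private double average;         // first assigned inside summarize()

    void load() {
        readings = Arrays.asList(3, 5, 10);
    }

    void summarize() {
        int sum = 0;
        for (int r : readings) sum += r;
        average = (double) sum / readings.size();
    }

    double run() {
        load();      // which fields does this touch? The call does not say.
        summarize(); // silently depends on load() having run first
        return average;
    }
}

// Same computation, but every input and output is visible at the call site.
class VisibleFlow {
    static List<Integer> load() {
        return Arrays.asList(3, 5, 10);
    }

    static double summarize(List<Integer> readings) {
        int sum = 0;
        for (int r : readings) sum += r;
        return (double) sum / readings.size();
    }

    static double run() {
        List<Integer> readings = load();
        return summarize(readings);
    }
}
```

Both versions compute the same average; the difference is only in how much a reader can learn from the calls inside run().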

Data hiding is right in one thing: it's bad to allow outside code to directly alter your class data, and perhaps also bad to allow outside code to directly access your class data. But I believe code is clearer to read when all the data it handles, and the type of that data, are immediately in view of the programmer reading it. That is, a distinction should be made between hiding data from another part of the program, which is good, and hiding data from the programmer, which is not good. If a function alters any global or member variables or performs any I/O, it should say so in its declaration--and what exactly it does to that data should be documented, not abstracted away. A while ago I was trying to determine how HashMap worked and why it didn't seem to be efficient. The abstract description contained no clue, and it was only by going past the abstraction, into the source code for the class, that I figured out what it was really doing.

I want a programming language that has these properties:
--object oriented
--strongly typed
--all inputs to a function must be declared along with their type
--the return value of the function must be declared along with its type
--if any global or member variables that persist outside the function are altered or accessed within a function, these must be declared in the function declaration
--if any IO is used in a function, this is basically the same as altering global variables and must be declared
--the documentation guidelines ask that for each non-trivial function, the following should be described:
1. What its inputs are
2. What its output means
3. What (global data, member vars, input) it reads data from
4. What (global data, member vars, output) it writes data to and what its output to those places means
5. What exceptions it may cause, and when
--the documentation is tiered, so that it can be set to various levels with lower levels of documentation hidden. 1 and 2 are the bare essentials--if there is any documentation shown then 1 and 2 should be shown. Also 5 should be shown. If more information is needed, 3 and 4 should be shown, with I/O and global data at a higher level of visibility than member vars.
--Optimally, the type signature for a class, describing all of that besides the manual documentation, should have the option to be generated automatically to save typing.

As Frederick P. Brooks said in The Mythical Man Month:
"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."

haki · Oct 28, 2006

What is your point?

o) Python code can be a mess, I completely agree.

o) Data hiding: in OOP that is called encapsulation, and in OOP there is no "global" data.

HashMap has a specific purpose; I don't see why you should look into the implementation.

Maybe you just use a bad library.

Well, there are some OOP-style functional languages, such as O'Haskell and OCaml, but these languages look to me like they are made for some kind of research and not for making commercial-grade applications.

From my experience Java is just great if you do Java style Java programming and not Pascal or C style Java.

0rthodontist · Oct 28, 2006

haki said:

What is your point?

I wasn't clear?
OK, here it is again:

Hiding data is good practice to prevent other code from unknowingly messing with your classes, but it is not good to hide data from programmers reading your code. All of the flow of data within a program should be documented, including I/O as well as member variables, global data, and the usual parameters/return values of functions. The language should enforce this documentation by a strict type system and by requiring function declarations to include this information, and in addition it should promote this documentation by officially supporting manual documentation practices that document all data flow.

o) Python code can be a mess, I completely agree.

o) Data hiding: in OOP that is called encapsulation, and in OOP there is no "global" data.

Encapsulation is a bit different from data hiding. You can hide data without encapsulating things into objects, and you can encapsulate things into objects without hiding any data.

Member variables are basically equivalent to global data, as they can be used and altered by many functions. So is, e.g., the screen state or the keyboard input, other shared devices or the file system. Also, I may be mistaken since it's been years since I used C++ but I think that C++ allows global variables in addition to member variables.

HashMap has a specific purpose; I don't see why you should look into the implementation.
Maybe you just use a bad library.

The problem is that there is nothing in the documentation stating that hasValue() has a bad runtime, and there are ways to implement a HashMap with a good runtime for hasValue(). Only by looking at the actual representation of the member data for HashMap is it possible to see that hasValue() must be linear. If you want to call Java 5.0 a bad library, I can only agree in this case.
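As a rough sketch of what "a good runtime" for a value lookup could look like: the wrapper below keeps a second hash table counting how many keys map to each value, so the value check becomes an expected constant-time lookup instead of a scan. The class ValueIndexedMap is hypothetical and not part of java.util; the method is named containsValue here because that is the name java.util.HashMap uses. The trade-off is extra memory and extra work on every put and remove.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper, not part of java.util: tracks how many keys map to
// each value so that containsValue is a hash lookup rather than a scan.
class ValueIndexedMap<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Map<V, Integer> valueCounts = new HashMap<>();

    public V put(K key, V value) {
        boolean hadKey = entries.containsKey(key);
        V old = entries.put(key, value);
        if (hadKey) {
            decrement(old); // the old value loses one referencing key
        }
        valueCounts.merge(value, 1, Integer::sum);
        return old;
    }

    public V remove(K key) {
        if (!entries.containsKey(key)) {
            return null;
        }
        V old = entries.remove(key);
        decrement(old);
        return old;
    }

    public V get(K key) {
        return entries.get(key);
    }

    public boolean containsValue(V value) {
        return valueCounts.containsKey(value);
    }

    private void decrement(V value) {
        // computeIfPresent drops the entry when the function returns null
        valueCounts.computeIfPresent(value, (v, n) -> n == 1 ? null : n - 1);
    }
}
```

Whether the extra bookkeeping is worth it depends on how often values are looked up compared with how often the map is modified.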

Well, there are some OOP-style functional languages, such as O'Haskell and OCaml, but these languages look to me like they are made for some kind of research and not for making commercial-grade applications.

From my experience Java is just great if you do Java style Java programming and not Pascal or C style Java.

Essentially no language is "great" for a large enough project--that's the whole reason for research in software engineering.

I actually think that a language like I am describing is closer to a standard OO language than it is to a functional language. I'm talking about use of global variables, I/O, local state, etc., all somewhat at odds with pure functional programming, but with everything named in type signatures and not hidden.

haki · Oct 28, 2006

You were a bit clearer this time.

I totally agree with you that any program should have an extensive documentation!

In Java one is obligated to put JavaDocs on anything that is public; that is, you are supposed to comment public methods, fields and classes with JavaDoc, but some people don't do that, grrr.

Well I like the concept of encapsulation but it should be properly done. e.g. a beginner's mistake

class Person{

private String name;

private Date dob;

...

public Date getDateOfBirth(){
return dob; // mistake, you should not return a reference to a mutable object
}

}

now one can do

Jim.getDateOfBirth().setTime(1212); and it will corrupt the internal state of the object. It should be

public Date getDateOfBirth(){
return (Date) dob.clone(); // correct: since Date is mutable we must return a copy
}

The same goes for returning Collections. e.g. given

private ArrayList<T> someImportantData = ...

the getter should be

public List<T> getImportantData(){
return Collections.unmodifiableList(someImportantData); // read-only view
}

or something of equivalent nature, then there is no possibility that one can corrupt your object.
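A self-contained version of the same idea, as a sketch: the class name SafePerson and the nicknames field are made up for the example. It also copies the Date on the way in, since the caller may keep its own reference to the object passed to the constructor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

// Hypothetical class illustrating defensive copies and read-only views.
class SafePerson {
    private final Date dob;
    private final List<String> nicknames = new ArrayList<>();

    SafePerson(Date dob) {
        this.dob = new Date(dob.getTime()); // copy on the way in
        nicknames.add("Jim");
    }

    Date getDateOfBirth() {
        return new Date(dob.getTime()); // copy on the way out
    }

    List<String> getNicknames() {
        // Read-only view: attempts to modify it throw UnsupportedOperationException
        return Collections.unmodifiableList(nicknames);
    }
}
```

With this version, calling setTime on the returned Date changes only the caller's copy, and calling add on the returned list fails instead of changing the object's state.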

Of course methods with side effects should be well documented.

It is very easy to write bad code.

The equivalent to global data in Java would be static stuff but that is not very OOP.

I do not agree that Java 5.0 has a bad library. I agree that the Collections framework has some kinks, e.g. there is a method called binarySearch, but if you put in a linked list it actually does a linear search. How deceiving! But in spite of that it is better to have a standardized collections library than none at all. http://www.artima.com/weblogs/viewpost.jsp?thread=4894 For what I do with Java I find the Collections framework sufficient.
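The binarySearch point can be checked against the Javadoc: for a list that does not implement the RandomAccess marker interface, Collections.binarySearch falls back to an iterator-based search that performs O(n) link traversals, though still only O(log n) element comparisons. A small sketch follows; the class SearchCost and the helper hasFastIndexing are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class SearchCost {
    // Hypothetical helper: true when the list advertises cheap indexed access.
    static boolean hasFastIndexing(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            array.add(i * 2); // sorted input: 0, 2, 4, ...
        }
        List<Integer> linked = new LinkedList<>(array);

        // Both calls return the same index; only the traversal cost differs.
        System.out.println(Collections.binarySearch(array, 8));
        System.out.println(Collections.binarySearch(linked, 8));
        System.out.println(hasFastIndexing(array) + " " + hasFastIndexing(linked));
    }
}
```

So the result is correct either way; what the method name does not reveal is the cost, which depends on the list implementation passed in.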

I don't recall HashMap having a hasValue() method
http://java.sun.com/j2se/1.5.0/docs/api/java/util/HashMap.html

I think that you would like Java or Ruby. But each language has its tricks, so one should not use a generic approach to languages.

0rthodontist · Oct 28, 2006

haki said:

You were a bit clearer this time.

I totally agree with you that any program should have an extensive documentation!

In Java one is obligated to put JavaDocs on anything that is public; that is, you are supposed to comment public methods, fields and classes with JavaDoc, but some people don't do that, grrr.

Well I like the concept of encapsulation but it should be properly done. e.g. beginners mistake

class Person{

private String name;

private Date dob;

...

public Date getDateOfBirth(){
return dob; // mistake, you should not return a reference of mutable class
}

}

now one can do

Jim.getDateOfBirth().setTime(1212); and it will corrupt the internal state of the object. It should be

public Date getDateOfBirth(){
return (Date) dob.clone(); // correct: since Date is mutable we must return a copy
}

The same goes for returning Collections. e.g. given

private ArrayList<T> someImportantData = ...

the getter should be

public List<T> getImportantData(){
return Collections.unmodifiableList(someImportantData); // read-only view
}

or something of equivalent nature, then there is no possibility that one can corrupt your object.

I don't see how that relates to my point. It would be a helpful improvement to Java to prevent people from returning references to private objects, or at least to warn them about it. But it's not quite the same.

What I mean is something like the following:

Code:

import java.io.*;

class Example
{
	private int[] abc;
	public int xyz;
	
	public Example() alters abc
	{
		abc = new int[10];
	}
	
	/**
	 * Documentation explaining what exactly foo does to
	 * x, y, xyz, abc, System.in, System.out would go here
	 * if this function actually did something worth documenting
	 */
	public String foo(int x, int y) throws NumberFormatException, IOException; takes xyz, System.in; alters abc, xyz, System.out
	{
		abc[2] = x;
		abc[3] = y;
		BufferedReader b = new BufferedReader(new InputStreamReader(System.in));
		abc[4] = Integer.parseInt(b.readLine());
		abc[5] = xyz;
		xyz += x + y;
		System.out.println(xyz + "\t" + y);
		return "7";
	}
}

where I have introduced the additional keywords "takes" and "alters" to describe global variables or member variables that foo reads from or writes to. Since abc is private, most people viewing the function would not notice that foo alters abc, so they wouldn't have to see or care about the details of that--but the idea is that if they do need more information, perhaps because they want to rewrite a system that involves the Example class and need to know how stuff is supposed to work on the inside, then they can increase their documentation detail level and see what's going on.

Also, the "takes" and "alters" statements for foo would "trickle up" to become part of the declarations of any function that calls foo, just as checked exceptions do: since foo throws IOException, any method that calls foo has to declare IOException too (unless it catches the exception). So if I defined a function bar that calls ex.foo() where ex is an Example object, then bar would have to state that it alters ex, and if you increased documentation detail level you would also see that bar takes ex.xyz, System.in and alters ex.abc, ex.xyz, System.out. Through tracing what data the functions alter, you get a clearer picture of what's going on in any particular function.
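The "trickle up" rule amounts to a transitive closure over the call graph. Below is a minimal sketch of that computation, assuming the direct effects and call edges are entered by hand; a real tool would have to extract them from source. The class EffectSummary and its methods are hypothetical, and the sketch ignores receiver qualification (it reports abc, not ex.abc).

```java
import java.util.*;

// Hypothetical model: direct "alters" sets and call edges are supplied by hand.
class EffectSummary {
    private final Map<String, Set<String>> directAlters = new HashMap<>();
    private final Map<String, Set<String>> calls = new HashMap<>();

    // Record what a method alters in its own body.
    void declare(String method, String... altered) {
        directAlters.computeIfAbsent(method, k -> new TreeSet<>())
                    .addAll(Arrays.asList(altered));
    }

    // Record that caller invokes callee.
    void call(String caller, String callee) {
        calls.computeIfAbsent(caller, k -> new TreeSet<>()).add(callee);
    }

    // Everything altered by the method or by anything it calls, transitively.
    Set<String> alters(String method) {
        Set<String> result = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(method);
        while (!pending.isEmpty()) {
            String m = pending.pop();
            if (!visited.add(m)) {
                continue; // already processed; also guards against recursion
            }
            result.addAll(directAlters.getOrDefault(m, Collections.emptySet()));
            for (String callee : calls.getOrDefault(m, Collections.emptySet())) {
                pending.push(callee);
            }
        }
        return result;
    }
}
```

For the Example class, declaring that foo alters abc, xyz and System.out and that bar calls foo makes alters("bar") report all three, even though bar touches none of them directly. A "takes" set could be propagated the same way.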

Of course methods with side effects should be well documented.

It is very easy to write bad code.

The equivalent to global data in Java would be static stuff but that is not very OOP.

You're right. Usually you use static final, which is OOP.

I do not agree that Java 5.0 has a bad library. I agree that the Collections framework has some kinks, e.g. there is a method called binarySearch, but if you put in a linked list it actually does a linear search. How deceiving! But in spite of that it is better to have a standardized collections library than none at all. http://www.artima.com/weblogs/viewpost.jsp?thread=4894 For what I do with Java I find the Collections framework sufficient.

I agree, that's what I meant by "in this case." It's one instance of bad documentation, but the library overall is useful.

Actually what I'm saying doesn't apply to library interfaces, which typically do present a unified documentation and are designed to be easy to use from the outside, so much as it does to code maintenance, where you have to rewrite and update code that other people wrote years ago, and figure out their logic.

I don't recall HashMap having a hasValue() method
http://java.sun.com/j2se/1.5.0/docs/api/java/util/HashMap.html

Oh, you're right--I meant containsValue()

I think that you would like Java or Ruby. But each language has its tricks, so one should not use a generic approach to languages.

I haven't used Ruby. What are the advantages of it?

haki · Oct 29, 2006

I don't agree with you on this one! What you propose would only make things look more complicated than they actually are.

Normally you don't need the functionality you are proposing. Generally you are more concerned with providing a specific functionality (behaviour, methods) than with which data field gets modified.

It is a pain in the ... to clean up the code after somebody did "it just works" style coding before you, and then it happens to stop working or you need to add some functionality and you are the guy to do it. I hate such situations. Recently I had to do some upgrading and I ended up rewriting the whole thing. I hate C style Java the most. Once I was looking at some Java code and the guy who wanted me to do some work said, oh, here is the documentation to help you--he had a printout of the names of all the methods and comments! The reason you do JavaDocs is so your comments are in the same place as your code! I gave him the link to JavaDoc and walked away--I was in a position to do that. Not to mention the infamous C style Java array, e.g.

Code:

String[] whatever = {"Programming","is","fun"};
int lengthOfWhatever = 3;

In your example, suppose you had a method called readValuesFromConsole(); one would expect this method to modify the internal state of the object, so I don't see why you should put a bunch of takes and alters on it.

Also, writing documentation is already the least favorite part of development, so saying that somebody should also document the inner workings of their method calls is not very good.

What you propose is something that IMHO looks good in theory but will probably not work well in practice. IMHO the public interface should be commented heavily; the rest is optional, though it is a good idea to comment less obvious code.

For tracing through methods use a good debugger!

With HashMap the most popular methods are put and get. I would think of using an alternative structure if you do a lot of containsValue calls, since in a Map you usually care about the keys and not the values. I would imagine that containsValue does a linear search through all the keys, looking for the specified value. Or maybe you would have 2 HashMaps, one with Keys-Values and the other with Values-Keys, but then you need unique values as well as unique keys.

Ruby has more features than Java

http://www.ruby-lang.org/en/

e.g.

Code:

# Ruby knows what you
# mean, even if you
# want to do math on
# an entire Array
cities  = %w[ London
              Oslo
              Paris
              Amsterdam
              Berlin ]
visited = %w[Berlin Oslo]
 
puts "I still need " +
     "to visit the " +
     "following cities:",
     cities - visited

but I find it confusing at times. I primarily like Java because it is a plain and simple language in its core. Simple things work best.

haki · Oct 29, 2006

On the other hand it is not such a bad idea after all. It is just that this is a bit "low level". You could design a tool to do what you propose. Having new keywords alters and takes would only add to the boilerplate nature of Java. Some people don't even like seeing getters and setters, so imagine having to look at alters and takes stuff; since this is Java it is not possible to hide this. I think one solution would be to put your basic code into one file, let's say MyClass.java, and put getters/setters, your alters/takes and some more comments into, let's say, MyClassSuppliment.java; generally MyClassSuppliment could be generated from MyClass. But I don't like the idea of my code being split into two files, not to mention that if you have 2000 classes you would then have 4000. I know, if you already have 2000, why not have 4000.

But good research idea

-Job- · Oct 29, 2006

Orthodontist et al, you should check out C#.Net, it's an improved Java that can support manual memory management. It also has enumerations, generic collections, out-parameters, partial classes, etc.
For example, in C# getters and setters can be defined as follows:

Code:

private string _MyProp = "MyValue";
public string MyProperty
{
      get{
            //return some value
      }
      set{
            //perform some action
      }
}

MyProperty being the visible property, from a programming stance. In the "get" function you can copy the value and return that, for example, to prevent access to the private variable. In "set" you can set the value and perform some other action, such as generate an event.
With C# you can compile your code to run on any architecture (with a Virtual Machine) or compile to native code on a specific architecture (i.e. x86).
I've always been a fan of Java for its organization, but C# is amazing IMO.
As for documentation, there are some tools that automatically generate doc pages, such as NDoc. There are also some Visual Studio plugins, such as GhostDoc, which make it really easy to generate parseable comments.
Here's a C# tutorial if you're curious.
http://www.softsteel.co.uk/tutorials/cSharp/lesson1.html

0rthodontist · Oct 29, 2006

I knew that "too much trouble" would be a concern. I think that it wouldn't be so important for the following reasons:
1. Most methods don't alter very many member variables or global variables. In strict functional programming, this never happens. So the number of takes and alters will not be large. If it is large, you would start to suspect that maybe the method is too complex or programmed poorly.

It might become legitimately inconvenient if you are doing a GUI with a lot of widgets. However, I have a proposed solution to that: the ability to categorize the variables in your "takes" and "alters" clauses, so that they have a directory-type structure. You might say "takes [window]: JTextArea ta, JButton okbutton, [other]: int[] responseArr".
(incidentally I think it is also a good idea to have the types of everything included in the takes/alters clauses)

2. I would intend this to be used with documentation reading and generating tools rather than just straight text editors. Generating "takes" and "alters" should be completely automatic, requiring no programmer effort. Furthermore someone reading the documentation should be able to expand and collapse the various levels of data being taken or altered in these clauses. So in the above widget example, someone reading the documentation would only see
takes [window], int[] responseArr
Then they can choose to expand [window] and get
takes [window]: JTextArea ta, JButton okbutton, [other]: int[] responseArr
Then, say they choose to expand ta, and they might get something like this:
takes [window]: ((JTextArea ta: accessibleContext), JButton okbutton), [other]: int[] responseArr
enabling them to see exactly which data of ta is being used. Probably there is no good reason for a user to want to know that, but it's for illustration. In other situations there might be a good reason.
(I'm making up syntax on the fly here, but I hope you get the idea)

In addition, the type information of things in takes and alters clauses should be user-switchable; they should be able to choose to see it or not.

Any takes/alters documentation the programmer chooses to provide would be associated with particular data. So that if you documented exactly which data your function takes out of ta and associated that comment with the variable ta, then someone reading the documentation would not see your comment unless they chose to also see that your method takes ta.
------------------------------

Ideally, the documentation reader would also be the IDE. So that when you call a function, you can see right there what data it will change (the IDE will fill it in, and again you can choose to expand it). In the Example class example, you might type

Code:

Example z = new Example();
String s = z.foo(10, 12);

and then your IDE/documentation reader would fill in for you,

Code:

String s = z.foo(10, 12) throwing NumberFormatException, IOException; taking xyz, System.in; altering xyz, System.out;

depending on how much documentation you have chosen to see. You could lower that level of documentation visibility to see only what you typed, or you could increase it to see "altering abc, xyz, System.out" where the private field abc was not shown before but now is shown.

This may seem complicated but I think that if these tools were put into practice they would not obscure or impede anything that's already being done--since both generation of takes and alters clauses, and your choice of what documentation you want to see, would be automatic. It would only help someone reading the code to see what is truly going on and would not even be noticeable to someone who just wants to use an interface and doesn't have to care about the side effects.

haki · Oct 29, 2006

The problem with C# is that somehow you end up using one of M$'s products and that is not cheap.

Anyway, I think that what you are proposing should not be necessary if people would write good clean code with good documentation. But then again I am not a library developer, I just use libraries. At times I do have to add some things or fix some bugs, but then I need to review the code anyway since I am going to make changes.

The easiest would be if you would add some annotations e.g.

Code:

private int[] values;

@alters values
@takes System.in
public void readFromConsole(){
...
}

but I don't know what is wrong with just

Code:

/** Reads a sequence of ints from the standard input and stores them internally.
*
*/
public void readFromConsole(){
...
}
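For reference, the annotation idea can be written in valid Java roughly as follows. This is only a sketch: the annotation types Alters and Takes, the class ConsoleValues, and the helper EffectPrinter are all hypothetical, and nothing in the JDK defines or verifies them. They record what the programmer declared, not what the method actually does.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical annotations; the compiler does not check them against the body.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Alters { String[] value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Takes { String[] value(); }

class ConsoleValues {
    private int[] values = new int[0];

    @Alters("values")
    @Takes("System.in")
    public void readFromConsole() {
        // reading from the console is elided, as in the post
    }
}

class EffectPrinter {
    // Formats the declared effects of a public no-argument method,
    // the way a documentation tool might display them.
    static String describe(Class<?> type, String methodName) {
        try {
            Method m = type.getMethod(methodName);
            Alters alters = m.getAnnotation(Alters.class);
            Takes takes = m.getAnnotation(Takes.class);
            return methodName
                    + " takes " + (takes == null ? "[]" : Arrays.toString(takes.value()))
                    + " alters " + (alters == null ? "[]" : Arrays.toString(alters.value()));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }
}
```

Because the annotations are retained at runtime, a doc tool or IDE plugin could read them through reflection; keeping them accurate would still require either discipline or a separate checker.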

For what I am doing I don't need such functionality, but ...

It wouldn't be a bad idea to have the documentation enforced at language level; that is, if you didn't put e.g. JavaDocs on all the public stuff, you would get a compile time error.

0rthodontist · Oct 29, 2006

Well, if all documenting were sufficiently good and all code were correct, there wouldn't be such a problem. I think that saying what your code reads from and changes just as comments in the javadoc is not enough though. Again, I think that the strength of functional programming--what makes it in some ways more readable than imperative programming--is that most of the time you get to _see_ all the data that's being passed to a function, right when you call that function, and you get to see everything that the function writes to, since it only writes to the return value. There's nothing behind the curtain; it's all right up front.

Proper documenting like you suggest would go a long way towards pulling the operation of a function out of the dark, but it has two disadvantages:
1. The coder must manually specify it, and will often forget to mention some particular effect. Say he changes what the function does--is he going to conscientiously update the javadoc? Maybe maybe not. This is the most important disadvantage.
2. You can only get at it by reading the comment or by going to the javadoc. It's not available to you right when you're reading the code that uses that function, so you're still left wondering "what does this do?" until you break your train of thought, open another window and go look at the doc. My suggestion of takes and alters integrated with the IDE would answer such questions immediately.

I think that an automatic takes/alters system, for any imperative programming language, would go a long way towards compensating for bad documentation or no documentation. No human documentation is perfect. Even seemingly perfect documentation frequently doesn't mention small state changes that a method makes to its object, and I think it should at some level--not to do so is to be ambiguous. Takes/alters would take care of that, and an interactive doc system would allow readers to selectively hide details they don't want to think about.

Enforcing documentation formally as a syntax is good on large projects but on large projects it's usually company rules to do it anyway. That still doesn't mean it's done well and says everything it should, and anyway on little programs there's sometimes no reason to document and you just want to bang something out quick. A takes/alters system could take the place of a lot of documentation on smaller programs and augment the data provided by manual docs on larger systems.

CRGreathouse · Nov 6, 2006

haki said:

The problem with C# is that somehow you end up using one of M$'s products and that is not cheap.

I think that C# is a better product than Java. There are plenty of free implementations out there -- Microsoft has a free Express edition, and there are perhaps half a dozen 3rd party C# products (of which Mono is the most popular, as I recall).

chroot · Nov 6, 2006

Keep in mind that the meta-information (what a function alters, what input it consumes) is useless without a methodology for using that information to prove that code is correct. It doesn't help the programmer at all to explicitly list whether or not a function has a side-effect (e.g. consuming input) if she writes it down but never uses that information again.

It really sounds to me like you just want to follow Cleanroom or zero-defect software engineering practices. The actual language you choose to use is pretty much irrelevant; you could use [a subset] of any language you like, and apply very strict software engineering practices within that environment.

You could even "store" the meta-information about altering global variables and consuming stdin within comments, and then use a lint tool to walk the syntax tree and keep that information accurate.

I can recommend some good books on the subject of zero-defect coding techniques, if you're interested.

- Warren

0rthodontist · Nov 6, 2006

Certainly it would help if you wanted to prove the code correct, but my point is that it would improve readability. Having takes/alters information is the kind of "commenting" that is absolutely essential for someone trying to read your code to maintain it. If the code doesn't contain that information right at hand, they'll have to hunt it down anyway. It is my opinion right now--though it easily could change--that proof of correctness has little use except in very critical software. I'm more interested in code readability than proving correctness.
