Yes, you're right--my states are actually continuous. I am trying to discretize them so that I can use learning algorithms that operate on tables. Accordingly, the actions I make available to the agent also form a discrete set, and the system evolves in discrete time.
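For concreteness, something along these lines is what I mean by 'vectors in the discretization'--a k-means codebook here is just one possible choice, and the function names are purely illustrative:

```python
# A minimal sketch of one way to discretize continuous states: fit a codebook
# of k vectors with k-means and map each continuous state to the index of its
# nearest codebook vector. Names (build_codebook, discretize) are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(states, k, seed=0):
    """Fit k codebook vectors to a (n_samples, state_dim) array of states."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(states)
    return km.cluster_centers_

def discretize(state, codebook):
    """Return the index of the codebook vector closest to `state`."""
    return int(np.argmin(np.linalg.norm(codebook - state, axis=1)))
```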
My idea behind calculating the entropy of P(state j | state i, action a) is that, since the system is a Markov process, if I have represented the state space well enough (i.e., have enough vectors in the discretization), the transition probabilities (which are in reality deterministic for the 'real' states) will become very sparse. If I underrepresent the states, I will lump two different states together, and consequently the transition probabilities will be less sparse--i.e., more 'uncertain', with higher entropy.
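Concretely, the quantity I have in mind could be estimated from observed transitions roughly like this (the trajectory format, a list of (i, a, j) index triples, is just an assumption for the sketch):

```python
# Estimate P(s_j | s_i, a) from discretized transitions and average the
# entropy of each (state, action) row. Transitions are assumed to be
# (state_idx, action_idx, next_state_idx) triples.
import numpy as np

def mean_transition_entropy(transitions, n_states, n_actions):
    counts = np.zeros((n_states, n_actions, n_states))
    for i, a, j in transitions:
        counts[i, a, j] += 1

    totals = counts.sum(axis=2, keepdims=True)      # visits to each (state, action)
    probs = counts / np.maximum(totals, 1.0)        # P(s_j | s_i, a); unvisited rows stay zero

    logp = np.zeros_like(probs)
    np.log2(probs, out=logp, where=probs > 0)       # take logs only where p > 0
    row_entropy = -(probs * logp).sum(axis=2)       # entropy (bits) per (state, action) pair

    visited = totals[..., 0] > 0
    return row_entropy[visited].mean()              # average over visited pairs only
```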
Ideally, the entropy would then decrease monotonically as more vectors are added to the discretization. By 'elbow', I mean the point in a monotonically decreasing graph right before the curve begins to flatten out, i.e., the point of highest curvature--analogous to choosing the number of clusters in a clustering problem by looking at the number of clusters vs. the within-cluster error.
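A simple stand-in for 'point of highest curvature' is the point of the curve farthest from the straight line joining its endpoints; here is a rough sketch of that heuristic, assuming 1-D arrays of discretization sizes and the corresponding entropies:

```python
# Locate the 'elbow' of a monotone curve as the point with the largest
# perpendicular distance from the chord joining the first and last points.
import numpy as np

def find_elbow(k_values, entropies):
    x = np.asarray(k_values, dtype=float)
    y = np.asarray(entropies, dtype=float)
    # Normalise both axes so neither scale dominates the distance.
    x = (x - x[0]) / (x[-1] - x[0])
    y = (y - y[0]) / (y[-1] - y[0])
    chord = np.array([x[-1], y[-1]])                # endpoints are now (0,0) and (x[-1],y[-1])
    pts = np.stack([x, y], axis=1)
    # Perpendicular distance via the 2-D cross product.
    dist = np.abs(chord[0] * pts[:, 1] - chord[1] * pts[:, 0]) / np.linalg.norm(chord)
    return int(np.argmax(dist))                     # index of the elbow in k_values
```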
The problem, though, is that the graph looks like an upside-down elbow, because the entropy increases when it is computed over more bins. I don't think all is lost, though, since the distribution itself determines how much entropy is gained by adding bins. For example, a uniform distribution gains the most: its entropy over n bins is log(n), the maximum possible, so it keeps growing as bins are added. The more concentrated the density is, the less adding bins affects the entropy.
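A quick numerical illustration of that last point (the two densities below are made up for the example): rebinning a uniform density keeps its entropy at the maximum log2(n_bins), while a sharply concentrated density gains very little as bins are added.

```python
# Histogram entropy vs. number of bins for a uniform and a concentrated density.
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
n = 100_000
samples_uniform = rng.uniform(0, 1, n)
samples_concentrated = rng.normal(0.5, 0.01, n)     # sharply peaked around 0.5

for n_bins in (8, 16, 32, 64):
    h_u = entropy_bits(np.histogram(samples_uniform, bins=n_bins, range=(0, 1))[0] / n)
    h_c = entropy_bits(np.histogram(samples_concentrated, bins=n_bins, range=(0, 1))[0] / n)
    print(f"bins={n_bins:3d}  uniform: {h_u:.2f} (log2 n = {np.log2(n_bins):.2f})  concentrated: {h_c:.2f}")
```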