What "language" means in Kolmogorov complexity?

In summary, The conversation is discussing the concept of Kolmogorov complexity, which is the shortest length of a computer program that can generate a particular sequence. The question being addressed is whether all uncountable sets have elements with infinite Kolmogorov complexity. The answer explains that this is not necessarily the case, as some uncountable sets may have finite complexity. It also mentions the use of different types of programs, such as those that decide a language or recognize it. The conversation ends with a discussion about the difficulty of defining an uncountable set with a word of infinite complexity.
  • #1
zrek
115
0
Please help me to understand the answer I found on mathoverflow.
The question was: "Do all uncountable sets contain elements with infinite Kolmogorov complexity?"

The reasoning is clear for me, but I'd like to understand every word of the answer, which is the following:
"...given a language L⊆{0,1}*, we could let K(L) be the length of the shortest program that decides L, or K(L)=∞ if L is undecidable. (Or we could also talk about programs that recognize L, in which case even the language HALT would have a finite complexity.)"

I tried to understand the words in the context, please confirm that I understand the following special phrases correctly.

"given a language L⊆{0,1}"
means:
"given L, an arbitrary infinite series of 0 and 1 characters"

"program that decides L"
means:
"program that outputs L"

If this is OK, I still don't understand what " programs that recognize L" means?

Thank you for your help.
 
Physics news on Phys.org
  • #2
A program that decides L means it takes a sequence that is an element of the set of L sequences as an input and outputs true or false depending on if the sequence is a member of L (true) or undecidable (false).

This pdf may describe it better:

https://courses.engr.illinois.edu/cs373/sp2013/Lectures/lec21.pdf

and this one:

http://www.cs.nyu.edu/courses/fall09/V22.0453-001/chapter-4.pdf
 
Last edited:
  • #3
Hey zrek.

If I understand correctly you are looking at whether all uncountable sets have a computer program that is infinite in length to generate the sequence.

It may help to think of infinite series (which are used to model many kinds of numbers not rational and more general) and the nature of basis in an ordered field.

You should find that there will be a lot of uncountable sequences that can have finite Kolomogorov Complexity - even if many don't.
 
  • #4
chiro said:
If I understand correctly you are looking at whether all uncountable sets have a computer program that is infinite in length to generate the sequence.
Thank you, but the topic of Kolmogorov complexity and the cardinality of sets is clear for me, my problem is that I'm not proficient in the wording of the mathematical language. I'd like to learn to express myself as mathematicans do in this area. I feel also that sometimes I can't understand just because I don't know the phrases they use. I'm looking for some help to develop my understanding, thank you!
 
  • #5
jedishrfu said:
A program that decides L means it takes a sequence that is an element of the set of L sequences as an input and outputs true or false depending on if the sequence is a member of L (true) or undecidable (false)

OK, I understand this, thank you.
So the L "language" means that L is an (infinite) set of infinite long sequences of zeroes an ones in our case? ( "...given a language L⊆{0,1}" )
If this is the meaning of it, then I don't understand the reasoning:
"let K(L) be the length of the shortest program that decides L"

K supposed to be a (hypotetic) function or a program (that computes complexity) of one string or sequence. Its output should be one number (integer) (or outputs an infinity mark), the length of another program (found by K). This is the "shortest program" that is mentioned in the answer, right?
Then why they say that it "decides L", since its task is another thing, not "to decide" whether or not a sequence is a member of L?
 
  • #6
zrek said:
OK, I understand this, thank you.
So the L "language" means that L is an (infinite) set of infinite long sequences of zeroes an ones in our case? ( "...given a language L⊆{0,1}" )

That has been one of the difficulties that kept me from an answer. Does the language allow words of infinite length or words of arbitrary but finite length? In other words: what does the asterisk on ##\{0,1\}^*## mean?

If this is the meaning of it, then I don't understand the reasoning:
"let K(L) be the length of the shortest program that decides L"

That's been another difficulty. ##K(l)## is the minimal length of a program that outputs a word ##l##. So how is ##K(L)## defined? The supremum over all words ##l \in L##? And to decide ##l \in L## one doesn't necessarily have to construct a program that outputs ##l##. There might be a theorem that states the equivalence of both concepts which I don't know.

K supposed to be a (hypotetic) function or a program (that computes complexity) of one string or sequence. Its output should be one number (integer) (or outputs an infinity mark), the length of another program (found by K). This is the "shortest program" that is mentioned in the answer, right?

I thought it is the program that defines the Kolmogorov complexity: the length of the shortest possible program (on which automaton ever) that produces a given word of ##L## as output. E.g. ##\pi \in \mathbb{R}## is a word of infinite length over ##\{0,1\}##, of finite length over, e.g. ##\{0,1,\pi,e, \sqrt{2}\},## but a program to output ##\pi## can be very short although it won't halt.

Then why they say that it "decides L", since its task is another thing, not "to decide" whether or not a sequence is a member of L?

I don't understand what you wanted to say here. See also above about this potential theorem (I haven't tried to figure it out on the net). But there is surely a difference between "to decide ##L##" and "to decide non ##L##".

Beside all that, are you sure that there is an uncountable set (of what: alphabet or words?) at all with a word of infinite Kolmogorov complexity?
I suppose yes, but my example with ##\pi## shows that it is not trivial, since you have to define this word anyhow and the definition already is a finite program whose output is the word.

I took all my uncertainty of the statements above to the fact that I'm not an expert in automata theory. There seem to be a couple of suspected theorems involved to allow the colourful mixture of different terms and concepts: alphabet (although not explicitly mentioned), words, language, sets, Entscheidungsproblem.
 
  • #7
Information is always finite and this is critical to know whenever dealing with any computer science / information aspect.

You can symbolically represent an information amount of information (through something like an infinite series definition) but when it comes to physical representation it has to be finite (i.e. a finite number of actual bits).

How you do that is arbitrary (including the use of code) but it always has to be finite.

Langauge has to adhere to those constraints.
 
  • Like
Likes jim mcnamara
  • #8
Maybe a late reply to older thread, but anyway...

Don't know about the question at hand but regarding some of the confusions...

I tried to understand the words in the context, please confirm that I understand the following special phrases correctly.

"given a language L⊆{0,1}"
means:
"given L, an arbitrary infinite series of 0 and 1 characters"

"program that decides L"
means:
"program that outputs L"

If this is OK, I still don't understand what " programs that recognize L" means?

{0,1}* simply means the set of all "finite strings" (strings of finite length). That is:
{0,1}*={0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, ...}

So L⊆{0,1}* simply means any set whose elements finite strings formed using symbols/alphabet 0 and 1 only.

A program in the context can simply be thought of as a function (with one parameter) in any modern programming language for which the input is of type String (say output is also of type String).

Now a program that decides L means one that takes any string as input and outputs a string "0" or "1" (and no other string) depending on whether the "input string" belongs to the language L or not. A language L is decidable only if there exists some program that decides it.

The following terms are equivalent:
decidable L -- L is a recursive set

non-decidable L
L is either:
a) recursively enumerable but not a recursive set
b) not recursively enumerable

Recursive sets are a strict sub-set of recursively enumerable sets. L being recursively enumerable means there exists a program that takes any string as input and either:
a) outputs "1" if the "input string" belongs to L
b) doesn't halt (runs forever) if the "input string" doesn't belong to L

If L is decidable L'(complement of L) is also decidable (which is same as saying that complement of a recursive set is also a recursive set).

If L is not decidable L' is not decidable either.
(proof follows by showing that if L' "was" to be decidable you could make a program that could decide L too)
 
Last edited:
  • Like
Likes jim mcnamara and Stephen Tashi
  • #9
Never too late SSequence, thank you for your explanation. Unfortunately my feeling is that I still don't understand the essence of the meaning of the "language".

SSequence said:
... L⊆{0,1}* simply means any set whose elements finite strings formed using symbols/alphabet 0 and 1 only.
So L is a set of finite length strings, with formed using a finite big alphabet (character set). (Say that the size of the caracter set is SC, in our case it is 2)
If this is true, then since the strings have finite lengths, then there is a limit "MAX" for any L languange we can say there is no longer string in it than this MAX.
By this we can say that the number elements of any L languange is finite. For more strict speaking, we can say that for any L language with a given MAX and SC, the elements of the language is no more than SC^MAX. (All of the languages are "finite")
Is this right?
SSequence said:
... Now a program that decides L means one that takes any string as input and outputs a string "0" or "1" (and no other string) depending on whether the "input string" belongs to the language L or not. A language L is decidable only if there exists some program that decides it.

Since any language is finite, we can say that there will be always a program that decides it, therefore there are no undecidable Ls (unless it is not well-defined).
In this point I think I lost the thread, because why some people are speaking about undecidable languages, if they are not exist at all?
 
  • #10
zrek said:
If this is true, then since the strings have finite lengths, then there is a limit "MAX" for any L languange we can say there is no longer string in it than this MAX.

No. For example, {0,1}* does not contain the infinite sequence 0,1,0,1,0,1,... but it contains all the sequences that are an initial sequence of terms of that infinite sequence: (0), (0,1), (0,1,0), (0,1,0,1),... etc. So there is no MAX length for sequences in {0,1}*.

If a language L is a set with a finite number of elements, you could say there is a MAX length for the strings in L. But if L happens to be a subset of {0,1}* with an infinite number of elements then you cannot establish a MAX length for the strings in L.
 
  • #11
Probably not a major point, but I forgot that empty string (string of length 0) is also supposed to be part of {0,1}*. If we denote it by e then concatenating with other strings has no effect. For example,
e0001 = 0001
e01e10e = 0110
etc.

Also, if a string belongs to some language L it is not necessarily required that all (or any for that matter) its substrings also belong to L. For example, the following is a valid example of language:
{11111, 0, 00, 000, 0000, 00000, ...}

"...given a language L⊆{0,1}*, we could let K(L) be the length of the shortest program that decides L, or K(L)=∞ if L is undecidable. (Or we could also talk about programs that recognize L, in which case even the language HALT would have a finite complexity.)"
Didn't look at this before. I think in the context of the quote "program that recognizes L" simply means one that recursively enumerates L (because HALT is recursively enumerable but not recursive). In which case the phrases "L is recognisable" and "L is recursively enumerable" can be thought of as equivalent.

I do not know whether this usage for "recognisable" is standard even though it makes sense.

And finally, I wrote before:
Recursive sets are a strict sub-set of recursively enumerable sets.
This wording is more concise but could be considered a little ambiguous.

More strictly (at the cost of being verbose), one could say that the collection of recursive sets (that is, collection whose elements are recursive sets) is a strict sub-set of collection of recursively enumerable sets (collection whose elements are recursively enumerable sets).
 
Last edited:
  • #12
Stephen Tashi said:
No. For example, {0,1}* does not contain the infinite sequence 0,1,0,1,0,1,... but it contains all the sequences that are an initial sequence of terms of that infinite sequence: (0), (0,1), (0,1,0), (0,1,0,1),... etc. So there is no MAX length for sequences in {0,1}*.
.

I think there is a difference between "there is no MAX" and "the MAX can not be represented by a specific integer". This difference distinguish between the finite and infinite lengths.
If there is no MAX, then we are talking about infinite string lengths. And vice verse: if we define so that the strings are not infinite, then we say it this way because we imagine a MAX (even if we say that the MAX can go up to infinity "but never reaches it")
Please let me know if I got the term "finite string" wrongly this way.

SSequence said: "So L⊆{0,1}* simply means any set whose elements finite strings"
Therefore the number of elements in it also have to be finite, making it even more reasonable to talking about a MAX.

Otherwise I think there is something contradictionary in one of the argument of yours.

I think I begin to understand it:
There can be 3 kinds of languages (regarding of infinity):
(1): Finite number of elements (strings), all elements lengths are finite.
(2): Finite number of elements, some elements are infinite strings.
(3): Infinite number of elements, some (most of the) elemenst are infinite strings.
(+1): There is at least one undefined string in the language.

(1) is decidable for sure.
(3) is not decidable for sure.

(2) is maybe decidable if we accept the possibility of programs that accepts formulas of string definitions as input. (Otherwise it is not possible to define all of the elements of the language, making it falling to the (+1) cathegory)
 
  • #13
zrek said:
SSequence said: "So L⊆{0,1}* simply means any set whose elements finite strings"
Therefore the number of elements in it also have to be finite,

No. Why would the number of elements in L have to be finite?

For example, let L be the set of finite strings consisting of alternating 0's and 1's. Each element of L is a finite string, but there is no upper bound on how long a string in L may be.
 
  • #14
Stephen Tashi said:
For example, let L be the set of finite strings consisting of alternating 0's and 1's. Each element of L is a finite string, but there is no upper bound on how long a string in L may be.
While I still find confusing to define a string as "finite" and simultaniously stating that "its length have no upper bound" and I'm pretty sure that no one can give me a particular example that fits to both of these requirements, you enlightened me and now I understand that it is just like the set of natural numbers: the set is infinite, but any number is finite in it. Thank you for your help and patience.
 
  • #15
I'm still try to understand the things in your answer (thanks for it, I feel I'm developing my knowledge)
SSequence said:
A language L is decidable only if there exists some program that decides it.
I have a feeling that any well defined language is decidable, is it right?
I mean if we strictly defined something, then the definition must be capable to be interpreted by a program, therefore the language is decidable.

Or is this true only for the "recursively enumerable" languages?
Is there a strictly defined language that is known to be undecidable but recursively enumerable, and if there is, can you give me a specific example?
 
  • #16
SSequence said:
Didn't look at this before. I think in the context of the quote "program that recognizes L" simply means one that recursively enumerates L (because HALT is recursively enumerable but not recursive). In which case the phrases "L is recognisable" and "L is recursively enumerable" can be thought of as equivalent.

Wolud you please explain what you mean on " HALT is recursively enumerable but not recursive" exactly?

I think now I know what the language means in this context.
Please help me one more thing to understand in the quote "...given a language L⊆{0,1}*, we could let K(L) be the length of the shortest program that decides L..."

K supposed to be a specific (but only an imaginary) function that takes one finite string as input, and outputs an integer, the Kolmogorov complexity of that string.
How come to the picture the K(L), in which L is a language, a set of finite strings?
 
  • #17
These topics are covered in numerous elementary/intermediate books. I will strongly recommend that you read the relevant books/notes(early chapters at least) available on the subject ("recursion theory" or "computability theory"). The questions you are asking are very basic (but neverthless the related concepts still take significant time to sink in) ... and to be honest what I have written is just about most of the stuff I know about this topic.

As far as kolmogorov complexity is concerned I think that it would make sense to try to understand it once you have some hold on the basics of recursion theory (I don't know about it anymore than it is related to "smallest programs" in various ways).

I think I begin to understand it:
There can be 3 kinds of languages (regarding of infinity):
(1): Finite number of elements (strings), all elements lengths are finite.
(2): Finite number of elements, some elements are infinite strings.
(3): Infinite number of elements, some (most of the) elemenst are infinite strings.
(+1): There is at least one undefined string in the language.
You are getting in completely wrong track with this. Just forget about "infinite strings" for now. As already mentioned before if you have "infinite string" like 11111111111..., you can just model it with a language:
{ e, 1, 11, 111, 1111, 11111, 111111, ...}

I understand that it is just like the set of natural numbers
To do computations (of the kind mentioned in this thread) you need a basic set of objects. The main condition is that objects can be counted easily. Natural numbers are definition of counting. Strings can also be counted quite easily. In some scenarios natural numbers are more convenient and in others strings take this role.

I have a feeling that any well defined language is decidable, is it right?
That is not the case. The undecidable languages are too numerous.

Wolud you please explain what you mean on " HALT is recursively enumerable but not recursive" exactly?
HALT usually refers to halting problem (in some variation). As it happens all recursive/decidable languages can easily (somewhat trivially) be shown to be r.e.(recursively enumerable) languages.

But the converse doesn't happen to be true and halting set is one of the most easily understandable examples of a language that is r.e. but not recursive/decidable.

"...given a language L⊆{0,1}*, we could let K(L) be the length of the shortest program that decides L..."

K supposed to be a specific (but only an imaginary) function that takes one finite string as input, and outputs an integer, the Kolmogorov complexity of that string.
K is supposed to be a function whose input is some sub-set of {0,1}* and whose output is a natural number (denoting length of program that decides it).

K doesn't take a string as input (I am going to drop "finite" because we are always talking about finite by default). It only takes a set of strings as input. If you meant that it can take a set which contains only one string, that is possible of course. For example, K could take {11111} as input but not 11111 (because that is not a set of strings).

How come to the picture the K(L), in which L is a language, a set of finite strings?
First note that the set of strings that number of strings in L need not be finite. For example, L could be {1,11,111,...} (or many other examples). As for how to picture it? Well it is not particularly simple. Here is the best informal picture I can think of (note that this is only relevant to how K(L) was defined in the quote you gave in the OP):

Basically you can think of unique languages placed in each ordinal position. For all ordinals that are greater than or equal to ω the corresponding K(L) would be infinite. For ordinals less than ω at some points K(L) would be finite (denoting decidable languages) and at some points it would be infinite (denoting r.e. languages that are not recursive/decidable).

edit:
If we are talking about "recognising L" (instead of "deciding L") this picture would have to be adjusted a little. In that case you can imagine K(L) being infinite once again for ordinals that are greater than or equal to ω. However K(L) will be finite for all ordinals less than ω (denoting r.e. languages).For readers that are familiar with arithmetic hierarchy. I don't know arithmetic hierarchy well enough, but what I am thinking with this is that the first row (less than ω) contains r.e. languages. The second row (greater or equal than ω but less than ω.2) contains languages that are h-r.e. but not h-recursive ... and so on (obviously continuing in this manner AH itself will end pretty soon). Something of that sort basically.
 
Last edited:
  • #18
I'm pretty sure that there are things in this topic I don't really understand (especially the notations at least), but you underestimate my knowledge about it. Most of that you explained above are clear for me. (Halting problem, cardinality of sets etc).
SSequence said:
K doesn't take a string as input (I am going to drop "finite" because we are always talking about finite by default). It only takes a set of strings as input
Sorry, I asked not clearly. Basically the fundamental concept of Kolmogorov complexity is about a function K that accepts a string.

Quote from this paper:
http://homepages.cwi.nl/~paulv/papers/info.pdf
"Kolmogorov com-
plexity is defined by a function that maps objects (to be thought of as natural numbers or sequences of
symbols, for example outcomes of the random variables figuring in the Shannon theory) to the natural
numbers. Intuitively, the Kolmogorov complexity of a sequence is the length (in bits) of the shortest
computer program that prints the sequence and then halts."

I tried to ask about the different meaning of K(l) and K(L).
However you answered this too, by mentioning "think of unique languages placed in each ordinal position".
I liked your view, and now I think I understand the quote "...given a language L⊆{0,1}*, we could let K(L) be the length of the shortest program that decides L..."

One last thing that I try to questioning (and this is what I also tried to ask earlier):
For the generalization of the Kolmogorov complexity to infinite strings we suppose a K(L) notation, in which we require K to be a function or program, how can we imagine to hand over a language as L? (Please note that even the halting problem is only applies if we are talking about Turing-complete model of computation, so I suppose that our terms are acceptable only on this logical field)
Because I know only one way to do so: if someone is able to strictly define L in some other language, that is represented by a string (so L finally a finite string). In this case however we arrived to a snake that eats its own tail. (I mean this is not a good way to generalize tha Kolmogorov complexity to infinite strigs, because of its inner contradictionary dependencies)

OK, I don't expect you to answer this, you already helped me a lot, thank you!
 
  • #19
Bumping an old thread but I noticed a mistake --- not an obvious one but somewhat subtle (for my level at least I guess). Of course if someone can affirm that the comments below are correct (hopefully!), that would be helpful.
SSequence said:
For readers that are familiar with arithmetic hierarchy. I don't know arithmetic hierarchy well enough, but what I am thinking with this is that the first row (less than ω) contains r.e. languages. The second row (greater or equal than ω but less than ω.2) contains languages that are h-r.e. but not h-recursive ... and so on (obviously continuing in this manner AH itself will end pretty soon). Something of that sort basically.
I think that instead of above it should be like:
For readers that are familiar with arithmetic hierarchy. I don't know arithmetic hierarchy well enough, but what I am thinking with this is that the first row (less than ω) contains recursive languages. The second row (greater or equal than ω but less than ω.2) contains languages that are h-recursive but not recursive ... and so on (obviously continuing in this manner AH itself will end pretty soon). Something of that sort basically.
At the time I wrote this, I thought that probably both of these quotes would be equivalent. However, I don't think they should be. The former one should probably be incorrect as it seems it would miss out sets (and hence would be invalid/not robust).

Why? It seems that there should be languages/sets are H-recursive but not r.e. I think that should be because of the construction of post/kleene where they describe a pair of sets/functions which are reducible to H(or alternatively K for diagonal set) but not to each other.
It should follow from that H isn't reducible to any of those two sets (say A and B) either. Because if H was reducible to any of A or B, then one of A or B would be reducible to the other (and hence a contradiction).

Now, I don't understand the detail of the construction, but none of the two sets in that construction can be r.e. --- if they were, that construction would solve post's problem (but it didn't).

==========

Also a question that seems to have more direct relevance to the topic. Consider the smallest turing machine which begins with blank tape and halts with n-consecutive 1's on the tape. In a similar manner, consider the smallest program (with instructions limited so that there are finite number of programs of a given length), with all variables initially 0, which outputs the constant function n.
If we form the function which takes as input n and outputs the length of resulting (smallest) program, can we prove this function to be uncomputable. If so, can someone give a reference where the result is shown in detail.
 
Last edited:
  • #20
SSequence said:
It seems that there should be languages/sets are H-recursive but not r.e. ...
I think I have made it much more complicated than it needs to be :P. This statement is true, but for much more trivial reasons.
Any set, which is co-r.e. but not recursive, serves to demonstrate this claim. If K is the diagonal set, then the complement of K (the set K'=N-K) is an example.

The more substantial claim is that there are sets which are H-recursive but neither r.e. nor co-r.e. After looking up a bit, it seems that a simple construction serves to demonstrate that is true too. The set:
A = { 2n | n∈K } ∪ { 2n+1 | n∈K' }
serves as an example.
One can prove by first demonstrating that it is H-recursive/K-recursive. After that generate contradictions of the form:
i) If A is r.e. then K is recursive (contradiction)
ii) If A is co-r.e. then K is recursive (contradiction)
 

What is Kolmogorov complexity?

Kolmogorov complexity is a measure of the amount of information in a given object or data set. It is based on the idea that the complexity of an object is determined by the length of the shortest computer program that can produce it.

How does Kolmogorov complexity relate to language?

In the context of Kolmogorov complexity, language refers to a system of symbols and rules used to communicate information. The complexity of a language can be measured by the length of the shortest program that can generate it.

Can Kolmogorov complexity be used to compare different languages?

Yes, Kolmogorov complexity can be used to compare different languages by measuring the size of programs needed to generate them. However, this comparison may not always be meaningful as different languages may have different rules and symbols, making it difficult to directly compare their complexity.

How does Kolmogorov complexity differ from traditional measures of language complexity?

Traditional measures of language complexity, such as grammar rules or vocabulary size, focus on specific aspects of a language. Kolmogorov complexity, on the other hand, considers the overall information content and compressibility of a language, taking into account all of its components.

What are the limitations of using Kolmogorov complexity to measure language?

One limitation is that Kolmogorov complexity is an abstract concept and can be difficult to apply in practical situations. It also assumes a universal computing machine, which may not accurately reflect the capabilities of real-world computers. Additionally, it may not capture the nuances and complexities of human languages, as it is primarily based on mathematical principles.

Similar threads

  • Engineering and Comp Sci Homework Help
Replies
1
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
16
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
12
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
9
Views
1K
  • Programming and Computer Science
Replies
1
Views
718
  • Engineering and Comp Sci Homework Help
Replies
3
Views
880
  • Engineering and Comp Sci Homework Help
Replies
1
Views
1K
Replies
26
Views
1K
  • Differential Equations
Replies
3
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
2K
Back
Top