In [[computer science]], the '''[[Andrey Kolmogorov|Kolmogorov]] complexity''' (also known as '''descriptive complexity''', '''Kolmogorov-Chaitin complexity''', '''stochastic complexity''', '''algorithmic entropy''', or '''program-size complexity''') of an object such as a piece of text is a measure of the [[computation]]al resources needed to specify the object. For example consider the following two [[string (computer science)|strings]] of length 64
0101010101010101010101010101010101010101010101010101010101010101
1100100001100001110111101110110011111010010000100101011110010110
The first string admits a short [[English language]] description, namely "32 repetitions of '01'", which consists of 20 characters. The second one has no obvious simple description other than writing down the string itself, which has 64 characters.
[[Image:Mandelpart2.jpg|300px|right|thumb|This image illustrates part of the [[Mandelbrot set]] [[fractal]]. The size of the [[JPEG]] file encoding the bitmap of this image is more than 17 kilobytes (approximately 140000 bits). The same file can be generated by a computer program much shorter than 140000 bits, however. Thus, the Kolmogorov complexity of the JPEG file encoding the bitmap is much less than 140000.]]
More formally, the [[complexity]] of a string is the length of the string's shortest description in some fixed [[description language]]. The sensitivity of complexity relative to the choice of description language is discussed below. It can be shown that the Kolmogorov complexity of any string cannot be too much larger than the length of the string itself. Strings whose Kolmogorov complexity is small relative to the string's size are not considered to be complex.
The notion of Kolmogorov complexity is surprisingly deep and can be used to state and prove impossibility results akin to [[Gödel's incompleteness theorem]] and [[halting problem|Turing's halting problem]].
== Definition==
To define Kolmogorov complexity, we must first specify a description language for strings. Such a description language can be based on a programming language such as [[Lisp programming language|Lisp]], [[Pascal (programming language)|Pascal]], or [[Java Virtual Machine]] bytecode. If '''P''' is a program which outputs a string ''x'', then '''P''' is a description of ''x''. The length of the description is just the length of '''P''' as a character string. In determining the length of '''P''', the lengths of any subroutines used in '''P''' must be accounted for. The length of any integer constant ''n'' which occurs in the program '''P''' is the number of bits required to represent ''n'', that is (roughly) log<sub>2</sub>''n''.
We could alternatively choose an encoding for [[Turing machine]]s, where an ''encoding'' is a function which associates to each Turing Machine '''M''' a bitstring <'''M'''>. If '''M''' is a Turing Machine which on input ''w'' outputs string ''x'', then the concatenated string <'''M'''> ''w'' is a description of ''x''. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article we will use an informal approach.
Any string ''s'' has at least one description, namely the program
'''function''' GenerateFixedString()
'''return''' ''s''
Among all the descriptions of ''s'', there is one with shortest length denoted ''d''(''s''). In case there is more than one program of the same minimal length, choose one arbitrarily, for example selecting the lexicographically first among them. ''d''(''s'') is the '''minimal description''' of ''s''. The '''Kolmogorov complexity''' of ''s'', written ''K''(''s''), is
:<math>K(s) = |d(s)|. \quad </math>
In the other words, ''K''(''s'') is the length of the minimal description of ''s''.
We now consider how the choice of description language affects the value of ''K'' and show that the effect of changing the description language is bounded.
'''Theorem'''. If ''K''<sub>1</sub> and ''K''<sub>2</sub> are the complexity functions relative to description languages ''L''<sub>1</sub> and ''L''<sub>2</sub>, then there is a constant ''c'' (which depends only on the languages ''L''<sub>1</sub> and ''L''<sub>2</sub>) such that
: <math>\forall s\ |K_1(s) - K_2(s)| \leq c, \quad </math>
'''Proof'''. By symmetry, it suffices to prove that there is some constant ''c'' such that for all bitstrings ''s'',
: <math> K_1(s) \leq K_2(s) + c. </math>
To see why this is so, there is a program in the language ''L''<sub>1</sub> which acts as an [[interpreter (computing)|interpreter]] for ''L''<sub>2</sub>:
'''function''' InterpretLanguage('''string''' ''p'')
where ''p'' is a program in ''L''<sub>2</sub>. The interpreter is characterized by the following property:
: Running InterpretLanguage on input ''p'' returns the result of running ''p''.
Thus if '''P''' is a program in ''L''<sub>2</sub> which is a minimal description of ''s'', then InterpretLanguage('''P''') returns the string ''s''. The length of this description of ''s'' is the sum of
# The length of the program InterpretLanguage, which we can take to be the constant ''c''.
# The length of '''P''' which by definition is ''K''<sub>2</sub>(''s'').
This proves the desired upper bound.
See also [[invariance theorem]].
==History and context==
[[Algorithmic information theory]] is the area of computer science that studies Kolmogorov complexity and other complexity measures on strings (or other [[data structure]]s). The field was developed by [[Andrey Kolmogorov]], [[Ray Solomonoff]] and [[Gregory Chaitin]] starting in the late 1960s. There are several variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs and is mainly due to [[Leonid Levin]] (1974).
Naming this concept "Kolmogorov complexity" is an example of the [[Matthew effect]].
==Basic results==
In the following, we will fix one definition and simply write ''K''(''s'') for the complexity of the string ''s''.
It is not hard to see that the minimal description of a string cannot be too much larger than the string itself: the program GenerateFixedString above that outputs ''s'' is a fixed amount larger than ''s''.
'''Theorem'''. There is a constant ''c'' such that
:<math> \forall s \ K(s) \leq |s| + c, \quad </math>
=== Uncomputability of Kolmogorov complexity ===
The first surprising result is that there is no way to effectively compute ''K''.
'''Theorem'''. ''K'' is not a [[computable function]].
In other words, there is no program which takes a string ''s'' as input and produces the integer ''K''(''s'') as output. We show this by contradiction. Suppose there is a program
'''function''' KolmogorovComplexity('''string''' ''s'')
that takes as input a string ''s'' and returns ''K''(''s''). Now consider the program
'''function''' GenerateComplexString('''int''' ''n'')
'''for''' i = 1 '''to''' infinity:
'''for each''' string s '''of''' length exactly i
'''if''' KolmogorovComplexity(''s'') >= ''n''
'''return''' ''s''
'''quit'''
This program calls KolmogorovComplexity as a subroutine. This program tries every string, starting with the shortest, until it finds a string with complexity at least ''n'', then returns that string. Therefore, given any positive integer ''n'', it produces a string with Kolmogorov complexity at least as great as ''n''. The program itself has a fixed length ''U''. The input to the program GenerateComplexString is an integer ''n''; here, the size of ''n'' is measured by the number of bits required to represent ''n'' which is log<sub>2</sub>(''n''). Now consider the following program:
'''function''' GenerateParadoxicalString()
'''return''' GenerateComplexString(''n''<sub>0</sub>)
This program calls GenerateComplexString as a subroutine and also has a free parameter
''n''<sub>0</sub>. This program outputs a string ''s'' whose complexity is at least ''n''<sub>0</sub>. By an auspicious choice of the parameter ''n''<sub>0</sub> we will arrive at a contradiction. To choose this value, note ''s'' is described by the program GenerateParadoxicalString whose length is at most
:<math> U + \log_2(n_0) + C \quad </math>
where ''C'' is the "overhead" added by the program GenerateParadoxicalString. Since ''n'' grows faster than log<sub>2</sub>(''n''), there exists a value ''n''<sub>0</sub> such that
:<math> U + \log_2(n_0) + C < n_0. \quad </math>
But this contradicts the definition of having a complexity at least ''n''<sub>0</sub>. Thus the program named "KolmogorovComplexity" cannot actually computably find the complexity of arbitrary strings.
This is proof by contradiction where the contradiction is similar to the [[Berry paradox]]: "Let ''n'' be the smallest positive integer that cannot be defined in fewer than twenty English words." It is also possible to show the uncomputability of K by reduction from the uncomputability of the [[halting problem]] H, since K and H are [[turing degree|turing-equivalent]]. [http://www.daimi.au.dk/~bromille/DC05/Kolmogorov.pdf]
===Chain rule for Kolmogorov complexity===
{{main | Chain rule for Kolmogorov complexity}}
The chain rule for Kolmogorov complexity states that
:<math> K(X,Y) = K(X) + K(Y|X) + O(\log(K(X,Y))).\quad</math>
It states that the shortest program that reproduces ''X'' and ''Y'' is [[Big-O notation|no more]] than a logarithmic term larger than a program to reproduce ''X'' and a program to reproduce ''Y'' given ''X''. Using this statement one can define [[Mutual information#Absolute mutual information|an analogue of mutual information for Kolmogorov complexity]].
== Compression ==
It is however straightforward to compute upper bounds for ''K''(''s''): simply [[data compression|compress]] the string ''s'' with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the resulting string's length.
A string ''s'' is compressible by a number ''c'' if it has a description whose length does not exceed |''s''| − ''c''. This is equivalent to saying <span style="white-space:nowrap">''K''(''s'') ≤ |''s''| − ''c''</span>. Otherwise ''s'' is incompressible by ''c''. A string incompressible by 1 is said to be simply ''incompressible''; by the [[pigeonhole principle]], incompressible strings must exist, since there are 2<sup>''n''</sup> bit strings of length ''n'' but only <span style="white-space:nowrap">2<sup>''n''</sup>−2</span> shorter strings, that is strings of length ''n'' − 1 or less.
For the same reason, "most" strings are complex in the sense that they cannot be significantly compressed: ''K''(''s'') is not much smaller than |''s''|, the length of ''s'' in bits. To make this precise, fix a value of ''n''. There are 2<sup>''n''</sup> bitstrings of length ''n''. The [[Uniform distribution (discrete)|uniform]] [[probability]] distribution on the space of these bitstrings assigns to each string of length exactly ''n'' equal weight 2<sup>−''n''</sup>.
'''Theorem'''. With the uniform probability distribution on the space of bitstrings of length ''n'', the probability that a string is incompressible by ''c'' is at least <span style="white-space:nowrap">1 − 2<sup>−''c''+1 </sup> + 2<sup>−''n''</sup></span>.
To prove the theorem, note that the number of descriptions of length not exceeding <span style="white-space:nowrap">''n'' − ''c''</span> is given by the [[geometric series]]:
:<math> 1 + 2 + 2^2 + \cdots + 2^{n-c} = 2^{n-c+1}-1.\quad </math>
There remain at least
:<math> 2^n-2^{n-c+1}+1 \quad </math>
many bitstrings of length ''n'' that are incompressible by ''c''. To determine the probability divide by 2<sup>''n''</sup>.
This theorem is the justification for various challenges in [http://www.faqs.org/faqs/compression-faq/part1/ comp.compression FAQ]. Despite this result, it is sometimes claimed by certain individuals (considered [[Crank (person)|cranks]]) that they have produced algorithms which uniformly compress data without loss. See [[lossless data compression]].
==Chaitin's incompleteness theorem ==
We know that most strings are complex in the sense that they cannot be described in any significantly "compressed" way. However, it turns out that the fact that a specific string is complex cannot be formally proved, if the string's length is above a certain threshold. The precise formalization is as follows. First fix a particular [[axiomatic system]] '''S''' for the [[natural number]]s. The axiomatic system has to be powerful enough so that to certain assertions '''A''' about complexity of strings one can associate a formula '''F'''<sub>'''A'''</sub> in '''S'''. This association must have the following property: if '''F'''<sub>'''A'''</sub> is provable from the axioms of '''S''', then the corresponding assertion '''A''' is true. This "formalization" can be achieved either by an artificial encoding such as a [[Gödel numbering]] or by a formalization which more clearly respects the intended interpretation of '''S'''.
'''Theorem'''. There exists a constant ''L'' (which only depends on the particular axiomatic system and the choice of description language) such that there does not exist a string ''s'' for which the statement
: <math> K(s) \geq L \quad </math>
(as formalized in '''S''') can be proven within the axiomatic system '''S'''.
Note that by the abundance of nearly incompressible strings, the vast majority of those statements must be true.
The proof of this result is modeled on a self-referential construction used in [[Berry's paradox]]. The proof is by contradiction. If the theorem were false, then
:'''Assumption (X)''': For any integer ''n'' there exists a string ''s'' for which there is a proof in '''S''' of the formula "''K''(''s'') ≥ ''n''" (which we assume can be formalized in '''S''').
We can find an effective enumeration of all the formal proofs in '''S''' by some procedure
'''function''' NthProof('''int''' ''n'')
which takes as input ''n'' and outputs some proof. This function
enumerates all proofs. Some of these are proofs for formulas we do not
care about here (examples of proofs which will be listed by the procedure NthProof are the various known proofs of the [[law of quadratic reciprocity]], those of [[Fermat's little theorem]] or the proof of [[Fermat's last theorem]] all translated into the formal language of '''S'''). Some of these are complexity formulas of the form ''K''(''s'') ≥ ''n'' where ''s'' and ''n'' constants in the language of '''S'''. There is a program
'''function''' NthProofProvesComplexityFormula('''int''' ''n'')
which determines whether the ''n''<sup>th</sup> proof actually proves
a complexity formula ''K''(''s'') ≥ ''L''. The strings ''s'' and
the integer ''L'' in turn are computable by programs:
'''function''' StringNthProof('''int''' ''n'')
'''function''' ComplexityLowerBoundNthProof('''int''' ''n'')
Consider the following program
'''function''' GenerateProvablyComplexString('''int''' ''n'')
'''for''' i = 1 to infinity:
'''if''' NthProofProvesComplexityFormula(i) '''and''' ComplexityLowerBoundNthProof(i) >= ''n''
'''return''' StringNthProof(''i'')
'''quit'''
Given an ''n'', this program tries every proof until it finds a string
and a proof in the formal system '''S''' of the formula ''K''(''s'') ≥ ''n''. The program
terminates by our '''Assumption (X)'''. Now this program has a length ''U''.
There is an integer ''n''<sub>0</sub> such that ''U'' + log<sub>2</sub>(''n''<sub>0</sub>) + ''C'' < ''n''<sub>0</sub>, where ''C'' is the overhead cost of
'''function''' GenerateProvablyParadoxicalString()
'''return''' GenerateProvablyComplexString(''n''<sub>0</sub>)
'''quit'''
The program GenerateProvablyParadoxicalString outputs a string ''s'' for which ''K''(''s'') ≥
''n''<sub>0</sub> can be formally proved in '''S'''. In particular ''K''(''s'') ≥
''n''<sub>0</sub> is true. However, ''s'' is also described by a program of length
''U''+log<sub>2</sub>(''n''<sub>0</sub>)+''C'' so its complexity is less than ''n''<sub>0</sub>. This contradiction proves '''Assumption (X)''' cannot hold.
Similar ideas are used to prove the properties of [[Chaitin's constant]].
The [[minimum message length]] principle of statistical and inductive inference and machine learning was developed by [[Chris Wallace (computer scientist)|C.S. Wallace]] and D.M. Boulton in 1968. MML is [[Bayesian probability|Bayesian]] (it incorporates prior beliefs) and
information-theoretic. It has the desirable properties of statistical
invariance (the inference transforms with a re-parametrisation, such as from
polar coordinates to Cartesian coordinates), statistical consistency (even
for very hard problems, MML will converge to any underlying model) and efficiency (the MML model will converge to any true underlying model about as quickly as is possible). [http://www.csse.monash.edu.au/~dld/CSWallacePublications/ C.S. Wallace] and D.L. Dowe showed a formal connection between MML and algorithmic information theory (or Kolmogorov complexity) in 1999.
==Kolmogorov randomness==
''Kolmogorov randomness'' (also called ''algorithmic randomness'') defines a [[string (computer science)|string]] (usually of [[bit]]s) as being [[randomness|random]] if and only if it is shorter than any [[computer program]] that can produce that string. This definition of randomness is critically dependent on the definition of Kolmogorov complexity. To make this definition complete, a computer has to be specified, usually a [[Turing machine]]. According to the above definition of randomness, a random string is also an "incompressible" string, in the sense that it is impossible to give a representation of the string using a program whose length is shorter than the length of the string itself. However, according to this definition, most strings shorter than a certain length end up to be (Chaitin-Kolmogorovically) random because the best one can do with very small strings is to write a program which simply prints these strings.
==See also==
*[[Berlekamp-Massey algorithm]]
*[[Chaitin–Kolmogorov randomness|Chaitin-Kolmogorov randomness]]
*[[Data compression]]
*[[List of important publications in computer science#algorithmic information theory|Important publications in algorithmic information theory]]
*[[Levenshtein distance]]
== References ==
* Thomas M. Cover, Joy A. Thomas. ''Elements of information theory'', 1st Edition. New York: Wiley-Interscience, 1991. ISBN 0-471-06259-6.
:2nd Edition. New York: Wiley-Interscience, 2006. ISBN 0-471-24195-4.
* Rónyai Lajos, Ivanyos Gábor, Szabó Réka, ''Algoritmusok''. TypoTeX, 1999.
* Ming Li and Paul Vitányi, ''An Introduction to Kolmogorov Complexity and Its Applications'', Springer, 1997. [http://citeseer.ist.psu.edu/li97introduction.html Introduction chapter full-text].
* Yu Manin, ''A Course in Mathematical Logic'', Springer-Verlag, 1977.
* Michael Sipser, ''Introduction to the Theory of Computation'', PWS Publishing Company, 1997.
==External links==
* [http://www.kolmogorov.com/ The Legacy of Andrei Nikolaevich Kolmogorov]
* [http://www.cs.umaine.edu/~chaitin/ Chaitin's online publications]
* [http://www.idsia.ch/~juergen/ray.html Solomonoff's IDSIA page]
* [http://www.idsia.ch/~juergen/kolmogorov.html Generalizations of algorithmic information] by [[Juergen Schmidhuber|J. Schmidhuber]]
* [http://homepages.cwi.nl/~paulv/kolmogorov.html Ming Li and Paul Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition, Springer Verlag, 1997.]
* [http://homepages.cwi.nl/~tromp/cl/cl.html Tromp's lambda calculus computer model offers a concrete definition of K()]
* Universal AI based on Kolmogorov Complexity ISBN 3-540-22139-5 by [[Marcus Hutter|M. Hutter]]: ISBN 3-540-22139-5
* [http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/pdf/420270.pdf Minimum Message Length and Kolmogorov Complexity] (by [http://www.csse.monash.edu.au/~dld/CSWallacePublications C.S. Wallace] and [http://www.csse.monash.edu.au/~dld D.L. Dowe], Computer Journal, Vol. 42, No. 4, 1999).
* [http://www.csse.monash.edu.au/~dld David Dowe]'s [http://www.csse.monash.edu.au/~dld/MML.html Minimum Message Length (MML)] and [http://www.csse.monash.edu.au/~dld/Occam.html Occam's razor] pages.
* P. Grunwald, M. A. Pitt and I. J. Myung (ed.), [http://mitpress.mit.edu/catalog/item/default.asp?sid=4C100C6F-2255-40FF-A2ED-02FC49FEBE7C&ttype=2&tid=10478 Advances in Minimum Description Length: Theory and Applications], M.I.T. Press, April 2005, ISBN 0-262-07262-9.
[[Category:Algorithmic information theory|*]]
[[Category:Information theory|*]]
[[Category:Recursion theory]]
[[Category:Theory of computation]]
[[de:Kolmogorow-Komplexität]]
[[fa:نظریه الگوریتمی اطلاعات]]
[[fr:Complexité de Kolmogorov]]
[[gl:Complexidade de Kolmogorov]]
[[he:סיבוכיות קולמוגורוב]]
[[ja:コルモゴロフ複雑性]]
[[pl:Złożoność Kołmogorowa]]
[[pt:Complexidade de Kolmogorov]]
[[ru:Колмогоровская сложность]]
[[tr:Kolmogorov karmaşıklığı]]
[[zh:柯氏复杂性]]