Quantum-mechanical thermodynamics is the heart of thermodynamics, accounting for such
phenomena as spin-statistics, correct Boltzmann counting, and most of the interesting
characteristics of many-body systems, such as superfluidity and Fermi pressure. At its most basic
level, putting this discipline in an information-theoretic context requires generalizing the idea of
information.
Because of the intricate correlations that quantum systems can possess, they are difficult to
interpret within the bounds of classical information theory. But a complete theory which
accounts for both the quantum and classical cases has been developed (Cer97b). The von
Neumann entropy for a quantum system, represented by a density matrix ρA, where A is a
quantum system or quantum source,
$$H(A) = -\mathrm{Tr}\,\rho_A \log \rho_A$$
is familiar to students of statistical mechanics for being a measure of the disorder of a quantum
system (see the explanation in I.1.i.), but it was recently established that H(A) is the minimum
number of qubits needed to losslessly code an ensemble of quantum states, thus giving
information-theoretic importance to this representation – see the argument later in this section
for a summary of the proof (Sch95). Further, if ρΑ is composed of orthogonal quantum states it
clearly reduces to a classical Shannon entropy H(A), since we can just diagonalize ρA in the
orthogonal basis; we will discuss this fact later in the context of quantum channels. For two
quantum systems A, B, the joint, conditional, and mutual entropies are defined as
follows, where ρAB is the joint density matrix:
$$H(A,B) = -\mathrm{Tr}\{\rho_{AB} \log \rho_{AB}\}$$
$$H(A|B) = -\mathrm{Tr}\{\rho_{AB} \log \rho_{A|B}\}$$
$$I(A,B) = -\mathrm{Tr}\{\rho_{AB} \log \rho_{A;B}\}$$
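As a small numerical illustration of the von Neumann entropy defined above, the following sketch evaluates it from the eigenvalues of a density matrix. The helper names and the two test ensembles are assumptions made for this example, and logarithms are taken to base 2 so that the result is in bits. For orthogonal signal states the value coincides with the classical Shannon entropy of the mixing probabilities; for overlapping states it is smaller.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Return -Tr(rho log2 rho) in bits for a Hermitian density matrix."""
    eigenvalues = np.linalg.eigvalsh(rho)
    # Drop numerically zero eigenvalues, using the convention 0 log 0 = 0.
    eigenvalues = eigenvalues[eigenvalues > 1e-12]
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))

def ensemble_density_matrix(probabilities, states):
    """Mix the pure states |s><s| with the given probabilities."""
    return sum(p * np.outer(s, np.conj(s)) for p, s in zip(probabilities, states))

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket_plus = (ket0 + ket1) / np.sqrt(2)

# Orthogonal signal states: entropy equals the Shannon entropy of (0.25, 0.75).
rho_orthogonal = ensemble_density_matrix([0.25, 0.75], [ket0, ket1])
# Non-orthogonal signal states with equal weights: entropy falls below 1 bit.
rho_overlapping = ensemble_density_matrix([0.5, 0.5], [ket0, ket_plus])

print(von_neumann_entropy(rho_orthogonal))
print(von_neumann_entropy(rho_overlapping))
```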
The definitions of ρA|B and ρA;B are surprisingly subtle. They are given by
$$\rho_{A|B} = 2^{-\sigma_{AB}}, \qquad \rho_{A;B} = \lim_{n\to\infty}\left[(\rho_A \otimes \rho_B)^{1/n}\,\rho_{AB}^{-1/n}\right]^{n}$$
respectively, where
σAB = 1A ⊗ log2ρB − log2ρAB
and
ρΑ = TrB{ρΑΒ}, ρΒ = TrA{ρΑΒ}.
(As noted earlier, TrB means to take the partial trace of the density matrix over B; e.g.,
$$\langle a|\rho_A|a'\rangle = \sum_b \langle ab|\rho_{AB}|a'b\rangle .$$
This is where the duality between projection operators and density matrices becomes useful.)
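The partial trace and the operator σAB = 1A ⊗ log2ρB − log2ρAB can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the test state (a Bell state mixed with 10% white noise, chosen so that ρAB has full rank and its logarithm is finite) is an assumption, not a state taken from the text. It evaluates H(A|B) = Tr{ρAB σAB} and the smallest eigenvalue of σAB.

```python
import numpy as np

def partial_trace(rho_ab, dim_a, dim_b, keep):
    """Trace out one subsystem of rho_AB; keep is 'A' or 'B'."""
    r = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)  # indices a, b, a', b'
    if keep == "A":
        return np.einsum("abcb->ac", r)  # sum over b = b'
    return np.einsum("abad->bd", r)      # sum over a = a'

def log2_hermitian(m):
    """Base-2 matrix logarithm of a positive-definite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log2(w)) @ v.conj().T

def entropy(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

# Assumed test state: Bell state (|00> + |11>)/sqrt(2) with 10% white noise.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_ab = 0.9 * np.outer(bell, bell) + 0.1 * np.eye(4) / 4

rho_b = partial_trace(rho_ab, 2, 2, keep="B")
sigma_ab = np.kron(np.eye(2), log2_hermitian(rho_b)) - log2_hermitian(rho_ab)

# H(A|B) = -Tr{rho_AB log2 rho_A|B} = Tr{rho_AB sigma_AB}
h_cond = float(np.trace(rho_ab @ sigma_ab).real)
min_sigma_eigenvalue = float(np.linalg.eigvalsh(sigma_ab).min())

print("H(A|B) =", h_cond, " H(A,B) - H(B) =", entropy(rho_ab) - entropy(rho_b))
print("smallest eigenvalue of sigma_AB =", min_sigma_eigenvalue)
```

For this entangled test state both printed quantities on the first line agree, and both H(A|B) and the smallest eigenvalue of σAB come out negative.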
The justification of these choices is basically by analogy: as ρΑΒ becomes diagonal, for example,
the quantum conditional entropy converges to its classical value; also, ρA;B satisfies the classical
identity I(A,B) = H(A) + H(B) − H(A,B). But ρA|B and ρA;B are not true density matrices. For
example, ρA|B may have eigenvalues greater than 1 when σAB has negative eigenvalues; in such
situations one obtains negative conditional entropies! It has been shown that a separable density
matrix ρAB (i.e., one that can be expanded in the form
$$\rho_{AB} = \sum_i a_i\, \rho_A^{i} \otimes \rho_B^{i}$$
and therefore can’t be told apart from a classical ensemble containing two completely decoupled
systems) must have positive semidefinite σAB and thus positive conditional entropies (Cer97a),
but sufficiently entangled systems can indeed guarantee strange behavior like negative
conditional entropies. And oddly enough, I(A, B) can exceed the entropy of each individual set of
variables! The above definitions and generalizations thereof have been applied to quantum
teleportation and superdense coding (i.e., where two classical bits are encoded in one qubit) in
the paper (Cer97b). In this paper the authors treat entangled qubits as virtual conjugate particles
e, ē (collectively called ebits, or entangled qubits), carrying opposite information content (±1
qubit each), and suggest that ē may be visualized as e going backwards through time! One can,
for example, draw a Feynman diagram for superdense coding (consumption of a shared entangled
state, plus transmission of one qubit, is equivalent to transmission of two classical bits). Thus
these ‘information quanta’ suddenly have many properties analogous to those of virtual particles
in relativistic quantum field theory – in particular, only through interactions that result in
classical bits may they become visible to the world. Importantly, this formalism allows an
intuitive picture of conserved information flows. Ebits can be created simply by transmitting one
qubit of a Bell pair while keeping the other, but one cannot create a qubit directly from an ebit
(some classical bits are required). An interesting point is that if the ebit used for teleportation or
superdense coding is only partly entangled, then the transmission will appear noisy.
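The superdense coding balance mentioned above (one shared ebit plus one transmitted qubit carries two classical bits) can be simulated with a few matrices. This is a minimal sketch under stated assumptions: the encoding convention Z^bz X^bx, the choice of Bell state, and the function name are choices made for this example rather than details taken from (Cer97b).

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
# CNOT with qubit A (the first tensor factor) as control and B as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def superdense_roundtrip(bit_z, bit_x):
    """Send two classical bits using one shared ebit plus one transmitted qubit."""
    # Shared ebit: (|00> + |11>)/sqrt(2), first qubit held by A, second by B.
    state = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    # A encodes both bits with a local unitary acting on her qubit only.
    encode = np.linalg.matrix_power(Z, bit_z) @ np.linalg.matrix_power(X, bit_x)
    state = np.kron(encode, I2) @ state
    # A's qubit is transmitted; B now holds both qubits and rotates the
    # Bell basis onto the computational basis before measuring.
    state = np.kron(H, I2) @ (CNOT @ state)
    probabilities = np.abs(state) ** 2
    outcome = int(np.argmax(probabilities))
    return outcome >> 1, outcome & 1, float(probabilities[outcome])

for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(bits, "->", superdense_roundtrip(*bits))
```

Each of the four messages maps to a distinct Bell state, so B's measurement recovers both bits with certainty when the shared pair is maximally entangled.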
The Schmidt decomposition, sometimes called the polar decomposition, is a way of representing
an entangled pure state in the form
$$|\psi\rangle = \sum_{i=1}^{d} c_i\, |a_i\rangle \otimes |b_i\rangle$$
where the ci are positive real coefficients, and |ai⟩, |bi⟩ are appropriately chosen orthonormal
states of two subsystems A, B, which may perhaps be located far apart from one another
(Ben95b) (Sch95). It is useful to note that local operations cannot increase the number of
nonzero terms in the Schmidt decomposition; i.e., one cannot create entanglement with purely
local operations (although by consuming other entangled bits, one can increase the entanglement
of another subsystem). Local unitary operations on subsystem A can only change the
eigenvectors of A (not the eigenvalues), and cannot affect B’s observations at all. As noted
above, in this basis σAB is positive semidefinite. If A makes an observation of his part of the
system, this is equivalent to him first tracing over the degrees of freedom of B’s subsystem
(which results in an apparent mixed state of his subsystem, represented by a diagonal density
matrix ρA in the Schmidt basis), then making an observation on this reduced density matrix:
$$\rho_A = \mathrm{Tr}_B\,|\psi\rangle\langle\psi| = \sum_{i=1}^{d} c_i^{2}\, |a_i\rangle\langle a_i|$$
Then the entropy of entanglement is defined to be
$$E = -\mathrm{Tr}\,\rho_A \log(\rho_A) = -\mathrm{Tr}\,\rho_B \log(\rho_B) = -\sum_{i=1}^{d} c_i^{2} \log(c_i^{2}) .$$
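Numerically, the Schmidt coefficients of a bipartite pure state are the singular values of its amplitude matrix, which gives a direct way to evaluate E. The sketch below assumes base-2 logarithms (so E is in ebits) and uses hypothetical function names and test states chosen for illustration.

```python
import numpy as np

def schmidt_coefficients(psi, dim_a, dim_b):
    """Schmidt coefficients c_i of a bipartite pure state vector psi."""
    # Reshape the amplitudes psi[a*dim_b + b] into a dim_a x dim_b matrix;
    # its singular values are the Schmidt coefficients.
    return np.linalg.svd(np.reshape(psi, (dim_a, dim_b)), compute_uv=False)

def entanglement_entropy(psi, dim_a, dim_b):
    """E = -sum_i c_i^2 log2(c_i^2), in ebits."""
    p = schmidt_coefficients(psi, dim_a, dim_b) ** 2
    p = p[p > 1e-12]  # convention 0 log 0 = 0
    return float(-np.sum(p * np.log2(p)))

product = np.kron([1.0, 0.0], [0.6, 0.8])            # |0> (0.6|0> + 0.8|1>)
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
theta = np.pi / 8
partial = np.array([np.cos(theta), 0.0, 0.0, np.sin(theta)])

for name, psi in [("product", product), ("bell", bell), ("partial", partial)]:
    print(name, entanglement_entropy(psi, 2, 2))
```

The direct product gives E = 0, the Bell state gives E = 1, and the partially entangled state cos θ |00⟩ + sin θ |11⟩ lies strictly in between.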
Note that it is just the Shannon entropy of the squared Schmidt coefficients ci², i.e., of the
eigenvalues of ρA. With logarithms taken to base 2, E = 0 for a direct product, and E = 1 for a
Bell state. It can be proven (Pop97) that this is
the unique measure of entanglement for pure states. The argument, like Shannon’s definition of
information, is based on two axioms: it is impossible to increase entanglement purely by local
operations, and it is possible to reversibly transfer entanglement from some set of shared ebits to
previously unentangled qubits (leaving the original ebits unentangled) (Ben96c). (The last
statement is true asymptotically; one can reversibly transform k systems in one entangled state
into n systems in pure singlet states only in the limit n, k → ∞; in fact n/k is often asymptotically
an irrational number, so due to incommensurability such reversible procedures can’t even exist
except in the infinite limit!) Formally these two statements are akin to the Second Law of
thermodynamics and the statement that all reversible heat engines are equally efficient, so the
derivation of the entropy of entanglement is completely analogous to the derivation of entropy in
thermodynamics! Thus, proceeding as in thermodynamics, we require the degree of
entanglement to be extensive, measure the entanglement of an arbitrary state by
transforming it into a set of singlet states (which are defined to have entanglement equal to 1),
and so on. Unfortunately this definition is only good in the quantum analogy to the
‘thermodynamic limit,’ meaning access to infinitely many pure ebits, which rarely occurs in real
life. Also, nobody has come up with a measure of entanglement for a mixed state, since no one
has exhibited a reversible way to convert a density matrix into pure states; candidates include the
number of singlets that can be packed into a density matrix, or the number which can be
extracted – sometimes these two values can be different (Ben96d). Therefore many other
definitions may be more practical, if appropriately justified; it is difficult to imagine what
physical manifestation this measure of mixed-state entanglement might take. Indeed, the recent
literature indicates that even respectable scientists occasionally need reprimanding due to the
ambiguity of dealing with issues as nonintuitive as entanglement (Unr).
One interesting manifestation of these results is that two people with many partially-entangled
qubits can increase the degree of entanglement of a few qubits to arbitrary purity, at the expense
of the others. Note that this is not in contradiction of our above statements, since only the total
entanglement is required to be conserved. For a large number n of entangled pairs of two-state
particles, each with entropy of entanglement E < 1, the yield of pure singlets goes like
nE − O(log n) (compare this to the pure state yield of bulk spin resonance quantum computation,
below). The process of Schmidt projection is as follows: suppose that the initial system is in the
product state
$$|\psi\rangle = \prod_{i=1}^{n} \left(\cos\theta\, |a_1^{i} b_1^{i}\rangle + \sin\theta\, |a_2^{i} b_2^{i}\rangle\right)$$
which has 2^n terms, each with one of n + 1 distinct coefficients cos(θ)^(n−k) sin(θ)^k. We can treat
these states as n + 1 orthogonal subspaces, labeled by the power k of sin(θ) in the coefficient. A
and B project the state ψ onto one of these subspaces by making an observation and obtaining
some k; this collapses the state down into a maximally entangled state, which occupies a
2·n!/((n−k)! k!)-dimensional subspace of the original space; if lucky, one can get even more ebits than one
started with (but the expected entanglement cannot be greater than before). It’s analogous to the
method of taking a biased coin, say with probability of heads .3, and getting an unbiased random
bit by flipping the coin and keeping sequences HT = 1 and TH = 0 (probability .21 each), and
discarding TT and HH (probability .49 and .09, respectively). In our case, measuring a power k
symmetrizes our state by selecting all possible combinations with some equal probability
(although we cannot actively choose that probability) and discarding all the states with
probabilities that differ from it. One can transform this new state formally into singlets as
follows: measure k for each of m batches, each containing n entangled pairs, getting ki, i = 1..m.
Let Dm be the product of all the coefficients n!/(ki!(n−ki)!), i = 1..m, and continue until
Dm ∈ [2^l, 2^l(1+ε)], where ε is the desired maximum error in the entanglement, and l is some
integer. Then
A and B make a measurement on the system which projects it into one of two subspaces, one of
dimension 2^(l+1) (which is in a maximally entangled state of two 2^l-dimensional subsystems) and
one of dimension 2(Dm − 2^l) (which discards all the entanglement). In the former case, we have
again effectively symmetrized the state; in the latter, we lose everything. (Such is the risky
nature of quantum engineering.) Finally, another round of Schmidt decomposition arranges the
density matrix into a product of l singlets.
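The statistics of the first step of Schmidt projection (measuring k on one batch of n pairs) can be tabulated directly. The sketch below assumes the standard counting in which outcome k occurs with probability C(n,k) cos(θ)^(2(n−k)) sin(θ)^(2k) and leaves a maximally entangled state worth log2 C(n,k) ebits; the function name is hypothetical, and the direct summation is only suitable for moderate n (up to a few hundred pairs).

```python
from math import comb, cos, sin, log2

def schmidt_projection_stats(n, theta):
    """Return (n*E, expected ebits after measuring k) for n identical pairs."""
    p1, p2 = cos(theta) ** 2, sin(theta) ** 2
    # Entropy of entanglement of a single pair, E = -sum c_i^2 log2 c_i^2.
    e_single = -(p1 * log2(p1) + p2 * log2(p2))
    expected = 0.0
    for k in range(n + 1):
        multiplicity = comb(n, k)                         # terms sharing power k
        prob_k = multiplicity * p1 ** (n - k) * p2 ** k   # chance of outcome k
        expected += prob_k * log2(multiplicity)           # ebits in that subspace
    return n * e_single, expected

for n in (10, 100, 400):
    total, expected = schmidt_projection_stats(n, theta=0.5)
    print(n, round(total, 3), round(expected, 3), round(total - expected, 3))
```

The expected yield never exceeds nE, and the shortfall (the last printed column) grows only slowly with n, consistent with the nE − O(log n) behavior quoted above.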
Quantum channels, over which qubit messages are sent, are a useful abstraction. In classical
information theory, a source A produces messages ai with probability p(ai), and the fidelity of the
coding system is the probability that the decoded message is the same as the transmitted one; in
quantum information theory a quantum source codes each message a from a source A into a
signal state |aM⟩ of a system M; the ensemble of messages is then representable by a density
matrix
$$\rho = \sum_a p(a)\,\pi_a, \qquad \text{where } \pi_a = |a_M\rangle\langle a_M| ,$$
and the fidelity for a channel where ra is received when πa is transmitted is defined to be
$$f = \sum_a p(a)\,\mathrm{Tr}(\pi_a r_a)$$
(so that for perfect noiseless channels f = 1) (Sch95). Compare this to our definition of classical
probability fidelity in I.1.ii.; we will unite these pictures in the next paragraph. Another possible
definition of a quantum channel W, which lends itself more to problems involving noise, is to
consider a channel to be a probability distribution on unitary transformations U which act upon
Hsignal ⊗ Henvironment; before reading a quantum channel, one must take a partial trace over
Henvironment, and one can define the fidelity as
$$\min_{\text{all signals } |x\rangle} \langle x|W|x\rangle$$
(Cal96). Note that there are subtleties for the quantum channel which one does not have to
consider in the classical case: for example, one cannot copy an arbitrary quantum signal (the
“no-cloning” theorem (Woo82)); the proof follows immediately from the linearity of quantum
mechanics. Only if the signals are orthogonal can they be copied (i.e., measuring a system which
is known to be in an eigenstate is trivial, and yields all the information available about the
system). Fundamental to quantum communication is the idea of transposition, or placing a
system Y in the same state as a system X (perhaps resetting X to a useless state in the process);
unitary transposition from system X to system Y can occur iff the states in X have the same inner
products as the corresponding states in Y, e.g. ⟨aX|bX⟩ = ⟨aY|bY⟩, which is possible iff the Hilbert
space of Y is of dimension no less than the Hilbert space of X, for obvious reasons. If a message
is unitarily transposed from a source A to a system M, which upon transmission is decoded into a
message A’ (on an isomorphic quantum system) by the inverse of the unitary coding operator,
A → M → A’
then we say M is the quantum channel for the communication. The question arises, when is the
quantum channel M big enough for a particular quantum source A?