Although the library is no longer being developed, it is still worth learning since many prominent papers use Theano. Besides, there is a lot to learn from its implementation.
So, Theano basically builds symbolic computation graphs. Plus, it can compute gradients automatically.
Learning Materials
Installation
sudo pip install theano
To fix the inconsistency between float32 and float64, set in .theanorc:
[global]
floatX = float32
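You can check that the setting took effect at runtime:
import theano
print(theano.config.floatX)  # 'float32'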
Getting Started
There are several conventions:
import theano.tensor as T
from theano import function
- T stands for tensor; it contains the tensor operations.
- function constructs a function from the given inputs, outputs, and other properties.
- function([input], output): the inputs are always passed as a list, even when there is only one argument.
Algebra
Scalars
x = T.dscalar('x')
y = T.dscalar('y')
z = x + y
f = function([x, y], z)
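Calling the compiled function evaluates the graph with concrete values:
f(2, 3)  # array(5.0)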
To create a scalar variable, use T.<type>scalar('variable_name'), where <type> stands for the type of the variable. The prefixes b, i, f, d, c are used for byte, integer, float, double, and complex respectively. By the way, there are 7 primitive types in Theano: byte (b), 16-bit integer (w), 32-bit integer (i), 64-bit integer (l), float (f), double (d), complex (c).
Use pp to pretty-print a Theano symbolic expression.
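For example, pretty-printing the z defined above:
from theano import pp
print(pp(z))  # '(x + y)'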
Function
This damn thing seems very important. Let's break it down and see how to manipulate it. Beyond the basic arguments, inputs and outputs, there are two fancier ones (see the sketch after this list):
- givens: pairs of (Var1, Var2); the function will substitute Var1 with Var2 in the graph.
- updates: update rules for shared variables, applied after each call.
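A minimal sketch of both, following the accumulator example from the Theano tutorial:
import theano
import theano.tensor as T

# updates: a shared variable used as accumulator state.
state = theano.shared(0)
inc = T.iscalar('inc')
accumulator = theano.function([inc], state, updates=[(state, state + inc)])
accumulator(1)   # returns 0, then state becomes 1
accumulator(10)  # returns 1, then state becomes 11

# givens: substitute x with a constant when compiling.
x = T.dscalar('x')
f = theano.function([], x * 2, givens={x: T.constant(21.0)})
f()              # array(42.0)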
To break a function down or debug it, pydotprint visualizes the function as a graph. By far, this is the most intuitive way to examine a function.
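A quick sketch of how to call it (it needs the pydot and graphviz packages installed; the output file name here is arbitrary):
import theano
import theano.tensor as T

x, y = T.dscalars('x', 'y')
f = theano.function([x, y], x + y)
# Writes the compiled graph as an image.
theano.printing.pydotprint(f, outfile='f_graph.png')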
Data Management
Must-read material: Understanding Memory Aliasing for Speed and Correctness. There are some takeaway notes:
- Use borrow=True when creating new shared variables.
- Use borrow=False when retrieving the value of a shared variable; this is also the default value of borrow in get_value.
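A small sketch of both rules:
import numpy as np
import theano

data = np.zeros((3, 3), dtype=theano.config.floatX)

# borrow=True: the shared variable may alias `data` instead of copying it,
# so `data` must not be mutated elsewhere afterwards.
s = theano.shared(data, borrow=True)

# get_value() defaults to borrow=False and returns a copy,
# so mutating `v` cannot corrupt the internal buffer.
v = s.get_value()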
Tensor Operations
Nondifferentiable functions
Firstly, let's take a look at the sgn function. Its gradient is zero everywhere. This means that if we put the sign function into a network, which is especially common in hashing problems, all components before it are unable to update their weights. For instance, if the loss function is that of an autoencoder, namely:
$$ L = \left\lVert f(sgn(g(X))) - X \right\rVert $$
where $g$ and $f$ are the encoder and decoder, respectively, we could not learn the encoder at all, since no gradient flows back through $sgn$.
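This is easy to verify directly:
import theano
import theano.tensor as T

x = T.dscalar('x')
g = theano.function([x], T.grad(T.sgn(x), x))
g(0.5)  # array(0.0): the gradient of sgn is zero almost everywhere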
Gradients
- To treat a variable as a constant when computing gradients, use theano.gradient.zero_grad, as sketched below.
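A minimal sketch, holding one factor of a product fixed during differentiation:
import theano
import theano.tensor as T
from theano.gradient import zero_grad

x = T.dscalar('x')
y = x * zero_grad(x)  # the second factor is treated as a constant

g = theano.function([x], T.grad(y, x))
g(3.0)  # array(3.0) instead of 6.0: d/dx (x * c) = c, with c = x held fixed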