adapted from http://deeplearning.net/software/theano/tutorial/
Theano Tutorial
Basic Algebra
Adding two scalars
import theano.tensor as T
from theano import function
x = T.fscalar('x')
y = T.fscalar('y')
By calling T.fscalar with a string argument, we create a variable representing a single-precision floating-point scalar with the given name.
z = x + y
z is another variable representing the addition of x and y. We can use the pp function to pretty-print the computation associated with z:
from theano import pp
print(pp(z))
The last step is to create a function taking x and y as inputs and giving z as output:
f = function([x, y], z)
The first argument to function is a list of variables that will be provided as inputs to the function. The second argument is a single variable or a list of variables. For either case, the second argument is what we want to see as output when we apply the function. f may then be used like a normal Python function.
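For example (a quick check; the returned values are NumPy arrays):
f(2, 3)        # 5.0
f(16.3, 12.1)  # about 28.4, up to float32 rounding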
Adding two matrices
x = T.fmatrix('x')
y = T.fmatrix('y')
z = x + y
f = function([x, y], z)
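For example, f can be applied to lists of lists or to NumPy arrays (a quick check; results are float32 because of fmatrix):
f([[1, 2], [3, 4]], [[10, 20], [30, 40]])
# roughly array([[11., 22.],
#                [33., 44.]])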
More examples
http://deeplearning.net/software/theano/tutorial/examples.html
Logistic function
The logistic function is $s(x) = 1 / (1 + e^{-x})$, applied elementwise:
x = T.fmatrix('x')
s = 1 / (1 + T.exp(-x))
# even for ONE argument, we have to use []
logistic = function([x], s)
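A quick check (values rounded to a few decimal places):
logistic([[0, 1], [-1, -2]])
# roughly array([[0.5,   0.731],
#                [0.269, 0.119]])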
Setting a default value for an argument
from theano import Param
x, y = T.fscalars('x', 'y')
z = x + y
f = function([x, Param(y, default=1)], z)
f(10)
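Here f(10) evaluates to 11.0 because y falls back to its default of 1; the default can still be overridden (a quick check):
f(10, 5)  # 15.0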
Using shared variables
For example, we want to make an accumulator: at the beginning, the state is initialized to zero. Then, on each function call, the state is incremented by the function's argument.
Let's first define the accumulator function. It adds its argument to the internal state, and returns the OLD state value:
from theano import shared
state = shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state+inc)])
The shared function constructs shared variables. Their value may be shared between multiple functions. The value can be accessed and modified with the .get_value() and .set_value() methods.
As a parameter of function, updates must be supplied with a list of pairs of the form (shared-variable, new expression). It can also be a dictionary whose keys are shared variables and values are the new expressions. It means "whenever this function runs, it will replace the .value of each shared variable with the result of the corresponding expression". Above, the accumulator replaces the state value with the sum of the state and the increment amount.
.set_value() can be used to reset the state.
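A short usage sketch of the accumulator (the values in the comments are what these calls should return, since the function returns the old state):
state.get_value()    # 0
accumulator(1)       # returns 0, the old state
state.get_value()    # 1
accumulator(300)     # returns 1
state.get_value()    # 301
state.set_value(-1)  # reset the state
accumulator(3)       # returns -1
state.get_value()    # 2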
Why do we need theano.shared? For efficiency. Updates to shared variables can sometimes be done more quickly using in-place algorithms. Also, Theano has more control over where and how shared variables are allocated, which is important for GPU use.
Using random numbers
The way to think about putting randomness into Theano's computations is to put random variables in the graph. Theano will allocate a NumPy RandomStream object (a random number generator) for each such variable, and draw from it as necessary. This sort of sequence of random numbers is called a random stream. Random streams are at their core shared variables, so the observations on shared variables hold here as well.
An example is:
from theano.tensor.shared_randomstreams import RandomStreams
from theano import function
srng = RandomStreams(seed=234)
rv_u = srng.uniform((2,2))
rv_n = srng.normal((2, 2))
f = function([], rv_u)
g = function([], rv_n, no_default_updates=True) # not updating rv_n.rng
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
rv_u is a random stream of 2x2 matrices of draws from a uniform distribution. rv_n is a random stream of 2x2 matrices of draws from a normal distribution.
An important remark is that a random variable is drawn at most once during any single function execution. So the nearly_zeros function is guaranteed to return approximately 0 even though the rv_u random variable appears three times in the output expression.
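A quick check of this behaviour (the actual numbers depend on the seed):
f_val0 = f()
f_val1 = f()    # different numbers from f_val0: rv_u's generator was updated
g_val0 = g()
g_val1 = g()    # same numbers as g_val0: no_default_updates left rv_n's generator untouched
nearly_zeros()  # a 2x2 matrix of (approximately) zeros: rv_u is drawn only once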
seeding streams
We can seed one random variable by seeding or assigning to its .rng attribute, using .rng.set_value():
rng_val = rv_u.rng.get_value(borrow=True) # get the rng for rv_u
rng_val.seed(12345) # seed the generator
rv_u.rng.set_value(rng_val, borrow=True) # assign back seeded rng
sharing streams between functions
A slightly tricky example:
state_after_v0 = rv_u.rng.get_value().get_state()
v0 = f()
v1 = f()
rng = rv_u.rng.get_value(borrow=True)
rng.set_state(state_after_v0)
rv_u.rng.set_value(rng, borrow=True)
v2 = f() # v2 == v0
v3 = f() # v3 == v1
Copying random state between Theano graphs
Graph structures
The first step in writing Theano code is to write down all mathematical relations using symbolic placeholders (variables). When writing down these expressions you use operations like +, -, *, sum(), tanh(). All these are represented internally as ops. An op represents a certain computation on some type of inputs producing some type of output. You can see it as a function definition in most programming languages.
Theano builds internally a graph structure composed of interconnected variable nodes, op nodes, and apply nodes. An apply node represents the application of an op to some variables.
x = T.fmatrix('x')
y = T.fmatrix('y')
z = x + y
The graph can be traversed starting from outputs (the result of some computation) down to its inputs using the owner field.
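A minimal sketch of such a traversal for the graph above (the printed op representation may vary between Theano versions):
apply_node = z.owner       # the apply node that produced z
print(apply_node.op)       # the elementwise addition op
print(apply_node.inputs)   # [x, y]
print(x.owner)             # None: x is an input variable, nothing computed it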
Automatic differentiation
tensor.grad() will traverse the graph from the outputs back towards the inputs through all apply nodes (apply nodes are those that define which computations the graph does). For each such apply node, its op defines how to compute the gradient of the node's outputs with respect to its inputs.
Derivatives in Theano
Computing the gradient
Say we want to compute the gradient of $x^2$ with respect to x, i.e. $d(x^2)/dx = 2x$:
from theano import pp
x = T.fscalar('x')
y = x ** 2
gy = T.grad(y, x)
print(pp(gy))
f = function([x], gy)
From print(pp(gy)) we can see the term fill((x ** TensorConstant{2}), TensorConstant{1.0}), which means to make a tensor of the same shape as x**2 and fill it with 1.0.
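Evaluating the compiled function confirms that the gradient is 2*x (a quick check):
f(4)     # 8.0
f(94.2)  # about 188.4, up to float32 rounding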
Computing the Jacobian
Jacobian designates the tensor comprising the first partial derivatives of the output of a function with respect to its inputs. theano.gradient.jacobian() will compute it automatically.
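A small sketch of using it, with y = x ** 2 computed elementwise so the Jacobian is diagonal (the variable names here are just for illustration):
import theano
x = T.dvector('x')
y = x ** 2
J = theano.gradient.jacobian(y, x)
f_jacobian = theano.function([x], J)
f_jacobian([4, 4])
# roughly array([[8., 0.],
#                [0., 8.]])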
Loop
Simple loop with accumulation: computing $A^{k}$
Given k we want to get A**k
using a loop:
result = 1
for i in xrange(k):
    result = result * A
There are three things to notice: the initial value assigned to result, the accumulation of results in result, and the unchanging variable A. Unchanging variables are passed to scan as non_sequences. Initialization occurs in outputs_info, and the accumulation happens automatically.
The equivalent Theano code is:
import theano
k = T.iscalar('k')
A = T.vector('A')
result, updates = theano.scan(fn=lambda prior_result, A: prior_result * A,
                              outputs_info=T.ones_like(A),
                              non_sequences=A,
                              n_steps=k)
# We only care about A**k, but scan has provided us with A**1 through A**k.
# Discard the values that we don't care about. Scan is smart enough to
# notice this and not waste memory saving them.
final_result = result[-1]
# compiled function that returns A**k
power = theano.function(inputs=[A,k],
outputs=final_result, updates=updates)
print(power(range(10),2))
print(power(range(10),4))
Within theano.scan, the order of parameters to fn is fixed: the output of the prior call to fn (or the initial value, initially) is the first parameter, followed by all non-sequences.
Next we initialize the output as a tensor with the same shape and dtype as A, filled with ones. We give A to scan as a non_sequences parameter and specify the number of steps k to iterate over our lambda expression.
theano.scan returns a tuple containing our result (result) and a dictionary of updates (empty in this case). The result is a matrix containing the value of A**k for each step, one row per step. We want the last value, so we compile a function that returns just that. Due to the internal optimization (scan notices that only the last value is used and does not store the intermediate ones), we don't have to worry if A or k is large.
Iterating over the first dimension of a tensor: Calculating a polynomial
theano.scan can iterate over the leading dimension of tensors (similar to for x in a_list). The tensor to be looped over should be provided to scan using the sequences keyword argument.
Here’s an example that builds a symbolic calculation of a polynomial from a list of its coefficients:
import numpy
coefficients = theano.tensor.vector("coefficients")
x = T.scalar("x")
max_coefficients_supported = 10000
# Generate the components of the polynomial
components, updates = theano.scan(fn=lambda coefficient, power, free_variable: coefficient * (free_variable ** power),
                                  outputs_info=None,
                                  sequences=[coefficients, theano.tensor.arange(max_coefficients_supported)],
                                  non_sequences=x)
# Sum them up
polynomial = components.sum()
# Compile a function
calculate_polynomial = theano.function(inputs=[coefficients, x], outputs=polynomial)
# Test
test_coefficients = numpy.asarray([1, 0, 2], dtype=numpy.float32)
test_value = 3
print(calculate_polynomial(test_coefficients, test_value))
print(1.0 * (3 ** 0) + 0.0 * (3 ** 1) + 2.0 * (3 ** 2))
Since there is no accumulation of results, we can set outputs_info to None. This indicates to scan that it doesn't need to pass the prior result to fn.
The general order of function parameters to fn is:
sequences (if any), prior result(s) (if needed), non-sequences (if any)
With sequences=[coefficients, theano.tensor.arange(max_coefficients_supported)], scan steps through both sequences in parallel and truncates to the shortest of them.
Simple accumulation into a scalar, ditching lambda
The following example stresses a pitfall to be careful of: the initial outputs_info must have a shape similar to that of the output variable generated at each iteration, and moreover it must not involve an implicit downcast of the latter.