How to obtain second derivatives of a Loss function with respect to the parameters of a neural network using gradient tape in Tensorflow eager mode
I am creating a basic auto-encoder for the MNIST dataset using TensorFlow eager mode. I would like to observe the second-order partial derivatives of my loss function with respect to the parameters of the network as it trains. Currently, calling tape.gradient() on the output of in_tape.gradient() returns None, where in_tape is a GradientTape nested inside the outer GradientTape called tape (my code is included below).

I have tried calling tape.gradient() directly on the output of in_tape.gradient(), with None being returned. My next approach was to iterate over the output of in_tape.gradient() and apply tape.gradient() to each gradient individually (with respect to my model variables), again with None being returned each time.

I receive a single None value for any tape.gradient() call, not a list of None values (which I believe would indicate None for individual partial derivatives, and which would be expected in some cases).

I am currently only trying to get the second derivatives for the first set of weights (from the input layer to the hidden layer); I will scale this up to include all weights once it is working.
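For reference, the nested-tape pattern I am trying to reproduce is, as I understand it, roughly the following toy sketch (made-up values, not my actual code); the key point seems to be that the inner gradient must be computed inside the outer tape's context so that the outer tape records it:

import tensorflow as tf
tf.enable_eager_execution()

x = tf.constant(3.0)
with tf.GradientTape() as outer_tape:
    outer_tape.watch(x)
    with tf.GradientTape() as inner_tape:
        inner_tape.watch(x)
        y = x ** 3
    # First derivative (3*x**2), computed while outer_tape is still recording
    dy_dx = inner_tape.gradient(y, x)
# Second derivative (6*x) -> expect 18.0 for x = 3.0
d2y_dx2 = outer_tape.gradient(dy_dx, x)

My actual code is below: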
import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe
from tensorflow import keras

tf.enable_eager_execution()

# Load and flatten MNIST, scaling pixel values to [0, 1]
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((train_images.shape[0], train_images.shape[1]*train_images.shape[2])).astype(np.float32)/255
test_images = test_images.reshape((test_images.shape[0], test_images.shape[1]*test_images.shape[2])).astype(np.float32)/255

num_epochs = 200
batch_size = 100
learning_rate = 0.0003

class MNISTModel(tf.keras.Model):
    def __init__(self, device='/gpu:0'):
        super(MNISTModel, self).__init__()
        self.device = device
        self.initializer = tf.initializers.random_uniform(0.0, 0.5)
        self.hidden = tf.keras.layers.Dense(200, use_bias=False, kernel_initializer=tf.initializers.random_uniform(0.0, 0.5), name="Hidden")
        self.out = tf.keras.layers.Dense(train_images.shape[1], use_bias=False, kernel_initializer=tf.initializers.random_uniform(0.0, 0.5), name="Output")
        self.hidden.build(train_images.shape[1])
        self.out.build(200)

    def call(self, x):
        return self.out(self.hidden(x))

def loss_func(model, x, y_):
    return tf.reduce_mean(tf.losses.mean_squared_error(labels=y_, predictions=model(x)))
    #return tf.reduce_mean((y_ - model(x))**4)

model = MNISTModel()
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

for epochs in range(num_epochs):
    print("Started epoch ", epochs)
    print("Num batches is: ", train_images.shape[0]/batch_size)
    for i in range(0, 1):  # (int(train_images.shape[0]/batch_size)):
        with tfe.GradientTape(persistent=True) as tape:
            tape.watch(model.variables)
            with tfe.GradientTape() as in_tape:
                in_tape.watch(model.variables)
                loss = loss_func(model, train_images[0:batch_size], train_images[0:batch_size])

        # First derivatives of the loss w.r.t. all model variables
        grads = tape.gradient(loss, model.variables)

        # Attempted second derivatives: differentiate each element of the
        # input-to-hidden gradient w.r.t. the input-to-hidden weights
        IH_partial_grads = np.array([])
        for i in range(len(grads[0])):
            collector = np.array([])
            for j in range(len(grads[0][i])):
                collector = np.append(collector, tape.gradient(grads[0][i][j], model.variables[0]))
            IH_partial_grads = np.append(IH_partial_grads, collector)

        optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())

    print("Epoch test loss: ", loss_func(model, test_images, test_images))
My ultimate goal is to form the Hessian matrix of the loss function with respect to all parameters of my network.
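Concretely, for one block of that Hessian (e.g. the input-to-hidden weights) the kind of computation I have in mind looks roughly like the sketch below; this assumes tape.jacobian is available in the installed TensorFlow version, and x_batch is just a stand-in for train_images[0:batch_size]:

x_batch = train_images[0:batch_size]
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        loss = loss_func(model, x_batch, x_batch)
    # First derivatives, computed inside outer_tape's context so they are recorded
    grads = inner_tape.gradient(loss, model.variables)

# One Hessian block: d^2 loss / d W_hidden^2
hidden_hessian_block = outer_tape.jacobian(grads[0], model.variables[0])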
Thanks for any and all help!
tensorflow hessian-matrix eager-execution
edited Mar 25 at 19:48 by Sharky
asked Mar 23 at 16:49 by Devon Jarvis
I can't reproduce your error. Could you please specify how exactly you're getting None?
– Sharky, Mar 25 at 20:40

Hi @Sharky, thanks for having a look! I'm not sure what else to say, at the end of running the above code IH_partial_grads is None. So when you run the above code do you obtain an IH_partial_grads matrix which has dimensions 145600x145600?
– Devon Jarvis, Mar 26 at 21:30