How to obtain second derivatives of a Loss function with respect to the parameters of a neural network using gradient tape in Tensorflow eager mode


I am creating a basic auto-encoder for the MNIST dataset using TensorFlow eager mode. I would like to observe the second-order partial derivatives of my loss function with respect to the parameters of the network as it trains. Currently, calling tape.gradient() on the output of in_tape.gradient() returns None (where in_tape is a GradientTape nested inside the outer GradientTape called tape; my code is included below).



I have tried calling tape.gradient() directly on the output of in_tape.gradient(), which returned None. My next approach was to iterate over the output of in_tape.gradient() and apply tape.gradient() to each gradient individually (with respect to my model variables), but None was returned each time.



I receive a single None value for any tape.gradient() call, not a list of None values (which, I believe, would indicate None for individual partial derivatives and would be expected in some cases).



I am currently only trying to get the second derivatives for the first set of weights (from the input to the hidden layer); however, I will scale this up to include all weights once it is working.



# Imports assumed from context (not shown in the original snippet)
import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe
from tensorflow import keras

tf.enable_eager_execution()

mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((train_images.shape[0], train_images.shape[1]*train_images.shape[2])).astype(np.float32)/255
test_images = test_images.reshape((test_images.shape[0], test_images.shape[1]*test_images.shape[2])).astype(np.float32)/255

num_epochs = 200
batch_size = 100
learning_rate = 0.0003

class MNISTModel(tf.keras.Model):
    def __init__(self, device='/gpu:0'):
        super(MNISTModel, self).__init__()
        self.device = device
        self.initializer = tf.initializers.random_uniform(0.0, 0.5)
        self.hidden = tf.keras.layers.Dense(200, use_bias=False, kernel_initializer=tf.initializers.random_uniform(0.0, 0.5), name="Hidden")
        self.out = tf.keras.layers.Dense(train_images.shape[1], use_bias=False, kernel_initializer=tf.initializers.random_uniform(0.0, 0.5), name="Output")
        self.hidden.build(train_images.shape[1])
        self.out.build(200)

    def call(self, x):
        return self.out(self.hidden(x))

def loss_func(model, x, y_):
    return tf.reduce_mean(tf.losses.mean_squared_error(labels=y_, predictions=model(x)))
    #return tf.reduce_mean((y_ - model(x))**4)

model = MNISTModel()
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

for epochs in range(num_epochs):
    print("Started epoch ", epochs)
    print("Num batches is: ", train_images.shape[0]/batch_size)
    for i in range(0, 1):  # (int(train_images.shape[0]/batch_size)):
        with tfe.GradientTape(persistent=True) as tape:
            tape.watch(model.variables)
            with tfe.GradientTape() as in_tape:
                in_tape.watch(model.variables)
                loss = loss_func(model, train_images[0:batch_size], train_images[0:batch_size])
        grads = tape.gradient(loss, model.variables)
        IH_partial_grads = np.array([])
        for i in range(len(grads[0])):
            collector = np.array([])
            for j in range(len(grads[0][i])):
                collector = np.append(collector, tape.gradient(grads[0][i][j], model.variables[0]))
            IH_partial_grads = np.append(IH_partial_grads, collector)
        optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())
    print("Epoch test loss: ", loss_func(model, test_images, test_images))


My ultimate goal is to form the Hessian matrix of the loss function with respect to all parameters of my network.



Thanks for any and all help!
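

For reference, below is a minimal sketch (not part of the original post) of the nested-tape pattern that yields non-None second derivatives in eager mode. The key assumption is that the inner tape's gradient call happens inside the outer tape's context, so that the first-order gradient computation itself is recorded; indexing of the resulting gradient would likewise have to be recorded (or replaced by GradientTape.jacobian, which is available in newer TensorFlow versions). Names such as x_batch, grad_w0, and hessian_w0 are illustrative.

import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x only; eager is the default in TF 2.x

# Toy check: d2/dx2 of x**3 is 6*x, so we expect 18.0 for x = 3
x = tf.Variable(3.0)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x ** 3
    # First derivative, taken INSIDE the outer tape so the computation is recorded
    dy_dx = inner_tape.gradient(y, x)
d2y_dx2 = outer_tape.gradient(dy_dx, x)  # tf.Tensor(18.0), not None

# Sketch for the model above (assumes GradientTape.jacobian is available):
# with tf.GradientTape() as outer_tape:
#     with tf.GradientTape() as inner_tape:
#         loss = loss_func(model, x_batch, x_batch)
#     grad_w0 = inner_tape.gradient(loss, model.variables[0])    # inside outer_tape
# hessian_w0 = outer_tape.jacobian(grad_w0, model.variables[0])  # Hessian block for the
#                                                                # input->hidden kernel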










tensorflow hessian-matrix eager-execution






edited Mar 25 at 19:48









Sharky

asked Mar 23 at 16:49









Devon Jarvis

  • I can't reproduce your error. Could you please specify how exactly you're getting None?

    – Sharky
    Mar 25 at 20:40











  • Hi @Sharky, thanks for having a look! I'm not sure what else to say; at the end of running the above code, IH_partial_grads is None. So when you run the above code, do you obtain an IH_partial_grads matrix which has dimensions 145600x145600?

    – Devon Jarvis
    Mar 26 at 21:30
















