How to obtain second derivatives of a Loss function with respect to the parameters of a neural network using gradient tape in Tensorflow eager mode


I am creating a basic auto-encoder for the MNIST dataset using TensorFlow eager mode. I would like to observe the second-order partial derivatives of my loss function with respect to the parameters of the network as it trains. Currently, calling tape.gradient() on the output of in_tape.gradient() returns None (where in_tape is a GradientTape nested inside the outer GradientTape called tape; my code is included below).



I have tried calling tape.gradient() directly on the output of in_tape.gradient(), which returned None. My next approach was to iterate over the output of in_tape.gradient() and apply tape.gradient() to each gradient individually (with respect to my model variables), but None was returned each time.



I receive a single None value for any tape.gradient() call, not a list of None values (which, I believe, would indicate None for individual partial derivatives and would be expected in some cases).



I am currently only trying to get the second derivatives for the first set of weights (from the input to the hidden layer); however, I will scale this up to include all weights once it is working.



# Imports assumed from context (not shown in the original snippet)
import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe
from tensorflow import keras

tf.enable_eager_execution()

mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((train_images.shape[0], train_images.shape[1]*train_images.shape[2])).astype(np.float32)/255
test_images = test_images.reshape((test_images.shape[0], test_images.shape[1]*test_images.shape[2])).astype(np.float32)/255

num_epochs = 200
batch_size = 100
learning_rate = 0.0003

class MNISTModel(tf.keras.Model):
    def __init__(self, device='/gpu:0'):
        super(MNISTModel, self).__init__()
        self.device = device
        self.initializer = tf.initializers.random_uniform(0.0, 0.5)
        self.hidden = tf.keras.layers.Dense(200, use_bias=False, kernel_initializer=tf.initializers.random_uniform(0.0, 0.5), name="Hidden")
        self.out = tf.keras.layers.Dense(train_images.shape[1], use_bias=False, kernel_initializer=tf.initializers.random_uniform(0.0, 0.5), name="Output")
        self.hidden.build(train_images.shape[1])
        self.out.build(200)

    def call(self, x):
        return self.out(self.hidden(x))

def loss_func(model, x, y_):
    return tf.reduce_mean(tf.losses.mean_squared_error(labels=y_, predictions=model(x)))
    #return tf.reduce_mean((y_ - model(x))**4)

model = MNISTModel()
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

for epochs in range(num_epochs):
    print("Started epoch ", epochs)
    print("Num batches is: ", train_images.shape[0]/batch_size)
    for i in range(0, 1):  # (int(train_images.shape[0]/batch_size)):
        with tfe.GradientTape(persistent=True) as tape:
            tape.watch(model.variables)
            with tfe.GradientTape() as in_tape:
                in_tape.watch(model.variables)
                loss = loss_func(model, train_images[0:batch_size], train_images[0:batch_size])
        grads = tape.gradient(loss, model.variables)
        IH_partial_grads = np.array([])
        for i in range(len(grads[0])):
            collector = np.array([])
            for j in range(len(grads[0][i])):
                collector = np.append(collector, tape.gradient(grads[0][i][j], model.variables[0]))
            IH_partial_grads = np.append(IH_partial_grads, collector)
        optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())
    print("Epoch test loss: ", loss_func(model, test_images, test_images))


My ultimate goal is to form the Hessian matrix of the loss function with respect to all parameters of my network.



Thanks for any and all help!
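

For reference, below is a minimal sketch (not part of the original post) of the nested-tape pattern that yields non-None second derivatives in eager mode. The key assumption is that the inner tape's gradient call happens inside the outer tape's context, so that the first-order gradient computation itself is recorded; indexing of the resulting gradient would likewise have to be recorded (or replaced by GradientTape.jacobian, which is available in newer TensorFlow versions). Names such as x_batch, grad_w0, and hessian_w0 are illustrative.

import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x only; eager is the default in TF 2.x

# Toy check: d2/dx2 of x**3 is 6*x, so we expect 18.0 for x = 3
x = tf.Variable(3.0)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x ** 3
    # First derivative, taken INSIDE the outer tape so the computation is recorded
    dy_dx = inner_tape.gradient(y, x)
d2y_dx2 = outer_tape.gradient(dy_dx, x)  # tf.Tensor(18.0), not None

# Sketch for the model above (assumes GradientTape.jacobian is available):
# with tf.GradientTape() as outer_tape:
#     with tf.GradientTape() as inner_tape:
#         loss = loss_func(model, x_batch, x_batch)
#     grad_w0 = inner_tape.gradient(loss, model.variables[0])    # inside outer_tape
# hessian_w0 = outer_tape.jacobian(grad_w0, model.variables[0])  # Hessian block for the
#                                                                # input->hidden kernel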










tensorflow hessian-matrix eager-execution






edited Mar 25 at 19:48









Sharky

asked Mar 23 at 16:49









Devon Jarvis

  • I can't reproduce your error. Could you please specify how exactly you're getting None?

    – Sharky
    Mar 25 at 20:40











  • Hi @Sharky, thanks for having a look! I'm not sure what else to say; at the end of running the above code, IH_partial_grads is None. So when you run the above code, do you obtain an IH_partial_grads matrix which has dimensions 145600x145600?

    – Devon Jarvis
    Mar 26 at 21:30
















