
Why is the PyTorch implementation so inefficient?


I have implemented a CNN architecture from a paper in both Keras and PyTorch, but the Keras implementation is much more efficient: it takes about 4 GB of GPU memory for training with 50,000 training samples and 10,000 validation samples, while the PyTorch one takes all 12 GB of the GPU and I can't even use a validation set!
The optimizer for both of them is SGD with momentum, with the same settings for both.
More info about the paper: [architecture]: https://github.com/Moeinh77/Lightweight-Deep-Convolutional-Network-for-Tiny-Object-Recognition/edit/master/train.py



PyTorch code:



import torch
import torch.nn.functional as F

class SimpleCNN(torch.nn.Module):

    def __init__(self):
        super(SimpleCNN, self).__init__()

        self.conv2d_11 = torch.nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv2d_12 = torch.nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)

        self.conv2d_21 = torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.conv2d_22 = torch.nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1)

        self.conv2d_31 = torch.nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)
        self.conv2d_32 = torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.conv2d_33 = torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)

        self.conv2d_41 = torch.nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1)
        self.conv2d_42 = torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)

        self.conv2d_51 = torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)

        self.Batchnorm_1 = torch.nn.BatchNorm2d(64)
        self.Batchnorm_2 = torch.nn.BatchNorm2d(128)
        self.Batchnorm_3 = torch.nn.BatchNorm2d(256)
        self.Batchnorm_4 = torch.nn.BatchNorm2d(512)

        self.dropout2d_1 = torch.nn.Dropout2d(p=0.3)
        self.dropout2d_2 = torch.nn.Dropout2d(p=0.4)
        self.dropout2d_3 = torch.nn.Dropout2d(p=0.5)

        self.dropout1d = torch.nn.Dropout(p=0.5)

        self.maxpool2d = torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

        self.avgpool2d = torch.nn.AvgPool2d(kernel_size=2, stride=2, padding=0)

        self.fc = torch.nn.Linear(512, 10)

    def forward(self, x):

        ############################# Phase 1
        x = F.relu(self.conv2d_11(x))
        x = self.dropout2d_1(x)  # rate = 0.3
        x = self.Batchnorm_1(x)  # 64 channels

        x = F.relu(self.conv2d_12(x))
        x = self.dropout2d_1(x)  # rate = 0.3
        x = self.Batchnorm_1(x)  # 64 channels

        x = self.maxpool2d(x)

        ############################# Phase 2
        x = F.relu(self.conv2d_21(x))
        x = self.dropout2d_1(x)  # rate = 0.3
        x = self.Batchnorm_2(x)  # 128 channels

        x = F.relu(self.conv2d_22(x))
        x = self.dropout2d_1(x)  # rate = 0.3
        x = self.Batchnorm_2(x)  # 128 channels

        x = self.maxpool2d(x)

        ############################# Phase 3
        x = F.relu(self.conv2d_31(x))
        x = self.dropout2d_2(x)  # rate = 0.4
        x = self.Batchnorm_3(x)  # 256 channels

        x = F.relu(self.conv2d_32(x))
        x = self.dropout2d_2(x)  # rate = 0.4
        x = self.Batchnorm_3(x)  # 256 channels

        x = F.relu(self.conv2d_33(x))
        x = self.dropout2d_2(x)  # rate = 0.4
        x = self.Batchnorm_3(x)  # 256 channels

        x = self.maxpool2d(x)

        ############################# Phase 4
        x = F.relu(self.conv2d_41(x))
        x = self.dropout2d_2(x)
        x = self.Batchnorm_4(x)

        x = F.relu(self.conv2d_42(x))
        x = self.dropout2d_2(x)
        x = self.Batchnorm_4(x)

        x = self.maxpool2d(x)

        ############################# Phase 5
        x = F.relu(self.conv2d_51(x))
        x = self.dropout2d_3(x)
        x = self.Batchnorm_4(x)

        x = self.avgpool2d(x)
        x = x.view(x.size(0), -1)
        x = self.dropout1d(x)
        x = F.relu(self.fc(x))
        x = self.dropout1d(x)
        x = F.softmax(x, dim=1)
        ###############################

        return x


import time
from torch.autograd import Variable
from torch.optim.lr_scheduler import ReduceLROnPlateau

def trainNet(model, batch_size, n_epochs, learning_rate):

    lr = learning_rate

    # Print all of the hyperparameters of the training iteration:
    print("======= HYPERPARAMETERS =======")
    print("Batch size=", batch_size)
    print("Epochs=", n_epochs)
    print("Base learning_rate=", learning_rate)
    print("=" * 30)

    # Get training data
    n_batches = len(train_loader)

    # Time for printing
    training_start_time = time.time()

    # Loss function
    loss = torch.nn.CrossEntropyLoss()
    optimizer = createOptimizer(model, lr)

    scheduler = ReduceLROnPlateau(optimizer, 'min',
                                  patience=3, factor=0.9817,
                                  verbose=True)

    # Loop over n_epochs
    for epoch in range(n_epochs):

        # Save the weights every 10 epochs
        if epoch % 10 == 0:
            torch.save(model.state_dict(), 'model.ckpt')

        running_loss = 0.0
        print_every = n_batches // 10
        start_time = time.time()
        total_train_loss = 0
        total_train_acc = 0
        epoch_time = 0

        for i, data in enumerate(train_loader, 0):

            # free up the cuda memory
            inputs = None
            labels = None

            inputs, labels = data

            inputs, labels = Variable(inputs.to(device)), Variable(labels.to(device))

            optimizer.zero_grad()

            outputs = model(inputs)

            score, predictions = torch.max(outputs.data, 1)
            acc = (labels == predictions).sum()
            total_train_acc += acc

            loss_size = loss(outputs, labels)
            loss_size.backward()
            optimizer.step()

            running_loss += loss_size.item()
            total_train_loss += loss_size.item()

            # Print every 10th batch of an epoch
            if (i + 1) % (print_every + 1) == 0:
                print("Epoch {}, {:d}% \t | train_loss: {:.3f} | train_acc: {}% | took: {:.2f}s".format(
                    epoch + 1, int(100 * (i + 1) / n_batches), running_loss / print_every,
                    int(acc), time.time() - start_time))

                epoch_time += (time.time() - start_time)

                # Reset running loss and time
                running_loss = 0.0
                start_time = time.time()

        scheduler.step(total_train_loss)
        torch.cuda.empty_cache()

        # At the end of the epoch, do a pass on the validation set
        total_val_loss = 0

        for inputs, labels in val_loader:

            # Wrap tensors in Variables
            inputs, labels = Variable(inputs.to(device)), Variable(labels.to(device))

            # Forward pass
            val_outputs = model(inputs)
            val_loss_size = loss(val_outputs, labels)
            total_val_loss += val_loss_size.item()

        print("-" * 30)
        print("Train loss = {:.2f} | Train acc = {:.1f}% | Val loss = {:.2f} | took: {:.2f}s".format(
            total_train_loss / len(train_loader), total_train_acc / len(train_loader),
            total_val_loss / len(val_loader), epoch_time))
        print("=" * 60)

    print("Training finished, took {:.2f}s".format(time.time() - training_start_time))

CNN = SimpleCNN().to(device)
CNN.eval()

trainNet(CNN, batch_size=64, n_epochs=250, learning_rate=0.1)



Keras:



from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten,Activation
from tensorflow.keras.layers import Conv2D, MaxPool2D,BatchNormalization,GlobalAveragePooling2D

model = Sequential()
#####################################################
# Phase 1
model.add(Conv2D(64,(3,3),input_shape=(32,32,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.3))
model.add(BatchNormalization())

#(32,32,3)

model.add(Conv2D(64,(3,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.3))
model.add(BatchNormalization())
#(32,32,3)


model.add(MaxPool2D((2,2)))
#(16,16,3)

#####################################################
#Phase 2
model.add(Conv2D(128, (3,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.3))
model.add(BatchNormalization())
#(16,16,3)

model.add(Conv2D(128, (3,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.3))
model.add(BatchNormalization())
#(16,16,3)

model.add(MaxPool2D((2,2),padding='same'))
#(8,8,3)

#####################################################
#Phase 3
model.add(Conv2D(256, (3,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
#(8,8,3)


model.add(Conv2D(256, (3,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
#(8,8,3)

model.add(Conv2D(256, (3,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
#(8,8,3)

model.add(MaxPool2D((2,2)))
#(4,4,3)

#####################################################
#Phase 4
model.add(Conv2D(512, (3,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
#(4,4,3)

model.add(Conv2D(512, (3,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
#(4,4,3)

model.add(MaxPool2D((2,2)))
#(2,2,3)

#####################################################
#Phase 5
model.add(Conv2D(512, (3,3),padding='same'))
model.add(Activation('relu'))
model.add(Dropout(rate=0.5))
model.add(BatchNormalization())
#(2,2,3)

model.add(GlobalAveragePooling2D(data_format='channels_last'))
model.add(Flatten())
model.add(Dropout(rate=0.5))

model.add(Dense(10,activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=sgd_optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x=x_train, y=y_train, batch_size=64,
                    epochs=250, verbose=1, callbacks=[checkpoint],
                    validation_data=(x_test, y_test))
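
Note that sgd_optimizer and checkpoint are used above but never defined in the post; the question only says the optimizer is SGD with momentum with the same settings as the PyTorch run. A minimal sketch of what those definitions might look like (the values and file name are assumptions, not taken from the original post):

    # Hypothetical definitions -- not from the original post.
    from tensorflow.keras.optimizers import SGD
    from tensorflow.keras.callbacks import ModelCheckpoint

    sgd_optimizer = SGD(lr=0.1, momentum=0.9)        # assumed: same base lr as the PyTorch run
    checkpoint = ModelCheckpoint('keras_model.h5',   # assumed file name
                                 monitor='val_loss',
                                 save_best_only=True)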









      keras deep-learning pytorch






asked Mar 26 at 17:11 by Moeinh77

          1 Answer
Edit: on a closer look, acc doesn't seem to require gradient, so this paragraph probably doesn't apply.
It looks like the most significant issue is that total_train_acc accumulates history across the training loop (see https://pytorch.org/docs/stable/notes/faq.html for details).
Changing total_train_acc += acc to total_train_acc += acc.item() should fix this.
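
Applied to the loop in the question, that change would look like this (a sketch, not the answerer's exact code):

    score, predictions = torch.max(outputs.data, 1)
    acc = (labels == predictions).sum()
    total_train_acc += acc.item()  # .item() converts the one-element tensor to a plain Python number,
                                   # so no tensors are kept alive across iterations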



Another thing: you should use with torch.no_grad() for the validation loop.
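
For the validation pass in trainNet, that would look roughly like this sketch (assuming the same loss, model, device and val_loader as in the question):

    with torch.no_grad():  # no autograd graph is built, so activations are not kept for backward
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            val_outputs = model(inputs)
            val_loss_size = loss(val_outputs, labels)
            total_val_loss += val_loss_size.item()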



Not really about speed, but model.train() and model.eval() should be used for training and evaluation respectively, so that the batchnorm and dropout layers work in the correct mode.
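
A sketch of where those calls could go inside trainNet (the exact placement is assumed, not spelled out in the answer):

    for epoch in range(n_epochs):
        model.train()              # dropout active, batchnorm uses per-batch statistics
        for i, data in enumerate(train_loader, 0):
            ...                    # training step as in the question

        model.eval()               # dropout off, batchnorm uses running statistics
        with torch.no_grad():
            for inputs, labels in val_loader:
                ...                # validation step as in the question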






edited Mar 27 at 4:17
answered Mar 27 at 4:07 by Sergey Dymchenko
