Reading multiple generic files with tf.data



I'm trying to implement an input pipeline with tf.data.
The features are stored in a matrix exported from MATLAB, while the labels are in other files that require specific functions to be read.



The names of the files to load can be computed from an integer index.



This is how I implemented it:



def load_files(k):
    mesh_file = file_path(k, "off", flags.dataset_mesh)
    mat_file = file_path(k, "mat", flags.dataset_mat)

    mesh = pymesh.load_mesh(mesh_file)
    mat = scipy.io.loadmat(mat_file)

    return mesh.vertices, mat


def generator_fn():
    return (load_files(x) for x in range(1000000 + 1))


def input_fn() -> Dataset:
    dataset = tf.data.Dataset.from_generator(
        generator_fn,
        output_types=(tf.as_dtype(tf.float32), tf.as_dtype(tf.float32)),
    )
    dataset = dataset.batch(batch_size=flags.batch_size).repeat()
    dataset = dataset.cache()
    dataset = dataset.prefetch(buffer_size=flags.prefetch_buffer_size)
    return dataset


The problem is that GPU usage is very low, around 5% (on a 2080 Ti), and I'm not sure where the bottleneck is.
I'm testing with a simple MLP, and GPU usage doesn't change no matter how many layers or neurons per layer I add.



This is how I run the training:



model = keras.Sequential([
    keras.layers.Flatten(input_shape=(n_input,)),
    keras.layers.Dense(1024, activation=tf.nn.relu),
    .
    .
    .
    keras.layers.Dense(1024, activation=tf.nn.relu),
    keras.layers.Dense(n_output, activation=None)
])

model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(input_fn().make_one_shot_iterator(), steps_per_epoch=1000000, epochs=1)


So I think the problem may lie in how I feed the data (it shouldn't be just the file reading, since I'm on an NVMe SSD), in how I run the training, or simply in the fact that the network stays small no matter how many layers I add.



However, I would like to know if there is a more efficient way to feed the data.



I'm using tensorflow-gpu 2.0.0a0. I ran a benchmark from Lambda Labs and it was able to use the GPU at 100%.










python python-3.x tensorflow tensorflow-datasets tensorflow2.0

asked Mar 26 at 23:35 by Luca

  • Try feeding your data from a standard single-file format such as TFRecord or HDF5; that will definitely help.

    – lifeisshubh
    Mar 26 at 23:44
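
A rough sketch of the route this comment suggests (the feature names "vertices" and "labels" are hypothetical; it assumes load_files returns two float arrays of shapes (n, 3) and (n,), as described in a comment below): serialize every sample into one TFRecord file once, then stream that file with tf.data at training time.

import numpy as np
import tensorflow as tf

def _float_feature(values):
    # flatten to a 1-D list of floats for the protobuf FloatList
    return tf.train.Feature(
        float_list=tf.train.FloatList(value=np.asarray(values, np.float32).ravel().tolist()))

def write_tfrecords(path, indices):
    with tf.io.TFRecordWriter(path) as writer:
        for k in indices:
            vertices, labels = load_files(k)   # load_files from the question
            example = tf.train.Example(features=tf.train.Features(feature={
                "vertices": _float_feature(vertices),   # original shape (n, 3)
                "labels": _float_feature(labels),       # original shape (n,)
            }))
            writer.write(example.SerializeToString())

def parse_example(serialized):
    parsed = tf.io.parse_single_example(serialized, {
        "vertices": tf.io.VarLenFeature(tf.float32),
        "labels": tf.io.VarLenFeature(tf.float32),
    })
    vertices = tf.reshape(tf.sparse.to_dense(parsed["vertices"]), (-1, 3))
    labels = tf.sparse.to_dense(parsed["labels"])
    return vertices, labels

def tfrecord_dataset(path, batch_size):
    return (tf.data.TFRecordDataset(path)
            .map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.experimental.AUTOTUNE))

Reading one large sequential file this way avoids opening two small files per sample at training time.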











  • What data type does load_files return? Have you tried profiling it? In what file formats are the data behind the two file_path calls stored? You don't need input_fn().make_one_shot_iterator(). I guess the problem is in this line: load_files(x) for x in range

    – Sharky
    Mar 27 at 1:00
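
One quick way to act on the profiling suggestion (a minimal sketch; the sample of 100 indices is arbitrary) is to time the raw Python loading on its own, outside tf.data, to see how much of each step is file I/O:

import time

n_samples = 100
start = time.perf_counter()
for k in range(n_samples):
    load_files(k)               # load_files from the question
elapsed = time.perf_counter() - start
print(f"{1000 * elapsed / n_samples:.1f} ms per sample on average")

As the comment also notes, in TF 2 a tf.data.Dataset can be passed to model.fit directly, without make_one_shot_iterator().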












  • I'm a bit busy this week, but I'll definitely try to convert the whole dataset (it's quite heavy, around 30 GB) to TFRecord. I don't particularly like the idea, since I'd like to keep the data compatible with MATLAB and to streamline the dataset production as much as possible. load_files returns two matrices of floats: one with shape (n, 3), i.e. the vertices of a mesh, and the other with shape (n,), i.e. a function defined on the vertices of that mesh. I don't know how to profile it; I'll have to check. Why would the problem be the generator?

    – Luca
    Mar 27 at 19:35












  • Doesn't the GPU load change when you up the batch_size?

    – Szymon Maszke
    Apr 8 at 11:13
















