Reading multiple generic files with tf.data



I'm trying to implement an input pipeline with tf.data.
The features are stored in a matrix exported from MATLAB, while the labels are in other files that require specific functions to be read.



The names of the files to load can be computed from an integer index.



This is how I implemented it:



def load_files(k):
    mesh_file = file_path(k, "off", flags.dataset_mesh)
    mat_file = file_path(k, "mat", flags.dataset_mat)

    mesh = pymesh.load_mesh(mesh_file)
    mat = scipy.io.loadmat(mat_file)

    return mesh.vertices, mat


def generator_fn():
    return (load_files(x) for x in range(1000000 + 1))


def input_fn() -> Dataset:
    dataset = tf.data.Dataset.from_generator(
        generator_fn,
        output_types=(tf.as_dtype(tf.float32), tf.as_dtype(tf.float32)),
    )
    dataset = dataset.batch(batch_size=flags.batch_size).repeat()
    dataset = dataset.cache()
    dataset = dataset.prefetch(buffer_size=flags.prefetch_buffer_size)
    return dataset


The problem is that GPU usage is very low, around 5% (on a 2080 Ti), and I'm not sure where the bottleneck is.
I'm testing with a simple MLP, and GPU usage doesn't change no matter how many layers or neurons per layer I add.



This is how I run the training:



model = keras.Sequential([
    keras.layers.Flatten(input_shape=(n_input,)),
    keras.layers.Dense(1024, activation=tf.nn.relu),
    .
    .
    .
    keras.layers.Dense(1024, activation=tf.nn.relu),
    keras.layers.Dense(n_output, activation=None)
])

model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(input_fn().make_one_shot_iterator(), steps_per_epoch=1000000, epochs=1)


So I think the problem may lie in how I feed the data (it shouldn't be just the file reading, since I'm on an NVMe SSD), in how I run the training, or simply in the fact that the network stays small no matter how many layers I add.



However, I would like to know if there is a more efficient way to feed the data.



I'm using tensorflow-gpu 2.0.0a0. I ran a benchmark from Lambda Labs and it was able to use the GPU at 100%.










python python-3.x tensorflow tensorflow-datasets tensorflow2.0

asked Mar 26 at 23:35 by Luca

  • Try feeding your data from a standard single-file format such as TFRecord or HDF5; that will definitely help.

    – lifeisshubh
    Mar 26 at 23:44
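
A rough sketch of the route this comment suggests (the feature names "vertices" and "labels" are hypothetical; it assumes load_files returns two float arrays of shapes (n, 3) and (n,), as described in a comment below): serialize every sample into one TFRecord file once, then stream that file with tf.data at training time.

import numpy as np
import tensorflow as tf

def _float_feature(values):
    # flatten to a 1-D list of floats for the protobuf FloatList
    return tf.train.Feature(
        float_list=tf.train.FloatList(value=np.asarray(values, np.float32).ravel().tolist()))

def write_tfrecords(path, indices):
    with tf.io.TFRecordWriter(path) as writer:
        for k in indices:
            vertices, labels = load_files(k)   # load_files from the question
            example = tf.train.Example(features=tf.train.Features(feature={
                "vertices": _float_feature(vertices),   # original shape (n, 3)
                "labels": _float_feature(labels),       # original shape (n,)
            }))
            writer.write(example.SerializeToString())

def parse_example(serialized):
    parsed = tf.io.parse_single_example(serialized, {
        "vertices": tf.io.VarLenFeature(tf.float32),
        "labels": tf.io.VarLenFeature(tf.float32),
    })
    vertices = tf.reshape(tf.sparse.to_dense(parsed["vertices"]), (-1, 3))
    labels = tf.sparse.to_dense(parsed["labels"])
    return vertices, labels

def tfrecord_dataset(path, batch_size):
    return (tf.data.TFRecordDataset(path)
            .map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.experimental.AUTOTUNE))

Reading one large sequential file this way avoids opening two small files per sample at training time.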











  • What data type does load_files return? Have you tried profiling it? In what file formats are the data behind the two file_path calls stored? You don't need input_fn().make_one_shot_iterator(). I guess the problem is in this line: load_files(x) for x in range

    – Sharky
    Mar 27 at 1:00
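
One quick way to act on the profiling suggestion (a minimal sketch; the sample of 100 indices is arbitrary) is to time the raw Python loading on its own, outside tf.data, to see how much of each step is file I/O:

import time

n_samples = 100
start = time.perf_counter()
for k in range(n_samples):
    load_files(k)               # load_files from the question
elapsed = time.perf_counter() - start
print(f"{1000 * elapsed / n_samples:.1f} ms per sample on average")

As the comment also notes, in TF 2 a tf.data.Dataset can be passed to model.fit directly, without make_one_shot_iterator().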












  • I'm a bit busy this week, but I'll definitely try to convert the whole dataset (it's quite heavy, around 30 GB) to TFRecord. I don't particularly like the idea, since I'd like to keep the data compatible with MATLAB and to streamline the dataset production as much as possible. load_files returns two matrices of floats: one with shape (n, 3), i.e. the vertices of a mesh, and the other with shape (n,), i.e. a function defined on the vertices of that mesh. I don't know how to profile it; I'll have to check. Why would the problem be the generator?

    – Luca
    Mar 27 at 19:35












  • Doesn't the GPU load change when you up the batch_size?

    – Szymon Maszke
    Apr 8 at 11:13
















