OpenCL Pipeline failed to allocate buffer with cl_mem_object_allocation_failure
I have an OpenCL pipeline that processes images and video, and it can be greedy with memory at times. It crashes on a cl::Buffer() allocation like this:
cl_int err = CL_SUCCESS;
cl::Buffer tmp = cl::Buffer(m_context, CL_MEM_READ_WRITE, sizeData, NULL, &err);
with error -4 (CL_MEM_OBJECT_ALLOCATION_FAILURE).
This occurs at a fixed point in my pipeline when I use very large images. If I downscale the image slightly, it gets through this very memory-intensive part of the pipeline.
I have access to an Nvidia card with 4 GB that fails at a certain point, and I also tried an AMD GPU with 2 GB, which fails earlier.
According to this thread, there is no need to know the current allocation because of swapping with VRAM, but it seems that my pipeline exhausts the memory of my device.
So here are my questions:
1) Is there any setting on my computer or in my pipeline that would allow more VRAM to be used?
2) Is it okay to use CL_DEVICE_GLOBAL_MEM_SIZE as the reference for the maximum size to allocate, or do I need to use CL_DEVICE_GLOBAL_MEM_SIZE - (local memory + private memory), or something like that?
According to my own memory profiler, 92% of CL_DEVICE_GLOBAL_MEM_SIZE is allocated at the crash. After resizing the image slightly, the pipeline reports 89% usage and passes, so I assume my large image is right on the edge.
c++ opencl
You can use host memory instead, i.e. clCreateBuffer( ... | CL_MEM_USE_HOST_PTR , ..., size, host_ptr, ...);
– Victor Gubin
Mar 27 at 17:31
@VictorGubin To use host memory, I need to provide a pointer on the host. But what should I do if I want to allocate only on the device (i.e. host_ptr == NULL), because I never need the data on the host? I would like to avoid transfers as much as possible.
– Vuwox
Mar 27 at 17:49
A pointer is a memory address, aka size_t (unsigned long or unsigned long long depending on the CPU architecture). When you create a CL buffer using a host pointer, the GPU will use the existing memory (i.e. RAM) at that address, and nothing will be copied from RAM to VRAM. Otherwise clCreateBuffer will allocate a memory block in VRAM and return a handle (cl_mem) to the allocated block.
– Victor Gubin
Mar 28 at 11:38
Yes, I understand all of this. But to allocate with CL_MEM_USE_HOST_PTR, I first need to allocate something on the CPU to point to, and when using the buffer I have to wait for that memory to be transferred to the GPU device. Allocating without the flag and with a NULL pointer allocates directly on the device with no transfer required, which is very fast, especially when the memory needs to reside on the device and is never queried from the host. I'm just wondering whether there is a way to tell when to stop allocating like that and maybe switch to CL_MEM_USE_HOST_PTR.
– Vuwox
Mar 28 at 13:17
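One way to approach the "when to stop allocating on the device" question raised in the last comment is to keep a running total of device allocations and switch placement once a chosen budget is used up. The following is a minimal sketch of that bookkeeping only; `DeviceBudget`, `Placement`, and the budget value are hypothetical names introduced here, and the class never calls OpenCL itself. It assumes the caller picks the budget (for example, some fraction of CL_DEVICE_GLOBAL_MEM_SIZE) and creates the actual buffer with or without CL_MEM_USE_HOST_PTR depending on the result.

```cpp
#include <cstdint>

// Where the caller should place the next buffer.
enum class Placement { Device, Host };

// Hypothetical bookkeeping helper: tracks how many bytes the application
// has placed on the device and compares requests against a fixed budget.
// It does not call any OpenCL function.
class DeviceBudget {
public:
    explicit DeviceBudget(std::uint64_t budgetBytes) : budget_(budgetBytes) {}

    // Reserves `bytes` and returns Device if the request fits in the
    // remaining budget; otherwise leaves the total unchanged and returns
    // Host, so the caller can fall back to a host-backed buffer.
    Placement reserve(std::uint64_t bytes) {
        // used_ never exceeds budget_, so the subtraction cannot underflow.
        if (bytes <= budget_ - used_) {
            used_ += bytes;
            return Placement::Device;
        }
        return Placement::Host;
    }

    // To be called when a device buffer of `bytes` is released.
    void release(std::uint64_t bytes) {
        used_ = (bytes > used_) ? 0 : used_ - bytes;
    }

    std::uint64_t used() const { return used_; }

private:
    std::uint64_t budget_;
    std::uint64_t used_ = 0;
};
```

A budget check like this is only an estimate: the driver may reserve memory for other purposes, and some implementations may defer the real allocation until the buffer is first used, so checking the error code of the buffer creation and of later enqueue calls is still needed.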
asked Mar 27 at 17:14 by Vuwox (edited Mar 27 at 18:55)
1 Answer
Some of your device's VRAM may be used for the pixel buffer, constant memory, or other purposes. For AMD cards, you can set the environment variables GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_PERCENT to use a larger part of the VRAM, though this may have unintended side effects. Both are expressed as percentages of the memory physically available on the card.
Additionally, there is a limit on the size of each individual memory allocation. You can get the maximum size for a single allocation by querying CL_DEVICE_MAX_MEM_ALLOC_SIZE, which may be less than CL_DEVICE_GLOBAL_MEM_SIZE. For AMD cards, this size can be controlled with GPU_SINGLE_ALLOC_PERCENT.
This requires no changes to your code; simply set the variables in the environment before you run your executable:
export GPU_MAX_ALLOC_PERCENT="100"
export GPU_MAX_HEAP_SIZE="100"
export GPU_SINGLE_ALLOC_PERCENT="100"
./your_program
answered Mar 31 at 16:56 by Jan-Gerd
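The two limits mentioned in the answer can be combined into a simple pre-allocation check. The sketch below is an illustration under stated assumptions, not part of the original answer: `DeviceLimits`, `AllocCheck`, `checkAllocation`, and the `usableFraction` parameter are hypothetical names. The limit values are passed in as plain integers so the code stays independent of an OpenCL installation; in a real program they would be filled from clGetDeviceInfo with CL_DEVICE_GLOBAL_MEM_SIZE and CL_DEVICE_MAX_MEM_ALLOC_SIZE, both of which are reported as cl_ulong.

```cpp
#include <cstdint>

// Limits as reported by the device (assumed to be queried elsewhere via
// clGetDeviceInfo; this sketch only does the arithmetic).
struct DeviceLimits {
    std::uint64_t globalMemSize;    // CL_DEVICE_GLOBAL_MEM_SIZE, in bytes
    std::uint64_t maxMemAllocSize;  // CL_DEVICE_MAX_MEM_ALLOC_SIZE, in bytes
};

enum class AllocCheck { Ok, ExceedsSingleAllocLimit, ExceedsBudget };

// Hypothetical helper: checks a request of `requested` bytes against the
// per-allocation limit and against a budget of `usableFraction` times the
// global memory size, given `alreadyAllocated` bytes in use.
// `usableFraction` is an assumption that has to be tuned per device.
AllocCheck checkAllocation(const DeviceLimits& limits,
                           std::uint64_t alreadyAllocated,
                           std::uint64_t requested,
                           double usableFraction) {
    if (requested > limits.maxMemAllocSize) {
        return AllocCheck::ExceedsSingleAllocLimit;
    }
    const auto budget =
        static_cast<std::uint64_t>(limits.globalMemSize * usableFraction);
    // Compare against the remaining budget without risking underflow.
    if (alreadyAllocated > budget || requested > budget - alreadyAllocated) {
        return AllocCheck::ExceedsBudget;
    }
    return AllocCheck::Ok;
}
```

Passing such a check does not guarantee that the allocation succeeds, since the driver's own usage is not visible to the application; it only catches requests that are certain to exceed the reported limits or the chosen budget.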