Iterator protocol within numpy
Is there a way to work with iterators instead of (for example) numpy.ndarray in numpy?
For example, imagine I have a 2D array and I want to know whether there is a row that only contains even numbers:
import numpy as np
x = np.array([[1, 2], [2, 4], [3, 6]])
np.any(np.all(x % 2 == 0, axis=1))
Is there a way to do this kind of thing without instantiating the intermediate objects in memory? (Or maybe that is already the case and I just don't know it.) In this example, that would mean having an iterator over [False True False] instead of an array. In other words, can we do something that would be equivalent to:
has_an_even_row = False
for row in x:
    if np.all(row % 2 == 0):
        has_an_even_row = True
        break
My question doesn't only concern all and any but all functions/methods in numpy. If it isn't possible, I wonder if there is a practical reason for not having this in numpy. (Maybe everyone thinks it's useless; that would be a good reason.)
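For reference, a generator-based equivalent of the loop above - a minimal sketch for illustration, not part of the original question - uses Python's built-in any, which short-circuits, while np.all still does the per-row work:
import numpy as np

x = np.array([[1, 2], [2, 4], [3, 6]])
# any() stops at the first True; the generator evaluates np.all(row % 2 == 0) lazily, row by row
has_an_even_row = any(np.all(row % 2 == 0) for row in x)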
python numpy iterator
asked Mar 24 at 17:53 by cglacet (edited Mar 24 at 18:11)
Sure, you can iterate over the rows. Use the usual Python for loop. But beware: the operation will usually, but not always, be slower. – hpaulj, Mar 24 at 18:03
I just updated my question, I'm looking for a solution internal to numpy. – cglacet, Mar 24 at 18:04
What exactly are you envisioning? If the creation of intermediate objects is problematic, look into numexpr. But, as hpaulj is saying, if you want an iterator, use a for-loop. – juanpa.arrivillaga, Mar 24 at 18:05
You can also look at numba, which is a JIT compiler that will just-in-time-compile functions that use simple loops over numpy data structures into native code. In my experience, it is quite effective. – juanpa.arrivillaga, Mar 24 at 18:17
numpy is like a Lego set. It is fast and easy to use when you stick with the given building blocks. It does not include a custom block molding machine - you have to get that from some other source. – hpaulj, Mar 24 at 20:08
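To illustrate the numba suggestion above, here is a minimal sketch (my own example, assuming numba is installed; the function name is made up for illustration). The explicit loops are compiled to native code and short-circuit, so no intermediate boolean arrays are built:
import numpy as np
from numba import njit

@njit
def has_even_row(x):
    # Plain nested loops: numba compiles them, and we stop as soon as
    # an all-even row is found, without allocating temporary arrays.
    for i in range(x.shape[0]):
        all_even = True
        for j in range(x.shape[1]):
            if x[i, j] % 2 != 0:
                all_even = False
                break
        if all_even:
            return True
    return False

x = np.array([[1, 2], [2, 4], [3, 6]])
print(has_even_row(x))  # True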
2 Answers
The number of temporary arrays may be more than you realize:
In [224]: x = np.array([[1, 2], [2, 4], [3, 6]])
In [225]: x % 2
Out[225]:
array([[1, 0],
       [0, 0],
       [1, 0]])
In [226]: _ == 0
Out[226]:
array([[False,  True],
       [ True,  True],
       [False,  True]])
In [227]: np.all(_, axis=1)
Out[227]: array([False, True, False])
In [228]: np.any(_)
Out[228]: True
In this case, working row by row would save on calculating the last row's values.
The last any step might short-circuit, stopping when it hits the True - that's an implementation detail.
A thoroughly iterative, no excess calculations method would be something like:
In [231]: val = False
     ...: for row in x:
     ...:     for col in row:
     ...:         if col%2!=0:
     ...:             break
     ...:         val=(row,col)
     ...:         break
In [232]: val
Out[232]: (array([2, 4]), 2)
This approach would make sense if I were writing in C or a lisp-like language, where testing, memory management, and calculations all occur at the same code level. But it wouldn't be very modular or reusable.
The idea underlying numpy is to provide a comprehensive set of compiled building blocks. Those blocks won't be optimal for all tasks, but on the whole they are fast and easy to use.
It's generally recommended to use the given building blocks for quick development. Once it's working, then worry about improving the speed of time-critical steps. – hpaulj (answered Mar 24 at 21:12, edited Mar 24 at 21:31)
"But it wouldn't be very modular or reusable" I agree, that's why I was wondering if it existed inside numpy. "It's generally recommended to use the given building blocks for quick development. Once it's working then worry about improving the speed of time critical steps.", basically if I had a use case where memory is limiting you would advice to rewriting the code in Cython or any lower level language?
– cglacet
Mar 26 at 14:20
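On the memory question raised in that comment, a middle ground that stays within numpy's building blocks is to process the array in blocks, so the intermediate boolean arrays stay small and the search can stop early between blocks. A minimal sketch (my own illustration; the function name and block size are arbitrary, not from the thread):
import numpy as np

def has_even_row_blocked(x, block=4096):
    # Only one block's worth of intermediate arrays exists at a time,
    # and we stop as soon as a block contains an all-even row.
    for start in range(0, x.shape[0], block):
        chunk = x[start:start + block]
        if np.any(np.all(chunk % 2 == 0, axis=1)):
            return True
    return False

x = np.array([[1, 2], [2, 4], [3, 6]])
print(has_even_row_blocked(x))  # True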
The numpy library doesn't give you very many tools to use some of the conventional Python protocols because it is focused on performance within a narrow domain (numeric computation). The whole purpose of numpy is to do numeric operations that are slow in pure Python much more quickly (close to your hardware's maximum speed, like code written in a lower-level language like C) without losing all of the benefits of Python (like garbage collection and easy-to-read syntax).
The downside to focusing on a narrow domain is that you lose some benefits of more general code. So your for loop code can do less work than numpy does, because it can short-circuit, breaking out of the iteration as soon as the result is known. It doesn't need to do the modulus for every row if it has already found the result it needs.
But I suspect that if you test it, your numpy code may still be faster a lot of the time (test on real data, not trivial stuff like in your example)! Even though it computes a whole bunch of intermediate results up front, the low-level operations are so much faster than the equivalent in pure Python that it doesn't matter that it has to iterate over the whole array. – Blckknght (answered Mar 24 at 22:07)
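A quick way to run the comparison suggested above, as a rough sketch (the array shape and repeat count are arbitrary choices for illustration, not from the answer):
import numpy as np
import timeit

x = np.random.randint(0, 100, size=(20_000, 20))

def vectorized(x):
    # builds intermediate arrays, but every step is compiled
    return bool(np.any(np.all(x % 2 == 0, axis=1)))

def looped(x):
    # short-circuits, but pays Python-level overhead for each row
    for row in x:
        if np.all(row % 2 == 0):
            return True
    return False

print("vectorized:", timeit.timeit(lambda: vectorized(x), number=5))
print("looped:    ", timeit.timeit(lambda: looped(x), number=5))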
I'll surely try to compare time performances and come back here :). But that wouldn't really be a sufficient reason for not having a way to have iterators, there probably is a memory-speed tradeoff here. Unless I'm missing something. – cglacet, Mar 26 at 14:14
Well, I guess I just don't understand exactly what you're expecting. Numpy arrays are iterable, so you can write normal Python code to operate on them (though it may not be as convenient or even as fast as using normal Python data structures). Many numpy functions only work on arrays, rather than iterables, and the reason for that is that their performance benefits are only available for arrays, not for arbitrary objects. – Blckknght, Mar 27 at 0:23
"and the reason for that is that their performance benefits are only available for arrays" that's the part I really don't understand, from what I understand numpy takes advantage of static typing (together with type homogeneous structures) to speedup things and save memory. What I fail to understand is why this can't be used to build some other (statically typed) set of functions that instead of having arrays as both input and output would have arrays as input and iterators as output (a custom kind of iterator since it wouldn't iterate over arbitrary object, but instead over a given type).
– cglacet
Mar 27 at 7:33
Testing this is a bit hard as it requires re-writing some parts of numpy, but I'll try to in the near future if nobody tells me it's just not possible because of some reason I fail to see for now (maybe because I have an over-simplified vision of how numpy works). – cglacet, Mar 27 at 7:37
The iterator protocol isn't that specific. You can't really have a function that only accepts one kind of iterator and say that's using the iterator protocol. You either call next on the arbitrary iterator object you've been given (which is slow, since it does a Python function call and might run arbitrary Python code), or you don't. – Blckknght, Mar 27 at 7:40