Python how to add multiple arrays with different length into one
I am working on a program that needs to mix audio arrays together at given starting indices. For example:
signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1, signal2, signal3])
onset = np.array([0, 2, 8])
result = mixing_function(sig, onset)
Based on the onsets, signal2 is added starting at index 2 and signal3 is added starting at index 8, with the gap before it zero padded. It should return:
[1,2,8,9,5,0,0,0,7,7,7,7]
I am not sure what the most efficient way to write this is. For now, I create a zero array of the maximum required length maxlen, then add each element of sig to the corresponding index range of the result:
def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result
However, this can be quite slow, especially when many signals with different onsets are mixed together. Please advise if there is a much more efficient way.
Many thanks,
J
Tags: python numpy
– J_yang (asked Mar 28 at 19:16, edited Mar 28 at 19:21)
I don't see any obvious way of speeding this up, other than that you could numba-compile it. And don't wrap your list of arrays sig into a numpy array; just keep it a list of arrays. – j08lue, Mar 28 at 19:34
"addition" in the context of arrays is ambiguous. For instance, it can mean appending. In this case, you seem to mean element-wise addition. Also, normal English is "add A and B" or "add A to B", not "B adds A".
– Acccumulation
Mar 28 at 20:16
Padding with zeros (or something else) comes up periodically. There are good answers in Convert Python sequence to NumPy array, filling missing values, including a clean version of the mask approach, and also one using itertools.zip_longest. – hpaulj, Mar 29 at 2:43
4 Answers
Here are some stats for different solutions to the problem. I was able to squeeze out a little more performance by vectorizing the computation of maxlen, but beyond that, I think you will have to try Cython or another language.
import numpy as np
from numba import jit
from time import time

np.random.seed(42)

def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result

def mix(sig, onset):
    siglengths = np.vectorize(len)(sig)
    maxlen = max(onset + siglengths)
    result = np.zeros(maxlen)
    for i in range(len(sig)):
        result[onset[i]: onset[i] + siglengths[i]] += sig[i]
    return result

@jit(nopython=True)
def mixnumba(sig, onset):
    # maxlen = np.max([onset[i] + len(sig[i]) for i in range(len(sig))])
    maxlen = -1
    for i in range(len(sig)):
        maxlen = max(maxlen, sig[i].size + onset[i])
    result = np.zeros(maxlen)
    for i in range(len(sig)):
        result[onset[i]: onset[i] + sig[i].size] += sig[i]
    return result

def signal_adder_with_onset(data, onset):
    data = np.array(data)
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))
    # Set up output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

sigbig = [np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]
onsetbig = np.random.randint(0, 10000, size=10000)
sigrepeat = np.repeat(sig, 500000).tolist()   # sig and onset as defined in the question
onsetrepeat = np.repeat(onset, 500000)

assert all(mixing_function(sigbig, onsetbig) == mix(sigbig, onsetbig))
assert all(mixing_function(sigbig, onsetbig) == mixnumba(sigbig, onsetbig))
assert all(mixing_function(sigbig, onsetbig) == signal_adder_with_onset(sigbig, onsetbig))

%timeit result = mixing_function(sigbig, onsetbig)
%timeit result = mix(sigbig, onsetbig)
%timeit result = mixnumba(sigbig, onsetbig)
%timeit result = signal_adder_with_onset(sigbig, onsetbig)
# Output
114 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
108 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
368 ms ± 8.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
13.4 s ± 211 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit result = mixing_function(sigrepeat, onsetrepeat)
%timeit result = mix(sigrepeat, onsetrepeat)
%timeit result = mixnumba(sigrepeat, onsetrepeat)
%timeit result = signal_adder_with_onset(sigrepeat.tolist(), onsetrepeat)
# Output
933 ms ± 6.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
803 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.07 s ± 85.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
254 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
TL;DR: marginal performance improvement (around 10% faster) by using np.vectorize to get maxlen, for long signals of random length. Note that for many small signals, @Paritosh Singh's answer performs faster than the others.
– Kevin Liu (answered Mar 28 at 22:27, edited Mar 28 at 22:33)
If you offset the signals and then put them in a DataFrame, NaN will be added to the columns to make all the rows the same length. Then you can do df.sum(). That will return floats rather than ints, however.
– Acccumulation (answered Mar 28 at 20:29)
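A minimal sketch of what this answer seems to mean, assuming each signal becomes a Series indexed from its onset (the reindex fills in gap columns that no signal covers):

import numpy as np
import pandas as pd

sig = [np.array([1, 2, 3, 4]), np.array([5, 5, 5]), np.array([7, 7, 7, 7])]
onset = [0, 2, 8]

maxlen = max(o + len(s) for s, o in zip(sig, onset))
# One row per signal; the shifted index implements the onset, gaps become NaN
rows = [pd.Series(s, index=range(o, o + len(s))) for s, o in zip(sig, onset)]
df = pd.DataFrame(rows).reindex(columns=range(maxlen))
result = df.sum().to_numpy()   # sum() skips NaN, i.e. treats it as 0
print(result)   # [1. 2. 8. 9. 5. 0. 0. 0. 7. 7. 7. 7.]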
This sounds interesting with pandas. Can you give a code example of the offset? – J_yang, Mar 28 at 20:43
Try NumPy zero arrays of equal length, with the signals inserted at the appropriate offsets, and simply perform three array additions. That should speed things up considerably.
def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    sig1 = np.zeros(maxlen)
    sig2 = np.zeros(maxlen)
    sig3 = np.zeros(maxlen)
    sig1[onset[0]:onset[0] + len(sig[0])] = sig[0]
    sig2[onset[1]:onset[1] + len(sig[1])] = sig[1]
    sig3[onset[2]:onset[2] + len(sig[2])] = sig[2]
    result = sig1 + sig2 + sig3
    print(sig1)
    print(sig2)
    print(sig3)
    print(result)

– thatNLPguy (answered Mar 28 at 20:38)
The code above is just an example; in practice, sig might contain dozens to hundreds of items, so I still can't get away with a for loop, which would be essentially the same. – J_yang, Mar 28 at 20:42
Ah, yes. Probably doesn't scale well. But if numpy additions aren't doing it for you, I'm not sure what will. – thatNLPguy, Mar 28 at 20:46
Here's an attempt that should do the trick.
import numpy as np

def signal_adder_with_onset(data, onset):
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))
    # Set up output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1, signal2, signal3])
onset = np.array((0, 2, 8))
result = signal_adder_with_onset(sig, onset)
print(result)
# [1 2 8 9 5 0 0 0 7 7 7 7]
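For intuition, here is a small sketch of what the mask in this answer looks like for the example input (one row per signal, True marking where each signal lands, shown as 1):

import numpy as np

onset = np.array([0, 2, 8])
lens = np.array([4, 3, 4])   # lengths of signal1..signal3
n = (lens + onset).max()
mask = ((np.arange(n) >= onset.reshape(-1, 1))
        & (np.arange(n) < (lens + onset).reshape(-1, 1)))
print(mask.astype(int))
# [[1 1 1 1 0 0 0 0 0 0 0 0]
#  [0 0 1 1 1 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 1 1 1 1]]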
Edit: vectorized operations only pay off with more data; they are slower with smaller amounts of data. Added a timing comparison:
import time
import numpy as np

def signal_adder_with_onset(data, onset):
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))
    # Set up output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result

signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1, signal2, signal3])
sig = np.repeat(sig, 1000000)
onset = np.array((0, 2, 8))
onset = np.repeat(onset, 1000000)

start1 = time.time()
result = signal_adder_with_onset(sig, onset)
end1 = time.time()
start2 = time.time()
result2 = mixing_function(sig, onset)
end2 = time.time()
print(f"Original function: {end2 - start2} \nVectorized function: {end1 - start1}")
print(result)
# Output:
Original function: 9.28258752822876
Vectorized function: 2.5798118114471436
[1000000 2000000 8000000 9000000 5000000 0 0 0 7000000 7000000 7000000 7000000]

– Paritosh Singh (answered Mar 28 at 20:20, edited Mar 28 at 20:49)
This code is actually much slower than the already proposed code in the OP. – Kevin Liu, Mar 28 at 20:34
Well, actually it is about 5 times slower with this method, I am afraid. – J_yang, Mar 28 at 20:34
It does the trick, but is it really faster? I checked, and for me this method works slower. – Ardweaden, Mar 28 at 20:34
Well, using a different dataset I got very different results: sig = np.array([np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]) and onset = np.random.randint(0, 10000, size=10000) give Original function: 0.156998872756958, Vectorized function: 14.857199907302856. I think long signals of varying length are a much more realistic scenario than one million tiny signals, but I guess only the OP can determine what kind of data he expects. – Kevin Liu, Mar 28 at 21:19