Python how to add multiple arrays with different length into one
I am working on a program that needs to mix audio arrays together at given starting indices. For example:
signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1, signal2, signal3])
onset = np.array([0, 2, 8])
result = mixing_function(sig, onset)
Based on the onsets, signal2 is added starting at index 2 and signal3 is added starting at index 8, with the gap before it zero padded. It should return:
[1,2,8,9,5,0,0,0,7,7,7,7]
I am not sure what the most efficient way to write this is. For now, I create a zero array of the maximum required length maxlen, then add each element of sig to the corresponding index range of the result:
def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result
However, this can be quite slow, especially when many signals with different onsets are mixed together. Please advise if there is a much more efficient way.
Many thanks,
J
Tags: python numpy
– J_yang (asked Mar 28 at 19:16, edited Mar 28 at 19:21)
I don't see any obvious way of speeding this up, other than that you could numba-compile it. And don't wrap your list of arrays sig into a numpy array; just keep it a list of arrays. – j08lue, Mar 28 at 19:34
"addition" in the context of arrays is ambiguous. For instance, it can mean appending. In this case, you seem to mean element-wise addition. Also, normal English is "add A and B" or "add A to B", not "B adds A".
– Acccumulation
Mar 28 at 20:16
Padding with zeros (or something else) comes up periodically. There are good answers in Convert Python sequence to NumPy array, filling missing values, including a clean version of the mask approach, and also one using itertools.zip_longest. – hpaulj, Mar 29 at 2:43
4 Answers
Here are some stats for different solutions to the problem. I was able to squeeze out a little more performance by vectorizing the computation of maxlen, but beyond that, I think you will have to try Cython or another language.
import numpy as np
from numba import jit
from time import time

np.random.seed(42)

def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result

def mix(sig, onset):
    siglengths = np.vectorize(len)(sig)
    maxlen = max(onset + siglengths)
    result = np.zeros(maxlen)
    for i in range(len(sig)):
        result[onset[i]: onset[i] + siglengths[i]] += sig[i]
    return result

@jit(nopython=True)
def mixnumba(sig, onset):
    # maxlen = np.max([onset[i] + len(sig[i]) for i in range(len(sig))])
    maxlen = -1
    for i in range(len(sig)):
        maxlen = max(maxlen, sig[i].size + onset[i])
    result = np.zeros(maxlen)
    for i in range(len(sig)):
        result[onset[i]: onset[i] + sig[i].size] += sig[i]
    return result

def signal_adder_with_onset(data, onset):
    data = np.array(data)
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))
    # Set up output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

sigbig = [np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]
onsetbig = np.random.randint(0, 10000, size=10000)
sigrepeat = np.repeat(sig, 500000).tolist()   # sig and onset as defined in the question
onsetrepeat = np.repeat(onset, 500000)

assert all(mixing_function(sigbig, onsetbig) == mix(sigbig, onsetbig))
assert all(mixing_function(sigbig, onsetbig) == mixnumba(sigbig, onsetbig))
assert all(mixing_function(sigbig, onsetbig) == signal_adder_with_onset(sigbig, onsetbig))

%timeit result = mixing_function(sigbig, onsetbig)
%timeit result = mix(sigbig, onsetbig)
%timeit result = mixnumba(sigbig, onsetbig)
%timeit result = signal_adder_with_onset(sigbig, onsetbig)
# Output
114 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
108 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
368 ms ± 8.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
13.4 s ± 211 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit result = mixing_function(sigrepeat, onsetrepeat)
%timeit result = mix(sigrepeat, onsetrepeat)
%timeit result = mixnumba(sigrepeat, onsetrepeat)
%timeit result = signal_adder_with_onset(sigrepeat.tolist(), onsetrepeat)
# Output
933 ms ± 6.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
803 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.07 s ± 85.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
254 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
TL;DR: marginal performance improvement (around 10% faster) by using np.vectorize to get maxlen, for long signals of random length. Note that for many small signals, @Paritosh Singh's answer performs faster than the others.
– Kevin Liu (answered Mar 28 at 22:27, edited Mar 28 at 22:33)
If you offset the signals and then put them in a DataFrame, NaN will be added to the columns to make all the rows the same length. Then you can do df.sum(). That will return floats rather than ints, however.
– Acccumulation (answered Mar 28 at 20:29)
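A minimal sketch of what this answer seems to mean, assuming each signal becomes a Series indexed from its onset (the reindex fills in gap columns that no signal covers):

import numpy as np
import pandas as pd

sig = [np.array([1, 2, 3, 4]), np.array([5, 5, 5]), np.array([7, 7, 7, 7])]
onset = [0, 2, 8]

maxlen = max(o + len(s) for s, o in zip(sig, onset))
# One row per signal; the shifted index implements the onset, gaps become NaN
rows = [pd.Series(s, index=range(o, o + len(s))) for s, o in zip(sig, onset)]
df = pd.DataFrame(rows).reindex(columns=range(maxlen))
result = df.sum().to_numpy()   # sum() skips NaN, i.e. treats it as 0
print(result)   # [1. 2. 8. 9. 5. 0. 0. 0. 7. 7. 7. 7.]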
This sounds interesting with pandas. Can you give a code example of the offset? – J_yang, Mar 28 at 20:43
Try NumPy zero arrays of equal length, with the signals inserted at the appropriate offsets, and simply perform three array additions. That should speed things up considerably.
def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    sig1 = np.zeros(maxlen)
    sig2 = np.zeros(maxlen)
    sig3 = np.zeros(maxlen)
    sig1[onset[0]:onset[0] + len(sig[0])] = sig[0]
    sig2[onset[1]:onset[1] + len(sig[1])] = sig[1]
    sig3[onset[2]:onset[2] + len(sig[2])] = sig[2]
    result = sig1 + sig2 + sig3
    print(sig1)
    print(sig2)
    print(sig3)
    print(result)

– thatNLPguy (answered Mar 28 at 20:38)
The code above is just an example; in practice, sig might contain dozens to hundreds of items, so I still can't get away with a for loop, which would be essentially the same. – J_yang, Mar 28 at 20:42
Ah, yes. Probably doesn't scale well. But if numpy additions aren't doing it for you, I'm not sure what will. – thatNLPguy, Mar 28 at 20:46
Here's an attempt that should do the trick.
import numpy as np

def signal_adder_with_onset(data, onset):
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))
    # Set up output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1, signal2, signal3])
onset = np.array((0, 2, 8))
result = signal_adder_with_onset(sig, onset)
print(result)
# [1 2 8 9 5 0 0 0 7 7 7 7]
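For intuition, here is a small sketch of what the mask in this answer looks like for the example input (one row per signal, True marking where each signal lands, shown as 1):

import numpy as np

onset = np.array([0, 2, 8])
lens = np.array([4, 3, 4])   # lengths of signal1..signal3
n = (lens + onset).max()
mask = ((np.arange(n) >= onset.reshape(-1, 1))
        & (np.arange(n) < (lens + onset).reshape(-1, 1)))
print(mask.astype(int))
# [[1 1 1 1 0 0 0 0 0 0 0 0]
#  [0 0 1 1 1 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 1 1 1 1]]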
Edit: vectorized operations only pay off with more data; they are slower with smaller amounts of data. Added a timing comparison:
import time
import numpy as np

def signal_adder_with_onset(data, onset):
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))
    # Set up output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result

signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1, signal2, signal3])
sig = np.repeat(sig, 1000000)
onset = np.array((0, 2, 8))
onset = np.repeat(onset, 1000000)

start1 = time.time()
result = signal_adder_with_onset(sig, onset)
end1 = time.time()
start2 = time.time()
result2 = mixing_function(sig, onset)
end2 = time.time()
print(f"Original function: {end2 - start2} \nVectorized function: {end1 - start1}")
print(result)
# Output:
Original function: 9.28258752822876
Vectorized function: 2.5798118114471436
[1000000 2000000 8000000 9000000 5000000 0 0 0 7000000 7000000 7000000 7000000]

– Paritosh Singh (answered Mar 28 at 20:20, edited Mar 28 at 20:49)
This code is actually much slower than the already proposed code in the OP. – Kevin Liu, Mar 28 at 20:34
Well, actually it is about 5 times slower with this method, I am afraid. – J_yang, Mar 28 at 20:34
It does the trick, but is it really faster? I checked, and for me this method works slower. – Ardweaden, Mar 28 at 20:34
Well, using a different dataset I got very different results: sig = np.array([np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]) and onset = np.random.randint(0, 10000, size=10000) give Original function: 0.156998872756958, Vectorized function: 14.857199907302856. I think long signals of varying length are a much more realistic scenario than one million tiny signals, but I guess only the OP can determine what kind of data he expects. – Kevin Liu, Mar 28 at 21:19