Python how to add multiple arrays with different length into one


I am working on a program that needs to mix audio arrays together, each starting at a given index. For example:



signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1, signal2, signal3])
onset = np.array([0, 2, 8])
result = mixing_function(sig, onset)


Based on the onsets, signal2 is added to signal1 starting at index 2, and signal3 is added to the mix starting at index 8, with zero padding wherever no signal covers a position. It should return:



[1,2,8,9,5,0,0,0,7,7,7,7]


I am not sure what the most effective way to write this is. For now, I create a zero array of the maximum required length maxlen, then add each signal in sig into the corresponding index range of the result:



def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result


However, this can be quite slow, especially when there are many signals being mixed together, all with different onsets. Please advise if there is a much more efficient way.



Many thanks



J












Tags: python, numpy






asked Mar 28 at 19:16 by J_yang, edited Mar 28 at 19:21










Comments:

• I don't see any obvious way of speeding this up. Just that you could numba-compile it. And don't wrap your list of arrays in sig into a numpy array. Just keep it a list of arrays. – j08lue, Mar 28 at 19:34

• "Addition" in the context of arrays is ambiguous. For instance, it can mean appending. In this case, you seem to mean element-wise addition. Also, normal English is "add A and B" or "add A to B", not "B adds A". – Acccumulation, Mar 28 at 20:16

• Padding with zeros (or something else) comes up periodically. There are good answers in "Convert Python sequence to NumPy array, filling missing values", including a clean version of the mask, and also one using itertools.zip_longest. – hpaulj, Mar 29 at 2:43
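As a concrete illustration of the zip_longest idea hpaulj mentions, here is a minimal sketch (an editor's illustration, not code from the thread; mix_zip_longest is a made-up name): each signal is shifted by prepending zeros, all rows are padded to the longest length, and the columns are summed.

import numpy as np
from itertools import zip_longest

def mix_zip_longest(sig, onset):
    # Shift each signal by prepending onset[i] zeros.
    shifted = [np.concatenate([np.zeros(o), s]) for o, s in zip(onset, sig)]
    # zip_longest pads the shorter rows with 0.0, yielding one tuple per column.
    columns = zip_longest(*shifted, fillvalue=0.0)
    return np.array([sum(col) for col in columns])

signals = [np.array([1, 2, 3, 4]), np.array([5, 5, 5]), np.array([7, 7, 7, 7])]
print(mix_zip_longest(signals, [0, 2, 8]))
# [1. 2. 8. 9. 5. 0. 0. 0. 7. 7. 7. 7.]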













4 Answers

Here are some stats for different solutions to the problem. I was able to squeeze out a little more performance by vectorizing the computation of maxlen, but beyond that, I think you will have to try Cython or another programming language.



import numpy as np
from numba import jit
from time import time
np.random.seed(42)

def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result

def mix(sig, onset):
    siglengths = np.vectorize(len)(sig)
    maxlen = max(onset + siglengths)
    result = np.zeros(maxlen)
    for i in range(len(sig)):
        result[onset[i]: onset[i] + siglengths[i]] += sig[i]
    return result

@jit(nopython=True)
def mixnumba(sig, onset):
    # maxlen = np.max([onset[i] + len(sig[i]) for i in range(len(sig))])
    maxlen = -1
    for i in range(len(sig)):
        maxlen = max(maxlen, sig[i].size + onset[i])
    result = np.zeros(maxlen)
    for i in range(len(sig)):
        result[onset[i]: onset[i] + sig[i].size] += sig[i]
    return result

def signal_adder_with_onset(data, onset):
    data = np.array(data)
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

    # Set up output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

# sig and onset are the small example arrays from the question, restated here
# so that the repeat benchmark below is self-contained.
sig = np.array([np.array([1,2,3,4]), np.array([5,5,5]), np.array([7,7,7,7])], dtype=object)
onset = np.array([0, 2, 8])

sigbig = [np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]
onsetbig = np.random.randint(0, 10000, size=10000)
sigrepeat = np.repeat(sig, 500000).tolist()
onsetrepeat = np.repeat(onset, 500000)

assert all(mixing_function(sigbig, onsetbig) == mix(sigbig, onsetbig))
assert all(mixing_function(sigbig, onsetbig) == mixnumba(sigbig, onsetbig))
assert all(mixing_function(sigbig, onsetbig) == signal_adder_with_onset(sigbig, onsetbig))

%timeit result = mixing_function(sigbig, onsetbig)
%timeit result = mix(sigbig, onsetbig)
%timeit result = mixnumba(sigbig, onsetbig)
%timeit result = signal_adder_with_onset(sigbig, onsetbig)
# Output:
# 114 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 108 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 368 ms ± 8.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 13.4 s ± 211 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit result = mixing_function(sigrepeat, onsetrepeat)
%timeit result = mix(sigrepeat, onsetrepeat)
%timeit result = mixnumba(sigrepeat, onsetrepeat)
%timeit result = signal_adder_with_onset(sigrepeat, onsetrepeat)  # sigrepeat is already a list
# Output:
# 933 ms ± 6.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 803 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 4.07 s ± 85.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 254 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


TL;DR: marginal performance improvement (around 10% faster) from using np.vectorize to get maxlen for long signals of random length. Note that for many small signals, @Paritosh Singh's answer performs faster than the others.






answered Mar 28 at 22:27 by Kevin Liu, edited Mar 28 at 22:33

If you offset the signals and then put them in a DataFrame, NaN will be added to pad the columns so that all the rows are the same length. Then you can do df.sum(), which skips NaN. It will return floats rather than ints, however.






answered Mar 28 at 20:29 by Acccumulation

• this sounds interesting with pandas. can you give a code example of the offset? – J_yang, Mar 28 at 20:43
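Since no code was posted for this answer, here is one way the DataFrame idea might look; this is a sketch of the approach described above, not the author's implementation. The reindex step is needed because the DataFrame only gets columns that at least one signal touches, so the untouched gap (indices 5..7 here) must be filled back in with zeros.

import numpy as np
import pandas as pd

signals = [np.array([1, 2, 3, 4]), np.array([5, 5, 5]), np.array([7, 7, 7, 7])]
onsets = [0, 2, 8]

# One row per signal; shifting each Series' integer index by its onset makes
# pandas align the rows and pad the uncovered cells with NaN.
df = pd.DataFrame([pd.Series(s, index=range(o, o + len(s)))
                   for s, o in zip(signals, onsets)])

# df.sum() skips NaN; reindex restores the gap columns with 0.
total = df.sum().reindex(range(int(df.columns.max()) + 1), fill_value=0.0)
print(total.to_numpy())
# [1. 2. 8. 9. 5. 0. 0. 0. 7. 7. 7. 7.]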


















Try numpy zero arrays of equal length, with the signals inserted at the appropriate offsets, and simply perform 3 numpy array additions. That should speed things up considerably.



import numpy as np

def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    sig1 = np.zeros(maxlen)
    sig2 = np.zeros(maxlen)
    sig3 = np.zeros(maxlen)
    sig1[onset[0]:onset[0] + len(sig[0])] = sig[0]
    sig2[onset[1]:onset[1] + len(sig[1])] = sig[1]
    sig3[onset[2]:onset[2] + len(sig[2])] = sig[2]
    result = sig1 + sig2 + sig3
    print(sig1)
    print(sig2)
    print(sig3)
    print(result)





answered Mar 28 at 20:38 by thatNLPguy

• the code above is just an example; in practice, sig might contain dozens to hundreds of items, so I still can't get away from a for loop, which will be essentially the same. – J_yang, Mar 28 at 20:42












• Ah, yes. Probably doesn't scale well. But if numpy additions aren't doing it for you, I'm not sure what will. – thatNLPguy, Mar 28 at 20:46
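Since the objection above is the hard-coded three signals, one fully vectorized alternative (an editor's sketch, not from the thread) is to build the flat target index of every sample and let np.bincount do the scatter-add; whether it actually beats the original loop depends on the data, so it is worth benchmarking.

import numpy as np

def mix_bincount(sig, onset):
    # Target index of every sample across all signals.
    idx = np.concatenate([np.arange(o, o + len(s)) for o, s in zip(onset, sig)])
    vals = np.concatenate(sig)
    # bincount sums vals at each index in one vectorized pass;
    # indices nothing maps to come out as 0.
    return np.bincount(idx, weights=vals)

signals = [np.array([1, 2, 3, 4]), np.array([5, 5, 5]), np.array([7, 7, 7, 7])]
print(mix_bincount(signals, [0, 2, 8]))
# [1. 2. 8. 9. 5. 0. 0. 0. 7. 7. 7. 7.]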


















    Here's an attempt that should do the trick.



import numpy as np

def signal_adder_with_onset(data, onset):
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

    # Set up output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1, signal2, signal3])
onset = np.array((0, 2, 8))
result = signal_adder_with_onset(sig, onset)
print(result)
# [1 2 8 9 5 0 0 0 7 7 7 7]



    Edit: Vectorized operations only kick in with more data, and are slower with smaller amounts of data.



    Added for comparison



import time
import numpy as np

def signal_adder_with_onset(data, onset):
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

    # Set up output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result

signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1, signal2, signal3])
sig = np.repeat(sig, 1000000)
onset = np.array((0, 2, 8))
onset = np.repeat(onset, 1000000)

start1 = time.time()
result = signal_adder_with_onset(sig, onset)
end1 = time.time()
start2 = time.time()
result2 = mixing_function(sig, onset)
end2 = time.time()
print(f"Original function: {end2 - start2}\nVectorized function: {end1 - start1}")
print(result)
# Output:
# Original function: 9.28258752822876
# Vectorized function: 2.5798118114471436
# [1000000 2000000 8000000 9000000 5000000 0 0 0 7000000 7000000 7000000
#  7000000]





answered by Paritosh Singh

• This code is actually much slower than the already proposed code in the OP. – Kevin Liu, Mar 28 at 20:34

• well actually it is about 5 times slower with this method, I am afraid. – J_yang, Mar 28 at 20:34

• It does the trick, but is it really faster? I checked, and for me this method works slower. – Ardweaden, Mar 28 at 20:34

• Well, using a different dataset I got very different results: sig = np.array([np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]); onset = np.random.randint(0, 10000, size=10000) gives: Original function: 0.156998872756958, Vectorized function: 14.857199907302856. I think long signals of varying length is a much more realistic scenario than one million tiny signals, but I guess only the OP can determine what kind of data he expects. – Kevin Liu, Mar 28 at 21:19













    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );














    draft saved

    draft discarded
















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55405295%2fpython-how-to-add-multiple-arrays-with-different-length-into-one%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1
















    Here are some stats for different solutions to the problem. I was able to squeeze a little more performance by vectorizing the implementation to get maxlen, but besides that, I think you will have to try cython or trying other programming languages.



    import numpy as np
    from numba import jit
    from time import time
    np.random.seed(42)

    def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
    result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result

    def mix(sig, onset):
    siglengths = np.vectorize(len)(sig)
    maxlen = max(onset + siglengths)
    result = np.zeros(maxlen)
    for i in range(len(sig)):
    result[onset[i]: onset[i]+siglengths[i]] += sig[i]
    return result

    @jit(nopython=True)
    def mixnumba(sig, onset):
    # maxlen = np.max([onset[i] + len(sig[i]) for i in range(len(sig))])
    maxlen = -1
    for i in range(len(sig)):
    maxlen = max(maxlen, sig[i].size + onset[i])
    result = np.zeros(maxlen)
    for i in range(len(sig)):
    result[onset[i]: onset[i] + sig[i].size] += sig[i]
    return result

    def signal_adder_with_onset(data, onset):
    data = np.array(data)
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    #adjust with offset for max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
    & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

    # Setup output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype) #could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

    sigbig = [np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]
    onsetbig = np.random.randint(0, 10000, size=10000)
    sigrepeat = np.repeat(sig, 500000).tolist()
    onsetrepeat = np.repeat(onset, 500000)

    assert all(mixing_function(sigbig, onsetbig) == mix(sigbig, onsetbig))
    assert all(mixing_function(sigbig, onsetbig) == mixnumba(sigbig, onsetbig))
    assert all(mixing_function(sigbig, onsetbig) == signal_adder_with_onset(sigbig, onsetbig))

    %timeit result = mixing_function(sigbig, onsetbig)
    %timeit result = mix(sigbig, onsetbig)
    %timeit result = mixnumba(sigbig, onsetbig)
    %timeit result = signal_adder_with_onset(sigbig, onsetbig)
    # Output
    114 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    108 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    368 ms ± 8.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    13.4 s ± 211 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit result = mixing_function(sigrepeat, onsetrepeat)
    %timeit result = mix(sigrepeat, onsetrepeat)
    %timeit result = mixnumba(sigrepeat, onsetrepeat)
    %timeit result = signal_adder_with_onset(sigrepeat.tolist(), onsetrepeat)
    # Output
    933 ms ± 6.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    803 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    4.07 s ± 85.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    254 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


    TL.DR.
    Marginal performance improvement (around 10% faster) by using np.vectorize in order to get maxlen for long signals of random length. Note that for many small signals, @Paritosh Singh answer performs faster than the others.






    share|improve this answer































      1
















      Here are some stats for different solutions to the problem. I was able to squeeze a little more performance by vectorizing the implementation to get maxlen, but besides that, I think you will have to try cython or trying other programming languages.



      import numpy as np
      from numba import jit
      from time import time
      np.random.seed(42)

      def mixing_function(sig, onset):
      maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
      result = np.zeros(maxlen)
      for i in range(len(onset)):
      result[onset[i]:onset[i] + len(sig[i])] += sig[i]
      return result

      def mix(sig, onset):
      siglengths = np.vectorize(len)(sig)
      maxlen = max(onset + siglengths)
      result = np.zeros(maxlen)
      for i in range(len(sig)):
      result[onset[i]: onset[i]+siglengths[i]] += sig[i]
      return result

      @jit(nopython=True)
      def mixnumba(sig, onset):
      # maxlen = np.max([onset[i] + len(sig[i]) for i in range(len(sig))])
      maxlen = -1
      for i in range(len(sig)):
      maxlen = max(maxlen, sig[i].size + onset[i])
      result = np.zeros(maxlen)
      for i in range(len(sig)):
      result[onset[i]: onset[i] + sig[i].size] += sig[i]
      return result

      def signal_adder_with_onset(data, onset):
      data = np.array(data)
      # Get lengths of each row of data
      lens = np.array([len(i) for i in data])
      #adjust with offset for max possible lengths
      max_size = lens + onset
      # Mask of valid places in each row
      mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
      & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

      # Setup output array and put elements from data into masked positions
      out = np.zeros(mask.shape, dtype=data.dtype) #could perhaps change dtype here
      out[mask] = np.concatenate(data)
      return out.sum(axis=0)

      sigbig = [np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]
      onsetbig = np.random.randint(0, 10000, size=10000)
      sigrepeat = np.repeat(sig, 500000).tolist()
      onsetrepeat = np.repeat(onset, 500000)

      assert all(mixing_function(sigbig, onsetbig) == mix(sigbig, onsetbig))
      assert all(mixing_function(sigbig, onsetbig) == mixnumba(sigbig, onsetbig))
      assert all(mixing_function(sigbig, onsetbig) == signal_adder_with_onset(sigbig, onsetbig))

      %timeit result = mixing_function(sigbig, onsetbig)
      %timeit result = mix(sigbig, onsetbig)
      %timeit result = mixnumba(sigbig, onsetbig)
      %timeit result = signal_adder_with_onset(sigbig, onsetbig)
      # Output
      114 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
      108 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
      368 ms ± 8.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      13.4 s ± 211 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

      %timeit result = mixing_function(sigrepeat, onsetrepeat)
      %timeit result = mix(sigrepeat, onsetrepeat)
      %timeit result = mixnumba(sigrepeat, onsetrepeat)
      %timeit result = signal_adder_with_onset(sigrepeat.tolist(), onsetrepeat)
      # Output
      933 ms ± 6.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      803 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      4.07 s ± 85.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      254 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


      TL.DR.
      Marginal performance improvement (around 10% faster) by using np.vectorize in order to get maxlen for long signals of random length. Note that for many small signals, @Paritosh Singh answer performs faster than the others.






      share|improve this answer





























        1














        1










        1









        Here are some stats for different solutions to the problem. I was able to squeeze a little more performance by vectorizing the implementation to get maxlen, but besides that, I think you will have to try cython or trying other programming languages.



        import numpy as np
        from numba import jit
        from time import time
        np.random.seed(42)

        def mixing_function(sig, onset):
        maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
        result = np.zeros(maxlen)
        for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
        return result

        def mix(sig, onset):
        siglengths = np.vectorize(len)(sig)
        maxlen = max(onset + siglengths)
        result = np.zeros(maxlen)
        for i in range(len(sig)):
        result[onset[i]: onset[i]+siglengths[i]] += sig[i]
        return result

        @jit(nopython=True)
        def mixnumba(sig, onset):
        # maxlen = np.max([onset[i] + len(sig[i]) for i in range(len(sig))])
        maxlen = -1
        for i in range(len(sig)):
        maxlen = max(maxlen, sig[i].size + onset[i])
        result = np.zeros(maxlen)
        for i in range(len(sig)):
        result[onset[i]: onset[i] + sig[i].size] += sig[i]
        return result

        def signal_adder_with_onset(data, onset):
        data = np.array(data)
        # Get lengths of each row of data
        lens = np.array([len(i) for i in data])
        #adjust with offset for max possible lengths
        max_size = lens + onset
        # Mask of valid places in each row
        mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
        & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

        # Setup output array and put elements from data into masked positions
        out = np.zeros(mask.shape, dtype=data.dtype) #could perhaps change dtype here
        out[mask] = np.concatenate(data)
        return out.sum(axis=0)

        sigbig = [np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]
        onsetbig = np.random.randint(0, 10000, size=10000)
        sigrepeat = np.repeat(sig, 500000).tolist()
        onsetrepeat = np.repeat(onset, 500000)

        assert all(mixing_function(sigbig, onsetbig) == mix(sigbig, onsetbig))
        assert all(mixing_function(sigbig, onsetbig) == mixnumba(sigbig, onsetbig))
        assert all(mixing_function(sigbig, onsetbig) == signal_adder_with_onset(sigbig, onsetbig))

        %timeit result = mixing_function(sigbig, onsetbig)
        %timeit result = mix(sigbig, onsetbig)
        %timeit result = mixnumba(sigbig, onsetbig)
        %timeit result = signal_adder_with_onset(sigbig, onsetbig)
        # Output
        114 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        108 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        368 ms ± 8.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        13.4 s ± 211 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

        %timeit result = mixing_function(sigrepeat, onsetrepeat)
        %timeit result = mix(sigrepeat, onsetrepeat)
        %timeit result = mixnumba(sigrepeat, onsetrepeat)
        %timeit result = signal_adder_with_onset(sigrepeat.tolist(), onsetrepeat)
        # Output
        933 ms ± 6.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        803 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        4.07 s ± 85.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        254 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


        TL.DR.
        Marginal performance improvement (around 10% faster) by using np.vectorize in order to get maxlen for long signals of random length. Note that for many small signals, @Paritosh Singh answer performs faster than the others.






        share|improve this answer















        Here are some stats for different solutions to the problem. I was able to squeeze a little more performance by vectorizing the implementation to get maxlen, but besides that, I think you will have to try cython or trying other programming languages.



        import numpy as np
        from numba import jit
        from time import time
        np.random.seed(42)

        def mixing_function(sig, onset):
        maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
        result = np.zeros(maxlen)
        for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
        return result

        def mix(sig, onset):
        siglengths = np.vectorize(len)(sig)
        maxlen = max(onset + siglengths)
        result = np.zeros(maxlen)
        for i in range(len(sig)):
        result[onset[i]: onset[i]+siglengths[i]] += sig[i]
        return result

        @jit(nopython=True)
        def mixnumba(sig, onset):
        # maxlen = np.max([onset[i] + len(sig[i]) for i in range(len(sig))])
        maxlen = -1
        for i in range(len(sig)):
        maxlen = max(maxlen, sig[i].size + onset[i])
        result = np.zeros(maxlen)
        for i in range(len(sig)):
        result[onset[i]: onset[i] + sig[i].size] += sig[i]
        return result

        def signal_adder_with_onset(data, onset):
        data = np.array(data)
        # Get lengths of each row of data
        lens = np.array([len(i) for i in data])
        #adjust with offset for max possible lengths
        max_size = lens + onset
        # Mask of valid places in each row
        mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
        & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

        # Setup output array and put elements from data into masked positions
        out = np.zeros(mask.shape, dtype=data.dtype) #could perhaps change dtype here
        out[mask] = np.concatenate(data)
        return out.sum(axis=0)

        sigbig = [np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]
        onsetbig = np.random.randint(0, 10000, size=10000)
        sigrepeat = np.repeat(sig, 500000).tolist()
        onsetrepeat = np.repeat(onset, 500000)

        assert all(mixing_function(sigbig, onsetbig) == mix(sigbig, onsetbig))
        assert all(mixing_function(sigbig, onsetbig) == mixnumba(sigbig, onsetbig))
        assert all(mixing_function(sigbig, onsetbig) == signal_adder_with_onset(sigbig, onsetbig))

        %timeit result = mixing_function(sigbig, onsetbig)
        %timeit result = mix(sigbig, onsetbig)
        %timeit result = mixnumba(sigbig, onsetbig)
        %timeit result = signal_adder_with_onset(sigbig, onsetbig)
        # Output
        114 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        108 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        368 ms ± 8.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        13.4 s ± 211 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

        %timeit result = mixing_function(sigrepeat, onsetrepeat)
        %timeit result = mix(sigrepeat, onsetrepeat)
        %timeit result = mixnumba(sigrepeat, onsetrepeat)
        %timeit result = signal_adder_with_onset(sigrepeat.tolist(), onsetrepeat)
        # Output
        933 ms ± 6.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        803 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        4.07 s ± 85.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        254 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


        TL.DR.
        Marginal performance improvement (around 10% faster) by using np.vectorize in order to get maxlen for long signals of random length. Note that for many small signals, @Paritosh Singh answer performs faster than the others.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Mar 28 at 22:33

























        answered Mar 28 at 22:27









        Kevin LiuKevin Liu

        12310 bronze badges




        12310 bronze badges


























            0
















            If you offset the signals, then put them in a data frame, NaN will be added to columns to make all the rows the same length. Then you can do df.sum(). That will return a float rather than int, however.






            share|improve this answer

























            • this sounds interesting with pandas. can you give a code example of the offset?

              – J_yang
              Mar 28 at 20:43















            0
















            If you offset the signals, then put them in a data frame, NaN will be added to columns to make all the rows the same length. Then you can do df.sum(). That will return a float rather than int, however.






            share|improve this answer

























            • this sounds interesting with pandas. can you give a code example of the offset?

              – J_yang
              Mar 28 at 20:43













            0














            0










            0









            If you offset the signals, then put them in a data frame, NaN will be added to columns to make all the rows the same length. Then you can do df.sum(). That will return a float rather than int, however.






            share|improve this answer













            If you offset the signals, then put them in a data frame, NaN will be added to columns to make all the rows the same length. Then you can do df.sum(). That will return a float rather than int, however.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Mar 28 at 20:29









            AcccumulationAcccumulation

            1,7821 gold badge3 silver badges9 bronze badges




            1,7821 gold badge3 silver badges9 bronze badges















            • this sounds interesting with pandas. can you give a code example of the offset?

              – J_yang
              Mar 28 at 20:43

















            • this sounds interesting with pandas. can you give a code example of the offset?

              – J_yang
              Mar 28 at 20:43
















            this sounds interesting with pandas. can you give a code example of the offset?

            – J_yang
            Mar 28 at 20:43





            this sounds interesting with pandas. can you give a code example of the offset?

            – J_yang
            Mar 28 at 20:43











            0
















            Try numpy zero arrays of equal length with the signals appropriately inserted and simply performing 3 numpy array additions. Should speed things up considerably.



            def mixing_function(sig,onset):
            maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
            sig1 = np.zeros(maxlen)
            sig2 = np.zeros(maxlen)
            sig3 = np.zeros(maxlen)
            sig1[onset[0]:onset[0] + len(sig[0])] = sig[0]
            sig2[onset[1]:onset[1] + len(sig[1])] = sig[1]
            sig3[onset[2]:onset[2] + len(sig[2])] = sig[2]
            result = sig1+sig2+sig3
            print(sig1)
            print(sig2)
            print(sig3)
            print(result)





            share|improve this answer

























            • the code above is just an example, in practice, sig might contain dozens to hundreds of items though. So still can't get away with a for loop, which will be essentially the same.

              – J_yang
              Mar 28 at 20:42












            • Ah. Yes. Probably doesn't scale well. But if numpy additions aren't doing it for you, I'm not sure what will.

              – thatNLPguy
              Mar 28 at 20:46















            0
















            Try numpy zero arrays of equal length with the signals appropriately inserted and simply performing 3 numpy array additions. Should speed things up considerably.



            def mixing_function(sig,onset):
            maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
            sig1 = np.zeros(maxlen)
            sig2 = np.zeros(maxlen)
            sig3 = np.zeros(maxlen)
            sig1[onset[0]:onset[0] + len(sig[0])] = sig[0]
            sig2[onset[1]:onset[1] + len(sig[1])] = sig[1]
            sig3[onset[2]:onset[2] + len(sig[2])] = sig[2]
            result = sig1+sig2+sig3
            print(sig1)
            print(sig2)
            print(sig3)
            print(result)





            share|improve this answer

























            • the code above is just an example, in practice, sig might contain dozens to hundreds of items though. So still can't get away with a for loop, which will be essentially the same.

              – J_yang
              Mar 28 at 20:42












            • Ah. Yes. Probably doesn't scale well. But if numpy additions aren't doing it for you, I'm not sure what will.

              – thatNLPguy
              Mar 28 at 20:46













            0














            0










            0









            Try numpy zero arrays of equal length with the signals appropriately inserted and simply performing 3 numpy array additions. Should speed things up considerably.



            def mixing_function(sig,onset):
            maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
            sig1 = np.zeros(maxlen)
            sig2 = np.zeros(maxlen)
            sig3 = np.zeros(maxlen)
            sig1[onset[0]:onset[0] + len(sig[0])] = sig[0]
            sig2[onset[1]:onset[1] + len(sig[1])] = sig[1]
            sig3[onset[2]:onset[2] + len(sig[2])] = sig[2]
            result = sig1+sig2+sig3
            print(sig1)
            print(sig2)
            print(sig3)
            print(result)





            share|improve this answer













            Try numpy zero arrays of equal length with the signals appropriately inserted and simply performing 3 numpy array additions. Should speed things up considerably.



            def mixing_function(sig,onset):
            maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
            sig1 = np.zeros(maxlen)
            sig2 = np.zeros(maxlen)
            sig3 = np.zeros(maxlen)
            sig1[onset[0]:onset[0] + len(sig[0])] = sig[0]
            sig2[onset[1]:onset[1] + len(sig[1])] = sig[1]
            sig3[onset[2]:onset[2] + len(sig[2])] = sig[2]
            result = sig1+sig2+sig3
            print(sig1)
            print(sig2)
            print(sig3)
            print(result)






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Mar 28 at 20:38









            thatNLPguythatNLPguy

            1018 bronze badges




            1018 bronze badges















            • the code above is just an example, in practice, sig might contain dozens to hundreds of items though. So still can't get away with a for loop, which will be essentially the same.

              – J_yang
              Mar 28 at 20:42












            • Ah. Yes. Probably doesn't scale well. But if numpy additions aren't doing it for you, I'm not sure what will.

              – thatNLPguy
              Mar 28 at 20:46

















            • the code above is just an example, in practice, sig might contain dozens to hundreds of items though. So still can't get away with a for loop, which will be essentially the same.

              – J_yang
              Mar 28 at 20:42












            • Ah. Yes. Probably doesn't scale well. But if numpy additions aren't doing it for you, I'm not sure what will.

              – thatNLPguy
              Mar 28 at 20:46
















            the code above is just an example, in practice, sig might contain dozens to hundreds of items though. So still can't get away with a for loop, which will be essentially the same.

            – J_yang
            Mar 28 at 20:42






            the code above is just an example, in practice, sig might contain dozens to hundreds of items though. So still can't get away with a for loop, which will be essentially the same.

            – J_yang
            Mar 28 at 20:42














            Ah. Yes. Probably doesn't scale well. But if numpy additions aren't doing it for you, I'm not sure what will.

            – thatNLPguy
            Mar 28 at 20:46





            Ah. Yes. Probably doesn't scale well. But if numpy additions aren't doing it for you, I'm not sure what will.

            – thatNLPguy
            Mar 28 at 20:46











            0
















            Here's an attempt that should do the trick.



            def signal_adder_with_onset(data, onset):
            # Get lengths of each row of data
            lens = np.array([len(i) for i in data])
            #adjust with offset for max possible lengths
            max_size = lens + onset
            # Mask of valid places in each row
            mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

            # Setup output array and put elements from data into masked positions
            out = np.zeros(mask.shape, dtype=data.dtype) #could perhaps change dtype here
            out[mask] = np.concatenate(data)
            return out.sum(axis=0)

            import numpy as np
            signal1 = np.array([1,2,3,4])
            signal2 = np.array([5,5,5])
            signal3 = np.array([7,7,7,7])
            sig = np.array([signal1,signal2,signal3])
            onset = np.array((0, 2, 8))
            result = signal_adder_with_onset(sig, onset)
            print(result)
            #[1 2 8 9 5 0 0 0 7 7 7 7]



            Edit: Vectorized operations only kick in with more data, and are slower with smaller amounts of data.



            Added for comparison



            import time

            def signal_adder_with_onset(data, onset):
            # Get lengths of each row of data
            lens = np.array([len(i) for i in data])
            #adjust with offset for max possible lengths
            max_size = lens + onset
            # Mask of valid places in each row
            mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

            # Setup output array and put elements from data into masked positions
            out = np.zeros(mask.shape, dtype=data.dtype) #could perhaps change dtype here
            out[mask] = np.concatenate(data)
            return out.sum(axis=0)

            def mixing_function(sig,onset):
            maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
            result = np.zeros(maxlen)
            for i in range(len(onset)):
            result[onset[i]:onset[i] + len(sig[i])] += sig[i]
            return result

            import numpy as np
            signal1 = np.array([1,2,3,4])
            signal2 = np.array([5,5,5])
            signal3 = np.array([7,7,7,7])
            sig = np.array([signal1,signal2,signal3])
            sig = np.repeat(sig, 1000000)
            onset = np.array((0, 2, 8))
            onset = np.repeat(onset, 1000000)
            start1 = time.time()
            result = signal_adder_with_onset(sig, onset)
            end1 = time.time()
            start2 = time.time()
            result2 = mixing_function(sig,onset)
            end2 = time.time()
            print(f"Original function: end2 - start2 n Vectorized function: end1 - start1")
            print(result)
            #Output:
            Original function: 9.28258752822876
            Vectorized function: 2.5798118114471436
            [1000000 2000000 8000000 9000000 5000000 0 0 0 7000000 7000000 7000000
            7000000]





            share|improve this answer



























            • This code is actually much slower than the already proposed code in the op.

              – Kevin Liu
              Mar 28 at 20:34











            • well actually it is about 5 times slower with this method I am afraid.

              – J_yang
              Mar 28 at 20:34











            • It does the trick, but is it really faster? I checked and for me this method works slower.

              – Ardweaden
              Mar 28 at 20:34











            • Well, using a different dataset I got very different results: sig = np.array([np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]) onset = np.random.randint(0, 10000, size=10000) Gives the results: Original function: 0.156998872756958 Vectorized function: 14.857199907302856 I think long signals of varying length is a much more realistic scenario than one million tiny signals, but I guess only the op can determine what kind of data he expects.

              – Kevin Liu
              Mar 28 at 21:19















            0
















            Here's an attempt that should do the trick.



            def signal_adder_with_onset(data, onset):
            # Get lengths of each row of data
            lens = np.array([len(i) for i in data])
            #adjust with offset for max possible lengths
            max_size = lens + onset
            # Mask of valid places in each row
            mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

            # Setup output array and put elements from data into masked positions
            out = np.zeros(mask.shape, dtype=data.dtype) #could perhaps change dtype here
            out[mask] = np.concatenate(data)
            return out.sum(axis=0)

            import numpy as np
            signal1 = np.array([1,2,3,4])
            signal2 = np.array([5,5,5])
            signal3 = np.array([7,7,7,7])
            sig = np.array([signal1,signal2,signal3])
            onset = np.array((0, 2, 8))
            result = signal_adder_with_onset(sig, onset)
            print(result)
            #[1 2 8 9 5 0 0 0 7 7 7 7]



            Edit: Vectorized operations only kick in with more data, and are slower with smaller amounts of data.



            Added for comparison



            import time

            def signal_adder_with_onset(data, onset):
            # Get lengths of each row of data
            lens = np.array([len(i) for i in data])
            #adjust with offset for max possible lengths
            max_size = lens + onset
            # Mask of valid places in each row
            mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

            # Setup output array and put elements from data into masked positions
            out = np.zeros(mask.shape, dtype=data.dtype) #could perhaps change dtype here
            out[mask] = np.concatenate(data)
            return out.sum(axis=0)

            def mixing_function(sig,onset):
            maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
            result = np.zeros(maxlen)
            for i in range(len(onset)):
            result[onset[i]:onset[i] + len(sig[i])] += sig[i]
            return result

            import numpy as np
            signal1 = np.array([1,2,3,4])
            signal2 = np.array([5,5,5])
            signal3 = np.array([7,7,7,7])
            sig = np.array([signal1,signal2,signal3])
            sig = np.repeat(sig, 1000000)
            onset = np.array((0, 2, 8))
            onset = np.repeat(onset, 1000000)
            start1 = time.time()
            result = signal_adder_with_onset(sig, onset)
            end1 = time.time()
            start2 = time.time()
            result2 = mixing_function(sig,onset)
            end2 = time.time()
            print(f"Original function: end2 - start2 n Vectorized function: end1 - start1")
            print(result)
            #Output:
            Original function: 9.28258752822876
            Vectorized function: 2.5798118114471436
            [1000000 2000000 8000000 9000000 5000000 0 0 0 7000000 7000000 7000000
            7000000]





            share|improve this answer



























            • This code is actually much slower than the already proposed code in the op.

              – Kevin Liu
              Mar 28 at 20:34











            • well actually it is about 5 times slower with this method I am afraid.

              – J_yang
              Mar 28 at 20:34











            • It does the trick, but is it really faster? I checked and for me this method works slower.

              – Ardweaden
              Mar 28 at 20:34











            • Well, using a different dataset I got very different results: sig = np.array([np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]) onset = np.random.randint(0, 10000, size=10000) Gives the results: Original function: 0.156998872756958 Vectorized function: 14.857199907302856 I think long signals of varying length is a much more realistic scenario than one million tiny signals, but I guess only the op can determine what kind of data he expects.

              – Kevin Liu
              Mar 28 at 21:19













            0














            0










            0









            Here's an attempt that should do the trick.



import numpy as np

def signal_adder_with_onset(data, onset):
    # Get the length of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with onset for the max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

    # Set up the output array and put elements from data into the masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1,signal2,signal3])
onset = np.array((0, 2, 8))
result = signal_adder_with_onset(sig, onset)
print(result)
#[1 2 8 9 5 0 0 0 7 7 7 7]
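[Editor's note, not part of the original answer: to make the broadcasting trick easier to follow, here is a minimal sketch that only rebuilds and prints the boolean mask for the same three toy signals. Each row's run of ones marks exactly where that signal lands in the mixed output.]

import numpy as np

# Illustration only: the mask from signal_adder_with_onset, built by hand
lens = np.array([4, 3, 4])              # lengths of signal1, signal2, signal3
onset = np.array([0, 2, 8])
cols = np.arange((lens + onset).max())  # column indices 0..11

mask = ((cols >= onset.reshape(-1, 1))
        & (cols < (lens + onset).reshape(-1, 1)))
print(mask.astype(int))
# [[1 1 1 1 0 0 0 0 0 0 0 0]
#  [0 0 1 1 1 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 1 1 1 1]]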



Edit: vectorized operations only pay off with larger inputs; on small amounts of data they are slower than the plain loop.
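[Editor's note: a quick way to check the small-input side of that claim is the sketch below; it is my addition, assumes mixing_function from the question and signal_adder_with_onset above are already defined, and uses dtype=object so the ragged array also builds on newer NumPy. The plain loop typically wins here; exact numbers vary by machine.]

import timeit
import numpy as np

signal1 = np.array([1, 2, 3, 4])
signal2 = np.array([5, 5, 5])
signal3 = np.array([7, 7, 7, 7])
sig = np.array([signal1, signal2, signal3], dtype=object)
onset = np.array((0, 2, 8))

# 10,000 runs of each approach on the tiny three-signal input
print("loop:      ", timeit.timeit(lambda: mixing_function(sig, onset), number=10000))
print("vectorized:", timeit.timeit(lambda: signal_adder_with_onset(sig, onset), number=10000))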



            Added for comparison



import time
import numpy as np

def signal_adder_with_onset(data, onset):
    # Get the length of each row of data
    lens = np.array([len(i) for i in data])
    # Adjust with onset for the max possible lengths
    max_size = lens + onset
    # Mask of valid places in each row
    mask = ((np.arange(max_size.max()) >= onset.reshape(-1, 1))
            & (np.arange(max_size.max()) < (lens + onset).reshape(-1, 1)))

    # Set up the output array and put elements from data into the masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)  # could perhaps change dtype here
    out[mask] = np.concatenate(data)
    return out.sum(axis=0)

def mixing_function(sig, onset):
    maxlen = np.max([o + len(s) for o, s in zip(onset, sig)])
    result = np.zeros(maxlen)
    for i in range(len(onset)):
        result[onset[i]:onset[i] + len(sig[i])] += sig[i]
    return result

signal1 = np.array([1,2,3,4])
signal2 = np.array([5,5,5])
signal3 = np.array([7,7,7,7])
sig = np.array([signal1,signal2,signal3])
sig = np.repeat(sig, 1000000)
onset = np.array((0, 2, 8))
onset = np.repeat(onset, 1000000)

start1 = time.time()
result = signal_adder_with_onset(sig, onset)
end1 = time.time()
start2 = time.time()
result2 = mixing_function(sig, onset)
end2 = time.time()

print(f"Original function: {end2 - start2} \nVectorized function: {end1 - start1}")
print(result)
# Output:
# Original function: 9.28258752822876
# Vectorized function: 2.5798118114471436
# [1000000 2000000 8000000 9000000 5000000 0 0 0 7000000 7000000 7000000
#  7000000]
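[Editor's note, a portability caveat rather than part of the answer: NumPy 1.24 and later refuse to build a ragged array from a bare np.array([...]) call and raise a ValueError, so on current versions the setup lines above need an explicit object dtype, e.g.:]

import numpy as np

# On NumPy >= 1.24, ragged inputs must be declared as object arrays
signal1 = np.array([1, 2, 3, 4])
signal2 = np.array([5, 5, 5])
signal3 = np.array([7, 7, 7, 7])
sig = np.array([signal1, signal2, signal3], dtype=object)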
edited Mar 28 at 20:49

answered Mar 28 at 20:20 by Paritosh Singh (4,519 rep; 2 gold, 7 silver, 29 bronze badges)
• This code is actually much slower than the already proposed code in the OP.

  – Kevin Liu, Mar 28 at 20:34

• Well, actually it is about 5 times slower with this method, I'm afraid.

  – J_yang, Mar 28 at 20:34

• It does the trick, but is it really faster? I checked, and for me this method works slower.

  – Ardweaden, Mar 28 at 20:34

• Well, using a different dataset I got very different results. With sig = np.array([np.random.randn(np.random.randint(1000, 10000)) for _ in range(10000)]) and onset = np.random.randint(0, 10000, size=10000), I get: Original function: 0.156998872756958, Vectorized function: 14.857199907302856. I think long signals of varying length are a much more realistic scenario than one million tiny signals, but I guess only the OP can determine what kind of data he expects.

  – Kevin Liu, Mar 28 at 21:19
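[Editor's note: Kevin Liu's counter-benchmark is easy to rerun. The sketch below reproduces his setup, with dtype=object added so the ragged array also builds on newer NumPy, and assumes both functions are defined as above. Exact timings will differ by machine, but on this data the loop should win by a wide margin.]

import time
import numpy as np

# Kevin Liu's data: 10,000 signals, each 1,000-10,000 random samples long
sig = np.array([np.random.randn(np.random.randint(1000, 10000))
                for _ in range(10000)], dtype=object)
onset = np.random.randint(0, 10000, size=10000)

start = time.time()
mixing_function(sig, onset)             # simple loop from the question
print("Original function:", time.time() - start)

start = time.time()
signal_adder_with_onset(sig, onset)     # masked / vectorized answer
print("Vectorized function:", time.time() - start)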