Counting the transitions from specific variable to anoter onePython : count the number of changes of numbersAre static class variables possible in Python?How to randomly select an item from a list?How do I return multiple values from a function?Using global variables in a functionHow do I pass a variable by reference?How can I count the occurrences of a list item?Peak detection in a 2D arrayHow to access environment variable values?Speed comparison with Project Euler: C vs Python vs Erlang vs HaskellWhy are some float < integer comparisons four times slower than others?

how to add 1 milliseconds on a datetime string?

Download file from URL to Safehouse

What is a "staved" town, like in "Staverton"?

Why is the UH-60 tail rotor canted?

Where is this photo of a group of hikers taken? Is it really in the Ural?

What was the rationale behind 36 bit computer architectures?

Company requiring me to let them review research from before I was hired

Killing a star safely

What happens when two cards both modify what I'm allowed to do?

Is it possible to eat quietly in Minecraft?

Considerations when providing money to one child now, and the other later?

Can 々 stand for a duplicated kanji with a different reading?

Path of Diffeomorphisms Fixing the Boundary

Does Impedance Matching Imply any Practical RF Transmitter Must Waste >=50% of Energy?

How could Barty Crouch Jr. have run out of Polyjuice Potion at the end of the Goblet of Fire movie?

What rules turn any attack that hits a given target into a critical hit?

German phrase for 'suited and booted'

"It is what it is" in French

How can I deal with someone that wants to kill something that isn't supposed to be killed?

Why are there not any MRI machines available in Interstellar?

What is the relationship between the theme songs in Sherlock Holmes (2009 movie) and Sherlock (BBC series)?

High income and difficulty during interviews

Is the apartment I want to rent a scam?

Span command across LaTeX environments



Counting the transitions from specific variable to anoter one


Python : count the number of changes of numbersAre static class variables possible in Python?How to randomly select an item from a list?How do I return multiple values from a function?Using global variables in a functionHow do I pass a variable by reference?How can I count the occurrences of a list item?Peak detection in a 2D arrayHow to access environment variable values?Speed comparison with Project Euler: C vs Python vs Erlang vs HaskellWhy are some float < integer comparisons four times slower than others?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








0















My problem is very similar to this post:
Python : count the number of changes of numbers



But since I cannot comment yet, I would like to know if there is a faster way?



My code is significantly the same as in the link, but the ranges of i and j are way much bigger (around one million in total) , meaning that it takes a significantly larger time to count (more than a day!)










share|improve this question

















  • 1





    Python is not really the language of choice for high-performance computing.

    – Scott Hunter
    Mar 26 at 15:21











  • I can see why. But it's the language I'm asked to use, I have no other choice.

    – Peter
    Mar 26 at 15:26






  • 1





    If your i and j values have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?

    – jdehesa
    Mar 26 at 15:34











  • If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.

    – 0 0
    Mar 26 at 15:36












  • It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)

    – 0 0
    Mar 26 at 15:41

















0















My problem is very similar to this post:
Python : count the number of changes of numbers



But since I cannot comment yet, I would like to know if there is a faster way?



My code is significantly the same as in the link, but the ranges of i and j are way much bigger (around one million in total) , meaning that it takes a significantly larger time to count (more than a day!)










share|improve this question

















  • 1





    Python is not really the language of choice for high-performance computing.

    – Scott Hunter
    Mar 26 at 15:21











  • I can see why. But it's the language I'm asked to use, I have no other choice.

    – Peter
    Mar 26 at 15:26






  • 1





    If your i and j values have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?

    – jdehesa
    Mar 26 at 15:34











  • If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.

    – 0 0
    Mar 26 at 15:36












  • It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)

    – 0 0
    Mar 26 at 15:41













0












0








0








My problem is very similar to this post:
Python : count the number of changes of numbers



But since I cannot comment yet, I would like to know if there is a faster way?



My code is significantly the same as in the link, but the ranges of i and j are way much bigger (around one million in total) , meaning that it takes a significantly larger time to count (more than a day!)










share|improve this question














My problem is very similar to this post:
Python : count the number of changes of numbers



But since I cannot comment yet, I would like to know if there is a faster way?



My code is significantly the same as in the link, but the ranges of i and j are way much bigger (around one million in total) , meaning that it takes a significantly larger time to count (more than a day!)







python






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Mar 26 at 15:20









PeterPeter

235 bronze badges




235 bronze badges







  • 1





    Python is not really the language of choice for high-performance computing.

    – Scott Hunter
    Mar 26 at 15:21











  • I can see why. But it's the language I'm asked to use, I have no other choice.

    – Peter
    Mar 26 at 15:26






  • 1





    If your i and j values have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?

    – jdehesa
    Mar 26 at 15:34











  • If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.

    – 0 0
    Mar 26 at 15:36












  • It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)

    – 0 0
    Mar 26 at 15:41












  • 1





    Python is not really the language of choice for high-performance computing.

    – Scott Hunter
    Mar 26 at 15:21











  • I can see why. But it's the language I'm asked to use, I have no other choice.

    – Peter
    Mar 26 at 15:26






  • 1





    If your i and j values have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?

    – jdehesa
    Mar 26 at 15:34











  • If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.

    – 0 0
    Mar 26 at 15:36












  • It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)

    – 0 0
    Mar 26 at 15:41







1




1





Python is not really the language of choice for high-performance computing.

– Scott Hunter
Mar 26 at 15:21





Python is not really the language of choice for high-performance computing.

– Scott Hunter
Mar 26 at 15:21













I can see why. But it's the language I'm asked to use, I have no other choice.

– Peter
Mar 26 at 15:26





I can see why. But it's the language I'm asked to use, I have no other choice.

– Peter
Mar 26 at 15:26




1




1





If your i and j values have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?

– jdehesa
Mar 26 at 15:34





If your i and j values have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?

– jdehesa
Mar 26 at 15:34













If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.

– 0 0
Mar 26 at 15:36






If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.

– 0 0
Mar 26 at 15:36














It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)

– 0 0
Mar 26 at 15:41





It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)

– 0 0
Mar 26 at 15:41












1 Answer
1






active

oldest

votes


















1














It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:



def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions


An example of how you could use it:



test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')


Output:



0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0


Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:



import numpy as np

def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions


Or you can use Numba with something like this:



@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions


Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.



import numpy as np
import scipy.sparse

def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))


And now a small benchmark:



import random

# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]

# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True

# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.






share|improve this answer

























  • Thanks for your help

    – Peter
    Mar 27 at 10:04










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55360687%2fcounting-the-transitions-from-specific-variable-to-anoter-one%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:



def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions


An example of how you could use it:



test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')


Output:



0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0


Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:



import numpy as np

def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions


Or you can use Numba with something like this:



@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions


Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.



import numpy as np
import scipy.sparse

def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))


And now a small benchmark:



import random

# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]

# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True

# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.






share|improve this answer

























  • Thanks for your help

    – Peter
    Mar 27 at 10:04















1














It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:



def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions


An example of how you could use it:



test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')


Output:



0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0


Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:



import numpy as np

def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions


Or you can use Numba with something like this:



@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions


Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.



import numpy as np
import scipy.sparse

def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))


And now a small benchmark:



import random

# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]

# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True

# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.






share|improve this answer

























  • Thanks for your help

    – Peter
    Mar 27 at 10:04













1












1








1







It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:



def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions


An example of how you could use it:



test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')


Output:



0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0


Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:



import numpy as np

def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions


Or you can use Numba with something like this:



@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions


Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.



import numpy as np
import scipy.sparse

def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))


And now a small benchmark:



import random

# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]

# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True

# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.






share|improve this answer















It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:



def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions


An example of how you could use it:



test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')


Output:



0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0


Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:



import numpy as np

def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions


Or you can use Numba with something like this:



@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions


Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.



import numpy as np
import scipy.sparse

def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))


And now a small benchmark:



import random

# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]

# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True

# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.







share|improve this answer














share|improve this answer



share|improve this answer








edited Mar 26 at 16:50

























answered Mar 26 at 16:26









jdehesajdehesa

32.9k4 gold badges39 silver badges61 bronze badges




32.9k4 gold badges39 silver badges61 bronze badges












  • Thanks for your help

    – Peter
    Mar 27 at 10:04

















  • Thanks for your help

    – Peter
    Mar 27 at 10:04
















Thanks for your help

– Peter
Mar 27 at 10:04





Thanks for your help

– Peter
Mar 27 at 10:04








Got a question that you can’t ask on public Stack Overflow? Learn more about sharing private information with Stack Overflow for Teams.







Got a question that you can’t ask on public Stack Overflow? Learn more about sharing private information with Stack Overflow for Teams.



















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55360687%2fcounting-the-transitions-from-specific-variable-to-anoter-one%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

SQL error code 1064 with creating Laravel foreign keysForeign key constraints: When to use ON UPDATE and ON DELETEDropping column with foreign key Laravel error: General error: 1025 Error on renameLaravel SQL Can't create tableLaravel Migration foreign key errorLaravel php artisan migrate:refresh giving a syntax errorSQLSTATE[42S01]: Base table or view already exists or Base table or view already exists: 1050 Tableerror in migrating laravel file to xampp serverSyntax error or access violation: 1064:syntax to use near 'unsigned not null, modelName varchar(191) not null, title varchar(191) not nLaravel cannot create new table field in mysqlLaravel 5.7:Last migration creates table but is not registered in the migration table

용인 삼성생명 블루밍스 목차 통계 역대 감독 선수단 응원단 경기장 같이 보기 외부 링크 둘러보기 메뉴samsungblueminx.comeh선수 명단용인 삼성생명 블루밍스용인 삼성생명 블루밍스ehsamsungblueminx.comeheheheh

155 수학 과학 기타 둘러보기 메뉴eh추가해eh문서를 완성해