Counting the transitions from specific variable to anoter onePython : count the number of changes of numbersAre static class variables possible in Python?How to randomly select an item from a list?How do I return multiple values from a function?Using global variables in a functionHow do I pass a variable by reference?How can I count the occurrences of a list item?Peak detection in a 2D arrayHow to access environment variable values?Speed comparison with Project Euler: C vs Python vs Erlang vs HaskellWhy are some float < integer comparisons four times slower than others?
how to add 1 milliseconds on a datetime string?
Download file from URL to Safehouse
What is a "staved" town, like in "Staverton"?
Why is the UH-60 tail rotor canted?
Where is this photo of a group of hikers taken? Is it really in the Ural?
What was the rationale behind 36 bit computer architectures?
Company requiring me to let them review research from before I was hired
Killing a star safely
What happens when two cards both modify what I'm allowed to do?
Is it possible to eat quietly in Minecraft?
Considerations when providing money to one child now, and the other later?
Can 々 stand for a duplicated kanji with a different reading?
Path of Diffeomorphisms Fixing the Boundary
Does Impedance Matching Imply any Practical RF Transmitter Must Waste >=50% of Energy?
How could Barty Crouch Jr. have run out of Polyjuice Potion at the end of the Goblet of Fire movie?
What rules turn any attack that hits a given target into a critical hit?
German phrase for 'suited and booted'
"It is what it is" in French
How can I deal with someone that wants to kill something that isn't supposed to be killed?
Why are there not any MRI machines available in Interstellar?
What is the relationship between the theme songs in Sherlock Holmes (2009 movie) and Sherlock (BBC series)?
High income and difficulty during interviews
Is the apartment I want to rent a scam?
Span command across LaTeX environments
Counting the transitions from specific variable to anoter one
Python : count the number of changes of numbersAre static class variables possible in Python?How to randomly select an item from a list?How do I return multiple values from a function?Using global variables in a functionHow do I pass a variable by reference?How can I count the occurrences of a list item?Peak detection in a 2D arrayHow to access environment variable values?Speed comparison with Project Euler: C vs Python vs Erlang vs HaskellWhy are some float < integer comparisons four times slower than others?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
My problem is very similar to this post:
Python : count the number of changes of numbers
But since I cannot comment yet, I would like to know if there is a faster way?
My code is significantly the same as in the link, but the ranges of i and j are way much bigger (around one million in total) , meaning that it takes a significantly larger time to count (more than a day!)
python
|
show 3 more comments
My problem is very similar to this post:
Python : count the number of changes of numbers
But since I cannot comment yet, I would like to know if there is a faster way?
My code is significantly the same as in the link, but the ranges of i and j are way much bigger (around one million in total) , meaning that it takes a significantly larger time to count (more than a day!)
python
1
Python is not really the language of choice for high-performance computing.
– Scott Hunter
Mar 26 at 15:21
I can see why. But it's the language I'm asked to use, I have no other choice.
– Peter
Mar 26 at 15:26
1
If youriandjvalues have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?
– jdehesa
Mar 26 at 15:34
If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.
– 0 0
Mar 26 at 15:36
It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)
– 0 0
Mar 26 at 15:41
|
show 3 more comments
My problem is very similar to this post:
Python : count the number of changes of numbers
But since I cannot comment yet, I would like to know if there is a faster way?
My code is significantly the same as in the link, but the ranges of i and j are way much bigger (around one million in total) , meaning that it takes a significantly larger time to count (more than a day!)
python
My problem is very similar to this post:
Python : count the number of changes of numbers
But since I cannot comment yet, I would like to know if there is a faster way?
My code is significantly the same as in the link, but the ranges of i and j are way much bigger (around one million in total) , meaning that it takes a significantly larger time to count (more than a day!)
python
python
asked Mar 26 at 15:20
PeterPeter
235 bronze badges
235 bronze badges
1
Python is not really the language of choice for high-performance computing.
– Scott Hunter
Mar 26 at 15:21
I can see why. But it's the language I'm asked to use, I have no other choice.
– Peter
Mar 26 at 15:26
1
If youriandjvalues have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?
– jdehesa
Mar 26 at 15:34
If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.
– 0 0
Mar 26 at 15:36
It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)
– 0 0
Mar 26 at 15:41
|
show 3 more comments
1
Python is not really the language of choice for high-performance computing.
– Scott Hunter
Mar 26 at 15:21
I can see why. But it's the language I'm asked to use, I have no other choice.
– Peter
Mar 26 at 15:26
1
If youriandjvalues have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?
– jdehesa
Mar 26 at 15:34
If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.
– 0 0
Mar 26 at 15:36
It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)
– 0 0
Mar 26 at 15:41
1
1
Python is not really the language of choice for high-performance computing.
– Scott Hunter
Mar 26 at 15:21
Python is not really the language of choice for high-performance computing.
– Scott Hunter
Mar 26 at 15:21
I can see why. But it's the language I'm asked to use, I have no other choice.
– Peter
Mar 26 at 15:26
I can see why. But it's the language I'm asked to use, I have no other choice.
– Peter
Mar 26 at 15:26
1
1
If your
i and j values have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?– jdehesa
Mar 26 at 15:34
If your
i and j values have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?– jdehesa
Mar 26 at 15:34
If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.
– 0 0
Mar 26 at 15:36
If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.
– 0 0
Mar 26 at 15:36
It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)
– 0 0
Mar 26 at 15:41
It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)
– 0 0
Mar 26 at 15:41
|
show 3 more comments
1 Answer
1
active
oldest
votes
It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:
def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions
An example of how you could use it:
test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')
Output:
0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0
Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:
import numpy as np
def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions
Or you can use Numba with something like this:
@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions
Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.
import numpy as np
import scipy.sparse
def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))
And now a small benchmark:
import random
# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]
# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True
# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.
Thanks for your help
– Peter
Mar 27 at 10:04
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55360687%2fcounting-the-transitions-from-specific-variable-to-anoter-one%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:
def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions
An example of how you could use it:
test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')
Output:
0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0
Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:
import numpy as np
def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions
Or you can use Numba with something like this:
@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions
Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.
import numpy as np
import scipy.sparse
def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))
And now a small benchmark:
import random
# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]
# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True
# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.
Thanks for your help
– Peter
Mar 27 at 10:04
add a comment |
It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:
def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions
An example of how you could use it:
test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')
Output:
0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0
Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:
import numpy as np
def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions
Or you can use Numba with something like this:
@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions
Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.
import numpy as np
import scipy.sparse
def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))
And now a small benchmark:
import random
# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]
# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True
# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.
Thanks for your help
– Peter
Mar 27 at 10:04
add a comment |
It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:
def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions
An example of how you could use it:
test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')
Output:
0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0
Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:
import numpy as np
def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions
Or you can use Numba with something like this:
@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions
Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.
import numpy as np
import scipy.sparse
def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))
And now a small benchmark:
import random
# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]
# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True
# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.
It is most definitely a better idea to save all the transition counts to a data structure instead of counting the appearances of each individual transition. It could be something like this:
def count_transitions(numbers):
n = max(numbers)
transitions = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1][n2] += 1
return transitions
An example of how you could use it:
test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
for j, count in enumerate(row):
print(f'i -> j: count')
Output:
0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0
Now, another matter is making this fast. This algorithm should already be much faster, because it has linear complexity instead of cubic, but we can use a couple of tools to make it still better. For example, using NumPy you could do it just like this:
import numpy as np
def count_transitions_np(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
return transitions
Or you can use Numba with something like this:
@nb.njit
def count_transitions_nb(numbers):
n = 0
for num in numbers:
n = max(num, n)
transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
for i in range(len(numbers) - 1):
n1 = numbers[i]
n2 = numbers[i + 1]
transitions[n1, n2] += 1
return transitions
Finally, yet another option is to build a sparse matrix with SciPy. Note this is not the same as a dense matrix, but you may be able to work with it too.
import numpy as np
import scipy.sparse
def count_transitions_sp(numbers):
numbers = np.asarray(numbers)
n = numbers.max()
v = np.ones(len(numbers) - 1, dtype=np.int32)
return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))
And now a small benchmark:
import random
# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]
# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True
# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As you see, Numba can be really fast, and sparse matrices are quick to build too if you can use them.
edited Mar 26 at 16:50
answered Mar 26 at 16:26
jdehesajdehesa
32.9k4 gold badges39 silver badges61 bronze badges
32.9k4 gold badges39 silver badges61 bronze badges
Thanks for your help
– Peter
Mar 27 at 10:04
add a comment |
Thanks for your help
– Peter
Mar 27 at 10:04
Thanks for your help
– Peter
Mar 27 at 10:04
Thanks for your help
– Peter
Mar 27 at 10:04
add a comment |
Got a question that you can’t ask on public Stack Overflow? Learn more about sharing private information with Stack Overflow for Teams.
Got a question that you can’t ask on public Stack Overflow? Learn more about sharing private information with Stack Overflow for Teams.
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55360687%2fcounting-the-transitions-from-specific-variable-to-anoter-one%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Python is not really the language of choice for high-performance computing.
– Scott Hunter
Mar 26 at 15:21
I can see why. But it's the language I'm asked to use, I have no other choice.
– Peter
Mar 26 at 15:26
1
If your
iandjvalues have a range of about 1 million, then there are about 1e12 possible transitions... just having those written to a file would take terabytes... is that really what you are trying to compute?– jdehesa
Mar 26 at 15:34
If it takes more than a day, just start the job on Friday afternoon, and have the results first thing on Monday. You're not mentioning how often you need to run the same program, or any indication how fast you'd like it to be.
– 0 0
Mar 26 at 15:36
It would be good if you include the actual code you have, and possibly some sample data, in your question. Linking to another question is fine, but it's annoying to go back and forth: a single, complete question, would be nicer. (The linked question is also a rather lengthy read; a summarized version would be nice.)
– 0 0
Mar 26 at 15:41