Python: List vs Dict for look up tablewhat is the time complexity to check if a dictionary has a key?Time complexity for lookup in dictionary.values() lists vs setsWhere can I find the time and space complexity of the built-in sequence types in PythonShould I use dict or list?returning value from list vs dictionaryWhy is a list access O(1) in Python?What is the overhead of using a dictionary instead of a list?Best Data Structure for storing license plates and searching if a given license plate existsWhy does my Sieve of Eratosthenes work faster with integers than with booleans?Python Multidimensional Array as a single ListHow do I check if a list is empty?Calling an external command in PythonWhat are metaclasses in Python?Finding the index of an item given a list containing it in PythonWhat is the difference between Python's list methods append and extend?Does Python have a ternary conditional operator?How to make a flat list out of list of listsImprove INSERT-per-second performance of SQLite?How do I list all files of a directory?Does Python have a string 'contains' substring method?

Are there non-military uses of 20%-enriched Uranium?

How are one-time password generators like Google Authenticator different from having two passwords?

Further factorisation of a difference of cubes?

Exception propagation: When to catch exceptions?

Best species to breed to intelligence

Is there a need for better software for writers?

When do you stop "pushing" a book?

Intersecting with the x-axis / intersecting the x-axis

Was the Highlands Ranch shooting the 115th mass shooting in the US in 2019

Is there an application which does HTTP PUT?

What is the name of meteoroids which hit Moon, Mars, or pretty much anything that isn’t the Earth?

Why are low spin tetrahedral complexes so rare?

Extending Kan fibrations, without using minimal fibrations

Company stopped paying my salary. What are my options?

Renting a house to a graduate student in my department

Why did Captain America age?

Why was the ancient one so hesitant to teach Dr Strange the art of sorcery

How is CoreiX like Corei5, i7 is related to Haswell, Ivy Bridge?

How can I avoid subordinates and coworkers leaving work until the last minute, then having no time for revisions?

Has there been evidence of any other gods?

How did Thanos not realise this had happened at the end of Endgame?

How do I compare the result of "1d20+x, with advantage" to "1d20+y, without advantage", assuming x < y?

Names of the Six Tastes

Was there a contingency plan in place if Little Boy failed to detonate?



Python: List vs Dict for look up table


what is the time complexity to check if a dictionary has a key?Time complexity for lookup in dictionary.values() lists vs setsWhere can I find the time and space complexity of the built-in sequence types in PythonShould I use dict or list?returning value from list vs dictionaryWhy is a list access O(1) in Python?What is the overhead of using a dictionary instead of a list?Best Data Structure for storing license plates and searching if a given license plate existsWhy does my Sieve of Eratosthenes work faster with integers than with booleans?Python Multidimensional Array as a single ListHow do I check if a list is empty?Calling an external command in PythonWhat are metaclasses in Python?Finding the index of an item given a list containing it in PythonWhat is the difference between Python's list methods append and extend?Does Python have a ternary conditional operator?How to make a flat list out of list of listsImprove INSERT-per-second performance of SQLite?How do I list all files of a directory?Does Python have a string 'contains' substring method?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








150















I have about 10million values that I need to put in some type of look up table, so I was wondering which would be more efficient a list or dict?



I know you can do something like this for both:



if something in dict_of_stuff:
pass


and



if something in list_of_stuff:
pass


My thought is the dict will be faster and more efficient.



Thanks for your help.



EDIT 1

Little more info on what I'm trying to do. Euler Problem 92. I'm making a look up table to see if a value calculated has all ready been calculated.



EDIT 2

Efficiency for look up.



EDIT 3

There are no values assosiated with the value...so would a set be better?










share|improve this question



















  • 1





    Efficiency in terms of what? Insert? Lookup? Memory consumption? Are you checking for pure existance of value, or is there any metadata associated with it?

    – truppo
    Feb 4 '09 at 23:36











  • As a side note, you don't need a 10 million list or dict for that specific problem but a much smaller one.

    – sfotiadis
    Aug 20 '14 at 14:16











  • What if the table was a tuple instead of a list? Are tuple elements hashed, or is it just an immutable list?

    – RufusVS
    Mar 11 '15 at 15:29











  • Please ignore my comment using tuples. I later realized tuples are just data (with no reason to be hashed) and immutability is not a useful factor here.

    – RufusVS
    Mar 12 '15 at 15:45

















150















I have about 10million values that I need to put in some type of look up table, so I was wondering which would be more efficient a list or dict?



I know you can do something like this for both:



if something in dict_of_stuff:
pass


and



if something in list_of_stuff:
pass


My thought is the dict will be faster and more efficient.



Thanks for your help.



EDIT 1

Little more info on what I'm trying to do. Euler Problem 92. I'm making a look up table to see if a value calculated has all ready been calculated.



EDIT 2

Efficiency for look up.



EDIT 3

There are no values assosiated with the value...so would a set be better?










share|improve this question



















  • 1





    Efficiency in terms of what? Insert? Lookup? Memory consumption? Are you checking for pure existance of value, or is there any metadata associated with it?

    – truppo
    Feb 4 '09 at 23:36











  • As a side note, you don't need a 10 million list or dict for that specific problem but a much smaller one.

    – sfotiadis
    Aug 20 '14 at 14:16











  • What if the table was a tuple instead of a list? Are tuple elements hashed, or is it just an immutable list?

    – RufusVS
    Mar 11 '15 at 15:29











  • Please ignore my comment using tuples. I later realized tuples are just data (with no reason to be hashed) and immutability is not a useful factor here.

    – RufusVS
    Mar 12 '15 at 15:45













150












150








150


70






I have about 10million values that I need to put in some type of look up table, so I was wondering which would be more efficient a list or dict?



I know you can do something like this for both:



if something in dict_of_stuff:
pass


and



if something in list_of_stuff:
pass


My thought is the dict will be faster and more efficient.



Thanks for your help.



EDIT 1

Little more info on what I'm trying to do. Euler Problem 92. I'm making a look up table to see if a value calculated has all ready been calculated.



EDIT 2

Efficiency for look up.



EDIT 3

There are no values assosiated with the value...so would a set be better?










share|improve this question
















I have about 10million values that I need to put in some type of look up table, so I was wondering which would be more efficient a list or dict?



I know you can do something like this for both:



if something in dict_of_stuff:
pass


and



if something in list_of_stuff:
pass


My thought is the dict will be faster and more efficient.



Thanks for your help.



EDIT 1

Little more info on what I'm trying to do. Euler Problem 92. I'm making a look up table to see if a value calculated has all ready been calculated.



EDIT 2

Efficiency for look up.



EDIT 3

There are no values assosiated with the value...so would a set be better?







python performance






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 22 '15 at 2:36









Zero Piraeus

31.6k18103129




31.6k18103129










asked Feb 4 '09 at 23:28









NopeNope

12.4k3586114




12.4k3586114







  • 1





    Efficiency in terms of what? Insert? Lookup? Memory consumption? Are you checking for pure existance of value, or is there any metadata associated with it?

    – truppo
    Feb 4 '09 at 23:36











  • As a side note, you don't need a 10 million list or dict for that specific problem but a much smaller one.

    – sfotiadis
    Aug 20 '14 at 14:16











  • What if the table was a tuple instead of a list? Are tuple elements hashed, or is it just an immutable list?

    – RufusVS
    Mar 11 '15 at 15:29











  • Please ignore my comment using tuples. I later realized tuples are just data (with no reason to be hashed) and immutability is not a useful factor here.

    – RufusVS
    Mar 12 '15 at 15:45












  • 1





    Efficiency in terms of what? Insert? Lookup? Memory consumption? Are you checking for pure existance of value, or is there any metadata associated with it?

    – truppo
    Feb 4 '09 at 23:36











  • As a side note, you don't need a 10 million list or dict for that specific problem but a much smaller one.

    – sfotiadis
    Aug 20 '14 at 14:16











  • What if the table was a tuple instead of a list? Are tuple elements hashed, or is it just an immutable list?

    – RufusVS
    Mar 11 '15 at 15:29











  • Please ignore my comment using tuples. I later realized tuples are just data (with no reason to be hashed) and immutability is not a useful factor here.

    – RufusVS
    Mar 12 '15 at 15:45







1




1





Efficiency in terms of what? Insert? Lookup? Memory consumption? Are you checking for pure existance of value, or is there any metadata associated with it?

– truppo
Feb 4 '09 at 23:36





Efficiency in terms of what? Insert? Lookup? Memory consumption? Are you checking for pure existance of value, or is there any metadata associated with it?

– truppo
Feb 4 '09 at 23:36













As a side note, you don't need a 10 million list or dict for that specific problem but a much smaller one.

– sfotiadis
Aug 20 '14 at 14:16





As a side note, you don't need a 10 million list or dict for that specific problem but a much smaller one.

– sfotiadis
Aug 20 '14 at 14:16













What if the table was a tuple instead of a list? Are tuple elements hashed, or is it just an immutable list?

– RufusVS
Mar 11 '15 at 15:29





What if the table was a tuple instead of a list? Are tuple elements hashed, or is it just an immutable list?

– RufusVS
Mar 11 '15 at 15:29













Please ignore my comment using tuples. I later realized tuples are just data (with no reason to be hashed) and immutability is not a useful factor here.

– RufusVS
Mar 12 '15 at 15:45





Please ignore my comment using tuples. I later realized tuples are just data (with no reason to be hashed) and immutability is not a useful factor here.

– RufusVS
Mar 12 '15 at 15:45












8 Answers
8






active

oldest

votes


















192














Speed



Lookups in lists are O(n), lookups in dictionaries are amortized O(1), with regard to the number of items in the data structure. If you don't need to associate values, use sets.



Memory



Both dictionaries and sets use hashing and they use much more memory than only for object storage. According to A.M. Kuchling in Beautiful Code, the implementation tries to keep the hash 2/3 full, so you might waste quite some memory.



If you do not add new entries on the fly (which you do, based on your updated question), it might be worthwhile to sort the list and use binary search. This is O(log n), and is likely to be slower for strings, impossible for objects which do not have a natural ordering.






share|improve this answer




















  • 6





    Yes, but it's a one-off operation if the contents never change. Binary search is O(log n).

    – Torsten Marek
    Feb 4 '09 at 23:54






  • 1





    @John Fouhy: the ints are not stored in the hash table, only pointers, i.e. hou have 40M for the ints (well, not really when a lot of them are small) and 60M for the hash table. I agree that it's not that much of a problem nowadays, still it's worthwhile to keep in mind.

    – Torsten Marek
    Feb 5 '09 at 11:04






  • 2





    This is an old question, but I think amortized O(1) may not hold true for very large sets/dicts. The worst case scenario according to wiki.python.org/moin/TimeComplexity is O(n). I guess it depends on the internal hashing implementation at what point the average time diverges from O(1) and starts converging on O(n). You can help the lookup performance by compartmentalizing the global sets into smaller sections based on some easily discernible attribute (like the value of the first digit, then the second, third, etc., for as long as you need to get optimal set size).

    – Nisan.H
    Sep 21 '12 at 17:51






  • 1





    @TorstenMarek This confuses me. From this page, list lookup is O(1) and dict lookup is O(n), which is the opposite of what you said. Am I misunderstanding?

    – temporary_user_name
    Nov 10 '13 at 22:03







  • 3





    @Aerovistae I think you misread the info on that page. Under list, I see O(n) for "x in s" (lookup). It also shows set and dict lookup as O(1) average case.

    – Dennis
    Mar 6 '14 at 3:23


















38














A dict is a hash table, so it is really fast to find the keys. So between dict and list, dict would be faster. But if you don't have a value to associate, it is even better to use a set. It is a hash table, without the "table" part.




EDIT: for your new question, YES, a set would be better. Just create 2 sets, one for sequences ended in 1 and other for the sequences ended in 89. I have sucessfully solved this problem using sets.






share|improve this answer
































    31














    set() is exactly what you want. O(1) lookups, and smaller than a dict.






    share|improve this answer
































      26














      I did some benchmarking and it turns out that dict is faster than both list and set for large data sets, running python 2.7.3 on an i7 CPU on linux:




      • python -mtimeit -s 'd=range(10**7)' '5*10**6 in d'



        10 loops, best of 3: 64.2 msec per loop




      • python -mtimeit -s 'd=dict.fromkeys(range(10**7))' '5*10**6 in d'



        10000000 loops, best of 3: 0.0759 usec per loop




      • python -mtimeit -s 'from sets import Set; d=Set(range(10**7))' '5*10**6 in d'



        1000000 loops, best of 3: 0.262 usec per loop



      As you can see, dict is considerably faster than list and about 3 times faster than set. In some applications you might still want to choose set for the beauty of it, though. And if the data sets are really small (< 1000 elements) lists perform pretty well.






      share|improve this answer

























      • Shouldn't it be exactly the opposite? List: 10 * 64.2 * 1000 = 642000 usec, dict: 10000000 * 0.0759 = 759000 usec and set: 1000000 * 0.262 = 262000 usec ... so sets are the fastest, followed by the list and with dict as last on your example. Or am I missing something?

        – andzep
        Nov 9 '12 at 12:15







      • 1





        ... but the question for me here is: what does this times are actually measuring? Not the access time for a given list, dict or set, but much more, the time and loops to create the list, dict, set and finally to find and access one value. So, does this have to do with the question at all? ... It's interesting though...

        – andzep
        Nov 9 '12 at 12:25







      • 7





        @andzep, you are mistaken, the -s option is to setup the timeit environment, i.e. it does not count in the total time. The -s option is run only once. On Python 3.3, I get these results: gen (range) -> 0.229 usec, list -> 157 msec, dict -> 0.0806 usec, set -> 0.0807 usec. Set and dict performance is the same. Dict however takes a bit longer to initialize than set (total time 13.580s v. 11.803s)

        – sleblanc
        Jun 17 '13 at 3:48







      • 1





        why not use builtin set? I actually get much worse results with sets.Set() than with builtin set()

        – Thomas Guyot-Sionnest
        Oct 27 '16 at 14:46






      • 1





        @ThomasGuyot-Sionnest The built in set was introduced in python 2.4 so I'm not sure why I didn't use it in my proposed solution. I get good performance with python -mtimeit -s "d=set(range(10**7))" "5*10**6 in d" using Python 3.6.0 (10000000 loops, best of 3: 0.0608 usec per loop), roughly the same as the dict benchmark so thank you for your comment.

        – EriF89
        Jan 25 '17 at 15:17



















      6














      if data are unique set() will be the most efficient, but of two - dict (which also requires uniqueness, oops :)






      share|improve this answer


















      • 1





        not really... data must be unique for the dict too.

        – nosklo
        Feb 4 '09 at 23:33











      • i've realised when i saw my answer posted %)

        – SilentGhost
        Feb 4 '09 at 23:34






      • 1





        @SilentGhost if the answer is wrong, why not deleting it? too bad for the upvotes, but that happens (well, happened)

        – Jean-François Fabre
        May 19 '17 at 21:28


















      6














      You want a dict.



      For (unsorted) lists in Python, the "in" operation requires O(n) time---not good when you have a large amount of data. A dict, on the other hand, is a hash table, so you can expect O(1) lookup time.



      As others have noted, you might choose a set (a special type of dict) instead, if you only have keys rather than key/value pairs.



      Related:




      • Python wiki: information on the time complexity of Python container operations.


      • SO: Python container operation time and memory complexities





      share|improve this answer




















      • 1





        Even for sorted lists, "in" is O(n).

        – Roger Pate
        Feb 5 '09 at 3:39






      • 2





        For a linked list, yes---but "lists" in Python are what most people would call vectors, which provide indexed access in O(1) and a find operation in O(log n), when sorted.

        – zweiterlinde
        Feb 5 '09 at 14:49











      • Are you saying that the in operator applied to a sorted list performs better than when applied to an unsorted one (for a search of a random value)? (I don't think whether they are implemented internally as vectors or as nodes in a linked-list is relevant.)

        – martineau
        Nov 27 '10 at 12:59



















      2














      As a new set of tests to show @EriF89 is still right after all these years:



      $ python -m timeit -s "l=k:k for k in xrange(5000)" "[i for i in xrange(10000) if i in l]"
      1000 loops, best of 3: 1.84 msec per loop
      $ python -m timeit -s "l=[k for k in xrange(5000)]" "[i for i in xrange(10000) if i in l]"
      10 loops, best of 3: 573 msec per loop
      $ python -m timeit -s "l=tuple([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
      10 loops, best of 3: 587 msec per loop
      $ python -m timeit -s "l=set([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
      1000 loops, best of 3: 1.88 msec per loop


      Here we also compare a tuple, which are known to be faster than lists (and use less memory) in some use cases. In the case of lookup table, the tuple faired no better .



      Both the dict and set performed very well. This brings up an interesting point tying into @SilentGhost answer about uniqueness: if the OP has 10M values in a data set, and it's unknown if there are duplicates in them, then it would be worth keeping a set/dict of its elements in parallel with the actual data set, and testing for existence in that set/dict. It's possible the 10M data points only have 10 unique values, which is a much smaller space to search!



      SilentGhost's mistake about dicts is actually illuminating because one could use a dict to correlate duplicated data (in values) into a nonduplicated set (keys), and thus keep one data object to hold all data, yet still be fast as a lookup table. For example, a dict key could be the value being looked up, and the value could be a list of indices in an imaginary list where that value occurred.



      For example, if the source data list to be searched was l=[1,2,3,1,2,1,4], it could be optimized for both searching and memory by replacing it with this dict:



      >>> from collections import defaultdict
      >>> d = defaultdict(list)
      >>> l=[1,2,3,1,2,1,4]
      >>> for i, e in enumerate(l):
      ... d[e].append(i)
      >>> d
      defaultdict(<class 'list'>, 1: [0, 3, 5], 2: [1, 4], 3: [2], 4: [6])


      With this dict, one can know:




      1. If a value was in the original dataset (ie 2 in d returns True)


      2. Where the value was in the original dataset (ie d[2] returns list of indices where data was found in original data list: [1, 4])





      share|improve this answer

























      • For your last paragraph, while it makes sense reading it, it would be nice (and probably easier to grasp) to see the actual code you are trying to explain.

        – kaiser
        Jun 29 '18 at 19:31











      • @kaiser good idea - I added an example to the answer.

        – hamx0r
        Jul 17 '18 at 22:22


















      0














      You don't actually need to store 10 million values in the table, so it's not a big deal either way.



      Hint: think about how large your result can be after the first sum of squares operation. The largest possible result will be much smaller than 10 million...






      share|improve this answer























        Your Answer






        StackExchange.ifUsing("editor", function ()
        StackExchange.using("externalEditor", function ()
        StackExchange.using("snippets", function ()
        StackExchange.snippets.init();
        );
        );
        , "code-snippets");

        StackExchange.ready(function()
        var channelOptions =
        tags: "".split(" "),
        id: "1"
        ;
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function()
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled)
        StackExchange.using("snippets", function()
        createEditor();
        );

        else
        createEditor();

        );

        function createEditor()
        StackExchange.prepareEditor(
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader:
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        ,
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        );



        );













        draft saved

        draft discarded


















        StackExchange.ready(
        function ()
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f513882%2fpython-list-vs-dict-for-look-up-table%23new-answer', 'question_page');

        );

        Post as a guest















        Required, but never shown

























        8 Answers
        8






        active

        oldest

        votes








        8 Answers
        8






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        192














        Speed



        Lookups in lists are O(n), lookups in dictionaries are amortized O(1), with regard to the number of items in the data structure. If you don't need to associate values, use sets.



        Memory



        Both dictionaries and sets use hashing and they use much more memory than only for object storage. According to A.M. Kuchling in Beautiful Code, the implementation tries to keep the hash 2/3 full, so you might waste quite some memory.



        If you do not add new entries on the fly (which you do, based on your updated question), it might be worthwhile to sort the list and use binary search. This is O(log n), and is likely to be slower for strings, impossible for objects which do not have a natural ordering.






        share|improve this answer




















        • 6





          Yes, but it's a one-off operation if the contents never change. Binary search is O(log n).

          – Torsten Marek
          Feb 4 '09 at 23:54






        • 1





          @John Fouhy: the ints are not stored in the hash table, only pointers, i.e. hou have 40M for the ints (well, not really when a lot of them are small) and 60M for the hash table. I agree that it's not that much of a problem nowadays, still it's worthwhile to keep in mind.

          – Torsten Marek
          Feb 5 '09 at 11:04






        • 2





          This is an old question, but I think amortized O(1) may not hold true for very large sets/dicts. The worst case scenario according to wiki.python.org/moin/TimeComplexity is O(n). I guess it depends on the internal hashing implementation at what point the average time diverges from O(1) and starts converging on O(n). You can help the lookup performance by compartmentalizing the global sets into smaller sections based on some easily discernible attribute (like the value of the first digit, then the second, third, etc., for as long as you need to get optimal set size).

          – Nisan.H
          Sep 21 '12 at 17:51






        • 1





          @TorstenMarek This confuses me. From this page, list lookup is O(1) and dict lookup is O(n), which is the opposite of what you said. Am I misunderstanding?

          – temporary_user_name
          Nov 10 '13 at 22:03







        • 3





          @Aerovistae I think you misread the info on that page. Under list, I see O(n) for "x in s" (lookup). It also shows set and dict lookup as O(1) average case.

          – Dennis
          Mar 6 '14 at 3:23















        192














        Speed



        Lookups in lists are O(n), lookups in dictionaries are amortized O(1), with regard to the number of items in the data structure. If you don't need to associate values, use sets.



        Memory



        Both dictionaries and sets use hashing and they use much more memory than only for object storage. According to A.M. Kuchling in Beautiful Code, the implementation tries to keep the hash 2/3 full, so you might waste quite some memory.



        If you do not add new entries on the fly (which you do, based on your updated question), it might be worthwhile to sort the list and use binary search. This is O(log n), and is likely to be slower for strings, impossible for objects which do not have a natural ordering.






        share|improve this answer




















        • 6





          Yes, but it's a one-off operation if the contents never change. Binary search is O(log n).

          – Torsten Marek
          Feb 4 '09 at 23:54






        • 1





          @John Fouhy: the ints are not stored in the hash table, only pointers, i.e. hou have 40M for the ints (well, not really when a lot of them are small) and 60M for the hash table. I agree that it's not that much of a problem nowadays, still it's worthwhile to keep in mind.

          – Torsten Marek
          Feb 5 '09 at 11:04






        • 2





          This is an old question, but I think amortized O(1) may not hold true for very large sets/dicts. The worst case scenario according to wiki.python.org/moin/TimeComplexity is O(n). I guess it depends on the internal hashing implementation at what point the average time diverges from O(1) and starts converging on O(n). You can help the lookup performance by compartmentalizing the global sets into smaller sections based on some easily discernible attribute (like the value of the first digit, then the second, third, etc., for as long as you need to get optimal set size).

          – Nisan.H
          Sep 21 '12 at 17:51






        • 1





          @TorstenMarek This confuses me. From this page, list lookup is O(1) and dict lookup is O(n), which is the opposite of what you said. Am I misunderstanding?

          – temporary_user_name
          Nov 10 '13 at 22:03







        • 3





          @Aerovistae I think you misread the info on that page. Under list, I see O(n) for "x in s" (lookup). It also shows set and dict lookup as O(1) average case.

          – Dennis
          Mar 6 '14 at 3:23













        192












        192








        192







        Speed



        Lookups in lists are O(n), lookups in dictionaries are amortized O(1), with regard to the number of items in the data structure. If you don't need to associate values, use sets.



        Memory



        Both dictionaries and sets use hashing and they use much more memory than only for object storage. According to A.M. Kuchling in Beautiful Code, the implementation tries to keep the hash 2/3 full, so you might waste quite some memory.



        If you do not add new entries on the fly (which you do, based on your updated question), it might be worthwhile to sort the list and use binary search. This is O(log n), and is likely to be slower for strings, impossible for objects which do not have a natural ordering.






        share|improve this answer















        Speed



        Lookups in lists are O(n), lookups in dictionaries are amortized O(1), with regard to the number of items in the data structure. If you don't need to associate values, use sets.



        Memory



        Both dictionaries and sets use hashing and they use much more memory than only for object storage. According to A.M. Kuchling in Beautiful Code, the implementation tries to keep the hash 2/3 full, so you might waste quite some memory.



        If you do not add new entries on the fly (which you do, based on your updated question), it might be worthwhile to sort the list and use binary search. This is O(log n), and is likely to be slower for strings, impossible for objects which do not have a natural ordering.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Feb 4 '09 at 23:44

























        answered Feb 4 '09 at 23:38









        Torsten MarekTorsten Marek

        60.7k177995




        60.7k177995







        • 6





          Yes, but it's a one-off operation if the contents never change. Binary search is O(log n).

          – Torsten Marek
          Feb 4 '09 at 23:54






        • 1





          @John Fouhy: the ints are not stored in the hash table, only pointers, i.e. hou have 40M for the ints (well, not really when a lot of them are small) and 60M for the hash table. I agree that it's not that much of a problem nowadays, still it's worthwhile to keep in mind.

          – Torsten Marek
          Feb 5 '09 at 11:04






        • 2





          This is an old question, but I think amortized O(1) may not hold true for very large sets/dicts. The worst case scenario according to wiki.python.org/moin/TimeComplexity is O(n). I guess it depends on the internal hashing implementation at what point the average time diverges from O(1) and starts converging on O(n). You can help the lookup performance by compartmentalizing the global sets into smaller sections based on some easily discernible attribute (like the value of the first digit, then the second, third, etc., for as long as you need to get optimal set size).

          – Nisan.H
          Sep 21 '12 at 17:51






        • 1





          @TorstenMarek This confuses me. From this page, list lookup is O(1) and dict lookup is O(n), which is the opposite of what you said. Am I misunderstanding?

          – temporary_user_name
          Nov 10 '13 at 22:03







        • 3





          @Aerovistae I think you misread the info on that page. Under list, I see O(n) for "x in s" (lookup). It also shows set and dict lookup as O(1) average case.

          – Dennis
          Mar 6 '14 at 3:23












        • 6





          Yes, but it's a one-off operation if the contents never change. Binary search is O(log n).

          – Torsten Marek
          Feb 4 '09 at 23:54






        • 1





          @John Fouhy: the ints are not stored in the hash table, only pointers, i.e. hou have 40M for the ints (well, not really when a lot of them are small) and 60M for the hash table. I agree that it's not that much of a problem nowadays, still it's worthwhile to keep in mind.

          – Torsten Marek
          Feb 5 '09 at 11:04






        • 2





          This is an old question, but I think amortized O(1) may not hold true for very large sets/dicts. The worst case scenario according to wiki.python.org/moin/TimeComplexity is O(n). I guess it depends on the internal hashing implementation at what point the average time diverges from O(1) and starts converging on O(n). You can help the lookup performance by compartmentalizing the global sets into smaller sections based on some easily discernible attribute (like the value of the first digit, then the second, third, etc., for as long as you need to get optimal set size).

          – Nisan.H
          Sep 21 '12 at 17:51






        • 1





          @TorstenMarek This confuses me. From this page, list lookup is O(1) and dict lookup is O(n), which is the opposite of what you said. Am I misunderstanding?

          – temporary_user_name
          Nov 10 '13 at 22:03







        • 3





          @Aerovistae I think you misread the info on that page. Under list, I see O(n) for "x in s" (lookup). It also shows set and dict lookup as O(1) average case.

          – Dennis
          Mar 6 '14 at 3:23







        6




        6





        Yes, but it's a one-off operation if the contents never change. Binary search is O(log n).

        – Torsten Marek
        Feb 4 '09 at 23:54





        Yes, but it's a one-off operation if the contents never change. Binary search is O(log n).

        – Torsten Marek
        Feb 4 '09 at 23:54




        1




        1





        @John Fouhy: the ints are not stored in the hash table, only pointers, i.e. hou have 40M for the ints (well, not really when a lot of them are small) and 60M for the hash table. I agree that it's not that much of a problem nowadays, still it's worthwhile to keep in mind.

        – Torsten Marek
        Feb 5 '09 at 11:04





        @John Fouhy: the ints are not stored in the hash table, only pointers, i.e. hou have 40M for the ints (well, not really when a lot of them are small) and 60M for the hash table. I agree that it's not that much of a problem nowadays, still it's worthwhile to keep in mind.

        – Torsten Marek
        Feb 5 '09 at 11:04




        2




        2





        This is an old question, but I think amortized O(1) may not hold true for very large sets/dicts. The worst case scenario according to wiki.python.org/moin/TimeComplexity is O(n). I guess it depends on the internal hashing implementation at what point the average time diverges from O(1) and starts converging on O(n). You can help the lookup performance by compartmentalizing the global sets into smaller sections based on some easily discernible attribute (like the value of the first digit, then the second, third, etc., for as long as you need to get optimal set size).

        – Nisan.H
        Sep 21 '12 at 17:51





        This is an old question, but I think amortized O(1) may not hold true for very large sets/dicts. The worst case scenario according to wiki.python.org/moin/TimeComplexity is O(n). I guess it depends on the internal hashing implementation at what point the average time diverges from O(1) and starts converging on O(n). You can help the lookup performance by compartmentalizing the global sets into smaller sections based on some easily discernible attribute (like the value of the first digit, then the second, third, etc., for as long as you need to get optimal set size).

        – Nisan.H
        Sep 21 '12 at 17:51




        1




        1





        @TorstenMarek This confuses me. From this page, list lookup is O(1) and dict lookup is O(n), which is the opposite of what you said. Am I misunderstanding?

        – temporary_user_name
        Nov 10 '13 at 22:03






        @TorstenMarek This confuses me. From this page, list lookup is O(1) and dict lookup is O(n), which is the opposite of what you said. Am I misunderstanding?

        – temporary_user_name
        Nov 10 '13 at 22:03





        3




        3





        @Aerovistae I think you misread the info on that page. Under list, I see O(n) for "x in s" (lookup). It also shows set and dict lookup as O(1) average case.

        – Dennis
        Mar 6 '14 at 3:23





        @Aerovistae I think you misread the info on that page. Under list, I see O(n) for "x in s" (lookup). It also shows set and dict lookup as O(1) average case.

        – Dennis
        Mar 6 '14 at 3:23













        38














        A dict is a hash table, so it is really fast to find the keys. So between dict and list, dict would be faster. But if you don't have a value to associate, it is even better to use a set. It is a hash table, without the "table" part.




        EDIT: for your new question, YES, a set would be better. Just create 2 sets, one for sequences ended in 1 and other for the sequences ended in 89. I have sucessfully solved this problem using sets.






        share|improve this answer





























          38














          A dict is a hash table, so it is really fast to find the keys. So between dict and list, dict would be faster. But if you don't have a value to associate, it is even better to use a set. It is a hash table, without the "table" part.




          EDIT: for your new question, YES, a set would be better. Just create 2 sets, one for sequences ended in 1 and other for the sequences ended in 89. I have sucessfully solved this problem using sets.






          share|improve this answer



























            38












            38








            38







            A dict is a hash table, so it is really fast to find the keys. So between dict and list, dict would be faster. But if you don't have a value to associate, it is even better to use a set. It is a hash table, without the "table" part.




            EDIT: for your new question, YES, a set would be better. Just create 2 sets, one for sequences ended in 1 and other for the sequences ended in 89. I have sucessfully solved this problem using sets.






            share|improve this answer















            A dict is a hash table, so it is really fast to find the keys. So between dict and list, dict would be faster. But if you don't have a value to associate, it is even better to use a set. It is a hash table, without the "table" part.




            EDIT: for your new question, YES, a set would be better. Just create 2 sets, one for sequences ended in 1 and other for the sequences ended in 89. I have sucessfully solved this problem using sets.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Feb 5 '09 at 0:12

























            answered Feb 4 '09 at 23:31









            nosklonosklo

            159k46254275




            159k46254275





















                31














                set() is exactly what you want. O(1) lookups, and smaller than a dict.






                share|improve this answer





























                  31














                  set() is exactly what you want. O(1) lookups, and smaller than a dict.






                  share|improve this answer



























                    31












                    31








                    31







                    set() is exactly what you want. O(1) lookups, and smaller than a dict.






                    share|improve this answer















                    set() is exactly what you want. O(1) lookups, and smaller than a dict.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Jan 24 '17 at 5:02









                    David Metcalfe

                    8911130




                    8911130










                    answered Feb 23 '09 at 19:24









                    recursiverecursive

                    63.2k22118210




                    63.2k22118210





















                        26














                        I did some benchmarking and it turns out that dict is faster than both list and set for large data sets, running python 2.7.3 on an i7 CPU on linux:




                        • python -mtimeit -s 'd=range(10**7)' '5*10**6 in d'



                          10 loops, best of 3: 64.2 msec per loop




                        • python -mtimeit -s 'd=dict.fromkeys(range(10**7))' '5*10**6 in d'



                          10000000 loops, best of 3: 0.0759 usec per loop




                        • python -mtimeit -s 'from sets import Set; d=Set(range(10**7))' '5*10**6 in d'



                          1000000 loops, best of 3: 0.262 usec per loop



                        As you can see, dict is considerably faster than list and about 3 times faster than set. In some applications you might still want to choose set for the beauty of it, though. And if the data sets are really small (< 1000 elements) lists perform pretty well.






                        share|improve this answer

























                        • Shouldn't it be exactly the opposite? List: 10 * 64.2 * 1000 = 642000 usec, dict: 10000000 * 0.0759 = 759000 usec and set: 1000000 * 0.262 = 262000 usec ... so sets are the fastest, followed by the list and with dict as last on your example. Or am I missing something?

                          – andzep
                          Nov 9 '12 at 12:15







                        • 1





                          ... but the question for me here is: what does this times are actually measuring? Not the access time for a given list, dict or set, but much more, the time and loops to create the list, dict, set and finally to find and access one value. So, does this have to do with the question at all? ... It's interesting though...

                          – andzep
                          Nov 9 '12 at 12:25







                        • 7





                          @andzep, you are mistaken, the -s option is to setup the timeit environment, i.e. it does not count in the total time. The -s option is run only once. On Python 3.3, I get these results: gen (range) -> 0.229 usec, list -> 157 msec, dict -> 0.0806 usec, set -> 0.0807 usec. Set and dict performance is the same. Dict however takes a bit longer to initialize than set (total time 13.580s v. 11.803s)

                          – sleblanc
                          Jun 17 '13 at 3:48







                        • 1





                          why not use builtin set? I actually get much worse results with sets.Set() than with builtin set()

                          – Thomas Guyot-Sionnest
                          Oct 27 '16 at 14:46






                        • 1





                          @ThomasGuyot-Sionnest The built in set was introduced in python 2.4 so I'm not sure why I didn't use it in my proposed solution. I get good performance with python -mtimeit -s "d=set(range(10**7))" "5*10**6 in d" using Python 3.6.0 (10000000 loops, best of 3: 0.0608 usec per loop), roughly the same as the dict benchmark so thank you for your comment.

                          – EriF89
                          Jan 25 '17 at 15:17
















                        26














                        I did some benchmarking and it turns out that dict is faster than both list and set for large data sets, running python 2.7.3 on an i7 CPU on linux:




                        • python -mtimeit -s 'd=range(10**7)' '5*10**6 in d'



                          10 loops, best of 3: 64.2 msec per loop




                        • python -mtimeit -s 'd=dict.fromkeys(range(10**7))' '5*10**6 in d'



                          10000000 loops, best of 3: 0.0759 usec per loop




                        • python -mtimeit -s 'from sets import Set; d=Set(range(10**7))' '5*10**6 in d'



                          1000000 loops, best of 3: 0.262 usec per loop



                        As you can see, dict is considerably faster than list and about 3 times faster than set. In some applications you might still want to choose set for the beauty of it, though. And if the data sets are really small (< 1000 elements) lists perform pretty well.






                        share|improve this answer

























                        • Shouldn't it be exactly the opposite? List: 10 * 64.2 * 1000 = 642000 usec, dict: 10000000 * 0.0759 = 759000 usec and set: 1000000 * 0.262 = 262000 usec ... so sets are the fastest, followed by the list and with dict as last on your example. Or am I missing something?

                          – andzep
                          Nov 9 '12 at 12:15







                        • 1





                          ... but the question for me here is: what does this times are actually measuring? Not the access time for a given list, dict or set, but much more, the time and loops to create the list, dict, set and finally to find and access one value. So, does this have to do with the question at all? ... It's interesting though...

                          – andzep
                          Nov 9 '12 at 12:25







                        • 7





                          @andzep, you are mistaken, the -s option is to setup the timeit environment, i.e. it does not count in the total time. The -s option is run only once. On Python 3.3, I get these results: gen (range) -> 0.229 usec, list -> 157 msec, dict -> 0.0806 usec, set -> 0.0807 usec. Set and dict performance is the same. Dict however takes a bit longer to initialize than set (total time 13.580s v. 11.803s)

                          – sleblanc
                          Jun 17 '13 at 3:48







                        • 1





                          why not use builtin set? I actually get much worse results with sets.Set() than with builtin set()

                          – Thomas Guyot-Sionnest
                          Oct 27 '16 at 14:46






                        • 1





                          @ThomasGuyot-Sionnest The built in set was introduced in python 2.4 so I'm not sure why I didn't use it in my proposed solution. I get good performance with python -mtimeit -s "d=set(range(10**7))" "5*10**6 in d" using Python 3.6.0 (10000000 loops, best of 3: 0.0608 usec per loop), roughly the same as the dict benchmark so thank you for your comment.

                          – EriF89
                          Jan 25 '17 at 15:17














                        26












                        26








                        26







                        I did some benchmarking and it turns out that dict is faster than both list and set for large data sets, running python 2.7.3 on an i7 CPU on linux:




                        • python -mtimeit -s 'd=range(10**7)' '5*10**6 in d'



                          10 loops, best of 3: 64.2 msec per loop




                        • python -mtimeit -s 'd=dict.fromkeys(range(10**7))' '5*10**6 in d'



                          10000000 loops, best of 3: 0.0759 usec per loop




                        • python -mtimeit -s 'from sets import Set; d=Set(range(10**7))' '5*10**6 in d'



                          1000000 loops, best of 3: 0.262 usec per loop



                        As you can see, dict is considerably faster than list and about 3 times faster than set. In some applications you might still want to choose set for the beauty of it, though. And if the data sets are really small (< 1000 elements) lists perform pretty well.






                        share|improve this answer















                        I did some benchmarking and it turns out that dict is faster than both list and set for large data sets, running python 2.7.3 on an i7 CPU on linux:




                        • python -mtimeit -s 'd=range(10**7)' '5*10**6 in d'



                          10 loops, best of 3: 64.2 msec per loop




                        • python -mtimeit -s 'd=dict.fromkeys(range(10**7))' '5*10**6 in d'



                          10000000 loops, best of 3: 0.0759 usec per loop




                        • python -mtimeit -s 'from sets import Set; d=Set(range(10**7))' '5*10**6 in d'



                          1000000 loops, best of 3: 0.262 usec per loop



                        As you can see, dict is considerably faster than list and about 3 times faster than set. In some applications you might still want to choose set for the beauty of it, though. And if the data sets are really small (< 1000 elements) lists perform pretty well.







                        share|improve this answer














                        share|improve this answer



                        share|improve this answer








                        edited Jun 28 '12 at 17:37









                        Thiem Nguyen

                        5,86262345




                        5,86262345










                        answered Jun 28 '12 at 9:16









                        EriF89EriF89

                        513410




                        513410












                        • Shouldn't it be exactly the opposite? List: 10 * 64.2 * 1000 = 642000 usec, dict: 10000000 * 0.0759 = 759000 usec and set: 1000000 * 0.262 = 262000 usec ... so sets are the fastest, followed by the list and with dict as last on your example. Or am I missing something?

                          – andzep
                          Nov 9 '12 at 12:15







                        • 1





                          ... but the question for me here is: what does this times are actually measuring? Not the access time for a given list, dict or set, but much more, the time and loops to create the list, dict, set and finally to find and access one value. So, does this have to do with the question at all? ... It's interesting though...

                          – andzep
                          Nov 9 '12 at 12:25







                        • 7





                          @andzep, you are mistaken, the -s option is to setup the timeit environment, i.e. it does not count in the total time. The -s option is run only once. On Python 3.3, I get these results: gen (range) -> 0.229 usec, list -> 157 msec, dict -> 0.0806 usec, set -> 0.0807 usec. Set and dict performance is the same. Dict however takes a bit longer to initialize than set (total time 13.580s v. 11.803s)

                          – sleblanc
                          Jun 17 '13 at 3:48







                        • 1





                          why not use builtin set? I actually get much worse results with sets.Set() than with builtin set()

                          – Thomas Guyot-Sionnest
                          Oct 27 '16 at 14:46






                        • 1





                          @ThomasGuyot-Sionnest The built in set was introduced in python 2.4 so I'm not sure why I didn't use it in my proposed solution. I get good performance with python -mtimeit -s "d=set(range(10**7))" "5*10**6 in d" using Python 3.6.0 (10000000 loops, best of 3: 0.0608 usec per loop), roughly the same as the dict benchmark so thank you for your comment.

                          – EriF89
                          Jan 25 '17 at 15:17


















                        • Shouldn't it be exactly the opposite? List: 10 * 64.2 * 1000 = 642000 usec, dict: 10000000 * 0.0759 = 759000 usec and set: 1000000 * 0.262 = 262000 usec ... so sets are the fastest, followed by the list and with dict as last on your example. Or am I missing something?

                          – andzep
                          Nov 9 '12 at 12:15







                        • 1





                          ... but the question for me here is: what does this times are actually measuring? Not the access time for a given list, dict or set, but much more, the time and loops to create the list, dict, set and finally to find and access one value. So, does this have to do with the question at all? ... It's interesting though...

                          – andzep
                          Nov 9 '12 at 12:25







                        • 7





                          @andzep, you are mistaken, the -s option is to setup the timeit environment, i.e. it does not count in the total time. The -s option is run only once. On Python 3.3, I get these results: gen (range) -> 0.229 usec, list -> 157 msec, dict -> 0.0806 usec, set -> 0.0807 usec. Set and dict performance is the same. Dict however takes a bit longer to initialize than set (total time 13.580s v. 11.803s)

                          – sleblanc
                          Jun 17 '13 at 3:48







                        • 1





                          why not use builtin set? I actually get much worse results with sets.Set() than with builtin set()

                          – Thomas Guyot-Sionnest
                          Oct 27 '16 at 14:46






                        • 1





                          @ThomasGuyot-Sionnest The built in set was introduced in python 2.4 so I'm not sure why I didn't use it in my proposed solution. I get good performance with python -mtimeit -s "d=set(range(10**7))" "5*10**6 in d" using Python 3.6.0 (10000000 loops, best of 3: 0.0608 usec per loop), roughly the same as the dict benchmark so thank you for your comment.

                          – EriF89
                          Jan 25 '17 at 15:17

















                        Shouldn't it be exactly the opposite? List: 10 * 64.2 * 1000 = 642000 usec, dict: 10000000 * 0.0759 = 759000 usec and set: 1000000 * 0.262 = 262000 usec ... so sets are the fastest, followed by the list and with dict as last on your example. Or am I missing something?

                        – andzep
                        Nov 9 '12 at 12:15






                        Shouldn't it be exactly the opposite? List: 10 * 64.2 * 1000 = 642000 usec, dict: 10000000 * 0.0759 = 759000 usec and set: 1000000 * 0.262 = 262000 usec ... so sets are the fastest, followed by the list and with dict as last on your example. Or am I missing something?

                        – andzep
                        Nov 9 '12 at 12:15





                        1




                        1





                        ... but the question for me here is: what does this times are actually measuring? Not the access time for a given list, dict or set, but much more, the time and loops to create the list, dict, set and finally to find and access one value. So, does this have to do with the question at all? ... It's interesting though...

                        – andzep
                        Nov 9 '12 at 12:25






                        ... but the question for me here is: what does this times are actually measuring? Not the access time for a given list, dict or set, but much more, the time and loops to create the list, dict, set and finally to find and access one value. So, does this have to do with the question at all? ... It's interesting though...

                        – andzep
                        Nov 9 '12 at 12:25





                        7




                        7





                        @andzep, you are mistaken, the -s option is to setup the timeit environment, i.e. it does not count in the total time. The -s option is run only once. On Python 3.3, I get these results: gen (range) -> 0.229 usec, list -> 157 msec, dict -> 0.0806 usec, set -> 0.0807 usec. Set and dict performance is the same. Dict however takes a bit longer to initialize than set (total time 13.580s v. 11.803s)

                        – sleblanc
                        Jun 17 '13 at 3:48






                        @andzep, you are mistaken, the -s option is to setup the timeit environment, i.e. it does not count in the total time. The -s option is run only once. On Python 3.3, I get these results: gen (range) -> 0.229 usec, list -> 157 msec, dict -> 0.0806 usec, set -> 0.0807 usec. Set and dict performance is the same. Dict however takes a bit longer to initialize than set (total time 13.580s v. 11.803s)

                        – sleblanc
                        Jun 17 '13 at 3:48





                        1




                        1





                        why not use builtin set? I actually get much worse results with sets.Set() than with builtin set()

                        – Thomas Guyot-Sionnest
                        Oct 27 '16 at 14:46





                        why not use builtin set? I actually get much worse results with sets.Set() than with builtin set()

                        – Thomas Guyot-Sionnest
                        Oct 27 '16 at 14:46




                        1




                        1





                        @ThomasGuyot-Sionnest The built in set was introduced in python 2.4 so I'm not sure why I didn't use it in my proposed solution. I get good performance with python -mtimeit -s "d=set(range(10**7))" "5*10**6 in d" using Python 3.6.0 (10000000 loops, best of 3: 0.0608 usec per loop), roughly the same as the dict benchmark so thank you for your comment.

                        – EriF89
                        Jan 25 '17 at 15:17






                        @ThomasGuyot-Sionnest The built in set was introduced in python 2.4 so I'm not sure why I didn't use it in my proposed solution. I get good performance with python -mtimeit -s "d=set(range(10**7))" "5*10**6 in d" using Python 3.6.0 (10000000 loops, best of 3: 0.0608 usec per loop), roughly the same as the dict benchmark so thank you for your comment.

                        – EriF89
                        Jan 25 '17 at 15:17












                        6














                        if data are unique set() will be the most efficient, but of two - dict (which also requires uniqueness, oops :)






                        share|improve this answer


















                        • 1





                          not really... data must be unique for the dict too.

                          – nosklo
                          Feb 4 '09 at 23:33











                        • i've realised when i saw my answer posted %)

                          – SilentGhost
                          Feb 4 '09 at 23:34






                        • 1





                          @SilentGhost if the answer is wrong, why not deleting it? too bad for the upvotes, but that happens (well, happened)

                          – Jean-François Fabre
                          May 19 '17 at 21:28















                        6














                        if data are unique set() will be the most efficient, but of two - dict (which also requires uniqueness, oops :)






                        share|improve this answer


















                        • 1





                          not really... data must be unique for the dict too.

                          – nosklo
                          Feb 4 '09 at 23:33











                        • i've realised when i saw my answer posted %)

                          – SilentGhost
                          Feb 4 '09 at 23:34






                        • 1





                          @SilentGhost if the answer is wrong, why not deleting it? too bad for the upvotes, but that happens (well, happened)

                          – Jean-François Fabre
                          May 19 '17 at 21:28













                        6












                        6








                        6







                        if data are unique set() will be the most efficient, but of two - dict (which also requires uniqueness, oops :)






                        share|improve this answer













                        if data are unique set() will be the most efficient, but of two - dict (which also requires uniqueness, oops :)







                        share|improve this answer












                        share|improve this answer



                        share|improve this answer










                        answered Feb 4 '09 at 23:30









                        SilentGhostSilentGhost

                        201k47271267




                        201k47271267







                        • 1





                          not really... data must be unique for the dict too.

                          – nosklo
                          Feb 4 '09 at 23:33











                        • i've realised when i saw my answer posted %)

                          – SilentGhost
                          Feb 4 '09 at 23:34






                        • 1





                          @SilentGhost if the answer is wrong, why not deleting it? too bad for the upvotes, but that happens (well, happened)

                          – Jean-François Fabre
                          May 19 '17 at 21:28












                        • 1





                          not really... data must be unique for the dict too.

                          – nosklo
                          Feb 4 '09 at 23:33











                        • i've realised when i saw my answer posted %)

                          – SilentGhost
                          Feb 4 '09 at 23:34






                        • 1





                          @SilentGhost if the answer is wrong, why not deleting it? too bad for the upvotes, but that happens (well, happened)

                          – Jean-François Fabre
                          May 19 '17 at 21:28







                        1




                        1





                        not really... data must be unique for the dict too.

                        – nosklo
                        Feb 4 '09 at 23:33





                        not really... data must be unique for the dict too.

                        – nosklo
                        Feb 4 '09 at 23:33













                        i've realised when i saw my answer posted %)

                        – SilentGhost
                        Feb 4 '09 at 23:34





                        i've realised when i saw my answer posted %)

                        – SilentGhost
                        Feb 4 '09 at 23:34




                        1




                        1





                        @SilentGhost if the answer is wrong, why not deleting it? too bad for the upvotes, but that happens (well, happened)

                        – Jean-François Fabre
                        May 19 '17 at 21:28





                        @SilentGhost if the answer is wrong, why not deleting it? too bad for the upvotes, but that happens (well, happened)

                        – Jean-François Fabre
                        May 19 '17 at 21:28











                        6














                        You want a dict.



                        For (unsorted) lists in Python, the "in" operation requires O(n) time---not good when you have a large amount of data. A dict, on the other hand, is a hash table, so you can expect O(1) lookup time.



                        As others have noted, you might choose a set (a special type of dict) instead, if you only have keys rather than key/value pairs.



                        Related:




                        • Python wiki: information on the time complexity of Python container operations.


                        • SO: Python container operation time and memory complexities





                        share|improve this answer




















                        • 1





                          Even for sorted lists, "in" is O(n).

                          – Roger Pate
                          Feb 5 '09 at 3:39






                        • 2





                          For a linked list, yes---but "lists" in Python are what most people would call vectors, which provide indexed access in O(1) and a find operation in O(log n), when sorted.

                          – zweiterlinde
                          Feb 5 '09 at 14:49











                        • Are you saying that the in operator applied to a sorted list performs better than when applied to an unsorted one (for a search of a random value)? (I don't think whether they are implemented internally as vectors or as nodes in a linked-list is relevant.)

                          – martineau
                          Nov 27 '10 at 12:59
















                        6














                        You want a dict.



                        For (unsorted) lists in Python, the "in" operation requires O(n) time---not good when you have a large amount of data. A dict, on the other hand, is a hash table, so you can expect O(1) lookup time.



                        As others have noted, you might choose a set (a special type of dict) instead, if you only have keys rather than key/value pairs.



                        Related:




                        • Python wiki: information on the time complexity of Python container operations.


                        • SO: Python container operation time and memory complexities





                        share|improve this answer




















                        • 1





                          Even for sorted lists, "in" is O(n).

                          – Roger Pate
                          Feb 5 '09 at 3:39






                        • 2





                          For a linked list, yes---but "lists" in Python are what most people would call vectors, which provide indexed access in O(1) and a find operation in O(log n), when sorted.

                          – zweiterlinde
                          Feb 5 '09 at 14:49











                        • Are you saying that the in operator applied to a sorted list performs better than when applied to an unsorted one (for a search of a random value)? (I don't think whether they are implemented internally as vectors or as nodes in a linked-list is relevant.)

                          – martineau
                          Nov 27 '10 at 12:59














                        6












                        6








                        6







                        You want a dict.



                        For (unsorted) lists in Python, the "in" operation requires O(n) time---not good when you have a large amount of data. A dict, on the other hand, is a hash table, so you can expect O(1) lookup time.



                        As others have noted, you might choose a set (a special type of dict) instead, if you only have keys rather than key/value pairs.



                        Related:




                        • Python wiki: information on the time complexity of Python container operations.


                        • SO: Python container operation time and memory complexities





                        share|improve this answer















                        You want a dict.



                        For (unsorted) lists in Python, the "in" operation requires O(n) time---not good when you have a large amount of data. A dict, on the other hand, is a hash table, so you can expect O(1) lookup time.



                        As others have noted, you might choose a set (a special type of dict) instead, if you only have keys rather than key/value pairs.



                        Related:




                        • Python wiki: information on the time complexity of Python container operations.


                        • SO: Python container operation time and memory complexities






                        share|improve this answer














                        share|improve this answer



                        share|improve this answer








                        edited May 23 '17 at 12:26









                        Community

                        11




                        11










                        answered Feb 4 '09 at 23:37









                        zweiterlindezweiterlinde

                        11.1k22330




                        11.1k22330







                        • 1





                          Even for sorted lists, "in" is O(n).

                          – Roger Pate
                          Feb 5 '09 at 3:39






                        • 2





                          For a linked list, yes---but "lists" in Python are what most people would call vectors, which provide indexed access in O(1) and a find operation in O(log n), when sorted.

                          – zweiterlinde
                          Feb 5 '09 at 14:49











                        • Are you saying that the in operator applied to a sorted list performs better than when applied to an unsorted one (for a search of a random value)? (I don't think whether they are implemented internally as vectors or as nodes in a linked-list is relevant.)

                          – martineau
                          Nov 27 '10 at 12:59













                        • 1





                          Even for sorted lists, "in" is O(n).

                          – Roger Pate
                          Feb 5 '09 at 3:39






                        • 2





                          For a linked list, yes---but "lists" in Python are what most people would call vectors, which provide indexed access in O(1) and a find operation in O(log n), when sorted.

                          – zweiterlinde
                          Feb 5 '09 at 14:49











                        • Are you saying that the in operator applied to a sorted list performs better than when applied to an unsorted one (for a search of a random value)? (I don't think whether they are implemented internally as vectors or as nodes in a linked-list is relevant.)

                          – martineau
                          Nov 27 '10 at 12:59








                        1




                        1





                        Even for sorted lists, "in" is O(n).

                        – Roger Pate
                        Feb 5 '09 at 3:39





                        Even for sorted lists, "in" is O(n).

                        – Roger Pate
                        Feb 5 '09 at 3:39




                        2




                        2





                        For a linked list, yes---but "lists" in Python are what most people would call vectors, which provide indexed access in O(1) and a find operation in O(log n), when sorted.

                        – zweiterlinde
                        Feb 5 '09 at 14:49





                        For a linked list, yes---but "lists" in Python are what most people would call vectors, which provide indexed access in O(1) and a find operation in O(log n), when sorted.

                        – zweiterlinde
                        Feb 5 '09 at 14:49













                        Are you saying that the in operator applied to a sorted list performs better than when applied to an unsorted one (for a search of a random value)? (I don't think whether they are implemented internally as vectors or as nodes in a linked-list is relevant.)

                        – martineau
                        Nov 27 '10 at 12:59






                        Are you saying that the in operator applied to a sorted list performs better than when applied to an unsorted one (for a search of a random value)? (I don't think whether they are implemented internally as vectors or as nodes in a linked-list is relevant.)

                        – martineau
                        Nov 27 '10 at 12:59












                        2














                        As a new set of tests to show @EriF89 is still right after all these years:



                        $ python -m timeit -s "l=k:k for k in xrange(5000)" "[i for i in xrange(10000) if i in l]"
                        1000 loops, best of 3: 1.84 msec per loop
                        $ python -m timeit -s "l=[k for k in xrange(5000)]" "[i for i in xrange(10000) if i in l]"
                        10 loops, best of 3: 573 msec per loop
                        $ python -m timeit -s "l=tuple([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
                        10 loops, best of 3: 587 msec per loop
                        $ python -m timeit -s "l=set([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
                        1000 loops, best of 3: 1.88 msec per loop


                        Here we also compare a tuple, which are known to be faster than lists (and use less memory) in some use cases. In the case of lookup table, the tuple faired no better .



                        Both the dict and set performed very well. This brings up an interesting point tying into @SilentGhost answer about uniqueness: if the OP has 10M values in a data set, and it's unknown if there are duplicates in them, then it would be worth keeping a set/dict of its elements in parallel with the actual data set, and testing for existence in that set/dict. It's possible the 10M data points only have 10 unique values, which is a much smaller space to search!



                        SilentGhost's mistake about dicts is actually illuminating because one could use a dict to correlate duplicated data (in values) into a nonduplicated set (keys), and thus keep one data object to hold all data, yet still be fast as a lookup table. For example, a dict key could be the value being looked up, and the value could be a list of indices in an imaginary list where that value occurred.



                        For example, if the source data list to be searched was l=[1,2,3,1,2,1,4], it could be optimized for both searching and memory by replacing it with this dict:



                        >>> from collections import defaultdict
                        >>> d = defaultdict(list)
                        >>> l=[1,2,3,1,2,1,4]
                        >>> for i, e in enumerate(l):
                        ... d[e].append(i)
                        >>> d
                        defaultdict(<class 'list'>, 1: [0, 3, 5], 2: [1, 4], 3: [2], 4: [6])


                        With this dict, one can know:




                        1. If a value was in the original dataset (ie 2 in d returns True)


                        2. Where the value was in the original dataset (ie d[2] returns list of indices where data was found in original data list: [1, 4])





                        share|improve this answer

























                        • For your last paragraph, while it makes sense reading it, it would be nice (and probably easier to grasp) to see the actual code you are trying to explain.

                          – kaiser
                          Jun 29 '18 at 19:31











                        • @kaiser good idea - I added an example to the answer.

                          – hamx0r
                          Jul 17 '18 at 22:22















                        2














                        As a new set of tests to show @EriF89 is still right after all these years:



                        $ python -m timeit -s "l=k:k for k in xrange(5000)" "[i for i in xrange(10000) if i in l]"
                        1000 loops, best of 3: 1.84 msec per loop
                        $ python -m timeit -s "l=[k for k in xrange(5000)]" "[i for i in xrange(10000) if i in l]"
                        10 loops, best of 3: 573 msec per loop
                        $ python -m timeit -s "l=tuple([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
                        10 loops, best of 3: 587 msec per loop
                        $ python -m timeit -s "l=set([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
                        1000 loops, best of 3: 1.88 msec per loop


                        Here we also compare a tuple, which are known to be faster than lists (and use less memory) in some use cases. In the case of lookup table, the tuple faired no better .



                        Both the dict and set performed very well. This brings up an interesting point tying into @SilentGhost answer about uniqueness: if the OP has 10M values in a data set, and it's unknown if there are duplicates in them, then it would be worth keeping a set/dict of its elements in parallel with the actual data set, and testing for existence in that set/dict. It's possible the 10M data points only have 10 unique values, which is a much smaller space to search!



                        SilentGhost's mistake about dicts is actually illuminating because one could use a dict to correlate duplicated data (in values) into a nonduplicated set (keys), and thus keep one data object to hold all data, yet still be fast as a lookup table. For example, a dict key could be the value being looked up, and the value could be a list of indices in an imaginary list where that value occurred.



                        For example, if the source data list to be searched was l=[1,2,3,1,2,1,4], it could be optimized for both searching and memory by replacing it with this dict:



                        >>> from collections import defaultdict
                        >>> d = defaultdict(list)
                        >>> l=[1,2,3,1,2,1,4]
                        >>> for i, e in enumerate(l):
                        ... d[e].append(i)
                        >>> d
                        defaultdict(<class 'list'>, 1: [0, 3, 5], 2: [1, 4], 3: [2], 4: [6])


                        With this dict, one can know:




                        1. If a value was in the original dataset (ie 2 in d returns True)


                        2. Where the value was in the original dataset (ie d[2] returns list of indices where data was found in original data list: [1, 4])





                        share|improve this answer

























                        • For your last paragraph, while it makes sense reading it, it would be nice (and probably easier to grasp) to see the actual code you are trying to explain.

                          – kaiser
                          Jun 29 '18 at 19:31











                        • @kaiser good idea - I added an example to the answer.

                          – hamx0r
                          Jul 17 '18 at 22:22













                        2












                        2








                        2







                        As a new set of tests to show @EriF89 is still right after all these years:



                        $ python -m timeit -s "l=k:k for k in xrange(5000)" "[i for i in xrange(10000) if i in l]"
                        1000 loops, best of 3: 1.84 msec per loop
                        $ python -m timeit -s "l=[k for k in xrange(5000)]" "[i for i in xrange(10000) if i in l]"
                        10 loops, best of 3: 573 msec per loop
                        $ python -m timeit -s "l=tuple([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
                        10 loops, best of 3: 587 msec per loop
                        $ python -m timeit -s "l=set([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
                        1000 loops, best of 3: 1.88 msec per loop


                        Here we also compare a tuple, which are known to be faster than lists (and use less memory) in some use cases. In the case of lookup table, the tuple faired no better .



                        Both the dict and set performed very well. This brings up an interesting point tying into @SilentGhost answer about uniqueness: if the OP has 10M values in a data set, and it's unknown if there are duplicates in them, then it would be worth keeping a set/dict of its elements in parallel with the actual data set, and testing for existence in that set/dict. It's possible the 10M data points only have 10 unique values, which is a much smaller space to search!



                        SilentGhost's mistake about dicts is actually illuminating because one could use a dict to correlate duplicated data (in values) into a nonduplicated set (keys), and thus keep one data object to hold all data, yet still be fast as a lookup table. For example, a dict key could be the value being looked up, and the value could be a list of indices in an imaginary list where that value occurred.



                        For example, if the source data list to be searched was l=[1,2,3,1,2,1,4], it could be optimized for both searching and memory by replacing it with this dict:



                        >>> from collections import defaultdict
                        >>> d = defaultdict(list)
                        >>> l=[1,2,3,1,2,1,4]
                        >>> for i, e in enumerate(l):
                        ... d[e].append(i)
                        >>> d
                        defaultdict(<class 'list'>, 1: [0, 3, 5], 2: [1, 4], 3: [2], 4: [6])


                        With this dict, one can know:




                        1. If a value was in the original dataset (ie 2 in d returns True)


                        2. Where the value was in the original dataset (ie d[2] returns list of indices where data was found in original data list: [1, 4])





                        share|improve this answer















                        As a new set of tests to show @EriF89 is still right after all these years:



                        $ python -m timeit -s "l=k:k for k in xrange(5000)" "[i for i in xrange(10000) if i in l]"
                        1000 loops, best of 3: 1.84 msec per loop
                        $ python -m timeit -s "l=[k for k in xrange(5000)]" "[i for i in xrange(10000) if i in l]"
                        10 loops, best of 3: 573 msec per loop
                        $ python -m timeit -s "l=tuple([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
                        10 loops, best of 3: 587 msec per loop
                        $ python -m timeit -s "l=set([k for k in xrange(5000)])" "[i for i in xrange(10000) if i in l]"
                        1000 loops, best of 3: 1.88 msec per loop


                        Here we also compare a tuple, which are known to be faster than lists (and use less memory) in some use cases. In the case of lookup table, the tuple faired no better .



                        Both the dict and set performed very well. This brings up an interesting point tying into @SilentGhost answer about uniqueness: if the OP has 10M values in a data set, and it's unknown if there are duplicates in them, then it would be worth keeping a set/dict of its elements in parallel with the actual data set, and testing for existence in that set/dict. It's possible the 10M data points only have 10 unique values, which is a much smaller space to search!



                        SilentGhost's mistake about dicts is actually illuminating because one could use a dict to correlate duplicated data (in values) into a nonduplicated set (keys), and thus keep one data object to hold all data, yet still be fast as a lookup table. For example, a dict key could be the value being looked up, and the value could be a list of indices in an imaginary list where that value occurred.



                        For example, if the source data list to be searched was l=[1,2,3,1,2,1,4], it could be optimized for both searching and memory by replacing it with this dict:



                        >>> from collections import defaultdict
                        >>> d = defaultdict(list)
                        >>> l=[1,2,3,1,2,1,4]
                        >>> for i, e in enumerate(l):
                        ... d[e].append(i)
                        >>> d
                        defaultdict(<class 'list'>, 1: [0, 3, 5], 2: [1, 4], 3: [2], 4: [6])


                        With this dict, one can know:




                        1. If a value was in the original dataset (ie 2 in d returns True)


                        2. Where the value was in the original dataset (ie d[2] returns list of indices where data was found in original data list: [1, 4])






                        share|improve this answer














                        share|improve this answer



                        share|improve this answer








                        edited Jul 17 '18 at 22:22

























                        answered Jun 7 '18 at 13:33









                        hamx0rhamx0r

                        1,4851827




                        1,4851827












                        • For your last paragraph, while it makes sense reading it, it would be nice (and probably easier to grasp) to see the actual code you are trying to explain.

                          – kaiser
                          Jun 29 '18 at 19:31











                        • @kaiser good idea - I added an example to the answer.

                          – hamx0r
                          Jul 17 '18 at 22:22

















                        • For your last paragraph, while it makes sense reading it, it would be nice (and probably easier to grasp) to see the actual code you are trying to explain.

                          – kaiser
                          Jun 29 '18 at 19:31











                        • @kaiser good idea - I added an example to the answer.

                          – hamx0r
                          Jul 17 '18 at 22:22
















                        For your last paragraph, while it makes sense reading it, it would be nice (and probably easier to grasp) to see the actual code you are trying to explain.

                        – kaiser
                        Jun 29 '18 at 19:31





                        For your last paragraph, while it makes sense reading it, it would be nice (and probably easier to grasp) to see the actual code you are trying to explain.

                        – kaiser
                        Jun 29 '18 at 19:31













                        @kaiser good idea - I added an example to the answer.

                        – hamx0r
                        Jul 17 '18 at 22:22





                        @kaiser good idea - I added an example to the answer.

                        – hamx0r
                        Jul 17 '18 at 22:22











                        0














                        You don't actually need to store 10 million values in the table, so it's not a big deal either way.



                        Hint: think about how large your result can be after the first sum of squares operation. The largest possible result will be much smaller than 10 million...






                        share|improve this answer



























                          0














                          You don't actually need to store 10 million values in the table, so it's not a big deal either way.



                          Hint: think about how large your result can be after the first sum of squares operation. The largest possible result will be much smaller than 10 million...






                          share|improve this answer

























                            0












                            0








                            0







                            You don't actually need to store 10 million values in the table, so it's not a big deal either way.



                            Hint: think about how large your result can be after the first sum of squares operation. The largest possible result will be much smaller than 10 million...






                            share|improve this answer













                            You don't actually need to store 10 million values in the table, so it's not a big deal either way.



                            Hint: think about how large your result can be after the first sum of squares operation. The largest possible result will be much smaller than 10 million...







                            share|improve this answer












                            share|improve this answer



                            share|improve this answer










                            answered Feb 23 '09 at 19:03









                            KivKiv

                            22.8k43856




                            22.8k43856



























                                draft saved

                                draft discarded
















































                                Thanks for contributing an answer to Stack Overflow!


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid


                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.

                                To learn more, see our tips on writing great answers.




                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function ()
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f513882%2fpython-list-vs-dict-for-look-up-table%23new-answer', 'question_page');

                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown







                                Popular posts from this blog

                                Kamusi Yaliyomo Aina za kamusi | Muundo wa kamusi | Faida za kamusi | Dhima ya picha katika kamusi | Marejeo | Tazama pia | Viungo vya nje | UrambazajiKuhusu kamusiGo-SwahiliWiki-KamusiKamusi ya Kiswahili na Kiingerezakuihariri na kuongeza habari

                                Swift 4 - func physicsWorld not invoked on collision? The Next CEO of Stack OverflowHow to call Objective-C code from Swift#ifdef replacement in the Swift language@selector() in Swift?#pragma mark in Swift?Swift for loop: for index, element in array?dispatch_after - GCD in Swift?Swift Beta performance: sorting arraysSplit a String into an array in Swift?The use of Swift 3 @objc inference in Swift 4 mode is deprecated?How to optimize UITableViewCell, because my UITableView lags

                                Access current req object everywhere in Node.js ExpressWhy are global variables considered bad practice? (node.js)Using req & res across functionsHow do I get the path to the current script with Node.js?What is Node.js' Connect, Express and “middleware”?Node.js w/ express error handling in callbackHow to access the GET parameters after “?” in Express?Modify Node.js req object parametersAccess “app” variable inside of ExpressJS/ConnectJS middleware?Node.js Express app - request objectAngular Http Module considered middleware?Session variables in ExpressJSAdd properties to the req object in expressjs with Typescript