OffsetIndex in parquet 1.11.0

Starting with parquet 1.11.0, Parquet introduces two new index structures: ColumnIndex and OffsetIndex. They are documented here: https://github.com/apache/parquet-format/blob/master/PageIndex.md
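For reference, here is a rough Python transcription of the two structures as they are defined in parquet-format's parquet.thrift. The field names follow the Thrift definitions. The Python classes, the example values (assumed to be PLAIN-encoded INT32, i.e. 4 little-endian bytes) and the `describe_page` helper are hypothetical illustrations, and optional fields added in later format versions are left out. Note that both indexes are parallel per-page lists: entry i of the ColumnIndex and entry i of the OffsetIndex describe the same page.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class BoundaryOrder(IntEnum):
    UNORDERED = 0
    ASCENDING = 1
    DESCENDING = 2


@dataclass
class PageLocation:
    offset: int                # byte offset of the page in the file
    compressed_page_size: int  # on-disk size of the page, including its header
    first_row_index: int       # index of the page's first row within the row group


@dataclass
class OffsetIndex:
    page_locations: List[PageLocation]  # one entry per page, in page order


@dataclass
class ColumnIndex:
    null_pages: List[bool]   # True if the page holds only nulls
    min_values: List[bytes]  # encoded per-page minimum (not meaningful for all-null pages)
    max_values: List[bytes]  # encoded per-page maximum
    boundary_order: BoundaryOrder
    null_counts: Optional[List[int]] = None


def describe_page(ci: ColumnIndex, oi: OffsetIndex, i: int) -> dict:
    """Hypothetical helper: combine entry i of both indexes, which describe the same page."""
    if len(ci.null_pages) != len(oi.page_locations):
        raise ValueError("ColumnIndex and OffsetIndex must cover the same pages")
    loc = oi.page_locations[i]
    return {
        "all_null": ci.null_pages[i],
        "min": ci.min_values[i],
        "max": ci.max_values[i],
        "offset": loc.offset,
        "size": loc.compressed_page_size,
        "first_row": loc.first_row_index,
    }


# Hypothetical two-page INT32 column chunk; statistics stored as 4 little-endian bytes.
ci = ColumnIndex(
    null_pages=[False, False],
    min_values=[(1).to_bytes(4, "little"), (50).to_bytes(4, "little")],
    max_values=[(40).to_bytes(4, "little"), (89).to_bytes(4, "little")],
    boundary_order=BoundaryOrder.ASCENDING,
)
oi = OffsetIndex([PageLocation(4, 4000, 0), PageLocation(4004, 3000, 400)])
print(describe_page(ci, oi, 1))
```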



From the document, I clearly understand the idea of ColumnIndex, which points to pages inside each column chunk, but I don't quite understand the idea behind OffsetIndex.



The document says the OffsetIndex is used to navigate to the rows identified by the ColumnIndex. But the ColumnIndex only points to pages, each of which is compressed as a whole. How, then, can the OffsetIndex be used to navigate to, for example, a single row inside a row group?










      parquet






asked Mar 22 at 9:05
Liqun Li






















          1 Answer














After reading the document at https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit, here is my understanding.



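In short: the ColumnIndex of a column chunk stores the statistics (min/max values, which pages are all null, and optionally null counts) of every page in that chunk, while the OffsetIndex stores the location of each page: its byte offset in the file, its compressed size, and the index of its first row within the row group. So the OffsetIndex does not point at individual rows inside a compressed page. A reader uses first_row_index to find the page that contains a given row, seeks to that page, decompresses it, and skips the rows before the target. Because pages of different columns generally have different row boundaries, first_row_index is also what lets a reader map the rows selected through one column's ColumnIndex to the pages it must read in the other columns.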
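To make this concrete, here is a minimal Python sketch of how a reader could combine the two indexes. It assumes min/max values are already decoded to comparable Python ints, ignores null_pages, and uses hypothetical helper names (`page_rows`, `candidate_ranges`, `pages_overlapping`, `locate_row`) and made-up offsets; it is an illustration of the idea, not parquet-mr's implementation. Column A is filtered with the predicate 60 <= v <= 70 via its ColumnIndex, the resulting row ranges are mapped onto column B's pages via B's OffsetIndex, and a single row is located inside its page.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class PageLocation(NamedTuple):
    offset: int                # byte offset of the page in the file
    compressed_page_size: int  # on-disk size of the page, including its header
    first_row_index: int       # index of the page's first row within the row group


RowRange = Tuple[int, int]  # half-open [start, end)


def page_rows(locs: List[PageLocation], i: int, num_rows: int) -> RowRange:
    """Rows covered by page i: from its first_row_index up to the next page's."""
    end = locs[i + 1].first_row_index if i + 1 < len(locs) else num_rows
    return locs[i].first_row_index, end


def candidate_ranges(mins, maxs, locs, num_rows, lo, hi) -> List[RowRange]:
    """ColumnIndex step: keep pages whose [min, max] may overlap lo <= v <= hi."""
    return [page_rows(locs, i, num_rows)
            for i, (mn, mx) in enumerate(zip(mins, maxs))
            if mx >= lo and mn <= hi]


def pages_overlapping(locs, num_rows, ranges: List[RowRange]) -> List[int]:
    """OffsetIndex step for another column: which of its pages cover those rows."""
    hits = []
    for i in range(len(locs)):
        start, end = page_rows(locs, i, num_rows)
        if any(start < r_end and r_start < end for r_start, r_end in ranges):
            hits.append(i)
    return hits


def locate_row(locs, num_rows, row) -> Tuple[int, int]:
    """Page holding `row`, and how many rows to skip after decompressing that page."""
    if not 0 <= row < num_rows:
        raise ValueError("row outside the row group")
    i = bisect_right([loc.first_row_index for loc in locs], row) - 1
    return i, row - locs[i].first_row_index


# Hypothetical row group of 1000 rows: column A is filtered, column B is projected.
NUM_ROWS = 1000
a_locs = [PageLocation(4, 4000, 0), PageLocation(4004, 3000, 400),
          PageLocation(7004, 3200, 700)]
a_min, a_max = [1, 50, 90], [40, 89, 120]
b_locs = [PageLocation(10204, 2000, 0), PageLocation(12204, 2000, 250),
          PageLocation(14204, 2000, 500), PageLocation(16204, 2000, 750)]

ranges = candidate_ranges(a_min, a_max, a_locs, NUM_ROWS, 60, 70)  # [(400, 700)]
for i in pages_overlapping(b_locs, NUM_ROWS, ranges):              # pages 1 and 2
    loc = b_locs[i]
    print(f"read column B page {i}: bytes [{loc.offset}, {loc.offset + loc.compressed_page_size})")
print(locate_row(b_locs, NUM_ROWS, 520))  # (2, 20)
```

The reader still decompresses the whole page; the OffsetIndex only tells it which page to fetch and where the page starts, and the row offset inside the page is then skipped during decoding.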






                answered Mar 25 at 2:58









Liqun Li




























