c++ mmap to “fast” read coupling with gzip file

I am quite new to C++, so sorry if I ask something silly, but I found no answer online (only a post that refers to Python: "Can mmap and gzip collaborate?"). I am trying to see whether it is possible to read a .GZ file through the mmap() function (following "Fast textfile reading in c++") in order to perform some operations on the file and write the result to another file.
I need to retain only some of the original rows and columns, based on the values of certain columns/fields, so that I can later retrieve them and compare them with similar files from different subjects in order to extract similarities/differences. The files are very big (up to 10GB .GZ), so I am trying to use fast comparison methods for GZIP files.
It is more of a "performance comparison" with other methods. Here is the code (sorry, it is long and I think awful):



#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <iterator>
#include <string>
#include <typeinfo>
#include <vector>

// for mmap:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

// for writing the output file
#include <fstream>

template <int N>
void emptyArray( char (&arrayName) [N] )
{
    std::fill( std::begin( arrayName ), std::end( arrayName ), 0 );
}

const char* map_file(const char* fname, size_t& length);

int main()
{
    // get the size of the file to open
    size_t length;
    auto f = map_file("myfile.vcf", length);
    auto l = f + length;

    uintmax_t m_numLines = 0;

    std::vector<int> v0;
    std::vector<int> v1;
    std::vector<int> v2;

    for (int i=1; i<length; i++)
    {
        // positions of a '#' at the start of a line
        if (f[i] == '#' && f[i-1] == '\n') v0.push_back(i);
        // positions where a new line starts
        if (f[i] == '\n') v1.push_back(i+1);
    }

    std::vector<int> inter;
    set_intersection(v0.begin(), v0.end(),
                     v1.begin(), v1.end(),
                     back_inserter(inter));

    // drop the line starts that belong to '#' header lines
    v1.erase(set_difference(v1.begin(), v1.end(),
                            inter.begin(), inter.end(),
                            v1.begin()), v1.end());

    v1.pop_back();

    char chromArray[3];
    char posArray[10];
    char refArray[50];
    char altArray[50];
    char qualityArray[10];
    char gtArray[4];
    char gqxArray[5];
    char dpArray[5];
    char adArray[10];

    // LOOP over the line numbers
    // loop over the vector of line starts (non-'#' lines)
    for (int nl =0; nl<v1.size(); nl++)
    {
        // [...]
        if (gqxi < 2 || dpi < 2 || qi < 2) continue;
        if (stoi(gqxValue) < 30) continue;

        std::ofstream myfile ("myRes.txt", std::ios_base::app);
        if (myfile.is_open())
        {
            myfile <<
                nl << "\t" <<
                chromValue << "-" << posValue << "-" << refValue << "-" << altValue << "\t" <<
                gtValue << "\t" <<
                gqxValue << "\t" <<
                quality << "\t" <<
                dpValue << "\t" <<
                adValue <<
                "\n";
            myfile.close();
        }
        else
            std::cout << "Unable to open file" << '\n';
    }
}

void handle_error(const char* msg)
{
    perror(msg);
    exit(255);
}

const char* map_file(const char* fname, size_t& length)
{
    int fd = open(fname, O_RDONLY);

    if (fd == -1)
        handle_error("open");

    struct stat sb;
    if (fstat(fd, &sb) == -1)
        handle_error("fstat");
    length = sb.st_size;

    const char* addr = static_cast<const char*>(mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0u));
    if (addr == MAP_FAILED)
        handle_error("mmap");

    return addr;
}



Now, I know I can open a GZIP file with something like:



#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>

// NB: I have to link against the Boost.Iostreams and zlib libraries: c++ --std=c++11 -L/opt/X11/lib -lboost_iostreams -lz gzread.cpp -o gzread

using namespace std;
using namespace boost::iostreams;

int main()
{
    ifstream file("...", ios_base::in | ios_base::binary);
    filtering_streambuf<input> inbuf; // initialize the filtering_streambuf inbuf
    inbuf.push(gzip_decompressor()); // push the GZIP decompressor onto it (for a GZIP file)
    inbuf.push(file); // push the file onto it

    // Convert streambuf to istream
    std::istream instream(&inbuf);
    // Iterate lines
    std::string line;

    string chr;

    while(std::getline(instream, line))
    {
        istringstream iss(line); // string stream over the line
        int i = 0;
        while (getline(iss, line, '\t')) // read up to the next tab and discard the tab (the third argument of getline tells it where to stop; if absent, it stops at the newline)
        {
            if (i == 2) cout << line << "\n";
            ++i;
        }
    }

    // copy(inbuf, cout); // copy to stdout
}



Here is an example of a file row:



chr1 1246301 . A C 4 OffTarget;LowGQX SNVSB=0.0;SNVHPOL=2;phyloP=1.096;CSQT=1|ACAP3|NM_030649.2|upstream_gene_variant,1|PUSL1|NM_153339.1|missense_variant,1|CPSF3L|NM_001256456.1|downstream_gene_variant GT:GQ:GQX:DP:DPF:AD:PL 0/1:3:0:1:0:0,1:37,3,0



Is there a way to combine them? Or are there other approaches that would be more "performant"?



Thanks a lot for any suggestion!










  • "performant" is not a word. Anyway, what are you trying to achieve?

    – Jesper Juhl
    Mar 21 at 20:06











  • I need to retain only some of the original rows, based on the values of some columns/fields (e.g. if (gqxi < 2 || dpi < 2 || qi < 2) continue; if (stoi(gqxValue) < 30) continue;), to later pass them on and compare them with other programs' filtering of similar files from different subjects, in order to extract similarities/differences. The other files are very big (up to 10GB .GZ), so I am trying to use fast comparison methods.

    – cccnrc
    Mar 21 at 20:08












  • Why is that info not part of the question?

    – Jesper Juhl
    Mar 21 at 20:10











  • completely right, edited

    – cccnrc
    Mar 21 at 20:13















c++ performance boost gzip mmap














edited Mar 21 at 20:13







cccnrc

















asked Mar 21 at 19:53









cccnrc
























1 Answer














You can read a memory-mapped gzip file with zlib's inflate() functions. (Read the documentation in zlib.h.)



However, regardless of whether you read from the file or read from the memory map, you cannot jump around the uncompressed data. The uncompressed data must be processed sequentially, or saved sequentially for later random access processing.
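
To make the sequential constraint concrete: uncompressed data typically arrives in chunks (for example from repeated inflate() calls), and a text line can straddle two chunks. Below is a minimal sketch of one way to rebuild whole lines from such chunks so that each row can be filtered as it goes by. The class name LineAssembler and its interface are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reassembles complete lines from uncompressed data that
// is delivered in arbitrary chunks, in stream order.
class LineAssembler {
public:
    using Callback = std::function<void(const std::string&)>;

    explicit LineAssembler(Callback on_line) : on_line_(std::move(on_line)) {}

    // Feed the next chunk; every completed line is passed to the callback
    // without its trailing '\n'.
    void feed(const char* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        std::size_t start = 0;
        for (std::size_t i = 0; i < size; ++i) {
            if (data[i] != '\n') continue;
            pending_.append(data + start, i - start);
            on_line_(pending_);
            pending_.clear();
            start = i + 1;
        }
        // Whatever is left is a partial line; keep it for the next chunk.
        pending_.append(data + start, size - start);
    }

    // Call once after the last chunk to flush a final line that has no '\n'.
    void finish() {
        if (!pending_.empty()) {
            on_line_(pending_);
            pending_.clear();
        }
    }

private:
    Callback on_line_;
    std::string pending_;
};
```

The callback is where a per-row filter would go (skipping lines that start with '#', splitting on tabs, testing a field), so only the current line has to be held in memory rather than the whole uncompressed file.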






  • Thank you for your reply. I tried the library, but it seems that real-time decompression and reading with a buffer is faster...

    – cccnrc
    Mar 23 at 3:00











answered Mar 21 at 23:43









Mark Adler











