C++ mmap for “fast” reading, coupled with a gzip file



I am quite new to C++, so apologies if this is a silly question, but I found no answer online (only a post about Python: "Can mmap and gzip collaborate?"). I am trying to find out whether it is possible to read a .gz file through mmap() (following "Fast textfile reading in c++") in order to perform some operations on the file and write the result to another file.
I need to retain only part of the original rows and columns, based on the values of some columns/fields, so that I can later retrieve them and compare them with similar files from different subjects in order to extract similarities/differences. The files are very big (up to 10 GB as .gz), so I am looking for fast ways to process gzip files.
This is mostly a performance comparison with other methods. Here is the code (sorry, it is long and, I think, awful):



    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <typeinfo>
    #include <vector>

    // for mmap:
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>

    // for writing the output file
    #include <fstream>

    // zero-fill a fixed-size char array
    template <std::size_t N>
    void emptyArray( char (&arrayName) [N] )
    {
        std::fill( std::begin( arrayName ), std::end( arrayName ), 0 );
    }

    const char* map_file(const char* fname, size_t& length);

    int main()
    {
        // get the size of the file to open
        size_t length;
        auto f = map_file("myfile.vcf", length);
        auto l = f + length;

        uintmax_t m_numLines = 0;

        // size_t rather than int: offsets in multi-GB files overflow an int
        std::vector<size_t> v0;
        std::vector<size_t> v1;
        std::vector<size_t> v2;

        for (size_t i = 1; i < length; i++)
        {
            // positions of '#' at the start of a line
            if (f[i] == '#' && f[i-1] == '\n') v0.push_back(i);
            // positions where a new line starts
            if (f[i] == '\n') v1.push_back(i+1);
        }

        std::vector<size_t> inter;
        std::set_intersection(v0.begin(), v0.end(),
                              v1.begin(), v1.end(),
                              std::back_inserter(inter));

        // drop the '#' header lines from the list of line starts; the output
        // range of set_difference must not overlap its inputs, hence the temporary
        std::vector<size_t> dataLines;
        std::set_difference(v1.begin(), v1.end(),
                            inter.begin(), inter.end(),
                            std::back_inserter(dataLines));
        v1.swap(dataLines);

        // if the file ends with a newline, the last entry is past the end of the data
        if (!v1.empty()) v1.pop_back();

        char chromArray[3];
        char posArray[10];
        char refArray[50];
        char altArray[50];
        char qualityArray[10];
        char gtArray[4];
        char gqxArray[5];
        char dpArray[5];
        char adArray[10];

        // loop over line numbers:
        // iterate over the line-start vector (non-'#' lines)
        for (size_t nl = 0; nl < v1.size(); nl++)
        {
            // ... (field extraction for the current line elided) ...

            if (gqxi < 2 || dpi < 2 || qi < 2) continue;
            if (std::stoi(gqxValue) < 30) continue;

            std::ofstream myfile ("myRes.txt", std::ios_base::app);
            if (myfile.is_open())
            {
                myfile <<
                    nl << "\t" <<
                    chromValue << "-" << posValue << "-" << refValue << "-" << altValue << "\t" <<
                    gtValue << "\t" <<
                    gqxValue << "\t" <<
                    quality << "\t" <<
                    dpValue << "\t" <<
                    adValue <<
                    "\n";
                myfile.close();
            }
            else
                std::cout << "Unable to open file" << '\n';
        }
    }

    // print the failing call's error message and exit
    void handle_error(const char* msg)
    {
        perror(msg);
        exit(255);
    }

    // map the whole file read-only and return a pointer to its first byte
    const char* map_file(const char* fname, size_t& length)
    {
        int fd = open(fname, O_RDONLY);

        if (fd == -1)
            handle_error("open");

        struct stat sb;
        if (fstat(fd, &sb) == -1)
            handle_error("fstat");
        length = sb.st_size;

        const char* addr = static_cast<const char*>(mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0u));
        if (addr == MAP_FAILED)
            handle_error("mmap");

        return addr;
    }



Now, I know I can open a gzip file with something like this:



    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <boost/iostreams/filtering_streambuf.hpp>
    #include <boost/iostreams/copy.hpp>
    #include <boost/iostreams/filter/gzip.hpp>

    // NB: must be linked against the Boost.Iostreams and zlib libraries:
    // c++ --std=c++11 -L/opt/X11/lib -lboost_iostreams -lz gzread.cpp -o gzread

    using namespace std;
    using namespace boost::iostreams;

    int main()
    {
        ifstream file("...", ios_base::in | ios_base::binary);
        filtering_streambuf<input> inbuf;  // initialize the filtering_streambuf
        inbuf.push(gzip_decompressor());   // add the gzip decompressor (for gzip files)
        inbuf.push(file);                  // add the file as the source

        // Convert streambuf to istream
        std::istream instream(&inbuf);
        // Iterate lines
        std::string line;

        string chr;

        while (std::getline(instream, line))
        {
            istringstream iss(line); // string stream over the line
            int i = 0;
            // read up to the next tab and discard it (the third argument of
            // getline is the delimiter; if omitted, it stops at the newline)
            while (getline(iss, line, '\t'))
            {
                if (i == 2) cout << line << "\n";
                ++i;
            }
        }

        // copy(inbuf, cout); // copy to stdout
    }



Here is an example of a file row:



chr1 1246301 . A C 4 OffTarget;LowGQX SNVSB=0.0;SNVHPOL=2;phyloP=1.096;CSQT=1|ACAP3|NM_030649.2|upstream_gene_variant,1|PUSL1|NM_153339.1|missense_variant,1|CPSF3L|NM_001256456.1|downstream_gene_variant GT:GQ:GQX:DP:DPF:AD:PL 0/1:3:0:1:0:0,1:37,3,0
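
VCF data lines are tab-separated, and the ninth column (FORMAT) names the colon-separated keys of each sample column. The following is a minimal sketch of pulling a value such as GQX or DP out of a row like the one above, assuming a single sample column and that the separators in the row are really tabs. The helper names `splitOn`, `sampleValue`, and `passesFilter` are hypothetical, and the threshold is passed in by the caller (the code above skips rows with GQX below 30).

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Split `text` on a single delimiter character, keeping empty fields.
std::vector<std::string> splitOn(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t end = text.find(delim, start);
        if (end == std::string::npos) {
            parts.push_back(text.substr(start));
            return parts;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;
    }
}

// Look up `key` (e.g. "GQX") in the FORMAT column and copy the matching
// entry of the first sample column into `value`; returns false if absent.
bool sampleValue(const std::string& line, const std::string& key, std::string& value) {
    const std::vector<std::string> cols = splitOn(line, '\t');
    if (cols.size() < 10) return false;  // need FORMAT plus one sample column
    const std::vector<std::string> keys = splitOn(cols[8], ':');
    const std::vector<std::string> vals = splitOn(cols[9], ':');
    for (std::size_t i = 0; i < keys.size() && i < vals.size(); ++i) {
        if (keys[i] == key) {
            value = vals[i];
            return true;
        }
    }
    return false;
}

// Keep a data line only if it carries a numeric GQX of at least `minGqx`.
bool passesFilter(const std::string& line, long minGqx) {
    if (line.empty() || line[0] == '#') return false;  // header or blank line
    std::string gqx;
    if (!sampleValue(line, "GQX", gqx)) return false;
    char* end = nullptr;
    const long parsed = std::strtol(gqx.c_str(), &end, 10);
    if (end == gqx.c_str() || *end != '\0') return false;  // not a plain integer, e.g. "."
    return parsed >= minGqx;
}
```

Working on one line at a time like this does not need random access to the file, so it fits equally well behind an mmap of an uncompressed file or behind a streaming gzip decompressor.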



Is there a way to combine the two (mmap and gzip decompression)? I am also open to other approaches if they are more "performant".



Thanks a lot for any suggestions!










c++ performance boost gzip mmap














asked Mar 21 at 19:53 by cccnrc, edited Mar 21 at 20:13 by cccnrc












  • "performant" is not a word. Anyway, what are you trying to achieve?

    – Jesper Juhl
    Mar 21 at 20:06











  • I need to retain only some of the original rows, based on the values of some columns/fields (e.g. if (gqxi < 2 || dpi < 2 || qi < 2) continue; if (stoi(gqxValue) < 30) continue;), and later compare them with the filtered output of similar files from different subjects, in order to extract similarities/differences. The other files are very big (up to 10 GB as .gz), so I am trying to use fast comparison methods.

    – cccnrc
    Mar 21 at 20:08












  • Why is that info not part of the question?

    – Jesper Juhl
    Mar 21 at 20:10











  • Completely right, edited.

    – cccnrc
    Mar 21 at 20:13





























1 Answer














You can read a memory-mapped gzip file with zlib's inflate() functions. (Read the documentation in zlib.h.)



However, regardless of whether you read from the file or from the memory map, you cannot jump around in the uncompressed data. The uncompressed data must be processed sequentially, or saved sequentially for later random-access processing.
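
Because the decompressed bytes only arrive in order, one way to get line-oriented processing is to assemble lines as the chunks come in and handle each line once, carrying any partial line over to the next chunk. A minimal sketch of that idea follows; `LineAssembler` is a hypothetical helper, not part of zlib or Boost.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Collects arbitrary byte chunks and invokes `onLine` once per complete line
// (without the trailing '\n'), in the order the bytes arrive.
class LineAssembler {
public:
    explicit LineAssembler(std::function<void(const std::string&)> onLine)
        : onLine_(std::move(onLine)) {}

    // Feed the next decompressed chunk; it may end in the middle of a line.
    void feed(const char* data, std::size_t size) {
        if (size == 0) return;
        std::size_t start = 0;
        for (std::size_t i = 0; i < size; ++i) {
            if (data[i] != '\n') continue;
            pending_.append(data + start, i - start);
            onLine_(pending_);
            pending_.clear();
            start = i + 1;
        }
        pending_.append(data + start, size - start);  // keep the partial tail
    }

    // Call once after the last chunk to flush a final line that has no newline.
    void finish() {
        if (!pending_.empty()) {
            onLine_(pending_);
            pending_.clear();
        }
    }

private:
    std::function<void(const std::string&)> onLine_;
    std::string pending_;
};
```

Only the current partial line is held in memory, so the memory footprint stays small however large the decompressed file is. If random access is needed later, the callback can record what it needs (for example, the filtered rows) as it goes.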






answered Mar 21 at 23:43 by Mark Adler












  • Thank you for your reply. I tried the library, but it seems that real-time decompression while reading through a buffer is faster...

    – cccnrc
    Mar 23 at 3:00
















