c++ mmap to “fast” read coupling with gzip file
I am quite new to C++, so sorry if I ask something silly, but I found no answer online (only a post about Python: "Can mmap and gzip collaborate?"). I am trying to see whether it is possible to read a .GZ file through the mmap() function (following "Fast textfile reading in c++") in order to perform some operations on the file and write the result to another file.
I need to retain only some of the original rows and columns, based on the values of some columns/fields, so that I can later retrieve them and compare them with similar files from different subjects, in order to extract similarities/differences. The files are very big (up to 10GB .GZ), so I am trying to use fast comparison methods for GZIP files.
It is more of a "performance comparison" with other methods. Here is the code (sorry, it is long and, I think, awful):
#include <algorithm>
#include <iostream>
#include <cstdint>   // uintmax_t
#include <cstdio>    // perror
#include <cstdlib>   // exit
#include <cstring>
#include <iterator>  // back_inserter
#include <string>
#include <vector>
#include <typeinfo>
// for mmap:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
// for writing the output file
#include <fstream>

template <int N>
void emptyArray( char (&arrayName) [N] )
{
    std::fill( std::begin( arrayName ), std::end( arrayName ), 0 );
}

const char* map_file(const char* fname, size_t& length);

int main()
{
    // get the size of the file to open
    size_t length;
    auto f = map_file("myfile.vcf", length);
    auto l = f + length;
    uintmax_t m_numLines = 0;
    // offsets are size_t (not int) so that files larger than 2 GB do not overflow
    std::vector<size_t> v0;
    std::vector<size_t> v1;
    std::vector<size_t> v2;
    for (size_t i = 1; i < length; i++)
    {
        // vector of positions of '#' at the start of a line
        if (f[i] == '#' && f[i-1] == '\n') v0.push_back(i);
        // vector of line starts (position right after each newline)
        if (f[i] == '\n') v1.push_back(i+1);
    }
    std::vector<size_t> inter;
    std::set_intersection(v0.begin(), v0.end(),
                          v1.begin(), v1.end(),
                          std::back_inserter(inter));
    // keep only the line starts that are not '#' lines; the output range of
    // set_difference must not overlap its inputs, so use a separate vector
    std::vector<size_t> dataLines;
    std::set_difference(v1.begin(), v1.end(),
                        inter.begin(), inter.end(),
                        std::back_inserter(dataLines));
    v1.swap(dataLines);
    v1.pop_back();
    char chromArray[3];
    char posArray[10];
    char refArray[50];
    char altArray[50];
    char qualityArray[10];
    char gtArray[4];
    char gqxArray[5];
    char dpArray[5];
    char adArray[10];
    // LOOP over line numbers
    // loop over the vector of line starts (non-# lines)
    for (size_t nl = 0; nl < v1.size(); nl++)
    {
        // [The code that extracts the fields of each row (chromValue, posValue,
        //  refValue, altValue, quality, gtValue, gqxValue, dpValue, adValue and
        //  the numeric gqxi, dpi, qi) is missing from the source.]
        if (gqxi < 2 || dpi < 2 || qi < 2) continue;
        if (std::stoi(gqxValue) < 30) continue;
        std::ofstream myfile ("myRes.txt", std::ios_base::app);
        if (myfile.is_open())
        {
            myfile <<
                nl << "\t" <<
                chromValue << "-" << posValue << "-" << refValue << "-" << altValue << "\t" <<
                gtValue << "\t" <<
                gqxValue << "\t" <<
                quality << "\t" <<
                dpValue << "\t" <<
                adValue <<
                "\n";
            myfile.close();
        }
        else
            std::cout << "Unable to open file" << '\n';
    }
}

void handle_error(const char* msg)
{
    perror(msg);
    exit(255);
}

const char* map_file(const char* fname, size_t& length)
{
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        handle_error("open");
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        handle_error("fstat");
    length = sb.st_size;
    const char* addr = static_cast<const char*>(mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0u));
    if (addr == MAP_FAILED)
        handle_error("mmap");
    return addr;
}
Now, I know I can open a GZIP file with something like:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
// NB: the Boost.Iostreams and zlib libraries must be linked on the command line: c++ --std=c++11 -L/opt/X11/lib -lboost_iostreams -lz gzread.cpp -o gzread
using namespace std;
using namespace boost::iostreams;
int main()
{
    // [the file name passed here is missing from the source; "myfile.vcf.gz" is only a placeholder]
    ifstream file("myfile.vcf.gz", ios_base::in | ios_base::binary);
    filtering_streambuf<input> inbuf;   // initialize the filtering_streambuf inbuf
    inbuf.push(gzip_decompressor());    // push the GZIP decompressor (for a GZIP file)
    inbuf.push(file);                   // push the file
    // Convert streambuf to istream
    std::istream instream(&inbuf);
    // Iterate lines
    std::string line;
    string chr;
    while (std::getline(instream, line))
    {
        istringstream iss(line);   // string stream of the line
        int i = 0;
        // read up to the next tab and discard the tab (the third argument of
        // getline tells it where to stop; if absent it stops at newline)
        while (getline(iss, line, '\t'))
        {
            if (i == 2) cout << line << "\n";
            ++i;
        }
    }
    // copy(inbuf, cout);   // copy to stdout
}
Here is an example of a file row:
chr1 1246301 . A C 4 OffTarget;LowGQX SNVSB=0.0;SNVHPOL=2;phyloP=1.096;CSQT=1|ACAP3|NM_030649.2|upstream_gene_variant,1|PUSL1|NM_153339.1|missense_variant,1|CPSF3L|NM_001256456.1|downstream_gene_variant GT:GQ:GQX:DP:DPF:AD:PL 0/1:3:0:1:0:0,1:37,3,0
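As an aside on the row layout: the filtering described above (keep a row only when a per-sample field such as GQX passes a threshold) could be sketched as below. This is only an illustration, assuming the columns are tab-separated and follow the usual VCF order (FORMAT as the 9th column, the sample as the 10th). The helper names `split`, `sample_value` and `keep_row` are hypothetical and do not come from the code above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Split `s` on a single-character delimiter, keeping empty fields.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = s.find(delim, start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start));
            return out;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}

// Look up one key (e.g. "GQX") in the FORMAT column and copy the matching
// entry of the sample column into `value`. Returns false if the key is absent.
bool sample_value(const std::string& format, const std::string& sample,
                  const std::string& key, std::string& value) {
    const std::vector<std::string> keys = split(format, ':');
    const std::vector<std::string> vals = split(sample, ':');
    for (std::size_t i = 0; i < keys.size() && i < vals.size(); ++i) {
        if (keys[i] == key) {
            value = vals[i];
            return true;
        }
    }
    return false;
}

// Hypothetical filter: keep a data row only if its GQX is at least min_gqx.
bool keep_row(const std::string& line, int min_gqx) {
    if (line.empty() || line[0] == '#') return false;  // header or comment line
    const std::vector<std::string> cols = split(line, '\t');
    if (cols.size() < 10) return false;                // need FORMAT + one sample
    std::string gqx;
    if (!sample_value(cols[8], cols[9], "GQX", gqx)) return false;
    try {
        return std::stoi(gqx) >= min_gqx;
    } catch (const std::logic_error&) {
        return false;                                  // non-numeric, e.g. "."
    }
}
```

With the example row, the FORMAT string `GT:GQ:GQX:DP:DPF:AD:PL` pairs GQX with the third sample entry, so `keep_row(row, 30)` would reject it. Looking fields up by name rather than by fixed position is a design choice here, made so the sketch does not depend on every row using the same FORMAT order.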
Is there a way to combine the two approaches? Or are there other approaches that would be more "performant"?
Thanks a lot for any suggestion!
c++ performance boost gzip mmap
edited Mar 21 at 20:13
asked Mar 21 at 19:53 by cccnrc
"performant" is not a word. Anyway, what are you trying to achieve?
– Jesper Juhl
Mar 21 at 20:06
I need to retain only some of the original rows, based on the values of some columns/fields (e.g. if (gqxi < 2 || dpi < 2 || qi < 2) continue; if (stoi(gqxValue) < 30) continue;), to later pass them on and compare them with the output of other programs filtering similar files from different subjects, in order to extract similarities/differences. The other files are very big (up to 10GB .GZ), so I am trying to use fast comparison methods.
– cccnrc
Mar 21 at 20:08
Why is that info not part of the question?
– Jesper Juhl
Mar 21 at 20:10
completely right, edited
– cccnrc
Mar 21 at 20:13
1 Answer
You can read a memory-mapped gzip file with zlib's inflate() functions. (Read the documentation in zlib.h.)
However, regardless of whether you read from the file or from the memory map, you cannot jump around in the uncompressed data. The uncompressed data must be processed sequentially, or saved sequentially for later random-access processing.
answered Mar 21 at 23:43 by Mark Adler
Thank you for your reply. I tried the library, but it seems that real-time decompression and reading with a buffer is faster...
– cccnrc
Mar 23 at 3:00