How can I load a large (3.96 GB) .tsv file in RStudio?
I want to load a 3.96 GB tab-separated values (.tsv) file into R, and my system has 8 GB of RAM. How can I load this file so that I can manipulate it?
I tried library(data.table) to load my data,
but I got this error message: Error: cannot allocate vector of size 965.7 Mb
I also tried fread with the code below, but that did not work either: it ran for a long time and eventually failed with an error.
as.data.frame(fread(file name))
r
edited Mar 23 at 17:12 by Oka
asked Mar 23 at 5:55 by xyz
I think you're going to have a really hard time dealing with that much data on your system. Several factors: (1) How much data you can fit in memory depends on what kind of data it is: logical and integer are fairly small, numeric is generally double, and character varies with the length of the strings. (2) Once you load it, what do you plan to actually do with it? R has copy-on-modify semantics, so many operations copy the data and your memory requirements can be much larger than the data itself. If it's a frame-like object, the data.table package tends to be memory-frugal (e.g., fread), but still ...
– r2evans
Mar 23 at 6:00
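To make the point about data types concrete, here is a minimal base R sketch. The vector length and the table dimensions in the estimate are arbitrary assumptions chosen for illustration, not figures from the question.

```r
# Compare the in-memory size of one million values stored as different types.
n <- 1e6  # arbitrary length, chosen only for illustration

size_int <- object.size(integer(n))                 # 4 bytes per element
size_dbl <- object.size(numeric(n))                 # 8 bytes per element
size_chr <- object.size(as.character(seq_len(n)))   # depends on string lengths

print(size_int, units = "Mb")
print(size_dbl, units = "Mb")
print(size_chr, units = "Mb")

# Back-of-the-envelope estimate for a hypothetical table:
# 50 million rows with 5 double columns, 8 bytes per value.
est_gb <- 50e6 * 5 * 8 / 1024^3
est_gb
```

An estimate like this only covers the data at rest; any operation that copies the table needs additional room on top of it.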
It contains integer and numeric values, but I still can't work out how to load my data.
– xyz
Mar 23 at 6:04
Ultimately, we can't know, since we aren't there. As an example, I have a 584 MB csv here (woefully smaller) that I've loaded with data.table::fread, and it takes 335 MB sitting in memory (I've seen a worse ratio of on-disk to in-memory), not bad. Depending on the functions I'm using, the actual memory required to operate on this data ranges from an additional 300 MB to well over 600 MB more, depending on whether I intentionally or accidentally keep copies of the data sitting around.
– r2evans
Mar 23 at 6:16
Additionally, your OS configuration can change things, too. What else is running? What OS? Do you have virtual memory configured? Though it might be feasible to make something work here (I can't tell from what we know), you are close to the line of "big data", where my loose definition includes discussion about "more data than I can manipulate on this computer". My computer at work has 16x the amount of RAM this laptop does, so its practical "big data" limit is much higher.
– r2evans
Mar 23 at 6:18
Note that as.data.frame will make a copy of the dataset, thus doubling the size. Here is where you run out of memory. If you really want to work with a data.frame rather than a data.table, you should use setDF instead, as it will convert to a data.frame without the copy. As everyone has mentioned, you will have difficulty doing anything complicated with this data as a whole given the amount of RAM you have.
– lmo
Mar 23 at 17:19
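A small sketch of the setDF suggestion, assuming the data.table package is installed. The toy table stands in for the real file, and its column names are made up for illustration.

```r
library(data.table)

# Toy stand-in for the large table; column names are hypothetical.
dt <- data.table(id = 1:5, value = c(0.5, 1.5, 2.5, 3.5, 4.5))

# setDF() changes the class of the existing object by reference,
# instead of building a second copy the way as.data.frame() does.
setDF(dt)

is.data.frame(dt)   # the object is now a plain data.frame
is.data.table(dt)   # and no longer a data.table
```

On a table of several gigabytes, avoiding that one extra copy can be the difference between fitting in RAM and not.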
3 Answers
If I were you, I would probably:
1) Try your fread code once more without the typo (the closing parenthesis was initially missing):
as.data.frame(fread(file name))
2) Try to read the file in parts by specifying the number of rows to read. Both read.csv and fread have an nrows argument for this. Reading a small number of rows lets you check that the file is actually readable before doing anything else. Sometimes files are malformed: there may be special characters, wrong end-of-line characters, escaping problems, or something else that needs to be addressed first.
3) Have a look at the bigmemory package, which has a read.big.matrix function. The ff package also offers this kind of functionality.
Alternatively, I would also try to think "outside the box": do I need all of the data in the file? If not, I could preprocess the file, for example with cut or awk, to remove unnecessary columns. Do I absolutely need to read it as one file and have all the data in memory simultaneously? If not, I could split the file or maybe use readLines.
PS. This topic is covered quite nicely in this post.
PPS. Thanks to @Yuriy Barvinchenko for the comment on fread.
edited Mar 23 at 15:20
answered Mar 23 at 11:15 by Oka
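The idea of reading the file in parts with readLines can be sketched in base R as follows. This is only an illustration under assumptions: the temporary file, its two columns, and the chunk size of 20 lines are invented so the example is self-contained, and the per-chunk work (a running sum) is a placeholder for whatever aggregation is actually needed.

```r
# Write a small stand-in TSV so the sketch can run on its own.
tmp <- tempfile(fileext = ".tsv")
write.table(data.frame(id = 1:95, value = as.numeric(1:95)),
            tmp, sep = "\t", row.names = FALSE, quote = FALSE)

con <- file(tmp, open = "r")

# The first line holds the column names.
col_names <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]

chunk_size <- 20   # hypothetical; a real run would use far more lines per chunk
n_rows <- 0
total <- 0

repeat {
  lines <- readLines(con, n = chunk_size)   # continues where the last read stopped
  if (length(lines) == 0) break             # end of file
  chunk <- read.delim(text = lines, header = FALSE, col.names = col_names)
  # Only this chunk is in memory; keep just the summary that is needed.
  n_rows <- n_rows + nrow(chunk)
  total <- total + sum(chunk$value)
}
close(con)
```

Only one chunk is held in memory at a time, so this pattern suits tasks that can be expressed as a running summary or as a filter that keeps a small fraction of the rows.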
You are reading the data (which puts it in memory) and then converting it with as.data.frame (which makes another copy). Instead, read it directly into a data.frame with
fread(file name, data.table=FALSE)
Also, it wouldn't hurt to run garbage collection.
gc()
answered Mar 23 at 11:42 by G5W
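A runnable version of this suggestion might look like the sketch below. It assumes the data.table package is installed, and it uses a small temporary file with made-up columns in place of the real 3.96 GB file.

```r
library(data.table)

# Small stand-in TSV; the real file path would go here instead.
tmp <- tempfile(fileext = ".tsv")
fwrite(data.table(id = 1:100, x = as.numeric(1:100)), tmp, sep = "\t")

# data.table = FALSE makes fread return a data.frame directly,
# so no separate as.data.frame() copy is needed.
df <- fread(tmp, sep = "\t", data.table = FALSE)
class(df)

# Ask R to release memory that is no longer referenced.
invisible(gc())
```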
From my experience, and in addition to @Oka's answer:
- fread() has an nrows= argument, so you can read just the first 10 lines.
- If you find that you don't need all rows and/or all columns, you can set a condition and a list of fields right after fread(), inside [].
- You can use a data.table as a data frame in many cases, so you can try reading without as.data.frame().
This way I worked with a 5 GB csv file.
answered Mar 23 at 13:46 by Yuriy Barvinchenko
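The points above could be sketched like this, assuming the data.table package is installed. The temporary file, the column names, and the filter condition are all invented for illustration. The select argument is used here as one way to limit the columns at read time; it is an addition to what the answer describes.

```r
library(data.table)

# Small stand-in TSV with hypothetical columns id, x, y.
tmp <- tempfile(fileext = ".tsv")
fwrite(data.table(id = 1:1000, x = (1:1000) - 500, y = 1:1000), tmp, sep = "\t")

# Peek at the first 10 rows to check that the file parses as expected.
head_rows <- fread(tmp, sep = "\t", nrows = 10)
str(head_rows)

# Read only the needed columns, then keep only the needed rows
# by filtering right after fread() with [].
small <- fread(tmp, sep = "\t", select = c("id", "x"))[x > 0]
dim(small)
```

Note that the row filter in [] runs after the selected columns have been read in full, so it reduces what is kept afterwards rather than the peak memory of the read itself. Dropping columns with select does reduce what is loaded.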
I was able to load the data, but the system was running really slow.
– xyz
Mar 25 at 5:56
It's all about memory. You have two friends: data.table and rm() + gc(). data.table reuses the same memory address when you modify the same table, so you need less memory. If you don't need some data in later steps, remove it from memory with rm() and run garbage collection with gc().
– Yuriy Barvinchenko
Mar 25 at 7:47
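A short sketch of both habits, assuming the data.table package is installed. The toy table and the column names are hypothetical. data.table's address() is used only to show that the table is not reallocated when a column is added with :=.

```r
library(data.table)

# Toy stand-in table; column names are made up for illustration.
dt <- data.table(id = 1:5, value = c(10, 20, 30, 40, 50))

addr_before <- address(dt)
dt[, value_scaled := value / 10]   # := adds the column in place, by reference
addr_after <- address(dt)
identical(addr_before, addr_after) # compares the table's address before and after

# An intermediate object that is no longer needed in later steps:
tmp_copy <- copy(dt)
rm(tmp_copy)       # drop the reference
invisible(gc())    # then let R reclaim the memory
```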