How to read only 5 records from an S3 bucket and return them without reading the whole CSV file


Hello guys, I know there are lots of similar questions here, and I have code that executes properly and returns five records. My question is: how do I return the desired rows without reading the entire file first? Suppose the CSV file is several GB in size; I don't want to pull down gigabytes of data just to get 5 records. Also, if my code is not good, please explain why.
code:



import boto3
import pandas as pd

ACCESS_KEY_ID = 'something'
ACCESS_SECRET_KEY = 'something'
BUCKET_NAME = 'something'
Filename = 'dataRepository/source/MergedSeedData(Parts_skills_Durations).csv'

# S3 client with explicit credentials
client = boto3.client("s3",
                      aws_access_key_id=ACCESS_KEY_ID,
                      aws_secret_access_key=ACCESS_SECRET_KEY)

# get_object returns a dict; obj['Body'] is a streaming file-like object
obj = client.get_object(Bucket=BUCKET_NAME, Key=Filename)

# read_csv consumes the entire stream here, even though only 5 rows are kept
Data = pd.read_csv(obj['Body'])
Data = Data.head(5)
print(Data)


This code runs fine and gets 5 records from the S3 bucket, but as explained above I want to avoid downloading the whole file. If anything is unclear, feel free to ask. Thanks in advance.
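For comparison, one small variation of the code above (a sketch of mine, not part of the original question) is pandas' nrows parameter, which asks the parser to stop after a fixed number of rows; exactly how much of the network stream still gets consumed depends on the parser's internal buffering:

# Sketch, assuming the same client, BUCKET_NAME and Filename as above.
obj = client.get_object(Bucket=BUCKET_NAME, Key=Filename)
# nrows=5 stops parsing after 5 data rows; the parser may still buffer
# somewhat more of the stream than strictly necessary.
Data = pd.read_csv(obj['Body'], nrows=5)
print(Data)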










python pandas amazon-s3 boto3

asked Mar 28 at 11:51 by snehil singh, edited Mar 28 at 11:57 by taras
  • does obj['Body'] point to the csv file path that is to be read?

    – Paritosh Singh
    Mar 28 at 11:59











  • @ParitoshSingh yes, it gets the csv file content

    – snehil singh
    Mar 28 at 12:01

2 Answers
You can use pandas' ability to read a file in chunks, loading only as much data as you need.



# chunksize=5 makes read_csv return a TextFileReader (an iterator)
# instead of parsing the whole stream into one DataFrame.
Data_iter = pd.read_csv(obj['Body'], chunksize=5)
# get_chunk() parses only the next chunk -- here, the first 5 rows.
Data = Data_iter.get_chunk()
print(Data)





answered Mar 28 at 12:04 by Paritosh Singh
  • Please can you explain how this helps me avoid getting all the data from the S3 bucket?

    – snehil singh
    Mar 28 at 12:06
  • If obj itself does not do any reading, specifying a chunksize makes the file handler read only portions of the file as needed. This is essentially how file handlers read through data in files: they can function as iterators. The chunksize argument gets you an iterator, and you can iterate through it to pull only as much data as you need.

    – Paritosh Singh
    Mar 28 at 12:09
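To make the iterator behaviour concrete, here is a minimal sketch (assuming the same obj as in the question) that pulls one chunk and stops, so the rest of the stream is never parsed:

# Illustration only: stream the body in 5-row chunks and stop after the
# first chunk, leaving the remaining rows unparsed.
Data_iter = pd.read_csv(obj['Body'], chunksize=5)
for chunk in Data_iter:
    print(chunk)   # a DataFrame with (up to) 5 rows
    break          # stop early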
You can use an HTTP Range: header (see RFC 2616), which takes a byte-range argument. The S3 API supports this, and it lets you avoid reading/downloading the whole S3 file.



Sample code:



import boto3

obj = boto3.resource('s3').Object('bucket101', 'my.csv')
# Ask S3 for only the first 1001 bytes of the object
record_stream = obj.get(Range='bytes=0-1000')['Body']
print(record_stream.read())


This returns only the data for the byte range specified in the header.



But you will need to modify this to convert the bytes into a DataFrame, e.g. by splitting/joining on the \t and \n characters present in the string coming from the .csv file.
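As a rough sketch of that conversion (my own illustration, reusing the hypothetical bucket/key from the sample above and assuming the header row fits inside the requested range): decode the bytes, drop the last line in case the range cut a row in half, and feed the rest to pandas via StringIO:

import io

import boto3
import pandas as pd

obj = boto3.resource('s3').Object('bucket101', 'my.csv')  # hypothetical names
raw = obj.get(Range='bytes=0-1000')['Body'].read().decode('utf-8')

# The byte range almost certainly ends mid-row, so drop the last,
# possibly truncated line before parsing.
complete_rows = raw.rsplit('\n', 1)[0]
df = pd.read_csv(io.StringIO(complete_rows)).head(5)
print(df)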






answered Mar 28 at 12:19 (edited Mar 28 at 12:25) by sanster_23
  • This works only for very small files, where one has time to count bytes; it becomes painful for a very large file, because nobody will count bytes in a huge CSV. Ideally you would simply pass a number of rows and get the desired rows back. Anyway, thanks for the answer.

    – snehil singh
    Mar 28 at 12:34
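One way around that objection is to stream the body line by line instead of guessing a byte count. A minimal sketch (my addition, not from either answer; it relies on botocore's StreamingBody.iter_lines, available in recent botocore versions):

import io

import boto3
import pandas as pd

# Hypothetical client/bucket/key, mirroring the question's setup.
client = boto3.client('s3')
body = client.get_object(Bucket='bucket101', Key='my.csv')['Body']

# Pull lines off the network stream until we have a header plus 5 rows,
# then stop -- the rest of the object is never downloaded in full.
lines = []
for line in body.iter_lines():           # botocore StreamingBody.iter_lines
    lines.append(line.decode('utf-8'))
    if len(lines) >= 6:                   # 1 header line + 5 data rows
        break
body.close()                              # drop the connection early

df = pd.read_csv(io.StringIO('\n'.join(lines)))
print(df)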