fastest way to sum the file sizes by owner in a directory


I'm using the command below (via an alias) to print the sum of all file sizes by owner in a directory:



ls -l $dir | awk 'NF>3 { file[$3]+=$5 } \
END { for (i in file) { ss=file[i]; \
if (ss >= 1024*1024*1024) { size=ss/1024/1024/1024; unit="G" } else \
if (ss >= 1024*1024) { size=ss/1024/1024; unit="M" } else { size=ss/1024; unit="K" }; \
format="%.2f%s"; res=sprintf(format,size,unit); \
printf "%-8s %12d\t%s\n", res, file[i], i } }' | sort -k2 -nr


but it doesn't seem to be fast all the time.



Is it possible to get the same output in some other way, but faster?










linux shell perl

asked Mar 21 at 15:13 – stack0114106
  • why not parse ls – Barmar, Mar 21 at 15:18
  • You don't need to escape newlines inside a string. – Barmar, Mar 21 at 15:19
  • check superuser.com/a/597173 – UjinT34, Mar 21 at 15:31
  • When it's slow, how fast is ls -l $dir alone? On some file systems, listing large directories is very, very slow. – Aaron Digulla, Mar 21 at 16:29
  • I have around 308,530 files under one such directory.. – stack0114106, Mar 21 at 18:14





6 Answers

Get a listing from Perl (the question is tagged perl), add up sizes, and sort the result by owner:



perl -wE'
    chdir (shift // ".");
    for (glob ".* *") {
        next if not -f;
        ($owner_id, $size) = (stat)[4,7]
            or do { warn "Trouble stat for: $_"; next };
        $rept{$owner_id} += $size;
    }
    say (getpwuid($_)//$_, " => $rept{$_} bytes") for sort keys %rept;
'


I didn't get to benchmark it, and it'd be worth trying it out against an approach where the directory is iterated over, as opposed to glob-ed (even though glob was faster in a related problem).
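For reference, here is a minimal sketch of such a readdir-based variant (mine, not from the answer; same flat, non-recursive layout and regular-files-only logic as the glob version above):

perl -wE'
    my $dir = shift // ".";
    opendir my $dh, $dir or die "Cannot opendir $dir: $!";
    my %rept;
    while (my $e = readdir $dh) {
        my $path = "$dir/$e";
        next unless -f $path;                    # regular files only (also skips . and ..)
        my ($owner_id, $size) = (stat _)[4,7];   # _ reuses the stat buffer from -f
        $rept{$owner_id} += $size;
    }
    say getpwuid($_) // $_, " => $rept{$_} bytes" for sort keys %rept;
'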



I expect good runtimes in comparison with ls, which slows down dramatically as the file list in a single directory gets long. This is mostly due to the filesystem, so Perl will be affected as well, but as far as I recall it handles it far better.



However, I've only seen such dramatic slowdowns in listings once a directory reaches around a hundred thousand entries, not a few thousand, so I am not sure why it runs slowly on your system.



If this needs to be recursive, use File::Find. For example:



perl -MFile::Find -wE'
    $dir = shift // ".";
    find( sub {
        return if /^\.\.?$/;
        ($owner_id, $size) = (stat)[4,7]
            or do { warn "Trouble stat for: $_"; return };
        $rept{$owner_id} += $size;
    }, $dir );
    say (getpwuid($_)//$_, "$_ => $rept{$_} bytes") for keys %rept;
'


This scans a directory with 2.4 Gb, with mostly small files distributed over a hierarchy of subdirectories, in a little over 2 seconds. The du -sh took around 5 seconds (the first time round).






– zdim, answered Mar 21 at 17:31 (edited Mar 22 at 9:07)
  • @stack0114106 "the first one runs and gives results" -- (1) is that where you see the error (in the first script), and (2) does the second one not run (or is that where you see the error)?

    – zdim
    Mar 21 at 18:41











  • @stack0114106 That would mean that $owner_id didn't get assigned (unless I have some typo from copy-pasting?) -- I can't readily imagine what kind of a beasty would not return owner it from stat ... ? Can you debug -- add a print like say "no owner id for $_" if not $owner_id; or some such. I can't as I don't have a problem

    – zdim
    Mar 21 at 18:43












  • I think if folks leave the organization, their id would be disabled, but the files would still be there.. and would that affect how getpwuid results??..

    – stack0114106
    Mar 21 at 18:47












  • @stack0114106 They can handle it in many ways (when a user leaves an organization), but I think that there would have to be an owner id for each file on the system. Can you add the print from my previous comment to see what that is about?

    – zdim
    Mar 21 at 18:51












  • @stack0114106 "the second one would take lot of time" -- for how large a hierarchy? How long does the awk parsed ls take (from the question) on that ? If that's just too big to try now can you compare on soemthing more reasonable? Perl File::Find should be fast, as much as one can expect recursive searching to go.

    – zdim
    Mar 21 at 18:54


















4














Another perl one, that displays total sizes sorted by user:



#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use File::Spec;
use Fcntl qw/:mode/;

my $dir = shift;
my %users;

opendir(my $d, $dir);
while (my $file = readdir $d) {
    my $filename = File::Spec->catfile($dir, $file);
    my ($mode, $uid, $size) = (stat $filename)[2, 4, 7];
    $users{$uid} += $size if S_ISREG($mode);
}
closedir $d;

my @sizes = sort { $a->[0] cmp $b->[0] }
            map { [ getpwuid($_) // $_, $users{$_} ] } keys %users;
local $, = "\t";
say @$_ for @sizes;
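A usage sketch (the script name here is hypothetical; it prints one "user<TAB>total-bytes" line per owner):

$ perl sum_by_owner.pl /path/to/dir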





– Shawn, answered Mar 21 at 16:27 (edited Mar 21 at 16:35)
  • what does if S_ISREG($mode) do ?.. – stack0114106, Mar 21 at 18:16
  • yes your solution works.. thank you – stack0114106, Mar 21 at 18:50
  • @stack0114106 It limits the size tracking to regular files - skips directories, fifos, sockets, devices, etc. Same idea as the -f $file in another answer, just a different way of checking. – Shawn, Mar 21 at 21:09
Parsing output from ls - bad idea.



How about using find instead?



  • start in directory $dir
    • limit to that directory level (-maxdepth 1)
    • limit to files (-type f)
    • print a line with user name and file size in bytes (-printf "%u %s\n")
  • run the results through a perl filter
    • split each line (-a)
    • add to a hash under key (field 0) the size (field 1)
    • at the end (END { ... }) print out the hash contents, sorted by key, i.e. user name

$ find $dir -maxdepth 1 -type f -printf "%u %s\n" |
    perl -ane '$s{$F[0]} += $F[1]; END { print "$_ $s{$_}\n" foreach (sort keys %s); }'
stefanb 263305714



A solution using Perl:



#!/usr/bin/perl
use strict;
use warnings;
use autodie;

use File::Spec;

my %users;
foreach my $dir (@ARGV) {
    opendir(my $dh, $dir);

    # files in this directory
    while (my $entry = readdir($dh)) {
        my $file = File::Spec->catfile($dir, $entry);

        # only files
        if (-f $file) {
            my ($uid, $size) = (stat($file))[4, 7];
            $users{$uid} += $size;
        }
    }

    closedir($dh);
}

print "$_ $users{$_}\n" foreach (sort keys %users);

exit 0;


Test run:



$ perl dummy.pl .
1000 263618544


Interesting difference. The Perl solution discovers 3 more files in my test directory than the find solution. I have to ponder why that is...
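A diagnostic sketch (mine, not part of the answer) to pin down such a discrepancy: dump both file lists and diff them. One common cause is that Perl's -f follows symlinks, so a symlink pointing at a regular file passes -f but is not matched by find's -type f:

find "$dir" -maxdepth 1 -type f -printf "%f\n" | sort > files.find
perl -E'my $d = shift; opendir my $dh, $d or die $!;
        say for grep { -f "$d/$_" } readdir $dh' "$dir" | sort > files.perl
diff files.find files.perl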






– Stefan Becker, answered Mar 21 at 16:08 (edited Mar 21 at 16:35)
  • It should print for all owners..not just the current user.. the files are owned by diff users – stack0114106, Mar 21 at 16:14
  • Updated accordingly – Stefan Becker, Mar 21 at 16:26
  • your solution works.. thank you.. – stack0114106, Mar 21 at 18:51
Not sure why question is tagged perl when awk is being used.



Here's a simple perl version:



#!/usr/bin/perl

chdir($ARGV[0]) or die("Usage: $0 dir\n");

map {
    if ( ! m/^[.][.]?$/o ) {
        ($s,$u) = (stat)[7,4];
        $h{$u} += $s;
    }
} glob ".* *";

map {
    $s = $h{$_};
    $u = !( $s >>10)      ? ""
       : !(($s>>=10)>>10) ? "k"
       : !(($s>>=10)>>10) ? "M"
       : !(($s>>=10)>>10) ? "G"
       :  ($s>>=10)       ? "T"
       :                    undef
    ;
    printf "%-8s %12d\t%s\n", $s.$u, $h{$_}, getpwuid($_)//$_;
} keys %h;




  • glob gets our file list


  • m// discards . and ..


  • stat the size and uid

  • accumulate sizes in %h

  • compute the unit by bitshifting (>>10 is integer divide by 1024)

  • map uid to username (// provides fallback)

  • print results (unsorted)


  • NOTE: unlike some other answers, this code doesn't recurse into subdirectories

To exclude symlinks, subdirectories, etc, change the if to appropriate -X tests. (eg. (-f $_), (!-d $_ and !-l $_), etc). See perl docs on the _ filehandle optimisation for caching stat results.
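For instance, a hedged sketch of that change (mine, not the answer's): keep only regular files and skip symlinks to them, reusing the cached stat buffer via the _ filehandle; the dot-dir regex is no longer needed because -f already rejects directories.

map {
    if ( -f $_ and !-l $_ ) {        # a regular file, and not a symlink to one
        ($s,$u) = (stat _)[7,4];     # _ reuses the buffer from the last file test
        $h{$u} += $s;
    }
} glob ".* *";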






– jhnc, answered Mar 21 at 16:19 (edited Mar 21 at 20:08)
  • I don't see m/// in the script. My guess is you're referring to !/^[.][.]?$/o? – Aaron Digulla, Mar 21 at 16:27
  • yes. // is shortcut for m//. m is only needed if you want to use different delimiter (eg m[], m<>, etc). Three slashes was typo. – jhnc, Mar 21 at 16:29
  • Please either use m// in the script or use the code from the script in the explanation. As it is, it's very confusing for people who don't know a lot about Perl. – Aaron Digulla, Mar 21 at 16:32
  • @jhnc.. your solution also works..thank you – stack0114106, Mar 21 at 18:49
Did I see some awk in the OP? Here is one in GNU awk using the filefuncs extension:



$ cat bar.awk
@load "filefuncs"
BEGIN {
    FS=":"                                # passwd field sep
    passwd="/etc/passwd"                  # get usernames from passwd
    while ((getline < passwd)>0)
        users[$3]=$1
    close(passwd)                         # close passwd

    if(path=="")                          # set path with -v path=...
        path="."                          # default path is cwd
    pathlist[1]=path                      # path from the command line
                                          # you could have several paths
    fts(pathlist,FTS_PHYSICAL,filedata)   # dont mind links (vs. FTS_LOGICAL)
    for(p in filedata)                    # p for paths
        for(f in filedata[p])             # f for files
            if(filedata[p][f]["stat"]["type"]=="file")   # mind files only
                size[filedata[p][f]["stat"]["uid"]]+=filedata[p][f]["stat"]["size"]
    for(i in size)
        print (users[i]?users[i]:i),size[i]   # print username if found else uid
    exit
}


Sample outputs:



$ ls -l
total 3623
drwxr-xr-x 2 james james 3690496 Mar 21 21:32 100kfiles/
-rw-r--r-- 1 root root 4 Mar 21 18:52 bar
-rw-r--r-- 1 james james 424 Mar 21 21:33 bar.awk
-rw-r--r-- 1 james james 546 Mar 21 21:19 bar.awk~
-rw-r--r-- 1 james james 315 Mar 21 19:14 foo.awk
-rw-r--r-- 1 james james 125 Mar 21 18:53 foo.awk~
$ awk -v path=. -f bar.awk
root 4
james 1410


Another:



$ time awk -v path=100kfiles -f bar.awk
root 4
james 342439926

real 0m1.289s
user 0m0.852s
sys 0m0.440s


Yet another test with a million empty files:



$ time awk -v path=../million_files -f bar.awk

real 0m5.057s
user 0m4.000s
sys 0m1.056s
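Note (see the comment thread below): @load needs a gawk built with the loadable-extension interface, which as far as I recall arrived with gawk 4.x -- the gawk 3.1.7 shipped on RHEL 6 rejects it. A quick sanity check, assuming gawk is on the PATH:

$ gawk --version | head -n1
$ gawk '@load "filefuncs"
        BEGIN { print "filefuncs extension loads OK" }'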





– James Brown
  • looks like my awk doesn't have filefuncs awk: foo.awk:1: ^ invalid char '@' in expression – stack0114106, Mar 21 at 18:04
  • TIme to upgrade to a modern version of GNU awk. – James Brown, Mar 21 at 18:17
  • this is in Enterprise Linux - RHEL 6.10.. I see gawk pointing to /bin/gawk and the version is GNU Awk 3.1.7.. does it support @loadfiles?.. or is there any other location that would have another awk??.. – stack0114106, Mar 21 at 18:24
  • A wild guess that extensions came in GNU awk 4. But I saw you mentioned 300k files, this solution can't handle that many. – James Brown, Mar 21 at 18:29
  • ok.. anyway good to know loadfiles...I did run this in my cygwin and it works..so ++ – stack0114106, Mar 21 at 18:31
Using datamash (and Stefan Becker's find code):



find $dir -maxdepth 1 -type f -printf "%u\t%s\n" | datamash -sg 1 sum 2
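For readers who don't know datamash: -s sorts the input first, -g 1 groups by field 1 (the user name from %u), and sum 2 totals field 2 (the size from %s). A quick check that the tool is installed (relevant to the comment below about RHEL 6.1):

$ command -v datamash && datamash --version | head -n1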





– agc

  • @agc..the answer seems to be simple.. is datamash available in RHEL 6.1? – stack0114106, Mar 22 at 11:38
  • @stack0114106, Not sure -- RPM files exist, but whether those work in RHEL 6.1 is unclear without a 6.1 box to test on. – agc, Mar 22 at 11:58











6 Answers
6






active

oldest

votes








6 Answers
6






active

oldest

votes









active

oldest

votes






active

oldest

votes









2














Get a listing from Perl (tagged), add up sizes and sort it out by owner



perl -wE'
chdir (shift // ".");
for (glob ".* *")
next if not -f;
($owner_id, $size) = (stat)[4,7]
or do warn "Trouble stat for: $_"; next ;
$rept$owner_id += $size

say (getpwuid($_)//$_, " => $rept$_ bytes") for sort keys %rept
'


I didn't get to benchmark it, and it'd be worth trying it out against an approach where the directory is iterated over, as opposed to glob-ed (even though glob was faster in a related problem).



I expect good runtimes in comparison with ls, which slows down dramatically as a file list in a single directory gets long. This is mostly due to the system so Perl will be affected as well but as far as I recall it handles it far better.



However, I've seen such dramatic slowdown in getting listings as an issue once entries get to around a hundred thousand, not a few thousand, so I am not sure why it runs slow on your system.



If this need be recursive then use File::Find. For example



perl -MFile::Find -wE'
$dir = shift // ".";
find( sub
return if /^..?$/;
($owner_id, $size) = (stat)[4,7]
or do warn "Trouble stat for: $_"; return ;
$rept$owner_id += $size
, $dir );
say (getpwuid($_)//$_, "$_ => $rept$_ bytes") for keys %rept
'


This scans a directory with 2.4 Gb, with mostly small files distributed over a hierarchy of subdirectories, in a little over 2 seconds. The du -sh took around 5 seconds (the first time round).






share|improve this answer

























  • @stack0114106 "the first one runs and gives results" -- (1) is that where you see the error (in the first script), and (2) does the second one not run (or is that where you see the error)?

    – zdim
    Mar 21 at 18:41











  • @stack0114106 That would mean that $owner_id didn't get assigned (unless I have some typo from copy-pasting?) -- I can't readily imagine what kind of a beasty would not return owner it from stat ... ? Can you debug -- add a print like say "no owner id for $_" if not $owner_id; or some such. I can't as I don't have a problem

    – zdim
    Mar 21 at 18:43












  • I think if folks leave the organization, their id would be disabled, but the files would still be there.. and would that affect how getpwuid results??..

    – stack0114106
    Mar 21 at 18:47












  • @stack0114106 They can handle it in many ways (when a user leaves an organization), but I think that there would have to be an owner id for each file on the system. Can you add the print from my previous comment to see what that is about?

    – zdim
    Mar 21 at 18:51












  • @stack0114106 "the second one would take lot of time" -- for how large a hierarchy? How long does the awk parsed ls take (from the question) on that ? If that's just too big to try now can you compare on soemthing more reasonable? Perl File::Find should be fast, as much as one can expect recursive searching to go.

    – zdim
    Mar 21 at 18:54















2














Get a listing from Perl (tagged), add up sizes and sort it out by owner



perl -wE'
chdir (shift // ".");
for (glob ".* *")
next if not -f;
($owner_id, $size) = (stat)[4,7]
or do warn "Trouble stat for: $_"; next ;
$rept$owner_id += $size

say (getpwuid($_)//$_, " => $rept$_ bytes") for sort keys %rept
'


I didn't get to benchmark it, and it'd be worth trying it out against an approach where the directory is iterated over, as opposed to glob-ed (even though glob was faster in a related problem).



I expect good runtimes in comparison with ls, which slows down dramatically as a file list in a single directory gets long. This is mostly due to the system so Perl will be affected as well but as far as I recall it handles it far better.



However, I've seen such dramatic slowdown in getting listings as an issue once entries get to around a hundred thousand, not a few thousand, so I am not sure why it runs slow on your system.



If this need be recursive then use File::Find. For example



perl -MFile::Find -wE'
$dir = shift // ".";
find( sub
return if /^..?$/;
($owner_id, $size) = (stat)[4,7]
or do warn "Trouble stat for: $_"; return ;
$rept$owner_id += $size
, $dir );
say (getpwuid($_)//$_, "$_ => $rept$_ bytes") for keys %rept
'


This scans a directory with 2.4 Gb, with mostly small files distributed over a hierarchy of subdirectories, in a little over 2 seconds. The du -sh took around 5 seconds (the first time round).






share|improve this answer

























  • @stack0114106 "the first one runs and gives results" -- (1) is that where you see the error (in the first script), and (2) does the second one not run (or is that where you see the error)?

    – zdim
    Mar 21 at 18:41











  • @stack0114106 That would mean that $owner_id didn't get assigned (unless I have some typo from copy-pasting?) -- I can't readily imagine what kind of a beasty would not return owner it from stat ... ? Can you debug -- add a print like say "no owner id for $_" if not $owner_id; or some such. I can't as I don't have a problem

    – zdim
    Mar 21 at 18:43












  • I think if folks leave the organization, their id would be disabled, but the files would still be there.. and would that affect how getpwuid results??..

    – stack0114106
    Mar 21 at 18:47












  • @stack0114106 They can handle it in many ways (when a user leaves an organization), but I think that there would have to be an owner id for each file on the system. Can you add the print from my previous comment to see what that is about?

    – zdim
    Mar 21 at 18:51












  • @stack0114106 "the second one would take lot of time" -- for how large a hierarchy? How long does the awk parsed ls take (from the question) on that ? If that's just too big to try now can you compare on soemthing more reasonable? Perl File::Find should be fast, as much as one can expect recursive searching to go.

    – zdim
    Mar 21 at 18:54













2












2








2







Get a listing from Perl (tagged), add up sizes and sort it out by owner



perl -wE'
chdir (shift // ".");
for (glob ".* *")
next if not -f;
($owner_id, $size) = (stat)[4,7]
or do warn "Trouble stat for: $_"; next ;
$rept$owner_id += $size

say (getpwuid($_)//$_, " => $rept$_ bytes") for sort keys %rept
'


I didn't get to benchmark it, and it'd be worth trying it out against an approach where the directory is iterated over, as opposed to glob-ed (even though glob was faster in a related problem).



I expect good runtimes in comparison with ls, which slows down dramatically as a file list in a single directory gets long. This is mostly due to the system so Perl will be affected as well but as far as I recall it handles it far better.



However, I've seen such dramatic slowdown in getting listings as an issue once entries get to around a hundred thousand, not a few thousand, so I am not sure why it runs slow on your system.



If this need be recursive then use File::Find. For example



perl -MFile::Find -wE'
$dir = shift // ".";
find( sub
return if /^..?$/;
($owner_id, $size) = (stat)[4,7]
or do warn "Trouble stat for: $_"; return ;
$rept$owner_id += $size
, $dir );
say (getpwuid($_)//$_, "$_ => $rept$_ bytes") for keys %rept
'


This scans a directory with 2.4 Gb, with mostly small files distributed over a hierarchy of subdirectories, in a little over 2 seconds. The du -sh took around 5 seconds (the first time round).






share|improve this answer















Get a listing from Perl (tagged), add up sizes and sort it out by owner



perl -wE'
chdir (shift // ".");
for (glob ".* *")
next if not -f;
($owner_id, $size) = (stat)[4,7]
or do warn "Trouble stat for: $_"; next ;
$rept$owner_id += $size

say (getpwuid($_)//$_, " => $rept$_ bytes") for sort keys %rept
'


I didn't get to benchmark it, and it'd be worth trying it out against an approach where the directory is iterated over, as opposed to glob-ed (even though glob was faster in a related problem).



I expect good runtimes in comparison with ls, which slows down dramatically as a file list in a single directory gets long. This is mostly due to the system so Perl will be affected as well but as far as I recall it handles it far better.



However, I've seen such dramatic slowdown in getting listings as an issue once entries get to around a hundred thousand, not a few thousand, so I am not sure why it runs slow on your system.



If this need be recursive then use File::Find. For example



perl -MFile::Find -wE'
$dir = shift // ".";
find( sub
return if /^..?$/;
($owner_id, $size) = (stat)[4,7]
or do warn "Trouble stat for: $_"; return ;
$rept$owner_id += $size
, $dir );
say (getpwuid($_)//$_, "$_ => $rept$_ bytes") for keys %rept
'


This scans a directory with 2.4 Gb, with mostly small files distributed over a hierarchy of subdirectories, in a little over 2 seconds. The du -sh took around 5 seconds (the first time round).







share|improve this answer














share|improve this answer



share|improve this answer








edited Mar 22 at 9:07

























answered Mar 21 at 17:31









zdimzdim

34k32443




34k32443












  • @stack0114106 "the first one runs and gives results" -- (1) is that where you see the error (in the first script), and (2) does the second one not run (or is that where you see the error)?

    – zdim
    Mar 21 at 18:41











  • @stack0114106 That would mean that $owner_id didn't get assigned (unless I have some typo from copy-pasting?) -- I can't readily imagine what kind of a beasty would not return owner it from stat ... ? Can you debug -- add a print like say "no owner id for $_" if not $owner_id; or some such. I can't as I don't have a problem

    – zdim
    Mar 21 at 18:43












  • I think if folks leave the organization, their id would be disabled, but the files would still be there.. and would that affect how getpwuid results??..

    – stack0114106
    Mar 21 at 18:47












  • @stack0114106 They can handle it in many ways (when a user leaves an organization), but I think that there would have to be an owner id for each file on the system. Can you add the print from my previous comment to see what that is about?

    – zdim
    Mar 21 at 18:51












  • @stack0114106 "the second one would take lot of time" -- for how large a hierarchy? How long does the awk parsed ls take (from the question) on that ? If that's just too big to try now can you compare on soemthing more reasonable? Perl File::Find should be fast, as much as one can expect recursive searching to go.

    – zdim
    Mar 21 at 18:54

















  • @stack0114106 "the first one runs and gives results" -- (1) is that where you see the error (in the first script), and (2) does the second one not run (or is that where you see the error)?

    – zdim
    Mar 21 at 18:41











  • @stack0114106 That would mean that $owner_id didn't get assigned (unless I have some typo from copy-pasting?) -- I can't readily imagine what kind of a beasty would not return owner it from stat ... ? Can you debug -- add a print like say "no owner id for $_" if not $owner_id; or some such. I can't as I don't have a problem

    – zdim
    Mar 21 at 18:43












  • I think if folks leave the organization, their id would be disabled, but the files would still be there.. and would that affect how getpwuid results??..

    – stack0114106
    Mar 21 at 18:47












  • @stack0114106 They can handle it in many ways (when a user leaves an organization), but I think that there would have to be an owner id for each file on the system. Can you add the print from my previous comment to see what that is about?

    – zdim
    Mar 21 at 18:51












  • @stack0114106 "the second one would take lot of time" -- for how large a hierarchy? How long does the awk parsed ls take (from the question) on that ? If that's just too big to try now can you compare on soemthing more reasonable? Perl File::Find should be fast, as much as one can expect recursive searching to go.

    – zdim
    Mar 21 at 18:54
















@stack0114106 "the first one runs and gives results" -- (1) is that where you see the error (in the first script), and (2) does the second one not run (or is that where you see the error)?

– zdim
Mar 21 at 18:41





@stack0114106 "the first one runs and gives results" -- (1) is that where you see the error (in the first script), and (2) does the second one not run (or is that where you see the error)?

– zdim
Mar 21 at 18:41













@stack0114106 That would mean that $owner_id didn't get assigned (unless I have some typo from copy-pasting?) -- I can't readily imagine what kind of a beasty would not return owner it from stat ... ? Can you debug -- add a print like say "no owner id for $_" if not $owner_id; or some such. I can't as I don't have a problem

– zdim
Mar 21 at 18:43






@stack0114106 That would mean that $owner_id didn't get assigned (unless I have some typo from copy-pasting?) -- I can't readily imagine what kind of a beasty would not return owner it from stat ... ? Can you debug -- add a print like say "no owner id for $_" if not $owner_id; or some such. I can't as I don't have a problem

– zdim
Mar 21 at 18:43














I think if folks leave the organization, their id would be disabled, but the files would still be there.. and would that affect how getpwuid results??..

– stack0114106
Mar 21 at 18:47






I think if folks leave the organization, their id would be disabled, but the files would still be there.. and would that affect how getpwuid results??..

– stack0114106
Mar 21 at 18:47














@stack0114106 They can handle it in many ways (when a user leaves an organization), but I think that there would have to be an owner id for each file on the system. Can you add the print from my previous comment to see what that is about?

– zdim
Mar 21 at 18:51






@stack0114106 They can handle it in many ways (when a user leaves an organization), but I think that there would have to be an owner id for each file on the system. Can you add the print from my previous comment to see what that is about?

– zdim
Mar 21 at 18:51














@stack0114106 "the second one would take lot of time" -- for how large a hierarchy? How long does the awk parsed ls take (from the question) on that ? If that's just too big to try now can you compare on soemthing more reasonable? Perl File::Find should be fast, as much as one can expect recursive searching to go.

– zdim
Mar 21 at 18:54





@stack0114106 "the second one would take lot of time" -- for how large a hierarchy? How long does the awk parsed ls take (from the question) on that ? If that's just too big to try now can you compare on soemthing more reasonable? Perl File::Find should be fast, as much as one can expect recursive searching to go.

– zdim
Mar 21 at 18:54













4














Another perl one, that displays total sizes sorted by user:



#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use File::Spec;
use Fcntl qw/:mode/;

my $dir = shift;
my %users;

opendir(my $d, $dir);
while (my $file = readdir $d)
my $filename = File::Spec->catfile($dir, $file);
my ($mode, $uid, $size) = (stat $filename)[2, 4, 7];
$users$uid += $size if S_ISREG($mode);

closedir $d;

my @sizes = sort $a->[0] cmp $b->[0]
map [ getpwuid($_) // $_, $users$_ ] keys %users;
local $, = "t";
say @$_ for @sizes;





share|improve this answer

























  • what does if S_ISREG($mode) do ?..

    – stack0114106
    Mar 21 at 18:16











  • yes your solution works.. thank you

    – stack0114106
    Mar 21 at 18:50











  • @stack0114106 It limits the size tracking to regular files - skips directories, fifos, sockets, devices, etc. Same idea as the -f $file in another answer, just a different way of checking.

    – Shawn
    Mar 21 at 21:09
















4














Another perl one, that displays total sizes sorted by user:



#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use File::Spec;
use Fcntl qw/:mode/;

my $dir = shift;
my %users;

opendir(my $d, $dir);
while (my $file = readdir $d)
my $filename = File::Spec->catfile($dir, $file);
my ($mode, $uid, $size) = (stat $filename)[2, 4, 7];
$users$uid += $size if S_ISREG($mode);

closedir $d;

my @sizes = sort $a->[0] cmp $b->[0]
map [ getpwuid($_) // $_, $users$_ ] keys %users;
local $, = "t";
say @$_ for @sizes;





share|improve this answer

























  • what does if S_ISREG($mode) do ?..

    – stack0114106
    Mar 21 at 18:16











  • yes your solution works.. thank you

    – stack0114106
    Mar 21 at 18:50











  • @stack0114106 It limits the size tracking to regular files - skips directories, fifos, sockets, devices, etc. Same idea as the -f $file in another answer, just a different way of checking.

    – Shawn
    Mar 21 at 21:09














4












4








4







Another perl one, that displays total sizes sorted by user:



#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use File::Spec;
use Fcntl qw/:mode/;

my $dir = shift;
my %users;

opendir(my $d, $dir);
while (my $file = readdir $d)
my $filename = File::Spec->catfile($dir, $file);
my ($mode, $uid, $size) = (stat $filename)[2, 4, 7];
$users$uid += $size if S_ISREG($mode);

closedir $d;

my @sizes = sort $a->[0] cmp $b->[0]
map [ getpwuid($_) // $_, $users$_ ] keys %users;
local $, = "t";
say @$_ for @sizes;





share|improve this answer















Another perl one, that displays total sizes sorted by user:



#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use File::Spec;
use Fcntl qw/:mode/;

my $dir = shift;
my %users;

opendir(my $d, $dir);
while (my $file = readdir $d)
my $filename = File::Spec->catfile($dir, $file);
my ($mode, $uid, $size) = (stat $filename)[2, 4, 7];
$users$uid += $size if S_ISREG($mode);

closedir $d;

my @sizes = sort $a->[0] cmp $b->[0]
map [ getpwuid($_) // $_, $users$_ ] keys %users;
local $, = "t";
say @$_ for @sizes;






share|improve this answer














share|improve this answer



share|improve this answer








edited Mar 21 at 16:35

























answered Mar 21 at 16:27









ShawnShawn

4,8572614




4,8572614












  • what does if S_ISREG($mode) do ?..

    – stack0114106
    Mar 21 at 18:16











  • yes your solution works.. thank you

    – stack0114106
    Mar 21 at 18:50











  • @stack0114106 It limits the size tracking to regular files - skips directories, fifos, sockets, devices, etc. Same idea as the -f $file in another answer, just a different way of checking.

    – Shawn
    Mar 21 at 21:09


















  • what does if S_ISREG($mode) do ?..

    – stack0114106
    Mar 21 at 18:16











  • yes your solution works.. thank you

    – stack0114106
    Mar 21 at 18:50











  • @stack0114106 It limits the size tracking to regular files - skips directories, fifos, sockets, devices, etc. Same idea as the -f $file in another answer, just a different way of checking.

    – Shawn
    Mar 21 at 21:09

















what does if S_ISREG($mode) do ?..

– stack0114106
Mar 21 at 18:16





what does if S_ISREG($mode) do ?..

– stack0114106
Mar 21 at 18:16













yes your solution works.. thank you

– stack0114106
Mar 21 at 18:50





yes your solution works.. thank you

– stack0114106
Mar 21 at 18:50













@stack0114106 It limits the size tracking to regular files - skips directories, fifos, sockets, devices, etc. Same idea as the -f $file in another answer, just a different way of checking.

– Shawn
Mar 21 at 21:09






@stack0114106 It limits the size tracking to regular files - skips directories, fifos, sockets, devices, etc. Same idea as the -f $file in another answer, just a different way of checking.

– Shawn
Mar 21 at 21:09












2














Parsing output from ls - bad idea.



How about using find instead?



  • start in directory $dir

    • limit to that directory level (-maxdepth 1)

    • limit to files (-type f)

    • print a line with user name and file size in bytes (-printf "%u %sn")


  • run the results through a perl filter

    • split each line (-a)

    • add to a hash under key (field 0) the size (field 1)

    • at the end (END ...) print out the hash contents, sorted by key, i.e. user name


$ find $dir -maxdepth 1 -type f -printf "%u %sn" | 
perl -ane '$s$F[0] += $F[1]; END print "$_ $s$_n" foreach (sort keys %s); '
stefanb 263305714



A solution using Perl:



#!/usr/bin/perl
use strict;
use warnings;
use autodie;

use File::Spec;

my %users;
foreach my $dir (@ARGV)
opendir(my $dh, $dir);

# files in this directory
while (my $entry = readdir($dh))
my $file = File::Spec->catfile($dir, $entry);

# only files
if (-f $file)
my($uid, $size) = (stat($file))[4, 7];
$users$uid += $size



closedir($dh);


print "$_ $users$_n" foreach (sort keys %users);

exit 0;


Test run:



$ perl dummy.pl .
1000 263618544


Interesting difference. The Perl solution discovers 3 more files in my test directory than the find solution. I have to ponder why that is...






share|improve this answer

























  • It should print for all owners..not just the current user.. the files are owned by diff users

    – stack0114106
    Mar 21 at 16:14











  • Updated accordingly

    – Stefan Becker
    Mar 21 at 16:26











  • your solution works.. thank you..

    – stack0114106
    Mar 21 at 18:51















2














Parsing output from ls - bad idea.



How about using find instead?



  • start in directory $dir

    • limit to that directory level (-maxdepth 1)

    • limit to files (-type f)

    • print a line with user name and file size in bytes (-printf "%u %sn")


  • run the results through a perl filter

    • split each line (-a)

    • add to a hash under key (field 0) the size (field 1)

    • at the end (END ...) print out the hash contents, sorted by key, i.e. user name


$ find $dir -maxdepth 1 -type f -printf "%u %sn" | 
perl -ane '$s$F[0] += $F[1]; END print "$_ $s$_n" foreach (sort keys %s); '
stefanb 263305714



A solution using Perl:



#!/usr/bin/perl
use strict;
use warnings;
use autodie;

use File::Spec;

my %users;
foreach my $dir (@ARGV)
opendir(my $dh, $dir);

# files in this directory
while (my $entry = readdir($dh))
my $file = File::Spec->catfile($dir, $entry);

# only files
if (-f $file)
my($uid, $size) = (stat($file))[4, 7];
$users$uid += $size



closedir($dh);


print "$_ $users$_n" foreach (sort keys %users);

exit 0;


Test run:



$ perl dummy.pl .
1000 263618544


Interesting difference. The Perl solution discovers 3 more files in my test directory than the find solution. I have to ponder why that is...






share|improve this answer

























  • It should print for all owners..not just the current user.. the files are owned by diff users

    – stack0114106
    Mar 21 at 16:14











  • Updated accordingly

    – Stefan Becker
    Mar 21 at 16:26











  • your solution works.. thank you..

    – stack0114106
    Mar 21 at 18:51













2












2








2







Parsing output from ls - bad idea.



How about using find instead?



  • start in directory $dir

    • limit to that directory level (-maxdepth 1)

    • limit to files (-type f)

    • print a line with user name and file size in bytes (-printf "%u %sn")


  • run the results through a perl filter

    • split each line (-a)

    • add to a hash under key (field 0) the size (field 1)

    • at the end (END ...) print out the hash contents, sorted by key, i.e. user name


$ find $dir -maxdepth 1 -type f -printf "%u %sn" | 
perl -ane '$s$F[0] += $F[1]; END print "$_ $s$_n" foreach (sort keys %s); '
stefanb 263305714



A solution using Perl:



#!/usr/bin/perl
use strict;
use warnings;
use autodie;

use File::Spec;

my %users;
foreach my $dir (@ARGV)
opendir(my $dh, $dir);

# files in this directory
while (my $entry = readdir($dh))
my $file = File::Spec->catfile($dir, $entry);

# only files
if (-f $file)
my($uid, $size) = (stat($file))[4, 7];
$users$uid += $size



closedir($dh);


print "$_ $users$_n" foreach (sort keys %users);

exit 0;


Test run:



$ perl dummy.pl .
1000 263618544


Interesting difference. The Perl solution discovers 3 more files in my test directory than the find solution. I have to ponder why that is...






share|improve this answer















Parsing output from ls - bad idea.



How about using find instead?



  • start in directory $dir

    • limit to that directory level (-maxdepth 1)

    • limit to files (-type f)

    • print a line with user name and file size in bytes (-printf "%u %sn")


  • run the results through a perl filter

    • split each line (-a)

    • add to a hash under key (field 0) the size (field 1)

    • at the end (END ...) print out the hash contents, sorted by key, i.e. user name


$ find $dir -maxdepth 1 -type f -printf "%u %sn" | 
perl -ane '$s$F[0] += $F[1]; END print "$_ $s$_n" foreach (sort keys %s); '
stefanb 263305714



A solution using Perl:



#!/usr/bin/perl
use strict;
use warnings;
use autodie;

use File::Spec;

my %users;
foreach my $dir (@ARGV)
opendir(my $dh, $dir);

# files in this directory
while (my $entry = readdir($dh))
my $file = File::Spec->catfile($dir, $entry);

# only files
if (-f $file)
my($uid, $size) = (stat($file))[4, 7];
$users$uid += $size



closedir($dh);


print "$_ $users$_n" foreach (sort keys %users);

exit 0;


Test run:



$ perl dummy.pl .
1000 263618544


Interesting difference. The Perl solution discovers 3 more files in my test directory than the find solution. I have to ponder why that is...







share|improve this answer














share|improve this answer



share|improve this answer








edited Mar 21 at 16:35

























answered Mar 21 at 16:08









Stefan BeckerStefan Becker

4,31521125




4,31521125












  • It should print for all owners..not just the current user.. the files are owned by diff users

    – stack0114106
    Mar 21 at 16:14











  • Updated accordingly

    – Stefan Becker
    Mar 21 at 16:26











  • your solution works.. thank you..

    – stack0114106
    Mar 21 at 18:51

















  • It should print for all owners..not just the current user.. the files are owned by diff users

    – stack0114106
    Mar 21 at 16:14











  • Updated accordingly

    – Stefan Becker
    Mar 21 at 16:26











  • your solution works.. thank you..

    – stack0114106
    Mar 21 at 18:51
















It should print for all owners..not just the current user.. the files are owned by diff users

– stack0114106
Mar 21 at 16:14





It should print for all owners..not just the current user.. the files are owned by diff users

– stack0114106
Mar 21 at 16:14













Updated accordingly

– Stefan Becker
Mar 21 at 16:26





Updated accordingly

– Stefan Becker
Mar 21 at 16:26













your solution works.. thank you..

– stack0114106
Mar 21 at 18:51





your solution works.. thank you..

– stack0114106
Mar 21 at 18:51











1














Not sure why question is tagged perl when awk is being used.



Here's a simple perl version:



#!/usr/bin/perl

chdir($ARGV[0]) or die("Usage: $0 dirn");

map
if ( ! m/^[.][.]?$/o )
($s,$u) = (stat)[7,4];
$h$u += $s;

glob ".* *";

map
$s = $h$_;
$u = !( $s >>10) ? ""
: !(($s>>=10)>>10) ? "k"
: !(($s>>=10)>>10) ? "M"
: !(($s>>=10)>>10) ? "G"
: ($s>>=10) ? "T"
: undef
;
printf "%-8s %12dt%sn", $s.$u, $h$_, getpwuid($_)//$_;
keys %h;




  • glob gets our file list


  • m// discards . and ..


  • stat the size and uid

  • accumulate sizes in %h

  • compute the unit by bitshifting (>>10 is integer divide by 1024)

  • map uid to username (// provides fallback)

  • print results (unsorted)


  • NOTE: unlike some other answers, this code doesn't recurse into subdirectories

To exclude symlinks, subdirectories, etc, change the if to appropriate -X tests. (eg. (-f $_), (!-d $_ and !-l $_), etc). See perl docs on the _ filehandle optimisation for caching stat results.






share|improve this answer

























  • I don't see m/// in the script. My guess is you're referring to !/^[.][.]?$/o?

    – Aaron Digulla
    Mar 21 at 16:27











  • yes. // is shortcut for m//. m is only needed if you want to use different delimiter (eg m[], m<>, etc). Three slashes was typo.

    – jhnc
    Mar 21 at 16:29







  • 1





    Please either use m// in the script or use the code from the script in the explanation. As it is, it's very confusing for people who don't know a lot about Perl.

    – Aaron Digulla
    Mar 21 at 16:32











  • @jhnc.. your solution also works..thank you

    – stack0114106
    Mar 21 at 18:49
















1














Not sure why question is tagged perl when awk is being used.



Here's a simple perl version:



#!/usr/bin/perl

chdir($ARGV[0]) or die("Usage: $0 dirn");

map
if ( ! m/^[.][.]?$/o )
($s,$u) = (stat)[7,4];
$h$u += $s;

glob ".* *";

map
$s = $h$_;
$u = !( $s >>10) ? ""
: !(($s>>=10)>>10) ? "k"
: !(($s>>=10)>>10) ? "M"
: !(($s>>=10)>>10) ? "G"
: ($s>>=10) ? "T"
: undef
;
printf "%-8s %12dt%sn", $s.$u, $h$_, getpwuid($_)//$_;
keys %h;




  • glob gets our file list


  • m// discards . and ..


  • stat the size and uid

  • accumulate sizes in %h

  • compute the unit by bitshifting (>>10 is integer divide by 1024)

  • map uid to username (// provides fallback)

  • print results (unsorted)


  • NOTE: unlike some other answers, this code doesn't recurse into subdirectories

To exclude symlinks, subdirectories, etc, change the if to appropriate -X tests. (eg. (-f $_), (!-d $_ and !-l $_), etc). See perl docs on the _ filehandle optimisation for caching stat results.






share|improve this answer

























  • I don't see m/// in the script. My guess is you're referring to !/^[.][.]?$/o?

    – Aaron Digulla
    Mar 21 at 16:27











  • yes. // is shortcut for m//. m is only needed if you want to use different delimiter (eg m[], m<>, etc). Three slashes was typo.

    – jhnc
    Mar 21 at 16:29







  • 1





    Please either use m// in the script or use the code from the script in the explanation. As it is, it's very confusing for people who don't know a lot about Perl.

    – Aaron Digulla
    Mar 21 at 16:32











  • @jhnc.. your solution also works..thank you

    – stack0114106
    Mar 21 at 18:49














1












1








1







Not sure why question is tagged perl when awk is being used.



Here's a simple perl version:



#!/usr/bin/perl

chdir($ARGV[0]) or die("Usage: $0 dirn");

map
if ( ! m/^[.][.]?$/o )
($s,$u) = (stat)[7,4];
$h$u += $s;

glob ".* *";

map
$s = $h$_;
$u = !( $s >>10) ? ""
: !(($s>>=10)>>10) ? "k"
: !(($s>>=10)>>10) ? "M"
: !(($s>>=10)>>10) ? "G"
: ($s>>=10) ? "T"
: undef
;
printf "%-8s %12dt%sn", $s.$u, $h$_, getpwuid($_)//$_;
keys %h;




  • glob gets our file list


  • m// discards . and ..


  • stat the size and uid

  • accumulate sizes in %h

  • compute the unit by bitshifting (>>10 is integer divide by 1024)

  • map uid to username (// provides fallback)

  • print results (unsorted)


  • NOTE: unlike some other answers, this code doesn't recurse into subdirectories

To exclude symlinks, subdirectories, etc, change the if to appropriate -X tests. (eg. (-f $_), (!-d $_ and !-l $_), etc). See perl docs on the _ filehandle optimisation for caching stat results.






share|improve this answer















Not sure why question is tagged perl when awk is being used.



Here's a simple perl version:



#!/usr/bin/perl

chdir($ARGV[0]) or die("Usage: $0 dirn");

map
if ( ! m/^[.][.]?$/o )
($s,$u) = (stat)[7,4];
$h$u += $s;

glob ".* *";

map
$s = $h$_;
$u = !( $s >>10) ? ""
: !(($s>>=10)>>10) ? "k"
: !(($s>>=10)>>10) ? "M"
: !(($s>>=10)>>10) ? "G"
: ($s>>=10) ? "T"
: undef
;
printf "%-8s %12dt%sn", $s.$u, $h$_, getpwuid($_)//$_;
keys %h;




  • glob gets our file list


  • m// discards . and ..


  • stat the size and uid

  • accumulate sizes in %h

  • compute the unit by bitshifting (>>10 is integer divide by 1024)

  • map uid to username (// provides fallback)

  • print results (unsorted)


  • NOTE: unlike some other answers, this code doesn't recurse into subdirectories

To exclude symlinks, subdirectories, etc, change the if to appropriate -X tests. (eg. (-f $_), (!-d $_ and !-l $_), etc). See perl docs on the _ filehandle optimisation for caching stat results.







share|improve this answer














share|improve this answer



share|improve this answer








edited Mar 21 at 20:08

























answered Mar 21 at 16:19









jhncjhnc

2,559214




2,559214












  • I don't see m/// in the script. My guess is you're referring to !/^[.][.]?$/o?

    – Aaron Digulla
    Mar 21 at 16:27











  • yes. // is shortcut for m//. m is only needed if you want to use different delimiter (eg m[], m<>, etc). Three slashes was typo.

    – jhnc
    Mar 21 at 16:29







  • 1





    Please either use m// in the script or use the code from the script in the explanation. As it is, it's very confusing for people who don't know a lot about Perl.

    – Aaron Digulla
    Mar 21 at 16:32











  • @jhnc.. your solution also works..thank you

    – stack0114106
    Mar 21 at 18:49


















  • I don't see m/// in the script. My guess is you're referring to !/^[.][.]?$/o?

    – Aaron Digulla
    Mar 21 at 16:27











  • yes. // is shortcut for m//. m is only needed if you want to use different delimiter (eg m[], m<>, etc). Three slashes was typo.

    – jhnc
    Mar 21 at 16:29







  • 1





    Please either use m// in the script or use the code from the script in the explanation. As it is, it's very confusing for people who don't know a lot about Perl.

    – Aaron Digulla
    Mar 21 at 16:32











  • @jhnc.. your solution also works..thank you

    – stack0114106
    Mar 21 at 18:49

















I don't see m/// in the script. My guess is you're referring to !/^[.][.]?$/o?

– Aaron Digulla
Mar 21 at 16:27





I don't see m/// in the script. My guess is you're referring to !/^[.][.]?$/o?

– Aaron Digulla
Mar 21 at 16:27













yes. // is shortcut for m//. m is only needed if you want to use different delimiter (eg m[], m<>, etc). Three slashes was typo.

– jhnc
Mar 21 at 16:29






yes. // is shortcut for m//. m is only needed if you want to use different delimiter (eg m[], m<>, etc). Three slashes was typo.

– jhnc
Mar 21 at 16:29





1




1





Please either use m// in the script or use the code from the script in the explanation. As it is, it's very confusing for people who don't know a lot about Perl.

– Aaron Digulla
Mar 21 at 16:32





Please either use m// in the script or use the code from the script in the explanation. As it is, it's very confusing for people who don't know a lot about Perl.

– Aaron Digulla
Mar 21 at 16:32













@jhnc.. your solution also works..thank you

– stack0114106
Mar 21 at 18:49






@jhnc.. your solution also works..thank you

– stack0114106
Mar 21 at 18:49












1














Did I see some awk in the OP? Here is one in GNU awk, using the filefuncs extension:



$ cat bar.awk
@load "filefuncs"
BEGIN {
    FS=":"                                      # passwd field sep
    passwd="/etc/passwd"                        # get usernames from passwd
    while ((getline < passwd)>0)
        users[$3]=$1
    close(passwd)                               # close passwd

    if(path=="")                                # set path with -v path=...
        path="."                                # default path is cwd
    pathlist[1]=path                            # path from the command line
                                                # you could have several paths
    fts(pathlist,FTS_PHYSICAL,filedata)         # don't follow symlinks (vs. FTS_LOGICAL)
    for(p in filedata)                          # p for paths
        for(f in filedata[p])                   # f for files
            if(filedata[p][f]["stat"]["type"]=="file")      # mind files only
                size[filedata[p][f]["stat"]["uid"]]+=filedata[p][f]["stat"]["size"]
    for(i in size)
        print (users[i]?users[i]:i),size[i]     # print username if found, else uid
    exit
}



Sample outputs:



$ ls -l
total 3623
drwxr-xr-x 2 james james 3690496 Mar 21 21:32 100kfiles/
-rw-r--r-- 1 root root 4 Mar 21 18:52 bar
-rw-r--r-- 1 james james 424 Mar 21 21:33 bar.awk
-rw-r--r-- 1 james james 546 Mar 21 21:19 bar.awk~
-rw-r--r-- 1 james james 315 Mar 21 19:14 foo.awk
-rw-r--r-- 1 james james 125 Mar 21 18:53 foo.awk~
$ awk -v path=. -f bar.awk
root 4
james 1410


Another run, this time against the 100kfiles directory:



$ time awk -v path=100kfiles -f bar.awk
root 4
james 342439926

real 0m1.289s
user 0m0.852s
sys 0m0.440s


Yet another test with a million empty files:



$ time awk -v path=../million_files -f bar.awk

real 0m5.057s
user 0m4.000s
sys 0m1.056s
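One practical caveat, raised in the comments below: @load and the matching -l/--load option need a gawk new enough to have the dynamic-extension interface (the 4.x line; the comments show 3.1.7 rejecting the @ directive outright). A quick sanity check, assuming the extension is installed under its usual name filefuncs:

$ gawk --version | head -n1
$ gawk -l filefuncs 'BEGIN { print "filefuncs extension loads" }'

If the second command errors out, fall back to one of the find-based answers.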





answered Mar 21 at 17:23









James Brown
  • looks like my awk doesn't have filefuncs awk: foo.awk:1: ^ invalid char '@' in expression

    – stack0114106
    Mar 21 at 18:04











  • Time to upgrade to a modern version of GNU awk.

    – James Brown
    Mar 21 at 18:17











  • this is in Enterprise Linux - RHEL 6.10.. I see gawk pointing to /bin/gawk and the version is GNU Awk 3.1.7.. does it support @loadfiles?.. or is there any other location that would have another awk??..

    – stack0114106
    Mar 21 at 18:24











  • A wild guess that extensions came in GNU awk 4. But I saw you mentioned 300k files, this solution can't handle that many.

    – James Brown
    Mar 21 at 18:29






  • ok.. anyway good to know loadfiles...I did run this in my cygwin and it works..so ++

    – stack0114106
    Mar 21 at 18:31

















Using datamash (and Stefan Becker's find code):



find $dir -maxdepth 1 -type f -printf "%u\t%s\n" | datamash -sg 1 sum 2
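If datamash itself can't be installed (the comments below raise that question for RHEL 6.1), the grouping step is easy to approximate with plain awk; a rough equivalent of the pipeline above, under the same assumptions about $dir:

find $dir -maxdepth 1 -type f -printf "%u\t%s\n" |
    awk -F'\t' '{ sum[$1] += $2 } END { for (u in sum) print u, sum[u] }' |
    sort -k2 -nr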





answered Mar 22 at 11:36









agc
  • @agc..the answer seems to be simple.. is datamash available in RHEL 6.1?

    – stack0114106
    Mar 22 at 11:38











  • @stack0114106, Not sure -- RPM files exist, but whether those work in RHEL 6.1 is unclear without a 6.1 box to test on.

    – agc
    Mar 22 at 11:58
















