Fast line count
The following is a simple bash script that estimates the number of lines in ginormous text files (like a 20 GB CSV) without reading the whole file.
#!/bin/bash
# estimate the line count of a file without reading the entire file
# accuracy can be adjusted by changing the linenum argument (lines sampled from each end)
path=$1
linenum=$2
# bytes in the first and in the last $linenum lines
head=$(head -n "$linenum" "$path" | wc -c)
tail=$(tail -n "$linenum" "$path" | wc -c)
# total file size in bytes
totalBytes=$(wc -c < "$path")
# average bytes per line across both samples is (head + tail) / (2 * linenum);
# multiply before dividing so integer arithmetic does not truncate the average
estimatedlines=$((totalBytes * 2 * linenum / (head + tail)))
echo "$estimatedlines"
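One edge case worth noting: if the file has fewer lines than linenum, the sample already covers the whole file, and dividing its size by linenum understates the average line length, so the estimate simply comes out as linenum. The following is a minimal sketch of a guarded variant, not the original script. The function name estimate_lines is hypothetical, and the sketch assumes the last lines are sampled with tail. When the head sample is as large as the file, it falls back to an exact count, which is cheap at that size.

```shell
#!/bin/bash
# estimate_lines FILE SAMPLE
# Hypothetical helper (not part of the original script): estimates the
# line count of FILE from the first and last SAMPLE lines.
estimate_lines() {
    local path=$1 sample=$2
    local total head_bytes tail_bytes

    # $(( ... )) normalizes wc output that may carry leading spaces
    total=$(( $(wc -c < "$path") ))
    head_bytes=$(( $(head -n "$sample" "$path" | wc -c) ))

    # The head sample covers the whole file (or the file is empty):
    # count exactly instead of estimating.
    if [ "$head_bytes" -ge "$total" ]; then
        echo $(( $(wc -l < "$path") ))
        return
    fi

    tail_bytes=$(( $(tail -n "$sample" "$path" | wc -c) ))

    # total / average line length, with the multiplication done first
    echo $(( total * 2 * sample / (head_bytes + tail_bytes) ))
}
```

After sourcing the function, it would be called the same way as the script, for example estimate_lines ./somefile.txt 100000.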
Example of usage and benchmark
time ./lcount.sh ./somefile.txt 100000
2234932
real 0m2.707s
user 0m0.974s
sys 0m1.509s
time wc -l ./somefile.txt
2248443 ./somefile.txt
real 1m30.088s
user 0m0.794s
sys 0m6.408s
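The accuracy in this case is about 99.4% (the estimate is 13,511 lines short of the exact count). It can be adjusted with the linenum argument: sampling more lines generally gives a more representative average line length, at the cost of reading more of the file.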
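To check the accuracy figure for a given run, the estimate can be expressed as a percentage of the exact count. This is a small sketch using awk for the floating-point division. The function name accuracy is hypothetical, and the two numbers are the ones from the benchmark above.

```shell
#!/bin/bash
# accuracy ESTIMATE ACTUAL
# Hypothetical helper: prints ESTIMATE as a percentage of ACTUAL,
# rounded to two decimal places.
accuracy() {
    awk -v est="$1" -v act="$2" 'BEGIN { printf "%.2f\n", 100 * est / act }'
}

# estimate from lcount.sh, exact count from wc -l
accuracy 2234932 2248443
```

For these figures it prints 99.40 (99.399... before rounding). A value above 100 would mean the script overestimated the line count.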