The following is a simple bash script that estimates the number of lines in ginormous text files (like a 20 GB CSV).

#!/bin/bash

# estimate the line count of a file without reading the entire file
# usage: ./lcount.sh <path> <linenum>
# accuracy can be adjusted by changing the $linenum parameter

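path=$1      # file to measure
linenum=$2   # number of lines to sample from each end
head=$(head -n "$linenum" "$path" | wc -c)   # bytes in the first $linenum lines
tail=$(tail -n "$linenum" "$path" | wc -c)   # bytes in the last $linenum lines
totalBytes=$(wc -c < "$path")                # total file size in bytes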

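# average bytes per line at each end of the file (integer division)
headAvg=$(($head/$linenum))
tailAvg=$(($tail/$linenum))
totalAvg=$((($headAvg+$tailAvg)/2))

# file size divided by the average line length gives the estimated line count
estimatedlines=$(($totalBytes/$totalAvg))
echo "$estimatedlines"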

Example of usage and benchmark:

time ./lcount.sh ./somefile.txt 100000  
2234932

real    0m2.707s  
user    0m0.974s  
sys     0m1.509s

time wc -l ./somefile.txt  
2248443 ./somefile.txt

real    1m30.088s  
user    0m0.794s  
sys     0m6.408s

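In this case the estimate is off by 13,511 lines out of 2,248,443, so it is about 99.4% accurate, and it took under 3 seconds instead of a minute and a half. The accuracy can be adjusted using the linenum argument: sampling more lines generally gives a better estimate of the average line length, at the cost of reading more of the file.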
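
The same estimate can be computed with a single division, which avoids the rounding loss from the two integer averages. What follows is only a sketch under a few assumptions: `estimate_lines` is a hypothetical function name, not part of the original script, the sample size is assumed to be a positive integer, and the small-file shortcut is an addition of mine (when the first `n` lines already cover the whole file, counting exactly is cheap).

```shell
# Sketch: estimate a file's line count from its first and last n lines.
# estimate_lines is a hypothetical helper, not part of the original script.
# Assumes n is a positive integer.
estimate_lines() {
    local path n head_bytes tail_bytes total_bytes
    path=$1
    n=$2

    # $(( ... )) strips the padding some wc implementations add.
    head_bytes=$(( $(head -n "$n" "$path" | wc -c) ))
    tail_bytes=$(( $(tail -n "$n" "$path" | wc -c) ))
    total_bytes=$(( $(wc -c < "$path") ))

    # The first n lines already cover the whole file (or it is empty):
    # an exact count is cheap, and this also avoids dividing by zero.
    if [ "$head_bytes" -ge "$total_bytes" ]; then
        echo $(( $(wc -l < "$path") ))
        return
    fi

    # 2*n sampled lines occupy head_bytes + tail_bytes bytes,
    # so scale that ratio up to the full file size.
    echo $(( total_bytes * 2 * n / (head_bytes + tail_bytes) ))
}
```

It would be called the same way as the script, for example `estimate_lines ./somefile.txt 100000`. Multiplying before dividing keeps the fractional part of the average line length, which matters most when lines are short.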
