You may want to compare two very large
text files containing delimited data, with millions of rows that are
unsorted and in random order. Excel, the obvious choice, fails here
because it cannot hold more than 1,048,576 rows per worksheet. Even with
only thousands of rows, Excel can become unresponsive, because the rows
to be compared are not in the same order in the two files.
This piece of code lets the user
compare File1 against File2 and list the differences. One of the files
is compared against the other, and which one is compared against which
is chosen by file size (the larger file is the one compared against),
in order to increase efficiency. The output, displayed on the screen,
is each entire row that mismatched (or was not found), together with
its line number. Output can of course be redirected to a file using
the UNIX redirection operator: > filename.txt
#!/usr/bin/perl
use strict;
use warnings;

print "File 1 : ";
chomp(my $file1 = <STDIN>);    # get the name of File 1
print "File 2 : ";
chomp(my $file2 = <STDIN>);    # get the name of File 2

open TESTFILE, "<", $file1
    or die "Cannot open File : $file1\nError : $!\n";    # abort if unable to read file 1
open TESTFILE2, "<", $file2
    or die "Cannot open File : $file2\nError : $!\n";    # abort if unable to read file 2

# Choose what to compare against what depending upon the file size:
# the rows of the smaller file are looked up in the larger file.
my $file1_is_smaller = (-s $file1) < (-s $file2);
my @file_1_data = $file1_is_smaller ? <TESTFILE>  : <TESTFILE2>;
my @file_2_data = $file1_is_smaller ? <TESTFILE2> : <TESTFILE>;

close TESTFILE;     # both files are now fully read into memory
close TESTFILE2;

# Remove all whitespace (including the trailing newline) from every row, once.
s/\s+//g for @file_1_data, @file_2_data;

my $line_counter = 0;    # keeps track of the current line of the smaller file

foreach my $file_1_current_line (@file_1_data) {
    $line_counter++;         # increment the line counter as it traverses
    my $matched_flag = 0;    # flag to indicate if a match was found

    foreach my $row (@file_2_data) {    # do it for each row in the larger file
        # Exact string comparison; a regex match here would treat
        # delimiters such as | as metacharacters.
        if ($row eq $file_1_current_line) {
            $matched_flag = 1;    # match found, continue with the next row
            last;                 # similar to saying break;
        }
    }

    # display the row not matched
    print "\nNot matched\nLine no: $line_counter\n$file_1_current_line\n"
        unless $matched_flag;
}
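
The nested loops above compare a row against every row of the other file until a match turns up, which can get slow with millions of rows. One possible alternative, shown here only as a minimal sketch and not as part of the original script, is to index the rows of the larger file in a hash so that each lookup is a single hash access. The helper name unmatched_rows and the sample rows are hypothetical and exist only for illustration; it assumes the same whitespace stripping as the script and that the indexed rows fit in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns the rows of
# $rows that do not occur in $reference, as [line_number, row] pairs.
sub unmatched_rows {
    my ($rows, $reference) = @_;

    # Index the reference rows once, with all whitespace removed.
    my %seen;
    for my $row (@$reference) {
        (my $key = $row) =~ s/\s+//g;
        $seen{$key} = 1;
    }

    # Look up each row; a miss means the row was not found in the reference.
    my @missing;
    my $line_no = 0;
    for my $row (@$rows) {
        $line_no++;
        (my $key = $row) =~ s/\s+//g;
        push @missing, [$line_no, $key] unless exists $seen{$key};
    }
    return @missing;
}

# Made-up sample data; real input would be read from the two files.
my @small = ("a | 1\n", "b|2\n", "c|3\n");
my @large = ("c|3\n", "a|1\n", "d|4\n", "e|5\n");

for my $miss (unmatched_rows(\@small, \@large)) {
    print "\nNot matched\nLine no: $miss->[0]\n$miss->[1]\n";
}
```

With the sample data, only the second row of the smaller list (b|2) is reported. The trade-off is memory: the hash holds one key per distinct row of the larger file, so this sketch suits cases where that file fits comfortably in RAM.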