Perl Programming Language Tutorial

Sunday, October 21, 2012

Huge text file comparator

Posted by Raju Gupta at 11:49 AM
 

You may want to compare two very large text files containing delimited data, with millions of rows that are unsorted and in random order. Excel, the obvious tool, fails here because a worksheet cannot hold more than 1,048,576 rows. Even with only thousands of rows, Excel can leave the system unresponsive, because the rows being compared are not in the same order in both files.

This piece of code lets the user compare File1 against File2 and list the differences. Which file is compared against which is decided by file size: the rows of the smaller file are looked up in the larger file, with the aim of improving efficiency. The output, displayed on the screen, is each entire row that mismatched (or was not found). The output can of course be redirected to a file using the UNIX redirection operator: > filename.txt


#!/usr/bin/perl
use strict;
use warnings;

# Prompts go to STDERR so they stay visible when the output is redirected.
print STDERR "File 1 : ";
chomp(my $file1 = <STDIN>);   # get the name of File 1
print STDERR "File 2 : ";
chomp(my $file2 = <STDIN>);   # get the name of File 2

open TESTFILE,  "<", $file1 or die "Cannot open File : $file1\nError : $!\n";   # abort if unable to read file 1
open TESTFILE2, "<", $file2 or die "Cannot open File : $file2\nError : $!\n";   # abort if unable to read file 2

# Choose what to compare against what depending upon the file size:
# the smaller file supplies the rows to check, the larger file is compared against.
my $file1_is_smaller = (-s $file1) < (-s $file2);
my @file_1_data = $file1_is_smaller ? <TESTFILE>  : <TESTFILE2>;
my @file_2_data = $file1_is_smaller ? <TESTFILE2> : <TESTFILE>;

s/\s+//g for @file_2_data;    # remove all white space (including newlines) once, up front

my $line_counter = 0;         # to keep track of the current line

foreach my $file_1_current_line (@file_1_data) {
    $file_1_current_line =~ s/\s+//g;   # remove all white space from the row being checked
    $line_counter++;                    # increment the line counter as it traverses
    my $matched_flag = 0;               # flag to indicate if a match was found for this row

    foreach my $file_2_line (@file_2_data) {   # do it for each row in the other file
        # Compare the rows literally, so that delimiters such as '|' or '.'
        # are not treated as regular expression metacharacters.
        if ($file_2_line eq $file_1_current_line) {
            $matched_flag = 1;          # match found; continue with the next row
            last;                       # similar to saying break;
        }
    }

    # display the row not matched
    print "\nNot matched\nLine no: $line_counter\n$file_1_current_line\n"
        unless $matched_flag;
}

close TESTFILE;    # close both the files
close TESTFILE2;
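
The nested loops above compare every row of one file with every row of the other, so the running time grows with the product of the two row counts. As a possible alternative (a sketch only, not the original author's method), the rows of the file being compared against can be stored as hash keys, so each row of the other file needs a single lookup. The helper name rows_missing_from and the use of command-line arguments instead of prompts are assumptions made for this illustration; the whitespace removal and the output format follow the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the rows of
# $check_file that do not occur anywhere in $reference_file. Rows are compared
# after removing all white space, as in the script above.
sub rows_missing_from {
    my ($check_file, $reference_file) = @_;

    # Remember every normalized row of the reference file as a hash key.
    my %seen;
    open my $ref_fh, '<', $reference_file
        or die "Cannot open File : $reference_file\nError : $!\n";
    while (my $row = <$ref_fh>) {
        $row =~ s/\s+//g;
        $seen{$row} = 1;
    }
    close $ref_fh;

    # Read the other file row by row and collect the rows never seen.
    my @missing;
    my $line_no = 0;
    open my $chk_fh, '<', $check_file
        or die "Cannot open File : $check_file\nError : $!\n";
    while (my $row = <$chk_fh>) {
        $line_no++;
        $row =~ s/\s+//g;
        push @missing, [$line_no, $row] unless exists $seen{$row};
    }
    close $chk_fh;

    return @missing;
}

# Run the comparison only when two file names are given on the command line.
if (@ARGV == 2) {
    for my $miss (rows_missing_from(@ARGV)) {
        my ($line_no, $row) = @$miss;
        print "\nNot matched\nLine no: $line_no\n$row\n";
    }
}
```

Only the reference file is held in memory here, and the file being checked is read one row at a time. The caller decides which file plays which role, so the choice by file size made in the script above would have to be done before calling the helper.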


Labels: Perl File Example
