Java code to search a file for lines matching a word / regular expression.

This tutorial shows how to write Java code that searches a file for lines that contain a given word/string or match a given regular expression, and prints those lines. I have written two programs. The first is a simple version that accepts a word and a file name from the command prompt, searches the file for lines containing that word, and prints them. Part of the code is as follows.

public static void findLine(String searchWord, String filename) throws IOException
 {
	int count = 0;
	// try-with-resources closes the reader even if reading fails
	try (BufferedReader br = new BufferedReader(new FileReader(filename)))
	{
		String line;
		while ((line = br.readLine()) != null)
		{
			// indexOf returns -1 when the word does not occur in the line
			if (line.indexOf(searchWord) != -1)
			{
				count++;
				System.out.println(line);
			}
		}
	}
	System.out.println(count + " matching line(s) found.");
 }
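The fragment above can be placed in a complete class to try it out. The following is only a minimal sketch: the class name WordSearch and the helper findLines are hypothetical names chosen for illustration, and the helper returns the matching lines instead of printing them so that the result is easy to check.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class, not part of the original program.
class WordSearch {

    // Returns every line of the file that contains searchWord as a substring.
    static List<String> findLines(String searchWord, String filename) throws IOException {
        List<String> matches = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains(searchWord)) {
                    matches.add(line);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Syntax : java WordSearch word file_name");
            return;
        }
        List<String> matches = findLines(args[0], args[1]);
        for (String line : matches) {
            System.out.println(line);
        }
        System.out.println(matches.size() + " matching line(s) found.");
    }
}
```

It could be run as java WordSearch USA city.txt once compiled.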

The second is a slightly more advanced version of the above program: our own implementation of the grep command from Linux/Unix. Before going to the code, let us go through the grep command. grep (global regular expression print) is a very useful Linux/Unix command that searches a given file for lines matching a given regular expression and prints those lines. Some simple examples of the grep command are as follows.

1. grep USA city.txt : prints the lines that contain the word (substring) USA

2. grep -i usa city.txt : the -i option tells grep to ignore case (matches both USA and usa)

3. grep -v usa city.txt : the -v option prints all lines that do not contain usa

4. grep -n java booklist.txt : prints each matching line with its line number

5. grep -c ibm companies.txt : prints the number of lines matched

6. grep "04[-./]11[-./]2011" k.txt : searches for the dates 04.11.2011, 04-11-2011, 04/11/2011 using a regular expression

7. grep -w java k.txt : matches java as a whole word; does not match myjava, sunjava
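The switches listed above map fairly directly onto java.util.regex. The sketch below is only an approximation of grep's behaviour, not a faithful port: the class GrepFlags and its methods build and selects are hypothetical names, -i is modelled with Pattern.CASE_INSENSITIVE, -w by wrapping the expression in \b word boundaries, and -v by inverting the result of find().

```java
import java.util.regex.Pattern;

// Hypothetical helper class that mimics a few grep switches.
class GrepFlags {

    // Builds a Pattern roughly equivalent to grep's -i and -w switches.
    static Pattern build(String regex, boolean ignoreCase, boolean wholeWord) {
        String expr = wholeWord ? "\\b(?:" + regex + ")\\b" : regex;
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(expr, flags);
    }

    // Decides whether a line would be printed; invert plays the role of -v.
    static boolean selects(Pattern p, String line, boolean invert) {
        return p.matcher(line).find() != invert;
    }

    public static void main(String[] args) {
        Pattern plain = build("USA", false, false);
        Pattern noCase = build("usa", true, false);
        Pattern word = build("java", false, true);
        System.out.println(selects(plain, "Boston, USA", false));   // true
        System.out.println(selects(noCase, "Boston, USA", false));  // true
        System.out.println(selects(noCase, "Boston, USA", true));   // false
        System.out.println(selects(word, "I like java", false));    // true
        System.out.println(selects(word, "myjava rocks", false));   // false
    }
}
```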

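I have implemented this functionality in Java: the i, v, n and c options plus regular expression matching. Whole-word matching is done through the regular expression itself rather than a separate w option. The program is given below.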

package com.javaonline;

import java.io.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
class JavaGrep
{
	public static void main(String[] args)
	{
		String searchWord = "";
		String fileName = "";
		String options = "";
		String optionsAllowed = "vnic"; // options allowed in the input
		if (args.length == 2)
		{
			searchWord = args[0];
			fileName = args[1];
		}
		else if (args.length == 3)
		{
			options = args[0]; // options may contain v, n, i, c
			searchWord = args[1];
			fileName = args[2];
		}
		else
		{
			exit();
		}

		if (options.length() > 4) // options may not be longer than 4 chars
		{
			exit();
		}

		char[] OA = options.toCharArray(); // options converted to char array
		for (int i = 0; i < OA.length; i++)
		{
			if (optionsAllowed.indexOf(OA[i]) == -1) // exits on any option other than v, n, i, c
			{
				System.out.println("Invalid options");
				exit();
			}
		}

		try
		{
			grep(options, searchWord, fileName); // calling grep method to search lines
		}
		catch (IOException io)
		{
			System.out.println("IO Error : " + io.getMessage());
		}
	}
	// grep method starts
	public static void grep(String options, String exp, String filename) throws IOException
	{
		int count = 0;     // lines that match
		int countNM = 0;   // lines that do not match
		int lineNo = 0;
		boolean invert = options.contains("v");
		boolean showLineNo = options.contains("n");
		// "i" turns on case-insensitive matching
		int flag = options.contains("i") ? Pattern.CASE_INSENSITIVE : 0;

		// compile the expression once, before reading the file
		Pattern pattern;
		try
		{
			pattern = Pattern.compile(exp, flag);
		}
		catch (PatternSyntaxException e)
		{
			System.out.println("Invalid regular expression : " + e.getDescription());
			return;
		}

		// try-with-resources closes the file even if reading fails
		try (BufferedReader brdr = new BufferedReader(new FileReader(filename)))
		{
			String line;
			// reading each line
			while ((line = brdr.readLine()) != null)
			{
				lineNo++;
				Matcher matcher = pattern.matcher(line);
				boolean matched = matcher.find();
				if (matched)
					count++;    // counting matching lines
				else
					countNM++;  // counting not matching lines

				// print matching lines, or the not matching ones when "v" is given
				if (matched != invert)
				{
					if (showLineNo)
						System.out.println(lineNo + " : " + line);
					else
						System.out.println(line);
				}
			}
		}

		// lines count for both matching & not matching
		if (options.contains("c"))
		{
			System.out.println("\n Word / Exp : " + exp);
			System.out.println(count + " line(s) matched.");
			System.out.println(countNM + " line(s) not matched.");
		}
	}

	public static void exit()
	{
		System.out.println("Syntax : java JavaGrep [options] regular_expression/word file_name");
		System.out.println("Options Allowed : i or n or c or v or any combination");
		System.exit(1); // non-zero status signals incorrect usage
	}
}
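One detail worth calling out in a program like this: a malformed expression, for example an unclosed [, makes Pattern.compile throw PatternSyntaxException, so it helps to validate the expression once before the file is read. The sketch below illustrates the idea. RegexCheck and compileOrNull are hypothetical names, and returning null on failure is just one possible convention.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, shown only to illustrate up-front validation.
class RegexCheck {

    // Returns the compiled pattern, or null when the expression is malformed.
    static Pattern compileOrNull(String exp, boolean ignoreCase) {
        try {
            return Pattern.compile(exp, ignoreCase ? Pattern.CASE_INSENSITIVE : 0);
        } catch (PatternSyntaxException e) {
            System.out.println("Invalid regular expression : " + e.getDescription());
            return null;
        }
    }

    public static void main(String[] args) {
        // A well-formed expression compiles; an unclosed character class does not.
        System.out.println(compileOrNull("04[-./]11[-./]2011", false) != null);
        System.out.println(compileOrNull("04[-./", false) != null);
    }
}
```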

The program accepts the following inputs from the command prompt:

1. Options (optional): any combination of i, c, n, v

2. Word / Regular Expression

3. File Name

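The syntax for running the program is:

java JavaGrep [options] word/regexp filename

Since the class is declared in the package com.javaonline, either remove the package line before compiling or run it by its fully qualified name (java com.javaonline.JavaGrep ...) from the directory that contains the com folder.

The options are i for ignoring case, n for printing line numbers, v for printing the lines that do not contain the given word, and c for printing the counts of matching and non-matching lines.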

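eg.1 java JavaGrep "04[-./]11[-./]2011" k.txt : searches for the dates 04.11.2011, 04-11-2011, 04/11/2011 and prints the matching lines

eg.2 java JavaGrep ivnc servlet cutomer.rtf : combines all the options: ignore case, print the lines not containing the given word, print line numbers, and print the line counts

eg.3 java JavaGrep "\bjava\b" k.txt : matches java as a whole word (myjava, sunjava do not match). The \b is needed on both sides, because java\b alone only rules out a word character after java and would still match myjava.

On Linux/Unix shells, keep the expression in quotes so that the shell does not interpret the brackets or backslashes.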
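The date expression from eg.1 can be checked directly with java.util.regex. This is only a small sketch with hypothetical names (DateRegexDemo, hasDate). It shows that the character class [-./] accepts any one of the three separators at each position, so a mixed form such as 04-11/2011 also matches.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the date expression used in eg.1.
class DateRegexDemo {

    // Inside [...] the characters -, . and / are all taken literally here.
    private static final Pattern DATE = Pattern.compile("04[-./]11[-./]2011");

    // True when the line contains the date with any allowed separators.
    static boolean hasDate(String line) {
        return DATE.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDate("Invoice dated 04.11.2011")); // true
        System.out.println(hasDate("Due on 04/11/2011"));        // true
        System.out.println(hasDate("Mixed 04-11/2011"));         // true
        System.out.println(hasDate("Spaces 04 11 2011"));        // false
    }
}
```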
