Java Based Search Engine

Java Based Search Engine

ACKNOWLEDGMENT I express thanks and gratitude to Mr. …………………………………………H. O. D computer science department, ……………………………………………………College for his encouraging support and guidance in carrying out the project. I would like to express gratitude and indebtedness to Mr……………………………………, for his valuable advice and guidance without which this project would not have seen the light of the day. I thank Mr………………………………… , project guide , GSS for his insistence on good programming technique which helped us to design and develop a successful model of an Chatting Tool. Name CONTENTS 1.

STUDY & ANALYSIS PHASE 1. 1 INTRODUCTION 1. 1. 1 PURPOSE OF THE PROJECT 1. 1. 2 PROBLEM IN EXISTING SYSTEM 1. 1. 3 SOLUTION OF THESE PROBLEMS 1. 1. 4 SCOPE OF THE PROJECT 1. 1. 5 LIMITATIONS 1. 1. 6 HARDWARE & SOFTWARE SPECIFICATIONS 1. 1. 7 ORGANISATION PROFILE 2. PROJECT ANALYSIS 1. STUDY OF THE SYSTEM 2. INPUT & OUTPUT 3. PROCESS MODULES USED WITH JUSTIFICATION 3. DESIGN PHASE 1. DATAFLOW 2. UML DIAGRAMS 4. IMPLEMENTATION PHASE 5. TESTING 1. TYPES OF TESTING 1. COMPILING TEST 2. EXECUTION TEST 3. OUTPUT TEST 5. PROJECT CODING 6. OUTPUT SCREENS 7. CONCLUSION 8. BIBLOGRAPHY

STUDY PHASE INTRODUCTION: EXISTING SYSTEM With the advent of the Internet in the past decade, searching for information in various formats has been redefined by the internet search engines, most of them being based on information retrieval (IR) indexing techniques. IR-based searching, which usually allows formulation of queries with multiple words PROPOSED SYSTEM This Java application is basically a recursive file finder. You can search for files based on their filename, their contents, or both in selected directory and as well as sub directories of the specified directory.

Engine is a graphical version of the well-known GREP utility, with an additional feature of traversing subdirectories. You can specify which directory to start looking in, which files to search through, and what pattern to look for in the files. The various fields expect regular expressions, like Global Regular Expression, ‘Engine’ has not only the graphical interface, but also a command-line interface. This is useful for quick searches through, say, a development tree. Engine requires the Java 2 Platform, Standard Edition version 1. 4 or higher. SOLUTION OF THESE PROBLEMS

Regular expressions figure into all kinds of text-manipulation tasks. Searching and search-and-replace are among the more common uses, but regular expressions can also be used to test for certain conditions in a text file or data stream. You might use regular expressions, for example, as the basis for a short program that separates incoming mail from incoming spam. In this case, the program might use a regular expression to determine whether the name of a known spammer appeared in the “From:” line of the email. Email filtering programs, in fact, very often use regular expressions for exactly this type of operation. SCOPE OF THE PROJECT

Engine is a graphical version of the well-known GREP utility, with an additional feature of traversing subdirectories. The main aim of this project is to develop a java based recursive file finder. Engine is a graphical version of the well-known GREP utility, with an additional feature of traversing subdirectories LIMITATIONS Regular expressions tend to be easier to write than they are to read. This is less of a problem if you are the only one who ever needs to maintain the program (or sed routine, or shell script, or what have you), but if several people need to watch over it, the syntax can turn into more of a hindrance than an aid.

Ordinary macros (in particular, editable macros such as those generated by the major word processors and editors) tend not to be as fast, as flexible, as portable, as concise, or as fault-tolerant as regular expressions, but they have the advantage of being much more readable; even people with no programming background whatsoever can usually make enough sense of a macro script to change it if the need arises. For some jobs, such readability will outweigh all other concerns.

As with all things in computing, it’s largely a question of fitting the tool to the job. HARDWARE & SOFTWARE SPECIFICATIONS Environment: Java Runtime Environment version 1. 4 or better installed. Operating System: Any O. S. compatable with JVM Hard disk:10 GB Processor:PIII or higher ORGANIZATION PROFILE EMINENT SOFTWARE SOLUTIONS EMINENT TECHNOLOGIES (ET) is an IT Solution Provider for a dynamic environment where business and technology strategies converge.

Our approach focuses on new ways of business combining IT innovation and adoption while also leveraging an organization’s current IT assets. We work with large global corporations and new generation technology companies – to build new products or services and to implement prudent business and technology strategies in today’s environment. EMINENT’s range of expertise includes: Software Development Services Engineering Services Systems Integration Customer Relationship Management Supply Chain Management Product Development Electronic Commerce

Consulting IT Outsourcing We apply technology with innovation and responsibility to achieve two broad objectives: Effectively address the business issues our customers face today Generate new opportunities that will help them stay ahead in the future This approach rests on: A strategy where we Architect, Integrate and Manage technology services and solutions — we call it AIM for success. A robust offshore development methodology and reduced demand on customer resources A focus on the use of reusable frameworks to provide cost and time benefits

We combine the best people, processes and technology to achieve excellent results consistently. We offer customers the advantages of: Speed: We understand the importance of timing, of getting there before the competition. A rich portfolio of reusable, modular frameworks helps jump start projects. Tried and tested methodology ensures that we follow a predictable, low-risk path to achieve results. Our track record is testimony to complex projects delivered within and even before schedule. Expertise: Our teams combine cutting edge technology skills with rich domain expertise.

What’s equally important — we share a strong customer orientation that means we actually start by listening to the customer. We’re focused on coming up with solutions that serve customer requirements today and anticipate future needs. A Full Service Portfolio: We offer customers the advantage of being able to Architect, Integrate and Manage technology services. This means that they can rely on one, fully accountable source instead of trying to integrate disparate multi-vendor solutions. Services: GSS is providing its services to Sain medicaments Pvt.

Ltd, Grace drugs and pharmaceuticals Private Limited Alka Drugs and Pharmaceuticals Pvt Ltd to name just a few with out rich experience and expertise in Information Technology we are in the best position to provide software solutions to distinct business requirements. PROJECT ANALYSIS STUDY OF THE SYSTEM This application can be mainly divided into two modules:- • User Interface • File Manipulation and filtering INPUT AND OUTPUT: User has to input the File Name or File Contents and he has to select the Drive or Directory in which searching has to be done.

Application will search for the files matching with given criteria and shows the output in the text area of the application. PROCESS MODEL USED WITH JUSTIFICTION The model used here is a SPIRAL MODEL. This Model demands a direct consideration of technical risk at all stages of the project and if properly applied it reduces risk before they become problematic, hence it becomes easier to handle a project when using this kind of model where in the end user can evaluate the program at the end of each stage and suggest modification if required.

PROJECT DESIGN DATA FLOW DIAGRAM 0th Level 1st Level 2nd Level 2nd Level 3rd Level UML DIAGRAMS USE CASE DIAGRAM [pic] CLASS DIAGRAM [pic] ACTIVITY DIAGRAM [pic] SEQUENCE DIAGRAM [pic] IMPLEMENTATION PHASE MODULES: 1. COMPONENT THIS IS THE FRONT END CONSOLE OF THE USER TO INTERACT WITH SEARCH ANALYZER. 2. TOOLBAR COORDINATIOR THIS MODULE HAS THE EMBEDED TOOLS WHICH ARE TO IMPLEMENT SEARCH. 3. MAP FINDER THIS MODULE IS MAPS THE SEARCH CORRESPONDING TO THE USERS QUERY. 4. DYNAMIC STATUS FINDER

THIS IS THE DYNAMIC MODULE WHICH GIVES THE STATUS OF THE SEARCH . 5. GREP GENERATOR GREP GENERATOR IS THE MAJOR MODULE ON WHICH THE SEARCH IS DONE BASED ON REGULAR EXPRESSIONS . OUTPUT SCREENS Engine view [pic] Browse [pic] Search [pic] Help [pic] Error [pic] PROJECT CODING CODE EXPLANATION Regular expressions simplify pattern-matching code Discover the elegance of regular expressions in text-processing scenarios that involve pattern matching Text processing often involves matching text against a pattern.

Although Java’s character and assorted string classes offer low-level pattern-matching support, that support commonly leads to complex code. To help you write simpler pattern-matching code, Java provides regular expressions with java. util. regex package. Text processing frequently requires code to match text against patterns. That capability makes possible text searches, email header validation, custom text creation from generic text (e. g. , “Dear Mr. Chakradhar” instead of “Dear Customer”), and so on. Java supports pattern matching via its character and assorted string classes.

Because that low-level support commonly leads to complex pattern-matching code, Java also offers regular expressions to help you write simpler code. After introducing regular expression terminology, the java. util. regex package’s classes, and a program that demonstrates regular expression constructs, I explore many of the regular expression constructs that the Pattern class supports. I also examine the methods comprising Pattern and other java. util. regex classes. A practical application of regular expressions concludes my discussion.

Regular expressions’ long history begins in the theoretical computer science fields of automata theory and formal language theory. That history continues to Unix and other operating systems, where regular expressions are often used in Unix and Unix-like utilities: examples include awk (a programming language that enables sophisticated text analysis and manipulation-named after its creators, Aho, Weinberger, and Kernighan), emacs (a developer’s editor), and grep (a program that matches regular expressions in one or more text files and stands for global regular expression print).

Regular expressions trace back to the work of an American mathematician by the name of Stephen Kleene (one of the most influential figures in the development of theoretical computer science) who developed regular expressions as a notation for describing what he called “the algebra of regular sets. ” His work eventually found its way into some early efforts with computational search algorithms, and from there to some of the earliest text-manipulation tools on the Unix platform (including ed and grep). In the context of computer searches, the “*” is formally known as a “Kleene star. “

A regular expression, also known as a regex or regexp, is a string whose pattern (template) describes a set of strings. The pattern determines what strings belong to the set, and consists of literal characters and meta characters, characters that have special meaning instead of a literal meaning. The process of searching text to identify matches—strings that match a regex’s pattern—is pattern matching. Java’s java. util. regex package supports pattern matching via its Pattern, Matcher, and PatternSyntaxException classes: 1. Pattern objects, also known as patterns, are compiled regexes 2.

Matcher objects, or matchers, are engines that interpret patterns to locate matches in character sequences, objects whose classes implement the java. lang. CharSequence interface and serve as text sources 3. PatternSyntaxException objects describe illegal regex patterns Code for mainwindow package jog. engine; import java. awt. *; import java. awt. event. *; import java. io. *; import java. net. URL; import java. util. prefs. *; import java. util. regex. *; import javax. swing. *; import javax. swing. filechooser. FileFilter; import jog. engine. *; public class MainWindow extends JFrame implements ActionListener, FileSearchListener { rotected Preferences preferences; protected JTextField lookInField; protected JTextField filePatternField; protected JTextField searchForField; protected JTextField excludeField; protected JCheckBox includeSubCheckBox; protected JList resultList; protected RunSearch runner; protected JButton browseButton; protected JButton helpButton; protected JButton startButton; protected JButton stopButton; protected JButton closeButton; protected JLabel status; protected JPanel cardPanel; protected CardLayout cardLayout; private boolean stopFlag; public MainWindow() { super(Bundle. getString(“AppTitle”)); addWindowListener(new WindowAdapter() public void windowClosing(WindowEvent event) { handleClose(); } }); preferences = Preferences. userRoot(). node(“com/bluemarsh/jrgrep”); Container pane = getContentPane(); GridBagLayout gb = new GridBagLayout(); pane. setLayout(gb); GridBagConstraints gc = new GridBagConstraints(); gc. insets = new Insets(3, 3, 3, 3); JLabel label = new JLabel(Bundle. getString(“lookInLabel”)); gc. anchor = GridBagConstraints. EAST; gb. setConstraints(label, gc); pane. add(label); String s = preferences. get(“lookIn”, “”); lookInField = new JTextField(s, 20); gc. anchor = GridBagConstraints. WEST; gc. fill = GridBagConstraints.

HORIZONTAL; gc. weightx = 1. 0; gb. setConstraints(lookInField, gc); pane. add(lookInField); browseButton = new JButton(Bundle. getString(“browseLabel”)); browseButton. addActionListener(this); gc. anchor = GridBagConstraints. CENTER; gc. gridwidth = GridBagConstraints. REMAINDER; gc. fill = GridBagConstraints. NONE; gc. weightx = 0. 0; gb. setConstraints(browseButton, gc); pane. add(browseButton); label = new JLabel(Bundle. getString(“filePatternLabel”)); gc. anchor = GridBagConstraints. EAST; gc. gridwidth = 1; gb. setConstraints(label, gc); pane. add(label); s = preferences. get(“filter”, “”); ilePatternField = new JTextField(s, 20); gc. anchor = GridBagConstraints. WEST; gc. fill = GridBagConstraints. HORIZONTAL; gc. gridwidth = GridBagConstraints. RELATIVE; gc. weightx = 1. 0; gb. setConstraints(filePatternField, gc); pane. add(filePatternField); Component glue = Box. createGlue(); gc. anchor = GridBagConstraints. CENTER; gc. fill = GridBagConstraints. NONE; gc. gridwidth = GridBagConstraints. REMAINDER; gc. weightx = 0. 0; gb. setConstraints(glue, gc); pane. add(glue); label = new JLabel(Bundle. getString(“searchForLabel”)); gc. anchor = GridBagConstraints. EAST; gc. fill = GridBagConstraints.

NONE; gc. gridwidth = 1; gb. setConstraints(label, gc); pane. add(label); s = preferences. get(“searchFor”, “”); searchForField = new JTextField(s, 20); gc. anchor = GridBagConstraints. WEST; gc. fill = GridBagConstraints. HORIZONTAL; gc. gridwidth = GridBagConstraints. RELATIVE; gc. weightx = 1. 0; gb. setConstraints(searchForField, gc); pane. add(searchForField); helpButton = new JButton(Bundle. getString(“helpLabel”)); helpButton. addActionListener(this); gc. anchor = GridBagConstraints. CENTER; gc. fill = GridBagConstraints. NONE; gc. gridwidth = GridBagConstraints. REMAINDER; gc. weightx = 0. 0; gb. etConstraints(helpButton, gc); pane. add(helpButton); label = new JLabel(Bundle. getString(“excludeLabel”)); gc. anchor = GridBagConstraints. EAST; gc. gridwidth = 1; gb. setConstraints(label, gc); pane. add(label); s = preferences. get(“exclude”, “”); excludeField = new JTextField(s, 20); gc. anchor = GridBagConstraints. WEST; gc. fill = GridBagConstraints. HORIZONTAL; gc. gridwidth = GridBagConstraints. RELATIVE; gc. weightx = 1. 0; gb. setConstraints(excludeField, gc); pane. add(excludeField); glue = Box. createGlue(); gc. anchor = GridBagConstraints. CENTER; gc. fill = GridBagConstraints. NONE; gc. ridwidth = GridBagConstraints. REMAINDER; gc. weightx = 0. 0; gb. setConstraints(glue, gc); pane. add(glue); includeSubCheckBox = new JCheckBox( Bundle. getString(“includeSubDirLabel”), true); includeSubCheckBox. setSelected(preferences. getBoolean( “recurse”, true)); gc. anchor = GridBagConstraints. WEST; gb. setConstraints(includeSubCheckBox, gc); pane. add(includeSubCheckBox); startButton = new JButton( Bundle. getString(“startSearchLabel”)); startButton. addActionListener(this); gc. gridwidth = 1; gb. setConstraints(startButton, gc); pane. add(startButton); status=new JLabel(“”); Color c=new Color(240,100,100); tatus. setForeground(c); gc. gridwidth = 1; gb. setConstraints(status, gc); pane. add(status); stopButton = new JButton(Bundle. getString(“stopLabel”)); stopButton. setEnabled(false); stopButton. addActionListener(this); gc. anchor = GridBagConstraints. CENTER; gc. gridwidth = GridBagConstraints. RELATIVE; gb. setConstraints(stopButton, gc); pane. add(stopButton); closeButton = new JButton(Bundle. getString(“closeLabel”)); closeButton. addActionListener(this); gc. gridwidth = GridBagConstraints. REMAINDER; gb. setConstraints(closeButton, gc); pane. add(closeButton); resultList = new JList(new ResultsListModel());

JScrollPane scroller = new JScrollPane(resultList); cardPanel = new JPanel(new CardLayout()); cardLayout = (CardLayout) cardPanel. getLayout(); cardPanel. add(scroller, “list”); gc. gridwidth = GridBagConstraints. REMAINDER; gc. gridheight = GridBagConstraints. REMAINDER; gc. fill = GridBagConstraints. BOTH; gc. weightx = 1. 0; gc. weighty = 1. 0; gb. setConstraints(cardPanel, gc); pane. add(cardPanel); int width = preferences. getInt(“windowWidth”, 0); int height = preferences. getInt(“windowHeight”, 0); if (width == 0 && height == 0) { pack(); } else { setSize(width, height); } int top = preferences. getInt(“windowTop”, 100); nt left = preferences. getInt(“windowLeft”, 100); setLocation(left, top); stopFlag=false; } public void actionPerformed(ActionEvent event) { JButton button = (JButton) event. getSource(); //System. out. println(“action “+button); if (button == closeButton) { handleClose(); } else if (button == browseButton) { handleBrowse(); } else if (button == startButton) { status. setText(“Search Started”); //System. out. println(“Search started”); startSearch(); } else if (button == helpButton) { displayHelp(); } else if (button == stopButton) { //System. out. println(“Search stopped”); stopButton. setEnabled(false); runner. stop(); stopFlag=true; /int count=((ListModel)resultList. getModel()). getSize(); //status. setText(“Search Stopped “+count+” Files Found”); } } protected void displayHelp() { if (cardPanel. getComponentCount() < 2) { URL helpUrl = Bundle. getResource(“helpFile”); try { JEditorPane editor = new JEditorPane(helpUrl); editor. setEditable(false); JScrollPane scroller = new JScrollPane(editor); cardPanel. add(scroller, “help”); } catch (IOException ioe) { searchFailed(ioe); return; } } cardLayout. show(cardPanel, “help”); } public void fileFound(FileFoundEvent event) { ResultsListModel model = (ResultsListModel) resultList. etModel(); model. addElement(event. getFile()); } protected void handleBrowse() { String dirStr = lookInField. getText(); JFileChooser fc; if (dirStr. equals(“”)) { String lastdir = preferences. get(“lastdir”, null); if (lastdir == null || lastdir. length() == 0) { lastdir = System. getProperty(“user. dir”); } fc = new JFileChooser(lastdir); } else { fc = new JFileChooser(dirStr); } fc. setFileSelectionMode(JFileChooser. DIRECTORIES_ONLY); if (fc. showOpenDialog(this) ! = JFileChooser. CANCEL_OPTION) { File dir = fc. getSelectedFile(); String path = dir. getPath(); lookInField. setText(path); references. put(“lastdir”, path); } } protected void handleClose() { preferences. putInt(“windowTop”, getY()); preferences. putInt(“windowLeft”, getX()); preferences. putInt(“windowWidth”, getWidth()); preferences. putInt(“windowHeight”, getHeight()); preferences. put(“lookIn”, lookInField. getText()); preferences. put(“filter”, filePatternField. getText()); preferences. put(“searchFor”, searchForField. getText()); preferences. put(“exclude”, excludeField. getText()); preferences. putBoolean(“recurse”, includeSubCheckBox. isSelected()); System. exit(0); } public void searchComplete() { stopButton. etEnabled(false); //status. setText(“Search completed”); int count=((ListModel)resultList. getModel()). getSize(); if(count==0){ status. setText(“Search Completed No Files Found”); } if(stopFlag){ status. setText(“Search Stopped “+count +” Files Found”); stopFlag=false; }else{ status. setText(“Search Completed “+count +” Files Found”); } } public void searchFailed(Throwable t) { Object[] messages = {Bundle. getString(“exceptionOccurred”),t. getMessage()}; JOptionPane. showMessageDialog(this, messages, Bundle. getString(“errorTitle”), JOptionPane. ERROR_MESSAGE); } protected void startSearch() { cardLayout. how(cardPanel, “list”); ResultsListModel model = (ResultsListModel) resultList. getModel(); model. clear(); String dirStr = lookInField. getText(); if (dirStr == null || dirStr. length() == 0) { dirStr = “. “; } File dir = new File(dirStr); if (! dir. exists()) { JOptionPane. showMessageDialog(this, Bundle. getString(“pathDoesNotExist”), Bundle. getString(“errorTitle”), JOptionPane. ERROR_MESSAGE); return; } String target = searchForField. getText(); String filter = filePatternField. getText(); String exclude = excludeField. getText(); try { Pattern. compile(target); Pattern. compile(filter); Pattern. ompile(exclude); } catch (PatternSyntaxException pse) { Object[] messages = {Bundle. getString(“invalidRegexPattern”), pse. getMessage()}; JOptionPane. showMessageDialog(this, messages, Bundle. getString(“errorTitle”),JOptionPane. ERROR_MESSAGE); return; } if (runner == null) { runner = new RunSearch(this); } runner. search(dir, target, filter, includeSubCheckBox. isSelected(), exclude); Thread th = new Thread(runner); th. start(); stopButton. setEnabled(true); } } Code for bundle package jog. engine; import java. net. URL; import java. util. MissingResourceException; import java. til. ResourceBundle; public class Bundle { private static ResourceBundle resourceBundle; static { resourceBundle = ResourceBundle. getBundle(Bundle. class. getName()); } public static ResourceBundle getBundle() { return resourceBundle; } public static URL getResource(String key) { String name = getString(key); return name == null ? null : Bundle. class. getResource(name); } public static String getString(String key) { try { return resourceBundle. getString(key); } catch (MissingResourceException mre) { return null; } } } Code for runsearch package jog. engine; import java. io. File; import jog. engine. *; mport java. lang. *; class RunSearch implements Runnable { protected File dir; protected String lookFor; protected String filter; protected boolean subDirs; protected String exclude; protected FileSearchListener listener; protected Searcher searcher; public RunSearch(FileSearchListener listener) { this. listener = listener; } public void run() { if (searcher == null) { searcher = new Searcher(); if (listener ! = null) { searcher. addSearchListener(listener); } } searcher. search(dir, lookFor, filter, subDirs, exclude); } public void search(File dir, String lookFor, String filter, boolean subDirs, String exclude) this. dir = dir; this. lookFor = lookFor; this. filter = filter; this. subDirs = subDirs; this. exclude = exclude; } public void stop() { //System. out. println(“Searcher is one:”+ isAlive()); System. out. println(“Searcher”); if (searcher ! = null) { searcher. stopSearching(); searcher=null; System. out. println(“Searcher one”); } //System. out. println(“Searcher is :”+ isAlive()); //System. out. println(“Searcher one”); } } Code for searcher package jog. engine; import java. io. *; import java. nio. *; import java. nio. channels. *; import java. nio. charset. *; import java. util. *; import java. til. regex. *; import javax. swing. event. EventListenerList; import jog. engine. *; class Searcher { protected static Pattern linePattern; protected static Charset charset; protected static CharsetDecoder decoder; protected EventListenerList searchListeners; protected volatile boolean stopSearch; protected Pattern targetPattern; protected Matcher targetMatcher; protected Pattern filterPattern; protected Matcher filterMatcher; protected Pattern excludePattern; protected Matcher excludeMatcher; static { try { linePattern = Pattern. compile(“. *
?
“); } catch (PatternSyntaxException pse) { System. out. rintln(“Ye flipping gods! “); } charset = Charset. forName(“ISO-8859-1”); decoder = charset. newDecoder(); } public Searcher() { searchListeners = new EventListenerList(); } public void addSearchListener(FileSearchListener listener) { searchListeners. add(FileSearchListener. class, listener); } protected void fireDone() { if (searchListeners == null) { return; } Object[] listeners = searchListeners. getListenerList(); for (int i = listeners. length – 2; i >= 0; i -= 2) { if (listeners[i] == FileSearchListener. class) { FileSearchListener fsl = (FileSearchListener) listeners[i + 1]; fsl. earchComplete(); } } } protected void fireError(Throwable t) { if (searchListeners == null) { return; } Object[] listeners = searchListeners. getListenerList(); for (int i = listeners. length – 2; i >= 0; i -= 2) { if (listeners[i] == FileSearchListener. class) { FileSearchListener fsl = (FileSearchListener) listeners[i + 1]; fsl. searchFailed(t); } } } protected void fireFound(String match) { if (searchListeners == null) { return; } FileFoundEvent event = new FileFoundEvent(this, match); Object[] listeners = searchListeners. getListenerList(); for (int i = listeners. ength – 2; i >= 0; i -= 2) { if (listeners[i] == FileSearchListener. class) { FileSearchListener fsl = (FileSearchListener) listeners[i + 1]; fsl. fileFound(event); } } event = null; } public void removeSearchListener(FileSearchListener listener) { searchListeners. remove(FileSearchListener. class, listener); } public void search(File startIn, String target, String filter, boolean recurse, String exclude) { stopSearch = false; try { targetPattern = Pattern. compile(target); filterPattern = Pattern. compile(filter); if (exclude ! = null && exclude. length() > 0) { excludePattern = Pattern. ompile(exclude); } searchLow(startIn, recurse); } catch (IOException ioe) { fireError(ioe); } catch (PatternSyntaxException pse) { fireError(pse); } targetPattern = null; targetMatcher = null; filterPattern = null; filterMatcher = null; excludePattern = null; excludeMatcher = null; fireDone(); } protected void searchLow(File startIn, boolean recurse) throws IOException { String[] files = startIn. list(); if (files == null) { return; } for (int ii = 0; ii < files. length; ii++) { if (stopSearch) { break; } File file = new File(startIn, files[ii]); if (file. isFile() && file. canRead()) { String filename = file. etCanonicalPath(); if (filterMatcher == null) { filterMatcher = filterPattern. matcher(filename); } else { filterMatcher. reset(filename); } if (! filterMatcher. find()) { continue; } FileInputStream fis = new FileInputStream(file); FileChannel fc = fis. getChannel(); MappedByteBuffer bb = fc. map(FileChannel. MapMode. READ_ONLY, 0, fc. size()); CharBuffer cb = decoder. decode(bb); boolean matchFound = false; if ((targetPattern. flags() & Pattern. DOTALL) ! = 0) { if (targetMatcher == null) { targetMatcher = targetPattern. matcher(cb); } else { argetMatcher. reset(cb); } if (targetMatcher. find()) { matchFound = true; } } else { Matcher lm = linePattern. matcher(cb); while (lm. find()) { CharSequence cs = lm. group(); if (targetMatcher == null) { targetMatcher = targetPattern. matcher(cs); } else { targetMatcher. reset(cs); } if (targetMatcher. find()) { matchFound = true; } if (lm. end() == cb. limit()) { break; } } } if (matchFound) { fireFound(filename); } } else if (recurse && file. isDirectory()) { String dirname = file. getName(); if (excludePattern ! = null) { if (excludeMatcher == null) { excludeMatcher = excludePattern. atcher(dirname); } else { excludeMatcher. reset(dirname); } if (! excludeMatcher. find()) { searchLow(file, recurse); } } else { searchLow(file, recurse); } } } } public void stopSearching() { stopSearch = true; } } Code for filesearch listener package jog. engine; import java. util. EventListener; import jog. engine. FileFoundEvent; interface FileSearchListener extends EventListener { public void fileFound(FileFoundEvent event); public void searchComplete(); public void searchFailed(Throwable t); } Code for tty package jog. engine; import java. io. *; import java. util. regex. *; mport jog. engine. *; public class tty implements FileSearchListener { protected static int argIndex; protected static String excludeStr = “”; protected static String nameStr = “”; protected static void displayHelp() { String str = Bundle. getString(“ttyHelp1”); int i = 1; while (str ! = null) { System. out. println(str); i++; str = Bundle. getString(“ttyHelp” + i); } } public void fileFound(FileFoundEvent event) { System. out. println(event. getFile()); } protected static boolean processArgs(String[] args) { while (argIndex < args. length) { String arg = args[argIndex]; if (arg. equals(“-exclude”)) { rgIndex++; excludeStr = args[argIndex]; } else if (arg. equals(“-h”) || arg. equals(“-help”) || arg. equals(“–help”)) { displayHelp(); return false; } else if (arg. equals(“-name”)) { argIndex++; nameStr = args[argIndex]; } else { break; } argIndex++; } return true; } public void searchComplete() {} public void searchFailed(Throwable t) { System. err. println(Bundle. getString(“exceptionOccurred”)); System. err. println(t. getMessage()); } public static void main(String[] args) { try { if (! processArgs(args)) { return; } } catch (ArrayIndexOutOfBoundsException aioobe) { System. err. println(Bundle. etString(“ttyMissingArguments”)); return; } if (argIndex == args. length) { System. err. println(Bundle. getString(“ttyMissingRequired”)); return; } String target = args[argIndex]; argIndex++; String dirStr = null; if (argIndex == args. length) { dirStr = “. “; } else { dirStr = args[argIndex]; } File dir = new File(dirStr); if (! dir. exists()) { System. err. println(Bundle. getString(“pathDoesNotExist”)); return; } try { Pattern. compile(target); Pattern. compile(nameStr); } catch (PatternSyntaxException pse) { System. err. println(Bundle. getString(“invalidRegexPattern”)); System. rr. println(pse. getMessage()); return; } Searcher searcher = new Searcher(); tty instance = new tty(); searcher. addSearchListener(instance); searcher. search(dir, target, nameStr, true, excludeStr); searcher. removeSearchListener(instance); } } Code for filefoundevent package jog. engine; import java. util. EventObject; class FileFoundEvent extends EventObject { protected String file; public FileFoundEvent(Object source, String file) { super(source); this. file = file; } public String getFile() { return file; } } Code for result list model package jog. engine; import java. util. Vector; import javax. wing. AbstractListModel; import jog. engine. *; class ResultsListModel extends AbstractListModel { protected Vector listData; public ResultsListModel() { listData = new Vector(); } public void addElement(Object o) { listData. add(o); fireIntervalAdded(this, listData. size(), listData. size()); } public void clear() { int size = listData. size(); listData. clear(); fireIntervalRemoved(this, 0, size); } public Object getElementAt(int i) { try { return listData. elementAt(i); } catch (ArrayIndexOutOfBoundsException e) { return null; } } public int getSize() { return listData. size(); } } Code for main package jog. ngine; import jog. engine. *; public class Main { public static void main(String[] args) { new MainWindow(). show(); // new jog. searchdesk. MainWindow. show(); } } For a tool with full regex support, metacharacters like “*” and “? ” (or “wildcard operators,” as they are sometimes called) are only the tip of the iceberg. Using a good regex engine and a well-crafted regular expression, one can easily search through a text file (or a hundred text files) searching for words that have the suffix “. html” (but only if the word begins with a capital letter and occurs at the beginning of the line), replace the . tml suffix with a . sgml suffix, and then change all the lower case characters to upper case. With the right tools, this series of regular expressions would do just that: s/(^[A_Z]{1})([a-z]+). sgml/12. html/g tr/a-z/A-Z/ As you might guess from this example, concision is everything when it comes to crafting regular expressions, and while this syntax won’t win any beauty prizes, it follows a logical and fairly standardized format which you can learn to rea+*d and write easily with just a little bit of practice. In a regular expression, everything is a generalized pattern.

If I type the word “serendipitous” into my editor, I’ve created one instance of the word “serendipitous. ” If, however, I indicate to my tool (or compiler, or editor, or what have you) that I’m now typing a regular expression, I am in effect creating a template that matches all instances of the characters “s,” “e,” “r,” “e,” “n,” “d,” “i,” “p,” “i,” “t,” “o,” “u,” and “s” all in a row. The standard way to find “serendipitous” (the word) in a file is to use “serendipitous” (the regular expression) with a tool like egrep (or extended grep): $ egrep “serendipitous” foobar ;hits

This line, as you might guess, asks egrep to find instances of the pattern “serendipitous” in the file “foobar” and write the results to a file called “hits”. In the preceding examples, we have been using regular expressions that adhere to the first rule of regular expressions: namely, that all alphanumeric characters match themselves. There are other characters, however, that match in a more generalized fashion. These are usually referred to as the meta characters. Some meta characters match single characters. This includes the following symbols: . Matches any one character …]Matches any character listed between the brackets [^…]Matches any character except those listed between the brackets Suppose we have a number of filenames listed out in a file called “Important. files. ” We want to “grep out” those filenames which follow the pattern “blurfle1”, “blurfle2”, “blurfle3,” and so on, but exclude files of the form “1blurfle”, “2blurfle”, “3blurfle” The following regex would do the trick: $ egrep “blurfle. ” Important. files ;blurfles The important thing to realize here is that this line will not match merely the string “blurfle. (that is, “blurfle” followed by a period). In a regular expression, the dot is a reserved symbol (we’ll get to matching periods a little further on). This is fine if we aren’t particular about the character we match (whether it’s a “1,” a “2,” or even a letter, a space, or an underscore). Narrowing the field of choices for a single character match, however, requires that we use a character class. Character classes match any character listed within that class and are separated off using square brackets.

So, for example, if we wanted to match on “blurfle” but only when it is followed immediately by a number (including “blurfle1” but not “blurflez”) we would use something like this: $ egrep “blurfle[0123456789]” Important. files >blurfles The syntax here is exactly as it seems: “Find ‘blurfle’ followed by a zero, a one, a two, a three, a four, a five, a six, a seven, an eight, or a nine. ” Such classes are usually abbreviated using the range operator (“-“): $ egrep “blurfle[0-9]” Important. files >blurfles The following regex would find “blurfle” followed by any alphanumeric character (upper or lower case). egrep “blurfle[0-9A-Za-z]” Important. files >blurfles (Notice that we didn’t write blurfle[0-9 A-Z a-z] for that last one. The spaces might make it easier to read, but we’d be matching on anything between zero and nine, anything between a and z, anything between A and Z, or a space. ) A carat at the beginning of the character class negates that class. In other words, if you wanted to find all instances of blurfle except those which end in a number, you’d use the following: $ egrep “blurfle[^0-9]” Important. files >blurfles Many regex implementations have “macros” for various character classes.

In Perl, for example, d matches any digit ([0-9]) and w matches any “word character” ([a-zA-Z0-9_]). Grep uses a slightly different notation for the same thing: [:digit:] for digits and [:alnum:] for alphanumeric characters. The man page (or other documentation) for the particular tool should list all the regex macros available for that tool. Quantifiers The regular expression syntax also provides metacharacters which specify the number of times a particular character should match. ?Matches any character zero or one times *Matches the preceding element zero or more times +Matches the preceding element one or more times num}Matches the preceding element num times {min, max}Matches the preceding element at least min times, but not more than max times These metacharacters allow you to match on a single-character pattern, but then continue to match on it until the pattern changes. In the last example, we were trying to search for patterns that contain “blurfle” followed by a number between zero and nine. The regex we came up with would match on blurfle1, blurfle2, blurfle3, etc. If, however, you had a programmer who mistakenly thought that “blurfle” was supposed to be spelled “blurffle,” our regex wouldn’t be able to catch it.

We could fix it, though, with a quantifier. $ egrep “blur[f]+le[0-9]” Important. files >blurfles Here we have “Find ‘b’, ‘l’, ‘u,’ ‘r’ (in a row) followed by one or more instances of an ‘f’ followed by ‘l’ and ‘e’ and then any single digit character between zero and nine. ” There’s always more than one way to do it with regular expressions, and in fact, if we use single-character metacharacters and quantifiers in conjunction with one another, we can search for almost all the variant spellings of “blurfle” (“bllurfle,” “bllurrfle”, bbluuuuurrrfffllle”, and so on).

One way, for example, might employ the ubiquitous (and exceedingly powerful) . * combination: $ egrep “b. *e” Important. files ;blurfles If we work this out, we come out with something like: “find a ‘b’ followed by any character any number of times (including zero times) followed by an ‘e’. ” It’s tempting to use “. *” with abandon. However, bear in mind that the preceding example would match on words like “blue” and “baritone” as well as “blurfle. ” Suppose the filenames in blurfle are numbered up to 12324, but we only care about the first 999: $ egrep “blurfle[0-9]{3}” Important. files >blufles

This regex tells egrep to match any number between zero and nine exactly three times in a row. Similarly, “blurfle[0-9]{3,5}” matches any number between zero and nine at lest three times but not more than five times in a row. Anchors Often, you need to specify the position at which a particular pattern occurs. This is often referred to as “anchoring” the pattern: ^Matches at the start of the line $Matches at the end of the line <Matches at the beginning of a word >Matches at the end of a word Matches at the beginning or the end of a word BMatches any charater not at the beginning or end of a word ^” and “$” are some of the most useful metacharacters in the regex arsenal–particularly when you need to run a search-and-replace on a list of strings. Suppose, for example, that we want to take the “blurfle” files listed in Important. files, list them out separately, run a program called “fragellate” on each one, and then append each successive output to a file called “fraggled_files. ” We could write a full-blown shell script (or Perl script) that would do this, but often, the job is faster and easier if we build a very simple shell script with a series of regular expressions.

We’d begin by greping the files we want to operate on and writing the output to a file. $ egrep “blurfle[0-9]” Important. file ;script. sh This would give us a list of files in script. sh that looked something like this: blurfle1 blurfle2 blurfle3 blurfle4 . . . Now we use sed (or the “/%s” operator in vi, or the “query-replace-regexp” command in emacs) to put “fragellate” in front of each filename and “>>fraggled_files” after each filename. This requires two separate search-and-replace operations (though not necessarily, as I’ll explain when we get to backreferences).

With sed, you have the ability to put both substitution lines into a file, and then use that file to iterate through another making each substitution in turn. In other words, we create a file called “fraggle. sed” which contains the following lines: s/^/fraggelate / s/$/ >>fraggled_files/ Then run the following “sed routine” on script. sh like so: $ sed -f fraggle. sed script. sh >script2. sh Our script would then look like this: fraggelate blurfle1 >>fraggled_files fraggelate blurfle2 >>fraggled_files fraggelate blurfle3 >>fraggled_files raggelate blurfle4 >>fraggled_files . . Chmod it, run it, and you’re done. Of course, this is a somewhat trivial example (“Why wouldn’t you just run “fragglate blurfle* ;;fraggled_files” from the command line? “). Still, one can easily imagine instances where the criteria for the file name list is too complicated to express using [filename]* on the command line. In fact, you can probably see from this sed-routine example that we have the makings of an automatic shell-script generator or file filter. You may also have noticed something odd about that caret in our sed routine.

Why doesn’t it mean “except” as in our previous example? The answer has to do with the sometimes radical difference between what an operator means inside the range operator and what it means outside it. The rules change from tool to tool, but generally speaking, you should use metacharacters inside range operators with caution. Some tools don’t allow them at all, and others change the meaning. To pick but one example, most tools would interpret [A-Za-z. ] as “Any character between A and Z, a and z or a period. ” Most tools provide some way to anchor a match on a word boundary.

In some versions of grep, for example, you are allowed to write: $ grep “fle>” Important. files ;blurfles This says: “Find the characters “f”, “l”, “e”, but only when they come at the end of a word. ”  tells the regex engine to match any word boundary (whether it’s at the beginning or the end) and B tells it to match any position that isn’t a word boundary. This again can vary considerably from tool to tool. Some tools don’t support word boundaries at all, and others support them using a slightly different syntax.

The tools that do support word boundaries generally consider words to be bounded by spaces or punctuation, and consider numerals to be legitimate parts of words, but there are some variations on these rules that can effect the accuracy of your matches. The man page or other documentation should resolve the matter. Escape Characters By now, you’re probably wondering how you go about searching for one of the special characters (asterisks, periods, slashes, and so on). The answer lies in the use of the escape character–for most tools, the backslash (“”).

To reverse the meaning of a special character (in other words, to treat it as a normal character instead of as a metacharacter), we simply put a backslash before that character. So, we know that a regex like “. *” finds any character any number of times. But suppose we’re searching for ellipses of various lengths and we just want to find periods any number of times. Because the period is normally a special character, we’d need to escape it with a backslash: $ grep “. *” Important. Files ;ellipses. files Unfortunately, this contribute to the legendary ugliness of regular expressions more than any other element of the syntax.

Add a few escape characters, and a simple sed routine designed to replace a couple of URL’s quickly degenerates into confusion: sed ‘s/http://etext. lib. virginia. edu//http://www. etext. virginia. edu/g To make matters worse, the list of what needs to be escaped differs from tool to tool. Some tools, for example, consider the “+” quantifier to have its normal meaning (as a ordinary plus sign) until it is escaped. If you’re having trouble with a regex (a sed routine that won’t parse or a grep pattern that won’t match even though you’re certain the pattern exists), try playing around with the escapes.

Or better yet, read the man page. Alternation Alternation refers to the use of the “|” symbol to indicate logical OR. In a previous example, we used “blur[f]+le” to catch those instances of “blurfle” that were misspelled with two “f’s”. Using alternation, we could have written: $ egrep “blurfle|blurffle” Important. files ;blurfles This means simply “Find either blurfle OR blurffle. ” The power of this becomes more evident when we use parentheses to limit the scope of the alternative matches.

Consider the following regex, which accounts for both the American and British spellings of the word “gray”: $ egrep “gr(a|e)y” Important. files ;hazy. shades Or perhaps a mail-filtering program that uses the following regex to single out past correspondence between you and the boss: /(^To:|^From:) (Seaman|Ramsay)/ This says, “Find a ‘To:’ or a ‘From:’ line followed by a space and then either the word ‘Seaman’ or the word ‘Ramsay’ This can make your regex’s extremely flexible, but be careful! Parentheses are also meta characters which figure prominently in the use of . . . Back references

Perhaps the most powerful element of the regular expression syntax, back references allows you to load the results of a matched pattern into a buffer and then reuse it later in the expression. In a previous example, we used two separate regular expressions to put something before and after a filename in a list of files. I mentioned at that point that it wasn’t entirely necessary that we use two lines. This is because back references allow us to get it down to one line. Here’s how: s/(blurfle[0-9]+)/fraggelate 1 >>fraggled_files/ The key elements in this example are the parentheses and the “1”.

Earlier we noted that parentheses can be used to limit the scope of a match. They can also be used to save a particular pattern into a temporary buffer. In this example, everything in the “search” half of the sed routine (the “blurfle” part) is saved into a buffer. In the “replace” half we recall the contents of that buffer back into the string by referring to its buffer number. In this case, buffer “1”. So, this sed routine will do precisely what the earlier one did: find all the instances of blurfle followed by a number between zero and nine and replace it with “fragellate blurfle[some number] ;;fraggled files”.

Backreferences allow for something that very few ordinary search engines can manage; namely, strings of data that change slightly from instance to instance. Page numbering schemes provide a perfect example of this. Suppose we had a document that numbered each page with the notation . The number and the chapter name change from page to page, but the rest of the string stays the same. We can easily write a regular expression that matches on this string, but what if we wanted to match on it and then replace everything but the number and the chapter name? //Page 1, Chapter 2/ Buffer number one (“1”) holds the first matched sequence, ([0-9]+); buffer number two (“2”) holds the second, ([A-Za-z]+). Tools vary in the number of backreference they can hold. The more common tools (like sed and grep) hold nine, but Python can hold up to ninety-nine. Perl is limited only by the amount of physical memory (which, for all practical purposes, means you can have as many as you want). Perl also lets you assign the buffer number to an ordinary scalar variable ($1, $2, etc. ) so you can use it later on in the code block. a.

OBJECT ORIENTED PROGRAMMING AND JAVA Object-oriented Programming was developed because of limitations found in earlier approaches of programming. To appreciate what OOP does, we need to understand what these limitations are and how they arose from traditional programming. PROCEDURAL LANGUAGES Pascal, C, Basic, FORTRAN, and similar languages are procedural languages. That is, each statement in the language tells the computer to do something: Get some input, add these numbers, divide by 6, display the output. A program in a procedural language is a list of instructions.

For very small programs no other organizing principle (often called a paradigm) is needed. The programmer creates the list of instructions, and the computer carries them out. Division into Functions When programs become larger, a single list of instructions becomes unwieldy. Few programmers can comprehend a program of more than a few hundred statements unless it is broken down into smaller units. For this reason the function was adopted as a way to make programs more comprehensible to their human creators. (The term functions is used in C++ and C.

In other languages the same concept may be referred to as a subroutine, a subprogram, or a procedure. ) A program is divided into functions, and (ideally, at least) each function has a clearly defined purpose and a clearly defined interface to the other functions in the program. The idea of breaking a program into functions can be further extended by grouping a number of functions together into a larger entity called a module, but the principle is similar: grouping a number of components that carry out specific tasks.

Dividing a program into functions and modules is one of the cornerstones of structured programming, the somewhat loosely defined discipline that has influenced programming organization for more than a decade. Problems with Structured Programming As programs grow ever larger and more complex, even the structured programming approach begins to show signs of strain. You may have heard about, or been involved in, horror stories of program development. The project is too complex, the schedule slips, more programmers are added, complexity increases, costs skyrocket, the schedule slips further, and disaster ensues.

Analyzing the reasons for these failures reveals that there are weaknesses in the procedural paradigm itself. No matter how well the structured programming approach is implemented, large programs become excessively complex. What are the reasons for this failure of procedural languages? One of the most crucial is the role played by data. Data Undervalued In a procedural language, the emphasis is on doing things–read the keyboard, invert the vector, check for errors, and so on. The subdivision of a program into functions continues this emphasis. Functions do things just as single program statements do.

What they do may be more complex or abstract, but the emphasis is still on the action. What happens to the data in this paradigm? Data is, after all, the reason for a program’s existence. The important part of an inventory program isn’t a function that displays the data, or a function that checks for correct input; it’s the inventory data itself. Yet data is given second-class status in the organization of procedural languages. For example, in an inventory program, the data that makes up the inventory is probably read from a disk file into memory, where it is treated as a global variable.

By global we mean that the variables that constitute the data are declared outside of any function, so they are accessible to all functions. These functions perform various operations on the data. They read it, analyze it, update it, rearrange it, display it, write it back to the disk, and so on. We should note that most languages, such as Pascal and C, also support local variables, which are hidden within a single function. But local variables are not useful for important data that must be accessed by many different functions. Now suppose a new programmer is hired to write a function to analyze this nventory data in a certain way. Unfamiliar with the subtleties of the program, the programmer creates a function that accidentally corrupts the. This is easy to do, because every function has complete access to the data. It’s like leaving your personal papers in the lobby of your apartment building: Anyone can change or destroy them. In the same way, global data can be corrupted by functions that have no business changing it. Another problem is that, since many functions access the same data, the way the data is stored becomes critical.

The arrangement of the data can’t be changed without modifying all the functions that access it. If you add new data items, for example, you’ll need to modify all the functions that access the data so that they can also access these new items. It will be hard to find all such functions, and even harder to modify all of them correctly. It’s similar to what happens when your local supermarket moves the bread from aisle 4 to aisle 12. Everyone who patronizes the supermarket must figure out where the bread has gone, and adjust their shopping habits accordingly.

What is needed is a way to restrict access to the data, to hide it from all but a few critical functions. This will protect the data, simplify maintenance, and offer other benefits as well. Relationship to the Real World Procedural programs are often difficult to design. The problem is that their chief components–functions and data structures–don’t model the real world very well. For example, suppose you are writing a program to create the elements of a graphics user interface: menus, windows, and so on. Quick now, what functions will you need? What data structures?

The answers are not obvious, to say the least. It would be better if windows and menus corresponded more closely to actual program elements. New Data Types There are other problems with traditional languages. One is the difficulty of creating new data types. Computer languages typically have several built-in data types: integers, floating-point numbers, characters, and so on. What if you want to invent your own data type? Perhaps you want to work with complex numbers, or two dimensional coordinates, or dates—quantities the built-in data types don’t handle easily.

Being able to create your own types is called extensibility; you can extend the capabilities of the language. Traditional languages are not usually extensible. Without unnatural convolutions, you can’t bundle together both X and Y coordinates into a single variable called Point, and then add and subtract values of this type. The result is that traditional programs are more complex to write and maintain. The object oriented approach The fundamental idea behind object-oriented languages is to combine into a single unit both data and the functions that operate on that data.

Such a unit is called an object. An object’s functions, called member methods in Java, typically provide the only way to access its data. If you want to read the item and return the value to you, you call a member function in the object. It will read the item and return the value to you. You can’t access the data directly. The data is hidden, so it is safe from accidental modification. Data and its functions are said to be encapsulated into a single entity. Data encapsulation and data hiding are key terms in the description of object oriented languages.

If you want to modify the data in an object, you know exactly what functions interact with it: the member functions in the object. No other functions can access the data. This simplifies writing, debugging, and maintaining the program. A Java program typically consists of a number of objects, which communicate with each other by calling one another’s members functions. We should mention that what are called member functions in C++ are called methods in Java. Also, data items are referred to as instance variables. Calling an object’s member function is referred to as sending a message to the object.