


RegExp Object

 

Regular expressions do not seem very regular to average people. Regular expressions are used to search text and strings, searches that are very powerful if a person makes the effort to learn how to use them. Simple searches may be done like the following:

 

var str = "one two three";

str.indexOf("two");   // == 4

 

The String indexOf() method searches str for "two" and returns the beginning position of "two", which is 4. What if you wanted to find "t" and "o" with or without any characters in between, an "o" only at the beginning of a string, or an "e" only at the end of a string? Before answering, let's consider wildcards.

 

Most computer users are familiar with wildcards in searching, especially since they may be used in finding files. For example, the DOS command:

 

dir t*o.bat

 

will list all files whose filename begins with "t" and ends with "o" and whose extension is "bat". JavaScript does not use wildcards to extend search capability. Instead, ECMAScript, the standard for JavaScript, has implemented regular expression searches that do everything that wildcards do and much, much more. Regular expressions follow the PERL standard, though the syntax has been made easier to read. Anyone who can use regular expressions in PERL already knows how to use JavaScript regular expressions. For advanced information on regular expressions, there are many books in the PERL community, in addition to JavaScript books, that explain regular expressions.
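As a rough sketch of how a wildcard translates into a regular expression, the DOS pattern t*o.bat can be written as the pattern below. The file names are hypothetical and chosen only for illustration. The "^" and "$" anchors and the RegExp test() method are explained later in this section, and the "i" flag is assumed here to mimic the case-insensitive matching of DOS.

```javascript
// Hypothetical file names, used only to illustrate the pattern.
var files = ["tango.bat", "two.bat", "two.txt", "auto.bat"];

// Rough regular expression equivalent of the wildcard t*o.bat:
//   ^t     the name starts with "t"
//   .*     any number of characters
//   o\.bat the name ends with "o.bat" (the dot is escaped to match a literal ".")
//   $      end of the name
var wildcard = /^t.*o\.bat$/i;

var matches = [];
for (var i = 0; i < files.length; i++) {
  if (wildcard.test(files[i])) {
    matches.push(files[i]);
  }
}
// matches is ["tango.bat", "two.bat"]
```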

 

Now let's answer the question about how to find the three cases mentioned above.

 

var str = "one two three";

var pat = /t.*o/;

str.search(pat);   // == 4

 

This fragment illustrates one way to use regular expressions to find "t" followed by "o" with any number of characters between them. Two things are different from the indexOf() example. The first is the variable pat, which is assigned /t.*o/. The slashes indicate the beginning and end of a regular expression pattern, similar to how quotation marks indicate a string. The second is the String search() method, a method of the String object that uses a regular expression pattern to search a string, similar to the String indexOf() method. In fact, both methods return 4, the start position of "two", in these examples.
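The two methods can be compared side by side. The following is a minimal sketch; the variable names are illustrative only. It also shows that both methods return -1 when nothing is found.

```javascript
var str = "one two three";

// A literal substring search and a pattern search agree on the position here.
var byIndexOf = str.indexOf("two");   // 4
var bySearch = str.search(/t.*o/);    // 4

// Both methods return -1 when there is no match.
var missingSubstring = str.indexOf("four");   // -1
var missingPattern = str.search(/f.*r/);      // -1
```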

 

The String object has three methods for searching using regular expression patterns. The three methods are:

 

String match()

String replace()

String search()
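A brief sketch of the three methods applied to the same string and pattern follows. The replacement text "2" is an arbitrary choice for illustration.

```javascript
var str = "one two three";
var pat = /t.*o/;

// search() returns the position of the first match.
var position = str.search(pat);         // 4

// match() returns an Array; element 0 holds the matched text.
var found = str.match(pat);             // found[0] is "two"

// replace() returns a new string with the match replaced; str is unchanged.
var replaced = str.replace(pat, "2");   // "one 2 three"
```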

 

The methods in the RegExp object, for using regular expressions, are explained below in this section. Before we move on to the cases of an "o" at the start or an "e" at the end of a string, consider the current example a little further. What do the slashes "/ . . . /" do? First, they define a regular expression pattern. Second, they create a RegExp object. In our example, the quotes cause str to be a String object, and the slashes cause pat to be a RegExp object. Thus, pat may be used with RegExp methods and with the three String methods that use regular expression patterns.
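The following sketch checks that a pattern written between slashes really is a RegExp object. It also assumes the standard ECMAScript RegExp constructor, which builds an equivalent object from a string.

```javascript
var pat = /t.*o/;

// The slashes produce a RegExp object.
var isRegExp = pat instanceof RegExp;   // true

// The source property holds the text between the slashes.
var source = pat.source;                // "t.*o"

// The same pattern built with the constructor instead of slashes.
var built = new RegExp("t.*o");
var builtFinds = built.test("one two three");   // true
```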

 

var str = "one two three";

var pat = /t.*o/;

pat.test(str);   // == true

 

By using a method, such as test(), of the RegExp object, the string to be searched becomes the argument rather than the pattern to search for, as with the string methods. The RegExp test() method simply returns true or false indicating whether the pattern is found in the string.

 

var str = "one two three";

var pat = /t.*o/;

str.match(pat);   // == an Array with pertinent info

pat.exec(str);    // == an Array with pertinent info

 

The String match() and RegExp exec() methods return very similar, often identical, results in an Array. The returned Array may vary depending on exactly which attributes, discussed later, are set for a regular expression.
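To make the returned Array more concrete, here is a small sketch. It assumes the standard ECMAScript behavior in which the Array carries an index property, and it uses the "g" (global) flag as one example of an attribute that changes the result. The pattern /t\w/g is chosen only for illustration.

```javascript
var str = "one two three";
var pat = /t.*o/;

// Without attributes, match() and exec() return the same information.
var m = str.match(pat);   // m[0] is "two", m.index is 4
var e = pat.exec(str);    // e[0] is "two", e.index is 4

// With the "g" attribute the results differ:
// match() returns every matched substring ...
var all = str.match(/t\w/g);     // ["tw", "th"]
// ... while a single call to exec() still returns only the first match.
var first = /t\w/g.exec(str);    // first[0] is "tw", first.index is 4
```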

 

To find an "o" only at the start of a string, use something like:

 

var str = "one two three";

var pat = /^o/;

str.search(pat);   // == 0

 

The caret "^" has a special meaning, namely, the start of a string or line. It anchors the characters that follow to the start of a string or line and is one of the special anchor characters.
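The difference between the start of a string and the start of a line can be sketched as follows. The example assumes the standard ECMAScript "m" (multiline) attribute, and the sample text is illustrative.

```javascript
var text = "two\none";   // two lines: "two" and "one"

// By default "^" matches only at the start of the whole string.
var atStringStart = text.search(/^o/);    // -1, the string starts with "t"

// With the "m" attribute, "^" also matches at the start of each line.
var atLineStart = text.search(/^o/m);     // 4, the "o" that begins "one"
```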

 

To find an "e" only at the end of a string, use something like:

 

var str = "one two three";

var pat = /e$/;

str.search(pat);   // == 12

 

The dollar sign "$" has a special meaning, namely, the end of a string or line. It anchors the characters that precede it to the end of a string or line and is one of the special anchor characters.
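A matching sketch for "$" is shown below, again assuming the standard ECMAScript "m" (multiline) attribute and using illustrative sample text.

```javascript
var text = "one\ntwo";   // two lines: "one" and "two"

// By default "$" matches only at the end of the whole string.
var atStringEnd = text.search(/e$/);    // -1, the string ends with "o"

// With the "m" attribute, "$" also matches at the end of each line.
var atLineEnd = text.search(/e$/m);     // 2, the "e" that ends "one"
```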

 

Note that there is a very important distinction between searching for pattern matches using the String methods and using the RegExp methods. The RegExp methods execute much faster, but the String methods are often quicker to program. So, if you need to do intensive searching in which a single regular expression pattern is used many times in a loop, use the RegExp methods. If you just need to do a few searches, use the String methods. Each time a RegExp object is constructed using new, its pattern is compiled into a form that can be executed very quickly. Likewise, a pattern that is recompiled using the RegExp compile() method executes much faster. Other than the difference in speed and script writing time, the choice of which methods to use depends on personal preferences and the particular tasks at hand.
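The loop case might look something like the sketch below, in which one RegExp object is created once and reused for every string. The sample strings are hypothetical, and no timing is measured here; the sketch only shows the shape of the code.

```javascript
// Hypothetical input, used only for illustration.
var lines = ["one two three", "four five", "to", "tattoo"];

// Create the RegExp object once, outside the loop.
var pat = /t.*o/;

var hits = [];
for (var i = 0; i < lines.length; i++) {
  // Reuse the same RegExp object for every string.
  if (pat.test(lines[i])) {
    hits.push(lines[i]);
  }
}
// hits is ["one two three", "to", "tattoo"]
```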

 

In general, the RegExp object allows the use of regular expression patterns in searches of strings or text. The syntax follows the ECMAScript standard, which may be thought of as a large and powerful subset of PERL regular expressions.

 


Regular expression syntax

Regular expression special characters

Regular expression precedence

RegExp object instance properties

RegExp returned array properties

RegExp object instance methods

RegExp object static properties