Regular Expressions in Java, Part 1
In the previous post, we looked at conditional statements. In this post, we will learn about regular expressions in Java.
What is a regular expression?
A regular expression is a string pattern used to search, edit, or match text. Applying the pattern to some text yields either a single match or a set of matches. Methods that only test for a match return true if the text matches the pattern and false otherwise.
Java regex is the official Java regular expression API. It lives in the java.util.regex package and has been part of Java since version 1.4.
Core classes of Java Regex:
There are three core classes in Java regex, listed below.
- The Pattern Class (java.util.regex.Pattern)
- The Matcher Class (java.util.regex.Matcher)
- The PatternSyntaxException Class (java.util.regex.PatternSyntaxException)
The Pattern class is used to create a string pattern, i.e. a regular expression. A Pattern object is the compiled representation of a regular expression.
A Matcher object is created by invoking the matcher method of a Pattern object. The Matcher identifies occurrences of the pattern in the text.
A PatternSyntaxException is thrown when a pattern contains a syntax error.
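To make the roles of the three classes concrete, here is a minimal sketch. The class name RegexBasics and the helper methods findAll and isValidRegex are made up for this illustration; only Pattern, Matcher, and PatternSyntaxException come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, used only for illustration.
class RegexBasics {

    // Returns every substring of text that the regex matches, in order.
    static List<String> findAll(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);  // compile the regular expression
        Matcher matcher = pattern.matcher(text);   // bind the pattern to the input text
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {                   // move to the next occurrence, if any
            matches.add(matcher.group());          // the text of the current match
        }
        return matches;
    }

    // Returns true if the regex compiles, false if its syntax is invalid.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(findAll("ja", "java and javascript")); // prints [ja, ja]
        System.out.println(Pattern.matches("java", "java"));      // prints true
        System.out.println(isValidRegex("[a-z"));                 // prints false
    }
}
```

Note that Pattern.matches tests the whole input against the pattern, while Matcher.find looks for the pattern anywhere in the input.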
Another important aspect of regular expressions is their syntax, which we need to know before we can write patterns. The syntax is extensive, and we are not going to cover it in full detail. Instead, we will learn the basics of the most commonly used syntax. For more details, refer to the Javadoc page of the Pattern class.
Syntax of regular expressions:
1. Characters:
Characters | Description |
\\ | The backslash character |
\t | The tab character (‘\u0009’) |
\n | The newline (line feed) character (‘\u000A’) |
\r | The carriage-return character (‘\u000D’) |
\f | The form-feed character (‘\u000C’) |
\e | The escape character (‘\u001B’) |
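One practical detail when using the escapes above from Java source code: the backslash is also the escape character of Java string literals, so it has to be doubled. The sketch below illustrates this. The class name EscapeDemo and its methods are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class EscapeDemo {

    // True if text contains at least one tab character.
    // The Java literal "\\t" is the two-character regex \t.
    static boolean containsTab(String text) {
        return Pattern.compile("\\t").matcher(text).find();
    }

    // True if text contains at least one backslash.
    // The Java literal "\\\\" is the two-character regex \\.
    static boolean containsBackslash(String text) {
        return Pattern.compile("\\\\").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsTab("a\tb"));            // prints true
        System.out.println(containsBackslash("C:\\temp"));  // prints true
        System.out.println(containsBackslash("C:/temp"));   // prints false
    }
}
```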
2. Character classes:
Character classes | Description |
[abc] | Matches a or b or c. This is called a simple class. |
[^abc] | Matches any character except a or b or c. This is called negation. |
[a-zA-Z] | Matches any character from a to z or A to Z. This is called a range. |
[a-d[m-p]] | Matches any character from a to d or from m to p. This is known as union. |
[a-z&&[def]] | Matches d or e or f. This is known as intersection (between a to z and def). |
[a-z&&[^bc]] | Matches from a to z except characters b and c. This is known as subtraction. |
[a-z&&[^m-p]] | Matches from a to z except from m to p. This is also known as subtraction. |
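The character classes in the table above can be checked with a small sketch like the following. The class name CharClassDemo and the method matchesChar are made up for this example. Pattern.matches tests the whole input, so each call checks one character against one class.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class CharClassDemo {

    // True if the single character c is matched by the given character class.
    static boolean matchesChar(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(matchesChar("[abc]", 'b'));         // simple class: true
        System.out.println(matchesChar("[^abc]", 'b'));        // negation: false
        System.out.println(matchesChar("[a-zA-Z]", 'Q'));      // range: true
        System.out.println(matchesChar("[a-d[m-p]]", 'n'));    // union: true
        System.out.println(matchesChar("[a-z&&[def]]", 'g'));  // intersection: false
        System.out.println(matchesChar("[a-z&&[^bc]]", 'b'));  // subtraction: false
        System.out.println(matchesChar("[a-z&&[^m-p]]", 'q')); // subtraction: true
    }
}
```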
3. Predefined Character Classes:
Predefined character classes | Description |
. | Matches any single character. It matches line terminators only when DOTALL mode is enabled. |
\d | Matches any digit from 0 to 9 |
\D | Matches any non-digit character [^0-9] |
\s | Matches any white space character like space, tab, line break, carriage return etc. |
\S | Matches any non-white space character. |
\w | Matches any word character [a-zA-Z_0-9]. |
\W | Matches any non-word character [^\w]. |
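As a rough illustration of the predefined classes above, the following sketch uses them through the regex-aware methods of String (replaceAll, split, and matches). The class name PredefinedDemo and its method names are hypothetical.

```java
// Hypothetical helper class, used only for illustration.
class PredefinedDemo {

    // Removes every non-digit character (\D) from text.
    static String digitsOnly(String text) {
        return text.replaceAll("\\D", "");
    }

    // Splits text on runs of white space (\s+) after trimming both ends.
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    // True if text consists only of word characters (\w), at least one.
    static boolean isWordOnly(String text) {
        return text.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("+1 (555) 010-9999"));      // prints 15550109999
        System.out.println(words("  to be\tor\nnot  ").length);   // prints 4
        System.out.println(isWordOnly("user_42"));                // prints true
        System.out.println(isWordOnly("user-42"));                // prints false
    }
}
```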
4. Boundary Matches:
Boundary Matches | Description |
^ | Matches the beginning of a line (of the whole input unless MULTILINE mode is enabled). |
$ | Matches the end of a line (of the whole input unless MULTILINE mode is enabled). |
\b | Matches a word boundary. |
\B | Matches a non-word boundary. |
\A | Matches the beginning of the input text. |
\G | Matches the end of the previous match. |
\Z | Matches the end of the input text except the final terminator, if any |
\z | Matches the end of the input text. |
5. Quantifiers:
Greedy | Reluctant | Possessive | Description |
X? | X?? | X?+ | Matches X once, or not at all. |
X* | X*? | X*+ | Matches X zero or more times. |
X+ | X+? | X++ | Matches X one or more times. |
X{n} | X{n}? | X{n}+ | Matches X exactly n times. |
X{n,} | X{n,}? | X{n,}+ | Matches X at least n times. |
X{n,m} | X{n,m}? | X{n,m}+ | Matches X at least n but not more than m times. |
6. Logical Operators:
Logical operators | Description |
XY | X followed by Y |
X|Y | Either X or Y |
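The boundary matchers, quantifiers, and logical operators from the tables above can be tried out with a short sketch like this one. The class name QuantifierDemo and the helper firstMatch are made up for this example. The HTML-like input is only there to show how greedy, reluctant, and possessive quantifiers differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class QuantifierDemo {

    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy: takes as much as possible, then backtracks to the last '>'.
        System.out.println(firstMatch("<.+>", html));    // prints <b>bold</b>
        // Reluctant: takes as little as possible, stops at the first '>'.
        System.out.println(firstMatch("<.+?>", html));   // prints <b>
        // Possessive: takes everything and never backtracks, so '>' cannot match.
        System.out.println(firstMatch("<.++>", html));   // prints null

        // Bounded repetition: at least 2 but not more than 3 digits.
        System.out.println(firstMatch("\\d{2,3}", "12345"));        // prints 123

        // Word boundaries: "cat" only as a whole word.
        System.out.println(firstMatch("\\bcat\\b", "concatenate")); // prints null
        System.out.println(firstMatch("\\bcat\\b", "a cat sat"));   // prints cat

        // ^ and $ anchor the match to the start and end of the input.
        System.out.println(firstMatch("^\\d{3}$", "1234"));         // prints null

        // Alternation: either "cat" or "dog".
        System.out.println(firstMatch("cat|dog", "hot dog"));       // prints dog
    }
}
```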
We will cover the rest of regular expressions in part 2.