Regular Expressions

A regular expression, commonly known as a regex, is a sequence of characters that defines a search pattern for finding specific text. The eFORMz Text-to-XML pre-processor uses regular expressions to match strings of input data, then converts the selected strings to XML. The data is examined and its characters are matched against the pattern that the regular expression specifies.

Additional Resources

Regex Match Testing Utility (Off-site)
Regular Expression Tips, Tricks and Examples (Off-site)
Anchors
Character Classes
Regular Expressions: LookAround
Quantifiers
Ranges
Special Characters

Sample Statements

Listed here are some common regular expressions for matching strings with the Text to XML Editor. Note that all expressions are case-sensitive. Statements marked (Template) are not exact: certain portions of the statement (such as numbers) may vary.

^\s*$
Use: For blank space
This statement will pick up any line that is empty or is solely white space. The expression reads: from the very beginning of the line (^), zero or more white space characters (\s*) up to the end of that line ($). For a line that is only partly blank, give the exact number of leading white space characters for that line instead. For example, a line that is blank for its first 198 characters will be matched with this statement: ^\s{198}
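The blank-line statements can be tried outside eFORMz. The sketch below uses Python's re module as a stand-in, which is an assumption: the eFORMz engine may differ in details, and the sample lines are made up for illustration.

```python
import re

# Hypothetical input lines; only the second and third are blank.
lines = ["PO 1001", "", "   \t ", "  1 WIDGET"]

blank = re.compile(r"^\s*$")
blank_flags = [bool(blank.search(line)) for line in lines]
print(blank_flags)

# A line that is only partly blank: 198 leading spaces, then text.
partly_blank = " " * 198 + "TOTAL"
leading = re.compile(r"^\s{198}")
print(bool(leading.search(partly_blank)))
```

A line with only 197 leading spaces would not match the second statement, which is why the count has to agree with the actual layout.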

.[0-12]\d[/]
Use: For short, numerical dates
This statement will match a date with a short numerical format. An example might be 05/02/12. The expression matches any character except a new line, followed by one digit from the class [0-12], then any digit, then a forward slash ‘/’ character. Note that [0-12] is a character class, not a numeric range: it contains the range 0-1 plus the digit 2, so it matches a single 0, 1 or 2. As this will pick up all dates, using a statement to pick up the text characters preceding the date may be necessary for matching a certain date. For example, to match ‘Order Date:’ use either of the following expressions:
.(Order)\s{1}+(Date)+(:)
.(Order)+( )+(Date)+(:)
Both read: any character except a new line, followed by the word ‘Order’, white space (exactly one white space character in the first expression, one or more spaces in the second), then one or more instances of the word ‘Date’ and a colon.
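A minimal sketch of how the date and label statements behave, using Python's re module as an assumed stand-in for the eFORMz engine. The input lines are hypothetical, and only the second ‘Order Date:’ expression is used because the \s{1}+ form is not accepted by every regex engine.

```python
import re

date_part = re.compile(r".[0-12]\d[/]")
label = re.compile(r".(Order)+( )+(Date)+(:)")

line = "  Order Date: 05/02/12"  # hypothetical input line

# [0-12] is a character class (0, 1 or 2), so " 05/" matches but " 35/" does not.
date_hit = date_part.search(line).group()
no_hit = date_part.search(" 35/")

# The leading "." consumes the character just before "Order".
label_hit = label.search(line).group()
at_line_start = label.search("Order Date: 05/02/12")

print(repr(date_hit), no_hit, repr(label_hit), at_line_start)
```

Because of the leading period, a label that starts in the very first column of a line has no preceding character to consume and is not matched.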

^.{2}\d{1}\s
Use: For detail lines (Template)
The preferred method of picking up purchase order detail line information with a regular expression is to note the location of the detail information in a given string and build patterns accordingly. This statement starts at the very beginning of the line and matches exactly 2 characters of any kind except a new line (typically the spaces that indent the line), followed by a single digit. After the digit (typically this is where the Line field is), exactly 1 white space character must follow. The exact numbers may vary, but detail lines are typically matched by the number of characters from the beginning of a line, the instance of a digit (the Line field), followed by the number of white spaces after that digit. To require that the leading characters really are white space, use \s{2} in place of .{2}.
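The column-based matching described above can be sketched as follows. Python's re module is assumed here in place of the eFORMz engine, and the purchase order lines and their column positions are invented for the example.

```python
import re

detail = re.compile(r"^.{2}\d{1}\s")

# Hypothetical purchase order lines mapped to the expected outcome.
samples = {
    "  1 WIDGET-A   10  EA": True,    # 2 leading characters, digit, white space
    "AB1 WIDGET-A   10  EA": True,    # .{2} accepts any characters, not only spaces
    " 1 WIDGET-A    10  EA": False,   # digit is in column 2, not column 3
    "  12 WIDGET-A  10  EA": False,   # digit is followed by another digit
}
results = {line: bool(detail.search(line)) for line in samples}
print(results == samples)
```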

^.\d+[-]+\d
Use: For detail lines matching by item number (Template)
One way of picking up invoice detail line information with a regular expression is to note the location of the detail information in a particular string and build corresponding patterns. This statement starts at the very beginning of the line and matches one character of any kind except a new line, followed by one or more digits, one or more instances of a dash ‘-’ and one more digit. The digits being referenced are from the Item Number field in an invoice. If the item number has no dashes, then the statement should read: ^.\d+\s
This will match the beginning of a line, one character that isn’t a new line, one or more digits, then a white space character. If the item number has letters, numbers and dashes, then the statement should read: ^.[A-Z]+\d+[-]
This will match the beginning of a line, one character that isn’t a new line, one or more uppercase letters, one or more digits and then a dash ‘-’ character. Recall that letters are case-sensitive.
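To compare the three item number statements side by side, here is a small sketch. It assumes Python's re module behaves like the eFORMz engine for these patterns, and the invoice lines are hypothetical, each starting with one blank column.

```python
import re

with_dashes = re.compile(r"^.\d+[-]+\d")
no_dashes = re.compile(r"^.\d+\s")
letters_first = re.compile(r"^.[A-Z]+\d+[-]")

# (pattern, hypothetical invoice line, expected outcome)
checks = [
    (with_dashes, " 1234-567  BOLT", True),
    (with_dashes, " 1234567   BOLT", False),
    (no_dashes, " 1234567   BOLT", True),
    (letters_first, " AB123-9   BOLT", True),
    (letters_first, " ab123-9   BOLT", False),  # lowercase does not match [A-Z]
]
outcomes = [bool(rx.search(text)) for rx, text, _ in checks]
expected = [want for _, _, want in checks]
print(outcomes == expected)
```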

^.{76}\d{1}\.\d{2}
Use: For detail lines matching by extended price (Template)
If a match on item number is not possible, then detail line information can be matched by the extended price field, or whichever field is the final one in the detail line series. This statement starts at the very beginning of the line and matches exactly 76 characters of any kind except a new line, followed by 1 digit and a period. The \ is an escape character and is required to treat the period that follows it as a literal period and not as the special character that matches any character. Following that, exactly 2 digits must be matched. The number of leading characters may vary (represented by 76 in this example), but the rest of the statement should apply.
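The effect of the escaped period is easiest to see with a test line. In this sketch the detail line, its 76-character prefix and the price are all hypothetical, and Python's re module is assumed in place of the eFORMz engine.

```python
import re

price_at_77 = re.compile(r"^.{76}\d{1}\.\d{2}")

# Hypothetical detail line: 76 characters of other fields, then the price.
prefix = " 1234-567  BOLT".ljust(76)
line = prefix + "9.95"

hit = price_at_77.search(line)

# Without the backslash, "." matches any character, so "9x95" is accepted too.
loose = re.compile(r"^.{76}\d{1}.\d{2}")
strict_on_bad = price_at_77.search(prefix + "9x95")
loose_on_bad = loose.search(prefix + "9x95")

print(hit is not None, strict_on_bad is None, loose_on_bad is not None)
```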

^\s{2}\S{1}
Use: For descriptions (Template)
Matching strings with description information might vary significantly depending on output. In this particular example, a line will be picked up as long as its first two characters are white space and the third is a non-white space character. The expression states that at the beginning of a line there are 2 white spaces followed by 1 non-white space.

^(\s{2}\S{1})((?!Expire).)*$
Use: For descriptions with excluded information (Template)
If there is information that shouldn’t be picked up, such as expiration information, then that data can be excluded. The additional part of this expression is a negative lookahead, (?!Expire), which indicates that no instance of the word ‘Expire’ can appear after the 2 white spaces and 1 non-white space for this string to be matched. The group ((?!Expire).) matches a single character only if the word ‘Expire’ does not begin at that position. The asterisk repeats that group zero or more times, and the dollar sign requires the repetition to reach the end of that line.
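The two description statements can be compared on the same input. This is a sketch under the assumption that Python's re module treats the lookahead the same way as the eFORMz engine; the description lines are invented.

```python
import re

description = re.compile(r"^\s{2}\S{1}")
without_expire = re.compile(r"^(\s{2}\S{1})((?!Expire).)*$")

# Hypothetical description lines.
plain = "  Blue widget, large"
dated = "  Blue widget Expire 01/13"
edge = "  Expire 01/13"   # "Expire" begins at the \S position itself

# Each entry: (matched by first statement, matched by second statement)
summary = {
    name: (bool(description.search(text)), bool(without_expire.search(text)))
    for name, text in [("plain", plain), ("dated", dated), ("edge", edge)]
}
print(summary)
```

The last case shows a limitation worth knowing: the lookahead is first tested after the non-white space character, so a line whose third character begins the word ‘Expire’ is still matched.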

.DUE( )+DATE[:]
Use: For due dates
Typically found in backorder notifications, this statement will pick up any character except a new line followed by the word DUE, one or more blank spaces and the word DATE, followed by a colon. Recall that regular expressions are case-sensitive. If the string is ‘Due Date:’ then the statement should be: .Due( )+Date[:]
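A short sketch of the case sensitivity described above, again assuming Python's re module as a stand-in for the eFORMz engine and using made-up backorder lines.

```python
import re

due_upper = re.compile(r".DUE( )+DATE[:]")
due_mixed = re.compile(r".Due( )+Date[:]")

notice = " DUE   DATE: 06/15/12"   # hypothetical backorder line

upper_hit = due_upper.search(notice).group()    # ( )+ absorbs all three spaces
mixed_on_upper = due_mixed.search(notice)       # wrong case, so no match
mixed_hit = due_mixed.search(" Due Date: 06/15/12").group()

print(repr(upper_hit), mixed_on_upper, repr(mixed_hit))
```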

.[http]+[:]+[//]+[a-z]
Use: For hyperlinks
Typically found in ship confirmations, this statement will pick up the start of a URL. The regular expression matches any character except a new line, followed by one or more characters from the set h, t and p, one or more colons, one or more forward slashes, and one letter between a and z. Note that [http] and [//] are character classes, so they match individual characters in any order rather than the literal text ‘http’ or ‘//’. Recall that regular expressions are case-sensitive. If the URL is capitalized, the statement should read: .[HTTP]+[:]+[//]+[A-Z]
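Because the hyperlink statement is built from character classes, it helps to see what it accepts and rejects. The sketch assumes Python's re module in place of the eFORMz engine; the ship confirmation lines and the example.com address are placeholders.

```python
import re

link = re.compile(r".[http]+[:]+[//]+[a-z]")

# Hypothetical ship confirmation lines.
http_hit = link.search("Track: http://example.com/t/1").group()

# Any mix of h, t and p is accepted, and a single slash is enough.
class_hit = link.search("Track: ptth:/x")

# "s" is not in the class, so an https URL is not matched.
https_hit = link.search("Track: https://example.com/t/1")

print(repr(http_hit), class_hit is not None, https_hit)
```

If https links also need to be picked up, one option (an assumption, not part of the original statement) is to add s to the first class: .[https]+[:]+[//]+[a-z]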

^0*
Use: For values with unwanted zeros in front of them
If a value, such as ‘Quantity Shipped’, has zeros in front of it, like ‘000000001’, it may be necessary to strip the leading zeros. This statement asserts position at the start of the string and matches the character 0 literally; the * is a quantifier that matches it between zero and unlimited times, as many times as possible, until a different character is reached. So, the statement should read: ^0*
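Stripping the matched zeros amounts to replacing the match with nothing. The sketch below shows that with Python's re.sub, which is an assumption about how the match is applied; both helper functions are hypothetical names, and the second is a variant that is not part of the original statement.

```python
import re

def strip_leading_zeros(value: str) -> str:
    """Remove every leading 0, as the ^0* statement describes."""
    return re.sub(r"^0*", "", value)

def strip_but_keep_one(value: str) -> str:
    """Variant (not from the source): leave the final digit even if it is 0."""
    return re.sub(r"^0+(?=\d)", "", value)

print(strip_leading_zeros("000000001"))
print(repr(strip_leading_zeros("0000")))
print(strip_but_keep_one("0000"))
```

Note that ^0* removes everything from a value made up only of zeros, which may or may not be the desired result for a quantity field.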
