
Stickfight

Java quick tip, the pipe delimiter

The | (or pipe) symbol is an excellent delimiter, rarely used by users, and of particular use if you’re doing intra-line delimitation. In LotusScript a normal usage example would be:

Dim CountryCode As String
CountryCode = "Britain|GB"
Dim vField As Variant
vField = Split(CountryCode, "|")

So in Java you would expect to write

String[] vField = CountryCode.split("|");

This will appear to work but will do something odd (it normally delimits on every character). This is because split() is expecting a regular expression and the | is the OR special character for regular expressions. Normally you would just ‘escape’ it with a \ ie ‘\|’ but for some unknown reason, with split() you have to double escape it, eg

String[] vField = CountryCode.split("\\|");
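As a minimal sketch of the tip above, the following compares the unescaped and double-escaped calls. The class and method names are made up for illustration, and Pattern.quote() is included as one alternative from the standard library that is not mentioned in the post itself.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PipeSplitDemo {

    // Escapes the pipe for the regex engine. The backslash is doubled
    // because it is written inside a Java string literal.
    static String[] splitEscaped(String input) {
        return input.split("\\|");
    }

    // Lets Pattern.quote() build a literal pattern, so no manual
    // escaping is needed.
    static String[] splitQuoted(String input) {
        return input.split(Pattern.quote("|"));
    }

    // The unescaped version, kept only to show the surprise: the regex "|"
    // is an alternation of two empty patterns and matches between characters.
    static String[] splitUnescaped(String input) {
        return input.split("|");
    }

    public static void main(String[] args) {
        String countryCode = "Britain|GB";
        System.out.println(Arrays.toString(splitEscaped(countryCode)));   // [Britain, GB]
        System.out.println(Arrays.toString(splitQuoted(countryCode)));    // [Britain, GB]
        System.out.println(Arrays.toString(splitUnescaped(countryCode))); // many small pieces
    }
}
```

If the delimiter comes from a variable or from user input, Pattern.quote() is probably the safer of the two working options, since it does not depend on knowing which characters are special.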

daft little tip, but it might help someone

Old Comments

Mark Myers(30/11/2010 11:15:12 GMT)

true enough, but a pain for those that don’t know (hence the post)

Andrew Magerman(30/11/2010 10:16:43 GMT)

Mark,

I use RegexBuddy { Link } for calculating my regexes. It’s awesome, and it automatically does the annoying Java escaping for you. It’s written by the guys who wrote the O’Reilly book on regexes. It’s worth every penny of 30 - I would not do any regexes without it.

Alas, it is windows-only, but I guess you could have a little virtual machine for your regexes. Even if it hurts your Linux soul.

Andrew

Mark Myers(29/11/2010 11:34:22 GMT)

cool, thanks for the tip. on the reason for the double escape, the reason that i say unknown is that it allows single escapes on a number of other special characters. i will look into auto escaping of characters in eclipse as i use reg ex quite a lot, ta

Kerr Rainey(30/11/2010 11:04:56 GMT)

Well, the main problem is not so much how java handles regex, but that there is no regex literal in java. There is no way to simply write a regex pattern into java source without escaping it. If you pass the regex pattern in from some other input then you don’t have the escaping to deal with.

Mark Myers(30/11/2010 10:29:21 GMT)

Andrew, it is a fine bit of code indeed, i use edit pad pro from the same place and would go insane without it, alas you are also right about windows, i tend to only use Linux as a good solid host os + media player, and do most of my actual working in windows VMs

Mark Myers(29/11/2010 20:17:17 GMT)

an example would be \b , if you go to an expression builder such as { Link } both \b and \| work just fine, but in split() only \b would work, also \| works fine in things like Thunar file manager.

Kerr Rainey(29/11/2010 10:19:14 GMT)

I think the reason for having to put two backslash chars is that the java string literal for a backslash is “\\”. So to pass a regex pattern ‘backslash followed by pipe’ to the regex engine from a java string literal you need to put “\\|”

You can set up eclipse to automatically escape strings pasted into string literals. Normally I find this a pain, but if you have a long regex pattern that you need to escape it can be handy to turn it on, paste and then turn it off again.

Mark Myers(30/11/2010 09:52:08 GMT)

that’s the problem from my point of view, there is the way reg ex is handled by java and the way it is handled by everything else, if you have a valid regex expression that works elsewhere it has to be modified to work in split(). it would seem i’m not the only person that finds this irritating, see “backslash mess” near the bottom of { Link }

Kerr Rainey(29/11/2010 17:15:03 GMT)

Do you have a specific example?

All I can think of off the top of my head is if I wanted to pass something like a double quote to the regex engine. Since I don’t need it escaped in the regex, but I do need to escape it in the java string literal, that would be “\"”. If I wanted the regex to look for a backslash, I’d need to escape it in the regex and in the java string literal, ending up with: “\\\\”

I’d be really curious to find something that didn’t work like that.
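The two layers of escaping discussed in this thread (the Java string literal first, then the regex engine) can be seen by checking how long each pattern string really is once compiled. This is only a sketch; the class and constant names are invented for the example.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class EscapeLayersDemo {

    // Each constant is what the regex engine receives after the compiler
    // has processed the string literal.
    static final String PIPE_PATTERN = "\\|";       // two chars: backslash, pipe
    static final String BACKSLASH_PATTERN = "\\\\"; // two chars: backslash, backslash
    static final String QUOTE_PATTERN = "\"";       // one char: double quote

    public static void main(String[] args) {
        System.out.println(PIPE_PATTERN.length());      // 2
        System.out.println(BACKSLASH_PATTERN.length()); // 2
        System.out.println(QUOTE_PATTERN.length());     // 1

        // The literal "a\\b" is the three-character string a, backslash, b.
        System.out.println(Arrays.toString("a\\b".split(BACKSLASH_PATTERN))); // [a, b]
    }
}
```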

Kerr Rainey(30/11/2010 09:16:31 GMT)

Are you sure you are getting the result you expect in the \b case? I’ve just done a little test and although it runs, it does not give me the answer I would want.

‘\b’ is the char literal in java for a backspace. Putting that into a string literal however can give you odd looking results. Just printing it to System.out will not show it, but the length of the string will include it.

If it is the regex pattern to your split then it will split on any backspace in your input. But it won’t match anything if there is no backspace char in the input.

If the regex pattern you want to use is to match only the beginning of the word then you will have to escape the backslash in your java string literal.

"Look out Mr. Toad.".split("\bo")
gives
["Look out Mr. Toad."]

"Look out Mr. Toad.".split("\\bo")
gives
["Look ", "ut Mr. Toad."]

"Look out Mr. T\boad.".split("\bo")
gives
["Look out Mr. T", "ad."]
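The three cases in this comment can be run as one small program. This is a sketch only, with an invented class name and constant names; the inputs and patterns are the ones from the comment.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class WordBoundaryDemo {

    // A backspace character (U+0008) followed by 'o'.
    static final String BACKSPACE_THEN_O = "\bo";

    // The regex \b (word boundary) followed by 'o'.
    static final String BOUNDARY_THEN_O = "\\bo";

    public static void main(String[] args) {
        String plain = "Look out Mr. Toad.";
        String withBackspace = "Look out Mr. T\boad.";

        // No backspace in the input, so nothing matches and nothing is split.
        System.out.println(plain.split(BACKSPACE_THEN_O).length); // 1

        // Splits on the 'o' that starts the word "out".
        System.out.println(Arrays.toString(plain.split(BOUNDARY_THEN_O))); // [Look , ut Mr. Toad.]

        // Splits on the backspace-plus-'o' pair embedded in the input.
        System.out.println(withBackspace.split(BACKSPACE_THEN_O).length); // 2
    }
}
```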

Wormwood(24/11/2011 15:38:50 GMT)

I just had this problem and found your tip using Google. Thank you very much! :)
