Discussion:
extracting class name from String
Ted Yu ted_yu-/E1597aS9LQAvxtiuMwx3w@public.gmane.org [seajug]
2014-06-13 16:31:19 UTC
Permalink
Hi,
Given the following log line:

2014-06-11 08:01:27,411 INFO org.apache.hadoop.ipc.RpcServer: RpcServer.responder: starting

How would a regex be written to extract RpcServer as the class name?
The class name is the identifier between the first colon and the preceding dot.

Thanks
Daniel Kirkdorffer dankirkd-Wuw85uim5zDR7s880joybQ@public.gmane.org [seajug]
2014-06-13 16:39:29 UTC
Permalink
http://www.regular-expressions.info/ is a great resource for understanding regular expressions.

Dan

Konstantin Ignatyev kgignatyev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org [seajug]
2014-06-13 16:59:19 UTC
Permalink
"^[^:]+[.]([^:]+):"
--
Konstantin Ignatyev
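A quick check of the pattern at the top of this reply, as a sketch only (the class and method names are made up for illustration). Because the pattern is anchored with ^ and [^:]+ cannot cross a colon, it would not be expected to match the full log line, whose timestamp already contains colons. On the shortened text "RpcServer.responder: starting" it does match, but the captured group is "responder".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for trying a pattern against sample input.
final class AnchoredPatternCheck {
    // The anchored pattern from this reply.
    static final String ANCHORED = "^[^:]+[.]([^:]+):";

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String full = "2014-06-11 08:01:27,411 INFO org.apache.hadoop.ipc.RpcServer: "
                + "RpcServer.responder: starting";
        String shortened = "RpcServer.responder: starting";

        System.out.println(firstGroup(ANCHORED, full));      // null
        System.out.println(firstGroup(ANCHORED, shortened)); // responder
    }
}
```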

PS: If this is a typical day on planet Earth, humans will add fifteen
million tons of carbon to the atmosphere, destroy 115 square miles of
tropical rainforest, create seventy-two miles of desert, eliminate between
forty to one hundred species, erode seventy-one million tons of topsoil,
add 2,700 tons of CFCs to the stratosphere, and increase their population
by 263,000

Bowers, C.A. The Culture of Denial: Why the Environmental Movement Needs a
Strategy for Reforming Universities and Public Schools. New York: State
University of New York Press, 1997: (4) (5) (p.206)
Stewart Buskirk stewart-HB+ws3Ciqv8@public.gmane.org [seajug]
2014-06-13 17:00:09 UTC
Permalink
Ted,

I think this works:

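import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassNameExtract {
    public static void main(String[] args) {
        String testData = "2014-06-11 08:01:27,411 INFO org.apache.hadoop.ipc.RpcServer: RpcServer.responder: starting";
        // Capture the run of non-dot characters between a dot and the colon that follows it.
        Pattern p = Pattern.compile("\\.([^.]*)\\:");
        Matcher m = p.matcher(testData);

        // find() returns the leftmost match, which ends at the colon after the class name.
        if (m.find()) {
            System.out.println("Match!");
            System.out.println(m.group(1)); // RpcServer
        }
    }
}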

I love regex patterns. They're like little magic spells. This place is an interesting way to practice them:

http://regexcrossword.com/

Thanks,

-Stewart
Ted Yu ted_yu-/E1597aS9LQAvxtiuMwx3w@public.gmane.org [seajug]
2014-06-13 17:56:10 UTC
Permalink
Thanks for sharing.
I had an expression for the following log format (the string inside the brackets is not of interest):

2014-04-11 10:53:16,616 INFO  [RS_LOG_REPLAY_OPS-hor13n04:60020-1] handler.HLogSplitterHandler: successfully transitioned


I was trying to extract the class name using the same regex for both formats.
With the regex from Stewart's post, the number of recognized entries went from 51031 to 50519.

So I need to find the log pattern that caused the difference.
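One way to hunt for the offending pattern, sketched under assumptions: run the regex over the lines and collect the ones where find() fails, then inspect those by eye. The class name UnmatchedLineFinder and the third sample line (one with no dot before its colon) are invented for illustration; they are not taken from the real logs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: lists the lines a pattern fails to recognize.
final class UnmatchedLineFinder {
    // The pattern posted earlier in the thread.
    static final Pattern CLASS_NAME = Pattern.compile("\\.([^.]*)\\:");

    // Returns the lines in which the pattern finds no match, in input order.
    static List<String> unmatched(Pattern pattern, List<String> lines) {
        List<String> misses = new ArrayList<>();
        for (String line : lines) {
            if (!pattern.matcher(line).find()) {
                misses.add(line);
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2014-06-11 08:01:27,411 INFO org.apache.hadoop.ipc.RpcServer: RpcServer.responder: starting",
            "2014-04-11 10:53:16,616 INFO  [RS_LOG_REPLAY_OPS-hor13n04:60020-1] handler.HLogSplitterHandler: successfully transitioned",
            // Invented line: no dot anywhere, so the pattern cannot match it.
            "2014-04-11 10:53:17,001 WARN Shutdown: stopping");

        for (String miss : unmatched(CLASS_NAME, lines)) {
            System.out.println("unmatched: " + miss);
        }
    }
}
```

In practice the lines would come from the log file rather than a hard-coded list; grouping the misses by their first few tokens may make the differing format easier to spot.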




Jonathan Burke jonathangburke-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org [seajug]
2014-06-13 19:01:53 UTC
Permalink
Hi Ted,

One thing to keep in mind is that inner classes will look like
my.package.to.OuterClass$InnerClass. The previous regex does look like it
would catch this and output the class as "OuterClass$InnerClass" which
might be what you want. Also, anonymous classes would look like
my.package.to.OuterClass$1 (or some other number depending on the number of
anonymous classes you have). This might not be an issue, but you may want
to be aware that you may have a '$' character in your output.

Jonathan
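To illustrate the point about '$', here is a small sketch using the regex from Stewart's post. The two log lines are made up (built from the example names above), and outerClassOf is a hypothetical helper for the case where only the outer class name is wanted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of how nested and anonymous class names come out.
final class NestedClassNames {
    static final Pattern CLASS_NAME = Pattern.compile("\\.([^.]*)\\:");

    // Returns the captured class name, or null when the line does not match.
    static String extract(String line) {
        Matcher m = CLASS_NAME.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Drops everything from the first '$' on, leaving the outer class name.
    static String outerClassOf(String className) {
        int dollar = className.indexOf('$');
        return dollar < 0 ? className : className.substring(0, dollar);
    }

    public static void main(String[] args) {
        // Invented log lines for illustration only.
        String inner = "2014-06-11 08:01:27,411 INFO my.package.to.OuterClass$InnerClass: started";
        String anonymous = "2014-06-11 08:01:27,412 INFO my.package.to.OuterClass$1: started";

        System.out.println(extract(inner));                   // OuterClass$InnerClass
        System.out.println(extract(anonymous));               // OuterClass$1
        System.out.println(outerClassOf(extract(anonymous))); // OuterClass
    }
}
```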
kingn@u.washington.edu [seajug]
2014-06-13 22:47:35 UTC
Permalink
Because the problem is a greedy search from the start of the string,
and because the start character is unique and no further search is needed
once the subsequent stop character is found, you could skip the regex and
simply iterate over the characters in the log statement, from index 0 to
the first ':'. At that point begin storing characters (or note startIndex),
then continue to the subsequent '.' and stop storing characters (or note
stopIndex).
If the log statement is already stored in a String or CharBuffer, you
can use charAt to read each character without creating new objects.
The comparison can use the characters' integer ASCII values and the == operator (comparing against ':' or '.').



For your specific problem, that should be far fewer steps than using Pattern and Matcher with a regular expression.
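A minimal sketch of such a single-pass scan with charAt follows. It makes assumptions that go beyond the description above: since the timestamp itself contains colons, the scan here remembers the most recent '.', forgets it at whitespace, and stops at the first ':' seen after a remembered dot. ClassNameScanner is a made-up name.

```java
// Hypothetical single-pass scanner; no regex, no intermediate objects
// until the final substring is produced.
final class ClassNameScanner {
    // Returns the text between the last '.' and the first ':' that follows it
    // within the same whitespace-delimited token, or null if there is none.
    static String extract(CharSequence line) {
        int lastDot = -1;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '.') {
                lastDot = i;                 // candidate start of the class name
            } else if (Character.isWhitespace(c)) {
                lastDot = -1;                // assumption: names contain no whitespace
            } else if (c == ':' && lastDot >= 0) {
                return line.subSequence(lastDot + 1, i).toString();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extract(
            "2014-06-11 08:01:27,411 INFO org.apache.hadoop.ipc.RpcServer: RpcServer.responder: starting"));
        // RpcServer
        System.out.println(extract(
            "2014-04-11 10:53:16,616 INFO  [RS_LOG_REPLAY_OPS-hor13n04:60020-1] handler.HLogSplitterHandler: successfully transitioned"));
        // HLogSplitterHandler
    }
}
```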
kingn@u.washington.edu [seajug]
2014-06-13 22:56:35 UTC
Permalink
Correction: that should have read a "reluctant" rather than "greedy" search "from the start of the string"...
Ross Bleakney rossbleakney-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org [seajug]
2014-06-14 00:28:01 UTC
Permalink
"instead of using a regex, you could simply iterate over the characters ..."

That is what I would do. I would string together a few org.apache.commons.lang.StringUtils calls into a method; then write a few unit tests and call it a day. Someone could come along later and rewrite my method using "magic" regular expressions, but as the previous commenter pointed out, that probably wouldn't save you anything. But as long as the unit tests work, it's all good.

Ross

'P.Hill' parehill1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org [seajug]
2014-06-16 07:39:26 UTC
Permalink
"2014-06-11 08:01:27,411 INFO org.apache.hadoop.ipc.RpcServer:
RpcServer.responder: starting"

Yeah I think I'd go for that colon after the full class name with some
simple String utils too.
How are you going to justify that a regular expression (which
you had to ask around to get right) is more robust than simply taking the
1st colon after the 1st dot?
A change in log format can screw either RE or any two step string scans.
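The "1st colon after the 1st dot" idea can be sketched with plain String methods, for example as below. TwoStepExtractor is a hypothetical name, and the last step (walking back from the colon to the nearest dot) is an assumption about how the simple name would be cut out of the fully qualified one.

```java
// Hypothetical two-step scan using only java.lang.String.
final class TwoStepExtractor {
    // Returns the simple class name, or null if the line has no dot
    // or no colon after the first dot.
    static String extract(String line) {
        int firstDot = line.indexOf('.');
        if (firstDot < 0) {
            return null;
        }
        int colon = line.indexOf(':', firstDot);
        if (colon < 0) {
            return null;
        }
        // Nearest dot before the colon; at worst this is firstDot itself.
        int lastDot = line.lastIndexOf('.', colon);
        return line.substring(lastDot + 1, colon);
    }

    public static void main(String[] args) {
        System.out.println(extract(
            "2014-06-11 08:01:27,411 INFO org.apache.hadoop.ipc.RpcServer: RpcServer.responder: starting"));
        // RpcServer
    }
}
```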

-Paul