• Regex to match lines containing multiple strings in Java

    by  • July 30, 2009 • java, regex • 2 Comments

    I spent a disproportionate amount of time today trying to get my head round some Java regular expression code that would match lines in a log file containing any of a number of given words. Thanks to this entry on Stack Overflow, which posed a similar question, I converted the code from C# to Java (adding more escape characters along the way, as Java doesn’t support C#’s ‘@’ verbatim string prefix). I then wrapped it in a function to give me something that could dynamically generate a regular expression from an array of strings (note the following code uses the excellent commons lang library hosted at Apache in order to get access to the StringUtils.join method, filling in a couple of gaps):


    public static String constructRegexOr(String[] str)
    {
    return String.format("\\b(%s)\\w*\\b", StringUtils.join(str,"|"));
    }

    Essentially, the strings in the array are joined with the regex alternation operator, which says that if a string contains any (or all) of str[0], str[1], …, it is regarded as a match by the regular expression matcher.
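    As a rough sketch of the same idea without the Commons Lang dependency, the alternation can be built with the standard library alone. The class name RegexOrSketch is made up for illustration, and the use of Pattern.quote is an addition that is not in the code above: it assumes the words should be treated as literal text rather than as regex fragments.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original post.
class RegexOrSketch {

    // Joins the words with "|" and wraps them in the same \b(...)\w*\b template.
    // Assumption: each word is quoted so that characters such as '.' or '+'
    // are matched literally.
    static String constructRegexOr(String... words) {
        String alternation = Arrays.stream(words)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return String.format("\\b(%s)\\w*\\b", alternation);
    }

    public static void main(String[] args) {
        String regex = constructRegexOr("apples", "and", "oranges");
        // find() looks for the pattern anywhere in the input
        System.out.println(Pattern.compile(regex).matcher("oranges and lemons").find());
    }
}
```

    If the words are meant to be regex fragments themselves, the quoting step would have to go.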

    Excellent, I thought. I tested some C# code by creating an instance of the Regex class using the regular expression provided on Stack Overflow, and used the IsMatch(str) method to test it worked, and it did, returning true/false where appropriate.

    In Java, I had some issues: using str.matches(regex) and the longer-winded way:


    ...
    Pattern p = Pattern.compile(regex);


    // matches() only succeeds if the whole of str matches the regex
    if(p.matcher(str).matches()) {
    return entry;
    }

    Did not work as I expected. On a hunt, I discovered this article, which says with specific reference to the Java regular expression implementation:

    “regex” is applied as if you had written “^regex$” with start and end of string anchors

    and

    This is different from most other regex libraries

    Which was a revelation – with a slight modification, by using find() instead of matches(), i.e.


    ...
    Pattern p = Pattern.compile(regex);


    if(p.matcher(str).find()) {
    return entry;
    }
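    To make the difference concrete, here is a small sketch that runs both calls against the same input. The class and method names are invented for the example. The pattern is the one generated from the test words used in this post.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class MatchesVsFind {

    static final Pattern WORDS = Pattern.compile("\\b(apples|and|oranges)\\w*\\b");

    // True only if the entire input matches the pattern.
    static boolean wholeInputMatches(String input) {
        return WORDS.matcher(input).matches();
    }

    // True if any part of the input matches the pattern.
    static boolean containsMatch(String input) {
        return WORDS.matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "oranges and lemons";
        System.out.println(wholeInputMatches(line));      // false: the spaces and "lemons" are not covered
        System.out.println(containsMatch(line));          // true: "oranges" is found at the start
        System.out.println(wholeInputMatches("oranges")); // true: the whole input is one of the words
    }
}
```

    String.matches(regex) behaves like matches() here, so it has the same whole-input requirement.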

    The regular expression matched as expected! I’ve pasted the implementation below (with some test values!) — my ‘gotcha’ of the day.


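    // Needs java.util.regex.Pattern and org.apache.commons.lang.StringUtils
    String[] str = new String[] { "apples","and","oranges"};
    String regex = String.format("\\b(%s)\\w*\\b", StringUtils.join(str,"|"));
    String strToTest = "oranges and lemons";


    Pattern p = Pattern.compile(regex);
    // find() succeeds if any part of strToTest matches the regex
    boolean matches = p.matcher(strToTest).find(); //true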
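    Applied to the original problem of picking lines out of a log, the same find() call can be run once per line. The following is only a sketch: the class name, the method name and the sample log lines are all made up, and a real log would be read from a file rather than from a hard-coded list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical example class, not part of the original post.
class LogLineFilter {

    // Returns the lines in which the pattern is found at least once.
    static List<String> matchingLines(List<String> lines, Pattern pattern) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\b(apples|and|oranges)\\w*\\b");
        // Invented sample data standing in for a log file.
        List<String> log = Arrays.asList(
                "2009-07-30 INFO apples delivered",
                "2009-07-30 WARN nothing to report",
                "2009-07-30 INFO oranges and lemons",
                "2009-07-30 DEBUG sandwich order");
        for (String hit : matchingLines(log, p)) {
            System.out.println(hit);
        }
    }
}
```

    The "sandwich" line is skipped because the leading \b requires "and" to start at a word boundary.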

    About

    .NET developer at thetrainline.com, previously web developer at MRM Meteorite. Awarded a PhD in misbehaviour detection in wireless ad-hoc networks. A keen C# ASP.net developer bridging the gap with APIs and JavaScript frameworks, one web app at a time.

    http://www.paulkiddie.com

    2 Responses to Regex to match lines containing multiple strings in Java

    1. October 28, 2011 at 3:50 am

      I made slight modifications to your example, simulating the input from a text field (like Google’s): replacing multiple spaces, trimming, and then inserting the “|” operator. So I don’t need a String array.

      String _toMatch = jtextField.getText().replaceAll("\\b\\s{2,}\\b", " ").trim().replaceAll(" ", "|");

    2. bharathkumar.ks@sc.com
      December 11, 2011 at 6:57 pm

      Can you please share the entire code which you used? I am trying to do the same by passing in a file which I read line by line.
