What Does This Regular Expression Mean?

Discussion Overview

The discussion revolves around the interpretation of a specific regular expression: 'there \([^ ]*\)'. Participants explore its meaning, functionality, and nuances in the context of text processing using the `sed` command.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant interprets the regular expression as matching the word "there" followed by a space and then a non-space character, questioning if this interpretation is correct.
  • Another participant clarifies that it matches "there " followed by as many consecutive non-blank characters as possible, suggesting that it replaces this match with nothing.
  • A subsequent reply asks if this can be rephrased as matching everything up until a whitespace is found, indicating a possible misunderstanding of the term "whitespace."
  • Another participant points out that "whitespace" typically includes tab characters, confirming that the regex consumes characters until the first space character is encountered.
  • A later post emphasizes the distinction between "blank space" and "whitespace," noting that while "whitespace" includes tabs, "blank space" refers specifically to space characters.

Areas of Agreement / Disagreement

Participants express differing interpretations of the regular expression, particularly regarding the definitions of "blank space" and "whitespace." There is no consensus on a singular interpretation, and the discussion remains unresolved.

Contextual Notes

Participants highlight the importance of terminology, specifically the differences between "blank space" and "whitespace," which may affect the understanding of the regular expression's behavior.

James889
Howdy,

I came across a regular expression I couldn't get my head around.

Code:
' there \([^ ]*\)'

Code:
echo "Howdy there neighbor" | sed 's/there \([^ ]*\)//'

returns "Howdy " (with a trailing space).

It's the subgroup that's a bit confusing.

Match any sentence which contains "there", then a space, and then a non-space character.

Is this the correct way of interpreting this regular expression?
 
So, basically, it matches "there " (the word "there" followed by a blank space), followed by as many consecutive non-blank characters as it can find ("[^ ]*"), and replaces all of that with nothing.

You can check that it only replaces what I said by testing it with "Howdy there neighbor what up?"

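Oh, and the backslashes are not about quoting: sed uses basic regular expressions by default, where \( and \) mark a group (a subexpression) and unescaped parentheses match literal parentheses. The group is never used here, since the replacement is empty.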
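
For anyone who wants to try this, here is a small sketch of the test suggested above. The variable names (input, removed, kept, one_char) are just made up for illustration, and it assumes a POSIX-style sed using its default basic regular expression syntax.

```shell
input="Howdy there neighbor what up?"

# Only "there " plus the following run of non-space characters is removed.
# The space before "what" is not part of the match, so two spaces remain.
removed=$(echo "$input" | sed 's/there \([^ ]*\)//')
echo "$removed"     # Howdy  what up?

# \( \) captures the run of non-space characters; \1 puts it back,
# so only "there " disappears.
kept=$(echo "$input" | sed 's/there \([^ ]*\)/\1/')
echo "$kept"        # Howdy neighbor what up?

# Without the *, [^ ] matches exactly one non-space character.
one_char=$(echo "$input" | sed 's/there \([^ ]\)//')
echo "$one_char"    # Howdy eighbor what up?
```

The last command shows why the * matters: without it, only the "n" of "neighbor" is consumed.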
 
gsal said:
.. followed with as many consecutive non-blank spaces as it can find "[^ ]*" and replaces that with nothing.


Is this the same as saying 'match as much as possible up until a white space is found'?
 
"White space" would normally include tab characters. This will eat everything up until the first space character; that detail aside, yes.
 
Sorry, I guess I need to be more correct, like Ibix says.

The expression "[^ ]*" will consume one character after another until it finds a "blank space" character or reaches the end of the line.

A "blank space" character is not the same as "white space" in general; it is a subset. If the regular expression is looking for "white space", then both "blank space" and "tab" characters qualify, but if it is looking for "blank space", then a "tab" is a totally different character.
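
A quick way to see the difference is to put a tab into the input. This is only a sketch: the test line and the variable names are invented for illustration, and the second command swaps in the POSIX character class [[:space:]] as one possible way to say "any whitespace", which is not something the original expression does.

```shell
# Assumed test line: "neighbor" and "friend" are separated by a tab, not a space.
line=$(printf 'Howdy there neighbor\tfriend now')

# [^ ]* stops only at a space, so the tab and "friend" are consumed too.
blank_only=$(printf '%s\n' "$line" | sed 's/there \([^ ]*\)//')
printf '%s\n' "$blank_only"    # Howdy  now

# [^[:space:]]* stops at any whitespace, so the match ends at the tab.
any_space=$(printf '%s\n' "$line" | sed 's/there \([^[:space:]]*\)//')
printf '%s\n' "$any_space"     # Howdy <TAB>friend now
```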
 
