grep (global regular expression print)
Tokenize input
This will print one word per line:
grep -oE '\w+'
-o, --only-matching - print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-E, --extended-regexp - interpret PATTERN as an extended regular expression.
\w+ - matches one or more word characters (letters, digits, and underscore).
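As a quick check of the command above, here is a minimal sketch that feeds it two lines of made-up sample text. The sample sentence and the variable name tokens are illustrative assumptions, and \w is a GNU extension that other grep implementations may not support.

```shell
# Made-up sample input: two lines with punctuation between some words.
tokens=$(printf 'the quick, brown fox\njumps over the lazy dog\n' | grep -oE '\w+')

# -o puts every match on its own line, so this prints one word per line;
# the comma after "quick" is not a word character and is dropped.
printf '%s\n' "$tokens"
```

Each run of word characters becomes its own output line, so the nine words in the sample come out as nine lines.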
Exclusions
Filter the words listed in the exclude-words file out of stdin.
grep -vwFf exclude-words
-v, --invert-match - invert the sense of matching, to select non-matching lines.
-w, --word-regexp - select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character.
-F, --fixed-strings - interpret PATTERN as fixed strings, not regular expressions.
-f FILE, --file=FILE - obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
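The two steps can be chained, as in the sketch below. It assumes a hypothetical two-word exclusion list written to a temporary directory, so the path differs from the bare exclude-words used above, and the sample sentence is made up.

```shell
# Hypothetical exclusion list, created in a scratch directory for the demo.
workdir=$(mktemp -d)
printf 'the\nover\n' > "$workdir/exclude-words"

# Tokenize, then drop every token that equals a listed word.
result=$(printf 'the other fox jumps over the lazy dog\n' \
  | grep -oE '\w+' \
  | grep -vwFf "$workdir/exclude-words")
printf '%s\n' "$result"

rm -rf "$workdir"
```

The token "other" survives even though it contains "the": with -w a pattern has to match a whole word, so only the exact tokens "the" and "over" are removed.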