While we all love Splunk's ability to recursively eat entire directories of files in real time, there are some cases where you want to tell Splunk "whoa nelly, not so fast."
For example: I've decided I want to monitor the /var/log directory on my Mac laptop, but I really don't need those old rolled-over gzipped log files.
Normally, you just add an entry to $SPLUNK_HOME/etc/system/local/inputs.conf to eat a directory, like this:
[monitor:///mnt/logs]
disabled=false
In my case, I'd like that monitor to cruise through my directories and eat everything except files that have the ".gz" extension. To do that, just add a "_blacklist" entry to the stanza containing a regular expression (regex) that matches the files you want to skip. In my case, the inputs.conf stanza now looks like this:
[monitor:///var/log]
_blacklist = \.gz$
Of course, if I had multiple extensions I wanted to block, I could use an "or" (alternation) regex, like this:
_blacklist = \.(txt|gz|tgz|bz2)$
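If you want to sanity-check a blacklist regex before handing it to Splunk, a quick script can help. The following is only a rough sketch: it uses Python's re module as a stand-in for Splunk's own regex engine (an assumption, though this simple pattern should behave the same in both), and the file names and the is_blacklisted helper are made up for illustration.

```python
import re

# Same pattern as the _blacklist line above. Python's re module stands in
# for Splunk's regex engine here (an assumption for this sketch).
BLACKLIST = re.compile(r"\.(txt|gz|tgz|bz2)$")


def is_blacklisted(path):
    """Return True if the blacklist regex matches somewhere in the path."""
    return BLACKLIST.search(path) is not None


# Hypothetical file names, invented for illustration only.
samples = [
    "/var/log/system.log",
    "/var/log/system.log.0.gz",
    "/var/log/notes.txt",
    "/var/log/archive.tgz",
    "/var/log/gzip.log",
]

for path in samples:
    verdict = "skip" if is_blacklisted(path) else "index"
    print(f"{verdict:5} {path}")
```

Note that "gzip.log" is not skipped: the "$" anchor and the escaped dot mean only names that actually end in one of the listed extensions match.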
Note: You can also use "_whitelist", which works the other way around: it disallows everything BUT the files that match the whitelist's regex.
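To make the whitelist behavior concrete, here is a small sketch of the idea as described above: only paths matching the regex get through. This is an approximation, not Splunk's actual implementation. The "\.log$" pattern, the monitored helper, and the file names are all hypothetical, and Python's re module again stands in for Splunk's regex engine.

```python
import re

# Hypothetical whitelist pattern: only files ending in ".log" are kept.
WHITELIST = re.compile(r"\.log$")


def monitored(paths, whitelist=WHITELIST):
    """Return only the paths that match the whitelist regex."""
    return [p for p in paths if whitelist.search(p)]


# Hypothetical file names, invented for illustration only.
candidates = [
    "/var/log/system.log",
    "/var/log/system.log.0.gz",
    "/var/log/install.log",
    "/var/log/notes.txt",
]

for path in monitored(candidates):
    print(path)
```

In inputs.conf terms, this would correspond to a "_whitelist = \.log$" line in the monitor stanza, written the same way as the "_blacklist" examples.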
More info is available in Splunk's docs as well.