How to POSIX-ly ignore "warning: command substitution: ignored null byte in input"?

Question

Today, I was working with my Raspberry Pi 4 running Debian 12 Bookworm, and I found it odd that some system text files, namely these two that I needed:

/sys/firmware/devicetree/base/model
/sys/firmware/devicetree/base/serial-number

were, upon reading and storing in a variable in bash like this:

rpi_model_name=$(cat /sys/firmware/devicetree/base/model)
rpi_serial_number=$(cat /sys/firmware/devicetree/base/serial-number)

producing the following warning

warning: command substitution: ignored null byte in input

I tried to get rid of the warning by redirecting stderr to /dev/null; for instance:

rpi_model_name=$(cat /sys/firmware/devicetree/base/model 2>/dev/null)

or

rpi_model_name=$(cat /sys/firmware/devicetree/base/model) 2>/dev/null

but we still get that annoying warning:

-bash: warning: command substitution: ignored null byte in input

I have just two shells installed, bash and dash. dash behaves normally, i.e. without that warning, while bash is more vocal about it. So, at this point, I can't be sure whether other shells emit this or a similar warning too.

My goal is to suppress this warning for all shells. I do not care about its origin or value, I just want it gone.

The solution must be written POSIX-ly for portability.

Maybe related/helpful: Suppress execution trace for echo command? (at Super User), and Suppress stderr messages in a bash script. — G-Man Says 'Reinstate Monica', Commented Jul 6 at 20:42
The relevance is that you’re getting an error message from the shell, so redirecting stderr from a command that you’re running in the shell doesn’t make the message go away.   You need to redirect stderr for the shell. … (Your question would be clearer if you showed exactly how you tried “redirecting stderr to /dev/null”.) — G-Man Says 'Reinstate Monica', Commented Jul 6 at 22:52
If a text file contains a NUL character, it isn't a Text File, as per POSIX definition. This is a way for programs to detect "binary" files. — Vilinkameni, Commented Jul 7 at 5:35
@Vilinkameni: I agree that, by definition, anything that contains null bytes isn’t text, and Stéphane Chazelas points out the risk in simply disregarding the warning. And I agree that warning messages (like the ‘Oil’ light on your dashboard) often serve to alert the user to conditions requiring human attention. … (Cont’d) — G-Man Says 'Reinstate Monica', Commented Jul 7 at 18:34
(Cont’d) … But I quibble with your wording: This is NOT a way for programs to detect "binary" files, because the script has no way of knowing that the shell issued a warning message to the user. (Or am I overlooking something?) — G-Man Says 'Reinstate Monica', Commented Jul 7 at 18:34

Marcus Müller · Accepted Answer · 2024-07-07 15:46:57Z

Ignoring or explicitly deleting an unknown number of zero bytes without understanding where they come from seems like a bad idea! So, let's actually look at where that comes from and in which cases we might want to remove it.

First off, why the error? A shell string itself is a zero-terminated string (a "C string" if you're familiar with that programming language) internally. So, it can't contain the 0-byte; that always signals the end of the string. Hence, the moment you try to store something with a zero byte in a string in bash (or any other POSIX shell), the string is "cleansed" of all 0-bytes. Otherwise, you'd save the whole output into that string variable, but whenever you read it, you would only get the characters up to (but excluding) the first 0-byte. Bash programmers know this to be a great source of bugs, so they warn you about it.

Now, can we safely remove these bytes from the input without breaking it?

What these files contain, according to the kernel documentation:

The contents of each file is the exact binary data from the device tree.

And as such, the devicetree string type properties end in the 0-byte, and stringlist type properties are multiple 0-terminated strings just concatenated together. In your case, the model field hence correctly ends in a zero byte; that's part of the exact binary data, and hence also present in the sysfs entry.
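To check for the terminating 0-byte without going through a shell variable at all, something like the following sketch may help. It uses a temporary file with made-up contents as a stand-in for a real property (the text "Example Board Rev 1.0" and the variable names are assumptions for illustration); on a real system you would point it at /proc/device-tree/model instead.

```shell
# Stand-in for a devicetree "string" property: some text followed by one
# 0-byte. The contents are invented for this sketch.
tmp=$(mktemp)
printf 'Example Board Rev 1.0\0' > "$tmp"

# Delete everything except 0-bytes, then count what is left.
# $(( ... )) strips the padding that some wc implementations add.
nul_count=$(( $(tr -cd '\0' < "$tmp" | wc -c) ))

# Total size in bytes, for comparison.
total=$(( $(wc -c < "$tmp") ))

printf '0-bytes: %d of %d bytes in total\n' "$nul_count" "$total"
rm -f "$tmp"
```

A count of exactly one suggests a plain string property; a larger count would hint at a stringlist.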

Since we, however, know that it's going to be the last byte,

rpi_model_name="$(head -c -1 /proc/device-tree/model)"

would work. And, bonus, if that is not actually a single string but multiple strings concatenated together (you'd see that e.g. in compatible entries!), you'd still get a warning instead of something smashed together without spaces.

Of course, you might want to handle this more generally: if something is a stringlist, you might still want to read it and then use it (e.g. for printing) element-wise or concatenated e.g. with newlines or spaces.

function read_devicetree_node() {
  mapfile -d '' result < "$1"
}

## use like this

read_devicetree_node /proc/device-tree/model
# save the result in a variable. 
# Strange syntax, because we want to save *all* entries from `result` in `model_names`.
model_names=( "${result[@]}" )
# print the first element from that `result` array we just got
printf 'First element is %s. Total elements %d.\n' "${model_names[0]}" "${#model_names[@]}"
printf 'All elements concatenated with "foobar" in between:'
printf '%sfoobar' "${model_names[@]}"

function simple_devicetree_string() {
  local tmp
  mapfile -d '' tmp < "$1"
  printf '%s ' "${tmp[@]}"
}

printf "Let's read this simply into a line, safely: %s\n" "$(simple_devicetree_string /proc/device-tree/model)"
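For shells without arrays or mapfile, a rough POSIX sh counterpart is to turn each terminating 0-byte into a newline with tr, so that a stringlist ends up as one entry per line. This is only a sketch: it assumes the entries themselves contain no newlines, and it uses a temporary file with made-up compatible-style contents in place of a real devicetree node.

```shell
# Stand-in for a "stringlist" property: two 0-terminated strings.
# The entries are invented for this sketch.
tmp=$(mktemp)
printf 'vendor,board-b\0vendor,soc\0' > "$tmp"

# Turn every 0-byte into a newline. Command substitution then strips the
# final newline, leaving one entry per line and no 0-bytes.
entries=$(tr '\0' '\n' < "$tmp")

# Count the entries; $(( ... )) strips padding some wc versions add.
count=$(( $(printf '%s\n' "$entries" | wc -l) ))

printf 'Found %d entries:\n%s\n' "$count" "$entries"
rm -f "$tmp"
```

Since tr replaces the 0-bytes before the shell ever sees them, no shell has anything to warn about.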

Other than that, you shouldn't access these paths using /sys/firmware/devicetree/base directly, but use /proc/device-tree/ as the prefix. That's guaranteed to still be valid after the next kernel update, whereas /sys/firmware/devicetree/ might be gone. (These aren't actually files stored somewhere: this is just the parsed devicetree structure represented in sysfs. It is calculated by the kernel when you read it, not read from some file stored on a disk. So, their existence is basically the API to the kernel here, and the kernel developers say "please access via /proc, the other part is what we currently use, but we can't guarantee that won't change. The /proc path, we guarantee.".)

"A shell string itself is a zero-terminated string (a "C string" if you're familiar with that programming language) internally". It doesn't have to be and is not in zsh for instance. That limitation only affects some system calls such as all those that take file paths (and why a file path can't contain NULs) and execve() (which means arguments and env var names and values passed to commands executed with execve() can't contain NULs; but there's no such constraint for functions or shell builtins). — Stéphane Chazelas, Commented Jul 7 at 12:57
There are a few missing quotes which make that code actually quite unsafe. — Stéphane Chazelas, Commented Jul 8 at 7:34
Note the better readarray name of bash's mapfile misnomer. You may want to use the -t option to strip the delimiter to be future proof in case bash some day supports storing NULs in its variables. — Stéphane Chazelas, Commented Jul 8 at 7:35
Note the OP asked for a POSIX compatible solution. My understanding is that they mention bash as it's one (of many) implementations of an interpreter for the POSIX sh language. None of mapfile/readarray/head -c-1/function/local/arrays are in POSIX. — Stéphane Chazelas, Commented Jul 8 at 7:37
@StéphaneChazelas I got that, but they're solving a bash-specific issue, so a bash-independent solution is "ignore the problem, it only exists on bash", and that's pretty certainly not what they asked for! — Marcus Müller, Commented Jul 8 at 8:14

Stéphane Chazelas · Accepted Answer · 2024-07-08 06:34:37Z

If the question is how to ignore the warnings or errors that the shell outputs when you try to do something not supported such as here a command substitution with a command that outputs NULs in the GNU implementation of sh (bash), then as @GMan says, the best you can do is:

{ <potentially-unsupported-stuff>; } 2> /dev/null

The shell could also decide to abort in addition to or instead of writing an error message, which using a subshell might avoid:

(<potentially-unsupported-stuff>) 2> /dev/null

But even then, if the <potentially-unsupported-stuff> is a syntax error, that won't help:

$ bash -c '( if ) 2> /dev/null; echo not reached'
bash: -c: line 1: syntax error near unexpected token `)'
bash: -c: line 1: `( if ) 2> /dev/null; echo not reached'

And of course in your case using a subshell won't do as you'll lose the value of the assignment.

Instead, you can do, POSIXly:

command eval '<potentially-unsupported-stuff>' 2> /dev/null

So here:

command eval 'var=$(command-that-outputs-non-text)' 2> /dev/null

POSIX requires a non-interactive shell to exit when a special builtin such as eval fails (which bash ignores when not in POSIX mode), but prefixing it with command¹ prevents that.

So it would discard all errors by the shells whilst evaluating the code passed to eval as well as the errors by the commands run during that evaluation and would also be less likely to cause the shell to abort, while still not running a subshell.
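As a quick way to try this out, here is a small sketch. emit_with_nul is a made-up stand-in for a command whose output contains a 0-byte, and val is just an illustrative variable name. It only shows that the assignment survives (no subshell is involved) and that nothing is written to stderr; what ends up after the first three characters still depends on the shell.

```shell
# Made-up stand-in for a command that outputs non-text (a trailing 0-byte).
emit_with_nul() { printf 'abc\0'; }

# Evaluate the assignment with the shell's own stderr discarded.
# No subshell is involved, so val remains set afterwards.
command eval 'val=$(emit_with_nul)' 2> /dev/null

# Only the leading "abc" is predictable; what the shell does with the
# 0-byte is shell-specific.
case $val in
  abc*) printf 'val was assigned and starts with abc\n' ;;
  *)    printf 'unexpected value\n' ;;
esac
```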

Now, that doesn't make that portable. Example:

$ cat test.sh
command eval 'var=$(printf "\61\200\62\0\63\12\12")' 2> /dev/null
printf %s "$var" | od -An -vto1 -tc
$ ARGV0=sh zsh ./test.sh
 061 200 062 000 063
   1 200   2  \0   3
$ ARGV0=sh dash ./test.sh
 061 200 062 063
   1 200   2   3
$ ARGV0=sh bash ./test.sh
 061 200 062 063
   1 200   2   3
$ ARGV0=sh ksh ./test.sh
 061 200 062
   1 200   2
$ ARGV0=sh yash ./test.sh
 061
   1
$ locale charmap
UTF-8

(where ARGV0=sh is my shell (zsh) way to pass sh as argv[0]).

It's simply not possible to store non-text in a sh variable portably.

NUL is a problem for all shells except zsh. Some shells remove them in command substitutions (some with a warning such as bash), some don't but as they work internally with C-style NUL-delimited strings end up discarding it and everything that follows.

NUL is not the only problem as seen in yash's output: in a locale that uses UTF-8 as charmap, that sequence of bytes cannot be decoded into text, and yash stops at the first decoding error.

And you see that all strip the two 012 bytes (the encoding of newline on ASCII-based systems) as required by POSIX.

What you can do is store some text encoding of that output.

In the POSIX tool chest, you can use od or uuencode for that, though to be able to use it later, uuencode would be more useful as you can use uudecode to decode it:

$ var=$(printf '\61\200\62\0\63\12\12' | uuencode -)
$ printf '%s\n' "$var" | uudecode | od -An -vto1 -tc
 061 200 062 000 063 012 012
   1 200   2  \0   3  \n  \n

See how all 7 bytes were preserved.
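Where uuencode is not available (on some systems it is packaged separately), an od-based round trip can be sketched along these lines. The variable encoded and the function decode are names made up for this example, and calling printf once per byte is slow, so treat it as an illustration for small inputs rather than a general-purpose tool.

```shell
# Encode: one 3-digit octal number per byte, so the variable holds text only.
encoded=$(printf '\61\200\62\0\63\12\12' | od -An -vto1)

# Decode: hand each octal number back to printf as a \ooo escape.
# $encoded is left unquoted on purpose so that it splits on whitespace
# (this assumes the default IFS).
decode() {
  for octal in $encoded; do
    printf "\\$octal"
  done
}

# Show the decoded bytes again in od's format for comparison.
decode | od -An -vto1 -tc
```

The decoded stream has to be piped or redirected somewhere; storing it back in a variable would run into the same limitations as before.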

Beware printf might still fail for relatively small strings with shells where printf is not builtin (such as ksh88 and pdksh and most of its derivatives) on systems where there's a limit on the size of arguments+environment passed to an executed command (most).

If the question is how to portably remove NULs from the output of a command without relying on specific shells doing it by themselves in their command substitutions like bash does with a warning (as an extension to or in conflict with the standard, as it's not clear to me what POSIX has to say about it), then yes:

cmd_output_without_NULs_and_trailing_newlines=$(
  cmd | tr -d '\0'
)

Or:

file_contents_without_NULs_and_trailing_newlines=$(
  <file tr -d '\0'
)

is the way to go, but note that it still removes trailing 0xA bytes on ASCII-based systems and can still fail if the output/contents cannot be decoded as text in the current locale in some shells such as yash.

To preserve the trailing newlines (and the exit status), as usual:

file_contents_without_NULs=$(
  <file tr -d '\0'
  ret=$?
  echo .
  exit "$ret"
)
ret=$?
file_contents_without_NULs=${file_contents_without_NULs%.}

If the question is how to POSIXly decode the output of a command made of several concatenated NUL-delimited strings such as the output of find -print0 into separate shell parameters, then since the 2024 edition of the POSIX standard, you can do:

cmd | {
  set --
  while IFS= read -rd '' var; do
    set -- "$@" "$var"
  done
  # rest of the script that needs to process those strings in
  # the positional parameters must go here, as this part runs
  # in a subshell in some shells such as bash
  printf 'There are %d strings and the first is "%s"\n' "$#" "$1"
}

But not all shells conform to that yet. In particular dash, the sh implementation on many GNU/Linux systems, doesn't: its read doesn't support -d yet as of July 2024.
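A script could probe at run time whether the running sh has a usable read -d before relying on the loop above. The following is a minimal sketch; have_read_d and probe are made-up names, and the check only covers the empty-delimiter case used here.

```shell
# Probe inside a subshell with stderr discarded, so that an unsupported
# option neither prints a diagnostic nor affects the main shell.
if (printf 'a\0' | { IFS= read -r -d '' probe && [ "$probe" = a ]; }) 2> /dev/null
then
  have_read_d=yes
else
  have_read_d=no
fi

printf 'read -d supported: %s\n' "$have_read_d"
```

When have_read_d is no, the script needs an approach that does not depend on read -d.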

An alternative is to do:

eval "$(
  cmd |
    LC_ALL=C od -An -vtu1 |
    LC_ALL=C awk -v q="'" '
      BEGIN {
        for (i = 1; i < 256; i++) {
          c[i] = sprintf("%c", i)
          if (c[i] == q) c[i] = q "\\" q q
        }
        printf "set --"
        started = 0
      }
      {
        for (i = 1; i <= NF; i++) {
          if (!started) {
            printf " " q
            started = 1
          }
          n = $i
          sub(/^0+/, "", n) # remove leading 0s that some od
                            # implementations add.
          if (n == "") {
            printf q
            started = 0
          } else printf "%s", c[n]
        }
      }
      END {if (started) printf q}'
  )"

That's still not POSIX as POSIX allows systems where the encoding of ' could vary between locales or locales with charsets with a shift state (where a given byte or byte sequence can represent different characters depending on context), but those are not workable anyway and you won't find them in any locale by default on GNU/Linux based systems (where in practice all locale charsets are supersets of ASCII and locales with charsets with shift states are not enabled by default and not properly supported (not that it's possible to properly support them)).

¹ Note that it doesn't work in zsh when not in sh emulation, where command (which predates POSIX's) is for running an external command rather than only bypassing functions and removing the specialness of special builtins.

Vlastimil Burián · Accepted Answer · 2024-07-06 22:50:19Z

This worked for me on Raspberry Pi 4 with Debian 12 Bookworm, and it should be a portable solution (adhering to POSIX).

We can delete all of the NUL characters using the tr utility (POSIX man page). (In my case, we know that each file holds just one string with a single NUL character at its end, so we're not removing multiple NUL characters, just the one at the end of the string.)

Let's take the first mentioned file as an example (It may be better to read /proc/device-tree/model as per Marcus Müller's answer):

rpi_model_name=$(cat /proc/device-tree/model|tr -d '\0')

or even getting rid of cat:

rpi_model_name=$(tr -d '\0' </proc/device-tree/model)

The following printf can be interchanged with echo, whichever you are more comfortable with:

printf '%s\n' "$rpi_model_name"

should output something like:

Raspberry Pi 4 Model B Rev 1.5

This may not be the only way, feel free to post your own, or improve this answer.

The same content can be found in an answer on stackoverflow: stackoverflow.com/questions/46163678/… The answers there are quite in-depth! Maybe you want to adopt some of the info from there or link to that answer in your answer. — Marcus Müller, Commented Jul 6 at 20:17

G-Man Says 'Reinstate Monica' · Accepted Answer · 2024-07-07 23:04:15Z

The error message you are getting is from the shell. To ignore/suppress it, you need to redirect stderr for the shell. When you do

rpi_model_name=$(cat /sys/firmware/devicetree/base/model 2>/dev/null)

or

rpi_model_name=$(cat /sys/firmware/devicetree/base/model) 2>/dev/null

you are only redirecting stderr for the cat command.

A simple way to redirect stderr for the shell is to follow the pattern shown in my answer to Suppress execution trace for echo command. E.g., incorporating the filenames suggested by Marcus Müller:

{ rpi_model_name=$(cat /proc/device-tree/model); }            2> /dev/null
{ rpi_serial_number=$(cat /proc/device-tree/serial-number); } 2> /dev/null

or

{
  rpi_model_name=$(cat /proc/device-tree/model)
  rpi_serial_number=$(cat /proc/device-tree/serial-number)
} 2> /dev/null

Or use the other techniques presented in related questions; e.g., exec 2> /dev/null, which, of course, suppresses all error messages generated within the script. This is not really a good idea, as it blinds you to problems.

edited Jul 7 at 23:04

answered Jul 6 at 23:26


That removes the warnings/errors if any but yields unspecified behaviour. Depending on the shell, you typically find 4 different types of behaviour: 0 bytes removed with a warning, 0 bytes removed without warning, everything starting with the first 0 byte discarded, 0 bytes preserved (in zsh, the only shell that can store NULs in its variables). In any case, the trailing newlines are removed as an effect of command substitution. — Stéphane Chazelas, Commented Jul 7 at 12:53
@VlastimilBurián: Perhaps you need to refine your question. I (not-so-humbly) believe that my answer is the best answer to the question, “How can I ignore/suppress an error/diagnostic message from the shell?” I will concede that my answer is NOT the best answer to the question, “What should I do if I get a ‘command substitution: ignored null byte in input’ message (from the shell) as a result of running a command that I believe that I want to run?” It’s the age-old question of “Do I cover the ‘Oil’ light on my dashboard with opaque tape, or do I get an oil change?” :-) — G-Man Says 'Reinstate Monica', Commented Jul 7 at 18:12

Vilinkameni · Accepted Answer · 2024-07-15 18:20:22Z

The command

rpi_model_name=$(cat /sys/firmware/devicetree/base/model 2>/dev/null)

won't suppress the warning in Bash because it is generated by the variable assignment in the executing shell (Bash) itself, not by cat(1).

The answer by @G-Man Says 'Reinstate Monica' mentions the redirection of stderr by exec(1p), but not that it also can be put into a subshell:

(exec 2>/dev/null
 rpi_model_name=$(cat /sys/firmware/devicetree/base/model)
 # ... some code not relying on stderr, and using rpi_model_name
)
printf "Test\n" >&2

will print Test to stderr.

Note that POSIX has the following to say about the character NUL related to the values of variables:

(8.1 Environment Variable Definition)

These strings have the form name=value; names shall not contain the character '='. For values to be portable across systems conforming to POSIX.1-2017, the value shall be composed of characters from the portable character set (except NUL and as indicated below).
[...]
The values that the environment variables may be assigned are not restricted except that they are considered to end with a null byte and the total space used to store the environment and the arguments to the process is limited to {ARG_MAX} bytes.

Also, this is mentioned about NUL in command substitution:

(2.6.3 Command Substitution)

$(command)
or (backquoted version):
`command`

The shell shall expand the command substitution by executing command in a subshell environment [...] If the output contains any null bytes, the behavior is unspecified.

Essentially, as far as POSIX is concerned, NULs in command output in command substitution might produce unspecified behavior, and variable values containing NUL should end with it.

var=$(command) isn't an env var unless var was previously exported or set -a is in effect, and AFAIK no similar restriction applies to shell vars — dave_thompson_085, Commented 1 hour ago
