You may run into trouble with non-printable characters in your files. You can see such characters if you open the file in an editor like vi. Even though commands like “cat” don’t show such characters visibly on the console by default, they are still in the file, so you can’t remove them by redirecting “cat” output to a different file.
Here is a way to remove non-printable characters with a combination of sed and tr commands.
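Use sed with the “l” (lowercase L) command to print the file in a “visually unambiguous” form, where each non-printable character is shown as an escape sequence and the end of each line is marked with “$”. From the sed output, find the notation of the character that needs to be removed.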
# sed -n 'l' filename.txt
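As a rough illustration of what the “l” output looks like, the sketch below builds a small throwaway file with printf and prints it through sed. The file contents and the variable names sample and visible are made up for this example, and mktemp is assumed to be available on your system.

```shell
# Create a temporary sample file containing a form feed, a tab and a
# carriage return (the contents are invented for this example).
sample=$(mktemp)
printf 'page one\fpage two\ttabbed\r\n' > "$sample"

# Print the file in the "visually unambiguous" form.
visible=$(sed -n 'l' "$sample")
printf '%s\n' "$visible"
# prints: page one\fpage two\ttabbed\r$

rm -f "$sample"
```

Here “\f”, “\t” and “\r” are the notations to look for, and the trailing “$” only marks the end of the line; it is not part of the file.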
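Remove the control characters you found in the sed output using “tr” or “sed”. Both commands write the cleaned text to standard output, so redirect it to a new file if you want to keep the result.
Suppose you want to delete every line that contains the form feed character “\f” from filename.txt. Use sed for this, since tr works on individual characters and cannot drop whole lines. The “\f” escape inside a sed pattern is understood by GNU sed and may not work in other sed implementations.
# sed '/\f/d' filename.txt
You can use either of the commands given below if you want to remove only the control characters themselves, not the entire line containing them. Note that tr reads standard input and does not accept a file name, and that sed needs the “g” flag to remove every occurrence on a line rather than just the first.
# tr -d '\f' < filename.txt
# sed 's/\f//g' filename.txt
To replace each form feed with a space instead of deleting it, use:
# tr '\f' ' ' < filename.txt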
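To make the difference between dropping lines and dropping characters concrete, here is a small sketch run against a made-up three-line file. The variable names are hypothetical, and the form feed is passed to sed through a shell variable so that the example does not depend on sed understanding “\f” itself (an assumption that holds for GNU sed but not necessarily elsewhere).

```shell
ff=$(printf '\f')      # a literal form feed character
sample=$(mktemp)
printf 'keep this line\nbad%sline\nlast line\n' "$ff" > "$sample"

# Delete every line that contains a form feed.
lines_dropped=$(sed "/$ff/d" "$sample")

# Delete only the form feed characters; tr reads from standard input.
chars_dropped=$(tr -d '\f' < "$sample")

# Same result with sed; the g flag handles several matches per line.
chars_dropped_sed=$(sed "s/$ff//g" "$sample")

# Replace each form feed with a space instead of deleting it.
spaced=$(tr '\f' ' ' < "$sample")

printf '%s\n---\n%s\n---\n%s\n' "$lines_dropped" "$chars_dropped" "$spaced"
rm -f "$sample"
```

The first result has two lines left, the second keeps all three lines with “badline” joined together, and the third keeps them with “bad line” separated by a space.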
The control characters in ASCII still in common use include:
* 0 (null, \0, ^@), originally intended to be an ignored character, but now used by many programming languages to mark the end of a string.
* 7 (bell, \a, ^G), which may cause the device receiving it to emit a warning of some kind (usually audible).
* 8 (backspace, \b, ^H), used either to erase the last character printed or to overprint it.
* 9 (horizontal tab, \t, ^I), moves the printing position some spaces to the right.
* 10 (line feed, \n, ^J), used as the end-of-line marker in most UNIX systems and variants.
* 12 (form feed, \f, ^L), to cause a printer to eject paper to the top of the next page, or a video terminal to clear the screen.
* 13 (carriage return, \r, ^M), used as the end-of-line marker in classic Mac OS, OS-9, FLEX (and variants). A carriage return/line feed pair is used by CP/M-80 and its derivatives including DOS and Windows, and by Application Layer protocols such as HTTP.
* 27 (escape, \e [GCC only], ^[).
* 127 (delete, ^?), originally intended to be an ignored character, but now used to erase a character (especially the one to the right of the cursor).
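If you would rather strip all of the control characters listed above in one pass, the same tr approach can be given octal ranges. The sketch below is one possible way to do it, assuming you want to keep tab (9) and line feed (10) and drop everything else in the 0 to 31 range plus 127. The function name strip_controls is made up for this example. Note that it also removes carriage returns and escape characters.

```shell
# Hypothetical helper: delete ASCII 0-8, 11-31 and 127 from standard input,
# leaving tab (011) and line feed (012) untouched.
strip_controls() {
    tr -d '\000-\010\013-\037\177'
}

# Sample input containing NUL, bell, tab, escape, delete and CR.
cleaned=$(printf 'a\000b\007c\td\033e\177f\r\n' | strip_controls)
printf '%s\n' "$cleaned"
# prints "abc", a tab, then "def"
```

To clean a real file, something like strip_controls < filename.txt > cleaned.txt should work. Write to a different file than the one you are reading, because redirecting onto the input file would truncate it before tr reads it.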
Read more about control characters on Wikipedia.