File character encoding / iconv utility
Determine and change file character encoding
Determine what character encoding is used by a file
file -bi [filename]
Example output:
steph@localhost ~ $ file -bi test.txt
text/plain; charset=us-ascii
Use vim to change a file’s encoding
If you use the vim text editor, you can configure it to save files as utf-8. Place the following in your /etc/vim/vimrc or ~/.vimrc file:
set encoding=utf-8
set fileencoding=utf-8
You will only notice a difference in the encoding if you edit the file and add non-ASCII characters, because plain ASCII text is byte-for-byte identical in UTF-8 (on many keyboard layouts, holding down the Alt or AltGr key while pressing a character key produces such a character). Start vim, edit the file and add some non-ASCII characters. If you create a test file containing the following…
steph@localhost ~ $ cat utf8test.txt
abcdefghijklmnopqrstuvwxyz
á ãä ç éêëìíîïðñò õö øùú
…then the file command should tell you the file is utf-8:
steph@localhost ~ $ file -bi utf8test.txt
text/plain; charset=utf-8
If you then remove the non-ASCII characters and save the file, the file command will report it as us-ascii again.
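If the file command is not available, a rough equivalent can be sketched with iconv alone, by checking which decodings succeed. This is only a minimal sketch and not how file works internally; classify_encoding is a hypothetical helper name, and it assumes an iconv that rejects invalid input (as the glibc one does).

```shell
#!/bin/sh
# classify_encoding is a hypothetical helper, not a standard tool.
# It prints us-ascii, utf-8 or unknown for the file given as $1.
classify_encoding() {
    if iconv -f ASCII -t ASCII "$1" > /dev/null 2>&1; then
        # Every byte decoded as ASCII, so the file is plain ASCII.
        echo "us-ascii"
    elif iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        # Not pure ASCII, but every byte sequence is valid UTF-8.
        echo "utf-8"
    else
        echo "unknown"
    fi
}
```

Called as classify_encoding utf8test.txt, it should agree with file for the two cases shown above. An empty file counts as us-ascii here, and anything that is not valid UTF-8 is simply reported as unknown.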
Change a file’s encoding from the command line
To convert the file contents from ASCII to UTF-8:
iconv -f ascii -t utf8 [filename] > [newfilename]
Or, to convert the file in place with recode:
recode ascii..utf8 [filename]
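Because ASCII is a subset of UTF-8, this conversion leaves a pure ASCII file byte-for-byte unchanged. A quick way to confirm that is sketched below; ascii_is_unchanged is a hypothetical helper name, and iconv and cmp are assumed to be installed.

```shell
#!/bin/sh
# ascii_is_unchanged is a hypothetical helper, not a standard tool.
# It succeeds (exit status 0) if converting $1 from ASCII to UTF-8
# produces exactly the same bytes as the original file.
ascii_is_unchanged() {
    # cmp -s compares the converted stream (stdin) with the file, silently.
    # If the file is not pure ASCII, iconv stops early and cmp reports a difference.
    iconv -f ASCII -t UTF-8 "$1" 2> /dev/null | cmp -s - "$1"
}
```

The helper fails for any file that contains non-ASCII bytes, since iconv cannot decode them as ASCII.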
To convert the file contents from UTF-8 to ASCII:
iconv -f utf8 -t ascii [filename]
Because UTF-8 can contain characters that can’t be encoded with ASCII, this command will generate an error unless you tell it to strip non-ASCII characters using the -c flag:
steph@localhost ~ $ iconv -f utf-8 -t ascii utf8test.txt
abcdefghijklmnopqrstuvwxyz
iconv: illegal input sequence at position 27
steph@localhost ~ $ iconv -c -f utf-8 -t ascii utf8test.txt
abcdefghijklmnopqrstuvwxyz
A similar thing can be achieved using the -f flag with the recode command.
steph@localhost ~ $ recode ascii utf8test.txt
recode: utf8test.txt failed: Invalid input in step `ANSI_X3.4-1968..CHAR'
steph@localhost ~ $ recode -f ascii utf8test.txt
steph@localhost ~ $ cat utf8test.txt
abcdefghijklmnopqrstuvwxyz
Warning: If you use the iconv -c flag or the recode -f flag, you could lose characters.
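One way to avoid silent loss is to convert without -c and only replace the file when iconv reports success. The following is a minimal sketch under that assumption; convert_or_keep is a hypothetical helper name, and mktemp is assumed to be available.

```shell
#!/bin/sh
# convert_or_keep is a hypothetical helper, not a standard tool.
# Usage: convert_or_keep FROM TO FILE
# Converts FILE in place, but leaves it untouched if any character
# cannot be represented in the target encoding.
convert_or_keep() {
    from=$1
    to=$2
    file=$3
    tmp=$(mktemp) || return 1
    if iconv -f "$from" -t "$to" "$file" > "$tmp" 2> /dev/null; then
        # Overwrite the contents rather than moving the temporary file,
        # so the original file keeps its permissions.
        cat "$tmp" > "$file"
        status=$?
    else
        # Conversion failed: the original file has not been modified.
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

With the test file above, convert_or_keep utf-8 ascii utf8test.txt would return a non-zero status and leave utf8test.txt as it was, rather than dropping the accented characters.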
Change the filename encoding
To convert the filename from ascii to UTF-8:
Warning: Run this without the --notest option first, to make sure there will be no problems. Without --notest, convmv only reports what it would rename.
convmv -f ascii -t utf8 --notest [filename]