LaTeX/Internationalization



LaTeX requires some additional configuration to typeset documents in languages other than English. In addition, many scripts require LuaTeX, which is currently the recommended engine in newly created documents[1], or XeTeX.

There are currently two packages providing international language support, namely, Babel and Polyglossia:

  • Babel[2] works with the three main engines, namely, pdfTeX, LuaTeX and XeTeX. Depending on the engine the number of supported languages (with various levels of coverage) goes from about 170 to 300, covering about 45 scripts, and new ones can be declared easily from scratch. It also provides partial support for Plain TeX.
  • Polyglossia was devised as an alternative to Babel for the now discouraged XeTeX (it also provides partial support for LuaTeX, but none for pdfTeX). It supports about 90 languages, covering about 30 scripts.

Both packages cover the major languages around the world (French, Spanish, Arabic, Chinese, Japanese, Thai, Hindi, Marathi, etc.) and handle the following tasks:

Fonts
Assigning a font to each non-Latin language. Traditional LaTeX requires setting an appropriate font encoding. In Unicode engines, both Babel and Polyglossia rely on the advanced OpenType font features provided by the fontspec package. Babel offers a high-level interface that handles fonts for different scripts automatically and sets the OpenType language and script; Polyglossia depends on standard calls to fontspec, and the language and script must be set manually. With Babel + LuaTeX the font can be switched automatically based on script. See also the discussion of fontspec in the Fonts chapter.
Linebreaking, justification and hyphenation
Activating the line-breaking algorithm that corresponds to each script and language. In the case of hyphenated languages, loading the language-specific hyphenation patterns. Babel and Polyglossia provide basic line breaking for CJK scripts. Babel provides non-standard hyphenation, like “ff” → “ff-f”, repeated hyphens, and ranked rules, and there is also some tentative support for Arabic and Tibetan justification.
Cultural elements
Translating document labels (like “chapter”, “figure”, “bibliography”), as well as formatting dates according to language-specific conventions and formatting numbers for languages that have their own numbering system. Polyglossia can generate the current date in the Hebrew, Islamic (Civil) and Persian calendars; Babel supports in addition Islamic Umm al-Qura, Coptic, Ethiopic, Chinese, and Buddhist.
Bidirectional typesetting
Supporting documents that contain right-to-left scripts. Babel + LuaTeX uses an algorithm based on the Unicode one, which changes the direction automatically. Layout elements such as tables, margins and so on must be reversed too, and this is done by Babel with LuaTeX to a great extent. With XeTeX, both Babel and Polyglossia rely on the bidi package, which requires explicit markup to change the direction.
Typographical rules and transliterations
Performing miscellaneous transformations both at the character level (like transliterations) and at the typographical level (like inserting spaces or penalties at appropriate places). Babel with LuaTeX can do this automatically by means of “transforms”; with XeTeX this can be done to some extent (both Babel and Polyglossia), while in 8-bit engines many of them must be done by hand.

With Babel, LaTeX ≥ 2018-04-01, and a monolingual document in UTF-8 encoding (which is the recommended encoding), all you need in many European languages is something like, for example: Template:LaTeX/Usage
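A minimal sketch of such a monolingual document follows. Spanish and the article class are chosen here only for illustration; they are assumptions, not part of the original example.

```latex
\documentclass[spanish]{article}  % language passed as a global class option
\usepackage[T1]{fontenc}          % for pdfLaTeX; omit with LuaLaTeX or XeLaTeX
\usepackage{babel}                % picks up 'spanish' from the class options

\begin{document}
\tableofcontents
\section{Introducción}
Este documento está escrito en español.
\end{document}
```

No input encoding is declared because UTF-8 is the default in LaTeX releases from 2018-04-01 onwards.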

In addition, there are some specialized frameworks for languages like Japanese, Korean or Chinese, described below.

Encodings

Unicode engines

When using the xelatex or lualatex engines, many of the problems described below are solved for you. Input files are assumed to be UTF-8 (XeLaTeX also accepts UTF-16 and UTF-32), and the engine automatically maps Unicode characters to their glyphs in the TrueType or OpenType fonts you selected for your document. (This is, of course, assuming those fonts contain the glyphs you need, so you must ensure that your fonts support the languages you are using.)

8-bit engines

With engines not supporting Unicode internally (latex or pdflatex), LaTeX must handle two fundamental problems:

  1. Mapping the bytes of your input file into the characters of the language(s) you want to use.
  2. Mapping those characters to their glyphs in the fonts your document uses.

With these engines, you must tell LaTeX which encoding to use for your input files, and which "output" (font) encoding it should use to map characters to their glyphs in the fonts. In most cases (especially for multilingual documents), UTF-8 is the best input encoding; it is also the current default.

For most Latin languages, T1 is the desired output encoding, and can be set with: Template:LaTeX/Usage Other output encodings for specific languages are shown below.
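As a hedged illustration, the T1 output encoding is usually selected with the fontenc package; the commented line is only relevant for older LaTeX releases where UTF-8 is not yet the default input encoding.

```latex
\usepackage[T1]{fontenc}       % output (font) encoding for most Latin-script languages
% \usepackage[utf8]{inputenc}  % only needed with LaTeX releases older than 2018-04-01
```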

For additional information, see the discussion of encoding in the Fonts chapter, as well as the Special Characters chapter.

Babel

The core package babel supports the three main engines (pdfLaTeX, LuaLaTeX and XeLaTeX). There are two ways to specify the document languages. One of them is as arguments to the package when it is loaded:

Template:LaTeX/Usage

Another approach is making the language a global option in order to let other packages detect and use it: Template:LaTeX/Usage
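The two ways of declaring the language can be sketched as follows. French is used only as an example language; the first variant is shown in comments so that the block remains a single valid preamble.

```latex
% Way 1: language given as a package option
% \documentclass{article}
% \usepackage[french]{babel}

% Way 2: language given as a global (class) option,
% so that other language-aware packages can detect it too
\documentclass[french]{article}
\usepackage{babel}

\begin{document}
Bonjour !
\end{document}
```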

Finally, babel provides total or partial support for about 300 languages with a set of ini files, which are accessed with \babelprovide. This command can also be used to easily define your own language from scratch. Names are normalized to those in the Unicode CLDR.

Babel will automatically activate the appropriate hyphenation rules for the language you choose. If your LaTeX format does not support hyphenation in the language of your choice, babel will still work but will disable hyphenation, which has quite a negative effect on the appearance of the typeset document (with LuaLaTeX, however, hyphenation rules can be loaded when the document is being typeset). Babel also specifies new commands for some languages, which simplify the input of special characters. See the sections about languages below for more information.

You can call babel with multiple languages, for example \usepackage[languageA,languageB]{babel}. Short texts in a secondary language do not require an explicit declaration in the preamble, because babel supports lazy loading of languages. Just select the language as explained in what follows and the basic declarations will be loaded on the fly.

The last language in the option list will be active (i.e. languageB), and you can use the command \selectlanguage{languageA} to change the active language (when the document begins, with \begin{document}, the main language is automatically selected). You can also add short pieces of text in another language using the command \foreignlanguage{languageB}{Text in another language}.

Babel also offers various environments for entering larger pieces of text in another language:

Template:LaTeX/Usage

The starred version of this environment typesets the main text according to the rules of the other language, but keeps the language-specific strings for ancillary things like figures in the main language of the document.
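The selectors described above can be sketched together in one small document. German and English are assumptions chosen for the illustration, and T1 is loaded on the assumption that pdfLaTeX is used.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}            % pdfLaTeX; omit with Unicode engines
\usepackage[german,english]{babel}  % english is last, so it is the main language

\begin{document}
This paragraph is in English.

A short phrase: \foreignlanguage{german}{Guten Morgen}.

\selectlanguage{german}
Ab hier gelten die deutschen Regeln.

\selectlanguage{english}
\begin{otherlanguage}{german}
Ein längerer Abschnitt auf Deutsch.
\end{otherlanguage}

\begin{otherlanguage*}{german}
Deutsche Trennregeln, aber Beschriftungen bleiben englisch.
\end{otherlanguage*}
\end{document}
```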

The babel manual provides much more information on these and many other options.

Font management

If you are using XeTeX or LuaTeX, Babel supports OpenType fonts with fontspec. To ease font handling, it provides the macro \babelfont, which switches the font across languages and sets the OpenType ‘language system’ (i.e., language and script). Let us assume you are setting up a document in Swedish, with some words in Hebrew, with a font suited for both languages: Template:LaTeX/Usage

If, on the other hand, you have to resort to different fonts, you would say: Template:LaTeX/Usage
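Both situations can be sketched in one preamble. The font names (FreeSerif, Iwona) are assumptions and must be installed on your system; the bidi=default option is added here on the assumption that basic direction handling is wanted for the Hebrew words.

```latex
\documentclass{article}
\usepackage[swedish, bidi=default]{babel}
\babelprovide[import]{hebrew}

% One font covering both scripts:
\babelfont{rm}{FreeSerif}

% Or different fonts per language (use instead of the line above):
% \babelfont{rm}{Iwona}
% \babelfont[hebrew]{rm}{FreeSerif}

\begin{document}
Svenska \foreignlanguage{hebrew}{עִבְרִית} svenska.
\end{document}
```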

Also, with version >=3.38 the locale identifiers (\language and \localeid) and the fonts can be switched without explicit markup, depending on the script (only LuaTeX). In the following example, bidi=basic switches the direction, and onchar=ids fonts switches the identifiers and the font: Template:LaTeX/Usage
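A sketch of this automatic switching, for LuaLaTeX only. The fonts (DejaVu Serif for Latin, FreeSerif for Arabic) are assumptions; any installed fonts covering the respective scripts should do.

```latex
\documentclass{article}
\usepackage[english, bidi=basic]{babel}
\babelprovide[import, onchar=ids fonts]{arabic}

\babelfont{rm}{DejaVu Serif}
\babelfont[*arabic]{rm}{FreeSerif}  % *arabic: applies to the Arabic script

\begin{document}
The Arabic word العربية needs no explicit markup here.
\end{document}
```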

Bidirectional texts

Babel provides basic support for bidirectional texts through the bidi package option, whose values include default, basic-r, and basic. With bidi=basic, RTL and LTR text can be mixed without explicit markup (LuaTeX only).

Customization

Babel provides tools to customize the behavior of languages. The most common requirement is modifying the default captions like the chapter name, because the strings defined for each language do not always fit editorial or personal preferences. The command \setlocalecaption is provided for this purpose. For example: Template:LaTeX/Usage
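A minimal sketch of \setlocalecaption, assuming a Spanish document and a reasonably recent babel; the replacement string is only an example.

```latex
\documentclass[spanish]{article}
\usepackage{babel}
% Arguments: language, caption name (without the "name" suffix), new string
\setlocalecaption{spanish}{contents}{Contenidos}

\begin{document}
\tableofcontents
\section{Primera sección}
\end{document}
```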

Formerly, the only way to customize the captions was with something like \addto\captionsspanish{\renewcommand{\contentsname}{Contenidos}}. According to the babel manual, this low level interface is not recommended.

Multilingual versions

It is possible in LaTeX to typeset the content of one document in several languages and to choose upon compilation which language to output in predefined strings (chapter name, date, etc.). Using the commands above in multilingual documents can be cumbersome, and therefore babel provides a way to define shorter names. With Template:LaTeX/Usage you can write: Template:LaTeX/Usage
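One way to define such shorter names is babel's \babeltags command. The sketch below assumes the tag de for German; the tag name is an arbitrary choice.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}            % pdfLaTeX; omit with Unicode engines
\usepackage[german,english]{babel}
\babeltags{de = german}             % defines \textde{...} and the environment de

\begin{document}
English text, \textde{deutscher Text}, English text.

\begin{de}
Ein ganzer Absatz auf Deutsch.
\end{de}
\end{document}
```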

There is a clear drawback to this feature, namely, the ‘prefix’ \text... is heavily overloaded in LaTeX and conflicts with existing macros may arise. The babel manual recommends sticking to the default selectors or defining your own alternatives.

The current language can also be tested by using the iflang package by Heiko Oberdiek (the built-in feature from the babel package is not reliable). Here is a simple example:

\IfLanguageName{ngerman}{Hallo}{Hello}

This makes it easy to distinguish between two languages without defining your own commands. Another approach for localized strings is Template:LaTeX/Package.

Polyglossia

When using XeLaTeX or LuaLaTeX, polyglossia provides an alternative to the core babel package for international language support, as described in its manual.

The original aim was to be compatible with babel, but there are a number of differences. For example, the standard mechanism in LaTeX to declare languages, via package or class options, is not recognized, and the user must rely on a set of new commands, as shown in the example. Unlike babel, polyglossia requires secondary languages to be declared explicitly, because it doesn't support lazy loading. It also adds the concept of ‘language variant’, while in babel all locales are treated on an equal footing. Not only are languages declared in a non-standard way, but a new way to switch languages has also been devised, with commands like \textenglish or \textlang. Fonts are set with standard `fontspec` commands and no higher-level interface is provided.

To use polyglossia, load it in your preamble and specify the languages you will be using, along with any language-specific options you wish.

If, for example, we have a document with American English as the main language, and some short texts in French, Bulgarian and Greek, you might use: Template:LaTeX/Usage As a comparison, here is the code with `babel`: Template:LaTeX/Usage
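The comparison can be sketched as follows for a Unicode engine. DejaVu Serif is an assumed font, chosen because it covers Latin, Cyrillic and Greek. First with polyglossia:

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{DejaVu Serif}
\usepackage{polyglossia}
\setdefaultlanguage[variant=american]{english}
\setotherlanguages{french,bulgarian,greek}  % must be declared explicitly

\begin{document}
English, \textfrench{français}, \textbulgarian{български},
\textgreek{ελληνικά}.
\end{document}
```

And a possible babel counterpart, which relies on the lazy loading of secondary languages described earlier:

```latex
\documentclass[american]{article}
\usepackage{babel}
\babelfont{rm}{DejaVu Serif}

\begin{document}
English, \foreignlanguage{french}{français},
\foreignlanguage{bulgarian}{български},
\foreignlanguage{greek}{ελληνικά}.
\end{document}
```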

Specific languages

Here is a collection of language-specific suggestions. If you have experience in a language not listed below, please add some notes about it. Some of the methods described in this chapter may be useful when dealing with non-English author names in bibliographies.


Arabic script

Documents in the Arabic script, including Arabic, Persian, Urdu, Pashto, Kurdish, Uyghur, etc., are best typeset with either XeTeX or LuaTeX. An example with babel and LuaTeX follows (rendering by the browser may be different from an editor):

Template:LaTeX/Usage
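A minimal sketch of an Arabic document with babel and LuaLaTeX. FreeSerif is an assumed font; replace it with any installed font that covers the Arabic script.

```latex
\documentclass{article}
\usepackage[bidi=basic]{babel}
\babelprovide[import, main]{arabic}  % Arabic as the main language
\babelfont{rm}{FreeSerif}

\begin{document}
\section{مقدمة}
هذا نص عربي مع كلمة English في وسطه.
\end{document}
```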

With XeTeX, you may set bidi=bidi, but mixed LR and RL text must be marked up explicitly. The same applies to polyglossia.

Babel with LuaTeX provides partial and tentative support for Arabic justification based on kashida (with the ARABIC TATWEEL Unicode character) or on the ‘justification alternatives’ OpenType table (jalt).

An alternative package for LuaTeX is arabluatex, which is an extension for LuaTeX of arabtex, described below. For XeTeX there is arabxetex.

In 8-bit engines, they can be typeset in a number of ways, one of the oldest being arabtex. Add the following code to your preamble:

Template:LaTeX/Usage You can input text in either romanized characters or native Arabic script encodings. Use any of the following commands and environments to enter text:

Template:LaTeX/Usage
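A hedged sketch of ArabTeX usage with romanized input. The transliterated phrase is an assumption based on ArabTeX's standard notation, in which a capital A stands for a long vowel.

```latex
\documentclass{article}
\usepackage{arabtex}

\begin{document}
% Short insertion in running text:
A greeting: \<al-salAm `alaykum>.

% The same with the long form of the command:
\RL{al-salAm `alaykum}

% Longer passages:
\begin{arabtext}
al-salAm `alaykum
\end{arabtext}
\end{document}
```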

See the ArabTeX Wikipedia article for further details.

You may also use the arabi package within Babel to typeset Arabic and Persian:

Template:LaTeX/Usage

You may also copy and paste from PDF files produced with Arabi thanks to the support of the Template:LaTeX/Package package. You may use Arabi with LyX, or with tex4ht to produce HTML.

See the Arabi page on CTAN.

Armenian

The Armenian script uses its own characters, which will require you to install a text editor that supports Unicode and will allow you to enter UTF-8 text, such as Texmaker or WinEdt. These text editors should then be configured to compile using XeLaTeX or LuaLaTeX.

Once the text editor is set up to compile with XeLaTeX or LuaLaTeX, the fontspec package can be used to write in Armenian:

Template:LaTeX/Usage

or

Template:LaTeX/Usage

The Sylfaen font lacks italic and bold, but DejaVu Serif supports them.
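A minimal sketch for XeLaTeX or LuaLaTeX, assuming that DejaVu Serif is installed; Sylfaen can be substituted if italic and bold are not needed.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{DejaVu Serif}  % has italic and bold; Sylfaen does not

\begin{document}
Հայերեն, \textit{շեղ}, \textbf{թավ}.
\end{document}
```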

See Armenian Wikibooks for further details, especially on how to configure the Unicode supporting text editors to compile with Unicode engines.

Cyrillic script

Currently the most convenient way to typeset Cyrillic texts is with XeTeX or LuaTeX in the UTF-8 encoding. An example for Russian with these engines, which do not require encoding transformations because everything is done directly in that encoding, is: Template:LaTeX/Usage
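A sketch of such a Russian document for XeLaTeX or LuaLaTeX. DejaVu Serif is an assumption; any installed font with Cyrillic coverage should work.

```latex
\documentclass[russian]{article}
\usepackage{babel}
\babelfont{rm}{DejaVu Serif}

\begin{document}
\section{Введение}
Привет, мир!
\end{document}
```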

Support for Cyrillic in non-Unicode engines is based on standard LaTeX mechanisms plus the fontenc and inputenc packages. The babel package includes support for the T2* encodings and for typesetting Bulgarian, Russian and Ukrainian texts using Cyrillic letters[3] with non-Unicode engines. AMS-LaTeX packages should be loaded before fontenc and babel. If you are going to use Cyrillic in math mode, you also need to load the mathtext package before fontenc:

Template:LaTeX/Usage
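The loading order described above can be sketched as follows for pdfLaTeX. The choice of languages and of UTF-8 as the input encoding are assumptions, and suitable T2A fonts must be installed.

```latex
\documentclass{article}
\usepackage{amsmath}           % AMS packages first
\usepackage{mathtext}          % before fontenc: Cyrillic in math mode
\usepackage[T1,T2A]{fontenc}   % Latin and Cyrillic font encodings
\usepackage[utf8]{inputenc}    % default in recent LaTeX; shown for clarity
\usepackage[english,bulgarian,ukrainian,russian]{babel}  % russian is main

\begin{document}
Привет, мир! \foreignlanguage{english}{Hello, world!}
\end{document}
```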

Generally, babel will automatically choose the default font encoding; for the above three languages this is T2A. However, documents are not restricted to a single font encoding. For multilingual documents using Cyrillic and Latin-based languages it makes sense to include a Latin font encoding explicitly. Babel will take care of switching to the appropriate font encoding when a different language is selected within the document.

On modern operating systems it is beneficial to use Unicode (utf8 or utf8x) instead of KOI8-RU (koi8-ru) as an input encoding for Cyrillic text.

In addition to enabling hyphenation, translating automatically generated text strings, and activating some language-specific typographic rules (like \frenchspacing), babel provides some commands allowing typesetting according to the standards of the Bulgarian, Russian, or Ukrainian languages.

For all three languages, language-specific punctuation is provided: the Cyrillic dash for the text (it is a little narrower than the Latin dash and surrounded by tiny spaces), a dash for direct speech, quotes, and commands to facilitate hyphenation:

Key combination   Action
"|                No ligature at this position.
"-                Explicit hyphen sign, allowing hyphenation in the rest of the word.
"---              Cyrillic emdash in plain text.
"--~              Cyrillic emdash in compound names (surnames).
"--*              Cyrillic emdash for denoting direct speech.
""                Similar to "-, but it produces no hyphen sign (used for compound words with hyphen, e.g. x-""y, or some other signs as “disable/enable”).
"~                Compound word mark without a breakpoint.
"=                Compound word mark with a breakpoint, allowing hyphenation in the composing words.
",                Thinspace for initials with a breakpoint in a following surname.
"`                German opening double quote (,,).
"'                German closing double quote (“).
"<                French opening double quote (<<).
">                French closing double quote (>>).

The Russian and Ukrainian options of babel define the commands \Asbuk and \asbuk, which act like \Alph and \alph (commands for turning counters into letters), but produce capital and small letters of the Russian or Ukrainian alphabets (whichever is the active language of the document).

The Bulgarian option of babel provides the commands \enumBul and \enumLat (\enumEng), which make \Alph and \alph produce letters of either the Bulgarian or the Latin (English) alphabet. The default behaviour of \Alph and \alph for the Bulgarian language option is to produce letters from the Bulgarian alphabet.

See the Bulgarian translation of "The Not So Short Introduction to LaTeX" [4] for a method to type Cyrillic letters directly from the keyboard using a different distribution.

Chinese

Typesetting Chinese texts (and, in general, texts in CJK scripts) is best done with a complete framework, like CJK or xeCJK, although for short texts or a few words in horizontal typesetting babel with XeTeX and LuaTeX could be enough, with basic line breaking.

CJK Package

Chinese support is available through the CJK package collection. If you are using a package manager or a portage tree, the CJK collection is usually in a separate package because of its size (mainly due to fonts).

Make sure your document is saved using the UTF-8 character encoding. See Special Characters for more details. Put the parts where you want to write Chinese characters in a CJK environment.

Template:LaTeX/Usage

The last argument specifies the font. It must fit the desired language, since fonts are different for Chinese, Japanese and Korean. Possible choices for Chinese include:

  • gbsn (简体宋体, simplified Chinese)
  • gkai (简体楷体, simplified Chinese)
  • bsmi (繁體細上海宋體, traditional Chinese)
  • bkai (繁體標楷體, traditional Chinese)
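A minimal sketch of the CJK environment with one of the fonts listed above. It assumes the CJK collection and the corresponding fonts are installed, and is meant for pdfLaTeX.

```latex
\documentclass{article}
\usepackage{CJKutf8}

\begin{document}
Some English text.

% Arguments: input encoding, then the font family (the last argument)
\begin{CJK}{UTF8}{gbsn}
你好，世界！
\end{CJK}
\end{document}
```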

In the CTeX distribution (now outdated), six more fonts for simplified Chinese are included, corresponding to default Windows fonts:

  • song (宋体, Simsun)
  • hei (黑体, Simhei)
  • fang (仿宋, STFangSong)
  • kai (楷体, STKaiti)
  • li (隶书, SimLi)
  • you (幼圆, SimYou)

xeCJK Package

When using the XeTeX engine, there is another package called xeCJK, which is based on fontspec and offers an interface similar to that of the CJK package.

When using the package, one can define CJK fonts like this: Template:LaTeX/Usage
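A sketch of a xeCJK document, to be compiled with XeLaTeX. The font name SimSun is an assumption; use any CJK font installed on your system.

```latex
\documentclass{article}
\usepackage{xeCJK}
\setCJKmainfont{SimSun}  % assumed font name

\begin{document}
English and 中文 can be mixed without special environments.
\end{document}
```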

Czech

Czech is fine using Template:LaTeX/Usage UTF-8 allows you to have „Czech quotation marks“ directly in your text. Otherwise, there are the macros \clqq and \crqq to produce left and right quotes. You can also place quoted text inside \uv{}.
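A minimal Czech sketch for pdfLaTeX, showing the three ways of quoting mentioned above. The T1 encoding is an assumption made so that accented letters hyphenate properly.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[czech]{babel}

\begin{document}
„Přímo v textu“, \clqq pomocí maker\crqq{} a \uv{pomocí uv}.
\end{document}
```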

Copying and searching in PDF

Although Czech letters with diacritical marks are displayed correctly, they cannot be copied or searched in PDF files generated with pdfLaTeX using just the command above. The cmap package solves this for some fonts; for others it is also necessary to use glyphtounicode.

Combinations of commands with different fonts

Columns, left to right:
  1. (no additional command)
  2. \usepackage{cmap}
  3. \usepackage[resetfonts]{cmap}
  4. \usepackage{cmap}, \input{glyphtounicode}, \pdfgentounicode=1

Font                      1               2               3               4
\usepackage{lmodern}      Template:Color  Template:Color  Template:Color  Template:Color
\usepackage{ebgaramond}   Template:Color  Template:Color  Template:Color  Template:Color
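The most complete combination from the table can be sketched as the following pdfLaTeX preamble. Loading cmap before fontenc is an assumption based on that package's usual recommendation.

```latex
\documentclass{article}
\usepackage{cmap}          % character maps for copying and searching
\usepackage[T1]{fontenc}
\usepackage[czech]{babel}
\input{glyphtounicode}     % glyph-name to Unicode table
\pdfgentounicode=1         % ask pdfTeX to generate ToUnicode maps

\begin{document}
Příliš žluťoučký kůň úpěl ďábelské ódy.
\end{document}
```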

Devanagari and other Indic scripts

The Devanagari script is used by many languages, including Marathi, Pāḷi, Sanskrit, Hindi, Nepali, Bodo, Konkani, Prakrit. Here is an example for Hindi with babel, for both XeTeX and LuaTeX: Template:LaTeX/Usage
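A hedged sketch for Hindi. The provide=* option and the FreeSerif font are assumptions: the option asks babel to load the main language from its ini file, and the font must cover Devanagari.

```latex
\documentclass{article}
\usepackage[hindi, provide=*]{babel}
% With LuaTeX, adding [Renderer=Harfbuzz] before the font name
% is usually needed for correct Devanagari shaping.
\babelfont{rm}{FreeSerif}

\begin{document}
नमस्ते दुनिया
\end{document}
```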

Other Indic scripts have a similar setup (Malayalam, Bengali, Sinhala, Telugu, Tamil, Kannada, Assamese, Punjabi, etc.).

If any additional features are required, you need an alternative approach, as illustrated in the following example for Bangla, which sets the option mapdigits for the Arabic digits to be converted to the local ones (only LuaTeX). Template:LaTeX/Usage
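A sketch of this alternative approach for LuaLaTeX, using \babelprovide with the mapdigits option. The font name is an assumption; any installed font covering the Bengali script can be used.

```latex
\documentclass{article}
\usepackage{babel}
\babelprovide[import, main, mapdigits]{bengali}
\babelfont{rm}[Renderer=Harfbuzz]{Noto Serif Bengali}  % assumed font

\begin{document}
বাংলা 2024  % the digits are typeset with Bengali numerals
\end{document}
```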

Mapping the digits is accomplished in XeTeX at the font level, with the option Mapping=, like: Template:LaTeX/Usage This is actually a XeTeX feature and doesn't require Template:LaTeX/Package. It can be used directly with Template:LaTeX/Package.

Support in pdfTeX is based mainly on the Template:LaTeX/Package package. An alternative for XeTeX is Template:LaTeX/Package, which relies on Template:LaTeX/Package.

Finnish

Finnish language hyphenation is enabled with: Template:LaTeX/Usage This will also automatically change document language (section names, etc.) to Finnish.
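A minimal preamble sketch, assuming pdfLaTeX with the T1 encoding:

```latex
\usepackage[T1]{fontenc}
\usepackage[finnish]{babel}
```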

French

As of version 3.0 of Template:LaTeX/Package, it is advised to choose the language as a global option with the following command[5]:

Template:LaTeX/Usage
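Choosing French as a global option can be sketched like this (the article class and the T1 encoding for pdfLaTeX are assumptions):

```latex
\documentclass[french]{article}  % global option, visible to all packages
\usepackage[T1]{fontenc}
\usepackage{babel}

\begin{document}
Bonjour !
\end{document}
```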

Formerly, you could load French language support with the following command:

Template:LaTeX/Usage

or

Template:LaTeX/Usage


There are multiple options for typesetting French documents, depending on the flavor of French: Template:LaTeX/Parameter for Parisian French, and Template:LaTeX/Parameter and Template:LaTeX/Parameter for new-world French. If you do not know or do not really care, we would recommend using Template:LaTeX/LaTeX.

All enable French hyphenation, if you have configured your LaTeX system accordingly. All of these also change all automatic text into French: \chapter prints Chapitre, \today prints the current date in French and so on. A set of new commands also becomes available, which allows you to write French input files more easily. Check out the following table for inspiration:

Input code                      Rendered output
\og guillemets \fg{}            « guillemets »
M\up{me}, D\up{r}               Mᵐᵉ, Dʳ
1\ier{}, 1\iere{}, 1\ieres{}    1ᵉʳ, 1ʳᵉ, 1ʳᵉˢ
2\ieme{} 4\iemes{}              2ᵉ 4ᵉˢ
\No 1, \no 2                    N° 1, n° 2
20~\degres C, 45\degres         20 °C, 45°
M. \bsc{Durand}                 M. Durand
\nombre{1234,56789}             1 234,567 89
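Some of the commands from the table can be tried in a small document such as the following sketch (pdfLaTeX with T1 assumed):

```latex
\documentclass[french]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}

\begin{document}
\og guillemets \fg{}

M\up{me}, D\up{r}

1\ier{}, 1\iere{}, 2\ieme{}

\No 1, \no 2

20~\degres C

M. \bsc{Durand}
\end{document}
```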

You may want to typeset guillemets and other French characters directly if your keyboard has them. Running Xorg (*BSD and GNU/Linux), you may want to use the oss variant which features some nice shortcuts, like

Key combination Character
Alt Gr + w «
Alt Gr + x »
Alt Gr + Shift + é É
Alt Gr + Shift + è È
Alt Gr + Shift + ç Ç

You will need the T1 font encoding for guillemets to print properly.

For the degree character you will get an error like

! Package inputenc Error: Unicode char \u8:° not set up for use with LaTeX.

The Template:LaTeX/Package package will fix it for you.

The great advantage of Babel for French is that it will handle some elements of French typography for you, especially non-breaking spaces before all two-part punctuation marks. So now you can write:

Template:LaTeX/Usage

The non-breaking space before the euro symbol is still necessary because currency symbols and other units are not supported in general (that's not specific to French).

You can use the numprint package alongside Babel. It will let you print numbers the French way.

\usepackage[french]{babel}
\usepackage[autolanguage]{numprint} % Must be loaded *after* babel.

% ...

\nombre{123456.123456 e-17}

123456,1234561017



You will also notice that the layout of lists changes when switching to the French language. This is customizable using the Template:LaTeX/LaTeX command. For more information on what the Template:LaTeX/Parameter option of Template:LaTeX/Package does and how you can customize its behavior, run LaTeX on file frenchb.dtx and read the produced file frenchb.pdf or frenchb.dvi. You can get the PDF version on CTAN.

German

You can load German language support using either one of the two following commands (pdfTeX, XeTeX and LuaTeX are supported).

For traditional ("old") German orthography use Template:LaTeX/Usage

or for reform ("new") German orthography use

Template:LaTeX/Usage
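The two choices can be sketched as follows; only one of the two lines should be active in a given document.

```latex
% Traditional ("old") orthography:
% \usepackage[german]{babel}

% Reformed ("new") orthography:
\usepackage[ngerman]{babel}
```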

This enables German hyphenation, if you have configured your LaTeX system accordingly. It also changes all automatic text into German, e.g. “Chapter” becomes “Kapitel”. A set of new commands also becomes available, which allows you to write German input files more quickly even when you don't use the inputenc package. Check out the table below for inspiration. With inputenc, all this becomes moot, but your text also is locked in a particular encoding world.

German Special Characters

Input              Output
"A "O "U           Ä Ö Ü
"a "o "u "s        ä ö ü ß
"` or \glqq        „
"' or \grqq        “
\glq \grq          ‚ ‘
"< or \flqq        «
"> or \frqq        »
\flq \frq          ‹ ›
\dq                "

In German books you sometimes find French quotation marks («guillemets»). German typesetters, however, use them differently. A quote in a German book would look like »this«. In the German-speaking part of Switzerland, typesetters use «guillemets» the same way the French do. A major problem arises from the use of commands like \flq: if you use the OT1 font encoding (which is the default), the guillemets will look like the math symbol “≪”, which turns a typesetter's stomach. T1 encoded fonts, on the other hand, do contain the required symbols. So if you are using this type of quote, make sure you use the T1 encoding.

Decimal numbers usually have to be written like 0{,}5 (not just 0,5). Packages like ziffer enable input like 0,5. Alternatively, one can use the \num command from the siunitx package and (globally) set the decimal marker using

\usepackage[output-decimal-marker={,}]{siunitx}
% ...
\num{0,5}

0,5


Greek

This is the preamble you need to write in the Greek language. Template:LaTeX/Usage
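A hedged sketch of a Greek document for a Unicode engine (XeLaTeX or LuaLaTeX). DejaVu Serif is an assumed font with Greek coverage; with pdfLaTeX the \babelfont line must be dropped and suitable Greek fonts must be installed.

```latex
\documentclass{article}
\usepackage[greek]{babel}
\babelfont{rm}{DejaVu Serif}

\begin{document}
\section{Εισαγωγή}
Καλημέρα κόσμε.
\end{document}
```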

This preamble enables hyphenation and changes all automatic text to Greek. A set of new commands also becomes available, which allows you to write Greek input files more easily.

Modern Monotonic Greek, as well as Polytonic and Ancient Greek are supported.

If you need a language in the Latin script and you are using LuaTeX, you can switch the font automatically in the following way, with no explicit markup: Template:LaTeX/Usage

There is a dedicated package for XeTeX named xgreek.

Hungarian

Use the following lines: Template:LaTeX/Usage
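A possible preamble for pdfLaTeX, given as a sketch; magyar is babel's traditional name for Hungarian.

```latex
\usepackage[T1]{fontenc}
\usepackage[magyar]{babel}
```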

More information in Hungarian.

Icelandic and Faroese

The following lines can be added to write Icelandic text:

Template:LaTeX/Usage

This changes text like Part into Hluti. It makes additional commands available:

Icelandic special commands

Input              Output
"` or \glqq        „
\grqq              “
\TH                Þ
\th                þ
\DH                Ð
\dh                ð

To make special characters such as Þ and Æ become available just add:

Template:LaTeX/Usage
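Both steps can be sketched together in one pdfLaTeX document; the sample words are only there to exercise the special letters.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}      % makes Þ, þ, Ð, ð and Æ available
\usepackage[icelandic]{babel}

\begin{document}
\TH{}ing, \th{}etta, \DH, \dh, \AE
\end{document}
```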

The default LaTeX font encoding is OT1, but it contains only 128 characters. The T1 encoding contains letters and punctuation characters for most of the European languages using the Latin script.

Italian

Italian is well supported by LaTeX. Just add Template:LaTeX/Usage at the beginning of your document and the output of all the commands will be translated properly.
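For instance, as a sketch assuming pdfLaTeX with T1:

```latex
\usepackage[T1]{fontenc}
\usepackage[italian]{babel}
```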

Norwegian

Norwegian is well supported by LaTeX. Just add Template:LaTeX/Usage at the beginning of your document and the output of all the commands will be translated properly.
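For instance, as a sketch assuming pdfLaTeX with T1; norsk selects Bokmål, while nynorsk is also available.

```latex
\usepackage[T1]{fontenc}
\usepackage[norsk]{babel}  % or nynorsk
```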

Japanese

jlreq

The package provides the class file and JFM (Japanese font metric) files for LuaTeX-ja / pLaTeX / upLaTeX. This aims to implement Requirements for Japanese Text Layout.
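A minimal sketch of a jlreq document, assuming it is compiled with LuaLaTeX and that the default Japanese fonts of your TeX distribution are installed.

```latex
\documentclass{jlreq}

\begin{document}
\section{はじめに}
これは日本語の文書です。
\end{document}
```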

upTeX, pTeX

There is a variant of TeX intended for Japanese named upTeX, which supports vertical typesetting.

luatexja

Another possible way to write in Japanese is to use LuaLaTeX and the luatexja package. Adapted example from the luatexja documentation: Template:LaTeX/Usage
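A minimal sketch with luatexja, to be compiled with LuaLaTeX; it relies on the default Japanese fonts configured by the package.

```latex
\documentclass{article}
\usepackage{luatexja}

\begin{document}
日本語と English を混ぜて書けます。
\end{document}
```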

You can also use capabilities provided by the fontspec package and those provided by luatexja-fontspec to declare the font you want to use in your paper. Let us take an example: Template:LaTeX/Usage Use UTF-8 as your encoding. In case you don't know how to do this, take a look at Texmaker, a LaTeX editor which uses UTF-8 by default.
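Font selection with luatexja-fontspec can be sketched as follows. The IPAex font names are assumptions; substitute any Japanese fonts installed on your system.

```latex
\documentclass{article}
\usepackage{luatexja-fontspec}
\setmainjfont{IPAexMincho}  % Japanese serif (assumed font)
\setsansjfont{IPAexGothic}  % Japanese sans serif (assumed font)

\begin{document}
明朝体の文章。\textsf{ゴシック体の文章。}
\end{document}
```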

luatexja can work together with babel. For example: Template:LaTeX/Usage For short Japanese texts (a few words or a few paragraphs) in a document in another language, babel (≥3.31) with LuaTeX could be enough; e.g.: Template:LaTeX/Usage
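The second case, a few Japanese words in an English document with babel alone, might look like this sketch for LuaLaTeX. The font name is an assumption.

```latex
\documentclass{article}
\usepackage[english]{babel}
\babelprovide[import]{japanese}
\babelfont[japanese]{rm}{IPAexMincho}  % assumed font

\begin{document}
The word \foreignlanguage{japanese}{日本語} means ``Japanese''.
\end{document}
```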

For Template:LaTeX/Package package to show the Table of Contents correctly, the encoding has to be explicitly specified. Template:LaTeX/Usage

CJK, XeCJK, bxcjkjatype

Another (but older) option for Japanese support is the CJK or xeCJK package collection. If you are using a package manager or a portage tree, the CJK collection is usually in a separate package because of its size (mainly due to fonts).

Make sure your document is saved using the UTF-8 character encoding. See Special Characters for more details. Put the parts where you want to write Japanese characters in a CJK environment. Template:LaTeX/Usage The last argument specifies the font. It must fit the desired language, since fonts are different for Chinese, Japanese and Korean. min is an example for Japanese.
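A minimal sketch of the CJK environment for Japanese with pdfLaTeX, assuming the CJK collection and the min font are installed.

```latex
\documentclass{article}
\usepackage{CJKutf8}

\begin{document}
\begin{CJK}{UTF8}{min}  % min: a Japanese font family
こんにちは、世界。
\end{CJK}
\end{document}
```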

The bxcjkjatype package provides a working configuration of the CJK package, suitable for Japanese typesetting of moderate quality. Moreover, it facilitates use of the CJK package for pLaTeX users, by providing commands that are similar to those used by the pLaTeX kernel and some other packages used with it. Template:LaTeX/Usage
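A hedged sketch of bxcjkjatype with pdfLaTeX. The whole option, assumed here, wraps the entire document body in a CJK environment so that none has to be written by hand.

```latex
\documentclass{article}
\usepackage[whole]{bxcjkjatype}  % 'whole': CJK active for the whole document

\begin{document}
これは日本語の文章です。
\end{document}
```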

Korean

The two most widely used encodings for Korean text files are EUC-KR and its upward compatible extension used in Korean MS-Windows, CP949/Windows-949/UHC. In these encodings each US-ASCII character represents its normal ASCII character similar to other ASCII compatible encodings such as ISO-8859-x, EUC-JP, Big5, or Shift_JIS. On the other hand, Hangul syllables, Hanjas (Chinese characters as used in Korea), Hangul Jamos, Hiraganas, Katakanas, Greek and Cyrillic characters and other symbols and letters drawn from KS X 1001 are represented by two consecutive octets. The first has its MSB set. Until the mid-1990's, it took a considerable amount of time and effort to set up a Korean-capable environment under a non-localized (non-Korean) operating system. You can skim through the now much-outdated http://jshin.net/faq to get a glimpse of what it was like to use Korean under non-Korean OS in mid-1990's.

TeX and LaTeX were originally written for scripts with no more than 256 characters in their alphabet. To make them work for languages with considerably more characters such as Korean or Chinese, a subfont mechanism was developed. It divides a single CJK font with thousands or tens of thousands of glyphs into a set of subfonts with 256 glyphs each.

For Korean, there are three widely used packages.

  • HLaTeX by UN Koaunghi
  • hLaTeXp by CHA Jaechoon
  • the CJK package by Werner Lemberg

HLaTeX and hLaTeXp are specific to Korean and provide Korean localization on top of the font support. Both can process Korean input text files encoded in EUC-KR. HLaTeX can even process input files encoded in CP949/Windows-949/UHC and UTF-8 when used with Ω (Omega) and Λ (Lambda, the LaTeX format for Omega).

The CJK package is not specific to Korean. It can process input files in UTF-8 as well as in various CJK encodings, including EUC-KR and CP949/Windows-949/UHC, and it can be used to typeset documents with multilingual content (especially Chinese, Japanese and Korean). The CJK package has no Korean localization such as the one offered by HLaTeX, and it does not come with as many special Korean fonts as HLaTeX.

The ultimate purpose of using typesetting programs like TeX and LaTeX is to get documents typeset in an aesthetically satisfying way. Arguably the most important element in typesetting is a set of well-designed fonts. The HLaTeX distribution includes UHC PostScript fonts of 10 different families and Munhwabu fonts (TrueType) of 5 different families. The CJK package works with a set of fonts used by earlier versions of HLaTeX, and it can use Bitstream's Cyberbit TrueType font.

To use the HLaTeX package for typesetting your Korean text, put the following declaration into the preamble of your document: Template:LaTeX/Usage This command turns the Korean localization on. The headings of chapters, sections and subsections, the table of contents and the list of figures are all translated into Korean, and the formatting of the document is changed to follow Korean conventions. The package also provides automatic particle selection. In Korean, there are pairs of postfix particles that are grammatically equivalent but different in form. Which of a given pair is correct depends on whether the preceding syllable ends with a vowel or a consonant. (It is a bit more complex than this, but this should give you a good picture.) Native Korean speakers have no problem picking the right particle, but the correct particle cannot be fixed in advance for references and other automatic text that changes while you edit the document. It takes painstaking effort to place appropriate particles manually every time you add or remove references or simply shuffle parts of your document around. HLaTeX relieves its users of this tedious and error-prone process.
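A minimal sketch of the declaration, assuming the style file that enables the localization is named hangul, as in the HLaTeX distribution:

```latex
% Preamble fragment; the source file is assumed to be encoded in EUC-KR.
\usepackage{hangul}   % Korean fonts plus Korean localization
```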

In case you don't need the Korean localization features but just want to typeset Korean text, you can put the following line in the preamble instead. Template:LaTeX/Usage For more details on typesetting Korean with HLaTeX, refer to the HLaTeX Guide. See also the web site of the Korean TeX User Group (KTUG).
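A minimal sketch of the font-only variant, assuming the corresponding HLaTeX style file is named hfont:

```latex
% Preamble fragment: Korean fonts only, without the localized headings.
\usepackage{hfont}
```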

The FAQ section of KTUG recommends using the kotex package:

Template:LaTeX/Usage
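A minimal sketch of a kotex document, assuming a UTF-8 source file; the sample sentence is only illustrative:

```latex
% Save this file as UTF-8.
\documentclass{article}
\usepackage{kotex}    % Korean support recommended by KTUG
\begin{document}
한국어로 작성한 문서입니다.
\end{document}
```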

Persian script

For the Persian language, there is a dedicated package called XePersian, which uses XeLaTeX as the typesetting engine. Just add the following code to your preamble:

Template:LaTeX/Usage
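A minimal sketch of a XePersian document. The font name below is a hypothetical choice used only for illustration; substitute any installed font that covers the Persian script:

```latex
% Compile with XeLaTeX.
\documentclass{article}
\usepackage{xepersian}        % load it after your other packages
\settextfont{XB Niloofar}     % hypothetical font; any installed Persian font will do
\begin{document}
این یک متن فارسی است.
\end{document}
```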

See the XePersian page on CTAN.

Moreover, Arabic script can be used to type Persian as illustrated in the corresponding section.

Polish

If you plan to use Polish in your encoded document, use the following code: Template:LaTeX/Usage
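A minimal sketch of such a preamble, assuming a UTF-8 source file and the T1 font encoding; the section title is only illustrative:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}   % assumed input encoding
\usepackage[T1]{fontenc}      % font encoding that contains the Polish letters
\usepackage[polish]{babel}    % Polish hyphenation and translated automatic text
\begin{document}
\section{Zażółć gęślą jaźń}
Tekst po polsku.
\end{document}
```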

The above code merely lets you use Polish letters and translates the automatic text to Polish, so that "chapter" becomes "rozdział". There are a few additional things to keep in mind.

Connectives

Polish has many single-letter connectives: "a", "o", "w", "i", "u", "z", etc. Grammar and typography rules don't allow them to end a printed line. To ensure that LaTeX won't set them as the last letter in a line, you have to use a non-breakable space:

Template:LaTeX/Usage
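As an illustration (the sentence itself is made up), each single-letter connective is tied to the following word with ~:

```latex
% The tie (~) keeps "i", "a" and "w" on the same line as the next word.
Noc była ciepła i~cicha, a~w~ogrodzie pachniały róże.
```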

Babel (>=3.58) with LuaTeX provides a transform for this purpose, without explicit markup, which is activated with: Template:LaTeX/Usage
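A minimal sketch, assuming the transform in question is the one Babel names oneletter.nobreak:

```latex
% Compile with LuaLaTeX; requires Babel 3.58 or later.
\usepackage[polish]{babel}
\babelprovide[transforms = oneletter.nobreak]{polish}  % assumed transform name
```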

Numerals

According to Polish grammar rules, you have to put dots after numerals in chapter, section, subsection, etc. headers.

This is achieved by redefining a few LaTeX macros.

For books: Template:LaTeX/Usage

For articles: Template:LaTeX/Usage
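A minimal sketch of one way to do this, assuming the standard book and article classes; it simply appends a dot to the printed counters:

```latex
% For book-like classes (with chapters):
\renewcommand\thechapter{\arabic{chapter}.}
\renewcommand\thesection{\arabic{chapter}.\arabic{section}.}
\renewcommand\thesubsection{\arabic{chapter}.\arabic{section}.\arabic{subsection}.}

% For the article class (no chapters), use these instead:
% \renewcommand\thesection{\arabic{section}.}
% \renewcommand\thesubsection{\arabic{section}.\arabic{subsection}.}
```

With this approach, cross-references made with \ref also end in a dot, and counters built on \thechapter (such as figure numbers in book) inherit the extra dot and may need to be redefined as well.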


Alternatively you can use dedicated document classes:

These classes apply many more European typographic conventions, but they do not themselves require the Polish Babel settings or the character encoding setup.

Simple usage: Template:LaTeX/Usage
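A minimal sketch, assuming the classes meant here are those of the mwcls bundle (for example mwart for articles); the Babel line is optional and shown only to get Polish hyphenation:

```latex
\documentclass{mwart}         % assumed class name from the mwcls bundle
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[polish]{babel}    % optional: Polish hyphenation and automatic text
\begin{document}
\section{Wstęp}
Tekst dokumentu.
\end{document}
```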

Full documentation for those classes is available at http://web.archive.org/web/20040609034031/http://www.ci.pwr.wroc.pl/~pmazur/LaTeX/mwclsdoc.pdf (Polish).

Indentation

It may be customary (depending on publisher) to indent the first paragraph in sections and chapters: Template:LaTeX/Usage
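One common way to get this behaviour in the standard classes is the indentfirst package, sketched here as a preamble fragment:

```latex
% Indent the first paragraph after every sectioning command.
\usepackage{indentfirst}
```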

Hyphenation and typography

Hyphenating a word across a page break is frowned upon much more strongly than is customary in American typesetting.

To adjust penalties for hyphenation spanning pages, use this command: Template:LaTeX/Usage

To adjust penalties for leaving widows and orphans (clubs in TeX nomenclature), use these commands: Template:LaTeX/Usage
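A minimal sketch of both adjustments using the corresponding TeX parameters. The values are illustrative assumptions: 10000 means "never", and smaller values merely discourage the break.

```latex
\brokenpenalty=10000   % no page break after a line that ends in a hyphen
\clubpenalty=10000     % no page break after the first line of a paragraph (orphan)
\widowpenalty=10000    % no page break before the last line of a paragraph (widow)
```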

Commas in math

According to some typography rules, fractional parts of numbers should be delimited by a comma, not a dot. To make LaTeX not insert additional space in math mode after a comma (unless there is a space after the comma), use the Template:LaTeX/Package package.

Template:LaTeX/Usage
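A minimal sketch, assuming the package meant here is icomma:

```latex
\documentclass{article}
\usepackage{icomma}   % assumed package: context-sensitive comma in math mode
\begin{document}
$3,14$     % no space after the comma in the source: decimal separator
$f(x, y)$  % space after the comma in the source: ordinary punctuation
\end{document}
```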

Unfortunately, it is partially incompatible with the Template:LaTeX/Package package. One needs to either use dots in columns with numerical data in the source file and make Template:LaTeX/Package switch them to commas for display or define the column as follows: Template:LaTeX/Usage
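A minimal sketch of the first workaround, assuming the table package in question is dcolumn: the source uses dots, and the column definition prints commas. The column name d and the table contents are illustrative.

```latex
\documentclass{article}
\usepackage{dcolumn}
% D{separator in the source}{separator in the output}{decimal places}
\newcolumntype{d}{D{.}{,}{2}}
\begin{document}
\begin{tabular}{l d}
Masa & 12.50 \\
Cena & 3.14  \\
\end{tabular}
\end{document}
```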

The alternative is to use the Template:LaTeX/Package package, but it is much less convenient.

Another alternative is the Template:LaTeX/Package package, which lets you typeset numbers and their corresponding units consistently. Number alignment in tables and different output modes are supported.
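A minimal sketch, assuming the package meant here is siunitx; the numbers and units are illustrative:

```latex
\documentclass{article}
\usepackage{siunitx}
\sisetup{output-decimal-marker={,}}   % print a decimal comma

\begin{document}
\num{3.14}                            % a number on its own
\SI{9.81}{\metre\per\second\squared}  % a number with its unit

% The S column aligns numbers on the decimal marker.
\begin{tabular}{S[table-format=2.2]}
12.50 \\
3.14  \\
\end{tabular}
\end{document}
```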

Further information

Refer to the Słownik Ortograficzny (in Polish) for additional information on Polish grammar and typography rules.

A good extract is available at Zasady Typograficzne Składania Tekstu (in Polish).

Portuguese

Add the following code to your preamble:

Template:LaTeX/Usage

You can switch the language to Brazilian Portuguese by choosing Template:LaTeX/Parameter or Template:LaTeX/Parameter instead.
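A minimal sketch of the two variants, using Babel's portuguese and brazilian options (brazil is a synonym for the latter):

```latex
% European Portuguese:
\usepackage[portuguese]{babel}

% Brazilian Portuguese (use this line instead of the one above):
% \usepackage[brazilian]{babel}
```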

Slovak

Basic settings are fine when left the same as for Czech, but Slovak needs special signs for 'ď', 'ť', 'ľ'. To be able to type them from the keyboard, use the following settings: Template:LaTeX/Usage
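A minimal sketch of such settings, assuming a UTF-8 source file and the T1 font encoding (which contains ď, ť and ľ):

```latex
\usepackage[utf8]{inputenc}   % assumed input encoding
\usepackage[T1]{fontenc}      % assumed font encoding with the caron letters
\usepackage[slovak]{babel}
```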

Spanish

Include the appropriate Babel option: Template:LaTeX/Usage

The trick is that Spanish has several options and commands to control the layout. The options may be loaded either at the call to Babel, or before, by defining the command Template:LaTeX/LaTeX. Therefore, the following commands are roughly equivalent:

Template:LaTeX/Usage

Template:LaTeX/Usage
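A sketch of the two forms, using the mexico option as the example; the exact spelling of both forms is an assumption here and should be checked against the documentation of the Spanish style:

```latex
% Form 1: options defined before Babel is loaded
\def\spanishoptions{mexico}
\usepackage[spanish]{babel}

% Form 2: options passed in the call to Babel (use one form, not both)
% \usepackage[spanish,mexico]{babel}
```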

In general, the former syntax should be preferred, as the latter is not recognized by some programs (LyX, latex2rtf) that interact with LaTeX.

Spanish also defines shorthands for the dot and for << >> so that they act as logical markup: the former is used as the decimal marker in math mode, and the output is typically either a comma or a dot; the latter is used for quoted text, and the output is typically either «» or “”. This allows different typographical conventions with the same input, as preferences may differ considerably between, say, Spain and Mexico.
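A minimal sketch of both shorthands in use; how the quotes and the decimal marker are actually printed depends on the options in effect:

```latex
\documentclass{article}
\usepackage[spanish]{babel}
\begin{document}
% << >> marks quoted text; the dot in math mode marks the decimal separator.
Dijo <<hola>> y el valor medido fue $3.14$.
\end{document}
```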

Two particularly useful options are Template:LaTeX/Parameter: some packages and classes are known to collide with Spanish in the way they handle active characters, and these options disable the internal workings of Spanish to allow you to overcome these common pitfalls. Moreover, these options may simplify the way LyX customizes some features of the Spanish layout from inside the GUI.

The options Template:LaTeX/Parameter provide support for local customs in Mexico: the former uses a decimal dot, as customary, and the latter allows a decimal comma, as formerly required by the Mexican Official Norm (NOM) of the Secretariat of Economy for labels on foods and goods. More localizations are in the making.

The other commands modify the Spanish layout after loading Babel. Two particularly useful commands are Template:LaTeX/LaTeX and Template:LaTeX/LaTeX.

The macro Template:LaTeX/LaTeX contains a list of Spanish mathematical operators and may be redefined at will. For instance, the command Template:LaTeX/Usage only defines Template:LaTeX/Parameter, overriding all other definitions; the command Template:LaTeX/LaTeX disables them all. This command supports accented or spaced operators: the Template:LaTeX/LaTeX command puts an accent, and the Template:LaTeX/LaTeX command adds a small space. For instance, the following operators are defined by default.

Template:LaTeX/Usage

Finally, the macro Template:LaTeX/LaTeX disables some active characters, to keep you out of trouble if they are redefined by other packages. The candidates for deactivation are the set {<>."'}. Beware that some options preempt the availability of some active characters. In particular, you should not combine the Template:LaTeX/Parameter option with Template:LaTeX/LaTeX, or the Template:LaTeX/Parameter option with Template:LaTeX/LaTeX.
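A minimal sketch, assuming the macro is \spanishdeactivate and that it takes the characters to deactivate as its argument; here only the angle brackets are switched off:

```latex
\usepackage[spanish]{babel}
% Assumed usage: stop treating < and > as active characters.
\spanishdeactivate{<>}
```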

Please check the documentation for Babel or spanish.dtx for further details.

Thai

Both babel (luatex and xetex) and polyglossia (only xetex) support Thai. Word division in luatex is based on the standard hyphenation mechanism, so that patterns can be modified with \babelpatterns, while xetex relies on its own built-in mechanism. In pdftex you need an external tool for word segmentation (like swath). An example with babel (luatex and xetex) is: Template:LaTeX/Usage
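A minimal sketch of such a document. The font name is a hypothetical choice for illustration; any installed font that covers the Thai script should work:

```latex
% Compile with LuaLaTeX or XeLaTeX.
\documentclass{article}
\usepackage[thai, provide=*]{babel}
\babelfont{rm}{FreeSerif}   % hypothetical font choice with Thai coverage
\begin{document}
ภาษาไทยเป็นภาษาราชการของประเทศไทย
\end{document}
```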

Tibetan

One option to use Tibetan script in LaTeX is to add Template:LaTeX/Usage to your preamble and use a slightly modified Wylie transliteration for input. Refer to the excellent package documentation for details. More information can be found on [1]

Babel for LuaTeX provides tentative support for justification with trailing tshegs.[2]

Vietnamese

The following preamble could be used to directly type Vietnamese (xetex or luatex). Template:LaTeX/Usage For a document written in this language: Template:LaTeX/Usage
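A minimal sketch combining both steps, assuming the default Latin Modern fonts (which include the Vietnamese letters) and Babel's vietnamese option:

```latex
% Compile with XeLaTeX or LuaLaTeX; save the file as UTF-8.
\documentclass{article}
\usepackage{fontspec}            % Unicode fonts; the default Latin Modern is assumed
\usepackage[vietnamese]{babel}   % for a document written in Vietnamese
\begin{document}
Tiếng Việt là ngôn ngữ chính thức của Việt Nam.
\end{document}
```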

References

  1. Engine news from the LaTeX Project
  2. Babel: The multilingual framework to localize LaTeX, LuaLaTeX, XeLaTeX
  3. The Not So Short Introduction to LaTeX, 2.5.6 Support for Cyrillic, Maksym Polyakov
  4. The Not So Short Introduction to LaTeX, Bulgarian translation
  5. Template:LaTeX/Package documentation: "the French language should now be loaded as french, not as frenchb or francais and preferably as a global option of Template:LaTeX/LaTeX. Some tolerance still exists in v3.0, but do not rely on it."

