globalization / cyrillic sign in database

With the problem to store cyrillic sign in the database and display these on the client, I found an article, describing the things behind very well:

SOURCE: http://www.erpassist.com/groups/groups.asp?v=oracle-db-l&i=262575 - found on 20.07.2005

**** begin ****
Posted By Nishant KAVI On Wednesday, May 21, 2003 at 3:55 AM

Content

1. Key concepts
2. How to set up my NLS_LANG
3. The correct NLS_LANG for my Windows ANSI Code Page
4. The correct NLS_LANG for my DOS / Command Prompt OEM Code Page
5. How to check the NLS_LANG
6. List of common NLS_LANG to be set in Windows registry
7. List of common character sets to be used in a command prompt

1. Key Concepts

1.1 Windows and Dos Code Pages:
-------------------------------
On Windows systems, the encoding scheme is specified by a code page. Code pages are defined to support specific languages or groups of languages which share common writing systems. or example, code page 1253 provides character codes required in the Greek writing system.
To sum up, a code page is the name for a Windows or DOS encoding schema. Usually, in non Chinese-Japanese-Korean environments, Windows and DOS command rompt do not use the same code page.
From Oracle point of view the terms Codepage and Characterset mean the same.

1.2 Font:
---------
A font is a collection of glyphs (from "hieroglyphs") that share common appearances (typeface, character size). A font is used by the operating system to convert a numeric value into a graphical representation on screen.
A font does not necessary contain a graphical representation for all numeric values defined in the code page you are using.
That's why you get sometimes black squares on the screen if you change fonts. The Windows "Character Set Map" utility can be used to see which glyphs are part of a certain font.
On Windows 2000:
Start -> Programs -> Accessories -> System Tools -> Character Map
or Start -> Run... Type "charmap", and click "ok"
A font implements a particular code page or set of code pages. For example, the Arial font implements the code pages 1252, 1250, 1251, 1253, 1254, 1257.

1.3 Character Set:
------------------
A character set defines characters to be placed in a particular order and with a particular character code. In other words, a character set is just an agreement on what numeric value a symbol has.

1.4 ORACLE Database Characterset:
---------------------------------
An Oracle database has to be created with a character set that supports ALL symbols used by all various clients.

1.5 ORACLE client Characterset:
-------------------------------
The Oracle client character set is defined by the third part of the NLS_LANG parameter. It has almost nothing to do with the database character set in use.
On Oracle client side, the encoding scheme used by Oracle MUST MATCH the code page used by the OS I/O system.
This means that you have to make sure that the Oracle client character set always fits to the Windows code page if using the GUI, and to the DOS code page when using the DOS command prompt. If not, you will end up with corrupt data.
After setting NLS_LANG, the Oracle software correctly translates any symbols to the character code values of the database character set when storing data into the database or retrieving data from the database.
The Oracle client character set can be, and in many cases, has to be different from the database character set.

1.6 How Windows uses Fonts to display the different charactersets:
------------------------------------------------------------------
We assume you have an UTF8 database with correctly stored UTF8 codepoints.
On Windows there are two kinds of tools / applications:
1)A fully Unicode enabled applications which accepts Unicode codepoints and which can render them. Note that it's the Application that needs to deal with the Unicode, Windows just provides the unicode API but the GUI system itself is NOT Unicode "by nature".
A fully Unicode application can only show one glyph for a given Unicode code point. So there is NO confusion possible here, this application will need to use a full unicode font. If you have a full unicode application, then you need to set the NLS_LANG to UTF8.
Note that there are currently NOT many applications like this and if it's not explicitly mentioned by the vendor it's most likely an ANSI application (see below). So DON'T set the NLS_LANG to UTF8 if you are not sure!
The only Unicode capable client that is included in the database is iSQLPLus.

2) An standard ANSI application (like sqlplusw.exe) cannot use Unicode code points. So the Unicode code point stored in the database needs to be CONVERTED to a ANSI code point. This is done by setting NLS_LANG This allows oracle to map the unicode point to the characterset of the client, (and here comes the tricky part) this is NOT the same as a font. If you want to display Arabic for example then you need to set the Windows characterset to Arabic. That way Windows knows what are valid codepoints and can use the FONT engine to DISPLAY the codepoints (this results in glyphs).
Windows passes the codepoint and the "page" to the rendering engine. This "page" defines the glyphs for the codepoints for a certain characterset/codepage.
Because there are only 256 possible positions for a ANSI application, and one font contains normally glyphs for different languages this "page" is used to select from a FONT that has (for example) all the glyphs for Cyrillic, Arabic and West-European the "page" for arabian.
So lets say you have a Arabic setup that works, you change manually the "Page" of a FONT and ask to display the glyph for ANSI codepoint XX. Now 1 of 2 things can happen:
1) There is a character defined on that position for the CHARACTERSET of that "Page", so the creator of the font has forseen a glyph and this is displayed
(but this is NOT the character expected or wanted as its stored as a different character in the database!).
2) There is NO character defined on that position for the CHARACTERSET of that "Page" so the creator of the font has NOT forseen a glyph and you get "garbage" or black squares (normally you should see a black square but a ? or ¿ are also possible, this depends on the error handling defined in the FONT).
The above is also possible if you have an non-Unicode characterset for the database.


2. How to setup my NLS_LANG:

To specify the locale behaviour of your client Oracle software, you have to set your NLS_LANG parameter. It sets the language, territory and also the character set of your client. For a short overview, use the following format:
NLS_LANG = LANGUAGE_TERRITORY.CHARACTERSET
where:
LANGUAGE specifies:
- language used for Oracle messages,
- day names and month names

TERRITORY specifies:
- monetary and numeric formats,
- territory and conventions for calculating week and day numbers

CHARACTERSET:
- controls the character set used by the client application, normally THAT MATCHES your Windows code page


2.1 In the Registry:
--------------------
On Windows systems, you should make sure that you have set an NLS_LANG registry subkey for each of your Oracle Homes:
You can easily modify this subkey with the Windows Registry Editor:
Start -> Run...
Type "regedit", and click "ok"
Edit the following registry entry:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEID\ where ID is the unique number identifying the Oracle home.
There you have an entry with as name NLS_LANG When starting an Oracle tools, like sqlplusw, it will read the content of
the oracle.key file located in the same directory to determine which registry tree will be used, therefore which NLS_LANG subkey will be used.

2.2 As a System or User Environment Variable, in System properties:
-------------------------------------------------------------------
Although the Registry is the primary repository for environment settings, it is not the only place where environment parameters can be set.
Even if not at all recommended, you can set the NLS_LANG as a System or User Environment Variable in the System properties. This setting will be used for ALL Oracle homes. To check and modify them:
Right-click the 'My Computer' icon -> 'Properties'
Select the 'Advanced' Tab -> Click on 'Environment Variables'
The 'User Variables' list contains the settings for the specific OS user currently logged on and the 'System varaibles' keeps environment-wide variables.
Since these environment variables take precedence of the parameters already set in your Registry, you should not set Oracle parameters at this location unless you have a very good reason.
Particularly not the "ORACLE_HOME" parameter that is set on unix but NOT on windows.

2.3 As an Environment variable defined in the command prompt:
-------------------------------------------------------------
If you set the NLS_LANG as an environment variable in a Command prompt, be aware that it will overrite the current NLS_LANG setting in the Registry and also the System Properties.
In an MS-DOS command prompt, use the set command, for example:
C:\> set NLS_LANG=american_america.WE8PC850
The NLS_LANG parameter is never inherited from the server.
If not set, NLS_LANG defaults to american_america.US7ASCII.


3. The correct NLS_LANG for my Windows ANSI Code Page

3.1 Determine your Windows ANSI code page:
------------------------------------------
Firstly, you have to determine your current Windows ANSI Code Page (ACP).
You'll find its value in the registry:
Start -> Run...
Type "regedit", and click "ok"
Browse the following registry entry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\
There you have (all the way below) an entry with as name ACP
The value of ACP is your current GUI Codepage, see the table in point 3.2 for the mapping to the oracle name. OEMCP shows the command line codepage>
Since there are many registry entries with very similar names, please make sure that you are looking at the right place in the registry.
Additionally, the following URL provides a list of the default code pages for all Windows versions:
http://www.microsoft.com/globaldev/reference/ (under the REFERENCE tab on the left of the page)
OEM = the command line codepage, ANSI = the gui codepage


3.2 Find the correspondent Oracle client character set:
-------------------------------------------------------
After, find the Oracle client character set in the table below based on the ACP you got above.
ANSI CodePage (ACP) Oracle Client character set (3rd part of NLS_LANG)
1250 EE8MSWIN1250
1251 CL8MSWIN1251
1252 WE8MSWIN1252
1253 EL8MSWIN1253
1254 TR8MSWIN1254
1255 IW8MSWIN1255
1256 AR8MSWIN1256
1257 BLT8MSWIN1257
1258 VN8MSWIN1258
874 TH8TISASCII
932 JA16SJIS
936 ZHS16GBK
949 KO16MSWIN949
950 ZHT16MSWIN950
others UTF8
You can use UTF8 as Oracle client character set on Windows NT/2000/XP but you will be limited to use only client programs that explicitly support this configuration:
This is because the user interface of Win32 is not UTF8, therefore the client programs have to perform explicit conversions between UTF8 (used on Oracle side) and UTF16 (used on Win32 side).
An example of such a program is Oracle Forms in version 5 and later on NT 4.0. or iSQLplus (from 817 onwards).
From the other side, programs relying on ANSI/OEM Win32 API, like SQL*Plus, Oracle Forms on Win9x, Export/Import, command-line tools, etc. cannot work with an UTF8 Oracle client character set.
IF you need to display unicode then you might want to use iSQLPLUS, the browser based version of sqlplus.

3.3 Set it in your Registry:
----------------------------
Use the Windows Registry Editor to set up the NLS_LANG subkeys in all your Oracle Homes with the value you have just find above.
Section 2.1 gives you more details on how to use the Registry Editor for that purpose.


4. The correct NLS_LANG for my DOS / Command Prompt OEM Code Page

MS-DOS mode uses, with a few exceptions (CJK), a different code page (called OEM code page) than Windows GUI (ANSI code page).
Meaning that before using an Oracle command line tool such as SQL*Plus (sqlplus.exe/ plus80.exe / plus33.exe ) en svrmgrl in a command prompt then you need to MANUALLY SET the NLS_LANG parameter as an environment variable with the set DOS command BEFORE using the tool.
(CJK) For Japanese, Korean, Simplified Chinese, and Traditional Chinese, the MS-DOS OEM code page is identical to the ANSI code page meaning that, in this particular case, there is no need to set the NLS_LANG parameter in MS-DOS mode.
In all other cases, you need to set it in order to overwrite the NLS_LANG registry key already matching the ANSI code page. The new "MS-DOS dedicated" NLS_LANG needs to match the MS-DOS OEM code page that could be retrieved by typing chcp in a Command Prompt:
C:\> chcp
Active code page: 437

C:\> set NLS_LANG=american_america.US8PC437
If the NLS_LANG parameter for the MS-DOS mode session is not set appropriately, error messages and data can be corrupted due to incorrect character set conversion.
Use the following list to find the Oracle character set that fits to your MS-DOS code page in use on your locale system:
MS-DOS code page Oracle Client character set (3rd part of NLS_LANG)
437 US8PC437
737 EL8PC737
850 WE8PC850
852 EE8PC852
857 TR8PC857
858 WE8PC858
861 IS8PC861
865 N8PC865
866 RU8PC866
For tools like sqlloader you need to set the NLS_LANG to the characterset of the FILE you loading.


5. How to check the NLS_LANG:

To check the NLS_LANG, you need to open a command prompt and to run sqlplus in command line mode.
First, check if it's set in the environment:
SQL> host echo %NLS_LANG%
If this reports just %NLS_LANG% back, the variable is not set in the environment. If it's set it reports something like
ENGLISH_UNITED KINGDOM.WE8PC850
If NLS_LANG is not set in the enviroment, you should check the value in the registry:
SQL> @.[%NLS_LANG%].
If you get something like:
unable to open file ".[ENGLISH_UNITED KINGDOM.WE8ISO8859P1]." the "file name" between the '[]' is the value of the registry parameter.
If you get this as result:
unable to open file ".[%NLS_LANG%]." then the parameter NLS_LANG is also not set in the registry.
Note: the @.[%NLS_LANG%]. "trick" reports the NLS_LANG known by the sqlplus executable, it will not read the registry itself.
But then you are not sure if the variable is set in the enviroment or in the registry. That's the reason of checking with the host commando first.

**** end ****

Kommentare

Beliebte Posts aus diesem Blog

PGA unter Oracle 11g

trunc(sysdate) - nette Spiele mit dem Datum

Datapump - Verzeichnis erstellen