Why is the code of character data different from the original character data after it is exported and imported with XMLExport and XMLImport? (DBMR 1917; version: DBMaker 4.X)

2016-03-08 10:41

In the Japanese version of DBMaker, XML files are encoded in Shift_JIS. Shift_JIS comes in two variants, 83pv and 90pv. If the original data is 83pv, it is translated to 90pv when it is imported into the database. This happens because DBMaker XMLImport first translates the data into Unicode internally and then translates it back to Shift_JIS for insertion, and at that point Unicode is mapped to the 90pv variant of Shift_JIS by default.

For more details, refer to: http://homepage1.nifty.com/hm7/works/AppleScript/83pvSpChar-to-90pv.text

Users can convert 83pv data to 90pv before importing it into the database, because the DBMaker 4.x series accepts only 90pv Shift_JIS in XMLImport.
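One way to do this conversion is sketched below in Python. It assumes the XML file's bytes are CP932 (Windows Shift_JIS) and that Python's built-in `cp932` codec is an acceptable stand-in for the translation: that codec decodes both the 83pv and the 90pv code of a character such as ∪ to the same Unicode code point and encodes it back to the 90pv code. The function names `normalize_to_90pv` and `normalize_file` are hypothetical and are not part of DBMaker.

```python
def normalize_to_90pv(data: bytes) -> bytes:
    """Hypothetical helper: decode CP932 bytes to Unicode and re-encode them,
    so 83pv codes such as 0x879C come back as their 90pv codes (0x81BE)."""
    return data.decode("cp932").encode("cp932")


def normalize_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: rewrite an XML file before running XMLImport."""
    # Work on raw bytes so the XML declaration (encoding="Shift_JIS")
    # and the rest of the markup pass through untouched.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalize_to_90pv(data))


if __name__ == "__main__":
    sample_83pv = bytes.fromhex("87 9C 87 9B FA 54")  # ∪ ∩ ¬ written with 83pv codes
    print(normalize_to_90pv(sample_83pv).hex(" ").upper())  # 81 BE 81 BF 81 CA
```

ASCII markup is unaffected by the round trip, so the whole file can be passed through the helper rather than only its text nodes.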

The DBMaker 4.x series adopts 90pv, following Microsoft's official announcement (http://support.microsoft.com/kb/170559/EN-US/), so it translates 83pv to 90pv.

The DBMaker 5.x series will support 83pv directly and will no longer translate it to 90pv, because 83pv is the variant adopted by Japanese Windows (CP932) and is more common than 90pv.

For example:

Suppose that a customer's original data (∈∋⊆⊇⊂⊃∪∩∧∨¬) is 83pv, with the following hex encoding: 81 B8 81 B9 81 BA 81 BB 81 BC 81 BD 87 9C 87 9B 81 C8 81 C9 FA 54.

After the data is imported into the database, the 83pv data has been translated to 90pv, and the imported data has the following hex encoding: 81 B8 81 B9 81 BA 81 BB 81 BC 81 BD 81 BE 81 BF 81 C8 81 C9 81 CA.
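The example can be reproduced outside DBMaker with a short Python sketch. It assumes that Python's `cp932` codec behaves like the decode-then-encode path of XMLImport for these characters; it is not DBMaker's own code. `round_trip` and `changed_characters` are hypothetical names, and splitting the input into 2-byte chunks relies on every character in this example being double-byte.

```python
from __future__ import annotations

# Hex strings copied from the example above.
ORIGINAL_83PV = "81 B8 81 B9 81 BA 81 BB 81 BC 81 BD 87 9C 87 9B 81 C8 81 C9 FA 54"
IMPORTED_90PV = "81 B8 81 B9 81 BA 81 BB 81 BC 81 BD 81 BE 81 BF 81 C8 81 C9 81 CA"


def round_trip(hex_bytes: str) -> str:
    """Decode Shift_JIS (CP932) bytes to Unicode, then encode them back."""
    text = bytes.fromhex(hex_bytes).decode("cp932")
    return text.encode("cp932").hex(" ").upper()


def changed_characters(hex_bytes: str) -> list[tuple[str, str, str, str]]:
    """Return (character, code point, original code, new code) for each
    character whose bytes change during the round trip."""
    raw = bytes.fromhex(hex_bytes)
    changes = []
    # Assumption: every character here is double-byte, so 2-byte chunks work.
    for i in range(0, len(raw), 2):
        original = raw[i:i + 2]
        ch = original.decode("cp932")
        reencoded = ch.encode("cp932")
        if reencoded != original:
            changes.append((ch, f"U+{ord(ch):04X}",
                            original.hex().upper(), reencoded.hex().upper()))
    return changes


if __name__ == "__main__":
    print(round_trip(ORIGINAL_83PV) == IMPORTED_90PV)
    for ch, code_point, old, new in changed_characters(ORIGINAL_83PV):
        print(f"{ch} {code_point}: {old} -> {new}")
```

Only ∪, ∩ and ¬ change; the other characters have a single code in both variants. Note that the `cp932` codec decodes ¬ to U+FFE2 (FULLWIDTH NOT SIGN); which Unicode code point DBMaker uses for ¬ is not stated here.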

This happens because DBMaker XMLImport translates the data into Unicode internally and then inserts it:

  1. XMLImport reads the 83pv data '0x879c' and translates it to Unicode 0x222a (DBMaker uses the expat parser to parse the XML file, and expat works only with Unicode data internally).
  2. When the Unicode data 0x222a is inserted into the database, it is translated back to Shift_JIS. By default, Unicode maps to the 90pv variant of Shift_JIS, so the data is stored as '0x81be', and its byte code differs from the original.
  3. Both 83pv Shift_JIS (∪, 0x879c) and 90pv Shift_JIS (∪, 0x81be) map to Unicode 0x222a, so the character looks the same in a browser, but its byte code is different.
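The three steps for the single character ∪ can be traced with the Python sketch below. It assumes Python's `cp932` codec as a stand-in for both conversions (expat's decoding in step 1 and DBMaker's Unicode-to-Shift_JIS mapping in step 2); the variable names are illustrative only.

```python
import unicodedata

OLD_83PV = bytes.fromhex("879C")  # ∪ as written in the 83pv source data
NEW_90PV = bytes.fromhex("81BE")  # ∪ as stored after the import

# Step 1: the Shift_JIS bytes are decoded to Unicode.
code_point = OLD_83PV.decode("cp932")
# Step 2: the Unicode text is encoded back to Shift_JIS for storage.
stored = code_point.encode("cp932")
# Step 3: both byte sequences decode to the same Unicode character.
same_char = OLD_83PV.decode("cp932") == NEW_90PV.decode("cp932")

print(f"U+{ord(code_point):04X} {unicodedata.name(code_point)}")  # U+222A UNION
print(stored.hex().upper())  # 81BE
print(same_char)  # True
```

Because the mapping from bytes to Unicode is many-to-one while the mapping back is one-to-one, the original 83pv code cannot be recovered after step 2.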