Chinese Missing Characters in Digital Archive Systems ~ Web Techniques for Handling Missing Characters

Presenter: 林金龍

Outline
• Composition expression (missing character) input
• Database processing
• Composition expression (missing character) display

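Composition expression (missing character) input
• Chinese Character Structure Database (漢字構形資料庫)
• Missing-character lookup system (Web)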

Chinese Character Structure Database (漢字構形資料庫)
1. Query
2. Query results
3. Composition expression

Missing-character lookup system (Web)
1. Query
2. Query results
3. Composition expression
4. Quick copy to the clipboard

Missing-character lookup system (Web)

Database processing
• Archive system encoding
  • Big5
• Database encoding
  • Big5
• Converting composition expressions to Unicode escape sequences
  • JavaBean

Converting composition expressions to Unicode escape sequences

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Input: a string whose chars each carry one raw Big5 byte (decoded as ISO-8859-1).
public String getCode(String OriStr) {
    StringBuilder sb = new StringBuilder();
    // ISO-8859-1 maps each char back to exactly one byte, recovering the Big5 bytes.
    // StandardCharsets avoids the checked UnsupportedEncodingException.
    byte[] byteOriStr = OriStr.getBytes(StandardCharsets.ISO_8859_1);
    for (int k = 0; k < byteOriStr.length; k++) {
        if (byteOriStr[k] < 0 && k < byteOriStr.length - 1) {
            // Lead byte >= 0x80: not an ASCII letter or digit
            int hiByte = byteOriStr[k] & 0xFF;      // convert to an unsigned value
            int loByte = byteOriStr[k + 1] & 0xFF;
            int big5code = hiByte * 256 + loByte;   // Big5 code
            if (hiByte == 0xF9 && loByte > 0xD5 && loByte < 0xFF) {
                // ETen extension characters and special symbols
                sb.append(getETCode(loByte));
            } else if ((big5code >= 0x8140 && big5code <= 0x8DFE)
                    || (big5code >= 0x8E40 && big5code <= 0xA0FE)
                    || (big5code >= 0xC6A1 && big5code <= 0xC8FE)
                    || (big5code >= 0xFA40 && big5code <= 0xFEFE)) {
                // Chinese user-defined character area
                sb.append(getMissCharCode(hiByte, loByte));
            } else {
                // Everything else: an ordinary Chinese character.
                // Decode explicitly as Big5 rather than with the platform default charset.
                byte[] dbcs = {byteOriStr[k], byteOriStr[k + 1]};
                sb.append(new String(dbcs, Charset.forName("Big5")));
            }
            k++; // skip the trail byte that was just consumed
        } else {
            // ASCII letters and digits (masked so a stray high byte stays in 0x80-0xFF)
            sb.append((char) (byteOriStr[k] & 0xFF));
        }
    }
    return sb.toString();
}
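The method above depends on the Big5 bytes surviving a round trip through an ISO-8859-1 string. The following is a minimal sketch of that assumption, not part of the original slides; the class name `Big5BytesDemo` and its helper methods are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Big5BytesDemo {
    /** Recovers the raw bytes from a string that was decoded as ISO-8859-1. */
    static byte[] rawBytes(String latin1Text) {
        return latin1Text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Combines a lead byte and a trail byte into a 16-bit Big5 code. */
    static int big5Code(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    public static void main(String[] args) {
        // Two bytes of one Big5 character (0xA4A4), as they might arrive from the database.
        byte[] original = {(byte) 0xA4, (byte) 0xA4};
        // Decoding as ISO-8859-1 turns each byte into one char without loss.
        String carried = new String(original, StandardCharsets.ISO_8859_1);
        byte[] recovered = rawBytes(carried);
        System.out.printf("%04X%n", big5Code(recovered[0], recovered[1])); // prints A4A4
    }
}
```

Because Java bytes are signed, both bytes are masked with `0xFF` before they are combined, which is the same step the slide performs when it converts `hiByte` and `loByte` to positive values.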

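Converting composition expressions to Unicode escape sequences (cont. 1)

// Maps the trail byte of an ETen extension code (lead byte 0xF9) to its character.
private String getETCode(int loByte) {
    String ETCode = "";
    switch (loByte) {
        case 0xD6:
            ETCode = "碁";
            break;
        case 0xD7:
            ETCode = "銹";
            break;
        // ．．． (intermediate cases elided on the original slide)
        case 0xFE:
            ETCode = "▓";
            break;
        default:
            break; // unknown trail byte: return the empty string
    }
    return ETCode;
}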

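Converting composition expressions to Unicode escape sequences (cont. 2)

// Converts a Big5 user-defined code to an HTML numeric character reference.
private String getMissCharCode(int hiByte, int loByte) {
    int big5code = hiByte * 256 + loByte;
    int MISS_CHAR_BASE = 0;   // offset from the first Big5 code of the block to its Unicode start
    int MISS_CHAR_START = 0;  // first lead byte of the block
    if (hiByte >= 0x81 && hiByte <= 0x8D) {
        MISS_CHAR_BASE = 0x6D78;
        MISS_CHAR_START = 0x81;
    } else if (hiByte >= 0x8E && hiByte <= 0xA0) {
        MISS_CHAR_BASE = 0x54D1;
        MISS_CHAR_START = 0x8E;
    } else if (hiByte >= 0xC6 && hiByte <= 0xC8) {
        MISS_CHAR_BASE = 0x3032;
        MISS_CHAR_START = 0xC6;
    } else if (hiByte >= 0xFA && hiByte <= 0xFE) {
        MISS_CHAR_BASE = -0x1A40;
        MISS_CHAR_START = 0xFA;
    }
    // Each lead byte spans 256 codes but only 157 valid trail bytes,
    // so subtract 256 - 157 = 0x63 for every row after the first.
    int offset = MISS_CHAR_BASE - 0x63 * (hiByte - MISS_CHAR_START);
    if (loByte >= 0xA1 && loByte <= 0xFE) {
        offset = offset - 0x22; // skip the unused trail bytes 0x7F-0xA0
    }
    int unicode = big5code + offset;
    return "&#x" + Integer.toHexString(unicode).toUpperCase() + ";";
}
} // end of the enclosing JavaBean class (its declaration is not shown on the slides)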
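The offset arithmetic in the method above can also be read as "row index times 157, plus position within the row". The sketch below restates it that way so the boundary values can be checked by hand. It is an illustration rather than the presenter's code: the class `Big5PuaDemo` and its methods are hypothetical names, and the block start points (U+E000, U+E311, U+EEB8, U+F6B1) are derived from the constants on the slide.

```java
public class Big5PuaDemo {
    /** Valid trail bytes per lead byte: 0x40-0x7E (63) plus 0xA1-0xFE (94). */
    static final int CELLS_PER_ROW = 157;

    /** Position of a trail byte within its row, skipping the 0x7F-0xA0 gap. */
    static int cellIndex(int lo) {
        return (lo <= 0x7E) ? lo - 0x40 : lo - 0xA1 + 63;
    }

    /** Maps a Big5 user-defined code to its Unicode code point, or -1 if outside those areas. */
    static int toPua(int hi, int lo) {
        boolean validTrail = (lo >= 0x40 && lo <= 0x7E) || (lo >= 0xA1 && lo <= 0xFE);
        if (!validTrail) return -1;
        int firstRow, start, firstCell = 0;
        if (hi >= 0xFA && hi <= 0xFE) { firstRow = 0xFA; start = 0xE000; }
        else if (hi >= 0x8E && hi <= 0xA0) { firstRow = 0x8E; start = 0xE311; }
        else if (hi >= 0x81 && hi <= 0x8D) { firstRow = 0x81; start = 0xEEB8; }
        else if (hi >= 0xC6 && hi <= 0xC8) {
            if (hi == 0xC6 && lo < 0xA1) return -1; // this block begins at 0xC6A1
            firstRow = 0xC6; start = 0xF6B1; firstCell = 63;
        } else {
            return -1;
        }
        return start + (hi - firstRow) * CELLS_PER_ROW + cellIndex(lo) - firstCell;
    }

    /** Formats the code point as an HTML numeric character reference. */
    static String toReference(int hi, int lo) {
        int codePoint = toPua(hi, lo);
        if (codePoint < 0) throw new IllegalArgumentException("not a user-defined Big5 code");
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    public static void main(String[] args) {
        System.out.println(toReference(0xFA, 0x40)); // prints &#xE000;
        System.out.println(toReference(0xC6, 0xA1)); // prints &#xF6B1;
    }
}
```

For example, Big5 0xC740 gives 0xF6B1 + 1 × 157 + 0 − 63 = 0xF70F, which matches the slide's formula: 0xC740 + 0x3032 − 0x63 = 0xF70F.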

Composition expression (missing character) display
• Web page handling
  • Java Applet
  • JavaScript
  • JavaBean
• Missing-character image generator
  • CGI
  • C#

Q & A
