Showing posts with label unsupported string to database. Show all posts

Friday, May 9, 2014

Java Regex to convert string to ascii code and ascii code to string back

MainClass.java


package com.pkm;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MainClass {
    public static void main(String[] args) {
        String str1 = "(=>THAI(คุณ), HINDI:(तुम मेरी हो), HANGERIO:(تو مال منی), CHINA:(您), ARBI(أنت), FARSI(شما)";
        System.out.println("\n\nOriginal :: " + str1);

        // Encode every character except A-Z, a-z, 0-9, '-' and '_' as a decimal
        // numeric character reference such as &#40; (non-ASCII characters included).
        Pattern pattern = Pattern.compile("[^A-Za-z0-9_-]");
        StringBuffer output = new StringBuffer();
        Matcher matcher = pattern.matcher(str1);
        while (matcher.find()) {
            // codePointAt keeps characters stored as surrogate pairs in one piece.
            String rep = String.format("&#%d;", matcher.group().codePointAt(0));
            matcher.appendReplacement(output, rep);
        }
        matcher.appendTail(output);
        String encoded = output.toString();
        System.out.println("Output   :: " + encoded);

        // Decode: turn every &#NNN; reference back into its character.
        pattern = Pattern.compile("&#(\\d+);");
        matcher = pattern.matcher(encoded);
        output = new StringBuffer();
        while (matcher.find()) {
            int codePoint = Integer.parseInt(matcher.group(1));
            String decodedChar = new String(Character.toChars(codePoint));
            // quoteReplacement stops '$' and '\' being read as group references.
            matcher.appendReplacement(output, Matcher.quoteReplacement(decodedChar));
        }
        matcher.appendTail(output);
        System.out.println("Output   :: " + output.toString());
    }
}

The output is as follows:

Original :: (=>THAI(คุณ), HINDI:(तुम मेरी हो), HANGERIO:(تو مال منی), CHINA:(您), ARBI(أنت), FARSI(شما)


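Output   :: &#40;&#61;&#62;THAI&#40;&#3588;&#3640;&#3603;&#41;&#44;&#32;HINDI&#58;&#40;&#2340;&#2369;&#2350;&#32;&#2350;&#2375;&#2352;&#2368;&#32;&#2361;&#2379;&#41;&#44;&#32;HANGERIO&#58;&#40;&#1578;&#1608;&#32;&#1605;&#1575;&#1604;&#32;&#1605;&#1606;&#1740;&#41;&#44;&#32;CHINA&#58;&#40;&#24744;&#41;&#44;&#32;ARBI&#40;&#1571;&#1606;&#1578;&#41;&#44;&#32;FARSI&#40;&#1588;&#1605;&#1575;&#41;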


Output   :: (=>THAI(คุณ), HINDI:(तुम मेरी हो), HANGERIO:(تو مال منی), CHINA:(您), ARBI(أنت), FARSI(شما)
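
On Java 9 or later, the same two passes can probably be written more compactly with Matcher.replaceAll(Function). The following is only a sketch of that variant, not the code from this post: the class name RegexEntityCodec and its method names are made up for illustration, and it is assumed that the same characters (everything except A-Z, a-z, 0-9, '-' and '_') should be encoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative and not from the post.
class RegexEntityCodec {
    // Everything except A-Z, a-z, 0-9, '-' and '_' gets encoded.
    private static final Pattern TO_ENCODE = Pattern.compile("[^A-Za-z0-9_-]");
    // A decimal numeric character reference such as &#40;
    private static final Pattern REFERENCE = Pattern.compile("&#(\\d+);");

    static String encode(String text) {
        // The replacement never contains '$' or '\', so it needs no quoting.
        return TO_ENCODE.matcher(text)
                .replaceAll(m -> "&#" + m.group().codePointAt(0) + ";");
    }

    static String decode(String text) {
        return REFERENCE.matcher(text).replaceAll(m -> {
            int codePoint = Integer.parseInt(m.group(1));
            String ch = new String(Character.toChars(codePoint));
            // The decoded character may itself be '$' or '\', which
            // replaceAll would otherwise interpret, so quote it.
            return Matcher.quoteReplacement(ch);
        });
    }

    public static void main(String[] args) {
        String original = "price: $5 (คุณ)";
        String encoded = encode(original);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(original));
    }
}
```

As in the loop version, a reference with a number that is too large for an int or is not a valid code point makes decode throw, so this sketch is only suitable for strings produced by its own encode.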

Java code to convert string to ascii code and ascii code to string back

Friday, December 13, 2013

Convert string to ascii and ascii to string using JAVA

Java code to convert string to ascii code and ascii code to string back

Java Regex to convert string to ascii code and ascii code to string back



/**
 *
 * @author Pritom K Mondal
 */
public class AsciiString {
    public static void main(String[] args) {
        String inputString = "Hi, THAI(คุณ), HINDI:(तुम मेरी हो), HANGERIO:(تو مال منی), CHINA:(您), ARBI(أنت), FARSI(شما)";
        System.out.println("ORIGINAL:      " + inputString);
        String encoded = AsciiString.encode(inputString);
        System.out.println("ASCII ENCODED: " + encoded);
        String decoded = AsciiString.decode(encoded);
        System.out.println("ASCII DECODED: " + decoded);
    }
    
    public static String encode(String word) {
        StringBuilder encoded = new StringBuilder();
        for (int index = 0; index < word.length(); ) {
            // Work on code points so surrogate pairs become a single reference.
            int codePoint = word.codePointAt(index);
            // Digits, letters, space, '+', '-' and '.' are kept as they are.
            boolean keepAsIs = (codePoint >= '0' && codePoint <= '9')
                    || (codePoint >= 'A' && codePoint <= 'Z')
                    || (codePoint >= 'a' && codePoint <= 'z')
                    || codePoint == ' ' || codePoint == '+'
                    || codePoint == '-' || codePoint == '.';
            if (keepAsIs) {
                encoded.appendCodePoint(codePoint);
            } else {
                // Everything else, including '&', becomes a decimal reference.
                encoded.append("&#").append(codePoint).append(';');
            }
            index += Character.charCount(codePoint);
        }
        return encoded.toString();
    }

    public static String decode(String word) {
        StringBuilder decoded = new StringBuilder();
        for (int index = 0; index < word.length(); index++) {
            // Treat "&#" as the start of a reference only if a ';' follows it.
            int end = word.startsWith("&#", index) ? word.indexOf(';', index + 2) : -1;
            if (end != -1) {
                try {
                    int codePoint = Integer.parseInt(word.substring(index + 2, end));
                    decoded.appendCodePoint(codePoint);
                    index = end; // the loop increment then steps past the ';'
                    continue;
                } catch (IllegalArgumentException ex) {
                    // Not a number or not a valid code point: copy the '&' literally.
                }
            }
            decoded.append(word.charAt(index));
        }
        return decoded.toString();
    }
}

The output of the above program is as follows:


ORIGINAL:      Hi, THAI(คุณ), HINDI:(तुम मेरी हो), HANGERIO:(تو مال منی), CHINA:(您), ARBI(أنت), FARSI(شما)
ASCII ENCODED: Hi&#44; THAI&#40;&#3588;&#3640;&#3603;&#41;&#44; HINDI&#58;&#40;&#2340;&#2369;&#2350; &#2350;&#2375;&#2352;&#2368; &#2361;&#2379;&#41;&#44; HANGERIO&#58;&#40;&#1578;&#1608; &#1605;&#1575;&#1604; &#1605;&#1606;&#1740;&#41;&#44; CHINA&#58;&#40;&#24744;&#41;&#44; ARBI&#40;&#1571;&#1606;&#1578;&#41;&#44; FARSI&#40;&#1588;&#1605;&#1575;&#41;
ASCII DECODED: Hi, THAI(คุณ), HINDI:(तुम मेरी हो), HANGERIO:(تو مال منی), CHINA:(您), ARBI(أنت), FARSI(شما)
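
The point of the encoded line is that it contains only 7-bit ASCII characters, while the original does not. A minimal sketch of how that could be checked, using the start of the output above as sample data; the class name AsciiCheck and the method isPureAscii are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of AsciiString.
class AsciiCheck {
    // True if every character of the text can be represented in US-ASCII.
    static boolean isPureAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String original = "Hi, THAI(คุณ)";
        String encoded = "Hi&#44; THAI&#40;&#3588;&#3640;&#3603;&#41;";
        System.out.println(isPureAscii(original)); // false
        System.out.println(isPureAscii(encoded));  // true
    }
}
```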

Now create an HTML file named 'index.html' with the following content:


<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <title>String and ASCII</title>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    </head>
    <body>
        <div>ORIGINAL:      Hi, THAI(คุณ), HINDI:(तुम मेरी हो), HANGERIO:(تو مال منی), CHINA:(您), ARBI(أنت), FARSI(شما)</div>
        <div>ASCII ENCODED: Hi&#44; THAI&#40;&#3588;&#3640;&#3603;&#41;&#44; HINDI&#58;&#40;&#2340;&#2369;&#2350; &#2350;&#2375;&#2352;&#2368; &#2361;&#2379;&#41;&#44; HANGERIO&#58;&#40;&#1578;&#1608; &#1605;&#1575;&#1604; &#1605;&#1606;&#1740;&#41;&#44; CHINA&#58;&#40;&#24744;&#41;&#44; ARBI&#40;&#1571;&#1606;&#1578;&#41;&#44; FARSI&#40;&#1588;&#1605;&#1575;&#41;</div>
        <div>ASCII DECODED: Hi, THAI(คุณ), HINDI:(तुम मेरी हो), HANGERIO:(تو مال منی), CHINA:(您), ARBI(أنت), FARSI(شما)</div>
    </body>
</html>

Now open the file in a browser.

In the browser all three lines look the same: the original string, the string after ASCII conversion (the browser renders each &#NNNN; reference as the character it stands for), and the string converted back again. So why convert them at all?
  • The stored value contains only plain ASCII characters, so it is easy to maintain.
  • It is easy to insert into and read from a database.
  • You do not have to worry about which characters the database's character set can store.
  • If you create XML from the data, you do not need to worry about which characters the XML file's encoding supports, especially when you transport the data through an API server.
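
The XML point can be illustrated with a small round trip: the encoded text is placed inside an element of an ASCII-only XML document, and a standard parser turns the references back into the original characters. This is a sketch under assumptions, not part of the original program: the class XmlTransportSketch, the method readBack and the element name message are invented for the example, and encode re-implements the same keep-list as AsciiString.encode so that the block is self-contained. One known limit: control characters such as &#1; are not legal in XML 1.0 even as references, so input containing them would be rejected by the parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical example class; names are illustrative only.
class XmlTransportSketch {
    // Same rule as AsciiString.encode: keep digits, letters, ' ', '+', '-', '.'.
    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        word.codePoints().forEach(cp -> {
            boolean plain = (cp >= '0' && cp <= '9') || (cp >= 'A' && cp <= 'Z')
                    || (cp >= 'a' && cp <= 'z')
                    || cp == ' ' || cp == '+' || cp == '-' || cp == '.';
            if (plain) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    // Wraps the encoded text in an ASCII-only XML document and parses it back.
    static String readBack(String encoded) {
        String xml = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>"
                + "<message>" + encoded + "</message>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.US_ASCII)));
            // The parser has already resolved every &#NNN; reference.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception ex) {
            throw new IllegalStateException("XML could not be parsed", ex);
        }
    }

    public static void main(String[] args) {
        String original = "a < b & c (คุณ)";
        String received = readBack(encode(original));
        System.out.println(received.equals(original));
    }
}
```

Because '<' and '&' are encoded as well, the text needs no further XML escaping before it is placed inside the element.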