Passing Arabic Parameters from a Web Page to Oracle Reports Shows Question Marks (??)

Dec 17,08

There are three fundamental facts to consider when working with Arabic (or any non-ASCII characters) in a Java and J2EE environment.

1-     All String values are stored as Unicode in Java.

2-     When an HTML form is submitted, the browser encodes every non-ASCII character as the hexadecimal value of its bytes (%XX), and that is how the character appears in the URL. The browser therefore needs to know which encoding is being used in order to do the URL encoding properly.

3-     When the web container receives the request, it assumes by default that the encoding is ISO-8859-1.

 

Overlooking any one of these facts will turn your Arabic characters into unreadable characters or even question marks (????).
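To see where the question marks come from, here is a minimal sketch. It assumes the failing layer behaves like a component that only understands ISO-8859-1; the class and method names are hypothetical and used only for illustration. Arabic letters have no ISO-8859-1 byte, so Java substitutes the replacement byte for '?'.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Pushes text through ISO-8859-1 and back, the way a Latin-1-only layer would.
    // Characters that ISO-8859-1 cannot represent are replaced by '?' (0x3F).
    static String throughLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String arabic = "\u0623\u062d\u0645"; // three Arabic letters
        System.out.println(throughLatin1(arabic)); // prints ???
        System.out.println(throughLatin1("ABC"));  // prints ABC
    }
}
```

Once the characters have been replaced by '?', the original text cannot be recovered, which is why the encoding has to be right at every layer.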

My environment is a J2EE application based on Oracle ADF with JSF and Oracle Reports Services.

The scenario is rather simple:

There is a JSF page that takes a parameter, and the parameter value is in Arabic.

When a button is pressed, an Oracle Report is called and the parameter is passed by composing the appropriate URL programmatically.

The problem

Even though the first fact above is well understood, the parameter string appears in the URL as ????. The report then runs after binding its query to the literal string ????, and as a result returns no data.

 The code

The button is associated with an action listener containing the following code:

       String query=getInputText1().getValue().toString();

       String url = "http://asdb.realsoft.com:7781/reports/rwservlet?destype=cache&desformat=HTML&report=MODULE1.rdf&userid=scott/tiger@orclstc&d1="+query;

                 HttpServletResponse response = (HttpServletResponse) FacesContext.getCurrentInstance().getExternalContext().getResponse();

         FacesContext.getCurrentInstance().getExternalContext().redirect(url) ;

 The Solution

First of all, you want to make sure that your browser understands that your page is an Arabic page with the appropriate code page. One simple way of doing that is to make sure that your JSF page (or JSP, for that matter) contains the appropriate page directive:

<%@ page contentType="text/html;charset=windows-1256" ….

This is equivalent to setting the encoding from the browser menu: View → Encoding → Arabic (Windows).

The first serious problem in our code is that it violates the second fact above. Since we are constructing the URL ourselves and our parameter string is in Arabic, we are responsible for encoding the non-ASCII characters to their hex equivalents.

Fix

String query=URLEncoder.encode(getInputText1().getValue().toString(),"cp1256");

This converts the value of the input parameter to its Cp1256 (Arabic Windows) bytes and writes them in %XX form. (Note: the two-argument URLEncoder.encode(String, String) is available in JDK 1.4 and above.)

 

Running the page and pressing the button confirmed that the query string is now encoded correctly according to the 1256 code page, with each byte shown in %XX format. Of course, if the parameter is plain ASCII, things work normally anyway.
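As a rough illustration of the encoding step, here is a small self-contained sketch. The class name, the helper buildReportUrl, and the host reports.example.com are hypothetical and not part of the original application; "windows-1256" is the standard name for the same code page that Java also accepts as "Cp1256".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ReportUrlBuilder {

    // Appends one parameter to a report URL, percent-encoding its value
    // with the Arabic Windows code page (hypothetical helper).
    static String buildReportUrl(String baseUrl, String paramName, String value)
            throws UnsupportedEncodingException {
        return baseUrl + "&" + paramName + "=" + URLEncoder.encode(value, "windows-1256");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String base = "http://reports.example.com/reports/rwservlet?report=MODULE1.rdf";
        String arabic = "\u0623\u062d\u0645"; // three Arabic letters
        // prints http://reports.example.com/reports/rwservlet?report=MODULE1.rdf&d1=%C3%CD%E3
        System.out.println(buildReportUrl(base, "d1", arabic));
    }
}
```

Each Arabic letter becomes a single %XX byte here because Windows-1256 is a single-byte encoding; with UTF-8 the same letters would take two bytes each.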

The problem, however, is not fully solved. The report prints its user parameters for debugging purposes, and the parameter is again printed as ???? in the report. So fixing the URL encoding does not resolve the problem on its own; it seems that the Reports servlet itself is not interpreting the parameter correctly.

This turns out to be true, because now we run into the third fact. The report runs as a servlet in the web container, and as such assumes by default that all data transmitted in the URL is encoded in ISO-8859-1 (Latin-1, the Western European encoding that Windows code page 1252 closely follows). When the report reads the parameters, the bytes do not map to the intended Arabic characters in that encoding, and the values are rendered as ????.

Some research revealed that if you are writing JSP or servlet code, you need to do some work to convert the string back to 1256.

For example

If you were using the HttpServletRequest.getParameter() API, all embedded %XX data in the input text would be decoded, the decoded bytes converted from ISO-8859-1 to Unicode, and the result returned as a Java string. That string is incorrect if the encoding of the HTML form is not ISO-8859-1. You can work around this by converting the form input data yourself: when the JSP or servlet receives form input, convert the string back to its original bytes, and then convert those bytes to a Java string using the correct encoding.

 

public static String convertFrom1256(String s) {

  String out = null;

   try {

     out = new String(s.getBytes("ISO-8859-1"), "Cp1256");

   } catch (java.io.UnsupportedEncodingException e) {

     return null;

   }

   return out;

}
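The following sketch simulates the whole round trip without a servlet container, under the assumption that the container decodes the raw parameter bytes as ISO-8859-1. The class and method names are hypothetical; the repair step is the same conversion as convertFrom1256 above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisdecodeRoundTrip {

    static final Charset ARABIC = Charset.forName("windows-1256");

    // Assumed container behavior: raw request bytes are read as ISO-8859-1.
    static String containerDecode(byte[] rawBytes) {
        return new String(rawBytes, StandardCharsets.ISO_8859_1);
    }

    // Recovers the original bytes and decodes them with the real encoding.
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), ARABIC);
    }

    public static void main(String[] args) {
        String original = "\u0623\u062d\u0645";      // three Arabic letters
        byte[] wire = original.getBytes(ARABIC);     // what the browser sends: C3 CD E3
        String garbled = containerDecode(wire);      // Latin-1 letters, not Arabic
        String repaired = repair(garbled);
        System.out.println(repaired.equals(original)); // prints true
    }
}
```

The repair works only because ISO-8859-1 maps every byte value to exactly one character, so no information is lost in the wrong decoding step. It would not work if the text had already been turned into question marks.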

That would work if my own code were handling the request, but in my case it is the Reports server that handles the URL, presumably through the HttpServletRequest.getParameter() API. How, then, can it do something like the convertFrom1256 method? And even if it has such a method, how can it know which code page the client is using for Arabic?

 Further research revealed that there is a file called

 $ORACLE_HOME/reports/conf/rwservlet.properties

 You need to edit the file and add

 DEFAULTCHARSET=Cp1256

Then restart the Reports OC4J instance, and you will be able to pass Arabic parameters correctly through the parameter form.

Conclusion: passing Arabic characters in the URL requires a thorough understanding of how Java and the Java web environment work with non-ASCII characters (Arabic or otherwise), and the problem can be multi-layered.

Since the request goes through several layers, any one layer that is not set up correctly will corrupt your Arabic characters.

Note: if you are doing some testing and would like to assign static Arabic data to a string, you should write it with Unicode escapes:

    String original = "A" + "\u0623" + "\u062d" + "\u0645" + "C";

    try {
        // Encode to Windows-1256 bytes, then decode back to a Java (Unicode) string
        byte[] by1256 = original.getBytes("cp1256");
        printBytes(by1256, "by1256");
        String aParUnicode = new String(by1256, "cp1256");
        System.out.println(aParUnicode);
    } catch (Exception e) {
        e.printStackTrace();
    }

  I used the following to peek into the bytes in order to make sure that I am not losing the Arabic data at intermediate steps

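 public static void printBytes(byte[] array, String name) {
     System.out.print(name + " = ");
     for (int k = 0; k < array.length; k++) {
         // Mask with 0xFF so bytes above 0x7F print as unsigned hex, not negative decimals
         System.out.print(String.format("0x%02X ", array[k] & 0xFF));
     }
     System.out.println();
 }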

Lessons Learned

1) I spent 8 hours over two nights (4 hours per night) after work researching and solving this issue. Things are not necessarily made to be easy in our world, but fortunately life has also taught us that “where there is a WILL there is a WAY”.

 "Greatness is earned in proportion to toil, and whoever seeks the heights stays awake through the nights."

And the pressing question now is: what is your interpretation of "the heights"?

2) Make the keys that you pass as parameters numeric or ASCII by design.