Package loci.poi.util

Class StringUtil


  • public class StringUtil
    extends Object
    Title: String Utility
    Description: Collection of string-handling utilities
    Since:
    May 10, 2002
    Version:
    1.0
    Author:
    Andrew C. Oliver, Sergei Kozello (sergeikozello at mail.ru), Toshiaki Kamoshida (kamoshida.toshiaki at future dot co dot jp)
    • Method Summary

      Modifier and Type   Method and Description
      static String       format(String message, Object[] params)
                          Apply printf()-like formatting to a string.
      static String       getFromCompressedUnicode(byte[] string, int offset, int len)
                          Read 8-bit data (in the ISO-8859-1 code page) into a (Unicode) Java String and return it.
      static String       getFromUnicodeBE(byte[] string)
                          Given a byte array of 16-bit Unicode characters in big-endian format (most significant byte first), return a Java String representation of it.
      static String       getFromUnicodeBE(byte[] string, int offset, int len)
                          Given a byte array of 16-bit Unicode characters in big-endian format (most significant byte first), return a Java String representation of it.
      static String       getFromUnicodeLE(byte[] string)
                          Given a byte array of 16-bit Unicode characters in little-endian format (least significant byte first), return a Java String representation of it.
      static String       getFromUnicodeLE(byte[] string, int offset, int len)
                          Given a byte array of 16-bit Unicode characters in little-endian format (least significant byte first), return a Java String representation of it.
      static String       getPreferredEncoding()
                          Returns the preferred encoding, currently ISO-8859-1.
      static boolean      hasMultibyte(String value)
                          Checks whether the string contains at least one multibyte character.
      static boolean      isUnicodeString(String value)
                          Checks whether a given String needs to be represented as Unicode.
      static void         putCompressedUnicode(String input, byte[] output, int offset)
                          Takes a Unicode (Java) string and writes it as 8-bit data (in the ISO-8859-1 code page) into the supplied byte array.
      static void         putUnicodeBE(String input, byte[] output, int offset)
                          Takes a Unicode string and writes it as big-endian (most significant byte first) bytes into the supplied byte array.
      static void         putUnicodeLE(String input, byte[] output, int offset)
                          Takes a Unicode string and writes it as little-endian (least significant byte first) bytes into the supplied byte array.
    • Method Detail

      • getFromUnicodeLE

        public static String getFromUnicodeLE(byte[] string,
                                              int offset,
                                              int len)
                                       throws ArrayIndexOutOfBoundsException,
                                              IllegalArgumentException
        Given a byte array of 16-bit Unicode characters in little-endian format (least significant byte first), return a Java String representation of it. For example, the bytes { 0x16, 0x00 } decode to the single character 0x16.
        Parameters:
        string - the byte array to be converted
        offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit Unicode character
        len - the length of the final string, in characters (each character occupies two bytes)
        Returns:
        the converted string
        Throws:
        ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
        IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
      • getFromUnicodeLE

        public static String getFromUnicodeLE(byte[] string)
        Given a byte array of 16-bit Unicode characters in little-endian format (least significant byte first), return a Java String representation of it. For example, the bytes { 0x16, 0x00 } decode to the single character 0x16.
        Parameters:
        string - the byte array to be converted
        Returns:
        the converted string
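        As a rough illustration of the little-endian layout, the sketch below reimplements both getFromUnicodeLE overloads on top of the JDK's UTF-16LE decoder (java.nio.charset.StandardCharsets). The class name UnicodeLeSketch is hypothetical and this is not the loci.poi.util source. The bounds checks follow the exceptions documented above; treating the whole array as character data (ignoring a trailing odd byte, returning "" for an empty array) is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the two getFromUnicodeLE overloads; not the
// loci.poi.util source. Relies on the JDK's UTF-16LE decoder.
final class UnicodeLeSketch {

    private UnicodeLeSketch() {
    }

    // Decodes len 16-bit characters stored little endian
    // (least significant byte first), starting at string[offset].
    static String getFromUnicodeLE(byte[] string, int offset, int len) {
        if (offset < 0 || offset >= string.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset: " + offset);
        }
        // Two bytes per character, so at most (string.length - offset) / 2
        // characters are available from offset onward.
        if (len < 0 || (string.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length: " + len);
        }
        return new String(string, offset, len * 2, StandardCharsets.UTF_16LE);
    }

    // Whole-array overload. Assumption: the entire array is character data,
    // so a trailing odd byte is ignored and an empty array yields "".
    static String getFromUnicodeLE(byte[] string) {
        if (string.length == 0) {
            return "";
        }
        return getFromUnicodeLE(string, 0, string.length / 2);
    }

    public static void main(String[] args) {
        // { 0x16, 0x00 } decodes to the single character 0x16 (decimal 22).
        System.out.println((int) getFromUnicodeLE(new byte[] {0x16, 0x00}).charAt(0)); // prints 22
        // 'H' = 0x0048 and 'i' = 0x0069, each written low byte first.
        byte[] hi = {0x48, 0x00, 0x69, 0x00};
        System.out.println(getFromUnicodeLE(hi, 0, 2)); // prints Hi
    }
}
```

        Delegating to the platform decoder keeps the sketch short; the explicit checks exist only so that bad arguments fail with the exception types listed in the method detail rather than the JDK's own IndexOutOfBoundsException.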
      • getFromUnicodeBE

        public static String getFromUnicodeBE(byte[] string,
                                              int offset,
                                              int len)
                                       throws ArrayIndexOutOfBoundsException,
                                              IllegalArgumentException
        Given a byte array of 16-bit Unicode characters in big-endian format (most significant byte first), return a Java String representation of it. For example, the bytes { 0x00, 0x16 } decode to the single character 0x16.
        Parameters:
        string - the byte array to be converted
        offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit Unicode character
        len - the length of the final string, in characters (each character occupies two bytes)
        Returns:
        the converted string
        Throws:
        ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
        IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
      • getFromUnicodeBE

        public static String getFromUnicodeBE(byte[] string)
        Given a byte array of 16-bit Unicode characters in big-endian format (most significant byte first), return a Java String representation of it. For example, the bytes { 0x00, 0x16 } decode to the single character 0x16.
        Parameters:
        string - the byte array to be converted
        Returns:
        the converted string
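        To make "most significant byte first" concrete, the sketch below decodes big-endian data by hand, combining each byte pair as (high << 8) | low. The class name UnicodeBeSketch is hypothetical and this is not the loci.poi.util source; the whole-array overload again assumes the entire array is character data.

```java
// Hypothetical sketch of big-endian decoding, written out by hand so the
// byte order is visible; not the loci.poi.util source.
final class UnicodeBeSketch {

    private UnicodeBeSketch() {
    }

    static String getFromUnicodeBE(byte[] string, int offset, int len) {
        if (offset < 0 || offset >= string.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset: " + offset);
        }
        if (len < 0 || (string.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length: " + len);
        }
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            int high = string[offset + 2 * i] & 0xFF;    // most significant byte comes first
            int low = string[offset + 2 * i + 1] & 0xFF; // least significant byte comes second
            chars[i] = (char) ((high << 8) | low);
        }
        return new String(chars);
    }

    // Assumption: the entire array is character data; an empty array yields "".
    static String getFromUnicodeBE(byte[] string) {
        return string.length == 0 ? "" : getFromUnicodeBE(string, 0, string.length / 2);
    }

    public static void main(String[] args) {
        // { 0x00, 0x16 } decodes to the single character 0x16 (decimal 22).
        System.out.println((int) getFromUnicodeBE(new byte[] {0x00, 0x16}).charAt(0)); // prints 22
        // e-acute (0x00E9) followed by the euro sign (0x20AC), high byte first.
        byte[] be = {0x00, (byte) 0xE9, 0x20, (byte) 0xAC};
        System.out.println(getFromUnicodeBE(be).equals("\u00E9\u20AC")); // prints true
    }
}
```

        The & 0xFF masks matter: Java bytes are signed, so without them a byte such as 0xE9 would sign-extend and corrupt the combined character.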
      • getFromCompressedUnicode

        public static String getFromCompressedUnicode(byte[] string,
                                                      int offset,
                                                      int len)
        Read 8-bit data (in the ISO-8859-1 code page) into a (Unicode) Java String and return it. (In Excel terms, read compressed 8-bit Unicode as a string.)
        Parameters:
        string - the byte array to read
        offset - the index of the first byte to read
        len - the number of bytes to read (one byte per character)
        Returns:
        a new String built from the bytes read
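        Because ISO-8859-1 maps every byte 0x00 to 0xFF onto the character with the same code (U+0000 to U+00FF), reading "compressed" data is a one-byte-per-character decode. A minimal sketch using the JDK's ISO-8859-1 charset follows; the class name is hypothetical, and how the real method reacts to an out-of-range offset or len is not documented here (this sketch lets the String constructor throw IndexOutOfBoundsException).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of getFromCompressedUnicode; not the loci.poi.util source.
final class CompressedReadSketch {

    private CompressedReadSketch() {
    }

    // Each byte becomes exactly one character, so len bytes yield len characters.
    static String getFromCompressedUnicode(byte[] string, int offset, int len) {
        return new String(string, offset, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // One leading byte to skip, then "Cafe" with an acute e, one byte per character.
        byte[] data = {0x7F, 0x43, 0x61, 0x66, (byte) 0xE9};
        String s = getFromCompressedUnicode(data, 1, 4);
        System.out.println(s.equals("Caf\u00E9")); // prints true
    }
}
```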
      • putCompressedUnicode

        public static void putCompressedUnicode(String input,
                                                byte[] output,
                                                int offset)
        Takes a Unicode (Java) string and writes it as 8-bit data (in the ISO-8859-1 code page) into the supplied byte array. (In Excel terms, write compressed 8-bit Unicode.)
        Parameters:
        input - the String containing the data to be written
        output - the byte array to which the data is to be written
        offset - the offset into the byte array at which writing starts
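        The write direction can be sketched as "encode to ISO-8859-1, then copy into output at offset". This is a hypothetical illustration, not the library's code. One consequence worth knowing: String.getBytes(Charset) substitutes '?' for any character above U+00FF, so in this sketch such characters are silently lost; whether the real method behaves the same way is not stated on this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of putCompressedUnicode; not the loci.poi.util source.
final class CompressedWriteSketch {

    private CompressedWriteSketch() {
    }

    // Encodes input as ISO-8859-1 (one byte per character) and copies the
    // bytes into output starting at offset. Characters above U+00FF become '?'.
    static void putCompressedUnicode(String input, byte[] output, int offset) {
        byte[] bytes = input.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(bytes, 0, output, offset, bytes.length);
    }

    public static void main(String[] args) {
        byte[] out = new byte[5];
        putCompressedUnicode("Hi\u20AC", out, 2); // the euro sign is not in ISO-8859-1
        System.out.println(Arrays.toString(out)); // prints [0, 0, 72, 105, 63]
    }
}
```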
      • putUnicodeLE

        public static void putUnicodeLE(String input,
                                        byte[] output,
                                        int offset)
        Takes a Unicode string and writes it as little-endian (least significant byte first) bytes into the supplied byte array. (In Excel terms, write uncompressed Unicode.)
        Parameters:
        input - the String containing the Unicode data to be written
        output - the byte array to hold the uncompressed Unicode; it needs room for two bytes per character of the String, starting at offset
        offset - the offset to start writing into the byte array
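        Writing little-endian output amounts to emitting the low byte of each 16-bit char before the high byte. The hand-written loop below is a hypothetical sketch of that layout, not the loci.poi.util source; it writes each UTF-16 code unit as-is, so a character outside the Basic Multilingual Plane occupies two chars (four bytes).

```java
import java.util.Arrays;

// Hypothetical sketch of putUnicodeLE; not the loci.poi.util source.
final class UnicodeLeWriteSketch {

    private UnicodeLeWriteSketch() {
    }

    // Writes each UTF-16 code unit of input as two bytes, low byte first,
    // beginning at output[offset]. output needs 2 * input.length() bytes
    // of room from offset onward.
    static void putUnicodeLE(String input, byte[] output, int offset) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            output[offset + 2 * i] = (byte) c;             // least significant byte
            output[offset + 2 * i + 1] = (byte) (c >>> 8); // most significant byte
        }
    }

    public static void main(String[] args) {
        String s = "A\u20AC"; // 'A' = 0x0041, euro sign = 0x20AC
        byte[] out = new byte[2 * s.length()];
        putUnicodeLE(s, out, 0);
        System.out.println(Arrays.toString(out)); // prints [65, 0, -84, 32]
    }
}
```

        The printed -84 is 0xAC viewed as a signed Java byte.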
      • putUnicodeBE

        public static void putUnicodeBE(String input,
                                        byte[] output,
                                        int offset)
        Takes a Unicode string and writes it as big-endian (most significant byte first) bytes into the supplied byte array. (In Excel terms, write uncompressed Unicode.)
        Parameters:
        input - the String containing the Unicode data to be written
        output - the byte array to hold the uncompressed Unicode; it needs room for two bytes per character of the String, starting at offset
        offset - the offset to start writing into the byte array
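        For the big-endian variant, a sketch can lean on the JDK's UTF-16BE encoder, which writes the high byte first and, unlike the plain "UTF-16" charset, emits no byte-order mark. The class name is hypothetical and this is not the library's implementation; it assumes well-formed input, since the encoder replaces an unpaired surrogate instead of copying it through.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of putUnicodeBE; not the loci.poi.util source.
final class UnicodeBeWriteSketch {

    private UnicodeBeWriteSketch() {
    }

    // Encodes input as UTF-16BE (high byte first, no byte-order mark) and copies
    // the bytes into output starting at offset. Assumes input is well-formed
    // UTF-16; an unpaired surrogate would be replaced by the encoder.
    static void putUnicodeBE(String input, byte[] output, int offset) {
        byte[] bytes = input.getBytes(StandardCharsets.UTF_16BE);
        System.arraycopy(bytes, 0, output, offset, bytes.length);
    }

    public static void main(String[] args) {
        String s = "A\u20AC";
        byte[] out = new byte[2 * s.length()];
        putUnicodeBE(s, out, 0);
        System.out.println(Arrays.toString(out)); // prints [0, 65, 32, -84]
    }
}
```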
      • format

        public static String format(String message,
                                    Object[] params)
        Apply printf()-like formatting to a string. Primarily used for logging.
        Parameters:
        message - the string with embedded formatting info, e.g. "This is a test %2.2"
        params - array of values to format into the string
        Returns:
        The formatted string
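        This page does not spell out the format syntax beyond the "%2.2" example, so the following is only one plausible reading, written as a hypothetical sketch rather than the library's algorithm: each '%' consumes the next parameter; for a Number, an optional digit after '%' sets the minimum integer digits and an optional ".digit" sets the maximum fraction digits. The "?missing?" placeholder and the fixed Locale.US are choices of this sketch.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical printf()-like formatter; the real method's rules may differ.
final class FormatSketch {

    private FormatSketch() {
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Each '%' consumes the next parameter. For Number parameters, "%W" sets
    // the minimum integer digits and "%W.F" also sets the maximum fraction
    // digits, so "%2.2" renders 3.14159 as "03.14".
    static String format(String message, Object[] params) {
        StringBuilder out = new StringBuilder();
        int next = 0; // index of the next unused parameter
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            if (c != '%') {
                out.append(c);
                continue;
            }
            if (next >= params.length) {
                out.append("?missing?"); // placeholder chosen for this sketch
                continue;
            }
            Object param = params[next++];
            if (!(param instanceof Number)) {
                out.append(param);
                continue;
            }
            // Fixed locale so the output does not depend on the JVM default.
            NumberFormat nf = NumberFormat.getInstance(Locale.US);
            if (i + 1 < message.length() && isAsciiDigit(message.charAt(i + 1))) {
                nf.setMinimumIntegerDigits(message.charAt(i + 1) - '0');
                i++;
                if (i + 2 < message.length() && message.charAt(i + 1) == '.'
                        && isAsciiDigit(message.charAt(i + 2))) {
                    nf.setMaximumFractionDigits(message.charAt(i + 2) - '0');
                    i += 2;
                }
            }
            out.append(nf.format(param));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("This is a test %2.2", new Object[] {3.14159})); // prints This is a test 03.14
        System.out.println(format("%, %", new Object[] {"a", "b"}));               // prints a, b
    }
}
```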
      • getPreferredEncoding

        public static String getPreferredEncoding()
        Returns:
        the encoding we want to use, currently hardcoded to ISO-8859-1
      • hasMultibyte

        public static boolean hasMultibyte(String value)
        Checks whether the string contains at least one multibyte character.
        Parameters:
        value - the string to check
        Returns:
        true if the string contains at least one multibyte character
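        The page does not define "multibyte". A reasonable reading, given that this class's 8-bit form is ISO-8859-1, is "any character above U+00FF", which cannot be stored in a single byte. The sketch below is built on that assumption; the class name and its treatment of null are likewise hypothetical.

```java
// Hypothetical sketch of hasMultibyte; not the loci.poi.util source.
final class MultibyteSketch {

    private MultibyteSketch() {
    }

    // Assumption: "multibyte" means a char above U+00FF, i.e. one that cannot
    // be stored as a single ISO-8859-1 byte. A null value is treated as false.
    static boolean hasMultibyte(String value) {
        if (value == null) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasMultibyte("Caf\u00E9"));    // prints false
        System.out.println(hasMultibyte("\u65E5\u672C")); // prints true
    }
}
```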
      • isUnicodeString

        public static boolean isUnicodeString(String value)
        Checks whether a given String needs to be represented as Unicode.
        Parameters:
        value - the string to check
        Returns:
        true if the string needs Unicode to be represented
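        One way to answer "does this string need Unicode?" is to ask whether it fits in the 8-bit ISO-8859-1 form that the compressed methods above use. The sketch below does that with CharsetEncoder.canEncode. Equating "needs Unicode" with "not encodable in ISO-8859-1" is an assumption, and the class name is hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of isUnicodeString; not the loci.poi.util source.
final class UnicodeCheckSketch {

    private UnicodeCheckSketch() {
    }

    // Assumption: a string "needs Unicode" exactly when it cannot be encoded
    // in ISO-8859-1, the 8-bit (compressed) form used elsewhere in this class.
    static boolean isUnicodeString(String value) {
        // CharsetEncoder is stateful and not thread-safe, so use a fresh one.
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return !encoder.canEncode(value);
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeString("Caf\u00E9")); // prints false
        System.out.println(isUnicodeString("5 \u20AC"));  // prints true
    }
}
```

        A fresh encoder per call keeps the method safe to call from multiple threads; a caller formatting many strings could cache one encoder per thread instead.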