Package loci.poi.util
Class StringUtil
java.lang.Object
loci.poi.util.StringUtil
Title: String Utility
Description: Collection of string handling utilities.
- Since:
- May 10, 2002
- Version:
- 1.0
- Author:
- Andrew C. Oliver, Sergei Kozello (sergeikozello at mail.ru), Toshiaki Kamoshida (kamoshida.toshiaki at future dot co dot jp)
-
Method Summary
Modifier and Type | Method | Description
static String | format(String message, Object[] params) | Apply printf()-like formatting to a string.
static String | getFromCompressedUnicode(byte[] string, int offset, int len) | Read 8-bit data (in the ISO-8859-1 codepage) into a (Unicode) Java String and return it.
static String | getFromUnicodeBE(byte[] string) | Given a byte array of 16-bit Unicode characters in big-endian format (most significant byte first), return a Java String representation of it.
static String | getFromUnicodeBE(byte[] string, int offset, int len) | Given a byte array of 16-bit Unicode characters in big-endian format (most significant byte first), return a Java String representation of it.
static String | getFromUnicodeLE(byte[] string) | Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
static String | getFromUnicodeLE(byte[] string, int offset, int len) | Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
static String | getPreferredEncoding() | Return the preferred encoding, currently hardcoded to ISO-8859-1.
static boolean | hasMultibyte(String value) | Check whether the parameter contains a multibyte character.
static boolean | isUnicodeString(String value) | Check whether a given String needs to be represented as Unicode.
static void | putCompressedUnicode(String input, byte[] output, int offset) | Take a Unicode (Java) string and write it as 8-bit data (in the ISO-8859-1 codepage).
static void | putUnicodeBE(String input, byte[] output, int offset) | Take a Unicode string and write it as big-endian (most significant byte first) bytes into the supplied byte array.
static void | putUnicodeLE(String input, byte[] output, int offset) | Take a Unicode string and write it as little-endian (most significant byte last) bytes into the supplied byte array.
-
Method Details
-
getFromUnicodeLE
public static String getFromUnicodeLE(byte[] string, int offset, int len) throws ArrayIndexOutOfBoundsException, IllegalArgumentException
Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it. For example, the byte pair { 0x16, 0x00 } decodes to the character 0x16.
- Parameters:
string - the byte array to be converted
offset - the initial offset into the byte array; string[offset] and string[offset + 1] are assumed to contain the first 16-bit Unicode character
len - the length of the final string
- Returns:
- the converted string
- Throws:
ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
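The decoding and bounds rules above can be sketched as follows. This is a minimal illustration, not the library's source: the class name `LeDecodeSketch` is hypothetical, and it assumes that `len` counts characters (two bytes each) rather than bytes.

```java
/**
 * Hypothetical sketch (not the library source): decodes len 16-bit
 * little-endian code units starting at offset.
 */
final class LeDecodeSketch {
    static String getFromUnicodeLE(byte[] string, int offset, int len) {
        // Mirror the documented checks: bad offset, then not enough data for len chars
        if (offset < 0 || offset >= string.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset: " + offset);
        }
        if (len < 0 || (string.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length: " + len);
        }
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            int lo = string[offset + 2 * i] & 0xFF;     // least significant byte comes first
            int hi = string[offset + 2 * i + 1] & 0xFF; // most significant byte comes last
            chars[i] = (char) ((hi << 8) | lo);
        }
        return new String(chars);
    }
}
```

With this sketch, `getFromUnicodeLE(new byte[] {0x00, 0x41, 0x00, 0x42, 0x00}, 1, 2)` skips the first byte and yields `"AB"`.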
-
getFromUnicodeLE
public static String getFromUnicodeLE(byte[] string)
Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it. For example, the byte pair { 0x16, 0x00 } decodes to the character 0x16.
- Parameters:
string - the byte array to be converted
- Returns:
- the converted string
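For a whole array, a comparable result can likely be had from the JDK's built-in UTF-16LE charset. This swaps in a different technique and is not necessarily how the library does it: the charset decoder may substitute U+FFFD for malformed input (such as an odd trailing byte), whereas a hand-written loop would not. The class name `LeCharsetSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: whole-array little-endian decode via the JDK charset. */
final class LeCharsetSketch {
    static String decodeLE(byte[] string) {
        // UTF-16LE stores each code unit least significant byte first
        return new String(string, StandardCharsets.UTF_16LE);
    }
}
```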
-
getFromUnicodeBE
public static String getFromUnicodeBE(byte[] string, int offset, int len) throws ArrayIndexOutOfBoundsException, IllegalArgumentException
Given a byte array of 16-bit Unicode characters in big-endian format (most significant byte first), return a Java String representation of it. For example, the byte pair { 0x00, 0x16 } decodes to the character 0x16.
- Parameters:
string - the byte array to be converted
offset - the initial offset into the byte array; string[offset] and string[offset + 1] are assumed to contain the first 16-bit Unicode character
len - the length of the final string
- Returns:
- the converted string
- Throws:
ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
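A minimal sketch of the big-endian variant, under the same assumptions as a little-endian decoder would make (`len` counts two-byte characters). The class name `BeDecodeSketch` is hypothetical and this is an illustration, not the library's own code; only the byte order inside each pair changes.

```java
/** Hypothetical sketch (not the library source): big-endian 16-bit decoding. */
final class BeDecodeSketch {
    static String getFromUnicodeBE(byte[] string, int offset, int len) {
        // Same documented checks as the little-endian case
        if (offset < 0 || offset >= string.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset: " + offset);
        }
        if (len < 0 || (string.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length: " + len);
        }
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            int hi = string[offset + 2 * i] & 0xFF;     // most significant byte comes first
            int lo = string[offset + 2 * i + 1] & 0xFF; // least significant byte comes last
            chars[i] = (char) ((hi << 8) | lo);
        }
        return new String(chars);
    }
}
```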
-
getFromUnicodeBE
public static String getFromUnicodeBE(byte[] string)
Given a byte array of 16-bit Unicode characters in big-endian format (most significant byte first), return a Java String representation of it. For example, the byte pair { 0x00, 0x16 } decodes to the character 0x16.
- Parameters:
string - the byte array to be converted
- Returns:
- the converted string
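Byte order matters: the same two bytes give different characters depending on which end is treated as most significant. The sketch below uses the JDK's UTF-16BE and UTF-16LE charsets (a swapped-in technique, not necessarily the library's) to contrast the two readings; the class name `ByteOrderContrastSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper contrasting big-endian and little-endian readings. */
final class ByteOrderContrastSketch {
    static String decodeBE(byte[] string) {
        // UTF-16BE: most significant byte first, no byte-order mark expected
        return new String(string, StandardCharsets.UTF_16BE);
    }

    static String decodeLE(byte[] string) {
        // UTF-16LE: least significant byte first
        return new String(string, StandardCharsets.UTF_16LE);
    }
}
```

Reading `{ 0x00, 0x41 }` as big-endian gives `"A"`, while reading it as little-endian gives the single character U+4100.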
-
getFromCompressedUnicode
public static String getFromCompressedUnicode(byte[] string, int offset, int len)
Read 8-bit data (in the ISO-8859-1 codepage) into a (Unicode) Java String and return it. (In Excel terms, read compressed 8-bit Unicode as a string.)
- Parameters:
string - the byte array to read
offset - the offset at which to start reading the byte array
len - the number of bytes to read
- Returns:
- a String instance generated by reading the byte array
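Because ISO-8859-1 maps every byte value 0x00 to 0xFF directly onto U+0000 to U+00FF, reading "compressed" data amounts to a one-byte-per-character decode. A minimal sketch using the JDK charset follows; the class name `CompressedDecodeSketch` is hypothetical, and out-of-range `offset`/`len` values surface here as the JDK's `IndexOutOfBoundsException`, which may not match the library's behavior.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: decode len bytes as ISO-8859-1, one char per byte. */
final class CompressedDecodeSketch {
    static String getFromCompressedUnicode(byte[] string, int offset, int len) {
        // Each byte b becomes the char with code point (b & 0xFF)
        return new String(string, offset, len, StandardCharsets.ISO_8859_1);
    }
}
```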
-
putCompressedUnicode
public static void putCompressedUnicode(String input, byte[] output, int offset)
Takes a Unicode (Java) string and writes it as 8-bit data (in the ISO-8859-1 codepage) into the supplied byte array. (In Excel terms, write compressed 8-bit Unicode.)
- Parameters:
input - the String containing the data to be written
output - the byte array to which the data is to be written
offset - the offset into the byte array at which writing starts
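The write direction can be sketched with `String.getBytes` and `System.arraycopy`. This is a hypothetical illustration (`CompressedEncodeSketch` is not a library class): it writes exactly one byte per character, and the JDK replaces any character above U+00FF with `'?'`, which is lossy; callers may want to check such strings first.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: write a string as ISO-8859-1 bytes at offset. */
final class CompressedEncodeSketch {
    static void putCompressedUnicode(String input, byte[] output, int offset) {
        // One byte per char; unmappable chars become '?' (0x3F)
        byte[] bytes = input.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(bytes, 0, output, offset, bytes.length);
    }
}
```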
-
putUnicodeLE
public static void putUnicodeLE(String input, byte[] output, int offset)
Takes a Unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array. (In Excel terms, write uncompressed Unicode.)
- Parameters:
input - the String containing the Unicode data to be written
output - the byte array to hold the uncompressed Unicode; it must have room for twice the length of the String (two bytes per character) starting at offset
offset - the offset at which to start writing into the byte array
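A minimal sketch of the little-endian write, splitting each `char` into its low and high bytes. The class name `LeEncodeSketch` is hypothetical and this is not the library's source; it assumes `output` is large enough, so a short array simply raises `ArrayIndexOutOfBoundsException`.

```java
/** Hypothetical sketch: write each char as two bytes, least significant first. */
final class LeEncodeSketch {
    static void putUnicodeLE(String input, byte[] output, int offset) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            output[offset + 2 * i] = (byte) (c & 0xFF);  // low byte first
            output[offset + 2 * i + 1] = (byte) (c >>> 8); // high byte last
        }
    }
}
```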
-
putUnicodeBE
public static void putUnicodeBE(String input, byte[] output, int offset)
Takes a Unicode string and writes it as big-endian (most significant byte first) bytes into the supplied byte array. (In Excel terms, write uncompressed Unicode.)
- Parameters:
input - the String containing the Unicode data to be written
output - the byte array to hold the uncompressed Unicode; it must have room for twice the length of the String (two bytes per character) starting at offset
offset - the offset at which to start writing into the byte array
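For the big-endian write, one alternative (a swapped-in JDK technique, not necessarily the library's) is to encode with UTF-16BE and copy the result into place. The class name `BeEncodeSketch` is hypothetical. One caveat: the charset encoder replaces an unpaired surrogate with U+FFFD, whereas a plain byte-splitting loop would copy it through unchanged.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: big-endian write via the JDK's UTF-16BE encoder. */
final class BeEncodeSketch {
    static void putUnicodeBE(String input, byte[] output, int offset) {
        // UTF-16BE emits two bytes per char, most significant first, with no byte-order mark
        byte[] bytes = input.getBytes(StandardCharsets.UTF_16BE);
        System.arraycopy(bytes, 0, output, offset, bytes.length);
    }
}
```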
-
format
public static String format(String message, Object[] params)
Apply printf()-like formatting to a string. Primarily used for logging.
- Parameters:
message - the string with embedded formatting info, e.g. "This is a test %2.2"
params - array of values to format into the string
- Returns:
- The formatted string
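The exact meaning of hints such as `%2.2` is not spelled out here, so the following is only a simplified, hypothetical stand-in (`FormatSketch` is not a library class) showing the basic substitution idea: each `%` is replaced by the next parameter's `toString()`, and any run of digits and dots right after it is skipped rather than applied.

```java
/**
 * Hypothetical, simplified printf()-like substitution. Width/precision
 * hints after '%' (such as "2.2") are skipped, not applied.
 */
final class FormatSketch {
    static String format(String message, Object[] params) {
        StringBuilder out = new StringBuilder();
        int next = 0; // index of the next parameter to substitute
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            if (c == '%' && next < params.length) {
                out.append(params[next++]);
                // Skip an optional hint made of digits and dots
                while (i + 1 < message.length()
                        && (Character.isDigit(message.charAt(i + 1)) || message.charAt(i + 1) == '.')) {
                    i++;
                }
            } else {
                out.append(c); // literal text, or a '%' with no parameter left
            }
        }
        return out.toString();
    }
}
```

For instance, `format("This is a test %2.2", new Object[] {"x"})` yields `"This is a test x"` under this sketch.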
-
getPreferredEncoding
public static String getPreferredEncoding()
- Returns:
- the encoding we want to use, currently hardcoded to ISO-8859-1
-
hasMultibyte
public static boolean hasMultibyte(String value)
Checks whether the parameter contains any multibyte character.
- Parameters:
value - the string to check
- Returns:
- true if the string contains at least one multibyte character
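A minimal sketch of such a check, assuming that "multibyte" means any character outside the 8-bit range (above U+00FF) that a single ISO-8859-1 byte cannot hold. The class name `MultibyteSketch` and the null handling are assumptions for illustration, not the library's documented behavior.

```java
/** Hypothetical sketch: true if any char needs more than one byte. */
final class MultibyteSketch {
    static boolean hasMultibyte(String value) {
        if (value == null) {
            return false; // assumption: treat null as having no multibyte chars
        }
        for (int i = 0; i < value.length(); i++) {
            // Assumption: anything above 0xFF does not fit in one ISO-8859-1 byte
            if (value.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }
}
```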
-
isUnicodeString
public static boolean isUnicodeString(String value)
Checks whether a given String needs to be represented as Unicode.
- Parameters:
value - the string to check
- Returns:
- true if the string needs Unicode to be represented
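One way to decide whether a string "needs Unicode" is a round trip through the 8-bit codepage: if encoding to ISO-8859-1 and decoding back changes the string, some character could not be stored compressed. This is a hypothetical sketch of that technique (`UnicodeCheckSketch` is not a library class), not a statement of how the library implements the check.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: round-trip test against ISO-8859-1. */
final class UnicodeCheckSketch {
    static boolean isUnicodeString(String value) {
        // Unmappable chars become '?' on encode, so the round trip will differ
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
        return !value.equals(new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```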
-