Imported Upstream version 2.0.3

author: Hans-Christoph Steiner <hans@eds.org> 2012-03-30 20:42:12 -0400
committer: Hans-Christoph Steiner <hans@eds.org> 2012-03-30 20:42:12 -0400
commit: 7bb481fda9ecb134804b49c2ce77ca28f7eea583 (patch)
tree: 31b520b9914d3e2453968abe375f2c102772c3dc /ext/icu/README.txt
1 files changed, 169 insertions, 0 deletions
diff --git a/ext/icu/README.txt b/ext/icu/README.txt
new file mode 100644
index 0000000..c5cadb5
--- /dev/null
+++ b/ext/icu/README.txt
@@ -0,0 +1,169 @@
+
+This directory contains source code for the SQLite "ICU" extension, an
+integration of the "International Components for Unicode" library with
+SQLite. Documentation follows.
+
+    1. Features
+    
+        1.1  SQL Scalars upper() and lower()
+        1.2  Unicode Aware LIKE Operator
+        1.3  ICU Collation Sequences
+        1.4  SQL REGEXP Operator
+    
+    2. Compilation and Usage
+    
+    3. Bugs, Problems and Security Issues
+    
+        3.1  The "case_sensitive_like" Pragma
+        3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
+        3.3  Collation Sequence Security Issue
+
+
+1. FEATURES
+
+  1.1  SQL Scalars upper() and lower()
+
+    SQLite's built-in implementations of these two functions only 
+    provide case mapping for the 26 letters used in the English
+    language. The ICU based functions provided by this extension
+    provide case mapping, where defined, for the full range of 
+    unicode characters.
+
+    ICU provides two types of case mapping, "general" case mapping and
+    "language specific". Refer to ICU documentation for the differences
+    between the two. Specifically:
+
+       http://www.icu-project.org/userguide/caseMappings.html
+       http://www.icu-project.org/userguide/posix.html#case_mappings
+
+    To utilise "general" case mapping, the upper() or lower() scalar 
+    functions are invoked with one argument:
+
+        upper('ABC') -> 'abc'
+        lower('abc') -> 'ABC'
+
+    To access ICU "language specific" case mapping, upper() or lower()
+    should be invoked with two arguments. The second argument is the name
+    of the locale to use. Passing an empty string ("") or SQL NULL value
+    as the second argument is the same as invoking the 1 argument version
+    of upper() or lower():
+
+        lower('I', 'en_us') -> 'i'
+        lower('I', 'tr_tr') -> 'ı' (small dotless i)
+
+  1.2  Unicode Aware LIKE Operator
+
+    Similarly to the upper() and lower() functions, the built-in SQLite LIKE
+    operator understands case equivalence for the 26 letters of the English
+    language alphabet. The implementation of LIKE included in this
+    extension uses the ICU function u_foldCase() to provide case
+    independent comparisons for the full range of unicode characters.  
+
+    The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning the
+    dotless 'I' character used in the Turkish language is considered
+    to be in the same equivalence class as the dotted 'I' character
+    used by many languages (including English).
+
+  1.3  ICU Collation Sequences
+
+    A special SQL scalar function, icu_load_collation() is provided that 
+    may be used to register ICU collation sequences with SQLite. It
+    is always called with exactly two arguments, the ICU locale 
+    identifying the collation sequence to ICU, and the name of the
+    SQLite collation sequence to create. For example, to create an
+    SQLite collation sequence named "turkish" using Turkish language
+    sorting rules, the SQL statement:
+
+        SELECT icu_load_collation('tr_TR', 'turkish');
+
+    Or, for Australian English:
+
+        SELECT icu_load_collation('en_AU', 'australian');
+
+    The identifiers "turkish" and "australian" may then be used
+    as collation sequence identifiers in SQL statements:
+
+        CREATE TABLE aust_turkish_penpals(
+          australian_penpal_name TEXT COLLATE australian,
+          turkish_penpal_name    TEXT COLLATE turkish
+        );
+  
+  1.4 SQL REGEXP Operator
+
+    This extension provides an implementation of the SQL binary
+    comparision operator "REGEXP", based on the regular expression functions
+    provided by the ICU library. The syntax of the operator is as described
+    in SQLite documentation:
+
+        <string> REGEXP <re-pattern>
+
+    This extension uses the ICU defaults for regular expression matching
+    behaviour. Specifically, this means that:
+
+        * Matching is case-sensitive,
+        * Regular expression comments are not allowed within patterns, and
+        * The '^' and '$' characters match the beginning and end of the
+          <string> argument, not the beginning and end of lines within
+          the <string> argument.
+
+    Even more specifically, the value passed to the "flags" parameter
+    of ICU C function uregex_open() is 0.
+
+
+2  COMPILATION AND USAGE
+
+  The easiest way to compile and use the ICU extension is to build
+  and use it as a dynamically loadable SQLite extension. To do this
+  using gcc on *nix:
+
+    gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so
+
+  You may need to add "-I" flags so that gcc can find sqlite3ext.h
+  and sqlite3.h. The resulting shared lib, libSqliteIcu.so, may be
+  loaded into sqlite in the same way as any other dynamically loadable
+  extension.
+
+
+3 BUGS, PROBLEMS AND SECURITY ISSUES
+
+  3.1 The "case_sensitive_like" Pragma
+
+    This extension does not work well with the "case_sensitive_like"
+    pragma. If this pragma is used before the ICU extension is loaded,
+    then the pragma has no effect. If the pragma is used after the ICU
+    extension is loaded, then SQLite ignores the ICU implementation and
+    always uses the built-in LIKE operator.
+
+    The ICU extension LIKE operator is always case insensitive.
+
+  3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
+
+    Passing very long patterns to the built-in SQLite LIKE operator can
+    cause excessive CPU usage. To curb this problem, SQLite defines the
+    SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a
+    pattern in bytes (irrespective of encoding). The default value is
+    defined in internal header file "limits.h".
+    
+    The ICU extension LIKE implementation suffers from the same 
+    problem and uses the same solution. However, since the ICU extension
+    code does not include the SQLite file "limits.h", modifying
+    the default value therein does not affect the ICU extension.
+    The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by
+    the ICU extension LIKE operator is 50000, defined in source 
+    file "icu.c".
+
+  3.3 Collation Sequence Security Issue
+
+    Internally, SQLite assumes that indices stored in database files
+    are sorted according to the collation sequence indicated by the
+    SQL schema. Changing the definition of a collation sequence after
+    an index has been built is therefore equivalent to database
+    corruption. The SQLite library is not very well tested under
+    these conditions, and may contain potential buffer overruns
+    or other programming errors that could be exploited by a malicious
+    programmer.
+
+    If the ICU extension is used in an environment where potentially
+    malicious users may execute arbitrary SQL (i.e. gears), they
+    should be prevented from invoking the icu_load_collation() function,
+    possibly using the authorisation callback.
author	Hans-Christoph Steiner <hans@eds.org>	2012-03-30 20:42:12 -0400
committer	Hans-Christoph Steiner <hans@eds.org>	2012-03-30 20:42:12 -0400
commit	7bb481fda9ecb134804b49c2ce77ca28f7eea583 (patch)
tree	31b520b9914d3e2453968abe375f2c102772c3dc /ext/icu/README.txt