Better String library
---------------------

by Paul Hsieh

The bstring library is an attempt to provide improved string processing 
functionality to the C and C++ languages.  At the heart of the bstring library
(Bstrlib for short) is the management of "bstring"s which are a significant 
improvement over '\0' terminated char buffers.

===============================================================================

Motivation
----------

The standard C string library has serious problems:

    1) Its use of '\0' to denote the end of the string means knowing a 
       string's length is O(n) when it could be O(1).
    2) It imposes an interpretation for the character value '\0'.
    3) gets() always exposes the application to a buffer overflow.
    4) strtok() modifies the string it's parsing and thus may not be usable in
       programs which are re-entrant or multithreaded.
    5) fgets has the unusual semantic of ignoring '\0's that occur before
       '\n's are consumed.
    6) There is no memory management, and actions performed such as strcpy,
       strcat and sprintf are common places for buffer overflows.
    7) strncpy() doesn't '\0' terminate the destination in some cases.
    8) Passing NULL to C library string functions causes an undefined NULL 
       pointer access.
    9) Parameter aliasing (overlapping, or self-referencing parameters) 
       within most C library functions has undefined behavior.
   10) Many C library string function calls take integer parameters with 
       restricted legal ranges.  Parameters passed outside these ranges are
       not typically detected and cause undefined behavior.

So the desire is to create an alternative string library that does not suffer
from the above problems and adds in the following functionality:

    1) Incorporate string functionality seen from other languages.
        a) MID$() - from BASIC
        b) split()/join() - from Python
        c) string/char x n - from Perl
    2) Implement analogs to functions that combine stream IO and char buffers
       without creating a dependency on stream IO functionality.
    3) Implement the basic text editor-style functions insert, delete, find,
       and replace.
    4) Implement reference based sub-string access (as a generalization of 
       pointer arithmetic.)
    5) Implement runtime write protection for strings.

There is also a desire to avoid "API-bloat".  So functionality that can be
implemented trivially in terms of other functionality is omitted.  There is
therefore no left$() or right$() or reverse() or anything like that as part
of the core functionality.

Explaining Bstrings
-------------------

A bstring is basically a header which wraps a pointer to a char buffer.  Let's
start with the declaration of a struct tagbstring:

    struct tagbstring {
        int mlen;
        int slen;
        unsigned char * data;
    };

This definition is considered exposed, not opaque (though it is neither 
necessary nor recommended that low level maintenance of bstrings be performed 
whenever the abstract interfaces are sufficient).  The mlen field (usually)
describes a lower bound for the memory allocated for the data field.  The 
slen field describes the exact length for the bstring.  The data field is a 
single contiguous buffer of unsigned chars.  Note that the existence of a '\0' 
character in the unsigned char buffer pointed to by the data field does not 
necessarily denote the end of the bstring.

To be a well formed modifiable bstring the mlen field must be at least the 
length of the slen field, and slen must be non-negative.  Furthermore, the 
data field must point to a valid buffer in which access to the first mlen 
characters has been acquired.  So the minimal check for correctness is:

    (slen >= 0 && mlen >= slen && data != NULL)

bstrings returned by bstring functions can be assumed to be either NULL or
satisfy the above property.  (When bstrings are only readable, the mlen >= 
slen restriction is not required; this is discussed later in this section.)  
A bstring itself is just a pointer to a struct tagbstring:

    typedef struct tagbstring * bstring;

Note that use of the prefix "tag" in struct tagbstring is required to work
around the inconsistency between C and C++'s struct namespace usage.  This
definition is also considered exposed.
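
As a rough sketch, the two checks can be written as small predicates.  The
struct is redeclared locally so that the fragment compiles on its own (real
code would include bstrlib.h instead), and the helper names are hypothetical
rather than part of Bstrlib:

```c
#include <stddef.h>

/* Local copy of the exposed declaration shown above. */
struct tagbstring {
    int mlen;
    int slen;
    unsigned char * data;
};

/* Hypothetical helper: minimal check for a modifiable bstring. */
int is_writable_bstring (const struct tagbstring * b) {
    return b != NULL && b->slen >= 0 && b->mlen >= b->slen
        && b->data != NULL;
}

/* Hypothetical helper: relaxed check for a bstring that is only read,
   where mlen is not examined. */
int is_readable_bstring (const struct tagbstring * b) {
    return b != NULL && b->slen >= 0 && b->data != NULL;
}
```

Both helpers also reject a NULL header, since the members of a NULL bstring
cannot be examined at all.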

Bstrlib basically manages bstrings allocated as a header and an associated 
data-buffer.  Since the implementation is exposed, they can also be 
constructed manually.  Functions which mutate bstrings assume that the header 
and data buffer have been malloced; the bstring library may perform free() or 
realloc() on both the header and data buffer of any bstring parameter. 
Functions which return bstrings create new bstrings.  The string memory is
freed by a bdestroy() call (or using the bstrFree macro).

The following related typedef is also provided:

    typedef const struct tagbstring * const_bstring;

which is also considered exposed.  These are directly bstring compatible (no
casting required) but are just used for parameters which are meant to be 
non-mutable.  So in general, bstring parameters which are read as input but 
not meant to be modified will be declared as const_bstring, and bstring
parameters which may be modified will be declared as bstring.  This convention
is recommended for user written functions as well.

Since bstrings maintain interoperability with C library char-buffer style
strings, all functions which modify, update or create bstrings also write a
'\0' character immediately after the last character of the string (that is,
at data[slen]).  This trailing '\0' character is not required for bstrings
input to the bstring functions; it is provided solely as a convenience for
interoperability with standard C char-buffer functionality.

Analogs for the ANSI C string library functions have been created when they
are necessary, but have also been left out when they are not.  In particular
there are no functions analogous to fwrite, or puts just for the purposes of
bstring.  The ->data member of any string is exposed, and therefore can be
used just as easily as char buffers for C functions which read strings.

For those that wish to hand construct bstrings, the following should be kept 
in mind:

    1) While bstrlib can accept constructed bstrings without terminating
       '\0' characters, the rest of the C language string library will not
       function properly on such non-terminated strings.  This is obvious
       but must be kept in mind.
    2) If it is intended that a constructed bstring be written to by the
       bstring library functions then the data portion should be allocated
       by the malloc function and the slen and mlen fields should be entered
       properly.  The struct tagbstring header is not reallocated, and only
       freed by bdestroy.
    3) Writing arbitrary '\0' characters at various places in the string
       will not modify its length as perceived by the bstring library 
       functions.  In fact, '\0' is a legitimate non-terminating character
       for a bstring to contain.
    4) For read only parameters, bstring functions do not check the mlen.  
       I.e., the minimal correctness requirements are reduced to:

            (slen >= 0 && data != NULL)
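
The points above can be illustrated with a minimal sketch of a hand
constructed, heap backed bstring.  The struct is redeclared locally so the
fragment stands alone, and make_bstring is a hypothetical helper written for
this illustration, not a Bstrlib function:

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct tagbstring { int mlen; int slen; unsigned char * data; };

/* Hypothetical helper: build a heap-backed bstring from a length-delimited
   block.  Returns NULL on bad arguments or allocation failure. */
struct tagbstring * make_bstring (const void * blk, int len) {
    struct tagbstring * b;
    if (blk == NULL || len < 0 || len == INT_MAX) return NULL;
    b = malloc (sizeof *b);
    if (b == NULL) return NULL;
    b->data = malloc ((size_t) len + 1);   /* room for the trailing '\0' */
    if (b->data == NULL) { free (b); return NULL; }
    memcpy (b->data, blk, (size_t) len);
    b->data[len] = '\0';                   /* convenience terminator only */
    b->slen = len;                         /* exact length */
    b->mlen = len + 1;                     /* allocated size, >= slen */
    return b;
}
```

Calling make_bstring ("ab\0cd", 5) yields a bstring whose slen is 5, while
strlen on its data member stops at the embedded '\0' and reports 2.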

Better pointer arithmetic
-------------------------

One built-in feature of '\0' terminated char * strings, is that it's very easy
and fast to obtain a reference to the tail of any string using pointer 
arithmetic.  Bstrlib does one better by providing a way to get a reference to
any substring of a bstring (or any other length delimited block of memory.)
So rather than just having pointer arithmetic, with bstrlib one essentially 
has segment arithmetic.  This is achieved using the macro blk2tbstr() which 
builds a reference to a block of memory and the macro bmid2tbstr() which 
builds a reference to a segment of a bstring.  Bstrlib also includes 
functions for direct consumption of memory blocks into bstrings, namely 
bcatblk () and blk2bstr ().

One scenario where this can be extremely useful is when a string contains
many substrings which one would like to pass as read-only reference
parameters to some string consuming function without the need to allocate
entire new containers for the string data.  More concretely, imagine parsing
a command line string whose parameters are space delimited.  With '\0'
terminated char * strings, this can only be done for tails of the string.
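
The command line scenario can be sketched as follows.  This is an
illustration of the idea only: the struct is redeclared locally,
split_words_by_ref is a hypothetical helper, and setting mlen to a negative
value to mark a reference as not writable is an assumption of this sketch.
In real code the references would be built with blk2tbstr() or bmid2tbstr().

```c
#include <stddef.h>

struct tagbstring { int mlen; int slen; unsigned char * data; };

/* Hypothetical helper: fill out[] with references to the space-delimited
   words of src, without copying any characters.  Returns the number of
   references stored (at most max). */
int split_words_by_ref (const struct tagbstring * src,
                        struct tagbstring * out, int max) {
    int i = 0, n = 0;
    if (src == NULL || src->data == NULL || src->slen < 0 || out == NULL)
        return 0;
    while (i < src->slen && n < max) {
        int start;
        while (i < src->slen && src->data[i] == ' ') i++;  /* skip spaces */
        if (i >= src->slen) break;
        start = i;
        while (i < src->slen && src->data[i] != ' ') i++;  /* scan a word */
        out[n].data = src->data + start;   /* points into src, no copy */
        out[n].slen = i - start;
        out[n].mlen = -1;                  /* assumption: not writable */
        n++;
    }
    return n;
}
```

Each reference carries its own length, so a word in the middle of the line
can be handed to a consumer even though no '\0' follows it.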

Improved NULL semantics and error handling
------------------------------------------

Unless otherwise noted, if a NULL pointer is passed as a bstring or any other 
detectably illegal parameter, the called function will return with an error 
indicator (either NULL or BSTR_ERR) rather than simply performing a NULL 
pointer access, or having undefined behavior.

To illustrate the value of this, consider the following example:

        strcpy (p = malloc (13 * sizeof (char)), "Hello,");
        strcat (p, " World");

This is not correct because malloc may return NULL (due to an out of memory
condition), and the behaviour of strcpy is undefined if either of its
parameters is NULL.  However:

        bconcat (p = bfromcstr ("Hello,"), q = bfromcstr (" World"));
        bdestroy (q);

is well defined, because if either p or q is assigned NULL (indicating a
failure to allocate memory) both bconcat and bdestroy will recognize it and
perform no detrimental action.

Note that it is not necessary to check any of the members of a returned 
bstring for internal correctness (in particular the data member does not need 
to be checked against NULL when the header is non-NULL), since this is 
assured by the bstring library itself.
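
User written functions can follow the same convention.  The sketch below is
one possible shape for such a function; the struct is redeclared locally,
and both count_char and the MY_ERR value are hypothetical stand-ins chosen
for this example (real code would include bstrlib.h and return BSTR_ERR).

```c
#include <stddef.h>

struct tagbstring { int mlen; int slen; unsigned char * data; };

#define MY_ERR (-1)   /* hypothetical error indicator for this sketch */

/* Hypothetical user function: count occurrences of c in b, reporting
   detectably illegal input as an error instead of dereferencing it. */
int count_char (const struct tagbstring * b, unsigned char c) {
    int i, n = 0;
    if (b == NULL || b->data == NULL || b->slen < 0) return MY_ERR;
    for (i = 0; i < b->slen; i++) {
        if (b->data[i] == c) n++;
    }
    return n;
}
```

Because a failed allocation upstream shows up here as a NULL parameter, the
error propagates to the caller without any intermediate checks.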

bStreams
--------

In addition to the bgets and bread functions, bstrlib can abstract streams 
with a high performance read only stream called a bStream.  In general, the 
idea is to open a core stream (with something like fopen) then pass its
handle as well as a bNread function pointer (like fread) to the bsopen 
function which will return a handle to an open bStream.  Then the functions 
bsread, bsreadln or bsreadlns can be called to read portions of the stream.  
Finally, the bsclose function is called to close the bStream -- it will 
return a handle to the original (core) stream.  So bStreams, essentially, 
wrap other streams.

The bStreams have two main advantages over the bgets and bread (as well as
fgets/ungetc) paradigms:

1) Improved functionality via the bunread function which allows a stream to 
   unread characters, giving the bStream stack-like functionality if so 
   desired.
2) A very high performance bsreadln function.  The C library function fgets()
   (and the bgets function) can typically be written as a loop on top of 
   fgetc(), thus paying all of the overhead costs of calling fgetc on a per 
   character basis.  bsreadln will read blocks at a time, thus amortizing the 
   overhead of fread calls over many characters at once.

However, clearly bStreams are suboptimal or unusable for certain kinds of 
streams (stdin) or certain usage patterns (a few spotty, or non-sequential
reads from a slow stream.)  For those situations, using bgets will be more 
appropriate.

The semantics of bStreams allows practical construction of layerable data 
streams.  What this means is that by writing a bNread compatible function on
top of a bStream, one can construct a new bStream on top of it.  This can be
useful for writing multi-pass parsers that don't actually read the entire
input more than once and don't require the use of intermediate storage.
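
To make the idea of a bNread compatible function more concrete, here is a
sketch of a reader over an in-memory buffer.  The text above only says that
a bNread function is "like fread", so the parameter list below is an
assumption modeled on fread; struct mem_source and mem_read are hypothetical
names invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source, passed as the stream parameter. */
struct mem_source {
    const unsigned char * buf;
    size_t len;
    size_t pos;   /* invariant: pos <= len */
};

/* fread-shaped reader: copies up to nelem elements of elsize bytes into
   buff and returns the number of whole elements copied (0 at the end). */
size_t mem_read (void * buff, size_t elsize, size_t nelem, void * parm) {
    struct mem_source * src = parm;
    size_t avail, want;
    if (buff == NULL || src == NULL || elsize == 0) return 0;
    avail = (src->len - src->pos) / elsize;   /* whole elements left */
    want = nelem < avail ? nelem : avail;
    if (want == 0) return 0;
    memcpy (buff, src->buf + src->pos, want * elsize);
    src->pos += want * elsize;
    return want;
}
```

A function of this shape, together with a pointer to its struct mem_source,
is the kind of pair that would be handed to bsopen in place of fread and a
FILE pointer.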

Aliasing
--------

Aliasing occurs when a function is given two parameters which point to data
structures which overlap in the memory they occupy.  While this does not
disturb read only functions, for many libraries this can make functions that 
write to these memory locations malfunction.  This is a common problem of the 
C standard library and especially the string functions in the C standard 
library.

The C standard string library is entirely char by char oriented (as is 
bstring) which makes conforming implementations alias safe for some 
scenarios.  However no actual detection of aliasing is typically performed, 
so it is easy to find cases where the aliasing will cause anomalous or
undesirable behaviour (consider: strcat (p, p).)  The C99 standard includes 
the "restrict" pointer modifier which allows the compiler to document and 
assume a no-alias condition on usage.  However, only the most trivial cases 
can be caught (if at all) by the compiler at compile time, and thus there is 
no actual enforcement of non-aliasing.

Bstrlib, by contrast, permits aliasing and is completely aliasing safe, in 
the C99 sense of aliasing.  That is to say, under the assumption that 
pointers of incompatible types from distinct objects can never alias, bstrlib 
is completely aliasing safe.  (In practice this means that the data buffer 
portion of any bstring and header of any bstring are assumed to never alias.) 
With the exception of the reference building macros, the library behaves as 
if all read-only parameters are first copied and replaced by temporary 
non-aliased parameters before any writing to any output bstring is performed 
(though actual copying is extremely rarely ever done.)

Besides being a useful safety feature, bstring searching/comparison 
functions can improve to O(1) execution when aliasing is detected.

Note that aliasing detection and handling code in Bstrlib is generally 
extremely cheap.  There is almost never any appreciable performance penalty
for using aliased parameters.
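
The following sketch shows one way an append operation can tolerate the
strcat (p, p) style of call.  It is not Bstrlib's implementation: the struct
is redeclared locally, append_alias_safe is a hypothetical function, and the
detection strategy (recording the source offset before the buffer can move)
is simply one reasonable approach.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct tagbstring { int mlen; int slen; unsigned char * data; };

/* Hypothetical alias-safe append: dst += src, where src may be dst itself
   or a reference into dst's buffer.  Returns 0 on success, -1 on error. */
int append_alias_safe (struct tagbstring * dst,
                       const struct tagbstring * src) {
    uintptr_t base, p;
    ptrdiff_t off = -1;   /* offset of src inside dst's buffer, if aliased */
    const unsigned char * from;
    int add, need;

    if (dst == NULL || src == NULL || dst->data == NULL || src->data == NULL
     || dst->slen < 0 || src->slen < 0 || dst->mlen < dst->slen) return -1;

    add = src->slen;              /* read src's length before dst changes */
    if (add > INT_MAX - 1 - dst->slen) return -1;
    need = dst->slen + add + 1;   /* +1 for the convenience '\0' */

    /* Detect aliasing while both pointers are still valid. */
    base = (uintptr_t) dst->data;
    p = (uintptr_t) src->data;
    if (p >= base && p - base < (uintptr_t) dst->mlen)
        off = (ptrdiff_t) (p - base);

    if (need > dst->mlen) {
        unsigned char * grown = realloc (dst->data, (size_t) need);
        if (grown == NULL) return -1;    /* dst is left unchanged */
        dst->data = grown;
        dst->mlen = need;
    }

    /* If src pointed into the old buffer, follow it to the new one. */
    from = (off >= 0) ? dst->data + off : src->data;
    memmove (dst->data + dst->slen, from, (size_t) add);
    dst->slen += add;
    dst->data[dst->slen] = '\0';
    return 0;
}
```

The alias check costs two integer comparisons, which is in line with the
observation that such handling is generally cheap.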

Reentrancy
----------

Nearly every function in Bstrlib is a leaf function, and is completely
reentrant with the exception of writing to common bstrings.  The split
functions which use a callback mechanism require only that the source string
not be destroyed by the callback function unless the callback function returns
with an error status (note that Bstrlib functions which return an error do
not modify the string in any way.)  The string can in fact be modified by the
callback and the behaviour is deterministic.  See the documentation of the
various split functions for more details.

Undefined scenarios
-------------------

One of the basic important premises for Bstrlib is not to increase the
propagation of undefined situations from parameters that are otherwise legal
in and of themselves.  In particular, except for extremely marginal cases,
usages of bstrings that use the bstring library functions alone cannot lead
to any undefined action.  But due to C/C++ language and library limitations,
there is no way to define a non-trivial library that is completely without
undefined operations.  All such possible undefined operations are described
below:

1) bstrings or struct tagbstrings that are not explicitly initialized cannot
   be passed as a parameter to any bstring function.
2) The members of the NULL bstring cannot be accessed directly.  (Though all
   APIs and macros detect the NULL bstring.)
3) A bstring whose data member has not been obtained from a malloc or 
   compatible call and which is write accessible passed as a writable 
   parameter will lead to undefined results.  (i.e., do not writeAllow any 
   constructed bstrings unless the data portion has been obtained from the 
   heap.)
4) If the headers of two strings alias but are not identical (which can only 
   happen via a defective manual construction), then passing them to a 
   bstring function in which one is writable is not defined.
5) If the mlen member is larger than the actual accessible length of the data 
   member for a writable bstring, or if the slen member is larger than the 
   readable length of the data member for a readable bstring, then the 
   corresponding bstring operations are undefined.
6) Any bstring definition whose header or accessible data portion has been
   assigned to inaccessible or otherwise illegal memory clearly cannot be
   acted upon by the bstring library in any way.
7) Destroying the source of an incremental split from within the callback
   and not returning with a negative value (indicating that it should abort)
   will lead to undefined behaviour.  (Though *modifying* or adjusting the 
   state of the source data, even if those modifications fail within the
   bstrlib API, has well defined behavior.)
8) Modifying a bstring which is write protected by direct access has 
   undefined behavior.

While this may seem like a long list, with the exception of invalid uses of 
the writeAllow macro, and source destruction during an iterative split 
without an accompanying abort, no usage of the bstring API alone can cause 
any undefined scenario to occur.  I.e., the policy of restricting usage of
bstrings to the bstring API can significantly reduce the risk of runtime 
errors (in practice it should eliminate them) related to string manipulation 
due to undefined action.

C++ wrapper
-----------

A C++ wrapper has been created to enable bstring functionality for C++ in the
most natural (for C++ programmers) way possible.  The mandate for the C++
wrapper is different from the base C bstring library.  Since the C++ language 
has far more abstracting capabilities, the CBString structure is considered 
fully abstracted -- i.e., hand generated CBStrings are not supported (though 
conversion from a struct tagbstring is allowed) and all detectable errors are
manifest as thrown exceptions.

- The C++ class definitions are all under the namespace Bstrlib.  bstrwrap.h 
  enables this namespace (with a using namespace Bstrlib; directive at the 
  end) unless the macro BSTRLIB_DONT_ASSUME_NAMESPACE has been defined before 
  it is included.

- Erroneous accesses result in an exception being thrown.  The exception
  parameter is of type "struct CBStringException" which is derived from
  std::exception if STL is used.  A verbose description of the error message 
  can be obtained from the what() method.

- CBString is a C++ structure derived from a struct tagbstring.  An address 
  of a CBString cast to a bstring must not be passed to bdestroy.  The bstring 
  C API has been made C++ safe and can be used directly in a C++ project.

- It includes constructors which can take a char, '\0' terminated char 
  buffer, tagbstring, (char, repeat-value), a length delimited buffer or a 
  CBStringList to initialize it.

- Concatenation is performed with the + and += operators.  Comparisons are 
  done with the ==, !=, <, >, <= and >= operators.  Note that == and != use
  the biseq call, while <, >, <= and >= use bstrcmp.

- CBStrings can be directly cast to const character buffers.

- CBStrings can be directly cast to double, float, int or unsigned int so
  long as the CBString is a decimal representation of the target type
  (otherwise an exception will be thrown).  Converting the other way should
  be done with the format(a) method(s).

- CBString contains the length, character and [] accessor methods.  The
  character and [] accessors are aliases of each other.  If the bounds for 
  the string are exceeded, an exception is thrown.  To avoid the overhead for
  this check, first cast the CBString to a (const char *) and use [] to 
  dereference the array as normal.  Note that the character and [] accessor
  methods allow both reading and writing of individual characters.

- The methods: format, formata, find, reversefind, findcaseless, 
  reversefindcaseless, midstr, insert, insertchrs, replace, findreplace, 
  findreplacecaseless, remove, findchr, nfindchr, alloc, toupper, tolower, 
  gets, read are analogous to the functions that can be found in the C API.

- The caselessEqual and caselessCmp methods are analogous to biseqcaseless
  and bstricmp functions respectively.

- Note that just like the bformat function, the format and formata methods do 
  not automatically cast CBStrings into char * strings for "%s"-type 
  substitutions:

	CBString w("world");
	CBString h("Hello");
	CBString hw;

	/* The casts are necessary */
	hw.format ("%s, %s", (const char *)h, (const char *)w);

- The methods trunc and repeat have been added instead of using pattern.

- ltrim, rtrim and trim methods have been added.  These remove characters 
  from a given character string set (defaulting to the whitespace characters) 
  from either the left, right or both ends of the CBString, respectively.

- The method setsubstr is also analogous in functionality to bsetstr, except 
  that it cannot be passed NULL.  Instead the method fill and the fill-style 
  constructor have been supplied to enable this functionality.

- The writeprotect(), writeallow() and iswriteprotected() methods are 
  analogous to the bwriteprotect(), bwriteallow() and biswriteprotected() 
  macros in the C API.  Write protection semantics in CBString are stronger 
  than with the C API in that indexed character assignment is checked for 
  write protection.  However, unlike with the C API, a write protected 
  CBString can be destroyed by the destructor.

- CBStream is a C++ structure which wraps a struct bStream (it's not derived
  from it, since destruction is slightly different).  It is constructed by
  passing in a bNread function pointer and a stream parameter cast to void *. 
  This structure includes methods for detecting eof, setting the buffer 
  length, reading the whole stream or reading entries line by line or block 
  by block, an unread function, and a peek function.

- If STL is available, the CBStringList structure is derived from a vector of 
  CBString with various split methods.  The split method has been overloaded 
  to accept either a character or CBString as the second parameter (when the 
  split parameter is a CBString any character in that CBString is used as a
  separator).  The splitstr method takes a CBString as a substring separator.
  Joins can be performed via a CBString constructor which takes a 
  CBStringList as a parameter, or just using the CBString::join() method.

- If there is proper support for std::iostreams, then the >> and << operators 
  and the getline() function have been added (with semantics the same as 
  those for std::string).

Multithreading
--------------

A mutable bstring is kind of analogous to a small (two entry) linked list 
allocated by malloc, with all aliasing completely under programmer control. 
I.e., manipulation of one bstring will never affect any other distinct 
bstring unless explicitly constructed to do so by the programmer via hand
construction or via building a reference.  Bstrlib also does not use any 
static or global storage, so there are no hidden unremovable race conditions. 
Bstrings are also clearly not inherently thread local.  So just like 
char *'s, bstrings can be passed around from thread to thread and shared and 
so on, so long as modifications to a bstring correspond to some kind of 
exclusive access lock as should be expected (or if the bstring is read-only, 
which can be enforced by bstring write protection) for any sort of shared 
object in a multithreaded environment.

Bsafe module
------------

For convenience, a bsafe module has been included.  The idea is that if this
module is included, inadvertent usage of the most dangerous C functions will
be overridden and lead to an immediate run time abort.  Of course, it should 
be emphasized that usage of this module is completely optional.  The 
intention is essentially to provide an option for creating project safety 
rules which can be enforced mechanically rather than socially.  This is 
useful for larger, or open development projects where it's more difficult to
enforce social rules or "coding conventions".

Problems not solved
-------------------

Bstrlib is written for the C and C++ languages, which have inherent weaknesses
that cannot be easily solved:

1. Memory leaks:  Forgetting to call bdestroy on a bstring that is about to be
   unreferenced leaks memory, just as forgetting to call free on a heap
   buffer that is about to be unreferenced does.  Bstrlib itself, though, is
   leak free.
2. Read before write usage:  In C, declaring an auto bstring does not 
   automatically fill it with legal/valid contents.  This problem has been 
   somewhat mitigated in C++.  (The bstrDeclare and bstrFree macros from 
   bstraux can be used to help mitigate this problem.)

Other problems not addressed:

3. Built-in mutex usage to automatically avoid all bstring internal race 
   conditions in multitasking environments: The problem with trying to 
   implement such things at this low a level is that it is typically more 
   efficient to use locks in higher level primitives. There is also no 
   platform independent way to implement locks or mutexes.
4. Unicode/wide character support.

Note that except for spotty support of wide characters, the default C 
standard library does not address any of these problems either.

Configurable compilation options
--------------------------------

All configuration options are meant solely for the purpose of compiler 
compatibility.  Configuration options are not meant to change the semantics
or capabilities of the library, except where it is unavoidable.

Since some C++ compilers don't include the Standard Template Library and some 
have the options of disabling exception handling, a number of macros can be 
used to conditionally compile support for each of these:

BSTRLIB_CAN_USE_STL

  - defining this will enable the use of the Standard Template Library.
    Defining BSTRLIB_CAN_USE_STL overrides the BSTRLIB_CANNOT_USE_STL macro.

BSTRLIB_CANNOT_USE_STL

  - defining this will disable the use of the Standard Template Library.  
    Defining BSTRLIB_CAN_USE_STL overrides the BSTRLIB_CANNOT_USE_STL macro.

BSTRLIB_CAN_USE_IOSTREAM

  - defining this will enable the use of streams from class std.  Defining
    BSTRLIB_CAN_USE_IOSTREAM overrides the BSTRLIB_CANNOT_USE_IOSTREAM macro.

BSTRLIB_CANNOT_USE_IOSTREAM

  - defining this will disable the use of streams from class std.  Defining 
    BSTRLIB_CAN_USE_IOSTREAM overrides the BSTRLIB_CANNOT_USE_IOSTREAM macro.

BSTRLIB_THROWS_EXCEPTIONS

  - defining this will enable the exception handling within bstring.
    Defining BSTRLIB_THROWS_EXCEPTIONS overrides the 
    BSTRLIB_DOESNT_THROW_EXCEPTIONS macro.

BSTRLIB_DOESNT_THROW_EXCEPTIONS

  - defining this will disable the exception handling within bstring.
    Defining BSTRLIB_THROWS_EXCEPTIONS overrides the 
    BSTRLIB_DOESNT_THROW_EXCEPTIONS macro.

Note that these macros must be defined consistently throughout all modules 
that use CBStrings including bstrwrap.cpp.

Some older C compilers do not support functions such as vsnprintf.  This is 
handled by the following macro variables:

BSTRLIB_NOVSNP

  - defining this indicates that the compiler does not support vsnprintf.
    This will cause bformat and bformata to not be declared.  Note that
    for some compilers, such as Turbo C, this is set automatically.
    Defining BSTRLIB_NOVSNP overrides the BSTRLIB_VSNP_OK macro.

BSTRLIB_VSNP_OK

  - defining this will disable the autodetection of compilers that do not
    support vsnprintf.
    Defining BSTRLIB_NOVSNP overrides the BSTRLIB_VSNP_OK macro.

Semantic compilation options
----------------------------

Bstrlib comes with very few compilation options for changing the semantics
of the library.  These are described below.

BSTRLIB_DONT_ASSUME_NAMESPACE

  - Defining this before including bstrwrap.h will disable the automatic 
    enabling of the Bstrlib namespace for the C++ declarations.

BSTRLIB_DONT_USE_VIRTUAL_DESTRUCTOR

  - Defining this will make the CBString destructor non-virtual.

BSTRLIB_MEMORY_DEBUG

  - Defining this will cause the bstrlib modules bstrlib.c and bstrwrap.cpp
    to invoke a #include "memdbg.h".  memdbg.h has to be supplied by the user.

Note that these macros must be defined consistently throughout all modules 
that use bstrings or CBStrings including bstrlib.c, bstraux.c and 
bstrwrap.cpp.

===============================================================================

Files
-----

bstrlib.c       - C implementation of bstring functions.
bstrlib.h       - C header file for bstring functions.
bstraux.c       - C example that implements trivial additional functions.
bstraux.h       - C header for bstraux.c
bstest.c        - C unit/regression test for bstrlib.c

bstrwrap.cpp    - C++ implementation of CBString.
bstrwrap.h      - C++ header file for CBString.
test.cpp        - C++ unit/regression test for bstrwrap.cpp

bsafe.c         - C runtime stubs to abort usage of unsafe C functions.
bsafe.h         - C header file for bsafe.c functions.

C projects need only include bstrlib.h and compile/link bstrlib.c to use the
bstring library.  C++ projects need to additionally include bstrwrap.h and
compile/link bstrwrap.cpp.  For both, there may be a need to make choices
about feature configuration as described in the "Configurable compilation
options" section above.

Other files that are included in this archive are:

license.txt     - The BSD license for Bstrlib
gpl.txt         - The GPL version 2
security.txt    - A security statement useful for auditing Bstrlib
porting.txt     - A guide to porting Bstrlib
bstrlib.txt     - This file

===============================================================================

The functions
-------------

    extern bstring bfromcstr (const char * str);

    Take a standard C library style '\0' terminated char buffer and generate
    a bstring with the same contents as the char buffer.  If an error occurs 
    NULL is returned.

    So for example:

	bstring b = bfromcstr ("Hello");
	if (!b) {
		fprintf (stderr, "Out of memory");
	} else {
		puts ((char *) b->data);
	}

    ..........................................................................

    extern bstring bfromcstralloc (int mlen, const char * str);

    Create a bstring which contains the contents of the '\0' terminated 
    char * buffer str.  The memory buffer backing the bstring is at least 
    mlen characters in length.  If an error occurs NULL is returned.

    So for example:

	bstring b = bfromcstralloc (64, someCstr);
	if (b) b->data[63] = 'x';

    The idea is that this will set the 64th character of b to 'x' if it is at
    least 64 characters long; otherwise the write lands in allocated space
    past the end of the string and leaves its contents unchanged.  And we know
    this is well defined so long as b was successfully created, since it will
    have been allocated with at least 64 characters.

    ..........................................................................

    extern bstring blk2bstr (const void * blk, int len);

    Create a bstring whose contents are described by the contiguous buffer
    pointed to by blk with a length of len bytes.  Note that this function
    creates a copy of the data in blk, rather than simply referencing it.  
    Compare with the blk2tbstr macro.  If an error occurs NULL is returned.

    ..........................................................................

    extern char * bstr2cstr (const_bstring s, char z);

    Create a '\0' terminated char buffer which contains the contents of the 
    bstring s, except that any contained '\0' characters are converted to the 
    character in z.  This returned value should be freed with bcstrfree(), by 
    the caller.  If an error occurs NULL is returned.
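
    The '\0' substitution can be pictured with the following stand-in.  It
    is a sketch of the behaviour described above, not the library's code:
    the struct is redeclared locally and to_cstr_sketch is a hypothetical
    name.  With the real function the result is released with bcstrfree ()
    rather than free ().

```c
#include <stdlib.h>

struct tagbstring { int mlen; int slen; unsigned char * data; };

/* Hypothetical stand-in: copy s into a fresh '\0' terminated buffer,
   replacing any embedded '\0' characters with z.  Returns NULL on error. */
char * to_cstr_sketch (const struct tagbstring * s, char z) {
    char * r;
    int i;
    if (s == NULL || s->slen < 0 || s->data == NULL) return NULL;
    r = malloc ((size_t) s->slen + 1);
    if (r == NULL) return NULL;
    for (i = 0; i < s->slen; i++) {
        r[i] = (s->data[i] == '\0') ? z : (char) s->data[i];
    }
    r[s->slen] = '\0';   /* the only terminator in the result */
    return r;
}
```

    Substituting z keeps the full length of the bstring visible to C string
    functions, which would otherwise stop at the first embedded '\0'.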

    ..........................................................................

    extern int bcstrfree (char * s);

    Frees a C-string generated by bstr2cstr ().  This is normally unnecessary
    since it just wraps a call to free (), however, if malloc () and free () 
    have been redefined as macros within the bstrlib module (via macros in
    the memdbg.h backdoor) with some difference in behaviour from the std 
    library functions, then this allows a correct way of freeing the memory 
    that allows higher level code to be independent from these macro 
    redefinitions.

    ..........................................................................

    extern bstring bstrcpy (const_bstring b1);

    Make a copy of the passed in bstring.  The copied bstring is returned if 
    there is no error, otherwise NULL is returned.

    ..........................................................................

    extern int bassign (bstring a, const_bstring b);

    Overwrite the bstring a with the contents of bstring b.  Note that the 
    bstring a must be a well defined and writable bstring.  If an error 
    occurs BSTR_ERR is returned and a is not overwritten.

    ..........................................................................

    int bassigncstr (bstring a, const char * str);

    Overwrite the bstring a with the contents of char * string str.  Note that 
    the bstring a must be a well defined and writable bstring.  If an error 
    occurs BSTR_ERR is returned and a may be partially overwritten.

    ..........................................................................

    int bassignblk (bstring a, const void * s, int len);

    Overwrite the bstring a with the contents of the block (s, len).  Note that 
    the bstring a must be a well defined and writable bstring.  If an error 
    occurs BSTR_ERR is returned and a is not overwritten.

    ..........................................................................

    extern int bassignmidstr (bstring a, const_bstring b, int left, int len);

    Overwrite the bstring a with the middle of the contents of bstring b 
    starting from position left and running for a length len.  left and 
    len are clamped to the ends of b as with the function bmidstr.  Note that 
    the bstring a must be a well defined and writable bstring.  If an error 
    occurs BSTR_ERR is returned and a is not overwritten.

    ..........................................................................

    extern bstring bmidstr (const_bstring b, int left, int len);

    Create a bstring which is the substring of b starting from position left 
    and running for a length len (clamped by the end of the bstring b.)  If 
    there was no error, the value of this constructed bstring is returned 
    otherwise NULL is returned.
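
    The clamping of (left, len) mentioned for bmidstr and bassignmidstr can 
    be sketched on plain integers.  This is only one plausible reading of 
    the description above, not Bstrlib's own code; the helper name 
    clamp_span and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Bstrlib.  Clamps the request (left, len)
   to a string of length slen.  Stores the clamped start offset in *start
   and returns the clamped length, which is 0 for an empty result. */
int clamp_span (int slen, int left, int len, int * start) {
    assert (slen >= 0 && start != NULL);
    if (len < 0) len = 0;               /* a negative length selects nothing */
    if (left < 0) {                     /* request starts before the string */
        len += left;                    /* discard the part before offset 0 */
        left = 0;
    }
    if (left > slen) left = slen;       /* request starts past the end */
    if (len > slen - left) len = slen - left;   /* trim at the end */
    if (len < 0) len = 0;
    *start = left;
    return len;
}
```

    Under this reading, a request that lies entirely outside the string 
    yields an empty span rather than an error.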

    ..........................................................................

    extern int bdelete (bstring s1, int pos, int len);

    Removes characters from pos to pos+len-1 and shifts the tail of the 
    bstring starting from pos+len to pos.  len must be positive for this call 
    to have any effect.  The section of the bstring described by (pos, len) 
    is clamped to the boundaries of the bstring s1.  The value BSTR_OK is 
    returned if the operation is successful, otherwise BSTR_ERR is returned.

    ..........................................................................

    extern int bconcat (bstring b0, const_bstring b1);

    Concatenate the bstring b1 to the end of bstring b0.  The value BSTR_OK 
    is returned if the operation is successful, otherwise BSTR_ERR is 
    returned.

    ..........................................................................

    extern int bconchar (bstring b, char c);

    Concatenate the character c to the end of bstring b.  The value BSTR_OK 
    is returned if the operation is successful, otherwise BSTR_ERR is 
    returned.

    ..........................................................................

    extern int bcatcstr (bstring b, const char * s);

    Concatenate the char * string s to the end of bstring b.  The value 
    BSTR_OK is returned if the operation is successful, otherwise BSTR_ERR is 
    returned.

    ..........................................................................

    extern int bcatblk (bstring b, const void * s, int len);

    Concatenate a fixed length buffer (s, len) to the end of bstring b.  The 
    value BSTR_OK is returned if the operation is successful, otherwise 
    BSTR_ERR is returned.

    ..........................................................................

    extern int biseq (const_bstring b0, const_bstring b1);

    Compare the bstrings b0 and b1 for equality.  If the bstrings differ, 0 
    is returned; if the bstrings are the same, 1 is returned; if there is an 
    error, -1 is returned.  If the lengths of the bstrings are different, this 
    function has O(1) complexity.  Contained '\0' characters are not treated 
    as termination characters.

    Note that the semantics of biseq are not completely compatible with 
    bstrcmp because of its different treatment of the '\0' character.

    ..........................................................................

    extern int bisstemeqblk (const_bstring b, const void * blk, int len);

    Compare the beginning of bstring b with a block of memory of length len 
    for equality.  If the beginning of b differs from the memory block (or if 
    b is too short), 0 is returned; if the beginning of b matches the block, 
    1 is returned; if there is an error, -1 is returned.

    ..........................................................................

    extern int biseqcaseless (const_bstring b0, const_bstring b1);

    Compare two bstrings for equality without differentiating between case.  
    If the bstrings differ other than in case, 0 is returned; if the bstrings 
    are the same apart from case, 1 is returned; if there is an error, -1 is 
    returned.  If the lengths of the bstrings are different, this function is 
    O(1).  '\0' termination characters are not treated in any special way.

    ..........................................................................

    extern int bisstemeqcaselessblk (const_bstring b0, const void * blk, int len);

    Compare the beginning of bstring b0 with a block of memory of length len 
    for equality without differentiating between case.  If the beginning of 
    b0 differs from the memory block other than in case (or if b0 is too 
    short), 0 is returned; if they match apart from case, 1 is returned; if 
    there is an error, -1 is returned.

    ..........................................................................

    extern int biseqcstr (const_bstring b, const char *s);

    Compare the bstring b and char * string s.  The C string s must be '\0' 
    terminated at exactly the length of the bstring b, and the contents 
    between the two must be identical with the bstring b with no '\0' 
    characters for the two contents to be considered equal.  This is 
    equivalent to the condition that their current contents will always be 
    equal when comparing them in the same format after converting one or the 
    other.  If they are equal 1 is returned, if they are unequal 0 is 
    returned and if there is a detectable error BSTR_ERR is returned.

    ..........................................................................

    extern int biseqcstrcaseless (const_bstring b, const char *s);

    Compare the bstring b and char * string s.  The C string s must be '\0'
    terminated at exactly the length of the bstring b, and the contents
    between the two must be identical except for case with the bstring b with
    no '\0' characters for the two contents to be considered equal.  This is
    equivalent to the condition that their current contents will always be
    equal ignoring case when comparing them in the same format after
    converting one or the other.  If they are equal, except for case, 1 is 
    returned, if they are unequal regardless of case 0 is returned and if 
    there is a detectable error BSTR_ERR is returned.

    ..........................................................................

    extern int bstrcmp (const_bstring b0, const_bstring b1);

    Compare the bstrings b0 and b1 for ordering.  If there is an error, 
    SHRT_MIN is returned, otherwise a value less than or greater than zero, 
    indicating that the bstring pointed to by b0 is lexicographically less 
    than or greater than the bstring pointed to by b1 is returned.  If the
    bstring lengths are unequal but the characters up until the length of the 
    shorter are equal then a value less than, or greater than zero, 
    indicating that the bstring pointed to by b0 is shorter or longer than the 
    bstring pointed to by b1 is returned.  0 is returned if and only if the 
    two bstrings are the same.  If the lengths of the bstrings are different, 
    this function is O(n).  Like its standard C library counterpart, the 
    comparison does not proceed past any '\0' termination characters 
    encountered.

    The seemingly odd error return value merely provides slightly more 
    granularity than the undefined situation given in the C library function 
    strcmp.  The function otherwise behaves very much like strcmp().

    Note that the semantics of bstrcmp are not completely compatible with 
    biseq because of its different treatment of the '\0' termination 
    character.
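
    The difference between the two comparisons can be illustrated with a 
    small self-contained model.  This is a sketch of the semantics described 
    above, not Bstrlib's implementation: struct lstr, lstr_iseq and lstr_cmp 
    are hypothetical names, and comparing bytes as unsigned values is an 
    assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a length-counted string; not Bstrlib's type. */
struct lstr { const unsigned char * data; int slen; };

/* Equality in the style described for biseq: unequal lengths are rejected
   at once, otherwise every byte (including any '\0') is compared. */
int lstr_iseq (struct lstr a, struct lstr b) {
    assert (a.slen >= 0 && b.slen >= 0);
    if (a.slen != b.slen) return 0;
    return a.slen == 0 || memcmp (a.data, b.data, (size_t) a.slen) == 0;
}

/* Ordering in the style described for bstrcmp: the scan stops at the
   first '\0' the two strings have in common. */
int lstr_cmp (struct lstr a, struct lstr b) {
    int i, n = a.slen < b.slen ? a.slen : b.slen;
    assert (a.slen >= 0 && b.slen >= 0);
    for (i = 0; i < n; i++) {
        int v = (int) a.data[i] - (int) b.data[i];
        if (v != 0) return v;
        if (a.data[i] == '\0') return 0;    /* do not look past '\0' */
    }
    if (a.slen > n) return 1;               /* a is longer */
    if (b.slen > n) return -1;              /* b is longer */
    return 0;
}
```

    With the 3 byte contents "a\0b" and "a\0c", lstr_cmp reports 0 while 
    lstr_iseq reports 0 (not equal), which is the incompatibility the note 
    above refers to.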

    ..........................................................................

    extern int bstrncmp (const_bstring b0, const_bstring b1, int n);

    Compare the bstrings b0 and b1 for ordering for at most n characters.  If 
    there is an error, SHRT_MIN is returned, otherwise a value is returned as 
    if b0 and b1 were first truncated to at most n characters then bstrcmp 
    was called with these new bstrings as parameters.  If the lengths of the 
    bstrings are different, this function is O(n).  Like its standard C 
    library counterpart, the comparison does not proceed past any '\0' 
    termination characters encountered.

    The seemingly odd error return value merely provides slightly more 
    granularity than the undefined situation given in the C library function 
    strncmp.  The function otherwise behaves very much like strncmp().

    ..........................................................................

    extern int bstricmp (const_bstring b0, const_bstring b1);

    Compare two bstrings without differentiating between case.  The return 
    value is the difference of the values of the characters where the two 
    bstrings first differ, otherwise 0 is returned indicating that the 
    bstrings are equal.  If the lengths are different, then a difference from 
    0 is given, but if the first extra character is '\0', then it is taken to 
    be the value UCHAR_MAX+1.

    ..........................................................................

    extern int bstrnicmp (const_bstring b0, const_bstring b1, int n);

    Compare two bstrings without differentiating between case for at most n
    characters.  If the position where the two bstrings first differ is
    before the nth position, the return value is the difference of the values
    of the characters, otherwise 0 is returned.  If the lengths are different
    and less than n characters, then a difference from 0 is given, but if the 
    first extra character is '\0', then it is taken to be the value 
    UCHAR_MAX+1.

    ..........................................................................

    extern int bdestroy (bstring b);

    Deallocate the bstring passed.  Passing NULL in as a parameter will have 
    no effect.  Note that both the header and the data portion of the bstring 
    will be freed.  No other bstring function which modifies one of its 
    parameters will free or reallocate the header.  Because of this, in 
    general, bdestroy cannot be called on any declared struct tagbstring even 
    if it is not write protected.  A bstring which is write protected cannot 
    be destroyed via the bdestroy call.  Any attempt to do so will result in 
    no action taken, and BSTR_ERR will be returned.

    Note to C++ users: Passing in a CBString cast to a bstring will lead to
    undefined behavior (free will be called on the header, rather than the
    CBString destructor.)  Instead just use the ordinary C++ language 
    facilities to dealloc a CBString.

    ..........................................................................

    extern int binstr (const_bstring s1, int pos, const_bstring s2);

    Search for the bstring s2 in s1 starting at position pos and looking in a
    forward (increasing) direction.  If it is found then it returns with the 
    first position after pos where it is found, otherwise it returns BSTR_ERR.  
    The algorithm used is brute force; O(m*n).

    ..........................................................................

    extern int binstrr (const_bstring s1, int pos, const_bstring s2);

    Search for the bstring s2 in s1 starting at position pos and looking in a
    backward (decreasing) direction.  If it is found then it returns with the 
    first position at or before pos where it is found, otherwise it returns 
    BSTR_ERR.  Note that the current position at pos is tested as well -- so 
    to be disjoint from a previous forward search it is recommended that the 
    position be backed up (decremented) by one position.  The algorithm used 
    is brute force; O(m*n).

    ..........................................................................

    extern int binstrcaseless (const_bstring s1, int pos, const_bstring s2);

    Search for the bstring s2 in s1 starting at position pos and looking in a
    forward (increasing) direction but without regard to case.  If it is 
    found then it returns with the first position after pos where it is 
    found, otherwise it returns BSTR_ERR. The algorithm used is brute force; 
    O(m*n).

    ..........................................................................

    extern int binstrrcaseless (const_bstring s1, int pos, const_bstring s2);

    Search for the bstring s2 in s1 starting at position pos and looking in a
    backward (decreasing) direction but without regard to case.  If it is 
    found then it returns with the first position at or before pos where it 
    is found, otherwise it returns BSTR_ERR.  Note that the current position 
    at pos is tested as well -- so to be disjoint from a previous forward 
    search it is recommended that the position be backed up (decremented) by 
    one position.  The algorithm used is brute force; O(m*n).

    ..........................................................................

    extern int binchr (const_bstring b0, int pos, const_bstring b1);

    Search for the first position in b0 starting from pos or after, in which 
    one of the characters in b1 is found.  This function has an execution 
    time of O(b0->slen + b1->slen).  If such a position does not exist in b0, 
    then BSTR_ERR is returned.
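
    A running time of O(b0->slen + b1->slen) is what one would expect from a 
    membership table built once from b1 and then consulted per character of 
    b0.  The following is a sketch of that general technique on raw buffers; 
    whether Bstrlib uses exactly this scheme is an assumption, and the name 
    find_first_of is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical model of a binchr-style search, not Bstrlib's code.
   Returns the first index >= pos in s whose byte occurs in set,
   or -1 if there is none or the arguments are invalid. */
int find_first_of (const unsigned char * s, int slen, int pos,
                   const unsigned char * set, int setlen) {
    unsigned char member[UCHAR_MAX + 1];
    int i;
    if (s == NULL || set == NULL || setlen < 0) return -1;
    if (pos < 0 || pos > slen) return -1;
    memset (member, 0, sizeof member);
    for (i = 0; i < setlen; i++) member[set[i]] = 1;    /* O(setlen) */
    for (i = pos; i < slen; i++) {                      /* O(slen) */
        if (member[s[i]]) return i;
    }
    assert (i == slen);     /* scanned to the end without a hit */
    return -1;
}
```

    Inverting the table test (!member[s[i]]) gives the corresponding sketch 
    for a bninchr-style search.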

    ..........................................................................

    extern int binchrr (const_bstring b0, int pos, const_bstring b1);

    Search for the last position in b0 no greater than pos, in which one of 
    the characters in b1 is found.  This function has an execution time
    of O(b0->slen + b1->slen).  If such a position does not exist in b0, 
    then BSTR_ERR is returned.

    ..........................................................................

    extern int bninchr (const_bstring b0, int pos, const_bstring b1);

    Search for the first position in b0 starting from pos or after, in which 
    none of the characters in b1 is found and return it.  This function has 
    an execution time of O(b0->slen + b1->slen).  If such a position does 
    not exist in b0, then BSTR_ERR is returned.

    ..........................................................................

    extern int bninchrr (const_bstring b0, int pos, const_bstring b1);
  
    Search for the last position in b0 no greater than pos, in which none of 
    the characters in b1 is found and return it.  This function has an 
    execution time of O(b0->slen + b1->slen).  If such a position does not 
    exist in b0, then BSTR_ERR is returned.

    ..........................................................................

    extern int bstrchr (const_bstring b, int c);

    Search for the character c in the bstring b forwards from the start of 
    the bstring.  Returns the position of the found character or BSTR_ERR if 
    it is not found.

    NOTE: This has been implemented as a macro on top of bstrchrp ().

    ..........................................................................

    extern int bstrrchr (const_bstring b, int c);

    Search for the character c in the bstring b backwards from the end of the 
    bstring.  Returns the position of the found character or BSTR_ERR if it is 
    not found.

    NOTE: This has been implemented as a macro on top of bstrrchrp ().

    ..........................................................................

    extern int bstrchrp (const_bstring b, int c, int pos);
  
    Search for the character c in b forwards from the position pos 
    (inclusive).  Returns the position of the found character or BSTR_ERR if 
    it is not found.

    ..........................................................................

    extern int bstrrchrp (const_bstring b, int c, int pos);

    Search for the character c in b backwards from the position pos 
    (inclusive).  Returns the position of the found character or BSTR_ERR if 
    it is not found.

    ..........................................................................

    extern int bsetstr (bstring b0, int pos, const_bstring b1, unsigned char fill);

    Overwrite the bstring b0 starting at position pos with the bstring b1. If 
    the position pos is past the end of b0, then the character "fill" is 
    appended as necessary to make up the gap between the end of b0 and pos.
    If b1 is NULL, it behaves as if it were a 0-length bstring. The value 
    BSTR_OK is returned if the operation is successful, otherwise BSTR_ERR is 
    returned.

    ..........................................................................

    extern int binsert (bstring s1, int pos, const_bstring s2, unsigned char fill);

    Inserts the bstring s2 into s1 at position pos.  If the position pos is 
    past the end of s1, then the character "fill" is appended as necessary to 
    make up the gap between the end of s1 and pos.  The value BSTR_OK is 
    returned if the operation is successful, otherwise BSTR_ERR is returned.
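
    The gap filling behaviour can be modelled on a fixed size char buffer.  
    This is an illustrative sketch only: insert_with_fill and its parameters 
    are hypothetical, and unlike Bstrlib it does not grow the buffer or 
    permit the inserted text to alias the destination.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of binsert-style insertion, not Bstrlib's code.
   buf holds *len characters plus a '\0' in a buffer of cap bytes.
   ins (inslen bytes) must not overlap buf.  Returns 0 on success and
   -1 on bad arguments or lack of room. */
int insert_with_fill (char * buf, int * len, int cap, int pos,
                      const char * ins, int inslen, char fill) {
    int end;
    if (buf == NULL || len == NULL || ins == NULL) return -1;
    if (cap <= 0 || pos < 0 || inslen < 0 || *len < 0) return -1;
    end = (pos > *len) ? pos : *len;    /* length before the insertion */
    if (inslen > cap - 1 || end > cap - 1 - inslen) return -1;
    if (pos > *len) {
        /* pos is past the end: pad the gap with the fill character */
        memset (buf + *len, fill, (size_t) (pos - *len));
    } else {
        /* pos is inside the string: shift the tail to make room */
        memmove (buf + pos + inslen, buf + pos, (size_t) (*len - pos));
    }
    memcpy (buf + pos, ins, (size_t) inslen);
    *len = end + inslen;
    buf[*len] = '\0';
    assert (*len < cap);
    return 0;
}
```

    Inserting "Z" at position 7 of the 5 character string "aXYbc" with a 
    fill of '.' therefore gives "aXYbc..Z" in this model.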

    ..........................................................................

    extern int binsertch (bstring s1, int pos, int len, unsigned char fill);

    Inserts the character fill repeatedly into s1 at position pos for a 
    length len.  If the position pos is past the end of s1, then the 
    character "fill" is appended as necessary to make up the gap between the 
    end of s1 and the position pos + len (exclusive).  The value BSTR_OK is 
    returned if the operation is successful, otherwise BSTR_ERR is returned.

    ..........................................................................

    extern int breplace (bstring b1, int pos, int len, const_bstring b2, 
                         unsigned char fill);

    Replace a section of a bstring from pos for a length len with the bstring 
    b2. If the position pos is past the end of b1 then the character "fill"
    is appended as necessary to make up the gap between the end of b1 and 
    pos.

    ..........................................................................

    extern int bfindreplace (bstring b, const_bstring find,
                             const_bstring replace, int position);

    Replace all occurrences of the find substring with a replace bstring 
    after a given position in the bstring b.  The find bstring must have a 
    length > 0 otherwise BSTR_ERR is returned.  This function does not 
    perform recursive per character replacement; that is to say successive
    searches resume at the position after the last replace.

    So for example:

        bfindreplace (a0 = bfromcstr("aabaAb"), a1 = bfromcstr("a"),
                      a2 = bfromcstr("aa"), 0);

    Should result in changing a0 to "aaaabaaAb".

    This function performs exactly (b->slen - position) bstring comparisons, 
    and data movement is bounded above by character volume equivalent to size 
    of the output bstring.
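
    The non-recursive behaviour (each search resumes after the text just 
    written) can be sketched on '\0' terminated strings.  This is a model of 
    the semantics described above and not Bstrlib's implementation; the name 
    replace_all is hypothetical, the result is returned in a new malloc'd 
    buffer rather than in place, and treating position as inclusive is an 
    assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of bfindreplace-style replacement, not Bstrlib's code.
   Returns a newly allocated string (to be released with free) or NULL. */
char * replace_all (const char * s, const char * find,
                    const char * repl, int pos) {
    size_t slen, flen, rlen, count = 0, i, o, outlen;
    char * out;
    if (s == NULL || find == NULL || repl == NULL || pos < 0) return NULL;
    slen = strlen (s); flen = strlen (find); rlen = strlen (repl);
    if (flen == 0 || (size_t) pos > slen) return NULL;

    /* Pass 1: count matches, stepping over each match once found. */
    for (i = (size_t) pos; i + flen <= slen; ) {
        if (memcmp (s + i, find, flen) == 0) { count++; i += flen; }
        else i++;
    }
    outlen = slen - count * flen + count * rlen;
    out = malloc (outlen + 1);
    if (out == NULL) return NULL;

    /* Pass 2: copy, substituting; replaced text is never rescanned. */
    memcpy (out, s, (size_t) pos);
    o = (size_t) pos;
    for (i = (size_t) pos; i < slen; ) {
        if (i + flen <= slen && memcmp (s + i, find, flen) == 0) {
            memcpy (out + o, repl, rlen);
            o += rlen; i += flen;
        } else {
            out[o++] = s[i++];
        }
    }
    assert (o == outlen);
    out[o] = '\0';
    return out;
}
```

    Because the scan continues in the source text rather than in the output, 
    replacing "a" with "aa" terminates even though the replacement contains 
    the search string.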

    ..........................................................................

    extern int bfindreplacecaseless (bstring b, const_bstring find,
                             const_bstring replace, int position);

    Replace all occurrences of the find substring, ignoring case, with a 
    replace bstring after a given position in the bstring b.  The find bstring 
    must have a length > 0 otherwise BSTR_ERR is returned.  This function 
    does not perform recursive per character replacement; that is to say 
    successive searches resume at the position after the last replace.

    So for example:

        bfindreplacecaseless (a0 = bfromcstr("AAbaAb"), a1 = bfromcstr("a"),
                              a2 = bfromcstr("aa"), 0);

    Should result in changing a0 to "aaaabaaaab".

    This function performs exactly (b->slen - position) bstring comparisons, 
    and data movement is bounded above by character volume equivalent to size 
    of the output bstring.

    ..........................................................................

    extern int balloc (bstring b, int length);

    Increase the allocated memory backing the data buffer for the bstring b
    to a length of at least length.  If the memory backing the bstring b is
    already large enough, no action is performed.  This has no effect on the 
    bstring b that is visible to the bstring API.  Usually this function will 
    only be used when a minimum buffer size is required coupled with a direct 
    access to the ->data member of the bstring structure.

    Be warned that like any other bstring function, the bstring must be well
    defined upon entry to this function.  I.e., doing something like:

        b->slen *= 2; /* ?? Most likely incorrect */
        balloc (b, b->slen);

    is invalid, and should be implemented as:

        int t;
        if (BSTR_OK == balloc (b, t = (b->slen * 2))) b->slen = t;

    This function will return with BSTR_ERR if b is not detected as a valid 
    bstring or length is not greater than 0, otherwise BSTR_OK is returned.

    ..........................................................................

    extern int ballocmin (bstring b, int length);

    Change the amount of memory backing the bstring b to at least length.  
    This operation will never truncate the bstring data including the 
    extra terminating '\0' and thus will not decrease the length to less than
    b->slen + 1.  Note that repeated use of this function may cause 
    performance problems (realloc may be called on the bstring more than
    the O(log(INT_MAX)) times).  This function will return with BSTR_ERR if b 
    is not detected as a valid bstring or length is not greater than 0, 
    otherwise BSTR_OK is returned.

    So for example:

	if (BSTR_OK == ballocmin (b, 64)) b->data[63] = 'x';

    The idea is that this will set the 64th character of b to 'x' if it is at 
    least 64 characters long otherwise do nothing.  And we know this is well 
    defined so long as the ballocmin call was successful, since it will 
    ensure that b has been allocated with at least 64 characters.

    ..........................................................................

    int btrunc (bstring b, int n);

    Truncate the bstring to at most n characters.  This function will return 
    with BSTR_ERR if b is not detected as a valid bstring or n is less than 
    0, otherwise BSTR_OK is returned.

    ..........................................................................

    extern int bpattern (bstring b, int len);

    Replicate the starting bstring, b, end to end repeatedly until it 
    surpasses len characters, then chop the result to exactly len characters. 
    This function operates in-place.  This function will return with BSTR_ERR 
    if b is NULL or of length 0, otherwise BSTR_OK is returned.
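
    The replicate-then-chop result can be sketched as follows.  This is a 
    model of the outcome only, not Bstrlib's code: make_pattern is a 
    hypothetical name, it works on '\0' terminated strings, and it returns a 
    new buffer instead of operating in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a bpattern-style result, not Bstrlib's code.
   Returns a malloc'd string of exactly len characters made by repeating
   seed end to end, or NULL on error (release with free). */
char * make_pattern (const char * seed, int len) {
    size_t slen;
    char * out;
    int i;
    if (seed == NULL || len < 0) return NULL;
    slen = strlen (seed);
    if (slen == 0) return NULL;         /* nothing to replicate */
    out = malloc ((size_t) len + 1);
    if (out == NULL) return NULL;
    assert (slen > 0);                  /* guards the modulo below */
    for (i = 0; i < len; i++) out[i] = seed[(size_t) i % slen];
    out[len] = '\0';
    return out;
}
```

    A len shorter than the seed simply truncates it in this model.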

    ..........................................................................

    extern int btoupper (bstring b);

    Convert contents of bstring to upper case.  This function will return with 
    BSTR_ERR if b is NULL or of length 0, otherwise BSTR_OK is returned.

    ..........................................................................

    extern int btolower (bstring b);

    Convert contents of bstring to lower case.  This function will return with 
    BSTR_ERR if b is NULL or of length 0, otherwise BSTR_OK is returned.

    ..........................................................................

    extern int bltrimws (bstring b);

    Delete whitespace contiguous from the left end of the bstring.  This 
    function will return with BSTR_ERR if b is NULL or of length 0, otherwise 
    BSTR_OK is returned.

    ..........................................................................

    extern int brtrimws (bstring b);

    Delete whitespace contiguous from the right end of the bstring.  This 
    function will return with BSTR_ERR if b is NULL or of length 0, otherwise 
    BSTR_OK is returned.

    ..........................................................................

    extern int btrimws (bstring b);

    Delete whitespace contiguous from both ends of the bstring.  This function 
    will return with BSTR_ERR if b is NULL or of length 0, otherwise BSTR_OK 
    is returned.
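
    Trimming from both ends can be sketched on a '\0' terminated buffer.  
    The sketch assumes that "whitespace" means whatever isspace () accepts, 
    which the text above does not state; trim_ws is a hypothetical name and 
    this is not Bstrlib's code.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical model of btrimws-style trimming, not Bstrlib's code.
   Trims s in place and returns the new length, or -1 if s is NULL. */
int trim_ws (char * s) {
    size_t start = 0, end;
    if (s == NULL) return -1;
    end = strlen (s);
    /* drop trailing whitespace first, then leading whitespace */
    while (end > 0 && isspace ((unsigned char) s[end - 1])) end--;
    while (start < end && isspace ((unsigned char) s[start])) start++;
    assert (start <= end);
    memmove (s, s + start, end - start);    /* slide the kept text left */
    s[end - start] = '\0';
    return (int) (end - start);
}
```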

    ..........................................................................

    extern struct bstrList * bstrListCreate (void);

    Create an empty struct bstrList. The struct bstrList output structure is 
    declared as follows:

    struct bstrList {
        int qty, mlen;
        bstring * entry;
    };

    The entry field actually is an array with qty entries.  The mlen record 
    counts the maximum number of bstrings for which there is memory in the 
    entry record.

    The Bstrlib API does *NOT* include a comprehensive set of functions for 
    full management of struct bstrList in an abstracted way.  The reason for 
    this is because aliasing semantics of the list are best left to the user 
    of this function, and performance varies wildly depending on the 
    assumptions made.  For a complete list of bstring data type it is 
    recommended that the C++ public std::vector<CBString> be used, since its
    semantics and usage are more standard.

    ..........................................................................

    extern int bstrListDestroy (struct bstrList * sl);

    Destroy a struct bstrList structure that was returned by the bsplit 
    function.  Note that this will destroy each bstring in the ->entry array
    as well.  See bstrListCreate() above for structure of struct bstrList.

    ..........................................................................

    extern int bstrListAlloc (struct bstrList * sl, int msz);

    Ensure that there is memory for at least msz number of entries for the
    list.

    ..........................................................................

    extern int bstrListAllocMin (struct bstrList * sl, int msz);

    Try to allocate the minimum amount of memory for the list to include at
    least msz entries or sl->qty whichever is greater.

    ..........................................................................

    extern struct bstrList * bsplit (bstring str, unsigned char splitChar);

    Create an array of sequential substrings from str divided by the 
    character splitChar.  Successive occurrences of the splitChar will be 
    divided by empty bstring entries, following the semantics from the Python 
    programming language.  To reclaim the memory from this output structure, 
    bstrListDestroy () should be called.  See bstrListCreate() above for 
    structure of struct bstrList.

    ..........................................................................

    extern struct bstrList * bsplits (bstring str, const_bstring splitStr);

    Create an array of sequential substrings from str divided by any 
    character contained in splitStr.  An empty splitStr causes a single entry 
    bstrList containing a copy of str to be returned.  See bstrListCreate() 
    above for structure of struct bstrList.

    ..........................................................................

    extern struct bstrList * bsplitstr (bstring str, const_bstring splitStr);

    Create an array of sequential substrings from str divided by the entire 
    substring splitStr.  An empty splitStr causes a single entry bstrList 
    containing a copy of str to be returned.  See bstrListCreate() above for 
    structure of struct bstrList.

    ..........................................................................

    extern bstring bjoin (const struct bstrList * bl, const_bstring sep);

    Join the entries of a bstrList into one bstring by sequentially 
    concatenating them with the sep bstring in between.  If sep is NULL, it
    is treated as if it were the empty bstring.  Note that:

        bjoin (l = bsplit (b, s->data[0]), s);

    should result in a copy of b, if s->slen is 1.  If there is an error NULL 
    is returned, otherwise a bstring with the correct result is returned.  
    See bstrListCreate() above for structure of struct bstrList.

    ..........................................................................

    extern int bsplitcb (const_bstring str, unsigned char splitChar, int pos,
	int (* cb) (void * parm, int ofs, int len), void * parm);

    Iterate the set of disjoint sequential substrings over str starting at 
    position pos divided by the character splitChar.  The parm passed to 
    bsplitcb is passed on to cb.  If the function cb returns a value < 0, 
    then further iterating is halted and this value is returned by bsplitcb.

    Note: Non-destructive modification of str from within the cb function 
    while performing this split is well defined.  bsplitcb behaves in 
    sequential lock step with calls to cb.  I.e., after returning from a cb 
    that returns a non-negative integer, bsplitcb continues from the position 
    1 character after the last detected split character and it will halt 
    immediately if the length of str falls below this point.  However, if the 
    cb function destroys str, then it *must* return with a negative value, 
    otherwise bsplitcb will continue in an undefined manner.

    This function is provided as an incremental alternative to bsplit that is
    abortable and which does not impose additional memory allocation.
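
    The callback protocol can be modelled on a plain character buffer.  The 
    sketch below follows the description above (one call per substring, 
    empty substrings between successive split characters, early exit on a 
    negative return) but is not Bstrlib's code; split_spans, span_cb, 
    count_fields and stop_after_first are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef int (* span_cb) (void * parm, int ofs, int len);

/* Hypothetical model of bsplitcb-style iteration, not Bstrlib's code.
   Calls cb once per substring of s[pos..slen) delimited by splitChar.
   Returns the first negative value returned by cb, or 0 when done. */
int split_spans (const char * s, int slen, char splitChar, int pos,
                 span_cb cb, void * parm) {
    int i, ret;
    if (s == NULL || cb == NULL || pos < 0 || pos > slen) return -1;
    do {
        for (i = pos; i < slen && s[i] != splitChar; i++) { }
        assert (i >= pos);
        ret = cb (parm, pos, i - pos);  /* report the span (ofs, len) */
        if (ret < 0) return ret;        /* callback asked to stop */
        pos = i + 1;                    /* resume after the split char */
    } while (pos <= slen);
    return 0;
}

/* Example callback: counts the substrings seen; parm points to an int. */
int count_fields (void * parm, int ofs, int len) {
    (void) ofs; (void) len;
    ++*(int *) parm;
    return 0;
}

/* Example callback: counts one substring, then aborts the iteration. */
int stop_after_first (void * parm, int ofs, int len) {
    (void) ofs; (void) len;
    ++*(int *) parm;
    return -2;
}
```

    No substring is copied; the callback receives only offsets into the 
    original buffer, which is why this style needs no extra allocation.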

    ..........................................................................

    extern int bsplitscb (const_bstring str, const_bstring splitStr, int pos,
	int (* cb) (void * parm, int ofs, int len), void * parm);

    Iterate the set of disjoint sequential substrings over str starting at 
    position pos divided by any of the characters in splitStr.  An empty 
    splitStr causes the whole str to be iterated once.  The parm passed to 
    bsplitscb is passed on to cb.  If the function cb returns a value < 0, 
    then further iterating is halted and this value is returned by bsplitscb.

    Note: Non-destructive modification of str from within the cb function 
    while performing this split is well defined.  bsplitscb behaves in 
    sequential lock step with calls to cb.  I.e., after returning from a cb 
    that returns a non-negative integer, bsplitscb continues from the position 
    1 character after the last detected split character and it will halt 
    immediately if the length of str falls below this point.  However, if the 
    cb function destroys str, then it *must* return with a negative value, 
    otherwise bsplitscb will continue in an undefined manner.

    This function is provided as an incremental alternative to bsplits that 
    is abortable and which does not impose additional memory allocation.

    ..........................................................................

    extern int bsplitstrcb (const_bstring str, const_bstring splitStr, int pos,
	int (* cb) (void * parm, int ofs, int len), void * parm);

    Iterate the set of disjoint sequential substrings over str starting at 
    position pos divided by the entire substring splitStr.  An empty splitStr 
    causes each character of str to be iterated.  The parm passed to 
    bsplitstrcb is passed on to cb.  If the function cb returns a value < 0, 
    then further iterating is halted and this value is returned by 
    bsplitstrcb.

    Note: Non-destructive modification of str from within the cb function 
    while performing this split is not undefined.  bsplitstrcb behaves in 
    sequential lock step with calls to cb.  I.e., after returning from a cb 
    that returns a non-negative integer, bsplitstrcb continues from the 
    position immediately following the last detected occurrence of splitStr 
    and it will halt immediately if the length of str falls below this 
    point.  However, if the cb function destroys str, then it *must* return 
    with a negative value, otherwise bsplitstrcb will continue in an 
    undefined manner.

    This function is provided as an incremental alternative to bsplitstr that 
    is abortable and which does not impose additional memory allocation.
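
    The callback contract shared by bsplitcb, bsplitscb and bsplitstrcb can 
    be sketched as follows.  So that it compiles without Bstrlib, the driver 
    demo_split_chars is a hypothetical stand-in that only mimics the 
    documented behavior of bsplitscb over a plain char buffer; it is not the 
    library's implementation.  The names field_log and collect_fields are 
    likewise illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback state: records the extents of up to 8 fields. */
struct field_log {
    int max;        /* ask the splitter to halt after this many fields */
    int count;      /* fields recorded so far */
    int ofs[8];     /* offset of each field */
    int len[8];     /* length of each field */
};

/* Callback with the signature used by the split callback functions.
   Returning a negative value asks the splitter to halt. */
static int collect_fields (void * parm, int ofs, int len) {
    struct field_log * log = (struct field_log *) parm;
    if (log->count >= log->max || log->count >= 8) return -1;
    log->ofs[log->count] = ofs;
    log->len[log->count] = len;
    log->count++;
    return 0;
}

/* Stand-in driver (NOT part of Bstrlib): splits the buffer s of length
   slen on any character found in the '\0' terminated string seps,
   starting at pos, and forwards each piece to cb.  A negative value
   returned by cb halts the iteration and is passed back to the caller. */
static int demo_split_chars (const char * s, int slen, const char * seps,
    int pos, int (* cb) (void * parm, int ofs, int len), void * parm) {
    int i, ret;
    if (s == NULL || seps == NULL || cb == NULL || pos < 0 || pos > slen)
        return -1;
    do {
        for (i = pos; i < slen; i++) {
            /* strchr also matches the terminator, so exclude '\0' */
            if (s[i] != '\0' && strchr (seps, s[i]) != NULL) break;
        }
        if ((ret = cb (parm, pos, i - pos)) < 0) return ret;
        pos = i + 1;    /* resume 1 character after the split character */
    } while (pos <= slen);
    return 0;
}
```

    Because the callback receives only offsets and lengths, nothing is 
    allocated per field; the caller decides whether a field is worth copying.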

    ..........................................................................

    extern bstring bformat (const char * fmt, ...);

    Takes the same parameters as printf (), but rather than outputting 
    results to stdio, it forms a bstring which contains what would have been 
    output. Note that if there is an early generation of a '\0' character, 
    the bstring will be truncated to this end point.

    Note that %s format tokens correspond to '\0' terminated char * buffers, 
    not bstrings.  To print a bstring, first dereference the data element of 
    the bstring:

        /* b1->data needs to be '\0' terminated, so tagbstrings generated 
           by blk2tbstr () might not be suitable. */
        b0 = bformat ("Hello, %s", b1->data);

    Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been 
    compiled the bformat function is not present.

    ..........................................................................

    extern int bformata (bstring b, const char * fmt, ...);

    In addition to the initial output buffer b, bformata takes the same 
    parameters as printf (), but rather than outputting results to stdio, it 
    appends the results to the initial bstring parameter. Note that if 
    there is an early generation of a '\0' character, the bstring will be 
    truncated to this end point.

    Note that %s format tokens correspond to '\0' terminated char * buffers, 
    not bstrings.  To print a bstring, first dereference the data element of 
    the bstring:

        /* b1->data needs to be '\0' terminated, so tagbstrings generated 
           by blk2tbstr () might not be suitable. */
        bformata (b0 = bfromcstr ("Hello"), ", %s", b1->data); 

    Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been 
    compiled the bformata function is not present.

    ..........................................................................

    extern int bassignformat (bstring b, const char * fmt, ...);

    After the first parameter, it takes the same parameters as printf (), but 
    rather than outputting results to stdio, it outputs the results to 
    the bstring parameter b. Note that if there is an early generation of a 
    '\0' character, the bstring will be truncated to this end point.

    Note that %s format tokens correspond to '\0' terminated char * buffers, 
    not bstrings.  To print a bstring, first dereference the data element of 
    the bstring:

        /* b1->data needs to be '\0' terminated, so tagbstrings generated 
           by blk2tbstr () might not be suitable. */
        bassignformat (b0 = bfromcstr (""), "Hello, %s", b1->data); 

    Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been 
    compiled the bassignformat function is not present.

    ..........................................................................

    extern int bvcformata (bstring b, int count, const char * fmt, va_list arglist);

    The bvcformata function formats data under control of the format control 
    string fmt and attempts to append the result to b.  The fmt parameter is 
    the same as that of the printf function.  The variable argument list is 
    replaced with arglist, which has been initialized by the va_start macro.
    The size of the output is upper bounded by count.  If the required output
    exceeds count, the string b is not augmented with any contents and a value
    below BSTR_ERR is returned.  If a value below -count is returned then it
    is recommended that the negative of this value be used as an update to the
    count in a subsequent pass.  On other errors, such as running out of 
    memory, parameter errors or numeric wrap around, BSTR_ERR is returned.  
    BSTR_OK is returned when the output is successfully generated and 
    appended to b.

    Note: There is no sanity checking of arglist, and this function is
    destructive of the contents of b from the b->slen point onward.  If there 
    is an early generation of a '\0' character, the bstring will be truncated 
    to this end point.

    Although this function is part of the external API for Bstrlib, the 
    interface and semantics (length limitations, and unusual return codes) 
    are fairly atypical.  The real purpose for this function is to provide an 
    engine for the bvformata macro.

    Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been 
    compiled the bvcformata function is not present.
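
    The two pass protocol described above (attempt the output with a bounded 
    count, then retry with a larger count) resembles the common 
    grow-and-retry idiom around the standard vsnprintf function.  The 
    following is a sketch in standard C only, assuming a C99 conforming 
    vsnprintf; format_alloc is a hypothetical helper, not a Bstrlib function, 
    and it does not reproduce the return codes of bvcformata.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed, '\0' terminated string holding
   the formatted output, or NULL on failure.  The caller frees it. */
static char * format_alloc (const char * fmt, ...) {
    size_t count = 16;              /* initial guess at the output bound */
    char * buf = NULL;

    if (fmt == NULL) return NULL;
    for (;;) {
        va_list ap;
        char * tmp;
        int n;

        tmp = (char *) realloc (buf, count);
        if (tmp == NULL) { free (buf); return NULL; }
        buf = tmp;

        va_start (ap, fmt);         /* restart the argument list per pass */
        n = vsnprintf (buf, count, fmt, ap);
        va_end (ap);

        if (n < 0) { free (buf); return NULL; }     /* formatting error */
        if ((size_t) n < count) return buf;         /* the output fit */
        count = (size_t) n + 1;     /* C99 reports the length required */
    }
}
```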

    ..........................................................................

    extern bstring bread (bNread readPtr, void * parm);
    typedef size_t (* bNread) (void *buff, size_t elsize, size_t nelem, 
                               void *parm);

    Read an entire stream into a bstring, verbatim.  The readPtr function 
    pointer is compatible with fread semantics, except that it need not obtain 
    the stream data from a file.  The intention is that parm would contain 
    the stream data context/state required (similar to the role of the FILE* 
    I/O stream parameter of fread.)

    Abstracting the block read function allows for block devices other than 
    file streams to be read if desired.  Note that there is an ANSI 
    compatibility issue if "fread" is used directly; see the ANSI issues 
    section below.
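
    As an illustration of a non-file source, the following sketch shows an 
    fread-like reader over a block of memory.  The typedef is repeated here 
    only so that the fragment compiles without bstrlib.h; struct mem_stream 
    and mem_read are hypothetical names, not part of Bstrlib.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the bNread typedef documented above. */
typedef size_t (* bNread) (void * buff, size_t elsize, size_t nelem,
                           void * parm);

/* Hypothetical stream context: a block of memory and a read cursor. */
struct mem_stream {
    const unsigned char * data;
    size_t len;
    size_t pos;
};

/* fread-like reader: copies up to nelem whole elements of elsize bytes
   and returns the number of elements copied (0 once the data runs out). */
static size_t mem_read (void * buff, size_t elsize, size_t nelem,
                        void * parm) {
    struct mem_stream * ms = (struct mem_stream *) parm;
    size_t avail, n;

    if (buff == NULL || ms == NULL || elsize == 0) return 0;
    avail = (ms->len - ms->pos) / elsize;   /* whole elements remaining */
    n = (nelem < avail) ? nelem : avail;
    memcpy (buff, ms->data + ms->pos, n * elsize);
    ms->pos += n * elsize;
    return n;
}
```

    With Bstrlib available, a pointer to a struct mem_stream would play the 
    role of parm and mem_read the role of readPtr.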

    ..........................................................................

    extern int breada (bstring b, bNread readPtr, void * parm);

    Read an entire stream and append it to a bstring, verbatim.  Behaves 
    like bread, except that it appends its results to the bstring b.
    BSTR_ERR is returned on error, otherwise 0 is returned.

    ..........................................................................

    extern bstring bgets (bNgetc getcPtr, void * parm, char terminator);
    typedef int (* bNgetc) (void * parm);

    Read a bstring from a stream.  As many bytes as are necessary are read
    until the terminator is consumed or no more characters are available from
    the stream.  If read from the stream, the terminator character will be
    appended to the end of the returned bstring.  The getcPtr function must 
    have the same semantics as the fgetc C library function (i.e., returning 
    an integer whose value is negative when there are no more characters 
    available, otherwise the value of the next available unsigned character 
    from the stream.)  The intention is that parm would contain the stream 
    data context/state required (similar to the role of the FILE* I/O stream 
    parameter of fgets.)  If no characters are read, or there is some other 
    detectable error, NULL is returned.

    bgets will never call the getcPtr function more often than necessary to
    construct its output (including a single call, if required, to determine 
    that the stream contains no more characters.)

    Abstracting the character stream function and terminator character allows
    for different stream devices and string formats other than '\n' 
    terminated lines in a file if desired (consider \032 terminated email 
    messages, in a UNIX mailbox for example.)

    For files, this function can be used analogously to fgets as follows:

        fp = fopen ( ... );
        if (fp) b = bgets ((bNgetc) fgetc, fp, '\n');

    (Note that only one terminator character can be used, and that '\0' is 
    not assumed to terminate the stream in addition to the terminator 
    character. This is consistent with the semantics of fgets.)
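
    A getcPtr function need not be backed by a file.  The following sketch 
    shows an fgetc-like reader over a '\0' terminated string.  The typedef is 
    repeated only so that the fragment compiles without bstrlib.h; struct 
    str_cursor and str_getc are hypothetical names.  For simplicity this 
    particular reader treats '\0' as the end of its source, which is a 
    limitation of the sketch rather than a requirement of bgets.

```c
/* Same shape as the bNgetc typedef documented above. */
typedef int (* bNgetc) (void * parm);

/* Hypothetical stream context: a string and the index of the next char. */
struct str_cursor {
    const char * s;     /* '\0' terminated source text */
    int pos;            /* index of the next character to return */
};

/* fgetc-like reader: returns the next character as an unsigned char
   converted to int, or a negative value once the text is exhausted. */
static int str_getc (void * parm) {
    struct str_cursor * c = (struct str_cursor *) parm;
    if (c == 0 || c->s == 0 || c->s[c->pos] == '\0') return -1;
    return (int) (unsigned char) c->s[c->pos++];
}
```

    Converting through unsigned char matters: a plain char with its high bit 
    set could otherwise be returned as a negative value and be mistaken for 
    the end of the stream.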

    ..........................................................................

    extern int bgetsa (bstring b, bNgetc getcPtr, void * parm, char terminator);

    Read from a stream and concatenate to a bstring.  Behaves like bgets, 
    except that it appends its results to the bstring b.  The value 1 is 
    returned if no characters are read before a negative result is returned
    from getcPtr.  Otherwise BSTR_ERR is returned on error, and 0 is returned
    in other normal cases.

    ..........................................................................

    extern int bassigngets (bstring b, bNgetc getcPtr, void * parm, char terminator);

    Read from a stream and assign to a bstring.  Behaves like bgets, 
    except that it assigns the results to the bstring b.  The value 1 is 
    returned if no characters are read before a negative result is returned
    from getcPtr.  Otherwise BSTR_ERR is returned on error, and 0 is returned
    in other normal cases.

    ..........................................................................

    extern struct bStream * bsopen (bNread readPtr, void * parm);
  
    Wrap a given open stream (described by a fread compatible function 
    pointer and stream handle) into an open bStream suitable for the bstring 
    library streaming functions.
  
    ..........................................................................

    extern void * bsclose (struct bStream * s);
  
    Close the bStream, and return the handle to the stream that was 
    originally used to open the given stream.  If s is NULL or detectably 
    invalid, NULL will be returned.

    ..........................................................................

    extern int bsbufflength (struct bStream * s, int sz);

    Set the length of the buffer used by the bStream.  If sz is the macro
    BSTR_BS_BUFF_LENGTH_GET (which is 0), the length is not set.  If s is 
    NULL or sz is negative, the function will return with BSTR_ERR, otherwise 
    this function returns with the previous length.

    ..........................................................................

    extern int bsreadln (bstring r, struct bStream * s, char terminator);
  
    Read a bstring terminated by the terminator character or the end of the
    stream from the bStream (s) and return it into the parameter r.  The 
    matched terminator, if found, appears at the end of the line read.  If
    the stream has been exhausted of all available data, before any can be
    read, BSTR_ERR is returned.  This function may read additional characters 
    into the stream buffer from the core stream that are not returned, but 
    will be retained for subsequent read operations.  When reading from high 
    speed streams, this function can perform significantly faster than bgets.

    ..........................................................................

    extern int bsreadlna (bstring r, struct bStream * s, char terminator);
  
    Read a bstring terminated by the terminator character or the end of the
    stream from the bStream (s) and concatenate it to the parameter r.  The 
    matched terminator, if found, appears at the end of the line read.  If
    the stream has been exhausted of all available data, before any can be
    read, BSTR_ERR is returned.  This function may read additional characters 
    into the stream buffer from the core stream that are not returned, but 
    will be retained for subsequent read operations.  When reading from high 
    speed streams, this function can perform significantly faster than bgets.

    ..........................................................................

    extern int bsreadlns (bstring r, struct bStream * s, bstring terminators);

    Read a bstring terminated by any character in the terminators bstring or 
    the end of the stream from the bStream (s) and return it into the 
    parameter r. This function may read additional characters from the core 
    stream that are not returned, but will be retained for subsequent read 
    operations.

    ..........................................................................

    extern int bsreadlnsa (bstring r, struct bStream * s, bstring terminators);

    Read a bstring terminated by any character in the terminators bstring or 
    the end of the stream from the bStream (s) and concatenate it to the 
    parameter r.  If the stream has been exhausted of all available data, 
    before any can be read, BSTR_ERR is returned.  This function may read 
    additional characters from the core stream that are not returned, but 
    will be retained for subsequent read operations.

    ..........................................................................

    extern int bsread (bstring r, struct bStream * s, int n);
  
    Read a bstring of length n (or, if it is fewer, as many bytes as 
    remain) from the bStream.  This function will read the minimum 
    required number of additional characters from the core stream.  When the
    stream is at the end of the file BSTR_ERR is returned, otherwise BSTR_OK 
    is returned.

    ..........................................................................

    extern int bsreada (bstring r, struct bStream * s, int n);
  
    Read a bstring of length n (or, if it is fewer, as many bytes as 
    remain) from the bStream and concatenate it to the parameter r.  This 
    function will read the minimum required number of additional characters 
    from the core stream.  When the stream is at the end of the file BSTR_ERR 
    is returned, otherwise BSTR_OK is returned.
  
    ..........................................................................
  
    extern int bsunread (struct bStream * s, const_bstring b);
  
    Insert a bstring into the bStream at the current position.  These 
    characters will be read prior to those that actually come from the core 
    stream.

    ..........................................................................

    extern int bspeek (bstring r, const struct bStream * s);
  
    Return the number of currently buffered characters from the bStream that 
    will be read prior to reads from the core stream, and append it to the
    parameter r.

    ..........................................................................

    extern int bssplitscb (struct bStream * s, const_bstring splitStr, 
	int (* cb) (void * parm, int ofs, const_bstring entry), void * parm);

    Iterate the set of disjoint sequential substrings over the stream s 
    divided by any character from the bstring splitStr.  The parm passed to 
    bssplitscb is passed on to cb.  If the function cb returns a value < 0, 
    then further iterating is halted and this return value is returned by 
    bssplitscb.

    Note: At the point of calling the cb function, the bStream pointer is 
    pointed exactly at the position right after having read the split 
    character.  The cb function can act on the stream by causing the bStream
    pointer to move, and bssplitscb will continue by starting the next split
    at the position of the pointer after the return from cb.

    However, if the cb causes the bStream s to be destroyed then the cb must
    return with a negative value, otherwise bssplitscb will continue in an 
    undefined manner.

    This function is provided as a way to incrementally parse through a file
    or other generic stream that in total size may otherwise exceed the 
    practical or desired memory available.  As with the other split callback
    based functions this is abortable and does not impose additional memory 
    allocation.

    ..........................................................................
    
    extern int bssplitstrcb (struct bStream * s, const_bstring splitStr, 
	int (* cb) (void * parm, int ofs, const_bstring entry), void * parm);
	
    Iterate the set of disjoint sequential substrings over the stream s 
    divided by the entire substring splitStr.  The parm passed to 
    bssplitstrcb is passed on to cb.  If the function cb returns a 
    value < 0, then further iterating is halted and this return value is 
    returned by bssplitstrcb.

    Note: At the point of calling the cb function, the bStream pointer is 
    pointed exactly at the position right after having read the split 
    string.  The cb function can act on the stream by causing the bStream
    pointer to move, and bssplitstrcb will continue by starting the next 
    split at the position of the pointer after the return from cb.

    However, if the cb causes the bStream s to be destroyed then the cb must
    return with a negative value, otherwise bssplitstrcb will continue in an 
    undefined manner.

    This function is provided as a way to incrementally parse through a file
    or other generic stream that in total size may otherwise exceed the 
    practical or desired memory available.  As with the other split callback
    based functions this is abortable and does not impose additional memory 
    allocation.
    
    ..........................................................................
    
    extern int bseof (const struct bStream * s);

    Return the de facto "EOF" (end of file) state of a stream (1 if the 
    bStream is in an EOF state, 0 if not, and BSTR_ERR if the stream is 
    closed or detectably erroneous.)  When the readPtr callback returns a 
    value <= 0 the stream reaches its "EOF" state.  Note that bsunread with 
    non-empty content will essentially turn off this state, and the stream 
    will not be in its "EOF" state so long as it is possible to read more 
    data out of it.

    Also note that the semantics of bseof() are slightly different from 
    something like feof().  I.e., reaching the end of the stream does not
    necessarily guarantee that bseof() will return with a value indicating
    that this has happened.  bseof() will only indicate "EOF" once the end of 
    the bStream has been reached and an attempt has been made to read past 
    it.

The macros
----------

    The macros described below are shown in a prototype form indicating their
    intended usage.  Note that the parameters passed to these macros will be
    referenced multiple times.  As with all macros, programmer care is 
    required to guard against unintended side effects.
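
    The hazard can be sketched with a hypothetical macro that, like the 
    macros below, mentions a parameter more than once.  LEN_OR, struct 
    counted and next_item are illustrative names only and are not part of 
    Bstrlib.

```c
/* Hypothetical macro that mentions its first parameter twice. */
#define LEN_OR(p, err) (((p) == (void *) 0) ? (err) : (p)->len)

struct counted { int len; };

static int fetches = 0;     /* counts how often next_item runs */

static struct counted * next_item (struct counted * item) {
    fetches++;
    return item;
}

/* Returns the number of times the macro argument was evaluated, or -1 if
   the length came back wrong.  The argument expression runs once for the
   NULL test and once more for the member access. */
static int demo_double_eval (void) {
    struct counted c = { 5 };
    int len;
    fetches = 0;
    len = LEN_OR (next_item (&c), -1);
    return (len == 5) ? fetches : -1;
}
```

    Assigning the argument to a local variable first, and passing that 
    variable to the macro, avoids the repeated call.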

    int blengthe (const_bstring b, int err);

    Returns the length of the bstring.  If the bstring is NULL err is 
    returned.

    ..........................................................................

    int blength (const_bstring b);

    Returns the length of the bstring.  If the bstring is NULL, the length 
    returned is 0.

    ..........................................................................

    int bchare (const_bstring b, int p, int c);

    Returns the p'th character of the bstring b.  If the position p refers to 
    a position that does not exist in the bstring or the bstring is NULL, 
    then c is returned.

    ..........................................................................

    char bchar (const_bstring b, int p);

    Returns the p'th character of the bstring b.  If the position p refers to 
    a position that does not exist in the bstring or the bstring is NULL, 
    then '\0' is returned.

    ..........................................................................

    char * bdatae (bstring b, char * err);

    Returns the char * data portion of the bstring b.  If b is NULL, err is
    returned.

    ..........................................................................

    char * bdata (bstring b);

    Returns the char * data portion of the bstring b.  If b is NULL, NULL is
    returned.

    ..........................................................................

    char * bdataofse (bstring b, int ofs, char * err);

    Returns the char * data portion of the bstring b offset by ofs.  If b is 
    NULL, err is returned.

    ..........................................................................

    char * bdataofs (bstring b, int ofs);

    Returns the char * data portion of the bstring b offset by ofs.  If b is 
    NULL, NULL is returned.

    ..........................................................................

    struct tagbstring var = bsStatic ("...");

    The bsStatic macro allows for static declarations of literal string 
    constants as struct tagbstring structures.  The resulting tagbstring does 
    not need to be freed or destroyed.  Note that this macro is only well
    defined for string literal arguments.  For more general string pointers, 
    use the btfromcstr macro.

    The resulting struct tagbstring is permanently write protected.  Attempts 
    to write to this struct tagbstring from any bstrlib function will lead to 
    BSTR_ERR being returned.  Invoking the bwriteallow macro onto this struct 
    tagbstring has no effect.

    ..........................................................................

    <void * blk, int len> <- bsStaticBlkParms ("...")

    The bsStaticBlkParms macro emits a pair of comma separated parameters 
    corresponding to the block parameters for the block functions in Bstrlib 
    (i.e., blk2bstr, bcatblk, blk2tbstr, bisstemeqblk, bisstemeqcaselessblk.) 
    Note that this macro is only well defined for string literal arguments.

    Examples:

    bstring b = blk2bstr (bsStaticBlkParms ("Fast init. "));
    bcatblk (b, bsStaticBlkParms ("No frills fast concatenation."));

    These are faster than using bfromcstr() and bcatcstr() respectively 
    because the length of the inline string is known as a compile time
    constant.  Also note that separate struct tagbstring declarations for
    holding the output of a bsStatic() macro are not required.
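
    The compile time length can be sketched with sizeof applied to a string 
    literal.  LIT_BLK and blk_len below are hypothetical stand-ins used only 
    to illustrate the technique; they are not claimed to match the 
    definition of bsStaticBlkParms.

```c
/* Hypothetical macro: expands to a pointer and a length separated by a
   comma.  The "" s "" concatenation only compiles for string literals,
   and sizeof counts the terminating '\0', hence the - 1. */
#define LIT_BLK(s) ((const void *) ("" s "")), ((int) (sizeof (s) - 1))

/* Stand-in for a block function taking (blk, len) parameters. */
static int blk_len (const void * blk, int len) {
    return (blk != (const void *) 0) ? len : -1;
}
```

    A call such as blk_len (LIT_BLK ("Fast init. ")) therefore receives the 
    length without any run time scan of the string.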

    ..........................................................................

    void btfromcstr (struct tagbstring& t, const char * s);

    Fill in the tagbstring t with the '\0' terminated char buffer s.  This 
    action is purely reference oriented; no memory management is done.  The 
    data member is just assigned s, and slen is assigned the strlen of s.  
    The s parameter is accessed exactly once in this macro.

    The resulting struct tagbstring is initially write protected.  Attempts 
    to write to this struct tagbstring in a write protected state from any 
    bstrlib function will lead to BSTR_ERR being returned.  Invoke the 
    bwriteallow macro on this struct tagbstring to make it writeable (though 
    this 
    requires that s be obtained from a function compatible with malloc.)

    ..........................................................................

    void btfromblk (struct tagbstring& t, void * s, int len);

    Fill in the tagbstring t with the data buffer s with length len.  This 
    action is purely reference oriented; no memory management is done.  The 
    data member of t is just assigned s, and slen is assigned len.  Note that 
    the buffer is not appended with a '\0' character.  The s and len 
    parameters are accessed exactly once each in this macro.

    The resulting struct tagbstring is initially write protected.  Attempts 
    to write to this struct tagbstring in a write protected state from any 
    bstrlib function will lead to BSTR_ERR being returned.  Invoke the 
    bwriteallow macro on this struct tagbstring to make it writeable (though 
    this 
    requires that s be obtained from a function compatible with malloc.)

    ..........................................................................

    void btfromblkltrimws (struct tagbstring& t, void * s, int len);

    Fill in the tagbstring t with the data buffer s with length len after it
    has been left trimmed.  This action is purely reference oriented; no 
    memory management is done.  The data member of t is just assigned to a 
    pointer inside the buffer s.  Note that the buffer is not appended with a 
    '\0' character.  The s and len parameters are accessed exactly once each 
    in this macro.

    The resulting struct tagbstring is permanently write protected.  Attempts 
    to write to this struct tagbstring from any bstrlib function will lead to 
    BSTR_ERR being returned.  Invoking the bwriteallow macro onto this struct 
    tagbstring has no effect.

    ..........................................................................

    void btfromblkrtrimws (struct tagbstring& t, void * s, int len);

    Fill in the tagbstring t with the data buffer s with length len after it
    has been right trimmed.  This action is purely reference oriented; no 
    memory management is done.  The data member of t is just assigned to a 
    pointer inside the buffer s.  Note that the buffer is not appended with a 
    '\0' character.  The s and len parameters are accessed exactly once each 
    in this macro.

    The resulting struct tagbstring is permanently write protected.  Attempts 
    to write to this struct tagbstring from any bstrlib function will lead to 
    BSTR_ERR being returned.  Invoking the bwriteallow macro onto this struct 
    tagbstring has no effect.

    ..........................................................................

    void btfromblktrimws (struct tagbstring& t, void * s, int len);

    Fill in the tagbstring t with the data buffer s with length len after it
    has been left and right trimmed.  This action is purely reference 
    oriented; no memory management is done.  The data member of t is just 
    assigned to a pointer inside the buffer s.  Note that the buffer is not 
    appended with a '\0' character.  The s and len parameters are accessed 
    exactly once each in this macro.

    The resulting struct tagbstring is permanently write protected.  Attempts 
    to write to this struct tagbstring from any bstrlib function will lead to 
    BSTR_ERR being returned.  Invoking the bwriteallow macro onto this struct 
    tagbstring has no effect.

    ..........................................................................

    void bmid2tbstr (struct tagbstring& t, bstring b, int pos, int len);

    Fill the tagbstring t with the substring from b, starting from position
    pos with a length len.  The segment is clamped by the boundaries of
    the bstring b.  This action is purely reference oriented; no memory 
    management is done.  Note that the buffer is not appended with a '\0' 
    character.  Note that the t parameter to this macro may be accessed 
    multiple times.  Note that the contents of t will become undefined 
    if the contents of b change or are destroyed.

    The resulting struct tagbstring is permanently write protected.  Attempts 
    to write to this struct tagbstring in a write protected state from any 
    bstrlib function will lead to BSTR_ERR being returned.  Invoking the
    bwriteallow macro on this struct tagbstring will have no effect.

    ..........................................................................

    void bvformata (int& ret, bstring b, const char * format, lastarg);

    Append the bstring b with printf like formatting with the format control
    string, and the arguments taken from the ... list of arguments after
    lastarg passed to the containing function.  If the containing function 
    does not have ... parameters or lastarg is not the last named parameter 
    before the ... then the results are undefined.  If successful, the 
    results are appended to b and BSTR_OK is assigned to ret.  Otherwise 
    BSTR_ERR is assigned to ret.

    Example:

    void dbgerror (FILE * fp, const char * fmt, ...) {
        int ret;
        bstring b;
        bvformata (ret, b = bfromcstr ("DBG: "), fmt, fmt);
        if (BSTR_OK == ret) fputs ((char *) bdata (b), fp);
        bdestroy (b);
    }

    Note that if the BSTRLIB_NOVSNP macro was set when bstrlib was compiled, 
    code using the bvformata macro will not link properly.  If the 
    BSTRLIB_NOVSNP macro has been set, the bvformata macro will not be 
    available.

    ..........................................................................

    void bwriteprotect (struct tagbstring& t);

    Disallow bstring from being written to via the bstrlib API.  Attempts to 
    write to the resulting tagbstring from any bstrlib function will lead to 
    BSTR_ERR being returned.

    Note: bstrings which are write protected cannot be destroyed via bdestroy.

    Note to C++ users: Setting a CBString as write protected will not prevent
    it from being destroyed by the destructor.

    ..........................................................................

    void bwriteallow (struct tagbstring& t);

    Allow bstring to be written to via the bstrlib API.  Note that such an 
    action makes the bstring both writable and destroyable.  If the bstring is
    not legitimately writable (as is the case for struct tagbstrings 
    initialized with a bsStatic value), the results of this are undefined.

    Note that invoking the bwriteallow macro may increase the number of 
    reallocs by one more than necessary for every call to bwriteallow 
    interleaved with any bstring API which writes to this bstring.

    ..........................................................................

    int biswriteprotected (struct tagbstring& t);

    Returns 1 if the bstring is write protected, otherwise 0 is returned.

===============================================================================

The bstest module
-----------------

The bstest module is just a unit test for the bstrlib module.  For correct
implementations of bstrlib, it should execute with 0 failures being reported.
This test should be utilized if modifications/customizations to bstrlib have
been performed.  It tests each core bstrlib function with bstrings of every 
mode (read-only, NULL, static and mutable) and ensures that the expected 
semantics are observed (including results that should indicate an error). It
also tests for aliasing support.  Passing bstest is a necessary but not a 
sufficient condition for ensuring the correctness of the bstrlib module.


The test module
---------------

The test module is just a unit test for the bstrwrap module.  For correct
implementations of bstrwrap, it should execute with 0 failures being 
reported.  This test should be utilized if modifications/customizations to 
bstrwrap have been performed.  It tests each core bstrwrap function with 
CBStrings write protected or not and ensures that the expected semantics are 
observed (including expected exceptions.)  Note that exceptions cannot be
disabled to run this test.  Passing test is a necessary but not a sufficient 
condition for ensuring the correctness of the bstrwrap module.

===============================================================================

Using Bstring and CBString as an alternative to the C library
-------------------------------------------------------------

First let us give a table of C library functions and the alternative bstring 
functions and CBString methods that should be used instead of them.

C-library         Bstring alternative             CBString alternative
---------         -------------------             --------------------
gets              bgets                           ::gets
strcpy            bassign                         = operator
strncpy           bassignmidstr                   ::midstr
strcat            bconcat                         += operator
strncat           bconcat + btrunc                += operator + ::trunc
strtok            bsplit, bsplits                 ::split
sprintf           b(assign)format                 ::format
snprintf          b(assign)format + btrunc        ::format + ::trunc
vsprintf          bvformata                       bvformata

vsnprintf         bvformata + btrunc              bvformata + btrunc
vfprintf          bvformata + fputs               bvformata + fputs
strcmp            biseq, bstrcmp                  comparison operators.
strncmp           bstrncmp, memcmp                bstrncmp, memcmp
strlen            ->slen, blength                 ::length
strdup            bstrcpy                         constructor
strset            bpattern                        ::fill
strstr            binstr                          ::find
strpbrk           binchr                          ::findchr
stricmp           bstricmp                        cast & use bstricmp
strlwr            btolower                        cast & use btolower
strupr            btoupper                        cast & use btoupper
strrev            bReverse (aux module)           cast & use bReverse
strchr            bstrchr                         cast & use bstrchr
strspnp           use strspn                      use strspn
ungetc            bsunread                        bsunread

The top 9 C functions listed here are troublesome in that they impose memory 
management in the calling function.  The Bstring and CBString interfaces have
built-in memory management, so there is far less code with far less potential 
for buffer overrun problems.  strtok can only be reliably called as a "leaf" 
calculation, since it (quite bizarrely) maintains hidden internal state.  And
gets is well known to be broken no matter what.  The Bstrlib alternatives do
not suffer from those sorts of problems.

The substitute for strncat can be performed with higher performance by using 
the blk2tbstr macro to create a presized second operand for bconcat.

C-library         Bstring alternative             CBString alternative
---------         -------------------             --------------------
strspn            strspn acceptable               strspn acceptable
strcspn           strcspn acceptable              strcspn acceptable
strnset           strnset acceptable              strnset acceptable
printf            printf acceptable               printf acceptable
puts              puts acceptable                 puts acceptable
fprintf           fprintf acceptable              fprintf acceptable
fputs             fputs acceptable                fputs acceptable
memcmp            memcmp acceptable               memcmp acceptable

Remember that Bstring (and CBString) functions will automatically append the
'\0' character to the character data buffer.  So by simply accessing the data
buffer directly, ordinary C string library functions can be called directly 
on them.  Note that bstrcmp is not the same as memcmp in exactly the same way
that strcmp is not the same as memcmp.

C-library         Bstring alternative             CBString alternative
---------         -------------------             --------------------
fread             balloc + fread                  ::alloc + fread
fgets             balloc + fgets                  ::alloc + fgets

These are odd ones because of the exact sizing of the buffer required.  The 
Bstring and CBString alternatives require that the buffers be forced to 
hold at least the prescribed length, after which fread or fgets can be used 
directly.  However, the automatic memory management of Bstring and CBString 
typically makes the use of fgets and fread to read specifically sized 
strings unnecessary.

Implementation Choices
----------------------

Overhead:
.........

The bstring library has more overhead versus straight char buffers for most
functions.  This overhead is essentially just the memory management and 
string header allocation.  This overhead usually only shows up for small 
string manipulations.  The performance loss has to be considered in
light of the following:

1) What would be the performance loss of trying to write this management
   code in one's own application?
2) Since the bstring library source code is given, a sufficiently powerful 
   modern inlining globally optimizing compiler can remove function call 
   overhead.

Since the data type is exposed, a developer can replace any unsatisfactory
function with their own inline implementation.  And that is beside the main 
point of what the Better String Library is meant to provide.  Any 
overhead incurred has to be weighed against the value of the safe abstraction 
for coupling memory management and string functionality.

Performance of the C interface:
...............................

The algorithms used have performance advantages versus the analogous C 
library functions.  For example:

1. bfromcstr/blk2bstr/bstrcpy versus strcpy/strdup.  By using memmove instead 
   of strcpy, the break condition of the copy loop is based on an independent 
   counter (that should be allocated in a register) rather than having to 
   check the results of the load.  Modern out-of-order executing CPUs can 
   parallelize the final branch mis-predict penalty with the loading of the 
   source string.  Some CPUs will also tend to have better built-in hardware 
   support for counted memory moves than load-compare-store.  (This is a 
   minor, but non-zero gain.)
2. biseq versus strcmp.  If the strings are unequal in length, biseq will
   return in O(1) time.  If the strings are aliased, or have aliased data
   buffers, biseq will return in O(1) time.  strcmp will always be O(k), 
   where k is the length of the common prefix or the whole string if they are 
   identical.
3. ->slen versus strlen.  ->slen is obviously always O(1), while strlen is
   always O(n) where n is the length of the string.
4. bconcat versus strcat.  Both rely on precomputing the length of the 
   destination string argument, which will favor the bstring library.  On 
   iterated concatenations the performance difference can be enormous.
5. bsreadln versus fgets.  The bsreadln function reads large blocks at a time
   from the given stream, then parses out lines from the buffers directly.
   Some C libraries will implement fgets as a loop over single fgetc calls.
   Testing indicates that the bsreadln approach can be several times faster
   for fast stream devices (such as a file that has been entirely cached.)
6. bsplits/bsplitscb versus strspn.  Accelerators for the set of match 
   characters are generated only once.
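
The O(1) early exits described in item 2 can be sketched with a small length
prefixed string type.  The names below (struct lpstr, lpstr_iseq) are
hypothetical stand-ins chosen for this illustration and are not Bstrlib's own
definitions; the point is only to show where the constant time returns come
from.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical length-prefixed string header (not Bstrlib's tagbstring). */
struct lpstr {
    int slen;                   /* number of characters held in data */
    const unsigned char *data;  /* character buffer */
};

/* Returns 1 if equal, 0 if not equal, -1 if either argument is invalid. */
int lpstr_iseq (const struct lpstr *a, const struct lpstr *b) {
    if (a == NULL || b == NULL || a->data == NULL || b->data == NULL ||
        a->slen < 0 || b->slen < 0) return -1;
    if (a->slen != b->slen) return 0;                 /* O(1): lengths differ */
    if (a->data == b->data || a->slen == 0) return 1; /* O(1): aliased/empty  */
    /* Only now is a scan of the characters required. */
    return memcmp (a->data, b->data, (size_t) a->slen) == 0;
}
```

A '\0' terminated comparison has no stored length to consult, so it cannot
take either of the first two shortcuts without scanning.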

Practical testing indicates that in general Bstrlib is never significantly 
slower than the C library for common operations, while very often having a
performance advantage that ranges from significant to massive.  Even for 
functions like b(n)inchr versus str(c)spn() (where, in theory, there is no 
advantage for the Bstrlib architecture) the performance of Bstrlib is vastly 
superior to most tested C library implementations.

Some of Bstrlib's extra functionality also leads to inevitable performance
advantages over typical C solutions.  For example, using the blk2tbstr macro,
one can (in O(1) time) generate an internal substring by reference while not 
disturbing the original string.  If disturbing the original string is not an
option, typically, a comparable C solution would have to make a copy of the
substring to provide similar functionality.  Another example is reverse 
character set scanning -- the str(c)spn functions only scan in a forward 
direction which can complicate some parsing algorithms.
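
The substring-by-reference idea can be illustrated with a header that simply
points into an existing buffer.  This is a minimal sketch under assumptions:
struct refstr and refstr_mid are hypothetical names, and the use of a
negative mlen to mean "do not write" is an assumption made for the example
rather than a statement about the blk2tbstr macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string header used only for this sketch. */
struct refstr {
    int mlen;             /* capacity; negative here means read-only */
    int slen;             /* number of characters referenced */
    unsigned char *data;  /* points into someone else's buffer */
};

/* Make 'out' refer to len characters starting at pos inside 'src'.
   Nothing is copied, so the cost is O(1) regardless of len.
   Returns 0 on success, -1 if the arguments or the range are invalid. */
int refstr_mid (struct refstr *out, const struct refstr *src,
                int pos, int len) {
    if (out == NULL || src == NULL || src->data == NULL) return -1;
    if (pos < 0 || len < 0 || pos > src->slen || len > src->slen - pos)
        return -1;
    out->data = src->data + pos;
    out->slen = len;
    out->mlen = -1;       /* assumption: mark the reference as unwritable */
    return 0;
}
```

Note that such a reference is generally not '\0' terminated at its own
length, so it is only safe to hand to functions that honor the stored length.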

Where high performance char * based algorithms are available, Bstrlib can 
still leverage them by accessing the ->data field on bstrings.  So 
realistically Bstrlib can never be significantly slower than any standard
'\0' terminated char * based solutions.

Performance of the C++ interface:
.................................

The C++ interface has been designed with an emphasis on abstraction and safety
first.  However, since it is substantially a wrapper for the C bstring 
functions, for longer strings the performance comments described in the 
"Performance of the C interface" section above still apply. Note that the 
(CBString *) type can be directly cast to a (bstring) type, and passed as 
parameters to the C functions (though a CBString must never be passed to 
bdestroy.)

Probably the most controversial choice is performing full bounds checking on
the [] operator.  This decision was made because 1) the fast alternative of
not bounds checking is still available by first casting the CBString to a 
(const char *) buffer or to a (struct tagbstring) then dereferencing .data and 
2) because the lack of bounds checking is seen as one of the main weaknesses 
of C/C++ versus other languages.  This check being done on every access leads 
to individual character extraction being actually slower than other languages 
in this one respect (other languages' compilers will normally dedicate more 
resources to hoisting or removing bounds checking as necessary) but otherwise 
brings C++ up to the level of other languages in terms of functionality.

It is common for other C++ libraries to leverage the abstractions provided by
C++ to use reference counting and "copy on write" policies.  While these 
techniques can speed up some scenarios, they impose a problem with respect to
thread safety.  bstrings and CBStrings can be properly protected with 
"per-object" mutexes, meaning that two bstrlib calls can be made and execute
simultaneously, so long as the bstrings and CBStrings are distinct.  With a
reference count and alias before copy on write policy, global mutexes are 
required that prevent multiple calls to the strings library from executing 
simultaneously regardless of whether or not the strings represent the same
string.

One interesting trade off in CBString is that the default constructor is not 
trivial.  I.e., it always prepares a ready to use memory buffer.  The purpose 
is to ensure that there is a uniform internal composition for any functioning 
CBString that is compatible with bstrings.  It also means that the other 
methods in the class are not forced to perform "late initialization" checks. 
In the end it means that construction of CBStrings is slower than other 
comparable C++ string classes.  Initial testing, however, indicates that 
CBString outperforms std::string and MFC's CString, for example, in all other 
operations.  So to work around this weakness it is recommended that CBString 
declarations be pushed outside of inner loops.

Practical testing indicates that with the exception of the caveats given 
above (constructors and safe index character manipulations) the C++ API for
Bstrlib generally outperforms popular standard C++ string classes.  Amongst 
the standard libraries and compilers, the quality of concatenation operations 
varies wildly and very little care has gone into search functions.  Bstrlib
dominates those performance benchmarks.

Memory management:
..................

The bstring functions which write and modify bstrings will automatically 
reallocate the backing memory for the char buffer whenever it is required to 
grow.  The algorithm for resizing chosen is to snap up to sizes that are a 
power of two which are sufficient to hold the intended new size.  Memory 
reallocation is not performed when the required size of the buffer is 
decreased.  This behavior can be relied on, and is necessary to make the 
behaviour of balloc deterministic.  This trades off additional memory usage 
for decreasing the frequency for required reallocations:

1. For any bstring whose size never exceeds n, its buffer is not ever 
   reallocated more than log_2(n) times for its lifetime.
2. For any bstring whose size never exceeds n, its buffer is never more than
   2*(n+1) in length.  (The extra characters beyond 2*n are to allow for the 
   implicit '\0' which is always added by the bstring modifying functions.)

Decreasing the buffer size when the string decreases in size would violate 1) 
above and in real-world cases lead to pathological heap thrashing.  Similarly, 
allocating more tightly than "least power of 2 greater than necessary" would 
lead to a violation of 1) and have the same potential for heap thrashing.

Property 2) needs emphasizing.  Although the memory allocated is always a 
power of 2, for a bstring that grows linearly in size, its buffer memory also 
grows linearly, not exponentially.  The reason is that the amount of extra 
space increases with each reallocation, which decreases the frequency of 
future reallocations.
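
The two properties can be checked with a small simulation.  The sketch below
is not taken from the Bstrlib sources: snap_up_pow2 and count_reallocs are
hypothetical helpers, and the minimum buffer size of 8 is an assumption made
for the example.  It grows a string one character at a time and counts how
often the buffer would need to be reallocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest power of two >= need, with a floor of 8.
   Overflow for very large 'need' is not handled in this sketch. */
size_t snap_up_pow2 (size_t need) {
    size_t cap = 8;
    while (cap < need) cap *= 2;
    return cap;
}

/* Simulate appending one character at a time until the string holds
   final_len characters.  Returns the number of reallocations and stores
   the final buffer capacity in *cap_out (if cap_out is not NULL). */
int count_reallocs (size_t final_len, size_t *cap_out) {
    size_t cap = 0, len;
    int reallocs = 0;
    for (len = 1; len <= final_len; len++) {
        size_t need = len + 1;       /* room for the implicit '\0' */
        if (need > cap) {            /* buffer too small: snap up */
            cap = snap_up_pow2 (need);
            reallocs++;
        }
    }
    if (cap_out != NULL) *cap_out = cap;
    return reallocs;
}
```

Under these assumptions, growing a string to 1000 characters costs 8
reallocations and ends with a 1024 byte buffer, which is inside the 2*(n+1)
bound of property 2.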

Obviously, given that bstring writing functions may reallocate the data 
buffer backing the target bstring, one should not attempt to cache the data 
buffer address and use it after such bstring functions have been called.  
This includes making reference struct tagbstrings which alias to a writable 
bstring.

balloc or bfromcstralloc can be used to preallocate the minimum amount of 
space used for a given bstring.  This will reduce even further the number of 
times the data portion is reallocated.  If the length of the string is never 
more than one less than the memory length then there will be no further 
reallocations.

Note that invoking the bwriteallow macro may increase the number of reallocs 
by one more than necessary for every call to bwriteallow interleaved with any 
bstring API which writes to this bstring.

The library does not use any mechanism for automatic clean up for the C API.
Thus explicit clean up via calls to bdestroy() are required to avoid memory
leaks.

Constant and static tagbstrings:
................................

A struct tagbstring can be write protected from any bstrlib function using 
the bwriteprotect macro.  A write protected struct tagbstring can then be 
reset to being writable via the bwriteallow macro.  There is, of course, no 
protection from attempts to directly access the bstring members.  Modifying a 
bstring which is write protected by direct access has undefined behavior.

static struct tagbstrings can be declared via the bsStatic macro.  They are
considered permanently unwritable.  Such struct tagbstrings are declared 
such that attempts to write to them are not well defined.  Invoking either 
bwriteallow or bwriteprotect on static struct tagbstrings has no effect.  

struct tagbstring's initialized via btfromcstr or blk2tbstr are protected by 
default but can be made writeable via the bwriteallow macro.  If bwriteallow 
is called on such struct tagbstring's, it is the programmer's responsibility 
to ensure that:

1) the buffer supplied was allocated from the heap.
2) bdestroy is not called on this tagbstring (unless the header itself has
   also been allocated from the heap.)
3) free is called on the buffer to reclaim its memory.

bwriteallow and bwriteprotect can be invoked on ordinary bstrings (they have 
to be dereferenced with the (*) operator to get the levels of indirection 
correct) to remove or apply write protection.
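
One way such a protection flag could work is sketched below.  Everything in
it is an assumption made for illustration: struct wpstr, the WP_* macros and
wpstr_setchar are hypothetical names, and encoding "read-only" as a negative
capacity is a guess at a plausible scheme rather than a description of the
bwriteprotect and bwriteallow macros themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header; a capacity of -1 is used here to mean read-only. */
struct wpstr {
    int mlen;             /* buffer capacity, or -1 when protected */
    int slen;             /* current string length */
    unsigned char *data;
};

/* The macros take the struct itself, so a pointer must be dereferenced. */
#define WP_PROTECT(t)     do { if ((t).mlen >= 0) (t).mlen = -1; } while (0)
#define WP_ALLOW(t)       do { if ((t).mlen == -1) \
                                   (t).mlen = (t).slen + ((t).slen == 0); \
                             } while (0)
#define WP_ISPROTECTED(t) ((t).mlen <= 0)

/* A writer checks the flag first and reports an error instead of writing.
   Returns 0 on success, -1 on protection or range errors. */
int wpstr_setchar (struct wpstr *s, int pos, unsigned char c) {
    if (s == NULL || s->data == NULL || WP_ISPROTECTED (*s)) return -1;
    if (pos < 0 || pos >= s->slen) return -1;
    s->data[pos] = c;
    return 0;
}
```

In this scheme the original capacity is forgotten once protection is applied,
so re-enabling writes can only assume the buffer is as large as the string.
That would be consistent with the note under "Memory management" that
bwriteallow may cost one extra reallocation.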

Buffer declaration:
...................

The memory buffer is actually declared "unsigned char *" instead of "char *".
The reason for this is to trigger compiler warnings whenever uncasted char
buffers are assigned to the data portion of a bstring.  This will draw more
diligent programmers into taking a second look at the code where they 
have carelessly left off the typically required cast.  (Research from 
AT&T/Lucent indicates that additional programmer eyeballs are one of the most 
effective mechanisms at ferreting out bugs.)

Function pointers:
..................

The bgets, bread and bStream functions use function pointers to obtain 
strings from data streams.  The function pointer declarations have been 
specifically chosen to be compatible with the fgetc and fread functions. 
While this may seem to be a convoluted way of implementing fgets and fread 
style functionality, it has been specifically designed this way to ensure 
that there is no dependency on a single narrowly defined set of device 
interfaces, such as just stream I/O.  In the embedded world, it is quite 
possible to have environments where such interfaces may not exist in the 
standard C library form.  Furthermore, the generalization that this opens up 
allows for more sophisticated uses for these functions (performing an fgets 
like function on a socket, for example.) By using function pointers, it also 
allows such abstract stream interfaces to be created using the bstring library 
itself while not creating a circular dependency.
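
To illustrate how a function pointer with fgetc-like semantics decouples line
reading from FILE *, here is a minimal sketch that reads "lines" out of a
block of memory.  The names getc_fn, struct memsrc, memsrc_getc and read_line
are hypothetical, and a fixed output buffer is used only to keep the example
free of memory management; it is not how bgets is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as fgetc, but with void * in place of FILE *. */
typedef int (*getc_fn) (void *parm);

/* Hypothetical in-memory "stream": a block of bytes plus a read cursor. */
struct memsrc {
    const unsigned char *buf;
    size_t len, pos;
};

/* getc_fn implementation for struct memsrc; returns -1 at end of data. */
int memsrc_getc (void *parm) {
    struct memsrc *m = (struct memsrc *) parm;
    if (m == NULL || m->pos >= m->len) return -1;
    return m->buf[m->pos++];
}

/* Read characters up to and including 'terminator' into out, which has
   room for cap bytes and is always '\0' terminated on return.
   Returns the number of characters stored (0 at end of input). */
size_t read_line (char *out, size_t cap, getc_fn getc_ptr, void *parm,
                  int terminator) {
    size_t n = 0;
    int c;
    assert (out != NULL && cap > 0 && getc_ptr != NULL);
    while (n + 1 < cap && (c = getc_ptr (parm)) >= 0) {
        out[n++] = (char) c;
        if (c == terminator) break;
    }
    out[n] = '\0';
    return n;
}
```

The same read_line could be pointed at a socket or a generated sequence by
supplying a different getc_fn and context pointer, without touching the line
reading logic.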

Use of int's for sizes:
.......................

This is just a recognition that 16bit platforms with requirements for strings
that are larger than 64K and 32bit+ platforms with requirements for strings
that are larger than 4GB are pretty marginal.  The main focus is for 32bit 
platforms, and emerging 64bit platforms with reasonable < 4GB string 
requirements.  Using ints allows for negative values, which have meaning 
internally to bstrlib.

Semantic consideration:
.......................

Certain care needs to be taken when copying and aliasing bstrings.  A bstring
is essentially a pointer type which points to a multipart abstract data
structure.  Thus usage, and lifetime of bstrings have semantics that follow
these considerations.  For example:

    bstring a, b;
    struct tagbstring t;

    a = bfromcstr("Hello"); /* Create new bstring and copy "Hello" into it. */
    b = a;                  /* Alias b to the contents of a.                */
    t = *a;                 /* Create a current instance pseudo-alias of a. */
    bconcat (a, b);         /* Double a and b, t is now undefined.          */
    bdestroy (a);           /* Destroy the contents of both a and b.        */

Variables of type bstring are really just references that point to real 
bstring objects.  The equal operator (=) creates aliases, and the asterisk 
dereference operator (*) creates a kind of alias to the current instance (which
is generally not useful for any purpose.)  Using bstrcpy() is the correct way 
of creating duplicate instances.  The ampersand operator (&) is useful for 
creating aliases to struct tagbstrings (remembering that constructed struct 
tagbstrings are not writable by default.)

CBStrings use complete copy semantics for the equal operator (=), and thus do 
not have these sorts of issues.  

Debugging:
..........

Bstrings have a simple, exposed definition and construction, and the library 
itself is open source.  So most debugging is going to be fairly 
straightforward.  But the memory for bstrings comes from the heap, which can 
often be corrupted indirectly, and it might not be obvious what has happened 
even from direct examination of the contents in a debugger or a core dump.  
There are some tools such as Purify, Insure++ and Electric Fence which can 
help solve such problems; however, another common approach is to directly 
instrument the calls to malloc, realloc, calloc, free, memcpy, memmove and/or 
other calls by overriding them with macro definitions.

Although the user could hack on the Bstrlib sources directly as necessary to 
perform such an instrumentation, Bstrlib comes with a built-in mechanism for 
doing this.  Defining the macro BSTRLIB_MEMORY_DEBUG and providing an 
include file named memdbg.h will force the core Bstrlib modules to 
attempt to include this file.  In such a file, macros could be defined which 
override Bstrlib's usage of the C standard library.

Rather than calling malloc, realloc, free, memcpy or memmove directly, Bstrlib
emits the macros bstr__alloc, bstr__realloc, bstr__free, bstr__memcpy and 
bstr__memmove in their place respectively.  By default these macros are simply
assigned to be equivalent to their corresponding C standard library function
call.  However, if they are given earlier macro definitions (via the back 
door include file) they will not be given their default definition.  In this
way Bstrlib's interface to the standard library can be changed but without
having to directly redefine or link standard library symbols (both of which 
are not strictly ANSI C compliant.)

An example definition might include:

    #define bstr__alloc(sz) X_malloc ((sz), __LINE__, __FILE__)

which might help contextualize heap entries in a debugging environment.
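
The override-then-default pattern can be sketched in a single file.  The
macro names bstr__alloc and bstr__free and the X_malloc signature come from
the text above; the counters, the body of X_malloc and the make_buffer helper
are hypothetical and exist only to make the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a hypothetical memdbg.h might contain --- */
size_t x_alloc_calls = 0;   /* number of allocations seen so far */
int    x_last_line   = 0;   /* source line of the most recent allocation */

void *X_malloc (size_t sz, int line, const char *file) {
    (void) file;            /* a real tracker would record the file too */
    x_alloc_calls++;
    x_last_line = line;
    return malloc (sz);
}
#define bstr__alloc(sz) X_malloc ((sz), __LINE__, __FILE__)

/* --- how library code can fall back to a default if not overridden --- */
#ifndef bstr__alloc
#define bstr__alloc(sz) malloc (sz)     /* skipped: already defined above */
#endif
#ifndef bstr__free
#define bstr__free(p) free (p)          /* used: no override was supplied */
#endif

/* Library-style code names only the macros, never malloc or free. */
char *make_buffer (size_t n) {
    return (char *) bstr__alloc (n);
}
```

Because the defaults are guarded by #ifndef, an earlier definition wins
without the library sources being edited and without redefining any standard
library symbol.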

The NULL parameter and sanity checking of bstrings is part of the Bstrlib 
API, and thus Bstrlib itself does not present any different modes which would 
correspond to "Debug" or "Release" modes.  Bstrlib always contains mechanisms
which one might think of as debugging features, but retains the performance
and small memory footprint one would normally associate with release mode
code.

Integration with Microsoft's Visual Studio debugger:
....................................................

Microsoft's Visual Studio debugger has a capability of customizable mouse 
float over data type descriptions.  This is accomplished by editing the
AUTOEXP.DAT file to include the following:

    ; new for CBString
    tagbstring =slen=<slen> mlen=<mlen> <data,st>
    Bstrlib::CBStringList =count=<size()>

In Visual C++ 6.0 this file is located in the directory:

    C:\Program Files\Microsoft Visual Studio\Common\MSDev98\Bin

and in Visual Studio .NET 2003 it is located here:

    C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\Packages\Debugger

This will improve the ability of debugging with Bstrlib under Visual Studio.

Security
--------

Bstrlib does not come with explicit security features outside of its fairly
comprehensive error detection, coupled with its strict semantic support.  
That is to say that certain common security problems, such as buffer overrun,
constant overwrite, arbitrary truncation etc, are far less likely to happen 
inadvertently.  Where it does help, Bstrlib maximizes its advantage by 
providing developers a simple adoption path that lets them leave less secure
string mechanisms behind.  The library will not leave developers wanting, so 
they will be less likely to add new code using a less secure string library 
to add functionality that might be missing from Bstrlib.

That said there are a number of security ideas not addressed by Bstrlib:

1. Race condition exploitation (i.e., verifying a string's contents, then 
raising the privilege level and executing it as a shell command as two 
non-atomic steps) is well beyond the scope of what Bstrlib can provide.  It 
should be noted that MFC's built-in string mutex actually does not solve this 
problem either -- it just removes immediate data corruption as a possible 
outcome of such exploit attempts (it can be argued that this is worse, since 
it will leave no trace of the exploitation).  In general race conditions have 
to be dealt with by careful design and implementation; a string library 
cannot assist with them.

2. Any kind of access control or security attributes to prevent usage in 
dangerous interfaces such as system().  Perl includes a "trust" attribute 
which can be endowed upon strings that are intended to be passed to such 
dangerous interfaces.  However, Perl's solution reflects its own limitations 
-- notably that it is not a strongly typed language.  In the example code for 
Bstrlib, there is a module called taint.cpp.  It demonstrates how to write a 
simple wrapper class for managing "untainted" or trusted strings using the 
type system to prevent questionable mixing of ordinary untrusted strings with 
untainted ones then passing them to dangerous interfaces.  In this way the 
security correctness of the code reduces to auditing the direct usages of 
dangerous interfaces or promotions of tainted strings to untainted ones.

3. Encryption of string contents is way beyond the scope of Bstrlib.  
Maintaining encrypted string contents in the futile hopes of thwarting things 
like using system-level debuggers to examine sensitive string data is likely 
to be a wasted effort (imagine a debugger that runs at a higher level than a 
virtual processor where the application runs).  For more standard encryption 
usages, since the bstring contents are simply binary blocks of data, this 
should pose no problem for usage with other standard encryption libraries.
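
The type-system approach mentioned in item 2 can also be approximated in
plain C by wrapping vetted strings in a distinct struct type.  The sketch
below is not the taint.cpp module (which is C++); trusted_str,
trusted_promote and dangerous_sink are hypothetical names, and the
acceptance policy is purely illustrative.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: a distinct type for strings that passed vetting. */
typedef struct { const char *text; } trusted_str;

/* The only sanctioned way to obtain a trusted_str.  The policy used here
   (non-empty; letters, digits, '.', '_' and '-' only) is just an example.
   Returns 0 and fills *out on success, -1 if the input is rejected. */
int trusted_promote (trusted_str *out, const char *untrusted) {
    static const char ok[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-";
    if (out == NULL || untrusted == NULL || untrusted[0] == '\0') return -1;
    if (strspn (untrusted, ok) != strlen (untrusted)) return -1;
    out->text = untrusted;
    return 0;
}

/* Stand-in for a dangerous interface such as system().  It accepts only
   the wrapper type, so passing a plain char * is a compile time error.
   Here it merely reports the length of the vetted text. */
size_t dangerous_sink (trusted_str s) {
    return strlen (s.text);
}
```

With this arrangement an audit only has to look at calls to trusted_promote
and at the implementations of the dangerous interfaces, since ordinary
strings cannot reach them by accident.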

Compatibility
-------------

The Better String Library is known to compile and function correctly with the 
following compilers:

  - Microsoft Visual C++
  - Watcom C/C++
  - Intel's C/C++ compiler (Windows)
  - The GNU C/C++ compiler (cygwin and Linux on PPC64)
  - Borland C
  - Turbo C

Setting of configuration options should be unnecessary for these compilers
(unless exceptions are being disabled or STLport has been added to WATCOM 
C/C++).  Bstrlib has been developed with an emphasis on portability.  As such 
porting it to other compilers should be straightforward.  This package 
includes a porting guide (called porting.txt) which explains what issues may 
exist for porting Bstrlib to different compilers and environments.

ANSI issues
-----------

1. The function pointer types bNgetc and bNread have prototypes which are very
similar to, but not exactly the same as fgetc and fread respectively.  
Basically the FILE * parameter is replaced by void *.  The purpose of this
was to allow one to create other functions with fgetc and fread like 
semantics without being tied to ANSI C's file streaming mechanism.  I.e., one 
could very easily adapt it to sockets, or simply reading a block of memory, 
or procedurally generated strings (for fractal generation, for example.)

The problem is that invoking the functions (bNgetc)fgetc and (bNread)fread is 
not technically legal in ANSI C.  The reason is that the compiler is only 
able to coerce the function pointers themselves into the target type; it is 
unable to perform any cast (implicit or otherwise) on the parameters 
passed once invoked.  I.e., if internally void * and FILE * need some kind of 
mechanical coercion, the compiler will not properly perform this conversion, 
which leads to undefined behavior.

Apparently a platform from Data General called "Eclipse" and another from 
Tandem called "NonStop" have a different representation for pointers to bytes 
and pointers to words, for example, where coercion via casting is necessary.  
(Actual confirmation of the existence of such machines is hard to come by, so
it is prudent to be skeptical about this information.)  However, this is not 
an issue for any known contemporary platforms.  One may conclude that such 
platforms are effectively apocryphal even if they do exist.

To correctly work around this problem to the satisfaction of the ANSI 
limitations, one needs to create wrapper functions for fgetc and/or 
fread with the prototypes of bNgetc and/or bNread respectively which perform 
no action other than to explicitly cast the void * parameter to a
FILE *, and simply pass the remaining parameters straight to the wrapped 
function.

The wrappers themselves are trivial:

    size_t freadWrap (void * buff, size_t esz, size_t eqty, void * parm) {
        return fread (buff, esz, eqty, (FILE *) parm);
    }

    int fgetcWrap (void * parm) {
        return fgetc ((FILE *) parm);
    }

These have not been supplied in bstrlib or bstraux to prevent unnecessary
linking with file I/O functions.

2. vsnprintf is not available on all compilers.  Because of this, the bformat
and bformata functions (and format and formata methods) are not guaranteed to 
work properly.  For those compilers that don't have vsnprintf, the 
BSTRLIB_NOVSNP macro should be set before compiling bstrlib, and the format 
functions/method will be disabled.

The more recent ANSI C standards have specified the required inclusion of a 
vsnprintf function.

3. The bstrlib function names are not unique in the first 6 characters.  This
is only an issue for older C compiler environments which do not store more 
than 6 characters for function names.

4. The bsafe module defines macros and function names which are part of the
C library.  This simply overrides the definition as expected on all platforms
tested, however it is not sanctioned by the ANSI standard.  This module is 
clearly optional and should be omitted on platforms which disallow its 
undefined semantics.

In practice the real issue is that some compilers in some modes of operation 
can/will inline these standard library functions on a module by module basis 
as they appear in each.  The linker will thus have no opportunity to override
the implementation of these functions for those cases.  This can lead to 
inconsistent behaviour of the bsafe module on different platforms and 
compilers.

===============================================================================

Comparison with Microsoft's CString class
-----------------------------------------

Although developed independently, CBStrings have very similar functionality to 
Microsoft's CString class.  However, the bstring library has significant 
advantages over CString:

1. Bstrlib is a C-library as well as a C++ library (using the C++ wrapper).

    - Thus it is compatible with more programming environments and 
      available to a wider population of programmers.

2. The internal structure of a bstring is considered exposed.

    - A single contiguous block of data can be cut into read-only pieces by 
      simply creating headers, without allocating additional memory to create 
      reference copies of each of these sub-strings.
    - In this way, using bstrings in a totally abstracted way becomes a choice
      rather than an imposition.  Further this choice can be made differently
      at different layers of applications that use it.

3. Static declaration support precludes the need for constructor 
   invocation.

    - Allows for static declarations of constant strings that have no 
      additional constructor overhead.

4. Bstrlib is not attached to another library.

    - Bstrlib is designed to be easily plugged into any other library 
      collection, without dependencies on other libraries or paradigms (such 
      as "MFC".)

The bstring library also comes with a few additional functions that are not 
available in the CString class:

    - bsetstr
    - bsplit
    - bread
    - breplace (this is different from CString::Replace())
    - Writable indexed characters (for example a[i]='x')

Interestingly, although Microsoft did implement mid$(), left$() and right$() 
functional analogues (these are functions from GWBASIC) they seem to have
forgotten that mid$() could be also used to write into the middle of a string.  
This functionality exists in Bstrlib with the bsetstr() and breplace() 
functions.

Among the disadvantages of Bstrlib is that there is no special support for 
localization or wide characters.  Such things are considered beyond the scope 
of what bstrings are trying to deliver.  CString essentially supports the 
older UCS-2 version of Unicode via wchar_t as an application-wide compile 
time switch.

CStrings also use built-in mechanisms for ensuring thread safety under all
situations.  While this makes writing thread safe code that much easier, this
built-in safety feature has a price -- the inner loops of each CString method
run in their own critical section (grabbing and releasing a light weight mutex
on every operation.)  The usual way to decrease the impact of a critical 
section performance penalty is to amortize more operations per critical 
section.  But since the implementation of CStrings is fixed as a one critical 
section per-operation cost, there is no way to leverage this common 
performance enhancing idea.

The search facilities in Bstrlib are comparable to those in MFC's CString 
class, though it is missing locale specific collation.  But because Bstrlib 
is interoperable with C's char buffers, it will allow programmers to write 
their own string searching mechanism (such as Boyer-Moore), or be able to 
choose from a variety of available existing string searching libraries (such 
as those for regular expressions) without difficulty.
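
As an illustration of a user-written search, here is a minimal sketch of the
Boyer-Moore-Horspool variant (a simplification of full Boyer-Moore that uses 
only the bad character table).  It works on a pointer and a length, so under 
the assumption that a bstring's data and slen members are passed in, it could 
be applied to a bstring directly.  The function name bmh_find is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Boyer-Moore-Horspool search over length-counted buffers.  Returns the
   index of the first occurrence of needle in hay, or -1 if there is none.
   Embedded '\0' bytes are treated like any other byte. */
long bmh_find(const unsigned char *hay, size_t hlen,
              const unsigned char *needle, size_t nlen) {
    size_t shift[UCHAR_MAX + 1];
    size_t i;

    if (nlen == 0) return 0;
    if (nlen > hlen) return -1;

    /* Default shift is the full needle length; bytes that occur in the
       needle (other than in its last position) get a shorter shift. */
    for (i = 0; i <= UCHAR_MAX; i++) shift[i] = nlen;
    for (i = 0; i + 1 < nlen; i++) shift[needle[i]] = nlen - 1 - i;

    i = 0;
    while (i + nlen <= hlen) {
        size_t j = nlen;
        /* Compare the current window from right to left. */
        while (j > 0 && hay[i + j - 1] == needle[j - 1]) j--;
        if (j == 0) return (long) i;
        /* Skip ahead based on the last byte of the current window. */
        i += shift[hay[i + nlen - 1]];
    }
    return -1;
}
```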

Microsoft used a very non-ANSI conforming trick in its implementation to
allow printf() to use the "%s" specifier to output a CString correctly.  This
can be convenient, but it is inherently not portable.  CBString requires an
explicit cast, while bstring requires the data member to be dereferenced.  
Microsoft's own documentation recommends casting, instead of relying on this
feature.

Comparison with C++'s std::string
---------------------------------

This is the C++ language's standard STL based string class.

1. There is no C implementation.
2. The [] operator is not bounds checked.
3. Missing a lot of useful functions like printf-like formatting.
4. Some sub-standard std::string implementations (SGI) are necessarily unsafe 
   to use with multithreading.
5. Limited by STL's std::iostream which in turn is limited by ifstream which
   can only take input from files.  (Compare to CBStream's API which can take
   abstracted input.)
6. Extremely uneven performance across implementations.

Comparison with ISO C TR 24731 proposal
---------------------------------------

Following the ISO C99 standard, Microsoft has proposed a group of C library 
extensions which are supposedly "safer and more secure".  This proposal is
expected to be adopted by the ISO C standard which follows C99.

The proposal reveals itself to be very similar to Microsoft's "StrSafe" 
library. The functions are basically the same as other standard C library 
string functions except that destination parameters are paired with an 
additional length parameter of type rsize_t.  rsize_t is the same as size_t, 
however, the range is checked to make sure it is between 1 and RSIZE_MAX.  Like 
Bstrlib, the functions perform a "parameter check".  Unlike Bstrlib, when a 
parameter check fails, rather than simply outputting accumulatable error 
statuses, they call a user settable global error function handler, and upon 
return of control perform no (additional) detrimental action.  The proposal 
covers basic string functions as well as a few non-reentrant functions 
(asctime, ctime, and strtok).

1. Still based solely on char * buffers (and therefore strlen() and strcat()
   are still O(n), and there are no faster streq() comparison functions.)
2. No growable string semantics.
3. Requires manual buffer length synchronization in the source code.
4. No attempt to enhance functionality of the C library.
5. Introduces a new error scenario (strings exceeding RSIZE_MAX length).

The hope is that by exposing the buffer length requirements there will be 
fewer buffer overrun errors.  However, the error modes are really just 
transformed, rather than removed.  The real problem of buffer overflows is 
that they all happen as a result of erroneous programming.  So forcing 
programmers to manually deal with buffer limits, will make them more aware of 
the problem but doesn't remove the possibility of erroneous programming.  So
a programmer that erroneously mixes up the rsize_t parameters is no better off
than a programmer that introduces potential buffer overflows through other 
more typical lapses.  So at best this may reduce the rate of erroneous 
programming, rather than making any attempt at removing failure modes.

The error handler can discriminate between types of failures, but does not 
take into account any callsite context.  So the problem is that the error is 
going to be manifest in a piece of code, but there is no pointer to that 
code.  It would seem that passing in the call site __FILE__, __LINE__ as 
parameters would be very useful, but the API clearly doesn't support such a 
thing (it would increase code bloat even more than the extra length 
parameter does, and would require macro tricks to implement).
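
To make the "macro tricks" concrete, here is a minimal sketch of how a length
checked copy could record its call site.  Nothing here is part of TR 24731 or 
of Bstrlib: checked_copy, checked_copy_at, struct copy_error and 
last_copy_error are all hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of the most recent failed copy. */
struct copy_error {
    const char *file;   /* call site file, from __FILE__ */
    int line;           /* call site line, from __LINE__ */
    const char *msg;    /* which check failed */
};

static struct copy_error g_last_error;

const struct copy_error *last_copy_error(void) { return &g_last_error; }

/* Copy src into dst (capacity dstsz).  Returns 0 on success.  On failure
   returns -1, records the call site, and leaves dst as an empty string
   whenever dst is usable. */
int checked_copy_at(char *dst, size_t dstsz, const char *src,
                    const char *file, int line) {
    const char *msg = NULL;
    size_t n = 0;

    if (dst == NULL || src == NULL) msg = "NULL parameter";
    else if (dstsz == 0) msg = "zero-sized destination";
    else if ((n = strlen(src)) >= dstsz) msg = "destination too small";

    if (msg != NULL) {
        g_last_error.file = file;
        g_last_error.line = line;
        g_last_error.msg = msg;
        if (dst != NULL && dstsz > 0) dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, n + 1);
    return 0;
}

/* The macro is what supplies the call site; a plain function cannot. */
#define checked_copy(dst, dstsz, src) \
    checked_copy_at((dst), (dstsz), (src), __FILE__, __LINE__)
```

The cost is visible in the sketch: every call site passes two extra 
arguments, and the public name has to be a macro rather than a function.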

The Bstrlib C API takes the position that error handling needs to be done at 
the callsite, and just tries to make it as painless as possible.  Furthermore, 
error modes are removed by supporting auto-growing strings and aliasing.  For 
capturing errors in more central code fragments, Bstrlib's C++ API uses 
exception handling extensively, which is superior to the leaf-only error 
handler approach.

Comparison with Managed String Library CERT proposal
----------------------------------------------------

The main webpage for the managed string library:
http://www.cert.org/secure-coding/managedstring.html

Robert Seacord at CERT has proposed a C string library that he calls the 
"Managed String Library" for C. Like Bstrlib, it introduces a new type 
which is called a managed string. The structure of a managed string 
(string_m) is like a struct tagbstring but missing the length field.  This 
internal structure is considered opaque. The length is, like the C standard 
library, always computed on the fly by searching for a terminating NUL on 
every operation that requires it. So it suffers from every performance 
problem that the C standard library suffers from. Interoperating with C 
string APIs (like printf, fopen, or anything else that takes a string 
parameter) requires copying to additionally allocated buffers that have to 
be manually freed -- this makes this library probably slower and more 
cumbersome than any other string library in existence.

The library gives a fully populated error status as the return value of every
string function.  The hope is to be able to diagnose all problems 
specifically from the return code alone.  Comparing this to Bstrlib, which 
always returns one consistent error message, might make it seem that Bstrlib 
would be harder to debug; but this is not true.  With Bstrlib, if an error 
occurs there is always enough information from just knowing there was an error
and examining the parameters to deduce exactly what kind of error has 
happened.  The managed string library thus gives up nested function calls 
while achieving little benefit, while Bstrlib does not.

One interesting feature that "managed strings" has is the idea of data 
sanitization via character set whitelisting.  That is to say, a globally
definable filter that makes any attempt to put invalid characters into strings 
lead to an error and not modify the string.  The author gives the following 
example:

    // create valid char set
    if (retValue = strcreate_m(&str1, "abc") ) {
      fprintf(
        stderr, 
        "Error %d from strcreate_m.\n", 
        retValue
      );
    }  
    if (retValue = setcharset(str1)) {
      fprintf(
        stderr, 
        "Error %d from  setcharset().\n", 
        retValue
      );
    }
    if (retValue = strcreate_m(&str1, "aabbccabc")) {
      fprintf(
        stderr, 
        "Error %d from strcreate_m.\n", 
        retValue
      );
    }
    // create string with invalid char set
    if (retValue = strcreate_m(&str1, "abbccdabc")) {
      fprintf(
        stderr, 
        "Error %d from strcreate_m.\n", 
        retValue
      );
    } 

Which we can compare with a more Bstrlib way of doing things:

    bstring bCreateWithFilter (const char * cstr, const_bstring filter) {
      bstring b = bfromcstr (cstr);
      if (NULL != b && BSTR_ERR != bninchr (b, 0, filter)) {
        fprintf (stderr, "Filter violation.\n");
        bdestroy (b);
        b = NULL;
      }
      return b;
    }

    struct tagbstring charFilter = bsStatic ("abc");
    bstring str1 = bCreateWithFilter ("aabbccabc", &charFilter);
    bstring str2 = bCreateWithFilter ("aabbccdabc", &charFilter);

The first thing we should notice is that with the Bstrlib approach you can 
have different filters for different strings if necessary.  Furthermore, 
selecting a charset filter in the Managed String Library is uni-contextual. 
That is to say, there can only be one such filter active for the entire 
program, which means its usage is not well defined for intermediate library 
usage (a library that uses it will interfere with user code that uses it, and 
vice versa.)  It is also likely to be poorly defined in multi-threading 
environments.

There is also a question as to whether the data sanitization filter is checked
on every operation, or just on creation operations.  Since the charset can be
set arbitrarily at run time, it might be set *after* some managed strings have
been created.  This would seem to imply that all functions should run this
additional check every time if there is an attempt to enforce this.  This 
would make things tremendously slow.  On the other hand, if it is assumed that
only creates and other operations that take char *'s as input need be checked 
because the charset was only supposed to be set once, before any other 
managed string was created, then one can see that it's easy to cover 
Bstrlib with equivalent functionality via a few wrapper calls such as the 
example given above.

And finally we have to question the value of sanitization in the first place.
For example, for httpd servers, there is generally a requirement that the
URLs parsed have some form that avoids undesirable translation to local file 
system filenames or resources.  The problem is that the way URLs can be 
encoded, it must be completely parsed and translated to know if it is using 
certain invalid character combinations.  That is to say, merely filtering
each character one at a time is not necessarily the right way to ensure that
a string has safe contents.
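
The point can be shown with a small sketch.  A percent-encoded path passes a 
per-character whitelist, yet decodes to a string containing "../".  The 
function names passes_char_filter and percent_decode are hypothetical, and 
the decoder handles only %XX escapes; real URL parsing involves more than 
this.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if every character of s appears in allowed, otherwise 0. */
int passes_char_filter(const char *s, const char *allowed) {
    return s[strspn(s, allowed)] == '\0';
}

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hexval(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes from in into out (capacity outsz, including the
   terminator).  Returns 0 on success, or -1 if an escape is malformed or
   out is too small. */
int percent_decode(const char *in, char *out, size_t outsz) {
    size_t o = 0;

    if (outsz == 0) return -1;
    while (*in != '\0') {
        int c = (unsigned char) *in++;
        if (c == '%') {
            int hi = hexval((unsigned char) in[0]);
            /* Only look at the second digit if the first one was valid. */
            int lo = hi < 0 ? -1 : hexval((unsigned char) in[1]);
            if (lo < 0) return -1;
            c = hi * 16 + lo;
            in += 2;
        }
        if (o + 1 >= outsz) return -1;
        out[o++] = (char) c;
    }
    out[o] = '\0';
    return 0;
}
```

With a whitelist of lower case letters, digits and '%', the input 
"%2e%2e%2fetc%2fpasswd" is accepted character by character, but it decodes 
to "../etc/passwd".  Any meaningful check has to run on the decoded form.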

In the article that describes this proposal, it is claimed that it fairly 
closely approximates the existing C API semantics.  On this point we should 
compare this "closeness" with Bstrlib:

                      Bstrlib                     Managed String Library
                      -------                     ----------------------

Pointer arithmetic    Segment arithmetic          N/A

Use in C Std lib      ->data, or bdata{e}         getstr_m(x,*) ... free(x)

String literals       bsStatic, bsStaticBlk       strcreate_m()

Transparency          Complete                    None

It's pretty clear that the semantic mapping from C strings to Bstrlib is fairly 
straightforward, and that in general semantic capabilities are the same or
superior in Bstrlib.  On the other hand the Managed String Library is either
missing semantics or changes things fairly significantly.

Comparison with Annexia's c2lib library
---------------------------------------

This library is available at:
http://www.annexia.org/freeware/c2lib

1. Still based solely on char * buffers (and therefore strlen() and strcat()
   are still O(n), and there are no faster streq() comparison functions.)
   Their suggestion that alternatives which wrap the string data type (such as 
   bstring does) impose a difficulty in interoperating with the C language's
   ordinary C string library is not founded.
2. Introduction of memory (and vector?) abstractions imposes a learning
   curve, and some kind of memory usage policy that is outside of the strings
   themselves (and therefore must be maintained by the developer.)
3. The API is massive, and filled with all sorts of trivial (pjoin) and
   controversial (pmatch -- regular expressions are not sufficiently 
   standardized, and there is a very large difference in performance between
   compiled and non-compiled REs) functions.  Bstrlib takes a decidedly 
   minimal approach -- none of the functionality in c2lib is difficult or
   challenging to implement on top of Bstrlib (except the regex stuff, which
   is going to be difficult, and controversial no matter what.)
4. Understanding why c2lib is the way it is pretty much requires a working 
   knowledge of Perl.  Bstrlib requires only knowledge of the C string library
   while providing just a very select few worthwhile extras.
5. It is attached to a lot of cruft like a matrix math library (that doesn't
   include any functions for getting the determinant, eigenvectors, 
   eigenvalues, the matrix inverse, test for singularity, test for 
   orthogonality, a Gram-Schmidt orthogonalization, LU decomposition ... I 
   mean why bother?)

Convincing a development house to use c2lib is likely quite difficult.  It
introduces too much, while not being part of any kind of standards body.  The
code must therefore be trusted, or maintained by those that use it.  While
bstring offers nothing more on this front, since it is so much smaller, covers 
far less in terms of scope, and will typically improve string performance, 
the barrier to usage should be much smaller.

Comparison with stralloc/qmail
------------------------------

More information about this library can be found here:
http://www.canonical.org/~kragen/stralloc.html or here:
http://cr.yp.to/lib/stralloc.html

1. Library is very very minimal.  A little too minimal.
2. Untargeted source parameters are not declared const.
3. Slightly different expected emphasis (like _cats function which takes an
   ordinary C string char buffer as a parameter.)  It's clear that the 
   remainder of the C string library is still required to perform more
   useful string operations.

The struct declaration for their string header is essentially the same as that
for bstring.  But it's clear that this was a quickly written hack whose goals
are clearly a subset of what Bstrlib supplies.  For anyone who is served by
stralloc, Bstrlib is a complete substitute that just adds more functionality.

stralloc actually uses the interesting policy that a NULL data pointer 
indicates an empty string.  In this way, non-static empty strings can be
declared without construction.  This advantage is minimal, since static empty
bstrings can be declared inline without construction, and if the string needs
to be written to it should be constructed from an empty string (or its first 
initializer) in any event.

wxString class
--------------

This is the string class used in the wxWindows project.  A description of 
wxString can be found here:
http://www.wxwindows.org/manuals/2.4.2/wx368.htm#wxstring

This C++ library is similar to CBString.  However, it is littered with 
trivial functions (IsAscii, UpperCase, RemoveLast etc.)

1. There is no C implementation.
2. The memory management strategy is to allocate a bounded fixed amount of 
   additional space on each resize, meaning that it does not have the 
   log_2(n) property that Bstrlib has (it will thrash very easily, cause 
   massive fragmentation in common heap implementations, and can easily be a 
   common source of performance problems).
3. The library uses a "copy on write" strategy, meaning that it has to deal 
   with multithreading problems.
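
The difference between the two growth policies in point 2 can be sketched by 
counting resizes.  The functions below are hypothetical and only model 
capacity arithmetic; the doubling policy is one common way to obtain the 
logarithmic behaviour described above and is not claimed to be the exact 
policy of either library.

```c
#include <stddef.h>

/* Number of resizes needed to reach a capacity of at least target bytes
   when every resize adds a fixed number of bytes (step). */
size_t resizes_fixed(size_t target, size_t step) {
    size_t cap = 0, count = 0;

    if (step == 0) return 0;    /* a zero step would never terminate */
    while (cap < target) {
        cap += step;
        count++;
    }
    return count;
}

/* Number of resizes needed to reach a capacity of at least target bytes
   when every resize doubles the capacity (starting from 1 byte). */
size_t resizes_doubling(size_t target) {
    size_t cap = 0, count = 0;

    while (cap < target) {
        cap = cap ? cap * 2 : 1;
        count++;
    }
    return count;
}
```

Growing a string to one megabyte (1048576 bytes) takes 65536 resizes with a 
fixed step of 16 bytes, but only 21 resizes with doubling.  Since each resize
may copy the whole buffer, the fixed step policy does quadratic work overall.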

Vstr
----

This is a highly orthogonal C string library with an emphasis on 
networking/realtime programming.  It can be found here:
http://www.and.org/vstr/

1. The convoluted internal structure does not contain a '\0' char * compatible
   buffer, so interoperability with the C library is a non-starter.
2. The API and implementation are very large (owing to its orthogonality) and
   can lead to difficulty in understanding its exact functionality.
3. An obvious dependency on GNU tools (confusing make configure step).
4. Uses a reference counting system, meaning that it is not likely to be 
   thread safe.

The implementation has an extreme emphasis on performance for nontrivial 
actions (adds, inserts and deletes are all constant or roughly O(#operations) 
time) following the "zero copy" principle.  This trades off performance of 
trivial functions (character access, char buffer access/coercion, alias 
detection) which becomes significantly slower, as well as incremental 
accumulative costs for its searching/parsing functions.  Whether or not Vstr 
wins any particular performance benchmark will depend a lot on the benchmark, 
but it should handily win on some, while losing dreadfully on others.

The learning curve for Vstr is very steep, and it doesn't come with any 
obvious way to build for Windows or other platforms without GNU tools.  At 
least one mechanism (the iterator) introduces a new undefined scenario 
(writing to a Vstr while iterating through it.)  Vstr has a very large 
footprint, and is very ambitious in its total functionality.  Vstr has no C++ 
API.

Vstr usage requires context initialization via vstr_init() which must be run
in a thread-local context.  Given the totally reference based architecture
this means that sharing Vstrings across threads is not well defined, or at
least not safe from race conditions.  This API is clearly geared to the older
standard of fork() style multitasking in UNIX, and is not safely transportable
to modern shared memory multithreading available in Linux and Windows.  There 
is no portable external solution making the library thread safe (since it 
requires a mutex around each Vstr context -- not each string.)

In the documentation for this library, a big deal is made of its self hosted
s(n)printf-like function.  This is an issue for older compilers that don't
include vsnprintf(), but also an issue because Vstr has a slow conversion to
'\0' terminated char * mechanism.  That is to say, using "%s" to format data 
that originates from Vstr would be slow without some sort of native function 
to do so.  Bstrlib sidesteps the issue by relying on what snprintf-like 
functionality does exist and having a high performance conversion to a char * 
compatible string so that "%s" can be used directly.

Str Library
-----------

This is a fairly extensive string library that includes full Unicode support
and is targeted at outperforming MFC and STL.  The architecture, 
similarly to MFC's CStrings, is a copy on write reference counting mechanism.

http://www.utilitycode.com/str/default.aspx

1. Commercial.
2. C++ only.

This library, like Vstr, uses a ref counting system.  I can only analyze it 
so deeply, since I don't have a license for it.  However, performance 
improvements over MFC and STL do not seem like a sufficient reason to 
move your source base to it.  For example, in the future, Microsoft may 
improve the performance of CString.

It should be pointed out that performance testing of Bstrlib has indicated 
that its relative performance advantage versus MFC's CString and STL's 
std::string is at least as high as that for the Str library.

libmib astrings
---------------

A handful of functional extensions to the C library that add dynamic string
functionality.
http://www.mibsoftware.com/libmib/astring/

This package basically references strings through char ** pointers and assumes
they are pointing to the top of an allocated heap entry (or NULL, in which 
case memory will be newly allocated from the heap.)  So it's still up to the user
to mix and match the older C string functions with these functions whenever
pointer arithmetic is used (i.e., there is no leveraging of the type system
to assert semantic differences between references and base strings as Bstrlib 
does since no new types are introduced.)  Unlike Bstrlib, exact string length 
meta data is not stored, thus requiring a strlen() call on *every* string 
writing operation.  The library is very small, covering only a handful of C's
functions.

While this is better than nothing, it is clearly slower than even the 
standard C library, less safe and less functional than Bstrlib.

To explain the advantage of using libmib, their website shows an example of
how dangerous C code:

    char buf[256];
    char *pszExtraPath = ";/usr/local/bin";

    strcpy(buf,getenv("PATH")); /* oops! could overrun! */
    strcat(buf,pszExtraPath); /* Could overrun as well! */

    printf("Checking...%s\n",buf); /* Some printfs overrun too! */

is avoided using libmib:

    char *pasz = 0;      /* Must initialize to 0 */
    char *paszOut = 0;
    char *pszExtraPath = ";/usr/local/bin";

    if (!astrcpy(&pasz,getenv("PATH"))) /* malloc error */ exit(-1);
    if (!astrcat(&pasz,pszExtraPath)) /* malloc error */ exit(-1);
	
    /* Finally, a "limitless" printf! we can use */
    asprintf(&paszOut,"Checking...%s\n",pasz);fputs(paszOut,stdout);

    astrfree(&pasz); /* Can use free(pasz) also. */
    astrfree(&paszOut);

However, compare this to Bstrlib:

    bstring b, out;

    bcatcstr (b = bfromcstr (getenv ("PATH")), ";/usr/local/bin");
    out = bformat ("Checking...%s\n", bdatae (b, "<Out of memory>"));
    /* if (out && b) */ fputs (bdatae (out, "<Out of memory>"), stdout);
    bdestroy (b);
    bdestroy (out);

Besides being shorter, we can see that error handling can be deferred right
to the very end.  Also, unlike the above two versions, if getenv() returns 
with NULL, the Bstrlib version will not exhibit undefined behavior. 
Initialization starts with the relevant content rather than an extra 
autoinitialization step.

libclc
------

An attempt to add to the standard C library with a number of common useful
functions, including additional string functions.
http://libclc.sourceforge.net/

1. Uses standard char * buffer, and adopts C 99's usage of "restrict" to pass
   the responsibility to guard against aliasing to the programmer.
2. Adds no safety or memory management whatsoever.
3. Most of the supplied string functions are completely trivial.

The goals of libclc and Bstrlib are clearly quite different.

fireString
----------

http://firestuff.org/

1. Uses standard char * buffer, and adopts C 99's usage of "restrict" to pass
   the responsibility to guard against aliasing to the programmer.
2. Mixes char * and length wrapped buffers (estr) functions, doubling the API
   size, with safety limited to only half of the functions.

Firestring was originally just a wrapper of char * functionality with extra
length parameters.  However, it has been augmented with the inclusion of the 
estr type which has similar functionality to stralloc.  But firestring does 
not nearly cover the functional scope of Bstrlib.

Safe C String Library
---------------------

A library written for the purpose of increasing safety and power to C's string
handling capabilities.
http://www.zork.org/safestr/safestr.html

1. While the safestr_* functions are safe in and of themselves, interoperating 
   with char * strings has dangerous unsafe modes of operation.
2. The architecture of safestr causes the base pointer to change.  Thus,
   it's not practical/safe to store a safestr in multiple locations if any 
   single instance can be manipulated.
3. Dependent on an additional error handling library.
4. Uses reference counting, meaning that it is either not thread safe or
   slow and not portable.

I think the idea of reallocating (and hence potentially changing) the base 
pointer is a serious design flaw that is fatal to this architecture.  True 
safety is obtained by having automatic handling of all common scenarios 
without creating implicit constraints on the user.

Because of its automatic temporary clean up system, it cannot use "const" 
semantics on input arguments.  Interesting anomalies such as:

    safestr_t s, t;
    s = safestr_replace (t = SAFESTR_TEMP ("This is a test"),
                         SAFESTR_TEMP (" "), SAFESTR_TEMP ("."));
    /* t is now undefined. */

are possible.  If one defines a function which takes a safestr_t as a 
parameter, then the function would not know whether or not the safestr_t is 
defined after it passes it to a safestr library function.  The author's 
recommended method for working around this problem is to examine the 
attributes of the safestr_t within the function which is to modify any of
its parameters and play games with its reference count.  I think, therefore,
that the whole SAFESTR_TEMP idea is also fatally broken.

The library implements immutability, optional non-resizability, and a "trust" 
flag.  This trust flag is interesting, and suggests that applying any 
arbitrary sequence of safestr_* function calls on any set of trusted strings 
will result in a trusted string.  It seems to me, however, that if one wanted 
to implement a trusted string semantic, one might do so by actually creating 
a different *type* and only implement the subset of string functions that are 
deemed safe (i.e., user input would be excluded, for example.)  This, in 
essence, would allow the compiler to enforce trust propagation at compile 
time rather than run time.  Non-resizability is also interesting, however, 
it seems marginal (i.e., to want a string that cannot be resized, yet can be 
modified and yet where a fixed sized buffer is undesirable.)
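
The "different type" idea might look like the following sketch.  All names 
(struct plain_str, struct trusted_str, trusted_from_literal, trusted_promote,
run_query_len) are hypothetical; neither safestr nor Bstrlib provides them.

```c
#include <stddef.h>
#include <string.h>

/* Two structurally identical but distinct types.  The compiler will not
   convert one to the other implicitly. */
struct plain_str   { const char *data; size_t len; };
struct trusted_str { const char *data; size_t len; };

/* Strings written by the programmer are trusted by construction. */
struct trusted_str trusted_from_literal(const char *lit) {
    struct trusted_str t;
    t.data = lit;
    t.len = strlen(lit);
    return t;
}

/* The only path from untrusted to trusted: an explicit check against a
   set of allowed characters.  Returns 0 and fills *out on success, or -1
   (leaving *out untouched) if any character is not allowed. */
int trusted_promote(struct plain_str in, const char *allowed,
                    struct trusted_str *out) {
    size_t i;

    for (i = 0; i < in.len; i++) {
        /* strchr would match the terminator, so reject '\0' explicitly. */
        if (in.data[i] == '\0' || strchr(allowed, in.data[i]) == NULL)
            return -1;
    }
    out->data = in.data;
    out->len = in.len;
    return 0;
}

/* A sensitive operation accepts only trusted_str.  Passing a plain_str
   here is a type error caught at compile time, not a run time flag test. */
size_t run_query_len(struct trusted_str q) {
    return q.len;
}
```

User input arrives as a plain_str and has to go through trusted_promote() 
before run_query_len() will accept it, so forgetting the check fails to 
compile rather than failing in production.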

===============================================================================

Examples
--------

    Dumping a line numbered file:

    FILE * fp;
    int i, ret;
    struct bstrList * lines;
    struct tagbstring prefix = bsStatic ("-> ");

    if (NULL != (fp = fopen ("bstrlib.txt", "rb"))) {
        bstring b = bread ((bNread) fread, fp);
        fclose (fp);
        if (NULL != (lines = bsplit (b, '\n'))) {
            for (i=0; i < lines->qty; i++) {
                binsert (lines->entry[i], 0, &prefix, '?');
                printf ("%04d: %s\n", i, bdatae (lines->entry[i], "NULL"));
            }
            bstrListDestroy (lines);
        }
        bdestroy (b);
    }

For numerous other examples, see bstraux.c, bstraux.h and the example archive.

===============================================================================

License
-------

The Better String Library is available under either the BSD license (see the
accompanying license.txt) or the Gnu Public License version 2 (see the 
accompanying gpl.txt) at the option of the user.

===============================================================================

Acknowledgements
----------------

The following individuals have made significant contributions to the design
and testing of the Better String Library:

Bjorn Augestad
Clint Olsen
Darryl Bleau
Fabian Cenedese
Graham Wideman
Ignacio Burgueno
International Business Machines Corporation
Ira Mica
John Kortink
Manuel Woelker
Marcel van Kervinck
Michael Hsieh
Richard A. Smith
Simon Ekstrom
Wayne Scott

===============================================================================
