BZTCL


Node:Top, Next:, Up:(dir)

BZTCL

This file documents BZTCL version 0.6.0, a TCL binding to the BZip2 library. This package is distributed under a modified BSD license.

BZTCL provides access to the BZip2 compression and decompression functions, as well as the capability to read and write files in the .bz2 format, the one used by bzip2.


Node:Overview, Next:, Previous:Top, Up:Top

Overview of the library

This file documents BZTCL version 0.6.0, a TCL binding to the BZip2 library. BZTCL provides access to the BZip2 compression and decompression functions, as well as the capability to read and write files in the .bz2 format, the one used by the bzip2 command line utility.

BZTCL has been tested with BZip2 version 1.0.2. To compile BZTCL the library (libbz2.so.1.0.2 on UNIX-like systems) and the header file (bzlib.h) must be installed.

BZTCL depends on TCLMORE, another extension to the TCL language. A version of TCLMORE must be installed on the system to use BZTCL; TCLMORE should be available on the Net from the same site as BZTCL. Version 0.7 is required.

BZTCL provides a single command [bzlib] in the interpreter in which the bzlib package is loaded. To load the package:

package require bzlib

or:

package require bzlib 0.6.0

The C API of BZTCL provides the same functionality as the TCL commands at the C level. We can access the functions by including the installed header file bztcl.h.


Node:Compress and Decompress, Next:, Previous:Overview, Up:Top

Compressing and decompressing chunks of data

bzlib compress data ?options? Command
Compresses the data stored in data, returns the compressed data. The data is treated as a ByteArray object.

Recognised options (all are optional):

-blockSize INTEGER
the block size to be used for compression, in units of 100 kilobytes; it should be a number between 1 and 9 inclusive, where 9 gives the best compression but takes the most memory;
-workFactor INTEGER
controls the worst case behaviour; it must be a number between 0 and 250 inclusive, with 0 equivalent to the default value of 30; lower values make the library switch sooner to a slower fallback algorithm when worst case input data is found; 30 is a good value;

The value default is accepted for both -blockSize and -workFactor; it corresponds to 3 and 30 respectively.

bzlib decompress data ?options? Command
Uncompresses the ByteArray representation of data.

Supported options (all are optional):

-size SIZE
the first guess for the size of the target buffer, or an integer <=0; in the latter case the library is informed that we don't know the size of the uncompressed data, so it has to figure out for itself the size of the required memory block;
-small
controls the memory usage: if selected, the library will adopt a slower decompression algorithm which uses less memory.

Examples

Compress and uncompress a chunk of data:

package require bzlib

set size     [string length $data]
set zipped   [bzlib compress   $data]
set unzipped [bzlib decompress $zipped -size $size]


Node:BZ Files, Next:, Previous:Compress and Decompress, Up:Top

Reading and writing files in .bz2 format

The BZTCL interface to files is similar to the standard TCL interface for files.

Examples

Commands

bzlib open fileName mode ?options? Command
Opens a .bz2 file for reading or writing. mode can be RDONLY or WRONLY.

Supported options:

-blockSize INTEGER
-workFactor INTEGER
used only for writable files, their meaning is the same as in [bzlib compress] (see Compress and Decompress for details);
-small
used only for readable files, its meaning is the same as in [bzlib decompress] (see Compress and Decompress for details).

In writing mode: if the file already exists, it is truncated to zero length.

Returns the file identifier string, which matches the pattern bztcl[0-9]+.

All the built-in TCL commands acting on channels can be used to operate on the BZTCL channels.


Node:BZ Streams, Next:, Previous:BZ Files, Up:Top

Processing streams of data

BZTCL allows the compression and decompression of streams of data. We can read or write data from or to a channel and compress/decompress it; this works by stacking upon the channel a transformation layer.

Streams are used to send data from one software entity to another; the concept of TCL transformation has to be interpreted as a way to process all the data flowing between the two entities, not only chunks of it: if we need to send compressed data in chunks intermixed with normal bytes, we have to use the [bzlib compress] and [bzlib decompress] commands, not the transformation.

A stacked transformation can be detached with the [more unstack] command, provided by TCLMORE.

Examples

Command

bzlib stream channel mode ?option ...? Command
Stacks a transformation upon an existing channel. mode can be RDONLY, WRONLY or RDWR, and must be compatible with the open mode of channel. Each direction can independently be in compress or decompress mode: both directions may compress, or both may decompress.

Description of supported options follows.

-input compress
-input decompress
Selects a compression or decompression stream for the input direction. The default if this option is not used is decompression.
-output compress
-output decompress
Selects a compression or decompression stream for the output direction. The default if this option is not used is compression.
-blockSize INTEGER
-workFactor INTEGER
Used only for compression streams, their meaning is the same as in [bzlib compress] (see Compress and Decompress for details).
-small
Used only for decompression streams, its meaning is the same as in [bzlib decompress] (see Compress and Decompress for details).

A stacked transformation accepts the following configuration options through [fconfigure].

-flush input
-flush output
Forces the flush operation on data in the internal context of the BZip2 stream.

For output streams: this causes as much data as possible to be flushed from the internal context to the output buffer and sent to the underlying channel; we should invoke [flush $channel] before this.

For input streams: this causes as much data as possible to be flushed from the internal context to the output buffer and to be available for reading; no new data is read from the underlying channel.

-finish input
-finish output
Forces the finalisation of the stream.

For input streams: this causes all the data stored in the internal stream context to be available for reading; no data is read from the underlying channel, and no more data can be read from it until the transformation is unstacked.

For output streams: all the data is flushed to the output buffer as if [flush $channel] had been invoked, and an attempt is made to write all of it to the underlying channel; after this, no data can be written to the transformation.

-bufferSize SIZE
Configures the output buffer size. If the buffer is currently allocated and the number of bytes in it can fit in a buffer of the new size, the buffer is reallocated (see Stream Functions for details on the output buffer).
-bufferDelta NUMBER
Configures the hysteresis parameter for the reallocation of the output buffer. When the buffer has less than NUMBER free bytes, it is reallocated, enlarging it by NUMBER bytes. When the buffer has at least a number of unused bytes equal to two times NUMBER, the buffer is shrunk, leaving at least NUMBER bytes free.

The following read-only options are available to query the state of the channel.

-endOfInputStream
-endOfOutputStream
The value returned by [fconfigure] is true if the stream is in decompression mode and the end of stream has been detected; this means that the stream will process no more data: it is locked until we unstack the transformation. In this state we can read data from the internal context of the stream until all of it has been consumed. Some unprocessed data may have been read from the underlying channel: it is lost.


Node:Misc Commands, Next:, Previous:BZ Streams, Up:Top

Miscellaneous commands

bzlib info version Command
Returns a string describing the BZip2 library version.

bzlib info isWorkFactor value Command
Returns true if value is valid as work factor, and so it can be used as argument to the commands.

bzlib info isBlockSize value Command
Returns true if value is valid as block size, and so it can be used as argument to the commands.


Node:Error Codes, Next:, Previous:Misc Commands, Up:Top

Error code strings

In case of errors, the BZTCL commands set the ::errorCode variable to LOGIC or RUNTIME. LOGIC is for errors caused by misuse of the interface or invalid arguments; a well debugged script should never raise a LOGIC error. RUNTIME is for "things that may happen", like errors writing files.


Node:Data Types, Next:, Previous:Error Codes, Up:Top

BZTCL data types

Bztcl_File Opaque Pointer Typedef
A reference to a channel descriptor.

Bztcl_Stream Opaque Pointer Typedef
A reference to a compression or decompression filter.

Bztcl_Config Struct Pointer
Configuration values. Fields description follows.
int blockSize
The block size to be used for compression, in units of 100 kilobytes; it should be a number between 1 and 9 inclusive; 9 gives the best compression but takes the most memory.
int workFactor
Controls the worst case behaviour (see the description of BZ2_bzCompressInit() in the BZip2 documentation for details); it must be a number between 0 and 250 inclusive, with 0 equivalent to the default value of 30; lower values make the library switch sooner to a slower fallback algorithm when worst case input data is found; 30 is a good value.
int small
Controls the memory usage, can be one or zero; if it's one: the library will use a slower decompression algorithm which uses less memory.

BZTCL_DEFAULT_CONFIG Macro
Defaults for Bztcl_Config.
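
The value ranges described above can be checked with a few lines of C. The following is only a sketch: ExampleConfig is a hypothetical stand-in for Bztcl_Config (the real declaration is in bztcl.h and may differ), the example_ functions are not part of BZTCL, and the default of zero for the small field is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Bztcl_Config; the real layout may differ. */
typedef struct ExampleConfig {
    int blockSize;   /* 1..9, in units of 100 kilobytes */
    int workFactor;  /* 0..250, where 0 selects the default of 30 */
    int small;       /* one or zero */
} ExampleConfig;

/* Documented defaults for block size and work factor;
   small = 0 is an assumption made for this sketch. */
#define EXAMPLE_DEFAULT_CONFIG  { 3, 30, 0 }

static int example_is_block_size(int value)
{
    return value >= 1 && value <= 9;
}

static int example_is_work_factor(int value)
{
    return value >= 0 && value <= 250;
}

/* Maps the special value 0 to the documented default of 30. */
static int example_effective_work_factor(int value)
{
    assert(example_is_work_factor(value));
    return (value == 0) ? 30 : value;
}

static int example_config_is_valid(const ExampleConfig *cfg)
{
    assert(cfg != NULL);
    return example_is_block_size(cfg->blockSize)
        && example_is_work_factor(cfg->workFactor)
        && (cfg->small == 0 || cfg->small == 1);
}
```

A caller would typically start from the defaults, change only the fields it cares about, and validate the structure before handing it to a compression or stream function.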

More_Block Struct Typedef
Type of input and output memory blocks used as source and destination for write and read operations on buffers. It is declared by TCLMORE. Fields description follows.
int len
The number of bytes in the block. This is an int, rather than a size_t, to make it easy to interface with TCL code.
More_BytePtr ptr
Pointer to the first byte in the block.

More_BytePtr Macro
Macro that represents a pointer to byte type. It is declared by TCLMORE.


Node:Errors, Next:, Previous:Data Types, Up:Top

Error descriptors

When functions need to return an error, they return an error descriptor of type More_Error. This type is handled with functions provided by TCLMORE (See Error Reporting, for details).

If the returned value is NULL the function completed successfully.

File and stream functions store an appropriate (more or less) errno value in the integer error-specific field of the error descriptor.


Node:Compress Functions, Next:, Previous:Errors, Up:Top

Compress/uncompress functions

More_Error Bztcl_CompressObj (srcObj, cfg, dstObjVar) Function
Compresses data stored in a Tcl_Obj. The source buffer is the ByteArray representation of the source object. Arguments description follows.
Tcl_Obj *srcObj
Pointer to the object that holds the source data.
Bztcl_Config * cfg
Pointer to a structure holding the configuration for compression.
Tcl_Obj ** dstObjVar
Pointer to a variable that will hold the new object with the compressed data.

The new object with the compressed data will have its reference counter set to zero.

More_Error Bztcl_UncompressObj (srcObj, cfg, size, dstObjVar) Function
Decompresses data in a Tcl_Obj.

The uncompress function of BZip2 has no way to predict the size of the uncompressed data: if the caller can guess a good value (for example because during compression the size has been recorded somewhere) it can pass it in size. If the caller cannot guess a good value it can pass size==0.

Arguments description follows.

Tcl_Obj * srcObj
A pointer to the object that holds the compressed data.
Bztcl_Config * cfg
Pointer to a structure holding the configuration for decompression.
int size
The first guess for the size of the target buffer, or an integer less than or equal to zero.
Tcl_Obj ** dstObjVar
Pointer to a variable that will hold the new object with the uncompressed data.

The new object with the uncompressed data will have the reference counter set to zero.


Node:File Functions, Next:, Previous:Compress Functions, Up:Top

Compressing and decompressing files

Tcl_Channel Bztcl_MakeChannel (Bztcl_File descriptor) Function
Creates a new channel interface to an already created file descriptor. BZip2 separates the interface for readable files from the one for writable files. This function handles this automatically by inspecting the state of the descriptor. Returns the channel token.

More_Error Bztcl_Open (fileName, openMode, tokenVar) Function
Opens a .bz2 file. This function only opens the file, using the standard C stream functions; the BZ stream is opened at the first write or read operation.
CONST char * fileName
Pointer to a string representing the file name.
int openMode
TCL_READABLE or TCL_WRITABLE.
Bztcl_File * tokenVar
Pointer to a variable that will hold the file descriptor.

Returns NULL or an error descriptor.

More_Error Bztcl_Close (Bztcl_File token) Function
Closes a file. Returns NULL or an error descriptor. It's safe to invoke this function multiple times for an already closed BZTCL descriptor: this is to allow the invoking function to retry to close the FILE stream.

More_Error Bztcl_ReadObj (Bztcl_File token, int number, Tcl_Obj ** objVar) Function
Uncompresses data from a file into a Tcl_Obj, seen as a byte array. The file must have been opened for reading. number is the number of bytes to read. Returns NULL or an error descriptor.

More_Error Bztcl_Read (Bztcl_File token, More_Block * blockPtr) Function
Reads data from a file and stores the uncompressed data in a memory block. If end-of-stream is reached, the EOF status is recorded in the file descriptor (so that it can be retrieved with Bztcl_Eof()) and the function returns no error. Returns NULL or an error descriptor.

blockPtr is a pointer to the target block. Before the call the length field must hold the number of bytes to read. After the call, the length field holds the number of bytes actually read.
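
The in/out use of the length field can be illustrated without the library. This is only a sketch of the calling convention: ExampleBlock mirrors the two documented fields of More_Block, while ExampleFile and example_read() are hypothetical stand-ins that copy from a memory area instead of decompressing a file.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the documented fields of More_Block (declared by TCLMORE). */
typedef struct ExampleBlock {
    int            len;
    unsigned char *ptr;
} ExampleBlock;

/* Hypothetical data source standing in for a readable descriptor. */
typedef struct ExampleFile {
    const unsigned char *data;       /* next byte to deliver */
    int                  remaining;  /* bytes still available */
    int                  eof;        /* set when a read comes up short */
} ExampleFile;

/* On entry block->len is the number of bytes requested;
   on exit it is the number of bytes actually stored at block->ptr. */
static void example_read(ExampleFile *file, ExampleBlock *block)
{
    int wanted, count;

    assert(file != NULL && block != NULL && block->len >= 0);
    wanted = block->len;
    count  = (wanted < file->remaining) ? wanted : file->remaining;
    memcpy(block->ptr, file->data, (size_t)count);
    file->data      += count;
    file->remaining -= count;
    if (count < wanted)
        file->eof = 1;  /* recorded in the descriptor, not reported as an error */
    block->len = count;
}
```

Because the length field is overwritten, the caller has to reset it to the requested size before each call when it reuses the same block in a loop.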

More_Error Bztcl_WriteObj (Bztcl_File token, Tcl_Obj * srcObj) Function
Compresses data in a Tcl_Obj, seen as a byte array, and writes it in a file. The file must have been opened for writing. Returns NULL or an error descriptor.

More_Error Bztcl_Write (Bztcl_File token, More_Block block) Function
Writes a block of data into a file. Returns NULL or an error descriptor.

int Bztcl_Eof (Bztcl_File token) Function
Queries the state of the descriptor to see if an EOF happened on the last read operation. Returns true or false.

int Bztcl_GetHandle (Bztcl_File token) Function
Returns the file handle.

void Bztcl_SetSmall (Bztcl_File token, int small) Function
Configures a new "small" mode for a file descriptor; the value must be one or zero (see Compress Functions for details). This function must be used only to configure a readable descriptor prior to the first invocation of Bztcl_Read(), else it will have no effect.

void Bztcl_SetWorkFactor (Bztcl_File token, int workFactor) Function
Configures a new work factor value for a file descriptor. This function must be used only to configure a writable descriptor prior to the first invocation of Bztcl_Write(), else it will have no effect.

void Bztcl_SetBlockSize (Bztcl_File token, int blockSize) Function
Configures a new block size value for a file descriptor. This function must be used only to configure a descriptor prior to the first call to Bztcl_Write().

int Bztcl_GetSmall (Bztcl_File token) Function
Returns the current small mode: one or zero.

int Bztcl_GetWorkFactor (Bztcl_File token) Function
Returns the current value of the work factor.

int Bztcl_GetBlockSize (Bztcl_File token) Function
Returns the current value of the block size.

int Bztcl_ReadableFile (Bztcl_File token) Function
Returns true if the file is readable.

int Bztcl_WritableFile (Bztcl_File token) Function
Returns true if the file is writable.


Node:Stream Functions, Next:, Previous:File Functions, Up:Top

Compressing and decompressing streams of bytes

To understand how this module works we first have to learn how TCLMORE implements the transformation interface (See Stream transformation, for details).

The BZip2 stream interface functions define and manage the internal compression or decompression context; BZTCL handles the output buffer and manages memory allocation.


Node:Stream Functions Examples, Next:, Up:Stream Functions

Examples


Bztcl_Stream    token;
More_Block      buffer, input, output;
More_Error      error;
Bztcl_Config    config;
int             compress, numberOfBytesUsed;


compress = ...;
/* fill "config" */

error = Bztcl_StreamInit(&token, compress, &config);
if (error) { ... }

More_BlockAlloc(buffer, 4096*32);

for (input = buffer, fill_a_block_with_data(&input);
     input.len;
     input = buffer, fill_a_block_with_data(&input))
  {
    error = Bztcl_StreamWrite(token, &input);
    if (error)
      {
        goto Error; /* example: corrupted data */
      }
    if (input.len)
      {
        /*
          Some data was not absorbed: it is still in "buffer",
          referenced by "input", so do something with it.

          For compression streams: this has to be considered
          an error, so: goto Error. This should never happen
          anyway.

          For decompression streams: the end of stream was
          found. This is not an exception: it is normal operation
          when we read data from a source that supplies the stream
          and other data after it.
        */
        break;
      }

    output = Bztcl_StreamOutput(token);
    if (output.len)
      {
        numberOfBytesUsed = use_the_processed_data(output);
        Bztcl_StreamRead(token, numberOfBytesUsed);
      }
  }

error = Bztcl_StreamFinish(token);
if (error) { ... }
output = Bztcl_StreamOutput(token);
use_the_processed_data(output);

Error:
More_BlockFree(buffer);
Bztcl_StreamFinal(token);


Node:Stream Functions Interface, Next:, Previous:Stream Functions Examples, Up:Stream Functions

Interface functions

All the stream memory is managed with the ckalloc(), ckrealloc() and ckfree() functions.

In case of error the functions return an error descriptor of type More_Error; this type is declared by TCLMORE. The only thing to do after an error is to call the finalisation function to release all the resources.

The integer data error-specific field of the error descriptor is set to an appropriate (more or less) errno value. This is to allow the use of the stream functions with the transformation layer implemented by TCLMORE; errno codes are poor information but this is what the TCL channel interface offers. In the future an -error option for [fconfigure] may be implemented, somewhat like the one offered by [socket].


Node:Stream Functions Interface Init and Final, Next:, Up:Stream Functions Interface

Initialisation and finalisation

More_Error Bztcl_StreamInit (tokenVar, compress, config) Function
Initialises a (de)compression stream. Memory is completely managed by BZTCL.
Bztcl_Stream * tokenVar
Pointer to a variable that will hold the reference to the new stream descriptor.
int compress
If true the new stream will compress data, otherwise it will decompress.
Bztcl_Config *config
Pointer to the structure holding the configuration (see Data Types for details). All the values are used, coherently with the mode (compression or decompression).

Returns NULL or an error descriptor.

void Bztcl_StreamFinal (Bztcl_Stream token) Function
Releases all the resources. This function may be used to abort a stream, at any instant, or to clean up after the stream has been finished.

Remark: this function is a wrapper for: BZ2_bzCompressEnd() and BZ2_bzDecompressEnd() (see the BZLIB documentation); an error from these functions can occur only if the data structures are corrupted, so this function always releases all the memory blocks and ignores any error returned by them.


Node:Stream Functions Interface Writing, Next:, Previous:Stream Functions Interface Init and Final, Up:Stream Functions Interface

Writing data

More_Error Bztcl_StreamWrite (Bztcl_Stream token, More_Block *blockPtr) Function
Writes data into a stream. blockPtr must reference a variable referencing the input block: its pointer field must reference a block of memory, its length field must be the number of bytes to write into the stream from the block.

For compression streams: all the data will be absorbed, the block structure will be cleared to zero.

For decompression streams:

  • if the end of stream is not detected, all the data is absorbed and the block structure is cleared to zero;
  • if the end of stream is found: no more data is processed after it, the internal context is finalised, no error is returned and data is available from the output buffer.

If end-of-stream is detected, the block is updated to reference the portion of data that has not been processed. Layout of the block before the call:

              end of stream
                    v
|---------------------------------------|
^
blockPtr->ptr

.........................................
             blockPtr->len

after:

              end of stream
                    v
|---------------------------------------|
                     ^
                   blockPtr->ptr

                     ....................
                       blockPtr->len

Returns NULL or an error descriptor.
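
The update of the block can be sketched as follows. ExampleBlock mirrors the two documented fields of More_Block; example_block_consume() is a hypothetical helper, not a BZTCL function, and resetting the pointer to NULL when everything is absorbed is our reading of "cleared to zero".

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the documented fields of More_Block (declared by TCLMORE). */
typedef struct ExampleBlock {
    int            len;
    unsigned char *ptr;
} ExampleBlock;

/* Declares that the first "consumed" bytes of the block were processed:
   the block is left referencing only the unprocessed tail. */
static void example_block_consume(ExampleBlock *block, int consumed)
{
    assert(block != NULL);
    assert(consumed >= 0 && consumed <= block->len);

    block->ptr += consumed;
    block->len -= consumed;
    if (block->len == 0)
        block->ptr = NULL;  /* everything absorbed: block cleared to zero */
}
```

After a write on a decompression stream, a non-zero length therefore tells the caller both that the end of stream was found and where the trailing, unprocessed bytes start.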


Node:Stream Functions Interface Reading, Next:, Previous:Stream Functions Interface Writing, Up:Stream Functions Interface

Reading data

More_Block Bztcl_StreamOutput (Bztcl_Stream token) Function
Accesses the output buffer. This function is meant to be used just before Bztcl_StreamRead(). Returns a block representing the output buffer; it may reference a NULL buffer.

void Bztcl_StreamRead (Bztcl_Stream token, int number) Function
Declares that a number of bytes has been read from the output buffer. Shifts the unread data to the beginning of the output buffer. May shrink the output buffer.
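
The shift of the unread data can be pictured with a small sketch. ExampleOutputBuffer and example_buffer_mark_read() are hypothetical names used only for illustration; the real buffer is private to the stream module, and the shrinking step is left out here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the output buffer. */
typedef struct ExampleOutputBuffer {
    unsigned char *data;  /* start of the allocated block */
    int            used;  /* bytes of processed data waiting to be read */
    int            size;  /* allocated size */
} ExampleOutputBuffer;

/* Declares that "number" bytes were consumed from the front of the
   buffer and moves the unread bytes to the beginning. */
static void example_buffer_mark_read(ExampleOutputBuffer *buf, int number)
{
    assert(buf != NULL);
    assert(number >= 0 && number <= buf->used);

    /* memmove() is required because source and destination overlap. */
    memmove(buf->data, buf->data + number, (size_t)(buf->used - number));
    buf->used -= number;
}
```

With this convention the block returned by the output accessor always starts at the first unread byte, so the caller never has to keep an offset of its own.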


Node:Stream Functions Interface Flush and Finish, Next:, Previous:Stream Functions Interface Reading, Up:Stream Functions Interface

Flushing and finishing

More_Error Bztcl_StreamFlush (Bztcl_Stream token) Function
Flushes as much data as possible from the internal context. It is required if we need to send data somewhere quickly, and usually it is not used. Its use is discouraged because it slows down the operations and may degrade the compression ratio. After the flush operation has been carried out, new data may have been added to the output buffer, and so it can be read. Returns NULL or an error descriptor.

More_Error Bztcl_StreamFinish (Bztcl_Stream token) Function
For compression streams: flushes data from the internal context to the output buffer, marking it as end of stream.

For decompression streams: flushes data from the internal context until the end of stream is found; this works only if we have written all the data into the stream, else a "corrupted data" error is raised.

After a call to this function data is available with Bztcl_StreamOutput(). When data has been read the only allowed operation is finalisation.


Node:Stream Functions Interface Output Buffer, Next:, Previous:Stream Functions Interface Flush and Finish, Up:Stream Functions Interface

Output buffer

The size of the output buffer can be configured after the stream has been initialised: the buffer is allocated only at the first invocation of Bztcl_StreamWrite().

Repeated invocations to the write function will enlarge the buffer if the unused space is less than a configurable hysteresis value. The formula used to compute the new size follows.

newSize = oldSize + oldSize % hysteresis + hysteresis;

The size will always be a multiple of the hysteresis value, and after reallocation there will always be a number of unused bytes greater than or equal to the hysteresis value. Invocations of the write function will never shrink the buffer.

Invocations to Bztcl_StreamRead() may reallocate the buffer, shrinking it. The reallocation will take place if the number of unused bytes is greater than twice the configured hysteresis value. The formula used to compute the new size follows.

newSize = usedSpace + usedSpace % hysteresis + hysteresis;

The default value for both the size and hysteresis is 4096*4.
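
The two formulas and the conditions that trigger them can be tried out with the sketch below. The example_ functions are hypothetical and simply evaluate the expressions as they are written in this section; they are not taken from the BZTCL sources. Taken literally, the expressions yield a multiple of the hysteresis only when the starting value is itself a multiple, which is the case when the buffer starts from the default size.

```c
#include <assert.h>

/* Size after enlarging, as given by the first formula. */
static int example_enlarged_size(int oldSize, int hysteresis)
{
    assert(oldSize >= 0 && hysteresis > 0);
    return oldSize + oldSize % hysteresis + hysteresis;
}

/* Size after shrinking, as given by the second formula. */
static int example_shrunk_size(int usedSpace, int hysteresis)
{
    assert(usedSpace >= 0 && hysteresis > 0);
    return usedSpace + usedSpace % hysteresis + hysteresis;
}

/* Enlarge when the unused space is less than the hysteresis. */
static int example_needs_enlarge(int size, int used, int hysteresis)
{
    return (size - used) < hysteresis;
}

/* Shrink when the unused space is greater than twice the hysteresis
   (elsewhere the condition is worded as "at least twice"). */
static int example_needs_shrink(int size, int used, int hysteresis)
{
    return (size - used) > 2 * hysteresis;
}
```

With the default value of 4096*4 = 16384 for both parameters, the first enlargement takes the buffer from 16384 to 32768 bytes, and an empty buffer shrinks back to 16384 bytes.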

void Bztcl_StreamSetBufferSize (Bztcl_Stream token, int size) Function
Selects a new size for the output buffer.

If the buffer is not allocated: size will be its initial size.

If the buffer is allocated at the time of the call and the number of used bytes is less than size: the buffer is reallocated to the new size.

void Bztcl_StreamSetBufferHysteresis (Bztcl_Stream token, int hysteresis) Function
Selects a new hysteresis for the reallocation of the buffer.

int Bztcl_StreamGetBufferSize (Bztcl_Stream token) Function
Returns the current buffer size.

int Bztcl_StreamGetBufferHysteresis (Bztcl_Stream token) Function
Returns the current hysteresis value.

int Bztcl_StreamBufferAllocated (Bztcl_Stream token) Function
Returns true if the buffer is allocated.


Node:Stream Functions Interface Inspection, Previous:Stream Functions Interface Output Buffer, Up:Stream Functions Interface

Inspection

int Bztcl_StreamGetWorkFactor (Bztcl_Stream token) Function
Returns the configured work factor.

int Bztcl_StreamGetBlockSize (Bztcl_Stream token) Function
Returns the configured block size.

int Bztcl_StreamGetSmall (Bztcl_Stream token) Function
Returns the configured small memory mode.

int Bztcl_StreamFinished (Bztcl_Stream token) Function
Returns true if the stream has been finished: explicitly with Bztcl_StreamFinish() or because the end of stream was detected in a write or flush operation.


Node:Stream Functions Transformation, Next:, Previous:Stream Functions Interface, Up:Stream Functions

Transformation

The stream transformation can be stacked upon an existing channel. To remove it use Tcl_UnstackChannel().

Tcl_Channel Bztcl_MakeTransform (input, output, subChannel) Function
Creates a new channel interface to already created stream descriptors. The transformation is stacked upon an existing channel. Arguments description follows.
Bztcl_Stream input
The input stream token. Can be NULL if the transformation is write-only.
Bztcl_Stream output
The output stream token. Can be NULL if the transformation is read-only.
Tcl_Channel subChannel
The underlying channel token.

Returns the transformation's channel token.

The transformation module is implemented by TCLMORE. Bztcl_MakeTransform() is a wrapper for More_MakeStreamTransform() (See Public interface, for details).


Node:Stream Functions Internals, Previous:Stream Functions Transformation, Up:Stream Functions

Internals

The source code file is generic/stream.c. This module is composed of four elements.

Compression/decompression context
It is a data structure holding the internal state of the stream; it is handled mainly by the BZLIB functions (all prefixed with BZ2_).

Data is absorbed by registering into the context a reference to a block of data supplied by the user of the module, and extracted by registering a reference to unused space in the output buffer. The processing functions of BZLIB transfer the data. There must always be an output block to which data can be sent; an input block must be present when processing data in normal mode, while no input data must be present when flushing or finishing a context.

Compression/decompression drivers
The BZLIB functions are wrapped by "drivers": submodules whose functions are collected in a table of function pointers, to provide a common interface for compression and decompression. These tables may be viewed as implementations of virtual functions (in the C++ sense).

The driver instances are statically allocated, have no version field and the logic used to select and call them is hard-coded: there is no plan to split the stream module to allow plugging in of other drivers.

The functions offered by the BZLIB interface are three for compression and three for decompression: initialisation, finalisation, processing; the processing functions can be invoked requesting different operations: common processing, flushing, finishing. The driver functions are four: initialisation, finalisation, writing, flushing.
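
A table of function pointers of this kind can be sketched as follows. All the names are hypothetical and the driver functions are stubs that only record what happened; the real Stream and Driver types are private to generic/stream.c and their members may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ExampleStream ExampleStream;

/* One type declaration per driver function keeps the prototypes short. */
typedef int  ExampleInitFunc  (ExampleStream *stream);
typedef void ExampleFinalFunc (ExampleStream *stream);
typedef int  ExampleWriteFunc (ExampleStream *stream, int numberOfBytes);
typedef int  ExampleFlushFunc (ExampleStream *stream);

/* The four driver functions: initialisation, finalisation, writing, flushing. */
typedef struct ExampleDriver {
    ExampleInitFunc  *init;
    ExampleFinalFunc *final;
    ExampleWriteFunc *write;
    ExampleFlushFunc *flush;
} ExampleDriver;

struct ExampleStream {
    const ExampleDriver *driver;    /* selected once, at initialisation */
    int                  mode;      /* 'c' or 'd', set by the driver stub */
    int                  absorbed;  /* bytes handed to the driver so far */
};

/* Stub driver functions: a real driver would call the BZ2_ functions. */
static int  compress_init   (ExampleStream *s) { s->mode = 'c'; return 0; }
static int  decompress_init (ExampleStream *s) { s->mode = 'd'; return 0; }
static void stub_final (ExampleStream *s)        { s->mode = 0; }
static int  stub_write (ExampleStream *s, int n) { s->absorbed += n; return 0; }
static int  stub_flush (ExampleStream *s)        { (void)s; return 0; }

/* Statically allocated driver instances. */
static const ExampleDriver compressDriver =
    { compress_init, stub_final, stub_write, stub_flush };
static const ExampleDriver decompressDriver =
    { decompress_init, stub_final, stub_write, stub_flush };

/* Public-style wrappers: select a driver, then call through its table. */
static int example_stream_init(ExampleStream *s, int compress)
{
    assert(s != NULL);
    s->driver   = compress ? &compressDriver : &decompressDriver;
    s->absorbed = 0;
    return s->driver->init(s);
}

static int example_stream_write(ExampleStream *s, int numberOfBytes)
{
    assert(s != NULL && s->driver != NULL);
    return s->driver->write(s, numberOfBytes);
}
```

After initialisation the wrappers never test the mode again: the choice between compression and decompression is carried entirely by the driver pointer.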

Output buffer
It is where the data goes after being processed by the stream. When data is written to the stream: it is processed and accumulated in the output buffer; some of it may be accumulated in the internal context.

Reading data from the stream does not involve usage of the drivers: it is an operation that acts upon the output buffer only.

An attempt was made to split the output buffer submodule completely from the rest: the result was that the synchronisation required between the buffer and the internal context nearly doubled the size of the code for some operations, so the idea was abandoned by the author. In the end, the output buffer functions are just wrappers for ckalloc() and ckrealloc().

Public interface
These functions are all prefixed with Bztcl_. Basically: they are wrappers for the driver functions; at initialisation time a driver is selected and its functions are invoked through the pointers in it.

Data types

Stream Struct Typedef
The basic data structure used to handle the stream; the internal context and the output buffer instances are allocated in it.

The first argument to most of the functions in the module is a pointer to a structure of this type. This is the case of the opaque pointer used as first argument for the public interface functions.

Driver Struct Typedef
It is the table of function pointers used to provide a common interface for both compression and decompression.

Each driver function has its own type declaration, to make it easy to declare the prototypes.

Operations

Initialisation

It is straightforward: a Stream data structure is allocated and initialised with default values; the configuration values are stored in it along with a reference to the selected driver: compression or decompression. The initialisation function of the driver is invoked to prepare the internal context. The output buffer is not allocated: this will be done at the first write operation.

Writing

Steps: the input block supplied by the user is registered in the context; the processing function of BZLIB is invoked multiple times in normal mode until all the data has been absorbed; the input block is unregistered.

Invoking the processing function multiple times is required because, after each call, the output buffer may be full but some unread data may still be in the input block; the buffer is checked prior to each invocation to make sure that room is available.

Compression
Writing never ends a stream, we can process data at will.
Decompression
Each writing operation may find the end of stream in the middle of the input block; if this happens: the stream is finished correctly and a reference to the unread data is handed to the caller, so that it can be put back where it came from.
Flushing

This operation attempts to extract data from the internal context, without processing new data.

Compression
The processing function is invoked in a loop in flushing mode until it signals that all the data has been processed, or an error occurs.
Decompression
The processing function is invoked in a loop in flushing mode until: the end of stream is found, no more data is appended to the output buffer, or an error occurs. If the end of stream is detected the internal context is correctly finished and subsequent operations different from reading will be ignored.

It is not clear from the BZLIB documentation if the decompression accumulates data in an internal context or not; when the decompression function does not append new data to the output buffer, the operation is considered carried out.

Finishing

Data still in the internal context is processed and appended to the output buffer. The processing function of BZLIB is invoked in finish mode until all data has been flushed.

Compression
A stream can be terminated at any time.
Decompression
A stream can be terminated successfully only if all the compressed data has been absorbed in the internal context. BZLIB refuses to absorb data past the end of the stream, so there is no problem of handling superfluous data.

When a stream is finished: the event is recorded in the descriptor, so that no other operation will be performed on it.

Output buffer allocation

void OutputBufferEnlarge (Stream * stream) Function
If the buffer has not been allocated yet, allocates it with the configured size; otherwise, if required, enlarges the allocated data block.

When the number of unused bytes in the buffer is less than the configured hysteresis value, the block is reallocated. The new size is a multiple of the hysteresis value, and it is such that at least a number of bytes equal to the hysteresis value is unused.

This function is invoked before processing data through the stream. A reference to the unused space is registered in the context of the stream.

void OutputBufferShrink (Stream * stream) Function
Shrinks the output buffer to reduce memory usage. If the number of unused bytes is at least twice the configured hysteresis value, the block is reallocated. The new size is computed in the same way as when the block is enlarged.

This function is invoked after data has been read from the output buffer.
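The sizing rules of the two functions above can be sketched as plain arithmetic. The function names are hypothetical, and the sketch assumes that the new size is the smallest multiple of the hysteresis value that leaves at least one hysteresis worth of unused bytes; the text does not say whether the smallest such multiple is the one selected. The initial allocation with the configured size is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of `hysteresis` leaving at least `hysteresis`
   unused bytes after `used` bytes of data (assumed rule). */
static size_t sketch_target_size(size_t used, size_t hysteresis)
{
    assert(hysteresis > 0);
    return ((used + 2 * hysteresis - 1) / hysteresis) * hysteresis;
}

/* Enlarging: reallocate when fewer than `hysteresis` bytes are unused. */
static size_t sketch_size_before_processing(size_t size, size_t used,
                                            size_t hysteresis)
{
    assert(used <= size);
    if (size - used < hysteresis)
        return sketch_target_size(used, hysteresis);
    return size;
}

/* Shrinking: reallocate when at least twice `hysteresis` bytes are unused. */
static size_t sketch_size_after_reading(size_t size, size_t used,
                                        size_t hysteresis)
{
    assert(used <= size);
    if (size - used >= 2 * hysteresis)
        return sketch_target_size(used, hysteresis);
    return size;
}
```

For example, with a hysteresis of 4096 and 5000 bytes in use, a buffer of 8192 bytes has only 3192 unused bytes and would be enlarged to 12288. Under this assumption the unused space after any reallocation lies between one and two hysteresis values, so a buffer that has just been resized needs neither enlarging nor shrinking until its contents change again.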

BZLIB memory allocation

void * AllocFunc (void *opaque, int number, int size) Function
Memory allocator used by the BZip2 library to manage its internal context. It is a wrapper for ckalloc().

void FreeFunc (void * opaque, void * block) Function
Memory releaser used by the BZip2 library to manage its internal context. It is a wrapper for ckfree().
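A pair of wrappers with these signatures might look like the sketch below. To keep it compilable without the TCL headers, malloc() and free() stand in for ckalloc() and ckfree(); this substitution, and the names sketch_alloc and sketch_free, are assumptions of the example. BZip2 asks for number elements of size bytes each, while ckalloc() takes a single byte count, so the wrapper multiplies the two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator in the shape BZip2 expects: `number` items of `size` bytes.
   malloc() is used here in place of ckalloc() (assumption of the sketch). */
static void *sketch_alloc(void *opaque, int number, int size)
{
    (void) opaque;  /* no per-stream allocator state is needed */
    assert(number >= 0 && size >= 0);
    return malloc((size_t) number * (size_t) size);
}

/* Releaser in the shape BZip2 expects; free() stands in for ckfree(). */
static void sketch_free(void *opaque, void *block)
{
    (void) opaque;
    free(block);
}
```

In the BZip2 library such functions are installed through the bzalloc, bzfree and opaque fields of the bz_stream structure before the stream is initialised.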


Node:Misc Functions, Next:, Previous:Stream Functions, Up:Top

Miscellaneous functions

int Bztcl_IsBlockSize (int blockSize) Function
Validates an integer that is a candidate for use as compression block size. It must be a number between 1 and 9 inclusive and it expresses the block size in units of 100 kilobytes. Returns true if the value is valid, else returns false.

int Bztcl_IsWorkFactor (int workFactor) Function
Validates an integer that is a candidate for use as work factor. The work factor controls how the compression phase behaves when presented with worst case input. It must be an integer in the range 0-250 inclusive; 0 is interpreted as 30, the default value, which is good in most cases. Returns true if the value is valid, else returns false.
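The two range checks described above can be sketched as follows. The functions are illustrative stand-ins with hypothetical names, not the BZTCL implementations; sketch_effective_work_factor only makes explicit the documented rule that 0 selects the default of 30.

```c
#include <assert.h>

/* True when the value is a block size in units of 100 kilobytes: 1..9. */
static int sketch_is_block_size(int blockSize)
{
    return blockSize >= 1 && blockSize <= 9;
}

/* True when the value is a work factor: 0..250, with 0 meaning "default". */
static int sketch_is_work_factor(int workFactor)
{
    return workFactor >= 0 && workFactor <= 250;
}

/* Value actually applied by the compressor: 0 is interpreted as 30. */
static int sketch_effective_work_factor(int workFactor)
{
    assert(sketch_is_work_factor(workFactor));
    return (workFactor == 0) ? 30 : workFactor;
}
```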

int Bztcl_GetBlockSizeFromObj (interp, obj, blockSizeVar) Function
Extracts a valid block size from an object. The string default is also accepted as the value of the input object: it means 3.

Arguments description follows.

Tcl_Interp * interp
If not NULL, the interpreter in which errors are reported.
Tcl_Obj * obj
Pointer to the source object.
int * blockSizeVar
Pointer to a result variable that will hold the integer.

Returns TCL_OK if a good value was found in the object integer representation, otherwise TCL_ERROR.

int Bztcl_GetWorkFactorFromObj (interp, obj, workFactorVar) Function
Extracts a valid BZip2 work factor from an object. The string default is also accepted as the value of the input object: it means 30.

Arguments description follows.

Tcl_Interp * interp
If not NULL, the interpreter in which errors are reported.
Tcl_Obj * obj
Pointer to the source object.
int * workFactorVar
Pointer to the target variable.

Returns TCL_OK if a good value was found, otherwise TCL_ERROR.
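Both extraction functions follow the same pattern: accept the word default, otherwise require an integer inside the valid range. The sketch below shows that pattern on a plain C string, so that it compiles without the TCL headers. The real functions read a Tcl_Obj and can leave an error message in the interpreter; the helper sketch_get_bounded and the SKETCH_OK and SKETCH_ERROR codes (which mirror the values of TCL_OK and TCL_ERROR) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum { SKETCH_OK = 0, SKETCH_ERROR = 1 };  /* same values as TCL_OK, TCL_ERROR */

/* Store in *resultVar either `dflt` (for the word "default") or the
   decimal integer in `text`, when it lies in the range lo..hi. */
static int sketch_get_bounded(const char *text, int lo, int hi, int dflt,
                              int *resultVar)
{
    char *end;
    long value;

    assert(text != NULL && resultVar != NULL);
    if (strcmp(text, "default") == 0) {
        *resultVar = dflt;
        return SKETCH_OK;
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE
        || value < lo || value > hi) {
        return SKETCH_ERROR;  /* the result variable is left untouched */
    }
    *resultVar = (int) value;
    return SKETCH_OK;
}
```

A block size would be extracted with the arguments 1, 9 and 3, a work factor with 0, 250 and 30. In this sketch a work factor of 0 is stored as is; its interpretation as 30 is left to the compressor.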

CONST char * Bztcl_Version (void) Function
Returns a pointer to the string representing the BZip2 version.


Node:Using stubs, Next:, Previous:Misc Functions, Up:Top

Using the stub mechanism

The stub mechanism allows us to dynamically link a client extension to a version of BZTCL and to use it with future versions, without recompiling, as long as the future versions do not change the interface.

To do this we link our client extension with the BZTCL stub library (an object file whose name is something like libbztclstub...) and compile our code with the symbol USE_BZTCL_STUB defined. Our client library's initialisation function must contain the following code:

#include "bztcl.h"

...

int
Client_Init (...)
{
...

#ifdef USE_BZTCL_STUB
  if (Bztcl_InitStub(interp, "0.5", 0) == NULL) {
    return TCL_ERROR;
  }
#endif

...
}

where 0.5 is the version of BZTCL that the client library is supposed to use.


Node:Credits, Next:, Previous:Documentation License, Up:Top

Who wrote what

From the BZip2 README file:

bzip2-1.0.2 is distributed under a BSD-style license. ...

To the best of my knowledge, bzip2 does not use any patented algorithms. However, I do not have the resources available to carry out a full patent search. Therefore I cannot give any guarantee of the above statement. ...

I hope you find bzip2 useful. Feel free to contact me at jseward@acm.org if you have any suggestions or queries. Many people mailed me with comments, suggestions and patches after the releases of bzip-0.15, bzip-0.21, and bzip2 versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1, and the changes in bzip2 are largely a result of this feedback. I thank you for your comments.

At least for the time being, bzip2's "home" is (or can be reached via) http://sources.redhat.com/bzip2.

Julian Seward Cambridge, UK (and what a great town this is!)

BZTCL was written by Marco Maggi.


Node:Package License, Next:, Previous:Using stubs, Up:Top

BSD style license

Copyright © 2002, 2003, 2004 Marco Maggi.

The author hereby grant permission to use, copy, modify, distribute, and license this software and its documentation for any purpose, provided that existing copyright notices are retained in all copies and that this notice is included verbatim in any distributions. No written agreement, license, or royalty fee is required for any of the authorized uses. Modifications to this software may be copyrighted by their authors and need not follow the licensing terms described here, provided that the new terms are clearly indicated on the first page of each file where they apply.

IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE, ITS DOCUMENTATION, OR ANY DERIVATIVES THEREOF, EVEN IF THE AUTHOR HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAVE NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.


Node:Documentation License, Next:, Previous:Package License, Up:Top

Documentation license

This document is copyright © 2002, 2003, 2004 by Marco Maggi.

Permission is granted to make and distribute verbatim copies of this document provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this document under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.


Node:Concept Index, Previous:Credits, Up:Top

An entry for each concept, command, function and data type

Table of Contents