The library functions that support dataset handling fall roughly into three groups. The first group creates and deletes datasets and optimizes the associated memory management. Functions such as malloc_dataset, malloc_sample and free_dataset take care of this.
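To make the memory management idea concrete, here is a minimal sketch of one possible approach: all samples of a dataset live in a single contiguous block, so creating or deleting a dataset costs two allocations regardless of the number of samples. The type toy_dataset and the toy_* functions are hypothetical names invented for this illustration; they are not the library's malloc_dataset, malloc_sample or free_dataset, whose signatures and internal layout are not described here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout for illustration only, not the library's own type. */
typedef struct {
    size_t n_samples;
    size_t n_inputs;
    size_t n_targets;
    double *values;   /* n_samples * (n_inputs + n_targets) doubles in one block */
} toy_dataset;

/* Allocate the header and all sample values; returns NULL on failure. */
toy_dataset *toy_dataset_alloc(size_t n_samples, size_t n_inputs, size_t n_targets)
{
    size_t width = n_inputs + n_targets;
    toy_dataset *ds;

    assert(n_samples > 0 && width > 0);
    ds = malloc(sizeof *ds);
    if (ds == NULL)
        return NULL;
    ds->values = calloc(n_samples * width, sizeof *ds->values);
    if (ds->values == NULL) {
        free(ds);
        return NULL;
    }
    ds->n_samples = n_samples;
    ds->n_inputs = n_inputs;
    ds->n_targets = n_targets;
    return ds;
}

/* Pointer to the inputs of sample i; its targets follow directly after them. */
double *toy_sample_inputs(toy_dataset *ds, size_t i)
{
    assert(i < ds->n_samples);
    return ds->values + i * (ds->n_inputs + ds->n_targets);
}

/* Pointer to the targets of sample i. */
double *toy_sample_targets(toy_dataset *ds, size_t i)
{
    return toy_sample_inputs(ds, i) + ds->n_inputs;
}

/* Release the value block and the header; a NULL argument is ignored. */
void toy_dataset_free(toy_dataset *ds)
{
    if (ds != NULL) {
        free(ds->values);
        free(ds);
    }
}
```

A caller would allocate with toy_dataset_alloc(3, 2, 1), fill each sample through the two accessor functions, and release everything with one call to toy_dataset_free.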
The second group of functions stores and retrieves datasets. Datasets are stored on disk in a straightforward ASCII format. For very large datasets the library supports adaptive Lempel-Ziv compression of these files, transparently to the user, on Unix machines and on the PC (the latter only with the djgpp gcc distribution). The functions fprintf_dataset, fprintf_compressed_dataset, fscanf_dataset, fscanf_compressed_dataset and load_dataset take care of writing and loading datasets.
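The following sketch illustrates what storing a dataset in a plain ASCII format can look like, using only the standard C I/O functions. The file layout (a header line with the row and column counts, followed by one line per sample) and the names toy_write_ascii and toy_read_ascii are assumptions made for this example; the library's actual file format and the signatures of fprintf_dataset and fscanf_dataset may differ, and compression is not covered.

```c
#include <assert.h>
#include <stdio.h>

/* Write a "rows cols" header, then one text line per row.
   Returns 0 on success, -1 on an I/O error. */
int toy_write_ascii(FILE *fp, const double *values, size_t n_rows, size_t n_cols)
{
    size_t r, c;

    assert(fp != NULL && values != NULL);
    if (fprintf(fp, "%lu %lu\n", (unsigned long)n_rows, (unsigned long)n_cols) < 0)
        return -1;
    for (r = 0; r < n_rows; r++) {
        for (c = 0; c < n_cols; c++) {
            /* 17 significant digits are enough to restore an IEEE double exactly */
            if (fprintf(fp, "%.17g%c", values[r * n_cols + c],
                        c + 1 < n_cols ? ' ' : '\n') < 0)
                return -1;
        }
    }
    return 0;
}

/* Read the header, then rows * cols values into the caller's buffer.
   Returns 0 on success, -1 on malformed input or if the buffer is too small. */
int toy_read_ascii(FILE *fp, double *values, size_t capacity,
                   size_t *n_rows, size_t *n_cols)
{
    unsigned long rows, cols;
    size_t i, total;

    assert(fp != NULL && values != NULL && n_rows != NULL && n_cols != NULL);
    if (fscanf(fp, "%lu %lu", &rows, &cols) != 2)
        return -1;
    if (cols != 0 && rows > capacity / cols)   /* also guards against overflow */
        return -1;
    total = (size_t)rows * (size_t)cols;
    for (i = 0; i < total; i++) {
        if (fscanf(fp, "%lf", &values[i]) != 1)
            return -1;
    }
    *n_rows = (size_t)rows;
    *n_cols = (size_t)cols;
    return 0;
}
```

Writing to a stream opened with tmpfile(), calling rewind() and reading back returns the same values, which is a convenient check when experimenting with a text format of this kind.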
The last group of functions allows the user to scale the inputs and targets in an arbitrary way. Because some methods depend on the order in which the samples are evaluated, a function that randomizes the order in which the samples are stored is available too: random_order_dataset takes care of randomizing a dataset.
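As an illustration of these two operations, the sketch below rescales one column of a row-major array linearly onto a chosen interval and shuffles whole rows with a Fisher-Yates shuffle, so that the inputs and targets of a sample stay together. The names toy_scale_column and toy_shuffle_rows, the row-major layout and the use of rand() are assumptions for this example; they do not describe how random_order_dataset or the library's scaling functions are implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Map column col of a row-major n_rows x n_cols array linearly onto [lo, hi].
   A constant column has no spread to rescale and is set to lo. */
void toy_scale_column(double *values, size_t n_rows, size_t n_cols,
                      size_t col, double lo, double hi)
{
    double min, max, span;
    size_t r;

    assert(values != NULL && n_rows > 0 && col < n_cols);
    min = max = values[col];
    for (r = 1; r < n_rows; r++) {
        double v = values[r * n_cols + col];
        if (v < min) min = v;
        if (v > max) max = v;
    }
    span = max - min;
    for (r = 0; r < n_rows; r++) {
        double *v = &values[r * n_cols + col];
        *v = span > 0.0 ? lo + (*v - min) * (hi - lo) / span : lo;
    }
}

/* Fisher-Yates shuffle of whole rows. Uses rand(), so call srand() first
   for a different order on each run. */
void toy_shuffle_rows(double *values, size_t n_rows, size_t n_cols)
{
    size_t i, c;

    assert(values != NULL);
    for (i = n_rows; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* slight modulo bias, acceptable in a sketch */
        for (c = 0; c < n_cols; c++) {
            double tmp = values[(i - 1) * n_cols + c];
            values[(i - 1) * n_cols + c] = values[j * n_cols + c];
            values[j * n_cols + c] = tmp;
        }
    }
}
```

Shuffling complete rows rather than individual values matters here: a method that is sensitive to evaluation order sees the samples in a new sequence, while every sample keeps its own inputs and targets.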