[Leptonica Home Page]

Multi-platform programming

March 20, 2011

Why is multi-platform programming tricky?
C library function compatibility
C runtime differences
C runtime boundary issues for DLLs
File and directory operations

Why is multi-platform programming tricky?

Leptonica was originally developed on Linux with relatively little concern for the details of porting the library to Windows, beyond using strict ANSI C and POSIX library functions. The assumption was that it should compile under Windows, and if it didn't, we would thank Bill Gates for using standards (sarcasm intended) and then fix it up.

Within the past two years it became clear that Leptonica needed to work on Windows. The prime mover was Tom Powers, who wrote tools and documentation for Windows under VS2008. In addition, James Le Cuirot produced a set of autotools files (configure.ac, Makefile.am) that generate not only the library but also all the programs, in Cygwin and MSYS environments. Dave Bryan also contributed substantially to the effort to build everything and have it run properly in Windows. A further incentive for building cleanly on Windows is that Leptonica is now a required library for Tesseract, which is built for both Unix and Windows.

A considerable amount of work has been required to make the library and programs work in a simple and consistent way on all platforms. By "simple and consistent" I mean that, to the greatest extent possible, you should be able to write your library function or application as a single stream of source code, with a minimum of parallel code delineated by C preprocessor defines. This is a work in progress: we will continue to write functions to abstract out cross-platform differences.

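The issues that needed to be resolved fall into the following categories:

C library function compatibility
C runtime differences
C runtime boundary issues for DLLs
File and directory operations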

Most of these are handled in utils.c, and you should consult that file for details. See in particular the "Notes on cross-platform development" at the top.

C library function compatibility

There are significant differences between the string operations in Windows and Linux. The biggest problem was strncat(), which is not compatible with the Windows strcat_s() function. We wrote our own, stringCat(), using the arguably better Windows interface. It is suggested that you use the string operations in utils.c when possible.
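The interface difference is easy to miss: the count passed to strncat() limits how many characters are appended from the source, whereas the size passed to strcat_s() is the total capacity of the destination buffer. The following is a minimal sketch of a concatenation function that follows the second convention. The name boundedCat() and its return values are assumptions made for illustration; this is not the stringCat() implementation in utils.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Leptonica's stringCat()).
 * Appends src to dest, where 'size' is the TOTAL capacity of dest in
 * bytes, as in strcat_s().  Returns the number of bytes appended, or -1
 * if an argument is invalid or the result would not fit.  On failure,
 * dest is left unchanged. */
int boundedCat(char *dest, size_t size, const char *src)
{
    const char *end;
    size_t destlen, srclen;

    if (!dest || !src || size == 0)
        return -1;

    /* dest must already be NUL-terminated within its capacity */
    end = memchr(dest, '\0', size);
    if (!end)
        return -1;
    destlen = (size_t)(end - dest);

    /* Need room for src plus the terminating NUL */
    srclen = strlen(src);
    if (srclen >= size - destlen)
        return -1;

    memcpy(dest + destlen, src, srclen + 1);
    return (int)srclen;
}
```

With a buffer declared as char buf[8] = "abc", a call such as boundedCat(buf, sizeof(buf), "def") appends three bytes, while a later attempt to append "gh" is refused because the result would need nine bytes.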

The Windows functions that handle the file system are significantly different from those in Unix, so the library implements the two versions side by side. These differences are abstracted away by the Leptonica library functions. The same is true of the timing functions.

C runtime differences

The GNU C library has two runtime extensions: one redirects data written to a stream into memory, and the other lets data in memory be read as if it were in a file. Neither exists in Windows, so they are either stubbed out or simulated by writing to and reading from the file system.
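For readers who have not met these extensions, they are presumably open_memstream() and fmemopen(), which glibc provides and which are also in POSIX.1-2008. The sketch below shows both on a Unix-like system. The helper names writeToMemory() and readLineFromMemory() are invented for this example, and the code will not build against the Windows C runtime, which is exactly the gap described above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose open_memstream() and fmemopen() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write text to a memory-backed stream.
 * Returns a malloc'd buffer that the caller must free; *psize receives
 * the number of bytes written (not counting the trailing NUL). */
char *writeToMemory(const char *text, size_t *psize)
{
    char *buf = NULL;
    FILE *fp = open_memstream(&buf, psize);
    if (!fp)
        return NULL;
    fputs(text, fp);
    fclose(fp);   /* buf and *psize are final after the stream is closed */
    return buf;
}

/* Hypothetical helper: read the first line of a memory buffer as if the
 * buffer were a file.  Returns 0 on success, -1 on failure. */
int readLineFromMemory(const char *data, size_t size,
                       char *line, int linesize)
{
    char *ret;
    /* The buffer is opened read-only, so it is not modified */
    FILE *fp = fmemopen((void *)data, size, "r");
    if (!fp)
        return -1;
    ret = fgets(line, linesize, fp);
    fclose(fp);
    return ret ? 0 : -1;
}
```

A portable fallback for Windows would replace each helper with a write to, or a read from, a temporary file, at the cost of touching the file system.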

C runtime boundary issues for DLLs

Problems arise when pointers to streams and data are passed between two Windows DLLs that have been generated with different C runtimes. To avoid this, Leptonica provides wrappers for several C library calls. Always use these functions:
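As a rough illustration of why such wrappers help: each C runtime keeps its own heap and its own table of open streams, so a FILE pointer or a block of memory should be released by the same runtime that created it. If the library exports thin wrappers, every open, close, allocate and free runs inside the library's own runtime, whichever runtime the caller was built with. The sketch below uses a made-up library prefix, mylib_, and a made-up export macro, MYLIB_API; it is not a copy of the Leptonica wrappers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical export macro: marks functions for export from a DLL */
#ifdef _WIN32
#define MYLIB_API __declspec(dllexport)
#else
#define MYLIB_API
#endif

/* Streams opened here belong to the library's C runtime ... */
MYLIB_API FILE *mylib_fopen(const char *filename, const char *mode)
{
    return fopen(filename, mode);
}

/* ... so they must also be closed here, not with the caller's fclose() */
MYLIB_API int mylib_fclose(FILE *fp)
{
    return fp ? fclose(fp) : EOF;
}

/* Memory allocated on the library's heap ... */
MYLIB_API void *mylib_calloc(size_t nmemb, size_t size)
{
    return calloc(nmemb, size);
}

/* ... is released on the same heap */
MYLIB_API void mylib_free(void *ptr)
{
    free(ptr);
}
```

An application would then pair mylib_fopen() with mylib_fclose(), and would pass any buffer returned by the library to mylib_free() rather than to its own free().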

File and directory operations

Most of the multi-platform issues arise because we are reading and writing files. A number of cross-platform functions have been written for handling files in the "temp" directory tree. Always use them for such files:

Note that lept_rmdir removes all files in the directory as well. Also, these five functions can only write and remove files in the "temp" directory tree, with the root /tmp on Unix and its analog on Windows. That way it is not possible to accidentally alter files in other parts of the file system.

Windows does not have a /tmp directory, so it is necessary to find the appropriate directory to write to. Two functions are provided that use the appropriate directory when generating filenames. Both take Unix-style pathname separators as input and generate platform-dependent pathnames:

All functions take Unix path separators except for these two:

There are also functions for reading and writing binary data, which also use the "/tmp" name translation above for Windows:

Finally, there are higher-level copy functions that use these, such as

These functions can write anywhere in the file system, and use the "/tmp" name translation for Windows. For example, fileConcatenate() calls l_binaryRead(), which calls fopenReadStream(). That function in turn calls genPathname() to get the canonical cross-platform filename for reading, before calling fopen().
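The "/tmp" name translation can be pictured with a small sketch. The function translatePath() below is hypothetical and is not how genPathname() is implemented. It assumes the caller supplies the platform's temp root (on Windows this might be obtained from GetTempPath()) and the platform's separator character.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch (not Leptonica's genPathname()).
 * Rewrites a Unix-style path for the target platform:
 *   - a leading "/tmp" component is replaced by 'tmproot'
 *   - every '/' is replaced by 'sep'
 * Returns 0 on success, -1 on bad input or if 'out' is too small. */
int translatePath(const char *unixpath, const char *tmproot, char sep,
                  char *out, size_t outsize)
{
    int n;
    char *p;

    if (!unixpath || !tmproot || !out || outsize == 0)
        return -1;

    /* Match "/tmp" only as a whole leading component, so that a path
     * such as "/tmpfoo" is left alone */
    if (strncmp(unixpath, "/tmp", 4) == 0 &&
        (unixpath[4] == '/' || unixpath[4] == '\0'))
        n = snprintf(out, outsize, "%s%s", tmproot, unixpath + 4);
    else
        n = snprintf(out, outsize, "%s", unixpath);

    if (n < 0 || (size_t)n >= outsize)
        return -1;   /* the result did not fit */

    /* Convert separators to the platform's separator */
    for (p = out; *p; p++) {
        if (*p == '/')
            *p = sep;
    }
    return 0;
}
```

Called with the path "/tmp/lept/out.png", a temp root of C:\Temp and a backslash separator, this produces C:\Temp\lept\out.png. Called with a temp root of "/tmp" and a '/' separator, it returns the path unchanged, which is the Unix case.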

The bottom line: if you want your application to run on Windows, use these functions when writing applications using Leptonica.


Creative Commons License
Leptonica by Dan Bloomberg is licensed under a Creative Commons Attribution 3.0 United States License.