COMING OF AGE: THE POSITIVE LEGACY OF FOSS GIS
Software projects pass through various stages during their development and usage lifecycles. For successful projects, this results in improved code quality, reliability and performance. As a consequence, projects that were started in the past can be considered legacy with respect to the very latest IT approaches. The term is usually used in a derogatory way. Several Open Source Software projects have gone through many iterations of these cycles and could rightfully be called legacy while they still serve their intended purpose. GRASS GIS is used here as an example of a software project with a positive legacy. This kind of legacy is a valuable asset for software development: existing code is carefully maintained to keep it functional, and cutting-edge technologies are incorporated once they themselves have stood the test of time. It is shown how the rise of Integrated Development Environments (IDEs) will help to future-proof the project by gradually yet strategically enlarging the developer base.
Perceptions of Legacy
Wikipedia describes two ways to apply the term legacy to software, which seem to be almost mutually exclusive. On the one hand, in an innovation-friendly view, application programs are considered legacy if they continue to be used because the user does not want to replace or redesign them. On the other hand, from a task-solving perspective, legacy software often differs from its suggested alternative by actually working and scaling. In this sense the term is not pejorative at all; it denotes a fully developed tool that serves its purpose well. It will be shown that, for software development, at least two further factors are related to legacy.
The Legacy of GRASS GIS: Alive and kicking
The Geographic Resources Analysis Support System (GRASS) GIS project has been evolving for more than two decades. Development was started in a military setting by the U.S. Army Construction Engineering Research Laboratories (USA-CERL), where it lasted from 1982 to 1995. Afterwards, the code was handed over to academia, where a new phase of development began. Licensing under the GNU General Public License (GPL) in 1999 resulted in a dramatic increase in development activity. Over the years, the functionality evolved from a raster-based GIS to include floating point operations, a topology-based vector model, volume support and finally OGC services. Apart from its traditional use as a desktop GIS, GRASS also serves as part of the backend of other applications, such as Quantum GIS and JGRASS (uDig). For web mapping it can be used with UMN Mapserver or as a standalone OGC WPS service (GeOnAs, pyWPS).
Coding Paradigm
The code follows a highly modular paradigm modelled on (scriptable) Unix shell commands. The canonical codebase consists of more than 300 ANSI C modules; additional add-on scripts and modules are provided and hosted independently by the user communities. The current version comprises about 500,000 lines of source code. Since 2006, write access to the code repository has been controlled by the GRASS Project Steering Committee.
Serial Code Development
A relatively small multinational group of regular contributors voluntarily adds new features and updates old ones. The majority of the developer community is highly familiar with the code structure and uses text-based development tools such as vi or Emacs. For newcomers, this situation results in a steep learning curve. In addition, the ongoing development of the libraries forces contributors of add-on C modules to keep adapting their code. As a consequence, many C-based add-on modules become defunct over time, while shell-style scripts remain usable because the C module interfaces stay unchanged despite the internal changes.
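The point that scripts depend only on the stable command-line interface of the modules can be illustrated with a small sketch. It composes module calls in the `key=value` style used by GRASS modules without executing anything. The helper `module_command`, the map names and the exact parameter names are assumptions made for illustration; parameter names differ between GRASS versions and should be checked against the manual of the installed release.

```python
from typing import List


def module_command(name: str, flags: str = "", **params: str) -> List[str]:
    """Build an argument list for a GRASS-style module call.

    Hypothetical helper: it only formats the call (module name,
    optional -flags, then key=value parameters) and runs nothing.
    """
    argv = [name]
    if flags:
        argv.append("-" + flags)
    # Keyword arguments keep their insertion order (Python 3.7+).
    argv.extend(f"{key}={value}" for key, value in params.items())
    return argv


# Illustrative chain; map and parameter names are assumptions.
chain = [
    module_command("g.region", raster="elevation"),
    module_command("r.slope.aspect", elevation="elevation", slope="slope"),
]

for argv in chain:
    # A script would hand each list to the shell or to subprocess.run().
    print(" ".join(argv))
```

As long as a module keeps its name and parameters, a chain written this way is unaffected by changes inside the C libraries, which matches the different fates of scripts and C add-ons described above.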
Arguing the case for extended IDE usage
In recent years, Integrated Development Environments (IDEs) have become widely available for code development. They allow for easy navigation in large code repositories and for collaborative development. Many programmers take the availability of IDEs for granted. It therefore makes sense to foster know-how on GRASS development with IDEs such as Eclipse/CDT. Moreover, thanks to code-tracking and refactoring tools, add-on modules could be updated to the latest standards much more easily by their authors. The decision of the GRASS GIS community to stick to C code, in the spirit of positive legacy, allowed for the successful deployment of a native version for Microsoft Windows, avoiding the performance losses of Cygwin-based installations. Since IDEs such as Eclipse are platform independent, active code development also becomes possible on non-Linux systems such as Microsoft Windows. Legacy also has a dark side: the current pre-IDE developers will eventually retire, and their know-how and skill will be lost. This makes it even more important to enable IDE-based development and to help document the code while there is time. An IDE can be used as a convenient frontend for source code development, maintenance and the building of binary executables.
The issue of software legacy can be split into two distinct aspects: the overall quality of the source code itself, including knowledge preservation by means such as comments, and the mechanisms (toolchains) required to derive executables. For the latter, GRASS GIS relies on a non-standard toolchain. Standard approaches like configure/make/install or cmake are (still) not supported. This can be perceived as a case of positive legacy (if it works, don't change it), yet this approach could create a bottleneck for platform-independent development. Fortunately, the custom toolchain for building GRASS GIS binaries can be encapsulated in an Ant toolchain. Ant is commonly used for Java applications and is part of the Eclipse IDE. This makes it possible to manage the whole software development process in one IDE, from checking out the latest sources from the SVN repository through code editing and configuration to the building of executables, making it truly platform independent.
Conclusion
As has been shown, GRASS GIS has proven over the last two decades to be a working and scaling geoinformation system environment. It continues to grow and is very much alive. The development approach of the community is basically conservative, yet it integrates additional technologies once they are considered mature and relevant. The use of IDEs is expected to extend and rejuvenate the developer community. It will also broaden and speed up the development process in a second way, as it enables active development on non-Linux systems. This demonstrates the positive, stabilizing effects of trusted software which simply works well and can continue to be developed on various platforms.
Outlook
A tool to describe processing chains would be desirable to document and manage the community's know-how about orchestrating GRASS modules to cope with complex tasks. Usually every task can be solved in various ways, some of them more efficient than others. As long as this knowledge is archived only marginally, the departure of experienced members from the user community will result in massive losses of know-how and skill. A likely tool for this task is CyberIntegrator (CI), a workflow-based system that supports interactive workflow creation, connection to external data and event streams, provenance tracking, and incorporation of workflow fragments and functionality from other systems and applications. Trials for the integration of CI and GRASS GIS are underway. Once such tools have been tied into the GRASS GIS environment, the future of a community-driven FOSS geoinformatics project will be secured for years to come.
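What a documented processing chain might record can be sketched in a few lines. This is not the CyberIntegrator data model or API; it is a minimal, assumed structure in which every step names its module, its input and output maps and a free-text note, together with a check that each input is available before it is used. The class `Step`, the function `missing_inputs` and all map names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Step:
    """One documented step of a processing chain (hypothetical structure)."""
    module: str          # module name, e.g. "r.slope.aspect"
    inputs: List[str]    # maps the step reads
    outputs: List[str]   # maps the step creates
    note: str = ""       # why this step is done this way


def missing_inputs(steps: List[Step],
                   available: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (step index, map name) for every input not yet available."""
    known = set(available)
    problems = []
    for index, step in enumerate(steps):
        for name in step.inputs:
            if name not in known:
                problems.append((index, name))
        # Outputs of this step become available to later steps.
        known.update(step.outputs)
    return problems


# Illustrative chain; map names are assumptions.
steps = [
    Step("r.slope.aspect", ["elevation"], ["slope"], "derive slope from DEM"),
    Step("r.mapcalc", ["slope"], ["steep_slopes"], "threshold the slope map"),
]
print(missing_inputs(steps, ["elevation"]))
```

Keeping the note next to each step is the part that addresses the knowledge loss discussed above: the chain records not only what was run, but why.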
References
Eclipse for GRASS GIS development
Legacy System Definition (Wikipedia)
Illustration
The image shows the Eclipse IDE with the CDT perspective for C code development. The current GRASS GIS source code has been downloaded from the SVN repository and has just been compiled via the Ant-wrapped GRASS build chain. The resulting binary has just been invoked as the last step of the processing chain.