Austrian Grid
A Report on the First Prototype of a Grid-enabled Data Management System for SEE-GRID
Document Identifier: AG-DA-1c-4-2006_v1.doc
Status: Public
Workpackage: A1c
Partner(s): Research Institute for Symbolic Computation (RISC), Upper Austrian Research (UAR)
Lead Partner: RISC
WP Leaders: Wolfgang Schreiner (RISC), Michael Buchberger (UAR)
Delivery Slip
              Name          Partner   Date         Signature
From          Karoly Bosa   RISC      2006.07.31
Verified by
Approved by
Document Log
Version   Date         Summary of changes   Author
1.0       2006-07-17   Initial Version      See cover on page 3
Karoly Bosa
Wolfgang Schreiner
Research Institute for Symbolic Computation (RISC)
Johannes Kepler University Linz
{Karoly.Bosa, Wolfgang.Schreiner}@risc.uni-linz.ac.at
Michael Buchberger
Thomas Kaltofen
Department for Medical Informatics
Upper Austrian Research (UAR)
Thomas.Kaltofen@uar.at
July 31, 2006
Abstract
1 Introduction
2 A Preliminary Integration of the SEE-GRID Database Component into the WSRF framework
2.1 The Structure of the Original Database Component
2.2 Altered Source Code and Interfaces
2.3 Building the WSRF based Database Component with Globus Build Service Tool
3 The Client Side API for Accessing the new WSRF based Database Component
3.1 Generating C Stubs and Bindings
3.2 Extensions of the SEE++ To Grid Bridge
3.2.1 Modified Messages
4 A Design for a gLite Port of the SEE-GRID Architecture
5 Outlook
6 Acknowledgements
References
Abstract
SEE-GRID is based on the SEE++ software system for the biomechanical simulation of the human eye. The goal of SEE-GRID is to extend SEE++ in several steps in order to develop an efficient grid-based tool for ``Evidence Based Medicine'', which supports surgeons in choosing optimal surgery techniques for the treatment of certain eye motility disorders.
First, we have developed a grid-enabled version of the simulation of the Hess-Lancaster test, which is a medical examination by which the pathology of the patient can be estimated. Based on this, we work on a pathology fitting algorithm that attempts to give sufficiently close estimations for the pathological reasons of the disorder. Furthermore, we have developed a prototype version of a grid-enabled medical database where both real and simulated pathological cases can be collected, sorted and evaluated in order to improve both the later pathology fitting calculations and future medical treatments.
In this document, we present a description of the elementary integration of the prototype database component into Globus 4. Then we give a short overview on a planned gLite (EGEE middleware) compatible architecture of the SEE-GRID software system and outline its proposed components.
1 Introduction
Figure 1: The Extended Architecture of SEE-GRID based on Globus 4
SEE-GRID is based on the SEE++ [SEE-KID, 2006] software system for the biomechanical 3D simulation of the human eye and its muscles. SEE++ simulates the common eye muscle surgery techniques in a graphic interactive way that is familiar to an experienced surgeon. The goal of SEE-GRID is to adapt and to extend SEE++ in several steps and to develop an efficient grid-based tool for Evidence Based Medicine, which supports the surgeons in choosing optimal surgery techniques for the treatments of different syndromes of strabismus.
In [SEE-GRID, 2006], we combined the SEE++ software with the Globus (pre-Web Service) middleware [Globus, 2006] and developed a parallel version of the simulation of the Hess-Lancaster test (a typical medical examination). By this, we demonstrated how a noticeable speedup can be achieved in SEE++ by exploiting the computational power of the Austrian Grid. Furthermore, we reported the prototype implementation of a medical database component for SEE-GRID. Finally, we designed a so-called grid-based Pathology Fitting algorithm, which would be able to determine (or at least estimate) automatically the pathological reason of a patient's strabismus.
The current architecture of SEE-GRID (the box bordered by the dashed line in Figure 1) consists of the Web Service based database services, the WSRF-based database services, the SEE++ to Grid Bridge, the grid-enabled SEE++ servers (which are started via pre-WS GRAM and perform the gaze pattern calculations) and the SEE++ clients.
According to the scenario of the parallel simulation of the Hess-Lancaster test, the bridge submits some grid-enabled SEE++ servers into the grid in advance, before it accepts computational requests from the SEE++ clients. These processes act as ``executer'' processes for the computation tasks, so that the considerable latency of a job submission is avoided for each computational request (the parallel Hess-Lancaster test simulation itself takes only about 1 to 15 seconds on the grid).
The SEE++ clients can connect to all database components located on the grid via the ``SEE++ to Grid Bridge'' (the underlying grid-based infrastructure is hidden from the clients). Furthermore, the clients can also reach every Web Service based database component via the bridge, although a client can also interact directly with a single such database. For the underlying database system, we decided to use MySQL, because in our comparative performance tests it worked together with our software architecture 4-8 times faster than PostgreSQL.
We recently finished the elementary integration of the prototype Web Service based implementation [Mitterdorfer, 2005] of the SEE-GRID database into the WSRF framework, see Section 2.
Since we have joined EGEE2 [EGEE, 2006], we designed a gLite [gLite, 2006] compatible version of SEE-GRID, see Section 4. Accordingly, we intend to further develop the SEE-GRID software system on the basis of the higher-level services of the EGEE2 middleware (compared with the low-level services of the Globus Toolkit).
2 A Preliminary Integration of the SEE-GRID Database Component into the WSRF framework
In the last project phase, we integrated the original (Web Service based) database component into Globus 4 (WSRF framework). At the moment, the only differences between the two software components are the interfaces via which they connect to the differing underlying infrastructures (Web Services vs. WSRF). Later, we plan to exploit the features of the WSRF framework (Resource Properties, Notifications, etc.) in the further development of the database component.
We kept the originally introduced security concept [Mitterdorfer, 2005], which was designed for the specific requirements of the SEE++/SEE-GRID software system. However, we will combine these authorization and authentication mechanisms with grid-style transport level security and certificates for the interactions between the bridge and the WSRF services (the communication between the bridge and the clients takes place outside of the grid).
2.1 The Structure of the Original Database Component
The source code of the original SEE-GRID database component is organized in the following subprojects:
DatabaseBuilder: The build file in this project allows setting up a database schema and filling it with reference data. Currently, PostgreSQL, MS SQL and MySQL servers are supported.
FacadeGenerator: This project implements a code generator that is needed during the build process of the Web service. It generates a remote Proxy class and interface as a facade for the Web service. For more background information, please refer to chapter 5.4.1 of [Mitterdorfer, 2005].
PersistenceComponent: This is the core project. It contains the database/persistence component itself. However, the code of this project is not aware of Web services. The project SeePersistenceServiceAxis serves the purpose of making the persistence component remotely accessible. For more information on the design and implementation of the persistence component, please refer to chapters 4 and 5 of [Mitterdorfer, 2005].
SecurityBase: This project implements a generic interceptor, which performs authentication and authorization on method calls.
SeePersistenceServiceAxis: This project uses the project FacadeGenerator to generate the facade code and generates the server-side stubs for the SOAP communication. It also creates the production databases, deploys the WAR file to the server and registers the Web service.
SQLGen: This project contains a tool that reads an XML file containing records and inserts the contents directly into a database. It is needed to fill the database with reference data and to instantiate the metamodel. The database must be accessible at build time. This might be a problem if the persistence component is to be installed at a customer site using an installer.
2.2 Altered Source Code and Interfaces
Since only the project SeePersistenceServiceAxis is responsible for the remote (Web Service based) accessibility of the SEE-GRID database component, we had to modify only this project.
In the Java source code, only minimal changes were necessary. We imported some Globus packages and the generated stubs (for generating stubs see Section 2.3) and implemented the interface org.globus.wsrf.Resource. This interface does not require any methods; it is simply a way of tagging an application as a grid resource.
Although initially there is no difference in functionality between the Web Service based and the WSRF-based versions, we still had to make some changes in the WSDL interface file, for two reasons:
Requirements of the WSRF framework
Backward incompatibility of the different Axis versions [Axis, 2006] (in the original version of the database component we applied Axis 1.2.1, while in Globus 4 [Globus, 2006] a modified version of Axis 1.2RC2 is used).
Accordingly, we made the following modifications in the WSDL file:
We added and imported some additional namespace definitions with respect to the WSRF framework like:
- xmlns:wsrp="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
- xmlns:wsrpw="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
- xmlns:wsdlpp="http://www.globus.org/namespaces/2004/10/WSDLPreprocessor"
-
The SEE-GRID SOAP protocol contains two messages, GetServerVersionRequest and GetServerStatisticsRequest, which do not have arguments. Originally, these messages were defined in the WSDL interface in the following way:
In Globus 4, this kind of message definition triggered an error when we attempted to generate stubs for the database component with a Globus 4 tool. Hence, we followed the technique described in [GDP, 2006] for such cases and defined an empty type first:
Then, the corresponding message definitions look as follows:
where ns1 denotes the namespace of the SEE-GRID database component.
We also extended the portType definition in the WSDL file:
Thanks to the wsdlpp:extends attribute of the portType element we can include existing WSRF portTypes in our own portType without having to copy-and-paste from the official WSRF WSDL files.
The entire binding section is removed from the WSDL document. Bindings are an essential part of a normal WSDL file. However, we don't have to add them, since they are generated automatically by a Globus 4 tool that is called when we build the service.
We used the same WSDD (deployment descriptor) file as in the case of the Web Services. Furthermore, we created a JNDI deployment file, which is also required for a WSRF service:
<?xml version="1.0" encoding="UTF-8"?>
factory
org.globus.wsrf.jndi.BeanFactory
This file is responsible for specifying what resource home our service has to use to get a hold of resources. However, since at this point we are only managing a single resource, our JNDI deployment file is as simple as possible. The service tag has a name attribute whose value must match the service name specified in the WSDD file. The resource tag specifies the resource home for our service.
2.3 Building the WSRF based Database Component with Globus Build Service Tool
Building the SEE-GRID database component as a WSDL service and generating a Grid Archive (GAR) file is not a trivial task. The easiest way to do this is to use globus-build-service [GBS, 2005]. It is a general-purpose Ant [Ant, 2006] build file and script that generates a GAR file from a given set of Java, WSDL, WSDD, etc. files without the need to manually edit the Ant file. Nevertheless, this tool has some bugs and several restrictions.
Since some stub files are already required for building the project PersistenceComponent, we first generated the stub files from the modified WSDL file with the following command:
./globus-build-service.sh --dir at/uarmi/seegrid/persistence/ --schema \
    schema/seepp_server/seepp_server.wsdl --target compileStubs
Then, we compiled the projects DatabaseBuilder, SQLGen, SecurityBase and PersistenceComponent into a JAR file called Persistence.jar as before. We also executed the build target generate-spring-ws-impl in the build file of the project SeePersistenceServiceAxis to generate the Web service facade from the stub files.
Since all files (source, library, configuration, etc.) of an application that is to be built with globus-build-service have to be located in a predefined directory structure, we performed the following steps:
We copied the library files used by the database component into the corresponding lib directory.
We copied the previously compiled Persistence.jar into the lib directory as well.
We removed the previously used Axis-related files (e.g. axis.jar, axis-ant.jar, jaxrpc.jar, saaj.jar and wsdl4j-1.5.1.jar), since we want to use another Axis version which is part of the Globus 4 distribution (the Globus-compatible versions of these files reside in the directory $GLOBUS_LOCATION/lib).
We copied the source files of SeePersistenceServiceAxis together with the generated source of the stub files into the corresponding impl directory, and the deployment and configuration files into the corresponding etc directory.
Before we started the build of the application with globus-build-service, there was one more thing to do. We had to modify the general Ant build file of this tool:
We modified the build target compile in the build file, because the tool contains a bug and was not able to include the JAR files from the lib directory into the $CLASSPATH.
We also extended the build target copyJars, because our lib directory contains not only JAR files but also some configuration files (e.g. some security protocol for the database). Originally, only the JAR files were copied from this directory into the generated GAR file.
Finally, we could build our SEE-GRID database component with the globus-build-service tool in the following way:
./globus-build-service.sh --dir at/uarmi/seegrid/persistence/ --schema \
    schema/seepp_server/seepp_server.wsdl --target all
After logging in as the globus user (who has write permission in $GLOBUS_LOCATION), we can deploy our service into Globus 4 with the command:
globus-deploy-gar at_uarmi_seegrid_persistence.gar
Then we can start up the globus standalone container with the command
globus-start-container -nosec
and our database component will appear in the list of services. We started the container with the argument -nosec, because at the moment we do not use any security configuration (certificates, HTTPS, etc.).
3 The Client Side API for Accessing the new WSRF based Database Component
Since the SEE++ to Grid Bridge is implemented in C/C++, we have to use the APIs of the Globus C WS Core to implement the client-side functionality of our database service.
3.1 Generating C Stubs and Bindings
First, we should generate client side stub files from the WSDL file. For this purpose, we applied the globus-wsrf-cgen tool. Unfortunately, this tool has some strict limitations, too:
It only generates bindings from Document/literal style WSDL schemas. Since our original WSDL file used RPC/encoded style, we had to regenerate the file in Document/literal style (of course, we did this before we started to create the Java stubs for the server side).
It only generates ANSI-C bindings; C++ bindings are not supported. The solution for this problem is not as easy as in the previous case. We employ very complex data structures as arguments in the SOAP messages, whose bindings are implemented in C++ in the SEE++ to Grid Bridge (this implementation was taken from the original SEE++ software). Therefore, we must implement a conversion between the generated ANSI-C bindings and the existing C++ data structures, since we do not want to re-implement the whole SEE++ to Grid Bridge on the basis of the ANSI-C data structures only.
As an alternative, we investigated the use of gSOAP for interacting with the database component (which does not use any special features of Globus 4 at the moment). However, this would be a dead end for the development, since we could not later extend our system with Globus-related solutions such as grid-style certificates, resource properties, factory services, etc.
So for generating the ANSI-C client side bindings, we issued the following command:
globus-wsrf-cgen -s at_uarmi_seegrid_persistence \
    -flavor gcc32dbgpthr -no-service seepp_server.wsdl
where we specified the package name (with the parameter -s), the build flavor (with the parameter -flavor) and the WSDL schema file. By the argument -no-service, we indicated for the tool, that we want to generate only client side stubs and types.
The generated package can be built and installed using the following command:
$GLOBUS_LOCATION/sbin/gpt-build at_uarmi_persistence_bindings-0.2.tar.gz gcc32dbgpthr
3.2 Extensions of the SEE++ To Grid Bridge
The generated package includes a client header which provides the necessary function declarations to perform the client invocations we need to make:
#include "at_uarmi_seegrid_persistence_client.h"
In the source code of the SEE++ To Grid Bridge, we had to activate the module defined for the client first (similarly as in the case of the Globus pre-WS APIs). Then we need a client handle for accessing a database service:
globus_module_activate(AT_UARMI_SEEGRID_PERSISTENCE_MODULE);
at_uarmi_seegrid_persistence_client_handle_t db_handle;
int result = at_uarmi_seegrid_persistence_client_init(&db_handle, NULL, NULL);
3.2.1 Modified Messages
Two-Way Message
Request: findPatient (User_Identification_Data, Record_of_Personal_Patient_Data)
Answer: Array_of_Records_of_Personal_Patient_Data
The parameters of this SOAP message are the user identification data and a template record of the personal patient data, which contains the search criteria. The message returns a list of the personal data of all patients who match the criteria.
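The template-based search described above can be illustrated with a minimal C sketch. It rests on assumptions that are not stated in this report: the record fields (last_name, first_name, birth_date) are hypothetical, an empty or NULL template field is assumed to act as a wildcard, and a non-empty field is assumed to require an exact match. The actual matching rules are defined by the database component.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified patient record; the real record is far richer. */
typedef struct {
    const char *last_name;
    const char *first_name;
    const char *birth_date;
} patient_record;

/* Assumed rule: an empty (or NULL) template field matches any value,
   a non-empty template field must be equal to the stored value.
   Stored values are assumed to be non-NULL. */
static int field_matches(const char *tmpl, const char *value)
{
    return tmpl == NULL || tmpl[0] == '\0' || strcmp(tmpl, value) == 0;
}

/* Collects pointers to all records in db that fit the template.
   Returns the number of hits written to out (at most max_out). */
size_t find_patient(const patient_record *tmpl,
                    const patient_record db[], size_t n,
                    const patient_record *out[], size_t max_out)
{
    size_t hits = 0;
    assert(tmpl != NULL);
    for (size_t i = 0; i < n && hits < max_out; i++) {
        if (field_matches(tmpl->last_name, db[i].last_name) &&
            field_matches(tmpl->first_name, db[i].first_name) &&
            field_matches(tmpl->birth_date, db[i].birth_date)) {
            out[hits++] = &db[i];
        }
    }
    return hits;
}
```

With a template that only sets last_name, the function returns every record with that last name, which corresponds to the "array of records" answer of the message.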
Parallel search: We are currently working on an extension of this message such that, if the SEE++ to Grid Bridge receives a findPatient message from a SEE++ client, it forwards the message to all available Web Service based and all WSRF-based database components.
For this, we intend to use the function at_uarmi_seegrid_persistence_findPatient_epr (declared in the header file at_uarmi_seegrid_persistence_client.h) for accessing the WSRF-based database services, because this function accepts the end point reference of a service as an argument (at the moment we have only one resource for every WSRF database service; therefore, we skip the resource creation step).
Then the SEE++ to Grid Bridge collects the search results received from the databases into a single list and forwards it to the SEE++ client. The database components are accessed and their results are collected concurrently by means of POSIX threads.
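The fan-out and merge step described above might be structured as in the following sketch. It is not the bridge's implementation: the types search_task and query_fn, the limits MAX_HITS and MAX_SERVICES, and the function demo_query are hypothetical names introduced for illustration. In the real bridge, the query callback would wrap the generated client stub; here demo_query merely pretends that each service holds one record named after its endpoint.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_HITS     16  /* assumed per-service result capacity */
#define MAX_SERVICES 8   /* assumed upper bound on queried services */

/* Hypothetical query callback: fills hits[] and returns the hit count. */
typedef size_t (*query_fn)(const char *endpoint, const char *hits[], size_t max);

typedef struct {
    const char *endpoint;        /* address of one database service */
    query_fn    query;           /* how this service is queried */
    const char *hits[MAX_HITS];  /* filled by the worker thread */
    size_t      count;
} search_task;

/* Illustrative stand-in for a real database call. */
size_t demo_query(const char *endpoint, const char *hits[], size_t max)
{
    if (max == 0) return 0;
    hits[0] = endpoint;
    return 1;
}

/* Thread body: each thread touches only its own task, so no lock is needed. */
static void *run_task(void *arg)
{
    search_task *t = arg;
    assert(t->query != NULL);
    t->count = t->query(t->endpoint, t->hits, MAX_HITS);
    return NULL;
}

/* Queries all services concurrently, then merges the results in task order.
   Returns the number of merged entries, or -1 on failure. */
long parallel_search(search_task tasks[], size_t n,
                     const char *out[], size_t max_out)
{
    pthread_t threads[MAX_SERVICES];
    size_t started = 0, total = 0;
    int failed = 0;

    if (n > MAX_SERVICES) return -1;
    for (; started < n; started++) {
        if (pthread_create(&threads[started], NULL, run_task,
                           &tasks[started]) != 0) {
            failed = 1;
            break;
        }
    }
    for (size_t i = 0; i < started; i++)   /* wait for all started threads */
        pthread_join(threads[i], NULL);
    if (failed) return -1;

    for (size_t i = 0; i < n; i++)         /* merge after all joins */
        for (size_t j = 0; j < tasks[i].count && total < max_out; j++)
            out[total++] = tasks[i].hits[j];
    return (long)total;
}
```

Merging only after every thread has been joined keeps the result order deterministic and avoids a shared lock, at the cost of waiting for the slowest database before anything is forwarded to the client.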
4 A Design for a gLite Port of the SEE-GRID Architecture
Figure 2: The Design of the gLite Compatible SEE-GRID
For this purpose, we again use some kind of SEE-GRID server jobs (as executers for parallel Hess calculations) started via Workload Management System (WMS) of gLite. Nevertheless, instead of forking and terminating these jobs as before to return the allocated port number to the bridge, we investigate and exploit the interactive job submission feature of gLite. After such a job is submitted, the gLite environment starts a listener process for the job on the client side. Any user application (like ``SEE++ to Grid Bridge'') is able to interact with this process through named pipes.
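As a rough illustration of communication through named pipes, the following C sketch exchanges one request and one reply over two POSIX FIFOs. It does not reproduce the gLite listener or its protocol, which are not described in this report: the function names, the "ack:" reply format and the use of a forked child as a stand-in for the job side are all assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the job side: reads one request, answers "ack:<request>". */
static void job_side(const char *to_job, const char *from_job)
{
    char req[128], rep[160];
    int in  = open(to_job, O_RDONLY);     /* pairs with the parent's writer */
    int out = open(from_job, O_WRONLY);   /* pairs with the parent's reader */
    ssize_t n = (in >= 0) ? read(in, req, sizeof req - 1) : -1;
    if (n >= 0 && out >= 0) {
        req[n] = '\0';
        snprintf(rep, sizeof rep, "ack:%s", req);
        if (write(out, rep, strlen(rep)) < 0) { /* ignored in this sketch */ }
    }
    if (in >= 0) close(in);
    if (out >= 0) close(out);
    _exit(0);
}

/* Sends msg through one FIFO and reads the reply from the other.
   Note: any existing files at the two paths are removed.
   Returns 0 on success, -1 on failure. */
int pipe_roundtrip(const char *to_job, const char *from_job,
                   const char *msg, char *reply, size_t reply_size)
{
    assert(reply_size > 0);
    unlink(to_job);
    unlink(from_job);
    if (mkfifo(to_job, 0600) != 0 || mkfifo(from_job, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) job_side(to_job, from_job);   /* child never returns */

    /* Open order mirrors the child, so neither side blocks forever. */
    int out = open(to_job, O_WRONLY);
    int in  = open(from_job, O_RDONLY);
    int ok = -1;
    if (out >= 0 && in >= 0 &&
        write(out, msg, strlen(msg)) == (ssize_t)strlen(msg)) {
        close(out);
        out = -1;
        ssize_t n = read(in, reply, reply_size - 1);
        if (n >= 0) { reply[n] = '\0'; ok = 0; }
    }
    if (out >= 0) close(out);
    if (in >= 0) close(in);
    waitpid(pid, NULL, 0);
    unlink(to_job);
    unlink(from_job);
    return ok;
}
```

Two pipes are used because a FIFO is unidirectional; both sides open them in the same order, since opening a FIFO blocks until the other end has been opened as well.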
Under gLite, we may replace the software architecture and authentication methods applied earlier for the SEE-GRID medical databases with an AMGA-based solution. AMGA [AMGA] is a database access service for grid applications, which is part of the latest release of gLite. It hides the differences between the user interfaces of the supported underlying database systems and provides unified access to them with grid-style certificate-based authentication. Since AMGA supports MySQL among others, it would be possible to use the same medical databases in the Globus Toolkit 4 and the gLite environments.
In the gLite compatible version of SEE-GRID, we will go one step further in the development and enable the system to discover automatically the available databases and the executer jobs on the grid. For this purpose, we plan to use the R-GMA information system of gLite, which also allows users to publish their own data.
5 Outlook
We have started to work on an extension of the ``SEE++ to Grid Bridge'' with the WS-GRAM Client C API. At the end of the next project phase, we intend to report the outcome of comparative benchmarks of the pre-WS and the WS-GRAM APIs as well as of the prototype (Axis based Web Service) implementation and the WSRF-based implementation of the SEE-GRID database.
The final goal is to implement a variant of SEE-GRID which uses the Web Service based interfaces of gLite services (e.g.: MWSproxy and SOAP-based interface of AMGA) and to compare its performance with the other SEE-GRID version based on the Globus Toolkit 4.
Another planned way for establishing a grid-based distributed medical database is to use the Grid Seamless Data Access Middleware (G-SDAM) [G-SDAM, 2005] developed by the Institute for Applied Knowledge Processing (FAW). The G-SDAM is still under development and the first prototype will come out in September 2006. However, the developers of G-SDAM and SEE-GRID have already started to elaborate the common requirements and to design interfaces to combine the two software components.
6 Acknowledgements
The G-SDAM framework is developed by the Institute for Applied Knowledge Processing (Institut für Anwendungsorientierte Wissensverarbeitung, FAW) as a partner of the SEE-GRID project.
References
[AMGA] AMGA User's and Administrator's Manual. http://project-arda-dev.web.cern.ch/project-arda-dev/metadata/downloads/amga-manual_1_2_3.pdf
[Ant, 2006] Apache Ant homepage, 2006. http://ant.apache.org/
[Axis, 2006] Apache Axis homepage, 2006. http://ws.apache.org/axis/
[SEE-GRID, 2006] Karoly Bosa, Wolfgang Schreiner, Michael Buchberger, Thomas Kaltofen. SEE-GRID, A Grid-Based Medical Decision Support System for Eye Muscle Surgery, 1st Austrian Grid Symposium, December 1-2, 2005, Hagenberg, Austria. OCG Verlag, 14 pages.
[Buchberger, 2004] Michael Buchberger. Biomechanical Modelling of the Human Eye. Ph.D. thesis, Johannes Kepler University, Linz, Austria, March 2004. http://www.see-kid.at/download/Dissertation_MB.pdf
[EGEE, 2006] EGEE-II homepage, 2006. http://www.eu-egee.org
[gLite, 2006] gLite 3.0.0 home page, 2006. http://www.glite.org
[Globus, 2006] The Globus Toolkit. http://www.globus.org/toolkit/
[GDP, 2006] Globus Documentation Project. The Globus Toolkit 4 Programmer's Tutorial. http://gdp.globus.org/gt4-tutorial/
[GBS, 2005] Globus Build Service 0.2.5. http://gsbt.sourceforge.net
[G-SDAM, 2005] A Report on a Unified Grid-aware Access Layer for SEE-GRID Data Sets, Austrian Grid Deliverable M-4aA-1c, FAW Institute and RISC Institute, Johannes Kepler University, Linz, August 2005. http://www.faw.uni-linz.ac.at
[gSOAP, 2005] gSOAP 2.7.0 User Guide, 2005. http://www.cs.fsu.edu/~engelen/soap.html
[Kaltofen, 2002] Thomas Kaltofen. Design and Implementation of a Mathematical Pulley Model for Biomechanical Eye Surgery. Diploma thesis, Upper Austria University of Applied Sciences, Hagenberg, June 2002. http://www.see-kid.at/download/Pulley_Model_Thesis.pdf
[Mitterdorfer, 2005] Daniel Mitterdorfer. Grid-Capable Persistance Based on a Metamodel for Medical Decision Support