Master of Science (MSc) in Engineering Technology Electronics-ICT
Proceedings of MSc thesis papers Electronics-ICT
Academic year




Introduction

We are proud to present this second edition of the Proceedings of M.Sc. thesis papers from our Master students in Engineering Technology: Electronics-ICT. Thirty-two students report here the results of their research, which was carried out in companies, in research institutions and in our own department. The results are presented as papers and collected in this text, which aims to give the reader an idea of the quality of the student-conducted research. Both theoretical and application-oriented articles are included. These proceedings can be downloaded via the menu items "onderzoek" (research) and "Wetenschappelijke papers" (scientific papers).

Our research areas are:
Electronics
ICT
Biomedical technology

We hope that these papers will give the opportunity to discuss new ideas in current and future research with us and will result in new ways of collaboration.

The Electronics-ICT team
Patrick Colleman, Tom Croonenborghs, Joan Deboeck, Guy Geeraerts, Peter Karsmakers, Paul Leroux, Vic Van Roie, Bart Vanrumste, Staf Vermeulen


Java-based local streaming of multimedia files (June 2011)

S. Bel 1, T. Larsson 2, P. Colleman 1
1 IBW, Katholieke Hogeschool Kempen, Kleinhoefstraat 2, B-2440 Geel, Belgium
2 ES, Högskolan Halmstad, Box 823, Halmstad, Sweden

Abstract
When playing media from a remote device, several solutions exist at the time of writing. Streaming is one of those possibilities and can be implemented without high demands on hardware or network structure. Thanks to the flexibility of streaming, different streaming protocols can be used to suit different situations, in this case HTTP for the local area network. This concept can be implemented using Java and an available library found in the large online community supporting this programming language. The library in question is called vlcj and is well known because it descends from the popular VLC media player.

Index Terms: Java, streaming, HTTP, local, vlcj.

I. INTRODUCTION

Because of the growing demand for digital video applications on the internet and the broadcasting of media, different solutions have been developed and are still being developed. A lot of media is nowadays available on the internet and can be watched on the webpage itself, for example on YouTube. The objective of this paper is to watch media remotely without using the internet as a medium. The environment for this application is the local area network and the media will be hosted on a regular peer in the network, so no high performance superpeer or server is needed. In this way the flexibility of the application can be guaranteed, because the hosting peer does not have high requirements. Another fact that provides flexibility is cross-platform capability. With web applications this guarantee exists; you only need to take the layout of the web application on the different platforms into account. For this research topic, the solution is based upon a standalone application. Since this includes choosing a programming language, a requisite will be the ability to develop one program that suits all (or almost all) platforms existing at the time of writing. Furthermore the application needs to be simple in layout and able to reach a large audience. This requisite can be fulfilled by using the streaming concept. With this you build up a structure of different peers, with for example one peer holding the required multimedia file to be played on the other peers. Streaming allows you to host the media with a range of different protocols (for example the HyperText Transfer Protocol, HTTP) and this stream can be opened by the other peers: an easy solution that can be implemented with the technologies described in this paper. When combining the two requisites and the streaming concept, the research points us in the direction of Java in combination with streaming libraries. In the sections below the Java bindings for VLC media player, or vlcj, will be discussed, because this is the most suitable answer to the questions posed in this research topic. The topic is discussed first with a short explanation of the prerequisites of the network structure, what is included in the research and what is considered when implementing the solution. Afterwards the choice of the programming language is defended with the advantages Java offers. This is followed by the streaming concept: how streaming in a local area network takes place and which protocols can be used. Based upon this section the paper continues with the Java bindings for VLC media player (vlcj), the streaming library used.
To conclude, the proposed solution is tested using a test package provided by vlcj and a global conclusion is made.

II. PREREQUISITES

Before starting to describe the problems and solutions of the research, it is important to know what is included in the research area of this paper and what is not. The objective is to provide an application that is able to host a multimedia file and to play this file remotely on another peer in the network. When we compare this with the different layers of the OSI (Open Systems Interconnection) model, we can conclude that the objective of this paper covers only the top layers of the communication process. The bottom layers fall outside the boundaries of the research area and are set up and maintained by protocols that will not be discussed here. This includes the setup of the different connections, which in most cases will be done by TCP/IP, the

maintaining of the connections between the clients, and so on. The network structure, containing the different switches and peers, needs to be in full working order and is not a requirement addressed by the developed solution.

Fig. 1. The local area network structure that makes the implementation of streaming flexible; no centralization of data is necessary, because no high performance server is mandatory for setting up the streaming concept.

III. PROGRAMMING LANGUAGE

An important fact in the development of streaming applications is that they need to be cross-platform. The multimedia files will be hosted on different peers, possibly with different operating systems. To overcome this obstacle, the choice of programming language is determined by its cross-platform capability. A range of languages exists with which cross-platform applications can be developed. Java is such a language and was chosen because it not only provides cross-platform capability, it also maintains the stability of the applications, which can be an issue with this kind of programming. Java utilizes the Java Virtual Machine (JVM), which embodies WORA, or write once, run anywhere, is well implemented and even features automatic exception handling, which provides information about an error independent of its source, making debugging a lot easier. The JVM can execute CLASS or JAR files, and nowadays Just-In-Time (JIT) compiling is used to achieve greater speed and overcome the interpretation overhead of the JVM. Resource utilization is also kept low when using Java. Applications written in Java rely on automatic garbage collection while executing, keeping the needed resources down and eliminating the need to dispose of created objects or pointers. Another advantage is the small footprint of a Java application; for the developed solution the size is less than 100 kilobytes. For more information about the Java programming language, consult [1] [2].

Aside from all the advantages of the Java programming language itself, the online community supporting Java is very large. Developing in Java can thus be made a lot easier by consulting these resources, especially the available libraries, which have been eagerly used in this paper. As a matter of fact, the library utilized in this paper is an example of this; it is addressed in the following sections.

IV. STREAMING CONCEPT

For playing multimedia that is stored remotely, streaming is a solid solution. With streaming, one side is the streaming provider and the other side contains the streaming receivers. Since one stream can be opened by multiple receivers, streaming has become a popular solution for digital video applications on the internet. Aside from this usage, it is also very suitable for the local area network. Streaming a multimedia file can be described as opening the file and making it accessible to others using different protocols. Supported protocols are HTTP, the Real Time Streaming Protocol (RTSP), the Real-time Transport Protocol (RTP), the User Datagram Protocol (UDP), etc. The advantage of the different supported protocols is that you are able to choose the protocol that is suitable for each situation. In the case of local area network streaming, the utilized protocol will be HTTP. HTTP is well supported and, as will be seen later on, the chosen Java library supports it as well.

Fig. 2. Simple visualization of the streaming concept with the HTTP protocol.
When opening a stream, the streaming provider needs to convert the multimedia file into an (HTTP) stream that can be opened by others. This conversion does not require much processing power and every notebook or desktop is capable of doing it. However, when the multimedia file

uses a rare codec, one that is not supported by the streaming provider application, the streaming provider needs to convert the file on the fly to a supported format, which demands a lot of resources. This problem can be encountered when using the DLNA protocol [3] [4].

V. JAVA BINDINGS FOR VLC MEDIA PLAYER

When using streaming in combination with the Java programming language, the possible streaming libraries are numerous. One example is the Java Media Framework (JMF), a big, heavy-weight library that includes a lot of extra components that are not needed in this research. Another drawback is that JMF relies on native libraries, which in case of a mismatch need to be installed on the host machine [5]. After researching the different available libraries, one solution looked very appealing. The Java bindings for VLC media player are based upon the existing VLC media player, which is a very popular media player at the time of writing. A couple of advantages made this solution the most suitable: good codec support, flexible and lightweight, streaming support with multiple protocols (HTTP included) and a large online community. All supported codecs can be found on the VLC media player website [6]. For development and testing purposes vlcj provides a test package with most features of the library included. With it, developers are able to evaluate the library before implementing it. The testing of the solution will also be based upon this test package.

VI. PRACTICAL IMPLEMENTATION

The vlcj library is used for the streaming provider and the streaming receiver side using the HTTP protocol. The provider side requires the most development time, because the multimedia file needs to be selected and the streaming needs to be set up. The receiver side requires only a few lines of code, considering the few actions that need to be implemented: it only needs to open the broadcasted stream. The streaming provider mainly uses two vlcj classes: HeadlessMediaPlayer and MediaPlayerFactory. The following example code contains the setup of the stream.

String[] libvlcArgs = {<<arguments of VLC>>};
MediaPlayerFactory mediaPlayerFactory = new MediaPlayerFactory(libvlcArgs);
HeadlessMediaPlayer mediaPlayer = mediaPlayerFactory.newMediaPlayer();

Listing 1. Streaming provider code for setting up the necessary classes.

MediaPlayerFactory is used to create the class containing the media player, together with arguments concerning the playback of media. After the MediaPlayerFactory, the HeadlessMediaPlayer is created; this is the media player class able to stream the multimedia file.

String mediaOpt = ":sout=#duplicate{dst=std{access=http,mux=ts,dst=<<ip:portnr>>}}";
mediaPlayer.setStandardMediaOptions(mediaOpt);
mediaPlayer.playMedia(<<media file>>);

Listing 2. Streaming provider code for opening the multimedia file and initiating the stream using the HTTP protocol.

The HeadlessMediaPlayer class has now been given the media options to stream the multimedia file using the HTTP protocol, and with mediaPlayer.playMedia() the streaming is started. The IP address that is used as a media option determines who can open the network stream; it is possible to use a broadcast address, so that the stream can be opened by multiple receivers. The port number decides on which port the stream will be available and this port must be known by the receiver side.
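To make the two provider listings concrete, the fragment below combines them into one small, self-contained program. This is only a sketch under assumptions not stated in the paper: the import paths and the newHeadlessMediaPlayer() factory method follow a later vlcj (3.x) layout and may differ from the vlcj version used here, and the media path and ip:port values are hypothetical placeholders.

import uk.co.caprica.vlcj.player.MediaPlayerFactory;
import uk.co.caprica.vlcj.player.headless.HeadlessMediaPlayer;

// Minimal streaming provider sketch based on Listings 1 and 2.
// Assumptions: vlcj and a native VLC installation are available;
// the media path, IP address and port below are placeholders.
public class StreamingProvider {

    public static void main(String[] args) throws InterruptedException {
        String media = "/path/to/video.mp4";        // hypothetical media file
        String destination = "192.168.1.255:5555";  // hypothetical (broadcast) ip:port

        // Factory and headless player, as in Listing 1 (no GUI is needed on the provider).
        MediaPlayerFactory factory = new MediaPlayerFactory();
        HeadlessMediaPlayer player = factory.newHeadlessMediaPlayer();

        // HTTP stream output option, as in Listing 2.
        String mediaOpt = ":sout=#duplicate{dst=std{access=http,mux=ts,dst=" + destination + "}}";
        player.setStandardMediaOptions(mediaOpt);
        player.playMedia(media);

        // Keep the JVM alive while streaming; a real application would manage this properly.
        Thread.sleep(60 * 60 * 1000L);

        player.release();
        factory.release();
    }
}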
The provider side of the streaming is now set up and, if the media options are set correctly, it is easy for the receiving side to open the stream with the vlcj library.

String[] libvlcArgs = {<<arguments of VLC>>};
MediaPlayerFactory mediaPlayerFactory = new MediaPlayerFactory(libvlcArgs);
HeadlessMediaPlayer mediaPlayer = mediaPlayerFactory.newMediaPlayer();
String mediaOpt = <<options for playback>>;
mediaPlayer.setStandardMediaOptions(mediaOpt);
mediaPlayer.playMedia(<<ip:portnr>>);

Listing 3. Streaming receiver code to open the HTTP stream set up by the streaming provider.

First the client sets up the required classes to be able to open the stream, the same as on the provider, except for the media options. The last code line is the most important here: it opens the streamed multimedia file. The provider and the receiver side are implemented in an easy to understand way using the vlcj library. When implementing the local streaming with different protocol options, the information can be found on the vlcj developer's website [7] [8]. This paper will now continue with the testing of the Java bindings for VLC media player and finish off with a conclusion.

VII. TESTING AND CONCLUSION

The previous section described the usage of the vlcj library; this section shows the results of the test and concludes the research. As testing method, the standard VLC media player is used in combination with the test package provided by vlcj. As mentioned in the previous section, the vlcj test package includes most of the features available in the vlcj library. Since streaming with HTTP is included in this package as well, the testing is based upon this provided solution. The test setup is a Windows 7 streaming provider together with a Windows XP virtual machine (using Microsoft Virtual PC). The Windows 7 machine is used to set up the streaming of an .mp4 file encoded with x264 (Matroska) for video and MPEG AAC for audio. The virtual Windows XP machine uses the vlcj test package to open the stream with the HTTP protocol. The communication between the two applications will

be done over the local area network, using a virtual network adapter. Figure 3 shows the setup of the test approach: at the left the VLC media player on the Windows 7 machine that is streaming the multimedia file, and at the right the Windows XP virtual machine with the Java test application that opens the stream using HTTP.

The results fulfilled the expectations of vlcj. The requirements posed in the first section were local streaming, large codec support and low processing power. Local streaming was successful and verified by testing with the test application. Large codec support was confirmed by using a more unusual codec to stream over the local network. The required processing power was also very low: on an Intel T6670 dual core processor the load stayed below 7% while streaming the multimedia file over a period of 40 minutes.

Java together with the vlcj library is a solid streaming solution that can be used in the local area network; even implementing this across the World Wide Web is possible with HTTP as streaming protocol. For playing multimedia files on a remote computer, streaming is a flexible way of solving the problem. The streaming provider is not required to have high performance hardware, since the streaming only demands low processing power. The streaming receiver opens the stream when it is allowed access to it and when it knows the IP address and port number of the streamed media. Java as programming language made it possible to program for multiple platforms at once, creating an application that is cross-platform. In combination with JIT compiling and the JVM, Java was stable, high in performance and versatile at the same time. When implementing an application with Java, libraries are available from the large online Java community. The Java bindings for VLC media player were found in this community and gave great results: first of all the codec support covers almost all codecs available nowadays, and secondly the implementation went without problems. After using the test package provided by vlcj, the conclusion was very clear: the combination of streaming and Java, while making use of vlcj, makes a solid base for streaming multimedia over the local area network.

REFERENCES
[1] Tim Lindholm and Frank Yellin, "The Structure of the Java Virtual Machine," in The Java Virtual Machine Specification, 2nd ed., Addison-Wesley, 1999, ch. 3.
[2] "Oracle Technology Network - Java," available online, accessed June.
[3] "About Digital Living Network Alliance - White Paper," available online, accessed June.
[4] "Why do I hate DLNA protocol so much?," available online, accessed August.
[5] Osama Tolba, Hector Briceño and Leonard McMillan, "Pure Java-based Streaming MPEG Player," unpublished.
[6] "VLC Features Formats - Codec Support," available online, accessed November.
[7] "Java Bindings for VLC Media Player," available online, accessed June.
[8] "vlcj - Java bindings for the VLC Media Player," available online, accessed June.

Fig. 3. Streaming test setup using a Windows 7 machine (host machine) and a Windows XP virtual machine. At the left the VLC media player on the Windows 7 system, which streams the multimedia file; at the right the vlcj test package in the Windows XP virtual machine, opening the stream with the HTTP protocol using the IP address and port number of the streaming provider.

0.7 µm Digitally Controlled Switched Capacitor Oscillator for a resonance tracking ultrasound transmitter

B. Blockx 1, W. De Cock 2, P. Leroux 3,4
1 Katholieke Hogeschool Kempen, Departement Industriële en biowetenschappen, Geel, Belgium
2 SCK-CEN, Belgian Nuclear Research Centre, Advanced Nuclear Systems Institute, Mol, Belgium
3 Katholieke Hogeschool Kempen, Departement Industriële en biowetenschappen, Geel, Belgium
4 Katholieke Universiteit Leuven, Departement ESAT-MICAS, Leuven, Belgium

Abstract
A digitally controlled oscillator is presented in this paper. The oscillator is part of a robust PLL transmitter which is used for ultrasonic visualization in the MYRRHA reactor. It has a quiescent frequency of 5 MHz, a tuning range of 4-6 MHz and a power consumption of 15 mW.

Index Terms: Oscillator, DCO, Band-pass filter, PLL

I. INTRODUCTION

Oscillators are used in a wide variety of applications. The oscillator presented here is designed for use in a robust PLL-stabilized transmitter for ultrasonic visualization in the MYRRHA reactor (Multi-purpose hYbrid Research Reactor for High-tech Applications). The reactor serves as a research tool to study the effectiveness of minor actinide transmutation. Because the liquid cooling of the reactor is done with lead-bismuth eutectic (LBE) at high temperatures (200 °C and higher), which is opaque, optical inspection of the reactor is not possible. Therefore visual inspection is done using ultrasonic waves. The transducers [1] used to send and receive the ultrasonic waves, which are specially designed for use in harsh conditions (high temperature, the corrosive nature of LBE and strong gamma radiation), have a resonant frequency of 5 MHz. In a regular ultrasound transmitter [2] the transducer is driven by a short Dirac pulse, to which the transducer responds with an oscillation burst at the resonance frequency (its impulse response). The aim is to detect and digitize this pulse with high accuracy in a band-pass Delta-Sigma receiver centered around this resonance frequency. The receiver band-pass frequency needs to be tunable, as the transducer may suffer rapid aging under the harsh conditions presented above, which may affect the resonance frequency. In order to be able to automatically tune this receiver, in our transmitter design the transducer is driven with a pulse at its resonance frequency. This resonance frequency is measured continuously using a Phase Locked Loop. A copy of the Digitally Controlled Oscillator in the PLL can be used as resonator in the Delta-Sigma loop, which ensures accurate tuning if both VCO and resonator are controlled with the same digital control word. This work focuses on the design of this oscillator.

The oscillator is implemented with switched capacitors. This technique has several advantages [3] [4] [5] in the design of programmable oscillators and filters: they can be fully integrated, have high accuracy and stability, operate over a wide frequency range, are small in size and have good temperature stability. The primary advantage is that the quality factor Q and the center frequency are independently variable [6]. The oscillator presented in this paper consists of a band-pass filter with a hard limiting feedback [7]. Frequency and amplitude of the resulting oscillation can be controlled separately; this is explained in the next section. As mentioned earlier, the oscillator needs to work under high temperature conditions. This puts certain constraints on the design of the operational amplifiers.
The use of Complementary Metal-Oxide-Semiconductor (CMOS) technology allows fabrication on a single die and offers low power consumption. At high temperatures conventional bulk CMOS has many drawbacks, including increased junction leakage, reduced mobility and unstable threshold voltages [8]. Excess leakage current is the most serious problem; it can reduce circuit performance and eventually destroy the chip due to latch-up. Despite these issues, an instrumentation amplifier in bulk CMOS has been demonstrated at temperatures reaching 300 °C [9]. When MOS is compared with other structures like JFETs or BJTs, the MOS devices are more suitable for high temperature operation because an insulated gate is used. The gate in a JFET and the base in a BJT are formed by a junction. Each of these junctions is in effect a diode, resulting in leakage currents and degraded biasing.

II. PLL CONTROLLED ULTRASOUND TRANSCEIVER

Because of the harsh conditions inside the reactor, the transducer suffers from rapid aging, with shifting of the resonant frequency as a result. The efficiency of the transducer is reduced under these conditions, resulting in a reduced measurement distance or an increased power requirement. To

counter this problem, the resonant frequency of the transducer is measured and tracked with a PLL.

The ultrasonic transducer in the resonance frequency region can be described using the Butterworth-Van Dyke model [10] in figure 1. This model describes the mechanical part ($R_s$, $L_s$ and $C_s$) and the electrical part (the clamping capacitor $C_0$) of the transducer.

Figure 1 Butterworth-Van Dyke model

The derived ultrasonic transducer input impedance is given by

$Z(s) = \frac{1}{sC_0} \parallel \left(R_s + sL_s + \frac{1}{sC_s}\right) = \frac{s^2 L_s C_s + s R_s C_s + 1}{s\left[C_0\left(s^2 L_s C_s + s R_s C_s + 1\right) + C_s\right]}$.   (1)

From this equation we get the resonance frequencies for the serial and the parallel resonance or anti-resonance:

Serial resonance frequency: $f_s = \frac{1}{2\pi\sqrt{L_s C_s}}$   (2)

Parallel resonance frequency: $f_p = \frac{1}{2\pi\sqrt{L_s \frac{C_s C_0}{C_s + C_0}}} = f_s\sqrt{1 + \frac{C_s}{C_0}}$   (3)

In [11] component values are proposed for an equivalent transducer model with an operating frequency of several MHz. With a serial resonance frequency of 5 MHz the following component values are calculated: $L_s$ = 1, $C_s$ = 1, $R_s$ = 200 and $C_0$ = 100 (in the units used in [11]). These component values result in a serial resonant frequency of 5.03 MHz and a parallel resonance frequency of 5.058 MHz. The simulation in figure 2 displays the modulus and the phase of the transducer input impedance in (1). This figure clearly shows that when the transducer resonates the phase becomes almost zero: at the serial resonance frequency the reactances of $L_s$ and $C_s$ cancel each other out. The parallel resonance frequency is due to the clamping capacitor $C_0$.

Figure 2 Modulus and phase of the input impedance of the equivalent transducer model

The PLL uses this natural behavior of the transducer to measure and track its resonant frequency. This concept is translated into the block diagram in figure 3.

Figure 3 PLL block diagram

When resonating, the current through the transducer is theoretically in phase with the output voltage of the oscillator. This current is converted into a voltage, which is then compared with the output voltage of the oscillator in the phase detector. The difference between them is integrated by the loop filter, which drives the controlled oscillator. The oscillator adjusts its output frequency until the phase difference between the current through the transducer and the output signal of the oscillator is zero. The phase detector is a three-state phase detector with a linear range of 2π radians. The loop filter is implemented as a charge pump type integrator with a finite zero gain, which results in a second order, type 2 PLL. The VCO has a quiescent frequency of 5 MHz, an input sensitivity of 250 kHz/V and an output amplitude of 1 V. Figure 4 shows the simulation results of the block diagram in figure 3. At the beginning of the simulation the current through the transducer is not in phase with the output signal of the oscillator. Under this condition the output of the phase detector is continuously negative, which results in a rising output of the loop filter and a rising oscillation frequency of the oscillator. After 10 µs both input signals of the phase detector are in phase and the oscillator settles around its quiescent oscillation frequency, which is also the series resonance frequency of the transducer.
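As a quick numerical cross-check of (2) and (3), the quoted frequencies are reproduced below, assuming one consistent unit assignment for the component values above (L_s = 1 mH, C_s = 1 pF, R_s = 200 Ω, C_0 = 100 pF); this unit assignment is an assumption, not stated explicitly in the text.

f_s = \frac{1}{2\pi\sqrt{L_s C_s}} = \frac{1}{2\pi\sqrt{10^{-3}\,\mathrm{H}\cdot 10^{-12}\,\mathrm{F}}} \approx 5.03\ \mathrm{MHz}

f_p = f_s\sqrt{1 + \frac{C_s}{C_0}} = 5.03\ \mathrm{MHz}\cdot\sqrt{1 + \frac{1\,\mathrm{pF}}{100\,\mathrm{pF}}} \approx 5.058\ \mathrm{MHz}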

Figure 4 PLL simulation

III. CIRCUIT DESCRIPTION

The controlled oscillator is implemented with switched capacitors; the advantages of this technique have already been mentioned. The basic concept of the oscillator is a band-pass filter with a hard limiting positive feedback [7]. The band-pass filter in figure 5 has a center frequency of 5 MHz. Applying the bilinear transformation to the discrete-time transfer function of the filter yields the equivalent analog transfer function. The second order term in its numerator is negligible whenever the sampling frequency is considerably higher than the oscillation frequency, so that the transfer function reduces to the well-known band-pass form

$H(s) = \frac{(\omega_0/Q)\,s}{s^2 + (\omega_0/Q)\,s + \omega_0^2}$,

from which the capacitor values can be calculated. In this paper the values are 2, 5 and 10. The sample frequency at which the transmission gates are switching is 100 MHz, or a period of 10 ns. This is fast enough so that the second order term in the transfer function of the band-pass filter can indeed be neglected. In order to obtain a low-distortion oscillator, it is necessary for the band-pass filter to have a relatively high Q-factor. Suppose now that a square wave with a frequency equal to the center frequency of the band-pass filter is applied at the input of the filter. Under the high-Q assumption, it is clear that the outputs of the operational amplifiers will be approximately sinusoidal due to the filtering action.

Figure 5 a) Band-pass filter with switched capacitors b) equivalent signal flow diagram

The foregoing arguments demonstrate that the circuit in figure 5, extended with a hard limiting positive feedback, is a good basic oscillator. Yet a significant drawback is that the output amplitude is a direct function of the saturation limits of the positive feedback. A solution for this drawback [7] is illustrated in figure 6. The output of the comparator generates the logic signals X and X̄, which are used to select the input path.

Figure 6 Oscillator with switched capacitors

Figure 7 shows the simulation results of the circuit in figure 6. The transmission gates are implemented as ideal voltage controlled switches. The operational amplifiers are ideal voltage controlled voltage sources with a voltage gain of 25 dB; the minimum voltage gain that guarantees proper operation of the oscillator is 12 dB. Figure 7 shows the output of the first operational amplifier and the output of the second

operational amplifier. While the output of either operational amplifier is usable, the output of the second operational amplifier is preferred because it has less distortion, due to its low-pass response, at least with a non-ideal oscillator. The quiescent oscillation period of 200 ns is clearly visible.

Figure 7 Ideal oscillator output signal

Figure 8 shows the normalized frequency spectrum of the ideal oscillator in figure 6. The oscillation frequency of 5 MHz is clearly visible, as are the components at the sample frequency plus and minus the oscillation frequency.

Figure 8 Normalized frequency spectrum of the ideal oscillator output signal

a. Operational amplifier implementation

Each of the two operational amplifiers in figure 5 and figure 6 is implemented as an operational transconductance amplifier followed by a buffer output stage. The implementation of the op amp is shown in figure 9. The input-stage current mirror is biased so that the current drawn by the first stage of the op amp is 450 µA; the reference current itself is 10 times smaller and the current mirror multiplies it with a factor of 10. Figure 10 displays the Bode diagram of the voltage gain and the phase margin. The voltage gain is 62 dB with a bandwidth of 3 MHz. The gain at 5 MHz, the quiescent frequency of the oscillator, is 56 dB. The gain bandwidth is 1.1 GHz, which is more than the minimum GBW of 120 MHz. The buffer avoids large capacitive loads on each of the output nodes of the two operational amplifiers and increases the output swing. The buffer is biased with a current of 200 µA, which is multiplied by its current mirror with a factor of 10. Together with the gain of the buffer this determines the overall voltage gain and the maximum output swing of the amplifier. The operational amplifier is supplied with an asymmetric supply voltage so that the output signal is symmetrical; the supply voltages are V_DD = 3 V and V_SS = -2 V.

Figure 9 Low level implementation of the operational amplifiers

Figure 10 Voltage gain of the operational amplifier

Figure 11 shows the slew rate of the operational amplifier with a capacitive load of 10 pF.

Figure 11 Slew rate of the operational amplifier

b. Comparator implementation

The comparator [12] generates the logic signals X and X̄, which are used to inject the proper amount of charge into op amp 1. The low-level implementation of the comparator used in figure 6 is shown in figure 12. The comparator consists of a preamplifier, the actual comparator and two inverters. To reduce the delay of the comparator the input signal is first preamplified. The comparator is supplied with +2.5 V and -2.5 V. The outputs of the preamplifier drive the inputs of the comparator: when one input branch is on, the gates of M5b and M6b are low, causing that output to be pulled to the supply voltage, and vice versa; the same reasoning applies to the complementary output. Finally, the outputs of the comparator drive the inputs of the inverters. The inverters avoid capacitive loads at the output of the comparator and produce a square wave.

Figure 12 Comparator

Figure 13 shows the input and output signals of the comparator.

Figure 13 a) Input signal of the comparator b) output signal of the comparator c) inverted output of the comparator

The input signal has an amplitude of 200. The output swing of the comparator is bounded by the ±2.5 V supplies. The delay of the comparator is 7 ns; without the preamplifier the delay is 25 ns.

c. Frequency control

The oscillator frequency can be made variable by changing the capacitance of one or more capacitors. Here the feedback capacitor is made variable, because this capacitor has the most influence on the oscillation frequency. The capacitor is made variable by placing capacitors in parallel, each controlled separately by a transmission gate in series with it. These transmission gates, shown in figure 14, can be controlled by a digital control word. The phase-frequency detector in a digital PLL generates a digital number which is integrated in a counter; the depth of that counter determines the number of capacitors used to control the oscillation frequency. Figure 14 shows an example of the oscillator controlled by a 4-bit digital word, with the parallel capacitors binary weighted as 8C, 4C, 2C and C. When the number of bits in the digital word is increased, more capacitors can be placed in parallel, which increases the accuracy with which the oscillation frequency can be controlled.
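To make the capacitor selection concrete, the sketch below computes the effective capacitance for each 4-bit control word, assuming the binary weighting (8C, 4C, 2C, C) read off the figure labels; the unit capacitance and the fixed base capacitor are hypothetical illustrative values, not taken from the design.

// Illustrative sketch (not from the paper): effective capacitance of a 4-bit,
// binary-weighted switched-capacitor bank as described above.
public class CapacitorBank {

    private static final double UNIT_C = 1.0e-12;   // hypothetical unit capacitance C (1 pF)
    private static final double BASE_C = 4.0e-12;   // hypothetical fixed capacitor always in circuit

    /** Effective capacitance selected by a 4-bit control word (0..15). */
    static double effectiveCapacitance(int controlWord) {
        double c = BASE_C;
        // Bit 0 switches in C, bit 1 switches in 2C, bit 2 switches in 4C, bit 3 switches in 8C.
        for (int bit = 0; bit < 4; bit++) {
            if ((controlWord & (1 << bit)) != 0) {
                c += (1 << bit) * UNIT_C;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        for (int word = 0; word <= 15; word++) {
            System.out.printf("word %2d (%s) -> C = %.1f pF%n",
                    word, Integer.toBinaryString(word | 16).substring(1),
                    effectiveCapacitance(word) * 1e12);
        }
    }
}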

Figure 14 Frequency controlled oscillator

IV. OSCILLATOR SIMULATION RESULTS

Figure 15 displays the output of the oscillator in figure 6, with the operational amplifiers and the comparator implemented as shown in figure 9 and figure 12. The quiescent oscillation frequency of the oscillator corresponds to the required oscillation frequency of 5 MHz. The peak-to-peak voltage of the output signal is 2 V.

Figure 15 Transient analysis of the oscillator output signal

Figure 16 shows the normalized frequency spectrum of the oscillator oscillating at 5 MHz. The oscillation frequency of 5 MHz is clearly dominant. The signal-to-noise ratio of the oscillator output signal is 11.1 dB.

Figure 16 Normalized frequency spectrum of the oscillator output signal

Figure 17 displays the normalized frequency spectrum of the output signals generated by the oscillator in figure 14. The spectrum consists of 15 different oscillation frequencies and the range of the oscillator is 5.5 MHz. To obtain a range of 2 MHz the digital word can be limited to a subset of codes starting at 0110; the oscillation range of 2 MHz, from 4 MHz to 6 MHz, is then controllable in 6 steps.

Figure 17 4-bit controlled oscillator output frequencies

Figure 18 displays the output signal of the oscillator simulated at different temperatures. The amplitude drift is less than 40 mV over a temperature range of 100 °C.

Figure 18 Amplitude drift of the oscillation output signal

V. CONCLUSION

In this paper the high level implementation of a PLL ultrasound transmitter has been presented and the need for a PLL based transmitter with a controlled oscillator has been clarified. The design of the controlled oscillator with sinusoidal output used in this PLL has been discussed in detail. The oscillator uses a switched capacitor topology with a high-Q band-pass filter and a comparator to provide the required loop gain. The circuit is simulated in a standard low-cost 0.7 µm CMOS technology. Frequency control is performed by a digital control word used to adjust the feedback capacitor. The DCO has a center frequency of 5 MHz, consumes only 15 mW, has an SNR of 10.7 dB and is stable over a wide temperature range.

REFERENCES
[1] R. Kazys, L. Mazeika, E. Jasiūnienė, A. Voleisis, R. Sliteris, H. A. Abderrahim and M. Dierckx, "Ultrasonic Evaluation of Status of Nuclear Reactors Cooled by Liquid Metal," ECNDT.
[2] C. Shen and P. Li, "Harmonic Leakage and Image Quality Degradation in Tissue Harmonic Imaging," IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, vol. 48, no. 3, May.
[3] D. J. Allstot, R. W. Brodersen and P. R. Gray, "An Electrically Programmable Analog NMOS Second-Order Filter," ISSCC 1979.
[4] B. C. Douglas and L. T. Lyon, "A Real-time Programmable Switched Capacitor Filter," ISSCC.
[5] M. Verbeck, C. Zimmermann and H.-L. Fiedler, "A MOS switched-capacitor ladder filter in SIMOX technology for high-temperature applications up to 300 °C," IEEE J. Solid-State Circuits, vol. 31, no. 7, p. 908, July.
[6] B. C. Douglas, "A Digitally Programmable Switched-Capacitor Universal Active Filter/Oscillator," IEEE Journal of Solid-State Circuits, vol. SC-18, no. 4, April.
[7] P. E. Fleischer, "A Switched Capacitor Oscillator with Precision Amplitude Control and Guaranteed Start-up," IEEE Journal of Solid-State Circuits, vol. SC-20, no. 2, April.
[8] M. Willander and H. L. Hartnagel (eds.), High Temperature Electronics, Chapman & Hall, London.
[9] P. C. de Jong, G. C. M. Meijer and A. H. M. van Roermund, "A 300 °C dynamic-feedback instrumentation amplifier," IEEE J. Solid-State Circuits, vol. 33, no. 12, Dec.
[10] L. Svilainis and G. Motiejunas, "Power amplifier for ultrasonic transducer excitation," ULTRAGARSAS, Nr. 1(58).
[11] A. Arnau, Piezoelectric Transducers and Applications, Berlin: Springer-Verlag, 2004.
[12] K. Uyttenhove and M. Steyaert, "A 1.8-V, 6-bit, 1.3-GHz CMOS Flash ADC in 0.25 µm CMOS," ESSCIRC.


Motion control of an arm orthosis by EMG signals through ANN analysis

Joachim Buyle 1, Ekaitz Zulueta 2
1: Katholieke Hogeschool Kempen, Department of Industrial Engineering, Geel, Belgium
2: Universidad del País Vasco, University College of Engineering, Vitoria-Gasteiz, Spain
1 June 2011

ABSTRACT

This paper briefly describes the research method used in an attempt to improve the motion or speed control of an arm orthosis by EMG signals. Data obtained from EMG sensors is processed by calculating its auto-regressive coefficients using the Yule-Walker method. These coefficients are fed into an artificial neural network and validated to check whether or not the AR coefficients correspond to the desired output.

Key words: EMG, Arm Orthosis, ANN, Yule-Walker, AR coefficients, MATLAB

I. INTRODUCTION

This research project is carried out in order to improve the control and movement of a robotic arm prosthesis or orthosis. The purpose of this orthosis is to support a person who has lost strength in his arm in lifting objects. The orthosis is controlled by electromyographic or myoelectric signals, captured at the arm's biceps and triceps muscles using specialized surface EMG sensors and equipment. These EMG signals correspond to the electric signals emitted in the body's nervous system, which produce movement. This kind of application could help handicapped or disabled persons to carry out a specific job or operation. These man-machine interfaces have become very important in recent years and will become even more important in the future.

Each year new groups of students contribute to this research project. The last group of students mainly focused on improving the system's signal processing to control the orthosis in a more natural way. They implemented a fuzzy logic controller, which processes the EMG signals using fuzzy rules; the outcome of this method indicates the most important EMG signal of a certain input. They also implemented the use of a goniometer. This device allows you to measure the angle of the orthosis and adds more feedback and more possibilities to control the orthosis in a better way.

This year I joined two local Spanish students, Marcos Albaina Corcuera and Enrique "Kike" de Miguel Blanco, who continued the project by updating the control diagram in Simulink in order to make the arm's motion more natural. We found a way to improve the control diagram using the Fast Fourier Transform (FFT) and to control the orthosis in a real-time environment. Besides that, the main focus of my research was to look for an offline or non-real-time method to improve the arm's movement by using a more non-conventional or more intelligent system to extract and analyze the data. We opted to analyze the EMG signals with the Yule-Walker Auto-Regression (AR) coefficients. For a certain subset of data these transfer function coefficients could indicate which Motor Unit Action Potential (MUAP) is active. These coefficients serve as input for an Artificial Neural Network (ANN). Using a training set derived from the goniometer's signal, we can train the neural network and indicate whether the corresponding AR coefficients are correctly predicted as the arm moving up, down or not moving. At a later stage the idea is to combine this learned knowledge with a control diagram in Simulink.
II. ELECTROMYOGRAPHY

Electromyography or EMG is a technique for evaluating and recording the electrical activity produced by skeletal muscles, that is, muscles under control of the nervous system, for example the biceps muscle in the upper arm. EMG sensors detect the electrical potential of the muscle cells when these cells are electrically activated by the nervous system. These EMG signals can be analyzed to detect and help treat medical abnormalities, or used for biomechanics, like in this project. There are many applications for the use of EMG. EMG is used

clinically for the diagnosis of neurological and neuromuscular problems. It is used diagnostically by laboratories and by clinicians trained in the use of biofeedback or ergonomic assessment. EMG is also used in many types of research laboratories, including those involved in biomechanics, motor control, neuromuscular physiology, movement disorders, postural control and physical therapy. Electromyography is the recording of electrical discharges in skeletal muscles, for example muscle activation patterns during functional activities like running or weight lifting.

Muscle fibers have to be stimulated to initiate muscle contraction. They are innervated by motor neurons located in the spinal cord or brainstem. Motor neurons are connected to the skeletal muscle by nerve fibers, called axons. Each fiber of a muscle receives its innervation from one single motor neuron. Conversely, one single motor neuron can innervate more than one muscle fiber. A motor neuron and its associated fibers are defined as a motor unit. Feinstein described that one motor unit controls between three and 2000 muscle fibers, depending on the required fineness of control. When a motor unit is activated by the central nervous system, an impulse is sent along the axon to the motor end plates of the muscle fibers. As a consequence of the stimulus, the motor end plates release neurotransmitters that interact with receptors on the surface of the muscle fibers. This results in a reduction of the electrical potential of the cells and the released action potential spreads through the muscle fibers. Such a depolarization is called an end plate potential. The combined action potentials of all muscle fibers of a single motor unit are called a "Motor Unit Action Potential" (MUAP). As the tissue around the muscle fibers is electrically conductive, this combined action potential can be observed by means of electrodes. The repetitive firing of a motor unit creates a train of impulses known as the "Motor Unit Action Potential Train" (MUAPT). These MUAPTs can be measured from the body's skin through SEMG sensors. SEMG or surface electromyography is used in this project to measure muscle activity from the skin. Through SEMG, the combination of electrical activity or action potentials from the numerous muscle fibers that contribute to a muscle contraction can be collected and analyzed.

III. ARTIFICIAL NEURAL NETWORKS

A way to improve the movement of the orthosis is by analyzing the EMG signals with an Artificial Neural Network. An ANN can be trained to verify whether the calculated auto-regressive coefficients are correct by validating that input against the goniometer's signal, which indicates the produced motion of the arm orthosis. An artificial neural network or ANN is a computer algorithm created to mimic biological neural networks. Even with today's computing power, there are certain operations a microprocessor cannot perform well. An ANN is a nonlinear system used to classify data, which makes it a very flexible system. ANNs are used in a wide range of applications such as control, identification, prediction and pattern recognition. ANNs are considered a relatively new and advanced technology in the field of digital signal processing, and they apply to many more fields than just engineering. They have two main functions: pattern classification and non-linear adaptive filtering. An ANN is also an adaptive system, just like its biological counterpart. Adaptive means that during its operation parameters can be changed.
This is known as the training phase. An ANN works by a step-by-step procedure which optimizes a criterion, known as the learning rule. It is crucial to choose a proper set of training data to determine the optimal operating point. There are many different architectures for NNs with different types of algorithms, and although some ANNs have a very complex nature, the concept of a NN is relatively simple. Basically an ANN is a system that receives input data, processes that data and delivers an output. The input data is usually an array of elements, which can be any representable data. The input data is compared with a desired target response and an error is composed from the difference. This error is fed back into the system to adjust the parameters of the learning rule. This feedback process continues until the output is acceptable. Since there is no way to determine exactly what the best design is, you have to play with the design characteristics in order to get the best possible results. So it is pretty difficult to refine the solution when the system does not do what it is intended for. However, ANNs are very efficient in terms of development, time and resources, and they can provide real solutions that are difficult to match with other technologies.

Collecting EMG sensor data

The first step is to take a series of samples of the EMG sensors. We perform a series of different, controlled movements and collect data from the biceps, the triceps and the goniometer. The test subjects are my project partners Kike and Marcos. They are two persons with different physical characteristics who

will provide different EMG patterns. Both execute the same controlled movements, which include moving the arm up and down, fast or slow, with or without extra weight added. When extra weight is held in the hand, in this case a heavy bag, the EMG signals tend to be clearer as more force is needed to move the arm. We try to locate the sensors in the exact same position every time. Each sample is recorded for nine seconds and then saved as a .mat file. We try to find a clear pattern of distinction in muscle movement and EMG activity for both the biceps and triceps muscles. From each test subject one sample will be chosen for further investigation with an ANN. After recording more than 10 different samples of both test subjects, sample 5 of Marcos, marcos5, provided the best wave patterns for further investigation. Marcos5 was created with the following movements: the arm starts down, resting next to the body, is then moved up quickly against the shoulder (fully contracted) and then moved back down again. This move is repeated four times, with no extra weight added.

Figure 1: A plot of the biceps, triceps and goniometer signals of sample marcos5

Sample 5 of Marcos (marcos5.mat) and sample 6 of Kike (kike6.mat) are chosen to be investigated further because when moving the arm the graph shows a clear, sudden increase in EMG muscle activity. Sample 5 of Marcos is clearly the best sample, as you can clearly see which muscle is active at what time or movement: a sudden increase of EMG activity in the biceps signal when the arm moves up and increased activity in the triceps signal when the arm is moving down. There are several factors why the other samples are not as easy to distinguish. The main reason is that surface sensors produce much more noise and capture more EMG signals at a time than the more precise intra-muscular or needle EMG sensors. Physical characteristics of the test subject are also important, as one person can have more or less body tissue and a different muscle structure than the other. A third thing to consider is the placement of the sensors: it is quite difficult to place the sensors in the exact same position every time, and surface sensors also tend to move a bit when the arm is moved, making them more susceptible to noise and errors. Finally, it seems that EMG signals tend to be easier to distinguish when the arm is moving in fast motion.

IV. IMPLEMENTATION

For the implementation of the Artificial Neural Network we used MATLAB. We begin with loading the MATLAB file that contains the EMG sample data. The data contains four signals: biceps, triceps, goniometer and time. The next step after loading the EMG data into MATLAB is the feature extraction. We have raw EMG data, but the question is how we will use this data to produce a meaningful input for the ANN. We will use an input signal of Yule-Walker coefficients and compare it to a training set based on the goniometer signal.

Yule-Walker

Because an EMG signal is a composition of different EMG impulses, we need to find a way to decompose the signal into multiple signals or inputs. We need to know which muscle fiber or MUAP is active at the time of movement and the amplitude of its strength. One way of doing that is to estimate an autoregressive (AR) all-pole model using the Yule-Walker method. This method returns the transfer function coefficients of the input signal, which can be an indication of which Motor Unit Action Potential (MUAP) in the muscle is active.

In statistics and signal processing, an autoregressive (AR) model is a type of random process which is often used to model and predict various types of natural phenomena. The autoregressive model is one of a group of linear prediction formulas that attempt to predict the output of a system based on the previous outputs. The notation AR(p) indicates an autoregressive model of order p. The AR(p) model is defined as

$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$,

where $\varphi_1, \ldots, \varphi_p$ are the parameters of the model, c is a constant (often omitted for simplicity) and $\varepsilon_t$ is white noise. White noise is a random signal (or process) with a flat power spectral density; in other words, the signal contains equal power within a fixed bandwidth at any center frequency. An autoregressive model can thus be viewed as the output of an all-pole infinite impulse response (IIR) filter whose input is white noise.
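The estimation of the AR parameters (the Yule-Walker equations solved by the Levinson-Durbin recursion, detailed in the next paragraphs) can be sketched in a few lines. This is a minimal illustrative implementation, not the MATLAB code used in the project; the input signal and model order below are placeholders.

import java.util.Arrays;

// Minimal sketch of Yule-Walker AR estimation via the Levinson-Durbin recursion.
// Not the project's MATLAB implementation; input data and order are placeholders.
public class YuleWalker {

    /** Biased sample autocorrelation r[0..maxLag] of signal x. */
    static double[] autocorrelation(double[] x, int maxLag) {
        double[] r = new double[maxLag + 1];
        for (int lag = 0; lag <= maxLag; lag++) {
            for (int n = lag; n < x.length; n++) {
                r[lag] += x[n] * x[n - lag];
            }
            r[lag] /= x.length;
        }
        return r;
    }

    /**
     * Fits an AR(p) model. Returns a[0..p] with a[0] = 1, so that
     * x[n] is approximated by -a[1]x[n-1] - ... - a[p]x[n-p] + e[n]
     * (the same convention as MATLAB's aryule).
     */
    static double[] levinsonDurbin(double[] r, int p) {
        double[] a = new double[p + 1];
        a[0] = 1.0;
        double error = r[0];                       // prediction error power
        for (int m = 1; m <= p; m++) {
            double acc = r[m];
            for (int i = 1; i < m; i++) {
                acc += a[i] * r[m - i];
            }
            double k = -acc / error;               // reflection coefficient
            double[] next = Arrays.copyOf(a, p + 1);
            next[m] = k;
            for (int i = 1; i < m; i++) {
                next[i] = a[i] + k * a[m - i];
            }
            a = next;
            error *= (1.0 - k * k);
        }
        return a;
    }

    public static void main(String[] args) {
        // Placeholder "EMG-like" data: a noisy oscillation, only for illustration.
        double[] emgWindow = new double[1000];
        java.util.Random rng = new java.util.Random(42);
        for (int n = 0; n < emgWindow.length; n++) {
            emgWindow[n] = Math.sin(0.3 * n) + 0.5 * rng.nextGaussian();
        }
        int order = 9;  // as in the paper: order 9 gives 10 coefficients with a[0] = 1
        double[] coeffs = levinsonDurbin(autocorrelation(emgWindow, order), order);
        System.out.println(Arrays.toString(coeffs));
    }
}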

The AR(p) model is based on the parameters $\varphi_i$ where i = 1, ..., p. There is a direct correspondence between these parameters and the covariance function of the process, and this correspondence can be inverted to determine the parameters from the autocorrelation function (which is itself obtained from the covariances). This is done using the Yule-Walker equations:

$\gamma_m = \sum_{k=1}^{p} \varphi_k \gamma_{m-k} + \sigma_\varepsilon^2 \delta_{m,0}$,

where m = 0, ..., p, yielding p + 1 equations. Here $\gamma_m$ is the autocorrelation function of X, $\sigma_\varepsilon$ is the standard deviation of the input noise process, and $\delta_{m,0}$ is the Kronecker delta function. Because the last term is non-zero only if m = 0, the set is usually solved by writing the equations for m > 0 in matrix form,

$\begin{bmatrix}\gamma_1\\\gamma_2\\\vdots\\\gamma_p\end{bmatrix} = \begin{bmatrix}\gamma_0 & \gamma_{-1} & \cdots & \gamma_{1-p}\\ \gamma_1 & \gamma_0 & \cdots & \gamma_{2-p}\\ \vdots & & \ddots & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_0\end{bmatrix}\begin{bmatrix}\varphi_1\\\varphi_2\\\vdots\\\varphi_p\end{bmatrix}$,

from which all $\varphi_k$ are solved. For m = 0 we have

$\gamma_0 = \sum_{k=1}^{p} \varphi_k \gamma_{-k} + \sigma_\varepsilon^2$,

which allows us to solve for $\sigma_\varepsilon^2$. The above equations (the Yule-Walker equations) provide one route to estimating the parameters of an AR(p) model, by replacing the theoretical covariances with estimated values. One way of specifying the estimated covariances is equivalent to a calculation using least squares regression of values $X_t$ on the p previous values of the same series.

The Yule-Walker method, also called the autocorrelation method, is used to fit a p-th order autoregressive (AR) model to the windowed input signal x by minimizing the forward prediction error in the least-squares sense. This formulation leads to the Yule-Walker equations, which are solved by the Levinson-Durbin recursion. x is assumed to be the output of an AR system driven by white noise. The vector a contains the normalized estimate of the AR system parameters, A(z), in descending powers of z. Because the method characterizes the input data using an all-pole model, the correct choice of the model order p is important.

The EMG data of the biceps and triceps are extracted for an interval of 1 second. The order of the AR model is 9, which results in 10 coefficients since the first AR coefficient is always equal to one. We calculate the Yule-Walker coefficients for both the biceps and triceps EMG signals after a white noise filter is applied.

Figure 2: A plot of the biceps and triceps Yule-Walker AR coefficients (9th order)

Training Set

A neural network has to be trained so that a set of inputs produces the desired set of outputs (target). Teaching patterns (the training set) are fed into the neural network and change the weights of the connections according to a learning rule. So in order to validate the input data of the ANN, the AR coefficients of the EMG data, we must first generate a training set. This training set is generated from the information of the third sensor, the goniometer. The training set sets the target for matching the input data. Because the goniometer registers the movement of the arm, we can see whether the Yule-Walker coefficients were correctly predicted. Moving the arm upwards corresponds to +1, moving the arm downwards corresponds to -1 and 0 means the arm isn't moving. Before we generate the training set, the goniometer signal is filtered with an 8th order low-pass filter to reduce noise and smooth out the signal.

Envelope function

Next, we put the biceps and triceps EMG signals through an envelope function. The idea is to further smoothen the signals in order to prevent noise in the output and get more stable results. Each peak in EMG activity is directly followed by a

negative peak. This is also the case for the noise: when no or low EMG activity is measured, we still see the signal move up and down rapidly. This pattern could confuse the ANN and affect the output in a negative way. To overcome this problem we filter the biceps and triceps EMG signals with an envelope or enclosure function. This function follows the positive spikes of the signal and reduces sudden variations in the signal. The envelope function S(t) with input signal Y(t) is mathematically defined as follows:

S(t) = Y(t) if Y(t) > S(t-1), else S(t) = a * S(t-1)

Since we look for a positive rise in the signal's amplitude, we take the absolute value of the current signal value Y(t) and compare it to the previous outcome S(t-1). If the current absolute value Y(t) is bigger than the previous envelope value S(t-1), the current absolute value Y(t) is stored as the new envelope value S(t). In case Y(t) is smaller, we multiply the previous envelope value S(t-1) with a constant a. This constant defines how quickly the slope of the envelope decreases. Low values, for example 0.1 or 0.3, make the slope decline very fast; this is not desirable, as the result would resemble the original signal a lot. We set a to the maximum value, 0.99, in order to produce a smoother output signal. In the following figure you can see an example of the envelope signal: in red the original EMG signal and in blue the filtered envelope signal. As you can see there are a lot fewer sudden variations in the signal.

Figure 3: an example of the envelope function

Linear regression

Before we calculate the AR coefficients we determine a target set or training set from the goniometer signal. The target set is based on the movement of the arm, so that we can then link arm movement to increased EMG activity. We just want to know when the arm moves up or down, so the training set should indicate upward movement with +1 and downward movement with -1, while 0 indicates no movement. By calculating linear regression, which makes use of the slope or gradient of each point, we can determine the movement of the arm. By using a sliding window with a certain interval for the linear regression we can try to indicate where exactly the arm moves. To keep it simple, linear regression fits the best possible straight line through a series of points, in this case the slope values of the goniometer signal within each window. Linear regression is defined as follows:

$Y = \Phi P, \qquad P = (\Phi^T \Phi)^{-1} \Phi^T Y$

If y is a linear function of x, then the coefficient of x is the slope of the line created by plotting the function. Therefore, if the equation of the line is given in the form y = mx + b, then m is the slope. This form of a line's equation is called the slope-intercept form, because b can be interpreted as the y-intercept of the line, the y-coordinate where the line intersects the y-axis. The slope is defined as the ratio of the "rise" divided by the "run" between two points on a line, or in other words the ratio of the altitude change to the horizontal distance between any two points on the line. Given two points (x1, y1) and (x2, y2) on a line, the slope m of the line is m = (y2 - y1) / (x2 - x1).

Figure 4: A plot of the (filtered) goniometer signal in red, the linear regression signal in blue and the target signal in magenta.

ANN Training

Once we have all the input data ready, it is time to start training the ANN and looking at the results.
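Before turning to the network itself, the envelope filter defined above can be written as a short routine. A minimal sketch, in Java rather than the project's MATLAB, with a = 0.99 as chosen in the text and a placeholder input signal:

// Minimal sketch of the envelope (enclosure) function described above:
// S(t) = |Y(t)| if |Y(t)| > S(t-1), otherwise S(t) = a * S(t-1).
public class EnvelopeFilter {

    /** Applies the envelope function with decay constant a to signal y. */
    static double[] envelope(double[] y, double a) {
        double[] s = new double[y.length];
        double previous = 0.0;                      // S(t-1), starts at zero
        for (int t = 0; t < y.length; t++) {
            double magnitude = Math.abs(y[t]);      // follow the positive spikes only
            previous = (magnitude > previous) ? magnitude : a * previous;
            s[t] = previous;
        }
        return s;
    }

    public static void main(String[] args) {
        // Placeholder "EMG-like" signal: alternating positive and negative spikes.
        double[] emg = {0.05, 0.9, -0.8, 0.1, -0.1, 0.05, 0.7, -0.6, 0.02, -0.02};
        double[] env = envelope(emg, 0.99);         // a = 0.99, as chosen in the text
        for (int t = 0; t < emg.length; t++) {
            System.out.printf("t=%d  y=%+.2f  S=%.3f%n", t, emg[t], env[t]);
        }
    }
}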
We create a feed-forward network with a single hidden layer and a varying number of neurons (3 to 7). In feed-forward neural networks the data propagates from input to output, over multiple

24 processing units, called hidden layers. There are no feedback connections present. A feedback connection means that the output of one unit is fed back into an input of a unit of the same or previous layer. More neurons require more computation, and they have a tendency to over fit the data when the number is set too high, but they allow the network to solve more complicated problems. More layers require more computation, but their use might result in the network solving complex problems more efficiently. Comparison of different ANN setups The network architecture for this research consisted of an input layer, one hidden layer and an output layer. A hidden layer in an ANN consists of a number of neurons. Several trainings and validations are executed using different number of neurons and the two types of sample processing: with and without using an envelope function. I compared the EMG samples of marcos5 and kike6 with different ANN parameters and reviewed the results to see whether or not we can find an optimal configuration for classifying Yule- Walker AR coefficients as correct movement or not. In total we will test 20 different setups. Four sample sets: marcos5 with and without envelope function and kike6 with and without envelope function. Each sample set will be trained 5 times, each time with a different number of neurons, ranging from 3 to 7. We will compare results graphically in a plot with the target output in red and the actual output in blue. The blue pattern has to match the red target as good as possible. The following graphs are an example of such a test set: Figure 6: marcos5 with 5 neurons and envelope filter With 5 neurons applied both positive and negative movements are more or less properly classified within its target. The envelope function gets rid of a lot of noise. This result is quite ok. V. CONCLUSION After examining the results I ve come to the following conclusions. The use of the envelope functions seems to reduce the noise or interference in most cases. Mostly it reduces the amplitude between the -0,5 and +0,5 V range, making it easier to spot the spikes, which approach a +1 or -1 value. It also seems that negative movements (stretching of the arm) are way easier classified than positive movements (flexing of the arm). In all samples we see that negative spikes around the target are way more abundant than positive spikes. The use of an envelope tends to filter some of the positive spikes. But in general we can say the envelope function works. However the use of the Yule-Walker AR method to classify surface EMG signals doesn t look like a very stable method. It works more or less but acceptable results depend heavily on the setup of the ANN, of the sample used and on the test subject. The amount of neurons doesn t seem to be conclusive as for Marcos samples we got better results starting from 5 neurons and for Kike s results the opposite happened. In general we can say that this used method of analyzing surface EMG signals with an ANN and Yule-Walker AR coefficients didn t give the result we hoped for. REFERENCES Figure 5: marcos5 with 5 neurons and without envelope filter Roberto Merletti, Philip A. Parker, Electromyography - Physiology, Engineering and Noninvasive Applications, IEEE Press/Wiley-Interscience, 2004 Mark Hudson Beale, Martin T. Hagan, Howard B. 
Demuth, Neural Network Toolbox 7 - User's Guide, MathWorks Inc., 2010. MATLAB Help Offline Documentation, MathWorks Inc., 2010. MathWorks Support Online MATLAB Documentation. ScienceDirect. Web of Knowledge (WoK).

25 Design of a Discrete Tunable Ultra-Wideband Pulse Generator for In Vivo Dosimetry K. Cools, M. Strackx, P. Leroux Abstract This work presents the design of a tunable discrete UWB pulse generator. Basic pulse generation architectures are reused in a new design for testing a novel in vivo dosimetry technique. Simulations predict the same or a smaller minimum monocycle duration than other published results on a design with discrete components. Depending on the smallest possible stub length, the obtained results are 300ps for 5mm or 215ps for 4mm (time between 20%-20% of the maximum). The reason for this research is because the present in vivo dosimetry methods don t offer a real-time non-invasive in situ measurement. The new envisioned method does all that and the method measures the irradiated tissue directly. Index Terms In vivo dosimetry, IR-UWB, Pulse generator, tunable circuits and devices, Ultra-wideband (UWB) N I. INTRODUCTION OWADAYS cancer treatment is implemented on an increasing number of people, regretfully. A commonly known part of most treatments is radiotherapy, used for curative or adjuvant cancer treatment or in worst cases to control/slow down the cancer. For irradiation of living human beings in vivo dosimetry, which is the measurement of the absorbed radiation dose in living tissue, is required in order to be able to optimize the irradiation cycles and minimize damage to healthy tissue. In vivo dosimetry can be done in many different cumbersome ways. Different physical and chemical effects are used to measure the dose, e.g. luminescence or conductivity. All present measuring techniques still involve time-consuming labor like individual placement, individual calibration or imaging. Some techniques also tend to be invasive, causing great discomfort to the patient. Also none of the current techniques combine real-time measurement and give the opportunity to really measure reactions/changes within the tissue during irradiation. For these reasons there exists a strong need for better and easier ways to do in vivo dosimetry. Kris Cools is a last year s Master Electronics student at the IBW department at Katholieke Hogeschool Kempen (Association KU Leuven), Geel, Belgium ( M. Strackx is with the Katholieke Hogeschool Kempen, Geel, Belgium ( P. Leroux is with the Katholieke Hogeschool Kempen, Geel, Belgium. He is also with the SCK CEN, the Belgian Nuclear Research Centre, Mol, Belgium (tel. : , A novel non invasive concept to tackle the in vivo dosimetry problem is presented in this work using Impulse- Radio UWB (IR-UWB) signals and Time Domain Reflectometry (TDR). UWB is already being implemented in other areas of medicine [1] (e.g. heart rate or respiration monitoring). Another field of great use is high speed datacommunication, but this implements other kinds of UWB. Our concept is based upon the fact that the permittivity (ε, polarization grade) and conductivity in matter is changed when it is irradiated [2] and this change can then be related back to the absorbed dose. With this technique the absorbed dose inside the irradiated tissue may be measured instead of only the entrance and/or exit dose which is the case for many other methods. In principle the combination of TDR and UWB allows investigation of the absorbed dose at any specific depth of interest. TDR is a way to observe discontinuities in a transmission path. The discontinuities in case of dosimetry applications are the multiple transitions between different layers of tissue in a human. 
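The size of the echo that TDR sees at such a transition is set by the contrast in material properties on both sides of the boundary. The sketch below is purely our own illustration (not taken from the paper): it assumes lossless, non-magnetic media and normal incidence, so the reflection coefficient depends on the relative permittivities alone, whereas real tissue is dispersive and lossy; the permittivity values are hypothetical.

#include <cmath>
#include <cstdio>

// Normal-incidence reflection coefficient between two lossless,
// non-magnetic media with relative permittivities eps1 and eps2:
// Gamma = (eta2 - eta1) / (eta2 + eta1), with eta proportional to 1/sqrt(eps_r).
double reflection_coefficient(double eps1, double eps2) {
    return (std::sqrt(eps1) - std::sqrt(eps2)) /
           (std::sqrt(eps1) + std::sqrt(eps2));
}

int main() {
    // Hypothetical permittivities, before and after a small
    // radiation-induced change in the deeper layer.
    const double eps_layer1 = 40.0;
    const double eps_layer2_before = 50.0;
    const double eps_layer2_after  = 52.0;  // assumed change, for illustration only

    std::printf("Gamma before irradiation: %+.4f\n",
                reflection_coefficient(eps_layer1, eps_layer2_before));
    std::printf("Gamma after irradiation:  %+.4f\n",
                reflection_coefficient(eps_layer1, eps_layer2_after));
    return 0;
}

The change in the reflected amplitude between the two cases is the kind of quantity that, in the envisioned method, would be related back to the absorbed dose.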
Using UWB, defined as a signal with a bandwidth equal to or greater than 20% of the center frequency or 500 MHz, whichever occurs first, a variety of frequencies is available to derive information from. The UWB bandwidth is measured at the -10 dB points. The propagation depth depends on a combination of the permittivity and the conductivity, which are frequency dependent. Not every frequency can penetrate equally deep. This means that in order to measure deeper in tissue there are fewer frequencies available to derive information from, which could mean less precise information. The lower frequencies tend to penetrate deeper into a human body and have a significant power reflection. Changes after irradiation are also most noticeable at the lower frequencies [2]. Another benefit of the proposed technique is the real-time monitoring of tissue during irradiation. In order to develop a UWB dosimetry test setup, a UWB pulse generator must be made which is capable of generating multiple pulse shapes and thus possessing different frequency spectra. In the next section the basics of UWB are discussed. Next, the different subparts of the generator are discussed based on their specific purpose, beginning with the creation of a sharp, high-frequency-rich signal edge. The second is pulse creation, followed by pulse enhancement and finally pulse shaping. Some tuning possibilities will also be mentioned.
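As a small illustration of the UWB definition recalled above (our own sketch, not part of the paper), a signal can be classified from its -10 dB band edges as follows.

#include <cstdio>

// UWB criterion from the -10 dB band edges f_low and f_high (in Hz):
// the signal is UWB if its bandwidth is at least 500 MHz, or at least
// 20% of the center frequency.
bool is_uwb(double f_low, double f_high) {
    const double bandwidth = f_high - f_low;
    const double f_center = 0.5 * (f_low + f_high);
    return bandwidth >= 500e6 || bandwidth >= 0.2 * f_center;
}

int main() {
    // Example: a pulse occupying 3.1-5.0 GHz at the -10 dB points.
    std::printf("3.1-5.0 GHz is UWB: %s\n", is_uwb(3.1e9, 5.0e9) ? "yes" : "no");
    return 0;
}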

II. UWB SIGNALS AND REGULATIONS In order to generate a wide frequency band signal, a short pulse is used in IR-UWB. Ideal would be a Dirac impulse, which contains every frequency, all equal in magnitude. In reality there is always a finite pulse time, and the most ideal signal is a Gaussian pulse. A special property of this pulse is that it is Gaussian in both the time and the frequency domain. The property is shown by the Fourier transform pair in (1) [3]: F{exp(-t²/(2σ²))} = σ·sqrt(2π)·exp(-σ²ω²/2). (1) Figure 1 shows some commonly used waveforms, and it can be seen that the smoother the signal is in time, the smaller the side lobes, i.e. the less energy there is outside the bandwidth [4]. This shows that the Gaussian pulse is the most energy efficient. Fig. 1. Different pulse signals with the same bandwidth; the smoothest time signal has the least energy outside of its bandwidth. Fig. 2. EC (past ) & medical imaging FCC UWB frequency masks. The regulations on UWB are still evolving, and different regions may have different regulations for different fields of application like indoor, handheld, etc. The US has the FCC as regulator and countries that are members of the EU have the European Commission (EC) as main regulator. The common Equivalent Isotropically Radiated Power (EIRP) maximum for indoor use is -41.3 dBm/MHz. In practice an EIRP measurement happens by integrating average RMS measurements over 1 ms [5]. The formula to calculate the EIRP for a practical antenna is given in (2): EIRP = P_trans - P_loss + G_antenna, (2) where P_trans is the generated transmitter power in dBm, P_loss is for example the cable loss to the antenna in dB and G_antenna is the antenna gain in dBi. An example of FCC and EC frequency masks is shown in Figure 2 [6][7]. A commonly used term in UWB is Pulse Repetition Frequency (PRF). It gives the number of pulses sent per second, expressed in Hz. Sampling in the frequency domain is a property of a periodic signal, shown in Figure 3. In order to keep the noise-like characteristic, the UWB spectrum has to be continuous. A way of keeping this is to add a noise generator to the oscillator circuit responsible for the PRF. Fig. 3. Periodic in time means sampling in the frequency domain. III. SHARP EDGE CREATION The first thing to know is how much power the UWB signal needs to possess to be useful for the envisioned application. In most cases this is enough to determine the component to be used to generate a sharp edge. There is however a trade-off between the transmit power and the smallest possible rise time. Typically the rise time is measured from 90% to 10% or 80% to 20% of the maximum amplitude. Something all these edge generating components have in common is their nonlinear behavior. If rise time is more important, one can also achieve a higher power signal by summing signals, which requires multiple identical circuits and an accurate combiner circuit, or by using an RF amplifier; both methods raise costs. A. High Power Here the voltage starts from above 100 V. The used components are the avalanche transistor and the Drift Step Recovery Diode (DSRD). The first uses the avalanche effect, which is a very fast transition to the conducting state caused by a chain reaction of one free carrier hitting and freeing multiple similar carriers. The operation of the latter is comparable with that of a similarly named component in the mid-power class.

However the dimensions and composition are different. These two can produce rise times in the order of hundreds of picoseconds. The possible PRF is limited due to caution for high temperatures inside the component, causing it to break down, and depending on the surrounding circuit components, which for example need to recharge. Some of the application areas where this high power type is used are radar and Ground Penetrating Radar (GPR). B. Low Power This category contains a pulse generator capable of delivering amplitudes up to ±250 mV. The utilized component is a tunnel diode. These can give rise times of a few picoseconds. Figure 4 shows a typical tunnel diode I-V graph compared to that of a regular diode. This diode uses a quantum effect called tunneling, the penetration of a potential barrier by electrons whose energy (by classical reasoning) is insufficient to overcome the height of the barrier. Because of the use of very highly doped materials, the gap between electrons of the N-material and holes of the P-material is much narrower than in regular PN junctions. So when forward biased, conduction starts immediately until a peak current is reached. Then the gap becomes bigger, too big for tunneling, and the current decreases until the valley current is reached. Afterwards the tunnel diode functions as a normal diode. This decrease in current can happen in a very short period of time, making it possible to create steep edges. Fig. 4. Typical I-V graph of a tunnel diode. C. Mid Power Between the two previously mentioned voltage levels, the generation may be performed by the use of a Step Recovery Diode (SRD) or digital gates, optionally followed at the end of the circuit by a broadband amplifier, where Monolithic Microwave Integrated Circuit (MMIC) amplifiers are best suited. A frequently cited UWB radio using the SRD is the Micropower Impulse Radar (MIR), which features a low PRF (2 MHz) [8], but this can be adequate for many applications. Rise times in the order of tens of picoseconds are possible with SRDs. The interesting part of the operating principle is when the SRD ceases to conduct. Some diodes keep conducting for a short period of time when their polarization switches from the conducting to the blocking state, due to a stored charge in the intrinsic layer (PIN diodes). A special property of the SRD is that when it stops conducting, it does so much more abruptly than other kinds of diodes. This is due to its layer structure and the materials used. Figure 5 illustrates this difference. In order to tune the edge duration, the easiest way is to adjust the amplitude of an input sine. The fastest digital gates are of the Emitter Coupled Logic (ECL) kind, but they can only achieve rise times in the order of hundreds of picoseconds and they drain a constant current. Our focus was mostly on this power class, which is also the most commonly used. As a result the following sections are applicable to this class, but may not be limited to it. Fig. 5. Ideal reverse recovery transient for a SRD (right) compared to a typical PIN diode recovery transient. IV. PULSE CREATION For the digital gates this can be done by a combination of two gates. Examples of this are shown in Figure 6. The pulse time can be regulated by placing a capacitor to ground or adding a delay line behind the inverter. Fig. 6. Pulse creation with digital gates. Other options, which are also usable with the SRD, are a high-pass filter or a stub terminated to ground. The latter gives a better symmetric shape than the former.
So a better Gaussian-like pulse is generated by combining a sharp edge creating device with a stub terminated to ground. The operating principle is that at the point of the stub the signal splits into two equal parts. One half travels directly towards the output, while the other half takes a detour along the stub and reflects back with inverted polarity to join the other half with a certain time delay dependent on the stub length. Figure 7 shows the principle, starting off with the edge at the diode and ending up as a pulse on the right.
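The pulse width obtained this way is set by the round-trip delay along the shorted stub. As a rough back-of-the-envelope sketch of ours (not taken from the paper), assuming a simple TEM model where the wave travels at c/sqrt(eps_eff) and the effective permittivity eps_eff of the substrate is an assumed value, the delay can be estimated as below; the durations reported later (300 ps for 5 mm, 215 ps for 4 mm) come from circuit simulation of the complete generator, not from this formula.

#include <cmath>
#include <cstdio>
#include <initializer_list>

// Rough estimate of the extra delay added by a shorted stub: the reflected
// half of the edge travels down the stub and back, so the delay is twice the
// stub length divided by the propagation velocity v = c / sqrt(eps_eff).
double stub_round_trip_delay(double stub_length_m, double eps_eff) {
    const double c = 2.998e8;                 // speed of light in vacuum [m/s]
    const double v = c / std::sqrt(eps_eff);  // propagation velocity on the line
    return 2.0 * stub_length_m / v;           // down and back [s]
}

int main() {
    const double eps_eff = 3.0;  // assumed effective permittivity, for illustration
    for (double len_mm : {4.0, 5.0}) {
        double d = stub_round_trip_delay(len_mm * 1e-3, eps_eff);
        std::printf("stub %.1f mm -> ~%.0f ps round-trip delay\n", len_mm, d * 1e12);
    }
    return 0;
}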

Fig. 7. Pulse creation with a stub. The stub length tuning can be achieved by two different methods, namely line switching and bend switching. In the case of line switching there are multiple spots along a line where it can be connected to ground, whereas with bend switching there is only one ground connection at a single point; it is the path towards it that can be changed. Both layouts are depicted in Figures 8 and 9. Because of the use of diodes as switching elements, every switching section has to be DC isolated by coupling capacitors, and this means extra deformation of the pulse. This can be resolved by replacing the diodes with MESFETs. With line switching only one section can be activated at a time, while with bend switching the sections can be switched in a binary fashion. V. PULSE ENHANCEMENT Besides the pulse there is also some smaller unwanted signal forming. To separate the pulse from this, another component has to be added. This component is used to generate a threshold higher than this unwanted signal. There are two possibilities. One is a diode with the right voltage drop. Preferably a Schottky diode is used because of its fast operation and minimal reverse recovery. The voltage drop can be regulated by biasing the diode with a DC source. The biasing has to be separated from the rest of the circuit by two coupling capacitors. The second possibility is the use of a MESFET, where V_g or V_s can serve to adjust the threshold. By using V_s one less coupling capacitor has to be used, which results in less signal deformation. This last possibility also offers amplification, making it possible to place the threshold higher than strictly necessary; by doing so, the pulse time can be made even shorter, while still obtaining the wanted amplitude. VI. PULSE SHAPING Pulse shaping is done to make a signal fit in a frequency mask, to modulate data or for other reasons. UWB pulses have different names according to their form: the Gaussian pulse, the doublet, the monocycle and the polycycle. The latter two are derivatives of the first, with the monocycle being the first derivative. Examples of these pulses and their spectra are shown in Figures 10 and 11. Derived forms can be attained by sending a pulse through a high-pass filter; even a UWB antenna can be used. Another technique to get all these waves is by summing multiple pulses together with a certain delay. It is also possible with a stub, as mentioned in the pulse creation section. For good derivative forms the forward and reflected wave have to overlap [9]. Fig. 8. Line switching. Fig. 9. Bend switching. Fig. 10. Different waveforms shown with their derivative form.
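To make the naming of these waveforms concrete, the sketch below (our own illustration, not code from the thesis) generates a Gaussian pulse and obtains the monocycle and doublet as its first and second numerical derivatives; the pulse width parameter sigma is an arbitrary assumed value.

#include <cmath>
#include <cstdio>
#include <vector>

// Generate a Gaussian pulse and derive the monocycle (1st derivative)
// and doublet (2nd derivative) by finite differences.
int main() {
    const double sigma = 50e-12;   // assumed pulse width parameter [s]
    const double dt = 1e-12;       // time step [s]
    const int n = 601;             // samples, centered on t = 0

    std::vector<double> t(n), gauss(n), monocycle(n, 0.0), doublet(n, 0.0);
    for (int i = 0; i < n; ++i) {
        t[i] = (i - n / 2) * dt;
        gauss[i] = std::exp(-t[i] * t[i] / (2.0 * sigma * sigma));
    }
    // Central differences for the derivative forms.
    for (int i = 1; i + 1 < n; ++i) {
        monocycle[i] = (gauss[i + 1] - gauss[i - 1]) / (2.0 * dt);
        doublet[i]   = (gauss[i + 1] - 2.0 * gauss[i] + gauss[i - 1]) / (dt * dt);
    }
    // Print a few samples; in practice these arrays would be plotted or
    // fed to an FFT to inspect spectra like those of Fig. 11.
    for (int i = 0; i < n; i += 100)
        std::printf("%e %e %e %e\n", t[i], gauss[i], monocycle[i], doublet[i]);
    return 0;
}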

Fig. 11. Spectra of the waveforms in Figure 10. VII. DESIGN & SIMULATION RESULTS By using a well thought-out combination of the above techniques, our own design was made; it is presented in Figure 13. Depending on the chosen spacing between the stub start and the first grounded point, the minimum monocycle duration will differ. The minimum distance will depend upon the maximum capacitive coupling accepted. For traces on a PCB the rule of thumb for minimal capacitive coupling is a distance of 10 times the conductor width [10]. Also, the less two traces are parallel, the less coupling they will have with each other. It is assumed that capacitive coupling will be negligible in our case, mainly because the MESFETs are kept as close as possible to the stub, so that with a practical minimal ground distance of 4 mm or 5 mm the capacitive coupling with any MESFET connection is still very small. The resulting signals and spectra for 4 mm and 5 mm distance without coupling are shown in Figure 12. Fig. 12. Predicted monocycle output of our own designed circuit. Fig. 13. Our own discrete tunable UWB pulse generator. VIII. CONCLUSION Different approaches to a solution were found. Through finding the best performing and most suited approaches a well thought-out combination could be made. As sharp edge creator an SRD was chosen. Regulating the edge time can be done through the input voltage. Transforming the edge into a pulse

was done with a shorted stub, which is tunable in length with MESFETs. The pulse enhancing was also done by a MESFET. After this, the pulse can be changed into a monocycle or doublet by a second tunable stub. This all leads to a new discrete tunable UWB pulse generator design which, according to simulations, promises excellent results. ACKNOWLEDGMENT I would like to thank Ing. Maarten Strackx and Dr. Ir. Paul Leroux for their guidance and for giving me the chance to participate in their research. REFERENCES [1] E. M. Staderini, UWB radars in medicine, IEEE Aerospace and Electronic Systems Magazine, vol. 17, no. 1, Jan. 2002. [2] T. Tamura, M. Tenhunen, T. Lahinen, T. Repo and H. P. Schwan, Modelling of the Dielectric Properties of Normal and Irradiated Skin, Phys. Med. Biol. 39, 1994. [3] M. J. Roberts, Signals and Systems, International Edition 2003, New York: McGraw-Hill, 2004. [4] K. Siwiak and D. McKeown, Ultra-Wideband Radio Technology, West Sussex, England: John Wiley & Sons Ltd, 2004, pp. 64. [5] S. K. Jones (2005, May 11), The Evolution of Modern UWB Technology: A Spectrum Management Perspective. [6] United States of America, Federal Communications Commission (FCC), Memorandum Opinion and Order and Further Notice of Proposed Rule Making, Washington, D.C., 12 March 2003. [7] European Union, Commission of the European Communities, Decisions - Commission, Official Journal of the European Union, Brussels, 21 Apr. 2009, section 1.1. [8] J. D. Taylor, Ultra-Wideband Radar Technology, Boca Raton, Florida: CRC Press LLC, 2001, ch. 6. [9] I. Oppermann, M. Hämäläinen and J. Iinatti, UWB Theory and Applications, West Sussex, England: John Wiley & Sons Ltd. [10] P. Colleman, Digitale Technieken: EMC [Course], Geel, Belgium: Campinia Media VZW, 2008, pp. 25.

31 Positioning Techniques for Location Aware Programming on Mobile Devices M. De Maeyer 1, T. Larsson 2, P. Colleman 1 1 IBW, Katholieke Hogeschool Kempen, Kleinhoefstraat 2, B-2440 Geel, Belgium 2 ES, Högskolan Halmstad, Box 823, S Halmstad, Sweden Abstract To implement location aware programs for mobile devices two main techniques exist for defining the location: relative defined positioning techniques and absolute defined positioning techniques. Where the relative techniques rely on the limited range of the connection standards like Wi-Fi and Bluetooth, the absolute technique uses a GPS sensor and extra formulas to make conclusions. A combination of both can give interesting results because of the combined advantages of both techniques. T I. INTRODUCTION HESE days people use their phones for all different kinds of purposes, the smartphones are getting more and more powerful and are able to use a big set of sensors. This all leads to a whole new world of opportunities for developers. Using their smartphones, people are connected all day long. They receive messages, phone calls, s,, they retrieve information from the internet and connect with each other using different social networks. The information that a person can receive and use on these devices is overwhelming. But while the amount of information that is available is enormous, the problem shifts from offering information to letting a person find the right information. This information can be a lot of different things. It can be a search that a person wants to execute, it can be messages send by other persons, advertisements, To get the right information to a person different techniques can be used and are already used nowadays. These techniques are filtering information by date, language, tags, interests, but also filtering based on friends of a person or even interests of other persons is used. Another interesting approach to filtering content that can be used is the use of the location of a certain person. For example in advertising a person that is in a library would probably be more interested in seeing advertisements about books than seeing advertisements about fishing. But also a person that is looking at what his friends are doing on Facebook can be more interested in the friends that are close to his location, it even creates opportunities to send messages in a whole new kind of way, by sending messages to people in a location instead of to people you know. In this paper we will look into the techniques that can be used to fetch a location of a mobile device and to make conclusions using this knowledge. Firstly we will describe the related work to this paper, and next we will describe the two main techniques that can be used to fetch a location. Where after we will look into the extra knowledge that is needed to use this information. Then we will compare the two different techniques and form a conclusion about them. II. RELATED WORK A lot of research has already been done on how to use the location of a person or device to be used in advertising [1]. In this usage scenario a lot of money is invested and earned. In general there has been a lot of writings about location based services and how these can be implemented [2]. Other research has been done in how to implement location aware services on mobile devices. A good example of this is the PLASH platform: A Platform for Location Aware Services with Human Computation [3]. 
This platform describes a complete framework that developers can use to implement location aware services on. This platform is extremely broad and detailed, which makes it complete but also too big to use for smaller projects. But then again it can give a good idea of a structure for location aware projects. The PLASH platform is not dependent on the technologies that are used for the connection; another platform, developed by Qualcomm, is called AllJoyn [4], and this one prescribes the connection standard that should be used. It is a platform that works using Bluetooth or TCP over Wi-Fi or Ethernet, and it is a lot smaller than the PLASH platform. Some research has also been done in the area of so-called geographic routing, which makes use of a different type of routing protocol with a slightly larger packet header. This header adds 16 extra bytes that hold information on the user's physical location. These are split up into 8 bytes for the user's X coordinate and 8 bytes for the user's Y coordinate [5].
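Purely as an illustration of such a header layout (our own sketch; [5] does not prescribe this exact encoding), the 16 extra bytes could be represented as two 8-byte fields:

#include <cstdint>
#include <cstring>

// Illustrative geographic-routing header extension: 16 extra bytes holding
// the sender's physical location, 8 bytes per coordinate.  Whether the
// original proposal encodes them as IEEE-754 doubles or as fixed-point
// integers is an assumption made here for the sketch.
struct GeoHeader {
    double x;  // 8 bytes: user's X coordinate
    double y;  // 8 bytes: user's Y coordinate
};
static_assert(sizeof(GeoHeader) == 16, "header extension must add 16 bytes");

// Append the header extension to an outgoing packet buffer.
void append_geo_header(std::uint8_t* packet_tail, const GeoHeader& h) {
    std::memcpy(packet_tail, &h, sizeof(h));
}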

32 III. TECHNOLOGIES There are two different types of defining a location of a certain mobile device, the first is the relative defined positioning technique where a mobile device won t get a coordinate to be able to define it on a place on earth, but it will be positioned nearby another device. The second technique is then of course the absolute defined positioning technique, where a mobile device will be positioned in a 2D coordinate space covering the earth. In this way it is possible to define the exact location of a device. In this part we will explain the different possibilities for these two techniques, and shortly describe the characteristics of those techniques. Afterwards we will describe some extra knowledge that is needed to use these techniques. Then we will compare the techniques and make conclusions. A. Relative Defined Positioning Techniques. In this paragraph we will describe different techniques that can be used to define a location of a mobile device relative to that of another device. To do this one can use different kinds of connection techniques that don t have a big range. The first connection technique that one could use is Bluetooth. This connection technique is available in every mobile device these days and most of them use version 2 whereas the devices already supporting version 3 are still not widespread. Apart from this division there are also 3 different classes that are used for Bluetooth, and the one that is mostly supported in mobile devices is Class 2. This stands for connection distances up to something around 10 meters [6]. Using this connection some projects are already made, like the BEDD software which is a SOCIAL SOFTWARE AND FLIRT TOOL. It is a localized chatting application with some extra functionality [7]. Another technique that can be used is ad hoc Wi-Fi, which uses the Wi-Fi antenna that can be found in a lot of mobile devices nowadays. It makes a direct connection between different mobile devices without the need for any infrastructure. It is also known as peer to peer mode and has a range of 400 meters [8]. Other newer techniques are also possible, these are techniques like Wi-Fi Direct[9] or FlashLinq[10]. These are both still new techniques and are not yet supported by a big range of mobile devices. They both take care of easier connection set-up between multiple devices and a bigger range than the solutions that we mentioned earlier. We can get a bigger range using the previous techniques by using every mobile device as a node in a bigger network topology. In this way every node works as an end station and a routing device to be able to route traffic to the network. But using this technique also called Mobile Ad Hoc Network (MANET) the topology will be able to change a lot and will be unpredictable. For instance if the mobile device X in fig. 1 disappears or moves then the connection between part A and B will be lost. B. Absolute Defined Positioning Techniques To have an absolute defined position we need to place a mobile device in a certain location on the earth. The most logical method would be to use the GPS sensor that is available in most of the smartphones today. But there are also other techniques available like the home location register (HLR) in cellphones or the Geolocation API[11]. To get the current location of a mobile device one could use the GPS sensor in this mobile device. Therefore we can use the Geolocation API in HTML5 to fetch the coordinates of a certain device. Important is to see that the accuracy of a GPS Fig. 1. 
Unexpected splitting of one MANET in to two different ones position is around 10 meters under ideal circumstances. The use of HLR is not available without carriers permission, so we won t look in to this any further. The same applies to the use of the Geolocation API without use of the GPS sensor on the mobile device because the results are unsatisfying and depend on a network location provider server. IV. EXTRA KNOWLEDGE In this paragraph we will describe the extra knowledge that is needed to use the absolute defined positioning technique. To be able to use the location in a good way for a mobile device we don t need the absolute position of a device but we want to know what is in its neighborhood. In this way this device can act according to what is happening around it. For the relative defined positioning technique no further knowledge is needed, 26

because by using this, the devices in the neighborhood are already known and don't have to be retrieved anymore. But when we use the absolute defined positioning technique, we need to know what the relation is between device A and device B by using their coordinates. When we have the coordinates of a certain device A (with latitude A_lat and longitude A_long) and another device in an unknown location B (with latitude B_lat and longitude B_long), we can find the distance D between these two devices: D = sqrt((A_lat - B_lat)^2 + (A_long - B_long)^2). (1) But this gives us a distance in degrees, which is hard to work with. For example, if someone makes a chat client that displays messages only if people are closer than a hundred meters to each other, this won't give a good solution. Therefore we use a square with a certain size X (the length of the side of the square). In this way we can say that all the mobile devices in this square (B) around the sending device (A) are in the interest zone, and the devices (C) are not in our interest zone (Fig. 2 is a visual representation of this). Here we define the interest zone as a certain physical area which covers devices that are closer to the sending device than others. Fig. 2. Interest zone. To be able to make a distinction between the mobile devices in the interest zone and those outside of the interest zone, we need to find the coordinates of the vertices P, Q, R and S in Figure 3. But if we have the coordinates of the points P and R, or Q and S, this will be enough to define the square. This is the same as saying that we need the latitude of points T and V and the longitude of points W and U. (Because the latitudes of T and V are the same as the latitudes of respectively P and Q or S and R, and the longitudes of W and U are the same as the longitudes of respectively S and P or Q and R.) To find the latitude of T and V and the longitude of W and U we use the following formulas: NLat = asin(sin(latitude) * cos(D/R) + cos(latitude) * sin(D/R) * cos(angle)) (2) NLong = longitude + atan2(sin(angle) * sin(D/R) * cos(latitude), cos(D/R) - sin(latitude) * sin(NLat)) (3) Fig. 3. Finding formulas for T, U, V and W. These formulas are used to find the coordinates of a point (NLat, NLong) which lies a certain distance D from a point with known coordinates (latitude, longitude). The variable R is the radius of the earth and angle is the direction in which we measure the distance D. Using these formulas we can find the formulas for the points T, U, V and W: Tlat = asin(sin(latitude) * cos(D/R) + cos(latitude) * sin(D/R)) (4) Vlat = asin(sin(latitude) * cos(D/R) - cos(latitude) * sin(D/R)) (5)

Ulong = longitude + atan2(sin(D/R) * cos(latitude), cos(D/R) - sin^2(latitude)) (6) Wlong = longitude - atan2(sin(D/R) * cos(latitude), cos(D/R) - sin^2(latitude)) (7) Here we find the latitude of T (Tlat) and V (Vlat) and the longitude of U (Ulong) and W (Wlong). We used formula 2 with angle equal to 0 to get formula 4 and with angle equal to 180 to get formula 5, and we used formula 3 with respectively 90 and 270 as values for angle to get formulas 6 and 7. We also know that the latitude and NLat stay the same for these last two points, because they are on the same horizontal as the point A. So now we have found easy-to-use formulas to decide whether a certain mobile device is in the interest zone of a sender. Using this we can decide whether a mobile device should receive certain information or not, or whether we should rank certain information higher than other information. V. COMPARISON In this paragraph we make a comparison of the two different techniques that one can use to define the position of a mobile device, and we look into the differences between those techniques. Using the relative defined positioning techniques, all the connected devices are in the same region by definition. This is because of the limits on the range of the connection techniques. This limit in range is the biggest disadvantage of the technique, but it is also its biggest advantage, because unlike the absolute defined positioning technique, the relative technique does not need any further calculations to make conclusions about the interest zone. Using the absolute defined positioning technique we can easily make conclusions about very big interest zones; for example, when making a localized chatting service we can easily use it for a location as big as a whole city or province. On the other hand, this does not work very well when we want to use it in a small area like a room or even a building. To send a message in a square with a side of 10 meters, the difference in degrees between the sender (A) and an edge point (T, U, V, W) will only be , and this, together with the accuracy of a GPS, will be too small to make good conclusions about an interest zone. VI. CONCLUSION We looked into two positioning techniques for location aware programming: relative defined positioning techniques and absolute defined positioning techniques. Using the relative defined positioning technique, we explained that it is easy to use, without further calculations needed, because of the limit on the range. For this technique one can use ad hoc Wi-Fi or Bluetooth connections these days, but there are new and promising techniques coming in the near future. Using the absolute defined positioning technique, the location of a mobile device is obtained using the GPS sensor in the device. Using this technique we need to make calculations with the coordinates of different devices to be able to make conclusions about interest zones. It is possible to use this technique for big locations, but on the other hand it is impossible to use it in smaller locations like rooms or even buildings. We can conclude that for good results in every use case, a combination of both relative and absolute defined positioning techniques is the better solution. In this way software is able to send messages over short ranges using the relative technique and over long ranges using the absolute technique. ACKNOWLEDGMENT The authors would like to express their gratitude to everyone who supported them during the period in which they made this paper, and in particular Patrick Colleman, Tony Larsson and the Erasmus Programme. REFERENCES [1] B. Kölmel, Location Based Advertising, 2002. [2] J. H. Schiller, A. Voisard, Location Based Services, San Francisco: Elsevier. [3] Y. H. Ho, Y. C. Wu, M. C. Chen, PLASH: A Platform for Location Aware Services with Human Computation, IEEE Communications Magazine, December 2010. [4] Qualcomm Innovation Center Inc., AllJoyn. [5] J. C. Navas, T. Imielinski, GeoCast - geographic addressing and routing, MobiCom '97: Proceedings of the 3rd Annual ACM/IEEE International Conference on Mobile Computing and Networking, 1997. [6] Bluetooth Special Interest Group, Building with the Technology. [7] HiWave, BEDD Social Software and Flirt Tool. [8] Wi-Fi Alliance, Windows Tips and Techniques for Wi-Fi Networks. [9] Wi-Fi Alliance, Wi-Fi CERTIFIED Wi-Fi Direct. [10] Qualcomm Incorporated, FlashLinq: Discover your wireless sense. [11] A. Popescu, Geolocation API Specification, Feb.

35 Fixed-size kernel logistic regression: study and validation of a C++ implementation Alard Geukens 1, Peter Karsmakers 1,2 and Joan De Boeck 1 1 IBW, K.H. Kempen, B-2440 Geel, Belgium 2 ESAT-SCD/SISTA, K.U.Leuven, B-3001 Heverlee, Belgium Abstract This paper describes an efficient implementation of Fixed-Size Kernel Logistic Regression (FS-KLR) suitable to handle large scale data sets. Starting from a verified implementation in MATLAB, cpu load and memory usage is optimized using the C++ language. Since most of the computation time is spend on performing linear algebra, first a number of existing linear algebra libraries are summarized and empirically compared to each other. Finally, the MATLAB and C++ implementations of FS-KLR are compared in terms of speed and memory usage. I. Introduction IN machine learning, classification is assigning a class label to an input object, mostly a vector. This classification rule is found using a set of input vectors (x 1, y 1 ),..., (x N, y N ), where x is the input vector and y is a label denoting one of C classes. This is called training the model. Logistic Regression (LR), and its non-linear kernel based extension Kernel Logistic Regression (KLR) are are well known methods for classification that will determine a-posteriori probabilities of membership in each of the classes based on a maximum likelihood argument [1]. Because there needs to be worked with big data sets, a fixed-size variant of KLR is used to save recourses. This is then called Fixed-Size KLR (FS-KLR). An algorithm of FS-KLR is already implemented in MATLAB. The disadvantage of MATLAB is that you do not have full control of the internal working which leads to an implementation with suboptimal computational performance and memory usage. For this reason, an implementation in C++ is chosen here for a better performance in terms of speed and memory usage. Because C++ is one of the most used programming languages, it has many support. It has very advanced compilers that can do many optimizations to produce efficient machine code. Because C++ does not use a garbage collector, there is much more control over the memory usage. It is known that most of the computation time in the FS-KLR algorithm is spend on performing linear algebra. This is why the best linear algebra library is searched for C++ and used in the implementation. By doing this, we hope to outperform the MATLAB implementation. This paper is organized as follows. In section II, a short overview of kernel-based learning is given. Next the large scale KLR method is briefly reviewed for the binary case. Section IV summarizes the available linear algebra libraries. Then, in Section V, some implementation related issues are discussed. In the experimental section MAT- LAB and C++ implementations are compared in terms of speed and memory usage. Section VII discusses the results. Finally, we conclude in Section VIII and mention future work. II. Kernel-based learning Kernels are used to solve problems linearly in a feature space since this is more efficient than solving it non-linear in the original space. For example, in Figure 1 a set of data is shown, where each point belongs to one of two classes, that needs to be seperated. In the original input space this can only be done non-linear, but via a feature map ϕ( ) the points are mapped to a different space where they then can be linearly separated. Fig. 1. From input space to feature space. 
Define the learning problem where data vectors only appear in a dot product, then the feature map can be defined implicitly using a kernel function which defines a kernel matrix Ω ij = K(x i, x j ) = ϕ(x i ) T ϕ(x j ), i, j = 1,..., N where x R D are the input vectors and ϕ(x) R Dϕ are the vectors mapped in feature space. Any valid kernel function K : R D R D R corresponds with an inner product in a corresponding feature space as long as the function K is positive semi-definite [2]. In the experiments, the Radial Basis Function (RBF) kernel ( x x K(x, x 2 ) 2 ) = exp (1) σ 2 is used, where σ is a tuning parameter. III. Fixed-size kernel logistic regression In order to make it easier for the reader, the FS-KLR is explained for the binary case, so C = 2. A similar deriva- 29

36 tion for the multi-class case is given in [1]. The aim of logistic regression is to produce an estimate of the a-posteriori probability of membership in each of the classes for the given vector. Suppose we have a random variable (X, Y ) R D {1,...,C} where D is the dimensionality of the input vectors, and C is the number of classes. The posterior class probabilities for the binary case are estimated by a logistic model given as P (Y = 1 X = x;w) = 1 1+exp(w T x+w 0) P (Y = 1 X = x;w) = exp(wt x+w 0) 1+exp(w T x+w 0) where w R D. For a non-linear extension, the inputs x are first mapped to a different space. Hence x in (2) is replaced by ϕ(x). In [1] it is shown that the feature map can be approximated by M ˆϕ i (x ) = λ s i M j=1 (2) (u i ) j (x j, x ) (3) to save computer resources where λ s i are the eigenvalues and u j are the eigenvectors of the kernel matrix Ω. M is a subsample of the N input vectors called prototype vectors (PVs) and are randomnly selected in the experiments. In the sequel, the ˆϕ(x) are augmented with a 1 such that the intercept term w 0 is incorporated in the parameter vector w for notational convenience. Equation (2) now becomes 1 P (Y = 1 X = x;w) = 1+exp(w T ˆϕ(x)) P (Y = 1 X = x;w) = exp(wt ˆϕ(x)) 1+exp(w T ˆϕ(x)) By choosing y i { 1,1}, (4) can be rewritten as P (Y = y i X = x i ; w) = (4) exp( y i w T ˆϕ(x i )), (5) which gives the same result as (4). Parameter w is inferred by maximizing the log likelihood max w N l(w) = max ln P (Y = y i X = x i ; w) (6) w i=1 Because very flexible functions are considered, a penalized negative log likelihood (PNLL) is used which adds a regularization term and results in ν min w lν = min w 2 wt w ln N P (Y = y i X = x i ; w) (7) i=1 where ν is a regularization parameter that needs to be tuned. Specializing (7) for KLR leads to the global objective function which can be written as min w ν lν K = min w N 1 2 wt w ln 1 + exp( y i w T ˆϕ(x i )) ν = min w 2 wt w + i=1 N ln(1 + exp( y i w T ˆϕ(x i ))) (8) i=1 In order to solve (8) a Newton Trust Region optimization method can be used. Here a tentative solution is iteratively updated by a step s (k) as follows w (k) = w (k 1) + s (k). (9) This step s (k) is obtained by minimizing a second order Taylor approximate of l ν K, subject to a trust region constraint which results in the following optimization problem a (k) = min s (k) g (k)t s (k) s(k)t H (k) s (k) such that s (k) < (k), (10) at iterate w (k) and a trust region (k), with H the Hessian and g the gradient of the objective function (8). The direction s (k) is accepted if ρ (k), the ratio of the actual reduction to the predicted reduction of the objective function, is large enough. To solve the constrained minimization problem of (10) a conjugate gradient method is used [1]. The gradient of (8) is given by g = N y i ˆϕ(x i )(P (Y = y i X = x i ;w)) + νw (11) i=1 If we define Φ = [ ˆϕ(x) 1 ˆϕ(x) 2... ˆϕ(x) N ] T, p = [ P (Y = y 1 X = x 1 ; w);... ; P (Y = y N X = x N ; w)] T, y = [y 1,...,y N ] T then we can write g = Φ T q + νw (12) where q i = p i y i, i = 1,...,N. The Hessian of (8) in case i equals j is defined as 2 l ν K (w) w i w j = N ˆϕ(x i ) ˆϕ(x i ) T P (Y = y i X = x i ;w) i=1 (1 P (Y = y i X = x i ;w)) + ν (13) for i, j = 1,..., N. If i is not equal to j, then the same result is obtained but without the term ν. Define v i = P (Y = y i X = x i ; w)(1 P (Y = y i X = x i ; w)), then V = diag(v 1,..., v N ). 
The Hessian is then given by H = Φ^T V Φ + νI. (14) Details about the derivations of g and H are given in Appendix A. Algorithm 1 summarizes the FS-KLR trust region algorithm.
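As a concrete illustration of the kernel quantities used above (our own C++ sketch, independent of the MEX-based implementation discussed later), the RBF kernel of equation (1) and the corresponding kernel matrix Ω can be computed as follows; note that the σ² denominator follows the form used in the paper.

#include <cmath>
#include <vector>

// RBF kernel as in equation (1): K(x, x') = exp(-||x - x'||^2 / sigma^2).
double rbf_kernel(const std::vector<double>& a,
                  const std::vector<double>& b, double sigma) {
    double sq = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sq += diff * diff;
    }
    return std::exp(-sq / (sigma * sigma));
}

// Kernel matrix Omega_ij = K(x_i, x_j) for a set of input vectors.
std::vector<std::vector<double>> kernel_matrix(
        const std::vector<std::vector<double>>& x, double sigma) {
    const std::size_t n = x.size();
    std::vector<std::vector<double>> omega(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i; j < n; ++j)
            omega[i][j] = omega[j][i] = rbf_kernel(x[i], x[j], sigma);
    return omega;
}

In the fixed-size setting, the eigenvalue decomposition of this matrix over the M prototype vectors supplies the λ and u terms of the feature approximation in (3).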

37 Algorithm 1 FS-MKLR 1: Input: training data D = (x i,y i ) N i=1,m 2: P arameters : w (k) 3: Output: probabilities P r(x = x i Y = y i ;w opt,i = 1,...,N and w opt is the converged parameter vector 4: Initialize : k := 0,w(0) = 0 CM 5: Define : g (k) and H (k) according to resp. (12)(14) 6: PV selection 7: compute features Φ 8: repeat 9: k := k : compute P r(x = x i Y = y i ;w (k 1) ),i = 1,...,N 11: calculate g (k) 12: min s (k) g (k)t s (k) s(k)t H (k) s (k) such that s (k) (k) 13: compute ρ (k) 14: w (k+1) = w (k) + s (k) 15: obtain (k+1) 16: until convergence The features Φ in step 7 are the mapped input vectors computed by (3). The model hyper-parameters ν and σ are tuned by cross-validation with a grid search method [1]. Specifically in the multi-class case this Newton trust region approach is useful since in this case, the size of the Hessian H R (Dϕ+1)C (Dϕ+1)C will be proportional to C and the feature vector lengths D ϕ + 1 [1]. For large scale multi-class data this matrix is then too large to be stored. In case of a Newton Trust Region algorithm the Hessian is always used in a product with a vector d. This fact can be used to exploit the structure of the Hessian and therefore storage of the full Hessian is not needed [1]. IV. Linear algebra libraries Since the FS-MKLR method is mostly based on linear algebra a short survey on the available C++ libraries is given. A. BLAS and LAPACK BLAS (Basic Linear Algebra Subprograms) are standard routines for basic vector and matrix operations. The operations are subdivided into 3 levels. The Level 1 BLAS are the routines for scalar and vector, Level 2 for matrixvector and Level 3 for matrix-matrix operations. A higher level makes use of the lower levels, for example a Level 3 operation uses also Level 1 and Level 2 BLAS. LAPACK (Linear Algebra Package) is a standard for routines such as eigenvalue decomposition, that we ll need for our implementation. LAPACK makes use of the BLAS as much as possible. So the faster the BLAS routines are, the faster LAPACK will work. Both BLAS and LAPACK only define the functionality and interfaces, the actual implementation is not standardized [3], [4]. A reference BLAS written in Fortran77 can be found on a reference LAPACK written in Fortran90 can be found on These do not support multithreading. B. Optimized BLAS There exist many implementations for different architectures of the BLAS which also all support multithreading: ATLAS (Automatically Tuned Linear Algebra Software) will create optimized BLAS by automatically choosing the best algorithms for the architecture during compile time. ATLAS contains also a few LA- PACK routines [5]. The source code is found on GotoBLAS tries to use the memory as efficient as possible. The library gets optimized for the architecture it gets compiled on. The basic subroutines are developed in assembly language [6]. On the source code is found. Intel Math Kernel Library (MKL) contains Intel s implementation of BLAS and LAPACK. So they are very optimized for Intel processors. This library is not free, but an evaluation version can be downloaded on ACML (AMD Core Math Library) contains AMD s implementation of the BLAS and LA- PACK. Good performance is expected on AMD processors. ACML can be found on /Pages/default.aspx. C. Other Libraries Many other linear algebra libraries exist such as ublas, Seldon and Lapack++ which all use an implementation of BLAS and LAPACK with a C interface. 
GPU implementations of BLAS can be found too which can be much faster than the CPU implementations [7]. We will not use these because specific video cards are needed. D. MATLAB MATLAB uses BLAS for among others matrix-vector and matrix-matrix multiplication and LAPACK for eigenvalue decomposition. The release notes of MATLAB on mention that MATLAB uses MKL since version on Intel processors and ACML on AMD processors for an optimized BLAS on Windows systems. The used LAPACK is version since MATLAB version 7.6. This is a reference implementation, so it is not optimized. V. FS-MKLR Implementation It is known from analyzing the implementation in MAT- LAB that almost all time in the algorithm is spend in the eigenvalue decomposition, the matrix-vector and matrixmatrix multiplication. That is why specifically for these operations the fastest libraries are searched and used in our implementation. MEX-files [8] are used to have a MATLAB interface. This way data can easily be entered from MATLAB. It also makes it easier to test MATLAB and C++ with the same test data and compare the results. The disadvantage is that the MATLAB software 31

38 still needs to run the program, which takes some extra memory. The training of the model and the tuning of the hyper-parameters are put in two MEX-files. In the tuning function, multithreading is used on a high level. For different values of the hyper-parameter σ in the grid search, different threads are used up to the number of cores the processor has. In the training function, multithreading is used in the BLAS and LAPACK libraries. VI. Experiments First, empirical evaluation of BLAS and LAPACK libraries is performed in order to choose the fastest library for a set of individual operations. Later on, these libraries will be used when comparing the MATLAB and C++ FS- KLR implementations in terms of computation speed and memory usage. Note that all measurements are averaged over 5 runs. Each of the experiments are performed on an Intel Core 2 duo E Ghz with 2GB RAM with Windows XP SP2 and MATLAB R2009a (version 7.8). To compile the C++-code, the Visual C compiler is used. More experiments (including those on another architecture) can be found in [9]. with 123 attributes that belong to 2 different classes. The time is measured while tuning and training with 1000, 2000,..., inputvectors. The memory usage is measured while tuning and training the training set of isolet found on It consists of 6238 input vectors with 617 attributes belonging to 26 classes. For both implementations, the memory that the MATLAB software uses, is not measured. In the future when the implementation in C++ is totally independent of MATLAB, extra memory will be saved compared to MATLAB. In both tests the number of prototype vectors is 500 and 5 folds are used in the cross-validation. A. BLAS and LAPACK VII. Results and Discussion A. BLAS and LAPACK The tested BLAS libraries are: MKL version ACML version Reference BLAS version GotoBLAS version 1.26 Lapack++ ublas BLAS library of MATLAB R2009a (version 7.8) Note that ATLAS is not tested as we did not succeed to compile it on Windows. The function in BLAS for the matrix-vector multiplication for double precision is DGEMV, for matrix-matrix multiplication it is DGEMM. The tested LAPACK libraries are: MKL version ACML version Reference LAPACK version with gotoblas LAPACK library of MATLAB R2009a (version 7.8) Fig. 2. Time results of matrix-vector product. For the eigenvalue decomposition, the routine DSYEV is used. This function is used for the eigenvalue decomposition of a symmetric matrix, as this will always be the case in the FS-KLR algorithm. MATLAB uses this function automatically when the matrix is symmetric. All the matrices in the tests are square, the time of the operations will be measured with different sizes of the matrices. The libraries are only tested single-threaded, as we wish to multithread on a higher level in the implementation. B. FS-MKLR Implementation The training set of the data set a9a on cjlin/libsvmtools/datasets/ is used for time results. It consists of input vectors Fig. 3. Time results of matrix-matrix product. The results of the tests on the BLAS and LAPACK libraries are shown in Figure 2, 3 and 4. For this architecture MKL has the best BLAS and LAPACK library. The BLAS library of MATLAB performs almost the same because it uses MKL too. But because MATLAB only uses a reference LAPACK, time can be won in the implementation for 32

39 Fig. 5. Time results of tuning the hyper-parameters Fig. 4. Time results of eigenvalue decomposition. the eigenvalue decomposition. It is remarkable that the gotoblas library does not perform any better than the reference BLAS for the matrix-vector multiplication. Though it performs only a bit worse than MKL for the matrixmatrix multiplication. It is obvious that lapack++, Seldon and ublas are not optimized. That is also the reason why it was not tested anymore for the eigenvalue decomposition. ACML does well for the matrix-vector multiplication, but is almost four times slower than MKL for the matrix-matrix multiplication. For this architecture, MKL is chosen as BLAS and LA- PACK library. But as is indicated in [9], a different architecture can give other results. The best thing you can do is test it for yourself which one performs best. Using another library hardly needs changes. Fig. 6. Time results of training the model. B. FS-MKLR Implementation The time results are shown in Figure 5 and 6. For a small amount of input vectors, MATLAB is a bit faster singlethreaded. But the more input vectors, the faster C++ gets in proportion to MATLAB. From 4000 input vectors on the implementation in C++ is already faster. This is because the eigenvalue decomposition gets more important the more input vectors there are. With input vectors the tuning is already 7% faster and the training of the model 11%. With multithreading (using 2 threads) the tuning performs around 35% better than single-threaded. In MATLAB, the multithreading is done in the BLAS library for tuning and training. Because everytime the BLAS library is called, threads need to be created and deleted, the performance improvement is only around 15% for tuning. That is also the reason why the training of the model does not improve that much for the implementations in C++ (around 20%) and MATLAB (around 10%). The memory results for tuning are shown in Figure 7. To keep de figure clear, only the first 10% of the tuning is shown, the further progress is about the same. The implementation in C++ needs on average 24 MB less than MATLAB and has a peak of MB. MATLAB has a memory peak of MB. The memory results for training the model are less good (Figure 8). Here C++ needs on average 13 MB more than MATLAB with a peak of 142 MB. The implementation in MATLAB has a peak of 124 MB. Fig. 7. Memory usage of tuning the hyper-parameters. 33
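For reference, the BLAS and LAPACK routines named in the experiments (DGEMM for the matrix-matrix product and DSYEV for the symmetric eigenvalue decomposition) can be invoked from C++ through the CBLAS and LAPACKE interfaces as in the sketch below. This is our own illustration; the thesis implementation may call the Fortran interfaces of the tested libraries directly, and header names differ between vendors.

#include <vector>
#include <cblas.h>    // C interface to BLAS (assumed available in the chosen library)
#include <lapacke.h>  // C interface to LAPACK (assumed available)

int main() {
    const int n = 500;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

    // C = 1.0 * A * B + 0.0 * C  (Level 3 BLAS, DGEMM)
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A.data(), n, B.data(), n, 0.0, C.data(), n);

    // Eigenvalue decomposition of a symmetric matrix (LAPACK, DSYEV).
    std::vector<double> S(n * n, 0.0), w(n, 0.0);
    for (int i = 0; i < n; ++i) S[i * n + i] = 1.0;  // simple symmetric example
    LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', n, S.data(), n, w.data());
    return 0;
}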

Fig. 8. Memory usage of training the model.

VIII. Conclusion and Future Work

We can conclude that our C++ implementation outperforms the MATLAB implementation in terms of speed. The gain in speed of the C++ implementation over the MATLAB alternative increases as the number of training data points increases. The best libraries depend on the architecture they run on. Because MATLAB only uses the reference LAPACK, a better library is likely to be found on each architecture.
The memory usage is something that needs to be improved in the future. This can be done by removing matrices from memory as soon as they are no longer used. The next step is to make the implementation completely independent of MATLAB, so the MATLAB software no longer needs to run, which saves memory. By using a profiler, other slow operations in the implementation can be identified and, where possible, further optimized.

Appendix A. Derivation of gradient and Hessian

The gradient of (8) is given by

g = \frac{\partial l_{\nu}^{K}(w)}{\partial w}
  = -\sum_{i=1}^{N} \frac{y_i\,\hat{\varphi}(x_i)\,\exp(-y_i w^T \hat{\varphi}(x_i))}{1+\exp(-y_i w^T \hat{\varphi}(x_i))} + \nu w
  = -\sum_{i=1}^{N} \frac{y_i\,\hat{\varphi}(x_i)}{1+\exp(y_i w^T \hat{\varphi}(x_i))} + \nu w    (15)
  = -\sum_{i=1}^{N} y_i\,\hat{\varphi}(x_i)\,\bigl(1 - P(Y=y_i \mid X=x_i; w)\bigr) + \nu w

The Hessian of (8) is defined by

H = \begin{pmatrix}
\frac{\partial^2 l_{\nu}^{K}(w)}{\partial w_0\,\partial w_0} & \cdots & \frac{\partial^2 l_{\nu}^{K}(w)}{\partial w_0\,\partial w_{D_{\varphi}}} \\
\vdots & & \vdots \\
\frac{\partial^2 l_{\nu}^{K}(w)}{\partial w_{D_{\varphi}}\,\partial w_0} & \cdots & \frac{\partial^2 l_{\nu}^{K}(w)}{\partial w_{D_{\varphi}}\,\partial w_{D_{\varphi}}}
\end{pmatrix}    (16)

Starting from equation (15), using d(f/g) = (g\,df - f\,dg)/g^2 and the chain rule, we derive the Hessian for i = j:

\frac{\partial^2 l_{\nu}^{K}(w)}{\partial w_i\,\partial w_j}
  = \frac{\partial}{\partial w^T}\Bigl(-\sum_{i=1}^{N} \frac{y_i\,\hat{\varphi}(x_i)}{1+\exp(y_i w^T \hat{\varphi}(x_i))} + \nu w\Bigr)
  = \sum_{i=1}^{N} \frac{y_i^2\,\hat{\varphi}(x_i)\,\hat{\varphi}(x_i)^T\,\exp(y_i w^T \hat{\varphi}(x_i))}{\bigl(1+\exp(y_i w^T \hat{\varphi}(x_i))\bigr)^2} + \nu
  = \sum_{i=1}^{N} \hat{\varphi}(x_i)\,\hat{\varphi}(x_i)^T\,P(Y=y_i \mid X=x_i; w)\,\bigl(1-P(Y=y_i \mid X=x_i; w)\bigr) + \nu    (17)

for i, j = 1, . . . , N. If i is not equal to j, then the same result is obtained but without the term ν.

References
[1] Karsmakers P., Sparse kernel-based models for speech recognition, PhD thesis, Faculty of Engineering, K.U.Leuven (Leuven, Belgium), May 2010, 216 p.
[2] M. Aizerman, E. Braverman, L. Rozonoer, Theoretical foundations of the potential function method in pattern recognition, Automation and Remote Control, Vol. 25, 1964.
[3] L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley, An Updated Set of Basic Linear Algebra Subprograms (BLAS).
[4] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, third edition.
[5] R. C. Whaley and A. Petitet, Minimizing development and maintenance costs in supporting persistently optimized BLAS, Software-Practice and Experience, Volume 35, Issue 2, 2005.
[6] K. Goto, R. A. Van De Geijn, Anatomy of High-Performance Matrix Multiplication, ACM Transactions on Mathematical Software, Vol. 34, 2008.
[7] Francisco D. Igual, Gregorio Quintana-Orti, and Robert van de Geijn, Level-3 BLAS on a GPU: Picking the Low Hanging Fruit, FLAME Working Note 37, Universidad Jaume I, Depto. de Ingenieria y Ciencia de Computadores, Technical Report DICC.
[8] Bai, Y., Using MEX-files to Interface with Serial Ports, The Windows Serial Port Programming Handbook, Auerbach Publications, USA, 2004.
[9] Geukens, A., Fixed-Size Kernel Logistic Regression, 2010.

Uniformity tests and calibration of a scintillator coupled CCD camera using an AmBe neutron source

L. J. Hambsch 1,2, J. De Boeck 2, P. Siegler 1, R. Wynants 1, and P. Schillebeeckx 1
1 European Commission JRC-IRMM, Retieseweg 111, B-2440 Geel, Belgium
2 Katholieke Hogeschool Kempen, Kleinhoefstraat 4, B-2440 Geel, Belgium

Abstract A neutron sensitive scintillator coupled to a CCD camera based image system has been tested with an AmBe neutron source. Using polyethylene slabs as moderator for the neutrons and a Cd foil as a collimator, a well defined circular neutron spot was created. By scanning different positions on the scintillator, the response of the whole system was measured and analyzed with respect to uniformity over the entire sensitive area. The results show that due to the large amount of noise induced by the internal electronics, the camera is not suitable for experiments resulting in very low measurable intensities, but has advantages compared to the traditional photo plates when using stronger sources such as a linear accelerator.

I. INTRODUCTION

At IRMM a linear electron accelerator with a maximum energy of 150 MeV is used to produce neutrons with a white spectrum in a rotary, depleted uranium target. This target is placed in the center of a heavily shielded concrete hall with openings in a horizontal plane. Neutrons leave the target hall via evacuated aluminium tubes leading to the experimental areas. Since those neutron beams travel long distances through the flight tubes, collimators and experimental equipment, it is important to determine the position of the neutron beam as well as the neutron density distribution of the beam spot.
These neutron beam profiles are usually measured by exposing traditional photo plates to the gamma-ray flux of the beam. This is a rather time consuming process and gives only indirect and limited qualitative information on the neutron density distribution. The use of a scintillator coupled to a CCD camera to capture images of the neutron beam directly could be envisaged to replace the traditional photo plates. First of all, capturing a beam profile using a CCD camera is generally faster than using a photo plate. Also, no special requirements such as wet chemical development are needed to use a CCD camera. Secondly, the biggest advantage of a scintillator coupled CCD camera is that, in effect, the neutrons themselves are registered, whereas the photo plate registers only the gamma radiation presumably originating from the same source as the neutrons. This enables us to get images of beams where thick high-Z material filters are applied to fully suppress gamma radiation, but at the same time allow neutrons to penetrate. Using photo plates, the measurement would imply a disassembly of the beam line in order to remove the filter to take a picture of the beam position. Finally, a CCD camera can be used multiple times and provides digital data directly to the user, making a quantitative analysis of the image possible.
Still the question remains whether the CCD camera is actually more accurate at capturing beam profiles, since the use of electronics automatically brings additional sources of noise to the data, which are nonexistent with photo plates. To investigate the impact of these sources on the final result, we designed an experiment to verify the uniformity of the sensitivity of the camera.
In the following sections we demonstrate why the use of scintillator coupled CCD cameras in neutron beam profiling has advantages over the use of photo plates, but we also point out the limitations discovered during our experiments and the impact they have on our results, such as the noise introduced by the internal electronics.

II. MEASUREMENT SETUP

A lens coupled Coolview IDI neutron imager from Photonic Science was used in the present experiment. It provides an image with a resolution of 1392x1040 pixels covering an input area of 267 x 200 mm². The readout of the CCD has a depth of 12 bits, allowing the image to have theoretically 4096 different intensities. The scintillator used is a 6LiF/ZnS:Ag screen, 0.4 mm thick and protected by a 500 µm aluminium membrane. This membrane also shields the scintillator from natural light. The light from the scintillator then passes a 45 degree high reflectivity mirror and a close focus lens. This enables placement of the electronics outside of the beam to minimize scattering of incident neutrons. Before reaching the CCD chip, the light passes an intensifier which can be gated by an external TTL signal [2]. Figure 1 shows a sketch of this setup.
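As a quick sanity check of these specifications, the pixel pitch and the number of intensity levels follow directly from the quoted figures; the short snippet below (illustrative Python, not part of the original LabVIEW tooling) works them out.

# Derived quantities from the quoted camera specifications (illustrative check).
width_mm, height_mm = 267.0, 200.0      # sensitive input area
width_px, height_px = 1392, 1040        # CCD resolution
bit_depth = 12

print(f"pixel pitch: {width_mm / width_px:.3f} x {height_mm / height_px:.3f} mm/pixel")
print(f"intensity levels: {2 ** bit_depth}")   # 4096, matching the text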

Figure 1. Schematic drawing of the insides of the used neutron camera system.

Since the calibration of the scintillator requires a constant intensity and a homogeneous distribution of the incoming neutron beam, the use of an AmBe neutron source was preferred over the IRMM Geel Electron Linear Accelerator (GELINA). The strongest source available at the laboratory was an AmBe source with an intensity of 37 GBq, resulting in neutrons/s. This intensity already required careful safety measures concerning the neutron and gamma radiation. The AmBe source has by definition a constant neutron intensity, whereas an accelerator can show fluctuations of the neutron intensity as a function of time. The problem with a source setup is that the emission of radiation is isotropic and not a linear beam. This means that the radiation goes everywhere and can cause a lot of scattering. To cope with this, the source was shielded by B4C and lead on the back side, whereas on the front, towards the camera, a 12 cm thick paraffin layer was placed. This created a thermalized and homogeneous neutron flux. A 1 mm thick cadmium plate with a hole of 6 mm diameter was placed in front of the camera to create a small neutron beam. Figure 2 shows the experimental setup.
In order to perform a scan of the scintillator area of the camera, the source had to be moved across the sensitive area of the camera. Since the source with its shielding weighs more than 200 kg, we chose to move the camera instead. To achieve this, we designed a custom made table that allowed us to move the camera horizontally and vertically without changing the source configuration (Figure 2). With this setup, an array of 30 different positions was measured, resulting in a 5x6 point dataset. The points are separated 4 cm from each other horizontally and 3 cm vertically, covering an area of 24x15 cm² on the scintillator screen.

III. ANALYSIS

For the camera control and readout a LabVIEW program was developed. Starting from the basic image acquisition functions, the graphical user interface allows the user to control the camera settings and process the received data. The raw CCD images are corrected with dark frames in order to eliminate the static noise on the acquired images, consisting of amplifier glow and hot pixels. An algorithm was also implemented to eliminate random hot pixels showing up during acquisition. To extract quantitative data from the images, the possibility to add circular areas to the standard cursors has been implemented. This is necessary to reduce the statistical fluctuations between data points and to obtain an average intensity over an area instead. The possibility to add or subtract different data sets and save the results has also been implemented, giving the user the basic tools to analyze and compare different datasets. The results of a 20 minute acquisition session from our test series can be seen in Figure 3. The spot resulting from our neutron source is barely noticeable near the center of the image, just above the noise level.

Figure 2. Experimental setup on a custom made table allowing movement of the camera while leaving the source setup untouched.
Figure 3. Raw image of the AmBe beam spot (20 minute acquisition time). Intensities of the amplifier glow in the lower part reach over 800 counts/pixel, whereas the beam spot barely stands out from the background noise (inside the red circle).
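The correction chain described above (dark-frame subtraction, hot-pixel removal and averaging over a circular region of interest) was implemented in LabVIEW. The Python/NumPy sketch below only illustrates the same steps on synthetic data; the threshold, radius and image values are assumptions, not the settings of the actual program.

import numpy as np
from scipy.ndimage import median_filter

def correct_image(raw, dark, hot_threshold=50.0):
    """Dark-frame subtraction followed by a simple hot-pixel filter."""
    img = raw.astype(float) - dark.astype(float)
    local_med = median_filter(img, size=3)
    hot = np.abs(img - local_med) > hot_threshold      # isolated outliers only
    img[hot] = local_med[hot]
    return img

def circular_mean(img, center, radius=50):
    """Average intensity inside a circular region of interest (counts/pixel)."""
    yy, xx = np.ogrid[:img.shape[0], :img.shape[1]]
    mask = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    return img[mask].mean()

# Synthetic example: a faint 40 counts/pixel spot on a ~200 counts/pixel background.
rng = np.random.default_rng(1)
dark = rng.normal(200.0, 10.0, (1040, 1392))
raw = dark + rng.normal(0.0, 5.0, dark.shape)
raw[470:590, 650:770] += 40.0                          # fake 120x120 pixel beam spot
raw[np.unravel_index(rng.integers(0, raw.size, 200), raw.shape)] += 1000.0  # hot pixels
print(f"spot: {circular_mean(correct_image(raw, dark), (530, 710)):.1f} counts/pixel")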

Table 1. List of the 5x6 matrix with the values of the measured intensities in a radius of 50 pixels, covering the complete region of interest.
Figure 4. Image data after dark frame subtraction. Now the beam spot is clearly visible, showing an average of 40 counts/pixel.

In Figure 3 one can clearly notice the strong amplifier glow [3] located in the bottom corners, showing more than 800 counts/pixel compared to the unaffected region of only 200 counts/pixel. This has a large impact on our results, as the AmBe source provides only a small neutron intensity, requiring relatively long exposure times. In Figure 4, the dark frame has been subtracted and the hot pixels have been filtered out. The resulting image shows the beam spot much more clearly than the raw image and enables us to extract numerical data from it.

Figure 5. Superposition of all data points placed next to each other in their correct positions.

Figure 5 shows a superposition of the full scan of the scintillator. The measured spot intensities of the scan are given in Table 1. Each dataset has an exposure time of 1200 seconds. To read out the numerical intensities, a circular area with a radius of 50 pixels was used, and the data points inside this area are added together. This was necessary to minimize the influence of statistical fluctuations present in the data.
Figure 6 shows a plot of the intensities for the 5x6 matrix scan based on the values of Table 1. Examining the graph, one can clearly notice a steep drop of the measured intensity at all edges except the upper one. This drop in intensity is attributed to the dark frame subtraction performed on the images to remove the amplifier glow induced on the image by internal components of the control electronics of the CCD chip. When comparing the raw image in Figure 3 with the intensity graph in Figure 6, the similarity between areas of noise and areas of low neutron intensity is evident.

Figure 6. Graphical representation of the intensity data presented in Table 1, clearly showing the drop in intensity at the edges.

Looking at these results from a different angle, one can conclude that, because there is a lot of amplifier glow on those sides where the noise producing components are located, the small intensity of the captured neutrons is drowned in the noise of the amplifiers. Considering that the dark frame has intensities of more than 400 counts/pixel, whereas the AmBe neutron source intensities only reach a value of 40 counts/pixel (or 0.03 counts/pixel/s) at best, the camera is quite insensitive to low neutron intensities at the edges where the amplifiers are located.

Figure 7. LINAC beam image with 600 s exposure time and no filters in place, at a distance of 30 m from the source.

IV. DISCUSSION

Considering the relatively weak AmBe source used when compared to a linear accelerator beam, the camera shows in general a good sensitivity to even weak neutron sources. During our measurements it was shown that the camera produces a considerable amount of background noise, and this certainly needs to be taken into account when looking at low intensity results with respect to quantitative analysis of beam profiles. However, for its main purpose, beam profiling, the intensities will be high enough that one can neglect the influence of the noise after image processing. As can be seen in Figure 7, the intensity was about 2.4 counts/pixel/s in the center of the collimation, and this after only 10 minutes of exposure with a beam from GELINA at 30 m distance from the source.
Looking at Figures 4 and 7, we can compare the neutron intensities of the AmBe source with the neutron beam from GELINA at 30 m. From a quantitative analysis of the images we conclude that the AmBe source delivers 78 times fewer neutrons/cm²/s at the camera than the accelerator. Considering this, we can conclude that the camera serves its purpose well, as long as the neutron intensities are high enough to overcome the background noise induced by the camera. Hence, wherever a photo plate could be used in the past, the intensities are surely large enough to get a clear image with the new scintillator coupled CCD camera.

V. CONCLUSION

The linearity measurements with the AmBe source have given satisfactory results for the use of the neutron camera as a tool for beam profiling. The influence of noise, especially when used at low neutron intensities, implies limitations particularly at the lower edges of the neutron sensitive area. As stated in the previous paragraph, the results of our experiment show that the small intensities produced by the AmBe source are drowned in noise and thus are strongly affected by the amplifier glow. In conclusion, the camera is best used with strong neutron sources such as those available at the GELINA beam, in order to obtain high intensities that stand out clearly from the background noise after image processing. For low neutron intensities, one should focus on the upper center of the sensitive screen to achieve the best results, because there the amplifier glow has the least influence.
For general beam profiling or object positioning, the camera is perfectly suited and has big advantages over traditional photo plates, especially because the time needed for getting results is very short, being only a few minutes with the GELINA beam on. In the future, an activation analysis method using thin gold foils will be performed in parallel with the scintillator coupled CCD camera in order to compare the results of the measured neutron flux from the GELINA beam with the image intensities of the camera.
This will allow for an absolute calibration of the measured neutron intensities.

REFERENCES
[1] Put, S. (2005, February 2). Accelerators and time-of-flight experiments. Geel, Belgium: BNEN.
[2] Photonic Science. (n.d.). Lens Coupled Coolview IDI Neutron Imager User Manual. United Kingdom: Photonic Science.
[3] Martinec, E. (2008, May 22). Noise, Dynamic Range and Bit Depth in Digital SLRs. Retrieved February 22, 2011.
[4] Hambsch, L. J. (2011). Neutron Imaging using a CCD-Coupled Scintillator. Geel: KHK.

Ultra-Wideband Antipodal Vivaldi Antenna Array with Wilkinson Power Divider Feeding Network

K. Janssen 1, M. Strackx 2,3, E. D Agostino 3, G. A. E. Vandenbosch 2, P. Reynaert 2, P. Leroux 1,2
1 K.H.Kempen, IBW-RELIC, Kleinhoefstraat 4, B-2440 Geel, Belgium
2 K.U.Leuven, Dept. Elektrotechniek ESAT, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
3 SCK CEN, EHS-RDC, Belgian Nuclear Research Institute, Boeretang 200, B-2400 Mol, Belgium

Abstract This paper investigates a two element antipodal Vivaldi antenna array powered by an Ultra-Wideband (UWB) Wilkinson power divider feeding network. Firstly, a comparison is made between the performance of FR4 and Rogers RO3003 substrates. Next, a stacked antenna configuration is defined to improve the directivity. Furthermore, a feeding network is developed to obtain a uniform power split. These performances are analyzed by means of simulations and measurements on a standard planar technology. Finally, beam steering is simulated with a three element stacked antenna configuration.

Keywords-Ultra wideband (UWB), Vivaldi antenna array, Wilkinson power divider, beamsteering

I. INTRODUCTION

Since the FCC authorized the use of Ultra-Wideband (UWB) technology in the 3.1-10.6 GHz range, there has been a wide research effort. UWB antennas are an important aspect in this research. Two main types of antennas can be considered, from a directional or an omni-directional point of view. Communication systems provide messaging and controlling opportunities, and RADAR systems are used to navigate and to detect objects. Previous research distinguished UWB Vivaldi antennas from other short range communication methods [1, 2]. They mainly offer a large bandwidth and good impedance matching, and they are characterized by a wide pattern bandwidth and a usable gain.
This research expands on an UWB dual elliptically tapered antipodal slot antenna (DETASA), or enhanced antipodal Vivaldi antenna [2]. We present an UWB Vivaldi antenna array with a considerably narrower beamwidth. This makes practical use in radar applications possible. Besides that, this antenna is also suitable for analyzing building constructions [3] and for ground penetrating radar (GPR) [4]. The performance of this array construction also enables medical uses. Remote measurement of heartbeats, breathing and subcutaneous bleeding are a few examples [5]. This wireless healthcare simplifies human monitoring.
Based on [6] we have developed and measured an UWB Wilkinson power divider. Despite promising simulation results, this divider wasn't measured before. The design was altered to work on a Rogers RO3003 substrate. It operates with acceptable S-parameters in the UWB band and has matched output ports with high isolation. Furthermore, this study analyzes the scan performance of a Vivaldi array over the UWB frequency range.
This paper investigates the simulations and development of an UWB Vivaldi array. First, Section II shows the results of an improvement of the antipodal Vivaldi antenna and an implementation on a Rogers substrate. Section III describes an enhancement and realization of a two-way Wilkinson power divider. After that, Section IV defines a construction of a two-element UWB antenna array using two antipodal Vivaldi antennas in a stacked configuration. Results are shown for both measurements and simulations. Next, Section V investigates beam steering of a three element Vivaldi antenna array. Finally, conclusions are drawn in Section VI.
II. SINGLE-ELEMENT VIVALDI IMPROVEMENT

Over the last few years many designs and improvements have been published on UWB antennas [7, 8]. We define an antenna as UWB if it conforms to the following considerations. It requires an UWB signal with a fractional bandwidth greater than 20% of the center frequency or a bandwidth greater than 500 MHz [9]. The FCC regulated medical applications to achieve an S11 of -10 dB between 3.1 GHz and 10.6 GHz [9]. In this band an acceptable radiation pattern is assured. Next, the signal has to propagate like a short impulse over the frequency band of interest with minimum distortion. Furthermore, the FCC specified a power emission limit of -41.3 dBm/MHz [10].
UWB antennas can be specified in terms of geometry and radiation characteristics and are classified in two- or three-dimensional and directional or omnidirectional designs [11]. This research investigates the DETASA, which can be classified as a two-dimensional and directional antenna. It is characterized by a stable directional pattern and consistent impedance matching, which makes a point-to-point communication system possible [2]. This antenna mainly differentiates itself from other UWB antennas, like patch and horn antennas, by its large operational bandwidth, gain performance and size constraint. The minimum frequency limit is related to the antenna size, and the gain is directly related to the size of the antenna. We obtain radiation when a half wavelength is configured between the conducting arms, which determines the width of the antenna. The length can be specified as follows: a shorter length results in a steeper tapering, which implies a more abrupt impedance change and thus more reflections.
The DETASA is a modified version of the antipodal tapered slot antenna (ATSA), also known as the Vivaldi antenna. It differs from the ATSA by the elliptically tapered inner and outer edges of the slotline conductors. This slotline has semicircular loadings at the ends, which improve the radiation performance and decrease the lowest operating frequency [11, 8].
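As a small illustration of the UWB criteria listed at the beginning of this section, the sketch below (illustrative Python, not part of the referenced work) checks the fractional and absolute bandwidth of the 3.1-10.6 GHz band.

def is_uwb(f_low_hz, f_high_hz):
    """FCC-style UWB check: fractional bandwidth > 20 % or bandwidth > 500 MHz."""
    bw = f_high_hz - f_low_hz
    f_center = 0.5 * (f_high_hz + f_low_hz)
    fractional = bw / f_center
    return fractional > 0.20 or bw > 500e6, fractional, bw

ok, frac, bw = is_uwb(3.1e9, 10.6e9)
print(f"UWB: {ok}, fractional bandwidth = {frac:.0%}, bandwidth = {bw/1e9:.1f} GHz")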

The antipodal Vivaldi antenna contains a tapered radiating slot and a feeding transition. The same dimensions reported in [2] were used, with the exception of the feeding strip length. We extended the feeding strip by 3.5 mm in order to take the connector dimensions into account, which results in an improved return loss. We simulated and measured the antipodal Vivaldi antenna on a Rogers RO3003 substrate with a thickness of 0.762 mm and a relative permittivity of 3. The characteristic impedance of the feed depends on the substrate height and permittivity. The width of the strip line was calculated and optimized to 1.596 mm with CST Microwave Studio to achieve an excellent match for the S-parameters in the UWB band, which requires a minimum of -10 dB return loss. The simulation includes a 50 Ω SMA connector instead of a waveguide port to feed the radiator. This integrates the effects of discontinuities and results in a better resemblance with the measurements. Furthermore, the legs of the connector are connected to each other by two vias soldered to the pads on the substrate. This reduces the interference of the connector and improves the realized gain at 7 GHz.
Using a RO3003 substrate instead of FR4 improves the results. As shown in Fig. 1, both substrates have the same lower -10 dB cutoff frequency at 2.2 GHz, but a better return loss is achieved with RO3003 for frequencies higher than 9 GHz. Only the RO3003 substrate satisfies the -10 dB requirement at higher frequencies. Furthermore, Fig. 2 shows the measured and simulated performance of the realized gain, which represents the losses. A higher gain is achieved for frequencies above 10 GHz by using the RO3003 substrate.

Figure 1. Simulated and measured return loss of the single element Vivaldi antenna on FR4 vs RO3003.
Figure 2. Simulated and measured realized gain of the single element Vivaldi antenna on FR4 vs RO3003.

III. WILKINSON POWER DIVIDER

We also implemented a predefined Wilkinson power divider [6] which wasn't physically measured before. This component is designed as a 50 Ω two-section version with stepped-impedance open-circuited (OC) radial stubs to achieve good operation within the UWB band. We used the same design methodology as described in [6]. However, during simulations it was necessary to increase the length of the transmission line by L0 = 1.05 cm to broaden the -10 dB bandwidth of S11 and to redefine the radius of the open stub to tune the upper cutoff frequency limit of S22 and S33. Unsatisfactory measurements resulting from the use of through-hole SMA connectors at the divider's outputs required a redesign. This was realized by adding a transmission line at the output, which made the use of end launch SMA connectors possible. The length between the two outputs is altered to achieve the preferred element spacing, which will be explained in the next section.
We designed the Wilkinson power divider, including end launch SMA connectors, on a Rogers RO3003 substrate. Fig. 4 compares the simulated and measured S-parameters. A return loss of the input (Port 1) and outputs (Port 2, Port 3) of less than -12 dB is guaranteed. It also shows an acceptable isolation parameter S23, which reveals a good power split, and a stable insertion loss of around -3 dB. The length and width dimensions of the microstrips are shown in Table I.
The design visualized in Fig. 3 consists of a symmetric structure, so an equal power division is guaranteed.
A 100 Ω SMD805 isolation resistor is used to obtain satisfactory isolation between the output ports. We optimized the resistor microstrip so that S22 and S33 remain below -10 dB up to a high operating frequency in the UWB band.

TABLE I. Geometric dimensions of the UWB Wilkinson power divider [mm]
W: 1,847 0,836 0,534 1,847
L: 10,5 7,237 0,346 1,789 16,517
R: 0,862
(b) W: 0,6 1,847 2,357 0,4572
L: 0,673 8,627 5,38 0,889
R: 0,4826 0,

Figure 3. Layout of the UWB Wilkinson power divider.

In this configuration, the Wilkinson power divider is proposed to feed two antipodal Vivaldi antennas. This feeding network plays an important role in the performance of the antenna array. First we found an optimal inter-element spacing for the 2x1 antenna array by simulating it in CST Microwave Studio. This optimum results in a constructive interference pattern and determines the -3 dB beamwidth. It specifies the distance between the two output lines of the power divider. The Wilkinson power divider basically measures 49 mm x 30 mm. An extension of the substrate was applied for construction reasons.

IV. VIVALDI ARRAY

Many UWB applications, like radar systems, demand an antenna with a small directive beamwidth. Examples with strong potential use are through-wall imaging, detection of traumatic internal injuries, fall detection and imaging of the human body without direct skin contact [5]. Those wideband technologies can be developed with Ultra-Wideband Vivaldi arrays. By implementing an array construction, a high directivity can be achieved. In this study we designed an UWB array with two antipodal Vivaldi antennas. To construct an interference pattern with a fixed beam, the spacing between the two elements had to be calculated. To suppress grating lobes and achieve a higher directivity, the element spacing was determined with the parametric optimizer tool of CST. A design goal was set to obtain a maximum integrated realized gain in the frequency range 3-11 GHz, as shown in Fig. 5. This resulted in an inter-element spacing of 38.4 mm.

Figure 4. Simulated and measured S-parameters of the UWB Wilkinson power divider.

Figure 5. Optimization of the element spacing related to the integrated realized gain.

The antipodal Vivaldi antenna array and the UWB Wilkinson power divider were combined as shown in Fig. 6. Measurements depicted in Fig. 7 show a narrowing of the normalized beamwidth at 10 GHz due to the array configuration. The 3 dB beamwidth of the E-plane remains the same; however, the 3 dB beamwidth of the H-plane is reduced by half. This results in an improved gain.

Figure 6. Two element antipodal Vivaldi antenna array with UWB Wilkinson power divider.
Figure 7. Measured radiation patterns in the E/H-plane of one element vs the array at 10 GHz.

V. BEAMSTEERING

Radar and remote sensing applications require a small beam and a high scan angle. We previously studied a two element Vivaldi array, which improved the beamwidth and the directivity. This part investigates the design of an UWB scanning array. To achieve good scan performance, a three element Vivaldi array has been built in the same way as before. The element spacing d_x, which determines a fixed beam with the nearest grating lobe at the horizon, must satisfy the criterion [12]:

d_x \leq \lambda_0 / (1 + \sin\theta_0)

at the upper frequency limit (f_0 = 10.6 GHz) and for a scan angle θ_0 = 22.5°. The spacing is limited to one-half wavelength for a maximum scan angle of 90°. This results in a maximum of 20.5 mm. A further reduction of the distance was applied in order to eliminate array blindness, which results in a distance of 10 mm.
Beam steering with an UWB signal is not an obvious task. It is difficult to achieve a high scan range with suppression of side lobes over the whole frequency range (3.1-10.6 GHz). Typically, a phase shift or an increased frequency relates to an improved directivity, but also to the development of grating lobes. Oversteering results in an asymmetric main lobe and, furthermore, in a side lobe with a higher magnitude relative to the main lobe. Those issues can be derived from Table II, which gives an overview of the magnitude (Magn.), scan angle (α), beamwidth (θ) and side lobe level for simulated radiation patterns at 4, 7 and 10.6 GHz with the antenna elements individually phase shifted over 0, 40, 80 and 100 degrees. The given side lobe level is relative to the magnitude of the main lobe.
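As a quick numerical check of the spacing criterion above, the sketch below (illustrative Python using the values quoted in the text) evaluates the bound for the stated scan angle of 22.5° at f_0 = 10.6 GHz, which reproduces the 20.5 mm figure, and also for a 90° scan.

import math

C = 299_792_458.0  # speed of light [m/s]

def max_element_spacing(f0_hz, scan_angle_deg):
    """Grating-lobe criterion d_x <= lambda0 / (1 + sin(theta0)) [12]."""
    lam = C / f0_hz
    return lam / (1.0 + math.sin(math.radians(scan_angle_deg)))

for theta in (22.5, 90.0):
    d = max_element_spacing(10.6e9, theta)
    print(f"theta0 = {theta:5.1f} deg -> d_x <= {d * 1e3:.1f} mm")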

TABLE II. Beam steering of a three element Vivaldi array at 4, 7 and 10.6 GHz
Columns per frequency: Magn. [dB], α [°], θ3dB [°], Sidelobe [dB]; rows: phase shift 0, 40, 80 and 100 [°]
0  9, ,7 11,9 11,8 0 56, ,6 0 36,9 11,3
40  8, ,2 16,8 11, ,1 10,9 9 35,1 11,1
80  6, ,3 25,1 8, ,2 8,8 8, ,3 10,
, , ,9 11,3 6, ,6 8,2 6, ,9 8,1

VI. CONCLUSION

Based on an improved UWB dual elliptically tapered antipodal slot antenna, we showed that it is beneficial to use the Rogers RO3003 substrate for frequencies above 9 GHz. For lower frequencies, a standard FR4 substrate offers equal performance. We implemented a stacked two element configuration and achieved a reduction of the 3 dB beamwidth in the H-plane by half. The element spacing was optimized to yield a maximum realized gain and to limit crosstalk. The developed Wilkinson power divider feeding network achieved an equal power split, reflection losses lower than -12 dB and an insertion loss in the range of -0.15 dB to -0.96 dB. It is shown that the simulations correspond with the measurements in the UWB band. Finally, we steered the beam of a stacked three element configuration over the UWB range. A satisfactory scan angle of 34° at 10.6 GHz was obtained thanks to an accurate element spacing along with sufficient suppression of grating lobes.

REFERENCES
[1] P. J. Gibson, The Vivaldi Aerial, Proc. 9th European Microwave Conf., Brighton, U.K.
[2] E. Li, H. San and J. Mao, The conformal finite-difference time-domain analysis of the antipodal Vivaldi antenna for UWB applications, 7th International Symposium on Antennas, Propagation & EM Theory, Guilin.
[3] I. Y. Immoreev, Practical application of ultra-wideband, Ultrawideband and Ultrashort Impulse Signals, Sevastopol, Ukraine.
[4] Y. J. Park et al., Development of an ultra wideband ground-penetrating radar (UWB GPR) for nondestructive testing of underground objects, Antennas and Propagation Society International Symposium.
[5] C. N. Paulson et al., Ultra-wideband Radar Methods and Techniques of Medical Sensing and Imaging, SPIE International Symposium on Optics East, Boston, MA, United States.
[6] O. Ahmed and A. R. Sebak, A modified Wilkinson power divider/combiner for ultrawideband communications, International Symposium on Antennas and Propagation Society, APSURSI, IEEE, Charleston, SC.
[7] O. Javashvili and D. Andersson, New method for Design Implementation of Vivaldi Antennas to Improve its UWB Behaviour, EuCAP 2010, Sweden.
[8] X. Qing, Z. N. Chen and M. Y. W. Chia, Parametric Study of Ultra-Wideband Dual Elliptically Tapered Antipodal Slot Antenna, Institute for Infocomm Research, Singapore.
[9] Code of Federal Regulations: Part 15 Subpart F, Ultra-Wideband Operation, Federal Communications Commission, May.
[10] K. P. Ray, Design Aspects of Printed Monopole Antennas for Ultra-Wide Band Applications, SAMEER, IIT Campus, Hill Side, Powai, Mumbai, India.
[11] B. Allen et al., Ultra-Wideband Antennas And Propagation For Communications, Radar And Imaging, New York: Wiley.
[12] R. J. Mailloux, Phased Array Antenna Handbook, Second edition, Boston, London: Artech House.


Identification of directional causal relations in simulated EEG signals

Koen Kempeneers 1,2, A. Ahnaou 3, W.H.I.M. Drinkenburg 3, Geert Van Genechten 4, Bart Vanrumste 1,5
1 KHKempen, Associatie KULeuven, Kleinhoefstraat 4, 2440 Geel, Belgium
2 Damiaaninstituut VZW, Pastoor Dergentlaan 220, 3200 Aarschot, Belgium
3 Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340 Beerse, Belgium
4 ESI-CIT group, Diepenbekerweg 10, 3500 Hasselt, Belgium
5 KULeuven - ESAT, Kasteelpark Arenberg 10, 3001 Heverlee, Belgium

Abstract Many researchers rely on standard, multiple and partial coherences to identify causal relations in signal processing. Furthermore, Directed Transfer Functions (DTF) and Partial Directed Coherences (PDC) are used to discriminate for directed causal relations in recorded EEG data. In this paper these algorithms used for identifying causal relations in neural networks are reviewed. In particular, the effects of volume conduction on the yielded results are examined. When fed simple simulated models generated by software that takes the effects of volume conduction into account, both DTF and PDC show themselves to be susceptible to the effects of volume conduction.

Index Terms EEG, Causal flow, Coherence, Directed Transfer Functions, DTF, Multiple Coherence, Partial Coherence, Partial Directed Coherence, PDC, Volume Conduction

I. INTRODUCTION

Electro-encephalography measures field potentials from firing neurons in the brain. In research on brain diseases and medication against these diseases, the electro-encephalogram (EEG) offers valuable diagnostic information with high temporal resolution. EEG signals allow for discrimination of the nature and location of a brain defect for a number of illnesses. Some other encephalopathies, such as schizophrenia and Alzheimer's, however, cannot be linked to a single location in the brain. It is believed the latter diseases are characterized or might even be caused by failures in the rhythmic synchronization of neuron clusters. Therefore, research into temporally shifted rhythmic synchronization of EEG signals is carried out.
Over time several methods have been used. In 1969, Granger causal analysis [6] offered the first formal method to investigate causal relations in the frequency domain. The concept was widely adopted and used not only for the intended econometric use but also in other domains of research. Seth [18] provides a MATLAB toolbox under the GNU General Public License for Granger causal analysis. In 1991, Kamiński and Blinowska defined the Directed Transfer Function [9] based on Granger frequency domain causal analysis. Their adaptation of Granger causal analysis expands from the bivariate auto regression process Granger described to a multivariate one. In 2000, Baccalá and Sameshima defined Partial Directed Coherence [2], which has since been widely adopted by researchers [16][21].
In this paper the latter two algorithms for identifying causal relations in neural networks are reviewed. In particular we examine whether they are able to withstand the effects of volume conduction and still produce accurate results when stressed. This critical reflection on the usability of these algorithms for effective analysis of EEG data is carried out with data coming from a simulated spherical head model with an electrode system.

II. METHODS

Standard, multiple and partial coherences, Directed Transfer Functions (DTF) and Partial Directed Coherences (PDC) were developed to discriminate for directed causal relations in EEG data.
These algorithms rely on multivariate auto regression (MVAR) analysis. In fact, they only differ in the way the Fourier transformed regression parameters are interpreted. The coherence functions distinguish themselves from DTF and PDC in that they do include spectral power, whereas DTF and PDC do not. Spectral power is derived from the spectral matrix calculated from the Fourier transformed regression parameters and the noise covariance matrix of the residual noise sources.
Standard coherence is the Fourier transform of the normalized time domain cross correlation function and quantifies to what extent two functions relate to each other in the frequency domain. The standard coherence function is itself a real function of frequency whose result is confined to the range 0 to 1. A result of 0 portrays no coherence whatsoever; coherence 1 means the first time series can be written as a linear combination with constant coefficients of the second time series and vice versa. Values in between signify either a contamination of both time series with noise, an other than linear relation between both time series, or the fact that the second time series may be written as a combination of the first time series and another input. When dealing with multiple time series, as EEG does, multiple coherence portrays the coherence of any one channel in relation to all analyzed time series. Its result is again confined to the range 0 to 1. Partial coherence, on the other hand, returns the coherence of just two channels in a multichannel system. Partial coherence will only return a close

to 1 result if and only if only those two time series display the same spectral coherence.
The Directed Transfer Function, first defined by Kamiński and Blinowska [9][10][11], offers supplemental information with respect to the coherence functions. DTF allows for discrimination of the direction of causal flow, where coherences simply identify the causal relation. Although DTF allows the identification of causal sources and causal sinks, a causal relation identified by DTF does not necessarily imply that the relation is a direct one. That is where Partial Directed Coherence distinguishes itself from DTF. Like DTF, PDC does not use the spectral power matrix derived from the Fourier transform of the regression coefficients and the noise covariance matrix of the residual noise sources. PDC, however, does differentiate between direct and indirect causal relations in time series.

A. Theoretical considerations

In any of the algorithms under review the model is described using an appropriate multivariate auto regression process, as shown in (1) for a bivariate time series:

x_1(t) = \sum_{i=1}^{p} A_{11}(i)\,x_1(t-i) + \sum_{i=1}^{p} A_{12}(i)\,x_2(t-i) + e_1(t)
x_2(t) = \sum_{i=1}^{p} A_{21}(i)\,x_1(t-i) + \sum_{i=1}^{p} A_{22}(i)\,x_2(t-i) + e_2(t)    (1)

where x_1 are elements of the first time series and x_2 are elements of the second series. When these equations are subsequently transformed to the Fourier domain they yield:

X_1(f) = \sum_{i=1}^{p} A_{11}(i)\,e^{-j2\pi f i}\,X_1(f) + \sum_{i=1}^{p} A_{12}(i)\,e^{-j2\pi f i}\,X_2(f) + E_1(f)
X_2(f) = \sum_{i=1}^{p} A_{21}(i)\,e^{-j2\pi f i}\,X_1(f) + \sum_{i=1}^{p} A_{22}(i)\,e^{-j2\pi f i}\,X_2(f) + E_2(f)    (2)

Rewriting the equations in matrix form:

\begin{pmatrix} 1-\sum_i A_{11}(i)e^{-j2\pi f i} & -\sum_i A_{12}(i)e^{-j2\pi f i} \\ -\sum_i A_{21}(i)e^{-j2\pi f i} & 1-\sum_i A_{22}(i)e^{-j2\pi f i} \end{pmatrix}
\begin{pmatrix} X_1(f) \\ X_2(f) \end{pmatrix} = \begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix}    (3)

A(f)\,X(f) = E(f)    (4)

Or,

\begin{pmatrix} X_1(f) \\ X_2(f) \end{pmatrix} = A(f)^{-1} \begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix}    (5)

X(f) = H(f)\,E(f)    (6)

Equation (6) shows that, provided the residual noise sources contain uncorrelated white noise, the Fourier transformed regression coefficients fully describe the actual data. Therefore all dependencies may be calculated from the system transfer matrix H, where the system transfer matrix H is the inverted matrix A. From the system transfer matrix H and the noise covariance matrix of the white noise sources V one may calculate the spectral power matrix S:

S(f) = H(f)\,V\,H^{*}(f)    (7)

The asterisk denotes complex conjugate matrix transposition. For a trivariate auto regression process the spectral power matrix is given in (8):

S(f) = \begin{pmatrix} S_{11}(f) & S_{12}(f) & S_{13}(f) \\ S_{21}(f) & S_{22}(f) & S_{23}(f) \\ S_{31}(f) & S_{32}(f) & S_{33}(f) \end{pmatrix}    (8)

In the spectral power matrix the diagonal elements contain the respective time series' power spectral densities. The off-diagonal elements contain the cross power spectral densities. Standard coherence is calculated from the squared cross power spectral density G_{xy} normalized by the product of the respective power spectral densities G_{xx} and G_{yy}:

C_{xy}^2(f) = \frac{|G_{xy}(f)|^2}{G_{xx}(f)\,G_{yy}(f)}    (9)

It is easily understood that ordinary coherence may also be calculated from the spectral power matrix elements previously calculated:

C_{ij}^2(f) = \frac{|S_{ij}(f)|^2}{S_{ii}(f)\,S_{jj}(f)}    (10)

where S_{ij} is the element of the i-th row and j-th column of the spectral power matrix calculated with (7). Equation (10) is therefore equivalent to (9). Literature states that a squared coherence value larger than 0.5 denotes a significant causal relation [7]. Multiple coherences on the other hand are calculated using:

G_i^2(f) = 1 - \frac{\det S(f)}{S_{ii}(f)\,M_{ii}(f)}    (11)

While partial coherences are calculated using:

\kappa_{ij}^2(f) = \frac{|M_{ij}(f)|^2}{M_{ii}(f)\,M_{jj}(f)}    (12)

where M_{ij} is the minor taken from the spectral power matrix with row i and column j removed.
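To make equations (1)-(12) concrete, the sketch below (illustrative Python/NumPy, not the tooling used for this study; the toy coefficient matrix is an arbitrary assumption) builds A(f), H(f) and S(f) for a small MVAR model and evaluates ordinary and partial coherence. For an assumed three-channel chain 0 -> 1 -> 2, the ordinary coherence of the indirectly coupled pair 0-2 is non-zero, while its partial coherence stays close to zero.

import numpy as np

def mvar_spectra(A_lags, V, freqs, fs):
    """Return A(f), H(f) = A(f)^-1 and S(f) = H V H* for an MVAR model.
    A_lags: (p, k, k) coefficient matrices, V: (k, k) residual covariance."""
    p, k, _ = A_lags.shape
    A_f = np.zeros((len(freqs), k, k), dtype=complex)
    for n, f in enumerate(freqs):
        A_f[n] = np.eye(k) - sum(A_lags[m] * np.exp(-2j * np.pi * f * (m + 1) / fs)
                                 for m in range(p))
    H_f = np.linalg.inv(A_f)
    S_f = H_f @ V @ H_f.conj().transpose(0, 2, 1)
    return A_f, H_f, S_f

def coherence(S):
    """Squared ordinary coherence C_ij^2 = |S_ij|^2 / (S_ii S_jj), eq. (10)."""
    d = np.real(np.diagonal(S, axis1=-2, axis2=-1))
    return np.abs(S) ** 2 / (d[..., :, None] * d[..., None, :])

def partial_coherence(S):
    """Squared partial coherence from minors of S, eq. (12)."""
    k = S.shape[-1]
    M = np.zeros_like(S)
    for i in range(k):
        for j in range(k):
            sub = np.delete(np.delete(S, i, axis=-2), j, axis=-1)
            M[..., i, j] = np.linalg.det(sub)
    d = np.real(np.diagonal(M, axis1=-2, axis2=-1))
    return np.abs(M) ** 2 / (d[..., :, None] * d[..., None, :])

# Toy 3-channel example: channel 0 drives 1, channel 1 drives 2 (one lag each).
A_lags = np.zeros((1, 3, 3))
A_lags[0] = [[0.5, 0.0, 0.0],
             [0.4, 0.5, 0.0],
             [0.0, 0.4, 0.5]]
V, fs = np.eye(3), 250.0
freqs = np.linspace(1.0, fs / 2, 64)
_, H, S = mvar_spectra(A_lags, V, freqs, fs)
n = np.argmin(np.abs(freqs - 10.0))
print("ordinary coherence 0-2:", round(float(coherence(S)[n, 0, 2]), 3))
print("partial  coherence 0-2:", round(float(partial_coherence(S)[n, 0, 2]), 3))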
DTF is defined using only the elements of the system transfer matrix H defined in equations (5)(6). DTF can be explained in terms of Granger causality; in fact, when applied to a bivariate time series, DTF is exactly the same as pairwise spectral Granger causality:

\theta_{ij}^2(f) = |H_{ij}(f)|^2    (13)

H(f) = A(f)^{-1}    (14)
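Continuing in the same illustrative vein, DTF (13)-(14), the normalized DTF and the PDC introduced in the following paragraphs (equations (15)-(16)) reduce to a few element-wise operations on A(f) and H(f). The sketch below is a generic Python illustration, not the code used for this study, and it reuses the same assumed three-channel chain model as above.

import numpy as np

def dtf(H):
    """Non-normalized DTF, eq. (13): theta_ij^2(f) = |H_ij(f)|^2."""
    return np.abs(H) ** 2

def normalized_dtf(H):
    """Row-normalized DTF, eq. (15): |H_ij|^2 / sum_m |H_im|^2."""
    num = np.abs(H) ** 2
    return num / num.sum(axis=-1, keepdims=True)

def pdc(A_f):
    """Squared PDC, eq. (16): |A_ij|^2 normalized over the source column j."""
    num = np.abs(A_f) ** 2
    return num / num.sum(axis=-2, keepdims=True)

if __name__ == "__main__":
    # A(f) and H(f) at a single frequency for the 1-lag, 3-channel chain 0 -> 1 -> 2.
    A1 = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.5, 0.0],
                   [0.0, 0.4, 0.5]])
    f, fs = 10.0, 250.0
    A_f = np.eye(3) - A1 * np.exp(-2j * np.pi * f / fs)
    H = np.linalg.inv(A_f)
    # Normalized DTF shows inflow into channel 2 from channel 0 as well (the
    # indirect path), while PDC is zero for the non-existing direct link 0 -> 2.
    print("normalized DTF:\n", normalized_dtf(H).round(2))
    print("PDC:\n", pdc(A_f).round(2))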

Since non-normalized properties are hard to quantify, a normalized DTF was defined. The normalized DTF (15) returns spectral Directed Transfer Functions confined to the 0 to 1 range, which is particularly convenient because of the similar properties of the coherence functions:

\gamma_{ij}^2(f) = \frac{|H_{ij}(f)|^2}{\sum_{m=1}^{k} |H_{im}(f)|^2}    (15)

An inconvenience of DTF, however, is that any revealed causal relation does not necessarily imply that the causal relation is a direct one. For that matter Kamiński and Blinowska define the direct causal influence [10]. This property of the multivariate auto regression process may be viewed as a direct time domain causal property. Unfortunately the literature does not offer algorithms to normalize or correctly quantify the direct causal influence.
This leads up to the definition of the last algorithm for identifying directional causal relations under review in this paper. Baccalá and Sameshima define partial directed coherence [2]:

\pi_{ij}^2(f) = \frac{|A_{ij}(f)|^2}{a_j^{*}(f)\,a_j(f)}    (16)

In this expression a_j is the source channel column vector from the transformed coefficients. Partial directed coherence is calculated immediately from the Fourier transformed regression coefficients and is therefore computationally less intensive than DTF and normalized DTF, which require a number of matrix inversions. It is not hard to see that partial directed coherence is closely related to pairwise Granger causality (13). Partial directed coherence plots the influence of a source channel towards the destination channel, normalized by the cumulated influence of that channel towards all channels.

B. Simulation models

Comparative analysis of the properties of the directed causal flow algorithms is carried out using a spherical approximation of the human head. Signals from an EEG electrode setup were simulated using software supplied by Vanrumste [23]. Subsequently 4 slightly different models were simulated.

Figure 1. The electrode setup on a human head [24].
Figure 2. The 4 second epoch used for comparative analysis of the directed causal flow algorithms. A 10 Hz neuron cluster is located at [ ] in the left occipital lobe, with added noise.

The first simulation model simulates volume conduction from just one firing neuron cluster located in the base of the occipital lobe (Figure 2). The combined signal to noise ratio is 8.6 dB.

Figure 3. The 4 second epoch used for the simulation of a second, unrelated neural cluster at a frequency of 30 Hz. The second cluster's location is at [ ] and it kicks in after 1 second.

In the second model the electrode potentials from a second, unrelated cluster firing at a frequency of 30 Hz are added to the first model. The unrelated cluster is located at [ ]; the combined signal to noise ratio is 9.3 dB. The third model is a simulated causal relation at a frequency of 10 Hz (Figure 4). The source is located at [ ] at the base of the left occipital lobe, while the receiver cluster is located at [ ] near electrode P3. The temporal shift in the causal relation is kept at 4 lags at a 250 Hz sample frequency (16 ms). The fourth and final model features the same simple causal

relation, expanded with signals from a nearby cluster.

Figure 4. The 4 second epoch used for simulation of a causal relation. A second cluster at 10 Hz is located at [ ] near electrode P3 (combined S/N ratio 12.2 dB).
Figure 5. The 4 second epoch with a causal relation and a nearby unrelated cluster (combined S/N ratio 12.5 dB).

III. RESULTS

Each model was fitted with a multivariate auto regression (MVAR) process; the model order was estimated using the Akaike information criterion [1]. Subsequently the residuals were tested for correlated noise, and the model consistency according to Ding et al. [4] was calculated.

TABLE I. REALISTIC HEAD MODEL SIMULATION PROPERTIES
Model    | MVAR process order | Residuals white noise | Model consistency
Model 1  | 2                  | yes                   | 93.5 %
Model 2  | 4                  | yes                   | 94.0 %
Model 3  | 2                  | yes                   | 97.3 %
Model 4  | 3                  | yes                   | 96.6 %

Subsequently the spectral transfer functions under review were calculated using 20 discrete frequencies in the range 0 to f_s/2. Since especially the causal flow is under investigation, the maxima from those calculations are plotted in a heat map. The columns in the heat map signify source channels, the rows are the receiver channels.

A. Coherence functions

It is no surprise that standard coherence functions reveal synchronized regions in the EEG electrode potentials. Figure 6 shows the heat map for the model 4 simulation. Notice the figure is symmetrical around the main diagonal.

Figure 6. Ordinary coherence maxima using simulation model 4; due to volume conduction there is no possible way to distinguish the O1-P3 causal relation.

In particular, those electrodes that are located close to the firing neuron clusters display significant standard coherence values.

Figure 7. Partial coherence maxima using simulation model 4; the O1-P3 causal relation remains undetected, and O1-T5 displays a near significant value due to the presence of an unrelated firing neuron cluster.

The partial coherence function shows fewer positives; in fact the only positive partial coherence revealed is a false positive (P3-Pz) due to volume conduction. The same information is shown when the third model is simulated. That model incorporates the same causal relation but the unrelated third cluster is removed. This illustrates that both ordinary (standard) coherence and partial coherence suffer from the effects of volume conduction when applied to EEG data recorded from scalp electrodes.
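Returning briefly to the model fitting summarized in Table I: the combination of a least-squares MVAR fit and an order sweep scored by the Akaike information criterion can be sketched generically as below. This is an illustrative Python example on synthetic data, not the toolchain used for the study, and the AIC variant shown is just one common form.

import numpy as np

def fit_mvar(x, p):
    """Least-squares fit of an order-p MVAR model x(t) = sum_m A_m x(t-m) + e(t).
    x: (T, k) data. Returns coefficient matrices (p, k, k) and residual covariance."""
    T, k = x.shape
    Y = x[p:]                                                    # (T-p, k)
    Z = np.hstack([x[p - m - 1:T - m - 1] for m in range(p)])    # lagged regressors
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)                    # (p*k, k)
    A = B.T.reshape(k, p, k).transpose(1, 0, 2)                  # (p, k, k)
    resid = Y - Z @ B
    V = resid.T @ resid / (T - p)
    return A, V

def aic(V, p, k, n):
    """One common AIC form for MVAR order selection: ln|V| + 2*p*k^2/n."""
    return np.log(np.linalg.det(V)) + 2.0 * p * k * k / n

# Synthetic 3-channel data from an order-1 chain model, then an order sweep.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.0, 0.0], [0.4, 0.5, 0.0], [0.0, 0.4, 0.5]])
x = np.zeros((1000, 3))
for t in range(1, 1000):
    x[t] = A_true @ x[t - 1] + rng.standard_normal(3)
scores = {p: aic(fit_mvar(x, p)[1], p, 3, 1000 - p) for p in range(1, 6)}
print("selected MVAR order:", min(scores, key=scores.get))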

B. Directed transfer functions

Literature states that DTF is rather resistant to the effects of volume conduction [11].

Figure 8. DTF for simulation model 1; volume conduction signals show near significant directional causal relations where there are none.

Figure 8 shows the result of the model 1 simulation: volume conduction causes the algorithm to detect a near significant causal relation where there is none. Roughly the same results are returned with the second model simulation.

Figure 9. DTF for simulation model 2; notice the volume conduction signals show significant directional causal relations where there are none.

When an underlying directional causal system like model 4 is simulated, DTF seems to withstand the effects of volume conduction from unrelated neuron clusters far better. However, remember that the model 4 system simulates a causal relation from near electrode O1 towards electrode P3. Figure 10 shows the heat map returned by normalized DTF when model 4 is analyzed: the causal relation is detected, but the detected direction is incorrect. The P3 to O1 directional relation, shown in yellow, is qualified as stronger than the O1 to P3 relation, which is obviously wrong. When the unrelated nearby cluster is removed, as is the case in simulation model 3, the directional relation is correctly qualified. This leads to the conclusion that normalized DTF is to a certain extent susceptible to the effects of volume conduction.

Figure 10. Normalized Directed Transfer Function for simulated model 4. Notice the directional causal relation from O1 towards P3 is detected, but due to volume conduction of a nearby cluster the direction of the causal relation is incorrect.

C. Partial Directed Coherence

Figure 11. Partial directed coherence plotted for the first simulation model; volume conduction originating from one firing neuron cluster shows near significant directional causal relations.
Figure 12. Partial Directed Coherence maxima returned when analyzing the second model, with two unrelated active neuron clusters.

When the EEG scalp electrode potentials from one firing neuron cluster are analyzed using PDC, these potentials reveal near significant causal relations. As for DTF, the inclusion of a second active cluster nearby just barely amplifies the effect. Analyzing model 4 using PDC reveals comparable results. PDC does, however hardly significantly, identify the causal

relation in the model, but fails to properly identify the source cluster in the relation in the presence of a nearby unrelated cluster (Figure 13).

Figure 13. PDC heat map for model 4: the P3 to O1 relation shows, incorrectly, as more significant than the simulated O1 to P3 relation.

IV. DISCUSSION

This paper reviewed, among others, the coherence functions, the directed transfer functions and partial directed coherence. In particular, the usability of these functions to discriminate for causal flow in recorded channels of EEG data was evaluated. We showed that all of these functions, while they do produce the desired results in isolated systems, suffer from the effects of volume conduction. Nevertheless, the partial coherence function showed itself to be able to distinguish between volume conduction and an underlying causal relation in our simulation with a spherical head model. However, partial coherence does not differentiate between transmitter and receiver in a causal relation and is therefore inadequate for fully identifying causal relationships in EEG data.
The directed transfer functions, although they do not allow to discriminate between direct and indirect causal relations, do allow the identification of the transmitter and receiver electrodes. But DTF seemed to be unable to do so in the presence of an underlying, unrelated, firing neuron cluster. Therefore DTF showed itself to be susceptible to the effects of volume conduction. As for partial directed coherence, at first glance PDC seems to be reliable both in identifying transmitter and receiver electrodes and in discriminating between direct and indirect causal relations. PDC, however, showed itself to be unable to distinguish between transmitter and receiver electrode in the presence of the effects of volume conduction, which makes PDC more susceptible to the effects of volume conduction than DTF.
Nevertheless, our simulations showed both DTF and PDC to be able to identify brain regions involved in a causal relation. If researchers bear in mind that the identified causal flow is tainted by the effects of volume conduction, these functions may be used to identify causal relations. However, they should be aware that identifying a causal relation does not necessarily identify the underlying neural structure. We suggest that further research in this area incorporate blind source separation (BSS) techniques and independent component analysis (ICA), as they are currently being researched for brain computer interfacing (BCI) [8][3]. It is our belief that these techniques might be able to factor out the effects of volume conduction in brain causal analysis.

REFERENCES
[1] Akaike, Hirotugu: A New Look at the Statistical Model Identification (IEEE Transactions on Automatic Control, 1974)
[2] Baccalá, L. A., Sameshima, K.: Partial directed coherence: A new concept in neural structure determination (Biological Cybernetics, 2001)
[3] Delorme, A., Makeig, S.: EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis (Journal of Neuroscience Methods 134, 9-21, 2004)
[4] Ding, M.
et al.: Short-window spectral analysis of cortical event-related potentials by adaptive multivariate autoregressive modeling: data preprocessing, model validation, and variability assessment (Biological Cybernetics 2000; 83: 35-45)
[5] Eshel, Gidon: The Yule Walker Equation for the AR Coefficients
[6] Granger, Sir Clive W.J.: Investigating Causal Relations By Econometric Models And Cross-Spectral Methods (Econometrica 1969; 37)
[7] Hytti, H., Takalo, R., Ihalainen, H.: Tutorial on multivariate autoregressive modeling (Journal of Clinical Monitoring and Computing 2006)
[8] James, C.J., Wang, S.: Blind Source Separation in single-channel EEG analysis: An application to BCI (Signal Processing and Control Group, ISVR, University of Southampton, Southampton, UK)
[9] Kamiński, M., Blinowska, K.: A new method of the description of the information flow in the brain structures (Biological Cybernetics 1991; 65)
[10] Kamiński, M. et al.: Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance (Biological Cybernetics 85, 2001)
[11] Kamiński, M., Liang, H.: Causal Influence: Advances in Neurosignal Analysis (Critical Reviews in Biomedical Engineering 2005; 33(4))
[12] Kemp, B. et al.: A simple format for exchange of digitized polygraphic recordings (Electroencephalography and Clinical Neurophysiology 1992; 82)
[13] Korzeniewska, Anna et al.: Determination of information flow direction among brain structures by a modified directed transfer function (dDTF) method (Journal of Neuroscience Methods 2003; 125)
[14] NI LabVIEW Help, TSA AR Modeling
[15] Roberts, M.J.: Signals and Systems: Analysis Using Transform Methods and MATLAB (McGraw-Hill 2004)
[16] Schelter, B. et al.: Testing for directed influences among neural signals using partial directed coherence (Journal of Neuroscience Methods, 2005)
[17] Seth, Anil K.: Granger Causal Connectivity Analysis (Neurodynamics and Consciousness Laboratory (NCL), and Sackler Centre for Consciousness Science (SCCS), University of Sussex, Brighton 2009)
[18] Seth, Anil K.: A MATLAB toolbox for Granger causal connectivity analysis (Journal of Neuroscience Methods 2009)
[19] Seth, Anil K.: Causal connectivity of evolved neural network during behavior
[20] Schwarz, Gideon: Estimating the dimension of a model (The Annals of Statistics 1978; Vol. 6)
[21] Takahashi, D. Y., Baccalá, L. A.: Connectivity Inference between Neural Structures via Partial Directed Coherence (Journal of Applied Statistics 2007)
[22] Takalo, Reijo et al.: Tutorial on univariate Autoregressive Spectral Analysis (Journal of Clinical Monitoring and Computing 2005)
[23] Vanrumste, Bart: EEG dipole source analysis in a realistic head model (Universiteit Gent, Faculteit Toegepaste Wetenschappen 2002)
[24]

Reinforcement Learning using Predictive State Representation

Greg Klessens, Tom Croonenborghs
Biosciences and Technology Department, K.H.Kempen, Kleinhoefstraat 4, B-2240 Geel

Abstract In reinforcement learning an agent makes decisions based on the reward it will receive. In standard reinforcement learning the agent knows exactly where it is and can easily make decisions based on this information. In Predictive State Representation (PSR) the agent only has a set of information which it obtained through e.g. sensors. The most common methods used in reinforcement learning today are direct approaches such as Monte Carlo or Q-Learning, which use the direct output of states to solve a problem. In PSR we first make a model of the environment to see how the environment responds to the actions of the agent and then solve the problem. We have implemented such a learning method in MATLAB, where we focus on learning and modeling the environment.

Index Terms Predictive State Representation, Reinforcement Learning, Suffix History Algorithm

I. INTRODUCTION

Reinforcement learning is learning what to do, how to map situations to actions, so as to maximize a numerical reward signal [Sutton, Barto, 1998]. In reinforcement learning we deal with methods that allow an agent to learn from interacting with its environment. The agent can use different methods to learn the correct actions [1]. Predictive state representation is a new method of representing the environment to the agent. Unlike traditional reinforcement learning, the agent does not know exactly which state it is in. The agent only receives observations of its environment and not its exact position. This sort of problem has already been looked at using Partially Observable Markov Decision Processes, which use the (possibly entire) history to represent a belief state. In PSR we use predictions of future actions to give an estimate of which state the agent is currently in. This paper is an extension of my master's thesis, in which I describe the experiments I conducted as well as how I implemented PSR in MATLAB.

II. INTRODUCTION TO REINFORCEMENT LEARNING

A. Reinforcement Learning

Reinforcement Learning is a part of machine learning where an agent has to solve a learning problem by doing actions and receiving rewards for these actions. The agent is not told which actions to take, as in most of machine learning, but instead must discover which actions yield the most reward by trying them [Sutton, Barto, 1998].

Figure 1: Reinforcement Learning Setup.

B. Discovery

The basic setup of reinforcement learning is shown in Figure 1. An agent has a set of actions it can perform in an environment. Based on these actions it will receive a reward and information about the new state it is in. With every action the agent takes, it should get a better view of its environment and learn which actions in which states result in the biggest reward.

C. Environment

The environment can be presented to the agent in a variety of ways. The most obvious one is that the agent completely knows its environment and its exact position in this environment. This is known as the Markov Property. The agent can then start the discovery process; these kinds of environments can be modeled using the Markov Decision Process [2]. However, there are situations where the environment is only partially known to the agent, and thus the same methods cannot be used.
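As a point of reference for the direct approaches mentioned in the abstract, a minimal tabular Q-learning loop on a toy chain environment is sketched below. It is a generic Python illustration only, not part of the thesis implementation (which was written in MATLAB), and the environment, reward and parameter values are arbitrary assumptions.

import random

def q_learning(n_states=4, n_actions=2, episodes=300, alpha=0.1, gamma=0.9, eps=0.3):
    """Tabular Q-learning on a toy chain: action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1 and ends the episode."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            if random.random() < eps:                      # epsilon-greedy exploration
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda act: Q[s][act])
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])   # TD update
            s = s_next
    return Q

Q = q_learning()
print("greedy policy (0 = left, 1 = right):", [row.index(max(row)) for row in Q])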
It has already been proven that these kinds of problems can be solved using Partially Observable

Markov Decision Processes [3]. Predictive state representation uses observations received from the environment to model the environment. This is the same way a POMDP works. But where a POMDP uses the history to determine what state the agent is in, a PSR uses predictions of the future. The remainder of this paper will present how PSR works and the results of our research.

III. PREDICTIVE STATE REPRESENTATION
A. System Dynamics Matrix
Predictive State Representation uses pairs of actions a_i ∈ A and observations o_i ∈ O to represent the environment to the agent. There are two different groups of such pairs: histories h_i ∈ H and tests t_i ∈ T. Histories represent actions and observations the agent has performed in the past, while tests represent the actions and observations the agent can perform from this point on. The relationship between histories and tests is gathered in a matrix called the system dynamics matrix. The rows of the matrix correspond to the histories that have been encountered and the columns correspond to the tests. Each entry contains the probability p(t|h) of a test occurring given a specific history.

Figure 2: Matrix containing histories and tests.

B. Core Tests
We will be using a linear PSR in this thesis, because this has several advantages when it comes to implementation. A linear PSR typically has columns (rows) that are linearly dependent on other columns (rows) in the System Dynamics Matrix. Therefore we don't need to keep the entire System Dynamics Matrix to know all the possible probabilities. Instead we look for so-called core tests q_i ∈ Q and core histories h_i ∈ H. The probability of a core test occurring at a core history is noted as p(q|h). Core tests are linearly independent of the other tests. All the other tests can be calculated from the core tests. Finding these core tests is an important part of the PSR, since finding good and correct core tests determines how compactly we can represent our environment. There are several ways to find these core tests. We have used the Suffix History Algorithm for its simplicity.

C. Suffix History Algorithm
The Suffix History Algorithm uses the rank of a matrix as an indication of the linear dependence within the matrix. We sort our System Dynamics Matrix by the lengths of the tests and the histories. We take the System Dynamics Matrix and check the rank. After taking another action and updating the system dynamics matrix, we check the rank again. If the rank is equal to or less than the previous rank, this indicates that the core tests are present in the system dynamics matrix. After this we start an iterative process to find the core tests, sketched in the example below. We take a sub matrix from the system dynamics matrix which we increase in size. We check the rank of this sub matrix. If the rank is smaller than the size of the sub matrix, we remove the last added test because it is not linearly independent. It is clear that this does not always give the best results: depending on what information is present in the system dynamics matrix, some core tests may not be found, or some tests may be chosen as core tests while they are not. When tests that are linearly dependent get chosen as core tests, we cannot predict our model parameters correctly. This is because we use linear equations to find our model parameters. To find these linear equations we take the inverse of the matrix, and when there is linear dependency in the matrix, the matrix becomes singular, which makes it impossible to compute its inverse.
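The following Java fragment is a minimal sketch of this rank-based selection, assuming the system dynamics matrix is already available as a two-dimensional array of estimated probabilities. The class and method names and the tolerance value are illustrative only; the thesis implementation itself was written in Matlab.

    import java.util.ArrayList;
    import java.util.List;

    public class CoreTestSelector {

        // Numerical rank of a matrix, computed via Gaussian elimination with partial pivoting.
        static int rank(double[][] m, double tol) {
            int rows = m.length, cols = m[0].length;
            double[][] a = new double[rows][];
            for (int i = 0; i < rows; i++) a[i] = m[i].clone();
            int rank = 0;
            for (int col = 0; col < cols && rank < rows; col++) {
                int pivot = rank;                                    // find the largest pivot in this column
                for (int r = rank + 1; r < rows; r++)
                    if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
                if (Math.abs(a[pivot][col]) < tol) continue;         // no usable pivot: column is dependent
                double[] tmp = a[rank]; a[rank] = a[pivot]; a[pivot] = tmp;
                for (int r = rank + 1; r < rows; r++) {              // eliminate entries below the pivot
                    double f = a[r][col] / a[rank][col];
                    for (int c = col; c < cols; c++) a[r][c] -= f * a[rank][c];
                }
                rank++;
            }
            return rank;
        }

        // Grow a sub matrix column by column; a test (column) that does not increase the rank
        // is linearly dependent on the tests kept so far and is removed again.
        static List<Integer> selectCoreTests(double[][] sdm, double tol) {
            List<Integer> core = new ArrayList<>();
            int currentRank = 0;
            for (int t = 0; t < sdm[0].length; t++) {
                core.add(t);
                double[][] sub = new double[sdm.length][core.size()];
                for (int h = 0; h < sdm.length; h++)
                    for (int c = 0; c < core.size(); c++)
                        sub[h][c] = sdm[h][core.get(c)];
                if (rank(sub, tol) > currentRank) currentRank++;     // keep: linearly independent test
                else core.remove(core.size() - 1);                   // drop: last added test was dependent
            }
            return core;                                             // column indices of the core tests
        }
    }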
D. Model Parameters
As we mentioned before, all the probabilities of the other tests can be calculated from the core tests. This means that for every test t there exists a weight vector m_t so that p(t|h) = p(Q|h)^T · m_t, where p(Q|h) is the vector of core test probabilities. So we need to find these parameters as well. We use the one-step extensions p(aoq|h) of the core tests to find the parameters. Because we are working with linear PSRs, we can find these parameters by solving the linear equations [3]: over the core histories h_1, ..., h_n we require p(Q|h_j)^T · m_aoq_i = p(aoq_i|h_j) and p(Q|h_j)^T · m_ao = p(ao|h_j) for j = 1, ..., n. Using these two sets of parameters we can update the core test probabilities after an action-observation pair ao with the following formula: p(q_i|aoh) = (p(Q|h)^T · m_aoq_i) / (p(Q|h)^T · m_ao). Our experiment shows how accurately the current setup approaches the actual probabilities at different time steps.
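Since the update above amounts to a few inner products per core test, it can be written down compactly. The sketch below shows this in Java, assuming the parameter vectors m_ao and m_aoq_i have already been estimated; all names are our own and only illustrate the formula.

    public class LinearPsrPrediction {

        // p(q_i|aoh) = (p(Q|h)^T m_aoq_i) / (p(Q|h)^T m_ao), applied to every core test i.
        static double[] updateCorePredictions(double[] pQgivenH, double[][] mAoq, double[] mAo) {
            double denom = dot(pQgivenH, mAo);               // p(ao|h)
            double[] next = new double[mAoq.length];
            for (int i = 0; i < mAoq.length; i++)
                next[i] = dot(pQgivenH, mAoq[i]) / denom;    // p(q_i|aoh)
            return next;
        }

        // Probability of an arbitrary test t with weight vector m_t: p(t|h) = p(Q|h)^T m_t.
        static double testProbability(double[] pQgivenH, double[] mT) {
            return dot(pQgivenH, mT);
        }

        static double dot(double[] a, double[] b) {
            double s = 0;
            for (int i = 0; i < a.length; i++) s += a[i] * b[i];
            return s;
        }
    }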

IV. EXPERIMENTS
A. Influence of the size of an environment
We conduct research to find out whether the size of an environment has an immediate and large impact on the accuracy of the predictions made by the PSR.

1) Environments
The environments on which we conducted research for this experiment are very simple. This is to make the results more interpretable. In every environment used, the agent has only one action. This is because the focus of this experiment lies on modeling the environment depending solely on its size.

Figure 3: Optimal environment.

The environment in Figure 3 is a straightforward one: you always go from one state to the other. This shows that the setup works and gives the optimal graph to strive for.

Figure 4: More complex environment.

The environment in Figure 4 is the same as in Figure 3 but more complex in the way we move from one state to the other. For example, the start state is Light; when the agent has to take an action (of which he only has one), he has a chance of 0.7 to go to state Dark and a chance of 0.3 to stay in the state Light. This gives a view on how complexity alone affects our probabilities.

Figure 5: Bigger and complex environment.

The environment in Figure 5 is bigger and will show how the size of an environment affects our results.

2) Results
Figure 6 and Figure 7 show the results we obtained from running our setup 10 times on the different environments described above. The test we predict is p(Dark | Dark Light), the probability that we see Dark after we have seen Dark followed by Light. The values in the graphs are the Root Mean Square Errors between the calculated value and the real value. The root mean square error is an indication of how closely the estimated value approaches the real value. The formula is RMSE = sqrt(E((a - â)^2)), where a is the real value and â is the calculated value. We square the errors before averaging and then take the root, so that negative errors count just as much as positive ones; a code sketch of this calculation is given below. The values in Figure 8 are the average of 10 independent runs and each run is actions taken. The values are displayed on a logarithmic scale so the differences between the environments are highlighted.
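As a small illustration of how such RMSE values can be computed, the Java sketch below averages squared errors over a set of runs; the numbers in main() are made up purely for illustration and are not taken from the experiments.

    public class RmseExample {

        // Root mean square error between the true test probabilities and the PSR estimates.
        static double rmse(double[] trueValues, double[] estimates) {
            double sum = 0;
            for (int i = 0; i < trueValues.length; i++) {
                double err = trueValues[i] - estimates[i];
                sum += err * err;                 // squaring makes negative errors count as well
            }
            return Math.sqrt(sum / trueValues.length);
        }

        public static void main(String[] args) {
            // Illustrative numbers only: a true probability and estimates from four runs.
            double[] truth = {0.7, 0.7, 0.7, 0.7};
            double[] est   = {0.64, 0.71, 0.75, 0.69};
            System.out.println("RMSE = " + rmse(truth, est));
        }
    }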

Figure 6: Complex environment results.

Figure 7: Big and Complex environment results.

Figure 8: p(Dark | Dark Light) RMSE averages on logarithmic scale.

The graph in Figure 8 shows that for this particular test the RMSE values of the bigger environment are similar to the values of the complex environment. For comparison we used the same setup to predict p(Light | Light Dark). The results are shown in Figure 9.

Figure 9: p(Light | Light Dark) RMSE averages on logarithmic scale.

Here we see that the results in Figure 8 are similar to the results in Figure 9. Both the Complex and the Big and complex environment give similar results.

B. Influence of the number of actions in an environment
Next we research whether the number of actions in an environment has an immediate and large impact on the accuracy of the predictions made by the PSR.

1) Environments
The environments we used in our previous experiment represent the first action of the environments in this experiment.

Figure 10: Second action for the complex environment.

Figure 11: Second action for the bigger environment.

Figure 10 and Figure 11 show the environments for the second action, which is clearly different from the first action.

2) Results
We use the same setup as in the previous experiment by running the PSR 10 times on the different environments described above. The test we predict is p(action1 Light | action1 Light action2 Dark), the probability that we perform action 1 and observe Light after we have used action 1 resulting in observation Light, followed by action 2 and observation Dark. The graph shows the RMSE values on a logarithmic scale.

Figure 12: p(action1 Light | action1 Light action2 Dark) RMSE averages on logarithmic scale.

As we can see in Figure 12, the differences between the two environments are clearly larger than the differences seen in Figure 8 and Figure 9. To determine the impact of adding an action, we compare the results of the test p(Dark | Dark Light) in the previous experiment with the results of the same test in the environments with two actions.

Figure 13: Comparison of the complex environment with different numbers of actions.

Figure 13 shows the RMSE values of the complex environment with the different numbers of actions. The environment that has only one action is clearly more precise in predicting the test.

Figure 14: Comparison of the big environment with different numbers of actions.

Just as with the smaller environment, the difference between the big environment with one action and the same environment with two actions is clearly noticeable in Figure 14.

V. CONCLUSION
The results of our experiments show that the RMSE values of all environments decrease as more actions are taken. This shows that the PSR works. In environments containing one action, the size of the environment has no apparent influence on the predictions. When we add an extra action to the environments, there is a clear difference. When predicting the same test in environments with different numbers of actions, we saw that the RMSE values of the environment with more actions still decrease, but clearly at a slower rate. The differences caused by the size of the environment are more obvious in the environments with two actions compared to the environments with one action.

REFERENCES
[1] Peter Van Hout, Tom Croonenborghs, Peter Karsmakers, Reinforcement Learning: exploration and exploitation in a Multi-Goal Environment.
[2] Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts, 1998.
[3] Michael R. James, Using Predictions for Planning and Modeling in Stochastic Environments.
[4] Satinder Singh, Michael R. James, Matthew R. Rudary, Predictive State Representations: A New Theory for Modeling Dynamical Systems.

A 1.8V 12-bit 50MS/s Pipelined ADC
B. Lievens 1, M. Strackx 2,3, P. Leroux 1,2
1 K.H.Kempen, IBW-RELIC, Kleinhoefstraat 4, B-2440 Geel, Belgium
2 K.U.Leuven, Dept. Elektrotechniek ESAT, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
3 SCK CEN, EHS-RDC, Belgian Nuclear Research Institute, Boeretang 200, B-2400 Mol, Belgium

Abstract This paper presents the design of a 12 bit 50 MS/s pipelined ADC in the TSMC 180 nm technology. The goal is to achieve an effective resolution of 11 bit or more by using an optimized architecture. This architecture uses multiple bits per stage and digital error correction to reduce the number of stages and the power consumption.

Index Terms Analog-digital conversion, pipeline ADC, analog integrated circuits

I. INTRODUCTION
The use of Ultra-Wideband (UWB) radars has proven useful in the biomedical industry. They can be used for monitoring a patient's breathing and heartbeat [1], hematoma detection [2] and 3-D mammography [3]. In [4], the use of UWB radars for measuring tissue permittivity is described, with the intention of applying the technique to radiotherapy. There, a measurement setup using time domain reflectometry (TDR) is proposed. For this work, changes in the electromagnetic parameters of the material must be derived from the reflected UWB pulse. A digital equivalent-time sampling UWB receiver architecture is proposed in [5]. It specifies a resolution of 11 bits or more for the ADC, and a sampling frequency with jitter less than or equal to 1 ps. Thus, to digitize the reflected pulse, a high resolution ADC is needed. In this application, high resolution means a moderate sample rate as well as a relatively high number of bits. To obtain the required information from the UWB pulse, a sample must be taken every 10 ps. This translates to a sampling frequency of 100 GHz. A high resolution ADC with such a sampling frequency does not yet exist, due to the immense power consumption it would have. To achieve this, [5] proposes equivalent-time sampling. Assuming that identical UWB pulses which reflect on the same material give identical reflections, a minimum of only one sample per pulse is needed. One pulse every 20 ns decreases the required ADC sample rate to 50 MS/s. For a total sample rate of 100 GHz, each sample must be taken every 20 ns plus an additional delay of 10 ps per sample. The subsampling system is based on a low-jitter VCO incorporated in a PLL, which provides the low-jitter sampling frequency, in combination with a medium resolution ADC. The latter is described in this paper.

II. ADC ARCHITECTURE
The ADC requirements are targeted at a resolution of 12 bits with a sampling frequency of 50 MS/s. Therefore we propose the pipeline architecture [6][7][8][9][10]. This paper proposes an architecture as seen in Fig. 1. It consists of two 2.5 bit front-end stages, followed by four 1.5 bit stages and ending with a 4 bit back-end flash ADC. Even without background calibration, a resolution of 12 bits can be achieved in TSMC 180 nm technology with this architecture [11]. Omitting background calibration greatly simplifies the pipeline architecture. However, careful attention must be paid to the mismatch of components in the circuit design.

Fig. 1. Proposed pipeline architecture.

By using multiple bits per stage in the first two stages, the needed accuracy of the residue calculated in a stage reduces very quickly.
Digital error correction with 0.5 bit overlap is used to reduce the needed accuracy of the comparators and thus correct errors made in the sub-ADCs. By increasing the resolution of the two front-end stages and the back-end stage, the number of stages is reduced. This can increase the linearity of the total ADC and reduce the total power

consumption.

III. CIRCUIT IMPLEMENTATION
A. Multiplying digital-to-analog converter
The Multiplying digital-to-analog converter (MDAC) is the heart of each stage and is responsible for generating the residue for the next stage. It is a switched-capacitor circuit that functions as digital-to-analog converter, sample-and-hold circuit and residue amplifier. Fig. 2 shows a circuit implementation of a 1.5 bit MDAC. The residue of a 1.5 bit stage is calculated by (1). The circuit works in two non-overlapping phases. During phase one, all the switches marked with f1 are closed. During phase two, the switches marked with f2a close and shortly after also the switches marked with f2. During the first phase all the capacitors are charged to V_in. The total charge in the circuit at this time is given by (2). During the second phase the feedback capacitors C_f1 and C_f2 are flipped around and connected between the output and the inverting input of the amplifier. Due to the negative feedback created in this way, the negative input of the amplifier can be considered a virtual ground. The total charge in the circuit at this time is given by (3). According to the conservation of charge, (2) is equal to (3). It can then be shown that the output V_res is equal to (4), with C_f1 and C_f2 equal to C_f, and C_s1 and C_s2 equal to C_s. This value is held at the output of the amplifier during phase two. This contributes to the sample-and-hold function of the MDAC. By selecting V_A and V_B at either V_ref or 0 V and taking C_s equal to C_f, the value of the sub-ADC can be created and subtracted from V_in. This contributes to the DAC function of the MDAC. The feedback factor also provides the amplification of the residue.

Fig. 2. Circuit implementation of the 1.5 bit MDAC.

The circuit implementation of a 2.5 bit MDAC is shown in Fig. 3. The residue in a 2.5 bit stage is calculated by (5). It works in the same way as described for the 1.5 bit MDAC. However, due to the six sampling capacitors, the feedback factor results in a gain of four. The output V_res is given by (6), with all capacitors equal to each other.

Fig. 3. Circuit implementation of the 2.5 bit MDAC.

In the design of the MDAC it is important that the relative matching of the capacitors C_s and C_f is good enough. Mismatch in these capacitors may result in missing codes at the digital output of the ADC. The size of the switches is also important. The switch resistance is responsible for an RC time constant. As the capacitors need to be fully charged within half of a sampling period, the size of the switches must be large enough. The required switch size is a function of the capacitance that needs to be charged and the sampling frequency. Given that the capacitors only need to be charged to within ¼ LSB, it can be shown that

t ≥ (x + 2) · ln(2) · τ (7)

with t being half a period of the sampling frequency, τ the RC time constant and x the resolution of the total ADC in bits. The maximum switch resistance can then be calculated from τ = R·C as

R_max ≤ t / ((x + 2) · ln(2) · C) (8)

Large switches, however, produce large parasitic capacitances in the circuit. The switches can also cause distortion if the input signal is too high. Choosing the reference voltages at 0.3 V and 1.5 V in a design where the supply voltage equals 1.8 V prevents this from happening. Other factors which are important to the functioning and accuracy of the MDAC are the specifications of the OTA used in the circuit. These specifications are discussed in the next section.

B. Operational transconductance amplifier
The specifications of the OTA used in the MDAC define its settling time and settling accuracy. To achieve a 12 bit resolution, the OTA used as residue amplifier in the front-end MDAC must settle the residue to within ¼ LSB. Since two bits are already resolved in the first stage, the residue must settle to within ¼ LSB of a 10 bit resolution. Therefore the OTA in the first stage must have a DC gain of at least 2^14, or 84 dB, for an ADC resolution of 12 bits. The settling time depends on the GBW of the OTA and must be well within half a period (10 ns) of the sampling frequency. It is shown in [12] that the required GBW of the OTA can be calculated by (9), where f_u is the unity-gain frequency, N is the settling accuracy in bits, n is the number of bits resolved in the stage, f_s is the sampling frequency and β is the feedback factor given by (10). For a sampling frequency of 50 MHz and a feedback factor of 0.25 in the front-end MDAC, we calculate the closed-loop GBW of the OTA at 230 MHz. Considering that the feedback factor of the OTA in a 2.5 bit MDAC is 0.25, the open-loop GBW should then be > 230 MHz / 0.25 = 920 MHz. As the OTA needs a high GBW as well as a high DC gain, a folded cascode architecture is chosen, as shown in Fig. 4. To increase the gain of the cascode even more, the gain-boosting technique is applied to the cascode transistors. Because of the limited output swing of a folded cascode amplifier, a second stage is added to increase the output swing. Miller compensation is used to ensure a sufficient phase margin. The second stage is a simple NMOS differential pair with PMOS current-source loading and achieves a high swing with a large GBW.

Fig. 4. Circuit implementation of the OTA in the front-end MDAC.

C. Comparator
For the comparators in the sub-ADCs and in the back-end ADC, a differential pair comparator is used, as shown in Fig. 5. In comparison to the Lewis and Gray dynamic comparator introduced in [13], it can achieve a lower offset and is thereby more accurate [14]. It also consumes no DC power. Simulations of this comparator showed an offset of 16 mV. In a differential circuit with V_ref being +0.6 V and -0.6 V, a 16 mV offset corresponds to an accuracy within 7 bits. As the proposed architecture in this paper uses digital error correction to reduce the needed accuracy of the comparators, an accuracy of 7 bits is enough for all the stages. The comparators in the first two stages only need to be accurate to 4 bits and the comparators in the four 1.5 bit stages only need to be accurate to 3 bits. Even the 4 bit back-end ADC only needs to be accurate to 5 bits. This comparator is therefore used in every stage.

Fig. 5. Circuit implementation of the differential pair comparator.
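To make these requirements concrete, the short Java program below evaluates the settling constraint of (7) and (8) together with the DC-gain requirement for a 12 bit, 50 MS/s design. It is only a back-of-the-envelope check based on a single-pole settling model; the 1 pF sampling capacitance is an assumed, illustrative value and does not come from the paper.

    public class PipelineAdcBudget {

        public static void main(String[] args) {
            int bits = 12;                       // total ADC resolution
            double fs = 50e6;                    // sampling frequency [Hz]
            double t = 0.5 / fs;                 // half a sampling period [s]
            double cSample = 1e-12;              // assumed sampling capacitance [F], illustrative only

            // Settling to within 1/4 LSB of an x-bit converter means an error below 2^-(x+2).
            double tauMax = t / ((bits + 2) * Math.log(2));   // bound on the RC time constant, eq. (7)
            double rSwitchMax = tauMax / cSample;             // maximum switch resistance, eq. (8)

            // DC gain needed so the finite-gain error stays below 1/4 LSB at 12 bits: at least 2^14.
            double minGain = Math.pow(2, bits + 2);
            double minGainDb = 20 * Math.log10(minGain);

            System.out.printf("tau_max      = %.2f ns%n", tauMax * 1e9);
            System.out.printf("R_switch_max = %.0f ohm (for C = 1 pF)%n", rSwitchMax);
            System.out.printf("min DC gain  = %.0f (= %.1f dB)%n", minGain, minGainDb);
        }
    }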
IV. CONCLUSION
The design of a 12 bit 50 MS/s pipeline ADC has been presented. By optimizing the architecture, the power consumption is reduced. This is done by first using 2.5 bit stages and ending with a 4 bit flash ADC; in between, 1.5 bit stages are used.

REFERENCES
[1] I. Immoreev, T. H. Tao, "UWB Radar for Patient Monitoring," IEEE Aerospace and Electronic Systems Magazine, vol. 23, no. 11.
[2] C. N. Paulson et al., "Ultra-wideband Radar Methods and Techniques of Medical Sensing and Imaging," Proceedings of the SPIE, vol. 6007.
[3] S. K. Davis et al., "Breast Tumor Characterization Based on Ultrawideband Microwave Backscatter," IEEE Transactions on Biomedical Engineering, vol. 55, no. 1.
[4] M. Strackx et al., "Measuring Material/Tissue Permittivity by UWB Time-domain Reflectometry Techniques," Applied Sciences in Biomedical and Communication Technologies (ISABEL), 3rd International Symposium.
[5] M. Strackx et al., "Analysis of a digital UWB receiver for biomedical applications," European Radar Conference (EuRAD), 2011, submitted for publication.
[6] T. N. Andersen et al., "A Cost-Efficient High-Speed 12-bit Pipeline ADC in 0.18 µm Digital CMOS," IEEE Journal of Solid-State Circuits, vol. 40, no. 7, July.
[7] S. Devarajan et al., "A 16-bit, 125 MSps, 385 mW, 78.7 dB SNR CMOS Pipeline ADC," IEEE Journal of Solid-State Circuits, vol. 44, no. 12, December.
[8] H. Van de Vel et al., "A 1.2-V 250-mW 14-b 100-MSps Digitally Calibrated Pipeline ADC in 90-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 44, no. 4, April 2009.
[9] Y. Chiu, P. R. Gray and B. Nikolic, "A 14-b 12-MS/s CMOS Pipeline ADC With Over 100-dB SFDR," IEEE Journal of Solid-State Circuits, vol. 39, no. 12, December 2004.
[10] O. A. Adeniran and A. Demosthenous, "An Ultra-Energy-Efficient Wide-Bandwidth Video Pipeline ADC Using Optimized Architectural Partitioning," IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 53, no. 12, December.
[11] J. Li and U. K. Moon, "Background Calibration Techniques for Multistage Pipelined ADCs With Digital Redundancy," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, no. 9, September.
[12] I. Ahmed, Pipelined ADC Design and Enhancement Techniques, Springer.
[13] T. B. Cho, P. R. Gray, "A 10 b, 20 Msample/s, 35 mW Pipeline A/D Converter," IEEE Journal of Solid-State Circuits, vol. SC-30, no. 5, March.
[14] L. Sumanen et al., "CMOS dynamic comparators for pipeline A/D converters," IEEE International Symposium on Circuits and Systems, 2002.

Accessing Enterprise Business Logic in a Mobile Warehouse Management System using WCF RIA Services
Jonas Renders, Xander Molenberghs, Chris Bel, and Dr. Joan De Boeck

Abstract When developing a Warehouse Management System, code reuse can significantly facilitate the coding process. Together with the different components in the system, such as a PDA or PC, different techniques of accessing the underlying data storage show up. Moving towards an N-tier architecture creates opportunities for all the components to access the same business logic. WCF RIA Services strives to simplify the development of N-tier solutions. This paper starts with a global overview of the N-tier architecture concept. The Entity Framework and WCF RIA Services cooperate in order to achieve the N-tier architecture. This is followed by a theoretical discussion of how access to the same business logic is gained, either through a PC or a PDA. Based on the study of shortcomings with respect to updating the underlying database, this paper concludes with some concrete workarounds to the identified problems.

Index Terms domain services, Entity Data Model, Entity Framework, N-Tier, WCF RIA Services

I. INTRODUCTION
Structure is one of the main concerns in developing software applications nowadays. Code reusability, scalability and robustness are key points when it comes to application development. IT teams consist of developers who work on the same project. As this could lead to code overwriting or duplicate code, a well-thought-out software architecture can overcome these issues. An N-tier architecture divides an application into multiple logical tiers. Examples of such tiers are a data access layer, a business logic layer and a presentation layer. With the division into tiers, developers aim at creating a reusable, scalable and robust application. Each developer in the team can focus on the part of the application that best fits his or her skill set.

J. Renders is a Master of Science student at Katholieke Hogeschool Kempen, Geel, 2440 BELGIUM.
X. Molenberghs is a Master of Science student at Katholieke Hogeschool Kempen, Geel, 2440 BELGIUM.
C. Bel is a Manager at Codisys, Olen, 2250 BELGIUM.
Dr. J. De Boeck is a university lecturer at the Department of Industrial and Biosciences, Katholieke Hogeschool Kempen, Geel, 2440 BELGIUM.

In this paper we will discuss how to implement this N-tier architecture in a Warehouse Management System application, striving for a maximum of automatic code generation and maintenance. This Warehouse Management System includes several components: mobile devices or PDA's are used in the warehouse to control the flow of goods; desktop computers on the other hand handle the administration of the system. Besides these two, there are other services like mobile printers and reporting services. In order to achieve reusability, the mobile devices in the warehouse and the desktop PC's at the office have to access the same business logic layer in the N-tier architecture. As time is money, automatic code generation can save a lot of time. In the search for technologies to use, this was also an important criterion to take into account. We will start this paper with a discussion of the technologies we used at the different tiers. The Entity Framework [1] and WCF RIA Services [2] are addressed here. Next we will discuss the difficulties that arose when trying to update the underlying data storage. The Entity Framework as well as the WCF RIA Services had several problems reacting to these changes.
Finally, we will propose our solutions to these problems, though some of them could not be completely solved and remain bottlenecks.

II. IMPLEMENTING THE N-TIER ARCHITECTURE
The N-tier architecture needs to be suitable for a Mobile Warehouse Management System. Before discussing each component, a global overview of how each component fits in the N-tier architecture can give a better understanding of their tasks and purposes.

Fig. 1. Global overview of the N-tier architecture in the WMS

A Warehouse Management System depends on a large underlying database, which has to be accessed from within the programming code. With Microsoft's Entity Framework, the need to write a lot of this code by hand disappears.

A. Entity Framework 4.0
Microsoft's Entity Framework 4.0 implements Peter Chen's entity-relationship model [3]. This model maps relational database schemas to an Entity Data Model. The Entity Framework lets developers create a data model starting from an existing relational database. The model is created through a wizard and represents all the items in the database, such as tables, views and stored procedures, as objects. In the wizard a database connection is made and the tables, stored procedures and views the developer wants to use are selected. Afterwards the developers can adjust the model so that they get an application-oriented view of the data in the relational database schema. To be able to work with the created objects, the so-called entities, one has to approach the entities within an ObjectContext. The ObjectContext is a class that knows all the entities and is used to create and execute queries. Because of the application-oriented view, it's easier to query for data from the database. The developer doesn't necessarily need to know anything about the structure of the database; the model will handle this. Julie Lerman [1] describes the three main parts of which the model consists. These files are created at runtime:

The Conceptual layer (.csdl) represents the actual Entity Data Model schema. The Storage layer (.ssdl) represents the database objects that the developer selected to use in the model. The Mapping layer (.msl) maps the entities and relations in the Conceptual layer to the tables and columns in the Storage layer.

Fig. 2. Main parts of the Entity Data Model

These three layers co-operate to form the Entity Data Model. Because the model handles, among other things, the connection to the database, development time is reduced.

B. WCF RIA Services
Every enterprise application needs business logic. The business logic includes all custom methods that are specific to a particular application. In a classical three-tier architecture the business logic resides at the middle tier. WCF RIA Services implements business logic in so-called domain services. The domain services are generated by a wizard which uses the ObjectContext and the entities created by the Entity Framework. Though WCF RIA Services interacts seamlessly with the Entity Framework, it is not exclusively bound to the EF; other data layers can be used as well. In our Warehouse Management System, WCF RIA Services adds some key features to the application. CRUD methods are automatically created if selected in the wizard. A metadata class is generated with each domain service; validation and authentication methods can be added here. Queries are executed transactionally from client to server: in case of an interruption during query execution, a rollback action will occur. Finally, it integrates perfectly with Silverlight. Silverlight was preferred as the technology to use in the presentation layer of our system, as we will see next.

Fig. 3. WCF RIA Services in an Architecture

C. Presentation Layer
The upper part of the N-tier architecture consists of the presentation layer. Our goal at this point is that the presentation layer only has to call methods from the business logic layer. It only serves as a way to represent the data; no business calculations are performed here. As said in the previous section, the Silverlight technology was preferred for the top layer. However, the Windows Mobile 5/6 platforms that run on the mobile devices don't support the Silverlight technology. Research brought us to a solution to this lack of support, which will be explained in a further section.

D. Desktop implementation
One of the criteria was to use Silverlight, at least for the desktop application. Had it been compatible with the Windows Mobile 5/6 platforms, it would also have been preferred for the PDA software. This will be discussed later. Here we will discuss what is needed to set up a Silverlight application which accesses a SQL database and uses WCF RIA Services as service layer with the necessary business logic. This was tested in the Visual Studio 2010 environment and written in C#. For testing purposes we used the Silverlight Business Application template built into Visual Studio. A basic presentation layer is provided with this template, which sped up the testing process. Once the Silverlight project has been created, a data model should be added to the project. It's important to add the model to the project whose name ends with .Web. By adding a new item and choosing an ADO.NET Entity Data Model, the wizard will ask whether to create an empty model or generate one from an existing database. The latter option is needed in our case. Along the way, the wizard will give the user more options than discussed here; only the ones that are relevant here are discussed. Now the database needs to be accessed by the Silverlight application. This is accomplished by adding a new item to the same project and choosing a Domain Service Class. A wizard will pop up where the user can specify which tables should be accessible. By enabling the editing option the user can perform all of the CRUD methods. The associated metadata classes are generated by default. Later on we will discuss the purpose of the metadata. When the Domain Service is created it is crucial to build the project; otherwise none of the entities in the Domain Service will be accessible from the code. The Domain Service file is opened by default and it is now possible to add custom code or business logic. To access the entities from the code (this is the code-behind file of a XAML page in the other project that was created with the Silverlight template), a reference is needed from the project with the Silverlight XAML pages to the project with the data model and the Domain Service (<name-of-the-solution>.Web). This can be done by entering the using statement with the name of the project. What is the purpose of the metadata? A metadata class is a simple class that defines the same properties, or a subset of the properties, defined by one of the entity classes of the Domain Service, but those properties have no implementation and won't actually be called at all. They are just a place where attributes can be added. Those attributes will be added to the corresponding properties in the code-generated client entities. Depending on what the attributes are, they will also be used by RIA Services on the server side [4].
E. PDA implementation
As at this time Silverlight is still unavailable for the Windows Mobile 5/6 platforms, another presentation technology had to be chosen here. We chose a web application optimized for mobile use. Obviously, the web application must access the same service layer as the Silverlight application on the desktop. We chose to use a SOAP endpoint to expose the RIA Services. This is accomplished in the following manner. First of all, a reference to the RIA Services project is needed. Next, the SOAP endpoint we are going to use needs to be defined somewhere in the application. This is done in the Web.config file. Every Web.config file contains the following line:

<system.serviceModel>

Any endpoint that needs to be added (this can be a SOAP, OData or JSON endpoint) is added after this line in the following manner (only the SOAP endpoint is shown here):

<domainServices>
  <endpoints>
    <add name="soap"
         type="Microsoft.ServiceModel.DomainServices.Hosting.SoapXmlEndpointFactory, Microsoft.ServiceModel.DomainServices.Hosting, Version= , Culture=neutral, PublicKeyToken=31bf3856ad364e35" />
  </endpoints>
</domainServices>

After a successful build, the application can be run for the first time. The application will open the web browser with the application startup page. By altering the link it is possible to test whether the creation of the Domain Service was successful. The link generally looks like this: <name-of-the-domain-service>.svc. This link is used as the service reference for the web application project, which can be added to the solution as a service reference. The Domain Service is found automatically and ready to be accessed in the web application. The following lines of code are needed to address the Domain Service in a proper way.

using ServiceReference1;
var context = new <name-of-the-Domain-Service>SoapClient("BasicHttpBinding_<name-of-the-Domain-Service>Soap");

The using statement should be entered under the namespace attribute. The Page_Load method could be a possible location to define the Domain context. Everything is now set to start developing the web application for the PDA.

III. PROBLEMS
A. Updating the Data Model
In section II we explained how a data model is generated from an existing database. All this works very well until changes are made to the database after the data model has been generated. Even for smaller changes, such as adding columns or changing relationships, the data model needs to be regenerated. Microsoft provided an option in the Entity Framework that one would expect to take care of this issue. The option itself is well implemented: when a data model is opened, updating the model from the database is only two clicks away. This starts a procedure in which any changes made to the database since the last update are detected. This procedure works fine if attributes or tables are added. The problems start when an attribute or a table is deleted. The procedure detects the deleted attribute or table but deliberately doesn't delete it from the model. The idea behind this is that the model may contain more information than the database, maybe for future updates to the database. There are many complaints about this because of the confusion, and there is no option to turn this feature off. It also brings another problem. When a new table is added with the same name as the table that was deleted before but with different attributes, the table is added to the model again but with a different name (mostly the table name with a number attached to it), because there already exists a table with the same name in the model. This may lead to further confusion and errors when building the project solution. This problem is easily resolved when we manually delete the attribute or table from the data model. The downside is that any attribute or table that needs to be deleted manually from the database also needs to be deleted manually from the data model. This may not be a big problem if the database isn't very large or complex. The same problem exists when an attribute is changed into another type. If we try to build the project, an error occurs because of the incompatibility of the two types. Manually changing the type will resolve this, but note that the attribute types of a SQL database are named differently in the Entity Framework. Then there are the relationships between the tables. These also need to be changed manually in the data model. This is a bit more complex than deleting an attribute or changing the type of an attribute, and again gives the user work that has to be done twice. For some people these problems may not be problems at all, but the majority would expect that when there is an option provided that says Update Model from Database, all the updates would be done automatically without intervention of the user. In a worst case scenario the graphical representation of the model cannot be shown any more due to manual changes. Altering the data model in XML is then the only alternative besides completely deleting the model. In section IV we propose a possible solution to this problem by means of a third party tool.

B. Updating the Domain Service
When the former problems are solved without any errors, the next problem arises.
In section II we explained how to make a Domain Service for the Silverlight project. Any change in the database also needs to be made in the Domain Service. Otherwise any new attribute or table would not be recognized by the Domain Service and thus would not be addressable in the code. Unlike updating the data model, there is no option to automatically update the Domain Service. Deleting the Domain Service and making a new one seems to be the only way to update it. This means that any custom code, like business logic, is lost each time there is a change in the database. The best solution Microsoft provided is to copy the custom code from the old Domain Service and add it to the new one. Custom code is not written at the beginning or at the end of a file; it is spread across the whole file, making it difficult to find and copy all of it. If the database is large and complex, this would be a very time-consuming job. If for some reason the database needs to be changed regularly, updating the Domain Service every now and then would be too time-consuming. Therefore, the main target of our research is to look for possible and stable workarounds for this problem.

IV. BOTTLENECKS AND WORKAROUNDS
A. Alternative for Updating the Data Model
In section III we identified the problem that exists when updating the data model from a changed database. Here we provide an alternative that facilitates the update process by means of a third party tool. After some research we found a DBML/EDMX tool from the company Huagati Systems [5]. We tested this tool and found it to be very simple, yet powerful. It is an add-in for Visual Studio that adds functionality to the Entity Framework designer. The Model Comparer is the most powerful feature of this tool. It is capable of more than we are going to discuss here; for this problem the Model Comparer is the most relevant feature. The Model Comparer compares the database, the SSDL (storage) layer of the model and the CSDL (conceptual) layer of the model. If any differences between the database and the model layers are found, an overview is shown with the

possibility of updating the database or one of the models with a single click. This way the differences are easily and selectively synchronized across the layers, only updating the portions selected by the user and leaving the other portions untouched. With this tool we are able to update the data model from the database with just a few clicks. Also the relationships between the tables are easily updated. The Huagati DBML/EDMX tool comes at a price of 150 per user. It's an affordable amount if the database is regularly changed, because it saves the user a huge amount of time and misery.

B. Updating the Domain Service: Workarounds
As we explained in section III, the updating of a Domain Service is a serious problem if changes to the database are made regularly. Copying and pasting the custom code from the old to the new Domain Service takes a lot of unnecessary time. A small mistake is easily made while doing this and can have serious consequences; in the worst case, even a small mistake can result in a program that does not work anymore. Because of the time-consuming and fault-sensitive nature of this updating process, we did some research into possible tools or workarounds that may make the updating process easier, less time-consuming, less fault-sensitive and easier to maintain. We only found one tool that provides an automatic update for this process: DevForce [6]. DevForce is a third party tool which installs as an add-in for Visual Studio. But DevForce does not rely on WCF RIA Services; on the contrary, it can be seen as an alternative which works differently. Because of the high price of DevForce, while WCF RIA Services is free to use, we kept on using WCF RIA Services. In the next section we will describe some of the workarounds. These workarounds are not the best possible solutions, but they can make life a little easier.

1) Use the Existing Wizard
The first workaround [7] uses the existing wizard to regenerate code. There is a concrete procedure for this with very clear steps.
1. Exclude any metadata classes from the project that need to be regenerated.
2. Clean and rebuild the solution. This step is crucial, as it ensures that regeneration will function appropriately.
3. Add a new Domain Service to the project with the same name and a .temp.cs extension.
4. In the resulting dialog, clear the '.temp' from the service name and select the entities you would like to include in the service.
5. Copy the resulting output from the <ServiceName>DomainService.temp.cs file into the <ServiceName>DomainService.cs file. Remember to re-add the partial modifier to the service class.
6. Merge the resulting output from the <ServiceName>DomainService.temp.metadata.cs file into the classes in the Metadata folder.
7. Delete the two new .temp.cs files that were generated.
8. Include all files excluded in Step #1.
9. Clean and rebuild.
This workaround is far from ideal, because it looks a lot like the solution provided by Microsoft and it is still a lot of work and still quite fault-sensitive. But it gets the job done without the hassle of searching for your own custom code in the Domain Service. Step #4 can be a bottleneck because all the entities that are needed have to be selected all over again. The partial modifier used in step #6 is also used in the next workaround, where we discuss it further.

2) Use a Domain Service with Partial Classes
The second workaround provides easier maintenance by using a Domain Service with partial classes.
With a partial class it is possible to split the definition of a class over two or more source files. Each source file contains a section of the class definition, and all parts are combined when the application is compiled. In our case, the partial classes will spread a Domain Service across multiple files with one entity in each partial class file. The implementation [8] of this workaround is fairly simple. Only a few key steps are needed.
1. Add a new Domain Service.
2. Name the file after the entity that you are generating.
3. Modify the Domain Service by adding a partial modifier to the class definition.

Fig. 4. The 'partial' modifier

4. Repeat this process for any additional entities needed in the Domain Service.
5. Remove the attribute shown in the figure below from all other partial class files. It is impossible to set an attribute twice across the partial class files.

Fig. 5. Remove the '[EnableClientAccess()]' attribute

For the moment, this is the best available workaround, but it has a disadvantage. If a lot of entities are needed, you will end up having a lot of partial classes, which can be confusing and may increase the complexity of the application. If it does get confusing, a method exists where the solution of the application is split up into multiple projects. We did not test this method and are not going to discuss it further. Because this updating process is a serious problem, Microsoft has put it on their agenda for future releases of WCF RIA Services.

V. CONCLUSION
The combination of the Entity Framework, WCF RIA Services and Silverlight offers a great way to implement an N-tier architecture in our Warehouse Management System. Developers don't have to write every piece of code themselves, as some code is automatically generated. The different components can access the same business logic, as WCF RIA Services exposes several endpoints from which the services can be consumed. Other endpoints that haven't been discussed can be exposed in order to give even more components access to the same business layer. Thus we get a very scalable application. At some points there may remain some bottlenecks, but because of the constant evolution of the technologies used, these issues will probably be solved in the near future.

REFERENCES
[1] J. Lerman, Programming Entity Framework, 2nd edition, O'Reilly.
[2] Microsoft Silverlight, WCF RIA Services.
[3] P. P. Chen, The Entity-Relationship Model: Toward a Unified View of Data, ACM Transactions on Database Systems.
[4] Silverlight Show, WCF RIA Services Part 5: Metadata and Shared Classes, at Metadata-and-Shared-Classes.aspx.
[5] Huagati DBML/EDMX Tools, Features for Entity Framework.
[6] Ideablade, Ideablade DevForce.
[7] Vurtual Olympus Blog, WCF RIA Services: Can we get a better story please?, at Services-Regeneration-Can-we-get-a-better-story-please.aspx.
[8] The Elephant and the Silverlight, How to setup your DomainService using partial classes for easy maintenance.

An optimal approach to address multiple local network clients (May 2011)
Motmans Tim, Larsson Tony
Computer Systems Engineering, Högskolan i Halmstad, Box 823, S Halmstad, Sweden

Abstract In a Local Area Network (LAN), some applications or services running on a computer system, like a client participating in a network, need to inform other hosts in the same network that run the same applications or services about their presence, or exchange other relevant data with each other. One could use single-host addressing, called unicasting, when there is only one host to exchange information with. However, when more hosts have to be reached at the same time, one can iterate through sending unicasts or, more commonly, use broadcast messaging. The latter approach, however, is not efficient at all, while the former can give problems in real-time applications when addressing a big group of hosts. In this paper we will discuss the different network addressing methods and try to find out which approach should be used to provide efficient and easy addressing of multiple local network hosts.

Index Terms Broadcasts, Local Area Network, Multicasts, Network Addressing, Unicasts

I. INTRODUCTION
Addressing multiple hosts participating in the same network has been inefficient for a very long time now. Commonly, even at the time of writing, broadcasting is used, which is a good solution to address all hosts at once. However, network addressing methods have evolved as well. A new addressing approach, called multicasting, has come up and looks very promising. It only involves hosts that are interested in receiving certain information, whereas broadcasting disturbs all hosts participating in a network, even those not interested in the information being sent. More information about this will be given later on, but first we will discuss all suitable network addressing methods, give an example of how to implement the best solution in a prototype application, and conclude this paper with the results of the prototype implementation.

Motmans T., a Master in Industrial- and Biosciences student at the Katholieke Hogeschool Kempen, Geel, B-2440 BELGIUM.
Larsson T., a Professor teaching Embedded Systems at the Högskolan i Halmstad, Box 823, S Halmstad, SWEDEN.

II. BROADCASTING
A. Definition
Broadcasting refers to a method of transferring a message to all recipients simultaneously and can be performed in high- or low-level operations. A broadcast with the Message Passing Interface is a good example of a high-level operation, whereas broadcasting on Ethernet is a low-level operation. In practical computer networking, it also refers to transmitting a network packet that will be received by every device on the network, as can be seen in the figure shown below: the red dot is the broadcasting device, while the green dots are the recipients in the network. Broadcasting a message is in contrast to unicast addressing, in which a host sends datagrams or packets to another single host identified by a unique IP address.

Fig. 1. Broadcast addressing

B. Broadcasting scope
The scope of a broadcast is limited to a specific broadcast domain. However, by adjusting the Time-To-Live or TTL value of the broadcast datagram being sent, one can configure how far the packet can travel through a network.
The TTL value specifies the number of routers or hops that traffic is permitted to pass through before expiring on the network. At each router, the originally specified TTL is decremented by one. When its TTL reaches a value of zero, a datagram expires and is no longer forwarded through the network to other subnets. The optimal TTL value for local networks is four, since otherwise the messages will travel too far and will be subject to eavesdropping by others. The table below shows commonly used TTL values for controlling an addressing scope.

Table I. Commonly used TTL values for controlling the scope
Scope: Initial TTL value
Local segment: 1
Site, department or division: 16
Enterprise: 64
World: 128

However, to decide optimally which TTL value should be used, one should use traceroute or tracepath and make a decision based on the output of those commands. This output will contain a list of every hop in the network until the specified host or URL has been reached.

C. Limitations
One limitation is that not all networks support broadcast addressing. For example, neither X.25 nor Frame Relay has a broadcast capability. Moreover, any form of Internet-wide broadcast does not exist at all. Broadcasting is also largely confined to LAN technologies, most notably Ethernet and Token Ring, the latter being less familiar. IPv6, the successor of IPv4 (Internet Protocol version 4), also does not implement the broadcast method, to prevent disturbing all nodes in a network when only a few may be interested in a particular service. Instead it relies on multicast addressing, which will be discussed in section III.

D. In practice
Both Ethernet and IPv4 use a broadcast address to indicate a broadcast packet. Token Ring, however, uses a special value in the IEEE control field of its data frame. In those kinds of network structures, the performance impact of broadcasting is not as large as it would be in a Wide Area Network, like the Internet, but it is still there and is better avoided.

E. Security issues
Broadcasting can also be abused to perform a DoS attack, a Denial-of-Service attack. The attacker sends fake ping requests with the source IP address of the victim computer. The victim's computer will then be flooded by the replies from all computers in the domain.

III. MULTICASTING
A. Definition
Multicast addressing is a conceptually similar one-to-many routing methodology, but it differs from broadcasting since it limits the pool of receivers to those that join a specific Multicast Receiver Group, an MRG. It is the delivery of a message or information to a group of destination computers simultaneously in a single transmission from the source, automatically creating copies in other network elements, such as routers, but only when the topology of the network requires it. As shown in Fig. 2, only the interested green hosts, which joined the MRG, will receive the multicast information packets sent by the red multicast host. The uninterested yellow hosts, not inside the MRG, will not be disturbed at all.

Fig. 2. Multicast addressing

Multicast uses the network infrastructure very efficiently by requiring the source to send a datagram only once, even if it needs to be delivered to a large number of receivers. The nodes in the network take care of replicating the packet to reach multiple interested receivers, but only when necessary. Note that, similar to broadcasting, the scope of a multicast datagram can be configured by adjusting the TTL value.

B. Limitations
Sadly, no mechanism has yet been demonstrated that would allow a multicast model to scale to millions of transmissions together with millions of multicast groups, and thus it is not yet possible to make fully general multicast applications practical.
Another drawback is that not all Wi-Fi access points support multicast addressing; however, their number is increasing quite fast, which is facilitating WiCast Wi-Fi multicast, which allows the binding of data not only to interested hosts or nodes but also to geographical locations.

C. In practice
Multicast is most commonly implemented as IP multicast in IPv6, in streaming media applications and in Internet television. In IP multicast, the implementation of the multicast concept occurs at the IP routing level, where routers create optimal paths for datagrams or multicast packets sent to a multicast destination address. IP multicast is a technique for one-to-many communication over an IP infrastructure in a network. It scales to a large population without requiring knowledge of who or how many receivers there are in the network. The most common transport layer protocol used with multicast addressing is UDP, which stands for User Datagram Protocol. By its nature, UDP is not reliable, because messages may be lost or delivered out of order. However, PGM, Pragmatic General Multicast, has been developed to add loss detection and retransmission on top of IP multicast. Nowadays IP multicast is widely deployed in enterprises, stock exchanges and multimedia content delivery networks. A common enterprise use of IP multicasting is for IPTV applications such as distance learning and televised company meetings.

D. Security issues
Since multicasting requires joining an MRG in order to receive the datagrams being sent in that multicast group, it provides good security. DoS attacks can still be executed within one MRG; however, since the number of hosts is commonly limited, the attacks will have less negative consequences. Moreover, the attacker first has to join the multicast receiver group, which requires knowledge of a special IP address, in order to execute an attack.

IV. IMPLEMENTATION OF MULTICASTS IN A PROTOTYPE APPLICATION
A. Introduction
Implementing multicast addressing in a prototype application is fairly easy. When using Java as the programming language, the java.net package already includes support for multicasts. Not only is the support excellent, the coding itself also only requires a few lines. First, one should create a MulticastSocket object and specify the port number as an argument. Secondly, the creation of an MRG is required, which can be done by instantiating an InetAddress object using a multicast IP address as an argument. Next, one should make the MulticastSocket join the MRG by using the joinGroup() method. Optionally, the TTL can be adjusted to specify the addressing scope by calling the setTimeToLive() method. After this, a DatagramPacket should be instantiated with the same port number as was given to the MulticastSocket. Note that this packet will also contain the data being sent within the receiver scope, the data being a byte array. Finally it is sufficient to set the MulticastSocket's buffer size to the length of the data with setSendBufferSize() and to send the datagram throughout the network using the send() method.

B. Code example for sending multicast datagrams

try {
    MulticastSocket mcSocket = new MulticastSocket(8717);
    // a class D multicast group address is passed here
    InetAddress inaAddress = InetAddress.getByName(" ");
    mcSocket.joinGroup(inaAddress);
    mcSocket.setTimeToLive(127);
    byte[] bytMsg = "Data here!".getBytes();
    DatagramPacket packet = new DatagramPacket(bytMsg, bytMsg.length, inaAddress, 8717);
    mcSocket.setSendBufferSize(bytMsg.length);
    mcSocket.send(packet);
} catch (Exception ex) {
    System.out.println("Error");
}

Fig. 3. Sample code for sending multicast datagrams

C. Code example for receiving multicast datagrams

try {
    MulticastSocket mcSocket = new MulticastSocket(8717);
    // the same class D multicast group address as used by the sender
    InetAddress inaAddress = InetAddress.getByName(" ");
    mcSocket.joinGroup(inaAddress);
    DatagramPacket packet = new DatagramPacket(new byte[1024], 1024);
    mcSocket.receive(packet);
    String strMsg = new String(packet.getData(), 0, packet.getLength());
    // Do something with the message here
} catch (Exception ex) {
    System.out.println("Error");
}

Fig. 4. Sample code for receiving multicast datagrams

D. Discussion
As you can see in sections B and C, the implementation of this addressing approach is very straightforward. Note that port number 8717 is used, because port numbers below this value are officially reserved by other applications. Using a port number between and avoids conflicts with applications using the same port. The multicast IP address must be chosen with more care, since an MRG has to be specified by a class D IP address. Class D IP addresses are in the range 224.0.0.0 to 239.255.255.255, inclusive. However, the address 224.0.0.0 is reserved and should not be used. On the receiving side of the prototype implementation, one should set a buffer size for the datagram being received. Note that in the code sample shown above, a buffer size of 1 kilobyte is chosen.
One should, however, keep in mind to adjust this value to the prototype's requirements.

V. TEST OR ANALYSIS OF SOLUTION

In the prototype application we created, the implementation of multicast addressing worked very well. We tested multicast datagram transmission and reception with up to 20 clients participating in the same multicast receiver group. The results were very good: all datagrams were sent and received successfully. Moreover, we added some more clients to the network participating in another MRG, to test whether those clients would remain undisturbed by information sent in a different receiver group. Again, the result was successful: only clients in the same MRG received the information they were interested in.

VI. CONCLUSION

To conclude, the optimal approach to address multiple local network clients is to use multicasting. Multicasting uses the network more efficiently, can be implemented easily and, with IPv6, will eventually replace the ineffective broadcast addressing. However, one should keep in mind that multicast addressing is not yet universally supported: not all access points are capable of providing multicast addressing. Since its popularity is growing fast and more and more manufacturers will support it in the near future, this approach is nevertheless the one to choose.


An estimate of the patient's weight on an anti-decubitus mattress using piezoresistive pressure sensors

Abderahman Moujahid 1, Stijn Bukenbergs 1, Roy Sevit 1, Louis Peeraer 1,2
1 IBW, KHKempen [Association KULeuven], Kleinhoefstraat 4, 2440 Geel, BELGIUM
2 Faculty of Kinesiology and Rehabilitation Sciences, Katholieke Universiteit Leuven, BELGIUM

Abstract — Europe's demographic evolution predicts that by the year 2060 the population above 80 will triple. Elderly people have a higher risk of severe illness, causing patients to spend most of their time in bed. Because of their medical condition, these bedridden patients are more likely to develop decubitus (pressure ulcers). Decubitus is often treated by using an alternating pressure air mattress (APAM). This work explores the possibility to measure a patient's weight based solely on pressure variations in the APAM's air cells. Previous studies have shown that weight is an important factor in the development of decubitus, so most present APAMs are pressurized based on the weight of the patient. This configuration is currently done manually, leaving room for error. If the configuration is not done correctly, tissue pressure will not be reduced effectively. The development of an intelligent APAM that can measure the patient's weight and regulate pressure accordingly will not only reduce decubitus development, but also increase the patient's comfort.

Index Terms — decubitus; pressure ulcer; weight; interface pressure; cell pressure; alternating mattress; anti-decubitus mattress; pressure mapping; piezoresistive pressure sensor

I. INTRODUCTION

A. General

Decubitus is a global problem. For Europe as a whole, the prevalence is estimated at 18.1% [21]. In Belgium, up to 12% of patients with mobility impairment and poor health deal with some form of decubitus [4]. With the aging of the population, this prevalence will only increase. Fig 1. shows a population pyramid with predictions for Europe in 2060. The 65-plus population will make up more than 30% of the total population in Europe [26].

Fig 1. Population pyramid Europe

Decubitus is the medical term for pressure necrosis or pressure ulceration. It refers to the dying of tissue under the influence of compressive forces (pressure), shearing forces, friction [2] and micro-climate [3]. Capillaries are pressed shut by these external pressures, resulting in insufficient oxygen and nutrient supply to the tissues. Decubitus can lead to severe complications and permanent tissue damage. The main cause of decubitus development is contact pressure (interface pressure) from the patient's body weight against an underlying support surface. Therefore, APAMs focus their operation on reducing the magnitude and duration of this pressure [5].

II. FACTORS

Adjacent to pressure, there are numerous indirect causes. These factors contribute to the development of pressure ulcers and are classified in two categories: extrinsic and intrinsic factors. The main extrinsic factors are the forces of body weight, shear and friction. These factors can basically be controlled by using proper support surfaces and by following the guidelines issued by the European Pressure Ulcer Advisory Panel (EPUAP) [2]. Pressure, resulting from the patient's weight pressing on the support surface, causes the tissue to be compressed against the bony prominences, resulting in disturbed capillary perfusion. In the 1930s, research by Landis [6] suggested that the amount of pressure needed to achieve capillary occlusion is 32 mmHg.
This number rests on a misconception: it comes from measurements taken at the fingertips of healthy, young students. Later studies [23] have shown that the amount of pressure needed for capillary occlusion varies between individuals and with the anatomical location in the human body.

The second extrinsic factor is shear [7]. Shear stress is a parallel (tangential) force applied to the surface of an object while the base remains stationary. Shear stress causes deformation of the top layers of the skin. A typical example occurs when the head of the bed is raised. This causes an increase in shearing forces on the sacral tissues: the skin is held in place while the body is pulled down by gravity, causing the bony prominences to push against the deep, internal tissues. Shear forces increase pressure and thereby cause a reduction in capillary blood flow. Shear forces are mostly combined with friction. Friction occurs when two surfaces move across one another [8]. The protective outer layer of the skin is then stripped off. Patients who are not able to move by themselves or patients with uncontrollable movements (e.g. spastic patients) have a higher risk of tissue damage caused by friction [24]. A recent development in friction reduction is the use of low-friction materials (e.g. polyurethane), which are usually laid over the support surface.

The intrinsic factors are factors that can speed up the formation of pressure ulcers. They are related to the medical condition of the patient. There are a lot of these factors, for example: mobility, incontinence, body temperature, age, gender and nutrition.

A. Alternating systems

As previously mentioned, the best method to successfully overcome pressure ulcers is to redistribute pressure in combination with providing a comfortable surface for the patient. Support surfaces that redistribute interface pressure fall into two categories: pressure-reducing [17] and pressure-relieving [9,17] surfaces. Interface pressure-reducing systems reduce interface pressure by maximizing the skin's contact area. Interface pressure is the result of the weight of the patient causing a deformation of the mattress that adapts to the patient's body contours. An alternating system has a pressure-reducing functionality: it has a series of cells beneath the patient that are inflated by an air pump. The manuals of such devices often contain a table with pressure settings according to the patient's weight, while others give preference to a hand check [10]. In both cases the pump configuration is performed manually. The hand check procedure is issued by the Agency for Health Care Policy and Research (AHCPR) [10]. Inappropriate configuration of the air pump can lead to bottoming out, a situation where the patient is no longer supported by the underlying surface. This situation should be avoided at all times. An alternating system also has a pressure-relieving task: regions of high pressure are periodically shifted by deflating and inflating consecutive air cells of the APAM. Cycle times usually vary from 10 to 25 minutes and are manually adjustable. The optimal cycle time between inflation and deflation is still not known.

III. MATERIAL AND METHODS

The experiments conducted in this paper are applied on a low-air-loss alternating mattress that consists of 20 cells. Low-air-loss is a property of an APAM which refers to the escape of air through the micro-perforated pores of the cells for temperature and moisture regulation [3]. Each cell is divided in two compartments: an active layer and a comfort layer. The seventeen cells in the active layer are connected to the pump through two separate tubes, so alternating circuits can be formed.
An additional tube is connected to the pump to provide a fixed pressure to the comfort layer. The pump can be set to static or dynamic mode. In static mode, the pressure in the cells is held at the configured pump pressure (PP). When the pump is placed in dynamic mode, the active layer is divided into two separate circuits which are periodically deflated and inflated, while the comfort layer is held at a constant pressure. A non-return valve is integrated in this layer to guarantee that the patient remains supported even if a power failure occurs; this prevents a bottoming-out situation. Only the cells in the active layer are ventilated, to prevent skin degeneration of the patient (see low-air-loss), which means that over a period of time these cells deflate automatically. The air pump (Q2-03, EEZCARE, Taiwan) produces an air flow of 8 L/min and works in a pressure range from 15 to 50 mmHg [10].

A. Interface pressure measurements

For measuring interface pressure, a pressure mapping system is needed. An MFLEX Bed ACC 4 Medical system is used. The system consists of a pressure mat containing 1024 (32 x 32) thin, flexible piezoresistive sensors covered with polyurethane-coated ripstop nylon [11], an interface module for the collection of data and a calibration kit. The system is delivered with its own software (Fig 2.) that allows clinicians to view, annotate, file and share the information gathered by the sensors.

Fig. 2. Pressure mapping using MFLEX software

B. Meaning of interface pressure

Theoretically, the interface pressure reflects the weight of a patient (Pascal's law):

$P = \frac{F}{A}$    (1)

Formula 1 states that the interface pressure equals the force (body mass multiplied by the earth's gravitational acceleration) divided by the surface area in contact with that force. This means that the larger the surface area supporting the patient, the lower the tissue interface pressure becomes. Interface pressure can be affected by the stiffness of the support surface or the shape of the patient's body [12]. There is some controversy regarding the analysis of IP measurements [25]. Maximum IP is cited in many studies as the most significant parameter. This use is based on the assumption that the maximum IP is the leading factor in the development of pressure ulcers [13]. Vulnerable sites such as the heels, elbows, sacrum and head have the highest interface pressure on the body because these are bony surfaces.

C. Goal

It has already been stated that manual configuration of the air pump introduces error. The aim of this study is to minimize human error by means of automatic weight measurements. Proper weight/pressure configuration will optimize APAM functionality.

D. Setting

Pressure sensors are attached to each individual cell and the signals are processed using LabVIEW (National Instruments, Texas).

E. Hardware

An MPX5010GC7U piezoresistive pressure sensor (Freescale Semiconductor, Texas) is used in this setting. The sensor is an integrated silicon pressure sensor with on-chip signal conditioning, temperature compensation and calibration hardware. The measurable pressure ranges from 0 to 10 kPa (approximately 75 mmHg), which is converted to an output-voltage range of 0.2 to 4.7 V. The sensor requires a 5 V supply. When the sensor is not under load, the voltage offset has a typical value of 200 mV [22]. The transfer function is given below:

$V_{out} = V_s \cdot (0.09 \cdot P + 0.04) \pm \mathrm{Error}$    (2)

The output voltage (Vout) is proportional to the source voltage (Vs) and the pressure (P, in kPa). The added error term refers to a temperature compensation factor that must be taken into account when the sensor is not used in its normal temperature range of 0 to 85 °C. The analog pressure signals are amplified by a factor of two before digitization, to broaden the input range of the digitization step. Amplification is done with an MC33174, a low-power single-supply op-amp. For PC signal processing, the raw analog signals need to be converted to digital values. The actual conversion is done by an analog-to-digital converter (A/D) that is part of the data acquisition system (DAQ, NI USB-634, National Instruments, Texas). It has 32 analog inputs, a maximum sampling frequency of 500 kS/s and a 16-bit A/D resolution. The maximum pressure resolution after amplification is:

$\mathrm{sensitivity} = \frac{\mathrm{full\ scale\ span}}{\mathrm{pressure\ range}} = \frac{9\ \mathrm{V}}{10\ \mathrm{kPa}} = 900\ \mathrm{mV/kPa}$

$\mathrm{voltage\ resolution} = \frac{10\ \mathrm{V}}{2^{16}-1} = 1.5 \times 10^{-4}\ \mathrm{V/bit} = 0.1525\ \mathrm{mV/bit}$

$\mathrm{pressure\ resolution} = \frac{0.1525\ \mathrm{mV/bit}}{900\ \mathrm{mV/kPa}} = 1.7 \times 10^{-4}\ \mathrm{kPa/bit} \approx 0.0013\ \mathrm{mmHg/bit}$    (3)

F. Noise considerations

When retrieving the signals through the 16-bit A/D, the influence of noise becomes an important error factor. There are two dominant types of noise in a piezoresistive integrated pressure sensor: shot (or white) noise and 1/f noise (flicker noise) [14]. Shot noise is the result of non-uniform flow of carriers across a junction and is independent of temperature. Flicker noise results from crystal defects due to wafer processing errors.
To minimize the effect of noise, a low-pass RC filter with a cutoff frequency of 650 Hz is placed behind the sensor's output pin [14]. The transducer has a mechanical response of about 500 Hz; its noise output extends from 500 Hz to 1 MHz. Another point of importance is the supply voltage. The sensor output is influenced by variations in the supply voltage: any variation in the supply voltage will appear at the output of the sensor (see formula 2), which has a direct effect on the sensor accuracy. The developed sensor board is therefore equipped with an LM317 adjustable voltage regulator to supply a stable 5 V source. The adjustable voltage regulator is chosen instead of a fixed regulator (e.g. LM7805) because of its improved line and load regulation, overload protection and reliability [15].

G. Software

The software is written in LabVIEW. The application is split up into two modules (Fig 3). One module has the task of continuously measuring the pressure in the different APAM cells, while the other measures the interface pressure of the patient. Interface pressure is derived from the MFLEX pressure mapping system.

Fig 3. LabVIEW application

The second module also has the task of estimating the contact area of the person on the underlying support surface.

H. LabVIEW: pressure measurement

The cell pressure module of the LabVIEW application runs through several steps. An overview is presented in Fig 4.

Fig 4. Algorithm: measuring cell pressure in LabVIEW

The first step is acquiring the pressure signals. Secondly, an optional offset adjustment can be made. The reason for this is that the offset can differ among the different sensors; for precise measuring it is recommended to apply this offset adjustment so that all sensors have the same calibrated output. The third step is the conversion from the actual voltage to the corresponding pressure. This is done in accordance with the above-mentioned transfer function of the pressure sensor (Formula 2). The next step is filtering the input signal. The moving average filter smoothens the data signal, removing additional noise. Fig. 5 shows an example of a raw input signal, while Fig. 6 shows the result after applying the moving average filter.

Fig 5. No filtering technique applied
Fig 6. 3-point moving average filter applied

Calculations of mean, standard deviation, median and variance are made to compare different measurements. The incoming data is saved in a TDM data model. The data includes a time stamp, the comfort-layer pressure, the pressure of all cells in the active layer and relevant statistics. For sound conclusions, it is desirable that the cell pressure is synchronized with the interface pressure measurement. This is not possible with the original MFLEX software, so custom extensions based on the MFLEX SDK are implemented in LabVIEW. A schematic view of the interaction between the MFLEX SDK and the LabVIEW environment is shown in Fig 7.

Fig 7. Algorithm: measuring interface pressure in LabVIEW

The initialization step ensures that the connection is established between the pressure mat / interface box and the LabVIEW software. Subsequently, reading of pressure values is possible and available for processing.
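As an illustration of the conversion and filtering steps described above, the following minimal Java sketch shows how a sampled sensor voltage could be turned into a cell pressure and smoothed with a 3-point moving average. It assumes the MPX5010 transfer function of Formula 2 with Vs = 5 V and the two-times amplification mentioned in section E; the class and method names are hypothetical and only mirror the processing chain, they are not the authors' actual LabVIEW implementation.

    public class CellPressureConverter {
        private static final double VS = 5.0;     // sensor supply voltage (V), assumed
        private static final double GAIN = 2.0;   // amplification before the A/D converter

        // Invert Formula 2: Vout = Vs * (0.09 * P + 0.04)  ->  P = (Vout / Vs - 0.04) / 0.09
        public static double toPressureKPa(double measuredVoltage) {
            double vout = measuredVoltage / GAIN;  // undo the amplification
            return (vout / VS - 0.04) / 0.09;      // pressure in kPa
        }

        public static double kPaToMmHg(double kPa) {
            return kPa * 7.50062;                  // 1 kPa = 7.50062 mmHg
        }

        // 3-point moving average, as used to smooth the raw cell-pressure signal
        public static double[] movingAverage3(double[] samples) {
            double[] smoothed = new double[samples.length];
            for (int i = 0; i < samples.length; i++) {
                double sum = 0;
                int count = 0;
                for (int j = Math.max(0, i - 1); j <= Math.min(samples.length - 1, i + 1); j++) {
                    sum += samples[j];
                    count++;
                }
                smoothed[i] = sum / count;
            }
            return smoothed;
        }
    }

In this sketch the conversion is simply the algebraic inversion of the sensor transfer function; the optional per-sensor offset adjustment of the second step would be applied to the measured voltage before the conversion.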

IV. RESULTS

Before measuring with real-life persons, static objects are used to eliminate all possible interfering parameters (e.g. movement of real-life test persons). In the second stage, the results of the static study are matched with the results on human beings.

A. Test 1 – static mode: same weight, different PP

Various pump pressures can be set on the air pump. Two weights are used in this setting: 33±1 kg and 73±1 kg. The mattress is first pumped to a specific PP (e.g. 15 mmHg). Then 33±1 kg is placed on the mattress and one static cycle is measured. Afterwards the same procedure is done with 73±1 kg. The PP is then increased by one step and the same weights are applied again. This procedure is repeated until the maximum PP (50 mmHg) is reached. The goal of this test is to find out at which PP a noticeable difference can be seen when two objects with a different weight are applied on the mattress. The following parameters (Fig 9.) are measured:
- Minimum cell pressure (MIN CP)
- Maximum cell pressure (MAX CP)
- Leakage time (LT)

Fig 9. Parameters of a static cycle

The leakage time is the time measured between MAX CP and MIN CP. The results from Fig 8. are covered in TABLE I.

TABLE I. Static measurements at different PPs (for each PP and each weight: MIN CP and MAX CP in mmHg, LT in min:sec)

Result: The difference in maximum and minimum cell pressure at all PPs is merely 0.1 to 0.2 mmHg. At a PP of 15 mmHg, the leakage time for 33 kg is 6 m 51 s while an object of 72 kg has an LT of 12 m 44 s. At a higher PP, for instance 45 mmHg, the LT for 33 kg is 2 m 45 s and for 73 kg 2 m 34 s. Fig 8 shows different static cycles at different pump pressures. It is noticeable that the blue line is much longer than the dotted line. At the higher pump pressures, the lines coincide. This indicates that at these pressures there is no distinction between the two weights.

Fig 8. Comparison between the two weights (33 kg and 73 kg) at different PPs

B. Test 2 – static: same PP, different weight

The next test focuses on how well the weight can be differentiated. Different weights are applied while the pump pressure is kept at a fixed value of 15 mmHg. The objects are measured for a couple of cycles, and these parameters (Fig 9.) are examined:
- Minimum cell pressure (MIN CP)
- Maximum cell pressure (MAX CP)
- Step-up time of the cell (ST)
- Leakage time of the cell (LT)
- Total cycle time (TT)

Each measurement is repeated 4 times. The average and the standard deviation are shown in TABLE II.

TABLE II. Static measurements of static weights (per weight: MIN CP and MAX CP in mmHg, ST in sec, LT and TT in min:sec, each as average ± standard deviation)

Result: There is no significant relationship between weight and minimum or maximum cell pressure in static mode. There is, however, a connection between weight and step-up time, leakage time and total cycle time. The step-up time increases if the weight of the object is raised, but the variation is too small to draw firm conclusions from.

The leakage time is a better parameter. Between 0 and 33±1 kg, there is a difference of 46 s; between 33±1 kg and 44±1 kg, it is already 1 minute 29 seconds. The correlation between leakage time and weight is not linear. The total cycle time is proportional to the leakage time, so both parameters can be used for analysis.

C. Test 3 – static: comfort layer

All cells are attached to the comfort layer. The non-return valve guarantees that air does not escape and stays in the bladders. Measurements have shown that this layer reacts the same as the active layer in static mode; the only difference is a decrease in pressure [18]. This is shown in Fig 10. The comfort layer is situated directly under the active layer: when a force is applied on the active layer, a certain amount of the energy is passed through to the comfort layer.

Fig 10. Active layer and comfort layer in static mode

D. Test 4 – static: surface area

To test the influence of the body contact area, different known surface areas (0.16 m², 0.32 m² and 0.48 m²) have been placed on the mattress, each carrying the same weight (45±1 kg). The emphasis lies on the leakage time of the active layer (see Tests 1 & 2).

Result: Contact area plays a significant role in the duration of the leakage time. The time span is about 1.6 times as long with a surface area of 0.16 m² (LT: 13 m 17 s) as with 0.48 m² (LT: 8 m 17 s). This test gives a clear indication that contact area is important for later investigations.

E. Calculation of human surface area

Formula 1 notes that pressure is force (body weight) divided by the surface area. Body weight is the variable that is wanted; surface area is the missing variable needed to solve the equation in Formula 1. Surface area can be estimated using the pressure mapping system. This is only done for research purposes: for the intelligent APAM, the pressure mat cannot be used due to its high price. A good starting point is the body surface area (BSA), the total surface area of the human body. Various calculations have been published to arrive at the BSA without direct measurement. Mosteller proposed the following definition for BSA calculation [19]:

$\mathrm{BSA}\ (\mathrm{m}^2) = \sqrt{\frac{\mathrm{weight\ (kg)} \times \mathrm{height\ (cm)}}{3600}}$

Du Bois & Du Bois [20] reformulated this as:

$\mathrm{BSA}\ (\mathrm{m}^2) = 0.007184 \times \mathrm{weight\ (kg)}^{0.425} \times \mathrm{height\ (cm)}^{0.725}$

Because only one side of the human body rests on the mattress, the BSA must be divided by a factor of two. This number gives only a rough estimate of the body surface; therefore the BSA is only used to compare the order of magnitude with the surface area calculated from the pressure mat. The difference between BSA and actual surface area can be substantial because the human body does not rest fully on the alternating mattress. The actual surface area is calculated in LabVIEW by the following procedure:
1) Count the number of sensors with an IP value greater than a specific threshold
2) Divide this number by the total number of sensors in the mat (in this case 1024)
3) Multiply this by the total sensing area of the mat
The result is a representation of the human surface area that is in contact with the support surface.
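To make the two estimates above concrete, the following minimal Java sketch computes the Mosteller and Du Bois BSA values and the contact-area estimate from a thresholded pressure map. It is only an illustration of the calculation described in this section; the threshold value, the mat dimensions and all identifiers are assumptions, not part of the authors' LabVIEW implementation.

    public class SurfaceAreaEstimator {
        // Mosteller: BSA (m^2) = sqrt(weight(kg) * height(cm) / 3600)
        public static double bsaMosteller(double weightKg, double heightCm) {
            return Math.sqrt(weightKg * heightCm / 3600.0);
        }

        // Du Bois & Du Bois: BSA (m^2) = 0.007184 * weight^0.425 * height^0.725
        public static double bsaDuBois(double weightKg, double heightCm) {
            return 0.007184 * Math.pow(weightKg, 0.425) * Math.pow(heightCm, 0.725);
        }

        // Contact area from the 32 x 32 pressure map: count sensors above a threshold,
        // divide by the total number of sensors and multiply by the mat's sensing area.
        public static double contactArea(double[][] interfacePressure, double thresholdMmHg,
                                         double matSensingAreaM2) {
            int total = 0;
            int loaded = 0;
            for (double[] row : interfacePressure) {
                for (double ip : row) {
                    total++;
                    if (ip > thresholdMmHg) {
                        loaded++;
                    }
                }
            }
            return (double) loaded / total * matSensingAreaM2;
        }
    }

Only half of the BSA would then be compared with this contact area, since just one side of the body rests on the mattress.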
F. Test 5 – static: human subjects

The previous tests dealt with static objects, but to accomplish the weight estimation on an APAM, testing on human subjects is vital. For this test both cell pressure and interface pressure were recorded. The experimental subjects are seven healthy volunteers aged between 20 and 28 years. The following table (TABLE III) summarizes all test persons.

TABLE III. Description of the experimental group (per person: ID, length in cm, weight in kg, BMI)

The procedure followed with the subjects is:
- Step 1: Inflation of the APAM at a static pump pressure of 15 mmHg.
- Step 2: The test person lies down on the mattress and stays as stable as possible.
- Step 3: The minimum and maximum CP (cell pressure) are measured for at least 3 cycles and the standard deviation (SD) is calculated. The minimum, maximum and average interface pressure (IP) are retrieved with the pressure mat. Both PP and IP are expressed in mmHg. The surface area (SA) is then calculated based on the sensor counts. The leakage time (LT) is also monitored.

The results are summarized in TABLE IV.

TABLE IV. PP and IP measurements of human subjects (per person: MIN CP, MAX CP, SD CP, MIN IP, MAX IP, AVG IP, SA, LT)

Result: The minimum cell pressure lies between 13.8 mmHg and 14.1 mmHg and is not at all weight-related. The maximum cell pressure is much more scattered, with a minimum value of 29.4 mmHg and a maximum value of 45.6 mmHg. The interface pressures (MIN, MAX, AVG) differ for each individual person. The leakage time for person 1 (62.5 kg) is 11 m 44 s, while person 7 (103 kg) has an LT of 33 m 20 s. This is almost 22 min of variation for 41.5 kg.

G. Test 6 – dynamic: cell pressure

The cycle time for an alternating cycle is set to 10 min. Two weights are applied on the mattress: 66±1 kg and 90±1 kg. The goal is to find the pump pressure at which the biggest deviation between the two weights is seen. A low PP of 15 mmHg (Fig 11.) and a high PP of 45 mmHg (Fig 12.) are selected.

Fig 11. Alternating PP = 15 mmHg
Fig 12. Alternating PP = 45 mmHg

TABLE V summarizes Fig 11. and Fig 12.

TABLE V. Dynamic measurements at different PPs (for each PP and each weight, 66±1 kg and 90±1 kg: MIN CP and MAX CP in mmHg)

Result: At a PP of 15 mmHg, the difference in MIN CP between the two weights is 2.0 mmHg; for MAX CP, the difference is 1.7 mmHg. At a PP of 45 mmHg, the difference in MIN CP between the two weights is 1.5 mmHg and for MAX CP 1.2 mmHg. The difference is slightly higher at a PP of 15 mmHg.

H. Test 7 – dynamic: interesting parameters

Test 6 revealed two interesting parameters for further investigation: the minimum cell pressure in the active layer (Fig 14.) and the difference between the minimum and maximum cell pressure in the comfort layer (Fig 13.).

Fig 13. Difference between max. and min. CP in the comfort layer

Fig 14. Minimum CP in the active layer

The results are summarized in TABLE VI. DP (differential pressure) is the difference between the maximum and minimum cell pressure in the comfort layer. Both the average and the standard deviation are calculated.

TABLE VI. DP and MIN CP in dynamic mode (per weight in kg: DP and MIN CP, each as average ± SD)

Result: If no weight is applied, the DP is 7.40 mmHg. For 90 kg, the DP is 2.03 mmHg. The DP decreases with weight, but in the transition from 45 kg to 49 kg an increase of DP takes place, meaning that accurate weight estimation using this method is not possible. The minimum CP for no weight is 2.74 mmHg and 8.75 mmHg for the highest weight. The minimum CP increases with applied weight, but is also not consistent in its results: a 58 kg object gives a MIN CP of 5.57 mmHg, while 53 kg gives a MIN CP of 6.07 mmHg. It can be noted that for both layers a distinction can be made between 0 kg and 90±1 kg; everything in between cannot be identified accurately. The pressure differences are too low to draw any definite conclusions.

V. MEASUREMENT CONCLUSIONS

Test 1 showed a major difference in leakage time between the two weights at the lowest pump pressure (15 mmHg). At a PP of 15 mmHg, the difference in LT between 33 kg and 73 kg is 5 min 25 sec; at a PP of 20 mmHg it is only 34 sec. The decline in LT continues towards the higher PPs. Test 2 confirmed that the leakage time depends on the weight, but shows no linear correlation. Test 3 showed that the comfort layer reacts the same as the active layer; the only difference is a reduction in CP. The next test (Test 4) showed that the leakage time depends on the contact area. Test 5 covered measurements with human subjects; just like the static measurements it showed a positive evolution in leakage time. In dynamic mode, there is no LT; the only parameters that can be discussed are MIN CP and MAX CP. Test 6 showed that when the pump pressure is raised, the alternating datasets become more similar, which is an undesirable effect. So the best way of distinguishing weight is to apply the lowest pump pressure (15 mmHg). This conclusion is also valid for the comfort layer. It must be noted that the shape of the waveform in the comfort layer is completely different from that in the active layer [18]. This is the opposite of the static mode, where the active and comfort layer are shape-wise the same. Two interesting parameters were found in this process: MIN CP in the active layer and the difference between MAX CP and MIN CP (DP) in the comfort layer [18]. Test 7 explored these two parameters: if the weights are close to each other, the possibility of an accurate weight estimation with these parameters is nil. This is due to the very small variations in CP.

VI. GENERAL CONCLUSION

In static mode, the leakage time of the cells gives a good indication for a weight estimation of the patient. To further explore the effectiveness of this parameter, more data from test subjects is required. In dynamic mode, the results of the applied tests are inconclusive and further steps need to be taken. A possibility is to examine the actual shape of one alternation.

VII. FUTURE WORK

The leakage time in static mode depends on the surface area (Test 4). The time dilation caused by variation in surface area must be taken into account together with the total leakage time. Multiple persons with the same weight and a different surface area are needed and must be subjected to the static tests.
The result will show the effect of contact area and leakage time for real persons. Another possibility is to examine the actual shape of one alternation. Conducted tests [18] showed that for each weight the shape of the waveform is different. A test set could be built with alternations from different weights varying between 45 and 120 kg.

The data set will then include the recordings of alternations of our test subject. The goal is then to compare the data set with the test set (e.g. using cross-correlation): the best match between the alternations in the test and data set gives the highest correlation coefficient. Tests have been completed successfully with static objects, but not yet with real-life persons. Extra filtering is presumably required to nullify movements of the patient.
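As a rough illustration of the comparison step suggested above, the sketch below computes a normalized correlation coefficient between a recorded alternation and each template in a test set and returns the weight of the best-matching template. It is only a sketch of the proposed idea, assuming all signals are resampled to the same length; none of the identifiers correspond to an existing implementation.

    public class AlternationMatcher {

        // Pearson-style normalized correlation between two equally long signals
        static double correlation(double[] a, double[] b) {
            double meanA = 0, meanB = 0;
            for (int i = 0; i < a.length; i++) { meanA += a[i]; meanB += b[i]; }
            meanA /= a.length;
            meanB /= b.length;
            double num = 0, denA = 0, denB = 0;
            for (int i = 0; i < a.length; i++) {
                num += (a[i] - meanA) * (b[i] - meanB);
                denA += (a[i] - meanA) * (a[i] - meanA);
                denB += (b[i] - meanB) * (b[i] - meanB);
            }
            return num / Math.sqrt(denA * denB);
        }

        // Return the weight (kg) of the template whose alternation correlates best
        // with the measured alternation.
        static double estimateWeight(double[] measured, double[][] templates, double[] templateWeights) {
            int best = 0;
            double bestCorr = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < templates.length; i++) {
                double c = correlation(measured, templates[i]);
                if (c > bestCorr) {
                    bestCorr = c;
                    best = i;
                }
            }
            return templateWeights[best];
        }
    }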
REFERENCES
[1] K. Vanderwee, M. H. F. Grypdonck, T. Defloor, "Effectiveness of an alternating pressure air mattress for the prevention of pressure ulcers", March 2005.
[2] National Pressure Ulcer Advisory Panel and European Pressure Ulcer Advisory Panel, Prevention and treatment of pressure ulcers: clinical practice guideline, Washington DC: National Pressure Ulcer Advisory Panel, 2009.
[3] M. Clark, M. Romanelli, S. Reger, et al., "Microclimate in context", in: International review. Pressure ulcer prevention: pressure, shear, friction and microclimate in context, London: Wounds International, 2010.
[4] T. Defloor, "Studie van de decubitusprevalentie in de Belgische ziekenhuizen: Project PUMAP", 2008, pp. 6-27.
[5] C. Theaker, "Pressure sore prevention in the critically ill: what you don't know, what you should know and why it's important", Intensive and Critical Care Nursing, 19, 2003.
[6] E. M. Landis, "Micro-injection studies of capillary blood pressure in the human skin", Heart, 15, 1930.
[7] S. I. Reger, V. K. Ranganathan, H. L. Orsted, et al., "Shear and friction in context", in: International review. Pressure ulcer prevention: pressure, shear, friction and microclimate in context, London: Wounds International, 2010.
[8] J. Bridel, "Assessing the risk of pressure ulcer", Nursing Standard, 7(25).
[9] K. Vanderwee, M. H. F. Grypdonck, T. Defloor, "Alternating pressure air mattresses as prevention for pressure ulcers: a literature review", International Journal of Nursing Studies, 45, 2008.
[10] User Manual Q2 support surface series, January 2004, p. 9.
[11] MFLEX 4.0 User Manual, 1st edition, Vista Medical, 2008.
[12] U.S. Department of Health and Human Services, Agency for Health Care Policy and Research, "Pressure Ulcers in Adults: Prediction & Prevention", May 1992, p. 55.
[13] J. T. M. Weststrate, "The value of pressure ulcer risk assessment and interface pressure measurements in patients", 2005.
[14] A. Reodique, W. Schultz, "Noise Considerations for Integrated Pressure Sensors", Freescale Semiconductor, AN1646, Rev. 2, 05/2005.
[15] U. A. Bakshi, A. P. Godse, Power Electronics II, 2009.
[16] S. Sumathi, P. Surekha, LabVIEW based Advanced Instrumentation Systems, 2007, p. 24.
[17] National Pressure Ulcer Advisory Panel, Support Surface Standards Initiative, "Terms and Definitions Related to Support Surfaces", January 2007.
[18] A. Moujahid, "Ontwikkeling van een geïnstrumenteerde alternerende matras; statische gewichtsbepaling door drukvariaties", Master thesis, 2011.
[19] R. D. Mosteller, "Simplified Calculation of Body-Surface Area", N Engl J Med, Oct 1987.
[20] D. Du Bois, E. F. Du Bois, "A formula to estimate the approximate surface area if height and weight be known", Arch Int Med, 1916.
[21] K. Vanderwee, Symposium van de Federale Raad voor de Kwaliteit van de Verpleegkundige Activiteit, Brussel, 3 maart 2011.
[22] Freescale Semiconductor, "Technical Data MPX5010", Rev. 11, 2007.
[23] A. C. Burton, S. Yamada, "Relation between blood pressure and flow in the human forearm", J Appl Physiol, 1951.
[24] M. Moffat, K. Biggs Harris, Integumentary Essentials: Applying the Preferred Physical Therapist Practice Patterns, 2006, p. 23.
[25] L. Philips, "Interface pressure measurement: appropriate interpretation of this simple laboratory technique used in the design and assessment of pressure ulcer management devices".
[26] EUROSTAT, EUROPOP2008.


Practical use of Energy Management Systems

J. Reynders, M. Spelier, B. Vande Meerssche
(Biosciences and Technology Department, KH Kempen University College, Geel, Belgium)

Abstract — This paper discusses some systems that could be useful for a global energy management system. The goal of this project is to bring the different energy management systems together in LabVIEW TM so that smart algorithms can be used at a later stage to control all the appliances in a more efficient way. It is hereby important to synchronize all the different measurements. An additional purpose of this work is to create a SWOT analysis of different kinds of measuring systems that could be used for energy management. This is a comparison of the strengths, weaknesses, opportunities and threats of each individual system.

Index Terms — Energy Management System, LabVIEW TM, Plugwise, Zigbee, Siemens Sentron PAC3200, Modbus, Beckhoff BK9050, Beckhoff KL6041, TwinCAT, Socomec Diris A10, RS485

I. INTRODUCTION

The way we use energy must change. We waste too much energy, consciously and unconsciously, and thereby we put an enormous pressure on the ecosystem of our earth. Reducing our ecological footprint [6] can be achieved by actively living more economically and changing our habits, but it can also be done more easily by integrating systems like smart grids into our environment. One way to minimize this ecological footprint is to maximize the use of renewable energy. For this, we can use intelligent energy management systems (EMS). These systems try to match the consumption of energy to the amount of renewable energy available in such a way that the consumers are not affected. On the other hand there is also the smart grid that is being put in place worldwide. A smart grid is a new approach to how an electricity grid works, whereby not only energy flows from the grid to the customer but also communication data. This is a consequence of the move towards decentralized power generation. This evolution means that the energy supply, and correspondingly the price, will become much more variable. To keep the lights on, it is important that the power grid can communicate with consumers. This can be achieved with the help of an EMS. Calculating and reducing the ecological footprint and the introduction of a smart grid imply that energy consumption must be measurable and controllable. In addition, the energy consumption has to be as efficient as possible. In Belgium, for production units which produce more than 10 kilowatt, the injection into and the use of power from the grid have to be measured separately. This results in worse payback periods because the surplus of electricity can only be sold at 30% to 40% of the purchase price. It is therefore important for this type of equipment to minimize power injection into the grid. An EMS offers the necessary solutions for this challenge and could intervene in an intelligent way. An EMS is a great support for managing the energy consumption of a building; the system is capable of handling a wide range of actors in an independent way.

II. HOME APPLIANCES

In this section, we analyze the way of communicating of two EMSs developed for home users. The most important thing to make a system suitable for home users is the simplicity of setting up the connection: it needs to be configurable by a user without technical knowledge.
A. Module using Zigbee communication

The Plugwise [3] module used in our test is fully plug and play. It uses Zigbee as communication protocol. Our test system consists of a main controller (MC), a network controller (NC) and eight nodes. The main controller is the connection between the Zigbee network and our energy management server. It utilizes an Ember EM250 chipset, a single-chip solution that integrates a 2.4 GHz, IEEE 802.15.4-compliant transceiver with a 16-bit XAP2b microprocessor. On this chip the Zigbee PRO stack is implemented. Although the Plugwise stick looks like a USB interface, it actually utilizes a serial protocol; a virtual serial port is provided by an onboard FT232R chip. The main controller communicates directly with the network controller, and the network controller communicates with the other nodes in a mesh network topology. This means that we first have to power up the Circle+ and after that we can plug in the Circles, which connect automatically with the Circle+.

Fig. 1. Plugwise implementation

Once all the Circles are installed, we used the Plugwise Source software to test the connectivity and to get some measurements from our plug.

This is the standard Plugwise software which was included in the package. The measurements will later be used to check the accuracy of our LabVIEW TM application. After that we used a serial port sniffer to get information about how the main controller communicates with the network controller. To get the right power information, Plugwise uses a calibration string and a power information string; we have to use the calibration string in order to get the right measurements. The module uses a CRC16 checksum which is calculated over the full data string. The Plugwise Circle also holds an internal buffer with information about power usage in the past. Since we only need the actual value, the latter will not be discussed in this paper.

B. ModuleX using Powerline communication

ModuleX is a prototype of an EMS that uses Powerline communication to connect with the server. The hardware contains some bugs, which is normal at this stage of development. The protocol used for the prototype is HomePlug Turbo. As can be seen in figure 2, we use a gateway that makes a connection between the Ethernet and Powerline networks. Every appliance we want to manage needs to be connected to a measurement module. This module contains the hardware for measuring the current energy consumption in milliampere and has the ability to switch the devices on and off.

Fig. 2. ModuleX implementation

The measuring module consists of two parts: a communication unit and a logical unit, which communicate with each other using a TCP connection. This means that the logical unit could be connected directly to the Ethernet if necessary. Each module has its own IP address that we can use to start a telnet connection on port 23. The command protocol consists of single ASCII characters, which are not case sensitive. For example, to ask the current power consumption we need to send the ASCII character 'l' or 'L'. The module replies with an ASCII string which needs to be converted to get the measurement in milliampere.

III. INDUSTRIAL APPLIANCES

In this section the different sets of rules used by two energy meters created for industrial usage are analyzed. These sets of rules can be used in our LabVIEW TM application. First, a power meter that uses Ethernet (Siemens Sentron PAC3200) to connect with our energy management server is considered. Afterwards we will analyze a meter that uses RS-485 (Socomec Diris A10) to communicate with a software PLC (Beckhoff) and a hardware PLC (Siemens).

A. Power meter with Ethernet (Siemens Sentron PAC3200)

The Siemens Sentron PAC3200 is a powerful compact power meter. The device can replace multiple analog meters and monitor over 50 parameters. It has an integrated 10Base-T Ethernet module, which we will use to get the readings from the meter. The meter connects directly with the energy management server. As shown in figure 3, it uses Modbus TCP for the communication between the meter and the server.

Fig. 3. Siemens Sentron PAC3200 implementation

Since we do not need all the values the PAC3200 measures, we will make a list containing the commands for the values we are interested in. This list will be used in LabVIEW TM to acquire the needed meter readings. Another thing Wireshark tells us is that TCP port 502 is used to communicate with the meter. In LabVIEW TM we created a TCP connection and used a for-loop with the listed commands to extract the readings.
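To illustrate what such a readout looks like at the protocol level, the following minimal Java sketch builds a Modbus TCP "read holding registers" request and sends it to port 502. It is only a generic Modbus TCP example under stated assumptions: the meter IP address, register address and register count are placeholders, since the actual register list used for the PAC3200 is not reproduced here, and the sketch is not the LabVIEW implementation described in the text.

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.net.Socket;
    import java.nio.ByteBuffer;

    public class ModbusTcpReader {
        public static void main(String[] args) throws Exception {
            try (Socket socket = new Socket("192.168.0.10", 502)) {  // meter IP is an example
                DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                DataInputStream in = new DataInputStream(socket.getInputStream());

                // MBAP header + PDU for function 0x03 (read holding registers)
                ByteBuffer request = ByteBuffer.allocate(12);
                request.putShort((short) 1);      // transaction identifier
                request.putShort((short) 0);      // protocol identifier (0 = Modbus)
                request.putShort((short) 6);      // remaining length in bytes
                request.put((byte) 1);            // unit identifier
                request.put((byte) 0x03);         // function code: read holding registers
                request.putShort((short) 0x0001); // start register (placeholder)
                request.putShort((short) 2);      // number of registers to read (placeholder)
                out.write(request.array());
                out.flush();

                byte[] response = new byte[256];
                int read = in.read(response);
                System.out.println("Received " + read + " bytes from the meter");
            }
        }
    }

In the LabVIEW application, a request of this kind is issued for every entry in the command list inside the for-loop mentioned above.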
Once we have a proper connection and the readings come through, we start with the synchronization. We will write all the received values to an Excel spreadsheet, including a time stamp. In LabVIEW TM we control the synchronization interval, which is five seconds, using a timed loop. Later on we will use this loop to synchronize this meter with the meters described below. To have decent feedback, we will add an error notification: whenever the meter is not communicating, we will log this to a text file, using the meter's name and a time stamp, and add a notification next to the meter's name in the spreadsheet.

B. Power meter with RS-485 (Socomec Diris A10)

Another meter we will use is the Socomec Diris A10. The Diris A10 is a multifunctional meter for measuring electrical values in low-voltage networks in modular format [2]. The Diris A10 has a built-in RS-485 interface, which we will use to connect to a Programmable Logic Controller (PLC). In total, we will connect three meters to the PLC, all using the same serial line. The PLC will constantly poll the meters and store the information temporarily. This information will be available for LabVIEW TM to extract from the PLC and will be logged for later use. Using the Control Vision software and a serial port sniffer, we were able to see what commands are used by the software to receive the readings from the meter. In contrast with the PAC3200, only one command is used to extract all readings, meaning we will have to split the different values in LabVIEW TM.

We will compare two makes and types of PLCs: a software PLC (Beckhoff TwinCAT PLC) and a hardware PLC (Siemens Simatic ET200S).

1) Using the software PLC: Here we are using a software PLC from Beckhoff. The TwinCAT Manager connects over Ethernet to a BK9050 Ethernet TCP/IP Bus Coupler, which in turn is connected to a KL6041 Serial Interface RS422/RS485 Bus Terminal. Because none of the modules has a CPU on board, the TwinCAT PLC software controls how they operate.

Fig. 4. Beckhoff PLC implementation

Because we are polling three meters over the same serial line, the PLC will poll the meters one by one, causing a delay of ten milliseconds between the readings of the meters. The first thing we need to do in the PLC program is send an initialization command to the meters. This happens only once, in the startup phase of the program. Once initialized, we start the main part of the program and poll the meters for the data they hold. The meters will each return one string with all their data. This string is made accessible to LabVIEW TM, and LabVIEW TM will handle the processing of the received data. The information extracted from the meters is placed in Merker Bytes in the PLC. In this way, LabVIEW TM can access the information through TwinCAT ADS/OCX.

2) Using the hardware PLC: A hardware PLC from Siemens is used to read the connected meters. We will use the Ethernet connection of the PLC to send the data to the server. In this setting, we are using two meters. Both meters are again connected to the same serial line, which in turn connects to the PLC. We program the PLC using the Siemens Step 7 software. The PLC will poll the meters one by one. This time we are using the built-in blocks for Modbus communication. The line speed is set to 9.6 kbps, causing a typical delay of six milliseconds between readings. The data received from the meters is stored in two buffers (one for each meter) in the PLC memory. To allow LabVIEW TM to receive the meter outputs, we add TCP communication to the PLC. We define two commands (one for each buffer) to which the PLC will respond. The server can send those commands when needed and receive the meter readings through the Ethernet network.

Fig. 5. Siemens PLC implementation

IV. SYNCHRONIZATION

In order to process all measurements in a correct manner, they have to be taken at the same time. For a meter which is directly connected to the Ethernet, like the Siemens Sentron PAC3200, this is not an issue: every meter has its own direct connection to the server and every meter can be requested at the same time. For a power meter behind a PLC this synchronization can be a problem. Because they use a shared serial connection, all the meters must be queried sequentially, which means that one meter must wait for the other to send its data. The readout of the meters by the PLC has to be done as soon as possible in order to have a fresh measurement available when the software requests one. The reading of the buffers from the PLC happens through a 100 Mbps network, which causes practically no delay. To give an indication of the delay between the measurements, we have timed it. Sending and receiving a single command, together with the reply, takes an average time of 320 milliseconds. The whole cycle for reading 3 meters takes about 1250 milliseconds. We notice that there is a certain delay in the readout of the measured values. If we really want to use time-critical measurements, this can cause a problem.
We can fix this by connecting fewer meters to one shared line, or by using meters directly connected to Ethernet like the PAC3200. To have a decent synchronization in LabVIEW TM, we use timed loops. All timed loops are connected to each other, which makes them run synchronously. The period of the timed loops is set by a control and is chosen by the user.

V. RESULTS

A. Plugwise (Zigbee)

Plugwise is less suitable for use as an EMS. However, it may serve as an advanced time switch to turn certain appliances on or off at a given time. The product is rather expensive, 41 EUR per plug, given the possibilities the plug offers. However, Plugwise is suitable for energy mapping using the internal memory. The accuracy of the module is good, the measurements are within the specified error margin of 5% and the consumption of the plug is not above 1 watt. Moreover, the modules are compact and easy to install.

The communication method is a weak point of the Plugwise system, while this is one of the most important specifications of a good EMS. The receiving range is quite low. This limits the possible configurations in buildings, because all the plugs need to be in the receiving range of the ZigBee network. The use of a USB stick as connection to the server is a disadvantage, especially because the position of the stick in the network is a key factor for the speed of communication. The fact that all the plugs have to be read sequentially is also a problem. It occurs that a plug which is a number of hops away from the stick takes a long time to send its data to the server. In the meanwhile, the connection is busy and we cannot receive any measurements from the other plugs, nor can we turn an appliance on or off while waiting for the data from a plug. The company itself does not support the integration of the product in a central EMS. This translates into not publishing the protocol to the general public. Plugwise does have a wide product range that is still being expanded; this is an opportunity to evolve into a system for controlling devices from a central location without the need for extra wiring. However, without changing the communication method, it is not suitable for use as an EMS. Maybe later, if Zigbee integrates Powerline communication into its protocol, it could be an opportunity to develop a plug that does not only support wireless communication but also makes use of Powerline communication methods. The SWOT analysis of Plugwise can be found in TABLE I.

TABLE I. SWOT analysis of Plugwise
Strengths: compact; accurate; large product range; energy consumption; easy installation
Weaknesses: range; speed; price; possible configurations; sequential readout; stick as only connection
Opportunities: advanced time switch; Zigbee + HomePlug; expanding the product range
Threats: not suitable for EMS; protocol is shielded

B. Module X (Powerline)

ModuleX is perfectly usable as an EMS; the module that we have examined is only a prototype with some hardware errors. ModuleX provides fast communication by using the existing power grid. The module measures the consumption accurately and has the ability to switch appliances quickly. The module shows the current power consumption in milliampere instead of watts, which is normally used in an EMS. The advantage of this module is that each node has its own IP address. Therefore, we can address each module separately and do not need to wait for other modules. We must make sure that we have enough IP addresses available. In the future, IPv6 could be implemented so that a very large range of addresses can be used. Also, the plug currently uses a fixed IP address for the nodes; at a later stage, it could use a DHCP server. The module itself consists of a measuring PCB and a communication module. The measuring PCB can also be connected to a standard Ethernet network, and it is interesting to keep it that way. It would also be interesting if a sensor could be connected to the module. There are still some drawbacks to the prototype. These include the size of the devices (13 cm x 7 cm x 4.5 cm), which can be disruptive if there are multiple outlets close to each other. Besides that, the prototype consumes 5 watt per node. For example, if we have ten appliances, all the units together consume fifty watts. This is a lot for a system that seeks to reduce energy consumption.
When testing the prototype we also noticed an annoying beeping noise from the node. This is not disturbing in a kitchen, for example, but it is in a bedroom. These disadvantages will probably be solved when the communication module and the measuring unit become one. The consumption can also be reduced by switching to HomePlug GreenPhy instead of HomePlug Turbo. The SWOT analysis of ModuleX can be found in TABLE II.

TABLE II. SWOT analysis of ModuleX
Strengths: fast communication; use of existing electrical wiring; accurate; individual IP address; minimal configuration required; easy installation; suitable as EMS; fast switching; simple protocol
Weaknesses: dimensions; energy consumption; bugs; annoying noise
Opportunities: HomePlug GreenPhy; both Powerline and Ethernet; protocol to the public; price; reading sensors; integration into existing hardware; current consumption in power (W); IPv6 integration; DHCP integration
Threats: sticking to HomePlug Turbo; unresolved bugs; shielding the protocol; price; interference from other devices; single IP address

C. Siemens Sentron PAC3200 (Ethernet)

With the Siemens Sentron PAC3200 we use a completely different setup. This meter is directly connected to the data network and has its own IP address. When we have several meters of this type, the size of the address book of the meters rises and, in the worst case, the addressing of the existing network is inadequate.

The configuration of the meter is simple and the meter itself remembers its settings. In case of a server failure, we only need the measurement software on another machine to continue reading the values. Processing the data requires less programming effort: the received data does not have to be split, since we only retrieve the valuable values. The Siemens Sentron PAC3200 is a decent EMS. It has a digital input and a digital output, so we can use an external sensor or control a device. The SWOT analysis of the Siemens Sentron PAC3200 can be found in TABLE III.

TABLE III. SWOT analysis of the Siemens Sentron PAC3200
Strengths: energy and power readout; extensive measurements; central programming; simple readout; simple configuration
Weaknesses: separate IP address; separate readout of measured values
Opportunities: one digital input; one digital output; failover
Threats: integration into the power grid; installation by professional; IP address shortage

D. Socomec Diris A10 on Beckhoff PLC

When we look at the advantages and disadvantages of this setup, we see that this system is worth using as an EMS. The Socomec Diris A10 is a versatile meter which can be read easily. We get a comprehensive measurement of the power grid and the load. Because we use a software PLC, the programming of the system is completely centralized on the EMS server. This implies that a failover is difficult to build, because the backup server must also have the PLC software and we need to reinitialize the meters when the server fails. The PLC functions as a central repository for the measurements. The benefit of this is that we only need one IP address: when we add some extra meters at a later stage, we do not need to provide extra IP addresses. We could add the meters to the same serial line or we could add a new serial interface module on the Beckhoff PLC. When we use the same serial line, we must carefully consider whether the delays do not increase too much. A Beckhoff PLC system also offers the opportunity to add modules that allow us to read sensors and switch devices. Thanks to the central software programming, these modules could easily be integrated on the same EMS server to execute the necessary algorithms. As we look at the programming of the software, we see some extra work: the output of the meter consists of all the measurements at once. This implies that we need to use an algorithm to separate all the useful values. We also need to consider that we need a larger buffer in the PLC because of this. The SWOT analysis of the Socomec Diris A10 with a Beckhoff PLC can be found in TABLE IV.

TABLE IV. SWOT analysis of the Socomec Diris A10 with a Beckhoff PLC
Strengths: energy and power readout; extensive measurements; simple modules; central programming; simple readout; one IP address
Weaknesses: delays; all values at once; large buffer needed
Opportunities: expansion with other modules; easy to install additional meters; additional functions
Threats: integration into the power grid; installation by professional; too many meters on one serial line; failover

E. Socomec Diris A10 on Siemens PLC

Because we are using the same meter with a similar system, many aspects of the previous section return. The biggest difference is that we are dealing with a hardware PLC. It has its own memory and processor, so the PLC program is executed independently from the server.
This results in a more consistent failover if the server fails. We just need to activate the software for retrieving the data from the PLC on another machine. The survey of the meters will just go on and experience no problems from the server failure. The SWOTanalysis of the Socomec Diris A10 with a Siemens PLC can be found at table V. TABLE V SWOT-ANALYSIS OF THE SOCOMEC DIRIS A10 WITH A SIEMENS PLC Strengths Weaknesses - Energy and power readout - Delays - extensive measurements - All values in one time - Simple modules - Large buffer needed - Simple readout - No central programming - One IP address Opportunities Threats - Expansion with other modules - Integration into the Power Grid - Easy to install additional meters - Installation by professional - Additional functions - To many meters on one serial line - Failover F. Output As a result of the measurements from the EMS, we get an energy consumption profile as in figure 6. Such profile will help us in the research for saving strategies. 85

92 Fig. 6. Energy consumption profile VI. FURTHER WORK The next step is that we need to use these systems as efficient as possible by using energy management strategies. In the first phase there will be an ad hoc development where a separate strategy for each device is examined. After that all these strategies need to be combined in a centralized system to manage a supply-demand situation instead of the classic demand-supply situation. VII. CONCLUSION There is still a lot of work to do but we have made a great step forward in the research for environmentally friendly ways to use energy. With the help of this paper, we now know which considerations need to be taken if we wish to implement an EMS. We know how to use a few systems with different kind of communication methods and their strengths, weaknesses, opportunities en threats. These systems are a model of the current market situation on EMS. For existing buildings we have noticed that Modulex has a great potential for becoming a useful EMS. There is only one Condition for ModuleX to succeed and that is that the errors in the hardware need to be corrected. For new homes, it could be interesting to consider a more centralized approach. It is possible to use a energy meter like the Socomec DIRIS A10 combined with PLC applications such as Beckhoff and Siemens. For specific applications where the centralized approach is not possible and no use can be made of a Plug and Play system, then a all in one module such as the Siemens Sentron PAC3200 could be considered. REFERENCES [1] Zigbee Alliance, Zigbee specification: Zigbee document r13, Version 1.1. Web site: 1 Dec [2] B. Vande Meersche, Meer HEB door DSM Request Tetra project, [3] Plugwise website, [online]web site: [4] M. Damen and P. W. Daly, Plugwise unleashed, 1st ed. Web site: [5] SmartGrids: European Technology Platform, [online]web site: [6] Ian Moffat, Ecological footprints and sustainable development Online at - Ecolog Footprint and Sustain Dev.pdf. [7] ZigBee Alliance, [online]web site: [8] Renesas: Efforts to Implement Smart Grids, [online]web site: society/smart grid.html [9] G. Stromberg, T.F. Sturm, Y. Gsottberger and X. Shi, Low-Cost Wireless Control-Networks in Smart Environments, [10] Plugwise B.V., How to set up your Plugwise Network, [11] Ember Corporation, EM250 datasheet, [12] Future Technology Devices International, FT232R USB UART IC datasheet, [13] Analog Devices, ADE7753 datasheet, [14] Silicon Labs, C8051F340 datasheet, [15] Analog Devices, Evaluation Board Documentation ADE7753 Energy Metering IC, [16] HomePlug Powerline Alliance, Inc., HomePlug 1.0 Technology White Paper, [17] Steven Mertens, Modbus Industrieel protocol over RS232, [18] Socomec, RS485 Bus, [19] Simon Segers, Code generator voor PLC en SCADA, [20] Siemens, ET 200S distributed I/O IM151-3 PN interface module, [21] Siemens, ET 200S distributed I/O IM PN/DP CPU interface module, [22] Siemens, S7-300 Instruction List, [23] Siemens, SIMATIC ET 200 For distributed automation solutions, [24] Siemens, Distributed I/O System ET 200S, [25] Siemens, Power Monitoring Device SENTRON PAC3200, [26] Socomec, JBUS Common Table version : 1.01,

93 Execution time measurement of a mathematic algorithm on different implementations F. Salaerts 1, B. Bonroy 1,2, P. Karsmakers 1,3 1 IBW, K.H. Kempen [Association KULeuven], Kleinhoefstraat 4, 2440 Geel, Belgium 2 MOBILAB, K.H. Kempen, Kleinhoefstraat 4, 2440 Geel, Belgium 3 ESAT-SCD/SISTA, KULeuven, B-3001 Heverlee, Belgium Abstract As the resolution of camera s and the amount of recorded data increases, real-time processing of these signals becomes a challenging job. One way to tackle this problem is to decrease the processing time of computational intensive tasks. Singular Value Decomposition (SVD) is a computational intensive algorithm that is used in many application domains of video and signal processing. In this paper, we show how an SVD can be implemented to decrease the processing time. First, we describes MATLAB, OpenCV, CUDA and OpenCL SVD implementations which targets an Central Processing Unit (CPU) and Graphics Processing Unit (GPU) of the computer system. Next, the different implementations are executed on their target processor and processing times are measured. Measurements show that implementations targeting an GPU with an input matrix of size 3000x300 has a performance gain of a factor 93 compared to our MATLAB reference implementation and a factor 11 compared with the OpenCV implementation. Furthermore the comparison between CUDA C code and OpenCL C code from the same SVD algorithm shows that CUDA performs the best in all tested matrix sizes. We conclude that when there is enough data to process by SVD, the GPU is an appropriate solution. Otherwise the CPU remains a good candidate, especially with small matrix sizes. Index Terms CUDA, OpenCL, OpenCV, Singular Value Decomposition, SVD O I. INTRODUCTION N several data intensive applications, it is not opportune to wait long on computation results. Many results become useful if they are computed in real-time. An often used computational intensive algorithm is the Singular Value Decomposition (SVD). SVD is used in video processing for compression [11] and digital image processing [12], in signal processing for filtering [13] and noise removal [14], and in machine learning techniques for data clustering [15]. A drawback of SVD is that when datasets grows, more computational power is needed. There are several initiatives to reduce the required computational power and to make these tasks running faster: make existing algorithms more efficient; using a dedicated processor: o Field Programmable Gate Array (FPGA); o floating point co-processor; using the Graphics Processing Unit (GPU) of the graphic card. A. Making an algorithm more efficient A first possibility to make an SVD more efficient is by making use of cross matrices to compute the SVD. A property of cross matrices is that bigger singular values are more accurate than smaller values. The Rayleigh coefficient [10] can overcome this problem. To get accurate small singular values, the smaller and bigger singular values must be separated clearly. This implies an increase of memory usage. The reason is that the processing matrix A and A T A must be stored in system memory to limit the amount of elements in A T A. A second example is the one-sided block Jacobi method [21]. This method use the caches of the CPU and system memory very well. An improvement can be achieved to use a fast, scaled block rotation technique by a cosine-sinus decomposition [7]. Hereby the FLoating Point Operations per Second (FLOPS) count from one step of the method can be reduced to at least 40%. 
This result is obtained by calculating the right singular vectors with the standard block Jacobi algorithm. B. Using a dedicated processor A second initiative focuses on a co-processor and FPGA. Typically FPGAs has the advantage that the programmer describes hardware instead of software which increase the performance. The program connects logical gates in the processor so the FPGA is programmed to execute only the programmed task. Weiwei Ma et al [8] makes use of this advantage by means of a simple structure of 2x2 processors to calculate the SVD of a NxN matrix. A major disadvantage of an FPGA is the limit amount of fast accessible internal memory, which limits the sizes of usable matrices. Yusaku Yamamoto et al.[6] uses a ClearSpeed CSX600 floating 87

94 point co-processor [23] to compute the SVD. It processes big matrices a lot faster than the Intel Math Kernel Library [24]. A disadvantage is that no performance improvement can be reached for small matrices. The cause is that the processed matrix has not enough rows. Another reason is that the ratio between the amount of rows and columns are too small. C. Using the GPU of the graphic card As last initiative to improve SVD processing time is based on the Graphic Processing Unit (GPU). Zhang Shu et al. [9] implemented an one-sided block Jacobi method which computes the SVD using the Computing Unified Device Architecture (CUDA) library [17]. This implementation has some shortcomings with shared memory of the GPU which limits the size of the processing matrices. An implementation solution to overcome this problem is proposed by Sheetal Lahabar [4], where the SVD algorithm exists of 2 steps: the bidiagonalization and diagonalization. The bidiagonalization is executed entirely on the GPU, while the diagonalization is executed on both the GPU and CPU. Moreover a hybrid computing system allows each component to be executed his part of the algorithm on the best performing processor. This study focuses the performance measurements based of the above described implementations and libraries. The expectations are that implementations which makes use of the GPU, performing better because of parallel processing. This paper is structured as follows. Chapter 2 describes first the SVD algorithm in common. Followed by a description of the libraries and implementations used for the performance measurement. Finally this chapter describes the test and development environment. Chapter 3 and 4 displays and discusses the results. Chapter 5 concludes this paper. II. MATERIALS AND METHODS A. Singular Value Decomposition Singular Value Decomposition or SVD is a mathematic algorithm used in many application domains such as: video processing, signal processing and machine learning techniques. SVD tells how a vector, which is multiplied by a matrix, has changed compared with the initial vector. How this vector is changed can be determined by calculating the SVD as shown in formula 1: A = USV T (1) where A is a matrix of size MxN, U is a matrix where the columns are orthonormal eigenvectors of AA T, V is a matrix where the columns are orthonormal eigenvectors of A T A and S is a diagonal matrix filled with singular values in descending order [5]. For example: Figure 1 shows the mapping of a circle to an ellipse. The V T -matrix rotates the initial vectors to the coordinate axes, S scales the vectors to a smaller or bigger size and U rotates the vectors in opposite way [22]. So the eigenvectors describes the rotations of the ellipse. The scale is determined by the singular values in the matrix S. The bigger the singular value, the more influence it has on the size of the resulting vector [16]. Figure 1 shows also the major axis and minor axis of the ellipse. These are the largest and smallest eigenvalues. The eigenvalues are the square of the singular values and are related to the eigenvectors. Fig. 1: mapping a circle to an ellipse by SVD [22] There is a lot of computational power needed for large datasets because SVD creates multiple dimensions of the result matrices as follows: A (MxN) = U (MxM) S(MxN) V (NxN) [4]. In practice, many applications that calculates an SVD uses a reduction in dimensions of the matrices. 
By taking only k biggest singular values with corresponding reduced matrices U and V result in A, an approximation of the original matrix A. This is a reduced SVD of A or rank k of A [16][22]. The advantage of this method is that it saves memory and computational power. B. Implementations MATLAB MATLAB as product of the MathWorks company is a high level programming environment that is suitable to implement and execute several mathematical and scientific tasks on an easy way. These tasks can be like: signal processing, statistics, drawing of mathematical functions and matrices calculating. It is also possible to write new functions in C++ and use them in the MATLAB environment. These functions gets connected within MATLAB using the Matlab EXecutable or MEX interface. This way new functions can easily be called in the Integrated Development Environment (IDE) [19]. In this paper, MATLAB is tested on 2 ways. First we will use the build-in SVD function. Secondly we will write an 88

95 C++ function which will be connected to our MATLAB model with MEX. This function allows the calculations of the SVD to be performed on the GPU using an external library which call CUDA. OpenCV OpenCV or OpenSource Computer Vision is a special build library for real-time image processing and computer vision. The library has more than 500 CPU-optimized functions. A few examples of applications are: facial recognition, motion tracking and mobile robotics [2]. It s originally developed by Intel but afterwards adopted by Willow garage. The latter exploits it as an opensource Berkeley Software Distribution (BSD) license. It is cross platform and available for Microsoft Windows, Apple Mac OS X and GNU/Linux [19]. Just as in MATLAB, the performance comparison uses the build-in SVD function of OpenCV. CUDA CUDA or Computing Unified Device Architecture is a toolkit designed by Nvidia [17]. It allows the programmer to communicate with the GPU as it was a general purpose processor. This technology exists since Nvidia Geforce 8800 came on the market [4]. This way the programmer does not need to have an thorough knowledge of the internal parts of the GPU to process data in parallel. CUDA supports several languages/interfaces [17]: - CUDA C; - DirectCompute; - OpenCL; - CUDA Fortran. CUDA can be called on two ways. First it can be called direct trough a CUDA interface or secondly through an external library. In this paper both ways are implemented. First, the SVD algorithm is programmed with CUDA C language using the CUDA interface and secondly with an external library called CULATools. CULATools is an GPU accelerated linear algebra library created by EM Photonics [20]. This library uses CUDA to communicate with the GPU. Unlike the direct interfaces, CULATools automatically regulates all data traffic to and from the GPU which implies the programmer does not need to take into account optimizations for his GPU code. CULATools is used in this paper with a MATLAB and a C++ implementation. OpenCL Open Computing Language or OpenCL is a toolkit that is designed to execute parallel calculations efficiently on various microprocessors. This requires, nevertheless, a OpenCL compiler must be available for that microprocessor. OpenCL is designed by Apple and presented to the Khronos Group to get standardized. The strength is that once code is written, it can run on several platforms without changing the code [1][3]. So it is not limited to only GPU s like CUDA which only works on Nvidia graphic cards. Several companies e.g. AMD, Apple, IBM, Intel, Motorola, Nokia, Nvidia and Samsung support the development of OpenCL. To able to compare the different implementations in this paper, the code is written in standard OpenCL C language. Because there is no SVD function available in this language at the moment, the author ported existing code from Zhang Shu et al [9]. C. Development and Test environment The test platform hardware consists of an: AMD processor Athlon X2 2GB of system memory, Western Digital Raptor 10K RPM 150GB hard drive and Geforce GT240 graphic card with 1GB video memory. The software part exists of a Windows XP Professional operating system with Nvidia driver version , OpenCV 2.1, CUDA toolkit 3.1, CULATools 2.1 and OpenCL 1.0. The development environment is MathWorks MATLAB R2010a and Microsoft Visual C Professional. The data test set are matrices with a size starting from 100x100 and increments with a size of 100x100 and ends up by matrices of 3000x3000. 
The matrices are filled with random floating-point values from 0,0 until 1,0. The start time is measured before the execution of the SVD algorithm and the end time is measured after the execution of the SVD algorithm, no pre or post processing is taken into account. Because system resources are running on the background, each implementation is tested 10 times. Afterwards the average of the results will represents the execution time. The results are also influenced by the optimization flags of the compiler and the way the results gets returned on the screen. Some optimization flags are used to limit the binary size of the code, where others are used to speed up the code. The results on the screen can also differ. Some implementations shows the singular values by means of a vector variable, while others shows the singular values by means of a matrix. Because of the differences in showing results, the memory usage and the computational power slightly differs. The OpenCL test is compared with the original CUDA code from the implementation proposed by Zhang Shu et al [9]. Since the matrices for those implementations are different from the rest, they are tested separately.. III. RESULTS Figure 2 shows the graph of the four different implementations. The Y-axis is presented as a logarithmic scale: log(1+x). Figure 3 shows the performance differences between CUDA C code and OpenCL C code. 89

96 As shown in both figures, it is clear that GPU implementations performs better than the CPU implementations. Figure 2 shows that MATLAB performs worst at huge matrix size. Until the matrix size of 400x400, MATLAB implementation is faster than the GPU implementations. When more data needs to be processed e.g. at matrices of 3000x3000, MATLAB needs 2600 seconds to execute the SVD algorithm where the GPU implementations can do the job in less than 28 seconds. That is a factor 93 faster. OpenCV performs better in this case. With a matrix of 3000x3000, OpenCV returns a result in 320 seconds. In comparison with MATLAB, this is a factor 8 faster. Compared to the GPU implementations, the OpenCV implementation is a factor 11 slower. Matlab with CUDA and C++ with CUDA calculates the SVD algorithm using the GPU. At figure 2, there is almost no performance difference between MATLAB and C++. Both returns a common result of 28 seconds. Figure 3 shows that e.g. matrices with size 1760x1760, CUDA C has an execution time of 6.2 seconds while OpenCL return a result in 10.4 seconds. OpenCL is a factor 1.67 slower than CUDA with the same algorithm and dataset. IV. DISCUSSION Figure 2 shows that MATLAB performs worst at huge matrixes size. In the beginning, MATLAB implementation is faster than the GPU implementations. This is because data transfers between system memory and video memory of the graphic card flows through the PCI-express bus. This bus has a maximum bandwidth of 4GB/s, however the CPU communicates with system memory at a bandwidth of 12,8 GB/s which introduce a bottleneck. When more data needs to be processed e.g. at matrices of 3000x3000, this bottleneck gets compensated by faster parallel processing time by the GPU. OpenCV performs better in this case. Since MATLAB and OpenCV are using the CPU, OpenCV is apparently better optimized in his build-in SVD function than MATLAB. Matlab with CUDA and C++ with CUDA calculates the SVD algorithm using the GPU. Both gets a common result of 28 seconds. It is clear that the GPU can use the advantage of parallel data processing. The lines on the graph in figure 2 are less steep than the CPU implementations. So the differences becomes more visible with bigger matrices [4]. Figure 3 shows that CUDA performs better than OpenCL with the same algorithm. This was expected because CUDA, created by Nvidia, is perfect optimized for the graphic card used for these tests. An advantage of OpenCL code is that it should run on various platforms nevertheless it also implies that it is not optimized for a specific platform/microprocessor. As future work a fully functional SVD algorithm can be implemented in OpenCL that works on any size of matrix as dataset. It is interesting to test such mathematical algorithm on different platforms. For example a platform with a PowerPC processor like the older Apple computer systems. Another example is the PlayStation 3 from Sony with his Cell processor. Fig. 2: comparison between MATLAB, OpenCV, and CUDA implementations Fig. 3: comparison of processing time between CUDA C and OpenCL C SVD implementations V. CONCLUSION SVD or Singular Value Decomposition requires a lot of computational power, especially for huge data sets. In realtime applications, it is inappropriate to wait long for results. In this study, we comparing an often used mathematical algorithm implemented using different, existing libraries on CPU and GPU. The conclusion of this study is when a SVD algorithm is processed on a GPU, the input dataset must be large enough. 
This is because the bottleneck by means of the PCI-express bus must be compensated by large data input. When there is not enough data to process on the GPU, the CPU remains a good solution. In comparison with [10] where the smaller and bigger singular values must be separated to get accurate singular values, the tested algorithms in this paper still works accurate on the calculated singular values. Other algorithms such as 90

97 [8] works only well on small matrices because of the limited internal memory. Like Yusaku Yamamoto et al. [6], the CPU implementations performs well with smaller matrices. If the matrices are small enough, then the performance is even better than the GPU implementations. The GPU algorithm of [9] has an shared memory issue on his graphic card. The tested GPU implementations that uses CULATools does not has any issues with a memory block on the graphic card. This is because the CULATools library is well tested by various people en companies on his correctness. Finally this paper offers test results of many used implementations/libraries unlike [4] who is using only MATLAB and Intel Math Kernel Library. This paper gives also an impression of the difference in performance between CUDA C and OpenCL C language. [18] OpenCV. In Accessed on September [19] The MathWorks TM. In Accesssed on September [20] CULA programmer's guide 2.1, EM Photonics 2010; Accessed on December [21] Cuenca, J., Gimenez G. Implementation of parallel one-sided block Jacobimethods for the symmetric eigenvalue problem. Parallel computing: fundamentals and applications (D Hollander,Joubert, Peters, Sips, eds.). Proc. Int. Conf. ParCo 99, August 17 20, 1999, Delft, The Netherlands,Imperial College Press 2000, pp [22] Muller, Neil; Magaia, Lourenco; Herbst, B.M. Singular value decomposition, eigenfaces, and 3D reconstructions. Society for Industrial and Applied Mathematic 2004; 46(3): [23] ClearSpeed. In Accessed on December [24] Intel Math Kernel Library. In Accessed on December REFERENCES [1] John E. Stone, David Gohara, Guochun Shi. OpenCL: a parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering 2010; 12(3): [2] Bradski G, Kaehler A. Learning OpenCV Computer Vision with the OpenCV library, First edition. Sebastopol: O'Reilly, 2008, pp [3] Ryoji Tsuchiyama, Takashi Nakamura, Takuro Iizuka, Akihiro Asahara, Satoshi Miki. The OpenCL programming book: parallel programming for multi-core CPU and GPU, Printed edition. Fixstars Corporation,2010. [4] Sheetal Lahabar, P J Narayanan. Singular value decomposition on GPU using CUDA. International Symposium on Parallel & Distributed Processing - IPDPS 2009 (2009); [5] Virginia C. Klema, Alan J. Laub. The singular value decomposition: it's computation and some applications. IEEE Transactions On Automatic Control 1980; 25(2): [6] Yusaku Yamamoto, Takeshi Fukaya, Takashi Uneyama, Masami Takata, Kinji Kimura, Masashi Iwasaki, Yoshimasa Nakamura. Accelerating the singular value decomposition of rectangular matrices with the CSX600 and the integrable SVD. Lecture Notes in Computer Science 2007; 4671(2007): [7] V. Hari. Accelerating the SVD block-jacobi method. Computing 2005; 75(1): [8] Weiwei Ma, M. E. Kaye, D. M. Luke, R. Doraiswami. An FPGA-based singular value decomposition processor. Conference on Electrical and Computer Engineering 2006: [9] Zhang Shu, Dou Heng. Matrix singular value decomposition based on computing unified device architecture. [10] Zhongxiao Jia. Using cross-product matrices to compute the SVD. NUMERICAL ALGORITHMS 2007; 42(1): [11] Prasantha H.S, Shashidhara H.L, Balasubramanya K.N. Image compression using SVD. Conference on Computational Intelligence and Multimedia Applications 2007; 3: [12] Andrews H, Patterson C. Singular value decompositions and digital image processing. IEEE Transactions on Acoustics, Speech and Signal Processing 1976; 24(1): [13] Wei-Ping Zhu, Ahmad, M.O, Swamy, M.N.S. 
Realization of 2-D linearphase FIR filters by using the singular-value decomposition. IEEE Transactions on Signal Processing 1999; 47(5): [14] Maj J.-B, Royackers L, Moonen M, Wouters J. SVD-based optimal filtering for noise reduction in dual microphone hearing aids: a real time implementation and perceptual evaluation. IEEE Transactions on Biomedical Engineering 2005; 52(9): [15] Tsau Young Lin,Tam Ngo. Clustering High Dimensional Data Using SVM. Lecture Notes in Computer Science 2007; 2007(4482): [16] SVD and LSI Tutorial: Understanding SVD and LSI. understanding.html. Accessed on September [17] NVIDIA CUDA Programming Guide,Nvidia 2007; _Programming_Guide_1.1.pdf. 91

98 92

99 Normalization and analysis of dynamic plantar pressure data B. Schotanus, T.Croonenborghs and E. De Raeve K.H. Kempen (Associatie KULeuven), Kleinhoefstraat 4, B-2440 Geel, Belgium Abstract The analysis of plantar pressure data is an important part of the diagnostic tools an orthopedic has to his disposal. However for some of the extracted values there is no acceptable way of comparing between measurements. One of the characteristics we would like to be able to compare is the Centre of Pressure line. Another is the pressure distribution during each moment of the foot roll off. By aligning and synchronizing the data from our measurements we will be able to directly compare both of these characteristics. W I. INTRODUCTION alking seems like a very trivial movement to most of us. But when we take a closer look at the foot, we will see that it is one of the most complex biomechanical structures of the human body. Our feet contain a quarter of the bones in our body. Because of this complexity there are a large number of potential problems that can occur. And as our feet are supporting our full bodyweight a small problem can lead to major consequences. To diagnose these problems, a number of techniques where developed to perform measurements on the foot while walking. One of these techniques is called plantar pressure measurement (or pedobarographic measurement). Because of the dynamic aspect of the data it is often analyzed as a time-series using a computer to visualize results. Plantar pressure imaging is an branch of medical imaging that is still in development. At the moment there is no universal way of comparing two plantar pressure images with each other, nor is there a universal technology for obtaining useful plantar pressure images. However, these images have been used for a long time to correct minor and major abnormality s in plantar pressure distribution, like flat feet or problems caused by foot injuries. These corrections have always been done by the insights of professional orthopedists. If we could provide part of the analysis automatically or semi automatically for these images we could speed up the process and provide a more objective approach. The information we are looking for are the characteristics of a person s gait and pressure distribution during walking. Using this information it is possible to create shoes that will compensate for deviations from what is considered a normal pressure distribution. To make a more comprehensive analysis it is extremely useful to be able to compare multiple images with each other to find differences and similarities. A first step to achieve this goal is to be able to compare two images. In later step multiple images could be compared with the same template image to achieve normalization for a bigger group of images. In this paper we will try to transform a foot pressure intensity image to optimally overlap another. By aligning two pressure images so that the width, height and rotation are equal it becomes possible to compare various statistics about the images. For example the COP-line or Centre of Pressure line can be directly compared with another person only if both images are correctly aligned. By using only affine transformations we preserve relative position information. To find these optimal transformations we will be using metrics described and tested in Todd Pataky s research [1],[2]. Then we will see if we can use this transformation to examine the differences of the dynamic data, adding the time dimension. 
While analyzing plantar pressure data the maximum intensity image is very often used, ignoring the dynamic data. In this paper we will not only try to align static images but use the found transformations to create overlays of synchronized dynamic pedobarographic data. All calculations where done on a standard laptop, more specifically a Sony Vaio VGN-NS21Z with a 2.4 GHz dual core processor and 4GB of RAM. 93

100 II. DATA SET A. Obtaining the data set Our data set was obtained from the exports of the commercial program Footscan which provides some precalculated statistics about the plantar pressure image. However these statistics are not very well documented so we decided to extract only the raw data from this program. The actual data was obtained using a one meter long RS Scan sensor array at a resolution of 64 x 128, each sensor measuring 5.08 x 7.62mm². An average foot was contained in a 37 x 21 matrix of which the foot populated an average 412 pixels. The sample frequency of the array is 200Hz which resulted in time series ranging from 40 to 160 samples in length. Our subjects were 18 people with a high, low or normal medial longitudinal arch. B. Limiting Data loss We will need to transform our images in order to optimally overlap them, and because of the very low resolutions involved it is important that we consider the data loss when we transform these images. The metric we will use to assess the amount of data lost is the squared error between the original image and an image that was transformed and then inverse transformed. The techniques we will compare are transformations using nearest neighbor, bilinear and bicubic interpolation. We will also examine the effects of up sampling. We can see clearly in Figure 2-1 that nearest neighbor interpolation is not suited for our needs. We can also see that cubic interpolation is too delicate for these very low resolution images while for most cases it behaves slightly better than bilinear interpolation it has a very high outlier in this set. The best result is obviously obtained by enlarged bilinear which is bilinear interpolation transformations on an up sampled image using the nearest neighbor algorithm for up sampling. After the transformations we down sampled the image again to compare with the original. A lower squared error means that we have retained much more pixel information when we enlarged the image before applying any transformations. We did not consider other algorithms for up sampling because they estimate new values for the new pixels and when we try to keep as close to the original as we can we don t want that. Figure II-1 Box plots of the error distribution using different interpolation algorithms III. NORMALIZATION OF THE IMAGES For the normalization we will use the maximum intensity image for the calculation of the needed transformations to achieve a good alignment between two pressure measurements. Earlier research shows that XOR and MSE are excellent metrics to achieve a good alignment between two images. XOR or more specifically the overlap error is calculated by dividing the number of non-overlapping pixels by the number of overlapping pixels. This error is calculated on a binary image, every pixel that contains a value is given a value of one. The mean squared error (or MSE) is calculated by raising the error between the images to the second power and dividing the sum of all these errors by the matrix dimensions. This approach gives extra penalty to large errors. The division by the matrix dimensions eliminates the possibility of drastically reducing one of the images dimensions to minimize the total error. They both require a decent amount of computation, and if the initial position is far away from the optimal position optimizing can take up to 25 seconds on my system. To achieve better results an initial guess is required. Using the much faster but also less accurate algorithm[2]: principal axis alignment. 
Because of great differences in the pressure distribution of our subjects we will use this algorithm on a binary image from our maximum intensity image using the threshold >0. We implemented this algorithm using Matlab s Image Processing Toolbox, the math required for calculating the 94

101 principal axes of an image is based on the central moments of area, more information can be found in[3]. Using this addition, our average processing time for optimization went down to 3-5 seconds using the standard optimization algorithm: fminsearch, provided by Matlab s Optimization Toolbox. Using principal axis alignment also makes the registration more robust as it can correct rotations as far as -89 and +89. When only using XOR or MSE these rotations could lead to upside down registration of the images and very slow convergence towards the minimum. To further increase robustness we flipped all right feet vertically so that all examined feet were presented as left feet. Also we added a piece of code that would find the center of pressure in the first 20% and the last 20% frames so we could check for reverse direction recorded data and horizontally flip the image when necessary as shown in Figure III-1. Figure IV-2 Comparison between COP-lines V. SYNCHRONIZATION Now that we have obtained the desired transformation to optimally overlap two feet we can apply it to the entire time series of the foot roll off giving us images from two different feet with the same size and orientation. To compensate for the difference in length in time and possibly a entire different way of walking we could try to synchronize these images in order to analyze the way they walk. We have tried three different ways of synchronizing these time series which we will discuss below. Figure III-1 crop and flip as necessary The optimization algorithms from the Optimization Toolbox could not always improve on our initial guess so we wrote our own algorithm that would operate with boundaries and step sizes that we could more easily manipulate. For each loop our algorithm calculated the MSE or overlap error for one step in six possible directions for horizontal and vertical scaling and rotation. The translation, which is the easiest to calculate was determined automatically for each step. Our algorithm changed the parameter that would give the best improvement until a minimum was found. The found parameters where then used to transform both images. IV. COMPARISON BETWEEN COP-LINES Because we now have two aligned plantar pressure images it becomes possible to directly compare their respective COPlines. Aligning the general shape of these feet also means that the anatomical regions are now aligned. Things like the starting angle of the COP-line or the deviations along the horizontal axis now have the same reference and can be compared between subjects. This was not possible between non-normalized images because these values are dependent on the shape and orientation of the individual foot. In Figure IV-1 we can see the result of such a comparison. The contour mask and A. Stretching to same length One approach could be to simply stretch out the shortest time series in a linear fashion to match the other one in length. This will preserve the relative lengths of each phase of the movement compared with the template. It is not possible to compare the weight distributions at a chosen foot stance because this method does not provide any real synchronization. This method is not very useful as we don t need visual feedback when examining the time differences between phases. B. Synchronizing using Footscan parameters Footscan detects four different phases during foot roll off. The beginning and ends of these phases are measured by timing the moments of initial and last contact of the various anatomical zones. 
Using these parameters to synchronize these phase transitions we now have five points in time which should represent the same foot position for our examined feet for those points in time. The data frames during each phase can be linearly spread along the duration of each phase. This approach gives us a visual representation of the differences in pressure distribution between two datasets for the full duration of the foot roll off. One problem we encountered is that the last phase typically lasts for about 40% of the total time. 95

102 Because there is not a single synchronization point during this time span there may be some difference in stance between frames that where matched in this way. In Figure V-1 you can see that during the last phase the synchronization can be less accurate because of the few known synchronized points. This can best be observed by looking at the progress of the COPlines, we can see that the blue line is pulling ahead of the red one. Figure V-1 COP-line synchronization As you can see in the figure above, the red and blue line which represent the COP-lines are moving together and we can now demonstrate the differences in pressure distribution at this point in time. VI. CONCLUSION Figure V-1 Synchronisation using Footscan parameters C. Synchronizing using COP-line progress The COP-line gives us a good indication of the foot stance during the foot roll off. By using this knowledge to synchronize between datasets, we now have a very large number of synchronization points. During the registration, our images where aligned to match each other s form. Because of this we can use the vertical coordinate of the COP-line to give an indication of foot roll off progress. As both images now have the same size, position and rotation, the vertical coordinates of the COP-line now should have an identical value for both feet in the same position. Using this approach to synchronization gives us a good view of pressure distribution differences as the upward movement of the centre of pressure progresses. Using an Optimization algorithm with a principal axis alignment as an initial guess proves to be an effective way of aligning plantar pressure measurements. As a result we now are able to compare Centre of Pressure (COP) lines directly with each other. This new possibility will be useful to quickly evaluate changes in this line for different kinds of shoes or insoles. Also with a synchronization based on the COP-line we can check for differences in pressure distribution during each moment of the foot roll off with both feet in the same position in time of the roll off. This can help orthopedists to get a better insight on the whole roll off process. REFERENCES [1] Pataky, T. C., & Goulermas, J. Y. (2008). Pedobarographic statistical parametric mapping (pspm): A pixel-level approach to foot pressure image analysis. Journal of Biomechanics, 41 (10), [2] Pataky, T. C., Goulermas, J. Y., & Cromptona, R. H. (2008). A comparison of seven methods of within-subjects rigid-body pedobarographic image registration. Journal of Biomechanics, 41 (14), [3] Prokop, R. J., & Reeves, A. P. (1992). A survey of moment-based techniques for unoccluded object representation and recognition. Graphical Models and Image Processing,

103 Evolution to a private cloud using Microsoft technologies C. Sels, F. Baert, G. Geeraerts Abstract An efficient IT environment is a key factor in today s business plan. Without an optimized IT environment which is easy to manage, businesses won t go far. This is common knowledge nowadays and businesses tend to focus more and more on optimizing their IT infrastructure to gain advantage to their competitors. Datacenters are created dynamically and the goal is to make use of resources as efficiently as possible such that the environment becomes easily manageable. This is a necessity for most modern infrastructures. Lately, the goal is more and more to create a private cloud, which needs this flexible infrastructure as an underlying basis. This private cloud can help reduce IT costs while increasing agility and flexibility for the company. By building a private cloud, the way IT delivers services and the way users access and consume these services changes. The private cloud provides a more cost-effective, agile way to provide IT services on-demand. The evolution to a private cloud can be achieved through numerous technologies. The focus here is how a company can evolve to a private cloud infrastructure using Microsoft technologies. Index Terms Private cloud, Infrastructure as a Service, Virtualization, Dynamic IT, System Center, Self-Service Portal C I. INTRODUCTION loud computing is a concept in IT where computing resources which are running in the cloud, can be delivered on-demand to people who request them. This is often referred to as IT as a service. Cloud computing is defined in The NIST Definition of Cloud Computing and exhibits following essential characteristics: on-demand self-service, broad network access, resource pooling, elasticity and measured service [1]. Of course, there are several types of clouds defined. A public cloud is a cloud infrastructure which is owned by an organization selling cloud services. It is made available to the general public. A private cloud is a cloud which is dedicated to an organization. The cloud infrastructure can be owned by the organization itself or a 3 rd party hosting company. In this paper, the focus is on achieving a private cloud infrastructure which is owned by the organization itself. With private cloud computing, on-demand self-service is introduced. This way, organizations hope to decrease the costs and the time needed to deliver infrastructure. The datacenter can respond more quickly to changes and the whole environment becomes even more dynamic. It is clear that automation is an important factor in a private cloud. The goal is to automate IT processes and minimize human involvement. A private cloud infrastructure builds upon the company s existing virtualized environment. Thus, it is not a separate product but rather a solution that builds upon existing infrastructure technologies and adds some important aspects. The result is a service-oriented environment which changes the way IT services are delivered. Notice that there are 3 different types of services which can be provided by a cloud [2], [3]: 1. Software as a Service (SaaS) 2. Platform as a Service (PaaS) 3. Infrastructure as a Service (IaaS) In most cases, a private cloud means provisioning Infrastructure as a Service (IaaS) to the users within the organization. With IaaS, datacenter resources such as hardware, storage, and network are offered as a service from within the cloud. These resources are placed in a pool and abstraction of the underlying fabric is made. 
It hides the technical complexity of compute, storage and network from the consumer. Instead, the consumer can select network and storage based on logical names. This way, infrastructure is delivered as a service. This infrastructure can be delivered with virtual machines in which the consumer has to maintain the OS and installed applications, while the underlying fabric is managed by the organization. As mentioned, one of the most important attributes of a private cloud infrastructure is user self-service. Users can obtain infrastructure from the cloud ondemand via self-service portals. This decreases the time required to provision infrastructure to users within the organization and decreases the costs as well. At the same time, capacity can be rapidly and elastically provisioned to the consumer.the overall agility of the datacenter increases. In order to move to a private cloud infrastructure, several steps have to be kept in mind. First, the overall infrastructurearchitecture required needs to be noted. This involves several key architecture layers. In order to implement these layers in the datacenter, several Microsoft technologies can be used. This paper analyses the private cloud layered architecture and provides for the milestones needed to implement these layers. This allows for an organization to build the basis for their own private cloud to provide Infrastructure as a Service using Microsoft technologies. 97

104 II. FROM VIRTUALIZATION TO PRIVATE CLOUD Every organization wants to evolve to an efficient IT infrastructure. For some companies, the ultimate goal is to create a private cloud, which needs this flexible infrastructure as an underlying basis [2]. Notice that the evolution to the cloud will introduce many manageability issues concerning IT operations. Without the proper application manageability, the transformation will most likely become non-manageable and produce a lot of costs. This is why several aspects of the infrastructure have to be taken in consideration when an organization wants to evolve to a private cloud infrastructure. Fig. 1. Logical evolution of the datacenter Nowadays, most companies are aware of the benefits that virtualization technologies have to offer. A lot of companies have moved to a virtualized datacenter, which means the utilization of the datacenter increases significantly, compared to the traditional datacenter. Server consolidation is another advantage with the virtualized datacenter. For a lot of organizations however, the evolution doesn t stop there. Virtualization is a critical layer in the infrastructure which has several advantages. It must be noted however, that other layers such as automation, management, orchestration and administration are of equal importance when a private cloud infrastructure is to be achieved. Even though an infrastructure is virtualized, its virtual machines or the applications within might still not be monitored; there might not be process automation, etc. This is the reason why other infrastructurearchitecture layers are needed [4]. These additional layers are shown in figure 2. very important when moving to a private cloud. They are used to enable Infrastructure as a Service, or IaaS. This was described earlier. When introducing these architecture layers in the datacenter, the term fabric management is often used. We can define the fabric of a datacenter as the storage, network and hardware in the infrastructure. In other words, the bottom layers in the private cloud layered architecture. A private cloud will use these additional architecture layers to enable abstraction of services from the underlying fabric. This way, the fabric can be delivered as a service from within the cloud. Thus, infrastructure can be delivered as a service to the organization. This provides a more cost-effective, agile way to provide IT services on demand and is one of the main attributes of a private cloud. The management layer plays an important role in a private cloud infrastructure. This layer consists of a set of components in which each component has its own management function. The management layer must be able to manage and monitor the virtualized environment, allow for service management, and so on. You may also notice the orchestration layer. The orchestration layer makes use of all of the underlying layers and provides for an engine which automates IT processes. This is not done by scripts, but by using a graphical interface, mostly workflows. This is referred to as Run Book Automation (RBA). The orchestration layer allows for the datacenter management components to be integrated in workflows. Finally, the administration layer is shown as the top layer. Private clouds provide Infrastructure as a Service. In order to do this, it is obvious that self-service portals are required. The administration layer fulfills this role. 
Thus, the administration layer is a sort of user interface which can be accessed in the organization to request infrastructure as a service. This makes it possible to provision infrastructure on-demand to the organization. By integrating all of these infrastructure-architecture layers, moving to a private cloud is made possible. It can be stated that all of these layers are required to evolve to a private cloud. The next sections show the technologies and best practices that are offered by Microsoft to achieve this layered architecture [5]. Fig. 2. Private cloud layered architecture Notice that virtualization only offers the foundation for moving to a private cloud. It is used as a foundation to introduce the other architecture layers. These layers are also III. CREATING THE HYPERVISOR INFRASTRUCTURE A reliable, highly available and scalable infrastructure is needed as a foundation for a private cloud. In order to achieve this, virtualization is a must. Microsoft Hyper-V virtualization technology is used to accomplish this goal. This provides the appropriate virtualization hosts needed. By using virtualization, an abstraction layer is created which hides the complexity of the underlying hardware and software. This is needed in a private cloud. High availability is an important aspect within flexible IT environments. Windows Server 2008 R2 Failover Clustering can be used to provide highly available 98

105 virtual machines. Failover clusters in Windows Server 2008 provide high availability and scalability for critical services [6]. Each Hyper-V host server is configured as a node in the failover cluster. When a node in the failover cluster fails and shuts down unexpectedly, the failover cluster will migrate all of the virtual machines on the failing node to another node. This allows for the virtual machines to keep running. Fig. 3. Failover cluster Failover clustering requires shared storage for the cluster storage. This requires use of a Storage Area Network (SAN). All of the virtual machines have to be stored in the shared storage. If this step is not done, migration of virtual machines in the failover cluster is not possible. This means that there will be downtime when a virtual machine has to be migrated to another node in the cluster. There are two types of migrations possible: 1. Quick Migration 2. Live Migration Quick migration will save the state of a virtual machine, move the virtual machine to another node in the failover cluster, and next restore the state of the VM on the new node and run the virtual machine. This means there will be some downtime. With live migration, another mechanism is used. Live migration transparently moves running virtual machines from one node of the failover cluster to another node in the same cluster. A live migration will complete in less time than the TCP timeout for the migrating VM [7]. This means the users working on the virtual machines won t perceive any downtime and thus availability of services increases. It s clear that live migration will produce less downtime. By implementing and configuring these aspects, the virtualization layer in the private cloud architecture can be realized. IV. CREATING THE MANAGEMENT INFRASTRUCTURE The Hyper-V infrastructure and failover cluster form the core of the IT environment. Virtualization is essential as a foundation in the private cloud infrastructure. However, to maintain the availability and flexibility of the datacenter, management software is required. Thus, an important step is the design of a management infrastructure that supports the virtualization hosts and storage infrastructure. This will allow for the cloud to rapidly scale its resources with location transparency. The management infrastructure will monitor the virtualized environment, manage it and operate it. To achieve this, several Microsoft technologies can be used. System Center is the management suite which is offered by Microsoft. It consists of several products, each with their own management function within the datacenter. We can define the most important management components in the datacenter as the following. A. Directory and authentication services The offering that best meets these needs is Active Directory Domain Service (ADDS) and Domain Name System (DNS). An Active Directory environment for the datacenter will require fewer changes than a standard network as it will not have as many user and computer accounts. This allows for better manageability. To provide flexibility and reliability, the domain controller has to be protected from possible failures. If the primary server fails, a secondary server can then take over. Thus, at least two domain controllers must be implemented. B. System Center Virtual Machine Manger (VMM) Managing a virtualized environment is an important task. By creating the hypervisor infrastructure, the virtualized environment was enabled. It has to be managed as well, though. 
If this is not the case, evolution to a private cloud cannot be made possible, since the top layers in the layered architecture of a private cloud will make use of these management components to provide user self-service. By implementing System Center Virtual Machine Manager (VMM), the management of the virtualized environment is taken care of [8]. The management of the virtual machines is more and more automated. This is needed for a cloud to scale rapidly with pooled resources and with location transparency. Thus, VMM can be defined as one of the most important management components for evolving to a private cloud infrastructure. This will become clearer when we talk about self-service portals, since they make use of VMM as an underlying layer. C. System Center Operations Manager (SCOM) Opposed to VMM, SCOM will monitor the virtualized environment instead of managing it. SCOM is able to monitor thousands of servers, applications and clients. This is very important. Without a monitoring solution in place, the infrastructure will become unmanageable. Problems with servers will not get solved. A failing service needs to be monitored as soon as possible. By introducing SCOM, failing applications or services are kept in a centralized environment, instead of distributed over all of the servers. So it is clear that the manageability and flexibility will increase significantly, while costs and time to resolve issues will decrease. VMM and SCOM can be tightly integrated with each other using Performance and Resource Optimization (PRO). PRO is a feature in VMM and helps optimize resources through 99

106 intelligent placement of virtualized workloads. This means PRO will choose the most optimal host for a VM based on its resources and configuration. This will significantly increase the elasticity of the datacenter, which is a very important attribute of private cloud infrastructures. For example, PRO will migrate a virtual machine to another host when the host runs out of sufficient resources. The need for resources of the host is monitored by SCOM and is automatically remediated by VMM. D. Other management components Apart from these management servers, other System Center components need to be implemented as well. Other management components are System Center Service Manager (SCSM), System Center Data Protection Manager (DPM) and System Center Configuration Manager (SCCM). By introducing these management components, following is provided: Data protection and backup solution Helpdesk service management Patch management and software distribution Operating system deployment think about problems at a higher level. Opalis is a very important factor in a private cloud infrastructure. It allows for datacenters to respond more quickly to changes through automating scenarios and best practices. By doing this, operational tasks can be performed automatically with minimized human involvement. This is an important attribute of private cloud computing [9]. VI. PRIVATE CLOUD In the previous sections, it was seen how virtualization is essential to moving to a private cloud. However, other architecture layers were needed as well. By implementing the Microsoft technologies shown before, these layers can be achieved [10], [11]. An overview of the implemented Microsoft technologies can be seen in figure 4. By combining all of these management components, the management infrastructure is realized. This increases the availability and resiliency of the datacenter. All of the management components can work together and help automate the datacenter. V. AUTOMATION AND ORCHESTRATION INFRASTRUCTURE An important characteristic of a cloud is the minimization of human involvement in IT processes. A well-designed private cloud will perform operational tasks automatically; elastically scale capacity, and more [9]. This can be achieved over time by implementing the automation and orchestration layer in the private cloud infrastructure. Opalis Integration Server (OIS) is a Microsoft technology which can be used to create the orchestration layer in the private cloud architecture. Opalis is a management component which is very important in datacenter automation. It can be described as the component that glues all of the System Center products together. Opalis is an automation platform and is used for orchestrating and integrating IT tools and processes. Opalis does this by using workflows, as opposed to scripts. This is called Run book Automation (RBA). Opalis uses a workflow engine which does the automation of the workflow. Every IT process is represented by a building block. All of these building blocks are combined and orchestrated in an Opalis workflow. This automation method has several advantages. Mainly, because it uses a graphical representation of IT processes, the workflows are selfdocumenting and easy to understand. This way, it s possible to Fig. 4. Microsoft technologies used in private cloud layered architecture In the previous sections, the administration layer has not yet been described. This layer will provide for a user-interface and implements self-service in the datacenter. 
As stated before, this is one of the most essential characteristics of a private cloud. By adding self-service, users can obtain infrastructure from within the cloud on-demand. This decreases the time needed to provision infrastructure to users within the organization and decreases the costs as well. The overall agility of the datacenter increases. As seen in figure 4, the Microsoft technologies which provide self-service portals to the datacenter are Virtual Machine Manager Self-Service Portal 2.0 (VMM SSP 2.0) and System Center Service Manager (SCSM). VMM SSP 2.0 plays a very important role in the evolution to a private cloud and will allow for many of the attributes that are needed. The architecture can be seen in figure

107 Fig. 6. VMM SSP 2.0 building blocks Fig. 5. VMM SSP 2.0 architecture VMM SSP 2.0 is a solution by Microsoft which is implemented in the administration layer of the private cloud layered architecture and introduces user self-service to the infrastructure [12]. So, it is implemented on top of the existing IT-infrastructure to provide Infrastructure as a Service to business units within the organization. As seen in figure 5, it consists of a web component, a database component, and a server component. VMM SSP 2.0 makes use of a specific mechanism to provision datacenter resources to the organization. By using this mechanism, it implements a private cloud of IaaS. It allows for a method to provision datacenter resources in a different way than the traditional datacenters. VMM SSP 2.0 works in following way. It makes use of several built-in user roles. Each of these user roles has specific rights. First, the user role of the datacenter administrator has to configure the resource pools in VMM SSP 2.0. This way, pools are defined for storage, network, etc. At the same time, the datacenter administrator will import templates from VMM. These templates can be used by the business units to create virtual machines. Costs are associated for reserving and allocating these resources. After this, a business administrator is able to register his business unit in SSP 2.0. He can then create an infrastructure request with appropriate services. If the datacenter administrator approves this request, the business unit users can start creating virtual machines within the approved capacity which was assigned to the business unit (figure 6). The costs which are associated with these resources will be charged back to the business unit. By monitoring chargeback data, the organization can keep track of the resources which are provisioned to the business units and the cloud solution is kept manageable. VMM SSP 2.0 also provides a dashboard extension to monitor these chargeback costs and resources which are serviced to business units. Remember that that measured service was an essential characteristic of cloud computing. VMM SSP 2.0 uses a specific service-delivery model to deliver Infrastructure as a Service to business units within the company. By using virtual machines, the infrastructure can be provisioned on-demand. Once an infrastructure gets approved, business units can create virtual machines on-demand within the permitted capacity. With VMM SSP 2.0, the provisioning and deployment of infrastructure is different than in a traditional datacenter. In a private cloud, the focus is more in delivering services than managing and setting up physical servers [13]. This allows IT to focus more on the business aspect than on the physical hardware associated with the services. Capacity can be rapidly and elastically provisioned. Furthermore, Service Manager (SCSM) can be used in the administration layer of the private cloud architecture. This way, SCSM can be used as a user interface to initiate automation workflows in Opalis. Thus, by integrating SCSM and Opalis, change requests can be automated. When a change request in SCSM is approved by a datacenter administrator, Opalis can be used to detect this. This can initiate a workflow in Opalis which automatically resolves the change request. This way, a lot of intermediate steps are removed and human involvement is minimized. This is an important attribute of a private cloud infrastructure, as mentioned before. VII. 
CONCLUSION We can note that the evolution to a private cloud using Microsoft technologies is possible with current Microsoft offerings. By implementing these offerings, the infrastructure complies with the NIST definition of cloud computing. The implemented solution provides all of these capabilities. However, it must be underlined that a private cloud is not a single solution offered by Microsoft. Rather, a private cloud is a collection of products which work together to offer Infrastructure as a Service. First of all, the layered architecture of a private cloud should be kept in mind. Although virtualization is an essential part of moving to a private cloud, several other layers are of equal importance. They are necessary as well when a private cloud infrastructure wants to be achieved. Virtualization provides the foundation for these 101

layers. Thus, it can be stated that private cloud computing really is a logical evolution of the virtualization trend of recent years. By adding layers of new technologies (self-service, chargeback, management, and more) to the existing datacenter, a private cloud infrastructure can be realized. In this private cloud, the provisioning and deployment of infrastructure differs from that in a traditional datacenter: the focus lies more on delivering services than on managing and setting up physical servers. This allows IT to focus more on the business aspect. Finally, we can conclude that the evolution to a private cloud consists of many aspects and considerations, which should be well planned. A private cloud is not a single solution; rather it consists of several steps and products. Microsoft provides the products needed to implement these aspects.
REFERENCES
[1] National Institute of Standards and Technology, The NIST Definition of Cloud Computing.
[2] M. Tulloch, Understanding Microsoft Virtualization Solutions: From the Desktop to the Datacenter, 2010.
[3] E. Kassner, Road to a Private Cloud Infrastructure, 2010.
[4] D. Ziembicki, "From Virtualization to Dynamic IT," The Architecture Journal, no. 25.
[5] Microsoft Corporation, Infrastructure Planning and Design Guide Series.
[6] Microsoft Corporation, white paper, Failover Clustering in Windows Server 2008 R2.
[7] Microsoft Corporation, Hyper-V Live Migration Overview & Architecture.
[8] M. Michael, Mastering Virtual Machine Manager 2008 R2, 2008.
[9] A. Fazio, Private Cloud Principles, 2010.
[10] D. Ziembicki, Government Private Cloud, 2011.
[11] Microsoft Corporation, Microsoft Private Cloud Solutions.
[12] Microsoft Corporation, Hyper-V Cloud Fast Track.
[13] Y. Choud, Choud on Windows Technologies.

109 Creation of 3D models by matching arbitrary photographs (June 2011) S. Solberg Abstract The presented algorithm matches two photographs taken by any consumer camera. Common features on these photographs are selected by human interaction and are used to create a 3D model of a subject s facial features. The acquired data provides a much more reliable base then a standard plaster cast as it also contains a subject s facial structure. This approach is very interesting in dental and other medical domains as it does not rely on expensive hardware. Thus making it possible for a small dental lab to analyze and measure a subject s set of teeth. The goals of this document are showing the theory behind the algorithm and the accuracy of the data it provides. Index Terms 3D, model, arbitrary photograph, matching A I. INTRODUCTION common way to represent a person s set of teeth is by taking a picture or by making a plaster cast. The problem that arises is the loss of information that occurs. The cast is sufficient to define the teeth relatively to each other and the picture is sufficient to define the set of teeth relatively to the face, but never both at the same time. This might result in wrong crowns and bridges, giving an unsightly result. More importantly; It will damage the reputation of the dentist and may lead to additional costs. The algorithm described in this document provides a solution to this problem. The algorithm is based on elements found on both a frontal photograph and a profile photograph. From here on we assume that the specified features are one of the canine and one of the incisors. These points are chosen for several reasons: human physiology to create a 3D image from two photographs taken from a slightly different viewpoint. Thus mimicking the view received by the left and right eye. When the two pictures are presented to their corresponding eye, the viewer sees the image in 3D. This implies that the depth of the scene can be calculated from these two photographs. This technique relies on the correct positioning of the two cameras. The algorithm presented in this document aims to improve the amount of freedom in camera positioning. In contrast with stereoscopy, this algorithm is not intended for full scene modeling. III. CALIBRATING CAMERA Some important properties of photographs need to be taken into account before explaining the algorithm. Before any measurement can be made, we need to determine the viewing angle of the camera. Since perfect accuracy is not required, this can be done in a practical test. When the size of an object or shape is known it can be used to calibrate the camera. The easiest way to do this is to use a ruler and position it horizontally or by drawing a line of a known distance on a wall. The camera should be positioned so the ruler or line will fit inside the camera frame exactly. The distance between the camera and the ruler can then be used to calculate the viewing angle. This setup can be seen in figure 1. When the scene is They do not move when the subject opens or closes his mouth. They are close together and therefore suffer less from distortions caused by camera lenses. They are relatively easy to define in an indisputable way. There is also a drawback in using these points. Since these two points are close together, quantization noise will play a significant role in the acquired accuracy. II. RELATED WORK A very similar technique on creating 3D models or scenes can be found in stereoscopy [3]. This technique uses the Fig. 
1 Calibrating the camera

viewed from above, it can be seen that the camera angle can be calculated by the following formula. The top view is shown in figure 2.

$$v = 2\arctan\left(\frac{w}{2d}\right) \qquad (1)$$

This formula can then be used to calculate the focal length of the used camera (with $w$ in (2) taken as the width of the picture itself):

$$L = \frac{w}{2\tan(v/2)} \qquad (2)$$

Fig. 2 Top view of calibration

IV. MATHEMATICAL APPROACH
A. Defining a photograph
A photograph is basically a rectangle with a certain width and height. The photograph is taken at a certain point in 3D space with a hardware-specific focal point. Once the size and position of the photograph and the focal point are determined, we have defined the photograph. This is shown in figure 3. The image shows arbitrary points being projected onto a plane. Every point that is contained in the pyramid will be projected onto the area of the picture. This pyramid is defined by the size and position of the picture and the focal point. The rotation of the picture solely depends on the location of the focal point.

Fig. 3 Projection of 3D space onto a 2D plane

B. Defining a projected point
Every pixel on a picture can be seen as a projection of a point in 3D space onto a 2D screen. Points in 2D space can be defined by a single vector $p$ between the center of the picture and the projected point. This vector is coplanar with the projection plane. The projection is not perpendicular, but towards a single point, namely the focal point of the camera [1]. We will call this point the origin $o$ for the remainder of the paper. A result of this projection is distortion of size and perspective. An object which is out of the center of the image will appear rotated and scaled. Looking into a cardboard box is a good example to illustrate this phenomenon: you will be able to see all four walls of the box, despite the fact that they are perpendicular to the bottom of the box.

C. Defining a ray
In order to simplify the calculations, we will represent the points in 3D space with vectors. Every point in 3D space is projected onto a 2D plane. All these projections pass through the viewpoint. A very similar approach is used in raytracing [2]. This implies that we can also define each point in 3D space by a vector passing through the origin, as shown in figure 4. An arbitrary point $a$ would then be defined by:

$$a = o + k\,r \qquad (3)$$
with
$$r = v + p \qquad (4)$$

Fig. 4 Vector representation

The parameters in these equations are:
- the position of the camera $o$
- the viewpoint $v$ relative to the camera position $o$
- the direction vector $r$
- the projected point $p$ relative to the viewpoint $v$
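To make this ray formulation concrete, a minimal sketch of how such a ray could be represented in code is given below (Java; class and field names are illustrative and not taken from the paper's implementation). It is simply equations (3) and (4) written out.

```java
/** Minimal 3D vector with only the operations needed here. */
class Vec3 {
    final double x, y, z;
    Vec3(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    Vec3 add(Vec3 o)     { return new Vec3(x + o.x, y + o.y, z + o.z); }
    Vec3 scale(double k) { return new Vec3(k * x, k * y, k * z); }
}

/** A ray a(k) = o + k * (v + p), cf. equations (3) and (4). */
class Ray {
    final Vec3 o;  // camera position (origin of the ray)
    final Vec3 r;  // direction r = v + p

    Ray(Vec3 o, Vec3 v, Vec3 p) {
        this.o = o;
        this.r = v.add(p);
    }

    /** Point on the ray for a given depth factor k (unknown from a single photo). */
    Vec3 pointAt(double k) {
        return o.add(r.scale(k));
    }
}
```

As the paper explains next, the factor k remains unknown from a single photograph; only the intersection of two such rays fixes a point in 3D space.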

Note that this formula alone is not sufficient to describe a point in 3D space. The parameter $k$ in this equation is unknown, thus defining a straight line through $o$. From now on this will be referred to as a ray. This is because a picture does not contain information about depth. The rest of the parameters are either unknown or can be chosen arbitrarily. Since only the distance between the two photos has any significance in the algorithm, one of the origin vectors can be chosen to be coincident with the origin of the coordinate system. The same goes for the rotation of the photos. The remainder of the paper will thus assume that $o$ is chosen $(0,0,0)$ and that $v$ is chosen $(0,0,v_z)$. Depending on the available information regarding the hardware, $v_z$ might also be known.

D. Defining a point
Since a single picture is not enough to define a point in 3D space, two pictures will be used. These pictures can be taken from any position and with any given rotation. We can define rays for both projection planes; denoting the frontal picture with subscript $F$ and the second picture with subscript $S$:

$$a_F = o_F + k\,(v_F + p_F) \qquad (5)$$
$$a_S = o_S + l\,(v_S + p_S) \qquad (6)$$

These rays collide somewhere in 3D space, given by the following equation, in which the only known parameters are $p_F$ and $p_S$:

$$o_F + k\,(v_F + p_F) = o_S + l\,(v_S + p_S) \qquad (7)$$

The algorithm assumes that $o_F$ is chosen $(0,0,0)$ and that $v_F$ is chosen $(0,0,v_{Fz})$.

E. Solving the ray equation
Equation (7) can be split up into components, so it defines a system of three separate equations:

$$o_{Fx} + k\,(v_{Fx} + p_{Fx}) = o_{Sx} + l\,(v_{Sx} + p_{Sx}) \qquad (8a)$$
$$o_{Fy} + k\,(v_{Fy} + p_{Fy}) = o_{Sy} + l\,(v_{Sy} + p_{Sy}) \qquad (8b)$$
$$o_{Fz} + k\,(v_{Fz} + p_{Fz}) = o_{Sz} + l\,(v_{Sz} + p_{Sz}) \qquad (8c)$$

With the assumptions above ($o_F = (0,0,0)$, $v_F = (0,0,v_{Fz})$, and $p_F$ lying in the projection plane), this reduces to:

$$k\,p_{Fx} = o_{Sx} + l\,(v_{Sx} + p_{Sx}) \qquad (9a)$$
$$k\,p_{Fy} = o_{Sy} + l\,(v_{Sy} + p_{Sy}) \qquad (9b)$$
$$k\,v_{Fz} = o_{Sz} + l\,(v_{Sz} + p_{Sz}) \qquad (9c)$$

F. Solving for a model
Section E showed the equations for a single point. It can be seen that this system is not linear and thereby not solvable by using a matrix. Since we can never solve a system which has more independent variables than equations, we define points until the following condition is satisfied:

$$3n \;(\text{equations}) \;\geq\; 7 + 2n \;(\text{parameters}) \quad\Rightarrow\quad n \geq 7$$

For every point we define in 3D space we find three more equations, while only two unknown parameters get added. These parameters are the $k$ and $l$ factors. When seven points are defined in 3D space, this provides 21 equations with 21 unknown parameters. This is a solvable problem.

The possible rotation of the secondary projection plane causes the equations to become even more complex. The rotation causes the parameters $p_{Sx}$, $p_{Sy}$ and $p_{Sz}$ to no longer correspond to the distances that can be seen on the photograph. They can however be calculated by using a rotation matrix:

$$p_S' = T\,p_S \qquad (10)$$

with

$$T = \begin{pmatrix} tX^2 + c & tXY - sZ & tXZ + sY \\ tXY + sZ & tY^2 + c & tYZ - sX \\ tXZ - sY & tYZ + sX & tZ^2 + c \end{pmatrix}$$

Here $X$, $Y$ and $Z$ represent the unit vector around which the rotation is to take place. We will represent this vector with $R$:

$$R = \frac{v_F \times v_S}{\lVert v_F \times v_S \rVert}$$

By making the previously mentioned assumptions, the system is reduced to nine unknown parameters. These parameters are $o_{Sx}$, $o_{Sy}$, $o_{Sz}$, $v_{Fz}$, $v_{Sx}$, $v_{Sy}$, $v_{Sz}$, $k$ and $l$. Even with information regarding the focal length of the camera, the parameters $v_{Sx}$, $v_{Sy}$ and $v_{Sz}$ remain unknown due to a possible rotation. The expression for $R$ can be simplified: with $v_F = (0,0,v_{Fz})$, the cross product becomes $v_F \times v_S = v_{Fz}\,(-v_{Sy},\, v_{Sx},\, 0)$, thus making

$$R = \frac{1}{\sqrt{v_{Sx}^2 + v_{Sy}^2}}\,(-v_{Sy},\, v_{Sx},\, 0)$$

The other parameters in this matrix are derived from the angle $\theta$ over which the rotation is to be executed:

$$c = \cos\theta, \qquad s = \sin\theta, \qquad t = 1 - \cos\theta, \qquad \text{with } \cos\theta = \frac{v_F \cdot v_S}{\lVert v_F \rVert\,\lVert v_S \rVert}$$

This extra transformation does not add any additional unknown variables, which implies that the system can still be solved. However, the complexity of the system has increased dramatically. It is very unlikely that this system can be solved by a deterministic approach. Therefore numerical or iterative approaches have to be taken into account. This however is beyond the scope of this document.

G. Remarks
In section D of this chapter we made some assumptions about the rotation of the pictures. By assuming that $o_F$ is chosen $(0,0,0)$ and that $v_F$ is chosen $(0,0,v_z)$, two rotations of the frontal projection plane are locked. This is not a necessity, but it makes solving the equations easier. There is however one very important rotation that is not included in the mathematical approach. Consider the rotation of the frontal projection plane: when we alter $v_{Fx}$, the plane rotates around the Y-axis; when we alter $v_{Fy}$, the plane rotates around the X-axis. What happens when a picture is rotated around the Z-axis? This has no effect on $v_{Fz}$, or vice versa. The answer is that it does not matter how the plane is rotated. Instead of considering the picture to be finite in size, consider it to be infinite while only points contained within the pyramid are projected. An infinite plane has no rotation around its normal. A more practical way of illustrating this is by considering a person tilting his head on a picture: the same effect can be created by tilting the camera. In other words, the rotation of the photograph does not matter. It is the relative position and rotation of the subject and the other photograph that is key in the described problem.

H. Full photo modeling
After solving the system of equations by providing seven points, either by human interaction or by a matching algorithm such as correlation, we can select additional points and run the algorithm for every point on the picture. This enables us to make a full 3D representation of the object shown on the picture.

V. THE ALGORITHM IN SOFTWARE
The mathematical problem shown in section IV can be greatly simplified by making a few assumptions. This eliminates the solving of a system of variables. A property that will be used to simplify the algorithm can be seen in the vector components shown in (9). In these equations it can be seen that a line that is parallel with the X-axis in 3D space will also be parallel with the X-axis in 2D space; it will only appear smaller or further away from the center. It also shows that when $v$ is relatively large compared to $p$, a small change of $k$ has only a negligible effect. The following equations are valid for the frontal projection plane only:

$$a_x = o_x + k\,p_x \qquad (9a)$$
$$a_y = o_y + k\,p_y \qquad (9b)$$
$$a_z = o_z + k\,v_z \qquad (9c)$$

When the projected point is close to the center, changing the distance of the projected object will not affect $p$ very much. If the intended purpose of the algorithm does not require large changes in the depth of an object, the equations can be reduced to:

$$a_x = o_x + p_x \qquad (10a)$$
$$a_y = o_y + p_y \qquad (10b)$$
$$a_z = o_z + k\,v_z \qquad (10c)$$

This assumption has an important benefit in practical use: the physical distance can be indicated on a picture by using a ruler.
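To illustrate how the simplified equations (10a)-(10c) translate into a measurement, the sketch below (Java) computes the coordinates of a marked point from two photographs. It assumes the ruler-based scale calibration described above and anticipates the perpendicular side view introduced in the following paragraphs; all names and pixel values are illustrative examples, not the author's implementation.

```java
/**
 * Simplified reconstruction under equations (10a)-(10c):
 * the frontal photo gives x and y directly (after scaling),
 * the side photo - assumed perpendicular to the frontal one -
 * gives the depth z. Hypothetical example values.
 */
public class SimplifiedReconstruction {

    public static void main(String[] args) {
        // Scale calibration: a ruler of known length visible on each photo.
        double rulerMm = 50.0;                 // physical length of the ruler (mm)
        double scaleFrontal = rulerMm / 480.0; // mm per pixel on the frontal photo
        double scaleSide    = rulerMm / 455.0; // mm per pixel on the side photo

        // Pixel coordinates of the same facial point, measured from the image centers.
        double pxFrontal = 112, pyFrontal = -37; // frontal photo (x right, y up)
        double pxSide = 64,     pySide = -35;    // side photo (x runs along the depth axis)

        // Equations (10a)-(10b): x and y follow directly from the frontal photo.
        double x = scaleFrontal * pxFrontal;
        double y = scaleFrontal * pyFrontal;

        // Depth from the perpendicular side photo (plays the role of k*v_z in (10c)).
        double z = scaleSide * pxSide;

        // The vertical coordinate is visible on both photos; it is the shared
        // dimension used later for rotation compensation and serves as a check here.
        double ySide = scaleSide * pySide;

        System.out.printf("point = (%.1f, %.1f, %.1f) mm, y on side photo = %.1f mm%n",
                x, y, z, ySide);
    }
}
```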
Points which are only slightly closer or further away from the projection plane can still be used for measuring purposes. Further simplifications can be made by assuming that the photograph taken from the side of the subject is perpendicular 106

to the frontal photograph. In other words, $v_S$ is chosen $(v_x, 0, 0)$. This way any tilting of the camera around its normal vector can be compensated for in software. Figure 5 gives a better insight into this compensation. The basic principle of the algorithm is matching a dimension that is shared by both pictures, i.e. the vertical distance between the selected incisor and canine. When either photograph is rotated, the measured distances $y$ will not be the same in both pictures. This difference depends on the angle of the rotation. Suppose the frontal projection plane is rotated around its normal: $d$ would not change, but $y_f$ would. The rotation can then be found by the following formula:

$$\varphi_f = \arcsin\frac{y_f}{d_f} - \arcsin\frac{y_s}{d_f} \qquad (7)$$

An analogous formula can be found for the rotation in the other projection plane:

$$\varphi_s = \arcsin\frac{y_s}{d_s} - \arcsin\frac{y_f}{d_s} \qquad (8)$$

Fig. 5 Rotation effects

VI. REFERENCES
[1] M. Pharr and G. Humphreys, Physically Based Rendering. San Francisco: Elsevier.
[2] M. Born and E. Wolf, Principles of Optics, 7th ed. Cambridge: The Press Syndicate of the University of Cambridge.
[3] A. Beato, "Stereo 3D Fundamentals," February 2011. [Online].


115 Conceptual design of a radiation tolerant integrated signal conditioning circuit for resistive sensors J. Sterckx, P. Leroux Abstract This paper presents the design of a radiation tolerant configurable discrete time CMOS signal conditioning circuit for use with resistive sensors like strain gauge pressure sensors. The circuit is intended to be used for remote handling in harsh environments in the International Thermonuclear Experimental fusion Reactor (ITER). The design features a 5 V differential preamplifier using a Correlated Double Sampling (CDS) architecture at a sample rate of 20 khz and a 24 V discrete time post amplifier. The gain is digitally controllable between 27 and 400 in the preamplifier and between 1 and 8 in the post amplifier. The nominal input referred noise voltage is only 8.5 µv. The circuit has a simulated radiation tolerance of more than 1 MGy. TABLE I CIRCUIT SPECIFICATIONS Presettable voltage gain Voltage gain accuracy 2% -3dB Bandwidth 1 khz Supply voltage 24 v Output voltage level 12 v Input impedance > 50 kω Output impedance < 100 Ω Radiation tolerance 1MGy Temperature range 0 C 85 C Table 1 Circuit specifications of the circuit. Index Terms International Experimental Thermonuclear fusion Reactor (ITER), radiation effects, CDS, signal conditioning T I. INTRODUCTION ODAY, the demand for energy keeps on growing. While the fossil elements are exhausted. The challenge is to produce sustainable clean energy with CO2 as low as possible. Fusion may prove a key technology to global energy demands. The ITER facility in Cadarache, France is an experimental design of a tokamak nuclear fusion reactor aiming to prove technical feasibility. In this reactor, remote handling is required as radiation levels are too high for human interventions. In order to reduce the number of cables from the reactor, electronics is needed to locally amplify, digitize and multiplex the large amount of sensor signals. In this work a discrete time instrumentation amplifier is presented for interfacing a pressure sensor. The sensor that is used is a OMEGA PX906.[1] The sensor preamplifier has been simulated and designed using a SPICE-like circuit simulator. Simulation results will be used to compare the circuit performance with the target specifications. (Table 1) Simulations will also include the use of radiation models, based on previous work.[2] A. Preliminary Market Research The components from several different manufacturers are principally suited for the purpose of amplifying the signals from the Omega PX906 pressure sensor, but only few of them have a specified radiation tolerance. This tolerance is aimed for space applications (kgy) while this circuit must have a tolerance up to 1MGy. Hence an application specific integrated circuit needs to be designed. B. CMOS Radiation Effects In the envisaged application, the electronics will be exposed to ionizing radiation up to a life-time ionizing dose of 1 MGy. Ionizing radiation affects the behavior of the transistors mostly through charge generation and trapping in the intrinsic transistor oxides. This results in changes in the device s threshold voltage, mobility and subthreshold leakage current and may cause inter transistor leakage[3]. For linear circuits the main degradation effect is in the threshold voltage. For switches and transmission gates, also the subthreshold leakage is of concern. These effects need to be counteracted through radiation hardening by design and layout. 
For this design in 0.7 µm CMOS technology, the radiation dependent SPICE model is based on the data presented in [4] Jef Sterckx is with the Katholieke Hogeschool Kempen, Geel, Belgium. (tel. : , Paul Leroux is with the Katholieke Hogeschool Kempen, Geel, Belgium. He is also with the Katholieke Universiteit Leuven, Dept. ESAT-MICAS, Heverlee, Belgium and with the SCK CEN, the Belgian Nuclear Research Centre, Mol, Belgium, (tel. : , 109

116 C. Architecture selection As discussed in section 1, the resistive sensor Omega PX906 will provide the signal for the circuit s differential input. The circuit connects directly to the full Wheatstone bridge output of the PX906 pressure sensor (fig. 3). The switched capacitor architecture [7] is most suitable for the envisaged application as it offers the benefit of intrinsic rejection of both offset and 1/f noise. This is especially important in radiation environments, as the 1/f noise, related to surface defects, tends to increase under radiation as defects are created at the gate oxide channel interface. II. CIRCUIT DESIGN AND LAYOUT A. First stage Fig.3 : Full Wheatstone bridge for a pressure transducer with millivolt output There are three different circuit architectures that may be considered for use with this Wheatstone bridge sensor: Continuous time instrumentation amplifier [5] Fig. 1 example of a continuous time architecture Chopper modulated amplifier [6] Fig. 2: example of a chopper modulating architecture Discrete time switched capacitor amplifier [7] Fig. 3: example of a switched capacitor architecture In the continuous time instrumentation amplifier architecture, the main drawback is the 1/f noise that can t be distinguished from the actual signal, hereby lowering the SNR. Also the offset is amplified with the signal. This reduces the DC and low level accuracy of the amplifier. In the chopper stabilized architecture, the signal is chopped by a high frequency signal. This results in a PAM (Puls Amplitude Modulated) signal. The main drawback of this circuit is the low suitability for full integration Fig. 4: Switched capacitor circuit based upon the OTA The first single-ended differential stage uses an Operational Transconductance Amplifier (OTA) in switched capacitor feedback. The basic concept [8] is to sample and store the offset of the OTA during one phase (reset phase) and subtracting this value from het next phase (amplification phase). Because offset and 1/f noise are strong correlated, both contributions are drastically lowered [9]. The OTA is implemented as a wide-swing folded cascode amplifier in a mainstream low-cost 0.7µm CMOS technology (fig. 5). The circuit works with a power supply of 5 V with a common mode level of 2.3 V to maximize the dynamic swing. Several measures were taken to make the circuit radhard. First the supply voltage of 3.3 V is raised to 5 V. This allows more margin to push the transistors in saturation. An additional source follower (M11 doubles the DC voltage at the drain of M4.) is added to ensure that both M4 and M6 stay in the saturation region even after possible drops in the NMOS threshold voltage during irradiation. Biasing is realized by a reference current source which can be based on a radiation tolerant bandgap reference [10], and current mirrors. If under radiation the devices threshold voltage and/or mobility changes, the current through the stage is not affected and all transistors stay in the correct saturation region. 110

117 enabled by digital selection of the feedback capacitor, which is binary scaled between 2 pf and 8 pf. We use the same internal OTA design as from the first stage. All transistors are pushed 0.5V deeper into saturation to ensure operation in radiation environment. Fig. 5: internal OTA design from the first stage B. Second stage For compatibility with a 24V supply post amplification stage, an intermediate stage is needed for buffering the postamp input capacitance and for common mode translation. This level shifter translates the common mode input from 2.3 V to 12 V using high voltage DMOS transistors on the same chip. The 12 V input level ensures a high output range in the 24 V post amplifier. Transistors biased with a fixed voltage of 19.15V replace current sources. If the threshold voltage of the DMOS transistors shifts, then the current through Mls3 will change but the gate source voltage of Mls1 will stay equal to Mls3 (4.85 V) causing a constant DC output voltage of 7.15 V. The bulk effect is avoided by connecting the sources to the n-well bulk. The performance of this voltage mirror is improved by adding cascode transistor Mls5 causing even closer matching between Mls1 and Mls3 as they have identical V DS. The same is done in the second buffer stage increasing the DC output to a V T insensitive 12 V DC output. In this way the output remains exactly at 12 V inspite of V T shifts up to 0.5V at a dose up to 1 MGy. Resistors had to be inserted to minimize ringing, typical for a source follower loaded with a large capacitance. As no DC current flows through these resistors, there are no changes in the level shifter operation. Fig. 7: Last stage switched capacitor circuit based upon the OTA D. Switch Fig. 8: Pass-transistor The digital switches are realized with pass-transistors (transmission gates). The switch requires both a low on resistance for sufficiently low settling time, and a high offresistance to prevent charge leakage. In order to limit the leakage current under radiation, an enclosed gate layout for the NMOS transistors is required [3]. Dimensions are chosen to minimize effects of charge injection and clock feedthrough [11]. Switching happens at a rate of 20kHz. E. Capacitor banks To make the gain controllable, a single capacitor is replaced by a capacitor bank. The switches are again transmission gates. The capacitors are binary scaled, selectable using a digital control word. The input capacitance is 800 pf yielding a gain between 400 (2 pf feedback) and 27 (30 pf feedback) for the first stage. Fig. 6: Levelshifter C. Third stage This circuit is built around a DMOS based folded cascode OTA with an open loop gain of 60dB. The input capacitor of 16 pf is buffered by the level shifter. The controllable gain is Fig. 9 : Binary scaled capacitor bank 111

118 III. SIMULATION RESULTS The simulations discussed in this section are all based on the circuit in figure 10. The circuit consists of the preamplifier and the level shifter. The post stage isn t discussed here because it has minimal impact on the circuit s performance with respect to noise and timing and merely increases the maximum available gain. Fig. 11 : Level shifter output voltage for a 3mV 500Hz input and a gain setting of 400 Fig. 10 : preamplifier and the levelshifter A. AC performance The AC performance is demonstrated with the Bode plot of the open loop voltage gain of the preamplifier OTA. The DC gain is 93dB and the bandwidth is 700Hz. The switched capacitive feedback in this stage is configurable through the use of a bank of binary-scaled capacitors from 2 pf to 16 pf, which are selectable using a digital control word. The input capacitance is 800 pf yielding a gain between 400 (2 pf feedback) and 27 (30 pf feedback). The open loop voltage gain is unchanged after introduction of the maximum V T shifts. C. Noise performance Besides thermal noise, the 1/f noise is present. SPICE simulations were used to find het total input referred noise PSD of the OTA. The square root of the PSD is shown in the dotted line. The noise density increases for decreasing frequency. After the noise cancellation (CDS) the noise will be first order high pass filtered with a cutoff frequency of about f CLK =4.5 khz. This yields the corrected noise density shown in the full line in figure 12. Differences with maximum V T shifts applied are minimal. All simulation were performed with a maximum gain setting of 400. Fig. 12 : Input referred noise density of the OTA (25 C) Fig. 11 : Bode plot of the OTA voltage gain under normal operation B. Transient performance With a 3mV input signal at 500 Hz the transient performance of the amplifier at a gain setting of 400 is studied. This maximum gain setting corresponds to the minimum system bandwidth and hence maximum settling time. The input referred noise voltage PSD of the 2 stages amounts to 78µV. The main noise stems from the high-frequency noise in the reset phase. This noise originates from the Φ 2d switch at the input in figure 10. At high frequency the noise voltage from this switch is transferred to the inverting input of the OTA with a GHz bandwidth, which is determined by the low parasitic input capacitance of the OTA. In order to reduce this noise contribution a capacitor of 120pF was placed over both input Φ 2d switches effectively shorting their noise voltage at high frequency. The capacitance is small enough not to affect 112

119 signal-settling behavior. The total input referred noise voltage density is shown in figure 13 yielding a total input noise level reduction of 78µV to 8.5µV. The noise specifications of the commercial instrumentation amplifiers range from 2.4 nv/rthz to 55nV/rtHz, where the presented ASIC design features a noise density of 22nV/rtHz. Note that the specification from the other manufacturers do not take into account the noise from the additional external components and the lower noise level of the COTS components also owes in a large part to the higher power consumption ( a few hundreds of mw). In this design the power consumption is only 1mW and the noise performance can be improved by increasing the power consumption as the input referred noise density is inversely proportional to the square root of the current drawn by the amplifier. If the settling at 0 C is compared to the settling at 85 C, the settling time is increased due to a reduction of the OTA transconductance and an increase in the switch on-resistance both owing to a decreased channel mobility. E. Radiation behavior Also the influence of ionizing radiation on the transient settling and noise behavior of the circuit was simulated. Again, the minimum bandwidth and corresponding maximum settling time occur at the maximum gain setting of 400 this value is used in the following simulation. The output of the level shifter at room temperature for the same 3mV 500Hz input is shown in figure 15 when the maximum VT shifts of - 500mV were applied. Also simulations up to 85 C showed sufficient settling. Fig. 13 : Square root of the total input referred noise voltage PSD after addition of input capacitors. Fig. 15 : Level shifter output voltage for a 3mV 500Hz input and a gain setting of 400 (zoomed in to show sufficient settling) at room temperature after irradiation (maximum VT shifts). D. Temperature behavior The circuit needs to work guaranteed between 0 C and 85 C (cf. table 1). Main issues are the transient settling performance during the amplification phase and the noise of the circuit. As the minimum bandwidth and corresponding maximum settling time occur at the maximum gain setting of 400 this value is used in the following simulations. IV. CONCLUSION The SPICE-simulations show that under 1MGy radiation no significant changes are introduced in the circuit performance. A switched capacitor topology is chosen to reduce offset and 1/f noise errors. The gain is digitally controllable between 50 and 400 in the preamplifier and between 1 and 8 in the post amplifier. The nominal input referred noise voltage is only 8.5 µv. The circuit has a simulated radiation tolerance of more than 1 MGy Fig. 14 : Level shifter output voltage for a 3mV 500Hz input and a gain setting of 400 at a temperature of 0 C (left) and 85 C (right). REFERENCES [1] [2] M. Van Uffelen, W. De Cock and P. Leroux, Radiation tolerant amplifier for pressure sensors, final report, EFDA TW6-TVR-RADTOL2 Task, [3] ANELLI, G. M." Conception et caracterisation de circuits integres. Institut national polytechnique de grenoble."2000 [4] P. Leroux, S. Lens, M. Van Uffelen, W. De Cock, M. Steyaert, F. Berghmans, Design and Assessment of a Circuit and Layout Level Radiation Hardened CMOS VCSEL Driver, in IEEE Transactions on Nuclear Science, vol. 54, pp , August [5] R. Pallas-Areny and J. G. Webster, Sensors and Signal Conditioning, 2nd edition, published by Wiley 113

120 Interscience, [6] D. A. Johns and K. Martin, Analog Integrated Circuit Design, published by Wiley, [7] R. Gregorian, K. Martin and G. C. Temes, Switched- Capacitor Circuit Design, IEEE Proceedings, Vol. 71 no. 8, pp , August [8] B. Razavi, Design of Analog CMOS Integrated Circuits, published by McGraw-Hill, [9] W. Claes, W. Sansen and R. Puers, Design of Wireless Autonomous Datalogger IC's, published by Springer, [10]V. Gromov, et, al., A Radiation Hard Bandgap Reference Circuit in a Standard 0.13 µm CMOS Technology, IEEE TNS, vol. 54, no. 6, pp , Dec [11]G. Wegmann, E. Vittoz, and F. Rahali, Charge injection in analog MOS switches, IEEE Journal of Solid-State Circuits, vol. 22, no. 6, pp , Dec

121 Reinforcement Learning with Monte Carlo Tree Search Kim Valgaeren, Tom Croonenborghs, Patrick Colleman Biosciences and Technology Department, K.H.Kempen, Kleinhoefstraat 4, B-2240 Geel Abstract Reinforcement Learning is a learning method where an agent learns from experience. The agent needs to learn a policy to earn a maximum reward in a short time period. He can earn rewards by executing actions in states. There are learning algorithms like Q-learning and Sarsa that help the agent to learn. We have implemented these learning algorithms in reinforcement learning and combined them with Monte Carlo Tree Search (MCTS). We will experimentally compare an agent using Q- learning with an agent using MCTS. Index Terms Monte Carlo Tree Search, Reinforcement Learning, Learning Algorithms R I. INTRODUCTION INFORCEMENT LEARNING is a learning method where an agent needs to learn a policy. This policy determines which action to take in a certain state to get a maximum reward. The agent does not know what action is best in a state, the agent must learn this by trial & error. face of rewards and punishments. [2] There are two very important parts in reinforcement learning: 1. The agent: This is the learning part. The agent repeatedly observes the state of its environment, and then chooses and performs an action. Performing the action changes the state of the world, and the agent also obtains an immediate numeric payoff as a result. Positive payoffs are rewards and negative payoffs are punishments. The agent must learn to choose actions to maximize a long term sum or average of the future payoffs it will receive. 2. The environment: This determines every possible observation the agent can get or every possible state the agent can reach. In every state or observation the agent can choose from a number of actions. The agent needs to discover the best policy to earn a good payoff. To let the agent learn by experience we used reinforcement learning algorithms like Sarsa and Q-learning. A very important part of my thesis is the implementation of Monte Carlo Tree Search in reinforcement learning. MCTS gives the agent the opportunity to simulate experience, this should improve the learning rate of the agent. This paper is based on my master thesis in reinforcement learning [1]. In my thesis you can find the results of all the implementations we have done to improve the agent s learning rate. II. REINFORCEMENT LEARNING A. Introduction Reinforcement learning is the study of how animals and artificial systems can learn to optimize their behavior in the Figure 1: Reinforcement learning [3] In most cases the agent needs to learn a policy in order to reach a specific goal in the environment. But not all agents need to reach a goal, if the agent needs to perform a continuous task then there is no main goal. All the steps the agent took to reach a goal in the environment starting from the beginstate is called an episode. It is possible that an agent has a goal but cannot reach it. For 115

example: an environment that consists of wall states (which the agent cannot cross) and open states (which the agent can cross). If the goal is surrounded by walls, then the agent cannot reach it. That is why we define a maximum number of steps the agent can take in an episode. If the agent cannot reach its goal and we do not define a maximum number of steps, the agent keeps taking steps to try to reach its goal, resulting in an episode with an infinite number of steps, because the agent does not know that he cannot reach the goal.

The sum of all the rewards the agent received during an episode gives us a view of how well the agent performed in that episode. This results in the following formula:

$$R = \sum_{t=1}^{N} r_t$$

where $R$ is the total reward in an episode, $r_t$ is the reward per step and $N$ is the number of steps in one episode. Rewards that the agent receives within a small number of steps are more important than rewards the agent receives after many steps. For example, it would be better for an agent to take 2 steps and get a reward of 5 than to take many more steps for a reward of 10. We can use the discount factor $\gamma$ to make future rewards less valuable. The total reward function then changes to:

$$R = \sum_{t=1}^{N} \gamma^{t} r_t$$

If the number of steps $t$ rises, the reward decreases in value because $\gamma^{t}$ decreases with every step the agent takes.

A very important part of reinforcement learning is the Markov Decision Process (MDP), which can be represented as MDP = <S,A,T,R>, where S is the set of all possible states the agent can reach, A is the set of all possible actions the agent can choose, T is the transition function, which determines the probability of every state the agent can reach if he performs an action in a state, and R is the reward function, which gives the agent a numerical reward when he performs an action in a state. During an episode the agent is always in a state and has the opportunity to choose an action. If the agent performs an action he gets a numerical reward and reaches a state determined by the transition function. The policy decides which action the agent should choose in every state.

A lot of reinforcement learning algorithms are based on estimating value functions that estimate how good it is for the agent to be in a given state. The notion "how good" is defined here in terms of the future rewards that the agent can expect. The policy π determines the probability that the agent chooses action a in state s: π(s,a). The value of state s following a policy π, written as $V^{\pi}(s)$, is the expected total reward when standing in state s and following policy π. The value function is given by the following formula:

$$V^{\pi}(s) = E_{\pi}\left\{ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s \right\}$$

where $E_{\pi}\{\cdot\}$ denotes the expected value given that the agent starts in state s and follows policy π, and t is any time step. The value of a terminal state (for example the goal in the environment) is always zero.

If the agent does not know its environment he does not know how to reach good states. The agent needs to learn its transition function to know how to reach good states. The agent learns this transition function with the use of two counters and a reward array:
- c1[s_t][a_t]: every step the agent takes increases counter c1 by one for state s_t and action a_t.
- c2[s_t][a_t][s_t+1]: every time the agent performs action a_t in state s_t and reaches s_t+1, counter c2 increases by one.
- r[s_t][a_t]: this array saves the reward the agent gets when he is in state s_t and performs action a_t.

If the agent is in state s_t and performs action a_t, the probability that he reaches a certain state s_t+1 can then be calculated with the following formula:

$$T(s_t, a_t, s_{t+1}) = \frac{c2[s_t][a_t][s_{t+1}]}{c1[s_t][a_t]}$$
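A minimal sketch of how these counters could be maintained and turned into transition-probability and reward estimates is given below (Java; array sizes and names are illustrative, not taken from the thesis implementation).

```java
/** Tabular model estimate built from the counters c1, c2 and the reward array. */
public class TransitionModel {
    private final int[][] c1;      // c1[s][a]    : visits of (state, action)
    private final int[][][] c2;    // c2[s][a][s']: observed transitions
    private final double[][] r;    // r[s][a]     : last observed reward

    public TransitionModel(int numStates, int numActions) {
        c1 = new int[numStates][numActions];
        c2 = new int[numStates][numActions][numStates];
        r  = new double[numStates][numActions];
    }

    /** Called after every real step the agent takes. */
    public void update(int s, int a, int sNext, double reward) {
        c1[s][a]++;
        c2[s][a][sNext]++;
        r[s][a] = reward;
    }

    /** Estimated probability T(s, a, s') = c2[s][a][s'] / c1[s][a]. */
    public double probability(int s, int a, int sNext) {
        return c1[s][a] == 0 ? 0.0 : (double) c2[s][a][sNext] / c1[s][a];
    }

    public double reward(int s, int a) {
        return r[s][a];
    }
}
```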
If you do not want to simulate the environment you can always use state/action values instead of state values. A state/action value is often called a Q value. If we choose an action a in a state s under a policy π, we denote $Q^{\pi}(s,a)$ as the expected total reward when standing in state s, choosing action a and thereafter following policy π. The Q function is given by the following formula:

$$Q^{\pi}(s,a) = E_{\pi}\left\{ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s,\; a_t = a \right\}$$

B. Exploration vs. Exploitation
An agent starts his learning process with no knowledge of his environment. He doesn't know which actions are better than others. The agent needs to explore if he wants to learn which actions were good in certain states. A simple way to explore is to choose random actions. Every action the agent chooses gives him some reward. At the end of an episode the total reward will help the agent to determine whether the actions were good or bad. We call this the exploration phase: the agent needs to explore the environment to learn. When the agent has collected a lot of information about the quality of actions in certain states (Q values), he can determine which action is the best in which state. The agent is in the exploitation phase when he only chooses the best possible action in every possible state; such an action is called a greedy action. One of the challenges that arises in reinforcement learning and not in other kinds of learning is the trade-off between exploration and exploitation [4]. If an agent knows nothing of its environment he will start his learning process by exploring his environment. The agent needs to explore enough until he discovers which path gives the most cumulative reward. When an agent always chooses the greedy action, he will never find

a better path. If during the learning process the agent starts to exploit too soon by always taking the greedy action, then he may not yet have explored the most rewarding path. If the agent exploits too late, then he may lose out on a better total reward for the episode, because the agent explored too much instead of choosing the most rewarding action. The Boltzmann distribution or ε-greedy can provide a solution for this problem.

ε-greedy
The greedy action is the action with the highest Q value in the current state of the agent. We choose ε so that there is a probability of ε that the agent chooses a random action (for exploration) and a probability of (1 − ε) that the agent chooses the greedy action. If ε is very small, the agent will mostly exploit; if ε is very big, the agent will mostly explore. If the agent uses ε-greedy and needs to select a random action, the agent can end up choosing the second-best action or the worst action possible.

Boltzmann
For some applications it is not desirable that an agent, when selecting a random action, picks the worst possible action in a state. The Boltzmann method lets the agent calculate the probability of choosing each action in a certain state. If the agent does this for every possible action in a certain state, he can choose which action to perform; the worst possible action is given a very low probability of being chosen. The Boltzmann distribution is given by the following formula:

$$P(a) = \frac{e^{Q_t(a)/\tau}}{\sum_{b=1}^{n} e^{Q_t(b)/\tau}}$$

$Q_t(a)$ determines the quality of action a in the current state (the Q value). The τ parameter is called the temperature. If you take the limit of the temperature to zero, the Boltzmann distribution gives the highest probability to the action with the highest Q value. If the temperature is high, another action can be given the highest probability of being chosen.

III. LEARNING ALGORITHMS
An agent uses learning algorithms to learn a policy. I have studied two important learning algorithms: Sarsa and Q-learning.

A. Sarsa
Sarsa generates a value for each state/action pair, called a Q value. This Q value determines the quality of the action in that state. The formula to create the Q value for a state/action pair is:

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_{t+1} + \gamma\, Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$$

where:
- Q(s_t,a_t) on the left-hand side: the new Q value for state s_t and action a_t;
- Q(s_t,a_t) on the right-hand side: the current Q value for that state and action;
- α: the learning rate; this value determines how much of the new estimate is used to update the current Q value and needs to be between zero and one;
- r_t+1: the reward the agent gets when he performs action a_t in state s_t;
- γ: the discount factor; this value needs to be between zero and one;
- Q(s_t+1,a_t+1): the Q value for the next state and the next action.

Sarsa is an on-policy algorithm. Each time the agent takes a step, the Q value is updated. For each update Sarsa has to choose an action a_t+1 to perform in the new state; this action is chosen by the policy of the agent. The following pseudo code explains when to update the Q value when you use online learning with Sarsa:

  Initialize Q(s,a) arbitrarily
  Repeat (for each episode)
    Initialize s
    Choose a from s using policy derived from Q (ε-greedy)
    Repeat (for each step of episode)
      Take action a, observe r, s'
      Choose a' from s' using policy derived from Q
      Q(s,a) ← Q(s,a) + α[r + γ Q(s',a') − Q(s,a)]
      s ← s'; a ← a'
    Until s is terminal

It is also possible to update all the Q values at the end of the episode instead of at every step the agent takes, if you want to use offline learning.
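As an illustration, a compact Java sketch of the online Sarsa loop described by the pseudo code above could look as follows. It assumes a simple tabular environment interface with hypothetical method names such as reset(), step() and isTerminal(); it is a sketch, not the thesis implementation.

```java
import java.util.Random;

/** Minimal environment contract assumed for this sketch (hypothetical names). */
interface TabularEnv {
    int reset();                 // start a new episode, return the initial state
    double step(int action);     // perform the action, return the immediate reward
    int currentState();
    boolean isTerminal();
    int numStates();
    int numActions();
}

/** Online (on-policy) Sarsa with epsilon-greedy action selection. */
class SarsaAgent {
    private final double alpha = 0.5, gamma = 0.9, epsilon = 0.1; // values from Section V
    private final double[][] q;
    private final Random rnd = new Random();

    SarsaAgent(int numStates, int numActions) { q = new double[numStates][numActions]; }

    int selectAction(int s) {                          // epsilon-greedy policy
        if (rnd.nextDouble() < epsilon) return rnd.nextInt(q[s].length);
        int best = 0;
        for (int a = 1; a < q[s].length; a++) if (q[s][a] > q[s][best]) best = a;
        return best;
    }

    void runEpisode(TabularEnv env, int maxSteps) {
        int s = env.reset();
        int a = selectAction(s);
        for (int t = 0; t < maxSteps && !env.isTerminal(); t++) {
            double r = env.step(a);
            int sNext = env.currentState();
            int aNext = selectAction(sNext);
            // Sarsa update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
            q[s][a] += alpha * (r + gamma * q[sNext][aNext] - q[s][a]);
            s = sNext;
            a = aNext;
        }
    }
}
```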
B. Q-learning
Another way to update the Q values is to use Q-learning. Q-learning uses the maximum Q value over the actions in the next state. This gives the following formula:

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right]$$

The parameters are the same as those of Sarsa; only the term $\max_{a} Q(s_{t+1},a)$ is different. This function calculates the maximum Q value in state s_t+1 over every possible action. Q-learning is an off-policy algorithm: it does not follow the policy to choose the action used in the update for the new state. Instead it uses the action that has the highest Q value in the new state s_t+1. The Q value update can happen at each step the agent takes, or at the end of an episode. The following pseudo code explains when to update the Q value when you use online learning with Q-learning:

  Initialize Q(s,a) arbitrarily
  Repeat (for each episode)
    Initialize s
    Repeat (for each step of episode)
      Choose a from s using policy derived from Q (ε-greedy)
      Take action a, observe r, s'
      Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]
      s ← s'
    Until s is terminal

It is also possible to update all the Q values at the end of the episode instead of at every step the agent takes, if you want to use offline learning.
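Building on the Sarsa sketch above, the only change needed for Q-learning is the bootstrap term: the update uses the best Q value in the next state instead of the Q value of the action actually selected. A hedged sketch of that update, in the same hypothetical tabular setting:

```java
/** Q-learning (off-policy) update: bootstrap on max_a Q(s', a). */
class QLearningUpdate {
    static void update(double[][] q, int s, int a, double r, int sNext,
                       double alpha, double gamma) {
        double maxNext = q[sNext][0];
        for (int b = 1; b < q[sNext].length; b++) {
            maxNext = Math.max(maxNext, q[sNext][b]);
        }
        // Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q[s][a] += alpha * (r + gamma * maxNext - q[s][a]);
    }
}
```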

124 IV. MONTE CARLO A. Monte Carlo Method The Monte Carlo Method is a way to estimate the value of a state or if the transition function is not available you can estimate the value of a state/action pair (Q value). The Q function of a state and an action will give the expected cumulative reward the agent will receive in one episode. If the agent made a lot of useless steps during an episode the estimate of the Q values will not be very good. If the agent wants to know how good an action in a certain state is he needs to make a better estimation of the Q value for that state/action pair. The agent needs to do multiple episodes where he always chooses the same action in a certain state. At the end of every episode the agent calculates the Q value for that state/action pair. When the agent completes his multiple episodes he takes the average of all the calculated Q values for that state/action pair. This will be a much better estimation for that state/action pair. If the agent always follows its policy (greedy) in a deterministic environment then he will only observe returns for one action in every state. We need to estimate the value for every action in each state, not just the one action we currently favor. That is why we must assure continual exploration, so that the agent chooses other actions in certain states and not only the ones we currently favor. The Monte Carlo method can slow down the entire learning process because for every step the agent takes he needs to do multiple episodes to take an average of every calculated Q value for that state/action pair. This takes time. The agent can also save the Q value every episode, this is done with Monte Carlo Tree Search. B. Monte Carlo Tree Search Monte Carlo Tree Search (MCTS) can be applied effectively to classic board-games (such as GO), modern board-games (such as SETTLERS OF CATAN), and video games (such as the SPRING RTS game). [5] MCTS can also be applied in the game Arimaa. [6] MCTS can be used for any game of finite length. Monte Carlo Tree Search (MCTS) builds a tree of states that starts with the beginstate and will be expanded to all possible states that the agent can reach. MCTS will use offline learning: The agent updates all his Q values at the end of an episode, to do this the agent saves all the information of the episode to arrays (all the state/action pairs and the rewards received are saved). The agent sends all this information to the MCTS algorithm. MCTS uses 4 operations to build a tree: [7] 1. The selection step: Start from the root node (beginstate) and traverse the tree in the best-fit manner (according to the exploration/exploitation formula) down to a leaf. 2. The expansion step: If in a leaf node a few simulations have gone through this node then expand the node. 3. The simulation step: Simulate one or more episodes with a policy. In our experiments we used a policy that always chooses random actions. 4. The back propagation step: The total result of the episode is back propagated in the Monte Carlo Tree. Figure 2: The four steps of MCTS The Monte Carlo Tree makes good use of the Q function to update Q values. When we used MCTS in our reinforcement learning program we implemented the Q-learning algorithm to update Q values instead of the normal Q function. Q-learning improves the learning rate of the agent. After a complete episode all the Q values for all the states that the agent reached in that episode are calculated by the Q-learning algorithm. 
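To make the four MCTS operations listed above concrete, the following Java skeleton shows one iteration of selection, expansion, simulation (with a random rollout policy, as used in the experiments) and back-propagation, driven by a learned transition model. It is a schematic sketch with hypothetical class and method names, not the code used for the thesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Assumed interface to the learned (partial) transition model (hypothetical names). */
interface SimulationModel {
    int numActions();
    int sampleNextState(int state, int action);   // drawn from the estimated T(s,a,s')
    double reward(int state, int action);
    boolean isTerminal(int state);
}

/** Node of the Monte Carlo tree: one reachable state with a value estimate. */
class MctsNode {
    final int state;
    final List<MctsNode> children = new ArrayList<>();
    int visits = 0;
    double value = 0.0;

    MctsNode(int state) { this.state = state; }
    boolean isLeaf() { return children.isEmpty(); }
}

/** One schematic MCTS iteration: selection, expansion, simulation, back-propagation. */
class MctsSketch {
    private final Random rnd = new Random();

    void iterate(MctsNode root, SimulationModel model, int rolloutLimit) {
        // 1. Selection: descend from the root to a leaf, here greedily on the value
        //    estimate (an exploration bonus such as UCT could be added).
        List<MctsNode> path = new ArrayList<>();
        MctsNode node = root;
        path.add(node);
        while (!node.isLeaf()) {
            MctsNode best = node.children.get(0);
            for (MctsNode c : node.children) if (c.value > best.value) best = c;
            node = best;
            path.add(node);
        }

        // 2. Expansion: create a child for every action available in the leaf state
        //    (the paper expands only after a few simulations have passed through the leaf).
        if (!model.isTerminal(node.state)) {
            for (int a = 0; a < model.numActions(); a++) {
                node.children.add(new MctsNode(model.sampleNextState(node.state, a)));
            }
        }

        // 3. Simulation: random rollout from the leaf using the learned model.
        double simulatedReturn = 0.0;
        int s = node.state;
        for (int t = 0; t < rolloutLimit && !model.isTerminal(s); t++) {
            int a = rnd.nextInt(model.numActions());
            simulatedReturn += model.reward(s, a);
            s = model.sampleNextState(s, a);
        }

        // 4. Back-propagation: feed the result back into every node on the path.
        for (MctsNode n : path) {
            n.visits++;
            n.value += (simulatedReturn - n.value) / n.visits;   // incremental average
        }
    }
}
```

Note that the back-propagation above keeps a plain running average; as described in the text, the thesis implementation applies the Q-learning update in this step instead.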
All the calculated Q values will then be updated in the Monte Carlo Tree during the back propagation step. Monte Carlo Tree Search gives the agent the opportunity to simulate an episode. This is a very powerful advantage of MCTS, it should increase the agent learning experience enormously. If the agent would simulate an episode it should know his (partial) transition function, else the agent doesn t know where he is. We used the 2 counters and the reward array to get to know the (partial) transition function, we did the same for the value function in chapter 2. If we can simulate one or more episodes every step the agent takes, we can have a better estimation of the Q values for all the actions in every state. This should increase the performance of the agent. A. The environment Figure 3: A wall environment V. EXPERIMENTAL EVALUATION We chose a wall environment for the tests with Monte Carlo Tree Search. You have 3 possible state types in the environment: A wall state: If the agent tries to reach a wall he will be put back in his last known state. He will get a reward of

125 An open state: The agent can reach this state but gets a reward of -1. The goal: When the agent reaches this state he gets a reward of +100 and he wins the game. B. The agent We will test 2 implementations of agents. The first agent will be using standard Q-learning. The second agent will be using MCTS (with a combination of Q-learning). MCTS will be using its simulation technique to simulate an episode. We let the agent simulate 1000, 100 and 10 and 1 episode per step in each real time episode. Both agents will use the same parameters: A learning rate of 0.5 discount factor of 0.9 The Q-values will be initialized to zero They will use ε-greedy to select an action with ε=0.1 In one run the agent has 100 episodes to learn a policy, every episode there will be 200 tests to test the agents learning rate. There will be 50 runs so there will be a total of tests each learning episode. The start state of both agents is at (1,1) The agent that uses MCTS will have one advantage: the agent can simulate episodes every step he takes. The agent needs to learn its environment to simulate those episodes, therefor we give the agent 10 episodes to take random actions and learn the (partial) transition function. Every episode simulation the agent starts in his current state and applies the standard policy for simulation. The standard policy for simulation is taking random moves until the agent reaches its goal or if he reaches the gap of 200 steps in one simulated episode. Each agent needs to reach the goal and tries to minimize its steps to reach it. The shortest path the agent can take is 20 steps long. This includes 19 steps in open space and one step to its goal. The agent should receive (19 * -1) reward for the 19 steps and (100 * 1) reward for the step to its goal. This gives a total maximum reward of 81 each episode. C. The Results We tested an agent using standard Q-learning versus an agent that uses MCTS with 1 episode simulation every step he takes (Graph 1 and 2). The agent that uses standard Q-learning will perform better the first 50 episodes than the agent using MCTS. This is mainly because the agent can only simulate one episode every step. If the agent reaches its goal during the episode simulation, the random action the agent took the first step will always be the greedy action for the real time episode. This will change once the agent has tried more actions than only one every step. The actual learning of the MCTS agent starts at episode 10. The first 10 episodes are used to learn the (partial) transition function. The agent performs random actions the first 10 episodes. Graph 1: Agent with MCTS (1 simulation) vs. agent with Q-learning If we only focus on the positive cumulative reward we see that the agent using MCTS performs better after 50 episodes and he reaches the maximum cumulative reward after 60 episodes. The agent using standard Q-learning only gets a maximum cumulative reward of 67. When using ε-greedy with ε = 0.1 there is a probability of 10% when the agent chooses a random action, that s why the maximum cumulative reward of 81 is never reached in our experiments. Graph 2: Agent with MCTS (1 simulation) vs. agent with Q-learning (positive cumulative rewards) 119

126 If we compare 2 agents using MCTS but one agent can simulate 1 episode every step and another agent can simulate 10 episodes every step we see that the agent using 10 simulated episodes every step performs a lot better (Graph 3). Both agents use the first 10 episodes to learn the (partial) transition function. function. The very first episode the agent starts to learn, he already finds his optimal path. Graph 3: Agent with MCTS (1 simulation) vs. agent with MCTS (10 simulations) Can the agent perform better with more simulations every step? We tested 2 agents using MCTS where one agent performs 10 episodes every step and one agent performs 100 episodes every step (Graph 4). If we focus on the positive cumulative reward from both agents we can conclude that there is no difference between simulating 10 episodes and simulating 100 episodes every step for this environment. Graph 4: Agent with MCTS (10 simulations) vs. agent with MCTS (100 simulations) Graph 5: Agent with MCTS (1000 simulations) vs. agent with Q-learning VI. CONCLUSION The results from our experiment show that Monte Carlo Tree Search is a good improvement compared to a standard Q- learning implementation. The Monte Carlo Tree Search uses the Q-learning algorithm too, but its episode simulation quality makes MCTS a very powerful learning method in reinforcement learning as long as the agent simulates enough episodes to learn faster than the agent using only Q-learning. There is only one downside about MCTS simulation: It takes more time to run the simulations. It doesn't matter a lot when you simulate 1 episode per step, but when you simulate 1000 episodes per step you will notice a serious delay. REFERENCES [1] Kim Valgaeren, Monte Carlo zoekbomen bij het leren uit beloningen. Katholieke Hogeschool Kempen: Geel. [2] Peter Dayan, Christopher J. Watkins, Reinforcement Learning. University of London: London, pp. 4. [3] Tom Croonenborghs, Model-assisted approaches for relational reinforcement. Katholieke Universiteit Leuven: Leuven, pp. 12. [4] Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction. The MIT Press, Cambridge: Massachusetts, 1998, pp. 15. [5] Guillaume Chaslot, Sander Bakkes, Istvan Szita, Pieter Spronck, Monte-Carlo Tree Search: A New Framework for Game AI. Universiteit Maastricht: Maastricht, pp [6] Tomas Kozelek, Methods of MCTS and the game Arimaa. Charles University, Prague, [7] Mark H.M. Winands, Yngvi Bj.Äornsson, Jahn-Takeshi Saito, Monte- Carlo Tree Search Solver. Universiteit Maastricht: Maastricht, pp To demonstrate the huge advantage of episode simulation with MCTS compared to an agent using standard Q-learning, we implemented an agent that simulates 1000 episodes every step he takes (Graph 5). The agent using MCTS knows the shortest path on the 11 th episode. The agent using Q-learning does not know the optimal path after 100 episodes. We can see that the graph of the agent using MCTS looks a lot like a step 120

127 String comparison by calculating all possible partial matches R. Van den Bosch 1, L. Wouters 2, V. Van Roie 1 1 K.H.Kempen, Geel, Belgium 2 tbp electronics n.v., Geel, Belgium Abstract Searching for a keyword in a large text can be done by going through the whole text and look for a match. Comparing strings to determine the best match is much more difficult. For humans it is usually easy to determine which string matches best. Programming code that looks for partial matches and determines the best match is not as easy as it sounds. It is possible that there are several matches, how to find the best match? First naïve comparison and IndexOf algorithm are programmed to find all possible partial matches. Possible optimizations for faster comparison are briefly described. Search functions that are built in most languages are highly optimized and therefor much faster than most custom made for-loops. Next we need to quantify the comparison based on the overlapping parts. This value creates a division between all comparisons. The desired result is a fast algorithm which also indicates overlapping parts. This result should match the human intuition when it comes to string comparison. Index Terms pattern matching, text processing, performance analysis W I. INTRODUCTION AND RELATED WORK HEN it comes to verifying a manufacturing part number (MPN), we would expect it to require a 100% full match. This is not always true. The MPN for a requested product does not always contain package details (e.g. the size). While the delivered MPN does contain some package details. The correct product is delivered but it has no full MPN match with the requested product. An alternative comparing method is the solution. We have to ease the verification process to allow a wider variety of allowed MPN to pass. When searching for a keyword in a text file, the search algorithm will look for a full match. When a full match is found, it stops looking. Various algorithms for exact string matching exist [1]. Naïve linear search algorithms are easy to program, reliable but very inefficient [2]. Some search algorithms have been improved over the years. This results in faster algorithms. The Knuth-Morris-Pratt (KMP) algorithm learns from its previous mismatches. This is difficult to program and to debug but is much more efficient than the naïve string matching method [3]. The theories behind these faster algorithms are used to speed up our algorithm. If there are multiple MPN s to be verified, we need some indication to tell which MPN matches better than the other. Levenshtein distance is a method that can be used to quantify the similarity between strings [4] [5]. Sam Chapman has developed an open source java library SimMetrics. This library has various similarity measures between strings [6] [7]. All partial matches or overlaps should be marked. Neil Fraser has done some research and did some performance tests about overlap detection [8] [3]. This paper deals with two problems. The first problem is finding all partial matches and the second problem is quantifying the comparison. A. Expected result II. APPROACH IN THEORY Searching for pattern anat in text bananatree is simple. There is only one match. Let baxxanata be the pattern where XX can be any character except n. The search for this pattern in text bananatree results into two partial matches: ba and anat. This is what we are looking for. We compare character by character and find that character a has multiple matches. 
But we don t want the last a in the pattern to be marked. In the text there is no a after the last match anat. Example 1 illustrates the result we want to achieve. ban anatr e e baxxanata Example 1: Overlap result. Next we need to add a quantifier for each comparison to determine the best match. For example compare pattern ban, bananat, bananatree, bananatreexxxx and bananatreexxxxxx with text bananatree. There is only one full match but which is the second best match? Example 2 shows the desired results, best match first. bananatree 1: bananatree 2: bananatreexxxx 3: bananatreexxxxxx 4: bananat 5: ban Example 2: Quantify result. 121

128 B. Searching for overlap First step is naïve string comparison using two nested for loops. Figure 1 shows an illustrated example. All characters found in string 1 and in string 2 are marked. Note: this also includes the last a from pattern banxxanata. Now we need to find the longest continuous match. This is determined by the sum of all cells marked with an X that form a diagonal line. This sum is 4 for the example in figure 1. The longest continuous match is anat. Next we have to eliminate some other matches that overlap with anat. Every cell that is on the same row or column as anat can be cleared. This is marked with the colors purple and blue in figure 2. Also no characters in the pattern after anat can match a character in the text before anat. These cells can also be cleared and are marked with stripes in figure 2. These two steps have to be repeated until all cells are empty. To continue, the next longest match will be ba. All cells are empty after this step and the process is finished with results ba and anat. C. Quantifying similarity b a n a n a t r e e 0 b x 1 a x x x 2 X 3 X 4 a x x x 5 n x x 6 a x x x 7 t x 8 a x x x Figure 1: Naïve string comparison b a n a n a t r e e 0 b x 1 a x x x 2 X 3 X 4 a x x x 5 n x x 6 a x x x 7 t x 8 a x x x Figure 2: Eliminating matches. The most used method to quantify a comparison is Levensthein distance [4] [5]. Basically it counts the amount of steps it takes to change the pattern to match the text. For most applications this is a very effective method. When it comes to comparing MPN, we need to add some weighting parameters. Figure 3 shows the results of Levensthein Distance calculated for example 2. Pattern bananat has a better matching distance than pattern bananatreexxxx. This is not what we expect because it has an exact match with a suffix XXXX. Levensthein Distance works with costs. Each operation has a cost. In the basic version each operation (substitute, insert and delete) has the same cost. A copy is free. Some extended versions of the Levensthein algorithm allow a user to change the cost for certain actions. Substitution can have a higher cost value then a delete or insert action. Let s compare banana tree with banana monkey and banana. The amount of steps needed to get from string A to string B are for both comparisons the same. But banana monkey should match better than banana because it has an overlap of length 7 due to the space in the text and pattern. Based on performance, it is not preferred to add another algorithm for similarity calculation. This involves another set of nested loops. Building it into one overlap algorithm is a better approach. Again the longest continuous overlap is the most important and will be used to quantify the similarity. Long continuous overlapping parts should score better than a series short overlapping parts. The score is calculated by sorting each overlapping part by length. This length is divided with an increasing power of 2. The first overlapping length is divided by 2 0, the second by 2 1 and so on. Figure 4 illustrates this method. The score for pattern banxxanxxtree is calculated as follows:. The number to be raised to a power can be changed to 3 or more if the continuous length is more important. Last we need to create a difference between all full matches, which have a score of 10 in figure 4. The easiest way to do this is to include the length of the pattern as a ratio with the text. 
Pattern bananatreexxxx has an overlap ratio around 71 percent while bananatreexxxxxx results in an overlap ratio around 63 percent. The final result is calculated by taking the sum of the score in percentage and the ratio. When there is no full match, the ratio will be 0. This result is divided by 2 to normalize it. A perfect match bananatree will result in the maximum normalized score: result:. Pattern bananatree Lev. Dist. bananatree 0 bananat 3 bananatreexxxx 4 bananatreexxxxxx 6 ban 7 Figure 3: Levensthein Distance calculated for example 2. bananatree Pattern Score Ratio Result bananatree bananat (0) 0.35 bananatreexxxx bananatreexxxxxx banxxanxxtree (0) 0.3 Figure 4: Similarity quantification using overlap.. Pattern banana has 122

129 A. Find all character matches III. ALGORITHM IN PRACTICE To search for all individual matches it s required to nest two loops, one for the text and one for the pattern. This is the naïve method. The result is a byte type 2D array where all matches are marked with either 0 for a mismatch or 1 for a match (see figure 1). Let P be the pattern and T the text, the code in C# will be: //result byte array byte[][] r = new byte[p.length][]; //loop through pattern for (int i = 0; i < P.Length; i++) { r[i] = new byte[t.length]; } //loop through text for (int j = 0; j < T.Length; j++) { r[i][j] = (byte)((p[i] == T[j])? 1 : 0); } B. Find longest match Next step in our algorithm is to iterate through the byte array and return the longest continuous match. Again two loops are nested to find the first match. Inside the inner loop another loop checks the next item, one column and row further. If this item also has a match it will check the next item and so on. The longest continuous match is stored. In this C# code we store the maximum continuous length ml, first index for this match in the text mj and first index for this match in the pattern mi : int ml = 0, mi = 0, mj = 0; for (int i = 0; i < P.Length; i++) { for (int j = 0; j < T.Length; j++) { } } //search diagonal for longest continous match for (int l = 1, ii = i, jj = j; ii < P.Length && jj < T.Length && r[ii][jj] == 1; l++, ii++, jj++) { if (l > ml) { ml = l; mi = ii - l + 1; mj = jj - l + 1; } } Now that we have the longest match, it is time to clear this match along with all the other matches that are not valid anymore. This process will take two clearing steps. First we clear every match where j > mj and i < (mi + ml). Next we clear every match where j < (mj + ml) and i > mi. In C# code this gives: //bottom left rectangle for (int j = 0; j < ml + mj; j++) { for (int i = mi; i < P.Length; i++) { r[i][j] = 0; } } //top right rectangle for (int i = 0; i < ml + mi; i++) { for (int j = mj; j < T.Length; j++) { r[i][a] = 0; } } These two steps have to be repeated until no matches are found (ml == 0). C. Quantify the comparison We need to find the best pattern in the similarity results between multiple patterns and one text. Similarity calculation for each pattern can be done in C# code like this: //overlap calc double a = 0; for (int i = 0; i < arrmatch.count; i++) { a += (double)arrmatch[i] / Math.Pow(2, i); } a = a / (double)t.length; //length calc double b = (double)t.length / (double)p.length; //precent result double p = (a + b * ((a == 1)? 1 : 0)) / 2; Where P is the pattern, T the text and arrmatch is an array containing all partial overlap lengths. The result is a percent that quantifies the comparison. First step includes the overlap with division by 2 with an increasing power. This is to give a higher ranking to a longer continuous match. If the length of a continuous match is more important, this can be changed to a division by 3 or more with an increasing power. Second step a calculation to include the length of the pattern comparing the length of the text. A short pattern compared to a long pattern with the same overlap is more accurate. In the final result we only include this length calculation when the pattern has a full continuous match with the text (a == 1). IV. OPTIMIZATIONS The naïve method is easy to understand but it s not optimized for speed. Some adjustments can be made to increase the performance of the algorithm. The most important optimization is to reduce the amount of comparisons. Some comparisons are not necessary. 
A comparison between two characters should only be calculated once. It is a waste of time to calculate the same comparison twice. KMP-algorithm [3] is based on this theory. Other reasons could be that the results being calculated are never used or cleared in a second iteration. Look at figure 5 where some unnecessary results are b a n a n a t r e e 0 b x 1 a x x x 2 X 3 X 4 a x x x 5 n x x 6 a x x x 7 t x 8 a x x x Figure 5: Optimization. marked in red. All these cells can be skipped during first iteration. The right column is marked because after the first loop through the text we already found a continuous match of two characters. It s not possible to have a longer continuous match because the text ends there. Same conclusion goes for the lower two rows. 123

130 Based on the KMP-algorithm we can create a table with only the unique characters between the text and pattern. This way we can eliminate double comparisons. For example in figure 5 row index 1 contains character a. In the text are three occurrences for character a. We can add an x for all those three cells at once. A totally different approach in optimization is the use of fuzzy searching. Look at the row index 7. Before we find the match with column index 6 we have to go through six other columns. With fuzzy searching there is a chance that searching starts at column index 6. If the next fuzzy search is for column index 4, no other comparison has to be made for this row. The longest continuous match is now already found. It s clear that optimization is difficult to implement and debug. Some optimization techniques are only efficient for specific input strings. Therefore we consider optimization for the naïve method as future work in this paper. Almost each programming code has its own search functions. These methods are highly optimized and could result in a faster result. Therefor we shortly look at the IndexOf function that is built in C#. The theory behind this is shown in figure 6. Each iteration searches for a match starting T: bananatree P: baxxanata Search 1 T: bananatree P: baxxanata Search 2 T: ban ree P: baxx a Search 3 T: n ree P: XX a Figure 6: IndexOf comparison. with a single character. When a match is found, the search is expanded with one character. In the first step we search the whole text and find that anat is the longest continuous match. In the next iteration the previous result is removed from both the pattern P and text T. Now we search again for a single character but not over the entire length. The search is limited to the section before the previous result in the text and pattern. Another search is limited to the section after the previous result in the text and pattern. This is marked with the colors in figure 6. In other words we only search for overlaps between two parts that have the same color. Similar to the naïve method, this method also has two steps. First step to search for a match and second step to cleanup. Search uses the IndexOf and Substring functions. Cleanup splits the text and pattern using the Split function that returns an array containing the remaining substrings. Each iteration step stores the best match in a variable. We store best length bl, best index in pattern bi, best index in text bj and best array index bn. In C# code the search is: for (int n = 0; n < T.Length; n++) { for (int i = 0; i < P[n].Length; i++) { int j = -1, l = 1; } } And the cleanup is: while (i + l <= P[n].Length) { j = T[n].IndexOf(P[n].Substring(i, l)); } //not found if (j == -1) break; //found if (l > bl) { bl = l; bi = i; bj = j; bn = n; } //look for next l++; //overlap string o = P[bn].Substring(bi, bl); //loop string[] nt = new string[t.length + 1]; string[] np = new string[p.length + 1]; int c = 0; for (int n = 0; n < T.Length; n++) { string[] st = new string[1] { T[n] }; string[] sp = new string[1] { P[n] }; } if (n == bn) { st = T[n].Split(new string[1] { o }, 2); sp = P[n].Split(new string[1] { o }, 2); } for (int a = 0; a < st.length; a++) { nt[c] = st[a]; np[c] = sp[a]; c += 1; } This code is also not optimized. There are some comparison steps that can be discarded. This will not be discussed here and belongs to future work. V. EXPERIMENTS The difference between the naïve and IndexOf method is shown in figure 7. 
Pattern length is 200 and text length is increased from 1 to 200. If we change this so that text length is 200 and increase pattern length from 1 to 200 we ll get almost Figure 7: Text length increase and pattern length fixed 200. the same results. In these experiments each comparison is calculated 100 times with the average execution time plotted 124

131 on the graph. Each string is randomly generated with a maximum of 26 different characters. Figure 8: Text char difference increase 1-95 and pattern char fixed 95. Besides the length we can also change the difference between strings. Let s vary the text from 1 to 95 maximum different characters while the pattern has a fixed maximum character difference of 95. Results are shown in figure 8. If we Figure 9: Text char fixed 95 and pattern char difference increase change this so the pattern varies from 1 to 95 and the text has a fixed 95 maximum difference in characters, we get results as shown in figure 9. These experiments are created with a string length of 200. Each comparison is calculated 100 times with the average execution time plotted on the graph. The trendline is automaticaly generated. In figure 8 and figure 9 the overall trend seems more like a logarithmic function instead of a straight line. Figure 10: Text char difference increase 1-95 and pattern char fixed 1. When we set either text or pattern to a character difference of 1, two spikes shown up in the beginning (see figure 10). The spikes show a full match between text and pattern. VI. CONCLUSION Comparing two strings to highlight the matching parts can be done in many ways. We compared two methods, a naïve and IndexOf method. Both are based on calculating partial matches in the first step and cleanup in the second step. The naïve method creates a table where all character matches are marked with 1. Further calculations are based on this table (2D array). The IndexOf method splits the input strings based on the longest continuous match. Every comparison is calculated based on the corresponding substring parts. The cleanup step removes the longest continuous match found and also removes all overlapping matches with this result. This is repeated until no matches are found. Finally we loop through all partial overlaps to quantify the comparison. A continuous full match gets a higher ranking than an interrupted full match. This is done with a simple math calculation based on continuous overlap and string length. Experiments show that the IndexOf method is faster in all cases. In terms of string length, it remains the same whether you compare string A with string B or string B with string A. The execution time for the IndexOf method will gradually increase as the string length increases. The naïve method on the other hand will increase in execution time much faster (see figure 7). Another important factor is the variety of characters. Here the naïve method is significantly slower than the IndexOf method. The comparison between a string with just one type of character and a string with 95 different types of characters goes rather fast. The execution time will increase very fast as the string variety increases. Here we see a difference between comparing string A with string B or string B with string A. If the text has a wider character variety relative to the pattern, the execution time for the naïve method will gradually increase (see figure 9). The opposite happens when the text has a narrower character variety relative to the pattern, the execution time for the naïve method increases more strongly (see figure 8). Experiments also show that a complete match takes a lot of execution time (see spikes on figure 10). A complete match requires a lot of the algorithms. Therefore it is recommended to do a simple pre-match prior to the algorithms. REFERENCES [1] C. Charras and T. 
Lecroq, Exact String Matching Algorithms, Université de Rouen, January 1997 [2] R.S. Boyer and J.S. Moore, A fast string searching algorithm, Comm. ACM 20 10, October 1977, pp [3] D.E. Knuth, J.H. Morris, and V.R. Pratt, Fast pattern matching in strings, SIAM Journal on Computing, 1977, pp [4] M. Gilleland, Levenshtein Distance, in Three Flavors, Available at Februari 2011 [5] Li Yujian and Liu Bo, A Normalized Levenshtein Distance Metric, IEEE transactions on pattern analysis and machine intelligence, vol. 29, no. 6, June 2007 [6] S. Chapman, SimMetrics, an open source Java library, Available at Februari 2011 [7] W.W. Cohen, P. Ravikumar and S.E. Fienberg, A Comparison of String Distance Metrics for Name-Matching Tasks, Carnegie Mellon University, 2003 [8] N. Fraser, Overlap Detection, Available at November 2010 [9] D. Eppstein, Knuth-Morris-Pratt string matching, Available at May

132 126

133 Beamforming and noise reduction for speech recognition applications B. Van Den Broeck 1, P. Karsmakers 1, 2, K. Demuynck 2, 1 IBW, K.H. Kempen [Association K.U.Leuven],B-2440 Geel, Belgium 2 ESAT, K.U.Leuven, B-3001 Heverlee, Belgium AbstractIn order to have adequate speech recognition results the speech recognizer needs a decent speech signal which doesnt contain too much noise. One way to obtain such signal is to use a close talk-microphone. However, this might be impractical for the user in certain situations, e.g. when the user lays in bed. A more practical and comfortable alternative for the user is the use of a contactless recording device acquiring the signal at a relatively large distance from the speaker. In the latter case intelligent signal processing is required to obtain an appropriate speech signal. In this paper the signal processing part is covered by a beamformer with noise-cancelation, a commonly used one is called GSC (Generalised Side lobe Canceller). Under ideal circumstances the GSC works quite well. However when microphone mismatch is taken into account the resulting speech signal can be distorted, which might seriously impact the accuracy of a speech recognizer. Since the demands for matched microphones is expensive, new methods were develop to reduce the effect of microphone mismatch by using altered versions off some well know GSC algorithms. Two of these are SDR-MWF and SDR-GSC (resp. Speech Distortion Regularized Multichannel Wiener Filter and Speech Distortion Regularized Generalized Side lobe Canceller). In this work SDR-MWF and SDR-GSC will be compared with their non SDR equivalents which are not designed to handle microphone mismatch. The comparison is carried out both theoretically as well as practically by simulations and experiments using non matched microphones showing the positive effect of the SDR-MWF on the recognition of speech. Here we will show a improvement of 20% in word error rate for a noisy environment and 2% for a less noisy environment. Furthermore some practical issues are handled as well, such as a method for easy assessment of the beamformer and a way to generate good overall extra noise references. Index TermsNoise reduction, Speech Distortion Regularized Multichannel Wiener Filter / Generalized Side lobe Canceller, Sum and delay beamformer F I. INTRODUCTION OR speech recognition applications a close-talk microphone is often used to guarantee a clean speech signal. But this solution has the drawback that the user needs to wear the close-talk microphone. When sound is acquired at greater distances with a single microphone the speech signal proves to be too corrupted for speech recognition, especially in a noisy environment. One way to overcome this problem is to use a multi microphone system to form a sum and delay beamformer [1]. This will suppress sound coming from all directions except one desired direction. Normally sum and delay beamformers are described by transfer functions for all directions [1], where the many variables make it quite difficult to assess them. In this paper we will introduce a energy based directivity pattern, from this pattern we can assess the beamformer simply by a beam width and a suppression for undesired directions. In order to further suppress the noise an adaptive filter is often used such as NLMS or RLS (resp. Normalized Least Mean Square, Recursive Least Square). These techniques are better known as GSC (Generalized Side lobe Canceller) and are further explained in [2]. 
In theory these techniques work quite well but due to some imperfections such as microphone mismatch the resulting speech could contain al lot of speech distortion [3]. This will prove to be disadvantageous for the recognition. In the literature methods are described to overcome this by using ultra high quality microphones or by manually matching microphones, but this will result in a expensive end product (either by expensive hardware or labor). However, techniques are available which partially overcome the latter problem by using a different algorithm to update the adaptive filter without resulting in a more expensive end-product. Two of these techniques are SDR-MWF and SDR-GSC (resp. Speech Distortion Regularized Multichannel Wiener Filter and Speech Distortion Regularized Generalized Side lobe Canceller) [3]. In this work SDR-MWF, SDR-GSC, a regular MWF and a NLMS will be compared both on theoretical grounds and in practice. As a criterion for comparison both noise reduction and speech distortion will be used. This will show the positive effect obtained by a SDR algorithm and the up and downsides of a MWF and a GSC. Additionally the effects when using an extra noise reference are investigated. Furthermore, we will elaborate on the positioning of these extra noise microphones. II. BEAMFORMING AND NOISE REDUCTION ALGORITHMS A. Beamforming The sum and delay beamformer is depicted in fig. 1. It consists out of multiple microphones in a single line. The theory behind this beamformer is relatively easy. First we compensate the different delays of the sound coming from the desired direction due to different distances traveled. Then we 127

134 just take the mean of each sample from each microphone. Sound coming from the desired direction will constructively interfere and hence will be pass unmodified. Sound coming from undesired directions (which is assumed noise) will partially interfere deconstructive and hence will be suppressed. This way the SNR (signal to noise ratio) will improve compared to the output of one microphone. Mathematically this system can be described as: If X(f)=1 for f [0 f nyquist ] (white noise) then the energy ratio of the output of the sum and delay beamformer and the input of one microphone can be calculated as: where, (2) fig. 1: Sum and delay beamformer. (1) In equation 1 we can notice that if source = desired then H() becomes 1, and all sound will pass. In order to gain further insight in the properties of the sum and delay beamformer, we evaluated an example. A directivity pattern for 3 microphone beamformer is shown in fig. 2. The microphones are equidistantly spaced with 5cm between each of the microphones. It can be noticed that for source desired (=90 ) the gain will be smaller than one. But its difficult to assess the beamformer from this figure. Therefore a different way of depicting the directivity pattern off the beamformer will be introduced. fig. 2: f-curves, 3mic's 5cm spaced. (3) In equation 2 and 3 it is seen that the term 1/M stands for the average suppression of sounds coming from undesired directions. So we can state that the average improvement in SNR is only dependent on the number of microphones used and not the spacing of them. We also see that the gain from adding one microphone diminishes when the number of microphones is already large (also stated in [4]). The summations of sinc functions will form the beam. It can be stated that the smallest sinc is formed by the largest d i -d k (furthest spaced microphones), and this will primarily determine the beam width. Applying equations 2 and 3 to the setup from fig. 2 we end up with fig. 3Fout! Verwijzingsbron niet gevonden.. Here we can clearly talk about a formed beam. fig. 3: Energy-plot, 3mic's 5cm spaced B. Noise reduction algorithms It has been shown that the sum and delay beamformer can improve the SNR of the speech, but still falls short when used for speech recognition applications [2]. Noise from undesired directions is still present and noise coming from the desired direction will not be suppressed at all. In order to further enhance the speech signal an additional noise reduction stage is needed. Most noise reduction algorithms are based on an adaptive filter. Here SDR-MWF, SDR-GSC and their non speech distortion regularized equivalents are compared. In the next paragraph first the SDR-MWF will be explained. Next, the other methods are explained using the SDR-MWF framework. The scheme of the SDR-MWF is depicted in fig. 4. To the left we see the microphones and the previously described beamformer (A(z)). We also see a blocking matrix (B(z)), this 128

135 will in fact do the opposite of the beamformer. It will take the differences of consecutive microphone inputs so that the speech in it will be removed to form noise references. All of these outputs are passed to the SDR-MWF. The block represents a delay. This way (when we talk about the speech reference sample wherefrom we want to estimate the noise being the present) the filters w can contain as much samples from the future as they contain samples from the past. fig. 4: SDR-MWF [5] (6). (7) To compute [n ] information about the pure speech is needed, which obviously is unknown. The solution lies in the use of a VAD (voice activity detection) so that we can collect information about the noise and the noise+speech parts individually. Combining this information gives us well estimated statistics about the pure speech. This is handled in detail in [5]. In equation 6 there is also a parameter which stands for the step sizes taken to update the filters. Using a fixed can result in a instable update. Therefore a normalized step is used similar to NLMS [6]. Now the step will be normalized by the energies of the speech and the noise. The parameter needs to be small and is added so that can not become infinite. Now the idea in SDR-MWF is to update the filters w so that the combined output represents the noise in the speech reference. This preferably without containing too much speech because this will introduce speech distortion. Notice that this can only happen when the noise references contain speech which will be so when the microphones are not matched or when the filter w 0 is used. To further explain the algorithm some auxiliary are introduced:. (4) The goal is to update the filters w in order to minimize the cost function (equation 5). Notice that the superscripts s and n stand for the speech and noise parts of the signals and stands for the expected value. The second term stands for the energy of the residual noise in the output. Minimizing this will suppress the noise. The first term stands for the energy of the speech distortion. The inclusion of this term makes the algorithm speech distortion regularized. There is also a weighing parameter µ which will regulate the importance of the second term against the first term. In this way an appropriate balance between noise reduction and speech distortion [3] [5] can be obtained.. (5) The cost function is minimized by gradient descent, which resulted in the following update function is obtained [3] [5]: Again information about the pure speech is required. A similar solution like for w (equation 6 and 7) is applied, which is further described in [5]. SDR-GSC is identical to SDR-MWF except for the fact that the filter w 0 is not used. The non SDR equivalents are derived by excluding the first term in the cost function [3]. This can also be accomplished by making µ very large. An additional remark can be made about the noise references. In fig. 4 all noise references are obtained by the blocking matrix but this does not have to be so, other noise references can be used as well. These can be formed anywhere. Only two constraints need to be satisfied. First the noise in these references must be formed by a noise source that is also the cause of noise in the speech reference. Secondly, the SNR in these references could best be low in order to avoid speech distortion. In the experimental section an extra set of microphones with a blocking matrix will be used to achieve such a noise reference. III. SIMULATIONS A. 
Comparison of noise reduction algorithms In this section a comparison of the discussed noise reduction algorithms will be made. The setup was as follows: a three microphone beamformer was used with 5cm spacing. A speaker was simulated at 90 to this beamformer. The speech consisted out of 2 sentences, the first to initialize the filters and the second to validate the result. This results in an energy pattern as depicted in fig. 5. Then a pink noise source (8) 129

136 was simulated at various angles from 0 to 90 to form a noisy signal with a SNR of 5dB at the output of one microphone. All microphones were simulated as ideal, there was no microphone mismatch. fig. 5: Energy pattern for simulation A The noise reduction algorithms where run on this data. The used parameters where: L (filter length of one filter) = 11, = 0.1, = 0.1 and = ( is a parameter used for estimating the information for the pure speech, see [5]). These parameters proved to work quite well in previous simulations (not included in this paper). And this was done for various values of µ. The result was judged by the energy of the residual noise in the output, the energy of speech distortion in the output and the gain in SNR from the output of one microphone to the output of the noise reduction algorithm (the energy of speech distortion is also considered to be noise). The results for the SDR-MWF are shown in fig. 6 and the results for the SDR- GSC are given in fig. 7. First the results obtained by the SDR-MWF will be discussed. It can be noticed that the energy of the residual noise is decreasing with increasing µ. At the same time the energy of distortion is increasing. This is as expected since µ is the parameter that will trade off noise reduction for speech distortion. When looking at the resulting SNR gain it can be seen that this forms an optimum for one value of µ, where the residual noise and speech distortion form a best balance. Notice that this is why the SDR algorithm was chosen to be used here in the first place. As mentioned before, the non SDR equivalents can be found by setting µ very large which might cause a lot of speech distortion. When looking at the different noise angles it can be noticed that at 90 (1,5708 radians) the result is at its worst. This is caused by two things. First the beamformer will not reduce any noise because this is the desired direction. And second, at this angle the noise references formed by the blocking matrix will not produce any noise (this is blocked the same as the speech is blocked). The only useful noise reference is the speech reference (by w 0 ). So now the noise has to be estimated out of a signal which contains a lot of speech. Therefore the algorithm must focus very hard on reducing the speech distortion so that noise reduction is hard to do. But looking at the SNR gain it is still seen that a small improvement can be achieved. fig. 6: Results for SDR-MWF, at various noise angles Now when considering the results obtained by the SDR- GSC it is seen that the value off µ has hardly got any effect. Because of the perfectly matched microphones and not using w 0, none of the noise references contain any speech. This way the algorithm only has to focus on the noise reduction no matter what value for µ is used. When microphone mismatch would be introduced this will not be the case, but still we can expect a large interval of appropriate choices for µ. This will make the SDR-GSC more easy to set up with appropriate parameters. The downside of SDR-GSC can be seen at a noise angle of 90, here there will not be any improvement. The beamformer does not remove any noise and the noise reduction algorithm has no noise references left (not using w 0 ). fig. 7: Results for SDR-GSC, at various noise angles B. Extra noise reference (placement) In previous simulation it is shown that SDR-GSC could perform better in terms of speech distortion but has the disadvantage that no noise references are left when noise is coming from the desired direction. 
The latter can be overcome by using extra noise references. It was already stated that the noise references should contain no (or very little) speech, 130

137 therefore extra beamformers with a blocking matrix, directed at the speaker, will be used. Now there is still the question where to position them. First, they need to be placed away from the original beamformer otherwise the previous problem occurs. Second, it must be noticed that most noise has a poor autocorrelation. Therefore the noise in the noise reference cannot be delayed from the noise in the speech reference (or at least the delay has to stay within the bounds formed by the filter lengths) and this for all possible desired directions. This means that the extra beamformers cannot be placed too far away from the original beamformer. This leads to the following setup: two extra beamformers (2 microphones, 5cm spaced) are placed at 0.5m left and right from the original beamformer (the same as in previous simulation) to form two extra noise references. A speaker is simulated at 1.5m from the original beamformer at an angle of 90. A noise source is simulated at various angles (0-180 ) at a distance of 3m from the original beamformer. The noise and speech were the same as in the first simulation. The SDR-GSC algorithm was run for all these different setups with parameters: µ = 1, = 0.1, = 0.1, = and for two different filter lengths L=11 and L=21. The resulting SNR gain is depicted in fig. 8, left for a filter length of 11 and right for a filter length of 21. The SNR gain is plotted in function of the angles of the noise source. In these figures it is clearly seen that with the use of an extra noise reference an improvement is achieved in case the noise comes from the same angle as the speech (=90 ). It can also be noticed that by using a larger filter length an improvement is seen for a larger interval of angles. When a larger filter length is used a greater delay between noise in the noise references and noise in the speech reference can be allowed. results shown here are word error rates (WER). This is the number of wrongly recognized words in the set, expressed as a percentage of the total number of words. These WERs were obtained with a noise robust speech recognizer built by using the SPRAAK toolkit [7]. Hereunder, we will show results for the processed electret microphones, the unprocessed electret microphones and a good qualitative reference microphone at the same distance as the electret microphones. fig. 9: Room for experiments fig. 10: Energy pattern for experiment fig. 8: Results simulation B IV. EXPERIMENTS All experiments were conducted in the room showed in fig. 9. The beamformer consisted out of five randomly selected electret microphones (CUI inc., part number CMA-4544PF- W), evenly spaced at 3cm. This will form the energy pattern showed in fig. 10. Here it can be seen that the formed beam is small enough to reject the noise source, but still wide enough for a possible small positioning fault. The noise was played from a laptop. The speech was the Aurora4 dataset [8]. All A. Noisy environment The noise source was set to play random white noise. The microphone outputs were processed with the SDR-MWF algorithm, with parameters: L = 11, = 0.1, = , = 0.1 and various values for µ. The results are shown in fig. 11. First, it can be noticed that using the algorithm an improvement in terms of WER is seen. Interestingly it is noticed that the results with various µ show an optimum. This optimum results from a good balance between residual noise and speech distortion. As explained before a non SDR algorithm is obtained when µ is very large. 
As a consequence the WERs will be much worse. This proves that using a SDR algorithm can have a positive effect, compared to its non SDR alternatives, when combining it with a speech recognizer. When looking at the results for the single microphones it can be seen that the electret microphones deliver a results almost equal to the result delivered by the good reference microphone. The reason is that the influence of the noise source has got the biggest influence in the result, not the quality of the used microphone. 131

138 One last comment should be made about the overall hi WERs. The noisy environment was rather extreme. Even after noise reduction the speech was flooded with noise in the higher frequency regions. The speech recognizer had serious problems with this. But still, the influence of balancing residual noise and speech distortion is clearly visible. fig. 11: Results experiment A B. Less noisy environment The setup was the same as in previous experiment with the difference that the noise source was removed. Just a little amount of noise remained, introduced by the used microphone preamplifiers. This way another effect will take the upper hand, namely reverberation. The results are showed in fig. 12. We can see that all findings of experiment A for the electret microphones also apply here. The only difference is that the WER is overall smaller. This is due to the already better microphone signals. When comparing the results for the individual microphones it can be seen that the electret microphones now result in worse results than the good reference microphone. The reason for this can be found in the noise introduced by the electrets preamplifier. visualize the spatial filtering effect of beamformers, which provides an easy assessment of such beamformers. Next, the SDR-MWF and the SDR-GSC algorithms were reviewed and experimentally compared. SDR-MWF performs better under all noise angles, but SDR-GSC could have the advantage of inducing less speech distortion if an extra noise reference can be made. Additionally an appropriate setup for doing so was proposed. Finally, two practical experiments were carried out. One in a noisy environment and another in an almost noise free environment. Both experiments had the same conclusion. An optimum is formed over the trade off parameter µ which controls the amount of residual noise versus the amount of speech distortion. When µ is set to high or too low, the recognition of the speech deteriorates because of too much speech distortion or too much residual noise respectively. The difference between both experiments was that the results for a less noisy environment were better on overall. For the noisy environment we achieved a gain in WER of 20% absolute, for the less noisy environment this gain was about 2% absolute. VI. ACKNOWLEDGEMENTS Research supported by the Flemish Government: IWT, project ALADIN (Adaptation and Learning for Assistive Domestic Vocal Interfaces). REFERENCES [1] J. Benesty, M. Sondhi, I. Huang. Handbook of Speech Processing. Berlijn: Springer, [2] J. Mertens. Opbouw en validatie van een spraak acquisitie- en conditioneringssysteem. KH Kempen., [3] A. Spriet, M. Moonen, J. Wouters. Stochastic gradient implementation of spatially pre-processed Multi-channel [4] D. Van Compernolle and S.Van Gerven, Beamforming with microphonearrays, Heverlee: KU leuven- ESAT, 1995, pp [5] S. Doclo, A. Spriet, M. Moonen, J. Wouters. Design of a robust Multimicrophone noise reduction algorithm for hearing instruments. ESAT, K.U.Leuven,2004. [6] Kuo, M. Sen, Real-time digital signal processing:implementations and applications, 2nd ed., [7] Kris Demuynck, Xueru Zhang, Dirk Van Compernolle and Hugo Van hamme. Feature versus Model Based Noise Robustness. In Proc. INTERSPEECH, , Makuhari, Japan, September [8] N. Parihar, J. Picone, D. Pearce, H.G. Hirsch. Performance analysis the Aurora large vocabulary baseline system., Proceedings of the European Signal Processing Conference, Vienna, Austria, 2004 [9] K. Demuynck, X. Zhang, D. Van Compernolle and H. Van hamme. 
Feature versus Model Based Noise Robustness. In Proc. INTERSPEECH, , Makuhari, Japan, September [10] N. Parihar, J. Picone, D. Pearce, H.G. Hirsch. Performance analysis the Aurora large vocabulary baseline system., Proceedings of the European Signal Processing Conference, Vienna, Austria, 2004 fig. 12: Results experiment B V. CONCLUSIONS In this paper we assessed the performance of different beamformers. We first introduced a novel energy pattern to 132

139 Reinforcement Learning: exploration and exploitation in a Multi-Goal Environment Peter Van Hout 1, Tom Croonenborghs 1/2, Peter Karsmakers 1/3 1 Biosciences and Technology Department, K.H.Kempen, Kleinhoefstaat 4, B-2240 Geel, Belgium 2 Dept. of Computer Science, K.U.Leuven, Celestijnenlaan 200A, B-3001 Heverlee, Belgium 3 Dept. of Electrical Eng., K.U.Leuven, Kasteelpark Arenberg 10/2446, B-3001 Heverlee, Belgium Abstract In Reinforcement Learning the agent learns to choose the best actions depending on the rewards. One of the common problems is the trade-off between exploration and exploitation. In this paper we consider several ways to deal with this in a multi goal environment. In a multi-goal environment it is important to choose the correct value for epsilon. If the tradeoff between exploration and exploitation is not correct the agent takes too long to learn or does not find an optimal policy. We present several experiments to illustrate the importance of this trade-off and to show the influence of different parameters. Index Terms Reinforcement Learning, Boltzmann, Epsilon- Greedy, Q-Learning I. INTRODUCTION AND RELATED WORK Reinforcement Learning is a sub-area of Machine Learning, where an agent has to learn a policy from the rewards he gets from the environment on certain actions [Sutton, Barto, 1998]. The agent can use different methods to learn a policy using the environment with sarsa or Q-learning that I will use in the experiments discussed in this paper. And this in combination with two famous exploration techniques Epsilon-Greedy and Boltzmann, both use a different approach in exploration and exploitation. This paper is written as an extension of my master thesis. My master thesis is also a combination of different experiments on different environments using the Q-agent. The basics of RL (Reinforcement Learning), Q-learning and Epsilon-Greedy/Boltzmann are described in the following sections of this paper. The results of the experiments are compared after the short introduction to RL and its components. II. INTRODUCTION TO REINFORCEMENT LEARNING A. Reinforcement Learning In RL you have interaction between an agent and the environment [Kaelbling, Littman, Moore, 1996]. The agent gets an indication of his state and chooses an action that he wants to execute. Due to this action the state changes and the agent will receive feedback about the quality of the transition, the reward, and its new state. This will be repeated until the experiment is over. While the experiment is running, a policy will be learned. This policy approximates the optimal policy over time. In the long run the agent should be choosing the actions that results in a high long-term reward. The goal of the task is to optimize the utility function V π (s) for all states. The most common definition is the discounted sum of future rewards where the discount factor (γ) indicates the importance of future rewards with respect to immediate rewards. state st reward rt rt+1 st+1 Agent Environment action at Figure II.1 Interaction between agent and environment. Consider for instance the following example: Environment: You are in state 1 and have 4 possibilities. Agent: I ll choose action 2. Environment: You are now in state 4 and have 4 possibilities. Agent: I ll choose action 1. B. Q-Learning Q-learning is one of the possible algorithms to learn a policy [Watkins, Dayan, 1992]. The Q-agent uses the Q-function where the Q-values are defined on state-action pairs. The Q-value gives an indication 133

140 of the quality of executing a certain action in a certain state. Q- learning is an off-policy learning algorithm, this means that the learned Q-function will approximate the optimal Q-function Q* independent of the exploration that is used. The update rule used in Q-learning is: Q(s t,a t ) Q(s t,a t ) + α [(r + γmax a Q(s t+1,a )) - Q(s t,a t )]. In Q-learning some parameters have an influence on the behavior of the Q-function. 1) Learning rate The learning rate (α) describes how the newly acquired information will be used. If α = 0 the agent will not learn a thing because only the old information is used. While α = 1 only the newly acquired information will be used to update the Q-values. 2) Discount factor The discount factor (γ) determines the importance of future rewards. If γ = 0 the agent will only consider the current rewards. While γ = 1 the agent will go for a high long-term reward. A common value for the discount factor is 0.9. C. Exploration For convergence it is necessary to execute each action in each state an infinite number of times. Intuitively, you want even if you have already a good policy, investigate whether there is no better policy. Q-learning only describes that an agent must choose an action, not how he chooses this action. For exploration we can use different techniques [Croonenborghs, 2009], so the agent can try to collect as much new information as possible. The agent must also be able to use its information to accomplish a task while learning. Sometimes the agent has to give up exploration so he can use his knowledge and exploit this. A good trade-off between exploration and exploitation is therefore a must. 1) Epsilon-Greedy Epsilon-Greedy is an extension of the Greedy algorithm. Greedy only chooses the action with the highest Q-value and due to this action he will not be able to explore the environment. When the Greedy algorithm is extended with exploration we call it Epsilon-Greedy. With a small chance (epsilon) the agent will choose a random action, otherwise the greedy action will be chosen. Mostly a small part is used for exploration and a big part for exploitation. 2) Boltzmann Boltzmann uses a different strategy for choosing an action by assigning a non-zero probability of being executed to every possible action in a certain state. The probability of all actions should be 1 (100%) and the probability per action is calculated in a particular state and action always will end up in the same state, used in this experiment the learning rate (α) is set to 1.0 due to the use of a deterministic environment. And the discount factor (γ) 0.9 is to optimize a high long-term reward. The learning rate and discount factor are discussed in section II.B.1/2. B. Environment The created maze is pretty simple and specially made for explaining the importance of a good trade-off between exploration and exploitation. This environment is deterministic, ie for each action in a particular state there is only one possible new state. For the illustration I used a 2-goal environment with different rewards. The following rewards comply: Goal2: Goal1: Obstacle and step: 0.0 W W W W W W W W W W W W W W W W G W W W W W W W W W W W W E2 0 0 W W W W 0 0 E1 0 G W W W W W W W W W W W W W W W W W W W W W W W W S W W W W W W W W W W W W Table III.1 - The maze (environment). Around the maze (Table III.1) you have an obstacle, the wall (W), so that the agent has to move in the free (0) environment. 
The start-state (S) is fixed at the bottom, while the 2 goals (G1 and G2, E1 and E2) are on a different distance to the start-state. The G- and E-goals are for two different mazes. When the maze with the G-goals is used, the E-goals are a free space (0) and vice versa. with the following formule: A. Q-agent III. SETTINGS For the deterministic environment, this means that the agent 134

141 C. Experiment Each experiment runs for 1000 episodes. At the beginning of a episode the agent starts in the start-state and ends when the agent reaches the goal or the step-limit of 200. Of these 1000 episodes every 10th is used to check if the agent has already learned the optimal policy. The used results are an average of 5 experiments. The experiments were done using the program RL-Glue [Tanner, White, 2009]. IV. COMPARISON A. Random In a multi-goal environment it is always difficult to choose the right trade-off between exploration and exploitation. With an agent that takes always a random action, the goal closest to the start state will be reached much earlier. Goals further away will take more time to reach. The graph will be represented in color, green are the states with a high frequency of steps and red with only a few. Table IV.1 - Environments map-temperature with random actions. As shown in the table (Tbl. IV.1), the goals at the end of the environment will be more difficult to reach. B. Epsilon-Greedy In the experiment, there are 2 epsilon values used, a very common used epsilon of 0.2 and one of 0.8. The epsilon value is the measure of the probability. I.e. a epsilon of 0.2 is the same as 20% chance on a random action. In the remaining 80% the agent will choose the greedy action. 1) Epsilon of 0.2 With only 20% chance on a random action (explained above), this strategy focuses more on exploitation and a little on exploration. When we look back at the map-temperature of the random actions (Tbl. IV.1), we can say that when the second goal is situated further back, it will be more difficult to reach. We can also conclude that when they are situated next to each other, it will be easier to reach the second goal. For the environment with the G-goals, the second goal with a higher reward will not be reached in a normal time-limit. In the case of the E-goal environment it s a lot easier to reach the second goal with a epsilon of 0.2, but it will take some time. a. b. Table IV.2 - EG0.2 map-temperature for G-goal (a.) and E-goal (b.) environment. The table above (Tbl. IV.2) clearly shows the different behavior on the 2 mazes. When the agent has to run in the difficult environment (a.) he will only find the first goal and the best path to it, this is the green corridor. When the easier environment (b.) is used the agent will find the way to the second goal, but still prefers the first goal. If the number of experiments increases the agent will learn the way to the other and better goal. The table also gives a good view of the fact that the agent will be lead to the first goal. 2) Epsilon of 0.8 With 80% random actions it s close to a fully random agent. This value lays the focus on exploration, and less on exploitation. The profit of this strategy is that the agent will learn the goals quicker and especially when there are more goals than one. Due to the high chance on random actions the maptemperature will look almost the same like a random agent (Tbl. IV.1). The difference is that there will be a green corridor to the learned goal. Unlike the corridor of EG0.2, the corridor will be wider because the strategy does not focus on exploitation. In the table (Tbl. IV.3) below, the green spot is the wide corridor to the goal. The behavior of both environments is almost the same, and the agent will learn both goals pretty quick. And on the easy map (b.) a little faster than on the difficult map (a.). a. b. Table IV.3 - EG0.8 map-temperature for G-goal (a.) and E-goal (b.) environment. 

3) Epsilon 0.2 against 0.8
The higher the epsilon, the more exploration and the faster the other goals are found. But this can also be a disadvantage. In reinforcement learning we want to learn and use the best path in a short amount of time. An epsilon of 0.8 learns the best path in a short time because random actions are taken 80% of the time; the agent explores more than it exploits the learned policy. When the epsilon is set to 0.2, the agent focuses more on exploiting the learned policy and chooses the best actions according to this policy, although these actions are not always the best possible ones. There is a probability that the agent will learn the second goal in the future, but it can take a lot of time, and that is obviously not what we want. The time taken depends on the difficulty of the used environment, but the general proposition holds. The figure below (Fig. IV.1) illustrates the rewards (y-axis) when the agent runs with a frozen policy over a certain time (x-axis). It shows that both exploration strategies eventually learn the optimal policy; the more random the strategy, the faster the optimal policy is learned.

Figure IV.1 - Received rewards in episodes with different epsilon values on both environments.

C. Boltzmann
The use of the Boltzmann exploration strategy is similar to the use of Epsilon-Greedy. While the trade-off between exploration and exploitation is set by the epsilon in Epsilon-Greedy, Boltzmann uses the temperature in its selection formula. The higher the Q-value, the higher the probability of the action being executed. When the temperature is decreased, Boltzmann resembles exploitation; as the temperature is increased, it resembles exploration. The trade-off for both exploration strategies is set through a parameter, so the difference between the Epsilon-Greedy and Boltzmann experiments is negligible.

V. CONCLUSION
Both exploration strategies are important in the story of exploration, but good exploration requires a correctly chosen epsilon or temperature value. The best way to know the best epsilon or temperature setting is to test the settings you have made. Not every kind of exploration is good on every kind of environment. In a multi-goal environment it is important to explore enough to find each goal quickly and afterwards to switch quickly from exploration to exploitation, so that the best possible path is used once the optimal policy is learned. Epsilon-Greedy and Boltzmann are not the optimal exploration strategies to use in a multi-goal environment due to the amount of actions that are taken randomly. The epsilon or temperature used has an influence on the behavior, but it is difficult to set this value or to change it over time. If we want good results in a multi-goal environment we have to consider an alternative strategy, such as one that first explores the whole map and then exploits the best path (e.g. R-max, E³).

ACKNOWLEDGMENT
We would like to express our gratitude to Tom Croonenborghs and Peter Karsmakers for their support concerning the machine learning concepts, and to Tom Croonenborghs for the support concerning reinforcement learning.

REFERENCES
[1] B. Tanner, A. White, RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments, Journal of Machine Learning Research, vol. 10, 2009.
[2] R. Sutton, A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[3] L. Kaelbling, M. Littman, A. Moore, Reinforcement Learning: A Survey, Journal of Artificial Intelligence Research, vol. 4, 1996.
[4] T. Croonenborghs, Model-Assisted Approaches for Relational Reinforcement Learning, 2009.
[5] C. Watkins, P. Dayan, Q-learning, Machine Learning, vol. 8, 1992.

Service-Oriented Architecture in an Agile Development Environment

Niels Van Kets, Bart De Neuter and Dr. Joan De Boeck

N. Van Kets is a Master of Science student at Katholieke Hogeschool Kempen, Geel, 2440 BELGIUM. B. De Neuter is a Senior Software Architect at Cegeka, Leuven, 3001 BELGIUM. Dr. J. De Boeck is a university lecturer in the Department of Industrial and Biosciences, Katholieke Hogeschool Kempen, Geel, 2440 BELGIUM.

Abstract Within the domain of software development there has always been a keen discussion about combining Service Oriented Architectures (SOA) with Agile methodologies. Both SOA and Agile development strive towards change, but in a different way: Service Oriented Architecture enables business agility, while Agile enables agility during development. Despite this common background, there are still points where Service Oriented Architecture impedes Agile development. This paper starts with an analysis to discover where Service Oriented conditions prevent the use of Agile development methodologies. This is followed by a theoretical study on how to evade these conditions in such a way that Agile methodologies can be used. Afterwards, a proof of concept shows whether it is possible to create a Service Oriented Architecture based on the solutions derived from the theoretical study and the main Agile methodologies. This proof of concept is founded on Representational State Transfer (REST) services, which enable an enterprise to scale its Service Oriented Architecture globally.

Index Terms Agile, Enterprise Service Bus, Representational State Transfer, Service Contracts, Service Oriented, Web Services.

I. INTRODUCTION
Modern enterprises are subject to change these days. Staying alive in competitive domains demands that enterprises are able to change business rules and processes very quickly. Because enterprises are exposed to external influences such as economics, no one can predict exactly when or to what extent changes will be necessary. The only thing a business can do is make sure that changes are welcome and manageable. For IT this means that it must be aligned with the business so that it can react in the same way as the business itself. In a perfect scenario, both architecture and development are able to cope with changes whenever they occur. That is where SOA and Agile development come in. SOA makes sure that the software architecture adapts well to business changes by stressing loosely coupled, highly cohesive business processes. Agile development is a software development methodology that stresses agility within the development cycle and quick, iterative, working software releases.
In practice the two principles are not easily adopted together in one enterprise. There are some SOA constraints that tend to hinder Agile development. This feels unnatural: SOA is an architectural approach and Agile describes how teams have to develop software; these are two completely different things, so they must be compatible somehow.
The Agile development methodology described in this paper is based on Scrum [1] and Extreme Programming (XP) [2]. Both Scrum and XP are derived from the basic Agile principles described in the Agile Manifesto [3]. They both describe the iterative development of software and embrace change within that process. They apply a prioritised approach towards new functionality to implement, meaning that business processes with the highest value are delivered first.
Both Scrum and XP can be mutually used within one team and give a good set of rules to deliver fast, efficient and incremental. Scrum has its focus on avoiding and disentangle obstructions within an Agile development process. Scrum solves these obstructions by using transparent communication between team members mutually and between the team and the customer. Extreme Programming focusses more on the engineering practices Agile teams can use. These practices describe processes or tools to intensify the Agile methods. In this paper we will first discuss what we understand under SOA. Many definitions have manifested all over the web and within the software developer world. Because of the ambiguity, we will first show our interpretation of Service- Oriented Architecture. Next we will destinguish the problems we encountered when trying to develop an SOA with Agile methodologies. Based on each of these impediments we will try to describe how to avoid them while not being inferior to the SOA and Agile principles. Last, we will show that when chosen right, a SOA can be easily developed in an Agile manner and even scale globally very fast. We will try to prove this with our proof of concept based on Representional State Transfer services as described by Thomas Fielding [4]. Cegeka January, 2011 II. SERVICE ORIENTED ARCHITECTURE The need for Service Oriented Architecture is founded on the need to tie the business and IT within a corporate environment together. When used propery, the technology 137

144 behind a SOA can even empower the business instead of putting constraints on it. If software can be made reusable within its business context, this business can gain profit by not having to reinvent the wheel. SOA gives the business the key of doing this on a organizational scale. This way an organisation can cope better with changing business needs. When applied right, a business change will only trigger a recombination of existing services, sometimes with some new services. A. Services Within a Service Oriented Architecture, a service represent one specific part of the business domain, with all its possible methods and functions. This functionality is made available for external services or users through its interface. The following list gives the constraints for services in an SOA. The services are as loosely coupled and autonomous possible, so they can live with a minimal set of dependencies. A service has a high cohesion so it contains only the responsibility of one small part of the business. Services have to be reusable, at least within the business. Services can be combined to represent real business processes. IV. SERVICE CONTRACTS The second impediment we have discovered are service contracts. In an SOA, each service should have its own service contract. This contract describes how other services or users can interact with it. Because these contracts are important to enable the combination of multiple services to one logical business process, SOA tends towards big upfront design. Since multiple services can be dependant on one service contract, it seems better to design these contracts in advance so they won t change too often in a later stadium. This is exactly what Agile development tries to avoid. It strives for a development where functionality is incrementaly delivered. This also means that services and service contracts shoud evolve based on this incremental development. To solve this problem it s important to evaluate the lifecycle of a service developed in an Agile manner. We ve discovered that a service can roughly be in three different stages. Figure 1 describes this lifecycle. This description is very broad-shouldered. Most other SOA definitions bind much more constraints upon an SOA. But these are the general fundamentals of an Service-Oriented Architecture. The other constraints like service registers, business process management, orchestration,... should only be implemented when there is a necessity. Fig. 1. Service Stages III. TOOLS AND FRAMEWORKS The first impediment we encounter when developing an SOA with Agile methodologies are tools and frameworks. Many companies, when adopting to SOA, are doing this based on specialized tools and frameworks. This is because SOAs are mostly described as bombastic solutions to get IT inline with the business. Many claim that SOAs cannot exist without large service registers, huge messaging frameworks (see section V),... Agile demoralizes this kind of approach. The Agile Manifesto [3] even describes that induviduals and interactions have more value than processes and tools. In practice this means that an Agile team always has to implement the simplest thing possible. This encourages lightweight frameworks that deliver only the necessary functionality. A second problem with large frameworks is that they deliver none or marginal business value. Within Agile development, the business drives the implementation. Things that don t deliver business value may not be implemented at this time. 
Although it is a key value of Agile development to postpone tool and framework decisions as long as possible, it is something that is often forgotten. Developers need to consider lightweight frameworks when developing an SOA with Agile development methods. A. Development The first stage starts with the birth of a service and ends at the maturity stage. During this stage the service will start small and grow incrementaly each sprint. During that growth, the service contract will grow and change according to the service itself. If the service is developed seperate from other services, this agile method will succeed. However if multiple services are built at the same time and these services are depending on eachothers contract there is a need for another approach. Suppose service A being developed at the same time as service B. Service A is internally using service B, so the development team of service A will build a stub service based on the contract of service B to simulate the interaction with it. Whenever the service contract of service B changes during development, the team of service A should update their stub service to fulfil the new contract. It is very important for team A to see whenever a contract change breaks existing code, so it can react very quick. In practice it is very important to communicate contract changes. If two services are developed by the same team, the daily team communication will ensure this process. If however two services are developed by two separate teams, there will be need for some written documentation. For example creating a wiki to keep track of service contract changes is a very good practice when working with two teams. Once a service is finished and released, it enters the maturity stage. 138
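To make the stub approach described above concrete: team A can code against an interface that mirrors service B's current contract and keep a trivial stub implementation for its own tests, so that any contract change surfaces as a compile or test failure. The names below are hypothetical, chosen only for illustration and not taken from the paper.

    // Contract of service B as currently agreed (hypothetical example).
    public interface StaffLookupContract {
        String staffNumberFor(long employeeId);
    }

    // Stub used by team A while the real service B is still being developed.
    public class StaffLookupStub implements StaffLookupContract {
        @Override
        public String staffNumberFor(long employeeId) {
            // Canned response mimicking the agreed contract; replaced by the real client once service B is released.
            return "EMP-" + employeeId;
        }
    }

If team B renames or retypes staffNumberFor, team A's stub and every test compiled against it break immediately, which is exactly the early warning described above.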

145 B. Maturity The second stage is the maturity stage. This is the stage in which a service contract is stable and mature. During this time, consumers of the service have to meet with the service contract available. C. Evolution Since a service represents business functionality, it is subject to the same influences a business has to cope with. This means that a service is bound to change in a timely manner. Since one service can have multiple consumers, it is important to make sure that they evolve in pace. This evolution can be a major constraint when multiple consumers are dependent on the provider. Inside an SOA, service contracts provide coupling between services. This coupling should be as loosely possible. Ian Robinson [5] describes in his article on service evolution, three types of contracts. 1) Provider contracts give a complete set of elements available for consumers. A provider contract has a one to one relationship with the service and is authorative. All consumers must agree upon this contract. During maturity the contract is stable and immutable. Evolution of a provider contract will be very exhausting. The service provider has to maintain older versions of services or update all the consumers. 2) Consumer contracts commence when provider contracts take consumer expectations into account. When a consumer contacts a provider, it will send expectations for the provider response. Based on these expectations, the provider will send the subset of business function that the consumer requested. Consumer contracts have a many to one relationship with the provider. These contracts are non-authorative because they don t cope with the total set of provider obligations. Like provider contracts, consumer contracts are stable during maturity. 3) Consumer-driven contracts are slightly different from consumer contracts. They give a better view on the business value a provider exploits at a given time. A consumer-driven contract is always complete. It represents all functionality demanded by all consumers at a given point in time. When consumers need more functionality, the provider contract will be updated. Consumer-driven contracts are singular, but still nonauthorative because they are not driven from the provider. These contracts are stable and immutable when the list of consumers do not change. D. Service Contact Conclusion Robinson describes two specific benifits when it comes to consumer-driven contracts. First, these contracts will only include functionality required by its consumers. As effect, it is always clear what the extend of use is for the provider. The second advantage is that providers and consumers can stay backwards and forwards compatible more easy. Consumerdriven contracts provide a knowledge of which functionality consumers really use. When changes occur, it will be much easier to see if it will affect the consumers. Which makes a provider more manageable and agile within the development context. V. ENTERPRISE INTEGRATION A third problem arises when we talk about enterprise integration. With enterprise integration, service-oriented architects describe the middleware that connects individual business processes and services. In this part we will show that when choosen wrong, an enterprise integration solution can be a real burden on Agile development. A. Enterprise Integration Solutions To describe the most common problem between SOA and Agile development, we have to situate the different integration solutions available. 
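Before turning to enterprise integration, note that consumer-driven contracts are usually made executable as consumer-side tests that assert only the elements the consumer actually relies on. The JUnit sketch below illustrates the idea; the resource, field name and helper method are hypothetical and not taken from the paper.

    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    // Consumer-driven contract check: this consumer only depends on the staffNumber element,
    // so only that expectation is asserted (illustrative sketch, not the authors' code).
    public class EmployeeContractTest {

        @Test
        public void providerExposesTheElementsThisConsumerUses() {
            String representation = fetchEmployeeRepresentation(42L);
            assertTrue("staffNumber is the only element this consumer depends on",
                       representation.contains("\"staffNumber\""));
        }

        // Placeholder for an HTTP GET on the employee resource; a canned payload keeps the sketch self-contained.
        private String fetchEmployeeRepresentation(long id) {
            return "{\"staffNumber\":\"EMP-" + id + "\",\"name\":\"...\"}";
        }
    }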
Figure 2 shows that the two integration solutions both start with common Enterprise Integration Patterns. These patterns are substracted and described from repeated use in practice. Fig. 2. Enterprise integration Solutions Hophe and Woolf [6] identify 65 patterns which are organized in 7 categories: Integration styles describe different ways to integrate systems. These can be file transfer, shared database, remote procedure invocation or messaging solutions. The Enterprise Integration Patterns are specific for the latter. Channel patterns describe the fundamental part of messaging systems. These channels are based on the relation between providers and consumers. Message construction patterns describe the type of message, intent, form and content. Routing patterns describe routes towards different receivers based on conditions. Transformation patterns are responsible for message content changes. Endpoint patterns describe the behavior of clients within the system. System management patterns are solutions for error handling, performance optimalization,... Because Enterprise Integration Patterns, as described above, are bounded to messaging, they are very important in Service Oriented Architectures. The integration between services will exist in exchanging messages to trigger events or exchange data. Since patterns only describe behavior of the different messaging components, Enterprise Integration Frameworks and Enterprise Service Busses cope with the practical use of 139

146 them. Frameworks provide a solution to use integration patterns from within a programmable context. Enterprise Service Busses on the other hand combine all integration rules within one black box and give some extra functionality like security, QoS, BPM,... B. ESB In this part we will describe why Enterprise Service Busses are related to some problems with Agile development, and give a solution to avoid these problems. ESBs hinder agile engineering practices. Agile development uses some engineering practices. Because of an ESB, all engineering practices that involve integration, will be bound to use the ESB. Especially for test driven development and continious integration, this will be a problem. Test evaluations will be less clear because developers do not know what happens inside the ESB. Automated test and build cycles will be hard because they need access to the ESB. Everything that hinders engineering practices, hinders Agile development and should be avoided. C. Lightweight Enterprise Integration We need to find a solution to solve enterprise integration, without reducing agility. Figure 3-A shows both the problem and the solution for enterprise integration. It shows us multiple point to point connections between applications and systems. The big problem with this picture is that the connection between services is hard coded within the service itself. This means that whenever service compositions change, these changes can occur anywhere in the code. This makes it unmanageable. What we need is a combination of both worlds. We do not want an ESB like figure 3-B, but the single point of entry for a service that it uses is a good idea. To solve this problem we can use a virtual ESB as in figure 4. Fig. 3. Enterprise integration: A without ESB, B with ESB ESBs do not allow easy changes. Vendors sell ESBs to provide complete integration solutions for the whole enterprise (see figure 3). All of the integration needs will be bundled within one vendor specific black box. This means that whenever an enterprise buys itself an ESB, it gets a vendor lockdown with it for free. This vendor lockdown makes development less agile because they are bound to use the vendor s solution. Changing or replacing an ESB will be a difficult task and therefore developers will try to avoid that. This is completely against Agile development. Within Agile development allowing change is a necessity, anything that works against change has to be avoided. ESBs are under separate control. Because ESBs span the whole enterprise, ESBs are mostly managed by a separate team. This is alien to the cross functional teams that Agile describes. These teams are responsible for the complete implementation of business needs, this includes the integration of services. With an ESB, these Agile teams will be slowed down because they have to communicate with other teams before they can integrate services. ESBs are not incremental. Vendors sell ESBs based on the analysis of an enterprise. They analyse the business integration needs, make a design, and based on that design they try to convince the organization to buy their solution. This type of development is reffered to as the waterfall approach. Because they try to implement all at once, it will become a very difficult and long development to make an ESB work. Fig. 4. Virtual ESB With a virtual ESB each service is connected, with its single point of entry, to its own small integration service. This piece of software will connect all the services where its service relies on. 
Typically this connection relies on two parts: Provide routing to the other services. Transform service messages to a uniform messaging format between the services. This is the part where Integration Frameworks show up. The virtual ESB will only need a combination of Enterprise Integration Patterns. Frameworks can handle the practical implementation of these patterns, without creating extra overhead like ESBs. This way we create a Service Oriented Architecture that can cope with Agile development. Teams who implement a service will also build the virtual ESB component that takes care of the enterprise integration. This way, the Agile team members are responsible for the whole service, including the 140

147 integration. The uniform message format makes sure that each service talks the same business language. It also contributes to the agile engineering practices like test driven design and continious integration. In their presentation about enterprise integration, Martin Fowler and Jim Webber [7] describe another approach based on the World Wide Web defined by Tim Berners-Lee. They suggest using the internet as middleware. The HTTP protocol used on the internet can be interpreted as a globally deployed coordination framework. This coordination comes from the use of status codes (404 Not found, 200 OK,...). But most important of all, the internet is incremental. It accepts new and small pieces to be implemented. This gives a great advantage when using Agile development methodologies. VI. PROOF OF CONCEPT WITH REST SERVICES In previous sections we have seen some constraints when combining a SOA with an Agile development approach. Based on these concerns we have decided to build a proof of concept based on Representional State Transfer (REST) services and Java. REST services, as described in Thomas Fieldings dissertation [4], use the WWW as middleware to exchange resource representations over a uniform interface. A resource is something interesting made public throughout the WWW by one or more representations. Each resource representation has at least one Uniform Resource Identifier (URI). This is the address on which a resource representation is available and obliging. An important property of the REST style is the uniform interface. Each interface will communicate over HTTP with the common HTTP verbs (POST, GET, PUT, DELETE,...). When talking about resources these HTTP verbs can have a one to one match with a create, read, update, delete (CRUD) lifecycle of the resource. In our proof of concept (PoC), we have built three separate services (figure 5). Each service will handle the lifecycle of a common resource applying product backlogs, sprint backlogs and tracking mechanisms, and Agile engineering practices like TDD, continious integration, pair programming and refactoring. B. Tools and frameworks In an Agile development cycle it is very important to be able to change. Heavy tools and frameworks put constraints on this agility, so we had to find lightweight tools and frameworks to enable this change. With this constraint in mind we ve chosen to start our project with Maven [8] and Spring [9]. Maven is used to simplify build processes, dependency management, simplify unit testing, and many more. With Maven enabled, there is no need to make build and test scripts to test and build projects, everything is automated. This is important for an Agile team, because build scripts don t add any business value to the project, and should be avoided. Spring on the other hand is a very large framework, but we only choose to start with the Inversion of Control (IoC) container it provides. The IoC container eliminates the need of singletons and factories within our code by delivering them at runtime. Both frameworks reduce the effort and complexity of starting and maintaining a Java based project. While developing we will certainly need more tools and frameworks, we ve always tried to take lightweight frameworks that allow changes as much as possible. For example Hibernate [10] allows us to change our database very easy by just changing a few lines of configuration code. The switch from a MySQL database to an Oracle database should only take about 10 minutes. C. 
Service Contracts For this project we are working with one team. This means that we can rely on our own tests to see if contract changes break functionality. There is no need for wiki pages to advert our changes. As we have discovered in our service contract chapter, it is better to use consumer-driven contracts. Because we use Agile methodologies, our requirements are described in user stories. These stories tell us what users (in our case consumers) should be able to do. This means that the Agile methodologies are already driven from consumer expectations. Based on these expectations, we should implement our service contracts and services. In our case we have used the enunciate [11] package. This packages automatically generates service contracts based on Web Application Description Language (WADL) [12]. Fig. 5. Proof of concept This way we have only built code that our business realy needs. Services will only dispose functionality consumers realy need, being consumer-driven. A. Agile One important constraint for our PoC was to use Agile methodologies. This means using Agile project management D. Enterprise Integration The enterprise integration part was probably the most important constraint in our implementation. Our PoC consists 141

of three services, which are all consumable through a REST interface. The employee and company service are both implemented on site A, while the project service is implemented on site B. Because we use the HTTP protocol to communicate with the services, it becomes very easy to integrate them. The HTTP protocol is globally available and well known, so by using the WWW we are actually using a middleware platform that scales globally. This is the main advantage of REST services.
There was still one flaw within our implementation. On the project side we can put our employees on different projects. We do this by using an aggregate of the employee (the staff number) and attaching it to a project. There is one situation where a problem arises: when we delete an employee at site A, this employee should be removed from all the projects he is in, and this should happen asynchronously. When an employee is deleted, our employee service should notify the project service as soon as possible to delete all aggregates. When the project service is down, however, we do not want our employee service to fail; instead, it should send the request as soon as the project site is available again. This is what we mean by asynchronous. To do this, we have chosen to use an enterprise integration framework, called Camel [13], at the employee service. This framework saves an XML file on the employee service hard disk and tries to deliver it as soon as possible to the project site. This delivery also goes through REST, which means that our project service does not need extra configuration or frameworks to handle the request.
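A minimal sketch of this store-and-forward delivery using Camel's Java DSL is shown below; the directory name, target URI and retry settings are our own illustrative assumptions, not the configuration used in the proof of concept.

    import org.apache.camel.builder.RouteBuilder;

    // Store-and-forward delivery of employee-deleted events to the project service (illustrative sketch).
    public class EmployeeDeletedRoute extends RouteBuilder {
        @Override
        public void configure() {
            // Keep retrying until the project site is reachable again, waiting 30 s between attempts.
            errorHandler(defaultErrorHandler()
                    .maximumRedeliveries(-1)
                    .redeliveryDelay(30000));

            // XML files written by the employee service are picked up and sent to the project service over HTTP.
            from("file:data/outbox/employee-deleted")
                .to("http://projects.example.org/projects/removeEmployee");
        }
    }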
VII. CONCLUSION
Service-Oriented Architecture is often misunderstood because of the ambiguity that has emerged throughout the years. People forget that Service-Oriented Architecture is all about services that are reusable and discoverable. All the other constraints that vendors or businesses link with SOA are a burden most of the time. We have discovered that Agile is all about the less-is-more principle when it comes to choosing proprietary frameworks and tools. Developers should only implement the things that deliver business value, and this as simply as possible.
The second impediment when combining SOA with Agile development are service contracts. Since businesses are bound to change, services are bound to change. Every service should have its own service contract, which means that service contracts should be able to change too. Nobody can foresee to what extent services will have to change in the future, so there is no way of making a service contract upfront that lasts a lifetime. We have described a service lifetime that starts with development, goes to maturity, and from maturity can go to evolution and back whenever change is needed. Within development there are two main cases. When one team works on all services, the daily communication will make sure that contract evolution is managed. When multiple teams work on dependent services, there is a need for written documentation (a wiki) or extensive communication between those teams to exchange contract changes. When a service evolves, we have found that consumer-driven contracts are more manageable within an Agile context: they always show how a provider is used by its consumers and make it easier for both to stay compatible.
The third and last described impediment is enterprise integration. We have shown that vendors try to sell solutions for the whole enterprise which do not scale very well. Their famous ESBs are hard to implement because they use a big-bang approach which hardly ever works the first time. These ESBs make our enterprises less agile because integration changes become very costly and time consuming. They also stimulate a single team that copes with the enterprise integration, which makes it difficult for other Agile teams to estimate the duration of an implementation. As a solution for this problem we have described a lightweight enterprise integration approach with a virtual ESB. In this setup, each team developing a service also integrates that service with the other services: they glue a small amount of integration code onto their service to interact with the others. This interaction happens in the form of messaging in a uniform language.
As a general conclusion we can say that Service Oriented Architectures and Agile development are combinable. They both stress agility in different contexts, but when applied well they reinforce each other.

REFERENCES
[1] K. Schwaber, M. Beedle, Agile Software Development with Scrum, Prentice Hall.
[2] K. Beck, C. Andres, Extreme Programming Explained: Embrace Change, Pearson, 2nd edition.
[3] K. Beck, M. Beedle, A. van Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A. Hunt, R. Jeffries, J. Kern, B. Marick, R. C. Martin, S. Mellor, K. Schwaber, J. Sutherland, D. Thomas, Manifesto for Agile Software Development.
[4] R. T. Fielding, Architectural Styles and the Design of Network-based Software Architectures, University of California, Irvine.
[5] I. Robinson, Consumer-Driven Contracts: A Service Evolution Pattern, ThoughtWorks Ltd.
[6] G. Hophe, B. Woolf, Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Addison-Wesley Professional.
[7] M. Fowler, J. Webber, Does my Bus look big in this?, ThoughtWorks Ltd.
[8] Apache Software Foundation, Apache Maven Project.
[9] Spring.
[10] JBoss Community, Hibernate, Relational Persistence for Java.
[11] Codehaus, Enunciate.
[12] W3C, Web Application Description Language.
[13] Apache Software Foundation, Apache Camel.

149 Detection of body movement using optical flow and clustering Wim Van Looy 1, Kris Cuppens 123, Bart Vanrumste IBW, K.H. Kempen (Associatie KULeuven), Kleinhoefstraat4, B-2440 Geel, Belgium 2 Mobilab, K. H. Kempen Kleinhoefstraat4, B-2440 Geel, Belgium 3 KULeuven,ESAT, BioMed Celestijnenlaan 4, B-3000, Heverlee Abstract In this paper we investigate whether it is possible to detect movement out of video images recorded from sleeping patients with epilepsy. This information is used to detect possible epileptic seizures, normal movement, breathing and other kinds of movement. For this we use optical flow and clustering algorithms. As a result, different motion patterns can be extracted out of the clustered body parts. Keywords Epilepsy, optical flow, spectral clustering, movement, k-means I. INTRODUCTION Epilepsy is still being researched in the medical world. It is very hard to find a specific cure for the disease but momentarily around 70% to 75% can be treated with medicines or specific operations. In a way to find new insights in these types of attacks, monitoring is important. Nowadays different methods have been proposed and developed to detect and examine epileptic seizures [10]. Mostly the video EEG standard is used. Neurologists monitor the patient using cameras and EEG electrodes [14]. They can look at the behavior and on meanwhile compare the reaction of the brain in an EEG chart. This is an effective but uncomfortable way of monitoring a patient. Electrodes have to be attached which consumes a lot of time, it is uncomfortable for the patient and it is quite hard to do other activities while being monitored. Next, medical staff is required and it is not possible to monitor a patient for a longer period. In this paper we explain a new approach for detecting epileptic seizures with body movement, using video monitoring. Our next goal is to achieve an accurate detection of the seizures using simple and well priced hardware. With a simple infrared camera and a computer it should be possible to make a detection out of the video images featuring a resolution of 320 x 240 pixels. The use of simple hardware requires intelligent processing as video data is computationally intensive. We investigate which algorithms are efficient on our video data. Previous research showed us that clustering algorithms can be used to cluster optical flow results for image segmentation [9]. In this paper, algorithms as optical flow and spectral clustering are tested on video recordings. The first algorithm, optical flow, is a motion detection algorithm that is capable of calculating vector fields out of two video frames. For this we use the Horn-Schunck method [15], this is discussed in section II. Optical flow calculates the movement of each pixel using parameters as brightness and contrast. Each vector contains the velocity and direction which allows us to extract information necessary for a seizure detection. This is the main reason why this algorithm is chosen. Other motion detection techniques such as background subtraction or temporal differencing do not give us information about the velocity and position of the movement. Next, clustering is used to cluster the different features that are given by the optical flow algorithm. The goal is to separate different body parts and measure their velocity and direction to make an accurate prediction of the movement. The monitoring of respiration is also possible using our method. The algorithms are applied using Matlab. 
Section II explains how the clustering can be optimized, how threshold calculation is done and which standards we used to come to our conclusions. In the third section we explain the results and how these results are influenced. Finally in the last section a vision is given on possible future improvements. II. METHOD A. Video specifications The video data we use is recorded in Pulderbos epilepsy centre for children and youth in Antwerp. It is compressed using Microsoft s WMV compression. The video has a resolution of 720 x 576 with a frame rate of 25 frames per second. The camera is positioned in the upper corner of the room, monitoring a child sleeping in a bed. It is possible that the person is covered by a blanket so the different body parts are not always visible. This shouldn t be a problem in the final result as the body movement connects with the blanket. The image is recorded in grayscale using an infrared camera, no RGB information is available. Before we are able to use the data for our optical flow algorithm, the video sequences are reduced in size and frame rate. We downsize the image to a 320 x 240 pixels which contains enough detail to apply the optical flow. The frame rate is also lowered to a 143

12.5 frames per second. Due to these reductions, processing time has decreased significantly with only a minor loss of detail.

B. Optical flow
The next step is to deliver the reduced video data to the optical flow algorithm. The algorithm calculates a motion field out of consecutive frames: a vector for every pixel, characterized by the direction and magnitude of the movement in the video. Mathematically, a translation of a pixel with intensity I over time can be written as follows:

I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (1)

Using differential methods, the velocity in both the x and y direction is calculated. The Horn-Schunck method uses partial derivatives to calculate the motion vectors. It has a brightness constancy constraint, meaning that the brightness is assumed to stay constant over a certain time span; this is used to track pixels from one image to another. Horn and Schunck also use a smoothness constraint: in case of an unknown vector value, the algorithm assumes that its value will be similar to the surrounding ones. It is important to supply video with rigid objects and a good image intensity, otherwise the algorithm responds less accurately. By default the algorithm calculates movement out of two consecutive frames. It is possible to use frames over a bigger time span to emphasize the movement (e.g. to monitor respiration, which is a slower movement).
First we need to specify the smoothness and the number of iterations. The smoothness factor we have chosen is 1. This value is proportional to the average magnitude of the movement and also depends on the noise level; in our approach the factor is determined experimentally. Next, the number of iterations has to be specified. When this number is higher, the motion field is more accurate and noise is reduced; the downside is that more calculations have to be done. Our experience showed that 1000 iterations were ideal for both accuracy and speed.
As a result, the optical flow algorithm produces a motion vector field. The information is stored in a matrix containing a complex value for each pixel. Using the absolute value, the magnitude is found; the angular value of the vector gives the direction of the moving pixel. When this information is visualized, some noise in the signal becomes visible. This noise is the result of motion indirectly caused by the camera. A threshold is calculated to eliminate this noise (e.g. Fig. 1). The maximum amplitude for each frame is plotted, and the result is compared to movie sections without movement. The maximum amplitude of these sections is used as the threshold.

Fig. 1 Maxima of the magnitude of the optical flow calculations. Video length is 10 seconds. Noise is visible at low magnitude values; actual movement is indicated by clearly higher magnitude values.

For all magnitude values beneath this threshold, the matching motion vector is replaced by a zero and ignored in any further calculations.

C. Clustering
The next step in this process is to cluster the vector field. A clustering algorithm makes it possible to group objects or values that share certain similarities or characteristics. In this approach we cluster pixels belonging to one moving part in the image, having the same direction, speed and location. This results in different body parts moving separately from each other. Different clustering methods are available to make a classification. We tested several spectral clustering methods and also tested the standard Matlab k-means clustering on our dataset.
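As an aside, the noise-threshold step of subsection B can be summarised in a few lines of code. The original processing was done in Matlab; the Java sketch below is only meant to make the logic explicit, and the array layout is an assumption.

    // Zeroes optical-flow vectors whose magnitude stays below the noise threshold (illustrative sketch).
    public class FlowNoiseThreshold {

        // u and v hold the horizontal and vertical flow component per pixel; the threshold is the
        // maximum magnitude observed in video sections without movement.
        public static void suppressNoise(double[][] u, double[][] v, double threshold) {
            for (int y = 0; y < u.length; y++) {
                for (int x = 0; x < u[y].length; x++) {
                    double magnitude = Math.hypot(u[y][x], v[y][x]);   // |u + jv|
                    if (magnitude < threshold) {
                        u[y][x] = 0.0;                                 // below the noise floor:
                        v[y][x] = 0.0;                                 // ignored in further calculations
                    }
                }
            }
        }
    }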
We also used the Parallel Spectral Clustering in Distributed Systems toolbox provided by Wen-Yen Chen et al. [3] This toolbox for Matlab provides different clustering methods. Eventually the results of the k-means clustering algorithm provided by Matlab proved best. Both accuracy and speed scored very well in our opinion. More info is given in the next section. Before we apply the clustering, certain features had to be extracted from our optical flow field. D. Clustering features A clustering algorithm needs features to classify objects in different clusters. We tested the algorithms with different features, starting by the following three. Magnitude Direction Location The magnitude can be found by taking the absolute value of the vectors. It represents the strength of the movement. For example as the patient strongly moves his head from the right to the left, this will result in vectors with high magnitude for the pixels that correspond with the location of the head. If the patient moves his hand at the same moment, these pixels will also have the same magnitude but have a different location and are therefore grouped in two different clusters when two clusters are specified. As a second feature the direction was used. We use the radian angular value of the vectors by default. It can be converted to degrees which makes no difference for the algorithm. Due to the scale of 0 to 360, phase shifting 144

151 occurred. Two pixels pointing towards 0 or 360 have the same physical direction. As a feature this is falsely interpreted by the clustering algorithm, resulting in bad classification as shown in figure 2-B. A Fig. 2 Phase shift results in bad clustering, clusters are covering each other. Direction is plotted in radians. A solution to this problem is to split up the angular feature in two parameters. When the imaginary and real part are divided by the magnitude of the complex vector, the magnitude is suppressed and the direction is now given as a coordinate on the unit circle. Phase jumps have been eliminated but one feature is replaced by two features which has consequences for weight of this feature. This is discussed in section E. As a result, this feature will cluster all the movements that point to a single direction, independent of the location or magnitude. The third feature consists of the location of the movement vector. We use the coordinate of the pixel in the x-y plane. This feature is used to make a distinction between movements that occur in different parts of the image. E. Clustering algorithm First an appropriate clustering algorithm needs to be specified. The algorithms we investigate are [3]: spectral clustering using a sparse similarity matrix spectral clustering using Nyström method with orthogonalization spectral clustering using Nyström method without orthogonalization k-means clustering Method 1: the spectral clustering method using a sparse similarity matrix. This matrix gives the Euclidean distances between all data points. [2,18] This type of clustering gives good results with higher number of clusters. Nevertheless it requires too much processing time and it gave bad results with two or three clusters. The calculation of the sparse similarity matrix would take half a minute. (e.g. Table I) This is caused by the high amount of data our image features can contain. This calculation requires too much processing time which is not feasible on a normal pc system. Method 2 and 3: using the Nyström method with or without orthogonalization, the processing time significantly decreased (e.g. Table I). This is because Nyström uses a fixed number of data points to calculate the similarity matrix. Around 200 samples are compared to the rest of the data B points to calculate the matrix. The cluster quality is more than average and is usable for further processing. No difference in quality was noticeable between both Nyström methods but without the orthogonalization, the clustering is faster. Method 4: k-means first randomly specifies k center points. Next it calculates the Euclidean distance between these centroids and the data points. Data points are grouped with the closest center point. By repeating this step and calculating new center points for the current clusters, it tries to maximize the Euclidean distance between clusters. Using the selected features, k-means provided good results. It requires minimal processing time (e.g. Table I) and gives an accurate clustering. We will continue our research using this algorithm. TABLE I CLUSTERING SPEED COMPARISON. Time needed for clustering one frame ( average of 20 iterations) Pentium D 830 3Gb Ram, Windows 7 system Clustering type Processing time Spectral Clustering 39s Nyström with orthogonalization s Nyström without orthogonalization s K-means s F. K-means and scaling of the features The next step applies the features to the clustering algorithm that will cluster our data. 
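Put together, subsections D and E amount to building, for every moving pixel, a feature vector consisting of its location, its direction as a point on the unit circle and its magnitude, each multiplied by a weight. The sketch below shows that construction; the weight parameters correspond to the scaling discussed next, and the class and method names are our own illustration rather than code from the paper.

    // Builds the weighted k-means feature vector for one optical-flow vector (illustrative sketch).
    public class ClusterFeatures {

        public static double[] featureVector(int x, int y, double u, double v,
                                             double wLocation, double wDirection, double wMagnitude) {
            double magnitude = Math.hypot(u, v);
            // Representing the direction as (cos, sin) on the unit circle avoids the 0/360 degree phase jump.
            double cos = magnitude > 0.0 ? u / magnitude : 0.0;
            double sin = magnitude > 0.0 ? v / magnitude : 0.0;
            return new double[] {
                wLocation * x, wLocation * y,        // where the movement happens
                wDirection * cos, wDirection * sin,  // in which direction it points
                wMagnitude * magnitude               // how strong it is
            };
        }
    }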
Without scaling, the x- coordinate varies from 0 to 320 while the magnitude of the complex vector varies between 0 and 1. A scale has to be applied to define which feature needs more weight and which one needs less weight. In order to find good weights, we use visual inspection instead of a mathematical approach. This method is easier to use and provides clear understanding of the used data and algorithms. Features are plotted on a 3D plot. Coordinates on the x- and y-axis and the other feature on the z-axis (e.g. Fig. 3). A Fig. 3 Features direction (A) and magnitude (B) plotted versus x- and y- coordinates. The direction feature has a larger weight, the algorithm clusters on this feature as the data points have a bigger variation and more weight. Plot (B) shows the impact of this clustering on the magnitude equivalent. This visualization makes it easier to see the impact of the scaling. Adapting these scales soon lead to the appropriate B 145

clustering. The final result should be as follows: pixels are grouped when they differ in location, intensity of movement and direction. Ideally the clusters cover different body parts.

G. Cluster analysis
The next issue is providing the number of clusters before the clustering. Every movement is different, which gives a varying number of clusters. We use inter and intra cluster distances to check the quality of the clustering [17]. The inter cluster distance is the distance between the clusters; the bigger this value, the better the clustering, as there is a clear distinction between the clusters. The intra cluster distance is the average distance between the centroid and the other points of the cluster; it should be minimized to obtain compact clusters. In this research a maximum of four clusters is common, and one cluster is of course the minimum, which occurs when a person only moves an arm, for example.

H. Defining thresholds
All frames of the test video are clustered up to four clusters. Several features are extracted out of the distance measures. The most important features are the following four: the maximum overall inter cluster distance, the standard deviation of the intra cluster distance for two clusters, the mean of the intra cluster distance for two clusters, and the maximum of all intra cluster distances for one frame. The frames are labeled with the right number of clusters and compared to the selected features. Our goal is to find similarity between the labeled frames and the features. The comparison showed us that the maximum inter cluster distance and the standard deviation for two clusters give the best results. Next, the labeled frames are classified using the selected features (e.g. Fig. 4). The classification is based on Euclidean distance.

Fig. 4 This graph shows the classification between 1 or 2 clusters.

Next, thresholds can be defined using a training and test set to specify how many clusters should be used. The quality of the thresholds is tested on sensitivity, specificity, positive predicted value (PPV) and negative predicted value (NPV). These measures are stated below (for the distinction between class 1 and class 2):

Sensitivity = TP / (TP + FN)    (2)
Specificity = TN / (TN + FP)    (3)
PPV = TP / (TP + FP)    (4)
NPV = TN / (TN + FN)    (5)

TP stands for true positives, FN for false negatives, TN for true negatives and FP for false positives. The results of the application of the thresholds can be found in section III.

III. RESULTS

A. Cluster analysis results
To find the right number of clusters, thresholds are defined. First the video is labeled (i.e. the right number of clusters is added manually to each frame). This is done for three videos. Using the maximal inter cluster distance, the standard deviation of the intra cluster distance for 2 clusters and the right number of clusters, the data is trained. Our training set consists of 80 frames; the rest of the data is used as test set (i.e. 56 frames). Using classification in Matlab, the thresholds are calculated. The thresholds are tested using the test set, and the results are shown in Table II.

TABLE II
CLUSTER ANALYSIS RESULTS FOR THE DISTINCTION BETWEEN ONE OR TWO CLUSTERS AND BETWEEN TWO OR THREE CLUSTERS.

                  Class 1 vs. 2    Class 2 vs. 3
Sensitivity       90.90%           93.30%
Specificity       62.50%           38.90%
PPV (class A)     83.30%           71.80%
NPV (class B)     76.90%           77.80%

All of these values should be as high as possible. Results are discussed in section IV.
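Worked out in code, the four quality measures of equations (2)-(5) follow directly from the confusion-matrix counts; the counts used in the example below are hypothetical.

    // Classification quality measures from confusion-matrix counts, as defined in equations (2)-(5).
    public class ClusterMetrics {

        public static double sensitivity(int tp, int fn) { return (double) tp / (tp + fn); }
        public static double specificity(int tn, int fp) { return (double) tn / (tn + fp); }
        public static double ppv(int tp, int fp)         { return (double) tp / (tp + fp); }
        public static double npv(int tn, int fn)         { return (double) tn / (tn + fn); }

        public static void main(String[] args) {
            int tp = 18, fn = 3, tn = 12, fp = 5;   // hypothetical counts for one class separation
            System.out.printf("sensitivity=%.1f%%  specificity=%.1f%%  PPV=%.1f%%  NPV=%.1f%%%n",
                    100 * sensitivity(tp, fn), 100 * specificity(tn, fp),
                    100 * ppv(tp, fp), 100 * npv(tn, fn));
        }
    }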

153 B. Movement analysis In this section the results of our study are presented. In the following movie sequence a young patient is monitored, she randomly moves her head from the left side to the right side of the bed (e.g. Fig. 5-AC). A B A C B D Fig. 5 Sample of two frames, in frame 41 the head is moving towards the left side. The red colour and the arrows indicate this. In frame 45 the head is moving towards the right side. Screenshots (B) and (D) show the clustering. Both clusters that cover the head contain different movement information. We cut 60 frames or five seconds out of the video. Two clusters are selected, one cluster represents the head, the other cluster features the lower body part. The clusters are segmented using the standard k-means clustering method. The Nyström method would give similar results. For both body parts, direction and intensity of the movement are plotted. It can be seen that the direction plot of the head crosses the horizontal axis multiple times (e.g. Fig. 6- A). In the example the head is moving towards the left side and next to the right side. This can be seen as -120 (left) and 15 (right) (e.g. Fig. 6-A). A Fig. 7 The plot above shows the direction and intensity of the lower body part movement. This information is provided by the cluster that covers the lower body part. This information can be used to conclude whether or not a patient is simply moving or having a seizure. Strong movement and fast change in direction are signals that can point to seizures. This needs to be studied in the future to find measures that confirm this. The next example shows a sleeping patient. The aim of this test was to measure the breathing of the patient. Originally the breathing was monitored using sensors attached to the upper body. Now we can monitor this using video detection. For this test, the algorithm uses one cluster. The plot shows 20 seconds or 250 frames of the original video. The breathing is clearly visible in the signal as the angle changes 180 degrees each sequence (e.g. Fig. 8). In the intensity plot is seen that inhaling produces slightly more movement than exhaling (e.g. Fig. 8-B). A B B Fig. 6 The plot above shows the direction and intensity of the head movement. This information is provided by the cluster that covers the head. Figure 6-B plots the intensity of the movement. Figure 7 gives information about the direction and intensity of movement for the lower body part. Fig. 8 Breathing monitored and visualised with angular movement and intensity. The breathing pattern is clearly visible. In future work the monitoring of respiration should be combined with nocturnal movement. This will not be easy as the magnitude of the respiration is much smaller compared to the magnitude of movement. IV. DISCUSSION We discuss several possible improvements for our method (e.g. automated cluster scaling and improved cluster analysis ). 147

154 It would be interesting to test a system that scales the features depending on the situation. As a limited number of frames provide poor clustering quality, automatic scaling might solve this problem in these cases. Sometimes other features should have more weight than others. E.g. as the complete body is moving, the location feature might be emphasized a bit to have better distinction of the clusters. Cluster analysis provides a good automated distinction between one, two or three clusters. The difference between two or three is less accurate but good in certain circumstances where the inter cluster distance has a higher value. The specificity of class three is a bit too low. The chances are 61.10% that a frame should be clustered using two clusters while being labeled in class 3 (e.g. Table II). It is real that frames belonging to class 2 are falsely classified in class 3. Note that it is possible that they are classified correctly as the labeling sometimes has different correct number of clusters. As a conclusion we can say that the algorithm is able to make a distinction between one, two or three clusters, but the right amount of clusters needs to be supplied manually when four or more clusters are required. Other cluster features have to be searched in the future. Next, in our approach, the ideal number of clusters is defined before information is extracted out of the clusters. On a system with enough power, different cluster properties (intensity, direction ) can be compared in time to see which number of clusters is ideal. E.g. if a body part is moving in a certain direction with velocity x, it is expected that the same body part will still be moving in the same direction the next frame but increasing or decreasing its velocity x. This way you can expect a cluster at the same location of the previous frame featuring slightly different properties. Increasing the frame rate of the video will make clusters change more gradually but more system power is needed. This method can be described as cluster tracking. We also experimented with pixel intensity of the original video pixels as a feature but this resulted in bad clusters. This is because the pixel intensity is not directly related to the movement of the patient. The light source stays at the same position in time. Therefore it was not studied any further. Finally, optical flow sometimes has problems with slow movement as the magnitude of these movements becomes comparable to the magnitude of noise. This requires an adjustment of the optical flow settings. Optical flow needs to be calculated on two frames, the current frame and a frame shifted in time. This could be improved, measuring the average movement over a fixed period in order to calculate which frame in the past should be used to calculate the optical flow. This way it is possible to have an accurate motion field with less noise. V. CONCLUSION The research learned us that it is possible to cluster movement in video images. Different body parts can be separated using the location, direction and intensity of the movement. Out of these clusters further information can be extracted whether or not the patient is having epilepsy seizures, is breathing etc. Of course this method still has room for improvement. VI. ACKNOWLEDGMENTS Special thanks to the Mobilab team and KHKempen to make this research possible. REFERENCES [1] Casson, A., Yates, D., Smith, S., Duncan, J., & Rodriguez-Villegas, E. (2010). Wearable Electroencephalography. 
[1] Casson, A., Yates, D., Smith, S., Duncan, J., & Rodriguez-Villegas, E. (2010). Wearable Electroencephalography. IEEE Engineering in Medicine and Biology Magazine, 29(3), 44.
[2] Chen, W.-Y., Song, Y., Bai, H., Lin, C.-J., & Chang, E. Y. (2008). Parallel Spectral Clustering in Distributed Systems. Lecture Notes in Artificial Intelligence, Vol. 5212.
[3] Chen, W.-Y., Song, Y., Bai, H., Lin, C.-J., & Chang, E. Y. (2010). Parallel Spectral Clustering in Distributed Systems toolbox. Retrieved from
[4] Cuppens, K., Lagae, L., Ceulemans, B., Van Huffel, S., & Vanrumste, B. (2009). Automatic video detection of body movement during sleep based on optical flow in pediatric patients with epilepsy. Medical and Biological Engineering and Computing, 48(9).
[5] Dai, Q., & Leng, B. (2007). Video object segmentation based on accumulative frame difference. Tsinghua University, Broadband Network & Digital Media Lab, Dept. of Automation, Beijing.
[6] De Tollenaere, J. (2008). Spectrale clustering. In Zelflerende Spraakherkenning (pp. 5-18). Katholieke Universiteit Leuven, Leuven, Belgium.
[7] Fleet, D. J., & Weiss, Y. (2005). Optical Flow Estimation. In N. Paragios, Y. Chen, & O. Faugeras (Eds.), Mathematical Models for Computer Vision: The Handbook. Springer.
[8] Fuh, C.-S., & Maragos, P. (1989). Region-based optical flow estimation. Harvard University, Division of Applied Sciences, Cambridge.
[9] Galic, S., & Loncaric, S. (2000). Spatio-temporal image segmentation using optical flow and clustering algorithm. In Proceedings of the First International Workshop on Image and Signal Processing and Analysis (IWISPA). Zagreb, Croatia.
[10] International League Against Epilepsy. (n.d.). Epilepsy resources. Retrieved from
[11] Lee, Y., & Choi, S. (2004). Minimum entropy, k-means, spectral clustering. ETRI, Biometrics Technology Research Team, Daejeon.
[12] Nijsen, N. M., Cluitmans, P. J., Arends, J. B., & Griep, P. A. (2007). Detection of Subtle Nocturnal Motor Activity From 3-D Accelerometry Recordings in Epilepsy Patients. IEEE Transactions on Biomedical Engineering.
[13] Raskutti, B., & Leckie, C. (1999). An Evaluation of Criteria for Measuring the Quality of Clusters. Telstra Research Laboratories.
[14] Schachter, S. C. (2006). Electroencephalography. Retrieved from
[15] Schunck, B. G., & Horn, B. K. (1980). Determining Optical Flow. Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge.
[16] Top, H. (2007). Optical flow en bewegingillusies. University of Groningen, Faculty of Mathematics & Natural Sciences.
[17] Turi, R. H., & Ray, S. (1999). Determination of Number of Clusters in K-Means Clustering and Application in Colour Image Segmentation. Monash University, School of Computer Science and Software Engineering, Victoria, Australia.
[18] Von Luxburg, U. (2007). A Tutorial on Spectral Clustering. Statistics and Computing, 17(4).
[19] Xu, L., Jia, J., & Matsushita, Y. (2010). Motion Detail Preserving Optical Flow Estimation. The Chinese University of Hong Kong / Microsoft Research Asia, Hong Kong.
[20] Zagar, M., Denis, S., & Fuduric, D. (2007). Human Movement Detection Based on Acceleration Measurements and k-NN Classification. University of Zagreb, Zagreb.
[21] Zelnik-Manor, L. (2004, October). The Optical Flow Field. Retrieved from

A Comparison of Voltage-Controlled Ring Oscillators for Subsampling Receivers with ps Resolution

S. V. Roy 1, M. Strackx 2,3, P. Leroux 1,2
1 K.H.Kempen, IBW-RELIC, Kleinhoefstraat 4, B-2440 Geel, Belgium
2 K.U.Leuven, Dept. Elektrotechniek ESAT, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
3 SCK CEN, EHS-RDC, Belgian Nuclear Research Institute, Boeretang 200, B-2400 Mol, Belgium

Abstract: A comparison between different stages in ring oscillators is presented, both in terms of jitter and phase noise, in order to find the architecture with the least jitter and phase noise. A multi-path and a single-path ring oscillator are also compared. From the simulation results we conclude that a multi-path differential architecture produces less phase noise and jitter than a single-path differential architecture, owing to the lower RMS value of its impulse sensitivity function.

Index Terms: Voltage Controlled Ring Oscillator, design methodology, jitter, phase noise, multi-path.

I. INTRODUCTION

These days, ring oscillators are used in many applications, such as clock recovery circuits for serial data communications [1]-[4], disk-drive read channels [5], [6], on-chip clock distribution [7]-[10], and integrated frequency synthesizers [10], [11]. In these applications the ring oscillator is typically a voltage-controlled ring oscillator (VCRO).

The use of Ultra-Wideband (UWB) radar has proven useful in the biomedical field. It can be used for monitoring a patient's breathing and heartbeat [12], for hematoma detection [13] and for 3-D mammography [14]. In [15], the use of UWB radar for measuring tissue permittivity is described, with the intention of applying the technique to radiotherapy; a measurement setup using time domain reflectometry (TDR) is proposed there. For this work, changes in the electromagnetic parameters of the material must be derived from the reflected UWB pulse. A digital equivalent-time sampling UWB receiver architecture is proposed in [16]. It specifies a resolution of 11 bits or more for the ADC, and a sampling clock with a jitter of at most 1 ps. This sampling clock will be provided by a VCRO in a Phase-Locked Loop (PLL). Fig. 1 shows that for each new period a certain delay τ is added, so in each period a sample is taken a time τ later than in the previous one.

Fig. 1 Principle of UWB pulse subsampling

Because the delay τ is very small, the jitter must also be small: the jitter may not be greater than one tenth of the delay. This paper describes how a ring oscillator is used to add such a delay to each period, for example to perform subsampling. Fig. 2 shows a subsampling circuit using a PLL. A frequency division by 5 occurs, because the frequency produced by the VCRO is 5 times greater than the frequency supplied to the phase detector. The logic selects the appropriate branch of the VCRO to obtain the desired delay; each period the next tap is selected, so that a resolution equal to the delay of one cell is obtained.

Fig. 2 Principle of subsampling in a PLL using a VCRO
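To illustrate the timing scheme of Fig. 1, the following sketch reconstructs a repetitive pulse by taking one sample per repetition period, each delayed by an extra τ. It is only an illustration: the pulse shape, the 10 ns repetition period and the 20 ps step are assumed example values, not parameters of the receiver in [16].

```python
# Sketch: equivalent-time sampling of a repetitive pulse. In period k the
# sample is taken at t = k*T_rep + k*tau, so a waveform much faster than
# the sampling clock is reconstructed with an effective resolution of tau.
import numpy as np

T_REP = 10e-9   # repetition period of the UWB pulse (assumed)
TAU   = 20e-12  # per-period delay step, e.g. one VCRO cell delay (assumed)
N     = 100     # number of periods -> covers a 2 ns window (100 * 20 ps)

def pulse(t):
    """Assumed repetitive test pulse: a Gaussian monocycle."""
    sigma = 100e-12
    tm = (t % T_REP) - 2e-9
    return -tm / sigma * np.exp(-tm**2 / (2 * sigma**2))

# One sample per repetition period, each shifted by an extra tau.
sample_times = np.arange(N) * T_REP + np.arange(N) * TAU
samples = pulse(sample_times)          # reconstructed equivalent-time trace

# The reconstructed equivalent-time axis spans N*tau = 2 ns.
equiv_time = np.arange(N) * TAU
print(f"window covered: {(equiv_time[-1] + TAU) * 1e9:.1f} ns "
      f"with a {TAU * 1e12:.0f} ps step")
```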

A comparison is made between a multi-path VCRO and a single-path VCRO built from cross-coupled load stages; the multi-path architecture turns out to be the better of the two. This is visible in the delay of a single stage, but jitter and phase noise are also better.

Jitter is a variation in the timing of a periodic signal, often expressed relative to an ideal clock source. An example of what can happen in the presence of jitter: when an analog-to-digital or digital-to-analog conversion is performed, the sampling instants have to remain constant. If this is not the case, i.e. if there is jitter on the clock signal of the analog-to-digital converter (ADC) or digital-to-analog converter (DAC), then the phase and amplitude of the converted signal are affected, depending on the magnitude of the jitter.

Fig. 3 shows how jitter evolves over time. The jitter added by each separate stage is completely independent of the jitter added by the other stages. Therefore, the total jitter variance is given by the sum of the jitter variances added by the individual stages. Even a small amount of jitter per stage accumulates over time, because the signal passes through stage after stage, each adding its own contribution. For this reason the gated ring oscillator is used [17]: it is turned on and, after a certain time, turned off again, so that the jitter does not keep accumulating while the oscillator does not need to run.

Fig. 3 Jitter expanding over time

Phase noise is essentially the same phenomenon as jitter: phase noise is observed in the frequency domain, jitter in the time domain.
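Because the per-stage jitter contributions are independent, their variances add, so the accumulated RMS timing error grows with the square root of the number of stage delays, which is the growth sketched in Fig. 3. A minimal Monte-Carlo sketch of this behaviour (the 1 ps per-stage value is an assumed number for illustration, not a result from this paper):

```python
# Sketch: accumulation of independent per-stage jitter in a ring oscillator.
# Each stage delay gets an independent Gaussian timing error; variances add,
# so the RMS error after k stage delays grows as sqrt(k).
import numpy as np

rng = np.random.default_rng(0)

SIGMA_STAGE = 1e-12   # assumed RMS jitter added per stage delay (1 ps)
N_STAGES    = 1000    # number of consecutive stage delays considered
N_RUNS      = 5000    # Monte-Carlo runs

# Cumulative timing error after each stage delay, for every run.
errors = np.cumsum(rng.normal(0.0, SIGMA_STAGE, (N_RUNS, N_STAGES)), axis=1)

measured = errors[:, -1].std()
predicted = SIGMA_STAGE * np.sqrt(N_STAGES)
print(f"RMS jitter after {N_STAGES} delays: "
      f"{measured * 1e12:.2f} ps (theory {predicted * 1e12:.2f} ps)")
```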
In the following section the architecture of the VCRO is described. Both the single-path differential architecture and the multi-path differential architecture are discussed, as well as the cross-coupled load (CCL) architecture of a single element. The section after that discusses the calculation of the impulse sensitivity function (ISF); the ISF is needed to obtain its RMS value, which in turn is required to calculate phase noise and jitter. Finally, some simulation results are discussed.

II. VCRO ARCHITECTURE

This paper shows the difference in phase noise and jitter between a single-path differential ring oscillator and a multi-path ring oscillator. Both architectures are built with a varying number of cross-coupled load (CCL) stages. The CCL was chosen because it is not too complex in terms of architecture, yet still achieves acceptable jitter and phase noise [18].

Fig. 4 shows the cross-coupled load. These stages are called cross-coupled loads because the transistors M1 and M2 are loads that are coupled cross-wise to the opposite branch.

Fig. 4 Cross-coupled load

To form a ring oscillator, the stages are connected one after another as shown in Fig. 5. The negative (resp. positive) output of a stage is connected to the positive (resp. negative) input of the next stage. In this case, the negative (resp. positive) output of a stage is found at the drain of M1 (resp. M2), and the positive (resp. negative) input at the gate of M1 (resp. M2). There must be an odd number of stages, otherwise the structure will not oscillate. An even number of stages can also be used, but then a crossing is needed in the connections, as shown in Fig. 6.

Fig. 5 Differential ring oscillator (odd number of stages)

Fig. 6 Differential ring oscillator (even number of stages)

A multi-path ring oscillator has the same architecture as the differential ring oscillator above, except that the output of a stage is connected not only to the input of the next stage, but also to the input of one or more of the following stages. An example of a multi-path ring oscillator with 9 stages is shown in Fig. 7. Using more inputs per stage increases the complexity. To link the stages together, extra input transistors must be placed in parallel with the input transistor that is already present (Fig. 8). These extra transistors are sized differently from the ones already there, because they must have less influence on the stage. The premature charging or discharging of the node that they provide results in a gain in speed.

Fig. 7 Example of a multi-path ring oscillator with 9 stages

Fig. 8 Cross-coupled load with multiple inputs

III. CALCULATING THE ISF FOR RING OSCILLATORS

This section summarizes the main results from [19], which allow us to calculate the ISF. When a current is injected at a node, phase noise and jitter will occur. Suppose that the current consists of an impulse with a charge q (in coulombs) and that it occurs at t = τ. This causes a change in voltage on the node, given by

\Delta V = \frac{q}{C_{node}} \qquad (4)

where C_{node} is the effective capacitance on the node at the time of injection. For small ΔV, the change in phase Δφ(t) is proportional to the voltage change, and hence to the injected charge:

\Delta\phi(t) = \Gamma(\omega_0 \tau)\,\frac{\Delta V}{V_{swing}} \qquad (5)

where V_swing is the voltage swing over the capacitor. Γ is the time-varying proportionality constant and has a period of 2π. Γ(x) represents the sensitivity of each point of the waveform to a perturbation and is therefore called the impulse sensitivity function (ISF). During the simulations a current is injected in order to measure ΔV and Δφ; with these values the ISF can be calculated using (5). The phase shift Δφ is read out several periods after injecting the current. By injecting the current at different times, a graph such as the one in Fig. 9 can be drawn.

Fig. 9 Approximate waveform and ISF for ring oscillators

The phase noise spectrum originating from a white noise current source is given by [20]

L(\Delta f) = \frac{\Gamma_{rms}^2}{q_{max}^2}\cdot\frac{\overline{i_n^2}/\Delta f}{8\pi^2\,\Delta f^2} \qquad (6)

where Γ_rms is the RMS value of the ISF, \overline{i_n^2}/\Delta f the single-sideband power spectral density of the noise current, q_max the maximum charge swing and Δf the frequency offset from the carrier. The related timing jitter accumulated over a measurement interval ΔT is then given by

\sigma_{\Delta T} = \frac{\Gamma_{rms}}{2\pi f_0\, q_{max}}\sqrt{\frac{\overline{i_n^2}}{2\,\Delta f}\,\Delta T} \qquad (7)

For the calculation of phase noise and jitter using (6) and (7), one needs to know the RMS value of the ISF. As can be seen from these formulas, it is preferable to keep this RMS value as low as possible.
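To show how (4)-(7) fit together, the sketch below converts a set of simulated phase shifts into an ISF, takes its RMS value and evaluates the phase-noise and jitter expressions. All numerical values (injected charge, node capacitance, noise PSD, oscillation frequency) are assumed for illustration and are not results from this paper.

```python
# Sketch: from injected-charge simulations to the ISF, Gamma_rms, phase
# noise and jitter, following the structure of (4)-(7). All numbers are
# assumed example values, not simulation results from the paper.
import numpy as np

# Assumed oscillator/injection parameters.
F0      = 1.0e9              # oscillation frequency [Hz]
C_NODE  = 20e-15             # effective node capacitance [F]
V_SWING = 1.0                # voltage swing over the capacitor [V]
Q_INJ   = 1e-15              # injected charge per experiment [C]
Q_MAX   = C_NODE * V_SWING   # maximum charge swing

# Phase shifts (rad) read out several periods after injecting the charge at
# equally spaced points x = w0*tau over one period; a made-up triangular
# shape stands in for the simulated values here.
x = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
dphi = 0.05 * np.maximum(0.0, 1.0 - np.abs((x - np.pi) / 0.6))

dv = Q_INJ / C_NODE                    # (4): voltage change per injection
gamma = dphi * V_SWING / dv            # (5): Gamma(x) = dphi * V_swing / dV
gamma_rms = np.sqrt(np.mean(gamma**2))

# (6): phase noise at offset df for a white-noise current source.
in2_per_hz = 1e-22                     # assumed noise PSD [A^2/Hz]
df = 1e6                               # offset frequency [Hz]
L = (gamma_rms**2 / Q_MAX**2) * in2_per_hz / (8 * np.pi**2 * df**2)
print(f"phase noise @ {df/1e6:.0f} MHz: {10*np.log10(L):.1f} dBc/Hz")

# (7): accumulated RMS jitter after a measurement interval dT.
dT = 1e-6
sigma = gamma_rms / (2 * np.pi * F0 * Q_MAX) * np.sqrt(in2_per_hz / 2 * dT)
print(f"jitter after {dT*1e6:.0f} us: {sigma*1e12:.2f} ps")
```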

For the oscillator waveform in Fig. 9, the ISF has a maximum of 1/f'_max, where f'_max is the maximum slope of the normalized waveform f in (8); a slope that is as steep as possible is therefore desirable. The output of a practical oscillator can be written as

V_{out}(t) = A \cdot f\!\left[\omega_0 t + \phi(t)\right] \qquad (8)

If it is assumed that the rise and fall times are equal, Γ_rms can be estimated as

\Gamma_{rms}^2 \approx \frac{2}{3\pi}\cdot\frac{1}{f'^{\,3}_{max}} \qquad (9)

On the other hand, the stage delay is proportional to the rise time,

\hat{t}_D = \eta\,\hat{t}_{rise} \approx \frac{\eta}{f'_{max}} \qquad (10)

where \hat{t}_D is the normalized stage delay and η is the proportionality constant, which is typically close to one. The period is 2N times longer than a single stage delay:

2\pi = 2N\,\hat{t}_D \qquad (11)

where N is the number of stages. Combining (9) and (11) yields the approximation

\Gamma_{rms}^2 \approx \frac{2\pi^2}{3\,\eta^3 N^3} \qquad (12)

Note that the 1/N^{1.5} dependence of Γ_rms holds independently of the value of η.

IV. EXPERIMENTAL RESULTS

By simulating both the multi-path and the single-path differential architecture, a minimum stage delay was obtained for each architecture. In the multi-path differential architecture this delay is 20 ps per element; if, for example, a window of 2 ns is desired, 100 elements are needed to sample the entire window. In the single-path differential architecture the minimum delay is 50 ps per element, so 40 elements are needed to sample the same 2 ns window.

As discussed in the previous section, the phase noise and jitter are calculated using (6) and (7), but before these formulas can be used the RMS value of the ISF must be known. The ISF of every architecture was therefore simulated. This is done by injecting a current at a node and measuring the phase shift several transitions later; with (5) the ISF can then be calculated. The ISF of the single-path differential architecture is shown in Fig. 10, and Fig. 11 shows the ISF at the rising edge. The ISF of the multi-path architecture can be found in Fig. 12, with Fig. 13 again showing the rising edge.

Fig. 10 ISF of the single-path differential architecture for different numbers of delay elements

Fig. 11 Magnification of the ISF at the rising edge (single-path)

Fig. 12 ISF of the multi-path differential architecture for different numbers of delay elements

It can already be noted that the ISF becomes wider when the number of elements decreases. When the RMS value is calculated from these ISFs, it turns out that the RMS value decreases with an increasing number of elements, as can be seen in Fig. 14. Looking at (6) and (7), this RMS value is preferably kept as low as possible.
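A short worked example based on the stage delays reported above: the number of elements needed for a 2 ns window, the oscillation frequency that follows from (11) written as T = 2N·t_D, and the relative Γ_rms predicted by the N^-1.5 trend of (12). These derived numbers are illustrative, not simulated results from this paper.

```python
# Sketch: elements needed for a 2 ns sampling window, the corresponding
# oscillation frequency (T = 2*N*t_D), and the relative Gamma_rms expected
# from the ~1/N^1.5 dependence of (12). Illustrative arithmetic only.
WINDOW = 2e-9  # desired equivalent-time window

for name, t_d in (("multi-path", 20e-12), ("single-path", 50e-12)):
    n = round(WINDOW / t_d)      # elements needed to cover the window
    f0 = 1.0 / (2 * n * t_d)     # oscillation frequency for N = n stages
    gamma_rel = n ** -1.5        # Gamma_rms up to a common constant
    print(f"{name}: {n} elements, f0 = {f0/1e6:.0f} MHz, "
          f"relative Gamma_rms = {gamma_rel:.2e}")
```

Both architectures end up at the same oscillation frequency for a 2 ns window (the period is twice the window), but the multi-path version uses more, faster elements, which lowers Γ_rms and therefore the phase noise and jitter.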

Fig. 13 Magnification of the ISF at the rising edge (multi-path; 45, 63, 81 and 99 elements)

Fig. 14 RMS value of the ISF for the single-path and multi-path architectures versus the number of delay elements

V. CONCLUSION

A comparison of different voltage-controlled ring oscillators has been presented, in particular between the single-path differential architecture and the multi-path differential architecture. The comparison focused mainly on phase noise and jitter, because these are the critical parameters for the subsampling we want to perform. From the simulations and calculations we conclude that the multi-path differential architecture clearly obtains better results in terms of phase noise and jitter than the single-path differential architecture.

REFERENCES

[1] L. DeVito, J. Newton, R. Croughwell, J. Bulzacchelli, and F. Benkley, "A 52 and 155 MHz clock-recovery PLL," in ISSCC Dig. Tech. Papers, Feb.
[2] A. W. Buchwald, K. W. Martin, A. K. Oki, and K. W. Kobayashi, "A 6-GHz integrated phase-locked loop using AlGaAs/GaAs heterojunction bipolar transistors," IEEE J. Solid-State Circuits, vol. 27, Dec.
[3] B. Lai and R. C. Walker, "A monolithic 622 Mb/s clock extraction data retiming circuit," in ISSCC Dig. Tech. Papers, Feb.
[4] R. Farjad-Rad, C. K. Yang, M. Horowitz, and T. H. Lee, "A 0.4 µm CMOS 10 Gb/s 4-PAM pre-emphasis serial link transmitter," in Symp. VLSI Circuits Dig. Tech. Papers, June.
[5] W. D. Llewellyn, M. M. H. Wong, G. W. Tietz, and P. A. Tucci, "A 33 Mb/s data synchronizing phase-locked loop circuit," in ISSCC Dig. Tech. Papers, Feb.
[6] M. Negahban, R. Behrasi, G. Tsang, H. Abouhossein, and G. Bouchaya, "A two-chip CMOS read channel for hard-disk drives," in ISSCC Dig. Tech. Papers, Feb.
[7] M. G. Johnson and E. L. Hudson, "A variable delay line PLL for CPU-coprocessor synchronization," IEEE J. Solid-State Circuits, vol. 23, Oct.
[8] I. A. Young, J. K. Greason, and K. L. Wong, "A PLL clock generator with MHz of lock range for microprocessors," IEEE J. Solid-State Circuits, vol. 27, Nov.
[9] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, "A wide-bandwidth low-voltage PLL for PowerPC microprocessors," IEEE J. Solid-State Circuits, vol. 30, Apr.
[10] I. A. Young, J. K. Greason, J. E. Smith, and K. L. Wong, "A PLL clock generator with MHz lock range for microprocessors," in ISSCC Dig. Tech. Papers, Feb.
[11] M. Horowitz, A. Chen, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung, W. Richardson, T. Thrush, and Y. Fujii, "PLL design for a 500 Mb/s
[12] I. Immoreev and Teh-Ho Tao, "UWB Radar for Patient Monitoring," IEEE Aerospace and Electronic Systems Magazine, vol. 23, no. 11.
[13] C. N. Paulson et al., "Ultra-wideband Radar Methods and Techniques of Medical Sensing and Imaging," Proceedings of the SPIE, vol. 6007.
[14] S. K. Davis et al., "Breast Tumor Characterization Based on Ultrawideband Microwave Backscatter," IEEE Transactions on Biomedical Engineering, vol. 55, no. 1.
[15] M. Strackx et al., "Measuring Material/Tissue Permittivity by UWB Time-domain Reflectometry Techniques," Applied Sciences in Biomedical and Communication Technologies (ISABEL), 3rd International Symposium.
[16] M. Strackx et al., "Analysis of a digital UWB receiver for biomedical applications," European Radar Conference (EuRAD), 2011, submitted for publication.
[17] M. Z. Straayer and M. H. Perrott, "A Multi-Path Gated Ring Oscillator TDC With First-Order Noise Shaping," IEEE J. Solid-State Circuits, vol. 44, no. 4, Apr.
[18] Rafael J. Betancourt Zamora and T. Lee, "Low Phase Noise CMOS Ring Oscillator VCOs for Frequency Synthesis."
[19] A. Hajimiri, S. Limotyrakis, and T. H. Lee, "Jitter and Phase Noise in Ring Oscillators," IEEE J. Solid-State Circuits, vol. 34, no. 6, June.
[20] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," IEEE J. Solid-State Circuits, vol. 33, Feb.


Testing and integrating a MES

Gert Vandyck
FBFC International, Europalaan 12, B-2480 Dessel, Belgium
Supervisor(s): Marc Van Baelen

Abstract: Production volume and quality are very important for any company. When you are producing nuclear fuel, quality concerns increase, as failures put at risk the lives of both the employees and the community to which the fuel is shipped. Technology and automation address these concerns. In a factory environment, the system that manages and automates the production process is called a Manufacturing Execution System (MES). This paper highlights the project of testing a new MES and integrating that MES into an existing environment where other MESs are already in place.

Keywords: Manufacturing Execution System, software testing

I. INTRODUCTION

FBFC International produces fuel assemblies for nuclear Pressurized Water Reactors based on uranium dioxide (UO2) and mixed oxide (MOX). The production of these assemblies is divided into three steps: fabrication of the UO2 pellets, fabrication of the fuel rods, and the final assembly of 264 rods. The last two production steps each already had an MES for their portion of the production. FBFC International recognized the importance of a reliable MES to both maintain the highest quality and optimize the production volume of the pellet manufacturing. Unlike the latter two production steps, the UO2 pellet production runs continuously, 24 hours a day, 7 days a week.

When the two previous individual MESs were integrated, the process did not go smoothly: it required a lot of time and support from IT and production workers to minimize downtime. FBFC could not afford such downtime in the pellet manufacturing, which is why this project was so critical.

II. MANUFACTURING EXECUTION SYSTEM

A Manufacturing Execution System (MES) is an information processing and transmission system in a production environment. MESA International (Manufacturing Enterprise Solutions Association) is a global community focused on improving operations-management capabilities through the effective application of technology solutions and best practices. In one of its white papers MESA defined 11 manufacturing execution activities, which later gained recognition primarily thanks to the MESA honeycomb model, illustrated in Fig. 1 [1].

Fig. 1. Manufacturing execution activities in the honeycomb model. (c) MESA International.

Simply put, the original concept of a Manufacturing Execution System concerns information systems that support the tasks a production department must perform in order to:
- prepare and manage work instructions;
- schedule production activities;
- monitor the execution of the production process;
- gather and analyze information about the process and the product;
- solve problems and optimize procedures.

At FBFC International, most of these things were done manually, without the use of any automation.

III. VALIDATION OF THE TECHNICAL ANALYSIS

With the intention of purchasing core software for the new MES, a detailed technical analysis of the system requirements needed to be both created and validated. The document that describes the functionality of the system is called the system requirements, or spec(s) for short. Every aspect of the production process had to be studied extensively; only then could one judge whether the technical analysis covers all conditions. The specs will be referenced by the software supplier who will customize the software, and they will also be the primary reference used by the in-house testers.

There is another reason why it's important to properly validate the system requirements before the software is developed: any major addition or change to the system requirements will be charged separately by the software supplier. Moreover, changes added later can, and often do, cause unforeseen bugs in parts of the software that had already been validated. We can therefore conclude that it is crucial that the specs are validated with the greatest precision.

The MES for FBFC International wasn't built from scratch; it is a modified version of the MES of a sister company. The production process at the sister company is similar to ours, yet there are significant differences. Consequently, their original system requirements can act as a basis for our MES, but they need revisions to reflect how the two plants differ. That is why, during the requirements validation, it is utterly important to check whether all differences have been taken into account. How we did this is covered in this paper.

IV. HARDWARE AND SOFTWARE PLATFORM

To guarantee a smooth installation and to set up a proper test environment, it is necessary to have a good understanding of how the software works.

A. Wonderware InTrack
The MES module used by the software supplier is InTrack, developed by Wonderware. InTrack is the core of the MES. All data is stored in a Microsoft SQL Server database. This database is generated by the InTrack setup, and custom tables have been added by the software supplier. The software supplier used Visual Basic programs to create process-specific functionality within the InTrack module.

B. OPC Server
To communicate with the Programmable Logic Controllers (PLCs) of all the machines, an OLE (Object Linking and Embedding) for Process Control (OPC) server is used. The OPC server connects to the various PLCs and translates the data into a standards-based OPC format. This information can then be accessed by the Visual Basic programs using an OPC client.

V. TESTING

Software testing is the process of executing a program or system with the intent of finding errors [2]. Testing an MES is an important step before it can be implemented. The testing done by the client is called acceptance testing; these tests are performed prior to the actual transfer of ownership.

A. Acceptance Testing
During these tests the end-users validate that the software does what it is expected to do. This basically means that they need to verify that the software conforms to the technical specifications documented during the requirements phase. Besides some test scenarios, we mostly used the system specifications during these tests. For every action there are conditions that have to be met before the action is executed, and all conditions detailed in the spec needed to be verified by the end-users. This wasn't easy at the start: rather than technical issues, the main problem was the time priorities of the end-users. Just as important as any technical issue is the politics of getting the end-users to set aside time to do the testing. After some inter-departmental efforts all users were able to perform their tests. Besides validating the functionality, the end-users also gained a far better understanding of the workings of the software than with any previous integration or implementation.

The verification of the correct execution of many functions was not so easily completed, because many actions resulted only in changes to the data stored in the database. There were some reports available in which the user could check some of that data, but a lot of information was not visible. It therefore fell to the IT department to do much of the acceptance testing by analyzing the log files. These log files contain a lot of information (a small parsing sketch is given after this list):
- date and time;
- the functions being used;
- input and output parameters;
- all queries being executed on the database.
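The paper does not give the exact log layout, so the following is only a sketch under an assumed format (a timestamp, a CALL/PARAM/SQL tag and the payload on each line); it illustrates how the fields listed above can be grouped per function call so that database-only actions can still be checked during acceptance testing.

```python
# Sketch: extracting the fields listed above from an MES log file during
# acceptance testing. The log layout (prefixes, separators) is assumed for
# illustration; the real InTrack/Visual Basic log format may differ.
import re
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    timestamp: str
    function: str
    params: dict = field(default_factory=dict)
    queries: list = field(default_factory=list)

LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
                     r"(?P<kind>CALL|PARAM|SQL)\s+(?P<rest>.*)$")

def parse_log(lines):
    """Group log lines into one record per function call."""
    records, current = [], None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore lines that do not match the assumed layout
        ts, kind, rest = m.group("ts", "kind", "rest")
        if kind == "CALL":
            current = LogRecord(timestamp=ts, function=rest)
            records.append(current)
        elif kind == "PARAM" and current is not None:
            name, _, value = rest.partition("=")
            current.params[name.strip()] = value.strip()
        elif kind == "SQL" and current is not None:
            current.queries.append(rest)
    return records

# Example use: list every function call that updated a database table.
# for rec in parse_log(open("mes.log")):
#     if any(q.upper().startswith("UPDATE") for q in rec.queries):
#         print(rec.timestamp, rec.function, rec.queries)
```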
B. Simulation of Machines
In order to simulate the production environment, the behaviour of the different machines had to be simulated. To accomplish this, the software developer supplied us with a simulator, illustrated in Fig. 2. Basically this simulator changes the OPC items just like the PLCs would do.

Fig. 2. Simulation of the spheroidization process.

C. Simulations
Besides the verification of the technical specification, we also did some simulations with the senior production operators. These operators know every aspect of the operation by heart, so they can easily spot shortcomings of the software. We also did a complete simulation with all the departments, mainly to explain everybody's function in the process.

D. Bug tracking
The software suppliers were on-site only during the first few days of the acceptance testing. During further testing, we reported bugs and variances from the specifications on a daily basis to the software supplier. Diligence was required to uniquely
