Reports available by ftp from
ftp://cs.joensuu.fi/pub/Reports/
or in printed form from
Department of Computer Science
University of Joensuu
P.O.Box 111
FIN-80101 Joensuu
Finland
Filename: A-2007-4
Title: Implementing Speaker Recognition System: from Matlab to Practice
Authors: Fränti, P., Saastamoinen, J., Kärkkäinen, I., Kinnunen, T., Hautamäki, V. and Sidoroff, I.
Abstract:
In this paper, we summarize the main achievements of the four-year PUMS project during 2003-2007. In addition to the reported research progress, the emphasis is on the practical implementations: how we have moved from Matlab and Praat scripting to C/C++ prototype applications in Windows, Unix, Linux and Symbian environments, with the motivation of enhancing technology transfer. We summarize how the baseline methods have been implemented in practice, and how the results are expected to be utilized commercially and otherwise after the project. A brief view of future research challenges is outlined.
During the project, we had two main goals: (1) to have a solid baseline that is close to the state of the art, and (2) to implement this method on all relevant platforms. Beyond this, there was no strict agenda; the intermediate goals were set annually according to the course of progress, the feedback from our partners, and our own understanding of what we should do next and what we are capable of. One cannot predict the future and set specific innovative research goals. The only way to reach higher goals is through hard, thorough work, while allowing enough freedom of research along the way to give room for new innovations, which may or may not appear. The project has also been a long learning-by-doing process.
Format: Adobe Acrobat PDF, 40 pages
ISBN: 978-952-219-061-1
Filename: A-2007-2
Title: Bit-parallel string matching under Hamming distance in O(n⌈m/w⌉) worst case time
Authors: Grabowski, S. and Fredriksson, K.
Abstract:
Given two strings, a pattern $P$ of length $m$ and a text $T$ of length $n$ over some alphabet $\Sigma$, we consider the string matching problem under $k$ mismatches. The well-known Shift-Add algorithm (Baeza-Yates and Gonnet, 1992) solves the problem in $O(n\lceil m\log(k)/w\rceil)$ worst case time, where $w$ is the number of bits in a computer word. We present two algorithms that improve this to $O(n\lceil m\log\log(k)/w\rceil)$ and $O(n\lceil m/w\rceil)$, respectively. The algorithms make use of nested, varying-length bit-strings that represent the search state. We call these {\em Matryoshka counters}. The techniques we developed are of more general use for string matching problems.
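For illustration, here is a minimal Python sketch of the classical Shift-Add baseline that the report improves upon (this is the Baeza-Yates-Gonnet idea, not the Matryoshka-counter algorithms of the paper; all names are ours). Each pattern position gets its own bit-field mismatch counter, packed into a single integer:

    # Shift-Add matching under k mismatches (Baeza-Yates & Gonnet, 1992).
    # One b-bit mismatch counter per pattern position, packed into one
    # arbitrary-precision integer; b is chosen so that a counter can hold
    # the maximum possible count m without overflowing.
    def shift_add(pattern, text, k):
        m = len(pattern)
        b = m.bit_length()
        mask = (1 << (m * b)) - 1          # keep exactly m counters
        # B[c]: field j holds 1 if pattern[j] != c, else 0
        B = {}
        for c in set(text) | set(pattern):
            v = 0
            for j in reversed(range(m)):
                v = (v << b) | (pattern[j] != c)
            B[c] = v
        D, occ = 0, []
        for i, c in enumerate(text):
            D = ((D << b) + B[c]) & mask   # shift counters, add mismatches
            if i >= m - 1 and (D >> ((m - 1) * b)) <= k:
                occ.append(i - m + 1)      # start of a k-mismatch occurrence
        return occ

For example, shift_add('abc', 'abdabc', 1) reports the starting positions [0, 3].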
Keywords: algorithms, approximate string matching, bit-parallelism, Hamming distance
Format: Adobe Acrobat PDF, 6 pages
ISBN: 978-952-458-960-4
Filename: A-2007-1
Title: Succinct pattern matching automata
Authors: Fredriksson, K.
Abstract:
We consider single and multiple string matching in small space and optimal average time. Our algorithm is based on the combination of compressed self-indexes and Backward-DAWG-Matching algorithm. Efficient implementation techniques are discussed.
Keywords: algorithms, string matching, text indexing, compression, succinct data structures
Format: Adobe Acrobat PDF, 11 pages
ISBN: 978-952-458-959-8
Filename: A-2006-3
Title: Compressed representation of sequences with constant time random access
Authors: Fredriksson, K. and Nikitin, F.
Abstract:
Given a sequence $S$ of $n$ symbols over some alphabet $\Sigma$, we develop a new compression method that is $(i)$ extremely simple to implement; $(ii)$ gives good compression ratio; $(iii)$ provides $O(1)$ time random access to any symbol of the original sequence. Our simplest solution uses at most $2h+o(h)$ bits of space, where $h = n(H_0(S)+1)$, and $H_0(S)$ is the zeroth-order empirical entropy of $S$. This can be improved to take only $h + h' + o(n) + O(\log\log(h))$ bits of space, where $h' = h(-\frac{n}{h}\log_2(\frac{n}{h}) - \frac{h-n}{h}\log_2(\frac{h-n}{h}))$. $H_0(S)$ can be replaced by the $k$-order empirical entropy $H_k(S)$ for constant size alphabets, for $k=o(\log_\sigma(n))$. We discuss a number of improvements and trade-offs (we obtain e.g.~$n(H_0(S)+2\sqrt{H_0(S)}+1) + o(n(\sqrt{H_0(S)}+1))$ bits) and give potential applications for the method.
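As a quick worked check of the stated bound (our own toy code, not the paper's data structure), the following Python computes H_0(S) and the simplest-solution budget 2h, with h = n(H_0(S)+1):

    import math
    from collections import Counter

    def H0(S):
        # zeroth-order empirical entropy: sum_c (n_c/n) * log2(n/n_c)
        n = len(S)
        return sum(f / n * math.log2(n / f) for f in Counter(S).values())

    S = "abracadabra"
    h = len(S) * (H0(S) + 1)    # h = n(H_0(S) + 1), as in the abstract
    print(H0(S), h, 2 * h)      # entropy, h, and the ~2h-bit space budget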
Keywords: algorithms, compression, succinct sequences, self-delimiting integers
Format: Adobe Acrobat PDF, 8 pages
ISBN: 952-458-873-0
Filename: A-2006-2
Title: Sublinear Parameterized Single and Multiple String Matching
Authors: Fredriksson, K. and Mozgovoy, M.
Abstract:
We consider the following pattern matching problem. We have a {\em pattern} $P[0 \ldots m-1]$ and a {\em text} $T[0 \ldots n-1]$, where the symbols of $P$ and $T$ are taken from two disjoint finite alphabets $\Sigma$ of size $\sigma$ and $\Lambda$ of size $\lambda$. The pattern $P$ matches the text substring $T[j \ldots j+m-1]$ iff for all $i \in \{0 \ldots m-1\}$ it holds that $M_j(P[i]) = T[j+i]$, where $M_j(\cdot)$ is a one-to-one mapping on $\Sigma \cup \Lambda$ such that the mapping is the identity on $\Sigma$, but on $\Lambda$ can be different for each text position $j$. We give efficient algorithms that find all parameterized occurrences of $P$ in $T$. The algorithms are based on generalizing the Shift-Or and Backward DAWG Matching (BDM) algorithms. The latter can be used for searching $r$ patterns simultaneously. The Shift-Or based algorithm runs in $O(n\lceil m/w\rceil)$ worst case time, while the average case for fixed alphabets and under some mild and realistic assumptions is $O(n\log_\sigma(m)/w)$, where $w$ is the number of bits in a computer word. The BDM based algorithm runs in $O(n\log_\sigma(rm)/m)$ average time. This is optimal within a constant factor. For general alphabets the times increase by a factor $O(\log(m))$.
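A naive Python checker for this matching condition may clarify the definition (our illustration only; the paper's algorithms are bit-parallel and automaton-based, and much faster than this O(nm) verification):

    def p_match_at(P, T, j, Sigma, Lambda):
        # Does P parameterized-match T[j .. j+m-1]?  The mapping M_j must
        # be the identity on Sigma and one-to-one on Lambda.
        fwd, bwd = {}, {}
        for i, p in enumerate(P):
            t = T[j + i]
            if p in Sigma:                   # fixed symbol: literal match
                if t != p:
                    return False
            else:                            # parameter symbol: renaming
                if t not in Lambda:
                    return False
                if fwd.setdefault(p, t) != t or bwd.setdefault(t, p) != p:
                    return False             # mapping not one-to-one
        return True

    def p_occurrences(P, T, Sigma, Lambda):
        m = len(P)
        return [j for j in range(len(T) - m + 1)
                if p_match_at(P, T, j, Sigma, Lambda)]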
Keywords: algorithms, parameterized string matching, bit-parallelism, suffix automaton
Format: Adobe Acrobat PDF, 8 pages
ISBN: 952-458-807-2
Filename: A-2006-1
Title: Teaching Programming: Going beyond "Objects First"
Authors: J. Sajaniemi, C. Hu
Abstract:
The prevailing paradigm in teaching elementary programming uses Java as the first programming language and the "objects first" approach as the conceptual basis. This approach has several shortcomings, e.g., high drop-out rates and poor skills in basic constructs like loops. This paper suggests an alternative approach that combines a strong start in basic constructs with early object-orientation. The alternative approach is also compared with the ACM Computing Curricula.
Format: Adobe Acrobat PDF, 8 pages
ISBN: 952-458-801-3
Filename: A-2005-2
Title: Efficient Algorithms for (δ, α)-matching
Authors: K. Fredriksson, S. Grabowski
Abstract:
We consider the following string matching problem. Pattern $p_0 p_1 \ldots p_{m-1}$ $(\delta,\alpha)$-matches the text substring $t_{i_0} t_{i_1} \ldots t_{i_{m-1}}$, if $|p_j - t_{i_j}| \leq \delta$ for $j \in \{0,\ldots,m-1\}$, where $i_j < i_{j+1}$ and $i_{j+1} - i_j \leq \alpha + 1$. The task is then to find all text positions $i_{m-1}$ that $(\delta,\alpha)$-match the pattern. The best previously known algorithm for this string matching problem runs in time $O(nm)$ using $O(\alpha m)$ space, where $n$ is the length of the text and $m$ is the length of the pattern. We develop several algorithms that improve the previous bounds. Our techniques are based on novel combinations of sparse dynamic programming, pre-empting the computation early, bit-parallelism and finite automata. Our complexities vary from $O(n+|M|)$ worst case time to $O(n)$ average time, where $M = \{(i,j) \mid |p_i - t_j| \leq \delta\}$, using a dynamic programming based approach. With bit-parallel dynamic programming we obtain worst case time $O(n\delta + \lceil n/w\rceil m)$, where $w$ is the number of bits in a machine word. Some of the ideas can be combined to obtain good average case complexities while retaining good worst case complexities. With bit-parallelism and nondeterministic finite automata we obtain $O(n\lceil m\log_2(\alpha)/w\rceil)$ worst case time, which is $O(n)$ for short patterns. Finally, we show that all our algorithms can be generalized to solve several other problem variants. Our experimental results show that the algorithms work extremely well in practice.
Keywords: approximate string matching, music information retrieval, protein matching, gaps, negative gaps, sparse dynamic programming, bit-parallelism, nondeterministic finite automata.
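The matching condition can be illustrated by a naive dynamic programming checker (a Python sketch of the definition only, far slower than the algorithms of the paper):

    def delta_alpha_occurrences(p, t, delta, alpha):
        # D[j][i]: p[0..j] (delta,alpha)-matches a subsequence of t whose
        # last chosen position is i.  Naive O(n*m*alpha) evaluation.
        n, m = len(t), len(p)
        D = [[False] * n for _ in range(m)]
        for i in range(n):
            D[0][i] = abs(p[0] - t[i]) <= delta
        for j in range(1, m):
            for i in range(j, n):
                if abs(p[j] - t[i]) <= delta:
                    lo = max(0, i - alpha - 1)     # gap at most alpha
                    D[j][i] = any(D[j - 1][x] for x in range(lo, i))
        return [i for i in range(n) if D[m - 1][i]]  # end positions i_{m-1}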
ACM Classification: F.2.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems --- Pattern matching, Computations on discrete structures; H.3.3 [Information storage and retrieval]: Information Search and Retrieval --- Search process.
Format: Adobe Acrobat PDF, 26 pages
ISBN: 952-458-758-0
Filename: A-2005-1
Title: The Mystery of Cohort Selection
Authors: T. Kinnunen, I. Kärkkäinen, P. Fränti
Abstract:
In speaker verification, a cohort refers to a speaker-dependent set of "anti-speakers" used in match score normalization. A large number of heuristic methods have been proposed for the selection of cohort models. In this paper, we use a genetic algorithm (GA) for minimizing a cost function for a given security-convenience cost balance. The GA jointly optimizes the cohort sets and the global verification threshold. Our motivation is to use the GA as an analysis tool. When comparing with heuristic selection methods, the GA is used for obtaining a lower bound to the error rates reachable by an MFCC-GMM verification system. On the other hand, we analyze the models selected by the GA, attempting to gain understanding of how cohort models should be selected for an application with a given security-convenience tradeoff. Our findings with a subset of the NIST-1999 corpus suggest that in a user-convenient application, the cohort models should be selected closer to the target than in a secure application. The lower bounds in turn show that there is a lot of room for further studies in score normalization, especially at the user-convenient end of the detection error tradeoff (DET) curve.
Format: Adobe Acrobat PDF, 27 pages
ISBN: 952-458-676-2
Filename: A-2004-2
Title: Java Tools for Morphological Image Analysis and Research
Authors: A. Podlasov, E. Ageenko
Abstract:
As a rule, research work in the field of image processing and compression involves a great deal of experimentation. New ideas are first implemented in some programming language or environment, and then undergo extensive empirical evaluation. Analysis of the weak points leads to further research and development work. The implementation process (programming of the developed method) is usually the most time-consuming part of any research process. Therefore flexibility considerations, such as code readability, ease of modification, a friendly programming environment, etc., often prevail over implementation efficiency, memory load and computation time. An efficient implementation can often be too expensive to produce, especially in a research process where frequent algorithm modifications are required. Still, implementation efficiency should not be overlooked in the algorithm design process: the experiments should be conducted in a realistic time frame and using reasonable hardware resources. The current work examines the usability of the NIH ImageJ Java-based framework in image processing research, in the case of implementing morphological operations. We demonstrate how the environment can be easily understood, and how new operations can be implemented and modified by researchers familiar with the basics of the Java language. The set of morphological operators implemented as a test case shows good usability in terms of ease of programming and computation efficiency.
Format: Adobe Acrobat PDF, 40 pages
ISBN: 952-458-591-X
Filename: A-2004-1
Title: Morphological reconstruction of semantic layers in map images
Authors: A. Podlasov, E. Ageenko
Abstract:
Map images are composed of semantic layers depicted in arbitrary colors. Often the images require separation into layers for storage and processing. The separation results in severe artifacts because of layer overlapping. In the current work, we design a technique to restore the semantic layers after separation. The designed restoration technique alleviates the compression deficiency of the reconstructed layers versus the corrupted ones with lossless compression algorithms (ITU Group 4, PNG, JBIG and JBIG2 with optimized context templates), and provides better visual quality of images in operations with selective layer removal/extraction.
Format: Adobe Acrobat PDF, 15 pages
ISBN: 952-458-437-9
Filename: A-2003-6
Title: Roles of Variables From the Perspective of Computer Science Educators
Authors: M. Ben-Ari, J. Sajaniemi
Abstract:
Roles can be assigned to occurrences of variables in programs according to a small number of patterns of use that are both language- and algorithm-independent. Preliminary studies on explicitly teaching roles of variables to novice students have shown that roles are an excellent pedagogical tool for clarifying the structure and meaning of programs. This paper describes the results of an investigation designed to test the understandability and acceptability of the role concept and of the individual roles, as seen by computer science educators. The investigation consisted of a short tutorial on roles, a brief training session on assigning roles to variables, a test evaluating the subjects' ability to assign roles, and a set of open questions concerning their opinions of roles. The responses of 51 computer science educators were analyzed. Roles were identified with 85% accuracy overall, and with 93% accuracy in typical uses of variables. The subjects' comments on the role concept in general were mostly positive, and they believed that roles could contribute to understanding programs. The role set used in the investigation turned out to be suitable for describing variable usage in novice-level programs.
Keywords:
Programming education, roles of variables
Format: Adobe Acrobat PDF, 9 pages
ISBN: 952-458-407-7
Filename: A-2003-5
Title: Estimating the number of defects using a second-degree polynomial
Author: M. Niemi
Abstract:
In software production, the number of defects can be estimated with various mathematical models. In this research report, a curve following a second-degree polynomial is fitted to observation data using regression analysis. The data consist of defect data collected in real software projects and obtained from the literature. The resulting model is evaluated with an F-test and compared with earlier research results. In addition, it is examined how the existing basic model can be modified by taking into account the defect data of an ongoing project, i.e., by forming a dynamic model. The results show that the polynomial model suits the defect data used in this study rather poorly. The results obtained in earlier research with the Norden/Rayleigh distribution are better as regards the dynamic model.
Keywords:
second-degree polynomial model, regression analysis, F-test, estimation of the number of defects
Format: Adobe Acrobat PDF, 14 pages
ISBN:952-458-406-9
Filename: A-2003-4
Title: Graph-based agglomerative clustering
Authors: P. Fränti, O. Virmajoki and V. Hautamäki
Abstract:
The search for the nearest neighbor is the main source of computation in most clustering algorithms. A common solution is to calculate the distance to all candidates (full search) and select the one with the smallest distance. In this paper, we propose the use of a nearest neighbor graph for reducing the number of candidates to be considered. The number of distance calculations per search can be reduced from O(N) to O(k), where N is the number of clusters and k is the number of neighbors in the graph. We apply the proposed scheme within the agglomerative clustering algorithm known as the PNN algorithm, and show that a remarkable reduction is obtained in the running time at the cost of a slight increase in the distortion.
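A compact Python sketch of the idea (our simplified version: the neighbor lists are rebuilt lazily rather than maintained incrementally as in the paper):

    import numpy as np

    def merge_cost(ca, na, cb, nb):
        # PNN merge cost: distortion increase when merging two clusters
        return na * nb / (na + nb) * np.sum((ca - cb) ** 2)

    def pnn_with_knn_graph(data, n_clusters, k=5):
        cents = [v.astype(float) for v in data]   # every vector = a cluster
        sizes = [1] * len(data)
        alive = set(range(len(data)))
        def knn(a):                                # k nearest clusters of a
            others = sorted((b for b in alive if b != a),
                            key=lambda b: merge_cost(cents[a], sizes[a],
                                                     cents[b], sizes[b]))
            return others[:k]
        nbrs = {a: knn(a) for a in alive}
        while len(alive) > n_clusters:
            for c in alive:                        # drop dead neighbors
                nbrs[c] = [x for x in nbrs[c] if x in alive] or knn(c)
            a, b = min(((a, b) for a in alive for b in nbrs[a]),
                       key=lambda e: merge_cost(cents[e[0]], sizes[e[0]],
                                                cents[e[1]], sizes[e[1]]))
            na, nb = sizes[a], sizes[b]            # merge b into a
            cents[a] = (na * cents[a] + nb * cents[b]) / (na + nb)
            sizes[a] = na + nb
            alive.remove(b)
            nbrs[a] = knn(a)
        return [cents[i] for i in alive]

Only the O(k) graph neighbors are examined per merge, instead of all O(N) clusters.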
Keywords:
Clustering, agglomeration, vector quantization, codebook generation, nearest neighbor graph, PNN.
Format: Adobe Acrobat PDF, 20 pages
ISBN: 952-458-390-0
Filename: A-2003-3
Title: Compression of map images by multi-level context tree modeling
Authors: P. Kopylov and P. Fränti
Abstract:
We propose a method for compressing color map images by context tree modeling and arithmetic coding. We consider multi-component map images with semantic layer separation, and images that are divided into binary layers by color separation. The key issues in the compression method are the utilization of inter-layer correlations, and solving the optimal ordering of the layers. The inter-layer dependencies are acquired by optimizing the context tree for every pair of image layers. The resulting cost matrix of the inter-layer dependencies is treated as a directed spanning tree problem, and solved by an algorithm based on Edmonds' algorithm for optimum branching, and by the optimal selection and removal of the background color. The proposed method gives 50:1 compression on a set of map images, which is about 50% better than JBIG, and 16% better than the best comparative method.
Keywords:
Image compression, lossless, context modeling, map images, layer ordering, spanning tree.
Format: Adobe Acrobat PDF
ISBN: 952-458-327-5
Filename: A-2003-2
Title: Data reduction of large vector graphics
Authors: A. Kolesnikov and P. Fränti
Abstract:
A fast algorithm for joint near-optimal approximation of multiple polygonal curves is proposed. It is based on iterative reduced-search dynamic programming introduced earlier for the min-ε problem of a single polygonal curve. The proposed algorithm jointly optimizes the number of line segments allocated to the different individual curves, and the approximation of the curves by the given number of segments. The trade-off between time and optimality is controlled by the breadth of the search, and by the number of iterations applied.
Keywords:
polygonal approximation, multi-object, dynamic programming, min-ε problem, data reduction, compression.
Format: Adobe Acrobat PDF
ISBN: 952-458-311-9
Filename: A-2003-1
Title: Software Globalisation in Finland: A State-of-the-practice Survey
Authors: Jarkko Immonen and Jorma Sajaniemi
Abstract:
During recent years, software globalisation has become an important part of the whole software industry. Adaptation of software to different market areas is not an easy or straightforward task. The adaptation process usually involves two participants: a software developer and a localisation vendor. We interviewed Finnish software industry representatives in order to find out the current state of globalisation processes and practices, and to inquire about their views on software globalisation. On the basis of the interviews, it seems that globalisation practices vary greatly and localisation is often considered purely as translation. In addition, there seem to be many problems that could be solved by using more defined processes and by enhancing the communication and co-operation between software developers and localisation vendors.
Keywords:
Software globalisation, internationalisation, localisation, software process improvement
Format: Adobe Acrobat PDF, 48 pages
ISBN: 952-458-253-8
Filename: A-2002-8
Title: PUTTE - A Defect Estimation System
Authors: Matti Niemi
Abstract:
The defect estimation system PUTTE is a database-based software tool built to model the defects appearing in the software production process, on the basis of defect counts collected in software production projects. Using the historical data of earlier, similar software projects, the system produces a defect occurrence model following the Norden/Rayleigh and Gamma distributions; on the basis of this model, a project that is starting up can be assessed, as well as, dynamically, a project already in progress.
Keywords:
Gamma model, Norden/Rayleigh model, estimation of the number of defects, software tool
Format: Adobe Acrobat PDF, 10 pages
ISBN: 952-458-219-8
Filename: A-2002-6
Title: Dynamic local search algorithm for the clustering problem
Authors: Ismo Kärkkäinen and Pasi Fränti
Abstract:
Dynamic clustering problems can be solved by finding several clustering solutions with different numbers of clusters, and by choosing the one that minimizes a given evaluation function value. This kind of brute force approach is general but not very efficient. We propose a new dynamic local search that solves the number and the locations of the clusters jointly. The algorithm uses a set of basic operations, such as cluster addition, removal and swapping. The clustering is then found by the trial-and-error approach of local search. The algorithm finds the results 30 times faster than the brute force approach.
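A skeleton of the search loop, under our own naming, with the evaluation function f left abstract (the report's concrete criterion and operation scheduling are not reproduced here):

    import random

    def dynamic_local_search(data, f, iters=1000, kmin=1, kmax=20):
        # f(centroids, data) is the evaluation function to be minimized;
        # the number of clusters changes via add/remove/swap operations.
        C = [random.choice(data).copy() for _ in range(2)]
        best = f(C, data)
        for _ in range(iters):
            cand = [c.copy() for c in C]
            op = random.choice(("add", "remove", "swap"))
            if op == "add" and len(cand) < kmax:
                cand.append(random.choice(data).copy())
            elif op == "remove" and len(cand) > kmin:
                cand.pop(random.randrange(len(cand)))
            else:                      # swap one centroid to a data vector
                cand[random.randrange(len(cand))] = random.choice(data).copy()
            v = f(cand, data)
            if v < best:               # trial and error: keep improvements
                C, best = cand, v
        return C, best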
Keywords:
clustering, number of clusters, vector quantization, optimization.
Format: Postscript, 12 pages
ISBN: 952-458-143-4
Filename: A-2002-5
Title: Stepwise clustering algorithm for unknown number of clusters
Authors: Ismo Kärkkäinen and Pasi Fränti
Abstract:
We consider a criterion-based approach that solves dynamic clustering problems for all potential numbers of clusters, and uses a given evaluation criterion to determine the correct one. We propose two novel improvements for reducing the workload of finding the correct clustering. The first idea is to use a stepwise clustering algorithm that utilizes the previous solution when solving the next clustering with a different (one more or one less) number of clusters. The second idea is to use a heuristic stopping criterion in the algorithm that solves a single clustering. The stopping criterion estimates the potential further improvement on the basis of the improvement achieved so far. The iteration stops automatically if the estimated improvement stays below a predefined threshold value. By experiments we have found that any threshold value of 0.1% or less results in the correct clustering with a confidence of more than 99%. The effect of the improvements on the run time and clustering quality is studied.
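The stopping rule can be sketched in a few lines (our minimal interpretation: the latest relative improvement serves as the estimate of what is still to be gained; the report's estimate is more refined):

    def iterate_with_stopping(step, initial_cost, threshold=0.001):
        # step() runs one iteration of the clustering algorithm and
        # returns the new cost; stop when the relative improvement
        # falls below the threshold (0.1% as suggested above).
        prev = initial_cost
        while True:
            cur = step()
            if prev - cur < threshold * prev:
                return cur
            prev = cur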
Keywords:
pattern recognition, algorithms, cluster analysis, number of clusters.
Format: PostScript, 19 pages
ISBN: 952-458-129-9
Filename: A-2002-4
Title: PlanAni - A System for Visualizing Roles of Variables to Novice
Programmers
Authors: Jorma Sajaniemi
Abstract:
Many students learning to write computer programs encounter considerable difficulties. For novices, one of the key problems is in understanding how the very basic programming constructs work. In this paper, we concentrate on visualizing the role of a variable, i.e., the dynamic character of a variable embodied by the sequence of its successive values as related to other variables. We present a classification of roles and introduce an animation system, PlanAni, that uses this approach.
Keywords:
Program visualization, roles of variables
Format: PDF, 8 pages
ISBN: 952-458-115-9
Filename: A-2002-3
Title: sfia - Software for Inspection Automation
Authors: Vesa Tenhunen and Jorma Sajaniemi
Abstract:
A key element in manufacturing quality software is the early detection of defects through various inspection techniques. Required bookkeeping is, however, laborious and automation is needed. We present a new inspection automation tool, sfia, that makes the collection of defect data easy and supports all phases of an inspection process. The tool can be used with any document format and is available on many computer platforms.
Keywords:
Software inspections, defect detection, automation tools, software process improvement
Format: PostScript, 8 pages
ISBN: 952-458-114-0
Filename: A-2002-2
Title: Optimal clustering by branch-and-bound technique
Authors: Pasi Fränti, Olli Virmajoki and Timo Kaukoranta
Abstract:
In the literature, not much attention has been paid to algorithms for finding the optimal clustering, except in some special cases in which the problem reduces to a polynomial time problem. In this paper, we give a solution for the general case. The method generates all possible clusterings by a series of merge steps. The clusterings are organized as a minimum redundancy search tree and the optimal clustering is found by a branch-and-bound technique. The result has theoretical interest and could also provide new insight into the problem itself. From the branch-and-bound technique, we also derive two suboptimal but polynomial time variants.
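A toy Python version for very small data sets conveys the principle (our sketch: the pruning bound uses the fact that the clustering cost never decreases under merging; the paper's minimum redundancy search tree, which avoids generating the same clustering many times, is omitted, so this version does redundant work):

    import numpy as np

    def sse(groups):
        # total squared error of a clustering (list of point arrays)
        return sum(((g - g.mean(axis=0)) ** 2).sum() for g in groups)

    def optimal_clustering(data, k):
        best = {"cost": float("inf"), "sol": None}
        def branch(groups):
            cost = sse(groups)
            if cost >= best["cost"]:
                return                 # bound: merges never lower the cost
            if len(groups) == k:
                best["cost"], best["sol"] = cost, groups
                return
            for i in range(len(groups)):
                for j in range(i + 1, len(groups)):
                    merged = np.vstack((groups[i], groups[j]))
                    rest = [g for x, g in enumerate(groups)
                            if x not in (i, j)]
                    branch(rest + [merged])
        branch([data[i:i + 1] for i in range(len(data))])
        return best["sol"], best["cost"]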
Keywords:
Clustering, algorithms, optimization, PNN, vector quantization.
Format: PostScript, 16 pages
ISBN: 952-458-111-6
Filename: A-2002-1
Title: Proceedings of the First Annual Finnish / Baltic Conference on
Computer Science Education
Authors: Marja Kuittinen (editor)
Abstract:
University of Joensuu organized the first annual Finnish / Baltic Sea Conference on Computer Science Education in Koli, Finland, on October 19-21, 2001. The main reason for arranging the conference was to attract interested scholars and educational technologists within the universities in Finland and in the Baltic Sea region and Nordic countries to join their efforts in figuring out the future prospects of the field of Computer Science Education. The goal of the conference was to develop the exchange of relevant information between colleagues working within the same discipline. Topics of interest included, but were not limited to: Teaching Practices, Educational Technology, Distance Education, Virtual Universities, Computer Science Education Research, Educational Software, and Tools for Visualization or Concretization. The conference program consisted of short presentations, open discussion sessions and social events.
Keywords:
Computer Science Education (CSE), educational technology, distance education, educational software, virtual university, visualization
Format: PostScript, 94 pages
ISBN: 952-458-089-6
Filename: A-2001-5
Title: Estimating the number of defects using the Norden/Rayleigh and Gamma distributions
Authors: Matti Niemi
Abstract:
In software production, the number of defects can be estimated with various mathematical models. In this research report, curves following the Norden/Rayleigh and Gamma distributions are fitted to observation data using the least-squares method and a logarithmic transformation. The data consist of defect data collected in real software projects and obtained from the literature. The resulting models are compared with each other using project data. In addition, it is examined how the existing model can be modified by taking into account the defect data of an ongoing project. The results show that the distributions are almost equally well suited to modeling defect data. The behavior of the models produced by both distributions is, however, dependent on the observation data.
Keywords:
Gamma model, Norden/Rayleigh model, estimation of the number of defects
Format: Adobe Acrobat PDF, 13 pages
ISBN: 952-458-084-5
Filename: A-2001-4
Title: Reduced-search dynamic programming for approximation of polygonal
curves
Authors: Alexander Kolesnikov and Pasi Fränti
Abstract:
Approximation of polygonal curves can be solved by dynamic programming, or by a graph-theoretical approach. These methods provide the optimal solution but they are slow for a large number of vertices. Faster methods exist but they lack optimality. We try to bridge the gap between the slow but optimal and the fast but sub-optimal algorithms by giving a new optimization approach based on dynamic programming with a reduced search path in the state space. In the first step, a rough approximation is generated using any fast sub-optimal algorithm. Then, a bounding corridor is constructed in the state space using the reference solution. Finally, the minimum cost path is searched in the corridor using dynamic programming. The proposed algorithm is simple, fast, and has low space complexity. The time and quality trade-off can be controlled by selecting an appropriate corridor width and the number of iterations in the algorithm.
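A Python sketch of the corridor-restricted dynamic programming (our simplified variant with a maximum-deviation error measure; it assumes ref[0] = 0 and ref[-1] = n-1, where ref gives the vertex indices of the rough reference solution):

    import math
    import numpy as np

    def seg_err(pts, i, j):
        # max distance of points i..j from the chord pts[i]-pts[j]
        a, b = pts[i], pts[j]
        dx, dy = b[0] - a[0], b[1] - a[1]
        L = math.hypot(dx, dy) or 1.0
        s = pts[i:j + 1]
        return (np.abs(dx * (s[:, 1] - a[1]) - dy * (s[:, 0] - a[0])) / L).max()

    def corridor_dp(pts, ref, width):
        n, M = len(pts), len(ref) - 1
        E, parent = {(0, 0): 0.0}, {}
        for s in range(1, M + 1):
            for j in range(1, n):
                if abs(j - ref[s]) > width:       # stay inside the corridor
                    continue
                for i in range(j):
                    if (s - 1, i) in E:
                        e = max(E[(s - 1, i)], seg_err(pts, i, j))
                        if e < E.get((s, j), float("inf")):
                            E[(s, j)], parent[(s, j)] = e, i
        path, (s, j) = [n - 1], (M, n - 1)        # backtrack best path
        while s > 0:
            j = parent[(s, j)]
            path.append(j)
            s -= 1
        return path[::-1], E[(M, n - 1)]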
Keywords:
polygonal approximation, dynamic programming, data reduction, vectorization.
Format: 12 pages
ISBN: 952-458-082-9
Filename: A-2001-3
Title: Iterative shrinking method for the clustering problem
Authors: Olli Virmajoki, Pasi Fränti and Timo Kaukoranta
Abstract:
The pairwise nearest neighbor method (PNN) generates the clustering of a given data set by a sequence of merge steps. In this paper, we propose an alternative solution for the merge-based approach by introducing an iterative shrinking method. The new method removes the clusters iteratively one by one until the desired number of clusters is reached. Instead of merging two nearby clusters, we remove one cluster by reassigning its data vectors to the neighboring clusters. We retain the local optimization strategy of the PNN by always removing the cluster that increases the cost function least. We give six alternative implementations, which all outperform the PNN in quality.
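In outline (a Python sketch of the shrinking step; our simplified cost: vectors of the removed cluster go to the nearest remaining centroid, without the further repartitioning refinements of the paper):

    import numpy as np

    def shrink(data, cents, k):
        # remove clusters one by one, always the one whose removal
        # increases the total distortion least
        C = [np.asarray(c, float) for c in cents]
        while len(C) > k:
            dists = np.array([[np.sum((v - c) ** 2) for c in C] for v in data])
            owner = dists.argmin(axis=1)
            def removal_cost(r):
                mine = dists[owner == r]             # vectors in cluster r
                if len(mine) == 0:
                    return 0.0
                others = np.delete(mine, r, axis=1)  # nearest other cluster
                return (others.min(axis=1) - mine[:, r]).sum()
            C.pop(min(range(len(C)), key=removal_cost))
        return C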
Keywords:
Vector quantization, codebook generation, clustering, PNN.
Format: gzipped PostScript, 16 pages
ISBN: 952-458-080-2
Filename: A-2001-2
Title: Storage of multi-component digital maps using JBIG2 image
compression standard
Authors: Pasi Fränti, Eugene Ageenko and Sami Gröhn
Abstract:
Digital maps can be stored and distributed electronically using compressed raster image formats. Spatial access must be implemented to enable the user to operate directly on the compressed data without retrieving the entire image. In this work, we study how the latest JBIG2 standard can be used for storing map images. Image tiling can be supported quite easily by storing each image block as its own segment. A slightly more difficult problem is how to store the multiple image components, because the standard does not have any direct support for this. There are alternative ways to overcome the problem by utilizing the features of JBIG2 in creative ways. We show by experimental results that a block division of 128×128 can be implemented with about 15-30% overhead.
Keywords:
Map images, real-time applications, personal navigation, image compression, spatial access, JBIG2.
Format: gzipped postscript, 16 pages
ISBN: 951-708-976-7
Filename: A-2001-1
Title: Compression of map images for real-time applications
Authors: Pasi Fränti, Eugene Ageenko, Pavel Kopylov, Sami Gröhn and Florian
Berger
Abstract:
Digital maps can be stored and distributed electronically using compressed raster image formats. We introduce a storage system for map images that supports compact storage size, partial image decompression, and smooth transitions between various image resolutions. The main objective of the proposed storage system is to provide map images for real-time applications that use portable devices with low memory and computing resources. Compact storage size is achieved by dividing the image into semantic binary layers, which are then compressed using a context-based method. Partial image decompression is supported by tiling the image into smaller blocks and implementing direct access to the compressed blocks. In this paper, we give an overview of the system architecture, describe the compression technique in detail, and discuss implementation aspects. Experimental results are given both in terms of compression ratios and image retrieval timings.
Keywords:
Image compression, map images, real-time applications, personal navigation, spatial access.
Format: gzipped postscript, 19 pages
ISBN: 951-708-975-9
Filename: A-2000-5
Title: Memory module structures for shared memory simulation
Authors: Martti Forsell and Ville Leppänen
Abstract:
The shared memory programming model on top of a physically distributed memory machine (SDMM) is a promising candidate for easy-to-program general purpose parallel computation. There are, however, certain open technical problems which should be sufficiently solved before SDMM can meet the expectations. Among them is the low-level structure of the memory system, because most academic studies of the subject assume unrealistically ideal memory properties, ignoring completely, e.g., the speed difference between processors and memories. In this paper we propose three memory module structures based on low-level interleaving and caching for solving this speed difference problem. We evaluate these structures along with three reference solutions by determining the overall cost factor of memory references with respect to an ideal SDMM using real parallel programs. According to the evaluation, a cost of less than two is achieved with an interleaved solution, if a proper amount of parallelism is available. Moreover, caching increases the speed of the memory if the caches are placed in the modules rather than next to the processors, but it provides lower throughput than interleaving.
Keywords:
Parallel computing, shared memory simulation, memories, interleaving, caches
Format: Postscript, 10 pages
ISBN: 951-708-950-3
Filename: A-2000-4
Title: Architectural differences of efficient sequential and
parallel computers
Authors: Martti Forsell
Abstract:
In this article we try to conclude what kind of computer architecture is efficient for executing sequential problems, and what kind of architecture is efficient for executing parallel problems, from the processor architect's point of view. For that purpose we analytically evaluate the performance of five sequential architectures and seven parallel architectures, representing widely both commercial and scientific processor designs, attached to efficient memory systems. The results are interesting: Sequential problems are executed most efficiently in a computer using a minimally pipelined VLIW processor and a memory system which is as fast as possible. Parallel problems are executed most efficiently in a computer using deeply superpipelined, chained, and multithreaded processors and a high throughput pipelined memory system. Thus, designing a computer for efficient sequential computation leads to a very different architecture than designing one for efficient parallel computation.
Keywords:
Computer architecture, sequential computation, parallel computation
Format: Postscript, 20 pages
ISBN: 951-708-949-X
Filename: A-2000-3
Title: Context-based Compression of Binary Images in Parallel
Authors: Eugene Ageenko, Martti Forsell and Pasi Fränti
Abstract:
Binary images can be compressed efficiently using context-based statistical modeling and arithmetic coding. The problem with this kind of approach is that the process is fully sequential and therefore additional computing power from parallel computers cannot be utilized. We attack this problem and show how context-based compression can be implemented in parallel. Our approach is to segment the image into non-overlapping blocks, which are compressed independently by the processors. We give two alternative solutions for how to construct, distribute and utilize the model in parallel, and study the effect on the compression performance and run time. We show by experiments that the proposed approach achieves a speedup that is proportional to the number of processors. The work efficiency of the extra processing exceeds 50% with any reasonable number of processors.
Keywords:
image compression, context modeling, JBIG, parallel algorithms, EREW, PRAM.
Format: Postscript, 10 pages
ISBN: 951-708-948-1
Filename: A-2000-2
Title: Practical methods for speeding-up the PNN method
Authors: Olli Virmajoki, Pasi Fränti and Timo Kaukoranta
Abstract:
The pairwise nearest neighbor method (PNN) is a simple and well-known method for codebook generation in vector quantization. In its exact form, it provides good quality codebooks but at the cost of high run time. A fast exact algorithm has recently been introduced to implement the PNN an order of magnitude faster than the original O(N^3K) time algorithm. The run time, however, is still lower bounded by O(N^2K), and therefore additional speed-up may be required in applications where time is an important factor. We consider two practical methods to reduce the amount of work caused by the distance calculations. By experiments, we show that the run time can be reduced to 10-15% of that of the original method for data sets in color quantization and in spatial vector quantization.
Keywords:
Vector quantization, codebook generation, clustering algorithms, pairwise nearest neighbor method
Format: PostScript, 19 pages
ISBN: 951-708-939-2
Filename: A-2000-1
Title: Empirical study of novice spreadsheet usage: Comparing two
spreadsheet calculation paradigms
Authors: Markku Tukiainen
Abstract:
Empirical studies of spreadsheet programming have commonly shown high overall error rates, but much less attention has been paid to the reasons for these errors. One often mentioned cause for errors is the low conceptual level of spreadsheet systems. By changing the conceptual level of a spreadsheet system, we wanted to study whether this produces different types of errors compared to traditional spreadsheet systems. In this paper we present an empirical study comparing the traditional spreadsheet calculation paradigm and the structured spreadsheet calculation paradigm that utilizes goals, plans and spreadsheet data structures in computation. The results show that the error behavior of novice spreadsheet users is systematically different between the paradigms.
Keywords:
Spreadsheet calculation, Goals and plans, Empirical studies of programming
Format: PostScript, 27 pages
ISBN: 951-708-906-6
Filename: A-1999-5
Title: Randomized local search algorithm for the clustering problem
Authors: Pasi Fränti and Juha Kivijärvi
Abstract:
We consider clustering as a combinatorial optimization problem. Local search provides a simple and effective approach to many other combinatorial optimization problems. It is therefore surprising how seldom it has been applied to the clustering problem. Instead, the best clustering results have been obtained by more complex techniques such as tabu search and genetic algorithms at the cost of high running time. We introduce a new randomized local search algorithm for the clustering problem. The algorithm is easy to implement, sufficiently fast, and competitive with the best clustering methods. The ease of implementation makes it possible to tailor the algorithm for various clustering applications with different distance metrics and evaluation criteria.
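The core of the algorithm fits in a few lines of Python (a sketch under our own naming; a random swap of one centroid followed by brief fine-tuning is the essential idea):

    import random
    import numpy as np

    def rls(data, k, iters=5000):
        def mse(C):
            d = ((data[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
            return d.min(axis=1).mean()
        C = data[random.sample(range(len(data)), k)].astype(float)
        best = mse(C)
        for _ in range(iters):
            trial = C.copy()                       # swap a random centroid
            trial[random.randrange(k)] = data[random.randrange(len(data))]
            for _ in range(2):                     # two k-means steps
                lab = ((data[:, None, :] - trial[None, :, :]) ** 2
                       ).sum(axis=2).argmin(axis=1)
                for j in range(k):
                    if (lab == j).any():
                        trial[j] = data[lab == j].mean(axis=0)
            v = mse(trial)
            if v < best:                           # accept only improvements
                C, best = trial, v
        return C, best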
Keywords:
clustering, vector quantization, local search, combinatorial optimization, image processing, compression
Format: PostScript, 19 pages
ISBN: 951-708-845-0
Filename: A-1999-4.pdf
Title: Ethical Attitudes among Finnish Computer Science Students and
Computer Professionals
Authors: Tero Vartiainen
Abstract:
The purpose of this replication study was to find out whether there were differences between computer professionals' and computer science (CS) students' ethical attitudes. Such differences and diversities among students and professionals imply issues which should be taken into account in computer ethics teaching. The study was carried out among CS students at the University of Joensuu and at the University of Kuopio. Computer professionals were reached with the help of the member database of the Finnish Information Processing Association. In addition, two surveys were carried out in two organizations. Because of the low response rate among professionals, the results cannot be generalized to cover the attitudes of all computer professionals in Finland. The students' attitudes are assumed to represent all Finnish CS students. This study found diverse attitudes among CS students in the following areas: ownership of intellectual property, usage of computer resources for one's own purposes, and large databases. There were gender differences among CS students concerning honesty, customer relationships, and the usage of computer resources for one's own purposes. Differences in attitudes between professionals and students were found in the following areas: accomplishment of work tasks, ownership of intellectual property, usage of computer resources for one's own purposes, and customer relationships. All these issues should be included in the teaching of computer ethics.
Keywords:
ethical attitudes, computer ethics, ethics teaching, professional ethics
Format: PDF, 36 pages
ISBN: 951-708-762-4
Filename: A-1999-3
Title: Lossless Compression Of Large Binary Images In Digital Spatial
Libraries
Authors: Eugene Ageenko and Pasi Fränti
Abstract:
A method for lossless compression of large binary images is proposed for applications where spatial access to the image is needed. The method utilizes the advantages of (1) variable-size context modeling in a form of context trees, and (2) forward-adaptive statistical compression. New strategies for constructing the context tree are considered, including a fast two-stage bottom-up approach. The proposed technique achieves 20 % higher compression rates than JBIG, and at the same time it allows dense tiling of the image down to 200 x 200 pixels.
Keywords:
image compression, digital library, context modeling, context tree, spatial access
Format: PostScript, 14 pages
ISBN: 951-708-761-6
Filename: A-1999-2
Title: Context-Based Filtering For Document Image Compression
Authors: Eugene Ageenko and Pasi Fränti
Abstract:
Two context-based filtering methods are introduced. The methods are based on statistical context modeling, gathering context-wise statistics from the entire image. Uncommon pixels in low information contexts are inverted unconditionally (simple context filter), or inverted conditionally depending on whether the gain in compression outweighs the loss of information (gain-loss filter). Both methods improve the compression performance of document images while preserving the image quality, measured as OCR accuracy. The gain-loss filter is capable of reaching the compression limit estimated by the compression of the noiseless digital original.
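The simple context filter can be sketched as follows (our Python illustration with a 3x3 context; the threshold t and the two-pass structure are our assumptions, and the gain-loss variant would additionally weigh compression gain against information loss):

    import numpy as np

    def simple_context_filter(img, t=0.02):
        # img: 2D array of 0/1 pixels.  First pass: gather, for every
        # 8-bit neighborhood context, how often the center pixel is 1.
        H, W = img.shape
        pad = np.pad(img, 1)
        ctx = np.zeros((H, W), dtype=int)
        for dy, dx in ((-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)):
            ctx = (ctx << 1) | pad[1+dy:H+1+dy, 1+dx:W+1+dx]
        ones, total = np.zeros(256), np.zeros(256)
        np.add.at(total, ctx.ravel(), 1)
        np.add.at(ones, ctx.ravel(), img.ravel())
        p1 = ones / np.maximum(total, 1)           # P(pixel = 1 | context)
        # Second pass: invert pixels whose value is very uncommon in
        # their context (probability below t).
        p = np.where(img == 1, p1[ctx], 1 - p1[ctx])
        out = img.copy()
        out[p < t] ^= 1
        return out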
Keywords:
binary image compression; document image enhancing; noise removal; context-based modeling; optical character recognition
Format: PostScript, 17 pages
ISBN: 951-708-760-8
Filename: A-1999-1
Title: Goals and Plans in Spreadsheet Calculation
Authors: Sajaniemi J., Tukiainen M. and Väisänen J.
Abstract:
Programming knowledge can be characterized in the form of goals and plans that describe what must be achieved and how this is done. We have conducted interviews of spreadsheet users and analyzed spreadsheet applications qualitatively. The analysis resulted in a set of basic programming goals and plans describing spreadsheet programming knowledge. This paper introduces a model for spreadsheet programming knowledge description and uses it to present the results of our analysis.
Keywords:
Spreadsheet calculation, Goals and plans, Empirical studies of programming
Format: PostScript, 32 pages
ISBN: 951-708-734-9
Filename: A-1998-9
Title: Experiments in Routing h-Relations Using Constant Thinning And
Geometric Thinning Algorithms
Authors: Anssi Kautonen
Abstract:
We have experimented with two protocols designed for routing h-relations on complete networks under the so-called OCPC assumption. These protocols, called the constant thinning protocol and the geometric thinning protocol, were theoretically analyzed in the papers: A. Kautonen, V. Leppänen and M. Penttonen, "Constant thinning protocol for routing h-relations in complete networks", and A. Kautonen, V. Leppänen and M. Penttonen, "Thinning protocols for routing h-relations in complete networks". In this report we estimate the efficiency of the protocols experimentally.
Format: Postscript, 14 pages
ISBN: 951-708-732-2
Filename: A-1998-8
Title: Teaching Computer Ethics: Experiences of Integrating Ethics into
Computer Science Courses
Authors: Tero Vartiainen
Abstract:
Computer ethics can be taught in at least two ways: by integrating ethics into computer science (CS) courses or by arranging a separate computer ethics course. In this report, experiences from an experiment of integrating ethics into CS courses are presented. In this experiment, the integration means adding group work and dilemma discussions to the exercises of CS courses. Dilemma discussion as a teaching method was found to be reasonable and useful even among teachers who had no prior ethics studies. However, it was observed that teachers and students should become familiar with ethics theory before the dilemma discussions. Guidelines for arranging dilemma discussions are presented.
Format: Postscript, 24 pages
ISBN: 951-708-691-1
Filename: A-1998-7
Title: The effect of irrelevant inputs on the generalization performance
of MLP neural networks
Authors: Juha Hakkarainen
Abstract:
Irrelevant inputs decrease the performance of most induction based machine learning algorithms. In this paper, we study experimentally the effect of irrelevant inputs on neural network learning with two different training sample sizes, using two artificial and one real world dataset. The simulation results indicate that if enough training data is available, then multilayer perceptron neural networks can be used in domains with a high number of irrelevant inputs without significant loss in generalization capability.
Format: Postscript, 7 pages
ISBN: 951-708-664-4
Filename: A-1998-6
Title: Lossless and near-lossless compression of line-drawing images using
Hough transform
Authors: Pasi Fränti, Eugene I. Ageenko, Heikki Kälviäinen, Saku Kukkonen
Abstract:
Two novel methods for compressing bi-level line-drawing images are proposed. The main idea is the use of the Hough transform (HT). In the first phase, HT is applied for extracting line segments from the image. In the second phase, the original image is compressed by a JBIG-based method. The first variant (lossless method) uses the reconstructed feature image for improving the prediction accuracy in the local context model. The compressed file consists of the extracted line features and the compressed raster image. The second variant (near-lossless method) applies feature-dependent filtering for removing noise from the original image. It improves the image quality and therefore results in better compression performance.
Format: Postscript, ?? pages
ISBN: 951-708-652-0
Filename: A-1998-5
Title: Modeling Spreadsheet Audit: A Rigorous Approach to Automatic
Visualization
Authors: Jorma Sajaniemi
Abstract:
Computations in spreadsheets are hard to grasp and consequently many errors remain unnoticed. The problem with the hidden errors lies in the invisibility of the structure of calculations. As a result, auditing and visualization tools are required to make spreadsheets easier to comprehend and errors easier to detect. This paper presents a theoretical model of spreadsheets, and describes various spreadsheet auditing mechanisms employing the model. Moreover, two new visualization mechanisms are introduced. The model reflects not only current spreadsheet systems but also the way people actually use spreadsheets. Theoretically, it is impossible to check the correctness of a spreadsheet without a formal definition of its computations, but our hope is to find visualizations that point out parts of spreadsheets that contain anomalies, i.e., potential locations of errors. The model helps us to understand how such anomalies can be defined.
Format: Postscript, 31 pages
ISBN: 951-708-646-6
Filename: A-1998-4
Title: VinEd - A System for Program Manipulation Through User-Definable
Simultaneous Views
Authors: Jorma Sajaniemi and Kari Ikonen
Abstract:
Programmers' mental representations of programs do not obey the order, form and immediate content of the program text. Multiple visible representations, or views, of a program help programmers to construct the mental representations that they need in, e.g., maintenance tasks. But current programming tools and environments provide only very limited support for such views. We have implemented VinEd, a language-independent editor framework that supports an unlimited number of views of the program text. Programmers can use predefined views and create their own views by defining filtering functions. Programs can be edited through any view that provides a reverse operation constructing the original representation from the view. Filtering and reverse filtering can be any operations that accept textual input and produce textual output. Further, VinEd extends the notion of a view to any activity that can be based on the original program, e.g., program compilation or version control functions. Thus VinEd can be extended to form a complete programming environment.
In this paper, we consider the role of program views in the programming tasks and look at the way views are supported by current systems. We then describe the salient properties of VinEd from the user's point of view, and present examples of views that help in program comprehension as well as of views that take care of software engineering activities. Finally, we discuss the most important implementation issues of VinEd, and analyze its performance.
Format: Postscript, 14 pages
ISBN: 951-708-632-6
Filename: A-1998-3
Title: The use of computer-aided instruction in mathematical subjects at the lower secondary school level
Authors: Marja-Liisa Parviainen
Abstract:
This report presents the results of a study conducted in December 1997 on the use of computers in the teaching of mathematical subjects. The study was carried out as a questionnaire survey directed to all lower secondary schools in Hyvinkää and its surrounding area. The goal was to collect information from the schools on educational software for mathematics, physics and chemistry that had been found useful. In addition, the study surveyed the hardware situation and teachers' views on issues related to the use of computer-aided instruction. The study yielded a list of usable educational programs and Internet addresses; neither list is very long. The greatest obstacles to the use of computer-aided instruction were felt to be the lack of classroom space, the training and attitudes of teachers, the lack of good application programs, and the high price of educational software. The study shows that computers are used surprisingly little as a teaching aid in lower secondary schools. The report also includes as appendices the relatively new educational programs examined in the study, WWW pages related to teaching, and collaboration projects started on the Internet that are also useful to teachers of mathematical subjects. These are hoped to promote the educational use of computers.
Format:
ISBN: 951-708-629-6
Filename: A-1998-2
Title: Forward-adaptive variant of the baseline JBIG for applications
requiring spatial access
Authors: Eugene I. Ageenko, Pasi Fränti
Abstract:
A method for compressing large binary images is proposed for applications where spatial access to the image is needed. The proposed method is a two-pass combination of forward-adaptive modeling and backward-adaptive context-based compression with re-initialization. The method supports spatial access, and it improves the compression performance significantly compared to the baseline JBIG used in the same way. The method can be implemented with only small modifications to the baseline JBIG; existing software implementations can therefore be easily utilized.
Format: Postscript, 12 pages
ISBN: 951-708-617-2
Filename: A-1998-1
Title: Producing SGML-documents with Public Domain Tools
Authors: Paula Leinonen, Martti Penttonen
Abstract:
This report describes a project in which a study guide was transformed into a structured form. The transformation was done because the document was to be used both in printed form and in electronic form. The structure was marked up with SGML, and the structured document was converted to delivery formats with SGML-Tools. With some changes to the default system, the document was transformed to the delivery formats, LaTeX and HTML, nearly automatically.
Format: Postscript, 15 pages
ISBN: 951-708-613-X
Filename: A-1997-6
Title: Three-Level Teaching Material and Its Implementation in Teaching
Situation
Authors: Jorma Sajaniemi, Marja Kopponen
Abstract:
Lectures need three kinds of supporting material: lecture notes given to learners, presentation material to be shown in the lectures, and the lecturer's own guidance material to direct her during the lecture. Current presentation systems do not support easy preparation and maintenance of these three interwoven material sources. In this paper, we present the idea of three-level teaching material and describe a computer-based system (SHOW) that can be used to produce the different forms from a single source and to show the presentation version in a lecturing situation. Other novel features of the presentation part are the abolition of the slide paradigm and the discrimination of textual structures with colors during presentation.
Format: Postscript, 10 pages
ISBN: 951-708-527-3
Filename: A-1997-5
Title: Animation of user algorithms in the web
Authors: J. Haajanen, M. Pesonius, E. Sutinen, J. Tarhio, T. Teräsvirta,
P. Vanninen
Abstract:
The algorithm animation environment Jeliot is introduced. Jeliot allows a web user to visualize his own algorithms, written in Java, over the Internet. Jeliot is based on self-animating data types: the user determines the visualized data objects of the source code, and Jeliot produces the animation automatically.
Format: Postscript, 14 pages
ISBN: 951-708-505-2
Filename: A-1997-4
Title: Programming with a Parametrized Parallel Programming Model
Authors: Simo Juvaste
Abstract:
We present a new experimental framework, the F-PRAM model, for modeling and studying the effects of different properties of parallel machines. The goal of the new model is to provide programmer-level facilities for more accurate and portable computation-communication balancing. As a computation model, F-PRAM provides shared memory but guides the usage of the shared memory by a set of parameters. In comparison with the BSP model, we provide more asynchronous execution and model the costs and delays induced by communication bandwidth restrictions, communication latency, overhead, and synchronization independently of each other. We have written a high-level language compiler and a parametrized emulator to measure the effects of different properties of parallel machines on different parallel algorithms.
Format: Postscript, 13 pages
ISBN: 951-708-512-5
Filename: A-1997-3
Title: The educational use of information technology in North Karelian schools
Authors: Kari Väkeväinen
Abstract:
This report presents the results of a questionnaire survey on the educational use of information technology conducted in autumn 1996. The questionnaire was directed to the teachers of 38 schools in North Karelia. The purpose of the study was to survey teachers' views on the teaching of information technology and on its use in teaching. The results give a picture of the expectations and attitudes towards information technology prevailing among teachers. Schools have so far made little use of information technology; the lack of equipment and the insufficient training of teachers are considered the greatest obstacles. The results are indicative of the Finnish school system as a whole and can serve as a basis for developing the educational use of information technology in schools.
Format:
ISBN: 951-708-503-6
Filename: A-1997-2
Title: Enhanced JBIG-based compression for satisfying objectives
of engineering document management system
Authors: Eugene I. Ageenko and Pasi Fränti
Abstract:
The main objectives of an engineering document management (EDM) system are outlined. Existing image compression algorithms (e.g. G4 and JBIG) offer efficient solutions to the storage problem but do not sufficiently support other objectives such as spatial access and fast decoding. Here we propose a novel method based on JBIG such that the other objectives are also met. The compression performance of the proposed method is only 10% worse than that of JBIG. At the same time, spatial access to the compressed file is achieved. The method is also 2.5 times faster in decompression than JBIG. This speed-up is comparable to the Group 4 standard (G4), but with better compression performance. The proposed method is applicable not only to engineering drawings, but to binary images in any document imaging system.
Format: Postscript, 17 pages
ISBN: 951-708-499-4
Filename: A-1997-1
Title: Experimental simulations of PRAM on complete networks
Authors: Martti Penttonen
Abstract:
Consider the routing problem of $p$ processors sending packets to each other at random, under the OCPC restriction, where a send fails if another processor tries to send to the same target. Finding efficient and implementable solutions to this problem would be very desirable for the implementation of the PRAM (parallel random access machine). There are two principal means to speed up routing: to improve the algorithm, or to improve the architecture. We consider both. As some of the algorithms are very hard to analyse, we have implemented a routing simulator to get an approximate idea of the goodness of the algorithms and architectures. We observe that with proper improvements in the architecture, routing costs close to the optimal 1 step per packet can be achieved.
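A toy version of such a simulator (our sketch: each pending packet retransmits with a fixed probability, and a transmission succeeds only when its target receives exactly one packet in that step; the report's improved algorithms and architectures are not modeled):

    import random

    def simulate_ocpc(p, q=0.5):
        # p processors, each sending one packet to a random target
        pending = [(src, random.randrange(p)) for src in range(p)]
        steps = 0
        while pending:
            steps += 1
            sending = [pkt for pkt in pending if random.random() < q]
            hits = {}
            for _, dst in sending:
                hits[dst] = hits.get(dst, 0) + 1
            delivered = {pkt for pkt in sending if hits[pkt[1]] == 1}
            pending = [pkt for pkt in pending if pkt not in delivered]
        return steps   # rounds needed; the optimum is 1 step per packet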
Format: Postscript, 8 pp
ISBN: 951-708-490-0
Filename: A-1996-8
Title: Blockwise Distortion Measure for Statistical and Structural Errors
in Digital Images
Authors: Pasi Fränti
Abstract:
A blockwise distortion measure is proposed for evaluating the visual quality of compressed images. The proposed measure calculates quantitatively how well important visual properties have been preserved in the distorted image. The method consists of three quality factors detecting contrast errors, structural errors, and quantization errors. The proposed method outperforms PQS for a set of test images, and is much simpler to implement. The method should also be applicable to color images; properties like color richness and saturation are captured by the quantization and contrast measures, respectively.
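The exact quality factors are defined in the paper; the following sketch merely illustrates the blockwise style of computation with a hypothetical contrast-error factor that compares per-block standard deviations:

    from statistics import pstdev

    def block_contrast_error(orig, dist, bs=8):
        """orig, dist: equal-sized 2-D lists of gray values; average absolute
        difference of per-block standard deviations over bs-by-bs blocks."""
        errors = []
        for y in range(0, len(orig) - bs + 1, bs):
            for x in range(0, len(orig[0]) - bs + 1, bs):
                o = [orig[y + j][x + i] for j in range(bs) for i in range(bs)]
                d = [dist[y + j][x + i] for j in range(bs) for i in range(bs)]
                errors.append(abs(pstdev(o) - pstdev(d)))
        return sum(errors) / len(errors)

    orig = [[(x * y) % 256 for x in range(16)] for y in range(16)]
    dist = [[v // 8 * 8 for v in row] for row in orig]    # mild quantization
    print(block_contrast_error(orig, dist))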
Format: Paper, 13pp
ISBN: 951-708-485-4
Filename: A-1996-7
Title: MTAC - A Multithreaded VLIW Architecture for PRAM Simulation
Authors: Martti Forsell
Abstract:
Multithreaded processors are key components for hiding the latency of long operations that slow down the execution of parallel programs, because they can execute other processes while one process is waiting for a long operation to complete. In this paper we outline and evaluate a MultiThreaded VLIW processor Architecture with functional unit Chaining (MTAC), which is specially designed for efficient PRAM simulation. According to our experiments, MTAC offers remarkably better performance than a basic pipelined RISC architecture, and chaining improves the exploitation of instruction-level parallelism to the level where the achieved speedup corresponds to the number of functional units in the processor.
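The latency-hiding argument can be made concrete with a back-of-the-envelope model (ours, not the paper's): with t threads, R cycles of useful work per thread between memory operations, and a memory latency of L cycles, processor utilization is roughly min(1, tR/(R+L)):

    def utilization(threads, R, L):
        """Fraction of cycles doing useful work under round-robin threading."""
        return min(1.0, threads * R / (R + L))

    for t in (1, 4, 16):
        print(t, "threads:", round(utilization(t, R=4, L=60), 2))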
Format: Postscript, 11pp
ISBN: 951-708-476-5
Filename: A-1996-6
Title: Tabu Search Algorithm for Codebook Generation in Vector Quantization
Authors: Pasi Fränti, Juha Kivijärvi and Olli Nevalainen
Abstract:
A tabu search algorithm is proposed for codebook generation in vector quantization. The key question is the definition of a neighboring solution; making random modifications to the current solution is not sufficient alone. The proposed algorithm first makes non-local changes to the codebook, which is then fine-tuned by the generalized Lloyd algorithm (GLA). For a set of gray-scale images, the new algorithm was better than the GLA alone, and its results were comparable to simulated annealing. For binary images, the tabu search approach gave the best MSE values.
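A compact sketch of the loop described above, with the tabu bookkeeping reduced to remembering recently replaced code vector indices (parameter values and helper names are illustrative):

    import random

    def dist(a, b): return sum((x - y) ** 2 for x, y in zip(a, b))

    def gla(data, cb, iters=3):
        """A few generalized Lloyd iterations: assign, then recompute centroids."""
        for _ in range(iters):
            cells = [[] for _ in cb]
            for v in data:
                cells[min(range(len(cb)), key=lambda i: dist(v, cb[i]))].append(v)
            cb = [[sum(c) / len(cell) for c in zip(*cell)] if cell else old
                  for cell, old in zip(cells, cb)]
        return cb

    def mse(data, cb):
        return sum(min(dist(v, c) for c in cb) for v in data) / len(data)

    def tabu_codebook(data, k, rounds=20, tenure=3, seed=1):
        rng = random.Random(seed)
        best = gla(data, rng.sample(data, k))
        tabu = []
        for _ in range(rounds):
            i = rng.randrange(k)
            if i in tabu:
                continue                              # recently changed vector
            cand = list(best)
            cand[i] = list(rng.choice(data))          # non-local change
            cand = gla(data, cand)                    # fine-tune with GLA
            tabu = (tabu + [i])[-tenure:]
            if mse(data, cand) < mse(data, best):
                best = cand
        return best

    rng0 = random.Random(0)
    data = [[rng0.gauss(m, 1), rng0.gauss(m, 1)] for m in (0, 5, 10) for _ in range(30)]
    print(round(mse(data, tabu_codebook(data, k=3)), 3))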
Format: PostScript, 16pp
ISBN: 951-708-465-X
Filename: A-1996-5
Title: Genetic Algorithms for Codebook Generation in Vector Quantization
Authors: Pasi Fränti, Juha Kivijärvi, Timo Kaukoranta and Olli Nevalainen
Abstract:
In the present paper we study genetic algorithms for the codebook generation problem of vector quantization. There are two different approaches to the problem: a codebook-based and a partition-based one. Both approaches can be derived from the optimality criteria of the GLA. Of these, the codebook-based approach is clearly superior to the partition-based approach. The experiments show that a pure genetic algorithm does not give results comparable to the existing methods; the inclusion of GLA iterations is vital. Even in this case, the time-distortion performance of the algorithm is still inferior to simulated annealing. In the case of binary images, the hybrid combination of GA and GLA gives the best results. The greatest deficiency of the algorithm is still its high running time.
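In the same illustrative spirit, the hybrid can be sketched as a codebook-based GA whose crossover mixes the code vectors of two parents, every offspring being fine-tuned by GLA (the gla and mse helpers are repeated from the previous sketch to keep this one self-contained):

    import random

    def dist(a, b): return sum((x - y) ** 2 for x, y in zip(a, b))

    def gla(data, cb, iters=3):
        for _ in range(iters):
            cells = [[] for _ in cb]
            for v in data:
                cells[min(range(len(cb)), key=lambda i: dist(v, cb[i]))].append(v)
            cb = [[sum(c) / len(cell) for c in zip(*cell)] if cell else old
                  for cell, old in zip(cells, cb)]
        return cb

    def mse(data, cb):
        return sum(min(dist(v, c) for c in cb) for v in data) / len(data)

    def ga_codebook(data, k, pop_size=8, generations=10, seed=2):
        rng = random.Random(seed)
        pop = [gla(data, rng.sample(data, k)) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda cb: mse(data, cb))
            parents = pop[:pop_size // 2]             # elitist selection
            children = [gla(data,                     # GLA step is the vital part
                            [rng.choice(pair)         # vector-wise crossover
                             for pair in zip(*rng.sample(parents, 2))])
                        for _ in range(pop_size - len(parents))]
            pop = parents + children
        return min(pop, key=lambda cb: mse(data, cb))

    rng0 = random.Random(0)
    data = [[rng0.gauss(m, 1), rng0.gauss(m, 1)] for m in (0, 5, 10) for _ in range(30)]
    print(round(mse(data, ga_codebook(data, k=3)), 3))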
Format: Paper, 19pp
ISBN: 951-708-439-0
Filename: A-1996-4
Title: Spreadsheet Goal and Plan Catalog:
Additive and Multiplicative Computational Goals and Plans in
Spreadsheet Calculation
Authors: Markku Tukiainen, Jorma Sajaniemi
Abstract:
Spreadsheeting knowledge can be characterized in the form of goals and plans that describe what must be achieved and how this is done. We have analyzed a set of real-life spreadsheets and extracted a collection of computational goals and plans. This report contains a descriptive catalog of goals and plans consisting of addition, subtraction, multiplication, and division. The catalog is based on actual spreadsheet usage and is not intended to be a prescriptive catalog of what spreadsheeting goals and plans should be like.
Format: Postscript, 28pp
ISBN: 951-708-461-7
Filename: A-1996-3
Title: An evolutionary approach to neural network design applied to
sunspot prediction
Authors: Juha Hakkarainen, Anne Jumppanen, Jari Kyngäs, Juha Kyyrö
Abstract:
In this paper we describe a neural network simulator which uses evolutionary optimization for the selection of an optimal architecture and learning parameters for an MLP neural network. We also present some simulation results for the sunspot prediction problem.
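A minimal sketch of the idea, not the simulator itself: a small (mu+lambda)-style evolutionary loop over two hyperparameters, hidden layer size and learning rate, scored by the training error of a tiny one-hidden-layer network on a toy series (all values illustrative):

    import math, random

    def train_mlp(xs, ys, hidden, lr, epochs=200, seed=0):
        """Train a 1-input, 1-output MLP with tanh hidden units by online
        gradient descent; return the final mean squared training error."""
        rng = random.Random(seed)
        w1 = [rng.uniform(-1, 1) for _ in range(hidden)]; b1 = [0.0] * hidden
        w2 = [rng.uniform(-1, 1) for _ in range(hidden)]; b2 = 0.0
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h = [math.tanh(w1[i] * x + b1[i]) for i in range(hidden)]
                err = sum(w2[i] * h[i] for i in range(hidden)) + b2 - y
                for i in range(hidden):
                    gh = err * w2[i] * (1 - h[i] ** 2)    # backpropagated gradient
                    w2[i] -= lr * err * h[i]
                    w1[i] -= lr * gh * x
                    b1[i] -= lr * gh
                b2 -= lr * err
        return sum((sum(w2[i] * math.tanh(w1[i] * x + b1[i]) for i in range(hidden))
                    + b2 - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    xs = [i / 10 for i in range(50)]
    ys = [math.sin(x) for x in xs]                        # toy series to fit
    rng = random.Random(3)
    pop = [(rng.randint(2, 12), 10 ** rng.uniform(-3, -1)) for _ in range(6)]
    for _ in range(5):                                    # evolve the hyperparameters
        parents = sorted(pop, key=lambda p: train_mlp(xs, ys, *p))[:3]
        pop = parents + [(max(2, h + rng.randint(-2, 2)),
                          min(0.5, lr * 10 ** rng.uniform(-0.3, 0.3)))
                         for h, lr in parents]
    best = min(pop, key=lambda p: train_mlp(xs, ys, *p))
    print("hidden units:", best[0], "learning rate:", round(best[1], 4))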
Format: Postscript, 8pp
ISBN: 951-708-412-9
Filename: A-1996-1
Title: GEE User's Manual
Authors: Jorma Sajaniemi, Timo Niiranen
Abstract:
Graphical Experiments Environment (GEE) is intended to be used as an aid in psychological experiments. With GEE a subject creates or recognizes graphical presentations according to task instructions presented by the program. A graphical presentation consists of images. The subject can copy, grab, drag and place images freely within a working area. The working area may be covered with a base image on which the subject builds the graphical presentation. The set of movable images may also be empty, yielding a recognition task. Tasks can be timed in various ways. In particular, rest intervals can be introduced as empty tasks with minimum or maximum times. The actions made by the subject, as well as time stamps, are recorded in a protocol for later analysis. GEE Analyser provides some simple statistical results and produces the results in a suitable form for other analysis tools. Moreover, GEE Analyser can draw the final presentation. The results can be printed on paper or stored in a file.
The language of the user interface of GEE can be changed. The standard version of GEE contains support for Finnish, Spanish, and English.
Format: Postscript, 26pp
ISBN: 951-708-417-X
Filename: A-1995-7
Title: Usability Research in a Housing Fair: Problems and Possibilities
Authors: Jorma Sajaniemi, Ismo Tossavainen
Abstract:
New technology makes computer-based multimedia and telematic services available to people of all ages, with vastly different training, experience, and personal characteristics. If the new multimedia and telematic services are to be successful, their usability must meet high standards as judged by ordinary people. We have conducted usability research on three multimedia and telematic services in a housing fair. The research consists of automatic logging of the users' actions, observations of the users, interviews, and questionnaires. Our approach differs from controlled laboratory experiments and from usability research that concerns computer systems aimed at some specific user group and conducted in their working place. It is more like a "usability kiosk" methodology.
In this paper, we are interested in the methodological problems of conducting usability research during large public gatherings. We will not report methodological questions that are common to controlled laboratory experiments. Instead, we will focus on problems that are unique to this kind of setting. The methodological discussion is divided into two parts: problems common to all usability kiosk research, and issues special to fairs and exhibitions.
Format: Postscript, 44pp
ISBN: 951-708-385-8
Filename: A-1995-6
Title: Experiments On Simulating PRAM On Complete Network And
On Mesh Of Buses
Authors: Anssi Kautonen
Abstract:
The Parallel Random Access Machine, PRAM, is the most popular abstract model of parallel computation. In the PRAM model there is a collection of processors that share a common memory, in contrast to the distributed memory model, where each processor has only local memory and processors are connected to each other via a network. PRAM hides the details of the machine architecture from the programmer. In this paper, we experimentally investigate the simulation of PRAM on two distributed architectures of low latency: on the complete network and on the mesh of buses. The simulations are based on randomized hashing of the shared memory and two strategies of delayed routing. Experiments show that with suitable simulation parameters, a PRAM step can be simulated in 5-10 clock cycles.
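The memory-hashing half of such a simulation can be illustrated in a few lines (ours, with made-up parameters): shared addresses are spread over the memory modules by a random linear hash, and the cost of one PRAM step is dominated by the most loaded module:

    import random

    def step_cost(p, seed=1):
        """Hash one shared-memory reference per processor to p modules and
        return the maximum module load, a rough per-step routing cost."""
        rng = random.Random(seed)
        a, b, prime = rng.randrange(1, 10 ** 9), rng.randrange(10 ** 9), 10 ** 9 + 7
        load = [0] * p
        for _ in range(p):
            addr = rng.randrange(10 ** 6)
            load[((a * addr + b) % prime) % p] += 1   # linear hash to a module
        return max(load)

    for p in (64, 256, 1024):
        print(p, "processors:", step_cost(p))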
Format: Postscript, 26pp
ISBN: 951-708-384-X
Filename: A-1995-5
Title: Ohjelmistoprojektien työmäärän arviointiprosessin mallintaminen
Authors: Juha Hakkarainen, Jari Kyngäs, Petteri Laamanen, Raimo Rask
Abstract:
In this report we compare two methods for modelling the estimation of effort in software projects. The material of this research has been gathered from Finnish organizations developing software. According to the results documented in this paper, evolutionary neural networks seem to suit this kind of modelling better than traditional regression analysis. Though the results of our experiment cannot be generalized to data other than that used here, it is quite obvious that there is a need for an automatic tool to assist in the estimation of resources required by software projects: the quite large variance in the effort estimates given by the organizations cannot be explained merely by differences in local efficiency.
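For illustration only, the regression baseline of such a comparison can be as simple as a least-squares fit of effort against a size measure (the numbers below are made up):

    def linear_fit(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
        return my - b * mx, b                         # intercept, slope

    sizes = [10, 25, 40, 60, 90]                      # hypothetical size metric
    effort = [4, 11, 14, 26, 35]                      # hypothetical person-months
    a, b = linear_fit(sizes, effort)
    print(f"effort ~ {a:.1f} + {b:.2f} * size")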
Format: Postscript, 16pp
ISBN: 951-708-408-0
Filename: A-1995-4
Title: Implementation of Two-dimensional Filters for Structured
Documents in SYNDOC Environment
Authors: Eila Kuikka, Jouni Mykkänen, Arto Ryynänen, Airi Salminen
Abstract:
Filtering is used to select, from a set of information items, a subset corresponding to the information interests of a user. The information interests are described in a filter which is created to control the selection. In our earlier work we have described a theoretical framework for specifying filters to express content-based and structure-oriented constraints on structured text. In the filters, the information interests of the user are expressed by constraints and annotations on two-dimensional templates. The templates are created from the grammar associated with the structured text. This report describes a prototype of the filtering method in a syntax-directed document processing system called SYNDOC. In SYNDOC, a filter is applied to documents associated with a common grammar. The application of a filter means finding the documents that match the filter. From the user's point of view, filtering a subset of a given document collection consists of the following six steps. First, a filter for a given grammar is defined; second, a directory containing documents associated with the grammar is chosen; third, indexing is applied to the documents (unless indexed documents were chosen); fourth, the filter is applied to the indexed documents of the chosen directory; fifth, the form of the output is defined; and sixth, the filtered documents are displayed in the specified form. In the current phase of the implementation, the matching test is applied to one document at a time, and in the case of a match, the document is displayed using the default output form.
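The matching step can be pictured with a deliberately simplified stand-in for the two-dimensional templates: here a filter is a set of (element type, required substring) constraints evaluated against a document's structure tree (all of this is illustrative, not the SYNDOC data model):

    def flatten(node):
        """node: (element type, text or list of child nodes)."""
        kind, body = node
        if isinstance(body, str):
            yield kind, body
        else:
            for child in body:
                yield from flatten(child)

    def matches(doc, constraints):
        parts = list(flatten(doc))
        return all(any(kind == k and s in text for kind, text in parts)
                   for k, s in constraints)

    doc = ("article", [("title", "Filtering structured text"),
                       ("section", [("para", "filters and templates")])])
    print(matches(doc, [("title", "Filtering"), ("para", "templates")]))  # True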
Format: Postscript, 30pp
ISBN: 951-708-383-1
Filename: A-1995-3
Title: The Minimal Pipeline Architecture
Authors: Martti Forsell
Abstract:
Pipelining is used in almost all recent processor architectures to increase performance. It is, however, difficult to achieve the theoretical speedup of pipelining, because code dependencies cause delays in execution. Processor designers must use complex techniques like forwarding, register renaming and branch prediction to reduce the loss of performance due to this problem. In this paper we introduce the abstract Minimal Pipeline Architecture (MPA), featuring a special two-level pipelining. MPA offers interesting advantages over architectures using deeper pipelines: pipeline delays as well as register file bandwidth problems are absent in MPA. Also, the utilization of functional units is better in MPA than in basic RISC machines. A preliminary performance evaluation of MPA is given.
Format: Postscript, 10pp
ISBN: 951-708-361-0
Filename: A-1995-2
Title: Efficient Two-Level Mesh based Simulation of PRAMs
Authors: Martti Forsell, Ville Leppänen, Martti Penttonen
Abstract:
We consider work-optimal simulations of PRAM models on coated block meshes. A coated block mesh consists of $\pi$-processor blocks and $\pi \times \pi$ or $\sqrt{\pi} \times \sqrt{\pi} \times \sqrt{\pi}$ router blocks. The router blocks form a 2-dimensional or a 3-dimensional regular mesh, and the processor & memory blocks are located on the surface of the block mesh. As a generalization of the coated mesh, the 2-dimensional and 3-dimensional coated block meshes simulate the EREW, CREW, and CRCW PRAM models work-optimally. Moreover, the simulation cost is very moderate. Using certain additional improvements and a proper amount of parallel slackness, the cost can be decreased clearly below 2 routing steps per simulated PRAM processor. The coated block mesh is actually an instance of a more general two-level construction technique, which uses a seemingly inefficient but scalable solution on top of a non-scalable but efficient solution. Within the blocks (chips), brute-force techniques are applied, whereas the mesh structure on top makes the whole construction modular, simple, and scalable. The parameter $\pi$ provides a method for balancing the construction with respect to the available technology.
Format: Postscript, 18pp
ISBN: 951-708-328-9
Filename: A-1995-1
Title: Forecasting Sunspot Numbers with Neural Networks
Authors: Jari Kyngäs
Abstract:
This paper presents a feedforward neural network approach to sunspot forecasting. The sunspot series was analyzed with feedforward neural networks formalized on the basis of statistical models. The statistical models were used as comparison models, along with recurrent neural networks. The feedforward networks had 2-4 inputs (depending on the number of predictor variables), one hidden layer with 20 or 30 neurons, and one neuron in the output layer. The networks were trained using the backpropagation algorithm. As a result, I found that feedforward neural networks are much better forecasters than recurrent neural networks and statistical models.
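The experimental setup can be summarized in miniature: the series is turned into lagged (input, target) pairs, and a least-squares AR(1) fit stands in for the statistical comparison models (toy data, lag 1 only):

    import math

    series = [50 + 40 * math.sin(2 * math.pi * t / 11) for t in range(60)]  # toy cycle
    pairs = [(series[t - 1], series[t]) for t in range(1, len(series))]     # lag-1 inputs

    xs = [x for x, _ in pairs]; ys = [y for _, y in pairs]
    n = len(xs); mx = sum(xs) / n; my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    print(f"one-step forecast: {a + b * series[-1]:.1f}")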
Format: Postscript, 18pp
ISBN: 951-708-317-3
Filename: paper only
Title: Proceedings of the Seventh Finnish Symposium on Computer Science
Authors: Martti Penttonen (ed.)
Abstract:
Contents:
Kari Granö, Jukka Paakki: A language for specifying communication protocols (p. 1)
Juha Hakkarainen: Distributed knowledge representation in sparse distributed memory (p. 19)
Niklas Holsti, Erkki Sutinen: Approximate string matching using q-gram places (p. 23)
Tomi Janhunen: Cautious autoepistemic reasoning applied to general logic programs (p. 33)
Simo Juvaste: A note on simulating PRAMs by sparse microprocessor networks (p. 47)
Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola: In-place mergesort (p. 55)
Heikki Kälviäinen: Randomized Hough transform (RHT): new extensions to line detection (p. 63)
Juha Kärkkäinen, Esko Ukkonen: Two dimensional pattern matching by static sampling (p. 79)
Ville Leppänen, Martti Penttonen: Work-optimal simulation of PRAM models on meshes (p. 91)
Jussi Rintanen: Approaches to priorities in default reasoning (p. 121)
Erkki Sutinen: On average-case behaviour of the q-gram method (p. 133)
Merja Wanne: On constructing area unions in computer cartography (p. 139)
Format: Paper, 149 pp
ISBN: 951-708-204-5
Filename: A-1993-3
Title: Deep Knowledge and Domain Models
Authors: Jarmo Ahonen
Abstract:
An approach to the concept of deep knowledge is outlined. The approach is based on the assumption that the deepness of knowledge results from its explanatory power. After considering some examples of deep and shallow knowledge, an approach to the development of quantitative domain models based on deep knowledge is proposed. The proposed approach is based on the concept of causal processes, and it provides a uniform point of view on knowledge of physical domains and on domain modeling.
Format: Postscript, 15 pp
ISBN: 951-708-209-6
Filename: A-1993-2
Title: On Qualitative Modeling
Authors: Jarmo Ahonen
Abstract:
The fundamental assumptions behind qualitative modeling are critically considered, and some inherent problems in that modeling approach are outlined. The problems are due to the assumption that a sufficient set of symbols representing the fundamental features of the physical world exists. That assumption causes serious problems when modeling continuous systems. An alternative approach to building intelligent systems, for cases not suited to qualitative modeling, is proposed. The proposed alternative combines neural networks and quantitative modeling.
Format: Postscript, 15 pp
ISBN: 951-708-208-8
Filename: A-1993-1
Title: Are Multiport Memories Physically Feasible?
Authors: Martti Forsell
Abstract:
A Parallel Random Access Machine (PRAM) is a popular model of parallel computation that promises easy programmability and great parallel performance, but only if efficient shared main memories can be built. This is not easy, because the complexity of shared memories leads to difficult technical problems. In this paper we consider the idea of a true multiport memory that can be used as a building block for an efficient PRAM-style shared main memory. Two possible structures for multiport memory chips are presented. We also give a preliminary cost-effectiveness and performance analysis for memory systems using the proposed multiport RAMs. The results are encouraging: at least small multiport memories seem to be physically feasible, and the power of the PRAM model can be fully exploited by computer systems with multiport memories.
Format: Postscript, 14 pp
ISBN: 951-708-187-1
Filename: A-1992-1
Title: An implementation of the programming language pm2 for PRAM
Author: Simo Juvaste
Abstract:
pm2 is a programming language for the PRAM. It is based on Modula-2 with additional elements for the control of parallelism. This report describes the main implementation concepts of pm2, in particular its usage of the PRAM. A compiler from pm2 to PRAM assembler has been implemented. The processor and memory management of the implementation are simple but adequate and effective enough for the experimental compiler. Moreover, the execution times of the primitives match the costs of the primitives used in PRAM algorithm theory.
Format: Postscript, 18 pp
ISBN: 951-708-112-X
Filename: A-1991-2
Title: A Prototype System for Automating Measurement and Verification
of SA Descriptions
Authors: Petteri Laamanen and Raimo Rask
Abstract:
The size of a software system provides a basis for the estimation of software cost during software development. Hence, it is important to measure the system size reliably as early as possible. The two best-known specification-level metrics are Albrecht's Function Points and DeMarco's Function Bang, which, however, lack automated estimation support. We demonstrate how the calculation of these metrics can be automated and describe a prototype assistant tool which calculates these metrics from Structured Analysis (SA) descriptions. To assure the reliability of estimation, the SA descriptions must be verified before applying the automated metrics. In our prototype system, we have implemented a set of rules for automated checking of the completeness and consistency of a structured specification.
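One completeness rule of the kind mentioned can be sketched as follows (a hypothetical rule representation, not the prototype's): every process in a dataflow diagram must have at least one incoming and one outgoing flow:

    def check_processes(processes, flows):
        """flows: list of (source, target) names; report processes that
        consume nothing or produce nothing."""
        problems = []
        for p in processes:
            if not any(t == p for _, t in flows):
                problems.append(f"{p}: no incoming flow")
            if not any(s == p for s, _ in flows):
                problems.append(f"{p}: no outgoing flow")
        return problems

    flows = [("customer", "take order"), ("take order", "orders"), ("orders", "bill")]
    print(check_processes(["take order", "bill"], flows))   # 'bill' produces nothing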
Format: Postscript, 27 pp
ISBN: 951-708-033-6
Filename: A-1991-1
Title: Algorithms for Counting Unadjusted Function Points from Dataflow
Diagrams
Authors: Raimo Rask
Abstract:
To calculate a cost estimate of a software system, we should be able to use the descriptions produced by current CASE tools to automatically measure the size of the system. Current software development environments do not, however, provide enough support for the measurement of a system description during the analysis phase of software development. This paper gives algorithms for calculating Albrecht's unadjusted function point value based on the dataflow and entity-relationship diagrams that are used during Structured Analysis.
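The final counting step is a weighted sum over the counted element types. The sketch below uses the standard average-complexity weights of Albrecht's method; deriving the counts from the diagrams is the substance of the paper and is not reproduced here:

    # Average-complexity weights of Albrecht's function point analysis.
    WEIGHTS = {"external_input": 4, "external_output": 5, "external_inquiry": 4,
               "internal_file": 10, "external_interface": 7}

    def unadjusted_fp(counts):
        return sum(WEIGHTS[kind] * n for kind, n in counts.items())

    print(unadjusted_fp({"external_input": 6, "external_output": 4,
                         "external_inquiry": 3, "internal_file": 2,
                         "external_interface": 1}))        # 24+20+12+20+7 = 83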
Format: Postscript, 18+2 pp
ISBN: 951-708-016-6