Digital Speech Processing (4-6 credits) 175317
Digitaalinen puheenkäsittely
The course is an optional laudatur-level course (special/advanced studies, erikoiskurssi).
Course description
Speech is the oldest and most efficient means of communication between
human beings, but it is not as simple for computers as one might think!
This course gives an introduction to techniques for processing and
analyzing digital speech and audio signals. The topics include spectral
analysis methods, fundamental frequency and formant tracking, denoising,
and feature extraction for recognition applications. A short introduction
to pattern matching techniques such as dynamic time warping and Gaussian
mixtures will also be given. These are the basic components of most
speech technology applications, including speech and speaker recognition,
speech coding, and synthesis.
News
Final results. Please remember to give course feedback as well. Thanks & Happy Easter to everyone! :)
Final seminar: Tuesday 8.4.2009, at 11:15 - 13:00, T/B 180
Instructions for the project (added 31.3.2009)
The first exam: Thursday 19.3.2009 at 12-14 (sharp!) in T/D 106.
Sample problems for practicing for the exam, with solutions
Teachers & material
Lectures: Tomi Kinnunen (E-mail: tkinnu@cs.joensuu.fi, Office: T/B 358)
Exercises: Rahim Saeidi (E-mail: rahim@cs.joensuu.fi, Office: T/B 356)
Schedule
Note! The course starts already in week 9; the first lecture is on
Wednesday 25.2. The course finishes in week 15 (6.-12.4.2009), before
Easter. There are two lectures and two exercise groups each week,
amounting to ~26 hours of lectures and ~24 hours of exercises.
The schedule in week 9 (23.2. - 1.3.) is exceptional:
Lecture: Wednesday 25.2 at 14-16, T/D 106
Lecture: Thursday 26.2 at 10-12, T/D 106
Exercise: Thursday 26.2 at 16-18, T/B 247 (computer class)
Exercise: Friday 27.2 at 12-14, T/B 178 (computer class)
"Normal schedule" after the first week:
Lecture: Monday 12-14, 2D106B (Time and room changed!)
Lecture: Tuesday 10-12, T/D 106, except 17.3 at 12-14, 2D106B
Exercise: Monday 14-16, T/B 247 (computer class)
Exercise: Tuesday 14-16, T/B 247, except 10.3 in T/B 178 (computer class)
Last lecture (recap of the course): Monday 6.4 at 10-12, T/D 106
Exam: Tuesday 7.4. at 16-19, T/D 106.
Preliminary knowledge
Sufficient knowledge of mathematics and programming. Knowledge of digital
signal processing and Matlab will be helpful but is not necessary. The
course is recommended for students in their 3rd year or later, or for
anyone who is motivated to learn!
Course implementation
- The course will consist of lectures ("theory"), exercises
("practice") and a considerable number of exercises in Matlab
("the real practice"). It is possible to do an additional
research-oriented project worth 1-2 credits after the course.
- Exercises will consist of both traditional paper-and-whiteboard
exercises and computer exercises using Matlab. For the
"traditional" exercises, the 1/3 requirement applies; extra points are
awarded up to 10% of the maximum exam points.
- Some of the Matlab exercises will be compulsory and must be
submitted before taking the exam. These practical exercises will carry
a high weight in the grading.
- Grading (roughly): 40% for the compulsory Matlab exercises, 60%
for the exam. Extra points for completing more than 1/3 of the standard
exercises (up to 10% maximum).
Recommended literature
Well-suited for computer scientists: X. Huang, A. Acero, H.-W. Hon, Spoken Language Processing: a
Guide to Theory, Algorithm, and System Development, Prentice Hall PTR, 2001.
Benesty, Jacob; Sondhi, M. M.; Huang, Yiteng (Eds.), Springer Handbook of Speech Processing,
1176 pages, 2008.
Suitable for signal engineers: T.F. Quatieri, Discrete-Time Speech Signal Processing: Principles and
Practice, Prentice Hall PTR, 2002.
Deller, Hansen, Proakis: Discrete-Time Processing of Speech Signals, 2nd edition, IEEE Press, 2000.
J. Harrington and S. Cassidy, Techniques in Speech Acoustics, Kluwer, 1999.
To study the basics of DSP, I recommend DSPguide, a free online book.