PROJECT OVERVIEW: Background: To help meet the increasing demand for high-quality, efficient, and accurate diagnostic assessments of children's reading skills, a new paradigm for automatic assessments is proposed. Our team and approach are multidisciplinary, spanning Education, Electrical Engineering, Computer Science, Linguistics, and Neuroscience.
The project will have a profound impact by relieving teachers of much of the burden of testing (allowing them to focus more on what they do best), enabling automated testing of very young children (providing a greater leverage point for potential intervention), and supporting the inclusion of an increasingly diverse population (enabling unbiased assessment and furthering the goal of universal access). National educational priorities are emphasizing testing to a greater extent than ever before, but increased testing leaves less time for teaching. Educational policy is also pushing formal literacy instruction downward to earlier ages.
This system will provide a useful aid for learning how to help young children succeed and for monitoring their progress. The rapid growth of diverse student populations, including many non-native speakers of English, presents a challenge for fair assessment. The system helps ensure unbiased assessment of competence in a timely and useful way. We expect the project to improve assessment and instructional material in the classroom.
Purpose: The goals of our project are to develop an assessment system and tools that are helpful for teachers, test students consistently, and can automatically score and analyze children's performance on literacy assessment tasks, and to investigate literacy measures that are reliable indicators of later academic performance. The system and materials are designed to assess reliably both native speakers of English and non-native speakers of Hispanic backgrounds, and to support users with a range of assessment and content expertise. The impact of the proposed approach will be studied longitudinally from kindergarten through second grade, in partnership with the Los Angeles Unified School District (LAUSD) and UCLA's University Elementary School (UES). These schools have a highly diverse economic and ethnic student body, with more than half of the population being Hispanic.
Intervention: We have designed a data collection and analysis plan, and in the fall we plan to conduct field testing of the system with 300 children in our five collaborating schools. The purpose of field testing is to examine the efficacy of the speech recognition technology for assessing reading development. We will collect and analyze student assessment data in conjunction with standardized reading achievement data (CAT-6 for grades 1 and 2, and the Woodcock Diagnostic Reading Inventory for kindergarten) to evaluate children's overall reading development.
We will collect student assessment data across three time points during the year to monitor and evaluate student reading development over time. We will explore the relationship between emerging literacy skills (enabling skills) and standardized reading achievement data to examine the validity of the proposed reading assessments.
We are also developing a Listening Comprehension Test that parallels the Peabody Individual Achievement Test in Reading (a widely used reading comprehension test). It has not yet been used as part of our formal assessments, but we are doing some piloting work with it while creating test items.
Setting: This project is a partnership between UCLA, USC, UC Berkeley, and several elementary schools.
In Southern California, we've done data collection at five schools: Twenty-eighth Street Elementary for three years, and Esperanza, Beethoven, Para los Ninos, and UES for two years. Twenty-eighth Street, Esperanza, Para los Ninos, and Beethoven are all in LAUSD. Para los Ninos differs from the other schools in that it is a charter school, which gives it somewhat more autonomy to make local, school-based decisions. UES is a lab school on the UCLA campus and a component of the GSEIS.
Twenty-eighth Street, Esperanza, and Para los Ninos all have similar student demographics: low socioeconomic status (Title I status, meaning all students receive free lunch) and a majority of English Language Learners with Spanish as their first language. All three schools are considered gateway schools, meaning they are located in communities with a high percentage of recent immigrants, and all are clustered around the downtown Los Angeles area. Beethoven, located in West LA, has about 50 percent English Language Learners and 50 percent English-only students, and the socioeconomic status of its students is varied. UES has a range of socioeconomic statuses and ethnic backgrounds among its student body; the goal of its enrollment policy is to match the distribution of ethnicities and socioeconomic statuses to that of the state's population.
In Northern California, data collection was done at two schools: Colonial Acres Elementary (two bilingual classes whose students' first language is Mexican Spanish) and Berkley Maynard Academy (one class consisting almost entirely of speakers of African American English).
Research Design: Overall, we recorded 256 children, mainly ages five to eight and roughly evenly divided by gender. Of these, 69% were native speakers of Spanish, 24% were native speakers of English, and 5% were native speakers of both English and Spanish.
When learning to speak English, non-native talkers may pronounce some English phonemes differently from native talkers. These pronunciation variations can degrade an automatic speech recognition (ASR) system's performance. To create a fair assessment we need to model pronunciation variation.
We analyzed pronunciation variation in 4,500 word tokens spoken by eighteen 5- to 7-year-old English language learners whose first language was Spanish. A number of interesting findings were observed and reported in our publications, and were then used to improve ASR performance. An important issue our analyses point to is one also facing teachers who conduct assessments: when does an unexpected pronunciation constitute a reading error, and when does it constitute Spanish-accented English? Further analyses conducted with classroom teachers will determine whether the automated assessment system distinguishes such responses more successfully than teachers, who are perhaps less aware of the systematicity in a particular child's dialect.
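The kind of systematic variation such analyses uncover can be folded into a recognizer's pronunciation lexicon by generating accent-variant entries. A minimal sketch follows; the substitution rules and phone symbols are hypothetical illustrations of this technique, not the project's actual rule set or findings.

```python
# Illustrative sketch: expanding an ASR pronunciation lexicon with
# accent-variant entries. The rules below are hypothetical examples of
# systematic Spanish-influenced variation (assumed for illustration).

# Substitution rules: (canonical phone, variant phone) pairs.
VARIANT_RULES = [
    ("IH", "IY"),  # e.g., "ship" produced like "sheep"
    ("Z", "S"),    # /z/ devoiced to /s/
]

def expand_lexicon(lexicon):
    """Return a lexicon with accent variants added for each word.

    lexicon: dict mapping word -> list of pronunciations, where each
    pronunciation is a list of phone symbols.
    """
    expanded = {}
    for word, prons in lexicon.items():
        variants = list(prons)
        for pron in prons:
            for target, repl in VARIANT_RULES:
                if target in pron:
                    variant = [repl if p == target else p for p in pron]
                    if variant not in variants:
                        variants.append(variant)
        expanded[word] = variants
    return expanded

lexicon = {"ship": [["SH", "IH", "P"]]}
print(expand_lexicon(lexicon))
# both the canonical and the accent-variant pronunciation are retained
```

Because both pronunciations remain valid lexicon entries, the recognizer can accept an accented production as a correct reading rather than flagging it as an error.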
Findings: Our project falls readily into four categories. The first major line of development is the overall architecture of our technology-based assessment system and the user interface developed to ensure that the primary clients, students and teachers, can and will use the platform. The second component addresses issues that pertain to bilingual children. For example, second-language learners may pronounce English words differently than native talkers of English. Are there systematic pronunciation variations? How much can be attributed to the first language, and how much is due to developmental issues? When is a non-canonical pronunciation a reading error, and when is it not? Third is the "content" of the assessment system. Content has two facets: (a) the "design" of the system (which aspects represent recurring benchmark assessments, and which serve as diagnostic subroutines accessed on an as-needed basis), and (b) the actual tests themselves, including some rather novel formats and a set of guidelines that move us toward the domain-referenced rather than the norm-referenced end of the assessment continuum. The fourth key component is the speech recognition module that we have developed to capture students' responses and, in certain cases, to both categorize and "score" those responses according to standards that the research team, along with our collaborating teachers, has developed. Progress has been made on all fronts.
1. Interface, database, and ASR design
To enable interactive and child-friendly assessment, we designed a conversational student interface with multimodal capabilities (e.g., supporting visual display and touch-screen input). For the system to be useful for teachers, it is also necessary to develop a database that stores all the audio files from the assessments, plus associated metadata (e.g., student identification, particular test material) and derived data (e.g., the words misread or mispronounced). To support instructional planning, teachers will be able to query the database via a separate teacher interface. For example, a teacher could ask "How did child X's performance in November 2005 on letter sound knowledge and phonemic awareness compare with her performance on the same tasks in September 2005?" We have also developed ASR algorithms that can recognize children's speech.
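The teacher query above could be answered by a simple comparison over the assessment-results table. As a minimal sketch using an in-memory SQLite database (the table and column names here are assumptions for illustration, not the project's actual schema):

```python
# Sketch of the kind of query the teacher interface could issue against
# the assessment database. Schema and names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        student_id TEXT,   -- which child
        task       TEXT,   -- e.g., 'letter_sound', 'phonemic_awareness'
        test_date  TEXT,   -- ISO date of the assessment
        score      REAL    -- proportion correct
    )""")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [("childX", "letter_sound", "2005-09-15", 0.60),
     ("childX", "letter_sound", "2005-11-15", 0.85)])

# Compare child X's September and November performance on the same task.
rows = conn.execute("""
    SELECT test_date, score FROM results
    WHERE student_id = ? AND task = ?
    ORDER BY test_date""", ("childX", "letter_sound")).fetchall()
for date, score in rows:
    print(date, score)
```

Returning the scores in date order lets the interface render the child's growth on a task across the year's assessment points.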
2. Content Design and Teacher Practices
We have developed the content of the speech recognition technology system, designed to provide teachers with a broad array of classroom-based, formative assessments that can be used to monitor progress in children's reading development in kindergarten through second grade, and to give teachers the information they need to adjust their instruction to meet children's needs. The individual assessments are situated in an original and comprehensive framework of assessment and instruction that addresses the specific, critical skills, identified in the reading literature, that children need to acquire in the early grades of schooling in order to become proficient readers. To maximize the instructional utility of the assessments, the framework was developed in consultation with John Shefelbine, an expert in reading instruction from California State University, Sacramento, and in collaboration with a group of expert reading teachers from schools in Los Angeles, who met regularly with project team members. Our work was guided by three essential questions posed by Shefelbine:
Are the assessments embedded in an instructional framework?
What is the instructional value of the information?
How much assessment is too much?
As a result of these guiding questions, the framework does not assume that all children take all assessments. Rather, it is hierarchical in nature, with core assessments for all children as a check on progress and drill-down assessments related to specific skills on an as-needed basis. Using the information from the core assessments, teachers determine whether sufficient information has been gained to plan the next instructional steps, or whether certain drill-down assessments are necessary for diagnostic purposes. We investigated a significant number of available reading assessments, and the framework makes use of some existing assessments (e.g., the Basic Phonic Skills Test (BPST), which is widely used in California, and the San Diego Quick Assessment List, which has been in use for many years across the country). Other assessments in the framework have been adapted from existing tests or designed from scratch by the content team, and offer innovative ways to elicit the requisite information for teachers about specific skills in reading.
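The core/drill-down flow described above can be sketched as a simple decision rule: every child takes the core assessments, and a drill-down assessment is suggested only when a core score falls below a mastery threshold. The task names, mapping, and threshold below are hypothetical illustrations, not the framework's actual assessments or cut scores.

```python
# Sketch of the hierarchical assessment flow: core assessments for all
# children, drill-down assessments only where the core results warrant.
# All names and thresholds are assumed for illustration.

CORE_TO_DRILL_DOWN = {
    "phonemic_awareness": ["phoneme_blending", "phoneme_segmentation"],
    "word_reading": ["letter_sound_knowledge", "decoding"],
}
MASTERY_THRESHOLD = 0.80  # hypothetical cut score

def plan_assessments(core_scores):
    """Return the drill-down assessments suggested by the core results.

    core_scores: dict mapping core task name -> proportion correct.
    """
    plan = []
    for task, score in core_scores.items():
        if score < MASTERY_THRESHOLD:
            plan.extend(CORE_TO_DRILL_DOWN.get(task, []))
    return plan

# A child strong in word reading but weak in phonemic awareness gets
# only the phonemic-awareness drill-downs.
print(plan_assessments({"phonemic_awareness": 0.65, "word_reading": 0.90}))
```

In practice the teacher, not the system, makes the final call; the rule only surfaces candidates, which keeps the total amount of assessment bounded, in the spirit of Shefelbine's third question.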
PROJECT PUBLICATIONS: 1. S. Panchapagesan and A. Alwan, "Multi-parameter Frequency Warping for VTLN by Gradient Search", IEEE ICASSP 2006 Proceedings.
2. M. Iseli, Y. Shue, and A. Alwan, "Age- and Gender-Dependent Analysis of Voice Source Characteristics", IEEE ICASSP 2006 Proceedings.
3. X. Cui and A. Alwan, "Robust Speaker Adaptation by Weighted Model Averaging Based on the Minimum Description Length Criterion," to appear, IEEE Transactions on Speech and Audio Processing, Jan. 2007.
4. X. Cui and A. Alwan, "Adaptation of Children's Speech with Limited Data Based on Formant-like Peak Alignment," Computer Speech and Language, to appear, 2007.
5. H. You, A. Alwan, A. Kazemzadeh and S. Narayanan, "Pronunciation Variation of Spanish-accented English Spoken by Young Children," Eurospeech 2005, pp. 749-752.
6. A. Kazemzadeh, H. You, M. Iseli, B. Jones, X. Cui, M. Heritage, P. Price, E. Anderson, S. Narayanan and A. Alwan, "TBALL Data Collection: the Making of a Young Children's Speech Corpus," Eurospeech 2005, pp. 1581-1584.
7. X. Cui and A. Alwan, "MLLR-Like Speaker Adaptation Based on Linearization of VTLN with MFCC features," Eurospeech 2005, pp. 273-276.
8. S. Lee, S. Narayanan, and D. Byrd, "A Developmental Acoustic Characterization of English Diphthongs," Journal of the Acoustical Society of America, 115(5,2): 2628 (presented at the Acoustical Society of America meeting, New York, New York, May 2004).
9. Jorge Silva, and Shrikanth S. Narayanan, "A Statistical Discrimination Measure for Hidden Markov Models based on Divergence," Proceedings of ICSLP, 2004.
10. Simona Montanari, Serdar Yildirim, Sonia Khurana, Marni Landes, Lewis Lawyer, Elaine Andersen and Shrikanth Narayanan, "Analyzing the interplay between spoken language and gestural cues in conversational child-machine interactions in pre/early literate age groups," InSTIL/ICALL Symposium 2004: NLP and Speech Technologies in Advanced Language Learning Systems, June 2004, Venice, Italy.
11. Montanari S, Yildirim S, Andersen E and Narayanan S, "Reference marking in children's computer-directed speech: An integrated analysis of discourse and gesture", Proceedings of ICSLP, 2004.
12. E. Andersen and P. Price, "Assessing Reading Skills in Young Children: The TBALL Project", CRESST Conference, UCLA, Sept. 9th, 2004.
13. X. Cui and A. Alwan, "Combining Feature Compensation and Weighted Viterbi Decoding for Noise Robust Speech Recognition with Limited Adaptation Data," in Proc. ICASSP, pp. 969-972, Montreal, Canada, May 2004.
14. M. Iseli and A. Alwan, "An Improved Correction Formula for the Estimation of Harmonic Magnitudes and its Application to Open Quotient Estimation," in Proc. ICASSP, pp. 669-672, Montreal, Canada, May 2004.
15. H. You and A. Alwan, "Entropy-based Variable Frame Rate Analysis of Speech Signals and its Application to ASR," in Proc. ICASSP, pp. 549-552, Montreal, Canada, May 2004.
16. An article on our work was published in the UCLA Engineer Magazine highlighting the project and its cross-disciplinary aspects. The magazine reaches a wide audience of technical and non-technical people. http://www.engineer.ucla.edu/magazine/tball.html
17. AERA 2006 presentation: "TBall: Technology-Based Assessment of Language and Literacy," P. David Pearson, Abeer Alwan, Shri Narayanan, Elaine Andersen, Alison Bailey, Patti Price, Margaret Heritage, Eva L. Baker, Richard Muntz, Carlo Zaniolo, Christy Kim Boscardin, Barbara Ann Jones, Larry Casey, Kimberly Reynolds, Taho Michelle Duong, Maria Callahan, Markus Iseli, Hong You, Xiaodong Cui, Yijian Bai, Abe Kazemzadeh, Joe Tepperman, Sungbok Yee, Jorge Silva.