Lexical Simplification: September 2013

On Wednesday (4 September 2013) I successfully completed my end of second year interview. This means that I am now officially a third year PhD student. I am at the dead halfway point of my PhD, having completed 24 months with 24 more remaining. It has been a long road getting here and there is still a long way to go. Below is a brief analysis of what I have achieved in my PhD so far and the goals yet to come.

Completed So Far:

Literature Review: This was the first thing I did as a PhD student. Reading took up most of the first six months of my research. I consumed, refined and categorised as much of the relevant literature as I could find. I am currently attempting to publish this as a survey paper, since the only available text simplification survey is a technical report from 2008.
Lexical Simplification Errors: I recently undertook a pilot study looking at the errors thrown up by the lexical simplification pipeline. I'm looking to publish this in an upcoming conference, so won't say too much about the results here and now.
Complex Word Identification: This was the first element of the lexical simplification pipeline that I studied. I built a corpus of sentences, each with one word marked as complex, for the purpose of evaluating current methods of identification. This work was published in two separate workshop papers at ACL 2013.
Substitution Generation: Once we have identified a complex word, we must generate a set of substitutions for it. However, the words which are complex are also those least likely to be found in a thesaurus, which complicates the task. To address this I spent considerable effort learning simplifications from massive corpora, with some success. This work is also currently being written up for publication.

Still to come:

Word Sense Disambiguation: The next step in the pipeline is to apply some word sense disambiguation. This has been done before, so I will be looking at the best ways to apply it and hopefully making a novel contribution here. I am just starting out on this phase of research and am currently immersed in the WSD literature, trying to get my head round the myriad techniques that already exist there.
Synonym Ranking: At the start of my project I looked into the best way to rank synonyms according to their complexity. The small amount of work that I did back then did not discover anything radical, but it did help me to better understand the structure of a lexical simplification system. When I revisit this area it will be with the hope of making a significant contribution. I was really interested in the work David Kauchak presented at ACL 2013 and am keen to explore what more can be done in this area.
User Evaluation: Finally, I will spend some time exploring the effects of each of the modules I have developed on individual users. It is of paramount importance to evaluate text simplification in the context of the users it is aimed at, and to this end I will be focussing my research on a specific user group, although which group is as yet undecided.
Thesis: This will undoubtedly take a significant portion of my final year. The chapter titles will hopefully be the bullet points you see listed above.
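
For readers who have not met the pipeline before, the stages named in the two lists above can be sketched end to end in a few lines of Python. This is only a toy illustration and not the system described in this post: the frequency table, the thesaurus and every function name are hypothetical, a simple frequency threshold stands in for complex word identification, a dictionary lookup stands in for substitution generation, and word sense disambiguation is left out entirely.

```python
# Toy lexical simplification pipeline. All data and names are hypothetical.

# Made-up word frequencies: higher means more common, assumed simpler.
FREQ = {"the": 1000, "cat": 300, "sat": 200, "on": 900, "mat": 50,
        "perched": 5, "rested": 60, "roosted": 2}

# Made-up thesaurus mapping a word to candidate substitutions.
THESAURUS = {"perched": ["sat", "rested", "roosted"]}


def is_complex(word, threshold=10):
    """Complex word identification: flag rare (or unseen) words."""
    return FREQ.get(word, 0) < threshold


def generate_substitutions(word):
    """Substitution generation: look the word up in the toy thesaurus."""
    return THESAURUS.get(word, [])


def rank_by_simplicity(candidates):
    """Synonym ranking: most frequent candidate first."""
    return sorted(candidates, key=lambda w: FREQ.get(w, 0), reverse=True)


def simplify(sentence):
    """Run each word through identification, generation and ranking."""
    out = []
    for word in sentence.split():
        key = word.lower()
        if is_complex(key):
            ranked = rank_by_simplicity(generate_substitutions(key))
            # Replace only if the best candidate is more frequent
            # than the original word.
            if ranked and FREQ.get(ranked[0], 0) > FREQ.get(key, 0):
                word = ranked[0]
        out.append(word)
    return " ".join(out)


print(simplify("the cat perched on the mat"))
```

Note that a word missing from the toy thesaurus is left unchanged, which is exactly the difficulty described under Substitution Generation.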

So there you have it. Although it appears that I have done a lot so far, it still feels like I have a real mountain to climb. There are significant hurdles and vast amounts of reading, researching and writing ahead. I look forward to the challenges that the next two years of my PhD will bring.

Friday, September 06, 2013

3rd Year