This course was offered several years ago as LIS 9732/9832 (Language & Computer Technologies for Libraries & Beyond). The sample content below reflects its initial offering. It was essentially a 'soft' introduction to Natural Language Processing (NLP) for MLIS students with non-technical backgrounds.
A new FIMS-wide course will be more inclusive of the interests of the wider FIMS audience (see the brief Western University course catalogue description). The revamped and much updated version of the course will place greater emphasis on grasping technical details. We will not shy away from hands-on data analysis with Python in a series of tutorials. NLP technologies have made considerable advances in recent years: think Alexa, Google Translate, predictive auto-correction in search, and recommender systems. There is increased demand for specialists who understand the inner workings of such tools, apps, and methods. FIMS students are well positioned to connect users' and developers' perspectives and to comment intelligently on these technologies: their pros and cons, their adoption, their potential harms, and related issues such as misuse and privacy violations. (Last update: 01/06/2021)
Location: FIMS & Nursing Building at the University of Western Ontario, London, CANADA
Planned for the 2020/21 Academic Year, but still TBD.
- To gain an awareness and appreciation of the complexity of natural language.
- To analyze the research literature on linguistic and computational aspects of natural language processing techniques.
- To critically evaluate a variety of applications that use natural language processing technologies.
- To connect NLP technologies and library applications in an innovative way.
- (Optional: To gain practical experience in basic text analysis with NLP techniques and/or in advanced NLP applications).
Introduction to linguistic and computational aspects of natural language processing technologies. Familiarity with underlying principles and techniques required to perform all levels of language understanding and processing of naturally occurring text. Critical assessment of the use of language technologies in a variety of applications.
See also a Prezi: How are Language Technologies and LIS(&T) interlinked?
Linguistic and Computing Basics:
- Computing with Words.
- Phonetics: Speech. Sound Structure. Phoneme Classifications. Statistical vs. Symbolic NLP
- Corpus Linguistics: Collocations. Concordances. Annotation.
- Lexicology: Corpora. Lexicons. WordNet
- Morphology: Components of Words. Informative Affixes. Stemming and Lemmatizing.
- Part-of-Speech Tagging: Challenges. Approaches. Accuracy.
- Parsing: Phrase Structures. Context Free Grammars. Methods.
- Semantic Networks. Thematic Roles. Frames. Case Grammars. Conceptual Graphs.
- Discourse: Cohesion. Anaphora. Co-reference resolution. Discourse Structure. Sublanguage
- Pragmatics: Speech Act Theory. Gricean Maxims. Dialogues. Plan Recognition. Subjectivity.
- Final Thoughts: Myths & Reality
- Machine Translation. Automated Summarization. Question Answering.
- Natural Language Interaction: Dialogue Systems. Chatbots. Speech Recognition.
- NLP in Information Retrieval (IR). Cross-language IR. Multimedia. NLP in Image IR.
- Mining Content of Social Software Sites. Analysis of Social Tags. Information Extraction. Text Mining.
- Computer Assisted Language Learning. Language Identification. Terminology Alignment and Comprehension Aids.
- Authoring Aids. Automatic Indexing.
- Assistive Technologies (for people with disabilities).
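To give a flavour of the corpus linguistics topic above (collocations and concordances), here is a minimal Python sketch that is not taken from the course materials. It builds a keyword-in-context (KWIC) concordance and counts adjacent word pairs as a crude first step toward spotting collocations. The tokenizer, the function names, and the sample sentences are hypothetical; dedicated toolkits typically rank collocations with statistical association measures (e.g., pointwise mutual information) rather than raw frequency.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, width=3):
    """Return keyword-in-context (KWIC) lines: up to `width` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

def top_bigrams(tokens, n=3):
    """Count adjacent word pairs (raw frequency only), a naive starting point for collocations."""
    return Counter(zip(tokens, tokens[1:])).most_common(n)

# Hypothetical sample text, invented for illustration.
text = ("The library catalogue lists every book. Readers search the library catalogue "
        "by author, and the library staff update the catalogue weekly.")
tokens = tokenize(text)
for line in concordance(tokens, "catalogue"):
    print(line)
print(top_bigrams(tokens))
```

Aligning every occurrence of the keyword in a central column is what makes a concordance useful: recurring neighbours such as "the library" become visible at a glance.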
Master’s students will write three papers related to the chosen language technology application – two short papers (4 pages each) and a cumulative in-depth final paper (10 pages). Each paper will be presented to the class prior to submission.
Master’s students will report in class twice. The first, 5-minute presentation will provide the class with a non-technical overview of the chosen application. The second, 10-minute presentation will cover technical details of the methods and connect them to linguistic fundamentals. Live system demonstrations are encouraged.
Doctoral students have an additional project assignment. Their presentation and paper will discuss the individually customized project: its purpose, data, and design; its basic findings (if applicable); and the value and limitations of the chosen method or system.
All posters are to be presented in an open poster session in the last class. Expect a 5-10 minute discussion with the instructor and interactions of varying length with fellow students and any interested faculty members who attend.
Methods of Evaluation:
The class is cross-listed for Master's and Doctoral students: course requirements, the professor's expectations, and marking schemes are adjusted accordingly.
EVALUATION Master’s Students (LIS 9732)
- Participation 10 %
- Short overview paper (15%) and presentation (5%) 20 %
- Technical paper (20%) and presentation (10%) 30 %
- Final NLP application paper (30%) and poster presentation (10%) 40 %
EVALUATION Doctoral Students (LIS 9832)
- Participation 10 %
- Two written summaries of readings 10 %
- Short overview paper (10%) and presentation (5%) 15 %
- Technical paper (15%) and presentation (5%) 20 %
- Project description paper (15%) and presentation (5%) 20 %
- Final NLP application paper (15%) and poster presentation (10%) 25 %
More Informally About the Course:
What's it about? In essence, this course is a gentle transition from a humanities background towards a more technologically oriented way of thinking. I invite you to think about the role of computers in acquiring, analyzing, organizing, providing access to, and making sense of textual information. We will concentrate on understanding the capabilities and limits of current natural language technologies. We will discuss ways in which people have used language analysis to organize textual information meaningfully.
You will explore how "intelligent computers" can assist us in libraries. Beyond libraries themselves, are there other text-intensive environments (directly or indirectly related to libraries) where language and information technologies can be applied?
Have you, as an individual or professional, ever come across the information overload problem? Are you familiar with machine translation, automated summarization, question answering, information retrieval and extraction, or automatic indexing? What is their current state of the art, and what are their advantages and limitations? Can they be of any help to you in your professional capacity as a librarian or information scientist?
The class is a combination of lectures, in-class discussions, projects, independent reading, presentations, and papers. We learn actively in class by doing hands-on exercises.
While no programming is required, be prepared to look at sequences of pseudo-code steps, which are necessary for understanding how computer programs work at a conceptual level. No prior background in linguistics is necessary. Bring your curiosity about languages and computers and, most importantly, keep your mind open!
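To illustrate the kind of step-by-step logic meant here, the sketch below spells out a toy suffix-stripping stemmer, one of the morphology topics (stemming and lemmatizing) listed earlier. It is written in Python, the language named for the planned tutorials, but it is not part of the original course materials. The suffix list and the three-letter minimum stem are hypothetical simplifications; established algorithms such as the Porter stemmer apply many more rules and conditions.

```python
# A toy suffix-stripping stemmer. The suffix list and the minimum-stem rule are
# illustrative assumptions, far simpler than real stemming algorithms.
SUFFIXES = ["ness", "ing", "ies", "ed", "ly", "s"]
MIN_STEM = 3  # hypothetical rule: never leave a stem shorter than 3 letters

def toy_stem(word):
    """Lowercase the word, try suffixes longest-first, and strip the first one that fits."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Strip only if enough of the word remains to count as a stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word  # no rule applied: return the word unchanged

for w in ["Cataloguing", "libraries", "indexed", "darkness", "is", "quickly"]:
    print(w, "->", toy_stem(w))
```

Stems such as "librar" or "catalogu" are not dictionary words; producing a valid dictionary form ("library") is what lemmatizing aims to do instead.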