Mo El-Haj, PhD

College of Engineering and Computer Science

Associate Professor (Reader), Data Science Program

Director of VinNLP Research Group

Biography

Dr Mo El-Haj is an Associate Professor (Reader) in NLP in the Data Science Program, College of Engineering and Computer Science, VinUniversity, and a Visiting Researcher at the School of Computing and Communications, Lancaster University, UK. He is also the Director of VinNLP, VinUniversity’s NLP Research Group (vinnlp.com).

He holds a PhD in Computer Science from the University of Essex in the United Kingdom and brings over 12 years of experience in Higher Education. He previously worked at Lancaster University, where he served as a Senior Lecturer (Associate Professor) in Computer Science, specialising in Natural Language Processing (NLP). Prior to that, he worked at the UK Data Archive at the University of Essex as a Data Mining Developer and NLP Researcher, contributing to projects focused on data analysis and text processing.

An active researcher since 2006, Dr El-Haj has authored over 90 peer-reviewed publications in top-tier venues, including EMNLP, COLING, LREC, ACL, IEEE Big Data, IJCL, and NLDB. His work is widely cited within the NLP community, with more than 70 papers receiving over 10 citations each.

Dr El-Haj’s research focuses on low-resourced and underrepresented languages, including Welsh, Arabic, Igbo, and Hindi. He has led projects developing and fine-tuning large language models (LLMs) and building linguistic resources to support these languages in tasks such as text summarisation, translation, and sentiment analysis.

He is a Fellow of the Higher Education Academy (FHEA) – Advance HE, UK. His achievements include being part of the winning team for the best audience-facing tool at the BBC NewsHack event in London, a fully funded internship at the National Institute of Informatics in Tokyo, Japan, and receiving the Best Paper Award at the 4th LTC Conference in Poznan, Poland.

In addition to his research contributions, Dr El-Haj has served as chair, editor, or reviewer for over 50 journals and conferences and is a member of the Editorial Board for the Natural Language Engineering (NLE) journal. He also serves on the Advisory Board of the Natural Language Processing Book Series.

Throughout his career, Dr El-Haj has secured over $1.2 million in research funding, leading impactful projects across healthcare, finance, and cultural preservation. He remains deeply committed to advancing NLP technologies to address societal challenges and fostering collaborations with colleagues and students.

RESEARCH INTERESTS

  • Natural Language Processing
  • Large Language Models (LLMs)
  • Low-Resourced Languages
  • Automatic Text Summarisation
  • Machine Learning and Machine Translation
  • Resource Building for Underrepresented Languages
  • NLP for Healthcare, Finance, and Cultural Preservation
  • Financial Narrative Processing
  • Arabic Dialect NLP
  • Readability Estimation of Educational Materials
  • Cybersecurity and NLP for Threat Detection
  • Sentiment and Emotion Analysis
  • Information Extraction and Retrieval

TEACHING INTERESTS

  • Natural Language Processing
  • Artificial Intelligence and Machine Learning
  • Data Science
  • Text Analytics
  • Databases
  • Data Visualization

SELECTED PUBLICATIONS

  1. El-Haj, M., Rayson, P., Walker, M., Young, S., Simaki, V. (2019). “In Search of Meaning: Lessons, Resources and Next Steps for Computational Analysis of Financial Discourse.” Journal of Business Finance & Accounting, 46(3-4), 265-306. [Highly cited, 148 citations]
  2. El-Haj, M., Alves, P., Rayson, P., Walker, M., Young, S. (2020). “Retrieving, Classifying and Analysing Narrative Commentary in Unstructured (Glossy) Annual Reports Published as PDF Files.” Accounting and Business Research, 50(1), 6-34. [Highly cited, 121 citations]
  3. Morris, J., Ezeani, I., Gruffydd, I., Young, K., Davies, L., El-Haj, M., Knight, D. (2024). “Welsh Automatic Text Summarisation.” Language and Technology in Wales: Volume II, Bangor University.
  4. Phillips, J., El-Haj, M., Hall, T. (2024). “Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development.” 1st International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS), Lancaster, UK.
  5. El-Haj, M., Ezzini, S. (2024). “The Multilingual Corpus of World’s Constitutions (MCWC).” 6th Workshop on Open-Source Arabic Corpora and Processing Tools, LREC-COLING 2024, Turin, Italy.
  6. El-Haj, M., Almujaiwel, S., Premasiri, D., Ranasinghe, T., Mitkov, R. (2024). “DARES: Dataset for Arabic Readability Estimation of School Materials.” DeTermIt! Workshop, LREC-COLING 2024, Turin, Italy.
  7. Onah, D. F. O., Pang, E. L. L., El-Haj, M. (2022). “A Data-driven Latent Semantic Analysis for Automatic Text Summarization using LDA Topic Modelling.” IEEE International Conference on Big Data, Osaka, Japan.
  8. El-Haj, M., Rayson, P., Zmandar, N. (2021). “Multilingual Financial Word Embeddings for Arabic, English, and French.” IEEE International Conference on Big Data.
  9. Saxena, P., El-Haj, M. (2023). “Exploring Abstractive Text Summarisation for Podcasts: A Comparative Study of BART and T5 Models.” Recent Advances in Natural Language Processing (RANLP), Varna, Bulgaria.
  10. Zmandar, N., El-Haj, M., Rayson, P. (2023). “FinAraT5: A Text-to-Text Model for Financial Arabic Text Understanding and Generation.” 4th Conference on Language, Data and Knowledge (LDK), Vienna, Austria.
  11. Chukwuneke, C. I., Ezeani, I., Rayson, P., El-Haj, M. (2023). “IgboNER: Expanding Named Entity Recognition Datasets via Projection.” AfricaNLP Workshop, ICLR 2023, Kigali, Rwanda.
  12. Ezeani, I., El-Haj, M., Morris, J., Knight, D. (2022). “Introducing the Welsh Text Summarisation Dataset and Baseline Systems.” LREC 2022, Marseille, France.
  13. El-Haj, M., de Souza, E., Khallaf, N., Rayson, P., Habash, N. (2022). “AraSAS: The Open Source Arabic Semantic Tagger.” OSACT Workshop, LREC 2022, Marseille, France.

EDUCATION AND QUALIFICATIONS

  • 2021: Fellowship of the Higher Education Academy (FHEA), Advance HE, UK
  • 2012: PhD in Computer Science, University of Essex, UK
  • 2008: MSc in Information Systems, University of Jordan, Jordan
  • 2005: BSc in Computer Information Systems, University of Jordan, Jordan

SELECTED AWARDS AND HONORS

  • 2023: Best Presentation Award, LITHME Training School, Kosovo
  • 2021: Fellowship of the Higher Education Academy (FHEA), Advance HE, UK
  • 2016: Winning Team for Best Audience-Facing Tool, BBC NewsHack Event, London, UK
  • 2012: Partially Funded PhD Scholarship, University of Essex, UK
  • 2011: Fully Funded Internship, National Institute of Informatics, Tokyo, Japan
  • 2009: Best Paper Award, 4th Language and Technology Conference (LTC), Poznan, Poland

SELECTED GRANTS AND FUNDED PROJECTS

  • 2025: $10,160 – Using NLP to Monitor Water Pipe Bursts (Funder: South West Water, UK, Role: Consultant)
  • 2024: $10,160 – Pilot Welsh Language Model (Funder: Welsh Government, Role: Principal Investigator)
  • 2024: $12,700 – DigiGrid for Welsh Language Resources (Funder: Welsh Government, Role: Principal Investigator)
  • 2024: $12,700 – Catalyst Fund for Advancing Celtic NLP Research (Funder: Faculty of Science and Technology, Lancaster University, Role: Principal Investigator)
  • 2024: $127,000 – FreeTxt: Supporting Bilingual Free-Text Survey and Questionnaire Data Analysis (Funder: AHRC, Role: Co-Investigator)
  • 2023: $54,102 – Talent Track Application for NLP and Econometric Techniques (Funder: SAMF, Denmark, Role: Consultant)
  • 2023: $38,100 – Canadian Annual Reports Extractor (CARE) (Funders: Mitacs, HEC Montreal, University of Waterloo, CPA Canada, Role: Principal Investigator)
  • 2022: $114,300 – Using Word Embeddings to Create a Thesaurus of Contemporary Welsh (Funder: Welsh Government, Role: Principal Investigator)
  • 2022: $114,300 – Welsh Summary Creator (WSC) (Funder: Welsh Government, Role: Principal Investigator)
  • 2021: $46,990 – CLARA-Fin: Readability and Simplification in Financial Narrative (Funder: Spanish Agency for Research, Role: Consultant)
  • 2021: $27,940 – An Assessment of Corporate Disclosures from Accounting Standards 15 (Funder: IAAER/KPMG, Role: Principal Investigator)
  • 2021: $7,620 – Arabic USAS Semantic Tagger (AraSAS) (Funder: Research Incentive Fund, Zayed University, UAE, Role: Principal Investigator)
  • 2018: $44,450 – FinT-esp: Financial Texts in Spanish (Funder: Spanish Agency for Research, Role: Consultant)

PhD Students

  • Gigi Alshahrani: Scope For Using Machine Learning to Detect Offensive Content in Different Arabic Dialects
  • Salim Al Mandhari: Arabic Automatic Readability Assessment
  • Damith Dola Mullage: Deep Learning Models to Identify Ethical Misconducts in Legal Documents
  • Chiamaka Chukwuneke: Named Entity Recognition for African Languages: A Focus on Igbo
  • Jesse Phillips: The Automated Generation of Meaningful and Coherent Source Code Documentation Using Natural Language Processing Techniques
  • Dr Nadhem Zmandar: Multilingual Financial Summarization

PhD External Examiner for:

  • Dr Fatimah Al-Qahtani, King’s College London (KCL), England, UK
  • Dr Taghreed Tarmom, University of Leeds, England, UK
  • Dr Alaa Alqahtani, University of Birmingham, England, UK
  • Dr Chatrine Qwaider, University of Gothenburg, Göteborg, Sweden
  • Dr Mohammed Hamed Altamimi, Bangor University, Wales, UK
  • Dr Maher Itani, Sheffield Hallam University, England, UK

PhD Internal Examiner for:

  • Dr Matthew Coole, Lancaster University, England, UK
  • Dr Edward Dearden, Lancaster University, England, UK
  • Dr Lama Alsudias, Lancaster University, England, UK
  • Dr Ronghui Mu, Lancaster University, England, UK (Chair)