
Natural language processing (NLP) is the area of artificial intelligence concerned with enabling machines to understand, interpret, and generate human language. From search engines to virtual assistants, NLP systems power the tools we use every day to interact with information. The projects in this section tackle two core NLP tasks: text summarization and document-based conversational retrieval, each using a different methodology and technology stack.
The first project builds a multilingual abstractive and extractive text summarizer using Google's T5 (Text-to-Text Transfer Transformer) architecture via the Hugging Face Transformers library. T5 treats every NLP task as a text-to-text problem, making it straightforward to fine-tune or prompt for summarization. Combined with the Google Translate API, the pipeline accepts input in multiple languages, translates it to English for summarization, and returns the condensed output in the original language. The notebook covers tokenization, beam search decoding, and evaluation of summary quality across different text lengths and source languages.
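The core of the summarization step can be sketched in a few lines with the Hugging Face Transformers API. This is a minimal illustration, not the project's notebook: it uses `t5-small` instead of `t5-large` to keep the download and runtime modest, uses a made-up input paragraph, and omits the translation round-trip:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# t5-small stands in for the project's t5-large to keep this demo light.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Example input text (any English passage would do here).
article = (
    "The Hubble Space Telescope was launched in 1990 and has since produced "
    "some of the most detailed images of distant galaxies ever captured. "
    "Its observations have reshaped our understanding of the age and "
    "expansion rate of the universe."
)

# T5 is text-to-text: the task is selected with a prefix on the input.
inputs = tokenizer(
    "summarize: " + article,
    return_tensors="pt",
    max_length=512,
    truncation=True,
)

# Beam search decoding, as covered in the notebook.
summary_ids = model.generate(
    inputs["input_ids"],
    num_beams=4,
    min_length=10,
    max_length=60,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```

For the multilingual pipeline, the same call sits between a translate-to-English step on the way in and a translate-back step on the way out.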
The second project creates a document-based English-speaking chatbot using Python's NLTK library and Wikipedia's content API. When a user asks a question, the system fetches relevant Wikipedia articles, preprocesses the text with tokenization and lemmatization, and ranks candidate sentences using TF-IDF vectorization and cosine similarity to find the best answer. This information retrieval approach demonstrates how classic NLP techniques remain effective for building lightweight, explainable conversational agents without requiring large-scale model training.
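The ranking step at the heart of the chatbot can be sketched as follows. This is a simplified stand-in, not the project's code: the hard-coded sentences replace text fetched from the Wikipedia API, and scikit-learn's built-in tokenizer replaces the NLTK tokenization and lemmatization pipeline, but the TF-IDF plus cosine-similarity retrieval is the same idea:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for sentences extracted from a Wikipedia article.
sentences = [
    "Python is a high-level, general-purpose programming language.",
    "Python was created by Guido van Rossum and first released in 1991.",
    "Python emphasizes code readability with its use of indentation.",
]

def answer(question: str) -> str:
    # Vectorize the candidate sentences together with the question,
    # then rank sentences by cosine similarity to the question vector.
    vectorizer = TfidfVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform(sentences + [question])
    sims = cosine_similarity(vectors[-1], vectors[:-1]).flatten()
    if sims.max() == 0:  # no lexical overlap with any sentence
        return "Sorry, I don't know the answer to that."
    return sentences[sims.argmax()]

print(answer("Who created Python?"))
```

Because the answer is always a verbatim sentence from the source document, the system's behavior is fully explainable: you can inspect exactly which sentence scored highest and why.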
Both projects include reproducible notebooks with detailed code comments and step-by-step explanations. The summarizer project shows you how transfer learning reduces the data and compute required to achieve strong results on a downstream task, while the chatbot project illustrates how sparse vector representations and simple similarity measures can power a functional question-answering system. Together they give you practical experience with both modern transformer-based methods and foundational NLP techniques that remain widely used in production systems today. You can run every notebook on Google Colab with a single click and modify the code to experiment with your own texts, languages, or knowledge sources. All dependencies are pinned for long-term reproducibility, and each section includes inline comments that explain not just what the code does but why each design choice was made.