A comprehensive Natural Language Processing (NLP) tool designed for analyzing newspaper articles using various NLP techniques. This interactive tool features a user-friendly GUI and provides detailed analysis with visualizations.
- Tokenization: Break text into sentences and words
- Stopword Removal: Filter out common words for better analysis
- Named Entity Recognition (NER): Identify people, places, organizations, and other entities
- Part-of-Speech (POS) Tagging: Analyze grammatical structure
- Sentiment Analysis: Determine emotional tone and objectivity
- Word clouds for visual text representation
- POS distribution charts (bar and pie charts)
- Named entity type distribution
- Sentiment analysis graphs
- Statistical summaries
- `nltk`: Natural Language Toolkit
- `spacy`: Advanced NLP library
- `textblob`: Simplified text processing
- `pandas`: Data manipulation
- `matplotlib`: Plotting library
- `seaborn`: Statistical data visualization
- `wordcloud`: Word cloud generation
- `ipywidgets`: Interactive widgets for Jupyter
The following NLTK datasets are automatically downloaded:
- `punkt`: Sentence tokenization
- `punkt_tab`: Additional tokenization resources
- `stopwords`: Common stopwords
- `vader_lexicon`: Sentiment analysis
- `averaged_perceptron_tagger`: Part-of-speech tagging
- `en_core_web_sm`: English language model
```python
# Run this cell first to install all required packages
!pip install nltk spacy textblob ipywidgets matplotlib seaborn wordcloud
!python -m spacy download en_core_web_sm
```

If you are running the notebook locally rather than in Colab, use the same commands in a terminal without the leading `!`.

- Copy the code into a Google Colab notebook
- Run the installation cell first
- Execute the main code to launch the GUI
- Paste your article (150-300 words recommended) into the text area
- Click "Analyze Article" to see comprehensive NLP analysis
The tool provides detailed analysis including:
- Text statistics (word count, sentence count)
- Tokenization results
- Stopword removal statistics
- Named entities with classifications
- POS tag distributions
- Sentiment scores with interpretations
- Visual representations (word clouds, charts)
- Splits text into sentences and individual words
- Cleans text by removing punctuation and converting to lowercase
- Provides word and sentence counts
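The tokenization step can be sketched in plain Python. This is a simplified illustration (regex-based splitting, with a hypothetical `tokenize_text` helper) of the cleaning and counting described above; the tool itself uses NLTK's `sent_tokenize` and `word_tokenize`, which handle abbreviations and edge cases far better:

```python
import re

def tokenize_text(text):
    """Split text into sentences and lowercase word tokens (simplified sketch)."""
    # Naive sentence split on terminal punctuation; NLTK's punkt model is smarter
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Remove punctuation and lowercase, keeping only word-like tokens
    words = re.findall(r"[a-z0-9']+", text.lower())
    return sentences, words

sentences, words = tokenize_text("NLP is fun. It powers many tools!")
print(len(sentences), len(words))  # 2 7
```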
- Filters out common English words (the, and, is, etc.)
- Shows before/after comparison
- Improves focus on meaningful content words
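The before/after comparison can be illustrated with a tiny hand-rolled stopword set; this is a sketch only, as the tool itself filters against NLTK's full English list from `nltk.corpus.stopwords`:

```python
# Tiny illustrative stopword set; the real tool uses
# nltk.corpus.stopwords.words("english")
STOPWORDS = {"the", "and", "is", "a", "of", "to", "in"}

def remove_stopwords(words):
    """Filter out stopwords and report a before/after comparison."""
    filtered = [w for w in words if w.lower() not in STOPWORDS]
    print(f"Before: {len(words)} words, after: {len(filtered)} words")
    return filtered

tokens = ["the", "climate", "is", "changing", "and", "scientists", "agree"]
print(remove_stopwords(tokens))  # ['climate', 'changing', 'scientists', 'agree']
```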
- Identifies and classifies entities:
- PERSON: Names of people
- ORG: Organizations, companies
- GPE: Countries, cities, states
- DATE: Dates and time periods
- MONEY: Monetary values
- And more...
- Tags each word with grammatical role:
- NOUN: People, places, things
- VERB: Actions, states
- ADJ: Descriptive words
- ADV: Modifiers
- And more...
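The POS distribution behind the charts can be computed with `collections.Counter`. The sketch below hard-codes `(word, tag)` pairs in the shape NLTK's `pos_tag` returns, so it runs without any data downloads; the coarse-tag mapping is an illustrative assumption:

```python
from collections import Counter

# Example (word, tag) pairs in the shape nltk.pos_tag returns
tagged = [("Scientists", "NNS"), ("warn", "VBP"), ("of", "IN"),
          ("rapid", "JJ"), ("climate", "NN"), ("change", "NN")]

# Collapse fine-grained Penn Treebank tags into coarse classes for charting
COARSE = {"NN": "NOUN", "NNS": "NOUN", "VBP": "VERB", "JJ": "ADJ", "IN": "ADP"}
distribution = Counter(COARSE.get(tag, tag) for _, tag in tagged)
print(distribution["NOUN"], distribution["VERB"])  # 3 1
```

A dictionary like `distribution` feeds directly into matplotlib bar or pie charts.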
- Polarity: Measures positive/negative sentiment (-1 to +1)
- Subjectivity: Measures objectivity/subjectivity (0 to 1)
- Classification: Positive, Negative, or Neutral
- Journalism Analysis: Analyze news articles for bias and entity coverage
- Content Research: Extract key entities and topics from articles
- Educational Tool: Learn NLP concepts with hands-on analysis
- Text Mining: Process large collections of news articles
- Sentiment Monitoring: Track sentiment trends in news coverage
The tool comes with a pre-loaded sample article about climate change to demonstrate all features. You can replace it with your own content for analysis.
- Polarity > 0.1: Positive sentiment
- Polarity < -0.1: Negative sentiment
- -0.1 ≤ Polarity ≤ 0.1: Neutral sentiment
- Subjectivity > 0.5: Subjective/opinionated
- Subjectivity ≤ 0.5: Objective/factual
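The thresholds above translate directly into a small helper (the name `interpret_sentiment` is illustrative, not the tool's actual method) applied to TextBlob-style polarity and subjectivity scores:

```python
def interpret_sentiment(polarity, subjectivity):
    """Classify TextBlob-style scores using the documented thresholds."""
    if polarity > 0.1:
        label = "Positive"
    elif polarity < -0.1:
        label = "Negative"
    else:
        label = "Neutral"
    tone = "Subjective/opinionated" if subjectivity > 0.5 else "Objective/factual"
    return label, tone

print(interpret_sentiment(0.35, 0.2))   # ('Positive', 'Objective/factual')
print(interpret_sentiment(-0.05, 0.8))  # ('Neutral', 'Subjective/opinionated')
```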
Common entity labels and their meanings:
- PERSON: Individual people
- ORG: Organizations, companies, agencies
- GPE: Geopolitical entities (countries, cities)
- DATE: Absolute or relative dates
- TIME: Times smaller than a day
- PERCENT: Percentage values
- MONEY: Monetary values
- QUANTITY: Measurements, counts
- `tokenize_text()`: Sentence and word tokenization
- `remove_stopwords()`: Stopword filtering
- `named_entity_recognition()`: Entity extraction
- `pos_tagging()`: Grammatical analysis
- `sentiment_analysis()`: Sentiment scoring
- `generate_wordcloud()`: Visual word representation
- `visualize_pos_distribution()`: POS charts
- `visualize_entities()`: Entity distribution charts
- Interactive text area for article input
- Analysis button with success styling
- Real-time output display with visualizations
- Comprehensive analysis summaries
- Minimum Text Length: Articles should be at least 50 words for meaningful analysis
- Optimal Length: 150-300 words recommended for best results
- Google Colab: Designed specifically for Colab environment
- Internet Required: Initial setup downloads language models
- Processing Time: Analysis may take 10-30 seconds depending on article length
Feel free to enhance this tool by:
- Adding more NLP techniques
- Improving visualizations
- Supporting additional languages
- Optimizing performance
- Adding export features
This project is open source and available for educational and research purposes.
- Import Errors: Ensure all packages are installed via the installation cell
- spaCy Model Missing: Run `!python -m spacy download en_core_web_sm` in a cell
- NLTK Data Missing: The code automatically downloads required NLTK data
- Empty Analysis: Ensure article has at least 50 words
- Widget Display Issues: Restart runtime and run cells in order
For issues or questions, check that all installation steps are completed and the notebook runtime is properly configured for Google Colab.
Happy Analyzing!