prebuilt.journalism_.tools¶
Tools for the Journalism AI Assistant.
This module provides all tools used by the journalism assistant for web searching, content extraction, text processing, and analysis.
Tools include web search integration, HTML parsing, text chunking, and various utility functions for journalism workflows.
Example
>>> from journalism_assistant.tools import search_web, extract_web_content
>>> results = search_web("climate change statistics 2024")
>>> content = extract_web_content("https://example.com/article")
Note
Tools are implemented as LangChain-compatible functions with proper schemas for input/output typing.
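For orientation, a minimal sketch of how a tool of this kind could be exposed to LangChain, assuming the @tool decorator from langchain_core.tools is used (the module's actual wiring may differ):
>>> from langchain_core.tools import tool
>>> @tool
... def search_web(keywords: str, max_results: int = 5) -> list:
...     """Search the web using DuckDuckGo for fact-checking and research."""
...     ...
>>> search_web.name
'search_web'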
Functions¶
- analyze_source_diversity(quotes) – Analyze the diversity of sources quoted in an article.
- calculate_readability_score(text) – Calculate readability metrics for text.
- chunk_text(text, chunk_size=100000, chunk_overlap=1000) – Split text into manageable chunks for processing.
- detect_bias_indicators(text) – Detect potential bias indicators in text.
- extract_quotes(text) – Extract quoted text from an article.
- extract_web_content(url, extract_links=False) – Extract and clean content from a web page.
- identify_key_claims(text) – Identify factual claims in text that should be fact-checked.
- search_and_summarize(keywords, max_results=3) – Search for information and summarize the results.
- search_web(keywords, max_results=5) – Search the web using DuckDuckGo for fact-checking and research.
Module Contents¶
- prebuilt.journalism_.tools.analyze_source_diversity(quotes)¶
Analyze the diversity of sources quoted in an article.
This tool examines quoted sources to assess diversity and potential source bias.
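The return shape is not documented above; a hypothetical sketch of this kind of analysis, assuming the input is the list of quote dictionaries produced by extract_quotes:
from collections import Counter

def summarize_source_mix(quotes: list[dict]) -> dict:
    # Count how often each named speaker is quoted; heavy repetition of a
    # single speaker suggests a narrow source mix.
    speakers = Counter(q.get("speaker", "Unknown") for q in quotes)
    total = sum(speakers.values())
    return {
        "unique_sources": len(speakers),
        "total_quotes": total,
        "most_quoted": speakers.most_common(1)[0][0] if speakers else None,
    }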
- prebuilt.journalism_.tools.calculate_readability_score(text)¶
Calculate readability metrics for text.
This tool analyzes text readability using various metrics like average sentence length and syllable count estimates.
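A rough sketch of a computation in this spirit, assuming a Flesch-style reading-ease formula built from average sentence length and a crude syllable estimate (the module's actual metrics may differ):
import re

def rough_reading_ease(text: str) -> float:
    # Split into sentences and words with simple heuristics.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Estimate syllables by counting vowel groups per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    avg_sentence_length = len(words) / max(1, len(sentences))
    avg_syllables_per_word = syllables / max(1, len(words))
    # Flesch reading ease: higher scores indicate easier text.
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word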
- prebuilt.journalism_.tools.chunk_text(text, chunk_size=100000, chunk_overlap=1000)¶
Split text into manageable chunks for processing.
This tool splits large text into smaller chunks while maintaining context through overlap, suitable for LLM processing.
- Parameters:
  - text (str) – Text to split into chunks
  - chunk_size (int) – Maximum size of each chunk
  - chunk_overlap (int) – Overlap carried over between consecutive chunks
- Returns:
List of text chunks
- Return type:
List[str]
Example
>>> chunks = chunk_text(long_article, chunk_size=50000)
>>> print(f"Split into {len(chunks)} chunks")
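A minimal sketch of overlap-based chunking, assuming a simple character window rather than whatever splitter the module actually uses:
def chunk_by_characters(text: str, chunk_size: int = 100000, chunk_overlap: int = 1000) -> list[str]:
    # Slide a window across the text, stepping back by the overlap each time
    # so neighbouring chunks share context at their boundaries.
    step = max(1, chunk_size - chunk_overlap)
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]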
- prebuilt.journalism_.tools.detect_bias_indicators(text)¶
Detect potential bias indicators in text.
This tool identifies language patterns that may indicate various types of bias in writing.
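Detection of this kind is typically pattern-based; a hypothetical sketch in which the indicator lists and return shape are assumptions, not the module's actual ones:
# Hypothetical indicator phrases grouped by category.
BIAS_PATTERNS = {
    "loaded_language": ["obviously", "clearly", "everyone knows"],
    "unnamed_sourcing": ["some say", "critics claim", "it is believed"],
}

def find_bias_indicators(text: str) -> dict[str, list[str]]:
    lowered = text.lower()
    # Report, per category, which indicator phrases occur in the text.
    return {
        category: [phrase for phrase in phrases if phrase in lowered]
        for category, phrases in BIAS_PATTERNS.items()
    }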
- prebuilt.journalism_.tools.extract_quotes(text)¶
Extract quoted text from an article.
This tool identifies and extracts direct quotes from text, attempting to identify the speaker when possible.
- Parameters:
text (str) – Text to extract quotes from
- Returns:
List of dictionaries with quote text and speaker
- Return type:
List[Dict[str, str]]
Example
>>> quotes = extract_quotes(article_text)
>>> for quote in quotes:
...     print(f'"{quote["text"]}" - {quote["speaker"]}')
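One plausible approach is a regular expression over quoted spans plus a simple attribution heuristic; a sketch under those assumptions, limited to straight double quotes:
import re

def extract_quotes_simple(text: str) -> list[dict[str, str]]:
    quotes = []
    for match in re.finditer(r'"([^"]+)"', text):
        # Look just after the quote for a '... said <Name>' style attribution.
        tail = text[match.end():match.end() + 80]
        attribution = re.search(r"(?:said|according to)\s+([A-Z][\w.]*(?:\s+[A-Z][\w.]*)*)", tail)
        quotes.append({
            "text": match.group(1),
            "speaker": attribution.group(1) if attribution else "Unknown",
        })
    return quotes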
- prebuilt.journalism_.tools.extract_web_content(url, extract_links=False)¶
Extract and clean content from a web page.
This tool fetches web page content and extracts clean text, removing scripts, styles, and other non-content elements.
- Parameters:
  - url (str) – URL of the web page to fetch
  - extract_links (bool) – Whether to also extract hyperlinks from the page
- Returns:
Dictionary with extracted content and metadata
- Return type:
Dict[str, Any]
Example
>>> content = extract_web_content("https://example.com/article")
>>> print(f"Extracted {content['word_count']} words")
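A sketch of this style of extraction using requests and BeautifulSoup; the libraries, timeout, and returned keys here are assumptions rather than the module's actual implementation:
import requests
from bs4 import BeautifulSoup

def fetch_clean_text(url: str) -> dict:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop non-content elements before collecting visible text.
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    text = " ".join(soup.get_text(separator=" ").split())
    return {"url": url, "text": text, "word_count": len(text.split())}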
- prebuilt.journalism_.tools.identify_key_claims(text)¶
Identify factual claims in text that should be fact-checked.
This tool analyzes text to identify statements that make factual claims suitable for verification.
- Parameters:
text (str) – Text to analyze for claims
- Returns:
List of identified claims
- Return type:
List[str]
Example
>>> claims = identify_key_claims(article_text)
>>> print(f"Found {len(claims)} claims to fact-check")
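Claim identification is often a sentence-level filter; a hypothetical sketch whose markers and heuristics are illustrative only:
import re

CLAIM_MARKERS = ("percent", "according to", "study", "found that", "increased", "decreased")

def find_candidate_claims(text: str) -> list[str]:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Keep sentences that contain a number or a common claim marker.
    return [
        s for s in sentences
        if re.search(r"\d", s) or any(marker in s.lower() for marker in CLAIM_MARKERS)
    ]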
- prebuilt.journalism_.tools.search_and_summarize(keywords, max_results=3)¶
Search for information and summarize the results.
This tool combines web search with content extraction to provide summarized information for fact-checking.
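A possible call pattern, assuming the return value is a text summary of the extracted sources (the exact return shape is not documented above):
>>> summary = search_and_summarize("renewable energy adoption rates", max_results=3)
>>> print(summary)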
- prebuilt.journalism_.tools.search_web(keywords, max_results=5)¶
Search the web using DuckDuckGo for fact-checking and research.
This tool performs web searches to find relevant information for fact-checking claims and researching topics.
- Parameters:
  - keywords (str) – Search query keywords
  - max_results (int) – Maximum number of results to return
- Returns:
List of search results with title, URL, and snippet
- Return type:
List[Dict[str, Any]]
Example
>>> results = search_web("COVID-19 vaccine efficacy 2024", max_results=3)
>>> for result in results:
...     print(f"{result['title']}: {result['url']}")
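If the implementation relies on the duckduckgo_search package, it might look roughly like the sketch below; the DDGS usage and result-key mapping are assumptions and may differ from the module's actual code:
from duckduckgo_search import DDGS

def ddg_search(keywords: str, max_results: int = 5) -> list[dict]:
    # DDGS().text returns dicts with 'title', 'href', and 'body' keys;
    # map them onto the title/url/snippet shape described above.
    with DDGS() as ddgs:
        return [
            {"title": r["title"], "url": r["href"], "snippet": r["body"]}
            for r in ddgs.text(keywords, max_results=max_results)
        ]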