prebuilt.journalism_.tools¶

Tools for the Journalism AI Assistant.

This module provides all tools used by the journalism assistant for web searching, content extraction, text processing, and analysis.

Tools include web search integration, HTML parsing, text chunking, and analysis utilities for quotes, source diversity, readability, bias indicators, and factual claims.

Example

>>> from prebuilt.journalism_.tools import search_web, extract_web_content
>>> results = search_web("climate change statistics 2024")
>>> content = extract_web_content("https://example.com/article")

Note

Tools are implemented as LangChain-compatible functions with typed input and output schemas.
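
Because each tool follows the LangChain tool interface, the functions can be bound to a chat model that supports tool calling. A minimal sketch, assuming the langchain-openai integration and an illustrative model name:

>>> from langchain_openai import ChatOpenAI
>>> model = ChatOpenAI(model="gpt-4o")  # illustrative model choice
>>> model_with_tools = model.bind_tools([search_web, extract_web_content])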

Functions¶

analyze_source_diversity(quotes)

Analyze the diversity of sources quoted in an article.

calculate_readability_score(text)

Calculate readability metrics for text.

chunk_text(text[, chunk_size, chunk_overlap])

Split text into manageable chunks for processing.

detect_bias_indicators(text)

Detect potential bias indicators in text.

extract_quotes(text)

Extract quoted text from an article.

extract_web_content(url[, extract_links])

Extract and clean content from a web page.

identify_key_claims(text)

Identify factual claims in text that should be fact-checked.

search_and_summarize(keywords[, max_results])

Search for information and summarize the results.

search_web(keywords[, max_results])

Search the web using DuckDuckGo for fact-checking and research.

Module Contents¶

prebuilt.journalism_.tools.analyze_source_diversity(quotes)¶

Analyze the diversity of sources quoted in an article.

This tool examines quoted sources to assess diversity and potential source bias.

Parameters:

quotes (List[Dict[str, str]]) – List of quotes with speaker information

Returns:

Analysis of source diversity

Return type:

Dict[str, Any]
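
Example

A hypothetical call; the keys of the returned dictionary are assumptions for illustration, since only the Dict[str, Any] return type is documented:

>>> quotes = [
...     {"text": "Emissions fell 2% last year.", "speaker": "Dr. Ana Ruiz"},
...     {"text": "The data is incomplete.", "speaker": "Prof. Lee Chen"},
... ]
>>> diversity = analyze_source_diversity(quotes)
>>> print(diversity)  # e.g. speaker counts and a diversity assessment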

prebuilt.journalism_.tools.calculate_readability_score(text)¶

Calculate readability metrics for text.

This tool analyzes text readability using metrics such as average sentence length and estimated syllable counts.

Parameters:

text (str) – Text to analyze

Returns:

Dictionary with readability metrics

Return type:

Dict[str, Any]
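
Example

A hypothetical call; the specific metric names in the returned dictionary are not documented here, so the comment only hints at the categories described above:

>>> metrics = calculate_readability_score(article_text)
>>> print(metrics)  # e.g. average sentence length, estimated syllable counts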

prebuilt.journalism_.tools.chunk_text(text, chunk_size=100000, chunk_overlap=1000)¶

Split text into manageable chunks for processing.

This tool splits large text into smaller chunks while maintaining context through overlap, suitable for LLM processing.

Parameters:
  • text (str) – Text to split into chunks

  • chunk_size (int) – Maximum size of each chunk in characters

  • chunk_overlap (int) – Number of characters to overlap between chunks

Returns:

List of text chunks

Return type:

List[str]

Example

>>> chunks = chunk_text(long_article, chunk_size=50000)
>>> print(f"Split into {len(chunks)} chunks")

prebuilt.journalism_.tools.detect_bias_indicators(text)¶

Detect potential bias indicators in text.

This tool identifies language patterns that may indicate various types of bias in writing.

Parameters:

text (str) – Text to analyze for bias

Returns:

List of potential bias indicators with explanations

Return type:

List[Dict[str, str]]
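
Example

A sketch of iterating over the result; the per-indicator keys are assumptions, since only List[Dict[str, str]] is documented:

>>> indicators = detect_bias_indicators(article_text)
>>> for indicator in indicators:
...     print(indicator)  # each dict pairs a flagged pattern with an explanation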

prebuilt.journalism_.tools.extract_quotes(text)¶

Extract quoted text from an article.

This tool identifies and extracts direct quotes from text, attempting to identify the speaker when possible.

Parameters:

text (str) – Text to extract quotes from

Returns:

List of dictionaries with quote text and speaker

Return type:

List[Dict[str, str]]

Example

>>> quotes = extract_quotes(article_text)
>>> for quote in quotes:
...     print(f'"{quote["text"]}" - {quote["speaker"]}')

prebuilt.journalism_.tools.extract_web_content(url, extract_links=False)¶

Extract and clean content from a web page.

This tool fetches web page content and extracts clean text, removing scripts, styles, and other non-content elements.

Parameters:
  • url (str) – URL of the web page to extract

  • extract_links (bool) – Whether to extract links from the page

Returns:

Dictionary with extracted content and metadata

Return type:

Dict[str, Any]

Example

>>> content = extract_web_content("https://example.com/article")
>>> print(f"Extracted {content['word_count']} words")

prebuilt.journalism_.tools.identify_key_claims(text)¶

Identify factual claims in text that should be fact-checked.

This tool analyzes text to identify statements that make factual claims suitable for verification.

Parameters:

text (str) – Text to analyze for claims

Returns:

List of identified claims

Return type:

List[str]

Example

>>> claims = identify_key_claims(article_text)
>>> print(f"Found {len(claims)} claims to fact-check")

prebuilt.journalism_.tools.search_and_summarize(keywords, max_results=3)¶

Search for information and summarize the results.

This tool combines web search with content extraction to provide summarized information for fact-checking.

Parameters:
  • keywords (str) – Search keywords

  • max_results (int) – Maximum number of results to process

Returns:

List of search results with summaries

Return type:

List[Dict[str, str]]
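
Example

A hypothetical call; the keys of each result dictionary are assumptions, since only List[Dict[str, str]] is documented:

>>> summaries = search_and_summarize("renewable energy subsidies", max_results=2)
>>> for item in summaries:
...     print(item)  # each entry pairs a source with its summary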

prebuilt.journalism_.tools.search_web(keywords, max_results=5)¶

Search the web using DuckDuckGo for fact-checking and research.

This tool performs web searches to find relevant information for fact-checking claims and researching topics.

Parameters:
  • keywords (str) – Search query keywords

  • max_results (int) – Maximum number of results to return

Returns:

List of search results with title, URL, and snippet

Return type:

List[Dict[str, Any]]

Example

>>> results = search_web("COVID-19 vaccine efficacy 2024", max_results=3)
>>> for result in results:
...     print(f"{result['title']}: {result['url']}")