Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, 1st Edition, by Steven Bird, Ewan Klein, and Edward Loper (ebook, PDF)
Product details:
ISBN 10: 0596516495
ISBN 13: 9780596516499
Authors: Steven Bird; Ewan Klein; Edward Loper
This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you’ll learn how to write Python programs that work with large collections of unstructured text. You’ll access richly annotated datasets using a comprehensive range of linguistic data structures, and you’ll understand the main algorithms for analyzing the content and structure of written communication.
Packed with examples and exercises, Natural Language Processing with Python will help you:
Extract information from unstructured text, either to guess the topic or identify “named entities”
Analyze linguistic structure in text, including parsing and semantic analysis
Access popular linguistic databases, including WordNet and treebanks
Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence
This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you’re interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages, or if you’re simply curious to have a programmer’s perspective on how human language works, you’ll find Natural Language Processing with Python both fascinating and immensely useful.
Table of contents:
1. Language Processing and Python
Computing with Language: Texts and Words
Getting Started with Python
Getting Started with NLTK
Searching Text
Counting Vocabulary
A Closer Look at Python: Texts as Lists of Words
Lists
Indexing Lists
Variables
Strings
Computing with Language: Simple Statistics
Frequency Distributions
Fine-Grained Selection of Words
Collocations and Bigrams
Counting Other Things
Back to Python: Making Decisions and Taking Control
Conditionals
Operating on Every Element
Nested Code Blocks
Looping with Conditions
Automatic Natural Language Understanding
Word Sense Disambiguation
Pronoun Resolution
Generating Language Output
Machine Translation
Spoken Dialogue Systems
Textual Entailment
Limitations of NLP
Summary
Further Reading
Exercises
2. Accessing Text Corpora and Lexical Resources
Accessing Text Corpora
Gutenberg Corpus
Web and Chat Text
Brown Corpus
Reuters Corpus
Inaugural Address Corpus
Annotated Text Corpora
Corpora in Other Languages
Text Corpus Structure
Loading Your Own Corpus
Conditional Frequency Distributions
Conditions and Events
Counting Words by Genre
Plotting and Tabulating Distributions
Generating Random Text with Bigrams
More Python: Reusing Code
Creating Programs with a Text Editor
Functions
Modules
Lexical Resources
Wordlist Corpora
A Pronouncing Dictionary
Comparative Wordlists
Shoebox and Toolbox Lexicons
WordNet
Senses and Synonyms
The WordNet Hierarchy
More Lexical Relations
Semantic Similarity
Summary
Further Reading
Exercises
3. Processing Raw Text
Accessing Text from the Web and from Disk
Electronic Books
Dealing with HTML
Processing Search Engine Results
Processing RSS Feeds
Reading Local Files
Extracting Text from PDF, MSWord, and Other Binary Formats
Capturing User Input
The NLP Pipeline
Strings: Text Processing at the Lowest Level
Basic Operations with Strings
Printing Strings
Accessing Individual Characters
Accessing Substrings
More Operations on Strings
The Difference Between Lists and Strings
Text Processing with Unicode
What Is Unicode?
Extracting Encoded Text from Files
Using Your Local Encoding in Python
Regular Expressions for Detecting Word Patterns
Using Basic Metacharacters
Ranges and Closures
Useful Applications of Regular Expressions
Extracting Word Pieces
Doing More with Word Pieces
Finding Word Stems
Searching Tokenized Text
Normalizing Text
Stemmers
Lemmatization
Regular Expressions for Tokenizing Text
Simple Approaches to Tokenization
NLTK’s Regular Expression Tokenizer
Further Issues with Tokenization
Segmentation
Sentence Segmentation
Word Segmentation
Formatting: From Lists to Strings
From Lists to Strings
Strings and Formats
Lining Things Up
Writing Results to a File
Text Wrapping
Summary
Further Reading
Exercises
4. Writing Structured Programs
Back to the Basics
Assignment
Equality
Conditionals
Sequences
Operating on Sequence Types
Combining Different Sequence Types
Generator Expressions
Questions of Style
Python Coding Style
Procedural Versus Declarative Style
Some Legitimate Uses for Counters
Functions: The Foundation of Structured Programming
Function Inputs and Outputs
Parameter Passing
Variable Scope
Checking Parameter Types
Functional Decomposition
Documenting Functions
Doing More with Functions
Functions As Arguments
Accumulative Functions
Higher-Order Functions
Named Arguments
Program Development
Structure of a Python Module
Multimodule Programs
Sources of Error
Debugging Techniques
Defensive Programming
Algorithm Design
Recursion
Space-Time Trade-offs
Dynamic Programming
A Sample of Python Libraries
Matplotlib
NetworkX
csv
NumPy
Other Python Libraries
Summary
Further Reading
Exercises
5. Categorizing and Tagging Words
Using a Tagger
Tagged Corpora
Representing Tagged Tokens
Reading Tagged Corpora
A Simplified Part-of-Speech Tagset
Nouns
Verbs
Adjectives and Adverbs
Unsimplified Tags
Exploring Tagged Corpora
Mapping Words to Properties Using Python Dictionaries
Indexing Lists Versus Dictionaries
Dictionaries in Python
Defining Dictionaries
Default Dictionaries
Incrementally Updating a Dictionary
Complex Keys and Values
Inverting a Dictionary
Automatic Tagging
The Default Tagger
The Regular Expression Tagger
The Lookup Tagger
Evaluation
N-Gram Tagging
Unigram Tagging
Separating the Training and Testing Data
General N-Gram Tagging
Combining Taggers
Tagging Unknown Words
Storing Taggers
Performance Limitations
Tagging Across Sentence Boundaries
Transformation-Based Tagging
How to Determine the Category of a Word
Morphological Clues
Syntactic Clues
Semantic Clues
New Words
Morphology in Part-of-Speech Tagsets
Summary
Further Reading
Exercises
6. Learning to Classify Text
Supervised Classification
Gender Identification
Choosing the Right Features
Document Classification
Part-of-Speech Tagging
Exploiting Context
Sequence Classification
Other Methods for Sequence Classification
Further Examples of Supervised Classification
Sentence Segmentation
Identifying Dialogue Act Types
Recognizing Textual Entailment
Scaling Up to Large Datasets
Evaluation
The Test Set
Accuracy
Precision and Recall
Confusion Matrices
Cross-Validation
Decision Trees
Entropy and Information Gain
Naive Bayes Classifiers
Underlying Probabilistic Model
Zero Counts and Smoothing
Non-Binary Features
The Naivete of Independence
The Cause of Double-Counting
Maximum Entropy Classifiers
The Maximum Entropy Model
Maximizing Entropy
Generative Versus Conditional Classifiers
Modeling Linguistic Patterns
What Do Models Tell Us?
Summary
Further Reading
Exercises
7. Extracting Information from Text
Information Extraction
Information Extraction Architecture
Chunking
Noun Phrase Chunking
Tag Patterns
Chunking with Regular Expressions
Exploring Text Corpora
Chinking
Representing Chunks: Tags Versus Trees
Developing and Evaluating Chunkers
Reading IOB Format and the CoNLL-2000 Chunking Corpus
Simple Evaluation and Baselines
Training Classifier-Based Chunkers
Recursion in Linguistic Structure
Building Nested Structure with Cascaded Chunkers
Trees
Tree Traversal
Named Entity Recognition
Relation Extraction
Summary
Further Reading
Exercises
8. Analyzing Sentence Structure
Some Grammatical Dilemmas
Linguistic Data and Unlimited Possibilities
Ubiquitous Ambiguity
What’s the Use of Syntax?
Beyond n-grams
Context-Free Grammar
A Simple Grammar
Writing Your Own Grammars
Recursion in Syntactic Structure
Parsing with Context-Free Grammar
Recursive Descent Parsing
Shift-Reduce Parsing
The Left-Corner Parser
Well-Formed Substring Tables
Dependencies and Dependency Grammar
Valency and the Lexicon
Scaling Up
Grammar Development
Treebanks and Grammars
Pernicious Ambiguity
Weighted Grammar
Summary
Further Reading
Exercises
9. Building Feature-Based Grammars
Grammatical Features
Syntactic Agreement
Using Attributes and Constraints
Terminology
Processing Feature Structures
Subsumption and Unification
Extending a Feature-Based Grammar
Subcategorization
Heads Revisited
Auxiliary Verbs and Inversion
Unbounded Dependency Constructions
Case and Gender in German
Summary
Further Reading
Exercises
10. Analyzing the Meaning of Sentences
Natural Language Understanding
Querying a Database
Natural Language, Semantics, and Logic
Propositional Logic
First-Order Logic
Syntax
First-Order Theorem Proving
Summarizing the Language of First-Order Logic
Truth in Model
Individual Variables and Assignments
Quantification
Quantifier Scope Ambiguity
Model Building
The Semantics of English Sentences
Compositional Semantics in Feature-Based Grammar
The λ-Calculus
Quantified NPs
Transitive Verbs
Quantifier Ambiguity Revisited
Discourse Semantics
Discourse Representation Theory
Discourse Processing
Summary
Further Reading
Exercises
11. Managing Linguistic Data
Corpus Structure: A Case Study
The Structure of TIMIT
Notable Design Features
Fundamental Data Types
The Life Cycle of a Corpus
Three Corpus Creation Scenarios
Quality Control
Curation Versus Evolution
Acquiring Data
Obtaining Data from the Web
Obtaining Data from Word Processor Files
Obtaining Data from Spreadsheets and Databases
Converting Data Formats
Deciding Which Layers of Annotation to Include
Standards and Tools
Special Considerations When Working with Endangered Languages
Working with XML
Using XML for Linguistic Structures
The Role of XML
The ElementTree Interface
Using ElementTree for Accessing Toolbox Data
Formatting Entries
Working with Toolbox Data
Adding a Field to Each Entry
Validating a Toolbox Lexicon
Describing Language Resources Using OLAC Metadata
What Is Metadata?
OLAC: Open Language Archives Community
Summary
Further Reading
Exercises
A. Afterword: The Language Challenge
Language Processing Versus Symbol Processing
Contemporary Philosophical Divides
NLTK Roadmap
Envoi…
B. Bibliography
NLTK Index
General Index
About the Authors
Colophon