A Python Library for Pulling Data Out of HTML and XML Files

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.


  >>> from bs4 import BeautifulSoup
  >>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
  >>> print soup.prettify()
  <html>
   <body>
    <p>
     Some
     <b>
      bad
      <i>
       HTML
      </i>
     </b>
    </p>
   </body>
  </html>
  >>> soup.find(text="bad")
  u'bad'

  >>> soup.i
  <i>HTML</i>

  >>> soup = BeautifulSoup("<tag1>Some<tag2/>bad<tag3>XML", "xml")
  >>> print soup.prettify()
  <?xml version="1.0" encoding="utf-8"?>
  <tag1>
   Some
   <tag2 />
   bad
   <tag3>
    XML
   </tag3>
  </tag1>
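
The session above covers navigating and searching. As a rough sketch of modifying the tree as well, the following assumes Python 3 and the built-in "html.parser" backend. The sample markup and variable names are made up for illustration and do not come from this README.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the README.
html = '<ul><li class="done">write</li><li>test</li><li>ship</li></ul>'

# "html.parser" ships with Python, so no extra parser needs to be installed.
soup = BeautifulSoup(html, "html.parser")

# Searching: collect the text of every <li> element.
items = [li.get_text() for li in soup.find_all("li")]

# Navigating: attribute access returns the first matching tag.
first = soup.li
first_class = first["class"]  # "class" is multi-valued, so this is a list

# Modifying: replace the tag's text and drop an attribute in place.
first.string = "draft"
del first["class"]

result = str(soup)
print(items)
print(result)
```

Naming the parser explicitly keeps the resulting tree the same from one machine to the next, since the default choice depends on which parsers are installed.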

Full documentation

The bs4/doc/ directory contains full documentation in Sphinx format. Run "make html" in that directory to create HTML documentation.

Running the unit tests

Beautiful Soup supports unit test discovery from the project root directory:

 $ nosetests

 $ python -m unittest discover -s bs4 ## Python 2.7 and up

If you checked out the source tree, you should see a script called test-all-versions in its top-level directory. This script runs the unit tests under Python 2.7, then creates a temporary Python 3 conversion of the source and runs the unit tests again under Python 3.


Homepage: http://www.crummy.com/software/BeautifulSoup/bs4/
Documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
               http://readthedocs.org/docs/beautiful-soup-4/
Discussion group: http://groups.google.com/group/beautifulsoup/
Development: https://code.launchpad.net/beautifulsoup/
Bug tracker: https://bugs.launchpad.net/beautifulsoup/

Download Details:

Author: waylan
Official Website: https://github.com/waylan/beautifulsoup

