requests.get(url): sends a request to the website and returns a Response object
response.text: extracts the HTML text from the Response object
BeautifulSoup(response.text, 'lxml'): creates a BeautifulSoup object from the HTML text, which you can use to locate and extract the text data you want

HTML is made up of elements. Each element has a specific type, denoted by its tag, and can have attributes that further distinguish elements from one another.

Basic structure:
<element1>
    <element2>Content</element2>
</element1>
HTML is typically written top to bottom like this, with indentation to make the nested structure clear.
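To connect the fetch and parse calls with the nested structure, here is a minimal sketch. The helper name fetch_soup, the timeout value, and the sample HTML are assumptions made for illustration and are not part of the notes. The demonstration parses an inline string with Python's built-in "html.parser" so it runs without sending a request; the notes use 'lxml', which should work the same way if it is installed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, parser="lxml"):
    """Download a page and parse it (hypothetical helper, not from the notes)."""
    response = requests.get(url, timeout=10)  # returns a Response object
    response.raise_for_status()               # stop early on HTTP errors
    return BeautifulSoup(response.text, parser)


# Offline demonstration on a small nested document, so no request is sent.
# "html.parser" ships with Python; swap in "lxml" if it is installed.
sample_html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div>
      <h1>Main header</h1>
    </div>
  </body>
</html>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# Nested elements can be reached by walking down through their parents.
title_text = soup.head.title.get_text()
# Each element also knows which element it is nested inside.
header_parent = soup.h1.parent.name
print(title_text, header_parent)
```

For a live page, something like soup = fetch_soup(url) would replace the inline string, with url being whatever page you are scraping.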
Common tags:
head & body: split a page into these two broad sections
div: general divider elements, separating sections within the head and body
h1: headers (subheaders are h2, h3, etc.)
ol & ul: ordered and unordered lists, with li elements inside them for each list entry
table: table element, with tr and td elements inside it for the table rows and table data respectively
a: elements containing links

Attributes appear inside the tag's angle brackets (<>) and describe elements further:
class: identifies sets of elements that have a similar purpose
id: uniquely identifies an element
href: contains the link URL, e.g. on a elements

Use the .find() / .find_all() methods on the BeautifulSoup object to locate specific elements.
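The tag and attribute lookups can be sketched as follows. The HTML snippet, the id value "top-stories", and the link paths are made up for illustration. "html.parser" is used here only so the sketch runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, invented for this example.
html = """
<body>
  <div id="top-stories">
    <h1>Today</h1>
    <a class="main-article" href="/news/1">First story</a>
    <a class="main-article" href="/news/2">Second story</a>
    <a class="sidebar" href="/about">About</a>
  </div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# .find() returns only the first match (or None if nothing matches).
first = soup.find("a", class_="main-article")

# .find_all() returns a list of every match.
all_main = soup.find_all("a", class_="main-article")

# An id should be unique on a page, so .find() is the natural choice.
top = soup.find(id="top-stories")

# Attributes are read with dictionary-style access on the element.
links = [a["href"] for a in all_main]
print(first["href"], len(all_main), top.name, links)
```

Note that the keyword is class_ with a trailing underscore, because class is a reserved word in Python.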
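You can search by attribute, e.g. class="main-article": .find_all() returns every matching element, while .find() would just return the first such element.
Use the .get_text() method to extract the content.
Use the open() framework for saving files, to save the results as a csv or text file.
Searches can also be nested, e.g. to get the article elements contained within a specific div element.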
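Putting the last steps together, here is one possible sketch of a nested search, text extraction, and saving to csv. The HTML, the id "news", the column names, and the output filename articles.csv are all assumptions for illustration. The file is written to the system temp directory only to keep the example from cluttering the working directory.

```python
import csv
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical page fragment, invented for this example.
html = """
<div id="news">
  <article><h2>Alpha</h2><p>First summary.</p></article>
  <article><h2>Beta</h2><p>Second summary.</p></article>
</div>
<div id="other">
  <article><h2>Gamma</h2><p>Not wanted.</p></article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Nested search: locate the div first, then search only inside it.
news_div = soup.find("div", id="news")

rows = []
for article in news_div.find_all("article"):
    # .get_text(strip=True) returns the text with surrounding whitespace removed.
    title = article.find("h2").get_text(strip=True)
    summary = article.find("p").get_text(strip=True)
    rows.append([title, summary])

# Save with open() plus the csv module; newline="" avoids blank rows on Windows.
out_path = Path(tempfile.gettempdir()) / "articles.csv"
with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "summary"])  # header row
    writer.writerows(rows)

print(rows)
```

Calling .find_all() on news_div rather than on soup is what restricts the results to articles inside that one div, so the article in the other div is skipped.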