My objective is to access the table on the following webpage https://www.countries-ofthe-world.com/world-currencies.html and turn it into a Pandas dataframe that has columns "Country or territory", "Currency", and "ISO-4217".
I am able to access the columns correctly, but I am having a hard time figuring out how to append each row to a dataframe. Do you have any suggestions on how I can do this? For example, on the webpage, the first row in the table is the letter "A". However, I need the first row in the dataframe to be Afghanistan, Afghan afghani, and AFN.
This is what I have so far:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.countries-ofthe-world.com/world-currencies.html"
req = Request(url, headers={"User-Agent":"Mozilla/5.0"})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, "html.parser")
table = soup.find("table", {"class":"codes"})
rows = table.find_all('tr')
columns = [v.text for v in rows[0].find_all('th')]
print(columns) # ['Country or territory', 'Currency', 'ISO-4217']
Please also see this image.
Thank you all for your time.
Tony
With your fix in place, the table can be parsed pretty easily with pd.read_html. The result still has those alphabet header rows, but you can get rid of them with something like:
df = df[df['Currency'] != df['ISO-4217']]
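Putting the pieces together, here is a minimal sketch of how this might look end to end. The function names load_currency_table and drop_letter_rows are made up for illustration, and the UTF-8 decoding is an assumption about the page. The small sample frame is hand-made rather than fetched from the site; it assumes the letter rows come back with the letter repeated in every column, which is what the filter above relies on.

```python
from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd

URL = "https://www.countries-ofthe-world.com/world-currencies.html"


def load_currency_table(url=URL):
    """Fetch the page and parse the table with class "codes" (hypothetical helper)."""
    # Same User-Agent header as in the question.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # Assumption: the page is UTF-8 encoded.
    html = urlopen(req).read().decode("utf-8", errors="replace")
    # read_html returns a list of DataFrames; attrs narrows it to the wanted table.
    return pd.read_html(StringIO(html), attrs={"class": "codes"})[0]


def drop_letter_rows(df):
    """Drop rows where Currency equals ISO-4217, i.e. the "A", "B", ... rows."""
    return df[df["Currency"] != df["ISO-4217"]].reset_index(drop=True)


# Hand-made sample standing in for the parsed table (not real scraped output).
sample = pd.DataFrame(
    {
        "Country or territory": ["A", "Afghanistan", "B", "Bahrain"],
        "Currency": ["A", "Afghan afghani", "B", "Bahraini dinar"],
        "ISO-4217": ["A", "AFN", "B", "BHD"],
    }
)
cleaned = drop_letter_rows(sample)
print(cleaned)
```

Against the live page you would call drop_letter_rows(load_currency_table()) instead of using the sample. Note that pd.read_html needs an HTML parser such as lxml installed, and reset_index(drop=True) is only there so the cleaned rows are numbered from 0 again.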