{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Not long ago, I needed to parse some HTML tables from our Confluence website at work. I first thought: I'm gonna need [requests](http://docs.python-requests.org/en/master/) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/). As HTML tables are well defined, I did some quick googling to see if there was some recipe or lib to parse them and I found a link to [pandas](https://pandas.pydata.org). What? Can pandas do that too?\n", "\n", "I have been using pandas for quite some time and have used `read_csv`, `read_excel`, even `read_sql`, but I had missed `read_html`!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading an Excel file with pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before looking at HTML tables, I want to show a quick example of how to read an Excel file with pandas. The API is really nice: if I have to look at some Excel data, I go directly to pandas.\n", "\n", "So let's download a sample file:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import io\n", "import requests\n", "import pandas as pd\n", "from zipfile import ZipFile" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Download the zip archive and extract its content into the current directory\n", "r = requests.get('http://www.contextures.com/SampleData.zip')\n", "ZipFile(io.BytesIO(r.content)).extractall()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This created the *SampleData.xlsx* file that includes four sheets: Instructions, SalesOrders, SampleNumbers and MyLinks. Only the *SalesOrders* sheet includes tabular data:\n", "\n", "![SampleData](/images/read_html/sample_data_xlsx.png)\n", "\n", "So let's read it."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df = pd.read_excel('SampleData.xlsx', sheet_name='SalesOrders')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " OrderDate Region Rep Item Units Unit Cost Total\n", "0 2016-01-06 East Jones Pencil 95 1.99 189.05\n", "1 2016-01-23 Central Kivell Binder 50 19.99 999.50\n", "2 2016-02-09 Central Jardine Pencil 36 4.99 179.64\n", "3 2016-02-26 Central Gill Pen 27 19.99 539.73\n", "4 2016-03-15 West Sorvino Pencil 56 2.99 167.44" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it. One line and you have your data in a DataFrame that you can easily manipulate, filter, convert and display in a Jupyter notebook. Can it be easier than that?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Parsing HTML Tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So let's go back to HTML tables and look at [pandas.read_html](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_html.html).\n", "\n", "The function accepts:\n", "> A URL, a file-like object, or a raw string containing HTML.\n", "\n", "Let's start with a basic HTML table in a raw string." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parsing a raw string" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "html_string = \"\"\"\n", "<table>\n", "  <thead>\n", "    <tr>\n", "      <th>Programming Language</th>\n", "      <th>Creator</th>\n", "      <th>Year</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <td>C</td>\n", "      <td>Dennis Ritchie</td>\n", "      <td>1972</td>\n", "    </tr>\n", "    <tr>\n", "      <td>Python</td>\n", "      <td>Guido Van Rossum</td>\n", "      <td>1989</td>\n", "    </tr>\n", "    <tr>\n", "      <td>Ruby</td>\n", "      <td>Yukihiro Matsumoto</td>\n", "      <td>1995</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can render the table using the IPython [display_html](http://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html#IPython.display.display_html) function:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "<table>\n", "  <thead>\n", "    <tr>\n", "      <th>Programming Language</th>\n", "      <th>Creator</th>\n", "      <th>Year</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <td>C</td>\n", "      <td>Dennis Ritchie</td>\n", "      <td>1972</td>\n", "    </tr>\n", "    <tr>\n", "      <td>Python</td>\n", "      <td>Guido Van Rossum</td>\n", "      <td>1989</td>\n", "    </tr>\n", "    <tr>\n", "      <td>Ruby</td>\n", "      <td>Yukihiro Matsumoto</td>\n", "      <td>1995</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display_html\n", "display_html(html_string, raw=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's import this HTML table into a DataFrame. Note that the function `read_html` always returns a list of DataFrame objects:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[ Programming Language Creator Year\n", " 0 C Dennis Ritchie 1972\n", " 1 Python Guido Van Rossum 1989\n", " 2 Ruby Yukihiro Matsumoto 1995]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs = pd.read_html(html_string)\n", "dfs" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Programming Language Creator Year\n", "0 C Dennis Ritchie 1972\n", "1 Python Guido Van Rossum 1989\n", "2 Ruby Yukihiro Matsumoto 1995" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = dfs[0]\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This looks quite similar to the table we rendered above, but this time we are displaying a pandas DataFrame object! We can apply any DataFrame operation to it, such as filtering rows:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Programming Language Creator Year\n", "1 Python Guido Van Rossum 1989\n", "2 Ruby Yukihiro Matsumoto 1995" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.Year > 1975]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas automatically found the header to use thanks to the `<thead>` tag. This tag is not mandatory when defining a table, and it is actually often missing on the web. So what happens if it's not present?" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "html_string = \"\"\"\n", "<table>\n", "  <tr>\n", "    <td>Programming Language</td>\n", "    <td>Creator</td>\n", "    <td>Year</td>\n", "  </tr>\n", "  <tr>\n", "    <td>C</td>\n", "    <td>Dennis Ritchie</td>\n", "    <td>1972</td>\n", "  </tr>\n", "  <tr>\n", "    <td>Python</td>\n", "    <td>Guido Van Rossum</td>\n", "    <td>1989</td>\n", "  </tr>\n", "  <tr>\n", "    <td>Ruby</td>\n", "    <td>Yukihiro Matsumoto</td>\n", "    <td>1995</td>\n", "  </tr>\n", "</table>\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2\n", "0 Programming Language Creator Year\n", "1 C Dennis Ritchie 1972\n", "2 Python Guido Van Rossum 1989\n", "3 Ruby Yukihiro Matsumoto 1995" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_html(html_string)[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas used integer column labels and kept the header row as data. In this case, we need to pass the number of the row to use as the header (`header=0` for the first row)." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Programming Language Creator Year\n", "0 C Dennis Ritchie 1972\n", "1 Python Guido Van Rossum 1989\n", "2 Ruby Yukihiro Matsumoto 1995" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_html(html_string, header=0)[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parsing an HTTP URL" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same data we read in our Excel file is available in a table at the following address: http://www.contextures.com/xlSampleData01.html\n", "\n", "Let's pass this url to `read_html`:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "dfs = pd.read_html('http://www.contextures.com/xlSampleData01.html')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[ 0 1 2 3 4 5 6\n", " 0 OrderDate Region Rep Item Units UnitCost Total\n", " 1 1/6/2016 East Jones Pencil 95 1.99 189.05\n", " 2 1/23/2016 Central Kivell Binder 50 19.99 999.50\n", " 3 2/9/2016 Central Jardine Pencil 36 4.99 179.64\n", " 4 2/26/2016 Central Gill Pen 27 19.99 539.73\n", " 5 3/15/2016 West Sorvino Pencil 56 2.99 167.44\n", " 6 4/1/2016 East Jones Binder 60 4.99 299.40\n", " 7 4/18/2016 Central Andrews Pencil 75 1.99 149.25\n", " 8 5/5/2016 Central Jardine Pencil 90 4.99 449.10\n", " 9 5/22/2016 West Thompson Pencil 32 1.99 63.68\n", " 10 6/8/2016 East Jones Binder 60 8.99 539.40\n", " 11 6/25/2016 Central Morgan Pencil 90 4.99 449.10\n", " 12 7/12/2016 East Howard Binder 29 1.99 57.71\n", " 13 7/29/2016 East Parent Binder 81 19.99 1619.19\n", " 14 8/15/2016 East Jones Pencil 35 4.99 174.65\n", " 15 9/1/2016 Central Smith Desk 2 125.00 250.00\n", " 16 9/18/2016 East Jones Pen Set 16 15.99 255.84\n", " 17 10/5/2016 Central Morgan Binder 28 8.99 251.72\n", " 18 10/22/2016 East Jones Pen 64 8.99 575.36\n", " 19 11/8/2016 East Parent Pen 15 19.99 299.85\n", " 20 11/25/2016 Central Kivell Pen Set 96 4.99 479.04\n", " 21 12/12/2016 Central Smith Pencil 67 1.29 86.43\n", " 22 12/29/2016 East Parent Pen Set 74 15.99 1183.26\n", " 23 1/15/2017 Central Gill Binder 46 8.99 413.54\n", " 24 2/1/2017 Central Smith Binder 87 15.00 1305.00\n", " 25 2/18/2017 East Jones Binder 4 4.99 19.96\n", " 26 3/7/2017 West Sorvino Binder 7 19.99 139.93\n", " 27 3/24/2017 Central Jardine Pen Set 50 4.99 249.50\n", " 28 4/10/2017 Central Andrews Pencil 66 1.99 131.34\n", " 29 4/27/2017 East Howard Pen 96 4.99 479.04\n", " 30 5/14/2017 Central Gill Pencil 53 1.29 68.37\n", " 31 5/31/2017 Central Gill Binder 80 8.99 719.20\n", " 32 6/17/2017 Central Kivell Desk 5 125.00 625.00\n", " 33 7/4/2017 East Jones Pen Set 62 4.99 309.38\n", " 34 7/21/2017 Central Morgan Pen Set 55 12.49 686.95\n", " 35 8/7/2017 Central Kivell Pen Set 42 23.95 1005.90\n", " 36 8/24/2017 West Sorvino Desk 3 275.00 825.00\n", " 37 9/10/2017 Central Gill Pencil 7 1.29 9.03\n", " 38 9/27/2017 West Sorvino Pen 76 1.99 151.24\n", " 39 10/14/2017 West Thompson Binder 57 19.99 1139.43\n", " 40 10/31/2017 Central Andrews Pencil 14 1.29 18.06\n", " 41 11/17/2017 Central Jardine Binder 11 4.99 54.89\n", " 42 12/4/2017 Central Jardine Binder 94 19.99 1879.06\n", " 43 12/21/2017 Central Andrews Binder 28 4.99 139.72]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have a single table, and we can see that we need to pass the row number to use as the header (because `<thead>` is not present)." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " OrderDate Region Rep Item Units UnitCost Total\n", "0 1/6/2016 East Jones Pencil 95 1.99 189.05\n", "1 1/23/2016 Central Kivell Binder 50 19.99 999.50\n", "2 2/9/2016 Central Jardine Pencil 36 4.99 179.64\n", "3 2/26/2016 Central Gill Pen 27 19.99 539.73\n", "4 3/15/2016 West Sorvino Pencil 56 2.99 167.44" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs = pd.read_html('http://www.contextures.com/xlSampleData01.html', header=0)\n", "dfs[0].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parsing an HTTPS URL" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The documentation states that:\n", "\n", "> Note that lxml only accepts the http, ftp and file url protocols. If you have a URL that starts with 'https' you might try removing the 's'.\n", "\n", "This is true, but *bs4 + html5lib* are used as a fallback when *lxml* fails. I guess this is why passing an `https` url does work. We can confirm that with a Wikipedia page (the table we want is the second one pandas finds, hence the `[1]` index):\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Type mutable Description \\\n", "0 bool immutable Boolean value \n", "1 bytearray mutable Sequence of bytes \n", "2 bytes immutable Sequence of bytes \n", "3 complex immutable Complex number with real and imaginary parts \n", "4 dict mutable Associative array (or dictionary) of key and v... \n", "5 ellipsis NaN An ellipsis placeholder to be used as an index... \n", "6 float immutable Floating point number, system-defined precision \n", "7 frozenset immutable Unordered set, contains no duplicates; can con... \n", "8 int immutable Integer of unlimited magnitude[76] \n", "9 list mutable List, can contain mixed types \n", "10 set mutable Unordered set, contains no duplicates; can con... \n", "11 str immutable A character string: sequence of Unicode codepo... \n", "12 tuple immutable Can contain mixed types \n", "\n", " Syntax example \n", "0 True False \n", "1 bytearray(b'Some ASCII') bytearray(b\"Some ASCI... \n", "2 b'Some ASCII' b\"Some ASCII\" bytes([119, 105, 1... \n", "3 3+2.7j \n", "4 {'key1': 1.0, 3: False} \n", "5 ... \n", "6 3.1415927 \n", "7 frozenset([4.0, 'string', True]) \n", "8 42 \n", "9 [4.0, 'string', True] \n", "10 {4.0, 'string', True} \n", "11 'Wikipedia' \"Wikipedia\" \"\"\"Spanning multiple l... \n", "12 (4.0, 'string', True)But we can append element... " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_html('https://en.wikipedia.org/wiki/Python_(programming_language)', header=0)[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But what if the url requires authentication?\n", "\n", "In that case we can use [requests](http://docs.python-requests.org/en/master/) to get the HTML and pass the string to pandas!"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To demonstrate authentication, we can use http://httpbin.org\n", "\n", "We can first confirm that passing a url that requires authentication raises a 401" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "ename": "HTTPError", "evalue": "HTTP Error 401: UNAUTHORIZED", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mHTTPError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_html\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'https://httpbin.org/basic-auth/myuser/mypasswd'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/miniconda3/envs/jupyter/lib/python3.6/site-packages/pandas/io/html.py\u001b[0m in \u001b[0;36mread_html\u001b[0;34m(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)\u001b[0m\n\u001b[1;32m 913\u001b[0m \u001b[0mthousands\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mthousands\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mattrs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mattrs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mencoding\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mencoding\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 914\u001b[0m \u001b[0mdecimal\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdecimal\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconverters\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mconverters\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mna_values\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mna_values\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 915\u001b[0;31m keep_default_na=keep_default_na)\n\u001b[0m", 
"\u001b[0;32m~/miniconda3/envs/jupyter/lib/python3.6/site-packages/pandas/io/html.py\u001b[0m in \u001b[0;36m_parse\u001b[0;34m(flavor, io, match, attrs, encoding, **kwargs)\u001b[0m\n\u001b[1;32m 747\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 748\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 749\u001b[0;31m \u001b[0mraise_with_traceback\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mretained\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 750\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 751\u001b[0m \u001b[0mret\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/miniconda3/envs/jupyter/lib/python3.6/site-packages/pandas/compat/__init__.py\u001b[0m in \u001b[0;36mraise_with_traceback\u001b[0;34m(exc, traceback)\u001b[0m\n\u001b[1;32m 383\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtraceback\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mEllipsis\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 384\u001b[0m \u001b[0m_\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtraceback\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexc_info\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 385\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwith_traceback\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtraceback\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 386\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 387\u001b[0m \u001b[0;31m# this version of raise is a syntax error in Python 3\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mHTTPError\u001b[0m: HTTP Error 401: UNAUTHORIZED" ] } 
], "source": [ "pd.read_html('https://httpbin.org/basic-auth/myuser/mypasswd')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Without credentials, httpbin answers with a 401 status code\n", "r = requests.get('https://httpbin.org/basic-auth/myuser/mypasswd')\n", "r.status_code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yes, as expected. Let's pass the username and password with requests." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# With matching HTTP basic auth credentials, the request should succeed (200)\n", "r = requests.get('https://httpbin.org/basic-auth/myuser/mypasswd', auth=('myuser', 'mypasswd'))\n", "r.status_code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could now pass `r.text` to pandas. However, http://httpbin.org is only a testing service used here to demonstrate authentication: it returns JSON-encoded responses and no HTML, so there is no table to parse.\n", "\n", "The following example shows how to combine requests and pandas on a public page:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Type mutable Description \\\n", "0 bool immutable Boolean value \n", "1 bytearray mutable Sequence of bytes \n", "2 bytes immutable Sequence of bytes \n", "3 complex immutable Complex number with real and imaginary parts \n", "4 dict mutable Associative array (or dictionary) of key and v... \n", "5 ellipsis NaN An ellipsis placeholder to be used as an index... \n", "6 float immutable Floating point number, system-defined precision \n", "7 frozenset immutable Unordered set, contains no duplicates; can con... \n", "8 int immutable Integer of unlimited magnitude[76] \n", "9 list mutable List, can contain mixed types \n", "10 set mutable Unordered set, contains no duplicates; can con... \n", "11 str immutable A character string: sequence of Unicode codepo... \n", "12 tuple immutable Can contain mixed types \n", "\n", " Syntax example \n", "0 True False \n", "1 bytearray(b'Some ASCII') bytearray(b\"Some ASCI... \n", "2 b'Some ASCII' b\"Some ASCII\" bytes([119, 105, 1... \n", "3 3+2.7j \n", "4 {'key1': 1.0, 3: False} \n", "5 ... \n", "6 3.1415927 \n", "7 frozenset([4.0, 'string', True]) \n", "8 42 \n", "9 [4.0, 'string', True] \n", "10 {4.0, 'string', True} \n", "11 'Wikipedia' \"Wikipedia\" \"\"\"Spanning multiple l... \n", "12 (4.0, 'string', True)But we can append element... " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r = requests.get('https://en.wikipedia.org/wiki/Python_(programming_language)')\n", "pd.read_html(r.text, header=0)[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A more complex example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We looked at some quite simple examples so far. 
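
As a compact recap of those simple cases, here is a minimal sketch that chains them together: parse a raw string holding a table without a `<thead>`, pick the header row with `header=0`, then filter the resulting DataFrame. It assumes pandas plus one of its HTML parsers (lxml, or BeautifulSoup 4 with html5lib) is installed, and `parse_first_table` is a hypothetical helper name. The string is wrapped in `io.StringIO` because recent pandas releases (2.1 and later) deprecate passing literal HTML to `read_html`; a file-like object is accepted by older versions too.

```python
from io import StringIO

import pandas as pd

# Same data as the second raw-string example: no <thead>, so the
# header row is an ordinary <tr> made of <td> cells.
HTML = '''
<table>
  <tr><td>Programming Language</td><td>Creator</td><td>Year</td></tr>
  <tr><td>C</td><td>Dennis Ritchie</td><td>1972</td></tr>
  <tr><td>Python</td><td>Guido Van Rossum</td><td>1989</td></tr>
  <tr><td>Ruby</td><td>Yukihiro Matsumoto</td><td>1995</td></tr>
</table>
'''


def parse_first_table(html, header=0):
    # Hypothetical helper: wrap the literal HTML in StringIO so that
    # read_html receives a file-like object instead of a raw string.
    tables = pd.read_html(StringIO(html), header=header)
    # read_html returns a list with one DataFrame per <table> found.
    return tables[0]


df = parse_first_table(HTML)
recent = df[df.Year > 1975]  # keep languages created after 1975
print(recent)
```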
So let's try a page with several tables: https://en.wikipedia.org/wiki/Timeline_of_programming_languages" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "dfs = pd.read_html('https://en.wikipedia.org/wiki/Timeline_of_programming_languages')" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "13" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(dfs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas found 13 tables, but if we look at the page, only 8 of them (one per decade) hold the timeline. Looking at our `dfs` list, we can see that the first interesting table is the fifth one (`dfs[4]`) and that we need to pass the row to use as the header." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Name \\\n", "0 1943–45 Plankalkül (concept) \n", "1 1943–46 ENIAC coding system \n", "2 1946 ENIAC Short Code \n", "3 1946 Von Neumann and Goldstine graphing system (Not... \n", "4 1947 ARC Assembly \n", "5 1948 CPC Coding scheme \n", "6 1948 Curry notation system \n", "7 1948 Plankalkül (concept published) \n", "8 1949 Short Code \n", "9 Year Name \n", "\n", " Chief developer, company \\\n", "0 Konrad Zuse \n", "1 John von Neumann, John Mauchly, J. Presper Eck... \n", "2 Richard Clippinger, John von Neumann after Ala... \n", "3 John von Neumann and Herman Goldstine \n", "4 Kathleen Booth[1][2] \n", "5 Howard H. Aiken \n", "6 Haskell Curry \n", "7 Konrad Zuse \n", "8 John Mauchly and William F. Schmitt \n", "9 Chief developer, company \n", "\n", " Predecessor(s) \n", "0 none (unique language) \n", "1 none (unique language) \n", "2 ENIAC coding system \n", "3 ENIAC coding system \n", "4 ENIAC coding system \n", "5 Analytical Engine order code \n", "6 ENIAC coding system \n", "7 none (unique language) \n", "8 ENIAC Short Code \n", "9 Predecessor(s) " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs = pd.read_html('https://en.wikipedia.org/wiki/Timeline_of_programming_languages', header=0)\n", "dfs[4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the header is repeated in the last row (to make the table easier to read on the HTML page). We can filter it out after concatenating the 8 tables into one DataFrame." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
31951ALGAEEdward A Voorhees and Karl Balkenone (unique language)
41951Intermediate Programming LanguageArthur BurksShort Code
51951Regional Assembly LanguageMaurice WilkesEDSAC
61951Boehm unnamed coding systemCorrado BöhmCPC Coding scheme
71951KlammerausdrückeKonrad ZusePlankalkül
81951OMNIBAC Symbolic AssemblerCharles KatzShort Code
91951Stanislaus (Notation)Fritz Bauernone (unique language)
101951Whirlwind assemblerCharles Adams and Jack Gilmore at MIT Project ...EDSAC
111951Rochester assemblerNat RochesterEDSAC
121951Sort Merge GeneratorBetty Holbertonnone (unique language)
131952A-0Grace HopperShort Code
141952Glennie AutocodeAlick Glennie after Alan TuringCPC Coding scheme
151952Editing GeneratorMilly KossSORT/MERGE
161952COMPOOLRAND/SDCnone (unique language)
171953SpeedcodingJohn W. Backusnone (unique language)
181953READ/PRINTDon Harroff, James Fishman, George Ryckmannone (unique language)
191954Laning and Zierler systemLaning, Zierler, Adams at MIT Project Whirlwindnone (unique language)
...............
472009ChapelBrad Chamberlain, Cray Inc.HPF, ZPL
482009GoGoogleC, Oberon, Limbo, Smalltalk
492009CoffeeScriptJeremy AshkenasJavaScript, Ruby, Python, Haskell
502009IdrisEdwin BradyHaskell, Agda, Coq
512009ParasailS. Tucker Taft, AdaCoreModula, Ada, Pascal, ML
522009WhileyDavid J. PearceJava, C, Python
53YearNameChief developer, companyPredecessor(s)
02010RustGraydon Hoare, MozillaAlef, C++, Camlp4, Erlang, Hermes, Limbo, Napi...
12011CeylonGavin King, Red HatJava
22011DartGoogleJava, JavaScript, CoffeeScript, Go
32011C++11C++ ISO/IEC 14882:2011C++, Standard C, C
42011KotlinJetBrainsJava, Scala, Groovy, C#, Gosu
52011RedNenad RakocevicRebol, Scala, Lua
62011OpaMLstateOCaml, Erlang, JavaScript
72012ElixirJosé ValimErlang, Ruby, Clojure
82012ElmEvan CzaplickiHaskell, Standard ML, OCaml, F#
92012TypeScriptAnders Hejlsberg, MicrosoftJavaScript, CoffeeScript
102012JuliaJeff Bezanson, Stefan Karpinski, Viral Shah, A...MATLAB, Lisp, C, Fortran, Mathematica[9] (stri...
112012PVivek Gupta: not the politician, Ethan Jackson...NaN
122012Ada 2012ARA and Ada Europe (ISO/IEC 8652:2012)Ada 2005, ISO/IEC 8652:1995/Amd 1:2007
132014CrystalAry Borenszweig, Manas Technology SolutionsRuby, C, Rust, Go, C#, Python
142014HackFacebookPHP
152014SwiftApple Inc.Objective-C, Rust, Haskell, Ruby, Python, C#, CLU
162014C++14C++ ISO/IEC 14882:2014C++, Standard C, C
172015Atari 2600 SuperCharger BASICMicrosoft sponsored think tank RelationalFrame...BASIC, Dartmouth BASIC (compiled programming l...
182015Perl 6The Rakudo TeamPerl, Haskell, Python, Ruby
192016RingMahmoud FayedLua, Python, Ruby, C, C#, BASIC, QML, xBase, S...
202017C++17C++ ISO/IEC 14882:2017C++, Standard C, C
212017Atari 2600 Flashback BASICMicrosoft sponsored think tank RelationalFrame...BASIC, Dartmouth BASIC (compiled programming l...
22YearNameChief developer, companyPredecessor(s)
\n", "

388 rows × 4 columns

\n", "
" ], "text/plain": [ " Year Name \\\n", "0 1943–45 Plankalkül (concept) \n", "1 1943–46 ENIAC coding system \n", "2 1946 ENIAC Short Code \n", "3 1946 Von Neumann and Goldstine graphing system (Not... \n", "4 1947 ARC Assembly \n", "5 1948 CPC Coding scheme \n", "6 1948 Curry notation system \n", "7 1948 Plankalkül (concept published) \n", "8 1949 Short Code \n", "9 Year Name \n", "0 1950 Short Code \n", "1 1950 Birkbeck Assembler \n", "2 1951 Superplan \n", "3 1951 ALGAE \n", "4 1951 Intermediate Programming Language \n", "5 1951 Regional Assembly Language \n", "6 1951 Boehm unnamed coding system \n", "7 1951 Klammerausdrücke \n", "8 1951 OMNIBAC Symbolic Assembler \n", "9 1951 Stanislaus (Notation) \n", "10 1951 Whirlwind assembler \n", "11 1951 Rochester assembler \n", "12 1951 Sort Merge Generator \n", "13 1952 A-0 \n", "14 1952 Glennie Autocode \n", "15 1952 Editing Generator \n", "16 1952 COMPOOL \n", "17 1953 Speedcoding \n", "18 1953 READ/PRINT \n", "19 1954 Laning and Zierler system \n", ".. ... ... \n", "47 2009 Chapel \n", "48 2009 Go \n", "49 2009 CoffeeScript \n", "50 2009 Idris \n", "51 2009 Parasail \n", "52 2009 Whiley \n", "53 Year Name \n", "0 2010 Rust \n", "1 2011 Ceylon \n", "2 2011 Dart \n", "3 2011 C++11 \n", "4 2011 Kotlin \n", "5 2011 Red \n", "6 2011 Opa \n", "7 2012 Elixir \n", "8 2012 Elm \n", "9 2012 TypeScript \n", "10 2012 Julia \n", "11 2012 P \n", "12 2012 Ada 2012 \n", "13 2014 Crystal \n", "14 2014 Hack \n", "15 2014 Swift \n", "16 2014 C++14 \n", "17 2015 Atari 2600 SuperCharger BASIC \n", "18 2015 Perl 6 \n", "19 2016 Ring \n", "20 2017 C++17 \n", "21 2017 Atari 2600 Flashback BASIC \n", "22 Year Name \n", "\n", " Chief developer, company \\\n", "0 Konrad Zuse \n", "1 John von Neumann, John Mauchly, J. Presper Eck... \n", "2 Richard Clippinger, John von Neumann after Ala... \n", "3 John von Neumann and Herman Goldstine \n", "4 Kathleen Booth[1][2] \n", "5 Howard H. 
Aiken \n", "6 Haskell Curry \n", "7 Konrad Zuse \n", "8 John Mauchly and William F. Schmitt \n", "9 Chief developer, company \n", "0 William F Schmidt, Albert B. Tonik,[3] J.R. Logan \n", "1 Kathleen Booth \n", "2 Heinz Rutishauser \n", "3 Edward A Voorhees and Karl Balke \n", "4 Arthur Burks \n", "5 Maurice Wilkes \n", "6 Corrado Böhm \n", "7 Konrad Zuse \n", "8 Charles Katz \n", "9 Fritz Bauer \n", "10 Charles Adams and Jack Gilmore at MIT Project ... \n", "11 Nat Rochester \n", "12 Betty Holberton \n", "13 Grace Hopper \n", "14 Alick Glennie after Alan Turing \n", "15 Milly Koss \n", "16 RAND/SDC \n", "17 John W. Backus \n", "18 Don Harroff, James Fishman, George Ryckman \n", "19 Laning, Zierler, Adams at MIT Project Whirlwind \n", ".. ... \n", "47 Brad Chamberlain, Cray Inc. \n", "48 Google \n", "49 Jeremy Ashkenas \n", "50 Edwin Brady \n", "51 S. Tucker Taft, AdaCore \n", "52 David J. Pearce \n", "53 Chief developer, company \n", "0 Graydon Hoare, Mozilla \n", "1 Gavin King, Red Hat \n", "2 Google \n", "3 C++ ISO/IEC 14882:2011 \n", "4 JetBrains \n", "5 Nenad Rakocevic \n", "6 MLstate \n", "7 José Valim \n", "8 Evan Czaplicki \n", "9 Anders Hejlsberg, Microsoft \n", "10 Jeff Bezanson, Stefan Karpinski, Viral Shah, A... \n", "11 Vivek Gupta: not the politician, Ethan Jackson... \n", "12 ARA and Ada Europe (ISO/IEC 8652:2012) \n", "13 Ary Borenszweig, Manas Technology Solutions \n", "14 Facebook \n", "15 Apple Inc. \n", "16 C++ ISO/IEC 14882:2014 \n", "17 Microsoft sponsored think tank RelationalFrame... \n", "18 The Rakudo Team \n", "19 Mahmoud Fayed \n", "20 C++ ISO/IEC 14882:2017 \n", "21 Microsoft sponsored think tank RelationalFrame... 
\n", "22 Chief developer, company \n", "\n", " Predecessor(s) \n", "0 none (unique language) \n", "1 none (unique language) \n", "2 ENIAC coding system \n", "3 ENIAC coding system \n", "4 ENIAC coding system \n", "5 Analytical Engine order code \n", "6 ENIAC coding system \n", "7 none (unique language) \n", "8 ENIAC Short Code \n", "9 Predecessor(s) \n", "0 Brief Code \n", "1 ARC \n", "2 Plankalkül \n", "3 none (unique language) \n", "4 Short Code \n", "5 EDSAC \n", "6 CPC Coding scheme \n", "7 Plankalkül \n", "8 Short Code \n", "9 none (unique language) \n", "10 EDSAC \n", "11 EDSAC \n", "12 none (unique language) \n", "13 Short Code \n", "14 CPC Coding scheme \n", "15 SORT/MERGE \n", "16 none (unique language) \n", "17 none (unique language) \n", "18 none (unique language) \n", "19 none (unique language) \n", ".. ... \n", "47 HPF, ZPL \n", "48 C, Oberon, Limbo, Smalltalk \n", "49 JavaScript, Ruby, Python, Haskell \n", "50 Haskell, Agda, Coq \n", "51 Modula, Ada, Pascal, ML \n", "52 Java, C, Python \n", "53 Predecessor(s) \n", "0 Alef, C++, Camlp4, Erlang, Hermes, Limbo, Napi... \n", "1 Java \n", "2 Java, JavaScript, CoffeeScript, Go \n", "3 C++, Standard C, C \n", "4 Java, Scala, Groovy, C#, Gosu \n", "5 Rebol, Scala, Lua \n", "6 OCaml, Erlang, JavaScript \n", "7 Erlang, Ruby, Clojure \n", "8 Haskell, Standard ML, OCaml, F# \n", "9 JavaScript, CoffeeScript \n", "10 MATLAB, Lisp, C, Fortran, Mathematica[9] (stri... \n", "11 NaN \n", "12 Ada 2005, ISO/IEC 8652:1995/Amd 1:2007 \n", "13 Ruby, C, Rust, Go, C#, Python \n", "14 PHP \n", "15 Objective-C, Rust, Haskell, Ruby, Python, C#, CLU \n", "16 C++, Standard C, C \n", "17 BASIC, Dartmouth BASIC (compiled programming l... \n", "18 Perl, Haskell, Python, Ruby \n", "19 Lua, Python, Ruby, C, C#, BASIC, QML, xBase, S... \n", "20 C++, Standard C, C \n", "21 BASIC, Dartmouth BASIC (compiled programming l... 
\n", "22 Predecessor(s) \n", "\n", "[388 rows x 4 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.concat(dfs[4:12])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each table ends with a row that repeats the column names (for example, rows 9, 53 and 22 above). Remove these extra *header* rows by keeping only the rows whose `Year` is not the string `'Year'`." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearNameChief developer, companyPredecessor(s)
01943–45Plankalkül (concept)Konrad Zusenone (unique language)
11943–46ENIAC coding systemJohn von Neumann, John Mauchly, J. Presper Eck...none (unique language)
21946ENIAC Short CodeRichard Clippinger, John von Neumann after Ala...ENIAC coding system
31946Von Neumann and Goldstine graphing system (Not...John von Neumann and Herman GoldstineENIAC coding system
41947ARC AssemblyKathleen Booth[1][2]ENIAC coding system
51948CPC Coding schemeHoward H. AikenAnalytical Engine order code
61948Curry notation systemHaskell CurryENIAC coding system
71948Plankalkül (concept published)Konrad Zusenone (unique language)
81949Short CodeJohn Mauchly and William F. SchmittENIAC Short Code
01950Short CodeWilliam F Schmidt, Albert B. Tonik,[3] J.R. LoganBrief Code
11950Birkbeck AssemblerKathleen BoothARC
21951SuperplanHeinz RutishauserPlankalkül
31951ALGAEEdward A Voorhees and Karl Balkenone (unique language)
41951Intermediate Programming LanguageArthur BurksShort Code
51951Regional Assembly LanguageMaurice WilkesEDSAC
61951Boehm unnamed coding systemCorrado BöhmCPC Coding scheme
71951KlammerausdrückeKonrad ZusePlankalkül
81951OMNIBAC Symbolic AssemblerCharles KatzShort Code
91951Stanislaus (Notation)Fritz Bauernone (unique language)
101951Whirlwind assemblerCharles Adams and Jack Gilmore at MIT Project ...EDSAC
111951Rochester assemblerNat RochesterEDSAC
121951Sort Merge GeneratorBetty Holbertonnone (unique language)
131952A-0Grace HopperShort Code
141952Glennie AutocodeAlick Glennie after Alan TuringCPC Coding scheme
151952Editing GeneratorMilly KossSORT/MERGE
161952COMPOOLRAND/SDCnone (unique language)
171953SpeedcodingJohn W. Backusnone (unique language)
181953READ/PRINTDon Harroff, James Fishman, George Ryckmannone (unique language)
191954Laning and Zierler systemLaning, Zierler, Adams at MIT Project Whirlwindnone (unique language)
201954Mark I AutocodeTony BrookerGlennie Autocode
...............
452008GenieJamie McCrackenPython, Boo, D, Object Pascal
462008PureAlbert GräfQ
472009ChapelBrad Chamberlain, Cray Inc.HPF, ZPL
482009GoGoogleC, Oberon, Limbo, Smalltalk
492009CoffeeScriptJeremy AshkenasJavaScript, Ruby, Python, Haskell
502009IdrisEdwin BradyHaskell, Agda, Coq
512009ParasailS. Tucker Taft, AdaCoreModula, Ada, Pascal, ML
522009WhileyDavid J. PearceJava, C, Python
02010RustGraydon Hoare, MozillaAlef, C++, Camlp4, Erlang, Hermes, Limbo, Napi...
12011CeylonGavin King, Red HatJava
22011DartGoogleJava, JavaScript, CoffeeScript, Go
32011C++11C++ ISO/IEC 14882:2011C++, Standard C, C
42011KotlinJetBrainsJava, Scala, Groovy, C#, Gosu
52011RedNenad RakocevicRebol, Scala, Lua
62011OpaMLstateOCaml, Erlang, JavaScript
72012ElixirJosé ValimErlang, Ruby, Clojure
82012ElmEvan CzaplickiHaskell, Standard ML, OCaml, F#
92012TypeScriptAnders Hejlsberg, MicrosoftJavaScript, CoffeeScript
102012JuliaJeff Bezanson, Stefan Karpinski, Viral Shah, A...MATLAB, Lisp, C, Fortran, Mathematica[9] (stri...
112012PVivek Gupta: not the politician, Ethan Jackson...NaN
122012Ada 2012ARA and Ada Europe (ISO/IEC 8652:2012)Ada 2005, ISO/IEC 8652:1995/Amd 1:2007
132014CrystalAry Borenszweig, Manas Technology SolutionsRuby, C, Rust, Go, C#, Python
142014HackFacebookPHP
152014SwiftApple Inc.Objective-C, Rust, Haskell, Ruby, Python, C#, CLU
162014C++14C++ ISO/IEC 14882:2014C++, Standard C, C
172015Atari 2600 SuperCharger BASICMicrosoft sponsored think tank RelationalFrame...BASIC, Dartmouth BASIC (compiled programming l...
182015Perl 6The Rakudo TeamPerl, Haskell, Python, Ruby
192016RingMahmoud FayedLua, Python, Ruby, C, C#, BASIC, QML, xBase, S...
202017C++17C++ ISO/IEC 14882:2017C++, Standard C, C
212017Atari 2600 Flashback BASICMicrosoft sponsored think tank RelationalFrame...BASIC, Dartmouth BASIC (compiled programming l...
\n", "

380 rows × 4 columns

\n", "
" ], "text/plain": [ " Year Name \\\n", "0 1943–45 Plankalkül (concept) \n", "1 1943–46 ENIAC coding system \n", "2 1946 ENIAC Short Code \n", "3 1946 Von Neumann and Goldstine graphing system (Not... \n", "4 1947 ARC Assembly \n", "5 1948 CPC Coding scheme \n", "6 1948 Curry notation system \n", "7 1948 Plankalkül (concept published) \n", "8 1949 Short Code \n", "0 1950 Short Code \n", "1 1950 Birkbeck Assembler \n", "2 1951 Superplan \n", "3 1951 ALGAE \n", "4 1951 Intermediate Programming Language \n", "5 1951 Regional Assembly Language \n", "6 1951 Boehm unnamed coding system \n", "7 1951 Klammerausdrücke \n", "8 1951 OMNIBAC Symbolic Assembler \n", "9 1951 Stanislaus (Notation) \n", "10 1951 Whirlwind assembler \n", "11 1951 Rochester assembler \n", "12 1951 Sort Merge Generator \n", "13 1952 A-0 \n", "14 1952 Glennie Autocode \n", "15 1952 Editing Generator \n", "16 1952 COMPOOL \n", "17 1953 Speedcoding \n", "18 1953 READ/PRINT \n", "19 1954 Laning and Zierler system \n", "20 1954 Mark I Autocode \n", ".. ... ... \n", "45 2008 Genie \n", "46 2008 Pure \n", "47 2009 Chapel \n", "48 2009 Go \n", "49 2009 CoffeeScript \n", "50 2009 Idris \n", "51 2009 Parasail \n", "52 2009 Whiley \n", "0 2010 Rust \n", "1 2011 Ceylon \n", "2 2011 Dart \n", "3 2011 C++11 \n", "4 2011 Kotlin \n", "5 2011 Red \n", "6 2011 Opa \n", "7 2012 Elixir \n", "8 2012 Elm \n", "9 2012 TypeScript \n", "10 2012 Julia \n", "11 2012 P \n", "12 2012 Ada 2012 \n", "13 2014 Crystal \n", "14 2014 Hack \n", "15 2014 Swift \n", "16 2014 C++14 \n", "17 2015 Atari 2600 SuperCharger BASIC \n", "18 2015 Perl 6 \n", "19 2016 Ring \n", "20 2017 C++17 \n", "21 2017 Atari 2600 Flashback BASIC \n", "\n", " Chief developer, company \\\n", "0 Konrad Zuse \n", "1 John von Neumann, John Mauchly, J. Presper Eck... \n", "2 Richard Clippinger, John von Neumann after Ala... \n", "3 John von Neumann and Herman Goldstine \n", "4 Kathleen Booth[1][2] \n", "5 Howard H. 
Aiken \n", "6 Haskell Curry \n", "7 Konrad Zuse \n", "8 John Mauchly and William F. Schmitt \n", "0 William F Schmidt, Albert B. Tonik,[3] J.R. Logan \n", "1 Kathleen Booth \n", "2 Heinz Rutishauser \n", "3 Edward A Voorhees and Karl Balke \n", "4 Arthur Burks \n", "5 Maurice Wilkes \n", "6 Corrado Böhm \n", "7 Konrad Zuse \n", "8 Charles Katz \n", "9 Fritz Bauer \n", "10 Charles Adams and Jack Gilmore at MIT Project ... \n", "11 Nat Rochester \n", "12 Betty Holberton \n", "13 Grace Hopper \n", "14 Alick Glennie after Alan Turing \n", "15 Milly Koss \n", "16 RAND/SDC \n", "17 John W. Backus \n", "18 Don Harroff, James Fishman, George Ryckman \n", "19 Laning, Zierler, Adams at MIT Project Whirlwind \n", "20 Tony Brooker \n", ".. ... \n", "45 Jamie McCracken \n", "46 Albert Gräf \n", "47 Brad Chamberlain, Cray Inc. \n", "48 Google \n", "49 Jeremy Ashkenas \n", "50 Edwin Brady \n", "51 S. Tucker Taft, AdaCore \n", "52 David J. Pearce \n", "0 Graydon Hoare, Mozilla \n", "1 Gavin King, Red Hat \n", "2 Google \n", "3 C++ ISO/IEC 14882:2011 \n", "4 JetBrains \n", "5 Nenad Rakocevic \n", "6 MLstate \n", "7 José Valim \n", "8 Evan Czaplicki \n", "9 Anders Hejlsberg, Microsoft \n", "10 Jeff Bezanson, Stefan Karpinski, Viral Shah, A... \n", "11 Vivek Gupta: not the politician, Ethan Jackson... \n", "12 ARA and Ada Europe (ISO/IEC 8652:2012) \n", "13 Ary Borenszweig, Manas Technology Solutions \n", "14 Facebook \n", "15 Apple Inc. \n", "16 C++ ISO/IEC 14882:2014 \n", "17 Microsoft sponsored think tank RelationalFrame... \n", "18 The Rakudo Team \n", "19 Mahmoud Fayed \n", "20 C++ ISO/IEC 14882:2017 \n", "21 Microsoft sponsored think tank RelationalFrame... 
\n", "\n", " Predecessor(s) \n", "0 none (unique language) \n", "1 none (unique language) \n", "2 ENIAC coding system \n", "3 ENIAC coding system \n", "4 ENIAC coding system \n", "5 Analytical Engine order code \n", "6 ENIAC coding system \n", "7 none (unique language) \n", "8 ENIAC Short Code \n", "0 Brief Code \n", "1 ARC \n", "2 Plankalkül \n", "3 none (unique language) \n", "4 Short Code \n", "5 EDSAC \n", "6 CPC Coding scheme \n", "7 Plankalkül \n", "8 Short Code \n", "9 none (unique language) \n", "10 EDSAC \n", "11 EDSAC \n", "12 none (unique language) \n", "13 Short Code \n", "14 CPC Coding scheme \n", "15 SORT/MERGE \n", "16 none (unique language) \n", "17 none (unique language) \n", "18 none (unique language) \n", "19 none (unique language) \n", "20 Glennie Autocode \n", ".. ... \n", "45 Python, Boo, D, Object Pascal \n", "46 Q \n", "47 HPF, ZPL \n", "48 C, Oberon, Limbo, Smalltalk \n", "49 JavaScript, Ruby, Python, Haskell \n", "50 Haskell, Agda, Coq \n", "51 Modula, Ada, Pascal, ML \n", "52 Java, C, Python \n", "0 Alef, C++, Camlp4, Erlang, Hermes, Limbo, Napi... \n", "1 Java \n", "2 Java, JavaScript, CoffeeScript, Go \n", "3 C++, Standard C, C \n", "4 Java, Scala, Groovy, C#, Gosu \n", "5 Rebol, Scala, Lua \n", "6 OCaml, Erlang, JavaScript \n", "7 Erlang, Ruby, Clojure \n", "8 Haskell, Standard ML, OCaml, F# \n", "9 JavaScript, CoffeeScript \n", "10 MATLAB, Lisp, C, Fortran, Mathematica[9] (stri... \n", "11 NaN \n", "12 Ada 2005, ISO/IEC 8652:1995/Amd 1:2007 \n", "13 Ruby, C, Rust, Go, C#, Python \n", "14 PHP \n", "15 Objective-C, Rust, Haskell, Ruby, Python, C#, CLU \n", "16 C++, Standard C, C \n", "17 BASIC, Dartmouth BASIC (compiled programming l... \n", "18 Perl, Haskell, Python, Ruby \n", "19 Lua, Python, Ruby, C, C#, BASIC, QML, xBase, S... \n", "20 C++, Standard C, C \n", "21 BASIC, Dartmouth BASIC (compiled programming l... 
\n", "\n", "[380 rows x 4 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prog_lang = df[df.Year != 'Year']\n", "prog_lang" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In what year was Python created?" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearNameChief developer, companyPredecessor(s)
91991PythonGuido van RossumABC, ALGOL 68, Icon, Modula-3
\n", "
" ], "text/plain": [ " Year Name Chief developer, company Predecessor(s)\n", "9 1991 Python Guido van Rossum ABC, ALGOL 68, Icon, Modula-3" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prog_lang[prog_lang.Name == 'Python']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This last example puts all the steps together and should say it all." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dfs = pd.read_html('https://en.wikipedia.org/wiki/Timeline_of_programming_languages', header=0)\n", "df = pd.concat(dfs[4:12])\n", "prog_lang = df[df.Year != 'Year']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Four lines of code (including the `import`) and we have one DataFrame containing the data from 8 different HTML tables on one Wikipedia page!\n", "\n", "Do I need to say why I love Python and pandas? :-)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This post was written in a Jupyter notebook.\n", "You can find the notebook on [GitHub](https://github.com/beenje/blog/blob/master/posts/parsing-html-tables-in-python-with-pandas.ipynb) and download the conda [environment.yml](environment.yml) file to get all the dependencies I used."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" }, "nikola": { "category": "python", "date": "2018-03-27 22:31:12 UTC+02:00", "description": "", "link": "", "slug": "parsing-html-tables-in-python-with-pandas", "tags": "python,pandas,requests", "title": "Parsing HTML Tables in Python with pandas", "type": "text" } }, "nbformat": 4, "nbformat_minor": 1 }